[DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Vladimir Sitnikov Sun, 29 Dec 2019 11:10:39 -0800

Hi,

We have a year-old issue proposing to sort RexNode operands so that
equivalent expressions are represented consistently.


For instance, "x=5" and "5=x" have the same semantics, so it would make
sense to stick to a single representation.
A discussion can be found in
https://issues.apache.org/jira/browse/CALCITE-2450

We do not normalize RexNodes, which results in excessive planning time,
especially when the planner is trying to reorder joins.
For instance, it treats Join(A, B, $0=$1) and Join(A, B, $1=$0) as
different joins even though they are equivalent.
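
To illustrate the duplication, here is a minimal sketch in plain Java. It
does not use any Calcite classes: `digest` is a hypothetical helper that
stands in for the string used to identify an expression, and it simply
prints the operands in the order they were given.

```java
import java.util.HashSet;
import java.util.Set;

final class DigestDuplicates {
  // Hypothetical stand-in for a digest: operands are printed in the
  // order given, with no normalization.
  static String digest(String left, String right, String a, String b) {
    return "Join(" + left + ", " + right + ", " + a + "=" + b + ")";
  }

  // Registers both spellings of the same join condition and reports how
  // many distinct entries a digest-keyed set ends up with.
  static int distinctDigests() {
    Set<String> seen = new HashSet<>();
    seen.add(digest("A", "B", "$0", "$1"));
    seen.add(digest("A", "B", "$1", "$0"));
    return seen.size();
  }
}
```

Because the two strings differ, a set keyed by digest keeps both entries
even though the joins are equivalent.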

The normalization does not seem to cost much, and it enables me to
activate more rules (e.g. EnumerableMergeRule),
so the planner can consider more sophisticated plans.

I see two approaches:
a) Normalize in the RexNode constructor. This seems easy to implement;
however, there's a catch
if someone assumes that the order of operands is the same as the one
that was passed to the constructor.
I don't think there are such assumptions in the wild, but there might be.
The javadoc for the relevant methods says nothing regarding the operand
order.
On the plus side, RexNode would look the same in the
debugger and in its toString representation.
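
A rough sketch of approach (a), in plain Java rather than real Calcite
code. `NormalizedCall` and `isSymmetric` are hypothetical names, operands
are plain strings, and lexicographic order is an arbitrary choice of
canonical order made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NormalizedCall {
  final String op;
  final List<String> operands;

  NormalizedCall(String op, List<String> operands) {
    this.op = op;
    List<String> copy = new ArrayList<>(operands);
    // Assumption: only symmetric operators may have operands reordered.
    if (isSymmetric(op)) {
      Collections.sort(copy);
    }
    this.operands = Collections.unmodifiableList(copy);
  }

  // Hypothetical helper: operators whose two operands can be swapped
  // without changing the result.
  static boolean isSymmetric(String op) {
    return op.equals("=") || op.equals("<>");
  }

  @Override public String toString() {
    return op + "(" + String.join(", ", operands) + ")";
  }
}
```

With this sketch, `new NormalizedCall("=", Arrays.asList("$1", "$0"))`
stores and prints its operands as `$0, $1`, so the stored fields and
`toString` agree, but a caller that reads `operands.get(0)` no longer gets
back what it passed in. A non-symmetric operator such as `<` keeps its
operands in the original order.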

b) Normalize in RexCall#computeDigest only.
In other words, keep the operands unsorted, but make sure the digest is
created as if the operands were sorted.
This seems to be the most transparent change; however, it might be
surprising that `toString` does not match what is seen in the debugger.
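
A rough sketch of approach (b), again in plain Java and not the actual
RexCall code. `DigestOnlyCall` and its `digest` method are hypothetical,
operands are plain strings, and only `=` is treated as reorderable to keep
the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DigestOnlyCall {
  final String op;
  final List<String> operands; // kept exactly as passed in

  DigestOnlyCall(String op, List<String> operands) {
    this.op = op;
    this.operands = Collections.unmodifiableList(new ArrayList<>(operands));
  }

  // The digest is built from a sorted copy, so the stored operands are
  // never touched.
  String digest() {
    List<String> sorted = new ArrayList<>(operands);
    if (op.equals("=")) {
      Collections.sort(sorted);
    }
    return op + "(" + String.join(", ", sorted) + ")";
  }

  // Equality and hashing go through the digest, so both spellings of the
  // same predicate compare equal.
  @Override public boolean equals(Object o) {
    return o instanceof DigestOnlyCall
        && digest().equals(((DigestOnlyCall) o).digest());
  }

  @Override public int hashCode() {
    return digest().hashCode();
  }
}
```

Here a call built from `$1, $0` still holds `$1` as its first operand, yet
its digest reads `=($0, $1)` and it compares equal to a call built from
`$0, $1`. That mismatch between the fields and the digest is the surprise
mentioned above.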

In any case, making `RexCall#toString` print a sorted representation would
alter lots of tests.
For :core the result is 5540 tests completed, 358 failed, 91 skipped :((

WDYT?

Hopefully, making the RexNode representation sorted would reduce the number
of `$1=$0` vs `$0=$1` plan diffs.

Vladimir
