[
https://issues.apache.org/jira/browse/CALCITE-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-7635:
------------------------------------
Labels: pull-request-available (was: )
> Simplification result of conjunction of comparisons depends on terms order
> --------------------------------------------------------------------------
>
> Key: CALCITE-7635
> URL: https://issues.apache.org/jira/browse/CALCITE-7635
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.42.0
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Major
> Labels: pull-request-available
>
> Calling RexSimplify with unknown=FALSE on this expression:
> {noformat}
> x < 10 AND x > 0 AND x < 20
> i.e.
> AND( <(x, 10), >(x, 0), <(x, 20) )
> {noformat}
> Should return {{x < 10 AND x > 0}}
> And this happens if we run the following test inside RexProgramTest
> (simplification result is returned on SEARCH format):
> {code:java}
> @Test void testSimplifyAndComparison() {
> checkSimplifyFilter(
> and(
> gt(vInt(), literal(0)),
> lt(vInt(), literal(10)),
> lt(vInt(), literal(20))),
> "SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
> }
> {code}
> However, if we alter the order of the terms, and run instead:
> {code:java}
> @Test void testSimplifyAndComparison() {
> checkSimplifyFilter(
> and(
> lt(vInt(), literal(10)),
> gt(vInt(), literal(0)),
> lt(vInt(), literal(20))),
> "SEARCH(?0.int0, Sarg[(0..10)])"); // fails
> }
> {code}
> If fails with:
> {noformat}
> Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
> but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
> {noformat}
> The problem is that, in the second case, the simplification result (before
> being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND
> x > 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH
> expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN"
> that we got on the first case). This "NULL AS FALSE" will make that, if this
> SEARCH expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND
> X IS NOT NULL}}; instead of the expected {{x < 10 AND x > 0}};
> The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs ==
> FALSE and predicateElimination (be default true), we get into
> {{simplifyAndTerms}}:
> {code:java}
> RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
> List<RexNode> operands = RelOptUtil.conjunctions(e);
> if (unknownAs == FALSE && predicateElimination) {
> simplifyAndTerms(operands, FALSE);
> }
> ...
> {code}
> Inside this method, each term is simplified using the previous terms as
> "predicates" into the RexSimplify:
> {code:java}
> private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
> RexSimplify simplify = this;
> for (int i = 0; i < terms.size(); i++) {
> RexNode t = terms.get(i);
> if (Predicate.of(t) == null) {
> continue;
> }
> terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using
> previous terms as predicates
> RelOptPredicateList newPredicates =
> simplify.predicates.union(rexBuilder,
> RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
> simplify = simplify.withPredicates(newPredicates);
> }
> ...
> {code}
> When simplifying each term (using the previous ones as predicates), since
> they are comparisons, we'll get inside RexSimplify#simplifyComparison, which
> calls at the end {{simplifyUsingPredicates}}:
> {code:java}
> private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
> RexUnknownAs unknownAs, Class<C> clazz) {
> ...
> return simplifyUsingPredicates(e2, clazz);
> }
> {code}
> And {{simplifyUsingPredicates}} calls {{residue}} method, which can return
> {{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because
> "nullability might be problematic"), but only depending on the order in which
> the terms have been processed:
> {code:java}
> private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
> Class<C> clazz) {
> ...
> final C v0 = comparison.literal.getValueAs(clazz);
> ...
> final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
> final RangeSet<C> rangeSet2 =
> residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
> clazz);
> if (rangeSet2.isEmpty()) {
> // Term is impossible to satisfy given these predicates
> return rexBuilder.makeLiteral(false);
> } else if (rangeSet2.equals(rangeSet)) {
> // no change
> return e;
> } else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
> // Range is always satisfied given these predicates; but nullability
> might
> // be problematic
> return simplify(
> rexBuilder.makeCall(RexUtil.getPos(e),
> SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
> RexUnknownAs.UNKNOWN); // [1]
> ...
> } else {
> // range has been reduced but it's not worth simplifying
> return e; // [2]
> }
> {code}
> For example:
> - Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when
> x<20 is processed here
> - Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when
> x<20 is processed here
> - Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is
> processed here
> - Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is
> processed here (and x<10 when x<10 is processed)
> - Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is
> processed here (and x<10 when x<10 is processed)
> Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20))
> with the list of predicates [x<10, x>0] i.e. [(-INF, 10) , (0, INF)], is
> processed and, as in this example, the list of previous predicates contains
> as first item a predicate enclosed by it [3], in this case x<10 (i.e.
> range(-INF, 10)), {{residue}} will consider intermediate result as
> {{RangeSets.rangeSetAll()}} i.e. (-INF, +INF), which will enclose any other
> predicate on subsequent iterations, so this will be the result to be returned:
> {code:java}
> private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
> RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
> RangeSet<C> result = r0;
> for (RexNode predicate : predicates) {
> ...
> final RexCall call = (RexCall) predicate;
> final Comparison comparison = Comparison.of(call);
> if (comparison != null && comparison.ref.equals(ref)) {
> final C c1 = comparison.literal.getValueAs(clazz);
> ...
> final Range<C> r1 = range(comparison.kind, c1);
> if (result.encloses(r1)) {
> // Given these predicates, term is always satisfied.
> // e.g. r0 is "$0 < 10", r1 is "$0 < 5"
> result = RangeSets.rangeSetAll(); [3]
> continue;
> }
> result = result.subRangeSet(r1); [4]
> ...
> }
> return result;
> }
> {code}
> Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of
> predicates is not [(-INF, 10), (0, INF)] , but instead the other way around
> [(0, INF), (-INF, 10)]; then result will not be (-INF, +INF), but (0,10),
> since it will be accumulated as (0, 20) via [4] on the first iteration (when
> processing (-INF, 20) with first predicate (0, INF)); and then (0, 10) on the
> second iteration, again via [4] when processing this intermediate result (0,
> 20) with the second predicate (-INF, 10).
> To sum up, the fact that RexSimplify#residue result can differ depending on
> the order of its list of predicates parameter, leads to a global different
> result on the initial RexSimplify#simplify call.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)