[ 
https://issues.apache.org/jira/browse/CALCITE-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Q L updated CALCITE-7635:
-------------------------------
    Description: 
Calling RexSimplify with unknown=FALSE on this expression:
{noformat}
x < 10 AND x > 0 AND x < 20
i.e.
AND( <(x, 10), >(x, 0), <(x, 20) )
{noformat}
Should return {{x < 10 AND x > 0}}
And this happens if we run the following test inside RexProgramTest 
(simplification result is returned on SEARCH format):
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            gt(vInt(), literal(0)),
            lt(vInt(), literal(10)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
  }
{code}
However, if we alter the order of the terms, and run instead:
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            lt(vInt(), literal(10)),
            gt(vInt(), literal(0)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // fails
  }
{code}
If fails with:
{noformat}
Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
     but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
{noformat}

The problem is that, in the second case, the simplification result (before 
being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND x 
> 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH 
expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN" that 
we got on the first case). This "NULL AS FALSE" will make that, if this SEARCH 
expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND X IS NOT 
NULL}}; instead of the expected {{x < 10 AND x > 0}};

The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs == FALSE 
and predicateElimination (be default true), we get into {{simplifyAndTerms}}:
{code:java}
RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
    List<RexNode> operands = RelOptUtil.conjunctions(e);
    if (unknownAs == FALSE && predicateElimination) {
      simplifyAndTerms(operands, FALSE);
    }
    ...
{code}

Inside this method, each term is simplified using the previous terms as 
"predicates" into the RexSimplify:
{code:java}
private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
    RexSimplify simplify = this;
    for (int i = 0; i < terms.size(); i++) {
      RexNode t = terms.get(i);
      if (Predicate.of(t) == null) {
        continue;
      }
      terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using 
previous terms as predicates
      RelOptPredicateList newPredicates =
          simplify.predicates.union(rexBuilder,
              RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
      simplify = simplify.withPredicates(newPredicates);
    }
    ...
{code}

When simplifying each term (using the previous ones as predicates), since they 
are comparisons, we'll get inside RexSimplify#simplifyComparison, which calls 
at the end {{simplifyUsingPredicates}}:
{code:java}
private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
      RexUnknownAs unknownAs, Class<C> clazz) {
    ...
    return simplifyUsingPredicates(e2, clazz);
  }
{code}

And {{simplifyUsingPredicates}} calls {{residue}} method, which can return 
{{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because 
"nullability might be problematic"), but only depending on the order in which 
the terms have been processed:
{code:java}
  private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
      Class<C> clazz) {
    ...
    final C v0 = comparison.literal.getValueAs(clazz);
    ...
    final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
    final RangeSet<C> rangeSet2 =
        residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
            clazz);
    if (rangeSet2.isEmpty()) {
      // Term is impossible to satisfy given these predicates
      return rexBuilder.makeLiteral(false);
    } else if (rangeSet2.equals(rangeSet)) {
      // no change
      return e;
    } else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
      // Range is always satisfied given these predicates; but nullability might
      // be problematic
      return simplify(
          rexBuilder.makeCall(RexUtil.getPos(e), 
SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
          RexUnknownAs.UNKNOWN); // [1]
    ...
    } else {
      // range has been reduced but it's not worth simplifying
      return e; // [2]
    }
{code}

For example: 
- Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is processed 
here 
- Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed) 
- Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed)

Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20)) 
with the list of predicates [x<10, x>0] i.e. [(-INF, 10) , (0, INF)], is 
processed and, as in this example, the list of previous predicates contains as 
first item a predicate enclosed by it [3], in this case x<10 (i.e. range(-INF, 
10)), {{residue}} will consider intermediate result as 
{{RangeSets.rangeSetAll()}} i.e. (-INF, +INF), which will enclose any other 
predicate on subsequent iterations, so this will be the result to be returned:
{code:java}
private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
      RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
    RangeSet<C> result = r0;
    for (RexNode predicate : predicates) {
      ...
        final RexCall call = (RexCall) predicate;
        final Comparison comparison = Comparison.of(call);
        if (comparison != null && comparison.ref.equals(ref)) {
          final C c1 = comparison.literal.getValueAs(clazz);
          ...
            final Range<C> r1 = range(comparison.kind, c1);
            if (result.encloses(r1)) {
              // Given these predicates, term is always satisfied.
              // e.g. r0 is "$0 < 10", r1 is "$0 < 5"
              result = RangeSets.rangeSetAll(); [3]
              continue;
            }
            result = result.subRangeSet(r1); [4]
      ...
    }
    return result;
  }
{code}

Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of 
predicates is not [(-INF, 10), (0, INF)] , but instead the other way around 
[(0, INF), (-INF, 10)]; then result will not be (-INF, +INF), but (0,10), since 
it will be accumulated as (0, 20) via [4] on the first iteration (when 
processing (-INF, 20) with first predicate (0, INF)); and then (0, 10) on the 
second iteration, again via [4] when processing this intermediate result (0, 
20) with the second predicate (-INF, 10).

To sum up, the fact that RexSimplify#residue result can differ depending on the 
order of its list of predicates parameter, leads to a global different result 
on the initial RexSimplify#simplify call.



  was:
Calling RexSimplify with unknown=FALSE on this expression:
{noformat}
x < 10 AND x > 0 AND x < 20
i.e.
AND( <(x, 10), >(x, 0), <(x, 20) )
{noformat}
Should return {{x < 10 AND x > 0}}
And this happens if we run the following test inside RexProgramTest 
(simplification result is returned on SEARCH format):
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            gt(vInt(), literal(0)),
            lt(vInt(), literal(10)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
  }
{code}
However, if we alter the order of the terms, and run instead:
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            lt(vInt(), literal(10)),
            gt(vInt(), literal(0)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // fails
  }
{code}
If fails with:
{noformat}
Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
     but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
{noformat}

The problem is that, in the second case, the simplification result (before 
being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND x 
> 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH 
expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN" that 
we got on the first case). This "NULL AS FALSE" will make that, if this SEARCH 
expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND X IS NOT 
NULL}}; instead of the expected {{x < 10 AND x > 0}};

The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs == FALSE 
and predicateElimination (be default true), we get into {{simplifyAndTerms}}:
{code:java}
RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
    List<RexNode> operands = RelOptUtil.conjunctions(e);
    if (unknownAs == FALSE && predicateElimination) {
      simplifyAndTerms(operands, FALSE);
    }
    ...
{code}

Inside this method, each term is simplified using the previous terms as 
"predicates" into the RexSimplify:
{code:java}
private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
    RexSimplify simplify = this;
    for (int i = 0; i < terms.size(); i++) {
      RexNode t = terms.get(i);
      if (Predicate.of(t) == null) {
        continue;
      }
      terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using 
previous terms as predicates
      RelOptPredicateList newPredicates =
          simplify.predicates.union(rexBuilder,
              RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
      simplify = simplify.withPredicates(newPredicates);
    }
    ...
{code}

When simplifying each term (using the previous ones as predicates), since they 
are comparisons, we'll get inside RexSimplify#simplifyComparison, which calls 
at the end {{simplifyUsingPredicates}}:
{code:java}
private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
      RexUnknownAs unknownAs, Class<C> clazz) {
    ...
    return simplifyUsingPredicates(e2, clazz);
  }
{code}

And {{simplifyUsingPredicates}} calls {{residue}} method, which can return 
{{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because 
"nullability might be problematic"), but only depending on the order in which 
the terms have been processed:
{code:java}
  private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
      Class<C> clazz) {
    ...
    final C v0 = comparison.literal.getValueAs(clazz);
    ...
    final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
    final RangeSet<C> rangeSet2 =
        residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
            clazz);
    if (rangeSet2.isEmpty()) {
      // Term is impossible to satisfy given these predicates
      return rexBuilder.makeLiteral(false);
    } else if (rangeSet2.equals(rangeSet)) {
      // no change
      return e;
    } else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
      // Range is always satisfied given these predicates; but nullability might
      // be problematic
      return simplify(
          rexBuilder.makeCall(RexUtil.getPos(e), 
SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
          RexUnknownAs.UNKNOWN); // [1]
    ...
    } else {
      // range has been reduced but it's not worth simplifying
      return e; // [2]
    }
{code}

For example: 
- Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is processed 
here 
- Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed) 
- Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed)

Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20)) 
with the list of predicates [x<10, x>0] i.e. [(-INF, 10) , (0, INF)], is 
processed and, as in this example, the list of previous predicates contains as 
first item a predicate enclosed by it [3], in this case x<10 (i.e. range(-INF, 
10)), {{residue}} will consider intermediate result as 
{{RangeSets.rangeSetAll()}} i.e. (-INF, +INF), which will enclose any other 
predicate on subsequent iterations, so this will be the result to be returned:
{code:java}
private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
      RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
    RangeSet<C> result = r0;
    for (RexNode predicate : predicates) {
      ...
        final RexCall call = (RexCall) predicate;
        final Comparison comparison = Comparison.of(call);
        if (comparison != null && comparison.ref.equals(ref)) {
          final C c1 = comparison.literal.getValueAs(clazz);
          ...
            final Range<C> r1 = range(comparison.kind, c1);
            if (result.encloses(r1)) {
              // Given these predicates, term is always satisfied.
              // e.g. r0 is "$0 < 10", r1 is "$0 < 5"
              result = RangeSets.rangeSetAll(); [3]
              continue;
            }
            result = result.subRangeSet(r1); [4]
      ...
    }
    return result;
  }
{code}

Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of 
predicates is not [(-INF, 10), (0, INF)] , but instead the other way around 
[(0, INF), (-INF, 10)]; then result will not be (-INF, +INF), but (0,10), since 
it will be accumulated as (0, 20) via [4] on the first iteration (when 
processing (-INF, 20) with first predicate (0, INF)); and then (0, 10) on the 
second iteration, again via [4] when processing this intermediate result (0, 
20) with the second predicate (-INF, 10).

To sum up, the fact that RexSimplify#residue result can differ depending on the 
order of its list of predicates parameter, results in a total different result 
on the initial RexSimplify#simplify call.




> Simplification result of conjunction of comparisons depends on terms order
> --------------------------------------------------------------------------
>
>                 Key: CALCITE-7635
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7635
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Ruben Q L
>            Priority: Major
>
> Calling RexSimplify with unknown=FALSE on this expression:
> {noformat}
> x < 10 AND x > 0 AND x < 20
> i.e.
> AND( <(x, 10), >(x, 0), <(x, 20) )
> {noformat}
> Should return {{x < 10 AND x > 0}}
> And this happens if we run the following test inside RexProgramTest 
> (simplification result is returned on SEARCH format):
> {code:java}
>   @Test void testSimplifyAndComparison() {
>     checkSimplifyFilter(
>         and(
>             gt(vInt(), literal(0)),
>             lt(vInt(), literal(10)),
>             lt(vInt(), literal(20))),
>         "SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
>   }
> {code}
> However, if we alter the order of the terms, and run instead:
> {code:java}
>   @Test void testSimplifyAndComparison() {
>     checkSimplifyFilter(
>         and(
>             lt(vInt(), literal(10)),
>             gt(vInt(), literal(0)),
>             lt(vInt(), literal(20))),
>         "SEARCH(?0.int0, Sarg[(0..10)])"); // fails
>   }
> {code}
> If fails with:
> {noformat}
> Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
>      but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
> {noformat}
> The problem is that, in the second case, the simplification result (before 
> being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND 
> x > 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH 
> expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN" 
> that we got on the first case). This "NULL AS FALSE" will make that, if this 
> SEARCH expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND 
> X IS NOT NULL}}; instead of the expected {{x < 10 AND x > 0}};
> The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs == 
> FALSE and predicateElimination (be default true), we get into 
> {{simplifyAndTerms}}:
> {code:java}
> RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
>     List<RexNode> operands = RelOptUtil.conjunctions(e);
>     if (unknownAs == FALSE && predicateElimination) {
>       simplifyAndTerms(operands, FALSE);
>     }
>     ...
> {code}
> Inside this method, each term is simplified using the previous terms as 
> "predicates" into the RexSimplify:
> {code:java}
> private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
>     RexSimplify simplify = this;
>     for (int i = 0; i < terms.size(); i++) {
>       RexNode t = terms.get(i);
>       if (Predicate.of(t) == null) {
>         continue;
>       }
>       terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using 
> previous terms as predicates
>       RelOptPredicateList newPredicates =
>           simplify.predicates.union(rexBuilder,
>               RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
>       simplify = simplify.withPredicates(newPredicates);
>     }
>     ...
> {code}
> When simplifying each term (using the previous ones as predicates), since 
> they are comparisons, we'll get inside RexSimplify#simplifyComparison, which 
> calls at the end {{simplifyUsingPredicates}}:
> {code:java}
> private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
>       RexUnknownAs unknownAs, Class<C> clazz) {
>     ...
>     return simplifyUsingPredicates(e2, clazz);
>   }
> {code}
> And {{simplifyUsingPredicates}} calls {{residue}} method, which can return 
> {{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because 
> "nullability might be problematic"), but only depending on the order in which 
> the terms have been processed:
> {code:java}
>   private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
>       Class<C> clazz) {
>     ...
>     final C v0 = comparison.literal.getValueAs(clazz);
>     ...
>     final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
>     final RangeSet<C> rangeSet2 =
>         residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
>             clazz);
>     if (rangeSet2.isEmpty()) {
>       // Term is impossible to satisfy given these predicates
>       return rexBuilder.makeLiteral(false);
>     } else if (rangeSet2.equals(rangeSet)) {
>       // no change
>       return e;
>     } else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
>       // Range is always satisfied given these predicates; but nullability 
> might
>       // be problematic
>       return simplify(
>           rexBuilder.makeCall(RexUtil.getPos(e), 
> SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
>           RexUnknownAs.UNKNOWN); // [1]
>     ...
>     } else {
>       // range has been reduced but it's not worth simplifying
>       return e; // [2]
>     }
> {code}
> For example: 
> - Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when 
> x<20 is processed here
> - Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when 
> x<20 is processed here
> - Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is 
> processed here 
> - Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is 
> processed here (and x<10 when x<10 is processed) 
> - Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is 
> processed here (and x<10 when x<10 is processed)
> Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20)) 
> with the list of predicates [x<10, x>0] i.e. [(-INF, 10) , (0, INF)], is 
> processed and, as in this example, the list of previous predicates contains 
> as first item a predicate enclosed by it [3], in this case x<10 (i.e. 
> range(-INF, 10)), {{residue}} will consider intermediate result as 
> {{RangeSets.rangeSetAll()}} i.e. (-INF, +INF), which will enclose any other 
> predicate on subsequent iterations, so this will be the result to be returned:
> {code:java}
> private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
>       RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
>     RangeSet<C> result = r0;
>     for (RexNode predicate : predicates) {
>       ...
>         final RexCall call = (RexCall) predicate;
>         final Comparison comparison = Comparison.of(call);
>         if (comparison != null && comparison.ref.equals(ref)) {
>           final C c1 = comparison.literal.getValueAs(clazz);
>           ...
>             final Range<C> r1 = range(comparison.kind, c1);
>             if (result.encloses(r1)) {
>               // Given these predicates, term is always satisfied.
>               // e.g. r0 is "$0 < 10", r1 is "$0 < 5"
>               result = RangeSets.rangeSetAll(); [3]
>               continue;
>             }
>             result = result.subRangeSet(r1); [4]
>       ...
>     }
>     return result;
>   }
> {code}
> Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of 
> predicates is not [(-INF, 10), (0, INF)] , but instead the other way around 
> [(0, INF), (-INF, 10)]; then result will not be (-INF, +INF), but (0,10), 
> since it will be accumulated as (0, 20) via [4] on the first iteration (when 
> processing (-INF, 20) with first predicate (0, INF)); and then (0, 10) on the 
> second iteration, again via [4] when processing this intermediate result (0, 
> 20) with the second predicate (-INF, 10).
> To sum up, the fact that RexSimplify#residue result can differ depending on 
> the order of its list of predicates parameter, leads to a global different 
> result on the initial RexSimplify#simplify call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to