[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando resolved HIVE-26722. ----------------------------------------- Resolution: Fixed > HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands > ------------------------------------------------------------------ > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 4.0.0-alpha-1 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > h1. Reproducer > Consider the following query: > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000;EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > The expected result is: > {code:java} > 1000 b1 > 1000 NULL{code} > An example of correct plan is as follows: > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property{_}:{_} > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) result: > {code:java} > 1000 b1{code} > produced by the following invalid plan: > {code:java} > CBO PLAN: > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) {code} > h1. Problem Analysis > At > [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] > the _RelMetadataQuery::getPulledUpPredicates_ method infers the following > predicate due to the CAST(NULL) in the projection: > {code:java} > (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} > When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is > inferred. > In > [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], > the rule checks if the conjunction of the predicate coming from the filter > (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or > not, under the _UnknownAsFalse_ semantics. > To summarize, the following expression is simplified under the > _UnknownAsFalse_ semantics: > {code:java} > AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), > =(CAST($0):DOUBLE, 1000)) > {code} > Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to > {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, > =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and > the UNION ALL operand is pruned. > Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the > issue, due to the _IS_NULL($1)_ inferred predicate, see > [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] > for understanding how the NULL literal is treated differently during > predicate inference. > _HiveRelMdPredicates_ should not use equality ('=') for nullable constant > expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733, but > nonetheless the way simplification is done is not correct here, inferred > predicates should be used as "context", rather than been used in a > conjunctive expression, this usage does not conform with any of the similar > uses of simplification with inferred predicates (see the bottom of the > "Solution" section for examples and details). > h1. Solution > In order to correctly simplify a predicate and test if it's always false or > not, we should build RexSimplify with _predicates_ as the list of predicates > known to hold in the context. In this way, the different semantics are > correctly taken into account. > The code at > [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] > should be replaced by the following: > {code:java} > final RexExecutor executor = > Util.first(filterRel.getCluster().getPlanner().getExecutor(), > RexUtil.EXECUTOR); > final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, > executor); > final RexNode x = simplify.simplifyUnknownAs(newCondition, > RexUnknownAs.FALSE);{code} > This is in line with other uses of simplification, like in Calcite: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L704-L705] > You can see the above method used for different kind of rels, always in the > way identical to what this ticket advocates:filters: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155] > projects: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L306-L310] > joins: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L380-L383] > windows: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L567-L576] -- This message was sent by Atlassian Jira (v8.20.10#820010)