[ 
https://issues.apache.org/jira/browse/HIVE-25758?focusedWorklogId=761845&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761845
 ]

ASF GitHub Bot logged work on HIVE-25758:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Apr/22 15:16
            Start Date: 25/Apr/22 15:16
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on code in PR #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r857739177


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java:
##########
@@ -1257,15 +1249,26 @@ public Void visitCall(RexCall call) {
     }
 
     public boolean hasDisjunction(RexNode node) {
-      // clear the state
-      inNegation = false;
-      hasDisjunction = false;
-
       node.accept(this);
       return hasDisjunction;
     }
   }
 
+  /**
+   * Find disjunction (OR) in an expression (at any level of nesting).

Review Comment:
   `Find` implies that we are going to return a disjunctive expression. 
Suggestion:
   `Returns whether the expression has disjunctions (OR) at any level of 
nesting.`



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java:
##########
@@ -1257,15 +1249,26 @@ public Void visitCall(RexCall call) {
     }
 
     public boolean hasDisjunction(RexNode node) {
-      // clear the state
-      inNegation = false;
-      hasDisjunction = false;
-
       node.accept(this);
       return hasDisjunction;
     }
   }
 
+  /**
+   * Find disjunction (OR) in an expression (at any level of nesting).
+   *
+   * Example 1: OR(=($0, $1), IS NOT NULL($2))):INTEGER (OR in the top-level 
expression)
+   * Example 2: NOT(AND(=($0, $1), IS NOT NULL($2))
+   *   this is equivalent to OR((<>($0, $1), IS NULL($2))
+   * Example 3: AND(OR(=($0, $1), IS NOT NULL($2)))) (OR in inner expression)
+   *
+   * @param node the expression where to look for disjunctions.
+   * @return true if the given expressions contains a disjunction, false 
otherwise.
+   */
+  public static boolean hasDisjuction(RexNode node) {
+    return new DisjunctivePredicatesFinder().hasDisjunction(node);

Review Comment:
   nit to get rid of the `DisjunctivePredicateFinder#hasDisjunction` method:
   ```
   DisjunctivePredicateFinder finder = new DisjunctivePredicateFinder();
   node.accept(finder);
   return finder.hasDisjunction;
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java:
##########
@@ -1214,6 +1214,58 @@ public FixNullabilityShuttle(RexBuilder rexBuilder,
     }
   }
 
+  /**
+   * Find disjunction (OR) in an expression (at any level of nesting).
+   *
+   * Example 1: OR(=($0, $1), IS NOT NULL($2))):INTEGER (OR in the top-level 
expression)
+   * Example 2: NOT(AND(=($0, $1), IS NOT NULL($2))
+   *   this is equivalent to OR((<>($0, $1), IS NULL($2))
+   * Example 3: AND(OR(=($0, $1), IS NOT NULL($2)))) (OR in inner expression)
+   */

Review Comment:
   I had in mind compiling with `-Pjavadoc` profile and checking for new errors 
in this class. Actually I am afraid of `<>` symbols as well as `$`. Don't 
remember if it is fine to use them like that.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 761845)
    Time Spent: 3h 20m  (was: 3h 10m)

> OOM due to recursive application of CBO rules
> ---------------------------------------------
>
>                 Key: HIVE-25758
>                 URL: https://issues.apache.org/jira/browse/HIVE-25758
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
>  
> Reproducing query is as follows:
> {code:java}
> create table test1 (act_nbr string);
> create table test2 (month int);
> create table test3 (mth int, con_usd double);
> EXPLAIN
>    SELECT c.month,
>           d.con_usd
>    FROM
>      (SELECT 
> cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 
> 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
>       FROM test1
>       UNION ALL
>       SELECT month
>       FROM test2
>       WHERE month = 202110) c
>    JOIN test3 d ON c.month = d.mth; {code}
>  
> Different plans are generated during the first CBO steps, last being:
> {noformat}
> 2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] 
> parse.CalcitePlanner: Plan after decorre
> lation:
> HiveProject(month=[$0], con_usd=[$2])
>   HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>     HiveProject(month=[$0])
>       HiveUnion(all=[true])
>         
> HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
>  _UTF-16LE'yyyy-MM-d
> d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), 
> _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-
> 16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
>           HiveTableScan(table=[[default, test1]], table:alias=[test1])
>         HiveProject(month=[$0])
>           HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
>             HiveTableScan(table=[[default, test2]], table:alias=[test2])
>     HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat}
>  
> Then, the HEP planner will keep expanding the filter expression with 
> redundant expressions, such as the following, where the identical CAST 
> expression is present multiple times:
>  
> {noformat}
> rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
>  _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, 
> CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, 
> _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, 
> 202110)){noformat}
>  
> The problem seems to come from a bad interaction of at least 
> _HiveFilterProjectTransposeRule_ and 
> {_}HiveJoinPushTransitivePredicatesRule{_}, possibly more.
> Most probably then UNION part can be removed and the reproducer be simplified 
> even further.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to