[ 
https://issues.apache.org/jira/browse/HIVE-22074?focusedWorklogId=290872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290872
 ]

ASF GitHub Bot logged work on HIVE-22074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 23:43
            Start Date: 07/Aug/19 23:43
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on pull request #746: HIVE-22074: Slow compilation due to IN to OR transformation
URL: https://github.com/apache/hive/pull/746#discussion_r311806267
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
 ##########
 @@ -1220,16 +1220,26 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr,
             }
             outputOpList.add(nullConst);
           }
+
           if (!ctx.isCBOExecuted()) {
-            ArrayList<ExprNodeDesc> orOperands = TypeCheckProcFactoryUtils.rewriteInToOR(children);
-            if (orOperands != null) {
-              if (orOperands.size() == 1) {
-                orOperands.add(new ExprNodeConstantDesc(TypeInfoFactory.booleanTypeInfo, false));
+
+            HiveConf conf;
+            try {
+              conf = Hive.get().getConf();
 
 Review comment:
   I think it would be better to pass this value from the callers through the 
context. You would not need to change every caller: if the value is not passed, 
the rewrite can simply be skipped. I see two main advantages to doing this:
   1) if the transformation never happens, we will not retrieve the conf and 
this value for every IN clause in a query (note that the `isCBOExecuted` method 
name is misleading: it returns the `foldExpr` boolean, which is sometimes 
`false` even for calls coming from CBO; cf. the first line of the 
`genFilterRelNode` method in `CalcitePlanner`), and
   2) it removes the static call to the Hive object from within the folding 
logic.
   I see there are other calls to `Hive.get()` in the class; that information 
should probably be moved to the context too.
   These can all be tackled together in a follow-up, but since we are 
cleaning up this logic, I think it would make sense to do it at some point.
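
   A minimal sketch of what passing the value through the context could look 
like. Everything here is hypothetical and only illustrates the shape of the 
idea: `Ctx`, `withLimit`, `withoutLimit`, and `maxInToOrOperands` are made-up 
names, not Hive's actual `TypeCheckCtx` API, and strings stand in for 
`ExprNodeDesc` operands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class InRewriteSketch {

  /** Hypothetical stand-in for the type-check context (not Hive's TypeCheckCtx). */
  public static final class Ctx {
    private final OptionalInt maxInToOrOperands;

    private Ctx(OptionalInt maxInToOrOperands) {
      this.maxInToOrOperands = maxInToOrOperands;
    }

    /** Caller did not pass a limit, so the rewrite will be skipped. */
    public static Ctx withoutLimit() {
      return new Ctx(OptionalInt.empty());
    }

    /** Caller read the limit from its own configuration once and passes it in. */
    public static Ctx withLimit(int max) {
      return new Ctx(OptionalInt.of(max));
    }

    public OptionalInt maxInToOrOperands() {
      return maxInToOrOperands;
    }
  }

  /**
   * Returns the OR operands for "column IN (values)", or null when the rewrite
   * is skipped. No static configuration lookup happens in here: the only input
   * is what the caller placed in the context.
   */
  public static List<String> rewriteInToOr(Ctx ctx, String column, List<String> values) {
    OptionalInt limit = ctx.maxInToOrOperands();
    if (!limit.isPresent() || values.size() > limit.getAsInt()) {
      return null; // limit not passed, or IN list too large: keep the IN as-is
    }
    List<String> operands = new ArrayList<>();
    for (String value : values) {
      operands.add(column + " = " + value);
    }
    return operands;
  }

  public static void main(String[] args) {
    Ctx ctx = Ctx.withLimit(2);
    System.out.println(rewriteInToOr(ctx, "a", Arrays.asList("1", "2")));
    System.out.println(rewriteInToOr(ctx, "a", Arrays.asList("1", "2", "3")));
    System.out.println(rewriteInToOr(Ctx.withoutLimit(), "a", Arrays.asList("1")));
  }
}
```

   With this shape, callers that never set the limit pay nothing per IN 
clause, and the folding logic no longer depends on a static lookup.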
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290872)
    Time Spent: 50m  (was: 40m)

> Slow compilation due to IN to OR transformation
> -----------------------------------------------
>
>                 Key: HIVE-22074
>                 URL: https://issues.apache.org/jira/browse/HIVE-22074
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22074.1.patch, HIVE-22074.2.patch, 
> HIVE-22074.3.patch, HIVE-22074.4.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently Hive transforms IN expressions to OR in order to apply various CBO 
> rules. This incurs a significant performance hit if the IN consists of a 
> large number of expressions. 
> It is better not to transform IN expressions to OR in such cases, because the 
> overall benefit of the various optimizations/transformations is not realized 
> due to the compilation overhead.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
