jcamachor commented on a change in pull request #952: HIVE-23006 ProbeDecode compiler support
URL: https://github.com/apache/hive/pull/952#discussion_r405765078
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
 ##########
 @@ -1482,18 +1490,131 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
         deque.addAll(op.getChildOperators());
       }
     }
+    //  No need to remove SJ branches when we have semi-join reduction or when semijoins are enabled for parallel mapjoins.
+    if (!procCtx.conf.getBoolVar(ConfVars.TEZ_DYNAMIC_SEMIJOIN_REDUCTION_FOR_MAPJOIN)) {
+      if (semijoins.size() > 0) {
+        for (Entry<ReduceSinkOperator, TableScanOperator> semiEntry : semijoins.entrySet()) {
+          SemiJoinBranchInfo sjInfo = procCtx.parseContext.getRsToSemiJoinBranchInfo().get(semiEntry.getKey());
+          if (sjInfo.getIsHint() || !sjInfo.getShouldRemove()) {
+            // Created by hint, skip it
+            continue;
+          }
+          if (LOG.isDebugEnabled()) {
+            LOG.debug("Semijoin optimization with parallel edge to map join. Removing semijoin " +
+                OperatorUtils.getOpNamePretty(semiEntry.getKey()) + " - " + OperatorUtils.getOpNamePretty(semiEntry.getValue()));
+          }
+          GenTezUtils.removeBranch(semiEntry.getKey());
+          GenTezUtils.removeSemiJoinOperator(procCtx.parseContext, semiEntry.getKey(), semiEntry.getValue());
+        }
+      }
+    }
+    if (procCtx.conf.getBoolVar(ConfVars.HIVE_OPTIMIZE_SCAN_PROBEDECODE)) {
+      if (probeDecodeMJoins.size() > 0) {
 
 Review comment:
   The path for `HIVE_OPTIMIZE_SCAN_PROBEDECODE` seems independent of the SJ
optimization. Should we add a mechanism to remove the context for the
optimization when we think it is not going to be beneficial, e.g., when it is
not filtering any data? Or do you think the cost of checking is negligible and
we should always apply this optimization? What do your experiments show in the
worst-case scenario? (In any case, this could be tackled in a follow-up, but I
wanted to ask.)
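
   To make the question concrete, here is a rough sketch of the kind of check
I have in mind. It is only an illustration under stated assumptions: the class
`ProbeDecodePruneSketch`, its methods, and the 10% threshold are all
hypothetical and are not existing Hive APIs or configuration. It assumes we
have distinct-key estimates for the build (small table) side and the probe
(scan) side of the map join, and it keeps the probe context only when the
estimated fraction of filtered rows is large enough to matter.

```java
public class ProbeDecodePruneSketch {

    // Hypothetical threshold (assumption, not a Hive setting): keep the
    // probe context only if at least 10% of probe-side keys are expected
    // to be filtered out.
    static final double MIN_FILTERED_FRACTION = 0.10;

    /**
     * Crude estimate of the fraction of probe-side keys that have no match
     * on the build side, based only on distinct-key counts. It assumes the
     * build-side keys are a subset of the probe-side keys.
     */
    static double estimatedFilteredFraction(long buildSideDistinctKeys,
                                            long probeSideDistinctKeys) {
        if (probeSideDistinctKeys <= 0) {
            // No usable statistics: assume nothing is filtered.
            return 0.0;
        }
        double kept = Math.min(1.0,
            (double) buildSideDistinctKeys / probeSideDistinctKeys);
        return 1.0 - kept;
    }

    /** Returns true if the probe context looks worth keeping. */
    static boolean shouldKeepProbeContext(long buildSideDistinctKeys,
                                          long probeSideDistinctKeys) {
        return estimatedFilteredFraction(buildSideDistinctKeys,
            probeSideDistinctKeys) >= MIN_FILTERED_FRACTION;
    }

    public static void main(String[] args) {
        // Selective build side: most probe keys are filtered, keep it.
        System.out.println(shouldKeepProbeContext(100, 10_000));
        // Build side covers every probe key: nothing filtered, drop it.
        System.out.println(shouldKeepProbeContext(10_000, 10_000));
    }
}
```

   A check like this would run at compile time only, so its cost would be
small; whether the estimates are reliable enough to act on is exactly what
the worst-case experiments would tell us.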
