kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640
##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
TableScan
alias: src
filterExpr: key is not null (type: boolean)
+ probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582
Review comment:
Why is `keyRatio` above 1? Shouldn't it represent the expected selectivity of the operation?
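To illustrate the distinction the reviewer is drawing: a selectivity is the fraction of rows that survive, so it lies in [0, 1], while a ratio between two cardinalities has no upper bound. The Java sketch below is a hypothetical illustration only. `KeyRatioSketch` and its methods are invented names, and nothing here claims to be how Hive actually computes `keyRatio`.

```java
/** Hypothetical helpers contrasting a bounded selectivity with an unbounded cardinality ratio. */
final class KeyRatioSketch {
    private KeyRatioSketch() {}

    /** Fraction of rows expected to survive; by construction always within [0, 1]. */
    static double selectivity(long survivingRows, long totalRows) {
        if (totalRows <= 0) {
            throw new IllegalArgumentException("totalRows must be positive");
        }
        if (survivingRows < 0 || survivingRows > totalRows) {
            throw new IllegalArgumentException("survivingRows must be in [0, totalRows]");
        }
        return (double) survivingRows / totalRows;
    }

    /** A plain ratio of two cardinalities; nothing bounds it above by 1. */
    static double cardinalityRatio(long numerator, long denominator) {
        if (denominator <= 0) {
            throw new IllegalArgumentException("denominator must be positive");
        }
        return (double) numerator / denominator;
    }
}
```

If the value printed in the plan is meant as a selectivity, a result such as 1.582 would fall outside the range `selectivity` can return.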
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
*/
public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
Operator<?> current, Operator<?> terminal) throws SemanticException {
- return backtrack(sources, current, terminal, false);
+ return backtrack(sources, current, terminal, false, false);
}
public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
- Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
- ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+ Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {
Review comment:
I think `skipRSParent` is a bit misleading: you don't want to skip the RS; you want to stay in the same vertex.
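A minimal sketch of the naming point, assuming a simplified model in which a ReduceSink is the last operator of the upstream vertex: walking toward the parents and refusing to enter a ReduceSink keeps the walk inside the current vertex, so a name such as `stayInSameVertex` (hypothetical) states the intent directly. `Op` and `VertexWalk` are made-up stand-ins, not Hive's `Operator` API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified operator: a name, whether it is a ReduceSink, and a single parent. */
final class Op {
    final String name;
    final boolean isReduceSink;
    final Op parent;

    Op(String name, boolean isReduceSink, Op parent) {
        this.name = name;
        this.isReduceSink = isReduceSink;
        this.parent = parent;
    }
}

final class VertexWalk {
    private VertexWalk() {}

    /**
     * Collects operator names from current's parent up toward terminal (exclusive).
     * With stayInSameVertex, the walk stops before entering a ReduceSink parent,
     * because in this model that ReduceSink belongs to the upstream vertex.
     */
    static List<String> ancestors(Op current, Op terminal, boolean stayInSameVertex) {
        List<String> path = new ArrayList<>();
        for (Op op = current.parent; op != null && op != terminal; op = op.parent) {
            if (stayInSameVertex && op.isReduceSink) {
                break; // crossing here would leave the current vertex
            }
            path.add(op.name);
        }
        return path;
    }
}
```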
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
- String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
- if (realTSColName != null) {
+ ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+ if (originTSColExpr == null) {
+ LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",
Review comment:
The current algorithm seems to be:
* select the best MJ candidate
* do some further processing, which may bail out

Bailing out for the best candidate doesn't necessarily mean that we would also bail out for a less attractive candidate. I think it might be worth restructuring the extra compilation into a for loop, or, instead of selecting the single best candidate, implementing the first part as priority logic.
Just an idea for a follow-up.
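The priority-based alternative suggested above could look roughly like this: rank all candidates once, then run the follow-up processing on each in order and keep the first that succeeds. This is a sketch under assumptions. `Candidate`, its `score`, and `CandidateSelector.firstSuccessful` are hypothetical and do not correspond to existing code in `TezCompiler`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.function.Function;

/** Hypothetical map-join candidate with a ranking score (higher is better). */
final class Candidate {
    final String mapJoinId;
    final double score;

    Candidate(String mapJoinId, double score) {
        this.mapJoinId = mapJoinId;
        this.score = score;
    }
}

final class CandidateSelector {
    private CandidateSelector() {}

    /**
     * Tries candidates from highest to lowest score and returns the first result whose
     * follow-up processing succeeds, instead of giving up when only the best one fails.
     */
    static <R> Optional<R> firstSuccessful(List<Candidate> candidates,
                                           Function<Candidate, Optional<R>> process) {
        PriorityQueue<Candidate> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        queue.addAll(candidates);
        while (!queue.isEmpty()) {
            Optional<R> result = process.apply(queue.poll());
            if (result.isPresent()) {
                return result; // first candidate that survives further processing
            }
        }
        return Optional.empty(); // every candidate bailed out
    }
}
```

Compared with a hand-written for loop, the queue keeps the ranking logic in one place, so the bail-out checks only need to return an empty result.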
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
String outputColumnName = cSELOutputColumnNames.get(i);
ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
ExprNodeDesc newPSELExprNodeDesc =
- ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+ ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);
Review comment:
Instead of modifying every call site, can we keep a method with the original signature?
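A sketch of the suggested approach, assuming simplified `String` stand-ins for `ExprNodeDesc` and `Operator<?>`: keep the original four-argument overload and let it delegate with the new flag set to its old default, which is the same pattern the diff already uses for the three-argument `backtrack`. Existing call sites such as the one in `NonBlockingOpDeDupProc` would then need no change. `BacktrackOverloads` and the `stayInSameVertex` parameter name are hypothetical.

```java
/** Overload-delegation sketch with hypothetical, simplified types (Strings instead of Hive classes). */
final class BacktrackOverloads {
    private BacktrackOverloads() {}

    /** Original four-argument signature: existing callers keep compiling unchanged. */
    static String backtrack(String source, String current, String terminal, boolean foldExpr) {
        // Delegate with the new flag set to the behaviour callers had before it existed.
        return backtrack(source, current, terminal, foldExpr, false);
    }

    /** New signature, used only by the caller that needs the extra flag. */
    static String backtrack(String source, String current, String terminal,
                            boolean foldExpr, boolean stayInSameVertex) {
        // Placeholder body: encodes the arguments so the delegation is observable.
        // `current` is unused here; a real implementation would walk from it to `terminal`.
        return source + "@" + terminal
            + (foldExpr ? "[folded]" : "")
            + (stayInSameVertex ? "[sameVertex]" : "");
    }
}
```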