amansinha100 commented on code in PR #4864:
URL: https://github.com/apache/hive/pull/4864#discussion_r1387580078
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -755,6 +755,14 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
LOG.debug("External table {} found in join and also could not provide statistics - disabling SMB join.", sb);
return false;
}
+ for (Operator<?> grandParent : parentOp.getParentOperators()) {
Review Comment:
Similar to the other method, please add a null check:
    if (parentOp.getParentOperators() != null) {
      for ( ...)
    }
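A minimal sketch of the guarded loop requested above. It assumes a hypothetical `Op` class standing in for Hive's `Operator<?>`, and the helper name `grandParentNames` is illustrative only. The point is that the `for` loop runs only when `getParentOperators()` returns a non-null list, so a parentless operator cannot trigger a `NullPointerException`.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeParentLoop {
    // Hypothetical stand-in for Hive's Operator<?>: only the parent list matters here.
    static final class Op {
        final String name;
        final List<Op> parents; // may be null, e.g. for an operator with no parents

        Op(String name, List<Op> parents) {
            this.name = name;
            this.parents = parents;
        }

        List<Op> getParentOperators() {
            return parents;
        }
    }

    // Collects grandparent names; the null check guards the for-each loop,
    // which would otherwise throw a NullPointerException on a null list.
    static List<String> grandParentNames(Op parentOp) {
        List<String> names = new ArrayList<>();
        if (parentOp.getParentOperators() != null) {
            for (Op grandParent : parentOp.getParentOperators()) {
                names.add(grandParent.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Op orphan = new Op("RS_1", null);
        Op withParent = new Op("RS_3", List.of(new Op("GBY_2", null)));
        System.out.println(grandParentNames(orphan));     // prints []
        System.out.println(grandParentNames(withParent)); // prints [GBY_2]
    }
}
```

In the actual patch, the body of the guarded loop would hold the existing `hasMoreGBYs(grandParent, 2)` check rather than collecting names.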
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -755,6 +755,14 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
LOG.debug("External table {} found in join and also could not provide statistics - disabling SMB join.", sb);
return false;
}
+ for (Operator<?> grandParent : parentOp.getParentOperators()) {
+ if (hasMoreGBYs(grandParent, 2)) {
+ LOG.info(
+ "We cannot convert to SMB because one of the join branches has more than one GBY in the same reducer");
Review Comment:
nit: can we spell out GBY (group-by) in full? This message is logged at INFO
level, and not all readers are familiar with the acronym. Also, I suggest
adding 'join' after 'SMB'.
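One possible rewording that applies both suggestions, shown as a plain constant so the sketch needs no logging dependency. The class and constant names are hypothetical, and the exact phrasing is only a suggestion, not the text that was merged.

```java
public class SmbJoinLogMessage {
    // Illustrative rewording of the INFO message under review: "GBY" is spelled
    // out as "group-by" and "join" follows "SMB". The wording is a suggestion only.
    static final String MSG =
            "We cannot convert to SMB join because one of the join branches has more than one "
                    + "group-by operator in the same reducer";

    public static void main(String[] args) {
        // In ConvertJoinMapJoin this string would be passed to LOG.info(...).
        System.out.println(MSG);
    }
}
```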
##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,22 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+set hive.optimize.semijoin.conversion = false;
Review Comment:
It wasn't clear why this config has to be set to false for this test case. If
it is needed, can you add a comment in the test file explaining why?
##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,22 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+set hive.optimize.semijoin.conversion = false;
+
+explain
Review Comment:
Can we also add a negative test case where the number of group-bys within a
reducer is 0 or 1, so that we expect to see the SMB join being used?
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -857,6 +865,26 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
return true;
}
+ private boolean hasMoreGBYs(Operator<?> start, int max) {
Review Comment:
A brief comment describing what this method checks would be helpful.
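A sketch of the kind of method comment being requested, attached to an illustrative body. The PR only shows the signature `hasMoreGBYs(Operator<?> start, int max)` and the call `hasMoreGBYs(grandParent, 2)` alongside the message about "more than one GBY in the same reducer", so the semantics here are an assumption: count group-by operators upstream of `start`, stop at reduce sinks (the reducer boundary), and return true once the count reaches `max`. The `Op`/`Kind` model is hypothetical and much simpler than Hive's operator classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GroupByCounter {
    // Hypothetical, simplified operator model; not Hive's real Operator hierarchy.
    enum Kind { GROUP_BY, REDUCE_SINK, OTHER }

    static final class Op {
        final Kind kind;
        final List<Op> parents = new ArrayList<>();

        Op(Kind kind, Op... parents) {
            this.kind = kind;
            Collections.addAll(this.parents, parents);
        }
    }

    /**
     * Returns true if at least {@code max} group-by operators are reachable from
     * {@code start} by following parent links without crossing a reduce sink,
     * i.e. they would run in the same reducer as {@code start}.
     * (Assumed semantics, for illustration only.)
     */
    static boolean hasMoreGBYs(Op start, int max) {
        return countGBYs(start) >= max;
    }

    // Counts group-bys upstream of op, stopping at reduce sinks, which mark the
    // boundary with the previous stage. A shared ancestor in a diamond-shaped
    // plan would be counted once per path.
    private static int countGBYs(Op op) {
        if (op.kind == Kind.REDUCE_SINK) {
            return 0;
        }
        int count = op.kind == Kind.GROUP_BY ? 1 : 0;
        for (Op parent : op.parents) {
            count += countGBYs(parent);
        }
        return count;
    }

    public static void main(String[] args) {
        Op rs = new Op(Kind.REDUCE_SINK);
        Op gby1 = new Op(Kind.GROUP_BY, rs);
        Op gby2 = new Op(Kind.GROUP_BY, gby1);
        Op sel = new Op(Kind.OTHER, gby2);
        System.out.println(hasMoreGBYs(sel, 2));  // true: two group-bys before the reduce sink
        System.out.println(hasMoreGBYs(gby1, 2)); // false: only one
    }
}
```

Under this reading, `max = 2` matches the "more than one GBY" condition in the log message; if the real method uses a strict `>` comparison instead, the comment's wording would need to change accordingly.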