[
https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570002#comment-13570002
]
Namit Jain commented on HIVE-3980:
----------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
503 ↗
(On Diff #26877)
Better name: hive.auto.convert.sortmerge.join?
504 ↗
(On Diff #26877)
hive.auto.convert.sortmerge.join.bigtable.selection.policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java
108 ↗
(On Diff #26877)
Why is this needed ? If orders[] = index for all i, this is not needed and if
orders[] != index for some i, than we are partially updating orders[] which
sounds dangerous.
114 ↗
(On Diff #26877)
I think it makes more sense to do this check before looping over and comparing
orders, since this is a stricter check, in a sense that if all join keys are
not bucketed columns there is no point in checking order.
98 ↗
(On Diff #26877)
Better argument name: joinKeys ?
99 ↗
(On Diff #26877)
Better argument name: joinKeyOrders?
80 ↗
(On Diff #26877)
Better function name: getBucketFilePathOfPartition() ?
Also can you refactor this to return List<Path> instead, we had troubles
because of String Vs Path interchangeable usage in Hive code base in past.
97 ↗
(On Diff #26877)
Comment for function ?
"This function checks whether all bucketing columns are also in join keys and
are in same order."
117 ↗
(On Diff #26877)
Better function name: checkNumOfBucketsAgainstBigTable()?
118 ↗
(On Diff #26877)
Better param name: aliasToNumOfBuckets, NumOfBucketsInPartition ?
182 ↗
(On Diff #26877)
This function does bunch of checks to determine if join can be converted to
bucketmapjoin. It will be good idea to list all the checks it performs in
comments for this method.
190 ↗
(On Diff #26877)
better name: tblAliasToNumOfBucketsInEachPartition ?
Actually, it seems that you can always derive this info from the next field, it
is just the size of the innermost list in next map. If so, please consider
getting rid of this map and just use next field wherever this one is required.
Having too many datastructures makes code harder to follow as well as results
in tighter function coupling.
192 ↗
(On Diff #26877)
better name: tblAliasToListOfBucketedFilePathInEachPartition?
200–203 ↗
(On Diff #26877)
Similar comment as the one above applies here as well.
210 ↗
(On Diff #26877)
In what case, topOp for the alias could be null. Is it a subquery case? Either
case, please add a comment when this topOp could be null and why in that case
we cannot join to BMJ.
219 ↗
(On Diff #26877)
Comment:
We cannot get to root TS operator, likely because there is a join or group-by
between topOp and root TS operator. We don't handle that case, so we simply
return.
226 ↗
(On Diff #26877)
Should this be topOpEntry.getValue().equals(tso) instead ?
361 ↗
(On Diff #26877)
This function does bunch of checks to determine if mapjoin can be converted to
bucketmapjoin. It will be good idea to list all the checks it performs in
comments for this method.
366 ↗
(On Diff #26877)
Better name ?
205 ↗
(On Diff #26877)
Better name: joinKeyOrder?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java
87 ↗
(On Diff #26877)
Comment:
First check if this map-join operator is already a BMJ or not. If not give up
we are only trying to convert a BMJ to sort-BMJ.
121 ↗
(On Diff #26877)
If table is not sorted, we cannot do sortBMJ anyways. Secondly we cannot do
sortBMJ for outer joins either. Shouldn't this check be made much earlier?
232 ↗
(On Diff #26877)
Early parts of this method performs some of the same checks of
checkConvertBucketMapJoin() of AbstractBucketJoinProc.java. Perhaps, we can
refactor to avoid code duplication. Further, name isTableSorted is not correct,
since it does much more than checking if table is sorted. Better name:
isEligibleForSBMJ() ?
462 ↗
(On Diff #26877)
This should be a checked exception (perhaps of type SemanticException).
Certainly, not RuntimeException.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeSortMergeJoinBigTableMatcher.java
41 ↗
(On Diff #26877)
Better name: AvgPartitionSizeBasedTblSelectorForSMJ ?
41 ↗
(On Diff #26877)
Is there a test for this policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketJoinOptProcCtx.java
39 ↗
(On Diff #26877)
Better name: rejectedJoinOps.
42 ↗
(On Diff #26877)
Better name: convertedJoinOps. Also, update the comment.
60 ↗
(On Diff #26877)
Name: getRejectedJoinOps ?
Also can similarly name other getters and setters of this class.
> Cleanup after HIVE-3403
> -----------------------
>
> Key: HIVE-3980
> URL: https://issues.apache.org/jira/browse/HIVE-3980
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
>
> There have been a lot of comments on HIVE-3403, which involve changing
> variable names/function names/adding more comments/general cleanup etc.
> Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to
> address the comments there, since refreshing becomes impossible. This jira
> is to track those cleanups.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira