[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403

Namit Jain (JIRA) Sun, 03 Feb 2013 21:56:14 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570002#comment-13570002
 ]


Namit Jain commented on HIVE-3980:
----------------------------------


common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
503 ↗
(On Diff #26877)        
Better name: hive.auto.convert.sortmerge.join?
504 ↗
(On Diff #26877)        
hive.auto.convert.sortmerge.join.bigtable.selection.policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java
108 ↗
(On Diff #26877)        
Why is this needed ? If orders[] = index for all i, this is not needed and if 
orders[] != index for some i, than we are partially updating orders[] which 
sounds dangerous.
114 ↗
(On Diff #26877)        
I think it makes more sense to do this check before looping over and comparing 
orders, since this is a stricter check, in a sense that if all join keys are 
not bucketed columns there is no point in checking order.
98 ↗
(On Diff #26877)        
Better argument name: joinKeys ?
99 ↗
(On Diff #26877)        
Better argument name: joinKeyOrders?
80 ↗
(On Diff #26877)        
Better function name: getBucketFilePathOfPartition() ?
Also can you refactor this to return List<Path> instead, we had troubles 
because of String Vs Path interchangeable usage in Hive code base in past.
97 ↗
(On Diff #26877)        
Comment for function ?
"This function checks whether all bucketing columns are also in join keys and 
are in same order."
117 ↗
(On Diff #26877)        
Better function name: checkNumOfBucketsAgainstBigTable()?
118 ↗
(On Diff #26877)        
Better param name: aliasToNumOfBuckets, NumOfBucketsInPartition ?
182 ↗
(On Diff #26877)        
This function does bunch of checks to determine if join can be converted to 
bucketmapjoin. It will be good idea to list all the checks it performs in 
comments for this method.
190 ↗
(On Diff #26877)        
better name: tblAliasToNumOfBucketsInEachPartition ?
Actually, it seems that you can always derive this info from the next field, it 
is just the size of the innermost list in next map. If so, please consider 
getting rid of this map and just use next field wherever this one is required. 
Having too many datastructures makes code harder to follow as well as results 
in tighter function coupling.
192 ↗
(On Diff #26877)        
better name: tblAliasToListOfBucketedFilePathInEachPartition?
200–203 ↗
(On Diff #26877)        
Similar comment as the one above applies here as well.
210 ↗
(On Diff #26877)        
In what case, topOp for the alias could be null. Is it a subquery case? Either 
case, please add a comment when this topOp could be null and why in that case 
we cannot join to BMJ.
219 ↗
(On Diff #26877)        
Comment:
We cannot get to root TS operator, likely because there is a join or group-by 
between topOp and root TS operator. We don't handle that case, so we simply 
return.
226 ↗
(On Diff #26877)        
Should this be topOpEntry.getValue().equals(tso) instead ?
361 ↗
(On Diff #26877)        
This function does bunch of checks to determine if mapjoin can be converted to 
bucketmapjoin. It will be good idea to list all the checks it performs in 
comments for this method.
366 ↗
(On Diff #26877)        
Better name ?
205 ↗
(On Diff #26877)        
Better name: joinKeyOrder?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java
87 ↗
(On Diff #26877)        
Comment:
First check if this map-join operator is already a BMJ or not. If not give up 
we are only trying to convert a BMJ to sort-BMJ.
121 ↗
(On Diff #26877)        
If table is not sorted, we cannot do sortBMJ anyways. Secondly we cannot do 
sortBMJ for outer joins either. Shouldn't this check be made much earlier?
232 ↗
(On Diff #26877)        
Early parts of this method performs some of the same checks of 
checkConvertBucketMapJoin() of AbstractBucketJoinProc.java. Perhaps, we can 
refactor to avoid code duplication. Further, name isTableSorted is not correct, 
since it does much more than checking if table is sorted. Better name: 
isEligibleForSBMJ() ?
462 ↗
(On Diff #26877)        
This should be a checked exception (perhaps of type SemanticException). 
Certainly, not RuntimeException.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeSortMergeJoinBigTableMatcher.java
41 ↗
(On Diff #26877)        
Better name: AvgPartitionSizeBasedTblSelectorForSMJ ?
41 ↗
(On Diff #26877)        
Is there a test for this policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketJoinOptProcCtx.java
39 ↗
(On Diff #26877)        
Better name: rejectedJoinOps.
42 ↗
(On Diff #26877)        
Better name: convertedJoinOps. Also, update the comment.
60 ↗
(On Diff #26877)        
Name: getRejectedJoinOps ?
Also can similarly name other getters and setters of this class.
                
> Cleanup after HIVE-3403
> -----------------------
>
>                 Key: HIVE-3980
>                 URL: https://issues.apache.org/jira/browse/HIVE-3980
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> There have been a lot of comments on HIVE-3403, which involve changing 
> variable names/function names/adding more comments/general cleanup etc.
> Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to
> address the comments there, since refreshing becomes impossible. This jira
> is to track those cleanups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403

Reply via email to