[
https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638808#comment-13638808
]
Phabricator commented on HIVE-4377:
-----------------------------------
navis has commented on the revision "HIVE-4377 [jira] Add more comment to
https://reviews.facebook.net/D1209 (HIVE-2340)".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:209
It was implemented as you suggested at first, but that was very confusing, with
a lot of redundant code (there are seven possible cases sharing a common rule).
But if you prefer, I'll update the patch.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251
processOrderBy?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:254
ProcessGroupBy?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:264
Will be moved to JoinReducerProc.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:512
I was thinking of OperatorUtils or something similar. Methods like this will
keep being added.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:685
This can be null if an operator such as ScriptOperator sits between the two RSs.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:690
If the key, partition, or sort order differs in the common part of the two RSs,
they cannot be merged. I'll add a comment for that.
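To illustrate the rule above, here is a minimal Java sketch. It is not the code
from the patch: the class and method names (CommonPartCheck, compareCommonPart)
are hypothetical, and the key and sort-order values are made up. It only
compares the part that two lists share and reports that a merge is impossible
when that shared part differs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
final class CommonPartCheck {

    // Compares two lists over their common prefix.
    // Returns null when the shared part differs (the two RSs cannot be merged).
    // Otherwise returns -1, 0, or 1 depending on whether the child has fewer,
    // the same number of, or more entries than the parent.
    public static <T> Integer compareCommonPart(List<T> child, List<T> parent) {
        int common = Math.min(child.size(), parent.size());
        for (int i = 0; i < common; i++) {
            if (!child.get(i).equals(parent.get(i))) {
                return null;
            }
        }
        return Integer.compare(child.size(), parent.size());
    }

    public static void main(String[] args) {
        // Same leading key, child has one extra key: mergeable, prints 1.
        System.out.println(compareCommonPart(
                Arrays.asList("k1", "k2"), Arrays.asList("k1")));

        // Leading keys differ: not mergeable, prints null.
        System.out.println(compareCommonPart(
                Arrays.asList("k1", "k2"), Arrays.asList("k2")));

        // Identical sort orders (ascending, descending): prints 0.
        System.out.println(compareCommonPart(
                Arrays.asList("+", "-"), Arrays.asList("+", "-")));
    }
}
```

In this sketch the same check would be applied separately to the keys, the
partition columns, and the sort order, and a null from any of them would mean
the two RSs are left as they are.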
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:395
I'll try.
REVISION DETAIL
https://reviews.facebook.net/D10377
To: JIRA, navis
Cc: njain
> Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
> ------------------------------------------------------------------
>
> Key: HIVE-4377
> URL: https://issues.apache.org/jira/browse/HIVE-4377
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Gang Tim Liu
> Assignee: Navis
> Attachments: HIVE-4377.D10377.1.patch
>
>
> Thanks a lot for addressing the optimization in HIVE-2340. Awesome!
> Since we are developing at a very fast pace, it would be really useful to
> think about maintainability and testing of the large codebase. Highlights
> which are applicable for D1209:
> 1. Javadoc for all public/private functions, except for
> setters/getters. For any complex function, clear examples (input/output)
> would really help.
> 2. Especially for query optimizations, it might be a good idea to have
> a simple working query at the top, and the expected changes. For example,
> the operator tree for that query at each step, or a detailed explanation
> at the top.
> 3. If possible, the test name (.q file) where the function is being
> invoked, or the query which would potentially test that scenario, if it
> is a query processor change.
> 4. Comments in each test (.q file) that include the jira number, what
> it is trying to test, and the assumptions about each query.
> 5. Reduce the output for each test. Whenever a query outputs more
> than 10 results, it should have a reason. Otherwise, each query result
> should be bounded by 10 rows.
> Thanks a lot