[
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750398#comment-13750398
]
Phabricator commented on HIVE-3562:
-----------------------------------
ashutoshc has commented on the revision "HIVE-3562 [jira] Some limit can be
pushed down to map stage".
Looks pretty good. Just requesting to add few more comments.
INLINE COMMENTS
conf/hive-default.xml.template:1586-1590 We can remove this now.
ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to
add more clarity, say something like: we can push the limit above GBY (running
in Reducer), since that will generate single row for each group. This doesn't
necessarily hold for GBY (running in Mappers), so we don't push limit above it.
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It
will be good to add comment about what this field is holding. Add a comment
saying: This two dimensional array holds key data and a corresponding Union
object which contains the tag identifying the aggregate expression for distinct
columns.
Ideally, instead of this 2-D array, we should have probably enhanced HiveKey
class for this logic. We should do that in a follow-up jira.
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I
didnt follow this logic completely.
Seems like this is an optimization not to evaluate union object repeatedly
and do system copy for it. Can you add a comment explaining this?
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 seems
like it will be null only for all i = 0. If so, better do if (i==0) check ?
Also add comment when this will be null and when it will be non-null?
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 You
made changes to this section, because you found bug or are you purely
refactoring this? If you hit upon the bug, can you explain what was it?
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It will be
good to add comment about what these 2D arrays are holding?
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add comment
saying this array holds hashcode for keys.
Also, add note that indices of all these arrays must line up.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82
Nice Comments!
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It will be good
to add a javadoc for this class.
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also javadoc for
this interface.
REVISION DETAIL
https://reviews.facebook.net/D5967
To: JIRA, tarball, ashutoshc, navis
Cc: njain
> Some limit can be pushed down to map stage
> ------------------------------------------
>
> Key: HIVE-3562
> URL: https://issues.apache.org/jira/browse/HIVE-3562
> Project: Hive
> Issue Type: Bug
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch,
> HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch,
> HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch
>
>
> Queries with limit clause (with reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> makes operator tree,
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing size of shuffling.
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira