[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643 ]
Phabricator commented on HIVE-3562:
-----------------------------------
njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed
down to map stage".
INLINE COMMENTS
conf/hive-default.xml.template:1434 Can you add more details here? An example
query would really help.
ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special
about 40?
Setting hive.limit.pushdown.heap.threshold explicitly at the beginning of the
test would make the test easier to maintain in the long run.
ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference
between this and line 3?
ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is
not correct.
Suppose the values are:
v1
v2
..
v10
v11
v12
..
v20
The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13.
The second mapper contains data for all values, but it only emits v1-v10.
Since the query does not involve an order by, it is possible that v11 will get
picked up, and its result will then be missing the data from the second mapper.
If you are pushing the limit up, you should create an additional MR job which
orders the rows, making sure that, in the above example, only v1-v10 are
picked up.
Am I missing something here?
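The scenario above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Hive code: it assumes the query is a count per key, that each mapper keeps the partial results for its 10 smallest keys, and that mapper 1 is missing v08-v10. The keys are zero-padded (v01..v20) so that string order matches numeric order, and all function names are hypothetical.

```python
from collections import Counter

LIMIT = 10
# Zero-padded keys so that string order matches numeric order (assumption).
keys = [f"v{i:02d}" for i in range(1, 21)]

# Hypothetical input: one row per key held by each mapper.
mapper1_rows = [k for k in keys if k not in ("v08", "v09", "v10")]
mapper2_rows = list(keys)


def map_side_top_n(rows, n):
    """Partial count per key, keeping only the n smallest keys."""
    counts = Counter(rows)
    return {k: counts[k] for k in sorted(counts)[:n]}


def merge(*partials):
    """Sum the partial counts per key, as a reducer would."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


# Ground truth: counts with no limit applied on the map side.
true_counts = merge(Counter(mapper1_rows), Counter(mapper2_rows))

# What the reducer sees when each mapper applies the limit locally.
reduced = merge(
    map_side_top_n(mapper1_rows, LIMIT),
    map_side_top_n(mapper2_rows, LIMIT),
)

# Keys that reached the reducer with an incomplete count.
incomplete = sorted(k for k in reduced if reduced[k] != true_counts[k])

# Picking the smallest LIMIT keys after merging avoids the incomplete ones.
safe = sorted(reduced)[:LIMIT]

if __name__ == "__main__":
    print("incomplete keys:", incomplete)
    print("safe keys:", safe)
```

In this sketch the reducer receives 13 distinct keys. If its limit picks an arbitrary 10 of them, it may return v11-v13 with counts that lack mapper 2's rows, whereas restricting the pick to the 10 smallest keys returns only complete results.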
REVISION DETAIL
https://reviews.facebook.net/D5967
To: JIRA, tarball, navis
Cc: njain
> Some limit can be pushed down to map stage
> ------------------------------------------
>
> Key: HIVE-3562
> URL: https://issues.apache.org/jira/browse/HIVE-3562
> Project: Hive
> Issue Type: Bug
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch,
> HIVE-3562.D5967.3.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing the size of the shuffle:
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
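
The idea in the description (RS keeps only a top-N before the shuffle, and the final LIMIT still runs on the reduce side) can be illustrated outside Hive. The following is a rough Python sketch, not the patch's implementation: the function names are hypothetical, rows are plain sortable keys, and it covers only the "order by key limit N" case from the example query.

```python
import heapq
from itertools import islice


def map_side_top_n(rows, n):
    """What RS(TOP-N) would forward: the n smallest rows, sorted ascending."""
    return heapq.nsmallest(n, rows)


def reduce_limit(partials, n):
    """Merge the sorted per-mapper outputs and apply the final LIMIT."""
    return list(islice(heapq.merge(*partials), n))


def order_by_limit(splits, n):
    """Order by key with limit n, with the limit applied partially per split."""
    partials = [map_side_top_n(split, n) for split in splits]
    return reduce_limit(partials, n)


if __name__ == "__main__":
    splits = [[42, 7, 19, 3, 88], [5, 61, 1, 30], [12, 2, 77]]
    print(order_by_limit(splits, 4))
```

Each split forwards at most N rows, so the shuffle carries at most N rows per mapper instead of the whole input, while the result matches a full sort followed by the limit.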