[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643 ]

Phabricator commented on HIVE-3562:
-----------------------------------

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed down to map stage".

INLINE COMMENTS
  conf/hive-default.xml.template:1434 Can you add more details here? An example query would really help.
  ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special about 40?

  Set hive.limit.pushdown.heap.threshold explicitly at the beginning of the test; that makes the test easier to maintain in the long run.

  ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference between this and line 3?

  ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is not correct.

  Let us say the values are:
  v1
  v2
  ...
  v10
  v11
  v12
  ...
  v20

  The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13.
  The second mapper contains data for all values, but it only emits v1-v10.

  Since the query does not involve an ORDER BY, it is possible that v11 gets picked up, and its result would be incomplete because it does not contain data from the second mapper. If you are pushing the limit into the map stage, you should create an additional MR job that orders the rows; in the above example, that makes sure only v1-v10 are picked up.

  Am I missing something here?
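  The counterexample above can be checked with a small simulation. This is a minimal sketch, not Hive code, under these assumptions: the class and method names are hypothetical, the integers 1-20 stand in for v1-v20, each mapper holds one row per value it has, and a per-key row count stands in for whatever aggregate the query at limit_pushdown.q:10 computes (that query is not reproduced here). Each mapper keeps only the rows whose key is among its own 10 smallest distinct keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simulation of the review scenario; this is not Hive code.
public class LimitPushdownScenario {

    // One mapper with a pushed-down limit: keep only rows whose key is among
    // this mapper's own `limit` smallest distinct keys.
    static List<Integer> mapSideTopN(List<Integer> rows, int limit) {
        TreeSet<Integer> kept = new TreeSet<>();
        for (Integer key : new TreeSet<>(rows)) {   // distinct keys, ascending
            if (kept.size() == limit) break;
            kept.add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            if (kept.contains(row)) out.add(row);
        }
        return out;
    }

    // Reducer: count rows per key, standing in for a GROUP BY aggregate.
    static SortedMap<Integer, Integer> countPerKey(List<List<Integer>> mapperOutputs) {
        SortedMap<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> out : mapperOutputs) {
            for (Integer key : out) counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // The extra ordering step: keep only the n smallest keys.
    static SortedMap<Integer, Integer> firstN(SortedMap<Integer, Integer> counts, int n) {
        SortedMap<Integer, Integer> out = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (out.size() == n) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> mapper1 = new ArrayList<>();
        List<Integer> mapper2 = new ArrayList<>();
        for (int v = 1; v <= 20; v++) {
            if (v < 8 || v > 10) mapper1.add(v);  // first mapper has no v8-v10
            mapper2.add(v);                       // second mapper has every value
        }
        int limit = 10;
        SortedMap<Integer, Integer> pushed = countPerKey(List.of(
                mapSideTopN(mapper1, limit), mapSideTopN(mapper2, limit)));
        SortedMap<Integer, Integer> exact = countPerKey(List.of(mapper1, mapper2));

        System.out.println("keys at reducer: " + pushed.keySet());
        System.out.println("v11 count: pushed=" + pushed.get(11) + ", exact=" + exact.get(11));
        System.out.println("ordered first 10 match: "
                + firstN(pushed, limit).equals(firstN(exact, limit)));
    }
}
```

  Running main shows keys 1-13 reaching the reducer and a count of 1 for v11 against an exact count of 2, because the second mapper dropped its v11 row. Ordering and keeping only the first 10 keys reproduces the exact counts for v1-v10, which is the extra ordering step the comment asks for.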

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain

                
> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
> HIVE-3562.D5967.3.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing the size of the shuffle:
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
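The RS(TOP-N) idea for an ORDER BY ... LIMIT query can be sketched as a bounded heap per mapper. This is a minimal sketch under stated assumptions, not Hive's ReduceSinkOperator: the class and method names are hypothetical, the keys are made-up integers, and each list stands in for one map task's input. Each mapper keeps at most n keys in a max-heap, so it shuffles n rows instead of its whole split; the reducer still sorts and applies LIMIT. With an ORDER BY this gives the same answer as the unpushed plan, because every row in the global first n is also in its own mapper's first n.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a map-side top-N filter for "ORDER BY key LIMIT n";
// it is not Hive's ReduceSinkOperator.
public class TopNReduceSinkSketch {

    // Map side: keep the n smallest keys in a max-heap holding at most n entries.
    static List<Integer> mapSideTopN(List<Integer> keys, int n) {
        if (n <= 0) return new ArrayList<>();
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (Integer k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll();  // evict the largest of the n keys kept so far
                heap.add(k);
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        Collections.sort(out);
        return out;
    }

    // Reduce side: sort what was shuffled and apply the LIMIT again.
    static List<Integer> reduceWithLimit(List<List<Integer>> mapperOutputs, int n) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> out : mapperOutputs) all.addAll(out);
        Collections.sort(all);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<Integer> split1 = List.of(238, 86, 311, 27, 165, 409, 255, 278);
        List<Integer> split2 = List.of(98, 484, 265, 193, 401, 150, 273, 224);
        int n = 3;
        List<Integer> pushed = reduceWithLimit(
                List.of(mapSideTopN(split1, n), mapSideTopN(split2, n)), n);
        List<Integer> exact = reduceWithLimit(List.of(split1, split2), n);
        System.out.println("pushed=" + pushed + " exact=" + exact);
    }
}
```

With n = 3, both plans return [27, 86, 98], but each mapper shuffles 3 rows instead of 8.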

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
