[
https://issues.apache.org/jira/browse/HADOOP-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630855#action_12630855
]
Joydeep Sen Sarma commented on HADOOP-4086:
-------------------------------------------
some questions:
- The extra reducesink (in the limitmap -> reducesink -> linkreduce) - what
will it reduce on?
- in many cases - the limit does not seem to need a reduce. for example - in
the dumbest case - select * limit N - we just need to run the mappers and then
keep concatenating mapper outputs until we have N rows.
- in the other case where the output is sorted/grouped - we need to have N from
each mapper and then limit N in reducer (standard top N operator)
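
a rough sketch of the sorted case above, in plain Python rather than Hive operator code. the names mapper_top_n and reducer_top_n are made up for illustration and are not Hive or Hadoop APIs; rows are assumed to be comparable values and "top" is assumed to mean smallest under the sort key:

```python
import heapq

def mapper_top_n(rows, n, key=None):
    """Each mapper keeps only its own n best rows under the sort key."""
    return heapq.nsmallest(n, rows, key=key)

def reducer_top_n(mapper_outputs, n, key=None):
    """A single reducer sees at most n rows per mapper and keeps the global n."""
    candidates = (row for output in mapper_outputs for row in output)
    return heapq.nsmallest(n, candidates, key=key)

# Three hypothetical input splits, one per mapper.
splits = [[7, 3, 9], [1, 8, 2], [6, 5, 4]]
partials = [mapper_top_n(split, 2) for split in splits]
print(reducer_top_n(partials, 2))
```

the point of the per-mapper limit is that the reducer never receives more than N rows from any one mapper, and the global top N is always contained in the union of the per-mapper top N.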
based on last 2 observations - i find it much easier to understand the limit
operator implementation as:
- a simple select * like operator on a dataset (a table - whether it's an
intermediate dataset or not)
- there are two cases:
  - if the table/data is sorted/grouped - then the limit operator needs to do a
    merge of all the table's files and produce top N
  - if the table/data is not sorted/grouped - then the limit task needs to get
    any N rows - possibly by scanning one file at a time
the limit operator is sequential by definition.
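
the two cases could be sketched like this, again as a Python illustration only. the function limit and its is_sorted flag are hypothetical, each "file" is modelled as an iterable of rows, and in the sorted case every file is assumed to be already sorted by the same key:

```python
import heapq
from itertools import chain, islice

def limit(files, n, is_sorted=False, key=None):
    """Return n rows from a table stored as several files (iterables of rows)."""
    if is_sorted:
        # Sorted/grouped data: lazily merge the sorted files, take the first n.
        stream = heapq.merge(*files, key=key)
    else:
        # Unsorted data: any n rows will do, so scan one file at a time.
        stream = chain.from_iterable(files)
    # islice stops pulling rows as soon as n have been produced.
    return list(islice(stream, n))

sorted_files = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(limit(sorted_files, 4, is_sorted=True))

unsorted_files = [[9, 1], [5, 3], [2]]
print(limit(unsorted_files, 3))
```

both branches read sequentially and stop early, so in the unsorted case later files are never touched once N rows have been collected.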
the limit task can run in a single mapper map-only hadoop job in case it's
writing to a file - or if it's writing to console (select * limit N) - can just
run from the client side. this is orthogonal to what it does.
> Add limit to Hive QL
> --------------------
>
> Key: HADOOP-4086
> URL: https://issues.apache.org/jira/browse/HADOOP-4086
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/hive
> Reporter: Ashish Thusoo
> Assignee: Ashish Thusoo
>
> Add a limit feature to the Hive Query language.
> so you can do the following things:
> SELECT * FROM T LIMIT 10;
> and this would just return the 10 rows.
> No guarantees are made on which 10 rows are returned by the query.