[
https://issues.apache.org/jira/browse/DRILL-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562530#comment-13562530
]
Julian Hyde edited comment on DRILL-20 at 1/25/13 8:08 AM:
-----------------------------------------------------------
LIMIT is useful. As Ted says, the two main use cases are debugging and top-N
queries. Another use case is pagination: in MySQL's implementation, you can ask for
rows 200 through 300, for instance.
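
To make the pagination case concrete, here is a minimal sketch using Python's built-in sqlite3 module, which accepts a LIMIT ... OFFSET clause much like MySQL's. The customers table, its id column, and the fetch_rows helper are hypothetical names chosen for illustration only.

```python
import sqlite3


def fetch_rows(conn, first, last):
    """Return ids of rows first..last (1-based, inclusive) in id order."""
    count = last - first + 1   # number of rows on the page
    offset = first - 1         # rows to skip before the page starts
    cur = conn.execute(
        "SELECT id FROM customers ORDER BY id LIMIT ? OFFSET ?",
        (count, offset),
    )
    return [row[0] for row in cur]


# Hypothetical sample data: 1000 customers with ids 1..1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO customers (id) VALUES (?)",
    [(i,) for i in range(1, 1001)],
)

page = fetch_rows(conn, 200, 300)
```

Note that "rows 200 through 300" read inclusively is 101 rows, so the count passed to LIMIT is last - first + 1.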
In theory you could implement everything LIMIT does using RANK
followed by a filter, but that's more verbose for the user writing the query
and more difficult for the optimizer to recognize.
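
The difference in verbosity can be seen in a small sketch, again using Python's sqlite3 module (window functions need SQLite 3.25 or later). The customers table and its name and revenue columns are hypothetical. The sample revenues are all distinct; with ties, RANK can return more than 10 rows, so ROW_NUMBER would be the closer match for LIMIT in that case.

```python
import sqlite3

# Hypothetical sample data: 50 customers with distinct revenues.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(f"c{i}", i * 10) for i in range(1, 51)],
)

# The short form: ORDER BY plus LIMIT.
limit_sql = "SELECT name FROM customers ORDER BY revenue DESC LIMIT 10"

# The long form: rank every row, then filter on the rank.
rank_sql = """
SELECT name FROM (
    SELECT name, revenue,
           RANK() OVER (ORDER BY revenue DESC) AS rnk
    FROM customers
)
WHERE rnk <= 10
ORDER BY revenue DESC
"""

via_limit = [row[0] for row in conn.execute(limit_sql)]
via_rank = [row[0] for row in conn.execute(rank_sql)]
```

Both queries return the same ten names here, but an optimizer has to recognize the subquery-plus-filter pattern before it can treat the second one as a top-N query.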
There are some optimizations you can apply if you know you only want the top 10
customers out of 10 million. For instance, if you are doing merge sort, you
only need to keep the top 10 customers in each sort run.
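
A rough sketch of that optimization, assuming an in-memory list stands in for the input and run_size stands in for however much fits in one sort run. The function name top_n_with_runs is made up for illustration; this is not Drill's implementation.

```python
import heapq
import random
from itertools import islice


def top_n_with_runs(values, n, run_size):
    """Return the n largest values, sorting the input in bounded runs."""
    runs = []
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size], reverse=True)
        # A value below position n within its own run cannot be in the
        # global top n, so each run is truncated before it is kept.
        runs.append(run[:n])
    # Merge the truncated runs (each already in descending order) and
    # stop as soon as n values have been produced.
    return list(islice(heapq.merge(*runs, reverse=True), n))


# Hypothetical data: 10,000 shuffled values, sorted in runs of 1,000.
data = list(range(10_000))
random.Random(0).shuffle(data)
top10 = top_n_with_runs(data, 10, 1_000)
```

Each run retains at most n values, so the merge step touches at most n times the number of runs instead of the whole input.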
MySQL has a LIMIT clause. It was so useful that it made it into the SQL
standard. (Most other MySQL innovations were scoffed at by the standards folks,
for good reason.)
Not sure what you mean by "per segment". If you mean, say, top 10 customers
within each state, or within a (city, state) combination, I would not extend
LIMIT to handle that case. The RANK function (combined with PARTITION BY and
ORDER BY in standard SQL) has enough expressive power for that.
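
For illustration, a "top 10 customers within each state" query might look like the sketch below, run through Python's sqlite3 module (SQLite 3.25 or later for window functions). The customers table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: 20 customers in each of two states.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT, revenue INTEGER)")
rows = [
    (f"{state}-{i}", state, i)
    for state in ("CA", "NY")
    for i in range(1, 21)
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Rank customers within each state, then keep the top 10 of each.
sql = """
SELECT state, name FROM (
    SELECT name, state, revenue,
           RANK() OVER (PARTITION BY state ORDER BY revenue DESC) AS rnk
    FROM customers
)
WHERE rnk <= 10
ORDER BY state, revenue DESC
"""
top_per_state = conn.execute(sql).fetchall()
```

Replacing PARTITION BY state with PARTITION BY city, state would cover the (city, state) case without any change to LIMIT itself.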
was (Author: julianhyde):
LIMIT is useful. As Ted says, the two main use cases are for debugging and
for Top.
In theory you could implement it using RANK followed by a filter, but that's more
verbose for the user writing the query and more difficult for the optimizer to
recognize.
There are some optimizations you can apply if you know you only want the top 10
customers out of 10 million. For instance, if you are doing merge sort, you
only need to keep the top 10 customers in each sort run.
MySQL has a LIMIT clause. It was so useful that it made it into the SQL
standard.
Not sure what you mean by "per segment". If you mean, say, top 10 customers
within each state, or within a (city, state) combination, I would not extend
LIMIT to handle that case. The RANK function (combined with PARTITION BY and
ORDER BY in standard SQL) has enough expressive power for that.
> Limit Operator Reference Implementation
> ---------------------------------------
>
> Key: DRILL-20
> URL: https://issues.apache.org/jira/browse/DRILL-20
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Chris Merrick
> Attachments: limit-reference.patch
>
>
> Build off of Jacques' work on reference implementations - the limit operator.