[ 
https://issues.apache.org/jira/browse/DRILL-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562530#comment-13562530
 ] 

Julian Hyde edited comment on DRILL-20 at 1/25/13 8:08 AM:
-----------------------------------------------------------

LIMIT is useful. As Ted says, the two main use cases are for debugging and for 
Top. Another use case is pagination: in MySQL's implementation, you can ask for 
rows 200 through 300, for instance.
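
As a rough illustration of the pagination case, here is a minimal sketch using Python's built-in sqlite3 module, whose LIMIT ... OFFSET clause behaves much like MySQL's. The customers table, its columns, and the fetch_page helper are hypothetical names invented for this example; nothing here is Drill syntax.

```python
import sqlite3

# Hypothetical table with ids 1..1000, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 1001)],
)


def fetch_page(offset, count):
    """Skip `offset` rows, then return the next `count` rows ordered by id."""
    cur = conn.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ? OFFSET ?",
        (count, offset),
    )
    return cur.fetchall()


# Skip the first 200 rows and take the next 100.
page = fetch_page(offset=200, count=100)
print(page[0][0], page[-1][0])
```

Without the ORDER BY, the rows that land on a given page are not guaranteed to be stable between calls.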

In theory you could implement everything LIMIT does using RANK 
followed by a filter, but that's more verbose for the user writing the query 
and more difficult for the optimizer to recognize.
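
To make the comparison concrete, here is a sketch of the same top-10 query written both ways, run through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later), and the customers table and revenue column are made up for the example. The revenue values are deliberately distinct: with ties, RANK can return more than 10 rows where LIMIT would not.

```python
import sqlite3

# Hypothetical data: 50 customers with distinct revenue values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(f"c{i}", i * 10) for i in range(1, 51)],
)

# Version 1: LIMIT states the intent directly.
limit_rows = conn.execute(
    "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 10"
).fetchall()

# Version 2: RANK followed by a filter. Same rows here, but more verbose.
rank_rows = conn.execute(
    """
    SELECT name, revenue FROM (
        SELECT name, revenue,
               RANK() OVER (ORDER BY revenue DESC) AS rnk
        FROM customers
    )
    WHERE rnk <= 10
    ORDER BY revenue DESC
    """
).fetchall()

print(limit_rows == rank_rows)
```

An optimizer would have to recognize the subquery-plus-filter shape in the second query before it could apply the same shortcuts it gets for free from the first.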

There are some optimizations you can apply if you know you only want the top 10 
customers out of 10 million. For instance, if you are doing merge sort, you 
only need to keep the top 10 customers in each sort run.
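
The sort-run idea can be sketched as follows. This is an in-memory toy, not how Drill or any particular engine implements it: the function name top_n_external_sort and the run_size parameter are invented for the example, and a real external sort would spill runs to disk.

```python
import heapq
import itertools
import random


def top_n_external_sort(values, n, run_size):
    """Return the n largest values, truncating each sorted run to n items."""
    runs = []
    for start in range(0, len(values), run_size):
        # Sort one run in descending order, as an external sort would.
        run = sorted(values[start:start + run_size], reverse=True)
        # Any value below the run's own top n cannot be in the global top n,
        # so the rest of the run is discarded before the merge.
        runs.append(run[:n])
    # Merge the truncated runs and stop after n items.
    merged = heapq.merge(*runs, reverse=True)
    return list(itertools.islice(merged, n))


random.seed(0)
revenues = [random.randrange(1_000_000) for _ in range(100_000)]
top10 = top_n_external_sort(revenues, n=10, run_size=5_000)
print(top10 == sorted(revenues, reverse=True)[:10])
```

With 20 runs of 5,000 values each, the merge step sees at most 200 values instead of 100,000.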

MySQL has a LIMIT clause. It was so useful that it made it into the SQL 
standard. (Most other MySQL innovations were scoffed at by the standards folks, 
for good reason.)

Not sure what you mean by "per segment". If you mean, say, top 10 customers 
within each state, or within a (city, state) combination, I would not extend 
LIMIT to handle that case. The RANK function (combined with PARTITION BY and 
ORDER BY in standard SQL) has enough expressive power for that.
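
For reference, a top-N-per-group query using RANK with PARTITION BY might look like the sketch below. It again uses Python's sqlite3 module and assumes a SQLite build with window functions (3.25 or later); the table, column names, and data are hypothetical, and N is 2 to keep the output short.

```python
import sqlite3

# Hypothetical data: five customers in each of three states.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (f"{state}-{i}", state, i * 100)
        for state in ("CA", "NY", "TX")
        for i in range(1, 6)
    ],
)

# Rank customers within each state, then keep the top 2 per state.
top_per_state = conn.execute(
    """
    SELECT state, name, revenue FROM (
        SELECT state, name, revenue,
               RANK() OVER (PARTITION BY state ORDER BY revenue DESC) AS rnk
        FROM customers
    )
    WHERE rnk <= 2
    ORDER BY state, revenue DESC
    """
).fetchall()

for row in top_per_state:
    print(row)
```

Partitioning by (city, state) instead would only change the PARTITION BY list.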
                
> Limit Operator Reference Implementation
> ---------------------------------------
>
>                 Key: DRILL-20
>                 URL: https://issues.apache.org/jira/browse/DRILL-20
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Chris Merrick
>         Attachments: limit-reference.patch
>
>
> Build off of Jacques' work on reference implementations - the limit operator.
