[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954743#comment-15954743 ]

Daniel Dai commented on PIG-5211:
---------------------------------

Looks pretty good so far. NestedLimitOptimizer needs some fine-tuning: the
existence of both an LOLimit and an LOSort is not enough; you must make sure
the LOLimit comes immediately after the LOSort. Alternatively, you can follow
LimitOptimizer and push the LOLimit all the way up, which is more
sophisticated (I am not insisting on this, though). Also, SecondaryKeyOptimizer
does not currently recognize a limited nested sort, so it may turn a limited
sort into an MR/Tez secondary sort, and the limit is lost. SecondaryKeyOptimizer
should therefore skip a nested sort that is a limited sort. You can use the
following script as a test case in which SecondaryKeyOptimizer gets involved:
{code}
a = load 'studenttab10k' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b {
    c1 = order a by age;   -- nested ORDER BY ...
    c2 = limit c1 5;       -- ... immediately followed by LIMIT: a limited sort
    generate c2;
}
explain c;
{code}
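
The adjacency rule and the SecondaryKeyOptimizer guard described above can be
sketched as follows. This is a simplified, hypothetical model: the `Op` enum and
the `NestedLimitCheck` class are made up for illustration and treat a nested
plan as a linear chain of operators, whereas Pig's real nested plans are graphs
of operators such as LOSort and LOLimit that the optimizer rules traverse
through their own APIs.

```java
import java.util.List;

// Hypothetical stand-in for the operators of a nested FOREACH plan.
// Pig's real plans use LOSort, LOLimit, etc.; this only models the rule.
enum Op { SORT, LIMIT, FILTER, DISTINCT, GENERATE }

final class NestedLimitCheck {
    // True only when a LIMIT directly consumes the output of the SORT at
    // sortIndex. A LIMIT further down the chain with another operator in
    // between (e.g. a FILTER) does not make this a limited sort.
    static boolean isLimitedSort(List<Op> chain, int sortIndex) {
        return chain.get(sortIndex) == Op.SORT
                && sortIndex + 1 < chain.size()
                && chain.get(sortIndex + 1) == Op.LIMIT;
    }

    // Hypothetical guard for the secondary-key rewrite: a limited sort is
    // left alone, so the limit is not lost when the sort is moved into the
    // MR/Tez shuffle.
    static boolean secondaryKeyMayRewrite(List<Op> chain, int sortIndex) {
        return chain.get(sortIndex) == Op.SORT && !isLimitedSort(chain, sortIndex);
    }

    public static void main(String[] args) {
        List<Op> adjacent = List.of(Op.SORT, Op.LIMIT, Op.GENERATE);
        List<Op> separated = List.of(Op.SORT, Op.FILTER, Op.LIMIT, Op.GENERATE);
        System.out.println(isLimitedSort(adjacent, 0));          // true
        System.out.println(isLimitedSort(separated, 0));         // false
        System.out.println(secondaryKeyMayRewrite(adjacent, 0)); // false
    }
}
```

Requiring direct adjacency is the conservative choice; pushing the limit up
past other operators, as LimitOptimizer does, accepts more plans but needs
per-operator rules about which operators a limit may safely cross.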

> Optimize Nested Limited Sort
> ----------------------------
>
>                 Key: PIG-5211
>                 URL: https://issues.apache.org/jira/browse/PIG-5211
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jin Sun
>            Assignee: Jin Sun
>             Fix For: 0.17.0
>
>         Attachments: PIG-5211-1.patch
>
>
> Currently in a FOREACH clause, if both LIMIT and ORDER BY are present, Pig
> stores all elements and sorts them. It should instead use a priority queue
> bounded by the LIMIT value, which is much more space-efficient.
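
A minimal sketch of the priority-queue approach, written in Java since that is
Pig's implementation language. `BoundedTopK` and `sortedLimit` are hypothetical
names, not Pig's API. The sketch keeps a max-heap (with respect to the sort
order) holding at most k elements, so memory is O(k) instead of O(n) and time is
O(n log k) instead of O(n log n). The integers in `main` stand in for the `age`
values of one group's bag in the script above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BoundedTopK {
    // Returns the first k elements of `input` in the order given by `order`:
    // the same elements as sorting everything and keeping the first k (equal
    // elements may come out in a different relative order than a stable sort),
    // while holding at most k + 1 elements at any time.
    static <T> List<T> sortedLimit(Iterable<T> input, Comparator<T> order, int k) {
        List<T> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Reversed comparator makes this a max-heap: the head is the worst
        // element kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k + 1, order.reversed());
        for (T item : input) {
            heap.offer(item);
            if (heap.size() > k) {
                heap.poll(); // evict the worst; it can never be in the top k
            }
        }
        result.addAll(heap);
        result.sort(order); // heap iteration order is arbitrary, so sort the k survivors
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(23, 19, 31, 19, 27, 45, 20, 18);
        System.out.println(sortedLimit(ages, Comparator.naturalOrder(), 3)); // [18, 19, 19]
    }
}
```

For `order a by age` followed by `limit c1 5`, the comparator would compare on
`age` and k would be 5.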



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
