[
https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954743#comment-15954743
]
Daniel Dai commented on PIG-5211:
---------------------------------
Looks pretty good so far. NestedLimitOptimizer needs some fine-tuning: the
existence of both LOLimit and LOSort is not enough; you must make sure LOLimit
comes right after LOSort. Alternatively, you can follow LimitOptimizer and push
LOLimit all the way up, which is more sophisticated (I am not insisting on
this, though). Also, SecondaryKeyOptimizer does not currently recognize a
limited nested sort, so it may optimize the limited sort into an MR/Tez
secondary sort, in which case the limit is lost. So we should skip the
secondary sort optimization in SecondaryKeyOptimizer when the nested sort is a
limited sort. You can use the following script as a test case in which
SecondaryKeyOptimizer gets involved:
{code}
a = load 'studenttab10k' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b {
    c1 = order a by age;
    c2 = limit c1 5;
    generate c2;
}
explain c;
{code}
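To illustrate why the presence of both operators is not a sufficient condition, here is a minimal Java sketch. Everything in it is hypothetical: NestedPlanCheck, the Op enum, and the linear chain model are stand-ins invented for illustration and are not Pig's logical plan API (a real nested plan is a graph of LOSort, LOLimit, and other operators). It only shows the difference between "both present" and "limit directly follows sort".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a nested plan, NOT Pig's real classes.
// The plan is simplified to a linear chain in dataflow order.
final class NestedPlanCheck {
    enum Op { PROJECT, SORT, FILTER, LIMIT }

    // Weak condition: both operators appear somewhere in the chain.
    static boolean containsBoth(List<Op> chain) {
        return chain.contains(Op.SORT) && chain.contains(Op.LIMIT);
    }

    // Stronger condition: some LIMIT consumes the output of a SORT directly.
    static boolean limitDirectlyFollowsSort(List<Op> chain) {
        for (int i = 1; i < chain.size(); i++) {
            if (chain.get(i) == Op.LIMIT && chain.get(i - 1) == Op.SORT) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // order -> filter -> limit: truncating the sort early would drop
        // rows that the filter might have kept, so this must not be rewritten.
        List<Op> withGap = Arrays.asList(Op.PROJECT, Op.SORT, Op.FILTER, Op.LIMIT);
        // order -> limit: safe candidate for a limited sort.
        List<Op> adjacent = Arrays.asList(Op.PROJECT, Op.SORT, Op.LIMIT);

        System.out.println(containsBoth(withGap));              // true
        System.out.println(limitDirectlyFollowsSort(withGap));  // false
        System.out.println(limitDirectlyFollowsSort(adjacent)); // true
    }
}
```

In the first chain a filter sits between the sort and the limit, so keeping only the first few sorted rows before filtering could change the result. That is the case the adjacency check is meant to exclude.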
> Optimize Nested Limited Sort
> ----------------------------
>
> Key: PIG-5211
> URL: https://issues.apache.org/jira/browse/PIG-5211
> Project: Pig
> Issue Type: Improvement
> Reporter: Jin Sun
> Assignee: Jin Sun
> Fix For: 0.17.0
>
> Attachments: PIG-5211-1.patch
>
>
> Currently, in a FOREACH clause, if both LIMIT and ORDER BY are present, Pig
> stores all elements and sorts them. It should use a priority queue to be more
> space-efficient.
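
The priority-queue idea in the description above can be sketched in plain Java as follows. This is an illustration only, not the code in the attached patch: BoundedSort and limitedSort are hypothetical names, and the sketch works on ordinary Java collections rather than Pig bags and tuples. It keeps at most `limit` elements in memory instead of materializing and sorting the whole input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Pig: returns the first `limit` elements
// of `input` in `order`, holding at most `limit` elements at a time.
final class BoundedSort {
    static <T> List<T> limitedSort(Iterable<T> input, Comparator<T> order, int limit) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator makes the heap's head the worst element kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        for (T item : input) {
            if (heap.size() < limit) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                // New item beats the current worst: replace it.
                heap.poll();
                heap.add(item);
            }
        }
        // The heap's internal order is not sorted, so sort the survivors.
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(25, 18, 40, 21, 33, 19);
        System.out.println(limitedSort(ages, Comparator.<Integer>naturalOrder(), 3));
    }
}
```

Space drops from O(n) to O(k) per group, where k is the limit, and time is O(n log k). One caveat of this sketch: which element survives among ties at the cutoff is not specified, so it is not a stable sort.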