[ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-364:
---------------------------

    Attachment: PIG-364.patch

This patch takes approach 1. It adds one additional map-reduce operator with 
1 reducer if the requested parallelism > 1. The behavior of limit is now:

1. If the map plan is closed before the POLimit operator, we put POLimit in the 
reduce plan and grant the requested parallelism. If the requested parallelism 
> 1, we close the reduce plan and add one additional map-reduce operator with 
1 reducer.

2. If the map plan is open before the POLimit operator, we put POLimit in the map 
plan, close the map plan, add another POLimit to the reduce plan, and set the 
parallelism of this map-reduce operator to 1. Although POLimit creates a 
map-reduce boundary in this case, we do not associate a parallel option with the 
limit keyword. I believe providing a parallel option with limit would confuse 
users, because it is relatively hard to explain whether that parallel option 
will be granted or not.

3. In the limited-sort case, we have a POSort with limit<>-1 (that is, a limit 
other than -1). If the parallelism for POSort > 1, we add one additional 
map-reduce operator with 1 reducer.
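
The three cases can be modeled as a small planning function. This is a 
hypothetical sketch (Java 16 or later), not Pig's actual MRCompiler code: 
`LimitPlacement`, `MRJob`, `Case`, `placeLimit` and the plan labels are 
illustrative names. It also assumes that the extra single-reducer job 
re-applies the limit, since that is what brings the up-to-n*k records back 
down to k; the patch description does not spell out that job's contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of where LIMIT lands in the compiled map-reduce jobs.
// None of these types are Pig classes; they only illustrate the decision logic.
class LimitPlacement {

    // One compiled map-reduce job: map-side work, reduce-side work, reducer count.
    record MRJob(String map, String reduce, int reducers) {}

    enum Case { MAP_PLAN_CLOSED, MAP_PLAN_OPEN, LIMITED_SORT }

    static List<MRJob> placeLimit(Case c, int requestedParallelism) {
        List<MRJob> jobs = new ArrayList<>();
        switch (c) {
            case MAP_PLAN_CLOSED -> {
                // Case 1: limit runs in the reducers with the requested parallelism.
                jobs.add(new MRJob("upstream", "POLimit", requestedParallelism));
                if (requestedParallelism > 1) {
                    // Assumption: the extra 1-reducer job re-applies the limit.
                    jobs.add(new MRJob("pass-through", "POLimit", 1));
                }
            }
            case MAP_PLAN_OPEN -> {
                // Case 2: limit in the map plan, then again in a single reducer.
                // requestedParallelism is deliberately ignored (no PARALLEL on LIMIT).
                jobs.add(new MRJob("upstream + POLimit", "POLimit", 1));
            }
            case LIMITED_SORT -> {
                // Case 3: the sort itself carries the limit (limit != -1).
                jobs.add(new MRJob("upstream", "POSort(limit)", requestedParallelism));
                if (requestedParallelism > 1) {
                    // Assumption: the extra 1-reducer job re-applies the limit.
                    jobs.add(new MRJob("pass-through", "POLimit", 1));
                }
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        for (Case c : Case.values()) {
            System.out.println(c + " with parallelism 4 -> " + placeLimit(c, 4));
        }
    }
}
```

In this model only cases 1 and 3 ever produce a second job, and only when 
more than one reducer was requested; case 2 always yields a single job with 
1 reducer.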


> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
>                 Key: PIG-364
>                 URL: https://issues.apache.org/jira/browse/PIG-364
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: types_branch
>
>         Attachments: PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with 
> n reducers, we will get up to n*k output records.
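
The n*k effect described above, and why one more job with a single reducer 
fixes it, can be reproduced with a toy in-memory simulation. This is an 
illustrative sketch, not Pig code: partitioning records by value modulo n 
stands in for hash partitioning, and `LimitSimulation`, `limitPerReducer` and 
`limitWithFinalSingleReducer` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy simulation of Limit(k) over n reducers (not Pig's implementation).
class LimitSimulation {

    // Partition records across n reducers, apply Limit(k) in each, concatenate.
    static List<Integer> limitPerReducer(List<Integer> records, int n, int k) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int r : records) {
            // Value mod n is a stand-in for hash partitioning.
            partitions.get(Math.floorMod(r, n)).add(r);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> p : partitions) {
            // Each reducer independently keeps at most k records.
            out.addAll(p.subList(0, Math.min(k, p.size())));
        }
        return out;
    }

    // Fix: feed the per-reducer output (at most n*k) into a 1-reducer job.
    static List<Integer> limitWithFinalSingleReducer(List<Integer> records, int n, int k) {
        List<Integer> partial = limitPerReducer(records, n, k);
        return limitPerReducer(partial, 1, k);
    }

    public static void main(String[] args) {
        List<Integer> records = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        int n = 4, k = 10;
        // prints "per-reducer limit: 40" (n*k, since every partition holds >= k records)
        System.out.println("per-reducer limit: " + limitPerReducer(records, n, k).size());
        // prints "with final single reducer: 10"
        System.out.println("with final single reducer: "
                + limitWithFinalSingleReducer(records, n, k).size());
    }
}
```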

-- 
This message is automatically generated by JIRA.