[ https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871376#comment-13871376 ]

Alex Bain commented on PIG-3557:
--------------------------------

1. You can check requestedParallelism for the tezOperator. This should be 
doable.

This doesn't sound quite right to me. Let's say you are doing:
a = LOAD '/data/myLargeDataSet';
b = LIMIT a 1000000;
...
where myLargeDataSet contains lots of block-sized files. In that case, the Tez 
vertex for the POLoad has a requestedParallelism of 1, but the actual runtime 
parallelism will be equal to the number of files. So the optimization of putting 
the limit only in the plan for the previous vertex (here, the vertex for the 
load) and not adding a second vertex fails. Basically, we can't depend on 
requestedParallelism = 1 actually being the parallelism at runtime.

[Just to note: the LimitOptimizer would actually push the limit up to the 
InputHandler, but to keep this example simple, let's ignore that for now.]
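
To make that concrete, here is a minimal sketch in plain Java of why the check 
would have to treat load vertices differently. All names here are made up for 
the example (singleTaskAtRuntime, readsHdfsInput, numInputSplits); none of them 
are Pig planner APIs.

// Hypothetical sketch only -- not Pig's actual planner code.
public class LimitShortcutSketch {

    static boolean singleTaskAtRuntime(int requestedParallelism,
                                       boolean readsHdfsInput,
                                       long numInputSplits) {
        if (readsHdfsInput) {
            // For a load vertex, runtime parallelism is driven by the number
            // of input splits (one per block-sized file above), not by
            // requestedParallelism, so a compile-time value of 1 means nothing.
            return numInputSplits <= 1;
        }
        // For other vertices, assume the requested parallelism is what the
        // vertex actually runs with.
        return requestedParallelism == 1;
    }

    public static void main(String[] args) {
        // The example above: requestedParallelism is 1, but with, say, 500
        // block-sized files the load vertex still runs 500 tasks.
        System.out.println(singleTaskAtRuntime(1, true, 500));   // false
        System.out.println(singleTaskAtRuntime(1, false, 500));  // true
    }
}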

> Implement optimizations for LIMIT
> ---------------------------------
>
>                 Key: PIG-3557
>                 URL: https://issues.apache.org/jira/browse/PIG-3557
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Alex Bain
>            Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more 
> mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage is using 1 reducer, there is no need to add one more 
> vertex
> 2. If the limitplan is null (i.e., not the "limited order by" case), we might 
> not need a shuffle edge; a pass-through edge should be enough if possible
> 3. Similar to PIG-1270, we can push the limit to the InputHandler
> 4. We also need to think through the "limited order by" case once "order by" 
> is implemented
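
For reference, a rough sketch of what items 1 and 2 in the quoted list above 
could look like as planner checks, again in plain Java. EdgeType, 
needsSeparateLimitVertex and edgeIntoLimitVertex are made-up names for the 
example and are not classes or methods in Pig's Tez compiler.

// Illustrative sketch of items 1 and 2 above; all names are invented.
public class LimitEdgeSketch {

    enum EdgeType { SHUFFLE, PASS_THROUGH }

    // Item 1: if the predecessor vertex really runs with a single task, the
    // limit can be applied inside that vertex and no extra vertex is needed
    // (subject to the caveat about load vertices discussed in the comment).
    static boolean needsSeparateLimitVertex(int predecessorRuntimeParallelism) {
        return predecessorRuntimeParallelism != 1;
    }

    // Item 2: a plain LIMIT (not the "limited order by" case) does not need
    // its input re-sorted, so a pass-through edge should be enough; the
    // "limited order by" case still has to gather ordered partial results,
    // so it keeps the shuffle edge.
    static EdgeType edgeIntoLimitVertex(boolean isLimitedOrderBy) {
        return isLimitedOrderBy ? EdgeType.SHUFFLE : EdgeType.PASS_THROUGH;
    }

    public static void main(String[] args) {
        System.out.println(needsSeparateLimitVertex(1));   // false
        System.out.println(edgeIntoLimitVertex(false));    // PASS_THROUGH
    }
}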



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
