[ 
https://issues.apache.org/jira/browse/PIG-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403351#comment-13403351
 ] 

Jie Li commented on PIG-2675:
-----------------------------

LIMIT is now always compiled into two jobs. We can optimize this at both compile 
time and run time.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
selected = LIMIT data 2;
explain selected;
{code}

For this query, LIMIT is compiled into both the map phase and the reduce phase 
of the 1st job, whose requestedParallelism is already set to 1, so we don't need 
to compile the 2nd job at all.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
grouped = GROUP data BY k;
selected = LIMIT grouped 2;
explain selected;
{code}

For this query, LIMIT is compiled into the reduce phase of the 1st job, whose 
parallelism is not fixed at compile time, so we still need to compile a 2nd job. 
That job can be skipped at run time if the 1st job ends up with one reducer.
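
A possible third case, sketched here as an assumption and not verified against 
any patch: if the GROUP carries an explicit PARALLEL 1, the 1st job's 
requestedParallelism is known to be 1 at compile time, so by the same reasoning 
as for the first query the 2nd job should not be needed either. The input path 
is the same as in the queries above.

{code}
-- PARALLEL 1 requests a single reducer for the job that runs the GROUP
data = LOAD 'queries/1.txt' AS (k, v, x);
grouped = GROUP data BY k PARALLEL 1;
selected = LIMIT grouped 2;
explain selected;
{code}

Comparing the explain output of this script with that of the GROUP query 
without PARALLEL would show whether the extra single-reducer job is still in 
the plan.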

                
> Optimization: Remove unnecessary Limit jobs from plan
> -----------------------------------------------------
>
>                 Key: PIG-2675
>                 URL: https://issues.apache.org/jira/browse/PIG-2675
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Daniel Dai
>
> LIMIT operator always inserts a limiting single-reducer job after PIG-2652.
> We can optimize this job away when the preceding job only has 1 reducer at 
> run-time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
