[ 
https://issues.apache.org/jira/browse/PIG-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264635#comment-13264635
 ] 

Dmitriy V. Ryaboy commented on PIG-483:
---------------------------------------

Note that since parallelism can be determined at runtime, this improvement 
needs to happen after the plan is compiled, right before the sample job is run.

Also note that Skewed Join has the same issue (and in fact uses the same 
indexing job).

When the input is small enough for a single reducer, Skewed Join should be 
converted to a normal join, and order-by should be converted to a naive 
single-reducer order.
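
A rough sketch of the runtime check described above, in case it helps the 
discussion. It is not taken from the Pig code base: the class, enum, method 
names, and the threshold values are all hypothetical, and the reducer 
estimate (input size divided by a bytes-per-reducer setting, capped at a 
maximum) is an assumption about how parallelism would be determined at 
runtime. The point is only that the decision is made after plan compilation, 
from the estimated parallelism, right before the sample job would be 
launched.

```java
public class SmallInputStrategySketch {

    /** Hypothetical strategy names; not actual Pig classes. */
    enum Strategy { SINGLE_REDUCER, SAMPLED_MULTI_REDUCER }

    /**
     * Estimates the number of reducers from the input size.
     * Assumption: parallelism = ceil(inputBytes / bytesPerReducer),
     * clamped to the range [1, maxReducers].
     */
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.min(maxReducers, Math.max(1L, needed));
    }

    /**
     * Decides, right before the sample job would run, whether sampling is
     * worth it. With a single reducer there is nothing to partition, so the
     * sample job can be skipped (order-by becomes a naive single-reducer
     * order, and a skewed join would become a normal join).
     */
    static Strategy choose(int estimatedReducers) {
        return estimatedReducers <= 1
                ? Strategy.SINGLE_REDUCER
                : Strategy.SAMPLED_MULTI_REDUCER;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 1_000_000_000L; // assumed threshold, for illustration only
        int maxReducers = 999;                 // assumed cap, for illustration only

        int small = estimateReducers(50_000_000L, bytesPerReducer, maxReducers);
        int large = estimateReducers(20_000_000_000L, bytesPerReducer, maxReducers);

        System.out.println(small + " reducer(s): " + choose(small));
        System.out.println(large + " reducer(s): " + choose(large));
    }
}
```

Because the check depends only on the estimated parallelism, the same test 
could guard both the order-by and the Skewed Join paths, since they share 
the sampling job.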
                
> PERFORMANCE: different strategies for large and small order bys
> ---------------------------------------------------------------
>
>                 Key: PIG-483
>                 URL: https://issues.apache.org/jira/browse/PIG-483
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>              Labels: gsoc2011
>
> Currently Pig always does a multi-pass order by where it first determines a 
> distribution for the keys and then orders in a second pass.  This avoids the 
> necessity of having a single reducer.  However, in cases where the data is 
> small enough to fit into a single reducer, this is inefficient.  For small 
> data sets it would be good to realize the small size of the set and do the 
> order by in a single pass with a single reducer.
> This is a candidate project for Google Summer of Code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
