[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255405#comment-13255405 ]

Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

I've verified that the same bug (undercounting the input when estimating 
reducers) is also present on trunk whenever SampleOptimizer is able to 
estimate reducers.

I'm starting to uncover quite a few issues in the skewed join implementation. 
For example, even if I explicitly set parallelism to 56 and the skewed 
relation has half a dozen unique values, the output is split across only 16 
reducers.
                
> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch, PIG-2652_2.patch, PIG-2652_3.patch, 
> PIG-2652_3_10.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the 
> number of reducers is not estimated from the input size for skew joins or 
> order by. Instead, these jobs get only 1 reducer.
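
To make the expected behaviour concrete, here is a minimal, hypothetical Java 
sketch of the fallback the issue describes: honour an explicit setting if one 
exists, otherwise estimate reducers from total input size. The class and method 
names are made up for illustration and are not Pig's API. The ordering among 
the three explicit settings is an assumption, and the defaults are assumed to 
mirror Pig's pig.exec.reducers.bytes.per.reducer (1,000,000,000 bytes) and 
pig.exec.reducers.max (999) properties; check them against your Pig version.

```java
// Hypothetical sketch, not Pig's actual code: fall back to an input-size
// estimate only when no explicit parallelism has been requested.
public final class ReducerEstimateSketch {
    // Assumed defaults, mirroring pig.exec.reducers.bytes.per.reducer
    // and pig.exec.reducers.max.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    // Each explicit setting is <= 0 when unset. The precedence order among
    // them is an assumption made for illustration.
    static int chooseReducers(int parallelClause, int defaultParallel,
                              int mapredReduceTasks, long totalInputBytes,
                              long bytesPerReducer, int maxReducers) {
        if (parallelClause > 0) return parallelClause;
        if (defaultParallel > 0) return defaultParallel;
        if (mapredReduceTasks > 0) return mapredReduceTasks;
        return estimate(totalInputBytes, bytesPerReducer, maxReducers);
    }

    // ceil(input / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        int reducers = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        reducers = Math.max(1, reducers);
        return Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1000 * 1000 * 1000;
        // Nothing set: estimated from input size.
        System.out.println(chooseReducers(-1, -1, -1, tenGb,
                DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // prints 10
        // PARALLEL 56 set explicitly: used as-is.
        System.out.println(chooseReducers(56, -1, -1, tenGb,
                DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // prints 56
    }
}
```

Because the estimate scales linearly with the total input size, undercounting 
the input translates directly into too few reducers. A job that never reaches 
the estimation branch at all ends up with the single reducer the issue 
describes.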

--
This message is automatically generated by JIRA.
