[ 
https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255363#comment-13255363
 ] 

Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Spent some time debugging my refactoring and decided maybe there's a bug in 
your patch, Daniel. As written, we look at the inputs to the sampling job and 
estimate reducers for the successor based on those inputs. However, the 
successor actually has two inputs -- the sampled dataset, and the second joined 
relation. That means the earlier estimate is incorrect.

I tried running the estimator on the post-sample job, but there doesn't seem to 
be a way to connect the plan to its predecessor -- the plan passed in is 
already trimmed at the top. I'll try the following instead: identify a sampling 
job's children, and set them aside somewhere; then check against the saved list 
of known post-sample jobs and re-run the estimator for them if parallelism is 
set to 1.
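
A rough sketch of that idea, in case it helps the discussion. This is not Pig 
code: the Job class, collectPostSampleJobs, reestimate and estimateReducers are 
all hypothetical stand-ins, and the bytes-per-reducer and max-reducer values 
are simply passed in as parameters. It only illustrates "remember the 
sampler's children, then re-estimate those that still have parallelism 1, 
using all of their own inputs".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PostSampleEstimation {

    // Hypothetical stand-in for a MapReduce job in the compiled plan.
    static class Job {
        final String name;
        final boolean isSampler;      // true for the sampling job
        final long[] inputBytes;      // size of each input this job reads
        int parallelism;              // 1 means "not estimated yet" here
        final List<Job> successors = new ArrayList<>();

        Job(String name, boolean isSampler, int parallelism, long... inputBytes) {
            this.name = name;
            this.isSampler = isSampler;
            this.parallelism = parallelism;
            this.inputBytes = inputBytes;
        }
    }

    // Input-size based estimate: ceil(total input / bytes per reducer),
    // clamped to the range [1, maxReducers].
    static int estimateReducers(Job job, long bytesPerReducer, int maxReducers) {
        long total = 0;
        for (long b : job.inputBytes) {
            total += b;
        }
        int reducers = (int) Math.ceil((double) total / bytesPerReducer);
        return Math.max(1, Math.min(maxReducers, reducers));
    }

    // Step 1: identify each sampling job's children and set them aside.
    static Set<Job> collectPostSampleJobs(List<Job> plan) {
        Set<Job> postSample = new LinkedHashSet<>();
        for (Job job : plan) {
            if (job.isSampler) {
                postSample.addAll(job.successors);
            }
        }
        return postSample;
    }

    // Step 2: re-run the estimator for saved jobs whose parallelism is still 1,
    // this time using all of the job's own inputs rather than the sampler's.
    static void reestimate(Set<Job> postSample, long bytesPerReducer, int maxReducers) {
        for (Job job : postSample) {
            if (job.parallelism == 1) {
                job.parallelism = estimateReducers(job, bytesPerReducer, maxReducers);
            }
        }
    }
}
```

With a sampler reading one 5 GB relation and a join reading that relation plus 
a 3 GB one, an estimate taken from the sampler's inputs at 1 GB per reducer 
would give 5 reducers, while the join's own inputs give 8.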
                
> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch, PIG-2652_2.patch, PIG-2652_3.patch, 
> PIG-2652_3_10.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the 
> number of reducers is not estimated based on input size for skew joins or 
> order by. Instead, these jobs get only 1 reducer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
