[
https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254394#comment-13254394
]
Bill Graham commented on PIG-2652:
----------------------------------
I was able to reproduce with a similar script that didn't have a reducer in the
first MR job. The code in question is this block in {{SampleOptimizer}}. It
returns in the second conditional with {{Predecessor should be a root of the
plan}} before reducers can be estimated.
{noformat}
// Get this job's predecessor. There should be exactly one.
List<MapReduceOper> preds = mPlan.getPredecessors(mr);
if (preds.size() != 1) {
    log.debug("Too many predecessors to sampling job.");
    return;
}
MapReduceOper pred = preds.get(0);

// The predecessor should be a root.
List<MapReduceOper> predPreds = mPlan.getPredecessors(pred);
if (predPreds != null && predPreds.size() > 0) {
    log.debug("Predecessor should be a root of the plan");
    return;
}

// The predecessor should have just a load and store in the map, and nothing
// in the combine or reduce.
if ( !(pred.reducePlan.isEmpty() && pred.combinePlan.isEmpty())) {
    log.debug("Predecessor has a combine or reduce plan");
    return;
}
{noformat}
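To make the early-return path easier to follow, here is a minimal, self-contained sketch of the three checks quoted above, run against a toy plan. This is not Pig's API: {{SampleGuardSketch}}, {{Job}}, and {{skipReason}} are hypothetical names, and the plan is modeled as a plain map from each job to its predecessors. The three-job chain in {{main}} is an assumed shape chosen only to trigger the second check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the guard sequence quoted above; not Pig's API.
final class SampleGuardSketch {

    /** Toy MR job: a name plus flags for non-empty combine/reduce plans. */
    static final class Job {
        final String name;
        final boolean hasCombine;
        final boolean hasReduce;

        Job(String name, boolean hasCombine, boolean hasReduce) {
            this.name = name;
            this.hasCombine = hasCombine;
            this.hasReduce = hasReduce;
        }
    }

    /**
     * Returns the debug message of the first check that fails for the
     * sampling job, or null if all three checks pass.
     */
    static String skipReason(Map<Job, List<Job>> predecessors, Job samplingJob) {
        // Check 1: the sampling job must have exactly one predecessor.
        List<Job> preds =
                predecessors.getOrDefault(samplingJob, Collections.emptyList());
        if (preds.size() != 1) {
            return "Too many predecessors to sampling job.";
        }
        Job pred = preds.get(0);

        // Check 2: that predecessor must itself be a root of the plan.
        List<Job> predPreds = predecessors.get(pred);
        if (predPreds != null && !predPreds.isEmpty()) {
            return "Predecessor should be a root of the plan";
        }

        // Check 3: the predecessor must have no combine or reduce plan.
        if (pred.hasReduce || pred.hasCombine) {
            return "Predecessor has a combine or reduce plan";
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed chain: first -> second -> sampler.
        Job first = new Job("first", false, false);
        Job second = new Job("second", false, false);
        Job sampler = new Job("sampler", false, true);

        Map<Job, List<Job>> predecessors = new HashMap<>();
        predecessors.put(second, Arrays.asList(first));
        predecessors.put(sampler, Arrays.asList(second));

        // The sampler's predecessor is not a root, so check 2 fails.
        System.out.println(skipReason(predecessors, sampler));
    }
}
```

In this toy chain the method stops at the second check, which corresponds to the return described above: the method exits before anything after these guards gets a chance to run.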
> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
> Key: PIG-2652
> URL: https://issues.apache.org/jira/browse/PIG-2652
> Project: Pig
> Issue Type: Bug
> Reporter: Bill Graham
> Assignee: Bill Graham
> Fix For: 0.10.0, 0.9.3, 0.11
>
> Attachments: PIG-2652_1.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.