[ https://issues.apache.org/jira/browse/PIG-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniket Mokashi updated PIG-3928:
--------------------------------
Fix Version/s: 0.14.0 (was: 0.13.0)
> Reducer estimator gets wrong configuration for ORDER_BY job
> -----------------------------------------------------------
>
> Key: PIG-3928
> URL: https://issues.apache.org/jira/browse/PIG-3928
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.12.1, 0.13.0
> Reporter: Aniket Mokashi
> Fix For: 0.14.0
>
>
> The SAMPLER job requires a parameter that must equal the number of reducers
> used by the ORDER_BY job. The following code obtains it by taking the
> successor of the SAMPLER job and estimating reducers for that successor.
> However, the job (conf) passed to calculateRuntimeReducers corresponds to the
> SAMPLER job rather than the ORDER_BY job, which causes problems for custom
> reducer estimators that depend on the configuration.
> {code}
> // inside org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> public void adjustNumReducers(MROperPlan plan, MapReduceOper mro,
>         org.apache.hadoop.mapreduce.Job nwJob) throws IOException {
>     int jobParallelism = calculateRuntimeReducers(mro, nwJob);
>     if (mro.isSampler() && plan.getSuccessors(mro) != null) {
>         // We need to calculate the final number of reducers of the next job (order-by or skew-join)
>         // to generate the quantfile.
>         MapReduceOper nextMro = plan.getSuccessors(mro).get(0);
>         // Here we use the same conf and Job to calculate the runtime #reducers of the next job
>         // which is fine as the statistics comes from the nextMro's POLoads
>         int nPartitions = calculateRuntimeReducers(nextMro, nwJob);
>         // set the runtime #reducer of the next job as the #partition
>         ParallelConstantVisitor visitor =
>                 new ParallelConstantVisitor(mro.reducePlan, nPartitions);
>         visitor.visit();
>     }
> {code}
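
To make the reported mismatch concrete, here is a minimal, self-contained sketch. It does not use Pig or Hadoop classes: plain maps stand in for the per-job configurations, and the class name, the key `example.estimator.reducers.hint`, and both helper methods are hypothetical names chosen for illustration. It assumes only what the description states, namely that a custom estimator's answer can depend on the configuration it is handed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for per-job Hadoop configurations,
// and every name and key below is hypothetical, not part of Pig.
class ReducerEstimatorSketch {

    // Hypothetical configuration key that a custom estimator might consult.
    static final String HINT_KEY = "example.estimator.reducers.hint";

    // Builds a stand-in job configuration carrying a reducer hint.
    static Map<String, String> confWithHint(int hint) {
        Map<String, String> conf = new HashMap<>();
        conf.put(HINT_KEY, Integer.toString(hint));
        return conf;
    }

    // Hypothetical custom estimator: its answer depends only on the conf it receives.
    static int estimateReducers(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault(HINT_KEY, "1"));
    }

    public static void main(String[] args) {
        Map<String, String> samplerConf = confWithHint(1);   // sampler job's settings
        Map<String, String> orderByConf = confWithHint(20);  // order-by job's settings

        // Reported behavior: the estimate for the order-by job is made with the sampler's conf.
        int nPartitions = estimateReducers(samplerConf);
        // What a conf-dependent estimator would return if given the order-by job's own conf.
        int orderByReducers = estimateReducers(orderByConf);

        System.out.println("partitions computed from sampler conf: " + nPartitions);
        System.out.println("reducers estimated from order-by conf: " + orderByReducers);
        System.out.println("mismatch: " + (nPartitions != orderByReducers));
    }
}
```

In this toy setup the two estimates differ whenever the two configurations carry different hints, which is the kind of disagreement between the sampler's partition count and the ORDER_BY job's reducer count that the description points at. How a real estimator reads its settings is not specified in the issue, so the lookup above is an assumption.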