Philip (flip) Kromer created PIG-3985:
-----------------------------------------
Summary: Multiquery execution of RANK with RANK BY causes NPE
JobCreationException "ERROR 2017: Internal error creating job configuration"
Key: PIG-3985
URL: https://issues.apache.org/jira/browse/PIG-3985
Project: Pig
Issue Type: Bug
Reporter: Philip (flip) Kromer
A script with both RANK and RANK BY will crash with a Null Pointer Exception in
JobControlCompiler.java when multiquery is enabled.
The following script will work for any combination of the RANK BY operations;
or if there is one RANK operation only (i.e. no other RANK or RANK BY
operation). Non-BY-RANKS will perish together but succeed alone.
Disabling multiquery execution makes everything work again.
I am using Hadoop 2.4.0 with Pig Trunk (d24d06a48, after PIG-3739). The error
occurs in local or mapreduce mode.
{code}
-- disable multiquery and you can rank all day long
-- SET opt.multiquery false
citypops = LOAD 'us_city_pops.tsv' AS (city:chararray, state:chararray,
pop_2011:int);
citypops_o = ORDER citypops BY city;
--
-- if you have one non-by RANK you may not have any other RANKs
--
citypops_nosort_inplace = RANK citypops;
citypops_presorted_inplace = RANK citypops_o;
citypops_ties_cause_skips = RANK citypops BY city;
citypops_ties_no_skips = RANK citypops BY city DENSE;
citypops_presorted_ranked = RANK citypops_o BY city;
STORE citypops_nosort_inplace INTO '/tmp/citypops_nosort_inplace' USING
PigStorage('\t', '--overwrite true');
-- STORE citypops_presorted_inplace INTO '/tmp/citypops_presorted_inplace'
USING PigStorage('\t', '--overwrite true');
STORE citypops_ties_cause_skips INTO '/tmp/citypops_ties_cause_skips' USING
PigStorage('\t', '--overwrite true');
-- STORE citypops_ties_no_skips INTO '/tmp/citypops_ties_no_skips'
USING PigStorage('\t', '--overwrite true');
-- STORE citypops_presorted_ranked INTO '/tmp/citypops_presorted_ranked'
USING PigStorage('\t', '--overwrite true');
{code}
{code}
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017:
Internal error creating job configuration.
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:946)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:200)
--- SNIP ----
Caused by: java.lang.NullPointerException
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:886)
... 19 more
{code}
The proximate offense seems to be that globalCounters.get(operationID) returns
null:
{code}
if(mro.isRankOperation()) {
Iterator<String> operationIDs =
mro.getRankOperationId().iterator();
while(operationIDs.hasNext()) {
String operationID = operationIDs.next();
Iterator<Pair<String, Long>> itPairs =
globalCounters.get(operationID).iterator();
Pair<String,Long> pair = null;
while(itPairs.hasNext()) {
pair = itPairs.next();
conf.setLong(pair.first, pair.second);
}
}
}
{code}
PORank.java line 184 seems to need a counter value, and so this part does need
to happen.
--
This message was sent by Atlassian JIRA
(v6.2#6252)