[ https://issues.apache.org/jira/browse/PIG-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3985:
------------------------------
    Attachment: pig-3985-v01.txt

{noformat}
Caused by: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:938)
{noformat}
{code:title=JobControlCompiler.java|borderStyle=solid}
 938                     Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
{code}
The NPE occurs because globalCounters has no entry for operationID, so 
globalCounters.get(operationID) returns null.
The entry is missing because saveCounters was never called, which in turn 
happened because mro.isCounterOperation() incorrectly returned false.
{code:title=JobControlCompiler.java|borderStyle=solid}
 358                 if (!pigContext.inIllustrator && mro.isCounterOperation())
 359                     saveCounters(job,mro.getOperationID());
{code}
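
To make the failure mode concrete, here is a minimal, self-contained sketch of the same lookup pattern. It is not Pig code: the class and method names (CounterLookupSketch, sumUnchecked, sumChecked) and the sample IDs are hypothetical, and the guarded variant only illustrates failing with a clearer message. It is not the change made in the attached patch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the globalCounters lookup; none of these names are Pig's API.
class CounterLookupSketch {

    // Same shape as the failing line: Map.get() returns null for a missing key,
    // so iterating over the result throws NullPointerException.
    static long sumUnchecked(Map<String, List<Map.Entry<String, Long>>> counters,
                             String operationID) {
        long total = 0;
        for (Map.Entry<String, Long> pair : counters.get(operationID)) {
            total += pair.getValue();
        }
        return total;
    }

    // Guarded variant: reports which operation ID had no saved counters.
    static long sumChecked(Map<String, List<Map.Entry<String, Long>>> counters,
                           String operationID) {
        List<Map.Entry<String, Long>> pairs = counters.get(operationID);
        if (pairs == null) {
            throw new IllegalStateException(
                "no counters saved for operation " + operationID);
        }
        long total = 0;
        for (Map.Entry<String, Long> pair : pairs) {
            total += pair.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Map.Entry<String, Long>>> counters = new HashMap<>();
        List<Map.Entry<String, Long>> saved = new ArrayList<>();
        saved.add(new AbstractMap.SimpleEntry<>("task_0", 10L));
        counters.put("op-saved", saved);

        System.out.println(sumUnchecked(counters, "op-saved"));
        try {
            // "op-missing" was never saved, mirroring the skipped saveCounters call.
            sumUnchecked(counters, "op-missing");
        } catch (NullPointerException e) {
            System.out.println("NPE for an operation ID that was never saved");
        }
    }
}
```

The guard changes only how the problem is reported. The counters are still missing, so the real fix has to make sure they are saved in the first place.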

mro.isCounterOperation() returned false because it assumes that POCounter is 
always placed at the leaf level of the map plan.
{code:title=MapReduceOper.java|borderStyle=solid}
511     public boolean isCounterOperation() {
512         return (getCounterOperation() != null);
513     }
...
525     private POCounter getCounterOperation() {
526         PhysicalOperator operator;
527         Iterator<PhysicalOperator> it = this.mapPlan.getLeaves().iterator();
528
529         while(it.hasNext()) {
530             operator = it.next();
531             if(operator instanceof POCounter)
532                 return (POCounter) operator;
533         }
...
{code}

For the sample Pig script provided by Philip, the mapreduce plan showed "SPLIT" 
as the only leaf of the map plan, with POCounter nested inside it.
{noformat}
MapReduce node scope-34
Map Plan
Split - scope-69
|   |
|   Store(file:/tmp/temp465448860/tmp1018450824:org.apache.pig.impl.io.InterStorage) - scope-38
|   |
|   |---citypops_nosort_inplace: POCounter[tuple] - scope-14
|   |
|   citypops_ties_cause_skips: Local Rearrange[tuple]{chararray}(false) - scope-21
|   |   |
|   |   Project[chararray][0] - scope-22
|
|---citypops: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---citypops: Load(file:///Users/knoguchi/git/pig/pig-3985/us_city_pops.tsv:org.apache.pig.builtin.PigStorage) - scope-0--------
{noformat}
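
The effect of scanning only the leaves of a plan like the one above can be reproduced with a toy model. This is a sketch under stated assumptions, not Pig's classes: Op, findCounterInLeaves and findCounterDeep are hypothetical names, operators are identified by a plain string label, and the split's inner plans are flattened into a single list.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a plan whose only leaf is a split; not Pig's operator classes.
class LeafScanSketch {

    // Hypothetical minimal operator: a kind label plus operators nested inside it.
    static class Op {
        final String kind;
        final List<Op> inner;

        Op(String kind, Op... inner) {
            this.kind = kind;
            this.inner = Arrays.asList(inner);
        }
    }

    // Leaf-only scan, analogous to the getCounterOperation() excerpt:
    // it looks at the given operators but never inside them.
    static Op findCounterInLeaves(List<Op> leaves) {
        for (Op op : leaves) {
            if (op.kind.equals("POCounter")) {
                return op;
            }
        }
        return null;
    }

    // Variant that also descends into nested operators.
    static Op findCounterDeep(List<Op> ops) {
        for (Op op : ops) {
            if (op.kind.equals("POCounter")) {
                return op;
            }
            Op nested = findCounterDeep(op.inner);
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The split is the only leaf; the counter lives inside it.
        Op split = new Op("POSplit",
                new Op("POCounter"), new Op("POStore"), new Op("POLocalRearrange"));
        List<Op> leaves = Arrays.asList(split);

        System.out.println("leaf scan finds counter: "
                + (findCounterInLeaves(leaves) != null));
        System.out.println("deep scan finds counter: "
                + (findCounterDeep(leaves) != null));
    }
}
```

In this toy model the leaf-only scan returns null, which is what makes the counter check report false. Finding the nested operator does not by itself make the job run correctly.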

I initially tried fixing MapReduceOper.getCounterOperation() so that it finds 
the POCounter even when it is part of the split.  However, I soon learned that 
POCounter requires different map-reduce classes 
(PigMapReduceCounter.PigMapCounter.class and PigReduceCounter.class), and it 
currently doesn't work when it is mixed with other operations.

Instead of rewriting Rank, for now I made a change so that every POCounter 
starts a new mapreduce job.

> Multiquery execution of RANK with RANK BY causes NPE JobCreationException "ERROR 2017: Internal error creating job configuration"
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3985
>                 URL: https://issues.apache.org/jira/browse/PIG-3985
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Philip (flip) Kromer
>              Labels: nullpointerexception, rank, udf
>         Attachments: many_ranks_much_sadness.pig, pig-3985-v01.txt, us_city_pops.tsv
>
>
> A script with both RANK and RANK BY will crash with a Null Pointer Exception 
> in JobControlCompiler.java when multiquery is enabled.
> The following script will work for any combination of the RANK BY operations; 
> or if there is one RANK operation only (i.e. no other RANK or RANK BY 
> operation). Non-BY-RANKS will perish together but succeed alone.
> Disabling multiquery execution makes everything work again.
> I am using Hadoop 2.4.0 with Pig Trunk (d24d06a48, after PIG-3739). The error 
> occurs in local or mapreduce mode.
> {code}
> -- disable multiquery and you can rank all day long
> -- SET opt.multiquery false
> citypops = LOAD 'us_city_pops.tsv' AS (city:chararray, state:chararray, pop_2011:int);
> citypops_o = ORDER citypops BY city;
> --
> -- if you have one non-by RANK you may not have any other RANKs
> --
> citypops_nosort_inplace    = RANK citypops;
> citypops_presorted_inplace = RANK citypops_o;
> citypops_ties_cause_skips  = RANK citypops   BY city;
> citypops_ties_no_skips     = RANK citypops   BY city  DENSE;
> citypops_presorted_ranked  = RANK citypops_o BY city;
> STORE citypops_nosort_inplace    INTO '/tmp/citypops_nosort_inplace'    USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_inplace INTO '/tmp/citypops_presorted_inplace' USING PigStorage('\t', '--overwrite true');
> STORE citypops_ties_cause_skips  INTO '/tmp/citypops_ties_cause_skips'  USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_ties_no_skips     INTO '/tmp/citypops_ties_no_skips'     USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_ranked  INTO '/tmp/citypops_presorted_ranked'  USING PigStorage('\t', '--overwrite true');
> {code}
> {code}
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration.
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:946)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:200)
>      --- SNIP ----
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:886)
>         ... 19 more
> {code}
> The proximate offense seems to be that globalCounters.get(operationID) 
> returns null:
> {code}
>             if(mro.isRankOperation()) {
>                 Iterator<String> operationIDs = mro.getRankOperationId().iterator();
>                 while(operationIDs.hasNext()) {
>                     String operationID = operationIDs.next();
>                     Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
>                     Pair<String,Long> pair = null;
>                     while(itPairs.hasNext()) {
>                         pair = itPairs.next();
>                         conf.setLong(pair.first, pair.second);
>                     }
>                 }
>             }
> {code}
> PORank.java line 184 seems to need a counter value, and so this part does 
> need to happen.


