[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-11-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

Attachment: PIG-4722-1.patch

The attached patch fixes only the NPE and avoids an additional ThreadLocal call. 
It assumes that moving PigProcessor.close() in TEZ-2937 will prevent cleanup 
from running while SpillThread is still running the combiner. If TEZ-2937 
introduces a new API instead, I will post another patch that depends on it.
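
A minimal sketch of the pattern described above: read the reporter once into a local variable, so a concurrent reset cannot turn it null between the check and the call, and the lookup happens only once. The names ProgressReporter, ReporterHolder and ProgressCalls are hypothetical stand-ins, and the shared volatile field is an assumption made for illustration; this is not the content of PIG-4722-1.patch.

```java
/** Hypothetical stand-in for a progress reporter; not Pig's real interface. */
interface ProgressReporter {
    void progress();
}

/** Hypothetical holder for a reporter that a cleanup on another thread may reset. */
final class ReporterHolder {
    private static volatile ProgressReporter reporter;

    static ProgressReporter get() { return reporter; }

    static void set(ProgressReporter r) { reporter = r; }

    /** Simulates what a concurrent cleanup would do. */
    static void reset() { reporter = null; }
}

final class ProgressCalls {
    /**
     * Reads the reporter exactly once. The null check and the call both use
     * the same local reference, so a reset that happens in between cannot
     * cause a NullPointerException here.
     *
     * @return true if progress was reported, false if no reporter was set
     */
    static boolean reportSafe() {
        ProgressReporter local = ReporterHolder.get();
        if (local != null) {
            local.progress();
            return true;
        }
        return false;
    }
}
```

The trade-off is that a reporter captured just before a reset may still receive one last progress() call; the sketch assumes that is harmless.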

> [Pig on Tez] NPE while running Combiner
> ---
>
> Key: PIG-4722
> URL: https://issues.apache.org/jira/browse/PIG-4722
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4722-1.patch
>
>
> DefaultSorter in Tez calls the Combiner from two different threads: during 
> spill in SpillThread and during flush in the main thread. If both run the 
> combiner, one hits an NPE because the Reporter is set on only one thread 
> during initialization. 
> {code}
> java.lang.NullPointerException
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
> at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
> at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
> Caused by: java.lang.NullPointerException
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:303)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNextTuple(POProject.java:403)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
> ... 12 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-11-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

Status: Patch Available  (was: Open)



[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-11-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review, Daniel.



[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-11-10 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

 Assignee: Rohini Palaniswamy
Fix Version/s: 0.16.0

Actually, my initial analysis was incorrect. The combiner is called multiple 
times from SpillThread and is also called from ExternalSorter.flush(), which 
runs on a different thread. But setup(), reduce() and cleanup() are always 
called together as a batch each time, so there is no cross-thread issue with 
initialization or cleanup. The actual problem was that PigProcessor.close() was 
being called while the spill thread was still running the combiner. So even 
though PhysicalOperator.java had a getReporter() != null check, the reporter 
had become null by the time getReporter().progress() was called, because 
PigProcessor.close() was resetting it.

{code}
// getReporter() is read twice; a reset between the two reads makes the second read null
if (getReporter() != null) {
    getReporter().progress();
}
{code}

We need to ensure that PigProcessor.close() does not clean up static data while 
the Combiner is running.
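
One way to picture that ordering requirement is a guard that lets combiner batches run concurrently but makes cleanup wait until none is in flight. The sketch below uses a read/write lock for this. CombinerCleanupGuard and its methods are hypothetical names introduced for illustration, and this is not how Pig or Tez addressed the issue; it only shows the constraint that cleanup must not overlap a running combiner.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical guard: cleanup waits for running combiner batches to finish. */
final class CombinerCleanupGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean cleanedUp = false; // read under read lock, written under write lock

    /**
     * Runs one combiner batch while holding the read lock, so several
     * batches may run at once but none can overlap with close().
     *
     * @return false if cleanup has already happened and the batch was skipped
     */
    boolean runCombiner(Runnable batch) {
        lock.readLock().lock();
        try {
            if (cleanedUp) {
                return false;
            }
            batch.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Takes the write lock, which blocks until every in-flight batch has
     * released its read lock, then runs the cleanup at most once.
     */
    void close(Runnable cleanup) {
        lock.writeLock().lock();
        try {
            if (!cleanedUp) {
                cleanup.run();
                cleanedUp = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A caller would wrap each setup()/reduce()/cleanup() batch in runCombiner(...) and the static-data reset in close(...). The cost is that close() can block for as long as the slowest running batch.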



[jira] [Updated] (PIG-4722) [Pig on Tez] NPE while running Combiner

2015-10-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4722:

Description: 
DefaultSorter in Tez calls the Combiner from two different threads: during 
spill in SpillThread and during flush in the main thread. If both run the 
combiner, one hits an NPE because the Reporter is set on only one thread 
during initialization. 

{code}
java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:303)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNextTuple(POProject.java:403)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
... 12 more
{code}

  was:
DefaultSorter in Tez calls the Combiner from two different threads: during 
spill in SpillThread and during flush in the main thread. If both run the 
combiner, one hits an NPE because the Reporter is set on only one thread 
during initialization. 

{code}
java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:332)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:128)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:191)
at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:279)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:854)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:780)
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:708)
{code}

