[jira] Commented: (PIG-1442) java.lang.OutOfMemoryError: Java heap space (Reopen of PIG-766)

2010-06-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878336#action_12878336
 ] 

Ashutosh Chauhan commented on PIG-1442:
---

This looks like a variant of PIG-1446 and PIG-1448. PigCombiner attaches the 
tuple to the roots of the combine plan but never detaches it. PODemux likewise 
attaches the tuple to its inner plan without detaching it. Note that 
PigCombiner may also contain multiple pipelines, depending on the number of 
operations done inside the FOREACH, resulting in problems similar to those 
described in PIG-1448.
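To illustrate the leak pattern described above, here is a minimal, hypothetical sketch. It is not actual Pig code: `RootOperator`, `attachInput`, and `detachInput` are stand-ins for the operator-plan API. The point is only that a root which attaches its input tuple but never detaches it keeps the tuple reachable after processing, so the garbage collector cannot reclaim it.

```java
import java.util.ArrayList;
import java.util.List;

public class AttachDetachSketch {
    // Hypothetical stand-in for a physical operator at the root of a plan.
    static class RootOperator {
        private Object attachedInput; // retained reference to the input tuple

        void attachInput(Object tuple) { attachedInput = tuple; }
        void detachInput()             { attachedInput = null; }
        boolean holdsInput()           { return attachedInput != null; }
    }

    public static void main(String[] args) {
        // A combine plan may have several roots (one per pipeline).
        List<RootOperator> roots = new ArrayList<>();
        roots.add(new RootOperator());
        roots.add(new RootOperator());

        Object largeTuple = new byte[4 * 1024 * 1024]; // stands in for a large record

        // Attach without a matching detach: every root keeps pinning the tuple,
        // so it stays live across iterations even though processing is done.
        for (RootOperator r : roots) r.attachInput(largeTuple);
        long pinned = roots.stream().filter(RootOperator::holdsInput).count();
        System.out.println("roots still pinning the tuple: " + pinned);

        // The fix suggested in the comment: detach after processing,
        // making the tuple eligible for garbage collection.
        for (RootOperator r : roots) r.detachInput();
        pinned = roots.stream().filter(RootOperator::holdsInput).count();
        System.out.println("roots pinning after detach: " + pinned);
    }
}
```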

> java.lang.OutOfMemoryError: Java heap space (Reopen of PIG-766)
> ---
>
> Key: PIG-1442
> URL: https://issues.apache.org/jira/browse/PIG-1442
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0, 0.7.0
> Environment: Apache-Hadoop 0.20.2 + Pig 0.7.0 and also for 0.8.0-dev 
> (18/may)
> Hadoop-0.18.3 (cloudera RPMs) + PIG 0.2.0
>Reporter: Dirk Schmid
>
> As mentioned by Ashutosh, this is a reopen of 
> https://issues.apache.org/jira/browse/PIG-766 because there is still a 
> problem that causes Pig to scale only with memory.
> For convenience, here is the last entry from the PIG-766 JIRA ticket:
> {quote}1. Are you getting the exact same stack trace as mentioned in the 
> jira?{quote} Yes the same and some similar traces:
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:2786)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>   at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:279)
>   at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
>   at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:249)
>   at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:214)
>   at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
>   at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:209)
>   at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
>   at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
>   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
>   at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
>   at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
>   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:199)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>   at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2563)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:58)
>   at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>   at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:61)
>   at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
>   at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>   at org.apache.pig.data.DefaultAbstractBag.readFields(DefaultAbstractBag.java:263)
>   at org.apache.pig.data.DataReaderWriter.bytesToBag(DataReaderWriter.java:71)
>   at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:145)
>   at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>   at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:63)
>   at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
>   at org.apache.pig.data.DataReaderWriter.readDatum(D

[jira] Commented: (PIG-1442) java.lang.OutOfMemoryError: Java heap space (Reopen of PIG-766)

2010-07-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888961#action_12888961
 ] 

Thejas M Nair commented on PIG-1442:


I investigated the cause of the OutOfMemory error in another query that also 
had a similar 'distinct + COUNT() in a nested foreach' statement. The failure 
there was that the combiner (which does the distinct) produced very large 
records with very large bags of distinct values when it was run on 
intermediate merge-sort results in the reduce phase. It runs out of memory 
because the MapReduce framework's merger loads a key-value pair from every 
merge stream. This is being fixed in HADOOP-5494.
This issue with large records can occur if any of the groups you are doing 
the distinct on has a very large number of values.

You can disable the combiner with -Dpig.exec.nocombiner=true on the command 
line. That is likely to get this query working. Please let us know if you are 
able to try it.

Fixing the missing detach in PigCombiner and PODemux will certainly help 
release the memory earlier. I will be submitting a patch for that.
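The growth mechanism described above can be simulated outside of Pig. The following toy sketch (plain Java, not Pig internals; the names and sizes are made up for illustration) shows how a "distinct" combiner turns many small per-map records for one key into a single record holding the whole union of distinct values, which is the record the merger must then hold in memory:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CombinerRecordSize {
    // A distinct-style combiner: merge several map-side outputs for one
    // key into a single record containing the union of distinct values.
    static Set<Integer> combineDistinct(List<List<Integer>> mapOutputs) {
        Set<Integer> merged = new HashSet<>();
        for (List<Integer> out : mapOutputs) merged.addAll(out);
        return merged; // one record; its size is the union of all inputs
    }

    public static void main(String[] args) {
        // Pretend 100 map tasks each emit 1000 distinct values for one hot key.
        List<List<Integer>> mapOutputs = new ArrayList<>();
        for (int m = 0; m < 100; m++) {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < 1000; i++) out.add(m * 1000 + i);
            mapOutputs.add(out);
        }

        // With the combiner: one record for the key holds every distinct
        // value at once, and it must fit in memory during the merge.
        Set<Integer> combined = combineDistinct(mapOutputs);
        System.out.println("combined record size = " + combined.size());

        // Without the combiner: the values stay spread across many small
        // records that stream to the reducer one at a time.
        int largest = mapOutputs.stream().mapToInt(List::size).max().getAsInt();
        System.out.println("largest uncombined record = " + largest);
    }
}
```

This is why disabling the combiner (pig.exec.nocombiner=true) sidesteps the OOME: it trades the one huge merged record for more shuffle traffic of small records.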


> java.lang.OutOfMemoryError: Java heap space (Reopen of PIG-766)
> ---
>
> Key: PIG-1442
> URL: https://issues.apache.org/jira/browse/PIG-1442
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0, 0.7.0
> Environment: Apache-Hadoop 0.20.2 + Pig 0.7.0 and also for 0.8.0-dev 
> (18/may)
> Hadoop-0.18.3 (cloudera RPMs) + PIG 0.2.0
>Reporter: Dirk Schmid
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>

[jira] Commented: (PIG-1442) java.lang.OutOfMemoryError: Java heap space (Reopen of PIG-766)

2010-07-17 Thread Dirk Schmid (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889539#action_12889539
 ] 

Dirk Schmid commented on PIG-1442:
--

Yup - that's it - thank you very much for your help. 
With {{pig.exec.nocombiner=true}} the script is able to run smoothly without 
any OOME.
