[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698996#action_12698996
 ] 

Alan Gates commented on PIG-766:
--------------------------------

It isn't overall data size that matters.  It is the size of the data for a 
given key.  So if you have a 2G data set but it has only one key (that is, 
every row has that key), then you'll hit this problem (assuming you can't fit 
2G in memory on your data nodes).  Pig does try to spill to avoid this, but has 
a hard time knowing when and how much to spill, and thus often runs out of 
memory.
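
To check whether one key dominates, something like the following could be run 
over a sample of the input.  This is only a rough sketch in Python, not 
anything Pig provides: the function name bytes_per_key, the tab separator, and 
the assumption that the key is the first field are all made up for 
illustration.

```python
from collections import Counter


def bytes_per_key(lines, key_index=0, sep="\t"):
    """Sum the UTF-8 encoded size of all rows that share each key.

    Assumes delimited text with the key in column `key_index`
    (hypothetical layout, adjust to the real data).
    """
    totals = Counter()
    for line in lines:
        row = line.rstrip("\n")
        key = row.split(sep)[key_index]
        totals[key] += len(row.encode("utf-8"))
    return totals


# Tiny made-up sample: key "a" carries almost all of the bytes.
sample = ["a\t1111", "a\t2222", "a\t3333", "b\t4"]
totals = bytes_per_key(sample)
heaviest_key, heaviest_bytes = totals.most_common(1)[0]
```

If the heaviest key accounts for more data than a task's heap can hold, that 
key is the likely culprit, regardless of how small the data set is overall.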

But I think you're right that this isn't in the join.  From the stack it looks 
like it's trying to write data out of the map task.  Do you have very large 
rows in this data?
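
The spillSingleRecord frame in the quoted stack would be consistent with a 
single record too large for the map output buffer, and the 
DefaultAbstractBag.write frame suggests the record contains a bag, though I am 
inferring that from the trace alone.  A quick way to look for oversized rows in 
text input might be the sketch below.  It is Python rather than Pig, and the 
name largest_rows and the one-row-per-line assumption are mine.

```python
def largest_rows(lines, top_n=3):
    """Return (size_in_bytes, line_number) for the biggest rows.

    Assumes one row per line of text input (hypothetical layout);
    sizes are UTF-8 encoded lengths without the trailing newline.
    """
    sizes = [
        (len(line.rstrip("\n").encode("utf-8")), number)
        for number, line in enumerate(lines, start=1)
    ]
    return sorted(sizes, reverse=True)[:top_n]


# Made-up sample with one row far larger than the rest.
rows = ["short", "x" * 1000, "medium row"]
biggest = largest_rows(rows)
```

Rows that approach the size of the task heap (1024m here, per 
mapred.child.java.opts) would be worth a closer look.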

> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: PIG-766
>                 URL: https://issues.apache.org/jira/browse/PIG-766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop-0.18.3 (cloudera RPMs).
> mapred.child.java.opts=-Xmx1024m
>            Reporter: Vadim Zaliva
>
> My pig script always fails with the following error:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2786)
>        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
