[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699277#action_12699277 ]

Vadim Zaliva commented on PIG-766:
----------------------------------

Increasing the sort buffer to 500 MB did not work for me.

Since the implementation of many basic algorithms in Pig (such as counting the number
of records in a relation) requires a GROUP BY, which can produce very long records
(up to the number of tuples in the relation), this is a very serious problem.
Potentially a single record could exceed the available Java heap memory. What are the
strategies for overcoming this limitation? Does Pig plan to address this?
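
For illustration, here is a minimal sketch of the counting pattern described above
(the input path, relation names, and schema are placeholders, not my actual script).
The bag built by GROUP ... ALL conceptually holds every tuple of the relation, so the
record that carries it grows with the size of the input:

    -- placeholder input and schema
    raw     = LOAD 'input' AS (f1:chararray, f2:int);
    -- GROUP ... ALL collects the entire relation into a single bag
    all_rec = GROUP raw ALL;
    -- counting the records means carrying that one huge bag as a single record
    cnt     = FOREACH all_rec GENERATE COUNT(raw);
    STORE cnt INTO 'output';

Whether Pig's combiner can collapse such a bag before it is serialized depends on the
plan; when it cannot, the full bag has to travel as one record, and no fixed buffer
size is enough.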


> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: PIG-766
>                 URL: https://issues.apache.org/jira/browse/PIG-766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop-0.18.3 (cloudera RPMs).
> mapred.child.java.opts=-Xmx1024m
>            Reporter: Vadim Zaliva
>
> My pig script always fails with the following error:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2786)
>        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
