[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699277#action_12699277 ]
Vadim Zaliva commented on PIG-766:
----------------------------------

Increasing the sort buffer to 500 MB did not work for me. Since the implementation of many basic algorithms in Pig (such as counting the number of records in a relation) requires GROUP BY, which can produce very long records (up to the number of tuples in the relation), this is a very serious problem. Potentially a single record could exceed the available Java heap memory.

What are the strategies for overcoming this limitation? Does Pig plan to address this?

> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: PIG-766
>                 URL: https://issues.apache.org/jira/browse/PIG-766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop-0.18.3 (Cloudera RPMs).
>                      mapred.child.java.opts=-Xmx1024m
>            Reporter: Vadim Zaliva
>
> My pig script always fails with the following error:
>
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2786)
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>         at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
>         at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>         at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
>         at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
>         at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>         at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>         at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
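For context, the counting idiom the comment refers to looks like the sketch below (relation names and the input path are placeholders, not taken from the reporter's actual script). `GROUP ... ALL` collapses the entire relation into a single record holding one bag of every tuple, and that record must be serialized as one unit by the map output collector — the `DefaultAbstractBag.write` frame visible in the stack trace:

```pig
-- Hypothetical script illustrating the counting idiom discussed above.
A = LOAD 'input/data' AS (x:chararray, y:int);

-- GROUP ... ALL emits a single record whose second field is a bag
-- containing every tuple of A, so its serialized size grows with |A|.
G = GROUP A ALL;

-- COUNT only runs after that single giant record has been produced on
-- the map side, which is where the serialization runs out of heap.
C = FOREACH G GENERATE COUNT(A);
DUMP C;
```

Because COUNT is algebraic, later Pig releases can push partial counts into the combiner so the full bag is never materialized in one place; at 0.2.0 the record could still be built and spilled whole, which matches the `spillSingleRecord` frame in the trace.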