Hi all, I am running a data-intensive job on 18 EC2 nodes, each with just 1.7GB of memory. The input size is 50GB, so the job is automatically split into 786 map tasks. The map phase runs fine. However, I have set the number of reduce tasks to 18, and the reduce phase is where I get a Java heap out-of-memory error:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    at java.nio.CharBuffer.toString(CharBuffer.java:1157)
    at org.apache.hadoop.io.Text.decode(Text.java:350)
    at org.apache.hadoop.io.Text.decode(Text.java:327)
    at org.apache.hadoop.io.Text.toString(Text.java:254)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
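
For a sense of scale, here is a rough back-of-the-envelope sketch of how much data each task handles on average, using the numbers above. It assumes the map output is about as large as the input and that keys spread evenly across the reducers; neither is confirmed for this job. `TaskSizing` and `perTaskMb` are illustrative names, not part of Hadoop.

```java
public class TaskSizing {
    // Average megabytes handled per task if the data is spread evenly.
    static double perTaskMb(double totalGb, int tasks) {
        return totalGb * 1024.0 / tasks;
    }

    public static void main(String[] args) {
        double inputGb = 50.0;   // input size from the post
        int mapTasks = 786;      // map task count from the post
        int reduceTasks = 18;    // reduce task count from the post

        System.out.printf("per map task:    %.1f MB%n", perTaskMb(inputGb, mapTasks));
        // Assumption: map output is roughly the same size as the input.
        System.out.printf("per reduce task: %.1f MB%n", perTaskMb(inputGb, reduceTasks));
    }
}
```

Under those assumptions each map task sees roughly 65MB, while each reduce task would be responsible for close to 2.8GB, which is more than the 1.7GB of memory on a node.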