Hi all,
    I am running a data-intensive job on 18 EC2 nodes, each with only
1.7GB of memory.  The input is 50GB, which the framework automatically
splits into 786 map tasks.  The map phase runs fine.  However, I am
setting the number of reduce tasks to 18, and that is where I get a Java
heap out-of-memory error (a rough sketch of my job setup is below the
stack trace):

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.&lt;init&gt;(String.java:216)
        at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
        at java.nio.CharBuffer.toString(CharBuffer.java:1157)
        at org.apache.hadoop.io.Text.decode(Text.java:350)
        at org.apache.hadoop.io.Text.decode(Text.java:327)
        at org.apache.hadoop.io.Text.toString(Text.java:254)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)
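For reference, my driver looks roughly like the sketch below. The class
name, paths, and the identity mapper/reducer are placeholders standing in
for my real classes; the reduce count and child heap setting are what I
am actually using.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("my-data-intensive-job");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Identity classes used here as stand-ins for my real ones.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // The 786 map tasks come from the default splitting of the
        // 50GB input; the reduce count is the only thing I set myself.
        conf.setNumReduceTasks(18);

        // Child task JVM heap; the nodes only have 1.7GB total, so I
        // have kept this modest.
        conf.set("mapred.child.java.opts", "-Xmx512m");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}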
