Hi, I've been running our app on EC2 using the small instances and it's been mostly fine. Very occasionally a task dies with a Java heap OutOfMemoryError. So far Hadoop has successfully restarted these failed tasks on other nodes and the job has run to completion.
I want to know how to avoid these occasional out-of-memory problems. I tried increasing mapred.child.java.opts from -Xmx550m to -Xmx768m, but that caused more out-of-memory errors, and they happened much sooner. Can someone help me understand why? I then reduced it to -Xmx400m and it has been running OK so far. My application is a custom multithreaded MapRunnable, and I often have hundreds of threads operating at the same time. Cheers, John
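For reference, here is roughly how I'm setting the per-task heap, via mapred.child.java.opts in mapred-site.xml (the -Xmx value shown is the one I'm currently running with; adjust to taste):

```xml
<!-- mapred-site.xml: JVM options passed to each child (map/reduce task) JVM.
     -Xmx caps the heap of each task JVM, not the whole node. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>
```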
