Hi all,
We've been running a pretty big job on 20 extra-large high-CPU EC2 servers
(Hadoop version 0.18, Java 1.6, the standard AMIs), and started getting the
dreaded "Could not find any valid local directory" error during the final
reduce phase.

I've confirmed that some of the boxes are running out of space, but the disk
usage seems to be very uneven across the servers. The datanodes report 50%
of available space used on all servers, which matches what I'm seeing in the
/mnt/hadoop/dfs/data folder (an even ~200 GB per server). But the space used
by files in /mnt/hadoop/mapred/local differs a lot from server to server
(ranging from 70 GB to 190 GB).

Is there any way to predict how much space will be used by the temporary
data stored outside of HDFS? We're only running a total of 20 reducers,
which I suspect is very low since there are a few thousand map tasks.
Could that be the cause, or is there anything else we're doing that's
obviously wrong?
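
For reference, this is roughly how the reducer count would be bumped in our
job setup with the 0.18 JobConf API (a simplified sketch, not our actual
code; the class name, paths and the value 100 are just placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder driver class; the real job sets its own mapper/reducer.
        JobConf conf = new JobConf(ReducerCountSketch.class);
        conf.setJobName("big-job");

        // More reduce tasks means the intermediate map output is split into
        // more (smaller) partitions, spreading the load on mapred.local.dir.
        // 100 is just an illustrative number.
        conf.setNumReduceTasks(100);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}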

Besides this, we're also getting "java.lang.OutOfMemoryError: GC overhead
limit exceeded" errors.
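
I assume the heap for the task JVMs can be raised via mapred.child.java.opts
on the job conf, e.g. something like the snippet below (using the same
JobConf as in the sketch above; 1024m is just an example value, not what
we're actually running):

// The 0.18 default child heap is quite small (-Xmx200m, as far as I know).
// "1024m" here is only an illustrative value.
conf.set("mapred.child.java.opts", "-Xmx1024m");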

Thanks for any help,
/ Per
