Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

It seems to be related to writing too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, since unfortunately it doesn't reproduce under the debugger.
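
For reference, here's a minimal sketch of the kind of subclass I mean (the class name is made up; generateFileNameForKeyValue is the hook in the old mapred API):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Sketch only: route each record to an output file derived from its key.
public class KeyedOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // One output file per distinct key -- with thousands of distinct
        // keys this opens thousands of DFS output streams at once, which
        // is exactly the scenario where I see the error above.
        return key.toString();
    }
}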

Is this a memory issue, or is there an upper limit on the number of files HDFS can hold? Are there any settings I should adjust?

Thanks.