Hi all -
I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
It seems to be related to writing too many files to HDFS. I have a
class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat,
and if I output to a few file names, everything works. However, if I
output to thousands of small files, the above error occurs. I'm
having trouble isolating the problem, as it unfortunately doesn't
reproduce in the debugger.
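To give a concrete picture of the setup (this is an illustrative sketch,
not my actual code -- the class and key type are hypothetical), the
subclass derives the output file name from the record key, so every
distinct key opens another small HDFS file from the same task:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical subclass: one output file per key. With a handful of
// distinct keys this works; with thousands of keys (and therefore
// thousands of concurrently open output files) the
// "Could not get block locations" IOException appears.
public class PerKeyOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value,
                                                 String name) {
        // Route each record into a file named after its key.
        return key.toString() + "/" + name;
    }
}
```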
Is this a memory issue, or is there an upper limit on the number of
files HDFS can hold? Are there any settings I should adjust?
Thanks.