Correct.
+1 to Jason's suggestion to raise the unix file handle limit as well. That's a must-have.
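For reference, something along these lines in hadoop-site.xml on the datanodes (the exact value is up to you; 8000 is just what works for us):

  <!-- illustrative snippet; adjust the value for your cluster -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8000</value>
  </property>

The datanodes need a restart to pick up the change, and the unix file handle limit (ulimit -n) for the user running them should go up along with it.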
-Bryan
On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
So this would be an addition to the hadoop-site.xml file, to raise
dfs.datanode.max.xcievers?
Thanks.
On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
Small files are bad for Hadoop. You should avoid keeping a lot of
small files if possible.
That said, that error is something I've seen a lot. It usually
happens when the number of xcievers hasn't been adjusted upwards
from the default of 256. We run with 8000 xcievers, and that seems
to solve our problems. I think that if you have a lot of open
files, this problem happens a lot faster.
-Bryan
On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
Hi all -
I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
It seems to be related to trying to write too many files to HDFS.
I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat,
and if I output to a few file names, everything works. However, if I
output to thousands of small files, the above error occurs. I'm having
trouble isolating the problem, as it unfortunately doesn't occur in
the debugger.
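Roughly, what I'm doing follows this pattern (a simplified, illustrative sketch with made-up names, not my actual class; the real filename logic is more involved):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

  // Illustrative sketch only: routes each record to a file named after its key,
  // so a job with thousands of distinct keys creates thousands of small HDFS files.
  public class PerKeyOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
      // "name" is the default part file name (e.g. part-00000); returning the
      // key instead creates one output file per distinct key.
      return key.toString();
    }
  }
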
Is this a memory issue, or is there an upper limit to the number
of files HDFS can hold? Any settings to adjust?
Thanks.