On Feb 9, 2009, at 7:50 PM, jason hadoop wrote:

The other issue you may run into with many files in your HDFS is that you
may end up with more than a few hundred thousand blocks on each of your
datanodes. At present this can lead to instability because of the way the
periodic block reports to the namenode are handled: the more blocks per
datanode, the larger the risk of congestion collapse in your HDFS.

Of course, if you stay below, say, 500k blocks per node, you don't run much risk of congestion.

In our experience, 500k blocks or less is going to be fine with decent hardware. Between 500k and 750k, you will hit a wall somewhere depending on your hardware. Good luck getting anything above 750k.
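
To put made-up numbers on it: 10 million single-block files at the default replication of 3, spread across 50 datanodes, already works out to roughly 600k block replicas per node, so small files get you into that range surprisingly quickly.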

The recommendation is that you keep this number as low as possible -- and explore the limits of your system and hardware in testing before you discover them in production :)

Brian



On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

Correct.

+1 to Jason's suggestion about more Unix file handles. That's a must-have.
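
In case it's useful to anyone: that means raising the nofile limit for whichever user runs the Hadoop daemons. On most Linux boxes something along these lines in /etc/security/limits.conf (the "hadoop" user name and the values here are just an example) plus a fresh login does it:

    # /etc/security/limits.conf -- example entries, tune for your cluster
    hadoop  soft  nofile  16384
    hadoop  hard  nofile  16384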

-Bryan


On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

This would be an addition to the hadoop-site.xml file, to up
dfs.datanode.max.xcievers?

Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for Hadoop. You should avoid keeping a lot of small
files if possible.

That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot
faster.
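
For reference, the entry goes in hadoop-site.xml on the datanodes (8000 is just the value we happen to use -- tune it for your cluster) and takes effect after a datanode restart:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8000</value>
    </property>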

-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:

Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as it doesn't occur in the debugger,
unfortunately.
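
For context, the override is along these lines -- a stripped-down illustration with made-up names, not my actual code (the sketch extends the MultipleTextOutputFormat convenience subclass just to keep it short):

    // Simplified sketch: one output file per distinct key is what drives the file count up.
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class PerKeyOutputFormat extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // The default implementation just returns "name"; returning the key instead
        // creates a separate (usually small) HDFS file for every distinct key,
        // each holding its own open output stream while the reducer runs.
        return key.toString();
      }
    }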

Is this a memory issue, or is there an upper limit to the number of
files HDFS can hold?  Any settings to adjust?

Thanks.






