Re: Datanode handling of single disk failure

Konstantin Shvachko Fri, 19 Dec 2008 10:38:07 -0800


Brian Bockelman wrote:

Hello all,
I'd like to take the datanode's capability to handle multipledirectories to a somewhat-extreme, and get feedback on how well thismight work.
We have a few large RAID servers (12 to 48 disks) which we'd like totransition to Hadoop. I'd like to mount each of the disks individually(i.e., /mnt/disk1, /mnt/disk2, ....) and take advantage of Hadoop'sreplication - instead of pay the overhead to set up a RAID and stillhave to pay the overhead of replication.


In my experience this is the right way to go.

However, we're a bit concerned about how well Hadoop might handle one ofthe directories disappearing from underneath it. If a single volume,say, /mnt/disk1 starts returning I/O errors, is Hadoop smart enough tofigure out that this whole volume is broken? Or will we have to restartthe datanode after any disk failure for it to search the directoryrealize everything is broken? What happens if you start up the datanodewith a data directory that it can't write into?


In current implementation if at any point Datanode detects an unwritable or
unreadable drive it shuts itself down logging a message what went wrong and
reporting the problem to the name-node.
So yes if such thing happens you will have to restart the data-node.
But since the cluster takes care of data-node failures by re-replicating
lost blocks that should not be a problem.

Is anyone running in this fashion (i.e., multiple data directoriescorresponding to different disk volumes ... even better if you're doingit with more than a few disks)?


We have a large experience running 4 drives per data-node (no RAID).
So this is not something new or untested.

Thanks,
--Konstantin

Re: Datanode handling of single disk failure

Reply via email to