Thank you Konstantin, this information will be useful.
Brian
On Dec 19, 2008, at 12:37 PM, Konstantin Shvachko wrote:
Brian Bockelman wrote:
Hello all,
I'd like to take the datanode's capability to handle multiple directories to a somewhat extreme degree, and get feedback on how well this might work.
We have a few large RAID servers (12 to 48 disks) which we'd like to transition to Hadoop. I'd like to mount each of the disks individually (i.e., /mnt/disk1, /mnt/disk2, ...) and take advantage of Hadoop's replication - instead of paying the overhead of setting up a RAID and still having to pay the overhead of replication.
In my experience this is the right way to go.
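For reference, pointing the datanode at several mount points is just a matter of listing them in dfs.data.dir. A minimal sketch (the mount paths and subdirectories here are only examples, placed in conf/hadoop-site.xml on the releases current at the time) might look like:

  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
  </property>

As far as I know the datanode spreads new blocks round-robin across the listed directories, so the disks fill at roughly the same rate.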
However, we're a bit concerned about how well Hadoop might handle one of the directories disappearing from underneath it. If a single volume, say /mnt/disk1, starts returning I/O errors, is Hadoop smart enough to figure out that this whole volume is broken? Or will we have to restart the datanode after any disk failure for it to rescan the directory and realize everything is broken? What happens if you start up the datanode with a data directory that it can't write into?
In the current implementation, if at any point the datanode detects an unwritable or unreadable drive, it shuts itself down, logging a message about what went wrong and reporting the problem to the name-node.
So yes, if such a thing happens you will have to restart the data-node. But since the cluster takes care of data-node failures by re-replicating the lost blocks, that should not be a problem.
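To illustrate the recovery path described above (not an official procedure, just a sketch of the commands typically involved): you can check what the name-node thinks is missing or under-replicated while the data-node is down, then bring the data-node back once the bad disk has been removed from dfs.data.dir or replaced.

  # report missing / under-replicated blocks from the name-node's point of view
  bin/hadoop fsck /

  # restart the failed data-node on the affected machine
  bin/hadoop-daemon.sh start datanode

Re-replication of the lost blocks happens automatically in the background; the restarted data-node simply rejoins with whatever good volumes it still has.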
Is anyone running in this fashion (i.e., multiple data directories
corresponding to different disk volumes ... even better if you're
doing it with more than a few disks)?
We have extensive experience running 4 drives per data-node (no RAID), so this is not something new or untested.
Thanks,
--Konstantin