> The big issue you will encounter is losing a disk - the DataNode process 
> will crash, and if you comment out the affected drive, when you replace it 
> you will have 9 disks full to N% and one empty disk.  The DFS balancer 
> cannot fix this - usually when I have data nodes down for more than an 
> hour, I format all drives in the box and rebalance.

Yeah, this bites us when we add a disk - love getting monitors going off for 
"disk 90% full" when the new disk is at <10%.  We've tried a few tricks, like 
raising the per-volume reserved space to force a 'balance', but it's pretty 
ineffective by and large.
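
For anyone who hasn't hit this: as far as I know the DataNode only 
round-robins *new* writes across its volumes and never moves existing blocks 
between disks, so the fresh drive stays cold while the old ones sit at the 
alarm threshold.  A rough back-of-the-envelope sketch (plain Python, made-up 
disk sizes) of how lopsided it ends up:

  TB = 10**12  # treat 1 TB as 10**12 bytes for simplicity

  # Nine old 2TB disks at ~90% full, plus one freshly replaced, empty disk.
  disks = [{"capacity": 2 * TB, "used": int(0.9 * 2 * TB)} for _ in range(9)]
  disks.append({"capacity": 2 * TB, "used": 0})

  total_capacity = sum(d["capacity"] for d in disks)
  total_used = sum(d["used"] for d in disks)
  even_fill = total_used / total_capacity  # fill level if spread evenly

  for i, d in enumerate(disks):
      print("disk%d: %3.0f%% full" % (i, 100.0 * d["used"] / d["capacity"]))
  print("even spread would be %.0f%% per disk" % (100.0 * even_fill))

  # Existing data that would have to migrate onto the new disk to even things
  # out - data the DataNode will never move on its own.
  deficit = even_fill * disks[-1]["capacity"] - disks[-1]["used"]
  print("~%.1f TB would need to move onto the new disk" % (deficit / TB))

That prints nine disks at 90%, one at 0%, an even-spread target of 81%, and 
~1.6 TB that would have to move onto the new spindle.  Raising the reserved 
space only steers new writes away from the full disks; it doesn't touch 
what's already written, which is presumably why the trick feels ineffective.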


>> but if the loss of a single drive necessitated rebuilding an entire node, 
>> and therefore being down in capacity during that period, that just 
>> doesn't seem to be the most efficient approach

This bit about rebuilding the entire node isn't true - that's just Jonathan's 
choice to wipe the node, and an interesting one it is (we might consider that 
for our small cluster).  Lose a disk and you lose just that disk's capacity 
from the cluster's total pool of space.

1 out of 3 copies of *some* of the HDFS blocks goes away, not the entire 
node's blocks, and usually that isn't much of a loss (typical 4-disk boxes x 
XYZ boxes = quite a few disks).  The 1 missing replica will likely be 
re-copied (I often say re-built, but that's RAID thinking) before you even 
put the new disk in.  But say you were somehow 100% full: you'd add the new 
disk, and the blocks that were down to 2 replicas would copy themselves a 3rd 
time.  (The lack of per-disk balancing within a node is an issue again here.)
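
To put rough numbers on "usually this wouldn't be very much of a loss", here 
is a quick sketch - the cluster shape is made up, and it assumes blocks are 
spread evenly across all disks - of what fraction of replicas one dead disk 
takes with it and how much data gets re-copied:

  TB = 10**12

  boxes = 20          # stand-in for the "XYZ boxes" above; pick your own
  disks_per_box = 4   # typical 4-disk boxes, as above
  disk_size = 2 * TB
  avg_fill = 0.60     # assumed average fill level of each disk

  total_disks = boxes * disks_per_box

  # The dead disk held roughly 1/total_disks of all replicas in the cluster.
  pct_lost = 100.0 / total_disks
  print("replicas lost: ~%.1f%% of all replicas in the cluster" % pct_lost)

  # Every block with a replica on that disk drops from 3 copies to 2, so the
  # NameNode schedules roughly one disk's worth of data to be copied again.
  print("data to re-replicate: ~%.1f TB" % (disk_size * avg_fill / TB))

  # Nothing drops to 0 copies unless all 3 replicas of a block shared that
  # one disk, which HDFS placement doesn't do - lost capacity, not lost data.

For 20 boxes that works out to ~1.2% of the cluster's replicas and ~1.2 TB to 
re-copy, and the re-replication is spread across many source and destination 
nodes rather than hammering one box the way a RAID rebuild does.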


> We are building a new cluster aimed primarily at storage - we will be using 
> SuperMicro 4U machines 
> with 36 2TB SATA disks in three RAID6 volumes (for roughly 20TB usable per 
> volume, 60 total), 

I really like the SuperMicro cases for big disk boxes.  What are you using to 
run all 36 disks at once?
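
As an aside, the quoted 20TB-per-volume figure checks out if the 36 drives 
split evenly into three 12-disk RAID6 sets (each set giving up two disks' 
worth of parity) - trivial arithmetic, but here it is, with the even split 
being my assumption:

  disk_tb = 2                  # 2TB SATA disks, per the quoted spec
  disks_per_volume = 36 // 3   # assuming an even split: 12 per RAID6 set
  parity_disks = 2             # RAID6 gives up two disks' worth to parity

  usable_per_volume = (disks_per_volume - parity_disks) * disk_tb
  print(usable_per_volume)      # 20 TB usable per volume
  print(usable_per_volume * 3)  # 60 TB usable per box, matching the quote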

Scott Golby
