> The big issue you will encounter is losing a disk - the DataNode process
> will crash, and if you comment out the affected drive, when you replace it
> you will have 9 disks full to N% and one empty disk. The DFS balancer
> cannot fix this - usually when I have data nodes down more than an hour,
> I format all drives in the box and rebalance.
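(In case it helps anyone reading along: "comment out the affected drive" is, as far as I know, just a matter of editing the DataNode's data-directory list in hdfs-site.xml. Property names below are from memory and differ a bit between Hadoop versions, so treat this as a sketch rather than Jonathan's actual config:

    <property>
      <!-- dfs.data.dir on older releases; the dead mount is simply removed from the list -->
      <name>dfs.datanode.data.dir</name>
      <value>/data/1/dfs,/data/2/dfs,/data/3/dfs</value>
    </property>
    <property>
      <!-- default is 0, i.e. the DataNode shuts down on any single volume failure -->
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>

If your release has the failed-volumes knob, raising it keeps the DataNode running on one dead disk instead of crashing, though it does nothing about the per-disk imbalance once you swap the drive.)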
Yeah, this bites us when we add a disk - love getting monitors going off for
"disk 90% full" when you've got the new disk at <10%. We've tried a few
tricks, like moving the reserved space up to force it to 'balance', but it's
pretty ineffective by and large (rough sketch of that knob in the P.S. below).

>> but if the loss of a single drive necessitated rebuilding an entire node,
>> and therefore being down in capacity during that period,
>> just doesn't seem to be the most efficient approach

This bit about rebuilding the entire node isn't true - that's just Jonathan's
choice to wipe the node, and an interesting one it is (we might consider that
for our small cluster). Lose a disk and you lose just the capacity of that
disk from the entire pool of space in the cluster. 1 out of 3 copies of
*some* of the HDFS blocks goes away, not the entire node's blocks, and
usually this wouldn't be very much of a loss (typical 4-disk boxes x XYZ
boxes = quite a few disks). The 1 missing replica will likely be re-copied
(I often say re-built, but that's RAID) before you even put the new disk in;
but say somehow you were 100% full - you'd add the new disk, and the blocks
that were sitting at 2 replicas would copy themselves a 3rd time. (The lack
of per-disk balancing within a node is an issue again here.)

> We are building a new cluster aimed primarily at storage - we will be using
> SuperMicro 4U machines with 36 2TB SATA disks in three RAID6 volumes
> (for roughly 20TB usable per volume, 60 total),

I really like the SuperMicro cases for big disk boxes. What are you using to
run the 36 disks all at once?

Scott Golby
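P.S. The reserved-space trick I mentioned is, if memory serves on the exact
name, the dfs.datanode.du.reserved setting in hdfs-site.xml (bytes per volume
that HDFS leaves alone) - the value below is a sketch, not our production
number:

    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes reserved per volume for non-DFS use; this is roughly 50GB -->
      <value>53687091200</value>
    </property>

And for keeping an eye on the re-replication after losing a disk, the stock
commands are enough (spellings vary a little between Hadoop versions):

    hadoop fsck / | grep -i 'under-replicated'   # blocks currently below target replication
    hadoop balancer -threshold 5                 # evens data out node-to-node, not disk-to-disk within a node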