I've never seen an implementation of concatenated volumes that tolerates a disk 
failure; like RAID0, losing one disk takes out the whole volume.

Currently I have a 48-node cluster of Dell R710s with 12 disks each - two 250GB 
SATA drives in RAID1 for the OS, and ten 1TB SATA disks as a JBOD (mounted on 
/data/0 through /data/9) and listed separately in hdfs-site.xml.  It works... 
mostly.  The big issue you will encounter is losing a disk - the DataNode 
process will crash, and if you comment out the affected drive, then when you 
replace it you will have nine disks that are N% full and one empty disk.  The 
DFS balancer cannot fix this - in practice, when I have a datanode down for 
more than an hour, I format all the drives in the box and rebalance.
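For reference, a minimal sketch of the relevant entry (this goes inside the 
<configuration> element of hdfs-site.xml, and assumes a 0.20-era release, 
where the property is named dfs.data.dir):

  <property>
    <name>dfs.data.dir</name>
    <!-- one local directory per physical disk, comma-delimited -->
    <value>/data/0,/data/1,/data/2,/data/3,/data/4,/data/5,/data/6,/data/7,/data/8,/data/9</value>
  </property>

The DataNode spreads new blocks round-robin across these directories, which is 
why a replaced disk comes back empty while the others stay at N% full.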

We are building a new cluster aimed primarily at storage - we will be using 
SuperMicro 4U machines with 36 2TB SATA disks in three RAID6 volumes (roughly 
20TB usable per volume, 60TB total), plus two SSDs for the OS.  At this scale, 
JBOD is going to kill you - rebalancing 40-50TB takes too long, even when I 
bond eight GigE interfaces together.

-j

On Feb 7, 2011, at 12:25 PM, John Buchanan wrote:

> Hello,
> 
> My company will be building a small but quickly growing Hadoop deployment, 
> and I had a question regarding best practice for configuring the storage for 
> the datanodes.  Cloudera has a page where they recommend a JBOD configuration 
> over RAID.  My question, though, is whether they are referring to the 
> simplest definition of JBOD, that being literally just a collection of 
> heterogeneous drives, each with its own distinct partition and mount point?  
> Or are they referring to a concatenated span of heterogeneous drives 
> presented to the OS as a single device?
> 
> Through some digging I've discovered that data volumes may be specified in a 
> comma-delimited fashion in the hdfs-site.xml file and are then accessed 
> individually, but are of course all available within the pool.  To test this 
> I brought an Ubuntu Server 10.04 VM online (on Xen Cloud Platform) with 3 
> storage volumes.  The first holds the OS; I created a single partition on 
> each of the second and third, mounting them as /hadoop-datastore/a and 
> /hadoop-datastore/b respectively, and specified them in hdfs-site.xml in 
> comma-delimited fashion.  I then constructed a single-node pseudo-distributed 
> install, executed the bin/start-all.sh script, and all seems just great.  The 
> volumes are 5GB each, and the HDFS status page shows a configured capacity of 
> 9.84GB, so both are in use, and I successfully added a file using 
> bin/hadoop dfs -put.
> 
> This led me to think that perhaps an optimal datanode configuration would be 
> two drives in RAID1 for the OS, then 2-4 additional drives for data, 
> individually partitioned, mounted, and configured in hdfs-site.xml.  
> Mirrored system drives would make my nodes more robust, while the data 
> drives would still be independent.  I do realize that HDFS assures data 
> redundancy at a higher level by design, but if the loss of a single drive 
> necessitated rebuilding an entire node, and therefore running at reduced 
> capacity during that period, that just doesn't seem to be the most efficient 
> approach.
> 
> Would love to hear what others are doing in this regard - whether anyone is 
> using concatenated disks, and whether the loss of a drive requires them to 
> rebuild the entire system.
> 
> John Buchanan
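
(For anyone following along: the comma-delimited entry John describes for his 
test VM would look something like this - again assuming a 0.20-era release 
with dfs.data.dir:

  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop-datastore/a,/hadoop-datastore/b</value>
  </property>

which matches the 9.84GB configured capacity he saw from the two 5GB volumes, 
less filesystem overhead.)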
