On Feb 8, 2011, at 7:20 AM, John Buchanan wrote:

> What we were thinking for our first deployment was 10 HP DL385's, each
> with 8 2TB SATA drives: the first pair in RAID1 for the system drive, and
> the remaining drives each containing a distinct partition and mount
> point, then specified in hdfs-site.xml in comma-delimited fashion. It
> seems to make more sense to use RAID at least for the system drives, so
> the loss of one drive won't take down the entire node. Granted, data
> integrity wouldn't be affected, but how much time do you want to spend
> rebuilding an entire node due to the loss of one drive? We considered
> using a smaller pair for the system drives, but if they're all the same
> then we only need to stock one type of spare drive.
Don't bother RAIDing the system drive. Seriously. You're giving up performance for something that rarely happens. If you have decent configuration management, rebuilding a node is not a big deal and doesn't take that long anyway. Besides, losing one of the JBOD disks will likely bring the node down anyway.

> Another question I have is whether using 1TB drives would be advisable
> over 2TB for the purpose of reducing rebuild time.

You're overthinking the rebuild time. Again, configuration management makes this a non-issue.

> Or perhaps I'm still thinking of this as I would a RAID volume. If we
> needed to rebalance across the cluster, would the time needed be more
> dependent on the amount of data involved and the connectivity between
> nodes?

Yes. When a node goes down, its data and tasks are automatically moved elsewhere, so a node can stay down for as long as it needs to be down. The grid will still be functional. So don't panic if a compute node goes down. :)
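For reference, the comma-delimited data-directory setting mentioned above looks something like this. This is a minimal sketch, assuming the Hadoop 0.20/1.x property name (`dfs.data.dir`; later 2.x releases renamed it `dfs.datanode.data.dir`) and illustrative mount points:

```xml
<!-- hdfs-site.xml: one entry per JBOD mount point, comma-separated.
     Mount-point paths here are hypothetical examples. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn,/data/5/dfs/dn,/data/6/dfs/dn</value>
</property>
```

The DataNode round-robins block writes across the listed directories, which is why each physical disk gets its own mount point rather than being striped together.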