Interesting conversation. What is your default filesystem? Are you using ext3?
On Tue, Feb 8, 2011 at 3:22 PM, Patrick Angeles <patr...@cloudera.com> wrote:
> OT:
> Allen, did you turn down a job offer from Google or something? GMail sends
> everything from you straight to the spam folder.
>
> On Tue, Feb 8, 2011 at 12:17 PM, Patrick Angeles <patr...@cloudera.com>
> wrote:
>>
>> On Tue, Feb 8, 2011 at 12:09 PM, Allen Wittenauer
>> <awittena...@linkedin.com> wrote:
>>>
>>> On Feb 8, 2011, at 11:33 AM, Adam Phelps wrote:
>>>
>>> > On 2/7/11 2:06 PM, Jonathan Disher wrote:
>>> >> Currently I have a 48 node cluster using Dell R710's with 12 disks -
>>> >> two 250GB SATA drives in RAID1 for OS, and ten 1TB SATA disks as a
>>> >> JBOD (mounted on /data/0 through /data/9) and listed separately in
>>> >> hdfs-site.xml. It works... mostly. The big issue you will encounter
>>> >> is losing a disk - the DataNode process will crash, and if you
>>> >> comment out the affected drive, when you replace it you will have 9
>>> >> disks full to N% and one empty disk.
>>> >
>>> > If DataNode is going down after a single disk failure then you
>>> > probably haven't set dfs.datanode.failed.volumes.tolerated in
>>> > hdfs-site.xml. You can up that number to allow DataNode to tolerate
>>> > dead drives.
>>>
>>> a) only if you have a version that supports it
>>>
>>> b) that only protects you on the DN side. The TT is, AFAIK, still
>>> susceptible to drive failures.
>>
>> c) And it only works when the drive fails on read (HDFS-457), not on
>> write (HDFS-1273).
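
For anyone following along, here's a rough hdfs-site.xml sketch of the setup
being discussed. The data directories match Jonathan's mount points; the
tolerated-failure count of 2 is only an illustrative value, and (per Allen's
point a) the second property is only honored on releases that picked up
HDFS-457.

  <!-- one dfs.data.dir entry per JBOD mount; the DataNode round-robins
       block writes across these volumes -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/0,/data/1,/data/2,/data/3,/data/4,/data/5,/data/6,/data/7,/data/8,/data/9</value>
  </property>

  <!-- keep the DataNode alive until more than this many volumes have
       failed (example value; default is 0, i.e. die on first failure) -->
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>2</value>
  </property>

Even with that in place, the caveats upthread still apply on the TT side and
on the write path.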