dan wrote:
> Unfortunately there is a 'rebuild hole' in many redundant
> configurations.  In RAID1 that is when one drive fails and just one
> remains.  This can be eliminated by running 3 drives so that 1 drive can
> fail and 2 would still be operational.
> There are plenty of charts online to give % of redundancy for regular
> RAID arrays.

I must admit, this is something I have never given a lot of thought
to... Then again, I've not yet worked in an environment with large
numbers of disks. Of course, that is no excuse, and I'm always
interested in filling in knowledge gaps...

Is it really worthwhile considering a 3 drive RAID1 system, or even a 4
drive RAID1 system (one hot spare)? Of course, "worthwhile" depends on the
cost of not having access to the data, but I am asking from a "best
practice" point of view. ie, looking at any of the large "online backup"
companies, or the gmail backend, etc... what level of redundancy is
considered acceptable?
(Somewhat surprising actually that google/hotmail/yahoo/etc have ever
lost any data...)

> With a modern filesystem capable of multiple copies of each file this
> can be overcome. ZFS can handle multiple drive failures by selecting the
> number of redundant copies of each file to store on different physical
> volumes.  Simply put, a ZFS RAIDZ with 4 drives can be set to have 3
> copies which would allow 2 drives to fail.  This is somewhat better than
> RAID1 and RAID5  both because more storage is available yet still allows
> up to 2 drives to fail before leaving a rebuild hole where the storage
> is vulnerable to a single drive failure during a rebuild or resilver.

So, using 4 x 100G drives provides 133G usable storage (400G raw divided
by 3 copies), and we can lose any two drives without any data loss.
However, from my calculations (which might be wrong), RAID6 would be more
efficient. With the same 4 x 100G drives you get 200G available storage,
and can still lose any two drives without data loss.
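
As a quick sanity check on those numbers, here is a rough Python sketch of
the arithmetic. The function name and the "copies3" label are just mine for
illustration, and it treats "3 copies" as raw space divided by three, which
is a simplification and not necessarily how ZFS actually lays data out:

```python
def usable_capacity_gb(drives, size_gb, scheme):
    """Return usable capacity in GB for a few simple redundancy layouts."""
    raw = drives * size_gb
    if scheme == "raid1":
        # Every drive holds a full copy of the same data.
        return size_gb
    if scheme == "raid5":
        # One drive's worth of space goes to parity.
        return (drives - 1) * size_gb
    if scheme == "raid6":
        # Two drives' worth of space go to parity.
        return (drives - 2) * size_gb
    if scheme == "copies3":
        # Simplified model: every block is stored three times.
        return raw / 3
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("raid1", "raid5", "raid6", "copies3"):
        print(scheme, round(usable_capacity_gb(4, 100, scheme)), "GB")
```

For 4 x 100G this gives 200G for RAID6 and roughly 133G for three copies,
which matches what I worked out by hand.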

> Standard RAID is not going to have this capability and is going to
> require more drives to improve, though each drive also decreases
> reliability as more drives are likely to fail.

Well, doesn't RAID6 do exactly that (add an additional drive to improve
data security)? How is ZFS better than RAID6? Not that I am suggesting
ZFS is bad, I'm just trying to understand the differences...

> ZFS also is able to put metadata on a different volume and even have a
> cache on a different volume which can spread out the chance of a loss. 
> Very complicated schemes can be developed to minimize data loss.

In my experience, if it is too complicated:
1) Very few people use it because they don't understand it
2) Some people who do use it use it incorrectly, and then don't
understand why they lose data (see the discussion of people who use RAID
controller cards but don't know enough to read the logfile on the RAID
card when recovering from failed drives).

Also, I'm not sure what the advantage of putting metadata on a different
volume is. If you lose all your metadata, how easily will you recover your
files? Surely you should be just as concerned about protecting your
metadata as you are about protecting your data, so why separate it?

What is the advantage of using another volume as a cache? Sure, you
might be lucky enough that the data you need is still in cache when you
lose the whole array, but that doesn't exactly sound like a scenario to
plan for. (For performance, the cache might be a faster/more expensive
drive, such as an SSD or similar, but we are discussing reliability here.)

> This is precisely the need for next-gen filesystems like ZFS and soon
> BTRFS.  To fill these gaps in storage needs.  Imagine the 10TB drives of
> tomorrow that are only capable of being read at 100MB/s.  That's a 30
> hour rebuild under ideal conditions.  Even when SATA3 or SATA6 are
> standardized (or SAS) you can cut that to 7.5 or 15 hours but that is
> still a very large window for a rebuild.
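
For what it's worth, the rebuild-time arithmetic is easy to sketch in
Python. The 200 and 400 MB/s figures are only my assumption for what
"twice" and "four times" as fast would mean, and this ignores any other
load on the array during the rebuild:

```python
def rebuild_hours(capacity_tb, throughput_mb_s):
    """Hours needed to read a whole drive once at a sustained rate."""
    # Decimal units, as drive vendors use: 1 TB = 1,000,000 MB.
    capacity_mb = capacity_tb * 1_000_000
    seconds = capacity_mb / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # 100 MB/s is from the quoted post; 200 and 400 are assumed rates.
    for rate in (100, 200, 400):
        print(f"10 TB at {rate} MB/s: {rebuild_hours(10, rate):.1f} hours")
```

At 100MB/s a 10TB drive comes out at just under 28 hours, so "30 hours
under ideal conditions" is about right.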

Last time I heard of someone using ZFS for their backuppc pool under
Linux, they didn't seem to consider it ready for production use due to
significant failures. Is this still true, or did I misread something?

Personally, I used reiserfs for years, and once or twice had some
problems with it (actually due to RAID hardware problems). I have
somewhat moved to ext3 now due to the 'stigma' that seems to be attached
to reiserfs. I don't want to move to another FS before it is very stable...

> On-line rebuilds and
> filesystems aware of the disk systems are becoming more and more relevant.

I actually thought it would be better to disable on-line rebuilds, since:
1) they increase wear 'n' tear on the drives
2) what happens if you have a drive failure in the middle of the rebuild?

The 2nd one is what scares me the most.
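
To put a rough number on that worry, here is a small Python sketch. It
assumes drives fail independently at a constant rate, and the 3% annual
failure rate is purely a made-up figure for illustration, not a measured
one. Drives from the same batch may well fail together, so treat the
result as optimistic:

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_failure_during_rebuild(remaining_drives, rebuild_hours, afr):
    """Chance that at least one remaining drive fails before the rebuild ends.

    afr is an assumed annualized failure rate per drive (0.03 means 3%).
    Drives are assumed to fail independently at a constant rate.
    """
    # Convert the yearly failure probability into a constant hourly rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Probability that no drive fails in the window, then the complement.
    return 1.0 - math.exp(-remaining_drives * rate_per_hour * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 3 surviving drives, 30 hour rebuild, 3% AFR.
    print(f"{p_failure_during_rebuild(3, 30, 0.03):.4%}")
```

The point is only that the risk grows with both the number of surviving
drives and the length of the rebuild window.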

Sorry for such a long post, but hopefully a few other people will learn
a thing or two about storage, which is fundamentally important to


