On Tue, Jun 2, 2009 at 12:36 AM, Adam Goryachev <
mailingli...@websitemanagers.com.au> wrote:
>
> dan wrote:
> > Unfortunately there is a 'rebuild hole' in many redundant
> > configurations. In RAID1 that is when one drive fails and just one
> > remains. This can be eliminated by running 3 drives so that 1 drive can
> > fail and 2 would still be operational.
> >
> > There are plenty of charts online to give % of redundancy for regular
> > RAID arrays.
>
> I must admit, this is something I have never given a lot of thought
> to... Then again, I've not yet worked in an environment with large
> numbers of disks. Of course, that is no excuse, and I'm always
> interested in filling in knowledge gaps...
>
> Is it really worthwhile considering a 3 drive RAID1 system, or even a 4
> drive RAID1 system (one hot spare)? Of course, worthwhile depends on the
> cost of not having access to the data, but I'm asking from a "best
> practice" point of view. i.e., looking at any of the large "online backup"
> companies, or the gmail backend, etc., what level of redundancy is
> considered acceptable?
> (Somewhat surprising actually that google/hotmail/yahoo/etc have ever
> lost any data...)
>
Redundancy is the key for these companies. They use databases that can be
spread out among servers and replicated many times across their network.
Google, for instance, could have 20 copies of a piece of data on different
servers, so a catastrophic loss at one facility has little or no effect on
the whole.
I might also add that these companies have a lot of losable data. Website
caches are simply rebuilt in the event the data is lost.
>
> > With a modern filesystem capable of multiple copies of each file this
> > can be overcome. ZFS can handle multiple drive failures by selecting the
> > number of redundant copies of each file to store on different physical
> > volumes. Simply put, a ZFS RAIDZ with 4 drives can be set to have 3
> > copies which would allow 2 drives to fail. This is somewhat better than
> > RAID1 and RAID5 both because more storage is available yet still allows
> > up to 2 drives to fail before leaving a rebuild hole where the storage
> > is vulnerable to a single drive failure during a rebuild or resilver.
>
> So, using 4 x 100G drives provides 133G usable storage... we can lose
> any two drives without any data loss. However, from my calculations
> (which might be wrong), RAID6 would be more efficient. On a 4 drive 100G
> system you get 200G available storage, and can lose any two drives
> without data loss.
>
Well, really the key to filesystems with built-in volume management is that
a large array can be broken down into smaller chunks with different levels
of redundancy across different data stores. With 4x100 you would likely do a
raidz2, which keeps two pieces of parity for each stripe, which is something
like raid6.
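Just to put rough numbers on the 4x100G case, here is a quick
back-of-the-envelope sketch in Python (idealized, ignoring metadata and slop
space):

# Usable capacity for 4 x 100 GB drives under the layouts discussed here.
# Idealized: ignores metadata, slop space and stripe-width rounding.
DRIVES, SIZE_GB = 4, 100
raw = DRIVES * SIZE_GB

layouts = {
    "3 copies of every file (survives 2 failures)": raw / 3.0,
    "raidz2 / raid6, double parity (survives 2)": (DRIVES - 2) * SIZE_GB,
    "raid10, 2-way mirrors (survives 1 per pair)": raw / 2.0,
}
for name, usable in layouts.items():
    print("%-46s %6.0f GB usable of %d GB raw" % (name, usable, raw))

So the 133G vs 200G numbers above are right: raidz2/raid6 gives you more
usable space for the same two-drive failure tolerance, which is why the
tradeoff comes down to the write hole and rebuild behaviour rather than
capacity.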
The real issue with raid6 is abysmal performance on software raid because of
the double parity computation, plus limited support in hardware cards, where
similarly the load on the card's CPU slows things down.
The argument is always data safety vs access speed. Keep in mind that the
raid5 write hole also applies to raid6.
>
> > Standard RAID is not going to have this capability and is going to
> > require more drives to improve, though each added drive also decreases
> > reliability, as more drives are more likely to fail.
>
> Well, doesn't RAID6 do exactly that (add an additional drive to improve
> data security)? How is ZFS better than RAID6? Not that I am suggesting
> ZFS is bad, I'm just trying to understand the differences...
>
raid6 has a write hole, just like raid5: data and parity in a stripe are not
updated atomically, and a crash in between can catch you surprisingly often.
ZFS does not have this. Not to be a ZFS fanboy, but btrfs will also have
such capabilities.
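To make the write hole concrete: with parity raid, the data blocks and the
parity block of a stripe are written in separate steps, so a crash between
the two leaves parity that no longer matches the data, and a later rebuild
quietly reconstructs garbage. A toy illustration with XOR parity (plain
Python, not any real raid implementation):

from functools import reduce
from operator import xor

stripe = [0b1010, 0b0110, 0b1100]   # data blocks on three disks
parity = reduce(xor, stripe)        # parity block on a fourth disk

stripe[1] = 0b0001                  # step 1: a data block is rewritten in place
# ...power fails here, before step 2 (the matching parity update) reaches disk.

# Later, disk 0 dies and is reconstructed from parity + the surviving data:
rebuilt = parity ^ stripe[1] ^ stripe[2]
print(rebuilt == 0b1010)            # False -- silently wrong data handed back

ZFS sidesteps this because raidz never overwrites a live stripe in place:
writes are copy-on-write and committed atomically, so data and parity cannot
get out of step on disk.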
>
> > ZFS also is able to put metadata on a different volume and even have a
> > cache on a different volume which can spread out the chance of a loss.
> > very complicated schemes can be developed to minimize data loss.
>
> In my experience, if it is too complicated:
> 1) Very few people use it because they don't understand it
> 2) Some people who use it, use it incorrectly, and then don't
> understand why they lose data (see the discussion of people who use RAID
> controller cards but don't know enough to read the logfile on the RAID
> card when recovering from failed drives).
>
> Also, I'm not sure what the advantage of metadata on a different volume
> is? If you lose all your metadata how easily will you recover your
> files? Perhaps you should be just as concerned about protecting your
> metadata as you do for your data, thus why separate it?
>
> What is the advantage of using another volume as a cache ? Sure, you
> might be lucky enough that the data you need is still in cache when you
> lose the whole array, but that doesn't exactly sound like a scenario to
> plan for? (For performance, the cache might be a faster/more expensive
> drive, (read SSD or similar) but we are discussing reliability here)
>
As far as relocating metadata goes, you can put the metadata on an additional
redundant array for performance. Additionally, the parity checks on the data
can be spread across different controllers, which not only improves
performance but allows the parity to be calculated in parallel, so it gets
completed and written sooner, which means a smaller window for data loss.
>
> > This is precisely the need for next-gen filesystems like ZFS and soon
> > BTRFS: to fill these gaps in storage needs. Imagine the 10TB drives of
> > tomorrow that are only capable of being read at 100MB/s. That's a 30
> > hour rebuild under ideal conditions. Even when SATA3 or SATA6 are
> > standardized (or SAS) you can cut that to 7.5 or 15 hours, but that is
> > still a very large window for a rebuild.
>
> Last time I heard of someone using ZFS for their backuppc pool under
> linux, they didn't seem to consider it ready for production use due to
> the significant failures. Is this still true, or did I mis-read something?
>
> Personally, I used reiserfs for years, and once or twice had some
> problems with it (actually due to RAID hardware problems). I have
> somewhat moved to ext3 now due to the 'stigma' that seems to be attached
> to reiserfs. I don't want to move to another FS before it is very stable...
>
ZFS on Linux = bad. ZFS is a Solaris thing and will be for some time.
Someday *BSD will have a stable ZFS, but I doubt Linux ever will. btrfs will
likely end up in wide use and will serve many of the same purposes as ZFS.
The reason to use a next-gen filesystem is exactly as you stated above: data
loss caused by flaky raid hardware. With traditional RAID, the filesystem
has no awareness of the disk layout or of disk errors, so you can get silent
corruption that the filesystem isn't aware of and can't take steps to
correct. Next-gen filesystems like btrfs and ZFS checksum the data and know
where the redundant copies live, so they can detect and repair these issues.
reiserfs is a good filesystem but is susceptible to disk errors; this is
typical of older filesystems.
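A rough sketch of what being aware of disk errors buys you (Python as a
stand-in, assuming nothing about the real on-disk formats): a checksumming
filesystem keeps a checksum for every block, so a bit flipped by a flaky
controller or cable is caught on read and can be repaired from a redundant
copy instead of being handed straight back to the application:

import hashlib

def write_block(data):
    # store the block together with a checksum of its contents
    return {"data": bytearray(data), "csum": hashlib.sha256(data).digest()}

block = write_block(b"a chunk of the backuppc pool")

block["data"][3] ^= 0x01    # one bit flipped somewhere between RAM and platter

# An old-style filesystem returns the corrupted bytes without complaint.
# A checksumming one notices the mismatch and reads the mirror/parity copy:
ok = hashlib.sha256(bytes(block["data"])).digest() == block["csum"]
print("checksum ok?", ok)   # False -> corruption detected, repair possible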
>
> > On-line rebuilds and
> > filesystems aware of the disk systems are becoming more and more
> > relevant.
>
> I actually thought it would be better to disable these since it:
> 1) increases wear 'n' tear on the drives
> 2) what happens if you have a drive failure in the middle of the rebuild?
1) This is an arguable point. Many would say that disk usage makes little to
no difference to disk life; heat is what affects disk life the most. Also,
for important workloads disks should have a scheduled lifetime and be
rotated out.
2) Drive failure during a rebuild is certainly a worst case, but a likely
scenario. It is even more likely because all of the disks in an array are
usually the same age and may all be nearing their MTBF, which increases the
probability of failure. This is what raid6, raid10, raid5+, or whatever
multiply-redundant raid level, is for.
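To put rough numbers on point 2, using the 10TB-at-100MB/s example from
earlier in the thread (the 3% annual failure rate below is only an
illustrative assumption, not a vendor figure):

# Rebuild window for a 10 TB drive read at 100 MB/s, plus a crude estimate
# of a second failure landing inside that window.
capacity_mb = 10 * 1000 * 1000        # 10 TB expressed in MB
read_mb_s   = 100.0
surviving   = 3                       # drives that must all survive the rebuild
afr         = 0.03                    # assumed annual failure rate per drive

rebuild_hours = capacity_mb / read_mb_s / 3600.0        # ~27.8 hours
p_one = afr * rebuild_hours / (365 * 24)                # constant-rate approximation
p_any = 1 - (1 - p_one) ** surviving

print("rebuild window: %.1f hours" % rebuild_hours)
print("chance of a second failure during it: %.3f%%" % (p_any * 100))

The naive number looks small, which is exactly why it bites people:
same-age, same-batch drives near end of life fail at many times the assumed
rate, and an unrecoverable read error on any surviving drive during those
~28 hours has the same effect as a second failure.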
>
>
> Mainly the 2nd one scared me the most.
>
> Sorry for such a long post, but hopefully a few other people will learn
> a thing or two about storage, which is fundamentally important to
> backuppc...
>
> Regards,
> Adam
>
I have done a ton of testing on filesystems in MySQL and PostgreSQL
environments as well as on backuppc systems. There is a catch-22 for
backuppc: the absolute best filesystem I have found for it is ZFS used
directly on the physical volumes, running raidz2 on SATA/SAS with metadata
and cache on SSD. The catch-22 is that this must be run on *Solaris; *BSD
does not have a ZFS ready for such uses.
That being said, I am running backuppc on Debian systems with raid10 and
ext3. I run a limited number of hosts on each machine and use multiple
servers to handle my needs. My raid10 is a 4-drive setup in Dell 2U hardware
(I have 2 spare drive bays, ready for ZFS and some SSDs). I am not a
Slowlaris fan and am dying to see stable ZFS on BSD.