Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be

From my testing it is exactly the opposite. You have to distinguish between marketing and reality.

a little room for improvement.

I use RAID pretty much everywhere.  I don't like to lose data and disks
are cheap.  I have a fair amount of experience with all flavors ... and ZFS

Just like me. And because I want performance and, as you described, disks are cheap, I use RAID-1 (gmirror).

has become a go-to filesystem for most of my applications.

My applications don't tolerate low performance, overcomplexity, or a high risk of data loss.

That's why I use properly tuned UFS with gmirror, and I prefer multiple filesystems over gstripe.
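To make that concrete, here is a minimal sketch of such a setup. The device names (ada0/ada1), the mirror name gm0, the mount points, and the partition sizes are only placeholders for illustration:

    # load the mirror class and label a two-disk mirror named gm0
    gmirror load
    gmirror label -v gm0 /dev/ada0 /dev/ada1
    echo 'geom_mirror_load="YES"' >> /boot/loader.conf

    # several independent UFS filesystems on the mirror instead of gstripe
    gpart create -s gpt /dev/mirror/gm0
    gpart add -t freebsd-ufs -s 200G /dev/mirror/gm0
    gpart add -t freebsd-ufs /dev/mirror/gm0
    newfs -U /dev/mirror/gm0p1     # -U enables soft updates
    newfs -U /dev/mirror/gm0p2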

One of the best recommendations I can give for ZFS is its
crash-recoverability.

Which is marketing, not truth. If you want bullet-proof recoverability, UFS beats everything I've ever seen.

If you want FAST crash recovery, use soft updates + journaling (SU+J), available in FreeBSD 9.
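For reference, a minimal sketch of turning that on; the device names continue the gmirror example above and are only placeholders (tunefs needs the filesystem unmounted or mounted read-only):

    # enable soft updates journaling on an existing UFS filesystem
    umount /data
    tunefs -j enable /dev/mirror/gm0p1
    mount /data

    # or create a new filesystem with SU+J from the start
    newfs -U -j /dev/mirror/gm0p2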

As a counterexample, with most hardware RAID or a software whole-disk RAID,
after a crash the array will generally declare one disk as good and the
other disk as "to be repaired" ... after which a full surface scan of the
affected disks (reading one and writing the other) ensues.

True, gmirror does this, but you can defer the mirror rebuild, which is what I do.
I have a script that sends me a mail when gmirror is degraded, and after finding the cause of the problem (and possibly replacing the disk) I run the rebuild after work hours, so no slowdown is experienced.
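A minimal sketch of such a cron job; the mirror name gm0, the mail address, and the disk name are placeholders, and this is an illustration rather than my exact script:

    #!/bin/sh
    # mail a warning when the gm0 mirror is degraded (run from cron)
    if gmirror status gm0 | grep -q DEGRADED; then
        gmirror status gm0 | mail -s "gmirror gm0 degraded" admin@example.com
    fi

Deferring the rebuild itself might then look like this:

    gmirror configure -n gm0        # disable autosynchronization
    gmirror forget gm0              # drop the disconnected component
    gmirror insert gm0 /dev/ada1    # add the replacement disk
    gmirror rebuild gm0 ada1        # start the rebuild after work hours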

ZFS is smart on this point: it will recover on reboot with a minimum amount
of fuss.  Even if you dislodge a drive ... so that it's missing the last
'n' transactions, ZFS seems to figure this out (which I thought deserved
extra kudos).

Yes, this is marketing; practice is somewhat different, as you discovered yourself.


MY PROBLEM comes from problems that scrub can fix.

Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously).

While RAID-Z is already the king of bad performance, I assume you mean two POOLS, not two RAID-Z sets in one pool. If you mixed two different RAID-Z sets in one pool you would spread the load unevenly and make performance even worse.
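Roughly, the difference looks like this; the pool names and disk device names are invented for the example:

    # one pool with two raidz vdevs: writes are striped across both,
    # so the 9-disk and 8-disk groups get uneven shares of the load
    zpool create tank \
        raidz da0 da1 da2 da3 da4 da5 da6 da7 da8 \
        raidz da9 da10 da11 da12 da13 da14 da15 da16

    # two separate pools keep the two disk groups independent
    zpool create tank15 raidz da0 da1 da2 da3 da4 da5 da6 da7 da8
    zpool create tank20 raidz da9 da10 da11 da12 da13 da14 da15 da16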


A full scrub of my drives weighs in at 36 hours or so.

Which is funny, as ZFS is marketed as doing this efficiently (e.g. by checking only used space).

A plain dd if=/dev/disk of=/dev/null bs=2m would take no more than a few hours, and you can read all the disks in parallel.
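For example (the disk names are placeholders):

    # read every member disk in parallel to check the whole surface
    for d in da0 da1 da2 da3; do
        dd if=/dev/$d of=/dev/null bs=2m &
    done
    wait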

       vr2/cvs:<0x1c1>

Now ... this is just an example: after each scrub, the hex number was
different; a new error had apparently been found

Seems like the scrub simply isn't doing its work right.

before the old error was cleared.  Then this new error gets similarly
cleared by the next scrub.  It seems that if the scrub returned to this
newfound error after fixing the "known" errors, it could save whole new
scrub runs from being required.

Even better: use UFS, for both bullet-proof recoverability and performance.
If you need help with tuning, you may ask me privately.