Karl Denninger wrote:
On 5/7/2019 00:02, Michelle Sullivan wrote:
The problem I see with that statement is that the zfs dev mailing lists
constantly and consistently follow the line of "the data is always right,
there is no need for a “fsck”" (which I actually get), but it’s used to shut
down every thread... the irony is I’m now installing Windows 7 and SP1 on a USB
stick (well, it’s actually installed, but SP1 isn’t finished yet) so I can
install a zfs data recovery tool which purports to be able to “walk the data” to
retrieve all the files...  the irony, eh... install Windows 7 on a USB stick to
recover a FreeBSD-installed zfs filesystem...  will let you know if the tool
works, but as it was recommended by a dev I’m hopeful... I have another array
(with zfs, I might add) loaded and ready to go... if the data recovery is
successful I’ll blow away the original machine and work out what OS and drive
setup will be safe for the data in the future.  I might even put FreeBSD and
zfs back on it, but if I do it won’t be in the current RaidZ2 config.
Meh.

Hardware failure is, well, hardware failure.  Yes, power-related
failures are hardware failures.

Never mind the potential for /software/ failures.  Bugs are, well,
bugs.  And they're a real thing.  Never had the shortcomings of UFS bite
you on an "unexpected" power loss?  Well, I have.  Is ZFS absolutely
safe against any such event?  No, but it's safe*r*.

Yes and no ... I'll explain...


I've yet to have ZFS lose an entire pool due to something bad happening,
but the same basic risk (entire filesystem being gone)

Every time I have seen this issue (and it's been more than once - though until now recoverable, even if extremely painful) it's always been during a resilver of a failed drive and something happening... panic, another drive failure, power, etc. Any other time it's rock solid... which is the yes and no... under normal circumstances zfs is very, very good and seems as safe as or safer than UFS... but my experience is ZFS has one really bad flaw: if there is corruption in the metadata - even if the stored data is 100% correct - it will fault the pool and that's it, it's gone, barring some luck and painful recovery (backups aside)... other file systems suffer this too, but they have tools that *the majority of the time* will get you out of the s**t with little pain. Barring this Windows-based tool I haven't been able to run yet, zfs appears to have nothing.

has occurred more
than once in my IT career with other filesystems -- including UFS, lowly
MSDOS and NTFS, never mind their predecessors all the way back to floppy
disks and the first 5MB Winchesters.

Absolutely, been there, done that... and btrfs... *ouch*, still as bad... however, with the only btrfs install I had (I didn't know it was btrfs underneath, but Netgear NAS...) I was still able to recover the data even though it had screwed the file system so badly I vowed never to consider or use it again on anything, ever...


I learned a long time ago that two is one and one is none when it comes
to data, and WHEN two becomes one you SWEAT, because that second failure
CAN happen at the worst possible time.

and does..


As for RaidZ2 vs. mirrored, it's not as simple as you might think.
Mirrored vdevs can only lose one member per mirror set, unless you use
three-member mirrors.  That sounds insane but actually it isn't in
certain circumstances, such as very-read-heavy and high-performance-read
environments.

I know - this is why I don't use mirrored - because wear patterns will ensure both sides of the mirror are closely matched in wear, and so likely to fail close together.


The short answer is that a 2-way mirrored set is materially faster on
reads but has no acceleration on writes, and can lose one member per
mirror.  If the SECOND one fails before you can resilver, and that
resilver takes quite a long while if the disks are large, you're dead.
However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each
of a 2-way mirror) you now have three parallel data paths going at once
and potentially six for reads -- and performance is MUCH better.  A
3-way mirror can lose two members (and could be organized as 3x2) but
obviously requires lots of drive slots, 3x as much *power* per gigabyte
stored (and you pay for power twice; once to buy it and again to get the
heat out of the room where the machine is.)
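
A minimal sketch of the layout arithmetic above, for six drives built as 2-way or 3-way mirror vdevs. The `MirrorLayout` class is hypothetical, not part of any ZFS tool, and it models only counts (vdevs, read paths, failures tolerated per vdev, drives spent per unit stored), not real throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorLayout:
    """Toy model of a pool built from equal-width mirror vdevs (hypothetical)."""
    drives: int  # total physical drives in the pool
    width: int   # members per mirror vdev (2 = 2-way, 3 = 3-way)

    @property
    def vdevs(self) -> int:
        # Each vdev is an independent write path.
        return self.drives // self.width

    @property
    def read_paths(self) -> int:
        # Any member of a mirror can serve a read, so every drive can help.
        return self.vdevs * self.width

    @property
    def usable_drives(self) -> int:
        # A mirror vdev stores one drive's worth of data, whatever its width.
        return self.vdevs

    @property
    def failures_per_vdev(self) -> int:
        # A vdev survives until its last member dies.
        return self.width - 1

    @property
    def drives_per_unit_stored(self) -> int:
        # Drives (and so power and slots) spent per drive's worth of data.
        return self.width


for width in (2, 3):
    layout = MirrorLayout(drives=6, width=width)
    print(f"{width}-way: {layout.vdevs} vdevs, {layout.read_paths} read paths, "
          f"{layout.failures_per_vdev} failure(s) per vdev, "
          f"{layout.drives_per_unit_stored}x drives per unit stored")
```

The per-vdev figure is the guarantee that matters: the pool as a whole is only guaranteed to survive that many failures, because losing any one vdev loses the pool.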

My problem (as always) is slots, not so much the power.


Raidz2 can also lose 2 drives without being dead.  However, it doesn't
get any of the read performance improvement *and* takes a write
performance penalty; Z2 has more write penalty than Z1 since it has to
compute and write two parity entries instead of one, although in theory
at least it can parallel those parity writes -- albeit at the cost of
drive bandwidth congestion (e.g. interfering with other accesses to the
same disk at the same time.)  In short RaidZx performs about as "well"
as the *slowest* disk in the set.
Which is why I built mine with identical drives (though from different production batches :) ) ... the majority of the data in my storage array is write once (or twice), read many.
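
A toy model of why a RaidZ vdev runs at the pace of its slowest disk: on a full-stripe write every disk writes one chunk, and the stripe is not complete until the slowest one finishes. This is an assumption-laden sketch (equal chunk sizes, no caching, no seeks, made-up MB/s figures), and `raidz_stripe_throughput` is a hypothetical helper.

```python
def raidz_stripe_throughput(disk_mb_s: list[float], parity: int) -> float:
    """Toy model of full-stripe writes on one RAID-Z vdev (hypothetical).

    Each disk writes one equal-sized chunk per stripe; the stripe finishes
    only when the slowest disk does, so every disk effectively runs at the
    slowest disk's speed. Only the non-parity chunks carry user data.
    """
    if not 0 < parity < len(disk_mb_s):
        raise ValueError("parity must be between 1 and the number of disks - 1")
    slowest = min(disk_mb_s)
    data_disks = len(disk_mb_s) - parity
    return data_disks * slowest


matched = [200.0] * 6                # six identical drives
one_laggard = [200.0] * 5 + [100.0]  # one drive half as fast
print(raidz_stripe_throughput(matched, parity=2))
print(raidz_stripe_throughput(one_laggard, parity=2))
```

In this model a single drive at half speed halves the whole vdev's data throughput, which is the argument for matched drives.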

So why use it (particularly Z2) at
all?  Because for "N" drives you get the protection of a 3-way mirror
and *much* more storage.  With 1TB drives, a six-member RaidZ2 setup
returns ~4TB of usable space, whereas a 2-way mirror returns 3TB and a
3-way mirror (which provides the same protection against drive failure
as Z2) returns only 2TB, *half* the storage.  IMHO ordinary Raidz isn't
worth the trade-offs, but Z2 frequently is.
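
The capacity figures above are simple arithmetic, sketched here assuming six 1TB drives. The layout strings are illustrative labels rather than `zpool` syntax, and real usable space on RaidZ is somewhat lower than this because of metadata, allocation padding and reserved space, so treat the results as approximate.

```python
def usable_tb(drives: int, drive_tb: float, layout: str) -> float:
    """Raw usable capacity, ignoring ZFS metadata, padding and reserved space."""
    if layout == "raidz2":
        return (drives - 2) * drive_tb   # two drives' worth of parity
    if layout == "mirror2":
        return (drives // 2) * drive_tb  # one copy kept per 2-drive vdev
    if layout == "mirror3":
        return (drives // 3) * drive_tb  # one copy kept per 3-drive vdev
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raidz2", "mirror2", "mirror3"):
    print(layout, usable_tb(6, 1.0, layout))
```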

In addition more spindles means more failures, all other things being
equal, so if you need "X" TB of storage and organize it as 3-way mirrors
you now have twice as many physical spindles as with six-wide RaidZ2,
which means on average you'll take twice as many faults.  If performance
is more important, then
the choice is obvious.  If density is more important (that is, a lot or
even most of the data is rarely accessed at all) then the choice is
fairly simple too.  In many workloads you have some of both, and thus
the correct choice is a hybrid arrangement; that's what I do here,
because I have a lot of data that is rarely-to-never accessed and
read-only but also have some data that is frequently accessed and
frequently written.  One size does not fit all in such a workload.
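
The spindle-count argument can be sketched as follows, comparing 3-way mirrors against six-wide RaidZ2 vdevs for the same target capacity. `drives_needed` is hypothetical, 1TB drives are assumed, and the 2% annual failure rate is an invented, illustrative figure; the only point is that expected faults scale with the number of drives.

```python
import math


def drives_needed(capacity_tb: float, drive_tb: float, layout: str) -> int:
    """Drives required to reach a target capacity (hypothetical helper)."""
    if layout == "raidz2x6":
        per_vdev_tb = 4 * drive_tb  # six-wide RaidZ2: 4 data + 2 parity
        return math.ceil(capacity_tb / per_vdev_tb) * 6
    if layout == "mirror3":
        per_vdev_tb = drive_tb      # 3-way mirror: 1 drive of data per 3
        return math.ceil(capacity_tb / per_vdev_tb) * 3
    raise ValueError(f"unknown layout: {layout}")


def expected_failures_per_year(drives: int, afr: float) -> float:
    # With independent failures, expected faults scale linearly with spindles.
    return drives * afr


AFR = 0.02  # assumed 2% annualized failure rate, purely illustrative
for layout in ("raidz2x6", "mirror3"):
    n = drives_needed(4.0, 1.0, layout)
    print(layout, n, round(expected_failures_per_year(n, AFR), 2))
```
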
This is where I came to 2 systems (with different data)... one for density, the other for performance. Storage vs. working, etc.

MOST systems, by the way, have this sort of paradigm (a huge percentage
of the data is rarely read and never written) but it doesn't become
economic or sane to try to separate them until you get well into the
terabytes of storage range and a half-dozen or so physical volumes.
There's a very clean argument that below that point, with more than one
drive, mirroring is always the better choice.

Note that if you have an *adapter* go insane (and as I've noted here
I've had it happen TWICE in my IT career!) then *all* of the data on the
disks served by that adapter is screwed.

100% with you - been there, done that... and it doesn't matter what OS or filesystem: hardware failure where an adapter silently corrupts data will always take you out (and zfs will not save you in many cases of that either).

It doesn't make a bit of difference what filesystem you're using in that
scenario and thus you had better have a backup scheme and make sure it
works as well, never mind software bugs or administrator stupidity ("dd"
as root to the wrong target, for example, will reliably screw you every
single time!)

For a single-disk machine ZFS is no *less* safe than UFS and provides a
number of advantages, with arguably the most-important being easily-used
snapshots.

It depends. In normal operation I agree... but when it comes to all or nothing, that is a matter of perspective. Personally I prefer to have in-place recovery options and/or multiple *possible* recovery options rather than... "destroy the pool and recreate it from scratch, hope you have backups"...

Not only does this simplify backups, since coherency during
the backup is never at issue and incremental backups become fast and
easily done; in addition, boot environments make roll-forward and even
*roll-back* reasonable to implement for software updates -- a critical
capability if you ever run an OS version update and something goes
seriously wrong with it.  If you've never had that happen then consider
yourself blessed;

I have been there (especially with the early (pre-0.83 kernel) versions of Linux :) )

it's NOT fun to manage in a UFS environment and often
winds up leading to a "restore from backup" scenario.  (To be fair it
can be with ZFS too if you're foolish enough to upgrade the pool before
being sure you're happy with the new OS rev.)
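
As a rough illustration of the snapshot-based workflow described above, the helpers below only build the command lines (nothing is executed). `zfs snapshot`, `zfs send -i`, `zfs receive`, `bectl create` and `bectl activate` are real commands (`bectl` ships with FreeBSD 12.0 and later); the dataset names, snapshot names and helper functions are hypothetical, and the incremental send assumes an earlier full send has already seeded the backup dataset.

```python
import shlex


def snapshot_cmd(dataset: str, snap: str) -> list[str]:
    # Atomic point-in-time snapshot, so the backup reads a coherent image.
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def incremental_send_cmd(dataset: str, prev: str, new: str) -> list[str]:
    # Streams only the blocks that changed between the two snapshots.
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]


def receive_cmd(target: str) -> list[str]:
    # Runs on the backup side; the `zfs send` stream is piped into it.
    return ["zfs", "receive", target]


def pre_upgrade_be_cmd(name: str) -> list[str]:
    # Save the running system as a boot environment before upgrading.
    return ["bectl", "create", name]


def rollback_be_cmd(name: str) -> list[str]:
    # If the upgrade goes wrong, boot the saved environment next time.
    return ["bectl", "activate", name]


# Hypothetical names: dataset "tank/data", backup target "backup/data".
print(shlex.join(snapshot_cmd("tank/data", "2019-05-07")))
print(shlex.join(incremental_send_cmd("tank/data", "2019-05-06", "2019-05-07"))
      + " | " + shlex.join(receive_cmd("backup/data")))
print(shlex.join(pre_upgrade_be_cmd("pre-upgrade")))
```

Creating the boot environment before the upgrade, and holding off on `zpool upgrade` until the new release has proven itself, keeps the roll-back path open.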

Actually I have a simple way with UFS (and ext2/3/4, etc.)... split the boot disk almost down the center and create 3 partitions: root, swap, altroot. Root and altroot are almost identical; one is always active, the new OS goes on the other, and you switch to make the other active (primary) once you've tested it... it only gives one level of roll-forward/roll-back, but it works for me and has never failed (boot disk/OS wise) since I implemented it... but then I don't let anyone else in the company have root access, so they cannot dd or "rm -r . /" or "rm -r .*" (the only ways I have ever done that - back in 1994, and never since - it's something you learn or get out of IT :P .. and for those who didn't get the latter, it should have been 'rm -r .??*' - and why are you on '-stable'...? :P )
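
A toy model of the root/altroot scheme just described, showing why it gives exactly one level of roll-forward/roll-back: the upgrade always lands on the inactive slot, so installing a second upgrade overwrites the oldest copy. `DualRootDisk` and the version labels are hypothetical; on a real system the "switch" would be a bootloader or partition change, which this sketch does not model.

```python
class DualRootDisk:
    """Toy model of a root/altroot boot disk (hypothetical, not a real tool).

    Two root slots, one active. An upgrade is written to the inactive slot
    and the switch happens only after it has been tested. With only two
    slots, only one previous version is ever retained.
    """

    def __init__(self, version: str) -> None:
        self.slots: dict[str, str] = {"root": version, "altroot": version}
        self.active = "root"

    @property
    def inactive(self) -> str:
        return "altroot" if self.active == "root" else "root"

    def install(self, version: str) -> None:
        # The new OS goes on the inactive slot, replacing whatever was there.
        self.slots[self.inactive] = version

    def switch(self) -> None:
        # Make the other slot primary: roll forward or roll back.
        self.active = self.inactive

    def running(self) -> str:
        return self.slots[self.active]


disk = DualRootDisk("v1")
disk.install("v2")
disk.switch()       # v2 tested OK, now primary
disk.install("v3")  # lands on the old v1 slot, so v1 is gone
print(disk.running(), sorted(disk.slots.values()))
```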

Regards,

--
Michelle Sullivan
http://www.mhix.org/


_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
