Re: ZFS...

Walter Cramer Tue, 30 Apr 2019 09:11:50 -0700

Brief "Old Man" summary/perspective here...

Computers and hard drives are complex, sensitive physical things. They, or the data on them, can be lost to fire, flood, lightning strikes, theft, transportation screw-ups, and more. Mass data corruption by faulty hardware or software is mostly rare, but does happen. Then there are the users, authorized or not, who are inept or malicious.

You can spend a fortune to make loss of the "live" data in your home server / server room / data center very unlikely. Is that worth the time and money? It depends on the business case. At any scale, it's best to have a manager, one who understands both computers and the bottom line, keep a close eye on this.

"Real" protection from data loss means multiple off-site and generally off-line backups. You could spend a fortune on that, too...but for your use case (~21TB in an array that could hold ~39TB, and what sounds like a "home power user" budget), I'd say to put together two "backup servers": cheap little (aka transportable) FreeBSD systems with, say, 7x6TB HDs, raidz1. With even a 1Gbit ethernet connection to your main system, savvy use of (say) rsync (net/rsync in Ports), and the sort of "know your data / divide & conquer" tactics that Karl mentions, you should be able to complete initial backups (on both backup servers) in <1 month. After that, rsync can generally do incremental backups far, far faster. How often you gently haul the backup servers to and from your off-site location(s) depends on a bunch of factors: backup frequency, cost of bandwidth, etc.

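The sizing above can be sanity-checked with some back-of-the-envelope arithmetic. This is a minimal sketch, assuming decimal terabytes, ignoring ZFS overhead (metadata, padding, reserved space), and assuming the network link is the only bottleneck; the function names are hypothetical and the "efficiency" factor is just a knob for real-world slowdowns (many small files, disk seeks, rsync overhead).

```python
# Back-of-the-envelope sizing for a raidz backup server.
# Assumptions (not from the thread): decimal terabytes, ZFS overhead
# (metadata, padding, reserved space) ignored, and the network link
# is the only bottleneck.

def raidz_usable_tb(drives: int, drive_tb: float, parity: int) -> float:
    """Approximate usable space of one raidz vdev: parity drives' worth is lost."""
    if parity >= drives:
        raise ValueError("a raidz vdev needs more drives than parity devices")
    return (drives - parity) * drive_tb


def transfer_hours(data_tb: float, link_gbit: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb over a link_gbit link at the given fraction of line rate."""
    bytes_total = data_tb * 1e12
    bytes_per_sec = link_gbit * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 3600


if __name__ == "__main__":
    usable = raidz_usable_tb(drives=7, drive_tb=6, parity=1)
    print(f"7x6TB raidz1: ~{usable:.0f} TB usable, enough for ~21 TB: {usable >= 21}")
    for eff in (1.0, 0.5, 0.1):
        hours = transfer_hours(21, 1.0, eff)
        print(f"21 TB over 1 Gbit/s at {eff:.0%} of line rate: {hours / 24:.1f} days")
```

Under these assumptions, 7x6TB in raidz1 leaves about 36 TB, and 21 TB takes roughly two days at full 1 Gbit/s line rate; even at 10% of line rate it stays under three weeks, which is consistent with the "<1 month" estimate.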

Never skimp on power supplies.

-Walter

[Credits: Nothing above is original. Others have already made most of my points in this thread. It's pretty much all decades-old computer wisdom in any case.]



On Tue, 30 Apr 2019, Michelle Sullivan wrote:

Karl Denninger wrote:
On 4/30/2019 05:14, Michelle Sullivan wrote:
On 30 Apr 2019, at 19:50, Xin LI <delp...@gmail.com> wrote:
On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <miche...@sorbs.net> wrote:
but in my recent experience 2 issues colliding at the same time result
in disaster
Do we know exactly what kind of corruption happened to your pool? If you
see it twice in a row, it might suggest a software bug that should be investigated.
All I know is it's a checksum error on a metaslab (122) and from what
I can gather it's the spacemap that is corrupt... but I am no expert. I don't believe it's a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged, nor were the drives or controller.
.....
Note that ZFS stores multiple copies of its essential metadata, and in my
experience with my old, consumer-grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pool: bad enough to crash almost monthly and damage my data from time to time),
This was a top-end consumer-grade motherboard with non-ECC RAM that had been
running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it wasn't for patching.
Yuck.

I'm sorry, but that may well be what nailed you.

ECC is not just about the random cosmic ray.  It also saves your bacon
when there are power glitches.
No. Sorry, no. If the data is only half written to disk, ECC isn't going to save you at all... it's all about power on the drives to complete the write.
Unfortunately, however, there is also cache memory on most modern hard
drives; most of the time (unless you explicitly shut it off) it's on for
write caching, and it'll nail you too.  Oh, and it's never, in my
experience, ECC.
No comment on that. You're right in the first part; I can't comment on whether there are drives with ECC.
In addition, however, and this is something I learned a LONG time ago
(think Z-80 processors!), as in so many very important things,
"two is one and one is none."

In other words without a backup you WILL lose data eventually, and it
WILL be important.

Raidz2 is very nice, but as the name implies, you have two
redundancies.  If you take three errors, or if, God forbid, you *write*
a block that has a bad checksum in it because it got scrambled while in
RAM, you're dead if that happens in the wrong place.
Or in my case you write partial data, therefore invalidating the checksum...
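Both failure modes in this exchange (a block scrambled in RAM, and a block only partly written) come down to *when* the damage happens relative to checksumming. This is a minimal Python sketch under a deliberately simplified model: SHA-256 stands in for ZFS's checksum (OpenZFS uses fletcher4 for data by default), and copy-on-write, ditto blocks, and parity are not modeled. The helper names are hypothetical. It only shows that verification catches damage done after the checksum is computed, never damage done before it.

```python
import hashlib

# Illustration only: SHA-256 stands in for ZFS's checksum; copy-on-write,
# ditto blocks and raidz parity are deliberately not modeled.

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of block with a single bit inverted (simulated RAM error)."""
    out = bytearray(block)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def verifies(block: bytes, stored_sum: bytes) -> bool:
    """What a read does: recompute the checksum and compare with the stored one."""
    return checksum(block) == stored_sum


data = b"important metadata" * 8

# 1. Block scrambled in RAM *before* checksumming: the checksum describes the
#    corrupt bytes, so every later read "verifies" and nothing is detected.
scrambled = flip_bit(data, 5)
undetected = verifies(scrambled, checksum(scrambled))

# 2. Block corrupted *after* checksumming (e.g. on the platter): detected.
detected = not verifies(flip_bit(data, 5), checksum(data))

# 3. Partial ("torn") write: only the first half of the new block reaches the
#    disk and the rest keeps its old contents, so the checksum no longer matches.
old = bytes(len(data))
half = len(data) // 2
torn = data[:half] + old[half:]
torn_detected = not verifies(torn, checksum(data))

print(undetected, detected, torn_detected)  # True True True
```

Detection is not repair: a mismatch can only be fixed if an intact copy exists elsewhere (another disk in the vdev or a ditto copy), which is why corruption that hits every copy of a piece of metadata is so damaging.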
Yeah.. unlike UFS, which has to get really, really hosed before you have to restore from
backup with nothing recoverable, it seems ZFS can get hosed when issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable.)
Michelle
Oh that is definitely NOT true.... again, from hard experience,
including (but not limited to) on FreeBSD.

My experience is that ZFS is materially more-resilient but there is no
such thing as "can never be corrupted by any set of events."
The latter part is true, and my blog and my current situation are not limited to or aimed at FreeBSD specifically; FreeBSD is just my experience. The former part... it has been very resilient, but I think (based on this certain set of events) it is easily corruptible and I have just been lucky. You just have to hit a certain write to activate the issue, and whilst that write and issue might be very, very difficult (read: hit and miss) to hit in normal everyday scenarios, it can and will eventually happen.
Backup strategies for moderately large (e.g. many terabytes) to very
large (e.g. petabytes and beyond) data sets get quite complex, but
they're also very necessary.
and therein lies the problem. If you don't have a backup solution costing many tens of thousands of dollars, you're either:
1/ down for a looooong time.
2/ losing all data and starting again...
..and that's the problem... with UFS you can recover most (in most situations), and provided the *data* is there uncorrupted by the fault, you can get it all off with various tools even if it is a complete mess.... here I am with data that is apparently ok, but the metadata is corrupt (and note: as I had stopped writing to the drive when it started resilvering, the data, all of it, should be intact... even if a mess.)
Michelle

--
Michelle Sullivan
http://www.mhix.org/

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"