On Wed, 28 May 2014 08:26:58 -0700,
Bob Sanders <rsand...@sgi.com> wrote:

> 
> Marc Joliet, mused, then expounded:
> > On Tue, 27 May 2014 15:39:38 -0700,
> > Bob Sanders <rsand...@sgi.com> wrote:
> > 
> > While I am far from a filesystem/storage expert (I see myself as a mere
> > user), the cited threads lead me to believe that this is most likely an
> > overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I
> > would suggest reading them in their entirety.
> > 
> > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
> > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
> > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
> > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
> >
> 
> FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> memory bit and no ECC memory:
> 
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/

Thanks for explicitly linking that.  I didn't read it the first time around,
but I have now read through most of it and reread threads [0] and [3] above,
and I *think* that I understand the problem (and why it doesn't apply to
BTRFS) better now.

IIUC, the claim is: data is written to disk, but it must go through the RAM
first, obviously, where it is corrupted (due to a permanent bit flip caused,
e.g., by deteriorating hardware).  At some later point, when the data is read
back from disk, it might happen to load around the damaged location in RAM,
where it is further corrupted.  At this point the checksum fails, and ZFS
corrects the data in RAM (using parity information!), where it is immediately
corrupted again (because apparently it is corrected at the same physical
location in RAM? perhaps this is specific to correction via parity?). This
*additionally* corrupted data is then written back to disk (without any further
checks).

So the point is that, apparently, without ECC RAM, you could get a (long-term)
cascade of errors, especially during a scrub.  The likelihood of such permanent
RAM corruption happening in the first place is another question entirely.
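
To make the claimed failure mode concrete, here is a toy model of it.  This is
only a sketch of the scenario as I understand it from the forum post, not of
how ZFS actually behaves; the stuck-bit position and the function names
(through_bad_ram, naive_scrub) are made up for illustration, and the key
assumption (that a scrub writes data back without verifying it again) is
exactly the part that is disputed.

```python
# Hypothetical defect: one bit of the I/O buffer is permanently stuck at 1.
STUCK_BYTE, STUCK_MASK = 2, 0x10


def through_bad_ram(block: bytes) -> bytes:
    """Model a buffer whose bit at a fixed position always reads back as 1."""
    out = bytearray(block)
    if len(out) > STUCK_BYTE:
        out[STUCK_BYTE] |= STUCK_MASK
    return bytes(out)


def naive_scrub(disk: list[bytes]) -> list[bytes]:
    """Read every block through the bad buffer and write it back unverified.

    The missing verification step is the assumption under test here.
    """
    return [through_bad_ram(block) for block in disk]


disk = [b"\x00" * 8, b"\x01" * 8, b"\xff" * 8]
scrubbed = naive_scrub(disk)

# Count the blocks that changed merely by being read and written back.
damaged = sum(before != after for before, after in zip(disk, scrubbed))
print(f"{damaged} of {len(disk)} blocks changed during the scrub")
```

In this model every block whose bit at that position was 0 comes back from
the scrub damaged, which is the "cascade" in miniature.  A block that already
had the bit set passes through unchanged.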

The various posts in [0] then basically say that regardless of whether this
really is true of ZFS, it certainly doesn't apply to BTRFS, for various
reasons.  I suppose this quote from [1] (see above) says it most clearly:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they
> talk about reconstructing corrupted data from parity information:
>
> > Ok, no problem. ZFS will check against its parity. Oops, the parity
> > failed since we have a new corrupted bit. Remember, the checksum data was
> > calculated after the corruption from the first memory error occurred. So
> > now the parity data is used to "repair" the bad data. So the data is
> > "fixed" in RAM.
>
> i.e. that there is parity information stored with every piece of data, and
> ZFS will "correct" errors automatically from the parity information.  I
> start to suspect that there is confusion here between checksumming for data
> integrity and parity information.  If this is really how ZFS works, then if
> memory corruption interferes with this process, then I can see how a scrub
> could be devastating.  I don't know if ZFS really works like this.  It
> sounds very odd to do this without an additional checksum check.  This
> sounds very different to what you say below that btrfs does, which is only
> to check against redundantly-stored copies, which I agree sounds much safer.

The rest is also relevant, but I think the point that the data is corrected via
parity information, as opposed to using a known-good redundant copy of the data
(which I originally missed, and thus got confused), is the key point in
understanding the (supposed) difference in behaviour between ZFS and BTRFS.
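
The difference can be sketched in a few lines.  This is my own toy model under
stated assumptions, not code from either filesystem: zlib.crc32 merely stands
in for whatever checksum is really used, and repair_from_mirror and
repair_from_parity_unchecked are hypothetical names.  The first accepts a copy
only if its checksum verifies; the second rebuilds from XOR parity and trusts
the result, which is what the forum post seems to describe.

```python
import zlib
from typing import Optional


def checksum(block: bytes) -> int:
    # Stand-in checksum for illustration only.
    return zlib.crc32(block)


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of block with a single bit inverted."""
    out = bytearray(block)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def repair_from_mirror(copy_a: bytes, copy_b: bytes,
                       stored_sum: int) -> Optional[bytes]:
    """Return whichever copy matches the stored checksum, else None."""
    for candidate in (copy_a, copy_b):
        if checksum(candidate) == stored_sum:
            return candidate
    return None  # no verified copy: report an error, write nothing back


def repair_from_parity_unchecked(other: bytes, parity: bytes) -> bytes:
    """Rebuild from XOR parity and trust the result without re-checking."""
    return xor_blocks(other, parity)


original = b"important data"
stored_sum = checksum(original)

# Mirror case: one copy is damaged, the other is still good.
repaired = repair_from_mirror(flip_bit(original, 3), original, stored_sum)
# Both copies damaged: the repair is refused instead of guessing.
refused = repair_from_mirror(flip_bit(original, 3), flip_bit(original, 10),
                             stored_sum)

# Parity case: the parity block itself was damaged while passing through RAM.
other = b"neighbour data"  # same length as original
bad_parity = flip_bit(xor_blocks(original, other), 5)
blind = repair_from_parity_unchecked(other, bad_parity)
blind_ok = checksum(blind) == stored_sum
```

The final checksum comparison (blind_ok) is the "additional checksum check"
mentioned in the quote above: with it, the bad reconstruction would be
rejected rather than written back.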

All this assumes, of course, that the FreeNAS forum post that ignited this
discussion is correct in the first place.

> Thanks Mark!  Interesting discussion on btrfs.
> 
> Bob

You're welcome!  I agree, it's an interesting discussion.  And regarding the
misspelling of my name: no problem :-) .

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
