On Mon, May 16, 2016 at 5:44 PM, Richard A. Lochner <loch...@clone1.com> wrote:
> Chris,
>
> It has actually happened to me three times that I know of in ~7mos.,
> but your point about the "larger footprint" for data corruption is a
> good one.  No doubt I have silently experienced that too.

I dunno, three is a lot: the exact same corruption happening only in
memory, then getting written out into two copies with valid node
checksums, and yet no other problems, like a bad node item, or uuid,
or xattr, or any number of other item or object types, all of which
get checksummed. I suppose if the file system contains large files,
csums could be the 2nd largest footprint within metadata. But still.

Three times in 7 months, if it's really the same vector, is just short
of almost reproducible. Ha. It seems like if you merely balanced this
file system a few times, you'd eventually stumble on this. And if
that's true, then it's time to enable debug options and see if it can
be caught in action, and whether there's a hardware or software
explanation for it.


> And, as you
> suggest, there is no way to prevent those errors.  If the memory to be
> written to disk gets corrupted before its checksum is calculated, the
> data will be silently corrupted, period.

Well, no way in the present design, maybe.
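
To make the failure mode concrete, here's a toy sketch in Python. It is
not btrfs code: zlib.crc32 stands in for the crc32c that btrfs uses by
default, and flip_bit, write_block and verify_block are names made up
for this example. It only shows that a bit flipped before the checksum
is computed verifies fine later, while a flip after the checksum is
computed gets caught.

```python
import zlib


def flip_bit(buf, bit_index):
    """Return a copy of buf with one bit inverted (simulated memory error)."""
    out = bytearray(buf)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def write_block(data):
    """Hypothetical write path: checksum whatever is in memory right now."""
    return data, zlib.crc32(data)


def verify_block(data, csum):
    """Hypothetical read path: recompute the checksum and compare."""
    return zlib.crc32(data) == csum


original = b"important data " * 16

# Case 1: the bit flips in RAM *before* the checksum is calculated.
# The checksum covers the already-corrupted bytes, so it matches.
stored, csum = write_block(flip_bit(original, 42))
silent_corruption = stored != original and verify_block(stored, csum)

# Case 2: the bit flips *after* the checksum is calculated.
# The stored checksum no longer matches, so the error is detected.
good_data, good_csum = write_block(original)
detected = not verify_block(flip_bit(good_data, 42), good_csum)

print("corrupted before csum, verifies anyway:", silent_corruption)
print("corrupted after csum, caught on read:", detected)
```

The checksum can only vouch for the bytes it was handed, so anything
that goes wrong upstream of that point is invisible to it.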



>
> Clearly, I won't rely on this machine to produce any data directly that
> I would consider important at this point.
>
> One odd thing to me is that if this is really due to undetected memory
> errors, I'd think this system would crash fairly often due to detected
> "parity errors."  This system rarely crashes.  It often runs for
> several months without an indication of problems.

I think you'd have other problems. Only data csums are being corrupted
after they're read in, but before the node csum is computed? Three
times?  Pretty wonky.




-- 
Chris Murphy