On Mon, Jan 28, 2019 at 3:52 PM Remi Gauvin <[email protected]> wrote:
>
> On 2019-01-28 5:07 p.m., DanglingPointer wrote:
>
> > From Qu's statement and perspective, there's no difference from other
> > non-BTRFS software RAID56 implementations out there that are marked as
> > stable (except ZFS).
> > Also there are no "multiple serious data-loss bugs".
> > Please do consider my proposal as it will decrease the amount of
> > incorrect paranoia that exists in the community.
> > As long as the Wiki properly mentions the current state, with the options
> > for mitigation, like backup power and perhaps RAID1 for metadata, or
> > anything else you believe is appropriate.
>
> BTRFS should implement some way to automatically scrub after an unclean
> shutdown. BTRFS is the only (to my knowledge) RAID implementation that
> will not automatically detect an unclean shutdown and fix the affected
> parity blocks (either by some form of write journal/write-intent map,
> or full resync).

There's no dirty bit that gets set at mount time and cleared on clean
unmount, so there is nothing left behind at the next mount from which to
infer that the previous shutdown was unclean.

If there were a way to implement an abridged scrub, it could be run on
every mount whenever metadata uses the raid56 profile. But I think Qu is
working on something for raid56 that would obviate the problem, which is
probably the best and most scalable solution.

An abridged scrub could be metadata only, and run only when metadata is
on the raid56 profile.
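
Just to illustrate the idea, a dirty bit could be approximated in
userspace today with a flag file, roughly like the sketch below (the
paths and mount point are made up, and it has to fall back to a full
scrub since there is no metadata-only scrub):

  #!/bin/sh
  # Rough sketch only: emulate a "dirty bit" with a flag file.
  # MNT and FLAG are hypothetical paths, adjust to taste.
  MNT=/mnt/data
  FLAG=/var/lib/btrfs-dirty.flag
  if [ -e "$FLAG" ]; then
      # The flag survived the last boot, so the previous shutdown was
      # unclean: rewrite any stale parity by scrubbing the filesystem.
      btrfs scrub start -B "$MNT"
  fi
  # Mark the filesystem dirty for this boot; a matching shutdown script
  # would remove the flag after a clean unmount:  rm -f "$FLAG"
  touch "$FLAG"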

But still in 2019, we have this super crap default SCSI block layer
command timeout of 30 seconds. It encourages corruption on common
consumer drives by resetting a drive that is merely in a deep recovery
taking longer than 30s. And it prevents automatic repair from happening,
because the drive never gets to report a discrete read error with the
affected sector, so the problem gets masked behind link resets.
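
The usual mitigations, as a sketch (sda is only an example device), are
either to raise the kernel's command timer well past the drive's
worst-case recovery time, or to cap the drive's own error recovery with
SCT ERC where the drive supports it:

  # check the kernel's per-device command timer (default is 30 seconds)
  cat /sys/block/sda/device/timeout
  # raise it above worst-case consumer-drive deep recovery (example value)
  echo 180 > /sys/block/sda/device/timeout
  # or, on drives that support SCT ERC, limit error recovery to 7.0s so
  # the drive reports the bad sector before the kernel gives up on it
  smartctl -l scterc,70,70 /dev/sda

Either way, a read failure then comes back as a discrete sector error
that can actually be repaired from the remaining copies or parity.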


-- 
Chris Murphy
