Re: RAID56 Warning on "multiple serious data-loss bugs"

Qu Wenruo Mon, 28 Jan 2019 17:47:01 -0800


On 2019/1/29 6:07 AM, DanglingPointer wrote:
> Thanks Qu!
> I thought as much from following the mailing list and your great work
> over the years!
> 
> Would it be possible to get the wiki updated to reflect the current
> "real" status?
> 
> From Qu's statement and perspective, there's no difference to other
> non-BTRFS software RAID56's out there that are marked as stable (except
> ZFS).


I'm afraid that my old statement is wrong.

Quite a lot of software RAID56 implementations have a way to record
which blocks get modified, just as some hardware RAID56 controllers do,
and thus get rid of the write hole problem.
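
To illustrate the point about recording modified blocks, here is a
minimal sketch, assuming a toy 3-device stripe with plain XOR parity. It
is not how btrfs, md, or any hardware controller is actually
implemented; the names `xor_blocks` and `dirty_stripes` are hypothetical
and exist only for this example. It shows why stale parity after a crash
rebuilds wrong data, and how a record of in-flight stripes lets parity
be resynced before a device is lost.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Hypothetical stripe: two data blocks plus one parity block.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Hypothetical write-intent record: stripes whose update may be incomplete.
dirty_stripes = set()

# Begin updating stripe 0: record the intent first, then write the data.
dirty_stripes.add(0)
d0 = b"CCCC"
# Simulated crash here: parity was never rewritten and still matches old d0.

# Without the record, losing d1 now means rebuilding it from stale parity.
rebuilt_without_record = xor_blocks(d0, parity)

# With the record, the stripe is resynced after reboot, while every device
# is still present, so parity matches the data again.
if 0 in dirty_stripes:
    parity = xor_blocks(d0, d1)
    dirty_stripes.discard(0)

# A later loss of d1 can now be repaired correctly.
rebuilt_with_record = xor_blocks(d0, parity)

print(rebuilt_without_record == d1)  # False: the write hole
print(rebuilt_with_record == d1)     # True: parity was resynced in time
```

The sketch only covers the case where the resync runs before any device
fails; a crash and a device failure at the same moment need stronger
measures than a dirty-stripe record alone.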

Thanks,
Qu

> Also there are no "multiple serious data-loss bugs".
> Please do consider my proposal as it will decrease the amount of
> incorrect paranoia that exists in the community.
> As long as the Wiki properly mentions the current state with the options
> for mitigation; like backup power and perhaps RAID1 for metadata or
> anything else you believe as appropriate.
> 
> 
> Thanks,
> 
> DP
> 
> 
> On 28/1/19 11:52 am, Qu Wenruo wrote:
>>
>> On 2019/1/26 7:45 PM, DanglingPointer wrote:
>>>
>>> Hi All,
>>>
>>> For clarity for the masses, what are the "multiple serious data-loss
>>> bugs" as mentioned in the btrfs wiki?
>>> The bullet points on this page:
>>> https://btrfs.wiki.kernel.org/index.php/RAID56
>>> don't enumerate the bugs.  Not even at a high level.  If anything, what
>>> comes closest to a bug or issue or "resilience use-case missing" would
>>> be the first point on that page.
>>>
>>> "Parity may be inconsistent after a crash (the "write hole"). The
>>> problem born when after "an unclean shutdown" a disk failure happens.
>>> But these are *two* distinct failures. These together break the BTRFS
>>> raid5 redundancy. If you run a scrub process after "an unclean shutdown"
>>> (with no disk failure in between) those data which match their checksum
>>> can still be read out while the mismatched data are lost forever."
>>>
>>> So in a nutshell; "What are the multiple serious data-loss bugs?".
>> There used to be two, like scrub racing (minor), and screwing up good
>> copy when doing recovery (major).
>>
>> Although these two should already be fixed.
>>
>> So for the current upstream kernel, there should be no major problem
>> apart from the write hole.
>>
>> Thanks,
>> Qu
>>
>>> If
>>> there aren't any, perhaps updating the wiki should be considered for
>>> something less "dramatic".
>>>
>>>
>>>
