At 06/22/2017 10:43 AM, Chris Murphy wrote:
On Wed, Jun 21, 2017 at 8:12 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
Well, in fact, thanks to data csum and btrfs metadata CoW, there is quite a
high chance that we won't cause any data damage.
But we have examples where data is not COWed and we see a partial stripe
overwrite. If that is interrupted, it's clear that both the old and new
metadata pointing to that stripe are wrong. There are far more cases
where we see csum errors on Btrfs raid56 after crashes even though there
are no bad devices.
First, if it's interrupted, there is no new metadata, as metadata is
always updated after data.
And metadata is always updated via CoW, so if the data write is
interrupted, we are still at the previous transaction.
And in that case, no COW means no csum.
Btrfs won't check the correctness due to the lack of csum.
So in the nodatacow case, btrfs won't detect the corruption; users take
responsibility for keeping their data correct.
For the example I gave above, no data damage at all.
First the data is written and then power is lost. Data is always written
before metadata, so after the power loss the superblock is still using the
old tree roots.
So no one is really using that newly written data.
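The ordering argument above can be sketched with a toy model. This is
only an illustration under simplifying assumptions, not btrfs code: the
class ToyCowStore and its crash_before_commit flag are hypothetical, and
it models the CoW data case only (new data lands in a new location, and
the superblock switches to the new root last).

```python
class ToyCowStore:
    """Toy copy-on-write store: data first, then metadata, then superblock."""

    def __init__(self):
        self.blocks = {}        # block address -> contents
        self.superblock = None  # address of the last committed root
        self._next_addr = 0

    def _alloc(self, contents):
        # Always write to a fresh address; nothing is overwritten in place.
        addr = self._next_addr
        self._next_addr += 1
        self.blocks[addr] = contents
        return addr

    def write(self, data, crash_before_commit=False):
        data_addr = self._alloc(data)                  # 1. write data
        if crash_before_commit:
            return                                     # power loss here
        root_addr = self._alloc({"data": data_addr})   # 2. write new root (CoW)
        self.superblock = root_addr                    # 3. switch superblock

    def read(self):
        # Readers only follow the committed root.
        root = self.blocks[self.superblock]
        return self.blocks[root["data"]]


store = ToyCowStore()
store.write(b"old contents")
store.write(b"new contents", crash_before_commit=True)
committed = store.read()  # still reaches the old data via the old root
```

In this model the interrupted write leaves an unreferenced block behind,
and the committed tree is untouched. It says nothing about the nodatacow
or partial-stripe cases raised elsewhere in the thread.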
OK, but that assumes that the newly written data is always COW, which on
Btrfs raid56 is not certain; there's a bunch of RMW code which
suggests overwrites are possible.
RMW is mainly there to update P/Q: even if we only update data stripe 1,
we still need data stripe 2 to calculate P/Q.
And for raid56 metadata it suggests RMW could happen for metadata also.
As long as we have P/Q, RMW must be used.
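A minimal sketch of why a parity profile forces read-modify-write,
assuming a two-data-stripe RAID5 layout with plain XOR parity (P only).
The helper names xor_blocks and rmw_update are hypothetical, and the Q
stripe of RAID6 is left out because it uses different arithmetic.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_update(stripe, parity, index, new_data):
    """Update one data stripe and return (new_stripe, new_parity).

    The old data must be read back first: new P = old P ^ old data ^ new data.
    """
    old_data = stripe[index]                              # read
    new_parity = xor_blocks(parity, old_data, new_data)   # modify
    new_stripe = stripe[:index] + [new_data] + stripe[index + 1:]
    return new_stripe, new_parity                         # caller writes both


stripe1 = b"\x01\x02\x03\x04"
stripe2 = b"\x10\x20\x30\x40"
p = xor_blocks(stripe1, stripe2)

# Only data stripe 1 changes, yet P has to be rewritten as well.
new_stripe, new_p = rmw_update([stripe1, stripe2], p, 0, b"\xaa\xbb\xcc\xdd")
```

The incremental result equals the parity recomputed from the whole
stripe, which is the same thing as reading data stripe 2 and XORing it
with the new data stripe 1.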
The root problem is that we need cross-device FUA to ensure a full stripe
is written correctly.
Or we could modify the extent allocator to ensure we only write into
vertical stripes that contain no used data.
So anyway, RAID5/6 is only designed to handle missing devices, not power
loss.
IIRC an mdadm RAID5/6 array needs to be scrubbed each time a power loss
is detected.
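To illustrate the power-loss case discussed above, here is a small sketch
of a stale-parity stripe, again assuming two data stripes and XOR parity.
The dictionary layout and the xor helper are hypothetical; the point is
only that stale parity is harmless until a device goes missing, and that
recomputing parity (a scrub) before then repairs it.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


# A consistent stripe before the update.
d1_old = b"AAAA"
d2 = b"BBBB"
parity_old = xor(d1_old, d2)

# Power is lost after the new d1 reaches disk but before P is rewritten.
d1_new = b"CCCC"
on_disk = {"d1": d1_new, "d2": d2, "p": parity_old}

# With all devices present, d2 is read directly and is still intact.
d2_direct = on_disk["d2"]

# If the device holding d2 is lost now, it is rebuilt from d1 and stale P.
d2_rebuilt_stale = xor(on_disk["d1"], on_disk["p"])

# A scrub while all devices are still present recomputes P from the data.
on_disk["p"] = xor(on_disk["d1"], on_disk["d2"])
d2_rebuilt_after_scrub = xor(on_disk["d1"], on_disk["p"])
```

Here d2 was never written during the interrupted update, yet rebuilding
it from the stale parity yields different bytes, while the rebuild after
the scrub matches the original.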
Thanks,
Qu
There's fairly strong anecdotal evidence that people have fewer
problems with Btrfs raid5 when raid5 applies to data block groups, and
metadata block groups use some other non-parity based profile like
raid1.