On Fri, Apr 14, 2017 at 08:35:55AM +0800, Qu Wenruo wrote: > When scrubbing a RAID5 which has recoverable data corruption (only one > data stripe is corrupted), sometimes scrub will report more csum errors > than expected. Sometimes even unrecoverable error will be reported. > > The problem can be easily reproduced by the following steps: > 1) Create a btrfs with RAID5 data profile with 3 devs > 2) Mount it with nospace_cache or space_cache=v2 > To avoid extra data space usage. > 3) Create a 128K file and sync the fs, unmount it > Now the 128K file lies at the beginning of the data chunk > 4) Locate the physical bytenr of data chunk on dev3 > Dev3 is the 1st data stripe. > 5) Corrupt the first 64K of the data chunk stripe on dev3 > 6) Mount the fs and scrub it > > The correct csum error number should be 16(assuming using x86_64). > Larger csum error number can be reported in a 1/3 chance. > And unrecoverable error can also be reported in a 1/10 chance. > > The root cause of the problem is RAID5/6 recover code has race > condition, due to the fact that full scrub is initiated per device. > > While for other mirror based profiles, each mirror is independent with > each other, so race won't cause any big problem. > > For example: > Corrupted | Correct | Correct | > | Scrub dev3 (D1) | Scrub dev2 (D2) | Scrub dev1(P) | > ------------------------------------------------------------------------ > Read out D1 |Read out D2 |Read full stripe | > Check csum |Check csum |Check parity | > Csum mismatch |Csum match, continue |Parity mismatch | > handle_errored_block | |handle_errored_block | > Read out full stripe | | Read out full stripe| > D1 csum error(err++) | | D1 csum error(err++)| > Recover D1 | | Recover D1 | > > So D1's csum error is accounted twice, just because > handle_errored_block() doesn't have enough protect, and race can happen. > > On even worse case, for example D1's recovery code is re-writing > D1/D2/P, and P's recovery code is just reading out full stripe, then we > can cause unrecoverable error. > > This patch will use previously introduced lock_full_stripe() and > unlock_full_stripe() to protect the whole scrub_handle_errored_block() > function for RAID56 recovery. > So no extra csum error nor unrecoverable error. > > Reported-by: Goffredo Baroncelli <kreij...@libero.it> > Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
Reviewed-by: David Sterba <dste...@suse.com> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html