On Tue, Jan 02, 2018 at 01:36:41PM -0700, Liu Bo wrote:
> There is a scenario that can end up with rebuild process failing to
> return good content, i.e.
> suppose that all disks can be read without problems and if the content
> that was read out doesn't match its checksum, currently for raid6
> btrfs at most retries twice,
>
> - the 1st retry is to rebuild with all other stripes, it'll eventually
>   be a raid5 xor rebuild,
> - if the 1st fails, the 2nd retry will deliberately fail parity p so
>   that it will do raid6 style rebuild,
>
> however, the chances are that another non-parity stripe content also
> has something corrupted, so that the above retries are not able to
> return correct content, and users will think of this as data loss.
> More seriously, if the loss happens on some important internal btree
> roots, it could refuse to mount.
>
> This extends btrfs to do more retries and each retry fails only one
> stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> more failure besides the failure on which we're recovering, this can
> always work.
>
> The worst case is to retry as many times as the number of raid6 disks,
> but given the fact that such a scenario is really rare in practice,
> it's still acceptable.
>
> Signed-off-by: Liu Bo <bo.li....@oracle.com>

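For anyone following along, the retry order described in the changelog can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions, not the kernel code: the names parity(), rebuild() and recover() are made up for illustration, zlib.crc32 stands in for the crc32c that btrfs uses, the stripes are four bytes long, and P/Q are computed with the usual GF(2^8) arithmetic (polynomial 0x11d, generator 2).

```python
import zlib

# GF(2^8) antilog/log tables (polynomial 0x11d, generator 2).
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gdiv(a, b):  # b must be non-zero
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]

def parity(data):
    """Return (P, Q) for a list of equally sized data stripes."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, d in enumerate(data):
        for k in range(n):
            p[k] ^= d[k]
            q[k] ^= gmul(EXP[i], d[k])
    return bytes(p), bytes(q)

def rebuild(data, p, q, target, also_failed):
    """Rebuild data stripe `target`, ignoring `also_failed` as well.

    `also_failed` is 'Q', 'P', or the index of another data stripe
    (hypothetical encoding, chosen for this sketch only).
    """
    out = bytearray(len(p))
    for k in range(len(p)):
        pxy, qxy = p[k], q[k]
        for i, d in enumerate(data):
            if i != target and i != also_failed:
                pxy ^= d[k]
                qxy ^= gmul(EXP[i], d[k])
        if also_failed == 'Q':      # plain raid5-style xor rebuild
            out[k] = pxy
        elif also_failed == 'P':    # rebuild from Q only
            out[k] = gdiv(qxy, EXP[target])
        else:                       # two data stripes missing: use P and Q
            y = also_failed
            out[k] = gdiv(qxy ^ gmul(EXP[y], pxy), EXP[target] ^ EXP[y])
    return bytes(out)

def recover(data, p, q, target, want_csum):
    """Retry the rebuild, failing exactly one other stripe per attempt."""
    candidates = ['Q', 'P'] + [i for i in range(len(data)) if i != target]
    for tries, other in enumerate(candidates, 1):
        got = rebuild(data, p, q, target, other)
        if zlib.crc32(got) == want_csum:
            return got, tries
    return None, len(candidates)

# Demo: stripe 1 fails its checksum and stripe 3 is silently corrupted too.
good = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p, q = parity(good)
csum = zlib.crc32(good[1])
bad = list(good)
bad[1] = b"xxxx"
bad[3] = b"yyyy"
result, tries = recover(bad, p, q, 1, csum)
print(result, tries)
```

In this demo the first two attempts (the xor rebuild and the rebuild with P failed) both consume the corrupted stripe 3 and so fail the checksum; the loop only returns good content once it reaches the attempt that treats stripe 3 as the second failure. The number of attempts is bounded by the number of stripes, which matches the worst case mentioned in the changelog.
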
1 and added to for-next.