On 2018年06月28日 21:26, Anand Jain wrote: > > > On 06/28/2018 04:02 PM, Qu Wenruo wrote: >> Also CC Anand Jain, since he is also working on various device related >> work, and an expert on this. >> >> Especially I'm not pretty sure if such enhancement is already on his >> agenda or there are already some unmerged patch for this. > > Right, some of the patches are already in the ML and probably it needs > a revival. Unless there are new challenges, my target is to consolidate > them and get them integrated by this year. I am trying harder. > > I think its a good idea to report the write-hole status to the > user-land instead of failing the mount, so that a script can trigger > the re-silver as required by the use case. I introduced > fs_devices::volume_flags, which is under review as of now, and have > plans to add the volume degraded state bit which can be accessed by > the sysfs. So yes this is been taken care.
Glad to hear that! However I found it pretty hard to trace your latest work, would you mind to share the git repo and branch you're working on? Maybe I could take some time to do some review. Thanks, Qu > > > Thanks, Anand > > >> Thanks, >> Qu >> >> On 2018年06月28日 15:04, Qu Wenruo wrote: >>> There is a reporter considering btrfs raid1 has a major design flaw >>> which can't handle nodatasum files. >>> >>> Despite his incorrect expectation, btrfs indeed doesn't handle device >>> generation mismatch well. >>> >>> This means if one devices missed and re-appeared, even its generation >>> no longer matches with the rest device pool, btrfs does nothing to it, >>> but treat it as normal good device. >>> >>> At least let's detect such generation mismatch and avoid mounting the >>> fs. >>> Currently there is no automatic rebuild yet, which means if users find >>> device generation mismatch error message, they can only mount the fs >>> using "device" and "degraded" mount option (if possible), then replace >>> the offending device to manually "rebuild" the fs. >>> >>> Signed-off-by: Qu Wenruo <w...@suse.com> >>> --- >>> I totally understand that, generation based solution can't handle >>> split-brain case (where 2 RAID1 devices get mounted degraded separately) >>> at all, but at least let's handle what we can do. >>> >>> The best way to solve the problem is to make btrfs treat such lower gen >>> devices as some kind of missing device, and queue an automatic scrub for >>> that device. >>> But that's a lot of extra work, at least let's start from detecting such >>> problem first. >>> --- >>> fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 50 insertions(+) >>> >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index e034ad9e23b4..80a7c44993bc 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -6467,6 +6467,49 @@ static void btrfs_report_missing_device(struct >>> btrfs_fs_info *fs_info, >>> devid, uuid); >>> } >>> +static int verify_devices_generation(struct btrfs_fs_info *fs_info, >>> + struct btrfs_device *dev) >>> +{ >>> + struct btrfs_fs_devices *fs_devices = dev->fs_devices; >>> + struct btrfs_device *cur; >>> + bool warn_only = false; >>> + int ret = 0; >>> + >>> + if (!fs_devices || fs_devices->seeding || !dev->generation) >>> + return 0; >>> + >>> + /* >>> + * If we're not replaying log, we're completely safe to allow >>> + * generation mismatch as it won't write anything to disks, nor >>> + * remount to rw. >>> + */ >>> + if (btrfs_test_opt(fs_info, NOLOGREPLAY)) >>> + warn_only = true; >>> + >>> + rcu_read_lock(); >>> + list_for_each_entry_rcu(cur, &fs_devices->devices, dev_list) { >>> + if (cur->generation && cur->generation != dev->generation) { >>> + if (warn_only) { >>> + btrfs_warn_rl_in_rcu(fs_info, >>> + "devid %llu has unexpected generation, has %llu expected %llu", >>> + dev->devid, >>> + dev->generation, >>> + cur->generation); >>> + } else { >>> + btrfs_err_rl_in_rcu(fs_info, >>> + "devid %llu has unexpected generation, has %llu expected %llu", >>> + dev->devid, >>> + dev->generation, >>> + cur->generation); >>> + ret = -EINVAL; > > > > >>> + break; >>> + } >>> + } >>> + } >>> + rcu_read_unlock(); >>> + return ret; >>> +} >>> + >>> static int read_one_chunk(struct btrfs_fs_info *fs_info, struct >>> btrfs_key *key, >>> struct extent_buffer *leaf, >>> struct btrfs_chunk *chunk) >>> @@ -6552,6 +6595,13 @@ static int read_one_chunk(struct btrfs_fs_info >>> *fs_info, struct btrfs_key *key, >>> return PTR_ERR(map->stripes[i].dev); >>> } >>> btrfs_report_missing_device(fs_info, devid, uuid, false); >>> + } else { >>> + ret = verify_devices_generation(fs_info, >>> + map->stripes[i].dev); >>> + if (ret < 0) { >>> + free_extent_map(em); >>> + return ret; >>> + } >>> } >>> set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, >>> &(map->stripes[i].dev->dev_state)); >>> >>
signature.asc
Description: OpenPGP digital signature