On 4/23/18 5:43 PM, David Sterba wrote: > On Tue, Apr 17, 2018 at 02:45:33PM -0400, Jeff Mahoney wrote: >> On a file system with many snapshots and qgroups enabled, an interrupted >> balance can end up taking a long time to mount due to recovering the >> relocations during mount. It does this in the task performing the mount, >> which can't be interrupted and may prevent mounting (and systems booting) >> for a long time as well. The thing is that as part of balance, this >> runs in the background all the time. This patch pushes the recovery >> into a helper thread and allows the mount to continue normally. We hold >> off on resuming any paused balance operation until after the relocation >> has completed, disallow any new balance operations if it's running, and >> wait for it on umount and remounting read-only. > > The overall logic sounds ok.
Thanks for the review. I've updated the style issues in my patch and removed them from the quote below. > So, this can also stall the umount, right? Eg. if I start mount, > relocation goes to background, then unmount will have to wait for > completion. Yep, I didn't try to solve that problem since the file system wouldn't even mount before. Makes sense to make it unmountable, though. That's a change that would probably speed up btrfs balance cancel as well. > Balance pause is requested at umount time, something similar should be > possible for relocation recovery. The fs_state bit CLOSING could be > checked somewhere in the loop. An earlier version had that check in the top of the loop in merge_reloc_roots, but I think a better spot would be the top of the merge_reloc_root loop. >> This doesn't do anything to address the relocation recovery operation >> taking a long time but does allow the file system to mount. >> >> Signed-off-by: Jeff Mahoney <je...@suse.com> >> --- >> fs/btrfs/ctree.h | 7 +++ >> fs/btrfs/disk-io.c | 7 ++- >> fs/btrfs/relocation.c | 92 >> +++++++++++++++++++++++++++++++++++++++++--------- >> fs/btrfs/super.c | 5 +- >> fs/btrfs/volumes.c | 6 +++ >> 5 files changed, 97 insertions(+), 20 deletions(-) >> >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -1052,6 +1052,10 @@ struct btrfs_fs_info { >> struct btrfs_work qgroup_rescan_work; >> bool qgroup_rescan_running; /* protected by qgroup_rescan_lock */ >> >> + /* relocation recovery items */ >> + bool relocation_recovery_started; >> + struct completion relocation_recovery_completion; > > This seems to copy the pattern of qgroup rescan, the completion + > workqueue. I'm planning to move this scheme to the fs_state bit instead > of bool and the wait_for_war with global workqueue, but for now we can > leave as it is here. Such that we just put these jobs on a workqueue instead? >> + if (err == 0) { >> + struct btrfs_root *fs_root; >> + >> + /* cleanup orphan inode in data relocation tree */ >> + fs_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID); >> + if (IS_ERR(fs_root)) >> + err = PTR_ERR(fs_root); >> + else >> + err = btrfs_orphan_cleanup(fs_root); >> + } >> + mutex_unlock(&fs_info->cleaner_mutex); >> + clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags); > > The part that sets the bit is in the caller, btrfs_recover_relocation, > but this can race if the kthread is too fast. > > btrfs_recover_relocation > start kthread with btrfs_resume_relocation > clear_bit > set_bit > ... > > now we're stuck with the EXCL_OP set without any operation actually running. > > The bit can be set right before the kthread is started and cleared > inside. There's no opportunity to race since the thread can't run until btrfs_recover_relocation returns and releases the cleaner mutex. >> @@ -4620,16 +4670,21 @@ int btrfs_recover_relocation(struct btrf >> if (err) >> goto out_free; >> >> - merge_reloc_roots(rc); >> - >> - unset_reloc_control(rc); >> - >> - trans = btrfs_join_transaction(rc->extent_root); >> - if (IS_ERR(trans)) { >> - err = PTR_ERR(trans); >> + tsk = kthread_run(btrfs_resume_relocation, fs_info, >> + "relocation-recovery"); > > Would be good to name it 'btrfs-reloc-recovery', ie with btrfs in the > name so it's easy greppable from the process list. Right. In an earlier version, I was using a btrfs_worker so that was added automatically. >> + if (IS_ERR(tsk)) { >> + err = PTR_ERR(tsk); >> goto out_free; >> } >> - err = btrfs_commit_transaction(trans); >> + >> + fs_info->relocation_recovery_started = true; >> + >> + /* protected from racing with resume thread by the cleaner_mutex */ >> + init_completion(&fs_info->relocation_recovery_completion); >> + >> + set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags); > > The 2nd comment above refers to this. The response too. -Jeff -- Jeff Mahoney SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html