On 4/23/18 5:43 PM, David Sterba wrote:
> On Tue, Apr 17, 2018 at 02:45:33PM -0400, Jeff Mahoney wrote:
>> On a file system with many snapshots and qgroups enabled, an interrupted
>> balance can end up taking a long time to mount due to recovering the
>> relocations during mount.  It does this in the task performing the mount,
>> which can't be interrupted and may prevent mounting (and systems booting)
>> for a long time as well.  The thing is that as part of balance, this
>> runs in the background all the time.  This patch pushes the recovery
>> into a helper thread and allows the mount to continue normally.  We hold
>> off on resuming any paused balance operation until after the relocation
>> has completed, disallow any new balance operations if it's running, and
>> wait for it on umount and remounting read-only.
> 
> The overall logic sounds ok.

Thanks for the review.  I've updated the style issues in my patch and
removed them from the quote below.

> So, this can also stall the umount, right? Eg. if I start mount,
> relocation goes to background, then unmount will have to wait for
> completion.

Yep, I didn't try to solve that problem since the file system wouldn't
even mount before.  Makes sense to make it unmountable, though.  That's
a change that would probably speed up btrfs balance cancel as well.

> Balance pause is requested at umount time, something similar should be
> possible for relocation recovery. The fs_state bit CLOSING could be
> checked somewhere in the loop.

An earlier version had that check in the top of the loop in
merge_reloc_roots, but I think a better spot would be the top of the
merge_reloc_root loop.

>> This doesn't do anything to address the relocation recovery operation
>> taking a long time but does allow the file system to mount.
>>
>> Signed-off-by: Jeff Mahoney <je...@suse.com>
>> ---
>>  fs/btrfs/ctree.h      |    7 +++
>>  fs/btrfs/disk-io.c    |    7 ++-
>>  fs/btrfs/relocation.c |   92 
>> +++++++++++++++++++++++++++++++++++++++++---------
>>  fs/btrfs/super.c      |    5 +-
>>  fs/btrfs/volumes.c    |    6 +++
>>  5 files changed, 97 insertions(+), 20 deletions(-)
>>
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1052,6 +1052,10 @@ struct btrfs_fs_info {
>>      struct btrfs_work qgroup_rescan_work;
>>      bool qgroup_rescan_running;     /* protected by qgroup_rescan_lock */
>>  
>> +    /* relocation recovery items */
>> +    bool relocation_recovery_started;
>> +    struct completion relocation_recovery_completion;
> 
> This seems to copy the pattern of qgroup rescan, the completion +
> workqueue. I'm planning to move this scheme to the fs_state bit instead
> of bool and the wait_for_war with global workqueue, but for now we can
> leave as it is here.

Such that we just put these jobs on a workqueue instead?

>> +    if (err == 0) {
>> +            struct btrfs_root *fs_root;
>> +
>> +            /* cleanup orphan inode in data relocation tree */
>> +            fs_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
>> +            if (IS_ERR(fs_root))
>> +                    err = PTR_ERR(fs_root);
>> +            else
>> +                    err = btrfs_orphan_cleanup(fs_root);
>> +    }
>> +    mutex_unlock(&fs_info->cleaner_mutex);
>> +    clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> 
> The part that sets the bit is in the caller, btrfs_recover_relocation,
> but this can race if the kthread is too fast.
> 
> btrfs_recover_relocation
>       start kthread with btrfs_resume_relocation
>               clear_bit
>       set_bit
>       ...
> 
> now we're stuck with the EXCL_OP set without any operation actually running.
> 
> The bit can be set right before the kthread is started and cleared
> inside.

There's no opportunity to race since the thread can't run until
btrfs_recover_relocation returns and releases the cleaner mutex.

>> @@ -4620,16 +4670,21 @@ int btrfs_recover_relocation(struct btrf
>>      if (err)
>>              goto out_free;
>>  
>> -    merge_reloc_roots(rc);
>> -
>> -    unset_reloc_control(rc);
>> -
>> -    trans = btrfs_join_transaction(rc->extent_root);
>> -    if (IS_ERR(trans)) {
>> -            err = PTR_ERR(trans);
>> +    tsk = kthread_run(btrfs_resume_relocation, fs_info,
>> +                      "relocation-recovery");
> 
> Would be good to name it 'btrfs-reloc-recovery', ie with btrfs in the
> name so it's easy greppable from the process list.

Right.  In an earlier version, I was using a btrfs_worker so that was
added automatically.

>> +    if (IS_ERR(tsk)) {
>> +            err = PTR_ERR(tsk);
>>              goto out_free;
>>      }
>> -    err = btrfs_commit_transaction(trans);
>> +
>> +    fs_info->relocation_recovery_started = true;
>> +
>> +    /* protected from racing with resume thread by the cleaner_mutex */
>> +    init_completion(&fs_info->relocation_recovery_completion);
>> +
>> +    set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> 
> The 2nd comment above refers to this.

The response too.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to