Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)

Jeff Mahoney Fri, 04 May 2018 06:33:13 -0700

On 5/4/18 1:59 AM, Nikolay Borisov wrote:
> 
> 
> On  4.05.2018 01:27, Jeff Mahoney wrote:
>> On 5/3/18 2:23 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On  3.05.2018 00:11, je...@suse.com wrote:
>>>> From: Jeff Mahoney <je...@suse.com>
>>>>
>>>> Hi Dave -
>>>>
>>>> Here's the updated patchset for the rescan races.  This fixes the issue
>>>> where we'd try to start multiple workers.  It introduces a new "ready"
>>>> bool that we set during initialization and clear while queuing the worker.
>>>> The queuer is also now responsible for most of the initialization.
>>>>
>>>> I have a separate patch set started that gets rid of the racy mess
>>>> surrounding the rescan worker startup.  We can handle it in
>>>> btrfs_run_qgroups and just set a flag to start it everywhere else.
>>> I'd be interested in seeing those patches. Some time ago I sent a
>>> patch which cleaned up the way qgroup rescan was initiated. It was done
>>> from "btrfs_run_qgroups" and I think this is messy. Whatever we do, we
>>> really ought to have well-defined semantics for when qgroup rescans are
>>> run; preferably we shouldn't be conflating rescan + run (unless there is
>>> a _really_ good reason to do so). In the past the rescan from scan was
>>> used only during qgroup enabling.
>>
>> I think btrfs_run_qgroups is the place to do it.  Here's why:
>>
>> int
>> btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>> {
>>         int ret = 0;
>>         struct btrfs_trans_handle *trans;
>>
>>         ret = qgroup_rescan_init(fs_info, 0, 1);
>>         if (ret)
>>                 return ret;
>>
>>         /*
>>          * We have set the rescan_progress to 0, which means no more
>>          * delayed refs will be accounted by btrfs_qgroup_account_ref.
>>          * However, btrfs_qgroup_account_ref may be right after its call
>>          * to btrfs_find_all_roots, in which case it would still do the
>>          * accounting.
>>          * To solve this, we're committing the transaction, which will
>>          * ensure we run all delayed refs and only after that, we are
>>          * going to clear all tracking information for a clean start.
>>          */
>>
>>         trans = btrfs_join_transaction(fs_info->fs_root);
>>         if (IS_ERR(trans)) {
>>                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>                 return PTR_ERR(trans);
>>         }
>>         ret = btrfs_commit_transaction(trans);
>>         if (ret) {
>>                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>                 return ret;
>>         }
>>
>>         qgroup_rescan_zero_tracking(fs_info);
>>
>>         queue_rescan_worker(fs_info);
>>         return 0;
>> }
>>
>> The delayed ref race should exist anywhere we initiate a rescan outside of
>> initially enabling qgroups.  We already zero the tracking and queue the
>> rescan worker in btrfs_run_qgroups when we enable qgroups.  Why not just
>> always queue the worker there, so that initialization and execution have a
>> clear starting point?
> 
> This is no longer true upstream as of commit 5d23515be669 ("btrfs:
> Move qgroup rescan on quota enable to btrfs_quota_enable"), hence my
> asking about this. I guess making it unconditional won't increase the
> complexity, but the original code, which only ran during qgroup enable,
> was rather iffy and I just don't want to repeat that.
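
The "ready" handshake described in the v3 cover letter quoted above can be modelled roughly as follows. This is a minimal user-space sketch under stated assumptions, not the patch itself: struct rescan_state, rescan_init() and rescan_finish() are hypothetical names, a pthread mutex stands in for the kernel's rescan lock, queuing real work is reduced to a counter, and queue_rescan_worker() here only borrows the name of the kernel function quoted in this thread. It assumes initialization and the test-and-clear of "ready" happen under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the rescan state; the names are
 * illustrative and are not the kernel's struct btrfs_fs_info fields. */
struct rescan_state {
	pthread_mutex_t lock;  /* stands in for the rescan lock */
	bool running;          /* models BTRFS_QGROUP_STATUS_FLAG_RESCAN */
	bool ready;            /* set during init, cleared when queuing */
	int workers_queued;    /* how many times a worker was started */
};

/* Initialization (done by the queuer): fail if a rescan is already
 * running, otherwise mark the state running and ready to queue. */
static int rescan_init(struct rescan_state *st)
{
	int ret = 0;

	pthread_mutex_lock(&st->lock);
	if (st->running) {
		ret = -1;  /* busy; the kernel would return an errno here */
	} else {
		st->running = true;
		st->ready = true;
	}
	pthread_mutex_unlock(&st->lock);
	return ret;
}

/* Queue the worker at most once per successful init: "ready" is tested
 * and cleared under the same lock, so a racing second caller sees false. */
static bool queue_rescan_worker(struct rescan_state *st)
{
	bool queued = false;

	pthread_mutex_lock(&st->lock);
	if (st->ready) {
		st->ready = false;
		st->workers_queued++;  /* stand-in for queuing the real work */
		queued = true;
	}
	pthread_mutex_unlock(&st->lock);
	return queued;
}

/* Called when the worker finishes; allows the next rescan to start. */
static void rescan_finish(struct rescan_state *st)
{
	pthread_mutex_lock(&st->lock);
	assert(st->running && !st->ready);  /* worker must have been queued */
	st->running = false;
	pthread_mutex_unlock(&st->lock);
}
```

With `struct rescan_state st = { .lock = PTHREAD_MUTEX_INITIALIZER };`, a second rescan_init() fails while the first rescan is running, and only the first queue_rescan_worker() call after a successful init starts a worker; later callers return false instead of starting a duplicate.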


Ah, ok.  My repo is still using v4.16.  How does this work with the race
that is described in btrfs_qgroup_rescan?
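
The ordering that btrfs_qgroup_rescan relies on (and that queuing the worker from btrfs_run_qgroups would make explicit) can be sketched as a toy model. Everything here is hypothetical: struct fs_model, request_rescan(), commit_transaction(), zero_then_run() and the fixed REF_BYTES per delayed ref are illustrative assumptions, not btrfs code. It only shows the order: run every delayed ref, then zero tracking, then queue the worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each delayed ref adds REF_BYTES to the qgroup
 * tracking when it is run. Names are illustrative, not btrfs APIs. */
#define REF_BYTES 4096

struct fs_model {
	int pending_delayed_refs;  /* refs queued but not yet run */
	long tracked_bytes;        /* qgroup accounting state */
	bool rescan_requested;     /* flag set by callers outside commit */
	bool worker_queued;
};

/* Anywhere outside the commit path: only record the request. */
static void request_rescan(struct fs_model *fs)
{
	fs->rescan_requested = true;
}

static void run_delayed_refs(struct fs_model *fs)
{
	while (fs->pending_delayed_refs > 0) {
		fs->pending_delayed_refs--;
		fs->tracked_bytes += REF_BYTES;
	}
}

/* Commit-time hook (standing in for btrfs_run_qgroups): all delayed refs
 * are run first, so no ref that was already past its accounting check can
 * touch the tracking after it has been zeroed. */
static void commit_transaction(struct fs_model *fs)
{
	run_delayed_refs(fs);
	if (fs->rescan_requested) {
		assert(fs->pending_delayed_refs == 0);
		fs->rescan_requested = false;
		fs->tracked_bytes = 0;     /* like qgroup_rescan_zero_tracking() */
		fs->worker_queued = true;  /* like queue_rescan_worker() */
	}
}

/* The racy alternative: zero first, then let in-flight refs run. */
static void zero_then_run(struct fs_model *fs)
{
	fs->tracked_bytes = 0;
	run_delayed_refs(fs);  /* stale refs re-pollute the fresh state */
}
```

zero_then_run() shows the failure mode described in the comment in btrfs_qgroup_rescan: a ref that was already in flight when tracking was cleared still gets accounted afterwards, so the zeroed state is dirty before the rescan even starts.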

-Jeff

-- 
Jeff Mahoney
SUSE Labs
