On 2018/11/19 下午11:24, Filipe Manana wrote:
> On Mon, Nov 19, 2018 at 2:48 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>>
>>
>> On 2018/11/19 下午10:15, fdman...@kernel.org wrote:
>>> From: Filipe Manana <fdman...@suse.com>
>>>
>>> If the quota enable and snapshot creation ioctls are called concurrently
>>> we can get into a deadlock where the task enabling quotas will deadlock
>>> on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
>>> twice, or the task creating a snapshot tries to commit the transaction
>>> while the task enabling quota waits for the former task to commit the
>>> transaction while holding the mutex. The following time diagrams show how
>>> both cases happen.
>>>
>>> First scenario:
>>>
>>>            CPU 0                                    CPU 1
>>>
>>>  btrfs_ioctl()
>>>   btrfs_ioctl_quota_ctl()
>>>    btrfs_quota_enable()
>>>     mutex_lock(fs_info->qgroup_ioctl_lock)
>>>     btrfs_start_transaction()
>>>
>>>                                              btrfs_ioctl()
>>>                                               btrfs_ioctl_snap_create_v2
>>>                                                create_snapshot()
>>>                                                 --> adds snapshot to the
>>>                                                     list pending_snapshots
>>>                                                     of the current
>>>                                                     transaction
>>>
>>>     btrfs_commit_transaction()
>>>      create_pending_snapshots()
>>>        create_pending_snapshot()
>>>         qgroup_account_snapshot()
>>>          btrfs_qgroup_inherit()
>>>          mutex_lock(fs_info->qgroup_ioctl_lock)
>>>           --> deadlock, mutex already locked
>>>               by this task at
>>>               btrfs_quota_enable()
>>>
>>> Second scenario:
>>>
>>>            CPU 0                                    CPU 1
>>>
>>>  btrfs_ioctl()
>>>   btrfs_ioctl_quota_ctl()
>>>    btrfs_quota_enable()
>>>     mutex_lock(fs_info->qgroup_ioctl_lock)
>>>     btrfs_start_transaction()
>>>
>>>                                              btrfs_ioctl()
>>>                                               btrfs_ioctl_snap_create_v2
>>>                                                create_snapshot()
>>>                                                 --> adds snapshot to the
>>>                                                     list pending_snapshots
>>>                                                     of the current
>>>                                                     transaction
>>>
>>>                                                 btrfs_commit_transaction()
>>>                                                  --> waits for task at
>>>                                                      CPU 0 to release
>>>                                                      its transaction
>>>                                                      handle
>>>
>>>     btrfs_commit_transaction()
>>>      --> sees another task started
>>>          the transaction commit first
>>>      --> releases its transaction
>>>          handle
>>>      --> waits for the transaction
>>>          commit to be completed by
>>>          the task at CPU 1
>>>
>>>                                                  create_pending_snapshot()
>>>                                                   qgroup_account_snapshot()
>>>                                                    btrfs_qgroup_inherit()
>>>                                                     
>>> mutex_lock(fs_info->qgroup_ioctl_lock)
>>>                                                      --> deadlock, task at 
>>> CPU 0
>>>                                                          has the mutex 
>>> locked but
>>>                                                          it is waiting for 
>>> us to
>>>                                                          finish the 
>>> transaction
>>>                                                          commit
>>>
>>> So fix this by setting the quota enabled flag in fs_info after committing
>>> the transaction at btrfs_quota_enable(). This ends up serializing quota
>>> enable and snapshot creation as if the snapshot creation happened just
>>> before the quota enable request. The quota rescan task, scheduled after
>>> committing the transaction in btrfs_quote_enable(), will do the accounting.
>>>
>>> Fixes: 6426c7ad697d ("btrfs: qgroup: Fix qgroup accounting when creating 
>>> snapshot")
>>> Signed-off-by: Filipe Manana <fdman...@suse.com>
>>> ---
>>>
>>> V2: Added second deadlock example to changelog and changed the fix to deal
>>>     with that case as well.
>>>
>>>  fs/btrfs/qgroup.c | 14 ++++++++++----
>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index d4917c0cddf5..ae1358253b7b 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -1013,16 +1013,22 @@ int btrfs_quota_enable(struct btrfs_fs_info 
>>> *fs_info)
>>>               btrfs_abort_transaction(trans, ret);
>>>               goto out_free_path;
>>>       }
>>> -     spin_lock(&fs_info->qgroup_lock);
>>> -     fs_info->quota_root = quota_root;
>>> -     set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
>>> -     spin_unlock(&fs_info->qgroup_lock);
>>>
>>>       ret = btrfs_commit_transaction(trans);
>>>       trans = NULL;
>>>       if (ret)
>>>               goto out_free_path;
>>
>> The main concern here is, if we don't set qgroup enabled bit before we
>> commit transaction, there will be a window where new tree modification
>> could happen before QGROUP_ENABLED bit set.
> 
> That doesn't seem to make much sense to me, if I understood correctly.
> Essentially you're saying stuff done to any tree in the the
> transaction we use to
> enable quotas must be accounted for. In that case the quota enabled bit should
> be done as soon as the transaction is started, because before we set
> it and after
> we started (or joined) a transaction, a lot could of modifications may
> have happened.
> Nevertheless I don't think it's unexpected for anyone to have the
> accounting happening
> only after the quota enable ioctl returns success.

The problem is not accounting, the qgroup number won't cause problem.

It's the reserved space. Like some dirty pages are dirtied before quota
enabled, but at run_dealloc() time quota is enabled.

For such case we have io_tree based method to avoid underflow so it
should be OK.

So v2 patch looks OK.

Thanks,
Qu

> 
>>
>> There may be some qgroup reserved space related problem in such case,
>> but I'm not 100% sure to foresee such problem.
>>
>>
>> The best way to do this is, commit trans first, and before any other one
>> trying to start transaction, we set the bit.
>> However I can't find such infrastructure now (I still remember we used
>> to have a pending bit to change quota enabled flag, but removed later)
>>
>> Thanks,
>> Qu
>>
>>>
>>> +     /*
>>> +      * Set quota enabled flag after committing the transaction, to avoid
>>> +      * deadlocks on fs_info->qgroup_ioctl_lock with concurrent snapshot
>>> +      * creation.
>>> +      */
>>> +     spin_lock(&fs_info->qgroup_lock);
>>> +     fs_info->quota_root = quota_root;
>>> +     set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
>>> +     spin_unlock(&fs_info->qgroup_lock);
>>> +
>>>       ret = qgroup_rescan_init(fs_info, 0, 1);
>>>       if (!ret) {
>>>               qgroup_rescan_zero_tracking(fs_info);
>>>
>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to