On 2018-01-29 19:21, Nikolay Borisov wrote:
> 
> 
> On 29.01.2018 13:09, Qu Wenruo wrote:
>>
>>
>> On 2018-01-29 15:44, Nikolay Borisov wrote:
>>> Running generic/019 with qgroups on the scratch device enabled is
>>> almost guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's
>>> supposed to trigger only on -ENOMEM; in reality, however, it's possible
>>> to get -EIO from btrfs_qgroup_trace_extent_post. This function just
>>> finds the roots of the extent being tracked and sets the qrecord->old_roots
>>> list. If this operation fails nothing critical happens except the
>>> quota accounting can be considered wrong. In such a case, just set the
>>> INCONSISTENT flag for the quota and print a warning.
>>>
>>> Signed-off-by: Nikolay Borisov <nbori...@suse.com>
>>> ---
>>>
>>> V2: 
>>>  * Always print a warning if btrfs_qgroup_trace_extent_post fails 
>>>  * Set quota inconsistent flag if btrfs_qgroup_trace_extent_post fails
>>>
>>>  fs/btrfs/delayed-ref.c | 7 +++++--
>>>  fs/btrfs/qgroup.c      | 6 ++++--
>>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index a1a40cf382e3..5b2789a28a13 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -820,8 +820,11 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>>>                          num_bytes, parent, ref_root, level, action);
>>>     spin_unlock(&delayed_refs->lock);
>>>  
>>> -   if (qrecord_inserted)
>>> -           return btrfs_qgroup_trace_extent_post(fs_info, record);
>>> +   if (qrecord_inserted) {
>>> +           int ret = btrfs_qgroup_trace_extent_post(fs_info, record);
>>> +           if (ret < 0)
>>> +                   btrfs_warn(fs_info, "Error accounting new delayed refs extent (err code: %d). Quota inconsistent", ret);
>>
>> Sorry that I didn't point this out in the previous review: there are 2
>> callers in delayed-ref.c using btrfs_qgroup_trace_extent_post().
>>
>> One is the one you're fixing, and the other one is
>> btrfs_add_delayed_data_ref().
> 
> Yes, but the callers of btrfs_add_delayed_data_ref seem to be expecting
> error values and actually handling them.

Not exactly.

A quick search turns up another path where the return value of
btrfs_add_delayed_data_ref() goes unhandled:

walk_down_proc()
|- btrfs_dec_ref()
   |- __btrfs_mod_ref()
      |- btrfs_free_extent()
         |- btrfs_add_delayed_data_ref()
            |- btrfs_qgroup_trace_extent_post()

And this leads to another BUG_ON().

> So a failure doesn't
> necessarily mean the fs is in inconsistent state.

But at the cost of aborting the current transaction.

> 
>>
>> So it would be even better if the warning message can be integrated into
>> btrfs_qgroup_trace_extent_post().
> 
> See below why I don't think integrating the warning is a good idea. In
> the case being modified in this patch we will continue operating
> normally, hence the warning and INCONSISTENT flag make sense.
> 
>>
>> Also, btrfs_add_delayed_data_ref() needs to ignore the return
>> value from btrfs_qgroup_trace_extent_post().
> 
> I don't think so, if we are able to handle failures as is the case in
> the delayed_data_ref case then we might abort the current transaction
> and this should leave the fs in a consistent state.

Here comes the trade-off.

Keep the on-disk data consistent, at the cost of aborting the current
transaction and making the fs read-only.

VS

Let the fs continue running and just discard the qgroup data.


Although the truth is, either way we may eventually end up in
abort_transaction(), since we failed to read some tree blocks.

But since there are still some BUG_ON()s wandering around in the wild, the
latter option seems a little better.

> In that case even
> the "STATUS_FLAG_INCONSISTENT" being set in qgroup_trace_extent_post
> might be "wrong" in the sense that a failure from this function doesn't
> necessarily make the quota inconsistent if upper layers can handle the
> failures and revert their work.

Well, most of them just abort the transaction, which leads to the above
trade-off.

Unfortunately, there is not much we can do in the error handler. :(

Thanks,
Qu

> So I'm now starting to think that the
> inconsistent flag should be set in add_delayed_tree_ref, but this sort
> of leaks the qgroup implementation detail into the delayed tree ref
> function...
>>
>> Thanks,
>> Qu
>>
>>> +   }
>>>     return 0;
>>>  
>>>  free_head_ref:
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index b2ab5f795816..33f9dba44e92 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -1440,8 +1440,10 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info,
>>>     int ret;
>>>  
>>>     ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false);
>>> -   if (ret < 0)
>>> +   if (ret < 0) {
>>> +           fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>>>             return ret;
>>> +   }
>>>  
>>>     /*
>>>      * Here we don't need to get the lock of
>>> @@ -2933,7 +2935,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode,
>>>     if (free && reserved)
>>>             return qgroup_free_reserved_data(inode, reserved, start, len);
>>>     extent_changeset_init(&changeset);
>>> -   ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
>>> +   ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
>>>                     start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
>>>     if (ret < 0)
>>>             goto out;
>>>
>>
