On 2018年01月29日 19:21, Nikolay Borisov wrote: > > > On 29.01.2018 13:09, Qu Wenruo wrote: >> >> >> On 2018年01月29日 15:44, Nikolay Borisov wrote: >>> Running generic/019 with qgroups on the scratch device enabled is >>> almost guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's >>> supposed to trigger only on -ENOMEM, in reality, however, it's possible >>> to get -EIO from btrfs_qgroup_trace_extent_post. This function just >>> finds the roots of the extent being tracked and sets the qrecord->old_roots >>> list. If this operation fails nothing critical happens except the >>> quota accounting can be considered wrong. In such case just set the >>> INCONSISTENT flag for the quota and print a warning. >>> >>> Signed-off-by: Nikolay Borisov <nbori...@suse.com> >>> --- >>> >>> V2: >>> * Always print a warning if btrfs_qgroup_trace_extent_post fails >>> * Set quota inconsistent flag if btrfs_qgroup_trace_extent_post fails >>> >>> fs/btrfs/delayed-ref.c | 7 +++++-- >>> fs/btrfs/qgroup.c | 6 ++++-- >>> 2 files changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c >>> index a1a40cf382e3..5b2789a28a13 100644 >>> --- a/fs/btrfs/delayed-ref.c >>> +++ b/fs/btrfs/delayed-ref.c >>> @@ -820,8 +820,11 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info >>> *fs_info, >>> num_bytes, parent, ref_root, level, action); >>> spin_unlock(&delayed_refs->lock); >>> >>> - if (qrecord_inserted) >>> - return btrfs_qgroup_trace_extent_post(fs_info, record); >>> + if (qrecord_inserted) { >>> + int ret = btrfs_qgroup_trace_extent_post(fs_info, record); >>> + if (ret < 0) >>> + btrfs_warn(fs_info, "Error accounting new delayed refs >>> extent (err code: %d). Quota inconsistent", ret); >> >> Sorry that I didn't point it out in previous review, there are 2 callers >> in delayed-ref.c using btrfs_qgroup_trace_extent_post(). >> >> One is the one you're fixing, and the other one is >> btrfs_add_delayed_data_ref(). > > Yes, but the callers of btrfs_add_delayed_data_ref seem to be expecting > error values and actually handling them.
Not exactly. A quick search leads to extra unhandled btrfs_add_delayed_data_ref(). walk_down_proc() |- btrfs_dec_ref() |- __btrfs_mod_ref() |- btrfs_free_extent() |- btrfs_add_delayed_data_ref() |- btrfs_qgroup_trace_extent_post() And this leads to another BUG_ON(). > So a failure doesn't > necessarily mean the fs is in inconsistent state. But at the cost of aborting current transaction. > >> >> So it would be even better if the warning message can be integrated into >> btrfs_qgroup_trace_extent_post(). > > See below why I don't think integrating the warning is a good idea. In > the case being modified in this patch we will continue operating > normally, hence the warning and INCONSISTENT flag make sense. > >> >> Also btrfs_qgroup_trace_extent_post() also needs to ignore the return >> value from btrfs_qgroup_trace_extent_post(). > > I don't think so, if we are able to handle failures as is the case in > the delayed_data_ref case then we might abort the current transaction > and this should leave the fs in a consistent state. Here comes the trade-off. Keep the on-disk data consistent while abort current transaction and make fs read-only. VS Make the fs continue running while just discard the qgroup data. Although the truth is, either way we may eventually goes abort_transaction() since we failed to read some tree blocks. But since there are still some BUG_ON() wondering around the wild, the latter one seems a little better. > In that case even > the "STATUS_FLAG_INCONSISTENT" being set in qgroup_trace_extent_post > might be "wrong" in the sense that a failure from this function doesn't > necessarily make the quota inconsistent if upper layers can handle the > failures and revert their work. Well, most of them just abort the transaction and leads to above trade-off. Unfortunately, there is not much thing we can do in error handler. :( Thanks, Qu > So I'm now starting to think that the > inconsistent flag should be set in add_delayed_tree_ref, but this sort > of leaks the qgroup implementation detail into the delayed tree ref > function... >> >> Thanks, >> Qu >> >>> + } >>> return 0; >>> >>> free_head_ref: >>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >>> index b2ab5f795816..33f9dba44e92 100644 >>> --- a/fs/btrfs/qgroup.c >>> +++ b/fs/btrfs/qgroup.c >>> @@ -1440,8 +1440,10 @@ int btrfs_qgroup_trace_extent_post(struct >>> btrfs_fs_info *fs_info, >>> int ret; >>> >>> ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false); >>> - if (ret < 0) >>> + if (ret < 0) { >>> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; >>> return ret; >>> + } >>> >>> /* >>> * Here we don't need to get the lock of >>> @@ -2933,7 +2935,7 @@ static int __btrfs_qgroup_release_data(struct inode >>> *inode, >>> if (free && reserved) >>> return qgroup_free_reserved_data(inode, reserved, start, len); >>> extent_changeset_init(&changeset); >>> - ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, >>> + ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, >>> start + len -1, EXTENT_QGROUP_RESERVED, &changeset); >>> if (ret < 0) >>> goto out; >>> >>
signature.asc
Description: OpenPGP digital signature