On Tue, Apr 12, 2016 at 8:35 AM, Qu Wenruo <quwen...@cn.fujitsu.com>
wrote:
Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.
Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
inside btrfs_commit_transaction() just be commit_cowonly_roots().
However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.
In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
======
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384,
nr_old_roots = 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1,
rfer = 0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1,
rfer = 16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384,
nr_old_roots = 0, nr_new_roots = 0
======
The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.
But at second qgroup_account_extent(), its reference got decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got ignored by
qgroup, which means this extent is wrongly accounted.
Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.
Reported-by: Mark Fasheh <mfas...@suse.de>
Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
changelog:
v2:
Fix a soft lockup caused by missing switch_commit_root() call.
Fix a warning caused by dirty-but-not-committed root.
Note:
This may be the dirtiest hack I have ever done.
I don't like it either. But, more importantly, I don't think this is
correct. See below.
As there are already several different judgment to check if a fs root
should be updated. From root->last_trans to root->commit_root ==
root->node.
With this patch, we must switch the root of at least related fs tree
and extent tree to allow qgroup to call
btrfs_qgroup_account_extents().
But this will break some transid judgement, as transid is already
updated to current transid.
(maybe we need a special sub-transid for qgroup use only?)
As long as current qgroup use commit_root to determine old_roots,
there is no better idea though.
---
fs/btrfs/transaction.c | 96
+++++++++++++++++++++++++++++++++++++-------------
1 file changed, 71 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 43885e5..0f299a56 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -311,12 +311,13 @@ loop:
* when the transaction commits
*/
static int record_root_in_trans(struct btrfs_trans_handle *trans,
- struct btrfs_root *root)
+ struct btrfs_root *root,
+ int force)
{
- if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
- root->last_trans < trans->transid) {
+ if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
+ root->last_trans < trans->transid) || force) {
WARN_ON(root == root->fs_info->extent_root);
- WARN_ON(root->commit_root != root->node);
+ WARN_ON(root->commit_root != root->node && !force);
/*
* see below for IN_TRANS_SETUP usage rules
@@ -331,7 +332,7 @@ static int record_root_in_trans(struct
btrfs_trans_handle *trans,
smp_wmb();
spin_lock(&root->fs_info->fs_roots_radix_lock);
- if (root->last_trans == trans->transid) {
+ if (root->last_trans == trans->transid && !force) {
spin_unlock(&root->fs_info->fs_roots_radix_lock);
return 0;
}
@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct
btrfs_trans_handle *trans,
return 0;
mutex_lock(&root->fs_info->reloc_mutex);
- record_root_in_trans(trans, root);
+ record_root_in_trans(trans, root, 0);
mutex_unlock(&root->fs_info->reloc_mutex);
return 0;
@@ -1383,7 +1384,7 @@ static noinline int create_pending_snapshot(struct
btrfs_trans_handle *trans,
dentry = pending->dentry;
parent_inode = pending->dir;
parent_root = BTRFS_I(parent_inode)->root;
- record_root_in_trans(trans, parent_root);
+ record_root_in_trans(trans, parent_root, 0);
cur_time = current_fs_time(parent_inode->i_sb);
@@ -1420,7 +1421,7 @@ static noinline int create_pending_snapshot(struct
btrfs_trans_handle *trans,
goto fail;
}
- record_root_in_trans(trans, root);
+ record_root_in_trans(trans, root, 0);
btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
btrfs_check_and_init_root_item(new_root_item);
@@ -1516,6 +1517,62 @@ static noinline int create_pending_snapshot(struct
btrfs_trans_handle *trans,
goto fail;
}
+ /*
+ * Account qgroups before insert the dir item
+ * As such dir item insert will modify parent_root, which could
be
+ * src root. If we don't do it now, wrong accounting may be
inherited
+ * to snapshot qgroup.
+ *
+ * For reason locking tree_log_mutex, see
btrfs_commit_transaction()
+ * comment
+ */
+ mutex_lock(&root->fs_info->tree_log_mutex);
+
+ ret = commit_fs_roots(trans, root);
+ if (ret) {
+ mutex_unlock(&root->fs_info->tree_log_mutex);
+ goto fail;
+ }
+
+ ret = btrfs_qgroup_prepare_account_extents(trans, root->fs_info);
+ if (ret < 0) {
+ mutex_unlock(&root->fs_info->tree_log_mutex);
+ goto fail;
+ }
+ ret = btrfs_qgroup_account_extents(trans, root->fs_info);
+ if (ret < 0) {
+ mutex_unlock(&root->fs_info->tree_log_mutex);
+ goto fail;
+ }
+ /*
+ * Now qgroup are all updated, we can inherit it to new qgroups
+ */
+ ret = btrfs_qgroup_inherit(trans, fs_info,
+ root->root_key.objectid,
+ objectid, pending->inherit);
+ if (ret < 0) {
+ mutex_unlock(&root->fs_info->tree_log_mutex);
+ goto fail;
+ }
+ /*
+ * qgroup_account_extents() must be followed by a
+ * switch_commit_roots(), or next qgroup_account_extents() will
+ * be corrupted
+ */
+ ret = commit_cowonly_roots(trans, root);
+ if (ret) {
+ mutex_unlock(&root->fs_info->tree_log_mutex);
+ goto fail;
+ }
+ /*
+ * Just like in btrfs_commit_transaction(), we need to
+ * switch_commit_roots().
+ * However this time we don't need to do a full one,
+ * excluding tree root and chunk root should be OK.
+ */
+ switch_commit_roots(trans->transaction, root->fs_info);
This will undo commit 2b9dbef272b63c561aab0a5be34fd428f7b710f5 (Btrfs:
keep dropped roots in cache until transaction commit). Won't it?
So create_pending_snapshot() / create_pending_snapshots() are called
at transaction commit before btrfs_qgroup_account_extents() is called.
And with create_pending_snapshot() now calling switch_commit_roots()
it means the roots for deleted snapshots and subvolumes are now gone
before btrfs_qgroup_account_extents() gets called, so it can't do the
correct calculations anymore for the cases described in that commit.
That is, btrfs_qgroup_account_extents() must always be called before
switch_commit_roots().
Or did I miss something?