Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Wed, May 11, 2016 at 01:30:21PM -0700, Josef Bacik wrote: > > Signed-off-by: Qu Wenruo > > Signed-off-by: Mark Fasheh > Reviewed-by: Josef Bacik Applied to for-next with the following fixup to make it bisectable: --- btrfs: build fixup for qgroup_account_snapshot The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an independent branch for the same merge target (4.7). To make the code compilable for bisectability reasons, add a temporary stub. Signed-off-by: David Sterba --- fs/btrfs/transaction.c | 5 + 1 file changed, 5 insertions(+) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index d7172d7ced5f..530081388d77 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1311,6 +1311,11 @@ int btrfs_defrag_root(struct btrfs_root *root) return ret; } +/* Bisesctability fixup, remove in 4.8 */ +#ifndef btrfs_std_error +#define btrfs_std_error btrfs_handle_fs_error +#endif + /* * Do all special snapshot related qgroup dirty hack. * --- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 05/12/2016 04:30 AM, Josef Bacik wrote: On 05/11/2016 12:53 PM, Mark Fasheh wrote: On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote: On 05/11/2016 09:57 AM, Mark Fasheh wrote: Hi Josef, On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: Current btrfs qgroup design implies a requirement that after calling btrfs_qgroup_account_extents() there must be a commit root switch. Normally this is OK, as btrfs_qgroup_accounting_extents() is only called inside btrfs_commit_transaction() just be commit_cowonly_roots(). However there is a exception at create_pending_snapshot(), which will call btrfs_qgroup_account_extents() but no any commit root switch. In case of creating a snapshot whose parent root is itself (create a snapshot of fs tree), it will corrupt qgroup by the following trace: (skipped unrelated data) == btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384 btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0 == The problem here is in first qgroup_account_extent(), the nr_new_roots of the extent is 1, which means its reference got increased, and qgroup increased its rfer and excl. But at second qgroup_account_extent(), its reference got decreased, but between these two qgroup_account_extent(), there is no switch roots. This leads to the same nr_old_roots, and this extent just got ignored by qgroup, which means this extent is wrongly accounted. Fix it by call commit_cowonly_roots() after qgroup_account_extent() in create_pending_snapshot(), with needed preparation. Reported-by: Mark Fasheh Signed-off-by: Qu Wenruo --- v2: Fix a soft lockup caused by missing switch_commit_root() call. Fix a warning caused by dirty-but-not-committed root. v3: Fix a bug which will cause qgroup accounting for dropping snapshot wrong v4: Fix a bug caused by non-cowed btree modification. To Filipe: I'm sorry I didn't wait for your reply on the dropped roots. I reverted back the version where we deleted dropped roots in switch_commit_roots(). As I think as long as we called btrfs_qgroup_prepare_account_extents() and btrfs_qgroup_account_extents(), it should have already accounted extents for dropped roots, and then we are OK to drop them. It would be very nice if you could point out what I missed. Thanks Qu --- fs/btrfs/transaction.c | 117 +++-- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 43885e5..92f8193 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -311,10 +311,11 @@ loop: * when the transaction commits */ static int record_root_in_trans(struct btrfs_trans_handle *trans, - struct btrfs_root *root) + struct btrfs_root *root, + int force) { -if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && -root->last_trans < trans->transid) { +if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && +root->last_trans < trans->transid) || force) { WARN_ON(root == root->fs_info->extent_root); WARN_ON(root->commit_root != root->node); @@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, smp_wmb(); spin_lock(&root->fs_info->fs_roots_radix_lock); -if (root->last_trans == trans->transid) { +if (root->last_trans == trans->transid && !force) { spin_unlock(&root->fs_info->fs_roots_radix_lock); return 0; } @@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, return 0; mutex_lock(&root->fs_info->reloc_mutex); -record_root_in_trans(trans, root); +record_root_in_trans(trans, root, 0); mutex_unlock(&root->fs_info->reloc_mutex); return 0; @@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) } /* + * Do all special snapshot related qgroup dirty hack. + * + * Will do all needed qgroup inherit and dirty hack like switch commit + * roots inside one transaction and write all btree into disk, to make + * qgroup works. + */ +static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, + struct btrfs_root *src, + struct btrfs_root *parent, + struct btrfs_qgroup_inherit *inherit, + u64 dst_objectid) +{ +struct btrfs_fs_info *fs_info = src->fs_info; +int ret; + +/* + * We are going to commit transaction, see btrfs_commit_transaction() + * comment for reason locking tree_log_mutex + */ +mutex_lock(&fs_info->tree_log_mutex); + +ret = commit_fs_roots(trans, src); +if (
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 05/11/2016 12:53 PM, Mark Fasheh wrote: On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote: On 05/11/2016 09:57 AM, Mark Fasheh wrote: Hi Josef, On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: Current btrfs qgroup design implies a requirement that after calling btrfs_qgroup_account_extents() there must be a commit root switch. Normally this is OK, as btrfs_qgroup_accounting_extents() is only called inside btrfs_commit_transaction() just be commit_cowonly_roots(). However there is a exception at create_pending_snapshot(), which will call btrfs_qgroup_account_extents() but no any commit root switch. In case of creating a snapshot whose parent root is itself (create a snapshot of fs tree), it will corrupt qgroup by the following trace: (skipped unrelated data) == btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384 btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0 == The problem here is in first qgroup_account_extent(), the nr_new_roots of the extent is 1, which means its reference got increased, and qgroup increased its rfer and excl. But at second qgroup_account_extent(), its reference got decreased, but between these two qgroup_account_extent(), there is no switch roots. This leads to the same nr_old_roots, and this extent just got ignored by qgroup, which means this extent is wrongly accounted. Fix it by call commit_cowonly_roots() after qgroup_account_extent() in create_pending_snapshot(), with needed preparation. Reported-by: Mark Fasheh Signed-off-by: Qu Wenruo --- v2: Fix a soft lockup caused by missing switch_commit_root() call. Fix a warning caused by dirty-but-not-committed root. v3: Fix a bug which will cause qgroup accounting for dropping snapshot wrong v4: Fix a bug caused by non-cowed btree modification. To Filipe: I'm sorry I didn't wait for your reply on the dropped roots. I reverted back the version where we deleted dropped roots in switch_commit_roots(). As I think as long as we called btrfs_qgroup_prepare_account_extents() and btrfs_qgroup_account_extents(), it should have already accounted extents for dropped roots, and then we are OK to drop them. It would be very nice if you could point out what I missed. Thanks Qu --- fs/btrfs/transaction.c | 117 +++-- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 43885e5..92f8193 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -311,10 +311,11 @@ loop: * when the transaction commits */ static int record_root_in_trans(struct btrfs_trans_handle *trans, - struct btrfs_root *root) + struct btrfs_root *root, + int force) { - if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && - root->last_trans < trans->transid) { + if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && + root->last_trans < trans->transid) || force) { WARN_ON(root == root->fs_info->extent_root); WARN_ON(root->commit_root != root->node); @@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, smp_wmb(); spin_lock(&root->fs_info->fs_roots_radix_lock); - if (root->last_trans == trans->transid) { + if (root->last_trans == trans->transid && !force) { spin_unlock(&root->fs_info->fs_roots_radix_lock); return 0; } @@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, return 0; mutex_lock(&root->fs_info->reloc_mutex); - record_root_in_trans(trans, root); + record_root_in_trans(trans, root, 0); mutex_unlock(&root->fs_info->reloc_mutex); return 0; @@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) } /* + * Do all special snapshot related qgroup dirty hack. + * + * Will do all needed qgroup inherit and dirty hack like switch commit + * roots inside one transaction and write all btree into disk, to make + * qgroup works. + */ +static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, + struct btrfs_root *src, + struct btrfs_root *parent, + struct btrfs_qgroup_inherit *inherit, + u64 dst_objectid) +{ + struct btrfs_fs_info *fs_info = src->fs_info; + int ret; + + /* +* We are going to commit transaction, see
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote: > On 05/11/2016 09:57 AM, Mark Fasheh wrote: > >Hi Josef, > > > >On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: > >>On 04/15/2016 05:08 AM, Qu Wenruo wrote: > >>>Current btrfs qgroup design implies a requirement that after calling > >>>btrfs_qgroup_account_extents() there must be a commit root switch. > >>> > >>>Normally this is OK, as btrfs_qgroup_accounting_extents() is only called > >>>inside btrfs_commit_transaction() just be commit_cowonly_roots(). > >>> > >>>However there is a exception at create_pending_snapshot(), which will > >>>call btrfs_qgroup_account_extents() but no any commit root switch. > >>> > >>>In case of creating a snapshot whose parent root is itself (create a > >>>snapshot of fs tree), it will corrupt qgroup by the following trace: > >>>(skipped unrelated data) > >>>== > >>>btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > >>>nr_old_roots = 0, nr_new_roots = 1 > >>>qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, > >>>rfer = 0, excl = 0 > >>>qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, > >>>rfer = 16384, excl = 16384 > >>>btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > >>>nr_old_roots = 0, nr_new_roots = 0 > >>>== > >>> > >>>The problem here is in first qgroup_account_extent(), the > >>>nr_new_roots of the extent is 1, which means its reference got > >>>increased, and qgroup increased its rfer and excl. > >>> > >>>But at second qgroup_account_extent(), its reference got decreased, but > >>>between these two qgroup_account_extent(), there is no switch roots. > >>>This leads to the same nr_old_roots, and this extent just got ignored by > >>>qgroup, which means this extent is wrongly accounted. > >>> > >>>Fix it by call commit_cowonly_roots() after qgroup_account_extent() in > >>>create_pending_snapshot(), with needed preparation. > >>> > >>>Reported-by: Mark Fasheh > >>>Signed-off-by: Qu Wenruo > >>>--- > >>>v2: > >>> Fix a soft lockup caused by missing switch_commit_root() call. > >>> Fix a warning caused by dirty-but-not-committed root. > >>>v3: > >>> Fix a bug which will cause qgroup accounting for dropping snapshot > >>> wrong > >>>v4: > >>> Fix a bug caused by non-cowed btree modification. > >>> > >>>To Filipe: > >>> I'm sorry I didn't wait for your reply on the dropped roots. > >>> I reverted back the version where we deleted dropped roots in > >>> switch_commit_roots(). > >>> > >>> As I think as long as we called btrfs_qgroup_prepare_account_extents() > >>> and btrfs_qgroup_account_extents(), it should have already accounted > >>> extents for dropped roots, and then we are OK to drop them. > >>> > >>> It would be very nice if you could point out what I missed. > >>> Thanks > >>> Qu > >>>--- > >>> fs/btrfs/transaction.c | 117 > >>> +++-- > >>> 1 file changed, 93 insertions(+), 24 deletions(-) > >>> > >>>diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > >>>index 43885e5..92f8193 100644 > >>>--- a/fs/btrfs/transaction.c > >>>+++ b/fs/btrfs/transaction.c > >>>@@ -311,10 +311,11 @@ loop: > >>> * when the transaction commits > >>> */ > >>> static int record_root_in_trans(struct btrfs_trans_handle *trans, > >>>- struct btrfs_root *root) > >>>+ struct btrfs_root *root, > >>>+ int force) > >>> { > >>>- if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && > >>>- root->last_trans < trans->transid) { > >>>+ if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && > >>>+ root->last_trans < trans->transid) || force) { > >>> WARN_ON(root == root->fs_info->extent_root); > >>> WARN_ON(root->commit_root != root->node); > >>> > >>>@@ -331,7 +332,7 @@ static int record_root_in_trans(struct > >>>btrfs_trans_handle *trans, > >>> smp_wmb(); > >>> > >>> spin_lock(&root->fs_info->fs_roots_radix_lock); > >>>- if (root->last_trans == trans->transid) { > >>>+ if (root->last_trans == trans->transid && !force) { > >>> spin_unlock(&root->fs_info->fs_roots_radix_lock); > >>> return 0; > >>> } > >>>@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct > >>>btrfs_trans_handle *trans, > >>> return 0; > >>> > >>> mutex_lock(&root->fs_info->reloc_mutex); > >>>- record_root_in_trans(trans, root); > >>>+ record_root_in_trans(trans, root, 0); > >>> mutex_unlock(&root->fs_info->reloc_mutex); > >>> > >>> return 0; > >>>@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) > >>> } > >>> > >>> /* > >>>+ * Do all special snapshot related qgroup dirty hack. > >>>+ * > >>>+ * Will do all needed qgroup inherit and dirty hack like switch commit > >>>+ * roots inside one transaction and write all btree into disk, to make > >>>+ * qgroup works. > >>>+ */ > >>>+sta
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 05/11/2016 09:57 AM, Mark Fasheh wrote: Hi Josef, On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: Current btrfs qgroup design implies a requirement that after calling btrfs_qgroup_account_extents() there must be a commit root switch. Normally this is OK, as btrfs_qgroup_accounting_extents() is only called inside btrfs_commit_transaction() just be commit_cowonly_roots(). However there is a exception at create_pending_snapshot(), which will call btrfs_qgroup_account_extents() but no any commit root switch. In case of creating a snapshot whose parent root is itself (create a snapshot of fs tree), it will corrupt qgroup by the following trace: (skipped unrelated data) == btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384 btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0 == The problem here is in first qgroup_account_extent(), the nr_new_roots of the extent is 1, which means its reference got increased, and qgroup increased its rfer and excl. But at second qgroup_account_extent(), its reference got decreased, but between these two qgroup_account_extent(), there is no switch roots. This leads to the same nr_old_roots, and this extent just got ignored by qgroup, which means this extent is wrongly accounted. Fix it by call commit_cowonly_roots() after qgroup_account_extent() in create_pending_snapshot(), with needed preparation. Reported-by: Mark Fasheh Signed-off-by: Qu Wenruo --- v2: Fix a soft lockup caused by missing switch_commit_root() call. Fix a warning caused by dirty-but-not-committed root. v3: Fix a bug which will cause qgroup accounting for dropping snapshot wrong v4: Fix a bug caused by non-cowed btree modification. To Filipe: I'm sorry I didn't wait for your reply on the dropped roots. I reverted back the version where we deleted dropped roots in switch_commit_roots(). As I think as long as we called btrfs_qgroup_prepare_account_extents() and btrfs_qgroup_account_extents(), it should have already accounted extents for dropped roots, and then we are OK to drop them. It would be very nice if you could point out what I missed. Thanks Qu --- fs/btrfs/transaction.c | 117 +++-- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 43885e5..92f8193 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -311,10 +311,11 @@ loop: * when the transaction commits */ static int record_root_in_trans(struct btrfs_trans_handle *trans, - struct btrfs_root *root) + struct btrfs_root *root, + int force) { - if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && - root->last_trans < trans->transid) { + if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && + root->last_trans < trans->transid) || force) { WARN_ON(root == root->fs_info->extent_root); WARN_ON(root->commit_root != root->node); @@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, smp_wmb(); spin_lock(&root->fs_info->fs_roots_radix_lock); - if (root->last_trans == trans->transid) { + if (root->last_trans == trans->transid && !force) { spin_unlock(&root->fs_info->fs_roots_radix_lock); return 0; } @@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, return 0; mutex_lock(&root->fs_info->reloc_mutex); - record_root_in_trans(trans, root); + record_root_in_trans(trans, root, 0); mutex_unlock(&root->fs_info->reloc_mutex); return 0; @@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) } /* + * Do all special snapshot related qgroup dirty hack. + * + * Will do all needed qgroup inherit and dirty hack like switch commit + * roots inside one transaction and write all btree into disk, to make + * qgroup works. + */ +static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, + struct btrfs_root *src, + struct btrfs_root *parent, + struct btrfs_qgroup_inherit *inherit, + u64 dst_objectid) +{ + struct btrfs_fs_info *fs_info = src->fs_info; + int ret; + + /* +* We are going to commit transaction, see btrfs_commit_transaction() +* comment for reason locking tree_log_mutex +
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Hi Josef, On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: > On 04/15/2016 05:08 AM, Qu Wenruo wrote: > >Current btrfs qgroup design implies a requirement that after calling > >btrfs_qgroup_account_extents() there must be a commit root switch. > > > >Normally this is OK, as btrfs_qgroup_accounting_extents() is only called > >inside btrfs_commit_transaction() just be commit_cowonly_roots(). > > > >However there is a exception at create_pending_snapshot(), which will > >call btrfs_qgroup_account_extents() but no any commit root switch. > > > >In case of creating a snapshot whose parent root is itself (create a > >snapshot of fs tree), it will corrupt qgroup by the following trace: > >(skipped unrelated data) > >== > >btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > >nr_old_roots = 0, nr_new_roots = 1 > >qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer > >= 0, excl = 0 > >qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer > >= 16384, excl = 16384 > >btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > >nr_old_roots = 0, nr_new_roots = 0 > >== > > > >The problem here is in first qgroup_account_extent(), the > >nr_new_roots of the extent is 1, which means its reference got > >increased, and qgroup increased its rfer and excl. > > > >But at second qgroup_account_extent(), its reference got decreased, but > >between these two qgroup_account_extent(), there is no switch roots. > >This leads to the same nr_old_roots, and this extent just got ignored by > >qgroup, which means this extent is wrongly accounted. > > > >Fix it by call commit_cowonly_roots() after qgroup_account_extent() in > >create_pending_snapshot(), with needed preparation. > > > >Reported-by: Mark Fasheh > >Signed-off-by: Qu Wenruo > >--- > >v2: > > Fix a soft lockup caused by missing switch_commit_root() call. > > Fix a warning caused by dirty-but-not-committed root. > >v3: > > Fix a bug which will cause qgroup accounting for dropping snapshot > > wrong > >v4: > > Fix a bug caused by non-cowed btree modification. > > > >To Filipe: > > I'm sorry I didn't wait for your reply on the dropped roots. > > I reverted back the version where we deleted dropped roots in > > switch_commit_roots(). > > > > As I think as long as we called btrfs_qgroup_prepare_account_extents() > > and btrfs_qgroup_account_extents(), it should have already accounted > > extents for dropped roots, and then we are OK to drop them. > > > > It would be very nice if you could point out what I missed. > > Thanks > > Qu > >--- > > fs/btrfs/transaction.c | 117 > > +++-- > > 1 file changed, 93 insertions(+), 24 deletions(-) > > > >diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > >index 43885e5..92f8193 100644 > >--- a/fs/btrfs/transaction.c > >+++ b/fs/btrfs/transaction.c > >@@ -311,10 +311,11 @@ loop: > > * when the transaction commits > > */ > > static int record_root_in_trans(struct btrfs_trans_handle *trans, > >- struct btrfs_root *root) > >+ struct btrfs_root *root, > >+ int force) > > { > >-if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && > >-root->last_trans < trans->transid) { > >+if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && > >+root->last_trans < trans->transid) || force) { > > WARN_ON(root == root->fs_info->extent_root); > > WARN_ON(root->commit_root != root->node); > > > >@@ -331,7 +332,7 @@ static int record_root_in_trans(struct > >btrfs_trans_handle *trans, > > smp_wmb(); > > > > spin_lock(&root->fs_info->fs_roots_radix_lock); > >-if (root->last_trans == trans->transid) { > >+if (root->last_trans == trans->transid && !force) { > > spin_unlock(&root->fs_info->fs_roots_radix_lock); > > return 0; > > } > >@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle > >*trans, > > return 0; > > > > mutex_lock(&root->fs_info->reloc_mutex); > >-record_root_in_trans(trans, root); > >+record_root_in_trans(trans, root, 0); > > mutex_unlock(&root->fs_info->reloc_mutex); > > > > return 0; > >@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) > > } > > > > /* > >+ * Do all special snapshot related qgroup dirty hack. > >+ * > >+ * Will do all needed qgroup inherit and dirty hack like switch commit > >+ * roots inside one transaction and write all btree into disk, to make > >+ * qgroup works. > >+ */ > >+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, > >+ struct btrfs_root *src, > >+ struct btrfs_root *parent, > >+ struct btrfs_qgroup_inherit *inherit, > >+
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Josef Bacik wrote on 2016/04/26 10:26 -0400: On 04/25/2016 08:35 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/25 10:24 -0400: On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? No, figure out a different way. IIRC I dealt with this with the no_quota flag for inc_ref/dec_ref since the copy root stuff does strange things with the reference counts, but all this code is gone now. I looked around to see if I could figure out how the refs are ending up this way but it doesn't make sense to me and there isn't enough information in your changelog for me to be able to figure it out. You've created this mess, clean it up without making it messier. Thanks, Josef Unfortunately, your original no_quota flag just hide the bug, and hide it in a bad method. Originally, no_quota flag is used for case like this, to skip quota at snapshot creation, and use quota_inherit() to hack the quota accounting. It seems work, but in fact, if the DIR_ITEM insert need to create a new cousin leaf, then quota is messed up. No, and this is the problem, you fundamentally didn't understand what I wrote, and instead of trying to understand it and fix the bug you just threw it all away. The no_quota stuff was not a hack, it was put in place to deal with refs we already knew where accounted for, such as when we converted to mixed refs or other such operations. Even we know it has been accounted, it's still a hack to jump away from normal accounting routing. Just like what we do in qgroup_inherit(). We believe snapshot creation will always make the exclusive to nodesize. But in fact, we didn't consider the higher qgroup, and qgroup_inherit() is just messing up that case. And Yes, I DID threw the old qgroup code away. As there are too many existing bugs and possible bugs, not only in qgroup, but also in backref walk. Backref walk can't handle time_seq really well, that's one of the reason for btrfs/091. And new comer must handle the extra no_quota flag, which is more easy to cause new regression. There were bugs in my rework, but now the situation is untenable. What we now have is something that holds delayed refs in memory for the entire transaction, which is a non-starter for anybody who wants to use this in production. Nope, we didn't hold delayed refs in current qgroup codes. Delayed refs can be flushed to disk at any time just as it is. We only trace which bytenr will go through qgroup routine, nothing to do with delayed refs. Yes you can have millions of delayed refs, but there won't be millions of different extents. Or sync_fs() would have been triggered. On our gluster machines we get millions of delayed refs per transaction, and then there are multiple file systems. Then you have to post-process all of that during the critical section of the commit? So now suddenly I'm walking millions of delayed refs doing accounting all at once, that is going to cause commit latencies in the seconds which is completely unacceptable. In fact qgroup code will never account or walk through delayed refs. Qgroup code will only walk through backrefs in extent tree. (commit tree and just before commit tree) So, millions is already reduced to a much smaller amount. Then, for old qgroup code, each time a inc/dec_extent_
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 04/25/2016 08:35 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/25 10:24 -0400: On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? No, figure out a different way. IIRC I dealt with this with the no_quota flag for inc_ref/dec_ref since the copy root stuff does strange things with the reference counts, but all this code is gone now. I looked around to see if I could figure out how the refs are ending up this way but it doesn't make sense to me and there isn't enough information in your changelog for me to be able to figure it out. You've created this mess, clean it up without making it messier. Thanks, Josef Unfortunately, your original no_quota flag just hide the bug, and hide it in a bad method. Originally, no_quota flag is used for case like this, to skip quota at snapshot creation, and use quota_inherit() to hack the quota accounting. It seems work, but in fact, if the DIR_ITEM insert need to create a new cousin leaf, then quota is messed up. No, and this is the problem, you fundamentally didn't understand what I wrote, and instead of trying to understand it and fix the bug you just threw it all away. The no_quota stuff was not a hack, it was put in place to deal with refs we already knew where accounted for, such as when we converted to mixed refs or other such operations. There were bugs in my rework, but now the situation is untenable. What we now have is something that holds delayed refs in memory for the entire transaction, which is a non-starter for anybody who wants to use this in production. On our gluster machines we get millions of delayed refs per transaction, and then there are multiple file systems. Then you have to post-process all of that during the critical section of the commit? So now suddenly I'm walking millions of delayed refs doing accounting all at once, that is going to cause commit latencies in the seconds which is completely unacceptable. Anyway I spent some time this morning reading through the new stuff to figure out how it works now and I've got a patch to fix this problem that doesn't involve screwing with the transaction commit stuff at all. I sent it along separately Btrfs: fix qgroup accounting when snapshotting This fixes the basic case that was described originally. I haven't tested it other than that but I'm pretty sure it is correct. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Josef Bacik wrote on 2016/04/25 10:24 -0400: On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? No, figure out a different way. IIRC I dealt with this with the no_quota flag for inc_ref/dec_ref since the copy root stuff does strange things with the reference counts, but all this code is gone now. I looked around to see if I could figure out how the refs are ending up this way but it doesn't make sense to me and there isn't enough information in your changelog for me to be able to figure it out. You've created this mess, clean it up without making it messier. Thanks, Josef Unfortunately, your original no_quota flag just hide the bug, and hide it in a bad method. Originally, no_quota flag is used for case like this, to skip quota at snapshot creation, and use quota_inherit() to hack the quota accounting. It seems work, but in fact, if the DIR_ITEM insert need to create a new cousin leaf, then quota is messed up. Your quota rework doesn't really help, as it won't even accounting things well, just check fstest/btrfs/091 on 4.1 kernel. The only perfect fix for this already nasty subvolume creation is to do full subtree rescan. Or no one knows when higher qgroups will be broken. If you think splitting commit_transaction into two variants can cause problem, I can merge this two variants into one. As in btrfs_commit_transaction() the commit process is much the same as the one used in create_pending_snapshot(). If there is only one __commit_roots() to do such commit, then there is nothing special only for quota. Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? No, figure out a different way. IIRC I dealt with this with the no_quota flag for inc_ref/dec_ref since the copy root stuff does strange things with the reference counts, but all this code is gone now. I looked around to see if I could figure out how the refs are ending up this way but it doesn't make sense to me and there isn't enough information in your changelog for me to be able to figure it out. You've created this mess, clean it up without making it messier. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Fri, Apr 22, 2016 at 02:23:59PM -0400, Josef Bacik wrote: > On 04/22/2016 02:21 PM, Mark Fasheh wrote: > >On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: > >>On 04/15/2016 05:08 AM, Qu Wenruo wrote: > >>>+ /* > >>>+ * Force parent root to be updated, as we recorded it before so its > >>>+ * last_trans == cur_transid. > >>>+ * Or it won't be committed again onto disk after later > >>>+ * insert_dir_item() > >>>+ */ > >>>+ if (!ret) > >>>+ record_root_in_trans(trans, parent, 1); > >>>+ return ret; > >>>+} > >> > >>NACK, holy shit we aren't adding a special transaction commit only > >>for qgroup snapshots. Figure out a different way. Thanks, > > > >Yeah I saw that. To be fair, we run a whole lot of the transaction stuff > >multiple times (at least from my reading) so I'm really unclear on what the > >performance impact is. > > > >Do you have any suggestion though? We've been banging our heads against this > >for a while now and as slow as this patch might be, it actually works where > >nothing else has so far. > > I'm less concerned about committing another transaction and more > concerned about the fact that it is an special variant of the > transaction commit. If this goes wrong, or at some point in the > future we fail to update it along with btrfs_transaction_commit we > suddenly are corrupting metadata. If we have to commit a > transaction then call btrfs_commit_transaction(), don't open code a > stripped down version, here be dragons. Thanks, Ok yeah that makes perfect sense - I thought you were telling me that this would be a big performance problem. --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: + /* +* Force parent root to be updated, as we recorded it before so its +* last_trans == cur_transid. +* Or it won't be committed again onto disk after later +* insert_dir_item() +*/ + if (!ret) + record_root_in_trans(trans, parent, 1); + return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: > On 04/15/2016 05:08 AM, Qu Wenruo wrote: > >+/* > >+ * Force parent root to be updated, as we recorded it before so its > >+ * last_trans == cur_transid. > >+ * Or it won't be committed again onto disk after later > >+ * insert_dir_item() > >+ */ > >+if (!ret) > >+record_root_in_trans(trans, parent, 1); > >+return ret; > >+} > > NACK, holy shit we aren't adding a special transaction commit only > for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 04/15/2016 05:08 AM, Qu Wenruo wrote: Current btrfs qgroup design implies a requirement that after calling btrfs_qgroup_account_extents() there must be a commit root switch. Normally this is OK, as btrfs_qgroup_accounting_extents() is only called inside btrfs_commit_transaction() just be commit_cowonly_roots(). However there is a exception at create_pending_snapshot(), which will call btrfs_qgroup_account_extents() but no any commit root switch. In case of creating a snapshot whose parent root is itself (create a snapshot of fs tree), it will corrupt qgroup by the following trace: (skipped unrelated data) == btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384 btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0 == The problem here is in first qgroup_account_extent(), the nr_new_roots of the extent is 1, which means its reference got increased, and qgroup increased its rfer and excl. But at second qgroup_account_extent(), its reference got decreased, but between these two qgroup_account_extent(), there is no switch roots. This leads to the same nr_old_roots, and this extent just got ignored by qgroup, which means this extent is wrongly accounted. Fix it by call commit_cowonly_roots() after qgroup_account_extent() in create_pending_snapshot(), with needed preparation. Reported-by: Mark Fasheh Signed-off-by: Qu Wenruo --- v2: Fix a soft lockup caused by missing switch_commit_root() call. Fix a warning caused by dirty-but-not-committed root. v3: Fix a bug which will cause qgroup accounting for dropping snapshot wrong v4: Fix a bug caused by non-cowed btree modification. To Filipe: I'm sorry I didn't wait for your reply on the dropped roots. I reverted back the version where we deleted dropped roots in switch_commit_roots(). As I think as long as we called btrfs_qgroup_prepare_account_extents() and btrfs_qgroup_account_extents(), it should have already accounted extents for dropped roots, and then we are OK to drop them. It would be very nice if you could point out what I missed. Thanks Qu --- fs/btrfs/transaction.c | 117 +++-- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 43885e5..92f8193 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -311,10 +311,11 @@ loop: * when the transaction commits */ static int record_root_in_trans(struct btrfs_trans_handle *trans, - struct btrfs_root *root) + struct btrfs_root *root, + int force) { - if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) && - root->last_trans < trans->transid) { + if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) && + root->last_trans < trans->transid) || force) { WARN_ON(root == root->fs_info->extent_root); WARN_ON(root->commit_root != root->node); @@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, smp_wmb(); spin_lock(&root->fs_info->fs_roots_radix_lock); - if (root->last_trans == trans->transid) { + if (root->last_trans == trans->transid && !force) { spin_unlock(&root->fs_info->fs_roots_radix_lock); return 0; } @@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, return 0; mutex_lock(&root->fs_info->reloc_mutex); - record_root_in_trans(trans, root); + record_root_in_trans(trans, root, 0); mutex_unlock(&root->fs_info->reloc_mutex); return 0; @@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root) } /* + * Do all special snapshot related qgroup dirty hack. + * + * Will do all needed qgroup inherit and dirty hack like switch commit + * roots inside one transaction and write all btree into disk, to make + * qgroup works. + */ +static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, + struct btrfs_root *src, + struct btrfs_root *parent, + struct btrfs_qgroup_inherit *inherit, + u64 dst_objectid) +{ + struct btrfs_fs_info *fs_info = src->fs_info; + int ret; + + /* +* We are going to commit transaction, see btrfs_commit_transaction() +* comment for reason locking tree_log_mutex +*/ + mutex_lock(&fs_info->tree_log_mutex); + + ret = commit_fs_roots(trans, s
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Tue, 19 Apr 2016 15:19:46 -0700, Mark Fasheh wrote: > On Fri, Apr 15, 2016 at 05:08:22PM +0800, Qu Wenruo wrote: >> Current btrfs qgroup design implies a requirement that after calling >> btrfs_qgroup_account_extents() there must be a commit root switch. >> >> Normally this is OK, as btrfs_qgroup_accounting_extents() is only called >> inside btrfs_commit_transaction() just be commit_cowonly_roots(). >> >> However there is a exception at create_pending_snapshot(), which will >> call btrfs_qgroup_account_extents() but no any commit root switch. >> >> In case of creating a snapshot whose parent root is itself (create a >> snapshot of fs tree), it will corrupt qgroup by the following trace: >> (skipped unrelated data) >> == >> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, >> nr_old_roots = 0, nr_new_roots = 1 >> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer >> = 0, excl = 0 >> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer >> = 16384, excl = 16384 >> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, >> nr_old_roots = 0, nr_new_roots = 0 >> == >> >> The problem here is in first qgroup_account_extent(), the >> nr_new_roots of the extent is 1, which means its reference got >> increased, and qgroup increased its rfer and excl. >> >> But at second qgroup_account_extent(), its reference got decreased, but >> between these two qgroup_account_extent(), there is no switch roots. >> This leads to the same nr_old_roots, and this extent just got ignored by >> qgroup, which means this extent is wrongly accounted. >> >> Fix it by call commit_cowonly_roots() after qgroup_account_extent() in >> create_pending_snapshot(), with needed preparation. >> >> Reported-by: Mark Fasheh >> Signed-off-by: Qu Wenruo > > Ok, this version seems to be giving me the right numbers. I'll send a test > for it shortly. I'd still like to know if this patch introduces an > unintended side effects but otherwise, thanks Qu. > --Mark Hi Mark, Can't speak about other side effects since I have not observed any so far, but I can confirm that the previously failing case of deleting a renamed snapshot [1] now works properly with v4 without getting the commit roots in a twist. So: Tested-by: holger.hoffstae...@googlemail.com cheers Holger [1] http://thread.gmane.org/gmane.comp.file-systems.btrfs/55052 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Fri, Apr 15, 2016 at 05:08:22PM +0800, Qu Wenruo wrote: > Current btrfs qgroup design implies a requirement that after calling > btrfs_qgroup_account_extents() there must be a commit root switch. > > Normally this is OK, as btrfs_qgroup_accounting_extents() is only called > inside btrfs_commit_transaction() just be commit_cowonly_roots(). > > However there is a exception at create_pending_snapshot(), which will > call btrfs_qgroup_account_extents() but no any commit root switch. > > In case of creating a snapshot whose parent root is itself (create a > snapshot of fs tree), it will corrupt qgroup by the following trace: > (skipped unrelated data) > == > btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > nr_old_roots = 0, nr_new_roots = 1 > qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer > = 0, excl = 0 > qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer > = 16384, excl = 16384 > btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, > nr_old_roots = 0, nr_new_roots = 0 > == > > The problem here is in first qgroup_account_extent(), the > nr_new_roots of the extent is 1, which means its reference got > increased, and qgroup increased its rfer and excl. > > But at second qgroup_account_extent(), its reference got decreased, but > between these two qgroup_account_extent(), there is no switch roots. > This leads to the same nr_old_roots, and this extent just got ignored by > qgroup, which means this extent is wrongly accounted. > > Fix it by call commit_cowonly_roots() after qgroup_account_extent() in > create_pending_snapshot(), with needed preparation. > > Reported-by: Mark Fasheh > Signed-off-by: Qu Wenruo Ok, this version seems to be giving me the right numbers. I'll send a test for it shortly. I'd still like to know if this patch introduces an unintended side effects but otherwise, thanks Qu. --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html