Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-12 Thread David Sterba
On Wed, May 11, 2016 at 01:30:21PM -0700, Josef Bacik wrote:
> > Signed-off-by: Qu Wenruo 
> > Signed-off-by: Mark Fasheh 
> Reviewed-by: Josef Bacik 

Applied to for-next with the following fixup to make it bisectable:

---
btrfs: build fixup for qgroup_account_snapshot

The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an
independent branch for the same merge target (4.7). To make the code
compilable for bisectability reasons, add a temporary stub.

Signed-off-by: David Sterba 
---
 fs/btrfs/transaction.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d7172d7ced5f..530081388d77 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1311,6 +1311,11 @@ int btrfs_defrag_root(struct btrfs_root *root)
return ret;
 }

+/* Bisesctability fixup, remove in 4.8 */
+#ifndef btrfs_std_error
+#define btrfs_std_error btrfs_handle_fs_error
+#endif
+
 /*
  * Do all special snapshot related qgroup dirty hack.
  *
---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-11 Thread Qu Wenruo



On 05/12/2016 04:30 AM, Josef Bacik wrote:

On 05/11/2016 12:53 PM, Mark Fasheh wrote:

On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote:

On 05/11/2016 09:57 AM, Mark Fasheh wrote:

Hi Josef,

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.

Normally this is OK, as btrfs_qgroup_accounting_extents() is only
called
inside btrfs_commit_transaction() just be commit_cowonly_roots().

However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.

In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
==
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384,
nr_old_roots = 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count
= 1, rfer = 0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count
= 1, rfer = 16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384,
nr_old_roots = 0, nr_new_roots = 0
==

The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.

But at second qgroup_account_extent(), its reference got
decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got
ignored by
qgroup, which means this extent is wrongly accounted.

Fix it by call commit_cowonly_roots() after
qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.

Reported-by: Mark Fasheh 
Signed-off-by: Qu Wenruo 
---
v2:
 Fix a soft lockup caused by missing switch_commit_root() call.
 Fix a warning caused by dirty-but-not-committed root.
v3:
 Fix a bug which will cause qgroup accounting for dropping snapshot
 wrong
v4:
 Fix a bug caused by non-cowed btree modification.

To Filipe:
 I'm sorry I didn't wait for your reply on the dropped roots.
 I reverted back the version where we deleted dropped roots in
 switch_commit_roots().

 As I think as long as we called
btrfs_qgroup_prepare_account_extents()
 and btrfs_qgroup_account_extents(), it should have already accounted
 extents for dropped roots, and then we are OK to drop them.

 It would be very nice if you could point out what I missed.
 Thanks
 Qu
---
fs/btrfs/transaction.c | 117
+++--
1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 43885e5..92f8193 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -311,10 +311,11 @@ loop:
 * when the transaction commits
 */
static int record_root_in_trans(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root)
+   struct btrfs_root *root,
+   int force)
{
-if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
-root->last_trans < trans->transid) {
+if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
+root->last_trans < trans->transid) || force) {
WARN_ON(root == root->fs_info->extent_root);
WARN_ON(root->commit_root != root->node);

@@ -331,7 +332,7 @@ static int record_root_in_trans(struct
btrfs_trans_handle *trans,
smp_wmb();

spin_lock(&root->fs_info->fs_roots_radix_lock);
-if (root->last_trans == trans->transid) {
+if (root->last_trans == trans->transid && !force) {
spin_unlock(&root->fs_info->fs_roots_radix_lock);
return 0;
}
@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct
btrfs_trans_handle *trans,
return 0;

mutex_lock(&root->fs_info->reloc_mutex);
-record_root_in_trans(trans, root);
+record_root_in_trans(trans, root, 0);
mutex_unlock(&root->fs_info->reloc_mutex);

return 0;
@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
}

/*
+ * Do all special snapshot related qgroup dirty hack.
+ *
+ * Will do all needed qgroup inherit and dirty hack like switch
commit
+ * roots inside one transaction and write all btree into disk, to
make
+ * qgroup works.
+ */
+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
+   struct btrfs_root *src,
+   struct btrfs_root *parent,
+   struct btrfs_qgroup_inherit *inherit,
+   u64 dst_objectid)
+{
+struct btrfs_fs_info *fs_info = src->fs_info;
+int ret;
+
+/*
+ * We are going to commit transaction, see
btrfs_commit_transaction()
+ * comment for reason locking tree_log_mutex
+ */
+mutex_lock(&fs_info->tree_log_mutex);
+
+ret = commit_fs_roots(trans, src);
+if (

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-11 Thread Josef Bacik

On 05/11/2016 12:53 PM, Mark Fasheh wrote:

On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote:

On 05/11/2016 09:57 AM, Mark Fasheh wrote:

Hi Josef,

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.

Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
inside btrfs_commit_transaction() just be commit_cowonly_roots().

However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.

In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
==
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 0
==

The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.

But at second qgroup_account_extent(), its reference got decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got ignored by
qgroup, which means this extent is wrongly accounted.

Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.

Reported-by: Mark Fasheh 
Signed-off-by: Qu Wenruo 
---
v2:
 Fix a soft lockup caused by missing switch_commit_root() call.
 Fix a warning caused by dirty-but-not-committed root.
v3:
 Fix a bug which will cause qgroup accounting for dropping snapshot
 wrong
v4:
 Fix a bug caused by non-cowed btree modification.

To Filipe:
 I'm sorry I didn't wait for your reply on the dropped roots.
 I reverted back the version where we deleted dropped roots in
 switch_commit_roots().

 As I think as long as we called btrfs_qgroup_prepare_account_extents()
 and btrfs_qgroup_account_extents(), it should have already accounted
 extents for dropped roots, and then we are OK to drop them.

 It would be very nice if you could point out what I missed.
 Thanks
 Qu
---
fs/btrfs/transaction.c | 117 +++--
1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 43885e5..92f8193 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -311,10 +311,11 @@ loop:
 * when the transaction commits
 */
static int record_root_in_trans(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root)
+  struct btrfs_root *root,
+  int force)
{
-   if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
-   root->last_trans < trans->transid) {
+   if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
+   root->last_trans < trans->transid) || force) {
WARN_ON(root == root->fs_info->extent_root);
WARN_ON(root->commit_root != root->node);

@@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle 
*trans,
smp_wmb();

spin_lock(&root->fs_info->fs_roots_radix_lock);
-   if (root->last_trans == trans->transid) {
+   if (root->last_trans == trans->transid && !force) {
spin_unlock(&root->fs_info->fs_roots_radix_lock);
return 0;
}
@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle 
*trans,
return 0;

mutex_lock(&root->fs_info->reloc_mutex);
-   record_root_in_trans(trans, root);
+   record_root_in_trans(trans, root, 0);
mutex_unlock(&root->fs_info->reloc_mutex);

return 0;
@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
}

/*
+ * Do all special snapshot related qgroup dirty hack.
+ *
+ * Will do all needed qgroup inherit and dirty hack like switch commit
+ * roots inside one transaction and write all btree into disk, to make
+ * qgroup works.
+ */
+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
+  struct btrfs_root *src,
+  struct btrfs_root *parent,
+  struct btrfs_qgroup_inherit *inherit,
+  u64 dst_objectid)
+{
+   struct btrfs_fs_info *fs_info = src->fs_info;
+   int ret;
+
+   /*
+* We are going to commit transaction, see

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-11 Thread Mark Fasheh
On Wed, May 11, 2016 at 09:59:52AM -0700, Josef Bacik wrote:
> On 05/11/2016 09:57 AM, Mark Fasheh wrote:
> >Hi Josef,
> >
> >On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
> >>On 04/15/2016 05:08 AM, Qu Wenruo wrote:
> >>>Current btrfs qgroup design implies a requirement that after calling
> >>>btrfs_qgroup_account_extents() there must be a commit root switch.
> >>>
> >>>Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
> >>>inside btrfs_commit_transaction() just be commit_cowonly_roots().
> >>>
> >>>However there is a exception at create_pending_snapshot(), which will
> >>>call btrfs_qgroup_account_extents() but no any commit root switch.
> >>>
> >>>In case of creating a snapshot whose parent root is itself (create a
> >>>snapshot of fs tree), it will corrupt qgroup by the following trace:
> >>>(skipped unrelated data)
> >>>==
> >>>btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> >>>nr_old_roots = 0, nr_new_roots = 1
> >>>qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, 
> >>>rfer = 0, excl = 0
> >>>qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, 
> >>>rfer = 16384, excl = 16384
> >>>btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> >>>nr_old_roots = 0, nr_new_roots = 0
> >>>==
> >>>
> >>>The problem here is in first qgroup_account_extent(), the
> >>>nr_new_roots of the extent is 1, which means its reference got
> >>>increased, and qgroup increased its rfer and excl.
> >>>
> >>>But at second qgroup_account_extent(), its reference got decreased, but
> >>>between these two qgroup_account_extent(), there is no switch roots.
> >>>This leads to the same nr_old_roots, and this extent just got ignored by
> >>>qgroup, which means this extent is wrongly accounted.
> >>>
> >>>Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
> >>>create_pending_snapshot(), with needed preparation.
> >>>
> >>>Reported-by: Mark Fasheh 
> >>>Signed-off-by: Qu Wenruo 
> >>>---
> >>>v2:
> >>>  Fix a soft lockup caused by missing switch_commit_root() call.
> >>>  Fix a warning caused by dirty-but-not-committed root.
> >>>v3:
> >>>  Fix a bug which will cause qgroup accounting for dropping snapshot
> >>>  wrong
> >>>v4:
> >>>  Fix a bug caused by non-cowed btree modification.
> >>>
> >>>To Filipe:
> >>>  I'm sorry I didn't wait for your reply on the dropped roots.
> >>>  I reverted back the version where we deleted dropped roots in
> >>>  switch_commit_roots().
> >>>
> >>>  As I think as long as we called btrfs_qgroup_prepare_account_extents()
> >>>  and btrfs_qgroup_account_extents(), it should have already accounted
> >>>  extents for dropped roots, and then we are OK to drop them.
> >>>
> >>>  It would be very nice if you could point out what I missed.
> >>>  Thanks
> >>>  Qu
> >>>---
> >>> fs/btrfs/transaction.c | 117 
> >>> +++--
> >>> 1 file changed, 93 insertions(+), 24 deletions(-)
> >>>
> >>>diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> >>>index 43885e5..92f8193 100644
> >>>--- a/fs/btrfs/transaction.c
> >>>+++ b/fs/btrfs/transaction.c
> >>>@@ -311,10 +311,11 @@ loop:
> >>>  * when the transaction commits
> >>>  */
> >>> static int record_root_in_trans(struct btrfs_trans_handle *trans,
> >>>- struct btrfs_root *root)
> >>>+ struct btrfs_root *root,
> >>>+ int force)
> >>> {
> >>>-  if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
> >>>-  root->last_trans < trans->transid) {
> >>>+  if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
> >>>+  root->last_trans < trans->transid) || force) {
> >>>   WARN_ON(root == root->fs_info->extent_root);
> >>>   WARN_ON(root->commit_root != root->node);
> >>>
> >>>@@ -331,7 +332,7 @@ static int record_root_in_trans(struct 
> >>>btrfs_trans_handle *trans,
> >>>   smp_wmb();
> >>>
> >>>   spin_lock(&root->fs_info->fs_roots_radix_lock);
> >>>-  if (root->last_trans == trans->transid) {
> >>>+  if (root->last_trans == trans->transid && !force) {
> >>>   spin_unlock(&root->fs_info->fs_roots_radix_lock);
> >>>   return 0;
> >>>   }
> >>>@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct 
> >>>btrfs_trans_handle *trans,
> >>>   return 0;
> >>>
> >>>   mutex_lock(&root->fs_info->reloc_mutex);
> >>>-  record_root_in_trans(trans, root);
> >>>+  record_root_in_trans(trans, root, 0);
> >>>   mutex_unlock(&root->fs_info->reloc_mutex);
> >>>
> >>>   return 0;
> >>>@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
> >>> }
> >>>
> >>> /*
> >>>+ * Do all special snapshot related qgroup dirty hack.
> >>>+ *
> >>>+ * Will do all needed qgroup inherit and dirty hack like switch commit
> >>>+ * roots inside one transaction and write all btree into disk, to make
> >>>+ * qgroup works.
> >>>+ */
> >>>+sta

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-11 Thread Josef Bacik

On 05/11/2016 09:57 AM, Mark Fasheh wrote:

Hi Josef,

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.

Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
inside btrfs_commit_transaction() just be commit_cowonly_roots().

However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.

In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
==
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 0
==

The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.

But at second qgroup_account_extent(), its reference got decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got ignored by
qgroup, which means this extent is wrongly accounted.

Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.

Reported-by: Mark Fasheh 
Signed-off-by: Qu Wenruo 
---
v2:
  Fix a soft lockup caused by missing switch_commit_root() call.
  Fix a warning caused by dirty-but-not-committed root.
v3:
  Fix a bug which will cause qgroup accounting for dropping snapshot
  wrong
v4:
  Fix a bug caused by non-cowed btree modification.

To Filipe:
  I'm sorry I didn't wait for your reply on the dropped roots.
  I reverted back the version where we deleted dropped roots in
  switch_commit_roots().

  As I think as long as we called btrfs_qgroup_prepare_account_extents()
  and btrfs_qgroup_account_extents(), it should have already accounted
  extents for dropped roots, and then we are OK to drop them.

  It would be very nice if you could point out what I missed.
  Thanks
  Qu
---
 fs/btrfs/transaction.c | 117 +++--
 1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 43885e5..92f8193 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -311,10 +311,11 @@ loop:
  * when the transaction commits
  */
 static int record_root_in_trans(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root)
+  struct btrfs_root *root,
+  int force)
 {
-   if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
-   root->last_trans < trans->transid) {
+   if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
+   root->last_trans < trans->transid) || force) {
WARN_ON(root == root->fs_info->extent_root);
WARN_ON(root->commit_root != root->node);

@@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle 
*trans,
smp_wmb();

spin_lock(&root->fs_info->fs_roots_radix_lock);
-   if (root->last_trans == trans->transid) {
+   if (root->last_trans == trans->transid && !force) {
spin_unlock(&root->fs_info->fs_roots_radix_lock);
return 0;
}
@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle 
*trans,
return 0;

mutex_lock(&root->fs_info->reloc_mutex);
-   record_root_in_trans(trans, root);
+   record_root_in_trans(trans, root, 0);
mutex_unlock(&root->fs_info->reloc_mutex);

return 0;
@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
 }

 /*
+ * Do all special snapshot related qgroup dirty hack.
+ *
+ * Will do all needed qgroup inherit and dirty hack like switch commit
+ * roots inside one transaction and write all btree into disk, to make
+ * qgroup works.
+ */
+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
+  struct btrfs_root *src,
+  struct btrfs_root *parent,
+  struct btrfs_qgroup_inherit *inherit,
+  u64 dst_objectid)
+{
+   struct btrfs_fs_info *fs_info = src->fs_info;
+   int ret;
+
+   /*
+* We are going to commit transaction, see btrfs_commit_transaction()
+* comment for reason locking tree_log_mutex
+  

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-11 Thread Mark Fasheh
Hi Josef, 

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
> On 04/15/2016 05:08 AM, Qu Wenruo wrote:
> >Current btrfs qgroup design implies a requirement that after calling
> >btrfs_qgroup_account_extents() there must be a commit root switch.
> >
> >Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
> >inside btrfs_commit_transaction() just be commit_cowonly_roots().
> >
> >However there is a exception at create_pending_snapshot(), which will
> >call btrfs_qgroup_account_extents() but no any commit root switch.
> >
> >In case of creating a snapshot whose parent root is itself (create a
> >snapshot of fs tree), it will corrupt qgroup by the following trace:
> >(skipped unrelated data)
> >==
> >btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> >nr_old_roots = 0, nr_new_roots = 1
> >qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
> >= 0, excl = 0
> >qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
> >= 16384, excl = 16384
> >btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> >nr_old_roots = 0, nr_new_roots = 0
> >==
> >
> >The problem here is in first qgroup_account_extent(), the
> >nr_new_roots of the extent is 1, which means its reference got
> >increased, and qgroup increased its rfer and excl.
> >
> >But at second qgroup_account_extent(), its reference got decreased, but
> >between these two qgroup_account_extent(), there is no switch roots.
> >This leads to the same nr_old_roots, and this extent just got ignored by
> >qgroup, which means this extent is wrongly accounted.
> >
> >Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
> >create_pending_snapshot(), with needed preparation.
> >
> >Reported-by: Mark Fasheh 
> >Signed-off-by: Qu Wenruo 
> >---
> >v2:
> >   Fix a soft lockup caused by missing switch_commit_root() call.
> >   Fix a warning caused by dirty-but-not-committed root.
> >v3:
> >   Fix a bug which will cause qgroup accounting for dropping snapshot
> >   wrong
> >v4:
> >   Fix a bug caused by non-cowed btree modification.
> >
> >To Filipe:
> >   I'm sorry I didn't wait for your reply on the dropped roots.
> >   I reverted back the version where we deleted dropped roots in
> >   switch_commit_roots().
> >
> >   As I think as long as we called btrfs_qgroup_prepare_account_extents()
> >   and btrfs_qgroup_account_extents(), it should have already accounted
> >   extents for dropped roots, and then we are OK to drop them.
> >
> >   It would be very nice if you could point out what I missed.
> >   Thanks
> >   Qu
> >---
> >  fs/btrfs/transaction.c | 117 
> > +++--
> >  1 file changed, 93 insertions(+), 24 deletions(-)
> >
> >diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> >index 43885e5..92f8193 100644
> >--- a/fs/btrfs/transaction.c
> >+++ b/fs/btrfs/transaction.c
> >@@ -311,10 +311,11 @@ loop:
> >   * when the transaction commits
> >   */
> >  static int record_root_in_trans(struct btrfs_trans_handle *trans,
> >-   struct btrfs_root *root)
> >+   struct btrfs_root *root,
> >+   int force)
> >  {
> >-if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
> >-root->last_trans < trans->transid) {
> >+if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
> >+root->last_trans < trans->transid) || force) {
> > WARN_ON(root == root->fs_info->extent_root);
> > WARN_ON(root->commit_root != root->node);
> >
> >@@ -331,7 +332,7 @@ static int record_root_in_trans(struct 
> >btrfs_trans_handle *trans,
> > smp_wmb();
> >
> > spin_lock(&root->fs_info->fs_roots_radix_lock);
> >-if (root->last_trans == trans->transid) {
> >+if (root->last_trans == trans->transid && !force) {
> > spin_unlock(&root->fs_info->fs_roots_radix_lock);
> > return 0;
> > }
> >@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle 
> >*trans,
> > return 0;
> >
> > mutex_lock(&root->fs_info->reloc_mutex);
> >-record_root_in_trans(trans, root);
> >+record_root_in_trans(trans, root, 0);
> > mutex_unlock(&root->fs_info->reloc_mutex);
> >
> > return 0;
> >@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
> >  }
> >
> >  /*
> >+ * Do all special snapshot related qgroup dirty hack.
> >+ *
> >+ * Will do all needed qgroup inherit and dirty hack like switch commit
> >+ * roots inside one transaction and write all btree into disk, to make
> >+ * qgroup works.
> >+ */
> >+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
> >+   struct btrfs_root *src,
> >+   struct btrfs_root *parent,
> >+   struct btrfs_qgroup_inherit *inherit,
> >+  

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-26 Thread Qu Wenruo



Josef Bacik wrote on 2016/04/26 10:26 -0400:

On 04/25/2016 08:35 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/25 10:24 -0400:

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik
wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/* + * Force parent root to be updated, as we
recorded it before so its + * last_trans ==
cur_transid. + * Or it won't be committed again
onto disk after later + * insert_dir_item() +
*/ +if (!ret) +record_root_in_trans(trans,
parent, 1); +return ret; +}


NACK, holy shit we aren't adding a special transaction
commit only for qgroup snapshots.  Figure out a different
way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the
transaction stuff multiple times (at least from my reading)
so I'm really unclear on what the performance impact is.

Do you have any suggestion though? We've been banging our
heads against this for a while now and as slow as this
patch might be, it actually works where nothing else has so
far.


I'm less concerned about committing another transaction and
more concerned about the fact that it is an special variant
of the transaction commit.  If this goes wrong, or at some
point in the future we fail to update it along with
btrfs_transaction_commit we suddenly are corrupting metadata.
If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down
version, here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but
the problem is, in create_pending_snapshots(), we are already
inside commit_transaction().

Or commit_transaction() can be called inside
commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the
no_quota flag for inc_ref/dec_ref since the copy root stuff does
strange things with the reference counts, but all this code is
gone now.  I looked around to see if I could figure out how the
refs are ending up this way but it doesn't make sense to me and
there isn't enough information in your changelog for me to be
able to figure it out. You've created this mess, clean it up
without making it messier.  Thanks,

Josef



Unfortunately, your original no_quota flag just hide the bug, and
hide it in a bad method.

Originally, no_quota flag is used for case like this, to skip quota
at snapshot creation, and use quota_inherit() to hack the quota
accounting. It seems work, but in fact, if the DIR_ITEM insert need
to create a new cousin leaf, then quota is messed up.



No, and this is the problem, you fundamentally didn't understand what
I wrote, and instead of trying to understand it and fix the bug you
just threw it all away.  The no_quota stuff was not a hack, it was
put in place to deal with refs we already knew where accounted for,
such as when we converted to mixed refs or other such operations.


Even we know it has been accounted, it's still a hack to jump away from
normal accounting routing.

Just like what we do in qgroup_inherit(). We believe snapshot creation
will always make the exclusive to nodesize.
But in fact, we didn't consider the higher qgroup, and qgroup_inherit()
is just messing up that case.


And Yes, I DID threw the old qgroup code away.
As there are too many existing bugs and possible bugs, not only in
qgroup, but also in backref walk.

Backref walk can't handle time_seq really well, that's one of the reason
for btrfs/091.
And new comer must handle the extra no_quota flag, which is more easy to
cause new regression.



There were bugs in my rework, but now the situation is untenable.
What we now have is something that holds delayed refs in memory for
the entire transaction, which is a non-starter for anybody who wants
to use this in production.


Nope, we didn't hold delayed refs in current qgroup codes.

Delayed refs can be flushed to disk at any time just as it is.
We only trace which bytenr will go through qgroup routine, nothing to do 
with delayed refs.


Yes you can have millions of delayed refs, but there won't be millions
of different extents. Or sync_fs() would have been triggered.


On our gluster machines we get millions of delayed refs per
transaction, and then there are multiple file systems.  Then you have
to post-process all of that during the critical section of the
commit?  So now suddenly I'm walking millions of delayed refs doing
accounting all at once, that is going to cause commit latencies in
the seconds which is completely unacceptable.


In fact qgroup code will never account or walk through delayed refs.

Qgroup code will only walk through backrefs in extent tree. (commit tree 
and just before commit tree)

So, millions is already reduced to a much smaller amount.

Then, for old qgroup code, each time a inc/dec_extent_

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-26 Thread Josef Bacik

On 04/25/2016 08:35 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/25 10:24 -0400:

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so
its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction
stuff
multiple times (at least from my reading) so I'm really unclear on
what the
performance impact is.

Do you have any suggestion though? We've been banging our heads
against this
for a while now and as slow as this patch might be, it actually works
where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is an special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_transaction_commit we suddenly
are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but the
problem is, in create_pending_snapshots(), we are already inside
commit_transaction().

Or commit_transaction() can be called inside commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange
things with the reference counts, but all this code is gone now.  I
looked around to see if I could figure out how the refs are ending up
this way but it doesn't make sense to me and there isn't enough
information in your changelog for me to be able to figure it out. You've
created this mess, clean it up without making it messier.  Thanks,

Josef



Unfortunately, your original no_quota flag just hide the bug, and hide
it in a bad method.

Originally, no_quota flag is used for case like this, to skip quota at
snapshot creation, and use quota_inherit() to hack the quota accounting.
It seems work, but in fact, if the DIR_ITEM insert need to create a new
cousin leaf, then quota is messed up.



No, and this is the problem, you fundamentally didn't understand what I 
wrote, and instead of trying to understand it and fix the bug you just 
threw it all away.  The no_quota stuff was not a hack, it was put in 
place to deal with refs we already knew where accounted for, such as 
when we converted to mixed refs or other such operations.


There were bugs in my rework, but now the situation is untenable.  What 
we now have is something that holds delayed refs in memory for the 
entire transaction, which is a non-starter for anybody who wants to use 
this in production.  On our gluster machines we get millions of delayed 
refs per transaction, and then there are multiple file systems.  Then 
you have to post-process all of that during the critical section of the 
commit?  So now suddenly I'm walking millions of delayed refs doing 
accounting all at once, that is going to cause commit latencies in the 
seconds which is completely unacceptable.


Anyway I spent some time this morning reading through the new stuff to 
figure out how it works now and I've got a patch to fix this problem 
that doesn't involve screwing with the transaction commit stuff at all. 
I sent it along separately


Btrfs: fix qgroup accounting when snapshotting

This fixes the basic case that was described originally.  I haven't 
tested it other than that but I'm pretty sure it is correct.  Thanks,


Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-25 Thread Qu Wenruo



Josef Bacik wrote on 2016/04/25 10:24 -0400:

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so
its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction
stuff
multiple times (at least from my reading) so I'm really unclear on
what the
performance impact is.

Do you have any suggestion though? We've been banging our heads
against this
for a while now and as slow as this patch might be, it actually works
where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is an special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_transaction_commit we suddenly are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but the
problem is, in create_pending_snapshots(), we are already inside
commit_transaction().

Or commit_transaction() can be called inside commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange
things with the reference counts, but all this code is gone now.  I
looked around to see if I could figure out how the refs are ending up
this way but it doesn't make sense to me and there isn't enough
information in your changelog for me to be able to figure it out. You've
created this mess, clean it up without making it messier.  Thanks,

Josef


Unfortunately, your original no_quota flag just hide the bug, and hide 
it in a bad method.


Originally, no_quota flag is used for case like this, to skip quota at 
snapshot creation, and use quota_inherit() to hack the quota accounting.
It seems work, but in fact, if the DIR_ITEM insert need to create a new 
cousin leaf, then quota is messed up.


Your quota rework doesn't really help, as it won't even accounting 
things well, just check fstest/btrfs/091 on 4.1 kernel.


The only perfect fix for this already nasty subvolume creation is to do 
full subtree rescan.

Or no one knows when higher qgroups will be broken.



If you think splitting commit_transaction into two variants can cause 
problem, I can merge this two variants into one.


As in btrfs_commit_transaction() the commit process is much the same as 
the one used in create_pending_snapshot().


If there is only one __commit_roots() to do such commit, then there is 
nothing special only for quota.


Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-25 Thread Josef Bacik

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so
its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on
what the
performance impact is.

Do you have any suggestion though? We've been banging our heads
against this
for a while now and as slow as this patch might be, it actually works
where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is an special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_transaction_commit we suddenly are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but the
problem is, in create_pending_snapshots(), we are already inside
commit_transaction().

Or commit_transaction() can be called inside commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the 
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange 
things with the reference counts, but all this code is gone now.  I 
looked around to see if I could figure out how the refs are ending up 
this way but it doesn't make sense to me and there isn't enough 
information in your changelog for me to be able to figure it out. 
You've created this mess, clean it up without making it messier.  Thanks,


Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-24 Thread Qu Wenruo



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so
its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on
what the
performance impact is.

Do you have any suggestion though? We've been banging our heads
against this
for a while now and as slow as this patch might be, it actually works
where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is an special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_transaction_commit we suddenly are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but the 
problem is, in create_pending_snapshots(), we are already inside 
commit_transaction().


Or commit_transaction() can be called inside commit_transaction()?

Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-22 Thread Mark Fasheh
On Fri, Apr 22, 2016 at 02:23:59PM -0400, Josef Bacik wrote:
> On 04/22/2016 02:21 PM, Mark Fasheh wrote:
> >On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
> >>On 04/15/2016 05:08 AM, Qu Wenruo wrote:
> >>>+  /*
> >>>+   * Force parent root to be updated, as we recorded it before so its
> >>>+   * last_trans == cur_transid.
> >>>+   * Or it won't be committed again onto disk after later
> >>>+   * insert_dir_item()
> >>>+   */
> >>>+  if (!ret)
> >>>+  record_root_in_trans(trans, parent, 1);
> >>>+  return ret;
> >>>+}
> >>
> >>NACK, holy shit we aren't adding a special transaction commit only
> >>for qgroup snapshots.  Figure out a different way.  Thanks,
> >
> >Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
> >multiple times (at least from my reading) so I'm really unclear on what the
> >performance impact is.
> >
> >Do you have any suggestion though? We've been banging our heads against this
> >for a while now and as slow as this patch might be, it actually works where
> >nothing else has so far.
> 
> I'm less concerned about committing another transaction and more
> concerned about the fact that it is an special variant of the
> transaction commit.  If this goes wrong, or at some point in the
> future we fail to update it along with btrfs_transaction_commit we
> suddenly are corrupting metadata.  If we have to commit a
> transaction then call btrfs_commit_transaction(), don't open code a
> stripped down version, here be dragons.  Thanks,

Ok yeah that makes perfect sense - I thought you were telling me that this
would be a big performance problem.
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-22 Thread Josef Bacik

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+   /*
+* Force parent root to be updated, as we recorded it before so its
+* last_trans == cur_transid.
+* Or it won't be committed again onto disk after later
+* insert_dir_item()
+*/
+   if (!ret)
+   record_root_in_trans(trans, parent, 1);
+   return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on what the
performance impact is.

Do you have any suggestion though? We've been banging our heads against this
for a while now and as slow as this patch might be, it actually works where
nothing else has so far.


I'm less concerned about committing another transaction and more 
concerned about the fact that it is an special variant of the 
transaction commit.  If this goes wrong, or at some point in the future 
we fail to update it along with btrfs_transaction_commit we suddenly are 
corrupting metadata.  If we have to commit a transaction then call 
btrfs_commit_transaction(), don't open code a stripped down version, 
here be dragons.  Thanks,


Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-22 Thread Mark Fasheh
On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
> On 04/15/2016 05:08 AM, Qu Wenruo wrote:
> >+/*
> >+ * Force parent root to be updated, as we recorded it before so its
> >+ * last_trans == cur_transid.
> >+ * Or it won't be committed again onto disk after later
> >+ * insert_dir_item()
> >+ */
> >+if (!ret)
> >+record_root_in_trans(trans, parent, 1);
> >+return ret;
> >+}
> 
> NACK, holy shit we aren't adding a special transaction commit only
> for qgroup snapshots.  Figure out a different way.  Thanks,

Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on what the
performance impact is.

Do you have any suggestion though? We've been banging our heads against this
for a while now and as slow as this patch might be, it actually works where
nothing else has so far.
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-22 Thread Josef Bacik

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.

Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
inside btrfs_commit_transaction() just be commit_cowonly_roots().

However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.

In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
==
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 
16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots 
= 0, nr_new_roots = 0
==

The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.

But at second qgroup_account_extent(), its reference got decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got ignored by
qgroup, which means this extent is wrongly accounted.

Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.

Reported-by: Mark Fasheh 
Signed-off-by: Qu Wenruo 
---
v2:
   Fix a soft lockup caused by missing switch_commit_root() call.
   Fix a warning caused by dirty-but-not-committed root.
v3:
   Fix a bug which will cause qgroup accounting for dropping snapshot
   wrong
v4:
   Fix a bug caused by non-cowed btree modification.

To Filipe:
   I'm sorry I didn't wait for your reply on the dropped roots.
   I reverted back the version where we deleted dropped roots in
   switch_commit_roots().

   As I think as long as we called btrfs_qgroup_prepare_account_extents()
   and btrfs_qgroup_account_extents(), it should have already accounted
   extents for dropped roots, and then we are OK to drop them.

   It would be very nice if you could point out what I missed.
   Thanks
   Qu
---
  fs/btrfs/transaction.c | 117 +++--
  1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 43885e5..92f8193 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -311,10 +311,11 @@ loop:
   * when the transaction commits
   */
  static int record_root_in_trans(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root)
+  struct btrfs_root *root,
+  int force)
  {
-   if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
-   root->last_trans < trans->transid) {
+   if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
+   root->last_trans < trans->transid) || force) {
WARN_ON(root == root->fs_info->extent_root);
WARN_ON(root->commit_root != root->node);

@@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle 
*trans,
smp_wmb();

spin_lock(&root->fs_info->fs_roots_radix_lock);
-   if (root->last_trans == trans->transid) {
+   if (root->last_trans == trans->transid && !force) {
spin_unlock(&root->fs_info->fs_roots_radix_lock);
return 0;
}
@@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle 
*trans,
return 0;

mutex_lock(&root->fs_info->reloc_mutex);
-   record_root_in_trans(trans, root);
+   record_root_in_trans(trans, root, 0);
mutex_unlock(&root->fs_info->reloc_mutex);

return 0;
@@ -1311,6 +1312,80 @@ int btrfs_defrag_root(struct btrfs_root *root)
  }

  /*
+ * Do all special snapshot related qgroup dirty hack.
+ *
+ * Will do all needed qgroup inherit and dirty hack like switch commit
+ * roots inside one transaction and write all btree into disk, to make
+ * qgroup works.
+ */
+static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
+  struct btrfs_root *src,
+  struct btrfs_root *parent,
+  struct btrfs_qgroup_inherit *inherit,
+  u64 dst_objectid)
+{
+   struct btrfs_fs_info *fs_info = src->fs_info;
+   int ret;
+
+   /*
+* We are going to commit transaction, see btrfs_commit_transaction()
+* comment for reason locking tree_log_mutex
+*/
+   mutex_lock(&fs_info->tree_log_mutex);
+
+   ret = commit_fs_roots(trans, s

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-20 Thread Holger Hoffstätte
On Tue, 19 Apr 2016 15:19:46 -0700, Mark Fasheh wrote:

> On Fri, Apr 15, 2016 at 05:08:22PM +0800, Qu Wenruo wrote:
>> Current btrfs qgroup design implies a requirement that after calling
>> btrfs_qgroup_account_extents() there must be a commit root switch.
>> 
>> Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
>> inside btrfs_commit_transaction() just be commit_cowonly_roots().
>> 
>> However there is a exception at create_pending_snapshot(), which will
>> call btrfs_qgroup_account_extents() but no any commit root switch.
>> 
>> In case of creating a snapshot whose parent root is itself (create a
>> snapshot of fs tree), it will corrupt qgroup by the following trace:
>> (skipped unrelated data)
>> ==
>> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
>> nr_old_roots = 0, nr_new_roots = 1
>> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
>> = 0, excl = 0
>> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
>> = 16384, excl = 16384
>> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
>> nr_old_roots = 0, nr_new_roots = 0
>> ==
>> 
>> The problem here is in first qgroup_account_extent(), the
>> nr_new_roots of the extent is 1, which means its reference got
>> increased, and qgroup increased its rfer and excl.
>> 
>> But at second qgroup_account_extent(), its reference got decreased, but
>> between these two qgroup_account_extent(), there is no switch roots.
>> This leads to the same nr_old_roots, and this extent just got ignored by
>> qgroup, which means this extent is wrongly accounted.
>> 
>> Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
>> create_pending_snapshot(), with needed preparation.
>> 
>> Reported-by: Mark Fasheh 
>> Signed-off-by: Qu Wenruo 
> 
> Ok, this version seems to be giving me the right numbers. I'll send a test
> for it shortly. I'd still like to know if this patch introduces an
> unintended side effects but otherwise, thanks Qu.
>   --Mark

Hi Mark,

Can't speak about other side effects since I have not observed any so far,
but I can confirm that the previously failing case of deleting a renamed
snapshot [1] now works properly with v4 without getting the commit roots
in a twist. So:

Tested-by: holger.hoffstae...@googlemail.com

cheers
Holger

[1] http://thread.gmane.org/gmane.comp.file-systems.btrfs/55052

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-19 Thread Mark Fasheh
On Fri, Apr 15, 2016 at 05:08:22PM +0800, Qu Wenruo wrote:
> Current btrfs qgroup design implies a requirement that after calling
> btrfs_qgroup_account_extents() there must be a commit root switch.
> 
> Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
> inside btrfs_commit_transaction() just be commit_cowonly_roots().
> 
> However there is a exception at create_pending_snapshot(), which will
> call btrfs_qgroup_account_extents() but no any commit root switch.
> 
> In case of creating a snapshot whose parent root is itself (create a
> snapshot of fs tree), it will corrupt qgroup by the following trace:
> (skipped unrelated data)
> ==
> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> nr_old_roots = 0, nr_new_roots = 1
> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
> = 0, excl = 0
> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer 
> = 16384, excl = 16384
> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, 
> nr_old_roots = 0, nr_new_roots = 0
> ==
> 
> The problem here is in first qgroup_account_extent(), the
> nr_new_roots of the extent is 1, which means its reference got
> increased, and qgroup increased its rfer and excl.
> 
> But at second qgroup_account_extent(), its reference got decreased, but
> between these two qgroup_account_extent(), there is no switch roots.
> This leads to the same nr_old_roots, and this extent just got ignored by
> qgroup, which means this extent is wrongly accounted.
> 
> Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
> create_pending_snapshot(), with needed preparation.
> 
> Reported-by: Mark Fasheh 
> Signed-off-by: Qu Wenruo 

Ok, this version seems to be giving me the right numbers. I'll send a test
for it shortly. I'd still like to know if this patch introduces an
unintended side effects but otherwise, thanks Qu.
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html