Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-12-06 Thread David Sterba
On Fri, Nov 30, 2018 at 12:19:18PM -0500, Josef Bacik wrote:
> On Fri, Nov 30, 2018 at 05:14:54PM +, Filipe Manana wrote:
> > On Fri, Nov 30, 2018 at 4:53 PM Josef Bacik  wrote:
> > >
> > > From: Josef Bacik 
> > >
> > > When debugging some weird extent reference bug I suspected that we were
> > > changing a snapshot while we were deleting it, which could explain my
> > > bug.  This was indeed what was happening, and this patch helped me
> > > verify my theory.  It is never correct to modify the snapshot once it's
> > > being deleted, so mark the root when we are deleting it and make sure we
> > > complain about it when it happens.
> > >
> > > Signed-off-by: Josef Bacik 
> > > ---
> > >  fs/btrfs/ctree.c   | 3 +++
> > >  fs/btrfs/ctree.h   | 1 +
> > >  fs/btrfs/extent-tree.c | 9 +
> > >  3 files changed, 13 insertions(+)
> > >
> > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > > index 5912a97b07a6..5f82f86085e8 100644
> > > --- a/fs/btrfs/ctree.c
> > > +++ b/fs/btrfs/ctree.c
> > > @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct 
> > > btrfs_trans_handle *trans,
> > > u64 search_start;
> > > int ret;
> > >
> > > +   if (test_bit(BTRFS_ROOT_DELETING, >state))
> > > +   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats 
> > > being dropped\n");
> > 
> > Please use btrfs_warn(), it makes sure we use a consistent message
> > style, identifies the fs, etc.
> > Also, "thats" should be "that is" or "that's".
> > 
> 
> Ah yeah, I was following the other convention in there but we should probably
> convert all of those to btrfs_warn.  I'll fix the grammer thing as well, just 
> a
> leftover from the much less code of conduct friendly message I originally had
> there.  Thanks,

Committed with the following fixup:

-   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
dropped\n");
+   btrfs_error(fs_info,
+   "COW'ing blocks on a fs root that's being dropped");



Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-11-30 Thread Hans van Kranenburg
On 12/1/18 1:19 AM, Qu Wenruo wrote:
> 
> 
> On 2018/12/1 上午12:52, Josef Bacik wrote:
>> From: Josef Bacik 
>>
>> When debugging some weird extent reference bug I suspected that we were
>> changing a snapshot while we were deleting it, which could explain my
>> bug.
> 
> May I ask under which case we're going to modify an unlinked snapshot?
> 
> Maybe metadata relocation?

I have some old logging from a filesystem that got corrupted when doing
a subvolume delete while doing metadata balance:

https://syrinx.knorrie.org/~knorrie/btrfs/domokun-balance-skinny-lockup.txt

This is one of the very few moments when I really lost a btrfs
filesystem and had to mkfs again to move on.

I'm wondering if this one looks related. I never found an explanation
for it.

Hans

> 
> Thanks,
> Qu
> 
>> This was indeed what was happening, and this patch helped me
>> verify my theory.  It is never correct to modify the snapshot once it's
>> being deleted, so mark the root when we are deleting it and make sure we
>> complain about it when it happens.
>>
>> Signed-off-by: Josef Bacik 
>> ---
>>  fs/btrfs/ctree.c   | 3 +++
>>  fs/btrfs/ctree.h   | 1 +
>>  fs/btrfs/extent-tree.c | 9 +
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 5912a97b07a6..5f82f86085e8 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
>> *trans,
>>  u64 search_start;
>>  int ret;
>>  
>> +if (test_bit(BTRFS_ROOT_DELETING, >state))
>> +WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
>> dropped\n");
>> +
>>  if (trans->transaction != fs_info->running_transaction)
>>  WARN(1, KERN_CRIT "trans %llu running %llu\n",
>> trans->transid,
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index facde70c15ed..5a3a94ccb65c 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1199,6 +1199,7 @@ enum {
>>  BTRFS_ROOT_FORCE_COW,
>>  BTRFS_ROOT_MULTI_LOG_TASKS,
>>  BTRFS_ROOT_DIRTY,
>> +BTRFS_ROOT_DELETING,
>>  };
>>  
>>  /*
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 581c2a0b2945..dcb699dd57f3 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -9333,6 +9333,15 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>  if (block_rsv)
>>  trans->block_rsv = block_rsv;
>>  
>> +/*
>> + * This will help us catch people modifying the fs tree while we're
>> + * dropping it.  It is unsafe to mess with the fs tree while it's being
>> + * dropped as we unlock the root node and parent nodes as we walk down
>> + * the tree, assuming nothing will change.  If something does change
>> + * then we'll have stale information and drop references to blocks we've
>> + * already dropped.
>> + */
>> +set_bit(BTRFS_ROOT_DELETING, >state);
>>  if (btrfs_disk_key_objectid(_item->drop_progress) == 0) {
>>  level = btrfs_header_level(root->node);
>>  path->nodes[level] = btrfs_lock_root_node(root);
>>
> 


-- 
Hans van Kranenburg



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-11-30 Thread Qu Wenruo


On 2018/12/1 上午12:52, Josef Bacik wrote:
> From: Josef Bacik 
> 
> When debugging some weird extent reference bug I suspected that we were
> changing a snapshot while we were deleting it, which could explain my
> bug.

May I ask under which case we're going to modify an unlinked snapshot?

Maybe metadata relocation?

Thanks,
Qu

> This was indeed what was happening, and this patch helped me
> verify my theory.  It is never correct to modify the snapshot once it's
> being deleted, so mark the root when we are deleting it and make sure we
> complain about it when it happens.
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/ctree.c   | 3 +++
>  fs/btrfs/ctree.h   | 1 +
>  fs/btrfs/extent-tree.c | 9 +
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 5912a97b07a6..5f82f86085e8 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
> *trans,
>   u64 search_start;
>   int ret;
>  
> + if (test_bit(BTRFS_ROOT_DELETING, >state))
> + WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
> dropped\n");
> +
>   if (trans->transaction != fs_info->running_transaction)
>   WARN(1, KERN_CRIT "trans %llu running %llu\n",
>  trans->transid,
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index facde70c15ed..5a3a94ccb65c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1199,6 +1199,7 @@ enum {
>   BTRFS_ROOT_FORCE_COW,
>   BTRFS_ROOT_MULTI_LOG_TASKS,
>   BTRFS_ROOT_DIRTY,
> + BTRFS_ROOT_DELETING,
>  };
>  
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 581c2a0b2945..dcb699dd57f3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9333,6 +9333,15 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   if (block_rsv)
>   trans->block_rsv = block_rsv;
>  
> + /*
> +  * This will help us catch people modifying the fs tree while we're
> +  * dropping it.  It is unsafe to mess with the fs tree while it's being
> +  * dropped as we unlock the root node and parent nodes as we walk down
> +  * the tree, assuming nothing will change.  If something does change
> +  * then we'll have stale information and drop references to blocks we've
> +  * already dropped.
> +  */
> + set_bit(BTRFS_ROOT_DELETING, >state);
>   if (btrfs_disk_key_objectid(_item->drop_progress) == 0) {
>   level = btrfs_header_level(root->node);
>   path->nodes[level] = btrfs_lock_root_node(root);
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-11-30 Thread Josef Bacik
On Fri, Nov 30, 2018 at 05:14:54PM +, Filipe Manana wrote:
> On Fri, Nov 30, 2018 at 4:53 PM Josef Bacik  wrote:
> >
> > From: Josef Bacik 
> >
> > When debugging some weird extent reference bug I suspected that we were
> > changing a snapshot while we were deleting it, which could explain my
> > bug.  This was indeed what was happening, and this patch helped me
> > verify my theory.  It is never correct to modify the snapshot once it's
> > being deleted, so mark the root when we are deleting it and make sure we
> > complain about it when it happens.
> >
> > Signed-off-by: Josef Bacik 
> > ---
> >  fs/btrfs/ctree.c   | 3 +++
> >  fs/btrfs/ctree.h   | 1 +
> >  fs/btrfs/extent-tree.c | 9 +
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 5912a97b07a6..5f82f86085e8 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct 
> > btrfs_trans_handle *trans,
> > u64 search_start;
> > int ret;
> >
> > +   if (test_bit(BTRFS_ROOT_DELETING, >state))
> > +   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
> > dropped\n");
> 
> Please use btrfs_warn(), it makes sure we use a consistent message
> style, identifies the fs, etc.
> Also, "thats" should be "that is" or "that's".
> 

Ah yeah, I was following the other convention in there but we should probably
convert all of those to btrfs_warn.  I'll fix the grammer thing as well, just a
leftover from the much less code of conduct friendly message I originally had
there.  Thanks,

Josef


Re: [PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-11-30 Thread Filipe Manana
On Fri, Nov 30, 2018 at 4:53 PM Josef Bacik  wrote:
>
> From: Josef Bacik 
>
> When debugging some weird extent reference bug I suspected that we were
> changing a snapshot while we were deleting it, which could explain my
> bug.  This was indeed what was happening, and this patch helped me
> verify my theory.  It is never correct to modify the snapshot once it's
> being deleted, so mark the root when we are deleting it and make sure we
> complain about it when it happens.
>
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/ctree.c   | 3 +++
>  fs/btrfs/ctree.h   | 1 +
>  fs/btrfs/extent-tree.c | 9 +
>  3 files changed, 13 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 5912a97b07a6..5f82f86085e8 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
> *trans,
> u64 search_start;
> int ret;
>
> +   if (test_bit(BTRFS_ROOT_DELETING, >state))
> +   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
> dropped\n");

Please use btrfs_warn(), it makes sure we use a consistent message
style, identifies the fs, etc.
Also, "thats" should be "that is" or "that's".

With that fixed,
Reviewed-by: Filipe Manana 

> +
> if (trans->transaction != fs_info->running_transaction)
> WARN(1, KERN_CRIT "trans %llu running %llu\n",
>trans->transid,
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index facde70c15ed..5a3a94ccb65c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1199,6 +1199,7 @@ enum {
> BTRFS_ROOT_FORCE_COW,
> BTRFS_ROOT_MULTI_LOG_TASKS,
> BTRFS_ROOT_DIRTY,
> +   BTRFS_ROOT_DELETING,
>  };
>
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 581c2a0b2945..dcb699dd57f3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9333,6 +9333,15 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> if (block_rsv)
> trans->block_rsv = block_rsv;
>
> +   /*
> +* This will help us catch people modifying the fs tree while we're
> +* dropping it.  It is unsafe to mess with the fs tree while it's 
> being
> +* dropped as we unlock the root node and parent nodes as we walk down
> +* the tree, assuming nothing will change.  If something does change
> +* then we'll have stale information and drop references to blocks 
> we've
> +* already dropped.
> +*/
> +   set_bit(BTRFS_ROOT_DELETING, >state);
> if (btrfs_disk_key_objectid(_item->drop_progress) == 0) {
> level = btrfs_header_level(root->node);
> path->nodes[level] = btrfs_lock_root_node(root);
> --
> 2.14.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


[PATCH 1/2] btrfs: catch cow on deleting snapshots

2018-11-30 Thread Josef Bacik
From: Josef Bacik 

When debugging some weird extent reference bug I suspected that we were
changing a snapshot while we were deleting it, which could explain my
bug.  This was indeed what was happening, and this patch helped me
verify my theory.  It is never correct to modify the snapshot once it's
being deleted, so mark the root when we are deleting it and make sure we
complain about it when it happens.

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.c   | 3 +++
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/extent-tree.c | 9 +
 3 files changed, 13 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5912a97b07a6..5f82f86085e8 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1440,6 +1440,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
*trans,
u64 search_start;
int ret;
 
+   if (test_bit(BTRFS_ROOT_DELETING, >state))
+   WARN(1, KERN_CRIT "cow'ing blocks on a fs root thats being 
dropped\n");
+
if (trans->transaction != fs_info->running_transaction)
WARN(1, KERN_CRIT "trans %llu running %llu\n",
   trans->transid,
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index facde70c15ed..5a3a94ccb65c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1199,6 +1199,7 @@ enum {
BTRFS_ROOT_FORCE_COW,
BTRFS_ROOT_MULTI_LOG_TASKS,
BTRFS_ROOT_DIRTY,
+   BTRFS_ROOT_DELETING,
 };
 
 /*
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 581c2a0b2945..dcb699dd57f3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9333,6 +9333,15 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
if (block_rsv)
trans->block_rsv = block_rsv;
 
+   /*
+* This will help us catch people modifying the fs tree while we're
+* dropping it.  It is unsafe to mess with the fs tree while it's being
+* dropped as we unlock the root node and parent nodes as we walk down
+* the tree, assuming nothing will change.  If something does change
+* then we'll have stale information and drop references to blocks we've
+* already dropped.
+*/
+   set_bit(BTRFS_ROOT_DELETING, >state);
if (btrfs_disk_key_objectid(_item->drop_progress) == 0) {
level = btrfs_header_level(root->node);
path->nodes[level] = btrfs_lock_root_node(root);
-- 
2.14.3