Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 15:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.

Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.

Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.

Thanks,
Qu


> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c |  8 +++-
>  fs/btrfs/volumes.c | 10 +++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
> block_device *bdev)
>* So, we need to add a special mount option to scan for
>* later supers, using BTRFS_SUPER_MIRROR_MAX instead
>*/
> - for (i = 0; i < 1; i++) {
> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>   if (ret)
>   continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   ret = -EINVAL;
>   }
>  
> +#if 0
> + /*
> +  * Need a way to check for any copy of SB, as its not a
> +  * strong check, just ignore this for now.
> +  */
>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>   ret = -EINVAL;
>   }
> +#endif
>  
>   /*
>* Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>  {
>   struct btrfs_super_block *disk_super;
>   struct block_device *bdev;
> - struct page *page;
> + struct buffer_head *sb_bh;
>   int ret = -EINVAL;
>   u64 devid;
>   u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   goto error;
>   }
>  
> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> + sb_bh = btrfs_read_dev_super(bdev);
> + if (IS_ERR(sb_bh)) {
> + ret = PTR_ERR(sb_bh);
>   goto error_bdev_put;
> + }
> + disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>   transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (!ret && fs_devices_ret)
>   (*fs_devices_ret)->total_devices = total_devices;
>  
> - btrfs_release_disk_super(page);
> + brelse(sb_bh);
>  
>  error_bdev_put:
>   blkdev_put(bdev, flags);
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Nikolay Borisov


On  8.12.2017 09:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c |  8 +++-
>  fs/btrfs/volumes.c | 10 +++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
> block_device *bdev)
>* So, we need to add a special mount option to scan for
>* later supers, using BTRFS_SUPER_MIRROR_MAX instead
>*/
> - for (i = 0; i < 1; i++) {
> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>   if (ret)
>   continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   ret = -EINVAL;
>   }
>  
> +#if 0
> + /*
> +  * Need a way to check for any copy of SB, as its not a
> +  * strong check, just ignore this for now.
> +  */
>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>   ret = -EINVAL;
>   }
> +#endif
>  
>   /*
>* Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>  {
>   struct btrfs_super_block *disk_super;
>   struct block_device *bdev;
> - struct page *page;
> + struct buffer_head *sb_bh;
>   int ret = -EINVAL;
>   u64 devid;
>   u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   goto error;
>   }
>  
> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> + sb_bh = btrfs_read_dev_super(bdev);

This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block? I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?


> + if (IS_ERR(sb_bh)) {
> + ret = PTR_ERR(sb_bh);
>   goto error_bdev_put;
> + }
> + disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>   transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (!ret && fs_devices_ret)
>   (*fs_devices_ret)->total_devices = total_devices;
>  
> - btrfs_release_disk_super(page);
> + brelse(sb_bh);
>  
>  error_bdev_put:
>   blkdev_put(bdev, flags);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 04:40 PM, Nikolay Borisov wrote:



On  8.12.2017 09:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);


This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block?


 Right. we need to know that. Sorry I just saw this.

 I have a very old patch to converge them and clean this up, but haven't
 sent it because there are some missing information on why it ended up
 like that in the first place.

Thanks, Anand



I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?






+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.


 Theoretical design helps. I ended up in this situation though. And
 ext4 has -o sb flag to manage this part. When we can expect EIO on
 any part of the disk block why not on the LBA which contains primary
 SB. And should we fail the mount for that reason ? No.


Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


 Of course, we already have btrfs_check_super_valid() to verify the SB,
 I don't understand why - how do we verify the SB should be the point of
 concern here, at all.

Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);
+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 18:39, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 15:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>
>> Just curious about in which real world case that backup super block can
>> help.
>> At least from what I see in mail list, only few cases where backup super
>> helps.
> 
>  Theoretical design helps. I ended up in this situation though. And
>  ext4 has -o sb flag to manage this part. When we can expect EIO on
>  any part of the disk block why not on the LBA which contains primary
>  SB. And should we fail the mount for that reason ? No.

And how do you ensure it's a btrfs?

> 
>> Despite that self super heal seems good, although I agree with David, we
>> need a weaker but necessary check (magic and fsid from primary super?)
>> to ensure it's a valid btrfs before we use the backup supers.
> 
>  Of course, we already have btrfs_check_super_valid() to verify the SB,
>  I don't understand why - how do we verify the SB should be the point of
>  concern here, at all.

The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.

This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>>   +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>     /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 07:01 PM, Qu Wenruo wrote:



On 2017年12月08日 18:39, Anand Jain wrote:



On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.


  Theoretical design helps. I ended up in this situation though. And
  ext4 has -o sb flag to manage this part. When we can expect EIO on
  any part of the disk block why not on the LBA which contains primary
  SB. And should we fail the mount for that reason ? No.


And how do you ensure it's a btrfs?


 Hmm. You mean outside of btrfs ? I did experiment with wipe and then
 using /etc/fstab to mount, and it did lead to btrfs, is that your
 concern that it shouldn't have been. That looked surprising to me as
 well, but then problem points at wipefs instead.






Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


  Of course, we already have btrfs_check_super_valid() to verify the SB,
  I don't understand why - how do we verify the SB should be the point of
  concern here, at all.


The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.


 Ok. When you check all the SBs you would pick the one which has passed
 btrfs_check_super_valid() and has highest generation. Did I ans your
 concern ? If a SB does not pass btrfs_check_super_valid() its not a
 valid btrfs SB at all.



This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.


 btrfs_check_super_valid() is already doing that right ? The point here
 is, should we use the backup SB when btrfs_check_super_valid() fails on
 primary SB.

Thanks, Anand


Thanks,
Qu



Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linu

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 19:48, Anand Jain wrote:
> 
> 
> On 12/08/2017 07:01 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 18:39, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:


 On 2017年12月08日 15:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail
> mount,
> this is an experimental patch which thinks why not go and read backup
> copy.

 Just curious about in which real world case that backup super block can
 help.
 At least from what I see in mail list, only few cases where backup
 super
 helps.
>>>
>>>   Theoretical design helps. I ended up in this situation though. And
>>>   ext4 has -o sb flag to manage this part. When we can expect EIO on
>>>   any part of the disk block why not on the LBA which contains primary
>>>   SB. And should we fail the mount for that reason ? No.
>>
>> And how do you ensure it's a btrfs?
> 
>  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>  using /etc/fstab to mount, and it did lead to btrfs, is that your
>  concern that it shouldn't have been. That looked surprising to me as
>  well, but then problem points at wipefs instead.

It's closer, but still doesn't reach the point.
It can be a mkfs, other than wipefs.

It's can be another case like:

One user used btrfs for a while
But bugs made him/her unhappy, and he/she turned to use xfs (whatever
the fs is) instead.

While he/she forgot to change its fstab, and rebooted the system.

And suddenly, it's btrfs again!
What a surprise.


> 
>>
>>>
 Despite that self super heal seems good, although I agree with
 David, we
 need a weaker but necessary check (magic and fsid from primary super?)
 to ensure it's a valid btrfs before we use the backup supers.
>>>
>>>   Of course, we already have btrfs_check_super_valid() to verify the SB,
>>>   I don't understand why - how do we verify the SB should be the
>>> point of
>>>   concern here, at all.
>>
>> The point here is, to distinguish an old and invalid btrfs (some other
>> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
>> primary fs.
> 
>  Ok. When you check all the SBs you would pick the one which has passed
>  btrfs_check_super_valid() and has highest generation. Did I ans your
>  concern ? If a SB does not pass btrfs_check_super_valid() its not a
>  valid btrfs SB at all.

No, not really.

What if the first SB is a XFS one or even a fs you didn't ever hear?

Skip it and use the 2nd one?
This filesystem is not even btrfs anymore.

Thanks,
Qu

> 
> 
>> This the main concern here.
>> The difference between recovery and recognizing a bad btrfs is quite
>> important.
> 
>  btrfs_check_super_valid() is already doing that right ? The point here
>  is, should we use the backup SB when btrfs_check_super_valid() fails on
>  primary SB.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
 Thanks,
 Qu


>
> Signed-off-by: Anand Jain 
> ---
>    fs/btrfs/disk-io.c |  8 +++-
>    fs/btrfs/volumes.c | 10 +++---
>    2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
> block_device *bdev)
>     * So, we need to add a special mount option to scan for
>     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>     */
> -    for (i = 0; i < 1; i++) {
> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>    ret = btrfs_read_dev_one_super(bdev, i, &bh);
>    if (ret)
>    continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
> btrfs_fs_info *fs_info)
>    ret = -EINVAL;
>    }
>    +#if 0
> +    /*
> + * Need a way to check for any copy of SB, as its not a
> + * strong check, just ignore this for now.
> + */
>    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>    btrfs_err(fs_info, "super offset mismatch %llu != %u",
>  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>    ret = -EINVAL;
>    }
> +#endif
>      /*
>     * Obvious sys_chunk_array corruptions, it must hold at least
> one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
> fmode_t flags, void *holder,
>    {
>    struct btrfs_super_block *disk_super;
>    struct block_device *bdev;
> -    struct page *page;
> +    struct buffer_head *sb_bh;
>    int ret = -EINVAL;

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Nikolay Borisov


On  8.12.2017 12:33, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:40 PM, Nikolay Borisov wrote:
>>
>>
>> On  8.12.2017 09:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>>   +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>     /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>
>> This patch prompts another question: why do we have a page-based and a
>> bufferhead-based interface to reading the super block?
> 
>  Right. we need to know that. Sorry I just saw this.

FWIW unless we explicitly need a per-block state tracking (which I don't
think we do) we should ideally switch to page-based mechanism. Buffer
heads are considered deprecated. Also the only reason why we do have
btrfsic_submit_bh is for the superblock bufer head. So potentially by
removing the only requirement in the kernel where bh are used we can
simplify the btrfsic code as well . Just something to keep in mind.

> 
>  I have a very old patch to converge them and clean this up, but haven't
>  sent it because there are some missing information on why it ended up
>  like that in the first place.
> 
> Thanks, Anand
> 
> 
>> I did prototype
>> switching the bufferheads to page based but the resulting code wasn't
>> any cleaner. I believe there is also open the question what happens when
>> btrfs is run on a 64k page machine. I.e. we are going to read a single
>> page and the sb is going to be in the first 4k but what about the rest
>> 60, they could potentially contain other metadata. The page will have to
>> be freed asap so as not to peg the neighboring metadata?
> 
> 
>>
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info a

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Hans van Kranenburg
On 12/08/2017 09:17 AM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 15:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> 
> Just curious about in which real world case that backup super block can
> help.
> At least from what I see in mail list, only few cases where backup super
> helps.

That's about using 'backup roots' with old info about previous
generations than the current one on disk, which is a different thing
than using one of the other copies of the super block itself.

> Despite that self super heal seems good, although I agree with David, we
> need a weaker but necessary check (magic and fsid from primary super?)
> to ensure it's a valid btrfs before we use the backup supers.
> 
> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: Anand Jain 
>> ---
>>  fs/btrfs/disk-io.c |  8 +++-
>>  fs/btrfs/volumes.c | 10 +++---
>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
>> block_device *bdev)
>>   * So, we need to add a special mount option to scan for
>>   * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>   */
>> -for (i = 0; i < 1; i++) {
>> +for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>  ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>  if (ret)
>>  continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>  ret = -EINVAL;
>>  }
>>  
>> +#if 0
>> +/*
>> + * Need a way to check for any copy of SB, as its not a
>> + * strong check, just ignore this for now.
>> + */
>>  if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>  btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>  ret = -EINVAL;
>>  }
>> +#endif
>>  
>>  /*
>>   * Obvious sys_chunk_array corruptions, it must hold at least one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  {
>>  struct btrfs_super_block *disk_super;
>>  struct block_device *bdev;
>> -struct page *page;
>> +struct buffer_head *sb_bh;
>>  int ret = -EINVAL;
>>  u64 devid;
>>  u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  goto error;
>>  }
>>  
>> -if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +sb_bh = btrfs_read_dev_super(bdev);
>> +if (IS_ERR(sb_bh)) {
>> +ret = PTR_ERR(sb_bh);
>>  goto error_bdev_put;
>> +}
>> +disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>  
>>  devid = btrfs_stack_device_id(&disk_super->dev_item);
>>  transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  if (!ret && fs_devices_ret)
>>  (*fs_devices_ret)->total_devices = total_devices;
>>  
>> -btrfs_release_disk_super(page);
>> +brelse(sb_bh);
>>  
>>  error_bdev_put:
>>  blkdev_put(bdev, flags);
>>
> 


-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 08:02 PM, Qu Wenruo wrote:



On 2017年12月08日 19:48, Anand Jain wrote:



On 12/08/2017 07:01 PM, Qu Wenruo wrote:



On 2017年12月08日 18:39, Anand Jain wrote:



On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail
mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup
super
helps.


   Theoretical design helps. I ended up in this situation though. And
   ext4 has -o sb flag to manage this part. When we can expect EIO on
   any part of the disk block why not on the LBA which contains primary
   SB. And should we fail the mount for that reason ? No.


And how do you ensure it's a btrfs?


  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
  using /etc/fstab to mount, and it did lead to btrfs, is that your
  concern that it shouldn't have been. That looked surprising to me as
  well, but then problem points at wipefs instead.


It's closer, but still doesn't reach the point.
It can be a mkfs, other than wipefs.

It's can be another case like:

One user used btrfs for a while
But bugs made him/her unhappy, and he/she turned to use xfs (whatever
the fs is) instead.

While he/she forgot to change its fstab, and rebooted the system.

And suddenly, it's btrfs again!
What a surprise.



 I have worked quite a lot on defining the problem statements before
 btrfs. Here the user has to be blamed to have forgotten to update
 the fstab. I don't understand why should we workaround in btrfs for
 the reason that user may miss to update fstab. We don't design
 a stuff that like that, we design for the purpose, here backup SB
 is for the purpose that we all know, if we don't use backup SB, then
 its an incomplete design.








Despite that self super heal seems good, although I agree with
David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


   Of course, we already have btrfs_check_super_valid() to verify the SB,
   I don't understand why - how do we verify the SB should be the
point of
   concern here, at all.


The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.


  Ok. When you check all the SBs you would pick the one which has passed
  btrfs_check_super_valid() and has highest generation. Did I ans your
  concern ? If a SB does not pass btrfs_check_super_valid() its not a
  valid btrfs SB at all.


No, not really.

What if the first SB is a XFS one or even a fs you didn't ever hear?

Skip it and use the 2nd one?
This filesystem is not even btrfs anymore.


 There is no problem is user does the correct thing that is to
 update the fstab. OR using a complete mount command. If user
 using -t btrfs OR unupdated btrfs then it indicates his intentions.

Thanks, Anand


Thanks,
Qu





This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.


  btrfs_check_super_valid() is already doing that right ? The point here
  is, should we use the backup SB when btrfs_check_super_valid() fails on
  primary SB.

Thanks, Anand


Thanks,
Qu



Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
    fs/btrfs/disk-io.c |  8 +++-
    fs/btrfs/volumes.c | 10 +++---
    2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
     * So, we need to add a special mount option to scan for
     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
     */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
    ret = btrfs_read_dev_one_super(bdev, i, &bh);
    if (ret)
    continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
    ret = -EINVAL;
    }
    +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
    btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
    ret = -EINVAL;
    }
+#endif
      /*
     * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t 

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Austin S. Hemmelgarn

On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.
I like the concept, and actually think this should be default behavior 
on a filesystem that's already mounted (we fix other errors, why not 
SB's), but I don't think it should be default behavior at mount time for 
the reasons Qu has outlined (picking up old BTRFS SB's after 
reformatting is bad).  However, I do think it's useful to be able to ask 
for this behavior on mount, so that you don't need to fight with the 
programs to get a filesystem to mount when the first SB is missing 
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);
+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 20:41, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:02 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 19:48, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 07:01 PM, Qu Wenruo wrote:


 On 2017年12月08日 18:39, Anand Jain wrote:
>
>
> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 15:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail
>>> mount,
>>> this is an experimental patch which thinks why not go and read
>>> backup
>>> copy.
>>
>> Just curious about in which real world case that backup super
>> block can
>> help.
>> At least from what I see in mail list, only few cases where backup
>> super
>> helps.
>
>    Theoretical design helps. I ended up in this situation though. And
>    ext4 has -o sb flag to manage this part. When we can expect EIO on
>    any part of the disk block why not on the LBA which contains
> primary
>    SB. And should we fail the mount for that reason ? No.

 And how do you ensure it's a btrfs?
>>>
>>>   Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>>>   using /etc/fstab to mount, and it did lead to btrfs, is that your
>>>   concern that it shouldn't have been. That looked surprising to me as
>>>   well, but then problem points at wipefs instead.
>>
>> It's closer, but still doesn't reach the point.
>> It can be a mkfs, other than wipefs.
>>
>> It's can be another case like:
>>
>> One user used btrfs for a while
>> But bugs made him/her unhappy, and he/she turned to use xfs (whatever
>> the fs is) instead.
>>
>> While he/she forgot to change its fstab, and rebooted the system.
>>
>> And suddenly, it's btrfs again!
>> What a surprise.
>>
> 
>  I have worked quite a lot on defining the problem statements before
>  btrfs. Here the user has to be blamed to have forgotten to update
>  the fstab. I don't understand why should we workaround in btrfs for
>  the reason that user may miss to update fstab. We don't design
>  a stuff that like that, we design for the purpose, here backup SB
>  is for the purpose that we all know, if we don't use backup SB, then
>  its an incomplete design.

Then implement something like ext*, using explicit mount option sb=n.

And since it's already called "backup", it's something used for
recovery, not for normal operation.
We already have rescue tool to recover sb from backup, so it's not a
incomplete design.

Personally speaking, I prefer the current way and leave backup just as
backup.
You can my example of a user fault, but the real world needs us to
handle user's fault.

Or there will be no need for mkfs -f options or rm --no-preserve-root
options at all.

Thanks,
Qu

> 
>>>

>
>> Despite that self super heal seems good, although I agree with
>> David, we
>> need a weaker but necessary check (magic and fsid from primary
>> super?)
>> to ensure it's a valid btrfs before we use the backup supers.
>
>    Of course, we already have btrfs_check_super_valid() to verify
> the SB,
>    I don't understand why - how do we verify the SB should be the
> point of
>    concern here, at all.

 The point here is, to distinguish an old and invalid btrfs (some other
 valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
 primary fs.
>>>
>>>   Ok. When you check all the SBs you would pick the one which has passed
>>>   btrfs_check_super_valid() and has highest generation. Did I ans your
>>>   concern ? If a SB does not pass btrfs_check_super_valid() its not a
>>>   valid btrfs SB at all.
>>
>> No, not really.
>>
>> What if the first SB is a XFS one or even a fs you didn't ever hear?
>>
>> Skip it and use the 2nd one?
>> This filesystem is not even btrfs anymore.
> 
>  There is no problem is user does the correct thing that is to
>  update the fstab. OR using a complete mount command. If user
>  using -t btrfs OR unupdated btrfs then it indicates his intentions.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>>
 This the main concern here.
 The difference between recovery and recognizing a bad btrfs is quite
 important.
>>>
>>>   btrfs_check_super_valid() is already doing that right ? The point here
>>>   is, should we use the backup SB when btrfs_check_super_valid()
>>> fails on
>>>   primary SB.
>>>
>>> Thanks, Anand
>>>
 Thanks,
 Qu

>
> Thanks, Anand
>
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>     fs/btrfs/disk-io.c |  8 +++-
>>>     fs/btrfs/volumes.c | 10 +++---
>>>     2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head
>>

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
> On 2017-12-08 02:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> I like the concept, and actually think this should be default behavior
> on a filesystem that's already mounted (we fix other errors, why not
> SB's), but I don't think it should be default behavior at mount time for
> the reasons Qu has outlined (picking up old BTRFS SB's after
> reformatting is bad).  However, I do think it's useful to be able to ask
> for this behavior on mount, so that you don't need to fight with the
> programs to get a filesystem to mount when the first SB is missing
> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).

Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
option than do it automatically.

Thanks,
Qu

>>
>> Signed-off-by: Anand Jain 
>> ---
>>   fs/btrfs/disk-io.c |  8 +++-
>>   fs/btrfs/volumes.c | 10 +++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>> block_device *bdev)
>>    * So, we need to add a special mount option to scan for
>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>    */
>> -    for (i = 0; i < 1; i++) {
>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>   if (ret)
>>   continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>> btrfs_fs_info *fs_info)
>>   ret = -EINVAL;
>>   }
>>   +#if 0
>> +    /*
>> + * Need a way to check for any copy of SB, as its not a
>> + * strong check, just ignore this for now.
>> + */
>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>   ret = -EINVAL;
>>   }
>> +#endif
>>     /*
>>    * Obvious sys_chunk_array corruptions, it must hold at least
>> one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   {
>>   struct btrfs_super_block *disk_super;
>>   struct block_device *bdev;
>> -    struct page *page;
>> +    struct buffer_head *sb_bh;
>>   int ret = -EINVAL;
>>   u64 devid;
>>   u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   goto error;
>>   }
>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +    sb_bh = btrfs_read_dev_super(bdev);
>> +    if (IS_ERR(sb_bh)) {
>> +    ret = PTR_ERR(sb_bh);
>>   goto error_bdev_put;
>> +    }
>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>   transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   if (!ret && fs_devices_ret)
>>   (*fs_devices_ret)->total_devices = total_devices;
>>   -    btrfs_release_disk_super(page);
>> +    brelse(sb_bh);
>>     error_bdev_put:
>>   blkdev_put(bdev, flags);
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 08:12 PM, Nikolay Borisov wrote:



On  8.12.2017 12:33, Anand Jain wrote:



On 12/08/2017 04:40 PM, Nikolay Borisov wrote:



On  8.12.2017 09:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);


This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block?


  Right. we need to know that. Sorry I just saw this.


FWIW unless we explicitly need a per-block state tracking (which I don't
think we do) we should ideally switch to page-based mechanism. Buffer
heads are considered deprecated. Also the only reason why we do have
btrfsic_submit_bh is for the superblock bufer head. So potentially by
removing the only requirement in the kernel where bh are used we can
simplify the btrfsic code as well . Just something to keep in mind.


 Thanks. These are related to the other email thread [1]
 [1]
 [Question] Two variants of SB reads can we move into bio read instead ?
 We can follow up there. ? Could we ? That cleanup is due for a long
 time now.

Thanks, Anand



  I have a very old patch to converge them and clean this up, but haven't
  sent it because there are some missing information on why it ended up
  like that in the first place.

Thanks, Anand



I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?






+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Austin S. Hemmelgarn

On 2017-12-08 07:59, Qu Wenruo wrote:



On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:

On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

I like the concept, and actually think this should be default behavior
on a filesystem that's already mounted (we fix other errors, why not
SB's), but I don't think it should be default behavior at mount time for
the reasons Qu has outlined (picking up old BTRFS SB's after
reformatting is bad).  However, I do think it's useful to be able to ask
for this behavior on mount, so that you don't need to fight with the
programs to get a filesystem to mount when the first SB is missing
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
option than do it automatically.
I still think there should be an option to do automatic detection (it's 
not particularly hard, and it's not likely to do the wrong thing in most 
cases), but being able to explicitly specify a particular superblock for 
a mount is definitely a step in the right direction.


Thanks,
Qu



Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 21:05, Austin S. Hemmelgarn wrote:
> On 2017-12-08 07:59, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
>>> On 2017-12-08 02:57, Anand Jain wrote:
 -EXPERIMENTAL-
 As of now when primary SB fails we won't self heal and would fail
 mount,
 this is an experimental patch which thinks why not go and read backup
 copy.
>>> I like the concept, and actually think this should be default behavior
>>> on a filesystem that's already mounted (we fix other errors, why not
>>> SB's), but I don't think it should be default behavior at mount time for
>>> the reasons Qu has outlined (picking up old BTRFS SB's after
>>> reformatting is bad).  However, I do think it's useful to be able to ask
>>> for this behavior on mount, so that you don't need to fight with the
>>> programs to get a filesystem to mount when the first SB is missing
>>> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
>>
>> Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
>> option than do it automatically.
> I still think there should be an option to do automatic detection (it's
> not particularly hard, and it's not likely to do the wrong thing in most
> cases), but being able to explicitly specify a particular superblock for
> a mount is definitely a step in the right direction.

usebackupsuper <- Auto
usebackupsuper=n <- Manual
usebackupsuper=n,m <- Manual multi, in given order
(The last one seems a little overkilled though)

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>

 Signed-off-by: Anand Jain 
 ---
    fs/btrfs/disk-io.c |  8 +++-
    fs/btrfs/volumes.c | 10 +++---
    2 files changed, 14 insertions(+), 4 deletions(-)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 9b20c1f3563b..a791b8dfe8a8 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
 block_device *bdev)
     * So, we need to add a special mount option to scan for
     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
     */
 -    for (i = 0; i < 1; i++) {
 +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
    ret = btrfs_read_dev_one_super(bdev, i, &bh);
    if (ret)
    continue;
 @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
 btrfs_fs_info *fs_info)
    ret = -EINVAL;
    }
    +#if 0
 +    /*
 + * Need a way to check for any copy of SB, as its not a
 + * strong check, just ignore this for now.
 + */
    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
    btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
    ret = -EINVAL;
    }
 +#endif
      /*
     * Obvious sys_chunk_array corruptions, it must hold at least
 one key
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 9fa2539a8493..f368db94d62b 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    {
    struct btrfs_super_block *disk_super;
    struct block_device *bdev;
 -    struct page *page;
 +    struct buffer_head *sb_bh;
    int ret = -EINVAL;
    u64 devid;
    u64 transid;
 @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    goto error;
    }
    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
 +    sb_bh = btrfs_read_dev_super(bdev);
 +    if (IS_ERR(sb_bh)) {
 +    ret = PTR_ERR(sb_bh);
    goto error_bdev_put;
 +    }
 +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
      devid = btrfs_stack_device_id(&disk_super->dev_item);
    transid = btrfs_super_generation(disk_super);
 @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    if (!ret && fs_devices_ret)
    (*fs_devices_ret)->total_devices = total_devices;
    -    btrfs_release_disk_super(page);
 +    brelse(sb_bh);
      error_bdev_put:
    blkdev_put(bdev, flags);

> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain



On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:

On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.
I like the concept, and actually think this should be default behavior 
on a filesystem that's already mounted (we fix other errors, why not 
SB's), 



but I don't think it should be default behavior at mount time for 
the reasons Qu has outlined (picking up old BTRFS SB's after 
reformatting is bad).


 If the primary SB is not btrfs, then unless forced with -t btrfs
 option, this mount won't be in btrfs at all. That means it goes to
 the respective FS by default.

 If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
 is not overwriting all the copies of SB then its a mkfs.btrfs bug.

 Which means there is already mount -t option to redirect to the FS
 module that is needed by the user, when primary SB is corrupted.
 We should use this feature.

Thanks, Anand


However, I do think it's useful to be able to ask 
for this behavior on mount, so that you don't need to fight with the 
programs to get a filesystem to mount when the first SB is missing 
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)

   * So, we need to add a special mount option to scan for
   * later supers, using BTRFS_SUPER_MIRROR_MAX instead
   */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
  ret = btrfs_read_dev_one_super(bdev, i, &bh);
  if (ret)
  continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
btrfs_fs_info *fs_info)

  ret = -EINVAL;
  }
+#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
  if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
  btrfs_err(fs_info, "super offset mismatch %llu != %u",
    btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
  ret = -EINVAL;
  }
+#endif
  /*
   * Obvious sys_chunk_array corruptions, it must hold at least 
one key

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  {
  struct btrfs_super_block *disk_super;
  struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
  int ret = -EINVAL;
  u64 devid;
  u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  goto error;
  }
-    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
  goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  devid = btrfs_stack_device_id(&disk_super->dev_item);
  transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  if (!ret && fs_devices_ret)
  (*fs_devices_ret)->total_devices = total_devices;
-    btrfs_release_disk_super(page);
+    brelse(sb_bh);
  error_bdev_put:
  blkdev_put(bdev, flags);



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Paul Jones
> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn
> Sent: Friday, 8 December 2017 11:51 PM
> To: Anand Jain ; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH RFC] btrfs: self heal from SB fail
> 
> On 2017-12-08 02:57, Anand Jain wrote:
> > -EXPERIMENTAL-
> > As of now when primary SB fails we won't self heal and would fail
> > mount, this is an experimental patch which thinks why not go and read
> > backup copy.
> I like the concept, and actually think this should be default behavior on a
> filesystem that's already mounted (we fix other errors, why not SB's), but I
> don't think it should be default behavior at mount time for the reasons Qu
> has outlined (picking up old BTRFS SB's after reformatting is bad).  However, 
> I
> do think it's useful to be able to ask for this behavior on mount, so that you
> don't need to fight with the programs to get a filesystem to mount when the
> first SB is missing (perhaps add a 'usebackupsb' option to mirror
> 'usebackuproot'?).

I agree with this. The behaviour I'd like to see would be refusal to mount 
(without additional mount options) but also: print the needed info to the 
kernel log so the user can add the required mount option or read the wiki for 
more information, and print some diagnostic info on the primary + secondary 
super blocks.

Paul.








Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo


On 2017年12月08日 22:02, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:
>> On 2017-12-08 02:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>> I like the concept, and actually think this should be default behavior
>> on a filesystem that's already mounted (we fix other errors, why not
>> SB's), 
> 
> 
>> but I don't think it should be default behavior at mount time for the
>> reasons Qu has outlined (picking up old BTRFS SB's after reformatting
>> is bad).
> 
>  If the primary SB is not btrfs, then unless forced with -t btrfs
>  option, this mount won't be in btrfs at all. That means it goes to
>  the respective FS by default.
> 
>  If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
>  is not overwriting all the copies of SB then its a mkfs.btrfs bug.

Nope.

If the new btrfs is a smaller one than original btrfs, the 2nd super can
still be there.

(AFAIK all these examples have been given by David for several times)

Thanks,
Qu
> 
>  Which means there is already mount -t option to redirect to the FS
>  module that is needed by the user, when primary SB is corrupted.
>  We should use this feature.
> 
> Thanks, Anand
> 
> 
>> However, I do think it's useful to be able to ask for this behavior on
>> mount, so that you don't need to fight with the programs to get a
>> filesystem to mount when the first SB is missing (perhaps add a
>> 'usebackupsb' option to mirror 'usebackuproot'?).
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>> +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>   /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>> -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>> -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>   error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-11 Thread Anand Jain





  If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
  is not overwriting all the copies of SB then its a mkfs.btrfs bug.


Nope.

If the new btrfs is a smaller one than original btrfs, the 2nd super can
still be there.


 Oh right, the mkfs.btrfs -b  option, where we don't
 use entire device.

 Here the source of the problem is having the stale SB which is
 created by previous btrfs, and at the time of mkfs we do notice
 that there is a stale SB, so we already know how many copy of SB
 were there, but we are bit conservative not to clean all SBs.
 Its important to fix the problem end (mkfs), not the victim end
 (sb recovery).

 But looks like not everyone agrees to this view, and sb self heal
 will go wrong if mkfs does not clean all copy of SB, so now the
 only way to recover it is by user intervention, so I am ok to add
 mount -o sb_copy_num=<0|1|2>. Suggestions for any better name is
 welcome.

 Side check, any idea what's the use case around mkfs.btrfs -b option
 other than for testing ?

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html