date:20171208

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 22:02, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:
>> On 2017-12-08 02:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>> I like the concept, and actually think this should be default behavior
>> on a filesystem that's already mounted (we fix other errors, why not
>> SB's), 
> 
> 
>> but I don't think it should be default behavior at mount time for the
>> reasons Qu has outlined (picking up old BTRFS SB's after reformatting
>> is bad).
> 
>  If the primary SB is not btrfs, then unless forced with -t btrfs
>  option, this mount won't be in btrfs at all. That means it goes to
>  the respective FS by default.
> 
>  If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
>  is not overwriting all the copies of SB then its a mkfs.btrfs bug.

Nope.

If the new btrfs is a smaller one than original btrfs, the 2nd super can
still be there.

(AFAIK all these examples have been given by David for several times)

Thanks,
Qu
> 
>  Which means there is already mount -t option to redirect to the FS
>  module that is needed by the user, when primary SB is corrupted.
>  We should use this feature.
> 
> Thanks, Anand
> 
> 
>> However, I do think it's useful to be able to ask for this behavior on
>> mount, so that you don't need to fight with the programs to get a
>> filesystem to mount when the first SB is missing (perhaps add a
>> 'usebackupsb' option to mirror 'usebackuproot'?).
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>> +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>   /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>> -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>> -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>   error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io

2017-12-08 Thread Liu Bo

(Add Jérôme Carretero.)

Thanks,

-liubo

On Fri, Dec 08, 2017 at 04:02:35PM -0700, Liu Bo wrote:
> We're not allowed to take any new bios to rbio->bio_list in
> rbio_orig_end_io(), otherwise we may get merged with more bios and
> rbio->bio_list is not empty.
> 
> This should only happens in error-out cases, the normal path of
> recover and full stripe write have already set RBIO_RMW_LOCKED_BIT to
> disable merge before doing IO.
> 
> Reported-by: Jérôme Carretero 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/raid56.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 5aa9d22..127c782 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -859,12 +859,23 @@ static void free_raid_bio(struct btrfs_raid_bio *rbio)
>   */
>  static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err)
>  {
> - struct bio *cur = bio_list_get(&rbio->bio_list);
> + struct bio *cur;
>   struct bio *next;
>  
> + /*
> +  * We're not allowed to take any new bios to rbio->bio_list
> +  * from now on, otherwise we may get merged with more bios and
> +  * rbio->bio_list is not empty.
> +  */
> + spin_lock(&rbio->bio_list_lock);
> + set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
> + spin_unlock(&rbio->bio_list_lock);
> +
>   if (rbio->generic_bio_cnt)
>   btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt);
>  
> + cur = bio_list_get(&rbio->bio_list);
> +
>   free_raid_bio(rbio);
>  
>   while (cur) {
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io

2017-12-08 Thread Liu Bo

We're not allowed to take any new bios to rbio->bio_list in
rbio_orig_end_io(), otherwise we may get merged with more bios and
rbio->bio_list is not empty.

This should only happens in error-out cases, the normal path of
recover and full stripe write have already set RBIO_RMW_LOCKED_BIT to
disable merge before doing IO.

Reported-by: Jérôme Carretero 
Signed-off-by: Liu Bo 
---
 fs/btrfs/raid56.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 5aa9d22..127c782 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -859,12 +859,23 @@ static void free_raid_bio(struct btrfs_raid_bio *rbio)
  */
 static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err)
 {
-   struct bio *cur = bio_list_get(&rbio->bio_list);
+   struct bio *cur;
struct bio *next;
 
+   /*
+* We're not allowed to take any new bios to rbio->bio_list
+* from now on, otherwise we may get merged with more bios and
+* rbio->bio_list is not empty.
+*/
+   spin_lock(&rbio->bio_list_lock);
+   set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
+   spin_unlock(&rbio->bio_list_lock);
+
if (rbio->generic_bio_cnt)
btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt);
 
+   cur = bio_list_get(&rbio->bio_list);
+
free_raid_bio(rbio);
 
while (cur) {
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Paul Jones

> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn
> Sent: Friday, 8 December 2017 11:51 PM
> To: Anand Jain ; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH RFC] btrfs: self heal from SB fail
> 
> On 2017-12-08 02:57, Anand Jain wrote:
> > -EXPERIMENTAL-
> > As of now when primary SB fails we won't self heal and would fail
> > mount, this is an experimental patch which thinks why not go and read
> > backup copy.
> I like the concept, and actually think this should be default behavior on a
> filesystem that's already mounted (we fix other errors, why not SB's), but I
> don't think it should be default behavior at mount time for the reasons Qu
> has outlined (picking up old BTRFS SB's after reformatting is bad).  However, 
> I
> do think it's useful to be able to ask for this behavior on mount, so that you
> don't need to fight with the programs to get a filesystem to mount when the
> first SB is missing (perhaps add a 'usebackupsb' option to mirror
> 'usebackuproot'?).

I agree with this. The behaviour I'd like to see would be refusal to mount 
(without additional mount options) but also: print the needed info to the 
kernel log so the user can add the required mount option or read the wiki for 
more information, and print some diagnostic info on the primary + secondary 
super blocks.

Paul.

Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection

2017-12-08 Thread Daniel Borkmann

On 12/08/2017 09:24 PM, Josef Bacik wrote:
> On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote:
>> On 12/06/2017 05:12 PM, Josef Bacik wrote:
>>> Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro.  I went 
>>> to
>>> figure out why the compiler didn't catch it and it's because it was not used
>>> anywhere.  I had copied it from the trace blacklist code without 
>>> understanding
>>> where it was used as cscope didn't find the original macro I was looking 
>>> for, so
>>> I assumed it was some voodoo and left it in place.  Turns out cscope failed 
>>> me
>>> and I didn't need the macro at all, the trace blacklist thing I was looking 
>>> at
>>> was for marking assembly functions as blacklisted and I have no intention of
>>> marking assembly functions as error injectable at the moment.
>>>
>>> v7->v8:
>>> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
>>
>> The series doesn't apply cleanly to the bpf-next tree, so one last respin 
>> with
>> a rebase would unfortunately still be required, thanks!
> 
> I've rebased and let it sit in my git tree to make sure kbuild test bot didn't
> blow up, can you pull from
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
> bpf-override-return
> 
> or do you want me to repost the whole series?  Thanks,

Yeah, the patches would need to end up on netdev, so once kbuild bot went
through fine after your rebase, please send the series.

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Matthew Wilcox

On Thu, Dec 07, 2017 at 11:38:43AM +1100, Dave Chinner wrote:
> > > cmpxchg is for replacing a known object in a store - it's not really
> > > intended for doing initial inserts after a lookup tells us there is
> > > nothing in the store.  The radix tree "insert only if empty" makes
> > > sense here, because it naturally takes care of lookup/insert races
> > > via the -EEXIST mechanism.
> > > 
> > > I think that providing xa_store_excl() (which would return -EEXIST
> > > if the entry is not empty) would be a better interface here, because
> > > it matches the semantics of lookup cache population used all over
> > > the kernel
> > 
> > I'm not thrilled with xa_store_excl(), but I need to think about that
> > a bit more.
> 
> Not fussed about the name - I just think we need a function that
> matches the insert semantics of the code

I think I have something that works better for you than returning -EEXIST
(because you don't actually want -EEXIST, you want -EAGAIN):

/* insert the new inode */
-   spin_lock(&pag->pag_ici_lock);
-   error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
-   if (unlikely(error)) {
-   WARN_ON(error != -EEXIST);
-   XFS_STATS_INC(mp, xs_ig_dup);
-   error = -EAGAIN;
-   goto out_preload_end;
-   }
-   spin_unlock(&pag->pag_ici_lock);
-   radix_tree_preload_end();
+   curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS);
+   error = __xa_race(curr, -EAGAIN);
+   if (error)
+   goto out_unlock;

...

-out_preload_end:
-   spin_unlock(&pag->pag_ici_lock);
-   radix_tree_preload_end();
+out_unlock:
+   if (error == -EAGAIN)
+   XFS_STATS_INC(mp, xs_ig_dup);

I've changed the behaviour slightly in that returning an -ENOMEM used to
hit a WARN_ON, and I don't think that's the right response -- GFP_NOFS
returning -ENOMEM probably gets you a nice warning already from the
mm code.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Lockdep is less useful than it was

2017-12-08 Thread Dave Chinner

On Fri, Dec 08, 2017 at 10:14:38AM -0800, Matthew Wilcox wrote:
> At the moment, the radix tree actively disables the RCU checking that
> enabling lockdep would give us.  It has to, because it has no idea what
> lock protects any individual access to the radix tree.  The XArray can
> use the RCU checking because it knows that every reference is protected
> by either the spinlock or the RCU lock.
> 
> Dave was saying that he has a tree which has to be protected by a mutex
> because of where it is in the locking hierarchy, and I was vigorously
> declining his proposal of allowing him to skip taking the spinlock.

Oh, I wasn't suggesting that you remove the internal tree locking
because we need external locking.

I was trying to point out that the internal locking doesn't remove
the need for external locking,  and that there are cases where
smearing the internal lock outside the XA tree doesn't work, either.
i.e. internal locking doesn't replace all the cases where external
locking is required, and so it's less efficient than the existing
radix tree code.

What I was questioning was the value of replacing the radix tree
code with a less efficient structure just to add lockdep validation
to a tree that doesn't actually need any extra locking validation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Dave Chinner

On Fri, Dec 08, 2017 at 12:35:07PM -0500, Alan Stern wrote:
> On Fri, 8 Dec 2017, Byungchul Park wrote:
> 
> > I'm sorry to hear that.. If I were you, I would also get
> > annoyed. And.. thanks for explanation.
> > 
> > But, I think assigning lock classes properly and checking
> > relationship of the classes to detect deadlocks is reasonable.
> > 
> > In my opinion about the common lockdep stuff, there are 2
> > problems on it.
> > 
> > 1) Firstly, it's hard to assign lock classes *properly*. By
> > default, it relies on the caller site of lockdep_init_map(),
> > but we need to assign another class manually, where ordering
> > rules are complicated so cannot rely on the caller site. That
> > *only* can be done by experts of the subsystem.

Sure, but that's not the issue here. The issue here is the lack of
communication with subsystem experts and that the annotation
complexity warnings given immediately by the subsystem experts were
completely ignored...

> > I think if they want to get benifit from lockdep, they have no
> > choice but to assign classes manually with the domain knowledge,
> > or use *lockdep_set_novalidate_class()* to invalidate locks
> > making the developers annoyed and not want to use the checking
> > for them.
> 
> Lockdep's no_validate class is used when the locking patterns are too
> complicated for lockdep to understand.  Basically, it tells lockdep to
> ignore those locks.

Let me just point out two things here:

1. Using lockdep_set_novalidate_class() for anything other
than device->mutex will throw checkpatch warnings. Nice. (*)

2. lockdep_set_novalidate_class() is completely undocumented
- it's the first I've ever heard of this functionality. i.e.
nobody has ever told us there is a mechanism to turn off
validation of an object; we've *always* been told to "change
your code and/or fix your annotations" when discussing
lockdep deficiencies. (**)

> The device core uses that class.  The tree of struct devices, each with
> its own lock, gets used in many different and complicated ways.  
> Lockdep can't understand this -- it doesn't have the ability to
> represent an arbitrarily deep hierarchical tree of locks -- so we tell
^^

That largely describes the in-memory structure of XFS, except we
have a forest of lock trees, not just one

> it to ignore the device locks.
> 
> It sounds like XFS may need to do the same thing with its semaphores.

Who-ever adds semaphore checking to lockdep can add those
annotations. The externalisation of the development cost of new
lockdep functionality is one of the problems here.

-Dave.

(*) checkpatch.pl is considered mostly harmful round here, too,
but that's another rant

(**) the frequent occurrence of "core code/devs aren't held to the
same rules/standard as everyone else" is another rant I have stored
up for a rainy day.

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection

2017-12-08 Thread Josef Bacik

On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote:
> On 12/06/2017 05:12 PM, Josef Bacik wrote:
> > Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro.  I went 
> > to
> > figure out why the compiler didn't catch it and it's because it was not used
> > anywhere.  I had copied it from the trace blacklist code without 
> > understanding
> > where it was used as cscope didn't find the original macro I was looking 
> > for, so
> > I assumed it was some voodoo and left it in place.  Turns out cscope failed 
> > me
> > and I didn't need the macro at all, the trace blacklist thing I was looking 
> > at
> > was for marking assembly functions as blacklisted and I have no intention of
> > marking assembly functions as error injectable at the moment.
> > 
> > v7->v8:
> > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
> 
> The series doesn't apply cleanly to the bpf-next tree, so one last respin with
> a rebase would unfortunately still be required, thanks!

I've rebased and let it sit in my git tree to make sure kbuild test bot didn't
blow up, can you pull from

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
bpf-override-return

or do you want me to repost the whole series?  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Lockdep is less useful than it was

2017-12-08 Thread Matthew Wilcox

On Fri, Dec 08, 2017 at 10:27:17AM -0500, Theodore Ts'o wrote:
> So if you are adding complexity to the kernel with the argument,
> "lockdep will save us", I'm with Dave --- it's just not a believable
> argument.

I think that's a gross misrepresentation of what I'm doing.

At the moment, the radix tree actively disables the RCU checking that
enabling lockdep would give us.  It has to, because it has no idea what
lock protects any individual access to the radix tree.  The XArray can
use the RCU checking because it knows that every reference is protected
by either the spinlock or the RCU lock.

Dave was saying that he has a tree which has to be protected by a mutex
because of where it is in the locking hierarchy, and I was vigorously
declining his proposal of allowing him to skip taking the spinlock.

And yes, we have bugs today that I assume we only stumble across every
few billion years (or only on alpha, or only if our compiler gets more
aggressive) because we have missing rcu_dereference annotations.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Alan Stern

On Fri, 8 Dec 2017, Byungchul Park wrote:

> I'm sorry to hear that.. If I were you, I would also get
> annoyed. And.. thanks for explanation.
> 
> But, I think assigning lock classes properly and checking
> relationship of the classes to detect deadlocks is reasonable.
> 
> In my opinion about the common lockdep stuff, there are 2
> problems on it.
> 
> 1) Firstly, it's hard to assign lock classes *properly*. By
> default, it relies on the caller site of lockdep_init_map(),
> but we need to assign another class manually, where ordering
> rules are complicated so cannot rely on the caller site. That
> *only* can be done by experts of the subsystem.
> 
> I think if they want to get benifit from lockdep, they have no
> choice but to assign classes manually with the domain knowledge,
> or use *lockdep_set_novalidate_class()* to invalidate locks
> making the developers annoyed and not want to use the checking
> for them.

Lockdep's no_validate class is used when the locking patterns are too
complicated for lockdep to understand.  Basically, it tells lockdep to
ignore those locks.

The device core uses that class.  The tree of struct devices, each with
its own lock, gets used in many different and complicated ways.  
Lockdep can't understand this -- it doesn't have the ability to
represent an arbitrarily deep hierarchical tree of locks -- so we tell
it to ignore the device locks.

It sounds like XFS may need to do the same thing with its semaphores.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection

2017-12-08 Thread Daniel Borkmann

On 12/06/2017 05:12 PM, Josef Bacik wrote:
> Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro.  I went to
> figure out why the compiler didn't catch it and it's because it was not used
> anywhere.  I had copied it from the trace blacklist code without understanding
> where it was used as cscope didn't find the original macro I was looking for, 
> so
> I assumed it was some voodoo and left it in place.  Turns out cscope failed me
> and I didn't need the macro at all, the trace blacklist thing I was looking at
> was for marking assembly functions as blacklisted and I have no intention of
> marking assembly functions as error injectable at the moment.
> 
> v7->v8:
> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.

The series doesn't apply cleanly to the bpf-next tree, so one last respin with
a rebase would unfortunately still be required, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Lockdep is less useful than it was

2017-12-08 Thread Theodore Ts'o

On Thu, Dec 07, 2017 at 02:38:03PM -0800, Matthew Wilcox wrote:
> I think it was a mistake to force these on for everybody; they have a
> much higher false-positive rate than the rest of lockdep, so as you say
> forcing them on leads to fewer people using *any* of lockdep.
> 
> The bug you're hitting isn't Byungchul's fault; it's an annotation
> problem.  The same kind of annotation problem that we used to have with
> dozens of other places in the kernel which are now fixed.

The question is whose responsibility is it to annotate the code?  On
another thread it was suggested it was ext4's responsibility to add
annotations to avoid the false positives --- never the mind the fact
that every single file system is going to have add annotations.

Also note that the documentation for how to add annotations is
***horrible***.  It's mostly, "try to figure out how other people
added magic cargo cult code which is not well defined (look at the
definitions of lockdep_set_class, lockdep_set_class_and_name,
lockdep_set_class_and_subclass, and lockdep_set_subclass, and weep) in
other subsystems and hope and pray it works for you."

And the explanation when there are failures, either false positives,
or not, are completely opaque.  For example:

[   16.190198] ext4lazyinit/648 is trying to acquire lock:
[   16.191201]  ((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.}, at: 
[<8a1ebe9d>] wait_for_completion_io+0x12/0x20

Just try to tell me that:

((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.}

is human comprehensible with a straight face.  And since the messages
don't even include the subclass/class/name key annotations, as lockdep
tries to handle things that are more and more complex, I'd argue it
has already crossed the boundary where unless you are a lockdep
developer, good luck trying to understand what is going on or how to
add annotations.

So if you are adding complexity to the kernel with the argument,
"lockdep will save us", I'm with Dave --- it's just not a believable
argument.

Cheers,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: Improve btrfs_search_slot description

2017-12-08 Thread Nikolay Borisov

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 880f4f693263..1f001d31bda8 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2653,18 +2653,30 @@ int btrfs_find_item(struct btrfs_root *fs_root, struct 
btrfs_path *path,
return 0;
 }
 
-/*
- * look for key in the tree.  path is filled in with nodes along the way
- * if key is found, we return zero and you can find the item in the leaf
- * level of the path (level 0)
+/* btrfs_search_slot - look for a key in a tree and perform necessary
+ * modifications to preserve tree invariants.
+ *
+ * @trans: Handle of transaction, used when modifying the tree
+ * @p: Holds all btree nodes along the search path
+ * @root:  The root node of the tree
+ * @key:   The key we are looking for
+ * @ins_len:   Indicates purpose of search, for inserts it is 1, for
+ * deletions it's -1. 0 for plain searches
+ * @cow:   boolean should CoW operations be performed. Must always be 1
+ * when modifying the tree.
+ *
+ * If @ins_len > 0, nodes and leaves will be split as we walk down the tree.
+ * If @ins_len < 0, nodes will be merged as we walk down the tree (if possible)
+ *
+ * If @key is found, 0 is returned and you can find the item in the leaf level
+ * of the path (level 0)
  *
- * If the key isn't found, the path points to the slot where it should
- * be inserted, and 1 is returned.  If there are other errors during the
- * search a negative error number is returned.
+ * If @key isn't found, 1 is returned and the leaf level of the path (level 0)
+ * points to the slot where it should be inserted
+ * be inserted, and 1 is returned.
  *
- * if ins_len > 0, nodes and leaves will be split as we walk down the
- * tree.  if ins_len < 0, nodes will be merged as we walk down the tree (if
- * possible)
+ * If an error is encountered while searching the tree a negative error number
+ * is returned
  */
 int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root 
*root,
  const struct btrfs_key *key, struct btrfs_path *p,
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: Rename bin_search -> btrfs_bin_search

2017-12-08 Thread Nikolay Borisov

Currently there are 2 function doing binary search on btrfs nodes:
bin_search and btrfs_bin_search. The latter being a simple wrapper for
the former. So eliminate the wrapper and just rename bin_search to
btrfs_bin_search. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 1e74cf826532..880f4f693263 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1807,8 +1807,8 @@ static noinline int generic_bin_search(struct 
extent_buffer *eb,
  * simple bin_search frontend that does the right thing for
  * leaves vs nodes
  */
-static int bin_search(struct extent_buffer *eb, const struct btrfs_key *key,
- int level, int *slot)
+int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key,
+int level, int *slot)
 {
if (level == 0)
return generic_bin_search(eb,
@@ -1824,12 +1824,6 @@ static int bin_search(struct extent_buffer *eb, const 
struct btrfs_key *key,
  slot);
 }
 
-int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key,
-int level, int *slot)
-{
-   return bin_search(eb, key, level, slot);
-}
-
 static void root_add_used(struct btrfs_root *root, u32 size)
 {
spin_lock(&root->accounting_lock);
@@ -2614,7 +2608,7 @@ static int key_search(struct extent_buffer *b, const 
struct btrfs_key *key,
  int level, int *prev_cmp, int *slot)
 {
if (*prev_cmp != 0) {
-   *prev_cmp = bin_search(b, key, level, slot);
+   *prev_cmp = btrfs_bin_search(b, key, level, slot);
return *prev_cmp;
}
 
@@ -5175,7 +5169,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct 
btrfs_key *min_key,
while (1) {
nritems = btrfs_header_nritems(cur);
level = btrfs_header_level(cur);
-   sret = bin_search(cur, min_key, level, &slot);
+   sret = btrfs_bin_search(cur, min_key, level, &slot);
 
/* at the lowest level, we're done, setup the path and exit */
if (level == path->lowest_level) {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:

On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.
I like the concept, and actually think this should be default behavior 
on a filesystem that's already mounted (we fix other errors, why not 
SB's), 



but I don't think it should be default behavior at mount time for 
the reasons Qu has outlined (picking up old BTRFS SB's after 
reformatting is bad).


 If the primary SB is not btrfs, then unless forced with -t btrfs
 option, this mount won't be in btrfs at all. That means it goes to
 the respective FS by default.

 If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
 is not overwriting all the copies of SB then its a mkfs.btrfs bug.

 Which means there is already mount -t option to redirect to the FS
 module that is needed by the user, when primary SB is corrupted.
 We should use this feature.

Thanks, Anand


However, I do think it's useful to be able to ask 
for this behavior on mount, so that you don't need to fight with the 
programs to get a filesystem to mount when the first SB is missing 
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)

   * So, we need to add a special mount option to scan for
   * later supers, using BTRFS_SUPER_MIRROR_MAX instead
   */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
  ret = btrfs_read_dev_one_super(bdev, i, &bh);
  if (ret)
  continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
btrfs_fs_info *fs_info)

  ret = -EINVAL;
  }
+#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
  if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
  btrfs_err(fs_info, "super offset mismatch %llu != %u",
    btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
  ret = -EINVAL;
  }
+#endif
  /*
   * Obvious sys_chunk_array corruptions, it must hold at least 
one key

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  {
  struct btrfs_super_block *disk_super;
  struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
  int ret = -EINVAL;
  u64 devid;
  u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  goto error;
  }
-    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
  goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  devid = btrfs_stack_device_id(&disk_super->dev_item);
  transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

  if (!ret && fs_devices_ret)
  (*fs_devices_ret)->total_devices = total_devices;
-    btrfs_release_disk_super(page);
+    brelse(sb_bh);
  error_bdev_put:
  blkdev_put(bdev, flags);



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] btrfs: sink extent_write_full_page tree argument

2017-12-08 Thread Nikolay Borisov

The tree argument passed to extent_write_full_page is referenced from
the page being passed to the same function. Since we already have
enough information to get the reference, remove the function parameter.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 5 ++---
 fs/btrfs/extent_io.h | 3 +--
 fs/btrfs/inode.c | 4 +---
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0b2bf65d6b0..6cd3da16f114 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4056,13 +4056,12 @@ static noinline void flush_write_bio(void *data)
flush_epd_write_bio(epd);
 }
 
-int extent_write_full_page(struct extent_io_tree *tree, struct page *page,
- struct writeback_control *wbc)
+int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
int ret;
struct extent_page_data epd = {
.bio = NULL,
-   .tree = tree,
+   .tree = &BTRFS_I(page->mapping->host)->io_tree,
.extent_locked = 0,
.sync_io = wbc->sync_mode == WB_SYNC_ALL,
};
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index f2cbabb2306a..db2558b0cad4 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -403,8 +403,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 
start,
  struct extent_state **cached_state);
 int extent_invalidatepage(struct extent_io_tree *tree,
  struct page *page, unsigned long offset);
-int extent_write_full_page(struct extent_io_tree *tree, struct page *page,
- struct writeback_control *wbc);
+int extent_write_full_page(struct page *page, struct writeback_control *wbc);
 int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
  int mode);
 int extent_writepages(struct extent_io_tree *tree,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e61d549bfc36..e79154f032e3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8862,7 +8862,6 @@ int btrfs_readpage(struct file *file, struct page *page)
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 {
-   struct extent_io_tree *tree;
struct inode *inode = page->mapping->host;
int ret;
 
@@ -8881,8 +8880,7 @@ static int btrfs_writepage(struct page *page, struct 
writeback_control *wbc)
redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
}
-   tree = &BTRFS_I(page->mapping->host)->io_tree;
-   ret = extent_write_full_page(tree, page, wbc);
+   ret = extent_write_full_page(page, wbc);
btrfs_add_delayed_iput(inode);
return ret;
 }
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] btrfs: sink extent_write_locked_range tree parameter

2017-12-08 Thread Nikolay Borisov

This function is called only from submit_compressed_extents and the
io tree being passed is always that of the inode. But we are also
passing the inode, so just move getting the io tree pointer in
extent_write_locked_range to simplify the signature.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 5 +++--
 fs/btrfs/extent_io.h | 4 ++--
 fs/btrfs/inode.c | 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16ae832bdb5d..c0b2bf65d6b0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4073,11 +4073,12 @@ int extent_write_full_page(struct extent_io_tree *tree, 
struct page *page,
return ret;
 }
 
-int extent_write_locked_range(struct extent_io_tree *tree, struct inode *inode,
- u64 start, u64 end, int mode)
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+ int mode)
 {
int ret = 0;
struct address_space *mapping = inode->i_mapping;
+   struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
struct page *page;
unsigned long nr_pages = (end - start + PAGE_SIZE) >>
PAGE_SHIFT;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c28f5ef88f42..f2cbabb2306a 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -405,8 +405,8 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  struct page *page, unsigned long offset);
 int extent_write_full_page(struct extent_io_tree *tree, struct page *page,
  struct writeback_control *wbc);
-int extent_write_locked_range(struct extent_io_tree *tree, struct inode *inode,
- u64 start, u64 end, int mode);
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+ int mode);
 int extent_writepages(struct extent_io_tree *tree,
  struct address_space *mapping,
  struct writeback_control *wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18c09cca3be7..e61d549bfc36 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -770,8 +770,8 @@ static noinline void submit_compressed_extents(struct inode 
*inode,
 * all those pages down to the drive.
 */
if (!page_started && !ret)
-   extent_write_locked_range(io_tree,
- inode, async_extent->start,
+   extent_write_locked_range(inode,
+ async_extent->start,
  async_extent->start +
  async_extent->ram_size - 1,
  WB_SYNC_ALL);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/10] btrfs: avoid to access bvec table directly for a cloned bio

2017-12-08 Thread Ming Lei

Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
mentioned that for dio the submitted bio may be fast cloned, we
can't access the bvec table directly for a cloned bio, so use
bio_get_first_bvec() to retrieve the 1st bvec.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Cc: Liu Bo 
Reviewed-by: Liu Bo 
Acked: David Sterba 
Signed-off-by: Ming Lei 
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d28b66019d54..7f23c1993d24 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8013,6 +8013,7 @@ static blk_status_t dio_read_error(struct inode *inode, 
struct bio *failed_bio,
int segs;
int ret;
blk_status_t status;
+   struct bio_vec bvec;
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -8028,8 +8029,9 @@ static blk_status_t dio_read_error(struct inode *inode, 
struct bio *failed_bio,
}
 
segs = bio_segments(failed_bio);
+   bio_get_first_bvec(failed_bio, &bvec);
if (segs > 1 ||
-   (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
+   (bvec.bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;
 
isector = start - btrfs_io_bio(failed_bio)->logical;
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/10] btrfs: avoid access to .bi_vcnt directly

2017-12-08 Thread Ming Lei

BTRFS uses bio->bi_vcnt to figure out page numbers, this way becomes not
correct once we start to enable multipage bvec.

So use bio_nr_pages() to do that instead.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Acked-by: David Sterba 
Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 9 +
 fs/btrfs/extent_io.h | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6f6669f93beb..27795bf2507c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2257,7 +2257,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 
start, u64 end,
return 0;
 }
 
-bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
   struct io_failure_record *failrec, int failed_mirror)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2281,7 +2281,7 @@ bool btrfs_check_repairable(struct inode *inode, struct 
bio *failed_bio,
 *  a) deliver good data to the caller
 *  b) correct the bad sectors on disk
 */
-   if (failed_bio->bi_vcnt > 1) {
+   if (failed_bio_pages > 1) {
/*
 * to fulfill b), we need to know the exact failing sectors, as
 * we don't want to rewrite any more than the failed ones. thus,
@@ -2374,6 +2374,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 
phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
+   unsigned failed_bio_pages = bio_nr_pages(failed_bio);
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -2381,13 +2382,13 @@ static int bio_readpage_error(struct bio *failed_bio, 
u64 phy_offset,
if (ret)
return ret;
 
-   if (!btrfs_check_repairable(inode, failed_bio, failrec,
+   if (!btrfs_check_repairable(inode, failed_bio_pages, failrec,
failed_mirror)) {
free_io_failure(failure_tree, tree, failrec);
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if (failed_bio_pages > 1)
read_mode |= REQ_FAILFAST_DEV;
 
phy_offset >>= inode->i_sb->s_blocksize_bits;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 93dcae0c3183..20854d63c75b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -540,7 +540,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode 
*inode, u64 start,
u64 end);
 int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret);
-bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
struct io_failure_record *failrec, int fail_mirror);
 struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio 
*failed_bio,
struct io_failure_record *failrec,
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 21:05, Austin S. Hemmelgarn wrote:
> On 2017-12-08 07:59, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
>>> On 2017-12-08 02:57, Anand Jain wrote:
 -EXPERIMENTAL-
 As of now when primary SB fails we won't self heal and would fail
 mount,
 this is an experimental patch which thinks why not go and read backup
 copy.
>>> I like the concept, and actually think this should be default behavior
>>> on a filesystem that's already mounted (we fix other errors, why not
>>> SB's), but I don't think it should be default behavior at mount time for
>>> the reasons Qu has outlined (picking up old BTRFS SB's after
>>> reformatting is bad).  However, I do think it's useful to be able to ask
>>> for this behavior on mount, so that you don't need to fight with the
>>> programs to get a filesystem to mount when the first SB is missing
>>> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
>>
>> Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
>> option than do it automatically.
> I still think there should be an option to do automatic detection (it's
> not particularly hard, and it's not likely to do the wrong thing in most
> cases), but being able to explicitly specify a particular superblock for
> a mount is definitely a step in the right direction.

usebackupsuper <- Auto
usebackupsuper=n <- Manual
usebackupsuper=n,m <- Manual multi, in given order
(The last one seems a little overkilled though)

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>

 Signed-off-by: Anand Jain 
 ---
    fs/btrfs/disk-io.c |  8 +++-
    fs/btrfs/volumes.c | 10 +++---
    2 files changed, 14 insertions(+), 4 deletions(-)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 9b20c1f3563b..a791b8dfe8a8 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
 block_device *bdev)
     * So, we need to add a special mount option to scan for
     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
     */
 -    for (i = 0; i < 1; i++) {
 +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
    ret = btrfs_read_dev_one_super(bdev, i, &bh);
    if (ret)
    continue;
 @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
 btrfs_fs_info *fs_info)
    ret = -EINVAL;
    }
    +#if 0
 +    /*
 + * Need a way to check for any copy of SB, as its not a
 + * strong check, just ignore this for now.
 + */
    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
    btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
    ret = -EINVAL;
    }
 +#endif
      /*
     * Obvious sys_chunk_array corruptions, it must hold at least
 one key
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 9fa2539a8493..f368db94d62b 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    {
    struct btrfs_super_block *disk_super;
    struct block_device *bdev;
 -    struct page *page;
 +    struct buffer_head *sb_bh;
    int ret = -EINVAL;
    u64 devid;
    u64 transid;
 @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    goto error;
    }
    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
 +    sb_bh = btrfs_read_dev_super(bdev);
 +    if (IS_ERR(sb_bh)) {
 +    ret = PTR_ERR(sb_bh);
    goto error_bdev_put;
 +    }
 +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
      devid = btrfs_stack_device_id(&disk_super->dev_item);
    transid = btrfs_super_generation(disk_super);
 @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
 fmode_t flags, void *holder,
    if (!ret && fs_devices_ret)
    (*fs_devices_ret)->total_devices = total_devices;
    -    btrfs_release_disk_super(page);
 +    brelse(sb_bh);
      error_bdev_put:
    blkdev_put(bdev, flags);

> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v5 0/3] Add cli and ioctl to deregister devices

2017-12-08 Thread Austin S. Hemmelgarn


On 2017-12-07 21:17, Duncan wrote:

Anand Jain posted on Fri, 08 Dec 2017 08:51:43 +0800 as excerpted:


On 12/07/2017 10:52 PM, Austin S. Hemmelgarn wrote:

On 2017-12-07 09:36, Anand Jain wrote:

Add ability to deregister a or all devices. I have named this sub cmd
as deregister, but I am open to your suggestions.

Being a bit picky here, but from the perspective of a native speaker of
American English, I would say that 'deregister' sounds rather synthetic
and somewhat harsh and alien.

Given that, as odd as it sounds, I think 'ignore' might be a more user
friendly name for the sub-command.  It accurately describes what the
command is doing (telling the kernel to ignore the device), and it's a
lot less alien sounding than 'deregister'.

If you're set on having it be based on the word 'register', I would
suggest changing it to 'unregister', as I think that sounds more
natural than 'deregister'.

Additionally, if you do continue with 'deregister' or go with
'unregister' as the name though, I would suggest adding 'register' as a
synonym for the 'scan' sub-command to keep things reasonably
symmetrical.


   A look up on unregister lead me to use deregister as more appropriate.
   Anyway I won't bother much, I will be go be suggestions, and how about
   unscan, since scan is already there. OR how about purge.


This is a bikeshed I think I have a beautiful color suggestion for! =:^)

Seriously, the normal purpose of scanning is to record particular details
of what was seen during the scan, based on some desired scanning criteria.

So what do you call it when you need to "forget" the information you
scanned for?

What about simply "forget", btrfs device forget ?  That sounds the most
natural to me, certainly far more so than the apparently freshly created
word "unscan", tho that would certainly deliver the meaning.
I actually like forget even better than ignore, it's more descriptive 
and a bit more obvious.  'purge' might make sense if you're always 
dumping the entire list, but sounds very absolute and dangerous (and 
almost universally has a very negative connotation).


And FWIW, "deregister", just.. no.  (I too would vote unregister if we're
sticking with the register root-word, but I have a feeling that may be a
regional/en-US preference and some other English regional variants may
find deregister is the less terrible to their ear.  That may explain
whatever commentary under unregister led you to go with deregister.)
"Deregister" sounds like something a computer programmer might say to
describe the process, but I can't imagine a "normal" person using the
word, except possibly in the context of removing someone from the voter
rolls (where one "registers" to vote, so "deregister" could be a a
reasonably natural term for reversing that) or the like.
Actually, in that case it's usually 'unregister' as well, at least in 
American English.  'deregister' is indeed largely a programming term, 
but even then it's not unusual for people to just use 'unregister' (the 
distinction that I usually use myself is that 'unregister' refers to a 
voluntary process initiated by the entity that initially registered 
whatever it was (which is the common case, and technically what this 
is), whereas 'deregister' is usually an involuntary process triggered by 
a third party (which is rare outside of things like forced kernel module 
removal or crashes)).  It's probably worth noting that both uses are 
uncommon enough that the standard American English dictionaries in 
Thunderbird and aspell lack both words (though Thunderbird does have 
'unregistered').


As far as regional variance, I've checked with a couple of friends from 
Australia, Canada, Denmark, and the UK, and all of them also felt that 
'deregister' sounded less natural than 'unregister'.  The reality is 
that pretty much regardless of the particular regional dialect, usage 
patterns for a given negation prefix tend to be reasonably consistent, 
namely:
* de- is usually used in technical verbs and not much else (desensitize, 
desaturate, deauthorize, delineate, etc)
* un- is usually used in generic terms, including verbs, adverbs, and 
adjectives, and is often the most common choice for creating 'new' words 
by negating an existing verb, adverb, or adjective (unknown, unsightly, 
unfriendly, etc)
* anti- is used exclusively for negating nouns, traditionally for 
concepts, but more recently for any noun (antithesis, antimatter, 
antidisestablishmentarianism, etc)
* dis- is used for a handful of very specific cases usually referring to 
behavior or personality, and sometimes culture (dishonest, disingenuous, 
etc)
* non- is used almost exclusively for adjectives and adverbs, but is 
less frequent than un- (nonsensical, nonplussed, etc)
* il- rarely used, typically only used for negating something describing 
the state of an action, and almost exclusively with words that begin 
with 'l', sometimes extended to ill to add a negative connotation to an 
existing word (illo

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Austin S. Hemmelgarn


On 2017-12-08 07:59, Qu Wenruo wrote:



On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:

On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

I like the concept, and actually think this should be default behavior
on a filesystem that's already mounted (we fix other errors, why not
SB's), but I don't think it should be default behavior at mount time for
the reasons Qu has outlined (picking up old BTRFS SB's after
reformatting is bad).  However, I do think it's useful to be able to ask
for this behavior on mount, so that you don't need to fight with the
programs to get a filesystem to mount when the first SB is missing
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
option than do it automatically.
I still think there should be an option to do automatic detection (it's 
not particularly hard, and it's not likely to do the wrong thing in most 
cases), but being able to explicitly specify a particular superblock for 
a mount is definitely a step in the right direction.


Thanks,
Qu



Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 08:12 PM, Nikolay Borisov wrote:



On  8.12.2017 12:33, Anand Jain wrote:



On 12/08/2017 04:40 PM, Nikolay Borisov wrote:



On  8.12.2017 09:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);


This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block?


  Right. we need to know that. Sorry I just saw this.


FWIW unless we explicitly need a per-block state tracking (which I don't
think we do) we should ideally switch to page-based mechanism. Buffer
heads are considered deprecated. Also the only reason why we do have
btrfsic_submit_bh is for the superblock bufer head. So potentially by
removing the only requirement in the kernel where bh are used we can
simplify the btrfsic code as well . Just something to keep in mind.


 Thanks. These are related to the other email thread [1]
 [1]
 [Question] Two variants of SB reads can we move into bio read instead ?
 We can follow up there. ? Could we ? That cleanup is due for a long
 time now.

Thanks, Anand



  I have a very old patch to converge them and clean this up, but haven't
  sent it because there are some missing information on why it ended up
  like that in the first place.

Thanks, Anand



I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?






+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
> On 2017-12-08 02:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> I like the concept, and actually think this should be default behavior
> on a filesystem that's already mounted (we fix other errors, why not
> SB's), but I don't think it should be default behavior at mount time for
> the reasons Qu has outlined (picking up old BTRFS SB's after
> reformatting is bad).  However, I do think it's useful to be able to ask
> for this behavior on mount, so that you don't need to fight with the
> programs to get a filesystem to mount when the first SB is missing
> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).

Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
option than do it automatically.

Thanks,
Qu

>>
>> Signed-off-by: Anand Jain 
>> ---
>>   fs/btrfs/disk-io.c |  8 +++-
>>   fs/btrfs/volumes.c | 10 +++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>> block_device *bdev)
>>    * So, we need to add a special mount option to scan for
>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>    */
>> -    for (i = 0; i < 1; i++) {
>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>   if (ret)
>>   continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>> btrfs_fs_info *fs_info)
>>   ret = -EINVAL;
>>   }
>>   +#if 0
>> +    /*
>> + * Need a way to check for any copy of SB, as its not a
>> + * strong check, just ignore this for now.
>> + */
>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>   ret = -EINVAL;
>>   }
>> +#endif
>>     /*
>>    * Obvious sys_chunk_array corruptions, it must hold at least
>> one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   {
>>   struct btrfs_super_block *disk_super;
>>   struct block_device *bdev;
>> -    struct page *page;
>> +    struct buffer_head *sb_bh;
>>   int ret = -EINVAL;
>>   u64 devid;
>>   u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   goto error;
>>   }
>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +    sb_bh = btrfs_read_dev_super(bdev);
>> +    if (IS_ERR(sb_bh)) {
>> +    ret = PTR_ERR(sb_bh);
>>   goto error_bdev_put;
>> +    }
>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>   transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   if (!ret && fs_devices_ret)
>>   (*fs_devices_ret)->total_devices = total_devices;
>>   -    btrfs_release_disk_super(page);
>> +    brelse(sb_bh);
>>     error_bdev_put:
>>   blkdev_put(bdev, flags);
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 20:41, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:02 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 19:48, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 07:01 PM, Qu Wenruo wrote:


 On 2017年12月08日 18:39, Anand Jain wrote:
>
>
> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 15:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail
>>> mount,
>>> this is an experimental patch which thinks why not go and read
>>> backup
>>> copy.
>>
>> Just curious about in which real world case that backup super
>> block can
>> help.
>> At least from what I see in mail list, only few cases where backup
>> super
>> helps.
>
>    Theoretical design helps. I ended up in this situation though. And
>    ext4 has -o sb flag to manage this part. When we can expect EIO on
>    any part of the disk block why not on the LBA which contains
> primary
>    SB. And should we fail the mount for that reason ? No.

 And how do you ensure it's a btrfs?
>>>
>>>   Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>>>   using /etc/fstab to mount, and it did lead to btrfs, is that your
>>>   concern that it shouldn't have been. That looked surprising to me as
>>>   well, but then problem points at wipefs instead.
>>
>> It's closer, but still doesn't reach the point.
>> It can be a mkfs, other than wipefs.
>>
>> It's can be another case like:
>>
>> One user used btrfs for a while
>> But bugs made him/her unhappy, and he/she turned to use xfs (whatever
>> the fs is) instead.
>>
>> While he/she forgot to change its fstab, and rebooted the system.
>>
>> And suddenly, it's btrfs again!
>> What a surprise.
>>
> 
>  I have worked quite a lot on defining the problem statements before
>  btrfs. Here the user has to be blamed to have forgotten to update
>  the fstab. I don't understand why should we workaround in btrfs for
>  the reason that user may miss to update fstab. We don't design
>  a stuff that like that, we design for the purpose, here backup SB
>  is for the purpose that we all know, if we don't use backup SB, then
>  its an incomplete design.

Then implement something like ext*, using explicit mount option sb=n.

And since it's already called "backup", it's something used for
recovery, not for normal operation.
We already have rescue tool to recover sb from backup, so it's not a
incomplete design.

Personally speaking, I prefer the current way and leave backup just as
backup.
You can my example of a user fault, but the real world needs us to
handle user's fault.

Or there will be no need for mkfs -f options or rm --no-preserve-root
options at all.

Thanks,
Qu

> 
>>>

>
>> Despite that self super heal seems good, although I agree with
>> David, we
>> need a weaker but necessary check (magic and fsid from primary
>> super?)
>> to ensure it's a valid btrfs before we use the backup supers.
>
>    Of course, we already have btrfs_check_super_valid() to verify
> the SB,
>    I don't understand why - how do we verify the SB should be the
> point of
>    concern here, at all.

 The point here is, to distinguish an old and invalid btrfs (some other
 valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
 primary fs.
>>>
>>>   Ok. When you check all the SBs you would pick the one which has passed
>>>   btrfs_check_super_valid() and has highest generation. Did I ans your
>>>   concern ? If a SB does not pass btrfs_check_super_valid() its not a
>>>   valid btrfs SB at all.
>>
>> No, not really.
>>
>> What if the first SB is a XFS one or even a fs you didn't ever hear?
>>
>> Skip it and use the 2nd one?
>> This filesystem is not even btrfs anymore.
> 
>  There is no problem is user does the correct thing that is to
>  update the fstab. OR using a complete mount command. If user
>  using -t btrfs OR unupdated btrfs then it indicates his intentions.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>>
 This the main concern here.
 The difference between recovery and recognizing a bad btrfs is quite
 important.
>>>
>>>   btrfs_check_super_valid() is already doing that right ? The point here
>>>   is, should we use the backup SB when btrfs_check_super_valid()
>>> fails on
>>>   primary SB.
>>>
>>> Thanks, Anand
>>>
 Thanks,
 Qu

>
> Thanks, Anand
>
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>     fs/btrfs/disk-io.c |  8 +++-
>>>     fs/btrfs/volumes.c | 10 +++---
>>>     2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head
>>

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Austin S. Hemmelgarn


On 2017-12-08 02:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.
I like the concept, and actually think this should be default behavior 
on a filesystem that's already mounted (we fix other errors, why not 
SB's), but I don't think it should be default behavior at mount time for 
the reasons Qu has outlined (picking up old BTRFS SB's after 
reformatting is bad).  However, I do think it's useful to be able to ask 
for this behavior on mount, so that you don't need to fight with the 
programs to get a filesystem to mount when the first SB is missing 
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).


Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);
+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Question] Two variants of SB reads can we move into bio read instead ?

2017-12-08 Thread Nikolay Borisov



On  8.12.2017 12:27, Anand Jain wrote:
> 
> David,
> 
>  There are two variants of SB read, one using the device cache [1]
>  and the other buffer head [2].
> 
>  [1] btrfs_read_disk_super()
>  [2] btrfs_read_dev_super()
> 
>  Patch, 6f60cbd3ae442cb35861bb522f388db123d42ec1
>   (btrfs: access superblock via pagecache in scan_one_device)
>  Embraced device caches to avoid blocksize set and thus avoid device
>  drop cache. But however its in the context of starting up with a new
>  mount and in practice, would it really matter if the device cache is
>  dropped in the context of mount, we any way drop it when the device
>  is successfully mounted though. I don't understand the actual problem
>  here.
> 
>  Further [2] is still using buffer head, which works very well for us
>  in this context, any idea if there is any suggestion to move it to
>  newer bio read instead ?

the bh interface eventually goes:

btrfs_read_dev_one_super
 __bread
  __bread_gfp
   __getblk_gfp
__getblk_slow
 __find_get_block
  __find_get_block_slow
find_get_page_flags

Which is again using the pagecache, albeit a bit more convoluted.


> 
> Thanks, Anand
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 08:02 PM, Qu Wenruo wrote:



On 2017年12月08日 19:48, Anand Jain wrote:



On 12/08/2017 07:01 PM, Qu Wenruo wrote:



On 2017年12月08日 18:39, Anand Jain wrote:



On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail
mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup
super
helps.


   Theoretical design helps. I ended up in this situation though. And
   ext4 has -o sb flag to manage this part. When we can expect EIO on
   any part of the disk block why not on the LBA which contains primary
   SB. And should we fail the mount for that reason ? No.


And how do you ensure it's a btrfs?


  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
  using /etc/fstab to mount, and it did lead to btrfs, is that your
  concern that it shouldn't have been. That looked surprising to me as
  well, but then problem points at wipefs instead.


It's closer, but still doesn't reach the point.
It can be a mkfs, other than wipefs.

It's can be another case like:

One user used btrfs for a while
But bugs made him/her unhappy, and he/she turned to use xfs (whatever
the fs is) instead.

While he/she forgot to change its fstab, and rebooted the system.

And suddenly, it's btrfs again!
What a surprise.



 I have worked quite a lot on defining the problem statements before
 btrfs. Here the user has to be blamed to have forgotten to update
 the fstab. I don't understand why should we workaround in btrfs for
 the reason that user may miss to update fstab. We don't design
 a stuff that like that, we design for the purpose, here backup SB
 is for the purpose that we all know, if we don't use backup SB, then
 its an incomplete design.








Despite that self super heal seems good, although I agree with
David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


   Of course, we already have btrfs_check_super_valid() to verify the SB,
   I don't understand why - how do we verify the SB should be the
point of
   concern here, at all.


The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.


  Ok. When you check all the SBs you would pick the one which has passed
  btrfs_check_super_valid() and has highest generation. Did I ans your
  concern ? If a SB does not pass btrfs_check_super_valid() its not a
  valid btrfs SB at all.


No, not really.

What if the first SB is a XFS one or even a fs you didn't ever hear?

Skip it and use the 2nd one?
This filesystem is not even btrfs anymore.


 There is no problem is user does the correct thing that is to
 update the fstab. OR using a complete mount command. If user
 using -t btrfs OR unupdated btrfs then it indicates his intentions.

Thanks, Anand


Thanks,
Qu





This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.


  btrfs_check_super_valid() is already doing that right ? The point here
  is, should we use the backup SB when btrfs_check_super_valid() fails on
  primary SB.

Thanks, Anand


Thanks,
Qu



Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
    fs/btrfs/disk-io.c |  8 +++-
    fs/btrfs/volumes.c | 10 +++---
    2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
     * So, we need to add a special mount option to scan for
     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
     */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
    ret = btrfs_read_dev_one_super(bdev, i, &bh);
    if (ret)
    continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
    ret = -EINVAL;
    }
    +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
    btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
    ret = -EINVAL;
    }
+#endif
      /*
     * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Hans van Kranenburg

On 12/08/2017 09:17 AM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 15:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> 
> Just curious about in which real world case that backup super block can
> help.
> At least from what I see in mail list, only few cases where backup super
> helps.

That's about using 'backup roots' with old info about previous
generations than the current one on disk, which is a different thing
than using one of the other copies of the super block itself.

> Despite that self super heal seems good, although I agree with David, we
> need a weaker but necessary check (magic and fsid from primary super?)
> to ensure it's a valid btrfs before we use the backup supers.
> 
> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: Anand Jain 
>> ---
>>  fs/btrfs/disk-io.c |  8 +++-
>>  fs/btrfs/volumes.c | 10 +++---
>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
>> block_device *bdev)
>>   * So, we need to add a special mount option to scan for
>>   * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>   */
>> -for (i = 0; i < 1; i++) {
>> +for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>  ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>  if (ret)
>>  continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>  ret = -EINVAL;
>>  }
>>  
>> +#if 0
>> +/*
>> + * Need a way to check for any copy of SB, as its not a
>> + * strong check, just ignore this for now.
>> + */
>>  if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>  btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>  ret = -EINVAL;
>>  }
>> +#endif
>>  
>>  /*
>>   * Obvious sys_chunk_array corruptions, it must hold at least one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  {
>>  struct btrfs_super_block *disk_super;
>>  struct block_device *bdev;
>> -struct page *page;
>> +struct buffer_head *sb_bh;
>>  int ret = -EINVAL;
>>  u64 devid;
>>  u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  goto error;
>>  }
>>  
>> -if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +sb_bh = btrfs_read_dev_super(bdev);
>> +if (IS_ERR(sb_bh)) {
>> +ret = PTR_ERR(sb_bh);
>>  goto error_bdev_put;
>> +}
>> +disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>  
>>  devid = btrfs_stack_device_id(&disk_super->dev_item);
>>  transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
>> flags, void *holder,
>>  if (!ret && fs_devices_ret)
>>  (*fs_devices_ret)->total_devices = total_devices;
>>  
>> -btrfs_release_disk_super(page);
>> +brelse(sb_bh);
>>  
>>  error_bdev_put:
>>  blkdev_put(bdev, flags);
>>
> 


-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Nikolay Borisov



On  8.12.2017 12:33, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:40 PM, Nikolay Borisov wrote:
>>
>>
>> On  8.12.2017 09:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>>   +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>     /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>
>> This patch prompts another question: why do we have a page-based and a
>> bufferhead-based interface to reading the super block?
> 
>  Right. we need to know that. Sorry I just saw this.

FWIW unless we explicitly need a per-block state tracking (which I don't
think we do) we should ideally switch to page-based mechanism. Buffer
heads are considered deprecated. Also the only reason why we do have
btrfsic_submit_bh is for the superblock bufer head. So potentially by
removing the only requirement in the kernel where bh are used we can
simplify the btrfsic code as well . Just something to keep in mind.

> 
>  I have a very old patch to converge them and clean this up, but haven't
>  sent it because there are some missing information on why it ended up
>  like that in the first place.
> 
> Thanks, Anand
> 
> 
>> I did prototype
>> switching the bufferheads to page based but the resulting code wasn't
>> any cleaner. I believe there is also open the question what happens when
>> btrfs is run on a 64k page machine. I.e. we are going to read a single
>> page and the sb is going to be in the first 4k but what about the rest
>> 60, they could potentially contain other metadata. The page will have to
>> be freed asap so as not to peg the neighboring metadata?
> 
> 
>>
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info a

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 19:48, Anand Jain wrote:
> 
> 
> On 12/08/2017 07:01 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 18:39, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:


 On 2017年12月08日 15:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail
> mount,
> this is an experimental patch which thinks why not go and read backup
> copy.

 Just curious about in which real world case that backup super block can
 help.
 At least from what I see in mail list, only few cases where backup
 super
 helps.
>>>
>>>   Theoretical design helps. I ended up in this situation though. And
>>>   ext4 has -o sb flag to manage this part. When we can expect EIO on
>>>   any part of the disk block why not on the LBA which contains primary
>>>   SB. And should we fail the mount for that reason ? No.
>>
>> And how do you ensure it's a btrfs?
> 
>  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>  using /etc/fstab to mount, and it did lead to btrfs, is that your
>  concern that it shouldn't have been. That looked surprising to me as
>  well, but then problem points at wipefs instead.

It's closer, but still doesn't reach the point.
It can be a mkfs, other than wipefs.

It's can be another case like:

One user used btrfs for a while
But bugs made him/her unhappy, and he/she turned to use xfs (whatever
the fs is) instead.

While he/she forgot to change its fstab, and rebooted the system.

And suddenly, it's btrfs again!
What a surprise.


> 
>>
>>>
 Despite that self super heal seems good, although I agree with
 David, we
 need a weaker but necessary check (magic and fsid from primary super?)
 to ensure it's a valid btrfs before we use the backup supers.
>>>
>>>   Of course, we already have btrfs_check_super_valid() to verify the SB,
>>>   I don't understand why - how do we verify the SB should be the
>>> point of
>>>   concern here, at all.
>>
>> The point here is, to distinguish an old and invalid btrfs (some other
>> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
>> primary fs.
> 
>  Ok. When you check all the SBs you would pick the one which has passed
>  btrfs_check_super_valid() and has highest generation. Did I ans your
>  concern ? If a SB does not pass btrfs_check_super_valid() its not a
>  valid btrfs SB at all.

No, not really.

What if the first SB is a XFS one or even a fs you didn't ever hear?

Skip it and use the 2nd one?
This filesystem is not even btrfs anymore.

Thanks,
Qu

> 
> 
>> This the main concern here.
>> The difference between recovery and recognizing a bad btrfs is quite
>> important.
> 
>  btrfs_check_super_valid() is already doing that right ? The point here
>  is, should we use the backup SB when btrfs_check_super_valid() fails on
>  primary SB.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
 Thanks,
 Qu


>
> Signed-off-by: Anand Jain 
> ---
>    fs/btrfs/disk-io.c |  8 +++-
>    fs/btrfs/volumes.c | 10 +++---
>    2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
> block_device *bdev)
>     * So, we need to add a special mount option to scan for
>     * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>     */
> -    for (i = 0; i < 1; i++) {
> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>    ret = btrfs_read_dev_one_super(bdev, i, &bh);
>    if (ret)
>    continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
> btrfs_fs_info *fs_info)
>    ret = -EINVAL;
>    }
>    +#if 0
> +    /*
> + * Need a way to check for any copy of SB, as its not a
> + * strong check, just ignore this for now.
> + */
>    if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>    btrfs_err(fs_info, "super offset mismatch %llu != %u",
>  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>    ret = -EINVAL;
>    }
> +#endif
>      /*
>     * Obvious sys_chunk_array corruptions, it must hold at least
> one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
> fmode_t flags, void *holder,
>    {
>    struct btrfs_super_block *disk_super;
>    struct block_device *bdev;
> -    struct page *page;
> +    struct buffer_head *sb_bh;
>    int ret = -EINVAL;

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 07:01 PM, Qu Wenruo wrote:



On 2017年12月08日 18:39, Anand Jain wrote:



On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.


  Theoretical design helps. I ended up in this situation though. And
  ext4 has -o sb flag to manage this part. When we can expect EIO on
  any part of the disk block why not on the LBA which contains primary
  SB. And should we fail the mount for that reason ? No.


And how do you ensure it's a btrfs?


 Hmm. You mean outside of btrfs ? I did experiment with wipe and then
 using /etc/fstab to mount, and it did lead to btrfs, is that your
 concern that it shouldn't have been. That looked surprising to me as
 well, but then problem points at wipefs instead.






Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


  Of course, we already have btrfs_check_super_valid() to verify the SB,
  I don't understand why - how do we verify the SB should be the point of
  concern here, at all.


The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.


 Ok. When you check all the SBs you would pick the one which has passed
 btrfs_check_super_valid() and has highest generation. Did I ans your
 concern ? If a SB does not pass btrfs_check_super_valid() its not a
 valid btrfs SB at all.



This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.


 btrfs_check_super_valid() is already doing that right ? The point here
 is, should we use the backup SB when btrfs_check_super_valid() fails on
 primary SB.

Thanks, Anand


Thanks,
Qu



Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
   fs/btrfs/disk-io.c |  8 +++-
   fs/btrfs/volumes.c | 10 +++---
   2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
block_device *bdev)
    * So, we need to add a special mount option to scan for
    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
    */
-    for (i = 0; i < 1; i++) {
+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
   ret = btrfs_read_dev_one_super(bdev, i, &bh);
   if (ret)
   continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
btrfs_fs_info *fs_info)
   ret = -EINVAL;
   }
   +#if 0
+    /*
+ * Need a way to check for any copy of SB, as its not a
+ * strong check, just ignore this for now.
+ */
   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
   btrfs_err(fs_info, "super offset mismatch %llu != %u",
     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
   ret = -EINVAL;
   }
+#endif
     /*
    * Obvious sys_chunk_array corruptions, it must hold at least
one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   {
   struct btrfs_super_block *disk_super;
   struct block_device *bdev;
-    struct page *page;
+    struct buffer_head *sb_bh;
   int ret = -EINVAL;
   u64 devid;
   u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   goto error;
   }
   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
+    sb_bh = btrfs_read_dev_super(bdev);
+    if (IS_ERR(sb_bh)) {
+    ret = PTR_ERR(sb_bh);
   goto error_bdev_put;
+    }
+    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
     devid = btrfs_stack_device_id(&disk_super->dev_item);
   transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
fmode_t flags, void *holder,
   if (!ret && fs_devices_ret)
   (*fs_devices_ret)->total_devices = total_devices;
   -    btrfs_release_disk_super(page);
+    brelse(sb_bh);
     error_bdev_put:
   blkdev_put(bdev, flags);




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linu

[bug report] btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA

2017-12-08 Thread Dan Carpenter

Hello Anand Jain,

The patch 2dcfcdc43524: "btrfs: cleanup device states define
BTRFS_DEV_STATE_IN_FS_METADATA" from Dec 4, 2017, leads to the
following static checker warning:

fs/btrfs/super.c:2007 btrfs_calc_avail_data_space()
warn: test_bit() takes a bit number

fs/btrfs/super.c
  2004  
  2005  rcu_read_lock();
  2006  list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) 
{
  2007  if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
  ^^
This BTRFS_DEV_STATE_IN_FS_METADATA define is BIT(1) but for test_bit()
we should just take 1.  In other words we are doing double BIT(BIT(1)).

  2008  &device->dev_state) ||
  2009  !device->bdev ||
  2010  test_bit(BTRFS_DEV_STATE_REPLACE_TGT, 
&device->dev_state))
  2011  continue;
  2012  
  2013  if (i >= nr_devices)
  2014  break;
  2015  

I get a bunch of these warnings.

drivers/md/md-bitmap.c:1993 bitmap_copy_from_slot() warn: 
'bitmap_file_test_bit(bitmap, block)' returns positive and negative
[ I threw this one is as a bonus.  The warning is because are we
  handling the -EINVAL return from bitmap_file_test_bit() correctly?
  I have no idea. ]

fs/btrfs/super.c:2007 btrfs_calc_avail_data_space() warn: test_bit() takes a 
bit number
fs/btrfs/super.c:2010 btrfs_calc_avail_data_space() warn: test_bit() takes a 
bit number
fs/btrfs/super.c:2315 btrfs_show_devname() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3351 write_dev_flush() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3361 wait_dev_flush() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3364 wait_dev_flush() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3392 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3396 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3397 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3406 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3412 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3413 barrier_all_devices() warn: test_bit() takes a bit 
number
fs/btrfs/disk-io.c:3510 write_all_supers() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3511 write_all_supers() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3550 write_all_supers() warn: test_bit() takes a bit number
fs/btrfs/disk-io.c:3551 write_all_supers() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:695 btrfs_open_one_device() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:699 btrfs_open_one_device() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:701 btrfs_open_one_device() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:709 btrfs_open_one_device() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:713 btrfs_open_one_device() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:829 device_list_add() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:831 device_list_add() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:913 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:915 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:935 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:945 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:947 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:948 btrfs_close_extra_devices() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:980 btrfs_close_bdev() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:997 btrfs_prepare_close_one_device() warn: test_bit() takes 
a bit number
fs/btrfs/volumes.c:1003 btrfs_prepare_close_one_device() warn: test_bit() takes 
a bit number
fs/btrfs/volumes.c:1258 btrfs_account_dev_extents_size() warn: test_bit() takes 
a bit number
fs/btrfs/volumes.c:1437 find_free_dev_extent_start() warn: test_bit() takes a 
bit number
fs/btrfs/volumes.c:1644 btrfs_alloc_dev_extent() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:1645 btrfs_alloc_dev_extent() warn: test_bit() takes a bit 
number
fs/btrfs/volumes.c:1891 btrfs_find_next_active_device() warn: test_bit() takes 
a bit number
fs/btrfs/volumes.c:1953 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:1958 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:1964 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:1986 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:2006 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c:2026 btrfs_rm_device() warn: test_bit() takes a bit number
fs/btrfs/volumes.c

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 18:39, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 15:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>
>> Just curious about in which real world case that backup super block can
>> help.
>> At least from what I see in mail list, only few cases where backup super
>> helps.
> 
>  Theoretical design helps. I ended up in this situation though. And
>  ext4 has -o sb flag to manage this part. When we can expect EIO on
>  any part of the disk block why not on the LBA which contains primary
>  SB. And should we fail the mount for that reason ? No.

And how do you ensure it's a btrfs?

> 
>> Despite that self super heal seems good, although I agree with David, we
>> need a weaker but necessary check (magic and fsid from primary super?)
>> to ensure it's a valid btrfs before we use the backup supers.
> 
>  Of course, we already have btrfs_check_super_valid() to verify the SB,
>  I don't understand why - how do we verify the SB should be the point of
>  concern here, at all.

The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.

This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++-
>>>   fs/btrfs/volumes.c | 10 +++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>    * So, we need to add a special mount option to scan for
>>>    * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>    */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>   if (ret)
>>>   continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>   ret = -EINVAL;
>>>   }
>>>   +#if 0
>>> +    /*
>>> + * Need a way to check for any copy of SB, as its not a
>>> + * strong check, just ignore this for now.
>>> + */
>>>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>     btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>   ret = -EINVAL;
>>>   }
>>> +#endif
>>>     /*
>>>    * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>   struct btrfs_super_block *disk_super;
>>>   struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>   int ret = -EINVAL;
>>>   u64 devid;
>>>   u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   goto error;
>>>   }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +    ret = PTR_ERR(sb_bh);
>>>   goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>     devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>   transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   if (!ret && fs_devices_ret)
>>>   (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>   blkdev_put(bdev, flags);
>>>
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 04:17 PM, Qu Wenruo wrote:



On 2017年12月08日 15:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.


Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.


 Theoretical design helps. I ended up in this situation though. And
 ext4 has -o sb flag to manage this part. When we can expect EIO on
 any part of the disk block why not on the LBA which contains primary
 SB. And should we fail the mount for that reason ? No.


Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.


 Of course, we already have btrfs_check_super_valid() to verify the SB,
 I don't understand why - how do we verify the SB should be the point of
 concern here, at all.

Thanks, Anand


Thanks,
Qu




Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);
+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Anand Jain




On 12/08/2017 04:40 PM, Nikolay Borisov wrote:



On  8.12.2017 09:57, Anand Jain wrote:

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c |  8 +++-
  fs/btrfs/volumes.c | 10 +++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b20c1f3563b..a791b8dfe8a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
block_device *bdev)
 * So, we need to add a special mount option to scan for
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
-   for (i = 0; i < 1; i++) {
+   for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfs_read_dev_one_super(bdev, i, &bh);
if (ret)
continue;
@@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
  
+#if 0

+   /*
+* Need a way to check for any copy of SB, as its not a
+* strong check, just ignore this for now.
+*/
if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
}
+#endif
  
  	/*

 * Obvious sys_chunk_array corruptions, it must hold at least one key
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fa2539a8493..f368db94d62b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  {
struct btrfs_super_block *disk_super;
struct block_device *bdev;
-   struct page *page;
+   struct buffer_head *sb_bh;
int ret = -EINVAL;
u64 devid;
u64 transid;
@@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error;
}
  
-	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))

+   sb_bh = btrfs_read_dev_super(bdev);


This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block?


 Right. we need to know that. Sorry I just saw this.

 I have a very old patch to converge them and clean this up, but haven't
 sent it because there are some missing information on why it ended up
 like that in the first place.

Thanks, Anand



I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?






+   if (IS_ERR(sb_bh)) {
+   ret = PTR_ERR(sb_bh);
goto error_bdev_put;
+   }
+   disk_super = (struct btrfs_super_block *) sb_bh->b_data;
  
  	devid = btrfs_stack_device_id(&disk_super->dev_item);

transid = btrfs_super_generation(disk_super);
@@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
  
-	btrfs_release_disk_super(page);

+   brelse(sb_bh);
  
  error_bdev_put:

blkdev_put(bdev, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Delivery Failure

2017-12-08 Thread admin


-
The message you sent to vhakonigroup.co.za/mashudu was rejected because it 
would exceed the quota for the mailbox.

The subject of the message follows: 
Re:Start to earn 15k daily.So easy money

-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Question] Two variants of SB reads can we move into bio read instead ?

2017-12-08 Thread Anand Jain



David,

 There are two variants of SB read, one using the device cache [1]
 and the other buffer head [2].

 [1] btrfs_read_disk_super()
 [2] btrfs_read_dev_super()

 Patch, 6f60cbd3ae442cb35861bb522f388db123d42ec1
  (btrfs: access superblock via pagecache in scan_one_device)
 Embraced device caches to avoid blocksize set and thus avoid device
 drop cache. But however its in the context of starting up with a new
 mount and in practice, would it really matter if the device cache is
 dropped in the context of mount, we any way drop it when the device
 is successfully mounted though. I don't understand the actual problem
 here.

 Further [2] is still using buffer head, which works very well for us
 in this context, any idea if there is any suggestion to move it to
 newer bio read instead ?

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Byungchul Park


On 12/8/2017 4:25 PM, Dave Chinner wrote:

On Fri, Dec 08, 2017 at 01:45:52PM +0900, Byungchul Park wrote:

On Fri, Dec 08, 2017 at 09:22:16AM +1100, Dave Chinner wrote:

On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote:

On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote:

Unfortunately for you, I don't find arguments along the lines of
"lockdep will save us" at all convincing.  lockdep already throws
too many false positives to be useful as a tool that reliably and
accurately points out rare, exciting, complex, intricate locking
problems.


But it does reliably and accurately point out "dude, you forgot to take
the lock".  It's caught a number of real problems in my own testing that
you never got to see.


The problem is that if it has too many false positives --- and it's
gotten *way* worse with the completion callback "feature", people will
just stop using Lockdep as being too annyoing and a waste of developer
time when trying to figure what is a legitimate locking bug versus
lockdep getting confused.

I can't even disable the new Lockdep feature which is throwing
lots of new false positives --- it's just all or nothing.

Dave has just said he's already stopped using Lockdep, as a result.


This is compeltely OT, but FYI I stopped using lockdep a long time
ago.  We've spend orders of magnitude more time and effort to shut
up lockdep false positives in the XFS code than we ever have on
locking problems that lockdep has uncovered. And still lockdep
throws too many false positives on XFS workloads to be useful to me.

But it's more than that: I understand just how much lockdep *doesn't
check* and that means *I know I can't rely on lockdep* for potential
deadlock detection. e.g.  it doesn't cover semaphores, which means


Hello,

I'm careful in saying the following since you seem to feel not good at
crossrelease and even lockdep. Now that cross-release has been
introduced, semaphores can be covered as you might know. Actually, all
general waiters can.


And all it will do is create a whole bunch more work for us XFS guys
to shut up all the the false positive crap that falls out from it
because the locking model we have is far more complex than any of
the lockdep developers thought was necessary to support, just like
happened with the XFS inode annotations all those years ago.

e.g. nobody has ever bothered to ask us what is needed to describe
XFS's semaphore locking model.  If you did that, you'd know that we
nest *thousands* of locked semaphores in compeltely random lock
order during metadata buffer writeback. And that this lock order
does not reflect the actual locking order rules we have for locking
buffers during transactions.

Oh, and you'd also know that a semaphore's lock order and context
can change multiple times during the life time of the buffer.  Say
we free a block and the reallocate it as something else before it is
reclaimed - that buffer now might have a different lock order. Or
maybe we promote a buffer to be a root btree block as a result of a
join - it's now the first buffer in a lock run, rather than a child.
Or we split a tree, and the root is now a node and so no longer is
the first buffer in a lock run. Or that we walk sideways along the
leaf nodes siblings during searches.  IOWs, there is no well defined
static lock ordering at all for buffers - and therefore semaphores -
in XFS at all.

And knowing that, you wouldn't simply mention that lockdep can
support semaphores now as though that is necessary to "make it work"
for XFS.  It's going to be much simpler for us to just turn off
lockdep and ignore whatever crap it sends our way than it is to
spend unplanned weeks of our time to try to make lockdep sorta work
again. Sure, we might get there in the end, but it's likely to take
months, if not years like it did with the XFS inode annotations.


it has zero coverage of the entire XFS metadata buffer subsystem and
the complex locking orders we have for metadata updates.

Put simply: lockdep doesn't provide me with any benefit, so I don't
use it...


Sad..


I don't think you understand. I'll try to explain.

The lockdep infrastructure by itself doesn't make lockdep a useful
tool - it mostly generates false positives because it has no
concept of locking models that don't match it's internal tracking
assumptions and/or limitations.

That means if we can't suppress the false positives, then lockdep is
going to be too noisy to find real problems.  It's taken the XFS
developers months of work over the past 7-8 years to suppress all
the *common* false positives that lockdep throws on XFS. And despite
all that work, there's still too many false positives occuring
because we can't easily suppress them with annotations. IOWs, the
signal to noise ratio is still too low for lockdep to find real
problems.

That's why lockdep isn't useful to me - the noise floor is too high,
and the effort to lower the noise floor further is too great.

This is important, because cross-r

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Nikolay Borisov



On  8.12.2017 09:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c |  8 +++-
>  fs/btrfs/volumes.c | 10 +++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
> block_device *bdev)
>* So, we need to add a special mount option to scan for
>* later supers, using BTRFS_SUPER_MIRROR_MAX instead
>*/
> - for (i = 0; i < 1; i++) {
> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>   if (ret)
>   continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   ret = -EINVAL;
>   }
>  
> +#if 0
> + /*
> +  * Need a way to check for any copy of SB, as its not a
> +  * strong check, just ignore this for now.
> +  */
>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>   ret = -EINVAL;
>   }
> +#endif
>  
>   /*
>* Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>  {
>   struct btrfs_super_block *disk_super;
>   struct block_device *bdev;
> - struct page *page;
> + struct buffer_head *sb_bh;
>   int ret = -EINVAL;
>   u64 devid;
>   u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   goto error;
>   }
>  
> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> + sb_bh = btrfs_read_dev_super(bdev);

This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block? I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?


> + if (IS_ERR(sb_bh)) {
> + ret = PTR_ERR(sb_bh);
>   goto error_bdev_put;
> + }
> + disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>   transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (!ret && fs_devices_ret)
>   (*fs_devices_ret)->total_devices = total_devices;
>  
> - btrfs_release_disk_super(page);
> + brelse(sb_bh);
>  
>  error_bdev_put:
>   blkdev_put(bdev, flags);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Assalamu`Alaikum.

2017-12-08 Thread Mohammad Ouattara

Greetings from Dr. mohammad ouattara.

Assalamu`Alaikum.

My Name is Dr. mohammad ouattara, I am a banker by profession. I'm
from Ouagadougou, Burkina Faso, West Africa. My reason for contacting
you is to transfer an abandoned $10.6M to your account.

The owner of this fund died since 2004 with his Next Of Kin. I want to
present you to the bank as the Next of Kin/beneficiary of this fund.

Further details of the transaction shall be forwarded to you as soon
as I receive your return mail indicating your interest.

Have a great day,
Dr. mohammad ouattara.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] btrfs: self heal from SB fail

2017-12-08 Thread Qu Wenruo



On 2017年12月08日 15:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.

Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.

Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.

Thanks,
Qu


> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c |  8 +++-
>  fs/btrfs/volumes.c | 10 +++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
> block_device *bdev)
>* So, we need to add a special mount option to scan for
>* later supers, using BTRFS_SUPER_MIRROR_MAX instead
>*/
> - for (i = 0; i < 1; i++) {
> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   ret = btrfs_read_dev_one_super(bdev, i, &bh);
>   if (ret)
>   continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   ret = -EINVAL;
>   }
>  
> +#if 0
> + /*
> +  * Need a way to check for any copy of SB, as its not a
> +  * strong check, just ignore this for now.
> +  */
>   if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>   btrfs_err(fs_info, "super offset mismatch %llu != %u",
> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>   ret = -EINVAL;
>   }
> +#endif
>  
>   /*
>* Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>  {
>   struct btrfs_super_block *disk_super;
>   struct block_device *bdev;
> - struct page *page;
> + struct buffer_head *sb_bh;
>   int ret = -EINVAL;
>   u64 devid;
>   u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   goto error;
>   }
>  
> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> + sb_bh = btrfs_read_dev_super(bdev);
> + if (IS_ERR(sb_bh)) {
> + ret = PTR_ERR(sb_bh);
>   goto error_bdev_put;
> + }
> + disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>   transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (!ret && fs_devices_ret)
>   (*fs_devices_ret)->total_devices = total_devices;
>  
> - btrfs_release_disk_super(page);
> + brelse(sb_bh);
>  
>  error_bdev_put:
>   blkdev_put(bdev, flags);
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 5/5] btrfs: Greatly simplify btrfs_read_dev_super

2017-12-08 Thread Anand Jain




On 12/07/2017 11:07 PM, Anand Jain wrote:



On 12/07/2017 12:24 AM, David Sterba wrote:

On Mon, Dec 04, 2017 at 06:20:09PM +0200, Nikolay Borisov wrote:

I don't understand what problem *should* be solved here...


Without any further checks and validation, we cannot simply iterate 
over

all superblocks and try to mount anything that looks ok. Even if the
offsets exist on the block device.  The fsid should be at least 
checked,

but that's still not enough to ensure the 1st copy is what we want to
mount.


I'm more inclined to agree with Anand here, that if the users wants to
mount with -t btrfs then the filesystem should assume it's correct to
use any of the additional superblocks.


If and only if the additional superblocks are valid. And if we can't
read the primary superblock, we don't have enough information to
establish the validation.  EIO?  We don't have anything.  CRC mismatch?
Can't trust the whole data.  We need FSID and total_size at least. Other
actions are limited from inside kernel.


 ext4 has mount option -o sb= to specify the copy num to use.

 Not sure if I have dug enough but as of now I don't find any pitfall.
 Only funny thing is you can mount a device even after wipefs -a and
 recover its primary SB, to which my view is that the problem is at
 the wipefs -a end, not able to wipe all SB copies.

 I sent out an experimental RFC patch here:
   [PATCH RFC] btrfs: self heal from SB fail

 Pls try out, any problem that you could think off by this approach.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

44 matches

Mail list logo