Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 22:02, Anand Jain wrote: > > > On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote: >> On 2017-12-08 02:57, Anand Jain wrote: >>> -EXPERIMENTAL- >>> As of now when primary SB fails we won't self heal and would fail mount, >>> this is an experimental patch which thinks why not go and read backup >>> copy. >> I like the concept, and actually think this should be default behavior >> on a filesystem that's already mounted (we fix other errors, why not >> SB's), > > >> but I don't think it should be default behavior at mount time for the >> reasons Qu has outlined (picking up old BTRFS SB's after reformatting >> is bad). > > If the primary SB is not btrfs, then unless forced with -t btrfs > option, this mount won't be in btrfs at all. That means it goes to > the respective FS by default. > > If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs > is not overwriting all the copies of SB then its a mkfs.btrfs bug. Nope. If the new btrfs is a smaller one than original btrfs, the 2nd super can still be there. (AFAIK all these examples have been given by David for several times) Thanks, Qu > > Which means there is already mount -t option to redirect to the FS > module that is needed by the user, when primary SB is corrupted. > We should use this feature. > > Thanks, Anand > > >> However, I do think it's useful to be able to ask for this behavior on >> mount, so that you don't need to fight with the programs to get a >> filesystem to mount when the first SB is missing (perhaps add a >> 'usebackupsb' option to mirror 'usebackuproot'?). >>> >>> Signed-off-by: Anand Jain >>> --- >>> fs/btrfs/disk-io.c | 8 +++- >>> fs/btrfs/volumes.c | 10 +++--- >>> 2 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 9b20c1f3563b..a791b8dfe8a8 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct >>> block_device *bdev) >>> * So, we need to add a special mount option to scan for >>> * later supers, using BTRFS_SUPER_MIRROR_MAX instead >>> */ >>> - for (i = 0; i < 1; i++) { >>> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >>> ret = btrfs_read_dev_one_super(bdev, i, &bh); >>> if (ret) >>> continue; >>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct >>> btrfs_fs_info *fs_info) >>> ret = -EINVAL; >>> } >>> +#if 0 >>> + /* >>> + * Need a way to check for any copy of SB, as its not a >>> + * strong check, just ignore this for now. >>> + */ >>> if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { >>> btrfs_err(fs_info, "super offset mismatch %llu != %u", >>> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); >>> ret = -EINVAL; >>> } >>> +#endif >>> /* >>> * Obvious sys_chunk_array corruptions, it must hold at least >>> one key >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index 9fa2539a8493..f368db94d62b 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> { >>> struct btrfs_super_block *disk_super; >>> struct block_device *bdev; >>> - struct page *page; >>> + struct buffer_head *sb_bh; >>> int ret = -EINVAL; >>> u64 devid; >>> u64 transid; >>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> goto error; >>> } >>> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) >>> + sb_bh = btrfs_read_dev_super(bdev); >>> + if (IS_ERR(sb_bh)) { >>> + ret = PTR_ERR(sb_bh); >>> goto error_bdev_put; >>> + } >>> + disk_super = (struct btrfs_super_block *) sb_bh->b_data; >>> devid = btrfs_stack_device_id(&disk_super->dev_item); >>> transid = btrfs_super_generation(disk_super); >>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> if (!ret && fs_devices_ret) >>> (*fs_devices_ret)->total_devices = total_devices; >>> - btrfs_release_disk_super(page); >>> + brelse(sb_bh); >>> error_bdev_put: >>> blkdev_put(bdev, flags); >>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: OpenPGP digital signature
Re: [PATCH] Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io
(Add Jérôme Carretero.) Thanks, -liubo On Fri, Dec 08, 2017 at 04:02:35PM -0700, Liu Bo wrote: > We're not allowed to take any new bios to rbio->bio_list in > rbio_orig_end_io(), otherwise we may get merged with more bios and > rbio->bio_list is not empty. > > This should only happens in error-out cases, the normal path of > recover and full stripe write have already set RBIO_RMW_LOCKED_BIT to > disable merge before doing IO. > > Reported-by: Jérôme Carretero > Signed-off-by: Liu Bo > --- > fs/btrfs/raid56.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c > index 5aa9d22..127c782 100644 > --- a/fs/btrfs/raid56.c > +++ b/fs/btrfs/raid56.c > @@ -859,12 +859,23 @@ static void free_raid_bio(struct btrfs_raid_bio *rbio) > */ > static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err) > { > - struct bio *cur = bio_list_get(&rbio->bio_list); > + struct bio *cur; > struct bio *next; > > + /* > + * We're not allowed to take any new bios to rbio->bio_list > + * from now on, otherwise we may get merged with more bios and > + * rbio->bio_list is not empty. > + */ > + spin_lock(&rbio->bio_list_lock); > + set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags); > + spin_unlock(&rbio->bio_list_lock); > + > if (rbio->generic_bio_cnt) > btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt); > > + cur = bio_list_get(&rbio->bio_list); > + > free_raid_bio(rbio); > > while (cur) { > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io
We're not allowed to take any new bios to rbio->bio_list in rbio_orig_end_io(), otherwise we may get merged with more bios and rbio->bio_list is not empty. This should only happens in error-out cases, the normal path of recover and full stripe write have already set RBIO_RMW_LOCKED_BIT to disable merge before doing IO. Reported-by: Jérôme Carretero Signed-off-by: Liu Bo --- fs/btrfs/raid56.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 5aa9d22..127c782 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -859,12 +859,23 @@ static void free_raid_bio(struct btrfs_raid_bio *rbio) */ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err) { - struct bio *cur = bio_list_get(&rbio->bio_list); + struct bio *cur; struct bio *next; + /* +* We're not allowed to take any new bios to rbio->bio_list +* from now on, otherwise we may get merged with more bios and +* rbio->bio_list is not empty. +*/ + spin_lock(&rbio->bio_list_lock); + set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags); + spin_unlock(&rbio->bio_list_lock); + if (rbio->generic_bio_cnt) btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt); + cur = bio_list_get(&rbio->bio_list); + free_raid_bio(rbio); while (cur) { -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC] btrfs: self heal from SB fail
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn > Sent: Friday, 8 December 2017 11:51 PM > To: Anand Jain ; linux-btrfs@vger.kernel.org > Subject: Re: [PATCH RFC] btrfs: self heal from SB fail > > On 2017-12-08 02:57, Anand Jain wrote: > > -EXPERIMENTAL- > > As of now when primary SB fails we won't self heal and would fail > > mount, this is an experimental patch which thinks why not go and read > > backup copy. > I like the concept, and actually think this should be default behavior on a > filesystem that's already mounted (we fix other errors, why not SB's), but I > don't think it should be default behavior at mount time for the reasons Qu > has outlined (picking up old BTRFS SB's after reformatting is bad). However, > I > do think it's useful to be able to ask for this behavior on mount, so that you > don't need to fight with the programs to get a filesystem to mount when the > first SB is missing (perhaps add a 'usebackupsb' option to mirror > 'usebackuproot'?). I agree with this. The behaviour I'd like to see would be refusal to mount (without additional mount options) but also: print the needed info to the kernel log so the user can add the required mount option or read the wiki for more information, and print some diagnostic info on the primary + secondary super blocks. Paul.
Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection
On 12/08/2017 09:24 PM, Josef Bacik wrote: > On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote: >> On 12/06/2017 05:12 PM, Josef Bacik wrote: >>> Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro. I went >>> to >>> figure out why the compiler didn't catch it and it's because it was not used >>> anywhere. I had copied it from the trace blacklist code without >>> understanding >>> where it was used as cscope didn't find the original macro I was looking >>> for, so >>> I assumed it was some voodoo and left it in place. Turns out cscope failed >>> me >>> and I didn't need the macro at all, the trace blacklist thing I was looking >>> at >>> was for marking assembly functions as blacklisted and I have no intention of >>> marking assembly functions as error injectable at the moment. >>> >>> v7->v8: >>> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed. >> >> The series doesn't apply cleanly to the bpf-next tree, so one last respin >> with >> a rebase would unfortunately still be required, thanks! > > I've rebased and let it sit in my git tree to make sure kbuild test bot didn't > blow up, can you pull from > > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git > bpf-override-return > > or do you want me to repost the whole series? Thanks, Yeah, the patches would need to end up on netdev, so once kbuild bot went through fine after your rebase, please send the series. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray
On Thu, Dec 07, 2017 at 11:38:43AM +1100, Dave Chinner wrote: > > > cmpxchg is for replacing a known object in a store - it's not really > > > intended for doing initial inserts after a lookup tells us there is > > > nothing in the store. The radix tree "insert only if empty" makes > > > sense here, because it naturally takes care of lookup/insert races > > > via the -EEXIST mechanism. > > > > > > I think that providing xa_store_excl() (which would return -EEXIST > > > if the entry is not empty) would be a better interface here, because > > > it matches the semantics of lookup cache population used all over > > > the kernel > > > > I'm not thrilled with xa_store_excl(), but I need to think about that > > a bit more. > > Not fussed about the name - I just think we need a function that > matches the insert semantics of the code I think I have something that works better for you than returning -EEXIST (because you don't actually want -EEXIST, you want -EAGAIN): /* insert the new inode */ - spin_lock(&pag->pag_ici_lock); - error = radix_tree_insert(&pag->pag_ici_root, agino, ip); - if (unlikely(error)) { - WARN_ON(error != -EEXIST); - XFS_STATS_INC(mp, xs_ig_dup); - error = -EAGAIN; - goto out_preload_end; - } - spin_unlock(&pag->pag_ici_lock); - radix_tree_preload_end(); + curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS); + error = __xa_race(curr, -EAGAIN); + if (error) + goto out_unlock; ... -out_preload_end: - spin_unlock(&pag->pag_ici_lock); - radix_tree_preload_end(); +out_unlock: + if (error == -EAGAIN) + XFS_STATS_INC(mp, xs_ig_dup); I've changed the behaviour slightly in that returning an -ENOMEM used to hit a WARN_ON, and I don't think that's the right response -- GFP_NOFS returning -ENOMEM probably gets you a nice warning already from the mm code. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Lockdep is less useful than it was
On Fri, Dec 08, 2017 at 10:14:38AM -0800, Matthew Wilcox wrote: > At the moment, the radix tree actively disables the RCU checking that > enabling lockdep would give us. It has to, because it has no idea what > lock protects any individual access to the radix tree. The XArray can > use the RCU checking because it knows that every reference is protected > by either the spinlock or the RCU lock. > > Dave was saying that he has a tree which has to be protected by a mutex > because of where it is in the locking hierarchy, and I was vigorously > declining his proposal of allowing him to skip taking the spinlock. Oh, I wasn't suggesting that you remove the internal tree locking because we need external locking. I was trying to point out that the internal locking doesn't remove the need for external locking, and that there are cases where smearing the internal lock outside the XA tree doesn't work, either. i.e. internal locking doesn't replace all the cases where external locking is required, and so it's less efficient than the existing radix tree code. What I was questioning was the value of replacing the radix tree code with a less efficient structure just to add lockdep validation to a tree that doesn't actually need any extra locking validation... Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray
On Fri, Dec 08, 2017 at 12:35:07PM -0500, Alan Stern wrote: > On Fri, 8 Dec 2017, Byungchul Park wrote: > > > I'm sorry to hear that.. If I were you, I would also get > > annoyed. And.. thanks for explanation. > > > > But, I think assigning lock classes properly and checking > > relationship of the classes to detect deadlocks is reasonable. > > > > In my opinion about the common lockdep stuff, there are 2 > > problems on it. > > > > 1) Firstly, it's hard to assign lock classes *properly*. By > > default, it relies on the caller site of lockdep_init_map(), > > but we need to assign another class manually, where ordering > > rules are complicated so cannot rely on the caller site. That > > *only* can be done by experts of the subsystem. Sure, but that's not the issue here. The issue here is the lack of communication with subsystem experts and that the annotation complexity warnings given immediately by the subsystem experts were completely ignored... > > I think if they want to get benifit from lockdep, they have no > > choice but to assign classes manually with the domain knowledge, > > or use *lockdep_set_novalidate_class()* to invalidate locks > > making the developers annoyed and not want to use the checking > > for them. > > Lockdep's no_validate class is used when the locking patterns are too > complicated for lockdep to understand. Basically, it tells lockdep to > ignore those locks. Let me just point out two things here: 1. Using lockdep_set_novalidate_class() for anything other than device->mutex will throw checkpatch warnings. Nice. (*) 2. lockdep_set_novalidate_class() is completely undocumented - it's the first I've ever heard of this functionality. i.e. nobody has ever told us there is a mechanism to turn off validation of an object; we've *always* been told to "change your code and/or fix your annotations" when discussing lockdep deficiencies. (**) > The device core uses that class. The tree of struct devices, each with > its own lock, gets used in many different and complicated ways. > Lockdep can't understand this -- it doesn't have the ability to > represent an arbitrarily deep hierarchical tree of locks -- so we tell ^^ That largely describes the in-memory structure of XFS, except we have a forest of lock trees, not just one > it to ignore the device locks. > > It sounds like XFS may need to do the same thing with its semaphores. Who-ever adds semaphore checking to lockdep can add those annotations. The externalisation of the development cost of new lockdep functionality is one of the problems here. -Dave. (*) checkpatch.pl is considered mostly harmful round here, too, but that's another rant (**) the frequent occurrence of "core code/devs aren't held to the same rules/standard as everyone else" is another rant I have stored up for a rainy day. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection
On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote: > On 12/06/2017 05:12 PM, Josef Bacik wrote: > > Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro. I went > > to > > figure out why the compiler didn't catch it and it's because it was not used > > anywhere. I had copied it from the trace blacklist code without > > understanding > > where it was used as cscope didn't find the original macro I was looking > > for, so > > I assumed it was some voodoo and left it in place. Turns out cscope failed > > me > > and I didn't need the macro at all, the trace blacklist thing I was looking > > at > > was for marking assembly functions as blacklisted and I have no intention of > > marking assembly functions as error injectable at the moment. > > > > v7->v8: > > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed. > > The series doesn't apply cleanly to the bpf-next tree, so one last respin with > a rebase would unfortunately still be required, thanks! I've rebased and let it sit in my git tree to make sure kbuild test bot didn't blow up, can you pull from git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git bpf-override-return or do you want me to repost the whole series? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Lockdep is less useful than it was
On Fri, Dec 08, 2017 at 10:27:17AM -0500, Theodore Ts'o wrote: > So if you are adding complexity to the kernel with the argument, > "lockdep will save us", I'm with Dave --- it's just not a believable > argument. I think that's a gross misrepresentation of what I'm doing. At the moment, the radix tree actively disables the RCU checking that enabling lockdep would give us. It has to, because it has no idea what lock protects any individual access to the radix tree. The XArray can use the RCU checking because it knows that every reference is protected by either the spinlock or the RCU lock. Dave was saying that he has a tree which has to be protected by a mutex because of where it is in the locking hierarchy, and I was vigorously declining his proposal of allowing him to skip taking the spinlock. And yes, we have bugs today that I assume we only stumble across every few billion years (or only on alpha, or only if our compiler gets more aggressive) because we have missing rcu_dereference annotations. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray
On Fri, 8 Dec 2017, Byungchul Park wrote: > I'm sorry to hear that.. If I were you, I would also get > annoyed. And.. thanks for explanation. > > But, I think assigning lock classes properly and checking > relationship of the classes to detect deadlocks is reasonable. > > In my opinion about the common lockdep stuff, there are 2 > problems on it. > > 1) Firstly, it's hard to assign lock classes *properly*. By > default, it relies on the caller site of lockdep_init_map(), > but we need to assign another class manually, where ordering > rules are complicated so cannot rely on the caller site. That > *only* can be done by experts of the subsystem. > > I think if they want to get benifit from lockdep, they have no > choice but to assign classes manually with the domain knowledge, > or use *lockdep_set_novalidate_class()* to invalidate locks > making the developers annoyed and not want to use the checking > for them. Lockdep's no_validate class is used when the locking patterns are too complicated for lockdep to understand. Basically, it tells lockdep to ignore those locks. The device core uses that class. The tree of struct devices, each with its own lock, gets used in many different and complicated ways. Lockdep can't understand this -- it doesn't have the ability to represent an arbitrarily deep hierarchical tree of locks -- so we tell it to ignore the device locks. It sounds like XFS may need to do the same thing with its semaphores. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection
On 12/06/2017 05:12 PM, Josef Bacik wrote: > Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro. I went to > figure out why the compiler didn't catch it and it's because it was not used > anywhere. I had copied it from the trace blacklist code without understanding > where it was used as cscope didn't find the original macro I was looking for, > so > I assumed it was some voodoo and left it in place. Turns out cscope failed me > and I didn't need the macro at all, the trace blacklist thing I was looking at > was for marking assembly functions as blacklisted and I have no intention of > marking assembly functions as error injectable at the moment. > > v7->v8: > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed. The series doesn't apply cleanly to the bpf-next tree, so one last respin with a rebase would unfortunately still be required, thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Lockdep is less useful than it was
On Thu, Dec 07, 2017 at 02:38:03PM -0800, Matthew Wilcox wrote: > I think it was a mistake to force these on for everybody; they have a > much higher false-positive rate than the rest of lockdep, so as you say > forcing them on leads to fewer people using *any* of lockdep. > > The bug you're hitting isn't Byungchul's fault; it's an annotation > problem. The same kind of annotation problem that we used to have with > dozens of other places in the kernel which are now fixed. The question is whose responsibility is it to annotate the code? On another thread it was suggested it was ext4's responsibility to add annotations to avoid the false positives --- never the mind the fact that every single file system is going to have add annotations. Also note that the documentation for how to add annotations is ***horrible***. It's mostly, "try to figure out how other people added magic cargo cult code which is not well defined (look at the definitions of lockdep_set_class, lockdep_set_class_and_name, lockdep_set_class_and_subclass, and lockdep_set_subclass, and weep) in other subsystems and hope and pray it works for you." And the explanation when there are failures, either false positives, or not, are completely opaque. For example: [ 16.190198] ext4lazyinit/648 is trying to acquire lock: [ 16.191201] ((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.}, at: [<8a1ebe9d>] wait_for_completion_io+0x12/0x20 Just try to tell me that: ((gendisk_completion)1 << part_shift(NUMA_NO_NODE)){+.+.} is human comprehensible with a straight face. And since the messages don't even include the subclass/class/name key annotations, as lockdep tries to handle things that are more and more complex, I'd argue it has already crossed the boundary where unless you are a lockdep developer, good luck trying to understand what is going on or how to add annotations. So if you are adding complexity to the kernel with the argument, "lockdep will save us", I'm with Dave --- it's just not a believable argument. Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Improve btrfs_search_slot description
Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.c | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 880f4f693263..1f001d31bda8 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2653,18 +2653,30 @@ int btrfs_find_item(struct btrfs_root *fs_root, struct btrfs_path *path, return 0; } -/* - * look for key in the tree. path is filled in with nodes along the way - * if key is found, we return zero and you can find the item in the leaf - * level of the path (level 0) +/* btrfs_search_slot - look for a key in a tree and perform necessary + * modifications to preserve tree invariants. + * + * @trans: Handle of transaction, used when modifying the tree + * @p: Holds all btree nodes along the search path + * @root: The root node of the tree + * @key: The key we are looking for + * @ins_len: Indicates purpose of search, for inserts it is 1, for + * deletions it's -1. 0 for plain searches + * @cow: boolean should CoW operations be performed. Must always be 1 + * when modifying the tree. + * + * If @ins_len > 0, nodes and leaves will be split as we walk down the tree. + * If @ins_len < 0, nodes will be merged as we walk down the tree (if possible) + * + * If @key is found, 0 is returned and you can find the item in the leaf level + * of the path (level 0) * - * If the key isn't found, the path points to the slot where it should - * be inserted, and 1 is returned. If there are other errors during the - * search a negative error number is returned. + * If @key isn't found, 1 is returned and the leaf level of the path (level 0) + * points to the slot where it should be inserted + * be inserted, and 1 is returned. * - * if ins_len > 0, nodes and leaves will be split as we walk down the - * tree. if ins_len < 0, nodes will be merged as we walk down the tree (if - * possible) + * If an error is encountered while searching the tree a negative error number + * is returned */ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, const struct btrfs_key *key, struct btrfs_path *p, -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Rename bin_search -> btrfs_bin_search
Currently there are 2 function doing binary search on btrfs nodes: bin_search and btrfs_bin_search. The latter being a simple wrapper for the former. So eliminate the wrapper and just rename bin_search to btrfs_bin_search. No functional changes Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 1e74cf826532..880f4f693263 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1807,8 +1807,8 @@ static noinline int generic_bin_search(struct extent_buffer *eb, * simple bin_search frontend that does the right thing for * leaves vs nodes */ -static int bin_search(struct extent_buffer *eb, const struct btrfs_key *key, - int level, int *slot) +int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, +int level, int *slot) { if (level == 0) return generic_bin_search(eb, @@ -1824,12 +1824,6 @@ static int bin_search(struct extent_buffer *eb, const struct btrfs_key *key, slot); } -int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, -int level, int *slot) -{ - return bin_search(eb, key, level, slot); -} - static void root_add_used(struct btrfs_root *root, u32 size) { spin_lock(&root->accounting_lock); @@ -2614,7 +2608,7 @@ static int key_search(struct extent_buffer *b, const struct btrfs_key *key, int level, int *prev_cmp, int *slot) { if (*prev_cmp != 0) { - *prev_cmp = bin_search(b, key, level, slot); + *prev_cmp = btrfs_bin_search(b, key, level, slot); return *prev_cmp; } @@ -5175,7 +5169,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key, while (1) { nritems = btrfs_header_nritems(cur); level = btrfs_header_level(cur); - sret = bin_search(cur, min_key, level, &slot); + sret = btrfs_bin_search(cur, min_key, level, &slot); /* at the lowest level, we're done, setup the path and exit */ if (level == path->lowest_level) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote: On 2017-12-08 02:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. I like the concept, and actually think this should be default behavior on a filesystem that's already mounted (we fix other errors, why not SB's), but I don't think it should be default behavior at mount time for the reasons Qu has outlined (picking up old BTRFS SB's after reformatting is bad). If the primary SB is not btrfs, then unless forced with -t btrfs option, this mount won't be in btrfs at all. That means it goes to the respective FS by default. If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs is not overwriting all the copies of SB then its a mkfs.btrfs bug. Which means there is already mount -t option to redirect to the FS module that is needed by the user, when primary SB is corrupted. We should use this feature. Thanks, Anand However, I do think it's useful to be able to ask for this behavior on mount, so that you don't need to fight with the programs to get a filesystem to mount when the first SB is missing (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?). Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs: sink extent_write_full_page tree argument
The tree argument passed to extent_write_full_page is referenced from the page being passed to the same function. Since we already have enough information to get the reference, remove the function parameter. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 5 ++--- fs/btrfs/extent_io.h | 3 +-- fs/btrfs/inode.c | 4 +--- 3 files changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c0b2bf65d6b0..6cd3da16f114 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4056,13 +4056,12 @@ static noinline void flush_write_bio(void *data) flush_epd_write_bio(epd); } -int extent_write_full_page(struct extent_io_tree *tree, struct page *page, - struct writeback_control *wbc) +int extent_write_full_page(struct page *page, struct writeback_control *wbc) { int ret; struct extent_page_data epd = { .bio = NULL, - .tree = tree, + .tree = &BTRFS_I(page->mapping->host)->io_tree, .extent_locked = 0, .sync_io = wbc->sync_mode == WB_SYNC_ALL, }; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index f2cbabb2306a..db2558b0cad4 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -403,8 +403,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, struct extent_state **cached_state); int extent_invalidatepage(struct extent_io_tree *tree, struct page *page, unsigned long offset); -int extent_write_full_page(struct extent_io_tree *tree, struct page *page, - struct writeback_control *wbc); +int extent_write_full_page(struct page *page, struct writeback_control *wbc); int extent_write_locked_range(struct inode *inode, u64 start, u64 end, int mode); int extent_writepages(struct extent_io_tree *tree, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e61d549bfc36..e79154f032e3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8862,7 +8862,6 @@ int btrfs_readpage(struct file *file, struct page *page) static int btrfs_writepage(struct page *page, struct writeback_control *wbc) { - struct extent_io_tree *tree; struct inode *inode = page->mapping->host; int ret; @@ -8881,8 +8880,7 @@ static int btrfs_writepage(struct page *page, struct writeback_control *wbc) redirty_page_for_writepage(wbc, page); return AOP_WRITEPAGE_ACTIVATE; } - tree = &BTRFS_I(page->mapping->host)->io_tree; - ret = extent_write_full_page(tree, page, wbc); + ret = extent_write_full_page(page, wbc); btrfs_add_delayed_iput(inode); return ret; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs: sink extent_write_locked_range tree parameter
This function is called only from submit_compressed_extents and the io tree being passed is always that of the inode. But we are also passing the inode, so just move getting the io tree pointer in extent_write_locked_range to simplify the signature. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 5 +++-- fs/btrfs/extent_io.h | 4 ++-- fs/btrfs/inode.c | 4 ++-- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 16ae832bdb5d..c0b2bf65d6b0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4073,11 +4073,12 @@ int extent_write_full_page(struct extent_io_tree *tree, struct page *page, return ret; } -int extent_write_locked_range(struct extent_io_tree *tree, struct inode *inode, - u64 start, u64 end, int mode) +int extent_write_locked_range(struct inode *inode, u64 start, u64 end, + int mode) { int ret = 0; struct address_space *mapping = inode->i_mapping; + struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; struct page *page; unsigned long nr_pages = (end - start + PAGE_SIZE) >> PAGE_SHIFT; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index c28f5ef88f42..f2cbabb2306a 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -405,8 +405,8 @@ int extent_invalidatepage(struct extent_io_tree *tree, struct page *page, unsigned long offset); int extent_write_full_page(struct extent_io_tree *tree, struct page *page, struct writeback_control *wbc); -int extent_write_locked_range(struct extent_io_tree *tree, struct inode *inode, - u64 start, u64 end, int mode); +int extent_write_locked_range(struct inode *inode, u64 start, u64 end, + int mode); int extent_writepages(struct extent_io_tree *tree, struct address_space *mapping, struct writeback_control *wbc); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 18c09cca3be7..e61d549bfc36 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -770,8 +770,8 @@ static noinline void submit_compressed_extents(struct inode *inode, * all those pages down to the drive. */ if (!page_started && !ret) - extent_write_locked_range(io_tree, - inode, async_extent->start, + extent_write_locked_range(inode, + async_extent->start, async_extent->start + async_extent->ram_size - 1, WB_SYNC_ALL); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/10] btrfs: avoid to access bvec table directly for a cloned bio
Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio) mentioned that for dio the submitted bio may be fast cloned, we can't access the bvec table directly for a cloned bio, so use bio_get_first_bvec() to retrieve the 1st bvec. Cc: Chris Mason Cc: Josef Bacik Cc: David Sterba Cc: linux-btrfs@vger.kernel.org Cc: Liu Bo Reviewed-by: Liu Bo Acked: David Sterba Signed-off-by: Ming Lei --- fs/btrfs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d28b66019d54..7f23c1993d24 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8013,6 +8013,7 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio, int segs; int ret; blk_status_t status; + struct bio_vec bvec; BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -8028,8 +8029,9 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio, } segs = bio_segments(failed_bio); + bio_get_first_bvec(failed_bio, &bvec); if (segs > 1 || - (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode))) + (bvec.bv_len > btrfs_inode_sectorsize(inode))) read_mode |= REQ_FAILFAST_DEV; isector = start - btrfs_io_bio(failed_bio)->logical; -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/10] btrfs: avoid access to .bi_vcnt directly
BTRFS uses bio->bi_vcnt to figure out page numbers, this way becomes not correct once we start to enable multipage bvec. So use bio_nr_pages() to do that instead. Cc: Chris Mason Cc: Josef Bacik Cc: David Sterba Cc: linux-btrfs@vger.kernel.org Acked-by: David Sterba Signed-off-by: Ming Lei --- fs/btrfs/extent_io.c | 9 + fs/btrfs/extent_io.h | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6f6669f93beb..27795bf2507c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2257,7 +2257,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, return 0; } -bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, +bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, struct io_failure_record *failrec, int failed_mirror) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); @@ -2281,7 +2281,7 @@ bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, * a) deliver good data to the caller * b) correct the bad sectors on disk */ - if (failed_bio->bi_vcnt > 1) { + if (failed_bio_pages > 1) { /* * to fulfill b), we need to know the exact failing sectors, as * we don't want to rewrite any more than the failed ones. thus, @@ -2374,6 +2374,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, int read_mode = 0; blk_status_t status; int ret; + unsigned failed_bio_pages = bio_nr_pages(failed_bio); BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -2381,13 +2382,13 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, if (ret) return ret; - if (!btrfs_check_repairable(inode, failed_bio, failrec, + if (!btrfs_check_repairable(inode, failed_bio_pages, failrec, failed_mirror)) { free_io_failure(failure_tree, tree, failrec); return -EIO; } - if (failed_bio->bi_vcnt > 1) + if (failed_bio_pages > 1) read_mode |= REQ_FAILFAST_DEV; phy_offset >>= inode->i_sb->s_blocksize_bits; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 93dcae0c3183..20854d63c75b 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -540,7 +540,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end); int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, struct io_failure_record **failrec_ret); -bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, +bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, struct io_failure_record *failrec, int fail_mirror); struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, struct io_failure_record *failrec, -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 21:05, Austin S. Hemmelgarn wrote: > On 2017-12-08 07:59, Qu Wenruo wrote: >> >> >> On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote: >>> On 2017-12-08 02:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. >>> I like the concept, and actually think this should be default behavior >>> on a filesystem that's already mounted (we fix other errors, why not >>> SB's), but I don't think it should be default behavior at mount time for >>> the reasons Qu has outlined (picking up old BTRFS SB's after >>> reformatting is bad). However, I do think it's useful to be able to ask >>> for this behavior on mount, so that you don't need to fight with the >>> programs to get a filesystem to mount when the first SB is missing >>> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?). >> >> Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount >> option than do it automatically. > I still think there should be an option to do automatic detection (it's > not particularly hard, and it's not likely to do the wrong thing in most > cases), but being able to explicitly specify a particular superblock for > a mount is definitely a step in the right direction. usebackupsuper <- Auto usebackupsuper=n <- Manual usebackupsuper=n,m <- Manual multi, in given order (The last one seems a little overkilled though) Thanks, Qu >> >> Thanks, >> Qu >> Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: OpenPGP digital signature
Re: [PATCH v5 0/3] Add cli and ioctl to deregister devices
On 2017-12-07 21:17, Duncan wrote: Anand Jain posted on Fri, 08 Dec 2017 08:51:43 +0800 as excerpted: On 12/07/2017 10:52 PM, Austin S. Hemmelgarn wrote: On 2017-12-07 09:36, Anand Jain wrote: Add ability to deregister a or all devices. I have named this sub cmd as deregister, but I am open to your suggestions. Being a bit picky here, but from the perspective of a native speaker of American English, I would say that 'deregister' sounds rather synthetic and somewhat harsh and alien. Given that, as odd as it sounds, I think 'ignore' might be a more user friendly name for the sub-command. It accurately describes what the command is doing (telling the kernel to ignore the device), and it's a lot less alien sounding than 'deregister'. If you're set on having it be based on the word 'register', I would suggest changing it to 'unregister', as I think that sounds more natural than 'deregister'. Additionally, if you do continue with 'deregister' or go with 'unregister' as the name though, I would suggest adding 'register' as a synonym for the 'scan' sub-command to keep things reasonably symmetrical. A look up on unregister lead me to use deregister as more appropriate. Anyway I won't bother much, I will be go be suggestions, and how about unscan, since scan is already there. OR how about purge. This is a bikeshed I think I have a beautiful color suggestion for! =:^) Seriously, the normal purpose of scanning is to record particular details of what was seen during the scan, based on some desired scanning criteria. So what do you call it when you need to "forget" the information you scanned for? What about simply "forget", btrfs device forget ? That sounds the most natural to me, certainly far more so than the apparently freshly created word "unscan", tho that would certainly deliver the meaning. I actually like forget even better than ignore, it's more descriptive and a bit more obvious. 'purge' might make sense if you're always dumping the entire list, but sounds very absolute and dangerous (and almost universally has a very negative connotation). And FWIW, "deregister", just.. no. (I too would vote unregister if we're sticking with the register root-word, but I have a feeling that may be a regional/en-US preference and some other English regional variants may find deregister is the less terrible to their ear. That may explain whatever commentary under unregister led you to go with deregister.) "Deregister" sounds like something a computer programmer might say to describe the process, but I can't imagine a "normal" person using the word, except possibly in the context of removing someone from the voter rolls (where one "registers" to vote, so "deregister" could be a a reasonably natural term for reversing that) or the like. Actually, in that case it's usually 'unregister' as well, at least in American English. 'deregister' is indeed largely a programming term, but even then it's not unusual for people to just use 'unregister' (the distinction that I usually use myself is that 'unregister' refers to a voluntary process initiated by the entity that initially registered whatever it was (which is the common case, and technically what this is), whereas 'deregister' is usually an involuntary process triggered by a third party (which is rare outside of things like forced kernel module removal or crashes)). It's probably worth noting that both uses are uncommon enough that the standard American English dictionaries in Thunderbird and aspell lack both words (though Thunderbird does have 'unregistered'). As far as regional variance, I've checked with a couple of friends from Australia, Canada, Denmark, and the UK, and all of them also felt that 'deregister' sounded less natural than 'unregister'. The reality is that pretty much regardless of the particular regional dialect, usage patterns for a given negation prefix tend to be reasonably consistent, namely: * de- is usually used in technical verbs and not much else (desensitize, desaturate, deauthorize, delineate, etc) * un- is usually used in generic terms, including verbs, adverbs, and adjectives, and is often the most common choice for creating 'new' words by negating an existing verb, adverb, or adjective (unknown, unsightly, unfriendly, etc) * anti- is used exclusively for negating nouns, traditionally for concepts, but more recently for any noun (antithesis, antimatter, antidisestablishmentarianism, etc) * dis- is used for a handful of very specific cases usually referring to behavior or personality, and sometimes culture (dishonest, disingenuous, etc) * non- is used almost exclusively for adjectives and adverbs, but is less frequent than un- (nonsensical, nonplussed, etc) * il- rarely used, typically only used for negating something describing the state of an action, and almost exclusively with words that begin with 'l', sometimes extended to ill to add a negative connotation to an existing word (illo
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017-12-08 07:59, Qu Wenruo wrote: On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote: On 2017-12-08 02:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. I like the concept, and actually think this should be default behavior on a filesystem that's already mounted (we fix other errors, why not SB's), but I don't think it should be default behavior at mount time for the reasons Qu has outlined (picking up old BTRFS SB's after reformatting is bad). However, I do think it's useful to be able to ask for this behavior on mount, so that you don't need to fight with the programs to get a filesystem to mount when the first SB is missing (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?). Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount option than do it automatically. I still think there should be an option to do automatic detection (it's not particularly hard, and it's not likely to do the wrong thing in most cases), but being able to explicitly specify a particular superblock for a mount is definitely a step in the right direction. Thanks, Qu Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 08:12 PM, Nikolay Borisov wrote: On 8.12.2017 12:33, Anand Jain wrote: On 12/08/2017 04:40 PM, Nikolay Borisov wrote: On 8.12.2017 09:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); This patch prompts another question: why do we have a page-based and a bufferhead-based interface to reading the super block? Right. we need to know that. Sorry I just saw this. FWIW unless we explicitly need a per-block state tracking (which I don't think we do) we should ideally switch to page-based mechanism. Buffer heads are considered deprecated. Also the only reason why we do have btrfsic_submit_bh is for the superblock bufer head. So potentially by removing the only requirement in the kernel where bh are used we can simplify the btrfsic code as well . Just something to keep in mind. Thanks. These are related to the other email thread [1] [1] [Question] Two variants of SB reads can we move into bio read instead ? We can follow up there. ? Could we ? That cleanup is due for a long time now. Thanks, Anand I have a very old patch to converge them and clean this up, but haven't sent it because there are some missing information on why it ended up like that in the first place. Thanks, Anand I did prototype switching the bufferheads to page based but the resulting code wasn't any cleaner. I believe there is also open the question what happens when btrfs is run on a 64k page machine. I.e. we are going to read a single page and the sb is going to be in the first 4k but what about the rest 60, they could potentially contain other metadata. The page will have to be freed asap so as not to peg the neighboring metadata? + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote: > On 2017-12-08 02:57, Anand Jain wrote: >> -EXPERIMENTAL- >> As of now when primary SB fails we won't self heal and would fail mount, >> this is an experimental patch which thinks why not go and read backup >> copy. > I like the concept, and actually think this should be default behavior > on a filesystem that's already mounted (we fix other errors, why not > SB's), but I don't think it should be default behavior at mount time for > the reasons Qu has outlined (picking up old BTRFS SB's after > reformatting is bad). However, I do think it's useful to be able to ask > for this behavior on mount, so that you don't need to fight with the > programs to get a filesystem to mount when the first SB is missing > (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?). Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount option than do it automatically. Thanks, Qu >> >> Signed-off-by: Anand Jain >> --- >> fs/btrfs/disk-io.c | 8 +++- >> fs/btrfs/volumes.c | 10 +++--- >> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 9b20c1f3563b..a791b8dfe8a8 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct >> block_device *bdev) >> * So, we need to add a special mount option to scan for >> * later supers, using BTRFS_SUPER_MIRROR_MAX instead >> */ >> - for (i = 0; i < 1; i++) { >> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >> ret = btrfs_read_dev_one_super(bdev, i, &bh); >> if (ret) >> continue; >> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct >> btrfs_fs_info *fs_info) >> ret = -EINVAL; >> } >> +#if 0 >> + /* >> + * Need a way to check for any copy of SB, as its not a >> + * strong check, just ignore this for now. >> + */ >> if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { >> btrfs_err(fs_info, "super offset mismatch %llu != %u", >> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); >> ret = -EINVAL; >> } >> +#endif >> /* >> * Obvious sys_chunk_array corruptions, it must hold at least >> one key >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 9fa2539a8493..f368db94d62b 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, >> fmode_t flags, void *holder, >> { >> struct btrfs_super_block *disk_super; >> struct block_device *bdev; >> - struct page *page; >> + struct buffer_head *sb_bh; >> int ret = -EINVAL; >> u64 devid; >> u64 transid; >> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, >> fmode_t flags, void *holder, >> goto error; >> } >> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) >> + sb_bh = btrfs_read_dev_super(bdev); >> + if (IS_ERR(sb_bh)) { >> + ret = PTR_ERR(sb_bh); >> goto error_bdev_put; >> + } >> + disk_super = (struct btrfs_super_block *) sb_bh->b_data; >> devid = btrfs_stack_device_id(&disk_super->dev_item); >> transid = btrfs_super_generation(disk_super); >> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, >> fmode_t flags, void *holder, >> if (!ret && fs_devices_ret) >> (*fs_devices_ret)->total_devices = total_devices; >> - btrfs_release_disk_super(page); >> + brelse(sb_bh); >> error_bdev_put: >> blkdev_put(bdev, flags); >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: OpenPGP digital signature
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 20:41, Anand Jain wrote: > > > On 12/08/2017 08:02 PM, Qu Wenruo wrote: >> >> >> On 2017年12月08日 19:48, Anand Jain wrote: >>> >>> >>> On 12/08/2017 07:01 PM, Qu Wenruo wrote: On 2017年12月08日 18:39, Anand Jain wrote: > > > On 12/08/2017 04:17 PM, Qu Wenruo wrote: >> >> >> On 2017年12月08日 15:57, Anand Jain wrote: >>> -EXPERIMENTAL- >>> As of now when primary SB fails we won't self heal and would fail >>> mount, >>> this is an experimental patch which thinks why not go and read >>> backup >>> copy. >> >> Just curious about in which real world case that backup super >> block can >> help. >> At least from what I see in mail list, only few cases where backup >> super >> helps. > > Theoretical design helps. I ended up in this situation though. And > ext4 has -o sb flag to manage this part. When we can expect EIO on > any part of the disk block why not on the LBA which contains > primary > SB. And should we fail the mount for that reason ? No. And how do you ensure it's a btrfs? >>> >>> Hmm. You mean outside of btrfs ? I did experiment with wipe and then >>> using /etc/fstab to mount, and it did lead to btrfs, is that your >>> concern that it shouldn't have been. That looked surprising to me as >>> well, but then problem points at wipefs instead. >> >> It's closer, but still doesn't reach the point. >> It can be a mkfs, other than wipefs. >> >> It's can be another case like: >> >> One user used btrfs for a while >> But bugs made him/her unhappy, and he/she turned to use xfs (whatever >> the fs is) instead. >> >> While he/she forgot to change its fstab, and rebooted the system. >> >> And suddenly, it's btrfs again! >> What a surprise. >> > > I have worked quite a lot on defining the problem statements before > btrfs. Here the user has to be blamed to have forgotten to update > the fstab. I don't understand why should we workaround in btrfs for > the reason that user may miss to update fstab. We don't design > a stuff that like that, we design for the purpose, here backup SB > is for the purpose that we all know, if we don't use backup SB, then > its an incomplete design. Then implement something like ext*, using explicit mount option sb=n. And since it's already called "backup", it's something used for recovery, not for normal operation. We already have rescue tool to recover sb from backup, so it's not a incomplete design. Personally speaking, I prefer the current way and leave backup just as backup. You can my example of a user fault, but the real world needs us to handle user's fault. Or there will be no need for mkfs -f options or rm --no-preserve-root options at all. Thanks, Qu > >>> > >> Despite that self super heal seems good, although I agree with >> David, we >> need a weaker but necessary check (magic and fsid from primary >> super?) >> to ensure it's a valid btrfs before we use the backup supers. > > Of course, we already have btrfs_check_super_valid() to verify > the SB, > I don't understand why - how do we verify the SB should be the > point of > concern here, at all. The point here is, to distinguish an old and invalid btrfs (some other valid fs mkfs beyond the old fs) from a valid btrfs with corrupted primary fs. >>> >>> Ok. When you check all the SBs you would pick the one which has passed >>> btrfs_check_super_valid() and has highest generation. Did I ans your >>> concern ? If a SB does not pass btrfs_check_super_valid() its not a >>> valid btrfs SB at all. >> >> No, not really. >> >> What if the first SB is a XFS one or even a fs you didn't ever hear? >> >> Skip it and use the 2nd one? >> This filesystem is not even btrfs anymore. > > There is no problem is user does the correct thing that is to > update the fstab. OR using a complete mount command. If user > using -t btrfs OR unupdated btrfs then it indicates his intentions. > > Thanks, Anand > >> Thanks, >> Qu >> >>> >>> This the main concern here. The difference between recovery and recognizing a bad btrfs is quite important. >>> >>> btrfs_check_super_valid() is already doing that right ? The point here >>> is, should we use the backup SB when btrfs_check_super_valid() >>> fails on >>> primary SB. >>> >>> Thanks, Anand >>> Thanks, Qu > > Thanks, Anand > >> Thanks, >> Qu >> >> >>> >>> Signed-off-by: Anand Jain >>> --- >>> fs/btrfs/disk-io.c | 8 +++- >>> fs/btrfs/volumes.c | 10 +++--- >>> 2 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 9b20c1f3563b..a791b8dfe8a8 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3190,7 +3190,7 @@ struct buffer_head >>
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017-12-08 02:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. I like the concept, and actually think this should be default behavior on a filesystem that's already mounted (we fix other errors, why not SB's), but I don't think it should be default behavior at mount time for the reasons Qu has outlined (picking up old BTRFS SB's after reformatting is bad). However, I do think it's useful to be able to ask for this behavior on mount, so that you don't need to fight with the programs to get a filesystem to mount when the first SB is missing (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?). Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* +* Need a way to check for any copy of SB, as its not a +* strong check, just ignore this for now. +*/ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Question] Two variants of SB reads can we move into bio read instead ?
On 8.12.2017 12:27, Anand Jain wrote: > > David, > > There are two variants of SB read, one using the device cache [1] > and the other buffer head [2]. > > [1] btrfs_read_disk_super() > [2] btrfs_read_dev_super() > > Patch, 6f60cbd3ae442cb35861bb522f388db123d42ec1 > (btrfs: access superblock via pagecache in scan_one_device) > Embraced device caches to avoid blocksize set and thus avoid device > drop cache. But however its in the context of starting up with a new > mount and in practice, would it really matter if the device cache is > dropped in the context of mount, we any way drop it when the device > is successfully mounted though. I don't understand the actual problem > here. > > Further [2] is still using buffer head, which works very well for us > in this context, any idea if there is any suggestion to move it to > newer bio read instead ? the bh interface eventually goes: btrfs_read_dev_one_super __bread __bread_gfp __getblk_gfp __getblk_slow __find_get_block __find_get_block_slow find_get_page_flags Which is again using the pagecache, albeit a bit more convoluted. > > Thanks, Anand > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 08:02 PM, Qu Wenruo wrote: On 2017年12月08日 19:48, Anand Jain wrote: On 12/08/2017 07:01 PM, Qu Wenruo wrote: On 2017年12月08日 18:39, Anand Jain wrote: On 12/08/2017 04:17 PM, Qu Wenruo wrote: On 2017年12月08日 15:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. Just curious about in which real world case that backup super block can help. At least from what I see in mail list, only few cases where backup super helps. Theoretical design helps. I ended up in this situation though. And ext4 has -o sb flag to manage this part. When we can expect EIO on any part of the disk block why not on the LBA which contains primary SB. And should we fail the mount for that reason ? No. And how do you ensure it's a btrfs? Hmm. You mean outside of btrfs ? I did experiment with wipe and then using /etc/fstab to mount, and it did lead to btrfs, is that your concern that it shouldn't have been. That looked surprising to me as well, but then problem points at wipefs instead. It's closer, but still doesn't reach the point. It can be a mkfs, other than wipefs. It's can be another case like: One user used btrfs for a while But bugs made him/her unhappy, and he/she turned to use xfs (whatever the fs is) instead. While he/she forgot to change its fstab, and rebooted the system. And suddenly, it's btrfs again! What a surprise. I have worked quite a lot on defining the problem statements before btrfs. Here the user has to be blamed to have forgotten to update the fstab. I don't understand why should we workaround in btrfs for the reason that user may miss to update fstab. We don't design a stuff that like that, we design for the purpose, here backup SB is for the purpose that we all know, if we don't use backup SB, then its an incomplete design. Despite that self super heal seems good, although I agree with David, we need a weaker but necessary check (magic and fsid from primary super?) to ensure it's a valid btrfs before we use the backup supers. Of course, we already have btrfs_check_super_valid() to verify the SB, I don't understand why - how do we verify the SB should be the point of concern here, at all. The point here is, to distinguish an old and invalid btrfs (some other valid fs mkfs beyond the old fs) from a valid btrfs with corrupted primary fs. Ok. When you check all the SBs you would pick the one which has passed btrfs_check_super_valid() and has highest generation. Did I ans your concern ? If a SB does not pass btrfs_check_super_valid() its not a valid btrfs SB at all. No, not really. What if the first SB is a XFS one or even a fs you didn't ever hear? Skip it and use the 2nd one? This filesystem is not even btrfs anymore. There is no problem is user does the correct thing that is to update the fstab. OR using a complete mount command. If user using -t btrfs OR unupdated btrfs then it indicates his intentions. Thanks, Anand Thanks, Qu This the main concern here. The difference between recovery and recognizing a bad btrfs is quite important. btrfs_check_super_valid() is already doing that right ? The point here is, should we use the backup SB when btrfs_check_super_valid() fails on primary SB. Thanks, Anand Thanks, Qu Thanks, Anand Thanks, Qu Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 09:17 AM, Qu Wenruo wrote: > > > On 2017年12月08日 15:57, Anand Jain wrote: >> -EXPERIMENTAL- >> As of now when primary SB fails we won't self heal and would fail mount, >> this is an experimental patch which thinks why not go and read backup >> copy. > > Just curious about in which real world case that backup super block can > help. > At least from what I see in mail list, only few cases where backup super > helps. That's about using 'backup roots' with old info about previous generations than the current one on disk, which is a different thing than using one of the other copies of the super block itself. > Despite that self super heal seems good, although I agree with David, we > need a weaker but necessary check (magic and fsid from primary super?) > to ensure it's a valid btrfs before we use the backup supers. > > Thanks, > Qu > > >> >> Signed-off-by: Anand Jain >> --- >> fs/btrfs/disk-io.c | 8 +++- >> fs/btrfs/volumes.c | 10 +++--- >> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 9b20c1f3563b..a791b8dfe8a8 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct >> block_device *bdev) >> * So, we need to add a special mount option to scan for >> * later supers, using BTRFS_SUPER_MIRROR_MAX instead >> */ >> -for (i = 0; i < 1; i++) { >> +for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >> ret = btrfs_read_dev_one_super(bdev, i, &bh); >> if (ret) >> continue; >> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct >> btrfs_fs_info *fs_info) >> ret = -EINVAL; >> } >> >> +#if 0 >> +/* >> + * Need a way to check for any copy of SB, as its not a >> + * strong check, just ignore this for now. >> + */ >> if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { >> btrfs_err(fs_info, "super offset mismatch %llu != %u", >>btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); >> ret = -EINVAL; >> } >> +#endif >> >> /* >> * Obvious sys_chunk_array corruptions, it must hold at least one key >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 9fa2539a8493..f368db94d62b 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t >> flags, void *holder, >> { >> struct btrfs_super_block *disk_super; >> struct block_device *bdev; >> -struct page *page; >> +struct buffer_head *sb_bh; >> int ret = -EINVAL; >> u64 devid; >> u64 transid; >> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t >> flags, void *holder, >> goto error; >> } >> >> -if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) >> +sb_bh = btrfs_read_dev_super(bdev); >> +if (IS_ERR(sb_bh)) { >> +ret = PTR_ERR(sb_bh); >> goto error_bdev_put; >> +} >> +disk_super = (struct btrfs_super_block *) sb_bh->b_data; >> >> devid = btrfs_stack_device_id(&disk_super->dev_item); >> transid = btrfs_super_generation(disk_super); >> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t >> flags, void *holder, >> if (!ret && fs_devices_ret) >> (*fs_devices_ret)->total_devices = total_devices; >> >> -btrfs_release_disk_super(page); >> +brelse(sb_bh); >> >> error_bdev_put: >> blkdev_put(bdev, flags); >> > -- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 8.12.2017 12:33, Anand Jain wrote: > > > On 12/08/2017 04:40 PM, Nikolay Borisov wrote: >> >> >> On 8.12.2017 09:57, Anand Jain wrote: >>> -EXPERIMENTAL- >>> As of now when primary SB fails we won't self heal and would fail mount, >>> this is an experimental patch which thinks why not go and read backup >>> copy. >>> >>> Signed-off-by: Anand Jain >>> --- >>> fs/btrfs/disk-io.c | 8 +++- >>> fs/btrfs/volumes.c | 10 +++--- >>> 2 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 9b20c1f3563b..a791b8dfe8a8 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct >>> block_device *bdev) >>> * So, we need to add a special mount option to scan for >>> * later supers, using BTRFS_SUPER_MIRROR_MAX instead >>> */ >>> - for (i = 0; i < 1; i++) { >>> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >>> ret = btrfs_read_dev_one_super(bdev, i, &bh); >>> if (ret) >>> continue; >>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct >>> btrfs_fs_info *fs_info) >>> ret = -EINVAL; >>> } >>> +#if 0 >>> + /* >>> + * Need a way to check for any copy of SB, as its not a >>> + * strong check, just ignore this for now. >>> + */ >>> if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { >>> btrfs_err(fs_info, "super offset mismatch %llu != %u", >>> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); >>> ret = -EINVAL; >>> } >>> +#endif >>> /* >>> * Obvious sys_chunk_array corruptions, it must hold at least >>> one key >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index 9fa2539a8493..f368db94d62b 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> { >>> struct btrfs_super_block *disk_super; >>> struct block_device *bdev; >>> - struct page *page; >>> + struct buffer_head *sb_bh; >>> int ret = -EINVAL; >>> u64 devid; >>> u64 transid; >>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> goto error; >>> } >>> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) >>> + sb_bh = btrfs_read_dev_super(bdev); >> >> This patch prompts another question: why do we have a page-based and a >> bufferhead-based interface to reading the super block? > > Right. we need to know that. Sorry I just saw this. FWIW unless we explicitly need a per-block state tracking (which I don't think we do) we should ideally switch to page-based mechanism. Buffer heads are considered deprecated. Also the only reason why we do have btrfsic_submit_bh is for the superblock bufer head. So potentially by removing the only requirement in the kernel where bh are used we can simplify the btrfsic code as well . Just something to keep in mind. > > I have a very old patch to converge them and clean this up, but haven't > sent it because there are some missing information on why it ended up > like that in the first place. > > Thanks, Anand > > >> I did prototype >> switching the bufferheads to page based but the resulting code wasn't >> any cleaner. I believe there is also open the question what happens when >> btrfs is run on a 64k page machine. I.e. we are going to read a single >> page and the sb is going to be in the first 4k but what about the rest >> 60, they could potentially contain other metadata. The page will have to >> be freed asap so as not to peg the neighboring metadata? > > >> >>> + if (IS_ERR(sb_bh)) { >>> + ret = PTR_ERR(sb_bh); >>> goto error_bdev_put; >>> + } >>> + disk_super = (struct btrfs_super_block *) sb_bh->b_data; >>> devid = btrfs_stack_device_id(&disk_super->dev_item); >>> transid = btrfs_super_generation(disk_super); >>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> if (!ret && fs_devices_ret) >>> (*fs_devices_ret)->total_devices = total_devices; >>> - btrfs_release_disk_super(page); >>> + brelse(sb_bh); >>> error_bdev_put: >>> blkdev_put(bdev, flags); >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info a
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 19:48, Anand Jain wrote: > > > On 12/08/2017 07:01 PM, Qu Wenruo wrote: >> >> >> On 2017年12月08日 18:39, Anand Jain wrote: >>> >>> >>> On 12/08/2017 04:17 PM, Qu Wenruo wrote: On 2017年12月08日 15:57, Anand Jain wrote: > -EXPERIMENTAL- > As of now when primary SB fails we won't self heal and would fail > mount, > this is an experimental patch which thinks why not go and read backup > copy. Just curious about in which real world case that backup super block can help. At least from what I see in mail list, only few cases where backup super helps. >>> >>> Theoretical design helps. I ended up in this situation though. And >>> ext4 has -o sb flag to manage this part. When we can expect EIO on >>> any part of the disk block why not on the LBA which contains primary >>> SB. And should we fail the mount for that reason ? No. >> >> And how do you ensure it's a btrfs? > > Hmm. You mean outside of btrfs ? I did experiment with wipe and then > using /etc/fstab to mount, and it did lead to btrfs, is that your > concern that it shouldn't have been. That looked surprising to me as > well, but then problem points at wipefs instead. It's closer, but still doesn't reach the point. It can be a mkfs, other than wipefs. It's can be another case like: One user used btrfs for a while But bugs made him/her unhappy, and he/she turned to use xfs (whatever the fs is) instead. While he/she forgot to change its fstab, and rebooted the system. And suddenly, it's btrfs again! What a surprise. > >> >>> Despite that self super heal seems good, although I agree with David, we need a weaker but necessary check (magic and fsid from primary super?) to ensure it's a valid btrfs before we use the backup supers. >>> >>> Of course, we already have btrfs_check_super_valid() to verify the SB, >>> I don't understand why - how do we verify the SB should be the >>> point of >>> concern here, at all. >> >> The point here is, to distinguish an old and invalid btrfs (some other >> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted >> primary fs. > > Ok. When you check all the SBs you would pick the one which has passed > btrfs_check_super_valid() and has highest generation. Did I ans your > concern ? If a SB does not pass btrfs_check_super_valid() its not a > valid btrfs SB at all. No, not really. What if the first SB is a XFS one or even a fs you didn't ever hear? Skip it and use the 2nd one? This filesystem is not even btrfs anymore. Thanks, Qu > > >> This the main concern here. >> The difference between recovery and recognizing a bad btrfs is quite >> important. > > btrfs_check_super_valid() is already doing that right ? The point here > is, should we use the backup SB when btrfs_check_super_valid() fails on > primary SB. > > Thanks, Anand > >> Thanks, >> Qu >> >>> >>> Thanks, Anand >>> Thanks, Qu > > Signed-off-by: Anand Jain > --- > fs/btrfs/disk-io.c | 8 +++- > fs/btrfs/volumes.c | 10 +++--- > 2 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 9b20c1f3563b..a791b8dfe8a8 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct > block_device *bdev) > * So, we need to add a special mount option to scan for > * later supers, using BTRFS_SUPER_MIRROR_MAX instead > */ > - for (i = 0; i < 1; i++) { > + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { > ret = btrfs_read_dev_one_super(bdev, i, &bh); > if (ret) > continue; > @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct > btrfs_fs_info *fs_info) > ret = -EINVAL; > } > +#if 0 > + /* > + * Need a way to check for any copy of SB, as its not a > + * strong check, just ignore this for now. > + */ > if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { > btrfs_err(fs_info, "super offset mismatch %llu != %u", > btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); > ret = -EINVAL; > } > +#endif > /* > * Obvious sys_chunk_array corruptions, it must hold at least > one key > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 9fa2539a8493..f368db94d62b 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, > fmode_t flags, void *holder, > { > struct btrfs_super_block *disk_super; > struct block_device *bdev; > - struct page *page; > + struct buffer_head *sb_bh; > int ret = -EINVAL;
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 07:01 PM, Qu Wenruo wrote: On 2017年12月08日 18:39, Anand Jain wrote: On 12/08/2017 04:17 PM, Qu Wenruo wrote: On 2017年12月08日 15:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. Just curious about in which real world case that backup super block can help. At least from what I see in mail list, only few cases where backup super helps. Theoretical design helps. I ended up in this situation though. And ext4 has -o sb flag to manage this part. When we can expect EIO on any part of the disk block why not on the LBA which contains primary SB. And should we fail the mount for that reason ? No. And how do you ensure it's a btrfs? Hmm. You mean outside of btrfs ? I did experiment with wipe and then using /etc/fstab to mount, and it did lead to btrfs, is that your concern that it shouldn't have been. That looked surprising to me as well, but then problem points at wipefs instead. Despite that self super heal seems good, although I agree with David, we need a weaker but necessary check (magic and fsid from primary super?) to ensure it's a valid btrfs before we use the backup supers. Of course, we already have btrfs_check_super_valid() to verify the SB, I don't understand why - how do we verify the SB should be the point of concern here, at all. The point here is, to distinguish an old and invalid btrfs (some other valid fs mkfs beyond the old fs) from a valid btrfs with corrupted primary fs. Ok. When you check all the SBs you would pick the one which has passed btrfs_check_super_valid() and has highest generation. Did I ans your concern ? If a SB does not pass btrfs_check_super_valid() its not a valid btrfs SB at all. This the main concern here. The difference between recovery and recognizing a bad btrfs is quite important. btrfs_check_super_valid() is already doing that right ? The point here is, should we use the backup SB when btrfs_check_super_valid() fails on primary SB. Thanks, Anand Thanks, Qu Thanks, Anand Thanks, Qu Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* + * Need a way to check for any copy of SB, as its not a + * strong check, just ignore this for now. + */ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linu
[bug report] btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA
Hello Anand Jain, The patch 2dcfcdc43524: "btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA" from Dec 4, 2017, leads to the following static checker warning: fs/btrfs/super.c:2007 btrfs_calc_avail_data_space() warn: test_bit() takes a bit number fs/btrfs/super.c 2004 2005 rcu_read_lock(); 2006 list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) { 2007 if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA, ^^ This BTRFS_DEV_STATE_IN_FS_METADATA define is BIT(1) but for test_bit() we should just take 1. In other words we are doing double BIT(BIT(1)). 2008 &device->dev_state) || 2009 !device->bdev || 2010 test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) 2011 continue; 2012 2013 if (i >= nr_devices) 2014 break; 2015 I get a bunch of these warnings. drivers/md/md-bitmap.c:1993 bitmap_copy_from_slot() warn: 'bitmap_file_test_bit(bitmap, block)' returns positive and negative [ I threw this one is as a bonus. The warning is because are we handling the -EINVAL return from bitmap_file_test_bit() correctly? I have no idea. ] fs/btrfs/super.c:2007 btrfs_calc_avail_data_space() warn: test_bit() takes a bit number fs/btrfs/super.c:2010 btrfs_calc_avail_data_space() warn: test_bit() takes a bit number fs/btrfs/super.c:2315 btrfs_show_devname() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3351 write_dev_flush() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3361 wait_dev_flush() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3364 wait_dev_flush() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3392 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3396 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3397 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3406 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3412 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3413 barrier_all_devices() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3510 write_all_supers() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3511 write_all_supers() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3550 write_all_supers() warn: test_bit() takes a bit number fs/btrfs/disk-io.c:3551 write_all_supers() warn: test_bit() takes a bit number fs/btrfs/volumes.c:695 btrfs_open_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:699 btrfs_open_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:701 btrfs_open_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:709 btrfs_open_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:713 btrfs_open_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:829 device_list_add() warn: test_bit() takes a bit number fs/btrfs/volumes.c:831 device_list_add() warn: test_bit() takes a bit number fs/btrfs/volumes.c:913 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:915 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:935 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:945 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:947 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:948 btrfs_close_extra_devices() warn: test_bit() takes a bit number fs/btrfs/volumes.c:980 btrfs_close_bdev() warn: test_bit() takes a bit number fs/btrfs/volumes.c:997 btrfs_prepare_close_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1003 btrfs_prepare_close_one_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1258 btrfs_account_dev_extents_size() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1437 find_free_dev_extent_start() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1644 btrfs_alloc_dev_extent() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1645 btrfs_alloc_dev_extent() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1891 btrfs_find_next_active_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1953 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1958 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1964 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:1986 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:2006 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c:2026 btrfs_rm_device() warn: test_bit() takes a bit number fs/btrfs/volumes.c
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 18:39, Anand Jain wrote: > > > On 12/08/2017 04:17 PM, Qu Wenruo wrote: >> >> >> On 2017年12月08日 15:57, Anand Jain wrote: >>> -EXPERIMENTAL- >>> As of now when primary SB fails we won't self heal and would fail mount, >>> this is an experimental patch which thinks why not go and read backup >>> copy. >> >> Just curious about in which real world case that backup super block can >> help. >> At least from what I see in mail list, only few cases where backup super >> helps. > > Theoretical design helps. I ended up in this situation though. And > ext4 has -o sb flag to manage this part. When we can expect EIO on > any part of the disk block why not on the LBA which contains primary > SB. And should we fail the mount for that reason ? No. And how do you ensure it's a btrfs? > >> Despite that self super heal seems good, although I agree with David, we >> need a weaker but necessary check (magic and fsid from primary super?) >> to ensure it's a valid btrfs before we use the backup supers. > > Of course, we already have btrfs_check_super_valid() to verify the SB, > I don't understand why - how do we verify the SB should be the point of > concern here, at all. The point here is, to distinguish an old and invalid btrfs (some other valid fs mkfs beyond the old fs) from a valid btrfs with corrupted primary fs. This the main concern here. The difference between recovery and recognizing a bad btrfs is quite important. Thanks, Qu > > Thanks, Anand > >> Thanks, >> Qu >> >> >>> >>> Signed-off-by: Anand Jain >>> --- >>> fs/btrfs/disk-io.c | 8 +++- >>> fs/btrfs/volumes.c | 10 +++--- >>> 2 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 9b20c1f3563b..a791b8dfe8a8 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct >>> block_device *bdev) >>> * So, we need to add a special mount option to scan for >>> * later supers, using BTRFS_SUPER_MIRROR_MAX instead >>> */ >>> - for (i = 0; i < 1; i++) { >>> + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >>> ret = btrfs_read_dev_one_super(bdev, i, &bh); >>> if (ret) >>> continue; >>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct >>> btrfs_fs_info *fs_info) >>> ret = -EINVAL; >>> } >>> +#if 0 >>> + /* >>> + * Need a way to check for any copy of SB, as its not a >>> + * strong check, just ignore this for now. >>> + */ >>> if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { >>> btrfs_err(fs_info, "super offset mismatch %llu != %u", >>> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); >>> ret = -EINVAL; >>> } >>> +#endif >>> /* >>> * Obvious sys_chunk_array corruptions, it must hold at least >>> one key >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index 9fa2539a8493..f368db94d62b 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> { >>> struct btrfs_super_block *disk_super; >>> struct block_device *bdev; >>> - struct page *page; >>> + struct buffer_head *sb_bh; >>> int ret = -EINVAL; >>> u64 devid; >>> u64 transid; >>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> goto error; >>> } >>> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) >>> + sb_bh = btrfs_read_dev_super(bdev); >>> + if (IS_ERR(sb_bh)) { >>> + ret = PTR_ERR(sb_bh); >>> goto error_bdev_put; >>> + } >>> + disk_super = (struct btrfs_super_block *) sb_bh->b_data; >>> devid = btrfs_stack_device_id(&disk_super->dev_item); >>> transid = btrfs_super_generation(disk_super); >>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, >>> fmode_t flags, void *holder, >>> if (!ret && fs_devices_ret) >>> (*fs_devices_ret)->total_devices = total_devices; >>> - btrfs_release_disk_super(page); >>> + brelse(sb_bh); >>> error_bdev_put: >>> blkdev_put(bdev, flags); >>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: OpenPGP digital signature
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 04:17 PM, Qu Wenruo wrote: On 2017年12月08日 15:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. Just curious about in which real world case that backup super block can help. At least from what I see in mail list, only few cases where backup super helps. Theoretical design helps. I ended up in this situation though. And ext4 has -o sb flag to manage this part. When we can expect EIO on any part of the disk block why not on the LBA which contains primary SB. And should we fail the mount for that reason ? No. Despite that self super heal seems good, although I agree with David, we need a weaker but necessary check (magic and fsid from primary super?) to ensure it's a valid btrfs before we use the backup supers. Of course, we already have btrfs_check_super_valid() to verify the SB, I don't understand why - how do we verify the SB should be the point of concern here, at all. Thanks, Anand Thanks, Qu Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* +* Need a way to check for any copy of SB, as its not a +* strong check, just ignore this for now. +*/ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 12/08/2017 04:40 PM, Nikolay Borisov wrote: On 8.12.2017 09:57, Anand Jain wrote: -EXPERIMENTAL- As of now when primary SB fails we won't self heal and would fail mount, this is an experimental patch which thinks why not go and read backup copy. Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 +++- fs/btrfs/volumes.c | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b20c1f3563b..a791b8dfe8a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - for (i = 0; i < 1; i++) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfs_read_dev_one_super(bdev, i, &bh); if (ret) continue; @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) ret = -EINVAL; } +#if 0 + /* +* Need a way to check for any copy of SB, as its not a +* strong check, just ignore this for now. +*/ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { btrfs_err(fs_info, "super offset mismatch %llu != %u", btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); ret = -EINVAL; } +#endif /* * Obvious sys_chunk_array corruptions, it must hold at least one key diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9fa2539a8493..f368db94d62b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; + struct buffer_head *sb_bh; int ret = -EINVAL; u64 devid; u64 transid; @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, goto error; } - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) + sb_bh = btrfs_read_dev_super(bdev); This patch prompts another question: why do we have a page-based and a bufferhead-based interface to reading the super block? Right. we need to know that. Sorry I just saw this. I have a very old patch to converge them and clean this up, but haven't sent it because there are some missing information on why it ended up like that in the first place. Thanks, Anand I did prototype switching the bufferheads to page based but the resulting code wasn't any cleaner. I believe there is also open the question what happens when btrfs is run on a 64k page machine. I.e. we are going to read a single page and the sb is going to be in the first 4k but what about the rest 60, they could potentially contain other metadata. The page will have to be freed asap so as not to peg the neighboring metadata? + if (IS_ERR(sb_bh)) { + ret = PTR_ERR(sb_bh); goto error_bdev_put; + } + disk_super = (struct btrfs_super_block *) sb_bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, if (!ret && fs_devices_ret) (*fs_devices_ret)->total_devices = total_devices; - btrfs_release_disk_super(page); + brelse(sb_bh); error_bdev_put: blkdev_put(bdev, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Delivery Failure
- The message you sent to vhakonigroup.co.za/mashudu was rejected because it would exceed the quota for the mailbox. The subject of the message follows: Re:Start to earn 15k daily.So easy money - -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Question] Two variants of SB reads can we move into bio read instead ?
David, There are two variants of SB read, one using the device cache [1] and the other buffer head [2]. [1] btrfs_read_disk_super() [2] btrfs_read_dev_super() Patch, 6f60cbd3ae442cb35861bb522f388db123d42ec1 (btrfs: access superblock via pagecache in scan_one_device) Embraced device caches to avoid blocksize set and thus avoid device drop cache. But however its in the context of starting up with a new mount and in practice, would it really matter if the device cache is dropped in the context of mount, we any way drop it when the device is successfully mounted though. I don't understand the actual problem here. Further [2] is still using buffer head, which works very well for us in this context, any idea if there is any suggestion to move it to newer bio read instead ? Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray
On 12/8/2017 4:25 PM, Dave Chinner wrote: On Fri, Dec 08, 2017 at 01:45:52PM +0900, Byungchul Park wrote: On Fri, Dec 08, 2017 at 09:22:16AM +1100, Dave Chinner wrote: On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote: On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote: Unfortunately for you, I don't find arguments along the lines of "lockdep will save us" at all convincing. lockdep already throws too many false positives to be useful as a tool that reliably and accurately points out rare, exciting, complex, intricate locking problems. But it does reliably and accurately point out "dude, you forgot to take the lock". It's caught a number of real problems in my own testing that you never got to see. The problem is that if it has too many false positives --- and it's gotten *way* worse with the completion callback "feature", people will just stop using Lockdep as being too annyoing and a waste of developer time when trying to figure what is a legitimate locking bug versus lockdep getting confused. I can't even disable the new Lockdep feature which is throwing lots of new false positives --- it's just all or nothing. Dave has just said he's already stopped using Lockdep, as a result. This is compeltely OT, but FYI I stopped using lockdep a long time ago. We've spend orders of magnitude more time and effort to shut up lockdep false positives in the XFS code than we ever have on locking problems that lockdep has uncovered. And still lockdep throws too many false positives on XFS workloads to be useful to me. But it's more than that: I understand just how much lockdep *doesn't check* and that means *I know I can't rely on lockdep* for potential deadlock detection. e.g. it doesn't cover semaphores, which means Hello, I'm careful in saying the following since you seem to feel not good at crossrelease and even lockdep. Now that cross-release has been introduced, semaphores can be covered as you might know. Actually, all general waiters can. And all it will do is create a whole bunch more work for us XFS guys to shut up all the the false positive crap that falls out from it because the locking model we have is far more complex than any of the lockdep developers thought was necessary to support, just like happened with the XFS inode annotations all those years ago. e.g. nobody has ever bothered to ask us what is needed to describe XFS's semaphore locking model. If you did that, you'd know that we nest *thousands* of locked semaphores in compeltely random lock order during metadata buffer writeback. And that this lock order does not reflect the actual locking order rules we have for locking buffers during transactions. Oh, and you'd also know that a semaphore's lock order and context can change multiple times during the life time of the buffer. Say we free a block and the reallocate it as something else before it is reclaimed - that buffer now might have a different lock order. Or maybe we promote a buffer to be a root btree block as a result of a join - it's now the first buffer in a lock run, rather than a child. Or we split a tree, and the root is now a node and so no longer is the first buffer in a lock run. Or that we walk sideways along the leaf nodes siblings during searches. IOWs, there is no well defined static lock ordering at all for buffers - and therefore semaphores - in XFS at all. And knowing that, you wouldn't simply mention that lockdep can support semaphores now as though that is necessary to "make it work" for XFS. It's going to be much simpler for us to just turn off lockdep and ignore whatever crap it sends our way than it is to spend unplanned weeks of our time to try to make lockdep sorta work again. Sure, we might get there in the end, but it's likely to take months, if not years like it did with the XFS inode annotations. it has zero coverage of the entire XFS metadata buffer subsystem and the complex locking orders we have for metadata updates. Put simply: lockdep doesn't provide me with any benefit, so I don't use it... Sad.. I don't think you understand. I'll try to explain. The lockdep infrastructure by itself doesn't make lockdep a useful tool - it mostly generates false positives because it has no concept of locking models that don't match it's internal tracking assumptions and/or limitations. That means if we can't suppress the false positives, then lockdep is going to be too noisy to find real problems. It's taken the XFS developers months of work over the past 7-8 years to suppress all the *common* false positives that lockdep throws on XFS. And despite all that work, there's still too many false positives occuring because we can't easily suppress them with annotations. IOWs, the signal to noise ratio is still too low for lockdep to find real problems. That's why lockdep isn't useful to me - the noise floor is too high, and the effort to lower the noise floor further is too great. This is important, because cross-r
Re: [PATCH RFC] btrfs: self heal from SB fail
On 8.12.2017 09:57, Anand Jain wrote: > -EXPERIMENTAL- > As of now when primary SB fails we won't self heal and would fail mount, > this is an experimental patch which thinks why not go and read backup > copy. > > Signed-off-by: Anand Jain > --- > fs/btrfs/disk-io.c | 8 +++- > fs/btrfs/volumes.c | 10 +++--- > 2 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 9b20c1f3563b..a791b8dfe8a8 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct > block_device *bdev) >* So, we need to add a special mount option to scan for >* later supers, using BTRFS_SUPER_MIRROR_MAX instead >*/ > - for (i = 0; i < 1; i++) { > + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { > ret = btrfs_read_dev_one_super(bdev, i, &bh); > if (ret) > continue; > @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct > btrfs_fs_info *fs_info) > ret = -EINVAL; > } > > +#if 0 > + /* > + * Need a way to check for any copy of SB, as its not a > + * strong check, just ignore this for now. > + */ > if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { > btrfs_err(fs_info, "super offset mismatch %llu != %u", > btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); > ret = -EINVAL; > } > +#endif > > /* >* Obvious sys_chunk_array corruptions, it must hold at least one key > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 9fa2539a8493..f368db94d62b 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > { > struct btrfs_super_block *disk_super; > struct block_device *bdev; > - struct page *page; > + struct buffer_head *sb_bh; > int ret = -EINVAL; > u64 devid; > u64 transid; > @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > goto error; > } > > - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) > + sb_bh = btrfs_read_dev_super(bdev); This patch prompts another question: why do we have a page-based and a bufferhead-based interface to reading the super block? I did prototype switching the bufferheads to page based but the resulting code wasn't any cleaner. I believe there is also open the question what happens when btrfs is run on a 64k page machine. I.e. we are going to read a single page and the sb is going to be in the first 4k but what about the rest 60, they could potentially contain other metadata. The page will have to be freed asap so as not to peg the neighboring metadata? > + if (IS_ERR(sb_bh)) { > + ret = PTR_ERR(sb_bh); > goto error_bdev_put; > + } > + disk_super = (struct btrfs_super_block *) sb_bh->b_data; > > devid = btrfs_stack_device_id(&disk_super->dev_item); > transid = btrfs_super_generation(disk_super); > @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > if (!ret && fs_devices_ret) > (*fs_devices_ret)->total_devices = total_devices; > > - btrfs_release_disk_super(page); > + brelse(sb_bh); > > error_bdev_put: > blkdev_put(bdev, flags); > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Assalamu`Alaikum.
Greetings from Dr. mohammad ouattara. Assalamu`Alaikum. My Name is Dr. mohammad ouattara, I am a banker by profession. I'm from Ouagadougou, Burkina Faso, West Africa. My reason for contacting you is to transfer an abandoned $10.6M to your account. The owner of this fund died since 2004 with his Next Of Kin. I want to present you to the bank as the Next of Kin/beneficiary of this fund. Further details of the transaction shall be forwarded to you as soon as I receive your return mail indicating your interest. Have a great day, Dr. mohammad ouattara. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: self heal from SB fail
On 2017年12月08日 15:57, Anand Jain wrote: > -EXPERIMENTAL- > As of now when primary SB fails we won't self heal and would fail mount, > this is an experimental patch which thinks why not go and read backup > copy. Just curious about in which real world case that backup super block can help. At least from what I see in mail list, only few cases where backup super helps. Despite that self super heal seems good, although I agree with David, we need a weaker but necessary check (magic and fsid from primary super?) to ensure it's a valid btrfs before we use the backup supers. Thanks, Qu > > Signed-off-by: Anand Jain > --- > fs/btrfs/disk-io.c | 8 +++- > fs/btrfs/volumes.c | 10 +++--- > 2 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 9b20c1f3563b..a791b8dfe8a8 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct > block_device *bdev) >* So, we need to add a special mount option to scan for >* later supers, using BTRFS_SUPER_MIRROR_MAX instead >*/ > - for (i = 0; i < 1; i++) { > + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { > ret = btrfs_read_dev_one_super(bdev, i, &bh); > if (ret) > continue; > @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct > btrfs_fs_info *fs_info) > ret = -EINVAL; > } > > +#if 0 > + /* > + * Need a way to check for any copy of SB, as its not a > + * strong check, just ignore this for now. > + */ > if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) { > btrfs_err(fs_info, "super offset mismatch %llu != %u", > btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET); > ret = -EINVAL; > } > +#endif > > /* >* Obvious sys_chunk_array corruptions, it must hold at least one key > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 9fa2539a8493..f368db94d62b 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > { > struct btrfs_super_block *disk_super; > struct block_device *bdev; > - struct page *page; > + struct buffer_head *sb_bh; > int ret = -EINVAL; > u64 devid; > u64 transid; > @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > goto error; > } > > - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) > + sb_bh = btrfs_read_dev_super(bdev); > + if (IS_ERR(sb_bh)) { > + ret = PTR_ERR(sb_bh); > goto error_bdev_put; > + } > + disk_super = (struct btrfs_super_block *) sb_bh->b_data; > > devid = btrfs_stack_device_id(&disk_super->dev_item); > transid = btrfs_super_generation(disk_super); > @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t > flags, void *holder, > if (!ret && fs_devices_ret) > (*fs_devices_ret)->total_devices = total_devices; > > - btrfs_release_disk_super(page); > + brelse(sb_bh); > > error_bdev_put: > blkdev_put(bdev, flags); > signature.asc Description: OpenPGP digital signature
Re: [PATCH 5/5] btrfs: Greatly simplify btrfs_read_dev_super
On 12/07/2017 11:07 PM, Anand Jain wrote: On 12/07/2017 12:24 AM, David Sterba wrote: On Mon, Dec 04, 2017 at 06:20:09PM +0200, Nikolay Borisov wrote: I don't understand what problem *should* be solved here... Without any further checks and validation, we cannot simply iterate over all superblocks and try to mount anything that looks ok. Even if the offsets exist on the block device. The fsid should be at least checked, but that's still not enough to ensure the 1st copy is what we want to mount. I'm more inclined to agree with Anand here, that if the users wants to mount with -t btrfs then the filesystem should assume it's correct to use any of the additional superblocks. If and only if the additional superblocks are valid. And if we can't read the primary superblock, we don't have enough information to establish the validation. EIO? We don't have anything. CRC mismatch? Can't trust the whole data. We need FSID and total_size at least. Other actions are limited from inside kernel. ext4 has mount option -o sb= to specify the copy num to use. Not sure if I have dug enough but as of now I don't find any pitfall. Only funny thing is you can mount a device even after wipefs -a and recover its primary SB, to which my view is that the problem is at the wipefs -a end, not able to wipe all SB copies. I sent out an experimental RFC patch here: [PATCH RFC] btrfs: self heal from SB fail Pls try out, any problem that you could think off by this approach. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html