Re: [PATCH 1/3] btrfs: simplify counting number of eb pages
On 24.04.2018 02:03, David Sterba wrote: > The eb length is nodesize, as initialized in __alloc_extent_buffer. > Regardless of start, we should always get the same number of pages, so > use that fact. > > Signed-off-by: David Sterba> --- > fs/btrfs/extent_io.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h > index a53009694b16..ee92c1289edd 100644 > --- a/fs/btrfs/extent_io.h > +++ b/fs/btrfs/extent_io.h > @@ -454,8 +454,7 @@ void wait_on_extent_buffer_writeback(struct extent_buffer > *eb); > > static inline unsigned long num_extent_pages(u64 start, u64 len) > { > - return ((start + len + PAGE_SIZE - 1) >> PAGE_SHIFT) - > - (start >> PAGE_SHIFT); > + return len >> PAGE_SHIFT; Shouldn't this really be len + PAGE_SIZE -1 or in fact DIV_ROUND_DOWN (len, PAGE_SIZE). Because with a nodesize of 4k (and basically less than a page size) we can get into a situation where we do: 4096 >> 13 = 0 On powerpc for example we have: arch/powerpc/include/asm/page.h:#define PAGE_SHIFT 18 arch/powerpc/include/asm/page.h:#define PAGE_SHIFT 16 arch/powerpc/include/asm/page.h:#define PAGE_SHIFT 14 arch/powerpc/include/asm/page.h:#define PAGE_SHIFT 12 > } > > static inline void extent_buffer_get(struct extent_buffer *eb) > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id
For an extent item which contains many tree block backrefs, like = In 020-extent-ref-cases/keyed_block_ref.img item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222 refs 23 gen 10 flags TREE_BLOCK tree block skinny level 0 tree block backref root 278 tree block backref root 277 tree block backref root 276 tree block backref root 275 tree block backref root 274 tree block backref root 273 tree block backref root 272 tree block backref root 271 tree block backref root 270 tree block backref root 269 tree block backref root 268 tree block backref root 267 tree block backref root 266 tree block backref root 265 tree block backref root 264 tree block backref root 263 tree block backref root 262 tree block backref root 261 tree block backref root 260 tree block backref root 259 tree block backref root 258 tree block backref root 257 = In find_parent_nodes(), these refs's parents are 0, then __merge_refs will merge refs to one ref. It causes only one root to be returned. So, if both parents are 0, do not merge refs. Lowmem check calls find_parent_nodes frequently to decide whether. check an extent buffer or not. These bug influences bytes accounting. Signed-off-by: Su Yue--- backref.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/backref.c b/backref.c index 51553c702187..daadb6299c79 100644 --- a/backref.c +++ b/backref.c @@ -507,6 +507,8 @@ static void __merge_refs(struct pref_state *prefstate, int mode) } else { if (ref1->parent != ref2->parent) continue; + if (ref1->parent == 0) + continue; } eie = ref1->inode_list; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c
There is no delayed ref in btrfs-progs, so remove related comments. Signed-off-by: Su Yue--- backref.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/backref.c b/backref.c index 27309e07a1e9..c144dbf060f2 100644 --- a/backref.c +++ b/backref.c @@ -155,19 +155,6 @@ static void init_pref_state(struct pref_state *prefstate) * - if you cannot add the parent or a correct key, then we will look into the * block later to set a correct key * - * delayed refs - * - *backref type | shared | indirect | shared | indirect - * information | tree | tree | data | data - * ++--++-- - * parent logical |y | -|- | - - * key to resolve |- | y|y | y - * tree block logical |- | -|- | - - * root for resolving |y | y|y | y - * - * - column 1: we've the parent -> done - * - column 2, 3, 4: we use the key to find the parent - * * on disk refs (inline or keyed) * == *backref type | shared | indirect | shared | indirect @@ -735,9 +722,9 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info, } /* - * this adds all existing backrefs (inline backrefs, backrefs and delayed - * refs) for the given bytenr to the refs list, merges duplicates and resolves - * indirect refs to their parent bytenr. + * this adds all existing backrefs (inline backrefs, backrefs for the given + * bytenr to the refs list, merges duplicates and resolves indirect refs to + * their parent bytenr. * When roots are found, they're added to the roots list * * FIXME some caching might speed things up -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: remove useless branch in __merge_refs
After call of ref_for_same_block, ref1->parent must equals to ref2->parent, the block of exchange is never reached. So remove the block of exchange. Signed-off-by: Su Yue--- backref.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/backref.c b/backref.c index c144dbf060f2..51553c702187 100644 --- a/backref.c +++ b/backref.c @@ -497,7 +497,6 @@ static void __merge_refs(struct pref_state *prefstate, int mode) for (pos2 = pos1->next, n2 = pos2->next; pos2 != head; pos2 = n2, n2 = pos2->next) { struct __prelim_ref *ref2; - struct __prelim_ref *xchg; struct extent_inode_elem *eie; ref2 = list_entry(pos2, struct __prelim_ref, list); @@ -505,11 +504,6 @@ static void __merge_refs(struct pref_state *prefstate, int mode) if (mode == 1) { if (!ref_for_same_block(ref1, ref2)) continue; - if (!ref1->parent && ref2->parent) { - xchg = ref1; - ref1 = ref2; - ref2 = xchg; - } } else { if (ref1->parent != ref2->parent) continue; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs: print-tree: Add locking status output for debug build
It's pretty handy if we can get debug output for locking status of an extent buffer, specially for race related debugging. So add the following output for btrfs_print_tree() and btrfs_print_leaf(): - refs - write_locks (as w:%d) - read_locks (as r:%d) - blocking_writers (as bw:%d) - blocking_readers (as br:%d) - spinning_writers (as sw:%d) - spinning_readers (as sr:%d) - lock_owner - current->pid Signed-off-by: Qu Wenruo--- changelog: v2: Use correct specifier for int (both atomic_t and pid_t, no special specifier found in Documentation/core-api/printk-formats.rst) --- fs/btrfs/print-tree.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index 9904cf741e1c..7f0ada51e5fe 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -179,6 +179,26 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset, } } +/* + * Helper to output refs and locking status. + * + * Useful to debug race related problem + */ +static void print_eb_refs_lock(struct extent_buffer *eb) +{ +#ifdef CONFIG_BTRFS_DEBUG + btrfs_info(eb->fs_info, +"refs %u lock(w:%u r:%u bw:%u br:%u sw:%u sr:%u) lock_owner %u current %u", + atomic_read(>refs), atomic_read(>write_locks), + atomic_read(>read_locks), + atomic_read(>blocking_writers), + atomic_read(>blocking_readers), + atomic_read(>spinning_writers), + atomic_read(>spinning_readers), + eb->lock_owner, current->pid); +#endif +} + void btrfs_print_leaf(struct extent_buffer *l) { struct btrfs_fs_info *fs_info; @@ -206,6 +226,7 @@ void btrfs_print_leaf(struct extent_buffer *l) "leaf %llu gen %llu total ptrs %d free space %d owner %llu", btrfs_header_bytenr(l), btrfs_header_generation(l), nr, btrfs_leaf_free_space(fs_info, l), btrfs_header_owner(l)); + print_eb_refs_lock(l); for (i = 0 ; i < nr ; i++) { item = btrfs_item_nr(i); btrfs_item_key_to_cpu(l, , i); @@ -360,6 +381,7 @@ void btrfs_print_tree(struct extent_buffer *c, bool follow) btrfs_header_bytenr(c), level, btrfs_header_generation(c), nr, (u32)BTRFS_NODEPTRS_PER_BLOCK(fs_info) - nr, btrfs_header_owner(c)); + print_eb_refs_lock(c); for (i = 0; i < nr; i++) { btrfs_node_key_to_cpu(c, , i); pr_info("\tkey %d (%llu %u %llu) block %llu gen %llu\n", -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/4] btrfs: Add csum type check for btrfs_check_super_valid()
Just like incompat flags check, although we have already done super csum type check before calling btrfs_check_super_valid(), we can still add such check for later write time check. Signed-off-by: Qu Wenruo--- fs/btrfs/disk-io.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3968f7ff8de2..9282a6ac91db 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3985,6 +3985,15 @@ static int btrfs_validate_super(struct btrfs_fs_info *fs_info) btrfs_err(fs_info, "no valid FS found"); ret = -EINVAL; } + /* +* For write time check, as for mount time we have checked csum before +* calling btrfs_check_super_valid(), so it must be a corruption +*/ + if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) { + btrfs_err(fs_info, "corrupted csum type %u", + btrfs_super_csum_type(sb)); + ret = -EINVAL; + } if (btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP) { btrfs_err(fs_info, "unrecognized or unsupported super flag: %llu", btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP); -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()
Although we have already checked incompat flags manually before really mounting it, we could still enhance btrfs_check_super_valid() to check incompat flags for later write time super block validation check. This patch adds such incompat flags check for btrfs_check_super_valid(), currently it won't be triggered, but provides the basis for later write time check. Signed-off-by: Qu Wenruo--- fs/btrfs/disk-io.c | 13 + 1 file changed, 13 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 23b5c90cdbb2..3968f7ff8de2 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4104,6 +4104,19 @@ static int btrfs_validate_super(struct btrfs_fs_info *fs_info) ret = -EINVAL; } + /* +* Before calling btrfs_check_super_valid() we have already checked +* incompat flags. So if we developr new incompat flags, it's must be +* some corruption. +*/ + if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) { + btrfs_err(fs_info, + "corrupted incompat flags detected 0x%llx, supported 0x%llx", + btrfs_super_incompat_flags(sb), + BTRFS_FEATURE_INCOMPAT_SUPP); + ret = -EINVAL; + } + /* * The generation is a global counter, we'll trust it more than the others * but it's still possible that it's the one that's wrong. -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/4] btrfs: Add write time super block validation
This patchset can be fetched from github: https://github.com/adam900710/linux/tree/write_time_sb_check We have 2 reports about corrupted btrfs super block, which has some garbage in its super block, but otherwise it's completely fine and its csum even matches. This means we develop memory corruption during btrfs mount time. It's not clear whether it's caused by btrfs or some other kernel module, but at least let's do write time verification to catch such corruption early. Changelog: v2: Rename btrfs_check_super_valid() to btrfs_validate_super() suggested by Nikolay and David. Qu Wenruo (4): btrfs: Rename btrfs_check_super_valid() to btrfs_validate_super() btrfs: Add incompat flags check for btrfs_check_super_valid() btrfs: Add csum type check for btrfs_check_super_valid() btrfs: Do super block verification before writing it to disk fs/btrfs/disk-io.c | 58 -- 1 file changed, 51 insertions(+), 7 deletions(-) -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/4] btrfs: Rename btrfs_check_super_valid() to btrfs_validate_super()
Makes the function name a little shorter and easier to read. Signed-off-by: Qu Wenruo--- fs/btrfs/disk-io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 60caa68c3618..23b5c90cdbb2 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -55,7 +55,7 @@ static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); static void free_fs_root(struct btrfs_root *root); -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info); +static int btrfs_validate_super(struct btrfs_fs_info *fs_info); static void btrfs_destroy_ordered_extents(struct btrfs_root *root); static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info); @@ -2668,7 +2668,7 @@ int open_ctree(struct super_block *sb, memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); - ret = btrfs_check_super_valid(fs_info); + ret = btrfs_validate_super(fs_info); if (ret) { btrfs_err(fs_info, "superblock contains fatal errors"); err = -EINVAL; @@ -3974,7 +3974,7 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level, level, first_key); } -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) +static int btrfs_validate_super(struct btrfs_fs_info *fs_info) { struct btrfs_super_block *sb = fs_info->super_copy; u64 nodesize = btrfs_super_nodesize(sb); -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/4] btrfs: Do super block verification before writing it to disk
There are already 2 reports about strangely corrupted super blocks, where csum still matches but extra garbage gets slipped into super block. The corruption would looks like: -- superblock: bytenr=65536, device=/dev/sdc1 - csum_type 41700 (INVALID) csum0x3b252d3a [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] ... incompat_flags 0x5b224169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA | unknown flag: 0x5b224000 ) ... -- Or -- superblock: bytenr=65536, device=/dev/mapper/x - csum_type 35355 (INVALID) csum_size 32 csum 0xf0dbeddd [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] ... incompat_flags 0x176d2169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA | unknown flag: 0x176d2000 ) -- Obviously, csum_type and incompat_flags get some garbage, but its csum still matches, which means kernel calculates the csum based on corrupted super block memory. And after manually fixing these values, the filesystem is completely healthy without any problem exposed by btrfs check. Although the cause is still unknown, at least detect it and prevent further corruption. Reported-by: Ken SwensonReported-by: Ben Parsons <9parso...@gmail.com> Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 36 +--- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9282a6ac91db..0f5771244641 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -55,7 +55,9 @@ static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); static void free_fs_root(struct btrfs_root *root); -static int btrfs_validate_super(struct btrfs_fs_info *fs_info); +static int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, + int super_mirror); static void btrfs_destroy_ordered_extents(struct btrfs_root *root); static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info); @@ -2668,7 +2670,7 @@ int open_ctree(struct super_block *sb, memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); - ret = btrfs_validate_super(fs_info); + ret = btrfs_validate_super(fs_info, fs_info->super_copy, 0); if (ret) { btrfs_err(fs_info, "superblock contains fatal errors"); err = -EINVAL; @@ -3563,6 +3565,16 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) sb = fs_info->super_for_commit; dev_item = >dev_item; + /* +* super_bytenr will be updated in write_dev_supers(), even if it is +* corrupted in current copy, it won't reach disk. So skip bytenr check. +*/ + if (btrfs_validate_super(fs_info, sb, -1)) { + btrfs_err(fs_info, + "superblock corruption detected before transaction commit"); + return -EUCLEAN; + } + mutex_lock(_info->fs_devices->device_list_mutex); head = _info->fs_devices->devices; max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1; @@ -3974,9 +3986,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level, level, first_key); } -static int btrfs_validate_super(struct btrfs_fs_info *fs_info) +/* + * Check the validation of btrfs super block. + * + * @sb:super block to check + * @super_mirror: the super block number to check its bytenr. + * 0 means the primary (1st) sb, 1 and 2 means 2nd and + * 3rd backup sb, while -1 means to skip bytenr check. + */ +static int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, + int super_mirror) { - struct btrfs_super_block *sb = fs_info->super_copy; u64 nodesize = btrfs_super_nodesize(sb); u64 sectorsize = btrfs_super_sectorsize(sb); int ret = 0; @@ -4088,9 +4109,10 @@ static int
Re: [PATCH] btrfs-progs: treewide: Replace strerror(errno) with %m.
On 01/24/2018 03:42 AM, David Sterba wrote: On Sun, Jan 07, 2018 at 01:54:21PM -0800, Rosen Penev wrote: As btrfs is specific to Linux, %m can be used instead of strerror(errno) in format strings. This has some size reduction benefits for embedded systems. Makes sense. glibc, musl, and uclibc-ng all support %m as a modifier to printf. A quick glance at the BIONIC libc source indicates that it has support for %m as well. BSDs and Windows do not but I do believe them to be beyond the scope of btrfs-progs. Thanks for checking the compatibility. The %m can be substituted by a wrapper if this becomes a problem in the future. It seems a little problem in output now since it's not caused by this patch. $ touch /tmp/tmp $ btrfs-image -r /tmp/tmp /tmp/test.img ERROR: unable to read cluster: Success ERROR: restore failed: Success $ echo $? 1 Though btrfs-image failed, %m(strerror(errno) argument makes output looks strange. IMO, here we should judge errno before output. But it means size reduction is inavaiable for embedded systems. Thanks, Su Compiled sizes on Ubuntu 16.04: Before: 3916512 btrfs After: 3908744 btrfs the delta is about 7KiB, that's not much but still counts. I would not object further optimizations towards size reduction as long as the code remains maintainable. 233256 libbtrfs.so.0.1 4899bcp 2366560 btrfs-convert 2207432 btrfs-corrupt-block 13302 btrfs-debugfs 2151104 btrfs-debug-tree 2134968 btrfs-find-root 2281864 btrfs-image 2143536 btrfs-map-logical 2129704 btrfs-select-super 2151552 btrfstune 2130696 btrfs-zero-log 2276272 mkfs.btrfs Some of the utilities are typically installed by default, the binaries share a lot of code as they get built from the same object files. I had once an idea of a compound binary that would switch the function by the name of the executable. Similar to what busybox does. https://github.com/kdave/btrfs-progs/commit/8fc697a7f763f39f3afe0abaa68ac13a49ac8a86 --- * btrfs * mkfs.btrfs * btrfstun * btrfs-image * btrfs-convert * btrfs-debug-tree * btrfs-show-super * btrfs-find-root The static target is also supported. The name of resulting boxed binaries is btrfs.box and btrfs.box.static . textdata bss dec hex filename 550988 19120 15444 585552 8ef50 btrfs 1562099 25316 42256 1629671 18dde7 btrfs.static 659504 21104 16492 697100 aa30c btrfs.box 1817274 27988 44088 1889350 1cd446 btrfs.box.static --- I was not sure if this is was just another good idea waiting for a usecase (or not), so I haven't continued past the prototype. Please let me know if you'd be interested in this functionality, the patch is fairly trivial to update. @@ -815,7 +815,7 @@ static const char * const cmd_filesystem_sync_usage[] = { static int cmd_filesystem_sync(int argc, char **argv) { - int fd, res, e; + int fd, res; char*path; DIR *dirstream = NULL; @@ -831,10 +831,9 @@ static int cmd_filesystem_sync(int argc, char **argv) return 1; res = ioctl(fd, BTRFS_IOC_SYNC); - e = errno; close_file_or_dir(fd, dirstream); if( res < 0 ){ - error("sync ioctl failed on '%s': %s", path, strerror(e)); + error("sync ioctl failed on '%s': %m", path); Let me use that one as example, there are a few more similar updates. There's potentially lost errno from the ioctl if something inside close_file_or_dir() overwrites it, as there are closedir and close. This is highly unlikely and I'll deal with that separately, so I'm going to apply the patch without further changes. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
4.14.35: possible circular locking dependency detected
Hi, after scrub start, scrub cancel, umount, mount of a two disk raid1 (data + metadata): [12999.229791] == [12999.236029] WARNING: possible circular locking dependency detected [12999.242261] 4.14.35 #36 Not tainted [12999.245806] -- [12999.252037] btrfs/4682 is trying to acquire lock: [12999.256794] ("%s-%s""btrfs", name){+.+.}, at: [] flush_workqueue+0x70/0x480 [12999.265486] but task is already holding lock: [12999.271390] (_info->scrub_lock){+.+.}, at: [] btrfs_scrub_dev+0x311/0x650 [btrfs] [12999.280887] which lock already depends on the new lock. [12999.289147] the existing dependency chain (in reverse order) is: [12999.296721] -> #3 (_info->scrub_lock){+.+.}: [12999.302949]__mutex_lock+0x66/0x9a0 [12999.307287]btrfs_scrub_dev+0x105/0x650 [btrfs] [12999.312615]btrfs_ioctl+0x19a0/0x2030 [btrfs] [12999.317706]do_vfs_ioctl+0x8c/0x6a0 [12999.321951]SyS_ioctl+0x6f/0x80 [12999.325860]do_syscall_64+0x64/0x170 [12999.330202]entry_SYSCALL_64_after_hwframe+0x42/0xb7 [12999.335930] -> #2 (_devs->device_list_mutex){+.+.}: [12999.342798]__mutex_lock+0x66/0x9a0 [12999.347049]reada_start_machine_worker+0xb0/0x3c0 [btrfs] [12999.353324]btrfs_worker_helper+0x8b/0x630 [btrfs] [12999.358926]process_one_work+0x242/0x6a0 [12999.363613]worker_thread+0x32/0x3f0 [12999.367894]kthread+0x11f/0x140 [12999.371792]ret_from_fork+0x3a/0x50 [12999.376038] -> #1 ((>normal_work)){+.+.}: [12999.382254]process_one_work+0x20c/0x6a0 [12999.386881]worker_thread+0x32/0x3f0 [12999.391216]kthread+0x11f/0x140 [12999.395107]ret_from_fork+0x3a/0x50 [12999.399290] -> #0 ("%s-%s""btrfs", name){+.+.}: [12999.405455]lock_acquire+0x93/0x220 [12999.409683]flush_workqueue+0x97/0x480 [12999.414129]drain_workqueue+0xa4/0x180 [12999.418620]destroy_workqueue+0xe/0x1e0 [12999.423202]btrfs_destroy_workqueue+0x57/0x280 [btrfs] [12999.429137]scrub_workers_put+0x29/0x60 [btrfs] [12999.434426]btrfs_scrub_dev+0x324/0x650 [btrfs] [12999.439764]btrfs_ioctl+0x19a0/0x2030 [btrfs] [12999.444870]do_vfs_ioctl+0x8c/0x6a0 [12999.449115]SyS_ioctl+0x6f/0x80 [12999.453017]do_syscall_64+0x64/0x170 [12999.457367]entry_SYSCALL_64_after_hwframe+0x42/0xb7 [12999.463103] other info that might help us debug this: [12999.471370] Chain exists of: "%s-%s""btrfs", name --> _devs->device_list_mutex --> _info->scrub_lock [12999.484396] Possible unsafe locking scenario: [12999.490524]CPU0CPU1 [12999.495161] [12999.499840] lock(_info->scrub_lock); [12999.504001]lock(_devs->device_list_mutex); [12999.511341]lock(_info->scrub_lock); [12999.518074] lock("%s-%s""btrfs", name); [12999.522243] *** DEADLOCK *** [12999.528347] 2 locks held by btrfs/4682: [12999.532330] #0: (sb_writers#15){.+.+}, at: [] mnt_want_write_file+0x33/0xb0 [12999.541310] #1: (_info->scrub_lock){+.+.}, at: [] btrfs_scrub_dev+0x311/0x650 [btrfs] [12999.551389] stack backtrace: [12999.555917] CPU: 3 PID: 4682 Comm: btrfs Not tainted 4.14.35 #36 [12999.562139] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.2a 06/27/2012 [12999.569487] Call Trace: [12999.572049] dump_stack+0x67/0x95 [12999.575530] print_circular_bug.isra.42+0x1ce/0x1db [12999.580580] __lock_acquire+0x121c/0x1300 [12999.584758] ? lock_acquire+0x93/0x220 [12999.588668] lock_acquire+0x93/0x220 [12999.592316] ? flush_workqueue+0x70/0x480 [12999.596485] flush_workqueue+0x97/0x480 [12999.600487] ? flush_workqueue+0x70/0x480 [12999.604596] ? find_held_lock+0x2d/0x90 [12999.608541] ? drain_workqueue+0xa4/0x180 [12999.612724] drain_workqueue+0xa4/0x180 [12999.616730] destroy_workqueue+0xe/0x1e0 [12999.620784] btrfs_destroy_workqueue+0x57/0x280 [btrfs] [12999.626252] scrub_workers_put+0x29/0x60 [btrfs] [12999.631087] btrfs_scrub_dev+0x324/0x650 [btrfs] [12999.635814] ? __sb_start_write+0x137/0x1a0 [12999.640156] ? mnt_want_write_file+0x33/0xb0 [12999.644624] btrfs_ioctl+0x19a0/0x2030 [btrfs] [12999.649194] ? find_held_lock+0x2d/0x90 [12999.653146] ? do_vfs_ioctl+0x8c/0x6a0 [12999.657095] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs] [12999.663572] do_vfs_ioctl+0x8c/0x6a0 [12999.667203] ? __fget+0x100/0x1f0 [12999.670627] SyS_ioctl+0x6f/0x80 [12999.673954] do_syscall_64+0x64/0x170 [12999.67] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [12999.682993] RIP: 0033:0x7f409f638f07 [12999.686667] RSP: 002b:7f409f549d38 EFLAGS: 0246 ORIG_RAX: 0010
[PATCH 3/3] btrfs: switch types to int when counting eb pages
The loops iterating eb pages use unsigned long, that's an overkill as we know that there are at most 16 pages (64k / 4k), and 4 by default (with nodesize 16k). Signed-off-by: David Sterba--- fs/btrfs/extent_io.c | 44 ++-- fs/btrfs/extent_io.h | 2 +- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 0cc5d6ae1876..5bdfdb9c8777 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2062,7 +2062,7 @@ int repair_eb_io_failure(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, int mirror_num) { u64 start = eb->start; - unsigned long i, num_pages = num_extent_pages(eb); + int i, num_pages = num_extent_pages(eb); int ret = 0; if (sb_rdonly(fs_info->sb)) @@ -3541,7 +3541,7 @@ lock_extent_buffer_for_io(struct extent_buffer *eb, struct btrfs_fs_info *fs_info, struct extent_page_data *epd) { - unsigned long i, num_pages; + int i, num_pages; int flush = 0; int ret = 0; @@ -3715,7 +3715,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, struct extent_io_tree *tree = _I(fs_info->btree_inode)->io_tree; u64 offset = eb->start; u32 nritems; - unsigned long i, num_pages; + int i, num_pages; unsigned long start, end; unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META; int ret = 0; @@ -4648,7 +4648,7 @@ int extent_buffer_under_io(struct extent_buffer *eb) */ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb) { - unsigned long index; + int index; struct page *page; int mapped = !test_bit(EXTENT_BUFFER_DUMMY, >bflags); @@ -4744,10 +4744,10 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) { - unsigned long i; + int i; struct page *p; struct extent_buffer *new; - unsigned long num_pages = num_extent_pages(src); + int num_pages = num_extent_pages(src); new = __alloc_extent_buffer(src->fs_info, src->start, src->len); if (new == NULL) @@ -4776,8 +4776,8 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, unsigned long len) { struct extent_buffer *eb; - unsigned long num_pages; - unsigned long i; + int num_pages; + int i; eb = __alloc_extent_buffer(fs_info, start, len); if (!eb) @@ -4843,7 +4843,7 @@ static void check_buffer_tree_ref(struct extent_buffer *eb) static void mark_extent_buffer_accessed(struct extent_buffer *eb, struct page *accessed) { - unsigned long num_pages, i; + int num_pages, i; check_buffer_tree_ref(eb); @@ -4944,8 +4944,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start) { unsigned long len = fs_info->nodesize; - unsigned long num_pages; - unsigned long i; + int num_pages; + int i; unsigned long index = start >> PAGE_SHIFT; struct extent_buffer *eb; struct extent_buffer *exists = NULL; @@ -5160,8 +5160,8 @@ void free_extent_buffer_stale(struct extent_buffer *eb) void clear_extent_buffer_dirty(struct extent_buffer *eb) { - unsigned long i; - unsigned long num_pages; + int i; + int num_pages; struct page *page; num_pages = num_extent_pages(eb); @@ -5190,8 +5190,8 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb) int set_extent_buffer_dirty(struct extent_buffer *eb) { - unsigned long i; - unsigned long num_pages; + int i; + int num_pages; int was_dirty = 0; check_buffer_tree_ref(eb); @@ -5209,9 +5209,9 @@ int set_extent_buffer_dirty(struct extent_buffer *eb) void clear_extent_buffer_uptodate(struct extent_buffer *eb) { - unsigned long i; + int i; struct page *page; - unsigned long num_pages; + int num_pages; clear_bit(EXTENT_BUFFER_UPTODATE, >bflags); num_pages = num_extent_pages(eb); @@ -5224,9 +5224,9 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb) void set_extent_buffer_uptodate(struct extent_buffer *eb) { - unsigned long i; + int i; struct page *page; - unsigned long num_pages; + int num_pages; set_bit(EXTENT_BUFFER_UPTODATE, >bflags); num_pages = num_extent_pages(eb); @@ -5239,13 +5239,13 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb) int read_extent_buffer_pages(struct extent_io_tree *tree, struct extent_buffer *eb, int wait, int
[PATCH 0/3] Simplify counting of extent buffer pages
Some low-hanging cleanup fruit. The argument bloat-o-meter shows some improvements: extent_io.c:cache_state_if_flags.part.27 -8 (8 -> 0) extent_io.c:cache_state.part.28-8 (8 -> 0) extent_io.c:check_buffer_tree_ref.part.31 -24 (24 -> 0) extent_io.c:mark_extent_buffer_accessed-8 (40 -> 32) extent_io.c:alloc_extent_state_atomic.part.35 -8 (8 -> 0) extent_io.c:flush_write_bio.isra.40 -16 (16 -> 0) extent_io.c:set_page_extent_mapped.part.49 -8 (8 -> 0) extent_io.c:extent_buffer_under_io.part.50 -8 (8 -> 0) extent_io.c:__unlock_for_delalloc.isra.38 -8 (8 -> 0) extent_io.c:merge_state.part.45 -48 (48 -> 0) extent_io.c:repair_eb_io_failure -8 (72 -> 64) extent_io.c:set_extent_buffer_dirty-8 (40 -> 32) extent_io.c:__alloc_dummy_extent_buffer+8 (32 -> 40) extent_io.c:write_one_eb +16 (152 -> 168) David Sterba (3): btrfs: simplify counting number of eb pages btrfs: pass only eb to num_extent_pages btrfs: switch types to int when counting eb pages fs/btrfs/extent_io.c | 68 ++-- fs/btrfs/extent_io.h | 5 ++-- 2 files changed, 36 insertions(+), 37 deletions(-) -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs: pass only eb to num_extent_pages
Almost all callers pass the start and len as 2 arguments but this is not necessary, all the information is provided by the eb. By reordering the calls to num_extent_pages, we don't need the local variables with start/len. Signed-off-by: David Sterba--- fs/btrfs/extent_io.c | 30 +++--- fs/btrfs/extent_io.h | 4 ++-- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index fb32394fd830..0cc5d6ae1876 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2062,7 +2062,7 @@ int repair_eb_io_failure(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, int mirror_num) { u64 start = eb->start; - unsigned long i, num_pages = num_extent_pages(eb->start, eb->len); + unsigned long i, num_pages = num_extent_pages(eb); int ret = 0; if (sb_rdonly(fs_info->sb)) @@ -3591,7 +3591,7 @@ lock_extent_buffer_for_io(struct extent_buffer *eb, if (!ret) return ret; - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++) { struct page *p = eb->pages[i]; @@ -3721,7 +3721,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, int ret = 0; clear_bit(EXTENT_BUFFER_WRITE_ERR, >bflags); - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); atomic_set(>io_pages, num_pages); /* set btree blocks beyond nritems with 0 to avoid stale content. */ @@ -4654,7 +4654,7 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb) BUG_ON(extent_buffer_under_io(eb)); - index = num_extent_pages(eb->start, eb->len); + index = num_extent_pages(eb); if (index == 0) return; @@ -4747,7 +4747,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) unsigned long i; struct page *p; struct extent_buffer *new; - unsigned long num_pages = num_extent_pages(src->start, src->len); + unsigned long num_pages = num_extent_pages(src); new = __alloc_extent_buffer(src->fs_info, src->start, src->len); if (new == NULL) @@ -4779,12 +4779,11 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info, unsigned long num_pages; unsigned long i; - num_pages = num_extent_pages(start, len); - eb = __alloc_extent_buffer(fs_info, start, len); if (!eb) return NULL; + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++) { eb->pages[i] = alloc_page(GFP_NOFS); if (!eb->pages[i]) @@ -4848,7 +4847,7 @@ static void mark_extent_buffer_accessed(struct extent_buffer *eb, check_buffer_tree_ref(eb); - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++) { struct page *p = eb->pages[i]; @@ -4945,7 +4944,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start) { unsigned long len = fs_info->nodesize; - unsigned long num_pages = num_extent_pages(start, len); + unsigned long num_pages; unsigned long i; unsigned long index = start >> PAGE_SHIFT; struct extent_buffer *eb; @@ -4968,6 +4967,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, if (!eb) return ERR_PTR(-ENOMEM); + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++, index++) { p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL); if (!p) { @@ -5164,7 +5164,7 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb) unsigned long num_pages; struct page *page; - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++) { page = eb->pages[i]; @@ -5198,7 +5198,7 @@ int set_extent_buffer_dirty(struct extent_buffer *eb) was_dirty = test_and_set_bit(EXTENT_BUFFER_DIRTY, >bflags); - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); WARN_ON(atomic_read(>refs) == 0); WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, >bflags)); @@ -5214,7 +5214,7 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb) unsigned long num_pages; clear_bit(EXTENT_BUFFER_UPTODATE, >bflags); - num_pages = num_extent_pages(eb->start, eb->len); + num_pages = num_extent_pages(eb); for (i = 0; i < num_pages; i++) { page = eb->pages[i]; if (page) @@ -5229,7 +5229,7 @@ void
[PATCH 1/3] btrfs: simplify counting number of eb pages
The eb length is nodesize, as initialized in __alloc_extent_buffer. Regardless of start, we should always get the same number of pages, so use that fact. Signed-off-by: David Sterba--- fs/btrfs/extent_io.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index a53009694b16..ee92c1289edd 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -454,8 +454,7 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb); static inline unsigned long num_extent_pages(u64 start, u64 len) { - return ((start + len + PAGE_SIZE - 1) >> PAGE_SHIFT) - - (start >> PAGE_SHIFT); + return len >> PAGE_SHIFT; } static inline void extent_buffer_get(struct extent_buffer *eb) -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: push relocation recovery into a helper thread
On Tue, Apr 17, 2018 at 02:45:33PM -0400, Jeff Mahoney wrote: > On a file system with many snapshots and qgroups enabled, an interrupted > balance can end up taking a long time to mount due to recovering the > relocations during mount. It does this in the task performing the mount, > which can't be interrupted and may prevent mounting (and systems booting) > for a long time as well. The thing is that as part of balance, this > runs in the background all the time. This patch pushes the recovery > into a helper thread and allows the mount to continue normally. We hold > off on resuming any paused balance operation until after the relocation > has completed, disallow any new balance operations if it's running, and > wait for it on umount and remounting read-only. The overall logic sounds ok. So, this can also stall the umount, right? Eg. if I start mount, relocation goes to background, then unmount will have to wait for completion. Balance pause is requested at umount time, something similar should be possible for relocation recovery. The fs_state bit CLOSING could be checked somewhere in the loop. > This doesn't do anything to address the relocation recovery operation > taking a long time but does allow the file system to mount. > > Signed-off-by: Jeff Mahoney> --- > fs/btrfs/ctree.h |7 +++ > fs/btrfs/disk-io.c|7 ++- > fs/btrfs/relocation.c | 92 > +- > fs/btrfs/super.c |5 +- > fs/btrfs/volumes.c|6 +++ > 5 files changed, 97 insertions(+), 20 deletions(-) > > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1052,6 +1052,10 @@ struct btrfs_fs_info { > struct btrfs_work qgroup_rescan_work; > bool qgroup_rescan_running; /* protected by qgroup_rescan_lock */ > > + /* relocation recovery items */ > + bool relocation_recovery_started; > + struct completion relocation_recovery_completion; This seems to copy the pattern of qgroup rescan, the completion + workqueue. I'm planning to move this scheme to the fs_state bit instead of bool and the wait_for_war with global workqueue, but for now we can leave as it is here. > + > /* filesystem state */ > unsigned long fs_state; > > @@ -3671,7 +3675,8 @@ int btrfs_init_reloc_root(struct btrfs_t > struct btrfs_root *root); > int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, > struct btrfs_root *root); > -int btrfs_recover_relocation(struct btrfs_root *root); > +int btrfs_recover_relocation(struct btrfs_fs_info *fs_info); > +void btrfs_wait_for_relocation_completion(struct btrfs_fs_info *fs_info); > int btrfs_reloc_clone_csums(struct inode *inode, u64 file_pos, u64 len); > int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans, > struct btrfs_root *root, struct extent_buffer *buf, > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2999,7 +2999,7 @@ retry_root_backup: > goto fail_qgroup; > > mutex_lock(_info->cleaner_mutex); > - ret = btrfs_recover_relocation(tree_root); > + ret = btrfs_recover_relocation(fs_info); > mutex_unlock(_info->cleaner_mutex); > if (ret < 0) { > btrfs_warn(fs_info, "failed to recover relocation: %d", > @@ -3017,7 +3017,8 @@ retry_root_backup: > if (IS_ERR(fs_info->fs_root)) { > err = PTR_ERR(fs_info->fs_root); > btrfs_warn(fs_info, "failed to read fs tree: %d", err); > - goto fail_qgroup; > + close_ctree(fs_info); > + return err; > } > > if (sb_rdonly(sb)) > @@ -3778,6 +3779,8 @@ void close_ctree(struct btrfs_fs_info *f > /* wait for the qgroup rescan worker to stop */ > btrfs_qgroup_wait_for_completion(fs_info, false); > > + btrfs_wait_for_relocation_completion(fs_info); > + > /* wait for the uuid_scan task to finish */ > down(_info->uuid_tree_rescan_sem); > /* avoid complains from lockdep et al., set sem back to initial state */ > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > #include "ctree.h" > #include "disk-io.h" > #include "transaction.h" > @@ -4492,14 +4493,61 @@ static noinline_for_stack int mark_garba > } > > /* > - * recover relocation interrupted by system crash. > - * > * this function resumes merging reloc trees with corresponding fs trees. > * this is important for keeping the sharing of tree blocks > */ > -int btrfs_recover_relocation(struct btrfs_root *root) > +static int > +btrfs_resume_relocation(void *data) Please keep the type and function name on one line. > { > - struct btrfs_fs_info *fs_info = root->fs_info; > + struct btrfs_fs_info *fs_info = data; > + struct btrfs_trans_handle *trans; > + struct reloc_control *rc =
Re: [RFC] Add support for BTRFS raid5/6 to GRUB
On 04/23/2018 01:50 PM, Daniel Kiper wrote: > On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote: >> Hi All, >> >> Below you can find a patch to add support for accessing files from >> grub in a RAID5/6 btrfs filesystem. This is a RFC because it is >> missing the support for recovery (i.e. if some devices are missed). In >> the next days (weeks ?) I will extend this patch to support also this >> case. >> >> Comments are welcome. > > More or less LGTM. Just a nitpick below... I am happy to take full blown > patch into GRUB if it is ready. Thanks for the comments; however now I implemented also the recovery. It is under testing. Let me few days and I will resubmit the patches. > >> BR >> G.Baroncelli >> >> >> --- >> >> commit 8c80a1b7c913faf50f95c5c76b4666ed17685666 >> Author: Goffredo Baroncelli>> Date: Tue Apr 17 21:40:31 2018 +0200 >> >> Add initial support for btrfs raid5/6 chunk >> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c >> index be195448d..4c5632acb 100644 >> --- a/grub-core/fs/btrfs.c >> +++ b/grub-core/fs/btrfs.c >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 >> #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100 >>grub_uint8_t dummy2[0xc]; >>grub_uint16_t nstripes; >>grub_uint16_t nsubstripes; >> @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, >> grub_disk_addr_t addr, >>stripe_offset = low + chunk_stripe_length >> * high; >>csize = chunk_stripe_length - low; >> + break; >> +} >> + case GRUB_BTRFS_CHUNK_TYPE_RAID5: >> + case GRUB_BTRFS_CHUNK_TYPE_RAID6: >> +{ >> + grub_uint64_t nparities; >> + grub_uint64_t parity_pos; >> + grub_uint64_t stripe_nr, high; >> + grub_uint64_t low; >> + >> + redundancy = 1; /* no redundancy for now */ >> + >> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) >> +{ >> + grub_dprintf ("btrfs", "RAID5\n"); >> + nparities = 1; >> +} >> + else >> +{ >> + grub_dprintf ("btrfs", "RAID6\n"); >> + nparities = 2; >> +} >> + >> + stripe_nr = grub_divmod64 (off, chunk_stripe_length, ); >> + >> + high = grub_divmod64 (stripe_nr, nstripes - nparities, ); >> + grub_divmod64 (high+nstripes-nparities, nstripes, _pos); >> + grub_divmod64 (parity_pos+nparities+stripen, nstripes, ); > > Missing spaces around "+" and "-". > > Daniel > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libbrtfsutil questions
On 2018-04-23 14:25, waxhead wrote: Howdy! I am pondering writing a little C program that use libmicrohttpd and libbtrfsutil to display some very basic (overview) details about BTRFS. I was hoping to display the same information that'btrfs fi sh /mnt' and 'btrfs fi us -T /mnt' do, but somewhat combined. Since I recently just figured out how easy it was to do svg graphics I was hoping to try to visualize things a bit. What I was hoping to achieve is: - show all filesystems - ..show all devices in a filesystem (and mark missing devices clearly) - show usage and/or allocation for each device - possibly display chunks as blocks (like old defrag programs) where the brightness indicate how utilied a (meta)data chunk is. - possibly mark devices with errors ( 'btrfs de st /mnt' ). The problem is ... I looked at libbtrfsutil and it appears that there is mostly sync + subvolume/snapshot stuff in there. > So my question is: Is libbtrfsutil the right choice and intended to at some point (in the future?) supply me with the data I need for these things or should I look elsewhere? I believe that the intent is for libbtrfsutil to cover pretty much everything (AIUI, that's supposed to become the public API for BTRFS, with the CLI tools just being wrappers for that). However, as you've found out, it's not really there yet. If you've got some experience with Python and don't mind putting a bit more work in learning how things work at a low level in BTRFS, the Python interface maintained by Hans van Kranenburg and found at [1] may be of some interest. I would imagine it shouldn't be much more difficult to wire the Python http.server module up to that than it would be to wire libmicrohttpd up to a C library, and it would give the advantage that you could use a nice templating language like Jinja2 to format everything. PS! This a completely private project for my own egoistic reasons. However if it turns out to be useful and the code is not too embarrassing I am happy put the code into public domain ... if it ever gets written :S For what it's worth, I think the idea is great. Especially if it could be made to return the data in a JSON format, as that would then make it trivial to use for integrating most existing system monitoring infrastructure with BTRFS. [1] https://github.com/knorrie/python-btrfs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
libbrtfsutil questions
Howdy! I am pondering writing a little C program that use libmicrohttpd and libbtrfsutil to display some very basic (overview) details about BTRFS. I was hoping to display the same information that'btrfs fi sh /mnt' and 'btrfs fi us -T /mnt' do, but somewhat combined. Since I recently just figured out how easy it was to do svg graphics I was hoping to try to visualize things a bit. What I was hoping to achieve is: - show all filesystems - ..show all devices in a filesystem (and mark missing devices clearly) - show usage and/or allocation for each device - possibly display chunks as blocks (like old defrag programs) where the brightness indicate how utilied a (meta)data chunk is. - possibly mark devices with errors ( 'btrfs de st /mnt' ). The problem is ... I looked at libbtrfsutil and it appears that there is mostly sync + subvolume/snapshot stuff in there. So my question is: Is libbtrfsutil the right choice and intended to at some point (in the future?) supply me with the data I need for these things or should I look elsewhere? PS! This a completely private project for my own egoistic reasons. However if it turns out to be useful and the code is not too embarrassing I am happy put the code into public domain ... if it ever gets written :S -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status page
On Thu, Apr 19, 2018 at 06:24:29PM +0200, Gandalf Corvotempesta wrote: > Hi to all, > as kernel 4.16 is out and 4.17 in in RC, would be possible to update > BTRFS status page > https://btrfs.wiki.kernel.org/index.php/Status to reflect 4.16 stability ? > > That page is still based on kernel 4.15 (marked as EOL here: > https://www.kernel.org/) Reviewed and updated for 4.16, there's no change regarding the overall status, though 4.16 has some raid56 fixes. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path
On Mon, Apr 23, 2018 at 05:20:06PM +0300, Nikolay Borisov wrote: > On 23.04.2018 17:16, David Sterba wrote: > > On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote: > >> Commit 581c1760415c ("btrfs: Validate child tree block's level and first > >> key") introduced new @first_key parameter for read_tree_block(), however > >> caller in replace_path() is parasing wrong key to read_tree_block(). > >> > >> It should use parameter @first_key other than @key. > >> > >> Normally it won't expose problem as @key is normally initialzied to the > >> same value of @first_key we expect. > >> However in relocation recovery case, @key can be set to (0, 0, 0), and > >> since no valid key in relocation tree can be (0, 0, 0), it will cause > >> read_tree_block() to return -EUCLEAN and interrupt relocation recovery. > >> > >> Fix it by setting @first_key correctly. > >> > >> Signed-off-by: Qu Wenruo> > > > Added to next, thanks. > > This is actually -rc2 material, right? Yes, adding to next is the 1st stage as we want to start testing. What will end up in the next rc pull is determined later if everything is fine. For some patches it's obvious that they're fixing a regression, the others are clear fixes that still qualify for rc. I can be more specific about the target branch, but this should be transparent to developers anyway once the patch is in the pool. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path
On 23.04.2018 17:16, David Sterba wrote: > On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote: >> Commit 581c1760415c ("btrfs: Validate child tree block's level and first >> key") introduced new @first_key parameter for read_tree_block(), however >> caller in replace_path() is parasing wrong key to read_tree_block(). >> >> It should use parameter @first_key other than @key. >> >> Normally it won't expose problem as @key is normally initialzied to the >> same value of @first_key we expect. >> However in relocation recovery case, @key can be set to (0, 0, 0), and >> since no valid key in relocation tree can be (0, 0, 0), it will cause >> read_tree_block() to return -EUCLEAN and interrupt relocation recovery. >> >> Fix it by setting @first_key correctly. >> >> Signed-off-by: Qu Wenruo> > Added to next, thanks. This is actually -rc2 material, right? > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path
On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote: > Commit 581c1760415c ("btrfs: Validate child tree block's level and first > key") introduced new @first_key parameter for read_tree_block(), however > caller in replace_path() is parasing wrong key to read_tree_block(). > > It should use parameter @first_key other than @key. > > Normally it won't expose problem as @key is normally initialzied to the > same value of @first_key we expect. > However in relocation recovery case, @key can be set to (0, 0, 0), and > since no valid key in relocation tree can be (0, 0, 0), it will cause > read_tree_block() to return -EUCLEAN and interrupt relocation recovery. > > Fix it by setting @first_key correctly. > > Signed-off-by: Qu WenruoAdded to next, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] btrfs: add FS_IOC_FSSETXATTR ioctl
On Mon, Apr 23, 2018 at 11:42:42AM +0900, Misono Tomohiro wrote: > On 2018/04/21 2:02, David Sterba wrote: > > The new ioctl is an extension to the FS_IOC_SETFLAGS and adds new > > flags and is extensible. Don't get fooled by the XATTR in the name, it > > does not have anything in common with the extended attributes, > > incidentally also abbreviated as XATTRs. > > > > This patch allows to set the xflags portion of the fsxattr structure, > > other items have no meaning and non-zero values will result in > > EOPNOTSUPP. > > > > Currently supported xflags: > > > > - APPEND > > - IMMUTABLE > > - NOATIME > > - NODUMP > > - SYNC > > > > The structure of btrfs_ioctl_fssetxattr copies btrfs_ioctl_setflags but > > is simpler on the flag setting side. > > > > The original patch was written by Chandan Jay Sharma but was incomplete > > and no further revision has been sent. > > > > Based-on-patches-by: Chandan Jay Sharma> > Signed-off-by: David Sterba > > --- > > fs/btrfs/ioctl.c | 94 > > > > 1 file changed, 94 insertions(+) > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > > index 52b12ab9b82b..4fd61f191bba 100644 > > --- a/fs/btrfs/ioctl.c > > +++ b/fs/btrfs/ioctl.c > > @@ -388,6 +388,98 @@ static int btrfs_ioctl_fsgetxattr(struct file *file, > > void __user *arg) > > return 0; > > } > > > > +static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg) > > +{ > > + struct inode *inode = file_inode(file); > > + struct btrfs_inode *binode = BTRFS_I(inode); > > + struct btrfs_root *root = binode->root; > > + struct btrfs_trans_handle *trans; > > + struct fsxattr fa; > > + unsigned oldflags; > > + unsigned old_i_flags; > > + int ret = 0; > > + > > + if (!inode_owner_or_capable(inode)) > > + return -EPERM; > > + > > + if (btrfs_root_readonly(root)) > > + return -EROFS; > > + > > + memset(, 0, sizeof(fa)); > > + if (copy_from_user(, arg, sizeof(fa))) > > + return -EFAULT; > > + > > + ret = check_xflags(fa.fsx_xflags); > > + if (ret) > > + return ret; > > + > > + if (fa.fsx_extsize != 0 || fa.fsx_projid != 0 || fa.fsx_cowextsize != 0) > > + return -EOPNOTSUPP; > > + > > + ret = mnt_want_write_file(file); > > + if (ret) > > + return ret; > > + > > + inode_lock(inode); > > + > > + oldflags = binode->flags; > > + old_i_flags = inode->i_flags; > > + > > + /* We need the capabilities to change append-only or immutable inode */ > > + if (((oldflags & (BTRFS_INODE_APPEND | BTRFS_INODE_IMMUTABLE)) || > > +(fa.fsx_xflags & (FS_XFLAG_APPEND | FS_XFLAG_IMMUTABLE))) && > > + !capable(CAP_LINUX_IMMUTABLE)) { > > + ret = -EPERM; > > + goto out_unlock; > > + } > > + > > + if (fa.fsx_xflags & FS_XFLAG_SYNC) > > + binode->flags |= BTRFS_INODE_SYNC; > > + else > > + binode->flags &= ~BTRFS_INODE_SYNC; > > + if (fa.fsx_xflags & FS_XFLAG_IMMUTABLE) > > + binode->flags |= BTRFS_INODE_IMMUTABLE; > > + else > > + binode->flags &= ~BTRFS_INODE_IMMUTABLE; > > + if (fa.fsx_xflags & FS_XFLAG_APPEND) > > + binode->flags |= BTRFS_INODE_APPEND; > > + else > > + binode->flags &= ~BTRFS_INODE_APPEND; > > + if (fa.fsx_xflags & FS_XFLAG_NODUMP) > > + binode->flags |= BTRFS_INODE_NODUMP; > > + else > > + binode->flags &= ~BTRFS_INODE_NODUMP; > > + if (fa.fsx_xflags & FS_XFLAG_NOATIME) > > + binode->flags |= BTRFS_INODE_NOATIME; > > + else > > + binode->flags &= ~BTRFS_INODE_NOATIME; > > + > > + /* 1 item for the inode */ > > + trans = btrfs_start_transaction(root, 1); > > + if (IS_ERR(trans)) { > > + ret = PTR_ERR(trans); > > + goto out_drop; > > + } > > + > > + btrfs_sync_inode_flags_to_i_flags(inode); > > + inode_inc_iversion(inode); > > + inode->i_ctime = current_time(inode); > > + ret = btrfs_update_inode(trans, root, inode); > > + > > + btrfs_end_transaction(trans); > > Shouldn't out_drop label be here? > Otherwise, above "goto out_drop" won't unlock nor restore the flag value. Right, will fix, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] btrfs: rename btrfs_flags_to_ioctl to reflect which flags it touches
On Mon, Apr 23, 2018 at 11:37:08AM +0900, Misono Tomohiro wrote: > On 2018/04/21 2:02, David Sterba wrote: > > Converts btrfs_inode::flags to the FS_*_FL flags. > > > > Signed-off-by: David Sterba> > --- > > fs/btrfs/ioctl.c | 9 + > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > > index 953473f2a136..f0e6074233fa 100644 > > --- a/fs/btrfs/ioctl.c > > +++ b/fs/btrfs/ioctl.c > > @@ -105,9 +105,10 @@ static unsigned int btrfs_mask_fsflags_for_type(struct > > inode *inode, > > } > > > > /* > > - * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl. > > + * Export internal inode flags to the format expected by the > > FS_IOC_GETFLAGS > > + * ioctl. > > */ > > -static unsigned int btrfs_flags_to_ioctl(unsigned int flags) > > +static unsigned int btrfs_inode_flags_to_fsflags(unsigned int flags) > > { > > unsigned int iflags = 0; > > > > @@ -161,7 +162,7 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode > > *inode) > > static int btrfs_ioctl_getflags(struct file *file, void __user *arg) > > { > > struct btrfs_inode *ip = BTRFS_I(file_inode(file)); > > *ip here and in btrfs_ioctl_setflags() should be changed to *binode as in 1st > patch? Yeah this would be at least consistent with the 1st patch, on a second look I'm not sure if the variable renaming should be part of the patch, so I'll move it to another patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Add support for BTRFS raid5/6 to GRUB
On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote: > Hi All, > > Below you can find a patch to add support for accessing files from > grub in a RAID5/6 btrfs filesystem. This is a RFC because it is > missing the support for recovery (i.e. if some devices are missed). In > the next days (weeks ?) I will extend this patch to support also this > case. > > Comments are welcome. More or less LGTM. Just a nitpick below... I am happy to take full blown patch into GRUB if it is ready. > BR > G.Baroncelli > > > --- > > commit 8c80a1b7c913faf50f95c5c76b4666ed17685666 > Author: Goffredo Baroncelli> Date: Tue Apr 17 21:40:31 2018 +0200 > > Add initial support for btrfs raid5/6 chunk > > diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > index be195448d..4c5632acb 100644 > --- a/grub-core/fs/btrfs.c > +++ b/grub-core/fs/btrfs.c > @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item > #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 > #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 > #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100 >grub_uint8_t dummy2[0xc]; >grub_uint16_t nstripes; >grub_uint16_t nsubstripes; > @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, > grub_disk_addr_t addr, > stripe_offset = low + chunk_stripe_length > * high; > csize = chunk_stripe_length - low; > + break; > + } > + case GRUB_BTRFS_CHUNK_TYPE_RAID5: > + case GRUB_BTRFS_CHUNK_TYPE_RAID6: > + { > + grub_uint64_t nparities; > + grub_uint64_t parity_pos; > + grub_uint64_t stripe_nr, high; > + grub_uint64_t low; > + > + redundancy = 1; /* no redundancy for now */ > + > + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) > + { > + grub_dprintf ("btrfs", "RAID5\n"); > + nparities = 1; > + } > + else > + { > + grub_dprintf ("btrfs", "RAID6\n"); > + nparities = 2; > + } > + > + stripe_nr = grub_divmod64 (off, chunk_stripe_length, ); > + > + high = grub_divmod64 (stripe_nr, nstripes - nparities, ); > + grub_divmod64 (high+nstripes-nparities, nstripes, _pos); > + grub_divmod64 (parity_pos+nparities+stripen, nstripes, ); Missing spaces around "+" and "-". Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Fix wrong first_key parameter in replace_path
Commit 581c1760415c ("btrfs: Validate child tree block's level and first key") introduced new @first_key parameter for read_tree_block(), however caller in replace_path() is parasing wrong key to read_tree_block(). It should use parameter @first_key other than @key. Normally it won't expose problem as @key is normally initialzied to the same value of @first_key we expect. However in relocation recovery case, @key can be set to (0, 0, 0), and since no valid key in relocation tree can be (0, 0, 0), it will cause read_tree_block() to return -EUCLEAN and interrupt relocation recovery. Fix it by setting @first_key correctly. Signed-off-by: Qu Wenruo--- fs/btrfs/relocation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 00b7d3231821..b041b945a7ae 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans, old_bytenr = btrfs_node_blockptr(parent, slot); blocksize = fs_info->nodesize; old_ptr_gen = btrfs_node_ptr_generation(parent, slot); - btrfs_node_key_to_cpu(parent, , slot); + btrfs_node_key_to_cpu(parent, _key, slot); if (level <= max_level) { eb = path->nodes[level]; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Remove delay_iput parameter when running delalloc work
On 23.04.2018 12:27, Qu Wenruo wrote: > > > On 2018年04月23日 15:54, Nikolay Borisov wrote: >> While trying to make sense of the lifecycle of delayed iputs it became >> apparent >> that the delay_iput parameter of btrfs_start_delalloc_roots/ >> btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical >> information of why this parameter was needed and when it became obsolete). >> Now >> that the code has changed sufficiently it's no longer required to have it so >> this series gradually removes it. >> >> Nikolay Borisov (5): >> btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots >> btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes >> btrfs: Remove delay_iput parameter from __start_delalloc_inodes >> btrfs: Remove delayed_iput member from btrfs_delalloc_work >> btrfs: Unexport btrfs_alloc_delalloc_work > > Solid cleanup. > > Reviewed-by: Qu Wenruo> > Just a little nitpick about the 3rd and 4th patch. > It would be nicer the merge them, as in the the 3rd patch > btrfs_alloc_delalloc_work() call still takes 0 as @delay_iput, but in > next patch the @delay_iput just get removed. I'm fine with that, I guess David if you deem it more reasonable you could squash the 2 patches. I just did it for the sake of bisectability and review purposes. > > Thanks, > Qu > >> >> fs/btrfs/ctree.h | 9 ++--- >> fs/btrfs/dev-replace.c | 2 +- >> fs/btrfs/extent-tree.c | 4 ++-- >> fs/btrfs/inode.c | 28 +--- >> fs/btrfs/ioctl.c | 4 ++-- >> 5 files changed, 16 insertions(+), 31 deletions(-) >> > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Remove delay_iput parameter when running delalloc work
On 2018年04月23日 15:54, Nikolay Borisov wrote: > While trying to make sense of the lifecycle of delayed iputs it became > apparent > that the delay_iput parameter of btrfs_start_delalloc_roots/ > btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical > information of why this parameter was needed and when it became obsolete). Now > that the code has changed sufficiently it's no longer required to have it so > this series gradually removes it. > > Nikolay Borisov (5): > btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots > btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes > btrfs: Remove delay_iput parameter from __start_delalloc_inodes > btrfs: Remove delayed_iput member from btrfs_delalloc_work > btrfs: Unexport btrfs_alloc_delalloc_work Solid cleanup. Reviewed-by: Qu WenruoJust a little nitpick about the 3rd and 4th patch. It would be nicer the merge them, as in the the 3rd patch btrfs_alloc_delalloc_work() call still takes 0 as @delay_iput, but in next patch the @delay_iput just get removed. Thanks, Qu > > fs/btrfs/ctree.h | 9 ++--- > fs/btrfs/dev-replace.c | 2 +- > fs/btrfs/extent-tree.c | 4 ++-- > fs/btrfs/inode.c | 28 +--- > fs/btrfs/ioctl.c | 4 ++-- > 5 files changed, 16 insertions(+), 31 deletions(-) > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 4.17-rc1 FS went read-only during balance
> > TL;DR It seems as regression in 4.17, but I managed to find a > > workaround to make filesystem rw mountable again. > > > > Kernel built from tag v4.17-rc1 > > btrfs-progs 4.16 > > > > Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were > > doing usual weekly balance with this command via cron: > > btrfs balance start -musage=50 -dusage=50 > > Both machines run same kernel version. > > > > On PC that caused root and "data" filesystems to go readonly. Root > > is on an SSD with data single and metadata DUP, "data" filesystem > > is on 2 HDDs with RAID1 for data and metadata. > > > > On laptop only /home went ro, it's on NVMe SSD with data single and > > metadata DUP. > > > > Btrfs check of PC rootfs was without any errors in both modes, I did > > them once each before reboot on readonly filesystem with --force > > flag and then from live usb. Same output without any errors. > > > > After reboot kernel refused rw mount rootfs with the same error as > > during cron balance, ro mount was accepted, error during rw mount: > > BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 > >>> > 117 means EUCLEAN, which could be caused by the newly introduced > first_key and level check. > >>> > Please apply this hotfix to fix it. > btrfs: Only check first key for committed tree blocks > (Which is included in latest pull request) > >>> > Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra > debug info. > >>> > Thanks, > Qu > >>> > >>> I tried 4.17-rc2 (as the pull request was pulled) with > >>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb) > >>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg > >>> attached. > >> > >> Thanks for the info and your previous btrfs-image. > >> > >> The image itself shows nothing wrong, so it should be runtime problem. > >> Would you please apply these two debug patches? > >> https://patchwork.kernel.org/patch/10335133/ > >> https://patchwork.kernel.org/patch/10335135/ > >> > >> And the attached diff file? > >> > >> My guess is the parent node is not initialized correctly in this case. > >> > >> Thanks, > >> Qu > > > > Dmesg from kernel with all three patches applied attached. > > > Thanks for the debug info, it really helps a lot! > > It turns out that I'm just a super idiot, a typo in replace_path() > caused this, and it could not be trigger unless we enter it from > relocation recovery. > > Please try the attached patch to see if it solves the problem. > > Thanks, > Qu Glad to help, the patch solved the problem, rw mount is successful and balance finished, no errors or debug output, btrfs check is clean in both modes. [2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb [2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2 [3.072271] BTRFS info (device vda2): use lzo compression, level 0 [3.072897] BTRFS info (device vda2): enabling auto defrag [3.073476] BTRFS info (device vda2): using free space tree [3.074049] BTRFS info (device vda2): has skinny extents [5.411821] BTRFS info (device vda2): using free space tree [ 24.925293] BTRFS info (device vdb): using free space tree [ 24.925324] BTRFS info (device vdb): has skinny extents [ 31.711868] BTRFS info (device vdb): continuing balance [ 31.721658] BTRFS info (device vdb): checking UUID tree [ 31.822920] BTRFS info (device vdb): relocating block group 69889687552flags data [ 33.730399] BTRFS info (device vdb): found 12 extents [ 36.950699] BTRFS info (device vdb): found 12 extents [ 37.030813] BTRFS info (device vdb): relocating block group 67742203904flags metadata|dup [ 37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup [ 37.189843] BTRFS info (device vdb): found 1 extents pgppgUIF6oj1v.pgp Description: OpenPGP digital signature
Re: 4.17-rc1 FS went read-only during balance
On 2018年04月23日 16:04, Dmitrii Tcvetkov wrote: > TL;DR It seems as regression in 4.17, but I managed to find a > workaround to make filesystem rw mountable again. > > Kernel built from tag v4.17-rc1 > btrfs-progs 4.16 > > Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were > doing usual weekly balance with this command via cron: > btrfs balance start -musage=50 -dusage=50 > Both machines run same kernel version. > > On PC that caused root and "data" filesystems to go readonly. Root > is on an SSD with data single and metadata DUP, "data" filesystem > is on 2 HDDs with RAID1 for data and metadata. > > On laptop only /home went ro, it's on NVMe SSD with data single and > metadata DUP. > > Btrfs check of PC rootfs was without any errors in both modes, I did > them once each before reboot on readonly filesystem with --force > flag and then from live usb. Same output without any errors. > > After reboot kernel refused rw mount rootfs with the same error as > during cron balance, ro mount was accepted, error during rw mount: > BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 >>> 117 means EUCLEAN, which could be caused by the newly introduced first_key and level check. >>> Please apply this hotfix to fix it. btrfs: Only check first key for committed tree blocks (Which is included in latest pull request) >>> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra debug info. >>> Thanks, Qu >>> >>> I tried 4.17-rc2 (as the pull request was pulled) with >>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb) >>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg >>> attached. >> >> Thanks for the info and your previous btrfs-image. >> >> The image itself shows nothing wrong, so it should be runtime problem. >> Would you please apply these two debug patches? >> https://patchwork.kernel.org/patch/10335133/ >> https://patchwork.kernel.org/patch/10335135/ >> >> And the attached diff file? >> >> My guess is the parent node is not initialized correctly in this case. >> >> Thanks, >> Qu > > Dmesg from kernel with all three patches applied attached. > Thanks for the debug info, it really helps a lot! It turns out that I'm just a super idiot, a typo in replace_path() caused this, and it could not be trigger unless we enter it from relocation recovery. Please try the attached patch to see if it solves the problem. Thanks, Qu From 4b70eb864192ec5cf54a7e67e2957ddf0e5c0f6f Mon Sep 17 00:00:00 2001 From: Qu WenruoDate: Mon, 23 Apr 2018 16:13:55 +0800 Subject: [PATCH] btrfs: Fix wrong first_key parameter in replace_path Commit 581c1760415c ("btrfs: Validate child tree block's level and first key") introduced new @first_key parameter for read_tree_block(), however caller in replace_path() is parasing wrong key to read_tree_block(). It should use parameter @first_key other than @key. Normally it won't expose problem as @key is normally initialzied to the same value of @first_key we expect. However in relocation recovery case, @key can be set to (0, 0, 0), and since no valid key in relocation tree can be (0, 0, 0), it will cause read_tree_block() to return -EUCLEAN and interrupt relocation recovery. Fix it by setting @first_key correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 00b7d3231821..b041b945a7ae 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans, old_bytenr = btrfs_node_blockptr(parent, slot); blocksize = fs_info->nodesize; old_ptr_gen = btrfs_node_ptr_generation(parent, slot); - btrfs_node_key_to_cpu(parent, , slot); + btrfs_node_key_to_cpu(parent, _key, slot); if (level <= max_level) { eb = path->nodes[level]; -- 2.17.0 signature.asc Description: OpenPGP digital signature
[PATCH 3/5] btrfs: Remove delay_iput parameter from __start_delalloc_inodes
It's always set to 0 so remove it Signed-off-by: Nikolay Borisov--- fs/btrfs/inode.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 855237737acb..42a2590559df 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10189,8 +10189,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, * some fairly slow code that needs optimization. This walks the list * of all the inodes with pending delalloc and forces them to disk. */ -static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput, - int nr) +static int __start_delalloc_inodes(struct btrfs_root *root, int nr) { struct btrfs_inode *binode; struct inode *inode; @@ -10218,12 +10217,9 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput, } spin_unlock(>delalloc_lock); - work = btrfs_alloc_delalloc_work(inode, delay_iput); + work = btrfs_alloc_delalloc_work(inode, 0); if (!work) { - if (delay_iput) - btrfs_add_delayed_iput(inode); - else - iput(inode); + iput(inode); ret = -ENOMEM; goto out; } @@ -10262,7 +10258,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root) if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state)) return -EROFS; - ret = __start_delalloc_inodes(root, 0, -1); + ret = __start_delalloc_inodes(root, -1); if (ret > 0) ret = 0; return ret; @@ -10291,7 +10287,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr) _info->delalloc_roots); spin_unlock(_info->delalloc_root_lock); - ret = __start_delalloc_inodes(root, 0, nr); + ret = __start_delalloc_inodes(root, nr); btrfs_put_fs_root(root); if (ret < 0) goto out; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots
This parameter was introduced alongside the function in eb73c1b7cea7 ("Btrfs: introduce per-subvolume delalloc inode list") to avoid deadlocks since this function was used in the transaction commit path. However, commit 8d875f95da43 ("btrfs: disable strict file flushes for renames and truncates") removed that usage, rendering the parameter obsolete. Signed-off-by: Nikolay Borisov--- fs/btrfs/ctree.h | 3 +-- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/extent-tree.c | 4 ++-- fs/btrfs/inode.c | 5 ++--- fs/btrfs/ioctl.c | 2 +- 5 files changed, 7 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index aac3a8bfab51..8c5ab06ccc0a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3199,8 +3199,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, u32 min_type); int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput); -int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput, - int nr); +int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr); int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, unsigned int extra_bits, struct extent_state **cached_state, int dedupe); diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index f82be266ba4b..a08b30052e06 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -503,7 +503,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, * flush all outstanding I/O and inode extent mappings before the * copy operation is declared as being finished */ - ret = btrfs_start_delalloc_roots(fs_info, 0, -1); + ret = btrfs_start_delalloc_roots(fs_info, -1); if (ret) { mutex_unlock(_replace->lock_finishing_cancel_unmount); return ret; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 22ba5d500836..cf3cd23f274d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4337,7 +4337,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes) need_commit--; if (need_commit > 0) { - btrfs_start_delalloc_roots(fs_info, 0, -1); + btrfs_start_delalloc_roots(fs_info, -1); btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); } @@ -4790,7 +4790,7 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info, * the filesystem is readonly(all dirty pages are written to * the disk). */ - btrfs_start_delalloc_roots(fs_info, 0, nr_items); + btrfs_start_delalloc_roots(fs_info, nr_items); if (!current->journal_info) btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c0a38917539d..74ea63159b93 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10268,8 +10268,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput) return ret; } -int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput, - int nr) +int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr) { struct btrfs_root *root; struct list_head splice; @@ -10292,7 +10291,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput, _info->delalloc_roots); spin_unlock(_info->delalloc_root_lock); - ret = __start_delalloc_inodes(root, delay_iput, nr); + ret = __start_delalloc_inodes(root, 0, nr); btrfs_put_fs_root(root); if (ret < 0) goto out; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index de5dcdb87e97..ea192378cb3b 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -5316,7 +5316,7 @@ long btrfs_ioctl(struct file *file, unsigned int case BTRFS_IOC_SYNC: { int ret; - ret = btrfs_start_delalloc_roots(fs_info, 0, -1); + ret = btrfs_start_delalloc_roots(fs_info, -1); if (ret) return ret; ret = btrfs_sync_fs(inode->i_sb, 1); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] Remove delay_iput parameter when running delalloc work
While trying to make sense of the lifecycle of delayed iputs it became apparent that the delay_iput parameter of btrfs_start_delalloc_roots/ btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical information of why this parameter was needed and when it became obsolete). Now that the code has changed sufficiently it's no longer required to have it so this series gradually removes it. Nikolay Borisov (5): btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes btrfs: Remove delay_iput parameter from __start_delalloc_inodes btrfs: Remove delayed_iput member from btrfs_delalloc_work btrfs: Unexport btrfs_alloc_delalloc_work fs/btrfs/ctree.h | 9 ++--- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/extent-tree.c | 4 ++-- fs/btrfs/inode.c | 28 +--- fs/btrfs/ioctl.c | 4 ++-- 5 files changed, 16 insertions(+), 31 deletions(-) -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] btrfs: Remove delayed_iput member from btrfs_delalloc_work
When allocating a delalloc work we are always setting the delayed_iput to 0. So remove the delay_iput member of btrfs_delalloc_work, as a result also remove it as a parameter from btrfs_alloc_delalloc_work since it's not used anymore. Signed-off-by: Nikolay Borisov--- fs/btrfs/ctree.h | 4 +--- fs/btrfs/inode.c | 11 +++ 2 files changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1ac983270bb7..91f51a80334f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3165,14 +3165,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode, /* inode.c */ struct btrfs_delalloc_work { struct inode *inode; - int delay_iput; struct completion completion; struct list_head list; struct btrfs_work work; }; -struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, - int delay_iput); +struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode); struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode, struct page *page, size_t pg_offset, u64 start, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 42a2590559df..9b8f2904ad55 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10158,15 +10158,11 @@ static void btrfs_run_delalloc_work(struct btrfs_work *work) _I(inode)->runtime_flags)) filemap_flush(inode->i_mapping); - if (delalloc_work->delay_iput) - btrfs_add_delayed_iput(inode); - else - iput(inode); + iput(inode); complete(_work->completion); } -struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, - int delay_iput) +struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode) { struct btrfs_delalloc_work *work; @@ -10177,7 +10173,6 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, init_completion(>completion); INIT_LIST_HEAD(>list); work->inode = inode; - work->delay_iput = delay_iput; WARN_ON_ONCE(!inode); btrfs_init_work(>work, btrfs_flush_delalloc_helper, btrfs_run_delalloc_work, NULL, NULL); @@ -10217,7 +10212,7 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int nr) } spin_unlock(>delalloc_lock); - work = btrfs_alloc_delalloc_work(inode, 0); + work = btrfs_alloc_delalloc_work(inode); if (!work) { iput(inode); ret = -ENOMEM; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes
It's always set to 0, so just remove it and collapse the constant value to the only function we are passing it. Signed-off-by: Nikolay Borisov--- fs/btrfs/ctree.h | 2 +- fs/btrfs/inode.c | 4 ++-- fs/btrfs/ioctl.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8c5ab06ccc0a..1ac983270bb7 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3198,7 +3198,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, struct inode *inode, u64 new_size, u32 min_type); -int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput); +int btrfs_start_delalloc_inodes(struct btrfs_root *root); int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr); int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, unsigned int extra_bits, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 74ea63159b93..855237737acb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10254,7 +10254,7 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput, return ret; } -int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput) +int btrfs_start_delalloc_inodes(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; int ret; @@ -10262,7 +10262,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput) if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state)) return -EROFS; - ret = __start_delalloc_inodes(root, delay_iput, -1); + ret = __start_delalloc_inodes(root, 0, -1); if (ret > 0) ret = 0; return ret; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index ea192378cb3b..f94b70e3525d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -640,7 +640,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, wait_event(root->subv_writers->wait, percpu_counter_sum(>subv_writers->counter) == 0); - ret = btrfs_start_delalloc_inodes(root, 0); + ret = btrfs_start_delalloc_inodes(root); if (ret) goto dec_and_free; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] btrfs: Unexport btrfs_alloc_delalloc_work
It's used only in inode.c so makes no sense to have it exported. Signed-off-by: Nikolay Borisov--- fs/btrfs/ctree.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 91f51a80334f..e5c24d86fdfa 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3170,8 +3170,6 @@ struct btrfs_delalloc_work { struct btrfs_work work; }; -struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode); - struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode, struct page *page, size_t pg_offset, u64 start, u64 len, int create); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 4.17-rc1 FS went read-only during balance
On 2018年04月23日 13:08, Dmitrii Tcvetkov wrote: > On Mon, 23 Apr 2018 09:23:53 +0800 > Qu Wenruowrote: > >> On 2018年04月21日 22:55, Dmitrii Tcvetkov wrote: >>> TL;DR It seems as regression in 4.17, but I managed to find a >>> workaround to make filesystem rw mountable again. >>> >>> Kernel built from tag v4.17-rc1 >>> btrfs-progs 4.16 >>> >>> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were >>> doing usual weekly balance with this command via cron: >>> btrfs balance start -musage=50 -dusage=50 >>> Both machines run same kernel version. >>> >>> On PC that caused root and "data" filesystems to go readonly. Root >>> is on an SSD with data single and metadata DUP, "data" filesystem >>> is on 2 HDDs with RAID1 for data and metadata. >>> >>> On laptop only /home went ro, it's on NVMe SSD with data single and >>> metadata DUP. >>> >>> Btrfs check of PC rootfs was without any errors in both modes, I did >>> them once each before reboot on readonly filesystem with --force >>> flag and then from live usb. Same output without any errors. >>> >>> After reboot kernel refused rw mount rootfs with the same error as >>> during cron balance, ro mount was accepted, error during rw mount: >>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 > >> 117 means EUCLEAN, which could be caused by the newly introduced >> first_key and level check. > >> Please apply this hotfix to fix it. >> btrfs: Only check first key for committed tree blocks >> (Which is included in latest pull request) > >> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra >> debug info. > >> Thanks, >> Qu > > I tried 4.17-rc2 (as the pull request was pulled) with > CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb) > in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg > attached. Thanks for the info and your previous btrfs-image. The image itself shows nothing wrong, so it should be runtime problem. Would you please apply these two debug patches? https://patchwork.kernel.org/patch/10335133/ https://patchwork.kernel.org/patch/10335135/ And the attached diff file? My guess is the parent node is not initialized correctly in this case. Thanks, Qu diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 60caa68c3618..79f482578e02 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -458,6 +458,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info, eb->start, first_key->objectid, first_key->type, first_key->offset, found_key.objectid, found_key.type, found_key.offset); + btrfs_print_tree(eb, false); } #endif return ret; diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 00b7d3231821..cde0cb6c9786 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1870,6 +1870,8 @@ int replace_path(struct btrfs_trans_handle *trans, level - 1, _key); if (IS_ERR(eb)) { ret = PTR_ERR(eb); +btrfs_err(fs_info, "parent leaf, slot: %d:", slot); +btrfs_print_tree(parent, false); break; } else if (!extent_buffer_uptodate(eb)) { ret = -EIO; signature.asc Description: OpenPGP digital signature