Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-23 17:40, Stéphane Lesimple wrote:

On 2015-09-23 09:03, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:

On 2015-09-22 10:51, Qu Wenruo wrote:

[92098.842261] Call Trace:
[92098.842277] [] ? read_extent_buffer+0xb8/0x110 [btrfs]
[92098.842304] [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
[92098.842329] [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Would you please show the code of it? This one seems to be another stupid bug I made when rewriting the framework. Maybe I forgot to reinit some variable, or I'm corrupting memory...

(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757                                             struct btrfs_disk_key *disk)
2758    {
2759            cpu->offset = le64_to_cpu(disk->offset);
2760            cpu->type = disk->type;
2761            cpu->objectid = le64_to_cpu(disk->objectid);
2762    }
2763
2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key *disk,
(gdb)

Does it make sense?

So it seems that the memory of the cpu key is being corrupted... But the code is a very specific thin inline function, so what about the rest of the stack? Like btrfs_qgroup_rescan_helper+0x12?

Thanks, Qu

Oh, I forgot that you can just change the offset in btrfs_qgroup_rescan_worker+0x28d to a smaller value. Try +0x280 for example, which steps 0xd bytes back in the asm; that may jump out of the inline function's range and give you a good hint. Or gdb may have a better mode for inline functions, but I don't know of one...

Actually, "list -" is our friend here (it shows the 10 lines before the last source output).

No, that's not the case. "list -" will only show lines around the source code. What I need is the higher caller in the stack. When debugging a running program it's quite easy to just use the "frame" command, but in this situation we don't have a call stack, so I'd like to move +0x28d several bytes backward until we jump out of the inline function call and see the meaningful code.
Ah, you're right. I had a hard time finding a value where I wouldn't end up in another inline function or somewhere else entirely in the kernel code, but here it is:

(gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
2232            memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
2233            slot = path->slots[0];
2234            btrfs_release_path(path);
2235            mutex_unlock(&fs_info->qgroup_rescan_lock);
2236
2237            for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
2238                    btrfs_item_key_to_cpu(scratch_leaf, &found, slot);   <== here
2239                    if (found.type != BTRFS_EXTENT_ITEM_KEY &&
2240                        found.type != BTRFS_METADATA_ITEM_KEY)
2241                            continue;

The btrfs_item_key_to_cpu() inline func calls two other inline funcs:

static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
                                         struct btrfs_key *key, int nr)
{
        struct btrfs_disk_key disk_key;
        btrfs_item_key(eb, &disk_key, nr);
        btrfs_disk_key_to_cpu(key, &disk_key);   <== this is 0x28d
}

btrfs_disk_key_to_cpu() is the inline referenced by 0x28d, and this is where the GPF happens.

Thanks, now things are much clearer. I'm not completely sure, but scratch_leaf seems invalid and is likely causing the bug ("found" is in stack memory, so I don't think it's the cause). It's less related to the qgroup rework, though, as that's pre-existing code.

A quick glance already shows a dirty and maybe deadly hack: copying the whole extent buffer, which includes its pages and all kinds of locks. I'm not 100% sure that's the problem, but I'll create a patch for you to test in the coming days.

BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount
The problem seems somewhat related to the bug you encountered, so I'd recommend giving it a try.

Not yet, but I've come across this bug too during my tests: starting a rescan and unmounting gets you a crash. I didn't mention it because I was sure it was an already-known bug. Nice to see it has been fixed, though!
I'll certainly give it a try, but I'm not really sure it'll fix the specific bug we're talking about. However, the group of patches posted by Mark should fix the qgroup count discrepancies as I understand it, right? It might be of interest to try them all at once, for sure.

Yes, his patches should fix the qgroup count mismatch problem for subvolume removal. If I read the code correctly, after the removal and a sync, the accounting numbers for the qgroup of the deleted subvolume should be: rfer = 0 and excl = 0.

Thanks, Qu

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

Actually, "list -" is our friend here (it shows the 10 lines before the last source output):

(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757                                             struct btrfs_disk_key *disk)
2758    {
2759            cpu->offset = le64_to_cpu(disk->offset);
2760            cpu->type = disk->type;
2761            cpu->objectid = le64_to_cpu(disk->objectid);
2762    }
2763
2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key *disk,
(gdb) list -
2745                                            struct btrfs_disk_key *key)
2746    {
2747            write_eb_member(eb, h, struct btrfs_free_space_header, location, key);
2748    }
2749
2750    /* struct btrfs_disk_key */
2751    BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct btrfs_disk_key,
2752                             objectid, 64);
2753    BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key, offset, 64);
2754    BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key, type, 8);
(gdb) list -
2735
2736    static inline void btrfs_free_space_key(struct extent_buffer *eb,
2737                                            struct btrfs_free_space_header *h,
2738                                            struct btrfs_disk_key *key)
2739    {
2740            read_eb_member(eb, h, struct btrfs_free_space_header, location, key);
2741    }
2742
2743    static inline void btrfs_set_free_space_key(struct extent_buffer *eb,
2744                                                struct btrfs_free_space_header *h,
(gdb)

Lots of inline funcs and macros, it seems.
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-22 15:34, Stéphane Lesimple wrote:

On 2015-09-22 03:37, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the code at "btrfs_qgroup_rescan_worker+0x388"? (Needs kernel debuginfo.) My guess is the following line (pretty sure, but not 100% sure):

--
/*
 * only update status, since the previous part has already updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);   <
if (IS_ERR(trans)) {
        err = PTR_ERR(trans);
        btrfs_err(fs_info,
                  "fail to start transaction for status update: %d\n",
                  err);
        goto done;
}
--

The kernel and modules were already compiled with debuginfo. However, for some reason I couldn't get gdb's disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, looking at the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure that's where the crash happens.

Yep, that's the only caller. Here is a small but useful hint for locating code, if you are interested in kernel development:

# Not sure whether Ubuntu gzips modules; at least Arch compresses them
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you. Quite an easy one; only the backtrace info is needed.

Ah, thanks for the tip. I was loading the whole vmlinux, using /proc/kcore as the core file, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, that was indeed complete overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324    /*
2325     * only update status, since the previous part has already updated the
2326     * qgroup info.
2327     */
2328    trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329    if (IS_ERR(trans)) {
2330            err = PTR_ERR(trans);
2331            btrfs_err(fs_info,
2332                      "fail to start transaction for status update: %d\n",

So this just confirms what we were already 99% sure of.

Another hint is about how to collect kernel crash info. Your netconsole setup is definitely one good practice. Another one I use to collect crash info is kdump; Ubuntu should have a good wiki on it.

I've already come across kdump a few times but never really looked into it. To debug the other, more complicated extent backref bug, it could be of some use.

So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproducible:

- kernel BUG on balance (this original thread)

For this one I can't provide much help, as the extent backref bug is quite hard to debug, unless a developer is interested in it and finds a stable way to reproduce it.

Yes, unfortunately it looks very much like a race condition: I know I can reproduce it with my workflow, but it can take anywhere between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it, unfortunately :( Still, if any dev is interested in it, I can reproduce it, with a patched kernel if needed.

Maybe you are already doing it, but you can compile only the btrfs module, which is far faster than compiling the whole kernel, if and only if the compiled module can be loaded.

Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence unmounting any btrfs fs before that. I should be able to do it in a couple of hours. I'll delete all my snapshots again and run my script; it should be easy to trigger the (hopefully worked-around) bug again.
Well, I didn't trigger this exact bug, but another one, no less severe though, as it also crashed the system:

[92098.841309] general protection fault: [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 8800a3dc8000
[92098.841910] RIP: 0010:[] [] memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8 EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI: 8800a3dcbd67
[92098.841989] RBP: 8800a3dcbd00 R08: 00019c60 R09: 88011fb19c60
[92098.842009] R10: ea0003006480
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
在 2015年09月22日 16:40, Qu Wenruo 写道: 在 2015年09月22日 15:34, Stéphane Lesimple 写道: Le 2015-09-22 03:37, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/22 03:30 +0200: Le 2015-09-20 13:14, Stéphane Lesimple a écrit : Le 2015-09-20 12:51, Qu Wenruo a écrit : Would you please use gdb to show the codes of "btrfs_qgroup_rescan_worker+0x388" ? (Need kernel debuginfo) My guess is the following line:(pretty sure, but not 100% sure) -- /* * only update status, since the previous part has alreay updated the * qgroup info. */ trans =trfs_start_transaction(fs_info->quota_root, 1); < if (IS_ERR(trans)) { err =TR_ERR(trans); btrfs_err(fs_info, "fail to start transaction for status update: %d\n", err); goto done; } -- The kernel and modules were already compiled with debuginfo. However for some reason, I couldn't get gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, watching the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure it's where the crash happens indeed. Yep, that's the only caller. Here is some useful small hint to locate the code, if you are interestied in kernel development. # Not sure about whether ubuntu gzipped modules, at least Arch does # compress it $ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/ $ gunzip /tmp/btrfs.ko.gz $ gdb /tmp/btrfs.ko # Make sure gdb read all the needed debuginfo $ gdb list *(btrfs_qgroup_rescan_worker+0x388) And gdb will find the code position for you. Quite easy one, only backtrace info is needed. Ah, thanks for the tips, I was loading whole vmlinux and using /proc/kcore as the core info, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, it was indeed completely overkill. (gdb) list *(btrfs_qgroup_rescan_worker+0x388) 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328). 
2323 2324/* 2325 * only update status, since the previous part has alreay updated the 2326 * qgroup info. 2327 */ 2328trans =trfs_start_transaction(fs_info->quota_root, 1); 2329if (IS_ERR(trans)) { 2330err =TR_ERR(trans); 2331btrfs_err(fs_info, 2332 "fail to start transaction for status update: %d\n", So this just confirms what we were already 99% sure of. Another hint is about how to collect the kernel crash info. Your netconsole setup would be definitely one good practice. Another one I use to collect crash info is kdump. Ubuntu should have a good wiki on it. I've already come across kdump a few times, but never really look into it. To debug the other complicated extend backref bug, it could be of some use. So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproductible : - kernel BUG on balance (this original thread) For this, I can't provide much help, as extent backref bug is quite hard to debug, unless a developer is interested in it and find a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my worflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still if any dev is interested in it, I can reproduce it, with a patched kernel if needed. Maybe you are already doing it, you can only compile the btrfs modules, which will be far more faster than compile the whole kernel, if and only if the compiled module can be loaded. Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence umounting any btrfs fs before that. Should be able to do it in a couple hours. I'll delete again all my snapshots and run my script. Should be easy to trigger the (hopefully worked-around) bug again. 
Well, I didn't trigger this exact bug, but another one, not less severe though, as it also crashed the system: [92098.841309] general protection fault: [#1] SMP [92098.841338] Modules linked in: ... [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1 [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014 [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs] [92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 8800a3dc8000 [92098.841910] RIP: 0010:[] [] memcpy_erms+0x6/0x10 [92098.841935] RSP: 0018:8800a3dcbcc8 EFLAGS: 00010207 [92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 0009 [92098.841970] RDX: 0009 RSI: 00050800 RDI:
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-22 03:37, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/22 03:30 +0200: Le 2015-09-20 13:14, Stéphane Lesimple a écrit : Le 2015-09-20 12:51, Qu Wenruo a écrit : Would you please use gdb to show the codes of "btrfs_qgroup_rescan_worker+0x388" ? (Need kernel debuginfo) My guess is the following line:(pretty sure, but not 100% sure) -- /* * only update status, since the previous part has alreay updated the * qgroup info. */ trans = btrfs_start_transaction(fs_info->quota_root, 1); < if (IS_ERR(trans)) { err = PTR_ERR(trans); btrfs_err(fs_info, "fail to start transaction for status update: %d\n", err); goto done; } -- The kernel and modules were already compiled with debuginfo. However for some reason, I couldn't get gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, watching the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure it's where the crash happens indeed. Yep, that's the only caller. Here is some useful small hint to locate the code, if you are interestied in kernel development. # Not sure about whether ubuntu gzipped modules, at least Arch does # compress it $ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/ $ gunzip /tmp/btrfs.ko.gz $ gdb /tmp/btrfs.ko # Make sure gdb read all the needed debuginfo $ gdb list *(btrfs_qgroup_rescan_worker+0x388) And gdb will find the code position for you. Quite easy one, only backtrace info is needed. Ah, thanks for the tips, I was loading whole vmlinux and using /proc/kcore as the core info, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, it was indeed completely overkill. (gdb) list *(btrfs_qgroup_rescan_worker+0x388) 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328). 2323 2324/* 2325 * only update status, since the previous part has alreay updated the 2326 * qgroup info. 
2327 */ 2328trans = btrfs_start_transaction(fs_info->quota_root, 1); 2329if (IS_ERR(trans)) { 2330err = PTR_ERR(trans); 2331btrfs_err(fs_info, 2332 "fail to start transaction for status update: %d\n", So this just confirms what we were already 99% sure of. Another hint is about how to collect the kernel crash info. Your netconsole setup would be definitely one good practice. Another one I use to collect crash info is kdump. Ubuntu should have a good wiki on it. I've already come across kdump a few times, but never really look into it. To debug the other complicated extend backref bug, it could be of some use. So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproductible : - kernel BUG on balance (this original thread) For this, I can't provide much help, as extent backref bug is quite hard to debug, unless a developer is interested in it and find a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my worflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still if any dev is interested in it, I can reproduce it, with a patched kernel if needed. Maybe you are already doing it, you can only compile the btrfs modules, which will be far more faster than compile the whole kernel, if and only if the compiled module can be loaded. Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence umounting any btrfs fs before that. Should be able to do it in a couple hours. I'll delete again all my snapshots and run my script. Should be easy to trigger the (hopefully worked-around) bug again. 
Well, I didn't trigger this exact bug, but another one, not less severe though, as it also crashed the system: [92098.841309] general protection fault: [#1] SMP [92098.841338] Modules linked in: ... [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1 [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014 [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs] [92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 8800a3dc8000 [92098.841910] RIP: 0010:[] [] memcpy_erms+0x6/0x10 [92098.841935] RSP: 0018:8800a3dcbcc8 EFLAGS: 00010207 [92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 0009 [92098.841970] RDX: 0009 RSI: 00050800 RDI: 8800a3dcbd67 [92098.841989] RBP: 8800a3dcbd00
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-20 13:14, Stéphane Lesimple a écrit : Le 2015-09-20 12:51, Qu Wenruo a écrit : Would you please use gdb to show the codes of "btrfs_qgroup_rescan_worker+0x388" ? (Need kernel debuginfo) My guess is the following line:(pretty sure, but not 100% sure) -- /* * only update status, since the previous part has alreay updated the * qgroup info. */ trans = btrfs_start_transaction(fs_info->quota_root, 1); < if (IS_ERR(trans)) { err = PTR_ERR(trans); btrfs_err(fs_info, "fail to start transaction for status update: %d\n", err); goto done; } -- The kernel and modules were already compiled with debuginfo. However for some reason, I couldn't get gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, watching the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure it's where the crash happens indeed. Yep, that's the only caller. Here is some useful small hint to locate the code, if you are interestied in kernel development. # Not sure about whether ubuntu gzipped modules, at least Arch does # compress it $ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/ $ gunzip /tmp/btrfs.ko.gz $ gdb /tmp/btrfs.ko # Make sure gdb read all the needed debuginfo $ gdb list *(btrfs_qgroup_rescan_worker+0x388) And gdb will find the code position for you. Quite easy one, only backtrace info is needed. Ah, thanks for the tips, I was loading whole vmlinux and using /proc/kcore as the core info, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, it was indeed completely overkill. (gdb) list *(btrfs_qgroup_rescan_worker+0x388) 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328). 2323 2324/* 2325 * only update status, since the previous part has alreay updated the 2326 * qgroup info. 
2327 */ 2328trans = btrfs_start_transaction(fs_info->quota_root, 1); 2329if (IS_ERR(trans)) { 2330err = PTR_ERR(trans); 2331btrfs_err(fs_info, 2332 "fail to start transaction for status update: %d\n", So this just confirms what we were already 99% sure of. Another hint is about how to collect the kernel crash info. Your netconsole setup would be definitely one good practice. Another one I use to collect crash info is kdump. Ubuntu should have a good wiki on it. I've already come across kdump a few times, but never really look into it. To debug the other complicated extend backref bug, it could be of some use. So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproductible : - kernel BUG on balance (this original thread) For this, I can't provide much help, as extent backref bug is quite hard to debug, unless a developer is interested in it and find a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my worflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still if any dev is interested in it, I can reproduce it, with a patched kernel if needed. Maybe you are already doing it, you can only compile the btrfs modules, which will be far more faster than compile the whole kernel, if and only if the compiled module can be loaded. Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence umounting any btrfs fs before that. Should be able to do it in a couple hours. I'll delete again all my snapshots and run my script. Should be easy to trigger the (hopefully worked-around) bug again. 
Well, I didn't trigger this exact bug, but another one, not less severe though, as it also crashed the system: [92098.841309] general protection fault: [#1] SMP [92098.841338] Modules linked in: ... [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1 [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014 [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs] [92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 8800a3dc8000 [92098.841910] RIP: 0010:[] [] memcpy_erms+0x6/0x10 [92098.841935] RSP: 0018:8800a3dcbcc8 EFLAGS: 00010207 [92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 0009 [92098.841970] RDX: 0009 RSI: 00050800 RDI: 8800a3dcbd67 [92098.841989] RBP: 8800a3dcbd00 R08: 00019c60 R09: 88011fb19c60 [92098.842009] R10: ea0003006480 R11:
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Stéphane Lesimple wrote on 2015/09/22 03:30 +0200: Le 2015-09-20 13:14, Stéphane Lesimple a écrit : Le 2015-09-20 12:51, Qu Wenruo a écrit : Would you please use gdb to show the codes of "btrfs_qgroup_rescan_worker+0x388" ? (Need kernel debuginfo) My guess is the following line:(pretty sure, but not 100% sure) -- /* * only update status, since the previous part has alreay updated the * qgroup info. */ trans = btrfs_start_transaction(fs_info->quota_root, 1); < if (IS_ERR(trans)) { err = PTR_ERR(trans); btrfs_err(fs_info, "fail to start transaction for status update: %d\n", err); goto done; } -- The kernel and modules were already compiled with debuginfo. However for some reason, I couldn't get gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, watching the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure it's where the crash happens indeed. Yep, that's the only caller. Here is some useful small hint to locate the code, if you are interestied in kernel development. # Not sure about whether ubuntu gzipped modules, at least Arch does # compress it $ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/ $ gunzip /tmp/btrfs.ko.gz $ gdb /tmp/btrfs.ko # Make sure gdb read all the needed debuginfo $ gdb list *(btrfs_qgroup_rescan_worker+0x388) And gdb will find the code position for you. Quite easy one, only backtrace info is needed. Ah, thanks for the tips, I was loading whole vmlinux and using /proc/kcore as the core info, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, it was indeed completely overkill. (gdb) list *(btrfs_qgroup_rescan_worker+0x388) 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328). 2323 2324/* 2325 * only update status, since the previous part has alreay updated the 2326 * qgroup info. 
2327     */
2328    trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329    if (IS_ERR(trans)) {
2330            err = PTR_ERR(trans);
2331            btrfs_err(fs_info,
2332                    "fail to start transaction for status update: %d\n",

So this just confirms what we were already 99% sure of. Another hint is about how to collect kernel crash info. Your netconsole setup is definitely one good practice. Another one I use to collect crash info is kdump; Ubuntu should have a good wiki on it. I've already come across kdump a few times, but never really looked into it. To debug the other complicated extent backref bug, it could be of some use. So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproducible: - kernel BUG on balance (this original thread) For this, I can't provide much help, as the extent backref bug is quite hard to debug, unless a developer is interested in it and finds a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my workflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still, if any dev is interested in it, I can reproduce it, with a patched kernel if needed. Maybe you are already doing it, but you can compile only the btrfs module, which will be far faster than compiling the whole kernel, if and only if the compiled module can be loaded. Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence unmounting any btrfs fs before that. Should be able to do it in a couple of hours. I'll delete again all my snapshots and run my script. Should be easy to trigger the (hopefully worked-around) bug again. 
Well, I didn't trigger this exact bug, but another one, no less severe, as it also crashed the system:

[92098.841309] general protection fault: [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 8800a3dc8000
[92098.841910] RIP: 0010:[] [] memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8 EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI: 8800a3dcbd67
[92098.841989] RBP: 8800a3dcbd00 R08: 00019c60 R09: 88011fb19c60
[92098.842009] R10:
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-20 18:35, Stéphane Lesimple wrote: On 2015-09-20 03:22, Qu Wenruo wrote: The mentioned steps are as follows: 0) Rsync data from the next ext4 "snapshot" to the subvolume 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 2) Create the needed readonly snapshot on btrfs 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it <== 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== The resulting files are available here: http://speed47.net/tmp2/qgroup.tar.gz Run 2 is the more complete one; during run 1 the machine crashed even faster. It's interesting to note, however, that it seems to have crashed the same way and at the same step in the process. Your data really helps a lot!! And the good news is, the qgroup accounting part is working as expected. Although I only checked steps 1/3/6 of about 5 snaps, they are all OK. I can write a script to cross-check them, but from the last few results, I think qgroup works fine. I'm more confident about the negative numbers, which should be a result of deleted subvolumes; the real problem is that such qgroups are not handled well by qgroup rescan. I agree with your analysis, this matches what I observed. I'll try to add a hot fix for such cases if needed. But right now, I don't have a good idea for it until Mark's work on rescanning subtrees. Maybe I can add a new option to btrfs-progs to automatically remove the qgroup and trigger a rescan? Until this is properly fixed in the kernel code, and it's good news to know Mark and you are working on it, this would be a good workaround, yes! [ 5738.172879] Call Trace: [ 5738.172887] [] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5738.172896] [] btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs] Your netconsole backtrace is also of great value. This one implies that my rework also caused some stupid bug 
(yeah, I always make such bugs) or some existing unexposed rescan bug. Would you please use gdb to show the code at "btrfs_qgroup_rescan_worker+0x388"? (Needs kernel debuginfo.) My guess is the following lines (pretty sure, but not 100% sure):

--
/*
 * only update status, since the previous part has already updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);   <
if (IS_ERR(trans)) {
        err = PTR_ERR(trans);
        btrfs_err(fs_info,
                "fail to start transaction for status update: %d\n", err);
        goto done;
}
--

The kernel and modules were already compiled with debuginfo. However, for some reason I couldn't get the gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, looking at the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure that's where the crash happens. Yep, that's the only caller. Here is a small useful hint for locating the code, if you are interested in kernel development:

# Not sure whether Ubuntu gzips modules; at least Arch compresses them
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you. Quite an easy one, only the backtrace info is needed. Another hint is about how to collect kernel crash info. Your netconsole setup is definitely one good practice. Another one I use to collect crash info is kdump; Ubuntu should have a good wiki on it. But that means that, at rescan time, fs_info->quota_root is still NULL, which is quite weird. I can add an extra check to avoid such a NULL pointer for now, but it's better to review the existing rescan workflow, as I think there is some race in initialising quota_root. 
You can also try the following hotfix patch to see if it works: http://pastebin.com/966GQXPk My concern is that this may cause qgroup rescan to exit without updating its accounting info... So I still need your help. Or I can use your reproducer script to test it next Monday. Compiling with your patch, just amended with a little printk to know whether the execution flow enters the added if condition. Will let you know about the results. But I'm pretty sure I can get that (u64)-1 value again by deleting snapshots. Shall I? Or do you have something else for me to run before that? You have already done a great job in helping qgroups mature. The negative numbers and 0 excl are somewhat expected for deleted snapshots. The good news is: 1) it doesn't affect valid (non-orphan) qgroups. 2) Mark is already working on it. I'll try to add a btrfs-progs hotfix for you to delete and rescan qgroups to avoid such problems. That would be good! 
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-20 12:51, Qu Wenruo wrote: Would you please use gdb to show the code at "btrfs_qgroup_rescan_worker+0x388"? (Needs kernel debuginfo.) My guess is the following lines (pretty sure, but not 100% sure):

--
/*
 * only update status, since the previous part has already updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);   <
if (IS_ERR(trans)) {
        err = PTR_ERR(trans);
        btrfs_err(fs_info,
                "fail to start transaction for status update: %d\n", err);
        goto done;
}
--

The kernel and modules were already compiled with debuginfo. However, for some reason I couldn't get the gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, looking at the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure that's where the crash happens. Yep, that's the only caller. Here is a small useful hint for locating the code, if you are interested in kernel development:

# Not sure whether Ubuntu gzips modules; at least Arch compresses them
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you. Quite an easy one, only the backtrace info is needed. Ah, thanks for the tips, I was loading the whole vmlinux and using /proc/kcore as the core info, then adding the module with "add-symbol-file". But as we're just looking for the code and not the variables, it was indeed completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324    /*
2325     * only update status, since the previous part has already updated the
2326     * qgroup info.
2327     */
2328    trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329    if (IS_ERR(trans)) {
2330            err = PTR_ERR(trans);
2331            btrfs_err(fs_info,
2332                    "fail to start transaction for status update: %d\n",

So this just confirms what we were already 99% sure of. Another hint is about how to collect kernel crash info. Your netconsole setup is definitely one good practice. Another one I use to collect crash info is kdump; Ubuntu should have a good wiki on it. I've already come across kdump a few times, but never really looked into it. To debug the other complicated extent backref bug, it could be of some use. So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproducible: - kernel BUG on balance (this original thread) For this, I can't provide much help, as the extent backref bug is quite hard to debug, unless a developer is interested in it and finds a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my workflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still, if any dev is interested in it, I can reproduce it, with a patched kernel if needed. Maybe you are already doing it, but you can compile only the btrfs module, which will be far faster than compiling the whole kernel, if and only if the compiled module can be loaded. Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try to load the modified module and see if the running kernel accepts it. I have to rmmod the loaded module first, hence unmounting any btrfs fs before that. Should be able to do it in a couple of hours. I'll delete again all my snapshots and run my script. Should be easy to trigger the (hopefully worked-around) bug again. Regards, -- Stéphane. 
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-20 03:22, Qu Wenruo wrote: The mentioned steps are as follows: 0) Rsync data from the next ext4 "snapshot" to the subvolume 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 2) Create the needed readonly snapshot on btrfs 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it <== 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== The resulting files are available here: http://speed47.net/tmp2/qgroup.tar.gz Run 2 is the more complete one; during run 1 the machine crashed even faster. It's interesting to note, however, that it seems to have crashed the same way and at the same step in the process. Your data really helps a lot!! And the good news is, the qgroup accounting part is working as expected. Although I only checked steps 1/3/6 of about 5 snaps, they are all OK. I can write a script to cross-check them, but from the last few results, I think qgroup works fine. I'm more confident about the negative numbers, which should be a result of deleted subvolumes; the real problem is that such qgroups are not handled well by qgroup rescan. I agree with your analysis, this matches what I observed. I'll try to add a hot fix for such cases if needed. But right now, I don't have a good idea for it until Mark's work on rescanning subtrees. Maybe I can add a new option to btrfs-progs to automatically remove the qgroup and trigger a rescan? Until this is properly fixed in the kernel code, and it's good news to know Mark and you are working on it, this would be a good workaround, yes! [ 5738.172879] Call Trace: [ 5738.172887] [] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5738.172896] [] btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs] Your netconsole backtrace is also of great value. This one implies that either my rework caused some stupid bug (yeah, I always make such bugs) or there is some existing unexposed rescan bug. 
Would you please use gdb to show the code at "btrfs_qgroup_rescan_worker+0x388"? (Needs kernel debuginfo.) My guess is the following lines (pretty sure, but not 100% sure):

--
/*
 * only update status, since the previous part has already updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);   <
if (IS_ERR(trans)) {
        err = PTR_ERR(trans);
        btrfs_err(fs_info,
                "fail to start transaction for status update: %d\n", err);
        goto done;
}
--

The kernel and modules were already compiled with debuginfo. However, for some reason I couldn't get the gdb disassembly of /proc/kcore properly aligned with the source I compiled: the asm code doesn't match the C code shown by gdb. In any case, looking at the source of this function, this is the only place btrfs_start_transaction is called, so we can be 100% sure that's where the crash happens. But that means that, at rescan time, fs_info->quota_root is still NULL, which is quite weird. I can add an extra check to avoid such a NULL pointer for now, but it's better to review the existing rescan workflow, as I think there is some race in initialising quota_root. You can also try the following hotfix patch to see if it works: http://pastebin.com/966GQXPk My concern is that this may cause qgroup rescan to exit without updating its accounting info... So I still need your help. Or I can use your reproducer script to test it next Monday. Compiling with your patch, just amended with a little printk to know whether the execution flow enters the added if condition. Will let you know about the results. But I'm pretty sure I can get that (u64)-1 value again by deleting snapshots. Shall I? Or do you have something else for me to run before that? You have already done a great job in helping qgroups mature. The negative numbers and 0 excl are somewhat expected for deleted snapshots. The good news is: 1) it doesn't affect valid (non-orphan) qgroups. 2) Mark is already working on it. 
I'll try to add a btrfs-progs hotfix for you to delete and rescan qgroups to avoid such problems. That would be good! So, as a quick summary of this big thread, it seems I've been hitting 3 bugs, all reproducible: - kernel BUG on balance (this original thread) For this, I can't provide much help, as the extent backref bug is quite hard to debug, unless a developer is interested in it and finds a stable way to reproduce it. Yes, unfortunately as it looks so much like a race condition, I know I can reproduce it with my workflow, but it can take between 1 minute and 12 hours, so I wouldn't call it a "stable way" to reproduce it unfortunately :( Still, if any dev is interested in it, I can reproduce it, with a patched kernel if needed. The other two are explained or have the hotfix mentioned above. And thanks for that, will keep you posted. -- Stéphane. --
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-18 18:26, Stéphane Lesimple wrote: On 2015-09-18 12:15, Stéphane Lesimple wrote: On 2015-09-18 09:36, Stéphane Lesimple wrote: Sure, I did a quota disable / quota enable before running the snapshot debug procedure, so the qgroups were clean again when I started:

qgroupid rfer          excl          max_rfer max_excl parent child
0/5      16384         16384         none     none     ---    ---
0/1906   1657848029184 1657848029184 none     none     ---    ---
0/1909   124950921216  124950921216  none     none     ---    ---
0/1911   1054587293696 1054587293696 none     none     ---    ---
0/3270   23727300608   23727300608   none     none     ---    ---
0/3314   23221784576   23221784576   none     none     ---    ---
0/3341   7479275520    7479275520    none     none     ---    ---
0/3367   24185790464   24185790464   none     none     ---    ---

The test is running, I expect to post the results within an hour or two. Well, my system crashed twice while running the procedure... By "crashed" I mean: the machine no longer pings, and nothing is logged in kern.log, unfortunately:

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but are empty:

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows: 0) Rsync data from the next ext4 
"snapshot" to the subvolume 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 2) Create the needed readonly snapshot on btrfs 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it <== 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== The resulting files are available here: http://speed47.net/tmp2/qgroup.tar.gz The run2 is the more complete one, during run1 the machine crashed even faster. It's interesting to note, however, that it seems to have crashed the same way and at the same step in the process. Your data really helps a lot!! And the good news is, the qgroup accouting part is working as expected. Although I only checked about 1/3/6 of about 5 snaps, they are all OK. I can make a script to cross check them, but from the last few result, I think qgroup works fine. I'm more confident about the minus number, which should be a result of deleted subvolume, and the real problem is, such qgroup is not handled well with qgroup rescan. I'll try to add a hot fix for such case if needed. But right now, I don't have a good idea for it until Mark's work of rescan subtree. Maybe I can add a new option for btrfs-progs to automatically remove the qgroup and trigger a rescan? 
Actually about that, I forgot I did set up netconsole before starting the second run after the first "muted" crash, and it did work: even though I have no logs in kern.log, netconsole managed to send them to my other machine before going down, so here it is:

---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at 01f0
[ 5738.172702] IP: [] start_transaction+0x1b/0x580 [btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops: [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp nf_conntrack_sane iptable_security iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-18 02:59, Qu Wenruo wrote: Stéphane Lesimple wrote on 2015/09/17 20:47 +0200: On 2015-09-17 12:41, Qu Wenruo wrote: In the meantime, I've reactivated quotas, unmounted the filesystem and ran a btrfsck on it: as you would expect, there's no qgroup problem reported so far. At least the rescan code is working without problem. I'll clear all my snapshots, run a quota rescan, then re-create them one by one by rsyncing from the ext4 system I still have. Maybe I'll run into the issue again. Would you mind doing the following check for each subvolume rsync? 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output 2) Create the needed snapshot 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output 7) Rsync data from ext4 to the newly created snapshot The point is, as you mentioned, rescan is working fine, so we can compare the output from 3), 6) and 1) to see which qgroup accounting numbers change. And if they differ, meaning the qgroup update at write time OR at snapshot creation has something wrong, at least we can narrow the problem down to the qgroup update routine or to snapshot creation. I was about to do that, but first there's something that sounds strange: I began by trashing all my snapshots, then ran a quota rescan and waited for it to complete, to start on a sane base. However, this is the output of qgroup show now: By "trashing", did you mean deleting all the files inside the subvolume? Or "btrfs subv del"? Sorry for the confusion here, yes, I meant btrfs subvolume del. 
qgroupid rfer          excl                 max_rfer max_excl parent child
0/5      16384         16384                none     none     ---    ---
0/1906   1657848029184 1657848029184        none     none     ---    ---
0/1909   124950921216  124950921216         none     none     ---    ---
0/1911   1054587293696 1054587293696        none     none     ---    ---
0/3270   23727300608   23727300608          none     none     ---    ---
0/3314   23206055936   23206055936          none     none     ---    ---
0/3317   18472996864   0                    none     none     ---    ---
0/3318   22235709440   18446744073708421120 none     none     ---    ---
0/3319   22240333824   0                    none     none     ---    ---
0/3320   22289608704   0                    none     none     ---    ---
0/3321   22289608704   0                    none     none     ---    ---
0/3322   18461151232   0                    none     none     ---    ---
0/3323   18423902208   0                    none     none     ---    ---
0/3324   18423902208   0                    none     none     ---    ---
0/3325   18463506432   0                    none     none     ---    ---
0/3326   18463506432   0                    none     none     ---    ---
0/3327   18463506432   0                    none     none     ---    ---
0/3328   18463506432   0                    none     none     ---    ---
0/3329   18585427968   0                    none     none     ---    ---
0/3330   18621472768   18446744073251348480 none     none     ---    ---
0/3331   18621472768   0                    none     none     ---    ---
0/3332   18621472768   0                    none     none     ---    ---
0/       18783076352   0                    none     none     ---    ---
0/3334   18799804416   0                    none     none     ---    ---
0/3335   18799804416   0                    none     none     ---    ---
0/3336   18816217088   0                    none     none     ---    ---
0/3337   18816266240   0                    none     none     ---    ---
0/3338   18816266240   0                    none     none     ---    ---
0/3339   18816266240   0                    none     none     ---    ---
0/3340   18816364544   0                    none     none     ---    ---
0/3341   7530119168    7530119168           none     none     ---    ---
0/3342   4919283712    0                    none     none     ---    ---
0/3343   4921724928    0                    none     none     ---    ---
0/3344   4921724928    0                    none     none     ---    ---
0/3345   6503317504    18446744073690902528 none     none     ---    ---
0/3346   6503452672    0                    none     none     ---    ---
0/3347   6509514752    0                    none     none     ---    ---
0/3348   6515793920    0                    none     none     ---    ---
0/3349
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-18 09:36, Stéphane Lesimple wrote: Sure, I did a quota disable / quota enable before running the snapshot debug procedure, so the qgroups were clean again when I started:

qgroupid rfer          excl          max_rfer max_excl parent child
0/5      16384         16384         none     none     ---    ---
0/1906   1657848029184 1657848029184 none     none     ---    ---
0/1909   124950921216  124950921216  none     none     ---    ---
0/1911   1054587293696 1054587293696 none     none     ---    ---
0/3270   23727300608   23727300608   none     none     ---    ---
0/3314   23221784576   23221784576   none     none     ---    ---
0/3341   7479275520    7479275520    none     none     ---    ---
0/3367   24185790464   24185790464   none     none     ---    ---

The test is running, I expect to post the results within an hour or two. Well, my system crashed twice while running the procedure... By "crashed" I mean: the machine no longer pings, and nothing is logged in kern.log, unfortunately:

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but are empty:

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows: 0) Rsync data from the next ext4 "snapshot" to the subvolume 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 
2) Create the needed readonly snapshot on btrfs 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it <== 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== The resulting files are available here: http://speed47.net/tmp2/qgroup.tar.gz Run 2 is the more complete one; during run 1 the machine crashed even faster. It's interesting to note, however, that it seems to have crashed the same way and at the same step in the process. As the machine is now, the qgroups seem OK:

~# btrfs qgroup show -pcre --raw /tank/
qgroupid rfer          excl          max_rfer max_excl parent child
0/5      32768         32768         none     none     ---    ---
0/1906   3315696058368 3315696058368 none     none     ---    ---
0/1909   249901842432  249901842432  none     none     ---    ---
0/1911   2109174587392 2109174587392 none     none     ---    ---
0/3270   47454601216   47454601216   none     none     ---    ---
0/3314   46408499200   32768         none     none     ---    ---
0/3341   14991097856   32768         none     none     ---    ---
0/3367   48371580928   48371580928   none     none     ---    ---
0/5335   56523751424   280592384     none     none     ---    ---
0/5336   60175253504   2599960576    none     none     ---    ---
0/5337   45751746560   250888192     none     none     ---    ---
0/5338   45804650496   186531840     none     none     ---    ---
0/5339   45875167232   190521344     none     none     ---    ---
0/5340   45933486080   327680        none     none     ---    ---
0/5341   45933502464   344064        none     none     ---    ---
0/5342   46442815488   35454976      none     none     ---    ---
0/5343   46442520576   30638080      none     none     ---    ---
0/5344   46448312320   36495360      none     none     ---    ---
0/5345   46425235456   86204416      none     none     ---    ---
0/5346   46081941504   119398400     none     none     ---    ---
0/5347   46402715648   55615488      none     none     ---    ---
0/5348   46403534848   50528256      none     none     ---    ---
0/5349   45486301184   91463680      none     none     ---    ---
0/5351   46414635008   393216        none     none     ---    ---
0/5352
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-18 12:15, Stéphane Lesimple wrote: On 2015-09-18 09:36, Stéphane Lesimple wrote: Sure, I did a quota disable / quota enable before running the snapshot debug procedure, so the qgroups were clean again when I started:

qgroupid rfer          excl          max_rfer max_excl parent child
0/5      16384         16384         none     none     ---    ---
0/1906   1657848029184 1657848029184 none     none     ---    ---
0/1909   124950921216  124950921216  none     none     ---    ---
0/1911   1054587293696 1054587293696 none     none     ---    ---
0/3270   23727300608   23727300608   none     none     ---    ---
0/3314   23221784576   23221784576   none     none     ---    ---
0/3341   7479275520    7479275520    none     none     ---    ---
0/3367   24185790464   24185790464   none     none     ---    ---

The test is running, I expect to post the results within an hour or two. Well, my system crashed twice while running the procedure... By "crashed" I mean: the machine no longer pings, and nothing is logged in kern.log, unfortunately:

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed (inconsistency flag cleared)
[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but are empty:

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows: 0) Rsync data from the next ext4 "snapshot" to the subvolume 1) Do 'sync; btrfs 
qgroup show -prce --raw' and save the output <== 2) Create the needed readonly snapshot on btrfs 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== 4) Avoid doing IO if possible until step 6) 5) Do 'btrfs quota rescan -w' and save it <== 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <== The resulting files are available here: http://speed47.net/tmp2/qgroup.tar.gz The run2 is the more complete one, during run1 the machine crashed even faster. It's interesting to note, however, that it seems to have crashed the same way and at the same step in the process. Actually about that, I forgot I did set up netconsole before starting the second run after the first "muted" crash, and it did work : even if I have no logs in kern.log, netconsole managed to send them to my other machine before going down, so here it is : --- [ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at 01f0 [ 5738.172702] IP: [] start_transaction+0x1b/0x580 [btrfs] [ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0 [ 5738.172723] Oops: [#1] SMP [ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp nf_conntrack_sane iptable_security iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp 
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded: netconsole] [ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted 4.3.0-rc1 #1 [ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 01/06/2014 [ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
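The step 0) to 6) capture procedure described above lends itself to a small wrapper. A minimal sketch, assuming a `btrfs` binary in PATH and a hypothetical mountpoint (the `MNT`/`BTRFS` variables are illustration only; the `<name>.stepN` naming follows the file listing above):

```shell
# Sketch of the per-snapshot qgroup capture procedure (steps 1-6).
# MNT is a hypothetical mountpoint; override MNT/BTRFS as needed.
MNT=${MNT:-/mnt/btrfs}
BTRFS=${BTRFS:-btrfs}

capture() {            # $1 = output file
    sync
    $BTRFS qgroup show -prce --raw "$MNT" > "$1"
}

snapshot_and_log() {   # $1 = source subvolume, $2 = snapshot name
    capture "$2.step1"                                 # step 1
    $BTRFS subvolume snapshot -r "$MNT/$1" "$MNT/$2"   # step 2
    capture "$2.step3"                                 # step 3
    # step 4: avoid doing I/O here
    $BTRFS quota rescan -w "$MNT" > "$2.step5"         # step 5
    capture "$2.step6"                                 # step 6
}
```

Called as `snapshot_and_log current snap34` after each rsync (step 0), this leaves exactly the `snapNN.stepN` files shown in the listing.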
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-17 05:03, Qu Wenruo wrote: Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan wrote: Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason: being able to track down how much space my snapshots are taking. Yeah, that's exactly one of the ideal use cases of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck reports such an error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take into consideration during the kernel rework. Would you please provide the full result of the previous btrfsck run with the qgroup errors? Sure, I've saved the log somewhere just in case, here you are:

Counts for qgroup id: 3359 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 16384 exclusive compressed 16384
disk: exclusive 16384 exclusive compressed 16384
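The pattern in this dump is striking: every flagged qgroup's referenced count is off by exactly 32768 bytes (two 16 KiB metadata nodes) and, for all but 0/3366, the exclusive count by 16384 (one node), which suggests a single tree block being accounted differently. When sifting through such logs, a small helper can pull out the distinct flagged ids; a sketch (the `fsck.log` filename is an assumption, not from the thread):

```shell
# List the distinct qgroup ids a saved btrfsck log flags as different.
# $1 = path to the saved log (e.g. a hypothetical fsck.log).
flagged_qgroups() {
    grep -o 'qgroup id: [0-9]*' "$1" | awk '{ print $3 }' | sort -un
}
```

Running it over the log above would collapse the repeated 3361/3362/3363 blocks into one entry each.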
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-16 15:04, Stéphane Lesimple wrote: I also disabled quota because it has almost for sure nothing to do with the bug. As it turns out, it seems that this assertion was completely wrong. I've got balance running for more than 16 hours now, without a crash. This is almost 50% of the work done without any issue. Before, a crash would happen within minutes, sometimes 1 hour, but not much more. The problem is, I didn't change anything on the filesystem, well, apart from the benign quota disable. So Qu's question about the qgroup errors in fsck made me wonder: if I activate quota again, it'll still continue to balance flawlessly, right? Well, it doesn't. I just ran btrfs quota enable on my filesystem; it completed successfully after some minutes (rescan -s said that no rescan was pending). Then less than 5 minutes later, the kernel crashed, at the same BUG_ON() as usual:

[60156.062082] BTRFS info (device dm-3): relocating block group 972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] [ cut here ]
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner is 1 again at this point in the code (this is still kernel 4.3.0-rc1 with my added printks). So I'll disable quota, again, and resume the balance. If I'm right, it should proceed without issue for 18 more hours! Qu, my filesystem is at your disposal :) -- Stéphane. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-17 08:29, Stéphane Lesimple wrote: Le 2015-09-16 15:04, Stéphane Lesimple wrote: I also disabled quota because it has almost for sure nothing to do with the bug. As it turns out, it seems that this assertion was completely wrong. I've got balance running for more than 16 hours now, without a crash. This is almost 50% of the work done without any issue. Before, a crash would happen within minutes, sometimes 1 hour, but not much more. The problem is, I didn't change anything on the filesystem, well, apart from the benign quota disable. So Qu's question about the qgroup errors in fsck made me wonder: if I activate quota again, it'll still continue to balance flawlessly, right? Well, it doesn't. I just ran btrfs quota enable on my filesystem; it completed successfully after some minutes (rescan -s said that no rescan was pending). Then less than 5 minutes later, the kernel crashed, at the same BUG_ON() as usual:

[60156.062082] BTRFS info (device dm-3): relocating block group 972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] [ cut here ]
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner is 1 again at this point in the code (this is still kernel 4.3.0-rc1 with my added printks). So I'll disable quota, again, and resume the balance. If I'm right, it should proceed without issue for 18 more hours!

Damn, wrong again. It just re-crashed without quota enabled :( The fact that it went perfectly well for 17+ hours and crashed minutes after I reactivated quota might be by complete chance then...

[ 5487.706499] {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384 parent=6905020874752 root_objectid=18446744073709551608 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ 5487.706536] [ cut here ]
[ 5487.706539] kernel BUG at fs/btrfs/extent-tree.c:1837!

For reference, the crash I had earlier this morning was as follows:

[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256

So, this is a completely different part of the filesystem. The bug is always the same though: owner=1 where it should be >= 256. Balance cancelled. To me, it sounds like some sort of race condition. But I'm out of ideas on what to test now. -- Stéphane.
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-17 08:42, Qu Wenruo wrote: Stéphane Lesimple wrote on 2015/09/17 08:11 +0200: Le 2015-09-17 05:03, Qu Wenruo wrote: Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan wrote: Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason: being able to track down how much space my snapshots are taking. Yeah, that's exactly one of the ideal use cases of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck reports such an error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take into consideration during the kernel rework. Would you please provide the full result of the previous btrfsck run with the qgroup errors? Sure, I've saved the log somewhere just in case, here you are: [...] Thanks for your log, pretty interesting result. BTW, did you enable qgroup from an old kernel, earlier than 4.2-rc1? If so, I would be much relaxed, as these can be problems of old kernels. The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota under 4.2.0 precisely. My kern.log tends to confirm that (looking for 'qgroup scan completed'). If it's OK for you, would you please enable quota after reproducing the bug, use it for some time, and recheck it? Sure, I've just reproduced the bug twice as I wanted, and posted the info, so now I've cancelled the balance and I can reenable quota. Will do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about it in the following days. Regards, -- Stéphane.
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-17 10:11, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/17 10:02 +0200: Le 2015-09-17 08:42, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/17 08:11 +0200: Le 2015-09-17 05:03, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan a écrit : Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason : being able to track down how much space my snapshots are taking. Yeah, that's completely one of the ideal use case of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck report such error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take much consideration during the kernel rework. Would you please provide the full result of previous btrfsck with qgroup error? Sure, I've saved the log somewhere just in case, here your are : [...] Thanks for your log, pretty interesting result. BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1? If so, I would be much relaxed as they can be the problem of old kernels. The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota under 4.2.0 precisely. My kern.log tends to confirm that (looking for 'qgroup scan completed'). Emmm, seems I need to pay more attention on this case now. Any info about the workload for this btrfs fs? If it's OK for you, would you please enable quota after reproducing the bug and use for sometime and recheck it? Sure, I've just reproduced the bug twice as I wanted, and posted the info, so now I've cancelled the balance and I can reenable quota. Will do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about it in the following days. Regards, Thanks for your patience and detailed report. You're very welcome. But I still have another question, did you do any snapshot deletion after quota enabled? 
(I'll assume you did, as there are a lot of backup snapshots; old ones should already be deleted.) Actually no: this btrfs system is quite new (less than a week old) as I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots were actually rsynced one by one from my hardlinks-based "snapshots" under ext4 (those pseudo-snapshots are created using a program named "rsnapshot", if you know it. This is basically a wrapper to cp -la). I didn't activate yet an automatic snapshot/delete on my btrfs system, due to the bugs I'm tripping on. So no snapshot was deleted. That's one of the known bugs and Mark is working on it actively. If you delete non-empty snapshots a lot, then I'd better add a hot fix to mark qgroup inconsistent after snapshot delete, and trigger a rescan if possible. I've made a btrfs-image of the filesystem just before disabling quotas (which I did to get a clean btrfsck and eliminate quotas from the equation while trying to reproduce the bug I have). Would it be of any use if I drop it somewhere for you to pick it up? (2.9G in size). In the meantime, I've reactivated quotas, umounted the filesystem and ran a btrfsck on it: as you would expect, there's no qgroup problem reported so far. I'll clear all my snapshots, run a quota rescan, then re-create them one by one by rsyncing from my ext4 system I still have. Maybe I'll run into the issue again. -- Stéphane.
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
在 2015年09月17日 18:08, Stéphane Lesimple 写道: Le 2015-09-17 10:11, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/17 10:02 +0200: Le 2015-09-17 08:42, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/17 08:11 +0200: Le 2015-09-17 05:03, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan a écrit : Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason : being able to track down how much space my snapshots are taking. Yeah, that's completely one of the ideal use case of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck report such error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take much consideration during the kernel rework. Would you please provide the full result of previous btrfsck with qgroup error? Sure, I've saved the log somewhere just in case, here your are : [...] Thanks for your log, pretty interesting result. BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1? If so, I would be much relaxed as they can be the problem of old kernels. The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota under 4.2.0 precisely. My kern.log tends to confirm that (looking for 'qgroup scan completed'). Emmm, seems I need to pay more attention on this case now. Any info about the workload for this btrfs fs? If it's OK for you, would you please enable quota after reproducing the bug and use for sometime and recheck it? Sure, I've just reproduced the bug twice as I wanted, and posted the info, so now I've cancelled the balance and I can reenable quota. Will do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about it in the following days. Regards, Thanks for your patience and detailed report. You're very welcome. 
But I still have another question: did you do any snapshot deletion after quota was enabled? (I'll assume you did, as there are a lot of backup snapshots; old ones should already be deleted.) Actually no: this btrfs system is quite new (less than a week old) as I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots were actually rsynced one by one from my hardlinks-based "snapshots" under ext4 (those pseudo-snapshots are created using a program named "rsnapshot", if you know it. This is basically a wrapper to cp -la). I didn't activate yet an automatic snapshot/delete on my btrfs system, due to the bugs I'm tripping on. So no snapshot was deleted. Now things are getting tricky: as all known bugs are ruled out, it must be another hidden bug, even though we tried to rework the qgroup accounting code. That's one of the known bugs and Mark is working on it actively. If you delete non-empty snapshots a lot, then I'd better add a hot fix to mark qgroup inconsistent after snapshot delete, and trigger a rescan if possible. I've made a btrfs-image of the filesystem just before disabling quotas (which I did to get a clean btrfsck and eliminate quotas from the equation while trying to reproduce the bug I have). Would it be of any use if I drop it somewhere for you to pick it up? (2.9G in size). For the mismatch case, a static btrfs-image dump won't really help, as the important point is when and which operation caused the qgroup accounting to mismatch. In the meantime, I've reactivated quotas, umounted the filesystem and ran a btrfsck on it: as you would expect, there's no qgroup problem reported so far. At least, the rescan code is working without problem. I'll clear all my snapshots, run a quota rescan, then re-create them one by one by rsyncing from my ext4 system I still have. Maybe I'll run into the issue again. Would you mind doing the following check for each subvolume rsync?

1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot

The point is, as you mentioned, rescan is working fine, so we can compare the output from 3), 6) and 1) to see which qgroup accounting numbers change. And if they differ, which means the qgroup update at write time OR snapshot creation has something wrong, at least we can narrow the problem down to the qgroup update routine or snapshot creation. Thanks, Qu
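The comparison Qu proposes can be scripted. A minimal sketch, assuming each saved output went into a `<name>.stepN` file (the naming and `check_rescan` helper are illustration, not from the thread):

```shell
# For each snapshot, compare the qgroup dump taken right after snapshot
# creation (step 3) with the one taken after a full rescan (step 6).
# A mismatch means the live accounting diverged from what rescan recomputes.
check_rescan() {
    for base in "$@"; do
        if ! cmp -s "$base.step3" "$base.step6"; then
            echo "$base: rescan changed the accounting"
        fi
    done
}
```

Something like `check_rescan snap32 snap33 snap34` then prints one line per snapshot whose live numbers the rescan had to correct; diffing step 1 against the previous snapshot's step 6 similarly isolates what snapshot creation itself changed.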
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Stéphane Lesimple wrote on 2015/09/17 10:02 +0200: Le 2015-09-17 08:42, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/17 08:11 +0200: Le 2015-09-17 05:03, Qu Wenruo a écrit : Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan a écrit : Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason : being able to track down how much space my snapshots are taking. Yeah, that's completely one of the ideal use case of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck report such error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take much consideration during the kernel rework. Would you please provide the full result of previous btrfsck with qgroup error? Sure, I've saved the log somewhere just in case, here your are : [...] Thanks for your log, pretty interesting result. BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1? If so, I would be much relaxed as they can be the problem of old kernels. The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota under 4.2.0 precisely. My kern.log tends to confirm that (looking for 'qgroup scan completed'). Emmm, seems I need to pay more attention on this case now. Any info about the workload for this btrfs fs? If it's OK for you, would you please enable quota after reproducing the bug and use for sometime and recheck it? Sure, I've just reproduced the bug twice as I wanted, and posted the info, so now I've cancelled the balance and I can reenable quota. Will do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about it in the following days. Regards, Thanks for your patience and detailed report. But I still have another question, did you do any snapshot deletion after quota enabled? 
(I'll assume you did, as there are a lot of backup snapshots; old ones should already be deleted.) That's one of the known bugs and Mark is working on it actively. If you delete non-empty snapshots a lot, then I'd better add a hot fix to mark qgroup inconsistent after snapshot delete, and trigger a rescan if possible. Thanks, Qu
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Stéphane Lesimple wrote on 2015/09/17 08:11 +0200: Le 2015-09-17 05:03, Qu Wenruo wrote: Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: Le 2015-09-16 22:18, Duncan wrote: Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: Well actually it's the (d) option ;) I activate the quota feature for only one reason: being able to track down how much space my snapshots are taking. Yeah, that's exactly one of the ideal use cases of btrfs qgroup. But I'm quite curious about the btrfsck error report on qgroup. If btrfsck reports such an error, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take into consideration during the kernel rework. Would you please provide the full result of the previous btrfsck run with the qgroup errors? Sure, I've saved the log somewhere just in case, here you are:

Counts for qgroup id: 3359 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our:  referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our:  exclusive 16384 exclusive compressed 16384
disk:
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-17 12:41, Qu Wenruo wrote: In the meantime, I've reactivated quotas, umounted the filesystem and ran a btrfsck on it: as you would expect, there's no qgroup problem reported so far. At least, the rescan code is working without problem. I'll clear all my snapshots, run a quota rescan, then re-create them one by one by rsyncing from my ext4 system I still have. Maybe I'll run into the issue again. Would you mind doing the following check for each subvolume rsync?
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot
The point is, as you mentioned, rescan is working fine, so we can compare the output from 3), 6) and 1) to see which qgroup accounting numbers change. And if they differ, which means the qgroup update at write time OR snapshot creation has something wrong, at least we can narrow the problem down to the qgroup update routine or snapshot creation. I was about to do that, but first there's something that sounds strange: I've begun by trashing all my snapshots, then ran a quota rescan, and waited for it to complete, to start on a sane base.

However, this is the output of qgroup show now:

qgroupid rfer excl max_rfer max_excl parent child
0/5 16384 16384 none none --- ---
0/1906 1657848029184 1657848029184 none none --- ---
0/1909 124950921216 124950921216 none none --- ---
0/1911 1054587293696 1054587293696 none none --- ---
0/3270 23727300608 23727300608 none none --- ---
0/3314 23206055936 23206055936 none none --- ---
0/3317 18472996864 0 none none --- ---
0/3318 22235709440 18446744073708421120 none none --- ---
0/3319 22240333824 0 none none --- ---
0/3320 22289608704 0 none none --- ---
0/3321 22289608704 0 none none --- ---
0/3322 18461151232 0 none none --- ---
0/3323 18423902208 0 none none --- ---
0/3324 18423902208 0 none none --- ---
0/3325 18463506432 0 none none --- ---
0/3326 18463506432 0 none none --- ---
0/3327 18463506432 0 none none --- ---
0/3328 18463506432 0 none none --- ---
0/3329 18585427968 0 none none --- ---
0/3330 18621472768 18446744073251348480 none none --- ---
0/3331 18621472768 0 none none --- ---
0/3332 18621472768 0 none none --- ---
0/ 18783076352 0 none none --- ---
0/3334 18799804416 0 none none --- ---
0/3335 18799804416 0 none none --- ---
0/3336 18816217088 0 none none --- ---
0/3337 18816266240 0 none none --- ---
0/3338 18816266240 0 none none --- ---
0/3339 18816266240 0 none none --- ---
0/3340 18816364544 0 none none --- ---
0/3341 7530119168 7530119168 none none --- ---
0/3342 4919283712 0 none none --- ---
0/3343 4921724928 0 none none --- ---
0/3344 4921724928 0 none none --- ---
0/3345 6503317504 18446744073690902528 none none --- ---
0/3346 6503452672 0 none none --- ---
0/3347 6509514752 0 none none --- ---
0/3348 6515793920 0 none none --- ---
0/3349 6515793920 0 none none --- ---
0/3350 6518685696 0 none none --- ---
0/3351 6521511936 0 none none ---
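The huge excl values in this listing (18446744073708421120 for 0/3318, for instance) are small negative numbers printed as unsigned 64-bit integers, i.e. the exclusive counters underflowed past zero. A quick decoding sketch (the helper is illustration only; bc does the arbitrary-precision arithmetic):

```shell
# Interpret an unsigned 64-bit counter as a signed (two's complement) value.
# Values below 2^63 are returned unchanged; larger ones have 2^64 subtracted.
u64_to_signed() {
    if [ "$(echo "$1 >= 9223372036854775808" | bc)" -eq 1 ]; then
        echo "$1 - 18446744073709551616" | bc   # subtract 2^64
    else
        echo "$1"
    fi
}

u64_to_signed 18446744073708421120   # excl of 0/3318: prints -1130496
```

Notably, -1130496 is exactly -69 × 16384: the counter went negative by a whole number of 16 KiB metadata nodes, consistent with tree blocks being un-accounted more often than they were accounted.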
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Le 2015-09-16 07:02, Duncan wrote: Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted: Le 2015-09-15 16:56, Josef Bacik wrote: On 09/15/2015 10:47 AM, Stéphane Lesimple wrote: I've been experiencing repetitive "kernel BUG" occurrences in the past few days trying to balance a raid5 filesystem after adding a new drive. It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools. I've run a scrub on this filesystem after the crash happened twice, and it found no errors. The BUG_ON() condition that my filesystem triggers is the following: BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); // in insert_inline_extent_backref() of extent-tree.c. Does btrfsck complain at all? Just to elucidate a bit... [...] Which is where btrfs check comes in and why JB asked you to run it, since unlike scrub, check is designed to catch filesystem logic errors. Thanks for your clarification Duncan, that makes perfect sense. You're right, even if btrfs scrub didn't complain, btrfsck does:

checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary

This is an actively in-focus bug ATM, and while I'm not a dev and can't tell you for sure that it's behind the specific balance-related crash and traces you posted (tho I believe it so), it certainly has the potential to be that serious, yes. The most common cause is a buggy btrfs-convert that was creating invalid btrfs when converting from ext* at one point. AFAIK they've hotfixed the immediate convert issue, but are still actively working on a longer term proper fix. Meanwhile, while btrfs check does now detect the issue (and even that is quite new code, added in 4.2 I believe), there's still no real fix for what was after all a defective btrfs from the moment the convert was done. [...]
If, however, you created the filesystem using mkfs.btrfs, then the problem must have occurred some other way. Whether there's some other cause beyond the known cause, a buggy btrfs-convert, has in fact been in question, so in this case the devs are likely to be quite interested indeed in your case and perhaps the filesystem history that brought you to this point. The ultimate fix is likely to be the same (unless the devs have you test new fix code for btrfs check --repair), but I'd strongly urge you to delay blowing away the filesystem, if possible, until the devs have a chance to ask you to run other diagnostics and perhaps even get a btrfs-image for them, since you may well have accidentally found a corner-case they'll have trouble reproducing, without your information. Nice to know that this bug was already somewhat known, but I can confirm that it actually doesn't come from an ext4 conversion in my case. Here is the filesystem history, which is actually quite short :
- FS created from scratch, no convert, on 2x4T devices using mkfs.btrfs with raid1 metadata, raid5 data. This was using the 4.2 tools and kernel 3.19, so a couple of incompat features were turned on by default (such as skinny metadata).
- Approx. 4T worth of files copied to it (a bit less; I had around 30G free after the copy).
- Upgraded to kernel 4.2.0.
- Added a third 4T device to the filesystem.
- Ran a balance to get an even repartition of data/metadata among the 3 drives.
- Kernel BUG after a couple of hours. The btrfs balance userspace tool segfaulted at the same time. Due to apport's default configuration (damn you, Ubuntu !), the core file was discarded, but I don't think the segfault is really interesting. The kernel trace is.
This was all done within ~1 week. I've just created an image of the metadata, using btrfs-image -s. The image is 2.9G large; I can drop it somewhere in case a dev would like to have a look at it.
For what it's worth, I've been hitting another kernel BUG, almost certainly related, while trying to dev del the 3rd device, after 8 hours of work (kernel 4.1.7) :

kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248! in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
Trace:
[] ? __percpu_counter_add+0x55/0x70
[] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[] merge_reloc_roots+0xd9/0x240 [btrfs]
[] relocate_block_group+0x269/0x670 [btrfs]
[] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[] btrfs_rm_device+0x323/0x810 [btrfs]
[] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[] ? filemap_map_pages+0x1d4/0x230
[] ? handle_mm_fault+0xd95/0x17e0
[] ? from_kgid_munged+0x12/0x20
[] ? cp_new_stat+0x140/0x160
[] do_vfs_ioctl+0x2f8/0x510
[] ? __do_page_fault+0x1b6/0x450
[] ? SYSC_newstat+0x2f/0x40
[]
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 09/16/15 12:28, Stéphane Lesimple wrote: > Nice to know that this bug was already somewhat known, but I can > confirm that it actually doesn't come from an ext4 conversion in my > case. In that case the "crossing stripe boundary" messages are false positives in btrfs-progs-4.2: http://www.spinics.net/lists/linux-btrfs/msg47059.html This should be fixed in the next release. -h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-16 12:46, Holger Hoffstätte wrote: On 09/16/15 12:28, Stéphane Lesimple wrote: Nice to know that this bug was already somewhat known, but I can confirm that it actually doesn't come from an ext4 conversion in my case. In that case the "crossing stripe boundary" messages are false positives in btrfs-progs-4.2: http://www.spinics.net/lists/linux-btrfs/msg47059.html This should be fixed in the next release. Out of curiosity I compiled the btrfs-progs-4.2 release patched with the diff you're referencing to fix the off-by-one error, and ran btrfsck again. Indeed those errors disappear and my filesystem seems clean in this regard. I also disabled quota because it almost certainly has nothing to do with the bug, and now btrfsck is 100% happy:

checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
UUID: 6bec1608-d9c0-453e-87eb-8b8663c9010d
found 2922178546042 bytes used err is 0
total csum bytes: 2849102736
total tree bytes: 4697341952
total fs tree bytes: 1276395520
total extent tree bytes: 90963968
btree space waste bytes: 640514848
file data blocks allocated: 2959998808064
 referenced 2959997575168
btrfs-progs v4.2-dirty

So this is even more interesting: my filesystem is reported by scrub and fsck as being in perfect shape, but still crashes the kernel from time to time on balance. Next step: reboot under 4.3.0-rc1 with my printk's, run a balance, log the crash, reboot, balance again, crash again, and compare. If the same filesystem spot triggers the crash twice, it would point to an undetected metadata/filesystem internal integrity corruption; if it crashes at 2 different spots, then maybe it's some kind of race condition that, for some reason, my system hits way more often than others. -- Stéphane.
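The off-by-one Holger refers to can be illustrated in a few lines (function names are mine, not the actual btrfs-progs code): the buggy crossing test compared the exclusive end of a metadata extent against the 64KiB stripe boundary instead of its last byte, so an extent ending exactly on a boundary was flagged as crossing.

```python
STRIPE_LEN = 64 * 1024  # btrfs stripe length

def crosses_stripe_fixed(start, length, stripe=STRIPE_LEN):
    # An extent crosses a stripe boundary only if its first and LAST
    # bytes fall in different stripes.
    return start // stripe != (start + length - 1) // stripe

def crosses_stripe_buggy(start, length, stripe=STRIPE_LEN):
    # Off-by-one: uses the exclusive end, so an extent finishing exactly
    # on a stripe boundary is reported as crossing it.
    return start // stripe != (start + length) // stripe

# one of the ranges btrfsck 4.2 reported: [4179166806016, 4179166822400)
start, length = 4179166806016, 16384
print(crosses_stripe_buggy(start, length))  # flagged by the 4.2 check
print(crosses_stripe_fixed(start, length))  # actually fine
```

That particular 16KiB metadata extent starts 16KiB before a 64KiB boundary and ends exactly on it, which is why the unpatched check complained and the patched one does not.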
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-16 22:18, Duncan wrote: Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: On 2015-09-16 12:46, Holger Hoffstätte wrote: I also disabled quota because it almost certainly has nothing to do with the bug, and now btrfsck is 100% happy: Yes. Quotas have been a continuing issue on btrfs. AFAIK, they're on the third rewrite now, and still have some bugs to work out. So what I've been recommending is: unless you're (a) directly and specifically working with the devs to find and fix the quota issues (in which case please keep at it! =:^), either (b) you really depend on quotas working and btrfs isn't appropriate for you at this time, since they're known to be buggy, so use a more mature filesystem where they're known to work, or (c) you don't actually need quotas at all, so disable them and remove the quota tracking metadata, thus avoiding the bugs in the feature entirely. =:^) Well, actually it's the (d) option ;) I activated the quota feature for only one reason: being able to track down how much space my snapshots are taking. I've been using ZFS in the past, and I was really missing a "zfs list"-like command under btrfs, able to tell you how much space a given snapshot is actually taking.
With quota enabled, I was able to somewhat mimic zfs list with a perl script I wrote, btrfs-list :

PATH                              ID    TYPE    REFER    USED
'tank'                            -1    df      -        2.66T (13.26G free)
/tank                             5     vol     48.00K   48.00K
media                             1906  subvol  1.04T    1.04T
photos                            1909  subvol  116.37G  116.37G
main                              1911  subvol  973.23G  973.23G
bkp-slash                         3270  subvol  15.86G   15.86G
bkp-quasar                        3314  subvol  18.26G   16.00K
.snaps/bkp-quasar@2015-01-17      3317  rosnap  17.77G   40.76M
.snaps/bkp-quasar@2015-03-06      3318  rosnap  17.89G   88.88M
.snaps/bkp-quasar@2015-04-05      3319  rosnap  17.92G   90.97M
.snaps/bkp-quasar@2015-05-31      3320  rosnap  17.95G   1.02M
.snaps/bkp-quasar@2015-06-13      3321  rosnap  17.95G   760.00K
.snaps/bkp-quasar@2015-07-26      3322  rosnap  18.19G   17.88M
.snaps/bkp-quasar@2015-07-31      3323  rosnap  18.19G   14.58M
.snaps/bkp-quasar@2015-08-03      3324  rosnap  18.19G   17.43M
bkp-liznodisk                     3341  subvol  7.01G    16.00K
.snaps/bkp-liznodisk@2015-03-01   3342  rosnap  6.96G    75.37M
.snaps/bkp-liznodisk@2015-03-28   3343  rosnap  6.98G    84.93M
.snaps/bkp-liznodisk@2015-04-26   3344  rosnap  6.96G    67.14M
.snaps/bkp-liznodisk@2015-05-24   3345  rosnap  6.95G    47.67M
.snaps/bkp-liznodisk@2015-06-27   3346  rosnap  6.96G    67.97M
.snaps/bkp-liznodisk@2015-07-25   3347  rosnap  6.98G    60.30M
.snaps/bkp-liznodisk@2015-08-16   3348  rosnap  7.10G    159.44M
bkp-skyline                       3367  subvol  22.52G   22.52G

I just pushed it to https://github.com/speed47/btrfs-list, if anybody is interested. Anyway, balance is running again for 7+ hours, trying to reproduce the bug twice, but no crash yet ... should happen soon now ! -- Stéphane.
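For anyone curious how the REFER/USED columns map onto qgroups: REFER is a subvolume's qgroup rfer, and USED for a read-only snapshot is its excl, i.e. the space you would get back by deleting only that snapshot. A minimal sketch of that mapping (this is not the actual perl script; the function name and sample values are illustrative):

```python
def qgroup_to_zfs_list(lines):
    """Map raw `btrfs qgroup show` rows ('qgroupid rfer excl', raw byte
    counts) to zfs-list style (subvolume id, REFER, USED) tuples."""
    rows = []
    for line in lines:
        qgroupid, rfer, excl = line.split()
        subvol_id = int(qgroupid.split("/")[1])  # level-0 qgroup "0/<subvolid>"
        rows.append((subvol_id, int(rfer), int(excl)))
    return rows

# illustrative values: a subvolume and one read-only snapshot of it
sample = ["0/3314 19608634982 16384", "0/3317 19081494528 42737664"]
for subvol_id, refer, used in qgroup_to_zfs_list(sample):
    print("%6d  REFER=%11d  USED=%10d" % (subvol_id, refer, used))
```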
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Stéphane Lesimple wrote on 2015/09/16 22:41 +0200: On 2015-09-16 22:18, Duncan wrote: Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted: On 2015-09-16 12:46, Holger Hoffstätte wrote: I also disabled quota because it almost certainly has nothing to do with the bug, and now btrfsck is 100% happy: Yes. Quotas have been a continuing issue on btrfs. AFAIK, they're on the third rewrite now, and still have some bugs to work out. So what I've been recommending is: unless you're (a) directly and specifically working with the devs to find and fix the quota issues (in which case please keep at it! =:^), either (b) you really depend on quotas working and btrfs isn't appropriate for you at this time, since they're known to be buggy, so use a more mature filesystem where they're known to work, or (c) you don't actually need quotas at all, so disable them and remove the quota tracking metadata, thus avoiding the bugs in the feature entirely. =:^) Well, actually it's the (d) option ;) I activated the quota feature for only one reason: being able to track down how much space my snapshots are taking. I've been using ZFS in the past, and I was really missing a "zfs list"-like command under btrfs, able to tell you how much space a given snapshot is actually taking.
With quota enabled, I was able to somewhat mimic zfs list with a perl script I wrote, btrfs-list :

PATH                              ID    TYPE    REFER    USED
'tank'                            -1    df      -        2.66T (13.26G free)
/tank                             5     vol     48.00K   48.00K
media                             1906  subvol  1.04T    1.04T
photos                            1909  subvol  116.37G  116.37G
main                              1911  subvol  973.23G  973.23G
bkp-slash                         3270  subvol  15.86G   15.86G
bkp-quasar                        3314  subvol  18.26G   16.00K
.snaps/bkp-quasar@2015-01-17      3317  rosnap  17.77G   40.76M
.snaps/bkp-quasar@2015-03-06      3318  rosnap  17.89G   88.88M
.snaps/bkp-quasar@2015-04-05      3319  rosnap  17.92G   90.97M
.snaps/bkp-quasar@2015-05-31      3320  rosnap  17.95G   1.02M
.snaps/bkp-quasar@2015-06-13      3321  rosnap  17.95G   760.00K
.snaps/bkp-quasar@2015-07-26      3322  rosnap  18.19G   17.88M
.snaps/bkp-quasar@2015-07-31      3323  rosnap  18.19G   14.58M
.snaps/bkp-quasar@2015-08-03      3324  rosnap  18.19G   17.43M
bkp-liznodisk                     3341  subvol  7.01G    16.00K
.snaps/bkp-liznodisk@2015-03-01   3342  rosnap  6.96G    75.37M
.snaps/bkp-liznodisk@2015-03-28   3343  rosnap  6.98G    84.93M
.snaps/bkp-liznodisk@2015-04-26   3344  rosnap  6.96G    67.14M
.snaps/bkp-liznodisk@2015-05-24   3345  rosnap  6.95G    47.67M
.snaps/bkp-liznodisk@2015-06-27   3346  rosnap  6.96G    67.97M
.snaps/bkp-liznodisk@2015-07-25   3347  rosnap  6.98G    60.30M
.snaps/bkp-liznodisk@2015-08-16   3348  rosnap  7.10G    159.44M
bkp-skyline                       3367  subvol  22.52G   22.52G

I just pushed it to https://github.com/speed47/btrfs-list, if anybody is interested. Anyway, balance is running again for 7+ hours, trying to reproduce the bug twice, but no crash yet ... should happen soon now ! Yeah, that's exactly one of the ideal use cases of btrfs qgroups. But I'm quite curious about the btrfsck error report on qgroups. If btrfsck reports such errors, it means either I'm too confident about the recent qgroup accounting rework, or btrfsck has some bug which I didn't take into consideration during the kernel rework. Would you please provide the full output of the previous btrfsck run with the qgroup errors?
Thanks, Qu
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted: > On 2015-09-15 16:56, Josef Bacik wrote: >> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote: I've been experiencing repeated "kernel BUG" occurrences in the past few days trying to balance a raid5 filesystem after adding a new drive. It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools. >>> >>> I've run a scrub on this filesystem after the crash happened twice, >>> and it found no errors. >>> >>> The BUG_ON() condition that my filesystem triggers is the following : >>> >>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); >>> // in insert_inline_extent_backref() of extent-tree.c. >>> >> Does btrfsck complain at all? Just to elucidate a bit... Scrub is designed to detect, and where there's a second copy available (dup or raid1/10 modes; raid5/6 modes can reconstruct from parity) correct, exactly one problem: corruption where the checksum stored at data write doesn't match that computed on data read back from storage. As such, it detects/corrects media errors and (perhaps more commonly) data corrupted by crashes in the middle of a write. But if the data was bad when it was written in the first place, the checksum covering it simply validates what was already bad before the write happened: scrub will be none the wiser and will happily validate the incorrect data, since it's a totally valid checksum covering data that was bad before the checksum was ever created. Which is where btrfs check comes in and why JB asked you to run it, since unlike scrub, check is designed to catch filesystem logic errors. > Thanks for your suggestion.
> You're right, even if btrfs scrub didn't complain, btrfsck does :
>
> checking extents
> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
> bad metadata [4179166937088, 4179166953472) crossing stripe boundary
This is an actively in-focus bug ATM, and while I'm not a dev and can't tell you for sure that it's behind the specific balance-related crash and traces you posted (though I believe it is), it certainly has the potential to be that serious, yes. The most common cause is a buggy btrfs-convert that at one point was creating invalid btrfs filesystems when converting from ext*. AFAIK they've hotfixed the immediate convert issue, but are still actively working on a longer-term proper fix. Meanwhile, while btrfs check does now detect the issue (and even that is quite new code, added in 4.2 I believe), there's still no real fix for what was, after all, a defective btrfs from the moment the convert was done. So where that's the cause, i.e. the filesystem was created from an ext* fs using a buggy btrfs-convert and is thus actually invalid due to this cross-stripe metadata, the current fix is to back up the files you want to keep (and FWIW, as any good sysadmin will tell you, a backup that hasn't been tested restorable isn't yet a backup, as the job isn't complete), then blow away and recreate the filesystem properly using mkfs.btrfs, and of course then restore to the new filesystem. If, however, you created the filesystem using mkfs.btrfs, then the problem must have occurred some other way. Whether there's some other cause beyond the known cause, a buggy btrfs-convert, has in fact been in question, so in this case the devs are likely to be quite interested indeed in your case and perhaps the filesystem history that brought you to this point.
The ultimate fix is likely to be the same (unless the devs have you test new fix code for btrfs check --repair), but I'd strongly urge you to delay blowing away the filesystem, if possible, until the devs have a chance to ask you to run other diagnostics and perhaps even get a btrfs-image for them, since you may well have accidentally found a corner-case they'll have trouble reproducing, without your information. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
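Duncan's scrub-versus-check distinction can be sketched in a few lines. This is a toy model (zlib's crc32 stands in for the crc32c btrfs actually uses): scrub can only re-verify data against the checksum computed at write time, so data that was already logically wrong when written verifies cleanly, while logic errors need a checker that understands the filesystem's invariants.

```python
import zlib

def write_block(data):
    # At write time the filesystem checksums whatever it is handed --
    # including data that is already logically wrong.
    return data, zlib.crc32(data)

def scrub(block):
    # Scrub only re-reads and re-checksums: it catches corruption that
    # happened AFTER the write, not bad data checksummed at write time.
    data, csum = block
    return zlib.crc32(data) == csum

good = write_block(b"valid extent payload")
bad_at_write = write_block(b"logically wrong payload")  # garbage in, csum of garbage out
media_error = (b"bit-flipped on disk!", good[1])        # data no longer matches its csum

print(scrub(good))          # passes
print(scrub(bad_at_write))  # also passes -- scrub cannot see this
print(scrub(media_error))   # fails -- this is what scrub catches
```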
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 2015-09-15 16:56, Josef Bacik wrote: On 09/15/2015 10:47 AM, Stéphane Lesimple wrote: I've been experiencing repeated "kernel BUG" occurrences in the past few days trying to balance a raid5 filesystem after adding a new drive. It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools. I've run a scrub on this filesystem after the crash happened twice, and it found no errors. The BUG_ON() condition that my filesystem triggers is the following : BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); // in insert_inline_extent_backref() of extent-tree.c. I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before the BUG_ON(), to dump the parameters passed to insert_inline_extent_backref() when the problem occurs. Here is an excerpt of the resulting dmesg :
{btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432 root_objectid=3339 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1837!
I'll retry with the exact same kernel once I get the machine back up, and see if the bug happens again at the same filesystem spot or at a different one. The variable amount of time elapsed after a balance start before I get the bug suggests that it would be a different one. Does btrfsck complain at all? Thanks for your suggestion. You're right, even if btrfs scrub didn't complain, btrfsck does :
checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
[... some more ...]
extent buffer leak: start 4561066901504 len 16384
extent buffer leak: start 4561078812672 len 16384
extent buffer leak: start 4561078861824 len 16384
[... some more ...]
then some complaints about mismatched counts for qgroups.
I can see from the btrfsck source code that --repair will not work here, so I didn't try it. I'm not sure whether those errors are a cause or a consequence of the bug. As the filesystem was only a few days old and as there was always a balance running during the crashes, I would be tempted to think they're actually a consequence, but I can't be sure. In your experience, could these inconsistencies cause the crash? If you think so, then I'll btrfs dev del the 3rd device, then remount the array degraded with just 1 disk, create a new btrfs filesystem from scratch on the second, copy the data with single redundancy, then re-add the 2 disks and balance convert to raid5. If you think not, then this array could still help you debug a corner case, and I can keep it that way for a couple of days if more testing/debugging is needed. Thanks, -- Stéphane
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
I've been experiencing repeated "kernel BUG" occurrences in the past few days trying to balance a raid5 filesystem after adding a new drive. It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools. I've run a scrub on this filesystem after the crash happened twice, and it found no errors. The BUG_ON() condition that my filesystem triggers is the following :

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); // in insert_inline_extent_backref() of extent-tree.c.

I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before the BUG_ON(), to dump the parameters passed to insert_inline_extent_backref() when the problem occurs. Here is an excerpt of the resulting dmesg :

{btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432 root_objectid=3339 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1837!

I'll retry with the exact same kernel once I get the machine back up, and see if the bug happens again at the same filesystem spot or at a different one. The variable amount of time elapsed after a balance start before I get the bug suggests that it would be a different one. -- Stéphane.
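For context on the BUG_ON itself: objectids below BTRFS_FIRST_FREE_OBJECTID (256) are reserved for internal trees (owner=1 is the root tree's objectid), so an inline backref with owner=1 reaching the data-backref update path means a metadata reference is being treated as a file-data reference. A rough Python model of the guard (this is not the kernel code; the function name is mine):

```python
BTRFS_ROOT_TREE_OBJECTID = 1
BTRFS_FIRST_FREE_OBJECTID = 256  # first objectid usable by regular inodes

def update_inline_data_backref(owner):
    """Model of the guard in insert_inline_extent_backref(): when an
    existing inline backref is found and its refcount is to be updated,
    the owner must be a file objectid; a reserved tree id here means
    metadata and data references have been mixed up somewhere."""
    if owner < BTRFS_FIRST_FREE_OBJECTID:
        # kernel: BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID)
        raise RuntimeError("BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID)")
    return owner

update_inline_data_backref(257)  # a normal file objectid: fine
try:
    # the crash parameters reported above: owner=1 -> the BUG_ON fires
    update_inline_data_backref(BTRFS_ROOT_TREE_OBJECTID)
except RuntimeError as err:
    print("kernel BUG:", err)
```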
Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
On 09/15/2015 10:47 AM, Stéphane Lesimple wrote: I've been experiencing repeated "kernel BUG" occurrences in the past few days trying to balance a raid5 filesystem after adding a new drive. It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools. I've run a scrub on this filesystem after the crash happened twice, and it found no errors. The BUG_ON() condition that my filesystem triggers is the following : BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); // in insert_inline_extent_backref() of extent-tree.c. I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before the BUG_ON(), to dump the parameters passed to insert_inline_extent_backref() when the problem occurs. Here is an excerpt of the resulting dmesg : {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID {btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432 root_objectid=3339 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256 [ cut here ] kernel BUG at fs/btrfs/extent-tree.c:1837! I'll retry with the exact same kernel once I get the machine back up, and see if the bug happens again at the same filesystem spot or at a different one. The variable amount of time elapsed after a balance start before I get the bug suggests that it would be a different one. Does btrfsck complain at all? Thanks, Josef