Re: btrfs bio linked list corruption.
On Sat, Oct 15, 2016 at 08:42:40PM -0400, Dave Jones wrote:
On Thu, Oct 13, 2016 at 05:18:46PM -0400, Chris Mason wrote:
> > > > .. and of course the first thing that happens is a completely different
> > > > btrfs trace..
> > > >
> > > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > > > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> > > >  c900019076a8 b731ff3c
> > > >  c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
> > > >  0801 880501cfa2a8 008a 008a
> > >
> > > This isn't even IO. Uuug. We're going to need a fast enough test
> > > that we can bisect.
> >
> > Progress...
> > I've found that this combination of syscalls..
> >
> > ./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2
> >
> > hits one of these two bugs in a few minutes runtime.
> >
> > Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
> > Mix them together though, and something goes awry.
>
> Hasn't triggered here yet. I'll leave it running though.

The hits keep coming..

BUG: Bad page state in process kworker/u8:12 pfn:4988fa
page:ea0012623e80 count:0 mapcount:0 mapping:8804450456e0 index:0x9

Hmpf, I've had this running since Friday without failing. Can you send me your .config please?

-chris
Re: btrfs bio linked list corruption.
On Thu, Oct 13, 2016 at 05:18:46PM -0400, Chris Mason wrote:
> > > > .. and of course the first thing that happens is a completely different
> > > > btrfs trace..
> > > >
> > > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > > > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> > > >  c900019076a8 b731ff3c
> > > >  c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
> > > >  0801 880501cfa2a8 008a 008a
> > >
> > > This isn't even IO. Uuug. We're going to need a fast enough test
> > > that we can bisect.
> >
> > Progress...
> > I've found that this combination of syscalls..
> >
> > ./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2
> >
> > hits one of these two bugs in a few minutes runtime.
> >
> > Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
> > Mix them together though, and something goes awry.
>
> Hasn't triggered here yet. I'll leave it running though.

The hits keep coming..

BUG: Bad page state in process kworker/u8:12 pfn:4988fa
page:ea0012623e80 count:0 mapcount:0 mapping:8804450456e0 index:0x9
flags: 0x400c(referenced|uptodate)
page dumped because: non-NULL mapping
CPU: 2 PID: 1388 Comm: kworker/u8:12 Not tainted 4.8.0-think+ #18
Workqueue: writeback wb_workfn (flush-btrfs-1)
 c9aef7e8 81320e7c ea0012623e80 819fe6ec
 c9aef810 81159b3f ea0012623e80 400c
 c9aef820 81159bfa c9aef868
Call Trace:
 [] dump_stack+0x4f/0x73
 [] bad_page+0xbf/0x120
 [] free_pages_check_bad+0x5a/0x70
 [] free_hot_cold_page+0x20b/0x270
 [] free_hot_cold_page_list+0x2b/0x50
 [] release_pages+0x2d2/0x380
 [] __pagevec_release+0x22/0x30
 [] extent_write_cache_pages.isra.48.constprop.63+0x350/0x430 [btrfs]
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? get_lock_stats+0x19/0x50
 [] extent_writepages+0x58/0x80 [btrfs]
 [] ? btrfs_releasepage+0x40/0x40 [btrfs]
 [] btrfs_writepages+0x23/0x30 [btrfs]
 [] do_writepages+0x1c/0x30
 [] __writeback_single_inode+0x33/0x180
 [] writeback_sb_inodes+0x2cb/0x5d0
 [] __writeback_inodes_wb+0x8d/0xc0
 [] wb_writeback+0x203/0x210
 [] wb_workfn+0xe7/0x2a0
 [] ? __lock_acquire.isra.32+0x1cf/0x8c0
 [] process_one_work+0x1da/0x4b0
 [] ? process_one_work+0x17a/0x4b0
 [] worker_thread+0x49/0x490
 [] ? process_one_work+0x4b0/0x4b0
 [] ? process_one_work+0x4b0/0x4b0
Re: btrfs bio linked list corruption.
On Thu, Oct 13, 2016 at 05:18:46PM -0400, Chris Mason wrote:
> > > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > > > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> > > >  c900019076a8 b731ff3c
> > > >  c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
> > > >  0801 880501cfa2a8 008a 008a
> > >
> > > This isn't even IO. Uuug. We're going to need a fast enough test
> > > that we can bisect.
> >
> > Progress...
> > I've found that this combination of syscalls..
> >
> > ./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2
> >
> > hits one of these two bugs in a few minutes runtime.
> >
> > Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
> > Mix them together though, and something goes awry.
>
> Hasn't triggered here yet. I'll leave it running though.

With that combo of params I triggered it 3-4 times in a row within minutes.. Then as soon as I posted, it stopped being so easy to repro. There's some other variable I haven't figured out yet (maybe the random way that files get opened in fds/testfiles.c), but it does seem to point at the xattr changes. I'll poke at it some more tomorrow.

Dave
Re: btrfs bio linked list corruption.
On 10/13/2016 02:16 PM, Dave Jones wrote:
On Wed, Oct 12, 2016 at 10:42:46AM -0400, Chris Mason wrote:
> On 10/12/2016 10:40 AM, Dave Jones wrote:
> > On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> > > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> > > >
> > > > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > > > This is from Linus' current tree, with Al's iovec fixups on top.
> > > > >
> > > > > [ cut here ]
> > > > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > > > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > > > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > > > >  c9d87458 8d32007c c9d874a8
> > > > >  c9d87498 8d07a6c1 00210246 88050388e880
> > >
> > > I hit this again overnight, it's the same trace, the only difference
> > > being slightly different addresses in the list pointers:
> > >
> > > [42572.777196] list_add corruption. prev->next should be next (e8806648), but was c9647cd8. (prev=880503a0ba00).
> > >
> > > I'm actually a little surprised that ->next was the same across two
> > > reboots on two different kernel builds. That might be a sign this is
> > > more repeatable than I'd thought, even if it does take hours of runtime
> > > right now to trigger it. I'll try and narrow the scope of what trinity
> > > is doing to see if I can make it happen faster.
> >
> > .. and of course the first thing that happens is a completely different
> > btrfs trace..
> >
> > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> >  c900019076a8 b731ff3c
> >  c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
> >  0801 880501cfa2a8 008a 008a
>
> This isn't even IO. Uuug. We're going to need a fast enough test
> that we can bisect.

Progress...
I've found that this combination of syscalls..

./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2

hits one of these two bugs in a few minutes runtime.

Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
Mix them together though, and something goes awry.

Hasn't triggered here yet. I'll leave it running though.

-chris
Re: btrfs bio linked list corruption.
On Wed, Oct 12, 2016 at 10:42:46AM -0400, Chris Mason wrote:
> On 10/12/2016 10:40 AM, Dave Jones wrote:
> > On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> > > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> > > >
> > > > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > > > This is from Linus' current tree, with Al's iovec fixups on top.
> > > > >
> > > > > [ cut here ]
> > > > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > > > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > > > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > > > >  c9d87458 8d32007c c9d874a8
> > > > >  c9d87498 8d07a6c1 00210246 88050388e880
> > >
> > > I hit this again overnight, it's the same trace, the only difference
> > > being slightly different addresses in the list pointers:
> > >
> > > [42572.777196] list_add corruption. prev->next should be next (e8806648), but was c9647cd8. (prev=880503a0ba00).
> > >
> > > I'm actually a little surprised that ->next was the same across two
> > > reboots on two different kernel builds. That might be a sign this is
> > > more repeatable than I'd thought, even if it does take hours of runtime
> > > right now to trigger it. I'll try and narrow the scope of what trinity
> > > is doing to see if I can make it happen faster.
> >
> > .. and of course the first thing that happens is a completely different
> > btrfs trace..
> >
> > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> >  c900019076a8 b731ff3c
> >  c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
> >  0801 880501cfa2a8 008a 008a
>
> This isn't even IO. Uuug. We're going to need a fast enough test
> that we can bisect.

Progress...
I've found that this combination of syscalls..

./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2

hits one of these two bugs in a few minutes runtime.

Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
Mix them together though, and something goes awry.

Dave
Re: btrfs bio linked list corruption.
On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> >
> > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > This is from Linus' current tree, with Al's iovec fixups on top.
> > >
> > > [ cut here ]
> > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > >  c9d87458 8d32007c c9d874a8
> > >  c9d87498 8d07a6c1 00210246 88050388e880
>
> I hit this again overnight, it's the same trace, the only difference
> being slightly different addresses in the list pointers:
>
> [42572.777196] list_add corruption. prev->next should be next (e8806648), but was c9647cd8. (prev=880503a0ba00).
>
> I'm actually a little surprised that ->next was the same across two
> reboots on two different kernel builds. That might be a sign this is
> more repeatable than I'd thought, even if it does take hours of runtime
> right now to trigger it. I'll try and narrow the scope of what trinity
> is doing to see if I can make it happen faster.

.. and of course the first thing that happens is a completely different btrfs trace..

WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
 c900019076a8 b731ff3c
 c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
 0801 880501cfa2a8 008a 008a
Call Trace:
 [] dump_stack+0x4f/0x73
 [] __warn+0xc1/0xe0
 [] warn_slowpath_null+0x18/0x20
 [] start_transaction+0x40a/0x440 [btrfs]
 [] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [] btrfs_join_transaction+0x12/0x20 [btrfs]
 [] cow_file_range_inline+0xef/0x830 [btrfs]
 [] cow_file_range.isra.64+0x365/0x480 [btrfs]
 [] ? _raw_spin_unlock+0x2c/0x50
 [] ? release_extent_buffer+0x9f/0x110 [btrfs]
 [] run_delalloc_nocow+0x409/0xbd0 [btrfs]
 [] ? get_lock_stats+0x19/0x50
 [] run_delalloc_range+0x38a/0x3e0 [btrfs]
 [] writepage_delalloc.isra.47+0x10a/0x190 [btrfs]
 [] __extent_writepage+0xd8/0x2c0 [btrfs]
 [] extent_write_cache_pages.isra.44.constprop.63+0x2ce/0x430 [btrfs]
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? get_lock_stats+0x19/0x50
 [] extent_writepages+0x58/0x80 [btrfs]
 [] ? btrfs_releasepage+0x40/0x40 [btrfs]
 [] btrfs_writepages+0x23/0x30 [btrfs]
 [] do_writepages+0x1c/0x30
 [] __filemap_fdatawrite_range+0xc1/0x100
 [] filemap_fdatawrite_range+0xe/0x10
 [] btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
 [] btrfs_wait_ordered_range+0x40/0x100 [btrfs]
 [] btrfs_sync_file+0x285/0x390 [btrfs]
 [] vfs_fsync_range+0x46/0xa0
 [] do_fsync+0x38/0x60
 [] SyS_fsync+0xb/0x10
 [] do_syscall_64+0x5c/0x170
 [] entry_SYSCALL64_slow_path+0x25/0x25
Re: btrfs bio linked list corruption.
On 10/12/2016 10:40 AM, Dave Jones wrote:
On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> >
> > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > This is from Linus' current tree, with Al's iovec fixups on top.
> > >
> > > [ cut here ]
> > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > >  c9d87458 8d32007c c9d874a8
> > >  c9d87498 8d07a6c1 00210246 88050388e880
>
> I hit this again overnight, it's the same trace, the only difference
> being slightly different addresses in the list pointers:
>
> [42572.777196] list_add corruption. prev->next should be next (e8806648), but was c9647cd8. (prev=880503a0ba00).
>
> I'm actually a little surprised that ->next was the same across two
> reboots on two different kernel builds. That might be a sign this is
> more repeatable than I'd thought, even if it does take hours of runtime
> right now to trigger it. I'll try and narrow the scope of what trinity
> is doing to see if I can make it happen faster.

.. and of course the first thing that happens is a completely different btrfs trace..

WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
 c900019076a8 b731ff3c
 c900019076e8 b707a6c1 01e9f5806ce0 8804f74c4d98
 0801 880501cfa2a8 008a 008a

This isn't even IO. Uuug. We're going to need a fast enough test that we can bisect.

-chris
Re: btrfs bio linked list corruption.
On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
>
> On 10/11/2016 10:45 AM, Dave Jones wrote:
> > This is from Linus' current tree, with Al's iovec fixups on top.
> >
> > [ cut here ]
> > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> >  c9d87458 8d32007c c9d874a8
> >  c9d87498 8d07a6c1 00210246 88050388e880

I hit this again overnight, it's the same trace, the only difference being slightly different addresses in the list pointers:

[42572.777196] list_add corruption. prev->next should be next (e8806648), but was c9647cd8. (prev=880503a0ba00).

I'm actually a little surprised that ->next was the same across two reboots on two different kernel builds. That might be a sign this is more repeatable than I'd thought, even if it does take hours of runtime right now to trigger it. I'll try and narrow the scope of what trinity is doing to see if I can make it happen faster.

Dave
Re: btrfs bio linked list corruption.
On 10/11/2016 10:45 AM, Dave Jones wrote:
> This is from Linus' current tree, with Al's iovec fixups on top.
>
> [ cut here ]
> WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
>  c9d87458 8d32007c c9d874a8
>  c9d87498 8d07a6c1 00210246 88050388e880
>  880503878b80 e8806648 e8c06600 880502808008
> Call Trace:
>  [] dump_stack+0x4f/0x73
>  [] __warn+0xc1/0xe0
>  [] warn_slowpath_fmt+0x5a/0x80
>  [] __list_add+0x89/0xb0
>  [] blk_sq_make_request+0x2f8/0x350

	/*
	 * A task plug currently exists. Since this is completely lockless,
	 * utilize that to temporarily store requests until the task is
	 * either done or scheduled away.
	 */
	plug = current->plug;
	if (plug) {
		blk_mq_bio_to_request(rq, bio);
		if (!request_count)
			trace_block_plug(q);

		blk_mq_put_ctx(data.ctx);

		if (request_count >= BLK_MAX_REQUEST_COUNT) {
			blk_flush_plug_list(plug, false);
			trace_block_plug(q);
		}

		list_add_tail(&rq->queuelist, &plug->mq_list);
		^^

Dave, is this where we're crashing? This seems strange.

-chris
Re: btrfs bio linked list corruption.
On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
>
> On 10/11/2016 10:45 AM, Dave Jones wrote:
> > This is from Linus' current tree, with Al's iovec fixups on top.
> >
> > [ cut here ]
> > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
> > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> >  c9d87458 8d32007c c9d874a8
> >  c9d87498 8d07a6c1 00210246 88050388e880
> >  880503878b80 e8806648 e8c06600 880502808008
> > Call Trace:
> >  [] dump_stack+0x4f/0x73
> >  [] __warn+0xc1/0xe0
> >  [] warn_slowpath_fmt+0x5a/0x80
> >  [] __list_add+0x89/0xb0
> >  [] blk_sq_make_request+0x2f8/0x350
>
> > 	/*
> > 	 * A task plug currently exists. Since this is completely lockless,
> > 	 * utilize that to temporarily store requests until the task is
> > 	 * either done or scheduled away.
> > 	 */
> > 	plug = current->plug;
> > 	if (plug) {
> > 		blk_mq_bio_to_request(rq, bio);
> > 		if (!request_count)
> > 			trace_block_plug(q);
> >
> > 		blk_mq_put_ctx(data.ctx);
> >
> > 		if (request_count >= BLK_MAX_REQUEST_COUNT) {
> > 			blk_flush_plug_list(plug, false);
> > 			trace_block_plug(q);
> > 		}
> >
> > 		list_add_tail(&rq->queuelist, &plug->mq_list);
> > 		^^
>
> Dave, is this where we're crashing? This seems strange.

According to objdump -S ..

8130a1b7:	48 8b 70 50	mov    0x50(%rax),%rsi
		list_add_tail(&rq->queuelist, &ctx->rq_list);
8130a1bb:	48 8d 50 48	lea    0x48(%rax),%rdx
8130a1bf:	48 89 45 a8	mov    %rax,-0x58(%rbp)
8130a1c3:	e8 38 44 03 00	callq  8133e600 <__list_add>
	blk_mq_hctx_mark_pending(hctx, ctx);
8130a1c8:	48 8b 45 a8	mov    -0x58(%rbp),%rax
8130a1cc:	4c 89 ff	mov    %r15,%rdi

That looks like the list_add_tail from __blk_mq_insert_req_list

Dave
Re: btrfs bio linked list corruption.
On Tue, Oct 11, 2016 at 11:20:41AM -0400, Chris Mason wrote:
>
> On 10/11/2016 11:19 AM, Dave Jones wrote:
> > On Tue, Oct 11, 2016 at 04:11:39PM +0100, Al Viro wrote:
> > > On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote:
> > > > This is from Linus' current tree, with Al's iovec fixups on top.
> > >
> > > Those iovec fixups are in the current tree...
> >
> > ah yeah, git quietly dropped my local copy when I rebased so I didn't
> > notice.
> >
> > > TBH, I don't see anything
> > > in splice-related stuff that could come anywhere near that (short of
> > > some general memory corruption having random effects of that sort).
> > >
> > > Could you try to bisect that sucker, or is it too hard to reproduce?
> >
> > Only hit it the once overnight so far. Will see if I can find a better way
> > to reproduce today.
>
> This call trace is reading metadata so we can finish the truncate. I'd
> say adding more memory pressure would make it happen more often.

That story checks out. There were a bunch of oom's in the log before this.

Dave
btrfs bio linked list corruption.
This is from Linus' current tree, with Al's iovec fixups on top.

[ cut here ]
WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80).
CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
 c9d87458 8d32007c c9d874a8
 c9d87498 8d07a6c1 00210246 88050388e880
 880503878b80 e8806648 e8c06600 880502808008
Call Trace:
 [] dump_stack+0x4f/0x73
 [] __warn+0xc1/0xe0
 [] warn_slowpath_fmt+0x5a/0x80
 [] __list_add+0x89/0xb0
 [] blk_sq_make_request+0x2f8/0x350
 [] ? generic_make_request+0xec/0x240
 [] generic_make_request+0xf9/0x240
 [] submit_bio+0x78/0x150
 [] ? __percpu_counter_add+0x85/0xb0
 [] btrfs_map_bio+0x19e/0x330 [btrfs]
 [] btree_submit_bio_hook+0xfa/0x110 [btrfs]
 [] submit_one_bio+0x65/0xa0 [btrfs]
 [] read_extent_buffer_pages+0x2f0/0x3d0 [btrfs]
 [] ? free_root_pointers+0x60/0x60 [btrfs]
 [] btree_read_extent_buffer_pages.constprop.55+0xa8/0x110 [btrfs]
 [] read_tree_block+0x2d/0x50 [btrfs]
 [] read_block_for_search.isra.33+0x134/0x330 [btrfs]
 [] ? _raw_write_unlock+0x2c/0x50
 [] ? unlock_up+0x16c/0x1a0 [btrfs]
 [] btrfs_search_slot+0x450/0xa40 [btrfs]
 [] btrfs_del_csums+0xe3/0x2e0 [btrfs]
 [] __btrfs_free_extent.isra.82+0x32d/0xc90 [btrfs]
 [] __btrfs_run_delayed_refs+0x4d3/0x1010 [btrfs]
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? get_lock_stats+0x19/0x50
 [] btrfs_run_delayed_refs+0x9c/0x2d0 [btrfs]
 [] btrfs_truncate_inode_items+0x888/0xda0 [btrfs]
 [] btrfs_truncate+0xe5/0x2b0 [btrfs]
 [] btrfs_setattr+0x249/0x360 [btrfs]
 [] notify_change+0x252/0x440
 [] do_truncate+0x6e/0xc0
 [] do_sys_ftruncate.constprop.19+0x10c/0x170
 [] ? __this_cpu_preempt_check+0x13/0x20
 [] SyS_ftruncate+0x9/0x10
 [] do_syscall_64+0x5c/0x170
 [] entry_SYSCALL64_slow_path+0x25/0x25
--[ end trace 906673a2f703b373 ]---
Re: btrfs bio linked list corruption.
On 10/11/2016 11:19 AM, Dave Jones wrote:
On Tue, Oct 11, 2016 at 04:11:39PM +0100, Al Viro wrote:
> On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote:
> > This is from Linus' current tree, with Al's iovec fixups on top.
>
> Those iovec fixups are in the current tree...

ah yeah, git quietly dropped my local copy when I rebased so I didn't notice.

> TBH, I don't see anything
> in splice-related stuff that could come anywhere near that (short of
> some general memory corruption having random effects of that sort).
>
> Could you try to bisect that sucker, or is it too hard to reproduce?

Only hit it the once overnight so far. Will see if I can find a better way to reproduce today.

This call trace is reading metadata so we can finish the truncate. I'd say adding more memory pressure would make it happen more often. I'll try to trigger.

-chris
Re: btrfs bio linked list corruption.
On Tue, Oct 11, 2016 at 04:11:39PM +0100, Al Viro wrote:
> On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote:
> > This is from Linus' current tree, with Al's iovec fixups on top.
>
> Those iovec fixups are in the current tree...

ah yeah, git quietly dropped my local copy when I rebased so I didn't notice.

> TBH, I don't see anything
> in splice-related stuff that could come anywhere near that (short of
> some general memory corruption having random effects of that sort).
>
> Could you try to bisect that sucker, or is it too hard to reproduce?

Only hit it the once overnight so far. Will see if I can find a better way to reproduce today.

Dave
Re: btrfs bio linked list corruption.
On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote:
> This is from Linus' current tree, with Al's iovec fixups on top.

Those iovec fixups are in the current tree... TBH, I don't see anything in splice-related stuff that could come anywhere near that (short of some general memory corruption having random effects of that sort).

Could you try to bisect that sucker, or is it too hard to reproduce?