On 2018年08月06日 14:07, James Courtier-Dutton wrote: > > > On Mon, 6 Aug 2018, 01:48 Qu Wenruo, <quwenruo.bt...@gmx.com > <mailto:quwenruo.bt...@gmx.com>> wrote: > > > > On 2018年08月05日 22:09, James Courtier-Dutton wrote: > > I am seeing a server halt and require a manual restart that I think > > might be related to btrfs. > > I attach the kernel log from it, in the hope that someone will > > understand it better than me. > > Any clues? > > > > https://paste.fedoraproject.org/paste/xSblK1RKANiwhKHQj31Cdw > <https://paste.fedoraproject.org/paste/xSblK1RKANiwhKHQj31Cdw> > > When checking the dmesg, we should check for the first backtrace other > than later backtraces, which are normally caused by the first problem. > > In your case, it's general protection happens at timerqueue_add(), which > is unrelated to btrfs at all. > > > I forgot to mention. This log is via netconsole as nothing is seen in > the local logs. As such it seems to jumble up the logs a bit. > The first bad bit happens here:
Sorry, my fault. Then it makes sense why a lot of messages looks interrupted. > > WARNING: CPU: 3 PID: 803 at > /build/linux-hwe-SYRsgd/linux-hwe-4.15.0/fs/btrfs/extent_map.c:77 > free_extent_map+0x78/0x90 [btrfs] Then it makes sense, as it's a WARN_ON() line, showing one extent map is still used. If it get freed, it will definitely cause some rbtree corruption. > > But the associated stack trace is not the immediately one below it. The > associated stack trace is: > Jul 30 03:48:20 mum [453835.199561] RIP: 0010:frde_extent_map+0x78/0x90 > [btrfs] > Jul 30 03:48:20 mum [453835.199564] RSP: 0000:ffffb548417cb5d0 EFLAGS: > 00010296 > Jul 30 03:48:20 mum [453835.199568] RAX: 0000000000000001 RBX: > fffffffffffffffe RCX: ffff960c0aa20000 > Jul 30 03:48:20 mum [453835.199570] RDX: fffffffffffff000 RSI: > 00000912e6968000 RDI: ffff960c0aa20000 > Jul 30 03:48:20 mum [453835.199572] RBP: ffffb548417cb5d8 R08: > 0000000000000000 R09: 0000000000000000 > Jul 30 03:48:20 mum [453835.199572] R10: ffff960b584a2690 R11: > 0000000000000000 R12: 0000000000000000 > Jul 30 03:48:20 mum [453835.199575] R13: 0000000000000000 R14: > 00000912e6968000 R15: 0000000000001000 > Jul 30 03:48:20 mum [453835.199578] FS: 00007f1b4b92f700(0000) > GS:ffff960c1fd80000(0000) knlGS:0000000000000000 > Jul 30 03:48:20 mum [453835.199579] CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > Jul 30 03:48:20 mum [453835.199582] CR2: 00007f1b4c96489f CR3: > 0000000213d42000 CR4: 00000000000006e0 > Jul 30 03:48:20 mum [453835.199584] Call Trace: > Jul 30 03:48:20 mum [453835.199607] __do_readpage+0x2bb/0x850 [btrfs] It's should be the only free_extent_map() call in __do_readpage() function. However a first glance into this function shows nothing wrong, nor new modification in this function. (Maybe it's the get_extent() hook?) Is there any reproducer? Or special workload? And, have you tried "btrfs check --readonly <device>"? If there is any error it would help a lot to locate the problem. Thanks, Qu > Jul 30 03:48:20 mum [453835.199625] ? free_root_pointers+0x70/0x70 [btrfs] > Jul 30 03:48:20 mum [453835.199645] __extent_read_full_page+0xe1/0xf0 > [btrfs] > Jul 30 03:48:20 mum [453835.199663] ? free_root_pointers+0x70/0x70 [btrfs] > Jul 30 03:48:20 mum [453835.199683] > read_extent_buffer_pages+0x291/0x2c0 [btrfs] > Jul 30 03:48:20 mum [453835.199701] ? free_root_pointers+0x70/0x70 [btrfs] > Jul 30 03:48:20 mum [453835.199718] > btree_read_extent_buffer_pages+0x9d/0x100 [btrfs] > Jul 30 03:48:20 mum [453835.199736] read_tree_block+0x32/0x50 [btrfs] > Jul 30 03:48:20 mum [453835.199752] > read_block_for_search.isra.31+0x135/0x2f0 [btrfs] > Jul 30 03:48:20 mum [453835.199767] btrfs_search_slot+0x3a1/0x9d0 [btrfs] > Jul 30 03:48:20 mum [453835.199785] btrfs_lookup_csum+0x52/0x150 [btrfs] > Jul 30 03:48:20 mum [453835.199802] __btrfs_lookup_bio_sums+0x315/0x6b0 > [btrfs] > Jul 30 03:48:20 mum [453835.199820] ? btrfs_bio_wq_end_io+0x28/0x70 [btrfs] > Jul 30 03:48:20 mum [453835.199837] btrfs_lookup_bio_sums+0x16/0x20 [btrfs] > Jul 30 03:48:20 mum [453835.199856] btrfs_submit_bio_hook+0xc3/0x180 > [btrfs] > Jul 30 03:48:20 mum [453835.199877] submit_one_bio+0x64/0x90 [btrfs] > Jul 30 03:48:20 mum [453835.199897] extent_readpages+0x181/0x280 [btrfs] > Jul 30 03:48:20 mum [453835.199945] __do_page_cache_readahead+0x1be/0x2c0 > Jul 30 03:48:20 mum [453835.199949] filemap_fault+0x53e/0x750 > Jul 30 03:48:20 mum [453835.199952] ? filemap_fault+0x53e/0x750 > Jul 30 03:48:20 mum [453835.199953] ? filemap_map_pages+0x185/0x3f0 > Jul 30 03:48:20 mum [453835.199957] __do_fault+0x24/0xf0 > Jul 30 03:48:20 mum [453835.199960] __handle_mm_fault+0xd1b/0x11f0 > Jul 30 03:48:20 mum [453835.199963] ? __check_object_size+0xfc/0x1a0 > Jul 30 03:48:20 mum [453835.199966] handle_mm_fault+0xcc/0x1c0 > Jul 30 03:48:20 mum [453835.199971] __do_page_fault+0x260/0x500 > Jul 30 03:48:20 mum [453835.199974] do_page_fault+0x2e/0xf0 > Jul 30 03:48:20 mum [453835.199978] ? page_fault+0x2f/0x50 > Jul 30 03:48:20 mum [453835.199981] page_fatlt+0x45/0x50 > Jul 30 03:48:20 mum [453835.199984] RIP: 0033:0x7f1b4c96489f > Jul 30 03:48:20 mum [453835.199986] RSP: 002b:00007f1b4b90e590 EFLAGS: > 00010203 > Jul 30 03:48:20 mum [453835.199988] RAX: 00000000000000cf RBX: > 0000000000000000 RCX: 00007f1b4df8151d > Jul 30 03:48:20 mum [453835.199989] RDX: 00000000000000cf RSI: > 00007f1b4b90eda0 RDI: 0000000000000000 > Jul 30 03:48:20 mum [453835.199991] RBP: 000000000096c2c0 R08: > 0000000000000000 R09: 0000000004000001 > Jul 30 03:48:20 mum [453835.199994] R10: 0000000000000001 R11: > 0000000000000000 R12: 00007f1b4b90eda0 > Jul 30 03:48:20 mum [453835.199997] R13: 0000000000001fa0 R14: > 0000000000001f9f R15: 00007f1b4b90ee03 > > >
signature.asc
Description: OpenPGP digital signature