Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On Tue, Jun 30, 2015 at 03:23:00PM -0400, Josef Bacik wrote:
> On 06/30/2015 03:20 PM, Dave Jones wrote:
> > On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
> > > page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
> > > flags: 0x4846(error|referenced|active|private)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > [ cut here ]
> > > kernel BUG at mm/filemap.c:745!
> >
> > Still occasionally bumping into this.
> > The 'count:4 mapcount:0' is constant in every instance I've seen
> > so far. Could that be a clue ?
> >
> > I've seen various page flags, but it's always !locked
> >
> > Ideas on additional debugging I could add ?
>
> Huh I just noticed that PG_Error seems to be set, is that the same for
> every time? I wonder where that's getting set and why. I'll dig into
> the areas we set that and see if I can spot anything. Thanks,

Seems to be set every time yeah. I can try annotating those places that
set it to see which one is triggering when I get home.

	Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
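A user-space sketch of the kind of annotation Dave describes: route every place that sets the error bit through a macro that records the calling function and line, so a later dump can say which site fired. struct fake_page, FAKE_PG_ERROR, SET_PAGE_ERROR_TRACKED and the two call-site functions are hypothetical names made up for this illustration. In the kernel, the analogous step would presumably be putting something like dump_stack() or WARN_ON_ONCE() next to each SetPageError() call; that is an assumption about how one might instrument it, not what was done in this thread.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct page: a flags word plus the call site
 * that most recently set the error bit. */
struct fake_page {
	unsigned long flags;
	const char *err_func;
	int err_line;
};

/* Bit position assumed to match PG_error in the 4.1-era enum pageflags. */
#define FAKE_PG_ERROR 1

/* Sets the error bit and records (and prints) where it was set from.
 * __func__ expands in the caller, so it names the annotated call site. */
#define SET_PAGE_ERROR_TRACKED(p)                                  \
	do {                                                       \
		(p)->flags |= 1UL << FAKE_PG_ERROR;                \
		(p)->err_func = __func__;                          \
		(p)->err_line = __LINE__;                          \
		fprintf(stderr, "error bit set by %s:%d\n",        \
			__func__, __LINE__);                       \
	} while (0)

/* Two hypothetical call sites standing in for the real places that set it. */
static void hypothetical_site_a(struct fake_page *p)
{
	SET_PAGE_ERROR_TRACKED(p);
}

static void hypothetical_site_b(struct fake_page *p)
{
	SET_PAGE_ERROR_TRACKED(p);
}
```

After hypothetical_site_a(&pg), pg.err_func holds "hypothetical_site_a"; a breadcrumb like that is what would tie a later VM_BUG_ON_PAGE dump back to the path that set the flag.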
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On 06/30/2015 03:20 PM, Dave Jones wrote:
> On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
> > page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
> > flags: 0x4846(error|referenced|active|private)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > [ cut here ]
> > kernel BUG at mm/filemap.c:745!
> > invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
> > task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
> > RIP: 0010:[] [] unlock_page+0x7c/0x80
> > RSP: 0018:8800843efa58 EFLAGS: 00010292
> > RAX: 0036 RBX: 1000 RCX:
> > RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
> > RBP: 8800843efa58 R08: 0001 R09: 0d1d
> > R10: 037c R11: 0001 R12: ea00027cc640
> > R13: R14: 0fff R15:
> > FS: 7fc9c42b5700() GS:8800bf70() knlGS:
> > CS: 0010 DS: ES: CR0: 80050033
> > CR2: 0008 CR3: 50978000 CR4: 07e0
> > DR0: DR1: DR2:
> > DR3: DR6: 0ff0 DR7: 0600
> > Stack:
> > 8800843efb68 c02d06ec 0fff 10080008
> > 8800af11d548 8800843efab8 0fff
> > 88009f319000 8800843efc08 8800af11d728
> > Call Trace:
> > [] __do_readpage+0x61c/0x7c0 [btrfs]
> > [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
> > [] ? get_parent_ip+0x11/0x50
> > [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> > [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
> > [] __extent_read_full_page+0xc5/0xe0 [btrfs]
> > [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> > [] extent_read_full_page+0x37/0x60 [btrfs]
> > [] btrfs_readpage+0x25/0x30 [btrfs]
> > [] prepare_uptodate_page+0x4a/0x90 [btrfs]
> > [] prepare_pages+0x101/0x190 [btrfs]
> > [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
> > [] btrfs_file_write_iter+0x463/0x570 [btrfs]
> > [] ? bad_area+0x4a/0x60
> > [] __vfs_write+0xb1/0xf0
> > [] vfs_write+0xa9/0x1b0
> > [] SyS_pwrite64+0x72/0xb0
> > [] ? syscall_trace_enter_phase2+0x220/0x260
> > [] ? syscall_trace_leave+0x95/0x140
> > [] tracesys_phase2+0x84/0x89
> > Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f
> > RIP [] unlock_page+0x7c/0x80
>
> Still occasionally bumping into this.
> The 'count:4 mapcount:0' is constant in every instance I've seen
> so far. Could that be a clue ?
>
> I've seen various page flags, but it's always !locked
>
> Ideas on additional debugging I could add ?

Huh I just noticed that PG_Error seems to be set, is that the same for
every time? I wonder where that's getting set and why. I'll dig into
the areas we set that and see if I can spot anything. Thanks,

Josef
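For reference, the flags word in the dump can be decoded by hand. The sketch below assumes the 4.1-era ordering of enum pageflags in include/linux/page-flags.h (PG_locked is bit 0, PG_error bit 1, and so on up to PG_writeback at bit 13) and decodes only those bits; anything higher depends on kernel configuration and is deliberately ignored. decode_page_flags is a hypothetical helper, not a kernel function. Applied to 0x4846 it yields error|referenced|active|private, matching what dump_page printed, and bit 0 (locked) is clear, which is exactly the condition VM_BUG_ON_PAGE(!PageLocked(page)) tripped on.

```c
#include <stddef.h>
#include <stdio.h>

/* Low page-flag bits, assuming the 4.1-era enum pageflags order
 * (PG_locked = bit 0 ... PG_writeback = bit 13). Higher bits depend on
 * kernel config and are not decoded here. */
static const char *const pageflag_names[] = {
	"locked", "error", "referenced", "uptodate", "dirty", "lru",
	"active", "slab", "owner_priv_1", "arch_1", "reserved",
	"private", "private_2", "writeback",
};

#define NR_DECODED_FLAGS (sizeof(pageflag_names) / sizeof(pageflag_names[0]))

/* Writes "name|name|..." for each set bit among the decoded ones into buf.
 * len must be at least 1; output is always NUL-terminated and stops early
 * if buf is too small. */
static void decode_page_flags(unsigned long flags, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (size_t bit = 0; bit < NR_DECODED_FLAGS; bit++) {
		if (!(flags & (1UL << bit)))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", pageflag_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			break; /* out of room: keep what fits */
		used += (size_t)n;
	}
}
```

The rc7 report's 0x4806 decodes the same way to error|referenced|private: different flags across instances, but error set and locked clear every time.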
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
> page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
> flags: 0x4846(error|referenced|active|private)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> [ cut here ]
> kernel BUG at mm/filemap.c:745!
> invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
> task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
> RIP: 0010:[] [] unlock_page+0x7c/0x80
> RSP: 0018:8800843efa58 EFLAGS: 00010292
> RAX: 0036 RBX: 1000 RCX:
> RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
> RBP: 8800843efa58 R08: 0001 R09: 0d1d
> R10: 037c R11: 0001 R12: ea00027cc640
> R13: R14: 0fff R15:
> FS: 7fc9c42b5700() GS:8800bf70() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 0008 CR3: 50978000 CR4: 07e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0600
> Stack:
> 8800843efb68 c02d06ec 0fff 10080008
> 8800af11d548 8800843efab8 0fff
> 88009f319000 8800843efc08 8800af11d728
> Call Trace:
> [] __do_readpage+0x61c/0x7c0 [btrfs]
> [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
> [] ? get_parent_ip+0x11/0x50
> [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
> [] __extent_read_full_page+0xc5/0xe0 [btrfs]
> [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> [] extent_read_full_page+0x37/0x60 [btrfs]
> [] btrfs_readpage+0x25/0x30 [btrfs]
> [] prepare_uptodate_page+0x4a/0x90 [btrfs]
> [] prepare_pages+0x101/0x190 [btrfs]
> [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
> [] btrfs_file_write_iter+0x463/0x570 [btrfs]
> [] ? bad_area+0x4a/0x60
> [] __vfs_write+0xb1/0xf0
> [] vfs_write+0xa9/0x1b0
> [] SyS_pwrite64+0x72/0xb0
> [] ? syscall_trace_enter_phase2+0x220/0x260
> [] ? syscall_trace_leave+0x95/0x140
> [] tracesys_phase2+0x84/0x89
> Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f
> RIP [] unlock_page+0x7c/0x80

Still occasionally bumping into this.
The 'count:4 mapcount:0' is constant in every instance I've seen
so far. Could that be a clue ?

I've seen various page flags, but it's always !locked

Ideas on additional debugging I could add ?

	Dave
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On Tue, Jun 16, 2015 at 01:19:20PM -0400, Chris Mason wrote:
> On 06/16/2015 01:14 PM, David Sterba wrote:
> > On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> >> On 06/10/2015 09:40 AM, Dave Jones wrote:
> >>> Found this on serial console this morning. The machine had rebooted
> >>> itself shortly
> >>> afterwards (surprising, given I don't have panic-on-oops or similar set).
> >>
> >> We had one other report of this a few months ago. Josef and I read
> >> through all of this and decided it was impossible, so someone else must
> >> be holding on to that page and unlocking it.
> >>
> >> (that someone else could easily be btrfs, just not in this code path)
> >
> > https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
> > symptoms match the "keywords", I haven't inspected it closely.
> >
>
> That one is in my integration-4.2 branch if you want to give it a shot.

I was sceptical about this being the same bug, and it looks like I was right..

page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
flags: 0x4846(error|referenced|active|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[ cut here ]
kernel BUG at mm/filemap.c:745!
invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
RIP: 0010:[] [] unlock_page+0x7c/0x80
RSP: 0018:8800843efa58 EFLAGS: 00010292
RAX: 0036 RBX: 1000 RCX:
RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
RBP: 8800843efa58 R08: 0001 R09: 0d1d
R10: 037c R11: 0001 R12: ea00027cc640
R13: R14: 0fff R15:
FS: 7fc9c42b5700() GS:8800bf70() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0008 CR3: 50978000 CR4: 07e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
8800843efb68 c02d06ec 0fff 10080008
8800af11d548 8800843efab8 0fff
88009f319000 8800843efc08 8800af11d728
Call Trace:
[] __do_readpage+0x61c/0x7c0 [btrfs]
[] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[] ? get_parent_ip+0x11/0x50
[] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
[] __extent_read_full_page+0xc5/0xe0 [btrfs]
[] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[] extent_read_full_page+0x37/0x60 [btrfs]
[] btrfs_readpage+0x25/0x30 [btrfs]
[] prepare_uptodate_page+0x4a/0x90 [btrfs]
[] prepare_pages+0x101/0x190 [btrfs]
[] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
[] btrfs_file_write_iter+0x463/0x570 [btrfs]
[] ? bad_area+0x4a/0x60
[] __vfs_write+0xb1/0xf0
[] vfs_write+0xa9/0x1b0
[] SyS_pwrite64+0x72/0xb0
[] ? syscall_trace_enter_phase2+0x220/0x260
[] ? syscall_trace_leave+0x95/0x140
[] tracesys_phase2+0x84/0x89
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f
RIP [] unlock_page+0x7c/0x80

Still haven't managed to narrow down a reproducer, but it shows up
consistently within 6 hrs or so of fuzzing.

	Dave
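Both traces enter btrfs through a write syscall (SyS_pwrite64 here, SyS_write in the original report) and reach btrfs_readpage via prepare_uptodate_page, i.e. a read of existing page contents in the middle of a buffered write. As a rough sketch of that kind of I/O, and explicitly not a reproducer: the helper below does a pwrite(2) that starts and ends off a page boundary and reads the bytes back with pread(2). unaligned_pwrite_roundtrip is a hypothetical name, and whether a given btrfs write actually takes the prepare_uptodate_page path for such offsets is an assumption here.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes of `fill` at `offset` with pwrite(2), then reads the same
 * range back with pread(2). Returns 0 if the bytes round-trip, -1 on any
 * allocation, I/O or comparison failure. len must be > 0. */
static int unaligned_pwrite_roundtrip(int fd, off_t offset, size_t len, char fill)
{
	char *out = malloc(len);
	char *in = malloc(len);
	int ret = -1;

	if (!out || !in)
		goto done;
	memset(out, fill, len);
	if (pwrite(fd, out, len, offset) != (ssize_t)len)
		goto done;
	if (pread(fd, in, len, offset) != (ssize_t)len)
		goto done;
	ret = memcmp(out, in, len) == 0 ? 0 : -1;
done:
	free(out);
	free(in);
	return ret;
}
```

An offset such as 4096 - 100 with a 300-byte length straddles a 4 KiB page boundary, so neither end of the write is page-aligned.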
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On 06/16/2015 01:14 PM, David Sterba wrote:
> On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
>> On 06/10/2015 09:40 AM, Dave Jones wrote:
>>> Found this on serial console this morning. The machine had rebooted itself
>>> shortly
>>> afterwards (surprising, given I don't have panic-on-oops or similar set).
>>>
>>
>> We had one other report of this a few months ago. Josef and I read
>> through all of this and decided it was impossible, so someone else must
>> be holding on to that page and unlocking it.
>>
>> (that someone else could easily be btrfs, just not in this code path)
>
> https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
> symptoms match the "keywords", I haven't inspected it closely.
>

That one is in my integration-4.2 branch if you want to give it a shot.

-chris
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> On 06/10/2015 09:40 AM, Dave Jones wrote:
> > Found this on serial console this morning. The machine had rebooted itself
> > shortly
> > afterwards (surprising, given I don't have panic-on-oops or similar set).
> >
>
> We had one other report of this a few months ago. Josef and I read
> through all of this and decided it was impossible, so someone else must
> be holding on to that page and unlocking it.
>
> (that someone else could easily be btrfs, just not in this code path)

https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
symptoms match the "keywords", I haven't inspected it closely.
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> On 06/10/2015 09:40 AM, Dave Jones wrote:
> > Found this on serial console this morning. The machine had rebooted itself
> > shortly
> > afterwards (surprising, given I don't have panic-on-oops or similar set).
> >
>
> We had one other report of this a few months ago. Josef and I read
> through all of this and decided it was impossible, so someone else must
> be holding on to that page and unlocking it.
>
> (that someone else could easily be btrfs, just not in this code path)
>
> so...what horrible things have you been up to?

Not sure exactly. I'll try and dig in some when I get home tonight.
I do seem to be able to reproduce it fairly easily at least.
(Twice this morning).

	Dave
Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
On 06/10/2015 09:40 AM, Dave Jones wrote:
> Found this on serial console this morning. The machine had rebooted itself
> shortly
> afterwards (surprising, given I don't have panic-on-oops or similar set).
>

We had one other report of this a few months ago. Josef and I read
through all of this and decided it was impossible, so someone else must
be holding on to that page and unlocking it.

(that someone else could easily be btrfs, just not in this code path)

so...what horrible things have you been up to?

-chris
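A toy, single-threaded model of the theory above. It follows the ->readpage contract described in Documentation/filesystems/vfs.txt (the page is locked when readpage is called and should be unlocked once the read completes) and shows why an unlock done by some other path turns the owner's legitimate unlock into the VM_BUG_ON_PAGE(!PageLocked(page)) seen in unlock_page(). struct toy_page and all toy_* and stray_* functions are hypothetical; the model counts the failed check instead of crashing and has no concurrency or waiting, so it illustrates the invariant only, not the actual btrfs race.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy page: a lock bit plus a counter of failed unlock checks. */
struct toy_page {
	bool locked;
	int bug_hits; /* times the "unlock of unlocked page" check fired */
};

/* Takes the lock. Unlike the real lock_page(), never sleeps. */
static void toy_lock_page(struct toy_page *p)
{
	p->locked = true;
}

/* In the spirit of VM_BUG_ON_PAGE(!PageLocked(page)) in unlock_page():
 * unlocking a page that is not locked is a bug. Counted, not fatal. */
static void toy_unlock_page(struct toy_page *p)
{
	if (!p->locked) {
		p->bug_hits++;
		fprintf(stderr, "BUG: unlock of unlocked page\n");
		return;
	}
	p->locked = false;
}

/* readpage-style routine: entered with the page locked, unlocks when done. */
static void toy_readpage(struct toy_page *p)
{
	/* ...the actual read would happen here... */
	toy_unlock_page(p);
}

/* Hypothetical misbehaving path that unlocks a page it never locked. */
static void stray_unlocker(struct toy_page *p)
{
	toy_unlock_page(p);
}
```

With lock then readpage, the check never fires. With lock, stray_unlocker(), then readpage, the stray call consumes the lock and the readpage-style unlock that follows fires the check once, which is the shape of the "someone else must be holding on to that page and unlocking it" theory.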
[4.1-rc7] btrfs related VM_BUG_ON in filemap.c
Found this on serial console this morning. The machine had rebooted itself shortly
afterwards (surprising, given I don't have panic-on-oops or similar set).

	Dave

page:ea0002b0a040 count:4 mapcount:0 mapping:8800abf76ad0 index:0x0
flags: 0x4806(error|referenced|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[ cut here ]
kernel BUG at mm/filemap.c:745!
invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 32187 Comm: trinity-c3 Not tainted 4.1.0-rc7-gelk-debug+ #4
task: 8800b6bd0a50 ti: 8800abf5 task.ti: 8800abf5
RIP: 0010:[] [] unlock_page+0x7c/0x80
RSP: :8800abf53a58 EFLAGS: 00010292
RAX: 0036 RBX: 1000 RCX:
RDX: RSI: b00c9e29 RDI: b00c9a44
RBP: 8800abf53a58 R08: 0001 R09: 07f9
R10: 0478 R11: 8800bb20e848 R12: ea0002b0a040
R13: R14: 0fff R15:
FS: 7f9c1ea80700() GS:8800bf70() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f3ca086f850 CR3: a8ceb000 CR4: 07e0
DR0: 7f1c6d7b DR1: DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
8800abf53b68 c015c5ec 0fff 10080008
8800abf76778 8800abf53ab8 0fff
8800ac281000 8800abf53c08 8800abf76958
Call Trace:
[] __do_readpage+0x61c/0x7c0 [btrfs]
[] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[] ? get_parent_ip+0x11/0x50
[] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
[] __extent_read_full_page+0xc5/0xe0 [btrfs]
[] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[] extent_read_full_page+0x37/0x60 [btrfs]
[] btrfs_readpage+0x25/0x30 [btrfs]
[] prepare_uptodate_page+0x4a/0x90 [btrfs]
[] prepare_pages+0x101/0x190 [btrfs]
[] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
[] btrfs_file_write_iter+0x463/0x570 [btrfs]
[] __vfs_write+0xb1/0xf0
[] vfs_write+0xa9/0x1b0
[] ? mutex_lock+0x2c/0x40
[] SyS_write+0x49/0xb0
[] ? context_tracking_user_enter+0x13/0x20
[] ? syscall_trace_leave+0x95/0x140
[] system_call_fastpath+0x12/0x6a
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 59 ab f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 10 f4 a2 b0 e8 14 87 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f