Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-30 Thread Dave Jones
On Tue, Jun 30, 2015 at 03:23:00PM -0400, Josef Bacik wrote:
 > On 06/30/2015 03:20 PM, Dave Jones wrote:
 > > On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
 > >
 > >   > page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
 > >   > flags: 0x4846(error|referenced|active|private)
 > >   > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
 > >   > [ cut here ]
 > >   > kernel BUG at mm/filemap.c:745!
 > > 
 > > Still occasionally bumping into this.
 > > The 'count:4 mapcount:0' is constant in every instance I've seen
 > > so far. Could that be a clue ?
 > >
 > > I've seen various page flags, but it's always !locked
 > >
 > > Ideas on additional debugging I could add ?
 > 
 > Huh I just noticed that PG_error seems to be set, is that the same for 
 > every time?  I wonder where that's getting set and why.  I'll dig into 
 > the areas we set that and see if I can spot anything.  Thanks,

Seems to be set every time yeah.
I can try annotating those places that set it to see which one is triggering
when I get home.

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
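Dave's plan to annotate the places that set the flag amounts to logging the call site each time the error bit is set, then checking which site fired before the BUG. The Python sketch below models that idea in userspace. `Page`, `set_page_error()`, the two error-path functions and the `error_sites` counter are hypothetical stand-ins, not kernel or btrfs code. In the kernel, the equivalent would be a `dump_stack()` or `WARN_ON_ONCE(1)` placed next to each `SetPageError()` call.

```python
import inspect
from collections import Counter

PG_ERROR = 1  # bit number of PG_error in enum pageflags (v4.1)


class Page:
    """Hypothetical stand-in for struct page: only the flags word."""

    def __init__(self) -> None:
        self.flags = 0


# Hypothetical log: (calling function, line) -> number of times it set the bit.
error_sites: Counter = Counter()


def set_page_error(page: Page) -> None:
    """Set the error bit and remember which function asked for it."""
    caller = inspect.stack(0)[1]  # frame of whoever called us; no source lookup
    error_sites[(caller.function, caller.lineno)] += 1
    page.flags |= 1 << PG_ERROR


def read_end_io_failed(page: Page) -> None:
    set_page_error(page)  # hypothetical error path no. 1


def readpage_setup_failed(page: Page) -> None:
    set_page_error(page)  # hypothetical error path no. 2


if __name__ == "__main__":
    p = Page()
    read_end_io_failed(p)
    read_end_io_failed(p)
    readpage_setup_failed(p)
    for (func, line), hits in error_sites.most_common():
        print(f"{func}:{line} set PG_error {hits} time(s)")
```

Counting call sites, instead of printing every hit, keeps the output manageable when a fuzzer runs for hours between crashes.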


Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-30 Thread Josef Bacik

On 06/30/2015 03:20 PM, Dave Jones wrote:

> On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
>
> > page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
> > flags: 0x4846(error|referenced|active|private)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > [ cut here ]
> > kernel BUG at mm/filemap.c:745!
> > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
> > task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
> > RIP: 0010:[]  [] unlock_page+0x7c/0x80
> > RSP: 0018:8800843efa58  EFLAGS: 00010292
> > RAX: 0036 RBX: 1000 RCX: 
> > RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
> > RBP: 8800843efa58 R08: 0001 R09: 0d1d
> > R10: 037c R11: 0001 R12: ea00027cc640
> > R13:  R14: 0fff R15: 
> > FS:  7fc9c42b5700() GS:8800bf70() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0008 CR3: 50978000 CR4: 07e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: 0ff0 DR7: 0600
> > Stack:
> >  8800843efb68 c02d06ec 0fff 10080008
> >  8800af11d548  8800843efab8 0fff
> >   88009f319000 8800843efc08 8800af11d728
> > Call Trace:
> >  [] __do_readpage+0x61c/0x7c0 [btrfs]
> >  [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
> >  [] ? get_parent_ip+0x11/0x50
> >  [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> >  [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
> >  [] __extent_read_full_page+0xc5/0xe0 [btrfs]
> >  [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
> >  [] extent_read_full_page+0x37/0x60 [btrfs]
> >  [] btrfs_readpage+0x25/0x30 [btrfs]
> >  [] prepare_uptodate_page+0x4a/0x90 [btrfs]
> >  [] prepare_pages+0x101/0x190 [btrfs]
> >  [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
> >  [] btrfs_file_write_iter+0x463/0x570 [btrfs]
> >  [] ? bad_area+0x4a/0x60
> >  [] __vfs_write+0xb1/0xf0
> >  [] vfs_write+0xa9/0x1b0
> >  [] SyS_pwrite64+0x72/0xb0
> >  [] ? syscall_trace_enter_phase2+0x220/0x260
> >  [] ? syscall_trace_leave+0x95/0x140
> >  [] tracesys_phase2+0x84/0x89
> > Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 
> > 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 
> > 55 85 f6 48 89 e5 75 13 85 d2 74 3f
> > RIP  [] unlock_page+0x7c/0x80
>
> Still occasionally bumping into this.
> The 'count:4 mapcount:0' is constant in every instance I've seen
> so far. Could that be a clue ?
>
> I've seen various page flags, but it's always !locked
>
> Ideas on additional debugging I could add ?



Huh I just noticed that PG_error seems to be set, is that the same for 
every time?  I wonder where that's getting set and why.  I'll dig into 
the areas we set that and see if I can spot anything.  Thanks,


Josef
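Josef's observation can be checked mechanically from the `flags:` field of the dump. The minimal Python decoder below assumes the bit order of `enum pageflags` in `include/linux/page-flags.h` as of v4.1 (`PG_locked` is bit 0, `PG_error` bit 1, and so on through `PG_private` at bit 11). It names only bits 0-11, so the 0x4000 bit in the printed values is left undecoded, just as the kernel's own dump left it unnamed.

```python
from typing import List

# Low bits of page->flags in the order of enum pageflags,
# include/linux/page-flags.h, as of v4.1.
PAGEFLAG_NAMES = [
    "locked",        # bit 0
    "error",         # bit 1
    "referenced",    # bit 2
    "uptodate",      # bit 3
    "dirty",         # bit 4
    "lru",           # bit 5
    "active",        # bit 6
    "slab",          # bit 7
    "owner_priv_1",  # bit 8
    "arch_1",        # bit 9
    "reserved",      # bit 10
    "private",       # bit 11
]


def decode_page_flags(flags: int) -> List[str]:
    """Name the set bits among bits 0-11; higher bits are ignored."""
    return [name for bit, name in enumerate(PAGEFLAG_NAMES) if flags & (1 << bit)]


for value in (0x4846, 0x4806):  # the two flags: values reported in this thread
    names = decode_page_flags(value)
    print(f"{value:#x}: {'|'.join(names)}  locked={'locked' in names}")
```

Both dumps in the thread decode with bit 0 (`locked`) clear and bit 1 (`error`) set, which is the combination under discussion: `unlock_page()` was handed a page that nobody held locked, and the page had already been marked as failed.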



Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-30 Thread Dave Jones
On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:

 > page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
 > flags: 0x4846(error|referenced|active|private)
 > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
 > [ cut here ]
 > kernel BUG at mm/filemap.c:745!
 > invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC 
 > CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
 > task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
 > RIP: 0010:[]  [] unlock_page+0x7c/0x80
 > RSP: 0018:8800843efa58  EFLAGS: 00010292
 > RAX: 0036 RBX: 1000 RCX: 
 > RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
 > RBP: 8800843efa58 R08: 0001 R09: 0d1d
 > R10: 037c R11: 0001 R12: ea00027cc640
 > R13:  R14: 0fff R15: 
 > FS:  7fc9c42b5700() GS:8800bf70() knlGS:
 > CS:  0010 DS:  ES:  CR0: 80050033
 > CR2: 0008 CR3: 50978000 CR4: 07e0
 > DR0:  DR1:  DR2: 
 > DR3:  DR6: 0ff0 DR7: 0600
 > Stack:
 >  8800843efb68 c02d06ec 0fff 10080008
 >  8800af11d548  8800843efab8 0fff
 >   88009f319000 8800843efc08 8800af11d728
 > Call Trace:
 >  [] __do_readpage+0x61c/0x7c0 [btrfs]
 >  [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
 >  [] ? get_parent_ip+0x11/0x50
 >  [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 >  [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
 >  [] __extent_read_full_page+0xc5/0xe0 [btrfs]
 >  [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 >  [] extent_read_full_page+0x37/0x60 [btrfs]
 >  [] btrfs_readpage+0x25/0x30 [btrfs]
 >  [] prepare_uptodate_page+0x4a/0x90 [btrfs]
 >  [] prepare_pages+0x101/0x190 [btrfs]
 >  [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
 >  [] btrfs_file_write_iter+0x463/0x570 [btrfs]
 >  [] ? bad_area+0x4a/0x60
 >  [] __vfs_write+0xb1/0xf0
 >  [] vfs_write+0xa9/0x1b0
 >  [] SyS_pwrite64+0x72/0xb0
 >  [] ? syscall_trace_enter_phase2+0x220/0x260
 >  [] ? syscall_trace_leave+0x95/0x140
 >  [] tracesys_phase2+0x84/0x89
 > Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d 
 > c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 
 > 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f 
 > RIP  [] unlock_page+0x7c/0x80

Still occasionally bumping into this.
The 'count:4 mapcount:0' is constant in every instance I've seen
so far. Could that be a clue ?

I've seen various page flags, but it's always !locked

Ideas on additional debugging I could add ?

Dave
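Each call-trace entry has the form `function+offset/size [module]`: the offset is how far into the function the return address lies, and the size is the function's total length. A leading `?` marks an address the unwinder found on the stack but could not confirm as part of the call chain, so those frames may be stale leftovers. The hypothetical Python helper below splits lines in the form archived here (where the raw addresses inside `[]` were lost) into those fields:

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    reliable: bool          # False when the line carried a "?" marker
    function: str
    offset: int             # bytes into the function
    size: int               # total size of the function in bytes
    module: Optional[str]   # None for frames in the core kernel image


# Matches e.g. "[] ? lock_extent_bits+0x83/0x2e0 [btrfs]".
FRAME_RE = re.compile(
    r"\[\]\s+(\?\s+)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?"
)


def parse_trace(lines: List[str]) -> List[Frame]:
    """Extract one Frame per matching line; other lines are skipped."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                reliable=m.group(1) is None,
                function=m.group(2),
                offset=int(m.group(3), 16),
                size=int(m.group(4), 16),
                module=m.group(5),
            ))
    return frames


sample = [
    " >  [] __do_readpage+0x61c/0x7c0 [btrfs]",
    " >  [] ? lock_extent_bits+0x83/0x2e0 [btrfs]",
    " >  [] __vfs_write+0xb1/0xf0",
]
for f in parse_trace(sample):
    marker = "" if f.reliable else "? "
    where = f.module or "vmlinux"
    print(f"{marker}{f.function} at {f.offset:#x} of {f.size:#x} ({where})")
```

Filtering out the `?` frames leaves the reliable chain from `SyS_pwrite64` down to `__do_readpage`, which is the path the thread is reasoning about.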



Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-17 Thread Dave Jones
On Tue, Jun 16, 2015 at 01:19:20PM -0400, Chris Mason wrote:
 > On 06/16/2015 01:14 PM, David Sterba wrote:
 > > On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
 > >> On 06/10/2015 09:40 AM, Dave Jones wrote:
 > >>> Found this on serial console this morning. The machine had rebooted itself
 > >>> shortly afterwards (surprising, given I don't have panic-on-oops or similar set).
 > >>
 > >> We had one other report of this a few months ago.  Josef and I read
 > >> through all of this and decided it was impossible, so someone else must
 > >> be holding on to that page and unlocking it.
 > >>
 > >> (that someone else could easily be btrfs, just not in this code path)
 > > 
 > > https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
 > > symptoms match the "keywords", I haven't inspected it closely.
 > > 
 > 
 > That one is in my integration-4.2 branch if you want to give it a shot.

I was sceptical about this being the same bug, and it looks like I was right..

page:ea00027cc640 count:4 mapcount:0 mapping:8800af11d8a0 index:0x0
flags: 0x4846(error|referenced|active|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[ cut here ]
kernel BUG at mm/filemap.c:745!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC 
CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
task: 8800b9ec ti: 8800843ec000 task.ti: 8800843ec000
RIP: 0010:[]  [] unlock_page+0x7c/0x80
RSP: 0018:8800843efa58  EFLAGS: 00010292
RAX: 0036 RBX: 1000 RCX: 
RDX: 8000 RSI: b20c80c9 RDI: b20c7ce4
RBP: 8800843efa58 R08: 0001 R09: 0d1d
R10: 037c R11: 0001 R12: ea00027cc640
R13:  R14: 0fff R15: 
FS:  7fc9c42b5700() GS:8800bf70() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 50978000 CR4: 07e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 8800843efb68 c02d06ec 0fff 10080008
 8800af11d548  8800843efab8 0fff
  88009f319000 8800843efc08 8800af11d728
Call Trace:
 [] __do_readpage+0x61c/0x7c0 [btrfs]
 [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
 [] ? get_parent_ip+0x11/0x50
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
 [] __extent_read_full_page+0xc5/0xe0 [btrfs]
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] extent_read_full_page+0x37/0x60 [btrfs]
 [] btrfs_readpage+0x25/0x30 [btrfs]
 [] prepare_uptodate_page+0x4a/0x90 [btrfs]
 [] prepare_pages+0x101/0x190 [btrfs]
 [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
 [] btrfs_file_write_iter+0x463/0x570 [btrfs]
 [] ? bad_area+0x4a/0x60
 [] __vfs_write+0xb1/0xf0
 [] vfs_write+0xa9/0x1b0
 [] SyS_pwrite64+0x72/0xb0
 [] ? syscall_trace_enter_phase2+0x220/0x260
 [] ? syscall_trace_leave+0x95/0x140
 [] tracesys_phase2+0x84/0x89
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 
0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 
66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f 
RIP  [] unlock_page+0x7c/0x80



Still haven't managed to narrow down a reproducer, but it shows up
consistently within 6 hrs or so of fuzzing.

Dave
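The trace reaches `btrfs_readpage()` from a plain `pwrite64()` through `__btrfs_buffered_write()`, `prepare_pages()` and `prepare_uptodate_page()`. Reading the v4.1 source, that read appears to happen when a buffered write starts or ends partway through a page that is not already up to date in the page cache. The sketch below is an assumption-laden exerciser for that path, not a known reproducer (the crashes in this thread came from trinity fuzzing): it issues `os.pwrite()` calls whose start and end both avoid page boundaries. The 4 KiB page size, the four-page window and the iteration counts are arbitrary choices.

```python
import os
import random
import tempfile

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86-64
WINDOW_PAGES = 4  # arbitrary: keep hitting the first few page indices


def unaligned_pwrites(path: str, iterations: int, seed: int = 0) -> int:
    """Issue pwrite() calls whose start and end both avoid page boundaries.

    Returns the total number of bytes written.
    """
    rng = random.Random(seed)
    written = 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            # Start 1..PAGE_SIZE-1 bytes into one of the first few pages.
            start = rng.randrange(WINDOW_PAGES) * PAGE_SIZE + rng.randrange(1, PAGE_SIZE)
            length = rng.randrange(1, PAGE_SIZE)
            if (start + length) % PAGE_SIZE == 0:
                length += 1  # nudge the end off the page boundary
            written += os.pwrite(fd, b"x" * length, start)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Demo on a throwaway file; pass a path on a btrfs mount for real use.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "pwrite-stress.dat")
        print(unaligned_pwrites(target, 1000), "bytes written")
```

To aim it at btrfs, pass a path on a btrfs mount (for example a hypothetical `/mnt/btrfs/stress.dat`). Dropping the page cache between runs with `echo 3 > /proc/sys/vm/drop_caches` makes it more likely that the touched pages are not up to date when the write arrives.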


Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-16 Thread Chris Mason
On 06/16/2015 01:14 PM, David Sterba wrote:
> On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
>> On 06/10/2015 09:40 AM, Dave Jones wrote:
>>> Found this on serial console this morning. The machine had rebooted itself
>>> shortly afterwards (surprising, given I don't have panic-on-oops or similar set).
>>>
>>
>> We had one other report of this a few months ago.  Josef and I read
>> through all of this and decided it was impossible, so someone else must
>> be holding on to that page and unlocking it.
>>
>> (that someone else could easily be btrfs, just not in this code path)
> 
> https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
> symptoms match the "keywords", I haven't inspected it closely.
> 

That one is in my integration-4.2 branch if you want to give it a shot.

-chris


Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-16 Thread David Sterba
On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> On 06/10/2015 09:40 AM, Dave Jones wrote:
> > Found this on serial console this morning. The machine had rebooted itself
> > shortly afterwards (surprising, given I don't have panic-on-oops or similar set).
> > 
> 
> We had one other report of this a few months ago.  Josef and I read
> through all of this and decided it was impossible, so someone else must
> be holding on to that page and unlocking it.
> 
> (that someone else could easily be btrfs, just not in this code path)

https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
symptoms match the "keywords", I haven't inspected it closely.


Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-10 Thread Dave Jones
On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
 > On 06/10/2015 09:40 AM, Dave Jones wrote:
 > > Found this on serial console this morning. The machine had rebooted itself
 > > shortly afterwards (surprising, given I don't have panic-on-oops or similar set).
 > > 
 > 
 > We had one other report of this a few months ago.  Josef and I read
 > through all of this and decided it was impossible, so someone else must
 > be holding on to that page and unlocking it.
 > 
 > (that someone else could easily be btrfs, just not in this code path)
 > 
 > so...what horrible things have you been up to?

Not sure exactly. I'll try and dig in some when I get home tonight.
I do seem to be able to reproduce it fairly easily at least.
(Twice this morning).

Dave




Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-10 Thread Chris Mason
On 06/10/2015 09:40 AM, Dave Jones wrote:
> Found this on serial console this morning. The machine had rebooted itself
> shortly afterwards (surprising, given I don't have panic-on-oops or similar set).
> 

We had one other report of this a few months ago.  Josef and I read
through all of this and decided it was impossible, so someone else must
be holding on to that page and unlocking it.

(that someone else could easily be btrfs, just not in this code path)

so...what horrible things have you been up to?

-chris
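Chris's reasoning rests on the invariant that `unlock_page()` checks with `VM_BUG_ON_PAGE(!PageLocked(page))`: whoever unlocks the page must currently hold its lock. The toy Python model below (the `PageModel` class and `PageBug` exception are hypothetical, and a real page lock sleeps on a wait queue rather than failing) shows why a second, unexpected unlocker makes the original owner's correct unlock the one that trips the check, so the BUG can point at a code path that is not itself at fault:

```python
PG_LOCKED = 0  # bit 0 of page->flags


class PageBug(Exception):
    """Stands in for the kernel BUG raised by VM_BUG_ON_PAGE()."""


class PageModel:
    """Hypothetical toy page: a flags word with a lock bit."""

    def __init__(self) -> None:
        self.flags = 0

    def trylock(self) -> bool:
        """Take the lock if it is free; report whether we got it."""
        if self.flags & (1 << PG_LOCKED):
            return False
        self.flags |= 1 << PG_LOCKED
        return True

    def unlock(self) -> None:
        """Drop the lock, insisting (like unlock_page) that it was held."""
        if not self.flags & (1 << PG_LOCKED):
            raise PageBug("unlock of a page that is not locked")
        self.flags &= ~(1 << PG_LOCKED)


page = PageModel()
page.trylock()      # the read path locks the page and starts I/O
page.unlock()       # some other path wrongly unlocks the same page
try:
    page.unlock()   # the read path's own, correct unlock now fails the check
except PageBug as exc:
    print("BUG:", exc)
```

The model only captures ordering: the report identifies the second unlock, while the bug lives in whichever path performed the first one.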



[4.1-rc7] btrfs related VM_BUG_ON in filemap.c

2015-06-10 Thread Dave Jones
Found this on serial console this morning. The machine had rebooted itself
shortly afterwards (surprising, given I don't have panic-on-oops or similar set).

Dave

page:ea0002b0a040 count:4 mapcount:0 mapping:8800abf76ad0 index:0x0
flags: 0x4806(error|referenced|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[ cut here ]
kernel BUG at mm/filemap.c:745!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC 
CPU: 1 PID: 32187 Comm: trinity-c3 Not tainted 4.1.0-rc7-gelk-debug+ #4
task: 8800b6bd0a50 ti: 8800abf5 task.ti: 8800abf5
RIP: 0010:[]  [] unlock_page+0x7c/0x80
RSP: :8800abf53a58  EFLAGS: 00010292
RAX: 0036 RBX: 1000 RCX: 
RDX:  RSI: b00c9e29 RDI: b00c9a44
RBP: 8800abf53a58 R08: 0001 R09: 07f9
R10: 0478 R11: 8800bb20e848 R12: ea0002b0a040
R13:  R14: 0fff R15: 
FS:  7f9c1ea80700() GS:8800bf70() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f3ca086f850 CR3: a8ceb000 CR4: 07e0
DR0: 7f1c6d7b DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 8800abf53b68 c015c5ec 0fff 10080008
 8800abf76778  8800abf53ab8 0fff
  8800ac281000 8800abf53c08 8800abf76958
Call Trace:
 [] __do_readpage+0x61c/0x7c0 [btrfs]
 [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
 [] ? get_parent_ip+0x11/0x50
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
 [] __extent_read_full_page+0xc5/0xe0 [btrfs]
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] extent_read_full_page+0x37/0x60 [btrfs]
 [] btrfs_readpage+0x25/0x30 [btrfs]
 [] prepare_uptodate_page+0x4a/0x90 [btrfs]
 [] prepare_pages+0x101/0x190 [btrfs]
 [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
 [] btrfs_file_write_iter+0x463/0x570 [btrfs]
 [] __vfs_write+0xb1/0xf0
 [] vfs_write+0xa9/0x1b0
 [] ? mutex_lock+0x2c/0x40
 [] SyS_write+0x49/0xb0
 [] ? context_tracking_user_enter+0x13/0x20
 [] ? syscall_trace_leave+0x95/0x140
 [] system_call_fastpath+0x12/0x6a
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 59 ab f4 ff 5d c3 
0f 1f 80 00 00 00 00 48 c7 c6 10 f4 a2 b0 e8 14 87 02 00 <0f> 0b 66 90 66 66 66 
66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f 
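The `invalid opcode` line is how x86 reports a `BUG()`: the macro emits a `ud2` instruction (bytes `0f 0b`), and the `Code:` dump brackets the first byte at the faulting RIP. A small Python sketch that pulls the bracketed bytes out of a `Code:` line as it appears in this report:

```python
from typing import Tuple

UD2 = (0x0F, 0x0B)  # x86 "ud2", the instruction BUG() uses on x86


def faulting_bytes(code_line: str, count: int = 2) -> Tuple[int, ...]:
    """Return `count` bytes starting at the <..>-marked byte of a Code: dump."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok[1:-1], 16)
            rest = [int(t, 16) for t in tokens[i + 1:i + count]]
            return tuple([first] + rest)
    raise ValueError("no <..> marker in Code: line")


code = ("Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 59 ab f4 ff 5d c3 "
        "0f 1f 80 00 00 00 00 48 c7 c6 10 f4 a2 b0 e8 14 87 02 00 <0f> 0b 66 90 66 66 66 "
        "66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f")
insn = faulting_bytes(code)
print(" ".join(f"{b:02x}" for b in insn), "-> ud2" if insn == UD2 else "-> not ud2")
```

Every dump in this thread marks `<0f> 0b`, so the trap is the deliberate `BUG()` in `unlock_page()` rather than a corrupted instruction stream.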