Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0

2019-08-24 Sander Eikelenboom
On 17/08/2019 18:35, Eric Dumazet wrote:
> 
> 
> On 8/17/19 10:24 AM, Sander Eikelenboom wrote:
>> On 12/08/2019 19:56, Eric Dumazet wrote:
>>>
>>>
>>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
>>>> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>>>> one of my Xen VMs (which gets quite a lot of network load) crashed.
>>>> See below for the stack trace.
>>>>
>>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to
>>>> be an option at the moment.
>>>> I haven't encountered this on 5.2, so it seems to be a regression against
>>>> 5.2.
>>>>
>>>> Any ideas?
>>>>
>>>> --
>>>> Sander
>>>>
>>>
>>> Crash in "mov 0x20(%rax),%eax", and RAX=fffe888005bf62c0 (not a valid
>>> kernel address).
>>>
>>> Looks like a one-bit corruption, maybe.
>>>
>>> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
>>>

2019-08-17 Eric Dumazet



On 8/17/19 10:24 AM, Sander Eikelenboom wrote:
> On 12/08/2019 19:56, Eric Dumazet wrote:
>>
>>
>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>>> L.S.,
>>>
>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
>>> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>>> one of my Xen VMs (which gets quite a lot of network load) crashed.
>>> See below for the stack trace.
>>>
>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to
>>> be an option at the moment.
>>> I haven't encountered this on 5.2, so it seems to be a regression against
>>> 5.2.
>>>
>>> Any ideas?
>>>
>>> --
>>> Sander
>>>
>>
>> Crash in "mov 0x20(%rax),%eax", and RAX=fffe888005bf62c0 (not a valid
>> kernel address).
>>
>> Looks like a one-bit corruption, maybe.
>>
>> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
>>

2019-08-17 Sander Eikelenboom
On 12/08/2019 19:56, Eric Dumazet wrote:
> 
> 
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
>> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VMs (which gets quite a lot of network load) crashed.
>> See below for the stack trace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be
>> an option at the moment.
>> I haven't encountered this on 5.2, so it seems to be a regression against
>> 5.2.
>>
>> Any ideas?
>>
>> --
>> Sander
>>
> 
> Crash in "mov 0x20(%rax),%eax", and RAX=fffe888005bf62c0 (not a valid
> kernel address).
> 
> Looks like a one-bit corruption, maybe.
> 
> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
> 

Hi Eric,

Got another VM 

2019-08-12 Sander Eikelenboom
On 12/08/2019 19:56, Eric Dumazet wrote:
> 
> 
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
>> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VMs (which gets quite a lot of network load) crashed.
>> See below for the stack trace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be
>> an option at the moment.
>> I haven't encountered this on 5.2, so it seems to be a regression against
>> 5.2.
>>
>> Any ideas?
>>
>> --
>> Sander
>>
> 
> Crash in "mov 0x20(%rax),%eax", and RAX=fffe888005bf62c0 (not a valid
> kernel address).
> 
> Looks like a one-bit corruption, maybe.
> 
> Nothing really comes to mind between 5.2 and 5.3 that could explain this.

Hi Eric,

Hmm, it could be a rare coincidence, so that it just never occurred on pre-5.3
kernels by chance.
Let's wait and see if it recurs; I will report back if it does.

Thanks for your explanation.

--
Sander



2019-08-12 Eric Dumazet



On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
> L.S.,
> 
> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
> one of my Xen VMs (which gets quite a lot of network load) crashed.
> See below for the stack trace.
> 
> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be
> an option at the moment.
> I haven't encountered this on 5.2, so it seems to be a regression against
> 5.2.
> 
> Any ideas?
> 
> --
> Sander
> 
> 
> [16930.653595] general protection fault:  [#1] SMP NOPTI
> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b

Crash in "mov 0x20(%rax),%eax", and RAX=fffe888005bf62c0 (not a valid
kernel address).

Looks like a one-bit corruption, maybe.

Nothing really comes to mind between 5.2 and 5.3 that could explain this.

> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
> [16930.653819] RBP: 888016b00880 R08: 0001 R09:
> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
> [16930.653875] R13: 0001 R14: bfe62d46 R15: 0004
> [16930.653913] FS:  7fe71fe2cb80() GS:88801f20() knlGS:
> [16930.653943] CS:  0010 DS:  ES:  CR0: 80050033
> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
> [16930.653993] Call Trace:
> [16930.654005]
> [16930.654018]  tcp_ack+0xbb0/0x1230
> [16930.654033]  tcp_rcv_established+0x2e8/0x630
> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
> [16930.654128]  ip_local_deliver+0x4d/0xe0
> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
> [16930.654163]  ip_rcv+0x4c/0xd0
> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
> [16930.654219]  napi_gro_receive+0xe7/0x140
> [16930.654237]  xennet_poll+0x9be/0xae0
> [16930.654254]  net_rx_action+0x136/0x340
> [16930.654271]  __do_softirq+0xdd/0x2cf
> [16930.654287]  irq_exit+0x7a/0xa0
> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
> [16930.654339]
> [16930.654349] RIP: 0033:0x55de0d87db99
> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: ff0c
> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 007f
> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 0002
> [16930.655062] RBP: 7fff R08: 80ea R09: 01f0
> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 55de0f3e0f2a
> [16930.655116] R13: 0010 R14: 7f16 R15: 0080
> [16930.655144] Modules linked in:
> [16930.655200] ---[ end trace 533367c95501b645 ]---
> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286
> [16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b
> [16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
> [16930.655387] RBP: 888016b00880 R08: 0001 R09:
> [16930.655414] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
> [16930.655441] R13: 0001 R14: bfe62d46 R15: 0004
> [16930.655475] FS:  7fe71fe2cb80() GS:88801f20() knlGS:
> [16930.655502] CS:  0010 DS:  ES:  CR0: 80050033
> [16930.655525] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
> [16930.63] Kernel panic - not syncing: Fatal exception in interrupt
> [16930.655789] Kernel Offset: disabled
>