Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
On 17/08/2019 18:35, Eric Dumazet wrote: > > > On 8/17/19 10:24 AM, Sander Eikelenboom wrote: >> On 12/08/2019 19:56, Eric Dumazet wrote: >>> >>> >>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote: >>>> L.S., >>>> >>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest >>>> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), >>>> one of my Xen VM's (which gets quite some network load) crashed. >>>> See below for the stacktrace. >>>> >>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to >>>> be an option at the moment. >>>> I haven't encountered this on 5.2, so it seems to be an regression against >>>> 5.2. >>>> >>>> Any ideas ? >>>> >>>> -- >>>> Sander >>>> >>>> >>>> [16930.653595] general protection fault: [#1] SMP NOPTI >>>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted >>>> 5.3.0-rc3-20190809-doflr+ #1 >>>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0 >>>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >>>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 >>>> <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >>>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286 >>>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: >>>> 801b >>> >>> crash in " mov0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid >>> kernel address) >>> >>> Look like one bit corruption maybe. >>> >>> Nothing comes to mind really between 5.2 and 53 that could explain this. 
>>> >>>> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: >>>> 888016b00880 >>>> [16930.653819] RBP: 888016b00880 R08: 0001 R09: >>>> >>>> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: >>>> 05a0 >>>> [16930.653875] R13: 0001 R14: bfe62d46 R15: >>>> 0004 >>>> [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() >>>> knlGS: >>>> [16930.653943] CS: 0010 DS: ES: CR0: 80050033 >>>> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: >>>> 06f0 >>>> [16930.653993] Call Trace: >>>> [16930.654005] >>>> [16930.654018] tcp_ack+0xbb0/0x1230 >>>> [16930.654033] tcp_rcv_established+0x2e8/0x630 >>>> [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 >>>> [16930.654070] tcp_v4_rcv+0xac9/0xcb0 >>>> [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 >>>> [16930.654109] ip_local_deliver_finish+0x3f/0x50 >>>> [16930.654128] ip_local_deliver+0x4d/0xe0 >>>> [16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 >>>> [16930.654163] ip_rcv+0x4c/0xd0 >>>> [16930.654179] __netif_receive_skb_one_core+0x79/0x90 >>>> [16930.654200] netif_receive_skb_internal+0x2a/0xa0 >>>> [16930.654219] napi_gro_receive+0xe7/0x140 >>>> [16930.654237] xennet_poll+0x9be/0xae0 >>>> [16930.654254] net_rx_action+0x136/0x340 >>>> [16930.654271] __do_softirq+0xdd/0x2cf >>>> [16930.654287] irq_exit+0x7a/0xa0 >>>> [16930.654304] xen_evtchn_do_upcall+0x27/0x40 >>>> [16930.654320] xen_hvm_callback_vector+0xf/0x20 >>>> [16930.654339] >>>> [16930.654349] RIP: 0033:0x55de0d87db99 >>>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 >>>> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a >>>> <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 >>>> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: >>>> ff0c >>>> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: >>>> 007f >>>> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: >>>> 0002 >>>> [16930.655062] RBP: 7fff R08: 80ea R09: >>>> 01f0 >>>> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: >>>> 55de0f3e0f2a >>>> 
[16930.655116] R13
Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
On 12/08/2019 19:56, Eric Dumazet wrote:
>
>
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest
>> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VM's (which gets quite some network load) crashed.
>> See below for the stacktrace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be
>> an option at the moment.
>> I haven't encountered this on 5.2, so it seems to be an regression against
>> 5.2.
>>
>> Any ideas ?
>>
>> --
>> Sander
>>
>>
>> [16930.653595] general protection fault: [#1] SMP NOPTI
>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted
>> 5.3.0-rc3-20190809-doflr+ #1
>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89
>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b>
>> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX:
>> 801b
>
> crash in "mov 0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid
> kernel address)
>
> Look like one bit corruption maybe.
>
> Nothing comes to mind really between 5.2 and 5.3 that could explain this.

Hi Eric,

Hmm, it could be a rare coincidence, so that it just never occurred on pre-5.3 kernels by chance.
Let's wait and see if it reoccurs; I will report back if it does.

Thanks for your explanation.

-- Sander >> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: >> 888016b00880 >> [16930.653819] RBP: 888016b00880 R08: 0001 R09: >> >> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: >> 05a0 >> [16930.653875] R13: 0001 R14: bfe62d46 R15: >> 0004 >> [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() >> knlGS: >> [16930.653943] CS: 0010 DS: ES: CR0: 80050033 >> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: >> 06f0 >> [16930.653993] Call Trace: >> [16930.654005] >> [16930.654018] tcp_ack+0xbb0/0x1230 >> [16930.654033] tcp_rcv_established+0x2e8/0x630 >> [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 >> [16930.654070] tcp_v4_rcv+0xac9/0xcb0 >> [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 >> [16930.654109] ip_local_deliver_finish+0x3f/0x50 >> [16930.654128] ip_local_deliver+0x4d/0xe0 >> [16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 >> [16930.654163] ip_rcv+0x4c/0xd0 >> [16930.654179] __netif_receive_skb_one_core+0x79/0x90 >> [16930.654200] netif_receive_skb_internal+0x2a/0xa0 >> [16930.654219] napi_gro_receive+0xe7/0x140 >> [16930.654237] xennet_poll+0x9be/0xae0 >> [16930.654254] net_rx_action+0x136/0x340 >> [16930.654271] __do_softirq+0xdd/0x2cf >> [16930.654287] irq_exit+0x7a/0xa0 >> [16930.654304] xen_evtchn_do_upcall+0x27/0x40 >> [16930.654320] xen_hvm_callback_vector+0xf/0x20 >> [16930.654339] >> [16930.654349] RIP: 0033:0x55de0d87db99 >> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 >> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> >> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 >> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: >> ff0c >> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: >> 007f >> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: >> 0002 >> [16930.655062] RBP: 7fff R08: 80ea R09: >> 01f0 >> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: >> 55de0f3e0f2a >> [16930.655116] R13: 0010 R14: 7f16 R15: >> 0080 >> [16930.655144] Modules linked in: >> 
[16930.655200] ---[ end trace 533367c95501b645 ]--- >> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0 >> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> >> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286 >> [16930.655331] RAX: fffe888005bf62c0 RBX: 88801
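The "not a valid kernel address" and "one bit corruption" remarks in the reply above can be checked with a little arithmetic. What follows is only a sketch, not something posted in the thread: it assumes 48-bit virtual addressing (4-level paging), the helper names are made up for illustration, and the "guessed" pointer ffff888005bf62c0 is merely RAX with the one suspicious bit restored, not a value taken from the report.

```python
# Sketch: is the RAX value from the oops canonical, and how many bits separate
# it from a plausible kernel pointer? Assumes 48-bit virtual addresses.

RAX_FROM_OOPS = 0xFFFE888005BF62C0    # value printed in the report
GUESSED_POINTER = 0xFFFF888005BF62C0  # assumption: same value with bit 48 set


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all equal (sign-extended address)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def flipped_bits(a: int, b: int) -> list:
    """Bit positions at which the two 64-bit values differ."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]


if __name__ == "__main__":
    print("RAX canonical:    ", is_canonical(RAX_FROM_OOPS))
    print("guess canonical:  ", is_canonical(GUESSED_POINTER))
    print("differing bits:   ", flipped_bits(RAX_FROM_OOPS, GUESSED_POINTER))
```

Under these assumptions RAX is non-canonical, which is consistent with a general protection fault rather than a page fault, and it differs from the guessed pointer in a single bit position. That matches the single-bit-corruption reading but does not say where the corruption came from.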
5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
L.S.,

While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), one of my Xen VMs (which gets quite some network load) crashed.
See below for the stack trace.

Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment.
I haven't encountered this on 5.2, so it seems to be a regression against 5.2.

Any ideas?

--
Sander

[16930.653595] general protection fault: [#1] SMP NOPTI
[16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
[16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
[16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
[16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
[16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b
[16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
[16930.653819] RBP: 888016b00880 R08: 0001 R09:
[16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
[16930.653875] R13: 0001 R14: bfe62d46 R15: 0004
[16930.653913] FS: 7fe71fe2cb80() GS:88801f20() knlGS:
[16930.653943] CS: 0010 DS: ES: CR0: 80050033
[16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
[16930.653993] Call Trace:
[16930.654005]
[16930.654018] tcp_ack+0xbb0/0x1230
[16930.654033] tcp_rcv_established+0x2e8/0x630
[16930.654053] tcp_v4_do_rcv+0x129/0x1d0
[16930.654070] tcp_v4_rcv+0xac9/0xcb0
[16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0
[16930.654109] ip_local_deliver_finish+0x3f/0x50
[16930.654128] ip_local_deliver+0x4d/0xe0
[16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[16930.654163] ip_rcv+0x4c/0xd0
[16930.654179] __netif_receive_skb_one_core+0x79/0x90
[16930.654200] netif_receive_skb_internal+0x2a/0xa0
[16930.654219] napi_gro_receive+0xe7/0x140
[16930.654237] xennet_poll+0x9be/0xae0
[16930.654254] net_rx_action+0x136/0x340
[16930.654271] __do_softirq+0xdd/0x2cf
[16930.654287] irq_exit+0x7a/0xa0
[16930.654304] xen_evtchn_do_upcall+0x27/0x40
[16930.654320] xen_hvm_callback_vector+0xf/0x20
[16930.654339]
[16930.654349] RIP: 0033:0x55de0d87db99
[16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
[16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: ff0c
[16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 007f
[16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 0002
[16930.655062] RBP: 7fff R08: 80ea R09: 01f0
[16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 55de0f3e0f2a
[16930.655116] R13: 0010 R14: 7f16 R15: 0080
[16930.655144] Modules linked in:
[16930.655200] ---[ end trace 533367c95501b645 ]---
[16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
[16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
[16930.655312] RSP: :c9003ad8 EFLAGS: 00010286
[16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b
[16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
[16930.655387] RBP: 888016b00880 R08: 0001 R09:
[16930.655414] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
[16930.655441] R13: 0001 R14: bfe62d46 R15: 0004
[16930.655475] FS: 7fe71fe2cb80() GS:88801f20() knlGS:
[16930.655502] CS: 0010 DS: ES: CR0: 80050033
[16930.655525] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
[16930.63] Kernel panic - not syncing: Fatal exception in interrupt
[16930.655789] Kernel Offset: disabled
Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
On 08/08/2019 12:21, Paolo Valente wrote:
>
>
>> On 8 Aug 2019, at 12:21, Sander Eikelenboom wrote:
>>
>> On 08/08/2019 11:10, Paolo Valente wrote:
>>>
>>>
>>>> On 8 Aug 2019, at 11:05, Sander Eikelenboom wrote:
>>>>
>>>> L.S.,
>>>>
>>>> While testing a linux 5.3-rc3 kernel on my Xen server I come across the
>>>> splat below when trying to shutdown all the VM's.
>>>> This is after the server has ran for a few days without any problem. It
>>>> seems to happen consistently.
>>>>
>>>> It seems it's in the same area as
>>>> dbc3117d4ca9e17819ac73501e914b8422686750, but already rc3 incorporates
>>>> that patch.
>>>>
>>>> Any ideas ?
>>>>
>>>
>>> Could you try these fixes I proposed yesterday:
>>> https://lkml.org/lkml/2019/8/7/536
>>> or, on patchwork:
>>> https://patchwork.kernel.org/patch/11082247/
>>> https://patchwork.kernel.org/patch/11082249/
>>
>> Hi Paolo,
>>
>> These two above seem to fix the issue !
>> So thanks for the swift reply (and the patchwork links for easy
>> downloading the patches).
>>
>> I will test the third unrelated patch as well, but if you don't hear
>> back , it's all good.
>>
>
> Great! Thank you for offering to test also the other patch. Tested-by are
> welcome too :)

Hi,

I haven't seen any problems with the patch so far, but I haven't tested it under constrained memory, so I don't think a Tested-by is justified in this case.

--
Sander

> Thanks,
> Paolo
>
>> Thanks again !
>>
>> --
>> Sander
>>
>>> I posted a further fix too, which should be unrelated.
But, just in case: >>> https://lkml.org/lkml/2019/8/7/715 >>> or, on patchwork: >>> https://patchwork.kernel.org/patch/11082521/ >>> >>> Crossing my fingers (and think you for reporting this), >>> Paolo >>> >>>> -- >>>> Sander >>>> >>>> >>>> [80915.716048] BUG: unable to handle page fault for address: >>>> 1008 >>>> [80915.724188] #PF: supervisor write access in kernel mode >>>> [80915.733182] #PF: error_code(0x0002) - not-present page >>>> [80915.741455] PGD 0 P4D 0 >>>> [80915.750538] Oops: 0002 [#1] SMP NOPTI >>>> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW >>>> 5.3.0-rc3-20190807-doflr+ #1 >>>> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >>>> V1.8B1 09/13/2010 >>>> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >>>> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >>>> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 >>>> <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >>>> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006 >>>> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: >>>> 888076c4a9f8 >>>> [80915.810254] device vif17.0 left promiscuous mode >>>> [80915.811906] RDX: 1000 RSI: 1000 RDI: >>>> >>>> [80915.811908] RBP: 888077efc398 R08: 0004 R09: >>>> 81106800 >>>> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: >>>> 888005256bf0 >>>> [80915.811909] R13: R14: 888005256800 R15: >>>> 82a6a3c0 >>>> [80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() >>>> knlGS: >>>> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state >>>> [80915.826569] CS: 1e030 DS: ES: CR0: 80050033 >>>> [80915.826571] CR2: 1008 CR3: 5d9d CR4: >>>> 0660 >>>> [80915.826575] Call Trace: >>>> [80915.826592] bfq_exit_icq+0xe/0x20 >>>> [80915.826595] put_io_context_active+0x52/0x80 >>>> [80915.826599] do_exit+0x774/0xac0 >>>> [80915.906037] ? xen_blkif_be_int+0x30/0x30 >>>> [80915.913311] kthread+0xda/0x130 >>>> [80915.920398] ? 
kthread_park+0x80/0x80 >>>> [80915.927524] ret_from_fork+0x22/0x40 >>>> [80915.934512] Modules linked in: >>>> [80915.941412] CR2: 1008
Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
On 08/08/2019 11:10, Paolo Valente wrote:
>
>
>> On 8 Aug 2019, at 11:05, Sander Eikelenboom wrote:
>>
>> L.S.,
>>
>> While testing a linux 5.3-rc3 kernel on my Xen server I come across the
>> splat below when trying to shutdown all the VM's.
>> This is after the server has ran for a few days without any problem. It
>> seems to happen consistently.
>>
>> It seems it's in the same area as dbc3117d4ca9e17819ac73501e914b8422686750,
>> but already rc3 incorporates that patch.
>>
>> Any ideas ?
>>
>
> Could you try these fixes I proposed yesterday:
> https://lkml.org/lkml/2019/8/7/536
> or, on patchwork:
> https://patchwork.kernel.org/patch/11082247/
> https://patchwork.kernel.org/patch/11082249/

Hi Paolo,

The two patches above seem to fix the issue!
So thanks for the swift reply (and for the patchwork links, which make downloading the patches easy).

I will test the third, unrelated patch as well, but if you don't hear back, it's all good.

Thanks again!

--
Sander

> I posted a further fix too, which should be unrelated.
But, just in case: > https://lkml.org/lkml/2019/8/7/715 > or, on patchwork: > https://patchwork.kernel.org/patch/11082521/ > > Crossing my fingers (and think you for reporting this), > Paolo > >> -- >> Sander >> >> >> [80915.716048] BUG: unable to handle page fault for address: 1008 >> [80915.724188] #PF: supervisor write access in kernel mode >> [80915.733182] #PF: error_code(0x0002) - not-present page >> [80915.741455] PGD 0 P4D 0 >> [80915.750538] Oops: 0002 [#1] SMP NOPTI >> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW >> 5.3.0-rc3-20190807-doflr+ #1 >> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >> V1.8B1 09/13/2010 >> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> >> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006 >> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: >> 888076c4a9f8 >> [80915.810254] device vif17.0 left promiscuous mode >> [80915.811906] RDX: 1000 RSI: 1000 RDI: >> >> [80915.811908] RBP: 888077efc398 R08: 0004 R09: >> 81106800 >> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: >> 888005256bf0 >> [80915.811909] R13: R14: 888005256800 R15: >> 82a6a3c0 >> [80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() >> knlGS: >> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state >> [80915.826569] CS: 1e030 DS: ES: CR0: 80050033 >> [80915.826571] CR2: 1008 CR3: 5d9d CR4: >> 0660 >> [80915.826575] Call Trace: >> [80915.826592] bfq_exit_icq+0xe/0x20 >> [80915.826595] put_io_context_active+0x52/0x80 >> [80915.826599] do_exit+0x774/0xac0 >> [80915.906037] ? xen_blkif_be_int+0x30/0x30 >> [80915.913311] kthread+0xda/0x130 >> [80915.920398] ? 
kthread_park+0x80/0x80 >> [80915.927524] ret_from_fork+0x22/0x40 >> [80915.934512] Modules linked in: >> [80915.941412] CR2: 1008 >> [80915.948221] ---[ end trace 61315493e0f8ef40 ]--- >> [80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >> [80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> >> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >> [80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006 >> [80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: >> 888076c4a9f8 >> [80915.990321] RDX: 1000 RSI: 1000 RDI: >> >> [80915.997319] RBP: 888077efc398 R08: 0004 R09: >> 81106800 >> [80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: >> 888005256bf0 >> [80916.011525] R13: R14: 888005256800 R15: >> 82a6a3c0 >> [80916.018679] FS: 7f1c30a8dbc0() GS:88807d50() >> knlGS: >> [80916.025897] CS: 1e030 DS: ES: CR0: 80050033 >> [80916.033116] CR2: 1008 CR3: 5d9d CR4: >> 0660 >> [80916.040348] Fixing recursive fault but reboot is needed! >
RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
L.S.,

While testing a Linux 5.3-rc3 kernel on my Xen server I came across the splat below when trying to shut down all the VMs.
This is after the server had run for a few days without any problem. It seems to happen consistently.

It seems to be in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, but rc3 already incorporates that patch.

Any ideas?

--
Sander

[80915.716048] BUG: unable to handle page fault for address: 1008
[80915.724188] #PF: supervisor write access in kernel mode
[80915.733182] #PF: error_code(0x0002) - not-present page
[80915.741455] PGD 0 P4D 0
[80915.750538] Oops: 0002 [#1] SMP NOPTI
[80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW 5.3.0-rc3-20190807-doflr+ #1
[80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
[80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
[80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006
[80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8
[80915.810254] device vif17.0 left promiscuous mode
[80915.811906] RDX: 1000 RSI: 1000 RDI:
[80915.811908] RBP: 888077efc398 R08: 0004 R09: 81106800
[80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0
[80915.811909] R13: R14: 888005256800 R15: 82a6a3c0
[80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() knlGS:
[80915.819456] xen_bridge: port 18(vif17.0) entered disabled state
[80915.826569] CS: 1e030 DS: ES: CR0: 80050033
[80915.826571] CR2: 1008 CR3: 5d9d CR4: 0660
[80915.826575] Call Trace:
[80915.826592] bfq_exit_icq+0xe/0x20
[80915.826595] put_io_context_active+0x52/0x80
[80915.826599] do_exit+0x774/0xac0
[80915.906037] ? xen_blkif_be_int+0x30/0x30
[80915.913311] kthread+0xda/0x130
[80915.920398] ? kthread_park+0x80/0x80
[80915.927524] ret_from_fork+0x22/0x40
[80915.934512] Modules linked in:
[80915.941412] CR2: 1008
[80915.948221] ---[ end trace 61315493e0f8ef40 ]---
[80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
[80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
[80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006
[80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8
[80915.990321] RDX: 1000 RSI: 1000 RDI:
[80915.997319] RBP: 888077efc398 R08: 0004 R09: 81106800
[80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0
[80916.011525] R13: R14: 888005256800 R15: 82a6a3c0
[80916.018679] FS: 7f1c30a8dbc0() GS:88807d50() knlGS:
[80916.025897] CS: 1e030 DS: ES: CR0: 80050033
[80916.033116] CR2: 1008 CR3: 5d9d CR4: 0660
[80916.040348] Fixing recursive fault but reboot is needed!
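As a cross-check of the splat above, the bytes at the faulting position in the Code: line (the ones starting at <48>) can be decoded by hand and compared with CR2. This is a rough sketch that was not part of the thread. It handles only the one instruction form that appears here, the function name is invented for illustration, and the register values are taken as printed in the archived report, which appears to have lost its leading zeros.

```python
# Sketch: decode "48 89 4e 08" (REX.W + MOV r/m64, r64 with an 8-bit
# displacement) and compare base + displacement with the reported CR2.
# Not a general x86 decoder.

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_store(code: bytes):
    """Return (source register, base register, displacement) for 48 89 /r disp8."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x89:
        raise ValueError("expected REX.W prefix and opcode 0x89")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:          # disp8 is sign-extended
        disp -= 0x100
    return REG_NAMES[reg], REG_NAMES[rm], disp


src, base, disp = decode_mov_store(bytes.fromhex("48894e08"))
RSI_FROM_OOPS = 0x1000  # RSI as printed in the report
CR2_FROM_OOPS = 0x1008  # faulting address as printed in the report

if __name__ == "__main__":
    print(f"mov %{src},{disp:#x}(%{base})")
    print("matches CR2:", RSI_FROM_OOPS + disp == CR2_FROM_OOPS)
```

The decoded store goes through RSI with a displacement of 8, and RSI plus 8 equals the reported fault address. That only confirms which pointer was bad; it says nothing about how RSI came to hold that value.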
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 10/02/2019 12:44, Heiner Kallweit wrote: > On 10.02.2019 10:16, Sander Eikelenboom wrote: >> On 09/02/2019 12:50, Heiner Kallweit wrote: >>> On 09.02.2019 11:07, Sander Eikelenboom wrote: >>>> On 09/02/2019 10:59, Heiner Kallweit wrote: >>>>> On 09.02.2019 10:34, Sander Eikelenboom wrote: >>>>>> On 09/02/2019 10:02, Heiner Kallweit wrote: >>>>>>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>>>>>> L.S., >>>>>>>>>>>>>> >>>>>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top >>>>>>>>>>>>>> but they don't seem related) under Xen i the nasty splat below, >>>>>>>>>>>>>> that I haven encountered with Linux 4.20.x. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>>>>>> bisecting could be nasty due to another (networking related) >>>>>>>>>>>>>> kernel bug. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>>>>>> feel free to ask. >>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the report. However I see no change in the r8169 >>>>>>>>>>>>> driver between >>>>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root >>>>>>>>>>>>> cause could >>>>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. 
>>>>>>>>>>>> >>>>>>>>>>>> Hmm i did some diging and i think: >>>>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded >>>>>>>>>>>> mmiowb barriers >>>>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of >>>>>>>>>>>> xmit_more and __netdev_sent_queue >>>>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>>>>>> >>>>>>>>>>> You're right. Thought this was added in 4.20 already. >>>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I >>>>>>>>>>> haven't heard about >>>>>>>>>>> this issue from any user of physical hw. And due to the fact that a >>>>>>>>>>> lot of mainboards >>>>>>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>>>>>> Does the issue occur under specific circumstances like very high >>>>>>>>>>> load? >>>>>>>>>> >>>>>>>>>> Yep, the box is already quite contented with the Xen VM's and if I >>>>>>>>>> remember correctly it occurred while kernel compiling >>>>>>>>>> on the host. >>>>>>>>>> >>>>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>>>>>> involve Eric Dumazet >>>>>>>>>>> as author of the underlying changes. >>>>>>>>>> >>>>>>>>>> It could also be the barriers weren't that unneeded as assumed. >>>>>>>>> >>>>>>
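For readers less familiar with BQL (byte queue limits), the bookkeeping discussed in this thread can be pictured with a toy model. This is a hedged sketch in Python, not kernel code: the class, method and function names are invented, and it assumes that the BUG in lib/dynamic_queue_limits.c is the sanity check that a driver never reports more completed bytes than it has reported as queued.

```python
# Toy model of queued/completed byte accounting. Assumption: the kernel BUG
# corresponds to "completed more than is currently in flight".

class ToyDql:
    """Only the two counters; no limit calculation, no locking."""

    def __init__(self) -> None:
        self.num_queued = 0      # bytes reported as handed to the hardware
        self.num_completed = 0   # bytes reported as transmitted

    def queued(self, count: int) -> None:
        self.num_queued += count

    def completed(self, count: int) -> None:
        in_flight = self.num_queued - self.num_completed
        if count > in_flight:
            raise RuntimeError(
                f"completed {count} bytes, only {in_flight} in flight")
        self.num_completed += count


def replay(events) -> bool:
    """Apply ("queued" | "completed", byte count) events in order.

    Returns True if the invariant held for the whole sequence.
    """
    dql = ToyDql()
    try:
        for kind, count in events:
            getattr(dql, kind)(count)
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    print(replay([("queued", 1500), ("completed", 1500)]))
    print(replay([("completed", 1500), ("queued", 1500)]))
```

The second sequence, where a completion is accounted before the matching queued bytes, is one way such a check could trip. Whether that ordering is what actually happened on the reporter's machine is not established in the thread.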
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 12:50, Heiner Kallweit wrote: > On 09.02.2019 11:07, Sander Eikelenboom wrote: >> On 09/02/2019 10:59, Heiner Kallweit wrote: >>> On 09.02.2019 10:34, Sander Eikelenboom wrote: >>>> On 09/02/2019 10:02, Heiner Kallweit wrote: >>>>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>>>> >>>>>> >>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>>>> L.S., >>>>>>>>>>>> >>>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but >>>>>>>>>>>> they don't seem related) under Xen i the nasty splat below, >>>>>>>>>>>> that I haven encountered with Linux 4.20.x. >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>>>> bisecting could be nasty due to another (networking related) >>>>>>>>>>>> kernel bug. >>>>>>>>>>>> >>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>>>> feel free to ask. >>>>>>>>>>>> >>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver >>>>>>>>>>> between >>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root >>>>>>>>>>> cause could >>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>>>>>>>> >>>>>>>>>> Hmm i did some diging and i think: >>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded >>>>>>>>>> mmiowb barriers >>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of >>>>>>>>>> xmit_more and __netdev_sent_queue >>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>>>> >>>>>>>>> You're right. Thought this was added in 4.20 already. 
>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I >>>>>>>>> haven't heard about >>>>>>>>> this issue from any user of physical hw. And due to the fact that a >>>>>>>>> lot of mainboards >>>>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>>>> Does the issue occur under specific circumstances like very high load? >>>>>>>> >>>>>>>> Yep, the box is already quite contented with the Xen VM's and if I >>>>>>>> remember correctly it occurred while kernel compiling >>>>>>>> on the host. >>>>>>>> >>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>>>> involve Eric Dumazet >>>>>>>>> as author of the underlying changes. >>>>>>>> >>>>>>>> It could also be the barriers weren't that unneeded as assumed. >>>>>>> >>>>>>> The barriers were removed after adding xmit_more handling. Therefore it >>>>>>> would be good to >>>>>>> test also with only >>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>>>> barriers >>>>>>> removed. >>>>>>> >>>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now. >>>>>>>> >>>>>>> Sure, thanks. >>>>>>> >>>>>>>> B
Re: Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778 RIP: e030:move_page_tables+0x7c1/0xae0
On 09/02/2019 19:48, Juergen Gross wrote:
> On 09/02/2019 19:45, Sander Eikelenboom wrote:
>> On 09/02/2019 09:26, Sander Eikelenboom wrote:
>>> L.S.,
>>>
>>> While testing a Linux 5.0-rc5-ish kernel (pull of yesterday) with some additional patches for
>>> already reported other issues, I came across the issue below, which I haven't seen with 4.20.x.
>>>
>>> I haven't got a reproducer, so it might be hard to hit it again.
>>> The system is AMD and this is from the host kernel running under
>>> the Xen hypervisor, in case it matters.
>>>
>>> --
>>> Sander
>>
>> Hi Boris / Juergen,
>>
>> The commit causing this is:
>> 2c91bd4a4e2e530582d6fd643ea7b86b27907151 mm: speed up mremap by 20x on large regions
>>
>> Since it seems there haven't been any other reports about this ..
>> could it be this doesn't specifically work well with a Xen PVH dom0 ?
>
> PVH? Not PV?

Ah sorry, indeed PV !

>
> Juergen
>
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 10:59, Heiner Kallweit wrote: > On 09.02.2019 10:34, Sander Eikelenboom wrote: >> On 09/02/2019 10:02, Heiner Kallweit wrote: >>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>> >>>> >>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>> L.S., >>>>>>>>>> >>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but >>>>>>>>>> they don't seem related) under Xen i the nasty splat below, >>>>>>>>>> that I haven encountered with Linux 4.20.x. >>>>>>>>>> >>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>> bisecting could be nasty due to another (networking related) kernel >>>>>>>>>> bug. >>>>>>>>>> >>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>> feel free to ask. >>>>>>>>>> >>>>>>>>> Thanks for the report. However I see no change in the r8169 driver >>>>>>>>> between >>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause >>>>>>>>> could >>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>>>>>> >>>>>>>> Hmm i did some diging and i think: >>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>>>>> barriers >>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more >>>>>>>> and __netdev_sent_queue >>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>> >>>>>>> You're right. Thought this was added in 4.20 already. >>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't >>>>>>> heard about >>>>>>> this issue from any user of physical hw. 
And due to the fact that a lot >>>>>>> of mainboards >>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>> Does the issue occur under specific circumstances like very high load? >>>>>> >>>>>> Yep, the box is already quite contented with the Xen VM's and if I >>>>>> remember correctly it occurred while kernel compiling >>>>>> on the host. >>>>>> >>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>> involve Eric Dumazet >>>>>>> as author of the underlying changes. >>>>>> >>>>>> It could also be the barriers weren't that unneeded as assumed. >>>>> >>>>> The barriers were removed after adding xmit_more handling. Therefore it >>>>> would be good to >>>>> test also with only >>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>> barriers >>>>> removed. >>>>> >>>>>> Since we are almost at RC6 i took the liberty to CC Eric now. >>>>>> >>>>> Sure, thanks. >>>>> >>>>>> BTW am i correct these patches are merely optimizations ? >>>>> >>>>> Yes >>>>> >>>>>> If so and concluding they revert cleanly, perhaps it should be >>>>>> considered at this point in the RC's >>>>>> to revert them for 5.0 and try again for 5.1 ? >>>>>> >>>>> Before removing both it would be good to test with only the >>>>> barrier-removal removed. >>>>> >>>> >>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of >>>> xmit
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 10:02, Heiner Kallweit wrote:
> On 09.02.2019 00:09, Eric Dumazet wrote:
>>
>>
>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>> L.S.,
>>>>>>>>
>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen I got the nasty splat below,
>>>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>>>
>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting
>>>>>>>> could be nasty due to another (networking related) kernel bug.
>>>>>>>>
>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>
>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>
>>>>>> Hmm I did some digging and I think:
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>
>>>>> You're right. Thought this was added in 4.20 already.
>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>> Does the issue occur under specific circumstances like very high load?
>>>>
>>>> Yep, the box is already quite contended with the Xen VM's and if I remember
>>>> correctly it occurred while kernel compiling on the host.
>>>>
>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>> as author of the underlying changes.
>>>>
>>>> It could also be the barriers weren't that unneeded as assumed.
>>>
>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>> test also with only
>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>> removed.
>>>
>>>> Since we are almost at RC6 I took the liberty to CC Eric now.
>>>>
>>> Sure, thanks.
>>>
>>>> BTW am I correct these patches are merely optimizations ?
>>>
>>> Yes
>>>
>>>> If so and concluding they revert cleanly, perhaps it should be considered
>>>> at this point in the RC's
>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>
>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>
>>
>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>
>> You could try :
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>         dma_addr_t mapping;
>>         u32 opts[2], len;
>>         bool stop_queue;
>> +       bool door_bell;
>>
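Eric's concern above is an ordering one: once a packet has been handed to the NIC, the completion path may run on another CPU and free it, so anything derived from the packet has to be read and accounted beforehand. The following is only a user-space sketch of that general pattern, not the patch quoted above (which is cut off in this archive). All `toy_*` names are hypothetical, and the hand-off function frees the packet immediately to stand in for the completion path winning the race.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct toy_pkt {
    unsigned int len;
};

/* Hypothetical per-queue byte counters. */
struct toy_txq {
    unsigned int bytes_queued;
    unsigned int bytes_completed;
};

/*
 * Models the NIC plus the completion path winning the race:
 * the packet is completed and freed as soon as it is handed over.
 */
static void toy_hand_to_nic(struct toy_txq *q, struct toy_pkt *pkt)
{
    q->bytes_completed += pkt->len;
    free(pkt);
}

/*
 * Read the length and account it first, hand the packet over last.
 * After toy_hand_to_nic() returns, pkt must not be dereferenced.
 */
static unsigned int toy_start_xmit(struct toy_txq *q, struct toy_pkt *pkt)
{
    unsigned int len;

    assert(q != NULL && pkt != NULL);
    len = pkt->len;            /* cache before ownership changes */
    q->bytes_queued += len;    /* account before the NIC can complete it */
    toy_hand_to_nic(q, pkt);   /* pkt may be gone from here on */
    return len;
}
```

With this ordering the completed byte count can never run ahead of the queued byte count, even when completion happens immediately.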
Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778
L.S.,

While testing a Linux 5.0-rc5-ish kernel (pull of yesterday) with some additional patches for
already reported other issues, I came across the issue below, which I haven't seen with 4.20.x.

I haven't got a reproducer, so it might be hard to hit it again.
The system is AMD and this is from the host kernel running under the Xen hypervisor, in case it matters.

--
Sander

[17035.016433] BUG: unable to handle kernel paging request at 888023e26778
[17035.025887] #PF error: [PROT] [WRITE]
[17035.035146] PGD 2a2a067 P4D 2a2a067 PUD 2a2b067 PMD 7fe01067 PTE 801023e26065
[17035.044371] Oops: 0003 [#1] SMP NOPTI
[17035.053720] CPU: 3 PID: 28310 Comm: apt-get Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
[17035.063440] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[17035.072635] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.081585] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.100225] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.109208] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.118332] RDX: RSI: RDI: 0201
[17035.127378] RBP: 888023e26778 R08: R09: 00051c1d9000
[17035.136310] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.145433] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.154171] FS: 7fc5a5591100() GS:88807d4c() knlGS:
[17035.162730] CS: e030 DS: ES: CR0: 80050033
[17035.171180] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660
[17035.179545] Call Trace:
[17035.187736] move_vma.isra.3+0xd1/0x2d0
[17035.195837] __se_sys_mremap+0x3c6/0x5b0
[17035.203986] do_syscall_64+0x49/0x100
[17035.212109] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[17035.219971] RIP: 0033:0x7fc5a453527a
[17035.227558] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[17035.243255] RSP: 002b:7ffda22d96f8 EFLAGS: 0246 ORIG_RAX: 0019
[17035.251121] RAX: ffda RBX: 557d40923a30 RCX: 7fc5a453527a
[17035.258986] RDX: 01a0 RSI: 0190 RDI: 7fc59f7ff000
[17035.267127] RBP: 01a0 R08: 0020 R09: 0040
[17035.275259] R10: 0001 R11: 0246 R12: 7fc59f7ff060
[17035.282681] R13: 7fc59f7ff000 R14: 557d40923a30 R15: 557d40829aa0
[17035.290322] Modules linked in:
[17035.297875] CR2: 888023e26778
[17035.305405] ---[ end trace 6ff49f09286816b6 ]---
[17035.313131] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.320326] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.334851] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.341727] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.348838] RDX: RSI: RDI: 0201
[17035.356000] RBP: 888023e26778 R08: R09: 00051c1d9000
[17035.363623] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.371454] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.378958] FS: 7fc5a5591100() GS:88807d4c() knlGS:
[17035.386585] CS: e030 DS: ES: CR0: 80050033
[17035.393797] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660
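For readers decoding the oops above: `Oops: 0003` and `#PF error: [PROT] [WRITE]` describe the same x86 page-fault error code, and the low bits of the reported PTE (`...065`) have the present bit set and the read/write bit clear. That fits a kernel-mode write to a present but read-only mapping, which is what one would expect if the target is a page-table page that Xen PV keeps read-only for the guest, though the log alone does not prove that. A small sketch of the decoding follows; the helper names are hypothetical and the output format is not the kernel's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (architectural). */
#define PF_PROT  (1u << 0)  /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1)  /* 1: write access, 0: read access */
#define PF_USER  (1u << 2)  /* 1: fault raised in user mode */
#define PF_RSVD  (1u << 3)  /* 1: reserved bit set in a paging entry */
#define PF_INSTR (1u << 4)  /* 1: instruction fetch */

/* x86 page-table entry flag bits used below. */
#define PTE_PRESENT (1ul << 0)
#define PTE_RW      (1ul << 1)

/* Hypothetical helper: render an error code as a flag list. */
static const char *decode_pf_error(unsigned int code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s %s %s%s%s",
             (code & PF_PROT)  ? "[PROT]"   : "[NOT-PRESENT]",
             (code & PF_WRITE) ? "[WRITE]"  : "[READ]",
             (code & PF_USER)  ? "[USER]"   : "[KERNEL]",
             (code & PF_RSVD)  ? " [RSVD]"  : "",
             (code & PF_INSTR) ? " [INSTR]" : "");
    return buf;
}

/* Hypothetical helper: is the mapping present and writable? */
static int pte_present_and_writable(unsigned long pte)
{
    return (pte & PTE_PRESENT) && (pte & PTE_RW);
}
```

Only the low flag bits of the PTE matter for this check, so the digits that the archive dropped from the upper part of the value do not affect the result.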
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 22:50, Heiner Kallweit wrote:
> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>> L.S.,
>>>>>>
>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen I got the nasty splat below,
>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>
>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting
>>>>>> could be nasty due to another (networking related) kernel bug.
>>>>>>
>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>
>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>
>>>> Hmm I did some digging and I think:
>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>
>>> You're right. Thought this was added in 4.20 already.
>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>> have onboard Realtek network I have quite a few testers out there.
>>> Does the issue occur under specific circumstances like very high load?
>>
>> Yep, the box is already quite contended with the Xen VM's and if I remember
>> correctly it occurred while kernel compiling on the host.
>>
>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>> as author of the underlying changes.
>>
>> It could also be the barriers weren't that unneeded as assumed.
>
> The barriers were removed after adding xmit_more handling. Therefore it would be good to
> test also with only
> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
> removed.

*arghh* *grmbl*

With both:
bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3
and
2e6eedb4813e34d8d84ac0eb3afb668966f3f356
reverted I get yet another splat:

[ 3769.246083] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[ 3769.246095] CPU: 2 PID: 3201 Comm: ld Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
[ 3769.246096] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 3769.246098] Call Trace:
[ 3769.246104] <IRQ>
[ 3769.246114] dump_stack+0x5c/0x7b
[ 3769.246120] warn_alloc+0x103/0x190
[ 3769.246122] __alloc_pages_nodemask+0xe3d/0xe80
[ 3769.246128] ? inet_gro_receive+0x232/0x2c0
[ 3769.246130] page_frag_alloc+0x117/0x150
[ 3769.246132] __napi_alloc_skb+0x83/0xd0
[ 3769.246137] rtl8169_poll+0x210/0x640
[ 3769.246140] net_rx_action+0x23d/0x370
[ 3769.246145] __do_softirq+0xed/0x229
[ 3769.246149] irq_exit+0xb7/0xc0
[ 3769.246152] xen_evtchn_do_upcall+0x27/0x40
[ 3769.246154] xen_do_hypervisor_callback+0x29/0x40
[ 3769.246155] </IRQ>
[ 3769.246161] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 3769.246163] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 3769.246164] RSP: e02b:c90005b0f780 EFLAGS: 0202
[ 3769.246166] RAX: 0001 RBX: 8880047c9200 RCX: 0001
[ 3769.246167] RDX: 7d75 RSI: RDI: 8880047c9200
[ 3769.246167] RBP: 88807d4a1a80 R08: c90005b0f978 R09: c90005b0f978
[ 3769.246168] R10: c90005b0f9d0 R11: 88807fc17000 R12: 0001
[ 3769.246169] R13: 0100 R14: R15: 000c
[ 3769.246173] _raw_spin_lock+0x16/0x20
[ 3769.246176] list_lru_add+0x59/0x170
[ 3769.246179] inode_lru_list_add+0x1b/0x40
[ 3769.246182] iput+0x18b/0x1a0
[ 3769.246184] __dentry_kill+0xc5/0x170
[ 3769.246186] shrink_dentry_list+0
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 22:22, Heiner Kallweit wrote:
> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen I got the nasty splat below,
>>>> which I haven't encountered with Linux 4.20.x.
>>>>
>>>> Unfortunately I haven't got a clear reproducer for this and bisecting
>>>> could be nasty due to another (networking related) kernel bug.
>>>>
>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>
>>> Thanks for the report. However I see no change in the r8169 driver between
>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>
>> Hmm I did some digging and I think:
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>
> You're right. Thought this was added in 4.20 already.
> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
> this issue from any user of physical hw. And due to the fact that a lot of mainboards
> have onboard Realtek network I have quite a few testers out there.
> Does the issue occur under specific circumstances like very high load?

Yep, the box is already quite contended with the Xen VM's and if I remember
correctly it occurred while kernel compiling on the host.

> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
> as author of the underlying changes.

It could also be the barriers weren't that unneeded as assumed.
Since we are almost at RC6 I took the liberty to CC Eric now.

BTW am I correct these patches are merely optimizations ?
If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
to revert them for 5.0 and try again for 5.1 ?

--
Sander

>
>> would be candidates, which were merged in 5.0.
>>
>> I have reverted the first two, see how that works out.
>>
>> --
>> Sander
>>
> Heiner
>
>>
>>>> --
>>>> Sander
>>>>
>>> Heiner
>>>
>>>>
>>>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>>>> [ 6466.571425] invalid opcode: [#1] SMP NOPTI
>>>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>>>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>>>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>>>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
>>>> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX:
>>>> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
>>>> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09:
>>>> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 1000
>>>> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
>>>> [ 6466.722165] FS: 7fd869147200() GS:88807d4c() knlGS:
>>>> [ 6466.733228] CS: e030 DS: ES: CR0: 80050033
>>>> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
>>>> [ 6466.758366] Call Trace:
>>>> [ 6466.768118] <IRQ>
>>>> [ 6466.778214] rtl8169_poll+0x4f4/0x640
>>>> [ 6466.789198] net_rx_action+0x23d/0x370
>>>> [ 6466.798467] __do_softirq+0xed/0x229
>>>> [ 6466.807039] irq_exit+0xb7/0xc0
>>>> [ 6466.815471] xen_evtchn_do_upcall+0x27/0x40
>>>> [ 6466.826647] xen_do_hypervisor_callback+0x29/0x40
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 19:52, Heiner Kallweit wrote:
> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen I got the nasty splat below,
>> which I haven't encountered with Linux 4.20.x.
>>
>> Unfortunately I haven't got a clear reproducer for this and bisecting could
>> be nasty due to another (networking related) kernel bug.
>>
>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>
> Thanks for the report. However I see no change in the r8169 driver between
> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
> be somewhere else. Therefore I'm afraid a bisect will be needed.

Hmm I did some digging and I think:
bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue

would be candidates, which were merged in 5.0.

I have reverted the first two, see how that works out.

--
Sander

>> --
>> Sander
>>
> Heiner
>
>>
>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>> [ 6466.571425] invalid opcode: [#1] SMP NOPTI
>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
>> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX:
>> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
>> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09:
>> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 1000
>> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
>> [ 6466.722165] FS: 7fd869147200() GS:88807d4c() knlGS:
>> [ 6466.733228] CS: e030 DS: ES: CR0: 80050033
>> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
>> [ 6466.758366] Call Trace:
>> [ 6466.768118] <IRQ>
>> [ 6466.778214] rtl8169_poll+0x4f4/0x640
>> [ 6466.789198] net_rx_action+0x23d/0x370
>> [ 6466.798467] __do_softirq+0xed/0x229
>> [ 6466.807039] irq_exit+0xb7/0xc0
>> [ 6466.815471] xen_evtchn_do_upcall+0x27/0x40
>> [ 6466.826647] xen_do_hypervisor_callback+0x29/0x40
>> [ 6466.835902] </IRQ>
>> [ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
>> [ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>> [ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246
>> [ 6466.883452] RAX: RBX: 00041f83bfe8 RCX: 8100102a
>> [ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
>> [ 6466.903402] RBP: 0fe8 R08: 000b R09:
>> [ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: 80050c346067
>> [ 6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15:
>> [ 6466.926647] ? xen_hypercall_mmu_update+0xa/0x20
>> [ 6466.938195] ? xen_set_pte_at+0x78/0xe0
>> [ 6466.947046] ? __handle_mm_fault+0xc43/0x1060
>> [ 6466.955772] ? do_mmap+0x44b/0x5b0
>> [ 6466.964410] ? handle_mm_fault+0xf8/0x200
>> [ 6466.973290] ? __do_page_fault+0x231/0x4a0
>> [ 6466.981973] ? page_fault+0x8/0x30
>> [ 6466.990904] ? page_fault+0x1e/0x30
>> [ 6466.999585] Modules linked in:
>> [ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
>> [ 6467.016751] RIP: e030:dql_completed+0x126/0x140
>> [ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
L.S.,

While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen I got the nasty splat below,
which I haven't encountered with Linux 4.20.x.

Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.

If you need more info, want me to run a debug patch etc., please feel free to ask.

--
Sander

[ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 6466.571425] invalid opcode: [#1] SMP NOPTI
[ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
[ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 6466.611579] RIP: e030:dql_completed+0x126/0x140
[ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX:
[ 6466.672835] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6466.684521] RBP: 888077df7260 R08: 0001 R09:
[ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6466.722165] FS: 7fd869147200() GS:88807d4c() knlGS:
[ 6466.733228] CS: e030 DS: ES: CR0: 80050033
[ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6466.758366] Call Trace:
[ 6466.768118] <IRQ>
[ 6466.778214] rtl8169_poll+0x4f4/0x640
[ 6466.789198] net_rx_action+0x23d/0x370
[ 6466.798467] __do_softirq+0xed/0x229
[ 6466.807039] irq_exit+0xb7/0xc0
[ 6466.815471] xen_evtchn_do_upcall+0x27/0x40
[ 6466.826647] xen_do_hypervisor_callback+0x29/0x40
[ 6466.835902] </IRQ>
[ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
[ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246
[ 6466.883452] RAX: RBX: 00041f83bfe8 RCX: 8100102a
[ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
[ 6466.903402] RBP: 0fe8 R08: 000b R09:
[ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: 80050c346067
[ 6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15:
[ 6466.926647] ? xen_hypercall_mmu_update+0xa/0x20
[ 6466.938195] ? xen_set_pte_at+0x78/0xe0
[ 6466.947046] ? __handle_mm_fault+0xc43/0x1060
[ 6466.955772] ? do_mmap+0x44b/0x5b0
[ 6466.964410] ? handle_mm_fault+0xf8/0x200
[ 6466.973290] ? __do_page_fault+0x231/0x4a0
[ 6466.981973] ? page_fault+0x8/0x30
[ 6466.990904] ? page_fault+0x1e/0x30
[ 6466.999585] Modules linked in:
[ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
[ 6467.016751] RIP: e030:dql_completed+0x126/0x140
[ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6467.039726] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6467.047243] RAX: 0042 RBX: 8880049cf800 RCX:
[ 6467.054202] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6467.062000] RBP: 888077df7260 R08: 0001 R09:
[ 6467.069664] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6467.077715] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6467.084916] FS: 7fd869147200() GS:88807d4c() knlGS:
[ 6467.093352] CS: e030 DS: ES: CR0: 80050033
[ 6467.101492] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6467.110542] Kernel panic - not syncing: Fatal exception in interrupt
[ 6467.118166] Kernel Offset: disabled
(XEN) [2019-02-08 18:04:48.854] Hardware Dom0 crashed: rebooting machine in 5 seconds.
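The splat above fires in `dql_completed()`, reached from the driver's poll routine. A plausible reading, stated here as an assumption and not taken from the report, is that the check at that line guards the invariant "bytes reported completed never exceed bytes reported queued". The toy model below illustrates that invariant in user space; it is not the kernel's `struct dql`, the `toy_*` names are hypothetical, and the function returns `false` where the kernel would hit its BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for per-queue byte counters. */
struct toy_dql {
    unsigned int num_queued;     /* bytes reported as handed to the NIC */
    unsigned int num_completed;  /* bytes reported as completed */
};

/* Transmit side: report bytes that were queued to the hardware. */
static void toy_dql_queued(struct toy_dql *dql, unsigned int count)
{
    assert(dql != NULL);
    dql->num_queued += count;
}

/*
 * Completion side: report bytes the hardware finished sending.
 * Returns false when more is completed than is outstanding, which is
 * the condition assumed to trigger the BUG in the report.
 */
static bool toy_dql_completed(struct toy_dql *dql, unsigned int count)
{
    assert(dql != NULL);
    if (count > dql->num_queued - dql->num_completed)
        return false;
    dql->num_completed += count;
    return true;
}
```

In this model a completion that is processed before the matching "queued" report violates the invariant, while the same two calls in the opposite order succeed, which is why the ordering of the accounting call relative to the hardware hand-off matters.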
Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
On 08/02/2019 12:54, Florian Westphal wrote: > Florian Westphal wrote: >> Sander Eikelenboom wrote: >>> L.S., >>> >>> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression >>> with NAT. >>> (using an nftables firewall with NAT and connection tracking). >>> >>> Unfortunately it isn't too obvious since no errors are logged, but on >>> clients it >>> causes symptoms like firefox intermittently not being able to load pages >>> with: >>> Network Protocol Error >>> An error occurred during a connection to www.example.com >>> The page you are trying to view cannot be shown because an error in the >>> network protocol was detected. >>> Please contact the website owners to inform them of this problem. >>> >>> But it's only intermittently, so i can still visit some webpages with >>> clients, >>> could be that packet size and or fragments are at play ? >>> >>> So I tried testing with >>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with >>> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have >>> the latest netdev has to offer, >>> but to no avail. >>> >>> After that I tried to git bisect and ended up with: >>> >>> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit >>> commit faec18dbb0405c7d4dda025054511dc3a6696918 >>> Author: Florian Westphal >>> Date: Thu Dec 13 16:01:33 2018 +0100 >>> >>> netfilter: nat: remove l4proto->manip_pkt >> >> Thanks, this is immensely helpful. >> >> I think I see the bug, we can't use target->dst.protonum in >> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing >> with a related icmp packet. >> >> I will send a patch in a few hours when I get back. > > Sander, does this patch fix things for you? Hi Florian, You may stick on a reported/tested-by if you like. Thanks for the swift fix ! -- Sander > > Thanks! 
>
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
>
>  	/* Change outer to look like the reply to an incoming packet */
>  	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	target.dst.protonum = IPPROTO_ICMP;
>  	if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
>  		return 0;
>
> diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> @@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
>  	}
>
>  	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	target.dst.protonum = IPPROTO_ICMPV6;
>  	if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
>  		return 0;
>
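The bug Florian describes can be illustrated with a toy model. This is only a sketch under assumptions: the types and the names `pick_manip_handler` and `translate_icmp_reply` are hypothetical stand-ins, not the kernel's API. It shows why a tuple taken from the original TCP connection selects the wrong handler for a related ICMP packet unless the protocol number is overridden first, which is what the one-line additions in the patch do.

```c
/* Hypothetical, simplified stand-ins for the kernel types; not the real API. */
enum l4proto { L4_ICMP = 1, L4_TCP = 6 };

struct tuple {
	enum l4proto protonum;	/* protocol recorded in the conntrack tuple */
};

/* Returns which protocol handler would mangle the packet: it simply
 * trusts the protocol number stored in the tuple it is given. */
static enum l4proto pick_manip_handler(const struct tuple *target)
{
	return target->protonum;
}

/* Models translating the outer header of a related ICMP error.
 * 'conn' describes the original connection, so it still says TCP. */
static enum l4proto translate_icmp_reply(struct tuple conn, int apply_fix)
{
	struct tuple target = conn;	/* stands in for the inverted tuple */

	if (apply_fix)
		target.protonum = L4_ICMP;	/* the line the patch adds */
	return pick_manip_handler(&target);
}
```

Calling `translate_icmp_reply()` with a TCP connection tuple and `apply_fix` set to 0 selects the TCP handler for what is really an ICMP packet; with `apply_fix` set to 1 the ICMP handler is chosen.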
Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
L.S.,

While trying out a 5.0-rc5 kernel I seem to have stumbled over a regression with NAT (using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't too obvious, since no errors are logged, but on clients it causes symptoms like Firefox intermittently not being able to load pages, with:

    Network Protocol Error
    An error occurred during a connection to www.example.com
    The page you are trying to view cannot be shown because an error in the network protocol was detected.
    Please contact the website owners to inform them of this problem.

But it only happens intermittently, so I can still visit some webpages with clients. Could it be that packet size and/or fragments are at play?

So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer, but to no avail.

After that I tried to git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal
Date:   Thu Dec 13 16:01:33 2018 +0100

    netfilter: nat: remove l4proto->manip_pkt

    This removes the last l4proto indirection, the two callers, the l3proto
    packet mangling helpers for ipv4 and ipv6, now call the
    nf_nat_l4proto_manip_pkt() helper.

    nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
    they contain no functionality anymore to not clutter this patch.

    Next patch will remove the empty files and the nf_nat_l4proto struct.

    nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
    other nat manip functionality as well, not just udp and udplite.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

:040000 040000 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c b6f8ab2779215b4495dfe641f50e798da73859ac M include
:040000 040000 af212a756f1acf00cbe45c3be5b71f38f01f1d34 165c440f9e6f2e05738628a19b51f7603f95752a M net

Any ideas or debugging hints?

--
Sander
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 23:48, Boris Ostrovsky wrote: > On 9/27/18 5:37 PM, Jens Axboe wrote: >> On 9/27/18 2:33 PM, Sander Eikelenboom wrote: >>> On 27/09/18 21:06, Boris Ostrovsky wrote: >>>> On 9/27/18 2:56 PM, Jens Axboe wrote: >>>>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote: >>>>>> On 27/09/18 16:26, Jens Axboe wrote: >>>>>>> On 9/27/18 1:12 AM, Juergen Gross wrote: >>>>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote: >>>>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>>>>> added support for purging persistent grants when they are not in use. >>>>>>>>> As >>>>>>>>> part of the purge, the grants were removed from the grant buffer, This >>>>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in >>>>>>>>> get_free_grant(). This can be observed even on an idle system, within >>>>>>>>> 20-30 minutes. >>>>>>>>> >>>>>>>>> We should keep the grants in the buffer when purging, and only free >>>>>>>>> the >>>>>>>>> grant ref. >>>>>>>>> >>>>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>>>>> Signed-off-by: Boris Ostrovsky >>>>>>>> Reviewed-by: Juergen Gross >>>>>>> Since Konrad is out, I'm going to queue this up for 4.19. >>>>>>> >>>>>> Hi Boris/Juergen. >>>>>> >>>>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch >>>>>> from Boris pulled on top. >>>>>> Unfortunately it made a VM hang (probably because it's rootFS is >>>>>> shuffled from under it's feet >>>> What do you mean by "rootFS is shuffled from under it's feet " ? >>> Assumption that block-front getting borked and either a kernel crash or >>> rootfs becoming mounted readonly. Didn't (try) to check though. 
>>> >>>>>> and it gave these in dom0 dmesg: >>>>>> >>>>>> [ 9251.696090] xen-blkback: requesting a grant already in use >>>>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> [ 9251.715781] xen-blkback: requesting a grant already in use >>>>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> [ 9251.735698] xen-blkback: requesting a grant already in use >>>>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> >>>>>> The VM was a HVM with 4 vcpu's and 2 phy disks: >>>>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) >>>>>> persistent grants >>>>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) >>>>>> persistent grants >>>>>> >>>>>> >>>>>> Currently i have been running 4.19-rc5 with xen-next on top and commit >>>>>> a46b53672b2c reverted, for a couple of days. That seems to run stable >>>>>> for me (since it's a small box so i'm not hit by what a46b53672b2c >>>>>> tried to fix. >>>>>> >>>>>> If you can come up with a debug patch i can give that a spin tomorrow >>>>>> evening or in the weekend, so we are hopefully still in time for the >>>>>> 4.19 release. >>>>> At this late in the game, might make more sense to simply revert the >>>>> buggy commit. Especially since what is currently out there doesn't fix >>>>> the issue for you. >>> Don't know if Boris or Juergen have a hunch about the issue, if not >>> perhaps a revert is the best. >> Anyone? Unless I hear otherwise, I'll revert the series tomorrow. > > Juergen may have something to say by tomorrow, but from my perspective, > given that we are coming up on rc6 --- yes. > > I looked at the patches again and didn't see anything obvious. > > -boris Could also be that what i hit is a latent bug, that is not caused by these patches but merely got uncovered by them. 
xl dmesg also shows quite a few of these:
(XEN) [2018-09-24 03:15:46.847] grant_table.c:1755:d14v0 Expanding d14 grant table from 19 to 20 frames
(XEN) [2018-09-24 03:15:46.849] grant_table.c:1755:d14v0 Expanding d14 grant table from 20 to 21 frames
(and it has done that for ages on my box, without leading to any direct problems to my knowledge)

I don't know if these could be related, and whether something around the (persistent) grants for block devices could be leaking under some conditions?

--
Sander
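The failure mode from Boris's quoted commit message (purging empties the grant buffer, so get_free_grant() later trips its BUG_ON) can be sketched with a toy list. This is a minimal model under assumptions, not the blkfront code: the structures, the `NO_GREF` marker and the two purge helpers are hypothetical, and `get_free_grant()` here returns NULL where the driver would hit the BUG_ON.

```c
#include <stddef.h>

#define NO_GREF 0u	/* hypothetical marker: entry holds no grant reference */

struct grant {
	unsigned int gref;	/* grant reference, NO_GREF once released */
	struct grant *next;
};

struct grant_buffer {
	struct grant *head;
	size_t count;
};

static void buffer_add(struct grant_buffer *b, struct grant *g)
{
	g->next = b->head;
	b->head = g;
	b->count++;
}

/* Purge as described for the original commit: entries leave the buffer. */
static void purge_by_removing(struct grant_buffer *b)
{
	b->head = NULL;
	b->count = 0;
}

/* Purge as described for the fix: entries stay, only the ref is dropped. */
static void purge_keep_entries(struct grant_buffer *b)
{
	for (struct grant *g = b->head; g; g = g->next)
		g->gref = NO_GREF;
}

/* Takes one entry from the buffer; NULL models the BUG_ON case. */
static struct grant *get_free_grant(struct grant_buffer *b)
{
	struct grant *g = b->head;

	if (!g)
		return NULL;
	b->head = g->next;
	g->next = NULL;
	b->count--;
	return g;
}
```

In this model, a buffer purged with `purge_keep_entries()` can still hand out entries (which would then need a fresh grant reference), while one purged with `purge_by_removing()` has nothing left to give.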
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 21:06, Boris Ostrovsky wrote: > On 9/27/18 2:56 PM, Jens Axboe wrote: >> On 9/27/18 12:52 PM, Sander Eikelenboom wrote: >>> On 27/09/18 16:26, Jens Axboe wrote: >>>> On 9/27/18 1:12 AM, Juergen Gross wrote: >>>>> On 22/09/18 21:55, Boris Ostrovsky wrote: >>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>> added support for purging persistent grants when they are not in use. As >>>>>> part of the purge, the grants were removed from the grant buffer, This >>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in >>>>>> get_free_grant(). This can be observed even on an idle system, within >>>>>> 20-30 minutes. >>>>>> >>>>>> We should keep the grants in the buffer when purging, and only free the >>>>>> grant ref. >>>>>> >>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>> Signed-off-by: Boris Ostrovsky >>>>> Reviewed-by: Juergen Gross >>>> Since Konrad is out, I'm going to queue this up for 4.19. >>>> >>> Hi Boris/Juergen. >>> >>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch >>> from Boris pulled on top. >>> Unfortunately it made a VM hang (probably because it's rootFS is shuffled >>> from under it's feet > > What do you mean by "rootFS is shuffled from under it's feet " ? Assumption that block-front getting borked and either a kernel crash or rootfs becoming mounted readonly. Didn't (try) to check though. 
>>> and it gave these in dom0 dmesg: >>> >>> [ 9251.696090] xen-blkback: requesting a grant already in use >>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree >>> [ 9251.715781] xen-blkback: requesting a grant already in use >>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree >>> [ 9251.735698] xen-blkback: requesting a grant already in use >>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree >>> >>> The VM was a HVM with 4 vcpu's and 2 phy disks: >>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) >>> persistent grants >>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) >>> persistent grants >>> >>> >>> Currently i have been running 4.19-rc5 with xen-next on top and commit >>> a46b53672b2c reverted, for a couple of days. That seems to run stable >>> for me (since it's a small box so i'm not hit by what a46b53672b2c >>> tried to fix. >>> >>> If you can come up with a debug patch i can give that a spin tomorrow >>> evening or in the weekend, so we are hopefully still in time for the >>> 4.19 release. >> At this late in the game, might make more sense to simply revert the >> buggy commit. Especially since what is currently out there doesn't fix >> the issue for you. Don't know if Boris or Juergen have a hunch about the issue, if not perhaps a revert is the best. > If decision is to revert then I think the whole series needs to be > reverted. > > -boris > For Boris and Juergen: Would it make sense to have an "xen-next" branch in the xen-tip tree that is: - based on the previous stable kernel - and has the for-linus branches for the upcoming kernel release on top; - and has the pathes for net(-next) and block changes on top (since these don't go via the tree but only via mailing-list patches); (which are scattered, difficult to track and use for automated testing) - and dependency patches for the above if necessary to be able to build. 
So there is one branch that can be used to test ALL pending kernel related Xen patches and which could be used in OSStest without as many potential false alarms as linux-next will have ? -- Sander
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 16:26, Jens Axboe wrote: > On 9/27/18 1:12 AM, Juergen Gross wrote: >> On 22/09/18 21:55, Boris Ostrovsky wrote: >>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>> added support for purging persistent grants when they are not in use. As >>> part of the purge, the grants were removed from the grant buffer, This >>> eventually causes the buffer to become empty, with BUG_ON triggered in >>> get_free_grant(). This can be observed even on an idle system, within >>> 20-30 minutes. >>> >>> We should keep the grants in the buffer when purging, and only free the >>> grant ref. >>> >>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>> Signed-off-by: Boris Ostrovsky >> >> Reviewed-by: Juergen Gross > > Since Konrad is out, I'm going to queue this up for 4.19. > Hi Boris/Juergen. Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch from Boris pulled on top. Unfortunately it made a VM hang (probably because it's rootFS is shuffled from under it's feet and it gave these in dom0 dmesg: [ 9251.696090] xen-blkback: requesting a grant already in use [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree [ 9251.715781] xen-blkback: requesting a grant already in use [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree [ 9251.735698] xen-blkback: requesting a grant already in use [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree The VM was a HVM with 4 vcpu's and 2 phy disks: xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants Currently i have been running 4.19-rc5 with xen-next on top and commit a46b53672b2c reverted, for a couple of days. That seems to run stable for me (since it's a small box so i'm not hit by what a46b53672b2c tried to fix. 
If you can come up with a debug patch i can give that a spin tomorrow evening or in the weekend, so we are hopefully still in time for the 4.19 release. -- Sander
Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
On 13/02/18 14:07, Ulf Magnusson wrote: > On Tue, Feb 13, 2018 at 1:35 PM, Ulf Magnusson <ulfali...@gmail.com> wrote: >> On Tue, Feb 13, 2018 at 12:33:24PM +0100, Ulf Magnusson wrote: >>> On Tue, Feb 13, 2018 at 11:00:49AM +0100, Sander Eikelenboom wrote: >>>> On 13/02/18 05:09, Masahiro Yamada wrote: >>>>> 2018-02-13 12:00 GMT+09:00 Woody Suwalski <terraluna...@gmail.com>: >>>>>> Sander Eikelenboom wrote: >>>>>>> >>>>>>> L.S., >>>>>>> >>>>>>> The Debian kernel-package tool make-kpkg for easy building of upstream >>>>>>> kernels on Debian fails with linux 4.16-rc1. >>>>>>> >>>>>>> The tool (perl script) while invoked with: >>>>>>> make-kpkg --initrd --append_to_version -20180212 kernel_image >>>>>>> >>>>>>> On a git tree with a .config from the previous kernel release, so new >>>>>>> KConfig questions have to be asked on new or changed options. >>>>>>> >>>>>>> The script stalls indefinitely while it seems to be excuting: >>>>>>> exec make kpkg_version=13.018+nmu1 -f >>>>>>> /usr/share/kernel-package/ruleset/minimal.mk debian >>>>>>> APPEND_TO_VERSION=-t440s-20180212 INITRD=YES >>>>>>> >>>>>>> After using ctrl-c to break out it, i get: >>>>>>> ^CFailed to create a ./debian directory: No such file or directory >>>>>>> at >>>>>>> /usr/bin/make-kpkg line 970. >>>>>>> >>>>>>> Bisection turned up as culprit: >>>>>>> commit d2a04648a5dbc3d1d043b35257364f0197d4d868 >>>>>>> kconfig: remove check_stdin() >>>>>>> Except silentoldconfig, valid_stdin is 1, so check_stdin() is >>>>>>> no-op. >>>>>>> oldconfig and silentoldconfig work almost in the same way >>>>>>> except >>>>>>> that >>>>>>> the latter generates additional files under include/. Both ask >>>>>>> users >>>>>>> for input for new symbols. >>>>>>> I do not know why only silentoldconfig requires stdio be tty. 
>>>>>>> $ rm -f .config; touch .config >>>>>>>$ yes "" | make oldconfig > stdout >>>>>>>$ rm -f .config; touch .config >>>>>>>$ yes "" | make silentoldconfig > stdout >>>>>>>make[1]: *** [silentoldconfig] Error 1 >>>>>>>make: *** [silentoldconfig] Error 2 >>>>>>>$ tail -n 4 stdout >>>>>>>Console input/output is redirected. Run 'make oldconfig' to >>>>>>> update >>>>>>> configuration. >>>>>>> scripts/kconfig/Makefile:40: recipe for target >>>>>>> 'silentoldconfig' failed >>>>>>>Makefile:507: recipe for target 'silentoldconfig' failed >>>>>>> Redirection is useful, for example, for testing where we want >>>>>>> to >>>>>>> give >>>>>>> particular key inputs from a test file, then check the result. >>>>>>> Signed-off-by: Masahiro Yamada <yamada.masah...@socionext.com> >>>>>>> Reviewed-by: Ulf Magnusson <ulfali...@gmail.com> >>>>>>> >>>>>>> Reverting this specific commit makes make-kpkg work again as usual. >>>>>>> >>>>>>> Version of the kernel-package used: >>>>>>> ii kernel-package >>>>>>> 13.018+nmu1 >>>>>>> >>>>>>> >>>>>>> I also cc'ed the Debian developer who maintains the kernel-package >>>>>>> package: Manoj Srivastava >>>>>>> >>>>>>> -- >>>>>>> Sander >>>>>>> >>>>>> I have noticed today the same - the kernel-build blockage was in (as I >>>>>> recall) >>>>>> srcipts/kconfig/conf -s --silentoldconfig Kbuild >>>>>> >>>>>> I have bypassed it by regenerating the .config "by hand"
Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
On 13/02/18 05:09, Masahiro Yamada wrote: > 2018-02-13 12:00 GMT+09:00 Woody Suwalski <terraluna...@gmail.com>: >> Sander Eikelenboom wrote: >>> >>> L.S., >>> >>> The Debian kernel-package tool make-kpkg for easy building of upstream >>> kernels on Debian fails with linux 4.16-rc1. >>> >>> The tool (perl script) while invoked with: >>> make-kpkg --initrd --append_to_version -20180212 kernel_image >>> >>> On a git tree with a .config from the previous kernel release, so new >>> KConfig questions have to be asked on new or changed options. >>> >>> The script stalls indefinitely while it seems to be excuting: >>> exec make kpkg_version=13.018+nmu1 -f >>> /usr/share/kernel-package/ruleset/minimal.mk debian >>> APPEND_TO_VERSION=-t440s-20180212 INITRD=YES >>> >>> After using ctrl-c to break out it, i get: >>> ^CFailed to create a ./debian directory: No such file or directory at >>> /usr/bin/make-kpkg line 970. >>> >>> Bisection turned up as culprit: >>> commit d2a04648a5dbc3d1d043b35257364f0197d4d868 >>> kconfig: remove check_stdin() >>> Except silentoldconfig, valid_stdin is 1, so check_stdin() is >>> no-op. >>> oldconfig and silentoldconfig work almost in the same way except >>> that >>> the latter generates additional files under include/. Both ask users >>> for input for new symbols. >>> I do not know why only silentoldconfig requires stdio be tty. >>> $ rm -f .config; touch .config >>>$ yes "" | make oldconfig > stdout >>>$ rm -f .config; touch .config >>>$ yes "" | make silentoldconfig > stdout >>>make[1]: *** [silentoldconfig] Error 1 >>>make: *** [silentoldconfig] Error 2 >>>$ tail -n 4 stdout >>>Console input/output is redirected. Run 'make oldconfig' to update >>> configuration. >>> scripts/kconfig/Makefile:40: recipe for target >>> 'silentoldconfig' failed >>>Makefile:507: recipe for target 'silentoldconfig' failed >>> Redirection is useful, for example, for testing where we want to >>> give >>> particular key inputs from a test file, then check the result. 
>>> Signed-off-by: Masahiro Yamada <yamada.masah...@socionext.com> >>> Reviewed-by: Ulf Magnusson <ulfali...@gmail.com> >>> >>> Reverting this specific commit makes make-kpkg work again as usual. >>> >>> Version of the kernel-package used: >>> ii kernel-package >>> 13.018+nmu1 >>> >>> >>> I also cc'ed the Debian developer who maintains the kernel-package >>> package: Manoj Srivastava >>> >>> -- >>> Sander >>> >> I have noticed today the same - the kernel-build blockage was in (as I >> recall) >> srcipts/kconfig/conf -s --silentoldconfig Kbuild >> >> I have bypassed it by regenerating the .config "by hand"... > > > silentoldconfig asks you values for new symbols. > So, you must answer questions to proceed. I know, but it stalls before asking the questions. > > How does 'make-kpkg' handle silentoldconfig? > > Re-direct stdio, then make it forcibly fail? I don't know, it is a bunch of perl and shell scripts that gets invoked, not the most easy to comprehend if you are not familiar with them. I'm just a user of the tool. So i would have to defer that question to the Debian package maintainer, hopefully he will chime in. -- Sander > > >
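For readers following the thread: the commit message quoted above says that only silentoldconfig required stdio to be a tty. The kind of check that was removed can be sketched as below. This is Python rather than kconfig's C, and it assumes the check amounts to "are stdin, stdout and stderr all terminals"; the helper names are hypothetical and this is not the kconfig source.

```python
import os

REDIRECTED_MSG = ("Console input/output is redirected. "
                  "Run 'make oldconfig' to update configuration.")

def stdio_is_tty(fds=(0, 1, 2)):
    """Return True only if every given file descriptor is a terminal."""
    return all(os.isatty(fd) for fd in fds)

def check_stdin(fds=(0, 1, 2)):
    """Hypothetical stand-in for the removed check: report an error
    instead of prompting when console I/O is redirected."""
    if not stdio_is_tty(fds):
        return REDIRECTED_MSG
    return None

if __name__ == "__main__":
    # A pipe is never a terminal, so this models `yes "" | make ...`.
    read_end, write_end = os.pipe()
    print(check_stdin((read_end, write_end)))
    os.close(read_end)
    os.close(write_end)
```

With a check like this gone, a caller that runs silentoldconfig with redirected stdio no longer gets an immediate error; the tool goes on to read answers from whatever stdin it was given. Whether make-kpkg relies on that early failure is exactly the open question in the reply above.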
Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
L.S.,

The Debian kernel-package tool make-kpkg, for easy building of upstream
kernels on Debian, fails with Linux 4.16-rc1.

The tool (a perl script) was invoked with:
    make-kpkg --initrd --append_to_version -20180212 kernel_image

on a git tree with a .config from the previous kernel release, so new
Kconfig questions have to be asked for new or changed options.

The script stalls indefinitely while it seems to be executing:
    exec make kpkg_version=13.018+nmu1 -f /usr/share/kernel-package/ruleset/minimal.mk debian APPEND_TO_VERSION=-t440s-20180212 INITRD=YES

After using ctrl-c to break out of it, I get:
    ^CFailed to create a ./debian directory: No such file or directory at /usr/bin/make-kpkg line 970.

Bisection turned up as culprit:
    commit d2a04648a5dbc3d1d043b35257364f0197d4d868

    kconfig: remove check_stdin()

    Except silentoldconfig, valid_stdin is 1, so check_stdin() is no-op.

    oldconfig and silentoldconfig work almost in the same way except that
    the latter generates additional files under include/. Both ask users
    for input for new symbols.

    I do not know why only silentoldconfig requires stdio be tty.

      $ rm -f .config; touch .config
      $ yes "" | make oldconfig > stdout
      $ rm -f .config; touch .config
      $ yes "" | make silentoldconfig > stdout
      make[1]: *** [silentoldconfig] Error 1
      make: *** [silentoldconfig] Error 2
      $ tail -n 4 stdout
      Console input/output is redirected. Run 'make oldconfig' to update configuration.
      scripts/kconfig/Makefile:40: recipe for target 'silentoldconfig' failed
      Makefile:507: recipe for target 'silentoldconfig' failed

    Redirection is useful, for example, for testing where we want to give
    particular key inputs from a test file, then check the result.

    Signed-off-by: Masahiro Yamada
    Reviewed-by: Ulf Magnusson

Reverting this specific commit makes make-kpkg work again as usual.

Version of the kernel-package used:
    ii kernel-package 13.018+nmu1

I also cc'ed the Debian developer who maintains the kernel-package
package: Manoj Srivastava

--
Sander
Linux 4.14-rc6 bisected regression tun devices not working anymore in openvpn
L.S.,

While testing a Linux 4.14-rc6 kernel I noticed OpenVPN didn't function anymore.
My openvpn config uses tun devices and is pretty standard.
The openvpn version is current Debian stable: openvpn 2.4.0-6+deb9u2

From the openvpn logging:
Sat Oct 28 16:03:34 2017 us=175829 TUN/TAP device opened
Sat Oct 28 16:03:34 2017 us=183027 Note: Cannot set tx queue length on : No such device (errno=19)
Sat Oct 28 16:03:34 2017 us=183055 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Sat Oct 28 16:03:34 2017 us=183071 /sbin/ip link set dev up mtu 1500
Cannot find device ""
Sat Oct 28 16:03:34 2017 us=200445 Linux ip link set failed: external program exited with error status: 1
Sat Oct 28 16:03:34 2017 us=200482 Exiting due to fatal error
Sat Oct 28 16:38:17 2017 us=923381 TCP/UDP: Closing socket
Sat Oct 28 16:38:17 2017 us=925986 Closing TUN/TAP interface

The offending commit is: 0ad646c81b2182f7fa67ec0c8c825e0ee165696d "tun: call dev_get_valid_name() before register_netdevice()"

Reverting this commit fixes the issue for me; it's unfortunate that the commit itself seems to fix another issue.

--
Sander
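The log above shows an empty interface name wherever one is expected ("Cannot set tx queue length on : No such device", "Cannot find device """). That is what a client would see if the name it reads back after creating the tun device came back empty. A minimal sketch of that read-back follows, assuming the usual TUNSETIFF ioctl on /dev/net/tun and the 64-bit Linux `struct ifreq` layout. The helper names are hypothetical, and the report does not show whether OpenVPN does exactly this.

```python
import fcntl
import struct

TUNSETIFF = 0x400454CA  # _IOW('T', 202, int) from <linux/if_tun.h>
IFF_TUN = 0x0001
IFF_NO_PI = 0x1000
IFNAMSIZ = 16

def build_ifreq(name: bytes, flags: int) -> bytes:
    """Pack a 40-byte ifreq: 16-byte name, 16-bit flags, zero padding."""
    return struct.pack("16sH22x", name, flags)

def ifreq_name(buf: bytes) -> str:
    """Extract the NUL-terminated interface name from an ifreq buffer."""
    return buf[:IFNAMSIZ].split(b"\0", 1)[0].decode()

def create_tun(template: bytes = b"tun%d") -> str:
    """Create a tun device and return the name the kernel reports back.
    Needs CAP_NET_ADMIN; not called in this sketch."""
    with open("/dev/net/tun", "r+b", buffering=0) as tun:
        reply = fcntl.ioctl(tun, TUNSETIFF,
                            build_ifreq(template, IFF_TUN | IFF_NO_PI))
        return ifreq_name(reply)

if __name__ == "__main__":
    # An all-zero reply models the symptom in the log: an empty name.
    print(repr(ifreq_name(bytes(40))))
```

If the name copied back to user space is empty, every later `ip link set dev <name> ...` call is made with an empty device name, which matches the failure sequence in the log.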
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 19:49, Craig Bergstrom wrote:
> Sander, thanks for the details, they've been very useful.
>
> I suspect that your host system's mem=2048M parameter is causing the
> problem. Any chance you can confirm by removing the parameter and
> running the guest code path?

I removed it, but kept the hypervisor limiting dom0 memory to 2048M intact
(in grub, using the xen bootcmd "multiboot /xen-4.10.gz dom0_mem=2048M,max:2048M .").

Unfortunately that doesn't change anything; the guest still fails to start with the same errors.

> More specifically, since you're telling the kernel that it's high
> memory address is at 2048M and your device is at 0xfe1fe000 (~4G), the
> new mmap() limits are preventing you from mapping addresses that are
> explicitly disallowed by the parameter.
>

Which would probably mean the current patch prohibits hard limiting the dom0 memory to a certain value (below 4G), at least in combination with PCI passthrough.

So the only thing left would be to have no hard memory restriction on dom0 and rely on auto-ballooning, but I'm not a great fan of that.

I don't know how KVM handles setting memory limits for the host system, but perhaps it suffers from the same issue.

I also tried the patch from one of your last mails to make the check "less strict", but I still get the same errors (when using the hard memory limits).

--
Sander

>
> On Thu, Oct 26, 2017 at 10:39 AM, Ingo Molnar wrote:
>>
>> * Craig Bergstrom wrote:
>>
>>> Yes, not much time left for 4.14, it might be reasonable to pull the
>>> change out since it's causing problems. [...]
>>
>> Ok, I'll queue up a revert tomorrow morning and send it to Linus ASAP if there's
>> no good fix by then. In hindsight I should have queued it for v4.15 ...
>>
>> Thanks,
>>
>> Ingo
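The arithmetic behind Craig's hypothesis can be sketched as follows. This only models the idea that a mapping is refused when it reaches above a memory limit; it is not the kernel's actual check, the function name is hypothetical, and Sander's test above (removing mem= while keeping the dom0 limit) did not confirm the hypothesis. The 0x2000 region size is taken from the qemu log elsewhere in this thread.

```python
MIB = 1024 * 1024

def reaches_above_limit(phys_addr: int, size: int, limit_bytes: int) -> bool:
    """True if any byte of [phys_addr, phys_addr + size) is at or above the limit."""
    return phys_addr + size > limit_bytes

if __name__ == "__main__":
    limit = 2048 * MIB        # mem=2048M
    bar_base = 0xFE1FE000     # device address quoted in the thread (~4G)
    bar_size = 0x2000         # IO region size from the qemu log
    print(hex(limit), reaches_above_limit(bar_base, bar_size, limit))
```

The device region sits far above the 2048M mark, so any check that compares /dev/mem mappings against the end of RAM instead of the end of addressable physical memory would reject it.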
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 10:12, Sander Eikelenboom wrote: > On 26/10/17 10:05, Sander Eikelenboom wrote: >> On 26/10/17 00:02, Craig Bergstrom wrote: >>> Thanks for the notification, my apologies for the breakage. I'll take a >>> close look and see if I can figure out what went wrong. >>> >>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call >>> that fail on your affected system? >> >> Hi Craig, >> >> The output from /proc/iomem is simple to get and attached. >> The mmap call is probably issued by qemu and will require more digging. > > Ahh grepping qemu gave a pointer, it's probably the code in: > > http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40 > > around line 571, that would also explain why it's only this device that > has the problem, since it's the only one trying to use MSI(-X) > interrupts. Will see it i can add some logging to that function. Attached is the qemu debug output with an extra line outputting all stuff used to calculate the arguments used by the mmap-call. -- Sander > -- > Sander > > >> >> I don't know if there is that much time left for 4.14, since we are at >> RC6 already. 
>> >> -- >> Sander >> >> >>> >>> >>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >>> wrote: >>> >>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote: >>>>> Greetings, >>>>> >>>>> 0day kernel testing robot got the below dmesg and the first bad commit is >>>>> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >>>> master >>>>> >>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381 >>>>> Author: Craig Bergstrom >>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600 >>>>> Commit: Ingo Molnar >>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200 >>>>> >>>>> x86/mm: Limit mmap() of /dev/mem to valid physical addresses >>>> >>>> Also note >>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html >>>> >>>> -boris >>>> >>> >> > qemu-system-i386: -serial pty: char device redirected to /dev/pts/16 (label serial0) [00:05.0] xen_pt_realize: Assigning real physical device 08:00.0 to devfn 0x28 [00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x2000 base_addr=0xfe1fe000 type: 0x4) [00:05.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x, syncing to 0x0080. [00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x, host=0xfe1fe004, syncing to 0xfe1fe004. [00:05.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x, host=0x4803, syncing to 0x0003. [00:05.0] xen_pt_config_reg_init: Offset 0x0072 mismatch! Emulated=0x, host=0x0086, syncing to 0x0080. [00:05.0] xen_pt_config_reg_init: Offset 0x00a4 mismatch! Emulated=0x, host=0x8fc0, syncing to 0x8fc0. [00:05.0] xen_pt_config_reg_init: Offset 0x00b2 mismatch! Emulated=0x, host=0x1012, syncing to 0x1012. 
[00:05.0] xen_pt_msix_init: get MSI-X table BAR base 0xfe1fe000 [00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8 [00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8, PCI_MSIX_ENTRY_SIZE = 0x10, msix->table_offset_adjust = 0, msix->table_base = 0xfe1fe000 [00:05.0] xen_pt_msix_init: Error: Can't map physical MSI-X table: Invalid argument [00:05.0] xen_pt_msix_size_init: Error: Internal error: Invalid xen_pt_msix_init. Failed to initialize 12/15, type = 0x1, rc: -22 [00:05.0] xen_pt_msi_set_enable: disabling MSI. *** Error in `/usr/local/lib/xen/bin/qemu-system-i386': corrupted size vs. prev_size: 0x55ce13565570 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x70bcb)[0x7f700ab7ebcb] /lib/x86_64-linux-gnu/libc.so.6(+0x76f96)[0x7f700ab84f96] /lib/x86_64-linux-gnu/libc.so.6(+0x77388)[0x7f700ab85388] /lib/x86_64-linux-gnu/libc.so.6(+0x78dca)[0x7f700ab86dca] /lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7f700ab89b4b] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_malloc0+0x21)[0x7f700bbbee61] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d78ee)[0x55ce114298ee] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d309e)[0x55ce1142509e] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d316f)[0x55ce1142516f] /usr/local/lib/xen/bin/qemu-system-i386(+0x24d79b)[0x55ce10f9f79b] /usr/local/lib/xen/bin/qemu-system-i386(+0x6da8bf)[0x55ce1142c8bf] /usr/local/lib/xen/bin/qemu-system-i386(+0x70717c)[0x55ce1145917c] /usr/local/lib/xen/bin/qemu-system-i386(+0x7072c4)[0x5
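The extra debug line in the log prints the values that feed the failing mmap call. A sketch of how such values could combine into the length and file offset follows. It assumes the offset adjust is the sub-page part of table_off, the length is entries times entry size plus the adjust, and the offset is the BAR base plus table_off minus the adjust. The thread does not show the actual call, so the formula and the function name are assumptions.

```python
def msix_mmap_args(table_base: int, table_off: int,
                   total_entries: int, entry_size: int,
                   page_mask: int = 0xFFF):
    """Return (length, offset) for mapping an MSI-X table via /dev/mem,
    under the assumed formula described above."""
    adjust = table_off & page_mask          # sub-page part of the table offset
    length = total_entries * entry_size + adjust
    offset = table_base + table_off - adjust  # page-aligned file offset
    return length, offset

if __name__ == "__main__":
    # Values copied from the xen_pt_msix_init debug line in the log.
    length, offset = msix_mmap_args(0xFE1FE000, 0x1000, 8, 0x10)
    print(hex(length), hex(offset))
```

Under these assumptions the request is a 0x80-byte mapping at physical offset 0xfe1ff000, which lies inside the 0x2000-byte IO region registered at 0xfe1fe000 earlier in the log.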
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 00:02, Craig Bergstrom wrote: > Thanks for the notification, my apologies for the breakage. I'll take a > close look and see if I can figure out what went wrong. > > Sander, any chance you can send /proc/iomem and the inputs to the mmap call > that fail on your affected system? Hi Craig, The output from /proc/iomem is simple to get and attached. The mmap call is probably issued by qemu and will require more digging. I don't know if there is that much time left for 4.14, since we are at RC6 already. -- Sander > > > On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky > wrote: > >> On 10/23/2017 10:44 PM, Fengguang Wu wrote: >>> Greetings, >>> >>> 0day kernel testing robot got the below dmesg and the first bad commit is >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> master >>> >>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381 >>> Author: Craig Bergstrom >>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600 >>> Commit: Ingo Molnar >>> CommitDate: Fri Oct 20 09:48:00 2017 +0200 >>> >>> x86/mm: Limit mmap() of /dev/mem to valid physical addresses >> >> Also note >> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html >> >> -boris >> > -0fff : Reserved 1000-00095fff : System RAM 00096000-000963ff : RAM buffer 00096400-000f : Reserved 000a-000b : PCI Bus :00 000c-000cfdff : Video ROM 000d-000d : PCI Bus :00 000d4800-000d4bff : Adapter ROM 000f-000f : System ROM 0010-7fff : System RAM 0100-01d2a703 : Kernel code 01d2a704-025450ff : Kernel data 02b3f000-02cc1fff : Kernel bss c7f9-c7f9dfff : ACPI Tables c7f9e000-c7fd : ACPI Non-volatile Storage c7fe-c7ff : Reserved c800-dfff : PCI Bus :00 cfe0-cfef : PCI Bus :0c cfef8000-cfefbfff : :0c:00.0 cfef8000-cfefbfff : r8169 cfeff000-cfef : :0c:00.0 cfeff000-cfef : r8169 cff0-cfff : PCI Bus :0d cfff8000-cfffbfff : :0d:00.0 cfff8000-cfffbfff : r8169 c000-cfff : :0d:00.0 c000-cfff : r8169 d000-dfff : PCI Bus :0f d000-dfff : :0f:00.0 d000-d0ff : vesafb e000-efff : PCI MMCONFIG [bus 
00-ff] e000-efff : pnp 00:07 f000-febf : PCI Bus :00 f600-f6003fff : Reserved f600-f6003fff : pnp 00:01 fdcf7000-fdcf7fff : :00:12.0 fdcf7000-fdcf7fff : ohci_hcd fdcf8000-fdcfbfff : :00:14.2 fdcfc000-fdcfcfff : :00:13.0 fdcfc000-fdcfcfff : ohci_hcd fdcfd000-fdcfdfff : :00:14.5 fdcfd000-fdcfdfff : ohci_hcd fdcfe000-fdcfefff : :00:16.0 fdcfe000-fdcfefff : ohci_hcd fdcff000-fdcff3ff : :00:11.0 fdcff000-fdcff3ff : ahci fdcff400-fdcff4ff : :00:12.2 fdcff400-fdcff4ff : ehci_hcd fdcff800-fdcff8ff : :00:13.2 fdcff800-fdcff8ff : ehci_hcd fdcffc00-fdcffcff : :00:16.2 fdcffc00-fdcffcff : ehci_hcd fde0-fdef : PCI Bus :04 fdef8000-fdef8fff : :04:00.0 fdef9000-fdef9fff : :04:00.1 fdefa000-fdefafff : :04:00.2 fdefb000-fdefbfff : :04:00.3 fdefc000-fdefcfff : :04:00.4 fdefd000-fdefdfff : :04:00.5 fdefe000-fdefefff : :04:00.6 fdeff000-fdef : :04:00.7 fdf0-fe1f : PCI Bus :05 fdfe-fdff : :05:00.0 fe00-fe1f : PCI Bus :06 fe00-fe0f : PCI Bus :07 fe0e-fe0e : :07:00.0 fe0ff800-fe0f : :07:00.0 fe0ff800-fe0f : ahci fe10-fe1f : PCI Bus :08 fe1fe000-fe1f : :08:00.0 fe20-fe3f : PCI Bus :09 fe20-fe3f : :09:00.0 fe40-fe4f : PCI Bus :0a fe4f8000-fe4f8fff : :0a:00.0 fe4f9000-fe4f9fff : :0a:00.1 fe4fa000-fe4fafff : :0a:00.2 fe4fb000-fe4fbfff : :0a:00.3 fe4fc000-fe4fcfff : :0a:00.4 fe4fd000-fe4fdfff : :0a:00.5 fe4fe000-fe4fefff : :0a:00.6 fe4ff000-fe4f : :0a:00.7 fe50-fe5f : PCI Bus :0b fe5fe000-fe5f : :0b:00.0 fe60-fe6f : PCI Bus :0c fe6e-fe6f : :0c:00.0 fe70-fe7f : PCI Bus :0d fe7e-fe7f : :0d:00.0 fe80-fe8f : PCI Bus :0e fe8fe000-fe8f : :0e:00.0 fe90-fe9f : PCI Bus :0f fe9e-fe9e : :0f:00.0 fe9fc000-fe9f : :0f:00.1 fe9fc000-fe9f : ICH HD audio fec0-fec00fff : Reserved fec0-fec003ff : IOAPIC 0 fec1-fec1001f : pnp 00:06 fec2-fec20fff : Reserved fec2-fec203ff : IOAPIC 1 fed0-fed003ff : HPET 2 fed0-fed003ff : PNP0103:00 fed8-fed80fff : pnp 00:06 fee0-feef : Reserved fee0-fee00fff : Local APIC fee0-fee00fff : pnp 00:05 ffb8-ffbf : pnp 00:06 ffe0- : Reserved fd-ff :
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 10:05, Sander Eikelenboom wrote:
> On 26/10/17 00:02, Craig Bergstrom wrote:
>> Thanks for the notification, my apologies for the breakage. I'll take a
>> close look and see if I can figure out what went wrong.
>>
>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
>> that fail on your affected system?
>
> Hi Craig,
>
> The output from /proc/iomem is simple to get and attached.
> The mmap call is probably issued by qemu and will require more digging.

Ahh, grepping qemu gave a pointer; it's probably the code in:
http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40
around line 571. That would also explain why it's only this device that has the problem, since it's the only one trying to use MSI(-X) interrupts.

Will see if I can add some logging to that function.

--
Sander

> I don't know if there is that much time left for 4.14, since we are at
> RC6 already.
>
> --
> Sander
>
>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:
>>
>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>>> Greetings,
>>>>
>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>>> Author: Craig Bergstrom <cra...@google.com>
>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>>> Commit: Ingo Molnar <mi...@kernel.org>
>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>>
>>>>     x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>>
>>> Also note
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>>
>>> -boris
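For readers following along: the commit title says mmap() of /dev/mem is limited to valid physical addresses. Below is a rough user-space sketch of that kind of range check, not the kernel's actual code. The names `demo_mmap_range_is_valid` and `DEMO_PAGE_SHIFT`, and passing the physical address width as a plain `phys_bits` parameter, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* 4 KiB pages, as on x86 (hypothetical name) */

/*
 * Hypothetical model: return true when the byte range that starts at
 * page frame `pfn` and is `count` bytes long lies entirely below
 * 2^phys_bits. `phys_bits` stands in for the CPU's physical address
 * width; it is a plain parameter here, not a kernel API.
 */
static bool demo_mmap_range_is_valid(uint64_t pfn, uint64_t count,
                                     unsigned int phys_bits)
{
    assert(phys_bits > DEMO_PAGE_SHIFT && phys_bits < 64);

    uint64_t limit = UINT64_C(1) << phys_bits;   /* first invalid address */
    uint64_t max_pfn = limit >> DEMO_PAGE_SHIFT; /* first invalid pfn */

    if (count == 0)
        return true;  /* an empty range touches nothing */
    if (pfn >= max_pfn)
        return false; /* the start is already out of range */

    /* Cannot overflow, because pfn < max_pfn implies start < limit. */
    uint64_t start = pfn << DEMO_PAGE_SHIFT;

    /* Equivalent to start + count <= limit, written without overflow. */
    return count <= limit - start;
}
```

With `phys_bits = 36`, a one-page mapping at pfn `0xfee00` passes, while a mapping that starts at or runs past the 2^36 boundary is rejected. Which limit the real check should compare against is exactly what this thread is trying to pin down, so treat the comparison above as a placeholder.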
4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi
Hi,

I encountered this splat with 4.12-RC2.

--
Sander

[ 119.021594] BUG: scheduling while atomic: irq/47-iwlwifi/517/0x0200
[ 119.021604] Modules linked in: xt_tcpudp ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_security ip6table_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables rfcomm bnep binfmt_misc arc4 iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev intel_rapl cdc_mbim iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 media cdc_wdm btusb coretemp cdc_ncm kvm_intel usbnet mii cdc_acm iwlwifi kvm btintel joydev pcspkr serio_raw cfg80211 snd_hda_codec_hdmi
[ 119.021701] bluetooth lpc_ich snd_hda_codec_realtek snd_hda_codec_generic shpchp sg ecdh_generic snd_hda_intel thinkpad_acpi snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer nvram snd soundcore evdev tpm_tis tpm_tis_core tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse i2c_i801 sd_mod ehci_pci ehci_hcd e1000e rtsx_pci mfd_core ptp xhci_pci pps_core xhci_hcd
[ 119.021759] CPU: 1 PID: 517 Comm: irq/47-iwlwifi Not tainted 4.12.0-rc2-t440s-20170522+ #1
[ 119.021763] Hardware name: LENOVO 20AQS03H00/20AQS03H00, BIOS GJET91WW (2.41 ) 09/21/2016
[ 119.021766] Call Trace:
[ 119.021778] ? dump_stack+0x5c/0x84
[ 119.021784] ? __schedule_bug+0x4c/0x70
[ 119.021792] ? __schedule+0x496/0x5c0
[ 119.021798] ? schedule+0x2d/0x80
[ 119.021804] ? schedule_preempt_disabled+0x5/0x10
[ 119.021810] ? __mutex_lock.isra.0+0x18e/0x4c0
[ 119.021817] ? __wake_up+0x2f/0x50
[ 119.021833] ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021844] ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021859] ? iwl_mvm_rx_lmac_scan_iter_complete_notif+0x17/0x30 [iwlmvm]
[ 119.021869] ? iwl_pcie_rx_handle+0x2a9/0x7e0 [iwlwifi]
[ 119.021878] ? iwl_pcie_irq_handler+0x17c/0x730 [iwlwifi]
[ 119.021884] ? irq_forced_thread_fn+0x60/0x60
[ 119.021887] ? irq_thread_fn+0x16/0x40
[ 119.021892] ? irq_thread+0x109/0x180
[ 119.021896] ? wake_threads_waitq+0x30/0x30
[ 119.021901] ? kthread+0xf2/0x130
[ 119.021905] ? irq_thread_dtor+0x90/0x90
[ 119.021910] ? kthread_create_on_node+0x40/0x40
[ 119.021915] ? ret_from_fork+0x26/0x40
Re: [PATCH] xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier
On 2016-10-03 00:45, Boris Ostrovsky wrote:
> xen_cpuhp_setup() calls mutex_lock() which, when CONFIG_DEBUG_MUTEXES is
> defined, ends up calling xen_save_fl(). That routine expects
> per_cpu(xen_vcpu, 0) to be already initialized.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Reported-by: Sander Eikelenboom <li...@eikelenboom.it>
> ---
> Sander, please see if this fixes the problem. Thanks.

Hi Boris,

I have tested it and it fixes the dom0 crash in early boot for me.
Thanks again for investigating and the swift fix!

--
Sander

>  arch/x86/xen/enlighten.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 366b6ae..96c2dea 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1644,7 +1644,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  	xen_initial_gdt = &per_cpu(gdt_page, 0);
>
>  	xen_smp_init();
> -	WARN_ON(xen_cpuhp_setup());
>
>  #ifdef CONFIG_ACPI_NUMA
>  	/*
> @@ -1658,6 +1657,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  	   possible map and a non-dummy shared_info. */
>  	per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>
> +	WARN_ON(xen_cpuhp_setup());
> +
>  	local_irq_disable();
>  	early_boot_irqs_disabled = true;
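The ordering problem the patch description names can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: every `demo_` name is made up, the struct is a one-field mock rather than the real vcpu_info layout, and an `assert()` stands in for the early-boot crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the shared vcpu_info and per_cpu(xen_vcpu, 0). */
struct demo_vcpu_info {
    unsigned char evtchn_upcall_mask; /* non-zero means events are masked */
};

static struct demo_vcpu_info demo_shared_vcpu_info[1];
static struct demo_vcpu_info *demo_xen_vcpu0; /* NULL until initialized */

/* Models a flags-save routine that dereferences the per-CPU pointer. */
static int demo_save_fl(void)
{
    assert(demo_xen_vcpu0 != NULL); /* the real code would fault here */
    return demo_xen_vcpu0->evtchn_upcall_mask ? 0 : 1; /* 1 = unmasked */
}

/* Models a setup step whose locking path ends up in demo_save_fl(). */
static int demo_cpuhp_setup(void)
{
    (void)demo_save_fl();
    return 0;
}

/* Ordering as in the patch: initialize the pointer first, then run setup. */
static int demo_start_kernel(void)
{
    demo_xen_vcpu0 = &demo_shared_vcpu_info[0];
    return demo_cpuhp_setup();
}
```

Swapping the two statements in `demo_start_kernel()` trips the assertion, which mirrors why the patch moves the `WARN_ON(xen_cpuhp_setup())` call to after the `per_cpu(xen_vcpu, 0)` assignment.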
Re: [Intel-gfx] Linux 4.8-rc?: WARNING: at drivers/gpu/drm/i915/intel_pm.c:7866 sandybridge_pcode_write Missing switch case (16) in gen6_check_mailbox_status
On 2016-09-07 16:49, Jani Nikula wrote:
> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>> On 2016-09-06 11:25, Jani Nikula wrote:
>>> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>>>> L.S.,
>>>>
>>>> Since one of the last 4.8 RC's i'm getting the warning below when
>>>> booting on my sandybridge based thinkpad. From what it seems the
>>>> machine still works fine though.
>>>
>>> What does 'lspci -nns 2' say for you?
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd
>> Generation Core Processor Family Integrated Graphics Controller
>> [8086:0126] (rev 09)
>
> Fixed in drm-intel-fixes by
>
> commit fc2780b66b15092ac68272644a522c1624c48547
> Author: Chris Wilson
> Date: Fri Aug 26 11:59:26 2016 +0100
>
>     drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to SNB
>
> BR,
> Jani.

Works-for-me, thx!

--
Sander
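The warning in the subject ("Missing switch case (16)") is what a status decoder prints when a value falls through to its default branch, and the fix named above adds the missing case. A minimal sketch of that pattern follows. It is not the i915 code: all `demo_`/`DEMO_` names are hypothetical, the errno mappings are assumptions, and only the value 16 is taken from the warning text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Status values are illustrative; only 16 comes from the warning text. */
enum demo_pcode_status {
    DEMO_PCODE_SUCCESS = 0,
    DEMO_PCODE_ILLEGAL_CMD = 1,
    DEMO_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE = 16,
};

static int demo_missing_cases; /* counts statuses that hit the default branch */

/* Maps a mailbox status to 0 or a negative errno; warns on unknown values. */
static int demo_check_mailbox_status(unsigned int status)
{
    switch (status) {
    case DEMO_PCODE_SUCCESS:
        return 0;
    case DEMO_PCODE_ILLEGAL_CMD:
        return -ENXIO;     /* errno choice is an assumption */
    case DEMO_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
        return -EOVERFLOW; /* errno choice is an assumption */
    default:
        demo_missing_cases++;
        fprintf(stderr, "Missing switch case (%u)\n", status);
        return 0;
    }
}
```

Without the third `case`, a status of 16 would land in `default` and produce the warning on every call, while the caller still sees success. That would be consistent with the report that the machine keeps working despite the warning.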
Re: [Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999
Friday, August 12, 2016, 7:29:37 PM, you wrote:

> Hi,
>
> On 12/08/2016 at 19:23:36 +0200, Sander Eikelenboom wrote :
>> L.S.,
>>
>> I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV
>> guests and dom0 are unaffected). The clock is always set to 31/12/1999 on boot
>> of the guest, instead of the system clock time.
>>
>> Bisecting seems to point out commit:
>> 463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants of asm/rtc.h
>>
>
> Isn't that solved by http://patchwork.ozlabs.org/patch/657465/ ?

Ah yes, that solves it (I only looked in your git tree to see if there was a patch already). Sorry for the noise!

--
Sander
[Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999
L.S.,

I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV guests and dom0 are unaffected). The clock is always set to 31/12/1999 on boot of the guest, instead of the system clock time.

Bisecting seems to point out commit:
463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants of asm/rtc.h

--
Sander
Re: nf_unregister_net_hook: hook not found!
On 2015-12-30 03:39, ebied...@xmission.com wrote:
> Pablo Neira Ayuso <pa...@netfilter.org> writes:
>
>> On Mon, Dec 28, 2015 at 09:05:03PM +0100, Sander Eikelenboom wrote:
>>> Hi,
>>>
>>> Running a 4.4.0-rc6 kernel i encountered the warning below.
>>
>> Cc'ing Eric Biederman.
>>
>> @Sander, could you provide a way to reproduce this?
>
> I am on vacation until the new year, but if this is reproducible
> we should be able to print out reg, reg->pf, reg->hooknum, reg->hook
> to figure out which hook is having something very weird happen to it.
>
> This is happening in some network namespace exit.
>
> Eric

Unfortunately I have found no way to reproduce it. 13 seconds implies it was at boot, but I have only seen this once.

--
Sander

>> Thanks.
>>
>>> [ 13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
>>> [ 13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
>>> [ 13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
>>> [ 13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
>>> [ 14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
>>> [ 14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
>>> [ 14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
>>> [ 14.328141] systemd-logind[2485]: Failed to start user service: Unknown unit: user@117.service
>>> [ 14.356634] systemd-logind[2485]: New session c1 of user lightdm.
>>> [ 14.357320] [ cut here ]
>>> [ 14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50()
>>> [ 14.357328] nf_unregister_net_hook: hook not found!
>>> [ 14.357371] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel
>>> [ 14.357380] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core
>>> [ 14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U 4.4.0-rc6-x220-20151224+ #1
>>> [ 14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013
>>> [ 14.357390] Workqueue: netns cleanup_net
>>> [ 14.357393] 81a27dfd 81359c69 88030e7cbd40 81060297
>>> [ 14.357395] 88030e820d80 88030e7cbd90 81c962d8 81c962e0
>>> [ 14.357397] 88030e7cbdf8 81060317 81a2c010 88030018
>>> [ 14.357398] Call Trace:
>>> [ 14.357405] [] ? dump_stack+0x40/0x57
>>> [ 14.357408] [] ? warn_slowpath_common+0x77/0xb0
>>> [ 14.357410] [] ? warn_slowpath_fmt+0x47/0x50
>>> [ 14.357416] [] ? mutex_lock+0x9/0x30
>>> [ 14.357418] [] ? netfilter_net_exit+0x25/0x50
>>> [ 14.357421] [] ? ops_exit_list.isra.6+0x2e/0x60
>>> [ 14.357424] [] ? cleanup_net+0x1ab/0x280
>>> [ 14.357427] [] ? process_one_work+0x133/0x330
>>> [ 14.357429] [] ? worker_thread+0x60/0x470
>>> [ 14.357430] [] ? process_one_work+0x330/0x330
>>> [ 14.357434] [] ? kthread+0xca/0xe0
>>> [ 14.357436] [] ? kthread_create_on_node+0x170/0x170
>>> [ 14.357439] [] ? ret_from_fork+0x3f/0x70
>>> [ 14.357441] [] ? kthread_create_on_node+0x170/0x170
>>> [ 14.357443] ---[ end trace 9984cc4b0e89f818 ]---
>>> [ 14.357443] [ cut here ]
>>> [ 14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50()
>>> [ 14.357446] nf_unregister_net_hook: hook not found!
>>> [ 14.357472] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel
>>> [ 14.357478] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core
>>> [ 14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U W 4.4.0
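Eric's suggestion in this reply is to print reg, reg->pf, reg->hooknum and reg->hook when the hook is not found. As a hedged illustration of what such a diagnostic could format, here is a user-space mock. The struct below only mirrors the field names mentioned in the thread and is not the kernel's struct layout; `demo_describe_hook` and `demo_hook_fn` are made-up names.

```c
#include <assert.h>
#include <stdio.h>

/* Mock of the fields named in the thread; not the kernel's struct layout. */
struct demo_hook_ops {
    int pf;               /* protocol family */
    unsigned int hooknum; /* hook point within that family */
    void (*hook)(void);   /* hook function (signature simplified) */
};

/* Placeholder hook function so a sample entry has a non-NULL pointer. */
static void demo_hook_fn(void) { }

/* Formats the identifying fields of a hook entry into buf. */
static int demo_describe_hook(const struct demo_hook_ops *reg,
                              char *buf, size_t len)
{
    return snprintf(buf, len, "hook not found: pf=%d hooknum=%u hook=%s",
                    reg->pf, reg->hooknum, reg->hook ? "set" : "NULL");
}
```

In the kernel the same fields would presumably go into the existing warning message, so that a one-off occurrence like this one would identify the offending hook without a reproducer.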
nf_unregister_net_hook: hook not found!
Hi,

Running a 4.4.0-rc6 kernel i encountered the warning below.

--
Sander

[ 13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[ 13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[ 13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[ 14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[ 14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[ 14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[ 14.328141] systemd-logind[2485]: Failed to start user service: Unknown unit: user@117.service
[ 14.356634] systemd-logind[2485]: New session c1 of user lightdm.
[ 14.357320] [ cut here ]
[ 14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50()
[ 14.357328] nf_unregister_net_hook: hook not found!
[ 14.357371] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel
[ 14.357380] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core
[ 14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U 4.4.0-rc6-x220-20151224+ #1
[ 14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013
[ 14.357390] Workqueue: netns cleanup_net
[ 14.357393] 81a27dfd 81359c69 88030e7cbd40 81060297
[ 14.357395] 88030e820d80 88030e7cbd90 81c962d8 81c962e0
[ 14.357397] 88030e7cbdf8 81060317 81a2c010 88030018
[ 14.357398] Call Trace:
[ 14.357405] [] ? dump_stack+0x40/0x57
[ 14.357408] [] ? warn_slowpath_common+0x77/0xb0
[ 14.357410] [] ? warn_slowpath_fmt+0x47/0x50
[ 14.357416] [] ? mutex_lock+0x9/0x30
[ 14.357418] [] ? netfilter_net_exit+0x25/0x50
[ 14.357421] [] ? ops_exit_list.isra.6+0x2e/0x60
[ 14.357424] [] ? cleanup_net+0x1ab/0x280
[ 14.357427] [] ? process_one_work+0x133/0x330
[ 14.357429] [] ? worker_thread+0x60/0x470
[ 14.357430] [] ? process_one_work+0x330/0x330
[ 14.357434] [] ? kthread+0xca/0xe0
[ 14.357436] [] ? kthread_create_on_node+0x170/0x170
[ 14.357439] [] ? ret_from_fork+0x3f/0x70
[ 14.357441] [] ? kthread_create_on_node+0x170/0x170
[ 14.357443] ---[ end trace 9984cc4b0e89f818 ]---
[ 14.357443] [ cut here ]
[ 14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50()
[ 14.357446] nf_unregister_net_hook: hook not found!
[ 14.357472] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel
[ 14.357478] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core
[ 14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U W 4.4.0-rc6-x220-20151224+ #1
[ 14.357481] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013
[ 14.357484] Workqueue: netns cleanup_net
[ 14.357486] 81a27dfd 81359c69 88030e7cbd40 81060297
[ 14.357488] 88030e820db8 88030e7cbd90 81c962d8 81c962e0
[ 14.357489] 88030e7cbdf8 81060317 81a2c010 88030018
[ 14.357490] Call Trace:
[ 14.357493] [] ? dump_stack+0x40/0x57
[ 14.357495] [] ? warn_slowpath_common+0x77/0xb0
[ 14.357497] [] ? warn_slowpath_fmt+0x47/0x50
[ 14.357499] [] ?
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu
On 2015-12-14 20:48, Eric Shelton wrote:

Please note that the same issue appears to have been introduced in the recent 4.2.7 kernel. It perhaps has to do with b4ff8389ed14b849354b59ce9b360bdefcdbf99c having a matching commit e8d097151d309eb71f750bbf34e6a7ef6256da7e in linux-stable.git. The below patch to arch/x86/kernel/rtc.c was also effective for 4.2.7.

Eric

Hi Eric,

Yeah, it's unfortunate that the patch fixing up the other patches destined for stable didn't make it in time for stable :(.
Anyhow, the chosen solution wasn't ideal, so there is now a V2 patch by Boris. It hasn't been picked up yet, but hopefully will be soon (for the patch see http://lkml.iu.edu/hypermail/linux/kernel/1512.1/03504.html).

--
Sander

On 2015-12-02 18:30, Sander Eikelenboom wrote:
On 2015-12-02 15:55, David Vrabel wrote:
> On 28/11/15 15:47, Sander Eikelenboom wrote:
>> genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0)
>
> We shouldn't register an rtc_cmos device because its legacy irq
> conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 is
> in use for the pv spinlocks and this gets requested first, preventing
> the rtc device from probing.
>
> Does this patch fix it for you?
>
> David

It does, thanks.

Reported-and-tested-by: Sander Eikelenboom

--
Sander

> 8<
> x86: rtc_cmos platform device requires legacy irqs
>
> Adding the rtc platform device when there are no legacy irqs (no
> legacy PIC) causes a conflict with other devices that end up using the
> same irq number.
>
> In a single VCPU PV guest we should have:
>
> /proc/interrupts:
>            CPU0
>   0:       4934  xen-percpu-virq      timer0
>   1:          0  xen-percpu-ipi       spinlock0
>   2:          0  xen-percpu-ipi       resched0
>   3:          0  xen-percpu-ipi       callfunc0
>   4:          0  xen-percpu-virq      debug0
>   5:          0  xen-percpu-ipi       callfuncsingle0
>   6:          0  xen-percpu-ipi       irqwork0
>   7:        321  xen-dyn-event        xenbus
>   8:         90  xen-dyn-event        hvc_console
>   ...
>
> But hvc_console cannot get its interrupt because it is already in use
> by rtc0 and the console does not work.
>
> genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0)
>
> The rtc_cmos device requires a particular legacy irq so don't add it
> if there are no legacy irqs.
>
> Signed-off-by: David Vrabel
> ---
>  arch/x86/kernel/rtc.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
> index cd96852..07c70f1 100644
> --- a/arch/x86/kernel/rtc.c
> +++ b/arch/x86/kernel/rtc.c
> @@ -14,6 +14,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #ifdef CONFIG_X86_32
>  /*
> @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
>  }
>  #endif
>
> +       /* RTC uses legacy IRQs. */
> +       if (!nr_legacy_irqs())
> +               return -ENODEV;
> +
>         platform_device_register(&rtc_device);
>         dev_info(&rtc_device.dev,
>                  "registered platform RTC device (no PNP device found)\n");

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] x86: Xen PV guests don't have the rtc_cmos platform device
On 2015-12-09 15:42, Jan Beulich wrote:
On 09.12.15 at 15:32, wrote:

--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
 }
 #endif

+       if (paravirt_enabled())
+               return -ENODEV;

What about Xen Dom0?

Jan

Checked that in my testing and that still worked:

[ 16.733837] rtc_cmos 00:02: RTC can wake from S4
[ 16.734030] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[ 16.734087] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram
[ 17.760329] rtc_cmos 00:02: setting system clock to 2015-12-09 08:43:48 UTC (1449650628)

and /dev/rtc and /dev/rtc0 both exist.
But I don't know the nitty-gritty details about why ...

--
Sander
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 15:55, David Vrabel wrote:
On 28/11/15 15:47, Sander Eikelenboom wrote:

genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0)

We shouldn't register an rtc_cmos device because its legacy irq conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 is in use for the pv spinlocks and this gets requested first, preventing the rtc device from probing.

Does this patch fix it for you?

David

It does, thanks.

Reported-and-tested-by: Sander Eikelenboom

--
Sander

8<
x86: rtc_cmos platform device requires legacy irqs

Adding the rtc platform device when there are no legacy irqs (no legacy PIC) causes a conflict with other devices that end up using the same irq number.

In a single VCPU PV guest we should have:

/proc/interrupts:
           CPU0
  0:       4934  xen-percpu-virq      timer0
  1:          0  xen-percpu-ipi       spinlock0
  2:          0  xen-percpu-ipi       resched0
  3:          0  xen-percpu-ipi       callfunc0
  4:          0  xen-percpu-virq      debug0
  5:          0  xen-percpu-ipi       callfuncsingle0
  6:          0  xen-percpu-ipi       irqwork0
  7:        321  xen-dyn-event        xenbus
  8:         90  xen-dyn-event        hvc_console
  ...

But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work.

genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0)

The rtc_cmos device requires a particular legacy irq so don't add it if there are no legacy irqs.

Signed-off-by: David Vrabel
---
 arch/x86/kernel/rtc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index cd96852..07c70f1 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include

 #ifdef CONFIG_X86_32
 /*
@@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
 }
 #endif

+       /* RTC uses legacy IRQs. */
+       if (!nr_legacy_irqs())
+               return -ENODEV;
+
        platform_device_register(&rtc_device);
        dev_info(&rtc_device.dev,
                 "registered platform RTC device (no PNP device found)\n");
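The effect of the guard in the patch above can be sketched in plain user-space C. This is only a model under stated assumptions: `struct legacy_pic_model`, `model_nr_legacy_irqs()` and `model_add_rtc_cmos()` are hypothetical names made up for illustration, and the real `add_rtc_cmos()` does more than this (PNP probing, actual platform device registration).

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's legacy PIC descriptor. */
struct legacy_pic_model {
    int nr_legacy_irqs; /* 16 with a legacy PIC, 0 with the NULL legacy PIC */
};

/* Same shape as nr_legacy_irqs() quoted elsewhere in this thread. */
static int model_nr_legacy_irqs(const struct legacy_pic_model *pic)
{
    return pic->nr_legacy_irqs;
}

/*
 * Simplified model of the patched add_rtc_cmos(): bail out with -ENODEV
 * when there are no legacy irqs, otherwise "register" the device.
 * *registered is set to 1 only when registration would happen.
 */
static int model_add_rtc_cmos(const struct legacy_pic_model *pic,
                              int *registered)
{
    *registered = 0;

    /* RTC uses legacy IRQs. */
    if (!model_nr_legacy_irqs(pic))
        return -ENODEV;

    *registered = 1;
    return 0;
}
```

With a model PIC reporting 0 legacy irqs (the PV guest case described in the patch), the function returns `-ENODEV` and nothing is registered, so irq 8 stays free for hvc_console; with 16 legacy irqs the device is registered as before.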
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:41, Boris Ostrovsky wrote:
On 12/01/2015 06:30 PM, Sander Eikelenboom wrote:
On 2015-12-02 00:19, Boris Ostrovsky wrote:
On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:
On 2015-12-01 23:47, Boris Ostrovsky wrote:
On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:
On 2015-11-30 23:54, Boris Ostrovsky wrote:
On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:
On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all; it seems a kworker thread is stuck:

root 569 98.0 0.0 0 0 ? R 16:02 12:47 [kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu. Bisecting would probably be quite painful, since there were some breakages this merge window with respect to Xen pv-guests.

There are some differences in the diffs from booting a 4.3, 4.4-single, 4.4-multi cpu boot:

Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3.

Hi Konrad, I will test those, see if it fixes all my issues and report back.

They shouldn't help you ;-( (and I just saw a message from you confirming this)

The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup.

One of these patches also fixes a bug I was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel I was using (4.3 or 4.4)), but didn't report yet.

Fingers crossed, but I think this pv-guest single vcpu issue is the last I'm troubled by for now ;)

I could not reproduce this, including with your kernel config file.

Hmm, that's unpleasant :-\

Hmm, another strange thing is that it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones.
(All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu do I seem to get the stuck kworker thread that got my attention; with 2 vcpus that doesn't seem to happen, but you still get the dmesg output and warnings about hvc)

Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
        return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?

It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in the "NR_IRQS:4352 nr_irqs:48 0" line.

I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference.

-boris

That was already underway compiling :)
And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no:

genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0)
hvc_open: request_irq failed with rc -16.

Let me try it again tomorrow. Can you post your guest config file, Xen version and host HW (Intel or AMD)? 'xl info' maybe?

-boris

Hi Boris,

A fresh new day .. a fresh new thought.

If I look at the /proc/interrupts from a broken and a kernel with both commits, the thing that catches the eye is irq8, just as the dmesg message was telling.

In my PV guest rtc0 now seems to try and take irq8 that was already assigned to HVC ?

Sounds like some assumptions around the legacy range are broken somewhere. What is the benefit of not just reserving the legacy range ?

Attached the /proc/interrupts from both boots.

--
Sander

What I did get was a conflict reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c: arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on arm.

--
Sander

--
Sander

-boris

___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel

           CPU0
 16:     315536  xen-percpu-virq      timer0
 17:          0  xen-percpu-ipi       spinlock0
 18:          0  xen-percpu-ipi       resched0
 19:          0  xen-percpu-ipi       callfunc0
 20:          0  xen-percpu-virq      debug0
 21:          0  xen-percpu-ipi       callfuncsingle0
 22:          0  xen-percpu-ipi       irqwork0
 23:        346  xen-dyn-event        xenbus
 24:        134  xen-dyn-event        hvc_console
 25:      11464  xen-dyn-event        blkif
 26:      28710  xen-dyn-event        eth0-q0-tx
 27:      40136  xen-dyn-event        eth0-q0-rx
NMI:          0  Non-maskable interrupts
LOC:          0  Local timer interrupts
SPU:          0  Spurious interrupts
PMI:          0  Performance monitoring interrupts
IWI:          0  IRQ work interrupts
RTR:          0  APIC ICR read retries
RES:          0  Resche
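The two /proc/interrupts listings in this thread (numbering from 16 here, numbering from 0 in David Vrabel's patch description) are consistent with a very simple picture, sketched below. It assumes that Xen event irqs are handed out sequentially right after the legacy range; that is a simplification for illustration, not the kernel's actual irq descriptor allocator, and `nth_dynamic_irq()`, `collides_with_rtc()` and `RTC_LEGACY_IRQ` are hypothetical names.

```c
/* The legacy irq line that rtc0 is reported to claim in this thread. */
#define RTC_LEGACY_IRQ 8

/*
 * Model: the n-th dynamically allocated irq (n counted from 0) lands
 * directly after the reserved legacy range of nr_legacy entries.
 */
static int nth_dynamic_irq(int nr_legacy, int n)
{
    return nr_legacy + n;
}

/* Returns 1 if the n-th dynamic irq ends up on the RTC's legacy line. */
static int collides_with_rtc(int nr_legacy, int n)
{
    return nth_dynamic_irq(nr_legacy, n) == RTC_LEGACY_IRQ;
}
```

In both listings hvc_console is the ninth entry (n = 8). With 16 legacy irqs reserved the model gives irq 24, matching the listing above; with 0 reserved it gives irq 8, which is exactly the line rtc0 asks for.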
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:41, Boris Ostrovsky wrote: On 12/01/2015 06:30 PM, Sander Eikelenboom wrote: On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in NR_IRQS:4352 nr_irqs:48 0 line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already underway compiling :) And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. Let me try it again tomorrow. Can you post your guest config file, Xen version and host HW (Intel or AMD)? 'xl info' maybe? -boris Guest config file == dom0 config file == the one i send you earlier. Host is an AMD Phenom X6. 
# xl info
host                   : serveerstertje
release                : 4.4.0-rc3-20151201-linus-doflr-boris+
version                : #1 SMP Tue Dec 1 19:02:58 CET 2015
machine                : x86_64
nr_cpus                : 6
max_cpu_id             : 5
nr_nodes               : 1
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 3200
hw_caps                : 178bf3ff:efd3fbff::00011300:00802001::37ff:
virt_caps              : hvm hvm_directio
total_memory           : 20479
free_memory            : 7745
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : -unstable
xen_version            : 4.7-unstable
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0x8000
xen_changeset          : Thu Nov 26 20:58:13 2015 +0100 git:5252636-dirty
xen_commandline        : dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps=datems vga=gfx-1280x1024x32 cpuidle cpufreq=xen com1=38400,8n1 console=vga,com1 ivrs_ioapic[6]=00:14.0 iommu=on,verbose,debug,amd-iommu-debug conring_size=128k ucode=-1
cc_compiler            : gcc-4.9.real (Debian 4.9.2-10) 4.9.2
cc_compile_by          : root
cc_compile_domain      : dyndns.org
cc_compile_date        : Thu Nov 26 21:18:41 CET 2015
xend_config_format     : 4

If you need and can get mor
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in NR_IRQS:4352 nr_irqs:48 0 line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already underway compiling :) And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. What i did get was an conflict reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c: arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on arm. -- Sander -- Sander -boris ___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
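The "hvc_open: request_irq failed with rc -16" line reported above is -EBUSY on Linux, which is what the kernel returns when a second requester's flags do not allow sharing an already claimed line. A minimal user-space model of that rule is sketched here; `model_request_irq()`, `MODEL_IRQF_SHARED` and the slot table are hypothetical, and the real check in the genirq code looks at more than a single sharing flag.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_IRQF_SHARED 0x1u /* hypothetical flag bit for this sketch */
#define MODEL_NR_IRQS     32

struct model_irq_slot {
    const char *owner;  /* NULL while the line is free */
    unsigned int flags; /* flags of the first requester */
};

static struct model_irq_slot model_irqs[MODEL_NR_IRQS];

/*
 * Claim an irq line. A second requester only succeeds when both it and
 * the current owner asked for sharing; otherwise the request fails with
 * -EBUSY, mirroring the "Flags mismatch" case seen in the dmesg output.
 */
static int model_request_irq(unsigned int irq, unsigned int flags,
                             const char *name)
{
    struct model_irq_slot *slot;

    if (irq >= MODEL_NR_IRQS || name == NULL)
        return -EINVAL;

    slot = &model_irqs[irq];
    if (slot->owner != NULL) {
        if (!(slot->flags & flags & MODEL_IRQF_SHARED))
            return -EBUSY;
        return 0;
    }

    slot->owner = name;
    slot->flags = flags;
    return 0;
}
```

In this model, once rtc0 has taken irq 8 without the sharing flag, a later request for irq 8 by hvc_console fails with `-EBUSY`, while the same request on a free line (such as irq 24 in the working boot) succeeds.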
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? -- Sander -boris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. Thanks :) -- Sander Between 4.3 and 4.4-single: -NR_IRQS:4352 nr_irqs:32 16 +Using NULL legacy PIC +NR_IRQS:4352 nr_irqs:32 0 This is fine, as long as you have b4ff8389ed14b849354b59ce9b360bdefcdbf99c. -cpu 0 spinlock event irq 17 +cpu 0 spinlock event irq 1 This is strange. I wouldn't expect spinlocks to use legacy irqs. Could it be .. that with your fixup: xen/events: Always allocate legacy interrupts on PV guests (b4ff8389ed14b849354b59ce9b360bdefcdbf99c) for commit: x86/irq: Probe for PIC presence before allocating descs for legacy IRQs (8c058b0b9c34d8c8d7912880956543769323e2d8) that we now have the situation described in the commit message of 8c058b0b9c, but now for Xen PV instead of Hyper-V ? 
(seems both Xen and Hyper-V want to achieve the same but have different competing implementations ?) (BTW 8c058b0b9c has a CC for stable ... so could be destined to cause more trouble). -- Sander and later on: -hctosys: unable to open rtc device (rtc0) +rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) +hvc_open: request_irq failed with rc -16. +Warning: unable to open an initial console. between 4.4-single and 4.4-multi: Using NULL legacy PIC -NR_IRQS:4352 nr_irqs:32 0 +NR_IRQS:4352 nr_irqs:48 0 This is probably OK too since nr_irqs depend on number of CPUs. I think something is messed up with IRQ. I saw last week something from setup_irq() generating a stack dump (warninig) for rtc_cmos but it appeared harmless at that time and now I don't see it anymore. -boris and later on: -rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +hctosys: unable to open rtc device (rtc0) -genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) -hvc_open: request_irq failed with rc -16. -Warning: unable to open an initial console. attached: - dmesg with 4.3 kernel with 1 vcpu - dmesg with 4.4 kernel with 1 vpcu - dmesg with 4.4 kernel with 2 vpcus - .config of the 4.4 kernel is attached. -- Sander -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
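Note on the "genirq: Flags mismatch irq 8" / "request_irq failed with rc -16" lines quoted above: -16 is -EBUSY, which is what a second request for an already-claimed IRQ gets when the two requesters do not agree on sharing. The following is a minimal userspace sketch of that rule, not the kernel's code. The `model_*` names and `M_IRQF_*` constants are made up for illustration, with values assumed to mirror the kernel's IRQF_* bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag bits modeled on the kernel's IRQF_* values (assumed, for illustration). */
#define M_IRQF_TRIGGER_MASK 0x0000000fUL
#define M_IRQF_SHARED       0x00000080UL
#define M_IRQF_ONESHOT      0x00002000UL

/* Hypothetical stand-in for one IRQ line and its first requester. */
struct model_irq_desc {
    const char *owner;      /* NULL while the line is unclaimed */
    unsigned long flags;    /* flags of the first requester */
};

/*
 * Claim an IRQ line. A second requester is only accepted when both sides
 * set the shared bit and agree on trigger type and one-shot behaviour;
 * otherwise the request fails with -EBUSY (the "rc -16" in the log).
 */
static int model_request_irq(struct model_irq_desc *desc, const char *name,
                             unsigned long flags)
{
    assert(desc != NULL);

    if (desc->owner == NULL) {
        desc->owner = name;
        desc->flags = flags;
        return 0;
    }
    if (!((desc->flags & flags) & M_IRQF_SHARED) ||
        ((desc->flags ^ flags) & M_IRQF_TRIGGER_MASK) ||
        ((desc->flags ^ flags) & M_IRQF_ONESHOT))
        return -EBUSY;
    return 0;
}
```

In this model, claiming a line first as "rtc0" and then as "hvc_console" without the shared bit fails with -EBUSY, which matches the order of names in the log message.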
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in NR_IRQS:4352 nr_irqs:48 0 line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already underway compiling :) And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. What i did get was an conflict reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c: arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on arm. -- Sander -- Sander -boris ___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
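Note on the "16 before, 0 after" remark above: one way to picture why "spinlock event irq 17" becomes "irq 1" is a lowest-free-number allocator in which the legacy range is either reserved up front or not. This is only a toy model under that assumption, not the kernel's or Xen's allocator. All `model_*` names are hypothetical.

```c
#include <assert.h>

#define MODEL_MAX_IRQS 64

/* Hypothetical IRQ number space: used[i] != 0 means number i is taken. */
struct model_irq_space {
    unsigned char used[MODEL_MAX_IRQS];
};

/* Reserve numbers 0..nr_legacy-1 up front, as a boot-time legacy reservation would. */
static void model_init(struct model_irq_space *s, int nr_legacy)
{
    assert(nr_legacy >= 0 && nr_legacy <= MODEL_MAX_IRQS);
    for (int i = 0; i < MODEL_MAX_IRQS; i++)
        s->used[i] = (unsigned char)(i < nr_legacy);
}

/* Hand out the lowest free IRQ number, or -1 when none is left. */
static int model_alloc_irq(struct model_irq_space *s)
{
    for (int i = 0; i < MODEL_MAX_IRQS; i++) {
        if (!s->used[i]) {
            s->used[i] = 1;
            return i;
        }
    }
    return -1;
}
```

With 16 reserved numbers the first two dynamic allocations land on 16 and 17. With 0 reserved they land on 0 and 1, and a later dynamic allocation can end up on 8, the number rtc_cmos traditionally expects for itself.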
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:41, Boris Ostrovsky wrote: On 12/01/2015 06:30 PM, Sander Eikelenboom wrote: On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in NR_IRQS:4352 nr_irqs:48 0 line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already underway compiling :) And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. Let me try it again tomorrow. Can you post your guest config file, Xen version and host HW (Intel or AMD)? 'xl info' maybe? -boris Guest config file == dom0 config file == the one i send you earlier. Host is an AMD Phenom X6. 
# xl info
host                   : serveerstertje
release                : 4.4.0-rc3-20151201-linus-doflr-boris+
version                : #1 SMP Tue Dec 1 19:02:58 CET 2015
machine                : x86_64
nr_cpus                : 6
max_cpu_id             : 5
nr_nodes               : 1
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 3200
hw_caps                : 178bf3ff:efd3fbff::00011300:00802001::37ff:
virt_caps              : hvm hvm_directio
total_memory           : 20479
free_memory            : 7745
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : -unstable
xen_version            : 4.7-unstable
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0x8000
xen_changeset          : Thu Nov 26 20:58:13 2015 +0100 git:5252636-dirty
xen_commandline        : dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps=datems vga=gfx-1280x1024x32 cpuidle cpufreq=xen com1=38400,8n1 console=vga,com1 ivrs_ioapic[6]=00:14.0 iommu=on,verbose,debug,amd-iommu-debug conring_size=128k ucode=-1
cc_compiler            : gcc-4.9.real (Debian 4.9.2-10) 4.9.2
cc_compile_by          : root
cc_compile_domain      : dyndns.org
cc_compile_date        : Thu Nov 26 21:18:41 CET 2015
xend_config_format     : 4

If you need and can get mor
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) -- Sander Thanks :) -- Sander Between 4.3 and 4.4-single: -NR_IRQS:4352 nr_irqs:32 16 +Using NULL legacy PIC +NR_IRQS:4352 nr_irqs:32 0 This is fine, as long as you have b4ff8389ed14b849354b59ce9b360bdefcdbf99c. -cpu 0 spinlock event irq 17 +cpu 0 spinlock event irq 1 This is strange. I wouldn't expect spinlocks to use legacy irqs. 
and later on: -hctosys: unable to open rtc device (rtc0) +rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) +hvc_open: request_irq failed with rc -16. +Warning: unable to open an initial console. between 4.4-single and 4.4-multi: Using NULL legacy PIC -NR_IRQS:4352 nr_irqs:32 0 +NR_IRQS:4352 nr_irqs:48 0 This is probably OK too since nr_irqs depend on number of CPUs. I think something is messed up with IRQ. I saw last week something from setup_irq() generating a stack dump (warninig) for rtc_cmos but it appeared harmless at that time and now I don't see it anymore. -boris and later on: -rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +hctosys: unable to open rtc device (rtc0) -genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) -hvc_open: request_irq failed with rc -16. -Warning: unable to open an initial console. attached: - dmesg with 4.3 kernel with 1 vcpu - dmesg with 4.4 kernel with 1 vpcu - dmesg with 4.4 kernel with 2 vpcus - .config of the 4.4 kernel is attached. -- Sander -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-4.4-mw] Regression: cx25821: Oops: no 32bit PCI DMA
On 2015-11-15 13:56, Christoph Hellwig wrote:
> Hi Saner,
>
> this is my fault. Please see the patch which I already sent out to
> Andrew and lkml.

Hi Christoph,

Thanks for the pointer, just tested and it works fine again.

--
Sander
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
Thursday, November 5, 2015, 2:53:40 PM, you wrote:

> On 11/05/2015 04:13 AM, Sander Eikelenboom wrote:
>>
>> It makes "cat /sys/kernel/debug/kernel_page_tables" work and
>> prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot.

> Great. Our nightly runs also failed spectacularly due to this bug.

>>
>> It now does give a warning about an insecure W+X mapping, so
>> CONFIG_DEBUG_WX=y seems to be working. No idea how to interpret it
>> though (and if it's a legit warning).
>>
>> --
>> Sander
>>
>> [ 19.034706] Freeing unused kernel memory: 1104K (822fc000 - 8241)
>> [ 19.041339] Write protecting the kernel read-only data: 18432k
>> [ 19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
>> [ 19.060285] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
>> [ 19.067079] [ cut here ]
>> [ 19.073931] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0()

> Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476

> -boris

Ah thx for the pointer :)

--
Sander
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-05 00:13, Boris Ostrovsky wrote: On 11/04/2015 03:02 PM, Sander Eikelenboom wrote: On 2015-11-04 19:47, Stephen Smalley wrote: On 11/04/2015 01:28 PM, Sander Eikelenboom wrote: On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? 
free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory Needs CONFIG_X86_PTDUMP=y. Also assumes you have debugfs mounted there. 
Recompiled, and the result is that it also blows up:

Can you try this:

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 1bf417e..b534216 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
                                        bool checkwx)
 {
 #ifdef CONFIG_X86_64
+/* 8000 - 87ff is reserved for hypervisor */
+#define is_hypervisor_range(idx) (paravirt_enabled() && \
+        ((idx >= pgd_index(__PAGE_OFFSET) - 16) && \
+        (idx < pgd_index(__PAGE_OFFSET
         pgd_t *start = (pgd_t *) _level4_pgt;
 #else
+#define is_hypervisor_range(idx) 0
         pgd_t *start = swapper_pg_dir;
 #endif
         pgprotval_t prot;
@@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
         for (i = 0; i < PTRS_PER_PGD; i++) {
                 st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
-                if (!pgd_none(*start)) {
+                if (!pgd_none(*start) && !is_hypervisor_range(i)) {
                         if (pgd_large(*start) || !pgd_present(*start)) {
                                 prot = pgd_flags(*start);
                                 note_page(m, , __pgprot(prot), 1);

Hi
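Note on the `is_hypervisor_range()` check in the patch above: the index arithmetic can be reproduced in userspace. This is a sketch under stated assumptions, not the kernel macro. It assumes x86_64 with 4-level page tables (PGD shift of 39, 512 entries), a direct-map base of 0xffff880000000000, and a hypervisor-reserved range of ffff800000000000 - ffff87ffffffffff. The `*_model` names and `PAGE_OFFSET_ASSUMED` are made up for illustration, and the `paravirt_enabled()` condition is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants for x86_64 with 4-level page tables (not taken from the thread). */
#define PGDIR_SHIFT_MODEL    39
#define PTRS_PER_PGD_MODEL   512u
#define PAGE_OFFSET_ASSUMED  0xffff880000000000ULL

/* Index of the top-level (PGD) slot that covers a virtual address. */
static unsigned int pgd_index_model(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT_MODEL) & (PTRS_PER_PGD_MODEL - 1));
}

/* True for the 16 PGD slots directly below the direct map. */
static bool is_hypervisor_range_model(unsigned int idx)
{
    unsigned int direct_map = pgd_index_model(PAGE_OFFSET_ASSUMED);

    assert(idx < PTRS_PER_PGD_MODEL);
    return idx >= direct_map - 16 && idx < direct_map;
}
```

Under these assumptions the direct map starts at PGD slot 272 and the skipped slots are 256 through 271. Each slot covers 512 GiB, so the 16 slots together span 8 TiB.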
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
Thursday, November 5, 2015, 2:53:40 PM, you wrote:

> On 11/05/2015 04:13 AM, Sander Eikelenboom wrote:
>>
>> It makes "cat /sys/kernel/debug/kernel_page_tables" work and
>> prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot.

> Great. Our nightly runs also failed spectacularly due to this bug.

>>
>> It now does give a warning about an insecure W+X mapping, so
>> CONFIG_DEBUG_WX=y
>> seems to be working. No idea how to interpret it though (and if it's a
>> legit
>> warning).
>>
>> --
>> Sander
>>
>> [ 19.034706] Freeing unused kernel memory: 1104K (822fc000 - 8241)
>> [ 19.041339] Write protecting the kernel read-only data: 18432k
>> [ 19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
>> [ 19.060285] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
>> [ 19.067079] ------------[ cut here ]------------
>> [ 19.073931] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0()

> Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476

> -boris

Ah thx for the pointer :)

--
Sander

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-04 19:47, Stephen Smalley wrote: On 11/04/2015 01:28 PM, Sander Eikelenboom wrote: On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? 
free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory Needs CONFIG_X86_PTDUMP=y. Also assumes you have debugfs mounted there. 
Recompiled, and the result is that it also blows up:

[ 902.389247] BUG: unable to handle kernel paging request at 88055c883000
[ 902.402749] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440
[ 902.416261] PGD 2212067 PUD 0
[ 902.427768] Oops: [#1] SMP
[ 902.438137] Modules linked in:
[ 902.448299] CPU: 2 PID: 21951 Comm: cat Not tainted 4.3.0-mw-20151104-linus-doflr-nodebugwx-withptdump+ #1
[ 902.458581] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 902.468850] task: 88004b49e300 ti: 88005928c000 task.ti: 88005928c000
[ 902.479133] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440
[ 902.489536] RSP: e02b:88005928fd20 EFLAGS: 00010296
[ 902.499692] RAX: 88055c883000 RBX: RCX: 8800
[ 902.509755] RDX: 0067 RSI: 88005928fd70 RDI: 88001000
[ 902.519680] RBP: 88005928fdd8 R08: 1000 R09:
[ 902.529555] R10: R11: 0246 R12: 88005928ff20
[ 902.539349] R13: cfff R14: 88005928fd70 R15: 880033c773c0
[ 902.549081] FS: 7f56b07d4700() GS:88005f68() knlGS:
[ 902.558690] CS: e033 DS: ES: CR0: 8005003b
[ 902.568111] CR2: 88055c883000 CR3: 4563f000 CR4: 0660
[ 902.577508] Stac
Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? 
rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory -- Sander # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 4.3.0-mw-20151104-linus-doflr Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y 
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32
Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-04 19:06, Ingo Molnar wrote:

* Stephen Smalley wrote:

On 11/04/2015 06:55 AM, Sander Eikelenboom wrote:
>Hi All,
>
>I just tried to boot with the current linus mergewindow tree under Xen.
>It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX"
>option enabled.
>Disabling it makes the kernel boot fine.
>
>The splat:
>[ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241)
>[ 18.430314] Write protecting the kernel read-only data: 18432k
>[ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
>[ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
>[ 18.453947] BUG: unable to handle kernel paging request at 88055c883000
>[ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440
>[ 18.465847] PGD 2212067 PUD 0
>[ 18.471564] Oops: [#1] SMP
>[ 18.477248] Modules linked in:
>[ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1
>[ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>[ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000
>[ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440

It would be nice to see which line of code this corresponds to. Doing this:

   gdb vmlinux
   list *0x8105af8e

should normally do the trick.

Thanks,

	Ingo

Hi Ingo,

(gdb) list *0x8105af8e
0x8105af8e is in ptdump_walk_pgd_level_core (arch/x86/mm/dump_pagetables.c:181).
warning: Source file is more recent than executable.
176	 * On 64 bits, sign-extend the 48 bit address to 64 bit
177	 */
178	static unsigned long normalize_addr(unsigned long u)
179	{
180	#ifdef CONFIG_X86_64
181		return (signed long)(u << 16) >> 16;
182	#else
183		return u;
184	#endif
185	}

--
Sander
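For readers unfamiliar with the helper that gdb lists here, the sign extension can be tried outside the kernel. This is a minimal sketch under stated assumptions: 4-level paging, so each top-level entry spans 2^39 bytes (the PGD_LEVEL_MULT value below is assumed, not quoted from the thread), and pgd_start_address() is a hypothetical wrapper mirroring the `normalize_addr(i * PGD_LEVEL_MULT)` expression from the walk loop.

```c
#include <stdint.h>

/* Assumption: 4-level paging, one top-level (PGD) entry covers 2^39 bytes. */
#define PGD_LEVEL_MULT (1ULL << 39)

/*
 * Sign-extend a 48-bit virtual address to 64 bits.
 * Relies on >> of a negative signed value being an arithmetic shift,
 * which GCC and Clang provide (it is implementation-defined in ISO C).
 */
static uint64_t normalize_addr(uint64_t u)
{
    return (uint64_t)((int64_t)(u << 16) >> 16);
}

/* Hypothetical helper: first address covered by top-level index i. */
static uint64_t pgd_start_address(unsigned int i)
{
    return normalize_addr((uint64_t)i * PGD_LEVEL_MULT);
}
```

With these values, index 255 stays in the lower half (0x00007f8000000000), while index 256 becomes 0xffff800000000000 and index 272 becomes 0xffff880000000000, so the upper 16 bits always mirror bit 47.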
Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
Hi All,

I just tried to boot with the current linus mergewindow tree under Xen.
It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled.
Disabling it makes the kernel boot fine.

The splat:
[ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241)
[ 18.430314] Write protecting the kernel read-only data: 18432k
[ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
[ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
[ 18.453947] BUG: unable to handle kernel paging request at 88055c883000
[ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440
[ 18.465847] PGD 2212067 PUD 0
[ 18.471564] Oops: [#1] SMP
[ 18.477248] Modules linked in:
[ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1
[ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000
[ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440
[ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296
[ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800
[ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000
[ 18.526129] RBP: 880059b9bf00 R08: R09:
[ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000
[ 18.538891] R13: cfff R14: 880059b9be98 R15:
[ 18.545247] FS: () GS:88005f68() knlGS:
[ 18.551708] CS: e033 DS: ES: CR0: 8005003b
[ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660
[ 18.564686] Stack:
[ 18.571106] 000159b9be50 82211000 88055c884000 0800
[ 18.577704] 8000 88055c883000 0007 88005fd0e790
[ 18.584291] 880059b9bed8 81156ace 0001
[ 18.590916] Call Trace:
[ 18.597458] [] ? free_reserved_area+0x11e/0x120
[ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20
[ 18.611014] [] mark_rodata_ro+0xe9/0xf0
[ 18.617819] [] ? rest_init+0x80/0x80
[ 18.624512] [] kernel_init+0x18/0xe0
[ 18.631095] [] ret_from_fork+0x3f/0x70
[ 18.637650] [] ? rest_init+0x80/0x80
[ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89
[ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440
[ 18.665211] RSP
[ 18.672073] CR2: 88055c883000
[ 18.678852] ---[ end trace d84e34461c40637a ]---
[ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
[ 18.685641]
[ 18.699520] Kernel Offset: disable
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
On 2015-08-17 19:18, Eric Dumazet wrote:
> From: Eric Dumazet
>
> On Mon, 2015-08-17 at 16:25 +0200, Sander Eikelenboom wrote:
>> Monday, August 17, 2015, 4:21:47 PM, you wrote:
>>
>>> On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
>>>> This is very similar to the behavior I am seeing in this bug:
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=102911
>>>
>>> OK, but have you applied the fix ?
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>
>>> It will be part of net iteration from David Miller to Linus Torvalds.
>>
>> I did have that patch in for my last report.
>> But i don't think he had (looking at the second part of his oops).
>
> Then can you try following fix as well ?
>
> Thanks !

Running now :)

> [PATCH] timer: fix a race in __mod_timer()
>
> lock_timer_base() can not catch following :
>
> CPU1 ( in __mod_timer()
> timer->flags |= TIMER_MIGRATING;
> spin_unlock(&base->lock);
> base = new_base;
> spin_lock(&base->lock);
> timer->flags &= ~TIMER_BASEMASK;
>                                   CPU2 (in lock_timer_base())
>                                   see timer base is cpu0 base
>                                   spin_lock_irqsave(&base->lock, *flags);
>                                   if (timer->flags == tf)
>                                        return base; // oops, wrong base
> timer->flags |= base->cpu // too late
>
> We must write timer->flags in one go, otherwise we can fool other cpus.
>
> Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled")
> Signed-off-by: Eric Dumazet
> Cc: Thomas Gleixner
> ---
>  kernel/time/timer.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 5e097fa9faf7..84190f02b521 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -807,8 +807,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  			spin_unlock(&base->lock);
>  			base = new_base;
>  			spin_lock(&base->lock);
> -			timer->flags &= ~TIMER_BASEMASK;
> -			timer->flags |= base->cpu;
> +			WRITE_ONCE(timer->flags,
> +				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);
>  		}
>  	}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
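To make the window described in the patch above concrete, here is a minimal user-space sketch in C. It is not kernel code and it does not model the locking; it only shows which value of timer->flags another CPU could observe. The mask values are modelled on the 4.2-era timer flag layout and should be treated as assumptions, and decode(), two_step_intermediate() and one_go() are hypothetical helper names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout (illustrative): low bits hold the CPU index of the
 * timer base, one bit marks a migration in progress. */
#define TIMER_CPUMASK   0x0007FFFFu
#define TIMER_MIGRATING 0x00080000u
#define TIMER_BASEMASK  (TIMER_CPUMASK | TIMER_MIGRATING)

/* What a lockless reader would conclude from one snapshot of the flags. */
struct base_view {
    int migrating;   /* non-zero: reader should back off and retry */
    uint32_t cpu;    /* index of the base the reader would go and lock */
};

static struct base_view decode(uint32_t flags)
{
    struct base_view v;

    v.migrating = (flags & TIMER_MIGRATING) != 0;
    v.cpu = flags & TIMER_CPUMASK;
    return v;
}

/* Old sequence, first store only: MIGRATING and the CPU bits are cleared
 * together, so the stored value briefly claims "CPU 0, not migrating". */
static uint32_t two_step_intermediate(uint32_t flags)
{
    return flags & ~TIMER_BASEMASK;
}

/* Patched sequence: the new value is computed first and stored once, so no
 * intermediate value is ever visible to other CPUs. */
static uint32_t one_go(uint32_t flags, uint32_t new_cpu)
{
    assert((new_cpu & ~TIMER_CPUMASK) == 0);   /* must fit in the CPU field */
    return (flags & ~TIMER_BASEMASK) | new_cpu;
}
```

For a timer that is marked as migrating away from CPU 2, decode(two_step_intermediate(TIMER_MIGRATING | 2u)) reports "not migrating, CPU 0", which is the stale view CPU2 acts on in the scenario above, while decode(one_go(TIMER_MIGRATING | 2u, 3u)) reports CPU 3 directly.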
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
Monday, August 17, 2015, 4:21:47 PM, you wrote:

> On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
>> This is very similar to the behavior I am seeing in this bug:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=102911

> OK, but have you applied the fix ?

> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af

> It will be part of net iteration from David Miller to Linus Torvalds.

I did have that patch in for my last report.
But i don't think he had (looking at the second part of his oops).

--
Sander
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
Monday, August 17, 2015, 3:37:13 PM, you wrote: > On Mon, 2015-08-17 at 11:09 +0200, Sander Eikelenboom wrote: >> Saturday, August 15, 2015, 12:39:25 AM, you wrote: >> >> > On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote: >> >> On 2015-08-13 00:41, Eric Dumazet wrote: >> >> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote: >> >> > >> >> >> Thanks for the reminder, but luckily i was aware of that, >> >> >> seen enough of your replies asking for patches to be resubmitted >> >> >> against "the other tree" ;) >> >> >> Kernel with patch is currently running so fingers crossed. >> >> > >> >> > Thanks for testing. I am definitely interested knowing your results. >> >> >> >> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is >> >> breaking things >> >> (have to test if a revert helps) i get this in some guests: >> >> >> > Yes, this was fixed by : >> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af >> >> >> Hi Eric, >> >> With that patch i had a crash again this night, see below. 
>> >> -- >> Sander >> >> [177459.188808] general protection fault: [#1] SMP >> [177459.199746] Modules linked in: >> [177459.210540] CPU: 0 PID: 0 Comm: swapper/0 Not tainted >> 4.2.0-rc6-20150815-linus-doflr-net+ #1 >> [177459.221441] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >> V1.8B1 09/13/2010 >> [177459.232247] task: 8221a580 ti: 8220 task.ti: >> 8220 >> [177459.242931] RIP: e030:[] [] >> detach_if_pending+0x18/0x80 >> [177459.253503] RSP: e02b:88005f6039d8 EFLAGS: 00010086 >> [177459.264051] RAX: 8800584d6580 RBX: 880004901420 RCX: >> dead00200200 >> [177459.274599] RDX: RSI: 88005f60e5c0 RDI: >> 880004901420 >> [177459.285122] RBP: 88005f6039d8 R08: 0001 R09: >> >> [177459.295286] R10: 0003 R11: 880004901394 R12: >> 0003 >> [177459.305388] R13: 00010ae47040 R14: 07b98a00 R15: >> 88005f60e5c0 >> [177459.315345] FS: 7f51317ec700() GS:88005f60() >> knlGS: >> [177459.325340] CS: e033 DS: ES: CR0: 8005003b >> [177459.335217] CR2: 010f8000 CR3: 2a154000 CR4: >> 0660 >> [177459.345129] Stack: >> [177459.354783] 88005f603a28 8110ee7f 810fb261 >> 0200 >> [177459.364505] 0003 880004901380 0003 >> 8800567d0d00 >> [177459.374064] 07b98a00 88005f603a58 >> 819b3eb3 >> [177459.383532] Call Trace: >> [177459.392878] >> [177459.392935] [] mod_timer_pending+0x3f/0xe0 >> [177459.411058] [] ? >> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 >> [177459.419876] [] __nf_ct_refresh_acct+0xa3/0xb0 >> [177459.428642] [] tcp_packet+0xb3b/0x1290 >> [177459.437285] [] ? ip_output+0x5e/0xc0 >> [177459.445845] [] ? __local_bh_enable_ip+0x2a/0x90 >> [177459.454331] [] ? __nf_conntrack_find_get+0x129/0x2a0 >> [177459.462642] [] nf_conntrack_in+0x29c/0x7c0 >> [177459.470711] [] ipv4_conntrack_local+0x4c/0x50 >> [177459.478753] [] nf_iterate+0x4c/0x80 >> [177459.486726] [] ? generic_handle_irq+0x27/0x40 >> [177459.494634] [] nf_hook_slow+0x64/0xc0 >> [177459.502486] [] __ip_local_out_sk+0x90/0xa0 >> [177459.510248] [] ? 
ip_forward_options+0x1a0/0x1a0 >> [177459.517782] [] ip_local_out_sk+0x16/0x40 >> [177459.525044] [] ip_queue_xmit+0x14d/0x350 >> [177459.532247] [] tcp_transmit_skb+0x48e/0x960 >> [177459.539413] [] tcp_xmit_probe_skb+0xdb/0xf0 >> [177459.546389] [] tcp_write_wakeup+0x5b/0x150 >> [177459.553061] [] tcp_keepalive_timer+0x1fb/0x230 >> [177459.559761] [] ? tcp_init_xmit_timers+0x20/0x20 >> [177459.566447] [] call_timer_fn.isra.27+0x17/0x80 >> [177459.573121] [] ? tcp_init_xmit_timers+0x20/0x20 >> [177459.579778] [] run_timer_softirq+0x12d/0x200 >> [177459.586448] [] __do_softirq+0x103/0x210 >> [177459.593138] [] irq_exit+0x4b/0xa0 >> [177459.599783] [] xen_evtchn_do_upcall+0x34/0x50 >> [177459.606300] [] xen_do_hypervisor_callback+0x1e/0x40 >> [177459.612583] >> [177459.612637] [] ? xen_hypercall_sched_op+0xa/0x20 >> [177459.62
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
Saturday, August 15, 2015, 12:39:25 AM, you wrote:

> On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:
>> On 2015-08-13 00:41, Eric Dumazet wrote:
>> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>> >
>> >> Thanks for the reminder, but luckily i was aware of that,
>> >> seen enough of your replies asking for patches to be resubmitted
>> >> against "the other tree" ;)
>> >> Kernel with patch is currently running so fingers crossed.
>> >
>> > Thanks for testing. I am definitely interested knowing your results.
>>
>> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
>> breaking things
>> (have to test if a revert helps) i get this in some guests:

> Yes, this was fixed by :
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af

Hi Eric,

With that patch i had a crash again this night, see below.

--
Sander

[177459.188808] general protection fault: [#1] SMP
[177459.199746] Modules linked in:
[177459.210540] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150815-linus-doflr-net+ #1
[177459.221441] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[177459.232247] task: 8221a580 ti: 8220 task.ti: 8220
[177459.242931] RIP: e030:[<ffffffff8110eb58>] [<ffffffff8110eb58>] detach_if_pending+0x18/0x80
[177459.253503] RSP: e02b:88005f6039d8 EFLAGS: 00010086
[177459.264051] RAX: 8800584d6580 RBX: 880004901420 RCX: dead00200200
[177459.274599] RDX: RSI: 88005f60e5c0 RDI: 880004901420
[177459.285122] RBP: 88005f6039d8 R08: 0001 R09:
[177459.295286] R10: 0003 R11: 880004901394 R12: 0003
[177459.305388] R13: 00010ae47040 R14: 07b98a00 R15: 88005f60e5c0
[177459.315345] FS: 7f51317ec700() GS:88005f60() knlGS:
[177459.325340] CS: e033 DS: ES: CR0: 8005003b
[177459.335217] CR2: 010f8000 CR3: 2a154000 CR4: 0660
[177459.345129] Stack:
[177459.354783] 88005f603a28 8110ee7f 810fb261 0200
[177459.364505] 0003 880004901380 0003 8800567d0d00
[177459.374064] 07b98a00 88005f603a58 819b3eb3
[177459.383532] Call Trace:
[177459.392878] <IRQ>
[177459.392935] [<ffffffff8110ee7f>] mod_timer_pending+0x3f/0xe0
[177459.411058] [<ffffffff810fb261>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[177459.419876] [<ffffffff819b3eb3>] __nf_ct_refresh_acct+0xa3/0xb0
[177459.428642] [<ffffffff819baafb>] tcp_packet+0xb3b/0x1290
[177459.437285] [<ffffffff81a2535e>] ? ip_output+0x5e/0xc0
[177459.445845] [<ffffffff810ca8ca>] ? __local_bh_enable_ip+0x2a/0x90
[177459.454331] [<ffffffff819b35a9>] ? __nf_conntrack_find_get+0x129/0x2a0
[177459.462642] [<ffffffff819b549c>] nf_conntrack_in+0x29c/0x7c0
[177459.470711] [<ffffffff81a65e9c>] ipv4_conntrack_local+0x4c/0x50
[177459.478753] [<ffffffff819ad67c>] nf_iterate+0x4c/0x80
[177459.486726] [<ffffffff81102437>] ? generic_handle_irq+0x27/0x40
[177459.494634] [<ffffffff819ad714>] nf_hook_slow+0x64/0xc0
[177459.502486] [<ffffffff81a22d40>] __ip_local_out_sk+0x90/0xa0
[177459.510248] [<ffffffff81a22c40>] ? ip_forward_options+0x1a0/0x1a0
[177459.517782] [<ffffffff81a22d66>] ip_local_out_sk+0x16/0x40
[177459.525044] [<ffffffff81a2343d>] ip_queue_xmit+0x14d/0x350
[177459.532247] [<ffffffff81a3ae7e>] tcp_transmit_skb+0x48e/0x960
[177459.539413] [<ffffffff81a3cddb>] tcp_xmit_probe_skb+0xdb/0xf0
[177459.546389] [<ffffffff81a3dffb>] tcp_write_wakeup+0x5b/0x150
[177459.553061] [<ffffffff81a3e51b>] tcp_keepalive_timer+0x1fb/0x230
[177459.559761] [<ffffffff81a3e320>] ? tcp_init_xmit_timers+0x20/0x20
[177459.566447] [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
[177459.573121] [<ffffffff81a3e320>] ? tcp_init_xmit_timers+0x20/0x20
[177459.579778] [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
[177459.586448] [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
[177459.593138] [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
[177459.599783] [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
[177459.606300] [<ffffffff81af93ae>] xen_do_hypervisor_callback+0x1e/0x40
[177459.612583] <EOI>
[177459.612637] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[177459.625010] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[177459.631157] [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
[177459.637158] [<ffffffff810188d3>] ? default_idle+0x13/0x20
[177459.643072] [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
[177459.648809] [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
[177459.654650] [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
[177459.660488] [<ffffffff81ae79f7>] ? rest_init+0x77/0x80
[177459.666297] [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
[177459.672092] [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
[177459.677800] [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
[177459.683451] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
[177459.695332] RIP [<ffffffff8110eb58>] detach_if_pending+0x18/0x80
[177459.701154] RSP
(XEN) [2015-08-17 00:11:51.426] Hardware Dom0 crashed: rebooting machine in 5 seconds.
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
On 2015-08-15 00:09, Sander Eikelenboom wrote:
> On 2015-08-13 00:41, Eric Dumazet wrote:
>> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>>
>>> Thanks for the reminder, but luckily i was aware of that,
>>> seen enough of your replies asking for patches to be resubmitted
>>> against "the other tree" ;)
>>> Kernel with patch is currently running so fingers crossed.
>>
>> Thanks for testing. I am definitely interested knowing your results.
>
> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
> breaking things
> (have to test if a revert helps) i get this in some guests:

Should have done that before, because it wasn't in yet .. and likely to
fix the issue, also pulled and compiling now.

--
Sander

> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
> [ 6620.282805] Modules linked in:
> [ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
> [ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220
> [ 6620.282805] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] xen_hypercall_xen_version+0xa/0x20
> [ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246
> [ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a
> [ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef
> [ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee
> [ 6620.282805] R10: 8220a0c0 R11: 0246 R12:
> [ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005
> [ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS:
> [ 6620.282805] CS: e033 DS: ES: CR0: 8005003b
> [ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660
> [ 6620.282805] Stack:
> [ 6620.282805] 0068 0007 81008dbd 88000fc03dd8
> [ 6620.282805] 81009592 0068 8220a0c0 00ee
> [ 6620.282805] 88000fc03ee0 0200 0200 0001
> [ 6620.282805] Call Trace:
> [ 6620.282805] <IRQ>
> [ 6620.282805] [<ffffffff81008dbd>] ? xen_force_evtchn_callback+0xd/0x10
> [ 6620.282805] [<ffffffff81009592>] check_events+0x12/0x20
> [ 6620.282805] [<ffffffff8100957f>] ? xen_restore_fl_direct_reloc+0x4/0x4
> [ 6620.282805] [<ffffffff81af79a5>] ? _raw_spin_unlock_irqrestore+0x25/0x30
> [ 6620.282805] [<ffffffff8110ed43>] try_to_del_timer_sync+0x43/0x60
> [ 6620.282805] [<ffffffff8110eda7>] del_timer_sync+0x47/0x60
> [ 6620.282805] [<ffffffff81a2b698>] inet_csk_reqsk_queue_drop+0x118/0x1f0
> [ 6620.282805] [<ffffffff81a2b8c6>] reqsk_timer_handler+0x156/0x260
> [ 6620.282805] [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805] [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
> [ 6620.282805] [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805] [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
> [ 6620.282805] [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
> [ 6620.282805] [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
> [ 6620.282805] [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
> [ 6620.282805] [<ffffffff81af932e>] xen_do_hypervisor_callback+0x1e/0x40
> [ 6620.282805] <EOI>
> [ 6620.282805] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805] [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
> [ 6620.282805] [<ffffffff810188d3>] ? default_idle+0x13/0x20
> [ 6620.282805] [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
> [ 6620.282805] [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
> [ 6620.282805] [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
> [ 6620.282805] [<ffffffff81ae7967>] ? rest_init+0x77/0x80
> [ 6620.282805] [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
> [ 6620.282805] [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
> [ 6620.282805] [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
> [ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
On 2015-08-13 00:41, Eric Dumazet wrote:
> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>
>> Thanks for the reminder, but luckily i was aware of that,
>> seen enough of your replies asking for patches to be resubmitted
>> against "the other tree" ;)
>> Kernel with patch is currently running so fingers crossed.
>
> Thanks for testing. I am definitely interested knowing your results.

Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
breaking things
(have to test if a revert helps) i get this in some guests:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
[ 6620.282805] Modules linked in:
[ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
[ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220
[ 6620.282805] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] xen_hypercall_xen_version+0xa/0x20
[ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246
[ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a
[ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef
[ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee
[ 6620.282805] R10: 8220a0c0 R11: 0246 R12:
[ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005
[ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS:
[ 6620.282805] CS: e033 DS: ES: CR0: 8005003b
[ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660
[ 6620.282805] Stack:
[ 6620.282805] 0068 0007 81008dbd 88000fc03dd8
[ 6620.282805] 81009592 0068 8220a0c0 00ee
[ 6620.282805] 88000fc03ee0 0200 0200 0001
[ 6620.282805] Call Trace:
[ 6620.282805] <IRQ>
[ 6620.282805] [<ffffffff81008dbd>] ? xen_force_evtchn_callback+0xd/0x10
[ 6620.282805] [<ffffffff81009592>] check_events+0x12/0x20
[ 6620.282805] [<ffffffff8100957f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 6620.282805] [<ffffffff81af79a5>] ? _raw_spin_unlock_irqrestore+0x25/0x30
[ 6620.282805] [<ffffffff8110ed43>] try_to_del_timer_sync+0x43/0x60
[ 6620.282805] [<ffffffff8110eda7>] del_timer_sync+0x47/0x60
[ 6620.282805] [<ffffffff81a2b698>] inet_csk_reqsk_queue_drop+0x118/0x1f0
[ 6620.282805] [<ffffffff81a2b8c6>] reqsk_timer_handler+0x156/0x260
[ 6620.282805] [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805] [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
[ 6620.282805] [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805] [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
[ 6620.282805] [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
[ 6620.282805] [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
[ 6620.282805] [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
[ 6620.282805] [<ffffffff81af932e>] xen_do_hypervisor_callback+0x1e/0x40
[ 6620.282805] <EOI>
[ 6620.282805] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805] [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
[ 6620.282805] [<ffffffff810188d3>] ? default_idle+0x13/0x20
[ 6620.282805] [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
[ 6620.282805] [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
[ 6620.282805] [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
[ 6620.282805] [<ffffffff81ae7967>] ? rest_init+0x77/0x80
[ 6620.282805] [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
[ 6620.282805] [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
[ 6620.282805] [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
[ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Re: Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80
On 2015-08-13 00:41, Eric Dumazet wrote:
> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>> Thanks for the reminder, but luckily i was aware of that, seen enough
>> of your replies asking for patches to be resubmitted against the other
>> tree ;)
>> Kernel with patch is currently running so fingers crossed.
>
> Thanks for testing. I am definitely interested knowing your results.

Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
breaking things (have to test if a revert helps) i get this in some guests:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
[ 6620.282805] Modules linked in:
[ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
[ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220
[ 6620.282805] RIP: e030:[8100122a] [8100122a] xen_hypercall_xen_version+0xa/0x20
[ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246
[ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a
[ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef
[ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee
[ 6620.282805] R10: 8220a0c0 R11: 0246 R12:
[ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005
[ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS:
[ 6620.282805] CS: e033 DS: ES: CR0: 8005003b
[ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660
[ 6620.282805] Stack:
[ 6620.282805] 0068 0007 81008dbd 88000fc03dd8
[ 6620.282805] 81009592 0068 8220a0c0 00ee
[ 6620.282805] 88000fc03ee0 0200 0200 0001
[ 6620.282805] Call Trace:
[ 6620.282805] <IRQ>
[ 6620.282805] [81008dbd] ? xen_force_evtchn_callback+0xd/0x10
[ 6620.282805] [81009592] check_events+0x12/0x20
[ 6620.282805] [8100957f] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 6620.282805] [81af79a5] ? _raw_spin_unlock_irqrestore+0x25/0x30
[ 6620.282805] [8110ed43] try_to_del_timer_sync+0x43/0x60
[ 6620.282805] [8110eda7] del_timer_sync+0x47/0x60
[ 6620.282805] [81a2b698] inet_csk_reqsk_queue_drop+0x118/0x1f0
[ 6620.282805] [81a2b8c6] reqsk_timer_handler+0x156/0x260
[ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805] [8110f3c7] call_timer_fn.isra.27+0x17/0x80
[ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805] [8110f55d] run_timer_softirq+0x12d/0x200
[ 6620.282805] [810ca6c3] __do_softirq+0x103/0x210
[ 6620.282805] [810ca9cb] irq_exit+0x4b/0xa0
[ 6620.282805] [814f05d4] xen_evtchn_do_upcall+0x34/0x50
[ 6620.282805] [81af932e] xen_do_hypervisor_callback+0x1e/0x40
[ 6620.282805] <EOI>
[ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805] [81008d60] ? xen_safe_halt+0x10/0x20
[ 6620.282805] [810188d3] ? default_idle+0x13/0x20
[ 6620.282805] [81018e1a] ? arch_cpu_idle+0xa/0x10
[ 6620.282805] [810f8e7e] ? default_idle_call+0x2e/0x50
[ 6620.282805] [810f9112] ? cpu_startup_entry+0x272/0x2e0
[ 6620.282805] [81ae7967] ? rest_init+0x77/0x80
[ 6620.282805] [82312f58] ? start_kernel+0x43b/0x448
[ 6620.282805] [823124ef] ? x86_64_start_reservations+0x2a/0x2c
[ 6620.282805] [82316008] ? xen_start_kernel+0x550/0x55c
[ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Re: Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80
On 2015-08-15 00:09, Sander Eikelenboom wrote:
> On 2015-08-13 00:41, Eric Dumazet wrote:
>> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>>> Thanks for the reminder, but luckily i was aware of that, seen enough
>>> of your replies asking for patches to be resubmitted against the other
>>> tree ;)
>>> Kernel with patch is currently running so fingers crossed.
>>
>> Thanks for testing. I am definitely interested knowing your results.
>
> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
> breaking things (have to test if a revert helps) i get this in some guests:

Should have done that before, because it wasn't in yet .. and likely to
fix the issue, also pulled and compiling now.

--
Sander

> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
> [ 6620.282805] Modules linked in:
> [ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
> [ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220
> [ 6620.282805] RIP: e030:[8100122a] [8100122a] xen_hypercall_xen_version+0xa/0x20
> [ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246
> [ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a
> [ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef
> [ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee
> [ 6620.282805] R10: 8220a0c0 R11: 0246 R12:
> [ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005
> [ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS:
> [ 6620.282805] CS: e033 DS: ES: CR0: 8005003b
> [ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660
> [ 6620.282805] Stack:
> [ 6620.282805] 0068 0007 81008dbd 88000fc03dd8
> [ 6620.282805] 81009592 0068 8220a0c0 00ee
> [ 6620.282805] 88000fc03ee0 0200 0200 0001
> [ 6620.282805] Call Trace:
> [ 6620.282805] <IRQ>
> [ 6620.282805] [81008dbd] ? xen_force_evtchn_callback+0xd/0x10
> [ 6620.282805] [81009592] check_events+0x12/0x20
> [ 6620.282805] [8100957f] ? xen_restore_fl_direct_reloc+0x4/0x4
> [ 6620.282805] [81af79a5] ? _raw_spin_unlock_irqrestore+0x25/0x30
> [ 6620.282805] [8110ed43] try_to_del_timer_sync+0x43/0x60
> [ 6620.282805] [8110eda7] del_timer_sync+0x47/0x60
> [ 6620.282805] [81a2b698] inet_csk_reqsk_queue_drop+0x118/0x1f0
> [ 6620.282805] [81a2b8c6] reqsk_timer_handler+0x156/0x260
> [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805] [8110f3c7] call_timer_fn.isra.27+0x17/0x80
> [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805] [8110f55d] run_timer_softirq+0x12d/0x200
> [ 6620.282805] [810ca6c3] __do_softirq+0x103/0x210
> [ 6620.282805] [810ca9cb] irq_exit+0x4b/0xa0
> [ 6620.282805] [814f05d4] xen_evtchn_do_upcall+0x34/0x50
> [ 6620.282805] [81af932e] xen_do_hypervisor_callback+0x1e/0x40
> [ 6620.282805] <EOI>
> [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805] [81008d60] ? xen_safe_halt+0x10/0x20
> [ 6620.282805] [810188d3] ? default_idle+0x13/0x20
> [ 6620.282805] [81018e1a] ? arch_cpu_idle+0xa/0x10
> [ 6620.282805] [810f8e7e] ? default_idle_call+0x2e/0x50
> [ 6620.282805] [810f9112] ? cpu_startup_entry+0x272/0x2e0
> [ 6620.282805] [81ae7967] ? rest_init+0x77/0x80
> [ 6620.282805] [82312f58] ? start_kernel+0x43b/0x448
> [ 6620.282805] [823124ef] ? x86_64_start_reservations+0x2a/0x2c
> [ 6620.282805] [82316008] ? xen_start_kernel+0x550/0x55c
> [ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc