Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
On Fri, 31 Jul 2020 at 06:45, Nitesh Narayan Lal wrote: > > > On 7/29/20 8:34 AM, Nitesh Narayan Lal wrote: > > On 7/28/20 10:38 PM, Wanpeng Li wrote: > >> Hi Nitesh, > >> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li wrote: > >>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal > >>> wrote: > Hi, > > I have recently come across an RCU trace with the 5.8-rc7 kernel that > has the > debug configs enabled while installing a VM on a CPU that is listed under > nohz_full. > > Based on some of the initial debugging, my impression is that the issue > is > triggered because of the fastpath that is meant to optimize the writes > to x2APIC > ICR that eventually leads to a virtual IPI in fixed delivery mode, is > getting > invoked from the quiescent state. > >> Could you try latest linux-next tree? I guess maybe some patches are > >> pending in linux-next tree, I can't reproduce against linux-next tree. > > Sure, I will try this today. > > Hi Wanpeng, > > I am not seeing the issue getting reproduced with the linux-next tree. > Although, I am still seeing a Warning stack trace: > > [ 139.220080] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb57/0x1320 [kvm] > [ 139.226837] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 > 48 > 31 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 7e ac 02 00 e9 3d f8 ff ff <0f> 0b e9 > f2 > f8 ff ff 48f > [ 139.247828] RSP: 0018:8889bc397cb8 EFLAGS: 00010202 > [ 139.253700] RAX: 0001 RBX: dc00 RCX: > c1fc3bef > [ 139.261695] RDX: RSI: 0001 RDI: > 888f0fa1a8a0 > [ 139.269692] RBP: 8889bc397d18 R08: ed113786a7d0 R09: > ed113786a7d0 > [ 139.277686] R10: 8889bc353e7f R11: ed113786a7cf R12: > 8889bc35423c > [ 139.285682] R13: 8889bc353e40 R14: 8889bc353e6c R15: > 88897f536000 > [ 139.293678] FS: 7f3d8a71c700() GS:888a3c40() > knlGS: > [ 139.302742] CS: 0010 DS: ES: CR0: 80050033 > [ 139.309186] CR2: CR3: 0009bc34c004 CR4: > 003726e0 > [ 139.317180] Call Trace: > [ 139.320002] kvm_vcpu_ioctl+0x3ee/0xb10 [kvm] > [ 139.324907] ? sched_clock+0x5/0x10 > [ 139.328875] ? kvm_io_bus_get_dev+0x1c0/0x1c0 [kvm] > [ 139.334375] ? ioctl_file_clone+0x120/0x120 > [ 139.339079] ? selinux_file_ioctl+0x98/0x570 > [ 139.343895] ? selinux_file_mprotect+0x5b0/0x5b0 > [ 139.349088] ? irq_matrix_assign+0x360/0x430 > [ 139.353904] ? rcu_read_lock_sched_held+0xe0/0xe0 > [ 139.359201] ? __fget_files+0x1f0/0x300 > [ 139.363532] __x64_sys_ioctl+0x128/0x18e > [ 139.367948] do_syscall_64+0x33/0x40 > [ 139.371974] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 139.377643] RIP: 0033:0x7f3d98d0a88b > > Are you also triggering anything like this in your environment? I see other issue when rmmod kvm modules. :) [ cut here ] WARNING: CPU: 5 PID: 2837 at kernel/rcu/tree_plugin.h:1738 call_rcu+0xd3/0x800 CPU: 5 PID: 2837 Comm: rmmod Not tainted 5.8.0-rc7-next-20200728 #1 RIP: 0010:call_rcu+0xd3/0x800 RSP: 0018:ae25c302bd90 EFLAGS: 00010002 RAX: 0001 RBX: 944e4f80 RCX: RDX: 0101 RSI: 009f RDI: 93308cef RBP: ae25c302bdf0 R08: 074e R09: 2a2f R10: 0002 R11: 944dcf80 R12: 936ef4c0 R13: 9702ce1fd900 R14: 920e2bd0 R15: 3849 FS: 7fc99b242700() GS:9702ce00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 55905332dd58 CR3: 0003eeb46005 CR4: 001706e0 Call Trace: call_rcu_zapped+0x70/0x80 lockdep_unregister_key+0xa6/0xf0 destroy_workqueue+0x1b1/0x210 kvm_irqfd_exit+0x15/0x20 [kvm] kvm_exit+0x78/0x80 [kvm] vmx_exit+0x1e/0x50 [kvm_intel] __x64_sys_delete_module+0x1e6/0x260 do_syscall_64+0x63/0x350 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc99ad7b9e7
Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
On 7/29/20 8:34 AM, Nitesh Narayan Lal wrote: > On 7/28/20 10:38 PM, Wanpeng Li wrote: >> Hi Nitesh, >> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li wrote: >>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal wrote: Hi, I have recently come across an RCU trace with the 5.8-rc7 kernel that has the debug configs enabled while installing a VM on a CPU that is listed under nohz_full. Based on some of the initial debugging, my impression is that the issue is triggered because of the fastpath that is meant to optimize the writes to x2APIC ICR that eventually leads to a virtual IPI in fixed delivery mode, is getting invoked from the quiescent state. >> Could you try latest linux-next tree? I guess maybe some patches are >> pending in linux-next tree, I can't reproduce against linux-next tree. > Sure, I will try this today. Hi Wanpeng, I am not seeing the issue getting reproduced with the linux-next tree. Although, I am still seeing a Warning stack trace: [ 139.220080] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb57/0x1320 [kvm] [ 139.226837] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 48 31 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 7e ac 02 00 e9 3d f8 ff ff <0f> 0b e9 f2 f8 ff ff 48f [ 139.247828] RSP: 0018:8889bc397cb8 EFLAGS: 00010202 [ 139.253700] RAX: 0001 RBX: dc00 RCX: c1fc3bef [ 139.261695] RDX: RSI: 0001 RDI: 888f0fa1a8a0 [ 139.269692] RBP: 8889bc397d18 R08: ed113786a7d0 R09: ed113786a7d0 [ 139.277686] R10: 8889bc353e7f R11: ed113786a7cf R12: 8889bc35423c [ 139.285682] R13: 8889bc353e40 R14: 8889bc353e6c R15: 88897f536000 [ 139.293678] FS: 7f3d8a71c700() GS:888a3c40() knlGS: [ 139.302742] CS: 0010 DS: ES: CR0: 80050033 [ 139.309186] CR2: CR3: 0009bc34c004 CR4: 003726e0 [ 139.317180] Call Trace: [ 139.320002] kvm_vcpu_ioctl+0x3ee/0xb10 [kvm] [ 139.324907] ? sched_clock+0x5/0x10 [ 139.328875] ? kvm_io_bus_get_dev+0x1c0/0x1c0 [kvm] [ 139.334375] ? ioctl_file_clone+0x120/0x120 [ 139.339079] ? selinux_file_ioctl+0x98/0x570 [ 139.343895] ? selinux_file_mprotect+0x5b0/0x5b0 [ 139.349088] ? irq_matrix_assign+0x360/0x430 [ 139.353904] ? rcu_read_lock_sched_held+0xe0/0xe0 [ 139.359201] ? __fget_files+0x1f0/0x300 [ 139.363532] __x64_sys_ioctl+0x128/0x18e [ 139.367948] do_syscall_64+0x33/0x40 [ 139.371974] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 139.377643] RIP: 0033:0x7f3d98d0a88b Are you also triggering anything like this in your environment? > -- Nitesh signature.asc Description: OpenPGP digital signature
Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
On 7/28/20 10:38 PM, Wanpeng Li wrote: > Hi Nitesh, > On Wed, 29 Jul 2020 at 09:00, Wanpeng Li wrote: >> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal wrote: >>> Hi, >>> >>> I have recently come across an RCU trace with the 5.8-rc7 kernel that has >>> the >>> debug configs enabled while installing a VM on a CPU that is listed under >>> nohz_full. >>> >>> Based on some of the initial debugging, my impression is that the issue is >>> triggered because of the fastpath that is meant to optimize the writes to >>> x2APIC >>> ICR that eventually leads to a virtual IPI in fixed delivery mode, is >>> getting >>> invoked from the quiescent state. > Could you try latest linux-next tree? I guess maybe some patches are > pending in linux-next tree, I can't reproduce against linux-next tree. Sure, I will try this today. > -- Nitesh signature.asc Description: OpenPGP digital signature
Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
Hi Nitesh, On Wed, 29 Jul 2020 at 09:00, Wanpeng Li wrote: > > On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal wrote: > > > > Hi, > > > > I have recently come across an RCU trace with the 5.8-rc7 kernel that has > > the > > debug configs enabled while installing a VM on a CPU that is listed under > > nohz_full. > > > > Based on some of the initial debugging, my impression is that the issue is > > triggered because of the fastpath that is meant to optimize the writes to > > x2APIC > > ICR that eventually leads to a virtual IPI in fixed delivery mode, is > > getting > > invoked from the quiescent state. Could you try latest linux-next tree? I guess maybe some patches are pending in linux-next tree, I can't reproduce against linux-next tree.
Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal wrote: > > Hi, > > I have recently come across an RCU trace with the 5.8-rc7 kernel that has the > debug configs enabled while installing a VM on a CPU that is listed under > nohz_full. > > Based on some of the initial debugging, my impression is that the issue is > triggered because of the fastpath that is meant to optimize the writes to > x2APIC > ICR that eventually leads to a virtual IPI in fixed delivery mode, is getting > invoked from the quiescent state. I still can reproduce this after removing the ipi/timer fastpath codes, anyway, I will have a look.
WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full
Hi, I have recently come across an RCU trace with the 5.8-rc7 kernel that has the debug configs enabled while installing a VM on a CPU that is listed under nohz_full. Based on some of the initial debugging, my impression is that the issue is triggered because of the fastpath that is meant to optimize the writes to x2APIC ICR that eventually leads to a virtual IPI in fixed delivery mode, is getting invoked from the quiescent state. Following is the RCU trace dump that I was getting: [ 178.109535] = [ 178.114027] WARNING: suspicious RCU usage [ 178.118518] 5.8.0-rc7-upstream-+ #10 Not tainted [ 178.123685] - [ 178.128176] arch/x86/kvm/lapic.c:269 suspicious rcu_dereference_check() usage! [ 178.136254] [ 178.136254] other info that might help us debug this: [ 178.136254] [ 178.145205] [ 178.145205] rcu_scheduler_active = 2, debug_locks = 1 [ 178.152506] 1 lock held by CPU 0/KVM/2959: [ 178.157091] #0: c9000717b6f8 (>arch.apic_map_lock){+.+.}-{3:3}, at: kvm_recalculate_apic_map+0x8b/0xdd0 [kvm] [ 178.169207] [ 178.169207] stack backtrace: [ 178.174086] CPU: 18 PID: 2959 Comm: CPU 0/KVM Not tainted 5.8.0-rc7-upstream-+ #10 [ 178.182539] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.6.0 10/31/2017 [ 178.190895] Call Trace: [ 178.193637] dump_stack+0x9d/0xe0 [ 178.197379] kvm_recalculate_apic_map+0x8ce/0xdd0 [kvm] [ 178.203259] ? kvm_lapic_reset+0x832/0xe50 [kvm] [ 178.208459] kvm_vcpu_reset+0x28/0x7b0 [kvm] [ 178.213270] kvm_arch_vcpu_create+0x830/0xb70 [kvm] [ 178.218759] kvm_vm_ioctl+0x11b1/0x1fe0 [kvm] [ 178.223635] ? mark_lock+0x144/0x19e0 [ 178.227757] ? kvm_unregister_device_ops+0xe0/0xe0 [kvm] [ 178.233698] ? sched_clock+0x5/0x10 [ 178.237597] ? sched_clock_cpu+0x18/0x1d0 [ 178.242087] ? __lock_acquire+0xcf6/0x5010 [ 178.246686] ? lockdep_hardirqs_on_prepare+0x550/0x550 [ 178.252429] ? lockdep_hardirqs_on_prepare+0x550/0x550 [ 178.258177] ? sched_clock+0x5/0x10 [ 178.262074] ? sched_clock_cpu+0x18/0x1d0 [ 178.266556] ? find_held_lock+0x3a/0x1c0 [ 178.270953] ? ioctl_file_clone+0x120/0x120 [ 178.275630] ? selinux_file_ioctl+0x98/0x570 [ 178.280405] ? selinux_file_mprotect+0x5b0/0x5b0 [ 178.285569] ? rcu_tasks_wait_gp+0x6d1/0xa50 [ 178.290342] ? rcu_read_lock_sched_held+0xe0/0xe0 [ 178.295608] ? __fget_files+0x1f0/0x300 [ 178.299912] ksys_ioctl+0xc0/0x110 [ 178.303719] __x64_sys_ioctl+0x6f/0xb0 [ 178.307913] do_syscall_64+0x51/0xb0 [ 178.311913] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 178.317557] RIP: 0033:0x7f6b9700d88b [ 178.321551] Code: 0f 1e fa 48 8b 05 fd 95 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 018 [ 178.342513] RSP: 002b:7f6b8cbe9668 EFLAGS: 0246 ORIG_RAX: 0010 [ 178.350967] RAX: ffda RBX: 55e8162d9000 RCX: 7f6b9700d88b [ 178.358935] RDX: RSI: ae41 RDI: 000e [ 178.366903] RBP: 55e8162d9000 R08: 55e8155ec4d0 R09: 55e8162d9000 [ 178.374871] R10: 55e815d94ee0 R11: 0246 R12: 55e8162ad420 [ 178.382838] R13: 55e8162d9000 R14: 7ffedf043660 R15: 7f6b8cbe9800 [ 182.771858] [ 182.773606] = [ 182.778084] WARNING: suspicious RCU usage [ 182.782564] 5.8.0-rc7-upstream-+ #10 Not tainted [ 182.787719] - [ 182.792197] arch/x86/include/asm/trace/fpu.h:60 suspicious rcu_dereference_check() usage! [ 182.801329] [ 182.801329] other info that might help us debug this: [ 182.801329] [ 182.810268] [ 182.810268] RCU used illegally from idle CPU! [ 182.810268] rcu_scheduler_active = 2, debug_locks = 1 [ 182.822407] RCU used illegally from extended quiescent state! [ 182.828824] 1 lock held by CPU 0/KVM/2959: [ 182.833397] #0: 88903f8500d0 (>mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x172/0xb00 [kvm] [ 182.838308] [ 182.838308] stack backtrace: [ 182.838313] CPU: 18 PID: 2959 Comm: CPU 0/KVM Tainted: G W 5.8.0-rc7-upstream-+ #10 [ 182.838316] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.6.0 10/31/2017 [ 182.838318] Call Trace: [ 182.874318] dump_stack+0x9d/0xe0 [ 182.878024] switch_fpu_return+0x37c/0x410 [ 182.882602] ? fpu__clear+0x1a0/0x1a0 [ 182.886700] ? rcu_dynticks_eqs_enter+0x15/0x30 [ 182.891807] vcpu_enter_guest+0x1854/0x3df0 [kvm] [ 182.897121] ? kvm_vcpu_reload_apic_access_page+0x60/0x60 [kvm] [ 182.903738] ? lock_acquire+0x1ac/0xac0 [ 182.908062] ? kvm_arch_vcpu_ioctl_run+0x1dc/0x13c0 [kvm] [ 182.914107] ? rcu_read_unlock+0x50/0x50 [ 182.918489] ? rcu_read_lock_sched_held+0xaf/0xe0 [ 182.923788] ? kvm_load_guest_fpu+0x94/0x350 [kvm] [ 182.929177] ? kvm_load_guest_fpu+0x94/0x350 [kvm] [ 182.934528] ? __local_bh_enable_ip+0x123/0x1a0 [ 182.939635] kvm_arch_vcpu_ioctl_run+0x310/0x13c0 [kvm] [ 182.945529]