Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-31 Thread Wanpeng Li
On Fri, 31 Jul 2020 at 06:45, Nitesh Narayan Lal  wrote:
>
>
> On 7/29/20 8:34 AM, Nitesh Narayan Lal wrote:
> > On 7/28/20 10:38 PM, Wanpeng Li wrote:
> >> Hi Nitesh,
> >> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li  wrote:
> >>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal  
> >>> wrote:
>  Hi,
> 
>  I have recently come across an RCU trace with the 5.8-rc7 kernel that 
>  has the
>  debug configs enabled while installing a VM on a CPU that is listed under
>  nohz_full.
> 
>  Based on some of the initial debugging, my impression is that the issue 
>  is
>  triggered because of the fastpath that is meant to optimize the writes 
>  to x2APIC
>  ICR that eventually leads to a virtual IPI in fixed delivery mode, is 
>  getting
>  invoked from the quiescent state.
> >> Could you try latest linux-next tree? I guess maybe some patches are
> >> pending in linux-next tree, I can't reproduce against linux-next tree.
> > Sure, I will try this today.
>
> Hi Wanpeng,
>
> I am not seeing the issue getting reproduced with the linux-next tree.
> Although, I am still seeing a Warning stack trace:
>
> [  139.220080] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb57/0x1320 [kvm]
> [  139.226837] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 
> 48
> 31 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 7e ac 02 00 e9 3d f8 ff ff <0f> 0b e9 
> f2
> f8 ff ff 48f
> [  139.247828] RSP: 0018:8889bc397cb8 EFLAGS: 00010202
> [  139.253700] RAX: 0001 RBX: dc00 RCX: 
> c1fc3bef
> [  139.261695] RDX:  RSI: 0001 RDI: 
> 888f0fa1a8a0
> [  139.269692] RBP: 8889bc397d18 R08: ed113786a7d0 R09: 
> ed113786a7d0
> [  139.277686] R10: 8889bc353e7f R11: ed113786a7cf R12: 
> 8889bc35423c
> [  139.285682] R13: 8889bc353e40 R14: 8889bc353e6c R15: 
> 88897f536000
> [  139.293678] FS:  7f3d8a71c700() GS:888a3c40()
> knlGS:
> [  139.302742] CS:  0010 DS:  ES:  CR0: 80050033
> [  139.309186] CR2:  CR3: 0009bc34c004 CR4: 
> 003726e0
> [  139.317180] Call Trace:
> [  139.320002]  kvm_vcpu_ioctl+0x3ee/0xb10 [kvm]
> [  139.324907]  ? sched_clock+0x5/0x10
> [  139.328875]  ? kvm_io_bus_get_dev+0x1c0/0x1c0 [kvm]
> [  139.334375]  ? ioctl_file_clone+0x120/0x120
> [  139.339079]  ? selinux_file_ioctl+0x98/0x570
> [  139.343895]  ? selinux_file_mprotect+0x5b0/0x5b0
> [  139.349088]  ? irq_matrix_assign+0x360/0x430
> [  139.353904]  ? rcu_read_lock_sched_held+0xe0/0xe0
> [  139.359201]  ? __fget_files+0x1f0/0x300
> [  139.363532]  __x64_sys_ioctl+0x128/0x18e
> [  139.367948]  do_syscall_64+0x33/0x40
> [  139.371974]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  139.377643] RIP: 0033:0x7f3d98d0a88b
>
> Are you also triggering anything like this in your environment?

I see other issue when rmmod kvm modules. :)

 [ cut here ]
 WARNING: CPU: 5 PID: 2837 at kernel/rcu/tree_plugin.h:1738 call_rcu+0xd3/0x800
 CPU: 5 PID: 2837 Comm: rmmod Not tainted 5.8.0-rc7-next-20200728 #1
 RIP: 0010:call_rcu+0xd3/0x800
 RSP: 0018:ae25c302bd90 EFLAGS: 00010002
 RAX: 0001 RBX: 944e4f80 RCX: 
 RDX: 0101 RSI: 009f RDI: 93308cef
 RBP: ae25c302bdf0 R08: 074e R09: 2a2f
 R10: 0002 R11: 944dcf80 R12: 936ef4c0
 R13: 9702ce1fd900 R14: 920e2bd0 R15: 3849
 FS:  7fc99b242700() GS:9702ce00() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 55905332dd58 CR3: 0003eeb46005 CR4: 001706e0
 Call Trace:
  call_rcu_zapped+0x70/0x80
  lockdep_unregister_key+0xa6/0xf0
  destroy_workqueue+0x1b1/0x210
  kvm_irqfd_exit+0x15/0x20 [kvm]
  kvm_exit+0x78/0x80 [kvm]
  vmx_exit+0x1e/0x50 [kvm_intel]
  __x64_sys_delete_module+0x1e6/0x260
  do_syscall_64+0x63/0x350
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc99ad7b9e7


Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-30 Thread Nitesh Narayan Lal

On 7/29/20 8:34 AM, Nitesh Narayan Lal wrote:
> On 7/28/20 10:38 PM, Wanpeng Li wrote:
>> Hi Nitesh,
>> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li  wrote:
>>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal  wrote:
 Hi,

 I have recently come across an RCU trace with the 5.8-rc7 kernel that has 
 the
 debug configs enabled while installing a VM on a CPU that is listed under
 nohz_full.

 Based on some of the initial debugging, my impression is that the issue is
 triggered because of the fastpath that is meant to optimize the writes to 
 x2APIC
 ICR that eventually leads to a virtual IPI in fixed delivery mode, is 
 getting
 invoked from the quiescent state.
>> Could you try latest linux-next tree? I guess maybe some patches are
>> pending in linux-next tree, I can't reproduce against linux-next tree.
> Sure, I will try this today.

Hi Wanpeng,

I am not seeing the issue getting reproduced with the linux-next tree.
Although, I am still seeing a Warning stack trace:

[  139.220080] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb57/0x1320 [kvm]
[  139.226837] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 48
31 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 7e ac 02 00 e9 3d f8 ff ff <0f> 0b e9 f2
f8 ff ff 48f
[  139.247828] RSP: 0018:8889bc397cb8 EFLAGS: 00010202
[  139.253700] RAX: 0001 RBX: dc00 RCX: c1fc3bef
[  139.261695] RDX:  RSI: 0001 RDI: 888f0fa1a8a0
[  139.269692] RBP: 8889bc397d18 R08: ed113786a7d0 R09: ed113786a7d0
[  139.277686] R10: 8889bc353e7f R11: ed113786a7cf R12: 8889bc35423c
[  139.285682] R13: 8889bc353e40 R14: 8889bc353e6c R15: 88897f536000
[  139.293678] FS:  7f3d8a71c700() GS:888a3c40()
knlGS:
[  139.302742] CS:  0010 DS:  ES:  CR0: 80050033
[  139.309186] CR2:  CR3: 0009bc34c004 CR4: 003726e0
[  139.317180] Call Trace:
[  139.320002]  kvm_vcpu_ioctl+0x3ee/0xb10 [kvm]
[  139.324907]  ? sched_clock+0x5/0x10
[  139.328875]  ? kvm_io_bus_get_dev+0x1c0/0x1c0 [kvm]
[  139.334375]  ? ioctl_file_clone+0x120/0x120
[  139.339079]  ? selinux_file_ioctl+0x98/0x570
[  139.343895]  ? selinux_file_mprotect+0x5b0/0x5b0
[  139.349088]  ? irq_matrix_assign+0x360/0x430
[  139.353904]  ? rcu_read_lock_sched_held+0xe0/0xe0
[  139.359201]  ? __fget_files+0x1f0/0x300
[  139.363532]  __x64_sys_ioctl+0x128/0x18e
[  139.367948]  do_syscall_64+0x33/0x40
[  139.371974]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  139.377643] RIP: 0033:0x7f3d98d0a88b

Are you also triggering anything like this in your environment?


>
-- 
Nitesh



signature.asc
Description: OpenPGP digital signature


Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-29 Thread Nitesh Narayan Lal

On 7/28/20 10:38 PM, Wanpeng Li wrote:
> Hi Nitesh,
> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li  wrote:
>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal  wrote:
>>> Hi,
>>>
>>> I have recently come across an RCU trace with the 5.8-rc7 kernel that has 
>>> the
>>> debug configs enabled while installing a VM on a CPU that is listed under
>>> nohz_full.
>>>
>>> Based on some of the initial debugging, my impression is that the issue is
>>> triggered because of the fastpath that is meant to optimize the writes to 
>>> x2APIC
>>> ICR that eventually leads to a virtual IPI in fixed delivery mode, is 
>>> getting
>>> invoked from the quiescent state.
> Could you try latest linux-next tree? I guess maybe some patches are
> pending in linux-next tree, I can't reproduce against linux-next tree.

Sure, I will try this today.

>
-- 
Nitesh



signature.asc
Description: OpenPGP digital signature


Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-28 Thread Wanpeng Li
Hi Nitesh,
On Wed, 29 Jul 2020 at 09:00, Wanpeng Li  wrote:
>
> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal  wrote:
> >
> > Hi,
> >
> > I have recently come across an RCU trace with the 5.8-rc7 kernel that has 
> > the
> > debug configs enabled while installing a VM on a CPU that is listed under
> > nohz_full.
> >
> > Based on some of the initial debugging, my impression is that the issue is
> > triggered because of the fastpath that is meant to optimize the writes to 
> > x2APIC
> > ICR that eventually leads to a virtual IPI in fixed delivery mode, is 
> > getting
> > invoked from the quiescent state.

Could you try latest linux-next tree? I guess maybe some patches are
pending in linux-next tree, I can't reproduce against linux-next tree.


Re: WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-28 Thread Wanpeng Li
On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal  wrote:
>
> Hi,
>
> I have recently come across an RCU trace with the 5.8-rc7 kernel that has the
> debug configs enabled while installing a VM on a CPU that is listed under
> nohz_full.
>
> Based on some of the initial debugging, my impression is that the issue is
> triggered because of the fastpath that is meant to optimize the writes to 
> x2APIC
> ICR that eventually leads to a virtual IPI in fixed delivery mode, is getting
> invoked from the quiescent state.

I still can reproduce this after removing the ipi/timer fastpath
codes, anyway, I will have a look.


WARNING: suspicious RCU usage - while installing a VM on a CPU listed under nohz_full

2020-07-28 Thread Nitesh Narayan Lal
Hi,

I have recently come across an RCU trace with the 5.8-rc7 kernel that has the
debug configs enabled while installing a VM on a CPU that is listed under
nohz_full.

Based on some of the initial debugging, my impression is that the issue is
triggered because of the fastpath that is meant to optimize the writes to x2APIC
ICR that eventually leads to a virtual IPI in fixed delivery mode, is getting
invoked from the quiescent state.

Following is the RCU trace dump that I was getting:

[  178.109535] =
[  178.114027] WARNING: suspicious RCU usage
[  178.118518] 5.8.0-rc7-upstream-+ #10 Not tainted
[  178.123685] -
[  178.128176] arch/x86/kvm/lapic.c:269 suspicious rcu_dereference_check() 
usage!
[  178.136254]
[  178.136254] other info that might help us debug this:
[  178.136254]
[  178.145205]
[  178.145205] rcu_scheduler_active = 2, debug_locks = 1
[  178.152506] 1 lock held by CPU 0/KVM/2959:
[  178.157091]  #0: c9000717b6f8 (>arch.apic_map_lock){+.+.}-{3:3}, at:
kvm_recalculate_apic_map+0x8b/0xdd0 [kvm]
[  178.169207]
[  178.169207] stack backtrace:
[  178.174086] CPU: 18 PID: 2959 Comm: CPU 0/KVM Not tainted
5.8.0-rc7-upstream-+ #10
[  178.182539] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.6.0 
10/31/2017
[  178.190895] Call Trace:
[  178.193637]  dump_stack+0x9d/0xe0
[  178.197379]  kvm_recalculate_apic_map+0x8ce/0xdd0 [kvm]
[  178.203259]  ? kvm_lapic_reset+0x832/0xe50 [kvm]
[  178.208459]  kvm_vcpu_reset+0x28/0x7b0 [kvm]
[  178.213270]  kvm_arch_vcpu_create+0x830/0xb70 [kvm]
[  178.218759]  kvm_vm_ioctl+0x11b1/0x1fe0 [kvm]
[  178.223635]  ? mark_lock+0x144/0x19e0
[  178.227757]  ? kvm_unregister_device_ops+0xe0/0xe0 [kvm]
[  178.233698]  ? sched_clock+0x5/0x10
[  178.237597]  ? sched_clock_cpu+0x18/0x1d0
[  178.242087]  ? __lock_acquire+0xcf6/0x5010
[  178.246686]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[  178.252429]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[  178.258177]  ? sched_clock+0x5/0x10
[  178.262074]  ? sched_clock_cpu+0x18/0x1d0
[  178.266556]  ? find_held_lock+0x3a/0x1c0
[  178.270953]  ? ioctl_file_clone+0x120/0x120
[  178.275630]  ? selinux_file_ioctl+0x98/0x570
[  178.280405]  ? selinux_file_mprotect+0x5b0/0x5b0
[  178.285569]  ? rcu_tasks_wait_gp+0x6d1/0xa50
[  178.290342]  ? rcu_read_lock_sched_held+0xe0/0xe0
[  178.295608]  ? __fget_files+0x1f0/0x300
[  178.299912]  ksys_ioctl+0xc0/0x110
[  178.303719]  __x64_sys_ioctl+0x6f/0xb0
[  178.307913]  do_syscall_64+0x51/0xb0
[  178.311913]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  178.317557] RIP: 0033:0x7f6b9700d88b
[  178.321551] Code: 0f 1e fa 48 8b 05 fd 95 2c 00 64 c7 00 26 00 00 00 48 c7 c0
ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 018
[  178.342513] RSP: 002b:7f6b8cbe9668 EFLAGS: 0246 ORIG_RAX:
0010
[  178.350967] RAX: ffda RBX: 55e8162d9000 RCX: 7f6b9700d88b
[  178.358935] RDX:  RSI: ae41 RDI: 000e
[  178.366903] RBP: 55e8162d9000 R08: 55e8155ec4d0 R09: 55e8162d9000
[  178.374871] R10: 55e815d94ee0 R11: 0246 R12: 55e8162ad420
[  178.382838] R13: 55e8162d9000 R14: 7ffedf043660 R15: 7f6b8cbe9800
[  182.771858]
[  182.773606] =
[  182.778084] WARNING: suspicious RCU usage
[  182.782564] 5.8.0-rc7-upstream-+ #10 Not tainted
[  182.787719] -
[  182.792197] arch/x86/include/asm/trace/fpu.h:60 suspicious
rcu_dereference_check() usage!
[  182.801329]
[  182.801329] other info that might help us debug this:
[  182.801329]
[  182.810268]
[  182.810268] RCU used illegally from idle CPU!
[  182.810268] rcu_scheduler_active = 2, debug_locks = 1
[  182.822407] RCU used illegally from extended quiescent state!
[  182.828824] 1 lock held by CPU 0/KVM/2959:
[  182.833397]  #0: 88903f8500d0 (>mutex){+.+.}-{3:3}, at:
kvm_vcpu_ioctl+0x172/0xb00 [kvm]
[  182.838308]
[  182.838308] stack backtrace:
[  182.838313] CPU: 18 PID: 2959 Comm: CPU 0/KVM Tainted: G    W
5.8.0-rc7-upstream-+ #10
[  182.838316] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.6.0 
10/31/2017
[  182.838318] Call Trace:
[  182.874318]  dump_stack+0x9d/0xe0
[  182.878024]  switch_fpu_return+0x37c/0x410
[  182.882602]  ? fpu__clear+0x1a0/0x1a0
[  182.886700]  ? rcu_dynticks_eqs_enter+0x15/0x30
[  182.891807]  vcpu_enter_guest+0x1854/0x3df0 [kvm]
[  182.897121]  ? kvm_vcpu_reload_apic_access_page+0x60/0x60 [kvm]
[  182.903738]  ? lock_acquire+0x1ac/0xac0
[  182.908062]  ? kvm_arch_vcpu_ioctl_run+0x1dc/0x13c0 [kvm]
[  182.914107]  ? rcu_read_unlock+0x50/0x50
[  182.918489]  ? rcu_read_lock_sched_held+0xaf/0xe0
[  182.923788]  ? kvm_load_guest_fpu+0x94/0x350 [kvm]
[  182.929177]  ? kvm_load_guest_fpu+0x94/0x350 [kvm]
[  182.934528]  ? __local_bh_enable_ip+0x123/0x1a0
[  182.939635]  kvm_arch_vcpu_ioctl_run+0x310/0x13c0 [kvm]
[  182.945529]