On Fri, Sep 29, 2017 at 04:43:39PM +0000, Paul E. McKenney wrote:
> On Fri, Sep 29, 2017 at 04:53:57PM +0200, Paolo Bonzini wrote:
> > On 29/09/2017 13:01, Boqun Feng wrote:
> > > Sasha Levin reported a WARNING:
> > >
> > > | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
> > > | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
> > > | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
> > > | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
> > > ...
> > > | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
> > > | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > | 1.10.1-1ubuntu1 04/01/2014
> > > | Call Trace:
> > > ...
> > > | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329
> > > [inline]
> > > | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
> > > | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
> > > | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
> > > | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
> > > | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
> > > | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
> > > | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
> > > | __schedule+0x201/0x2240 kernel/sched/core.c:3292
> > > | schedule+0x113/0x460 kernel/sched/core.c:3421
> > > | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
> > > | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
> > > | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> > > | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
> > > | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
> > > | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
> > > | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
> > > | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
> > > | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
> > > | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
> > > | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
> > > | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
> > > | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
> > > | get_link fs/namei.c:1047 [inline]
> > > | link_path_walk+0x1041/0x1490 fs/namei.c:2127
> > > ...
> > >
> > > And this happened when we hit a page fault in an RCU read-side critical
> > > section and then we tried to reschedule in kvm_async_pf_task_wait(),
> > > this reschedule would hit the WARN in rcu_preempt_note_context_switch(),
> > > and be treated as a sleep in RCU read-side critical section, which is
> > > not allowed(even in preemptible RCU).
> >
> > Just a small fix to the commit message:
> >
> > This happened when the host hit a page fault, and delivered it as in an
> > async page fault, while the guest was in an RCU read-side critical
> > section. The guest then tries to reschedule in kvm_async_pf_task_wait(),
> > but rcu_preempt_note_context_switch() would treat the reschedule as a
> > sleep in RCU read-side critical section, which is not allowed (even in
> > preemptible RCU). Thus the WARN.
> >
> > Queued with that change, thanks.
>
> Not to be repetitive, but if the schedule() is on the guest, this change
> really does silently break up an RCU read-side critical section on
> guests built with PREEMPT=n. (Yes, they were already being broken,
> but it would be good to avoid this breakage in PREEMPT=n as well as
> in PREEMPT=y.)
>

Then we could probably add !IS_ENABLED(CONFIG_PREEMPT) as one of the
reasons to choose the halt path? Like:

	n.halted = is_idle_task(current) || preempt_count() > 1 ||
		   !IS_ENABLED(CONFIG_PREEMPT) || rcu_preempt_depth();

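But I think an async PF could also happen while a user program is
running? The kernel is not inside an RCU read-side critical section in
that case, so halting there would be unnecessary. Then maybe add a second
parameter @user to kvm_async_pf_task_wait(), like: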

	kvm_async_pf_task_wait((u32)read_cr2(), user_mode(regs));

and the halt condition becomes:

	n.halted = is_idle_task(current) || preempt_count() > 1 ||
		   (!IS_ENABLED(CONFIG_PREEMPT) && !user) ||
		   rcu_preempt_depth();

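To make the intended truth table easier to check, here is a userspace
sketch of that condition. It is not kernel code: struct pf_ctx and
should_halt() are made-up names for illustration, and the fields just
stand in for is_idle_task(current), preempt_count(),
rcu_preempt_depth(), IS_ENABLED(CONFIG_PREEMPT) and user_mode(regs).

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the state the proposed condition reads.
 * None of these names exist in the kernel; they only model the inputs.
 */
struct pf_ctx {
	bool idle_task;       /* models is_idle_task(current) */
	int  preempt_count;   /* models preempt_count() */
	int  rcu_depth;       /* models rcu_preempt_depth() */
	bool config_preempt;  /* models IS_ENABLED(CONFIG_PREEMPT) */
	bool user;            /* models user_mode(regs) at fault time */
};

/*
 * Hypothetical helper mirroring the proposed n.halted expression:
 * returns true for the halt path, false for the schedule() path.
 */
static bool should_halt(const struct pf_ctx *c)
{
	return c->idle_task || c->preempt_count > 1 ||
	       (!c->config_preempt && !c->user) ||
	       c->rcu_depth > 0;
}
```

With this, a PREEMPT=n guest that faults in kernel mode always halts,
while the same guest faulting in user mode may still schedule; a
PREEMPT=y guest only halts in the cases the current patch already
covers.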
Thoughts?

A side note: if this is already broken for PREEMPT=n, that means we may
be failing to detect it in rcutorture? Then should we add a config with
KVM_GUEST=y and run something memory-consuming (e.g. stress --vm)
alongside the rcutorture kvm script? Paolo, do you have any test
workload that could trigger async PFs quickly?

Regards,
Boqun
> Thanx, Paul
>
> > Paolo
> >
> > > To cure this, make kvm_async_pf_task_wait() go to the halt path if the
> > > PF happens in a RCU read-side critical section.
> > >
> > > Reported-by: Sasha Levin <[email protected]>
> > > Cc: "Paul E. McKenney" <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Signed-off-by: Boqun Feng <[email protected]>
> > > ---
> > > arch/x86/kernel/kvm.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index aa60a08b65b1..e675704fa6f7 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -140,7 +140,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >
> > > n.token = token;
> > > n.cpu = smp_processor_id();
> > > - n.halted = is_idle_task(current) || preempt_count() > 1;
> > > + n.halted = is_idle_task(current) || preempt_count() > 1 ||
> > > + rcu_preempt_depth();
> > > init_swait_queue_head(&n.wq);
> > > hlist_add_head(&n.link, &b->list);
> > > raw_spin_unlock(&b->lock);
> > >
> >
>

