Re: Possible nohz-full/RCU issue in arm64 KVM

Nicolas Saenz Julienne Sun, 19 Dec 2021 04:12:19 -0800

On Fri, 2021-12-17 at 13:21 +0000, Mark Rutland wrote:
> On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > Hi All,
> 
> Hi,
> 
> > arm64's guest entry code does the following:
> > 
> > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > {
> >     [...]
> > 
> >     guest_enter_irqoff();
> > 
> >     ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > 
> >     [...]
> > 
> >     local_irq_enable();
> > 
> >     /*
> >      * We do local_irq_enable() before calling guest_exit() so
> >      * that if a timer interrupt hits while running the guest we
> >      * account that tick as being spent in the guest.  We enable
> >      * preemption after calling guest_exit() so that if we get
> >      * preempted we make sure ticks after that is not counted as
> >      * guest time.
> >      */
> >     guest_exit();
> >     [...]
> > }
> > 
> > 
> > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > state (EQS). Any interrupt happening between local_irq_enable() and
> > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > do the right thing if triggered in this context, but el1's won't. Is it
> > possible to hit an el1 handler (for example __el1_irq()) there?
> 
> I think you're right that the EL1 handlers can trigger here and won't exit the
> EQS.
> 
> I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> familiar enough with the x86 exception model to know if I'm looking in the
> right place.


Well, x86 has its own private KVM guest context exit function,
'kvm_guest_exit_irqoff()', which allows it to do the right thing (simplifying
things):

        local_irq_disable();
        kvm_guest_enter_irqoff() // Inform CT, enter EQS
        __vmx_kvm_run()
        kvm_guest_exit_irqoff() // Inform CT, exit EQS, task still marked with PF_VCPU

        /*
         * Consume any pending interrupts, including the possible source of
         * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
         * An instruction is required after local_irq_enable() to fully unblock
         * interrupts on processors that implement an interrupt shadow, the
         * stat.exits increment will do nicely.
         */
        local_irq_enable();
        ++vcpu->stat.exits;
        local_irq_disable();

        /*
         * Wait until after servicing IRQs to account guest time so that any
         * ticks that occurred while running the guest are properly accounted
         * to the guest.  Waiting until IRQs are enabled degrades the accuracy
         * of accounting via context tracking, but the loss of accuracy is
         * acceptable for all known use cases.
         */
        vtime_account_guest_exit(); // current->flags &= ~PF_VCPU

So I guess we should convert to x86's scheme, and maybe create another generic
guest_{enter,exit}() flavor for virtualization schemes that run with interrupts
disabled.
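
To make the difference between the two orderings concrete, here is a toy
user-space model. It is not kernel code: 'struct cpu_model' and the model_*()
helpers are made-up names for illustration, and it assumes a single tick that
becomes pending while the guest runs and an IRQ handler that does not exit the
EQS on its own (the el1 case discussed above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU. Illustrative only, none of this is a kernel API. */
struct cpu_model {
	bool irqs_enabled;
	bool irq_pending;
	bool rcu_watching;	/* false while inside an extended quiescent state */
	bool pf_vcpu;		/* task still flagged as running guest code */
	int irqs_in_eqs;	/* IRQs handled while RCU was not watching */
	int ticks_to_guest;	/* ticks accounted as guest time */
};

/* Deliver the pending IRQ, if any, and record the state it observed. */
static void model_deliver_irq(struct cpu_model *c)
{
	assert(c != NULL);
	if (!c->irq_pending || !c->irqs_enabled)
		return;
	c->irq_pending = false;
	if (!c->rcu_watching)
		c->irqs_in_eqs++;
	if (c->pf_vcpu)
		c->ticks_to_guest++;
}

/* Ordering quoted from arm64: IRQs are enabled before guest_exit(). */
static struct cpu_model model_arm64_order(void)
{
	struct cpu_model c = { .rcu_watching = true };

	c.rcu_watching = false;		/* guest_enter_irqoff(): enter EQS */
	c.pf_vcpu = true;
	c.irq_pending = true;		/* tick fires while the guest runs */
	c.irqs_enabled = true;		/* local_irq_enable() */
	model_deliver_irq(&c);
	c.rcu_watching = true;		/* guest_exit(): exit EQS */
	c.pf_vcpu = false;
	return c;
}

/* Ordering described for x86: exit EQS first, clear PF_VCPU last. */
static struct cpu_model model_x86_order(void)
{
	struct cpu_model c = { .rcu_watching = true };

	c.rcu_watching = false;		/* kvm_guest_enter_irqoff() */
	c.pf_vcpu = true;
	c.irq_pending = true;		/* tick fires while the guest runs */
	c.rcu_watching = true;		/* kvm_guest_exit_irqoff(): exit EQS */
	c.irqs_enabled = true;		/* local_irq_enable() */
	model_deliver_irq(&c);
	c.irqs_enabled = false;		/* local_irq_disable() */
	c.pf_vcpu = false;		/* vtime_account_guest_exit() */
	return c;
}
```

In this model the arm64 ordering handles the tick while RCU is not watching,
whereas the x86 ordering handles it with RCU watching and still accounts the
tick to the guest, since PF_VCPU is only cleared afterwards.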

> Note that the EL0 handlers *cannot* trigger for an exception taken from a
> guest. We use separate vectors while running a guest (for both VHE and nVHE
> modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
> can only take IRQ from EL1 *after* that returns.
> 
> We *might* need to audit the KVM vector handlers to make sure they're not
> dependent on RCU protection (I assume they're not, but it's possible something
> has leaked into the VHE code).

IIUC, any driver interrupt might trigger in the window between
local_irq_enable() and guest_exit(), couldn't it?

Regards,

-- 
Nicolás Sáenz
