On Thu, Apr 25, 2013 at 11:13:40PM +0200, Alexander Graf wrote:
> 
> On 25.04.2013, at 21:03, Scott Wood wrote:
> 
> > On 04/25/2013 09:49:23 AM, Alexander Graf wrote:
> >> On 25.04.2013, at 13:30, Alexander Graf wrote:
> >> >
> >> > On 19.04.2013, at 20:51, Scott Wood wrote:
> >> >
> >> >> On 04/19/2013 09:06:27 AM, Alexander Graf wrote:
> >> >>> Now that all pieces are in place for reusing generic irq infrastructure,
> >> >>> we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
> >> >>> reuse it for PPC, as it will work there just as well.
> >> >>> Signed-off-by: Alexander Graf <ag...@suse.de>
> >> >>> ---
> >> >>> arch/powerpc/include/uapi/asm/kvm.h |    1 +
> >> >>> arch/powerpc/kvm/powerpc.c          |   13 +++++++++++++
> >> >>> 2 files changed, 14 insertions(+), 0 deletions(-)
> >> >>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> index 3537bf3..dbb2ac2 100644
> >> >>> --- a/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> @@ -26,6 +26,7 @@
> >> >>> #define __KVM_HAVE_SPAPR_TCE
> >> >>> #define __KVM_HAVE_PPC_SMT
> >> >>> #define __KVM_HAVE_IRQCHIP
> >> >>> +#define __KVM_HAVE_IRQ_LINE
> >> >>> struct kvm_regs {
> >> >>>        __u64 pc;
> >> >>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> >> >>> index c431fea..874c106 100644
> >> >>> --- a/arch/powerpc/kvm/powerpc.c
> >> >>> +++ b/arch/powerpc/kvm/powerpc.c
> >> >>> @@ -33,6 +33,7 @@
> >> >>> #include <asm/cputhreads.h>
> >> >>> #include <asm/irqflags.h>
> >> >>> #include "timing.h"
> >> >>> +#include "irq.h"
> >> >>> #include "../mm/mmu_decl.h"
> >> >>> #define CREATE_TRACE_POINTS
> >> >>> @@ -945,6 +946,18 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
> >> >>>        return 0;
> >> >>> }
> >> >>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
> >> >>> +                        bool line_status)
> >> >>> +{
> >> >>> +      if (!irqchip_in_kernel(kvm))
> >> >>> +              return -ENXIO;
> >> >>> +
> >> >>> +      irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> >> >>> +                                      irq_event->irq, irq_event->level,
> >> >>> +                                      line_status);
> >> >>> +      return 0;
> >> >>> +}
> >> >>
> >> >> As Paul noted in the XICS patchset, this could reference an MPIC that 
> >> >> has gone away if the user never attached any vcpus and then closed the 
> >> >> MPIC fd.  It's not a reasonable use case, but it could be used 
> >> >> maliciously to get the kernel to access a bad pointer.  The 
> >> >> irqchip_in_kernel check helps somewhat, but it's meant for ensuring 
> >> >> that the creation has happened -- it's racy if used for ensuring that 
> >> >> destruction hasn't happened.
> >> >>
> >> >> The problem is rooted in the awkwardness of performing an operation 
> >> >> that logically should be on the MPIC fd, but is instead being done on 
> >> >> the vm fd.
> >> >>
> >> >> I think these three steps would fix it (the first two seem like things 
> >> >> we should be doing anyway):
> >> >> - During MPIC destruction, make sure MPIC deregisters all routes that 
> >> >> reference it.
> >> >> - In kvm_set_irq(), do not release the RCU read lock until after the 
> >> >> set() function has been called.
> >> >> - Do not hook up kvm_send_userspace_msi() to MPIC or other new 
> >> >> irqchips, as that bypasses the RCU lock.  It could be supported as a 
> >> >> device fd ioctl if desired, or it could be reworked to operate on an 
> >> >> RCU-managed list of MSI handlers, though MPIC really doesn't need this 
> >> >> at all.
> >> >
> >> > Can't we just add an RCU lock in the send_userspace_msi case? I don't 
> >> > think we should handle MSIs any differently from normal IRQs.
> > 
> > Well, you can't *just* add the RCU lock -- you need to add data to be 
> > managed via RCU (e.g. a list of MSI callbacks, or at least a boolean 
> > indicating whether calling the MSI code is OK).
> 
> Well, we'd just access a random pin routing :).
> 
> > 
> >> In fact I'm having a hard time verifying that we're always accessing 
> >> things with proper locks held. I'm pretty sure we're missing a few cases.
> > 
> > Any path in particular?
> 
> I'm already getting confused on whether normal MMIO accesses are always safe.

Asserts via mutex_is_locked() and the spinlock/RCU variants might be helpful.

> >> So how about we delay mpic destruction to vm destruction? We simply add 
> >> one user too many when we spawn the mpic and put it on vm_destruct. That 
> >> way users _can_ destroy mpics, but they will only be really free'd once 
> >> the vm is also gone.
> > 
> > That's what we originally had before the fd conversion.  If we want it 
> > again, we'll need to go back to maintaining a list of devices in KVM 
> > (though it could be a linked list now that we don't need to use it for 
> > lookups), or have some hardcoded MPIC hack.
> 
> Well, we could have an anonymous linked list of device pointers with a simple 
> registration function. That way it's generic enough for any device to be kept 
> alive until vm destruction if it wants that.
> 
> > IIRC I said back then that converting to fd would make destruction ordering 
> > more of a pain...
> 
> I usually like to pick the raisins from everything I can. So while I like the 
> fd approach for its universally understandable scheme, simplicity of use, 
> extensibility of ioctls etc, I don't really like the headaches that come with 
> destroying a device while a VM is running. So having a device keep itself 
> alive until the VM is gone is the best of all worlds :).

The other problem that arises the moment you allow "get/set device
attribute at any time during VM lifetime" (which this interface allows)
is that synchronization with the vcpus must be performed (and you don't
want to take a lock on the vcpu path). So the programmer has to avoid
doing that for now. But it's no big deal.
