Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alex Williamson
On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
 
  Right, we don't want to create dependencies across modules.  I don't
  have a vision for how this should work.  This is effectively a complete
  side-band to vfio, so we're really just dealing in the iommu group
  space.  Maybe there needs to be some kind of registration of ownership
  for the group using some kind of token.  It would need to include some
  kind of notification when that ownership ends.  That might also be a
  convenient tag to toggle driver probing off for devices in the group.
  Other ideas?  Thanks,
 
 All of that smells nasty, like it will need a pile of bloody
 infrastructure, which makes me think it's too complicated and not the
 right approach.
 
 How does access control work today on x86/VFIO ? Can you give me a bit
 more detail ? I didn't get a good grasp of it from your previous email.

The current model is not x86 specific, but it only covers doing iommu
and device access through vfio.  The kink here is that we're trying to
do device access and setup through vfio, but iommu manipulation through
kvm.  We may want to revisit whether we can do the in-kernel iommu
manipulation through vfio rather than kvm.

For vfio in general, the group is the unit of ownership.  A user is
granted access to /dev/vfio/$GROUP through file permissions.  The user
opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
on the group.  If supported by the platform, multiple groups can be set
to the same container, which allows for iommu domain sharing.  Once a
group is associated with a container, an iommu backend can be
initialized for the container.  Only then can a device be accessed
through the group.
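The sequence just described maps onto a short userspace sketch. This is illustrative only: the group number "26" and device address "0000:06:0d.0" are invented placeholders, error handling is omitted, and it obviously needs a real vfio-capable machine (and VFIO_SPAPR_TCE_IOMMU rather than type1 on POWER):

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	/* Access control happens here: file permissions on the group node. */
	int group = open("/dev/vfio/26", O_RDWR);
	int container = open("/dev/vfio/vfio", O_RDWR);

	/* Associate the group with the container... */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

	/* ...then an iommu backend can be initialized for the container... */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

	/* ...and only then can a device inside the group be accessed. */
	int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");

	return device < 0;
}
```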

So even if we were to pass a vfio group file descriptor into kvm and it
matched as some kind of ownership token on the iommu group, it's not
clear that's sufficient to assume we can start programming the iommu.
Thanks,

Alex

 From the look of it, the VFIO file descriptor is what has the access
 control to the underlying iommu, is this right ? So we somehow need to
 transfer (or copy) that ownership from the VFIO fd to the KVM VM.
 
 I don't see a way to do that without some cross-layering here...
 
 Rusty, are you aware of some kernel mechanism we can use for that ?
 
 Cheers,
 Ben.
 
 



--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote:
 On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
  
   Right, we don't want to create dependencies across modules.  I don't
   have a vision for how this should work.  This is effectively a complete
   side-band to vfio, so we're really just dealing in the iommu group
   space.  Maybe there needs to be some kind of registration of ownership
   for the group using some kind of token.  It would need to include some
   kind of notification when that ownership ends.  That might also be a
   convenient tag to toggle driver probing off for devices in the group.
   Other ideas?  Thanks,
  
  All of that smells nasty, like it will need a pile of bloody
  infrastructure, which makes me think it's too complicated and not the
  right approach.
  
  How does access control work today on x86/VFIO ? Can you give me a bit
  more detail ? I didn't get a good grasp of it from your previous email.
 
 The current model is not x86 specific, but it only covers doing iommu
 and device access through vfio.  The kink here is that we're trying to
 do device access and setup through vfio, but iommu manipulation through
 kvm.  We may want to revisit whether we can do the in-kernel iommu
 manipulation through vfio rather than kvm.

How would that be possible ?

The hypercalls from the guest arrive in KVM... in a very, very specific
restricted environment which we call real mode (MMU off but still
running in guest context), where we try to do as much as possible, or in
virtual mode, where they get handled as normal KVM exits.

The only way we could handle them in VFIO would be if somehow VFIO
registered callbacks with KVM... if we have that sort of
cross-dependency, then we may as well have a simpler one where VFIO
tells KVM what iommu is available for the VM.

 For vfio in general, the group is the unit of ownership.  A user is
 granted access to /dev/vfio/$GROUP through file permissions.  The user
 opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
 on the group.  If supported by the platform, multiple groups can be set
 to the same container, which allows for iommu domain sharing.  Once a
 group is associated with a container, an iommu backend can be
 initialized for the container.  Only then can a device be accessed
 through the group.
 
 So even if we were to pass a vfio group file descriptor into kvm and it
 matched as some kind of ownership token on the iommu group, it's not
 clear that's sufficient to assume we can start programming the iommu.
 Thanks,

It seems to me that your scheme would have the same problem if you
wanted to do a virtualized iommu.

In any case, this is a big deal. We have a requirement for pass-through.
It cannot work at any remotely usable performance level if we don't
implement the calls in KVM, so it needs to be sorted one way or another,
and I'm at a loss as to how here...

Ben.

 Alex
 
  From the look of it, the VFIO file descriptor is what has the access
  control to the underlying iommu, is this right ? So we somehow need to
  transfer (or copy) that ownership from the VFIO fd to the KVM VM.
  
  I don't see a way to do that without some cross-layering here...
  
  Rusty, are you aware of some kernel mechanism we can use for that ?
  
  Cheers,
  Ben.
  
  
 
 
 


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alexey Kardashevskiy
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote:
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
 -			unsigned long *pte_sizep)
 +			unsigned long *pte_sizep, bool do_get_page)
  {
  pte_t *ptep;
  unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
  	if (!pte_present(*ptep))
  		return __pte(0);
  
 +	/*
 +	 * Put huge pages handling to the virtual mode.
 +	 * The only exception is for TCE list pages which we
 +	 * do need to call get_page() for.
 +	 */
 +	if ((*pte_sizep > PAGE_SIZE) && do_get_page)
 +		return __pte(0);
 +
  	/* wait until _PAGE_BUSY is clear then set it atomically */
  	__asm__ __volatile__ (
  		"1:	ldarx	%0,0,%3\n"
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
  		: "cc");
  
  	ret = pte;
 +	if (do_get_page && pte_present(pte) && (!writing || pte_write(pte))) {
 +		struct page *pg = NULL;
 +		pg = realmode_pfn_to_page(pte_pfn(pte));
 +		if (realmode_get_page(pg)) {
 +			ret = __pte(0);
 +		} else {
 +			pte = pte_mkyoung(pte);
 +			if (writing)
 +				pte = pte_mkdirty(pte);
 +		}
 +	}
 +	*ptep = pte;	/* clears _PAGE_BUSY */
  
  return ret;
  }
 
 So now you are adding the clearing of _PAGE_BUSY that was missing from
 your first patch, except that this is not enough, since it means that
 in the emulated case (ie, !do_get_page) you will in essence return,
 and then use, a PTE that is not locked, without any synchronization to
 ensure that the underlying page doesn't go away... then you'll
 dereference that page.
 
 So either make everything use speculative get_page, or make the emulated
 case use the MMU notifier to drop the operation in case of collision.
 
 The former looks easier.
 
 Also, any specific reason why you do:
 
   - Lock the PTE
   - get_page()
   - Unlock the PTE
 
 Instead of
 
   - Read the PTE
   - get_page_unless_zero
   - re-check PTE
 
 Like get_user_pages_fast() does ?
 
 The former will be two atomic ops, the latter only one (faster), but
 maybe you have a good reason why that can't work...



If we want to set the dirty and young bits for the pte, then I do not
know how to avoid _PAGE_BUSY.



-- 
Alexey


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Rusty Russell
Alex Williamson <alex.william...@redhat.com> writes:
 On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to set up an IOMMU domain and do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel ? IE. I'm in KVM code, gets that
 vfio fd, what do I do with it ?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

It's actually not that bad.

eg. 

struct vfio_container *vfio_container_from_file(struct file *filp)
{
	if (filp->f_op != &vfio_device_fops)
		return ERR_PTR(-EINVAL);

	/* OK it really is a vfio fd, return the data. */

}
EXPORT_SYMBOL_GPL(vfio_container_from_file);

...

inside KVM_CREATE_SPAPR_TCE_IOMMU:

	struct file *vfio_filp;
	struct vfio_container *(*lookup)(struct file *filp);

	vfio_filp = fget(create_tce_iommu.fd);
	if (!vfio_filp)
		ret = -EBADF;
	lookup = symbol_get(vfio_container_from_file);
	if (!lookup)
		ret = -EINVAL;
	else {
		container = lookup(vfio_filp);
		if (IS_ERR(container))
			ret = PTR_ERR(container);
		else
			...
		symbol_put(vfio_container_from_file);
	}

symbol_get() won't try to load a module; it'll just fail.  This is what
you want, since they must have vfio in the kernel to get a valid fd...

Hope that helps,
Rusty.



Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
 symbol_get() won't try to load a module; it'll just fail.  This is what
 you want, since they must have vfio in the kernel to get a valid fd...

Ok, cool. I suppose what we want here, Alexey, is something slightly
higher level, like:

vfio_validate_iommu_id(file, iommu_id)

Which verifies that the file that was passed in is allowed to use
that iommu_id.

That's a simple and flexible interface (ie, it will work even if we
support multiple iommu IDs in the future for a vfio, for example
for DDW windows etc...). The logic to know about the ID remains
in qemu; this is strictly a validation call.

That way we also don't have to expose the containing vfio struct etc...
just that simple function.

Alex, any objection ?

Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

To ensure that the resource remains owned by the process until KVM
is closed as well ?
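For concreteness, a hypothetical sketch of the two shapes proposed above; none of these symbols exist in VFIO, and the signatures are guesses at the contract, not an implementation:

```c
/* Hypothetical interface sketch -- nothing below exists in VFIO. */

/* One-shot validation: is this vfio fd allowed to use iommu_id? */
int vfio_validate_iommu_id(struct file *file, int iommu_id);

/* Get/put variant: ownership stays pinned until KVM releases it. */
int vfio_validate_and_use_iommu(struct file *file, int iommu_id);
void vfio_release_iommu(struct file *file, int iommu_id);

/* KVM side, get/put flavour: */
	ret = vfio_validate_and_use_iommu(vfio_filp, args.iommu_id);
	if (ret)
		return ret;
	/* ... real-mode H_PUT_TCE handling may now touch this iommu ... */

	/* and on VM destruction: */
	vfio_release_iommu(vfio_filp, args.iommu_id);
```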

Or do we want to register with VFIO with a callback so that VFIO can
call us if it needs us to give it up ?

Cheers,
Ben.

