Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-30 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 09:55:59AM -0600, Alex Williamson wrote:
 On Sun, 2013-09-29 at 17:44 +0300, Gleb Natapov wrote:
  On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
   On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.
 
 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
Looks good overall to me, one things though: to use legacy device
assignment one needs root permission, so only root user can enable
WBINVD emulation.
   
   That's not entirely accurate, legacy device assignment can be used by a
   non-root user, libvirt does this all the time.  The part that requires
   root access is opening the pci-sysfs config file, the rest can be
   managed via file permissions on the remaining sysfs files.
   
  So how libvirt manages to do that as non-root user if pci-sysfs config
  file needs root permission. I didn't mean to say that legacy code
  checks for root explicitly, what I meant is that at some point root
  permission is needed.
 
 Yes, libvirt needs admin permission for legacy to bind to pci-stub,
 change permission on sysfs files and pass an opened pci config sysfs
 file descriptor.  For vfio libvirt needs admin permission to bind to
 vfio-pci and change group file permission.  From that perspective the
 admin requirement is similar.
 
Yes, certainly appears so. Thanks.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.
 
 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
Looks good overall to me, one things though: to use legacy device
assignment one needs root permission, so only root user can enable
WBINVD emulation. Who does this permission checking here? Is only root
allowed to create non coherent group with vfio?

 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
  arch/x86/include/asm/kvm_host.h|1 
  arch/x86/kvm/Makefile  |2 
  arch/x86/kvm/vmx.c |5 -
  arch/x86/kvm/x86.c |5 -
  include/linux/kvm_host.h   |1 
  include/uapi/linux/kvm.h   |4 
  virt/kvm/kvm_main.c|3 
  virt/kvm/vfio.c|  237 
 
  9 files changed, 275 insertions(+), 5 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
  create mode 100644 virt/kvm/vfio.c
 
 diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
 b/Documentation/virtual/kvm/devices/vfio.txt
 new file mode 100644
 index 000..831e6a6
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -0,0 +1,22 @@
 +VFIO virtual device
 +===
 +
 +Device types supported:
 +  KVM_DEV_TYPE_VFIO
 +
 +Only one VFIO instance may be created per VM.  The created device
 +tracks VFIO groups in use by the VM and features of those groups
 +important to the correctness and acceleration of the VM.  As groups
 +are enabled and disabled for use by the VM, KVM should be updated
 +about their presence.  When registered with KVM, a reference to the
 +VFIO-group is held by KVM.
 +
 +Groups:
 +  KVM_DEV_VFIO_ADD_GROUP
 +  KVM_DEV_VFIO_DEL_GROUP
 +
 +Each takes a int32_t file descriptor for kvm_device_attr.addr and
 +does not support any group device kvm_device_attr.attr.
 +
 +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
 +  KVM_DEV_VFIO_GROUP_ADD  DEL?
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c76ff74..5b9350d 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -588,6 +588,7 @@ struct kvm_arch {
  
   spinlock_t pvclock_gtod_sync_lock;
   bool use_master_clock;
 + bool vfio_noncoherent;
   u64 master_kernel_ns;
   cycle_t master_cycle_now;
  
 diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
 index bf4fb04..25d22b2 100644
 --- a/arch/x86/kvm/Makefile
 +++ b/arch/x86/kvm/Makefile
 @@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
  
  kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
 - $(KVM)/eventfd.o $(KVM)/irqchip.o
 + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += $(KVM)/assigned-dev.o $(KVM)/iommu.o
  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
  
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 1f1da43..94f7786 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
 gfn, bool is_mmio)
*/
   if (is_mmio)
   ret = MTRR_TYPE_UNCACHABLE  VMX_EPT_MT_EPTE_SHIFT;
 - else if 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
 On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
  So far we've succeeded at making KVM and VFIO mostly unaware of each
  other, but there's any important point where that breaks down.  Intel
  VT-d hardware may or may not support snoop control.  When snoop
  control is available, intel-iommu promotes No-Snoop transactions on
  PCIe to be cache coherent.  That allows KVM to handle things like the
  x86 WBINVD opcode as a nop.  When the hardware does not support this,
  KVM must implement a hardware visible WBINVD for the guest.
  
  We could simply let userspace tell KVM how to handle WBINVD, but it's
  privileged for a reason.  Allowing an arbitrary user to enable
  physical WBINVD gives them a more access to the hardware.  Previously,
  this has only been enabled for guests supporting legacy PCI device
  assignment.  In such cases it's necessary for proper guest execution.
  We therefore create a new KVM-VFIO virtual device.  The user can add
  and remove VFIO groups to this device via file descriptors.  KVM
  makes use of the VFIO external user interface to validate that the
  user has access to physical hardware and gets the coherency state of
  the IOMMU from VFIO.  This provides equivalent functionality to
  legacy KVM assignment, while keeping (nearly) all the bits isolated.
  
 Looks good overall to me, one things though: to use legacy device
 assignment one needs root permission, so only root user can enable
 WBINVD emulation.

That's not entirely accurate, legacy device assignment can be used by a
non-root user, libvirt does this all the time.  The part that requires
root access is opening the pci-sysfs config file, the rest can be
managed via file permissions on the remaining sysfs files.

  Who does this permission checking here? Is only root
 allowed to create non coherent group with vfio?

With vfio the user is granted permission by giving them access to the
vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
group to vfio.  That enables the user to create a container (~iommu
domain) with the group attached to it.  Only then will the vfio external
user interface provide a reference to the group and enable this wbinvd
support.  So, wbinvd emulation should only be available to a user that
own a vfio group and has it configured for use with this interface.
Thanks,

Alex

  The one intrusion is the resulting flag indicating the coherency
  state.  For this RFC it's placed on the x86 kvm_arch struct, however
  I know POWER has interest in using the VFIO external user interface,
  and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
  care about No-Snoop handling as well or the code can be #ifdef'd.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
   Documentation/virtual/kvm/devices/vfio.txt |   22 +++
   arch/x86/include/asm/kvm_host.h|1 
   arch/x86/kvm/Makefile  |2 
   arch/x86/kvm/vmx.c |5 -
   arch/x86/kvm/x86.c |5 -
   include/linux/kvm_host.h   |1 
   include/uapi/linux/kvm.h   |4 
   virt/kvm/kvm_main.c|3 
   virt/kvm/vfio.c|  237 
  
   9 files changed, 275 insertions(+), 5 deletions(-)
   create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
   create mode 100644 virt/kvm/vfio.c
  
  diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
  b/Documentation/virtual/kvm/devices/vfio.txt
  new file mode 100644
  index 000..831e6a6
  --- /dev/null
  +++ b/Documentation/virtual/kvm/devices/vfio.txt
  @@ -0,0 +1,22 @@
  +VFIO virtual device
  +===
  +
  +Device types supported:
  +  KVM_DEV_TYPE_VFIO
  +
  +Only one VFIO instance may be created per VM.  The created device
  +tracks VFIO groups in use by the VM and features of those groups
  +important to the correctness and acceleration of the VM.  As groups
  +are enabled and disabled for use by the VM, KVM should be updated
  +about their presence.  When registered with KVM, a reference to the
  +VFIO-group is held by KVM.
  +
  +Groups:
  +  KVM_DEV_VFIO_ADD_GROUP
  +  KVM_DEV_VFIO_DEL_GROUP
  +
  +Each takes a int32_t file descriptor for kvm_device_attr.addr and
  +does not support any group device kvm_device_attr.attr.
  +
  +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
  +  KVM_DEV_VFIO_GROUP_ADD  DEL?
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index c76ff74..5b9350d 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -588,6 +588,7 @@ struct kvm_arch {
   
  spinlock_t pvclock_gtod_sync_lock;
  bool use_master_clock;
  +   bool vfio_noncoherent;
  u64 master_kernel_ns;
  cycle_t master_cycle_now;
   
  diff --git a/arch/x86/kvm/Makefile 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
 On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
  On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
   So far we've succeeded at making KVM and VFIO mostly unaware of each
   other, but there's any important point where that breaks down.  Intel
   VT-d hardware may or may not support snoop control.  When snoop
   control is available, intel-iommu promotes No-Snoop transactions on
   PCIe to be cache coherent.  That allows KVM to handle things like the
   x86 WBINVD opcode as a nop.  When the hardware does not support this,
   KVM must implement a hardware visible WBINVD for the guest.
   
   We could simply let userspace tell KVM how to handle WBINVD, but it's
   privileged for a reason.  Allowing an arbitrary user to enable
   physical WBINVD gives them a more access to the hardware.  Previously,
   this has only been enabled for guests supporting legacy PCI device
   assignment.  In such cases it's necessary for proper guest execution.
   We therefore create a new KVM-VFIO virtual device.  The user can add
   and remove VFIO groups to this device via file descriptors.  KVM
   makes use of the VFIO external user interface to validate that the
   user has access to physical hardware and gets the coherency state of
   the IOMMU from VFIO.  This provides equivalent functionality to
   legacy KVM assignment, while keeping (nearly) all the bits isolated.
   
  Looks good overall to me, one things though: to use legacy device
  assignment one needs root permission, so only root user can enable
  WBINVD emulation.
 
 That's not entirely accurate, legacy device assignment can be used by a
 non-root user, libvirt does this all the time.  The part that requires
 root access is opening the pci-sysfs config file, the rest can be
 managed via file permissions on the remaining sysfs files.
 
So how libvirt manages to do that as non-root user if pci-sysfs config
file needs root permission. I didn't mean to say that legacy code
checks for root explicitly, what I meant is that at some point root
permission is needed.

   Who does this permission checking here? Is only root
  allowed to create non coherent group with vfio?
 
 With vfio the user is granted permission by giving them access to the
 vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
 group to vfio.  That enables the user to create a container (~iommu
 domain) with the group attached to it.  Only then will the vfio external
 user interface provide a reference to the group and enable this wbinvd
 support.  So, wbinvd emulation should only be available to a user that
 own a vfio group and has it configured for use with this interface.
What is the default permission of /dev/vfio/$GROUP?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 17:44 +0300, Gleb Natapov wrote:
 On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
  On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
   On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's any important point where that breaks down.  Intel
VT-d hardware may or may not support snoop control.  When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent.  That allows KVM to handle things like the
x86 WBINVD opcode as a nop.  When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason.  Allowing an arbitrary user to enable
physical WBINVD gives them a more access to the hardware.  Previously,
this has only been enabled for guests supporting legacy PCI device
assignment.  In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device.  The user can add
and remove VFIO groups to this device via file descriptors.  KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and gets the coherency state of
the IOMMU from VFIO.  This provides equivalent functionality to
legacy KVM assignment, while keeping (nearly) all the bits isolated.

   Looks good overall to me, one things though: to use legacy device
   assignment one needs root permission, so only root user can enable
   WBINVD emulation.
  
  That's not entirely accurate, legacy device assignment can be used by a
  non-root user, libvirt does this all the time.  The part that requires
  root access is opening the pci-sysfs config file, the rest can be
  managed via file permissions on the remaining sysfs files.
  
 So how libvirt manages to do that as non-root user if pci-sysfs config
 file needs root permission. I didn't mean to say that legacy code
 checks for root explicitly, what I meant is that at some point root
 permission is needed.

Yes, libvirt needs admin permission for legacy to bind to pci-stub,
change permission on sysfs files and pass an opened pci config sysfs
file descriptor.  For vfio libvirt needs admin permission to bind to
vfio-pci and change group file permission.  From that perspective the
admin requirement is similar.

Who does this permission checking here? Is only root
   allowed to create non coherent group with vfio?
  
  With vfio the user is granted permission by giving them access to the
  vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
  group to vfio.  That enables the user to create a container (~iommu
  domain) with the group attached to it.  Only then will the vfio external
  user interface provide a reference to the group and enable this wbinvd
  support.  So, wbinvd emulation should only be available to a user that
  own a vfio group and has it configured for use with this interface.
 What is the default permission of /dev/vfio/$GROUP?

It's 600.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-25 Thread Alex Williamson
On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote:
 On 09/14/2013 02:25 AM, Alex Williamson wrote:
  On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
  On 09/13/2013 07:23 AM, Alex Williamson wrote:
  So far we've succeeded at making KVM and VFIO mostly unaware of each
  other, but there's any important point where that breaks down.  Intel
  VT-d hardware may or may not support snoop control.  When snoop
  control is available, intel-iommu promotes No-Snoop transactions on
  PCIe to be cache coherent.  That allows KVM to handle things like the
  x86 WBINVD opcode as a nop.  When the hardware does not support this,
  KVM must implement a hardware visible WBINVD for the guest.
 
  We could simply let userspace tell KVM how to handle WBINVD, but it's
  privileged for a reason.  Allowing an arbitrary user to enable
  physical WBINVD gives them a more access to the hardware.  Previously,
  this has only been enabled for guests supporting legacy PCI device
  assignment.  In such cases it's necessary for proper guest execution.
  We therefore create a new KVM-VFIO virtual device.  The user can add
  and remove VFIO groups to this device via file descriptors.  KVM
  makes use of the VFIO external user interface to validate that the
  user has access to physical hardware and gets the coherency state of
  the IOMMU from VFIO.  This provides equivalent functionality to
  legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
  The one intrusion is the resulting flag indicating the coherency
  state.  For this RFC it's placed on the x86 kvm_arch struct, however
  I know POWER has interest in using the VFIO external user interface,
  and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
  care about No-Snoop handling as well or the code can be #ifdef'd.
 
 
  POWER does not support (at least boos3s - server, not sure about others)
  this cache-non-coherent stuff at all.
  
  Then it's easy for your IOMMU API interface to return always cache
  coherent or never cache coherent or whatever ;)
  
  Regarding reusing this device with external API for POWER - I posted a
  patch which introduces KVM device to link KVM with IOMMU but besides the
  list of groups registered in KVM, it also provides the way to find a group
  by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
  in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
  there window_size too (for a quick boundary check). I am not sure we want
  to mix everything here.
 
  It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
  handling if you are interested (kvmppc_spapr_tce_iommu_device).
  
  Yes, I stole the code to get the vfio symbols from your code.  The
  convergence I was hoping to achieve is that KVM doesn't really want to
  know about VFIO and vica versa.  We can therefore at least limit the
  intrusion by sharing a common device.  Obviously for you it will need
  some extra interfaces to associate an LIOBN to a group, but we keep both
  the kernel an userspace cleaner by avoiding duplication where we can.
  Is this really not extensible to your usage?
 
 Well, I do not have a good taste for this kind of stuff so I cannot tell
 for sure. I can reuse this device and hack it to do whatever I need but how?
 
 There are 2 things I need from KVM device:
 1. associate IOMMU group with LIOBN

Does QEMU know this?  We could set another attribute to specify the
LIOBN for a group.

 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
 pointer which is an IOMMU data of an IOMMU group so we could take a
 shortcut here).

Once we have a VFIO device with a VFIO group added and the LIOBN
attribute set, isn't this just a matter of some access code?

 There are 3 possible interfaces to use:
 A. set/get attributes
 B. ioctl
 C. additional API

I think we need to differentiate user interfaces from kernel interfaces.
Within the kernel, we make no guarantees about interfaces and we can
change them to meet our needs.  That includes the vfio external user
interface.  For userspace, we need to be careful not to break things.  I
suggest we use the set/get/has attribute interface that's already part
of KVM for the user interface.

 What I posted a week ago uses A for 1 and C for 2. Now we move this to
 virt/kvm/vfio.c.

I don't care where it lives, other than we both have a need for it, so
it seems like the core of it should not live in architecture specific
directories.

 A for 1 is more or less ok but how exactly? Yet another attribute? Platform
 specific bus ID? In your patch attr-addr is not really an address (to be
 accessed with get_user()) but just an fd.

Is that a problem?

 For 2 - there are already A and B interfaces so we do not want C, right?
 What kind of a good device has a backdoor? :) But A and B do not have
 access control to prevent the user space from receiving a IOMMU group
 pointer (which it would not be able to use anyway 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-25 Thread Alexey Kardashevskiy
On 09/26/2013 06:19 AM, Alex Williamson wrote:
 On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote:
 On 09/14/2013 02:25 AM, Alex Williamson wrote:
 On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
 On 09/13/2013 07:23 AM, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.

 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.

 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.


 POWER does not support (at least boos3s - server, not sure about others)
 this cache-non-coherent stuff at all.

 Then it's easy for your IOMMU API interface to return always cache
 coherent or never cache coherent or whatever ;)

 Regarding reusing this device with external API for POWER - I posted a
 patch which introduces KVM device to link KVM with IOMMU but besides the
 list of groups registered in KVM, it also provides the way to find a group
 by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
 in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
 there window_size too (for a quick boundary check). I am not sure we want
 to mix everything here.

 It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
 handling if you are interested (kvmppc_spapr_tce_iommu_device).

 Yes, I stole the code to get the vfio symbols from your code.  The
 convergence I was hoping to achieve is that KVM doesn't really want to
 know about VFIO and vica versa.  We can therefore at least limit the
 intrusion by sharing a common device.  Obviously for you it will need
 some extra interfaces to associate an LIOBN to a group, but we keep both
 the kernel an userspace cleaner by avoiding duplication where we can.
 Is this really not extensible to your usage?

 Well, I do not have a good taste for this kind of stuff so I cannot tell
 for sure. I can reuse this device and hack it to do whatever I need but how?

 There are 2 things I need from KVM device:
 1. associate IOMMU group with LIOBN
 
 Does QEMU know this?  We could set another attribute to specify the
 LIOBN for a group.

QEMU knows as QEMU decides on LOIBN number. And yes, we could do that.


 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
 pointer which is an IOMMU data of an IOMMU group so we could take a
 shortcut here).
 
 Once we have a VFIO device with a VFIO group added and the LIOBN
 attribute set, isn't this just a matter of some access code?


The lookup function will be called from what we call a realmode on PPC64,
i.e. when MMU is off and kvm.ko code is not available. So we will either
have to put this lookup function to a separate virt/kvm/vfio_rm.c or
compile the whole thing into the kernel image but this is not a big issue
anyway.

You can have a look for example at book3s_64_vio_hv.o vs. book3s_64_vio.o
to get a better picture if you like.


 There are 3 possible interfaces to use:
 A. set/get attributes
 B. ioctl
 C. additional API
 
 I think we need to differentiate user interfaces from kernel interfaces.
 Within the kernel, we make no guarantees about interfaces and we can
 change them to meet our needs.  That includes the vfio external user
 interface.  For userspace, we need to be careful not to break things.  I
 suggest we use the set/get/has attribute interface that's already part
 of KVM for the user interface.
 
 What I posted a week ago uses A for 1 and C for 2. Now we move this to
 virt/kvm/vfio.c.
 
 I don't care where it lives, other than we both have a need for it, so
 it seems like the core of it should not live in architecture specific
 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-15 Thread Alexey Kardashevskiy
On 09/14/2013 02:25 AM, Alex Williamson wrote:
 On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
 On 09/13/2013 07:23 AM, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.

 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.

 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.


 POWER does not support (at least boos3s - server, not sure about others)
 this cache-non-coherent stuff at all.
 
 Then it's easy for your IOMMU API interface to return always cache
 coherent or never cache coherent or whatever ;)
 
 Regarding reusing this device with external API for POWER - I posted a
 patch which introduces KVM device to link KVM with IOMMU but besides the
 list of groups registered in KVM, it also provides the way to find a group
 by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
 in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
 there window_size too (for a quick boundary check). I am not sure we want
 to mix everything here.

 It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
 handling if you are interested (kvmppc_spapr_tce_iommu_device).
 
 Yes, I stole the code to get the vfio symbols from your code.  The
 convergence I was hoping to achieve is that KVM doesn't really want to
 know about VFIO and vica versa.  We can therefore at least limit the
 intrusion by sharing a common device.  Obviously for you it will need
 some extra interfaces to associate an LIOBN to a group, but we keep both
 the kernel an userspace cleaner by avoiding duplication where we can.
 Is this really not extensible to your usage?

Well, I do not have a good taste for this kind of stuff so I cannot tell
for sure. I can reuse this device and hack it to do whatever I need but how?

There are 2 things I need from KVM device:
1. associate IOMMU group with LIOBN
2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
pointer which is an IOMMU data of an IOMMU group so we could take a
shortcut here).

There are 3 possible interfaces to use:
A. set/get attributes
B. ioctl
C. additional API

What I posted a week ago uses A for 1 and C for 2. Now we move this to
virt/kvm/vfio.c.
A for 1 is more or less ok but how exactly? Yet another attribute? Platform
specific bus ID? In your patch attr-addr is not really an address (to be
accessed with get_user()) but just an fd.

For 2 - there are already A and B interfaces so we do not want C, right?
What kind of a good device has a backdoor? :) But A and B do not have
access control to prevent the user space from receiving a IOMMU group
pointer (which it would not be able to use anyway but still). Do we care
about this (I do not)? And using B (ioctls) within the kernel - I cannot
say I saw many usages of this.

I am pretty sure we will spend some time (not much) arguing about these
things and when we agree on something, then some of KVM maintainers will
step in and say that there is 1001st way of doing this and - goto start.
And after all, this still won't be a device as it does not emulate anything
present in the real hardware, this is just yet another interface to KVM and
some ways of using it won't be natural for somebody. /me sighs.



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alexey Kardashevskiy
On 09/13/2013 07:23 AM, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.
 
 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.


POWER does not support (at least boos3s - server, not sure about others)
this cache-non-coherent stuff at all.

Regarding reusing this device with external API for POWER - I posted a
patch which introduces KVM device to link KVM with IOMMU but besides the
list of groups registered in KVM, it also provides the way to find a group
by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
there window_size too (for a quick boundary check). I am not sure we want
to mix everything here.

It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
handling if you are interested (kvmppc_spapr_tce_iommu_device).



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Michael S. Tsirkin
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.
 
 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.


So how is the isolation handled then?
How is this better than a ioctl to grant WBINVD to guest?
kvm char device can be opened by any user,
so any user can grant itself these priveledges.
What did I miss?


 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
  arch/x86/include/asm/kvm_host.h|1 
  arch/x86/kvm/Makefile  |2 
  arch/x86/kvm/vmx.c |5 -
  arch/x86/kvm/x86.c |5 -
  include/linux/kvm_host.h   |1 
  include/uapi/linux/kvm.h   |4 
  virt/kvm/kvm_main.c|3 
  virt/kvm/vfio.c|  237 
 
  9 files changed, 275 insertions(+), 5 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
  create mode 100644 virt/kvm/vfio.c
 
 diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
 b/Documentation/virtual/kvm/devices/vfio.txt
 new file mode 100644
 index 000..831e6a6
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -0,0 +1,22 @@
 +VFIO virtual device
 +===
 +
 +Device types supported:
 +  KVM_DEV_TYPE_VFIO
 +
 +Only one VFIO instance may be created per VM.  The created device
 +tracks VFIO groups in use by the VM and features of those groups
 +important to the correctness and acceleration of the VM.  As groups
 +are enabled and disabled for use by the VM, KVM should be updated
 +about their presence.  When registered with KVM, a reference to the
 +VFIO-group is held by KVM.
 +
 +Groups:
 +  KVM_DEV_VFIO_ADD_GROUP
 +  KVM_DEV_VFIO_DEL_GROUP
 +
 +Each takes a int32_t file descriptor for kvm_device_attr.addr and
 +does not support any group device kvm_device_attr.attr.
 +
 +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
 +  KVM_DEV_VFIO_GROUP_ADD  DEL?
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c76ff74..5b9350d 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -588,6 +588,7 @@ struct kvm_arch {
  
   spinlock_t pvclock_gtod_sync_lock;
   bool use_master_clock;
 + bool vfio_noncoherent;
   u64 master_kernel_ns;
   cycle_t master_cycle_now;
  
 diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
 index bf4fb04..25d22b2 100644
 --- a/arch/x86/kvm/Makefile
 +++ b/arch/x86/kvm/Makefile
 @@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
  
  kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
 - $(KVM)/eventfd.o $(KVM)/irqchip.o
 + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += $(KVM)/assigned-dev.o $(KVM)/iommu.o
  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
  
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 1f1da43..94f7786 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
 gfn, bool is_mmio)
*/
   if (is_mmio)
   ret = MTRR_TYPE_UNCACHABLE  VMX_EPT_MT_EPTE_SHIFT;
 - else if (vcpu-kvm-arch.iommu_domain 
 - 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
 On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
  So far we've succeeded at making KVM and VFIO mostly unaware of each
  other, but there's any important point where that breaks down.  Intel
  VT-d hardware may or may not support snoop control.  When snoop
  control is available, intel-iommu promotes No-Snoop transactions on
  PCIe to be cache coherent.  That allows KVM to handle things like the
  x86 WBINVD opcode as a nop.  When the hardware does not support this,
  KVM must implement a hardware visible WBINVD for the guest.
  
  We could simply let userspace tell KVM how to handle WBINVD, but it's
  privileged for a reason.  Allowing an arbitrary user to enable
  physical WBINVD gives them a more access to the hardware.  Previously,
  this has only been enabled for guests supporting legacy PCI device
  assignment.  In such cases it's necessary for proper guest execution.
  We therefore create a new KVM-VFIO virtual device.  The user can add
  and remove VFIO groups to this device via file descriptors.  KVM
  makes use of the VFIO external user interface to validate that the
  user has access to physical hardware and gets the coherency state of
  the IOMMU from VFIO.  This provides equivalent functionality to
  legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
 
 So how is the isolation handled then?

By isolation above, I'm only talking about the intrusive-ness of the
code and leaking vfio knowledge into KVM.  Nothing to do with device
isolation.

 How is this better than a ioctl to grant WBINVD to guest?
 kvm char device can be opened by any user,
 so any user can grant itself these priveledges.
 What did I miss?

With this interface the caller must have physical access to one or more
devices via vfio groups and each of those groups must be configured into
one or more IOMMU domains.  Furthermore, at least one of the IOMMU
domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
it's actually quite a significantly higher hurdle than an ioctl open to
anyone and the number of VMs on a given host capable of doing this is
bound by the number of IOMMU groups.  We do not however verify that a
vfio device is actually in use by the VM, but I don't think there's a
way to do that from KVM and I'm not sure that it's important to do so.
I believe having access to physical hardware is already a sufficient
granting of privilege to enable things like WBINVD.  Thanks,

Alex

  The one intrusion is the resulting flag indicating the coherency
  state.  For this RFC it's placed on the x86 kvm_arch struct, however
  I know POWER has interest in using the VFIO external user interface,
  and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
  care about No-Snoop handling as well or the code can be #ifdef'd.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
   Documentation/virtual/kvm/devices/vfio.txt |   22 +++
   arch/x86/include/asm/kvm_host.h|1 
   arch/x86/kvm/Makefile  |2 
   arch/x86/kvm/vmx.c |5 -
   arch/x86/kvm/x86.c |5 -
   include/linux/kvm_host.h   |1 
   include/uapi/linux/kvm.h   |4 
   virt/kvm/kvm_main.c|3 
   virt/kvm/vfio.c|  237 
  
   9 files changed, 275 insertions(+), 5 deletions(-)
   create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
   create mode 100644 virt/kvm/vfio.c
  
  diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
  b/Documentation/virtual/kvm/devices/vfio.txt
  new file mode 100644
  index 000..831e6a6
  --- /dev/null
  +++ b/Documentation/virtual/kvm/devices/vfio.txt
  @@ -0,0 +1,22 @@
  +VFIO virtual device
  +===
  +
  +Device types supported:
  +  KVM_DEV_TYPE_VFIO
  +
  +Only one VFIO instance may be created per VM.  The created device
  +tracks VFIO groups in use by the VM and features of those groups
  +important to the correctness and acceleration of the VM.  As groups
  +are enabled and disabled for use by the VM, KVM should be updated
  +about their presence.  When registered with KVM, a reference to the
  +VFIO-group is held by KVM.
  +
  +Groups:
  +  KVM_DEV_VFIO_ADD_GROUP
  +  KVM_DEV_VFIO_DEL_GROUP
  +
  +Each takes a int32_t file descriptor for kvm_device_attr.addr and
  +does not support any group device kvm_device_attr.attr.
  +
  +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
  +  KVM_DEV_VFIO_GROUP_ADD  DEL?
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index c76ff74..5b9350d 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -588,6 +588,7 @@ struct kvm_arch {
   
  spinlock_t pvclock_gtod_sync_lock;
  bool use_master_clock;
  +   bool vfio_noncoherent;
  u64 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Michael S. Tsirkin
On Fri, Sep 13, 2013 at 08:13:40AM -0600, Alex Williamson wrote:
 On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
  On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
   So far we've succeeded at making KVM and VFIO mostly unaware of each
   other, but there's any important point where that breaks down.  Intel
   VT-d hardware may or may not support snoop control.  When snoop
   control is available, intel-iommu promotes No-Snoop transactions on
   PCIe to be cache coherent.  That allows KVM to handle things like the
   x86 WBINVD opcode as a nop.  When the hardware does not support this,
   KVM must implement a hardware visible WBINVD for the guest.
   
   We could simply let userspace tell KVM how to handle WBINVD, but it's
   privileged for a reason.  Allowing an arbitrary user to enable
   physical WBINVD gives them a more access to the hardware.  Previously,
   this has only been enabled for guests supporting legacy PCI device
   assignment.  In such cases it's necessary for proper guest execution.
   We therefore create a new KVM-VFIO virtual device.  The user can add
   and remove VFIO groups to this device via file descriptors.  KVM
   makes use of the VFIO external user interface to validate that the
   user has access to physical hardware and gets the coherency state of
   the IOMMU from VFIO.  This provides equivalent functionality to
   legacy KVM assignment, while keeping (nearly) all the bits isolated.
  
  
  So how is the isolation handled then?
 
 By isolation above, I'm only talking about the intrusive-ness of the
 code and leaking vfio knowledge into KVM.  Nothing to do with device
 isolation.
 
  How is this better than a ioctl to grant WBINVD to guest?
  kvm char device can be opened by any user,
  so any user can grant itself these priveledges.
  What did I miss?
 
 With this interface the caller must have physical access to one or more
 devices via vfio groups and each of those groups must be configured into
 one or more IOMMU domains.  Furthermore, at least one of the IOMMU
 domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
 it's actually quite a significantly higher hurdle than an ioctl open to
 anyone and the number of VMs on a given host capable of doing this is
 bound by the number of IOMMU groups.  We do not however verify that a
 vfio device is actually in use by the VM, but I don't think there's a
 way to do that from KVM and I'm not sure that it's important to do so.
 I believe having access to physical hardware is already a sufficient
 granting of privilege to enable things like WBINVD.  Thanks,
 
 Alex

Fair enough, but how about revoking the priveledge?
For example, device can be removed by hotplug - does
priveledge remain?
Is this important at all?


   The one intrusion is the resulting flag indicating the coherency
   state.  For this RFC it's placed on the x86 kvm_arch struct, however
   I know POWER has interest in using the VFIO external user interface,
   and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
   care about No-Snoop handling as well or the code can be #ifdef'd.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
   ---
Documentation/virtual/kvm/devices/vfio.txt |   22 +++
arch/x86/include/asm/kvm_host.h|1 
arch/x86/kvm/Makefile  |2 
arch/x86/kvm/vmx.c |5 -
arch/x86/kvm/x86.c |5 -
include/linux/kvm_host.h   |1 
include/uapi/linux/kvm.h   |4 
virt/kvm/kvm_main.c|3 
virt/kvm/vfio.c|  237 
   
9 files changed, 275 insertions(+), 5 deletions(-)
create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
create mode 100644 virt/kvm/vfio.c
   
   diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
   b/Documentation/virtual/kvm/devices/vfio.txt
   new file mode 100644
   index 000..831e6a6
   --- /dev/null
   +++ b/Documentation/virtual/kvm/devices/vfio.txt
   @@ -0,0 +1,22 @@
   +VFIO virtual device
   +===
   +
   +Device types supported:
   +  KVM_DEV_TYPE_VFIO
   +
   +Only one VFIO instance may be created per VM.  The created device
   +tracks VFIO groups in use by the VM and features of those groups
   +important to the correctness and acceleration of the VM.  As groups
   +are enabled and disabled for use by the VM, KVM should be updated
   +about their presence.  When registered with KVM, a reference to the
   +VFIO-group is held by KVM.
   +
   +Groups:
   +  KVM_DEV_VFIO_ADD_GROUP
   +  KVM_DEV_VFIO_DEL_GROUP
   +
   +Each takes a int32_t file descriptor for kvm_device_attr.addr and
   +does not support any group device kvm_device_attr.attr.
   +
   +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
   +  KVM_DEV_VFIO_GROUP_ADD  DEL?
   diff --git 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 17:52 +0300, Michael S. Tsirkin wrote:
 On Fri, Sep 13, 2013 at 08:13:40AM -0600, Alex Williamson wrote:
  On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
   On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's any important point where that breaks down.  Intel
VT-d hardware may or may not support snoop control.  When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent.  That allows KVM to handle things like the
x86 WBINVD opcode as a nop.  When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason.  Allowing an arbitrary user to enable
physical WBINVD gives them a more access to the hardware.  Previously,
this has only been enabled for guests supporting legacy PCI device
assignment.  In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device.  The user can add
and remove VFIO groups to this device via file descriptors.  KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and gets the coherency state of
the IOMMU from VFIO.  This provides equivalent functionality to
legacy KVM assignment, while keeping (nearly) all the bits isolated.
   
   
   So how is the isolation handled then?
  
  By isolation above, I'm only talking about the intrusive-ness of the
  code and leaking vfio knowledge into KVM.  Nothing to do with device
  isolation.
  
   How is this better than a ioctl to grant WBINVD to guest?
   kvm char device can be opened by any user,
   so any user can grant itself these priveledges.
   What did I miss?
  
  With this interface the caller must have physical access to one or more
  devices via vfio groups and each of those groups must be configured into
  one or more IOMMU domains.  Furthermore, at least one of the IOMMU
  domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
  it's actually quite a significantly higher hurdle than an ioctl open to
  anyone and the number of VMs on a given host capable of doing this is
  bound by the number of IOMMU groups.  We do not however verify that a
  vfio device is actually in use by the VM, but I don't think there's a
  way to do that from KVM and I'm not sure that it's important to do so.
  I believe having access to physical hardware is already a sufficient
  granting of privilege to enable things like WBINVD.  Thanks,
  
  Alex
 
 Fair enough, but how about revoking the priveledge?
 For example, device can be removed by hotplug - does
 priveledge remain?
 Is this important at all?

That's where the reference comes in through the vfio external user
interface.  KVM holds a reference to the group that maintains iommu
protection for the group.  Therefore there's no revocation, KVM needs to
release the reference either under direction of the user, closing of the
KVM device fd, or teardown of the KVM VM.

Unused devices can be hotplugged from the group; used devices will be
blocked by the user's reference to the device.  Hot-added devices
devices will join the group and be added to the IOMMU domain.  If a
non-vfio driver is attached to a member of the group, we'll BUG_ON, just
like we do for the normal user case.  Eventually this needs to be
converted to tracking and killing user processes, but that will have the
effect of killing the VM and closing all the file descriptors,
automatically cleaning up the KVM held group reference.

I suppose the corner case you might be looking for is the scenario where
the user enables WBINVD via a group and never uses the device.  The
device is then hot-removed.  Should the user still have WBINVD?
Probably not.  Is it worth worrying about?  Probably not.  The user
still has privileges to the IOMMU domain, even though it has no devices.
If the device is re-added, it will go back into the same IOMMU domain,
so there's no opportunity for another user to gain privilege.  If we
wanted to track such a situation we'd need to introduce some mechanism
for KVM to be notified of group changes to re-evaluate the coherency
flags.  That all seems like negotiation between KVM and vfio in the
external user interface that isn't exposed to the user, should we decide
to handle it.  Thanks,

Alex

The one intrusion is the resulting flag indicating the coherency
state.  For this RFC it's placed on the x86 kvm_arch struct, however
I know POWER has interest in using the VFIO external user interface,
and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
care about No-Snoop handling as well or the code can be #ifdef'd.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
 On 09/13/2013 07:23 AM, Alex Williamson wrote:
  So far we've succeeded at making KVM and VFIO mostly unaware of each
  other, but there's any important point where that breaks down.  Intel
  VT-d hardware may or may not support snoop control.  When snoop
  control is available, intel-iommu promotes No-Snoop transactions on
  PCIe to be cache coherent.  That allows KVM to handle things like the
  x86 WBINVD opcode as a nop.  When the hardware does not support this,
  KVM must implement a hardware visible WBINVD for the guest.
  
  We could simply let userspace tell KVM how to handle WBINVD, but it's
  privileged for a reason.  Allowing an arbitrary user to enable
  physical WBINVD gives them a more access to the hardware.  Previously,
  this has only been enabled for guests supporting legacy PCI device
  assignment.  In such cases it's necessary for proper guest execution.
  We therefore create a new KVM-VFIO virtual device.  The user can add
  and remove VFIO groups to this device via file descriptors.  KVM
  makes use of the VFIO external user interface to validate that the
  user has access to physical hardware and gets the coherency state of
  the IOMMU from VFIO.  This provides equivalent functionality to
  legacy KVM assignment, while keeping (nearly) all the bits isolated.
  
  The one intrusion is the resulting flag indicating the coherency
  state.  For this RFC it's placed on the x86 kvm_arch struct, however
  I know POWER has interest in using the VFIO external user interface,
  and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
  care about No-Snoop handling as well or the code can be #ifdef'd.
 
 
 POWER does not support (at least boos3s - server, not sure about others)
 this cache-non-coherent stuff at all.

Then it's easy for your IOMMU API interface to return always cache
coherent or never cache coherent or whatever ;)

 Regarding reusing this device with external API for POWER - I posted a
 patch which introduces KVM device to link KVM with IOMMU but besides the
 list of groups registered in KVM, it also provides the way to find a group
 by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
 in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
 there window_size too (for a quick boundary check). I am not sure we want
 to mix everything here.
 
 It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
 handling if you are interested (kvmppc_spapr_tce_iommu_device).

Yes, I stole the code to get the vfio symbols from your code.  The
convergence I was hoping to achieve is that KVM doesn't really want to
know about VFIO and vica versa.  We can therefore at least limit the
intrusion by sharing a common device.  Obviously for you it will need
some extra interfaces to associate an LIOBN to a group, but we keep both
the kernel an userspace cleaner by avoiding duplication where we can.
Is this really not extensible to your usage?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html