Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-30 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 09:55:59AM -0600, Alex Williamson wrote:
> On Sun, 2013-09-29 at 17:44 +0300, Gleb Natapov wrote:
> > On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
> > > On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
> > > > On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > > > > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > > > > other, but there's any important point where that breaks down.  Intel
> > > > > VT-d hardware may or may not support snoop control.  When snoop
> > > > > control is available, intel-iommu promotes No-Snoop transactions on
> > > > > PCIe to be cache coherent.  That allows KVM to handle things like the
> > > > > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > > > > KVM must implement a hardware visible WBINVD for the guest.
> > > > > 
> > > > > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > > > > privileged for a reason.  Allowing an arbitrary user to enable
> > > > > physical WBINVD gives them a more access to the hardware.  Previously,
> > > > > this has only been enabled for guests supporting legacy PCI device
> > > > > assignment.  In such cases it's necessary for proper guest execution.
> > > > > We therefore create a new KVM-VFIO virtual device.  The user can add
> > > > > and remove VFIO groups to this device via file descriptors.  KVM
> > > > > makes use of the VFIO external user interface to validate that the
> > > > > user has access to physical hardware and gets the coherency state of
> > > > > the IOMMU from VFIO.  This provides equivalent functionality to
> > > > > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > > > > 
> > > > Looks good overall to me, one things though: to use legacy device
> > > > assignment one needs root permission, so only root user can enable
> > > > WBINVD emulation.
> > > 
> > > That's not entirely accurate, legacy device assignment can be used by a
> > > non-root user, libvirt does this all the time.  The part that requires
> > > root access is opening the pci-sysfs config file, the rest can be
> > > managed via file permissions on the remaining sysfs files.
> > > 
> > So how libvirt manages to do that as non-root user if pci-sysfs config
> > file needs root permission. I didn't mean to say that legacy code
> > checks for root explicitly, what I meant is that at some point root
> > permission is needed.
> 
> Yes, libvirt needs admin permission for legacy to bind to pci-stub,
> change permission on sysfs files and pass an opened pci config sysfs
> file descriptor.  For vfio libvirt needs admin permission to bind to
> vfio-pci and change group file permission.  From that perspective the
> admin requirement is similar.
> 
Yes, certainly appears so. Thanks.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 17:44 +0300, Gleb Natapov wrote:
> On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
> > On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
> > > On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > > > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > > > other, but there's any important point where that breaks down.  Intel
> > > > VT-d hardware may or may not support snoop control.  When snoop
> > > > control is available, intel-iommu promotes No-Snoop transactions on
> > > > PCIe to be cache coherent.  That allows KVM to handle things like the
> > > > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > > > KVM must implement a hardware visible WBINVD for the guest.
> > > > 
> > > > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > > > privileged for a reason.  Allowing an arbitrary user to enable
> > > > physical WBINVD gives them a more access to the hardware.  Previously,
> > > > this has only been enabled for guests supporting legacy PCI device
> > > > assignment.  In such cases it's necessary for proper guest execution.
> > > > We therefore create a new KVM-VFIO virtual device.  The user can add
> > > > and remove VFIO groups to this device via file descriptors.  KVM
> > > > makes use of the VFIO external user interface to validate that the
> > > > user has access to physical hardware and gets the coherency state of
> > > > the IOMMU from VFIO.  This provides equivalent functionality to
> > > > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > > > 
> > > Looks good overall to me, one things though: to use legacy device
> > > assignment one needs root permission, so only root user can enable
> > > WBINVD emulation.
> > 
> > That's not entirely accurate, legacy device assignment can be used by a
> > non-root user, libvirt does this all the time.  The part that requires
> > root access is opening the pci-sysfs config file, the rest can be
> > managed via file permissions on the remaining sysfs files.
> > 
> So how libvirt manages to do that as non-root user if pci-sysfs config
> file needs root permission. I didn't mean to say that legacy code
> checks for root explicitly, what I meant is that at some point root
> permission is needed.

Yes, libvirt needs admin permission for legacy to bind to pci-stub,
change permission on sysfs files and pass an opened pci config sysfs
file descriptor.  For vfio libvirt needs admin permission to bind to
vfio-pci and change group file permission.  From that perspective the
admin requirement is similar.

> > >  Who does this permission checking here? Is only root
> > > allowed to create non coherent group with vfio?
> > 
> > With vfio the user is granted permission by giving them access to the
> > vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
> > group to vfio.  That enables the user to create a container (~iommu
> > domain) with the group attached to it.  Only then will the vfio external
> > user interface provide a reference to the group and enable this wbinvd
> > support.  So, wbinvd emulation should only be available to a user that
> > "own" a vfio group and has it configured for use with this interface.
> What is the default permission of /dev/vfio/$GROUP?

It's 600.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
> On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
> > On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > > other, but there's any important point where that breaks down.  Intel
> > > VT-d hardware may or may not support snoop control.  When snoop
> > > control is available, intel-iommu promotes No-Snoop transactions on
> > > PCIe to be cache coherent.  That allows KVM to handle things like the
> > > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > > KVM must implement a hardware visible WBINVD for the guest.
> > > 
> > > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > > privileged for a reason.  Allowing an arbitrary user to enable
> > > physical WBINVD gives them a more access to the hardware.  Previously,
> > > this has only been enabled for guests supporting legacy PCI device
> > > assignment.  In such cases it's necessary for proper guest execution.
> > > We therefore create a new KVM-VFIO virtual device.  The user can add
> > > and remove VFIO groups to this device via file descriptors.  KVM
> > > makes use of the VFIO external user interface to validate that the
> > > user has access to physical hardware and gets the coherency state of
> > > the IOMMU from VFIO.  This provides equivalent functionality to
> > > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > > 
> > Looks good overall to me, one things though: to use legacy device
> > assignment one needs root permission, so only root user can enable
> > WBINVD emulation.
> 
> That's not entirely accurate, legacy device assignment can be used by a
> non-root user, libvirt does this all the time.  The part that requires
> root access is opening the pci-sysfs config file, the rest can be
> managed via file permissions on the remaining sysfs files.
> 
So how libvirt manages to do that as non-root user if pci-sysfs config
file needs root permission. I didn't mean to say that legacy code
checks for root explicitly, what I meant is that at some point root
permission is needed.

> >  Who does this permission checking here? Is only root
> > allowed to create non coherent group with vfio?
> 
> With vfio the user is granted permission by giving them access to the
> vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
> group to vfio.  That enables the user to create a container (~iommu
> domain) with the group attached to it.  Only then will the vfio external
> user interface provide a reference to the group and enable this wbinvd
> support.  So, wbinvd emulation should only be available to a user that
> "own" a vfio group and has it configured for use with this interface.
What is the default permission of /dev/vfio/$GROUP?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
> On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > other, but there's any important point where that breaks down.  Intel
> > VT-d hardware may or may not support snoop control.  When snoop
> > control is available, intel-iommu promotes No-Snoop transactions on
> > PCIe to be cache coherent.  That allows KVM to handle things like the
> > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > KVM must implement a hardware visible WBINVD for the guest.
> > 
> > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > privileged for a reason.  Allowing an arbitrary user to enable
> > physical WBINVD gives them a more access to the hardware.  Previously,
> > this has only been enabled for guests supporting legacy PCI device
> > assignment.  In such cases it's necessary for proper guest execution.
> > We therefore create a new KVM-VFIO virtual device.  The user can add
> > and remove VFIO groups to this device via file descriptors.  KVM
> > makes use of the VFIO external user interface to validate that the
> > user has access to physical hardware and gets the coherency state of
> > the IOMMU from VFIO.  This provides equivalent functionality to
> > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > 
> Looks good overall to me, one things though: to use legacy device
> assignment one needs root permission, so only root user can enable
> WBINVD emulation.

That's not entirely accurate, legacy device assignment can be used by a
non-root user, libvirt does this all the time.  The part that requires
root access is opening the pci-sysfs config file, the rest can be
managed via file permissions on the remaining sysfs files.

>  Who does this permission checking here? Is only root
> allowed to create non coherent group with vfio?

With vfio the user is granted permission by giving them access to the
vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
group to vfio.  That enables the user to create a container (~iommu
domain) with the group attached to it.  Only then will the vfio external
user interface provide a reference to the group and enable this wbinvd
support.  So, wbinvd emulation should only be available to a user that
"own" a vfio group and has it configured for use with this interface.
Thanks,

Alex

> > The one intrusion is the resulting flag indicating the coherency
> > state.  For this RFC it's placed on the x86 kvm_arch struct, however
> > I know POWER has interest in using the VFIO external user interface,
> > and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> > care about No-Snoop handling as well or the code can be #ifdef'd.
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> >  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
> >  arch/x86/include/asm/kvm_host.h|1 
> >  arch/x86/kvm/Makefile  |2 
> >  arch/x86/kvm/vmx.c |5 -
> >  arch/x86/kvm/x86.c |5 -
> >  include/linux/kvm_host.h   |1 
> >  include/uapi/linux/kvm.h   |4 
> >  virt/kvm/kvm_main.c|3 
> >  virt/kvm/vfio.c|  237 
> > 
> >  9 files changed, 275 insertions(+), 5 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
> >  create mode 100644 virt/kvm/vfio.c
> > 
> > diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> > b/Documentation/virtual/kvm/devices/vfio.txt
> > new file mode 100644
> > index 000..831e6a6
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > @@ -0,0 +1,22 @@
> > +VFIO virtual device
> > +===
> > +
> > +Device types supported:
> > +  KVM_DEV_TYPE_VFIO
> > +
> > +Only one VFIO instance may be created per VM.  The created device
> > +tracks VFIO groups in use by the VM and features of those groups
> > +important to the correctness and acceleration of the VM.  As groups
> > +are enabled and disabled for use by the VM, KVM should be updated
> > +about their presence.  When registered with KVM, a reference to the
> > +VFIO-group is held by KVM.
> > +
> > +Groups:
> > +  KVM_DEV_VFIO_ADD_GROUP
> > +  KVM_DEV_VFIO_DEL_GROUP
> > +
> > +Each takes a int32_t file descriptor for kvm_device_attr.addr and
> > +does not support any group device kvm_device_attr.attr.
> > +
> > +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
> > +  KVM_DEV_VFIO_GROUP_ADD & DEL?
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h
> > index c76ff74..5b9350d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -588,6 +588,7 @@ struct kvm_arch {
> >  
> > spinlock_t pvclock_gtod_sync_lock;
> > bool use_master_clock;
> > 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's any important point where that breaks down.  Intel
> VT-d hardware may or may not support snoop control.  When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent.  That allows KVM to handle things like the
> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
> 
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason.  Allowing an arbitrary user to enable
> physical WBINVD gives them a more access to the hardware.  Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment.  In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device.  The user can add
> and remove VFIO groups to this device via file descriptors.  KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and gets the coherency state of
> the IOMMU from VFIO.  This provides equivalent functionality to
> legacy KVM assignment, while keeping (nearly) all the bits isolated.
> 
Looks good overall to me, one things though: to use legacy device
assignment one needs root permission, so only root user can enable
WBINVD emulation. Who does this permission checking here? Is only root
allowed to create non coherent group with vfio?

> The one intrusion is the resulting flag indicating the coherency
> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> I know POWER has interest in using the VFIO external user interface,
> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> care about No-Snoop handling as well or the code can be #ifdef'd.
> 
> Signed-off-by: Alex Williamson 
> ---
>  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
>  arch/x86/include/asm/kvm_host.h|1 
>  arch/x86/kvm/Makefile  |2 
>  arch/x86/kvm/vmx.c |5 -
>  arch/x86/kvm/x86.c |5 -
>  include/linux/kvm_host.h   |1 
>  include/uapi/linux/kvm.h   |4 
>  virt/kvm/kvm_main.c|3 
>  virt/kvm/vfio.c|  237 
> 
>  9 files changed, 275 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
>  create mode 100644 virt/kvm/vfio.c
> 
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> b/Documentation/virtual/kvm/devices/vfio.txt
> new file mode 100644
> index 000..831e6a6
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -0,0 +1,22 @@
> +VFIO virtual device
> +===
> +
> +Device types supported:
> +  KVM_DEV_TYPE_VFIO
> +
> +Only one VFIO instance may be created per VM.  The created device
> +tracks VFIO groups in use by the VM and features of those groups
> +important to the correctness and acceleration of the VM.  As groups
> +are enabled and disabled for use by the VM, KVM should be updated
> +about their presence.  When registered with KVM, a reference to the
> +VFIO-group is held by KVM.
> +
> +Groups:
> +  KVM_DEV_VFIO_ADD_GROUP
> +  KVM_DEV_VFIO_DEL_GROUP
> +
> +Each takes a int32_t file descriptor for kvm_device_attr.addr and
> +does not support any group device kvm_device_attr.attr.
> +
> +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
> +  KVM_DEV_VFIO_GROUP_ADD & DEL?
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c76ff74..5b9350d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -588,6 +588,7 @@ struct kvm_arch {
>  
>   spinlock_t pvclock_gtod_sync_lock;
>   bool use_master_clock;
> + bool vfio_noncoherent;
>   u64 master_kernel_ns;
>   cycle_t master_cycle_now;
>  
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index bf4fb04..25d22b2 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
>  
>  kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
>   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
> - $(KVM)/eventfd.o $(KVM)/irqchip.o
> + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
>  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += $(KVM)/assigned-dev.o $(KVM)/iommu.o
>  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1f1da43..94f7786 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
> gfn, bool is_mmio)
>*/
>   if (is_mmio)
>   

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-25 Thread Alexey Kardashevskiy
On 09/26/2013 06:19 AM, Alex Williamson wrote:
> On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote:
>> On 09/14/2013 02:25 AM, Alex Williamson wrote:
>>> On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
 On 09/13/2013 07:23 AM, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's any important point where that breaks down.  Intel
> VT-d hardware may or may not support snoop control.  When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent.  That allows KVM to handle things like the
> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
>
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason.  Allowing an arbitrary user to enable
> physical WBINVD gives them a more access to the hardware.  Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment.  In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device.  The user can add
> and remove VFIO groups to this device via file descriptors.  KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and gets the coherency state of
> the IOMMU from VFIO.  This provides equivalent functionality to
> legacy KVM assignment, while keeping (nearly) all the bits isolated.
>
> The one intrusion is the resulting flag indicating the coherency
> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> I know POWER has interest in using the VFIO external user interface,
> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> care about No-Snoop handling as well or the code can be #ifdef'd.


 POWER does not support (at least boos3s - "server", not sure about others)
 this cache-non-coherent stuff at all.
>>>
>>> Then it's easy for your IOMMU API interface to return always cache
>>> coherent or never cache coherent or whatever ;)
>>>
 Regarding reusing this device with external API for POWER - I posted a
 patch which introduces KVM device to link KVM with IOMMU but besides the
 list of groups registered in KVM, it also provides the way to find a group
 by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
 in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
 there window_size too (for a quick boundary check). I am not sure we want
 to mix everything here.

 It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
 handling" if you are interested (kvmppc_spapr_tce_iommu_device).
>>>
>>> Yes, I stole the code to get the vfio symbols from your code.  The
>>> convergence I was hoping to achieve is that KVM doesn't really want to
>>> know about VFIO and vica versa.  We can therefore at least limit the
>>> intrusion by sharing a common device.  Obviously for you it will need
>>> some extra interfaces to associate an LIOBN to a group, but we keep both
>>> the kernel an userspace cleaner by avoiding duplication where we can.
>>> Is this really not extensible to your usage?
>>
>> Well, I do not have a good taste for this kind of stuff so I cannot tell
>> for sure. I can reuse this device and hack it to do whatever I need but how?
>>
>> There are 2 things I need from KVM device:
>> 1. associate IOMMU group with LIOBN
> 
> Does QEMU know this?  We could set another attribute to specify the
> LIOBN for a group.

QEMU knows as QEMU decides on LOIBN number. And yes, we could do that.


>> 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
>> pointer which is an IOMMU data of an IOMMU group so we could take a
>> shortcut here).
> 
> Once we have a VFIO device with a VFIO group added and the LIOBN
> attribute set, isn't this just a matter of some access code?


The lookup function will be called from what we call a realmode on PPC64,
i.e. when MMU is off and kvm.ko code is not available. So we will either
have to put this lookup function to a separate virt/kvm/vfio_rm.c or
compile the whole thing into the kernel image but this is not a big issue
anyway.

You can have a look for example at book3s_64_vio_hv.o vs. book3s_64_vio.o
to get a better picture if you like.


>> There are 3 possible interfaces to use:
>> A. set/get attributes
>> B. ioctl
>> C. additional API
> 
> I think we need to differentiate user interfaces from kernel interfaces.
> Within the kernel, we make no guarantees about interfaces and we can
> change them to meet our needs.  That includes the vfio external user
> interface.  For userspace, we need to be careful not to break things.  I
> suggest we use the set/get/has attribute interface that's already part
> of

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-25 Thread Alex Williamson
On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote:
> On 09/14/2013 02:25 AM, Alex Williamson wrote:
> > On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
> >> On 09/13/2013 07:23 AM, Alex Williamson wrote:
> >>> So far we've succeeded at making KVM and VFIO mostly unaware of each
> >>> other, but there's any important point where that breaks down.  Intel
> >>> VT-d hardware may or may not support snoop control.  When snoop
> >>> control is available, intel-iommu promotes No-Snoop transactions on
> >>> PCIe to be cache coherent.  That allows KVM to handle things like the
> >>> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> >>> KVM must implement a hardware visible WBINVD for the guest.
> >>>
> >>> We could simply let userspace tell KVM how to handle WBINVD, but it's
> >>> privileged for a reason.  Allowing an arbitrary user to enable
> >>> physical WBINVD gives them a more access to the hardware.  Previously,
> >>> this has only been enabled for guests supporting legacy PCI device
> >>> assignment.  In such cases it's necessary for proper guest execution.
> >>> We therefore create a new KVM-VFIO virtual device.  The user can add
> >>> and remove VFIO groups to this device via file descriptors.  KVM
> >>> makes use of the VFIO external user interface to validate that the
> >>> user has access to physical hardware and gets the coherency state of
> >>> the IOMMU from VFIO.  This provides equivalent functionality to
> >>> legacy KVM assignment, while keeping (nearly) all the bits isolated.
> >>>
> >>> The one intrusion is the resulting flag indicating the coherency
> >>> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> >>> I know POWER has interest in using the VFIO external user interface,
> >>> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> >>> care about No-Snoop handling as well or the code can be #ifdef'd.
> >>
> >>
> >> POWER does not support (at least boos3s - "server", not sure about others)
> >> this cache-non-coherent stuff at all.
> > 
> > Then it's easy for your IOMMU API interface to return always cache
> > coherent or never cache coherent or whatever ;)
> > 
> >> Regarding reusing this device with external API for POWER - I posted a
> >> patch which introduces KVM device to link KVM with IOMMU but besides the
> >> list of groups registered in KVM, it also provides the way to find a group
> >> by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
> >> in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
> >> there window_size too (for a quick boundary check). I am not sure we want
> >> to mix everything here.
> >>
> >> It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
> >> handling" if you are interested (kvmppc_spapr_tce_iommu_device).
> > 
> > Yes, I stole the code to get the vfio symbols from your code.  The
> > convergence I was hoping to achieve is that KVM doesn't really want to
> > know about VFIO and vica versa.  We can therefore at least limit the
> > intrusion by sharing a common device.  Obviously for you it will need
> > some extra interfaces to associate an LIOBN to a group, but we keep both
> > the kernel an userspace cleaner by avoiding duplication where we can.
> > Is this really not extensible to your usage?
> 
> Well, I do not have a good taste for this kind of stuff so I cannot tell
> for sure. I can reuse this device and hack it to do whatever I need but how?
> 
> There are 2 things I need from KVM device:
> 1. associate IOMMU group with LIOBN

Does QEMU know this?  We could set another attribute to specify the
LIOBN for a group.

> 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
> pointer which is an IOMMU data of an IOMMU group so we could take a
> shortcut here).

Once we have a VFIO device with a VFIO group added and the LIOBN
attribute set, isn't this just a matter of some access code?

> There are 3 possible interfaces to use:
> A. set/get attributes
> B. ioctl
> C. additional API

I think we need to differentiate user interfaces from kernel interfaces.
Within the kernel, we make no guarantees about interfaces and we can
change them to meet our needs.  That includes the vfio external user
interface.  For userspace, we need to be careful not to break things.  I
suggest we use the set/get/has attribute interface that's already part
of KVM for the user interface.

> What I posted a week ago uses A for 1 and C for 2. Now we move this to
> virt/kvm/vfio.c.

I don't care where it lives, other than we both have a need for it, so
it seems like the core of it should not live in architecture specific
directories.

> A for 1 is more or less ok but how exactly? Yet another attribute? Platform
> specific "bus ID"? In your patch attr->addr is not really an address (to be
> accessed with get_user()) but just an fd.

Is that a problem?

> For 2 - there are already A and B interfaces so we do not want C, rig

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-15 Thread Alexey Kardashevskiy
On 09/14/2013 02:25 AM, Alex Williamson wrote:
> On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
>> On 09/13/2013 07:23 AM, Alex Williamson wrote:
>>> So far we've succeeded at making KVM and VFIO mostly unaware of each
>>> other, but there's any important point where that breaks down.  Intel
>>> VT-d hardware may or may not support snoop control.  When snoop
>>> control is available, intel-iommu promotes No-Snoop transactions on
>>> PCIe to be cache coherent.  That allows KVM to handle things like the
>>> x86 WBINVD opcode as a nop.  When the hardware does not support this,
>>> KVM must implement a hardware visible WBINVD for the guest.
>>>
>>> We could simply let userspace tell KVM how to handle WBINVD, but it's
>>> privileged for a reason.  Allowing an arbitrary user to enable
>>> physical WBINVD gives them a more access to the hardware.  Previously,
>>> this has only been enabled for guests supporting legacy PCI device
>>> assignment.  In such cases it's necessary for proper guest execution.
>>> We therefore create a new KVM-VFIO virtual device.  The user can add
>>> and remove VFIO groups to this device via file descriptors.  KVM
>>> makes use of the VFIO external user interface to validate that the
>>> user has access to physical hardware and gets the coherency state of
>>> the IOMMU from VFIO.  This provides equivalent functionality to
>>> legacy KVM assignment, while keeping (nearly) all the bits isolated.
>>>
>>> The one intrusion is the resulting flag indicating the coherency
>>> state.  For this RFC it's placed on the x86 kvm_arch struct, however
>>> I know POWER has interest in using the VFIO external user interface,
>>> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
>>> care about No-Snoop handling as well or the code can be #ifdef'd.
>>
>>
>> POWER does not support (at least boos3s - "server", not sure about others)
>> this cache-non-coherent stuff at all.
> 
> Then it's easy for your IOMMU API interface to return always cache
> coherent or never cache coherent or whatever ;)
> 
>> Regarding reusing this device with external API for POWER - I posted a
>> patch which introduces KVM device to link KVM with IOMMU but besides the
>> list of groups registered in KVM, it also provides the way to find a group
>> by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
>> in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
>> there window_size too (for a quick boundary check). I am not sure we want
>> to mix everything here.
>>
>> It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
>> handling" if you are interested (kvmppc_spapr_tce_iommu_device).
> 
> Yes, I stole the code to get the vfio symbols from your code.  The
> convergence I was hoping to achieve is that KVM doesn't really want to
> know about VFIO and vica versa.  We can therefore at least limit the
> intrusion by sharing a common device.  Obviously for you it will need
> some extra interfaces to associate an LIOBN to a group, but we keep both
> the kernel an userspace cleaner by avoiding duplication where we can.
> Is this really not extensible to your usage?

Well, I do not have a good taste for this kind of stuff so I cannot tell
for sure. I can reuse this device and hack it to do whatever I need but how?

There are 2 things I need from KVM device:
1. associate IOMMU group with LIOBN
2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
pointer which is an IOMMU data of an IOMMU group so we could take a
shortcut here).

There are 3 possible interfaces to use:
A. set/get attributes
B. ioctl
C. additional API

What I posted a week ago uses A for 1 and C for 2. Now we move this to
virt/kvm/vfio.c.
A for 1 is more or less ok but how exactly? Yet another attribute? Platform
specific "bus ID"? In your patch attr->addr is not really an address (to be
accessed with get_user()) but just an fd.

For 2 - there are already A and B interfaces so we do not want C, right?
What kind of a good device has a backdoor? :) But A and B do not have
access control to prevent the user space from receiving a IOMMU group
pointer (which it would not be able to use anyway but still). Do we care
about this (I do not)? And using B (ioctls) within the kernel - I cannot
say I saw many usages of this.

I am pretty sure we will spend some time (not much) arguing about these
things and when we agree on something, then some of KVM maintainers will
step in and say that there is 1001st way of doing this and - goto start.
And after all, this still won't be a device as it does not emulate anything
present in the real hardware, this is just yet another interface to KVM and
some ways of using it won't be natural for somebody. /me sighs.



-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
> On 09/13/2013 07:23 AM, Alex Williamson wrote:
> > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > other, but there's any important point where that breaks down.  Intel
> > VT-d hardware may or may not support snoop control.  When snoop
> > control is available, intel-iommu promotes No-Snoop transactions on
> > PCIe to be cache coherent.  That allows KVM to handle things like the
> > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > KVM must implement a hardware visible WBINVD for the guest.
> > 
> > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > privileged for a reason.  Allowing an arbitrary user to enable
> > physical WBINVD gives them a more access to the hardware.  Previously,
> > this has only been enabled for guests supporting legacy PCI device
> > assignment.  In such cases it's necessary for proper guest execution.
> > We therefore create a new KVM-VFIO virtual device.  The user can add
> > and remove VFIO groups to this device via file descriptors.  KVM
> > makes use of the VFIO external user interface to validate that the
> > user has access to physical hardware and gets the coherency state of
> > the IOMMU from VFIO.  This provides equivalent functionality to
> > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > 
> > The one intrusion is the resulting flag indicating the coherency
> > state.  For this RFC it's placed on the x86 kvm_arch struct, however
> > I know POWER has interest in using the VFIO external user interface,
> > and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> > care about No-Snoop handling as well or the code can be #ifdef'd.
> 
> 
> POWER does not support (at least boos3s - "server", not sure about others)
> this cache-non-coherent stuff at all.

Then it's easy for your IOMMU API interface to return always cache
coherent or never cache coherent or whatever ;)

> Regarding reusing this device with external API for POWER - I posted a
> patch which introduces KVM device to link KVM with IOMMU but besides the
> list of groups registered in KVM, it also provides the way to find a group
> by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
> in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
> there window_size too (for a quick boundary check). I am not sure we want
> to mix everything here.
> 
> It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
> handling" if you are interested (kvmppc_spapr_tce_iommu_device).

Yes, I stole the code to get the vfio symbols from your code.  The
convergence I was hoping to achieve is that KVM doesn't really want to
know about VFIO and vica versa.  We can therefore at least limit the
intrusion by sharing a common device.  Obviously for you it will need
some extra interfaces to associate an LIOBN to a group, but we keep both
the kernel an userspace cleaner by avoiding duplication where we can.
Is this really not extensible to your usage?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 17:52 +0300, Michael S. Tsirkin wrote:
> On Fri, Sep 13, 2013 at 08:13:40AM -0600, Alex Williamson wrote:
> > On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
> > > On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > > > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > > > other, but there's any important point where that breaks down.  Intel
> > > > VT-d hardware may or may not support snoop control.  When snoop
> > > > control is available, intel-iommu promotes No-Snoop transactions on
> > > > PCIe to be cache coherent.  That allows KVM to handle things like the
> > > > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > > > KVM must implement a hardware visible WBINVD for the guest.
> > > > 
> > > > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > > > privileged for a reason.  Allowing an arbitrary user to enable
> > > > physical WBINVD gives them a more access to the hardware.  Previously,
> > > > this has only been enabled for guests supporting legacy PCI device
> > > > assignment.  In such cases it's necessary for proper guest execution.
> > > > We therefore create a new KVM-VFIO virtual device.  The user can add
> > > > and remove VFIO groups to this device via file descriptors.  KVM
> > > > makes use of the VFIO external user interface to validate that the
> > > > user has access to physical hardware and gets the coherency state of
> > > > the IOMMU from VFIO.  This provides equivalent functionality to
> > > > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > > 
> > > 
> > > So how is the isolation handled then?
> > 
> > By isolation above, I'm only talking about the intrusive-ness of the
> > code and "leaking" vfio knowledge into KVM.  Nothing to do with device
> > isolation.
> > 
> > > How is this better than a ioctl to grant WBINVD to guest?
> > > kvm char device can be opened by any user,
> > > so any user can grant itself these priveledges.
> > > What did I miss?
> > 
> > With this interface the caller must have physical access to one or more
> > devices via vfio groups and each of those groups must be configured into
> > one or more IOMMU domains.  Furthermore, at least one of the IOMMU
> > domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
> > it's actually quite a significantly higher hurdle than an ioctl open to
> > anyone and the number of VMs on a given host capable of doing this is
> > bound by the number of IOMMU groups.  We do not however verify that a
> > vfio device is actually in use by the VM, but I don't think there's a
> > way to do that from KVM and I'm not sure that it's important to do so.
> > I believe having access to physical hardware is already a sufficient
> > granting of privilege to enable things like WBINVD.  Thanks,
> > 
> > Alex
> 
> Fair enough, but how about revoking the priveledge?
> For example, device can be removed by hotplug - does
> priveledge remain?
> Is this important at all?

That's where the reference comes in through the vfio external user
interface.  KVM holds a reference to the group that maintains iommu
protection for the group.  Therefore there's no revocation, KVM needs to
release the reference either under direction of the user, closing of the
KVM device fd, or teardown of the KVM VM.

Unused devices can be hotplugged from the group; used devices will be
blocked by the user's reference to the device.  Hot-added devices
devices will join the group and be added to the IOMMU domain.  If a
non-vfio driver is attached to a member of the group, we'll BUG_ON, just
like we do for the normal user case.  Eventually this needs to be
converted to tracking and killing user processes, but that will have the
effect of killing the VM and closing all the file descriptors,
automatically cleaning up the KVM held group reference.

I suppose the corner case you might be looking for is the scenario where
the user enables WBINVD via a group and never uses the device.  The
device is then hot-removed.  Should the user still have WBINVD?
Probably not.  Is it worth worrying about?  Probably not.  The user
still has privileges to the IOMMU domain, even though it has no devices.
If the device is re-added, it will go back into the same IOMMU domain,
so there's no opportunity for another user to gain privilege.  If we
wanted to track such a situation we'd need to introduce some mechanism
for KVM to be notified of group changes to re-evaluate the coherency
flags.  That all seems like negotiation between KVM and vfio in the
external user interface that isn't exposed to the user, should we decide
to handle it.  Thanks,

Alex

> > > > The one intrusion is the resulting flag indicating the coherency
> > > > state.  For this RFC it's placed on the x86 kvm_arch struct, however
> > > > I know POWER has interest in using the VFIO external user interface,
> > > > and I'm hoping we can share a common KVM-VFIO device.  Perhaps 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Michael S. Tsirkin
On Fri, Sep 13, 2013 at 08:13:40AM -0600, Alex Williamson wrote:
> On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > > other, but there's any important point where that breaks down.  Intel
> > > VT-d hardware may or may not support snoop control.  When snoop
> > > control is available, intel-iommu promotes No-Snoop transactions on
> > > PCIe to be cache coherent.  That allows KVM to handle things like the
> > > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > > KVM must implement a hardware visible WBINVD for the guest.
> > > 
> > > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > > privileged for a reason.  Allowing an arbitrary user to enable
> > > physical WBINVD gives them a more access to the hardware.  Previously,
> > > this has only been enabled for guests supporting legacy PCI device
> > > assignment.  In such cases it's necessary for proper guest execution.
> > > We therefore create a new KVM-VFIO virtual device.  The user can add
> > > and remove VFIO groups to this device via file descriptors.  KVM
> > > makes use of the VFIO external user interface to validate that the
> > > user has access to physical hardware and gets the coherency state of
> > > the IOMMU from VFIO.  This provides equivalent functionality to
> > > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> > 
> > 
> > So how is the isolation handled then?
> 
> By isolation above, I'm only talking about the intrusive-ness of the
> code and "leaking" vfio knowledge into KVM.  Nothing to do with device
> isolation.
> 
> > How is this better than a ioctl to grant WBINVD to guest?
> > kvm char device can be opened by any user,
> > so any user can grant itself these priveledges.
> > What did I miss?
> 
> With this interface the caller must have physical access to one or more
> devices via vfio groups and each of those groups must be configured into
> one or more IOMMU domains.  Furthermore, at least one of the IOMMU
> domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
> it's actually quite a significantly higher hurdle than an ioctl open to
> anyone and the number of VMs on a given host capable of doing this is
> bound by the number of IOMMU groups.  We do not however verify that a
> vfio device is actually in use by the VM, but I don't think there's a
> way to do that from KVM and I'm not sure that it's important to do so.
> I believe having access to physical hardware is already a sufficient
> granting of privilege to enable things like WBINVD.  Thanks,
> 
> Alex

Fair enough, but how about revoking the priveledge?
For example, device can be removed by hotplug - does
priveledge remain?
Is this important at all?


> > > The one intrusion is the resulting flag indicating the coherency
> > > state.  For this RFC it's placed on the x86 kvm_arch struct, however
> > > I know POWER has interest in using the VFIO external user interface,
> > > and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> > > care about No-Snoop handling as well or the code can be #ifdef'd.
> > > 
> > > Signed-off-by: Alex Williamson 
> > > ---
> > >  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
> > >  arch/x86/include/asm/kvm_host.h|1 
> > >  arch/x86/kvm/Makefile  |2 
> > >  arch/x86/kvm/vmx.c |5 -
> > >  arch/x86/kvm/x86.c |5 -
> > >  include/linux/kvm_host.h   |1 
> > >  include/uapi/linux/kvm.h   |4 
> > >  virt/kvm/kvm_main.c|3 
> > >  virt/kvm/vfio.c|  237 
> > > 
> > >  9 files changed, 275 insertions(+), 5 deletions(-)
> > >  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
> > >  create mode 100644 virt/kvm/vfio.c
> > > 
> > > diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> > > b/Documentation/virtual/kvm/devices/vfio.txt
> > > new file mode 100644
> > > index 000..831e6a6
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > > @@ -0,0 +1,22 @@
> > > +VFIO virtual device
> > > +===
> > > +
> > > +Device types supported:
> > > +  KVM_DEV_TYPE_VFIO
> > > +
> > > +Only one VFIO instance may be created per VM.  The created device
> > > +tracks VFIO groups in use by the VM and features of those groups
> > > +important to the correctness and acceleration of the VM.  As groups
> > > +are enabled and disabled for use by the VM, KVM should be updated
> > > +about their presence.  When registered with KVM, a reference to the
> > > +VFIO-group is held by KVM.
> > > +
> > > +Groups:
> > > +  KVM_DEV_VFIO_ADD_GROUP
> > > +  KVM_DEV_VFIO_DEL_GROUP
> > > +
> > > +Each takes a int32_t file descriptor for kvm_device_attr.addr

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alex Williamson
On Fri, 2013-09-13 at 15:39 +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > other, but there's any important point where that breaks down.  Intel
> > VT-d hardware may or may not support snoop control.  When snoop
> > control is available, intel-iommu promotes No-Snoop transactions on
> > PCIe to be cache coherent.  That allows KVM to handle things like the
> > x86 WBINVD opcode as a nop.  When the hardware does not support this,
> > KVM must implement a hardware visible WBINVD for the guest.
> > 
> > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > privileged for a reason.  Allowing an arbitrary user to enable
> > physical WBINVD gives them a more access to the hardware.  Previously,
> > this has only been enabled for guests supporting legacy PCI device
> > assignment.  In such cases it's necessary for proper guest execution.
> > We therefore create a new KVM-VFIO virtual device.  The user can add
> > and remove VFIO groups to this device via file descriptors.  KVM
> > makes use of the VFIO external user interface to validate that the
> > user has access to physical hardware and gets the coherency state of
> > the IOMMU from VFIO.  This provides equivalent functionality to
> > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> 
> 
> So how is the isolation handled then?

By isolation above, I'm only talking about the intrusive-ness of the
code and "leaking" vfio knowledge into KVM.  Nothing to do with device
isolation.

> How is this better than a ioctl to grant WBINVD to guest?
> kvm char device can be opened by any user,
> so any user can grant itself these priveledges.
> What did I miss?

With this interface the caller must have physical access to one or more
devices via vfio groups and each of those groups must be configured into
one or more IOMMU domains.  Furthermore, at least one of the IOMMU
domains must not include the IOMMU_CAP_CACHE_COHERENCY capability.  So
it's actually quite a significantly higher hurdle than an ioctl open to
anyone and the number of VMs on a given host capable of doing this is
bound by the number of IOMMU groups.  We do not however verify that a
vfio device is actually in use by the VM, but I don't think there's a
way to do that from KVM and I'm not sure that it's important to do so.
I believe having access to physical hardware is already a sufficient
granting of privilege to enable things like WBINVD.  Thanks,

Alex

> > The one intrusion is the resulting flag indicating the coherency
> > state.  For this RFC it's placed on the x86 kvm_arch struct, however
> > I know POWER has interest in using the VFIO external user interface,
> > and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> > care about No-Snoop handling as well or the code can be #ifdef'd.
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> >  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
> >  arch/x86/include/asm/kvm_host.h|1 
> >  arch/x86/kvm/Makefile  |2 
> >  arch/x86/kvm/vmx.c |5 -
> >  arch/x86/kvm/x86.c |5 -
> >  include/linux/kvm_host.h   |1 
> >  include/uapi/linux/kvm.h   |4 
> >  virt/kvm/kvm_main.c|3 
> >  virt/kvm/vfio.c|  237 
> > 
> >  9 files changed, 275 insertions(+), 5 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
> >  create mode 100644 virt/kvm/vfio.c
> > 
> > diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> > b/Documentation/virtual/kvm/devices/vfio.txt
> > new file mode 100644
> > index 000..831e6a6
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > @@ -0,0 +1,22 @@
> > +VFIO virtual device
> > +===
> > +
> > +Device types supported:
> > +  KVM_DEV_TYPE_VFIO
> > +
> > +Only one VFIO instance may be created per VM.  The created device
> > +tracks VFIO groups in use by the VM and features of those groups
> > +important to the correctness and acceleration of the VM.  As groups
> > +are enabled and disabled for use by the VM, KVM should be updated
> > +about their presence.  When registered with KVM, a reference to the
> > +VFIO-group is held by KVM.
> > +
> > +Groups:
> > +  KVM_DEV_VFIO_ADD_GROUP
> > +  KVM_DEV_VFIO_DEL_GROUP
> > +
> > +Each takes a int32_t file descriptor for kvm_device_attr.addr and
> > +does not support any group device kvm_device_attr.attr.
> > +
> > +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
> > +  KVM_DEV_VFIO_GROUP_ADD & DEL?
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h
> > index c76ff74..5b9350d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -588,6 +588,7

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Michael S. Tsirkin
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's any important point where that breaks down.  Intel
> VT-d hardware may or may not support snoop control.  When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent.  That allows KVM to handle things like the
> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
> 
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason.  Allowing an arbitrary user to enable
> physical WBINVD gives them a more access to the hardware.  Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment.  In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device.  The user can add
> and remove VFIO groups to this device via file descriptors.  KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and gets the coherency state of
> the IOMMU from VFIO.  This provides equivalent functionality to
> legacy KVM assignment, while keeping (nearly) all the bits isolated.


So how is the isolation handled then?
How is this better than a ioctl to grant WBINVD to guest?
kvm char device can be opened by any user,
so any user can grant itself these priveledges.
What did I miss?


> The one intrusion is the resulting flag indicating the coherency
> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> I know POWER has interest in using the VFIO external user interface,
> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> care about No-Snoop handling as well or the code can be #ifdef'd.
> 
> Signed-off-by: Alex Williamson 
> ---
>  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
>  arch/x86/include/asm/kvm_host.h|1 
>  arch/x86/kvm/Makefile  |2 
>  arch/x86/kvm/vmx.c |5 -
>  arch/x86/kvm/x86.c |5 -
>  include/linux/kvm_host.h   |1 
>  include/uapi/linux/kvm.h   |4 
>  virt/kvm/kvm_main.c|3 
>  virt/kvm/vfio.c|  237 
> 
>  9 files changed, 275 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
>  create mode 100644 virt/kvm/vfio.c
> 
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> b/Documentation/virtual/kvm/devices/vfio.txt
> new file mode 100644
> index 000..831e6a6
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -0,0 +1,22 @@
> +VFIO virtual device
> +===
> +
> +Device types supported:
> +  KVM_DEV_TYPE_VFIO
> +
> +Only one VFIO instance may be created per VM.  The created device
> +tracks VFIO groups in use by the VM and features of those groups
> +important to the correctness and acceleration of the VM.  As groups
> +are enabled and disabled for use by the VM, KVM should be updated
> +about their presence.  When registered with KVM, a reference to the
> +VFIO-group is held by KVM.
> +
> +Groups:
> +  KVM_DEV_VFIO_ADD_GROUP
> +  KVM_DEV_VFIO_DEL_GROUP
> +
> +Each takes a int32_t file descriptor for kvm_device_attr.addr and
> +does not support any group device kvm_device_attr.attr.
> +
> +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
> +  KVM_DEV_VFIO_GROUP_ADD & DEL?
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c76ff74..5b9350d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -588,6 +588,7 @@ struct kvm_arch {
>  
>   spinlock_t pvclock_gtod_sync_lock;
>   bool use_master_clock;
> + bool vfio_noncoherent;
>   u64 master_kernel_ns;
>   cycle_t master_cycle_now;
>  
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index bf4fb04..25d22b2 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
>  
>  kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
>   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
> - $(KVM)/eventfd.o $(KVM)/irqchip.o
> + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
>  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += $(KVM)/assigned-dev.o $(KVM)/iommu.o
>  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1f1da43..94f7786 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
> gfn, bool is_mmio)
>*/
>   if (is_mmio)
>   ret = MTRR_TYPE_UNCACHABLE << VMX_EP

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-13 Thread Alexey Kardashevskiy
On 09/13/2013 07:23 AM, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's any important point where that breaks down.  Intel
> VT-d hardware may or may not support snoop control.  When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent.  That allows KVM to handle things like the
> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
> 
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason.  Allowing an arbitrary user to enable
> physical WBINVD gives them a more access to the hardware.  Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment.  In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device.  The user can add
> and remove VFIO groups to this device via file descriptors.  KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and gets the coherency state of
> the IOMMU from VFIO.  This provides equivalent functionality to
> legacy KVM assignment, while keeping (nearly) all the bits isolated.
> 
> The one intrusion is the resulting flag indicating the coherency
> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> I know POWER has interest in using the VFIO external user interface,
> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> care about No-Snoop handling as well or the code can be #ifdef'd.


POWER does not support (at least boos3s - "server", not sure about others)
this cache-non-coherent stuff at all.

Regarding reusing this device with external API for POWER - I posted a
patch which introduces KVM device to link KVM with IOMMU but besides the
list of groups registered in KVM, it also provides the way to find a group
by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
there window_size too (for a quick boundary check). I am not sure we want
to mix everything here.

It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
handling" if you are interested (kvmppc_spapr_tce_iommu_device).



-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-12 Thread Alex Williamson
So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's any important point where that breaks down.  Intel
VT-d hardware may or may not support snoop control.  When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent.  That allows KVM to handle things like the
x86 WBINVD opcode as a nop.  When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason.  Allowing an arbitrary user to enable
physical WBINVD gives them a more access to the hardware.  Previously,
this has only been enabled for guests supporting legacy PCI device
assignment.  In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device.  The user can add
and remove VFIO groups to this device via file descriptors.  KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and gets the coherency state of
the IOMMU from VFIO.  This provides equivalent functionality to
legacy KVM assignment, while keeping (nearly) all the bits isolated.

The one intrusion is the resulting flag indicating the coherency
state.  For this RFC it's placed on the x86 kvm_arch struct, however
I know POWER has interest in using the VFIO external user interface,
and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
care about No-Snoop handling as well or the code can be #ifdef'd.

Signed-off-by: Alex Williamson 
---
 Documentation/virtual/kvm/devices/vfio.txt |   22 +++
 arch/x86/include/asm/kvm_host.h|1 
 arch/x86/kvm/Makefile  |2 
 arch/x86/kvm/vmx.c |5 -
 arch/x86/kvm/x86.c |5 -
 include/linux/kvm_host.h   |1 
 include/uapi/linux/kvm.h   |4 
 virt/kvm/kvm_main.c|3 
 virt/kvm/vfio.c|  237 
 9 files changed, 275 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
 create mode 100644 virt/kvm/vfio.c

diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
b/Documentation/virtual/kvm/devices/vfio.txt
new file mode 100644
index 000..831e6a6
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -0,0 +1,22 @@
+VFIO virtual device
+===
+
+Device types supported:
+  KVM_DEV_TYPE_VFIO
+
+Only one VFIO instance may be created per VM.  The created device
+tracks VFIO groups in use by the VM and features of those groups
+important to the correctness and acceleration of the VM.  As groups
+are enabled and disabled for use by the VM, KVM should be updated
+about their presence.  When registered with KVM, a reference to the
+VFIO-group is held by KVM.
+
+Groups:
+  KVM_DEV_VFIO_ADD_GROUP
+  KVM_DEV_VFIO_DEL_GROUP
+
+Each takes a int32_t file descriptor for kvm_device_attr.addr and
+does not support any group device kvm_device_attr.attr.
+
+RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
+  KVM_DEV_VFIO_GROUP_ADD & DEL?
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c76ff74..5b9350d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -588,6 +588,7 @@ struct kvm_arch {
 
spinlock_t pvclock_gtod_sync_lock;
bool use_master_clock;
+   bool vfio_noncoherent;
u64 master_kernel_ns;
cycle_t master_cycle_now;
 
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index bf4fb04..25d22b2 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
 
 kvm-y  += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
$(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
-   $(KVM)/eventfd.o $(KVM)/irqchip.o
+   $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..94f7786 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 */
if (is_mmio)
ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
-   else if (vcpu->kvm->arch.iommu_domain &&
-   !(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY))
+   else if (vcpu->kvm->arch.vfio_noncoherent ||
+vcpu->kvm->arch.iommu_domain &&
+!(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY))
ret = kvm_get_guest_memory_type(vcpu, gfn) <<
  VMX_EPT_MT_EPTE_SHIFT;
else
diff --git a/arch/x8