[RFC Design Doc v3] Enable Shared Virtual Memory feature in pass-through scenarios

2016-11-30 Thread Liu, Yi L
What's changed from v2:
a) Detailed feature description
b) Refined description in "Address translation in virtual SVM"
c) "Terms" section added

Content
===
1. Feature description
2. Why use it?
3. How to enable it
4. How to test
5. Terms

Details
===
1. Feature description
Shared virtual memory (SVM) lets an application program share its virtual
address space with SVM-capable devices.

Shared virtual memory details:
a) The SVM feature requires ATS/PRQ/PASID support on both the device side
and the IOMMU side.
b) SVM-capable devices can send DMA requests with a PASID; the address in
such a request is a virtual address within a program's virtual address
space.
c) The IOMMU uses the first-level page table to translate the address in
the request.
d) On bare metal, the first-level page table is an HVA->HPA mapping.

The Shared Virtual Memory feature in pass-through scenarios is actually SVM
virtualization. It lets application programs (running in the guest) share
their virtual address space with assigned devices (e.g. graphics processors
or accelerators).

In virtualization, SVM works as follows:
a) It requires a vIOMMU exposed to the guest.
b) An assigned SVM-capable device can send DMA requests with a PASID; the
address in such a request is a virtual address within a guest program's
virtual address space (GVA).
c) The physical IOMMU needs to do GVA->GPA->HPA translation. Nested mode
is enabled: the first-level page table provides the GVA->GPA mapping,
while the second-level page table provides the GPA->HPA translation.

For more SVM detail, you may want to refer to section 2.5.1.1 of the Intel
VT-d spec and section 5.6 of the OpenCL spec. For details about SVM address
translation, please refer to section 3 of the Intel VT-d spec.
You are also welcome to discuss directly in this thread.

Link to related specs:
http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf


2. Why use it?
It is common to pass devices through to a guest and expect performance
close to what they deliver on the host. With this feature enabled,
application programs in the guest can share data structures with assigned
devices without unnecessary overhead.


3. How to enable it
As mentioned above, SVM virtualization requires a vIOMMU exposed to the
guest. Since there is already an IOMMU emulator in host user space (QEMU),
it is more practical to extend that emulator to support SVM for assigned
devices. So far, the vIOMMU exposed to the guest only serves emulated
devices. This design focuses on virtual SVM for assigned devices; virtual
IOVA and virtual interrupt remapping are not covered here.

The enabling work would include the following items.

a) IOMMU Register Access Emulation
This already exists in QEMU and needs some extensions to support SVM, e.g.
support for the page request service related registers (PQA_REG).

b) vIOMMU Capability
Report the SVM-related capabilities (PASID, PRS, DT, PT, ECS etc.) in the
extended capability register, and cache mode, DWD, DRD in the capability
register.

c) QI Handling Emulation
This already exists in QEMU; the QIs related to assigned devices need to be
shadowed to the physical IOMMU:
i.   extended context entry cache invalidation (nested mode setting, guest
     PASID table pointer shadowing)
ii.  first-level translation cache invalidation
iii. responses for recoverable faults

d) Address translation in virtual SVM
In virtualization, for requests with PASID from an assigned device, the
address translation goes through the first-level page table and then the
second-level page table; this is called nested mode. Extended context mode
must be supported by the hardware. DMA remapping in SVM virtualization
works as follows:
i.  For requests with PASID, the related extended context entry should have
    the NESTE bit set.
ii. The guest PASID table pointer should be shadowed to the host IOMMU
    driver. The PASID table pointer field in the extended context entry is
    a GPA since nested mode is on.

The first-level page table is maintained by the guest IOMMU driver; the
second-level page table is maintained by the host IOMMU driver.
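To make the GVA->GPA->HPA composition concrete, here is a tiny,
self-contained C sketch. It is purely illustrative (each "page table" is a
made-up single-entry toy, not VT-d code); it only shows how the two walks
compose under nested mode:

#include <stdint.h>
#include <stdio.h>

/* Toy model: each level is reduced to a single 4KiB page mapping.
 * Real hardware walks multi-level paging structures instead. */
struct toy_pt { uint64_t in_pfn; uint64_t out_pfn; };

static uint64_t toy_walk(const struct toy_pt *pt, uint64_t addr)
{
    uint64_t off = addr & 0xfff;
    return ((addr >> 12) == pt->in_pfn) ? ((pt->out_pfn << 12) | off) : ~0ULL;
}

int main(void)
{
    struct toy_pt first_level  = { 0x1234, 0x5678 };  /* GVA->GPA, guest-maintained */
    struct toy_pt second_level = { 0x5678, 0x9abc };  /* GPA->HPA, host-maintained */
    uint64_t gva = (0x1234ULL << 12) | 0x42;

    uint64_t gpa = toy_walk(&first_level, gva);
    uint64_t hpa = toy_walk(&second_level, gpa);
    printf("GVA 0x%llx -> GPA 0x%llx -> HPA 0x%llx\n",
           (unsigned long long)gva, (unsigned long long)gpa,
           (unsigned long long)hpa);
    return 0;
}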

e) Recoverable Address Translation Faults Handling Emulation
Recoverable faults are serviced by page requests when the device supports
PRS. For assigned devices, the host IOMMU driver gets page requests from
the pIOMMU. Here, we need a mechanism to drain the page requests from
devices which are assigned to a guest. In this design it is done through
VFIO: page request descriptors are propagated to user space and then
exposed to the guest IOMMU driver. This requires the following support
(see the sketch after this list):
i.  a mechanism to notify the vIOMMU emulator to fetch PRQ descriptors
ii. a notify framework in QEMU to trigger the PRQ descriptor fetching when
    notified by the pIOMMU
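As an illustration of the signalling pattern in items i and ii (not the
actual VFIO/QEMU interface, which is still to be defined), here is a
minimal self-contained sketch that assumes an eventfd as the notification
primitive:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
    int efd = eventfd(0, 0);
    uint64_t signalled, one = 1;

    /* "host side": a page request descriptor was queued by the pIOMMU,
     * signal the vIOMMU emulator. */
    write(efd, &one, sizeof(one));

    /* "vIOMMU emulator side": wake up, then fetch the PRQ descriptor(s)
     * and expose them to the guest IOMMU driver. */
    read(efd, &signalled, sizeof(signalled));
    printf("%llu page request notification(s) pending\n",
           (unsigned long long)signalled);

    close(efd);
    return 0;
}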

f) Non-Recoverable Address Translation Handling Emulation
Non-recoverable fault propagation is similar to that of recoverable
faults. In this design the fault data is propagated to user space.

RE: [RFC PATCH 00/30] Add PCIe SVM support to ARM SMMUv3

2017-03-06 Thread Liu, Yi L


> -----Original Message-----
> From: Jean-Philippe Brucker
> Sent: Tuesday, February 28, 2017 3:54 AM
> Subject: [RFC PATCH 00/30] Add PCIe SVM support to ARM SMMUv3
> 
> Hi,
> 
> This series adds support for PCI ATS, PRI and PASID extensions to the
> SMMUv3 driver. In systems that support it, it is now possible for some
> high-end devices to perform DMA into process address spaces. Page tables
> are shared between MMU and SMMU; page faults from devices are recoverable
> and handled by the mm subsystem.
> 
> We propose an extension to the IOMMU API that unifies existing SVM
> implementations (AMD, Intel and ARM) in patches 22 and 24. Nothing is set
> in stone, the goal is to start discussions and find an intersection
> between implementations.
> 
> We also propose a VFIO interface in patches 29 and 30, that allows
> userspace device drivers to make use of SVM. It would also serve as
> example implementation for other device drivers.
> 
> Overview of the patches:
> 
> * 1 and 2 prepare the SMMUv3 structures for ATS,
> * 3 to 5 enable ATS for devices that support it.
> * 6 to 10 prepare the SMMUv3 structures for PASID and PRI. Patch 9,
>   in particular, provides details on the structure requirements.
> * 11 introduces an interface for sharing ASIDs on ARM64,
> * 12 to 17 add more infrastructure for sharing page tables,
> * 18 and 19 add minor helpers to PCI,
> * 20 enables PASID in devices that support it,

Jean, supposedly you will introduce a PASID management mechanism in the
SMMUv3 driver. Here I have a question about PASID management on ARM:
will there be a system-wide PASID table, or is there an equivalent
implementation?

Thanks,
Yi L 

> * 21 enables PRI and adds device fault handler,
> * 22 and 24 draft a possible interface for SVM in the IOMMU API
> * 23 and 25-28 finalize support for SVM in SMMUv3
> * 29 and 30 draft a possible interface for SVM in VFIO.
> 
> The series is available on git://linux-arm.org/linux-jpb.git svm/rfc1.
> Enable CONFIG_PCI_PASID, CONFIG_PCI_PRI and you should be good to go.
> 
> So far, this has only been tested with a software model of an SMMUv3 and
> a PCIe DMA engine. We don't intend to get this merged until it has been
> tested on silicon, but at least the driver implementation should be
> mature enough. I might split next versions depending on what is ready and
> what needs more work so we can merge it progressively.
> 
> A lot of open questions remain:
> 
> 1. Can we declare that PASID 0 is always invalid?
> 
> 2. For this prototype, I kept the interface simple from an implementation
>    perspective. At the moment it is "bind this device to that address
>    space". For consistency with the rest of VFIO and IOMMU, I think "bind
>    this container to that address space" would be more in line with VFIO,
>    and "bind that group to that address space" more in line with IOMMU.
>    VFIO would tell the IOMMU "for all groups in this container, bind to
>    that address space".
>    This raises the question of inconsistency between device capabilities.
>    When adding a device that supports less PASID bits to a group, what do
>    we do? What if we already allocated a PASID that is out of range for
>    the new device?
> 
> 3. How do we reconcile the IOMMU fault reporting infrastructure with the
>    SVM interface?
> 
> 4. SVM is the product of two features: handling device faults, and devices
>    having multiple address spaces. What about one feature without the
>    other?
>    a. If we cannot afford to have a device fault, can we at least share a
>       pinned address space? Pinning all current memory would be done by
>       vfio, but there also needs to be pinning of all future mappings.
>       (mlock isn't sufficient, still allows for minor faults.)
>    b. If the device has a single address space, can we still bind it to a
>       process? The main issue with unifying DMA and process page tables is
>       reserved regions on the device side. What do we do if, for instance,
>       an MSI frame address clashes with a process mapping? Or if a
>       process mapping exists outside of the device's DMA window?
> 
> Please find more details in the IOMMU API and VFIO patches.
> 
> Thanks,
> Jean-Philippe
> 

RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-21 Thread Liu, Yi L
Hi Jean,

I'm working on virtual SVM, and have some comments on the VFIO channel
definition.

> -----Original Message-----
> From: Jean-Philippe Brucker
> Sent: Tuesday, February 28, 2017 3:55 AM
> Subject: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> 
> Add two new ioctls for VFIO devices. VFIO_DEVICE_BIND_TASK creates a bond
> between a device and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables
> are shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used
> to handle faults. VFIO_DEVICE_UNBIND_TASK removes a bond identified by a
> PASID.
> 
> Also add a capability flag in device info to detect whether the system
> and the device support SVM.
> 
> Users need to specify the state of a PASID when unbinding, with flags
> VFIO_PASID_RELEASE_FLUSHED and VFIO_PASID_RELEASE_CLEAN. Even for PCI,
> PASID invalidation is specific to each device and only partially covered
> by the specification:
> 
> * Device must have an implementation-defined mechanism for stopping the
>   use of a PASID. When this mechanism finishes, the device has stopped
>   issuing transactions for this PASID and all transactions for this PASID
>   have been flushed to the IOMMU.
> 
> * Device may either wait for all outstanding PRI requests for this PASID
>   to finish, or issue a Stop Marker message, a barrier that separates PRI
>   requests affecting this instance of the PASID from PRI requests
>   affecting the next instance. In the first case, we say that the PASID is
>   "clean", in the second case it is "flushed" (and the IOMMU has to wait
>   for the Stop Marker before reassigning the PASID.)
> 
> We expect similar distinctions for platform devices. Ideally there should
> be a callback for each PCI device, allowing the IOMMU to ask the device
> to stop using a PASID. When the callback returns, the PASID is either
> flushed or clean and the return value tells which.
> 
> For the moment I don't know how to implement this callback for PCI, so if
> the user forgets to call unbind with either "clean" or "flushed", the
> PASID is never reused. For platform devices, it might be simpler to
> implement since we could associate an invalidate_pasid callback to a DT
> compatible string, as is currently done for reset.
> 
> Signed-off-by: Jean-Philippe Brucker 

[...]

>  drivers/vfio/pci/vfio_pci.c |  24 ++
>  drivers/vfio/vfio.c | 104 
> 
>  include/uapi/linux/vfio.h   |  55 +++
>  3 files changed, 183 insertions(+)
> 
...
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index
> 519eff362c1c..3fe4197a5ea0 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -198,6 +198,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_PCI(1 << 1)/* vfio-pci device */
>  #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)  /* vfio-platform device */
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3) /* vfio-amba device */
> +#define VFIO_DEVICE_FLAGS_SVM(1 << 4)/* Device supports 
> bind/unbind */
>   __u32   num_regions;/* Max region index + 1 */
>   __u32   num_irqs;   /* Max IRQ index + 1 */
>  };
> @@ -409,6 +410,60 @@ struct vfio_irq_set {
>   */
>  #define VFIO_DEVICE_RESET_IO(VFIO_TYPE, VFIO_BASE + 11)
> 
> +struct vfio_device_svm {
> + __u32   argsz;
> + __u32   flags;
> +#define VFIO_SVM_PASID_RELEASE_FLUSHED   (1 << 0)
> +#define VFIO_SVM_PASID_RELEASE_CLEAN (1 << 1)
> + __u32   pasid;
> +};

For the virtual SVM work, the VFIO channel would be used to pass down the
guest PASID table pointer and invalidation information, and may have
further usages beyond the above.

Here is the virtual SVM design doc which illustrates the VFIO usage.
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

For the guest PASID table pointer passdown, I have the following message in
pseudo code.
struct pasid_table_info {
__u64 ptr;
__u32 size;
 };

For invalidation, I have the following info in pseudo code.
struct iommu_svm_tlb_invalidate_info
{
   __u32 inv_type;
#define IOTLB_INV   (1 << 0)
#define EXTENDED_IOTLB_INV  (1 << 1)
#define DEVICE_IOTLB_INV(1 << 2)
#define EXTENDED_DEVICE_IOTLB_INV   (1 << 3)
#define PASID_CACHE_INV (1 << 4)
   __u32 pasid;
   __u64 addr

RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-23 Thread Liu, Yi L
Hi Jean,

Thanks for the excellent ideas. Please refer to my comments inline.

[...]

> > Hi Jean,
> >
> > I'm working on virtual SVM, and have some comments on the VFIO channel
> > definition.
> 
> Thanks a lot for the comments, this is quite interesting to me. I just have 
> some
> concerns about portability so I'm proposing a way to be slightly more generic 
> below.
> 

Yes, portability is what we need to consider.

[...]

> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index
> >> 519eff362c1c..3fe4197a5ea0 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h
> >> @@ -198,6 +198,7 @@ struct vfio_device_info {
> >>  #define VFIO_DEVICE_FLAGS_PCI (1 << 1)/* vfio-pci device */
> >>  #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)   /* vfio-platform device 
> >> */
> >>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)  /* vfio-amba device */
> >> +#define VFIO_DEVICE_FLAGS_SVM (1 << 4)/* Device supports 
> >> bind/unbind */
> >>__u32   num_regions;/* Max region index + 1 */
> >>__u32   num_irqs;   /* Max IRQ index + 1 */
> >>  };
> >> @@ -409,6 +410,60 @@ struct vfio_irq_set {
> >>   */
> >>  #define VFIO_DEVICE_RESET _IO(VFIO_TYPE, VFIO_BASE + 11)
> >>
> >> +struct vfio_device_svm {
> >> +  __u32   argsz;
> >> +  __u32   flags;
> >> +#define VFIO_SVM_PASID_RELEASE_FLUSHED(1 << 0)
> >> +#define VFIO_SVM_PASID_RELEASE_CLEAN  (1 << 1)
> >> +  __u32   pasid;
> >> +};
> >
> > For virtual SVM work, the VFIO channel would be used to passdown guest
> > PASID tale PTR and invalidation information. And may have further
> > usage except the above.
> >
> > Here is the virtual SVM design doc which illustrates the VFIO usage.
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >
> > For the guest PASID table ptr passdown, I've following message in pseudo 
> > code.
> > struct pasid_table_info {
> > __u64 ptr;
> > __u32 size;
> >  };
> 
> There should probably be a way to specify the table format, so that the
> pIOMMU driver can check that it recognizes the format used by the vIOMMU
> before attaching it. This would allow to reuse the structure for other
> IOMMU architectures. If, for instance, the host has an intel IOMMU and
> someone decides to emulate an ARM SMMU with Qemu (their loss :), it can
> certainly use VFIO for passing-through devices with MAP/UNMAP. But if
> Qemu then attempts to passdown a PASID table in SMMU format, the Intel
> driver should have a way to reject it, as the SMMU format isn't
> compatible.

Exactly, it would be great if we can have the API defined as generic as
MAP/UNMAP. The case you mentioned, emulating an ARM SMMU on an Intel
platform, is representative. For such cases, the problem is that different
vendors may have different PASID table formats and also different page
table formats. In my understanding, these incompatibilities may just
result in failure if users try such emulation. What's your opinion here?
Anyhow, better to listen to different voices.

> 
> I'm tackling a similar problem at the moment, but for passing a single
> page directory instead of full PASID table to the IOMMU.

For Intel IOMMU, passing the whole guest PASID table is enough and it also
avoids too much pgd passing. However, I'm open to this idea. You may just
add a new flag in "struct vfio_device_svm" and pass the single pgd down to
the host.

> 
> So we need some kind of high-level classification that the vIOMMU must
> communicate to the physical one. Each IOMMU flavor would get a unique,
> global identifier, simply to make sure that vIOMMU and pIOMMU speak the
> same language. For example:
> 
> 0x65776886 "AMDV" AMD IOMMU
> 0x73788476 "INTL" Intel IOMMU
> 0x83515748 "S390" s390 IOMMU
> 0x8385 "SMMU" ARM SMMU
> etc.
> 
> It needs to be a global magic number that everyone can recognize. Could
> be as simple as 32-bit numbers allocated from 0. Once we have a global
> magic number, we can use it to differentiate architecture-specific
> details.

I may need to think more on this part.
 
> struct pasid_table_info {
>   __u64 ptr;
>   __u64 size; /* Is it number of entry or size in
>  bytes? */

For the Intel platform, it's encoded, but I can make it in bytes. Here, I'd
like to check with you whether the whole guest PASID table info is also
needed on ARM?

> 
>   __u32 model;/* magic number */
>   __u32 variant;  /* version of the IOMMU architecture,
>  maybe? IOMMU-specific. */
>   __u8 opaque[];  /* IOMMU-specific details */
> };
> 
> And then each IOMMU or page-table code can do low-level validation of
> the format, by reading the details in 'opaque'. I assume that for Intel
> this would be empty. But

Yes, for Intel, if the PASID table ptr is in the definition, opaque would be empty.

> for instance on ARM SMMUv3, PASID table can have either one or two levels, and
>

RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-24 Thread Liu, Yi L
> -----Original Message-----
> From: Jean-Philippe Brucker
> Sent: Thursday, March 23, 2017 9:38 PM
> To: Liu, Yi L; Alex Williamson
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> 
> On 23/03/17 08:39, Liu, Yi L wrote:
> > Hi Jean,
> > [...]
> 
> Yes, in case the vIOMMU and 

RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-28 Thread Liu, Yi L
> -----Original Message-----
> From: Jean-Philippe Brucker
> Sent: Monday, March 27, 2017 6:14 PM
> To: Liu, Yi L; Alex Williamson
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> 
> On 24/03/17 07:46, Liu, Yi L wrote:
> [...]
> >>>>
> >>>> So we need some kind of high-level classification that the vIOMMU
> >>>> must communicate to the physical one. Each IOMMU flavor would get a
> >>>> unique, global identifier, simply to make sure that vIOMMU and
> >>>> pIOMMU speak
> >> the same language.
> >>>> For example:
> >>>>
> >>>> 0x65776886 "AMDV" AMD IOMMU
> >>>> 0x73788476 "INTL" Intel IOMMU
> >>>> 0x83515748 "S390" s390 IOMMU
> >>>> 0x8385 "SMMU" ARM SMMU
> >>>> etc.
> >>>>
> >>>> It needs to be a global magic number that everyone can recognize.
> >>>> Could be as simple as 32-bit numbers allocated from 0. Once we have
> >>>> a global magic number, we can use it to differentiate 
> >>>> architecture-specific
> details.
> >
> > I prefer simple numbers to stand for each vendor.
> 
> Sure, I don't have any preference. Simple numbers could be easier to allocate.
> 
> >>> I may need to think more on this part.
> >>>
> >>>> struct pasid_table_info {
> >>>>  __u64 ptr;
> >>>>  __u64 size; /* Is it number of entry or size in
> >>>> bytes? */
> >>>
> >>> For Intel platform, it's encoded. But I can make it in bytes. Here,
> >>> I'd like to check with you if whole guest PASID info is also needed on 
> >>> ARM?
> >>
> >> It will be needed on ARM if someone ever emulates the SMMU with SVM.
> >> Though I'm not planning on doing that myself, it is unavoidable. And
> >> it would be a shame for the next SVM virtualization solution to have
> >> to introduce a new flag "VFIO_SVM_BIND_PASIDPT_2" if they could reuse
> >> most of the BIND_PASIDPT interface but simply needed to add one or
> >> two configuration fields specific to their IOMMU.
> >
> > So you are totally fine with putting PASID table ptr and size in the
> > generic part? Maybe we have different usage for it. For me, it's a
> > guest PASID table ptr. For you, it may be different.
> 
> It's the same for SMMU, with some added format specifiers that would go in
> 'opaque[]'. I think that table pointer and size (in bytes, or number of
> entries) is generic enough for a "bind table" call and can be reused by future
> implementations.
> 
> >>>>
> >>>>  __u32 model;/* magic number */
> >>>>  __u32 variant;  /* version of the IOMMU architecture,
> >>>> maybe? IOMMU-specific. */
> >
> > For variant, it will be combined with model to do sanity check. Am I right?
> > Maybe it could be moved to opaque.
> 
> Yes I guess it could be moved to opaque. It would be a version of the
> model used, so we wouldn't have to allocate a new model number whenever
> an architecture updates the fields of its PASID descriptors, but we can
> let IOMMU drivers decide if they need it and what to put in there.
> 
> >>>>  __u8 opaque[];  /* IOMMU-specific details */
> >>>> };
> >>>>
> [...]
> >>
> >> Yes, that seems sensible. I could add an explicit VFIO_BIND_PASID
> >> flags to make it explicit that data[] is "u32 pasid" and avoid having any 
> >> default.
> >
> > Add it in the comment I suppose. The length is 4 byes, it could be deduced 
> > from
> argsz.
> >
> >>
> >>>>
> >>>>> #define VFIO_SVM_PASSDOWN_INVALIDATE(1 << 1)
> >>>>
> >>>> Using the vfio_device_svm structure for invalidate operations is a
> >>>> bit odd, it might be nicer to add a new VFIO_SVM_INVALIDATE 

[RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory

2017-04-26 Thread Liu, Yi L
entry

Run-time:
(4) Forward guest cache invalidation requests for 1st level translation to
pIOMMU
(5) Fault reporting, reports fault happen on host to intel_iommu emulator,
then to guest
(6) Page Request and response

As the fault reporting framework is under discussion in another thread
driven by Lan Tianyu, the vSVM enabling plan is to divide the work into two
phases. This patchset is for Phase 1.

Phase 1: includes items (1), (2) and (3).
Phase 2: includes items (4), (5) and (6).


[Overview of patch]
This patchset requires Passthru-Mode support in intel_iommu. Peter Xu has
sent a patch for it.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg443627.html

* 1 ~ 2 enables Extend-Context Support in intel_iommu emulator.
* 3 exposes SVM related capability to guest with an option.
* 4 changes VFIO notifier parameter for the newly added notifier.
* 5 ~ 6 adds new VFIO notifier for pasid table bind request.
* 7 ~ 8 adds notifier flag check in memory_replay and region_del.
* 9 ~ 11 introduces a mechanism between VFIO and intel_iommu emulator
  to record assigned device info. e.g. the host SID of the assigned
  device.
* 12 adds fire function for pasid table bind notifier
* 13 adds generic definition for pasid table info in iommu.h
* 14 ~ 15 link the guest pasid table to host for intel_iommu
* 16 adds VFIO notifier for propagating guest IOMMU TLB invalidate
  to host.
* 17 adds fire function for IOMMU TLB invalidate notifier
* 18 ~ 20 propagate first-level page table related cache invalidate
  to host.

[Test Done]
The patchset has been tested with IGD. With IGD assigned to the guest, the
IGD could write data into a guest application's address space.

i915 SVM capable driver could be found:
https://cgit.freedesktop.org/~miku/drm-intel/?h=svm

i915 svm test tool:
https://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=svm


[Co-work with gIOVA enablement]
Currently Peter Xu is working on enabling gIOVA usage for the Intel IOMMU
emulator; this patchset is based on Peter's work (V7).
https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

[Limitation]
* Due to a VT-d HW limitation, an assigned device cannot use gIOVA and
vSVM at the same time. As a short-term solution, the Intel VT-d spec will
introduce a new capability bit indicating this limitation, which the guest
IOMMU driver can check to prevent IOVA and SVM from being enabled together.
In the long term it will be fixed in HW.

[Open]
* This patchset proposes passing raw data from guest to host when
propagating the guest IOMMU TLB invalidation.

In fact, we have two choice here.

a) As proposed in this patchset, pass raw data to the host. The host pIOMMU
   driver submits the invalidation request after replacing specific fields,
   and rejects it if the IOMMU model is not correct.
   * Pros: no need to parse and re-assemble, better performance
   * Cons: unable to support scenarios which emulate an Intel IOMMU
   on an ARM platform.
b) Parse the invalidation info into specific data, e.g. gran, addr,
   size, invalidation type etc., then fill the data into a generic
   structure. In the host, the pIOMMU driver re-assembles the invalidation
   request and submits it to the pIOMMU.
   * Pros: may be able to support the scenario above. But it is still in
   question since different vendors may have vendor-specific invalidation
   info. This would make it difficult to have a vendor-agnostic
   invalidation propagation API.

   * Cons: needs additional complexity to parse and re-assemble. The
   generic structure would be a super-set of all possible invalidate info,
   which may be hard to maintain in the future.

As the pros/cons show, I propose a) as an initial version. But it is an
open question; I would be glad to hear from you.

FYI, the following definition is a draft discussed with Jean previously.
It has both a generic part and a vendor-specific part.

struct tlb_invalidate_info
{
__u32   model;  /* Vendor number */
__u8 granularity;
#define DEVICE_SELECTIVE_INV (1 << 0)
#define PAGE_SELECTIVE_INV   (1 << 1)
#define PASID_SELECTIVE_INV  (1 << 2)
__u32 pasid;
__u64 addr;
__u64 size;

/* Since IOMMU format has already been validated for this table,
   the IOMMU driver knows that the following structure is in a
   format it knows */
__u8 opaque[];
};

struct tlb_invalidate_info_intel
{
__u32 inv_type;
...
__u64 flags;
...
__u8 mip;
__u16 pfsid;
};
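
To make the intended use of the model/opaque split concrete, here is a
hedged host-side sketch. handle_tlb_invalidate() is a hypothetical helper,
and INTEL_IOMMU stands for the Intel model number as defined later in this
series; nothing here is an existing kernel interface.

#include <errno.h>

/* Hypothetical host-side handling of the draft structures above. The
 * generic fields are checked first; the vendor-specific tail in opaque[]
 * is interpreted only when the model matches the physical IOMMU. */
static int handle_tlb_invalidate(struct tlb_invalidate_info *info)
{
    struct tlb_invalidate_info_intel *intel;

    if (info->model != INTEL_IOMMU)
        return -EINVAL;   /* vIOMMU format unknown to this pIOMMU */

    intel = (struct tlb_invalidate_info_intel *)info->opaque;

    /* Here the pIOMMU driver would replace guest-visible fields
     * (e.g. pfsid) with host values and queue the invalidation
     * descriptor to the physical IOMMU. */
    (void)intel;
    return 0;
}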

Additionally, Jean is proposing a para-virtualized IOMMU solution. There is
opaque data in the proposed invalidate request VIRTIO_IOMMU_T_INVALIDATE,
so it may be preferable to have an opaque part when doing the IOMMU TLB
invalidate propagation in SVM virtualization.

http://www.spinics.net/lists/kvm/msg147993.html

Best Wishes,
Yi L


Liu, Yi L (20):
  intel_iommu: add "ecs" option
  intel_iommu: exposed extended-context mode to guest
  intel_iommu: add "svm" option

[RFC PATCH 05/20] VFIO: add new IOCTL for svm bind tasks

2017-04-26 Thread Liu, Yi L
Add a new IOCTL cmd VFIO_IOMMU_SVM_BIND_TASK attached to container->fd.

On VT-d, this IOCTL cmd would be used to link the guest PASID page table
to the host, while for other vendors it may also be used to support other
kinds of SVM bind requests. Previously, there was a discussion on it with
ARM engineers; it can be found via the link below. This IOCTL cmd may
support SVM PASID bind requests from a userspace driver, or page table
(cr3) bind requests from a guest. These SVM bind requests would be
supported by adding different flags, e.g. VFIO_SVM_BIND_PASID is added to
support PASID bind from a userspace driver, and VFIO_SVM_BIND_PGTABLE is
added to support page table bind from a guest.

https://patchwork.kernel.org/patch/9594231/

Signed-off-by: Liu, Yi L 
---
 linux-headers/linux/vfio.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..9848d63 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -537,6 +537,24 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+   __u32   argsz;
+#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
+   __u32   flags;
+   __u32   length;
+   __u8data[];
+};
+
+#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
+   VFIO_SVM_BIND_PASID | \
+   VFIO_SVM_BIND_PGTABLE )
+
+#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
+
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
1.9.1
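
For illustration, here is a minimal userspace sketch of how this ioctl
could be used together with struct pasid_table_info (defined later in this
series) to bind a guest PASID table. The helper below is hypothetical and
not part of the patch; it only assumes the definitions posted in this
series are available in the installed headers.

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include <linux/iommu.h>

/* Hypothetical helper: bind a guest PASID table through the container fd.
 * pasidt_gpa/pasidt_size come from the shadowed extended context entry. */
static int svm_bind_pasid_table(int container_fd, __u64 pasidt_gpa,
                                __u64 pasidt_size)
{
    struct pasid_table_info pt = {
        .ptr   = pasidt_gpa,      /* guest PASID table pointer (GPA) */
        .size  = pasidt_size,     /* table size in bytes */
        .model = INTEL_IOMMU,     /* vendor identifier */
    };
    __u32 argsz = sizeof(struct vfio_device_svm) + sizeof(pt);
    struct vfio_device_svm *svm = calloc(1, argsz);
    int ret;

    if (!svm)
        return -1;
    svm->argsz  = argsz;
    svm->flags  = VFIO_SVM_BIND_PASIDTBL;
    svm->length = sizeof(pt);
    memcpy(svm->data, &pt, sizeof(pt));

    ret = ioctl(container_fd, VFIO_IOMMU_SVM_BIND_TASK, svm);
    free(svm);
    return ret;
}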



[RFC PATCH 13/20] IOMMU: add pasid_table_info for guest pasid table

2017-04-26 Thread Liu, Yi L
This patch adds iommu.h to define some generic definitions for the IOMMU.

It defines "struct pasid_table_info" for the guest PASID table bind.

Signed-off-by: Liu, Yi L 
---
 linux-headers/linux/iommu.h | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 000..4519dcf
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2017 Intel Corporation.
+ * Author: Yi Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef __LINUX_IOMMU_H
+#define __LINUX_IOMMU_H
+
+#include 
+
+struct pasid_table_info {
+   __u64  ptr; /* PASID table ptr */
+   __u64  size;/* PASID table size*/
+   __u32  model;   /* magic number */
+#define INTEL_IOMMU    (1 << 0)
+#define ARM_SMMU       (1 << 1)
+   __u8   opaque[];/* IOMMU-specific details */
+};
+
+#endif /* __LINUX_IOMMU_H */
-- 
1.9.1



[RFC PATCH 04/20] Memory: modify parameter in IOMMUNotifier func

2017-04-26 Thread Liu, Yi L
This patch modifies the parameter of IOMMUNotifier to use "void *data"
instead of "IOMMUTLBEntry *". This is to extend it to support notifiers
other than MAP/UNMAP.

Signed-off-by: Liu, Yi L 
---
 hw/vfio/common.c  | 3 ++-
 hw/virtio/vhost.c | 3 ++-
 include/exec/memory.h | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6b33b9f..14473f1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -332,10 +332,11 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
 return true;
 }
 
-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
 VFIOContainer *container = giommu->container;
+IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
 hwaddr iova = iotlb->iova + giommu->iommu_offset;
 bool read_only;
 void *vaddr;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ccf8b2e..fd20fd0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1161,9 +1161,10 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
 event_notifier_cleanup(&vq->masked_notifier);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_unmap_notify(IOMMUNotifier *n, void *data)
 {
 struct vhost_dev *hdev = container_of(n, struct vhost_dev, n);
+IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
 
 if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev,
iotlb->iova,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 267f399..1faca3b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -81,7 +81,7 @@ typedef enum {
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-IOMMUTLBEntry *data);
+void *data);
 
 struct IOMMUNotifier {
 IOMMUNotify notify;
-- 
1.9.1



[RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-26 Thread Liu, Yi L
Add a separate function to fire the PASID table bind notifier. In the
future there may be more PASID bind types with different granularity, e.g.
binding a PASID entry instead of binding a PASID table. That can be
supported by adding a bind_type, checking the bind_type in the fire
function and triggering the correct notifier.

Signed-off-by: Liu, Yi L 
---
 include/exec/memory.h | 11 +++
 memory.c  | 21 +
 2 files changed, 32 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 49087ef..3b8f487 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -695,6 +695,17 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
 void memory_region_notify_iommu(MemoryRegion *mr,
 IOMMUTLBEntry entry);
 
+/*
+ * memory_region_notify_iommu_svm_bind notify SVM bind
+ * request from vIOMMU emulator.
+ *
+ * @mr: the memory region of IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+ void *data);
+
+
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
  *   entry to a single notifier
diff --git a/memory.c b/memory.c
index 45ef069..ce0b0ff 100644
--- a/memory.c
+++ b/memory.c
@@ -1729,6 +1729,27 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 }
 }
 
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+ void *data)
+{
+IOMMUNotifier *iommu_notifier;
+IOMMUNotifierFlag request_flags;
+
+assert(memory_region_is_iommu(mr));
+
+/* TODO: support other bind requests with smaller granularity,
+ * e.g. bind a single pasid entry
+ */
+request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
+
+QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+if (iommu_notifier->notifier_flags & request_flags) {
+iommu_notifier->notify(iommu_notifier, data);
+break;
+}
+}
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
1.9.1



[RFC PATCH 10/20] VFIO: notify vIOMMU emulator when device is assigned

2017-04-26 Thread Liu, Yi L
With a vIOMMU exposed to the guest, notify the vIOMMU emulator to record
information about the assigned device. This patch adds
iommu_ops->record_device to record the host bus/slot/function for this
device. In the future, it can be extended to other info which is needed.

Signed-off-by: Liu, Yi L 
---
 hw/vfio/pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9e13472..a1e6942 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2881,6 +2881,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
subregion,
0,
&n1);
+
+memory_region_notify_device_record(subregion,
+   &vdev->host);
+
 }
 }
 
-- 
1.9.1



[RFC PATCH 15/20] intel_iommu: link whole guest pasid table to host

2017-04-26 Thread Liu, Yi L
VT-d has a nested mode which allows SVM virtualization. Link the whole
guest PASID table to the host context entry and enable nested mode; the
pIOMMU would then do nested translation for DMA requests, thus achieving
GVA->HPA translation.

When the extended context entry is modified in the guest, the intel_iommu
emulator should capture it, then link the whole guest PASID table to the
host and enable nested mode for the assigned device.

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 121 +++--
 hw/i386/intel_iommu_internal.h |  11 
 2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f291995..cd6db65 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -36,6 +36,7 @@
 #include "hw/i386/apic_internal.h"
 #include "kvm_i386.h"
 #include "trace.h"
+#include 
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -55,6 +56,14 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+typedef void (*vtd_device_hook)(VTDNotifierIterator *iter,
+void *hook_info,
+void *notify_info);
+
+static void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
+void *hook_info,
+void *notify_info);
+
 #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
__opaque_type, \
__hook_info, \
@@ -1213,6 +1222,66 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
 }
 }
 
+void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
+ void *hook_info,
+ void *notify_info)
+{
+struct pasid_table_info *pasidt_info;
+IOMMUNotifierData iommu_data;
+VTDContextHookInfo *context_hook_info;
+uint16_t *host_sid;
+pasidt_info = (struct pasid_table_info *) notify_info;
+context_hook_info = (VTDContextHookInfo *) hook_info;
+switch (context_hook_info->gran) {
+case VTD_INV_DESC_CC_GLOBAL:
+/* Fall through */
+case VTD_INV_DESC_CC_DOMAIN:
+if (iter->did == *context_hook_info->did) {
+break;
+}
+/* Fall through */
+case VTD_INV_DESC_CC_DEVICE:
+if ((iter->did == *context_hook_info->did) &&
+(iter->sid == *context_hook_info->sid)) {
+break;
+}
+/* Fall through */
+default:
+return;
+}
+
+pasidt_info->model = INTEL_IOMMU;
+host_sid = (uint16_t *)&pasidt_info->opaque;
+
+pasidt_info->ptr = iter->ce[1].lo;
+pasidt_info->size = iter->ce[1].lo & VTD_PASID_TABLE_SIZE_MASK;
+*host_sid = iter->host_sid;
+iommu_data.payload = (uint8_t *) pasidt_info;
+iommu_data.payload_size = sizeof(*pasidt_info) + sizeof(*host_sid);
+memory_region_notify_iommu_svm_bind(&iter->vtd_as->iommu,
+&iommu_data);
+return;
+}
+
+static void vtd_context_cache_invalidate_notify(IntelIOMMUState *s,
+uint16_t *did,
+uint16_t *sid,
+uint8_t gran,
+vtd_device_hook hook_fn)
+{
+VTDContextHookInfo context_hook_info = {
+.did = did,
+.sid = sid,
+.gran = gran,
+};
+
+FOR_EACH_ASSIGN_DEVICE(struct pasid_table_info,
+   uint16_t,
+   &context_hook_info,
+   hook_fn);
+return;
+}
+
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
 trace_vtd_inv_desc_cc_global();
@@ -1228,8 +1297,35 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
  * VT-d emulation codes.
  */
 vtd_iommu_replay_all(s);
+
+if (s->svm) {
+vtd_context_cache_invalidate_notify(s, NULL, NULL,
+VTD_INV_DESC_CC_GLOBAL, vtd_context_inv_notify_hook);
+}
 }
 
+static void vtd_context_domain_selective_invalidate(IntelIOMMUState *s,
+uint16_t did)
+{
+trace_vtd_inv_desc_cc_global();
+s->context_cache_gen++;
+if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) {
+vtd_reset_context_cache(s);
+}
+/*
+ * From VT-d spec 6.5.2.1, a global context entry invalidation
+ * should be followed by a IOTLB global invalidation, so we should
+ * be safe even without this. Hoewever, let's replay the region as
+ * well to be safer, and go back here when we need finer tunes for
+ * VT-d emulation codes.
+ */
+vtd_iommu_replay_all(s);
+
+if (s->svm) {
+  

[RFC PATCH 08/20] Memory: add notifier flag check in memory_replay()

2017-04-26 Thread Liu, Yi L
memory_region_iommu_replay is used to do replay with the MAP/UNMAP
notifier. However, other notifiers may be passed in, so add a check against
the notifier flags to avoid potential errors; e.g.
memory_region_iommu_replay_all loops over all registered notifiers and may
pass in the wrong notifier.

Signed-off-by: Liu, Yi L 
---
 memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/memory.c b/memory.c
index 9c253cc..0728e62 100644
--- a/memory.c
+++ b/memory.c
@@ -1630,6 +1630,14 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
 hwaddr addr, granularity;
 IOMMUTLBEntry iotlb;
 
+if (!(n->notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP)) {
+/* If notifier flag is not IOMMU_NOTIFIER_UNMAP or
+ * IOMMU_NOTIFIER_MAP, return. This check is necessary
+ * as there is notifier other than MAP/UNMAP
+ */
+return;
+}
+
 /* If the IOMMU has its own replay callback, override */
 if (mr->iommu_ops->replay) {
 mr->iommu_ops->replay(mr, n);
-- 
1.9.1



[RFC PATCH 17/20] Memory: Add func to fire TLB invalidate notifier

2017-04-26 Thread Liu, Yi L
This patch adds a separate function to fire IOMMU TLB invalidate notifier.

Signed-off-by: Liu, Yi L 
---
 include/exec/memory.h |  9 +
 memory.c  | 18 ++
 2 files changed, 27 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index af15351..0155bad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -707,6 +707,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
  void *data);
 
+/*
+ * memory_region_notify_iommu_invalidate: notify IOMMU
+ * TLB invalidation passdown.
+ *
+ * @mr: the memory region of IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+   void *data);
 
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
diff --git a/memory.c b/memory.c
index ce0b0ff..8c572d5 100644
--- a/memory.c
+++ b/memory.c
@@ -1750,6 +1750,24 @@ void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
 }
 }
 
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+   void *data)
+{
+IOMMUNotifier *iommu_notifier;
+IOMMUNotifierFlag request_flags;
+
+assert(memory_region_is_iommu(mr));
+
+request_flags = IOMMU_NOTIFIER_IOMMU_TLB_INV;
+
+QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+if (iommu_notifier->notifier_flags & request_flags) {
+iommu_notifier->notify(iommu_notifier, data);
+break;
+}
+}
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
1.9.1



[RFC PATCH 06/20] VFIO: add new notifier for binding PASID table

2017-04-26 Thread Liu, Yi L
This patch includes the following items:

* add vfio_register_notifier() for vfio notifier initialization
* add new notifier flag IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4
* add vfio_iommu_bind_pasid_tbl_notify() to link guest pasid table
  to host

This patch doesn't register the new notifier in the vfio memory region
listener region_add callback. The reason is as follows:

On VT-d, when a virtual intel_iommu is exposed to the guest, the vfio
memory listener listens to address_space_memory. When the guest Intel IOMMU
driver enables address translation, the vfio memory listener may switch to
listening to vtd_address_space. But there is a special case: if the virtual
intel_iommu reports ecap.PT=1 to the guest and meanwhile the guest Intel
IOMMU driver sets "pt" mode for the assigned device, the vfio memory
listener keeps listening to address_space_memory to make sure there is a
GPA->HPA mapping in the pIOMMU. Thus region_add would not be triggered. The
newly added notifier, however, needs to be registered once the virtual
intel_iommu is exposed to the guest.

Signed-off-by: Liu, Yi L 
---
 hw/vfio/common.c  | 37 +++---
 hw/vfio/pci.c | 53 ++-
 include/exec/memory.h |  8 +++
 include/hw/vfio/vfio-common.h |  5 
 4 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 14473f1..e270255 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -294,6 +294,25 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
section->offset_within_address_space & (1ULL << 63);
 }
 
+VFIOGuestIOMMU *vfio_register_notifier(VFIOContainer *container,
+   MemoryRegion *mr,
+   hwaddr offset,
+   IOMMUNotifier *n)
+{
+VFIOGuestIOMMU *giommu;
+
+giommu = g_malloc0(sizeof(*giommu));
+giommu->iommu = mr;
+giommu->iommu_offset = offset;
+giommu->container = container;
+giommu->n = *n;
+
+QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+return giommu;
+}
+
 /* Called with rcu_read_lock held.  */
 static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
bool *read_only)
@@ -466,6 +485,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
 if (memory_region_is_iommu(section->mr)) {
 VFIOGuestIOMMU *giommu;
+IOMMUNotifier n;
+hwaddr iommu_offset;
 
 trace_vfio_listener_region_add_iommu(iova, end);
 /*
@@ -474,21 +495,21 @@ static void vfio_listener_region_add(MemoryListener *listener,
  * would be the right place to wire that up (tell the KVM
  * device emulation the VFIO iommu handles to use).
  */
-giommu = g_malloc0(sizeof(*giommu));
-giommu->iommu = section->mr;
-giommu->iommu_offset = section->offset_within_address_space -
-   section->offset_within_region;
-giommu->container = container;
+iommu_offset = section->offset_within_address_space -
+   section->offset_within_region;
 llend = int128_add(int128_make64(section->offset_within_region),
section->size);
 llend = int128_sub(llend, int128_one());
-iommu_notifier_init(&giommu->n, vfio_iommu_map_notify,
+iommu_notifier_init(&n, vfio_iommu_map_notify,
 IOMMU_NOTIFIER_ALL,
 section->offset_within_region,
 int128_get64(llend));
-QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
-memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+giommu = vfio_register_notifier(container,
+section->mr,
+iommu_offset,
+&n);
+
 memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
 
 return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..9e13472 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2594,11 +2594,38 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+VFIOContainer *container = giommu->container;
+IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data;
+struct vfio_device_svm *vfio_svm;
+int argsz;
+
+argsz = sizeof(*vfio_svm) + iommu_data->payload_size;
+vfio_svm = g_malloc0(argsz);
+vfio_svm->argsz =

[RFC PATCH 07/20] VFIO: check notifier flag in region_del()

2017-04-26 Thread Liu, Yi L
This patch adds a flag check when unregistering the MAP/UNMAP notifier in
region_del. The MAP/UNMAP notifier is unregistered when the iommu memory
region is deleted; the check avoids unregistering other notifiers.

Peter Xu's intel_iommu enhancement series has introduced dynamic switching
of the IOMMU region. If an assigned device switches to use "pt", the IOMMU
region is deleted and thus the MAP/UNMAP notifier is unregistered. In some
cases, though, the other notifiers may still be wanted, e.g. if a user
decides to use vSVM for the assigned device after the switch, then the
pasid table bind notifier is needed. The newly added pasid table bind
notifier would be unregistered in vfio_disconnect_container(). The link
below directs you to Peter's dynamic switch patch.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg62.html

Signed-off-by: Liu, Yi L 
---
 hw/vfio/common.c  | 5 +++--
 include/exec/memory.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e270255..719de61 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -501,7 +501,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
section->size);
 llend = int128_sub(llend, int128_one());
 iommu_notifier_init(&n, vfio_iommu_map_notify,
-IOMMU_NOTIFIER_ALL,
+IOMMU_NOTIFIER_MAP_UNMAP,
 section->offset_within_region,
 int128_get64(llend));
 
@@ -578,7 +578,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
 QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
 if (giommu->iommu == section->mr &&
-giommu->n.start == section->offset_within_region) {
+giommu->n.start == section->offset_within_region &&
+giommu->n.notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP) {
 memory_region_unregister_iommu_notifier(giommu->iommu,
 &giommu->n);
 QLIST_REMOVE(giommu, giommu_next);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index d2f24cc..7bd13ab 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -85,7 +85,7 @@ typedef enum {
 IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
 } IOMMUNotifierFlag;
 
-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-- 
1.9.1



[RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest

2017-04-26 Thread Liu, Yi L
VT-d implementations reporting the PASID or PRS fields as "Set" must also
report ecap.ECS as "Set"; Extended-Context is required for SVM.

When ECS is reported, the intel iommu driver initializes the extended root
entry and extended context entry, and also the PASID table if there is any
SVM-capable device.

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 131 +++--
 hw/i386/intel_iommu_internal.h |   9 +++
 include/hw/i386/intel_iommu.h  |   2 +-
 3 files changed, 97 insertions(+), 45 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 400d0d1..bf98fa5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
 return root->val & VTD_ROOT_ENTRY_P;
 }
 
+static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
+{
+return root->rsvd & VTD_ROOT_ENTRY_P;
+}
+
 static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
   VTDRootEntry *re)
 {
@@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
 return -VTD_FR_ROOT_TABLE_INV;
 }
 re->val = le64_to_cpu(re->val);
+if (s->ecs) {
+re->rsvd = le64_to_cpu(re->rsvd);
+}
 return 0;
 }
 
@@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
 return context->lo & VTD_CONTEXT_ENTRY_P;
 }
 
-static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
-   VTDContextEntry *ce)
+static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
+ VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
 {
-dma_addr_t addr;
+dma_addr_t addr, ce_size;
 
 /* we have checked that root entry is present */
-addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
-if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
+ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
+addr = (s->ecs && (index > 0x7f)) ?
+   ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
+   ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
+
+if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
 trace_vtd_re_invalid(root->rsvd, root->val);
 return -VTD_FR_CONTEXT_TABLE_INV;
 }
-ce->lo = le64_to_cpu(ce->lo);
-ce->hi = le64_to_cpu(ce->hi);
+
+ce[0].lo = le64_to_cpu(ce[0].lo);
+ce[0].hi = le64_to_cpu(ce[0].hi);
+
+if (s->ecs) {
+ce[1].lo = le64_to_cpu(ce[1].lo);
+ce[1].hi = le64_to_cpu(ce[1].hi);
+}
+
 return 0;
 }
 
@@ -595,9 +614,11 @@ static inline uint32_t 
vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
 return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
 }
 
-static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
+static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
+   VTDContextEntry *ce)
 {
-return ce->lo & VTD_CONTEXT_ENTRY_TT;
+return s->ecs ? (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT) :
+(ce->lo & VTD_CONTEXT_ENTRY_TT);
 }
 
 static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
@@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
 return ret_fr;
 }
 
-if (!vtd_root_entry_present(&re)) {
+if (!vtd_root_entry_present(&re) ||
+(s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
 /* Not error - it's okay we don't have root entry. */
 trace_vtd_re_not_present(bus_num);
 return -VTD_FR_ROOT_ENTRY_P;
-} else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
-trace_vtd_re_invalid(re.rsvd, re.val);
-return -VTD_FR_ROOT_ENTRY_RSVD;
+}
+if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
+(s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
+((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
+trace_vtd_re_invalid(re.rsvd, re.val);
+return -VTD_FR_ROOT_ENTRY_RSVD;
 }
 
-ret_fr = vtd_get_context_entry_from_root(&re, devfn, ce);
+ret_fr = vtd_get_context_entry_from_root(s, &re, devfn, ce);
 if (ret_fr) {
 return ret_fr;
 }
@@ -860,21 +885,36 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
 /* Not error - it's okay we don't have context entry. */
 trace_vtd_ce_not_present(bus_num, devfn);
 return -VTD_FR_CONTEXT_ENTRY_P;
-} else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
-   (ce->lo &

[RFC PATCH 16/20] VFIO: Add notifier for propagating IOMMU TLB invalidate

2017-04-26 Thread Liu, Yi L
This patch adds the following items:
* add new notifier flag IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8
* add new IOCTL cmd VFIO_IOMMU_TLB_INVALIDATE attached on container->fd
* add vfio_iommu_tlb_invalidate_notify() to propagate IOMMU TLB invalidate
  to host

This new notifier originates from the requirement of SVM virtualization
on VT-d. It is for invalidation of first-level and nested mappings from the
IOTLB and the paging-structure caches. Since the existing MAP/UNMAP notifier
is designed for second-level mappings, it is not suitable for this new
requirement, so a new notifier is introduced to meet the SVM virtualization
requirement. Further detail is included in the patch below:

"intel_iommu: propagate Extended-IOTLB invalidate to host"

Signed-off-by: Liu, Yi L 
---
 hw/vfio/pci.c   | 37 +
 include/exec/memory.h   |  2 ++
 linux-headers/linux/iommu.h |  5 +
 linux-headers/linux/vfio.h  |  8 
 4 files changed, 52 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a1e6942..afcefd6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2619,6 +2619,33 @@ static void 
vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data)
 g_free(vfio_svm);
 }
 
+static void vfio_iommu_tlb_invalidate_notify(IOMMUNotifier *n,
+ void *data)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+VFIOContainer *container = giommu->container;
+IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data;
+struct vfio_iommu_tlb_invalidate *vfio_tlb_inv;
+int argsz;
+
+argsz = sizeof(*vfio_tlb_inv) + iommu_data->payload_size;
+vfio_tlb_inv = g_malloc0(argsz);
+vfio_tlb_inv->argsz = argsz;
+vfio_tlb_inv->length = iommu_data->payload_size;
+
+memcpy(&vfio_tlb_inv->data, iommu_data->payload,
+  iommu_data->payload_size);
+
+rcu_read_lock();
+if (ioctl(container->fd, VFIO_IOMMU_TLB_INVALIDATE,
+  vfio_tlb_inv) != 0) {
+error_report("vfio_iommu_tlb_invalidate_notify:"
+ " failed, contanier: %p", container);
+}
+rcu_read_unlock();
+g_free(vfio_tlb_inv);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2865,6 +2892,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
 if (memory_region_is_iommu(subregion)) {
 IOMMUNotifier n1;
+IOMMUNotifier n2;
 
 /*
  FIXME: current iommu notifier is actually designed for
@@ -2882,6 +2910,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
0,
&n1);
 
+iommu_notifier_init(&n2, vfio_iommu_tlb_invalidate_notify,
+IOMMU_NOTIFIER_IOMMU_TLB_INV,
+0,
+0);
+vfio_register_notifier(group->container,
+   subregion,
+   0,
+   &n2);
+
 memory_region_notify_device_record(subregion,
&vdev->host);
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3b8f487..af15351 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -83,6 +83,8 @@ typedef enum {
 IOMMU_NOTIFIER_MAP = 0x2,
 /* Notify PASID Table Binding */
 IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
+/* Notify IOMMU TLB Invalidation */
+IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8,
 } IOMMUNotifierFlag;
 
 #define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
index 4519dcf..c2742ba 100644
--- a/linux-headers/linux/iommu.h
+++ b/linux-headers/linux/iommu.h
@@ -27,4 +27,9 @@ struct pasid_table_info {
__u8   opaque[];/* IOMMU-specific details */
 };
 
+struct tlb_invalidate_info {
+   __u32   model;
+   __u8opaque[];
+};
+
 #endif /* __LINUX_IOMMU_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 9848d63..6c71c4a 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -554,6 +554,14 @@ struct vfio_device_svm {
 
 #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/* For IOMMU Invalidation Passdown */
+struct vfio_iommu_tlb_invalidate {
+   __u32   argsz;
+   __u32   length;
+   __u8data[];
+};
+
+#define VFIO_IOMMU_TLB_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
 
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
-- 
1.9.1


[RFC PATCH 14/20] intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro

2017-04-26 Thread Liu, Yi L
Add FOR_EACH_ASSIGN_DEVICE. It is used to loop over all assigned
devices when processing guest PASID table linking and IOMMU cache
invalidate propagation.
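
As a usage illustration (a sketch modeled on how the later invalidation
patches in this series use the macro, not new functionality; the hook-info
types below come from those patches and the helper names are made up):

/* Hedged sketch of a caller of FOR_EACH_ASSIGN_DEVICE. */
static void example_hook(VTDNotifierIterator *iter, void *hook_info,
                         void *notify_info)
{
    /* filter on iter->sid / iter->did, fill notify_info, fire a notifier */
}

static void example_passdown(IntelIOMMUState *s,
                             VTDIOTLBInvHookInfo *hook_info)
{
    FOR_EACH_ASSIGN_DEVICE(struct tlb_invalidate_info, /* notify_info type */
                           VTDInvalidateData,          /* opaque tail type */
                           hook_info,
                           example_hook);
}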

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 32 
 hw/i386/intel_iommu_internal.h | 11 +++
 2 files changed, 43 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0c412d2..f291995 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -55,6 +55,38 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | 
VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+#define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
+   __opaque_type, \
+   __hook_info, \
+   __hook_fn) \
+do { \
+IntelIOMMUNotifierNode *node; \
+VTDNotifierIterator iterator; \
+int ret = 0; \
+__notify_info_type *notify_info; \
+__opaque_type *opaq; \
+int argsz; \
+argsz = sizeof(*notify_info) + sizeof(*opaq); \
+notify_info = g_malloc0(argsz); \
+QLIST_FOREACH(node, &(s->notifiers_list), next) { \
+VTDAddressSpace *vtd_as = node->vtd_as; \
+VTDContextEntry ce[2]; \
+iterator.bus = pci_bus_num(vtd_as->bus); \
+ret = vtd_dev_to_context_entry(s, iterator.bus, \
+   vtd_as->devfn, &ce[0]); \
+if (ret != 0) { \
+continue; \
+} \
+iterator.sid = vtd_make_source_id(iterator.bus, vtd_as->devfn); \
+iterator.did =  VTD_CONTEXT_ENTRY_DID(ce[0].hi); \
+iterator.host_sid = node->host_sid; \
+iterator.vtd_as = vtd_as; \
+iterator.ce = &ce[0]; \
+__hook_fn(&iterator, __hook_info, notify_info); \
+} \
+g_free(notify_info); \
+} while (0)
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
 uint64_t wmask, uint64_t w1cmask)
 {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f2a7d12..5178398 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -439,6 +439,17 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB   (4ULL << 2)
 #define VTD_EXT_CONTEXT_TT_DEV_IOTLB  (5ULL << 2)
 
+struct VTDNotifierIterator {
+VTDAddressSpace *vtd_as;
+VTDContextEntry *ce;
+uint16_t host_sid;
+uint16_t sid;
+uint16_t did;
+uint8_t  bus;
+};
+
+typedef struct VTDNotifierIterator VTDNotifierIterator;
+
 /* Paging Structure common */
 #define VTD_SL_PT_PAGE_SIZE_MASK(1ULL << 7)
 /* Bits to decide the offset for each level */
-- 
1.9.1



[RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-04-26 Thread Liu, Yi L
With a vIOMMU exposed to the guest, the vIOMMU emulator needs to translate
between host and guest identifiers. e.g. for a device-selective TLB flush,
the vIOMMU emulator needs to replace the guest SID with the host SID so as
to limit the invalidation. This patch introduces a new callback,
iommu_ops->record_device(), to notify the vIOMMU emulator to record the
necessary information about the assigned device.
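
As a caller-side illustration (a sketch only; VFIO wires this up in a later
patch of the series), the vIOMMU is told about the host BDF of an assigned
device roughly as follows, example_record_assigned_device() being a
hypothetical helper:

/* Hedged sketch: let the vIOMMU record the host BDF of an assigned device. */
static void example_record_assigned_device(MemoryRegion *iommu_mr,
                                           PCIHostDeviceAddress *host_addr)
{
    if (memory_region_is_iommu(iommu_mr)) {
        /* ends up in iommu_ops->record_device() of the vIOMMU model */
        memory_region_notify_device_record(iommu_mr, host_addr);
    }
}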

Signed-off-by: Liu, Yi L 
---
 include/exec/memory.h | 11 +++
 memory.c  | 12 
 2 files changed, 23 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7bd13ab..49087ef 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
 IOMMUNotifierFlag new_flags);
 /* Set this up to provide customized IOMMU replay function */
 void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
+void (*record_device)(MemoryRegion *iommu,
+  void *device_info);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_one(IOMMUNotifier *notifier,
   IOMMUTLBEntry *entry);
 
+/*
+ * memory_region_notify_device_record: notify IOMMU to record assign
+ * device.
+ * @mr: the memory region to notify
+ * @ device_info: device information
+ */
+void memory_region_notify_device_record(MemoryRegion *mr,
+void *info);
+
 /**
  * memory_region_register_iommu_notifier: register a notifier for changes to
  * IOMMU translation entries.
diff --git a/memory.c b/memory.c
index 0728e62..45ef069 100644
--- a/memory.c
+++ b/memory.c
@@ -1600,6 +1600,18 @@ static void 
memory_region_update_iommu_notify_flags(MemoryRegion *mr)
 mr->iommu_notify_flags = flags;
 }
 
+void memory_region_notify_device_record(MemoryRegion *mr,
+void *info)
+{
+assert(memory_region_is_iommu(mr));
+
+if (mr->iommu_ops->record_device) {
+mr->iommu_ops->record_device(mr, info);
+}
+
+return;
+}
+
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
IOMMUNotifier *n)
 {
-- 
1.9.1



[RFC PATCH 11/20] intel_iommu: provide iommu_ops->record_device

2017-04-26 Thread Liu, Yi L
This patch provides the iommu_ops->record_device implementation for
intel_iommu. It records the host SID in the IntelIOMMUNotifierNode for
further virtualization usage, e.g. guest SID -> host SID translation when
propagating 1st-level cache invalidations from guest to host.
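
The host SID recorded here is the usual PCI requester-ID encoding. For
clarity, a standalone sketch of the bus/slot/function packing used in this
patch (the helper name is illustrative only):

#include <stdint.h>

/* Hedged sketch: requester ID = bus[15:8] | slot[7:3] | function[2:0]. */
static inline uint16_t example_make_host_sid(uint8_t bus, uint8_t slot,
                                             uint8_t function)
{
    return ((uint16_t)bus << 8) | ((slot & 0x1f) << 3) | (function & 0x7);
}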

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c | 19 +++
 include/hw/i386/intel_iommu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ba1e7eb..0c412d2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2407,6 +2407,24 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion 
*iommu,
 }
 }
 
+static void vtd_iommu_record_device(MemoryRegion *iommu,
+void *device_info)
+{
+VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+IntelIOMMUState *s = vtd_as->iommu_state;
+IntelIOMMUNotifierNode *node = NULL;
+IntelIOMMUNotifierNode *next_node = NULL;
+PCIHostDeviceAddress *host = (PCIHostDeviceAddress *) device_info;
+
+QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+if (node->vtd_as == vtd_as) {
+node->host_sid = ((host->bus & 0xffUL) << 8)
+   | ((host->slot & 0x1f) << 3)
+   | (host->function & 0x7);
+}
+}
+}
+
 static const VMStateDescription vtd_vmstate = {
 .name = "iommu-intel",
 .version_id = 1,
@@ -2940,6 +2958,7 @@ static void vtd_init(IntelIOMMUState *s)
 s->iommu_ops.translate = vtd_iommu_translate;
 s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
 s->iommu_ops.replay = vtd_iommu_replay;
+s->iommu_ops.record_device = vtd_iommu_record_device;
 s->root = 0;
 s->root_extended = false;
 s->dmar_enabled = false;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 8981615..a4ce5c3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -252,6 +252,7 @@ struct VTD_MSIMessage {
 
 struct IntelIOMMUNotifierNode {
 VTDAddressSpace *vtd_as;
+uint16_t host_sid;
 QLIST_ENTRY(IntelIOMMUNotifierNode) next;
 };
 
-- 
1.9.1



[RFC PATCH 20/20] intel_iommu: propagate Ext-Device-TLB invalidate to host

2017-04-26 Thread Liu, Yi L
For Extended-Device-TLB invalidation, the intel_iommu emulator needs to check
all the assigned devices and find the affected device. It replaces the guest
SID with the host SID in the invalidation descriptor and passes the request
to the host.

The host may just submit the request to the corresponding invalidation queue
in the pIOMMU. In the future the PASID may also need to be replaced.
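
The SID rewrite described above amounts to the following bit manipulation on
the low 64 bits of the descriptor (a sketch, assuming the
VTD_INV_DESC_EXT_DIOTLB_SID_MASK added below covers SID bits 31:16):

/* Hedged sketch: swap the guest SID for the host SID in an
 * Extended-Device-TLB invalidate descriptor before pass-down. */
static void example_replace_sid(VTDInvDesc *inv_desc, uint16_t host_sid)
{
    inv_desc->lo &= ~VTD_INV_DESC_EXT_DIOTLB_SID_MASK;  /* clear bits 31:16 */
    inv_desc->lo |= (uint64_t)host_sid << 16;           /* insert host SID */
}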

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 43 ++
 hw/i386/intel_iommu_internal.h |  7 +++
 2 files changed, 50 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c5e9170..4370790 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2012,6 +2012,13 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator 
*iter,
 } else {
 return;
 }
+case VTD_INV_DESC_EXT_DIOTLB:
+if (iter->sid != *tlb_hook_info->sid) {
+return;
+}
+tlb_hook_info->inv_desc->lo &= ~VTD_INV_DESC_EXT_DIOTLB_SID_MASK;
+tlb_hook_info->inv_desc->lo |= (iter->host_sid << 16);
+break;
 default:
 return;
 }
@@ -2147,6 +2154,34 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
 return true;
 }
 
+static bool vtd_process_ext_device_iotlb(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+uint32_t pasid;
+uint16_t sid;
+VTDIOTLBInvHookInfo tlb_hook_info;
+
+if ((inv_desc->lo & VTD_INV_DESC_EXT_DIOTLB_RSVD_LO) ||
+(inv_desc->hi & VTD_INV_DESC_EXT_DIOTLB_RSVD_HI)) {
+VTD_DPRINTF(GENERAL, "error: non-zero reserved field in"
+" Device ExIOTLB desc, hi 0x%"PRIx64 " lo 0x%"PRIx64,
+inv_desc->hi, inv_desc->lo);
+return false;
+}
+
+pasid = VTD_INV_DESC_EXT_DIOTLB_PASID(inv_desc->lo);
+sid = VTD_INV_DESC_EXT_DIOTLB_SID(inv_desc->lo);
+
+tlb_hook_info.did = NULL;
+tlb_hook_info.sid = &sid;
+tlb_hook_info.pasid = &pasid;
+tlb_hook_info.inv_desc = inv_desc;
+vtd_tlb_inv_passdown_notify(s,
+&tlb_hook_info,
+vtd_tlb_inv_notify_hook);
+return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
 VTDInvDesc inv_desc;
@@ -2190,6 +2225,14 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+case VTD_INV_DESC_EXT_DIOTLB:
+trace_vtd_inv_desc("device-extended-iotlb",
+   inv_desc.hi, inv_desc.lo);
+if (!vtd_process_ext_device_iotlb(s, &inv_desc)) {
+return false;
+}
+break;
+
 case VTD_INV_DESC_WAIT:
 trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
 if (!vtd_process_wait_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index a6b9350..3cb2361 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -343,6 +343,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT   0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_EXT_IOTLB  0x6 /* Ext-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_EXT_DIOTLB 0x8 /* Ext-DIOTLB Invalidate Desc */
 #define VTD_INV_DESC_NONE   0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -407,6 +408,12 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_PASIDC_ALL_ALL(0ULL << 4)
 #define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
 
+#define VTD_INV_DESC_EXT_DIOTLB_PASID(val) (((val) >> 32) & 0xfULL)
+#define VTD_INV_DESC_EXT_DIOTLB_SID(val)   (((val) >> 16) & 0xffff)
+#define VTD_INV_DESC_EXT_DIOTLB_RSVD_LO0xe00ULL
+#define VTD_INV_DESC_EXT_DIOTLB_RSVD_HI0x7feULL
+#define VTD_INV_DESC_EXT_DIOTLB_SID_MASK   0xffff0000ULL
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
 uint16_t domain_id;
-- 
1.9.1



[RFC PATCH 01/20] intel_iommu: add "ecs" option

2017-04-26 Thread Liu, Yi L
Report ecap.ECS=1 to the guest with "-device intel-iommu,ecs=on" on the QEMU
command line.

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 5 +
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4b7d90d..400d0d1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2409,6 +2409,7 @@ static Property vtd_properties[] = {
 ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
 DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
+DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2925,6 +2926,10 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap |= VTD_ECAP_PT;
 }
 
+if (s->ecs) {
+s->ecap |= VTD_ECAP_ECS;
+}
+
 if (s->caching_mode) {
 s->cap |= VTD_CAP_CM;
 }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b96884e..ec1bd17 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -190,6 +190,7 @@
 #define VTD_ECAP_EIM(1ULL << 4)
 #define VTD_ECAP_PT (1ULL << 6)
 #define VTD_ECAP_MHMV   (15ULL << 20)
+#define VTD_ECAP_ECS(1ULL << 24)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3e51876..fa5963e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -266,6 +266,7 @@ struct IntelIOMMUState {
 uint32_t version;
 
 bool caching_mode;  /* RO - is cap CM enabled? */
+bool ecs;   /* Extended Context Support */
 
 dma_addr_t root;/* Current root table pointer */
 bool root_extended; /* Type of root table (extended or not) */
-- 
1.9.1



[RFC PATCH 03/20] intel_iommu: add "svm" option

2017-04-26 Thread Liu, Yi L
Expose "Shared Virtual Memory" to guest by using "svm" option.
Also use "svm" to expose SVM related capabilities to guest.
e.g. "-device intel-iommu, svm=on"

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 10 ++
 hw/i386/intel_iommu_internal.h |  5 +
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 16 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index bf98fa5..ba1e7eb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
 DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
 DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
 DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
+DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap |= VTD_ECAP_ECS;
 }
 
+if (s->svm) {
+if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
+error_report("Need to set ecs, pt, caching-mode for svm");
+exit(1);
+}
+s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
+s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
+}
+
 if (s->caching_mode) {
 s->cap |= VTD_CAP_CM;
 }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 71a1c1e..f2a7d12 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -191,6 +191,9 @@
 #define VTD_ECAP_PT (1ULL << 6)
 #define VTD_ECAP_MHMV   (15ULL << 20)
 #define VTD_ECAP_ECS(1ULL << 24)
+#define VTD_ECAP_PASID28(1ULL << 28)
+#define VTD_ECAP_PRS(1ULL << 29)
+#define VTD_ECAP_PTS(0xeULL << 35)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
@@ -207,6 +210,8 @@
 #define VTD_CAP_PSI (1ULL << 39)
 #define VTD_CAP_SLLPS   ((1ULL << 34) | (1ULL << 35))
 #define VTD_CAP_CM  (1ULL << 7)
+#define VTD_CAP_DWD (1ULL << 54)
+#define VTD_CAP_DRD (1ULL << 55)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT 8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ae21fe5..8981615 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -267,6 +267,7 @@ struct IntelIOMMUState {
 
 bool caching_mode;  /* RO - is cap CM enabled? */
 bool ecs;   /* Extended Context Support */
+bool svm;   /* Shared Virtual Memory */
 
 dma_addr_t root;/* Current root table pointer */
 bool root_extended; /* Type of root table (extended or not) */
-- 
1.9.1



[RFC PATCH 6/8] VFIO: do pasid table binding

2017-04-26 Thread Liu, Yi L
From: "Liu, Yi L" 

This patch adds IOCTL processing in vfio_iommu_type1 for
VFIO_IOMMU_SVM_BIND_TASK. It performs the PASID table binding by
calling iommu_ops->bind_pasid_table to link the whole guest PASID
table to the pIOMMU.

For VT-d, this links the guest PASID table to the host pIOMMU, which
is the key point for supporting SVM virtualization on VT-d.

Signed-off-by: Liu, Yi L 
---
 drivers/vfio/vfio_iommu_type1.c | 72 +
 1 file changed, 72 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b3cc33f..30b6d48 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct 
vfio_iommu *iommu)
return ret;
 }
 
+struct vfio_svm_task {
+   struct iommu_domain *domain;
+   void *payload;
+};
+
+static int bind_pasid_tbl_fn(struct device *dev, void *data)
+{
+   int ret = 0;
+   struct vfio_svm_task *task = data;
+   struct pasid_table_info *pasidt_binfo;
+
+   pasidt_binfo = task->payload;
+   ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
+   return ret;
+}
+
+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
+   int (*fn)(struct device *, void *))
+{
+   int ret = 0;
+   struct vfio_domain *d;
+   struct vfio_group *g;
+   struct vfio_svm_task task;
+
+   task.payload = data;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   list_for_each_entry(g, &d->group_list, next) {
+   if (g->iommu_group != NULL) {
+   task.domain = d->domain;
+   ret = iommu_group_for_each_dev(
+   g->iommu_group, &task, fn);
+   if (ret != 0)
+   break;
+   }
+   }
+   }
+
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
return copy_to_user((void __user *)arg, &unmap, minsz) ?
-EFAULT : 0;
+   } else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
+   struct vfio_device_svm hdr;
+   u8 *data = NULL;
+   int ret = 0;
+
+   minsz = offsetofend(struct vfio_device_svm, length);
+   if (copy_from_user(&hdr, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (hdr.length == 0)
+   return -EINVAL;
+
+   data = memdup_user((void __user *)(arg + minsz),
+   hdr.length);
+   if (IS_ERR(data))
+   return PTR_ERR(data);
+
+   switch (hdr.flags & VFIO_SVM_TYPE_MASK) {
+   case VFIO_SVM_BIND_PASIDTBL:
+   ret = vfio_do_svm_task(iommu, data,
+   bind_pasid_tbl_fn);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+   kfree(data);
+   return ret;
}
 
return -ENOTTY;
-- 
1.9.1



[RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d

2017-04-26 Thread Liu, Yi L
Hi,

This patchset introduces SVM virtualization for intel_iommu in
IOMMU/VFIO. The complete SVM virtualization for intel_iommu touches
QEMU, the IOMMU layer and VFIO.

Another patchset changes QEMU. It is "[RFC PATCH 0/20] Qemu:
Extend intel_iommu emulator to support Shared Virtual Memory"

This patchset adds two new IOMMU APIs and their implementation in
the intel_iommu driver. In VFIO, it adds two IOCTL cmds attached to
container->fd to propagate data from QEMU to kernel space.

[Patch Overview]
* 1 adds iommu API definition for binding guest PASID table
* 2 adds binding PASID table API implementation in VT-d iommu driver
* 3 adds iommu API definition to do IOMMU TLB invalidation from guest
* 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver
* 5 adds VFIO IOCTL for propagating PASID table binding from guest
* 6 adds processing of pasid table binding in vfio_iommu_type1
* 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest
* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1

Best Wishes,
Yi L


Jacob Pan (3):
  iommu: Introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: Add iommu do invalidate function

Liu, Yi L (5):
  iommu: Introduce iommu do invalidate API function
  VFIO: Add new IOTCL for PASID Table bind propagation
  VFIO: do pasid table binding
  VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  VFIO: do IOMMU TLB invalidation from guest

 drivers/iommu/intel-iommu.c | 146 
 drivers/iommu/iommu.c   |  32 +
 drivers/vfio/vfio_iommu_type1.c |  98 +++
 include/linux/dma_remapping.h   |   1 +
 include/linux/intel-iommu.h |  11 +++
 include/linux/iommu.h   |  47 +
 include/uapi/linux/vfio.h   |  26 +++
 7 files changed, 361 insertions(+)

-- 
1.9.1



[RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function

2017-04-26 Thread Liu, Yi L
From: "Liu, Yi L" 

When a SVM capable device is assigned to a guest, the first level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

The host IOMMU driver has no knowledge of caching structure updates unless
the guest invalidation activities are passed down to the host. The
primary usage is derived from the emulated IOMMU in the guest, where QEMU
can trap invalidation activities before passing them down to the
host/physical IOMMU. There are IOMMU-architecture-specific actions that
need to be taken, which requires the generic API introduced in this
patch to carry opaque data in the tlb_invalidate_info argument.
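
As a caller-side sketch (how VFIO is expected to use this API later in the
series; the surrounding helper is hypothetical), the opaque payload travels
unmodified from the emulator into the vendor driver:

#include <linux/iommu.h>

/* Hedged sketch: forward a guest-originated invalidation to the vendor
 * driver; inv_info->model tells the driver who produced the opaque data. */
static int example_passdown_invalidate(struct iommu_domain *domain,
                                       struct device *dev,
                                       struct tlb_invalidate_info *inv_info)
{
        return iommu_do_invalidate(domain, dev, inv_info);
}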

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 13 +
 include/linux/iommu.h | 16 
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2da636..ca7cff2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain 
*domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_do_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   int ret = 0;
+
+   if (unlikely(domain->ops->do_invalidate == NULL))
+   return -ENODEV;
+
+   ret = domain->ops->do_invalidate(domain, dev, inv_info);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_do_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 491a011..a48e3b75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -140,6 +140,11 @@ struct pasid_table_info {
__u8opaque[];/* IOMMU-specific details */
 };
 
+struct tlb_invalidate_info {
+   __u32   model;
+   __u8opaque[];
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -215,6 +220,8 @@ struct iommu_ops {
struct pasid_table_info *pasidt_binfo);
int (*unbind_pasid_table)(struct iommu_domain *domain,
struct device *dev);
+   int (*do_invalidate)(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
 
unsigned long pgsize_bitmap;
 };
@@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain 
*domain,
struct device *dev, struct pasid_table_info *pasidt_binfo);
 extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_do_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
@@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, 
struct device *dev)
return -EINVAL;
 }
 
+static inline int iommu_do_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
-- 
1.9.1



[RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-04-26 Thread Liu, Yi L
From: "Liu, Yi L" 

This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
binding requests.

On VT-d, this IOCTL cmd would be used to link the guest PASID page table
to the host. For other vendors, it may also be used to support other
kinds of SVM bind requests. Previously there was a discussion on this with
an ARM engineer; it can be found at the link below. This IOCTL cmd may
support an SVM PASID bind request from a userspace driver, or a page
table (cr3) bind request from a guest. These SVM bind requests would be
supported by adding different flags, e.g. VFIO_SVM_BIND_PASID is added to
support PASID bind from a userspace driver, and VFIO_SVM_BIND_PGTABLE is
added to support page table bind from a guest.

https://patchwork.kernel.org/patch/9594231/
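
For illustration, a minimal user-space sketch of issuing the new ioctl on the
container fd, assuming the vfio_device_svm layout above together with this
series' vfio.h additions; the payload would typically be the pasid_table_info
blob built by the vIOMMU emulator, and the helper name is made up:

#include <string.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hedged sketch: bind a guest PASID table through the VFIO container fd. */
static int example_svm_bind_pasid_table(int container_fd,
                                        const void *payload, size_t len)
{
    struct vfio_device_svm *bind;
    int ret;

    bind = calloc(1, sizeof(*bind) + len);
    if (!bind)
        return -1;
    bind->argsz = sizeof(*bind) + len;
    bind->flags = VFIO_SVM_BIND_PASIDTBL;
    bind->length = len;
    memcpy(bind->data, payload, len);

    ret = ioctl(container_fd, VFIO_IOMMU_SVM_BIND_TASK, bind);
    free(bind);
    return ret;
}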

Signed-off-by: Liu, Yi L 
---
 include/uapi/linux/vfio.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..6b97987 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+   __u32   argsz;
+#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
+   __u32   flags;
+   __u32   length;
+   __u8data[];
+};
+
+#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
+   VFIO_SVM_BIND_PASID | \
+   VFIO_SVM_BIND_PGTABLE)
+
+#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
1.9.1



[RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function

2017-04-26 Thread Liu, Yi L
From: Jacob Pan 

Add Intel VT-d ops to the generic iommu_bind_pasid_table API
functions.

The primary use case is direct assignment of an SVM capable
device. Originating from the emulated IOMMU in the guest, the request goes
through many layers (e.g. VFIO). Upon calling the host IOMMU driver, the
caller passes the guest PASID table pointer (GPA) and size.

Device context table entry is modified by Intel IOMMU specific
bind_pasid_table function. This will turn on nesting mode and matching
translation type.

The unbind operation restores default context mapping.

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c   | 103 ++
 include/linux/dma_remapping.h |   1 +
 2 files changed, 104 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 646756c..6d5b939 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5306,6 +5306,105 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
device *dev)
 
return iommu;
 }
+
+static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
+   struct device *dev, struct pasid_table_info *pasidt_binfo)
+{
+   struct intel_iommu *iommu;
+   struct context_entry *context;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   u8 bus, devfn;
+   u16 did, *sid;
+   int ret = 0;
+   unsigned long flags;
+   u64 ctx_lo;
+
+   if (pasidt_binfo == NULL || pasidt_binfo->model != INTEL_IOMMU) {
+   pr_warn("%s: Invalid bind request!\n", __func__);
+   return -EINVAL;
+   }
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   sid = (u16 *)&pasidt_binfo->opaque;
+   /* check SID, if it is not correct, return */
+   if (PCI_DEVID(bus, devfn) != *sid)
+   return 0;
+
+   info = dev->archdata.iommu;
+   if (!info || !info->pasid_supported) {
+   pr_err("Device %d:%d.%d has no pasid support\n", bus,
+   PCI_SLOT(devfn), PCI_FUNC(devfn));
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) {
+   pr_err("Invalid gPASID table size %llu, host size %lu\n",
+   pasidt_binfo->size,
+   intel_iommu_get_pts(iommu));
+   ret = -EINVAL;
+   goto out;
+   }
+   spin_lock_irqsave(&iommu->lock, flags);
+   context = iommu_context_addr(iommu, bus, devfn, 0);
+   if (!context || !context_present(context)) {
+   pr_warn("%s: ctx not present for bus devfn %x:%x\n",
+   __func__, bus, devfn);
+   spin_unlock_irqrestore(&iommu->lock, flags);
+   goto out;
+   }
+   /* Anticipate guest to use SVM and owns the first level */
+   ctx_lo = context[0].lo;
+   ctx_lo |= CONTEXT_NESTE;
+   ctx_lo |= CONTEXT_PRS;
+   ctx_lo |= CONTEXT_PASIDE;
+   ctx_lo &= ~CONTEXT_TT_MASK;
+   ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
+   context[0].lo = ctx_lo;
+
+   /* Assign guest PASID table pointer and size */
+   ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
+   context[1].lo = ctx_lo;
+   /* make sure context entry is updated before flushing */
+   wmb();
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   iommu->flush.flush_context(iommu, did,
+   (((u16)bus) << 8) | devfn,
+   DMA_CCMD_MASK_NOBIT,
+   DMA_CCMD_DEVICE_INVL);
+   iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+   spin_unlock_irqrestore(&iommu->lock, flags);
+
+
+out:
+   return ret;
+}
+
+static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
+   struct device *dev)
+{
+   struct intel_iommu *iommu;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   u8 bus, devfn;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+   /*
+* REVISIT: we might want to clear the PASID table pointer
+* as part of context clear operation. Currently, it leaves
+* stale data but should be ignored by hardware since PASIDE
+* is clear.
+*/
+   /* ATS will be reenabled when remapping is restored */
+   pci_disable_ats(to_pci_dev(dev));
+   domain_context_clear(iommu, dev);
+   return domain_context_mapping_one(dmar_domain, iommu, bus, devfn);
+}
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
 static const struct iommu_ops intel_iommu_ops = {
@@ -5314,6 +5413,10 

[RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function

2017-04-26 Thread Liu, Yi L
From: Jacob Pan 

This patch adds Intel VT-d specific function to implement
iommu_do_invalidate API.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that the guest-to-host device ID mapping has been
resolved prior to calling the IOMMU driver. Based on the device handle, the
host IOMMU driver can replace certain fields before submitting to the
invalidation queue.
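
The field rewrite mentioned above boils down to patching the domain-ID bits
of the raw descriptor before it is queued (a sketch, assuming the
QI_DID()/QI_DID_MASK helpers added by this patch):

#include <linux/bitops.h>
#include <linux/intel-iommu.h>

/* Hedged sketch: substitute the host-allocated domain ID into a raw
 * invalidation descriptor before qi_submit_sync(). */
static void example_fixup_domain_id(struct qi_desc *qi, u16 host_did)
{
        set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(host_did));
}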

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 43 +++
 include/linux/intel-iommu.h | 11 +++
 2 files changed, 54 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d5b939..0b098ad 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct 
iommu_domain *domain,
dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
 }
 
+static int intel_iommu_do_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   int ret = 0;
+   struct intel_iommu *iommu;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct intel_invalidate_data *inv_data;
+   struct qi_desc *qi;
+   u16 did;
+   u8 bus, devfn;
+
+   if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
+   return -EINVAL;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
+
+   /* check SID */
+   if (PCI_DEVID(bus, devfn) != inv_data->sid)
+   return 0;
+
+   qi = &inv_data->inv_desc;
+
+   switch (qi->low & QI_TYPE_MASK) {
+   case QI_DIOTLB_TYPE:
+   case QI_DEIOTLB_TYPE:
+   /* for device IOTLB, we just let it pass through */
+   break;
+   default:
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
+   break;
+   }
+
+   ret = qi_submit_sync(qi, iommu);
+
+   return ret;
+}
+
 static int intel_iommu_map(struct iommu_domain *domain,
   unsigned long iova, phys_addr_t hpa,
   size_t size, int iommu_prot)
@@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct 
iommu_domain *domain,
 #ifdef CONFIG_INTEL_IOMMU_SVM
.bind_pasid_table   = intel_iommu_bind_pasid_table,
.unbind_pasid_table = intel_iommu_unbind_pasid_table,
+   .do_invalidate  = intel_iommu_do_invalidate,
 #endif
.map= intel_iommu_map,
.unmap  = intel_iommu_unmap,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ac04f28..9d6562c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -271,6 +272,10 @@ enum {
 #define QI_PGRP_RESP_TYPE  0x9
 #define QI_PSTRM_RESP_TYPE 0xa
 
+#define QI_DID(did)(((u64)did & 0xffff) << 16)
+#define QI_DID_MASK    GENMASK(31, 16)
+#define QI_TYPE_MASK   GENMASK(3, 0)
+
 #define QI_IEC_SELECTIVE   (((u64)1) << 4)
 #define QI_IEC_IIDEX(idx)  (((u64)(idx & 0xffff) << 32))
 #define QI_IEC_IM(m)   (((u64)(m & 0x1f) << 27))
@@ -529,6 +534,12 @@ struct intel_svm {
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
 #endif
 
+struct intel_invalidate_data {
+   u16 sid;
+   u32 pasid;
+   struct qi_desc inv_desc;
+};
+
 extern const struct attribute_group *intel_iommu_groups[];
 extern void intel_iommu_debugfs_init(void);
 extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
-- 
1.9.1



[RFC PATCH 19/20] intel_iommu: propagate PASID-Cache invalidate to host

2017-04-26 Thread Liu, Yi L
This patch adds support for propagating PASID-Cache invalidation to the host.
Similar to Extended-IOTLB invalidation, the intel_iommu emulator also checks
all the assigned devices, does a sanity check, and then passes the request
to the host.

The host pIOMMU driver would replace some fields in the raw data before
submitting to the pIOMMU, e.g. the guest domain ID must be replaced with the
real domain ID on the host. In the future the PASID may also need to be
replaced.

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 56 ++
 hw/i386/intel_iommu_internal.h | 10 
 2 files changed, 66 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5fbb7f1..c5e9170 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2006,6 +2006,7 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator 
*iter,
 tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info;
 switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) {
 case VTD_INV_DESC_EXT_IOTLB:
+case VTD_INV_DESC_PC:
 if (iter->did == *tlb_hook_info->did) {
 break;
 } else {
@@ -2098,6 +2099,54 @@ static bool vtd_process_exiotlb_desc(IntelIOMMUState *s,
 return true;
 }
 
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+   VTDInvDesc *inv_desc)
+{
+uint16_t domain_id;
+uint32_t pasid;
+VTDIOTLBInvHookInfo tlb_hook_info;
+
+if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
+(inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
+VTD_DPRINTF(GENERAL, "error: non-zero reserved field"
+" in PASID desc, hi 0x%"PRIx64 " lo 0x%"PRIx64,
+inv_desc->hi, inv_desc->lo);
+return false;
+}
+
+domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
+
+switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
+case VTD_INV_DESC_PASIDC_ALL_ALL:
+VTD_DPRINTF(INV, "Invalidate all PASID");
+break;
+
+case VTD_INV_DESC_PASIDC_PASID_SI:
+VTD_DPRINTF(INV, "pasid-selective invalidation"
+" domain 0x%"PRIx16, domain_id);
+break;
+
+default:
+VTD_DPRINTF(GENERAL, "error: invalid granularity"
+" in PASID-Cache Invalidate Descriptor"
+" hi 0x%"PRIx64 " lo 0x%"PRIx64,
+inv_desc->hi, inv_desc->lo);
+return false;
+}
+
+pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo);
+
+tlb_hook_info.did = &domain_id;
+tlb_hook_info.sid = NULL;
+tlb_hook_info.pasid = &pasid;
+tlb_hook_info.inv_desc = inv_desc;
+vtd_tlb_inv_passdown_notify(s,
+&tlb_hook_info,
+vtd_tlb_inv_notify_hook);
+
+return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
 VTDInvDesc inv_desc;
@@ -2134,6 +2183,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+ case VTD_INV_DESC_PC:
+trace_vtd_inv_desc("pasid-cache", inv_desc.hi, inv_desc.lo);
+if (!vtd_process_pasid_desc(s, &inv_desc)) {
+return false;
+}
+break;
+
 case VTD_INV_DESC_WAIT:
 trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
 if (!vtd_process_wait_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 9f89751..a6b9350 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -342,6 +342,7 @@ typedef union VTDInvDesc VTDInvDesc;
Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT   0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_EXT_IOTLB  0x6 /* Ext-IOTLB Invalidate Desc */
+#define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */
 #define VTD_INV_DESC_NONE   0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -397,6 +398,15 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_EXIOTLB_IH(val)   (((val) >> 6) & 0x1)
 #define VTD_INV_DESC_EXIOTLB_GL(val)   (((val) >> 7) & 0x1)
 
+#define VTD_INV_DESC_PASIDC_G  (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_LO0xfff0ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_HI0xULL
+
+#define VTD_INV_DESC_PASIDC_ALL_ALL(0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
 uint16_t domain_id;
-- 
1.9.1



[RFC PATCH 8/8] VFIO: do IOMMU TLB invalidation from guest

2017-04-26 Thread Liu, Yi L
From: "Liu, Yi L" 

This patch adds support for the VFIO_IOMMU_TLB_INVALIDATE cmd in
vfio_iommu_type1.

For SVM virtualization on VT-d, VFIO_IOMMU_TLB_INVALIDATE calls
iommu_ops->do_invalidate() to submit the guest IOMMU cache
invalidation to the pIOMMU.
Signed-off-by: Liu, Yi L 
---
 drivers/vfio/vfio_iommu_type1.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 30b6d48..6cebdfd 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1528,6 +1528,17 @@ static int bind_pasid_tbl_fn(struct device *dev, void 
*data)
return ret;
 }
 
+static int do_tlb_inv_fn(struct device *dev, void *data)
+{
+   int ret = 0;
+   struct vfio_svm_task *task = data;
+   struct tlb_invalidate_info *inv_info;
+
+   inv_info = task->payload;
+   ret = iommu_do_invalidate(task->domain, dev, inv_info);
+   return ret;
+}
+
 static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
int (*fn)(struct device *, void *))
 {
@@ -1654,6 +1665,21 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
kfree(data);
return ret;
+   } else if (cmd == VFIO_IOMMU_TLB_INVALIDATE) {
+   struct vfio_iommu_tlb_invalidate hdr;
+   u8 *data = NULL;
+   int ret = 0;
+
+   minsz = offsetofend(struct vfio_iommu_tlb_invalidate, length);
+   if (copy_from_user(&hdr, (void __user *)arg, minsz))
+   return -EFAULT;
+   if (hdr.length == 0)
+   return -EINVAL;
+   data = memdup_user((void __user *)(arg + minsz),
+   hdr.length);
+   ret = vfio_do_svm_task(iommu, data, do_tlb_inv_fn);
+   kfree(data);
+   return ret;
}
 
return -ENOTTY;
-- 
1.9.1



[RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-04-26 Thread Liu, Yi L
From: "Liu, Yi L" 

This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
invalidation requests from guest to host.

In the case of SVM virtualization on VT-d, the host IOMMU driver has
no knowledge of caching structure updates unless the guest
invalidation activities are passed down to the host. So a new
IOCTL is needed to propagate the guest cache invalidation through
VFIO.
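
The user-space calling convention mirrors VFIO_IOMMU_SVM_BIND_TASK: argsz
covers the header plus payload and data[] carries the raw vendor blob. A
brief sketch (helper name illustrative, assuming this series' vfio.h
additions):

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hedged sketch: propagate one guest invalidation request already packed
 * into inv->data[] (payload_len bytes). */
static int example_pass_invalidate(int container_fd,
                                   struct vfio_iommu_tlb_invalidate *inv,
                                   size_t payload_len)
{
    inv->argsz  = sizeof(*inv) + payload_len;
    inv->length = payload_len;
    return ioctl(container_fd, VFIO_IOMMU_TLB_INVALIDATE, inv);
}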

Signed-off-by: Liu, Yi L 
---
 include/uapi/linux/vfio.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6b97987..50c51f8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -564,6 +564,15 @@ struct vfio_device_svm {
 
 #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/* For IOMMU TLB Invalidation Propagation */
+struct vfio_iommu_tlb_invalidate {
+   __u32   argsz;
+   __u32   length;
+   __u8data[];
+};
+
+#define VFIO_IOMMU_TLB_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
1.9.1



[RFC PATCH 18/20] intel_iommu: propagate Extended-IOTLB invalidate to host

2017-04-26 Thread Liu, Yi L
The invalidation of Extended-IOTLB invalidates first-level and nested
mappings from the IOTLB and the paging-structure-caches.

For SVM virtualization, an IOMMU TLB invalidate notifier is added, for the
reasons below:

* On VT-d, the MAP/UNMAP notifier is used to shadow the changes of the
  guest second-level page table. The 1st-level page table, however, is
  not shadowed the way the second-level page table is. Actually, the
  guest 1st-level page table is linked to the host after the whole guest
  PASID table is linked to the host; the 1st-level page table is owned by
  the guest in this SVM virtualization solution for VT-d. The guest has
  already modified the 1st-level page table in memory before it issues the
  invalidate request for 1st-level mappings, so the MAP/UNMAP notifier is
  not suitable for the invalidation of guest 1st-level mappings.

* Since the guest owns the 1st-level page table, the host has no knowledge
  of invalidations to 1st-level related mappings. So the intel_iommu
  emulator needs to propagate the invalidate request to the host, and the
  host then invalidates the 1st-level and nested mappings in the IOTLB and
  paging-structure caches on the host. A new notifier is added to meet
  this requirement.

Before passing the invalidate request to the host, the intel_iommu emulator
needs to do some translation of the invalidation request, e.g. granularity
translation, to limit the scope of the invalidation.

This patchset proposes passing raw data from guest to host when propagating
the guest IOMMU TLB invalidation. As the cover letter mentioned, there are
both pros and cons to passing raw data. Comments on how to pass the
invalidate request to the host are welcome.

For Extended-IOTLB invalidation, the intel_iommu emulator checks all the
assigned devices to see whether a device is affected by the invalidate
request, does a sanity check on the invalidate request, and then passes it
to the host.

The host would replace some fields in the raw data before submitting to the
pIOMMU, e.g. the guest domain ID must be replaced with the real domain ID on
the host. In the future the PASID may also need to be replaced.

Signed-off-by: Liu, Yi L 
---
 hw/i386/intel_iommu.c  | 126 +
 hw/i386/intel_iommu_internal.h |  33 +++
 2 files changed, 159 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cd6db65..5fbb7f1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,10 @@ static void vtd_context_inv_notify_hook(VTDNotifierIterator 
*iter,
 void *hook_info,
 void *notify_info);
 
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+void *hook_info,
+void *notify_info);
+
 #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
__opaque_type, \
__hook_info, \
@@ -1979,6 +1983,121 @@ done:
 return true;
 }
 
+static void vtd_tlb_inv_passdown_notify(IntelIOMMUState *s,
+VTDIOTLBInvHookInfo *hook_info,
+vtd_device_hook hook_fn)
+{
+FOR_EACH_ASSIGN_DEVICE(struct tlb_invalidate_info,
+   VTDInvalidateData,
+   hook_info,
+   hook_fn);
+return;
+}
+
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+ void *hook_info,
+ void *notify_info)
+{
+struct tlb_invalidate_info *tlb_inv_info;
+IOMMUNotifierData iommu_data;
+VTDIOTLBInvHookInfo *tlb_hook_info;
+VTDInvalidateData *inv_data;
+tlb_inv_info = (struct tlb_invalidate_info *) notify_info;
+tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info;
+switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) {
+case VTD_INV_DESC_EXT_IOTLB:
+if (iter->did == *tlb_hook_info->did) {
+break;
+} else {
+return;
+}
+default:
+return;
+}
+
+tlb_inv_info->model = INTEL_IOMMU;
+
+inv_data = (VTDInvalidateData *)&tlb_inv_info->opaque;
+inv_data->pasid = *tlb_hook_info->pasid;
+inv_data->sid = iter->host_sid;
+inv_data->inv_desc = *tlb_hook_info->inv_desc;
+
+iommu_data.payload = (uint8_t *) tlb_inv_info;
+iommu_data.payload_size = sizeof(*tlb_inv_info) + sizeof(*inv_data);
+
+memory_region_notify_iommu_invalidate(&iter->vtd_as->iommu,
+  &iommu_data);
+}
+
+static bool vtd_process_exiotlb_desc(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+uint16_t domain_id;
+uint32_t pasid;
+uint8_t am;
+VTDIOTLBInvHookInfo tlb_hook_info;
+
+if ((inv_desc->lo & VTD_INV_DESC_EXIOTLB_

[RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-26 Thread Liu, Yi L
From: Jacob Pan 

Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
case in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

As part of the proposed architecture, when an SVM-capable PCI
device is assigned to a guest, nested mode is turned on. The guest owns the
first-level page tables (requests with PASID) and performs GVA->GPA
translation. Second-level page tables are owned by the host for GPA->HPA
translation, for requests both with and without PASID.

A new IOMMU driver interface is therefore needed to perform tasks as
follows:
* Enable nested translation and appropriate translation type
* Assign guest PASID table pointer (in GPA) and size to host IOMMU

This patch introduces new functions called iommu_(un)bind_pasid_table()
to the IOMMU API. Architecture-specific IOMMU functions can be added later
to perform the specific steps for binding the pasid table of assigned
devices.

This patch also adds a model definition in iommu.h. It is used to
check whether a bind request comes from a compatible entity, e.g. a bind
request from an intel_iommu emulator may not be supported by an ARM SMMU
driver.
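
To make the call flow concrete, a vendor driver wires the new callback into
its iommu_ops and an upper layer (e.g. VFIO) invokes the exported helper. A
minimal sketch under these assumptions; the example driver names are
hypothetical:

#include <linux/iommu.h>

/* Hedged sketch, vendor side: advertise the callback. */
static int example_bind_pasid_table(struct iommu_domain *domain,
                                    struct device *dev,
                                    struct pasid_table_info *pasidt_binfo)
{
        /* program the device context entry from pasidt_binfo->ptr/size */
        return 0;
}

static const struct iommu_ops example_iommu_ops = {
        .bind_pasid_table = example_bind_pasid_table,
        /* .unbind_pasid_table, .map, .unmap, ... */
};

/* Hedged sketch, caller side: hand the guest PASID table to the host driver. */
static int example_caller(struct iommu_domain *domain, struct device *dev,
                          struct pasid_table_info *info)
{
        return iommu_bind_pasid_table(domain, dev, info);
}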

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/iommu.c | 19 +++
 include/linux/iommu.h | 31 +++
 2 files changed, 50 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbe7f65..f2da636 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_info *pasidt_binfo)
+{
+   if (unlikely(!domain->ops->bind_pasid_table))
+   return -EINVAL;
+
+   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
+
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+   if (unlikely(!domain->ops->unbind_pasid_table))
+   return -EINVAL;
+
+   return domain->ops->unbind_pasid_table(domain, dev);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0ff5111..491a011 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -131,6 +131,15 @@ struct iommu_dm_region {
int prot;
 };
 
+struct pasid_table_info {
+   __u64   ptr;/* PASID table ptr */
+   __u64   size;   /* PASID table size*/
+   __u32   model;  /* magic number */
+#define INTEL_IOMMU(1 << 0)
+#define ARM_SMMU   (1 << 1)
+   __u8opaque[];/* IOMMU-specific details */
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -159,6 +168,8 @@ struct iommu_dm_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @bind_pasid_table: bind pasid table pointer for guest SVM
+ * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -200,6 +211,10 @@ struct iommu_ops {
u32 (*domain_get_windows)(struct iommu_domain *domain);
 
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+   int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_info *pasidt_binfo);
+   int (*unbind_pasid_table)(struct iommu_domain *domain,
+   struct device *dev);
 
unsigned long pgsize_bitmap;
 };
@@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_bind_pasid_table(struct iommu_domain *domain,
+   struct device *dev, struct pasid_table_info *pasidt_binfo);
+extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
+   struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
@@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct 
fwnode_handle *fwnode)
return NULL;
 }
 
+static inline
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_info *pasidt_binfo)
+{
+   return -EINVAL;
+}
+static inline
+int iommu_unbind_pasid_table(struc

Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-26 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> 
> 
> On 26/04/2017 12:06, Liu, Yi L wrote:
> > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > + void *data)
> > +{
> > +IOMMUNotifier *iommu_notifier;
> > +IOMMUNotifierFlag request_flags;
> > +
> > +assert(memory_region_is_iommu(mr));
> > +
> > +/*TODO: support other bind requests with smaller gran,
> > + * e.g. bind signle pasid entry
> > + */
> > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > +
> > +QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > +if (iommu_notifier->notifier_flags & request_flags) {
> > +iommu_notifier->notify(iommu_notifier, data);
> > +break;
> > +}
> > +}
> 
> Peter,
> 
> should this reuse ->notify, or should it be different function pointer
> in IOMMUNotifier?

Hi Paolo,

Thx for your review.

I think it should be "->notify" here. In this patchset, the new notifier
is registered with the existing notifier registration API, so all the
notifiers are in the mr->iommu_notify list. Notifiers are labeled by their
notify flag, so the IOMMUNotifier nodes can be differentiated: when the
flag matches, the notifier is triggered through "->notify". The diagram
below shows my understanding; I hope it makes my point clear.

VFIOContainer
   |
   giommu_list(VFIOGuestIOMMU)
\
 VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
| | |
mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
  (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb invalidate)


Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
no start/end check, and there may be other types of bind notifier flags in
the future, so I added a separate fire function for the SVM bind notifier.
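
For illustration only, a rough sketch of how a consumer (e.g. the VFIO code in
QEMU) might register for the new notification. The handler name is made up,
and it assumes this patchset's convention that ->notify receives an opaque
data pointer for the SVM bind event:

/* sketch: register for the SVM PASID table bind notification */
static void vfio_svm_pasidt_bind_notify(IOMMUNotifier *n, void *data)
{
    /* forward the guest PASID table bind data to the host, e.g. via
     * the VFIO ioctl discussed later in this thread */
}

static void vfio_register_svm_bind_notifier(VFIOGuestIOMMU *giommu)
{
    giommu->n.notify         = vfio_svm_pasidt_bind_notify;
    giommu->n.notifier_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
    /* no start/end range applies to this notifier type */
    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
}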

Thanks,
Yi L

> Paolo
> 

Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-04-26 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 05:56:50PM +0100, Jean-Philippe Brucker wrote:
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 17 +
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +   __u32   argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace 
> > driver */
> > +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
> > +   __u32   flags;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
> > +   VFIO_SVM_BIND_PASID | \
> > +   VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> 
> This could be called "VFIO_IOMMU_SVM_BIND, since it will be used both to
> bind tables and individual tasks.

Yes, it is. I would modify it in the next version.

Thanks,
Yi L 
> Thanks,
> Jean
> 


Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-26 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan 
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h | 31 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> > struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have
> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.
> 
> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> 
> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> 
> 
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
> 
> But "all devices in a domain share the same PASID table" is the paradigm
> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> iommu_group, it should be made more explicit to users, so they don't
> assume that devices within a domain are isolated from each others with
> regard to PASID DMA.
> 
> > +{
> > +   if (unlikely(!domain->ops->bind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device 
> > *dev)
> > +{
> > +   if (unlikely(!domain->ops->unbind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->o

Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-27 Thread Liu, Yi L
On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > + void *data)
> > > > +{
> > > > +IOMMUNotifier *iommu_notifier;
> > > > +IOMMUNotifierFlag request_flags;
> > > > +
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +/*TODO: support other bind requests with smaller gran,
> > > > + * e.g. bind signle pasid entry
> > > > + */
> > > > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > > +if (iommu_notifier->notifier_flags & request_flags) {
> > > > +iommu_notifier->notify(iommu_notifier, data);
> > > > +break;
> > > > +}
> > > > +}
> > > 
> > > Peter,
> > > 
> > > should this reuse ->notify, or should it be different function pointer
> > > in IOMMUNotifier?
> > 
> > Hi Paolo,
> > 
> > Thx for your review.
> > 
> > I think it should be “->notify” here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So the all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by “->notify”. The diagram below shows
> > my understanding , wish it helps to make me understood.
> > 
> > VFIOContainer
> >|
> >giommu_list(VFIOGuestIOMMU)
> > \
> >  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> > | | |
> > mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
> >   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb 
> > invalidate)
> > 
> > 
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notfier flag in
> > future, so I added a separate fire func for SVM bind notifier.
> 
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
> 
> The biggest problem is that, if you see current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit.

Hi Peter,

yes, you're right. the newly added notifier is per-iommu.

> While, for each iommu, there
> can be more than one memory regions (ppc can be an example). When
> there are more than one MRs binded to the same iommu unit, which
> memory region should you register to? Any one of them, or all?

Honestly, I'm not an expert on ppc. According to the current code,
I can only find one MR initialized with memory_region_init_iommu()
in spapr_tce_table_realize(). So, to better get your point, let me
check: do you mean there may be multiple iommu MRs behind one iommu?

I admit it must be considered if there are multiple iommu MRs. I may
choose to register for one of them, since the notifier is per-iommu as
you've pointed out. The vIOMMU emulator would then need to trigger the
notifier with the correct MR. I'm not sure if the ppc vIOMMU is fine with that.

> So my conclusion is, it just has nothing to do with memory regions...
>
> Instead of a different function pointer in IOMMUNotifer, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general. So we basically
> need two notification mechanism:
> 
> - one for memory regions, currently what I can see is IOTLB
>   notifications
> 
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
> 
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

I agree that it would be helpful to split the two kinds of notifiers. I
marked it as a FIXME in patch 0006 of this series. I just saw your RFC patch
for the common IOMMUObject; thanks for your work, I would try to review it.

Besides the notifier registration, please also help to review the SVM
virtualization itself. I would be glad to hear your comments.

Thanks,
Yi L

> Thanks,
> 
> -- 
> Peter Xu
> 

Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-28 Thread Liu, Yi L
On Thu, Apr 27, 2017 at 11:12:45AM +0100, Jean-Philippe Brucker wrote:
> On 27/04/17 07:36, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan 
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan 
> >>> Signed-off-by: Liu, Yi L 
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++
> >>>  include/linux/iommu.h | 31 +++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain 
> >>> *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>  
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device 
> >>> *dev,
> >>> + struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> >> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> >> might be simpler to let the IOMMU core take the group lock and do
> >> group->domain->ops->bind_task(dev...) for each device. The question also
> >> holds for iommu_do_invalidate in patch 3/8.
> >>
> >> This way the prototypes would be:
> >> int iommu_bind...(struct iommu_group *group, struct ... *info)
> >> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> >> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> >>
> >> For PASID table binding it might not matter much, as VFIO will most likely
> >> be the only user. But task binding will be called by device drivers, which
> >> by now should be encouraged to do things at iommu_group granularity.
> >> Alternatively it could be done implicitly like in iommu_attach_device,
> >> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> >>
> >>
> >> Extending this reasoning, since groups in a domain are also supposed to
> >> have the same mappings, then similarly to map/unmap,
> >> bind/unbind/invalidate should really be done with an iommu_domain (and
> >> nothing else) as target argument. However this requires the IOMMU core to
> >> keep a group list in each domain, which might complicate things a little
> >> too much.
> >>
> >> But "all devices in a domain share the same PASID table" is the paradigm
> >> I'm currently usin

Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-28 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan 
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h | 31 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> > struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have

Virtual address space is not tied to a protection domain the way I/O virtual
address space is. Is it really necessary to affect all the devices in this
group, or is it just for consistency?

> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.

In my understanding, this moves the for_each_dev loop into the IOMMU driver,
is that right?

> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)

For PASID table binding from the guest, I think it'd better be a per-device
op, since the bind operation needs to modify the host context entry. But we
may still share the API and do things differently in the IOMMU driver.

For invalidation, I think it'd better be per-group. Actually, when a guest
IOMMU exists, there is only one group in a domain on the Intel platform, and
doing it for each device is not expected. How about on ARM?
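
Just to make the group-granularity alternative concrete, a rough sketch (not
part of this series; the helper names are invented) of a wrapper built on top
of the per-device op from patch 1/8:

#include <linux/iommu.h>

struct pasidt_bind_ctx {
        struct iommu_domain *domain;
        struct pasid_table_info *binfo;
};

static int __bind_one_dev(struct device *dev, void *data)
{
        struct pasidt_bind_ctx *ctx = data;

        return iommu_bind_pasid_table(ctx->domain, dev, ctx->binfo);
}

static int iommu_group_bind_pasid_table(struct iommu_group *group,
                                        struct iommu_domain *domain,
                                        struct pasid_table_info *binfo)
{
        struct pasidt_bind_ctx ctx = { .domain = domain, .binfo = binfo };

        /* walk every device in the group, as VFIO does in patch 6/8 */
        return iommu_group_for_each_dev(group, &ctx, __bind_one_dev);
}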

> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".

Do you mean the bind task from a userspace driver? I guess you're trying to
handle different types of binding requests in a single svm_bind API?

> 
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
> 
> But "all devices in a domain share the same PASID ta

Re: [Qemu-devel] [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest

2017-04-28 Thread Liu, Yi L
On Thu, Apr 27, 2017 at 06:32:21PM +0800, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> > VT-d implementations reporting PASID or PRS fields as "Set", must also
> > report ecap.ECS as "Set". Extended-Context is required for SVM.
> > 
> > When ECS is reported, intel iommu driver would initiate extended root entry
> > and extended context entry, and also PASID table if there is any SVM capable
> > device.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  hw/i386/intel_iommu.c  | 131 
> > +++--
> >  hw/i386/intel_iommu_internal.h |   9 +++
> >  include/hw/i386/intel_iommu.h  |   2 +-
> >  3 files changed, 97 insertions(+), 45 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 400d0d1..bf98fa5 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry 
> > *root)
> >  return root->val & VTD_ROOT_ENTRY_P;
> >  }
> >  
> > +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> > +{
> > +return root->rsvd & VTD_ROOT_ENTRY_P;
> > +}
> > +
> >  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >VTDRootEntry *re)
> >  {
> > @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, 
> > uint8_t index,
> >  return -VTD_FR_ROOT_TABLE_INV;
> >  }
> >  re->val = le64_to_cpu(re->val);
> > +if (s->ecs) {
> > +re->rsvd = le64_to_cpu(re->rsvd);
> > +}
> 
> I feel it slightly hacky to play with re->rsvd. How about:
> 
> union VTDRootEntry {
> struct {
> uint64_t val;
> uint64_t rsvd;
> } base;
> struct {
> uint64_t ext_lo;
> uint64_t ext_hi;
> } extended;
> };

Agree.
 
> (Or any better way that can get rid of rsvd...)
> 
> Even:
> 
> struct VTDRootEntry {
> union {
> struct {
> uint64_t val;
> uint64_t rsvd;
> } base;
> struct {
> uint64_t ext_lo;
> uint64_t ext_hi;
> } extended;
> } data;
> bool extended;
> };
> 
> Then we read the entry into data, and setup extended bit. A benefit of
> it is that we may avoid passing around IntelIOMMUState everywhere to
> know whether we are using extended context entries.

For this proposal, it combines the s->ecs bit and the root entry. But it
may mislead future maintainers, as it still uses the VTDRootEntry name; maybe
name it differently.
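
For example (purely illustrative, the name is invented), keeping the union but
making the format explicit in the type could look like:

typedef struct VTDRootEntryUnified {
    union {
        struct {
            uint64_t val;
            uint64_t rsvd;
        } base;
        struct {
            uint64_t lo;
            uint64_t hi;
        } extended;
    };
    bool is_extended;   /* mirrors s->ecs when the entry is fetched */
} VTDRootEntryUnified;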

> >  return 0;
> >  }
> >  
> > @@ -517,19 +525,30 @@ static inline bool 
> > vtd_context_entry_present(VTDContextEntry *context)
> >  return context->lo & VTD_CONTEXT_ENTRY_P;
> >  }
> >  
> > -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t 
> > index,
> > -   VTDContextEntry *ce)
> > +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> > + VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
> >  {
> > -dma_addr_t addr;
> > +dma_addr_t addr, ce_size;
> >  
> >  /* we have checked that root entry is present */
> > -addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> > -if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> > +ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> > +addr = (s->ecs && (index > 0x7f)) ?
> > +   ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> > +   ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> > +
> > +if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
> >  trace_vtd_re_invalid(root->rsvd, root->val);
> >  return -VTD_FR_CONTEXT_TABLE_INV;
> >  }
> > -ce->lo = le64_to_cpu(ce->lo);
> > -ce->hi = le64_to_cpu(ce->hi);
> > +
> > +ce[0].lo = le64_to_cpu(ce[0].lo);
> > +ce[0].hi = le64_to_cpu(ce[0].hi);
> 
> Again, I feel this even hackier. :)
> 
> I would slightly prefer to play the same union trick to context
> entries, just like what I proposed to the root entries above...

I would think about it.

> > +
> > +if (s->ecs) {
> > +ce[1].lo = le64_to_cpu(ce[1].lo);
> > +ce[1].hi = le64_

Re: [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest

2017-04-28 Thread Liu, Yi L
On Fri, Apr 28, 2017 at 02:00:15PM +0800, Lan Tianyu wrote:
> On 2017-04-27 18:32, Peter Xu wrote:
> > On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> >> VT-d implementations reporting PASID or PRS fields as "Set", must also
> >> report ecap.ECS as "Set". Extended-Context is required for SVM.
> >>
> >> When ECS is reported, intel iommu driver would initiate extended root entry
> >> and extended context entry, and also PASID table if there is any SVM 
> >> capable
> >> device.
> >>
> >> Signed-off-by: Liu, Yi L 
> >> ---
> >>  hw/i386/intel_iommu.c  | 131 
> >> +++--
> >>  hw/i386/intel_iommu_internal.h |   9 +++
> >>  include/hw/i386/intel_iommu.h  |   2 +-
> >>  3 files changed, 97 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> index 400d0d1..bf98fa5 100644
> >> --- a/hw/i386/intel_iommu.c
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -497,6 +497,11 @@ static inline bool 
> >> vtd_root_entry_present(VTDRootEntry *root)
> >>  return root->val & VTD_ROOT_ENTRY_P;
> >>  }
> >>  
> >> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> >> +{
> >> +return root->rsvd & VTD_ROOT_ENTRY_P;
> >> +}
> >> +
> >>  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >>VTDRootEntry *re)
> >>  {
> >> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, 
> >> uint8_t index,
> >>  return -VTD_FR_ROOT_TABLE_INV;
> >>  }
> >>  re->val = le64_to_cpu(re->val);
> >> +if (s->ecs) {
> >> +re->rsvd = le64_to_cpu(re->rsvd);
> >> +}
> > 
> > I feel it slightly hacky to play with re->rsvd. How about:
> > 
> > union VTDRootEntry {
> > struct {
> > uint64_t val;
> > uint64_t rsvd;
> > } base;
> > struct {
> > uint64_t ext_lo;
> > uint64_t ext_hi;
> > } extended;
> > };
> > 
> > (Or any better way that can get rid of rsvd...)
> > 
> > Even:
> > 
> > struct VTDRootEntry {
> > union {
> > struct {
> > uint64_t val;
> > uint64_t rsvd;
> > } base;
> > struct {
> > uint64_t ext_lo;
> > uint64_t ext_hi;
> > } extended;
> > } data;
> > bool extended;
> > };
> > 
> > Then we read the entry into data, and setup extended bit. A benefit of
> > it is that we may avoid passing around IntelIOMMUState everywhere to
> > know whether we are using extended context entries.
> > 
> >>  return 0;
> >>  }
> >>  
> >> @@ -517,19 +525,30 @@ static inline bool 
> >> vtd_context_entry_present(VTDContextEntry *context)
> >>  return context->lo & VTD_CONTEXT_ENTRY_P;
> >>  }
> >>  
> >> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t 
> >> index,
> >> -   VTDContextEntry *ce)
> >> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> >> + VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
> >>  {
> >> -dma_addr_t addr;
> >> +dma_addr_t addr, ce_size;
> >>  
> >>  /* we have checked that root entry is present */
> >> -addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> >> -if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> >> +ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> >> +addr = (s->ecs && (index > 0x7f)) ?
> >> +   ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) 
> >> :
> >> +   ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> >> +
> >> +if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
> >>  trace_vtd_re_invalid(root->rsvd, root->val);
> >>  return -VTD_FR_CONTEXT_TABLE_INV;
> >>  }
> >> -ce->lo = le64_to_cpu(ce->lo);
> >> -ce->hi = le64_to_cpu(ce->hi);
> >> +
> >> +ce[0].lo = le64_to_cpu(ce[0].lo);
> >

Re: [Qemu-devel] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d

2017-05-08 Thread Liu, Yi L
On Mon, May 08, 2017 at 12:09:42PM +0800, Xiao Guangrong wrote:
> 
> Hi Liu Yi,
> 
> I haven't started to read the code yet, however, could you
> detail more please? It emulates a SVM capable iommu device in
> a VM? Or It speeds up device's DMA access in a VM? Or it is a
> new facility introduced for a VM? Could you please add a bit
> more for its usage?

Hi Guangrong,

Nice to hear from you.

This patchset is part of the whole SVM virtualization work, which aims to
expose an SVM capable Intel IOMMU to the guest. And yes, it is an emulated
IOMMU.

For a detailed introduction to SVM and SVM virtualization, I think
you can get more from the link below.

http://www.spinics.net/lists/kvm/msg148798.html

For the usage, I can give an example with IGD. The latest IGD is an SVM
capable device. On bare metal (the Intel IOMMU is also SVM capable), an
application can request to share its virtual address (an allocated buffer)
with the IGD device through the IOCTL cmd provided by the IGD driver, e.g.
an OpenCL application. When the IGD is assigned to a guest, it is expected
to support the same usage in the guest. With the SVM virtualization patchset,
an application in the guest would also be able to share its virtual address
with the IGD device. Different from bare metal, it is sharing a GVA with the
IGD, so the hardware IOMMU needs to translate the GVA to HPA and therefore
needs to know the GVA->HPA mapping. This patchset makes sure the GVA->HPA
mapping is built and maintains the TLB.
Feel free to let me know if you want more detail.

Thanks,
Yi L

> 
> Thanks!
> 
> On 04/26/2017 06:11 PM, Liu, Yi L wrote:
> >Hi,
> >
> >This patchset introduces SVM virtualization for intel_iommu in
> >IOMMU/VFIO. The total SVM virtualization for intel_iommu touched
> >Qemu/IOMMU/VFIO.
> >
> >Another patchset would change the Qemu. It is "[RFC PATCH 0/20] Qemu:
> >Extend intel_iommu emulator to support Shared Virtual Memory"
> >
> >In this patchset, it adds two new IOMMU APIs and their implementation
> >in intel_iommu driver. In VFIO, it adds two IOCTL cmd attached on
> >container->fd to propagate data from QEMU to kernel space.
> >
> >[Patch Overview]
> >* 1 adds iommu API definition for binding guest PASID table
> >* 2 adds binding PASID table API implementation in VT-d iommu driver
> >* 3 adds iommu API definition to do IOMMU TLB invalidation from guest
> >* 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver
> >* 5 adds VFIO IOCTL for propagating PASID table binding from guest
> >* 6 adds processing of pasid table binding in vfio_iommu_type1
> >* 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest
> >* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1
> >
> >Best Wishes,
> >Yi L
> >
> >
> >Jacob Pan (3):
> >   iommu: Introduce bind_pasid_table API function
> >   iommu/vt-d: add bind_pasid_table function
> >   iommu/vt-d: Add iommu do invalidate function
> >
> >Liu, Yi L (5):
> >   iommu: Introduce iommu do invalidate API function
> >   VFIO: Add new IOTCL for PASID Table bind propagation
> >   VFIO: do pasid table binding
> >   VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
> >   VFIO: do IOMMU TLB invalidation from guest
> >
> >  drivers/iommu/intel-iommu.c | 146 
> > 
> >  drivers/iommu/iommu.c   |  32 +
> >  drivers/vfio/vfio_iommu_type1.c |  98 +++
> >  include/linux/dma_remapping.h   |   1 +
> >  include/linux/intel-iommu.h |  11 +++
> >  include/linux/iommu.h   |  47 +
> >  include/uapi/linux/vfio.h   |  26 +++
> >  7 files changed, 361 insertions(+)
> >
> 


RE: [RFC PATCH 03/20] intel_iommu: add "svm" option

2017-05-08 Thread Liu, Yi L
On Thu, 27 Apr 2017 18:53:17 +0800
Peter Xu  wrote:

> On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > Also use "svm" to expose SVM related capabilities to guest.
> > e.g. "-device intel-iommu, svm=on"
> >
> > Signed-off-by: Liu, Yi L 
> > ---
> >  hw/i386/intel_iommu.c  | 10 ++
> >  hw/i386/intel_iommu_internal.h |  5 +
> > include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 16 insertions(+)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > bf98fa5..ba1e7eb 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> >  DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> >  DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> FALSE),
> >  DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> >  s->ecap |= VTD_ECAP_ECS;
> >  }
> >
> > +if (s->svm) {
> > +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > +error_report("Need to set ecs, pt, caching-mode for svm");
> > +exit(1);
> > +}
> > +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > +}
> > +
> >  if (s->caching_mode) {
> >  s->cap |= VTD_CAP_CM;
> >  }
> > diff --git a/hw/i386/intel_iommu_internal.h
> > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -191,6 +191,9 @@
> >  #define VTD_ECAP_PT (1ULL << 6)
> >  #define VTD_ECAP_MHMV   (15ULL << 20)
> >  #define VTD_ECAP_ECS(1ULL << 24)
> > +#define VTD_ECAP_PASID28(1ULL << 28)
> 
> Could I ask what's this bit? On my spec, it says this bit is reserved and 
> defunct (spec
> version: June 2016).

As Ashok confirmed, yes, it should be bit 40. I would update it.

> > +#define VTD_ECAP_PRS(1ULL << 29)
> > +#define VTD_ECAP_PTS(0xeULL << 35)
> 
> Would it better we avoid using 0xe here, or at least add some comment?

This value must not report more PASID bits than the host supports. So it may
be better to have a default value and meanwhile expose an option to let the
user set it. What's your opinion?
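
To make that concrete, one possible shape for a configurable default (macro
names invented here; it relies on the convention that a reported value N means
a PASID field of N+1 bits):

#define VTD_ECAP_PTS_SHIFT          35
#define VTD_ECAP_PTS_MASK           0xfULL
/* encode a desired PASID width (in bits) into the PTS field */
#define VTD_ECAP_PTS_FROM_BITS(b) \
    (((((uint64_t)(b)) - 1) & VTD_ECAP_PTS_MASK) << VTD_ECAP_PTS_SHIFT)

/* e.g. a default of 15 supported PASID bits encodes as the 0xe used above:
 * s->ecap |= VTD_ECAP_PTS_FROM_BITS(15); */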

> 
> >
> >  /* CAP_REG */
> >  /* (offset >> 4) << 24 */
> > @@ -207,6 +210,8 @@
> >  #define VTD_CAP_PSI (1ULL << 39)
> >  #define VTD_CAP_SLLPS   ((1ULL << 34) | (1ULL << 35))
> >  #define VTD_CAP_CM  (1ULL << 7)
> > +#define VTD_CAP_DWD (1ULL << 54)
> > +#define VTD_CAP_DRD (1ULL << 55)
> 
> Just to confirm: after this series, we should support drain read/write then, 
> right?

I haven't done any special processing for it in the IOMMU emulator. It's set
to keep consistency with the VT-d spec, since DWD and DRD are required
capabilities when PASID is reported as Set. However, I think it should be fine
if the guest issues QI with drain read/write set in the descriptor; the host
should be able to process it.

Thanks,
Yi L
> >
> >  /* Supported Adjusted Guest Address Widths */
> >  #define VTD_CAP_SAGAW_SHIFT 8
> > diff --git a/include/hw/i386/intel_iommu.h
> > b/include/hw/i386/intel_iommu.h index ae21fe5..8981615 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -267,6 +267,7 @@ struct IntelIOMMUState {
> >
> >  bool caching_mode;  /* RO - is cap CM enabled? */
> >  bool ecs;   /* Extended Context Support */
> > +bool svm;   /* Shared Virtual Memory */
> >
> >  dma_addr_t root;/* Current root table pointer */
> >  bool root_extended; /* Type of root table (extended or 
> > not) */
> > --
> > 1.9.1
> >
> 
> --
> Peter Xu

Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option

2017-05-09 Thread Liu, Yi L
On Mon, May 08, 2017 at 07:20:34PM +0800, Peter Xu wrote:
> On Mon, May 08, 2017 at 10:38:09AM +0000, Liu, Yi L wrote:
> > On Thu, 27 Apr 2017 18:53:17 +0800
> > Peter Xu  wrote:
> > 
> > > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > > > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > > > Also use "svm" to expose SVM related capabilities to guest.
> > > > e.g. "-device intel-iommu, svm=on"
> > > >
> > > > Signed-off-by: Liu, Yi L 
> > > > ---
> > > >  hw/i386/intel_iommu.c  | 10 ++
> > > >  hw/i386/intel_iommu_internal.h |  5 +
> > > > include/hw/i386/intel_iommu.h  |  1 +
> > > >  3 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > > > bf98fa5..ba1e7eb 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> > > >  DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> > > >  DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> > > FALSE),
> > > >  DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > > > +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> > > >  DEFINE_PROP_END_OF_LIST(),
> > > >  };
> > > >
> > > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> > > >  s->ecap |= VTD_ECAP_ECS;
> > > >  }
> > > >
> > > > +if (s->svm) {
> > > > +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > > > +error_report("Need to set ecs, pt, caching-mode for svm");
> > > > +exit(1);
> > > > +}
> > > > +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > > > +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > > > +}
> > > > +
> > > >  if (s->caching_mode) {
> > > >  s->cap |= VTD_CAP_CM;
> > > >  }
> > > > diff --git a/hw/i386/intel_iommu_internal.h
> > > > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644
> > > > --- a/hw/i386/intel_iommu_internal.h
> > > > +++ b/hw/i386/intel_iommu_internal.h
> > > > @@ -191,6 +191,9 @@
> > > >  #define VTD_ECAP_PT (1ULL << 6)
> > > >  #define VTD_ECAP_MHMV   (15ULL << 20)
> > > >  #define VTD_ECAP_ECS(1ULL << 24)
> > > > +#define VTD_ECAP_PASID28(1ULL << 28)
> > > 
> > > Could I ask what's this bit? On my spec, it says this bit is reserved and 
> > > defunct (spec
> > > version: June 2016).
> > 
> > As Ashok confirmed, yes it should be bit 40. would update it.
> 
> Ok.
> 
> > 
> > > > +#define VTD_ECAP_PRS(1ULL << 29)
> > > > +#define VTD_ECAP_PTS(0xeULL << 35)
> > > 
> > > Would it better we avoid using 0xe here, or at least add some comment?
> > 
> > For this value, it must be no more than the bits host supports. So it may be
> > better to have a default value and meanwhile expose an option to let user
> > set it. how about your opinion?
> 
> I think a more important point is that we need to make sure this value
> is no larger than hardware support? 

Agreed. If it is larger, the sanity check would fail.

> Since you are also working on the
> vfio interface for virt-svm... would it be possible that we can talk
> to kernel in some way so that we can know the supported pasid size in
> host IOMMU? So that when guest specifies something bigger, we can stop
> the user.

If it is just to stop when the size is not valid, I think we already have
such a sanity check in the host when trying to bind the guest PASID table.
I'm not sure if it is practical to talk with the kernel about the supported
PASID size, but I may think about it. It is very likely that we would need
to do it through VFIO.

> 
> I don't know the practical value for this field, if it's static
> enough, I think it's also okay we make it static here as well. But
> again, I would prefer at least some comment, like:
> 
>   /* Value N indicates PASID field of N+1 bits, here 0xe stands for.. */

yes, at least we need

Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-05-12 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 06:12:02PM +0800, Liu, Yi L wrote:
> From: "Liu, Yi L" 

Hi Alex,

In this patchset, I'm trying to add two new IOCTL cmd for Shared
Virtual Memory virtualization. One for binding guest PASID Table
and one for iommu tlb invalidation from guest. ARM has similar
requirement on SVM supporting. Since it touched VFIO, I'd like
to know your comments on changes in VFIO.

Thanks,
Yi L

> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L 
> ---
>  include/uapi/linux/vfio.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> + __u32   argsz;
> +#define VFIO_SVM_BIND_PASIDTBL   (1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID  (1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE(1 << 2) /* Bind guest mmu page table */
> + __u32   flags;
> + __u32   length;
> + __u8data[];
> +};
> +
> +#define VFIO_SVM_TYPE_MASK   (VFIO_SVM_BIND_PASIDTBL | \
> + VFIO_SVM_BIND_PASID | \
> + VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*
> -- 
> 1.9.1
> 
> 


Re: [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding

2017-05-12 Thread Liu, Yi L
On Tue, May 09, 2017 at 03:55:20PM +0800, Xiao Guangrong wrote:
> 
> 
> On 04/26/2017 06:12 PM, Liu, Yi L wrote:
> >From: "Liu, Yi L" 
> >
> >This patch adds IOCTL processing in vfio_iommu_type1 for
> >VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by
> >calling iommu_ops->bind_pasid_table to link the whole
> >PASID table to pIOMMU.
> >
> >For VT-d, it is linking the guest PASID table to host pIOMMU.
> >This is key point to support SVM virtualization on VT-d.
> >
> >Signed-off-by: Liu, Yi L 
> >---
> >  drivers/vfio/vfio_iommu_type1.c | 72 
> > +
> >  1 file changed, 72 insertions(+)
> >
> >diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >b/drivers/vfio/vfio_iommu_type1.c
> >index b3cc33f..30b6d48 100644
> >--- a/drivers/vfio/vfio_iommu_type1.c
> >+++ b/drivers/vfio/vfio_iommu_type1.c
> >@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct 
> >vfio_iommu *iommu)
> > return ret;
> >  }
> >+struct vfio_svm_task {
> >+struct iommu_domain *domain;
> >+void *payload;
> >+};
> >+
> >+static int bind_pasid_tbl_fn(struct device *dev, void *data)
> >+{
> >+int ret = 0;
> >+struct vfio_svm_task *task = data;
> >+struct pasid_table_info *pasidt_binfo;
> >+
> >+pasidt_binfo = task->payload;
> >+ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> >+return ret;
> >+}
> >+
> >+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> >+int (*fn)(struct device *, void *))
> >+{
> >+int ret = 0;
> >+struct vfio_domain *d;
> >+struct vfio_group *g;
> >+struct vfio_svm_task task;
> >+
> >+task.payload = data;
> >+
> >+mutex_lock(&iommu->lock);
> >+
> >+list_for_each_entry(d, &iommu->domain_list, next) {
> >+list_for_each_entry(g, &d->group_list, next) {
> >+if (g->iommu_group != NULL) {
> >+task.domain = d->domain;
> >+ret = iommu_group_for_each_dev(
> >+g->iommu_group, &task, fn);
> >+if (ret != 0)
> >+break;
> >+}
> >+}
> >+}
> >+
> >+mutex_unlock(&iommu->lock);
> >+return ret;
> >+}
> >+
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >unsigned int cmd, unsigned long arg)
> >  {
> >@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> > return copy_to_user((void __user *)arg, &unmap, minsz) ?
> > -EFAULT : 0;
> >+} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> >+struct vfio_device_svm hdr;
> >+u8 *data = NULL;
> >+int ret = 0;
> >+
> >+minsz = offsetofend(struct vfio_device_svm, length);
> >+if (copy_from_user(&hdr, (void __user *)arg, minsz))
> >+return -EFAULT;
> >+
> >+if (hdr.length == 0)
> >+return -EINVAL;
> >+
> >+data = memdup_user((void __user *)(arg + minsz),
> >+hdr.length);
> 
> You should check the @length is at least sizeof(struct pasid_table_info) as
> kernel uses it as pasid_table_info, a evil application can crash kernel.

Yes, thanks for the reminder.
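
For reference, a minimal version of that check on top of the hunk quoted above
could be (sketch only):

        if (hdr.length < sizeof(struct pasid_table_info))
                return -EINVAL;

        data = memdup_user((void __user *)(arg + minsz), hdr.length);
        if (IS_ERR(data))
                return PTR_ERR(data);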

Thanks,
Yi L 


Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +   __u32   argsz;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> 
> We initially discussed something a little more generic than this, with
> most info explicitly described and only pIOMMU-specific quirks and hints
> in an opaque structure. Out of curiosity, why the change? I'm not against
> a fully opaque structure, but there seem to be a large overlap between TLB
> invalidations across architectures.

Hi Jean,

As my cover letter mentioned, this is an open question on IOMMU TLB invalidate
propagation. I paste it here since it's in the cover letter for the QEMU part
changes. Please refer to the [Open] section at the following link.

http://www.spinics.net/lists/kvm/msg148798.html

I want to see whether the community wants an opaque structure or not for
IOMMU TLB invalidate propagation. Personally, I incline towards an opaque
structure, but it's better to gather comments before deciding. To assist
the discussion, I put the fully opaque structure here. Once the community
reaches consensus on using an opaque structure for IOMMU TLB invalidate
propagation, I'm glad to work with you on a partially opaque structure,
since there seems to be overlap across architectures.

> 
> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> with the following.
> 
> (From the paravirtualized POV, the SMMU also has to swizzle endianess
> after unpacking an opaque structure, since userspace doesn't know what's
> in it and guest might use a different endianess. So we need to force all
> opaque data to be e.g. little-endian.)
> 
> struct vfio_iommu_tlb_invalidate {
>   __u32   argsz;
>   __u32   scope;
>   __u32   flags;
>   __u32   pasid;
>   __u64   vaddr;
>   __u64   size;
>   __u8data[];
> };
>
> Scope is a bitfields restricting the invalidation scope. By default
> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> and @size are unused.
> 
> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> scope to the pasid described by @pasid.
> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> scope to the address range described by (@vaddr, @size).
> 
> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> range for *all* pasids (as well as no_pasid). Setting scope =
> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> the VA range only for @pasid.
> 

Besides VA range flushing, there is PASID cache flushing on VT-d. How about
SMMU? So I think that besides the two scopes you defined, we may need one more
to indicate whether it is a PASID cache flush.
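
Concretely, that could be one more scope bit on top of the structure proposed
above (the name here is invented for illustration):

#define VFIO_IOMMU_INVALIDATE_PASID_CACHE   (1 << 2) /* flush the PASID cache for @pasid */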

> Flags depend on the selected scope:
> 
> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> exclusively (some architectures, e.g. SMMU, allow this)
> 
> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> only the last-level entry (pte). This is a hint.
> 
> I guess what's missing for Intel IOMMU and would go in @data is the
> "global" hint (which we don't have in SMMU invalidations). Do you see
> anything else, that the pIOMMU cannot deduce from this structure?
> 

For the Intel platform, drain read/write would be needed in the opaque data.
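
For example, the Intel-specific opaque data might carry something along these
lines (field and flag names invented for illustration):

struct intel_iommu_tlb_inv_opaque {
        __u32   flags;
#define INTEL_IOMMU_INV_DRAIN_READ      (1 << 0)
#define INTEL_IOMMU_INV_DRAIN_WRITE     (1 << 1)
#define INTEL_IOMMU_INV_GLOBAL_HINT     (1 << 2)
        __u32   granularity;    /* e.g. global / PASID-selective / page-selective */
};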

Thanks,
Yi L


Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:04 +0800
> "Liu, Yi L"  wrote:
> 
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +   __u32   argsz;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> > +
> > +#define VFIO_IOMMU_TLB_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
> 
> I'm kind of wondering why this isn't just a new flag bit on
> vfio_device_svm, the data structure is so similar.  Of course data
> needs to be fully specified in uapi.

Hi Alex,

For this part, it depends on whether we use an opaque structure or not. The
following link mentions it in the [Open] section.

http://www.spinics.net/lists/kvm/msg148798.html

If we pick the fully opaque solution for IOMMU TLB invalidate propagation,
then I may add a flag bit on vfio_device_svm and also add the definition in
uapi as you suggested.
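
If we go that way, it would roughly mean one more flag bit on the existing
structure (name invented for illustration):

/* additional flag bit on struct vfio_device_svm, sketch only */
#define VFIO_SVM_TLB_INVALIDATE     (1 << 3)    /* data[] carries invalidation info */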

Thanks,
Yi L

> > +
> >  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
> >  
> >  /*
> 


Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-05-15 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:11:58 +0800
> "Liu, Yi L"  wrote:
> 
> > From: Jacob Pan 
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h | 31 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> > struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo)
> > +{
> > +   if (unlikely(!domain->ops->bind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device 
> > *dev)
> > +{
> > +   if (unlikely(!domain->ops->unbind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >   struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> > int prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +   __u64   ptr;/* PASID table ptr */
> > +   __u64   size;   /* PASID table size*/
> > +   __u32   model;  /* magic number */
> > +#define INTEL_IOMMU(1 << 0)
> > +#define ARM_SMMU   (1 << 1)
> > +   __u8opaque[];/* IOMMU-specific details */
> > +};
> 
> This needs to be in uapi since you're expecting a user to pass it 

Yes, it is. Thanks for the correction.

Thanks,
Yi L
> > +
> >  #ifdef CONFIG_IOMMU_API
> >  
> >  /**
> > @@ -159,6 +168,8 @@ struct iommu_dm_region {
> >   * @domain_get_windows: Return the number of windows for a domain
> >   * @of_xlate: add OF master IDs to iommu grouping
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @bind_pasid_table: bind pasid table pointer for guest SVM
> > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
> >   */
> >  struct iommu_ops {
> > bool (*capable)(enum iommu_cap);
> > @@ -200,6 +211,10 @@ struct iommu_ops {
> > u32 (*domain_get_windows)(struct iommu_domain *domain);
> >  
> > int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > +   int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo);
> > +   int (*unbind_pasid_t

Re: [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function

2017-05-18 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:59:24PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:00 +0800
> "Liu, Yi L"  wrote:
> 

Hi Alex,

Pls refer to the open question I mentioned in this email; I need your comments
on it to prepare the formal patchset for SVM virtualization. Thx.

> > From: "Liu, Yi L" 
> > 
> > When a SVM capable device is assigned to a guest, the first level page
> > tables are owned by the guest and the guest PASID table pointer is
> > linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates unless
> > the guest invalidation activities are passed down to the host. The
> > primary usage is derived from emulated IOMMU in the guest, where QEMU
> > can trap invalidation activities before pass them down the
> > host/physical IOMMU. There are IOMMU architectural specific actions
> > need to be taken which requires the generic APIs introduced in this
> > patch to have opaque data in the tlb_invalidate_info argument.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/iommu.c | 13 +
> >  include/linux/iommu.h | 16 
> >  2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index f2da636..ca7cff2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain 
> > *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> >  
> > +int iommu_do_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +   int ret = 0;
> > +
> > +   if (unlikely(domain->ops->do_invalidate == NULL))
> > +   return -ENODEV;
> > +
> > +   ret = domain->ops->do_invalidate(domain, dev, inv_info);
> > +   return ret;
> 
> nit, ret is unnecessary.

yes, would modify it. Thx.
 
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_do_invalidate);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >   struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 491a011..a48e3b75 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -140,6 +140,11 @@ struct pasid_table_info {
> > __u8opaque[];/* IOMMU-specific details */
> >  };
> >  
> > +struct tlb_invalidate_info {
> > +   __u32   model;
> > +   __u8opaque[];
> > +};
> 
> I'm wondering if 'model' is really necessary here, shouldn't this
> function only be called if a bind_pasid_table() succeeded, and then the
> model would be set at that time?

For this model, I'm thinking about another potential usage, which comes
from Tianyu's idea of using tlb_invalidate_info to pass invalidations
for IOVA-related mappings. In such a case, there would be no bind_pasid_table()
before it, so a model check would be needed. But I may remove it since this
patchset is focusing on SVM.

Here, I have an open question to check with you. I defined tlb_invalidate_info
with fully opaque data; the opaque part would carry the invalidation info for
different vendors. But we have two choices for the tlb_invalidate_info
definition.

a) as proposed in this patchset, passing raw data to host. The host pIOMMU
   driver submits the invalidation request after replacing specific fields.
   Reject it if the IOMMU model is not correct.
   * Pros: no need to parse and re-assemble, better performance
   * Cons: unable to support scenarios which emulate an Intel IOMMU
   on an ARM platform.
b) parse the invalidation info into specific data, e.g. gran, addr,
   size, invalidation type etc., then fill the data into a generic
   structure. In the host, the pIOMMU driver re-assembles the invalidation
   request and submits it to the pIOMMU (a rough sketch of such a generic
   structure follows right after this list).
   * Pros: may be able to support the scenario above. But it is still in
   question since different vendors may have vendor-specific
   invalidation info. This would make it difficult to have a
   vendor-agnostic invalidation propagation API.

   * Cons: needs additional complexity for parsing and re-assembling.
   The generic structure would be a super-set of all possible
   invalidation info, which may be hard to maintain in the future.
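
To make option b) more concrete, a purely illustrative sketch of such a
generic structure could be (field names here are assumptions, not part of
this patchset):

struct iommu_generic_invalidate {
        __u32   type;        /* iotlb / dev-iotlb / pasid-cache, ... */
        __u32   granularity; /* global / domain / pasid / address range */
        __u32   pasid;
        __u64   addr;
        __u64   size;
};

The host pIOMMU driver would then re-assemble a vendor-specific descriptor
(e.g. a VT-d qi_desc) from these fields before submitting it to hardware.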

As the pros/cons show, I proposed a) as an initial version, but it is still an
open question. Jean from ARM has given some comments on it and is inclined to
the opaque way with the generic part defined explicitly. Jean's reply is in
the link below.

Re: [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function

2017-05-18 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:59:18PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:01 +0800
> "Liu, Yi L"  wrote:
> 
> > From: Jacob Pan 
> > 
> > This patch adds Intel VT-d specific function to implement
> > iommu_do_invalidate API.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the guest
> > to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 43 
> > +++
> >  include/linux/intel-iommu.h | 11 +++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 6d5b939..0b098ad 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct 
> > iommu_domain *domain,
> > dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
> >  }
> >  
> > +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +   int ret = 0;
> > +   struct intel_iommu *iommu;
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct intel_invalidate_data *inv_data;
> > +   struct qi_desc *qi;
> > +   u16 did;
> > +   u8 bus, devfn;
> > +
> > +   if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> > +   return -EINVAL;
> > +
> > +   iommu = device_to_iommu(dev, &bus, &devfn);
> > +   if (!iommu)
> > +   return -ENODEV;
> > +
> > +   inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
> > +
> > +   /* check SID */
> > +   if (PCI_DEVID(bus, devfn) != inv_data->sid)
> > +   return 0;
> > +
> > +   qi = &inv_data->inv_desc;
> > +
> > +   switch (qi->low & QI_TYPE_MASK) {
> > +   case QI_DIOTLB_TYPE:
> > +   case QI_DEIOTLB_TYPE:
> > +   /* for device IOTLB, we just let it pass through */
> > +   break;
> > +   default:
> > +   did = dmar_domain->iommu_did[iommu->seq_id];
> > +   set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
> > +   break;
> > +   }
> > +
> > +   ret = qi_submit_sync(qi, iommu);
> > +
> > +   return ret;
> 
> nit, ret variable is unnecessary.

yes, would remove it.
 
> > +}
> > +
> >  static int intel_iommu_map(struct iommu_domain *domain,
> >unsigned long iova, phys_addr_t hpa,
> >size_t size, int iommu_prot)
> > @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct 
> > iommu_domain *domain,
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> > .bind_pasid_table   = intel_iommu_bind_pasid_table,
> > .unbind_pasid_table = intel_iommu_unbind_pasid_table,
> > +   .do_invalidate  = intel_iommu_do_invalidate,
> >  #endif
> > .map= intel_iommu_map,
> > .unmap  = intel_iommu_unmap,
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index ac04f28..9d6562c 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -271,6 +272,10 @@ enum {
> >  #define QI_PGRP_RESP_TYPE  0x9
> >  #define QI_PSTRM_RESP_TYPE 0xa
> >  
> > +#define QI_DID(did)(((u64)did & 0x) << 16)
> > +#define QI_DID_MASKGENMASK(31, 16)
> > +#define QI_TYPE_MASK   GENMASK(3, 0)
> > +
> >  #define QI_IEC_SELECTIVE   (((u64)1) << 4)
> >  #define QI_IEC_IIDEX(idx)  (((u64)(idx & 0x) << 32))
> >  #define QI_IEC_IM(m)   (((u64)(m & 0x1f) << 27))
> > @@ -529,6 +534,12 @@ struct intel_svm {
> >  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> >  #endif
> >  
> > +struct intel_invalidate_data {
> > +   u16 sid;
> > +   u32 pasid;
> > +   struct qi_desc inv_desc;
> > +};
> 
> This needs to be uapi since the vfio user is expected to create it, so
> we need a uapi version of qi_desc too.
>

yes, would do it.

Thx,
Yi L
 
> > +
> >  extern const struct attribute_group *intel_iommu_groups[];
> >  extern void intel_iommu_debugfs_init(void);
> >  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
> 


Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-05-18 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:02 +0800
> "Liu, Yi L"  wrote:
> 
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 17 +
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +   __u32   argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace 
> > driver */
> > +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
> > +   __u32   flags;
> > +   __u32   length;
> > +   __u8data[];
> 
> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
> pasid_table_info?  So at a minimum this is a union including struct
> pasid_table_info.  Furthermore how does a user learn what the opaque
> data in struct pasid_table_info is without looking at the code?  A user
> API needs to be clear and documented, not opaque and variable.  We
> should also have references to the hardware spec for an Intel or ARM
> PASID table in uapi.  flags should be defined as they're used, let's
> not reserve them with the expectation of future use.
> 

Agree, I would add a description accordingly. For the flags, I would remove
the last two as I wouldn't use them. I think Jean would add them in his/her
patchset. Anyhow, one of us needs to merge the flags.

Thanks,
Yi L

> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
> > +   VFIO_SVM_BIND_PASID | \
> > +   VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> > +
> >  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
> >  
> >  /*
> 
> 


Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-18 Thread Liu, Yi L
Hi Alex,

What's your opinion on Tianyu's question? Is it acceptable
to use the VFIO API in the intel_iommu emulator?

Thanks,
Yi L
On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> On 2017年04月26日 18:06, Liu, Yi L wrote:
> > With vIOMMU exposed to guest, vIOMMU emulator needs to do translation
> > between host and guest. e.g. a device-selective TLB flush, vIOMMU
> > emulator needs to replace guest SID with host SID so that to limit
> > the invalidation. This patch introduces a new callback
> > iommu_ops->record_device() to notify vIOMMU emulator to record necessary
> > information about the assigned device.
> 
> This patch is to prepare to translate guest sbdf to host sbdf.
> 
> Alex:
>   Could we add a new vfio API to do such translation? This will be more
> straight forward than storing host sbdf in the vIOMMU device model.
> 
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/exec/memory.h | 11 +++
> >  memory.c  | 12 
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 7bd13ab..49087ef 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> >  IOMMUNotifierFlag new_flags);
> >  /* Set this up to provide customized IOMMU replay function */
> >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > +void (*record_device)(MemoryRegion *iommu,
> > +  void *device_info);
> >  };
> >  
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_notify_one(IOMMUNotifier *notifier,
> >IOMMUTLBEntry *entry);
> >  
> > +/*
> > + * memory_region_notify_device_record: notify IOMMU to record assign
> > + * device.
> > + * @mr: the memory region to notify
> > + * @ device_info: device information
> > + */
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +void *info);
> > +
> >  /**
> >   * memory_region_register_iommu_notifier: register a notifier for changes 
> > to
> >   * IOMMU translation entries.
> > diff --git a/memory.c b/memory.c
> > index 0728e62..45ef069 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1600,6 +1600,18 @@ static void 
> > memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> >  mr->iommu_notify_flags = flags;
> >  }
> >  
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +void *info)
> > +{
> > +assert(memory_region_is_iommu(mr));
> > +
> > +if (mr->iommu_ops->record_device) {
> > +mr->iommu_ops->record_device(mr, info);
> > +}
> > +
> > +return;
> > +}
> > +
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > IOMMUNotifier *n)
> >  {
> > 
> 
> 

Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-19 Thread Liu, Yi L
On Fri, May 19, 2017 at 09:07:49AM +, Tian, Kevin wrote:
> > From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> > Sent: Friday, May 19, 2017 1:24 PM
> > 
> > Hi Alex,
> > 
> > What's your opinion with Tianyu's question? Is it accepatable
> > to use VFIO API in intel_iommu emulator?
> 
> Did you actually need such translation at all? SID should be
> filled by kernel IOMMU driver based on which device is
> requested with invalidation request, regardless of which 
> guest SID is used in user space. Qemu only needs to know
> which fd corresponds to guest SID, and then initiates an
> invalidation request on that fd?

Kevin,

It actually depends on the SVM binding behavior we expect on the host
IOMMU driver side. If we want the binding to be per-device, this
translation is needed in QEMU, either in VFIO or in the intel_iommu emulator,
so that the host SID can be used as a device selector when looping over
the devices in a group.

If we can use the VFIO API directly, we may also trigger the SVM bind/QI
propagation straightforwardly instead of going through a notifier.

Thanks,
Yi L
 
> > 
> > Thanks,
> > Yi L
> > On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > > On 2017年04月26日 18:06, Liu, Yi L wrote:
> > > > With vIOMMU exposed to guest, vIOMMU emulator needs to do
> > translation
> > > > between host and guest. e.g. a device-selective TLB flush, vIOMMU
> > > > emulator needs to replace guest SID with host SID so that to limit
> > > > the invalidation. This patch introduces a new callback
> > > > iommu_ops->record_device() to notify vIOMMU emulator to record
> > necessary
> > > > information about the assigned device.
> > >
> > > This patch is to prepare to translate guest sbdf to host sbdf.
> > >
> > > Alex:
> > >   Could we add a new vfio API to do such translation? This will be more
> > > straight forward than storing host sbdf in the vIOMMU device model.
> > >
> > > >
> > > > Signed-off-by: Liu, Yi L 
> > > > ---
> > > >  include/exec/memory.h | 11 +++
> > > >  memory.c  | 12 
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index 7bd13ab..49087ef 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > > >  IOMMUNotifierFlag new_flags);
> > > >  /* Set this up to provide customized IOMMU replay function */
> > > >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > > +void (*record_device)(MemoryRegion *iommu,
> > > > +  void *device_info);
> > > >  };
> > > >
> > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > @@ -708,6 +710,15 @@ void
> > memory_region_notify_iommu(MemoryRegion *mr,
> > > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > > >IOMMUTLBEntry *entry);
> > > >
> > > > +/*
> > > > + * memory_region_notify_device_record: notify IOMMU to record
> > assign
> > > > + * device.
> > > > + * @mr: the memory region to notify
> > > > + * @ device_info: device information
> > > > + */
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info);
> > > > +
> > > >  /**
> > > >   * memory_region_register_iommu_notifier: register a notifier for
> > changes to
> > > >   * IOMMU translation entries.
> > > > diff --git a/memory.c b/memory.c
> > > > index 0728e62..45ef069 100644
> > > > --- a/memory.c
> > > > +++ b/memory.c
> > > > @@ -1600,6 +1600,18 @@ static void
> > memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > > >  mr->iommu_notify_flags = flags;
> > > >  }
> > > >
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info)
> > > > +{
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +if (mr->iommu_ops->record_device) {
> > > > +mr->iommu_ops->record_device(mr, info);
> > > > +}
> > > > +
> > > > +return;
> > > > +}
> > > > +
> > > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > > IOMMUNotifier *n)
> > > >  {
> > > >
> > >
> > >

Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-05-23 Thread Liu, Yi L
On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
> On 28/04/17 10:04, Liu, Yi L wrote:
Hi Jean,

Sorry for the delayed response. I still have some follow-up comments on
per-device vs. per-group binding. Pls refer to the comments inline.

> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan 
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan 
> >>> Signed-off-by: Liu, Yi L 
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++
> >>>  include/linux/iommu.h | 31 +++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain 
> >>> *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>  
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device 
> >>> *dev,
> >>> + struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> > 
> > Virtual address space is not tied to protection domain as I/O virtual 
> > address
> > space does. Is it really necessary to affect all the devices in this group.
> > Or it is just for consistence?
> 
> It's mostly about consistency, and also avoid hiding implicit behavior in
> the IOMMU driver. I have the following example, described using group and
> domain structures from the IOMMU API:
>  
> |IOMMU   |
> |  |DOM  __ ||
> |  ||GRP   ||| bind
> |  ||A<-Task 1
> |  ||B |||
> |  ||__|||
> |  | __ ||
> |  ||GRP   |||
> |  ||C |||
> |  ||__|||
> |  |||
> |    |
> |  |DOM  __ ||
> |  ||GRP   |||
> |  ||D |||
> |  ||__|||
> |  |||
> ||
> 
> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
> to some hardware limitation (in the bus, the device or the IOMMU), B can
> see all DMA transactions issued by A. A and B are therefore 

Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-03 Thread Liu, Yi L
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:

Hi Jean,

As we've had a few discussions on it, I'd like to reach a conclusion and
make it a reference for future discussion.

Currently, we are inclined to have a hybrid format for the iommu tlb
invalidation from userspace (vIOMMU or userspace driver).

Based on the previous discussion, would the below work?

1. Add a IOCTL for iommu tlb invalidation.

VFIO_IOMMU_TLB_INVALIDATE

struct vfio_iommu_tlb_invalidate {
   __u32   argsz;
   __u32   length;
   __u8data[];
};

Comment from Alex Williamson: would it be more suitable to add a new flag bit
to vfio_device_svm (a structure defined in patch 5 of this patchset), since
the data structures are so similar?

Personally, I'm ok with it. Pls let me know your thoughts. However, the
precondition is that we accept the whole definition in this email. If not,
vfio_iommu_tlb_invalidate would be defined differently.

2. Define a structure in include/uapi/linux/iommu.h(newly added header file)

struct iommu_tlb_invalidate {
__u32   scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID  (1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR  (1 << 1)
__u32   flags;
/*  targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID   (1 << 0)
/* indicating that the pIOMMU doesn't need to invalidate
   all intermediate tables cached as part of the PTE for
   vaddr, only the last-level entry (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1)
__u32   pasid;
__u64   vaddr;
__u64   size;
__u8data[];
};

For this part, the scope and flags are basically aligned with your previous
email. I renamed the prefix to "IOMMU_". In my opinion, the scope and flags
would be filled by the vIOMMU emulator and parsed by the underlying iommu
driver, so it is much more suitable to define them in a uapi header file.

Besides the reason above, I don't want VFIO to engage too much in the data
parsing. If we move the scope, flags, pasid, vaddr and size fields to
vfio_iommu_tlb_invalidate, then both kernel-space vfio and user-space vfio
need to do a lot of parsing. So I prefer the way above.

If you've got any other idea, pls feel free to post it. It's welcomed.

Thanks,
Yi L

> Hi Yi,
> 
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +   __u32   argsz;
> > +   __u32   length;
> > +   __u8data[];
> > +};
> 
> We initially discussed something a little more generic than this, with
> most info explicitly described and only pIOMMU-specific quirks and hints
> in an opaque structure. Out of curiosity, why the change? I'm not against
> a fully opaque structure, but there seem to be a large overlap between TLB
> invalidations across architectures.
> 
> 
> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> with the following.
> 
> (From the paravirtualized POV, the SMMU also has to swizzle endianess
> after unpacking an opaque structure, since userspace doesn't know what's
> in it and guest might use a different endianess. So we need to force all
> opaque data to be e.g. little-endian.)
> 
> struct vfio_iommu_tlb_invalidate {
>   __u32   argsz;
>   __u32   scope;
>   __u32   flags;
>   __u32   pasid;
>   __u64   vaddr;
>   __u64   size;
>   __u8data[];
> };
> 
> Scope is a bitfields restricting the invalidation scope. By default
> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> and @size are unused.
> 
> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>

Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-04 Thread Liu, Yi L
Hi Jean,

On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 02/07/17 11:06, Liu, Yi L wrote:
> > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> > 
> > Hi Jean,
> > 
> > As we've got a few discussions on it. I'd like to have a conclusion and
> > make it as a reference for future discussion.
> > 
> > Currently, we are inclined to have a hybrid format for the iommu tlb
> > invalidation from userspace(vIOMMU or userspace driver).
> > 
> > Based on the previous discussion, may the below work?
> > 
> > 1. Add a IOCTL for iommu tlb invalidation.
> > 
> > VFIO_IOMMU_TLB_INVALIDATE
> > 
> > struct vfio_iommu_tlb_invalidate {
> >__u32   argsz;
> >__u32   length;
> 
> Wouldn't argsz be exactly length + 8? Might be redundant in this case.

Yes, it is. We may not use it in a future version, but if we still use it,
I think we can make it simpler.
 
> >__u8data[];
> > };
> > 
> > comments from Alex William: is it more suitable to add a new flag bit on
> > vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> > structure is so similar.
> > 
> > Personally, I'm ok with it. Pls let me know your thoughts. However, the
> > precondition is we accept the whole definition in this email. If not, the
> > vfio_iommu_tlb_invalidate would be defined differently.
> 
> With this proposal sharing the structure makes sense. As I understand it
> we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
> flag bit would be redundant.

Yes, it seems strange to share the vfio_device_svm structure but use a
separate IOCTL cmd. Maybe it's more reasonable to share the IOCTL cmd and
just add a new flag, so that all the SVM-related operations share the IOCTL.
However, we need to check whether there would be any non-SVM-related iommu
tlb invalidation; if so, vfio_device_svm should be renamed to be non-SVM-specific.

> 
> > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> > 
> > struct iommu_tlb_invalidate {
> > __u32   scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID  (1 << 0)
> > /* address-selevtive invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR  (1 << 1)
> > __u32   flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID   (1 << 0)
> 
> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> using a special mode where PASID 0 is reserved and any traffic without
> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> to invalidate that special context explicitly. But this means that
> invalidation packet targeted at that context will have "scope = PASID" and
> "flags = NO_PASID", which is utterly confusing.
> 
> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> and just use PASID 0 to invalidate this context on ARM. I don't think
> other architectures would use the NO_PASID flag anyway, but might be mistaken.

I would suggest keeping it for now. On VT-d, we may pass some data in the
opaque part, so we can work without it. But if another vendor wants to issue
non-PASID-tagged cache invalidations, they may encounter a problem.

> > /* indicating that the pIOMMU doesn't need to invalidate
> >all intermediate tables cached as part of the PTE for
> >vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1)
> > __u32   pasid;
> > __u64   vaddr;
> > __u64   size;
> > __u8data[];
> > };
> > 
> > For this part, the scope and flags are basically aligned with your previous
> > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and 
> > flags
> > would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> > it is much more suitable to be defined in a uapi header file.
> 
> I tend to agree, defining a single structure in a new IOMMU UAPI file is
> better than having identical structures both in uapi/linux/vfio.h and
> linux/iommu.h. This way we avoid VFIO having to copy the same structure
> field by field. Arch-specific structures that go in
> iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

yes, it is.

> > Besides the reason above, I don't want VFIO engae too much on the data 
> > parsing.
> > If we move the scope,flags,pasid,vaddr,size fields to 
> > vfio_iommu_tlb_invalidate,

RE: Support SVM without PASID

2017-07-09 Thread Liu, Yi L
> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of valmiki
> Sent: Sunday, July 9, 2017 11:16 AM
> To: Alex Williamson 
> Cc: Lan, Tianyu ; Tian, Kevin ;
> k...@vger.kernel.org; linux-...@vger.kernel.org; 
> iommu@lists.linux-foundation.org;
> Pan, Jacob jun 
> Subject: Re: Support SVM without PASID
> 
> >> Hi,
> >>
> >> In SMMUv3 architecture document i see "PASIDs are optional,
> >> configurable, and of a size determined by the minimum of the
> >> endpoint".
> >>
> >> So if PASID's are optional and not supported by PCIe end point, how
> >> SVM can be achieved ?
> >
> > It cannot be inferred from that statement that PASID support is not
> > required for SVM.  AIUI, SVM is a software feature enabled by numerous
> > "optional" hardware features, including PASID.  Features that are
> > optional per the hardware specification may be required for specific
> > software features.  Thanks,
> >
> Thanks for the information Alex. Suppose if an End point doesn't support 
> PASID, is it
> still possible to achieve SVM ?
> Are there any such features in SMMUv3 with which we can achieve it ?

If the endpoint has no PASID support, I don't think it is SVM-capable. For
SMMU, maybe you can get more info from Jean.

Regards,
Yi L


RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-14 Thread Liu, Yi L
Hi Alex,

Regarding the opaque-data open question, I'd like to propose the following
definition based on the existing comments. Pls note that I've merged the pasid
table binding and iommu tlb invalidation into a single IOCTL and use
different flags to indicate the iommu operations. Per Kevin's comments,
there may be iommu invalidation for the guest IOVA tlb, so I renamed the
IOCTL and data structure to be non-SVM-specific. Pls kindly have a review,
so that we can close the opaque-data open question and move forward.
Comments and ideas are welcome, including on the scope and flags definition
in struct iommu_tlb_invalidate.

1. Add a VFIO IOCTL for iommu operations from user-space

#define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)

Corresponding data structure:
struct vfio_iommu_operation_info {
__u32   argsz;
#define VFIO_IOMMU_BIND_PASIDTBL(1 << 0) /* Bind PASID Table */
#define VFIO_IOMMU_BIND_PASID   (1 << 1) /* Bind PASID from userspace driver*/
#define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */
#define VFIO_IOMMU_INVAL_IOTLB  (1 << 3) /* Invalidate iommu tlb */
__u32   flag;
__u32   length; // length of the data[] part in bytes
__u8data[]; // stores the data for iommu op indicated by flag field
};

For iommu tlb invalidation from userspace, the "__u8 data[]" stores
data which would be parsed by the "struct iommu_tlb_invalidate" defined
below.

2. Definitions in include/uapi/linux/iommu.h(newly added header file)

/* IOMMU model definition for iommu operations from userspace */
enum iommu_model {
INTEL_IOMMU,
ARM_SMMU,
AMD_IOMMU,
SPAPR_IOMMU,
S390_IOMMU,
};

struct iommu_tlb_invalidate {
__u32   scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID  (1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR  (1 << 1)
__u32   flags;
/*  targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID   (1 << 0)
/* indicating that the pIOMMU doesn't need to invalidate
all intermediate tables cached as part of the PTE for
vaddr, only the last-level entry (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1)
__u32   pasid;
__u64   vaddr;
__u64   size;
enum iommu_model model;
/*
 Vendors may have different HW versions and thus the
 data part of this structure differs; use sub_version
 to indicate such differences.
 */
__u32 sub_version;
__u64 length; // length of the data[] part in bytes
__u8data[];
};

For Intel, the data structure is:
struct intel_iommu_invalidate_data {
__u64 low;
__u64 high;
}
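
To illustrate how userspace (e.g. the vIOMMU emulator in QEMU) might use the
proposal above, a rough sketch is below. It is illustrative only, with no
error handling; guest_pasid, gva, len, desc_low/high and container_fd are
assumed to be provided by the vIOMMU emulation, and the descriptor is assumed
to already have the guest SID replaced:

struct intel_iommu_invalidate_data idata = {
        .low  = desc_low,
        .high = desc_high,
};
size_t ilen = sizeof(struct iommu_tlb_invalidate) + sizeof(idata);
struct vfio_iommu_operation_info *op = calloc(1, sizeof(*op) + ilen);
struct iommu_tlb_invalidate *inv = (struct iommu_tlb_invalidate *)op->data;

op->argsz  = sizeof(*op) + ilen;
op->flag   = VFIO_IOMMU_INVAL_IOTLB;
op->length = ilen;

inv->scope  = IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR;
inv->pasid  = guest_pasid;
inv->vaddr  = gva;
inv->size   = len;
inv->model  = INTEL_IOMMU;
inv->length = sizeof(idata);
memcpy(inv->data, &idata, sizeof(idata));

ioctl(container_fd, VFIO_IOMMU_OP_IOCTL, op);
free(op);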

Thanks,
Yi L

> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, July 6, 2017 1:28 AM
> To: Jean-Philippe Brucker 
> Cc: Tian, Kevin ; Liu, Yi L ; 
> Lan,
> Tianyu ; Liu, Yi L ; Raj, Ashok
> ; k...@vger.kernel.org; jasow...@redhat.com; Will Deacon
> ; pet...@redhat.com; qemu-de...@nongnu.org;
> iommu@lists.linux-foundation.org; Pan, Jacob jun 
> Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> invalidate propagation
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker  wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added
> > >>>> header
> > >> file)
> > >>>>
> > >>>> struct iommu_tlb_invalidate {
> > >>>>__u32   scope;
> > >>>> /* pasid-selective invalidation described by @pasid */
> > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0)
> > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */
> > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not
> > > appropriate to put them in same level of scope definition at least for 
> > > VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably
> &

RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-17 Thread Liu, Yi L
Hi Alex,

Pls refer to the response inline.

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf
> Of Alex Williamson
> Sent: Saturday, July 15, 2017 2:16 AM
> To: Liu, Yi L 
> Cc: Jean-Philippe Brucker ; Tian, Kevin
> ; Liu, Yi L ; Lan, Tianyu
> ; Raj, Ashok ; 
> k...@vger.kernel.org;
> jasow...@redhat.com; Will Deacon ; pet...@redhat.com;
> qemu-de...@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun
> ; Joerg Roedel 
> Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> invalidate propagation
> 
> On Fri, 14 Jul 2017 08:58:02 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > Against to the opaque open, I'd like to propose the following
> > definition based on the existing comments. Pls note that I've merged
> > the pasid table binding and iommu tlb invalidation into a single IOCTL
> > and make different flags to indicate the iommu operations. Per Kevin's
> > comments, there may be iommu invalidation for guest IOVA tlb, so I
> > renamed the IOCTL and data structure to be non-svm specific. Pls
> > kindly have a review, so that we can make the opaque open closed and
> > move forward. Surely, comments and ideas are welcomed. And for the
> > scope and flags definition in struct iommu_tlb_invalidate, it's also 
> > welcomed to
> give your ideas on it.
> >
> > 1. Add a VFIO IOCTL for iommu operations from user-space
> >
> > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> >
> > Corresponding data structure:
> > struct vfio_iommu_operation_info {
> > __u32   argsz;
> > #define VFIO_IOMMU_BIND_PASIDTBL(1 << 0) /* Bind PASID Table */
> > #define VFIO_IOMMU_BIND_PASID   (1 << 1) /* Bind PASID from userspace
> driver*/
> > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */
> > #define VFIO_IOMMU_INVAL_IOTLB  (1 << 3) /* Invalidate iommu tlb */
> > __u32   flag;
> > __u32   length; // length of the data[] part in byte
> > __u8data[]; // stores the data for iommu op indicated by flag field
> > };
> 
> If we're doing a generic "Ops" ioctl, then we should have an "op" field
> which is defined by an enum.  It doesn't make sense to use flags for this,
> for example can we set multiple flag bits?  If not then it's not a good use
> for a bit field.  I'm also not sure I understand the value of the "length"
> field, can't it always be calculated from argsz?

Agreed, an enum would be better. The "length" field could be calculated from
argsz; I used it just to avoid offset calculations. I may remove it.
 
> > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > data which would be parsed by the "struct iommu_tlb_invalidate"
> > defined below.
> >
> > 2. Definitions in include/uapi/linux/iommu.h(newly added header file)
> >
> > /* IOMMU model definition for iommu operations from userspace */ enum
> > iommu_model {
> > INTLE_IOMMU,
> > ARM_SMMU,
> > AMD_IOMMU,
> > SPAPR_IOMMU,
> > S390_IOMMU,
> > };
> >
> > struct iommu_tlb_invalidate {
> > __u32   scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID  (1 << 0)
> > /* address-selevtive invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR  (1 << 1)
> 
> Again, is a bit field appropriate here, can a user set both bits?

Yes, a user may set both bits. That would invalidate an address range
tagged with a PASID value.

> 
> > __u32   flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID   (1 << 0)
> > /* indicating that the pIOMMU doesn't need to invalidate
> > all intermediate tables cached as part of the PTE for
> > vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1)
> 
> Are we venturing into vendor specific attributes here?

These two attributes are still under discussion. Jean and I have synced over
several rounds, but there is a lack of comments from other vendors.

Personally, I think both should be generic.
IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used
for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates that
only leaf mappings are invalidated.
I would see whether other vendors object to them. If so, I'm fine with moving
them to the vendor-specific part.
 
> 
> > __u32   pasid;
>

Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-19 Thread Liu, Yi L
On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> On Mon, 17 Jul 2017 10:58:41 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> > 
> > Pls refer to the response inline.
> > 
> > > -Original Message-
> > > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On 
> > > Behalf
> > > Of Alex Williamson
> > > Sent: Saturday, July 15, 2017 2:16 AM
> > > To: Liu, Yi L 
> > > Cc: Jean-Philippe Brucker ; Tian, Kevin
> > > ; Liu, Yi L ; Lan, Tianyu
> > > ; Raj, Ashok ; 
> > > k...@vger.kernel.org;
> > > jasow...@redhat.com; Will Deacon ; pet...@redhat.com;
> > > qemu-de...@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun
> > > ; Joerg Roedel 
> > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU 
> > > TLB
> > > invalidate propagation
> > > 
> > > On Fri, 14 Jul 2017 08:58:02 +
> > > "Liu, Yi L"  wrote:
> > >   
> > > > Hi Alex,
> > > >
> > > > Against to the opaque open, I'd like to propose the following
> > > > definition based on the existing comments. Pls note that I've merged
> > > > the pasid table binding and iommu tlb invalidation into a single IOCTL
> > > > and make different flags to indicate the iommu operations. Per Kevin's
> > > > comments, there may be iommu invalidation for guest IOVA tlb, so I
> > > > renamed the IOCTL and data structure to be non-svm specific. Pls
> > > > kindly have a review, so that we can make the opaque open closed and
> > > > move forward. Surely, comments and ideas are welcomed. And for the
> > > > scope and flags definition in struct iommu_tlb_invalidate, it's also 
> > > > welcomed to  
> > > give your ideas on it.  
> > > >
> > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > >
> > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > >
> > > > Corresponding data structure:
> > > > struct vfio_iommu_operation_info {
> > > > __u32   argsz;
> > > > #define VFIO_IOMMU_BIND_PASIDTBL(1 << 0) /* Bind PASID Table */
> > > > #define VFIO_IOMMU_BIND_PASID   (1 << 1) /* Bind PASID from userspace  
> > > driver*/  
> > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */
> > > > #define VFIO_IOMMU_INVAL_IOTLB  (1 << 3) /* Invalidate iommu tlb */
> > > > __u32   flag;
> > > > __u32   length; // length of the data[] part in byte
> > > > __u8data[]; // stores the data for iommu op indicated by 
> > > > flag field
> > > > };  
> > > 
> > > If we're doing a generic "Ops" ioctl, then we should have an "op" field 
> > > which is
> > > defined by an enum.  It doesn't make sense to use flags for this, for 
> > > example can we
> > > set multiple flag bits?  If not then it's not a good use for a bit field. 
> > >  I'm also not sure I
> > > understand the value of the "length" field, can't it always be calculated 
> > > from argsz?  
> > 
> > Agreed, enum would be better. "length" field could be calculated from 
> > argsz. I used
> > it just to avoid offset calculations. May remove it.
> >  
> > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > > > data which would be parsed by the "struct iommu_tlb_invalidate"
> > > > defined below.
> > > >
> > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file)
> > > >
> > > > /* IOMMU model definition for iommu operations from userspace */ enum
> > > > iommu_model {
> > > > INTLE_IOMMU,
> > > > ARM_SMMU,
> > > > AMD_IOMMU,
> > > > SPAPR_IOMMU,
> > > > S390_IOMMU,
> > > > };
> > > >
> > > > struct iommu_tlb_invalidate {
> > > > __u32   scope;
> > > > /* pasid-selective invalidation described by @pasid */
> > > > #define IOMMU_INVALIDATE_PASID  (1 << 0)
> > > > /* address-selevtive invalidation described by (@vaddr, @size) */
> > > > #define IOMMU_INVALIDATE_VADDR  (1 << 1)  
> > > 
> > > Again, is a bit field appropriate here, can a user set both bits?

RE: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-12-03 Thread Liu, Yi L
Hi Joerg,

> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Monday, December 3, 2018 5:49 AM
> To: Lu Baolu 
> Subject: Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor
> support
> 
> On Wed, Nov 28, 2018 at 11:54:41AM +0800, Lu Baolu wrote:
> > -
> > -   desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
> 0);
> > +   /*
> > +* Need two pages to accommodate 256 descriptors of 256 bits each
> > +* if the remapping hardware supports scalable mode translation.
> > +*/
> > +   desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
> > +!!ecap_smts(iommu->ecap));
> 
> 
> Same here, does the allocation really need GFP_ATOMIC?

still leave to Baolu.

> 
> >  struct q_inval {
> > raw_spinlock_t  q_lock;
> > -   struct qi_desc  *desc;  /* invalidation queue */
> > +   void*desc;  /* invalidation queue */
> > int *desc_status;   /* desc status */
> > int free_head;  /* first free entry */
> > int free_tail;  /* last free entry */
> 
> Why do you switch the pointer to void* ?

In this patch, there is code like the snippet below. It calculates the
destination address of a memcpy from qi->desc. If it were still a struct
qi_desc pointer, the calculation result would be wrong.

+   memcpy(desc, qi->desc + (wait_index << shift),
+  1 << shift);

The change in the calculation method is to support both 128-bit and 256-bit
invalidation descriptors in this unified code logic.
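
A rough sketch of the resulting address calculation (names here are
illustrative, not necessarily the actual patch code):

static inline int qi_desc_shift(struct intel_iommu *iommu)
{
        /* 32-byte (256-bit) descriptors in scalable mode,
         * 16-byte (128-bit) descriptors otherwise */
        return ecap_smts(iommu->ecap) ? 5 : 4;
}

static inline void *qi_desc_addr(struct q_inval *qi,
                                 struct intel_iommu *iommu, int index)
{
        /* works for both widths because qi->desc is a void * */
        return qi->desc + (index << qi_desc_shift(iommu));
}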

Also, the conversation between Baolu and me may help.

https://lore.kernel.org/patchwork/patch/1006756/

> 
>   Joerg

Thanks,
Yi Liu


RE: [PATCH v5 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-12-03 Thread Liu, Yi L
Hi Joerg,

> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Monday, December 3, 2018 5:44 AM
> To: Lu Baolu 
> Subject: Re: [PATCH v5 02/12] iommu/vt-d: Manage scalalble mode PASID tables
> 
> Hi Baolu,
> 
> On Wed, Nov 28, 2018 at 11:54:39AM +0800, Lu Baolu wrote:
> > @@ -2482,12 +2482,13 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
> > if (dev)
> > dev->archdata.iommu = info;
> >
> > -   if (dev && dev_is_pci(dev) && info->pasid_supported) {
> > +   /* PASID table is mandatory for a PCI device in scalable mode. */
> > +   if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
> 
> This will also allocate a PASID table if the device does not support
> PASIDs, right? Will the table not be used in that case or will the
> device just use the fallback PASID? Isn't it better in that case to have
> no PASID table?

We need to allocate the PASID table in scalable mode; the reason is as below:
In VT-d scalable mode, all address translation is done at PASID granularity.
For requests-with-PASID, the address translation is subjected to the
PASID entry specified by the PASID value in the DMA request. However, for
requests-without-PASID, there is no PASID in the DMA request. To fulfil
the translation logic, we've introduced the RID2PASID field in the
scalable-mode context entry in the VT-d 3.0 spec, so that such DMA requests
are subjected to the PASID entry specified by the PASID value in the
RID2PASID field of the scalable-mode context entry.

So for a device without PASID support, we need to have at least a PASID
entry so that its DMA requests (without PASID) can be translated. Thus a PASID
table is needed for such devices.
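
A conceptual sketch of how the PASID entry is picked in scalable mode (this
is only a model for illustration, not driver code; the struct below is a
placeholder, not the real scalable-mode context entry layout):

struct sm_ce_model {
        unsigned int rid2pasid;   /* RID2PASID field */
        /* ... */
};

static unsigned int effective_pasid(const struct sm_ce_model *ce,
                                    int req_has_pasid, unsigned int req_pasid)
{
        /* request-with-PASID: use the PASID carried in the DMA request;
         * request-without-PASID: fall back to RID2PASID */
        return req_has_pasid ? req_pasid : ce->rid2pasid;
}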

> 
> > @@ -143,18 +143,20 @@ int intel_pasid_alloc_table(struct device *dev)
> > return -ENOMEM;
> > INIT_LIST_HEAD(&pasid_table->dev);
> >
> > -   size = sizeof(struct pasid_entry);
> > -   count = min_t(int, pci_max_pasids(to_pci_dev(dev)), intel_pasid_max_id);
> > -   order = get_order(size * count);
> > +   if (info->pasid_supported)
> > +   max_pasid = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> > + intel_pasid_max_id);
> > +
> > +   size = max_pasid >> (PASID_PDE_SHIFT - 3);
> > +   order = size ? get_order(size) : 0;
> > pages = alloc_pages_node(info->iommu->node,
> > -GFP_ATOMIC | __GFP_ZERO,
> > -order);
> > +GFP_ATOMIC | __GFP_ZERO, order);
> 
> This is a simple data structure allocation path, does it need
> GFP_ATOMIC?

will leave it to Baolu.

> 
>   Joerg

Thanks,
Yi Liu


RE: [RFC PATCH 1/5] iommu: Add APIs for IOMMU PASID management

2018-12-15 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Sunday, November 11, 2018 10:45 PM
> Subject: [RFC PATCH 1/5] iommu: Add APIs for IOMMU PASID management
> 
> This adds APIs for IOMMU drivers and device drivers to manage the PASIDs used 
> for
> DMA transfer and translation. It bases on I/O ASID allocator for PASID 
> namespace
> management and relies on vendor specific IOMMU drivers for paravirtual PASIDs.
> 
> Below APIs are added:
> 
> * iommu_pasid_init(pasid)
>   - Initialize a PASID consumer. The vendor specific IOMMU
> drivers are able to set the PASID range imposed by IOMMU
> hardware through a callback in iommu_ops.
> 
> * iommu_pasid_exit(pasid)
>   - The PASID consumer stops consuming any PASID.
> 
> * iommu_pasid_alloc(pasid, min, max, private, *ioasid)
>   - Allocate a PASID and associate a @private data with this
> PASID. The PASID value is stored in @ioaisd if returning
> success.
> 
> * iommu_pasid_free(pasid, ioasid)
>   - Free a PASID to the pool so that it could be consumed by
> others.
> 
> This also adds below helpers to lookup or iterate PASID items associated with 
> a
> consumer.
> 
> * iommu_pasid_for_each(pasid, func, data)
>   - Iterate PASID items of the consumer identified by @pasid,
> and call @func() against each item. An error returned from
> @func() will break the iteration.
> 
> * iommu_pasid_find(pasid, ioasid)
>   - Retrieve the private data associated with @ioasid.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/Kconfig |  1 +
>  drivers/iommu/iommu.c | 89 +++
>  include/linux/iommu.h | 73 +++
>  3 files changed, 163 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index
> d9a25715650e..39f2bb76c7b8 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -1,6 +1,7 @@
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>   bool
> + select IOASID
> 
>  menuconfig IOMMU_SUPPORT
>   bool "IOMMU Hardware Support"
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> 0b7c96d1425e..570b244897bb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2082,3 +2082,92 @@ void iommu_detach_device_aux(struct iommu_domain
> *domain, struct device *dev)
>   }
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_device_aux);
> +
> +/*
> + * APIs for PASID used by IOMMU and the device drivers which depend
> + * on IOMMU.
> + */
> +struct iommu_pasid *iommu_pasid_init(struct bus_type *bus) {

I'm wondering whether using struct iommu_domain here would be better
than struct bus_type. The major purpose is to pass iommu_ops
in it and route into the iommu sub-layer. iommu_domain may be
better since some modules like vfio_iommu_type1 work with an
iommu_domain more than with a bus type.
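
A sketch of the alternative being suggested (prototype only, purely
illustrative):

struct iommu_pasid *iommu_pasid_init(struct iommu_domain *domain);

A caller such as vfio_iommu_type1, which already holds an iommu_domain for
the group, could then pass that domain directly instead of a bus type.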

Thanks,
Yi Liu



RE: [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device

2019-02-19 Thread Liu, Yi L
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, February 15, 2019 4:15 AM
> To: Lu Baolu 
> Subject: Re: [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device
> 
> On Wed, 13 Feb 2019 12:02:52 +0800
> Lu Baolu  wrote:
> 
> > Hi,
> >
> > The Mediate Device is a framework for fine-grained physical device
> > sharing across the isolated domains. Currently the mdev framework is
> > designed to be independent of the platform IOMMU support. As the
> > result, the DMA isolation relies on the mdev parent device in a vendor
> > specific way.
> >
> > There are several cases where a mediated device could be protected and
> > isolated by the platform IOMMU. For example, Intel vt-d rev3.0 [1]
> > introduces a new translation mode called 'scalable mode', which
> > enables PASID-granular translations. The vt-d scalable mode is the key
> > ingredient for Scalable I/O Virtualization [2] [3] which allows
> > sharing a device in minimal possible granularity (ADI - Assignable
> > Device Interface).
> >
> > A mediated device backed by an ADI could be protected and isolated by
> > the IOMMU since 1) the parent device supports tagging an unique PASID
> > to all DMA traffic out of the mediated device; and 2) the DMA
> > translation unit (IOMMU) supports the PASID granular translation.
> > We can apply IOMMU protection and isolation to this kind of devices
> > just as what we are doing with an assignable PCI device.
> >
> > In order to distinguish the IOMMU-capable mediated devices from those
> > which still need to rely on parent devices, this patch set adds one
> > new member in struct mdev_device.
> >
> > * iommu_device
> >   - This, if set, indicates that the mediated device could
> > be fully isolated and protected by IOMMU via attaching
> > an iommu domain to this device. If empty, it indicates
> > using vendor defined isolation.
> >
> > Below helpers are added to set and get above iommu device in mdev core
> > implementation.
> >
> > * mdev_set/get_iommu_device(dev, iommu_device)
> >   - Set or get the iommu device which represents this mdev
> > in IOMMU's device scope. Drivers don't need to set the
> > iommu device if it uses vendor defined isolation.
> >
> > The mdev parent device driver could opt-in that the mdev could be
> > fully isolated and protected by the IOMMU when the mdev is being
> > created by invoking mdev_set_iommu_device() in its @create().
> >
> > In the vfio_iommu_type1_attach_group(), a domain allocated through
> > iommu_domain_alloc() will be attached to the mdev iommu device if an
> > iommu device has been set. Otherwise, the dummy external domain will
> > be used and all the DMA isolation and protection are routed to parent
> > driver as the result.
> >
> > On IOMMU side, a basic requirement is allowing to attach multiple
> > domains to a PCI device if the device advertises the capability and
> > the IOMMU hardware supports finer granularity translations than the
> > normal PCI Source ID based translation.
> >
> > As the result, a PCI device could work in two modes: normal mode and
> > auxiliary mode. In the normal mode, a pci device could be isolated in
> > the Source ID granularity; the pci device itself could be assigned to
> > a user application by attaching a single domain to it. In the
> > auxiliary mode, a pci device could be isolated in finer granularity,
> > hence subsets of the device could be assigned to different user level
> > application by attaching a different domain to each subset.
> >
> > Below APIs are introduced in iommu generic layer for aux-domain
> > purpose:
> >
> > * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
> >   - Check whether both IOMMU and device support IOMMU aux
> > domain feature. Below aux-domain specific interfaces
> > are available only after this returns true.
> >
> > * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
> >   - Enable/disable device specific aux-domain feature.
> >
> > * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
> >   - Check whether the aux domain specific feature enabled or
> > not.
> >
> > * iommu_aux_attach_device(domain, dev)
> >   - Attaches @domain to @dev in the auxiliary mode. Multiple
> > domains could be attached to a single device in the
> > auxiliary mode with each domain representing an isolated
> > address space for an assignable subset of the device.
> >
> > * iommu_aux_detach_device(domain, dev)
> >   - Detach @domain which has been attached to @dev in the
> > auxiliary mode.
> >
> > * iommu_aux_get_pasid(domain, dev)
> >   - Return ID used for finer-granularity DMA translation.
> > For the Intel Scalable IOV usage model, this will be
> > a PASID. The device which supports Scalable IOV needs
> > to write this ID to the device register so that DMA
> > requests could be tagged with a right PASID prefix.
> >
> > For ease of discussion, we sometimes call this 'a domain in
> > auxiliary mode', or simply 'an auxiliary domain'.
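
For illustration, the intended flow with these interfaces might look roughly
like the sketch below. Only the iommu_* and mdev_* calls come from the
patchset; the my_* helpers, and how the PASID finally gets programmed into
the ADI, are hypothetical placeholders.

#include <linux/iommu.h>
#include <linux/mdev.h>

/* In the parent driver's @create() callback: opt in to IOMMU-backed
 * isolation by pointing the mdev at its parent (iommu-capable) device. */
static int my_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
        return mdev_set_iommu_device(mdev_dev(mdev), mdev_parent_dev(mdev));
}

/* Roughly what vfio_iommu_type1 would do when the mdev has an iommu_device:
 * attach an aux domain to the parent and retrieve the PASID that tags this
 * subset's DMA. @dev is the parent device returned by mdev_get_iommu_device(). */
static int my_attach_aux_domain(struct iommu_domain *domain, struct device *dev)
{
        int pasid, ret;

        if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
                return -ENODEV;

        if (!iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) {
                ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
                if (ret)
                        return ret;
        }

        ret = iommu_aux_attach_device(domain, dev);
        if (ret)
                return ret;

        pasid = iommu_aux_get_pasid(domain, dev);
        if (pasid < 0)
                return pasid;

        /* hypothetical: tell the ADI which PASID to prefix its DMA with */
        return my_adi_set_pasid(dev, pasid);
}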

[RFC v3 0/3] vfio_pci: wrap pci device as a mediated device

2019-04-24 Thread Liu, Yi L
This patchset aims to add a vfio-pci-like meta driver as a demo
user of the vfio changes introduced in "vfio/mdev: IOMMU aware
mediated device" patchset from Baolu Lu.

Previous RFC v1 gave two proposals; the discussion can be found at the
following link. Per the comments, this patchset adds a separate driver
named vfio-mdev-pci. It is a sample driver, but it is located under
drivers/vfio/pci for code-sharing reasons.
The corresponding Kconfig definition is in samples/Kconfig.

https://lkml.org/lkml/2019/3/4/529

Besides the test purpose, per Alex's comments, it could also be a
good base driver for experimenting with device specific mdev migration.

Specific interface tested in this proposal:

*) int mdev_set_iommu_device(struct device *dev,
struct device *iommu_device)
   introduced in the patch as below:
   "[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device"


Links:
*) Link of "vfio/mdev: IOMMU aware mediated device"
https://lwn.net/Articles/780522/

Please feel free to give your comments.

Thanks,
Yi Liu

Change log:
  v2->v3:
  - use vfio-mdev-pci instead of vfio-pci-mdev
  - place the new driver under drivers/vfio/pci while define
Kconfig in samples/Kconfig to clarify it is a sample driver

  v1->v2:
  - instead of adding kernel option to existing vfio-pci
module in v1, v2 follows Alex's suggestion to add a
separate vfio-pci-mdev module.
  - new patchset subject: "vfio/pci: wrap pci device as a mediated device"

Liu, Yi L (3):
  vfio_pci: split vfio_pci.c into two source files
  vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op
  samples: add vfio-mdev-pci driver

 drivers/vfio/pci/Makefile   |7 +-
 drivers/vfio/pci/common.c   | 1511 +++
 drivers/vfio/pci/vfio_mdev_pci.c|  386 +
 drivers/vfio/pci/vfio_pci.c | 1476 +-
 drivers/vfio/pci/vfio_pci_config.c  |9 +
 drivers/vfio/pci/vfio_pci_private.h |   27 +
 samples/Kconfig |   11 +
 7 files changed, 1962 insertions(+), 1465 deletions(-)
 create mode 100644 drivers/vfio/pci/common.c
 create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c

-- 
2.7.4



[RFC v3 2/3] vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op

2019-04-24 Thread Liu, Yi L
There is a case in which cap_perms and ecap_perms can be reallocated
by different modules, e.g. the vfio-mdev-pci sample driver. To protect
the initialization of cap_perms and ecap_perms, this patch adds an
atomic variable to track the users of the cap/ecap_perms bits. The first
caller of vfio_pci_init_perm_bits() initializes the bits, while the last
caller of vfio_pci_uninit_perm_bits() frees them.

Cc: Kevin Tian 
Cc: Lu Baolu 
Suggested-by: Alex Williamson 
Signed-off-by: Liu, Yi L 
---
 drivers/vfio/pci/vfio_pci_config.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index e82b511..913fca6 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -996,11 +996,17 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
return 0;
 }
 
+/* Track the user number of the cap/ecap perm_bits */
+atomic_t vfio_pci_perm_bits_users = ATOMIC_INIT(0);
+
 /*
  * Initialize the shared permission tables
  */
 void vfio_pci_uninit_perm_bits(void)
 {
+   if (atomic_dec_return(&vfio_pci_perm_bits_users))
+   return;
+
free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 
free_perm_bits(&cap_perms[PCI_CAP_ID_PM]);
@@ -1017,6 +1023,9 @@ int __init vfio_pci_init_perm_bits(void)
 {
int ret;
 
+   if (atomic_inc_return(&vfio_pci_perm_bits_users) != 1)
+   return 0;
+
/* Basic config space */
ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);
 
-- 
2.7.4



[RFC v3 1/3] vfio_pci: split vfio_pci.c into two source files

2019-04-24 Thread Liu, Yi L
This patch splits the non-module-specific code from the original
drivers/vfio/pci/vfio_pci.c into a common.c under drivers/vfio/pci.
This is for potential code sharing, e.g. with the vfio-mdev-pci driver.

Cc: Kevin Tian 
Cc: Lu Baolu 
Signed-off-by: Liu, Yi L 
---
 drivers/vfio/pci/Makefile   |2 +-
 drivers/vfio/pci/common.c   | 1511 +++
 drivers/vfio/pci/vfio_pci.c | 1476 +-
 drivers/vfio/pci/vfio_pci_private.h |   27 +
 4 files changed, 1551 insertions(+), 1465 deletions(-)
 create mode 100644 drivers/vfio/pci/common.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 9662c06..813f6b3 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,5 @@
 
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
diff --git a/drivers/vfio/pci/common.c b/drivers/vfio/pci/common.c
new file mode 100644
index 000..847e2e4
--- /dev/null
+++ b/drivers/vfio/pci/common.c
@@ -0,0 +1,1511 @@
+/*
+ * Copyright © 2019 Intel Corporation.
+ * Author: Liu, Yi L 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_pci.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_pci_private.h"
+
+inline bool vfio_vga_disabled(struct vfio_pci_device *vdev)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+   return vdev->disable_vga;
+#else
+   return true;
+#endif
+}
+
+/*
+ * Our VGA arbiter participation is limited since we don't know anything
+ * about the device itself.  However, if the device is the only VGA device
+ * downstream of a bridge and VFIO VGA support is disabled, then we can
+ * safely return legacy VGA IO and memory as not decoded since the user
+ * has no way to get to it and routing can be disabled externally at the
+ * bridge.
+ */
+static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+{
+   struct vfio_pci_device *vdev = opaque;
+   struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
+   unsigned char max_busnr;
+   unsigned int decodes;
+
+   if (single_vga || !vfio_vga_disabled(vdev) ||
+   pci_is_root_bus(pdev->bus))
+   return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+  VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+
+   max_busnr = pci_bus_max_busnr(pdev->bus);
+   decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
+
+   while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) {
+   if (tmp == pdev ||
+   pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) ||
+   pci_is_root_bus(tmp->bus))
+   continue;
+
+   if (tmp->bus->number >= pdev->bus->number &&
+   tmp->bus->number <= max_busnr) {
+   pci_dev_put(tmp);
+   decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+   break;
+   }
+   }
+
+   return decodes;
+}
+
+inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+   return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
+void vfio_pci_vga_probe(struct vfio_pci_device *vdev)
+{
+   vga_client_register(vdev->pdev, vdev, NULL, vfio_pci_set_vga_decode);
+   vga_set_legacy_decoding(vdev->pdev,
+   vfio_pci_set_vga_decode(vdev, false));
+}
+
+void vfio_pci_vga_remove(struct vfio_pci_device *vdev)
+{
+   vga_client_register(vdev->pdev, NULL, NULL, NULL);
+   vga_set_legacy_decoding(vdev->pdev,
+   VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+   VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
+}
+
+static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
+{
+   struct resource *res;
+   int bar;
+   struct vfio_pci_dummy_resource *dummy_res;
+
+   INIT_LIST_HEAD(&vdev->dummy_resources_list);
+
+   for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
+   res = vdev->pdev->resource + bar;
+
+   if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
+  

[RFC v3 3/3] samples: add vfio-mdev-pci driver

2019-04-24 Thread Liu, Yi L
This patch adds a sample driver named vfio-mdev-pci. It wraps a PCI
device as a mediated device. Once a PCI device is bound to the
vfio-mdev-pci driver, user-space access to it goes through the vfio
mdev framework. Usage of the device follows the mdev management
method, e.g. the user should create an mdev before exposing the
device to user space.

The benefit of this new driver is to act as a sample driver for the
recent changes from the "vfio/mdev: IOMMU aware mediated device"
patchset. It could also be a good experimental driver for future
device-specific mdev migration support.

To use this driver:
a) build and load vfio-mdev-pci.ko module
   execute "make menuconfig" and config CONFIG_SAMPLE_VFIO_MDEV_PCI
   then load it with following command
   > sudo modprobe vfio
   > sudo modprobe vfio-pci
   > sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko

b) unbind original device driver
   e.g. use following command to unbind its original driver
   > echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind

c) bind vfio-mdev-pci driver to the physical device
   > echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id

d) check the supported mdev instances
   > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/
 vfio-mdev-pci-type1
   > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
 vfio-mdev-pci-type1/
 available_instances  create  device_api  devices  name

e)  create mdev on this physical device (only 1 instance)
   > echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \
 /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
 vfio-mdev-pci-type1/create

f) passthru the mdev to guest
   add the following line in Qemu boot command
   -device vfio-pci,\
sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003

g) destroy mdev
   > echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\
 remove

Cc: Kevin Tian 
Cc: Lu Baolu 
Cc: Masahiro Yamada 
Suggested-by: Alex Williamson 
Signed-off-by: Liu, Yi L 
---
 drivers/vfio/pci/Makefile|   5 +
 drivers/vfio/pci/vfio_mdev_pci.c | 386 +++
 samples/Kconfig  |  11 ++
 3 files changed, 402 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 813f6b3..6a05393 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -3,4 +3,9 @@ vfio-pci-y := vfio_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_conf
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
+vfio-mdev-pci-y := vfio_mdev_pci.o common.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
new file mode 100644
index 000..aec7a5b
--- /dev/null
+++ b/drivers/vfio/pci/vfio_mdev_pci.c
@@ -0,0 +1,386 @@
+/*
+ * Copyright © 2019 Intel Corporation.
+ * Author: Liu, Yi L 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_pci.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_pci_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu, Yi L "
+#define DRIVER_DESC "VFIO Mdev PCI - Sample driver for PCI device as a mdev"
+
+#define VFIO_MDEV_PCI_NAME  "vfio-mdev-pci"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask\" and multiple comma separated entries can be specified");
+
+static bool nointxmask;
+module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nointxmask,
+ "Disable support for PCI 2.3 style INTx masking.  If this 
resolves problems for specific devices, report lspci -vvvxxx to 
linux-...@vger.kernel.org so the device can be fixed automatically via the 
broken_intx_masking flag.");
+
+#ifdef CONFIG_VFIO_

RE: [PATCH v2 09/19] iommu/vt-d: Enlightened PASID allocation

2019-04-25 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Thursday, April 25, 2019 1:28 AM
> To: Jacob Pan ; 
> iommu@lists.linux-foundation.org;
> Subject: Re: [PATCH v2 09/19] iommu/vt-d: Enlightened PASID allocation
> 
> Hi Jacob,
> 
> On 4/24/19 1:31 AM, Jacob Pan wrote:
> > From: Lu Baolu 
> >
> > If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the IOMMU
> > driver should rely on the emulation software to allocate and free
> > PASID IDs.
> Do we make the decision depending on the CM or depending on the VCCAP_REG?
> 
> VCCAP_REG description says:
> 
> If Set, software must use Virtual Command Register interface to allocate and 
> free
> PASIDs.

The answer is that it depends on ECAP.VCS and then on the PASID allocation bit in
VCCAP_REG. The VCS bit implies that the IOMMU is a software implementation
(vIOMMU) of the VT-d architecture. Please refer to the description of "Virtual
Command Support" in the VT-d 3.0 spec.

"Hardware implementations of this architecture report a value of 0
in this field. Software implementations (emulation) of this
architecture may report VCS=1."
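
In code terms, the decision would be roughly the check below. This is only a
sketch; the ecap_vcs()/vccap_pasid() style helpers and the vccap field follow
the patchset under discussion and may differ in the final version.

/* Use the Virtual Command interface for PASID allocation only when the
 * (software-implemented) IOMMU reports VCS in ECAP and the PASID
 * allocation capability in VCCAP_REG. */
static bool vcmd_pasid_alloc_required(struct intel_iommu *iommu)
{
        return ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap);
}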

Thanks,
Yi Liu



RE: bind pasid table API

2017-09-20 Thread Liu, Yi L
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, September 20, 2017 8:10 PM
> To: Pan, Jacob jun ; iommu@lists.linux-
> foundation.org
> Cc: Liu, Yi L ; Raj, Ashok ; David
> Woodhouse ; Joerg Roedel ; Tian,
> Kevin ; Auger Eric 
> Subject: Re: bind pasid table API
> 
> Hi Jacob,
> 
> [Adding Eric as he might need pasid_table_info for vSVM at some point]
> 
> On 19/09/17 04:45, Jacob Pan wrote:
> > Hi Jean and All,
> >
> > This is a follow-up on the LPC discussion we had last week.
> > (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
> >
> > My understanding is that the data structure below can satisfy the
> > needs from Intel (pointer + size) and AMD (pointer only). But ARM
> > pvIOMMU would need additional info to indicate the page table format.
> > Could you share your idea of the right addition for ARM such that we
> > can have a unified API?
> >
> > /**
> >  * PASID table data used to bind guest PASID table to the host IOMMU.
> > This will
> >  * enable guest managed first level page tables.
> >  * @ptr:PASID table pointer
> >  * @size_order: number of bits supported in the guest PASID table, must
> be less
> >  *  or equal than the host table size.
> >  */
> > struct pasid_table_info {
> > __u64   ptr;
> > __u64   size_order;
> > };
> 
> For the PASID table, Arm SMMUv3 would need two additional fields:
> * 'format' telling whether the table has 1 or 2 levels and their
>   dimensions,
> * 'default_substream' telling if PASID0 is reserved for non-pasid traffic.
> 
> I think that's it for the moment, but it does require to leave space for a 
> vendor-
> specific structure at the end. It is one reason why I'd prefer having a 
> 'model' field
> in the pasid_table_info structure telling what fields the whole structure 
> actually
> contains.
> 
> Another reason is if some IOMMU is able to support multiple PASID table
> formats, it could advertise them all in sysfs and Qemu could tell which one it
> chose in 'model'. I'm not sure we'll ever see that in practice.

Regarding your idea, I think the whole flow may be:
* Qemu queries the underlying IOMMU for its capabilities (through sysfs)
* Combined with the requirement from the user (the one who starts the VM), Qemu
   chooses a suitable model, or exits if the HW capability is incompatible with
   the user's requirement
* In the subsequent bind_pasid_table() call, Qemu passes the chosen model info
   to the host
* The host checks the "model" and uses the corresponding model-specific
   structure to parse the model-specific data, e.g. something like
   intel_pasid_table_info, amd_pasid_table_info or arm_pasid_table_info_v#

Does this flow show what you want? This would be a "model" + "model specific
data" proposal. My concern is that the model-specific field may end up
looking opaque.

Besides the comments above, is there also a possibility of putting all the
possible info in a super-set, just as we plan to do for the tlb_invalidate() API?
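
To make the "model" + "model specific data" idea concrete, a possible layout
is sketched below. This is purely illustrative (field and constant names are
made up, not from a posted patch):

struct pasid_table_info {
        __u64   ptr;            /* PASID table pointer (GPA) */
        __u64   size_order;     /* bits supported in the guest PASID table */
        __u32   model;          /* e.g. PASID_TABLE_FMT_VTD / _SMMUV3 / _AMD */
        __u32   length;         /* size of model_data[] in bytes */
        __u8    model_data[];   /* parsed according to @model */
};

/* What Arm SMMUv3 might carry in model_data[], per Jean's two extra fields */
struct pasid_table_info_smmuv3 {
        __u8    format;             /* 1-level or 2-level table, dimensions */
        __u8    default_substream;  /* PASID 0 reserved for non-PASID traffic */
};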

> 
> For binding page tables instead of PASID tables (e.g. virtio-iommu), the 
> generic
> data would be:
> 
> struct pgtable_info {
>   __u32   pasid;
>   __u64   ptr;
>   __u32   model;
>   __u8model_data[];
> };

Besides the bind_pasid_table API, would you also want to propose an extra API,
likely named bind_pgtable(), for this page table binding?

What would the "model" field indicate? "vendor" or "vendor+version"? You may
also want a length field to indicate the size of the "model_data" field.

And, same as with the bind_pasid_table API, would model_data look opaque?

> Followed by a few arch-specific configuration values. For Arm we can summarize
> this to three registers, defined in the Armv8 Architecture Reference Manual:
> 
> struct arm_lpae_pgtable_info {
>   __u64   tcr;/* Translation Control Register */
>   __u64   mair;   /* Memory Attributes Indirection Register */
>   __u64   asid;   /* Address Space ID */
> };

Hmmm, just curious: what is the difference between "pasid" and "asid"?

Thanks,
Yi L

> Some data packed in the TCR might be common to most architectures, like page
> granularity and max VA size. Most fields of the TCR won't be used but it 
> provides
> a nice architected way to communicate Arm page table configuration.
> 
> Note that there might be an additional page directory in the arch-specific 
> info, as
> we can split the address space in two. I'm not sure whether we should allow it
> yet.
> 
> Thanks,
> Jean



RE: [PATCH v2 03/16] iommu: introduce iommu invalidate API function

2017-10-11 Thread Liu, Yi L

> On Tue, 10 Oct 2017 15:35:42 +0200
> Joerg Roedel  wrote:
> 
> > On Thu, Oct 05, 2017 at 04:03:31PM -0700, Jacob Pan wrote:
> > > +int iommu_invalidate(struct iommu_domain *domain,
> > > + struct device *dev, struct tlb_invalidate_info
> > > *inv_info)
> >
> > This name is way too generic, it should at least be called
> > iommu_svm_invalidate() or something like that. With the name above it
> > is easily confused with the other TLB invalidation functions of the
> > IOMMU-API.
> >
> Good point. I was calling it iommu_passdown_invalidate() originally.
> The invalidation request comes from guest or user space instead of in-kernel 
> unmap
> kind of calls.

[Liu, Yi L] I agree that iommu_invalidate() is too generic. It would also be
better to avoid making it SVM-specific.

The reason we introduce this API in the vSVM case is that the guest owns the
first-level page table (VT-d). If we use a similar mechanism for vIOVA, then we
also need to pass down the guest's vIOVA TLB flushes.

Since this exposes an API for IOMMU TLB flush requests coming from user space
or a guest, i.e. from outside the IOMMU, how about naming it
iommu_tlb_external_invalidate()?

> > > +enum iommu_inv_granularity {
> > > + IOMMU_INV_GRANU_GLOBAL, /* all TLBs
> > > invalidated */
> >
> > Is that needed? We certainly don't want to give userspace/guests that
> > fine-grained control about IOMMU cache invalidations.
> >
> > In the end a guest issues flush-global command does not translate to a
> > flush-global on the host, but to separate flushes for the domains the
> > guest uses.
> >
> Right, guest should not go beyond its own domain.

[Liu, Yi L] So far, for virtualization, we would not allow any guest to flush
the whole physical IOMMU TLB. The hypervisor will limit the scope even if the
guest issues an invalidation with global granularity. Jacob just wants to list
all the possible granularities. Global granularity can be a big hammer for
clearing caches, and maybe there is some use for it, but for now I think we
can just remove it.
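
A rough sketch of that host-side policy is below (structure and helper names
are made up for illustration; only the granularity names come from the
proposed API, and the field names of tlb_invalidate_info are assumptions):

static void shadow_guest_iotlb_inval(struct intel_iommu *iommu,
                                     struct guest_ctx *guest,
                                     struct tlb_invalidate_info *inv_info)
{
        struct guest_domain *dom;

        switch (inv_info->granularity) {
        case IOMMU_INV_GRANU_GLOBAL:
                /* never forwarded as a global flush of the physical IOMMU:
                 * narrow it to the domains this guest actually owns */
                list_for_each_entry(dom, &guest->domains, link)
                        flush_iotlb_domain(iommu, dom->host_did);
                break;
        case IOMMU_INV_GRANU_DOMAIN:
                flush_iotlb_domain(iommu, to_host_did(guest, inv_info->did));
                break;
        default:
                /* finer granularities handled similarly, scoped per domain */
                break;
        }
}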

Thanks,
Yi L

> > > + IOMMU_INV_GRANU_DOMAIN, /* all TLBs
> > > associated with a domain */
> > > + IOMMU_INV_GRANU_DEVICE, /* caching
> > > structure associated with a
> > > +  * device ID
> >
> > What is the difference between a DOMAIN and a DEVICE flush?
> >
> Those are based on vt-d context cache flush granularity, domain selective 
> flushes all
> context caches associated with a domain ID.
> Device selective flush flushes context caches of a source ID.
> But like you pointed out below, since context cache flush will come in as 
> unbind call,
> there is no need to do passdown invalidate. I can remove that.
> 
> Here I am trying to use all generic definitions, which is a superset of all 
> vendor
> models. I am likely missing out some non-vt-d cases.
> 
> > > + IOMMU_INV_GRANU_DOMAN_PAGE, /* address range with a
> > > domain */
> > > + IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given
> > > PASID */
> > > + IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate
> > > specified PASID */ +
> > > + IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within
> > > all PASIDs */
> > > + IOMMU_INV_GRANU_NG_PASID,   /* non-global within a
> > > PASIDs */
> > > + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective
> > > within a PASID */
> > > + IOMMU_INV_NR_GRANU,
> > > +};
> > > +
> > > +enum iommu_inv_type {
> > > + IOMMU_INV_TYPE_DTLB,/* device IOTLB */
> > > + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache
> > > */
> > > + IOMMU_INV_TYPE_PASID,   /* PASID cache */
> > > + IOMMU_INV_TYPE_CONTEXT, /* device context entry
> > > cache */
> >
> > Is that really needed? When the guest updates it context-entry
> > equivalent it translates to bind_pasid_table/unbind_pasid_table calls,
> > no?
> >
> Right no need to passdown context cache invalidation for VT-d. I just wasn't 
> sure it is
> the same for all models. Again, trying to have a superset of generic fields.
> 
> Thanks!
> 
> Jacob


RE: [PATCH v2 03/16] iommu: introduce iommu invalidate API function

2017-10-11 Thread Liu, Yi L
> On Wed, Oct 11, 2017 at 07:54:32AM +0000, Liu, Yi L wrote:
> > I agree that iommu_invalidate() is too generic. Additionally, also
> > better to avoid making it svm specific.
> 
> I also don't like to name the functions after the Intel feature, but I failed 
> to come up
> with a better alternative so far. The only one I can come up with for now 
> would be
> 'iovm', so the function name would be iommu_iovm_invalidate().

[Liu, Yi L] Actually, I'm not against the 'SVM' term. I just want to keep it
compatible with future usage in non-SVM scenarios.

> On the other side, the ARM guys also already call the feature set 'SVM', 
> despite it
> being ambiguous and Intel specific. I don't have a strong opinion on the 
> naming.
> 
> > The reason we introduce this API is in vSVM case is that guest owns
> > the first level page table(vtd). If we use similar mechanism for
> > vIOVA, then we also need to passdown guest's vIOVA tlb flush.
> >
> > Since it is to expose an API for iommu tlb flushes requests from
> > userspace/guest which is out of iommu. How about naming it as
> > iommu_tlb_external_invalidate()?
> 
> If you only read the function name, 'external' could mean everything. It is 
> not clear

[Liu, Yi L] Agree, 'external' is also unclear.

> from the name when to use this function. So something like
> iommu_iovm_invalidate() is better.
> 

[Liu, Yi L] I didn't quite get what 'iovm' means. Can you explain the idea a bit?

Thanks,
Yi L


RE: [PATCH v2 03/16] iommu: introduce iommu invalidate API function

2017-10-12 Thread Liu, Yi L


> -Original Message-
> From: Bob Liu [mailto:liub...@huawei.com]
> Sent: Thursday, October 12, 2017 5:39 PM
> To: Jean-Philippe Brucker ; Joerg Roedel
> ; Liu, Yi L 
> Cc: Lan, Tianyu ; Liu, Yi L ; 
> Greg
> Kroah-Hartman ; Wysocki, Rafael J
> ; LKML ;
> iommu@lists.linux-foundation.org; David Woodhouse 
> Subject: Re: [PATCH v2 03/16] iommu: introduce iommu invalidate API function
> 
> On 2017/10/11 20:48, Jean-Philippe Brucker wrote:
> > On 11/10/17 13:15, Joerg Roedel wrote:
> >> On Wed, Oct 11, 2017 at 11:54:52AM +, Liu, Yi L wrote:
> >>> I didn't quite get 'iovm' mean. Can you explain a bit about the idea?
> >>
> >> It's short for IO Virtual Memory, basically a replacement term for 'svm'
> >> that is not ambiguous (afaik) and not specific to Intel.
> >
> > I wonder if SVM originated in OpenCL first, rather than intel? That's
> > why I'm using it, but it is ambiguous. I'm not sure IOVM is precise
> > enough though, since the name could as well be used without shared
> > tables, for classical map/unmap and IOVAs. Kevin Tian suggested SVA
> > "Shared Virtual Addressing" last time, which is a little more clear
> > than SVM and isn't used elsewhere in the kernel either.
> >
> 
> The process "vaddr" can be the same as "IOVA" by using the classical map/unmap
> way.
> This is also a kind of share virtual memory/address(except have to pin 
> physical
> memory).
> How to distinguish these two different implementation of "share virtual
> memory/address"?
> 
[Liu, Yi L] I'm not sure I get your idea well. A process "vaddr" is owned by
the process and maintained by the MMU, while an "IOVA" is maintained by the
IOMMU, so they differ in how they are maintained. Since a process "vaddr" is
maintained by the MMU and then used by the IOMMU, we call it shared virtual
memory/address; that is where the "shared" term comes from. I didn't quite get
"two different implementations of shared virtual memory/address". Maybe you
can explain further.

Regards,
Yi L



RE: [PATCH v2 03/16] iommu: introduce iommu invalidate API function

2017-10-12 Thread Liu, Yi L

> -Original Message-
> From: Bob Liu [mailto:liub...@huawei.com]
> Sent: Thursday, October 12, 2017 6:08 PM
> To: Liu, Yi L ; Jean-Philippe Brucker  philippe.bruc...@arm.com>; Joerg Roedel 
> Cc: Lan, Tianyu ; Liu, Yi L ; 
> Greg
> Kroah-Hartman ; Wysocki, Rafael J
> ; LKML ;
> iommu@lists.linux-foundation.org; David Woodhouse 
> Subject: Re: [PATCH v2 03/16] iommu: introduce iommu invalidate API function
> 
> On 2017/10/12 17:50, Liu, Yi L wrote:
> >
> >
> >> -Original Message-
> >> From: Bob Liu [mailto:liub...@huawei.com]
> >> Sent: Thursday, October 12, 2017 5:39 PM
> >> To: Jean-Philippe Brucker ; Joerg
> >> Roedel ; Liu, Yi L 
> >> Cc: Lan, Tianyu ; Liu, Yi L
> >> ; Greg Kroah-Hartman
> >> ; Wysocki, Rafael J
> >> ; LKML ;
> >> iommu@lists.linux-foundation.org; David Woodhouse
> >> 
> >> Subject: Re: [PATCH v2 03/16] iommu: introduce iommu invalidate API
> >> function
> >>
> >> On 2017/10/11 20:48, Jean-Philippe Brucker wrote:
> >>> On 11/10/17 13:15, Joerg Roedel wrote:
> >>>> On Wed, Oct 11, 2017 at 11:54:52AM +, Liu, Yi L wrote:
> >>>>> I didn't quite get 'iovm' mean. Can you explain a bit about the idea?
> >>>>
> >>>> It's short for IO Virtual Memory, basically a replacement term for 'svm'
> >>>> that is not ambiguous (afaik) and not specific to Intel.
> >>>
> >>> I wonder if SVM originated in OpenCL first, rather than intel?
> >>> That's why I'm using it, but it is ambiguous. I'm not sure IOVM is
> >>> precise enough though, since the name could as well be used without
> >>> shared tables, for classical map/unmap and IOVAs. Kevin Tian
> >>> suggested SVA "Shared Virtual Addressing" last time, which is a
> >>> little more clear than SVM and isn't used elsewhere in the kernel either.
> >>>
> >>
> >> The process "vaddr" can be the same as "IOVA" by using the classical
> >> map/unmap way.
> >> This is also a kind of share virtual memory/address(except have to
> >> pin physical memory).
> >> How to distinguish these two different implementation of "share
> >> virtual memory/address"?
> >>
> > [Liu, Yi L] Not sure if I get your idea well. Process "vaddr" is owned
> > by process and maintained by mmu, while "IOVA" is maintained by iommu.
> > So they are different in the way they are maintained. Since process
> > "vaddr" is maintained by mmu and then used by iommu, so we call it shared 
> > virtual
> memory/address. This is how "shared" term comes.
> 
> I think from the view of application, the share virtual memory/address(or 
> Nvidia-
> CUDA unify virtual address) is like this:
> 
> 1. vaddr = malloc(); e.g vaddr=0x1
> 2. device can get the same data(accessing the same physical memory) through 
> same
> address e.g 0x1, and don't care about it's a vaddr or IOVA..
> (actually in Nvidia-cuda case, the data will be migrated between system-ddr 
> and gpu-
> memory, but the vaddr is always the same for CPU and GPU).
> 
> So there are two ways(beside Nvidia way) to implement this requirement:
> 1)
> get the physical memory of vaddr;
> dma_map the paddr to iova;
> If we appoint iova = vaddr (e.g iova can be controlled by the user space 
> driver
> through vfio DMA_MAP), This can also be called share virtual address between 
> CPU
> process and device..

[Liu, Yi L] I see. Thanks for raising it. I think this is a software way to get
data shared between a process and a device. However, it's not sharing the
virtual address space, since the device would still use an IOVA to access
memory. So it should be another story.
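
For reference, the "iova = vaddr" mapping described in 1) above would be set
up from user space roughly as below, using the standard VFIO type1 interface
(error handling and page-alignment checks omitted; this is a sketch, not part
of any posted patch):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* container_fd: an already-configured VFIO container using the type1 IOMMU */
static int map_iova_equal_vaddr(int container_fd, void *buf, size_t size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)buf;     /* process virtual address, gets pinned */
        map.iova  = (uintptr_t)buf;     /* IOVA deliberately chosen equal to vaddr */
        map.size  = size;

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}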

Regards,
Yi L

> 2)
> The second way is what this RFC did.
> 



RE: [PATCH v2 02/16] iommu/vt-d: add bind_pasid_table function

2017-10-12 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Friday, October 6, 2017 7:04 AM
> To: iommu@lists.linux-foundation.org; LKML ; 
> Joerg
> Roedel ; David Woodhouse ; Greg
> Kroah-Hartman ; Wysocki, Rafael J
> ; Jean-Philippe Brucker  philippe.bruc...@arm.com>
> Cc: Liu, Yi L ; Lan, Tianyu ; Tian, 
> Kevin
> ; Raj, Ashok ; Alex Williamson
> ; Jacob Pan ; Liu; 
> Yi
> L 
> Subject: [PATCH v2 02/16] iommu/vt-d: add bind_pasid_table function
> 
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API functions.
> 
> The primary use case is for direct assignment of SVM capable device. 
> Originated
> from emulated IOMMU in the guest, the request goes through many layers (e.g.
> VFIO). Upon calling host IOMMU driver, caller passes guest PASID table 
> pointer (GPA)
> and size.
> 
> Device context table entry is modified by Intel IOMMU specific 
> bind_pasid_table
> function. This will turn on nesting mode and matching translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/intel-iommu.c   | 117
> ++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 118 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 209d99a..7ae569c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5200,6 +5200,7 @@ static void intel_iommu_put_resv_regions(struct device
> *dev,
> 
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  #define MAX_NR_PASID_BITS (20)
> +#define MIN_NR_PASID_BITS (5)
>  static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu)  {
>   /*
> @@ -5326,6 +5327,118 @@ struct intel_iommu
> *intel_svm_device_to_iommu(struct device *dev)
> 
>   return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> + struct device *dev, struct pasid_table_config *pasidt_binfo) {
> + struct intel_iommu *iommu;
> + struct context_entry *context;
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct pci_dev *pdev;
> + u8 bus, devfn, host_table_pasid_bits;
> + u16 did, sid;
> + int ret = 0;
> + unsigned long flags;
> + u64 ctx_lo;
> +
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> + /* VT-d spec 9.4 says pasid table size is encoded as 2^(x+5) */
> + host_table_pasid_bits = intel_iommu_get_pts(iommu) +
> MIN_NR_PASID_BITS;
> + if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits ||
> + pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
> + pr_err("Invalid gPASID bits %d, host range %d - %d\n",
> + pasidt_binfo->pasid_bits,
> + MIN_NR_PASID_BITS, host_table_pasid_bits);
> + return -ERANGE;
> + }
> +
> + pdev = to_pci_dev(dev);
> + if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + return -EINVAL;
> + sid = PCI_DEVID(bus, devfn);
> +
> + info = dev->archdata.iommu;
> + if (!info || !info->pasid_supported) {
> + dev_err(dev, "No PASID support\n");
> + ret = -EINVAL;
> + goto out;
> + }
> + if (!info->pasid_enabled) {
> +         ret = pci_enable_pasid(pdev, info->pasid_supported & ~1);
> + if (ret)
> + goto out;
> + }
> + if (!device_context_mapped(iommu, bus, devfn)) {
> + pr_warn("ctx not mapped for bus devfn %x:%x\n", bus, devfn);
> + ret = -EINVAL;
> + goto out;
> + }

[Liu, Yi L] This is checking whether the context is present. So if it is true,
the check in the following six lines should always pass. Perhaps this check
could be merged with those six lines.

> + spin_lock_irqsave(&iommu->lock, flags);
> + context = iommu_context_addr(iommu, bus, devfn, 0);
> + if (!context) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +

Regards,
Yi L


RE: [PATCH 1/3] iommu/vt-d: Missing checks for pasid tables if allocation fails

2017-10-18 Thread Liu, Yi L


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Lu Baolu
> Sent: Thursday, October 19, 2017 8:39 AM
> To: j...@8bytes.org; dw...@infradead.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: [PATCH 1/3] iommu/vt-d: Missing checks for pasid tables if 
> allocation fails
> 
> intel_svm_alloc_pasid_tables() might return an error but never be checked by 
> the
> callers. Later when intel_svm_bind_mm() is called, there are no checks for 
> valid pasid
> tables before enabling them.
> 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index
> f6697e5..43280ca 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -292,7 +292,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   int pasid_max;
>   int ret;
> 
> - if (WARN_ON(!iommu))
> + if (WARN_ON(!iommu || !iommu->pasid_table))

[Liu, Yi L] Hi Baolu, I guess there also needs to be a check of iommu->ecap to
see whether the PASID bit is reported. Thoughts?

Regards,
Yi L

>   return -EINVAL;
> 
>   if (dev_is_pci(dev)) {
> --
> 2.7.4
> 


RE: [PATCH 1/3] iommu/vt-d: Missing checks for pasid tables if allocation fails

2017-10-19 Thread Liu, Yi L


> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Friday, October 20, 2017 8:49 AM
> To: Liu, Yi L ; j...@8bytes.org; dw...@infradead.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 1/3] iommu/vt-d: Missing checks for pasid tables if 
> allocation
> fails
> 
> Hi Yi,
> 
> On 10/19/2017 02:40 PM, Liu, Yi L wrote:
> >
> >> -Original Message-
> >> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> >> boun...@lists.linux-foundation.org] On Behalf Of Lu Baolu
> >> Sent: Thursday, October 19, 2017 8:39 AM
> >> To: j...@8bytes.org; dw...@infradead.org
> >> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> >> Subject: [PATCH 1/3] iommu/vt-d: Missing checks for pasid tables if
> >> allocation fails
> >>
> >> intel_svm_alloc_pasid_tables() might return an error but never be
> >> checked by the callers. Later when intel_svm_bind_mm() is called,
> >> there are no checks for valid pasid tables before enabling them.
> >>
> >> Signed-off-by: Ashok Raj 
> >> Signed-off-by: Lu Baolu 
> >> ---
> >>  drivers/iommu/intel-svm.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> >> index f6697e5..43280ca 100644
> >> --- a/drivers/iommu/intel-svm.c
> >> +++ b/drivers/iommu/intel-svm.c
> >> @@ -292,7 +292,7 @@ int intel_svm_bind_mm(struct device *dev, int
> >> *pasid, int flags, struct svm_dev_
> >>int pasid_max;
> >>int ret;
> >>
> >> -  if (WARN_ON(!iommu))
> >> +  if (WARN_ON(!iommu || !iommu->pasid_table))
> > [Liu, Yi L] Hi Baolu, I guess there also need a check to iommu->ecap ,
> > see if the pasid bit is reported. thoughts?
> >
> 
> If pasid bit is not set in ecap register, iommu->pasid_table won't be set.
> 
> We did this by:
> 
> if (pasid_enabled(iommu))
> intel_svm_alloc_pasid_tables(iommu);

[Liu, Yi L] Sounds good. thx.

Reviewed-by: Liu, Yi L 

Regards,
Yi L


RE: [PATCH v2 08/16] iommu: introduce device fault data

2017-10-20 Thread Liu, Yi L


> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, October 11, 2017 3:29 AM
> To: Jacob Pan ; 
> iommu@lists.linux-foundation.org;
> LKML ; Joerg Roedel ; David
> Woodhouse ; Greg Kroah-Hartman
> ; Wysocki, Rafael J 
> Cc: Liu, Yi L ; Lan, Tianyu ; Tian, 
> Kevin
> ; Raj, Ashok ; Alex Williamson
> 
> Subject: Re: [PATCH v2 08/16] iommu: introduce device fault data
> 
> On 06/10/17 00:03, Jacob Pan wrote:
> > Device faults detected by IOMMU can be reported outside IOMMU
> > subsystem. This patch intends to provide a generic device fault data
> > such that device drivers can communicate IOMMU faults without model
> > specific knowledge.
> >
> > The assumption is that model specific IOMMU driver can filter and
> > handle most of the IOMMU faults if the cause is within IOMMU driver
> > control. Therefore, the fault reasons can be reported are grouped and
> > generalized based common specifications such as PCI ATS.
> >
> > Signed-off-by: Jacob Pan 
> > ---
> >  include/linux/iommu.h | 69
> > +++
> >  1 file changed, 69 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 4af1820..3f9b367 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -49,6 +49,7 @@ struct bus_type;
> >  struct device;
> >  struct iommu_domain;
> >  struct notifier_block;
> > +struct iommu_fault_event;
> >
> >  /* iommu fault flags */
> >  #define IOMMU_FAULT_READ   0x0
> > @@ -56,6 +57,7 @@ struct notifier_block;
> >
> >  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
> > struct device *, unsigned long, int, void *);
> > +typedef int (*iommu_dev_fault_handler_t)(struct device *, struct
> > +iommu_fault_event *);
> >
> >  struct iommu_domain_geometry {
> > dma_addr_t aperture_start; /* First address that can be mapped*/
> > @@ -264,6 +266,60 @@ struct iommu_device {
> > struct device *dev;
> >  };
> >
> > +enum iommu_model {
> > +   IOMMU_MODEL_INTEL = 1,
> > +   IOMMU_MODEL_AMD,
> > +   IOMMU_MODEL_SMMU3,
> > +};
> 
> Now unused, I guess?
> 
> > +
> > +/*  Generic fault types, can be expanded IRQ remapping fault */ enum
> > +iommu_fault_type {
> > +   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
> > +   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
> > +};
> > +
> > +enum iommu_fault_reason {
> > +   IOMMU_FAULT_REASON_CTX = 1,
> 
> If I read the VT-d spec right, this is a fault encountered while fetching the 
> PASID table
> pointer?
> 
> > +   IOMMU_FAULT_REASON_ACCESS,
> 
> And this a pgd or pte access fault?
> 
> > +   IOMMU_FAULT_REASON_INVALIDATE,
> 
> What would this be?
> 
> > +   IOMMU_FAULT_REASON_UNKNOWN,
> > +};
> 
> I'm currently doing the same exploratory work for virtio-iommu, and I'd be 
> tempted
> to report reasons as detailed as possible to guest or device driver, but it's 
> not clear
> what they need, how they would use this information. I'd like to discuss this 
> some
> more.

[Liu, Yi L] In fact, it's not necessary to pass the detailed unrecoverable
fault to the guest in the virtualization case. An unrecoverable fault on the
host indicates a failure during native IOMMU address translation. If the fault
is not due to the guest's IOMMU page table setup, there is no need to inject it
into the guest, and the hypervisor should be able to deduce this by walking the
guest IOMMU page tables with the fault address. So I think for the
virtualization case, passing the fault address is enough. If the hypervisor
doesn't see any issue after checking the guest IOMMU translation hierarchy,
there is no use in letting the guest know; the hypervisor can either log an
error or stop the guest. If the hypervisor does see an error in the guest IOMMU
translation hierarchy, it injects the error into the guest with a proper fault
type.
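
In other words, the host-side handling would look something like the
pseudo-code below. All helpers are hypothetical; only iommu_fault_event comes
from this series, and its field names here are illustrative.

static void handle_unrecoverable_fault(struct vcpu *vcpu,
                                       struct iommu_fault_event *evt)
{
        /* walk the *guest* translation structures for the faulting address */
        if (guest_translation_valid(vcpu, evt)) {
                /* host-side problem: nothing the guest could fix */
                pr_err("IOMMU fault not caused by guest, addr 0x%llx\n",
                       evt->addr);
                /* optionally stop the guest here */
                return;
        }

        /* the guest misprogrammed its tables: inject with a proper type */
        inject_viommu_fault(vcpu, evt);
}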

But for device drivers or other user-space drivers, I'm not sure whether they
need detailed fault info. It should be enough to pass whatever info helps them
deduce whether the unrecoverable fault is due to them. This needs more input
from device driver reviewers.

> For unrecoverable faults I guess CTX means "the host IOMMU driver is broken", 
> since
> the device tables are invalid. In which case there is no use continuing, 
> trying to
> shutdown the device cleanly is really all the guest/device driver can do.

[Liu, Yi L] I'm not sure what "device tables" means here. 

RE: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs

2017-10-23 Thread Liu, Yi L
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, October 6, 2017 9:31 PM
> To: linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org; linux-
> a...@vger.kernel.org; devicet...@vger.kernel.org; iommu@lists.linux-
> foundation.org
> Cc: j...@8bytes.org; robh...@kernel.org; mark.rutl...@arm.com;
> catalin.mari...@arm.com; will.dea...@arm.com; lorenzo.pieral...@arm.com;
> hanjun@linaro.org; sudeep.ho...@arm.com; r...@rjwysocki.net;
> l...@kernel.org; robin.mur...@arm.com; bhelg...@google.com;
> alex.william...@redhat.com; t...@semihalf.com; liub...@huawei.com;
> thunder.leiz...@huawei.com; xieyishe...@huawei.com;
> gabriele.paol...@huawei.com; nwatt...@codeaurora.org; ok...@codeaurora.org;
> rfr...@cavium.com; dw...@infradead.org; jacob.jun@linux.intel.com; Liu, Yi
> L ; Raj, Ashok ; robdcl...@gmail.com
> Subject: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
> 
> IOMMU drivers need a way to bind Linux processes to devices. This is used for
> Shared Virtual Memory (SVM), where devices support paging. In that mode, DMA 
> can
> directly target virtual addresses of a process.
> 
> Introduce boilerplate code for allocating process structures and binding them 
> to
> devices. Four operations are added to IOMMU drivers:
> 
> * process_alloc, process_free: to create an iommu_process structure and
>   perform architecture-specific operations required to grab the process
>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>   iommu_process structure per Linux process.
> 
> * process_attach: attach a process to a device. The IOMMU driver checks
>   that the device is capable of sharing an address space with this
>   process, and writes the PASID table entry to install the process page
>   directory.
> 
>   Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
>   PASID table per domain, for convenience. Other can implement it
>   differently but to help these drivers, process_attach and process_detach
>   take a 'first' or 'last' parameter telling whether they need to
>   install/remove the PASID entry or only send the required TLB
>   invalidations.
> 
> * process_detach: detach a process from a device. The IOMMU driver removes
>   the PASID table entry and invalidates the IOTLBs.
> 
> process_attach and process_detach operations are serialized with a spinlock. 
> At the
> moment it is global, but if we try to optimize it, the core should at least 
> prevent
> concurrent attach/detach on the same domain.
> (so multi-level PASID table code can allocate tables lazily without having to 
> go
> through the io-pgtable concurrency nightmare). process_alloc can sleep, but
> process_free must not (because we'll have to call it from
> call_srcu.)
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a 
> custom
> allocator will be needed for top-down PASID allocation).
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig |  10 ++
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-process.c | 225
> ++
>  drivers/iommu/iommu.c |   1 +
>  include/linux/iommu.h |  24 +
>  5 files changed, 261 insertions(+)
>  create mode 100644 drivers/iommu/iommu-process.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index
> f3a21343e636..1ea5c90e37be 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,16 @@ config IOMMU_DMA
>   select IOMMU_IOVA
>   select NEED_SG_DMA_LENGTH
> 
> +config IOMMU_PROCESS
> + bool "Process management API for the IOMMU"
> + select IOMMU_API
> + help
> +   Enable process management for the IOMMU API. In systems that support
> +   it, device drivers can bind processes to devices and share their page
> +   tables using this API.
> +
> +   If unsure, say N here.
> +
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
>   depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index
> b910aea813a1..a2832edbfaa2 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> +obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)

RE: [PATCH v2 08/16] iommu: introduce device fault data

2017-11-07 Thread Liu, Yi L
Hi Jean,

Nice to have you "online". This open is really blocking the progress. Pls check 
inline.

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Tuesday, November 7, 2017 3:02 AM
> To: Liu, Yi L ; Jacob Pan ;
> iommu@lists.linux-foundation.org; LKML ; Joerg
> Roedel ; David Woodhouse ; Greg
> Kroah-Hartman ; Wysocki, Rafael J
> 
> Cc: Lan, Tianyu ; Tian, Kevin ; 
> Raj,
> Ashok ; Alex Williamson 
> Subject: Re: [PATCH v2 08/16] iommu: introduce device fault data
> 
> Hi Yi,
> 
> Sorry for the late reply, I seem to have missed this.
> 
> On 20/10/17 11:07, Liu, Yi L wrote:
> [...]
> >>> +
> >>> +/*  Generic fault types, can be expanded IRQ remapping fault */
> >>> +enum iommu_fault_type {
> >>> + IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
> >>> + IOMMU_FAULT_PAGE_REQ,   /* page request fault */
> >>> +};
> >>> +
> >>> +enum iommu_fault_reason {
> >>> + IOMMU_FAULT_REASON_CTX = 1,
> >>
> >> If I read the VT-d spec right, this is a fault encountered while
> >> fetching the PASID table pointer?
> >>
> >>> + IOMMU_FAULT_REASON_ACCESS,
> >>
> >> And this a pgd or pte access fault?
> >>
> >>> + IOMMU_FAULT_REASON_INVALIDATE,
> >>
> >> What would this be?
> >>
> >>> + IOMMU_FAULT_REASON_UNKNOWN,
> >>> +};
> >>
> >> I'm currently doing the same exploratory work for virtio-iommu, and
> >> I'd be tempted to report reasons as detailed as possible to guest or
> >> device driver, but it's not clear what they need, how they would use
> >> this information. I'd like to discuss this some more.
> >
> > [Liu, Yi L] In fact, it's not necessary to pass the detailed
> > unrecoverable fault to guest in virtualization case. Unrecoverable
> > fault happened on native indicates fault during native IOMMU address
> > translation. If the fault is not due to guest IOMMU page table
> > setting, then it is not necessary to inject the fault to guest. And 
> > hypervisor should
> be able to deduce it by walking the guest IOMMU page table with the fault 
> address.
> 
> I'm not sure the hypervisor should go and inspect the guest's page tables.

[Liu, Yi L] I think the hypervisor needs to do it to make sure faults are
reported to the guest correctly. Otherwise, the hypervisor may report a fault
to the guest and confuse it. For example, the pIOMMU walk may fail while
fetching the root table (VT-d) or device table (SMMU); such a fault is due to
missing programming in the host, so the guest is not responsible for it and has
no knowledge to fix it. Reporting it would make the guest believe it programmed
the root table or device table incorrectly when in fact it did not.

> The pIOMMU already did the walk and reported the fault, so the hypervisor 
> knows
> that they are invalid. I thought VT-d and other pIOMMUs provide enough
> information in the fault report to tell if the error was due to invalid page 
> tables?

[Liu, Yi L] Yes, the pIOMMU did the walk and produced the fault info, but that
does not say who is responsible for the fault. By inspecting the guest tables,
the hypervisor can tell who should be responsible for it.

> 
> > So I think for
> > virtualization case, pass the fault address is enough. If hypervisor
> > doesn't see any issue after checking the guest IOMMU translation
> > hierarchy, no use to let guest know it. Hypervisor can either throw
> > error log or stop the guest. If hypervisor see any error in the guest
> > iommu translation hierarchy, then inject the error to guest with a
> > proper fault type.> But for device driver or other user-space driver,
> > I'm not sure if they need detailed fault info. In fact, it is enough to 
> > pass the
> possible info which would help them to deduce whether the unrecoverable fault 
> is
> due to them. This need more inputs from device driver reviewers.
> 
> Agreed, though I'm not sure how to reach them.

[Liu, Yi L] I'd like to add to my earlier words here: besides the fault
address, we may also need to provide the BDF and the PASID, if one is present.
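
Put differently, the per-fault record handed to the guest could be as small as
the sketch below (illustrative layout only, not a posted structure):

struct viommu_fault_info {
        __u64   addr;   /* faulting address */
        __u32   dev_id; /* source id (BDF) of the faulting device */
        __u32   pasid;  /* only meaningful if PASID_VALID is set in @flags */
        __u32   flags;  /* e.g. PASID_VALID */
        __u32   type;   /* fault type to present through the vIOMMU */
};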

> 
> At the moment, the only users of report_iommu_fault, the existing fault 
> reporting
> mechanism, are ARM-based IOMMU drivers and there are only four device drivers
> that register a handler with iommu_set_fault_handler. Two of them simply 
> print the
> fault, one resets the offending device, and the last one (msm GPU) wants to 
> provide
> more detailed debugging information about the device state.

[Liu, Yi L] Well, 

RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-28 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, February 15, 2018 8:41 PM
> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
> 
> On 13/02/18 23:34, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:57 PM
> >>
> >> On 13/02/18 07:54, Tian, Kevin wrote:
>  From: Jean-Philippe Brucker
>  Sent: Tuesday, February 13, 2018 2:33 AM
> 
>  Add bind() and unbind() operations to the IOMMU API. Device drivers
> >> can
>  use them to share process page tables with their devices.
>  bind_group() is provided for VFIO's convenience, as it needs to
>  provide a coherent interface on containers. Other device drivers
>  will most likely want to use bind_device(), which binds a single device 
>  in the
> group.
> >>>
> >>> I saw your bind_group implementation tries to bind the address space
> >>> for all devices within a group, which IMO has some problem. Based on
> >> PCIe
> >>> spec, packet routing on the bus doesn't take PASID into consideration.
> >>> since devices within same group cannot be isolated based on
> >>> requestor-
> >> ID
> >>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> >> devices
> >>> could cause undesired p2p.
> >> But so does enabling "classic" DMA... If two devices are not
> >> protected by ACS for example, they are put in the same IOMMU group,
> >> and one device might be able to snoop the other's DMA. VFIO allows
> >> userspace to create a container for them and use MAP/UNMAP, but makes
> >> it explicit to the user that for DMA, these devices are not isolated
> >> and must be considered as a single device (you can't pass them to
> >> different VMs or put them in different containers). So I tried to
> >> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
> >> operations on the VFIO container instead of the device.
> >
> > there is a small difference. for classic DMA we can reserve PCI BARs
> > when allocating IOVA, thus multiple devices in the same group can
> > still work correctly applied with same translation, if isolation is
> > not cared in between. However for SVA it's CPU virtual addresses
> > managed by kernel mm thus difficult to introduce similar address
> > reservation. Then it's possible for a VA falling into other device's
> > BAR in the same group and cause undesired p2p traffic. In such regard,
> > SVA is actually functionally-broken.
> 
> I think the problem exists even if there is a single device in the group.
> If for example, malloc() returns a VA that corresponds to a PCI host bridge 
> in IOVA
> space, performing DMA on that buffer won't reach the IOMMU and will cause
> undesirable side-effects.

If there is only a single device in a group, does that imply ACS support in
the path from this device to the root complex? If so, any memory request
from this device would be routed upstream to the root complex, so undesired
p2p traffic should be avoided. So I tend to believe that, even if we do the
bind at group level, we actually expect it to work only for the case where
there is a single device within the group.
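
For illustration only (not code from this series), a group-level bind could
enforce that expectation by refusing groups with more than one device, e.g.
by counting members with iommu_group_for_each_dev() (assuming the usual
<linux/iommu.h>/<linux/device.h> kernel context):

    static int count_device(struct device *dev, void *data)
    {
            int *count = data;

            (*count)++;
            return 0;
    }

    static bool group_is_singleton(struct iommu_group *group)
    {
            int count = 0;

            iommu_group_for_each_dev(group, &count, count_device);
            return count == 1;
    }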

Thanks,
Yi Liu


RE: [PATCH 1/9] iommu/vt-d: Global PASID name space

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> This adds the system wide PASID name space for the PASID
> allocation. Currently we are using per IOMMU PASID name
> spaces which are not suitable for some use cases. For an
> example, one application (associated with a PASID) might
> talk to two physical devices simultaneously while the two
> devices could reside behind two different IOMMU units.

Looks good to me.
Reviewed-by: Liu, Yi L 

> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Ashok Raj 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Kevin Tian 
> ---
>  drivers/iommu/Makefile  |  2 +-
>  drivers/iommu/intel-iommu.c | 13 ++
>  drivers/iommu/intel-pasid.c | 60
> +
>  drivers/iommu/intel-pasid.h | 30 +++
>  4 files changed, 104 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/intel-pasid.c
>  create mode 100644 drivers/iommu/intel-pasid.h
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb6958..0a190b4 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -14,7 +14,7 @@ obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
>  obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += dmar.o
> -obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o
> +obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
>  obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>  obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 749d8f2..98c5ae9 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -53,6 +53,7 @@
>  #include 
> 
>  #include "irq_remapping.h"
> +#include "intel-pasid.h"
> 
>  #define ROOT_SIZEVTD_PAGE_SIZE
>  #define CONTEXT_SIZE VTD_PAGE_SIZE
> @@ -3265,6 +3266,18 @@ static int __init init_dmars(void)
>   }
> 
>   for_each_active_iommu(iommu, drhd) {
> + /*
> +  * Find the max pasid size of all IOMMU's in the system.
> +  * we need to ensure the system pasid table is no bigger
> +  * than the smallest supported.
> +  */
> + if (pasid_enabled(iommu)) {
> + u32 temp = 2 << ecap_pss(iommu->ecap);
> +
> + intel_pasid_max_id = min_t(u32, temp,
> +intel_pasid_max_id);
> + }
> +
>   g_iommus[iommu->seq_id] = iommu;
> 
>   intel_iommu_init_qi(iommu);
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> new file mode 100644
> index 000..0690f39
> --- /dev/null
> +++ b/drivers/iommu/intel-pasid.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * intel-pasid.c - PASID idr, table and entry manipulation
> + *
> + * Copyright (C) 2018 Intel Corporation
> + *
> + * Author: Lu Baolu 
> + */
> +
> +#define pr_fmt(fmt)  "DMAR: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "intel-pasid.h"
> +
> +/*
> + * Intel IOMMU global PASID pool:
> + */
> +static DEFINE_SPINLOCK(pasid_lock);
> +u32 intel_pasid_max_id = PASID_MAX;
> +static DEFINE_IDR(pasid_idr);
> +
> +int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> +{
> + int ret, min, max;
> +
> + min = max_t(int, start, PASID_MIN);
> + max = min_t(int, end, intel_pasid_max_id);
> +
> + WARN_ON(in_interrupt());
> + idr_preload(gfp);
> + spin_lock(&pasid_lock);
> + ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> + spin_unlock(&pasid_lock);
> + idr_preload_end();
> +
> + return ret;
> +}
> +
> +void intel_pasid_free_id(int pasid)
> +{
> + spin_lock(&pasid_lock);
> + idr_remove(&pasid_idr, pasid);
> + spin_unlock(&pasid_lock);
> +}
> +
> +void *intel_pasid_lookup_id(int pasid)
> +{
> + void *p;
> +
> + spin_lock(&pasid_lock);
> + p = idr_find(&pasid_idr, pasid);
> + spin_unlock(&pasid_lock);
> +
> + return p;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> new file mode 100644
> index 000..0c36af0
> --- /dev/null
> +++ b/drivers/iommu/intel-pasid.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
>

RE: [PATCH 2/9] iommu/vt-d: Decouple idr bond pointer from svm

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> Subject: [PATCH 2/9] iommu/vt-d: Decouple idr bond pointer from svm
> 
> As we move the PASID idr out of SVM code and make it serving
> as a global PASID name space, the consumer can specify a ptr
> to bind it with a PASID. We shouldn't assume that each PASID
> will be bond with a ptr of struct intel_svm anymore.
> This patch cleans up a idr_for_each_entry() usage in the SVM
> code. It's required to replace the SVM-specific idr with the
> global PASID idr.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Kevin Tian 

Looks good to me.
Reviewed-by: Liu, Yi L 

Regards,
Yi Liu
>  drivers/iommu/intel-svm.c   | 14 ++
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index e8cd984..983af0c 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -298,6 +298,7 @@ static const struct mmu_notifier_ops intel_mmuops = {
>  };
> 
>  static DEFINE_MUTEX(pasid_mutex);
> +static LIST_HEAD(global_svm_list);
> 
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
> svm_dev_ops
> *ops)
>  {
> @@ -329,13 +330,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, 
> int
> flags, struct svm_dev_
> 
>   mutex_lock(&pasid_mutex);
>   if (pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> - int i;
> + struct intel_svm *t;
> 
> - idr_for_each_entry(&iommu->pasid_idr, svm, i) {
> - if (svm->mm != mm ||
> - (svm->flags & SVM_FLAG_PRIVATE_PASID))
> + list_for_each_entry(t, &global_svm_list, list) {
> + if (t->mm != mm || (t->flags & SVM_FLAG_PRIVATE_PASID))
>   continue;
> 
> + svm = t;
>   if (svm->pasid >= pasid_max) {
>   dev_warn(dev,
>"Limited PASID width. Cannot use 
> existing
> PASID %d\n",
> @@ -404,6 +405,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   svm->mm = mm;
>   svm->flags = flags;
>   INIT_LIST_HEAD_RCU(&svm->devs);
> + INIT_LIST_HEAD(&svm->list);
>   ret = -ENOMEM;
>   if (mm) {
>   ret = mmu_notifier_register(&svm->notifier, mm);
> @@ -430,6 +432,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>*/
>   if (cap_caching_mode(iommu->cap))
>   intel_flush_pasid_dev(svm, sdev, svm->pasid);
> +
> + list_add_tail(&svm->list, &global_svm_list);
>   }
>   list_add_rcu(&sdev->list, &svm->devs);
> 
> @@ -485,6 +489,8 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   if (svm->mm)
>   mmu_notifier_unregister(&svm-
> >notifier, svm->mm);
> 
> + list_del(&svm->list);
> +
>   /* We mandate that no page faults may be
> outstanding
>* for the PASID when
> intel_svm_unbind_mm() is called.
>* If that is not obeyed, subtle errors 
> will
> happen.
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ef169d6..795717e 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -486,6 +486,7 @@ struct intel_svm {
>   int flags;
>   int pasid;
>   struct list_head devs;
> + struct list_head list;
>  };
> 
>  extern int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct
> intel_svm_dev *sdev);
> --
> 2.7.4



RE: [PATCH 4/9] iommu/vt-d: Move device_domain_info to header

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> This allows the per device iommu data to be accessed from other
> files.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 

Looks good to me.
Reviewed-by: Liu, Yi L 

Regards,
Yi Liu
> ---
>  drivers/iommu/intel-iommu.c | 62 +++--
>  include/linux/intel-iommu.h | 68
> +
>  2 files changed, 72 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 98c5ae9..caa0b5c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -381,60 +381,6 @@ static int hw_pass_through = 1;
>   for (idx = 0; idx < g_num_of_iommus; idx++) \
>   if (domain->iommu_refcnt[idx])
> 
> -struct dmar_domain {
> - int nid;/* node id */
> -
> - unsignediommu_refcnt[DMAR_UNITS_SUPPORTED];
> - /* Refcount of devices per iommu */
> -
> -
> - u16 iommu_did[DMAR_UNITS_SUPPORTED];
> - /* Domain ids per IOMMU. Use u16 since
> -  * domain ids are 16 bit wide according
> -  * to VT-d spec, section 9.3 */
> -
> - bool has_iotlb_device;
> - struct list_head devices;   /* all devices' list */
> - struct iova_domain iovad;   /* iova's that belong to this domain */
> -
> - struct dma_pte  *pgd;   /* virtual address */
> - int gaw;/* max guest address width */
> -
> - /* adjusted guest address width, 0 is level 2 30-bit */
> - int agaw;
> -
> - int flags;  /* flags to find out type of domain */
> -
> - int iommu_coherency;/* indicate coherency of iommu access
> */
> - int iommu_snooping; /* indicate snooping control feature*/
> - int iommu_count;/* reference count of iommu */
> - int iommu_superpage;/* Level of superpages supported:
> -0 == 4KiB (no superpages), 1 == 2MiB,
> -2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> - u64 max_addr;   /* maximum mapped address */
> -
> - struct iommu_domain domain; /* generic domain data structure for
> -iommu core */
> -};
> -
> -/* PCI domain-device relationship */
> -struct device_domain_info {
> - struct list_head link;  /* link to domain siblings */
> - struct list_head global; /* link to global list */
> - u8 bus; /* PCI bus number */
> - u8 devfn;   /* PCI devfn number */
> - u8 pasid_supported:3;
> - u8 pasid_enabled:1;
> - u8 pri_supported:1;
> - u8 pri_enabled:1;
> - u8 ats_supported:1;
> - u8 ats_enabled:1;
> - u8 ats_qdep;
> - struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> - struct intel_iommu *iommu; /* IOMMU used by this device */
> - struct dmar_domain *domain; /* pointer to domain */
> -};
> -
>  struct dmar_rmrr_unit {
>   struct list_head list;  /* list of rmrr units   */
>   struct acpi_dmar_header *hdr;   /* ACPI header  */
> @@ -631,7 +577,7 @@ static void set_iommu_domain(struct intel_iommu *iommu,
> u16 did,
>   domains[did & 0xff] = domain;
>  }
> 
> -static inline void *alloc_pgtable_page(int node)
> +void *alloc_pgtable_page(int node)
>  {
>   struct page *page;
>   void *vaddr = NULL;
> @@ -642,7 +588,7 @@ static inline void *alloc_pgtable_page(int node)
>   return vaddr;
>  }
> 
> -static inline void free_pgtable_page(void *vaddr)
> +void free_pgtable_page(void *vaddr)
>  {
>   free_page((unsigned long)vaddr);
>  }
> @@ -725,7 +671,7 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
>  }
> 
>  /* This functionin only returns single iommu in a domain */
> -static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
> +struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
>  {
>   int iommu_id;
> 
> @@ -3500,7 +3446,7 @@ static unsigned long intel_alloc_iova(struct device 
> *dev,
>   return iova_pfn;
>  }
> 
> -static struct dmar_domain *get_valid_domain_for_dev(struct device *dev)
> +struct dmar_domain *get_valid_domain_for_dev(struct device *dev)
>  {
>   struct dmar_doma

RE: [PATCH 5/9] iommu/vt-d: Per domain pasid table interfaces

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> This patch adds the interfaces for per domain pasid table
> management. Currently we allocate one pasid table for all
> devices under the scope of an IOMMU. It's insecure in the
> cases where multiple devices under one single IOMMU unit
> support PASID feature. With per domain pasid table, we can
> achieve finer protection and isolation granularity.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Ashok Raj 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-pasid.c | 75
> +
>  drivers/iommu/intel-pasid.h |  4 +++
>  include/linux/intel-iommu.h |  5 +++
>  3 files changed, 84 insertions(+)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 0690f39..b8691a6 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "intel-pasid.h"
> @@ -58,3 +59,77 @@ void *intel_pasid_lookup_id(int pasid)
> 
>   return p;
>  }
> +
> +/*
> + * Interfaces for per domain pasid table management:
> + */
> +int intel_pasid_alloc_table(struct device *dev, size_t entry_size,
> + size_t entry_count)
> +{
> + struct device_domain_info *info;
> + struct dmar_domain *domain;
> + struct page *pages;
> + int order;
> +
> + info = dev->archdata.iommu;
> + if (WARN_ON(!info || !dev_is_pci(dev) ||
> + !info->pasid_supported ||
> + !info->domain))
> + return -EINVAL;
> +
> + domain = info->domain;
> +
> + if (entry_count > intel_pasid_max_id)
> + entry_count = intel_pasid_max_id;
> +
> + order = get_order(entry_size * entry_count);
> + pages = alloc_pages_node(domain->nid, GFP_KERNEL | __GFP_ZERO, order);
> + if (!pages)
> + return -ENOMEM;
> +
> + spin_lock(&pasid_lock);
> + if (domain->pasid_table) {

Can the check be moved prior to the page allocation?
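
The idea, as a rough sketch (not a posted patch): take an early exit when
the domain already has a table, so the page allocation only runs on the
first caller; a re-check under the lock is still needed afterwards, since
the GFP_KERNEL allocation cannot be done while holding the spinlock:

    spin_lock(&pasid_lock);
    if (domain->pasid_table) {
            domain->pasid_users++;
            spin_unlock(&pasid_lock);
            return 0;
    }
    spin_unlock(&pasid_lock);

    /* slow path: allocate the pages, then re-take the lock, re-check
     * domain->pasid_table and free the pages if we lost the race.
     */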

> + __free_pages(pages, order);
> + } else {
> + domain->pasid_table = page_address(pages);
> + domain->order   = order;
> + domain->max_pasid   = entry_count;
> + }
> + domain->pasid_users++;
> + spin_unlock(&pasid_lock);
> +
> + return 0;
> +}
> +
> +void intel_pasid_free_table(struct device *dev)
> +{
> + struct dmar_domain *domain;
> +
> + domain = get_valid_domain_for_dev(dev);
> + if (!domain || !dev_is_pci(dev))
> + return;
> +
> + spin_lock(&pasid_lock);
> + if (domain->pasid_table) {
> + domain->pasid_users--;
> + if (!domain->pasid_users) {
> + free_pages((unsigned long)domain->pasid_table,
> +domain->order);
> + domain->pasid_table = NULL;
> + domain->order   = 0;
> + domain->max_pasid   = 0;
> + }
> + }
> + spin_unlock(&pasid_lock);
> +}
> +
> +void *intel_pasid_get_table(struct device *dev)

Would intel_iommu_get_pasid_table() be more accurate?

Regards,
Yi Liu

> +{
> + struct dmar_domain *domain;
> +
> + domain = get_valid_domain_for_dev(dev);
> + if (!domain)
> + return NULL;
> +
> + return domain->pasid_table;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 0c36af0..a90c60b 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -26,5 +26,9 @@ extern u32 intel_pasid_max_id;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
>  void intel_pasid_free_id(int pasid);
>  void *intel_pasid_lookup_id(int pasid);
> +int intel_pasid_alloc_table(struct device *dev, size_t entry_size,
> + size_t entry_count);
> +void intel_pasid_free_table(struct device *dev);
> +void *intel_pasid_get_table(struct device *dev);
> 
>  #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index a4463f0..bee7a3f 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -424,6 +424,11 @@ struct dmar_domain {
>*/
>   u64 max_addr;   /* maximum mapped address */
> 
> + void   

RE: [PATCH 3/9] iommu/vt-d: Use global PASID for SVM usage

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> This patch switches PASID management for SVM from per iommu idr to the global 
> idr.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Kevin Tian 

Looks good to me.
Reviewed-by: Liu, Yi L 

Regards,
Yi Liu
> ---
>  drivers/iommu/intel-svm.c   | 22 +++---
>  include/linux/intel-iommu.h |  1 -
>  2 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index
> 983af0c..24d0ea1 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
> 
> +#include "intel-pasid.h"
> +
>  #define PASID_ENTRY_PBIT_ULL(0)
>  #define PASID_ENTRY_FLPM_5LP BIT_ULL(9)
>  #define PASID_ENTRY_SRE  BIT_ULL(11)
> @@ -85,8 +87,6 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>   iommu->name);
>   }
> 
> - idr_init(&iommu->pasid_idr);
> -
>   return 0;
>  }
> 
> @@ -102,7 +102,7 @@ int intel_svm_free_pasid_tables(struct intel_iommu *iommu)
>   free_pages((unsigned long)iommu->pasid_state_table, order);
>   iommu->pasid_state_table = NULL;
>   }
> - idr_destroy(&iommu->pasid_idr);
> +
>   return 0;
>  }
> 
> @@ -392,9 +392,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   pasid_max = iommu->pasid_max;
> 
>   /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> - ret = idr_alloc(&iommu->pasid_idr, svm,
> - !!cap_caching_mode(iommu->cap),
> - pasid_max - 1, GFP_KERNEL);
> + ret = intel_pasid_alloc_id(svm,
> +!!cap_caching_mode(iommu->cap),
> +pasid_max - 1, GFP_KERNEL);
>   if (ret < 0) {
>   kfree(svm);
>   kfree(sdev);
> @@ -410,7 +410,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   if (mm) {
>   ret = mmu_notifier_register(&svm->notifier, mm);
>   if (ret) {
> - idr_remove(&svm->iommu->pasid_idr, svm->pasid);
> + intel_pasid_free_id(svm->pasid);
>   kfree(svm);
>   kfree(sdev);
>   goto out;
> @@ -460,7 +460,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   if (!iommu || !iommu->pasid_table)
>   goto out;
> 
> - svm = idr_find(&iommu->pasid_idr, pasid);
> + svm = intel_pasid_lookup_id(pasid);
>   if (!svm)
>   goto out;
> 
> @@ -485,7 +485,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   svm->iommu->pasid_table[svm->pasid].val
> = 0;
>   wmb();
> 
> - idr_remove(&svm->iommu->pasid_idr,
> svm->pasid);
> + intel_pasid_free_id(svm->pasid);
>   if (svm->mm)
>   mmu_notifier_unregister(&svm-
> >notifier, svm->mm);
> 
> @@ -520,7 +520,7 @@ int intel_svm_is_pasid_valid(struct device *dev, int 
> pasid)
>   if (!iommu || !iommu->pasid_table)
>   goto out;
> 
> - svm = idr_find(&iommu->pasid_idr, pasid);
> + svm = intel_pasid_lookup_id(pasid);
>   if (!svm)
>   goto out;
> 
> @@ -618,7 +618,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> 
>   if (!svm || svm->pasid != req->pasid) {
>   rcu_read_lock();
> - svm = idr_find(&iommu->pasid_idr, req->pasid);
> + svm = intel_pasid_lookup_id(req->pasid);
>   /* It *can't* go away, because the driver is not 
> permitted
>* to unbind the mm while any page faults are 
> outstanding.
>* So we only need RCU to protect the internal idr 
> code. */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> 795717e..6b5ef6c 100644
> --- a/include/linux/intel-iommu.h
> ++

RE: [PATCH 6/9] iommu/vt-d: Allocate and free pasid table

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> This patch allocates PASID table for a domain at the time when
> it is being created (if any devices using this domain supports
> PASID feature), and free it when the domain is freed.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 13 +
>  drivers/iommu/intel-svm.c   |  8 
>  include/linux/intel-iommu.h | 10 --
>  3 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index caa0b5c..99c643b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2460,6 +2460,18 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   dev->archdata.iommu = info;
>   spin_unlock_irqrestore(&device_domain_lock, flags);
> 
> + if (dev && dev_is_pci(dev) && info->pasid_supported) {
> + if (pasid_enabled(iommu)) {
> + size_t size, count;
> +
> + size = sizeof(struct pasid_entry);
> + count = min_t(int,
> +   pci_max_pasids(to_pci_dev(dev)),
> +   intel_pasid_max_id);
> + ret = intel_pasid_alloc_table(dev, size, count);

No check for the return value?
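
For example, at minimum something like this (just a suggestion, not a
posted patch; dev_warn() keeps the existing flow while flagging the
failure):

    ret = intel_pasid_alloc_table(dev, size, count);
    if (ret)
            dev_warn(dev, "PASID table allocation failed\n");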

> + }
> + }
> +
>   if (dev && domain_context_mapping(domain, dev)) {
>   pr_err("Domain context map for %s failed\n", dev_name(dev));
>   dmar_remove_one_dev_info(domain, dev);
> @@ -4826,6 +4838,7 @@ static void dmar_remove_one_dev_info(struct
> dmar_domain *domain,
>   unsigned long flags;
> 
>   spin_lock_irqsave(&device_domain_lock, flags);
> + intel_pasid_free_table(dev);
>   info = dev->archdata.iommu;
>   __dmar_remove_one_dev_info(info);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 24d0ea1..3abc94f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -34,14 +34,6 @@
> 
>  static irqreturn_t prq_event_thread(int irq, void *d);
> 
> -struct pasid_entry {
> - u64 val;
> -};
> -
> -struct pasid_state_entry {
> - u64 val;
> -};
> -
>  int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>  {
>   struct page *pages;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index bee7a3f..08e5811 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -382,8 +382,14 @@ enum {
>  #define VTD_FLAG_TRANS_PRE_ENABLED   (1 << 0)
>  #define VTD_FLAG_IRQ_REMAP_PRE_ENABLED   (1 << 1)
> 
> -struct pasid_entry;
> -struct pasid_state_entry;
> +struct pasid_entry {
> + u64 val;
> +};
> +
> +struct pasid_state_entry {
> + u64 val;
> +};
> +
>  struct page_req_dsc;
> 
>  struct dmar_domain {

Overall, this patch looks good to me, but please address the comment above.

Reviewed-by: Liu, Yi L 

Regards,
Yi Liu


RE: [PATCH 7/9] iommu/vt-d: Calculate PTS value

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> Calculate PTS (PASID Table Size) value for the extended
> context entry from the real size of the PASID table for
> a domain.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 

Looks good to me.
Reviewed-by: Liu, Yi L 
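
(A quick sanity check of the new formula, under the assumption that
domain->max_pasid is a power of two: with max_pasid = 2^20, the
find_first_bit() below returns 20, so pts = 15 and the extended context
entry encodes 2^(15+5) = 2^20 PASID table entries, which matches what the
old ecap_pss-based encoding gave for a 20-bit PASID.)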

Regards,
Yi Liu
> ---
>  drivers/iommu/intel-iommu.c | 22 --
>  1 file changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 99c643b..d4f9cea 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5146,22 +5146,16 @@ static void intel_iommu_put_resv_regions(struct device
> *dev,
> 
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  #define MAX_NR_PASID_BITS (20)
> -static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu)
> +static inline unsigned long intel_iommu_get_pts(struct dmar_domain *domain)
>  {
> - /*
> -  * Convert ecap_pss to extend context entry pts encoding, also
> -  * respect the soft pasid_max value set by the iommu.
> -  * - number of PASID bits = ecap_pss + 1
> -  * - number of PASID table entries = 2^(pts + 5)
> -  * Therefore, pts = ecap_pss - 4
> -  * e.g. KBL ecap_pss = 0x13, PASID has 20 bits, pts = 15
> -  */
> - if (ecap_pss(iommu->ecap) < 5)
> + int pts;
> +
> + pts = find_first_bit((unsigned long *)&domain->max_pasid,
> +  MAX_NR_PASID_BITS);
> + if (pts < 5)
>   return 0;
> 
> - /* pasid_max is encoded as actual number of entries not the bits */
> - return find_first_bit((unsigned long *)&iommu->pasid_max,
> - MAX_NR_PASID_BITS) - 5;
> + return pts - 5;
>  }
> 
>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev
> *sdev)
> @@ -5198,7 +5192,7 @@ int intel_iommu_enable_pasid(struct intel_iommu
> *iommu, struct intel_svm_dev *sd
>   if (iommu->pasid_state_table)
>   context[1].hi = (u64)virt_to_phys(iommu-
> >pasid_state_table);
>   context[1].lo = (u64)virt_to_phys(iommu->pasid_table) |
> - intel_iommu_get_pts(iommu);
> + intel_iommu_get_pts(domain);
> 
>   wmb();
>   /* CONTEXT_TT_MULTI_LEVEL and CONTEXT_TT_DEV_IOTLB are
> both
> --
> 2.7.4



RE: [PATCH 8/9] iommu/vt-d: Use per-domain pasid table

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> This patch replaces current per iommu pasid table with
> the new added per domain pasid table. Each svm-capable
> PCI device will have its own pasid table.

This is not accurate. The PASID table is per IOMMU domain. It may be
more accurate to say "Each SVM-capable PCI device will be configured
with a PASID table which is shared with the other SVM-capable devices
within its IOMMU domain".

You can include my Reviewed-by after refining the description.

Reviewed-by: Liu, Yi L 

Thanks,
Yi Liu
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c |  6 +++---
>  drivers/iommu/intel-svm.c   | 37 +
>  2 files changed, 28 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index d4f9cea..5fe7f91 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5191,7 +5191,7 @@ int intel_iommu_enable_pasid(struct intel_iommu
> *iommu, struct intel_svm_dev *sd
>   if (!(ctx_lo & CONTEXT_PASIDE)) {
>   if (iommu->pasid_state_table)
>   context[1].hi = (u64)virt_to_phys(iommu-
> >pasid_state_table);
> - context[1].lo = (u64)virt_to_phys(iommu->pasid_table) |
> + context[1].lo = (u64)virt_to_phys(domain->pasid_table) |
>   intel_iommu_get_pts(domain);
> 
>   wmb();
> @@ -5259,8 +5259,8 @@ struct intel_iommu *intel_svm_device_to_iommu(struct
> device *dev)
>   return NULL;
>   }
> 
> - if (!iommu->pasid_table) {
> - dev_err(dev, "PASID not enabled on IOMMU; cannot enable
> SVM\n");
> + if (!intel_pasid_get_table(dev)) {
> + dev_err(dev, "No PASID table for device; cannot enable SVM\n");
>   return NULL;
>   }
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 3abc94f..3b14819 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -256,6 +256,7 @@ static void intel_flush_pasid_dev(struct intel_svm *svm,
> struct intel_svm_dev *s
>  static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
>   struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
> + struct pasid_entry *pasid_table;
>   struct intel_svm_dev *sdev;
> 
>   /* This might end up being called from exit_mmap(), *before* the page
> @@ -270,11 +271,16 @@ static void intel_mm_release(struct mmu_notifier *mn,
> struct mm_struct *mm)
>* page) so that we end up taking a fault that the hardware really
>* *has* to handle gracefully without affecting other processes.
>*/
> - svm->iommu->pasid_table[svm->pasid].val = 0;
> - wmb();
> -
>   rcu_read_lock();
>   list_for_each_entry_rcu(sdev, &svm->devs, list) {
> + pasid_table = intel_pasid_get_table(sdev->dev);
> + if (!pasid_table)
> + continue;
> +
> + pasid_table[svm->pasid].val = 0;
> + /* Make sure the entry update is visible before translation. */
> + wmb();
> +
>   intel_flush_pasid_dev(svm, sdev, svm->pasid);
>   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
>   }
> @@ -295,6 +301,7 @@ static LIST_HEAD(global_svm_list);
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
> svm_dev_ops
> *ops)
>  {
>   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct pasid_entry *pasid_table;
>   struct intel_svm_dev *sdev;
>   struct intel_svm *svm = NULL;
>   struct mm_struct *mm = NULL;
> @@ -302,7 +309,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   int pasid_max;
>   int ret;
> 
> - if (WARN_ON(!iommu || !iommu->pasid_table))
> + pasid_table = intel_pasid_get_table(dev);
> + if (WARN_ON(!iommu || !pasid_table))
>   return -EINVAL;
> 
>   if (dev_is_pci(dev)) {
> @@ -380,8 +388,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
> flags, struct svm_dev_
>   }
>   svm->iommu = iommu;
> 
> - if (pasid_max > iommu->pasid_max)
> - pasid_max = iommu->pasid_max;
> + if (pasid_max > intel_pasid_max_id)
> + pasid_max = intel_pasid_max_id;
> 
>   /* Do not use PASID 0 in caching mode (virtu

RE: [PATCH 9/9] iommu/vt-d: Clean up PASID talbe management for SVM

2018-05-01 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, April 17, 2018 11:03 AM
> 
> The previous per iommu pasid table alloc/free interfaces
> are no longer used. Clean up the driver by removing it.

I think this patch mainly cleans up intel_svm_alloc_pasid_tables
and intel_svm_free_pasid_tables. Actually, only the PASID state
table allocation remains in these two functions.

Since the PASID table has been changed to be per IOMMU domain, how about
the PASID state table? Should it also be per IOMMU domain?

Thanks,
Yi Liu
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c |  6 +++---
>  drivers/iommu/intel-svm.c   | 17 ++---
>  include/linux/intel-iommu.h |  5 ++---
>  3 files changed, 7 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5fe7f91..5acb90d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1736,7 +1736,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   if (pasid_enabled(iommu)) {
>   if (ecap_prs(iommu->ecap))
>   intel_svm_finish_prq(iommu);
> - intel_svm_free_pasid_tables(iommu);
> + intel_svm_exit(iommu);
>   }
>  #endif
>  }
> @@ -3291,7 +3291,7 @@ static int __init init_dmars(void)
>   hw_pass_through = 0;
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>   if (pasid_enabled(iommu))
> - intel_svm_alloc_pasid_tables(iommu);
> + intel_svm_init(iommu);
>  #endif
>   }
> 
> @@ -4268,7 +4268,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
> 
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>   if (pasid_enabled(iommu))
> - intel_svm_alloc_pasid_tables(iommu);
> + intel_svm_init(iommu);
>  #endif
> 
>   if (dmaru->ignored) {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 3b14819..38cae65 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -34,7 +34,7 @@
> 
>  static irqreturn_t prq_event_thread(int irq, void *d);
> 
> -int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
> +int intel_svm_init(struct intel_iommu *iommu)
>  {
>   struct page *pages;
>   int order;
> @@ -59,15 +59,6 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>   iommu->pasid_max = 0x2;
> 
>   order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
> - pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> - if (!pages) {
> - pr_warn("IOMMU: %s: Failed to allocate PASID table\n",
> - iommu->name);
> - return -ENOMEM;
> - }
> - iommu->pasid_table = page_address(pages);
> - pr_info("%s: Allocated order %d PASID table.\n", iommu->name, order);
> -
>   if (ecap_dis(iommu->ecap)) {
>   /* Just making it explicit... */
>   BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct
> pasid_state_entry));
> @@ -82,14 +73,10 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu
> *iommu)
>   return 0;
>  }
> 
> -int intel_svm_free_pasid_tables(struct intel_iommu *iommu)
> +int intel_svm_exit(struct intel_iommu *iommu)
>  {
>   int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
> 
> - if (iommu->pasid_table) {
> - free_pages((unsigned long)iommu->pasid_table, order);
> - iommu->pasid_table = NULL;
> - }
>   if (iommu->pasid_state_table) {
>   free_pages((unsigned long)iommu->pasid_state_table, order);
>   iommu->pasid_state_table = NULL;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 08e5811..44c7613 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -470,7 +470,6 @@ struct intel_iommu {
>* devices away to userspace processes (e.g. for DPDK) and don't
>* want to trust that userspace will use *only* the PASID it was
>* told to. But while it's all driver-arbitrated, we're fine. */
> - struct pasid_entry *pasid_table;
>   struct pasid_state_entry *pasid_state_table;
>   struct page_req_dsc *prq;
>   unsigned char prq_name[16];/* Name for PRQ interrupt */
> @@ -539,8 +538,8 @@ void free_pgtable_page(void *vaddr);
>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
> 
>  #ifdef CONFIG_INTEL_IOMMU_SVM
> -extern int intel_svm_alloc_pasid_tables(struct intel

RE: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device

2019-05-23 Thread Liu, Yi L
Hi Alex,

Sorry to disturb you. Do you want to review this version or a rebased
version? :-) If a rebased version is better, I can do it asap.

Thanks,
Yi Liu

> -Original Message-
> From: Liu, Yi L
> Sent: Tuesday, April 23, 2019 8:15 PM
> To: alex.william...@redhat.com; kwankh...@nvidia.com
> Cc: Tian, Kevin ; baolu...@linux.intel.com; Liu, Yi L
> ; Sun, Yi Y ; j...@8bytes.org; jean-
> philippe.bruc...@arm.com; pet...@redhat.com; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; yamada.masah...@socionext.com; iommu@lists.linux-
> foundation.org
> Subject: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device
> 
> This patchset aims to add a vfio-pci-like meta driver as a demo user of the 
> vfio
> changes introduced in "vfio/mdev: IOMMU aware mediated device" patchset from
> Baolu Lu.
> 
> Previous RFC v1 gave two proposals and the discussion can be found in the
> following link. Per the comments, this patchset adds a separate driver named
> vfio-mdev-pci. It is a sample driver, but it is located in drivers/vfio/pci
> due to code sharing considerations.
> The corresponding Kconfig definition is in samples/Kconfig.
> 
> https://lkml.org/lkml/2019/3/4/529
> 
> Besides the test purpose, per Alex's comments, it could also be a good base 
> driver
> for experimenting with device specific mdev migration.
> 
> Specific interface tested in this proposal:
> 
> *) int mdev_set_iommu_device(struct device *dev,
>   struct device *iommu_device)
>introduced in the patch as below:
>"[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device"
> 
> 
> Links:
> *) Link of "vfio/mdev: IOMMU aware mediated device"
>   https://lwn.net/Articles/780522/
> 
> Please feel free give your comments.
> 
> Thanks,
> Yi Liu
> 
> Change log:
>   v2->v3:
>   - use vfio-mdev-pci instead of vfio-pci-mdev
>   - place the new driver under drivers/vfio/pci while define
> Kconfig in samples/Kconfig to clarify it is a sample driver
> 
>   v1->v2:
>   - instead of adding kernel option to existing vfio-pci
> module in v1, v2 follows Alex's suggestion to add a
> separate vfio-pci-mdev module.
>   - new patchset subject: "vfio/pci: wrap pci device as a mediated device"
> 
> Liu, Yi L (3):
>   vfio_pci: split vfio_pci.c into two source files
>   vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op
>   smaples: add vfio-mdev-pci driver
> 
>  drivers/vfio/pci/Makefile   |7 +-
>  drivers/vfio/pci/common.c   | 1511 
> +++
>  drivers/vfio/pci/vfio_mdev_pci.c|  386 +
>  drivers/vfio/pci/vfio_pci.c | 1476 +-
>  drivers/vfio/pci/vfio_pci_config.c  |9 +
>  drivers/vfio/pci/vfio_pci_private.h |   27 +
>  samples/Kconfig |   11 +
>  7 files changed, 1962 insertions(+), 1465 deletions(-)  create mode 100644
> drivers/vfio/pci/common.c  create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c
> 
> --
> 2.7.4



RE: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device

2019-06-09 Thread Liu, Yi L
> From Alex Williamson
> Sent: Thursday, May 23, 2019 9:03 PM
> To: Liu, Yi L 
> Cc: kwankh...@nvidia.com; Tian, Kevin ;
> baolu...@linux.intel.com; Sun, Yi Y ; j...@8bytes.org; 
> jean-
> philippe.bruc...@arm.com; pet...@redhat.com; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; yamada.masah...@socionext.com; iommu@lists.linux-
> foundation.org
> Subject: Re: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device
> 
> On Thu, 23 May 2019 08:44:57 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > Sorry to disturb you. Do you want to review on this version or review a 
> > rebased
> version? :-) If rebase version is better, I can try to do it asap.
> 
> Hi Yi,
> 
> Perhaps you missed my comments on 1/3:
> 
> https://www.spinics.net/lists/kvm/msg187282.html
> 
> In summary, it looks pretty good, but consider a file name more consistent 
> with the
> existing files and prune out the code changes from the code moves so they can 
> be
> reviewed more easily.  Thanks,

Thanks for the reminder, Alex. Sorry that my changes were posted in a
disordered way. I've made the changes accordingly; please refer to my
latest post just now :-)

Regards,
Yi Liu


RE: [PATCH v8 23/29] vfio: VFIO_IOMMU_CACHE_INVALIDATE

2019-06-14 Thread Liu, Yi L
Hi Eric,

> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: Monday, May 27, 2019 12:10 AM
> Subject: [PATCH v8 23/29] vfio: VFIO_IOMMU_CACHE_INVALIDATE
> 
> From: "Liu, Yi L" 
> 
> When the guest "owns" the stage 1 translation structures,  the host IOMMU 
> driver
> has no knowledge of caching structure updates unless the guest invalidation
> requests are trapped and passed down to the host.
> 
> This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl with aims at
> propagating guest stage1 IOMMU cache invalidations to the host.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> v6 -> v7:
> - Use iommu_capsule struct
> - renamed vfio_iommu_for_each_dev into vfio_iommu_lookup_dev
>   due to checkpatch error related to for_each_dev suffix
> 
> v2 -> v3:
> - introduce vfio_iommu_for_each_dev back in this patch
> 
> v1 -> v2:
> - s/TLB/CACHE
> - remove vfio_iommu_task usage
> - commit message rewording
> ---
>  drivers/vfio/vfio_iommu_type1.c | 55 +
>  include/uapi/linux/vfio.h   | 13 
>  2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b2d609d6fe83..6fda4fbc9bfa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -120,6 +120,34 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
> 
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int
> +vfio_iommu_lookup_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)
> +{
> + struct domain_capsule dc = {.data = data};
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + dc.domain = d->domain;
> + list_for_each_entry(g, &d->group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group,
> +&dc, fn);
> + if (ret)
> + break;
> + }
> + }
> + return ret;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
> 
>  /*
> @@ -1795,6 +1823,15 @@ vfio_attach_pasid_table(struct vfio_iommu *iommu,
>   return ret;
>  }
> 
> +static int vfio_cache_inv_fn(struct device *dev, void *data) {
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + struct vfio_iommu_type1_cache_invalidate *ustruct =
> + (struct vfio_iommu_type1_cache_invalidate *)dc->data;
> +
> + return iommu_cache_invalidate(dc->domain, dev, &ustruct->info); }
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)  { @@ -
> 1881,6 +1918,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
>   vfio_detach_pasid_table(iommu);
>   return 0;
> + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
> + struct vfio_iommu_type1_cache_invalidate ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
> + info);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)

Maybe the flags field can be removed?

> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_lookup_dev(iommu, vfio_cache_inv_fn,
> + &ustruct);
> + mutex_unlock(&iommu->lock);
> + return ret;
>   }
> 
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index
> 4316dd8cb5b5..055aa9b9745a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -785,6 +785,19 @@ struct vfio_iommu_type1_attach_pasid_table {
>   */
>  #define VFIO_IOMMU_DETACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 23)
> 
> +/**
> + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
> + *   struct vfio_iommu_type1_cache_invalidate)

[RFC v1 0/4] vfio: support Shared Virtual Addressing

2019-07-06 Thread Liu, Yi L
Shared virtual address (SVA), a.k.a. shared virtual memory (SVM), on Intel
platforms allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to expose the SVA capability to VMs, i.e. to share
guest application address spaces with passthrough devices. The whole SVA
virtualization requires QEMU/VFIO/IOMMU changes. This series includes the
VFIO changes; the QEMU and IOMMU changes are in separate series (listed
under "Related series").

The high-level architecture for SVA virtualization is as below:

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

There are roughly three parts:
1. vfio support for PASID allocation and free from VMs
2. vfio support for guest PASID binding from VMs
3. vfio support for IOMMU cache invalidation from VMs
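
For a concrete picture of the uapi shape, below is a minimal user-space
sketch of the PASID table attach added in patch 1 (a sketch only: it
assumes the VFIO container fd is already set up, and that struct
iommu_pasid_table_config is filled from the guest vIOMMU state per the
IOMMU uapi series, whose layout is not repeated here):

    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* cfg: guest PASID table location/format, prepared by the VMM */
    static int attach_guest_pasid_table(int container_fd,
                                        struct iommu_pasid_table_config *cfg)
    {
            struct vfio_iommu_type1_attach_pasid_table attach = {
                    .argsz  = sizeof(attach),
                    .flags  = 0,
                    .config = *cfg,
            };

            return ioctl(container_fd, VFIO_IOMMU_ATTACH_PASID_TABLE, &attach);
    }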

Related series:
[1] [PATCH v4 00/22]  Shared virtual address IOMMU and VT-d support:
https://lwn.net/Articles/790820/


[2] [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM
from Yi Liu

This work is based on collaboration with other developers on the IOMMU
mailing list. Notably,

[1] [RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support
Shared Virtual Memory from Yi Liu
https://www.spinics.net/lists/kvm/msg148798.html

[2] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d from Yi Liu
https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021475.html

[3] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA by Yi
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html

[4] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
https://lkml.org/lkml/2019/3/17/124

[5] [RFC v4 00/27] vSMMUv3/pSMMUv3 2 stage VFIO integration by Eric Auger
https://lists.sr.ht/~philmd/qemu/%3C20190527114203.2762-1-eric.auger%40redhat.com%3E

[6] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe
Brucker
https://www.spinics.net/lists/iommu/msg30639.html

Liu Yi L (4):
  vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE
  vfio: VFIO_IOMMU_CACHE_INVALIDATE
  vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)
  vfio/type1: bind guest pasid (guest page tables) to host

 drivers/vfio/vfio_iommu_type1.c | 384 
 include/uapi/linux/vfio.h   | 116 
 2 files changed, 500 insertions(+)

-- 
2.7.4



[RFC v1 1/4] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE

2019-07-06 Thread Liu, Yi L
From: Liu Yi L 

This patch adds the VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctls,
which pass/withdraw the virtual IOMMU guest configuration through
the VFIO driver down to the IOMMU subsystem.

Cc: Kevin Tian 
Signed-off-by: Jacob Pan 
Signed-off-by: Liu Yi L 
Signed-off-by: Eric Auger 
---
 drivers/vfio/vfio_iommu_type1.c | 53 +
 include/uapi/linux/vfio.h   | 22 +
 2 files changed, 75 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3ddc375..b2d609d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1758,6 +1758,43 @@ static int vfio_domains_have_iommu_cache(struct 
vfio_iommu *iommu)
return ret;
 }
 
+static void
+vfio_detach_pasid_table(struct vfio_iommu *iommu)
+{
+   struct vfio_domain *d;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   iommu_detach_pasid_table(d->domain);
+   }
+   mutex_unlock(&iommu->lock);
+}
+
+static int
+vfio_attach_pasid_table(struct vfio_iommu *iommu,
+   struct vfio_iommu_type1_attach_pasid_table *ustruct)
+{
+   struct vfio_domain *d;
+   int ret = 0;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
+   if (ret)
+   goto unwind;
+   }
+   goto unlock;
+unwind:
+   list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
+   iommu_detach_pasid_table(d->domain);
+   }
+unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -1828,6 +1865,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
return copy_to_user((void __user *)arg, &unmap, minsz) ?
-EFAULT : 0;
+   } else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
+   struct vfio_iommu_type1_attach_pasid_table ustruct;
+
+   minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
+   config);
+
+   if (copy_from_user(&ustruct, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (ustruct.argsz < minsz || ustruct.flags)
+   return -EINVAL;
+
+   return vfio_attach_pasid_table(iommu, &ustruct);
+   } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
+   vfio_detach_pasid_table(iommu);
+   return 0;
}
 
return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8f10748..4316dd8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #define VFIO_API_VERSION   0
 
@@ -763,6 +764,27 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
+ * struct vfio_iommu_type1_attach_pasid_table)
+ *
+ * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
+ * while a table is already installed is allowed: it replaces the old
+ * table. DETACH does a comprehensive tear down of the nested mode.
+ */
+struct vfio_iommu_type1_attach_pasid_table {
+   __u32   argsz;
+   __u32   flags;
+   struct iommu_pasid_table_config config;
+};
+#define VFIO_IOMMU_ATTACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/**
+ * VFIO_IOMMU_DETACH_PASID_TABLE - - _IOWR(VFIO_TYPE, VFIO_BASE + 23)
+ * Detaches the PASID table
+ */
+#define VFIO_IOMMU_DETACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.4



[RFC v1 4/4] vfio/type1: bind guest pasid (guest page tables) to host

2019-07-06 Thread Liu, Yi L
From: Liu Yi L 

This patch adds VFIO support for binding guest translation structures
to the host IOMMU. VFIO exposes IOMMU programming capability to user
space; under the KVM solution the guest is a user-space application on
the host. For SVA usage in a virtual machine, the guest owns the
GVA->GPA translation structures, and these need to be passed down to
the host to enable nested (two-stage) translation. This patch reuses
the VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker and adds a new
bind type for binding guest-owned translation structures to the host.

*) Add two new ioctls for VFIO containers.

  - VFIO_IOMMU_BIND: handles bind requests from userspace; it can
       either bind a process to a PASID or bind a guest PASID to a
       device, as indicated by the bind type
  - VFIO_IOMMU_UNBIND: handles unbind requests from userspace; it can
       either unbind a process from a PASID or unbind a guest PASID
       from a device, also indicated by the bind type
  - Bind types:
       VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process
       to a device
       VFIO_IOMMU_BIND_GUEST_PASID: bind a guest-owned translation
       structure (e.g. a guest page table) to the host IOMMU

*) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND

Cc: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/vfio/vfio_iommu_type1.c | 151 
 include/uapi/linux/vfio.h   |  56 +++
 2 files changed, 207 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d5e0c01..57826ed 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1920,6 +1920,119 @@ static int vfio_iommu_type1_pasid_free(struct 
vfio_iommu *iommu, int pasid)
return ret;
 }
 
+static int vfio_bind_gpasid_fn(struct device *dev, void *data)
+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct vfio_iommu_type1_bind_guest_pasid *guest_bind = data;
+
+   return iommu_sva_bind_gpasid(domain, dev, &guest_bind->bind_data);
+}
+
+static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct vfio_iommu_type1_bind_guest_pasid *guest_bind = data;
+
+   return iommu_sva_unbind_gpasid(domain, dev,
+   guest_bind->bind_data.hpasid);
+}
+
+/*
+ * unbind specific gpasid, caller of this function requires hold
+ * vfio_iommu->lock
+ */
+static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_bind_guest_pasid *guest_bind)
+{
+   struct vfio_domain *domain;
+   struct vfio_group *group;
+   int ret = 0;
+
+   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   list_for_each_entry(domain, &iommu->domain_list, next) {
+   list_for_each_entry(group, &domain->group_list, next) {
+   ret = iommu_group_for_each_dev(group->iommu_group,
+  guest_bind, vfio_unbind_gpasid_fn);
+   if (ret)
+   goto out;
+   }
+   }
+
+   return 0;
+
+out:
+   return ret;
+}
+
+static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
+   void __user *arg,
+   struct vfio_iommu_type1_bind *bind)
+{
+   struct vfio_iommu_type1_bind_guest_pasid guest_bind;
+   struct vfio_domain *domain;
+   struct vfio_group *group;
+   unsigned long minsz;
+   int ret = 0;
+
+   minsz = sizeof(*bind) + sizeof(guest_bind);
+   if (bind->argsz < minsz)
+   return -EINVAL;
+
+   if (copy_from_user(&guest_bind, arg, sizeof(guest_bind)))
+   return -EFAULT;
+
+   mutex_lock(&iommu->lock);
+   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   list_for_each_entry(domain, &iommu->domain_list, next) {
+   list_for_each_entry(group, &domain->group_list, next) {
+   ret = iommu_group_for_each_dev(group->iommu_group,
+  &guest_bind, vfio_bind_gpasid_fn);
+   if (ret)
+   goto out_unbind;
+   }
+   }
+
+   mutex_unlock(&iommu->lock);
+   return 0;
+
+out_unbind:
+   /* Undo all binds that already succeeded */
+   vfio_iommu_type1_do_guest_unbind(iommu, &guest_bind);
+
+out_unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static long vfio_iommu_type1_unbind_gpas
