RE: Device specific pass through in host systems - discuss user interface

2019-06-09 Thread Prakhya, Sai Praneeth
> > I am working on an IOMMU driver feature that allows a user to specify
> > if the DMA from a device should be translated by IOMMU or not.
> > Presently, we support only an all-or-nothing mode, i.e. if the user
> > specifies "iommu=pt" [X86] or "iommu.passthrough" [ARM64] on the
> > kernel command line, all the devices are put in pass-through mode and
> > we don't have per-device granularity, but we were asked by a customer
> > to selectively put devices in pass-through mode, not all of them.
> 
> Most iommu vendor drivers have switched from per-device to per-group domain
> (a.k.a. default domain). So per-group pass-through mode makes more sense?
> 
> By the way, can we extend this to "per-group default domain type", instead of
> only "per-group pass-through mode"? Currently we have system level default
> domain type, if we have finer granularity of default domain type, both iommu
> drivers and end users will benefit from it.

Sure! Makes sense; per-group default domain type sounds good.

> > I am looking for a consensus on **what the kernel command line argument
> > should look like and the path for the sysfs entry**. Also, please note that
> > if a device is put in pass-through mode it won't be available to the
> > guest, and that's OK.
> 
> Just out of curiosity, what's the limitation that prevents a device using a
> pass-through DMA domain from being assignable?

Sorry! I don't know about assignable devices. Probably Ashok or Jacob could 
answer this question.

Regards,
Sai


Re: Device specific pass through in host systems - discuss user interface

2019-06-09 Thread Lu, Baolu

Hi Sai,


On 6/7/19 10:24 AM, Prakhya, Sai Praneeth wrote:

Hi All,

I am working on an IOMMU driver feature that allows a user to specify if 
the DMA from a device should be translated by IOMMU or not. Presently, 
we support only an all-or-nothing mode, i.e. if the user specifies 
“iommu=pt” [X86] or “iommu.passthrough” [ARM64] on the kernel command 
line, all the devices are put in pass-through mode and we don’t have 
per-device granularity, but we were asked by a customer to selectively 
put devices in pass-through mode, not all of them.


Most iommu vendor drivers have switched from per-device to per-group
domain (a.k.a. default domain). So per-group pass-through mode makes
more sense?

By the way, can we extend this to "per-group default domain type",
instead of only "per-group pass-through mode"? Currently we have system
level default domain type, if we have finer granularity of default
domain type, both iommu drivers and end users will benefit from it.



Since this feature could be generic across architectures, we thought it 
would be better if the user interface were discussed in the community 
first. We are envisioning this to be used both at boot time and at 
runtime, hence a kernel command line argument along with a sysfs entry 
are needed. So, please pour in your suggestions on what the user 
interface should look like to make it architecture agnostic.


1. Have a kernel command line argument that takes a list of BDFs as 
input and puts them in pass-through mode


a. Accepting a BDF as input has a downside – BDFs are dynamic and could 
change if the BIOS/OS enumerates a new device on the next reboot


b. Accepting a <vendor ID, device ID> pair as input has a downside – what 
to do when there are multiple such devices and the user would like to put 
only some of them in PT mode?


2. Have a sysfs file which takes 1 or 0 as input to enable/disable 
pass-through mode. Some paths that seem reasonable are:


a. /sys/class/iommu/dmar0/devices/

b. /sys/kernel/iommu_groups/<group>/devices

I am looking for a consensus on **what the kernel command line argument 
should look like and the path for the sysfs entry**. Also, please note that 
if a device is put in pass-through mode it won’t be available to the guest, 
and that’s OK.
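
For concreteness, here is a purely hypothetical sketch of what option 1
could look like as a boot parameter handler. Nothing below is from an
actual patch; the parameter name "iommu.pt_devs", the buffer size and the
comma-separated syntax are made up, and matching the parsed BDFs against
enumerated devices is left out:

#include <linux/init.h>
#include <linux/string.h>

/* e.g. iommu.pt_devs=0000:00:02.0,0000:00:1f.3 (hypothetical syntax) */
static char iommu_pt_devs[256] __initdata;

static int __init iommu_set_pt_devs(char *str)
{
        if (!str)
                return -EINVAL;
        strscpy(iommu_pt_devs, str, sizeof(iommu_pt_devs));
        return 0;
}
early_param("iommu.pt_devs", iommu_set_pt_devs);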


Just out of curiosity, what's the limitation that prevents a device using a
pass-through DMA domain from being assignable?

Best regards,
Baolu



Regards,

Sai

PS: Idea credits: Ashok Raj




Re: [PATCH v4 01/14] dt-bindings: Add binding for MT2712 MIPI-CSI2

2019-06-09 Thread Tomasz Figa
Hi CK, Stu,

On Mon, Jun 10, 2019 at 11:34 AM CK Hu  wrote:
>
> Hi, Stu:
>
> "mediatek,mt2712-mipicsi" and "mediatek,mt2712-mipicsi-common" have many
> common part with "mediatek,mt8183-seninf", and I've a discussion in [1],
> so I would like these two to be merged together.
>
> [1] https://patchwork.kernel.org/patch/10979131/
>

Thanks CK for spotting this.

I also noticed that the driver in fact handles two hardware blocks at
the same time - SenInf and CamSV. Unless the architecture is very
different from MT8183, I'd suggest splitting it.

On a general note, the MT8183 SenInf driver has received several
rounds of review comments already, but I couldn't find any comments
posted for this one.

Given the two aspects above and also based on my quick look at code
added by this series, I'd recommend adding MT2712 support on top of
the MT8183 series.

Best regards,
Tomasz


[PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API

2019-06-09 Thread Jacob Pan
In a virtualization use case, when a guest is assigned
a PCI host device protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages, it makes sense to program guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns stage/level 2.

In that case, the host must trap guest configuration
settings and pass them to the physical iommu driver.

This patch adds a new API to the iommu subsystem that allows
setting and unsetting the pasid table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.
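
As an illustration only (not part of this patch), a consumer such as VFIO
might use the new API roughly as follows, with the config assumed to have
been copied from user space and validated beforehand:

#include <linux/iommu.h>

static int example_nested_setup(struct iommu_domain *domain,
                                struct iommu_pasid_table_config *cfg)
{
        int ret;

        ret = iommu_attach_pasid_table(domain, cfg);
        if (ret)        /* -ENODEV if the driver lacks the op */
                return ret;

        /* ... guest runs with its own PASID table ... */

        iommu_detach_pasid_table(domain);
        return 0;
}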

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
Signed-off-by: Jacob Pan 
Signed-off-by: Eric Auger 
Reviewed-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c  | 19 +
 include/linux/iommu.h  | 18 
 include/uapi/linux/iommu.h | 52 ++
 3 files changed, 89 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 166adb8..4496ccd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1619,6 +1619,25 @@ int iommu_page_response(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_page_response);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+struct iommu_pasid_table_config *cfg)
+{
+   if (unlikely(!domain->ops->attach_pasid_table))
+   return -ENODEV;
+
+   return domain->ops->attach_pasid_table(domain, cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+   if (unlikely(!domain->ops->detach_pasid_table))
+   return;
+
+   domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 950347b..d3edb10 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -264,6 +264,8 @@ struct page_response_msg {
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -323,6 +325,9 @@ struct iommu_ops {
  void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
int (*sva_get_pasid)(struct iommu_sva *handle);
+   int (*attach_pasid_table)(struct iommu_domain *domain,
+ struct iommu_pasid_table_config *cfg);
+   void (*detach_pasid_table)(struct iommu_domain *domain);
 
int (*page_response)(struct device *dev, struct page_response_msg *msg);
 
@@ -434,6 +439,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+   struct iommu_pasid_table_config *cfg);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -947,6 +955,13 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+struct iommu_pasid_table_config *cfg)
+{
+   return -ENODEV;
+}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
@@ -968,6 +983,9 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
return IOMMU_PASID_INVALID;
 }
 
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index aaa3b6a..3976767 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -115,4 +115,56 @@ struct iommu_fault {
struct iommu_fault_page_request prm;
};
 };
+
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ * information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ * or 2-level table)
+ * @s

[PATCH v4 12/22] iommu: Add I/O ASID allocator

2019-06-09 Thread Jacob Pan
From: Jean-Philippe Brucker 

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows sharing process address spaces with devices (SVA), partitioning a
device into VM-assignable entities (VFIO mdev) or simply providing
multiple DMA address spaces to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O ASID
to avoid confusion with ASIDs allocated by arch code, which are usually
a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated with a single PASID.

The allocator is primarily used by the IOMMU subsystem, but on rare
occasions drivers may want to allocate PASIDs for devices that aren't
managed by an IOMMU, using the same ID space as the IOMMU.

There are two types of allocators:
1. default allocator - Always available, uses an XArray to track
2. custom allocators - Can be registered at runtime, take precedence
over the default allocator.

Custom allocators have these attributes:
- provides platform specific alloc()/free() functions with private data.
- lookup of allocation results is not provided by the allocator; lookup
  requests must be handled by the IOASID framework via its own XArray.
- allocators can be unregistered at runtime, falling back either to the next
  custom allocator or to the default allocator.
- custom allocators can share the same set of alloc()/free() helpers; in
  this case they also share the same IOASID space, thus the same XArray.
- switching between allocators requires all outstanding IOASIDs to be
  freed unless the two allocators share the same alloc()/free() helpers.
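
To make the above concrete, a minimal usage sketch against the default
allocator (illustrative only; it assumes the interfaces introduced in this
patch and passes NULL for the set, as later patches in this series do):

#include <linux/bug.h>
#include <linux/ioasid.h>

static void ioasid_usage_example(void *priv)
{
        /* 20-bit PASID range, skipping PASID 0 */
        ioasid_t id = ioasid_alloc(NULL, 1, (1 << 20) - 1, priv);

        if (id == INVALID_IOASID)
                return;

        /* The framework tracks the private data in its own XArray */
        WARN_ON(ioasid_find(NULL, id, NULL) != priv);

        ioasid_free(id);
}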

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/Kconfig  |   8 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 427 +
 include/linux/ioasid.h |  74 +
 4 files changed, 510 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db..c40c4b5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,13 @@
 config IOMMU_IOVA
tristate
 
+# The IOASID allocator may also be used by non-IOMMU_API users
+config IOASID
+   tristate
+   help
+ Enable the I/O Address Space ID allocator. A single ID space shared
+ between different users.
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
bool
@@ -207,6 +214,7 @@ config INTEL_IOMMU_SVM
depends on INTEL_IOMMU && X86
select PCI_PASID
select MMU_NOTIFIER
+   select IOASID
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15..0efac6f 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 000..0919b70
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct ioasid_data {
+   ioasid_t id;
+   struct ioasid_set *set;
+   void *private;
+   struct rcu_head rcu;
+};
+
+/*
+ * struct ioasid_allocator_data - Internal data structure to hold information
+ * about an allocator. There are two types of allocators:
+ *
+ * - The default allocator always has its own XArray to track the IOASIDs
+ *   allocated.
+ * - Custom allocators may share allocation helpers with different private
+ *   data. Custom allocators that share the same helper functions also share
+ *   the same XArray.
+ * Rules:
+ * 1. The default allocator is always available, not dynamically registered.
+ *    This is to prevent race conditions with early boot code that wants to
+ *    register custom allocators or allocate IOASIDs.
+ * 2. Custom allocators take precedence over the default allocator.
+ * 3. When all custom allocators sharing the same helper functions are
+ *unregistered (e.g. due 

[PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID

2019-06-09 Thread Jacob Pan
When the VT-d driver runs in the guest, PASID allocation must be
performed via the virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray-based allocator. The resulting IOASID allocations will always
come from the host. This ensures that the PASID namespace is
system-wide.

Signed-off-by: Lu Baolu 
Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/Kconfig   |  1 +
 drivers/iommu/intel-iommu.c | 60 +
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 63 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c40c4b5..5d1bc2a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -213,6 +213,7 @@ config INTEL_IOMMU_SVM
bool "Support for Shared Virtual Memory with Intel IOMMU"
depends on INTEL_IOMMU && X86
select PCI_PASID
+   select IOASID
select MMU_NOTIFIER
select IOASID
help
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 09b8ff0..5b84994 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
}
+   ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -4820,6 +4822,46 @@ static int __init platform_optin_force_iommu(void)
return 1;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+   struct intel_iommu *iommu = data;
+   ioasid_t ioasid;
+
+   /*
+* VT-d virtual command interface always uses the full 20 bit
+* PASID range. Host can partition guest PASID range based on
+* policies but it is out of guest's control.
+*/
+   if (min < PASID_MIN || max > PASID_MAX)
+   return INVALID_IOASID;
+
+   if (vcmd_alloc_pasid(iommu, &ioasid))
+   return INVALID_IOASID;
+
+   return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+   struct iommu_pasid_alloc_info *svm;
+   struct intel_iommu *iommu = data;
+
+   if (!iommu)
+   return;
+   /*
+* Sanity checking of the ioasid owner is done at the upper layer,
+* e.g. VFIO. We can only free the PASID when all the devices are
+* unbound.
+*/
+   svm = ioasid_find(NULL, ioasid, NULL);
+   if (!svm) {
+   pr_warn("Freeing unbound IOASID %d\n", ioasid);
+   return;
+   }
+   vcmd_free_pasid(iommu, ioasid);
+}
+#endif
+
 int __init intel_iommu_init(void)
 {
int ret = -ENODEV;
@@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
   "%s", iommu->name);
iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
iommu_device_register(&iommu->iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
+   /*
+* Register a custom ASID allocator if we are running
+* in a guest; the purpose is to have a system-wide PASID
+* namespace among all PASID users.
+* There can be multiple vIOMMUs in each guest but only
+* one allocator is active. All vIOMMU allocators will
+* eventually be calling the same host allocator.
+*/
+   iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+   iommu->pasid_allocator.free = intel_ioasid_free;
+   iommu->pasid_allocator.pdata = (void *)iommu;
+   ret = ioasid_register_allocator(&iommu->pasid_allocator);
+   if (ret)
+   pr_warn("Custom PASID allocator registration failed\n");
+   }
+#endif
}
 
bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index bff907b..8605c74 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -549,6 +550,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
struct page_req_dsc *prq;
unsigned char prq_name[16];/* Name for PRQ interrupt */
+   struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
struct q_inval  *qi;/* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
-- 
2.7.4



[PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

2019-06-09 Thread Jacob Pan
Make use of the generic IOASID code to manage PASID allocation,
freeing, and lookup. Replace the Intel-specific code.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 11 +--
 drivers/iommu/intel-pasid.c | 36 
 drivers/iommu/intel-svm.c   | 37 +
 3 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5b84994..39b63fe 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
domain->auxd_refcnt--;
 
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   intel_pasid_free_id(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
if (domain->default_pasid <= 0) {
int pasid;
 
-   pasid = intel_pasid_alloc_id(domain, PASID_MIN,
-pci_max_pasids(to_pci_dev(dev)),
-GFP_KERNEL);
-   if (pasid <= 0) {
+   pasid = ioasid_alloc(NULL, PASID_MIN,
+    pci_max_pasids(to_pci_dev(dev)) - 1, domain);
+   if (pasid == INVALID_IOASID) {
pr_err("Can't allocate default pasid\n");
return -ENODEV;
}
@@ -5224,7 +5223,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
spin_unlock(&iommu->lock);
spin_unlock_irqrestore(&device_domain_lock, flags);
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   intel_pasid_free_id(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 
return ret;
 }
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 69fddd3..1e25539 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -26,42 +26,6 @@
  */
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
-static DEFINE_IDR(pasid_idr);
-
-int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
-{
-   int ret, min, max;
-
-   min = max_t(int, start, PASID_MIN);
-   max = min_t(int, end, intel_pasid_max_id);
-
-   WARN_ON(in_interrupt());
-   idr_preload(gfp);
-   spin_lock(&pasid_lock);
-   ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
-   spin_unlock(&pasid_lock);
-   idr_preload_end();
-
-   return ret;
-}
-
-void intel_pasid_free_id(int pasid)
-{
-   spin_lock(&pasid_lock);
-   idr_remove(&pasid_idr, pasid);
-   spin_unlock(&pasid_lock);
-}
-
-void *intel_pasid_lookup_id(int pasid)
-{
-   void *p;
-
-   spin_lock(&pasid_lock);
-   p = idr_find(&pasid_idr, pasid);
-   spin_unlock(&pasid_lock);
-
-   return p;
-}
 
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
 {
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8f87304..9cbcc1f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "intel-pasid.h"
@@ -332,16 +333,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
 
-   /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
-   ret = intel_pasid_alloc_id(svm,
-  !!cap_caching_mode(iommu->cap),
-  pasid_max - 1, GFP_KERNEL);
-   if (ret < 0) {
+   /* Do not use PASID 0, reserved for RID to PASID */
+   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
+   pasid_max - 1, svm);
+   if (svm->pasid == INVALID_IOASID) {
kfree(svm);
kfree(sdev);
+   ret = -ENOSPC;
goto out;
}
-   svm->pasid = ret;
svm->notifier.ops = &intel_mmuops;
svm->mm = mm;
svm->flags = flags;
@@ -351,7 +351,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
if (mm) {
ret = mmu_notifier_register(&svm->notifier, mm);
if (ret) {
-   intel_pasid_free_id(svm->pasid);
+   ioasid_free(svm->pasid);
kfree(svm);
kfree(sdev);
goto out;
@@ -367,7 +367,7 @@ int intel_svm_bind_mm(struct

[PATCH v4 07/22] iommu: Use device fault trace event

2019-06-09 Thread Jacob Pan
For performance and debugging purposes, these trace events help
analyze device faults that interact with the IOMMU subsystem.
E.g.
IOMMU::00:0a.0 type=2 reason=0 addr=0x007ff000 pasid=1
group=1 last=0 prot=1

Signed-off-by: Jacob Pan 
[JPB: removed invalidate event, that will be added later]
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 64e87d5..166adb8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1032,6 +1032,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
}
 
ret = fparam->handler(evt, fparam->data);
+   trace_dev_fault(dev, &evt->fault);
 done_unlock:
mutex_unlock(¶m->lock);
return ret;
@@ -1604,6 +1605,7 @@ int iommu_page_response(struct device *dev,
if (evt->fault.prm.pasid == msg->pasid &&
evt->fault.prm.grpid == msg->grpid) {
msg->iommu_data = evt->iommu_private;
+   trace_dev_page_response(dev, msg);
ret = domain->ops->page_response(dev, msg);
list_del(&evt->list);
kfree(evt);
-- 
2.7.4



[PATCH v4 18/22] iommu/vt-d: Add nested translation helper function

2019-06-09 Thread Jacob Pan
Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
With the PASID granular translation type set to 0x11b, the translation
result from the first level (FL) is also subject to a second level (SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.
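
An illustrative call site (not in this patch) for the helper below; the
guest page directory pointer and address width are assumed to come from
the trapped bind data:

#include "intel-pasid.h"

static int example_program_nesting(struct intel_iommu *iommu,
                                   struct device *dev,
                                   struct dmar_domain *domain,
                                   u64 gpgd, int pasid, int addr_width)
{
        /* FL root is a guest physical address; SL comes from the domain */
        return intel_pasid_setup_nested(iommu, dev,
                                        (pgd_t *)(uintptr_t)gpgd, pasid,
                                        PASID_FLAG_NESTED, domain,
                                        addr_width);
}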

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-pasid.c | 93 +
 drivers/iommu/intel-pasid.h | 11 ++
 2 files changed, 104 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 1ff2ecc..50ec344 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -684,3 +684,96 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
return 0;
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation
+ * which is used for vSVA. The first level page tables are used for
+ * GVA-GPA translation in the guest, second level page tables are used
+ * for GPA to HPA translation.
+ *
+ * @iommu:  IOMMU which the device belongs to
+ * @dev:    Device to be set up for translation
+ * @gpgd:   FLPTPTR: First Level Page translation pointer in GPA
+ * @pasid:  PASID to be programmed in the device PASID table
+ * @flags:  Additional info such as supervisor PASID
+ * @domain: Domain info for setting up second level page tables
+ * @addr_width: Address width of the first level (guest)
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+   struct device *dev, pgd_t *gpgd,
+   int pasid, int flags,
+   struct dmar_domain *domain,
+   int addr_width)
+{
+   struct pasid_entry *pte;
+   struct dma_pte *pgd;
+   u64 pgd_val;
+   int agaw;
+   u16 did;
+
+   if (!ecap_nest(iommu->ecap)) {
+   pr_err("IOMMU: %s: No nested translation support\n",
+  iommu->name);
+   return -EINVAL;
+   }
+
+   pte = intel_pasid_get_entry(dev, pasid);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   pasid_clear_entry(pte);
+
+   /* Sanity checking is performed by the caller to make sure the
+* address widths match in two dimensions:
+* 1. CPU vs. IOMMU
+* 2. Guest vs. Host.
+*/
+   switch (addr_width) {
+   case 57:
+   pasid_set_flpm(pte, 1);
+   break;
+   case 48:
+   pasid_set_flpm(pte, 0);
+   break;
+   default:
+   dev_err(dev, "Invalid paging mode %d\n", addr_width);
+   return -EINVAL;
+   }
+
+   /* Setup the first level page table pointer in GPA */
+   pasid_set_flptr(pte, (u64)gpgd);
+   if (flags & PASID_FLAG_SUPERVISOR_MODE) {
+   if (!ecap_srs(iommu->ecap)) {
+   pr_err("No supervisor request support on %s\n",
+  iommu->name);
+   return -EINVAL;
+   }
+   pasid_set_sre(pte);
+   }
+
+   /* Setup the second level based on the given domain */
+   pgd = domain->pgd;
+
+   for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+   pgd = phys_to_virt(dma_pte_addr(pgd));
+   if (!dma_pte_present(pgd)) {
+   dev_err(dev, "Invalid domain page table\n");
+   return -EINVAL;
+   }
+   }
+   pgd_val = virt_to_phys(pgd);
+   pasid_set_slptr(pte, pgd_val);
+   pasid_set_fault_enable(pte);
+
+   did = domain->iommu_did[iommu->seq_id];
+   pasid_set_domain_id(pte, did);
+
+   pasid_set_address_width(pte, agaw);
+   pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+   pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+   pasid_set_present(pte);
+   pasid_flush_caches(iommu, pte, pasid, did);
+
+   return 0;
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 4b26ab5..2234fd5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -42,6 +42,7 @@
  * to vmalloc or even module mappings.
  */
 #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
+#define PASID_FLAG_NESTED  BIT(1)
 
 struct pasid_dir_entry {
u64 val;
@@ -51,6 +52,11 @@ struct pasid_entry {
u64 val[8];
 };
 
+#define PASID_ENTRY_PGTT_FL_ONLY   (1)
+#define PASID_ENTRY_PGTT_SL_ONLY   (2)
+#define PASID_ENTRY_PGTT_NESTED(3)
+#define PASID_ENTRY_PGTT_PT(4)
+
 /* The representative of a PASID table */
 struct pasid_table {
void*table; /* pasid table pointer */
@@ -77,6 +83,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,

[PATCH v4 10/22] iommu: Fix compile error without IOMMU_API

2019-06-09 Thread Jacob Pan
struct page_response_msg needs to be defined outside CONFIG_IOMMU_API.

Signed-off-by: Jacob Pan 
---
 include/linux/iommu.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7a37336..8d766a8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -189,8 +189,6 @@ struct iommu_sva_ops {
iommu_mm_exit_handler_t mm_exit;
 };
 
-#ifdef CONFIG_IOMMU_API
-
 /**
  * enum page_response_code - Return status of fault handlers, telling the IOMMU
  * driver how to proceed with the fault.
@@ -227,6 +225,7 @@ struct page_response_msg {
u64 iommu_data;
 };
 
+#ifdef CONFIG_IOMMU_API
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
-- 
2.7.4



[PATCH v4 22/22] iommu/vt-d: Add svm/sva invalidate function

2019-06-09 Thread Jacob Pan
When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at the IOMMU API and
driver level. This patch adds an Intel VT-d specific function to
implement the iommu passdown invalidate API for shared virtual address.

The use case is to support invalidation of the caching structures
of assigned SVM-capable devices. The emulated IOMMU exposes queued
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that the guest-to-host device ID mapping is
resolved prior to calling the IOMMU driver. Based on the device handle,
the host IOMMU driver can replace certain fields before submitting them
to the invalidation queue.
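
For illustration (not from this patch), the pass-down path from the
virtualizer might look roughly like this once the guest's request has been
copied in and the device handle resolved:

#include <linux/iommu.h>

static int example_passdown_invalidate(struct iommu_domain *domain,
                                       struct device *dev,
                                       struct iommu_cache_invalidate_info *info)
{
        /* Dispatches to intel_iommu_sva_invalidate() via domain->ops */
        return iommu_cache_invalidate(domain, dev, info);
}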

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c | 170 
 1 file changed, 170 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3b4d712..3b55b86 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5352,6 +5352,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of a DMA or VFIO unmap. However, for an assigned device the
+ * guest may own the first level page tables without them being shadowed by
+ * QEMU. In this case there is no pass-down unmap to the host IOMMU as a
+ * result of an unmap in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+static const int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /* PASID based IOTLB, support PASID selective and page selective */
+   {0, 1, 1},
+   /* PASID based dev TLBs, only support all PASIDs or single PASID */
+   {1, 1, 0},
+   /* PASID cache */
+   {1, 1, 0}
+};
+
+static const u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /* PASID based IOTLB */
+   {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+   /* PASID based dev TLBs */
+   {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+   /* PASID cache */
+   {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+   if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+   !inv_type_granu_map[type][granu])
+   return -EINVAL;
+
+   *vtd_granu = inv_type_granu_table[type][granu];
+
+   return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+   u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+   /* VT-d size is encoded as 2^size of 4K pages, 0 for 4K, 9 for 2MB, etc.
+* The IOMMU cache invalidate API passes granu_size in bytes, and the
+* number of granules of contiguous memory.
+*/
+   return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   struct intel_iommu *iommu;
+   unsigned long flags;
+   int cache_type;
+   u8 bus, devfn;
+   u16 did, sid;
+   int ret = 0;
+   u64 size;
+
+   if (!inv_info || !dmar_domain ||
+   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   spin_lock(&iommu->lock);
+   info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+   if (!info) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   sid = PCI_DEVID(bus, devfn);
+   size = to_vtd_size(inv_info->addr_info.granule_size,
+    inv_info->addr_info.nb_granules);
+
+   for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache,
+    IOMMU_CACHE_INV_TYPE_NR) {
+   u64 granu = 0;
+  

[PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup

2019-06-09 Thread Jacob Pan
After each PASID entry setup, the related translation caches must be flushed.
We can combine the duplicated code into one function, which is less error prone.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-pasid.c | 48 +
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 1e25539..1ff2ecc 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
 
+static inline void pasid_flush_caches(struct intel_iommu *iommu,
+   struct pasid_entry *pte,
+   int pasid, u16 did)
+{
+   if (!ecap_coherent(iommu->ecap))
+   clflush_cache_range(pte, sizeof(*pte));
+
+   if (cap_caching_mode(iommu->cap)) {
+   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+   iotlb_invalidation_with_pasid(iommu, did, pasid);
+   } else {
+   iommu_flush_write_buffer(iommu);
+   }
+}
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
/* Setup Present and PASID Granular Transfer Type: */
pasid_set_translation_type(pte, 1);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
@@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 */
pasid_set_sre(pte);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
@@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 */
pasid_set_sre(pte);
pasid_set_present(pte);
-
-   if (!ecap_coherent(iommu->ecap))
-   clflush_cache_range(pte, sizeof(*pte));
-
-   if (cap_caching_mode(iommu->cap)) {
-   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
-   } else {
-   iommu_flush_write_buffer(iommu);
-   }
+   pasid_flush_caches(iommu, pte, pasid, did);
 
return 0;
 }
-- 
2.7.4



[PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list

2019-06-09 Thread Jacob Pan
Use the combined macro for_each_svm_dev() to simplify SVM device iteration.

Suggested-by: Andy Shevchenko 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel-svm.c | 79 +++
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9cbcc1f..66d98e1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -225,6 +225,9 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 static LIST_HEAD(global_svm_list);
+#define for_each_svm_dev() \
+   list_for_each_entry(sdev, &svm->devs, list) \
+   if (dev == sdev->dev)
 
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
@@ -271,15 +274,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
goto out;
}
 
-   list_for_each_entry(sdev, &svm->devs, list) {
-   if (dev == sdev->dev) {
-   if (sdev->ops != ops) {
-   ret = -EBUSY;
-   goto out;
-   }
-   sdev->users++;
-   goto success;
+   for_each_svm_dev() {
+   if (sdev->ops != ops) {
+   ret = -EBUSY;
+   goto out;
}
+   sdev->users++;
+   goto success;
}
 
break;
@@ -409,40 +410,38 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
if (!svm)
goto out;
 
-   list_for_each_entry(sdev, &svm->devs, list) {
-   if (dev == sdev->dev) {
-   ret = 0;
-   sdev->users--;
-   if (!sdev->users) {
-   list_del_rcu(&sdev->list);
-   /* Flush the PASID cache and IOTLB for this device.
-* Note that we do depend on the hardware *not* using
-* the PASID any more. Just as we depend on other
-* devices never using PASIDs that they have no right
-* to use. We have a *shared* PASID table, because it's
-* large and has to be physically contiguous. So it's
-* hard to be as defensive as we might like. */
-   intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
-   kfree_rcu(sdev, rcu);
-
-   if (list_empty(&svm->devs)) {
-   ioasid_free(svm->pasid);
-   if (svm->mm)
-   mmu_notifier_unregister(&svm->notifier, svm->mm);
-
-   list_del(&svm->list);
-
-   /* We mandate that no page faults may be outstanding
-* for the PASID when intel_svm_unbind_mm() is called.
-* If that is not obeyed, subtle errors will happen.
-* Let's make them less subtle... */
-   memset(svm, 0x6b, sizeof(*svm));
-   kfree(svm);
-   }
+   for_each_svm_dev() {
+   ret = 0;
+   sdev->users--;
+   if (!sdev->users) {
+   list_del_rcu(&sdev->list);
+   /* Flush the PASID cache and IOTLB for this device.
+* Note that we do depend on the hardware *not* using
+* the PASID any more. Just as we depend on other
+* devices never using PASIDs that they have no right
+* to use. We have a *shared* PASID table, because it's
+* large and has to be physically contiguous. So it's
+* hard to be as defensive as we might like. */
+   intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
+   kfree_rcu(sdev, rcu);
+
+   if (list_empty(&svm->devs)) {
+   ioasid_free(svm->pasid);
+   

[PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation

2019-06-09 Thread Jacob Pan
From: Lu Baolu 

If the Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
IOMMU driver should rely on the emulation software to allocate
and free PASIDs. The Intel VT-d spec revision 3.0 defines a
register set to support this. This includes a capability register,
a virtual command register and a virtual response register. Refer
to sections 10.4.42, 10.4.43 and 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command register.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-pasid.c | 76 +
 drivers/iommu/intel-pasid.h | 13 +++-
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 2fefeaf..69fddd3 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
return p;
 }
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+   u64 res;
+   u64 cap;
+   u8 status_code;
+   unsigned long flags;
+   int ret = 0;
+
+   if (!ecap_vcs(iommu->ecap)) {
+   pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+   iommu->name);
+   return -ENODEV;
+   }
+
+   cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
+   if (!(cap & DMA_VCS_PAS)) {
+   pr_warn("IOMMU: %s: Emulation software doesn't support PASID 
allocation\n",
+   iommu->name);
+   return -ENODEV;
+   }
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   *pasid = VCMD_VRSP_RESULT(res);
+   break;
+   case VCMD_VRSP_SC_NO_PASID_AVAIL:
+   pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+   ret = -ENOMEM;
+   break;
+   default:
+   ret = -ENODEV;
+   pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+   iommu->name, status_code);
+   }
+
+   return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+   u64 res;
+   u8 status_code;
+   unsigned long flags;
+
+   if (!ecap_vcs(iommu->ecap)) {
+   pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+   iommu->name);
+   return;
+   }
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   break;
+   case VCMD_VRSP_SC_INVALID_PASID:
+   pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+   break;
+   default:
+   pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+   iommu->name, status_code);
+   }
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 23537b3..4b26ab5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -19,6 +19,16 @@
 #define PASID_PDE_SHIFT        6
 #define MAX_NR_PASID_BITS  20
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC                 0x1
+#define VCMD_CMD_FREE                  0x2
+#define VCMD_VRSP_IP                   0x1
+#define VCMD_VRSP_SC(e)                (((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS           0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL    1
+#define VCMD_VRSP_SC_INVALID_PASID     1
+#define VCMD_VRSP_RESULT(e)            (((e) >> 8) & 0xf)
+
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
   struct device *dev, int pasid);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --g

[PATCH v4 11/22] iommu: Introduce guest PASID bind function

2019-06-09 Thread Jacob Pan
Guest shared virtual address (SVA) may require the host to shadow guest
PASID tables. A guest PASID can also be allocated from the host via
enlightened interfaces. In this case, the guest needs to bind the guest
mm, i.e. the cr3 in guest physical address, to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
addresses can go through a two-level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for a usage
explanation.

.-.  .---.
|   vIOMMU|  | Guest process mm, FL only |
| |  '---'
./
| PASID Entry |--- PASID cache flush -
'-'   |
| |   V
| |  GP
'-'
Guest
--| Shadow |--- GP->HP* -
  vv  |
Host  v
.-.  .--.
|   pIOMMU|  | Bind FL for GVA-GPA  |
| |  '--'
./  |
| PASID Entry | V (Nested xlate)
'\.-.
| |   |Set SL to GPA-HPA|
| |   '-'
'-'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
 - GP = Guest PASID
 - HP = Host PASID
* Conversion needed if non-identity GP-HP mapping option is chosen.
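
A hedged sketch of the call flow (not from this patch): how the
virtualizer side might drive the new API once a guest bind request has
been trapped, with the bind data assumed to be filled from the guest
request:

#include <linux/iommu.h>

static int example_guest_pasid_lifetime(struct iommu_domain *domain,
                                        struct device *dev,
                                        struct gpasid_bind_data *data,
                                        ioasid_t pasid)
{
        int ret;

        ret = iommu_sva_bind_gpasid(domain, dev, data);
        if (ret)
                return ret;

        /* ... guest runs with nested (FL guest, SL host) translation ... */

        return iommu_sva_unbind_gpasid(domain, dev, pasid);
}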

Signed-off-by: Jacob Pan 
Signed-off-by: Liu Yi L 
---
 drivers/iommu/iommu.c  | 20 
 include/linux/iommu.h  | 21 +
 include/uapi/linux/iommu.h | 58 ++
 3 files changed, 99 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1758b57..d0416f60 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev, struct gpasid_bind_data *data)
+{
+   if (unlikely(!domain->ops->sva_bind_gpasid))
+   return -ENODEV;
+
+   return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+   ioasid_t pasid)
+{
+   if (unlikely(!domain->ops->sva_unbind_gpasid))
+   return -ENODEV;
+
+   return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8d766a8..560c8c8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define IOMMU_READ (1 << 0)
@@ -267,6 +268,8 @@ struct page_response_msg {
  * @detach_pasid_table: detach the pasid table
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -332,6 +335,10 @@ struct iommu_ops {
int (*page_response)(struct device *dev, struct page_response_msg *msg);
int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
struct iommu_cache_invalidate_info *inv_info);
+   int (*sva_bind_gpasid)(struct iommu_domain *domain,
+   struct device *dev, struct gpasid_bind_data *data);
+
+   int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
unsigned long pgsize_bitmap;
 };
@@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
  struct device *dev,
  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev, struct gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -998,6 +1009,16 @@ iommu_cache_invalidate(

[PATCH v4 09/22] iommu: Introduce cache_invalidate API

2019-06-09 Thread Jacob Pan
From: Liu Yi L 

In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.

Since the invalidation data are obtained from user space and will be
written into the physical IOMMU, we must allow security checks at
various layers. Therefore, a generic invalidation data format is
proposed here; model-specific IOMMU drivers need to convert it into
their own format.
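
A sketch of the vendor-driver side (illustrative, not from this patch):
validate the untrusted user-supplied data first, then convert it to the
model-specific format:

static int my_cache_invalidate(struct iommu_domain *domain, struct device *dev,
                               struct iommu_cache_invalidate_info *inv_info)
{
        if (!inv_info ||
            inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
                return -EINVAL; /* security check on user-supplied data */

        /* ... translate into vendor-specific invalidation descriptors ... */
        return 0;
}

static const struct iommu_ops my_iommu_ops = {
        .cache_invalidate = my_cache_invalidate,
};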

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c  |  10 +
 include/linux/iommu.h  |  14 ++
 include/uapi/linux/iommu.h | 110 +
 3 files changed, 134 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4496ccd..1758b57 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1638,6 +1638,16 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
 
+int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+   if (unlikely(!domain->ops->cache_invalidate))
+   return -ENODEV;
+
+   return domain->ops->cache_invalidate(domain, dev, inv_info);
+}
+EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d3edb10..7a37336 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct page_response_msg {
  * @page_response: handle page request response
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -330,6 +331,8 @@ struct iommu_ops {
void (*detach_pasid_table)(struct iommu_domain *domain);
 
int (*page_response)(struct device *dev, struct page_response_msg *msg);
+   int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
+   struct iommu_cache_invalidate_info *inv_info);
 
unsigned long pgsize_bitmap;
 };
@@ -442,6 +445,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern int iommu_attach_pasid_table(struct iommu_domain *domain,
struct iommu_pasid_table_config *cfg);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_cache_invalidate(struct iommu_domain *domain,
+ struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -986,6 +992,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
 static inline
 void iommu_detach_pasid_table(struct iommu_domain *domain) {}
 
+static inline int
+iommu_cache_invalidate(struct iommu_domain *domain,
+  struct device *dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+   return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 3976767..ca4b753 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -167,4 +167,114 @@ struct iommu_pasid_table_config {
};
 };
 
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+   IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
+   IOMMU_INV_GRANU_PASID,  /* PASID-selective invalidation */
+   IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
+   IOMMU_INV_GRANU_NR, /* number of invalidation granularities */
+};
+
+/**
+ * struct iommu_inv_addr_info - Address Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the address-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific ID and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - If neither PASID nor ARCHID is set, global addr invalidation applies.
+ * - The LEAF flag indicates whether only the leaf 

[PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types

2019-06-09 Thread Jacob Pan
When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
IOTLB invalidation may be passed down from outside the IOMMU subsystem.
This patch adds invalidation functions that can be used for additional
translation cache types.
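
Illustrative usage of one of the new helpers (not in this patch;
QI_GRAN_PSI_PASID is the page-selective-within-PASID encoding used
elsewhere in this series):

/* Flush a single 4KB page from the PASID-based IOTLB */
static void example_flush_one_page(struct intel_iommu *iommu, u16 did,
                                   u64 addr, u32 pasid)
{
        /* size_order 0 == one 4KB page; ih 0 == flush paging structures too */
        qi_flush_piotlb(iommu, did, addr, pasid, 0, QI_GRAN_PSI_PASID, 0);
}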

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dmar.c| 50 +
 include/linux/intel-iommu.h | 21 +++
 2 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6d969a1..0cda6fb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB Invalidate */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+   unsigned int size_order, u64 granu, int ih)
+{
+   struct qi_desc desc;
+
+   desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+   desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+   QI_EIOTLB_AM(size_order);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   qi_submit_sync(&desc, iommu);
+}
+
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask)
 {
@@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+   u32 pasid,  u16 qdep, u64 addr, unsigned size, u64 granu)
+{
+   struct qi_desc desc;
+
+   desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+   QI_DEV_IOTLB_PFSID(pfsid);
+   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+   /* If S bit is 0, we only flush a single page. If S bit is set,
+* the least significant zero bit indicates the size. See VT-d
+* spec, section 6.5.2.6.
+*/
+   if (!size)
+   desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+   else {
+   unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
+
+   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+   }
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+   struct qi_desc desc;
+
+   desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_PASID(pasid);
+   desc.qw1 = 0;
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   qi_submit_sync(&desc, iommu);
+}
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 94d3a9a..1cdb35b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -339,7 +339,7 @@ enum {
 #define QI_IOTLB_GRAN(gran)(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
 #define QI_IOTLB_ADDR(addr)(((u64)addr) & VTD_PAGE_MASK)
 #define QI_IOTLB_IH(ih)(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)        (((u8)am))
+#define QI_IOTLB_AM(am)        (((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)   (((u64)fm) << 48)
 #define QI_CC_SID(sid) (((u64)sid) << 32)
@@ -357,17 +357,22 @@ enum {
 #define QI_PC_DID(did) (((u64)did) << 16)
 #define QI_PC_GRAN(gran)   (((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS       (QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL        (QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS       0
+#define QI_PC_PASID_SEL        1
 
 #define QI_EIOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_GL(gl)   (((u64)gl) << 7)
 #define QI_EIOTLB_IH(ih)   (((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)   (((u64)am))
+#define QI_EIOTLB_AM(am)   (((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did) (((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran)   (((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL  1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL0
+
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
 #define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
@@ -658,8 +663,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, 
u16 did, u16 sid,
 u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
  unsigned int size_order, u64 type);
+extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
+   u32 pasid, unsigned int size_order, u64 type, int ih);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid

[PATCH v4 04/22] iommu: Add recoverable fault reporting

2019-06-09 Thread Jacob Pan
From: Jean-Philippe Brucker 

Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's Stall,
enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
Requests and Stall events through the new fault reporting API. The
consumer of the fault can be either an I/O page fault handler in the host,
or a guest OS.

Once handled, the fault must be completed by sending a page response back
to the IOMMU. Add an iommu_page_response() function to complete a page
fault.
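
A hedged sketch of a consumer, not part of this series: a driver-side handler
that services a page request and completes it. my_driver_fixup() is
hypothetical; structure and field names follow this series.

static int my_fault_handler(struct iommu_fault_event *evt, void *data)
{
        struct device *dev = data;
        struct page_response_msg msg = {
                .pasid = evt->fault.prm.pasid,
                .grpid = evt->fault.prm.grpid,
                .addr  = evt->fault.prm.addr,
        };

        if (evt->fault.type != IOMMU_FAULT_PAGE_REQ)
                return -EOPNOTSUPP; /* unrecoverable: nothing to respond to */

        /* Try to make the page present, then report the outcome. */
        msg.resp_code = my_driver_fixup(dev, msg.addr) ?
                        IOMMU_PAGE_RESP_INVALID : IOMMU_PAGE_RESP_SUCCESS;
        return iommu_page_response(dev, &msg);
}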

Signed-off-by: Jacob Pan 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c | 77 ++-
 include/linux/iommu.h | 51 ++
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7955184..13b301c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -869,7 +869,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * @data: private data passed as argument to the handler
  *
  * When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument.
+ * fault event and data as argument. The handler should return 0 on success. If
+ * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler should also
+ * complete the fault by calling iommu_page_response() with one of the
+ * following response codes:
+ * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
+ * - IOMMU_PAGE_RESP_INVALID: terminate the fault
+ * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
+ *   page faults if possible.
  *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
@@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct device *dev,
}
param->fault_param->handler = handler;
param->fault_param->data = data;
+   mutex_init(¶m->fault_param->lock);
+   INIT_LIST_HEAD(¶m->fault_param->faults);
 
 done_unlock:
mutex_unlock(¶m->lock);
@@ -934,6 +943,12 @@ int iommu_unregister_device_fault_handler(struct device 
*dev)
if (!param->fault_param)
goto unlock;
 
+   /* we cannot unregister handler if there are pending faults */
+   if (!list_empty(¶m->fault_param->faults)) {
+   ret = -EBUSY;
+   goto unlock;
+   }
+
kfree(param->fault_param);
param->fault_param = NULL;
put_device(dev);
@@ -958,6 +973,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
 int iommu_report_device_fault(struct device *dev, struct iommu_fault_event 
*evt)
 {
struct iommu_param *param = dev->iommu_param;
+   struct iommu_fault_event *evt_pending;
struct iommu_fault_param *fparam;
int ret = 0;
 
@@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device *dev, struct 
iommu_fault_event *evt)
ret = -EINVAL;
goto done_unlock;
}
+
+   if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+   (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+   evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+ GFP_KERNEL);
+   if (!evt_pending) {
+   ret = -ENOMEM;
+   goto done_unlock;
+   }
+   mutex_lock(&fparam->lock);
+   list_add_tail(&evt_pending->list, &fparam->faults);
+   mutex_unlock(&fparam->lock);
+   }
+
ret = fparam->handler(evt, fparam->data);
 done_unlock:
mutex_unlock(¶m->lock);
@@ -1513,6 +1543,51 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_page_response(struct device *dev,
+   struct page_response_msg *msg)
+{
+   struct iommu_param *param = dev->iommu_param;
+   int ret = -EINVAL;
+   struct iommu_fault_event *evt;
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+   if (!domain || !domain->ops->page_response)
+   return -ENODEV;
+
+   /*
+* Device iommu_param should have been allocated when device is
+* added to its iommu_group.
+*/
+   if (!param || !param->fault_param)
+   return -EINVAL;
+
+   /* Only send response if there is a fault report pending */
+   mutex_lock(¶m->fault_param->lock);
+   if (list_empty(¶m->fault_param->faults)) {
+   pr_warn("no pending PRQ, drop response\n");
+   goto done_unlock;
+   }
+   /*
+* Check if we have a matching page request pending to respond,
+* otherwise return -EINVAL
+*/
+   list_for_each_entry(evt, ¶m->fault_param->faults, list) {
+   if (evt->fault.prm.pasid == msg->pasid &&
+   evt->fault.prm.grpid == msg->grpid) {
+   msg->iommu_data = evt->iommu_private;
+  

[PATCH v4 06/22] trace/iommu: Add sva trace events

2019-06-09 Thread Jacob Pan
From: Jean-Philippe Brucker 

For development only, trace I/O page faults and responses.
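
A hedged sketch of the intended call sites (the actual wiring lands in a
separate patch of this series; exact placement is an assumption):

/* in iommu_report_device_fault(), when a fault is reported */
trace_dev_fault(dev, &evt->fault);

/* in iommu_page_response(), when a response is sent back */
trace_dev_page_response(dev, msg);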

Signed-off-by: Jacob Pan 
[JPB: removed the invalidate trace event, that will be added later]
Signed-off-by: Jean-Philippe Brucker 
---
 include/trace/events/iommu.h | 87 
 1 file changed, 87 insertions(+)

diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h
index 72b4582..c8de147 100644
--- a/include/trace/events/iommu.h
+++ b/include/trace/events/iommu.h
@@ -12,6 +12,8 @@
 #define _TRACE_IOMMU_H
 
 #include 
+#include 
+#include 
 
 struct device;
 
@@ -161,6 +163,91 @@ DEFINE_EVENT(iommu_error, io_page_fault,
 
TP_ARGS(dev, iova, flags)
 );
+
+TRACE_EVENT(dev_fault,
+
+   TP_PROTO(struct device *dev,  struct iommu_fault *evt),
+
+   TP_ARGS(dev, evt),
+
+   TP_STRUCT__entry(
+   __string(device, dev_name(dev))
+   __field(int, type)
+   __field(int, reason)
+   __field(u64, addr)
+   __field(u64, fetch_addr)
+   __field(u32, pasid)
+   __field(u32, grpid)
+   __field(u32, flags)
+   __field(u32, prot)
+   ),
+
+   TP_fast_assign(
+   __assign_str(device, dev_name(dev));
+   __entry->type = evt->type;
+   if (evt->type == IOMMU_FAULT_DMA_UNRECOV) {
+   __entry->reason = evt->event.reason;
+   __entry->flags  = evt->event.flags;
+   __entry->pasid  = evt->event.pasid;
+   __entry->grpid  = 0;
+   __entry->prot   = evt->event.perm;
+   __entry->addr   = evt->event.addr;
+   __entry->fetch_addr = evt->event.fetch_addr;
+   } else {
+   __entry->reason = 0;
+   __entry->flags  = evt->prm.flags;
+   __entry->pasid  = evt->prm.pasid;
+   __entry->grpid  = evt->prm.grpid;
+   __entry->prot   = evt->prm.perm;
+   __entry->addr   = evt->prm.addr;
+   __entry->fetch_addr = 0;
+   }
+   ),
+
+   TP_printk("IOMMU:%s type=%d reason=%d addr=0x%016llx fetch=0x%016llx 
pasid=%d group=%d flags=%x prot=%d",
+   __get_str(device),
+   __entry->type,
+   __entry->reason,
+   __entry->addr,
+   __entry->fetch_addr,
+   __entry->pasid,
+   __entry->grpid,
+   __entry->flags,
+   __entry->prot
+   )
+);
+
+TRACE_EVENT(dev_page_response,
+
+   TP_PROTO(struct device *dev,  struct page_response_msg *msg),
+
+   TP_ARGS(dev, msg),
+
+   TP_STRUCT__entry(
+   __string(device, dev_name(dev))
+   __field(int, code)
+   __field(u64, addr)
+   __field(u32, pasid)
+   __field(u32, grpid)
+   ),
+
+   TP_fast_assign(
+   __assign_str(device, dev_name(dev));
+   __entry->code = msg->resp_code;
+   __entry->addr = msg->addr;
+   __entry->pasid = msg->pasid;
+   __entry->grpid = msg->grpid;
+   ),
+
+   TP_printk("IOMMU:%s code=%d addr=0x%016llx pasid=%d group=%d",
+   __get_str(device),
+   __entry->code,
+   __entry->addr,
+   __entry->pasid,
+   __entry->grpid
+   )
+);
+
 #endif /* _TRACE_IOMMU_H */
 
 /* This part must be outside protection */
-- 
2.7.4



[PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support

2019-06-09 Thread Jacob Pan
When supporting guest SVA with an emulated IOMMU, the guest PASID
table is shadowed in the VMM. Updates to the guest vIOMMU PASID table
result in PASID cache flushes, which are passed down to the host as
bind guest PASID calls.

The SL page tables are harvested from the device's default domain
(requests without PASID), or from the aux domain in the case of a
mediated device.

.-------------.  .---------------------------.
|   vIOMMU    |  | Guest process CR3, FL only|
|             |  '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------'                       |
|             |                       V
|             |                CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
.-------------.  .----------------------.
|   pIOMMU    |  | Bind FL for GVA-GPA  |
|             |  '----------------------'
.----------------/  |
| PASID Entry |     V (Nested xlate)
'----------------\.------------------------------.
|             |   |SL for GPA-HPA, default domain|
|             |   '------------------------------'
'-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
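
A hedged sketch of how a VMM-facing layer (e.g. VFIO) might invoke the bind
hook while shadowing a guest PASID entry. Only fields visible in this patch
are populated; the wrapper itself is hypothetical.

static int example_bind(struct iommu_domain *domain, struct device *dev,
                        u32 host_pasid)
{
        struct gpasid_bind_data data = {
                .version = IOMMU_GPASID_BIND_VERSION_1,
                .format  = IOMMU_PASID_FORMAT_INTEL_VTD,
                .hpasid  = host_pasid, /* host PASID; guest fields omitted */
        };

        return intel_svm_bind_gpasid(domain, dev, &data);
}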

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 187 
 include/linux/intel-iommu.h |  13 ++-
 include/linux/intel-svm.h   |  17 
 4 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7cfa0eb..3b4d712 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   .sva_bind_gpasid= intel_svm_bind_gpasid,
+   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 66d98e1..f06a82f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry(sdev, &svm->devs, list) \
if (dev == sdev->dev)   \
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev,
+   struct gpasid_bind_data *data)
+{
+   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+   struct intel_svm_dev *sdev;
+   struct intel_svm *svm = NULL;
+   struct dmar_domain *ddomain;
+   int ret = 0;
+
+   if (WARN_ON(!iommu) || !data)
+   return -EINVAL;
+
+   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   if (dev_is_pci(dev)) {
+   /* VT-d supports devices with full 20 bit PASIDs only */
+   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+   return -EINVAL;
+   }
+
+   /*
+* We only check the host PASID range; we have no way to validate
+* the guest PASID range, nor do we use the guest PASID.
+*/
+   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+   return -EINVAL;
+
+   ddomain = to_dmar_domain(domain);
+   /* REVISIT:
+* Sanity check address width and paging mode support
+* width matching in two dimensions:
+* 1. paging mode CPU <= IOMMU
+* 2. address width Guest <= Host.
+*/
+   mutex_lock(&pasid_mutex);
+   svm = ioasid_find(NULL, data->hpasid, NULL);
+   if (IS_ERR(svm)) {
+   ret = PTR_ERR(svm);
+   goto out;
+   }
+   if (svm) {
+   /*
+* If we found an svm for the PASID, there must be at least
+* one device bound; otherwise the svm should have been freed.
+*/
+   BUG_ON(list_empty(&svm->devs));
+
+   for_each_svm_dev() {
+   /* In case of multiple sub-devices of the same pdev
+* assigned, we should allow multiple bind calls with
+* the same PASID and pdev.
+*/
+   sdev->users++;
+   goto out;
+   }
+   } else {
+   /* We come here when the PASID has never been bound to a device. */
+   svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+   if (!svm) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   /* REVISIT: upper 

[PATCH v4 03/22] iommu: Introduce device fault report API

2019-06-09 Thread Jacob Pan
Traditionally, device specific faults are detected and handled within
their own device drivers. When an IOMMU is enabled, faults in DMA
transactions are detected by the IOMMU instead. There is no generic
mechanism to report such faults back to the in-kernel device driver or,
in the case of assigned devices, the guest OS.

This patch introduces a registration API for device specific fault
handlers. This differs from the existing iommu_set_fault_handler/
report_iommu_fault infrastructures in several ways:
- it allows reporting more sophisticated fault events (both
  unrecoverable faults and page request faults) due to the nature
  of the iommu_fault struct
- it is device specific and not domain specific.

The current iommu_report_device_fault() implementation only handles
the "shoot and forget" unrecoverable fault case. Handling of page
request faults or stalled faults will come later.
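
A hedged sketch of driver-side usage; the probe/remove callbacks and the
handler body are hypothetical:

static int my_fault_cb(struct iommu_fault_event *evt, void *data)
{
        struct device *dev = data;

        dev_warn(dev, "IOMMU fault, type %d\n", evt->fault.type);
        return 0;
}

static int my_probe(struct device *dev)
{
        /* Only one handler per device; returns -EBUSY if one exists. */
        return iommu_register_device_fault_handler(dev, my_fault_cb, dev);
}

static void my_remove(struct device *dev)
{
        iommu_unregister_device_fault_handler(dev);
}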

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Eric Auger 
---
 drivers/iommu/iommu.c | 127 +-
 include/linux/iommu.h |  33 -
 2 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee662..7955184 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -644,6 +644,13 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
goto err_free_name;
}
 
+   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
+   if (!dev->iommu_param) {
+   ret = -ENOMEM;
+   goto err_free_name;
+   }
+   mutex_init(&dev->iommu_param->lock);
+
kobject_get(group->devices_kobj);
 
dev->iommu_group = group;
@@ -674,6 +681,7 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
mutex_unlock(&group->mutex);
dev->iommu_group = NULL;
kobject_put(group->devices_kobj);
+   kfree(dev->iommu_param);
 err_free_name:
kfree(device->name);
 err_remove_link:
@@ -720,7 +728,7 @@ void iommu_group_remove_device(struct device *dev)
sysfs_remove_link(&dev->kobj, "iommu_group");
 
trace_remove_device_from_group(group->id, dev);
-
+   kfree(dev->iommu_param);
kfree(device->name);
kfree(device);
dev->iommu_group = NULL;
@@ -855,6 +863,123 @@ int iommu_group_unregister_notifier(struct iommu_group 
*group,
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
 /**
+ * iommu_register_device_fault_handler() - Register a device fault handler
+ * @dev: the device
+ * @handler: the fault handler
+ * @data: private data passed as argument to the handler
+ *
+ * When an IOMMU fault event is received, this handler gets called with the
+ * fault event and data as argument.
+ *
+ * Return 0 if the fault handler was installed successfully, or an error.
+ */
+int iommu_register_device_fault_handler(struct device *dev,
+   iommu_dev_fault_handler_t handler,
+   void *data)
+{
+   struct iommu_param *param = dev->iommu_param;
+   int ret = 0;
+
+   /*
+* Device iommu_param should have been allocated when device is
+* added to its iommu_group.
+*/
+   if (!param)
+   return -EINVAL;
+
+   mutex_lock(¶m->lock);
+   /* Only allow one fault handler registered for each device */
+   if (param->fault_param) {
+   ret = -EBUSY;
+   goto done_unlock;
+   }
+
+   get_device(dev);
+   param->fault_param =
+   kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
+   if (!param->fault_param) {
+   put_device(dev);
+   ret = -ENOMEM;
+   goto done_unlock;
+   }
+   param->fault_param->handler = handler;
+   param->fault_param->data = data;
+
+done_unlock:
+   mutex_unlock(¶m->lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+/**
+ * iommu_unregister_device_fault_handler() - Unregister the device fault 
handler
+ * @dev: the device
+ *
+ * Remove the device fault handler installed with
+ * iommu_register_device_fault_handler().
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+   struct iommu_param *param = dev->iommu_param;
+   int ret = 0;
+
+   if (!param)
+   return -EINVAL;
+
+   mutex_lock(¶m->lock);
+
+   if (!param->fault_param)
+   goto unlock;
+
+   kfree(param->fault_param);
+   param->fault_param = NULL;
+   put_device(dev);
+unlock:
+   mutex_unlock(¶m->lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
+
+
+/**
+ * iommu_report_device_fault() - Report fault event to device
+ * @dev: the device
+ * @evt: fault event data
+ *
+ * Called by IOMMU dri

[PATCH v4 00/22] Shared virtual address IOMMU and VT-d support

2019-06-09 Thread Jacob Pan
Shared virtual address (SVA), a.k.a. shared virtual memory (SVM) on Intel
platforms, allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. sharing a guest
application address space with physical device DMA. Only the IOMMU portion
of the changes is included in this series; additional support is needed in
VFIO and QEMU (to be submitted separately) to complete this functionality.

To keep the changes incremental and reduce the size of each patchset, this
series does not include support for page request services.

In the VT-d implementation, the PASID table is per device and maintained in
the host. The guest PASID table is shadowed in the VMM, where the virtual
IOMMU is emulated.

.-------------.  .---------------------------.
|   vIOMMU    |  | Guest process CR3, FL only|
|             |  '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------'                       |
|             |                       V
|             |                CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
.-------------.  .----------------------.
|   pIOMMU    |  | Bind FL for GVA-GPA  |
|             |  '----------------------'
.----------------/  |
| PASID Entry |     V (Nested xlate)
'----------------\.------------------------------.
|             |   |SL for GPA-HPA, default domain|
|             |   '------------------------------'
'-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables


This work is based on collaboration with other developers on the IOMMU
mailing list. Notably,

[1] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
https://lkml.org/lkml/2019/3/17/124

[2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe
Brucker
https://www.spinics.net/lists/iommu/msg30639.html

[3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by Lu Baolu
https://lkml.org/lkml/2018/11/12/1921

[4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
Address (SVA)
https://lwn.net/Articles/754331/

There are roughly three parts:
1. Generic PASID allocator [1] with extension to support custom allocator
2. IOMMU cache invalidation passdown from guest to host
3. Guest PASID bind for nested translation

All generic IOMMU APIs are reused from [1], which has a v7 just published
with no real impact on the patches used here. It is worth noting that,
unlike the SMMU nested stage setup where the PASID table is owned by the
guest, the VT-d PASID table is owned by the host, so individual PASIDs are
bound instead of the whole PASID table.

This series is based on the new VT-d 3.0 Specification
(https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf).
This differs from the older series in [4], which was based on the previous
specification without scalable mode.


ChangeLog:
- V4
  - Redesigned IOASID allocator such that it can support custom
  allocators with shared helper functions. Use separate XArray
  to store IOASIDs per allocator. Took advice from Eric Auger to
  have default allocator use the generic allocator structure.
  Combined into one patch since the default allocator is now just
  "another" allocator. Can be built as a module in case of
  driver use without an IOMMU.
  - Extended bind guest PASID data to support SMMU and non-identity
  guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802
  - Rebased on Jean's sva/api common tree, new patches start with
   [PATCH v4 10/22]

- V3
  - Addressed thorough review comments from Eric Auger (Thank you!)
  - Moved IOASID allocator from driver core to IOMMU code per
suggestion by Christoph Hellwig
(https://lkml.org/lkml/2019/4/26/462)
  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
(git://linux-arm.org/linux-jpb.git sva/api)
  - All IOMMU APIs are unmodified (except the new bind guest PASID
call in patch 9/16)

- V2
  - Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
  - Integrated with Eric Auger's new v7 series for common APIs
  (https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
  - Addressed review comments from Andy Shevchenko and Alex Williamson 
on
IOASID custom allocator.
  - Support multiple custom IOASID allocators (vIOMMUs) and dynamic
registration.


Jacob Pan (17):
  driver core: Add per device iommu param
  iommu: Introduce device fault data
  iommu: Introduce device fault report API
  iommu: Add a timeout parameter for PRQ response
  iommu: Use device fault trace 

[PATCH v4 01/22] driver core: Add per device iommu param

2019-06-09 Thread Jacob Pan
DMA faults can be detected by the IOMMU at the device level. Adding a
pointer to struct device allows the IOMMU subsystem to report relevant
faults back to the device driver for further handling.
For directly assigned devices (or user space drivers), the guest OS holds
responsibility for handling and responding to per-device IOMMU faults.
Therefore we need a fault reporting mechanism that propagates faults beyond
the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today; here
we introduce iommu_param as a parent pointer so that all device IOMMU
data can be consolidated under it. The idea was suggested by Greg KH
and Joerg. The name iommu_param is chosen since iommu_data is already taken.

Suggested-by: Greg Kroah-Hartman 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Jacob Pan 
Link: https://lkml.org/lkml/2017/10/6/81
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264f..f0a975a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
 struct dev_pin_info;
+struct iommu_param;
 
 struct bus_attribute {
struct attributeattr;
@@ -959,6 +960,7 @@ struct dev_links_info {
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:   Set after successful invocation of bus type's .offline().
@@ -1052,6 +1054,7 @@ struct device {
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
struct iommu_fwspec *iommu_fwspec;
+   struct iommu_param  *iommu_param;
 
booloffline_disabled:1;
booloffline:1;
-- 
2.7.4



[PATCH v4 02/22] iommu: Introduce device fault data

2019-06-09 Thread Jacob Pan
Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces
a generic device fault data structure.

The fault can be either an unrecoverable fault or a page request,
also referred to as a recoverable fault.

We only care about non-internal faults, i.e. those likely to be reported
to an external subsystem.
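
A hedged sketch of how an IOMMU driver might fill in the generic fault data
for an unrecoverable DMA read fault and hand it to the fault report API
(patch 03). Values are illustrative; the union member names follow the trace
patch in this series.

static void example_report_unrecoverable(struct device *dev, u64 bad_addr)
{
        struct iommu_fault_event evt = {
                .fault = {
                        .type = IOMMU_FAULT_DMA_UNRECOV,
                        .event = {
                                .reason = IOMMU_FAULT_REASON_PTE_FETCH,
                                .perm   = IOMMU_FAULT_PERM_READ,
                                .addr   = bad_addr,
                        },
                },
        };

        iommu_report_device_fault(dev, &evt);
}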

Signed-off-by: Jacob Pan 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
Signed-off-by: Eric Auger 
---
 include/linux/iommu.h  |  44 +
 include/uapi/linux/iommu.h | 118 +
 2 files changed, 162 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6..7890a92 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IOMMU_READ (1 << 0)
 #define IOMMU_WRITE(1 << 1)
@@ -49,6 +50,7 @@ struct device;
 struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
+struct iommu_fault_event;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
   void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 
 struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped*/
@@ -301,6 +304,46 @@ struct iommu_device {
struct device *dev;
 };
 
+/**
+ * struct iommu_fault_event - Generic fault event
+ *
+ * Can represent recoverable faults such as page requests, or
+ * unrecoverable faults such as DMA or IRQ remapping faults.
+ *
+ * @fault: fault descriptor
+ * @iommu_private: used by the IOMMU driver for storing fault-specific
+ * data. Users should not modify this field before
+ * sending the fault response.
+ */
+struct iommu_fault_event {
+   struct iommu_fault fault;
+   u64 iommu_private;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @dev_fault_handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ *
+ */
+struct iommu_fault_param {
+   iommu_dev_fault_handler_t handler;
+   void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
+ * struct iommu_group  *iommu_group;
+ * struct iommu_fwspec *iommu_fwspec;
+ */
+struct iommu_param {
+   struct iommu_fault_param *fault_param;
+};
+
 int  iommu_device_register(struct iommu_device *iommu);
 void iommu_device_unregister(struct iommu_device *iommu);
 int  iommu_device_sysfs_add(struct iommu_device *iommu,
@@ -504,6 +547,7 @@ struct iommu_ops {};
 struct iommu_group {};
 struct iommu_fwspec {};
 struct iommu_device {};
+struct iommu_fault_param {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index 000..aaa3b6a
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include 
+
+#define IOMMU_FAULT_PERM_READ  (1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC  (1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV  (1 << 3) /* privileged */
+
+/* Generic fault types; can be expanded, e.g. for IRQ remapping faults */
+enum iommu_fault_type {
+   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
+   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
+};
+
+enum iommu_fault_reason {
+   IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+   /* Could not access the PASID table (fetch caused external abort) */
+   IOMMU_FAULT_REASON_PASID_FETCH,
+
+   /* PASID entry is invalid or has configuration errors */
+   IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+   /*
+* PASID is out of range (e.g. exceeds the maximum PASID
+* supported by the IOMMU) or disabled.
+*/
+   IOMMU_FAULT_REASON_PASID_INVALID,
+
+   /*
+* An external abort occurred fetching (or updating) a translation
+* table descriptor
+*/
+   IOMMU_FAULT_REASON_WALK_EABT,
+
+   /*
+* Could not access the page table entry (Bad address),
+* actual translation fault
+*/
+   IOMMU_FAULT_REASON_PTE_FETCH,
+
+   /* Protection flag check failed */
+   IOMMU_FAULT_REASON_PERMISSION,
+
+   /* access flag check f

[PATCH v4 05/22] iommu: Add a timeout parameter for PRQ response

2019-06-09 Thread Jacob Pan
When an I/O page request is processed outside the IOMMU subsystem, the
response can be delayed or lost. Add a tunable setup parameter so that the
user can choose how long the IOMMU tracks a pending page request.

This timeout mechanism is a basic safety net which can be implemented in
conjunction with credit-based or device-level page response exception
handling.
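
A hedged sketch of how the parameter might be consumed once pending-request
tracking is wired up; the consumer lands in a later patch, and this
illustration assumes it lives in iommu.c where prq_timeout is visible:

static void example_arm_prq_timer(struct timer_list *t)
{
        /* prq_timeout is already in jiffies; 0 disables tracking. */
        if (prq_timeout)
                mod_timer(t, jiffies + prq_timeout);
}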

Signed-off-by: Jacob Pan 
---
 Documentation/admin-guide/kernel-parameters.txt |  8 +++
 drivers/iommu/iommu.c   | 29 +
 2 files changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 138f666..b43f089 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1813,6 +1813,14 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+   iommu.prq_timeout=
+   Timeout in seconds to wait for page response
+   of a pending page request.
+   Format: 
+   Default: 10
+   0 - no timeout tracking
+   1 to 100 - allowed range
+
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 13b301c..64e87d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -45,6 +45,19 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
 static bool iommu_dma_strict __read_mostly = true;
 
+/*
+ * Timeout to wait for page response of a pending page request. This is
+ * intended as a basic safety net in case a pending page request is not
+ * responded to for an exceptionally long time. A device may also implement
+ * its own protection mechanism against this exception.
+ * Units are in jiffies with a range between 1 - 100 seconds equivalent.
+ * Default to 10 seconds.
+ * Setting 0 means no timeout tracking.
+ */
+#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100)
+#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10)
+static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT;
+
 struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
@@ -157,6 +170,22 @@ static int __init iommu_dma_setup(char *str)
 }
 early_param("iommu.strict", iommu_dma_setup);
 
+static int __init iommu_set_prq_timeout(char *str)
+{
+   unsigned long timeout;
+
+   if (!str)
+   return -EINVAL;
+   timeout = simple_strtoul(str, NULL, 0);
+   timeout = timeout * HZ;
+   if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT)
+   return -EINVAL;
+   prq_timeout = timeout;
+
+   return 0;
+}
+early_param("iommu.prq_timeout", iommu_set_prq_timeout);
+
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 struct attribute *__attr, char *buf)
 {
-- 
2.7.4



[PATCH v4 16/22] iommu/vt-d: Move domain helper to header

2019-06-09 Thread Jacob Pan
Move the to_dmar_domain() helper to the header so it can be used by SVA code.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 6 --
 include/linux/intel-iommu.h | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 39b63fe..7cfa0eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu 
*iommu)
iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-   return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8605c74..b75f17d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
 }
 
+/* Convert generic 'struct iommu_domain' to private 'struct dmar_domain' */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4



RE: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device

2019-06-09 Thread Liu, Yi L
> From Alex Williamson
> Sent: Thursday, May 23, 2019 9:03 PM
> To: Liu, Yi L 
> Cc: kwankh...@nvidia.com; Tian, Kevin ;
> baolu...@linux.intel.com; Sun, Yi Y ; j...@8bytes.org; 
> jean-
> philippe.bruc...@arm.com; pet...@redhat.com; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; yamada.masah...@socionext.com; iommu@lists.linux-
> foundation.org
> Subject: Re: [RFC v3 0/3] vfio_pci: wrap pci device as a mediated device
> 
> On Thu, 23 May 2019 08:44:57 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > Sorry to disturb you. Do you want to review on this version or review a 
> > rebased
> version? :-) If rebase version is better, I can try to do it asap.
> 
> Hi Yi,
> 
> Perhaps you missed my comments on 1/3:
> 
> https://www.spinics.net/lists/kvm/msg187282.html
> 
> In summary, it looks pretty good, but consider a file name more consistent 
> with the
> existing files and prune out the code changes from the code moves so they can 
> be
> reviewed more easily.  Thanks,

Thanks for the reminder, Alex. Sorry the earlier changes were disordered.
I've made the changes accordingly. Please refer to my latest post just now. :-)

Regards,
Yi Liu