Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-23 Thread Lu Baolu

On 7/24/20 6:37 AM, Ashok Raj wrote:

PASID and PRI capabilities are only enumerated in PF devices. VF devices
do not enumerate these capabilities. IOMMU drivers also need to enumerate
them before enabling features in the IOMMU. Extend the same support as
the PASID feature discovery (pci_pasid_features) to PRI.

Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
Signed-off-by: Ashok Raj 

To: Bjorn Helgaas 
To: Joerg Roedel 
To: Lu Baolu 


Reviewed-by: Lu Baolu 

Best regards,
baolu


Cc: sta...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: Ashok Raj 
Cc: iommu@lists.linux-foundation.org
---
v3: Added Fixes tag
v2: Fixed build failure reported from lkp when CONFIG_PRI=n

  drivers/iommu/intel/iommu.c |  2 +-
  drivers/pci/ats.c   | 13 +
  include/linux/pci-ats.h |  4 
  3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..276452f5e6a7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
}
  
			if (info->ats_supported && ecap_prs(iommu->ecap) &&
-			    pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
+			    pci_pri_supported(pdev))
info->pri_supported = 1;
}
}
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index b761c1f72f67..2e6cf0c700f7 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
  
  	return pdev->pasid_required;

  }
+
+/**
+ * pci_pri_supported - Check if PRI is supported.
+ * @pdev: PCI device structure
+ *
+ * Returns true if PRI capability is present, false otherwise.
+ */
+bool pci_pri_supported(struct pci_dev *pdev)
+{
+   /* VFs share the PF PRI configuration */
+   return !!(pci_physfn(pdev)->pri_cap);
+}
+EXPORT_SYMBOL_GPL(pci_pri_supported);
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index f75c307f346d..df54cd5b15db 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
  void pci_disable_pri(struct pci_dev *pdev);
  int pci_reset_pri(struct pci_dev *pdev);
  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+bool pci_pri_supported(struct pci_dev *pdev);
+#else
+static inline bool pci_pri_supported(struct pci_dev *pdev)
+{ return false; }
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID





Re: [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous

2020-07-23 Thread Nicolin Chen
Hi Christoph,

On Thu, Jul 23, 2020 at 02:01:33PM +0200, Christoph Hellwig wrote:
> Split out a cma_alloc_aligned helper to deal with the "interesting"
> calling conventions for cma_alloc, which then allows the main
> function to be written straightforwardly.  This also takes advantage
> of the fact that NULL dev arguments have been gone from the DMA API
> for a while.

I am still seeing some potential NULL dev callers:

drivers/staging/emxx_udc/emxx_udc.c:2596: ep->virt_buf = dma_alloc_coherent(NULL, PAGE_SIZE,
drivers/tty/synclink.c:3667: info->buffer_list = dma_alloc_coherent(NULL, BUFFERLISTSIZE, &info->buffer_list_dma_addr, GFP_KERNEL);
drivers/tty/synclink.c:3777: BufferList[i].virt_addr = dma_alloc_coherent(NULL, DMABUFFERSIZE, &BufferList[i].dma_addr, GFP_KERNEL);

Personally I don't feel that it hurts to check the dev pointer
as we do now, but if you are confident with this version:

Reviewed-by: Nicolin Chen 

Thanks


[PATCH 11/12] iommu/vt-d: Add page response ops support

2020-07-23 Thread Lu Baolu
After page requests are handled, software must respond to the device
which raised the page request with the result. This is done through
the iommu ops.page_response callback if the request was reported
outside of the vendor IOMMU driver through iommu_report_device_fault().
This adds the VT-d implementation of the page_response ops.
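
For illustration (a sketch, not part of this patch): a consumer that
received the request via iommu_report_device_fault() could answer it
roughly as below; the call reaches intel_svm_page_response() through
the new ops hook. The function name is mine and error handling is
omitted.

	#include <linux/iommu.h>

	static void reply_to_page_request(struct device *dev,
					  const struct iommu_fault_page_request *prm,
					  bool success)
	{
		struct iommu_page_response resp = {
			.version = IOMMU_PAGE_RESP_VERSION_1,
			.flags	 = IOMMU_PAGE_RESP_PASID_VALID,
			.pasid	 = prm->pasid,
			.grpid	 = prm->grpid,	/* must match the request group */
			.code	 = success ? IOMMU_PAGE_RESP_SUCCESS
					   : IOMMU_PAGE_RESP_INVALID,
		};

		/* Ends up in ops->page_response, i.e. the code below. */
		iommu_page_response(dev, &resp);
	}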

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/iommu.c |  1 +
 drivers/iommu/intel/svm.c   | 99 +
 include/linux/intel-iommu.h |  3 ++
 3 files changed, 103 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 097a71b3b75d..a939d17d40a1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -6073,6 +6073,7 @@ const struct iommu_ops intel_iommu_ops = {
.sva_bind   = intel_svm_bind,
.sva_unbind = intel_svm_unbind,
.sva_get_pasid  = intel_svm_get_pasid,
+   .page_response  = intel_svm_page_response,
 #endif
 };
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 140114ab1375..85ce8daa3177 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,102 @@ int intel_svm_get_pasid(struct iommu_sva *sva)
 
return pasid;
 }
+
+int intel_svm_page_response(struct device *dev,
+   struct iommu_fault_event *evt,
+   struct iommu_page_response *msg)
+{
+   struct iommu_fault_page_request *prm;
+   struct intel_svm_dev *sdev = NULL;
+   struct intel_svm *svm = NULL;
+   struct intel_iommu *iommu;
+   bool private_present;
+   bool pasid_present;
+   bool last_page;
+   u8 bus, devfn;
+   int ret = 0;
+   u16 sid;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   if (!msg || !evt)
+   return -EINVAL;
+
+   mutex_lock(&pasid_mutex);
+
+   prm = &evt->fault.prm;
+   sid = PCI_DEVID(bus, devfn);
+   pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   private_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+   if (!pasid_present) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (prm->pasid == 0 || prm->pasid >= PASID_MAX) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = pasid_to_svm_sdev(dev, prm->pasid, &svm, &sdev);
+   if (ret || !sdev) {
+   ret = -ENODEV;
+   goto out;
+   }
+
+   /*
+* For responses from userspace, need to make sure that the
+* pasid has been bound to its mm.
+*/
+   if (svm->flags & SVM_FLAG_GUEST_MODE) {
+   struct mm_struct *mm;
+
+   mm = get_task_mm(current);
+   if (!mm) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (mm != svm->mm) {
+   ret = -ENODEV;
+   mmput(mm);
+   goto out;
+   }
+
+   mmput(mm);
+   }
+
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must respond
+* with page group response if private data is present (PDP)
+* or last page in group (LPIG) bit is set. This is an
+* additional VT-d requirement beyond PCI ATS spec.
+*/
+   if (last_page || private_present) {
+   struct qi_desc desc;
+
+   desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
+   QI_PGRP_PASID_P(pasid_present) |
+   QI_PGRP_PDP(private_present) |
+   QI_PGRP_RESP_CODE(msg->code) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   if (private_present)
+   memcpy(&desc.qw2, prm->private_data,
+  sizeof(prm->private_data));
+
+   qi_submit_sync(iommu, &desc, 1, 0);
+   }
+out:
+   mutex_unlock(&pasid_mutex);
+   return ret;
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7e69aec60592..b1ed2f25f7c0 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -742,6 +742,9 @@ struct iommu_sva *intel_svm_bind(struct device *dev, struct 
mm_struct *mm,
 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
 int intel_svm_get_pasid(struct iommu_sva *handle);
+int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
+			    struct iommu_page_response *msg);

[PATCH 09/12] iommu/vt-d: Add a helper to get svm and sdev for pasid

2020-07-23 Thread Lu Baolu
Several places in the code need to look up the svm and sdev pointers
for a given PASID and device. Add a helper to do this, for code
consolidation and readability.
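
As a usage sketch (modeled on the call sites converted below; the
function name and error codes here are illustrative only), a caller
holds pasid_mutex and then checks both the return value and the two
returned pointers:

	static int example_lookup(struct device *dev, u32 pasid)
	{
		struct intel_svm_dev *sdev = NULL;
		struct intel_svm *svm = NULL;
		int ret;

		mutex_lock(&pasid_mutex);
		ret = pasid_to_svm_sdev(dev, pasid, &svm, &sdev);
		if (!ret && !svm)
			ret = -ENOENT;	/* PASID not bound at all */
		else if (!ret && !sdev)
			ret = -ENODEV;	/* bound, but not to this device */
		mutex_unlock(&pasid_mutex);

		return ret;
	}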

Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/svm.c | 115 +-
 1 file changed, 65 insertions(+), 50 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 65d2327dcd0d..c104a50a625c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -228,13 +228,57 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry((sdev), &(svm)->devs, list) \
if ((d) != (sdev)->dev) {} else
 
+static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
+struct intel_svm **rsvm,
+struct intel_svm_dev **rsdev)
+{
+   struct intel_svm_dev *d, *sdev = NULL;
+   struct intel_svm *svm;
+
+   /* The caller should hold the pasid_mutex lock */
+   if (WARN_ON(!mutex_is_locked(&pasid_mutex)))
+   return -EINVAL;
+
+   if (pasid == INVALID_IOASID || pasid >= PASID_MAX)
+   return -EINVAL;
+
+   svm = ioasid_find(NULL, pasid, NULL);
+   if (IS_ERR(svm))
+   return PTR_ERR(svm);
+
+   if (!svm)
+   goto out;
+
+   /*
+    * If we found svm for the PASID, there must be at least one device
+    * bound.
+    */
+   if (WARN_ON(list_empty(&svm->devs)))
+   return -EINVAL;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(d, &svm->devs, list) {
+   if (d->dev == dev) {
+   sdev = d;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+out:
+   *rsvm = svm;
+   *rsdev = sdev;
+
+   return 0;
+}
+
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
  struct iommu_gpasid_bind_data *data)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+   struct intel_svm_dev *sdev = NULL;
struct dmar_domain *dmar_domain;
-   struct intel_svm_dev *sdev;
-   struct intel_svm *svm;
+   struct intel_svm *svm = NULL;
int ret = 0;
 
if (WARN_ON(!iommu) || !data)
@@ -261,35 +305,23 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
dmar_domain = to_dmar_domain(domain);
 
	mutex_lock(&pasid_mutex);
-   svm = ioasid_find(NULL, data->hpasid, NULL);
-   if (IS_ERR(svm)) {
-   ret = PTR_ERR(svm);
+   ret = pasid_to_svm_sdev(dev, data->hpasid, &svm, &sdev);
+   if (ret)
goto out;
-   }
-
-   if (svm) {
-   /*
-* If we found svm for the PASID, there must be at
-* least one device bond, otherwise svm should be freed.
-*/
-   if (WARN_ON(list_empty(&svm->devs))) {
-   ret = -EINVAL;
-   goto out;
-   }
 
+   if (sdev) {
/*
 * Do not allow multiple bindings of the same device-PASID since
		 * there is only one SL page table per PASID. We may revisit
		 * this once sharing PGDs across domains is supported.
 */
-   for_each_svm_dev(sdev, svm, dev) {
-   dev_warn_ratelimited(dev,
-"Already bound with PASID %u\n",
-svm->pasid);
-   ret = -EBUSY;
-   goto out;
-   }
-   } else {
+   dev_warn_ratelimited(dev, "Already bound with PASID %u\n",
+svm->pasid);
+   ret = -EBUSY;
+   goto out;
+   }
+
+   if (!svm) {
+   /* We come here when the PASID has never been bound to a device. */
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
@@ -372,25 +404,17 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct intel_svm_dev *sdev;
struct intel_svm *svm;
-   int ret = -EINVAL;
+   int ret;
 
if (WARN_ON(!iommu))
return -EINVAL;
 
	mutex_lock(&pasid_mutex);
-   svm = ioasid_find(NULL, pasid, NULL);
-   if (!svm) {
-   ret = -EINVAL;
-   goto out;
-   }
-
-   if (IS_ERR(svm)) {
-   ret = PTR_ERR(svm);
+   ret = pasid_to_svm_sdev(dev, pasid, &svm, &sdev);
+   if (ret)
goto out;
-   }
 
-   for_each_svm_dev(sdev, svm, dev) {
-   ret = 0;
+   if (sdev) {
if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
sdev->users--;
if (!sdev->users) {
@@ -414,7 +438,6 @@ int 

[PATCH 00/12] [PULL REQUEST] iommu/vt-d: patches for v5.9

2020-07-23 Thread Lu Baolu
Hi Joerg,

The patches below have piled up for v5.9. The series includes:

 - Misc tweaks and fixes for vSVA
 - Report/response page request events
 - Cleanups

Can you please consider them for iommu/next?

Best regards,
Lu Baolu

Jacob Pan (4):
  iommu/vt-d: Remove global page support in devTLB flush
  iommu/vt-d: Fix PASID devTLB invalidation
  iommu/vt-d: Warn on out-of-range invalidation address
  iommu/vt-d: Disable multiple GPASID-dev bind

Liu Yi L (3):
  iommu/vt-d: Enforce PASID devTLB field mask
  iommu/vt-d: Handle non-page aligned address
  iommu/vt-d: Fix devTLB flush for vSVA

Lu Baolu (5):
  iommu/vt-d: Refactor device_to_iommu() helper
  iommu/vt-d: Add a helper to get svm and sdev for pasid
  iommu/vt-d: Report page request faults for guest SVA
  iommu/vt-d: Add page response ops support
  iommu/vt-d: Rename intel-pasid.h to pasid.h

 drivers/iommu/intel/debugfs.c |   2 +-
 drivers/iommu/intel/dmar.c|  25 +-
 drivers/iommu/intel/iommu.c   |  96 +++--
 drivers/iommu/intel/pasid.c   |  13 +-
 .../iommu/intel/{intel-pasid.h => pasid.h}|   2 +-
 drivers/iommu/intel/svm.c | 335 +-
 include/linux/intel-iommu.h   |  11 +-
 7 files changed, 337 insertions(+), 147 deletions(-)
 rename drivers/iommu/intel/{intel-pasid.h => pasid.h} (98%)

-- 
2.17.1



[PATCH 08/12] iommu/vt-d: Refactor device_to_iommu() helper

2020-07-23 Thread Lu Baolu
The device_to_iommu() helper is refactored in two ways:

- Make it global so that it can be used in other files.

- Make bus/devfn optional so that callers can ignore these two returned
values when they only want to get the corresponding iommu pointer.
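
For illustration, after this change a caller that only needs the iommu
pointer passes NULL for both out-parameters, while a caller that also
needs the source-id components still asks for them (sketch only):

	struct intel_iommu *iommu;
	u8 bus, devfn;

	/* Only the iommu pointer is needed; bus/devfn are ignored. */
	iommu = device_to_iommu(dev, NULL, NULL);

	/* The PRQ/response paths still want the source-id components. */
	iommu = device_to_iommu(dev, &bus, &devfn);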

Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/iommu.c | 55 +++--
 drivers/iommu/intel/svm.c   |  8 +++---
 include/linux/intel-iommu.h |  3 +-
 3 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 592869ce2a0f..097a71b3b75d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -779,16 +779,16 @@ is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
return false;
 }
 
-static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 
*devfn)
+struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
struct dmar_drhd_unit *drhd = NULL;
+   struct pci_dev *pdev = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
 
-   if (iommu_dummy(dev))
+   if (!dev || iommu_dummy(dev))
return NULL;
 
if (dev_is_pci(dev)) {
@@ -819,8 +819,10 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
if (pdev && pdev->is_virtfn)
goto got_pdev;
 
-   *bus = drhd->devices[i].bus;
-   *devfn = drhd->devices[i].devfn;
+   if (bus && devfn) {
+   *bus = drhd->devices[i].bus;
+   *devfn = drhd->devices[i].devfn;
+   }
goto out;
}
 
@@ -830,8 +832,10 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 
if (pdev && drhd->include_all) {
got_pdev:
-   *bus = pdev->bus->number;
-   *devfn = pdev->devfn;
+   if (bus && devfn) {
+   *bus = pdev->bus->number;
+   *devfn = pdev->devfn;
+   }
goto out;
}
}
@@ -5152,11 +5156,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
  struct device *dev)
 {
int ret;
-   u8 bus, devfn;
unsigned long flags;
struct intel_iommu *iommu;
 
-   iommu = device_to_iommu(dev, &bus, &devfn);
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return -ENODEV;
 
@@ -5242,9 +5245,8 @@ static int prepare_domain_attach_device(struct iommu_domain *domain,
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
int addr_width;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, &bus, &devfn);
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return -ENODEV;
 
@@ -5674,9 +5676,8 @@ static bool intel_iommu_capable(enum iommu_cap cap)
 static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 {
struct intel_iommu *iommu;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, &bus, &devfn);
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return ERR_PTR(-ENODEV);
 
@@ -5689,9 +5690,8 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 static void intel_iommu_release_device(struct device *dev)
 {
struct intel_iommu *iommu;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, &bus, &devfn);
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return;
 
@@ -5841,37 +5841,14 @@ static struct iommu_group *intel_iommu_device_group(struct device *dev)
return generic_device_group(dev);
 }
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
-struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
-{
-   struct intel_iommu *iommu;
-   u8 bus, devfn;
-
-   if (iommu_dummy(dev)) {
-   dev_warn(dev,
-"No IOMMU translation for device; cannot enable 
SVM\n");
-   return NULL;
-   }
-
-   iommu = device_to_iommu(dev, &bus, &devfn);
-   if ((!iommu)) {
-   dev_err(dev, "No IOMMU for device; cannot enable SVM\n");
-   return NULL;
-   }
-
-   return iommu;
-}
-#endif /* CONFIG_INTEL_IOMMU_SVM */
-
 static int intel_iommu_enable_auxd(struct device *dev)
 {
struct device_domain_info *info;
struct intel_iommu *iommu;
unsigned long flags;
-   u8 bus, devfn;
int ret;
 
-   iommu = device_to_iommu(dev, &bus, &devfn);
+   iommu = device_to_iommu(dev, NULL, NULL);

[PATCH 03/12] iommu/vt-d: Fix PASID devTLB invalidation

2020-07-23 Thread Lu Baolu
From: Jacob Pan 

DevTLB flush can be used for DMA requests both with and without PASIDs.
Requests without PASID use PASID#0 (RID2PASID); SVA usage relies on
non-zero PASIDs.

This patch adds a check of the PASID value so that the devTLB flush with
PASID is used for the SVA case. This is more efficient because multiple
PASIDs can be used by a single device: when tearing down a PASID entry,
we shall flush only the devTLB entries specific to that PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/pasid.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..fa0154cce537 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qdep = info->ats_qdep;
pfsid = info->pfsid;
 
-   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+   /*
+* When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+* devTLB flush w/o PASID should be used. For non-zero PASID under
+* SVA usage, device could do DMA with multiple PASIDs. It is more
+* efficient to flush devTLB specific to the PASID.
+*/
+   if (pasid == PASID_RID2PASID)
+           qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+   else
+           qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
 }
 
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
-- 
2.17.1



[PATCH 06/12] iommu/vt-d: Warn on out-of-range invalidation address

2020-07-23 Thread Lu Baolu
From: Jacob Pan 

For guest-requested IOTLB invalidation, the address and mask are provided
as part of the invalidation data. VT-d HW silently ignores any address
bits below the mask. SW shall also allow such a case, but give a warning
if the address does not align with the mask. This patch relaxes the fault
handling from error to warning and proceeds with the invalidation request
using the given mask.
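
As a worked example of the check (numbers mine, not from the patch):
with VTD_PAGE_SHIFT = 12 and size order 2, the mask covers address bits
[13:0], so 0x12345400 trips the warning while 0x12344000 does not; HW
invalidates the same 16KB-aligned range either way.

	/* Illustration of the alignment test used below. */
	u64 addr = 0x12345400;	/* example guest-supplied address */
	u64 size = 2;		/* 2^(12 + 2) = 16KB granule */

	if (addr & ((1ULL << (VTD_PAGE_SHIFT + size)) - 1))
		pr_warn("address 0x%llx not aligned to size order %llu\n",
			addr, size);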

Fixes: 6ee1b77ba3ac0 ("iommu/vt-d: Add svm/sva invalidate function")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 281b32b233d8..592869ce2a0f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5453,13 +5453,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
+   /* HW will ignore LSB bits based on address mask */
		if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
		    size &&
		    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
-			pr_err_ratelimited("Address out of range, 0x%llx, size order %llu\n",
+			pr_err_ratelimited("User address not aligned, 0x%llx, size order %llu\n",
					   inv_info->addr_info.addr, size);
-			ret = -ERANGE;
-			goto out_unlock;
		}
 
/*
-- 
2.17.1



[PATCH 07/12] iommu/vt-d: Disable multiple GPASID-dev bind

2020-07-23 Thread Lu Baolu
From: Jacob Pan 

For the unlikely use case where multiple aux domains from the same pdev
are attached to a single guest and then bound to a single process
(thus the same PASID) within that guest, we cannot easily support it
by refcounting the number of users: there is only one SL page table
per PASID, while we have multiple aux domains and thus would need
multiple SL page tables for the same PASID.

Extra unbinding of a guest PASID can happen due to a race between the
normal and the exception cases. Termination of one aux domain may
affect the others unless we actively track and switch aux domains to
ensure the validity of the SL page tables and TLB states in the shared
PASID entry.

Support for sharing second-level PGDs across domains would reduce the
complexity, but this is not available due to the limitations of the
VFIO container architecture. We can revisit this decision once sharing
PGDs is available.

Overall, the complexity and the potential glitches do not warrant this
unlikely use case, which is thereby removed by this patch.

Fixes: 56722a4398a30 ("iommu/vt-d: Add bind guest PASID support")
Cc: Kevin Tian 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..d386853121a2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -277,20 +277,16 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
goto out;
}
 
+   /*
+    * Do not allow multiple bindings of the same device-PASID since
+    * there is only one SL page table per PASID. We may revisit this
+    * once sharing PGDs across domains is supported.
+*/
for_each_svm_dev(sdev, svm, dev) {
-   /*
-* For devices with aux domains, we should allow
-* multiple bind calls with the same PASID and pdev.
-*/
-   if (iommu_dev_feature_enabled(dev,
- IOMMU_DEV_FEAT_AUX)) {
-   sdev->users++;
-   } else {
-   dev_warn_ratelimited(dev,
-"Already bound with PASID 
%u\n",
-svm->pasid);
-   ret = -EBUSY;
-   }
+   dev_warn_ratelimited(dev,
+"Already bound with PASID %u\n",
+svm->pasid);
+   ret = -EBUSY;
goto out;
}
} else {
-- 
2.17.1



[PATCH 10/12] iommu/vt-d: Report page request faults for guest SVA

2020-07-23 Thread Lu Baolu
A PASID might be bound to a page table from a VM guest via the iommu
ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
on the physical IOMMU, we need to inject the page fault request into
the guest. After the guest completes handling the page fault, a page
response needs to be sent back via the iommu ops.page_response().

This adds support for reporting a page request fault. Any external
module which is interested in handling this fault should register a
handler with iommu_register_device_fault_handler().
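
For context, a sketch of the consumer side (not part of this patch; the
handler name is illustrative): the handler registered below receives the
iommu_fault that intel_svm_prq_report() fills in, forwards it to the
guest, and later replies through iommu_page_response().

	static int my_fault_handler(struct iommu_fault *fault, void *data)
	{
		if (fault->type != IOMMU_FAULT_PAGE_REQ)
			return -EOPNOTSUPP;	/* not handled here */

		/* Forward fault->prm to the guest; reply asynchronously
		 * via iommu_page_response() once the guest answers. */
		return 0;
	}

	/* At device setup time: */
	ret = iommu_register_device_fault_handler(dev, my_fault_handler, NULL);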

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/svm.c | 103 +++---
 1 file changed, 85 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c104a50a625c..140114ab1375 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -811,8 +811,63 @@ static void intel_svm_drain_prq(struct device *dev, int pasid)
}
 }
 
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+   int prot = 0;
+
+   if (req->rd_req)
+   prot |= IOMMU_FAULT_PERM_READ;
+   if (req->wr_req)
+   prot |= IOMMU_FAULT_PERM_WRITE;
+   if (req->exe_req)
+   prot |= IOMMU_FAULT_PERM_EXEC;
+   if (req->pm_req)
+   prot |= IOMMU_FAULT_PERM_PRIV;
+
+   return prot;
+}
+
+static int
+intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+{
+   struct iommu_fault_event event;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   /* Fill in event data for device specific processing */
+   memset(&event, 0, sizeof(struct iommu_fault_event));
+   event.fault.type = IOMMU_FAULT_PAGE_REQ;
+   event.fault.prm.addr = desc->addr;
+   event.fault.prm.pasid = desc->pasid;
+   event.fault.prm.grpid = desc->prg_index;
+   event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+   if (desc->lpig)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   if (desc->pasid_present) {
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+   }
+   if (desc->priv_data_present) {
+   /*
+    * Set the last-page-in-group bit if private data is present:
+    * a page response is then required, just as it is for LPIG.
+* iommu_report_device_fault() doesn't understand this vendor
+* specific requirement thus we set last_page as a workaround.
+*/
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   memcpy(event.fault.prm.private_data, desc->priv_data,
+  sizeof(desc->priv_data));
+   }
+
+   return iommu_report_device_fault(dev, &event);
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
+   struct intel_svm_dev *sdev = NULL;
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
int head, tail, handled = 0;
@@ -824,7 +879,6 @@ static irqreturn_t prq_event_thread(int irq, void *d)
tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
while (head != tail) {
-   struct intel_svm_dev *sdev;
struct vm_area_struct *vma;
struct page_req_dsc *req;
struct qi_desc resp;
@@ -860,6 +914,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
}
}
 
+   if (!sdev || sdev->sid != req->rid) {
+   struct intel_svm_dev *t;
+
+   sdev = NULL;
+   rcu_read_lock();
+   list_for_each_entry_rcu(t, &svm->devs, list) {
+   if (t->sid == req->rid) {
+   sdev = t;
+   break;
+   }
+   }
+   rcu_read_unlock();
+   }
+
result = QI_RESP_INVALID;
/* Since we're using init_mm.pgd directly, we should never take
 * any faults on kernel addresses. */
@@ -870,6 +938,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (!is_canonical_address(address))
goto bad_req;
 
+   /*
+* If prq is to be handled outside iommu driver via receiver of
+* the fault notifiers, we skip the page response here.
+*/
+   if (svm->flags & SVM_FLAG_GUEST_MODE) {
+   if (sdev && !intel_svm_prq_report(sdev->dev, 

[PATCH 12/12] iommu/vt-d: Rename intel-pasid.h to pasid.h

2020-07-23 Thread Lu Baolu
As the Intel VT-d files have been moved to their own subdirectory,
the "intel-" prefix makes no sense anymore. No functional changes.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/debugfs.c  | 2 +-
 drivers/iommu/intel/iommu.c| 2 +-
 drivers/iommu/intel/pasid.c| 2 +-
 drivers/iommu/intel/{intel-pasid.h => pasid.h} | 2 +-
 drivers/iommu/intel/svm.c  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)
 rename drivers/iommu/intel/{intel-pasid.h => pasid.h} (98%)

diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index cf1ebb98e418..efea7f02abd9 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -15,7 +15,7 @@
 
 #include 
 
-#include "intel-pasid.h"
+#include "pasid.h"
 
 struct tbl_walk {
u16 bus;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a939d17d40a1..ca557d351518 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -48,7 +48,7 @@
 #include 
 
 #include "../irq_remapping.h"
-#include "intel-pasid.h"
+#include "pasid.h"
 
 #define ROOT_SIZE  VTD_PAGE_SIZE
 #define CONTEXT_SIZE   VTD_PAGE_SIZE
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index fa0154cce537..e6faedf42fd4 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -19,7 +19,7 @@
 #include 
 #include 
 
-#include "intel-pasid.h"
+#include "pasid.h"
 
 /*
  * Intel IOMMU system wide PASID name space:
diff --git a/drivers/iommu/intel/intel-pasid.h b/drivers/iommu/intel/pasid.h
similarity index 98%
rename from drivers/iommu/intel/intel-pasid.h
rename to drivers/iommu/intel/pasid.h
index c5318d40e0fa..c9850766c3a9 100644
--- a/drivers/iommu/intel/intel-pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * intel-pasid.h - PASID idr, table and entry header
+ * pasid.h - PASID idr, table and entry header
  *
  * Copyright (C) 2018 Intel Corporation
  *
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 85ce8daa3177..442623ac4b47 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 
-#include "intel-pasid.h"
+#include "pasid.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 static void intel_svm_drain_prq(struct device *dev, int pasid);
-- 
2.17.1



[PATCH 05/12] iommu/vt-d: Fix devTLB flush for vSVA

2020-07-23 Thread Lu Baolu
From: Liu Yi L 

For guest SVA usage, in order to optimize for fewer VM exits, a guest
request for an IOTLB flush also includes the device TLB.

On the host side, the IOMMU driver performs both the IOTLB and the
implicit devTLB invalidation. When PASID-selective granularity is
requested by the guest, we need to derive the equivalent address range
for the devTLB instead of using the address information in the UAPI
data. The reason is that, unlike the IOTLB flush, the devTLB flush does
not support PASID-selective granularity. That is, we need to set the
following in the PASID-based devTLB invalidation descriptor:
- the entire 64-bit address range ~(0x1 << 63)
- S bit = 1 (VT-d ch. 6.5.2.6).

Without this fix, the device TLB flush range is not set properly for
PASID-selective granularity. This patch also merges the devTLB flush
code for the implicit and explicit cases.
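
To make the encoding concrete (an illustration of the hunk below): the
full 64-bit range is expressed as address 0 with a size order of
64 - VTD_PAGE_SHIFT (52 with 4KB pages), which makes
qi_flush_dev_iotlb_pasid() set the S bit and all address bits below the
size order to 1:

	/* PASID-selective granularity: flush the whole address space. */
	u64 addr = 0;
	u64 size = 64 - VTD_PAGE_SHIFT;		/* 2^52 pages = 2^64 bytes */

	qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid, pasid,
				 info->ats_qdep, addr, size);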

Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 888697dd240a..281b32b233d8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5422,7 +5422,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
sid = PCI_DEVID(bus, devfn);
 
/* Size is only valid in address selective invalidation */
-   if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
+   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
size = to_vtd_size(inv_info->addr_info.granule_size,
   inv_info->addr_info.nb_granules);
 
@@ -5431,6 +5431,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 IOMMU_CACHE_INV_TYPE_NR) {
int granu = 0;
u64 pasid = 0;
+   u64 addr = 0;
 
granu = to_vtd_granularity(cache_type, inv_info->granularity);
if (granu == -EINVAL) {
@@ -5470,24 +5471,34 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
				(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
				inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
+   if (!info->ats_enabled)
+   break;
/*
 * Always flush device IOTLB if ATS is enabled. vIOMMU
 * in the guest may assume IOTLB flush is inclusive,
 * which is more efficient.
 */
-   if (info->ats_enabled)
-   qi_flush_dev_iotlb_pasid(iommu, sid,
-   info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
-   size);
-   break;
+   fallthrough;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+   /*
+* PASID based device TLB invalidation does not support
+* IOMMU_INV_GRANU_PASID granularity but only supports
+* IOMMU_INV_GRANU_ADDR.
+* The equivalent of that is we set the size to be the
+* entire range of 64 bit. User only provides PASID info
+* without address info. So we set addr to 0.
+*/
+   if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+   size = 64 - VTD_PAGE_SHIFT;
+   addr = 0;
+           } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+   addr = inv_info->addr_info.addr;
+   }
+
if (info->ats_enabled)
qi_flush_dev_iotlb_pasid(iommu, sid,
info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
+   info->ats_qdep, addr,
size);
else
pr_warn_ratelimited("Passdown device IOTLB 
flush w/o ATS!\n");
-- 
2.17.1



[PATCH 02/12] iommu/vt-d: Remove global page support in devTLB flush

2020-07-23 Thread Lu Baolu
From: Jacob Pan 

Global page support was removed from the VT-d spec 3.0 for dev TLB
invalidation. This patch removes the bits for vSVA. A similar change
was already made for native SVA; see the link below.

Link: https://lore.kernel.org/linux-iommu/20190830142919.ge11...@8bytes.org/T/
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/dmar.c  | 4 +---
 drivers/iommu/intel/iommu.c | 4 ++--
 include/linux/intel-iommu.h | 3 +--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 16f47041f1bf..a46d8d4abdb8 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1439,8 +1439,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
 
 /* PASID-based device IOTLB Invalidate */
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
- u32 pasid,  u16 qdep, u64 addr,
- unsigned int size_order, u64 granu)
+			      u32 pasid, u16 qdep, u64 addr, unsigned int size_order)
 {
unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
@@ -1448,7 +1447,6 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
QI_DEV_IOTLB_PFSID(pfsid);
-   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
 
/*
 * If S bit is 0, we only flush a single page. If S bit is set,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc158250f45e..888697dd240a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5480,7 +5480,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
break;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
if (info->ats_enabled)
@@ -5488,7 +5488,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
else
pr_warn_ratelimited("Passdown device IOTLB 
flush w/o ATS!\n");
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 711bdca975be..383847b859a1 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -381,7 +381,6 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xfffff) << 32)
#define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
@@ -707,7 +706,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
  u32 pasid, u16 qdep, u64 addr,
- unsigned int size_order, u64 granu);
+ unsigned int size_order);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
  int pasid);
 
-- 
2.17.1



[PATCH 01/12] iommu/vt-d: Enforce PASID devTLB field mask

2020-07-23 Thread Lu Baolu
From: Liu Yi L 

Set proper masks to avoid invalid input spillover to reserved bits.
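
As a worked illustration of the spillover (values mine, not from the
patch): the PASID field occupies bits 51:32 of the descriptor, so an
unmasked input with bits above bit 19 set would leak into bits 52+,
which belong to other fields and to reserved bits:

	u32 p = 0x200001;			/* bogus input, bit 21 set */
	u64 bad  = ((u64)p) << 32;		/* old macro: sets bit 53 */
	u64 good = ((u64)(p & 0xfffff)) << 32;	/* new macro: 20 bits only */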

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 04bd9279c3fb..711bdca975be 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -381,8 +381,8 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
-#define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
+#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
+#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xfffff) << 32)
#define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16)
#define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xfff) << 12) | \
	((u64)((pfsid >> 12) & 0xf) << 52))
-- 
2.17.1



[PATCH 04/12] iommu/vt-d: Handle non-page aligned address

2020-07-23 Thread Lu Baolu
From: Liu Yi L 

Address information for device TLB invalidation comes from userspace
when the device is directly assigned to a guest with vIOMMU support.
VT-d requires a page-aligned address. This patch checks and enforces
that the address is page aligned; otherwise, reserved bits can be set
in the invalidation descriptor, and an unrecoverable fault would be
reported due to the non-zero value in the reserved bits.
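
A worked example of the qw1 encoding built below (numbers mine):
flushing 4 pages (size_order = 2) at addr 0x12340000 sets bits [13:12],
then clears bit 13 and sets the S bit, which yields the standard ATS
range encoding: a run of 1s ending in a 0 at the 16KB boundary bit.

	u64 addr = 0x12340000, qw1;
	int size_order = 2;			/* 4 pages = 16KB */
	u64 mask = 1ULL << (VTD_PAGE_SHIFT + size_order - 1);

	qw1  = addr & VTD_PAGE_MASK;		/* QI_DEV_EIOTLB_ADDR() */
	qw1 |= GENMASK_ULL(VTD_PAGE_SHIFT + size_order - 1, VTD_PAGE_SHIFT);
	qw1 &= ~mask;				/* clear bit 13 -> 0x12341000 */
	qw1 |= QI_DEV_EIOTLB_SIZE;		/* S bit: more than one page */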

Fixes: 61a06a16e36d8 ("iommu/vt-d: Support flushing more translation cache types")
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/dmar.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index a46d8d4abdb8..93e6345f3414 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1457,9 +1457,26 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 * Max Invs Pending (MIP) is set to 0 for now until we have DIT in
 * ECAP.
 */
-   desc.qw1 |= addr & ~mask;
-   if (size_order)
+   if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
+   pr_warn_ratelimited("Invalidate non-aligned address %llx, order 
%d\n",
+   addr, size_order);
+
+   /* Take page address */
+   desc.qw1 = QI_DEV_EIOTLB_ADDR(addr);
+
+   if (size_order) {
+   /*
+* Existing 0s in address below size_order may be the least
+* significant bit, we must set them to 1s to avoid having
+* smaller size than desired.
+*/
+   desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT - 1,
+   VTD_PAGE_SHIFT);
+   /* Clear size_order bit to indicate size */
+   desc.qw1 &= ~mask;
+   /* Set the S bit to indicate flushing more than 1 page */
desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+   }
 
	qi_submit_sync(iommu, &desc, 1, 0);
 }
-- 
2.17.1



Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 03:37:29PM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extend the same support as
> the PASID feature discovery (pci_pasid_features) to PRI.
> 
> Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
> Signed-off-by: Ashok Raj 

This looks right to me, but I would like Joerg's ack before applying
it.

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
> v3: Added Fixes tag
> v2: Fixed build failure reported from lkp when CONFIG_PRI=n
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 13 +
>  include/linux/pci-ats.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..2e6cf0c700f7 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns true if PRI capability is present, false otherwise.
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..df54cd5b15db 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
> +#else
> +static inline bool pci_pri_supported(struct pci_dev *pdev)
> +{ return false; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 


[PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-23 Thread Ashok Raj
PASID and PRI capabilities are only enumerated in PF devices. VF devices
do not enumerate these capabilities. IOMMU drivers also need to enumerate
them before enabling features in the IOMMU. Extend the same support as
the PASID feature discovery (pci_pasid_features) to PRI.
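
For illustration (a sketch, not from this patch): with the new helper, a
check that previously failed on a VF, because the PRI capability lives
only in the PF's config space, now succeeds by deferring to the PF:

	/* Works for both PFs and VFs; for a VF the PF answers. */
	struct pci_dev *pf = pci_physfn(pdev);	/* pdev itself if not a VF */

	if (pci_pri_supported(pdev))		/* true iff pf->pri_cap != 0 */
		dev_info(&pdev->dev, "PRI available via %s\n", pci_name(pf));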

Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
Signed-off-by: Ashok Raj 

To: Bjorn Helgaas 
To: Joerg Roedel 
To: Lu Baolu 
Cc: sta...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: Ashok Raj 
Cc: iommu@lists.linux-foundation.org
---
v3: Added Fixes tag
v2: Fixed build failure reported from lkp when CONFIG_PRI=n

 drivers/iommu/intel/iommu.c |  2 +-
 drivers/pci/ats.c   | 13 +
 include/linux/pci-ats.h |  4 
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..276452f5e6a7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
}
 
if (info->ats_supported && ecap_prs(iommu->ecap) &&
-   pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
+   pci_pri_supported(pdev))
info->pri_supported = 1;
}
}
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index b761c1f72f67..2e6cf0c700f7 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 
return pdev->pasid_required;
 }
+
+/**
+ * pci_pri_supported - Check if PRI is supported.
+ * @pdev: PCI device structure
+ *
+ * Returns true if PRI capability is present, false otherwise.
+ */
+bool pci_pri_supported(struct pci_dev *pdev)
+{
+   /* VFs share the PF PRI configuration */
+   return !!(pci_physfn(pdev)->pri_cap);
+}
+EXPORT_SYMBOL_GPL(pci_pri_supported);
 #endif /* CONFIG_PCI_PRI */
 
 #ifdef CONFIG_PCI_PASID
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index f75c307f346d..df54cd5b15db 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
 void pci_disable_pri(struct pci_dev *pdev);
 int pci_reset_pri(struct pci_dev *pdev);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+bool pci_pri_supported(struct pci_dev *pdev);
+#else
+static inline bool pci_pri_supported(struct pci_dev *pdev)
+{ return false; }
 #endif /* CONFIG_PCI_PRI */
 
 #ifdef CONFIG_PCI_PASID
-- 
2.7.4



Re: [PATCH 18/21] iommu/mediatek: Add support for multi domain

2020-07-23 Thread Rob Herring
On Sat, Jul 11, 2020 at 02:48:43PM +0800, Yong Wu wrote:
> Some HW IPs (e.g. CCU) require a special IOVA range. That means the
> IOVA obtained from dma_alloc_attrs for those devices must be located
> in their special range. In this patch, we allocate a special
> iova_range for each such requirement and create an IOMMU domain for
> each iova_range.
> 
> Meanwhile we still use one pagetable which supports 16GB of IOVA.
> 
> After this patch, if the IOVA range of a master is over 4G, the master
> should:
> a) Declare its special dma-ranges in its dtsi node. For example, if we
> preassign the IOVA 4G-8G for vcodec, then the vcodec dtsi node should
> have:
>   dma-ranges = <0x1 0x0 0x1 0x0 0x1 0x0>;  /* 4G ~ 8G */

BTW, dma-ranges should be in the parent node of the vcodec.

> b) Update the dma_mask:
>  dma_set_mask_and_coherent(dev, DMA_BIT_MASK(33));

This should happen for you automatically. The DMA PFN offset 
should also be 4GB here.

> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 49 ---
>  drivers/iommu/mtk_iommu.h |  3 ++-
>  2 files changed, 42 insertions(+), 10 deletions(-)


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 10:38:19AM -0700, Raj, Ashok wrote:
> Hi Bjorn
> 
> On Tue, Jul 21, 2020 at 09:54:01AM -0500, Bjorn Helgaas wrote:
> > On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> > > PASID and PRI capabilities are only enumerated in PF devices. VF devices
> > > do not enumerate these capabilities. IOMMU drivers also need to enumerate
> > > them before enabling features in the IOMMU. Extending the same support as
> > > PASID feature discovery (pci_pasid_features) for PRI.
> > > 
> > > Signed-off-by: Ashok Raj 
> > 
> > Hi Ashok,
> > 
> > When you update this for the 0-day implicit declaration thing, can you
> > update the subject to say what the patch *does*, as opposed to what it
> > is solving?  Also, no need for a period at the end.
> 
> Yes, will update and resend. Goofed up a couple things, i'll update those
> as well.
> 
> > Does this fix a regression?  Is it associated with a commit that we
> > could add as a "Fixes:" tag so we know how far back to try to apply
> > to stable kernels?
> 
> Yes, 

Does that mean "yes, this fixes a regression"?

> but the iommu files have moved location, and git's Fixes tag generation
> only covers a handful of recent commits and doesn't show the old ones.

Not sure how to interpret the rest of this.  I'm happy to include the
SHA1 of the original commit that added the regression, even if the
file has moved since then.

Bjorn


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-23 Thread Raj, Ashok
Hi Bjorn

On Tue, Jul 21, 2020 at 09:54:01AM -0500, Bjorn Helgaas wrote:
> On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> > PASID and PRI capabilities are only enumerated in PF devices. VF devices
> > do not enumerate these capabilities. IOMMU drivers also need to enumerate
> > them before enabling features in the IOMMU. Extending the same support as
> > PASID feature discovery (pci_pasid_features) for PRI.
> > 
> > Signed-off-by: Ashok Raj 
> 
> Hi Ashok,
> 
> When you update this for the 0-day implicit declaration thing, can you
> update the subject to say what the patch *does*, as opposed to what it
> is solving?  Also, no need for a period at the end.

Yes, will update and resend. Goofed up a couple things, i'll update those
as well.


> 
> Does this fix a regression?  Is it associated with a commit that we
> could add as a "Fixes:" tag so we know how far back to try to apply
> to stable kernels?

Yes, but the iommu files have moved location, and git's Fixes tag
generation only covers a handful of recent commits and doesn't show the
old ones.

Cheers,
Ashok


[PATCH v6 1/6] docs: IOMMU user API

2020-07-23 Thread Jacob Pan
IOMMU UAPI is newly introduced to support communications between the
guest virtual IOMMU and the host IOMMU. There have been many discussions
on how it should work with the VFIO UAPI and with userspace in general.

This document is intended to clarify the UAPI design and usage. The
mechanics of how future extensions should be achieved are also covered
in this documentation.
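
As a concrete illustration of the argsz convention described below (a
userspace-side sketch; the exact field layout follows this series, and
the VFIO plumbing is omitted):

	#include <string.h>
	#include <linux/iommu.h>

	struct iommu_cache_invalidate_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);	/* caller declares how much it passes */
	info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
	info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
	info.granularity = IOMMU_INV_GRANU_PASID;
	/* ...fill the PASID granularity member, then hand the structure
	 * to the kernel through the user driver framework (e.g. VFIO). */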

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 Documentation/userspace-api/iommu.rst | 231 ++
 MAINTAINERS   |   1 +
 2 files changed, 232 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst 
b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..0821b263edc1
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,231 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=
+IOMMU Userspace API
+=
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For baremetal
+usage, the IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein the vIOMMU implementation
+relies on the physical IOMMU and for this reason requires interactions
+with the host driver.
+
+.. contents:: :local:
+
+Functionalities
+===
+Communication between user space and the kernel goes in both
+directions. The supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM SMMU)
+4. Invalidate IOMMU caches requested by guests
+5. Report errors to the guest and serve page requests
+
+Requirements
+
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+==
+Although the data structures defined in the IOMMU UAPI are
+self-contained, no user API functions are introduced. Instead, the
+IOMMU UAPI is designed to work with existing user driver frameworks
+such as VFIO.
+
+Extension Rules & Precautions
+-
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase the structure sizes.
+
+No new fields can be added *after* the variable-sized union, since that
+would break backward compatibility when the offset moves. A new flag
+must be introduced whenever a change affects the structure using either
+method. The IOMMU driver processes the data based on the flags, which
+ensures backward compatibility.
+
+The version field is only reserved for the unlikely event of the UAPI
+being upgraded in its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed, by setting argsz appropriately.
+At the same time, argsz is user-provided data and is not trusted. The
+argsz field allows the user app to indicate how much data it is
+providing; it's still the kernel's responsibility to validate whether
+the data is correct and sufficient for the requested operation.
+
+Compatibility Checking
+--
+When IOMMU UAPI extension results in some structure size increase,
+IOMMU UAPI code shall handle the following cases:
+
+1. User and kernel have an exact size match
+2. An older user with an older kernel header (smaller UAPI size)
+   running on a newer kernel (larger UAPI size)
+3. A newer user with a newer kernel header (larger UAPI size) running
+   on an older kernel
+4. A malicious/misbehaving user passing an illegal/invalid size that is
+   still within range; the data may contain garbage
+
+Feature Checking
+
+While launching a guest with a vIOMMU, it is important to ensure that
+the host can support the UAPI data structures to be used for
+vIOMMU-pIOMMU communications. Without upfront compatibility checking,
+future failures are difficult to report even in normal conditions.
+A capability getter is offered to discover the feature support and any
+potential API incompatibility at an early stage. Detecting this later,
+when performing the actual cache/descriptor handling operations, would
+be tricky and difficult to escalate to the guest.
+
+For example, IOTLB invalidations should always succeed. There is no
+architectural way to report back to the vIOMMU if the UAPI data is
+incompatible. If that happens, in order to guarantee IOMMU isolation,
+we have to resort to not giving completion status in the vIOMMU. This
+may result in a VM hang.
+
+For this reason the following IOMMU UAPIs cannot fail without
+catastrophic effect:
+
+1. Free PASID
+2. 

[PATCH v6 6/6] iommu/vt-d: Check UAPI data processed by IOMMU core

2020-07-23 Thread Jacob Pan
The generic IOMMU layer already sanity-checks the UAPI data for a
version match and for the argsz range of the generic information.
Remove the redundant version check from the VT-d driver and check the
vendor-specific data size instead.
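
For reference (a sketch of the pattern used in the hunk below; the
struct layout follows this series): offsetofend(T, m) expands to
offsetof(T, m) plus the size of the member, so argsz must cover the
generic header and the VT-d union member, without requiring any later,
larger union members:

	/* Reject a user buffer too small to hold the VT-d data. */
	if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
		return -EINVAL;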

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 3 +--
 drivers/iommu/intel/svm.c   | 7 +--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 021f62078f52..7e03cca31a0e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5391,8 +5391,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
int ret = 0;
u64 size = 0;
 
-   if (!inv_info || !dmar_domain ||
-   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   if (!inv_info || !dmar_domain)
return -EINVAL;
 
if (!dev || !dev_is_pci(dev))
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 713b3a218483..55ea11e9c0f5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -240,8 +240,11 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
if (WARN_ON(!iommu) || !data)
return -EINVAL;
 
-   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
-   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   /* IOMMU core ensures argsz is more than the start of the union */
+   if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
return -EINVAL;
 
if (!dev_is_pci(dev))
-- 
2.7.4



[PATCH v6 4/6] iommu/uapi: Rename uapi functions

2020-07-23 Thread Jacob Pan
User APIs such as iommu_sva_unbind_gpasid() may also be used by the
kernel. Since we introduced a user pointer to the UAPI functions,
in-kernel callers cannot share the same APIs. In-kernel callers are
also trusted, so there is no need to validate the data.

This patch renames all UAPI functions with an iommu_uapi_ prefix such
that the intended callers are clear.

Suggested-by: Alex Williamson 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 18 +-
 include/linux/iommu.h | 31 ---
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b6858adc4f17..3a913ce94a3d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1950,35 +1950,35 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
-int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
+   struct iommu_cache_invalidate_info *inv_info)
 {
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
return domain->ops->cache_invalidate(domain, dev, inv_info);
 }
-EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+EXPORT_SYMBOL_GPL(iommu_uapi_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-  struct device *dev, struct iommu_gpasid_bind_data 
*data)
+int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+  struct device *dev, struct 
iommu_gpasid_bind_data *data)
 {
if (unlikely(!domain->ops->sva_bind_gpasid))
return -ENODEV;
 
return domain->ops->sva_bind_gpasid(domain, dev, data);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-ioasid_t pasid)
+int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device 
*dev,
+ioasid_t pasid)
 {
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
return domain->ops->sva_unbind_gpasid(dev, pasid);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
 
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..2dcc1a33f6dc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -430,13 +430,13 @@ extern int iommu_attach_device(struct iommu_domain 
*domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
-extern int iommu_cache_invalidate(struct iommu_domain *domain,
- struct device *dev,
- struct iommu_cache_invalidate_info *inv_info);
-extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-   struct device *dev, struct iommu_gpasid_bind_data *data);
-extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-   struct device *dev, ioasid_t pasid);
+extern int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+  struct device *dev,
+  struct iommu_cache_invalidate_info 
*inv_info);
+extern int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+ struct device *dev, struct 
iommu_gpasid_bind_data *data);
+extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1054,21 +1054,22 @@ static inline int iommu_sva_get_pasid(struct iommu_sva 
*handle)
return IOMMU_PASID_INVALID;
 }
 
-static inline int
-iommu_cache_invalidate(struct iommu_domain *domain,
-  struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+static inline int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+ struct device *dev,
+ struct 
iommu_cache_invalidate_info *inv_info)
 {
return -ENODEV;
 }
-static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-   struct device *dev, struct 
iommu_gpasid_bind_data *data)
+
+static inline int 

[PATCH v6 5/6] iommu/uapi: Handle data and argsz filled by users

2020-07-23 Thread Jacob Pan
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.

IOMMU UAPI data has a user-filled argsz field which indicates the data
length of the structure. User data is not trusted; argsz must be
validated against the current kernel data size, the mandatory data size,
and the feature flags.

User data may also be extended, resulting in a possible argsz increase.
Backward compatibility is ensured by checking the size and flags (or
their functional equivalents).

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 202 --
 include/linux/iommu.h |  28 ---
 2 files changed, 213 insertions(+), 17 deletions(-)
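
For reference, a sketch of the matching user side (assuming the UAPI
header from this series): the caller always sets argsz to the size of
the structure it was compiled against.

#include <string.h>
#include <linux/iommu.h>

static void fill_inv_info(struct iommu_cache_invalidate_info *info)
{
	memset(info, 0, sizeof(*info));
	/* The size of the struct this user was built against */
	info->argsz = sizeof(*info);
	info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
	info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
	info->granularity = IOMMU_INV_GRANU_DOMAIN;
}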

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3a913ce94a3d..1ce2a61058c6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1950,33 +1950,219 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+/*
+ * Check flags and other user provided data for valid combinations. We also
+ * make sure no reserved fields or unused flags are set. This is to ensure
+ * not breaking userspace in the future when these fields or flags are used.
+ */
+static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
*info)
+{
+   u32 mask;
+   int i;
+
+   if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
+   if (info->cache & ~mask)
+   return -EINVAL;
+
+   if (info->granularity >= IOMMU_INV_GRANU_NR)
+   return -EINVAL;
+
+   switch (info->granularity) {
+   case IOMMU_INV_GRANU_ADDR:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
+   return -EINVAL;
+
+   mask = IOMMU_INV_ADDR_FLAGS_PASID |
+   IOMMU_INV_ADDR_FLAGS_ARCHID |
+   IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->granu.addr_info.flags & ~mask)
+   return -EINVAL;
+   break;
+   case IOMMU_INV_GRANU_PASID:
+   mask = IOMMU_INV_PASID_FLAGS_PASID |
+   IOMMU_INV_PASID_FLAGS_ARCHID;
+   if (info->granu.pasid_info.flags & ~mask)
+   return -EINVAL;
+
+   break;
+   case IOMMU_INV_GRANU_DOMAIN:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
+   return -EINVAL;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   /* Check reserved padding fields */
+   for (i = 0; i < 6; i++) {
+   if (info->padding[i])
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
-   struct iommu_cache_invalidate_info *inv_info)
+   void __user *uinfo)
 {
+   struct iommu_cache_invalidate_info inv_info = { 0 };
+   u32 minsz;
+   int ret = 0;
+
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
-   return domain->ops->cache_invalidate(domain, dev, inv_info);
+   /*
+* No new spaces can be added before the variable sized union, the
+* minimum size is the offset to the union.
+*/
+   minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+   /* Copy minsz from user to get flags and argsz */
+   if (copy_from_user(&inv_info, uinfo, minsz))
+   return -EFAULT;
+
+   /* Fields before the variable size union are mandatory */
+   if (inv_info.argsz < minsz)
+   return -EINVAL;
+
+   /* PASID and address granu require additional info beyond minsz */
+   if (inv_info.argsz == minsz &&
+   ((inv_info.granularity == IOMMU_INV_GRANU_PASID) ||
+   (inv_info.granularity == IOMMU_INV_GRANU_ADDR)))
+   return -EINVAL;
+
+   if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.pasid_info))
+   return -EINVAL;
+
+   if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.addr_info))
+   return -EINVAL;
+
+   /*
+* User might be using a newer UAPI header which has a larger data
+* size, we shall support the existing flags within the current
+  

[PATCH v6 3/6] iommu/uapi: Use named union for user data

2020-07-23 Thread Jacob Pan
IOMMU UAPI data size is filled by user space and must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purposing padding bytes or extending the variable
sized union at the end. No size change is allowed before the union.
Therefore, the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lore.kernel.org/linux-iommu/20200611145518.0c281...@x1.home/
Signed-off-by: Jacob Pan 
Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel/iommu.c | 22 +++---
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)
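
With the union named, the mandatory minimum size and the per-granularity
sizes fall out of offsetof()/offsetofend(); a sketch:

#include <linux/stddef.h>	/* offsetof(), offsetofend() */
#include <uapi/linux/iommu.h>

/* Mandatory fields end where the union begins */
#define INV_INFO_MINSZ	 offsetof(struct iommu_cache_invalidate_info, granu)
/* Address-selective invalidation additionally needs addr_info */
#define INV_INFO_ADDR_SZ offsetofend(struct iommu_cache_invalidate_info, \
				     granu.addr_info)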

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 92c7ad66e64c..021f62078f52 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5417,8 +5417,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 
/* Size is only valid in address selective invalidation */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-   size = to_vtd_size(inv_info->addr_info.granule_size,
-  inv_info->addr_info.nb_granules);
+   size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+  inv_info->granu.addr_info.nb_granules);
 
for_each_set_bit(cache_type,
 (unsigned long *)&inv_info->cache,
@@ -5439,20 +5439,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * granularity.
 */
if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-   (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-   pasid = inv_info->pasid_info.pasid;
+   (inv_info->granu.pasid_info.flags & 
IOMMU_INV_PASID_FLAGS_PASID))
+   pasid = inv_info->granu.pasid_info.pasid;
else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-(inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
-   pasid = inv_info->addr_info.pasid;
+(inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
+   pasid = inv_info->granu.addr_info.pasid;
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
-   (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
+   (inv_info->granu.addr_info.addr & 
((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
pr_err_ratelimited("User address not aligned, 
0x%llx, size order %llu\n",
- inv_info->addr_info.addr, size);
+   inv_info->granu.addr_info.addr, 
size);
}
 
/*
@@ -5460,9 +5460,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * We use npages = -1 to indicate that.
 */
qi_flush_piotlb(iommu, did, pasid,
-   mm_to_dma_pfn(inv_info->addr_info.addr),
+   
mm_to_dma_pfn(inv_info->granu.addr_info.addr),
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 
<< size,
-   inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
+   inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
 
if (!info->ats_enabled)
break;
@@ -5485,7 +5485,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
size = 64 - VTD_PAGE_SHIFT;
addr = 0;
} else if (inv_info->granularity == 
IOMMU_INV_GRANU_ADDR)
-   addr = inv_info->addr_info.addr;
+   addr = inv_info->granu.addr_info.addr;
 
if (info->ats_enabled)
qi_flush_dev_iotlb_pasid(iommu, sid,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d386853121a2..713b3a218483 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -338,7 +338,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
spin_lock(&iommu->lock);
ret = intel_pasid_setup_nested(iommu, dev,
   (pgd_t *)(uintptr_t)data->gpgd,
-  

[PATCH v6 0/6] IOMMU user API enhancement

2020-07-23 Thread Jacob Pan
IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using a user-filled data size and feature
flags was preferred over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introducing an argsz field to the data structures, this
patchset also documents the UAPI design, usage, and extension rules. VT-d
driver changes to utilize the new argsz field are included; VFIO usage is
to follow.

This set is available at:
https://github.com/jacobpan/linux.git vsva_v5.8_uapi_v6

Thanks,

Jacob

Changelog:
v6
- Renamed all UAPI functions with iommu_uapi_ prefix
- Replaced argsz maxsz checking with flag specific size checks
- Documentation improvements based on suggestions by Eric Auger
  Replaced example code with a pointer to the actual code
- Added more checks for illegal flags combinations
- Added doc file to MAINTAINERS
v5
- Adjusted padding in UAPI data to be 8-byte aligned
- Do not clobber argsz in IOMMU core before passing on to vendor driver
- Removed pr_warn_ for invalid UAPI data check, just return -EINVAL
- Clarified VFIO responsibility in UAPI data handling
- Use iommu_uapi prefix to differentiate APIs that have in-kernel callers
- Added comment for unchecked flags of invalidation granularity
- Added example in doc to show vendor data checking

v4
- Added checks of UAPI data for reserved fields, version, and flags.
- Removed version check from vendor driver (vt-d)
- Relaxed argsz check to match the UAPI struct size instead of variable
  union size
- Updated documentation

v3:
- Rewrote backward compatibility rule to support existing code
  re-compiled with newer kernel UAPI header that runs on older
  kernel. Based on review comment from Alex W.
  https://lore.kernel.org/linux-iommu/20200611094741.6d118...@w520.home/
- Take user pointer directly in UAPI functions. Perform argsz check
  and copy_from_user() in IOMMU driver. Eliminate the need for
  VFIO or other upper layer to parse IOMMU data.
- Create wrapper function for in-kernel users of UAPI functions
v2:
- Removed unified API version and helper
- Introduced argsz for each UAPI data
- Introduced UAPI doc


Jacob Pan (6):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Rename uapi functions
  iommu/uapi: Handle data and argsz filled by users
  iommu/vt-d: Check UAPI data processed by IOMMU core

 Documentation/userspace-api/iommu.rst | 231 ++
 MAINTAINERS   |   1 +
 drivers/iommu/intel/iommu.c   |  25 ++--
 drivers/iommu/intel/svm.c |   9 +-
 drivers/iommu/iommu.c | 206 --
 include/linux/iommu.h |  35 --
 include/uapi/linux/iommu.h|  16 ++-
 7 files changed, 480 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4



[PATCH v6 2/6] iommu/uapi: Add argsz for user filled data

2020-07-23 Thread Jacob Pan
As the IOMMU UAPI gets extended, user data size may increase. To support
backward compatibility, this patch introduces a size field in each UAPI
data structure. It is *always* the responsibility of the user to fill in
the correct size. Padding fields are adjusted to ensure 8-byte alignment.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)
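
A compile-time way to verify the stated alignment (an illustrative
sketch, not part of the patch; it uses the union names introduced later
in this series):

#include <linux/build_bug.h>
#include <linux/stddef.h>
#include <uapi/linux/iommu.h>

/* The trailing, variable-sized unions must start 8-byte aligned */
static_assert(offsetof(struct iommu_cache_invalidate_info, granu) % 8 == 0);
static_assert(offsetof(struct iommu_gpasid_bind_data, vendor) % 8 == 0);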

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..d5e9014f690e 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from  iommu_page_response_code
  */
 struct iommu_page_response {
+   __u32   argsz;
 #define IOMMU_PAGE_RESP_VERSION_1  1
__u32   version;
 #define IOMMU_PAGE_RESP_PASID_VALID(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  * information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+   __u32   argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
__u32   version;
 /* IOMMU paging structure cache */
@@ -255,7 +259,7 @@ struct iommu_cache_invalidate_info {
 #define IOMMU_CACHE_INV_TYPE_NR(3)
__u8cache;
__u8granularity;
-   __u8padding[2];
+   __u8padding[6];
union {
struct iommu_inv_pasid_info pasid_info;
struct iommu_inv_addr_info addr_info;
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
+ * @argsz: User filled size of this data
  * @version:   Version of this data structure
  * @format:PASID table entry format
  * @flags: Additional information on guest bind request
@@ -309,17 +314,18 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+   __u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_11
__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD   1
__u32 format;
+   __u32 addr_width;
 #define IOMMU_SVA_GPASID_VAL   (1 << 0) /* guest PASID valid */
__u64 flags;
__u64 gpgd;
__u64 hpasid;
__u64 gpasid;
-   __u32 addr_width;
-   __u8  padding[12];
+   __u8  padding[8];
/* Vendor specific data */
union {
struct iommu_gpasid_bind_data_vtd vtd;
-- 
2.7.4



[PATCH v9 12/13] iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()

2020-07-23 Thread Jean-Philippe Brucker
The sva_bind() function allows devices to access process address spaces
using a PASID (aka SSID).

(1) bind() allocates or gets an existing MMU notifier tied to the
(domain, mm) pair. Each mm gets one PASID.

(2) Any change to the address space calls invalidate_range() which sends
ATC invalidations (in a subsequent patch).

(3) When the process address space dies, the release() notifier disables
the CD to allow reclaiming the page tables. Since release() has to
be light, we do not instruct device drivers to stop DMA here; we just
ignore incoming page faults from this point onwards.

To avoid any event 0x0a print (C_BAD_CD) we disable translation
without clearing CD.V. PCIe Translation Requests and Page Requests
are silently denied. Don't clear the R bit because the S bit can't
be cleared when STALL_MODEL==0b10 (forced), and clearing R without
clearing S is useless. Faulting transactions will stall and will be
aborted by the IOPF handler.

(4) After stopping DMA, the device driver releases the bond by calling
unbind(). We release the MMU notifier, free the PASID and the bond.

Three structures keep track of bonds:
* arm_smmu_bond: one per {device, mm} pair, the handle returned to the
  device driver for a bind() request.
* arm_smmu_mmu_notifier: one per {domain, mm} pair, deals with ATS/TLB
  invalidations and clearing the context descriptor on mm exit.
* arm_smmu_ctx_desc: one per mm, holds the pinned ASID and pgd.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig   |   2 +
 drivers/iommu/arm-smmu-v3.h |  28 
 drivers/iommu/arm-smmu-v3-sva.c | 230 +++-
 drivers/iommu/arm-smmu-v3.c |  32 +++--
 4 files changed, 282 insertions(+), 10 deletions(-)
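
A hedged sketch of the consumer side, assuming the generic iommu_sva_*
wrappers and IOMMU_DEV_FEAT_SVA dispatch to the arm_smmu_sva_* ops added
here:

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/sched.h>

static int driver_start_sva(struct device *dev, struct iommu_sva **bond)
{
	int pasid;

	if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA))
		return -ENODEV;

	/* Step (1): bind the current process address space */
	*bond = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(*bond))
		return PTR_ERR(*bond);

	pasid = iommu_sva_get_pasid(*bond);
	if (pasid == IOMMU_PASID_INVALID)
		return -ENODEV;
	/* ... program the PASID into the device and start DMA ... */
	return 0;
}

static void driver_stop_sva(struct iommu_sva *bond)
{
	/* Step (4): stop DMA first, then release the bond */
	iommu_sva_unbind_device(bond);
}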

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index afe355c02a50..2d05aa570f51 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -449,6 +449,8 @@ config ARM_SMMU_V3
 config ARM_SMMU_V3_SVA
bool "Shared Virtual Addressing support for the ARM SMMUv3"
depends on ARM_SMMU_V3
+   select IOMMU_SVA_LIB
+   select MMU_NOTIFIER
help
  Support for sharing process address spaces with devices using the
  SMMUv3.
diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index ba34914813ff..6365c81a4614 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -677,10 +677,18 @@ struct arm_smmu_domain {
 
struct list_headdevices;
spinlock_t  devices_lock;
+
+   struct list_headmmu_notifiers;
 };
 
+static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
 extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
+extern struct arm_smmu_ctx_desc quiet_cd;
 
 int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
struct arm_smmu_ctx_desc *cd);
@@ -693,6 +701,11 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
*master);
 bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
+   void *drvdata);
+void arm_smmu_sva_unbind(struct iommu_sva *handle);
+int arm_smmu_sva_get_pasid(struct iommu_sva *handle);
+void arm_smmu_sva_notifier_synchronize(void);
 #else /* CONFIG_ARM_SMMU_V3_SVA */
 static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 {
@@ -718,5 +731,20 @@ static inline int arm_smmu_master_disable_sva(struct 
arm_smmu_master *master)
 {
return -ENODEV;
 }
+
+static inline struct iommu_sva *
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   return ERR_PTR(-ENODEV);
+}
+
+static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
+
+static inline int arm_smmu_sva_get_pasid(struct iommu_sva *handle)
+{
+   return IOMMU_PASID_INVALID;
+}
+
+static inline void arm_smmu_sva_notifier_synchronize(void) {}
 #endif /* CONFIG_ARM_SMMU_V3_SVA */
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
index b6805effede4..a77abbb7a9d7 100644
--- a/drivers/iommu/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -5,11 +5,35 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "arm-smmu-v3.h"
+#include "iommu-sva-lib.h"
 #include "io-pgtable-arm.h"
 
+struct arm_smmu_mmu_notifier {
+   struct mmu_notifier mn;
+   struct arm_smmu_ctx_desc*cd;
+   boolcleared;
+   refcount_t  refs;
+   struct list_headlist;
+   struct arm_smmu_domain 

[PATCH v9 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-07-23 Thread Jean-Philippe Brucker
Aggregate all sanity-checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVA bit. For PCIe SVA, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVA, they will have to check
FEAT_STALLS.

Introduce ARM_SMMU_FEAT_BTM (Broadcast TLB Maintenance), but don't
enable it at the moment. Since the entire VMID space is shared with the
CPU, enabling DVM (by clearing SMMU_CR2.PTM) could result in
over-invalidation and affect performance of stage-2 mappings.

Cc: Suzuki K Poulose 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.h | 10 
 drivers/iommu/arm-smmu-v3-sva.c | 43 +
 drivers/iommu/arm-smmu-v3.c |  3 +++
 3 files changed, 56 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index 90c08f156b43..7b14b48a26c7 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -602,6 +602,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE  (1 << 13)
 #define ARM_SMMU_FEAT_VAX  (1 << 14)
 #define ARM_SMMU_FEAT_RANGE_INV(1 << 15)
+#define ARM_SMMU_FEAT_BTM  (1 << 16)
+#define ARM_SMMU_FEAT_SVA  (1 << 17)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
@@ -683,4 +685,12 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 
+#ifdef CONFIG_ARM_SMMU_V3_SVA
+bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
+#else /* CONFIG_ARM_SMMU_V3_SVA */
+static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
+{
+   return false;
+}
+#endif /* CONFIG_ARM_SMMU_V3_SVA */
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
index d590c864bdf3..eedc6c8264f1 100644
--- a/drivers/iommu/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -153,3 +153,46 @@ static void arm_smmu_free_shared_cd(struct 
arm_smmu_ctx_desc *cd)
kfree(cd);
}
 }
+
+bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
+{
+   unsigned long reg, fld;
+   unsigned long oas;
+   unsigned long asid_bits;
+
+   u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+   if ((smmu->features & feat_mask) != feat_mask)
+   return false;
+
+   if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+   return false;
+
+   /*
+* Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+* not even pretending to support AArch32 here. Abort if the MMU outputs
+* addresses larger than what we support.
+*/
+   reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+   fld = cpuid_feature_extract_unsigned_field(reg, 
ID_AA64MMFR0_PARANGE_SHIFT);
+   oas = id_aa64mmfr0_parange_to_phys_shift(fld);
+   if (smmu->oas < oas)
+   return false;
+
+   /* We can support bigger ASIDs than the CPU, but not smaller */
+   fld = cpuid_feature_extract_unsigned_field(reg, 
ID_AA64MMFR0_ASID_SHIFT);
+   asid_bits = fld ? 16 : 8;
+   if (smmu->asid_bits < asid_bits)
+   return false;
+
+   /*
+* See max_pinned_asids in arch/arm64/mm/context.c. The following is
+* generally the maximum number of bindable processes.
+*/
+   if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+   asid_bits--;
+   dev_dbg(smmu->dev, "%d shared contexts\n", (1 << asid_bits) -
+   num_possible_cpus() - 2);
+
+   return true;
+}
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4a47b977ed01..558973e3cc1b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3258,6 +3258,9 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
smmu->ias = max(smmu->ias, smmu->oas);
 
+   if (arm_smmu_sva_supported(smmu))
+   smmu->features |= ARM_SMMU_FEAT_SVA;
+
dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 smmu->ias, smmu->oas, smmu->features);
return 0;
-- 
2.27.0



[PATCH v9 11/13] iommu/arm-smmu-v3: Add SVA device feature

2020-07-23 Thread Jean-Philippe Brucker
Implement the IOMMU device feature callbacks to support the SVA feature.
At the moment dev_has_feat() returns false since I/O page fault support
isn't yet implemented.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.h | 26 +++
 drivers/iommu/arm-smmu-v3-sva.c | 49 
 drivers/iommu/arm-smmu-v3.c | 79 +
 3 files changed, 154 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index 7b14b48a26c7..ba34914813ff 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -646,6 +646,8 @@ struct arm_smmu_master {
u32 *sids;
unsigned intnum_sids;
boolats_enabled;
+   boolsva_enabled;
+   struct list_headbonds;
unsigned intssid_bits;
 };
 
@@ -687,10 +689,34 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
+bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
+bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
+int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
+int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 #else /* CONFIG_ARM_SMMU_V3_SVA */
 static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 {
return false;
 }
+
+static inline bool arm_smmu_master_sva_supported(struct arm_smmu_master 
*master)
+{
+   return false;
+}
+
+static inline bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)
+{
+   return false;
+}
+
+static inline int arm_smmu_master_enable_sva(struct arm_smmu_master *master)
+{
+   return -ENODEV;
+}
+
+static inline int arm_smmu_master_disable_sva(struct arm_smmu_master *master)
+{
+   return -ENODEV;
+}
 #endif /* CONFIG_ARM_SMMU_V3_SVA */
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
index eedc6c8264f1..b6805effede4 100644
--- a/drivers/iommu/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -10,6 +10,8 @@
 #include "arm-smmu-v3.h"
 #include "io-pgtable-arm.h"
 
+static DEFINE_MUTEX(sva_lock);
+
 /*
  * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
  * the private entry. Careful here, we may be modifying the context tables of
@@ -196,3 +198,50 @@ bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 
return true;
 }
+
+static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
+{
+   return false;
+}
+
+bool arm_smmu_master_sva_supported(struct arm_smmu_master *master)
+{
+   if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
+   return false;
+
+   /* SSID and IOPF support are mandatory for the moment */
+   return master->ssid_bits && arm_smmu_iopf_supported(master);
+}
+
+bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)
+{
+   bool enabled;
+
+   mutex_lock(&sva_lock);
+   enabled = master->sva_enabled;
+   mutex_unlock(&sva_lock);
+   return enabled;
+}
+
+int arm_smmu_master_enable_sva(struct arm_smmu_master *master)
+{
+   mutex_lock(&sva_lock);
+   master->sva_enabled = true;
+   mutex_unlock(&sva_lock);
+
+   return 0;
+}
+
+int arm_smmu_master_disable_sva(struct arm_smmu_master *master)
+{
+   mutex_lock(&sva_lock);
+   if (!list_empty(&master->bonds)) {
+   dev_err(master->dev, "cannot disable SVA, device is bound\n");
+   mutex_unlock(&sva_lock);
+   return -EBUSY;
+   }
+   master->sva_enabled = false;
+   mutex_unlock(&sva_lock);
+
+   return 0;
+}
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 558973e3cc1b..0fd63d93bd2c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2163,6 +2163,16 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
master = dev_iommu_priv_get(dev);
smmu = master->smmu;
 
+   /*
+* Checking that SVA is disabled ensures that this device isn't bound to
+* any mm, and can be safely detached from its old domain. Bonds cannot
+* be removed concurrently since we're holding the group mutex.
+*/
+   if (arm_smmu_master_sva_enabled(master)) {
+   dev_err(dev, "cannot attach - SVA enabled\n");
+   return -EBUSY;
+   }
+
arm_smmu_detach_dev(master);
 
mutex_lock(_domain->init_mutex);
@@ -2310,6 +2320,7 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
master->smmu = smmu;
master->sids = fwspec->ids;
master->num_sids = fwspec->num_ids;
+   INIT_LIST_HEAD(&master->bonds);
dev_iommu_priv_set(dev, master);
 
/* Check the SIDs are in range of the SMMU and our stream table */
@@ 

[PATCH v9 08/13] iommu/arm-smmu-v3: Share process page tables

2020-07-23 Thread Jean-Philippe Brucker
With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
into two sets, shared and private. Shared ASIDs correspond to those
obtained from the arch ASID allocator, and private ASIDs are used for
"classic" map/unmap DMA.

A possible conflict happens when trying to use a shared ASID that has
already been allocated for private use by the SMMU driver. This will be
addressed in a later patch by replacing the private ASID. At the
moment we return -EBUSY.

Each mm_struct shared with the SMMU will have a single context
descriptor. Add a refcount to keep track of this. It will be protected
by the global SVA lock.

Introduce a new arm-smmu-v3-sva.c file and the CONFIG_ARM_SMMU_V3_SVA
option to let users opt in to SVA support.

Signed-off-by: Jean-Philippe Brucker 
---
v9: Move to arm-smmu-v3-sva.c
---
 drivers/iommu/Kconfig   |  10 +++
 drivers/iommu/Makefile  |   5 +-
 drivers/iommu/arm-smmu-v3.h |   8 +++
 drivers/iommu/arm-smmu-v3-sva.c | 123 
 drivers/iommu/arm-smmu-v3.c |  34 +++--
 5 files changed, 172 insertions(+), 8 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 490c86e3615a..afe355c02a50 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -446,6 +446,16 @@ config ARM_SMMU_V3
  Say Y here if your system includes an IOMMU device implementing
  the ARM SMMUv3 architecture.
 
+config ARM_SMMU_V3_SVA
+   bool "Shared Virtual Addressing support for the ARM SMMUv3"
+   depends on ARM_SMMU_V3
+   help
+ Support for sharing process address spaces with devices using the
+ SMMUv3.
+
+ Say Y here if your system supports SVA extensions such as PCIe PASID
+ and PRI.
+
 config S390_IOMMU
def_bool y if S390 && PCI
depends on S390 && PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 0fe5a7f9bc69..929a752e4723 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -16,7 +16,10 @@ obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
 arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
-obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
+arm_smmu_v3-objs-y += arm-smmu-v3.o
+arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
+arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
+obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/trace.o
diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index 51a9ce07b2d6..6b06a6f19604 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -540,6 +540,9 @@ struct arm_smmu_ctx_desc {
u64 ttbr;
u64 tcr;
u64 mair;
+
+   refcount_t  refs;
+   struct mm_struct*mm;
 };
 
 struct arm_smmu_l1_ctx_desc {
@@ -672,4 +675,9 @@ struct arm_smmu_domain {
spinlock_t  devices_lock;
 };
 
+extern struct xarray arm_smmu_asid_xa;
+extern struct mutex arm_smmu_asid_lock;
+
+bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
+
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
new file mode 100644
index ..7c1541864688
--- /dev/null
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Implementation of the IOMMU SVA API for the ARM SMMUv3
+ */
+
+#include 
+#include 
+#include 
+
+#include "arm-smmu-v3.h"
+#include "io-pgtable-arm.h"
+
+static struct arm_smmu_ctx_desc *
+arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
+{
+   struct arm_smmu_ctx_desc *cd;
+
+   cd = xa_load(&arm_smmu_asid_xa, asid);
+   if (!cd)
+   return NULL;
+
+   if (cd->mm) {
+   if (WARN_ON(cd->mm != mm))
+   return ERR_PTR(-EINVAL);
+   /* All devices bound to this mm use the same cd struct. */
+   refcount_inc(&cd->refs);
+   return cd;
+   }
+
+   /* Ouch, ASID is already in use for a private cd. */
+   return ERR_PTR(-EBUSY);
+}
+
+__maybe_unused
+static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
+{
+   u16 asid;
+   int err = 0;
+   u64 tcr, par, reg;
+   struct arm_smmu_ctx_desc *cd;
+   struct arm_smmu_ctx_desc *ret = NULL;
+
+   asid = arm64_mm_context_get(mm);
+   if (!asid)
+   return ERR_PTR(-ESRCH);
+
+   cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+   if (!cd) {
+   err = -ENOMEM;
+   goto out_put_context;
+   }
+
+   

[PATCH v9 01/13] mm: Define pasid in mm

2020-07-23 Thread Jean-Philippe Brucker
From: Fenghua Yu 

PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
https://lore.kernel.org/linux-iommu/1594684087-61184-8-git-send-email-fenghua...@intel.com/
---
---
 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)
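
A small sketch of how IOMMU code then consumes the field (the SMMUv3 ATC
invalidation patch later in this series keys invalidations by mm->pasid,
for instance):

#include <linux/mm_types.h>
#include <linux/types.h>

static inline u32 sva_mm_pasid(struct mm_struct *mm)
{
	/* The field only exists when CONFIG_IOMMU_SUPPORT is enabled */
	return mm->pasid;
}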

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..d61285cfe027 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+   u32 pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.27.0



[PATCH v9 09/13] iommu/arm-smmu-v3: Seize private ASID

2020-07-23 Thread Jean-Philippe Brucker
The SMMU has a single ASID space, the union of shared and private ASID
sets. This means that the SMMU driver competes with the arch allocator
for ASIDs. Shared ASIDs are those of Linux processes, allocated by the
arch, and participate in broadcast TLB maintenance. Private ASIDs are
allocated by the SMMU driver and used for "classic" map/unmap DMA. They
require command-queue TLB invalidations.

When we pin down an mm_context and get an ASID that is already in use by
the SMMU, it belongs to a private context. We used to simply abort the
bind, but this is unfair to users that would be unable to bind a few
seemingly random processes. Try to allocate a new private ASID for the
context, and make the old ASID shared.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.h |  3 +++
 drivers/iommu/arm-smmu-v3-sva.c | 36 +++--
 drivers/iommu/arm-smmu-v3.c | 34 +++
 3 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index 6b06a6f19604..90c08f156b43 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -678,6 +678,9 @@ struct arm_smmu_domain {
 extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
 
+int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
+   struct arm_smmu_ctx_desc *cd);
+void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
index 7c1541864688..d590c864bdf3 100644
--- a/drivers/iommu/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -10,10 +10,19 @@
 #include "arm-smmu-v3.h"
 #include "io-pgtable-arm.h"
 
+/*
+ * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
+ * the private entry. Careful here, we may be modifying the context tables of
+ * another SMMU!
+ */
 static struct arm_smmu_ctx_desc *
 arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 {
+   int ret;
+   u32 new_asid;
struct arm_smmu_ctx_desc *cd;
+   struct arm_smmu_device *smmu;
+   struct arm_smmu_domain *smmu_domain;
 
cd = xa_load(&arm_smmu_asid_xa, asid);
if (!cd)
@@ -27,8 +36,31 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
return cd;
}
 
-   /* Ouch, ASID is already in use for a private cd. */
-   return ERR_PTR(-EBUSY);
+   smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd);
+   smmu = smmu_domain->smmu;
+
+   ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd,
+  XA_LIMIT(1, 1 << smmu->asid_bits), GFP_KERNEL);
+   if (ret)
+   return ERR_PTR(-ENOSPC);
+   /*
+* Race with unmap: TLB invalidations will start targeting the new ASID,
+* which isn't assigned yet. We'll do an invalidate-all on the old ASID
+* later, so it doesn't matter.
+*/
+   cd->asid = new_asid;
+   /*
+* Update ASID and invalidate CD in all associated masters. There will
+* be some overlap between use of both ASIDs, until we invalidate the
+* TLB.
+*/
+   arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
+
+   /* Invalidate TLB entries previously associated with that context */
+   arm_smmu_tlb_inv_asid(smmu, asid);
+
+   xa_erase(&arm_smmu_asid_xa, asid);
+   return NULL;
 }
 
 __maybe_unused
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 06f148a9ed82..4a47b977ed01 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -873,6 +873,17 @@ static int arm_smmu_cmdq_batch_submit(struct 
arm_smmu_device *smmu,
 }
 
 /* Context descriptor manipulation functions */
+void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
+{
+   struct arm_smmu_cmdq_ent cmd = {
+   .opcode = CMDQ_OP_TLBI_NH_ASID,
+   .tlbi.asid = asid,
+   };
+
+   arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+   arm_smmu_cmdq_issue_sync(smmu);
+}
+
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 int ssid, bool leaf)
 {
@@ -953,8 +964,8 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain 
*smmu_domain,
return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS;
 }
 
-static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
-  int ssid, struct arm_smmu_ctx_desc *cd)
+int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
+   struct arm_smmu_ctx_desc *cd)
 {
/*
 * This function handles the following cases:
@@ -1610,15 +1621,6 @@ static void arm_smmu_tlb_inv_context(void *cookie)
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cmdq_ent cmd;
 
-   if 

[PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops

2020-07-23 Thread Jean-Philippe Brucker
The invalidate_range() notifier is called for any change to the address
space. Perform the required ATC invalidations.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.h |  2 ++
 drivers/iommu/arm-smmu-v3-sva.c | 16 +++-
 drivers/iommu/arm-smmu-v3.c | 18 --
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
index 6365c81a4614..baa80498ad9f 100644
--- a/drivers/iommu/arm-smmu-v3.h
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -694,6 +694,8 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
struct arm_smmu_ctx_desc *cd);
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
+int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+   unsigned long iova, size_t size);
 
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
diff --git a/drivers/iommu/arm-smmu-v3-sva.c b/drivers/iommu/arm-smmu-v3-sva.c
index a77abbb7a9d7..1ed6b92b2c72 100644
--- a/drivers/iommu/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm-smmu-v3-sva.c
@@ -178,6 +178,16 @@ static void arm_smmu_free_shared_cd(struct 
arm_smmu_ctx_desc *cd)
}
 }
 
+static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end)
+{
+   struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
+
+   arm_smmu_atc_inv_domain(smmu_mn->domain, mm->pasid, start,
+   end - start + 1);
+}
+
 static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
@@ -196,6 +206,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, 
struct mm_struct *mm)
arm_smmu_write_ctx_desc(smmu_domain, mm->pasid, &quiet_cd);
 
arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+   arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
 
smmu_mn->cleared = true;
mutex_unlock(_lock);
@@ -207,6 +218,7 @@ static void arm_smmu_mmu_notifier_free(struct mmu_notifier 
*mn)
 }
 
 static struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = {
+   .invalidate_range   = arm_smmu_mm_invalidate_range,
.release= arm_smmu_mm_release,
.free_notifier  = arm_smmu_mmu_notifier_free,
 };
@@ -279,8 +291,10 @@ static void arm_smmu_mmu_notifier_put(struct 
arm_smmu_mmu_notifier *smmu_mn)
 * If we went through clear(), we've already invalidated, and no
 * new TLB entry can have been formed.
 */
-   if (!smmu_mn->cleared)
+   if (!smmu_mn->cleared) {
arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
+   arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
+   }
 
/* Frees smmu_mn */
mmu_notifier_put(&smmu_mn->mn);
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ad71c07e783f..89efa9db7869 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1515,6 +1515,20 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, 
size_t size,
size_t inval_grain_shift = 12;
unsigned long page_start, page_end;
 
+   /*
+* ATS and PASID:
+*
+* If substream_valid is clear, the PCIe TLP is sent without a PASID
+* prefix. In that case all ATC entries within the address range are
+* invalidated, including those that were requested with a PASID! There
+* is no way to invalidate only entries without PASID.
+*
+* When using STRTAB_STE_1_S1DSS_SSID0 (reserving CD 0 for non-PASID
+* traffic), translation requests without PASID create ATC entries
+* without PASID, which must be invalidated with substream_valid clear.
+* This has the unpleasant side-effect of invalidating all PASID-tagged
+* ATC entries within the address range.
+*/
*cmd = (struct arm_smmu_cmdq_ent) {
.opcode = CMDQ_OP_ATC_INV,
.substream_valid= !!ssid,
@@ -1573,8 +1587,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master 
*master)
return arm_smmu_cmdq_issue_sync(master->smmu);
 }
 
-static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
-  int ssid, unsigned long iova, size_t size)
+int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+   unsigned long iova, size_t size)
 {
int i;
unsigned long flags;
-- 
2.27.0


[PATCH v9 00/13] iommu: Shared Virtual Addressing for SMMUv3 (PT sharing part)

2020-07-23 Thread Jean-Philippe Brucker
Add support for sharing CPU page tables with the SMMUv3. Support for I/O
page faults and additional features (DVM, VHE and HTTU) needed for SVA
is available on my sva/current branch [2] and will be sent later.

Since v8 [1]:
* Moved the SVA code to arm-smmu-v3-sva.c under CONFIG_ARM_SMMU_V3_SVA.
  This required moving struct definitions and macros to arm-smmu-v3.h
  (patch 7), hence the new 700 insertions/deletions in the diffstat.
* Updated patches 4 and 8 following review.
* Fixed bug in patch 9 when replacing a private ASID.

[1] 
https://lore.kernel.org/linux-iommu/20200618155125.1548969-1-jean-phili...@linaro.org/
[2] https://jpbrucker.net/git/linux sva/current

Fenghua Yu (1):
  mm: Define pasid in mm

Jean-Philippe Brucker (12):
  iommu/ioasid: Add ioasid references
  iommu/sva: Add PASID helpers
  arm64: mm: Pin down ASIDs for sharing mm with devices
  iommu/io-pgtable-arm: Move some definitions to a header
  arm64: cpufeature: Export symbol read_sanitised_ftr_reg()
  iommu/arm-smmu-v3: Move definitions to a header
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Seize private ASID
  iommu/arm-smmu-v3: Check for SVA features
  iommu/arm-smmu-v3: Add SVA device feature
  iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()
  iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops

 drivers/iommu/Kconfig|  17 +
 drivers/iommu/Makefile   |   6 +-
 arch/arm64/include/asm/mmu.h |   3 +
 arch/arm64/include/asm/mmu_context.h |  11 +-
 drivers/iommu/arm-smmu-v3.h  | 752 +++
 drivers/iommu/io-pgtable-arm.h   |  30 +
 drivers/iommu/iommu-sva-lib.h|  15 +
 include/linux/ioasid.h   |  10 +-
 include/linux/mm_types.h |   4 +
 arch/arm64/kernel/cpufeature.c   |   1 +
 arch/arm64/mm/context.c  | 105 +++-
 drivers/iommu/arm-smmu-v3-sva.c  | 487 +++
 drivers/iommu/arm-smmu-v3.c  | 860 ++-
 drivers/iommu/intel/iommu.c  |   4 +-
 drivers/iommu/intel/svm.c|   6 +-
 drivers/iommu/io-pgtable-arm.c   |  27 +-
 drivers/iommu/ioasid.c   |  38 +-
 drivers/iommu/iommu-sva-lib.c|  85 +++
 MAINTAINERS  |   3 +-
 19 files changed, 1729 insertions(+), 735 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3.h
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-sva-lib.h
 create mode 100644 drivers/iommu/arm-smmu-v3-sva.c
 create mode 100644 drivers/iommu/iommu-sva-lib.c

-- 
2.27.0



[PATCH v9 04/13] arm64: mm: Pin down ASIDs for sharing mm with devices

2020-07-23 Thread Jean-Philippe Brucker
To enable address space sharing with the IOMMU, introduce
arm64_mm_context_get() and arm64_mm_context_put(), that pin down a
context and ensure that it will keep its ASID after a rollover. Export
the symbols to let the modular SMMUv3 driver use them.

Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:

1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
   We would now have to immediately generate a new ASID for t1, notify
   the IOMMU, and finally enable task tn. We are holding the lock during
   all that time, since we can't afford having another CPU trigger a
   rollover. The IOMMU issues invalidation commands that can take tens of
   milliseconds.

It gets needlessly complicated. All we wanted to do was schedule task tn,
that has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.

After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.
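
As a worked example with illustrative numbers: 16-bit ASIDs give
NR_ASIDS = 65536, so with 256 possible CPUs at most
65536 - 256 - 1 = 65279 ASIDs could be pinned; when kpti reserves half
the ASID space for kernel mappings, that budget is roughly halved.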

Signed-off-by: Jean-Philippe Brucker 
---
v9:
* Make mm->context.pinned a refcount_t.
* Prepend exported symbols with arm64_.
* Initialize pinned_asid_map with kernel ASID bits if necessary.
* Allow pinned_asid_map to be NULL. It could also depend on
  CONFIG_ARM_SMMU_V3_SVA since it adds an overhead on rollover (memcpy()
  asid_map instead of memset()), but that can be changed later.
* Only set the USER_ASID_BIT if kpti is enabled (previously it was set
  if CONFIG_UNMAP_KERNEL_AT_EL0).
---
 arch/arm64/include/asm/mmu.h |   3 +
 arch/arm64/include/asm/mmu_context.h |  11 ++-
 arch/arm64/mm/context.c  | 105 +--
 3 files changed, 112 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 8444df000181..77b3bd6ddcfa 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -17,11 +17,14 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 typedef struct {
atomic64_t  id;
 #ifdef CONFIG_COMPAT
void*sigpage;
 #endif
+   refcount_t  pinned;
void*vdso;
unsigned long   flags;
 } mm_context_t;
diff --git a/arch/arm64/include/asm/mmu_context.h 
b/arch/arm64/include/asm/mmu_context.h
index b0bd9b55594c..0a10a3412c93 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -177,7 +177,13 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
 #define destroy_context(mm)do { } while(0)
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define init_new_context(tsk,mm)   ({ atomic64_set(&(mm)->context.id, 0); 
0; })
+static inline int
+init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+   atomic64_set(&mm->context.id, 0);
+   refcount_set(&mm->context.pinned, 0);
+   return 0;
+}
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
@@ -250,6 +256,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
+unsigned long arm64_mm_context_get(struct mm_struct *mm);
+void arm64_mm_context_put(struct mm_struct *mm);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index d702d60e64da..5c06973b15db 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -27,6 +27,10 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;
 
+static unsigned long max_pinned_asids;
+static unsigned long nr_pinned_asids;
+static unsigned long *pinned_asid_map;
+
 #define ASID_MASK  (~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION (1UL << asid_bits)
 
@@ -72,7 +76,7 @@ void verify_cpu_asid_bits(void)
}
 }
 
-static void set_kpti_asid_bits(void)
+static void set_kpti_asid_bits(unsigned long *map)
 {
unsigned int len = BITS_TO_LONGS(NUM_USER_ASIDS) * sizeof(unsigned 
long);
/*
@@ -81,13 +85,15 @@ static void set_kpti_asid_bits(void)
 * is set, then the ASID will map only userspace. Thus
 * mark even as reserved for kernel.
 */
-   memset(asid_map, 0xaa, len);
+   memset(map, 0xaa, len);
 }
 
 static void set_reserved_asid_bits(void)
 {
-   if (arm64_kernel_unmapped_at_el0())
- 

[PATCH v9 02/13] iommu/ioasid: Add ioasid references

2020-07-23 Thread Jean-Philippe Brucker
Let IOASID users take references to existing ioasids with ioasid_get().
ioasid_put() drops a reference and only frees the ioasid when its
reference count drops to zero. It returns true if the ioasid was freed.
For drivers that don't call ioasid_get(), ioasid_put() behaves the same
as ioasid_free().

Reviewed-by: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/ioasid.h  | 10 --
 drivers/iommu/intel/iommu.c |  4 ++--
 drivers/iommu/intel/svm.c   |  6 +++---
 drivers/iommu/ioasid.c  | 38 +
 4 files changed, 47 insertions(+), 11 deletions(-)
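
A sketch of the resulting refcounting contract:

#include <linux/ioasid.h>

static void ioasid_ref_sketch(struct ioasid_set *set)
{
	ioasid_t id = ioasid_alloc(set, 1, 1024, NULL);	/* refcount = 1 */

	if (id == INVALID_IOASID)
		return;

	ioasid_get(id);	/* a second user takes a reference, refcount = 2 */
	ioasid_put(id);	/* refcount = 1, IOASID kept */
	ioasid_put(id);	/* refcount = 0, IOASID freed, returns true */
}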

diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 6f000d7a0ddc..e9dacd4b9f6b 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -34,7 +34,8 @@ struct ioasid_allocator_ops {
 #if IS_ENABLED(CONFIG_IOASID)
 ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
  void *private);
-void ioasid_free(ioasid_t ioasid);
+void ioasid_get(ioasid_t ioasid);
+bool ioasid_put(ioasid_t ioasid);
 void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
  bool (*getter)(void *));
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
@@ -48,10 +49,15 @@ static inline ioasid_t ioasid_alloc(struct ioasid_set *set, 
ioasid_t min,
return INVALID_IOASID;
 }
 
-static inline void ioasid_free(ioasid_t ioasid)
+static inline void ioasid_get(ioasid_t ioasid)
 {
 }
 
+static inline bool ioasid_put(ioasid_t ioasid)
+{
+   return false;
+}
+
 static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
bool (*getter)(void *))
 {
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..fd7a65d4f091 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5139,7 +5139,7 @@ static void auxiliary_unlink_device(struct dmar_domain 
*domain,
domain->auxd_refcnt--;
 
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   ioasid_free(domain->default_pasid);
+   ioasid_put(domain->default_pasid);
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5201,7 +5201,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
spin_unlock(&iommu->lock);
spin_unlock_irqrestore(&device_domain_lock, flags);
if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   ioasid_free(domain->default_pasid);
+   ioasid_put(domain->default_pasid);
 
return ret;
 }
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..b078a697c42e 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -546,7 +546,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct 
svm_dev_ops *ops,
if (mm) {
ret = mmu_notifier_register(&svm->notifier, mm);
if (ret) {
-   ioasid_free(svm->pasid);
+   ioasid_put(svm->pasid);
kfree(svm);
kfree(sdev);
goto out;
@@ -564,7 +564,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct 
svm_dev_ops *ops,
if (ret) {
if (mm)
mmu_notifier_unregister(&svm->notifier, mm);
-   ioasid_free(svm->pasid);
+   ioasid_put(svm->pasid);
kfree(svm);
kfree(sdev);
goto out;
@@ -639,7 +639,7 @@ static int intel_svm_unbind_mm(struct device *dev, int 
pasid)
kfree_rcu(sdev, rcu);
 
if (list_empty(&svm->devs)) {
-   ioasid_free(svm->pasid);
+   ioasid_put(svm->pasid);
if (svm->mm)
mmu_notifier_unregister(&svm->notifier, svm->mm);
list_del(&svm->list);
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 0f8dd377aada..50ee27bbd04e 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -2,7 +2,7 @@
 /*
  * I/O Address Space ID allocator. There is one global IOASID space, split into
  * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
- * free IOASIDs with ioasid_alloc and ioasid_free.
+ * free IOASIDs with ioasid_alloc and ioasid_put.
  */
 #include 
 #include 
@@ -15,6 +15,7 @@ struct ioasid_data {
struct ioasid_set *set;
void *private;
struct rcu_head rcu;
+   refcount_t refs;
 };
 
 /*
@@ -314,6 +315,7 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, 
ioasid_t max,
 
data->set = set;
data->private = private;
+   refcount_set(&data->refs, 1);
 
/*
 * Custom allocator needs allocator data to perform 

[PATCH v9 07/13] iommu/arm-smmu-v3: Move definitions to a header

2020-07-23 Thread Jean-Philippe Brucker
Allow sharing structure definitions with the upcoming SVA support for
Arm SMMUv3, by moving them to a separate header. We could surgically
extract only what is needed but keeping all definitions in one place
looks nicer.

Signed-off-by: Jean-Philippe Brucker 
---
v9: new
---
 drivers/iommu/arm-smmu-v3.h | 675 
 drivers/iommu/arm-smmu-v3.c | 660 +--
 2 files changed, 676 insertions(+), 659 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3.h

diff --git a/drivers/iommu/arm-smmu-v3.h b/drivers/iommu/arm-smmu-v3.h
new file mode 100644
index ..51a9ce07b2d6
--- /dev/null
+++ b/drivers/iommu/arm-smmu-v3.h
@@ -0,0 +1,675 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * IOMMU API for ARM architected SMMUv3 implementations.
+ *
+ * Copyright (C) 2015 ARM Limited
+ */
+
+#ifndef _ARM_SMMU_V3_H
+#define _ARM_SMMU_V3_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* MMIO registers */
+#define ARM_SMMU_IDR0  0x0
+#define IDR0_ST_LVLGENMASK(28, 27)
+#define IDR0_ST_LVL_2LVL   1
+#define IDR0_STALL_MODEL   GENMASK(25, 24)
+#define IDR0_STALL_MODEL_STALL 0
+#define IDR0_STALL_MODEL_FORCE 2
+#define IDR0_TTENDIAN  GENMASK(22, 21)
+#define IDR0_TTENDIAN_MIXED0
+#define IDR0_TTENDIAN_LE   2
+#define IDR0_TTENDIAN_BE   3
+#define IDR0_CD2L  (1 << 19)
+#define IDR0_VMID16(1 << 18)
+#define IDR0_PRI   (1 << 16)
+#define IDR0_SEV   (1 << 14)
+#define IDR0_MSI   (1 << 13)
+#define IDR0_ASID16(1 << 12)
+#define IDR0_ATS   (1 << 10)
+#define IDR0_HYP   (1 << 9)
+#define IDR0_COHACC(1 << 4)
+#define IDR0_TTF   GENMASK(3, 2)
+#define IDR0_TTF_AARCH64   2
+#define IDR0_TTF_AARCH32_643
+#define IDR0_S1P   (1 << 1)
+#define IDR0_S2P   (1 << 0)
+
+#define ARM_SMMU_IDR1  0x4
+#define IDR1_TABLES_PRESET (1 << 30)
+#define IDR1_QUEUES_PRESET (1 << 29)
+#define IDR1_REL   (1 << 28)
+#define IDR1_CMDQS GENMASK(25, 21)
+#define IDR1_EVTQS GENMASK(20, 16)
+#define IDR1_PRIQS GENMASK(15, 11)
+#define IDR1_SSIDSIZE  GENMASK(10, 6)
+#define IDR1_SIDSIZE   GENMASK(5, 0)
+
+#define ARM_SMMU_IDR3  0xc
+#define IDR3_RIL   (1 << 10)
+
+#define ARM_SMMU_IDR5  0x14
+#define IDR5_STALL_MAX GENMASK(31, 16)
+#define IDR5_GRAN64K   (1 << 6)
+#define IDR5_GRAN16K   (1 << 5)
+#define IDR5_GRAN4K(1 << 4)
+#define IDR5_OAS   GENMASK(2, 0)
+#define IDR5_OAS_32_BIT0
+#define IDR5_OAS_36_BIT1
+#define IDR5_OAS_40_BIT2
+#define IDR5_OAS_42_BIT3
+#define IDR5_OAS_44_BIT4
+#define IDR5_OAS_48_BIT5
+#define IDR5_OAS_52_BIT6
+#define IDR5_VAX   GENMASK(11, 10)
+#define IDR5_VAX_52_BIT1
+
+#define ARM_SMMU_CR0   0x20
+#define CR0_ATSCHK (1 << 4)
+#define CR0_CMDQEN (1 << 3)
+#define CR0_EVTQEN (1 << 2)
+#define CR0_PRIQEN (1 << 1)
+#define CR0_SMMUEN (1 << 0)
+
+#define ARM_SMMU_CR0ACK0x24
+
+#define ARM_SMMU_CR1   0x28
+#define CR1_TABLE_SH   GENMASK(11, 10)
+#define CR1_TABLE_OC   GENMASK(9, 8)
+#define CR1_TABLE_IC   GENMASK(7, 6)
+#define CR1_QUEUE_SH   GENMASK(5, 4)
+#define CR1_QUEUE_OC   GENMASK(3, 2)
+#define CR1_QUEUE_IC   GENMASK(1, 0)
+/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
+#define CR1_CACHE_NC   0
+#define CR1_CACHE_WB   1
+#define CR1_CACHE_WT   2
+
+#define ARM_SMMU_CR2   0x2c
+#define CR2_PTM(1 << 2)
+#define CR2_RECINVSID  (1 << 1)
+#define CR2_E2H(1 << 0)
+
+#define ARM_SMMU_GBPA  0x44
+#define GBPA_UPDATE(1 << 31)
+#define GBPA_ABORT (1 << 20)
+
+#define ARM_SMMU_IRQ_CTRL  0x50
+#define IRQ_CTRL_EVTQ_IRQEN(1 << 2)
+#define IRQ_CTRL_PRIQ_IRQEN(1 << 1)
+#define IRQ_CTRL_GERROR_IRQEN  (1 << 0)
+
+#define ARM_SMMU_IRQ_CTRLACK   0x54
+

[PATCH v9 05/13] iommu/io-pgtable-arm: Move some definitions to a header

2020-07-23 Thread Jean-Philippe Brucker
Extract some of the most generic TCR defines, so they can be reused by
the page table sharing code.

Acked-by: Will Deacon 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/io-pgtable-arm.h | 30 ++
 drivers/iommu/io-pgtable-arm.c | 27 ++-
 MAINTAINERS|  3 +--
 3 files changed, 33 insertions(+), 27 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h

diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
new file mode 100644
index ..ba7cfdf7afa0
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef IO_PGTABLE_ARM_H_
+#define IO_PGTABLE_ARM_H_
+
+#define ARM_LPAE_TCR_TG0_4K0
+#define ARM_LPAE_TCR_TG0_64K   1
+#define ARM_LPAE_TCR_TG0_16K   2
+
+#define ARM_LPAE_TCR_TG1_16K   1
+#define ARM_LPAE_TCR_TG1_4K2
+#define ARM_LPAE_TCR_TG1_64K   3
+
+#define ARM_LPAE_TCR_SH_NS 0
+#define ARM_LPAE_TCR_SH_OS 2
+#define ARM_LPAE_TCR_SH_IS 3
+
+#define ARM_LPAE_TCR_RGN_NC0
+#define ARM_LPAE_TCR_RGN_WBWA  1
+#define ARM_LPAE_TCR_RGN_WT2
+#define ARM_LPAE_TCR_RGN_WB3
+
+#define ARM_LPAE_TCR_PS_32_BIT 0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT 0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT 0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT 0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT 0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT 0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL
+
+#endif /* IO_PGTABLE_ARM_H_ */
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 04fbd4bf0ff9..f71a2eade04a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -20,6 +20,8 @@
 
 #include 
 
+#include "io-pgtable-arm.h"
+
 #define ARM_LPAE_MAX_ADDR_BITS 52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES   16
 #define ARM_LPAE_MAX_LEVELS4
@@ -100,23 +102,6 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV   (((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_LPAE_TCR_TG0_4K0
-#define ARM_LPAE_TCR_TG0_64K   1
-#define ARM_LPAE_TCR_TG0_16K   2
-
-#define ARM_LPAE_TCR_TG1_16K   1
-#define ARM_LPAE_TCR_TG1_4K2
-#define ARM_LPAE_TCR_TG1_64K   3
-
-#define ARM_LPAE_TCR_SH_NS 0
-#define ARM_LPAE_TCR_SH_OS 2
-#define ARM_LPAE_TCR_SH_IS 3
-
-#define ARM_LPAE_TCR_RGN_NC0
-#define ARM_LPAE_TCR_RGN_WBWA  1
-#define ARM_LPAE_TCR_RGN_WT2
-#define ARM_LPAE_TCR_RGN_WB3
-
 #define ARM_LPAE_VTCR_SL0_MASK 0x3
 
 #define ARM_LPAE_TCR_T0SZ_SHIFT0
@@ -124,14 +109,6 @@
 #define ARM_LPAE_VTCR_PS_SHIFT 16
 #define ARM_LPAE_VTCR_PS_MASK  0x7
 
-#define ARM_LPAE_TCR_PS_32_BIT 0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT 0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT 0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT 0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT 0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT 0x5ULL
-#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL
-
 #define ARM_LPAE_MAIR_ATTR_SHIFT(n)((n) << 3)
 #define ARM_LPAE_MAIR_ATTR_MASK0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE  0x04
diff --git a/MAINTAINERS b/MAINTAINERS
index b4a43a9e7fbc..b8f00e641c08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1499,8 +1499,7 @@ L:linux-arm-ker...@lists.infradead.org (moderated 
for non-subscribers)
 S: Maintained
 F: Documentation/devicetree/bindings/iommu/arm,smmu*
 F: drivers/iommu/arm-smmu*
-F: drivers/iommu/io-pgtable-arm-v7s.c
-F: drivers/iommu/io-pgtable-arm.c
+F: drivers/iommu/io-pgtable-arm*
 
 ARM SUB-ARCHITECTURES
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-- 
2.27.0



[PATCH v9 06/13] arm64: cpufeature: Export symbol read_sanitised_ftr_reg()

2020-07-23 Thread Jean-Philippe Brucker
The SMMUv3 driver would like to read the MMFR0 PARANGE field in order to
share CPU page tables with devices. Allow the driver to be built as
module by exporting the read_sanitised_ftr_reg() cpufeature symbol.
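
For reference, the intended use is along these lines (a sketch; the
actual code lives in the SMMUv3 SVA patches later in this series):

	u64 reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	unsigned int fld = cpuid_feature_extract_unsigned_field(reg,
					ID_AA64MMFR0_PARANGE_SHIFT);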

Acked-by: Suzuki K Poulose 
Signed-off-by: Jean-Philippe Brucker 
---
 arch/arm64/kernel/cpufeature.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fae0efc80c1..ac2ed25c43d3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1068,6 +1068,7 @@ u64 read_sanitised_ftr_reg(u32 id)
return 0;
return regp->sys_val;
 }
+EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
 
 #define read_sysreg_case(r)\
case r: return read_sysreg_s(r)
-- 
2.27.0



[PATCH v9 03/13] iommu/sva: Add PASID helpers

2020-07-23 Thread Jean-Philippe Brucker
Let IOMMU drivers allocate a single PASID per mm. Store the mm in the
IOASID set to allow refcounting and searching mm by PASID, when handling
an I/O page fault.
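
A rough sketch of the expected flow (the driver side is hypothetical;
the helpers themselves are introduced below):

	/* bind: allocate, or take a reference to, the mm's PASID */
	ret = iommu_sva_alloc_pasid(mm, 1, (1U << 20) - 1);

	/* I/O page fault handler: look up and pin the mm by PASID */
	mm = iommu_sva_find(pasid);
	if (!IS_ERR_OR_NULL(mm)) {
		/* ... resolve the fault against mm ... */
		mmput(mm);
	}

	/* unbind: drop the reference taken at bind time */
	iommu_sva_free_pasid(mm);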

Reviewed-by: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig |  5 +++
 drivers/iommu/Makefile|  1 +
 drivers/iommu/iommu-sva-lib.h | 15 +++
 drivers/iommu/iommu-sva-lib.c | 85 +++
 4 files changed, 106 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva-lib.h
 create mode 100644 drivers/iommu/iommu-sva-lib.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a..490c86e3615a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -102,6 +102,11 @@ config IOMMU_DMA
select IRQ_MSI_IOMMU
select NEED_SG_DMA_LENGTH
 
+# Shared Virtual Addressing library
+config IOMMU_SVA_LIB
+   bool
+   select IOASID
+
 config FSL_PAMU
bool "Freescale IOMMU support"
depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..0fe5a7f9bc69 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -38,3 +38,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
new file mode 100644
index ..b40990aef3fd
--- /dev/null
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SVA library for IOMMU drivers
+ */
+#ifndef _IOMMU_SVA_LIB_H
+#define _IOMMU_SVA_LIB_H
+
+#include 
+#include 
+
+int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
+void iommu_sva_free_pasid(struct mm_struct *mm);
+struct mm_struct *iommu_sva_find(ioasid_t pasid);
+
+#endif /* _IOMMU_SVA_LIB_H */
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
new file mode 100644
index ..db7e6c104d6b
--- /dev/null
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for IOMMU drivers implementing SVA
+ */
+#include 
+#include 
+
+#include "iommu-sva-lib.h"
+
+static DEFINE_MUTEX(iommu_sva_lock);
+static DECLARE_IOASID_SET(iommu_sva_pasid);
+
+/**
+ * iommu_sva_alloc_pasid - Allocate a PASID for the mm
+ * @mm: the mm
+ * @min: minimum PASID value (inclusive)
+ * @max: maximum PASID value (inclusive)
+ *
+ * Try to allocate a PASID for this mm, or take a reference to the existing one
+ * provided it fits within the [min, max] range. On success the PASID is
+ * available in mm->pasid, and must be released with iommu_sva_free_pasid().
+ *
+ * Returns 0 on success and < 0 on error.
+ */
+int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+{
+   int ret = 0;
+   ioasid_t pasid;
+
+   if (min == INVALID_IOASID || max == INVALID_IOASID ||
+   min == 0 || max < min)
+   return -EINVAL;
+
+   mutex_lock(&iommu_sva_lock);
+   if (mm->pasid) {
+   if (mm->pasid >= min && mm->pasid <= max)
+   ioasid_get(mm->pasid);
+   else
+   ret = -EOVERFLOW;
+   } else {
+   pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm);
+   if (pasid == INVALID_IOASID)
+   ret = -ENOMEM;
+   else
+   mm->pasid = pasid;
+   }
+   mutex_unlock(&iommu_sva_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
+
+/**
+ * iommu_sva_free_pasid - Release the mm's PASID
+ * @mm: the mm.
+ *
+ * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid()
+ */
+void iommu_sva_free_pasid(struct mm_struct *mm)
+{
+   mutex_lock(&iommu_sva_lock);
+   if (ioasid_put(mm->pasid))
+   mm->pasid = 0;
+   mutex_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
+
+/* ioasid wants a void * argument */
+static bool __mmget_not_zero(void *mm)
+{
+   return mmget_not_zero(mm);
+}
+
+/**
+ * iommu_sva_find() - Find mm associated to the given PASID
+ * @pasid: Process Address Space ID assigned to the mm
+ *
+ * On success a reference to the mm is taken, and must be released with 
mmput().
+ *
+ * Returns the mm corresponding to this PASID, or an error if not found.
+ */
+struct mm_struct *iommu_sva_find(ioasid_t pasid)
+{
+   return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
-- 
2.27.0



Re: [PATCH v3 0/4] iommu aux-domain APIs extensions

2020-07-23 Thread Lu Baolu

Hi Joerg and Alex,

Any comments for this series?

Just checking to see whether we could make it for v5.9. The first aux-
domain capable device driver has been posted [1].

[1] 
https://lore.kernel.org/linux-pci/159534667974.28840.2045034360240786644.st...@djiang5-desk3.ch.intel.com/


Best regards,
baolu

On 2020/7/14 13:56, Lu Baolu wrote:

This series aims to extend the IOMMU aux-domain API set so that it
is more friendly to vfio/mdev usage. The interactions between
vfio/mdev and the IOMMU during mdev creation and passthrough are:

1. Create a group for mdev with iommu_group_alloc();
2. Add the device to the group with

group = iommu_group_alloc();
if (IS_ERR(group))
return PTR_ERR(group);

ret = iommu_group_add_device(group, &mdev->dev);
if (!ret)
dev_info(&mdev->dev, "MDEV: group_id = %d\n",
 iommu_group_id(group));

3. Allocate an aux-domain with iommu_domain_alloc();
4. Attach the aux-domain to the iommu_group.

iommu_group_for_each_dev {
if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
return iommu_aux_attach_device(domain, iommu_device);
else
return iommu_attach_device(domain, iommu_device);
 }

where iommu_device is the aux-domain-capable device. The mdevs in
the group are all derived from it.

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but group->domain was left
NULL. As a result, iommu_get_domain_for_dev() (and other similar
interfaces) no longer works.

iommu_get_domain_for_dev() is a necessary interface for device
drivers that want to support vfio/mdev-based aux-domains. For example,

 unsigned long pasid;
 struct iommu_domain *domain;
 struct device *dev = mdev_dev(mdev);
 struct device *iommu_device = vfio_mdev_get_iommu_device(dev);

 domain = iommu_get_domain_for_dev(dev);
 if (!domain)
 return -ENODEV;

 pasid = iommu_aux_get_pasid(domain, iommu_device);
 if (pasid <= 0)
 return -EINVAL;

  /* Program the device context */
  

We tried to address this by extending iommu_aux_at(de)tach_device() so that
the users could pass in an optional device pointer (for example vfio/mdev).
(v2 of this series)

https://lore.kernel.org/linux-iommu/20200707013957.23672-1-baolu...@linux.intel.com/

But that causes a locking issue: group->mutex is already held in
iommu_group_for_each_dev(), and would have to be taken again in
iommu_aux_attach_device().

This version tries to address this by introducing two new APIs into the
aux-domain API set:

/**
  * iommu_aux_attach_group - attach an aux-domain to an iommu_group which
  *  contains sub-devices (for example mdevs)
  *  derived from @dev.
  * @domain: an aux-domain;
  * @group:  an iommu_group which contains sub-devices derived from @dev;
  * @dev:the physical device which supports IOMMU_DEV_FEAT_AUX.
  *
  * Returns 0 on success, or an error value.
  */
int iommu_aux_attach_group(struct iommu_domain *domain,
struct iommu_group *group, struct device *dev)

/**
  * iommu_aux_detach_group - detach an aux-domain from an iommu_group
  *
  * @domain: an aux-domain;
  * @group:  an iommu_group which contains sub-devices derived from @dev;
  * @dev:the physical device which supports IOMMU_DEV_FEAT_AUX.
  *
  * @domain must have been attached to @group via
  * iommu_aux_attach_group().
  */
void iommu_aux_detach_group(struct iommu_domain *domain,
 struct iommu_group *group, struct device *dev)
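
With the two APIs above, the vfio/type1 attach path (patch 4) roughly
becomes (a sketch, abbreviated; the surrounding vfio code is omitted):

	if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
		ret = iommu_aux_attach_group(domain, group, iommu_device);
	else
		ret = iommu_attach_group(domain, group);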

This version evolved according to feedback from Robin (v1) and
Alex (v2). Your comments are much appreciated.

Best regards,
baolu

---
Change log:
  - v1->v2:
- 
https://lore.kernel.org/linux-iommu/20200627031532.28046-1-baolu...@linux.intel.com/
- Suggested by Robin.

  - v2->v3:
- 
https://lore.kernel.org/linux-iommu/20200707013957.23672-1-baolu...@linux.intel.com/
- Suggested by Alex

Lu Baolu (4):
   iommu: Check IOMMU_DEV_FEAT_AUX feature in aux api's
   iommu: Add iommu_aux_at(de)tach_group()
   iommu: Add iommu_aux_get_domain_for_dev()
   vfio/type1: Use iommu_aux_at(de)tach_group() APIs

  drivers/iommu/iommu.c   | 92 ++---
  drivers/vfio/vfio_iommu_type1.c | 44 +++-
  include/linux/iommu.h   | 24 +
  3 files changed, 116 insertions(+), 44 deletions(-)




[PATCH v4 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-07-23 Thread Barry Song
Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
memory to save command queues[1]. I also did similar job in patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2] while not realizing Ganapatrao has done that before.

But it seems much better to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, the SMMU uses dma_alloc_coherent() to get memory for its queues
and tables. Typically, on an ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
Keeping queues and tables remote increases the latency of the ARM SMMU
significantly. For example, when the SMMU is at node2 and the default global
CMA is at node0, after sending a CMD_SYNC in an empty command queue, we
have to wait more than 550ns for the completion of the CMD_SYNC command.
However, if we keep them local, we only need to wait for 240ns.

With per-numa CMA, the SMMU will get memory from the local NUMA node for its
command queues and page tables, which shrinks dma_unmap latency considerably.

Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid round trips
between NUMA nodes.
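
As an illustration, a queue allocation in the SMMUv3 driver keeps its
usual shape (this line mirrors the existing driver code) but now ends
up in node-local memory when smmu->dev sits on, say, node 2:

	q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
				      GFP_KERNEL);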

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html

-v4:
 * rebase on top of Christoph Hellwig's patch:
 [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
 https://lore.kernel.org/linux-iommu/20200723120133.94105-1-...@lst.de/
 * cleanup according to Christoph's comment
 * patch 2/2 rebased on top of linux-next to avoid arch/arm64 conflicts
 * reserve cma by checking N_MEMORY rather than N_ONLINE

-v3:
  * move to use page_to_nid() while freeing cma with respect to Robin's
  comment, but this will only work after applying my below patch:
  "mm/cma.c: use exact_nid true to fix possible per-numa cma leak"
  https://marc.info/?l=linux-mm=159333034726647=2

  * handle the case count <= 1 more properly according to Robin's
  comment;

  * add pernuma_cma parameter to support dynamic setting of per-numa
  cma size;
  ideally we can leverage the CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and
  "cma=" kernel parameter and avoid a new paramter separately for per-
  numa cma. Practically, it is really too complicated considering the
  below problems:
  (1) if we leverage the size of default numa for per-numa, we have to
  avoid creating two cma with same size in node0 since default cma is
  probably on node0.
  (2) default cma can consider the address limitation for old devices
  while per-numa cma doesn't support GFP_DMA and GFP_DMA32. all
  allocations with limitation flags will fallback to default one.
  (3) hard to apply CMA_SIZE_PERCENTAGE to per-numa. it is hard to
  decide if the percentage should apply to the whole memory size
  or only apply to the memory size of a specific numa node.
  (4) default cma size has CMA_SIZE_SEL_MIN and CMA_SIZE_SEL_MAX, it
  makes things even more complicated to per-numa cma.

  I haven't figured out a good way to leverage the size of default cma
  for per-numa cma. it seems a separate parameter for per-numa could
  make life easier.

  * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() to
  reuse the comment before hugetlb_cma_reserve() with respect to
  Robin's comment

-v2: 
  * fix some issues reported by kernel test robot
  * fallback to default cma while allocation fails in per-numa cma
  * free memory properly

Barry Song (2):
  dma-direct: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA to localize coherent dma buffers

 .../admin-guide/kernel-parameters.txt |  9 ++
 arch/arm64/mm/init.c  |  2 +
 include/linux/dma-contiguous.h|  4 +
 kernel/dma/Kconfig| 10 ++
 kernel/dma/contiguous.c   | 94 +--
 5 files changed, 109 insertions(+), 10 deletions(-)

-- 
2.27.0




[PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-07-23 Thread Barry Song
Right now, drivers like the ARM SMMU use dma_alloc_coherent() to get
coherent DMA buffers for their command queues and page tables. As
there is only one default CMA in the whole system, SMMUs on nodes other
than node0 will get remote memory. This leads to significant latency.

This patch provides per-numa CMA so that drivers like the SMMU can get local
memory. Tests show that localizing CMA can decrease dma_unmap latency
considerably. For instance, before this patch, the SMMU on node2 has to wait
more than 560ns for the completion of a CMD_SYNC in an empty command queue;
with this patch, it needs only 240ns.

A positive side effect of this patch is improving performance even
further for those users who care about performance more than DMA
security and use iommu.passthrough=1 to skip the IOMMU. With local CMA,
all drivers can get local coherent DMA buffers.
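
For example (the size is an arbitrary choice), booting with:

	pernuma_cma=16M

reserves a 16MiB CMA area on every node that has memory, and coherent
DMA allocations for a device on node N will try node N's area first
before falling back to the global default area.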

Cc: Jonathan Cameron 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
-v4:
 * rebase on top of Christoph Hellwig's patch:
 [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
 https://lore.kernel.org/linux-iommu/20200723120133.94105-1-...@lst.de/
 * cleanup according to Christoph's comment
 * reserve cma by checking N_MEMORY rather than N_ONLINE


 .../admin-guide/kernel-parameters.txt |  9 ++
 include/linux/dma-contiguous.h|  4 +
 kernel/dma/Kconfig| 10 ++
 kernel/dma/contiguous.c   | 94 +--
 4 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index b8d6cde06463..6d963484bbdc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,15 @@
altogether. For more information, see
include/linux/dma-contiguous.h
 
+   pernuma_cma=nn[MG]@[start[MG][-end[MG]]]
+   [ARM,X86,KNL]
+   Sets the size of kernel per-numa memory area for
+   contiguous memory allocations. A value of 0 disables
+   per-numa CMA altogether. DMA users on node nid will
+   first try to allocate buffer from the pernuma area
+   which is located in node nid, if the allocation fails,
+   they will fallback to the global default memory area.
+
cmo_free_hint=  [PPC] Format: { yes | no }
Specify whether pages are marked as being inactive
when they are freed.  This is used in CMO environments
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 03f8e98e3bcc..278a80a40456 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -79,6 +79,8 @@ static inline void dma_contiguous_set_default(struct cma *cma)
 
 void dma_contiguous_reserve(phys_addr_t addr_limit);
 
+void dma_pernuma_cma_reserve(void);
+
 int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed);
@@ -128,6 +130,8 @@ static inline void dma_contiguous_set_default(struct cma 
*cma) { }
 
 static inline void dma_contiguous_reserve(phys_addr_t limit) { }
 
+static inline void dma_pernuma_cma_reserve(void) { }
+
 static inline int dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t 
base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 5d18456b5f01..37e0e3559624 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -120,6 +120,16 @@ config DMA_CMA
 if  DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_PERNUMA_SIZE_MBYTES
+   int "Size in Mega Bytes for per-numa CMA areas"
+   depends on NUMA
+   default 16 if ARM64
+   default 0
+   help
+ Defines the size (in MiB) of the per-numa memory area for Contiguous
+ Memory Allocator. Every numa node will get a separate CMA with this
+ size. If the size of 0 is selected, per-numa CMA is disabled.
+
 config CMA_SIZE_MBYTES
int "Size in Mega Bytes"
depends on !CMA_SIZE_SEL_PERCENTAGE
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index cff7e60968b9..c0482231c0d7 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -30,7 +30,14 @@
 #define CMA_SIZE_MBYTES 0
 #endif
 
+#ifdef CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#define CMA_SIZE_PERNUMA_MBYTES CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_PERNUMA_MBYTES 0
+#endif
+
 struct cma 

[PATCH v4 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers

2020-07-23 Thread Barry Song
Right now, the SMMU uses dma_alloc_coherent() to get memory for its queues
and tables. Typically, on an ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
With this patch, the SMMU will get memory from the local NUMA node for its
command queues and page tables, which shrinks dma_unmap latency considerably.
Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid round trips
between NUMA nodes.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
 -v4:
 * rebase on top of linux-next to avoid arch/arm64 conflicts

 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index b6881d61b818..a6e19145ebb3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -437,6 +437,8 @@ void __init bootmem_init(void)
arm64_hugetlb_cma_reserve();
 #endif
 
+   dma_pernuma_cma_reserve();
+
memblock_dump_all();
 }
 
-- 
2.27.0




RE: [PATCH v3 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-07-23 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Christoph Hellwig [mailto:h...@lst.de]
> Sent: Friday, July 24, 2020 12:01 AM
> To: Song Bao Hua (Barry Song) 
> Cc: Christoph Hellwig ; m.szyprow...@samsung.com;
> robin.mur...@arm.com; w...@kernel.org; ganapatrao.kulka...@cavium.com;
> catalin.mari...@arm.com; iommu@lists.linux-foundation.org; Linuxarm
> ; linux-arm-ker...@lists.infradead.org;
> linux-ker...@vger.kernel.org; Jonathan Cameron
> ; Nicolas Saenz Julienne
> ; Steve Capper ; Andrew
> Morton ; Mike Rapoport ;
> Zengtao (B) ; huangdaode
> 
> Subject: Re: [PATCH v3 1/2] dma-direct: provide the ability to reserve
> per-numa CMA
> 
> On Wed, Jul 22, 2020 at 09:41:50PM +0000, Song Bao Hua (Barry Song)
> wrote:
> > I got a kernel robot warning which said dev should be checked before
> > being accessed when I did a similar change in v1. Probably it was an
> > invalid warning if dev should never be null.
> 
> That usually shows up if a function is inconsistent about sometimes checking 
> it
> and sometimes not.
> 
> > Yes, it looks much better.
> 
> Below is a prep patch to rebase on top of:

Thanks for letting me know.

Will rebase on top of your patch.

> 
> ---
> From b81a5e1da65fce9750f0a8b66dbb6f842cbfdd4d Mon Sep 17 00:00:00
> 2001
> From: Christoph Hellwig 
> Date: Wed, 22 Jul 2020 16:33:43 +0200
> Subject: dma-contiguous: cleanup dma_alloc_contiguous
> 
> Split out a cma_alloc_aligned helper to deal with the "interesting"
> calling conventions for cma_alloc, which then allows the main function to
> be written in a straightforward way.  This also takes advantage of the fact that NULL
> dev arguments have been gone from the DMA API for a while.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/contiguous.c | 31 ++-
>  1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c index
> 15bc5026c485f2..cff7e60968b9e1 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -215,6 +215,13 @@ bool dma_release_from_contiguous(struct device
> *dev, struct page *pages,
>   return cma_release(dev_get_cma_area(dev), pages, count);  }
> 
> +static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp)
> +{
> + unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT);
> +
> + return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN);
> +}
> +
>  /**
>   * dma_alloc_contiguous() - allocate contiguous pages
>   * @dev:   Pointer to device for which the allocation is performed.
> @@ -231,24 +238,14 @@ bool dma_release_from_contiguous(struct device
> *dev, struct page *pages,
>   */
>  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
> {
> - size_t count = size >> PAGE_SHIFT;
> - struct page *page = NULL;
> - struct cma *cma = NULL;
> -
> - if (dev && dev->cma_area)
> - cma = dev->cma_area;
> - else if (count > 1)
> - cma = dma_contiguous_default_area;
> -
>   /* CMA can be used only in the context which permits sleeping */
> - if (cma && gfpflags_allow_blocking(gfp)) {
> - size_t align = get_order(size);
> - size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
> -
> - page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
> - }
> -
> - return page;
> + if (!gfpflags_allow_blocking(gfp))
> + return NULL;
> + if (dev->cma_area)
> + return cma_alloc_aligned(dev->cma_area, size, gfp);
> + if (size <= PAGE_SIZE || !dma_contiguous_default_area)
> + return NULL;
> + return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
>  }
> 
>  /**
> --
> 2.27.0

Thanks
Barry



[PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous

2020-07-23 Thread Christoph Hellwig
Split out a cma_alloc_aligned helper to deal with the "interesting"
calling conventions for cma_alloc, which then allows the main
function to be written in a straightforward way.  This also takes advantage
of the fact that NULL dev arguments have been gone from the DMA API
for a while.
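
For context, the callers in the dma-direct path have roughly this shape
(a simplified sketch of kernel/dma/direct.c; note that dev is never
NULL there, which is what allows dropping the NULL check):

	page = dma_alloc_contiguous(dev, size, gfp);
	if (!page)
		page = alloc_pages_node(dev_to_node(dev), gfp,
					get_order(size));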

Signed-off-by: Christoph Hellwig 
---

Changes since v1:
 - actually pass on the select struct cma
 - clean up cma_alloc_aligned a bit

 kernel/dma/contiguous.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 15bc5026c485f2..cff7e60968b9e1 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -215,6 +215,13 @@ bool dma_release_from_contiguous(struct device *dev, 
struct page *pages,
return cma_release(dev_get_cma_area(dev), pages, count);
 }
 
+static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp)
+{
+   unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT);
+
+   return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN);
+}
+
 /**
  * dma_alloc_contiguous() - allocate contiguous pages
  * @dev:   Pointer to device for which the allocation is performed.
@@ -231,24 +238,14 @@ bool dma_release_from_contiguous(struct device *dev, 
struct page *pages,
  */
 struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
-   size_t count = size >> PAGE_SHIFT;
-   struct page *page = NULL;
-   struct cma *cma = NULL;
-
-   if (dev && dev->cma_area)
-   cma = dev->cma_area;
-   else if (count > 1)
-   cma = dma_contiguous_default_area;
-
/* CMA can be used only in the context which permits sleeping */
-   if (cma && gfpflags_allow_blocking(gfp)) {
-   size_t align = get_order(size);
-   size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
-
-   page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
-   }
-
-   return page;
+   if (!gfpflags_allow_blocking(gfp))
+   return NULL;
+   if (dev->cma_area)
+   return cma_alloc_aligned(dev->cma_area, size, gfp);
+   if (size <= PAGE_SIZE || !dma_contiguous_default_area)
+   return NULL;
+   return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
 }
 
 /**
-- 
2.27.0



Re: [PATCH v3 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-07-23 Thread Christoph Hellwig
On Wed, Jul 22, 2020 at 09:41:50PM +0000, Song Bao Hua (Barry Song) wrote:
> I got a kernel robot warning which said dev should be checked before being 
> accessed
> when I did a similar change in v1. Probably it was an invalid warning if dev 
> should
> never be null.

That usually shows up if a function is inconsistent about sometimes
checking it and sometimes not.

> Yes, it looks much better.

Below is a prep patch to rebase on top of:

---
From b81a5e1da65fce9750f0a8b66dbb6f842cbfdd4d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 22 Jul 2020 16:33:43 +0200
Subject: dma-contiguous: cleanup dma_alloc_contiguous

Split out a cma_alloc_aligned helper to deal with the "interesting"
calling conventions for cma_alloc, which then allows the main
function to be written in a straightforward way.  This also takes advantage
of the fact that NULL dev arguments have been gone from the DMA API
for a while.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/contiguous.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 15bc5026c485f2..cff7e60968b9e1 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -215,6 +215,13 @@ bool dma_release_from_contiguous(struct device *dev, 
struct page *pages,
return cma_release(dev_get_cma_area(dev), pages, count);
 }
 
+static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp)
+{
+   unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT);
+
+   return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN);
+}
+
 /**
  * dma_alloc_contiguous() - allocate contiguous pages
  * @dev:   Pointer to device for which the allocation is performed.
@@ -231,24 +238,14 @@ bool dma_release_from_contiguous(struct device *dev, 
struct page *pages,
  */
 struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
-   size_t count = size >> PAGE_SHIFT;
-   struct page *page = NULL;
-   struct cma *cma = NULL;
-
-   if (dev && dev->cma_area)
-   cma = dev->cma_area;
-   else if (count > 1)
-   cma = dma_contiguous_default_area;
-
/* CMA can be used only in the context which permits sleeping */
-   if (cma && gfpflags_allow_blocking(gfp)) {
-   size_t align = get_order(size);
-   size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
-
-   page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
-   }
-
-   return page;
+   if (!gfpflags_allow_blocking(gfp))
+   return NULL;
+   if (dev->cma_area)
+   return cma_alloc_aligned(dev->cma_area, size, gfp);
+   if (size <= PAGE_SIZE || !dma_contiguous_default_area)
+   return NULL;
+   return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
 }
 
 /**
-- 
2.27.0



Re: [PATCH v3 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-07-23 Thread Christoph Hellwig
On Wed, Jul 22, 2020 at 09:26:03PM +0000, Song Bao Hua (Barry Song) wrote:
> I understand your concern. Anyway, the primary purpose of this patchset is 
> providing
> a general way for users like IOMMU to get local coherent dma buffers to put 
> their
> command queues and page tables in. The first use case is what really made me
> begin to prepare this patchset.
> 
> For the second case, it is probably a positive side effect of this patchset 
> for those users
> who have more concern on performance than dma security, then they maybe skip
> IOMMU by
>   iommu.passthrough=
>   [ARM64, X86] Configure DMA to bypass the IOMMU by 
> default.
>   Format: { "0" | "1" }
>   0 - Use IOMMU translation for DMA.
>   1 - Bypass the IOMMU for DMA.
>   unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
> In this case, they can get local memory and get better performance.
> However, it is not the primary purpose of this patchset.

That's not what I meant.  Hardcoding the CMA regions in the kernel
config is just a bad idea, and we should not add more hard coded values.
You can always use CONFIG_CMDLINE to force a specific kernel command
line including your options.


Re: [PATCH v2 2/2] iommu: Add gfp parameter to io_pgtable_ops->map()

2020-07-23 Thread Baolin Wang
On Tue, Jul 14, 2020 at 09:28:21AM +0100, Will Deacon wrote:
> On Fri, Jun 12, 2020 at 11:39:55AM +0800, Baolin Wang wrote:
> > Now the ARM page tables are always allocated with the GFP_ATOMIC flag,
> > but the iommu_ops->map() function gained a gfp_t parameter in
> > commit 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map"),
> > thus io_pgtable_ops->map() should use the gfp parameter passed from
> > iommu_ops->map() to allocate page-table pages, which avoids wasting the
> > memory allocator's atomic pools in non-atomic contexts.
> > 
> > Signed-off-by: Baolin Wang 
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c |  2 +-
> >  drivers/iommu/arm-smmu-v3.c |  2 +-
> >  drivers/iommu/arm-smmu.c|  2 +-
> >  drivers/iommu/io-pgtable-arm-v7s.c  | 18 +-
> >  drivers/iommu/io-pgtable-arm.c  | 18 +-
> >  drivers/iommu/ipmmu-vmsa.c  |  2 +-
> >  drivers/iommu/msm_iommu.c   |  2 +-
> >  drivers/iommu/mtk_iommu.c   |  2 +-
> >  drivers/iommu/qcom_iommu.c  |  2 +-
> >  include/linux/io-pgtable.h  |  2 +-
> >  10 files changed, 26 insertions(+), 26 deletions(-)
> 
> I was a bit nervous about us passing GFP_KERNEL with a spinlock held, but
> it looks like you've checked all the callsites and it looks fine to me, so:
> 
> Acked-by: Will Deacon 
> 
> Joerg -- not sure what you want to do with this one, as it's likely to
> conflict (trivially) with unrelated driver changes.

Thanks, Will. Joerg, could you apply these 2 patches if there is no
objection from your side? Thanks.
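
For reference, the change boils down to threading a gfp_t through the
io_pgtable callback (a sketch of the include/linux/io-pgtable.h
signature after the patch):

	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);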



RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info

2020-07-23 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker 
> Sent: Friday, July 17, 2020 5:09 PM
> 
> On Thu, Jul 16, 2020 at 10:38:17PM +0200, Auger Eric wrote:
> > Hi Jean,
> >
> > On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
> > > On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> > >>> Have you verified that this doesn't break the existing usage of
> > >>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> > >>
> > >> I didn't have an ARM machine on hand. But I contacted Jean-Philippe,
> > >> and he confirmed there is no compile issue. I didn't see any code
> > >> getting the DOMAIN_ATTR_NESTING attr in the current
> > >> drivers/vfio/vfio_iommu_type1.c.
> > >> What I'm adding is a call to iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING)
> > >> that won't fail if iommu_domain_get_attr() returns 0. This patch
> > >> returns an empty nesting info for DOMAIN_ATTR_NESTING, and the return
> > >> value is 0 if there is no error. So I guess it won't fail nesting for ARM.
> > >
> > > I confirm that this series doesn't break the current support for
> > > VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
> > >
> > > If the SMMU does not support stage-2 then there is a change in behavior
> > > (untested): after the domain is silently switched to stage-1 by the SMMU
> > > driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> > > succeeding as before, the VFIO ioctl will now fail. I believe that's a fix
> > > rather than a regression, it should have been like this since the
> > > beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING
> so
> > > far, so I don't think it should be a concern.
> > But as Yi mentioned ealier, in the current vfio code there is no
> > DOMAIN_ATTR_NESTING query yet.
> 
> That's why something that would have succeeded before will now fail:
> Before this series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would
> have succeeded even if the SMMU didn't support stage-2, as the driver
> would have silently fallen back on stage-1 mappings (which work exactly
> the same as stage-2-only since there was no nesting supported). After the
> series, we do check for DOMAIN_ATTR_NESTING so if user asks for
> VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the ioctl
> fails.

I think this depends on the IOMMU driver. I noticed the current SMMU driver
doesn't check the physical IOMMU for nesting capability, so I guess
SET_IOMMU would still work for SMMU. But it will fail as you
mentioned in a prior email: userspace will check the nesting info and
would fail as it gets an empty struct from the host.

https://lore.kernel.org/kvm/20200716153959.GA447208@myrica/

> 
> I believe it's a good fix and completely harmless, but wanted to make sure
> no one objects because it's an ABI change.

yes.

Regards,
Yi Liu

> Thanks,
> Jean
> 
> > In my SMMUV3 nested stage series, I added
> > such a query in vfio-pci.c to detect if I need to expose a fault region
> > but I already test both the returned value and the output arg. So to me
> > there is no issue with that change.
> > >
> > > And if userspace queries the nesting properties using the new ABI
> > > introduced in this patchset, it will obtain an empty struct. I think
> > > that's acceptable, but it may be better to avoid adding the nesting cap if
> > > @format is 0?
> > agreed
> >
> > Thanks
> >
> > Eric
> > >
> > > Thanks,
> > > Jean
> > >
> > >>
> > >> @Eric, how about your opinion? your dual-stage vSMMU support may
> > >> also share the vfio_iommu_type1.c code.
> > >>
> > >> Regards,
> > >> Yi Liu
> > >>
> > >>> Will
> > >
> >


RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info

2020-07-23 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker 
> Sent: Thursday, July 16, 2020 11:40 PM
> 
> On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> > > Have you verified that this doesn't break the existing usage of
> > > DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> >
> > I didn't have an ARM machine on hand. But I contacted Jean-Philippe,
> > and he confirmed there is no compile issue. I didn't see any code
> > getting the DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
> > What I'm adding is a call to iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING)
> > that won't fail if iommu_domain_get_attr() returns 0. This patch
> > returns an empty nesting info for DOMAIN_ATTR_NESTING, and the return
> > value is 0 if there is no error. So I guess it won't fail nesting for ARM.
> 
> I confirm that this series doesn't break the current support for
> VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...

thanks.

> If the SMMU does not support stage-2 then there is a change in behavior
> (untested): after the domain is silently switched to stage-1 by the SMMU
> driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> succeeding as before, the VFIO ioctl will now fail. I believe that's a fix
> rather than a regression, it should have been like this since the
> beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
> far, so I don't think it should be a concern.
> 
> And if userspace queries the nesting properties using the new ABI
> introduced in this patchset, it will obtain an empty struct.

yes.

> I think
> that's acceptable, but it may be better to avoid adding the nesting cap if
> @format is 0?

right. will add it in patch 4/15.
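
Roughly, the guard amounts to this (a sketch; the exact field names
come from earlier patches in the series):

	/* don't advertise the nesting cap when no @format is reported */
	if (!info->format)
		return 0;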

Regards,
Yi Liu

> 
> Thanks,
> Jean
> 
> >
> > @Eric, how about your opinion? your dual-stage vSMMU support may
> > also share the vfio_iommu_type1.c code.
> >
> > Regards,
> > Yi Liu
> >
> > > Will


Re: [PATCH] dma-contiguous: cleanup dma_alloc_contiguous

2020-07-23 Thread Christoph Hellwig
On Wed, Jul 22, 2020 at 11:00:48PM -0700, Nicolin Chen wrote:
> On Wed, Jul 22, 2020 at 04:43:07PM +0200, Christoph Hellwig wrote:
> > Split out a cma_alloc_aligned helper to deal with the "interesting"
> > calling conventions for cma_alloc, which then allows the main
> > function to be written in a straightforward way.  This also takes advantage
> > of the fact that NULL dev arguments have been gone from the DMA API
> > for a while.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  kernel/dma/contiguous.c | 31 ++-
> >  1 file changed, 14 insertions(+), 17 deletions(-)
> > 
> > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> > index 15bc5026c485f2..f16b8d3f9932de 100644
> > --- a/kernel/dma/contiguous.c
> > +++ b/kernel/dma/contiguous.c
> > @@ -215,6 +215,13 @@ bool dma_release_from_contiguous(struct device *dev, 
> > struct page *pages,
> > return cma_release(dev_get_cma_area(dev), pages, count);
> >  }
> >  
> > +static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t 
> > gfp)
> > +{
> > +   return cma_alloc(dma_contiguous_default_area, size >> PAGE_SHIFT,
> 
> Probably should be 'cma' here instead of 'dma_contiguous_default_area'?

Yes.  If only I could test the per-device CMAs on a normal x86 setup :)


Re: [PATCH] dma-contiguous: cleanup dma_alloc_contiguous

2020-07-23 Thread Nicolin Chen
On Wed, Jul 22, 2020 at 04:43:07PM +0200, Christoph Hellwig wrote:
> Split out a cma_alloc_aligned helper to deal with the "interesting"
> calling conventions for cma_alloc, which then allows the main
> function to be written in a straightforward way.  This also takes advantage
> of the fact that NULL dev arguments have been gone from the DMA API
> for a while.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/contiguous.c | 31 ++-
>  1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index 15bc5026c485f2..f16b8d3f9932de 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -215,6 +215,13 @@ bool dma_release_from_contiguous(struct device *dev, 
> struct page *pages,
>   return cma_release(dev_get_cma_area(dev), pages, count);
>  }
>  
> +static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t 
> gfp)
> +{
> + return cma_alloc(dma_contiguous_default_area, size >> PAGE_SHIFT,

Probably should be 'cma' here instead of 'dma_contiguous_default_area'?