Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-03-20 Thread kbuild test robot
Hi Eric,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Eric-Auger/SMMUv3-Nested-Stage-Setup-VFIO-part/20200321-040935
base:   https://github.com/awilliam/linux-vfio.git next
config: arm64-randconfig-a001-20200321 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from include/linux/vfio.h:16,
from drivers/vfio/vfio.c:32:
>> include/uapi/linux/vfio.h:811:34: error: field 'config' has incomplete type
 811 |  struct iommu_pasid_table_config config; /* used on SET */
 |  ^~
--
   In file included from include/linux/vfio.h:16,
from drivers/vfio/vfio_iommu_type1.c:35:
>> include/uapi/linux/vfio.h:811:34: error: field 'config' has incomplete type
 811 |  struct iommu_pasid_table_config config; /* used on SET */
 |  ^~
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_detach_pasid_table':
>> drivers/vfio/vfio_iommu_type1.c:2204:3: error: implicit declaration of 
>> function 'iommu_detach_pasid_table'; did you mean 'vfio_detach_pasid_table'? 
>> [-Werror=implicit-function-declaration]
2204 |   iommu_detach_pasid_table(d->domain);
 |   ^~~~
 |   vfio_detach_pasid_table
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_attach_pasid_table':
>> drivers/vfio/vfio_iommu_type1.c:2219:9: error: implicit declaration of 
>> function 'iommu_attach_pasid_table'; did you mean 'vfio_attach_pasid_table'? 
>> [-Werror=implicit-function-declaration]
2219 |   ret = iommu_attach_pasid_table(d->domain, >config);
 | ^~~~
 | vfio_attach_pasid_table
   cc1: some warnings being treated as errors

vim +/config +811 include/uapi/linux/vfio.h

   797  
   798  /**
   799   * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
   800   *  struct vfio_iommu_type1_set_pasid_table)
   801   *
   802   * The SET operation passes a PASID table to the host while the
   803   * UNSET operation detaches the one currently programmed. Setting
   804   * a table while another is already programmed replaces the old table.
   805   */
   806  struct vfio_iommu_type1_set_pasid_table {
   807  __u32   argsz;
   808  __u32   flags;
   809  #define VFIO_PASID_TABLE_FLAG_SET   (1 << 0)
   810  #define VFIO_PASID_TABLE_FLAG_UNSET (1 << 1)
 > 811  struct iommu_pasid_table_config config; /* used on SET */
   812  };
   813  
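The "incomplete type" error means include/uapi/linux/vfio.h uses struct iommu_pasid_table_config without a visible definition. A minimal sketch of one plausible fix, assuming the structure is exported by the uapi iommu header as in the companion IOMMU series (a guess for illustration, not the author's confirmed resolution):

   /* include/uapi/linux/vfio.h -- hypothetical sketch, not a confirmed fix */
   #include <linux/iommu.h>	/* brings in struct iommu_pasid_table_config so the
				 * 'config' field is a complete type even when
				 * vfio.h is compiled standalone */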

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH 1/3] iommu/vt-d: Remove redundant IOTLB flush

2020-03-20 Thread Lu Baolu

On 2020/3/21 0:20, Jacob Pan wrote:

On Fri, 20 Mar 2020 21:45:26 +0800
Lu Baolu  wrote:


On 2020/3/20 12:32, Jacob Pan wrote:

The IOTLB flush is already included in the PASID tear-down process. There
is no need to flush again.


It seems that intel_pasid_tear_down_entry() doesn't flush the pasid
based device TLB?


I saw this code in intel_pasid_tear_down_entry(). Doesn't the last line
flush the devtlb? Not in the guest of course, since the passdown tlb flush
is inclusive.

pasid_cache_invalidation_with_pasid(iommu, did, pasid);
iotlb_invalidation_with_pasid(iommu, did, pasid);

/* Device IOTLB doesn't need to be flushed in caching mode. */
if (!cap_caching_mode(iommu->cap))
devtlb_invalidation_with_pasid(iommu, dev, pasid);



But devtlb_invalidation_with_pasid() doesn't do the right thing: it
flushes the device TLB instead of the PASID-based device TLB.

static void
devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
   struct device *dev, int pasid)
{
struct device_domain_info *info;
u16 sid, qdep, pfsid;

info = dev->archdata.iommu;
if (!info || !info->ats_enabled)
return;

sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
pfsid = info->pfsid;

	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);

}
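For comparison, a minimal sketch of what a PASID-granular flavour could look like using the qi_flush_dev_iotlb_pasid() helper added later in this series (patch 07/11); the arguments and granularity constant below are illustrative only, not a proposed patch:

	/* illustrative: flush only the PASID-tagged device TLB entries */
	qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep,
				 0, 64 - VTD_PAGE_SHIFT,
				 QI_DEV_IOTLB_GRAN_PASID_SEL);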

Best regards,
baolu


Best regards,
baolu



Cc: Lu Baolu 
Signed-off-by: Jacob Pan 
---
   drivers/iommu/intel-svm.c | 6 ++
   1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8f42d717d8d7..1483f1845762 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -268,10 +268,9 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
	 * *has* to handle gracefully without affecting other
	 * processes. */
	rcu_read_lock();
-	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+	list_for_each_entry_rcu(sdev, &svm->devs, list)
		intel_pasid_tear_down_entry(svm->iommu,
					    sdev->dev, svm->pasid);
-		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-	}
+
	rcu_read_unlock();

 }

@@ -731,7 +730,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
			 * large and has to be physically contiguous. So it's
			 * hard to be as defensive as we might like. */
			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
			kfree_rcu(sdev, rcu);

			if (list_empty(&svm->devs)) {
   


[Jacob Pan]




[PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support

2020-03-20 Thread Jacob Pan
Shared virtual address (SVA), a.k.a. shared virtual memory (SVM), on Intel
platforms allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. enable use of SVA
within a guest user application.

This is the remaining portion of the original patchset that is based on
Joerg's x86/vt-d branch. The preparatory and cleanup patches are merged here.
(git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git)

Only the IOMMU portion of the changes is included in this series. Additional
support is needed in VFIO and QEMU (to be submitted separately) to complete
this functionality.

To keep changes incremental and reduce the size of each patchset, this series
does not include support for page request services.

In the VT-d implementation, the PASID table is per device and maintained in the
host. The guest PASID table is shadowed in the VMM where the virtual IOMMU is
emulated.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

This is the remaining VT-d only portion of V5 since the uAPIs and IOASID common
code have been applied to Joerg's IOMMU core branch.
(https://lkml.org/lkml/2019/10/2/833)

The complete set with VFIO patches are here:
https://github.com/jacobpan/linux.git:siov_sva

The complete nested SVA upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. Page Request Services (PRS) support
3. Mediated device assignment

With this set and the accompanied VFIO code, we will achieve phase #1.

Thanks,

Jacob

ChangeLog:
- v10
  - Addressed Eric's review in v7 and v9. Most fixes are in 3/10 and
6/10: extra condition checks and consolidation of duplicated code.

- v9
  - Addressed Baolu's comments for v8 for IOTLB flush consolidation,
bug fixes
  - Removed IOASID notifier code which will be submitted separately
to address PASID life cycle management with multiple users.

- v8
  - Extracted cleanup patches from V7 and accepted into maintainer's
tree (https://lkml.org/lkml/2019/12/2/514).
  - Added IOASID notifier and VT-d handler for termination of PASID
IOMMU context upon free. This will ensure success of the VFIO IOASID
free API regardless of whether the PASID is in use.

(https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l@intel.com/)

- V7
  - Respect vIOMMU PASID range in virtual command PASID/IOASID allocator
  - Caching virtual command capabilities to avoid runtime checks that
could cause vmexits.

- V6
  - Rebased on top of Joerg's core branch
  (git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
  - Adapt to new uAPIs and IOASID allocators

- V5
  Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged.
  Addressed v4 review comments from Eric Auger, Baolu Lu, and
Jonathan Cameron. Specific changes are as follows:
  - Refined custom IOASID allocator to support multiple vIOMMU, hotplug
cases.
  - Extracted vendor data from IOMMU guest PASID bind data, for VT-d
will support all necessary guest PASID entry fields for PASID
bind.
  - Support non-identity host-guest PASID mapping
  - Exception handling in various cases

- V4
  - Redesigned IOASID allocator such that it can support custom
  allocators with shared helper functions. Use separate XArray
  to store IOASIDs per allocator. Took advice from Eric Auger to
  have default allocator use the generic allocator structure.
  Combined into one patch in that the default allocator is just
  "another" allocator now. Can be built as a module in case of
  driver use without IOMMU.
  - Extended bind guest PASID data to support SMMU and non-identity
  guest to host PASID mapping 

[PATCH V10 05/11] iommu/vt-d: Add nested translation helper function

2020-03-20 Thread Jacob Pan
Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
With the PASID granular translation type set to 0x11b, the translation
result from the first level (FL) is also subject to a second level (SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.

This patch adds a helper function for setting up nested translation,
where the second level comes from a domain and the first level comes
from a guest PGD.
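As a rough illustration (not part of this patch), the bind path added in patch 06/11 is expected to call the helper roughly like this, assuming the signature declared in intel-pasid.h by this series; names such as data and ddomain are the caller's locals:

	/* illustrative call site, see intel_svm_bind_gpasid() in patch 06/11 */
	ret = intel_pasid_setup_nested(iommu, dev,
				       (pgd_t *)data->gpgd,	/* guest FL page table root (GPA) */
				       data->hpasid,		/* host PASID */
				       &data->vtd,		/* VT-d specific bind data */
				       ddomain,			/* supplies the SL page tables */
				       data->addr_width);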

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-pasid.c | 240 +++-
 drivers/iommu/intel-pasid.h |  12 +++
 include/linux/intel-iommu.h |   3 +
 3 files changed, 252 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 9bdb7ee228b6..10c7856afc6b 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -359,6 +359,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Memory Type(EMT) field (Bits 91-93)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emt(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
+}
+
+/*
+ * Setup the Page Attribute Table (PAT) field (Bits 96-127)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pat(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 32);
+}
+
+/*
+ * Setup the Cache Disable (CD) field (Bit 89)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_cd(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 25, 1 << 25);
+}
+
+/*
+ * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emte(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 26, 1 << 26);
+}
+
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_eafe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
+/*
+ * Setup the Page-level Cache Disable (PCD) field (Bit 95)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pcd(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 31, 1 << 31);
+}
+
+/*
+ * Setup the Page-level Write-Through (PWT) field (Bit 94)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pwt(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 30, 1 << 30);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
u16 did, int pasid)
@@ -492,7 +562,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
 
/* Setup Present and PASID Granular Transfer Type: */
-   pasid_set_translation_type(pte, 1);
+   pasid_set_translation_type(pte, PASID_ENTRY_PGTT_FL_ONLY);
pasid_set_present(pte);
pasid_flush_caches(iommu, pte, pasid, did);
 
@@ -564,7 +634,7 @@ int intel_pasid_setup_second_level(struct intel_iommu 
*iommu,
pasid_set_domain_id(pte, did);
pasid_set_slptr(pte, pgd_val);
pasid_set_address_width(pte, agaw);
-   pasid_set_translation_type(pte, 2);
+   pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
 
@@ -598,7 +668,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
*iommu,
pasid_clear_entry(pte);
pasid_set_domain_id(pte, did);
pasid_set_address_width(pte, iommu->agaw);
-   pasid_set_translation_type(pte, 4);
+   pasid_set_translation_type(pte, PASID_ENTRY_PGTT_PT);
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
 
@@ -612,3 +682,167 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
*iommu,
 
return 0;
 }
+
+static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
+   struct pasid_entry *pte,
+   struct iommu_gpasid_bind_data_vtd *pasid_data)
+{
+   /*
+* Not all guest PASID table entry fields are passed down during bind,
+* here we only set up the ones that are dependent on guest settings.
+* Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
+* therefore not set. Other fields, such as snoop related, are set based
+* on host needs regardless of guest settings.
+*/
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
+   if (!ecap_srs(iommu->ecap)) {
+   pr_err("No supervisor request support on %s\n",
+   

[PATCH V10 06/11] iommu/vt-d: Add bind guest PASID support

2020-03-20 Thread Jacob Pan
When supporting guest SVA with an emulated IOMMU, the guest PASID
table is shadowed in the VMM. Updates to the guest vIOMMU PASID table
result in PASID cache flushes, which are passed down to the host as
bind guest PASID calls.

The SL page tables are harvested from the device's default domain
(request w/o PASID), or from the aux domain in the case of a
mediated device.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
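For reference, a hedged sketch of the bind data a VMM-side caller might pass down when it traps a guest PASID cache flush; the field names follow the already-merged uAPI, while guest_cr3_gpa, hpasid and gpasid are illustrative locals:

	struct iommu_gpasid_bind_data data = {
		.version    = IOMMU_GPASID_BIND_VERSION_1,
		.format     = IOMMU_PASID_FORMAT_INTEL_VTD,
		.flags      = IOMMU_SVA_GPASID_VAL,	/* guest PASID field is valid */
		.gpgd       = guest_cr3_gpa,		/* FL page table root, GPA */
		.hpasid     = hpasid,			/* host PASID (IOASID) */
		.gpasid     = gpasid,			/* PASID seen by the guest */
		.addr_width = 48,
	};

	ret = iommu_sva_bind_gpasid(domain, dev, &data);	/* ends up in intel_svm_bind_gpasid() */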

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 224 
 include/linux/intel-iommu.h |   8 +-
 include/linux/intel-svm.h   |  17 
 4 files changed, 252 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e599b2537b1c..b1477cd423dd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6203,6 +6203,10 @@ const struct iommu_ops intel_iommu_ops = {
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   .sva_bind_gpasid= intel_svm_bind_gpasid,
+   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index d7f2a5358900..47c0deb5ae56 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -226,6 +226,230 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry((sdev), &(svm)->devs, list) \
if ((d) != (sdev)->dev) {} else
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev,
+   struct iommu_gpasid_bind_data *data)
+{
+   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+   struct dmar_domain *ddomain;
+   struct intel_svm_dev *sdev;
+   struct intel_svm *svm;
+   int ret = 0;
+
+   if (WARN_ON(!iommu) || !data)
+   return -EINVAL;
+
+   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   if (dev_is_pci(dev)) {
+   /* VT-d supports devices with full 20 bit PASIDs only */
+   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+   return -EINVAL;
+   } else {
+   return -ENOTSUPP;
+   }
+
+   /*
+* We only check host PASID range, we have no knowledge to check
+* guest PASID range nor do we use the guest PASID.
+*/
+   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+   return -EINVAL;
+
+   ddomain = to_dmar_domain(domain);
+
+   /* Sanity check paging mode support match between host and guest */
+   if (data->addr_width == ADDR_WIDTH_5LEVEL &&
+   !cap_5lp_support(iommu->cap)) {
+   pr_err("Cannot support 5 level paging requested by guest!\n");
+   return -EINVAL;
+   }
+
+   mutex_lock(&pasid_mutex);
+   svm = ioasid_find(NULL, data->hpasid, NULL);
+   if (IS_ERR(svm)) {
+   ret = PTR_ERR(svm);
+   goto out;
+   }
+
+   if (svm) {
+   /*
+* If we found the svm for this PASID, there must be at
+* least one device bound; otherwise the svm should have been freed.
+*/
+   if (WARN_ON(list_empty(&svm->devs))) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (svm->mm == get_task_mm(current) &&
+   data->hpasid == svm->pasid &&
+   data->gpasid == svm->gpasid) {
+   pr_warn("Cannot bind the same guest-host PASID for the 
same process\n");
+   mmput(svm->mm);
+   ret = -EINVAL;
+   goto out;
+   }
+   mmput(current->mm);
+

[PATCH V10 07/11] iommu/vt-d: Support flushing more translation cache types

2020-03-20 Thread Jacob Pan
When Shared Virtual Memory is exposed to a guest via a vIOMMU, scalable
mode IOTLB invalidations may be passed down from outside the IOMMU
subsystem. This patch adds invalidation functions that can be used for
additional translation cache types.
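As an illustration of the new helpers (values below are examples, not from the patch): invalidating 8 contiguous 4KB pages at iova for one PASID uses size_order = 3, so the S bit is set and bit 14 becomes the least significant zero bit (32KB range, VT-d spec 6.5.2.6):

	/* illustrative only */
	qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep,
				 iova, 3 /* 2^3 pages = 32KB */,
				 QI_DEV_IOTLB_GRAN_PASID_SEL);

	/* drop the cached PASID-table entry of one PASID for this domain */
	qi_flush_pasid_cache(iommu, did, QI_PC_PASID_SEL, pasid);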

Signed-off-by: Jacob Pan 

---
v9 -> v10:
Fix off by 1 in pasid device iotlb flush

Address v7 missed review from Eric

---
---
 drivers/iommu/dmar.c| 36 
 drivers/iommu/intel-pasid.c |  3 ++-
 include/linux/intel-iommu.h | 20 
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index f77dae7ba7d4..4d6b7b5b37ee 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1421,6 +1421,42 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+   u32 pasid,  u16 qdep, u64 addr, unsigned size_order, u64 granu)
+{
+   unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
+   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+   desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+   QI_DEV_IOTLB_PFSID(pfsid);
+   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+   /*
+* If S bit is 0, we only flush a single page. If S bit is set,
+* the least significant zero bit indicates the invalidation address
+* range. VT-d spec 6.5.2.6.
+* e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
+* size order = 0 is PAGE_SIZE 4KB
+* Max Invs Pending (MIP) is set to 0 for now until we have DIT in
+* ECAP.
+*/
+   desc.qw1 |= addr & ~mask;
+   if (size_order)
+   desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+   struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+
+   desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
+   qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 10c7856afc6b..9f6d07410722 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu 
*iommu,
 {
struct qi_desc desc;
 
-   desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+   desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
+   QI_PC_PASID(pasid) | QI_PC_TYPE;
desc.qw1 = 0;
desc.qw2 = 0;
desc.qw3 = 0;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 85b05120940e..43539713b3b3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -334,7 +334,7 @@ enum {
#define QI_IOTLB_GRAN(gran)	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
#define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
#define QI_IOTLB_IH(ih)		(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)		(((u8)am))
+#define QI_IOTLB_AM(am)		(((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)   (((u64)fm) << 48)
 #define QI_CC_SID(sid) (((u64)sid) << 32)
@@ -353,16 +353,21 @@ enum {
 #define QI_PC_DID(did) (((u64)did) << 16)
 #define QI_PC_GRAN(gran)   (((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS   (QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS	0
+#define QI_PC_PASID_SEL		1
 
 #define QI_EIOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_IH(ih)   (((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)   (((u64)am))
+#define QI_EIOTLB_AM(am)   (((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did) (((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran)   (((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL		1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
+
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
 #define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
@@ -662,8 +667,15 @@ extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 
did, u64 addr,
  unsigned int size_order, u64 type);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask);
+
 void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 unsigned long npages, bool ih);
+
+extern void qi_flush_dev_iotlb_pasid(struct intel_iommu 

[PATCH V10 10/11] iommu/vt-d: Enlightened PASID allocation

2020-03-20 Thread Jacob Pan
From: Lu Baolu 

Enabling an IOMMU in a guest requires communication with the host
driver for certain aspects. Using PASIDs to enable Shared Virtual
Addressing (SVA) requires managing PASIDs in the host. The VT-d 3.0 spec
provides a Virtual Command Register (VCMD) to facilitate this.
Writes to this register in the guest are trapped by QEMU, which
proxies the call to the host driver.

This virtual command interface consists of a capability register,
a virtual command register, and a virtual response register. Refer
to section 10.4.42, 10.4.43, 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command interface.
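A hedged sketch of how the two interfaces are meant to be consumed (the real caller is the custom IOASID allocator added in patch 11/11); error handling trimmed:

	unsigned int pasid;

	if (vcmd_alloc_pasid(iommu, &pasid))	/* traps to the host via DMAR_VCMD_REG */
		return INVALID_IOASID;

	/* ... use the PASID for first-level/nested translation ... */

	vcmd_free_pasid(iommu, pasid);		/* return it to the host-wide pool */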

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel-pasid.c | 57 +
 drivers/iommu/intel-pasid.h | 13 ++-
 include/linux/intel-iommu.h |  1 +
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 9f6d07410722..e87ad67aad36 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -27,6 +27,63 @@
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+   unsigned long flags;
+   u8 status_code;
+   int ret = 0;
+   u64 res;
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   *pasid = VCMD_VRSP_RESULT_PASID(res);
+   break;
+   case VCMD_VRSP_SC_NO_PASID_AVAIL:
+   pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+   ret = -ENOSPC;
+   break;
+   default:
+   ret = -ENODEV;
+   pr_warn("IOMMU: %s: Unexpected error code %d\n",
+   iommu->name, status_code);
+   }
+
+   return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+   unsigned long flags;
+   u8 status_code;
+   u64 res;
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG,
+   VCMD_CMD_OPERAND(pasid) | VCMD_CMD_FREE);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   break;
+   case VCMD_VRSP_SC_INVALID_PASID:
+   pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+   break;
+   default:
+   pr_warn("IOMMU: %s: Unexpected error code %d\n",
+   iommu->name, status_code);
+   }
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 698015ee3f04..cd3d63f3e936 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -23,6 +23,16 @@
#define is_pasid_enabled(entry)		(((entry)->lo >> 3) & 0x1)
#define get_pasid_dir_size(entry)	(1 << ((((entry)->lo >> 9) & 0x7) + 7))
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC			0x1
+#define VCMD_CMD_FREE			0x2
+#define VCMD_VRSP_IP			0x1
+#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS		0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
+#define VCMD_VRSP_SC_INVALID_PASID	1
+#define VCMD_VRSP_RESULT_PASID(e)	(((e) >> 8) & 0xfffff)
+#define VCMD_CMD_OPERAND(e)		((e) << 8)
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -113,5 +123,6 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu,
int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ccbf164fb711..9cbf5357138b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -169,6 +169,7 @@
 #define ecap_smpwc(e)  (((e) >> 48) & 0x1)
 #define ecap_flts(e)   (((e) >> 47) & 0x1)
 #define ecap_slts(e)   (((e) >> 46) & 

[PATCH V10 08/11] iommu/vt-d: Add svm/sva invalidate function

2020-03-20 Thread Jacob Pan
When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at IOMMU API and driver
level. This patch adds an Intel VT-d specific function to implement the
IOMMU passdown invalidate API for shared virtual address.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that the guest-to-host device ID mapping should be
resolved prior to calling the IOMMU driver. Based on the device handle,
the host IOMMU driver can replace certain fields before submitting to the
invalidation queue.
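For illustration (not part of the patch), a PASID-selective IOTLB invalidation trapped from the guest could be forwarded with the generic structure filled roughly as below; field names follow the already-merged uAPI and hpasid is the host PASID resolved by the VMM:

	struct iommu_cache_invalidate_info inv_info = {
		.version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity = IOMMU_INV_GRANU_PASID,
		.pasid_info  = {
			.pasid = hpasid,
			.flags = IOMMU_INV_PASID_FLAGS_PASID,
		},
	};

	ret = iommu_cache_invalidate(domain, dev, &inv_info);	/* reaches intel_iommu_sva_invalidate() */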

---
v7 review fixed in v10
---

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c | 182 
 1 file changed, 182 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b1477cd423dd..a76afb0fd51a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5619,6 +5619,187 @@ static void intel_iommu_aux_detach_device(struct 
iommu_domain *domain,
aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of DMA or VFIO unmap. However, for assigned devices guest
+ * owns the first level page tables. Invalidations of translation caches in the
+ * guest are trapped and passed down to the host.
+ *
+ * vIOMMU in the guest will only expose first level page tables, therefore
+ * we do not include IOTLB granularity for requests without PASID (second level).
+ *
+ * For example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+const static int 
inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /*
+* PASID based IOTLB invalidation: PASID selective (per PASID),
+* page selective (address granularity)
+*/
+   {0, 1, 1},
+   /* PASID based dev TLBs, only support all PASIDs or single PASID */
+   {1, 1, 0},
+   /* PASID cache */
+   {1, 1, 0}
+};
+
+const static int 
inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /* PASID based IOTLB */
+   {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+   /* PASID based dev TLBs */
+   {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+   /* PASID cache */
+   {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, int *vtd_granu)
+{
+   if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+   !inv_type_granu_map[type][granu])
+   return -EINVAL;
+
+   *vtd_granu = inv_type_granu_table[type][granu];
+
+   return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+   u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+   /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
+* IOMMU cache invalidate API passes granu_size in bytes, and number of
+* granu size in contiguous memory.
+*/
+   return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct iommu_cache_invalidate_info 
*inv_info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   struct intel_iommu *iommu;
+   unsigned long flags;
+   int cache_type;
+   u8 bus, devfn;
+   u16 did, sid;
+   int ret = 0;
+   u64 size = 0;
+
+   if (!inv_info || !dmar_domain ||
+   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   spin_lock(&iommu->lock);
+   info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+   if (!info) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   sid = PCI_DEVID(bus, devfn);
+
+   /* Size is only valid in non-PASID selective invalidation */
+   if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
+   size = to_vtd_size(inv_info->addr_info.granule_size,
+  inv_info->addr_info.nb_granules);

[PATCH V10 11/11] iommu/vt-d: Add custom allocator for IOASID

2020-03-20 Thread Jacob Pan
When the VT-d driver runs in the guest, PASID allocation must be
performed via the virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocations will always
come from the host. This ensures that the PASID namespace is
system-wide.
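Once the allocator is registered, callers keep using the generic IOASID API; a minimal sketch of the resulting flow (arguments are illustrative, svm stands for the caller's private data):

	/* in the guest, this now routes to intel_ioasid_alloc() and thus to
	 * the host through the virtual command interface
	 */
	ioasid_t pasid = ioasid_alloc(NULL, PASID_MIN, intel_pasid_max_id - 1, svm);
	if (pasid == INVALID_IOASID)
		return -ENOSPC;

	/* ... bind, use, unbind ... */
	ioasid_free(pasid);	/* routes to intel_ioasid_free() -> vcmd_free_pasid() */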

Signed-off-by: Lu Baolu 
Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 84 +
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 86 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a76afb0fd51a..c1c0b0fb93c3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1757,6 +1757,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
}
+   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
+   ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -3291,6 +3294,84 @@ static int copy_translation_tables(struct intel_iommu 
*iommu)
return ret;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+   struct intel_iommu *iommu = data;
+   ioasid_t ioasid;
+
+   if (!iommu)
+   return INVALID_IOASID;
+   /*
+* VT-d virtual command interface always uses the full 20 bit
+* PASID range. Host can partition guest PASID range based on
+* policies but it is out of guest's control.
+*/
+   if (min < PASID_MIN || max > intel_pasid_max_id)
+   return INVALID_IOASID;
+
+   if (vcmd_alloc_pasid(iommu, &ioasid))
+   return INVALID_IOASID;
+
+   return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+   struct intel_iommu *iommu = data;
+
+   if (!iommu)
+   return;
+   /*
+* Sanity check the ioasid owner is done at upper layer, e.g. VFIO
+* We can only free the PASID when all the devices are unbound.
+*/
+   if (ioasid_find(NULL, ioasid, NULL)) {
+   pr_alert("Cannot free active IOASID %d\n", ioasid);
+   return;
+   }
+   vcmd_free_pasid(iommu, ioasid);
+}
+
+static void register_pasid_allocator(struct intel_iommu *iommu)
+{
+   /*
+* If we are running in the host, no need for custom allocator
+* since PASIDs are already allocated system-wide in the host.
+*/
+   if (!cap_caching_mode(iommu->cap))
+   return;
+
+   if (!sm_supported(iommu)) {
+   pr_warn("VT-d Scalable Mode not enabled, no PASID 
allocation\n");
+   return;
+   }
+
+   /*
+* Register a custom PASID allocator if we are running in a guest,
+* guest PASID must be obtained via virtual command interface.
+* There can be multiple vIOMMUs in each guest but only one allocator
+* is active. All vIOMMU allocators will eventually be calling the same
+* host allocator.
+*/
+   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
+   pr_info("Register custom PASID allocator\n");
+   iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+   iommu->pasid_allocator.free = intel_ioasid_free;
+   iommu->pasid_allocator.pdata = (void *)iommu;
+   if (ioasid_register_allocator(&iommu->pasid_allocator)) {
+   pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
+   /*
+* Disable scalable mode on this IOMMU if there
+* is no custom allocator. Mixing SM capable vIOMMU
+* and non-SM vIOMMU are not supported.
+*/
+   intel_iommu_sm = 0;
+   }
+   }
+}
+#endif
+
 static int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -3408,6 +3489,9 @@ static int __init init_dmars(void)
 */
for_each_active_iommu(iommu, drhd) {
iommu_flush_write_buffer(iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   register_pasid_allocator(iommu);
+#endif
iommu_set_root_entry(iommu);
iommu->flush.flush_context(iommu, 0, 0, 0, 
DMA_CCMD_GLOBAL_INVL);
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9cbf5357138b..9c357a325c72 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -563,6 +564,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
struct page_req_dsc *prq;
unsigned char prq_name[16];/* Name for PRQ interrupt */
+   struct 

[PATCH V10 03/11] iommu/vt-d: Add a helper function to skip agaw

2020-03-20 Thread Jacob Pan
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-pasid.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 22b30f10b396..191508c7c03e 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -500,6 +500,28 @@ int intel_pasid_setup_first_level(struct intel_iommu 
*iommu,
 }
 
 /*
+ * Skip top levels of page tables for iommu which has less agaw
+ * than default. Unnecessary for PT mode.
+ */
+static inline int iommu_skip_agaw(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ struct dma_pte **pgd)
+{
+   int agaw;
+
+   for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+   *pgd = phys_to_virt(dma_pte_addr(*pgd));
+   if (!dma_pte_present(*pgd)) {
+   return -EINVAL;
+   }
+   }
+   pr_debug_ratelimited("%s: pgd: %llx, agaw %d d_agaw %d\n", __func__, 
(u64)*pgd,
+   iommu->agaw, domain->agaw);
+
+   return agaw;
+}
+
+/*
  * Set up the scalable mode pasid entry for second only translation type.
  */
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
-- 
2.7.4



[PATCH V10 01/11] iommu/vt-d: Move domain helper to header

2020-03-20 Thread Jacob Pan
Move domain helper to header to be used by SVA code.

Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 6 --
 include/linux/intel-iommu.h | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4be549478691..e599b2537b1c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -446,12 +446,6 @@ static void init_translation_status(struct intel_iommu 
*iommu)
iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-   return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 980234ae0312..ed7171d2ae1f 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -595,6 +595,12 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
 }
 
+/* Convert generic struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4



[PATCH V10 04/11] iommu/vt-d: Use helper function to skip agaw for SL

2020-03-20 Thread Jacob Pan
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-pasid.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 191508c7c03e..9bdb7ee228b6 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -544,17 +544,11 @@ int intel_pasid_setup_second_level(struct intel_iommu 
*iommu,
return -EINVAL;
}
 
-   /*
-* Skip top levels of page tables for iommu which has less agaw
-* than default. Unnecessary for PT mode.
-*/
pgd = domain->pgd;
-   for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-   pgd = phys_to_virt(dma_pte_addr(pgd));
-   if (!dma_pte_present(pgd)) {
-   dev_err(dev, "Invalid domain page table\n");
-   return -EINVAL;
-   }
+   agaw = iommu_skip_agaw(domain, iommu, &pgd);
+   if (agaw < 0) {
+   dev_err(dev, "Invalid domain page table\n");
+   return -EINVAL;
}
 
pgd_val = virt_to_phys(pgd);
-- 
2.7.4



[PATCH V10 02/11] iommu/uapi: Define a mask for bind data

2020-03-20 Thread Jacob Pan
Memory type related flags can be grouped together for one simple check.
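The intended "one simple check" (as consumed by the nested setup helper later in the series) looks roughly like this; a sketch, not the patch itself:

	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_MTS_MASK) {
		/* memory-type fields (EMT/PAT plus the CD/EMTE/PCD/PWT bits)
		 * are meaningful and get programmed into the PASID entry;
		 * otherwise they are skipped entirely.
		 */
	}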

---
v9 renamed from EMT to MTS since these are memory type support flags.
---

Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 4ad3496e5c43..d7bcbc5f79b0 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
__u32 pat;
__u32 emt;
 };
-
+#define IOMMU_SVA_VTD_GPASID_MTS_MASK  (IOMMU_SVA_VTD_GPASID_CD | \
+IOMMU_SVA_VTD_GPASID_EMTE | \
+IOMMU_SVA_VTD_GPASID_PCD |  \
+IOMMU_SVA_VTD_GPASID_PWT)
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
  * @version:   Version of this data structure
-- 
2.7.4



[PATCH V10 09/11] iommu/vt-d: Cache virtual command capability register

2020-03-20 Thread Jacob Pan
Virtual command registers are used in the guest only. To avoid vmexit
costs, we cache the capability register value during initialization.
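A hedged sketch of how the cached value is then consumed, instead of re-reading DMAR_VCCAP_REG at runtime (this is what patch 11/11 does when registering the custom PASID allocator):

	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
		ioasid_register_allocator(&iommu->pasid_allocator);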

Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Reviewed-by: Lu Baolu 

---
v7 Reviewed by Eric & Baolu
---
---
 drivers/iommu/dmar.c| 1 +
 include/linux/intel-iommu.h | 5 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 4d6b7b5b37ee..3b36491c8bbb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -963,6 +963,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 
phys_addr)
warn_invalid_dmar(phys_addr, " returns all ones");
goto unmap;
}
+   iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
 
/* the registers might be more than one page */
map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 43539713b3b3..ccbf164fb711 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -194,6 +194,9 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
 
+/* Virtual command interface capabilities */
+#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation */
+
 /* IOTLB_REG */
 #define DMA_TLB_FLUSH_GRANU_OFFSET  60
 #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
@@ -287,6 +290,7 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)\
 do {   \
@@ -537,6 +541,7 @@ struct intel_iommu {
u64 reg_size; /* size of hw register set */
u64 cap;
u64 ecap;
+   u64 vccap;
u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
raw_spinlock_t  register_lock; /* protect register handling */
int seq_id; /* sequence id of the iommu */
-- 
2.7.4



Re: [PATCH v10 09/13] dma-iommu: Implement NESTED_MSI cookie

2020-03-20 Thread kbuild test robot
Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on iommu/next]
[also build test ERROR on linus/master v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Eric-Auger/SMMUv3-Nested-Stage-Setup-IOMMU-part/20200321-040627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm-exynos_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=arm

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/exynos/exynos_drm_dma.c:7:
>> include/linux/dma-iommu.h:90:1: error: expected identifier or '(' before '{' 
>> token
  90 | {
 | ^
   In file included from drivers/gpu/drm/exynos/exynos_drm_dma.c:7:
   include/linux/dma-iommu.h:89:1: warning: 'iommu_dma_unbind_guest_msi' 
declared 'static' but never defined [-Wunused-function]
  89 | iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t 
giova);
 | ^~

vim +90 include/linux/dma-iommu.h

87  
88  static inline void
89  iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t 
giova);
  > 90  {
91  }
92  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH V9 02/10] iommu/uapi: Define a mask for bind data

2020-03-20 Thread Jacob Pan
On Wed, 12 Feb 2020 13:43:43 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 1/29/20 7:01 AM, Jacob Pan wrote:
> > Memory type related guest PASID bind data can be grouped together
> > for one simple check.  
> Those are flags related to memory type.
right, will rephrase.
> > Link:
> > https://lore.kernel.org/linux-iommu/20200109095123.17ed5e6b@jacob-builder/  
> not sure the link is really helpful.
> > 
will delete. the patch is very simple.

> > Signed-off-by: Jacob Pan 
> > ---
> >  include/uapi/linux/iommu.h | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 4ad3496e5c43..fcafb6401430 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
> > __u32 pat;
> > __u32 emt;
> >  };
> > -
> > +#define IOMMU_SVA_VTD_GPASID_EMT_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
> > +					 IOMMU_SVA_VTD_GPASID_EMTE | \
> > +					 IOMMU_SVA_VTD_GPASID_PCD |  \
> > +					 IOMMU_SVA_VTD_GPASID_PWT)
> Why EMT rather than MT or MTS?
> the spec says:
> Those fields are treated as Reserved(0) for implementations not
> supporting Memory Type (MTS=0 in Extended Capability Register).
> 
MTS makes more sense, will change.
It was from a hygiene p.o.v.: checking the flag avoids touching these
fields.

Thanks,

Jacob
> >  /**
> >   * struct iommu_gpasid_bind_data - Information about device and
> > guest PASID binding
> >   * @version:   Version of this data structure
> >   
> 
> Thanks
> 
> Eric
> 

[Jacob Pan]


Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-20 Thread Marc Zyngier

Hi John,

On 2020-03-20 16:20, John Garry wrote:




I've run a bunch of netperf instances on multiple cores and 
collecting

SMMU usage (on TaiShan 2280). I'm getting the following ratio pretty
consistently.

- 6.07% arm_smmu_iotlb_sync
 - 5.74% arm_smmu_tlb_inv_range
  5.09% arm_smmu_cmdq_issue_cmdlist
  0.28% __pi_memset
  0.08% __pi_memcpy
  0.08% arm_smmu_atc_inv_domain.constprop.37
  0.07% arm_smmu_cmdq_build_cmd
  0.01% arm_smmu_cmdq_batch_add
   0.31% __pi_memset

So arm_smmu_atc_inv_domain() takes about 1.4% of 
arm_smmu_iotlb_sync(),
when ATS is not used. According to the annotations, the load from 
the
atomic_read(), that checks whether the domain uses ATS, is 77% of 
the
samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm 
not sure

there is much room for optimization there.


Well I did originally suggest using RCU protection to scan the list 
of
devices, instead of reading an atomic and checking for non-zero 
value. But
that would be an optimsation for ATS also, and there was no ATS 
devices at

the time (to verify performance).


Heh, I have yet to get my hands on one. Currently I can't evaluate ATS
performance, but I agree that using RCU to scan the list should get 
better

results when using ATS.

When ATS isn't in use however, I suspect reading nr_ats_masters should 
be
more efficient than taking the RCU lock + reading an "ats_devices" 
list

(since the smmu_domain->devices list also serves context descriptor
invalidation, even when ATS isn't in use). I'll run some tests 
however, to

see if I can micro-optimize this case, but I don't expect noticeable
improvements.


ok, cheers. I, too, would not expect a significant improvement there.

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():


55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
arm_smmu_cmdq_issue_cmdlist
 9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
_raw_spin_unlock_irqrestore

 2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
 1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
 1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k]
arm_smmu_atc_inv_domain.constprop.42
 1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
 1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction
consumes ~99% of the load for the enqueue function:

 :  /* 5. If we are inserting a CMD_SYNC,
we must wait for it to complete */
 :  if (sync) {
0.00 :   80001071c948:   ldr w0, [x29, #108]
 :  int ret = 0;
0.00 :   80001071c94c:   mov w24, #0x0  // #0
 :  if (sync) {
0.00 :   80001071c950:   cbnz    w0, 80001071c990

 :  arch_local_irq_restore():
0.00 :   80001071c954:   msr daif, x21
 :  arm_smmu_cmdq_issue_cmdlist():
 :  }
 :  }
 :
 :  local_irq_restore(flags);
 :  return ret;
 :  }
   99.51 :   80001071c958:   adrp    x0, 800011909000



This is likely the side effect of the re-enabling of interrupts (msr 
daif, x21)
on the previous instruction which causes the perf interrupt to fire 
right after.


Time to enable pseudo-NMIs in the PMUv3 driver...

 M.
--
Jazz is not dead. It just smells funny...


[PATCH v10 11/11] vfio: Document nested stage control

2020-03-20 Thread Eric Auger
The VFIO API was enhanced to support nested stage control: a bunch of
new ioctls, one DMA FAULT region and an associated specific IRQ.

Let's document the process to follow to set up nested mode.
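As a quick orientation (illustrative pseudo-flow, not part of the patch), the userspace sequence documented below boils down to:

	/* illustrative only; structure contents depend on the IOMMU type */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);	   /* host keeps stage 2 */
	ioctl(container, VFIO_IOMMU_SET_PASID_TABLE, &pasid_table_info);  /* pass guest stage 1 config */
	ioctl(container, VFIO_IOMMU_SET_MSI_BINDING, &msi_binding);	   /* IOVA/MSI doorbell GPA binding */

	/* on each guest stage 1 TLB invalidation: */
	ioctl(container, VFIO_IOMMU_CACHE_INVALIDATE, &inv_data);

	/* faults: mmap the DMA FAULT region, attach an eventfd to the
	 * VFIO_IRQ_TYPE_NESTED/VFIO_IRQ_SUBTYPE_DMA_FAULT IRQ, then consume
	 * records and advance the tail index
	 */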

Signed-off-by: Eric Auger 

---

v8 -> v9:
- new names for SET_MSI_BINDING and SET_PASID_TABLE
- new layout for the DMA FAULT memory region and specific IRQ

v2 -> v3:
- document the new fault API

v1 -> v2:
- use the new ioctl names
- add doc related to fault handling
---
 Documentation/driver-api/vfio.rst | 77 +++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst 
b/Documentation/driver-api/vfio.rst
index f1a4d3c3ba0b..563ebcec9224 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,83 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. "Stage" corresponds to
+the ARM terminology while "level" corresponds to Intel's VTD terminology. In
+the following text we use either without distinction.
+
+This is useful when the guest is exposed with a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can use
+stage 1 (IOVA -> GPA), while the hypervisor uses stage 2 for VM isolation
+(GPA -> HPA).
+
+The guest gets ownership of the stage 1 page tables and also owns stage 1
+configuration structures. The hypervisor owns the root configuration structure
+(for security reasons), including the stage 2 configuration. This works as long
+as configuration structures and page table formats are compatible between the
+virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use the stage 2, leaving stage 1 available for
+guest usage.
+
+Once groups are attached to the container, the guest stage 1 translation
+configuration data can be passed to VFIO by using
+
+ioctl(container, VFIO_IOMMU_SET_PASID_TABLE, _table_info);
+
+This allows combining the guest stage 1 configuration structure with
+the hypervisor stage 2 configuration structure. Stage 1 configuration
+structures are dependent on the IOMMU type.
+
+As the stage 1 translation is fully delegated to the HW, translation faults
+encountered during the translation process need to be propagated up to
+the virtualizer and re-injected into the guest.
+
+The userspace must be prepared to receive faults. The VFIO-PCI device
+exposes one dedicated DMA FAULT region: it contains a ring buffer and
+its header that allows managing the head/tail indices. The region is
+identified by the following index/subindex:
+- VFIO_REGION_TYPE_NESTED/VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT
+
+The DMA FAULT region exposes a VFIO_REGION_INFO_CAP_PRODUCER_FAULT
+region capability that allows the userspace to retrieve the ABI version
+of the fault records filled by the host.
+
+On top of that region, the userspace can be notified whenever a fault
+occurs at the physical level. It can use the VFIO_IRQ_TYPE_NESTED/
+VFIO_IRQ_SUBTYPE_DMA_FAULT specific IRQ to attach the eventfd to be
+signalled.
+
+The ring buffer containing the fault records can be mmapped. When
+the userspace consumes a fault in the queue, it should increment
+the consumer index to allow new fault records to replace the used ones.
+
+The queue size and the entry size can be retrieved in the header.
+The tail index should never overshoot the producer index as in any
+other circular buffer scheme. Also it must be less than the queue size
+otherwise the change fails.
+
+When the guest invalidates stage 1 related caches, invalidations must be
+forwarded to the host through
+ioctl(container, VFIO_IOMMU_CACHE_INVALIDATE, _data);
+Those invalidations can happen at various granularity levels, page, context, 
...
+
+The ARM SMMU specification introduces another challenge: MSIs are translated by
+both the virtual SMMU and the physical SMMU. To build a nested mapping for the
+IOVA programmed into the assigned device, the guest needs to pass its IOVA/MSI
+doorbell GPA binding to the host. Then the hypervisor can build a nested stage 2
+binding eventually translating into the physical MSI doorbell.
+
+This is achieved by calling
+ioctl(container, VFIO_IOMMU_SET_MSI_BINDING, _binding);
+
 VFIO User API
-------------
 
-- 
2.20.1



Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-20 Thread John Garry







I've run a bunch of netperf instances on multiple cores and collecting
SMMU usage (on TaiShan 2280). I'm getting the following ratio pretty
consistently.

- 6.07% arm_smmu_iotlb_sync
 - 5.74% arm_smmu_tlb_inv_range
  5.09% arm_smmu_cmdq_issue_cmdlist
  0.28% __pi_memset
  0.08% __pi_memcpy
  0.08% arm_smmu_atc_inv_domain.constprop.37
  0.07% arm_smmu_cmdq_build_cmd
  0.01% arm_smmu_cmdq_batch_add
   0.31% __pi_memset

So arm_smmu_atc_inv_domain() takes about 1.4% of arm_smmu_iotlb_sync(),
when ATS is not used. According to the annotations, the load from the
atomic_read(), that checks whether the domain uses ATS, is 77% of the
samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm not sure
there is much room for optimization there.


Well I did originally suggest using RCU protection to scan the list of
devices, instead of reading an atomic and checking for non-zero value. But
that would be an optimsation for ATS also, and there was no ATS devices at
the time (to verify performance).


Heh, I have yet to get my hands on one. Currently I can't evaluate ATS
performance, but I agree that using RCU to scan the list should get better
results when using ATS.

When ATS isn't in use however, I suspect reading nr_ats_masters should be
more efficient than taking the RCU lock + reading an "ats_devices" list
(since the smmu_domain->devices list also serves context descriptor
invalidation, even when ATS isn't in use). I'll run some tests however, to
see if I can micro-optimize this case, but I don't expect noticeable
improvements.


ok, cheers. I, too, would not expect a significant improvement there.

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
apart from a worryingly high % for arm_smmu_cmdq_issue_cmdlist():



 55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_cmdq_issue_cmdlist
  9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
  2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
  1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
  1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_atc_inv_domain.constprop.42
  1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
  1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction 
consumes ~99% of the load for the enqueue function:


      :      /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
      :      if (sync) {
 0.00 :   80001071c948:   ldr     w0, [x29, #108]
      :      int ret = 0;
 0.00 :   80001071c94c:   mov     w24, #0x0                   // #0
      :      if (sync) {
 0.00 :   80001071c950:   cbnz    w0, 80001071c990
      :      arch_local_irq_restore():
 0.00 :   80001071c954:   msr     daif, x21
      :      arm_smmu_cmdq_issue_cmdlist():
      :      }
      :      }
      :
      :      local_irq_restore(flags);
      :      return ret;
      :      }
99.51 :   80001071c958:   adrp    x0, 800011909000
 0.00 :   80001071c95c:   add     x21, x0, #0x908
 0.02 :   80001071c960:   ldr     x2, [x29, #488]
 0.14 :   80001071c964:   ldr     x1, [x21]
 0.00 :   80001071c968:   eor     x1, x2, x1
 0.00 :   80001071c96c:   mov     w0, w24


But there may be a hint that we're consuming a lot of time in
polling for CMD_SYNC consumption.


The files are available here:

https://raw.githubusercontent.com/hisilicon/kernel-dev/private-topic-nvme-5.6-profiling/ann.txt, 
report


Or maybe I'm just not using the tool properly ...

Cheers,
John


[PATCH v10 10/11] vfio/pci: Register and allow DMA FAULT IRQ signaling

2020-03-20 Thread Eric Auger
Register the VFIO_IRQ_TYPE_NESTED/VFIO_IRQ_SUBTYPE_DMA_FAULT
IRQ that allows signaling a nested mode DMA fault.

Signed-off-by: Eric Auger 
---
 drivers/vfio/pci/vfio_pci.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ca13067e4718..70e3a31da9f0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -346,7 +346,7 @@ int vfio_pci_iommu_dev_fault_handler(struct iommu_fault 
*fault, void *data)
struct iommu_fault *new =
(struct iommu_fault *)(vdev->fault_pages + reg->offset +
reg->head * reg->entry_size);
-   int head, tail, size;
+   int head, tail, size, ext_irq_index;
int ret = 0;
 
if (fault->type != IOMMU_FAULT_DMA_UNRECOV)
@@ -367,7 +367,19 @@ int vfio_pci_iommu_dev_fault_handler(struct iommu_fault 
*fault, void *data)
reg->head = (head + 1) % size;
 unlock:
mutex_unlock(&vdev->fault_queue_lock);
-   return ret;
+   if (ret)
+   return ret;
+
+   ext_irq_index = vfio_pci_get_ext_irq_index(vdev, VFIO_IRQ_TYPE_NESTED,
+  VFIO_IRQ_SUBTYPE_DMA_FAULT);
+   if (ext_irq_index < 0)
+   return -EINVAL;
+
+   mutex_lock(&vdev->igate);
+   if (vdev->ext_irqs[ext_irq_index].trigger)
+   eventfd_signal(vdev->ext_irqs[ext_irq_index].trigger, 1);
+   mutex_unlock(&vdev->igate);
+   return 0;
 }
 
 #define DMA_FAULT_RING_LENGTH 512
@@ -520,6 +532,12 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
if (ret)
goto disable_exit;
 
+   ret = vfio_pci_register_irq(vdev, VFIO_IRQ_TYPE_NESTED,
+   VFIO_IRQ_SUBTYPE_DMA_FAULT,
+   VFIO_IRQ_INFO_EVENTFD);
+   if (ret)
+   goto disable_exit;
+
vfio_pci_probe_mmaps(vdev);
 
return 0;
-- 
2.20.1
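
For illustration only, a hedged userspace sketch of attaching an eventfd to
this new IRQ with VFIO_DEVICE_SET_IRQS. The extended index is assumed to have
been discovered beforehand through the IRQ capability chain introduced by the
companion patch; names and error handling are simplified.

#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch, not part of the series: wire an eventfd to the DMA fault IRQ */
static int set_dma_fault_eventfd(int device_fd, uint32_t dma_fault_irq_index)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int32_t)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;
	int32_t efd = eventfd(0, EFD_CLOEXEC);

	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = dma_fault_irq_index;	/* >= VFIO_PCI_NUM_IRQS */
	irq_set->start = 0;
	irq_set->count = 1;
	memcpy(irq_set->data, &efd, sizeof(efd));

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}

The eventfd then becomes readable each time the fault handler queues a new
record in the DMA FAULT region.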



[PATCH v10 09/11] vfio/pci: Add framework for custom interrupt indices

2020-03-20 Thread Eric Auger
Implement IRQ capability chain infrastructure. All interrupt
indexes beyond VFIO_PCI_NUM_IRQS are handled as extended
interrupts. They are registered with a specific type/subtype
and supported flags.

Signed-off-by: Eric Auger 
---
 drivers/vfio/pci/vfio_pci.c | 100 +++-
 drivers/vfio/pci/vfio_pci_intrs.c   |  62 +
 drivers/vfio/pci/vfio_pci_private.h |  14 
 3 files changed, 158 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 3c99f6f3825b..ca13067e4718 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -543,6 +543,14 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
VFIO_IRQ_SET_ACTION_TRIGGER,
vdev->irq_type, 0, 0, NULL);
 
+   for (i = 0; i < vdev->num_ext_irqs; i++)
+   vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+   VFIO_IRQ_SET_ACTION_TRIGGER,
+   VFIO_PCI_NUM_IRQS + i, 0, 0, NULL);
+   vdev->num_ext_irqs = 0;
+   kfree(vdev->ext_irqs);
+   vdev->ext_irqs = NULL;
+
/* Device closed, don't need mutex here */
list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
 &vdev->ioeventfds_list, next) {
@@ -709,6 +717,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
*vdev, int irq_type)
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
return 1;
+   } else if (irq_type >= VFIO_PCI_NUM_IRQS &&
+  irq_type < VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs) {
+   return 1;
}
 
return 0;
@@ -878,7 +889,7 @@ static long vfio_pci_ioctl(void *device_data,
info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
-   info.num_irqs = VFIO_PCI_NUM_IRQS;
+   info.num_irqs = VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs;
 
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
@@ -1033,36 +1044,88 @@ static long vfio_pci_ioctl(void *device_data,
 
} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
struct vfio_irq_info info;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   unsigned long capsz;
 
minsz = offsetofend(struct vfio_irq_info, count);
 
+   /* For backward compatibility, cannot require this */
+   capsz = offsetofend(struct vfio_irq_info, cap_offset);
+
if (copy_from_user(&info, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
+   if (info.argsz < minsz ||
+   info.index >= VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs)
return -EINVAL;
 
-   switch (info.index) {
-   case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
-   case VFIO_PCI_REQ_IRQ_INDEX:
-   break;
-   case VFIO_PCI_ERR_IRQ_INDEX:
-   if (pci_is_pcie(vdev->pdev))
-   break;
-   /* fall through */
-   default:
-   return -EINVAL;
-   }
+   if (info.argsz >= capsz)
+   minsz = capsz;
 
info.flags = VFIO_IRQ_INFO_EVENTFD;
 
-   info.count = vfio_pci_get_irq_count(vdev, info.index);
-
-   if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
+   switch (info.index) {
+   case VFIO_PCI_INTX_IRQ_INDEX:
info.flags |= (VFIO_IRQ_INFO_MASKABLE |
   VFIO_IRQ_INFO_AUTOMASKED);
-   else
+   break;
+   case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
+   case VFIO_PCI_REQ_IRQ_INDEX:
info.flags |= VFIO_IRQ_INFO_NORESIZE;
+   break;
+   case VFIO_PCI_ERR_IRQ_INDEX:
+   info.flags |= VFIO_IRQ_INFO_NORESIZE;
+   if (!pci_is_pcie(vdev->pdev))
+   return -EINVAL;
+   break;
+   /* fall through */
+   default:
+   {
+   struct vfio_irq_info_cap_type cap_type = {
+   .header.id = VFIO_IRQ_INFO_CAP_TYPE,
+   .header.version = 1 };
+   int ret, i;
+
+   if (info.index >= VFIO_PCI_NUM_IRQS +
+   vdev->num_ext_irqs)
+   return -EINVAL;
+   info.index = 

[PATCH v10 07/11] vfio: Use capability chains to handle device specific irq

2020-03-20 Thread Eric Auger
From: Tina Zhang 

Caps the number of irqs with fixed indexes and uses capability chains
to chain device specific irqs.

Signed-off-by: Tina Zhang 
Signed-off-by: Eric Auger 
[Eric: Put cap_offset at the end of the vfio_irq_info struct,
remove GFX IRQ at the moment and remove any reference to this latter
in the commit message]

---
---
 include/uapi/linux/vfio.h | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 40d770f80e3d..f0fd26d058c9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -459,11 +459,27 @@ struct vfio_irq_info {
 #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
 #define VFIO_IRQ_INFO_AUTOMASKED   (1 << 2)
 #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
#define VFIO_IRQ_INFO_FLAG_CAPS	(1 << 4) /* Info supports caps */
__u32   index;  /* IRQ index */
__u32   count;  /* Number of IRQs within this index */
+   __u32   cap_offset; /* Offset within info struct of first cap */
 };
 #define VFIO_DEVICE_GET_IRQ_INFO   _IO(VFIO_TYPE, VFIO_BASE + 9)
 
+/*
+ * The irq type capability allows IRQs unique to a specific device or
+ * class of devices to be exposed.
+ *
+ * The structures below define version 1 of this capability.
+ */
+#define VFIO_IRQ_INFO_CAP_TYPE  3
+
+struct vfio_irq_info_cap_type {
+   struct vfio_info_cap_header header;
+   __u32 type; /* global per bus driver */
+   __u32 subtype;  /* type specific */
+};
+
 /**
  * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
  *
@@ -565,7 +581,8 @@ enum {
VFIO_PCI_MSIX_IRQ_INDEX,
VFIO_PCI_ERR_IRQ_INDEX,
VFIO_PCI_REQ_IRQ_INDEX,
-   VFIO_PCI_NUM_IRQS
+   VFIO_PCI_NUM_IRQS = 5   /* Fixed user ABI, IRQ indexes >=5 use   */
+   /* device specific cap to define content */
 };
 
 /*
-- 
2.20.1
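
As an illustration of how userspace might consume this capability chain, here
is a hedged sketch; the function and variable names are made up, error
handling is minimal, and the re-query pattern follows the usual VFIO
"argsz too small" convention.

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch only: return the first extended IRQ index matching type/subtype */
static int find_ext_irq_index(int device_fd, unsigned int num_irqs,
			      unsigned int want_type, unsigned int want_subtype)
{
	unsigned int index;

	for (index = VFIO_PCI_NUM_IRQS; index < num_irqs; index++) {
		struct vfio_irq_info info = { .argsz = sizeof(info), .index = index };
		struct vfio_irq_info *full;
		struct vfio_info_cap_header *hdr;

		if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info))
			continue;
		if (!(info.flags & VFIO_IRQ_INFO_FLAG_CAPS))
			continue;

		/* re-query with a buffer large enough for the whole chain */
		full = calloc(1, info.argsz);
		full->argsz = info.argsz;
		full->index = index;
		if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, full) ||
		    !full->cap_offset) {
			free(full);
			continue;
		}

		hdr = (struct vfio_info_cap_header *)((char *)full + full->cap_offset);
		for (;;) {
			struct vfio_irq_info_cap_type *cap =
				(struct vfio_irq_info_cap_type *)hdr;

			if (hdr->id == VFIO_IRQ_INFO_CAP_TYPE &&
			    cap->type == want_type && cap->subtype == want_subtype) {
				free(full);
				return index;
			}
			if (!hdr->next)
				break;
			hdr = (struct vfio_info_cap_header *)((char *)full + hdr->next);
		}
		free(full);
	}
	return -1;
}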



[PATCH v10 08/11] vfio: Add new IRQ for DMA fault reporting

2020-03-20 Thread Eric Auger
Add a new IRQ type/subtype to get notification on nested
stage DMA faults.

Signed-off-by: Eric Auger 
---
 include/uapi/linux/vfio.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f0fd26d058c9..73a6bf072b10 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -480,6 +480,9 @@ struct vfio_irq_info_cap_type {
__u32 subtype;  /* type specific */
 };
 
+#define VFIO_IRQ_TYPE_NESTED   (1)
+#define VFIO_IRQ_SUBTYPE_DMA_FAULT (1)
+
 /**
  * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
  *
-- 
2.20.1



[PATCH v10 03/11] vfio: VFIO_IOMMU_SET_MSI_BINDING

2020-03-20 Thread Eric Auger
This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl which aims
to (un)register the guest MSI binding to the host. The host can
then use those stage 1 bindings to build a nested stage
binding targeting the physical MSIs.

Signed-off-by: Eric Auger 

---

v8 -> v9:
- merge VFIO_IOMMU_BIND_MSI/VFIO_IOMMU_UNBIND_MSI into a single
  VFIO_IOMMU_SET_MSI_BINDING ioctl
- ioctl id changed

v6 -> v7:
- removed the dev arg

v3 -> v4:
- add UNBIND
- unwind on BIND error

v2 -> v3:
- adapt to new proto of bind_guest_msi
- directly use vfio_iommu_for_each_dev

v1 -> v2:
- s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
---
 drivers/vfio/vfio_iommu_type1.c | 55 +
 include/uapi/linux/vfio.h   | 20 
 2 files changed, 75 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 04c6625098bb..17ba63de1494 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2246,6 +2246,42 @@ static int vfio_cache_inv_fn(struct device *dev, void *data)
return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
 }
 
+static int
+vfio_bind_msi(struct vfio_iommu *iommu,
+ dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+   struct vfio_domain *d;
+   int ret = 0;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   ret = iommu_bind_guest_msi(d->domain, giova, gpa, size);
+   if (ret)
+   goto unwind;
+   }
+   goto unlock;
+unwind:
+   list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
+   iommu_unbind_guest_msi(d->domain, giova);
+   }
+unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static void
+vfio_unbind_msi(struct vfio_iommu *iommu, dma_addr_t giova)
+{
+   struct vfio_domain *d;
+
+   mutex_lock(&iommu->lock);
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   iommu_unbind_guest_msi(d->domain, giova);
+   }
+   mutex_unlock(&iommu->lock);
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -2387,6 +2423,25 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
    &ustruct);
mutex_unlock(&iommu->lock);
return ret;
+   } else if (cmd == VFIO_IOMMU_SET_MSI_BINDING) {
+   struct vfio_iommu_type1_set_msi_binding ustruct;
+
+   minsz = offsetofend(struct vfio_iommu_type1_set_msi_binding,
+   size);
+
+   if (copy_from_user(&ustruct, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (ustruct.argsz < minsz)
+   return -EINVAL;
+
+   if (ustruct.flags == VFIO_IOMMU_UNBIND_MSI)
+   vfio_unbind_msi(iommu, ustruct.iova);
+   else if (ustruct.flags == VFIO_IOMMU_BIND_MSI)
+   return vfio_bind_msi(iommu, ustruct.iova, ustruct.gpa,
+   ustruct.size);
+   else
+   return -EINVAL;
}
 
return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 98212c1711e7..9f2429eb1958 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -826,6 +826,26 @@ struct vfio_iommu_type1_cache_invalidate {
 };
 #define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
 
+/**
+ * VFIO_IOMMU_SET_MSI_BINDING - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
+ * struct vfio_iommu_type1_set_msi_binding)
+ *
+ * Pass a stage 1 MSI doorbell mapping to the host so that this
+ * latter can build a nested stage2 mapping. Or conversely tear
+ * down a previously bound stage 1 MSI binding.
+ */
+struct vfio_iommu_type1_set_msi_binding {
+   __u32   argsz;
+   __u32   flags;
+#define VFIO_IOMMU_BIND_MSI(1 << 0)
+#define VFIO_IOMMU_UNBIND_MSI  (1 << 1)
+   __u64   iova;   /* MSI guest IOVA */
+   /* Fields below are used on BIND */
+   __u64   gpa;/* MSI guest physical address */
+   __u64   size;   /* size of stage1 mapping (bytes) */
+};
+#define VFIO_IOMMU_SET_MSI_BINDING  _IO(VFIO_TYPE, VFIO_BASE + 24)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.20.1
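
A hedged sketch of the intended userspace call, with placeholder values; the
struct fields match the uapi added by this patch, error handling is omitted.

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch only: bind a guest MSI doorbell mapping (gIOVA -> gDB) */
static int bind_guest_msi_doorbell(int container_fd, __u64 giova,
				   __u64 gdb_gpa, __u64 size)
{
	struct vfio_iommu_type1_set_msi_binding msi_binding = {
		.argsz = sizeof(msi_binding),
		.flags = VFIO_IOMMU_BIND_MSI,
		.iova  = giova,   /* stage 1 MSI IOVA used by the guest */
		.gpa   = gdb_gpa, /* guest doorbell GPA */
		.size  = size,    /* stage 1 mapping size, e.g. 0x1000 */
	};

	return ioctl(container_fd, VFIO_IOMMU_SET_MSI_BINDING, &msi_binding);
}

Tearing the binding down would use the same ioctl with
VFIO_IOMMU_UNBIND_MSI and only the iova field meaningful.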



[PATCH v10 06/11] vfio/pci: Allow to mmap the fault queue

2020-03-20 Thread Eric Auger
The DMA FAULT region contains the fault ring buffer.
There is benefit in letting the userspace mmap this area.
Expose this mmappable area through a sparse mmap entry
and implement the mmap operation.

Signed-off-by: Eric Auger 

---

v8 -> v9:
- remove unused index local variable in vfio_pci_fault_mmap
---
 drivers/vfio/pci/vfio_pci.c | 61 +++--
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 69595c240baf..3c99f6f3825b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -266,21 +266,75 @@ static void vfio_pci_dma_fault_release(struct 
vfio_pci_device *vdev,
 {
 }
 
+static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
+  struct vfio_pci_region *region,
+  struct vm_area_struct *vma)
+{
+   u64 phys_len, req_len, pgoff, req_start;
+   unsigned long long addr;
+   unsigned int ret;
+
+   phys_len = region->size;
+
+   req_len = vma->vm_end - vma->vm_start;
+   pgoff = vma->vm_pgoff &
+   ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+   req_start = pgoff << PAGE_SHIFT;
+
+   /* only the second page of the producer fault region is mmappable */
+   if (req_start < PAGE_SIZE)
+   return -EINVAL;
+
+   if (req_start + req_len > phys_len)
+   return -EINVAL;
+
+   addr = virt_to_phys(vdev->fault_pages);
+   vma->vm_private_data = vdev;
+   vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
+
+   ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+ req_len, vma->vm_page_prot);
+   return ret;
+}
+
 static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
 struct vfio_pci_region *region,
 struct vfio_info_cap *caps)
 {
+   struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
struct vfio_region_info_cap_fault cap = {
.header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
.header.version = 1,
.version = 1,
};
-   return vfio_info_add_capability(caps, , sizeof(cap));
+   size_t size = sizeof(*sparse) + sizeof(*sparse->areas);
+   int ret;
+
+   ret = vfio_info_add_capability(caps, , sizeof(cap));
+   if (ret)
+   return ret;
+
+   sparse = kzalloc(size, GFP_KERNEL);
+   if (!sparse)
+   return -ENOMEM;
+
+   sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+   sparse->header.version = 1;
+   sparse->nr_areas = 1;
+   sparse->areas[0].offset = PAGE_SIZE;
+   sparse->areas[0].size = region->size - PAGE_SIZE;
+
+   ret = vfio_info_add_capability(caps, &sparse->header, size);
+   if (ret)
+   kfree(sparse);
+
+   return ret;
 }
 
 static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
.rw = vfio_pci_dma_fault_rw,
.release= vfio_pci_dma_fault_release,
+   .mmap   = vfio_pci_dma_fault_mmap,
.add_capability = vfio_pci_dma_fault_add_capability,
 };
 
@@ -341,7 +395,8 @@ static int vfio_pci_init_dma_fault_region(struct 
vfio_pci_device *vdev)
VFIO_REGION_TYPE_NESTED,
VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
&vfio_pci_dma_fault_regops, size,
-   VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE,
+   VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE |
+   VFIO_REGION_INFO_FLAG_MMAP,
vdev->fault_pages);
if (ret)
goto out;
@@ -349,7 +404,7 @@ static int vfio_pci_init_dma_fault_region(struct 
vfio_pci_device *vdev)
header = (struct vfio_region_dma_fault *)vdev->fault_pages;
header->entry_size = sizeof(struct iommu_fault);
header->nb_entries = DMA_FAULT_RING_LENGTH;
-   header->offset = sizeof(struct vfio_region_dma_fault);
+   header->offset = PAGE_SIZE;
 
ret = iommu_register_device_fault_handler(>pdev->dev,
vfio_pci_iommu_dev_fault_handler,
-- 
2.20.1



[PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-03-20 Thread Eric Auger
From: "Liu, Yi L" 

This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
which aims to pass the virtual iommu guest configuration
to the host. The latter takes the form of the so-called
PASID table.

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
Signed-off-by: Eric Auger 

---
v8 -> v9:
- Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
  VFIO_IOMMU_SET_PASID_TABLE ioctl.

v6 -> v7:
- add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE

v3 -> v4:
- restore ATTACH/DETACH
- add unwind on failure

v2 -> v3:
- s/BIND_PASID_TABLE/SET_PASID_TABLE

v1 -> v2:
- s/BIND_GUEST_STAGE/BIND_PASID_TABLE
- remove the struct device arg
---
 drivers/vfio/vfio_iommu_type1.c | 56 +
 include/uapi/linux/vfio.h   | 19 +++
 2 files changed, 75 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a177bf2c6683..bfacbd876ee1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2172,6 +2172,43 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static void
+vfio_detach_pasid_table(struct vfio_iommu *iommu)
+{
+   struct vfio_domain *d;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   iommu_detach_pasid_table(d->domain);
+   }
+   mutex_unlock(&iommu->lock);
+}
+
+static int
+vfio_attach_pasid_table(struct vfio_iommu *iommu,
+   struct vfio_iommu_type1_set_pasid_table *ustruct)
+{
+   struct vfio_domain *d;
+   int ret = 0;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
+   if (ret)
+   goto unwind;
+   }
+   goto unlock;
+unwind:
+   list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
+   iommu_detach_pasid_table(d->domain);
+   }
+unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -2276,6 +2313,25 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
+   } else if (cmd == VFIO_IOMMU_SET_PASID_TABLE) {
+   struct vfio_iommu_type1_set_pasid_table ustruct;
+
+   minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table,
+   config);
+
+   if (copy_from_user(&ustruct, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (ustruct.argsz < minsz)
+   return -EINVAL;
+
+   if (ustruct.flags & VFIO_PASID_TABLE_FLAG_SET)
+   return vfio_attach_pasid_table(iommu, &ustruct);
+   else if (ustruct.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
+   vfio_detach_pasid_table(iommu);
+   return 0;
+   } else
+   return -EINVAL;
}
 
return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..e032a1fe6ed9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #define VFIO_API_VERSION   0
 
@@ -794,6 +795,24 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
+ * struct vfio_iommu_type1_set_pasid_table)
+ *
+ * The SET operation passes a PASID table to the host while the
+ * UNSET operation detaches the one currently programmed. Setting
+ * a table while another is already programmed replaces the old table.
+ */
+struct vfio_iommu_type1_set_pasid_table {
+   __u32   argsz;
+   __u32   flags;
+#define VFIO_PASID_TABLE_FLAG_SET  (1 << 0)
+#define VFIO_PASID_TABLE_FLAG_UNSET(1 << 1)
+   struct iommu_pasid_table_config config; /* used on SET */
+};
+
+#define VFIO_IOMMU_SET_PASID_TABLE _IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.20.1
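
A hedged userspace sketch of the SET path. The iommu_pasid_table_config field
and constant names come from the companion "IOMMU part" series and are shown
here as assumptions only; error handling is omitted.

#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

/* Sketch only: pass the guest PASID (CD) table to the host */
static int set_guest_pasid_table(int container_fd, __u64 pasid_table_gpa)
{
	struct vfio_iommu_type1_set_pasid_table ustruct = {
		.argsz = sizeof(ustruct),
		.flags = VFIO_PASID_TABLE_FLAG_SET,
	};

	ustruct.config.version  = PASID_TABLE_CFG_VERSION_1;	/* assumed name */
	ustruct.config.format   = IOMMU_PASID_FORMAT_SMMUV3;	/* assumed name */
	ustruct.config.base_ptr = pasid_table_gpa;		/* guest CD table GPA */

	return ioctl(container_fd, VFIO_IOMMU_SET_PASID_TABLE, &ustruct);
}

Detaching uses the same ioctl with VFIO_PASID_TABLE_FLAG_UNSET and no config.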



[PATCH v10 05/11] vfio/pci: Register an iommu fault handler

2020-03-20 Thread Eric Auger
Register an IOMMU fault handler which records faults in
the DMA FAULT region ring buffer. In a subsequent patch, we
will add the signaling of a specific eventfd to allow the
userspace to be notified whenever a new fault has shown up.

Signed-off-by: Eric Auger 

---

v8 -> v9:
- handler now takes an iommu_fault handle
- eventfd signaling moved to a subsequent patch
- check the fault type and return an error if != UNRECOV
- still the fault handler registration can fail. We need to
  reach an agreement about how to deal with the situation

v3 -> v4:
- move iommu_unregister_device_fault_handler to vfio_pci_release
---
 drivers/vfio/pci/vfio_pci.c | 42 +
 1 file changed, 42 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 586b89debed5..69595c240baf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -283,6 +284,38 @@ static const struct vfio_pci_regops 
vfio_pci_dma_fault_regops = {
.add_capability = vfio_pci_dma_fault_add_capability,
 };
 
+int vfio_pci_iommu_dev_fault_handler(struct iommu_fault *fault, void *data)
+{
+   struct vfio_pci_device *vdev = (struct vfio_pci_device *)data;
+   struct vfio_region_dma_fault *reg =
+   (struct vfio_region_dma_fault *)vdev->fault_pages;
+   struct iommu_fault *new =
+   (struct iommu_fault *)(vdev->fault_pages + reg->offset +
+   reg->head * reg->entry_size);
+   int head, tail, size;
+   int ret = 0;
+
+   if (fault->type != IOMMU_FAULT_DMA_UNRECOV)
+   return -ENOENT;
+
+   mutex_lock(&vdev->fault_queue_lock);
+
+   head = reg->head;
+   tail = reg->tail;
+   size = reg->nb_entries;
+
+   if (CIRC_SPACE(head, tail, size) < 1) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
+   *new = *fault;
+   reg->head = (head + 1) % size;
+unlock:
+   mutex_unlock(&vdev->fault_queue_lock);
+   return ret;
+}
+
 #define DMA_FAULT_RING_LENGTH 512
 
 static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
@@ -317,6 +350,13 @@ static int vfio_pci_init_dma_fault_region(struct 
vfio_pci_device *vdev)
header->entry_size = sizeof(struct iommu_fault);
header->nb_entries = DMA_FAULT_RING_LENGTH;
header->offset = sizeof(struct vfio_region_dma_fault);
+
+   ret = iommu_register_device_fault_handler(&vdev->pdev->dev,
+   vfio_pci_iommu_dev_fault_handler,
+   vdev);
+   if (ret)
+   goto out;
+
return 0;
 out:
kfree(vdev->fault_pages);
@@ -542,6 +582,8 @@ static void vfio_pci_release(void *device_data)
if (!(--vdev->refcnt)) {
vfio_spapr_pci_eeh_release(vdev->pdev);
vfio_pci_disable(vdev);
+   /* TODO: Failure problematics */
+   iommu_unregister_device_fault_handler(&vdev->pdev->dev);
}
 
mutex_unlock(&vdev->reflck->lock);
-- 
2.20.1



[PATCH v10 02/11] vfio: VFIO_IOMMU_CACHE_INVALIDATE

2020-03-20 Thread Eric Auger
From: "Liu, Yi L" 

When the guest "owns" the stage 1 translation structures,  the host
IOMMU driver has no knowledge of caching structure updates unless
the guest invalidation requests are trapped and passed down to the
host.

This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl which aims
at propagating guest stage1 IOMMU cache invalidations to the host.

Signed-off-by: Liu, Yi L 
Signed-off-by: Eric Auger 

---

v8 -> v9:
- change the ioctl ID

v6 -> v7:
- Use iommu_capsule struct
- renamed vfio_iommu_for_each_dev into vfio_iommu_lookup_dev
  due to checkpatch error related to for_each_dev suffix

v2 -> v3:
- introduce vfio_iommu_for_each_dev back in this patch

v1 -> v2:
- s/TLB/CACHE
- remove vfio_iommu_task usage
- commit message rewording
---
 drivers/vfio/vfio_iommu_type1.c | 55 +
 include/uapi/linux/vfio.h   | 13 
 2 files changed, 68 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bfacbd876ee1..04c6625098bb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -124,6 +124,34 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
+struct domain_capsule {
+   struct iommu_domain *domain;
+   void *data;
+};
+
+/* iommu->lock must be held */
+static int
+vfio_iommu_lookup_dev(struct vfio_iommu *iommu,
+ int (*fn)(struct device *dev, void *data),
+ void *data)
+{
+   struct domain_capsule dc = {.data = data};
+   struct vfio_domain *d;
+   struct vfio_group *g;
+   int ret = 0;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   dc.domain = d->domain;
+   list_for_each_entry(g, &d->group_list, next) {
+   ret = iommu_group_for_each_dev(g->iommu_group,
+  &dc, fn);
+   if (ret)
+   break;
+   }
+   }
+   return ret;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -2209,6 +2237,15 @@ vfio_attach_pasid_table(struct vfio_iommu *iommu,
return ret;
 }
 
+static int vfio_cache_inv_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct vfio_iommu_type1_cache_invalidate *ustruct =
+   (struct vfio_iommu_type1_cache_invalidate *)dc->data;
+
+   return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -2332,6 +2369,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
return 0;
} else
return -EINVAL;
+   } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
+   struct vfio_iommu_type1_cache_invalidate ustruct;
+   int ret;
+
+   minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
+   info);
+
+   if (copy_from_user(&ustruct, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (ustruct.argsz < minsz || ustruct.flags)
+   return -EINVAL;
+
+   mutex_lock(&iommu->lock);
+   ret = vfio_iommu_lookup_dev(iommu, vfio_cache_inv_fn,
+   &ustruct);
+   mutex_unlock(&iommu->lock);
+   return ret;
}
 
return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e032a1fe6ed9..98212c1711e7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -813,6 +813,19 @@ struct vfio_iommu_type1_set_pasid_table {
 
 #define VFIO_IOMMU_SET_PASID_TABLE _IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/**
+ * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 23,
+ * struct vfio_iommu_type1_cache_invalidate)
+ *
+ * Propagate guest IOMMU cache invalidation to the host.
+ */
+struct vfio_iommu_type1_cache_invalidate {
+   __u32   argsz;
+   __u32   flags;
+   struct iommu_cache_invalidate_info info;
+};
+#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.20.1
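
A hedged sketch of how a VMM could forward a guest ASID-wide IOTLB
invalidation through this ioctl. The uapi field and constant names follow the
companion IOMMU series (and the SMMU consumer later in this digest); treat
them as assumptions, error handling is omitted.

#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

/* Sketch only: invalidate all stage 1 TLB entries tagged with one guest ASID */
static int invalidate_guest_asid(int container_fd, __u32 archid)
{
	struct vfio_iommu_type1_cache_invalidate ustruct = {
		.argsz = sizeof(ustruct),
		.flags = 0,
	};

	ustruct.info.version	 = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
	ustruct.info.cache	 = IOMMU_CACHE_INV_TYPE_IOTLB;
	ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
	/* the union member and flag names are assumed from the IOMMU series */
	ustruct.info.pasid_info.flags  = IOMMU_INV_PASID_FLAGS_ARCHID;
	ustruct.info.pasid_info.archid = archid;

	return ioctl(container_fd, VFIO_IOMMU_CACHE_INVALIDATE, &ustruct);
}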



[PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2020-03-20 Thread Eric Auger
This series brings the VFIO part of HW nested paging support
in the SMMUv3.

This is a rebase on top of Will's arm-smmu-updates branch
(git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
tags/arm-smmu-updates)

This is basically a resend of v9 as patches still applied.

This work has been stalled since Plumbers 2019. Since then
some users expressed interest in that work and tested v9:
- https://patchwork.kernel.org/cover/11039995/#23012381
- https://patchwork.kernel.org/cover/11039995/#23197235

The series depends on:
[PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

3 new IOCTLs are introduced that allow the userspace to
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings
3) invalidate stage 1 related caches

They map onto the related new IOMMU API functions.

We introduce the capability to register specific interrupt
indexes (see [1]). A new DMA_FAULT interrupt index allows registering
an eventfd to be signaled whenever a stage 1 related fault
is detected at physical level. Also a specific region allows
exposing the fault records to the user space.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10

The series includes Tina's patch stemming from
[1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
specific irq" plus patches originally contributed by Yi.

History:

v9 -> v10
- rebase on top of 5.6.0-rc3 (no change versus v9)

v8 -> v9:
- introduce specific irq framework
- single fault region
- iommu_unregister_device_fault_handler failure case not handled
  yet.

v7 -> v8:
- rebase on top of v5.2-rc1 and especially
  8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
- dynamic alloc of s1_cfg/s2_cfg
- __arm_smmu_tlb_inv_asid/s1_range_nosync
- check there is no HW MSI regions
- asid invalidation using pasid extended struct (change in the uapi)
- add s1_live/s2_live checks
- move check about support of nested stages in domain finalise
- fixes in error reporting according to the discussion with Robin
- reordered the patches to have first iommu/smmuv3 patches and then
  VFIO patches

v6 -> v7:
- removed device handle from bind/unbind_guest_msi
- added "iommu/smmuv3: Nested mode single MSI doorbell per domain
  enforcement"
- added a few uapi comments as suggested by Jean, Jacob and Alex

v5 -> v6:
- Fix compilation issue when CONFIG_IOMMU_API is unset

v4 -> v5:
- fix bug reported by Vincent: fault handler unregistration now happens in
  vfio_pci_release
- IOMMU_FAULT_PERM_* moved outside of struct definition + small
  uapi changes suggested by Jean-Philippe (except fetch_addr)
- iommu: introduce device fault report API: removed the PRI part.
- see individual logs for more details
- reset the ste abort flag on detach

v3 -> v4:
- took into account Alex, Jean-Philippe and Robin's comments on v3
- rework of the smmuv3 driver integration
- add tear down ops for msi binding and PASID table binding
- fix S1 fault propagation
- put fault reporting patches at the beginning of the series following
  Jean-Philippe's request
- update of the cache invalidate and fault API uapis
- VFIO fault reporting rework with 2 separate regions and one mmappable
  segment for the fault queue
- moved to PATCH

v2 -> v3:
- When registering the S1 MSI binding we now store the device handle. This
  addresses Robin's comment about discrimination of devices belonging to
  different S1 groups and using different physical MSI doorbells.
- Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
  set the eventfd and expose the faults through an mmappable fault region

v1 -> v2:
- Added the fault reporting capability
- asid properly passed on invalidation (fix assignment of multiple
  devices)
- see individual change logs for more info


Eric Auger (8):
  vfio: VFIO_IOMMU_SET_MSI_BINDING
  vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
  vfio/pci: Register an iommu fault handler
  vfio/pci: Allow to mmap the fault queue
  vfio: Add new IRQ for DMA fault reporting
  vfio/pci: Add framework for custom interrupt indices
  vfio/pci: Register and allow DMA FAULT IRQ signaling
  vfio: Document nested stage control

Liu, Yi L (2):
  vfio: VFIO_IOMMU_SET_PASID_TABLE
  vfio: VFIO_IOMMU_CACHE_INVALIDATE

Tina Zhang (1):
  vfio: Use capability chains to handle device specific irq

 Documentation/driver-api/vfio.rst   |  77 
 drivers/vfio/pci/vfio_pci.c | 283 ++--
 drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++
 drivers/vfio/pci/vfio_pci_private.h |  24 +++
 drivers/vfio/pci/vfio_pci_rdwr.c|  45 +
 drivers/vfio/vfio_iommu_type1.c | 166 
 include/uapi/linux/vfio.h   | 109 ++-
 7 files changed, 747 insertions(+), 19 deletions(-)

-- 
2.20.1



[PATCH v10 04/11] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2020-03-20 Thread Eric Auger
Add a new specific DMA_FAULT region aimed at exposing nested mode
translation faults.

The region has a ring buffer that contains the actual fault
records plus a header allowing to handle it (tail/head indices,
max capacity, entry size). At the moment the region is dimensioned
for 512 fault records.

Signed-off-by: Eric Auger 

---

v8 -> v9:
- Use a single region instead of a prod/cons region

v4 -> v5
- check cons is not null in vfio_pci_check_cons_fault

v3 -> v4:
- use 2 separate regions, respectively in read and write modes
- add the version capability
---
 drivers/vfio/pci/vfio_pci.c | 68 +
 drivers/vfio/pci/vfio_pci_private.h | 10 +
 drivers/vfio/pci/vfio_pci_rdwr.c| 45 +++
 include/uapi/linux/vfio.h   | 35 +++
 4 files changed, 158 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 379a02c36e37..586b89debed5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -260,6 +260,69 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, 
pci_power_t state)
return ret;
 }
 
+static void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,
+  struct vfio_pci_region *region)
+{
+}
+
+static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
+struct vfio_pci_region *region,
+struct vfio_info_cap *caps)
+{
+   struct vfio_region_info_cap_fault cap = {
+   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
+   .header.version = 1,
+   .version = 1,
+   };
+   return vfio_info_add_capability(caps, &cap, sizeof(cap));
+}
+
+static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
+   .rw = vfio_pci_dma_fault_rw,
+   .release= vfio_pci_dma_fault_release,
+   .add_capability = vfio_pci_dma_fault_add_capability,
+};
+
+#define DMA_FAULT_RING_LENGTH 512
+
+static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
+{
+   struct vfio_region_dma_fault *header;
+   size_t size;
+   int ret;
+
+   mutex_init(&vdev->fault_queue_lock);
+
+   /*
+* We provision 1 page for the header and space for
+* DMA_FAULT_RING_LENGTH fault records in the ring buffer.
+*/
+   size = ALIGN(sizeof(struct iommu_fault) *
+DMA_FAULT_RING_LENGTH, PAGE_SIZE) + PAGE_SIZE;
+
+   vdev->fault_pages = kzalloc(size, GFP_KERNEL);
+   if (!vdev->fault_pages)
+   return -ENOMEM;
+
+   ret = vfio_pci_register_dev_region(vdev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
+   &vfio_pci_dma_fault_regops, size,
+   VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE,
+   vdev->fault_pages);
+   if (ret)
+   goto out;
+
+   header = (struct vfio_region_dma_fault *)vdev->fault_pages;
+   header->entry_size = sizeof(struct iommu_fault);
+   header->nb_entries = DMA_FAULT_RING_LENGTH;
+   header->offset = sizeof(struct vfio_region_dma_fault);
+   return 0;
+out:
+   kfree(vdev->fault_pages);
+   return ret;
+}
+
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
struct pci_dev *pdev = vdev->pdev;
@@ -358,6 +421,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
}
}
 
+   ret = vfio_pci_init_dma_fault_region(vdev);
+   if (ret)
+   goto disable_exit;
+
vfio_pci_probe_mmaps(vdev);
 
return 0;
@@ -1383,6 +1450,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 
vfio_iommu_group_put(pdev->dev.iommu_group, >dev);
kfree(vdev->region);
+   kfree(vdev->fault_pages);
mutex_destroy(>ioeventfds_lock);
 
if (!disable_idle_d3)
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 8a2c7607d513..a392f50e3a99 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -119,6 +119,8 @@ struct vfio_pci_device {
int ioeventfds_nr;
struct eventfd_ctx  *err_trigger;
struct eventfd_ctx  *req_trigger;
+   u8  *fault_pages;
+   struct mutexfault_queue_lock;
struct list_headdummy_resources_list;
struct mutexioeventfds_lock;
struct list_headioeventfds_list;
@@ -150,6 +152,14 @@ extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device 
*vdev, char __user *buf,
 extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
   uint64_t data, int count, int fd);
 
+struct vfio_pci_fault_abi {
+   u32 entry_size;
+};
+
+extern size_t vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev,
+

Re: [PATCH 1/3] iommu/vt-d: Remove redundant IOTLB flush

2020-03-20 Thread Jacob Pan
On Fri, 20 Mar 2020 21:45:26 +0800
Lu Baolu  wrote:

> On 2020/3/20 12:32, Jacob Pan wrote:
> > IOTLB flush already included in the PASID tear down process. There
> > is no need to flush again.  
> 
> It seems that intel_pasid_tear_down_entry() doesn't flush the pasid
> based device TLB?
> 
I saw this code in intel_pasid_tear_down_entry(). Doesn't the last line
flush the devtlb? Not in the guest of course, since the passdown tlb flush
is inclusive.

pasid_cache_invalidation_with_pasid(iommu, did, pasid);
iotlb_invalidation_with_pasid(iommu, did, pasid);

/* Device IOTLB doesn't need to be flushed in caching mode. */
if (!cap_caching_mode(iommu->cap))
devtlb_invalidation_with_pasid(iommu, dev, pasid);

> Best regards,
> baolu
> 
> > 
> > Cc: Lu Baolu 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel-svm.c | 6 ++
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 8f42d717d8d7..1483f1845762 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -268,10 +268,9 @@ static void intel_mm_release(struct
> > mmu_notifier *mn, struct mm_struct *mm)
> >  * *has* to handle gracefully without affecting other
> > processes. */
> > rcu_read_lock();
> > -   list_for_each_entry_rcu(sdev, &svm->devs, list) {
> > +   list_for_each_entry_rcu(sdev, &svm->devs, list)
> > intel_pasid_tear_down_entry(svm->iommu,
> > sdev->dev, svm->pasid);
> > -   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> > -   }
> > +
> > rcu_read_unlock();
> >   
> >   }
> > @@ -731,7 +730,6 @@ int intel_svm_unbind_mm(struct device *dev, int
> > pasid)
> >  * large and has to be physically
> > contiguous. So it's
> >  * hard to be as defensive as we might
> > like. */ intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> > -   intel_flush_svm_range_dev(svm, sdev, 0,
> > -1, 0); kfree_rcu(sdev, rcu);
> >   
> > if (list_empty(&svm->devs)) {
> >   

[Jacob Pan]


[PATCH v10 13/13] iommu/smmuv3: Report non recoverable faults

2020-03-20 Thread Eric Auger
When a stage 1 related fault event is read from the event queue,
let's propagate it to potential external fault listeners, ie. users
who registered a fault handler.

Signed-off-by: Eric Auger 

---
v8 -> v9:
- adapt to the removal of IOMMU_FAULT_UNRECOV_PERM_VALID:
  only look at IOMMU_FAULT_UNRECOV_ADDR_VALID which comes with
  perm
- do not advertise IOMMU_FAULT_UNRECOV_PASID_VALID faults for
  translation faults
- trace errors if !master
- test nested before calling iommu_report_device_fault
- call the fault handler unconditionnally in non nested mode

v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
 drivers/iommu/arm-smmu-v3.c | 182 +---
 1 file changed, 171 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c98ee8edf5cf..c906a07396e7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -171,6 +171,26 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8
 #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
 
+/* Events */
+#define ARM_SMMU_EVT_F_UUT 0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID0x02
+#define ARM_SMMU_EVT_F_STE_FETCH   0x03
+#define ARM_SMMU_EVT_C_BAD_STE 0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED 0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID 0x08
+#define ARM_SMMU_EVT_F_CD_FETCH0x09
+#define ARM_SMMU_EVT_C_BAD_CD  0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT   0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION 0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE   0x11
+#define ARM_SMMU_EVT_F_ACCESS  0x12
+#define ARM_SMMU_EVT_F_PERMISSION  0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST0x24
+
 /* Common MSI config fields */
 #define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
 #define MSI_CFG2_SHGENMASK(5, 4)
@@ -387,6 +407,15 @@
 #define EVTQ_MAX_SZ_SHIFT  (Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
 
 #define EVTQ_0_ID  GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMIDGENMASK_ULL(63, 32)
+#define EVTQ_1_PNU GENMASK_ULL(33, 33)
+#define EVTQ_1_IND GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW GENMASK_ULL(35, 35)
+#define EVTQ_1_S2  GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS   GENMASK_ULL(40, 41)
+#define EVTQ_3_FETCH_ADDR  GENMASK_ULL(51, 3)
 
 /* PRI queue */
 #define PRIQ_ENT_SZ_SHIFT  4
@@ -730,6 +759,57 @@ struct arm_smmu_domain {
spinlock_t  devices_lock;
 };
 
+/* fault propagation */
+struct arm_smmu_fault_propagation_data {
+   enum iommu_fault_reason reason;
+   bool s1_check;
+   u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT]   = { },
+[ARM_SMMU_EVT_C_BAD_STREAMID]  = { },
+[ARM_SMMU_EVT_F_STE_FETCH] = { },
+[ARM_SMMU_EVT_C_BAD_STE]   = { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ]  = { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED]   = { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]  = { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]   = {IOMMU_FAULT_REASON_PASID_INVALID,
+  false,
+  IOMMU_FAULT_UNRECOV_PASID_VALID
+ },
+[ARM_SMMU_EVT_F_CD_FETCH]  = {IOMMU_FAULT_REASON_PASID_FETCH,
+  false,
+  IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_C_BAD_CD]= 
{IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+  false,
+ },
+[ARM_SMMU_EVT_F_WALK_EABT] = {IOMMU_FAULT_REASON_WALK_EABT, true,
+  IOMMU_FAULT_UNRECOV_ADDR_VALID |
+  IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_TRANSLATION]   = {IOMMU_FAULT_REASON_PTE_FETCH, true,
+  IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_ADDR_SIZE] = {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
+  IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },

[PATCH v10 09/13] dma-iommu: Implement NESTED_MSI cookie

2020-03-20 Thread Eric Auger
Up to now, when the type was UNMANAGED, we used to
allocate IOVA pages within a reserved IOVA MSI range.

If both the host and the guest are exposed with SMMUs, each
would allocate an IOVA. The guest allocates an IOVA (gIOVA)
to map onto the guest MSI doorbell (gDB). The Host allocates
another IOVA (hIOVA) to map onto the physical doorbell (hDB).

So we end up with 2 unrelated mappings, at S1 and S2:
 S1 S2
gIOVA-> gDB
   hIOVA->hDB

The PCI device would be programmed with hIOVA.
No stage 1 mapping would exist, causing the MSIs to fault.

iommu_dma_bind_guest_msi() allows passing gIOVA/gDB
to the host so that gIOVA can be used by the host instead of
re-allocating a new hIOVA.

 S1   S2
gIOVA->gDB->hDB

this time, the PCI device can be programmed with the gIOVA MSI
doorbell which is correctly mapped through both stages.

Nested mode is not compatible with HW MSI regions as in that
case gDB and hDB should have a 1-1 mapping. This check will
be done when attaching each device to the IOMMU domain.

Signed-off-by: Eric Auger 

---

v7 -> v8:
- correct iommu_dma_(un)bind_guest_msi when
  !CONFIG_IOMMU_DMA
- Mentioned nested mode is not compatible with HW MSI regions
  in commit message
- protect with msi_lock on unbind

v6 -> v7:
- removed device handle

v3 -> v4:
- change function names; add unregister
- protect with msi_lock

v2 -> v3:
- also store the device handle on S1 mapping registration.
  This garantees we associate the associated S2 mapping binds
  to the correct physical MSI controller.

v1 -> v2:
- unmap stage2 on put()
---
 drivers/iommu/dma-iommu.c | 142 +-
 include/linux/dma-iommu.h |  16 +
 2 files changed, 155 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..b9db87a92c4c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,12 +29,15 @@
 struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
+   dma_addr_t  gpa;
phys_addr_t phys;
+   size_t  s1_granule;
 };
 
 enum iommu_dma_cookie_type {
IOMMU_DMA_IOVA_COOKIE,
IOMMU_DMA_MSI_COOKIE,
+   IOMMU_DMA_NESTED_MSI_COOKIE,
 };
 
 struct iommu_dma_cookie {
@@ -45,6 +49,7 @@ struct iommu_dma_cookie {
dma_addr_t  msi_iova;
};
struct list_headmsi_page_list;
+   spinlock_t  msi_lock;
 
/* Domain for flush queue callback; NULL if flush queue not in use */
struct iommu_domain *fq_domain;
@@ -63,6 +68,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
 
cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
if (cookie) {
+   spin_lock_init(&cookie->msi_lock);
INIT_LIST_HEAD(>msi_page_list);
cookie->type = type;
}
@@ -96,14 +102,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  *
  * Users who manage their own IOVA allocation and do not want DMA API support,
  * but would still like to take advantage of automatic MSI remapping, can use
- * this to initialise their own domain appropriately. Users should reserve a
+ * this to initialise their own domain appropriately. Users may reserve a
  * contiguous IOVA region, starting at @base, large enough to accommodate the
  * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
- * used by the devices attached to @domain.
+ * used by the devices attached to @domain. The other way round is to provide
+ * usable iova pages through the iommu_dma_bind_doorbell API (nested stages
+ * use case)
  */
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 {
struct iommu_dma_cookie *cookie;
+   int nesting, ret;
 
if (domain->type != IOMMU_DOMAIN_UNMANAGED)
return -EINVAL;
@@ -111,7 +120,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base)
if (domain->iova_cookie)
return -EEXIST;
 
-   cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+   ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
+   if (!ret && nesting)
+   cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
+   else
+   cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+
if (!cookie)
return -ENOMEM;
 
@@ -132,6 +146,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi, *tmp;
+   bool s2_unmap = false;
 
if (!cookie)
return;
@@ -139,7 +154,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (cookie->type == 

[PATCH v10 11/13] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions

2020-03-20 Thread Eric Auger
Nested mode currently is not compatible with HW MSI reserved regions.
Indeed MSI transactions targeting these MSI doorbells bypass the SMMU.

Let's check that nested mode is not attempted in such a configuration.

Signed-off-by: Eric Auger 
---
 drivers/iommu/arm-smmu-v3.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 84dce0b2e935..106f9c563823 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2921,6 +2921,23 @@ static bool arm_smmu_share_msi_domain(struct 
iommu_domain *domain,
return share;
 }
 
+static bool arm_smmu_has_hw_msi_resv_region(struct device *dev)
+{
+   struct iommu_resv_region *region;
+   bool has_msi_resv_region = false;
+   LIST_HEAD(resv_regions);
+
+   iommu_get_resv_regions(dev, &resv_regions);
+   list_for_each_entry(region, &resv_regions, list) {
+   if (region->type == IOMMU_RESV_MSI) {
+   has_msi_resv_region = true;
+   break;
+   }
+   }
+   iommu_put_resv_regions(dev, &resv_regions);
+   return has_msi_resv_region;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
int ret = 0;
@@ -2965,10 +2982,12 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
/*
 * In nested mode we must check all devices belonging to the
 * domain share the same physical MSI doorbell. Otherwise nested
-* stage MSI binding is not supported.
+* stage MSI binding is not supported. Also nested mode is not
+* compatible with MSI HW reserved regions.
 */
if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
-   !arm_smmu_share_msi_domain(domain, dev)) {
+   (!arm_smmu_share_msi_domain(domain, dev) ||
+arm_smmu_has_hw_msi_resv_region(dev))) {
ret = -EINVAL;
goto out_unlock;
}
-- 
2.20.1



[PATCH v10 12/13] iommu/smmuv3: Implement bind/unbind_guest_msi

2020-03-20 Thread Eric Auger
The bind/unbind_guest_msi() callbacks check the domain
is NESTED and redirect to the dma-iommu implementation.

Signed-off-by: Eric Auger 

---

v6 -> v7:
- remove device handle argument
---
 drivers/iommu/arm-smmu-v3.c | 43 +
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 106f9c563823..c98ee8edf5cf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3371,6 +3371,47 @@ static void arm_smmu_get_resv_regions(struct device *dev,
iommu_dma_get_resv_regions(dev, head);
 }
 
+static int
+arm_smmu_bind_guest_msi(struct iommu_domain *domain,
+   dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu;
+   int ret = -EINVAL;
+
+   mutex_lock(&smmu_domain->init_mutex);
+   smmu = smmu_domain->smmu;
+   if (!smmu)
+   goto out;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   goto out;
+
+   ret = iommu_dma_bind_guest_msi(domain, giova, gpa, size);
+out:
+   mutex_unlock(&smmu_domain->init_mutex);
+   return ret;
+}
+
+static void
+arm_smmu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu;
+
+   mutex_lock(&smmu_domain->init_mutex);
+   smmu = smmu_domain->smmu;
+   if (!smmu)
+   goto unlock;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   goto unlock;
+
+   iommu_dma_unbind_guest_msi(domain, giova);
+unlock:
+   mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
   struct iommu_pasid_table_config *cfg)
 {
@@ -3540,6 +3581,8 @@ static struct iommu_ops arm_smmu_ops = {
.attach_pasid_table = arm_smmu_attach_pasid_table,
.detach_pasid_table = arm_smmu_detach_pasid_table,
.cache_invalidate   = arm_smmu_cache_invalidate,
+   .bind_guest_msi = arm_smmu_bind_guest_msi,
+   .unbind_guest_msi   = arm_smmu_unbind_guest_msi,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };
 
-- 
2.20.1



[PATCH v10 10/13] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement

2020-03-20 Thread Eric Auger
In nested mode we enforce the rule that all devices belonging
to the same iommu_domain share the same msi_domain.

Indeed if there were several physical MSI doorbells being used
within a single iommu_domain, it would become really difficult to
resolve the nested stage mapping translating into the correct
physical doorbell. So let's forbid this situation.

Signed-off-by: Eric Auger 
---
 drivers/iommu/arm-smmu-v3.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 538e368c2e13..84dce0b2e935 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2890,6 +2890,37 @@ static void arm_smmu_detach_dev(struct arm_smmu_master 
*master)
arm_smmu_install_ste_for_dev(master);
 }
 
+static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
+ struct device *dev)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct irq_domain *irqd = dev_get_msi_domain(dev);
+   struct arm_smmu_master *master;
+   unsigned long flags;
+   bool share = false;
+
+   if (!irqd)
+   return true;
+
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+   struct irq_domain *d = dev_get_msi_domain(master->dev);
+
+   if (!d)
+   continue;
+   if (irqd != d) {
+   dev_info(dev, "Nested mode forbids to attach devices "
+"using different physical MSI doorbells "
+"to the same iommu_domain");
+   goto unlock;
+   }
+   }
+   share = true;
+unlock:
+   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+   return share;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
int ret = 0;
@@ -2931,6 +2962,16 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
ret = -EINVAL;
goto out_unlock;
}
+   /*
+* In nested mode we must check all devices belonging to the
+* domain share the same physical MSI doorbell. Otherwise nested
+* stage MSI binding is not supported.
+*/
+   if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+   !arm_smmu_share_msi_domain(domain, dev)) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
 
master->domain = smmu_domain;
 
-- 
2.20.1



[PATCH v10 08/13] iommu/smmuv3: Implement cache_invalidate

2020-03-20 Thread Eric Auger
Implement domain-selective and page-selective IOTLB invalidations.

Signed-off-by: Eric Auger 

---
v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
  anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
 drivers/iommu/arm-smmu-v3.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 39deddea6ae5..538e368c2e13 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3407,6 +3407,58 @@ static void arm_smmu_detach_pasid_table(struct 
iommu_domain *domain)
mutex_unlock(_domain->init_mutex);
 }
 
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
+   if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->pasid_info;
+
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID) ||
+(info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+
+   } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+   struct iommu_inv_addr_info *info = &inv_info->addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID) ||
+(info->flags & IOMMU_INV_ADDR_FLAGS_PASID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_range(info->addr, size,
+info->granule_size, leaf,
+ smmu_domain, info->archid);
+
+   arm_smmu_cmdq_issue_sync(smmu);
+   } else {
+   return -EINVAL;
+   }
+   }
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+   return 0;
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -3427,6 +3479,7 @@ static struct iommu_ops arm_smmu_ops = {
.put_resv_regions   = generic_iommu_put_resv_regions,
.attach_pasid_table = arm_smmu_attach_pasid_table,
.detach_pasid_table = arm_smmu_detach_pasid_table,
+   .cache_invalidate   = arm_smmu_cache_invalidate,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };
 
-- 
2.20.1



[PATCH v10 07/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-03-20 Thread Eric Auger
With nested stage support, we will soon need to invalidate
S1 contexts and ranges tagged with an unmanaged ASID, the
latter being owned by the guest. So let's introduce two helpers
that allow invalidation with externally managed ASIDs.
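
As an illustrative example (not part of the patch), the nested cache
invalidation path added later in this series calls the new helpers with the
guest-provided ASID, roughly as follows:

	/* sketch: invalidations tagged with an externally managed ASID */
	__arm_smmu_tlb_inv_context(smmu_domain, guest_asid);

	__arm_smmu_tlb_inv_range(giova, size, granule_size, leaf,
				 smmu_domain, guest_asid);
	arm_smmu_cmdq_issue_sync(smmu);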

Signed-off-by: Eric Auger 
---
 drivers/iommu/arm-smmu-v3.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 92a1a5ac5b50..39deddea6ae5 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2301,13 +2301,18 @@ static int arm_smmu_atc_inv_domain(struct 
arm_smmu_domain *smmu_domain,
 }
 
 /* IO_PGTABLE API */
-static void arm_smmu_tlb_inv_context(void *cookie)
+
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
+  int ext_asid)
 {
-   struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cmdq_ent cmd;
 
-   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+   if (ext_asid >= 0) { /* guest stage 1 invalidation */
+   cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
+   cmd.tlbi.asid   = ext_asid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg->vmid;
+   } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
cmd.tlbi.asid   = smmu_domain->s1_cfg->cd.asid;
cmd.tlbi.vmid   = 0;
@@ -2328,9 +2333,17 @@ static void arm_smmu_tlb_inv_context(void *cookie)
arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
 
-static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
+static void arm_smmu_tlb_inv_context(void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
+static void __arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
   size_t granule, bool leaf,
-  struct arm_smmu_domain *smmu_domain)
+  struct arm_smmu_domain *smmu_domain,
+  int ext_asid)
 {
struct arm_smmu_device *smmu = smmu_domain->smmu;
unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
@@ -2345,7 +2358,11 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, 
size_t size,
if (!size)
return;
 
-   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+   if (ext_asid >= 0) {  /* guest stage 1 invalidation */
+   cmd.opcode  = CMDQ_OP_TLBI_NH_VA;
+   cmd.tlbi.asid   = ext_asid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg->vmid;
+   } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = CMDQ_OP_TLBI_NH_VA;
cmd.tlbi.asid   = smmu_domain->s1_cfg->cd.asid;
} else {
@@ -2405,6 +2422,13 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, 
size_t size,
arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);
 }
 
+static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
+  size_t granule, bool leaf,
+  struct arm_smmu_domain *smmu_domain)
+{
+   __arm_smmu_tlb_inv_range(iova, size, granule, leaf, smmu_domain, -1);
+}
+
 static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
 unsigned long iova, size_t granule,
 void *cookie)
-- 
2.20.1



[PATCH v10 06/13] iommu/smmuv3: Implement attach/detach_pasid_table

2020-03-20 Thread Eric Auger
On attach_pasid_table() we program the stage 1 related information
set by the guest into the actual physical STEs. At minimum
we need to program the context descriptor (CD) table GPA and compute
whether stage 1 is translated, bypassed or aborted.

Signed-off-by: Eric Auger 

---
v7 -> v8:
- remove smmu->features check, now done on domain finalize

v6 -> v7:
- check versions and comment the fact we don't need to take
  into account s1dss and s1fmt
v3 -> v4:
- adapt to changes in iommu_pasid_table_config
- different programming convention at s1_cfg/s2_cfg/ste.abort

v2 -> v3:
- callback now is named set_pasid_table and struct fields
  are laid out differently.

v1 -> v2:
- invalidate the STE before changing them
- hold init_mutex
- handle new fields
---
 drivers/iommu/arm-smmu-v3.c | 98 +
 1 file changed, 98 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7d00244fe725..92a1a5ac5b50 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3287,6 +3287,102 @@ static void arm_smmu_get_resv_regions(struct device 
*dev,
iommu_dma_get_resv_regions(dev, head);
 }
 
+static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
+  struct iommu_pasid_table_config *cfg)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_master *master;
+   struct arm_smmu_device *smmu;
+   unsigned long flags;
+   int ret = -EINVAL;
+
+   if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
+   return -EINVAL;
+
+   if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
+   cfg->smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
+   return -EINVAL;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   smmu = smmu_domain->smmu;
+
+   if (!smmu)
+   goto out;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   goto out;
+
+   switch (cfg->config) {
+   case IOMMU_PASID_CONFIG_ABORT:
+   kfree(smmu_domain->s1_cfg);
+   smmu_domain->s1_cfg = NULL;
+   smmu_domain->abort = true;
+   break;
+   case IOMMU_PASID_CONFIG_BYPASS:
+   kfree(smmu_domain->s1_cfg);
+   smmu_domain->s1_cfg = NULL;
+   smmu_domain->abort = false;
+   break;
+   case IOMMU_PASID_CONFIG_TRANSLATE:
+   /* we do not support S1 <-> S1 transitions */
+   if (smmu_domain->s1_cfg)
+   goto out;
+
+   /*
+* we currently support a single CD so s1fmt and s1dss
+* fields are also ignored
+*/
+   if (cfg->pasid_bits)
+   goto out;
+
+   smmu_domain->s1_cfg = kzalloc(sizeof(*smmu_domain->s1_cfg),
+ GFP_KERNEL);
+   if (!smmu_domain->s1_cfg) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   smmu_domain->s1_cfg->cdcfg.cdtab_dma = cfg->base_ptr;
+   smmu_domain->abort = false;
+   break;
+   default:
+   goto out;
+   }
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_for_each_entry(master, &smmu_domain->devices, domain_head)
+   arm_smmu_install_ste_for_dev(master);
+   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+   ret = 0;
+out:
+   mutex_unlock(&smmu_domain->init_mutex);
+   return ret;
+}
+
+static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_master *master;
+   unsigned long flags;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   goto unlock;
+
+   kfree(smmu_domain->s1_cfg);
+   smmu_domain->s1_cfg = NULL;
+   smmu_domain->abort = true;
+
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_for_each_entry(master, &smmu_domain->devices, domain_head)
+   arm_smmu_install_ste_for_dev(master);
+   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+unlock:
+   mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -3305,6 +3401,8 @@ static struct iommu_ops arm_smmu_ops = {
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
+   .attach_pasid_table = arm_smmu_attach_pasid_table,
+   .detach_pasid_table = arm_smmu_detach_pasid_table,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };
 
-- 
2.20.1


[PATCH v10 01/13] iommu: Introduce attach/detach_pasid_table API

2020-03-20 Thread Eric Auger
From: Jacob Pan 

In the virtualization use case, when a guest is assigned
a PCI host device, protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages it makes sense to program guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns the stage/level 2.

In that case, guest configuration settings must be trapped
and passed down to the physical IOMMU driver.

This patch adds a new API to the iommu subsystem that allows
setting/unsetting the PASID table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.
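
As a usage sketch (illustrative only, not part of the patch), a caller such as
VFIO could forward a guest PASID table as below. The field names follow the
uapi added here and by the SMMUv3 counterpart patch; the values and the
cd_table_gpa variable are assumptions:

	/* sketch: attach a guest PASID (CD) table, then detach it on teardown */
	struct iommu_pasid_table_config cfg = {
		.version	= PASID_TABLE_CFG_VERSION_1,
		.format		= IOMMU_PASID_FORMAT_SMMUV3,
		.base_ptr	= cd_table_gpa,	/* guest PA of the CD table */
		.pasid_bits	= 0,		/* single CD supported for now */
		.config		= IOMMU_PASID_CONFIG_TRANSLATE,
		.smmuv3		= {
			.version = PASID_TABLE_SMMUV3_CFG_VERSION_1,
		},
	};
	int ret;

	ret = iommu_attach_pasid_table(domain, &cfg);
	if (ret)
		return ret;

	/* ... later, restore host-only translation */
	iommu_detach_pasid_table(domain);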

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
Signed-off-by: Jacob Pan 
Signed-off-by: Eric Auger 
Reviewed-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c  | 19 ++
 include/linux/iommu.h  | 18 ++
 include/uapi/linux/iommu.h | 51 ++
 3 files changed, 88 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3e3528436e0b..7cfc285bac17 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, 
struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+struct iommu_pasid_table_config *cfg)
+{
+   if (unlikely(!domain->ops->attach_pasid_table))
+   return -ENODEV;
+
+   return domain->ops->attach_pasid_table(domain, cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+   if (unlikely(!domain->ops->detach_pasid_table))
+   return;
+
+   domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d1b5f4d98569..d91c7912ec3d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,6 +248,8 @@ struct iommu_iotlb_gather {
  * @cache_invalidate: invalidate translation caches
  * @sva_bind_gpasid: bind guest pasid and mm
  * @sva_unbind_gpasid: unbind guest pasid and mm
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
  */
@@ -307,6 +309,9 @@ struct iommu_ops {
  void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
int (*sva_get_pasid)(struct iommu_sva *handle);
+   int (*attach_pasid_table)(struct iommu_domain *domain,
+ struct iommu_pasid_table_config *cfg);
+   void (*detach_pasid_table)(struct iommu_domain *domain);
 
int (*page_response)(struct device *dev,
 struct iommu_fault_event *evt,
@@ -443,6 +448,9 @@ extern int iommu_sva_bind_gpasid(struct iommu_domain 
*domain,
struct device *dev, struct iommu_gpasid_bind_data *data);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+   struct iommu_pasid_table_config *cfg);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1033,6 +1041,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct 
device *dev)
return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+struct iommu_pasid_table_config *cfg)
+{
+   return -ENODEV;
+}
+
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 4ad3496e5c43..8d00be10dc6d 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -321,4 +321,55 @@ struct iommu_gpasid_bind_data {
};
 };
 
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ * information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ * or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ *

[PATCH v10 05/13] iommu/smmuv3: Get prepared for nested stage support

2020-03-20 Thread Eric Auger
When nested stage translation is set up, both s1_cfg and
s2_cfg are allocated.

We introduce a new smmu domain abort field that will be set
upon guest stage1 configuration passing.

arm_smmu_write_strtab_ent() is modified to write both stage
fields in the STE and deal with the abort field.

In nested mode, only stage 2 is "finalized" as the host does
not own/configure the stage 1 context descriptor; guest does.

Signed-off-by: Eric Auger 

---
v7 -> v8:
- rebase on 8be39a1a04c1 iommu/arm-smmu-v3: Add a master->domain
  pointer
- restore live checks for non-nested cases and add s1_live and
  s2_live to be more precise. Remove bypass local variable.
  In STE live case, move the ste to abort state and send a
  CFGI_STE before updating the rest of the fields.
- check s2ttb in case of live s2

v4 -> v5:
- reset ste.abort on detach

v3 -> v4:
- s1_cfg.nested_abort and nested_bypass removed.
- s/ste.nested/ste.abort
- arm_smmu_write_strtab_ent modifications with introduction
  of local abort, bypass and translate local variables
- comment updated
---
 drivers/iommu/arm-smmu-v3.c | 62 +++--
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8b3083c5f27b..7d00244fe725 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -223,6 +223,7 @@
 #define STRTAB_STE_0_CFG_BYPASS4
 #define STRTAB_STE_0_CFG_S1_TRANS  5
 #define STRTAB_STE_0_CFG_S2_TRANS  6
+#define STRTAB_STE_0_CFG_NESTED7
 
 #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR  0
@@ -721,6 +722,7 @@ struct arm_smmu_domain {
enum arm_smmu_domain_stage  stage;
struct arm_smmu_s1_cfg  *s1_cfg;
struct arm_smmu_s2_cfg  *s2_cfg;
+   boolabort;
 
struct iommu_domain domain;
 
@@ -1807,8 +1809,10 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_master *master, u32 sid,
 * three cases at the moment:
 *
 * 1. Invalid (all zero) -> bypass/fault (init)
-* 2. Bypass/fault -> translation/bypass (attach)
-* 3. Translation/bypass -> bypass/fault (detach)
+* 2. Bypass/fault -> single stage translation/bypass (attach)
+* 3. single stage Translation/bypass -> bypass/fault (detach)
+* 4. S2 -> S1 + S2 (attach_pasid_table)
+* 5. S1 + S2 -> S2 (detach_pasid_table)
 *
 * Given that we can't update the STE atomically and the SMMU
 * doesn't read the thing in a defined order, that leaves us
@@ -1819,7 +1823,8 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_master *master, u32 sid,
 * 3. Update Config, sync
 */
u64 val = le64_to_cpu(dst[0]);
-   bool ste_live = false;
+   bool abort, translate, s1_live = false, s2_live = false, ste_live;
+   bool nested = false;
struct arm_smmu_device *smmu = NULL;
struct arm_smmu_s1_cfg *s1_cfg = NULL;
struct arm_smmu_s2_cfg *s2_cfg = NULL;
@@ -1839,6 +1844,7 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_master *master, u32 sid,
if (smmu_domain) {
s1_cfg = smmu_domain->s1_cfg;
s2_cfg = smmu_domain->s2_cfg;
+   nested = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
}
 
if (val & STRTAB_STE_0_V) {
@@ -1846,23 +1852,34 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_master *master, u32 sid,
case STRTAB_STE_0_CFG_BYPASS:
break;
case STRTAB_STE_0_CFG_S1_TRANS:
+   s1_live = true;
+   break;
case STRTAB_STE_0_CFG_S2_TRANS:
-   ste_live = true;
+   s2_live = true;
+   break;
+   case STRTAB_STE_0_CFG_NESTED:
+   s1_live = true;
+   s2_live = true;
break;
case STRTAB_STE_0_CFG_ABORT:
-   BUG_ON(!disable_bypass);
break;
default:
BUG(); /* STE corruption */
}
}
 
+   ste_live = s1_live || s2_live;
+
/* Nuke the existing STE_0 value, as we're going to rewrite it */
val = STRTAB_STE_0_V;
 
/* Bypass/fault */
-   if (!smmu_domain || !(s1_cfg || s2_cfg)) {
-   if (!smmu_domain && disable_bypass)
+
+   abort = (!smmu_domain && disable_bypass) || smmu_domain->abort;
+   translate = s1_cfg || s2_cfg;
+
+   if (abort || !translate) {
+   if (abort)
val |= FIELD_PREP(STRTAB_STE_0_CFG, 
STRTAB_STE_0_CFG_ABORT);
else
val |= FIELD_PREP(STRTAB_STE_0_CFG, 
STRTAB_STE_0_CFG_BYPASS);
@@ -1880,8 +1897,18 @@ 

[PATCH v10 04/13] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-03-20 Thread Eric Auger
In preparation for the introduction of nested stages,
let's turn the s1_cfg and s2_cfg fields into pointers which are
dynamically allocated depending on the smmu_domain stage.

In nested mode, both stages will coexist and s1_cfg will
be allocated when the guest configuration gets passed.

Signed-off-by: Eric Auger 
---
 drivers/iommu/arm-smmu-v3.c | 94 -
 1 file changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3d726e97934f..8b3083c5f27b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -719,10 +719,8 @@ struct arm_smmu_domain {
atomic_tnr_ats_masters;
 
enum arm_smmu_domain_stage  stage;
-   union {
-   struct arm_smmu_s1_cfg  s1_cfg;
-   struct arm_smmu_s2_cfg  s2_cfg;
-   };
+   struct arm_smmu_s1_cfg  *s1_cfg;
+   struct arm_smmu_s2_cfg  *s2_cfg;
 
struct iommu_domain domain;
 
@@ -1598,9 +1596,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain 
*smmu_domain,
unsigned int idx;
struct arm_smmu_l1_ctx_desc *l1_desc;
struct arm_smmu_device *smmu = smmu_domain->smmu;
-   struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
+   struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
 
-   if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
+   if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
 
idx = ssid >> CTXDESC_SPLIT;
@@ -1635,7 +1633,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain,
__le64 *cdptr;
struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-   if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
+   if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
return -E2BIG;
 
cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
@@ -1700,7 +1698,7 @@ static int arm_smmu_alloc_cd_tables(struct 
arm_smmu_domain *smmu_domain)
size_t l1size;
size_t max_contexts;
struct arm_smmu_device *smmu = smmu_domain->smmu;
-   struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+   struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg;
 
max_contexts = 1 << cfg->s1cdmax;
@@ -1748,7 +1746,7 @@ static void arm_smmu_free_cd_tables(struct 
arm_smmu_domain *smmu_domain)
int i;
size_t size, l1size;
struct arm_smmu_device *smmu = smmu_domain->smmu;
-   struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
+   struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
 
if (cdcfg->l1_desc) {
size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
@@ -1839,17 +1837,8 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_master *master, u32 sid,
}
 
if (smmu_domain) {
-   switch (smmu_domain->stage) {
-   case ARM_SMMU_DOMAIN_S1:
-   s1_cfg = &smmu_domain->s1_cfg;
-   break;
-   case ARM_SMMU_DOMAIN_S2:
-   case ARM_SMMU_DOMAIN_NESTED:
-   s2_cfg = &smmu_domain->s2_cfg;
-   break;
-   default:
-   break;
-   }
+   s1_cfg = smmu_domain->s1_cfg;
+   s2_cfg = smmu_domain->s2_cfg;
}
 
if (val & STRTAB_STE_0_V) {
@@ -2286,11 +2275,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 
if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
-   cmd.tlbi.asid   = smmu_domain->s1_cfg.cd.asid;
+   cmd.tlbi.asid   = smmu_domain->s1_cfg->cd.asid;
cmd.tlbi.vmid   = 0;
} else {
cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
-   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg->vmid;
}
 
/*
@@ -2324,10 +2313,10 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, 
size_t size,
 
if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = CMDQ_OP_TLBI_NH_VA;
-   cmd.tlbi.asid   = smmu_domain->s1_cfg.cd.asid;
+   cmd.tlbi.asid   = smmu_domain->s1_cfg->cd.asid;
} else {
cmd.opcode  = CMDQ_OP_TLBI_S2_IPA;
-   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg->vmid;
}
 
if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
@@ -2477,22 +2466,24 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct 

Re: [GIT PULL] iommu/arm-smmu: Updates for 5.7

2020-03-20 Thread Joerg Roedel
On Fri, Mar 20, 2020 at 03:35:20PM +, Will Deacon wrote:
> Hi Joerg,
> 
> Please pull these Arm SMMU updates for 5.7. The summary is in the tag (which
> you may need to re-fetch if you've got my tree added as a remote).
> 
> Cheers,
> 
> Will
> 
> --->8
> 
> The following changes since commit f8788d86ab28f61f7b46eb6be375f8a726783636:
> 
>   Linux 5.6-rc3 (2020-02-23 16:17:42 -0800)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
> tags/arm-smmu-updates

Pulled, thanks.



[PATCH v10 02/13] iommu: Introduce bind/unbind_guest_msi

2020-03-20 Thread Eric Auger
On ARM, MSIs are translated by the SMMU. An IOVA is allocated
for each MSI doorbell. If both the host and the guest are exposed
to SMMUs, we end up with two different IOVAs, one allocated by each:
the guest allocates an IOVA (gIOVA) to map onto the guest MSI
doorbell (gDB), and the host allocates another IOVA (hIOVA) to map
onto the physical doorbell (hDB).

So we end up with 2 untied mappings:
 S1S2
gIOVA->gDB
  hIOVA->hDB

Currently the PCI device is programmed by the host with hIOVA
as MSI doorbell. So this does not work.

This patch introduces an API to pass gIOVA/gDB to the host so
that gIOVA can be reused by the host instead of re-allocating
a new IOVA. So the goal is to create the following nested mapping:

 S1S2
gIOVA->gDB ->hDB

and program the PCI device with gIOVA MSI doorbell.

In case we have several devices attached to this nested domain
(devices belonging to the same group), they cannot be isolated
on guest side either. So they should also end up in the same domain
on guest side. We will enforce that all the devices attached to
the host iommu domain use the same physical doorbell and similarly
a single virtual doorbell mapping gets registered (1 single
virtual doorbell is used on guest as well).
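
As an illustrative sketch (not part of the patch), a VFIO-like caller that
traps the guest MSI binding could register and later withdraw the gIOVA/gDB
mapping as follows; the variable names are placeholders:

	/* sketch: register the stage 1 gIOVA -> gDB doorbell mapping */
	ret = iommu_bind_guest_msi(domain, giova, gdb_gpa, granule_size);
	if (ret)
		return ret;

	/* ... and withdraw it when the guest tears the binding down */
	iommu_unbind_guest_msi(domain, giova);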

Signed-off-by: Eric Auger 

---
v7 -> v8:
- dummy iommu_unbind_guest_msi turned into a void function

v6 -> v7:
- remove the device handle parameter.
- Add comments saying there can only be a single MSI binding
  registered per iommu_domain
v5 -> v6:
-fix compile issue when IOMMU_API is not set

v3 -> v4:
- add unbind

v2 -> v3:
- add a struct device handle
---
 drivers/iommu/iommu.c | 37 +
 include/linux/iommu.h | 19 +++
 2 files changed, 56 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7cfc285bac17..ceef73cb088a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1756,6 +1756,43 @@ static void __iommu_detach_device(struct iommu_domain 
*domain,
trace_detach_device_from_domain(dev);
 }
 
+/**
+ * iommu_bind_guest_msi - Passes the stage1 GIOVA/GPA mapping of a
+ * virtual doorbell
+ *
+ * @domain: iommu domain the stage 1 mapping will be attached to
+ * @iova: iova allocated by the guest
+ * @gpa: guest physical address of the virtual doorbell
+ * @size: granule size used for the mapping
+ *
+ * The associated IOVA can be reused by the host to create a nested
+ * stage2 binding mapping translating into the physical doorbell used
+ * by the devices attached to the domain.
+ *
+ * All devices within the domain must share the same physical doorbell.
+ * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
+ */
+
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+   if (unlikely(!domain->ops->bind_guest_msi))
+   return -ENODEV;
+
+   return domain->ops->bind_guest_msi(domain, giova, gpa, size);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
+
+void iommu_unbind_guest_msi(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   if (unlikely(!domain->ops->unbind_guest_msi))
+   return;
+
+   domain->ops->unbind_guest_msi(domain, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
+
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
struct iommu_group *group;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d91c7912ec3d..c698272913b8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -250,6 +250,8 @@ struct iommu_iotlb_gather {
  * @sva_unbind_gpasid: unbind guest pasid and mm
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @bind_guest_msi: provides a stage1 giova/gpa MSI doorbell mapping
+ * @unbind_guest_msi: withdraw a stage1 giova/gpa MSI doorbell mapping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
  */
@@ -323,6 +325,10 @@ struct iommu_ops {
 
int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
+   int (*bind_guest_msi)(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size);
+   void (*unbind_guest_msi)(struct iommu_domain *domain, dma_addr_t giova);
+
unsigned long pgsize_bitmap;
struct module *owner;
 };
@@ -451,6 +457,10 @@ extern int iommu_sva_unbind_gpasid(struct iommu_domain 
*domain,
 extern int iommu_attach_pasid_table(struct iommu_domain *domain,
struct iommu_pasid_table_config *cfg);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_bind_guest_msi(struct iommu_domain *domain,
+   dma_addr_t giova, phys_addr_t gpa, size_t size);
+extern void iommu_unbind_guest_msi(struct iommu_domain *domain,

[PATCH v10 03/13] iommu/arm-smmu-v3: Maintain a SID->device structure

2020-03-20 Thread Eric Auger
From: Jean-Philippe Brucker 

When handling faults from the event or PRI queue, we need to find the
struct device associated to a SID. Add a rb_tree to keep track of SIDs.

Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 112 +++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a7222dd5b117..3d726e97934f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -677,6 +677,16 @@ struct arm_smmu_device {
 
/* IOMMU core code handle */
struct iommu_device iommu;
+
+   struct rb_root  streams;
+   struct mutexstreams_mutex;
+
+};
+
+struct arm_smmu_stream {
+   u32 id;
+   struct arm_smmu_master  *master;
+   struct rb_node  node;
 };
 
 /* SMMU private data for each master */
@@ -687,6 +697,7 @@ struct arm_smmu_master {
struct list_headdomain_head;
u32 *sids;
unsigned intnum_sids;
+   struct arm_smmu_stream  *streams;
boolats_enabled;
unsigned intssid_bits;
 };
@@ -1967,6 +1978,32 @@ static int arm_smmu_init_l2_strtab(struct 
arm_smmu_device *smmu, u32 sid)
return 0;
 }
 
+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+   struct rb_node *node;
+   struct arm_smmu_stream *stream;
+   struct arm_smmu_master *master = NULL;
+
+   mutex_lock(&smmu->streams_mutex);
+   node = smmu->streams.rb_node;
+   while (node) {
+   stream = rb_entry(node, struct arm_smmu_stream, node);
+   if (stream->id < sid) {
+   node = node->rb_right;
+   } else if (stream->id > sid) {
+   node = node->rb_left;
+   } else {
+   master = stream->master;
+   break;
+   }
+   }
+   mutex_unlock(&smmu->streams_mutex);
+
+   return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2912,6 +2949,69 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device 
*smmu, u32 sid)
return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+ struct arm_smmu_master *master)
+{
+   int i;
+   int ret = 0;
+   struct arm_smmu_stream *new_stream, *cur_stream;
+   struct rb_node **new_node, *parent_node = NULL;
+
+   master->streams = kcalloc(master->num_sids,
+ sizeof(struct arm_smmu_stream), GFP_KERNEL);
+   if (!master->streams)
+   return -ENOMEM;
+
+   mutex_lock(&smmu->streams_mutex);
+   for (i = 0; i < master->num_sids && !ret; i++) {
+   new_stream = &master->streams[i];
+   new_stream->id = master->sids[i];
+   new_stream->master = master;
+
+   new_node = &(smmu->streams.rb_node);
+   while (*new_node) {
+   cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+ node);
+   parent_node = *new_node;
+   if (cur_stream->id > new_stream->id) {
+   new_node = &((*new_node)->rb_left);
+   } else if (cur_stream->id < new_stream->id) {
+   new_node = &((*new_node)->rb_right);
+   } else {
+   dev_warn(master->dev,
+"stream %u already in tree\n",
+cur_stream->id);
+   ret = -EINVAL;
+   break;
+   }
+   }
+
+   if (!ret) {
+   rb_link_node(&new_stream->node, parent_node, new_node);
+   rb_insert_color(&new_stream->node, &smmu->streams);
+   }
+   }
+   mutex_unlock(&smmu->streams_mutex);
+
+   return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+  struct arm_smmu_master *master)
+{
+   int i;
+
+   if (!master->streams)
+   return;
+
+   mutex_lock(&smmu->streams_mutex);
+   for (i = 0; i < master->num_sids; i++)
+   rb_erase(&master->streams[i].node, &smmu->streams);
+   mutex_unlock(&smmu->streams_mutex);
+
+   kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
@@ -2979,15 +3079,21 @@ static int arm_smmu_add_device(struct device *dev)
if (ret)
goto err_disable_pasid;
 
+ 

[PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-03-20 Thread Eric Auger
This is a rebase on top of Will's arm-smmu-updates branch
(git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
tags/arm-smmu-updates)

This work has been stalled since Plumbers 2019. Since then,
some users have expressed interest in this work and tested v9:
- https://patchwork.kernel.org/cover/11039995/#23012381
- https://patchwork.kernel.org/cover/11039995/#23197235

The VFIO series is sent separately.

Background:

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities get implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API,
which cascades it to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
(also includes the VFIO part)

History:

v9 -> v10:
- rebase on top of 5.6.0-rc3

v8 -> v9:
- rebase on 5.3
- split iommu/vfio parts

v6 -> v8:
- Implement VFIO-PCI device specific interrupt framework

v7 -> v8:
- rebase on top of v5.2-rc1 and especially
  8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
- dynamic alloc of s1_cfg/s2_cfg
- __arm_smmu_tlb_inv_asid/s1_range_nosync
- check there is no HW MSI regions
- asid invalidation using pasid extended struct (change in the uapi)
- add s1_live/s2_live checks
- move check about support of nested stages in domain finalise
- fixes in error reporting according to the discussion with Robin
- reordered the patches to have first iommu/smmuv3 patches and then
  VFIO patches

v6 -> v7:
- removed device handle from bind/unbind_guest_msi
- added "iommu/smmuv3: Nested mode single MSI doorbell per domain
  enforcement"
- added a few uapi comments as suggested by Jean, Jacob and Alex

v5 -> v6:
- Fix compilation issue when CONFIG_IOMMU_API is unset

v4 -> v5:
- fix bug reported by Vincent: fault handler unregistration now happens in
  vfio_pci_release
- IOMMU_FAULT_PERM_* moved outside of struct definition + small
  uapi changes suggested by Jean-Philippe (except fetch_addr)
- iommu: introduce device fault report API: removed the PRI part.
- see individual logs for more details
- reset the ste abort flag on detach

v3 -> v4:
- took into account Alex, jean-Philippe and Robin's comments on v3
- rework of the smmuv3 driver integration
- add tear down ops for msi binding and PASID table binding
- fix S1 fault propagation
- put fault reporting patches at the beginning of the series following
  Jean-Philippe's request
- update of the cache invalidate and fault API uapis
- VFIO fault reporting rework with 2 separate regions and one mmappable
  segment for the fault queue
- moved to PATCH

v2 -> v3:
- When registering the S1 MSI binding we now store the device handle. This
  addresses Robin's comment about discrimination of devices belonging to
  different S1 groups and using different physical MSI doorbells.
- Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
  set the eventfd and expose the faults through an mmappable fault region

v1 -> v2:
- Added the fault reporting capability
- asid properly passed on invalidation (fix assignment of multiple
  devices)
- see individual change logs for more info

Eric Auger (11):
  iommu: Introduce bind/unbind_guest_msi
  iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg
  iommu/smmuv3: Get prepared for nested stage support
  iommu/smmuv3: Implement attach/detach_pasid_table
  iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  iommu/smmuv3: Implement cache_invalidate
  dma-iommu: Implement NESTED_MSI cookie
  iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
regions
  iommu/smmuv3: Implement bind/unbind_guest_msi
  iommu/smmuv3: Report non recoverable faults

Jacob Pan (1):
  iommu: Introduce attach/detach_pasid_table API

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Maintain a SID->device structure

 drivers/iommu/arm-smmu-v3.c | 738 
 drivers/iommu/dma-iommu.c   | 142 ++-
 drivers/iommu/iommu.c   |  56 +++
 include/linux/dma-iommu.h   |  16 +
 include/linux/iommu.h   |  37 ++
 include/uapi/linux/iommu.h  |  51 +++
 6 files changed, 968 insertions(+), 72 deletions(-)

-- 
2.20.1



[GIT PULL] iommu/arm-smmu: Updates for 5.7

2020-03-20 Thread Will Deacon
Hi Joerg,

Please pull these Arm SMMU updates for 5.7. The summary is in the tag (which
you may need to re-fetch if you've got my tree added as a remote).

Cheers,

Will

--->8

The following changes since commit f8788d86ab28f61f7b46eb6be375f8a726783636:

  Linux 5.6-rc3 (2020-02-23 16:17:42 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
tags/arm-smmu-updates

for you to fetch changes up to 6a481a95d4c198a2dd0a61f8877b92a375757db8:

  iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support (2020-03-18 
21:37:10 +)


Arm SMMU updates for 5.7

- Support for the TLB range invalidation command in SMMUv3.2

- Introduction of command batching helpers...

- ... which are then used to batch up CD and ATC invalidation

- Support for PCI PASID, along with necessary PCI symbol exports

- MAINTAINERS update to include DT binding docs


Jean-Philippe Brucker (5):
  PCI/ATS: Export symbols of PASID functions
  iommu/arm-smmu-v3: Add support for PCI PASID
  iommu/arm-smmu-v3: Write level-1 descriptors atomically
  iommu/arm-smmu-v3: Add command queue batching helpers
  iommu/arm-smmu-v3: Batch context descriptor invalidation

Rob Herring (2):
  iommu/arm-smmu-v3: Batch ATC invalidation commands
  iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

Robin Murphy (1):
  MAINTAINERS: Cover Arm SMMU DT bindings

 MAINTAINERS |   1 +
 drivers/iommu/arm-smmu-v3.c | 204 ++--
 drivers/pci/ats.c   |   4 +
 3 files changed, 181 insertions(+), 28 deletions(-)


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-20 Thread Greg Kroah-Hartman
On Fri, Mar 20, 2020 at 03:16:39PM +0100, Christoph Hellwig wrote:
> Several IOMMU drivers have a bypass mode where they can use a direct
> mapping if the devices DMA mask is large enough.  Add generic support
> to the core dma-mapping code to do that to switch those drivers to
> a common solution.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/device.h  |  6 ++
>  include/linux/dma-mapping.h | 30 ++
>  kernel/dma/mapping.c| 36 +++-
>  3 files changed, 51 insertions(+), 21 deletions(-)

Reviewed-by: Greg Kroah-Hartman 


generic DMA bypass flag v2

2020-03-20 Thread Christoph Hellwig
Hi all,

I've recently been chatting with Lu about using dma-iommu and
per-device DMA ops in the intel IOMMU driver, and one missing feature
in dma-iommu is a bypass mode where the direct mapping is used even
when an iommu is attached to improve performance.  The powerpc
code already has a similar mode, so I'd like to move it to the core
DMA mapping code.  As part of that I noticed that the current
powerpc code has a little bug in that it used the wrong check in the
dma_sync_* routines to see if the direct mapping code is used.

These two patches just add the generic code and move powerpc over,
the intel IOMMU bits will require a separate discussion.

The x86 AMD GART code also has a bypass mode, but it is rather
strange, so I'm not going to touch it for now.

Changes since v1:
 - rebased to the current dma-mapping-for-next tree


[PATCH 2/2] powerpc: use the generic dma_ops_bypass mode

2020-03-20 Thread Christoph Hellwig
Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speeds up the direct mapping case by avoiding indirect
calls when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined and is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ---
 2 files changed, 9 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 266542769e4b..452402215e12 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-   /*
-* Set to %true if the dma_iommu_ops are requested to use a direct
-* window instead of dynamically mapping memory.
-*/
-   booliommu_bypass : 1;
/*
 * These two used to be a union. However, with the hybrid ops we need
 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..569fecd7b5b2 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-   return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-   dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-   unsigned long attrs)
-{
-   return dev->archdata.iommu_bypass &&
-   (!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, 
size_t size,
  dma_addr_t *dma_handle, gfp_t flag,
  unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, 
size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-   else
-   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-   dma_handle);
+   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if (dma_iommu_map_bypass(dev, attrs))
-   return dma_direct_map_page(dev, page, offset, size, direction,
-   attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
  size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 size_t size, enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if (!dma_iommu_map_bypass(dev, attrs))
-   iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
-   direction,  attrs);
-   else
-   dma_direct_unmap_page(dev, dma_handle, size, direction, attrs);
+   iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
+attrs);
 }
 
 
@@ -91,8 +62,6 @@ static int dma_iommu_map_sg(struct device *dev, struct 
scatterlist *sglist,
int nelems, enum dma_data_direction 

[PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-20 Thread Christoph Hellwig
Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
to the core dma-mapping code to do that and switch those drivers to
a common solution.
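
As a rough sketch of the intended use (not part of this patch), an IOMMU
driver's ->dma_supported() implementation could manage the new flag as below;
using dma_direct_supported() as the condition is an assumption:

static int example_iommu_dma_supported(struct device *dev, u64 mask)
{
	/* use the direct mapping when it can reach everything the device needs */
	if (dma_direct_supported(dev, mask)) {
		dev->dma_ops_bypass = true;
		return 1;
	}

	/* otherwise keep going through the IOMMU ops for streaming DMA */
	dev->dma_ops_bypass = false;
	return 1;
}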

Signed-off-by: Christoph Hellwig 
---
 include/linux/device.h  |  6 ++
 include/linux/dma-mapping.h | 30 ++
 kernel/dma/mapping.c| 36 +++-
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 0cd7c647c16c..09be8bb2c4a6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -525,6 +525,11 @@ struct dev_links_info {
  *   sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ * streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ * and optionally (if the coherent mask is large enough) also
+ * for dma allocations.  This flag is managed by the dma ops
+ * instance from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -625,6 +630,7 @@ struct device {
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
booldma_coherent:1;
 #endif
+   booldma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..c3af0cf5e435 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct 
vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+   const struct dma_map_ops *ops)
 {
-   return likely(!ops);
+   return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device 
*dev,
dma_addr_t addr;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, 
dma_addr_t addr,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, 
struct scatterlist *sg,
int ents;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, 
struct scatterlist *sg
 
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
else if (ops->unmap_sg)
ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -352,7 +358,7 @@ static inline dma_addr_t dma_map_resource(struct device 
*dev,
if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
return DMA_MAPPING_ERROR;
 
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);
else if (ops->map_resource)
addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -368,7 +374,7 @@ static inline void dma_unmap_resource(struct device *dev, 
dma_addr_t addr,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (!dma_is_direct(ops) && ops->unmap_resource)
+   if (!dma_map_direct(dev, ops) && ops->unmap_resource)
ops->unmap_resource(dev, addr, size, dir, attrs);
debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -380,7 +386,7 @@ static inline void dma_sync_single_for_cpu(struct device 

Re: [PATCH 3/3] iommu/vt-d: Add build dependency on IOASID

2020-03-20 Thread Lu Baolu

On 2020/3/20 12:32, Jacob Pan wrote:

IOASID code is needed by VT-d scalable mode for PASID allocation.
Add explicit dependency such that IOASID is built-in whenever Intel
IOMMU is enabled.
Otherwise, aux domain code will fail when IOMMU is built-in and IOASID
is compiled as a module.

Signed-off-by: Jacob Pan 


Fixes: 59a623374dc38 ("iommu/vt-d: Replace Intel specific PASID 
allocator with IOASID")

Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..25149544d57c 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -188,6 +188,7 @@ config INTEL_IOMMU
select NEED_DMA_MAP_STATE
select DMAR_TABLE
select SWIOTLB
+   select IOASID
help
  DMA remapping (DMAR) devices support enables independent address
  translations for Direct Memory Access (DMA) from devices.




Re: [PATCH 2/3] iommu/vt-d: Fix mm reference leak

2020-03-20 Thread Lu Baolu

On 2020/3/20 12:32, Jacob Pan wrote:

Move canonical address check before mmget_not_zero() to avoid mm
reference leak.

Fixes: 9d8c3af31607 ("iommu/vt-d: IOMMU Page Request needs to check if
address is canonical.")

Signed-off-by: Jacob Pan 


Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel-svm.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 1483f1845762..56253c59ca10 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -861,14 +861,15 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * any faults on kernel addresses. */
if (!svm->mm)
goto bad_req;
-   /* If the mm is already defunct, don't handle faults. */
-   if (!mmget_not_zero(svm->mm))
-   goto bad_req;
  
		/* If address is not canonical, return invalid response */
		if (!is_canonical_address(address))
			goto bad_req;

+		/* If the mm is already defunct, don't handle faults. */
+		if (!mmget_not_zero(svm->mm))
+			goto bad_req;
+
		down_read(&svm->mm->mmap_sem);
vma = find_extend_vma(svm->mm, address);
if (!vma || address < vma->vm_start)




Re: [PATCH 1/3] iommu/vt-d: Remove redundant IOTLB flush

2020-03-20 Thread Lu Baolu

On 2020/3/20 12:32, Jacob Pan wrote:

The IOTLB flush is already included in the PASID tear-down process. There
is no need to flush again.


It seems that intel_pasid_tear_down_entry() doesn't flush the pasid
based device TLB?

Best regards,
baolu



Cc: Lu Baolu 
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel-svm.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8f42d717d8d7..1483f1845762 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -268,10 +268,9 @@ static void intel_mm_release(struct mmu_notifier *mn, 
struct mm_struct *mm)
 * *has* to handle gracefully without affecting other processes.
 */
rcu_read_lock();
-   list_for_each_entry_rcu(sdev, &svm->devs, list) {
+   list_for_each_entry_rcu(sdev, &svm->devs, list)
intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
-   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-   }
+
rcu_read_unlock();
  
  }

@@ -731,7 +730,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 * large and has to be physically contiguous. So it's
 * hard to be as defensive as we might like. */
intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
kfree_rcu(sdev, rcu);
  
			if (list_empty(&svm->devs)) {





Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-20 Thread Jean-Philippe Brucker
On Fri, Mar 20, 2020 at 10:41:44AM +, John Garry wrote:
> On 19/03/2020 18:43, Jean-Philippe Brucker wrote:
> > On Thu, Mar 19, 2020 at 12:54:59PM +, John Garry wrote:
> > > Hi Will,
> > > 
> > > > 
> > > > On Thu, Jan 02, 2020 at 05:44:39PM +, John Garry wrote:
> > > > > And for the overall system, we have:
> > > > > 
> > > > > PerfTop:   85864 irqs/sec  kernel:89.6%  exact:  0.0% lost: 
> > > > > 0/34434 drop:
> > > > > 0/40116 [4000Hz cycles],  (all, 96 CPUs)
> > > > > --
> > > > > 
> > > > >   27.43%  [kernel]  [k] arm_smmu_cmdq_issue_cmdlist
> > > > >   11.71%  [kernel]  [k] _raw_spin_unlock_irqrestore
> > > > >6.35%  [kernel]  [k] _raw_spin_unlock_irq
> > > > >2.65%  [kernel]  [k] get_user_pages_fast
> > > > >2.03%  [kernel]  [k] __slab_free
> > > > >1.55%  [kernel]  [k] tick_nohz_idle_exit
> > > > >1.47%  [kernel]  [k] arm_lpae_map
> > > > >1.39%  [kernel]  [k] __fget
> > > > >1.14%  [kernel]  [k] __lock_text_start
> > > > >1.09%  [kernel]  [k] _raw_spin_lock
> > > > >1.08%  [kernel]  [k] bio_release_pages.part.42
> > > > >1.03%  [kernel]  [k] __sbitmap_get_word
> > > > >0.97%  [kernel]  [k] 
> > > > > arm_smmu_atc_inv_domain.constprop.42
> > > > >0.91%  [kernel]  [k] fput_many
> > > > >0.88%  [kernel]  [k] __arm_lpae_map
> > > > > 
> > > > > One thing to note is that we still spend an appreciable amount of 
> > > > > time in
> > > > > arm_smmu_atc_inv_domain(), which is disappointing when considering it 
> > > > > should
> > > > > effectively be a noop.
> > > > > 
> > > > > As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the 
> > > > > testing our
> > > > > batch size is 1, so we're not seeing the real benefit of the 
> > > > > batching. I
> > > > > can't help but think that we could improve this code to try to 
> > > > > combine CMD
> > > > > SYNCs for small batches.
> > > > > 
> > > > > Anyway, let me know your thoughts or any questions. I'll have a look 
> > > > > if a
> > > > > get a chance for other possible bottlenecks.
> > > > 
> > > > Did you ever get any more information on this? I don't have any SMMUv3
> > > > hardware any more, so I can't really dig into this myself.
> > > 
> > > I'm only getting back to look at this now, as SMMU performance is a bit 
> > > of a
> > > hot topic again for us.
> > > 
> > > So one thing we are doing which looks to help performance is this series
> > > from Marc:
> > > 
> > > https://lore.kernel.org/lkml/9171c554-50d2-142b-96ae-1357952fc...@huawei.com/T/#mee5562d1efd6aaeb8d2682bdb6807fe7b5d7f56d
> > > 
> > > So that is just spreading the per-CPU load for NVMe interrupt handling
> > > (where the DMA unmapping is happening), so I'd say just side-stepping any
> > > SMMU issue really.
> > > 
> > > Going back to the SMMU, I wanted to run epbf and perf annotate to help
> > > profile this, but was having no luck getting them to work properly. I'll
> > > look at this again now.
> > 
> > Could you also try with the upcoming ATS change currently in Will's tree?
> > They won't improve your numbers but it'd be good to check that they don't
> > make things worse.
> 
> I can do when I get a chance.
> 
> > 
> > I've run a bunch of netperf instances on multiple cores and collecting
> > SMMU usage (on TaiShan 2280). I'm getting the following ratio pretty
> > consistently.
> > 
> > - 6.07% arm_smmu_iotlb_sync
> > - 5.74% arm_smmu_tlb_inv_range
> >  5.09% arm_smmu_cmdq_issue_cmdlist
> >  0.28% __pi_memset
> >  0.08% __pi_memcpy
> >  0.08% arm_smmu_atc_inv_domain.constprop.37
> >  0.07% arm_smmu_cmdq_build_cmd
> >  0.01% arm_smmu_cmdq_batch_add
> >   0.31% __pi_memset
> > 
> > So arm_smmu_atc_inv_domain() takes about 1.4% of arm_smmu_iotlb_sync(),
> > when ATS is not used. According to the annotations, the load from the
> > atomic_read(), that checks whether the domain uses ATS, is 77% of the
> > samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm not sure
> > there is much room for optimization there.
> 
> Well I did originally suggest using RCU protection to scan the list of
> devices, instead of reading an atomic and checking for non-zero value. But
> that would be an optimisation for ATS also, and there were no ATS devices at
> the time (to verify performance).

Heh, I have yet to get my hands on one. Currently I can't evaluate ATS
performance, but I agree that using RCU to scan the list should get better
results when using ATS.

When ATS isn't in use however, I suspect reading nr_ats_masters should be
more efficient than taking the RCU lock + reading an "ats_devices" list
(since the smmu_domain->devices list also serves context 

Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-20 Thread John Garry

On 19/03/2020 18:43, Jean-Philippe Brucker wrote:

On Thu, Mar 19, 2020 at 12:54:59PM +, John Garry wrote:

Hi Will,



On Thu, Jan 02, 2020 at 05:44:39PM +, John Garry wrote:

And for the overall system, we have:

PerfTop:   85864 irqs/sec  kernel:89.6%  exact:  0.0% lost: 0/34434 drop:
0/40116 [4000Hz cycles],  (all, 96 CPUs)
--

  27.43%  [kernel]  [k] arm_smmu_cmdq_issue_cmdlist
  11.71%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.35%  [kernel]  [k] _raw_spin_unlock_irq
   2.65%  [kernel]  [k] get_user_pages_fast
   2.03%  [kernel]  [k] __slab_free
   1.55%  [kernel]  [k] tick_nohz_idle_exit
   1.47%  [kernel]  [k] arm_lpae_map
   1.39%  [kernel]  [k] __fget
   1.14%  [kernel]  [k] __lock_text_start
   1.09%  [kernel]  [k] _raw_spin_lock
   1.08%  [kernel]  [k] bio_release_pages.part.42
   1.03%  [kernel]  [k] __sbitmap_get_word
   0.97%  [kernel]  [k] arm_smmu_atc_inv_domain.constprop.42
   0.91%  [kernel]  [k] fput_many
   0.88%  [kernel]  [k] __arm_lpae_map

One thing to note is that we still spend an appreciable amount of time in
arm_smmu_atc_inv_domain(), which is disappointing when considering it should
effectively be a noop.

As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the testing our
batch size is 1, so we're not seeing the real benefit of the batching. I
can't help but think that we could improve this code to try to combine CMD
SYNCs for small batches.

Anyway, let me know your thoughts or any questions. I'll have a look for
other possible bottlenecks if I get a chance.


Did you ever get any more information on this? I don't have any SMMUv3
hardware any more, so I can't really dig into this myself.


I'm only getting back to look at this now, as SMMU performance is a bit of a
hot topic again for us.

So one thing we are doing which looks to help performance is this series
from Marc:

https://lore.kernel.org/lkml/9171c554-50d2-142b-96ae-1357952fc...@huawei.com/T/#mee5562d1efd6aaeb8d2682bdb6807fe7b5d7f56d

So that is just spreading the per-CPU load for NVMe interrupt handling
(where the DMA unmapping is happening), so I'd say just side-stepping any
SMMU issue really.

Going back to the SMMU, I wanted to run eBPF and perf annotate to help
profile this, but was having no luck getting them to work properly. I'll
look at this again now.


Could you also try with the upcoming ATS change currently in Will's tree?
They won't improve your numbers but it'd be good to check that they don't
make things worse.


I can do when I get a chance.



I've run a bunch of netperf instances on multiple cores and collecting
SMMU usage (on TaiShan 2280). I'm getting the following ratio pretty
consistently.

- 6.07% arm_smmu_iotlb_sync
- 5.74% arm_smmu_tlb_inv_range
 5.09% arm_smmu_cmdq_issue_cmdlist
 0.28% __pi_memset
 0.08% __pi_memcpy
 0.08% arm_smmu_atc_inv_domain.constprop.37
 0.07% arm_smmu_cmdq_build_cmd
 0.01% arm_smmu_cmdq_batch_add
  0.31% __pi_memset

So arm_smmu_atc_inv_domain() takes about 1.4% of arm_smmu_iotlb_sync(),
when ATS is not used. According to the annotations, the load from the
atomic_read(), that checks whether the domain uses ATS, is 77% of the
samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm not sure
there is much room for optimization there.


Well I did originally suggest using RCU protection to scan the list of
devices, instead of reading an atomic and checking for non-zero value.
But that would be an optimisation for ATS also, and there were no ATS
devices at the time (to verify performance).
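
To make the comparison concrete, here is a minimal sketch of the two checks
being discussed: the current atomic counter test versus an RCU-protected walk
of the domain's device list. The nr_ats_masters counter and the devices list
come from the discussion above; the *_sketch structures, the ats_enabled flag
and issue_atc_inv() are illustrative assumptions, not the actual arm-smmu-v3
code.

#include <linux/atomic.h>
#include <linux/rculist.h>
#include <linux/types.h>

struct arm_smmu_master_sketch {
	struct list_head	domain_head;
	bool			ats_enabled;	/* assumed per-master flag */
};

struct arm_smmu_domain_sketch {
	atomic_t		nr_ats_masters;
	struct list_head	devices;	/* RCU-protected in this sketch */
};

/* Hypothetical helper: build and submit an ATC invalidation for one master. */
static void issue_atc_inv(struct arm_smmu_master_sketch *master)
{
}

/* Current style: one cheap atomic read decides whether any ATC work exists. */
static bool atc_inv_needed(struct arm_smmu_domain_sketch *smmu_domain)
{
	return atomic_read(&smmu_domain->nr_ats_masters) != 0;
}

/* Suggested style: walk the device list under RCU, invalidate ATS masters. */
static void atc_inv_domain_rcu(struct arm_smmu_domain_sketch *smmu_domain)
{
	struct arm_smmu_master_sketch *master;

	rcu_read_lock();
	list_for_each_entry_rcu(master, &smmu_domain->devices, domain_head)
		if (master->ats_enabled)
			issue_atc_inv(master);
	rcu_read_unlock();
}

The trade-off discussed above is visible here: the atomic read stays O(1) when
ATS is unused, while the RCU walk avoids reading a shared counter but touches
every device attached to the domain.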


Cheers,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [linux-next/mainline][bisected 3acac06][ppc] Oops when unloading mpt3sas driver

2020-03-20 Thread Abdul Haleem
On Tue, 2020-02-25 at 12:23 +0530, Sreekanth Reddy wrote:
> On Tue, Feb 25, 2020 at 11:51 AM Abdul Haleem
>  wrote:
> >
> > On Fri, 2020-01-17 at 18:21 +0530, Abdul Haleem wrote:
> > > On Thu, 2020-01-16 at 09:44 -0800, Christoph Hellwig wrote:
> > > > Hi Abdul,
> > > >
> > > > I think the problem is that mpt3sas has some convoluted logic to do
> > > > some DMA allocations with a 32-bit coherent mask, and then switches
> > > > to a 63 or 64 bit mask, which is not supported by the DMA API.
> > > >
> > > > Can you try the patch below?
> > >
> > > Thank you Christoph, with the given patch applied the bug is not seen.
> > >
> > > rmmod of mpt3sas driver is successful, no kernel Oops
> > >
> > > Reported-and-tested-by: Abdul Haleem 
> >
> > Hi Christoph,
> >
> > I see the patch is under discussion, will this be merged upstream any
> > time soon ? as boot is broken on our machines with out your patch.
> >
> 
> Hi Abdul,
> 
> We have posted a new set of patches to fix this issue. This patch set
> won't change the DMA Mask on the fly and also won't hardcode the DMA
> mask to 32 bit.
> 
> [PATCH 0/5] mpt3sas: Fix changing coherent mask after allocation.
> 
> This patchset will have below patches, Please review and try with this
> patch set.
> 
> Suganath Prabu S (5):
>   mpt3sas: Don't change the dma coherent mask after  allocations
>   mpt3sas: Rename function name is_MSB_are_same
>   mpt3sas: Code Refactoring.
>   mpt3sas: Handle RDPQ DMA allocation in same 4g region
>   mpt3sas: Update version to 33.101.00.00

Hi Suganath, 

The above patch set fixes the issue; the driver is loading and unloading with
no kernel oops.

Reported-and-tested-by: Abdul Haleem 
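
For readers following the thread, the DMA API rule referenced above is that
the coherent DMA mask must be settled before any coherent allocations are
made; widening it afterwards is not supported. A minimal, hypothetical
illustration of the supported ordering (not mpt3sas code; example_dma_setup()
is a made-up name):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/pci.h>
#include <linux/sizes.h>

static int example_dma_setup(struct pci_dev *pdev)
{
	dma_addr_t handle;
	void *buf;

	/* Supported: pick the final coherent mask once, before any allocation. */
	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)) &&
	    dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
		return -EIO;

	buf = dma_alloc_coherent(&pdev->dev, SZ_64K, &handle, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/*
	 * Not supported: calling dma_set_mask_and_coherent() again here to
	 * widen a 32-bit coherent mask to 63/64 bits after allocations have
	 * been made is the pattern the fixes above remove from the driver.
	 */
	return 0;
}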

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-03-20 Thread Jean-Philippe Brucker
On Fri, Mar 20, 2020 at 10:29:55AM +0100, Jean-Philippe Brucker wrote:
> > - success:
> > -   *pasid = svm->pasid;
> > +success:
> > +   sdev->pasid = svm->pasid;
> > +   sdev->sva.dev = dev;
> > +   if (sd)
> > +   *sd = sdev;
> 
> One thing that might be missing: calling bind() multiple times with the
> same (dev, mm) pair should take references to the svm struct, so device
> drivers can call unbind() on it that many times.

Please disregard this, I missed sdev->users

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/vt-d: Report SVA feature with generic flag

2020-03-20 Thread Jean-Philippe Brucker
On Mon, Feb 24, 2020 at 03:26:36PM -0800, Jacob Pan wrote:
> Query Shared Virtual Address/Memory capability is a generic feature.
> SVA feature check is the required first step before calling
> iommu_sva_bind_device().
> 
> VT-d checks SVA feature enabling at the per-IOMMU level during this step;
> SVA bind device will then check and enable the PCI ATS, PRS, and PASID
> capabilities at the device level.
> 
> This patch reports Intel SVM as an SVA feature such that generic code
> (e.g. Uacce [1]) can use it.
> 
> [1] https://lkml.org/lkml/2020/1/15/604
> 
> Signed-off-by: Jacob Pan 

Don't you also need to have has_feat(), feat_enabled() and disable_feat()
return positive values?

Thanks,
Jean

> ---
>  drivers/iommu/intel-iommu.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 92c2f2e4197b..5eca6e10d2a4 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6346,9 +6346,14 @@ intel_iommu_dev_has_feat(struct device *dev, enum 
> iommu_dev_features feat)
>  static int
>  intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
>  {
> + struct intel_iommu *intel_iommu = dev_to_intel_iommu(dev);
> +
>   if (feat == IOMMU_DEV_FEAT_AUX)
>   return intel_iommu_enable_auxd(dev);
>  
> + if (feat == IOMMU_DEV_FEAT_SVA)
> + return intel_iommu->flags & VTD_FLAG_SVM_CAPABLE;
> +
>   return -ENODEV;
>  }
>  
> -- 
> 2.7.4
> 
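
To make the question concrete, the corresponding has_feat() change would
presumably mirror the enable path in the quoted patch. The following is a
hypothetical sketch only, reusing dev_to_intel_iommu() and
VTD_FLAG_SVM_CAPABLE from the patch above; it is not part of the posted
series, and feat_enabled() and disable_feat() would need similar treatment:

/* Hypothetical: report IOMMU_DEV_FEAT_SVA from the has_feat path as well. */
static bool
intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat)
{
	struct intel_iommu *intel_iommu = dev_to_intel_iommu(dev);

	if (feat == IOMMU_DEV_FEAT_SVA)
		return intel_iommu->flags & VTD_FLAG_SVM_CAPABLE;

	/* ... existing IOMMU_DEV_FEAT_AUX handling unchanged ... */
	return false;
}
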
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-03-20 Thread Jean-Philippe Brucker
Hi Jacob,

I think this step is really useful and the patch looks good overall,
thanks for doing this. Some comments inline

On Mon, Feb 24, 2020 at 03:26:37PM -0800, Jacob Pan wrote:
> This patch is an initial step to replace Intel SVM code with the
> following IOMMU SVA ops:
> intel_svm_bind_mm() => iommu_sva_bind_device()
> intel_svm_unbind_mm() => iommu_sva_unbind_device()
> intel_svm_is_pasid_valid() => iommu_sva_get_pasid()
> 
> The features below will continue to work but are not included in this patch
> in that they are handled mostly within the IOMMU subsystem.
> - IO page fault
> - mmu notifier
> 
> Consolidation of the above will come after merging generic IOMMU sva
> code[1]. There should not be any changes needed for SVA users such as
> accelerator device drivers during this time.
> 
> [1] http://jpbrucker.net/sva/
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c |   3 ++
>  drivers/iommu/intel-svm.c   | 123 
> 
>  include/linux/intel-iommu.h |   7 +++
>  include/linux/intel-svm.h   |  85 --
>  4 files changed, 78 insertions(+), 140 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5eca6e10d2a4..ccfa5adfd06d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6475,6 +6475,9 @@ const struct iommu_ops intel_iommu_ops = {
>   .cache_invalidate   = intel_iommu_sva_invalidate,
>   .sva_bind_gpasid= intel_svm_bind_gpasid,
>   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> + .sva_bind   = intel_svm_bind,
> + .sva_unbind = intel_svm_unbind,
> + .sva_get_pasid  = intel_svm_get_pasid,
>  #endif
>  };
>  
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 1d7a95372f8c..35d949513728 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -516,13 +516,14 @@ int intel_svm_unbind_gpasid(struct device *dev, int 
> pasid)
>   return ret;
>  }
>  
> -int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
> svm_dev_ops *ops)
> +/* Caller must hold pasid_mutex, mm reference */
> +static int intel_svm_bind_mm(struct device *dev, int flags, struct 
> svm_dev_ops *ops,
> +   struct mm_struct *mm, struct intel_svm_dev **sd)
>  {
>   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>   struct device_domain_info *info;
>   struct intel_svm_dev *sdev;
>   struct intel_svm *svm = NULL;
> - struct mm_struct *mm = NULL;
>   int pasid_max;
>   int ret;
>  
> @@ -539,16 +540,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, 
> int flags, struct svm_dev_
>   } else
>   pasid_max = 1 << 20;
>  
> + /* Bind supervisor PASID shuld have mm = NULL */

should

>   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
> - if (!ecap_srs(iommu->ecap))
> + if (!ecap_srs(iommu->ecap) || mm) {
> + pr_err("Supervisor PASID with user provided mm.\n");
>   return -EINVAL;
> - } else if (pasid) {
> - mm = get_task_mm(current);
> - BUG_ON(!mm);
> + }
>   }
>  
> - mutex_lock(_mutex);
> - if (pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> + if (!(flags & SVM_FLAG_PRIVATE_PASID)) {
>   struct intel_svm *t;
>  
>   list_for_each_entry(t, _svm_list, list) {
> @@ -586,9 +586,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
> flags, struct svm_dev_
>   sdev->dev = dev;
>  
>   ret = intel_iommu_enable_pasid(iommu, dev);
> - if (ret || !pasid) {
> - /* If they don't actually want to assign a PASID, this is
> -  * just an enabling check/preparation. */
> + if (ret) {
>   kfree(sdev);
>   goto out;
>   }
> @@ -688,18 +686,17 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, 
> int flags, struct svm_dev_
>   }
>   }
>   list_add_rcu(>list, >devs);
> -
> - success:
> - *pasid = svm->pasid;
> +success:
> + sdev->pasid = svm->pasid;
> + sdev->sva.dev = dev;
> + if (sd)
> + *sd = sdev;

One thing that might be missing: calling bind() multiple times with the
same (dev, mm) pair should take references to the svm struct, so device
drivers can call unbind() on it that many times.

>   ret = 0;
>   out:
> - mutex_unlock(_mutex);
> - if (mm)
> - mmput(mm);
>   return ret;
>  }
> -EXPORT_SYMBOL_GPL(intel_svm_bind_mm);
>  
> +/* Caller must hold pasid_mutex */
>  int intel_svm_unbind_mm(struct device *dev, int pasid)
>  {
>   struct intel_svm_dev *sdev;
> @@ -707,7 +704,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   struct intel_svm *svm;
>   int ret = -EINVAL;
>  
> - mutex_lock(_mutex);
>   iommu = intel_svm_device_to_iommu(dev);
>
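
For context on the generic interface this patch adopts, here is a minimal,
hypothetical sketch of how a device driver would consume the three SVA ops
named in the commit message above; example_sva_use() and the dev_info() call
are placeholders, not real driver code:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/mm_types.h>

static int example_sva_use(struct device *dev, struct mm_struct *mm)
{
	struct iommu_sva *handle;
	int pasid;

	/* On VT-d, this reaches the intel_svm_bind() op added by this patch. */
	handle = iommu_sva_bind_device(dev, mm, NULL);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	dev_info(dev, "bound PASID %d\n", pasid);

	/* ... program the PASID into the device and submit work ... */

	iommu_sva_unbind_device(handle);
	return 0;
}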

[PATCH v3 07/15] iommu/arm-smmu: Fix uninitialized variable warning

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Some unrelated changes in the iommu code caused a new warning to
appear in the arm-smmu driver:

  CC  drivers/iommu/arm-smmu.o
drivers/iommu/arm-smmu.c: In function 'arm_smmu_add_device':
drivers/iommu/arm-smmu.c:1441:2: warning: 'smmu' may be used uninitialized in 
this function [-Wmaybe-uninitialized]
  arm_smmu_rpm_put(smmu);
  ^~

The warning is a false positive, but initialize the variable to NULL
to get rid of it.

Tested-by: Will Deacon  # arm-smmu
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..980aae73b45b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1383,7 +1383,7 @@ struct arm_smmu_device *arm_smmu_get_by_fwnode(struct 
fwnode_handle *fwnode)
 
 static int arm_smmu_add_device(struct device *dev)
 {
-   struct arm_smmu_device *smmu;
+   struct arm_smmu_device *smmu = NULL;
struct arm_smmu_master_cfg *cfg;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
int i, ret;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 05/15] iommu: Rename struct iommu_param to dev_iommu

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

The term dev_iommu aligns better with other existing structures and
their accessor functions.

Cc: Greg Kroah-Hartman 
Tested-by: Will Deacon  # arm-smmu
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c  | 28 ++--
 include/linux/device.h |  6 +++---
 include/linux/iommu.h  |  4 ++--
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3e3528436e0b..beac2ef063dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -152,9 +152,9 @@ void iommu_device_unregister(struct iommu_device *iommu)
 }
 EXPORT_SYMBOL_GPL(iommu_device_unregister);
 
-static struct iommu_param *iommu_get_dev_param(struct device *dev)
+static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
-   struct iommu_param *param = dev->iommu_param;
+   struct dev_iommu *param = dev->iommu;
 
if (param)
return param;
@@ -164,14 +164,14 @@ static struct iommu_param *iommu_get_dev_param(struct 
device *dev)
return NULL;
 
mutex_init(>lock);
-   dev->iommu_param = param;
+   dev->iommu = param;
return param;
 }
 
-static void iommu_free_dev_param(struct device *dev)
+static void dev_iommu_free(struct device *dev)
 {
-   kfree(dev->iommu_param);
-   dev->iommu_param = NULL;
+   kfree(dev->iommu);
+   dev->iommu = NULL;
 }
 
 int iommu_probe_device(struct device *dev)
@@ -183,7 +183,7 @@ int iommu_probe_device(struct device *dev)
if (!ops)
return -EINVAL;
 
-   if (!iommu_get_dev_param(dev))
+   if (!dev_iommu_get(dev))
return -ENOMEM;
 
if (!try_module_get(ops->owner)) {
@@ -200,7 +200,7 @@ int iommu_probe_device(struct device *dev)
 err_module_put:
module_put(ops->owner);
 err_free_dev_param:
-   iommu_free_dev_param(dev);
+   dev_iommu_free(dev);
return ret;
 }
 
@@ -211,9 +211,9 @@ void iommu_release_device(struct device *dev)
if (dev->iommu_group)
ops->remove_device(dev);
 
-   if (dev->iommu_param) {
+   if (dev->iommu) {
module_put(ops->owner);
-   iommu_free_dev_param(dev);
+   dev_iommu_free(dev);
}
 }
 
@@ -972,7 +972,7 @@ int iommu_register_device_fault_handler(struct device *dev,
iommu_dev_fault_handler_t handler,
void *data)
 {
-   struct iommu_param *param = dev->iommu_param;
+   struct dev_iommu *param = dev->iommu;
int ret = 0;
 
if (!param)
@@ -1015,7 +1015,7 @@ EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
  */
 int iommu_unregister_device_fault_handler(struct device *dev)
 {
-   struct iommu_param *param = dev->iommu_param;
+   struct dev_iommu *param = dev->iommu;
int ret = 0;
 
if (!param)
@@ -1055,7 +1055,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
  */
 int iommu_report_device_fault(struct device *dev, struct iommu_fault_event 
*evt)
 {
-   struct iommu_param *param = dev->iommu_param;
+   struct dev_iommu *param = dev->iommu;
struct iommu_fault_event *evt_pending = NULL;
struct iommu_fault_param *fparam;
int ret = 0;
@@ -1104,7 +1104,7 @@ int iommu_page_response(struct device *dev,
int ret = -EINVAL;
struct iommu_fault_event *evt;
struct iommu_fault_page_request *prm;
-   struct iommu_param *param = dev->iommu_param;
+   struct dev_iommu *param = dev->iommu;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 
if (!domain || !domain->ops->page_response)
diff --git a/include/linux/device.h b/include/linux/device.h
index fa04dfd22bbc..405a8f11bec1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -44,7 +44,7 @@ struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
 struct dev_pin_info;
-struct iommu_param;
+struct dev_iommu;
 
 /**
  * struct subsys_interface - interfaces to device functions
@@ -514,7 +514,7 @@ struct dev_links_info {
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
- * @iommu_param: Per device generic IOMMU runtime data
+ * @iommu: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:   Set after successful invocation of bus type's .offline().
@@ -614,7 +614,7 @@ struct device {
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
struct iommu_fwspec *iommu_fwspec;
-   struct iommu_param  *iommu_param;
+   struct dev_iommu*iommu;
 
booloffline_disabled:1;
booloffline:1;
diff --git 

[PATCH v3 06/15] iommu: Move iommu_fwspec to struct dev_iommu

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Move the iommu_fwspec pointer in struct device into struct dev_iommu.
This is a step in the effort to reduce the iommu related pointers in
struct device to one.

Cc: Greg Kroah-Hartman 
Tested-by: Will Deacon  # arm-smmu
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c  |  3 +++
 include/linux/device.h |  3 ---
 include/linux/iommu.h  | 12 
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index beac2ef063dd..826a67ba247f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2405,6 +2405,9 @@ int iommu_fwspec_init(struct device *dev, struct 
fwnode_handle *iommu_fwnode,
if (fwspec)
return ops == fwspec->ops ? 0 : -EINVAL;
 
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
if (!fwspec)
return -ENOMEM;
diff --git a/include/linux/device.h b/include/linux/device.h
index 405a8f11bec1..fc1427ab7e85 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,7 +42,6 @@ struct device_node;
 struct fwnode_handle;
 struct iommu_ops;
 struct iommu_group;
-struct iommu_fwspec;
 struct dev_pin_info;
 struct dev_iommu;
 
@@ -513,7 +512,6 @@ struct dev_links_info {
  * gone away. This should be set by the allocator of the
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
- * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
  * @iommu: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
@@ -613,7 +611,6 @@ struct device {
 
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
-   struct iommu_fwspec *iommu_fwspec;
struct dev_iommu*iommu;
 
booloffline_disabled:1;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1c9fa5c1174b..f5edc21a644d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -368,14 +368,15 @@ struct iommu_fault_param {
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @fwspec: IOMMU fwspec data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  * struct iommu_group  *iommu_group;
- * struct iommu_fwspec *iommu_fwspec;
  */
 struct dev_iommu {
struct mutex lock;
-   struct iommu_fault_param *fault_param;
+   struct iommu_fault_param*fault_param;
+   struct iommu_fwspec *fwspec;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -614,13 +615,16 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct 
fwnode_handle *fwnode);
 
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
-   return dev->iommu_fwspec;
+   if (dev->iommu)
+   return dev->iommu->fwspec;
+   else
+   return NULL;
 }
 
 static inline void dev_iommu_fwspec_set(struct device *dev,
struct iommu_fwspec *fwspec)
 {
-   dev->iommu_fwspec = fwspec;
+   dev->iommu->fwspec = fwspec;
 }
 
 int iommu_probe_device(struct device *dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 13/15] iommu/qcom: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/qcom_iommu.c | 61 ++
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 4328da0b0a9f..824a9e85fac6 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -74,16 +74,19 @@ static struct qcom_iommu_domain 
*to_qcom_iommu_domain(struct iommu_domain *dom)
 
 static const struct iommu_ops qcom_iommu_ops;
 
-static struct qcom_iommu_dev * to_iommu(struct iommu_fwspec *fwspec)
+static struct qcom_iommu_dev * to_iommu(struct device *dev)
 {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
if (!fwspec || fwspec->ops != _iommu_ops)
return NULL;
-   return fwspec->iommu_priv;
+
+   return dev_iommu_priv_get(dev);
 }
 
-static struct qcom_iommu_ctx * to_ctx(struct iommu_fwspec *fwspec, unsigned 
asid)
+static struct qcom_iommu_ctx * to_ctx(struct device *dev, unsigned asid)
 {
-   struct qcom_iommu_dev *qcom_iommu = to_iommu(fwspec);
+   struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
if (!qcom_iommu)
return NULL;
return qcom_iommu->ctxs[asid - 1];
@@ -115,11 +118,14 @@ iommu_readq(struct qcom_iommu_ctx *ctx, unsigned reg)
 
 static void qcom_iommu_tlb_sync(void *cookie)
 {
-   struct iommu_fwspec *fwspec = cookie;
+   struct iommu_fwspec *fwspec;
+   struct device *dev = cookie;
unsigned i;
 
+   fwspec = dev_iommu_fwspec_get(dev);
+
for (i = 0; i < fwspec->num_ids; i++) {
-   struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
+   struct qcom_iommu_ctx *ctx = to_ctx(dev, fwspec->ids[i]);
unsigned int val, ret;
 
iommu_writel(ctx, ARM_SMMU_CB_TLBSYNC, 0);
@@ -133,11 +139,14 @@ static void qcom_iommu_tlb_sync(void *cookie)
 
 static void qcom_iommu_tlb_inv_context(void *cookie)
 {
-   struct iommu_fwspec *fwspec = cookie;
+   struct device *dev = cookie;
+   struct iommu_fwspec *fwspec;
unsigned i;
 
+   fwspec = dev_iommu_fwspec_get(dev);
+
for (i = 0; i < fwspec->num_ids; i++) {
-   struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
+   struct qcom_iommu_ctx *ctx = to_ctx(dev, fwspec->ids[i]);
iommu_writel(ctx, ARM_SMMU_CB_S1_TLBIASID, ctx->asid);
}
 
@@ -147,13 +156,16 @@ static void qcom_iommu_tlb_inv_context(void *cookie)
 static void qcom_iommu_tlb_inv_range_nosync(unsigned long iova, size_t size,
size_t granule, bool leaf, void 
*cookie)
 {
-   struct iommu_fwspec *fwspec = cookie;
+   struct device *dev = cookie;
+   struct iommu_fwspec *fwspec;
unsigned i, reg;
 
reg = leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
+   fwspec = dev_iommu_fwspec_get(dev);
+
for (i = 0; i < fwspec->num_ids; i++) {
-   struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
+   struct qcom_iommu_ctx *ctx = to_ctx(dev, fwspec->ids[i]);
size_t s = size;
 
iova = (iova >> 12) << 12;
@@ -222,9 +234,10 @@ static irqreturn_t qcom_iommu_fault(int irq, void *dev)
 
 static int qcom_iommu_init_domain(struct iommu_domain *domain,
  struct qcom_iommu_dev *qcom_iommu,
- struct iommu_fwspec *fwspec)
+ struct device *dev)
 {
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct io_pgtable_ops *pgtbl_ops;
struct io_pgtable_cfg pgtbl_cfg;
int i, ret = 0;
@@ -243,7 +256,7 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
};
 
qcom_domain->iommu = qcom_iommu;
-   pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, _cfg, fwspec);
+   pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, _cfg, dev);
if (!pgtbl_ops) {
dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
ret = -ENOMEM;
@@ -256,7 +269,7 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
domain->geometry.force_aperture = true;
 
for (i = 0; i < fwspec->num_ids; i++) {
-   struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
+   struct qcom_iommu_ctx *ctx = to_ctx(dev, fwspec->ids[i]);
 
if (!ctx->secure_init) {
ret = qcom_scm_restore_sec_cfg(qcom_iommu->sec_id, 
ctx->asid);
@@ -363,8 +376,7 @@ static void qcom_iommu_domain_free(struct iommu_domain 
*domain)
 
 static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device 
*dev)
 {
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   

[PATCH v3 04/15] iommu/tegra-gart: Remove direct access of dev->iommu_fwspec

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Use the accessor functions instead of directly dereferencing
dev->iommu_fwspec.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/tegra-gart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 3fb7ba72507d..db6559e8336f 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -247,7 +247,7 @@ static int gart_iommu_add_device(struct device *dev)
 {
struct iommu_group *group;
 
-   if (!dev->iommu_fwspec)
+   if (!dev_iommu_fwspec_get(dev))
return -ENODEV;
 
group = iommu_group_get_for_dev(dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 01/15] iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

There are users outside of the IOMMU code that need to call that
function. Define it for !CONFIG_IOMMU_API too so that compilation does
not break.

Reported-by: kbuild test robot 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 include/linux/iommu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d1b5f4d98569..3c4ca041d7a2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1073,6 +1073,10 @@ static inline int iommu_sva_unbind_gpasid(struct 
iommu_domain *domain,
return -ENODEV;
 }
 
+static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
+{
+   return NULL;
+}
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 00/15] iommu: Move iommu_fwspec out of 'struct device'

2020-03-20 Thread Joerg Roedel
Hi,

here is the third version of the changes to move iommu_fwspec out of
'struct device'. Previous versions of this patch-set can be found here:

v2: https://lore.kernel.org/lkml/20200310091229.29830-1-j...@8bytes.org/

v1: https://lore.kernel.org/lkml/20200228150820.15340-1-j...@8bytes.org/

Changes to v2:

- Fix the issues found by Jean-Philippe

- Fix a compile issue in the Mediatek driver

Please review.

Thanks,

Joerg

Joerg Roedel (15):
  iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
  ACPI/IORT: Remove direct access of dev->iommu_fwspec
  drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
  iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
  iommu: Rename struct iommu_param to dev_iommu
  iommu: Move iommu_fwspec to struct dev_iommu
  iommu/arm-smmu: Fix uninitialized variable warning
  iommu: Introduce accessors for iommu private data
  iommu/arm-smmu-v3: Use accessor functions for iommu private data
  iommu/arm-smmu: Use accessor functions for iommu private data
  iommu/renesas: Use accessor functions for iommu private data
  iommu/mediatek: Use accessor functions for iommu private data
  iommu/qcom: Use accessor functions for iommu private data
  iommu/virtio: Use accessor functions for iommu private data
  iommu: Move fwspec->iommu_priv to struct dev_iommu

 drivers/acpi/arm64/iort.c|  6 ++-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  2 +-
 drivers/iommu/arm-smmu-v3.c  | 10 ++--
 drivers/iommu/arm-smmu.c | 59 ---
 drivers/iommu/iommu.c| 31 ++--
 drivers/iommu/ipmmu-vmsa.c   |  7 +--
 drivers/iommu/mtk_iommu.c| 13 +++--
 drivers/iommu/mtk_iommu_v1.c | 14 +++---
 drivers/iommu/qcom_iommu.c   | 61 ++--
 drivers/iommu/tegra-gart.c   |  2 +-
 drivers/iommu/virtio-iommu.c | 11 ++---
 include/linux/device.h   |  9 ++--
 include/linux/iommu.h| 33 ++---
 13 files changed, 144 insertions(+), 114 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 15/15] iommu: Move fwspec->iommu_priv to struct dev_iommu

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Move the pointer for iommu private data from struct iommu_fwspec to
struct dev_iommu.

Tested-by: Will Deacon  # arm-smmu
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 include/linux/iommu.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 056900e75758..8c4d45fce042 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -369,6 +369,7 @@ struct iommu_fault_param {
  *
  * @fault_param: IOMMU detected device fault reporting data
  * @fwspec: IOMMU fwspec data
+ * @priv:   IOMMU Driver private data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  * struct iommu_group  *iommu_group;
@@ -377,6 +378,7 @@ struct dev_iommu {
struct mutex lock;
struct iommu_fault_param*fault_param;
struct iommu_fwspec *fwspec;
+   void*priv;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -589,7 +591,6 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
 struct iommu_fwspec {
const struct iommu_ops  *ops;
struct fwnode_handle*iommu_fwnode;
-   void*iommu_priv;
u32 flags;
u32 num_pasid_bits;
unsigned intnum_ids;
@@ -629,12 +630,12 @@ static inline void dev_iommu_fwspec_set(struct device 
*dev,
 
 static inline void *dev_iommu_priv_get(struct device *dev)
 {
-   return dev->iommu->fwspec->iommu_priv;
+   return dev->iommu->priv;
 }
 
 static inline void dev_iommu_priv_set(struct device *dev, void *priv)
 {
-   dev->iommu->fwspec->iommu_priv = priv;
+   dev->iommu->priv = priv;
 }
 
 int iommu_probe_device(struct device *dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 08/15] iommu: Introduce accessors for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Add dev_iommu_priv_get/set() functions to access per-device iommu
private data. This makes it easier to move the pointer to a different
location.

Tested-by: Will Deacon  # arm-smmu
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 include/linux/iommu.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f5edc21a644d..056900e75758 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -627,6 +627,16 @@ static inline void dev_iommu_fwspec_set(struct device *dev,
dev->iommu->fwspec = fwspec;
 }
 
+static inline void *dev_iommu_priv_get(struct device *dev)
+{
+   return dev->iommu->fwspec->iommu_priv;
+}
+
+static inline void dev_iommu_priv_set(struct device *dev, void *priv)
+{
+   dev->iommu->fwspec->iommu_priv = priv;
+}
+
 int iommu_probe_device(struct device *dev);
 void iommu_release_device(struct device *dev);
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 11/15] iommu/renesas: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/ipmmu-vmsa.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ecb3f9464dd5..310cf09feea3 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -89,9 +89,7 @@ static struct ipmmu_vmsa_domain *to_vmsa_domain(struct 
iommu_domain *dom)
 
 static struct ipmmu_vmsa_device *to_ipmmu(struct device *dev)
 {
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return fwspec ? fwspec->iommu_priv : NULL;
+   return dev_iommu_priv_get(dev);
 }
 
 #define TLB_LOOP_TIMEOUT   100 /* 100us */
@@ -727,14 +725,13 @@ static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain 
*io_domain,
 static int ipmmu_init_platform_device(struct device *dev,
  struct of_phandle_args *args)
 {
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct platform_device *ipmmu_pdev;
 
ipmmu_pdev = of_find_device_by_node(args->np);
if (!ipmmu_pdev)
return -ENODEV;
 
-   fwspec->iommu_priv = platform_get_drvdata(ipmmu_pdev);
+   dev_iommu_priv_set(dev, platform_get_drvdata(ipmmu_pdev));
 
return 0;
 }
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 14/15] iommu/virtio: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/virtio-iommu.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index cce329d71fba..8ead57f031f5 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -466,7 +466,7 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, 
struct device *dev)
struct virtio_iommu_req_probe *probe;
struct virtio_iommu_probe_property *prop;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 
if (!fwspec->num_ids)
return -EINVAL;
@@ -648,7 +648,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
int ret = 0;
struct virtio_iommu_req_attach req;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
mutex_lock(>mutex);
@@ -807,8 +807,7 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
 static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
 {
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
list_for_each_entry(entry, >resv_regions, list) {
@@ -876,7 +875,7 @@ static int viommu_add_device(struct device *dev)
vdev->dev = dev;
vdev->viommu = viommu;
INIT_LIST_HEAD(>resv_regions);
-   fwspec->iommu_priv = vdev;
+   dev_iommu_priv_set(dev, vdev);
 
if (viommu->probe_size) {
/* Get additional information for this endpoint */
@@ -920,7 +919,7 @@ static void viommu_remove_device(struct device *dev)
if (!fwspec || fwspec->ops != _ops)
return;
 
-   vdev = fwspec->iommu_priv;
+   vdev = dev_iommu_priv_get(dev);
 
iommu_group_remove_device(dev);
iommu_device_unlink(>viommu->iommu, dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 12/15] iommu/mediatek: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu.c| 13 ++---
 drivers/iommu/mtk_iommu_v1.c | 14 +++---
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 95945f467c03..5f4d6df59cf6 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -358,8 +358,8 @@ static void mtk_iommu_domain_free(struct iommu_domain 
*domain)
 static int mtk_iommu_attach_device(struct iommu_domain *domain,
   struct device *dev)
 {
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   struct mtk_iommu_data *data = dev_iommu_fwspec_get(dev)->iommu_priv;
 
if (!data)
return -ENODEV;
@@ -378,7 +378,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 static void mtk_iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
 {
-   struct mtk_iommu_data *data = dev_iommu_fwspec_get(dev)->iommu_priv;
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
 
if (!data)
return;
@@ -450,7 +450,7 @@ static int mtk_iommu_add_device(struct device *dev)
if (!fwspec || fwspec->ops != _iommu_ops)
return -ENODEV; /* Not a iommu client device */
 
-   data = fwspec->iommu_priv;
+   data = dev_iommu_priv_get(dev);
iommu_device_link(>iommu, dev);
 
group = iommu_group_get_for_dev(dev);
@@ -469,7 +469,7 @@ static void mtk_iommu_remove_device(struct device *dev)
if (!fwspec || fwspec->ops != _iommu_ops)
return;
 
-   data = fwspec->iommu_priv;
+   data = dev_iommu_priv_get(dev);
iommu_device_unlink(>iommu, dev);
 
iommu_group_remove_device(dev);
@@ -496,7 +496,6 @@ static struct iommu_group *mtk_iommu_device_group(struct 
device *dev)
 
 static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct platform_device *m4updev;
 
if (args->args_count != 1) {
@@ -505,13 +504,13 @@ static int mtk_iommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return -EINVAL;
}
 
-   if (!fwspec->iommu_priv) {
+   if (!dev_iommu_priv_get(dev)) {
/* Get the m4u device */
m4updev = of_find_device_by_node(args->np);
if (WARN_ON(!m4updev))
return -EINVAL;
 
-   fwspec->iommu_priv = platform_get_drvdata(m4updev);
+   dev_iommu_priv_set(dev, platform_get_drvdata(m4updev));
}
 
return iommu_fwspec_add_ids(dev, args->args, 1);
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index e93b94ecac45..a31be05601c9 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -263,8 +263,8 @@ static void mtk_iommu_domain_free(struct iommu_domain 
*domain)
 static int mtk_iommu_attach_device(struct iommu_domain *domain,
   struct device *dev)
 {
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   struct mtk_iommu_data *data = dev_iommu_fwspec_get(dev)->iommu_priv;
int ret;
 
if (!data)
@@ -286,7 +286,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 static void mtk_iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
 {
-   struct mtk_iommu_data *data = dev_iommu_fwspec_get(dev)->iommu_priv;
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
 
if (!data)
return;
@@ -387,20 +387,20 @@ static int mtk_iommu_create_mapping(struct device *dev,
return -EINVAL;
}
 
-   if (!fwspec->iommu_priv) {
+   if (!dev_iommu_priv_get(dev)) {
/* Get the m4u device */
m4updev = of_find_device_by_node(args->np);
if (WARN_ON(!m4updev))
return -EINVAL;
 
-   fwspec->iommu_priv = platform_get_drvdata(m4updev);
+   dev_iommu_priv_set(dev, platform_get_drvdata(m4updev));
}
 
ret = iommu_fwspec_add_ids(dev, args->args, 1);
if (ret)
return ret;
 
-   data = fwspec->iommu_priv;
+   data = dev_iommu_priv_get(dev);
m4udev = data->dev;
mtk_mapping = m4udev->archdata.iommu;
if (!mtk_mapping) {
@@ -459,7 +459,7 @@ static int mtk_iommu_add_device(struct device *dev)
if (err)
return err;
 
-   data = fwspec->iommu_priv;
+   data = dev_iommu_priv_get(dev);
mtk_mapping = data->dev->archdata.iommu;

[PATCH v3 02/15] ACPI/IORT: Remove direct access of dev->iommu_fwspec

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Use the accessor functions instead of directly dereferencing
dev->iommu_fwspec.

Tested-by: Hanjun Guo 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/acpi/arm64/iort.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index ed3d2d1a7ae9..7d04424189df 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1015,6 +1015,7 @@ const struct iommu_ops *iort_iommu_configure(struct 
device *dev)
return ops;
 
if (dev_is_pci(dev)) {
+   struct iommu_fwspec *fwspec;
struct pci_bus *bus = to_pci_dev(dev)->bus;
struct iort_pci_alias_info info = { .dev = dev };
 
@@ -1027,8 +1028,9 @@ const struct iommu_ops *iort_iommu_configure(struct 
device *dev)
err = pci_for_each_dma_alias(to_pci_dev(dev),
 iort_pci_iommu_init, );
 
-   if (!err && iort_pci_rc_supports_ats(node))
-   dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
+   fwspec = dev_iommu_fwspec_get(dev);
+   if (fwspec && iort_pci_rc_supports_ats(node))
+   fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
} else {
int i = 0;
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 09/15] iommu/arm-smmu-v3: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions in the code.

Tested-by: Hanjun Guo 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu-v3.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index aa3ac2a03807..2b68498dfb66 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2659,7 +2659,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
if (!fwspec)
return -ENOENT;
 
-   master = fwspec->iommu_priv;
+   master = dev_iommu_priv_get(dev);
smmu = master->smmu;
 
arm_smmu_detach_dev(master);
@@ -2795,7 +2795,7 @@ static int arm_smmu_add_device(struct device *dev)
if (!fwspec || fwspec->ops != _smmu_ops)
return -ENODEV;
 
-   if (WARN_ON_ONCE(fwspec->iommu_priv))
+   if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
return -EBUSY;
 
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
@@ -2810,7 +2810,7 @@ static int arm_smmu_add_device(struct device *dev)
master->smmu = smmu;
master->sids = fwspec->ids;
master->num_sids = fwspec->num_ids;
-   fwspec->iommu_priv = master;
+   dev_iommu_priv_set(dev, master);
 
/* Check the SIDs are in range of the SMMU and our stream table */
for (i = 0; i < master->num_sids; i++) {
@@ -2852,7 +2852,7 @@ static int arm_smmu_add_device(struct device *dev)
iommu_device_unlink(>iommu, dev);
 err_free_master:
kfree(master);
-   fwspec->iommu_priv = NULL;
+   dev_iommu_priv_set(dev, NULL);
return ret;
 }
 
@@ -2865,7 +2865,7 @@ static void arm_smmu_remove_device(struct device *dev)
if (!fwspec || fwspec->ops != _smmu_ops)
return;
 
-   master = fwspec->iommu_priv;
+   master = dev_iommu_priv_get(dev);
smmu = master->smmu;
arm_smmu_detach_dev(master);
iommu_group_remove_device(dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 10/15] iommu/arm-smmu: Use accessor functions for iommu private data

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions and simplify the code
where possible with this change.

Tested-by: Will Deacon  # arm-smmu
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu.c | 57 +---
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 980aae73b45b..7aa36e6c19c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -98,12 +98,15 @@ struct arm_smmu_master_cfg {
s16 smendx[];
 };
 #define INVALID_SMENDX -1
-#define __fwspec_cfg(fw) ((struct arm_smmu_master_cfg *)fw->iommu_priv)
-#define fwspec_smmu(fw)  (__fwspec_cfg(fw)->smmu)
-#define fwspec_smendx(fw, i) \
-   (i >= fw->num_ids ? INVALID_SMENDX : __fwspec_cfg(fw)->smendx[i])
-#define for_each_cfg_sme(fw, i, idx) \
-   for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
+#define __fwspec_cfg(dev) ((struct arm_smmu_master_cfg 
*)dev_iommu_priv_get(dev))
+#define fwspec_smmu(dev)  (__fwspec_cfg(dev)->smmu)
+#define fwspec_smendx(dev, i)  \
+   (i >= dev_iommu_fwspec_get(dev)->num_ids ?  \
+   INVALID_SMENDX :\
+   __fwspec_cfg(dev)->smendx[i])
+#define for_each_cfg_sme(dev, i, idx) \
+   for (i = 0; idx = fwspec_smendx(dev, i), \
+i < dev_iommu_fwspec_get(dev)->num_ids; ++i)
 
 static bool using_legacy_binding, using_generic_binding;
 
@@ -1061,7 +1064,7 @@ static bool arm_smmu_free_sme(struct arm_smmu_device 
*smmu, int idx)
 static int arm_smmu_master_alloc_smes(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct arm_smmu_master_cfg *cfg = fwspec->iommu_priv;
+   struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
struct arm_smmu_device *smmu = cfg->smmu;
struct arm_smmu_smr *smrs = smmu->smrs;
struct iommu_group *group;
@@ -1069,7 +1072,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
 
mutex_lock(>stream_map_mutex);
/* Figure out a viable stream map entry allocation */
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(dev, i, idx) {
u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
u16 mask = FIELD_GET(ARM_SMMU_SMR_MASK, fwspec->ids[i]);
 
@@ -1100,7 +1103,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
iommu_group_put(group);
 
/* It worked! Now, poke the actual hardware */
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(dev, i, idx) {
arm_smmu_write_sme(smmu, idx);
smmu->s2crs[idx].group = group;
}
@@ -1117,14 +1120,14 @@ static int arm_smmu_master_alloc_smes(struct device 
*dev)
return ret;
 }
 
-static void arm_smmu_master_free_smes(struct iommu_fwspec *fwspec)
+static void arm_smmu_master_free_smes(struct device *dev)
 {
-   struct arm_smmu_device *smmu = fwspec_smmu(fwspec);
-   struct arm_smmu_master_cfg *cfg = fwspec->iommu_priv;
+   struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
+   struct arm_smmu_device *smmu = fwspec_smmu(dev);
int i, idx;
 
mutex_lock(>stream_map_mutex);
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(dev, i, idx) {
if (arm_smmu_free_sme(smmu, idx))
arm_smmu_write_sme(smmu, idx);
cfg->smendx[i] = INVALID_SMENDX;
@@ -1133,7 +1136,7 @@ static void arm_smmu_master_free_smes(struct iommu_fwspec 
*fwspec)
 }
 
 static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
- struct iommu_fwspec *fwspec)
+ struct device *dev)
 {
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_s2cr *s2cr = smmu->s2crs;
@@ -1146,7 +1149,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
else
type = S2CR_TYPE_TRANS;
 
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(dev, i, idx) {
if (type == s2cr[idx].type && cbndx == s2cr[idx].cbndx)
continue;
 
@@ -1160,10 +1163,10 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
 
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
-   int ret;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_device *smmu;
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret;
 
if (!fwspec || fwspec->ops != _smmu_ops) {
dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
@@ -1177,10 +1180,10 @@ static int arm_smmu_attach_dev(struct iommu_domain 

[PATCH v3 03/15] drm/msm/mdp5: Remove direct access of dev->iommu_fwspec

2020-03-20 Thread Joerg Roedel
From: Joerg Roedel 

Use the accessor functions instead of directly dereferencing
dev->iommu_fwspec.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Joerg Roedel 
---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4be10a..1252e1d76340 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -725,7 +725,7 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 
if (config->platform.iommu) {
iommu_dev = >dev;
-   if (!iommu_dev->iommu_fwspec)
+   if (!dev_iommu_fwspec_get(iommu_dev))
iommu_dev = iommu_dev->parent;
 
aspace = msm_gem_address_space_create(iommu_dev,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/6] iommu: Configure default domain with def_domain_type

2020-03-20 Thread Lu Baolu

Hi Joerg,

On 2020/3/19 22:03, Joerg Roedel wrote:

Hi Baolu,

On Sat, Mar 14, 2020 at 09:07:01AM +0800, Lu Baolu wrote:

+static int iommu_group_change_def_domain(struct iommu_group *group, int type)
+{
+   struct group_device *grp_dev, *temp;
+   struct iommu_domain *new, *old;
+   const struct iommu_ops *ops;
+   int ret = 0;
+
+   if ((type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_DMA) ||
+   !group->default_domain || type == group->default_domain->type ||
+   !group->default_domain->ops)
+   return -EINVAL;
+
+   if (group->domain != group->default_domain)
+   return -EBUSY;
+
+   iommu_group_ref_get(group);
+   old = group->default_domain;
+   ops = group->default_domain->ops;
+
+   /* Allocate a new domain of requested type. */
+   new = ops->domain_alloc(type);
+   if (!new) {
+   ret = -ENOMEM;
+   goto domain_out;
+   }
+   new->type = type;
+   new->ops = ops;
+   new->pgsize_bitmap = group->default_domain->pgsize_bitmap;
+
+   group->default_domain = new;
+   group->domain = new;
+   list_for_each_entry_safe(grp_dev, temp, >devices, list) {
+   struct device *dev;
+
+   dev = grp_dev->dev;
+   if (device_is_bound(dev)) {
+   ret = -EINVAL;
+   goto device_out;
+   }
+
+   iommu_group_create_direct_mappings(group, dev);
+   ret = __iommu_attach_device(group->domain, dev);
+   if (ret)
+   goto device_out;


In case of a failure here with a group containing multiple devices, the
other devices temporarily lose their mappings. Please only do the
device_is_bound() check in the loop and the actual re-attachment of the
group with a call to __iommu_attach_group().


Sure. I've posted a v3 of this patch according to your comments here.
Please help to review.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/1] iommu: Configure default domain with def_domain_type

2020-03-20 Thread Lu Baolu
With def_domain_type introduced, the iommu default domain framework
is now able to determine a proper default domain type for a group.
Let's use this to configure a group's default domain.

In the unlikely case that a device requires a default domain type
other than the one in use, the iommu probe procedure will either
allocate a new domain of the specified type, or (if the group has
other devices sitting in it) change the group's default domain. The
vendor iommu driver which exposes the def_domain_type callback should
guarantee that no two devices in the same group require different
types of default domain.

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 103 --
 1 file changed, 100 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3e3528436e0b..8db670196706 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,6 +1361,89 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static int pre_def_domain_change(struct device *dev, void *data)
+{
+   struct iommu_group *group = data;
+
+   if (device_is_bound(dev))
+   return -EINVAL;
+
+   return iommu_group_create_direct_mappings(group, dev);
+}
+
+static int post_def_domain_change(struct device *dev, void *data)
+{
+   struct iommu_domain *domain = data;
+
+   return domain->ops->add_device(dev);
+}
+
+/**
+ * Changes the default domain of a group
+ *
+ * @group: The group for which the default domain should be changed
+ * @type: The new default domain type
+ *
+ * The caller should hold the group->mutex before call into this function.
+ * Returns 0 on success and error code on failure.
+ */
+static int iommu_group_change_def_domain(struct iommu_group *group, int type)
+{
+   struct iommu_domain *new, *old;
+   const struct iommu_ops *ops;
+   int ret;
+
+   if ((type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_DMA) ||
+   !group->default_domain || type == group->default_domain->type ||
+   !group->default_domain->ops)
+   return -EINVAL;
+
+   if (group->domain != group->default_domain)
+   return -EBUSY;
+
+   iommu_group_ref_get(group);
+   old = group->default_domain;
+   ops = group->default_domain->ops;
+
+   /* Allocate a new domain of requested type. */
+   new = ops->domain_alloc(type);
+   if (!new) {
+   iommu_group_put(group);
+   return -ENOMEM;
+   }
+   new->type = type;
+   new->ops = ops;
+   new->pgsize_bitmap = old->pgsize_bitmap;
+   group->default_domain = new;
+   group->domain = new;
+
+   ret = __iommu_group_for_each_dev(group, group, pre_def_domain_change);
+   if (ret)
+   goto err_out;
+
+   ret = __iommu_attach_group(new, group);
+   if (ret)
+   goto err_out;
+
+   ret = __iommu_group_for_each_dev(group, new, post_def_domain_change);
+   if (ret)
+   goto err_out;
+
+   iommu_domain_free(old);
+   iommu_group_put(group);
+
+   return 0;
+
+err_out:
+   group->default_domain = old;
+   group->domain = old;
+   __iommu_attach_group(old, group);
+   iommu_domain_free(new);
+   iommu_group_put(group);
+
+   return ret;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1375,6 +1458,7 @@ struct iommu_group *iommu_group_get_for_dev(struct device 
*dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
+   int dev_def_type = 0;
int ret;
 
group = iommu_group_get(dev);
@@ -1391,20 +1475,24 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
if (IS_ERR(group))
return group;
 
+   if (ops->def_domain_type)
+   dev_def_type = ops->def_domain_type(dev);
+
/*
 * Try to allocate a default domain - needs support from the
 * IOMMU driver.
 */
if (!group->default_domain) {
struct iommu_domain *dom;
+   int type = dev_def_type ? : iommu_def_domain_type;
 
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+   dom = __iommu_domain_alloc(dev->bus, type);
+   if (!dom && type != IOMMU_DOMAIN_DMA) {
dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
 "failed to allocate default IOMMU 
domain of type %u; falling back to IOMMU_DOMAIN_DMA",
-iommu_def_domain_type);
+type);

Re: [PATCH 3/8] iommu/vt-d: Remove IOVA handling code from non-dma_ops path

2020-03-20 Thread Lu Baolu

On 2020/3/20 14:30, Tom Murphy wrote:

Could we merge patches 1-3 from this series? They just clean up weird
code, and merging them will cover some of the work needed to move the
intel iommu driver to the dma-iommu api in the future.


Can you please take a look at this patch series?

https://lkml.org/lkml/2020/3/13/1162

It probably makes this series easier.

Best regards,
baolu



On Sat, 21 Dec 2019 at 07:04, Tom Murphy  wrote:

Remove all IOVA handling code from the non-dma_ops path in the intel
iommu driver.

There's no need for the non-dma_ops path to keep track of IOVAs. The
whole point of the non-dma_ops path is that it allows the IOVAs to be
handled separately. The IOVA handling code removed in this patch is
pointless.

Signed-off-by: Tom Murphy



Re: [PATCH 3/8] iommu/vt-d: Remove IOVA handling code from non-dma_ops path

2020-03-20 Thread Tom Murphy
Could we merge patches 1-3 from this series? They just clean up weird
code, and merging them will cover some of the work needed to move the
intel iommu driver to the dma-iommu api in the future.

On Sat, 21 Dec 2019 at 07:04, Tom Murphy  wrote:
>
> Remove all IOVA handling code from the non-dma_ops path in the intel
> iommu driver.
>
> There's no need for the non-dma_ops path to keep track of IOVAs. The
> whole point of the non-dma_ops path is that it allows the IOVAs to be
> handled separately. The IOVA handling code removed in this patch is
> pointless.
>
> Signed-off-by: Tom Murphy 
> ---
>  drivers/iommu/intel-iommu.c | 89 ++---
>  1 file changed, 33 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 64b1a9793daa..8d72ea0fb843 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1908,7 +1908,8 @@ static void domain_exit(struct dmar_domain *domain)
> domain_remove_dev_info(domain);
>
> /* destroy iovas */
> -   put_iova_domain(&domain->iovad);
> +   if (domain->domain.type == IOMMU_DOMAIN_DMA)
> +   put_iova_domain(&domain->iovad);
>
> if (domain->pgd) {
> struct page *freelist;
> @@ -2671,19 +2672,9 @@ static struct dmar_domain *set_domain_for_dev(struct device *dev,
>  }
>
>  static int iommu_domain_identity_map(struct dmar_domain *domain,
> -unsigned long long start,
> -unsigned long long end)
> +unsigned long first_vpfn,
> +unsigned long last_vpfn)
>  {
> -   unsigned long first_vpfn = start >> VTD_PAGE_SHIFT;
> -   unsigned long last_vpfn = end >> VTD_PAGE_SHIFT;
> -
> -   if (!reserve_iova(&domain->iovad, dma_to_mm_pfn(first_vpfn),
> - dma_to_mm_pfn(last_vpfn))) {
> -   pr_err("Reserving iova failed\n");
> -   return -ENOMEM;
> -   }
> -
> -   pr_debug("Mapping reserved region %llx-%llx\n", start, end);
> /*
>  * RMRR range might have overlap with physical memory range,
>  * clear it first
> @@ -2760,7 +2751,8 @@ static int __init si_domain_init(int hw)
>
> for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> ret = iommu_domain_identity_map(si_domain,
> -   PFN_PHYS(start_pfn), PFN_PHYS(end_pfn));
> +   mm_to_dma_pfn(start_pfn),
> +   mm_to_dma_pfn(end_pfn));
> if (ret)
> return ret;
> }
> @@ -4593,58 +4585,37 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
>unsigned long val, void *v)
>  {
> struct memory_notify *mhp = v;
> -   unsigned long long start, end;
> -   unsigned long start_vpfn, last_vpfn;
> +   unsigned long start_vpfn = mm_to_dma_pfn(mhp->start_pfn);
> +   unsigned long last_vpfn = mm_to_dma_pfn(mhp->start_pfn +
> +   mhp->nr_pages - 1);
>
> switch (val) {
> case MEM_GOING_ONLINE:
> -   start = mhp->start_pfn << PAGE_SHIFT;
> -   end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
> -   if (iommu_domain_identity_map(si_domain, start, end)) {
> -   pr_warn("Failed to build identity map for 
> [%llx-%llx]\n",
> -   start, end);
> +   if (iommu_domain_identity_map(si_domain, start_vpfn,
> +   last_vpfn)) {
> +   pr_warn("Failed to build identity map for 
> [%lx-%lx]\n",
> +   start_vpfn, last_vpfn);
> return NOTIFY_BAD;
> }
> break;
>
> case MEM_OFFLINE:
> case MEM_CANCEL_ONLINE:
> -   start_vpfn = mm_to_dma_pfn(mhp->start_pfn);
> -   last_vpfn = mm_to_dma_pfn(mhp->start_pfn + mhp->nr_pages - 1);
> -   while (start_vpfn <= last_vpfn) {
> -   struct iova *iova;
> +   {
> struct dmar_drhd_unit *drhd;
> struct intel_iommu *iommu;
> struct page *freelist;
>
> -   iova = find_iova(&si_domain->iovad, start_vpfn);
> -   if (iova == NULL) {
> -   pr_debug("Failed get IOVA for PFN %lx\n",
> -start_vpfn);
> -   break;
> -   }
> -
> -   iova = split_and_remove_iova(&si_domain->iovad, iova,
> -start_vpfn, last_vpfn);
> -   if 
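
For context, the "non-dma_ops path" referred to in the commit message
above is the case where a caller of the IOMMU API (VFIO being the
typical example) supplies its own IOVAs and maps them directly, so a
driver-side iova allocator has nothing to track. A rough sketch,
assuming the iommu_map() signature of this kernel generation
(illustrative only, not taken from the patch):

/*
 * Sketch: the IOVA is chosen by the caller, not by an iova_domain
 * allocator inside the IOMMU driver.
 */
static int map_at_caller_chosen_iova(struct iommu_domain *domain,
                                     unsigned long iova, phys_addr_t paddr,
                                     size_t size)
{
        return iommu_map(domain, iova, paddr, size,
                         IOMMU_READ | IOMMU_WRITE);
}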

Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2020-03-20 Thread Tom Murphy
Any news on this? Is there anyone who wants to try and fix this possible bug?

On Mon, 23 Dec 2019 at 03:41, Jani Nikula  wrote:
>
> On Mon, 23 Dec 2019, Robin Murphy  wrote:
> > On 2019-12-23 10:37 am, Jani Nikula wrote:
> >> On Sat, 21 Dec 2019, Tom Murphy  wrote:
> >>> This patchset converts the intel iommu driver to the dma-iommu api.
> >>>
> >>> While converting the driver I exposed a bug in the intel i915 driver
> >>> which causes a huge amount of artifacts on the screen of my
> >>> laptop. You can see a picture of it here:
> >>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
> >>>
> >>> This issue is most likely in the i915 driver and is most likely caused
> >>> by the driver not respecting the return value of the
> >>> dma_map_ops::map_sg function. You can see the driver ignoring the
> >>> return value here:
> >>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
> >>>
> >>> Previously this didn’t cause issues because the intel map_sg always
> >>> returned the same number of elements as the input scatter gather list
> >>> but with the change to this dma-iommu api this is no longer the
> >>> case. I wasn’t able to track the bug down to a specific line of code
> >>> unfortunately.
> >>>
> >>> Could someone from the intel team look at this?
> >>
> >> Let me get this straight. There is current API that on success always
> >> returns the same number of elements as the input scatter gather
> >> list. You propose to change the API so that this is no longer the case?
> >
> > No, the API for dma_map_sg() has always been that it may return fewer
> > DMA segments than nents - see Documentation/DMA-API.txt (and otherwise,
> > the return value would surely be a simple success/fail condition).
> > Relying on a particular implementation behaviour has never been strictly
> > correct, even if it does happen to be a very common behaviour.
> >
> >> A quick check of various dma_map_sg() calls in the kernel seems to
> >> indicate checking for 0 for errors and then ignoring the non-zero return
> >> is a common pattern. Are you sure it's okay to make the change you're
> >> proposing?
> >
> > Various code uses tricks like just iterating the mapped list until the
> > first segment with zero sg_dma_len(). Others may well simply have bugs.
>
> Thanks for the clarification.
>
> BR,
> Jani.
>
> >
> > Robin.
> >
> >> Anyway, due to the time of year and all, I'd like to ask you to file a
> >> bug against i915 at [1] so this is not forgotten, and please let's not
> >> merge the changes before this is resolved.
> >>
> >>
> >> Thanks,
> >> Jani.
> >>
> >>
> >> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
> >>
> >>
>
> --
> Jani Nikula, Intel Open Source Graphics Center
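
For reference, a minimal sketch of a caller that honours the
dma_map_sg() contract discussed above, iterating over the returned
segment count rather than the input count (illustrative only, not taken
from the i915 code; do_something() is a placeholder):

/* Map @sgt for @dev and walk only the segments the IOMMU produced. */
static int map_and_walk(struct device *dev, struct sg_table *sgt)
{
        struct scatterlist *sg;
        int nents, i;

        nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
        if (nents == 0)
                return -ENOMEM; /* zero means the mapping failed */

        /*
         * nents may be smaller than sgt->orig_nents when the IOMMU
         * merged segments, so iterate over the returned count.
         */
        for_each_sg(sgt->sgl, sg, nents, i)
                do_something(sg_dma_address(sg), sg_dma_len(sg));

        dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
        return 0;
}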