Re: [PATCH 2/2] pci: Add parameter to disable attaching untrusted devices

2020-06-25 Thread Greg Kroah-Hartman
On Thu, Jun 25, 2020 at 05:27:10PM -0700, Rajat Jain wrote:
> Introduce a PCI parameter that disables the automatic attachment of
> untrusted devices to their drivers.

You didn't document this new api anywhere :(
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init

2020-06-25 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Thursday, June 25, 2020 11:16 PM
> To: Song Bao Hua (Barry Song) ; h...@lst.de;
> m.szyprow...@samsung.com; w...@kernel.org;
> ganapatrao.kulka...@cavium.com; catalin.mari...@arm.com
> Cc: iommu@lists.linux-foundation.org; Linuxarm ;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Nicolas
> Saenz Julienne ; Steve Capper
> ; Andrew Morton ;
> Mike Rapoport 
> Subject: Re: [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after
> numa_init
> 
> On 2020-06-25 08:43, Barry Song wrote:
> > Right now, smmu is using dma_alloc_coherent() to get memory to save
> queues
> > and tables. Typically, on ARM64 server, there is a default CMA located at
> > node0, which could be far away from node2, node3 etc.
> > with this patch, smmu will get memory from local numa node to save
> command
> > queues and page tables. that means dma_unmap latency will be shrunk
> much.
> > Meanwhile, when iommu.passthrough is on, device drivers which call dma_
> > alloc_coherent() will also get local memory and avoid the travel between
> > numa nodes.
> >
> > Cc: Christoph Hellwig 
> > Cc: Marek Szyprowski 
> > Cc: Will Deacon 
> > Cc: Robin Murphy 
> > Cc: Ganapatrao Kulkarni 
> > Cc: Catalin Marinas 
> > Cc: Nicolas Saenz Julienne 
> > Cc: Steve Capper 
> > Cc: Andrew Morton 
> > Cc: Mike Rapoport 
> > Signed-off-by: Barry Song 
> > ---
> >   arch/arm64/mm/init.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 1e93cfc7c47a..07d4d1fe7983 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -420,6 +420,8 @@ void __init bootmem_init(void)
> >
> > arm64_numa_init();
> >
> > +   dma_pernuma_cma_reserve();
> > +
> 
> It might be worth putting this after the hugetlb_cma_reserve() call for
> clarity, since the comment below applies equally to this call too.

Yep, it looks even better though dma_pernuma_cma_reserve() is self-documenting 
by name.

> 
> Robin.
> 
> > /*
> >  * must be done after arm64_numa_init() which calls numa_init() to
> >  * initialize node_online_map that gets used in hugetlb_cma_reserve()
> >
Thanks
Barry

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 12/12] x86/traps: Fix up invalid PASID

2020-06-25 Thread Lu Baolu

Hi Fenghua,

On 2020/6/26 4:17, Fenghua Yu wrote:

A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
an ENQCMD this will fix it. If not, the #GP fault will be repeated
and will hit case "2".


For changes in Intel VT-d driver,

Reviewed-by: Lu Baolu 

Best regards,
baolu



Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Check and set current->has_valid_pasid in fixup() (PeterZ)
- Add fpu__pasid_write() to update the MSR (PeterZ)
- Add ioasid_find() sanity check in fixup()

v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
   #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
   user (so mm is not NULL) (Dave Hansen)

  arch/x86/include/asm/iommu.h |  1 +
  arch/x86/kernel/traps.c  | 14 +++
  drivers/iommu/intel/svm.c| 78 
  3 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
  }
  
  void __free_pasid(struct mm_struct *mm);

+bool __fixup_pasid_exception(void);
  
  #endif /* _ASM_X86_IOMMU_H */

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f9727b96961f..2352f3f1f7ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #ifdef CONFIG_X86_64

  #include 
@@ -514,6 +515,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
  }
  
+static bool fixup_pasid_exception(void)

+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return false;
+   if (!static_cpu_has(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
  #define GPFSTR "general protection fault"
  
  DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)

@@ -526,6 +537,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
  
  	cond_local_irq_enable(regs);
  
+	if (user_mode(regs) && fixup_pasid_exception())

+   goto exit;
+
if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
goto exit;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4c70b037..4a84c82a4f8c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1105,3 +1105,81 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
  }
+
+/*
+ * Write the current task's PASID MSR/state. This is called only when PASID
+ * is enabled.
+ */
+static void fpu__pasid_write(u32 pasid)
+{
+   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
+
+   fpregs_lock();
+
+   /*
+* If the MSR is active and owned by the current task's FPU, it can
+* be directly written.
+*
+* Otherwise, write the fpstate.
+*/
+   if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+   wrmsrl(MSR_IA32_PASID, msr_val);
+   } else {
+   struct ia32_pasid_state *ppasid_state;
+
+   ppasid_state = get_xsave_addr(¤t->thread.fpu.state.xsave,
+ XFEATURE_PASID);
+   /*
+* ppasid_state shouldn't be NULL because XFEATURE_PASID
+* is enabled.
+*/
+ 

Re: [PATCH v4 10/12] x86/mmu: Allocate/free PASID

2020-06-25 Thread Lu Baolu

Hi Fenghua,

On 2020/6/26 4:17, Fenghua Yu wrote:

A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).


For changes in Intel VT-d driver,

Reviewed-by: Lu Baolu 

Best regards,
baolu



Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Add sanity checks in alloc_pasid() and _free_pasid() (Baolu)
- Add a comment that the private PASID feature will be removed completely
   from IOMMU and don't track private PASID in mm (Thomas)

v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
   (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
   pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

  arch/x86/include/asm/iommu.h   |   2 +
  arch/x86/include/asm/mmu_context.h |  14 
  drivers/iommu/intel/svm.c  | 128 ++---
  3 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
  }
  
+void __free_pasid(struct mm_struct *mm);

+
  #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..f8c91ce8c451 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  
  extern atomic64_t last_mm_ctx_id;
  
@@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,

init_new_context_ldt(mm);
return 0;
  }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return;
+
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   __free_pasid(mm);
+}
+
  static inline void destroy_context(struct mm_struct *mm)
  {
destroy_context_ldt(mm);
+   free_pasid(mm);
  }
  
  extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8a0cf2f0dd54..4c70b037 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,69 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
  }
  
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,

+ bool new_pasid)
+{
+   if (new_pasid)
+   ioasid_free(svm->pasid);
+   kfree(svm);
+   kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+  u32 pasid_max, bool *new_pasid,
+  unsigned int flags)
+{
+   u32 pasid;
+
+   *new_pasid = false;
+
+   /*
+* Reuse the PASID if the mm already has a PASID and not a private
+* PASID is requested.
+*/
+   if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+   void *p;
+
+   /*
+* Since the mm has a PASID already, the PASID should be
+* bound and unbound to the mm before calling this allocation.
+* So the PASID must be allocated by bind_mm() previously and
+* should still exist in ioasid; but its data must be cleared
+* already by unbind_mm().
+*
+* Do a sanity check here to ensure the PASID has the right
+* status before reusing it.
+*/
+   p = ioasid_find(NULL, mm->pasid, NULL);
+   if (IS_ERR(p) || p)
+   return INVALID_IOASID;
+
+   /*
+* Once the PASID is allocated for this mm, it
+* stays with the mm until the mm is dropped. Reuse
+* the PASID which has been already allocated for the
+* mm instead of allocating a new one.
+*/
+   ioasid_set_data(mm->pasid, svm);
+
+   return mm->pasid;
+   }
+
+   /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+   pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+   if (pasid != INVALID_IOASID) {
+   /* A new pasid is allocated. */
+   *new_pasid = true;
+   }
+
+   return pasid;
+}
+
  /* Ca

Re: [PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-25 Thread Jerry Snitselaar

On Thu Jun 25 20, Joerg Roedel wrote:

From: Joerg Roedel 

Hi,

here is a patch-set to remove the usage of dev->archdata.iommu from
the IOMMU code in the kernel and replace its uses by the iommu per-device
private data field. The changes also remove the field entirely from
the architectures which no longer need it.

On PowerPC the field is called dev->archdata.iommu_domain and was only
used by the PAMU IOMMU driver. It gets removed as well.

The patches have been runtime tested on Intel VT-d and compile tested
with allyesconfig for:

* x86 (32 and 64 bit)
* arm and arm64
* ia64 (only drivers/ because build failed for me in
arch/ia64)
* PPC64

Besides that the changes also survived my IOMMU tree compile tests.

Please review.

Regards,

Joerg

Joerg Roedel (13):
 iommu/exynos: Use dev_iommu_priv_get/set()
 iommu/vt-d: Use dev_iommu_priv_get/set()
 iommu/msm: Use dev_iommu_priv_get/set()
 iommu/omap: Use dev_iommu_priv_get/set()
 iommu/rockchip: Use dev_iommu_priv_get/set()
 iommu/tegra: Use dev_iommu_priv_get/set()
 iommu/pamu: Use dev_iommu_priv_get/set()
 iommu/mediatek: Do no use dev->archdata.iommu
 x86: Remove dev->archdata.iommu pointer
 ia64: Remove dev->archdata.iommu pointer
 arm: Remove dev->archdata.iommu pointer
 arm64: Remove dev->archdata.iommu pointer
 powerpc/dma: Remove dev->archdata.iommu_domain

arch/arm/include/asm/device.h |  3 ---
arch/arm64/include/asm/device.h   |  3 ---
arch/ia64/include/asm/device.h|  3 ---
arch/powerpc/include/asm/device.h |  3 ---
arch/x86/include/asm/device.h |  3 ---
.../gpu/drm/i915/selftests/mock_gem_device.c  | 10 --
drivers/iommu/exynos-iommu.c  | 20 +--
drivers/iommu/fsl_pamu_domain.c   |  8 
drivers/iommu/intel/iommu.c   | 18 -
drivers/iommu/msm_iommu.c |  4 ++--
drivers/iommu/mtk_iommu.h |  2 ++
drivers/iommu/mtk_iommu_v1.c  | 10 --
drivers/iommu/omap-iommu.c| 20 +--
drivers/iommu/rockchip-iommu.c|  8 
drivers/iommu/tegra-gart.c|  8 
drivers/iommu/tegra-smmu.c|  8 
.../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
17 files changed, 64 insertions(+), 71 deletions(-)

--
2.27.0



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-06-25 Thread Lu Baolu

Hi Fenghua,

On 2020/6/26 4:17, Fenghua Yu wrote:

"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".


Reviewed-by: Lu Baolu 

Best regards,
baolu



Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this new patch per Thomas' comment.

  drivers/iommu/intel/svm.c   | 7 ---
  include/linux/intel-iommu.h | 2 +-
  2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 778089d198eb..8a0cf2f0dd54 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
  
  /* Caller must hold pasid_mutex, mm reference */

  static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
  {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
  {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
  
  	/*

@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 314a95870e47..4d20f93a5e2d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -759,7 +759,7 @@ struct intel_svm {
struct mm_struct *mm;
  
  	struct intel_iommu *iommu;

-   int flags;
+   unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 01/12] iommu: Change type of pasid to u32

2020-06-25 Thread Lu Baolu

Hi Fenghua,

On 2020/6/26 4:17, Fenghua Yu wrote:

PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".


For changes in Intel VT-d driver,

Reviewed-by: Lu Baolu 

Best regards,
baolu



No PASID type change in uapi although it defines PASID as __u64 in
some places.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type from "unsigned int" to "u32" (Christoph)

v2:
- Create this new patch to define PASID as "unsigned int" consistently in
   iommu (Thomas)

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
  .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
  drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
  drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 18 +--
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
  .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
  drivers/iommu/amd/amd_iommu.h | 10 +++---
  drivers/iommu/amd/iommu.c | 31 ++-
  drivers/iommu/amd/iommu_v2.c  | 20 ++--
  drivers/iommu/intel/dmar.c|  7 +++--
  drivers/iommu/intel/intel-pasid.h | 24 +++---
  drivers/iommu/intel/iommu.c   |  4 +--
  drivers/iommu/intel/pasid.c   | 31 +--
  drivers/iommu/intel/svm.c | 12 +++
  drivers/iommu/iommu.c |  2 +-
  drivers/misc/uacce/uacce.c|  2 +-
  include/linux/amd-iommu.h |  8 ++---
  include/linux/intel-iommu.h   | 12 +++
  include/linux/intel-svm.h |  2 +-
  include/linux/iommu.h | 10 +++---
  include/linux/uacce.h |  2 +-
  38 files changed, 139 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ffe149aafc39..dfef5a7e0f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
})
  
  /* GPUVM API */

-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-   struct file *filp, unsigned int pasid,
+   struct file *filp, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
  void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index bf927f432506..ee531c3988d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
*kgd, uint32_t vmid,
unlock_srbm(kgd);
  }
  
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,

+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
unsigned int vmid)
  {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 744366c7ee85..4d41317b9292 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdg

[PATCH 2/2] pci: Add parameter to disable attaching untrusted devices

2020-06-25 Thread Rajat Jain via iommu
Introduce a PCI parameter that disables the automatic attachment of
untrusted devices to their drivers.

Signed-off-by: Rajat Jain 
---
Context:

  I set out to implement the approach outlined in
https://lkml.org/lkml/2020/6/9/1331
https://lkml.org/lkml/2020/6/15/1453

  But to my surprise, I found that the new hotplugged PCI devices
  were getting automatically attached to drivers even though
  /sys/bus/pci/drivers_autoprobe was set to 0.

  I realized that the device core's "drivers_autoprobe":

  * only disables the *initial* probe of the device (i.e. from
device_add()). If a subsystem calls device_attach() explicitly
for its devices like PCI subsystem does, the drivers_autoprobe
setting does not matter. The core will attach device to the driver.
This looks like correct semantic behavior to me because PCI is
explicitly calling device_attach(), which is a way to explicitly
ask the core to find and attach a driver for a device.

  * "drivers_autoprobe" cannot be controlled at boot time (to restrict
any drivers before userspace comes up).

  The options I considered were:

  1) Change device_attach() so that it takes into consideration the
 drivers_autoprobe property. Not sure if this is semantically correct
 thing to do though. If I do this, then the only way a driver can
 be attached to the drivers would be via userspace
 (/sys/bus/pci/drivers/bind) (Good for our use case though!).

  2) Make the drivers_autoprobe property available to PCI to use
 (currently it is private to device core). The PCI could use this
 to determine whether or not to call device_attach(). This still
 leaves the other problem (of not being able to set
 drivers_autoprobe via command line open).

  3) I found the pci_dev->match_driver, which seemed similar to what I
 am trying to do, but can't be controlled from userspace. I considered
 populating that field based on drivers_autoprobe (still need (2)).
 But the problem is that there is the AMD IOMMU driver which is setting
 this independently, so setting the match_driver based on
 drivers_autoprobe may not be a good idea. May be we can populate it
 for untrusted devicesi, based on the parameter that I'm introducing?

  4) This patch was my option 4 that helps fix both the problems for me.

 drivers/pci/bus.c | 11 ---
 drivers/pci/pci.c |  9 +
 drivers/pci/pci.h |  1 +
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 3cef835b375fd..336aeeb4c4ebf 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -321,9 +321,14 @@ void pci_bus_add_device(struct pci_dev *dev)
pci_bridge_d3_update(dev);
 
dev->match_driver = true;
-   retval = device_attach(&dev->dev);
-   if (retval < 0 && retval != -EPROBE_DEFER)
-   pci_warn(dev, "device attach failed (%d)\n", retval);
+
+   if (dev->untrusted && pci_dont_attach_untrusted_devs) {
+   pci_info(dev, "not attaching untrusted device\n");
+   } else {
+   retval = device_attach(&dev->dev);
+   if (retval < 0 && retval != -EPROBE_DEFER)
+   pci_warn(dev, "device attach failed (%d)\n", retval);
+   }
 
pci_dev_assign_added(dev, true);
 }
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce096272f52b1..dec1f9ef27d71 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -127,6 +127,13 @@ static bool pcie_ats_disabled;
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
+/*
+ * If set, the devices with "untrusted" flag shall not be attached 
automatically
+ * Userspace will need to attach them manually:
+ * echo   > /sys/bus/pci/drivers//bind
+ */
+bool pci_dont_attach_untrusted_devs;
+
 bool pci_ats_disabled(void)
 {
return pcie_ats_disabled;
@@ -6522,6 +6529,8 @@ static int __init pci_setup(char *str)
pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
+   } else if (!strcmp(str, "dont_attach_untrusted_devs")) {
+   pci_dont_attach_untrusted_devs = true;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f758671064..30ffad047d926 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -13,6 +13,7 @@
 
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
+extern bool pci_dont_attach_untrusted_devs;
 
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
 bool pcie_cap_has_rtctl(const struct pci_dev *dev);
-- 
2.27.0.212.ge8ba1cc988-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
ht

[PATCH 1/2] pci: Add pci device even if the driver failed to attach

2020-06-25 Thread Rajat Jain via iommu
device_attach() returning failure indicates a driver error
while trying to probe the device. In such a scenario, the PCI
device should still be added in the system and be visible to
the user.

This patch partially reverts:
commit ab1a187bba5c ("PCI: Check device_attach() return value always")

Signed-off-by: Rajat Jain 
---
 drivers/pci/bus.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77d..3cef835b375fd 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 
dev->match_driver = true;
retval = device_attach(&dev->dev);
-   if (retval < 0 && retval != -EPROBE_DEFER) {
+   if (retval < 0 && retval != -EPROBE_DEFER)
pci_warn(dev, "device attach failed (%d)\n", retval);
-   pci_proc_detach_device(dev);
-   pci_remove_sysfs_dev_files(dev);
-   return;
-   }
 
pci_dev_assign_added(dev, true);
 }
-- 
2.27.0.212.ge8ba1cc988-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-25 Thread Krishna Reddy
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That 
>way we avoid this somewhat odd check here.

NVIDIA haven't made any changes to arm,mmu-500. It is only used in different 
topology.  New model would be mis-leading here.
As suggested by Robin, It can just be moved to end of function.

>> diff --git a/drivers/iommu/arm-smmu-nvidia.c 
>> b/drivers/iommu/arm-smmu-nvidia.c
>I wonder if it would be better to name this arm-smmu-tegra.c to make it 
>clearer that this is for a Tegra chip. We do have regular expressions in 
>MAINTAINERS that catch anything with "tegra" in it to make this easier.
>Also, the nsmmu_ prefix looks somewhat odd here. You already use struct 
>nvidia_smmu as the name of the structure, so why not be consistent and 
>continue to use nvidia_smmu_ as the prefix for function names?
>Or perhaps even use tegra_smmu_ as the prefix to match the filename change I 
>suggested earlier.

Prefix can be updated to nvidia_smmu as we seem to be okay for now to keep file 
name as arm-smmu-nvidia.c after the vendor name.  

>> +#define TLB_LOOP_TIMEOUT100 /* 1s! */
>USEC_PER_SEC?

It is not meant for a conversion. Reused Timeout variable from arm-smmu.c for 
tlb_sync implementation.  Can rename it to TLB_LOOP_TIMEOUT_IN_US.


>> +}
>> +dev_err_ratelimited(smmu->dev,
>> +"TLB sync timed out -- SMMU may be deadlocked\n");
>Same here.
>Also, is there anything we can do when this happens?

This is never expected to happen on Silicon. This code and message is reused 
from arm-smmu.c.


>> +#define nsmmu_page(smmu, inst, page) \
>> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
>> +((page) << smmu->pgshift))

>Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in 
>nvidia_smmu_impl_init()? Then this would become just:
>   to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift)
> +
>Maybe add this here to simplify the nsmmu_page() macro above:
>   nsmmu->bases[0] = smmu->base;

This preferred to avoid the check in nsmmu_page(). But, smmu->base is not yet 
populated when nvidia_smmu_impl_init() is called.  
Let me look at the alternative place to set it.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-25 Thread Krishna Reddy
>> +  - nvdia,tegra194-smmu-500
>The -500 suffix here seems a bit redundant since there's no other type of SMMU 
>in Tegra194, correct?

Yeah, there is only one type of SMMU supported in T194. It was added to be 
synonymous with mmu-500.  Can be removed.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 11/12] sched: Define and initialize a flag to identify valid PASID in the task

2020-06-25 Thread Fenghua Yu
From: Peter Zijlstra 

The flag is defined for the task to identify if the task has a valid
PASID. Its initial value is 0 when the task is forked/cloned. It will
be used shortly.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
---
v2:
- Add this patch to define the flag to identify valid PASID MSR (PeterZ)

 include/linux/sched.h | 3 +++
 kernel/fork.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..1b5a6e5c74ca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -801,6 +801,9 @@ struct task_struct {
/* Stalled due to lack of memory */
unsignedin_memstall:1;
 #endif
+#ifdef CONFIG_IOMMU_SUPPORT
+   unsignedhas_valid_pasid:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 43b5f112604d..0a962bebdf88 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+   tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 09/12] x86/process: Clear PASID state for a newly forked/cloned thread

2020-06-25 Thread Fenghua Yu
The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead to setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify init_task_pasid().

 arch/x86/kernel/process.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned 
long tls)
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+   struct ia32_pasid_state *ppasid;
+
+   /*
+* Initialize the PASID state so that the PASID MSR will be
+* initialized to its initial state (0) by XRSTORS when the task is
+* scheduled for the first time.
+*/
+   ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+   if (ppasid)
+   ppasid->pasid = INIT_PASID;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long sp,
task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
+   if (static_cpu_has(X86_FEATURE_ENQCMD))
+   init_task_pasid(p);
+
/* Set a new TLS for the child thread? */
if (clone_flags & CLONE_SETTLS)
ret = set_new_tls(p, tls);
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 01/12] iommu: Change type of pasid to u32

2020-06-25 Thread Fenghua Yu
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".

No PASID type change in uapi although it defines PASID as __u64 in
some places.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type from "unsigned int" to "u32" (Christoph)

v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 18 +--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
 drivers/iommu/amd/amd_iommu.h | 10 +++---
 drivers/iommu/amd/iommu.c | 31 ++-
 drivers/iommu/amd/iommu_v2.c  | 20 ++--
 drivers/iommu/intel/dmar.c|  7 +++--
 drivers/iommu/intel/intel-pasid.h | 24 +++---
 drivers/iommu/intel/iommu.c   |  4 +--
 drivers/iommu/intel/pasid.c   | 31 +--
 drivers/iommu/intel/svm.c | 12 +++
 drivers/iommu/iommu.c |  2 +-
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/amd-iommu.h |  8 ++---
 include/linux/intel-iommu.h   | 12 +++
 include/linux/intel-svm.h |  2 +-
 include/linux/iommu.h | 10 +++---
 include/linux/uacce.h |  2 +-
 38 files changed, 139 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ffe149aafc39..dfef5a7e0f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
})
 
 /* GPUVM API */
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-   struct file *filp, unsigned int pasid,
+   struct file *filp, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index bf927f432506..ee531c3988d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
*kgd, uint32_t vmid,
unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
unsigned int vmid)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 744366c7ee85..4d41317b9292 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -139,7 +139,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
*kgd, uint32_t vmid,
unlo

[PATCH v4 07/12] mm: Define pasid in mm

2020-06-25 Thread Fenghua Yu
PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM (Jean)

v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christopher's comment: 
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually same patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-phili...@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..d61285cfe027 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+   u32 pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-06-25 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 287 
 2 files changed, 288 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..7242a84169ef
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Shared Virtual Addressing (SVA) with ENQCMD
+===
+
+Background
+==
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM)
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+==
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific api
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  future access by the device to this virtual address to participate in
+  ATS. If

[PATCH v4 06/12] x86/msr-index: Define IA32_PASID MSR

2020-06-25 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 12/12] x86/traps: Fix up invalid PASID

2020-06-25 Thread Fenghua Yu
A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Check and set current->has_valid_pasid in fixup() (PeterZ)
- Add fpu__pasid_write() to update the MSR (PeterZ)
- Add ioasid_find() sanity check in fixup()

v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
  #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c  | 14 +++
 drivers/iommu/intel/svm.c| 78 
 3 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f9727b96961f..2352f3f1f7ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -514,6 +515,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return false;
+   if (!static_cpu_has(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
@@ -526,6 +537,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
cond_local_irq_enable(regs);
 
+   if (user_mode(regs) && fixup_pasid_exception())
+   goto exit;
+
if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
goto exit;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4c70b037..4a84c82a4f8c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1105,3 +1105,81 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
 }
+
+/*
+ * Write the current task's PASID MSR/state. This is called only when PASID
+ * is enabled.
+ */
+static void fpu__pasid_write(u32 pasid)
+{
+   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
+
+   fpregs_lock();
+
+   /*
+* If the MSR is active and owned by the current task's FPU, it can
+* be directly written.
+*
+* Otherwise, write the fpstate.
+*/
+   if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+   wrmsrl(MSR_IA32_PASID, msr_val);
+   } else {
+   struct ia32_pasid_state *ppasid_state;
+
+   ppasid_state = get_xsave_addr(¤t->thread.fpu.state.xsave,
+ XFEATURE_PASID);
+   /*
+* ppasid_state shouldn't be NULL because XFEATURE_PASID
+* is enabled.
+*/
+   WARN_ON_ONCE(!ppasid_state);
+   ppasid_state->pasid = msr_val;
+   }
+   fpregs_unlock();
+}
+
+/*
+ * Apply some heuristics to 

[PATCH v4 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-06-25 Thread Fenghua Yu
From: Yu-cheng Yu 

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 10 ++
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c  |  4 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
+   XFEATURE_PASID,
 
XFEATURE_MAX,
 };
@@ -128,6 +129,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT   (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID(1 << XFEATURE_PASID)
 
 #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512   (XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
u32 pad;
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+   u64 pasid;
+} __packed;
+
 struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h 
b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
  XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)"  ,
"Protection Keys User registers",
+   "PASID state",
"unknown xstate feature",
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+   X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+   print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU,  struct pkru_state);
+   XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
 
/*
 * Make *SURE* to add any feature numbers in below if
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 00/12] x86: tag application address space for devices

2020-06-25 Thread Fenghua Yu
Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support was released on the top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patch 1-2). These patches can be in a
   separate series.
   - Define "pasid" as "u32" consistently
   - Define "flags" as "unsigned int"
2. Explain different various technical terms used in the series (patch 3).
3. Enumerate support for ENQCMD in the processor (patch 4).
4. Handle FPU PASID state and the MSR during context switch (patches 5-6).
5. Define "pasid" in mm_struct (patch 7).
5. Clear PASID state for new mm and forked and cloned thread (patch 8-9).
6. Allocate and free PASID for a process (patch 10).
7. Fix up the PASID MSR in #GP handler when one thread in a process
   executes ENQCMD for the first time (patches 11-12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Chang log:
v4:
- Define PASID as "u32" instead of "unsigned int" in patch 1, 7, 10, 12.
  (Christoph)
- Drop v3 patch 2 which changes PASID type in ocxl because it's not related
  to x86 and was rejected by ocxl maintainer Frederic Barrat
- A split patch which changes PASID type to u32 in crypto/hisilicon/qm.c
  was released separately to linux-crypto mailing list because it's not
  related to x86 and is a standalone patch:

v3:
- Change names of bind_mm() and unbind_mm() to match to new APIs in
  patch 4 (Baolu)
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM in patch 8 (Jean)
- Add a few sanity checks in __free_pasid() and alloc_pasid() in
  patch 11 (Baolu)
- Add patch 12 to define a new flag "has_valid_pasid" for a task and
  use the flag to identify if the task has a valid PASID MSR (PeterZ)
- Add fpu__pasid_write() to update the MSR in fixup() in patch 13
- Check if mm->pasid can be found in fixup() in patch 13

v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually same patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (9):
  iommu: Change type of pasid to u32
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  fork: Clear PASID for new mm
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/mmu: Allocate/free PASID
  x86/traps: Fix up invalid PASID

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst   |   1 +
 Documentation/x86/sva.rst | 287 ++
 arch/x86/include/asm/cpufeatures.h|   1 +
 arch/x86/include/asm/fpu/types.h  |  10 +
 arch/x86/include/asm/fpu/xstate.h |   2 +-
 arch/x86/include/asm/iommu.h  |   3 +
 arch/x86/include/asm/mmu_context.h|  14 +
 arch/x86/include/asm/msr-index.h  |   3 +
 arch/x86/kernel/cpu/cpuid-deps.c  |   1 +
 arch/x86/kernel/fpu/xsta

[PATCH v4 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-06-25 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 778089d198eb..8a0cf2f0dd54 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 314a95870e47..4d20f93a5e2d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -759,7 +759,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-06-25 Thread Fenghua Yu
Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS 
instructions */
 
 /* AMD-defined CPU features, CPUID level 0x8007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 08/12] fork: Clear PASID for new mm

2020-06-25 Thread Fenghua Yu
When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch to initialize PASID value for a new mm.

 include/linux/mm_types.h | 2 ++
 kernel/fork.c| 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d61285cfe027..d60d2ec10881 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+/* Initial PASID value is 0. */
+#define INIT_PASID 0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..43b5f112604d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_IOMMU_SUPPORT
+   mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+   mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 10/12] x86/mmu: Allocate/free PASID

2020-06-25 Thread Fenghua Yu
A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Add sanity checks in alloc_pasid() and _free_pasid() (Baolu)
- Add a comment that the private PASID feature will be removed completely
  from IOMMU and don't track private PASID in mm (Thomas)

v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
  (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
  pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

 arch/x86/include/asm/iommu.h   |   2 +
 arch/x86/include/asm/mmu_context.h |  14 
 drivers/iommu/intel/svm.c  | 128 ++---
 3 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
 }
 
+void __free_pasid(struct mm_struct *mm);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..f8c91ce8c451 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
init_new_context_ldt(mm);
return 0;
 }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return;
+
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   __free_pasid(mm);
+}
+
 static inline void destroy_context(struct mm_struct *mm)
 {
destroy_context_ldt(mm);
+   free_pasid(mm);
 }
 
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8a0cf2f0dd54..4c70b037 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,69 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
 }
 
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+ bool new_pasid)
+{
+   if (new_pasid)
+   ioasid_free(svm->pasid);
+   kfree(svm);
+   kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+  u32 pasid_max, bool *new_pasid,
+  unsigned int flags)
+{
+   u32 pasid;
+
+   *new_pasid = false;
+
+   /*
+* Reuse the PASID if the mm already has a PASID and not a private
+* PASID is requested.
+*/
+   if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+   void *p;
+
+   /*
+* Since the mm has a PASID already, the PASID should be
+* bound and unbound to the mm before calling this allocation.
+* So the PASID must be allocated by bind_mm() previously and
+* should still exist in ioasid; but its data must be cleared
+* already by unbind_mm().
+*
+* Do a sanity check here to ensure the PASID has the right
+* status before reusing it.
+*/
+   p = ioasid_find(NULL, mm->pasid, NULL);
+   if (IS_ERR(p) || p)
+   return INVALID_IOASID;
+
+   /*
+* Once the PASID is allocated for this mm, it
+* stays with the mm until the mm is dropped. Reuse
+* the PASID which has been already allocated for the
+* mm instead of allocating a new one.
+*/
+   ioasid_set_data(mm->pasid, svm);
+
+   return mm->pasid;
+   }
+
+   /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+   pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+   if (pasid != INVALID_IOASID) {
+   /* A new pasid is allocated. */
+   *new_pasid = true;
+   }
+
+   return pasid;
+}
+
 /* Caller must hold pasid_mutex, mm reference */
 static int
 intel_svm_bind_mm(struct device *dev, unsigned int flags,
@@ -518,6 +581,8 @@ intel_svm_bind_mm(struct devi

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-25 Thread Stefano Stabellini
On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > 
> > > > > > Use xen_swiotlb to determine when vring should use dma APIs to map 
> > > > > > the
> > > > > > ring: when xen_swiotlb is enabled the dma API is required. When it 
> > > > > > is
> > > > > > disabled, it is not required.
> > > > > > 
> > > > > > Signed-off-by: Peng Fan 
> > > > > 
> > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > Xen was there first, but everyone else is using that now.
> > > > 
> > > > Unfortunately it is complicated and it is not related to
> > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > 
> > > > 
> > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > > foreign mappings (memory coming from other VMs) to physical addresses.
> > > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > > address into a real physical address (this is unneeded on ARM.)
> > > > 
> > > > 
> > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > > Xen/x86
> > > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > > mappings.) That is why we have the if (xen_domain) return true; in
> > > > vring_use_dma_api.
> > > 
> > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > 
> > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > >
> > > Unfortunately as a result Xen never got around to
> > > properly setting VIRTIO_F_IOMMU_PLATFORM.
> > 
> > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> > the usage of swiotlb_xen is not a property of virtio,
> 
> 
> Basically any device without VIRTIO_F_ACCESS_PLATFORM
> (that is it's name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is
> what linux calls it) is declared as "special, don't follow normal rules
> for access".
> 
> So yes swiotlb_xen is not a property of virtio, but what *is* a property
> of virtio is that it's not special, just a regular device from DMA POV.

I am trying to understand what you meant but I think I am missing
something.

Are you saying that modern virtio should always have
VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?

If that is the case, how is it possible that virtio breaks on ARM using
the default dma_ops? The breakage is not Xen related (except that Xen
turns dma_ops on). The original message from Peng was:

  vring_map_one_sg -> vring_use_dma_api
   -> dma_map_page
   -> __swiotlb_map_page
->swiotlb_map_page
->__dma_map_area(phys_to_virt(dma_to_phys(dev, 
dev_addr)), size, dir);
  However we are using per device dma area for rpmsg, phys_to_virt
  could not return a correct virtual address for virtual address in
  vmalloc area. Then kernel panic.

I must be missing something. Maybe it is because it has to do with RPMesg?
 

> > > > You might have noticed that I missed one possible case above: Xen/ARM
> > > > DomU :-)
> > > > 
> > > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So if
> > > > (xen_domain) return true; would give the wrong answer in that case.
> > > > Linux would end up calling the "normal" dma_ops, not swiotlb-xen, and
> > > > the "normal" dma_ops fail.
> > > > 
> > > > 
> > > > The solution I suggested was to make the check in vring_use_dma_api more
> > > > flexible by returning true if the swiotlb_xen is supposed to be used,
> > > > not in general for all Xen domains, because that is what the check was
> > > > really meant to do.
> > > 
> > > Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong 
> > > with that?
> > 
> > swiotlb-xen is not used on Xen/ARM DomU, the default dma_ops are the
> > ones that are used. So you are saying, why don't we fix the default
> > dma_ops to work with virtio?
> > 
> > It is bad that the default dma_ops crash with virtio, so yes I think it
> > would be good to fix that. However, even if we fixed that, the if
> > (xen_domain()) check in vring_use_dma_api is still a problem.
> 
> Why is it a problem? It just makes virtio use DMA API.
> If that in turn works, problem solved.

You are correct in the sense that it would work. However I do think it
is wrong for vring_use_dma_api to enable dma_ops/swiotlb-xen for Xen/ARM
DomUs that don't need it. There are many different types of Xen guests,
Xen x86 is drastically different from Xen ARM, it seems wrong to treat
them the same way.



Anyway, re-reading the last messages of the original thread [1], it
looks like Peng

Re: [PATCH 2/2] iommu/amd: Use 'unsigned long' for domain->pt_root

2020-06-25 Thread Qian Cai
On Thu, Jun 25, 2020 at 04:52:27PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Using atomic64_t can be quite expensive, so use unsigned long instead.
> This is safe because the write becomes visible atomically.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/amd_iommu_types.h |  2 +-
>  drivers/iommu/amd/iommu.c   | 10 --
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h 
> b/drivers/iommu/amd/amd_iommu_types.h
> index 30a5d412255a..f6f102282dda 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -468,7 +468,7 @@ struct protection_domain {
>  iommu core code */
>   spinlock_t lock;/* mostly used to lock the page table*/
>   u16 id; /* the domain id written to the device table */
> - atomic64_t pt_root; /* pgtable root and pgtable mode */
> + unsigned long pt_root;  /* pgtable root and pgtable mode */
>   int glx;/* Number of levels for GCR3 table */
>   u64 *gcr3_tbl;  /* Guest CR3 table */
>   unsigned long flags;/* flags to find out type of domain */
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 5286ddcfc2f9..b0e1dc58244e 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -156,7 +156,7 @@ static struct protection_domain *to_pdomain(struct 
> iommu_domain *dom)
>  static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
>struct domain_pgtable *pgtable)
>  {
> - u64 pt_root = atomic64_read(&domain->pt_root);
> + unsigned long pt_root = domain->pt_root;

The pt_root might be reload later in case of register pressure where the
compiler decides to not store it as a stack variable, so it needs
smp_rmb() here to match to the smp_wmb() in
amd_iommu_domain_set_pt_root() to make the load visiable to all CPUs.

Then, smp_rmb/wmb() wouldn't be able to deal with data races, so it
needs,

unsigned long pt_root = READ_ONCE(domain->pt_root);

>  
>   pgtable->root = (u64 *)(pt_root & PAGE_MASK);
>   pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
> @@ -164,7 +164,13 @@ static void amd_iommu_domain_get_pgtable(struct 
> protection_domain *domain,
>  
>  static void amd_iommu_domain_set_pt_root(struct protection_domain *domain, 
> u64 root)
>  {
> - atomic64_set(&domain->pt_root, root);
> + domain->pt_root = root;

WRITE_ONCE(domain->pt_root, root);

> +
> + /*
> +  * The new value needs to be gobally visible in case pt_root gets
> +  * cleared, so that the page-table can be safely freed.
> +  */
> + smp_wmb();
>  }
>  
>  static void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
> -- 
> 2.27.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/amd: Add helper functions to update domain->pt_root

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Do not call atomic64_set() directly to update the domain page-table
root and use two new helper functions.

This makes it easier to implement additional work necessary when
the page-table is updated.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/iommu.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 74cca1757172..5286ddcfc2f9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -162,7 +162,18 @@ static void amd_iommu_domain_get_pgtable(struct 
protection_domain *domain,
pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
 }
 
-static u64 amd_iommu_domain_encode_pgtable(u64 *root, int mode)
+static void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 
root)
+{
+   atomic64_set(&domain->pt_root, root);
+}
+
+static void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
+{
+   amd_iommu_domain_set_pt_root(domain, 0);
+}
+
+static void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
+u64 *root, int mode)
 {
u64 pt_root;
 
@@ -170,7 +181,7 @@ static u64 amd_iommu_domain_encode_pgtable(u64 *root, int 
mode)
pt_root = mode & 7;
pt_root |= (u64)root;
 
-   return pt_root;
+   amd_iommu_domain_set_pt_root(domain, pt_root);
 }
 
 static struct iommu_dev_data *alloc_dev_data(u16 devid)
@@ -1410,7 +1421,7 @@ static bool increase_address_space(struct 
protection_domain *domain,
struct domain_pgtable pgtable;
unsigned long flags;
bool ret = true;
-   u64 *pte, root;
+   u64 *pte;
 
spin_lock_irqsave(&domain->lock, flags);
 
@@ -1438,8 +1449,7 @@ static bool increase_address_space(struct 
protection_domain *domain,
 * Device Table needs to be updated and flushed before the new root can
 * be published.
 */
-   root = amd_iommu_domain_encode_pgtable(pte, pgtable.mode);
-   atomic64_set(&domain->pt_root, root);
+   amd_iommu_domain_set_pgtable(domain, pte, pgtable.mode);
 
ret = true;
 
@@ -2319,7 +2329,7 @@ static void protection_domain_free(struct 
protection_domain *domain)
domain_id_free(domain->id);
 
amd_iommu_domain_get_pgtable(domain, &pgtable);
-   atomic64_set(&domain->pt_root, 0);
+   amd_iommu_domain_clr_pt_root(domain);
free_pagetable(&pgtable);
 
kfree(domain);
@@ -2327,7 +2337,7 @@ static void protection_domain_free(struct 
protection_domain *domain)
 
 static int protection_domain_init(struct protection_domain *domain, int mode)
 {
-   u64 *pt_root = NULL, root;
+   u64 *pt_root = NULL;
 
BUG_ON(mode < PAGE_MODE_NONE || mode > PAGE_MODE_6_LEVEL);
 
@@ -2343,8 +2353,7 @@ static int protection_domain_init(struct 
protection_domain *domain, int mode)
return -ENOMEM;
}
 
-   root = amd_iommu_domain_encode_pgtable(pt_root, mode);
-   atomic64_set(&domain->pt_root, root);
+   amd_iommu_domain_set_pgtable(domain, pt_root, mode);
 
return 0;
 }
@@ -2713,8 +2722,8 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
/* First save pgtable configuration*/
amd_iommu_domain_get_pgtable(domain, &pgtable);
 
-   /* Update data structure */
-   atomic64_set(&domain->pt_root, 0);
+   /* Remove page-table from domain */
+   amd_iommu_domain_clr_pt_root(domain);
 
/* Make changes visible to IOMMUs */
update_domain(domain);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/2] iommu/amd: Don't use atomic64_t for domain->pt_root

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

a previous discussion pointed out that using atomic64_t for that
purpose is a bit of overkill. This patch-set replaces it with unsigned
long and introduces some helpers first to make the change more easy.

Qian, can you please test these patches in your environment? You can
trigger any race-condition there pretty reliably :)

Other than that, please review and test.

Regards,

Joerg

Joerg Roedel (2):
  iommu/amd: Add helper functions to update domain->pt_root
  iommu/amd: Use 'unsigned long' for domain->pt_root

 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 39 -
 2 files changed, 28 insertions(+), 13 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/amd: Use 'unsigned long' for domain->pt_root

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Using atomic64_t can be quite expensive, so use unsigned long instead.
This is safe because the write becomes visible atomically.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 10 --
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..f6f102282dda 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -468,7 +468,7 @@ struct protection_domain {
   iommu core code */
spinlock_t lock;/* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
-   atomic64_t pt_root; /* pgtable root and pgtable mode */
+   unsigned long pt_root;  /* pgtable root and pgtable mode */
int glx;/* Number of levels for GCR3 table */
u64 *gcr3_tbl;  /* Guest CR3 table */
unsigned long flags;/* flags to find out type of domain */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5286ddcfc2f9..b0e1dc58244e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -156,7 +156,7 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
 static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
 struct domain_pgtable *pgtable)
 {
-   u64 pt_root = atomic64_read(&domain->pt_root);
+   unsigned long pt_root = domain->pt_root;
 
pgtable->root = (u64 *)(pt_root & PAGE_MASK);
pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
@@ -164,7 +164,13 @@ static void amd_iommu_domain_get_pgtable(struct 
protection_domain *domain,
 
 static void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 
root)
 {
-   atomic64_set(&domain->pt_root, root);
+   domain->pt_root = root;
+
+   /*
+* The new value needs to be gobally visible in case pt_root gets
+* cleared, so that the page-table can be safely freed.
+*/
+   smp_wmb();
 }
 
 static void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/13] iommu/vt-d: Use dev_iommu_priv_get/set()

2020-06-25 Thread Lu Baolu

Hi Joerg,

On 2020/6/25 21:08, Joerg Roedel wrote:

From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
  .../gpu/drm/i915/selftests/mock_gem_device.c   | 10 --
  drivers/iommu/intel/iommu.c| 18 +-


For changes in VT-d driver,

Reviewed-by: Lu Baolu 

Best regards,
baolu


  2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 9b105b811f1f..e08601905a64 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -24,6 +24,7 @@
  
  #include 

  #include 
+#include 
  
  #include 
  
@@ -118,6 +119,9 @@ struct drm_i915_private *mock_gem_device(void)

  {
struct drm_i915_private *i915;
struct pci_dev *pdev;
+#if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
+   struct dev_iommu iommu;
+#endif
int err;
  
  	pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);

@@ -136,8 +140,10 @@ struct drm_i915_private *mock_gem_device(void)
dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
  
  #if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)

-   /* hack to disable iommu for the fake device; force identity mapping */
-   pdev->dev.archdata.iommu = (void *)-1;
+   /* HACK HACK HACK to disable iommu for the fake device; force identity 
mapping */
+   memset(&iommu, 0, sizeof(iommu));
+   iommu.priv = (void *)-1;
+   pdev->dev.iommu = &iommu;
  #endif
  
  	pci_set_drvdata(pdev, i915);

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..2ce490c2eab8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -372,7 +372,7 @@ struct device_domain_info *get_domain_info(struct device 
*dev)
if (!dev)
return NULL;
  
-	info = dev->archdata.iommu;

+   info = dev_iommu_priv_get(dev);
if (unlikely(info == DUMMY_DEVICE_DOMAIN_INFO ||
 info == DEFER_DEVICE_DOMAIN_INFO))
return NULL;
@@ -743,12 +743,12 @@ struct context_entry *iommu_context_addr(struct 
intel_iommu *iommu, u8 bus,
  
  static int iommu_dummy(struct device *dev)

  {
-   return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DUMMY_DEVICE_DOMAIN_INFO;
  }
  
  static bool attach_deferred(struct device *dev)

  {
-   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DEFER_DEVICE_DOMAIN_INFO;
  }
  
  /**

@@ -2420,7 +2420,7 @@ static inline void unlink_domain_info(struct 
device_domain_info *info)
list_del(&info->link);
list_del(&info->global);
if (info->dev)
-   info->dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(info->dev, NULL);
  }
  
  static void domain_remove_dev_info(struct dmar_domain *domain)

@@ -2453,7 +2453,7 @@ static void do_deferred_attach(struct device *dev)
  {
struct iommu_domain *domain;
  
-	dev->archdata.iommu = NULL;

+   dev_iommu_priv_set(dev, NULL);
domain = iommu_get_domain_for_dev(dev);
if (domain)
intel_iommu_attach_device(domain, dev);
@@ -2599,7 +2599,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
list_add(&info->link, &domain->devices);
list_add(&info->global, &device_domain_list);
if (dev)
-   dev->archdata.iommu = info;
+   dev_iommu_priv_set(dev, info);
spin_unlock_irqrestore(&device_domain_lock, flags);
  
  	/* PASID table is mandatory for a PCI device in scalable mode. */

@@ -4004,7 +4004,7 @@ static void quirk_ioat_snb_local_iommu(struct pci_dev 
*pdev)
if (!drhd || drhd->reg_base_addr - vtbar != 0xa000) {
pr_warn_once(FW_BUG "BIOS assigned incorrect VT-d unit for Intel(R) 
QuickData Technology device\n");
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
-   pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(&pdev->dev, DUMMY_DEVICE_DOMAIN_INFO);
}
  }
  DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, 
quirk_ioat_snb_local_iommu);
@@ -4043,7 +4043,7 @@ static void __init init_no_remapping_devices(void)
drhd->ignored = 1;
for_each_active_dev_scope(drhd->devices,
  drhd->devices_cnt, i, dev)
-   dev->archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(dev, 
DUMMY_DEVICE_DOMAIN_INFO);
}
}
  }
@@ -5665,7 +5665,7 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *

Re: [PATCH 12/13] arm64: Remove dev->archdata.iommu pointer

2020-06-25 Thread Will Deacon
On Thu, Jun 25, 2020 at 03:08:35PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> There are no users left, all drivers have been converted to use the
> per-device private pointer offered by IOMMU core.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/arm64/include/asm/device.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
> index 12b778d55342..996498751318 100644
> --- a/arch/arm64/include/asm/device.h
> +++ b/arch/arm64/include/asm/device.h
> @@ -6,9 +6,6 @@
>  #define __ASM_DEVICE_H
>  
>  struct dev_archdata {
> -#ifdef CONFIG_IOMMU_API
> - void *iommu;/* private IOMMU data */
> -#endif
>  };

Acked-by: Will Deacon 

Thanks, Joerg.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 12/13] arm64: Remove dev->archdata.iommu pointer

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel 
---
 arch/arm64/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 12b778d55342..996498751318 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -6,9 +6,6 @@
 #define __ASM_DEVICE_H
 
 struct dev_archdata {
-#ifdef CONFIG_IOMMU_API
-   void *iommu;/* private IOMMU data */
-#endif
 };
 
 struct pdev_archdata {
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/13] iommu/exynos: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c  | 20 +--
 .../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 60c8a56e4a3f..6a9b67302369 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -173,7 +173,7 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 #define REG_V5_FAULT_AR_VA 0x070
 #define REG_V5_FAULT_AW_VA 0x080
 
-#define has_sysmmu(dev)(dev->archdata.iommu != NULL)
+#define has_sysmmu(dev)(dev_iommu_priv_get(dev) != NULL)
 
 static struct device *dma_dev;
 static struct kmem_cache *lv2table_kmem_cache;
@@ -226,7 +226,7 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] = {
 };
 
 /*
- * This structure is attached to dev.archdata.iommu of the master device
+ * This structure is attached to dev->iommu->priv of the master device
  * on device add, contains a list of SYSMMU controllers defined by device tree,
  * which are bound to given master device. It is usually referenced by 'owner'
  * pointer.
@@ -670,7 +670,7 @@ static int __maybe_unused exynos_sysmmu_suspend(struct 
device *dev)
struct device *master = data->master;
 
if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(&owner->rpm_lock);
if (data->domain) {
@@ -688,7 +688,7 @@ static int __maybe_unused exynos_sysmmu_resume(struct 
device *dev)
struct device *master = data->master;
 
if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(&owner->rpm_lock);
if (data->domain) {
@@ -837,8 +837,8 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
 static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain,
struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
struct sysmmu_drvdata *data, *next;
unsigned long flags;
@@ -875,8 +875,8 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
 static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain,
   struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
unsigned long flags;
@@ -1237,7 +1237,7 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *iommu_domain,
 
 static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
 
if (!has_sysmmu(dev))
@@ -1263,7 +1263,7 @@ static struct iommu_device 
*exynos_iommu_probe_device(struct device *dev)
 
 static void exynos_iommu_release_device(struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
 
if (!has_sysmmu(dev))
@@ -1287,8 +1287,8 @@ static void exynos_iommu_release_device(struct device 
*dev)
 static int exynos_iommu_of_xlate(struct device *dev,
 struct of_phandle_args *spec)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct platform_device *sysmmu = of_find_device_by_node(spec->np);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data, *entry;
 
if (!sysmmu)
@@ -1305,7 +1305,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
INIT_LIST_HEAD(&owner->controllers);
mutex_init(&owner->rpm_lock);
-   dev->archdata.iommu = owner;
+   dev_iommu_priv_set(dev, owner);
}
 
list_for_each_entry(entry, &owner->controllers, owner_node)
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h 
b/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
index 152a713fff78..1a32266b7ddc 100644
--- a/drivers/media/platform/s

[PATCH 03/13] iommu/msm: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/msm_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 3d8a63555c25..f773cc85f311 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -593,14 +593,14 @@ static void insert_iommu_master(struct device *dev,
struct msm_iommu_dev **iommu,
struct of_phandle_args *spec)
 {
-   struct msm_iommu_ctx_dev *master = dev->archdata.iommu;
+   struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
int sid;
 
if (list_empty(&(*iommu)->ctx_list)) {
master = kzalloc(sizeof(*master), GFP_ATOMIC);
master->of_node = dev->of_node;
list_add(&master->list, &(*iommu)->ctx_list);
-   dev->archdata.iommu = master;
+   dev_iommu_priv_set(dev, master);
}
 
for (sid = 0; sid < master->num_mids; sid++)
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 09/13] x86: Remove dev->archdata.iommu pointer

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h
index 49bd6cf3eec9..7c0a52ca2f4d 100644
--- a/arch/x86/include/asm/device.h
+++ b/arch/x86/include/asm/device.h
@@ -3,9 +3,6 @@
 #define _ASM_X86_DEVICE_H
 
 struct dev_archdata {
-#ifdef CONFIG_IOMMU_API
-   void *iommu; /* hook for IOMMU specific extension */
-#endif
 };
 
 struct pdev_archdata {
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 08/13] iommu/mediatek: Do no use dev->archdata.iommu

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

The iommu private pointer is already used in the Mediatek IOMMU v1
driver, so move the dma_iommu_mapping pointer into 'struct
mtk_iommu_data' and do not use dev->archdata.iommu anymore.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu.h|  2 ++
 drivers/iommu/mtk_iommu_v1.c | 10 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index ea949a324e33..1682406e98dc 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -62,6 +62,8 @@ struct mtk_iommu_data {
struct iommu_device iommu;
const struct mtk_iommu_plat_data *plat_data;
 
+   struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */
+
struct list_headlist;
struct mtk_smi_larb_iommu   larb_imu[MTK_LARB_NR_MAX];
 };
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c9d79cff4d17..82ddfe9170d4 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -269,7 +269,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
int ret;
 
/* Only allow the domain created internally. */
-   mtk_mapping = data->dev->archdata.iommu;
+   mtk_mapping = data->mapping;
if (mtk_mapping->domain != domain)
return 0;
 
@@ -369,7 +369,6 @@ static int mtk_iommu_create_mapping(struct device *dev,
struct mtk_iommu_data *data;
struct platform_device *m4updev;
struct dma_iommu_mapping *mtk_mapping;
-   struct device *m4udev;
int ret;
 
if (args->args_count != 1) {
@@ -401,8 +400,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
return ret;
 
data = dev_iommu_priv_get(dev);
-   m4udev = data->dev;
-   mtk_mapping = m4udev->archdata.iommu;
+   mtk_mapping = data->mapping;
if (!mtk_mapping) {
/* MTK iommu support 4GB iova address space. */
mtk_mapping = arm_iommu_create_mapping(&platform_bus_type,
@@ -410,7 +408,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
if (IS_ERR(mtk_mapping))
return PTR_ERR(mtk_mapping);
 
-   m4udev->archdata.iommu = mtk_mapping;
+   data->mapping = mtk_mapping;
}
 
return 0;
@@ -459,7 +457,7 @@ static void mtk_iommu_probe_finalize(struct device *dev)
int err;
 
data= dev_iommu_priv_get(dev);
-   mtk_mapping = data->dev->archdata.iommu;
+   mtk_mapping = data->mapping;
 
err = arm_iommu_attach_device(dev, mtk_mapping);
if (err)
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 10/13] ia64: Remove dev->archdata.iommu pointer

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel 
---
 arch/ia64/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/ia64/include/asm/device.h b/arch/ia64/include/asm/device.h
index 3eb397415381..918b198cd5bb 100644
--- a/arch/ia64/include/asm/device.h
+++ b/arch/ia64/include/asm/device.h
@@ -6,9 +6,6 @@
 #define _ASM_IA64_DEVICE_H
 
 struct dev_archdata {
-#ifdef CONFIG_IOMMU_API
-   void *iommu; /* hook for IOMMU specific extension */
-#endif
 };
 
 struct pdev_archdata {
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/13] iommu/omap: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c8282cc212cb..e84ead6fb234 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -71,7 +71,7 @@ static struct omap_iommu_domain *to_omap_domain(struct 
iommu_domain *dom)
  **/
 void omap_iommu_save_ctx(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu *obj;
u32 *p;
int i;
@@ -101,7 +101,7 @@ EXPORT_SYMBOL_GPL(omap_iommu_save_ctx);
  **/
 void omap_iommu_restore_ctx(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu *obj;
u32 *p;
int i;
@@ -1398,7 +1398,7 @@ static size_t omap_iommu_unmap(struct iommu_domain 
*domain, unsigned long da,
 
 static int omap_iommu_count(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
int count = 0;
 
while (arch_data->iommu_dev) {
@@ -1459,8 +1459,8 @@ static void omap_iommu_detach_fini(struct 
omap_iommu_domain *odomain)
 static int
 omap_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu_domain *omap_domain = to_omap_domain(domain);
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
struct omap_iommu_device *iommu;
struct omap_iommu *oiommu;
int ret = 0;
@@ -1524,7 +1524,7 @@ omap_iommu_attach_dev(struct iommu_domain *domain, struct 
device *dev)
 static void _omap_iommu_detach_dev(struct omap_iommu_domain *omap_domain,
   struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu_device *iommu = omap_domain->iommus;
struct omap_iommu *oiommu;
int i;
@@ -1650,7 +1650,7 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
int num_iommus, i;
 
/*
-* Allocate the archdata iommu structure for DT-based devices.
+* Allocate the per-device iommu structure for DT-based devices.
 *
 * TODO: Simplify this when removing non-DT support completely from the
 * IOMMU users.
@@ -1698,7 +1698,7 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
of_node_put(np);
}
 
-   dev->archdata.iommu = arch_data;
+   dev_iommu_priv_set(dev, arch_data);
 
/*
 * use the first IOMMU alone for the sysfs device linking.
@@ -1712,19 +1712,19 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
 
 static void omap_iommu_release_device(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
 
if (!dev->of_node || !arch_data)
return;
 
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
kfree(arch_data);
 
 }
 
 static struct iommu_group *omap_iommu_device_group(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct iommu_group *group = ERR_PTR(-EINVAL);
 
if (!arch_data)
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 07/13] iommu/pamu: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu_domain and use the private
per-device pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/fsl_pamu_domain.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 928d37771ece..b2110767caf4 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -323,7 +323,7 @@ static void remove_device_ref(struct device_domain_info 
*info, u32 win_cnt)
pamu_disable_liodn(info->liodn);
spin_unlock_irqrestore(&iommu_lock, flags);
spin_lock_irqsave(&device_domain_lock, flags);
-   info->dev->archdata.iommu_domain = NULL;
+   dev_iommu_priv_set(info->dev, NULL);
kmem_cache_free(iommu_devinfo_cache, info);
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
@@ -352,7 +352,7 @@ static void attach_device(struct fsl_dma_domain 
*dma_domain, int liodn, struct d
 * Check here if the device is already attached to domain or not.
 * If the device is already attached to a domain detach it.
 */
-   old_domain_info = dev->archdata.iommu_domain;
+   old_domain_info = dev_iommu_priv_get(dev);
if (old_domain_info && old_domain_info->domain != dma_domain) {
spin_unlock_irqrestore(&device_domain_lock, flags);
detach_device(dev, old_domain_info->domain);
@@ -371,8 +371,8 @@ static void attach_device(struct fsl_dma_domain 
*dma_domain, int liodn, struct d
 * the info for the first LIODN as all
 * LIODNs share the same domain
 */
-   if (!dev->archdata.iommu_domain)
-   dev->archdata.iommu_domain = info;
+   if (!dev_iommu_priv_get(dev))
+   dev_iommu_priv_set(dev, info);
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 13/13] powerpc/dma: Remove dev->archdata.iommu_domain

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, so remove the pointer and save some memory.

Signed-off-by: Joerg Roedel 
---
 arch/powerpc/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 266542769e4b..1bc595213338 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -34,9 +34,6 @@ struct dev_archdata {
struct iommu_table  *iommu_table_base;
 #endif
 
-#ifdef CONFIG_IOMMU_API
-   void*iommu_domain;
-#endif
 #ifdef CONFIG_PPC64
struct pci_dn   *pci_data;
 #endif
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/13] iommu/vt-d: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 .../gpu/drm/i915/selftests/mock_gem_device.c   | 10 --
 drivers/iommu/intel/iommu.c| 18 +-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 9b105b811f1f..e08601905a64 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -118,6 +119,9 @@ struct drm_i915_private *mock_gem_device(void)
 {
struct drm_i915_private *i915;
struct pci_dev *pdev;
+#if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
+   struct dev_iommu iommu;
+#endif
int err;
 
pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
@@ -136,8 +140,10 @@ struct drm_i915_private *mock_gem_device(void)
dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
 
 #if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
-   /* hack to disable iommu for the fake device; force identity mapping */
-   pdev->dev.archdata.iommu = (void *)-1;
+   /* HACK HACK HACK to disable iommu for the fake device; force identity 
mapping */
+   memset(&iommu, 0, sizeof(iommu));
+   iommu.priv = (void *)-1;
+   pdev->dev.iommu = &iommu;
 #endif
 
pci_set_drvdata(pdev, i915);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..2ce490c2eab8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -372,7 +372,7 @@ struct device_domain_info *get_domain_info(struct device 
*dev)
if (!dev)
return NULL;
 
-   info = dev->archdata.iommu;
+   info = dev_iommu_priv_get(dev);
if (unlikely(info == DUMMY_DEVICE_DOMAIN_INFO ||
 info == DEFER_DEVICE_DOMAIN_INFO))
return NULL;
@@ -743,12 +743,12 @@ struct context_entry *iommu_context_addr(struct 
intel_iommu *iommu, u8 bus,
 
 static int iommu_dummy(struct device *dev)
 {
-   return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
 static bool attach_deferred(struct device *dev)
 {
-   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DEFER_DEVICE_DOMAIN_INFO;
 }
 
 /**
@@ -2420,7 +2420,7 @@ static inline void unlink_domain_info(struct 
device_domain_info *info)
list_del(&info->link);
list_del(&info->global);
if (info->dev)
-   info->dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(info->dev, NULL);
 }
 
 static void domain_remove_dev_info(struct dmar_domain *domain)
@@ -2453,7 +2453,7 @@ static void do_deferred_attach(struct device *dev)
 {
struct iommu_domain *domain;
 
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
domain = iommu_get_domain_for_dev(dev);
if (domain)
intel_iommu_attach_device(domain, dev);
@@ -2599,7 +2599,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
list_add(&info->link, &domain->devices);
list_add(&info->global, &device_domain_list);
if (dev)
-   dev->archdata.iommu = info;
+   dev_iommu_priv_set(dev, info);
spin_unlock_irqrestore(&device_domain_lock, flags);
 
/* PASID table is mandatory for a PCI device in scalable mode. */
@@ -4004,7 +4004,7 @@ static void quirk_ioat_snb_local_iommu(struct pci_dev 
*pdev)
if (!drhd || drhd->reg_base_addr - vtbar != 0xa000) {
pr_warn_once(FW_BUG "BIOS assigned incorrect VT-d unit for 
Intel(R) QuickData Technology device\n");
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
-   pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(&pdev->dev, DUMMY_DEVICE_DOMAIN_INFO);
}
 }
 DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, 
quirk_ioat_snb_local_iommu);
@@ -4043,7 +4043,7 @@ static void __init init_no_remapping_devices(void)
drhd->ignored = 1;
for_each_active_dev_scope(drhd->devices,
  drhd->devices_cnt, i, dev)
-   dev->archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(dev, 
DUMMY_DEVICE_DOMAIN_INFO);
}
}
 }
@@ -5665,7 +5665,7 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
return ERR_PTR(-ENODEV);
 
if (translation_pre_enabled(iommu))
-   dev->archdata.iommu = DEFER_DEVICE_DOMAIN_

[PATCH 11/13] arm: Remove dev->archdata.iommu pointer

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel 
---
 arch/arm/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index c675bc0d5aa8..be666f58bf7a 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -9,9 +9,6 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
struct dmabounce_device_info *dmabounce;
 #endif
-#ifdef CONFIG_IOMMU_API
-   void *iommu; /* private IOMMU data */
-#endif
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
struct dma_iommu_mapping*mapping;
 #endif
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here is a patch-set to remove the usage of dev->archdata.iommu from
the IOMMU code in the kernel and replace its uses by the iommu per-device
private data field. The changes also remove the field entirely from
the architectures which no longer need it.

On PowerPC the field is called dev->archdata.iommu_domain and was only
used by the PAMU IOMMU driver. It gets removed as well.

The patches have been runtime tested on Intel VT-d and compile tested
with allyesconfig for:

* x86 (32 and 64 bit)
* arm and arm64
* ia64 (only drivers/ because build failed for me in
arch/ia64)
* PPC64

Besides that the changes also survived my IOMMU tree compile tests.

Please review.

Regards,

Joerg

Joerg Roedel (13):
  iommu/exynos: Use dev_iommu_priv_get/set()
  iommu/vt-d: Use dev_iommu_priv_get/set()
  iommu/msm: Use dev_iommu_priv_get/set()
  iommu/omap: Use dev_iommu_priv_get/set()
  iommu/rockchip: Use dev_iommu_priv_get/set()
  iommu/tegra: Use dev_iommu_priv_get/set()
  iommu/pamu: Use dev_iommu_priv_get/set()
  iommu/mediatek: Do no use dev->archdata.iommu
  x86: Remove dev->archdata.iommu pointer
  ia64: Remove dev->archdata.iommu pointer
  arm: Remove dev->archdata.iommu pointer
  arm64: Remove dev->archdata.iommu pointer
  powerpc/dma: Remove dev->archdata.iommu_domain

 arch/arm/include/asm/device.h |  3 ---
 arch/arm64/include/asm/device.h   |  3 ---
 arch/ia64/include/asm/device.h|  3 ---
 arch/powerpc/include/asm/device.h |  3 ---
 arch/x86/include/asm/device.h |  3 ---
 .../gpu/drm/i915/selftests/mock_gem_device.c  | 10 --
 drivers/iommu/exynos-iommu.c  | 20 +--
 drivers/iommu/fsl_pamu_domain.c   |  8 
 drivers/iommu/intel/iommu.c   | 18 -
 drivers/iommu/msm_iommu.c |  4 ++--
 drivers/iommu/mtk_iommu.h |  2 ++
 drivers/iommu/mtk_iommu_v1.c  | 10 --
 drivers/iommu/omap-iommu.c| 20 +--
 drivers/iommu/rockchip-iommu.c|  8 
 drivers/iommu/tegra-gart.c|  8 
 drivers/iommu/tegra-smmu.c|  8 
 .../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
 17 files changed, 64 insertions(+), 71 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 06/13] iommu/tegra: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/tegra-gart.c | 8 
 drivers/iommu/tegra-smmu.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 5fbdff6ff41a..fac720273889 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -113,8 +113,8 @@ static int gart_iommu_attach_dev(struct iommu_domain 
*domain,
 
if (gart->active_domain && gart->active_domain != domain) {
ret = -EBUSY;
-   } else if (dev->archdata.iommu != domain) {
-   dev->archdata.iommu = domain;
+   } else if (dev_iommu_priv_get(dev) != domain) {
+   dev_iommu_priv_set(dev, domain);
gart->active_domain = domain;
gart->active_devices++;
}
@@ -131,8 +131,8 @@ static void gart_iommu_detach_dev(struct iommu_domain 
*domain,
 
spin_lock(&gart->dom_lock);
 
-   if (dev->archdata.iommu == domain) {
-   dev->archdata.iommu = NULL;
+   if (dev_iommu_priv_get(dev) == domain) {
+   dev_iommu_priv_set(dev, NULL);
 
if (--gart->active_devices == 0)
gart->active_domain = NULL;
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 7426b7666e2b..124c8848ab7e 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -465,7 +465,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
 static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 struct device *dev)
 {
-   struct tegra_smmu *smmu = dev->archdata.iommu;
+   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
struct tegra_smmu_as *as = to_smmu_as(domain);
struct device_node *np = dev->of_node;
struct of_phandle_args args;
@@ -780,7 +780,7 @@ static struct iommu_device *tegra_smmu_probe_device(struct 
device *dev)
 * supported by the Linux kernel, so abort after the
 * first match.
 */
-   dev->archdata.iommu = smmu;
+   dev_iommu_priv_set(dev, smmu);
 
break;
}
@@ -797,7 +797,7 @@ static struct iommu_device *tegra_smmu_probe_device(struct 
device *dev)
 
 static void tegra_smmu_release_device(struct device *dev)
 {
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
 }
 
 static const struct tegra_smmu_group_soc *
@@ -856,7 +856,7 @@ static struct iommu_group *tegra_smmu_group_get(struct 
tegra_smmu *smmu,
 static struct iommu_group *tegra_smmu_device_group(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct tegra_smmu *smmu = dev->archdata.iommu;
+   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
struct iommu_group *group;
 
group = tegra_smmu_group_get(smmu, fwspec->ids[0]);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 05/13] iommu/rockchip: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/rockchip-iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index d25c2486ca07..e5d86b7177de 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -836,7 +836,7 @@ static size_t rk_iommu_unmap(struct iommu_domain *domain, 
unsigned long _iova,
 
 static struct rk_iommu *rk_iommu_from_dev(struct device *dev)
 {
-   struct rk_iommudata *data = dev->archdata.iommu;
+   struct rk_iommudata *data = dev_iommu_priv_get(dev);
 
return data ? data->iommu : NULL;
 }
@@ -1059,7 +1059,7 @@ static struct iommu_device *rk_iommu_probe_device(struct 
device *dev)
struct rk_iommudata *data;
struct rk_iommu *iommu;
 
-   data = dev->archdata.iommu;
+   data = dev_iommu_priv_get(dev);
if (!data)
return ERR_PTR(-ENODEV);
 
@@ -1073,7 +1073,7 @@ static struct iommu_device *rk_iommu_probe_device(struct 
device *dev)
 
 static void rk_iommu_release_device(struct device *dev)
 {
-   struct rk_iommudata *data = dev->archdata.iommu;
+   struct rk_iommudata *data = dev_iommu_priv_get(dev);
 
device_link_del(data->link);
 }
@@ -1100,7 +1100,7 @@ static int rk_iommu_of_xlate(struct device *dev,
iommu_dev = of_find_device_by_node(args->np);
 
data->iommu = platform_get_drvdata(iommu_dev);
-   dev->archdata.iommu = data;
+   dev_iommu_priv_set(dev, data);
 
platform_device_put(iommu_dev);
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID

2020-06-25 Thread Lu Baolu

Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:

+int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+   struct iommu_gpasid_bind_data *data)
  {
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
  
-	return domain->ops->sva_unbind_gpasid(dev, pasid);

+   return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);


__iommu_sva_unbind_gpasid() looks more like an internal only helper. How
about something like iommu_sva_kunbind_gpasid()?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 7/7] iommu/vt-d: Disable multiple GPASID-dev bind

2020-06-25 Thread Lu Baolu

On 2020/6/23 23:43, Jacob Pan wrote:

For the unlikely use case where multiple aux domains from the same pdev
are attached to a single guest and then bound to a single process
(thus same PASID) within that guest, we cannot easily support this case
by refcounting the number of users. As there is only one SL page table
per PASID while we have multiple aux domains thus multiple SL page tables
for the same PASID.

Extra unbinding guest PASID can happen due to race between normal and
exception cases. Termination of one aux domain may affect others unless
we actively track and switch aux domains to ensure the validity of SL
page tables and TLB states in the shared PASID entry.

Support for sharing second level PGDs across domains can reduce the
complexity but this is not available due to the limitations on VFIO
container architecture. We can revisit this decision once sharing PGDs
are available.

Overall, the complexity and potential glitch do not warrant this unlikely
use case thereby removed by this patch.

Cc: Kevin Tian 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 


Fixes: 56722a4398a30 ("iommu/vt-d: Add bind guest PASID support")
Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel/svm.c | 22 +-
  1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..d386853121a2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -277,20 +277,16 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
goto out;
}
  
+		/*

+* Do not allow multiple bindings of the same device-PASID since
+* there is only one SL page tables per PASID. We may revisit
+* once sharing PGD across domains are supported.
+*/
for_each_svm_dev(sdev, svm, dev) {
-   /*
-* For devices with aux domains, we should allow
-* multiple bind calls with the same PASID and pdev.
-*/
-   if (iommu_dev_feature_enabled(dev,
- IOMMU_DEV_FEAT_AUX)) {
-   sdev->users++;
-   } else {
-   dev_warn_ratelimited(dev,
-"Already bound with PASID 
%u\n",
-svm->pasid);
-   ret = -EBUSY;
-   }
+   dev_warn_ratelimited(dev,
+"Already bound with PASID %u\n",
+svm->pasid);
+   ret = -EBUSY;
goto out;
}
} else {


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init

2020-06-25 Thread Robin Murphy

On 2020-06-25 08:43, Barry Song wrote:

Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
with this patch, smmu will get memory from local numa node to save command
queues and page tables. that means dma_unmap latency will be shrunk much.
Meanwhile, when iommu.passthrough is on, device drivers which call dma_
alloc_coherent() will also get local memory and avoid the travel between
numa nodes.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
  arch/arm64/mm/init.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1e93cfc7c47a..07d4d1fe7983 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -420,6 +420,8 @@ void __init bootmem_init(void)
  
  	arm64_numa_init();
  
+	dma_pernuma_cma_reserve();

+


It might be worth putting this after the hugetlb_cma_reserve() call for 
clarity, since the comment below applies equally to this call too.


Robin.


/*
 * must be done after arm64_numa_init() which calls numa_init() to
 * initialize node_online_map that gets used in hugetlb_cma_reserve()


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-06-25 Thread Robin Murphy

On 2020-06-25 08:43, Barry Song wrote:

This is useful for at least two scenarios:
1. ARM64 smmu will get memory from local numa node, it can save its
command queues and page tables locally. Tests show it can decrease
dma_unmap latency at lot. For example, without this patch, smmu on
node2 will get memory from node0 by calling dma_alloc_coherent(),
typically, it has to wait for more than 560ns for the completion of
CMD_SYNC in an empty command queue; with this patch, it needs 240ns
only.
2. when we set iommu passthrough, drivers will get memory from CMA,
local memory means much less latency.

Cc: Jonathan Cameron 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
  include/linux/dma-contiguous.h |  4 ++
  kernel/dma/Kconfig | 10 
  kernel/dma/contiguous.c| 99 ++
  3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 03f8e98e3bcc..278a80a40456 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -79,6 +79,8 @@ static inline void dma_contiguous_set_default(struct cma *cma)
  
  void dma_contiguous_reserve(phys_addr_t addr_limit);
  
+void dma_pernuma_cma_reserve(void);

+
  int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed);
@@ -128,6 +130,8 @@ static inline void dma_contiguous_set_default(struct cma 
*cma) { }
  
  static inline void dma_contiguous_reserve(phys_addr_t limit) { }
  
+static inline void dma_pernuma_cma_reserve(void) { }

+
  static inline int dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t 
base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index d006668c0027..aeb976b1d21c 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -104,6 +104,16 @@ config DMA_CMA
  if  DMA_CMA
  comment "Default contiguous memory area size:"
  
+config CMA_PERNUMA_SIZE_MBYTES

+   int "Size in Mega Bytes for per-numa CMA areas"
+   depends on NUMA
+   default 16 if ARM64
+   default 0
+   help
+ Defines the size (in MiB) of the per-numa memory area for Contiguous
+ Memory Allocator. Every numa node will get a separate CMA with this
+ size. If the size of 0 is selected, per-numa CMA is disabled.
+


I think this needs to be cleverer than just a static config option. 
Pretty much everything else CMA-related is runtime-configurable to some 
degree, and doing any per-node setup when booting on a single-node 
system would be wasted effort.


Since this is conceptually very similar to the existing hugetlb_cma 
implementation I'm also wondering about inconsistency with respect to 
specifying per-node vs. total sizes.


Another thought, though, is that systems large enough to have multiple 
NUMA nodes tend not to be short on memory, so it might not be 
unreasonable to base this all on whatever size the default area is 
given, and simply have a binary on/off switch to control the per-node 
aspect.



  config CMA_SIZE_MBYTES
int "Size in Mega Bytes"
depends on !CMA_SIZE_SEL_PERCENTAGE
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 15bc5026c485..bcbd53aead93 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -30,7 +30,14 @@
  #define CMA_SIZE_MBYTES 0
  #endif
  
+#ifdef CONFIG_CMA_PERNUMA_SIZE_MBYTES

+#define CMA_SIZE_PERNUMA_MBYTES CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_PERNUMA_MBYTES 0
+#endif
+
  struct cma *dma_contiguous_default_area;
+static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
  
  /*

   * Default global CMA area size can be defined in kernel's .config.
@@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area;
   */
  static const phys_addr_t size_bytes __initconst =
(phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
+static const phys_addr_t pernuma_size_bytes __initconst =
+   (phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M;
  static phys_addr_t  size_cmdline __initdata = -1;
  static phys_addr_t base_cmdline __initdata;
  static phys_addr_t limit_cmdline __initdata;
@@ -96,6 +105,33 @@ static inline __maybe_unused phys_addr_t 
cma_early_percent_memory(void)
  
  #endif
  
+void __init dma_pernuma_cma_reserve(void)

+{
+   int nid;
+
+   if (!pernuma_size_bytes || nr_online_nodes <= 1)
+   return;
+
+   for_each_node_state(nid, N_ONLINE) {


Do we need/want notifiers to handle currently-offline nodes coming 
online later (I'm not sure off-hand how NUMA interacts with stuff like 
"maxcpus=n")?



+   

Re: [PATCH 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-06-25 Thread Lu Baolu

Hi,

On 2020/6/23 23:43, Jacob Pan wrote:

For guest requested IOTLB invalidation, address and mask are provided as
part of the invalidation data. VT-d HW silently ignores any address bits
below the mask. SW shall also allow such case but give warning if
address does not align with the mask. This patch relax the fault
handling from error to warning and proceed with invalidation request
with the given mask.

Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/iommu.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5ea5732d5ec4..50fc62413a35 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
  
  		switch (BIT(cache_type)) {

case IOMMU_CACHE_INV_TYPE_IOTLB:
+   /* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
(inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
-   pr_err_ratelimited("Address out of range, 0x%llx, 
size order %llu\n",
-  inv_info->addr_info.addr, 
size);
-   ret = -ERANGE;
-   goto out_unlock;
+   WARN_ONCE(1, "Address out of range, 0x%llx, size 
order %llu\n",
+ inv_info->addr_info.addr, size);


I don't think WARN_ONCE() is suitable here. It makes users think it's a
kernel bug. How about pr_warn_ratelimited()?

Best regards,
baolu


}
  
  			/*



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/7] iommu/vt-d: Handle non-page aligned address

2020-06-25 Thread Lu Baolu

Hi,

On 2020/6/23 23:43, Jacob Pan wrote:

From: Liu Yi L 

Address information for device TLB invalidation comes from userspace
when device is directly assigned to a guest with vIOMMU support.
VT-d requires page aligned address. This patch checks and enforce
address to be page aligned, otherwise reserved bits can be set in the
invalidation descriptor. Unrecoverable fault will be reported due to
non-zero value in the reserved bits.

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/dmar.c | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d9f973fa1190..53f4e5003620 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1455,9 +1455,24 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, 
u16 sid, u16 pfsid,
 * Max Invs Pending (MIP) is set to 0 for now until we have DIT in
 * ECAP.
 */
-   desc.qw1 |= addr & ~mask;
-   if (size_order)
+   if (addr & ~VTD_PAGE_MASK)
+   pr_warn_ratelimited("Invalidate non-page aligned address 
%llx\n", addr);
+
+   if (size_order) {
+   /* Take page address */
+   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr);


If size_order == 0 (that means only a single page is about to be
invalidated), do you still need to set ADDR field of the descriptor?

Best regards,
baolu


+   /*
+* Existing 0s in address below size_order may be the least
+* significant bit, we must set them to 1s to avoid having
+* smaller size than desired.
+*/
+   desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT,
+   VTD_PAGE_SHIFT);
+   /* Clear size_order bit to indicate size */
+   desc.qw1 &= ~mask;
+   /* Set the S bit to indicate flushing more than 1 page */
desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+   }
  
  	qi_submit_sync(iommu, &desc, 1, 0);

  }


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 00/11] media: exynos4-is: Improve support for s5pv210 and parallel ports

2020-06-25 Thread Jonathan Bakker
Hi Tomasz,

On 2020-06-24 4:58 a.m., Tomasz Figa wrote:
> On Wed, Jun 24, 2020 at 1:54 PM Krzysztof Kozlowski  wrote:
>>
>> On Wed, Jun 24, 2020 at 01:39:50PM +0200, Hans Verkuil wrote:
>>> Can someone from Samsung or someone who knows this SoC take a look at this 
>>> series?
>>>
>>> This series looks sane to me, so I'll probably merge this if nobody replies
>>> in the next two weeks or so.
>>
>> Unfortunately I don't know the media part on S5Pv210 at all so I cannot
>> provide any feedback. There are not many active users of these SoCs
>> nowadays. One of hem is Jonathan, so if he wants to change something he will
>> mostly break/affect his own setup. :) Therefore I think it is safe to merge.
> 
> I think this driver is also used on Exynos4210 and on some setups with
> 4412 where the ISP is not used.

Yes, this driver is also used by Exynos4210 and Exynos4412, notably by the 
Galaxy S3 series.
They don't use the parallel ports, but rather the CSIS.  I don't believe I've 
broken support
for that, but I don't have the hardware to test.

My other remaining concern is whether to adjust the camera port A/camera port B 
to match the
device tree documentation or to update the documentation to match the driver.  
I decided to
update the driver in these patches, but it is much simpler to simply update the 
binding doc.
The only mainline user of the parallel ports is the Goni dev board which 
appears to be setup
for the driver way of numbering as opposed to the binding doc.  I have no 
strong preference
on which way to actually go.

> 
> I can't promise anything, but I'll try to do a high level review.
> Hopefully I still have some memory from the time I used to play with
> this hardware.

Thanks, that would be appreciated,
Jonathan

> 
> Best regards,
> Tomasz
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-06-25 Thread Barry Song
Ganapatrao Kulkarni has put some effort on making arm-smmu-v3 use local
memory to save command queues[1]. I also did similar job in patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2] while not realizing Ganapatrao did that before.

But it seems it is much better to make dma_alloc_coherent() to be
inherently NUMA-aware on NUMA-capable systems.

Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
Saving queues and tables remotely will increase the latency of ARM SMMU
significantly. For example, when SMMU is at node2 and the default global
CMA is at node0, after sending a CMD_SYNC in an empty command queue, we
have to wait more than 550ns for the completion of the command CMD_SYNC.
However, if we save them locally, we only need to wait for 240ns.

with per-numa CMA, smmu will get memory from local numa node to save command
queues and page tables. that means dma_unmap latency will be shrunk much.

Meanwhile, when iommu.passthrough is on, device drivers which call dma_
alloc_coherent() will also get local memory and avoid the travel between
numa nodes.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html

-v2: fix some issues reported by kernel test robot;
 fallback to default cma to avoid regression while allocation fails in
 per-numa cma according to Jonathan Cameron's suggestion;
 free memory properly

Barry Song (2):
  dma-direct: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA after numa_init

 arch/arm64/mm/init.c   |  2 +
 include/linux/dma-contiguous.h |  4 ++
 kernel/dma/Kconfig | 10 
 kernel/dma/contiguous.c| 99 ++
 4 files changed, 106 insertions(+), 9 deletions(-)

-- 
2.27.0


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init

2020-06-25 Thread Barry Song
Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
with this patch, smmu will get memory from local numa node to save command
queues and page tables. that means dma_unmap latency will be shrunk much.
Meanwhile, when iommu.passthrough is on, device drivers which call dma_
alloc_coherent() will also get local memory and avoid the travel between
numa nodes.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1e93cfc7c47a..07d4d1fe7983 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -420,6 +420,8 @@ void __init bootmem_init(void)
 
arm64_numa_init();
 
+   dma_pernuma_cma_reserve();
+
/*
 * must be done after arm64_numa_init() which calls numa_init() to
 * initialize node_online_map that gets used in hugetlb_cma_reserve()
-- 
2.27.0


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA

2020-06-25 Thread Barry Song
This is useful for at least two scenarios:
1. ARM64 smmu will get memory from local numa node, it can save its
command queues and page tables locally. Tests show it can decrease
dma_unmap latency at lot. For example, without this patch, smmu on
node2 will get memory from node0 by calling dma_alloc_coherent(),
typically, it has to wait for more than 560ns for the completion of
CMD_SYNC in an empty command queue; with this patch, it needs 240ns
only.
2. when we set iommu passthrough, drivers will get memory from CMA,
local memory means much less latency.

Cc: Jonathan Cameron 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
 include/linux/dma-contiguous.h |  4 ++
 kernel/dma/Kconfig | 10 
 kernel/dma/contiguous.c| 99 ++
 3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 03f8e98e3bcc..278a80a40456 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -79,6 +79,8 @@ static inline void dma_contiguous_set_default(struct cma *cma)
 
 void dma_contiguous_reserve(phys_addr_t addr_limit);
 
+void dma_pernuma_cma_reserve(void);
+
 int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed);
@@ -128,6 +130,8 @@ static inline void dma_contiguous_set_default(struct cma 
*cma) { }
 
 static inline void dma_contiguous_reserve(phys_addr_t limit) { }
 
+static inline void dma_pernuma_cma_reserve(void) { }
+
 static inline int dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t 
base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index d006668c0027..aeb976b1d21c 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -104,6 +104,16 @@ config DMA_CMA
 if  DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_PERNUMA_SIZE_MBYTES
+   int "Size in Mega Bytes for per-numa CMA areas"
+   depends on NUMA
+   default 16 if ARM64
+   default 0
+   help
+ Defines the size (in MiB) of the per-numa memory area for Contiguous
+ Memory Allocator. Every numa node will get a separate CMA with this
+ size. If the size of 0 is selected, per-numa CMA is disabled.
+
 config CMA_SIZE_MBYTES
int "Size in Mega Bytes"
depends on !CMA_SIZE_SEL_PERCENTAGE
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 15bc5026c485..bcbd53aead93 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -30,7 +30,14 @@
 #define CMA_SIZE_MBYTES 0
 #endif
 
+#ifdef CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#define CMA_SIZE_PERNUMA_MBYTES CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_PERNUMA_MBYTES 0
+#endif
+
 struct cma *dma_contiguous_default_area;
+static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
 
 /*
  * Default global CMA area size can be defined in kernel's .config.
@@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area;
  */
 static const phys_addr_t size_bytes __initconst =
(phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
+static const phys_addr_t pernuma_size_bytes __initconst =
+   (phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M;
 static phys_addr_t  size_cmdline __initdata = -1;
 static phys_addr_t base_cmdline __initdata;
 static phys_addr_t limit_cmdline __initdata;
@@ -96,6 +105,33 @@ static inline __maybe_unused phys_addr_t 
cma_early_percent_memory(void)
 
 #endif
 
+void __init dma_pernuma_cma_reserve(void)
+{
+   int nid;
+
+   if (!pernuma_size_bytes || nr_online_nodes <= 1)
+   return;
+
+   for_each_node_state(nid, N_ONLINE) {
+   int ret;
+   char name[20];
+
+   snprintf(name, sizeof(name), "pernuma%d", nid);
+   ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
+0, false, name,
+
&dma_contiguous_pernuma_area[nid],
+nid);
+   if (ret) {
+   pr_warn("%s: reservation failed: err %d, node %d", 
__func__,
+   ret, nid);
+   continue;
+   }
+
+   pr_debug("%s: reserved %llu MiB on node %d\n", __func__,
+   (unsigned long long)pernuma_size_bytes / SZ_1M, nid);
+   }
+}
+
 /**
  * dma_contiguous_reserve() - reserve area(s) for contiguous memory handling
  * @limit: End address of the reserved memory (optional, 0 for any).
@@ -222,22 +258,

Re: [PATCH 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-06-25 Thread Lu Baolu

On 2020/6/23 23:43, Jacob Pan wrote:

DevTLB flush can be used for both DMA request with and without PASIDs.
The former uses PASID#0 (RID2PASID), latter uses non-zero PASID for SVA
usage.

This patch adds a check for PASID value such that devTLB flush with
PASID is used for SVA case. This is more efficient in that multiple
PASIDs can be used by a single device, when tearing down a PASID entry
we shall flush only the devTLB specific to a PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/pasid.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..3991a24539a1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qdep = info->ats_qdep;
pfsid = info->pfsid;
  
-	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);

+   /*
+* When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+* devTLB flush w/o PASID should be used. For non-zero PASID under
+* SVA usage, device could do DMA with multiple PASIDs. It is more
+* efficient to flush devTLB specific to the PASID.
+*/
+   if (pasid)


How about

if (pasid == PASID_RID2PASID)
qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - 
VTD_PAGE_SHIFT);
else
		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - 
VTD_PAGE_SHIFT);


?

It makes the code more readable and still works even we reassign another
pasid for RID2PASID.

Best regards,
baolu


+   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 
- VTD_PAGE_SHIFT);
+   else
+   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - 
VTD_PAGE_SHIFT);
  }
  
  void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/7] iommu/vt-d: Enforce PASID devTLB field mask

2020-06-25 Thread Lu Baolu

On 2020/6/23 23:43, Jacob Pan wrote:

From: Liu Yi L 

Set proper masks to avoid invalid input spillover to reserved bits.



Acked-by: Lu Baolu 

Best regards,
baolu


Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
  include/linux/intel-iommu.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4100bd224f5c..729386ca8122 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,8 +380,8 @@ enum {
  
  #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)

  #define QI_DEV_EIOTLB_SIZE(((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
-#define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
+#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
+#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xf) << 32)
  #define QI_DEV_EIOTLB_SID(sid)((u64)((sid) & 0x) << 16)
  #define QI_DEV_EIOTLB_QDEP(qd)((u64)((qd) & 0x1f) << 4)
  #define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/7] iommu/vt-d: Remove global page support in devTLB flush

2020-06-25 Thread Lu Baolu

On 2020/6/23 23:43, Jacob Pan wrote:

Global pages support is removed from VT-d spec 3.0 for dev TLB
invalidation. This patch is to remove the bits for vSVA. Similar change
already made for the native SVA. See the link below.



Acked-by: Lu Baolu 

Best regards,
baolu


Link: https://lkml.org/lkml/2019/8/26/651
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/dmar.c  | 4 +---
  drivers/iommu/intel/iommu.c | 4 ++--
  include/linux/intel-iommu.h | 3 +--
  3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index cc46dff98fa0..d9f973fa1190 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1437,8 +1437,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
  
  /* PASID-based device IOTLB Invalidate */

  void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
- u32 pasid,  u16 qdep, u64 addr,
- unsigned int size_order, u64 granu)
+ u32 pasid,  u16 qdep, u64 addr, unsigned int 
size_order)
  {
unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
@@ -1446,7 +1445,6 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, 
u16 sid, u16 pfsid,
desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
QI_DEV_IOTLB_PFSID(pfsid);
-   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
  
  	/*

 * If S bit is 0, we only flush a single page. If S bit is set,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9129663a7406..96340da57075 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5466,7 +5466,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
break;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
if (info->ats_enabled)
@@ -5474,7 +5474,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
info->pfsid, pasid,
info->ats_qdep,
inv_info->addr_info.addr,
-   size, granu);
+   size);
else
pr_warn_ratelimited("Passdown device IOTLB flush w/o 
ATS!\n");
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 729386ca8122..9a6614880773 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,7 +380,6 @@ enum {
  
  #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)

  #define QI_DEV_EIOTLB_SIZE(((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
  #define QI_DEV_EIOTLB_PASID(p)((u64)((p) & 0xf) << 32)
  #define QI_DEV_EIOTLB_SID(sid)((u64)((sid) & 0x) << 16)
  #define QI_DEV_EIOTLB_QDEP(qd)((u64)((qd) & 0x1f) << 4)
@@ -704,7 +703,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, 
u32 pasid, u64 addr,
  
  void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,

  u32 pasid, u16 qdep, u64 addr,
- unsigned int size_order, u64 granu);
+ unsigned int size_order);
  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
  int pasid);
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users

2020-06-25 Thread Lu Baolu

On 2020/6/25 1:07, Jacob Pan wrote:

On Wed, 24 Jun 2020 14:54:49 +0800
Lu Baolu  wrote:


Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:

IOMMU UAPI data has a user filled argsz field which indicates the
data length comes with the API call. User data is not trusted,
argsz must be validated based on the current kernel data size,
mandatory data size, and feature flags.

User data may also be extended, results in possible argsz increase.
Backward compatibility is ensured based on size and flags checking.
Details are documented in Documentation/userspace-api/iommu.rst

This patch adds sanity checks in both IOMMU layer and vendor code,
where VT-d is the only user for now.

Signed-off-by: Liu Yi L
Signed-off-by: Jacob Pan
---
   drivers/iommu/intel/svm.c |  3 ++
   drivers/iommu/iommu.c | 96
---
include/linux/iommu.h |  7 ++-- 3 files changed, 98
insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 713b3a218483..237db56878c0 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain
*domain, struct device *dev, data->format !=
IOMMU_PASID_FORMAT_INTEL_VTD) return -EINVAL;
   
+	if (data->argsz != offsetofend(struct

iommu_gpasid_bind_data, vendor.vtd))
+   return -EINVAL;

Need to do size check in intel_iommu_sva_invalidate() as well?


No need. The difference is that there is no
vendor specific union for intel_iommu_sva_invalidate().

Generic flags are used to process invalidation data inside
intel_iommu_sva_invalidate().


Thanks for the explanation. With the nit tweaked,

Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu