Re: [PATCH 1/2] iommu/vt-d: Move Kconfig and Makefile bits down into intel directory

2020-06-12 Thread Lu Baolu

Hi Jerry,

On 2020/6/13 7:10, Jerry Snitselaar wrote:

Move Intel Kconfig and Makefile bits down into intel directory
with the rest of the Intel specific files.

Cc: Joerg Roedel 
Cc: Lu Baolu 


Thanks!

Reviewed-by: Lu Baolu 

Best regards,
baolu


Signed-off-by: Jerry Snitselaar 
---
  drivers/iommu/Kconfig| 86 +---
  drivers/iommu/Makefile   |  8 +---
  drivers/iommu/intel/Kconfig  | 86 
  drivers/iommu/intel/Makefile |  7 +++
  4 files changed, 96 insertions(+), 91 deletions(-)
  create mode 100644 drivers/iommu/intel/Kconfig
  create mode 100644 drivers/iommu/intel/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aca76383f201..b12d4ec124f6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -176,91 +176,7 @@ config AMD_IOMMU_DEBUGFS
  This option is -NOT- intended for production environments, and should
  not generally be enabled.
  
-# Intel IOMMU support

-config DMAR_TABLE
-   bool
-
-config INTEL_IOMMU
-   bool "Support for Intel IOMMU using DMA Remapping Devices"
-   depends on PCI_MSI && ACPI && (X86 || IA64)
-   select IOMMU_API
-   select IOMMU_IOVA
-   select NEED_DMA_MAP_STATE
-   select DMAR_TABLE
-   select SWIOTLB
-   select IOASID
-   help
- DMA remapping (DMAR) devices support enables independent address
- translations for Direct Memory Access (DMA) from devices.
- These DMA remapping devices are reported via ACPI tables
- and include PCI device scope covered by these DMA
- remapping devices.
-
-config INTEL_IOMMU_DEBUGFS
-   bool "Export Intel IOMMU internals in Debugfs"
-   depends on INTEL_IOMMU && IOMMU_DEBUGFS
-   help
- !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING!!!
-
- Expose Intel IOMMU internals in Debugfs.
-
- This option is -NOT- intended for production environments, and should
- only be enabled for debugging Intel IOMMU.
-
-config INTEL_IOMMU_SVM
-   bool "Support for Shared Virtual Memory with Intel IOMMU"
-   depends on INTEL_IOMMU && X86
-   select PCI_PASID
-   select PCI_PRI
-   select MMU_NOTIFIER
-   select IOASID
-   help
- Shared Virtual Memory (SVM) provides a facility for devices
- to access DMA resources through process address space by
- means of a Process Address Space ID (PASID).
-
-config INTEL_IOMMU_DEFAULT_ON
-   def_bool y
-   prompt "Enable Intel DMA Remapping Devices by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable a DMAR device at boot time if
- one is found. If this option is not selected, DMAR support can
- be enabled by passing intel_iommu=on to the kernel.
-
-config INTEL_IOMMU_BROKEN_GFX_WA
-   bool "Workaround broken graphics drivers (going away soon)"
-   depends on INTEL_IOMMU && BROKEN && X86
-   ---help---
- Current Graphics drivers tend to use physical address
- for DMA and avoid using DMA APIs. Setting this config
- option permits the IOMMU driver to set a unity map for
- all the OS-visible memory. Hence the driver can continue
- to use physical addresses for DMA, at least until this
- option is removed in the 2.6.32 kernel.
-
-config INTEL_IOMMU_FLOPPY_WA
-   def_bool y
-   depends on INTEL_IOMMU && X86
-   ---help---
- Floppy disk drivers are known to bypass DMA API calls
- thereby failing to work when IOMMU is enabled. This
- workaround will setup a 1:1 mapping for the first
- 16MiB to make floppy (an ISA device) work.
-
-config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
-   bool "Enable Intel IOMMU scalable mode by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable by default the scalable mode if
- hardware presents the capability. The scalable mode is defined in
- VT-d 3.0. The scalable mode capability could be checked by reading
- /sys/devices/virtual/iommu/dmar*/intel-iommu/ecap. If this option
- is not selected, scalable mode support could also be enabled by
- passing intel_iommu=sm_on to the kernel. If not sure, please use
- the default value.
+source "drivers/iommu/intel/Kconfig"
  
  config IRQ_REMAP

bool "Support for Interrupt Remapping"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..71dd2f382e78 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,4 +1,5 @@
  # SPDX-License-Identifier: GPL-2.0
+obj-y += intel/
  obj-$(CONFIG_IOMMU_API) += iommu.o
  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -17,13 +18,8 @@ obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
  arm_smmu-ob

Re: [PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-12 Thread kernel test robot
Hi Jordan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on next-20200612]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next 
rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm64-randconfig-r026-20200612 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
3b43f006294971b8049d4807110032169780e5b8)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> drivers/iommu/arm-smmu-qcom.c:17:49: error: member reference type 'struct 
>> device *' is a pointer; did you mean to use '->'?
return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
^
->
1 error generated.

vim +17 drivers/iommu/arm-smmu-qcom.c

14  
15  static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain 
*smmu_domain)
16  {
  > 17  return of_device_is_compatible(smmu_domain->dev.of_node, 
"qcom,adreno");
18  }
19  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-12 Thread Fenghua Yu
A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
  #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c  | 23 +
 drivers/iommu/intel/svm.c| 39 
 3 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..0f78d5cdddfe 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -436,6 +437,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return false;
+   if (!static_cpu_has(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
@@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs 
*regs, long error_code)
int ret;
 
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+   /*
+* Perform the check for a user mode PASID exception before enable
+* interrupts. Doing this here ensures that the PASID MSR can be simply
+* accessed because the contents are known to be still associated
+* with the current process.
+*/
+   if (user_mode(regs) && fixup_pasid_exception()) {
+   cond_local_irq_enable(regs);
+   return;
+   }
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 27dc866b8461..81fd2380c0f9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,42 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
 }
+
+/*
+ * Apply some heuristics to see if the #GP fault was caused by a thread
+ * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
+ * is the problem, try initializing the IA32_PASID MSR. If the heuristic
+ * guesses incorrectly, take one more #GP fault.
+ */
+bool __fixup_pasid_exception(void)
+{
+   u64 pasid_msr;
+   unsigned int pasid;
+
+   /*
+* This function is called only when this #GP was triggered from user
+* space. So the mm cannot be NULL.
+*/
+   pasid = current->mm->pasid;
+   /* If the mm doesn't have a valid PASID, then can't help. */
+   if (invalid_pasid(pasid))
+   return false;
+
+   /*
+* Since IRQ is disabled now, the current task still owns the FPU on
+* this CPU and the PASID MSR can be 

[PATCH v2 01/12] iommu: Change type of pasid to unsigned int

2020-06-12 Thread Fenghua Yu
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with ioasid's
type, define PASID and its variations (e.g. max PASID) as "unsigned int".

No PASID type change in uapi.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c |  5 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  2 +-
 drivers/iommu/amd/amd_iommu.h  | 13 
 drivers/iommu/amd/amd_iommu_types.h| 12 
 drivers/iommu/amd/init.c   |  4 +--
 drivers/iommu/amd/iommu.c  | 41 ++
 drivers/iommu/amd/iommu_v2.c   | 22 +++---
 drivers/iommu/intel/debugfs.c  |  2 +-
 drivers/iommu/intel/dmar.c | 13 
 drivers/iommu/intel/intel-pasid.h  | 21 ++---
 drivers/iommu/intel/iommu.c|  4 +--
 drivers/iommu/intel/pasid.c| 36 +++---
 drivers/iommu/intel/svm.c  | 12 
 drivers/iommu/iommu.c  |  2 +-
 drivers/misc/uacce/uacce.c |  2 +-
 include/linux/amd-iommu.h  |  9 +++---
 include/linux/intel-iommu.h| 18 +--
 include/linux/intel-svm.h  |  2 +-
 include/linux/iommu.h  |  8 ++---
 include/linux/uacce.h  |  2 +-
 20 files changed, 121 insertions(+), 109 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 7c8786b9eb0a..703d23deca76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -139,7 +139,8 @@ void kfd_iommu_unbind_process(struct kfd_process *p)
 }
 
 /* Callback for process shutdown invoked by the IOMMU driver */
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev,
+ unsigned int pasid)
 {
struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
struct kfd_process *p;
@@ -185,7 +186,7 @@ static void iommu_pasid_shutdown_callback(struct pci_dev 
*pdev, int pasid)
 }
 
 /* This function called by IOMMU driver on PPR failure */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, unsigned int pasid,
unsigned long address, u16 flags)
 {
struct kfd_dev *dev;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f0587d94294d..3c7d1f774afe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -714,7 +714,7 @@ struct kfd_process {
/* We want to receive a notification when the mm_struct is destroyed */
struct mmu_notifier mmu_notifier;
 
-   uint16_t pasid;
+   unsigned int pasid;
unsigned int doorbell_index;
 
/*
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f892992c8744..0914b5b6f879 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,12 +45,13 @@ extern int amd_iommu_register_ppr_notifier(struct 
notifier_block *nb);
 extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
 extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
 extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
-extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+extern int amd_iommu_flush_page(struct iommu_domain *dom, unsigned int pasid,
u64 address);
-extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
-extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
-unsigned long cr3);
-extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
+extern int amd_iommu_flush_tlb(struct iommu_domain *dom, unsigned int pasid);
+extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom,
+unsigned int pasid, unsigned long cr3);
+extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom,
+  unsigned int pasid);
 extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
 
 #ifdef CONFIG_IRQ_REMAP
@@ -66,7 +67,7 @@ static inline int amd_iommu_create_irq_domain(struct 
amd_iommu *iommu)
 #define PPR_INVALID0x1
 #define PPR_FAILURE0xf
 
-extern int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_complete_ppr(struct pci_dev *pdev, unsigned int pasid,
  int status, int tag);
 
 static inline bool is_rd890_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/amd/amd_iommu_types.h 

[PATCH v2 02/12] ocxl: Change type of pasid to unsigned int

2020-06-12 Thread Fenghua Yu
PASID is defined as "int" although it's a 20-bit value and shouldn't be
negative int. To be consistent with type defined in iommu, define PASID
as "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  ocxl (Thomas)

 drivers/misc/ocxl/config.c|  3 ++-
 drivers/misc/ocxl/link.c  |  6 +++---
 drivers/misc/ocxl/ocxl_internal.h |  6 +++---
 drivers/misc/ocxl/pasid.c |  2 +-
 drivers/misc/ocxl/trace.h | 20 ++--
 include/misc/ocxl.h   |  6 +++---
 6 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..22d034caed3d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -806,7 +806,8 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
 }
 EXPORT_SYMBOL_GPL(ocxl_config_set_TL);
 
-int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int 
pasid)
+int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
+   unsigned int pasid)
 {
u32 val;
unsigned long timeout;
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..931f6ae022db 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -492,7 +492,7 @@ static u64 calculate_cfg_state(bool kernel)
return state;
 }
 
-int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
u64 amr, struct mm_struct *mm,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
@@ -572,7 +572,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 }
 EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
 
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
@@ -608,7 +608,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 
tid)
return rc;
 }
 
-int ocxl_link_remove_pe(void *link_handle, int pasid)
+int ocxl_link_remove_pe(void *link_handle, unsigned int pasid)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 345bf843a38e..3ca982ba7472 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -41,7 +41,7 @@ struct ocxl_afu {
struct ocxl_afu_config config;
int pasid_base;
int pasid_count; /* opened contexts */
-   int pasid_max; /* maximum number of contexts */
+   unsigned int pasid_max; /* maximum number of contexts */
int actag_base;
int actag_enabled;
struct mutex contexts_lock;
@@ -69,7 +69,7 @@ struct ocxl_xsl_error {
 
 struct ocxl_context {
struct ocxl_afu *afu;
-   int pasid;
+   unsigned int pasid;
struct mutex status_mutex;
enum ocxl_context_status status;
struct address_space *mapping;
@@ -128,7 +128,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
  * pasid: the PASID for the AFU context
  * tid: the new thread id for the process element
  */
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid);
 
 int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
index d14cb56e6920..a151fc8f0bec 100644
--- a/drivers/misc/ocxl/pasid.c
+++ b/drivers/misc/ocxl/pasid.c
@@ -80,7 +80,7 @@ static void range_free(struct list_head *head, u32 start, u32 
size,
 
 int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
 {
-   int max_pasid;
+   unsigned int max_pasid;
 
if (fn->config.max_pasid_log < 0)
return -ENOSPC;
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..019e2fc63b1d 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -9,13 +9,13 @@
 #include 
 
 DECLARE_EVENT_CLASS(ocxl_context,
-   TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+   TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr),
 
TP_STRUCT__entry(
__field(pid_t, pid)
__field(void*, spa)
-   __field(int, pasid)
+   __field(unsigned int, pasid)
__field(u32, pidr)
__field(u32, tidr)
),
@@ -38,21 +38,21 @@ DECLARE_EVENT_CLASS(ocxl_context,
 );
 
 DEFINE_EVENT(ocxl_context, ocxl_co

[PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-06-12 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 287 
 2 files changed, 288 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..1e52208c7dda
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Shared Virtual Addressing (SVA) with ENQCMD
+===
+
+Background
+==
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM)
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+==
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific api
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+intel_svm_bind_mm(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  future access by the device to this virtual address to participate in
+  ATS. If the IOMMU responds with proper response that a page is not
+  present, the device would request the pa

[PATCH v2 09/12] fork: Clear PASID for new mm

2020-06-12 Thread Fenghua Yu
When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch to initialize PASID value for a new mm.

 include/linux/mm_types.h | 2 ++
 kernel/fork.c| 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5778db3aa42d..904bc07411a9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+/* Initial PASID value is 0. */
+#define INIT_PASID 0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..085e72d3e9eb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_PCI_PASID
+   mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+   mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/12] x86/msr-index: Define IA32_PASID MSR

2020-06-12 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/12] x86/process: Clear PASID state for a newly forked/cloned thread

2020-06-12 Thread Fenghua Yu
The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead to setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify init_task_pasid().

 arch/x86/kernel/process.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned 
long tls)
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+   struct ia32_pasid_state *ppasid;
+
+   /*
+* Initialize the PASID state so that the PASID MSR will be
+* initialized to its initial state (0) by XRSTORS when the task is
+* scheduled for the first time.
+*/
+   ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+   if (ppasid)
+   ppasid->pasid = INIT_PASID;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long sp,
task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
+   if (static_cpu_has(X86_FEATURE_ENQCMD))
+   init_task_pasid(p);
+
/* Set a new TLS for the child thread? */
if (clone_flags & CLONE_SETTLS)
ret = set_new_tls(p, tls);
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 06/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-06-12 Thread Fenghua Yu
From: Yu-cheng Yu 

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 10 ++
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c  |  4 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
+   XFEATURE_PASID,
 
XFEATURE_MAX,
 };
@@ -128,6 +129,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT   (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID(1 << XFEATURE_PASID)
 
 #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512   (XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
u32 pad;
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+   u64 pasid;
+} __packed;
+
 struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h 
b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
  XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)"  ,
"Protection Keys User registers",
+   "PASID state",
"unknown xstate feature",
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+   X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+   print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU,  struct pkru_state);
+   XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
 
/*
 * Make *SURE* to add any feature numbers in below if
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 11/12] x86/mmu: Allocate/free PASID

2020-06-12 Thread Fenghua Yu
A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
  (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
  pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

 arch/x86/include/asm/iommu.h   |   2 +
 arch/x86/include/asm/mmu_context.h |  14 
 drivers/iommu/intel/svm.c  | 101 +
 3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
 }
 
+void __free_pasid(struct mm_struct *mm);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..f8c91ce8c451 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
init_new_context_ldt(mm);
return 0;
 }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return;
+
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   __free_pasid(mm);
+}
+
 static inline void destroy_context(struct mm_struct *mm)
 {
destroy_context_ldt(mm);
+   free_pasid(mm);
 }
 
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4e775e12ae52..27dc866b8461 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned 
int pasid)
return ret;
 }
 
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+ bool new_pasid)
+{
+   if (new_pasid)
+   ioasid_free(svm->pasid);
+   kfree(svm);
+   kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+  unsigned int pasid_max, bool *new_pasid,
+  unsigned int flags)
+{
+   unsigned int pasid;
+
+   *new_pasid = false;
+
+   /*
+* Reuse the PASID if the mm already has a PASID and not a private
+* PASID is requested.
+*/
+   if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+   /*
+* Once a PASID is allocated for this mm, the PASID
+* stays with the mm until the mm is dropped. Reuse
+* the PASID which has been already allocated for the
+* mm instead of allocating a new one.
+*/
+   ioasid_set_data(mm->pasid, svm);
+
+   return mm->pasid;
+   }
+
+   /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+   pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+   if (pasid != INVALID_IOASID) {
+   /* A new pasid is allocated. */
+   *new_pasid = true;
+   }
+
+   return pasid;
+}
+
 /* Caller must hold pasid_mutex, mm reference */
 static int
 intel_svm_bind_mm(struct device *dev, unsigned int flags,
@@ -518,6 +565,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
init_rcu_head(&sdev->rcu);
 
if (!svm) {
+   bool new_pasid;
+
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
ret = -ENOMEM;
@@ -529,12 +578,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
 
-   /* Do not use PASID 0, reserved for RID to PASID */
-   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
- pasid_max - 1, svm);
+   svm->pasid = alloc_pasid(svm, mm, pasid_max, &new_pasid, flags);
if (svm->pasid == INVALID_IOASID) {
-   kfree(svm);
-   kfree(sdev);
+   free_bind(svm, sdev, new_pasid);
   

[PATCH v2 05/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-06-12 Thread Fenghua Yu
Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS 
instructions */
 
 /* AMD-defined CPU features, CPUID level 0x8007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 03/12] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-06-12 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b5618341b4b1..4e775e12ae52 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned 
int pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 44fa8879f829..9abc30cf10fc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -759,7 +759,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
unsigned int pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 00/12] x86: tag application address space for devices

2020-06-12 Thread Fenghua Yu
Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support was released on the top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patch 1-3). These patches can be in a
   separate series.
   - Define "pasid" as "unsigned int" consistently (patch 1 and 2).
   - Define "flags" as "unsigned int"
2. Explain different various technical terms used in the series (patch 4).
3. Enumerate support for ENQCMD in the processor (patch 5).
4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
5. Define "pasid" in mm_struct (patch 8).
5. Clear PASID state for new mm and forked and cloned thread (patch 9-10).
6. Allocate and free PASID for a process (patch 11).
7. Fix up the PASID MSR in #GP handler when one thread in a process
   executes ENQCMD for the first time (patches 12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Chang log:
v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually same patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (10):
  iommu: Change type of pasid to unsigned int
  ocxl: Change type of pasid to unsigned int
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  fork: Clear PASID for new mm
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/mmu: Allocate/free PASID
  x86/traps: Fix up invalid PASID

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst|   1 +
 Documentation/x86/sva.rst  | 287 +
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/fpu/types.h   |  10 +
 arch/x86/include/asm/fpu/xstate.h  |   2 +-
 arch/x86/include/asm/iommu.h   |   3 +
 arch/x86/include/asm/mmu_context.h |  14 ++
 arch/x86/include/asm/msr-index.h   |   3 +
 arch/x86/kernel/cpu/cpuid-deps.c   |   1 +
 arch/x86/kernel/fpu/xstate.c   |   4 +
 arch/x86/kernel/process.c  |  18 ++
 arch/x86/kernel/traps.c|  23 ++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |   2 +-
 drivers/iommu/amd/amd_iommu.h  |  13 +-
 drivers/iommu/amd/amd_iommu_types.h|  12 +-
 drivers/iommu/amd/init.c   |   4 +-
 drivers/iommu/amd/iommu.c  |  41 ++--
 drivers/iommu/amd/iommu_v2.c   |  22 +-
 drivers/iommu/intel/debugfs.c  |   2 +-
 drivers/iommu/intel/dmar.c |  13 +-
 drivers/iommu/intel/intel-pasid.h  |  21 +-
 drivers/iommu/intel/iommu.c|   4 +-
 drivers/iommu/intel/pasid.c|  36 ++--
 drivers/iommu/intel/svm.c  | 159 --
 drivers/iommu/iommu.c  |   2 +-
 drivers/misc/ocxl/config.c |   3 +-
 drivers/misc/ocxl/link.c   |   6 +-
 drivers/misc/ocxl/ocxl_internal.h  |   6 +-
 drivers/misc/ocxl/pasid.c  |   2 +-
 drivers/misc/ocxl/trace.h

[PATCH v2 08/12] mm: Define pasid in mm

2020-06-12 Thread Fenghua Yu
PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christopher's comment: 
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually same patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-phili...@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..5778db3aa42d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_PCI_PASID
+   unsigned int pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/amd: Move Kconfig and Makefile bits down into amd directory

2020-06-12 Thread Jerry Snitselaar
Move AMD Kconfig and Makefile bits down into the amd directory
with the rest of the AMD specific files.

Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/Kconfig  | 45 +-
 drivers/iommu/Makefile |  5 +
 drivers/iommu/amd/Kconfig  | 44 +
 drivers/iommu/amd/Makefile |  4 
 4 files changed, 50 insertions(+), 48 deletions(-)
 create mode 100644 drivers/iommu/amd/Kconfig
 create mode 100644 drivers/iommu/amd/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b12d4ec124f6..78a8be0053b3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -132,50 +132,7 @@ config IOMMU_PGTABLES_L2
def_bool y
depends on MSM_IOMMU && MMU && SMP && CPU_DCACHE_DISABLE=n
 
-# AMD IOMMU support
-config AMD_IOMMU
-   bool "AMD IOMMU support"
-   select SWIOTLB
-   select PCI_MSI
-   select PCI_ATS
-   select PCI_PRI
-   select PCI_PASID
-   select IOMMU_API
-   select IOMMU_IOVA
-   select IOMMU_DMA
-   depends on X86_64 && PCI && ACPI
-   ---help---
- With this option you can enable support for AMD IOMMU hardware in
- your system. An IOMMU is a hardware component which provides
- remapping of DMA memory accesses from devices. With an AMD IOMMU you
- can isolate the DMA memory of different devices and protect the
- system from misbehaving device drivers or hardware.
-
- You can find out if your system has an AMD IOMMU if you look into
- your BIOS for an option to enable it or if you have an IVRS ACPI
- table.
-
-config AMD_IOMMU_V2
-   tristate "AMD IOMMU Version 2 driver"
-   depends on AMD_IOMMU
-   select MMU_NOTIFIER
-   ---help---
- This option enables support for the AMD IOMMUv2 features of the IOMMU
- hardware. Select this option if you want to use devices that support
- the PCI PRI and PASID interface.
-
-config AMD_IOMMU_DEBUGFS
-   bool "Enable AMD IOMMU internals in DebugFS"
-   depends on AMD_IOMMU && IOMMU_DEBUGFS
-   ---help---
- !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY, -REALLY- KNOW WHAT YOU 
ARE DOING!!!
- Exposes AMD IOMMU device internals in DebugFS.
-
- This option is -NOT- intended for production environments, and should
- not generally be enabled.
-
+source "drivers/iommu/amd/Kconfig"
 source "drivers/iommu/intel/Kconfig"
 
 config IRQ_REMAP
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 71dd2f382e78..f356bc12b1c7 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y += intel/
+obj-y += amd/ intel/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -12,9 +12,6 @@ obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
-obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
-obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
-obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
 arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
new file mode 100644
index ..1f061d91e0b8
--- /dev/null
+++ b/drivers/iommu/amd/Kconfig
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# AMD IOMMU support
+config AMD_IOMMU
+   bool "AMD IOMMU support"
+   select SWIOTLB
+   select PCI_MSI
+   select PCI_ATS
+   select PCI_PRI
+   select PCI_PASID
+   select IOMMU_API
+   select IOMMU_IOVA
+   select IOMMU_DMA
+   depends on X86_64 && PCI && ACPI
+   help
+ With this option you can enable support for AMD IOMMU hardware in
+ your system. An IOMMU is a hardware component which provides
+ remapping of DMA memory accesses from devices. With an AMD IOMMU you
+ can isolate the DMA memory of different devices and protect the
+ system from misbehaving device drivers or hardware.
+
+ You can find out if your system has an AMD IOMMU if you look into
+ your BIOS for an option to enable it or if you have an IVRS ACPI
+ table.
+
+config AMD_IOMMU_V2
+   tristate "AMD IOMMU Version 2 driver"
+   depends on AMD_IOMMU
+   select MMU_NOTIFIER
+   help
+ This option enables support for the AMD IOMMUv2 features of the IOMMU
+ hardware. Select this option if you want to use devices that support
+ the PCI PRI and PASID interface.
+
+config AMD_IOMMU_DEBUGFS
+   bool "Enable AMD IOMMU internals in DebugFS"
+   depends on AMD_IOMMU && IOMM

[PATCH 0/2] iommu: Move AMD and Intel Kconfig + Makefile bits into their directories

2020-06-12 Thread Jerry Snitselaar
This patchset imeplements the suggestion from Linus to move the Kconfig
and Makefile bits for AMD and Intel into their respective directories.
It also cleans up a couple Kconfig entries to use the newer help
attribute instead of ---help--- (complaint from checkpatch).

Jerry Snitselaar (2):
  iommu/vt-d: Move Kconfig and Makefile bits down into intel directory
  iommu/amd: Move Kconfig and Makefile bits down into amd directory


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/vt-d: Move Kconfig and Makefile bits down into intel directory

2020-06-12 Thread Jerry Snitselaar
Move Intel Kconfig and Makefile bits down into intel directory
with the rest of the Intel specific files.

Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/Kconfig| 86 +---
 drivers/iommu/Makefile   |  8 +---
 drivers/iommu/intel/Kconfig  | 86 
 drivers/iommu/intel/Makefile |  7 +++
 4 files changed, 96 insertions(+), 91 deletions(-)
 create mode 100644 drivers/iommu/intel/Kconfig
 create mode 100644 drivers/iommu/intel/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aca76383f201..b12d4ec124f6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -176,91 +176,7 @@ config AMD_IOMMU_DEBUGFS
  This option is -NOT- intended for production environments, and should
  not generally be enabled.
 
-# Intel IOMMU support
-config DMAR_TABLE
-   bool
-
-config INTEL_IOMMU
-   bool "Support for Intel IOMMU using DMA Remapping Devices"
-   depends on PCI_MSI && ACPI && (X86 || IA64)
-   select IOMMU_API
-   select IOMMU_IOVA
-   select NEED_DMA_MAP_STATE
-   select DMAR_TABLE
-   select SWIOTLB
-   select IOASID
-   help
- DMA remapping (DMAR) devices support enables independent address
- translations for Direct Memory Access (DMA) from devices.
- These DMA remapping devices are reported via ACPI tables
- and include PCI device scope covered by these DMA
- remapping devices.
-
-config INTEL_IOMMU_DEBUGFS
-   bool "Export Intel IOMMU internals in Debugfs"
-   depends on INTEL_IOMMU && IOMMU_DEBUGFS
-   help
- !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING!!!
-
- Expose Intel IOMMU internals in Debugfs.
-
- This option is -NOT- intended for production environments, and should
- only be enabled for debugging Intel IOMMU.
-
-config INTEL_IOMMU_SVM
-   bool "Support for Shared Virtual Memory with Intel IOMMU"
-   depends on INTEL_IOMMU && X86
-   select PCI_PASID
-   select PCI_PRI
-   select MMU_NOTIFIER
-   select IOASID
-   help
- Shared Virtual Memory (SVM) provides a facility for devices
- to access DMA resources through process address space by
- means of a Process Address Space ID (PASID).
-
-config INTEL_IOMMU_DEFAULT_ON
-   def_bool y
-   prompt "Enable Intel DMA Remapping Devices by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable a DMAR device at boot time if
- one is found. If this option is not selected, DMAR support can
- be enabled by passing intel_iommu=on to the kernel.
-
-config INTEL_IOMMU_BROKEN_GFX_WA
-   bool "Workaround broken graphics drivers (going away soon)"
-   depends on INTEL_IOMMU && BROKEN && X86
-   ---help---
- Current Graphics drivers tend to use physical address
- for DMA and avoid using DMA APIs. Setting this config
- option permits the IOMMU driver to set a unity map for
- all the OS-visible memory. Hence the driver can continue
- to use physical addresses for DMA, at least until this
- option is removed in the 2.6.32 kernel.
-
-config INTEL_IOMMU_FLOPPY_WA
-   def_bool y
-   depends on INTEL_IOMMU && X86
-   ---help---
- Floppy disk drivers are known to bypass DMA API calls
- thereby failing to work when IOMMU is enabled. This
- workaround will setup a 1:1 mapping for the first
- 16MiB to make floppy (an ISA device) work.
-
-config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
-   bool "Enable Intel IOMMU scalable mode by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable by default the scalable mode if
- hardware presents the capability. The scalable mode is defined in
- VT-d 3.0. The scalable mode capability could be checked by reading
- /sys/devices/virtual/iommu/dmar*/intel-iommu/ecap. If this option
- is not selected, scalable mode support could also be enabled by
- passing intel_iommu=sm_on to the kernel. If not sure, please use
- the default value.
+source "drivers/iommu/intel/Kconfig"
 
 config IRQ_REMAP
bool "Support for Interrupt Remapping"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..71dd2f382e78 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-y += intel/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -17,13 +18,8 @@ obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
 arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
-obj-$(CONFIG_DMAR_TABLE) += intel/dma

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-12 Thread Jacob Pan
Hi Jon,

On Thu, 11 Jun 2020 10:30:32 +0100
Jonathan Cameron  wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan  wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered  
> 
> mechanics 
> 
will fix,

> > in this documentation.
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan   
> Mostly seems sensible.  A few comments / queries inline.
> 
> Jonathan
> 
> > ---
> >  Documentation/userspace-api/iommu.rst | 210
> > ++ 1 file changed, 210 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index ..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=
> > +IOMMU Userspace API
> > +=
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU)
> > is  
> 
> wherein _a_ virtual IOMMU 
right,

> 
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on a older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective.   
> 
> This statement has me concerned.  If a TLB invalidation fails, but
> is reported to the guest as successful do we have possible breaking
> of iommu isolation guarantees?
> 
Good point. we should never report success if TLB invalidation fails.
Perhaps reword as:
"For example, TLB invalidations shoul

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-12 Thread Jacob Pan
On Thu, 11 Jun 2020 14:33:08 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 2020/6/11 12:12, Jacob Pan wrote:
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered in this documentation.
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >   Documentation/userspace-api/iommu.rst | 210
> > ++ 1 file changed, 210 insertions(+)
> >   create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index ..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=
> > +IOMMU Userspace API
> > +=
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on a older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective. There is no architectural way to report
> > back to the vIOMMU +if the UAPI data is incompatible. For this
> > reason the following IOMMU +UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +5. Page response
> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Only backward compatibility is supported. For example, an
> > +older QEMU (with older kernel header) can

Re: [git pull] iommu: Move Intel and AMD drivers to a subdirectory

2020-06-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Jun 2020 17:22:10 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-drivers-move-v5.8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8f02f363f76f99f08117336cfac7f24c76b25be3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [git pull] iommu: Move Intel and AMD drivers to a subdirectory

2020-06-12 Thread Linus Torvalds
On Fri, Jun 12, 2020 at 8:22 AM Joerg Roedel  wrote:
>
> I am not sure it is the right time to send this.

Looks good to me. Any time a directory starts to have a lot of
filenames with a particular prefix, moving them deeper like this seems
to make sense. And doing it just before the -rc1 release and avoiding
unnecessary conflicts seems like the right time too.

So pulled.

Looking at it, it might even be worth moving the Kconfig and Makefile
details down to the intel/amd subdirectories, and have them be
included from the main iommu ones? But that's up to you.

Thanks,

Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-12 Thread Jordan Crouse
On Thu, Jun 11, 2020 at 08:22:29PM -0700, Rob Clark wrote:
> On Thu, Jun 11, 2020 at 3:29 PM Jordan Crouse  wrote:
> >
> > Add support for using per-instance pagetables if all the dependencies are
> > available.
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++-
> >  drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
> >  2 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index a1589e040c57..5e82b85d4d55 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer 
> > *ring, u32 counter,
> > OUT_RING(ring, upper_32_bits(iova));
> >  }
> >
> > +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer 
> > *ring,
> > +   struct msm_file_private *ctx)
> > +{
> > +   phys_addr_t ttbr;
> > +   u32 asid;
> > +
> > +   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
> > +   return;
> > +
> > +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > +   OUT_RING(ring, 0);
> > +
> > +   /* Turn on APIV mode to access critical regions */
> > +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> > +   OUT_RING(ring, 1);
> > +
> > +   /* Make sure the ME is synchronized before staring the update */
> > +   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
> > +
> > +   /* Execute the table update */
> > +   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
> > +   OUT_RING(ring, lower_32_bits(ttbr));
> > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > +   /* CONTEXTIDR is currently unused */
> > +   OUT_RING(ring, 0);
> > +   /* CONTEXTBANK is currently unused */
> > +   OUT_RING(ring, 0);
> 
> I can add this to xml (on userspace side, we've been describing packet
> payload in xml and using the generated builders), and update generated
> headers, if you agree to not add more open-coded pkt7 building ;-)

But open coding opcode is so much fun! :)  Its fine to put this in the XML. It
can only be executed from the ringbuffer FWIW.

> > +
> > +   /*
> > +* Write the new TTBR0 to the memstore. This is good for debugging.
> > +*/
> > +   OUT_PKT7(ring, CP_MEM_WRITE, 4);
> > +   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
> > +   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
> > +   OUT_RING(ring, lower_32_bits(ttbr));
> > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > +
> > +   /* Invalidate the draw state so we start off fresh */
> > +   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
> > +   OUT_RING(ring, 0x4);
> > +   OUT_RING(ring, 1);
> > +   OUT_RING(ring, 0);
> 
> Ie, this would look like:
> 
> OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
> OUT_RING(ring, CP_SET_DRAW_STATE__0_COUNT(0) |
> CP_SET_DRAW_STATE__0_DISABLE_ALL_GROUPS |
> CP_SET_DRAW_STATE__0_GROUP_ID(0));
> OUT_RING(ring, CP_SET_DRAW_STATE__1_ADDR_LO(1));
> OUT_RING(ring, CP_SET_DRAW_STATE__2_ADDR_HI(0));
> 
> .. but written that way, I think you meant ADDR_LO to be zero?
> 
> (it is possible we need to regen headers for that to work, the kernel
> headers are somewhat out of date by now)

As we discussed on IRC this bit isn't needed because the CP_SMMU_TABLE_UPDATE
handles it for us.  I'll remove that.

> BR,
> -R

Jordan

> > +
> > +   /* Turn off APRIV */
> > +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> > +   OUT_RING(ring, 0);
> > +
> > +   /* Turn off protected mode */
> > +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > +   OUT_RING(ring, 1);
> > +}
> > +
> >  static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> > struct msm_file_private *ctx)
> >  {
> > @@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> > msm_gem_submit *submit,
> > struct msm_ringbuffer *ring = submit->ring;
> > unsigned int i;
> >
> > +   a6xx_set_pagetable(gpu, ring, ctx);
> > +
> > get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
> > rbmemptr_stats(ring, index, cpcycles_start));
> >
> > @@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
> > return (unsigned long)busy_time;
> >  }
> >
> > +struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu 
> > *gpu)
> > +{
> > +   struct msm_mmu *mmu;
> > +
> > +   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
> > +   if (IS_ERR(mmu))
> > +   return msm_gem_address_space_get(gpu->aspace);
> > +
> > +   return msm_gem_address_space_create(mmu,
> > +   "gpu", 0x1ULL, 0x1ULL);
> > +}
> > +
> >  static const struct adreno_gpu_funcs funcs = {
> > .base = {
> >

[git pull] iommu: Move Intel and AMD drivers to a subdirectory

2020-06-12 Thread Joerg Roedel
Hi Linus,

I am not sure it is the right time to send this. The patches below have
not been part of the IOMMU updates for v5.8, in the AMD case because it
made the merge conflicts even worse, and in the Intel case because it
was not done yet.

It would be good to have both drivers moved in v5.8-rc1, mostly because
it avoids conflicts between fixes and v5.9 updates later in the cycle.
But it is also no problem to defer these changes, I can put them again
into the IOMMU tree and the end of the v5.8 cycle when you feel this
should have been ready earlier. With that said:

The following changes since commit 431275afdc7155415254aef4bd3816a1b8a2ead0:

  iommu: Check for deferred attach in iommu_group_do_dma_attach() (2020-06-04 
11:38:17 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-drivers-move-v5.8

for you to fetch changes up to 672cf6df9b8a3a6d70a6a5c30397f76fa40d3178:

  iommu/vt-d: Move Intel IOMMU driver into subdirectory (2020-06-10 17:46:43 
+0200)


IOMMU drivers directory structure cleanup:

- Move the Intel and AMD IOMMU drivers into their own
  subdirectory. Both drivers consist of several files by now and
  giving them their own directory unclutters the IOMMU top-level
  directory a bit.


Joerg Roedel (2):
  iommu/amd: Move AMD IOMMU driver into subdirectory
  iommu/vt-d: Move Intel IOMMU driver into subdirectory

 MAINTAINERS|  5 ++---
 drivers/iommu/Makefile | 18 +-
 drivers/iommu/{ => amd}/amd_iommu.h|  0
 drivers/iommu/{ => amd}/amd_iommu_types.h  |  0
 drivers/iommu/{amd_iommu_debugfs.c => amd/debugfs.c}   |  0
 drivers/iommu/{amd_iommu_init.c => amd/init.c} |  2 +-
 drivers/iommu/{amd_iommu.c => amd/iommu.c} |  2 +-
 drivers/iommu/{amd_iommu_v2.c => amd/iommu_v2.c}   |  0
 drivers/iommu/{amd_iommu_quirks.c => amd/quirks.c} |  0
 .../iommu/{intel-iommu-debugfs.c => intel/debugfs.c}   |  0
 drivers/iommu/{ => intel}/dmar.c   |  2 +-
 drivers/iommu/{ => intel}/intel-pasid.h|  0
 drivers/iommu/{intel-iommu.c => intel/iommu.c} |  2 +-
 .../{intel_irq_remapping.c => intel/irq_remapping.c}   |  2 +-
 drivers/iommu/{intel-pasid.c => intel/pasid.c} |  0
 drivers/iommu/{intel-svm.c => intel/svm.c} |  0
 drivers/iommu/{intel-trace.c => intel/trace.c} |  0
 17 files changed, 16 insertions(+), 17 deletions(-)
 rename drivers/iommu/{ => amd}/amd_iommu.h (100%)
 rename drivers/iommu/{ => amd}/amd_iommu_types.h (100%)
 rename drivers/iommu/{amd_iommu_debugfs.c => amd/debugfs.c} (100%)
 rename drivers/iommu/{amd_iommu_init.c => amd/init.c} (99%)
 rename drivers/iommu/{amd_iommu.c => amd/iommu.c} (99%)
 rename drivers/iommu/{amd_iommu_v2.c => amd/iommu_v2.c} (100%)
 rename drivers/iommu/{amd_iommu_quirks.c => amd/quirks.c} (100%)
 rename drivers/iommu/{intel-iommu-debugfs.c => intel/debugfs.c} (100%)
 rename drivers/iommu/{ => intel}/dmar.c (99%)
 rename drivers/iommu/{ => intel}/intel-pasid.h (100%)
 rename drivers/iommu/{intel-iommu.c => intel/iommu.c} (99%)
 rename drivers/iommu/{intel_irq_remapping.c => intel/irq_remapping.c} (99%)
 rename drivers/iommu/{intel-pasid.c => intel/pasid.c} (100%)
 rename drivers/iommu/{intel-svm.c => intel/svm.c} (100%)
 rename drivers/iommu/{intel-trace.c => intel/trace.c} (100%)

Please pull.

Thanks,

Joerg


signature.asc
Description: Digital signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: arm64 iommu groups issue

2020-06-12 Thread Lorenzo Pieralisi
On Mon, Feb 17, 2020 at 12:08:48PM +, John Garry wrote:
> > > 
> > > Right, and even worse is that it relies on the port driver even
> > > existing at all.
> > > 
> > > All this iommu group assignment should be taken outside device
> > > driver probe paths.
> > > 
> > > However we could still consider device links for sync'ing the SMMU
> > > and each device probing.
> > 
> > Yes, we should get that for DT now thanks to the of_devlink stuff, but
> > cooking up some equivalent for IORT might be worthwhile.
> 
> It doesn't solve this problem, but at least we could remove the iommu_ops
> check in iort_iommu_xlate().
> 
> We would need to carve out a path from pci_device_add() or even device_add()
> to solve all cases.
> 
> > 
> > > > Another thought that crosses my mind is that when pci_device_group()
> > > > walks up to the point of ACS isolation and doesn't find an existing
> > > > group, it can still infer that everything it walked past *should* be put
> > > > in the same group it's then eventually going to return. Unfortunately I
> > > > can't see an obvious way for it to act on that knowledge, though, since
> > > > recursive iommu_probe_device() is unlikely to end well.
> > > 
> 
> [...]
> 
> > > And this looks to be the reason for which current
> > > iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails
> > > also.
> > 
> > Of course, just adding a 'correct' add_device replay without the
> > of_xlate process doesn't help at all. No wonder this looked suspiciously
> > simpler than where the first idea left off...
> > 
> > (on reflection, the core of this idea seems to be recycling the existing
> > iommu_bus_init walk rather than building up a separate "waiting list",
> > while forgetting that that wasn't the difficult part of the original
> > idea anyway)
> 
> We could still use a bus walk to add the group per iommu, but we would need
> an additional check to ensure the device is associated with the IOMMU.
> 
> > 
> > > On this current code mentioned, the principle of this seems wrong to
> > > me - we call bus_for_each_device(..., add_iommu_group) for the first
> > > SMMU in the system which probes, but we attempt to add_iommu_group()
> > > for all devices on the bus, even though the SMMU for that device may
> > > yet to have probed.
> > 
> > Yes, iommu_bus_init() is one of the places still holding a
> > deeply-ingrained assumption that the ops go live for all IOMMU instances
> > at once, which is what warranted the further replay in
> > of_iommu_configure() originally. Moving that out of
> > of_platform_device_create() to support probe deferral is where the
> > trouble really started.
> 
> I'm not too familiar with the history here, but could this be reverted now
> with the introduction of of_devlink stuff?

Hi John,

have we managed to reach a consensus on this thread on how to solve
the issue ? Asking because this thread seems stalled - I am keen on
getting it fixed.

Thanks,
Lorenzo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-12 Thread Jacob Pan
On Fri, 12 Jun 2020 07:38:44 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan 
> > Sent: Friday, June 12, 2020 8:27 AM
> > 
> > On Thu, 11 Jun 2020 14:40:47 -0600
> > Alex Williamson  wrote:
> >   
> > > On Thu, 11 Jun 2020 12:52:05 -0700
> > > Jacob Pan  wrote:
> > >  
> > > > Hi Alex,
> > > >
> > > > On Thu, 11 Jun 2020 09:47:41 -0600
> > > > Alex Williamson  wrote:
> > > >  
> > > > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > > > Jacob Pan  wrote:
> > > > >  
> > > > > > IOMMU UAPI is newly introduced to support communications  
> > between  
> > > > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > > > discussions on how it should work with VFIO UAPI and
> > > > > > userspace in general.
> > > > > >
> > > > > > This document is indended to clarify the UAPI design and
> > > > > > usage. The mechenics of how future extensions should be
> > > > > > achieved are also covered in this documentation.
> > > > > >
> > > > > > Signed-off-by: Liu Yi L 
> > > > > > Signed-off-by: Jacob Pan 
> > > > > > ---
> > > > > >  Documentation/userspace-api/iommu.rst | 210
> > > > > > ++ 1 file changed, 210
> > > > > > insertions(+) create mode 100644
> > > > > > Documentation/userspace-api/iommu.rst
> > > > > >
> > > > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > > > index ..e95dc5a04a41
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > > > @@ -0,0 +1,210 @@
> > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > +.. iommu:
> > > > > > +
> > > > > > +=
> > > > > > +IOMMU Userspace API
> > > > > > +=
> > > > > > +
> > > > > > +IOMMU UAPI is used for virtualization cases where
> > > > > > communications are +needed between physical and virtual
> > > > > > IOMMU drivers. For native +usage, IOMMU is a system device
> > > > > > which does not need to communicate +with user space
> > > > > > directly. +
> > > > > > +The primary use cases are guest Shared Virtual Address
> > > > > > (SVA) and +guest IO virtual address (IOVA), wherein virtual
> > > > > > IOMMU (vIOMMU) is +required to communicate with the
> > > > > > physical IOMMU in the host. +
> > > > > > +.. contents:: :local:
> > > > > > +
> > > > > > +Functionalities
> > > > > > +
> > > > > > +Communications of user and kernel involve both directions.
> > > > > > The +supported user-kernel APIs are as follows:
> > > > > > +
> > > > > > +1. Alloc/Free PASID
> > > > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > > > +4. Invalidate IOMMU caches
> > > > > > +5. Service page request
> > > > > > +
> > > > > > +Requirements
> > > > > > +
> > > > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > > > following +requirements:
> > > > > > +
> > > > > > +1. Emulated and para-virtualised vIOMMUs
> > > > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > > > +3. Extensions to the UAPI shall not break existing user
> > > > > > space +
> > > > > > +Interfaces
> > > > > > +
> > > > > > +Although the data structures defined in IOMMU UAPI are
> > > > > > self-contained, +there is no user API functions introduced.
> > > > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > > > driver frameworks such as VFIO. +
> > > > > > +Extension Rules & Precautions
> > > > > > +-
> > > > > > +When IOMMU UAPI gets extended, the data structures can
> > > > > > *only* be +modified in two ways:
> > > > > > +
> > > > > > +1. Adding new fields by re-purposing the padding[] field.
> > > > > > No size change. +2. Adding new union members at the end. May
> > > > > > increase in size. +
> > > > > > +No new fields can be added *after* the variable size union
> > > > > > in that it +will break backward compatibility when offset
> > > > > > moves. In both cases, a +new flag must be accompanied with
> > > > > > a new field such that the IOMMU +driver can process the
> > > > > > data based on the new flag. Version field is +only reserved
> > > > > > for the unlikely event of UAPI upgrade at its entirety. +
> > > > > > +It's *always* the caller's responsibility to indicate the
> > > > > > size of the +structure passed by setting argsz
> > > > > > appropriately. +
> > > > > > +When IOMMU UAPI extension results in size increase, user
> > > > > > such as VFIO +has to handle the following scenarios:
> > > > > > +
> > > > > > +1. User and kernel has exact size match
> > > > > > +2. An older user with older kernel header (smaller UAPI
> > > > > > size) running on a
> > > > > > +   newer kernel (larger UAPI size)
> > > > > > +3. A newer user with newer kernel head

RE: [PATCH v2 02/15] iommu: Report domain nesting info

2020-06-12 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, June 12, 2020 3:30 AM
> 
> On Thu, 11 Jun 2020 05:15:21 -0700
> Liu Yi L  wrote:
> 
> > IOMMUs that support nesting translation needs report the capability
> > info to userspace, e.g. the format of first level/stage paging structures.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > actual user of it. right? So I'm wondering if we can reuse
> > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your opinions?
> >
> >  include/linux/iommu.h  |  1 +
> >  include/uapi/linux/iommu.h | 34 ++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 78a26ae..f6e4b49 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -126,6 +126,7 @@ enum iommu_attr {
> > DOMAIN_ATTR_FSL_PAMUV1,
> > DOMAIN_ATTR_NESTING,/* two stages of translation */
> > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > +   DOMAIN_ATTR_NESTING_INFO,
> > DOMAIN_ATTR_MAX,
> >  };
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 303f148..02eac73 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> > };
> >  };
> >
> > +struct iommu_nesting_info {
> > +   __u32   size;
> > +   __u32   format;
> > +   __u32   features;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID   (1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL  (1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2)
> > +   __u32   flags;
> > +   __u8data[];
> > +};
> > +
> > +/*
> > + * @flags: VT-d specific flags. Currently reserved for future
> > + * extension.
> > + * @addr_width:The output addr width of first level/stage translation
> > + * @pasid_bits:Maximum supported PASID bits, 0 represents no PASID
> > + * support.
> > + * @cap_reg:   Describe basic capabilities as defined in VT-d 
> > capability
> > + * register.
> > + * @cap_mask:  Mark valid capability bits in @cap_reg.
> > + * @ecap_reg:  Describe the extended capabilities as defined in VT-d
> > + * extended capability register.
> > + * @ecap_mask: Mark the valid capability bits in @ecap_reg.
> 
> Please explain this a little further, why do we need to tell userspace about
> cap/ecap register bits that aren't valid through this interface?
> Thanks,

we only want to tell userspace about the bits marked in the cap/ecap_mask.
cap/ecap_mask is kind of white-list of the cap/ecap register. userspace should
only care about the bits in the white-list, for other bits, it should ignore.

Regards,
Yi Liu

> Alex
> 
> 
> > + */
> > +struct iommu_nesting_info_vtd {
> > +   __u32   flags;
> > +   __u16   addr_width;
> > +   __u16   pasid_bits;
> > +   __u64   cap_reg;
> > +   __u64   cap_mask;
> > +   __u64   ecap_reg;
> > +   __u64   ecap_mask;
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/3] docs: IOMMU user API

2020-06-12 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Friday, June 12, 2020 8:27 AM
> 
> On Thu, 11 Jun 2020 14:40:47 -0600
> Alex Williamson  wrote:
> 
> > On Thu, 11 Jun 2020 12:52:05 -0700
> > Jacob Pan  wrote:
> >
> > > Hi Alex,
> > >
> > > On Thu, 11 Jun 2020 09:47:41 -0600
> > > Alex Williamson  wrote:
> > >
> > > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > > Jacob Pan  wrote:
> > > >
> > > > > IOMMU UAPI is newly introduced to support communications
> between
> > > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > > discussions on how it should work with VFIO UAPI and userspace
> > > > > in general.
> > > > >
> > > > > This document is indended to clarify the UAPI design and usage.
> > > > > The mechenics of how future extensions should be achieved are
> > > > > also covered in this documentation.
> > > > >
> > > > > Signed-off-by: Liu Yi L 
> > > > > Signed-off-by: Jacob Pan 
> > > > > ---
> > > > >  Documentation/userspace-api/iommu.rst | 210
> > > > > ++ 1 file changed, 210
> > > > > insertions(+) create mode 100644
> > > > > Documentation/userspace-api/iommu.rst
> > > > >
> > > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > > index ..e95dc5a04a41
> > > > > --- /dev/null
> > > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > > @@ -0,0 +1,210 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > +.. iommu:
> > > > > +
> > > > > +=
> > > > > +IOMMU Userspace API
> > > > > +=
> > > > > +
> > > > > +IOMMU UAPI is used for virtualization cases where
> > > > > communications are +needed between physical and virtual IOMMU
> > > > > drivers. For native +usage, IOMMU is a system device which does
> > > > > not need to communicate +with user space directly.
> > > > > +
> > > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > > and +guest IO virtual address (IOVA), wherein virtual IOMMU
> > > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > > the host. +
> > > > > +.. contents:: :local:
> > > > > +
> > > > > +Functionalities
> > > > > +
> > > > > +Communications of user and kernel involve both directions. The
> > > > > +supported user-kernel APIs are as follows:
> > > > > +
> > > > > +1. Alloc/Free PASID
> > > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > > +4. Invalidate IOMMU caches
> > > > > +5. Service page request
> > > > > +
> > > > > +Requirements
> > > > > +
> > > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > > following +requirements:
> > > > > +
> > > > > +1. Emulated and para-virtualised vIOMMUs
> > > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > > +3. Extensions to the UAPI shall not break existing user space
> > > > > +
> > > > > +Interfaces
> > > > > +
> > > > > +Although the data structures defined in IOMMU UAPI are
> > > > > self-contained, +there is no user API functions introduced.
> > > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > > driver frameworks such as VFIO. +
> > > > > +Extension Rules & Precautions
> > > > > +-
> > > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > > be +modified in two ways:
> > > > > +
> > > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > > size change. +2. Adding new union members at the end. May
> > > > > increase in size. +
> > > > > +No new fields can be added *after* the variable size union in
> > > > > that it +will break backward compatibility when offset moves.
> > > > > In both cases, a +new flag must be accompanied with a new field
> > > > > such that the IOMMU +driver can process the data based on the
> > > > > new flag. Version field is +only reserved for the unlikely
> > > > > event of UAPI upgrade at its entirety. +
> > > > > +It's *always* the caller's responsibility to indicate the size
> > > > > of the +structure passed by setting argsz appropriately.
> > > > > +
> > > > > +When IOMMU UAPI extension results in size increase, user such
> > > > > as VFIO +has to handle the following scenarios:
> > > > > +
> > > > > +1. User and kernel has exact size match
> > > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > > running on a
> > > > > +   newer kernel (larger UAPI size)
> > > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > > running
> > > > > +   on a older kernel.
> > > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > > within
> > > > > +   range. The data may contain garbage.
> > > > > +
> > > > > +
> > > > > +Feature Checking
> > > > > +-