Re: [PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-11 Thread kernel test robot
Hi Jordan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next 
rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
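
For example, with the base commit used in this report, the command might look like:

    git format-patch --base=b961f8dc8976c091180839f4483d67b7c2ca2578 -7 -o outgoing/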

url:
https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm64-randconfig-s031-20200612 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-250-g42323db3-dirty
# save the attached .config to linux build tree
make W=1 C=1 ARCH=arm64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

drivers/iommu/arm-smmu-qcom.c: In function 'qcom_adreno_smmu_is_gpu_device':
>> drivers/iommu/arm-smmu-qcom.c:17:49: error: 'smmu_domain->dev' is a pointer; 
>> did you mean to use '->'?
17 |  return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
| ^
| ->
>> drivers/iommu/arm-smmu-qcom.c:18:1: warning: control reaches end of non-void 
>> function [-Wreturn-type]
18 | }
| ^

vim +17 drivers/iommu/arm-smmu-qcom.c

14  
15  static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
16  {
  > 17  return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
  > 18  }
19  
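
As the diagnostic suggests, the fix is presumably a pointer dereference:

    return of_device_is_compatible(smmu_domain->dev->of_node, "qcom,adreno");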

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 6/7] drm/msm: Set the global virtual address range from the IOMMU domain

2020-06-11 Thread kernel test robot
Hi Jordan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next 
rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>, old ones prefixed by <<):

In file included from include/linux/bitops.h:5,
from include/linux/kernel.h:12,
from include/linux/ascii85.h:11,
from drivers/gpu/drm/msm/adreno/adreno_gpu.c:9:
drivers/gpu/drm/msm/adreno/adreno_gpu.c: In function 
'adreno_iommu_create_address_space':
include/linux/bits.h:37:11: warning: right shift count is negative 
[-Wshift-count-negative]
37 |   (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
|   ^~
include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^
>> drivers/gpu/drm/msm/adreno/adreno_gpu.c:206:11: note: in expansion of macro 
>> 'GENMASK'
206 |   start & GENMASK(48, 0), size);
|   ^~~
--
In file included from include/linux/bits.h:6,
from include/linux/bitops.h:5,
from include/linux/kernel.h:12,
from drivers/gpu/drm/msm/msm_drv.h:11,
from drivers/gpu/drm/msm/msm_iommu.c:7:
drivers/gpu/drm/msm/msm_iommu.c: In function 'msm_iommu_map':
>> include/vdso/bits.h:7:26: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
7 | #define BIT(nr)   (UL(1) << (nr))
|  ^~
>> drivers/gpu/drm/msm/msm_iommu.c:40:13: note: in expansion of macro 'BIT'
40 |  if (iova & BIT(48))
| ^~~
In file included from include/linux/bitops.h:5,
from include/linux/kernel.h:12,
from drivers/gpu/drm/msm/msm_drv.h:11,
from drivers/gpu/drm/msm/msm_iommu.c:7:
>> include/linux/bits.h:36:22: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
36 |  (((~UL(0)) - (UL(1) << (l)) + 1) & \
|  ^~
include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^
>> drivers/gpu/drm/msm/msm_iommu.c:41:11: note: in expansion of macro 'GENMASK'
41 |   iova |= GENMASK(63, 49);
|   ^~~
include/linux/bits.h:37:11: warning: right shift count is negative 
[-Wshift-count-negative]
37 |   (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
|   ^~
include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^
>> drivers/gpu/drm/msm/msm_iommu.c:41:11: note: in expansion of macro 'GENMASK'
41 |   iova |= GENMASK(63, 49);
|   ^~~
In file included from include/linux/bits.h:6,
from include/linux/bitops.h:5,
from include/linux/kernel.h:12,
from drivers/gpu/drm/msm/msm_drv.h:11,
from drivers/gpu/drm/msm/msm_iommu.c:7:
drivers/gpu/drm/msm/msm_iommu.c: In function 'msm_iommu_unmap':
>> include/vdso/bits.h:7:26: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
7 | #define BIT(nr)   (UL(1) << (nr))
|  ^~
drivers/gpu/drm/msm/msm_iommu.c:53:13: note: in expansion of macro 'BIT'
53 |  if (iova & BIT(48))
| ^~~
In file included from include/linux/bitops.h:5,
from include/linux/kernel.h:12,
from drivers/gpu/drm/msm/msm_drv.h:11,
from drivers/gpu/drm/msm/msm_iommu.c:7:
>> include/linux/bits.h:36:22: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
36 |  (((~UL(0)) - (UL(1) << (l)) + 1) & \
|  ^~
include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^
drivers/gpu/drm/msm/msm_iommu.c:54:11: note: in expansion of macro 'GENMASK'
54 |   iova |= GENMASK(63, 49);
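
All of these warnings share one root cause: this is a 32-bit arm build, where
unsigned long is 32 bits, so BIT(48) and GENMASK(63, 49) shift past the width
of the type. A minimal sketch of a fix, assuming the iova math is meant to be
64-bit, is to use the ULL variants, which operate on unsigned long long
regardless of BITS_PER_LONG:

    if (iova & BIT_ULL(48))
            iova |= GENMASK_ULL(63, 49);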

Re: [PATCH v8 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain

2020-06-11 Thread kernel test robot
Hi Jordan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.7 next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next 
rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm64-randconfig-s031-20200612 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-250-g42323db3-dirty
# save the attached .config to linux build tree
make W=1 C=1 ARCH=arm64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

drivers/iommu/arm-smmu.c: In function 'arm_smmu_init_domain_context':
>> drivers/iommu/arm-smmu.c:804:21: error: 'dev' undeclared (first use in this 
>> function); did you mean 'cdev'?
804 |  smmu_domain->dev = dev;
| ^~~
| cdev
drivers/iommu/arm-smmu.c:804:21: note: each undeclared identifier is reported 
only once for each function it appears in

vim +804 drivers/iommu/arm-smmu.c

   669  
   670  static int arm_smmu_init_domain_context(struct iommu_domain *domain,
   671  struct arm_smmu_device *smmu)
   672  {
   673  int irq, start, ret = 0;
   674  unsigned long ias, oas;
   675  struct io_pgtable_ops *pgtbl_ops;
   676  struct io_pgtable_cfg pgtbl_cfg;
   677  enum io_pgtable_fmt fmt;
   678  struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
   679  struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
   680  
   681  mutex_lock(&smmu_domain->init_mutex);
   682  if (smmu_domain->smmu)
   683  goto out_unlock;
   684  
   685  if (domain->type == IOMMU_DOMAIN_IDENTITY) {
   686  smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
   687  smmu_domain->smmu = smmu;
   688  goto out_unlock;
   689  }
   690  
   691  /*
   692   * Mapping the requested stage onto what we support is surprisingly
   693   * complicated, mainly because the spec allows S1+S2 SMMUs without
   694   * support for nested translation. That means we end up with the
   695   * following table:
   696   *
   697   * Requested        Supported        Actual
   698   * S1               N                S1
   699   * S1               S1+S2            S1
   700   * S1               S2               S2
   701   * S1               S1               S1
   702   * N                N                N
   703   * N                S1+S2            S2
   704   * N                S2               S2
   705   * N                S1               S1
   706   *
   707   * Note that you can't actually request stage-2 mappings.
   708   */
   709  if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
   710  smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
   711  if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
   712  smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
   713  
   714  /*
   715   * Choosing a suitable context format is even more fiddly. Until we
   716   * grow some way for the caller to express a preference, and/or move
   717   * the decision into the io-pgtable code where it arguably belongs,
   718   * just aim for the closest thing to the rest of the system, and hope
   719   * that the hardware isn't esoteric enough that we can't assume AArch64
   720   * support to be a superset of AArch32 support...
   721   */
   722  if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
   723  cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
   724  if (IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) &&
   725  !IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_ARM_LPAE) &&
   726  (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S) &&
   727  (smmu_domain->stage == ARM_SMMU_DOMAIN_S1))
   728  cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_S;
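
A hedged sketch of one possible fix (illustrative only; the actual respin may
differ) is to plumb the device through from the attach path so that 'dev' is
in scope when it is assigned:

    static int arm_smmu_init_domain_context(struct iommu_domain *domain,
                                            struct arm_smmu_device *smmu,
                                            struct device *dev)
    {
            ...
            smmu_domain->dev = dev;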
   

[PATCH v2 1/2] iommu: Mark __iommu_map/__iommu_map_sg as static

2020-06-11 Thread Baolin Wang
The __iommu_map() and __iommu_map_sg() functions are now used only in
iommu.c, so mark them as static.

Signed-off-by: Baolin Wang 
---
 drivers/iommu/iommu.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8584f48..14eca9f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2174,8 +2174,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
return pgsize;
 }
 
-int __iommu_map(struct iommu_domain *domain, unsigned long iova,
- phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
const struct iommu_ops *ops = domain->ops;
unsigned long orig_iova = iova;
@@ -2325,9 +2325,9 @@ size_t iommu_unmap_fast(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);
 
-size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-   struct scatterlist *sg, unsigned int nents, int prot,
-   gfp_t gfp)
+static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+   struct scatterlist *sg, unsigned int nents, int prot,
+   gfp_t gfp)
 {
size_t len = 0, mapped = 0;
phys_addr_t start;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] Some improvements for iommu

2020-06-11 Thread Baolin Wang
Hi,

The first patch marks some functions as static, and the second patch
changes to use the gfp parameter from iommu_ops->map() to allocate
ARM page table pages. Any comments are welcome. Thanks.

Changes from v1:
 - Fix the building errors when enabling CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST

Baolin Wang (2):
  iommu: Mark __iommu_map/__iommu_map_sg as static
  iommu: Add gfp parameter to io_pgtable_ops->map()

 drivers/gpu/drm/panfrost/panfrost_mmu.c |  2 +-
 drivers/iommu/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm-smmu.c|  2 +-
 drivers/iommu/io-pgtable-arm-v7s.c  | 18 +-
 drivers/iommu/io-pgtable-arm.c  | 18 +-
 drivers/iommu/iommu.c   | 10 +-
 drivers/iommu/ipmmu-vmsa.c  |  2 +-
 drivers/iommu/msm_iommu.c   |  2 +-
 drivers/iommu/mtk_iommu.c   |  2 +-
 drivers/iommu/qcom_iommu.c  |  2 +-
 include/linux/io-pgtable.h  |  2 +-
 11 files changed, 31 insertions(+), 31 deletions(-)

-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] iommu: Add gfp parameter to io_pgtable_ops->map()

2020-06-11 Thread Baolin Wang
The ARM page tables are currently always allocated with GFP_ATOMIC, but
the iommu_ops->map() function gained a gfp_t parameter in
commit 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map").
Thus io_pgtable_ops->map() should use the gfp parameter passed down from
iommu_ops->map() to allocate page table pages, which avoids draining the
memory allocator's atomic pools in non-atomic contexts.

Signed-off-by: Baolin Wang 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  2 +-
 drivers/iommu/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm-smmu.c|  2 +-
 drivers/iommu/io-pgtable-arm-v7s.c  | 18 +-
 drivers/iommu/io-pgtable-arm.c  | 18 +-
 drivers/iommu/ipmmu-vmsa.c  |  2 +-
 drivers/iommu/msm_iommu.c   |  2 +-
 drivers/iommu/mtk_iommu.c   |  2 +-
 drivers/iommu/qcom_iommu.c  |  2 +-
 include/linux/io-pgtable.h  |  2 +-
 10 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index ed28aeb..5a39eee 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -262,7 +262,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct 
panfrost_mmu *mmu,
while (len) {
size_t pgsize = get_pgsize(iova | paddr, len);
 
-   ops->map(ops, iova, paddr, pgsize, prot);
+   ops->map(ops, iova, paddr, pgsize, prot, GFP_KERNEL);
iova += pgsize;
paddr += pgsize;
len -= pgsize;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677..7b59f06 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2850,7 +2850,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
if (!ops)
return -ENODEV;
 
-   return ops->map(ops, iova, paddr, size, prot);
+   return ops->map(ops, iova, paddr, size, prot, gfp);
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4c..dc1d253 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1227,7 +1227,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return -ENODEV;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->map(ops, iova, paddr, size, prot);
+   ret = ops->map(ops, iova, paddr, size, prot, gfp);
arm_smmu_rpm_put(smmu);
 
return ret;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 4272fe4..a688f22 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -470,7 +470,7 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte 
*table,
 
 static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot,
-int lvl, arm_v7s_iopte *ptep)
+int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
 {
struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_v7s_iopte pte, *cptep;
@@ -491,7 +491,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
/* Grab a pointer to the next level */
pte = READ_ONCE(*ptep);
if (!pte) {
-   cptep = __arm_v7s_alloc_table(lvl + 1, GFP_ATOMIC, data);
+   cptep = __arm_v7s_alloc_table(lvl + 1, gfp, data);
if (!cptep)
return -ENOMEM;
 
@@ -512,11 +512,11 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
}
 
/* Rinse, repeat */
-   return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep);
+   return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
 }
 
 static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot)
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
struct io_pgtable *iop = &data->iop;
@@ -530,7 +530,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned 
long iova,
paddr >= (1ULL << data->iop.cfg.oas)))
return -ERANGE;
 
-   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd);
+   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -922,12 +922,12 @@ static int __init arm_v7s_do_selftests(void)
if (ops->map(ops, iova, iova, 
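
For context, the pattern this series establishes is that the gfp flags chosen
by the caller flow unchanged through iommu_ops->map() into the page-table
allocator. A minimal sketch with illustrative names (example_map and
to_example_domain are not real symbols):

    static int example_map(struct iommu_domain *domain, unsigned long iova,
                           phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
    {
            struct io_pgtable_ops *ops = to_example_domain(domain)->pgtbl_ops;

            /* gfp now reaches the page-table allocations instead of a
             * hard-coded GFP_ATOMIC */
            return ops->map(ops, iova, paddr, size, prot, gfp);
    }

A sleepable caller passing GFP_KERNEL therefore no longer drains the atomic
pools.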

Re: [PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-11 Thread Rob Clark
On Thu, Jun 11, 2020 at 3:29 PM Jordan Crouse  wrote:
>
> Add support for using per-instance pagetables if all the dependencies are
> available.
>
> Signed-off-by: Jordan Crouse 
> ---
>
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++-
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
>  2 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index a1589e040c57..5e82b85d4d55 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, 
> u32 counter,
> OUT_RING(ring, upper_32_bits(iova));
>  }
>
> +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer 
> *ring,
> +   struct msm_file_private *ctx)
> +{
> +   phys_addr_t ttbr;
> +   u32 asid;
> +
> +   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
> +   return;
> +
> +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> +   OUT_RING(ring, 0);
> +
> +   /* Turn on APRIV mode to access critical regions */
> +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> +   OUT_RING(ring, 1);
> +
> +   /* Make sure the ME is synchronized before starting the update */
> +   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
> +
> +   /* Execute the table update */
> +   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
> +   OUT_RING(ring, lower_32_bits(ttbr));
> +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> +   /* CONTEXTIDR is currently unused */
> +   OUT_RING(ring, 0);
> +   /* CONTEXTBANK is currently unused */
> +   OUT_RING(ring, 0);

I can add this to xml (on userspace side, we've been describing packet
payload in xml and using the generated builders), and update generated
headers, if you agree to not add more open-coded pkt7 building ;-)

> +
> +   /*
> +* Write the new TTBR0 to the memstore. This is good for debugging.
> +*/
> +   OUT_PKT7(ring, CP_MEM_WRITE, 4);
> +   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
> +   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
> +   OUT_RING(ring, lower_32_bits(ttbr));
> +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> +
> +   /* Invalidate the draw state so we start off fresh */
> +   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
> +   OUT_RING(ring, 0x4);
> +   OUT_RING(ring, 1);
> +   OUT_RING(ring, 0);

Ie, this would look like:

OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
OUT_RING(ring, CP_SET_DRAW_STATE__0_COUNT(0) |
CP_SET_DRAW_STATE__0_DISABLE_ALL_GROUPS |
CP_SET_DRAW_STATE__0_GROUP_ID(0));
OUT_RING(ring, CP_SET_DRAW_STATE__1_ADDR_LO(1));
OUT_RING(ring, CP_SET_DRAW_STATE__2_ADDR_HI(0));

.. but written that way, I think you meant ADDR_LO to be zero?

(it is possible we need to regen headers for that to work, the kernel
headers are somewhat out of date by now)

BR,
-R

> +
> +   /* Turn off APRIV */
> +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> +   OUT_RING(ring, 0);
> +
> +   /* Turn off protected mode */
> +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> +   OUT_RING(ring, 1);
> +}
> +
>  static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> struct msm_file_private *ctx)
>  {
> @@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit,
> struct msm_ringbuffer *ring = submit->ring;
> unsigned int i;
>
> +   a6xx_set_pagetable(gpu, ring, ctx);
> +
> get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
> rbmemptr_stats(ring, index, cpcycles_start));
>
> @@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
> return (unsigned long)busy_time;
>  }
>
> +struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu 
> *gpu)
> +{
> +   struct msm_mmu *mmu;
> +
> +   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
> +   if (IS_ERR(mmu))
> +   return msm_gem_address_space_get(gpu->aspace);
> +
> +   return msm_gem_address_space_create(mmu,
> +   "gpu", 0x100000000ULL, 0x1ffffffffULL);
> +}
> +
>  static const struct adreno_gpu_funcs funcs = {
> .base = {
> .get_param = adreno_get_param,
> @@ -893,8 +959,9 @@ static const struct adreno_gpu_funcs funcs = {
>  #if defined(CONFIG_DRM_MSM_GPU_STATE)
> .gpu_state_get = a6xx_gpu_state_get,
> .gpu_state_put = a6xx_gpu_state_put,
> -   .create_address_space = adreno_iommu_create_address_space,
>  #endif
> +   .create_address_space = adreno_iommu_create_address_space,
> +   .address_space_instance = a6xx_address_space_instance,
> },
> .get_timestamp = a6xx_get_timestamp,
>  };
> diff --git 

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Jacob Pan
On Thu, 11 Jun 2020 14:40:47 -0600
Alex Williamson  wrote:

> On Thu, 11 Jun 2020 12:52:05 -0700
> Jacob Pan  wrote:
> 
> > Hi Alex,
> > 
> > On Thu, 11 Jun 2020 09:47:41 -0600
> > Alex Williamson  wrote:
> >   
> > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > Jacob Pan  wrote:
> > > 
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > > 
> > > > This document is intended to clarify the UAPI design and usage.
> > > > The mechanics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > > 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 210
> > > > ++ 1 file changed, 210
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > > 
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index ..e95dc5a04a41
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,210 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=
> > > > +IOMMU Userspace API
> > > > +=
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page request
> > > > +
> > > > +Requirements
> > > > +
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable size union in
> > > > that it +will break backward compatibility when offset moves.
> > > > In both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following scenarios:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on a older kernel.
> > > > +4. A malicious/misbehaving user passes an illegal/invalid size but
> > > > +   within range. The data may contain garbage.
> > > > +
> > > > +
> > > > +Feature Checking
> > > > +
> > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > that host +can support the UAPI data structures to be used for
> > > > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > > > checking, future +faults are difficult to report 
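
A hedged sketch of the copy-and-validate pattern the document describes (names
are illustrative of the pattern, not a definitive implementation):

    unsigned long minsz = offsetofend(struct iommu_cache_invalidate_info, padding);
    struct iommu_cache_invalidate_info inv;

    if (copy_from_user(&inv, (void __user *)arg, minsz))
            return -EFAULT;
    if (inv.argsz < minsz)
            return -EINVAL;
    /* copy any remaining bytes only after argsz has been validated */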

Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data

2020-06-11 Thread Jacob Pan
On Thu, 11 Jun 2020 10:49:36 -0600
Alex Williamson  wrote:

> On Wed, 10 Jun 2020 21:12:14 -0700
> Jacob Pan  wrote:
> 
> > As IOMMU UAPI gets extended, user data size may increase. To support
> > backward compatibility, this patch introduces a size field to each
> > UAPI data structure. It is *always* the responsibility of the
> > user to fill in the correct size.  
> 
> Though at the same time, argsz is user provided data which we don't
> trust.  The argsz field allows the user to indicate how much data
> they're providing, it's still the kernel's responsibility to validate
> whether it's correct and sufficient for the requested operation.
> Thanks,
> 
Yes, will add this clarification.

Thanks,

Jacob
> Alex
> 
> > Specific scenarios for user data handling are documented in:
> > Documentation/userspace-api/iommu.rst
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  include/uapi/linux/iommu.h | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index e907b7091a46..303f148a5cd7 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -135,6 +135,7 @@ enum iommu_page_response_code {
> >  
> >  /**
> >   * struct iommu_page_response - Generic page response information
> > + * @argsz: User filled size of this data
> >   * @version: API version of this structure
> >   * @flags: encodes whether the corresponding fields are valid
> >   * (IOMMU_FAULT_PAGE_RESPONSE_* values)
> > @@ -143,6 +144,7 @@ enum iommu_page_response_code {
> >   * @code: response code from  iommu_page_response_code
> >   */
> >  struct iommu_page_response {
> > +   __u32   argsz;
> >  #define IOMMU_PAGE_RESP_VERSION_1  1
> > __u32   version;
> > #define IOMMU_PAGE_RESP_PASID_VALID    (1 << 0)
> > @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
> >  /**
> >   * struct iommu_cache_invalidate_info - First level/stage
> > invalidation
> >   * information
> > + * @argsz: User filled size of this data
> >   * @version: API version of this structure
> >   * @cache: bitfield that allows to select which caches to
> > invalidate
> >   * @granularity: defines the lowest granularity used for the
> > invalidation: @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
> >   * must support the used granularity.
> >   */
> >  struct iommu_cache_invalidate_info {
> > +   __u32   argsz;
> >  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > __u32   version;
> >  /* IOMMU paging structure cache */
> > @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
> >  
> >  /**
> >   * struct iommu_gpasid_bind_data - Information about device and
> > guest PASID binding
> > + * @argsz: User filled size of this data
> >   * @version:   Version of this data structure
> >   * @format:PASID table entry format
> >   * @flags: Additional information on guest bind request
> > @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
> >   * PASID to host PASID based on this bind data.
> >   */
> >  struct iommu_gpasid_bind_data {
> > +   __u32 argsz;
> > #define IOMMU_GPASID_BIND_VERSION_1    1
> > __u32 version;
> >  #define IOMMU_PASID_FORMAT_INTEL_VTD   1  
> 

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users

2020-06-11 Thread Jacob Pan
On Thu, 11 Jun 2020 14:55:18 -0600
Alex Williamson  wrote:

> On Thu, 11 Jun 2020 13:02:24 -0700
> Jacob Pan  wrote:
> 
> > On Thu, 11 Jun 2020 11:08:16 -0600
> > Alex Williamson  wrote:
> >   
> > > On Wed, 10 Jun 2020 21:12:15 -0700
> > > Jacob Pan  wrote:
> > > 
> > > > IOMMU UAPI data has an argsz field which is filled by user. As
> > > > the data structures expand, argsz may change. As the UAPI data
> > > > are shared among different architectures, extensions of UAPI
> > > > data could be a result of one architecture which has no impact
> > > > on another. Therefore, these argsz sanity checks are performed
> > > > in the model specific IOMMU drivers. This patch adds sanity
> > > > checks in the VT-d to ensure argsz passed by userspace matches
> > > > feature flags and other contents.
> > > > 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/iommu/intel-iommu.c | 16 
> > > >  drivers/iommu/intel-svm.c   | 12 
> > > >  2 files changed, 28 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/intel-iommu.c
> > > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > > +++ b/drivers/iommu/intel-iommu.c
> > > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > > > iommu_domain *domain, struct device *dev, struct
> > > > device_domain_info *info; struct intel_iommu *iommu;
> > > > unsigned long flags;
> > > > +   unsigned long minsz;
> > > > int cache_type;
> > > > u8 bus, devfn;
> > > > u16 did, sid;
> > > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > > > iommu_domain *domain, struct device *dev, if
> > > > (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) return
> > > > -EINVAL; 
> > > > +   minsz = offsetofend(struct iommu_cache_invalidate_info, padding);
> > > 
> > > Would it still be better to look for the end of the last field
> > > that's actually used to avoid the code churn and oversights
> > > if/when the padding field does get used and renamed?
> > > 
> > My thought was that if the padding gets partially re-purposed, the
> > remaining padding would still be valid for minsz check. The
> > extension rule ensures that there is no size change other the
> > variable size union at the end. So use padding would avoid the
> > churn, or i am totally wrong?  
> 
> No, it's trying to predict the future either way.  I figured checking
> minsz against the fields we actually consume allows complete use of
> the padding fields and provides a little leniency to the user.  We'd
> need to be careful though that if those fields are later used by this
> driver, the code would still need to accept the smaller size.  If the
> union was named rather than anonymous we could just use offsetof() to
> avoid directly referencing the padding fields.
>  
I will change it to named union.

Thanks,

> > > Per my comment on patch 1/, this also seems like where the device
> > > specific IOMMU driver should also have the responsibility of
> > > receiving a __user pointer to do the copy_from_user() here.  vfio
> > > can't know which flags require which fields to make a UAPI with
> > > acceptable compatibility guarantees otherwise.
> > > 
> > Right, VFIO cannot do compatibility guarantees, it is just seem to
> > be that VFIO has enough information to copy_from_user sanely &
> > safely and handle over to IOMMU. Please help define the
> > roles/responsibilities in my other email. Then I will follow the
> > guideline.  
> 
> We can keep that part of the discussion in the other thread.  Thanks,
> 
> Alex
> 
> > > > +   if (inv_info->argsz < minsz)
> > > > +   return -EINVAL;
> > > > +
> > > > +   /* Sanity check user filled invalidation data sizes */
> > > > +   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > > > +   inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
> > > > +   addr_info))
> > > > +   return -EINVAL;
> > > > +
> > > > +   if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > > > +   inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
> > > > +   pasid_info))
> > > > +   return -EINVAL;
> > > > +
> > > > spin_lock_irqsave(&device_domain_lock, flags);
> > > > spin_lock(&iommu->lock);
> > > > info = get_domain_info(dev);
> > > > diff --git a/drivers/iommu/intel-svm.c
> > > > b/drivers/iommu/intel-svm.c index 35b43fe819ed..64dc2c66dfff
> > > > 100644 --- a/drivers/iommu/intel-svm.c
> > > > +++ b/drivers/iommu/intel-svm.c
> > > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct
> > > > iommu_domain *domain, struct device *dev, struct dmar_domain
> > > > *dmar_domain; struct intel_svm_dev *sdev;
> > > > struct intel_svm *svm;
> > > > +   unsigned long minsz;
> > > > int ret = 0;
> > 
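
A sketch of the named-union idea discussed above (layout illustrative, not the
final UAPI):

    struct iommu_cache_invalidate_info {
            __u32   argsz;
            __u32   version;
            __u8    cache;
            __u8    granularity;
            __u8    padding[2];
            union {
                    struct iommu_inv_pasid_info pasid_info;
                    struct iommu_inv_addr_info addr_info;
            } granu;
    };

    /* minsz can then be computed without naming the padding field: */
    minsz = offsetof(struct iommu_cache_invalidate_info, granu);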

[PATCH] iommu/arm-smmu: Add a init_context_bank implementation hook

2020-06-11 Thread Jordan Crouse
Add a new implementation hook to allow the implementation specific code
to tweak the context bank configuration just before it gets written.
The first user will be the Adreno GPU implementation, which turns on
SCTLR.HUPCF to ensure that a page fault doesn't terminate pending
transactions. Doing so could hang the GPU if one of the terminated
transactions is a CP read.

This depends on the arm-smmu adreno SMMU implementation [1].

[1] https://patchwork.kernel.org/patch/11600943/

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-qcom.c | 13 +
 drivers/iommu/arm-smmu.c  | 28 +---
 drivers/iommu/arm-smmu.h  | 11 +++
 3 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 6d0ab4865fc7..e5c6345da6fc 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -17,6 +17,18 @@ static bool qcom_adreno_smmu_is_gpu_device(struct 
arm_smmu_domain *smmu_domain)
return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
 }
 
+static void qcom_adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
+   struct arm_smmu_cb *cb)
+{
+   /*
+* On the GPU device we want to process subsequent transactions after a
+* fault to keep the GPU from hanging
+*/
+
+   if (qcom_adreno_smmu_is_gpu_device(smmu_domain))
+   cb->sctlr |= ARM_SMMU_SCTLR_HUPCF;
+}
+
 static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
struct io_pgtable_cfg *pgtbl_cfg)
 {
@@ -92,6 +104,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
.init_context = qcom_adreno_smmu_init_context,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
+   .init_context_bank = qcom_adreno_smmu_init_context_bank,
 };
 
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a06cbcaec247..f0f201ece3a0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -86,13 +86,6 @@ struct arm_smmu_smr {
boolvalid;
 };
 
-struct arm_smmu_cb {
-   u64 ttbr[2];
-   u32 tcr[2];
-   u32 mair[2];
-   struct arm_smmu_cfg *cfg;
-};
-
 struct arm_smmu_master_cfg {
struct arm_smmu_device  *smmu;
s16 smendx[];
@@ -579,6 +572,18 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
}
}
+
+   cb->sctlr = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE |
+   ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M;
+
+   if (stage1)
+   cb->sctlr |= ARM_SMMU_SCTLR_S1_ASIDPNE;
+   if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+   cb->sctlr |= ARM_SMMU_SCTLR_E;
+
+   /* Give the implementation a chance to adjust the configuration */
+   if (smmu_domain->smmu->impl && smmu_domain->smmu->impl->init_context_bank)
+   smmu_domain->smmu->impl->init_context_bank(smmu_domain, cb);
 }
 
 static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
@@ -657,14 +662,7 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
}
 
/* SCTLR */
-   reg = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE |
- ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M;
-   if (stage1)
-   reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
-   if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
-   reg |= ARM_SMMU_SCTLR_E;
-
-   arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
+   arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, cb->sctlr);
 }
 
 /*
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 79d441024043..9b539820997b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -142,6 +142,7 @@ enum arm_smmu_cbar_type {
 
 #define ARM_SMMU_CB_SCTLR  0x0
 #define ARM_SMMU_SCTLR_S1_ASIDPNE  BIT(12)
+#define ARM_SMMU_SCTLR_HUPCF   BIT(8)
 #define ARM_SMMU_SCTLR_CFCFG   BIT(7)
#define ARM_SMMU_SCTLR_CFIE    BIT(6)
#define ARM_SMMU_SCTLR_CFRE    BIT(5)
@@ -349,6 +350,14 @@ struct arm_smmu_domain {
boolaux;
 };
 
+struct arm_smmu_cb {
+   u64 ttbr[2];
+   u32 tcr[2];
+   u32 mair[2];
+   u32 sctlr;
+   struct arm_smmu_cfg *cfg;
+};
+
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
@@ -403,6 +412,8 @@ struct arm_smmu_impl {
void (*tlb_sync)(struct arm_smmu_device 

[PATCH 3/6] iommu/arm-smmu: Add a domain attribute to pass the pagetable config

2020-06-11 Thread Jordan Crouse
The Adreno GPU has the capacity to manage its own pagetables and switch
them dynamically from the hardware. Add a domain attribute for arm-smmu-v2
to get the default pagetable configuration so that the GPU driver can match
the format for its own pagetables.
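
A hedged sketch of how the GPU driver might consume the new attribute
(illustrative):

    struct io_pgtable_cfg cfg;

    if (iommu_domain_get_attr(domain, DOMAIN_ATTR_PGTABLE_CFG, &cfg))
            return -ENODEV;

    /* cfg.ias, cfg.oas and cfg.pgsize_bitmap can now seed the GPU's own
     * io-pgtable so its format matches the SMMU's */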

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 12 
 include/linux/iommu.h|  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 46a96c578592..a06cbcaec247 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1710,6 +1710,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_PGTABLE_CFG: {
+   struct io_pgtable *pgtable;
+   struct io_pgtable_cfg *dest = data;
+
+   if (!smmu_domain->pgtbl_ops)
+   return -ENODEV;
+
+   pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+   memcpy(dest, &pgtable->cfg, sizeof(*dest));
+   return 0;
+   }
default:
return -ENODEV;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..2388117641f1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-11 Thread Jordan Crouse
Add support for using per-instance pagetables if all the dependencies are
available.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++-
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a1589e040c57..5e82b85d4d55 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, 
u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer 
*ring,
+   struct msm_file_private *ctx)
+{
+   phys_addr_t ttbr;
+   u32 asid;
+
+   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
+   return;
+
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn on APRIV mode to access critical regions */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 1);
+
+   /* Make sure the ME is synchronized before starting the update */
+   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+   /* CONTEXTIDR is currently unused */
+   OUT_RING(ring, 0);
+   /* CONTEXTBANK is currently unused */
+   OUT_RING(ring, 0);
+
+   /*
+* Write the new TTBR0 to the memstore. This is good for debugging.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+
+   /* Invalidate the draw state so we start off fresh */
+   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
+   OUT_RING(ring, 0x4);
+   OUT_RING(ring, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn off APRIV */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn off protected mode */
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 1);
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
struct msm_file_private *ctx)
 {
@@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(gpu, ring, ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu *gpu)
+{
+   struct msm_mmu *mmu;
+
+   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+   if (IS_ERR(mmu))
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return msm_gem_address_space_create(mmu,
+   "gpu", 0x100000000ULL, 0x1ffffffffULL);
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -893,8 +959,9 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
+   .address_space_instance = a6xx_address_space_instance,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
volatile uint32_t fence;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+   volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-06-11 Thread Jordan Crouse
Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
base it is left as an exercise to the caller to actually use the pagetable.

Aux domains are supported if split pagetable (TTBR1) support has been
enabled on the master domain.  Each auxiliary domain will reuse the
configuration of the master domain. By default a domain with TTBR1
support will have the TTBR0 region disabled so the first attached aux
domain will enable the TTBR0 region in the hardware and conversely the
last domain to be detached will disable TTBR0 translations.  All subsequent
auxiliary domains create a pagetable but do not touch the hardware.

The leaf driver will be able to query the physical address of the
pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
address with whatever means it has to switch the pagetable base.

Following is a pseudo code example of how a domain can be created

 /* Check to see if aux domains are supported */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
 iommu = iommu_domain_alloc(...);

 if (iommu_aux_attach_device(domain, dev))
 return FAIL;

/* Save the base address of the pagetable for use by the driver */
iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
 }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable.
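
For instance, with the standard IOMMU API:

    iommu_map(domain, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);
    ...
    iommu_unmap(domain, iova, size);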

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 216 ---
 drivers/iommu/arm-smmu.h |   1 +
 2 files changed, 201 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 743d75b9ff3f..46a96c578592 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -667,6 +667,84 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+/*
+ * Update the context bank to enable TTBR0. Assumes AARCH64 S1
+ * configuration.
+ */
+static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   u32 tcr = cb->tcr[0];
+
+   /* Add the TCR configuration from the new pagetable config */
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+
+   /* Make sure that both TTBR0 and TTBR1 are enabled */
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   /* Update the TCR register */
+   cb->tcr[0] = tcr;
+
+   /* Program the new TTBR0 */
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+}
+
+/*
+ * This function assumes that the current model only allows aux domains for
+ * AARCH64 S1 configurations
+ */
+static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain,
+   struct arm_smmu_device *smmu, struct arm_smmu_cfg *master)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *pgtbl_ops;
+   struct io_pgtable_cfg pgtbl_cfg;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   /* Copy the configuration from the master */
+   memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg));
+
+   smmu_domain->flush_ops = &arm_smmu_s1_tlb_ops;
+   smmu_domain->smmu = smmu;
+
+   pgtbl_cfg = (struct io_pgtable_cfg) {
+   .pgsize_bitmap = smmu->pgsize_bitmap,
+   .ias = smmu->va_size,
+   .oas = smmu->ipa_size,
+   .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
+   .tlb = smmu_domain->flush_ops,
+   .iommu_dev = smmu->dev,
+   .quirks = 0,
+   };
+
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
+   pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &pgtbl_cfg,
+   smmu_domain);
+   if (!pgtbl_ops) {
+   mutex_unlock(&smmu_domain->init_mutex);
+   return -ENOMEM;
+   }
+
+   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+
+   domain->geometry.aperture_end = (1UL << smmu->va_size) - 1;
+   domain->geometry.force_aperture = true;
+
+   /* Enable TTBR0 when the first aux domain is attached */
+   if (atomic_inc_return(&smmu->cbs[master->cbndx].aux) == 1) {
+   arm_smmu_context_set_ttbr0(&smmu->cbs[master->cbndx],
+   &pgtbl_cfg);
+   arm_smmu_write_context_bank(smmu, master->cbndx);
+   }
+
+   smmu_domain->pgtbl_ops = pgtbl_ops;
+   return 0;
+}
+
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
struct arm_smmu_device *smmu)
 {
@@ -870,36 +948,70 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
return ret;
 }
 
+static void

[PATCH 4/6] drm/msm: Add support to create a local pagetable

2020-06-11 Thread Jordan Crouse
Add support to create a io-pgtable for use by targets that support
per-instance pagetables.  In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables and auxiliary domains need to be supported and enabled.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
 3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct 
msm_gpu *gpu)
}
 
gpummu->gpu = gpu;
-   msm_mmu_init(&gpummu->base, dev, &funcs);
+   msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
 
return &gpummu->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index bbe129867590..c7efe43388e3 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,192 @@
  * Author: Rob Clark 
  */
 
+#include <linux/io-pgtable.h>
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   struct iommu_domain *aux_domain;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0;
+
+   /* Unmap the block one page at a time */
+   while (size) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   size -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+/*
+ * Given a parent device, create and return an aux domain. This will enable the
+ * TTBR0 region
+ */
+static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent)
+{
+   struct msm_iommu *iommu = to_msm_iommu(parent);
+   struct iommu_domain *domain;
+   int ret;
+
+   if (iommu->aux_domain)
+   return iommu->aux_domain;
+
+   if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX))
+   return ERR_PTR(-ENODEV);
+
+   domain = iommu_domain_alloc(&platform_bus_type);
+   if (!domain)
+   return ERR_PTR(-ENODEV);
+
+   ret = iommu_aux_attach_device(domain, parent->dev);
+   if (ret) {
+   iommu_domain_free(domain);
+   return ERR_PTR(ret);
+   }
+
+   iommu->aux_domain = domain;
+   return domain;
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (ttbr)
+   *ttbr = pagetable->ttbr;
+
+   if (asid)
+   *asid = pagetable->asid;
+
+   return 0;
+}
+
+static const struct msm_mmu_funcs pagetable_funcs = {
+   .map = 

[PATCH 5/6] drm/msm: Add support for address space instances

2020-06-11 Thread Jordan Crouse
Add support for allocating an address space instance. Targets that support
per-instance pagetables should implement their own function to allocate a
new instance. The default will return the existing generic address space.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_drv.c | 15 +--
 drivers/gpu/drm/msm/msm_drv.h |  4 
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +
 drivers/gpu/drm/msm/msm_gpu.c | 17 +
 drivers/gpu/drm/msm/msm_gpu.h |  5 +
 5 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index f6ce40bf3699..0c219b954943 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -599,7 +599,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
 
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = msm_gpu_address_space_instance(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -618,6 +618,8 @@ static int msm_open(struct drm_device *dev, struct drm_file 
*file)
 static void context_close(struct msm_file_private *ctx)
 {
msm_submitqueue_close(ctx);
+
+   msm_gem_address_space_put(ctx->aspace);
kfree(ctx);
 }
 
@@ -782,18 +784,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, 
void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-   struct drm_gem_object *obj, uint64_t *iova)
+   struct drm_file *file, struct drm_gem_object *obj,
+   uint64_t *iova)
 {
-   struct msm_drm_private *priv = dev->dev_private;
+   struct msm_file_private *ctx = file->driver_priv;
 
-   if (!priv->gpu)
+   if (!ctx->aspace)
return -EINVAL;
 
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
 */
-   return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+   return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -832,7 +835,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
args->value = msm_gem_mmap_offset(obj);
break;
case MSM_INFO_GET_IOVA:
-   ret = msm_ioctl_gem_info_iova(dev, obj, >value);
+   ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value);
break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index e2d6a6056418..983a8b7e5a74 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space 
*aspace)
kref_put(&aspace->kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+   if (!IS_ERR_OR_NULL(aspace))
+   kref_get(&aspace->kref);
+
+   return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a22d30622306..b4f31460 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -821,6 +821,23 @@ static int get_clocks(struct platform_device *pdev, struct 
msm_gpu *gpu)
return 0;
 }
 
+/* Return a new address space instance */
+struct msm_gem_address_space *
+msm_gpu_address_space_instance(struct msm_gpu *gpu)
+{
+   if (!gpu)
+   return NULL;
+
+   /*
+* If the GPU doesn't support instanced address spaces return the
+* default address space
+*/
+   if (!gpu->funcs->address_space_instance)
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return gpu->funcs->address_space_instance(gpu);
+}
+
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
const char *name, struct msm_gpu_config *config)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 

[PATCH 0/6] iommu-arm-smmu: Add auxiliary domains and per-instance pagetables

2020-06-11 Thread Jordan Crouse
This is a new refresh of support for auxiliary domains for arm-smmu-v2
and per-instance pagetables for drm/msm. The big change here from past
efforts is that, outside of creating a single aux-domain to enable TTBR0,
all of the per-instance pagetables are created and managed exclusively
in drm/msm without involving the arm-smmu driver. This fits in with the
suggested model of letting the GPU hardware do what it needs and leave the
arm-smmu driver blissfully unaware.

Almost. In order to set up the io-pgtable properly in drm/msm we need to
query the pagetable configuration from the current active domain and we need to
rely on the iommu API to flush TLBs after an unmap. In the future we can optimize
this in the drm/msm driver to track the state of the TLBs but for now the big
hammer lets us get off the ground.
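
As a rough sketch of what that unmap path could look like (illustrative
only: the parent_domain name is an assumption here, not code lifted from
the series; only the io-pgtable unmap op and iommu_flush_tlb_all() are
existing kernel APIs):

        static size_t msm_pagetable_unmap_sketch(struct msm_iommu_pagetable *pt,
                        unsigned long iova, size_t size)
        {
                /* Tear down the mapping in the privately owned io-pgtable */
                size_t unmapped = pt->pgtbl_ops->unmap(pt->pgtbl_ops, iova,
                                size, NULL);

                /* No private TLB tracking yet: flush the whole parent domain */
                iommu_flush_tlb_all(pt->parent_domain);

                return unmapped;
        }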

This series is built on the split pagetable support [1].

[1] https://patchwork.kernel.org/patch/11600949/

Jordan Crouse (6):
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  iommu/io-pgtable: Allow a pgtable implementation to skip TLB
operations
  iommu/arm-smmu: Add a domain attribute to pass the pagetable config
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for address space instances
  drm/msm/a6xx: Add support for per-instance pagetables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  69 +++-
 drivers/gpu/drm/msm/msm_drv.c |  15 +-
 drivers/gpu/drm/msm/msm_drv.h |   4 +
 drivers/gpu/drm/msm/msm_gem_vma.c |   9 +
 drivers/gpu/drm/msm/msm_gpu.c |  17 ++
 drivers/gpu/drm/msm/msm_gpu.h |   5 +
 drivers/gpu/drm/msm/msm_gpummu.c  |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c   | 180 +++-
 drivers/gpu/drm/msm/msm_mmu.h |  16 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h  |   1 +
 drivers/iommu/arm-smmu.c  | 228 --
 drivers/iommu/arm-smmu.h  |   1 +
 include/linux/io-pgtable.h|  11 +-
 include/linux/iommu.h |   1 +
 14 files changed, 529 insertions(+), 30 deletions(-)

-- 
2.17.1



[PATCH 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-06-11 Thread Jordan Crouse
Allow an io-pgtable implementation to skip TLB operations by checking for
NULL pointers in the helper functions. It will be up to the owner
of the io-pgtable instance to make sure that they independently handle
the TLB correctly.
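
For example (a sketch, not part of this patch), an owner that does its own
TLB maintenance could allocate its pgtable with a NULL tlb ops pointer;
dev and cookie stand in for whatever the caller has at hand:

        struct io_pgtable_cfg cfg = {
                .pgsize_bitmap  = SZ_4K | SZ_2M | SZ_1G,
                .ias            = 48,
                .oas            = 48,
                .coherent_walk  = true,
                .tlb            = NULL, /* owner flushes the TLB itself */
                .iommu_dev      = dev,
        };
        struct io_pgtable_ops *ops;

        ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &cfg, cookie);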

Signed-off-by: Jordan Crouse 
---

 include/linux/io-pgtable.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53d53c6c2be9..bbed1d3925ba 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -210,21 +210,24 @@ struct io_pgtable {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
 }
 
 static inline void
@@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
 {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
-- 
2.17.1



[PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-11 Thread Jordan Crouse
Add a special implementation for the SMMU attached to most Adreno GPU
targets, triggered by the qcom,adreno-smmu compatible string. When
selected, the driver will attempt to enable split pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 +++
 drivers/iommu/arm-smmu-qcom.c | 45 +--
 drivers/iommu/arm-smmu.h  |  1 +
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index a20e426d81ac..309675cf6699 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..6d0ab4865fc7 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,6 +12,29 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
+{
+   return of_device_is_compatible(smmu_domain->dev->of_node, "qcom,adreno");
+}
+
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   /* TTBR1 is only for the GPU stream ID and not the GMU */
+   if (!qcom_adreno_smmu_is_gpu_device(smmu_domain))
+   return 0;
+   /*
+* All targets that use the qcom,adreno-smmu compatible string *should*
+* be AARCH64 stage 1 but double check because the arm-smmu code assumes
+* that is the case when the TTBR1 quirk is enabled
+*/
+   if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+   (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -65,7 +88,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.reset = qcom_smmu500_reset,
 };
 
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+   .init_context = qcom_adreno_smmu_init_context,
+   .def_domain_type = qcom_smmu_def_domain_type,
+   .reset = qcom_smmu500_reset,
+};
+
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+   const struct arm_smmu_impl *impl)
 {
struct qcom_smmu *qsmmu;
 
@@ -75,8 +106,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct 
arm_smmu_device *smmu)
 
qsmmu->smmu = *smmu;
 
-   qsmmu->smmu.impl = &qcom_smmu_impl;
+   qsmmu->smmu.impl = impl;
devm_kfree(smmu->dev, smmu);
 
return &qsmmu->smmu;
 }
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device 
*smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d33cfe26b2f5..c417814f1d98 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -466,6 +466,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device 
*smmu, int page,
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device 
*smmu);
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
-- 
2.17.1



[PATCH v8 3/7] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-06-11 Thread Jordan Crouse
Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Mark them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.

Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..e52a1b146c97 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+items:
+  - const: qcom,adreno-smmu
+  - const: qcom,smmu-v2
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v8 2/7] iommu/arm-smmu: Add support for split pagetables

2020-06-11 Thread Jordan Crouse
Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.
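
For reference, shifting the computed value left by 16 moves each stage 1
walk attribute from its TTBR0 field into the corresponding TTBR1 field of
the TCR (a summary of the VMSA layout the driver's TCR definitions follow,
not new code in this patch):

        TG0  [15:14]  ->  TG1  [31:30]
        SH0  [13:12]  ->  SH1  [29:28]
        ORGN0[11:10]  ->  ORGN1[27:26]
        IRGN0[9:8]    ->  IRGN1[25:24]
        T0SZ [5:0]    ->  T1SZ [21:16]

EPD0 is then set so walks via TTBR0 are disabled, and A1 is kept clear so
the ASID is still taken from the TTBR0 register, which with the quirk
carries only the ASID field.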

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 21 -
 drivers/iommu/arm-smmu.h | 25 +++--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8a3a6c8c887a..048de2681670 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 38b041530a4f..5f2de20e883b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -168,10 +168,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -347,12 +349,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.17.1



[PATCH v8 1/7] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-06-11 Thread Jordan Crouse
Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..a20e426d81ac 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..8a3a6c8c887a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..38b041530a4f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -383,7 +383,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.17.1



[PATCH v8 7/7] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

2020-06-11 Thread Jordan Crouse
Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables.

Signed-off-by: Jordan Crouse 
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 8eb5a31346d2..8b15cd74e9ba 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -3556,7 +3556,7 @@
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.17.1



[PATCH v8 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain

2020-06-11 Thread Jordan Crouse
Add a pointer to the struct device that is attached to a
domain. This makes it easy to get the pointer if it is needed in the
implementation specific code.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 1 +
 drivers/iommu/arm-smmu.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 048de2681670..743d75b9ff3f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -801,6 +801,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
+   smmu_domain->dev = dev;
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5f2de20e883b..d33cfe26b2f5 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -345,6 +345,7 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
struct iommu_domain domain;
+   struct device   *dev;   /* Device attached to this 
domain */
 };
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
-- 
2.17.1



[PATCH v8 0/7] iommu/arm-smmu: Enable split pagetable support

2020-06-11 Thread Jordan Crouse


Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU
SMMU. After email discussions [1] we opted to make an arm-smmu implementation
specifically for the Adreno GPU and use that to enable split pagetable support
and, later, other implementation specific bits that we need.

On the hardware side this is very close to the same code from before [2]; the
only difference is that the TTBR1 quirk is turned on by the implementation and
not by a domain attribute.
In drm/msm we use the returned size of the aperture as a clue to let us know
which virtual address space we should use for global memory objects.

There are two open items that you should be aware of. First, in the
implementation specific code we have to check the compatible string of the
device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4).
I went back and forth trying to decide if I wanted to use the compatible string
or the SID as the filter and settled on the compatible string but I could be
talked out of it.

The other open item is that in drm/msm the hardware only uses 49 bits of the
address space but arm-smmu expects the address to be sign extended all the way
to 64 bits. This isn't a problem normally unless you look at the hardware
registers that contain a IOVA and then the upper bits will be zero. I opted to
restrict the internal drm/msm IOVA range to only 49 bits and then sign extend
right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought
that matching the hardware would be less confusing when debugging a hang.
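
Concretely, the sign extension amounts to the following (a restatement of
the check added in patch 6/7, with a worked value; the patch itself uses
BIT()/GENMASK(), which match these _ULL forms on a 64-bit build):

        /* Sign-extend a 49-bit IOVA to the canonical 64-bit form */
        if (iova & BIT_ULL(48))
                iova |= GENMASK_ULL(63, 49);

        /* e.g. 0x0001000000000000 becomes 0xffff000000000000 */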

v8: Pass the attached device in the smmu_domain to the implementation
specific functions

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html
[2] https://patchwork.kernel.org/patch/11482591/


Jordan Crouse (7):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
  iommu/arm-smmu: Add implementation for the adreno GPU SMMU
  drm/msm: Set the global virtual address range from the IOMMU domain
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 13 +-
 drivers/gpu/drm/msm/msm_iommu.c   |  7 +++
 drivers/iommu/arm-smmu-impl.c |  6 ++-
 drivers/iommu/arm-smmu-qcom.c | 45 ++-
 drivers/iommu/arm-smmu.c  | 33 +-
 drivers/iommu/arm-smmu.h  | 30 ++---
 8 files changed, 117 insertions(+), 23 deletions(-)

-- 
2.17.1



[PATCH v8 6/7] drm/msm: Set the global virtual address range from the IOMMU domain

2020-06-11 Thread Jordan Crouse
Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).
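
As a worked example of where the numbers land (assuming a 48-bit IAS with
the TTBR1 quirk enabled by the split pagetable series; illustrative values,
not taken from this patch):

        aperture_start = ~0UL << 48           /* 0xffff000000000000 */
        aperture_end   = ~0UL                 /* 0xffffffffffffffff */
        start = max(SZ_16M, aperture_start)   /* 0xffff000000000000 */
        size  = aperture_end - start + 1      /* 1ULL << 48         */
        start & GENMASK(48, 0)                /* 0x0001000000000000 */

so the GPU address space covers a 48-bit range whose base matches the
hardware's view once the sign extension in the msm_iommu.c hunk below is
applied.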

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
 drivers/gpu/drm/msm/msm_iommu.c |  7 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 89673c7ed473..3e717c1ebb7f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
struct msm_gem_address_space *aspace;
+   u64 start, size;
 
-   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-   0xffffffff - SZ_16M);
+   /*
+* Use the aperture start or SZ_16M, whichever is greater. This will
+* ensure that we align with the allocated pagetable range while still
+* allowing room in the lower 32 bits for GMEM and whatnot
+*/
+   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+   size = iommu->geometry.aperture_end - start + 1;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   start & GENMASK(48, 0), size);
 
if (IS_ERR(aspace) && !IS_ERR(mmu))
mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..bbe129867590 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
+   /* The arm-smmu driver expects the addresses to be sign extended */
+   if (iova & BIT(48))
+   iova |= GENMASK(63, 49);
+
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t 
iova, size_t len)
 {
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+   if (iova & BIT(48))
+   iova |= GENMASK(63, 49);
+
iommu_unmap(iommu->domain, iova, len);
 
return 0;
-- 
2.17.1



Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users

2020-06-11 Thread Alex Williamson
On Thu, 11 Jun 2020 13:02:24 -0700
Jacob Pan  wrote:

> On Thu, 11 Jun 2020 11:08:16 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 10 Jun 2020 21:12:15 -0700
> > Jacob Pan  wrote:
> >   
> > > IOMMU UAPI data has an argsz field which is filled by user. As the
> > > data structures expand, argsz may change. As the UAPI data are
> > > shared among different architectures, extensions of UAPI data could
> > > be a result of one architecture which has no impact on another.
> > > Therefore, these argsz sanity checks are performed in the model
> > > specific IOMMU drivers. This patch adds sanity checks in the VT-d
> > > to ensure argsz passed by userspace matches feature flags and other
> > > contents.
> > > 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/intel-iommu.c | 16 
> > >  drivers/iommu/intel-svm.c   | 12 
> > >  2 files changed, 28 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > > iommu_domain *domain, struct device *dev, struct device_domain_info
> > > *info; struct intel_iommu *iommu;
> > >   unsigned long flags;
> > > + unsigned long minsz;
> > >   int cache_type;
> > >   u8 bus, devfn;
> > >   u16 did, sid;
> > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags
> > > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL;
> > >  
> > > + minsz = offsetofend(struct iommu_cache_invalidate_info,
> > > padding);
> > 
> > Would it still be better to look for the end of the last field that's
> > actually used to avoid the code churn and oversights if/when the
> > padding field does get used and renamed?
> >   
> My thought was that if the padding gets partially re-purposed, the
> remaining padding would still be valid for the minsz check. The extension
> rule ensures that there is no size change other than the variable size
> union at the end. So using padding would avoid the churn, or am I totally
> wrong?

No, it's trying to predict the future either way.  I figured checking
minsz against the fields we actually consume allows complete use of the
padding fields and provides a little leniency to the user.  We'd need
to be careful though that if those fields are later used by this
driver, the code would still need to accept the smaller size.  If the
union was named rather than anonymous we could just use offsetof() to
avoid directly referencing the padding fields.
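
To make that concrete, a hypothetical layout (not the actual uapi header)
with a named union, so the minimum size can be derived without naming the
padding at all:

        struct example_invalidate_info {
                __u32   argsz;
                __u32   version;
                __u8    cache;
                __u8    granularity;
                __u8    padding[6];
                union {
                        struct iommu_inv_pasid_info pasid_info;
                        struct iommu_inv_addr_info  addr_info;
                } granu;                        /* named union */
        };

        /* Still valid even if padding[] is later carved into real fields */
        #define EXAMPLE_MINSZ   offsetof(struct example_invalidate_info, granu)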
 
> > Per my comment on patch 1/, this also seems like where the device
> > specific IOMMU driver should also have the responsibility of receiving
> > a __user pointer to do the copy_from_user() here.  vfio can't know
> > which flags require which fields to make a UAPI with acceptable
> > compatibility guarantees otherwise.
> >   
> Right, VFIO cannot make compatibility guarantees; it just seems that
> VFIO has enough information to copy_from_user sanely & safely and hand
> it over to the IOMMU. Please help define the roles/responsibilities in
> my other email. Then I will follow the guideline.

We can keep that part of the discussion in the other thread.  Thanks,

Alex

> > > + if (inv_info->argsz < minsz)
> > > + return -EINVAL;
> > > +
> > > + /* Sanity check user filled invalidation data sizes */
> > > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > > + inv_info->argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > + addr_info))
> > > + return -EINVAL;
> > > +
> > > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > > + inv_info->argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > + pasid_info))
> > > + return -EINVAL;
> > > +
> > >   spin_lock_irqsave(&device_domain_lock, flags);
> > >   spin_lock(&iommu->lock);
> > >   info = get_domain_info(dev);
> > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > > index 35b43fe819ed..64dc2c66dfff 100644
> > > --- a/drivers/iommu/intel-svm.c
> > > +++ b/drivers/iommu/intel-svm.c
> > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain
> > > *domain, struct device *dev, struct dmar_domain *dmar_domain;
> > >   struct intel_svm_dev *sdev;
> > >   struct intel_svm *svm;
> > > + unsigned long minsz;
> > >   int ret = 0;
> > >  
> > >   if (WARN_ON(!iommu) || !data)
> > >   return -EINVAL;
> > >  
> > > + /*
> > > +  * We mandate that there is no size change in the IOMMU UAPI
> > > +  * data before the variable size union at the end.
> > > +  */
> > > + minsz = offsetofend(struct iommu_gpasid_bind_data,
> > > padding);
> > 
> > Same.  Thanks,
> > 
> > Alex
> >   
> > > + if (data->argsz < minsz)
> > > + return -EINVAL;
> > > +
> > >   if (data->version != 

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Alex Williamson
On Thu, 11 Jun 2020 12:52:05 -0700
Jacob Pan  wrote:

> Hi Alex,
> 
> On Thu, 11 Jun 2020 09:47:41 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 10 Jun 2020 21:12:13 -0700
> > Jacob Pan  wrote:
> >   
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There have been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > > 
> > > This document is intended to clarify the UAPI design and usage. The
> > > mechanics of how future extensions should be achieved are also
> > > covered in this documentation.
> > > 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 210
> > > ++ 1 file changed, 210 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > > 
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index ..e95dc5a04a41
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,210 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +=
> > > +IOMMU Userspace API
> > > +=
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > > +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +
> > > +Communications of user and kernel involve both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page request
> > > +
> > > +Requirements
> > > +
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +
> > > +Although the data structures defined in IOMMU UAPI are
> > > self-contained, +there are no user API functions introduced.
> > > Instead, IOMMU UAPI is +designed to work with existing user driver
> > > frameworks such as VFIO. +
> > > +Extension Rules & Precautions
> > > +-
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > change. +2. Adding new union members at the end. May increase in
> > > size. +
> > > +No new fields can be added *after* the variable size union in that
> > > it +will break backward compatibility when offset moves. In both
> > > cases, a +new flag must be accompanied with a new field such that
> > > the IOMMU +driver can process the data based on the new flag.
> > > Version field is +only reserved for the unlikely event of UAPI
> > > upgrade in its entirety. +
> > > +It's *always* the caller's responsibility to indicate the size of
> > > the +structure passed by setting argsz appropriately.
> > > +
> > > +When IOMMU UAPI extension results in size increase, user such as
> > > VFIO +has to handle the following scenarios:
> > > +
> > > +1. User and kernel have an exact size match
> > > +2. An older user with older kernel header (smaller UAPI size)
> > > running on a
> > > +   newer kernel (larger UAPI size)
> > > +3. A newer user with newer kernel header (larger UAPI size) running
> > > +   on an older kernel.
> > > +4. A malicious/misbehaving user passes an illegal/invalid size but
> > > within
> > > +   range. The data may contain garbage.
> > > +
> > > +
> > > +Feature Checking
> > > +
> > > +While launching a guest with vIOMMU, it is important to ensure
> > > that host +can support the UAPI data structures to be used for
> > > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > > checking, future +faults are difficult to report even in normal
> > > conditions. For example, +TLB invalidations should always succeed
> > > from vIOMMU's +perspective. There is no architectural way to report
> > > back to the vIOMMU +if the UAPI data is incompatible. For this
> > > reason the following IOMMU +UAPIs cannot fail:
> > > +
> > > +1. Free 

Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users

2020-06-11 Thread Jacob Pan
On Thu, 11 Jun 2020 11:08:16 -0600
Alex Williamson  wrote:

> On Wed, 10 Jun 2020 21:12:15 -0700
> Jacob Pan  wrote:
> 
> > IOMMU UAPI data has an argsz field which is filled by user. As the
> > data structures expand, argsz may change. As the UAPI data are
> > shared among different architectures, extensions of UAPI data could
> > be a result of one architecture which has no impact on another.
> > Therefore, these argsz sanity checks are performed in the model
> > specific IOMMU drivers. This patch adds sanity checks in the VT-d
> > to ensure argsz passed by userspace matches feature flags and other
> > contents.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 16 
> >  drivers/iommu/intel-svm.c   | 12 
> >  2 files changed, 28 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > iommu_domain *domain, struct device *dev, struct device_domain_info
> > *info; struct intel_iommu *iommu;
> > unsigned long flags;
> > +   unsigned long minsz;
> > int cache_type;
> > u8 bus, devfn;
> > u16 did, sid;
> > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags
> > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL;
> >  
> > +   minsz = offsetofend(struct iommu_cache_invalidate_info,
> > padding);  
> 
> Would it still be better to look for the end of the last field that's
> actually used to avoid the code churn and oversights if/when the
> padding field does get used and renamed?
> 
My thought was that if the padding gets partially re-purposed, the
remaining padding would still be valid for the minsz check. The extension
rule ensures that there is no size change other than the variable size
union at the end. So using padding would avoid the churn, or am I totally
wrong?

> Per my comment on patch 1/, this also seems like where the device
> specific IOMMU driver should also have the responsibility of receiving
> a __user pointer to do the copy_from_user() here.  vfio can't know
> which flags require which fields to make a UAPI with acceptable
> compatibility guarantees otherwise.
> 
Right, VFIO cannot make compatibility guarantees; it just seems that
VFIO has enough information to copy_from_user sanely & safely and hand
it over to the IOMMU. Please help define the roles/responsibilities in
my other email. Then I will follow the guideline.

> > +   if (inv_info->argsz < minsz)
> > +   return -EINVAL;
> > +
> > +   /* Sanity check user filled invalidation data sizes */
> > +   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > +   inv_info->argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +   addr_info))
> > +   return -EINVAL;
> > +
> > +   if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > +   inv_info->argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +   pasid_info))
> > +   return -EINVAL;
> > +
> > spin_lock_irqsave(&device_domain_lock, flags);
> > spin_lock(&iommu->lock);
> > info = get_domain_info(dev);
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 35b43fe819ed..64dc2c66dfff 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain
> > *domain, struct device *dev, struct dmar_domain *dmar_domain;
> > struct intel_svm_dev *sdev;
> > struct intel_svm *svm;
> > +   unsigned long minsz;
> > int ret = 0;
> >  
> > if (WARN_ON(!iommu) || !data)
> > return -EINVAL;
> >  
> > +   /*
> > +* We mandate that there is no size change in the IOMMU UAPI
> > +* data before the variable size union at the end.
> > +*/
> > +   minsz = offsetofend(struct iommu_gpasid_bind_data,
> > padding);  
> 
> Same.  Thanks,
> 
> Alex
> 
> > +   if (data->argsz < minsz)
> > +   return -EINVAL;
> > +
> > if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > return -EINVAL;
> >  
> > +   if (data->argsz != offsetofend(struct
> > iommu_gpasid_bind_data, vtd))
> > +   return -EINVAL;
> > +
> > if (!dev_is_pci(dev))
> > return -ENOTSUPP;
> >
> 

[Jacob Pan]


Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Jacob Pan
Hi Alex,

On Thu, 11 Jun 2020 09:47:41 -0600
Alex Williamson  wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan  wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> guest virtual IOMMU and host IOMMU. There have been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> This document is intended to clarify the UAPI design and usage. The
> mechanics of how future extensions should be achieved are also
> > covered in this documentation.
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  Documentation/userspace-api/iommu.rst | 210
> > ++ 1 file changed, 210 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index ..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=
> > +IOMMU Userspace API
> > +=
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there are no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade in its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel have an exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on an older kernel.
> > +4. A malicious/misbehaving user passes an illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective. There is no architectural way to report
> > back to the vIOMMU +if the UAPI data is incompatible. For this
> > reason the following IOMMU +UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +5. Page response
> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Only backward compatibility is supported. For example, an
> > +older QEMU (with 

Re: [PATCH v2 02/15] iommu: Report domain nesting info

2020-06-11 Thread Alex Williamson
On Thu, 11 Jun 2020 05:15:21 -0700
Liu Yi L  wrote:

> IOMMUs that support nesting translation need to report the capability info
> to userspace, e.g. the format of first level/stage paging structures.
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
> @Jean, Eric: nesting was introduced for ARM, but it looks like there is no
> actual user of it, right? So I'm wondering if we can reuse
> DOMAIN_ATTR_NESTING to retrieve nesting info. What are your opinions?
> 
>  include/linux/iommu.h  |  1 +
>  include/uapi/linux/iommu.h | 34 ++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 78a26ae..f6e4b49 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -126,6 +126,7 @@ enum iommu_attr {
>   DOMAIN_ATTR_FSL_PAMUV1,
>   DOMAIN_ATTR_NESTING,/* two stages of translation */
>   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> + DOMAIN_ATTR_NESTING_INFO,
>   DOMAIN_ATTR_MAX,
>  };
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 303f148..02eac73 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
>   };
>  };
>  
> +struct iommu_nesting_info {
> + __u32   size;
> + __u32   format;
> + __u32   features;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD   (1 << 2)
> + __u32   flags;
> + __u8data[];
> +};
> +
> +/*
> + * @flags:   VT-d specific flags. Currently reserved for future
> + *   extension.
> + * @addr_width:  The output addr width of first level/stage translation
> + * @pasid_bits:  Maximum supported PASID bits, 0 represents no PASID
> + *   support.
> + * @cap_reg: Describe basic capabilities as defined in VT-d capability
> + *   register.
> + * @cap_mask:Mark valid capability bits in @cap_reg.
> + * @ecap_reg:Describe the extended capabilities as defined in VT-d
> + *   extended capability register.
> + * @ecap_mask:   Mark the valid capability bits in @ecap_reg.

Please explain this a little further: why do we need to tell userspace
about cap/ecap register bits that aren't valid through this interface?
Thanks,

Alex


> + */
> +struct iommu_nesting_info_vtd {
> + __u32   flags;
> + __u16   addr_width;
> + __u16   pasid_bits;
> + __u64   cap_reg;
> + __u64   cap_mask;
> + __u64   ecap_reg;
> + __u64   ecap_mask;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */



[patch for-5.8 1/4] dma-direct: always align allocation size in dma_direct_alloc_pages()

2020-06-11 Thread David Rientjes via iommu
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted()
works at page granularity.  It's necessary to page align the allocation
size in dma_direct_alloc_pages() for consistent behavior.

This also fixes an issue when arch_dma_prep_coherent() is called on an
unaligned allocation size for dma_alloc_need_uncached() when
CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED
is enabled.
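
To make the mismatch concrete (assuming 4 KiB pages; the numbers are
illustrative):

        size_t size = 6144;                     /* 1.5 pages */

        size >> PAGE_SHIFT;                     /* 1 page: 2 KiB short    */
        PAGE_ALIGN(size) >> PAGE_SHIFT;         /* 2 pages: covers it all */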

Cc: sta...@vger.kernel.org
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -112,11 +112,12 @@ static inline bool dma_should_free_from_pool(struct 
device *dev,
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp, unsigned long attrs)
 {
-   size_t alloc_size = PAGE_ALIGN(size);
int node = dev_to_node(dev);
struct page *page = NULL;
u64 phys_limit;
 
+   VM_BUG_ON(!PAGE_ALIGNED(size));
+
if (attrs & DMA_ATTR_NO_WARN)
gfp |= __GFP_NOWARN;
 
@@ -124,14 +125,14 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
gfp &= ~__GFP_ZERO;
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
-   page = dma_alloc_contiguous(dev, alloc_size, gfp);
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
-   dma_free_contiguous(dev, page, alloc_size);
+   dma_free_contiguous(dev, page, size);
page = NULL;
}
 again:
if (!page)
-   page = alloc_pages_node(node, gfp, get_order(alloc_size));
+   page = alloc_pages_node(node, gfp, get_order(size));
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -158,8 +159,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
 
+   size = PAGE_ALIGN(size);
+
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
+   ret = dma_alloc_from_pool(dev, size, &page, gfp);
if (!ret)
return NULL;
goto done;
@@ -183,10 +186,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 dma_alloc_need_uncached(dev, attrs)) ||
(IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) {
/* remove any dirty cache lines on the kernel alias */
-   arch_dma_prep_coherent(page, PAGE_ALIGN(size));
+   arch_dma_prep_coherent(page, size);
 
/* create a coherent mapping */
-   ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size),
+   ret = dma_common_contiguous_remap(page, size,
dma_pgprot(dev, PAGE_KERNEL, attrs),
__builtin_return_address(0));
if (!ret)


[patch for-5.8 4/4] dma-direct: add missing set_memory_decrypted() for coherent mapping

2020-06-11 Thread David Rientjes via iommu
When a coherent mapping is created in dma_direct_alloc_pages(), it needs
to be decrypted if the device requires unencrypted DMA before returning.

Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers
into dma-direct")
Cc: sta...@vger.kernel.org # 5.5+
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -195,6 +195,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
__builtin_return_address(0));
if (!ret)
goto out_free_pages;
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_decrypted((unsigned long)ret,
+  1 << get_order(size));
+   if (err)
+   goto out_free_pages;
+   }
memset(ret, 0, size);
goto done;
}


[patch for-5.8 2/4] dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails

2020-06-11 Thread David Rientjes via iommu
If arch_dma_set_uncached() fails after memory has been decrypted, it needs
to be re-encrypted before freeing.

Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more
general")
Cc: sta...@vger.kernel.org # 5.7
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -220,7 +220,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
arch_dma_prep_coherent(page, size);
ret = arch_dma_set_uncached(ret, size);
if (IS_ERR(ret))
-   goto out_free_pages;
+   goto out_encrypt_pages;
}
 done:
if (force_dma_unencrypted(dev))
@@ -228,6 +228,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
else
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
+out_encrypt_pages:
+   if (force_dma_unencrypted(dev))
+   set_memory_encrypted((unsigned long)page_address(page),
+1 << get_order(size));
 out_free_pages:
dma_free_contiguous(dev, page, size);
return NULL;


[patch for-5.8 0/4] dma-direct: dma_direct_alloc_pages() fixes for AMD SEV

2020-06-11 Thread David Rientjes via iommu
While debugging recently reported issues concerning DMA allocation
practices when CONFIG_AMD_MEM_ENCRYPT is enabled, some curiosities arose
when looking at dma_direct_alloc_pages() behavior.

Fix these up.  These are likely all stable material, so proposing for 5.8.
---
 kernel/dma/direct.c | 42 --
 1 file changed, 32 insertions(+), 10 deletions(-)


[patch for-5.8 3/4] dma-direct: check return value when encrypting or decrypting memory

2020-06-11 Thread David Rientjes via iommu
__change_page_attr() can fail which will cause set_memory_encrypted() and
set_memory_decrypted() to return non-zero.

If the device requires unencrypted DMA memory and decryption fails, simply
free the memory and fail.

If attempting to re-encrypt in the failure path and that encryption fails,
there is no alternative other than to leak the memory.

Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent
buffers in common code")
Cc: sta...@vger.kernel.org # 4.17+
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -158,6 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 {
struct page *page;
void *ret;
+   int err;
 
size = PAGE_ALIGN(size);
 
@@ -210,8 +211,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
}
 
ret = page_address(page);
-   if (force_dma_unencrypted(dev))
-   set_memory_decrypted((unsigned long)ret, 1 << get_order(size));
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_decrypted((unsigned long)ret,
+  1 << get_order(size));
+   if (err)
+   goto out_free_pages;
+   }
 
memset(ret, 0, size);
 
@@ -229,9 +234,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
 out_encrypt_pages:
-   if (force_dma_unencrypted(dev))
-   set_memory_encrypted((unsigned long)page_address(page),
-1 << get_order(size));
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_encrypted((unsigned long)page_address(page),
+  1 << get_order(size));
+   /* If memory cannot be re-encrypted, it must be leaked */
+   if (err)
+   return NULL;
+   }
 out_free_pages:
dma_free_contiguous(dev, page, size);
return NULL;


Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users

2020-06-11 Thread Alex Williamson
On Wed, 10 Jun 2020 21:12:15 -0700
Jacob Pan  wrote:

> IOMMU UAPI data has an argsz field which is filled by user. As the data
> structures expand, argsz may change. As the UAPI data are shared among
> different architectures, extensions of UAPI data could be a result of
> one architecture which has no impact on another. Therefore, these argsz
> sanity checks are performed in the model specific IOMMU drivers. This
> patch adds sanity checks in the VT-d to ensure argsz passed by userspace
> matches feature flags and other contents.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 16 
>  drivers/iommu/intel-svm.c   | 12 
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 27ebf4b9faef..c98b5109684b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
> struct device *dev,
>   struct device_domain_info *info;
>   struct intel_iommu *iommu;
>   unsigned long flags;
> + unsigned long minsz;
>   int cache_type;
>   u8 bus, devfn;
>   u16 did, sid;
> @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct iommu_domain 
> *domain, struct device *dev,
>   if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
>   return -EINVAL;
>  
> + minsz = offsetofend(struct iommu_cache_invalidate_info, padding);

Would it still be better to look for the end of the last field that's
actually used to avoid the code churn and oversights if/when the padding
field does get used and renamed?

Per my comment on patch 1/, this also seems like where the device
specific IOMMU driver should have the responsibility of receiving
a __user pointer and doing the copy_from_user() itself.  vfio can't
otherwise know which flags require which fields to make a UAPI with
acceptable compatibility guarantees.
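
To sketch what that could look like (illustrative only: the helper name is
made up, and the exact split of work between vfio and the IOMMU driver is
precisely what is being debated here):

    static int vtd_copy_inv_info(struct iommu_cache_invalidate_info *info,
                                 void __user *uinfo)
    {
            u32 minsz = offsetofend(struct iommu_cache_invalidate_info,
                                    padding);

            if (copy_from_user(info, uinfo, minsz))
                    return -EFAULT;
            if (info->argsz < minsz || info->argsz > sizeof(*info))
                    return -EINVAL;
            /* copy only the tail the user claims to have provided */
            if (info->argsz > minsz &&
                copy_from_user((void *)info + minsz, uinfo + minsz,
                               info->argsz - minsz))
                    return -EFAULT;
            return 0;
    }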

> + if (inv_info->argsz < minsz)
> + return -EINVAL;
> +
> + /* Sanity check user filled invalidation data sizes */
> + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> + inv_info->argsz != offsetofend(struct 
> iommu_cache_invalidate_info,
> + addr_info))
> + return -EINVAL;
> +
> + if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> + inv_info->argsz != offsetofend(struct 
> iommu_cache_invalidate_info,
> + pasid_info))
> + return -EINVAL;
> +
>   spin_lock_irqsave(&device_domain_lock, flags);
>   spin_lock(&iommu->lock);
>   info = get_domain_info(dev);
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 35b43fe819ed..64dc2c66dfff 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
> struct device *dev,
>   struct dmar_domain *dmar_domain;
>   struct intel_svm_dev *sdev;
>   struct intel_svm *svm;
> + unsigned long minsz;
>   int ret = 0;
>  
>   if (WARN_ON(!iommu) || !data)
>   return -EINVAL;
>  
> + /*
> +  * We mandate that there is no size change in the IOMMU UAPI data
> +  * before the variable size union at the end.
> +  */
> + minsz = offsetofend(struct iommu_gpasid_bind_data, padding);

Same.  Thanks,

Alex

> + if (data->argsz < minsz)
> + return -EINVAL;
> +
>   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
>   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>   return -EINVAL;
>  
> + if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vtd))
> + return -EINVAL;
> +
>   if (!dev_is_pci(dev))
>   return -ENOTSUPP;
>  

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data

2020-06-11 Thread Alex Williamson
On Wed, 10 Jun 2020 21:12:14 -0700
Jacob Pan  wrote:

> As the IOMMU UAPI gets extended, the user data size may increase. To support
> backward compatibility, this patch introduces a size field in each UAPI
> data structure. It is *always* the user's responsibility to fill in
> the correct size.

Though at the same time, argsz is user provided data which we don't
trust.  The argsz field allows the user to indicate how much data
they're providing; it's still the kernel's responsibility to validate
whether it's correct and sufficient for the requested operation.
Thanks,

Alex

> Specific scenarios for user data handling are documented in:
> Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  include/uapi/linux/iommu.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e907b7091a46..303f148a5cd7 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -135,6 +135,7 @@ enum iommu_page_response_code {
>  
>  /**
>   * struct iommu_page_response - Generic page response information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @flags: encodes whether the corresponding fields are valid
>   * (IOMMU_FAULT_PAGE_RESPONSE_* values)
> @@ -143,6 +144,7 @@ enum iommu_page_response_code {
>   * @code: response code from  iommu_page_response_code
>   */
>  struct iommu_page_response {
> + __u32   argsz;
>  #define IOMMU_PAGE_RESP_VERSION_11
>   __u32   version;
>  #define IOMMU_PAGE_RESP_PASID_VALID  (1 << 0)
> @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
>  /**
>   * struct iommu_cache_invalidate_info - First level/stage invalidation
>   * information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @cache: bitfield that allows to select which caches to invalidate
>   * @granularity: defines the lowest granularity used for the invalidation:
> @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
>   * must support the used granularity.
>   */
>  struct iommu_cache_invalidate_info {
> + __u32   argsz;
>  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>   __u32   version;
>  /* IOMMU paging structure cache */
> @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
>  
>  /**
>   * struct iommu_gpasid_bind_data - Information about device and guest PASID 
> binding
> + * @argsz:   User filled size of this data
>   * @version: Version of this data structure
>   * @format:  PASID table entry format
>   * @flags:   Additional information on guest bind request
> @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
>   * PASID to host PASID based on this bind data.
>   */
>  struct iommu_gpasid_bind_data {
> + __u32 argsz;
>  #define IOMMU_GPASID_BIND_VERSION_1  1
>   __u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD 1
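
For reference, a minimal user-side sketch of the contract being discussed
(values illustrative): the caller states the size it was compiled against,
and the kernel must still range-check it before trusting anything else in
the structure.

    struct iommu_cache_invalidate_info inv = { 0 };

    inv.argsz = sizeof(inv);    /* size this caller was compiled against */
    inv.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
    /* fill cache/granularity/... and hand the struct to the kernel via
     * VFIO; the kernel validates argsz against the chosen granularity */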

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Jacob Pan
Hi Jon,

On Thu, 11 Jun 2020 07:55:00 -0600
Jonathan Corbet  wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan  wrote:
> 
> A little nit but...this pattern:
> 
> > +pattern below:
> > +
> > +::
> > +
> > +   struct {
> > +   __u32 argsz;
> > +   __u32 flags;
> > +   __u8  data[];
> > +  }  
> 
> can be more concisely and attractively written as:
> 
>pattern below::
> 
>   struct { 
> ...
> 
Will do. thanks!

> Thanks,
> 
> jon

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Alex Williamson
On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan  wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There have been lots of discussions on how
> it should work with the VFIO UAPI and userspace in general.
> 
> This document is intended to clarify the UAPI design and usage. The
> mechanics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  Documentation/userspace-api/iommu.rst | 210 
> ++
>  1 file changed, 210 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst 
> b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index ..e95dc5a04a41
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,210 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +===================
> +IOMMU Userspace API
> +===================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +===============
> +Communication between user and kernel goes in both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page request
> +
> +Requirements
> +============
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +==========
> +Although the data structures defined in the IOMMU UAPI are self-contained,
> +no user API functions are introduced. Instead, the IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable size union, since that
> +would break backward compatibility when offsets move. In both cases, a
> +new field must be accompanied by a new flag such that the IOMMU
> +driver can process the data based on the new flag. The version field is
> +reserved only for the unlikely event of upgrading the UAPI in its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following scenarios:
> +
> +1. User and kernel have an exact size match
> +2. An older user with an older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with a newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user passes an illegal/invalid size that is within
> +   range. The data may contain garbage.
> +
> +
> +Feature Checking
> +================
> +While launching a guest with a vIOMMU, it is important to ensure that the
> +host can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without the upfront compatibility checking, future
> +faults are difficult to report even in normal conditions. For example,
> +TLB invalidations should always succeed from vIOMMU's
> +perspective. There is no architectural way to report back to the vIOMMU
> +if the UAPI data is incompatible. For this reason the following IOMMU
> +UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +5. Page response
> +
> +User applications such as QEMU are expected to import kernel UAPI
> +headers. Only backward compatibility is supported. For example, an
> +older QEMU (with an older kernel header) can run on a newer kernel, while a
> +newer QEMU (with a new kernel header) may fail on an older kernel.

"Build your user application against newer kernels and it may break on
older kernels" is not a great selling point of this UAPI.  Clearly new
features may not be available on older kernels and an 

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Jonathan Corbet
On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan  wrote:

A little nit but...this pattern:

> +pattern below:
> +
> +::
> +
> +   struct {
> + __u32 argsz;
> + __u32 flags;
> + __u8  data[];
> +  }

can be more concisely and attractively written as:

   pattern below::

struct { 
...

Thanks,

jon
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-11 Thread Bjorn Helgaas
On Thu, Jun 11, 2020 at 10:54:45AM +0800, Zhangfei Gao wrote:
> On 2020/6/10 12:49 AM, Bjorn Helgaas wrote:
> > On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  
> > > wrote:
> > > > On 2020/6/9 12:41 AM, Bjorn Helgaas wrote:
> > > > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > > > > > On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:
> > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device 
> > > > > > > > *dev, struct
> > > > > > > > fwnode_handle *iommu_fwnode,
> > > > > > > >fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > >fwspec->ops = ops;
> > > > > > > >dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > +
> > > > > > > > +   if (dev_is_pci(dev))
> > > > > > > > +   pci_fixup_device(pci_fixup_final, 
> > > > > > > > to_pci_dev(dev));
> > > > > > > > +
> > > > > > > > 
> > > > > > > > Then pci_fixup_final will be called twice, the first in 
> > > > > > > > pci_bus_add_device.
> > > > > > > > Here in iommu_fwspec_init is the second time, specifically for 
> > > > > > > > iommu_fwspec.
> > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how 
> > > > > > > you
> > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > maintenance.  Surely we don't want to have to add the 
> > > > > > > Vendor/Device ID
> > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > 
> > > > > > Here the fake pci device has standard PCI cfg space, but the physical
> > > > > > implementation is based on AMBA.
> > > > > > They can provide the pasid feature.
> > > > > > However,
> > > > > > 1. does not support TLP since they are not real pci devices.
> > > > > > 2. does not support pri, instead supports stall (provided by smmu)
> > > > > > And stall is not a pci feature, so it is not described in struct 
> > > > > > pci_dev,
> > > > > > but in struct iommu_fwspec.
> > > > > > So we use this fixup to tell the pci system that the devices can
> > > > > > support stall, and thereby support pasid.
> > > > > This did not answer my question.  Are you proposing that we update a
> > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > would be a good model.
> > > > Yes, you are right, but we do not have any better idea yet.
> > > > Currently we have three fake pci devices, which support stall and pasid.
> > > > We have to let the pci system know the device can support pasid, because
> > > > of the stall feature, though it does not support pri.
> > > > Do you have any other ideas?
> > > It sounds like the best way would be to allocate a PCI capability for it, 
> > > so
> > > detection can be done through config space, at least in future devices,
> > > or possibly after a firmware update if the config space in your system
> > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > to do this, using fixups to detect the early devices that don't use that
> > > should be uncontroversial. I have no idea what the process or timeline
> > > is to add new capabilities into the PCIe specification, or if this one
> > > would be acceptable to the PCI SIG at all.
> > That sounds like a possibility.  The spec already defines a
> > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> > be a candidate.
> Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).
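
For illustration only (the bytes after the capability header are
vendor-defined, so the parsing below is a placeholder):

    int pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR);

    if (pos) {
            u8 len;

            /* byte 2 of a Vendor-Specific Capability is its length */
            pci_read_config_byte(pdev, pos + 2, &len);
            /* vendor-defined bytes follow; a stall/pasid hint could
               live here */
    }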

> > > If detection cannot be done through PCI config space, the next best
> > > alternative is to pass auxiliary data through firmware. On DT based
> > > machines, you can list non-hotpluggable PCIe devices and add custom
> > > properties that could be read during device enumeration. I assume
> > > ACPI has something similar, but I have not done that.
> Yes, thanks Arnd
> > ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
> > like this better than a PCI capability because the property you need
> > to expose is not a PCI property.
> _DSM may not be workable, since it works at runtime.
> We need the stall information at init stage, neither too early (after
> allocation of iommu_fwspec)
> nor too late (before arm_smmu_add_device).

I'm not aware of a restriction on when _DSM can be evaluated.  I'm
looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?
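
For what a boot-time evaluation might look like (sketch only; the GUID,
revision, and function index are invented placeholders):

    guid_t stall_guid;  /* a real implementation defines a fixed GUID */
    union acpi_object *obj;

    obj = acpi_evaluate_dsm(ACPI_HANDLE(&pdev->dev), &stall_guid,
                            1 /* rev */, 1 /* hypothetical func */, NULL);
    if (obj) {
            /* e.g. obj->integer.value != 0 => device supports stall */
            ACPI_FREE(obj);
    }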

> By the way, it would take a long time if we need to modify either the pcie
> spec or the acpi spec.  Can we use pci_fixup_device in iommu_fwspec_init
> first? It is relatively simple, meets the requirement of platform
> devices using pasid, and they are already in production.

Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires
a spec change.  Both can 

[PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info

2020-06-11 Thread Liu Yi L
Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 81 +++--
 include/linux/intel-iommu.h | 16 +
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 25650ac..5415dc7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5655,12 +5655,16 @@ static inline bool iommu_pasid_support(void)
 static inline bool nested_mode_support(void)
 {
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
+   struct intel_iommu *iommu, *prev = NULL;
bool ret = true;
 
rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
-   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+   if (!prev)
+   prev = iommu;
+   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
+   (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
+   (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
ret = false;
break;
}
@@ -6069,11 +6073,84 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
return ret;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+   struct iommu_nesting_info *info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+   struct device_domain_info *domain_info;
+   struct iommu_nesting_info_vtd vtd;
+   unsigned long flags;
+   u32 size;
+
+   if ((domain->type != IOMMU_DOMAIN_UNMANAGED) ||
+   !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+   return -ENODEV;
+
+   if (!info)
+   return -EINVAL;
+
+   size = sizeof(struct iommu_nesting_info) +
+   sizeof(struct iommu_nesting_info_vtd);
+   /*
+* If the provided buffer size does not match the expected size,
+* return 0 and report the expected buffer size to the caller.
+*/
+   if (info->size != size) {
+   info->size = size;
+   return 0;
+   }
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   /*
+* Arbitrarily select the first domain_info, as all nesting
+* related capabilities should be consistent across iommu
+* units.
+*/
+   domain_info = list_first_entry(&dmar_domain->devices,
+ struct device_domain_info, link);
+   cap &= domain_info->iommu->cap;
+   ecap &= domain_info->iommu->ecap;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+   info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
+IOMMU_NESTING_FEAT_BIND_PGTBL |
+IOMMU_NESTING_FEAT_CACHE_INVLD;
+   vtd.flags = 0;
+   vtd.addr_width = dmar_domain->gaw;
+   vtd.pasid_bits = ilog2(intel_pasid_max_id);
+   vtd.cap_reg = cap;
+   vtd.cap_mask = VTD_CAP_MASK;
+   vtd.ecap_reg = ecap;
+   vtd.ecap_mask = VTD_ECAP_MASK;
+
+   memcpy(info->data, &vtd, sizeof(vtd));
+   return 0;
+}
+
+static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
+  enum iommu_attr attr, void *data)
+{
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING_INFO:
+   {
+   struct iommu_nesting_info *info =
+   (struct iommu_nesting_info *) data;
+
+   return intel_iommu_get_nesting_info(domain, info);
+   }
+   default:
+   return -ENODEV;
+   }
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
.domain_free= intel_iommu_domain_free,
.domain_set_attr= intel_iommu_domain_set_attr,
+   .domain_get_attr= intel_iommu_domain_get_attr,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0b238c3..be48f4e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,22 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
 
+/* Nesting Support Capability Alignment */
+#define VTD_CAP_FL1GP  (1ULL << 56)
+#define VTD_CAP_FL5LP  (1ULL << 60)
+#define VTD_ECAP_PRS   (1ULL << 29)
+#define VTD_ECAP_ERS   (1ULL << 30)
+#define VTD_ECAP_SRS   (1ULL << 

[PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space

2020-06-11 Thread Liu Yi L
When an IOMMU domain with the nesting attribute is used for guest SVA, a
system-wide PASID is allocated for binding with the device and the domain.
For security reasons, we need to check the PASID passed from user-space,
e.g. for page table bind/unbind and PASID-related cache invalidation.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 10 ++
 drivers/iommu/intel-svm.c   |  6 --
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2d59a5d..25650ac 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5433,6 +5433,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
int granu = 0;
u64 pasid = 0;
u64 addr = 0;
+   void *pdata;
 
granu = to_vtd_granularity(cache_type, inv_info->granularity);
if (granu == -EINVAL) {
@@ -5452,6 +5453,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 (inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
pasid = inv_info->addr_info.pasid;
 
+   pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
+   if (!pdata) {
+   ret = -EINVAL;
+   goto out_unlock;
+   } else if (IS_ERR(pdata)) {
+   ret = PTR_ERR(pdata);
+   goto out_unlock;
+   }
+
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index bf55e2f..49059c1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -332,7 +332,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
dmar_domain = to_dmar_domain(domain);
 
mutex_lock(&pasid_mutex);
-   svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
+   svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
if (IS_ERR(svm)) {
ret = PTR_ERR(svm);
goto out;
@@ -450,6 +450,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
struct iommu_gpasid_unbind_data *data)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+   struct dmar_domain *dmar_domain;
struct intel_svm_dev *sdev;
struct intel_svm *svm;
int ret = -EINVAL;
@@ -464,9 +465,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
return -EINVAL;
 
pasid = data->pasid;
+   dmar_domain = to_dmar_domain(domain);
 
mutex_lock(&pasid_mutex);
-   svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
+   svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
if (!svm) {
ret = -EINVAL;
goto out;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace

2020-06-11 Thread Liu Yi L
This patch exports iommu nesting capability info to user space through
VFIO. User space is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and the vendor
specific format information for first level/stage page table that will be
bound to.

The nesting info is available only after the nesting iommu type is set
for a container. Current implementation imposes one limitation - one
nesting container should include at most one group. The philosophy of
vfio container is having all groups/devices within the container share
the same IOMMU context. When vSVA is enabled, one IOMMU context could
include one 2nd-level address space and multiple 1st-level address spaces.
While the 2nd-level address space is reasonably sharable by multiple
groups, blindly sharing 1st-level address spaces across all groups within the
container might instead break the guest expectation. In the future sub/
super container concept might be introduced to allow partial address space
sharing within an IOMMU context. But for now let's go with this restriction
by requiring singleton container for using nesting iommu features. Below
link has the related discussion about this decision.

https://lkml.org/lkml/2020/5/15/1028
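
A rough user-side sketch of consuming the reported capability (the walk
follows the usual VFIO argsz pattern; the nesting capability id and
payload come from this series' uapi additions and are not final):

    struct vfio_iommu_type1_info *info;

    info = malloc(sizeof(*info));
    info->argsz = sizeof(*info);
    ioctl(container, VFIO_IOMMU_GET_INFO, info);
    if (info->argsz > sizeof(*info)) {  /* kernel reported needed size */
            info = realloc(info, info->argsz);
            ioctl(container, VFIO_IOMMU_GET_INFO, info);
    }
    /* walk the capability chain at info->cap_offset for the nesting info */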

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
 drivers/vfio/vfio_iommu_type1.c | 73 +
 include/uapi/linux/vfio.h   |  9 +
 2 files changed, 82 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 402aad3..22432cf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   struct iommu_nesting_info *nesting_info;
 };
 
 struct vfio_domain {
@@ -125,6 +126,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
+#define IS_DOMAIN_IN_CONTAINER(iommu)  ((iommu->external_domain) || \
+(!list_empty(&iommu->domain_list)))
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -1641,6 +1645,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
}
}
 
+   /* Nesting type container can include only one group */
+   if (iommu->nesting && IS_DOMAIN_IN_CONTAINER(iommu)) {
+   mutex_unlock(&iommu->lock);
+   return -EINVAL;
+   }
+
group = kzalloc(sizeof(*group), GFP_KERNEL);
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!group || !domain) {
@@ -1700,6 +1710,36 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
if (ret)
goto out_domain;
 
+   /* Nesting cap info is available only after attaching */
+   if (iommu->nesting) {
+   struct iommu_nesting_info tmp;
+   struct iommu_nesting_info *info;
+
+   /* First get the size of vendor specific nesting info */
+   ret = iommu_domain_get_attr(domain->domain,
+   DOMAIN_ATTR_NESTING_INFO,
+   &tmp);
+   if (ret)
+   goto out_detach;
+
+   info = kzalloc(tmp.size, GFP_KERNEL);
+   if (!info) {
+   ret = -ENOMEM;
+   goto out_detach;
+   }
+
+   /* Now get the nesting info */
+   info->size = tmp.size;
+   ret = iommu_domain_get_attr(domain->domain,
+   DOMAIN_ATTR_NESTING_INFO,
+   info);
+   if (ret) {
+   kfree(info);
+   goto out_detach;
+   }
+   iommu->nesting_info = info;
+   }
+
/* Get aperture info */
iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
 
@@ -1801,6 +1841,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
 
 out_detach:
+   kfree(iommu->nesting_info);
vfio_iommu_detach_group(domain, group);
 out_domain:
iommu_domain_free(domain->domain);
@@ -1998,6 +2039,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
vfio_iommu_unmap_unpin_all(iommu);
else
vfio_iommu_unmap_unpin_reaccount(iommu);
+
+   kfree(iommu->nesting_info);
}
iommu_domain_free(domain->domain);
list_del(>next);
@@ -2190,6 +2233,30 @@ static int vfio_iommu_iova_build_caps(struct 

[PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-06-11 Thread Liu Yi L
This patch allows user space to request PASID allocation/free, e.g. when
serving the request from the guest.

PASIDs that are not freed by userspace are automatically freed when the
IOASID set is destroyed as the process exits.
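
A hedged usage sketch (the request layout follows this series' uapi and
may still change):

    struct vfio_iommu_type1_pasid_request req = { 0 };
    int pasid;

    req.argsz = sizeof(req);
    req.flags = VFIO_IOMMU_ALLOC_PASID;
    /* fill the requested PASID range (min/max) per the final uapi layout */
    pasid = ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);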

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
Signed-off-by: Jacob Pan 
---
v1 -> v2:
*) move the vfio_mm related code to be a separate module
*) use a single structure for alloc/free, could support a range of PASIDs
*) fetch vfio_mm at group_attach time instead of at iommu driver open time

 drivers/vfio/Kconfig|  1 +
 drivers/vfio/vfio_iommu_type1.c | 96 -
 drivers/vfio/vfio_pasid.c   | 10 +
 include/linux/vfio.h|  6 +++
 include/uapi/linux/vfio.h   | 36 
 5 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 3d8a108..95d90c6 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
 config VFIO_IOMMU_TYPE1
tristate
depends on VFIO
+   select VFIO_PASID if (X86)
default n
 
 config VFIO_IOMMU_SPAPR_TCE
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 22432cf..a7b3a83 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,6 +72,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
struct iommu_nesting_info *nesting_info;
+   struct vfio_mm  *vmm;
 };
 
 struct vfio_domain {
@@ -1615,6 +1616,17 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
 
list_splice_tail(iova_copy, iova);
 }
+
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+   if (iommu->vmm) {
+   vfio_mm_put(iommu->vmm);
+   iommu->vmm = NULL;
+   }
+
+   kfree(iommu->nesting_info);
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
@@ -1738,6 +1750,25 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
goto out_detach;
}
iommu->nesting_info = info;
+
+   if (info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
+   struct vfio_mm *vmm;
+   int sid;
+
+   vmm = vfio_mm_get_from_task(current);
+   if (IS_ERR(vmm)) {
+   ret = PTR_ERR(vmm);
+   goto out_detach;
+   }
+   iommu->vmm = vmm;
+
+   sid = vfio_mm_ioasid_sid(vmm);
+   ret = iommu_domain_set_attr(domain->domain,
+   DOMAIN_ATTR_IOASID_SID,
+   &sid);
+   if (ret)
+   goto out_detach;
+   }
}
 
/* Get aperture info */
@@ -1841,7 +1872,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
 
 out_detach:
-   kfree(iommu->nesting_info);
+   if (iommu->nesting_info)
+   vfio_iommu_release_nesting_info(iommu);
vfio_iommu_detach_group(domain, group);
 out_domain:
iommu_domain_free(domain->domain);
@@ -2040,7 +2072,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
else
vfio_iommu_unmap_unpin_reaccount(iommu);
 
-   kfree(iommu->nesting_info);
+   if (iommu->nesting_info)
+   vfio_iommu_release_nesting_info(iommu);
}
iommu_domain_free(domain->domain);
list_del(>next);
@@ -2363,6 +2396,63 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu 
*iommu,
 
 }
 
+static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
+   unsigned int min,
+   unsigned int max)
+{
+   int ret = -ENOTSUPP;
+
+   mutex_lock(&iommu->lock);
+   if (iommu->vmm)
+   ret = vfio_pasid_alloc(iommu->vmm, min, max);
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
+   unsigned int min,
+   unsigned int max)
+{
+   int ret = -ENOTSUPP;
+
+   mutex_lock(&iommu->lock);
+   if (iommu->vmm) {
+   vfio_pasid_free_range(iommu->vmm, min, max);
+   ret = 0;
+   }
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
+static int 

[PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache

2020-06-11 Thread Liu Yi L
This patch provides an interface allowing userspace to invalidate the
IOMMU cache for the first-level page table. This is required when the
first-level IOMMU page table is not managed by the host kernel in the
nested translation setup.
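
A user-side sketch (assuming, as the documentation patch in this series
shows, that the vendor invalidation struct is carried inline after the op
header):

    struct vfio_iommu_type1_nesting_op *op;
    size_t argsz = sizeof(*op) + sizeof(struct iommu_cache_invalidate_info);

    op = calloc(1, argsz);
    op->argsz = argsz;
    op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
    /* fill an iommu_cache_invalidate_info (argsz, version, cache,
     * granularity, ...) in op->data, then: */
    ioctl(container, VFIO_IOMMU_NESTING_OP, op);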

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Eric Auger 
Signed-off-by: Jacob Pan 
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP

 drivers/vfio/vfio_iommu_type1.c | 59 +
 include/uapi/linux/vfio.h   |  3 +++
 2 files changed, 62 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f1468a0..c233c8e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2582,6 +2582,54 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct iommu_cache_invalidate_info *cache_info =
+   (struct iommu_cache_invalidate_info *) dc->data;
+
+   iommu_cache_invalidate(dc->domain, dev, cache_info);
+   return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+   struct iommu_cache_invalidate_info *cache_info)
+{
+   struct domain_capsule dc = { .data = cache_info };
+   struct vfio_group *group;
+   struct vfio_domain *domain;
+   int ret = 0;
+   struct iommu_nesting_info *info;
+
+   mutex_lock(&iommu->lock);
+   /*
+* Cache invalidation is required for any nesting IOMMU,
+* so no need to check system-wide PASID support.
+*/
+   info = iommu->nesting_info;
+   if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+   ret = -ENOTSUPP;
+   goto out_unlock;
+   }
+
+   group = vfio_find_nesting_group(iommu);
+   if (!group) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   domain = list_first_entry(&iommu->domain_list,
+ struct vfio_domain, next);
+   dc.group = group;
+   dc.domain = domain->domain;
+   iommu_group_for_each_dev(group->iommu_group, &dc,
+vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
unsigned long arg)
 {
@@ -2607,6 +2655,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu 
*iommu,
case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
data_size = sizeof(struct iommu_gpasid_unbind_data);
break;
+   case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+   data_size = sizeof(struct iommu_cache_invalidate_info);
+   break;
default:
return -EINVAL;
}
@@ -2633,6 +2684,14 @@ static long vfio_iommu_type1_nesting_op(struct 
vfio_iommu *iommu,
case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
ret = vfio_iommu_handle_pgtbl_op(iommu, false, data);
break;
+   case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+   {
+   struct iommu_cache_invalidate_info *cache_info =
+   (struct iommu_cache_invalidate_info *)data;
+
+   ret = vfio_iommu_invalidate_cache(iommu, cache_info);
+   break;
+   }
default:
ret = -EINVAL;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e4aa466..9a011d6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -891,6 +891,8 @@ struct vfio_iommu_type1_pasid_request {
  * +-------------+---------------------------------------+
  * | UNBIND_PGTBL|  struct iommu_gpasid_unbind_data      |
  * +-------------+---------------------------------------+
+ * | CACHE_INVLD |  struct iommu_cache_invalidate_info   |
+ * +-------------+---------------------------------------+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -903,6 +905,7 @@ struct vfio_iommu_type1_nesting_op {
 
 #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL   (0)
 #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL (1)
+#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD  (2)
 
 #define VFIO_IOMMU_NESTING_OP  _IO(VFIO_TYPE, VFIO_BASE + 23)
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain

2020-06-11 Thread Liu Yi L
From the IOMMU's point of view, PASIDs allocated and managed by external
components (e.g. VFIO) will be passed in for gpasid_bind/unbind operations.
The IOMMU needs some knowledge to check PASID ownership, hence add an
interface for those components to tell the IOMMU the PASID owner.

In the latest kernel design, PASID ownership is managed by the IOASID set
from which the PASID is allocated. This patch adds support for setting the
ioasid set ID on the domains used for nesting/vSVA. Subsequent SVA operations
on the PASID will be checked against its IOASID set for proper ownership.
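
The expected consumer-side call, as used later in this series by VFIO
("vmm" being VFIO's per-process tracking object):

    int sid = vfio_mm_ioasid_sid(vmm);
    int ret = iommu_domain_set_attr(domain, DOMAIN_ATTR_IOASID_SID, &sid);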

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 16 
 include/linux/intel-iommu.h |  4 
 include/linux/iommu.h   |  1 +
 3 files changed, 21 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5628e4b..2d59a5d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1788,6 +1788,7 @@ static struct dmar_domain *alloc_domain(int flags)
if (first_level_by_default())
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
+   domain->ioasid_sid = INVALID_IOASID_SET;
INIT_LIST_HEAD(>devices);
 
return domain;
@@ -6035,6 +6036,21 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
}
spin_unlock_irqrestore(&device_domain_lock, flags);
break;
+   case DOMAIN_ATTR_IOASID_SID:
+   if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
+   ret = -ENODEV;
+   break;
+   }
+   if ((dmar_domain->ioasid_sid != INVALID_IOASID_SET) &&
+   (dmar_domain->ioasid_sid != (*(int *) data))) {
+   pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
+   dmar_domain->ioasid_sid,
+   (*(int *) data));
+   ret = -EBUSY;
+   break;
+   }
+   dmar_domain->ioasid_sid = *(int *) data;
+   break;
default:
ret = -EINVAL;
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 1e02624..29d1c6f 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -548,6 +548,10 @@ struct dmar_domain {
   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr;   /* maximum mapped address */
 
+   int ioasid_sid; /*
+* the ioasid set which tracks all
+* PASIDs used by the domain.
+*/
int default_pasid;  /*
 * The default pasid used for non-SVM
 * traffic on mediated devices.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f6e4b49..57c46ae 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ enum iommu_attr {
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_NESTING_INFO,
+   DOMAIN_ATTR_IOASID_SID,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/15] vfio: Add PASID allocation/free support

2020-06-11 Thread Liu Yi L
Shared Virtual Addressing (a.k.a. Shared Virtual Memory) allows sharing
multiple process virtual address spaces with the device for a simplified
programming model. PASID is used to tag a virtual address space in DMA
requests and to identify the related translation structure in the IOMMU. When
a PASID-capable device is assigned to a VM, we want the same capability
of using PASID to tag guest process virtual address spaces to achieve
virtual SVA (vSVA).

PASID management for the guest is vendor specific. Some vendors (e.g. Intel
VT-d) require system-wide managed PASIDs across all devices, regardless
of whether a device is used by the host or assigned to a guest. Other vendors
(e.g. ARM SMMU) may allow PASIDs to be managed per-device and thus fully
delegated to the guest for assigned devices.

For system-wide managed PASIDs, this patch introduces a vfio module to
handle explicit PASID alloc/free requests from guest. Allocated PASIDs
are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
object is introduced to track mm_struct. Multiple VFIO containers within
a process share the same vfio_mm object.

A quota mechanism is provided to prevent a malicious user from exhausting
available PASIDs. Currently the quota is a global parameter applied to
all VFIO devices. In the future a per-device quota might be supported too.
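
A sketch of the intended kernel-side consumer flow (error handling
abbreviated; the min/max values are illustrative):

    struct vfio_mm *vmm;
    int pasid;

    vmm = vfio_mm_get_from_task(current);
    if (IS_ERR(vmm))
            return PTR_ERR(vmm);

    pasid = vfio_pasid_alloc(vmm, 1, 100);
    if (pasid < 0) {
            vfio_mm_put(vmm);
            return pasid;
    }
    /* ... use the PASID ... */
    vfio_pasid_free_range(vmm, pasid, pasid);
    vfio_mm_put(vmm);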

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Suggested-by: Alex Williamson 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) added in v2, split from the pasid alloc/free support of v1

 drivers/vfio/Kconfig  |   5 ++
 drivers/vfio/Makefile |   1 +
 drivers/vfio/vfio_pasid.c | 151 ++
 include/linux/vfio.h  |  28 +
 4 files changed, 185 insertions(+)
 create mode 100644 drivers/vfio/vfio_pasid.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index fd17db9..3d8a108 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -19,6 +19,11 @@ config VFIO_VIRQFD
depends on VFIO && EVENTFD
default n
 
+config VFIO_PASID
+   tristate
+   depends on IOASID && VFIO
+   default n
+
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c47..bb836a3 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
 
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
+obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
new file mode 100644
index 000..dd5b6d1
--- /dev/null
+++ b/drivers/vfio/vfio_pasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ * Author: Liu Yi L 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L "
+#define DRIVER_DESC "PASID management for VFIO bus drivers"
+
+#define VFIO_DEFAULT_PASID_QUOTA   1000
+static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
+module_param_named(pasid_quota, pasid_quota, uint, 0444);
+MODULE_PARM_DESC(pasid_quota,
+" Set the quota for max number of PASIDs that an application 
is allowed to request (default 1000)");
+
+struct vfio_mm_token {
+   unsigned long long val;
+};
+
+struct vfio_mm {
+   struct kref kref;
+   struct vfio_mm_tokentoken;
+   int ioasid_sid;
+   int pasid_quota;
+   struct list_headnext;
+};
+
+static struct vfio_pasid {
+   struct mutexvfio_mm_lock;
+   struct list_headvfio_mm_list;
+} vfio_pasid;
+
+/* called with vfio_pasid.vfio_mm_lock held */
+static void vfio_mm_release(struct kref *kref)
+{
+   struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
+
+   list_del(&vmm->next);
+   mutex_unlock(&vfio_pasid.vfio_mm_lock);
+   ioasid_free_set(vmm->ioasid_sid, true);
+   kfree(vmm);
+}
+
+void vfio_mm_put(struct vfio_mm *vmm)
+{
+   kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
+}
+
+static void vfio_mm_get(struct vfio_mm *vmm)
+{
+   kref_get(&vmm->kref);
+}
+
+struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+   struct mm_struct *mm = get_task_mm(task);
+   struct vfio_mm *vmm;
+   unsigned long long val = (unsigned long long) mm;
+   int ret;
+
+   mutex_lock(&vfio_pasid.vfio_mm_lock);
+   /* Search existing vfio_mm with current mm pointer */
+   list_for_each_entry(vmm, &vfio_pasid.vfio_mm_list, next) {
+   if (vmm->token.val == val) {
+   vfio_mm_get(vmm);
+   goto 

[PATCH v2 02/15] iommu: Report domain nesting info

2020-06-11 Thread Liu Yi L
IOMMUs that support nesting translation need to report the capability info
to userspace, e.g. the format of first level/stage paging structures.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
@Jean, Eric: nesting was introduced for ARM, but it looks like there is no
actual user of it, right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING
to retrieve the nesting info. What are your opinions?

 include/linux/iommu.h  |  1 +
 include/uapi/linux/iommu.h | 34 ++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 78a26ae..f6e4b49 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_NESTING_INFO,
DOMAIN_ATTR_MAX,
 };
 
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 303f148..02eac73 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
};
 };
 
+struct iommu_nesting_info {
+   __u32   size;
+   __u32   format;
+   __u32   features;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID   (1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL  (1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2)
+   __u32   flags;
+   __u8data[];
+};
+
+/*
+ * @flags: VT-d specific flags. Currently reserved for future
+ * extension.
+ * @addr_width:The output addr width of first level/stage translation
+ * @pasid_bits:Maximum supported PASID bits, 0 represents no PASID
+ * support.
+ * @cap_reg:   Describe basic capabilities as defined in VT-d capability
+ * register.
+ * @cap_mask:  Mark valid capability bits in @cap_reg.
+ * @ecap_reg:  Describe the extended capabilities as defined in VT-d
+ * extended capability register.
+ * @ecap_mask: Mark the valid capability bits in @ecap_reg.
+ */
+struct iommu_nesting_info_vtd {
+   __u32   flags;
+   __u16   addr_width;
+   __u16   pasid_bits;
+   __u64   cap_reg;
+   __u64   cap_mask;
+   __u64   ecap_reg;
+   __u64   ecap_mask;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest

2020-06-11 Thread Liu Yi L
This patch exposes the PCIe PASID capability to the guest for assigned
devices. The existing vfio_pci driver hides it from the guest by setting
the capability length to 0 in pci_ext_cap_length[].

This patch only exposes the PASID capability for devices which have the
PCIe PASID extended structure in their configuration space. VFs will
therefore not see the PASID capability, as a VF does not implement the
PASID extended structure in its configuration space. Supporting VFs is a
TODO for the future. Related discussion can be found at the link below:

https://lkml.org/lkml/2020/4/7/693

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) added in v2, but it was sent in a separate patch series before

 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index 90c0b80..4b9af99 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = 
{
[PCI_EXT_CAP_ID_LTR]=   PCI_EXT_CAP_LTR_SIZEOF,
[PCI_EXT_CAP_ID_SECPCI] =   0,  /* not yet */
[PCI_EXT_CAP_ID_PMUX]   =   0,  /* not yet */
-   [PCI_EXT_CAP_ID_PASID]  =   0,  /* not yet */
+   [PCI_EXT_CAP_ID_PASID]  =   PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 14/15] vfio: Document dual stage control

2020-06-11 Thread Liu Yi L
From: Eric Auger 

The VFIO API was enhanced to support nested stage control: a bunch of
new ioctls and a usage guideline.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Eric Auger 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting are removed as this series doesn't cover them.
   Original version from Eric.
   https://lkml.org/lkml/2020/3/20/700

 Documentation/driver-api/vfio.rst | 64 +++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst 
b/Documentation/driver-api/vfio.rst
index f1a4d3c..06224bd 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,70 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
+the ARM terminology while level corresponds to Intel's VTD terminology.
+In the following text we use either without distinction.
+
+This is useful when the guest is exposed to a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can use
+stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
+VM isolation (GPA -> HPA).
+
+Under dual stage translation, the guest gets ownership of the stage 1 page
+tables and also owns stage 1 configuration structures. The hypervisor owns
+the root configuration structure (for security reasons), including stage 2
+configuration. This works as long as the configuration structures and page
+table formats are compatible between the virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use stage 2, leaving stage 1 available
+for guest usage. The guest stage 1 format depends on the IOMMU vendor,
+as does the nesting configuration method. User space should check the
+format and configuration method after setting the nesting type by
+using:
+
+ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+Details can be found in Documentation/userspace-api/iommu.rst. For Intel
+VT-d, each stage 1 page table is bound to host by:
+
+nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+As mentioned above, the guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
+GVA->GPA page tables are available when PASID (Process Address Space ID)
+is exposed to the guest, e.g. a guest with PASID-capable devices assigned.
+For such page table binding, the bind_data should include PASID info, which
+is allocated either by the guest itself or by the host. This depends on the
+hardware vendor; e.g. Intel VT-d requires PASIDs to be allocated from the
+host. This requirement is reported by VFIO_IOMMU_GET_INFO. User space can
+allocate a PASID from the host by:
+
+req.flags = VFIO_IOMMU_ALLOC_PASID;
+ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
+
+With the first stage/level page table bound to the host, the guest stage 1
+translation can be combined with the hypervisor stage 2 translation to
+get the final address.
+
+When the guest invalidates stage 1 related caches, the invalidations must be
+forwarded to the host through:
+
+nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
+ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Those invalidations can happen at various granularity levels (page, context,
+...).
+
 VFIO User API
 -------------
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data

2020-06-11 Thread Liu Yi L
The existing iommu_gpasid_bind_data is used for binding guest page tables
to a specified PASID. To unwind the binding, an unbind_data structure is
needed.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 02eac73..46a7c57 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,6 +332,19 @@ struct iommu_gpasid_bind_data {
};
 };
 
+/**
+ * struct iommu_gpasid_unbind_data - Information about device and guest PASID
+ *  unbinding
+ * @argsz: User filled size of this data
+ * @flags: Additional information on guest unbind request
+ * @pasid: Process address space ID used for the guest mm in host IOMMU
+ */
+struct iommu_gpasid_unbind_data {
+   __u32 argsz;
+   __u64 flags;
+   __u64 pasid;
+};
+
 struct iommu_nesting_info {
__u32   size;
__u32   format;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID

2020-06-11 Thread Liu Yi L
Nesting translation allows two levels/stages of page tables, with the 1st
level for guest translations (e.g. GVA->GPA) and the 2nd level for host
translations (e.g. GPA->HPA). This patch adds an interface for binding
guest page tables to a PASID. This PASID must have been allocated to user
space before the binding request.

CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guest page tables
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind

Cc: Kevin Tian 
 drivers/vfio/vfio_iommu_type1.c | 191 
 drivers/vfio/vfio_pasid.c   |  30 +++
 include/linux/vfio.h|  20 +
 include/uapi/linux/vfio.h   |  30 +++
 4 files changed, 271 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a7b3a83..f1468a0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -130,6 +130,33 @@ struct vfio_regions {
 #define IS_DOMAIN_IN_CONTAINER(iommu)  ((iommu->external_domain) || \
+(!list_empty(&iommu->domain_list)))
 
+struct domain_capsule {
+   struct vfio_group *group;
+   struct iommu_domain *domain;
+   void *data;
+};
+
+/* iommu->lock must be held */
+static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
+{
+   struct vfio_domain *d;
+   struct vfio_group *g, *group = NULL;
+
+   if (!iommu->nesting_info)
+   return NULL;
+
+   /* only support singleton container with nesting type */
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   list_for_each_entry(g, &d->group_list, next) {
+   if (!group) {
+   group = g;
+   break;
+   }
+   }
+   }
+   return group;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -2014,6 +2041,39 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct iommu_gpasid_bind_data *bind_data =
+   (struct iommu_gpasid_bind_data *) dc->data;
+
+   return iommu_sva_bind_gpasid(dc->domain, dev, bind_data);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct iommu_gpasid_unbind_data *unbind_data =
+   (struct iommu_gpasid_unbind_data *) dc->data;
+
+   iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+   return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *) data;
+   struct iommu_gpasid_unbind_data unbind_data;
+
+   unbind_data.argsz = sizeof(unbind_data);
+   unbind_data.flags = 0;
+   unbind_data.pasid = pasid;
+
+   dc->data = &unbind_data;
+   iommu_group_for_each_dev(dc->group->iommu_group,
+dc, vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
  struct iommu_group *iommu_group)
 {
@@ -2055,6 +2115,21 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
if (!group)
continue;
 
+   if (iommu->nesting_info && iommu->vmm &&
+   (iommu->nesting_info->features &
+   IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+   struct domain_capsule dc = { .group = group,
+.domain = domain->domain,
+.data = NULL };
+
+   /*
+* Unbind page tables bound with system wide PASIDs
+* which are allocated to user space.
+*/
+   vfio_mm_for_each_pasid(iommu->vmm, &dc,
+  vfio_group_unbind_gpasid_fn);
+   }
+
vfio_iommu_detach_group(domain, group);
list_del(>next);
kfree(group);
@@ -2453,6 +2528,120 @@ static int vfio_iommu_type1_pasid_request(struct 
vfio_iommu *iommu,
  

[PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs

2020-06-11 Thread Liu Yi L
In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
has been used to achieve flexible device sharing across domains (e.g. VMs).
There are also hardware-assisted mediated pass-through solutions from
platform vendors, e.g. Intel VT-d scalable mode, which supports the Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
backed mdevs, as there is IOMMU-enforced DMA isolation for them.
In the kernel, IOMMU-backed mdevs are exposed to the IOMMU layer via the
aux-domain concept, which means mdevs are protected by an iommu domain
that is auxiliary to the domain the kernel driver primarily uses for the
DMA API. Details can be found in the presentation below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends the NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with the mdev.
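For illustration, below is a minimal sketch of how a vendor parent driver
could mark its mdevs as IOMMU-backed at creation time. It assumes the
existing mdev_set_iommu_device() helper; the create callback name is made
up for the example.

#include <linux/device.h>
#include <linux/mdev.h>

static int my_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
	struct device *parent = mdev_parent_dev(mdev);

	/*
	 * Tie the mdev to its physical parent so the IOMMU layer can
	 * locate the aux-domain capable device behind it.
	 */
	return mdev_set_iommu_device(mdev_dev(mdev), parent);
}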

Cc: Kevin Tian 
CC: Jacob Pan 
CC: Jun Tian 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed

 drivers/vfio/vfio_iommu_type1.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c233c8e..bcd7935 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2041,13 +2041,27 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+   struct device *dev)
+{
+   if (group->mdev_group)
+   return vfio_mdev_get_iommu_device(dev);
+   else
+   return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
struct iommu_gpasid_bind_data *bind_data =
(struct iommu_gpasid_bind_data *) dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   return iommu_sva_bind_gpasid(dc->domain, dev, bind_data);
+   return iommu_sva_bind_gpasid(dc->domain, iommu_device, bind_data);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
@@ -2055,8 +2069,13 @@ static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
struct domain_capsule *dc = (struct domain_capsule *)data;
struct iommu_gpasid_unbind_data *unbind_data =
(struct iommu_gpasid_unbind_data *) dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return 0;
 
-   iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+   iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
return 0;
 }
 
@@ -2587,8 +2606,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
struct domain_capsule *dc = (struct domain_capsule *)data;
struct iommu_cache_invalidate_info *cache_info =
(struct iommu_cache_invalidate_info *) dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   iommu_cache_invalidate(dc->domain, dev, cache_info);
+   iommu_cache_invalidate(dc->domain, iommu_device, cache_info);
return 0;
 }
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid()

2020-06-11 Thread Liu Yi L
From: Yi Sun 

The current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may be attached to multiple aux-domains. So this interface needs
to pass in domain info as well; the iommu driver is then able to tell
which domain will be used for the second-stage translation of the nesting
mode.

This patch passes @domain for the above reason. Since this interface is
supposed to serve unbind requests from user space, it should pass in
unbind_data as well.
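A rough sketch of the resulting caller shape, with the struct layout taken
from the hunks below (domain, dev, pasid and ret are assumed to exist in
the surrounding code):

	struct iommu_gpasid_unbind_data unbind_data = {
		.argsz = sizeof(unbind_data),
		.flags = 0,
		.pasid = pasid,
	};

	/* @domain disambiguates which (aux-)domain the unbind targets */
	ret = iommu_sva_unbind_gpasid(domain, dev, &unbind_data);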

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Yi Sun 
Signed-off-by: Liu Yi L 
---
 drivers/iommu/intel-svm.c   | 14 --
 drivers/iommu/iommu.c   |  4 ++--
 include/linux/intel-iommu.h |  3 ++-
 include/linux/iommu.h   |  8 +---
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 6272da6..bf55e2f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -445,16 +445,26 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev,
+   struct iommu_gpasid_unbind_data *data)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
struct intel_svm_dev *sdev;
struct intel_svm *svm;
int ret = -EINVAL;
+   unsigned long minsz;
+   int pasid;
+
+   if (WARN_ON(!iommu) || !data)
+   return -EINVAL;
 
-   if (WARN_ON(!iommu))
+   minsz = offsetofend(struct iommu_gpasid_unbind_data, pasid);
+   if (data->argsz < minsz || data->flags)
return -EINVAL;
 
+   pasid = data->pasid;
+
	mutex_lock(&pasid_mutex);
svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
if (!svm) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 374b34f..57aac03 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1955,12 +1955,12 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
 int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-ioasid_t pasid)
+   struct iommu_gpasid_unbind_data *data)
 {
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
-   return domain->ops->sva_unbind_gpasid(dev, pasid);
+   return domain->ops->sva_unbind_gpasid(domain, dev, data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 29d1c6f..0b238c3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -737,7 +737,8 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+   struct iommu_gpasid_unbind_data *data);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 57c46ae..a19f063 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -325,7 +325,8 @@ struct iommu_ops {
int (*sva_bind_gpasid)(struct iommu_domain *domain,
struct device *dev, struct iommu_gpasid_bind_data 
*data);
 
-   int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+   int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+   struct device *dev, struct iommu_gpasid_unbind_data *data);
 
int (*def_domain_type)(struct device *dev);
 
@@ -459,7 +460,7 @@ extern int iommu_cache_invalidate(struct iommu_domain *domain,
 extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
struct device *dev, struct iommu_gpasid_bind_data *data);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-   struct device *dev, ioasid_t pasid);
+   struct device *dev, struct iommu_gpasid_unbind_data *data);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1084,7 +1085,8 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 }
 
 static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-  

[PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl()

2020-06-11 Thread Liu Yi L
This patch refactors vfio_iommu_type1_ioctl() to use a switch statement
instead of an if-else chain, and gives each cmd its own helper function.
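For orientation, the dispatcher ends up with roughly the shape sketched
below. The helper names follow the diff; only two cases are shown here,
and the full patch adds one helper per remaining cmd.

static long vfio_iommu_type1_ioctl(void *iommu_data,
				   unsigned int cmd, unsigned long arg)
{
	struct vfio_iommu *iommu = iommu_data;

	switch (cmd) {
	case VFIO_CHECK_EXTENSION:
		return vfio_iommu_type1_check_extension(iommu, arg);
	case VFIO_IOMMU_GET_INFO:
		return vfio_iommu_type1_get_info(iommu, arg);
	/* ... one helper per remaining cmd ... */
	default:
		return -ENOTTY;
	}
}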

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Suggested-by: Christoph Hellwig 
Signed-off-by: Liu Yi L 
---
 drivers/vfio/vfio_iommu_type1.c | 183 +++-
 1 file changed, 105 insertions(+), 78 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cc1d647..402aad3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2106,6 +2106,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
 }
 
+static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
+   unsigned long arg)
+{
+   switch (arg) {
+   case VFIO_TYPE1_IOMMU:
+   case VFIO_TYPE1v2_IOMMU:
+   case VFIO_TYPE1_NESTING_IOMMU:
+   return 1;
+   case VFIO_DMA_CC_IOMMU:
+   if (!iommu)
+   return 0;
+   return vfio_domains_have_iommu_cache(iommu);
+   default:
+   return 0;
+   }
+}
+
 static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
 struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
 size_t size)
@@ -2173,110 +2190,120 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
return ret;
 }
 
-static long vfio_iommu_type1_ioctl(void *iommu_data,
-  unsigned int cmd, unsigned long arg)
+static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
+unsigned long arg)
 {
-   struct vfio_iommu *iommu = iommu_data;
+   struct vfio_iommu_type1_info info;
unsigned long minsz;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   unsigned long capsz;
+   int ret;
 
-   if (cmd == VFIO_CHECK_EXTENSION) {
-   switch (arg) {
-   case VFIO_TYPE1_IOMMU:
-   case VFIO_TYPE1v2_IOMMU:
-   case VFIO_TYPE1_NESTING_IOMMU:
-   return 1;
-   case VFIO_DMA_CC_IOMMU:
-   if (!iommu)
-   return 0;
-   return vfio_domains_have_iommu_cache(iommu);
-   default:
-   return 0;
-   }
-   } else if (cmd == VFIO_IOMMU_GET_INFO) {
-   struct vfio_iommu_type1_info info;
-   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-   unsigned long capsz;
-   int ret;
-
-   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
-   /* For backward compatibility, cannot require this */
-   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+   /* For backward compatibility, cannot require this */
+   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
-   if (copy_from_user(&info, (void __user *)arg, minsz))
-   return -EFAULT;
+   if (copy_from_user(&info, (void __user *)arg, minsz))
+   return -EFAULT;
 
-   if (info.argsz < minsz)
-   return -EINVAL;
+   if (info.argsz < minsz)
+   return -EINVAL;
 
-   if (info.argsz >= capsz) {
-   minsz = capsz;
-   info.cap_offset = 0; /* output, no-recopy necessary */
-   }
+   if (info.argsz >= capsz) {
+   minsz = capsz;
+   info.cap_offset = 0; /* output, no-recopy necessary */
+   }
 
-   info.flags = VFIO_IOMMU_INFO_PGSIZES;
+   info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
-   info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
+   info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
-   ret = vfio_iommu_iova_build_caps(iommu, &caps);
-   if (ret)
-   return ret;
+   ret = vfio_iommu_iova_build_caps(iommu, &caps);
+   if (ret)
+   return ret;
 
-   if (caps.size) {
-   info.flags |= VFIO_IOMMU_INFO_CAPS;
+   if (caps.size) {
+   info.flags |= VFIO_IOMMU_INFO_CAPS;
 
-   if (info.argsz < sizeof(info) + caps.size) {
-   info.argsz = sizeof(info) + caps.size;
-   } else {
-   vfio_info_cap_shift(&caps, sizeof(info));
-   if (copy_to_user((void __user *)arg +
-   sizeof(info), caps.buf,
-   caps.size)) {
-   kfree(caps.buf);
- 

[PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs

2020-06-11 Thread Liu Yi L
Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
guest application address spaces with passthrough devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes; the IOMMU and QEMU changes are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is shown below; the key
design of vSVA support is to utilize the dual-stage IOMMU translation
(also known as IOMMU nesting translation) capability of the host IOMMU.


.-------------.  .---------------------------.
|   vIOMMU    |  | Guest process CR3, FL only|
|             |  '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------'   |
|             |   V
|             |  CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
.-------------.  .----------------------.
|   pIOMMU    |  | Bind FL for GVA-GPA  |
|             |  '----------------------'
.----------------/  |
| PASID Entry |     V (Nested xlate)
'----------------\.------------------------------.
|             |   |SL for GPA-HPA, default domain|
|             |   '------------------------------'
'-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
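To make the FL/SL split concrete, here is a toy model of the dual-stage
walk. walk_first_level() and walk_second_level() are hypothetical helpers;
real hardware performs both stages itself.

#include <linux/types.h>

u64 walk_first_level(u64 gva);		/* hypothetical: guest FL walk */
u64 walk_second_level(u64 gpa);		/* hypothetical: host SL walk */

static u64 nested_translate(u64 gva)
{
	u64 gpa = walk_first_level(gva);	/* stage 1: GVA -> GPA */

	return walk_second_level(gpa);		/* stage 2: GPA -> HPA */
}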

Patch Overview:
 1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
 2. reports IOMMU nesting info to userspace (patch 0002, 0003 and 0015)
 3. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0006)
 4. vfio support for binding guest page table to host (patch 0007, 0008, 0009, 0010)
 5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
 7. expose PASID capability to VM (patch 0013)
 8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. IOMMU-backed Mediated Device assignment
3. Page Request Services (PRS) support

This patchset aims at phase 1 and phase 2, and is based on Jacob's
series below.
[PATCH v13 0/8] Nested Shared Virtual Address (SVA) VT-d support - merged
https://lkml.org/lkml/2020/5/13/1582

[PATCH v2 0/3] IOMMU user API enhancement - wip
https://lkml.org/lkml/2020/6/11/5

[PATCH 00/10] IOASID extensions for guest SVA - wip
https://lkml.org/lkml/2020/3/25/874

The latest IOASID code added the new interface below for iterating over all
PASIDs of an ioasid_set. The implementation has not been sent out yet as
Jacob needs to do some cleanup; it can be found in the branch
vsva-linux-5.7-rc4-v2.
 int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data),
                                void *data);
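A minimal usage sketch of that iterator, assuming the signature quoted
above (the counting callback is illustrative):

#include <linux/ioasid.h>

static void count_pasid(ioasid_t id, void *data)
{
	(*(unsigned int *)data)++;	/* tally one PASID of the set */
}

static unsigned int count_set_pasids(int sid)
{
	unsigned int nr = 0;

	ioasid_set_for_each_ioasid(sid, count_pasid, &nr);
	return nr;
}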

The complete set for the current vSVA work can be found in the branch below.
The branch also includes some extra modifications to the IOASID core code
and vt-d iommu driver cleanup patches.
https://github.com/luxis1999/linux-vsva.git:vsva-linux-5.7-rc4-v2

The corresponding QEMU patch series is included in the branch below:
https://github.com/luxis1999/qemu.git:vsva_5.7_rc4_qemu_rfcv6


Regards,
Yi Liu

Changelog:
- Patch v1 -> Patch v2:
  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
 Hellwig.
  b) Re-sequence the patch series for better bisect support.
  c) Report IOMMU nesting cap info in detail instead of the format used
 in v1.
  d) Enforce one group per nesting type container for vfio iommu type1
 driver.
  e) Build the vfio_mm related code from vfio.c to be a separate
 vfio_pasid.ko.
  f) Add PASID ownership check in IOMMU driver.
  g) Adapted to the latest IOMMU UAPI design. Removed the IOMMU UAPI
 version check. Added iommu_gpasid_unbind_data for unbind requests from
 userspace.
  h) Defined a single ioctl, VFIO_IOMMU_NESTING_OP, for bind/unbind_gtbl
 and cache_invld.
  i) Document dual stage control in vfio.rst.
  Patch v1: 
https://lore.kernel.org/linux-iommu/1584880325-10561-1-git-send-email-yi.l@intel.com/

- RFC v3 -> Patch v1:
  a) Address comments on the PASID request (alloc/free) path
  b) Report PASID alloc/free availability to user-space
  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
  d) Adjusted to latest ioasid code implementation. e.g. remove the
 code for tracking the allocated PASIDs as latest ioasid code
 will track it, VFIO could use ioasid_free_set() to free all
  

[PATCH 0/2] iommu/renesas: Add support for r8a77961

2020-06-11 Thread Yoshihiro Shimoda
This patch series is based on next-20200611.

Yoshihiro Shimoda (2):
  dt-bindings: iommu: renesas,ipmmu-vmsa: add r8a77961 support
  iommu/renesas: Add support for r8a77961

 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 +
 drivers/iommu/ipmmu-vmsa.c  | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] dt-bindings: iommu: renesas,ipmmu-vmsa: add r8a77961 support

2020-06-11 Thread Yoshihiro Shimoda
Add support for r8a77961 (R-Car M3-W+).

Signed-off-by: Yoshihiro Shimoda 
---
 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
index 39675cf..e9d28a4 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
@@ -35,6 +35,7 @@ properties:
   - renesas,ipmmu-r8a774c0 # RZ/G2E
   - renesas,ipmmu-r8a7795  # R-Car H3
   - renesas,ipmmu-r8a7796  # R-Car M3-W
+  - renesas,ipmmu-r8a77961 # R-Car M3-W+
   - renesas,ipmmu-r8a77965 # R-Car M3-N
   - renesas,ipmmu-r8a77970 # R-Car V3M
   - renesas,ipmmu-r8a77980 # R-Car V3H
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/renesas: Add support for r8a77961

2020-06-11 Thread Yoshihiro Shimoda
Add support for r8a77961 (R-Car M3-W+).

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/ipmmu-vmsa.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 4c2972f..b57b1f2 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -3,7 +3,7 @@
  * IOMMU API for Renesas VMSA-compatible IPMMU
  * Author: Laurent Pinchart 
  *
- * Copyright (C) 2014 Renesas Electronics Corporation
+ * Copyright (C) 2014-2020 Renesas Electronics Corporation
  */
 
 #include 
@@ -753,6 +753,7 @@ static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = {
{ .soc_id = "r8a774b1", },
{ .soc_id = "r8a774c0", },
{ .soc_id = "r8a7795", .revision = "ES3.*" },
+   { .soc_id = "r8a77961", },
{ .soc_id = "r8a77965", },
{ .soc_id = "r8a77990", },
{ .soc_id = "r8a77995", },
@@ -970,6 +971,9 @@ static const struct of_device_id ipmmu_of_ids[] = {
.compatible = "renesas,ipmmu-r8a7796",
.data = &ipmmu_features_rcar_gen3,
}, {
+   .compatible = "renesas,ipmmu-r8a77961",
+   .data = &ipmmu_features_rcar_gen3,
+   }, {
.compatible = "renesas,ipmmu-r8a77965",
.data = _features_rcar_gen3,
}, {
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Jonathan Cameron
On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan  wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered

mechanics 

> in this documentation.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
Mostly seems sensible.  A few comments / queries inline.

Jonathan

> ---
>  Documentation/userspace-api/iommu.rst | 210 
> ++
>  1 file changed, 210 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst 
> b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index ..e95dc5a04a41
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,210 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=
> +IOMMU Userspace API
> +=
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is

wherein _a_ virtual IOMMU 

> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page request
> +
> +Requirements
> +
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable size union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following scenarios:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on a older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +
> +Feature Checking
> +
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without the upfront compatibility checking, future
> +faults are difficult to report even in normal conditions. For example,
> +TLB invalidations should always succeed from vIOMMU's
> +perspective. 

This statement has me concerned.  If a TLB invalidation fails but
is reported to the guest as successful, do we have a possible break of iommu
isolation guarantees?

If a TLB invalidation does not happen for some reason, that's a critical
fault; isolate the device using the IOMMU or kill the VM.

I'd reword it as "TLB invalidations should always succeed."

As you mention, we should never get to this state anyway!

> There is no architectural way to report back to the vIOMMU
> +if the UAPI data is incompatible. For this reason the following IOMMU
> +UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind 

Re: [PATCH v2 1/3] docs: IOMMU user API

2020-06-11 Thread Lu Baolu

Hi Jacob,

On 2020/6/11 12:12, Jacob Pan wrote:

The IOMMU UAPI is newly introduced to support communications between the
guest virtual IOMMU and the host IOMMU. There have been lots of discussions
on how it should work with the VFIO UAPI and userspace in general.

This document is intended to clarify the UAPI design and usage. The
mechanics of how future extensions should be achieved are also covered
in this documentation.

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
  Documentation/userspace-api/iommu.rst | 210 ++
  1 file changed, 210 insertions(+)
  create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst 
b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..e95dc5a04a41
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,210 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=
+IOMMU Userspace API
+=
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+
+Communication between user and kernel goes in both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page request
+
+Requirements
+
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+
+Although the data structures defined in the IOMMU UAPI are self-contained,
+no user API functions are introduced. Instead, the IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable size union, since doing so
+would break backward compatibility when the offset moves. In both cases, a
+new flag must accompany each new field so that the IOMMU driver can process
+the data based on the new flag. The version field is reserved only for the
+unlikely event of upgrading the UAPI in its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
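As an aside, a caller built against this UAPI would always initialize argsz
from its own compiled-in view of the structure; the snippet is illustrative,
with the struct name taken from this series:

	struct iommu_gpasid_bind_data bind_data = { 0 };

	/*
	 * argsz reflects the size this caller was compiled with, not
	 * whatever the running kernel happens to support.
	 */
	bind_data.argsz = sizeof(bind_data);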
+
+When an IOMMU UAPI extension results in a size increase, users such as
+VFIO have to handle the following scenarios:
+
+1. User and kernel have an exact size match
+2. An older user with an older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with a newer kernel header (larger UAPI size) running
+   on an older kernel
+4. A malicious/misbehaving user passes an illegal/invalid size that is within
+   range. The data may contain garbage.
+
+
+Feature Checking
+
+While launching a guest with a vIOMMU, it is important to ensure that the
+host can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without upfront compatibility checking, future faults are
+difficult to report even under normal conditions. For example,
+TLB invalidations should always succeed from the vIOMMU's
+perspective. There is no architectural way to report back to the vIOMMU
+if the UAPI data is incompatible. For this reason the following IOMMU
+UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+5. Page response
+
+User applications such as QEMU are expected to import kernel UAPI
+headers. Only backward compatibility is supported. For example, an
+older QEMU (with an older kernel header) can run on a newer kernel; a
+newer QEMU (with a newer kernel header) may fail on an older kernel.
+
+The IOMMU vendor driver should report the features below to IOMMU UAPI
+consumers (e.g. via VFIO); a usage sketch follows the list.
+
+1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
+2. IOMMU_NESTING_FEAT_BIND_PGTBL
+3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
+4. IOMMU_NESTING_FEAT_CACHE_INVLD
+5. IOMMU_NESTING_FEAT_PAGE_REQUEST
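As sketched here, a VFIO consumer could gate vSVA setup on those bits;
nesting_info and its features field are per this series, while the
surrounding code is hypothetical:

	if (!(nesting_info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) ||
	    !(nesting_info->features & IOMMU_NESTING_FEAT_BIND_PGTBL) ||
	    !(nesting_info->features & IOMMU_NESTING_FEAT_CACHE_INVLD))
		return -EOPNOTSUPP;	/* cannot back a vSVA-capable vIOMMU */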
+
+Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
+VFIO kernel code shall query IOMMU vendor driver for the support of
+the above