Re: [PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU
Hi Jordan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm64-randconfig-s031-20200612 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-250-g42323db3-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=arm64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

   drivers/iommu/arm-smmu-qcom.c: In function 'qcom_adreno_smmu_is_gpu_device':
>> drivers/iommu/arm-smmu-qcom.c:17:49: error: 'smmu_domain->dev' is a pointer; did you mean to use '->'?
      17 |  return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
         |                                                 ^
         |                                                 ->
>> drivers/iommu/arm-smmu-qcom.c:18:1: warning: control reaches end of non-void function [-Wreturn-type]
      18 | }
         | ^

vim +17 drivers/iommu/arm-smmu-qcom.c

    14
    15  static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
    16  {
  > 17   return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
  > 18  }
    19

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 6/7] drm/msm: Set the global virtual address range from the IOMMU domain
Hi Jordan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>, old ones prefixed by <<):

   In file included from include/linux/bitops.h:5,
                    from include/linux/kernel.h:12,
                    from include/linux/ascii85.h:11,
                    from drivers/gpu/drm/msm/adreno/adreno_gpu.c:9:
   drivers/gpu/drm/msm/adreno/adreno_gpu.c: In function 'adreno_iommu_create_address_space':
   include/linux/bits.h:37:11: warning: right shift count is negative [-Wshift-count-negative]
      37 |  (~UL(0) >> (BITS_PER_LONG - 1 - (h
         |           ^~
   include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
      39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
         |                               ^
>> drivers/gpu/drm/msm/adreno/adreno_gpu.c:206:11: note: in expansion of macro 'GENMASK'
     206 |    start & GENMASK(48, 0), size);
         |            ^~~
--
   In file included from include/linux/bits.h:6,
                    from include/linux/bitops.h:5,
                    from include/linux/kernel.h:12,
                    from drivers/gpu/drm/msm/msm_drv.h:11,
                    from drivers/gpu/drm/msm/msm_iommu.c:7:
   drivers/gpu/drm/msm/msm_iommu.c: In function 'msm_iommu_map':
>> include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
       7 | #define BIT(nr) (UL(1) << (nr))
         |                        ^~
>> drivers/gpu/drm/msm/msm_iommu.c:40:13: note: in expansion of macro 'BIT'
      40 |  if (iova & BIT(48))
         |             ^~~
   In file included from include/linux/bitops.h:5,
                    from include/linux/kernel.h:12,
                    from drivers/gpu/drm/msm/msm_drv.h:11,
                    from drivers/gpu/drm/msm/msm_iommu.c:7:
>> include/linux/bits.h:36:22: warning: left shift count >= width of type [-Wshift-count-overflow]
      36 |  (((~UL(0)) - (UL(1) << (l)) + 1) &
         |                      ^~
   include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
      39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
         |                               ^
>> drivers/gpu/drm/msm/msm_iommu.c:41:11: note: in expansion of macro 'GENMASK'
      41 |   iova |= GENMASK(63, 49);
         |           ^~~
   include/linux/bits.h:37:11: warning: right shift count is negative [-Wshift-count-negative]
      37 |  (~UL(0) >> (BITS_PER_LONG - 1 - (h
         |           ^~
   include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
      39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
         |                               ^
>> drivers/gpu/drm/msm/msm_iommu.c:41:11: note: in expansion of macro 'GENMASK'
      41 |   iova |= GENMASK(63, 49);
         |           ^~~
   In file included from include/linux/bits.h:6,
                    from include/linux/bitops.h:5,
                    from include/linux/kernel.h:12,
                    from drivers/gpu/drm/msm/msm_drv.h:11,
                    from drivers/gpu/drm/msm/msm_iommu.c:7:
   drivers/gpu/drm/msm/msm_iommu.c: In function 'msm_iommu_unmap':
>> include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
       7 | #define BIT(nr) (UL(1) << (nr))
         |                        ^~
   drivers/gpu/drm/msm/msm_iommu.c:53:13: note: in expansion of macro 'BIT'
      53 |  if (iova & BIT(48))
         |             ^~~
   In file included from include/linux/bitops.h:5,
                    from include/linux/kernel.h:12,
                    from drivers/gpu/drm/msm/msm_drv.h:11,
                    from drivers/gpu/drm/msm/msm_iommu.c:7:
>> include/linux/bits.h:36:22: warning: left shift count >= width of type [-Wshift-count-overflow]
      36 |  (((~UL(0)) - (UL(1) << (l)) + 1) &
         |                      ^~
   include/linux/bits.h:39:31: note: in expansion of macro '__GENMASK'
      39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
         |                               ^
   drivers/gpu/drm/msm/msm_iommu.c:54:11: note: in expansion of macro 'GENMASK'
      54 |   iova |= GENMASK(63, 4
Re: [PATCH v8 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
Hi Jordan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.7 next-20200611]
[cannot apply to iommu/next robh/for-next arm/for-next keystone/next rockchip/for-next arm64/for-next/core shawnguo/for-next soc/for-next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jordan-Crouse/iommu-arm-smmu-Enable-split-pagetable-support/20200612-094718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b961f8dc8976c091180839f4483d67b7c2ca2578
config: arm64-randconfig-s031-20200612 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-250-g42323db3-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=arm64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>, old ones prefixed by <<):

   drivers/iommu/arm-smmu.c: In function 'arm_smmu_init_domain_context':
>> drivers/iommu/arm-smmu.c:804:21: error: 'dev' undeclared (first use in this function); did you mean 'cdev'?
     804 |  smmu_domain->dev = dev;
         |                     ^~~
         |                     cdev
   drivers/iommu/arm-smmu.c:804:21: note: each undeclared identifier is reported only once for each function it appears in

vim +804 drivers/iommu/arm-smmu.c

   669
   670  static int arm_smmu_init_domain_context(struct iommu_domain *domain,
   671  					struct arm_smmu_device *smmu)
   672  {
   673  	int irq, start, ret = 0;
   674  	unsigned long ias, oas;
   675  	struct io_pgtable_ops *pgtbl_ops;
   676  	struct io_pgtable_cfg pgtbl_cfg;
   677  	enum io_pgtable_fmt fmt;
   678  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
   679  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
   680
   681  	mutex_lock(&smmu_domain->init_mutex);
   682  	if (smmu_domain->smmu)
   683  		goto out_unlock;
   684
   685  	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
   686  		smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
   687  		smmu_domain->smmu = smmu;
   688  		goto out_unlock;
   689  	}
   690
   691  	/*
   692  	 * Mapping the requested stage onto what we support is surprisingly
   693  	 * complicated, mainly because the spec allows S1+S2 SMMUs without
   694  	 * support for nested translation. That means we end up with the
   695  	 * following table:
   696  	 *
   697  	 * Requested        Supported        Actual
   698  	 *     S1               N                S1
   699  	 *     S1              S1+S2             S1
   700  	 *     S1               S2               S2
   701  	 *     S1               S1               S1
   702  	 *     N                N                N
   703  	 *     N               S1+S2             S2
   704  	 *     N                S2               S2
   705  	 *     N                S1               S1
   706  	 *
   707  	 * Note that you can't actually request stage-2 mappings.
   708  	 */
   709  	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
   710  		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
   711  	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
   712  		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
   713
   714  	/*
   715  	 * Choosing a suitable context format is even more fiddly. Until we
   716  	 * grow some way for the caller to express a preference, and/or move
   717  	 * the decision into the io-pgtable code where it arguably belongs,
   718  	 * just aim for the closest thing to the rest of the system, and hope
   719  	 * that the hardware isn't esoteric enough that we can't assume AArch64
   720  	 * support to be a superset of AArch32 support...
   721  	 */
   722  	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
   723  		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
   724  	if (IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) &&
   725  	    !IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_ARM_LPAE) &&
   726  	    (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S) &&
   727  	    (smmu_domain->stage == ARM_SMMU_DOMAIN_S1))
   728  		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_S;
[PATCH v2 1/2] iommu: Mark __iommu_map/__iommu_map_sg as static
Now that __iommu_map() and __iommu_map_sg() are used only in iommu.c, mark them as static.

Signed-off-by: Baolin Wang
---
 drivers/iommu/iommu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8584f48..14eca9f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2174,8 +2174,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }
 
-int __iommu_map(struct iommu_domain *domain, unsigned long iova,
-		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+		       phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
 	const struct iommu_ops *ops = domain->ops;
 	unsigned long orig_iova = iova;
@@ -2325,9 +2325,9 @@ size_t iommu_unmap_fast(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);
 
-size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-		      struct scatterlist *sg, unsigned int nents, int prot,
-		      gfp_t gfp)
+static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+			     struct scatterlist *sg, unsigned int nents, int prot,
+			     gfp_t gfp)
 {
 	size_t len = 0, mapped = 0;
 	phys_addr_t start;
-- 
1.8.3.1
[PATCH v2 0/2] Some improvements for iommu
Hi,

The first patch marks some functions as static, and the second patch changes to use the gfp parameter from iommu_ops->map() to allocate ARM page table pages. Any comments are welcome. Thanks.

Changes from v1:
 - Fix the build errors when enabling CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST

Baolin Wang (2):
  iommu: Mark __iommu_map/__iommu_map_sg as static
  iommu: Add gfp parameter to io_pgtable_ops->map()

 drivers/gpu/drm/panfrost/panfrost_mmu.c |  2 +-
 drivers/iommu/arm-smmu-v3.c             |  2 +-
 drivers/iommu/arm-smmu.c                |  2 +-
 drivers/iommu/io-pgtable-arm-v7s.c      | 18 +++++++++---------
 drivers/iommu/io-pgtable-arm.c          | 18 +++++++++---------
 drivers/iommu/iommu.c                   | 10 +++++-----
 drivers/iommu/ipmmu-vmsa.c              |  2 +-
 drivers/iommu/msm_iommu.c               |  2 +-
 drivers/iommu/mtk_iommu.c               |  2 +-
 drivers/iommu/qcom_iommu.c              |  2 +-
 include/linux/io-pgtable.h              |  2 +-
 11 files changed, 31 insertions(+), 31 deletions(-)

-- 
1.8.3.1
[PATCH v2 2/2] iommu: Add gfp parameter to io_pgtable_ops->map()
The ARM page tables are currently always allocated with GFP_ATOMIC, but iommu_ops->map() gained a gfp_t parameter in commit 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map"). io_pgtable_ops->map() should therefore use the gfp parameter passed down from iommu_ops->map() to allocate page table pages, which avoids draining the memory allocator's atomic pools from non-atomic contexts.

Signed-off-by: Baolin Wang
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  2 +-
 drivers/iommu/arm-smmu-v3.c             |  2 +-
 drivers/iommu/arm-smmu.c                |  2 +-
 drivers/iommu/io-pgtable-arm-v7s.c      | 18 +++++++++---------
 drivers/iommu/io-pgtable-arm.c          | 18 +++++++++---------
 drivers/iommu/ipmmu-vmsa.c              |  2 +-
 drivers/iommu/msm_iommu.c               |  2 +-
 drivers/iommu/mtk_iommu.c               |  2 +-
 drivers/iommu/qcom_iommu.c              |  2 +-
 include/linux/io-pgtable.h              |  2 +-
 10 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index ed28aeb..5a39eee 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -262,7 +262,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 	while (len) {
 		size_t pgsize = get_pgsize(iova | paddr, len);
 
-		ops->map(ops, iova, paddr, pgsize, prot);
+		ops->map(ops, iova, paddr, pgsize, prot, GFP_KERNEL);
 		iova += pgsize;
 		paddr += pgsize;
 		len -= pgsize;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677..7b59f06 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2850,7 +2850,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return -ENODEV;
 
-	return ops->map(ops, iova, paddr, size, prot);
+	return ops->map(ops, iova, paddr, size, prot, gfp);
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4c..dc1d253 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1227,7 +1227,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 		return -ENODEV;
 
 	arm_smmu_rpm_get(smmu);
-	ret = ops->map(ops, iova, paddr, size, prot);
+	ret = ops->map(ops, iova, paddr, size, prot, gfp);
 	arm_smmu_rpm_put(smmu);
 
 	return ret;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 4272fe4..a688f22 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -470,7 +470,7 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
 
 static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 			 phys_addr_t paddr, size_t size, int prot,
-			 int lvl, arm_v7s_iopte *ptep)
+			 int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte, *cptep;
@@ -491,7 +491,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 	/* Grab a pointer to the next level */
 	pte = READ_ONCE(*ptep);
 	if (!pte) {
-		cptep = __arm_v7s_alloc_table(lvl + 1, GFP_ATOMIC, data);
+		cptep = __arm_v7s_alloc_table(lvl + 1, gfp, data);
 		if (!cptep)
 			return -ENOMEM;
 
@@ -512,11 +512,11 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 	}
 
 	/* Rinse, repeat */
-	return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep);
+	return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
 }
 
 static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
-		       phys_addr_t paddr, size_t size, int prot)
+		       phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
 	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable *iop = &data->iop;
@@ -530,7 +530,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
 	    paddr >= (1ULL << data->iop.cfg.oas)))
 		return -ERANGE;
 
-	ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd);
+	ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
 	/*
 	 * Synchronise all PTE updates for the new mapping before there's
 	 * a chance for anything to kick off a table walk for the new iova.
@@ -922,12 +922,12 @@ static int __init arm_v7s_do_selftests(void)
 			if (ops->map(ops, iova, iova,
Re: [PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables
On Thu, Jun 11, 2020 at 3:29 PM Jordan Crouse wrote: > > Add support for using per-instance pagetables if all the dependencies are > available. > > Signed-off-by: Jordan Crouse > --- > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++- > drivers/gpu/drm/msm/msm_ringbuffer.h | 1 + > 2 files changed, 69 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > index a1589e040c57..5e82b85d4d55 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > @@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, > u32 counter, > OUT_RING(ring, upper_32_bits(iova)); > } > > +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer > *ring, > + struct msm_file_private *ctx) > +{ > + phys_addr_t ttbr; > + u32 asid; > + > + if (msm_iommu_pagetable_params(ctx->aspace->mmu, , )) > + return; > + > + OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1); > + OUT_RING(ring, 0); > + > + /* Turn on APIV mode to access critical regions */ > + OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1); > + OUT_RING(ring, 1); > + > + /* Make sure the ME is synchronized before staring the update */ > + OUT_PKT7(ring, CP_WAIT_FOR_ME, 0); > + > + /* Execute the table update */ > + OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4); > + OUT_RING(ring, lower_32_bits(ttbr)); > + OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr)); > + /* CONTEXTIDR is currently unused */ > + OUT_RING(ring, 0); > + /* CONTEXTBANK is currently unused */ > + OUT_RING(ring, 0); I can add this to xml (on userspace side, we've been describing packet payload in xml and using the generated builders), and update generated headers, if you agree to not add more open-coded pkt7 building ;-) > + > + /* > +* Write the new TTBR0 to the memstore. This is good for debugging. 
> +*/ > + OUT_PKT7(ring, CP_MEM_WRITE, 4); > + OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0))); > + OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0))); > + OUT_RING(ring, lower_32_bits(ttbr)); > + OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr)); > + > + /* Invalidate the draw state so we start off fresh */ > + OUT_PKT7(ring, CP_SET_DRAW_STATE, 3); > + OUT_RING(ring, 0x4); > + OUT_RING(ring, 1); > + OUT_RING(ring, 0); Ie, this would look like: OUT_PKT7(ring, CP_SET_DRAW_STATE, 3); OUT_RING(ring, CP_SET_DRAW_STATE__0_COUNT(0) | CP_SET_DRAW_STATE__0_DISABLE_ALL_GROUPS | CP_SET_DRAW_STATE__0_GROUP_ID(0)); OUT_RING(ring, CP_SET_DRAW_STATE__1_ADDR_LO(1)); OUT_RING(ring, CP_SET_DRAW_STATE__2_ADDR_HI(0)); .. but written that way, I think you meant ADDR_LO to be zero? (it is possible we need to regen headers for that to work, the kernel headers are somewhat out of date by now) BR, -R > + > + /* Turn off APRIV */ > + OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1); > + OUT_RING(ring, 0); > + > + /* Turn off protected mode */ > + OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1); > + OUT_RING(ring, 1); > +} > + > static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, > struct msm_file_private *ctx) > { > @@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct > msm_gem_submit *submit, > struct msm_ringbuffer *ring = submit->ring; > unsigned int i; > > + a6xx_set_pagetable(gpu, ring, ctx); > + > get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO, > rbmemptr_stats(ring, index, cpcycles_start)); > > @@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu) > return (unsigned long)busy_time; > } > > +struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu > *gpu) > +{ > + struct msm_mmu *mmu; > + > + mmu = msm_iommu_pagetable_create(gpu->aspace->mmu); > + if (IS_ERR(mmu)) > + return msm_gem_address_space_get(gpu->aspace); > + > + return msm_gem_address_space_create(mmu, > + "gpu", 0x1ULL, 0x1ULL); > +} > 
+ > static const struct adreno_gpu_funcs funcs = { > .base = { > .get_param = adreno_get_param, > @@ -893,8 +959,9 @@ static const struct adreno_gpu_funcs funcs = { > #if defined(CONFIG_DRM_MSM_GPU_STATE) > .gpu_state_get = a6xx_gpu_state_get, > .gpu_state_put = a6xx_gpu_state_put, > - .create_address_space = adreno_iommu_create_address_space, > #endif > + .create_address_space = adreno_iommu_create_address_space, > + .address_space_instance = a6xx_address_space_instance, > }, > .get_timestamp = a6xx_get_timestamp, > }; > diff --git
Re: [PATCH v2 1/3] docs: IOMMU user API
On Thu, 11 Jun 2020 14:40:47 -0600 Alex Williamson wrote: > On Thu, 11 Jun 2020 12:52:05 -0700 > Jacob Pan wrote: > > > Hi Alex, > > > > On Thu, 11 Jun 2020 09:47:41 -0600 > > Alex Williamson wrote: > > > > > On Wed, 10 Jun 2020 21:12:13 -0700 > > > Jacob Pan wrote: > > > > > > > IOMMU UAPI is newly introduced to support communications between > > > > guest virtual IOMMU and host IOMMU. There has been lots of > > > > discussions on how it should work with VFIO UAPI and userspace > > > > in general. > > > > > > > > This document is indended to clarify the UAPI design and usage. > > > > The mechenics of how future extensions should be achieved are > > > > also covered in this documentation. > > > > > > > > Signed-off-by: Liu Yi L > > > > Signed-off-by: Jacob Pan > > > > --- > > > > Documentation/userspace-api/iommu.rst | 210 > > > > ++ 1 file changed, 210 > > > > insertions(+) create mode 100644 > > > > Documentation/userspace-api/iommu.rst > > > > > > > > diff --git a/Documentation/userspace-api/iommu.rst > > > > b/Documentation/userspace-api/iommu.rst new file mode 100644 > > > > index ..e95dc5a04a41 > > > > --- /dev/null > > > > +++ b/Documentation/userspace-api/iommu.rst > > > > @@ -0,0 +1,210 @@ > > > > +.. SPDX-License-Identifier: GPL-2.0 > > > > +.. iommu: > > > > + > > > > += > > > > +IOMMU Userspace API > > > > += > > > > + > > > > +IOMMU UAPI is used for virtualization cases where > > > > communications are +needed between physical and virtual IOMMU > > > > drivers. For native +usage, IOMMU is a system device which does > > > > not need to communicate +with user space directly. > > > > + > > > > +The primary use cases are guest Shared Virtual Address (SVA) > > > > and +guest IO virtual address (IOVA), wherein virtual IOMMU > > > > (vIOMMU) is +required to communicate with the physical IOMMU in > > > > the host. + > > > > +.. contents:: :local: > > > > + > > > > +Functionalities > > > > + > > > > +Communications of user and kernel involve both directions. 
The > > > > +supported user-kernel APIs are as follows: > > > > + > > > > +1. Alloc/Free PASID > > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d) > > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU) > > > > +4. Invalidate IOMMU caches > > > > +5. Service page request > > > > + > > > > +Requirements > > > > + > > > > +The IOMMU UAPIs are generic and extensible to meet the > > > > following +requirements: > > > > + > > > > +1. Emulated and para-virtualised vIOMMUs > > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) > > > > +3. Extensions to the UAPI shall not break existing user space > > > > + > > > > +Interfaces > > > > + > > > > +Although the data structures defined in IOMMU UAPI are > > > > self-contained, +there is no user API functions introduced. > > > > Instead, IOMMU UAPI is +designed to work with existing user > > > > driver frameworks such as VFIO. + > > > > +Extension Rules & Precautions > > > > +- > > > > +When IOMMU UAPI gets extended, the data structures can *only* > > > > be +modified in two ways: > > > > + > > > > +1. Adding new fields by re-purposing the padding[] field. No > > > > size change. +2. Adding new union members at the end. May > > > > increase in size. + > > > > +No new fields can be added *after* the variable size union in > > > > that it +will break backward compatibility when offset moves. > > > > In both cases, a +new flag must be accompanied with a new field > > > > such that the IOMMU +driver can process the data based on the > > > > new flag. Version field is +only reserved for the unlikely > > > > event of UAPI upgrade at its entirety. + > > > > +It's *always* the caller's responsibility to indicate the size > > > > of the +structure passed by setting argsz appropriately. > > > > + > > > > +When IOMMU UAPI extension results in size increase, user such > > > > as VFIO +has to handle the following scenarios: > > > > + > > > > +1. User and kernel has exact size match > > > > +2. 
An older user with older kernel header (smaller UAPI size) > > > > running on a > > > > + newer kernel (larger UAPI size) > > > > +3. A newer user with newer kernel header (larger UAPI size) > > > > running > > > > + on a older kernel. > > > > +4. A malicious/misbehaving user pass illegal/invalid size but > > > > within > > > > + range. The data may contain garbage. > > > > + > > > > + > > > > +Feature Checking > > > > + > > > > +While launching a guest with vIOMMU, it is important to ensure > > > > that host +can support the UAPI data structures to be used for > > > > vIOMMU-pIOMMU +communications. Without the upfront compatibility > > > > checking, future +faults are difficult to report
Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data
On Thu, 11 Jun 2020 10:49:36 -0600 Alex Williamson wrote: > On Wed, 10 Jun 2020 21:12:14 -0700 > Jacob Pan wrote: > > > As IOMMU UAPI gets extended, user data size may increase. To support > > backward compatibiliy, this patch introduces a size field to each > > UAPI data structures. It is *always* the responsibility for the > > user to fill in the correct size. > > Though at the same time, argsz is user provided data which we don't > trust. The argsz field allows the user to indicate how much data > they're providing, it's still the kernel's responsibility to validate > whether it's correct and sufficient for the requested operation. > Thanks, > Yes, will add this clarification. Thanks, Jacob > Alex > > > Specific scenarios for user data handling are documented in: > > Documentation/userspace-api/iommu.rst > > > > Signed-off-by: Liu Yi L > > Signed-off-by: Jacob Pan > > --- > > include/uapi/linux/iommu.h | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > > index e907b7091a46..303f148a5cd7 100644 > > --- a/include/uapi/linux/iommu.h > > +++ b/include/uapi/linux/iommu.h > > @@ -135,6 +135,7 @@ enum iommu_page_response_code { > > > > /** > > * struct iommu_page_response - Generic page response information > > + * @argsz: User filled size of this data > > * @version: API version of this structure > > * @flags: encodes whether the corresponding fields are valid > > * (IOMMU_FAULT_PAGE_RESPONSE_* values) > > @@ -143,6 +144,7 @@ enum iommu_page_response_code { > > * @code: response code from iommu_page_response_code > > */ > > struct iommu_page_response { > > + __u32 argsz; > > #define IOMMU_PAGE_RESP_VERSION_1 1 > > __u32 version; > > #define IOMMU_PAGE_RESP_PASID_VALID(1 << 0) > > @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info { > > /** > > * struct iommu_cache_invalidate_info - First level/stage > > invalidation > > * information > > + * @argsz: User filled size of this data > > * @version: API 
version of this structure > > * @cache: bitfield that allows to select which caches to > > invalidate > > * @granularity: defines the lowest granularity used for the > > invalidation: @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info { > > * must support the used granularity. > > */ > > struct iommu_cache_invalidate_info { > > + __u32 argsz; > > #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1 > > __u32 version; > > /* IOMMU paging structure cache */ > > @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd { > > > > /** > > * struct iommu_gpasid_bind_data - Information about device and > > guest PASID binding > > + * @argsz: User filled size of this data > > * @version: Version of this data structure > > * @format:PASID table entry format > > * @flags: Additional information on guest bind request > > @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd { > > * PASID to host PASID based on this bind data. > > */ > > struct iommu_gpasid_bind_data { > > + __u32 argsz; > > #define IOMMU_GPASID_BIND_VERSION_11 > > __u32 version; > > #define IOMMU_PASID_FORMAT_INTEL_VTD 1 > [Jacob Pan] ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
On Thu, 11 Jun 2020 14:55:18 -0600 Alex Williamson wrote: > On Thu, 11 Jun 2020 13:02:24 -0700 > Jacob Pan wrote: > > > On Thu, 11 Jun 2020 11:08:16 -0600 > > Alex Williamson wrote: > > > > > On Wed, 10 Jun 2020 21:12:15 -0700 > > > Jacob Pan wrote: > > > > > > > IOMMU UAPI data has an argsz field which is filled by user. As > > > > the data structures expands, argsz may change. As the UAPI data > > > > are shared among different architectures, extensions of UAPI > > > > data could be a result of one architecture which has no impact > > > > on another. Therefore, these argsz santity checks are performed > > > > in the model specific IOMMU drivers. This patch adds sanity > > > > checks in the VT-d to ensure argsz passed by userspace matches > > > > feature flags and other contents. > > > > > > > > Signed-off-by: Jacob Pan > > > > --- > > > > drivers/iommu/intel-iommu.c | 16 > > > > drivers/iommu/intel-svm.c | 12 > > > > 2 files changed, 28 insertions(+) > > > > > > > > diff --git a/drivers/iommu/intel-iommu.c > > > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b > > > > 100644 --- a/drivers/iommu/intel-iommu.c > > > > +++ b/drivers/iommu/intel-iommu.c > > > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct > > > > iommu_domain *domain, struct device *dev, struct > > > > device_domain_info *info; struct intel_iommu *iommu; > > > > unsigned long flags; > > > > + unsigned long minsz; > > > > int cache_type; > > > > u8 bus, devfn; > > > > u16 did, sid; > > > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct > > > > iommu_domain *domain, struct device *dev, if > > > > (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) return > > > > -EINVAL; > > > > + minsz = offsetofend(struct iommu_cache_invalidate_info, > > > > padding); > > > > > > Would it still be better to look for the end of the last field > > > that's actually used to avoid the code churn and oversights > > > if/when the padding field does get used and renamed? 
> > > > > My thought was that if the padding gets partially re-purposed, the > > remaining padding would still be valid for minsz check. The > > extension rule ensures that there is no size change other the > > variable size union at the end. So use padding would avoid the > > churn, or i am totally wrong? > > No, it's trying to predict the future either way. I figured checking > minsz against the fields we actually consume allows complete use of > the padding fields and provides a little leniency to the user. We'd > need to be careful though that if those fields are later used by this > driver, the code would still need to accept the smaller size. If the > union was named rather than anonymous we could just use offsetof() to > avoid directly referencing the padding fields. > I will change it to named union. Thanks, > > > Per my comment on patch 1/, this also seems like where the device > > > specific IOMMU driver should also have the responsibility of > > > receiving a __user pointer to do the copy_from_user() here. vfio > > > can't know which flags require which fields to make a UAPI with > > > acceptable compatibility guarantees otherwise. > > > > > Right, VFIO cannot do compatibility guarantees, it is just seem to > > be that VFIO has enough information to copy_from_user sanely & > > safely and handle over to IOMMU. Please help define the > > roles/responsibilities in my other email. Then I will follow the > > guideline. > > We can keep that part of the discussion in the other thread. 
Thanks, > > Alex > > > > > + if (inv_info->argsz < minsz) > > > > + return -EINVAL; > > > > + > > > > + /* Sanity check user filled invalidation data sizes */ > > > > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR && > > > > + inv_info->argsz != offsetofend(struct > > > > iommu_cache_invalidate_info, > > > > + addr_info)) > > > > + return -EINVAL; > > > > + > > > > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID && > > > > + inv_info->argsz != offsetofend(struct > > > > iommu_cache_invalidate_info, > > > > + pasid_info)) > > > > + return -EINVAL; > > > > + > > > > spin_lock_irqsave(&device_domain_lock, flags); > > > > spin_lock(&iommu->lock); > > > > info = get_domain_info(dev); > > > > diff --git a/drivers/iommu/intel-svm.c > > > > b/drivers/iommu/intel-svm.c index 35b43fe819ed..64dc2c66dfff > > > > 100644 --- a/drivers/iommu/intel-svm.c > > > > +++ b/drivers/iommu/intel-svm.c > > > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct > > > > iommu_domain *domain, struct device *dev, struct dmar_domain > > > > *dmar_domain; struct intel_svm_dev *sdev; > > > > struct intel_svm *svm; > > > > + unsigned long minsz; > > > > int ret = 0; > >
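The minsz/argsz pattern discussed above can be sketched in userspace C. This is an illustrative reduced layout only: the struct fields and granularity values mirror the names in the patch, but the sizes and the `offsetofend()` macro definition here are our own stand-ins, and the union is named (as Jacob agreed to change it) so `offsetof()` can reach into it.

```c
#include <stddef.h>
#include <stdint.h>

/* Kernel-style helper: offset of the first byte past MEMBER */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

enum { INV_GRANU_ADDR = 1, INV_GRANU_PASID = 2 };

/* Hypothetical reduced layout of iommu_cache_invalidate_info */
struct cache_invalidate_info {
	uint32_t argsz;
	uint32_t granularity;
	uint32_t padding[2];
	union {
		uint64_t addr_info;   /* stand-in for struct iommu_inv_addr_info */
		uint64_t pasid_info;  /* stand-in for struct iommu_inv_pasid_info */
	} granu;                      /* named union so offsetof() works */
};

/*
 * Mirror the VT-d sanity check: argsz must cover at least the fixed
 * fields, and must match the variant selected by ->granularity.
 */
static int check_argsz(const struct cache_invalidate_info *info)
{
	size_t minsz = offsetofend(struct cache_invalidate_info, padding);

	if (info->argsz < minsz)
		return -1;

	if (info->granularity == INV_GRANU_ADDR &&
	    info->argsz != offsetofend(struct cache_invalidate_info,
				       granu.addr_info))
		return -1;

	if (info->granularity == INV_GRANU_PASID &&
	    info->argsz != offsetofend(struct cache_invalidate_info,
				       granu.pasid_info))
		return -1;

	return 0;
}
```

Note how the named union lets the check reference `granu.addr_info` without ever naming the padding fields, which is exactly the leniency Alex was after.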
[PATCH] iommu/arm-smmu: Add a init_context_bank implementation hook
Add a new implementation hook to allow the implementation specific code to tweak the context bank configuration just before it gets written. The first user will be the Adreno GPU implementation to turn on SCTLR.HUPCF to ensure that a page fault doesn't terminate pending transactions. Terminating them could hang the GPU if one of the terminated transactions is a CP read. This depends on the arm-smmu adreno SMMU implementation [1]. [1] https://patchwork.kernel.org/patch/11600943/ Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-qcom.c | 13 + drivers/iommu/arm-smmu.c | 28 +--- drivers/iommu/arm-smmu.h | 11 +++ 3 files changed, 37 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c index 6d0ab4865fc7..e5c6345da6fc 100644 --- a/drivers/iommu/arm-smmu-qcom.c +++ b/drivers/iommu/arm-smmu-qcom.c @@ -17,6 +17,18 @@ static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain) return of_device_is_compatible(smmu_domain->dev->of_node, "qcom,adreno"); } +static void qcom_adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_cb *cb) +{ + /* +* On the GPU device we want to process subsequent transactions after a +* fault to keep the GPU from hanging +*/ + + if (qcom_adreno_smmu_is_gpu_device(smmu_domain)) + cb->sctlr |= ARM_SMMU_SCTLR_HUPCF; +} + static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg *pgtbl_cfg) { @@ -92,6 +104,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = { .init_context = qcom_adreno_smmu_init_context, .def_domain_type = qcom_smmu_def_domain_type, .reset = qcom_smmu500_reset, + .init_context_bank = qcom_adreno_smmu_init_context_bank, }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index a06cbcaec247..f0f201ece3a0 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -86,13 +86,6 @@ struct arm_smmu_smr { bool valid; }; -struct arm_smmu_cb { - u64 ttbr[2]; - u32
tcr[2]; - u32 mair[2]; - struct arm_smmu_cfg *cfg; -}; - struct arm_smmu_master_cfg { struct arm_smmu_device *smmu; s16 smendx[]; @@ -579,6 +572,18 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32; } } + + cb->sctlr = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE | + ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M; + + if (stage1) + cb->sctlr |= ARM_SMMU_SCTLR_S1_ASIDPNE; + if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) + cb->sctlr |= ARM_SMMU_SCTLR_E; + + /* Give the implementation a chance to adjust the configuration */ + if (smmu_domain->smmu->impl && smmu_domain->smmu->impl->init_context_bank) + smmu_domain->smmu->impl->init_context_bank(smmu_domain, cb); } static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) @@ -657,14 +662,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) } /* SCTLR */ - reg = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE | - ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M; - if (stage1) - reg |= ARM_SMMU_SCTLR_S1_ASIDPNE; - if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) - reg |= ARM_SMMU_SCTLR_E; - - arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg); + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, cb->sctlr); } /* diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index 79d441024043..9b539820997b 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -142,6 +142,7 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_SCTLR 0x0 #define ARM_SMMU_SCTLR_S1_ASIDPNE BIT(12) +#define ARM_SMMU_SCTLR_HUPCF BIT(8) #define ARM_SMMU_SCTLR_CFCFG BIT(7) #define ARM_SMMU_SCTLR_CFIEBIT(6) #define ARM_SMMU_SCTLR_CFREBIT(5) @@ -349,6 +350,14 @@ struct arm_smmu_domain { boolaux; }; +struct arm_smmu_cb { + u64 ttbr[2]; + u32 tcr[2]; + u32 mair[2]; + u32 sctlr; + struct arm_smmu_cfg *cfg; +}; + static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg) { u32 tcr = 
FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) | @@ -403,6 +412,8 @@ struct arm_smmu_impl { void (*tlb_sync)(struct arm_smmu_device
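The hook mechanism this patch adds can be sketched in plain C: the core composes the default SCTLR value, then calls an optional implementation callback to adjust it before it is written out. The SCTLR bit positions below follow arm-smmu.h; the struct shapes are simplified stand-ins, not the driver's real types.

```c
#include <stdint.h>
#include <stddef.h>

/* SCTLR bits, positions per arm-smmu.h */
#define SCTLR_CFIE  (1u << 6)
#define SCTLR_CFRE  (1u << 5)
#define SCTLR_AFE   (1u << 2)
#define SCTLR_TRE   (1u << 1)
#define SCTLR_M     (1u << 0)
#define SCTLR_HUPCF (1u << 8)

struct cb { uint32_t sctlr; };

struct impl {
	void (*init_context_bank)(struct cb *cb); /* optional hook */
};

/* Core side: compute the default, then let the impl adjust it */
static void core_init_context_bank(const struct impl *impl, struct cb *cb)
{
	cb->sctlr = SCTLR_CFIE | SCTLR_CFRE | SCTLR_AFE | SCTLR_TRE | SCTLR_M;

	/* Give the implementation a chance to adjust the configuration */
	if (impl && impl->init_context_bank)
		impl->init_context_bank(cb);
}

/*
 * What the Adreno implementation does: set HUPCF so transactions keep
 * being processed after a fault instead of hanging the GPU on a CP read.
 */
static void adreno_init_context_bank(struct cb *cb)
{
	cb->sctlr |= SCTLR_HUPCF;
}
```

The design choice worth noting is that the SCTLR value is now staged in `struct arm_smmu_cb` rather than computed at write time, which is what makes a pre-write hook possible at all.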
[PATCH 3/6] iommu/arm-smmu: Add a domain attribute to pass the pagetable config
The Adreno GPU has the capacity to manage its own pagetables and switch them dynamically in hardware. Add a domain attribute for arm-smmu-v2 to get the default pagetable configuration so that the GPU driver can match the format for its own pagetables. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 12 include/linux/iommu.h| 1 + 2 files changed, 13 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 46a96c578592..a06cbcaec247 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1710,6 +1710,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_PGTABLE_CFG: { + struct io_pgtable *pgtable; + struct io_pgtable_cfg *dest = data; + + if (!smmu_domain->pgtbl_ops) + return -ENODEV; + + pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops); + + memcpy(dest, &pgtable->cfg, sizeof(*dest)); + return 0; + } default: return -ENODEV; } diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5f0b7859d2eb..2388117641f1 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -124,6 +124,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING,/* two stages of translation */ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, + DOMAIN_ATTR_PGTABLE_CFG, DOMAIN_ATTR_MAX, }; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
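The attribute above copies the pagetable configuration out by value rather than handing the GPU driver a pointer into arm-smmu state. A small userspace sketch of that copy-out semantic (the field set and function names here are illustrative, not the real `io_pgtable_cfg`):

```c
#include <string.h>

/* Illustrative subset of a pagetable configuration */
struct pgtable_cfg {
	unsigned long pgsize_bitmap;
	unsigned int ias, oas;
};

struct domain {
	struct pgtable_cfg cfg;
	int has_pgtbl_ops; /* stands in for smmu_domain->pgtbl_ops */
};

/* Mirror of the DOMAIN_ATTR_PGTABLE_CFG case: fail if the domain has
 * no pagetable yet, otherwise hand back a snapshot of the config. */
static int domain_get_pgtable_cfg(const struct domain *d,
				  struct pgtable_cfg *dest)
{
	if (!d->has_pgtbl_ops)
		return -1; /* -ENODEV in the real driver */
	memcpy(dest, &d->cfg, sizeof(*dest));
	return 0;
}
```

Because the caller gets a snapshot, the GPU driver can feed the same format parameters into its own io-pgtable allocation without any lifetime coupling to the SMMU domain.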
[PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables
Add support for using per-instance pagetables if all the dependencies are available. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++- drivers/gpu/drm/msm/msm_ringbuffer.h | 1 + 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index a1589e040c57..5e82b85d4d55 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter, OUT_RING(ring, upper_32_bits(iova)); } +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer *ring, + struct msm_file_private *ctx) +{ + phys_addr_t ttbr; + u32 asid; + + if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid)) + return; + + OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1); + OUT_RING(ring, 0); + + /* Turn on APRIV mode to access critical regions */ + OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1); + OUT_RING(ring, 1); + + /* Make sure the ME is synchronized before starting the update */ + OUT_PKT7(ring, CP_WAIT_FOR_ME, 0); + + /* Execute the table update */ + OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4); + OUT_RING(ring, lower_32_bits(ttbr)); + OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr)); + /* CONTEXTIDR is currently unused */ + OUT_RING(ring, 0); + /* CONTEXTBANK is currently unused */ + OUT_RING(ring, 0); + + /* +* Write the new TTBR0 to the memstore. This is good for debugging.
+*/ + OUT_PKT7(ring, CP_MEM_WRITE, 4); + OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0))); + OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0))); + OUT_RING(ring, lower_32_bits(ttbr)); + OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr)); + + /* Invalidate the draw state so we start off fresh */ + OUT_PKT7(ring, CP_SET_DRAW_STATE, 3); + OUT_RING(ring, 0x4); + OUT_RING(ring, 1); + OUT_RING(ring, 0); + + /* Turn off APRIV */ + OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1); + OUT_RING(ring, 0); + + /* Turn off protected mode */ + OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1); + OUT_RING(ring, 1); +} + static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, struct msm_file_private *ctx) { @@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit, struct msm_ringbuffer *ring = submit->ring; unsigned int i; + a6xx_set_pagetable(gpu, ring, ctx); + get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO, rbmemptr_stats(ring, index, cpcycles_start)); @@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu) return (unsigned long)busy_time; } +struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu *gpu) +{ + struct msm_mmu *mmu; + + mmu = msm_iommu_pagetable_create(gpu->aspace->mmu); + if (IS_ERR(mmu)) + return msm_gem_address_space_get(gpu->aspace); + + return msm_gem_address_space_create(mmu, + "gpu", 0x1ULL, 0x1ULL); +} + static const struct adreno_gpu_funcs funcs = { .base = { .get_param = adreno_get_param, @@ -893,8 +959,9 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DRM_MSM_GPU_STATE) .gpu_state_get = a6xx_gpu_state_get, .gpu_state_put = a6xx_gpu_state_put, - .create_address_space = adreno_iommu_create_address_space, #endif + .create_address_space = adreno_iommu_create_address_space, + .address_space_instance = a6xx_address_space_instance, }, .get_timestamp = a6xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
b/drivers/gpu/drm/msm/msm_ringbuffer.h index 7764373d0ed2..0987d6bf848c 100644 --- a/drivers/gpu/drm/msm/msm_ringbuffer.h +++ b/drivers/gpu/drm/msm/msm_ringbuffer.h @@ -31,6 +31,7 @@ struct msm_rbmemptrs { volatile uint32_t fence; volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT]; + volatile u64 ttbr0; }; struct msm_ringbuffer { -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
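The CP_SMMU_TABLE_UPDATE packet above encodes the new TTBR0 as a 64-bit value with the 16-bit ASID in bits [63:48], emitted as two 32-bit ring dwords. A sketch of that packing (the helper names are ours, and the 48-bit TTBR mask is an assumption based on the bit layout the ring writes imply):

```c
#include <stdint.h>

/* Combine a TTBR0 physical address (low 48 bits) with an ASID in
 * bits [63:48], as the CP_SMMU_TABLE_UPDATE payload does. */
static uint64_t pack_ttbr_asid(uint64_t ttbr, uint16_t asid)
{
	return ((uint64_t)asid << 48) | (ttbr & 0x0000ffffffffffffULL);
}

/* The two dwords actually written to the ring */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
```

The same packed value is also written to the memstore via CP_MEM_WRITE, which is what makes the currently-installed TTBR0 visible for debugging.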
[PATCH 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
Support auxiliary domains for arm-smmu-v2 to initialize and support multiple pagetables for a single SMMU context bank. Since the smmu-v2 hardware doesn't have any built-in support for switching the pagetable base it is left as an exercise to the caller to actually use the pagetable. Aux domains are supported if split pagetable (TTBR1) support has been enabled on the master domain. Each auxiliary domain will reuse the configuration of the master domain. By default a domain with TTBR1 support will have the TTBR0 region disabled so the first attached aux domain will enable the TTBR0 region in the hardware and conversely the last domain to be detached will disable TTBR0 translations. All subsequent auxiliary domains create a pagetable but do not touch the hardware. The leaf driver will be able to query the physical address of the pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the address with whatever means it has to switch the pagetable base. Following is a pseudo code example of how a domain can be created /* Check to see if aux domains are supported */ if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { domain = iommu_domain_alloc(...); if (iommu_aux_attach_device(domain, dev)) return FAIL; /* Save the base address of the pagetable for use by the driver */ iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase); } Then 'domain' can be used like any other iommu domain to map and unmap iova addresses in the pagetable. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 216 --- drivers/iommu/arm-smmu.h | 1 + 2 files changed, 201 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 743d75b9ff3f..46a96c578592 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -667,6 +667,84 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg); } +/* + * Update the context bank to enable TTBR0.
Assumes AARCH64 S1 + * configuration. + */ +static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb, + struct io_pgtable_cfg *pgtbl_cfg) +{ + u32 tcr = cb->tcr[0]; + + /* Add the TCR configuration from the new pagetable config */ + tcr |= arm_smmu_lpae_tcr(pgtbl_cfg); + + /* Make sure that both TTBR0 and TTBR1 are enabled */ + tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1); + + /* Update the TCR register */ + cb->tcr[0] = tcr; + + /* Program the new TTBR0 */ + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid); +} + +/* + * This function assumes that the current model only allows aux domains for + * AARCH64 S1 configurations + */ +static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain, + struct arm_smmu_device *smmu, struct arm_smmu_cfg *master) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_cfg pgtbl_cfg; + + mutex_lock(&smmu_domain->init_mutex); + + /* Copy the configuration from the master */ + memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg)); + + smmu_domain->flush_ops = &arm_smmu_s1_tlb_ops; + smmu_domain->smmu = smmu; + + pgtbl_cfg = (struct io_pgtable_cfg) { + .pgsize_bitmap = smmu->pgsize_bitmap, + .ias = smmu->va_size, + .oas = smmu->ipa_size, + .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK, + .tlb = smmu_domain->flush_ops, + .iommu_dev = smmu->dev, + .quirks = 0, + }; + + if (smmu_domain->non_strict) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + + pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &pgtbl_cfg, + smmu_domain); + if (!pgtbl_ops) { + mutex_unlock(&smmu_domain->init_mutex); + return -ENOMEM; + } + + domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; + + domain->geometry.aperture_end = (1UL << smmu->va_size) - 1; + domain->geometry.force_aperture = true; + + /* enable TTBR0 when the first aux domain is attached */ + if (atomic_inc_return(&smmu->cbs[master->cbndx].aux) == 1) {
arm_smmu_context_set_ttbr0(&smmu->cbs[master->cbndx], + &pgtbl_cfg); + arm_smmu_write_context_bank(smmu, master->cbndx); + } + + smmu_domain->pgtbl_ops = pgtbl_ops; + return 0; +} + static int arm_smmu_init_domain_context(struct iommu_domain *domain, struct arm_smmu_device *smmu) { @@ -870,36 +948,70 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, return ret; } +static void
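The first-attach behavior described in the commit message (only the first aux domain per context bank touches the hardware, by clearing TCR.EPD0/EPD1 so both TTBR walks are enabled) can be sketched as follows. The EPD bit positions follow ARM_SMMU_TCR_EPD0/EPD1 in arm-smmu.h; the struct and counters are simplified stand-ins for the per-cb atomic and register write.

```c
#include <stdint.h>

/* TCR "EPD" (disable translation walk) bits, positions per arm-smmu.h */
#define TCR_EPD0 (1u << 7)
#define TCR_EPD1 (1u << 23)

struct cb_state {
	uint32_t tcr;
	int aux;        /* stands in for the atomic_t per-cb attach count */
	int hw_writes;  /* counts arm_smmu_write_context_bank() calls */
};

/* Only the first aux attach reprograms the context bank */
static void aux_attach(struct cb_state *cb)
{
	if (++cb->aux == 1) {
		/* Make sure that both TTBR0 and TTBR1 are enabled */
		cb->tcr &= ~(TCR_EPD0 | TCR_EPD1);
		cb->hw_writes++;
	}
	/* later attaches only allocate a pagetable; no register writes */
}
```

This is why the commit message says subsequent aux domains "create a pagetable but do not touch the hardware": the hardware state is per context bank, while pagetables are per domain.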
[PATCH 4/6] drm/msm: Add support to create a local pagetable
Add support to create an io-pgtable for use by targets that support per-instance pagetables. In order to support per-instance pagetables the GPU SMMU device needs to have the qcom,adreno-smmu compatible string and split pagetables and auxiliary domains need to be supported and enabled. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/msm_gpummu.c | 2 +- drivers/gpu/drm/msm/msm_iommu.c | 180 ++- drivers/gpu/drm/msm/msm_mmu.h| 16 ++- 3 files changed, 195 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c index 310a31b05faa..aab121f4beb7 100644 --- a/drivers/gpu/drm/msm/msm_gpummu.c +++ b/drivers/gpu/drm/msm/msm_gpummu.c @@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu) } gpummu->gpu = gpu; - msm_mmu_init(&gpummu->base, dev, &funcs); + msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU); return &gpummu->base; } diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index bbe129867590..c7efe43388e3 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -4,15 +4,192 @@ * Author: Rob Clark */ +#include <linux/io-pgtable.h> #include "msm_drv.h" #include "msm_mmu.h" struct msm_iommu { struct msm_mmu base; struct iommu_domain *domain; + struct iommu_domain *aux_domain; }; + #define to_msm_iommu(x) container_of(x, struct msm_iommu, base) +struct msm_iommu_pagetable { + struct msm_mmu base; + struct msm_mmu *parent; + struct io_pgtable_ops *pgtbl_ops; + phys_addr_t ttbr; + u32 asid; +}; + +static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu) +{ + return container_of(mmu, struct msm_iommu_pagetable, base); +} + +static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, + size_t size) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + size_t unmapped = 0; + + /* Unmap the block one page at a time */ + while (size) { + unmapped += ops->unmap(ops, iova, 4096, NULL); + iova += 4096;
+ size -= 4096; + } + + iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain); + + return (unmapped == size) ? 0 : -EINVAL; +} + +static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova, + struct sg_table *sgt, size_t len, int prot) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + struct scatterlist *sg; + size_t mapped = 0; + u64 addr = iova; + unsigned int i; + + for_each_sg(sgt->sgl, sg, sgt->nents, i) { + size_t size = sg->length; + phys_addr_t phys = sg_phys(sg); + + /* Map the block one page at a time */ + while (size) { + if (ops->map(ops, addr, phys, 4096, prot)) { + msm_iommu_pagetable_unmap(mmu, iova, mapped); + return -EINVAL; + } + + phys += 4096; + addr += 4096; + size -= 4096; + mapped += 4096; + } + } + + return 0; +} + +static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + + free_io_pgtable_ops(pagetable->pgtbl_ops); + kfree(pagetable); +} + +/* + * Given a parent device, create and return an aux domain. 
This will enable the + * TTBR0 region + */ +static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent) +{ + struct msm_iommu *iommu = to_msm_iommu(parent); + struct iommu_domain *domain; + int ret; + + if (iommu->aux_domain) + return iommu->aux_domain; + + if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX)) + return ERR_PTR(-ENODEV); + + domain = iommu_domain_alloc(&platform_bus_type); + if (!domain) + return ERR_PTR(-ENODEV); + + ret = iommu_aux_attach_device(domain, parent->dev); + if (ret) { + iommu_domain_free(domain); + return ERR_PTR(ret); + } + + iommu->aux_domain = domain; + return domain; +} + +int msm_iommu_pagetable_params(struct msm_mmu *mmu, + phys_addr_t *ttbr, int *asid) +{ + struct msm_iommu_pagetable *pagetable; + + if (mmu->type != MSM_MMU_IOMMU_PAGETABLE) + return -EINVAL; + + pagetable = to_pagetable(mmu); + + if (ttbr) + *ttbr = pagetable->ttbr; + + if (asid) + *asid = pagetable->asid; + + return 0; +} + +static const struct msm_mmu_funcs pagetable_funcs = { + .map =
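The map path above walks each scatterlist entry and maps it one 4K page at a time, bailing out so the caller can unwind what was already mapped. A userspace sketch of that loop over (phys, length) segments, with the mapping callback injected so failure paths can be exercised (names and types here are ours, not the msm_iommu ones):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Stand-in for one scatterlist entry */
struct seg { uint64_t phys; size_t len; };

/* map_page: returns 0 on success, nonzero on failure */
typedef int (*map_page_fn)(uint64_t iova, uint64_t phys);

/* Map every segment page-by-page at consecutive iovas; on failure the
 * caller is expected to unmap [iova, iova + return-so-far). Returns the
 * total bytes mapped, or -1 on error. */
static long map_segments(const struct seg *segs, size_t nsegs,
			 uint64_t iova, map_page_fn map_page)
{
	uint64_t addr = iova;
	long mapped = 0;
	size_t i;

	for (i = 0; i < nsegs; i++) {
		uint64_t phys = segs[i].phys;
		size_t size = segs[i].len;

		/* Map the block one page at a time */
		while (size) {
			if (map_page(addr, phys))
				return -1;
			phys += PAGE_SZ;
			addr += PAGE_SZ;
			size -= PAGE_SZ;
			mapped += PAGE_SZ;
		}
	}
	return mapped;
}

/* Trivial mapper for demonstration: always succeeds */
static int map_ok(uint64_t iova, uint64_t phys)
{
	(void)iova; (void)phys;
	return 0;
}
```

Per-page mapping keeps the io-pgtable calls simple at the cost of many ops calls; the cover letter's "big hammer" comment about TLB flushing makes the same simplicity-first trade-off.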
[PATCH 5/6] drm/msm: Add support for address space instances
Add support for allocating an address space instance. Targets that support per-instance pagetables should implement their own function to allocate a new instance. The default will return the existing generic address space. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/msm_drv.c | 15 +-- drivers/gpu/drm/msm/msm_drv.h | 4 drivers/gpu/drm/msm/msm_gem_vma.c | 9 + drivers/gpu/drm/msm/msm_gpu.c | 17 + drivers/gpu/drm/msm/msm_gpu.h | 5 + 5 files changed, 44 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index f6ce40bf3699..0c219b954943 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -599,7 +599,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file) msm_submitqueue_init(dev, ctx); - ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL; + ctx->aspace = msm_gpu_address_space_instance(priv->gpu); file->driver_priv = ctx; return 0; @@ -618,6 +618,8 @@ static int msm_open(struct drm_device *dev, struct drm_file *file) static void context_close(struct msm_file_private *ctx) { msm_submitqueue_close(ctx); + + msm_gem_address_space_put(ctx->aspace); kfree(ctx); } @@ -782,18 +784,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data, } static int msm_ioctl_gem_info_iova(struct drm_device *dev, - struct drm_gem_object *obj, uint64_t *iova) + struct drm_file *file, struct drm_gem_object *obj, + uint64_t *iova) { - struct msm_drm_private *priv = dev->dev_private; + struct msm_file_private *ctx = file->driver_priv; - if (!priv->gpu) + if (!ctx->aspace) return -EINVAL; /* * Don't pin the memory here - just get an address so that userspace can * be productive */ - return msm_gem_get_iova(obj, priv->gpu->aspace, iova); + return msm_gem_get_iova(obj, ctx->aspace, iova); } static int msm_ioctl_gem_info(struct drm_device *dev, void *data, @@ -832,7 +835,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data, args->value = 
msm_gem_mmap_offset(obj); break; case MSM_INFO_GET_IOVA: - ret = msm_ioctl_gem_info_iova(dev, obj, &args->value); + ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value); break; case MSM_INFO_SET_NAME: /* length check should leave room for terminating null: */ diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index e2d6a6056418..983a8b7e5a74 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace, void msm_gem_close_vma(struct msm_gem_address_space *aspace, struct msm_gem_vma *vma); + +struct msm_gem_address_space * +msm_gem_address_space_get(struct msm_gem_address_space *aspace); + void msm_gem_address_space_put(struct msm_gem_address_space *aspace); struct msm_gem_address_space * diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c index 5f6a11211b64..29cc1305cf37 100644 --- a/drivers/gpu/drm/msm/msm_gem_vma.c +++ b/drivers/gpu/drm/msm/msm_gem_vma.c @@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace) kref_put(&aspace->kref, msm_gem_address_space_destroy); } +struct msm_gem_address_space * +msm_gem_address_space_get(struct msm_gem_address_space *aspace) +{ + if (!IS_ERR_OR_NULL(aspace)) + kref_get(&aspace->kref); + + return aspace; +} + /* Actually unmap memory for the vma */ void msm_gem_purge_vma(struct msm_gem_address_space *aspace, struct msm_gem_vma *vma) diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index a22d30622306..b4f31460 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -821,6 +821,23 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu) return 0; } +/* Return a new address space instance */ +struct msm_gem_address_space * +msm_gpu_address_space_instance(struct msm_gpu *gpu) +{ + if (!gpu) + return NULL; + + /* +* If the GPU doesn't support instanced address spaces return the +* default address space
+*/ + if (!gpu->funcs->address_space_instance) + return msm_gem_address_space_get(gpu->aspace); + + return gpu->funcs->address_space_instance(gpu); +} + int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs, const char *name, struct msm_gpu_config *config) diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index
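The lifetime rule this patch introduces (context_init() takes a reference on the per-file address space, context_close() drops it, and the get tolerates NULL or error pointers like msm_gem_address_space_get()) can be sketched with a minimal kref-style counter. The struct and flag here are illustrative stand-ins, not the msm types.

```c
#include <stddef.h>

/* Toy refcounted address space */
struct aspace {
	int refcount;
	int destroyed;
};

/* Like msm_gem_address_space_get(): NULL-tolerant, returns its argument */
static struct aspace *aspace_get(struct aspace *a)
{
	if (a) /* the real code uses IS_ERR_OR_NULL() */
		a->refcount++;
	return a;
}

/* Like msm_gem_address_space_put(): destroy on last reference */
static void aspace_put(struct aspace *a)
{
	if (a && --a->refcount == 0)
		a->destroyed = 1; /* msm_gem_address_space_destroy() */
}
```

The refcounting matters because with per-instance address spaces each drm_file may now hold its own aspace, so the default aspace must survive any single file closing while instanced ones get freed with their last user.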
[PATCH 0/6] iommu-arm-smmu: Add auxiliary domains and per-instance pagetables
This is a new refresh of support for auxiliary domains for arm-smmu-v2 and per-instance pagetables for drm/msm. The big change here from past efforts is that outside of creating a single aux-domain to enable TTBR0 all of the per-instance pagetables are created and managed exclusively in drm/msm without involving the arm-smmu driver. This fits in with the suggested model of letting the GPU hardware do what it needs and leave the arm-smmu driver blissfully unaware. Almost. In order to set up the io-pgtable properly in drm/msm we need to query the pagetable configuration from the current active domain and we need to rely on the iommu API to flush TLBs after an unmap. In the future we can optimize this in the drm/msm driver to track the state of the TLBs but for now the big hammer lets us get off the ground. This series is built on the split pagetable support [1]. [1] https://patchwork.kernel.org/patch/11600949/ Jordan Crouse (6): iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2 iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations iommu/arm-smmu: Add a domain attribute to pass the pagetable config drm/msm: Add support to create a local pagetable drm/msm: Add support for address space instances drm/msm/a6xx: Add support for per-instance pagetables drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 +++- drivers/gpu/drm/msm/msm_drv.c | 15 +- drivers/gpu/drm/msm/msm_drv.h | 4 + drivers/gpu/drm/msm/msm_gem_vma.c | 9 + drivers/gpu/drm/msm/msm_gpu.c | 17 ++ drivers/gpu/drm/msm/msm_gpu.h | 5 + drivers/gpu/drm/msm/msm_gpummu.c | 2 +- drivers/gpu/drm/msm/msm_iommu.c | 180 +++- drivers/gpu/drm/msm/msm_mmu.h | 16 +- drivers/gpu/drm/msm/msm_ringbuffer.h | 1 + drivers/iommu/arm-smmu.c | 228 -- drivers/iommu/arm-smmu.h | 1 + include/linux/io-pgtable.h| 11 +- include/linux/iommu.h | 1 + 14 files changed, 529 insertions(+), 30 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations
Allow an io-pgtable implementation to skip TLB operations by checking for NULL pointers in the helper functions. It will be up to the owner of the io-pgtable instance to make sure that they independently handle the TLB correctly. Signed-off-by: Jordan Crouse --- include/linux/io-pgtable.h | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 53d53c6c2be9..bbed1d3925ba 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -210,21 +210,24 @@ struct io_pgtable { static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop) { - iop->cfg.tlb->tlb_flush_all(iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_all(iop->cookie); } static inline void io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); } static inline void io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); } static inline void @@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop, struct iommu_iotlb_gather * gather, unsigned long iova, size_t granule) { - if (iop->cfg.tlb->tlb_add_page) + if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page) iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie); } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
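The NULL-guarded dispatch the patch adds is a general pattern: an optional ops table whose helpers degrade to no-ops when the owner opts out of TLB maintenance. A self-contained sketch (the struct shapes are simplified stand-ins for `io_pgtable`/`iommu_flush_ops`):

```c
#include <stddef.h>

/* Optional TLB maintenance ops, reduced to one callback */
struct tlb_ops {
	void (*flush_all)(void *cookie);
};

struct pgtable {
	const struct tlb_ops *tlb; /* NULL when the owner handles TLBs */
	void *cookie;
};

/* Helper mirrors io_pgtable_tlb_flush_all(): skip when ops are absent */
static void pgtable_tlb_flush_all(struct pgtable *p)
{
	if (p->tlb)
		p->tlb->flush_all(p->cookie);
}

/* Demonstration callback: count invocations via the cookie */
static void count_flush(void *cookie)
{
	(*(int *)cookie)++;
}
```

This is what lets drm/msm pass a NULL `tlb` in its io_pgtable_cfg and instead call `iommu_flush_tlb_all()` on the parent domain after unmap, as patch 4/6 does.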
[PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU
Add a special implementation for the SMMU attached to most Adreno GPU targets, triggered by the qcom,adreno-smmu compatible string. When selected the driver will attempt to enable split pagetables. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-impl.c | 3 +++ drivers/iommu/arm-smmu-qcom.c | 45 +-- drivers/iommu/arm-smmu.h | 1 + 3 files changed, 47 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index a20e426d81ac..309675cf6699 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); + if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu")) + return qcom_adreno_smmu_impl_init(smmu); + return smmu; } diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c index cf01d0215a39..6d0ab4865fc7 100644 --- a/drivers/iommu/arm-smmu-qcom.c +++ b/drivers/iommu/arm-smmu-qcom.c @@ -12,6 +12,29 @@ struct qcom_smmu { struct arm_smmu_device smmu; }; +static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain) +{ + return of_device_is_compatible(smmu_domain->dev->of_node, "qcom,adreno"); +} + +static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg) +{ + /* TTBR1 is only for the GPU stream ID and not the GMU */ + if (!qcom_adreno_smmu_is_gpu_device(smmu_domain)) + return 0; + /* +* All targets that use the qcom,adreno-smmu compatible string *should* +* be AARCH64 stage 1 but double check because the arm-smmu code assumes +* that is the case when the TTBR1 quirk is enabled +*/ + if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) && + (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)) + pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1; + + return 0; +} + static const struct of_device_id qcom_smmu_client_of_match[] =
{ { .compatible = "qcom,adreno" }, { .compatible = "qcom,mdp4" }, @@ -65,7 +88,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = { .reset = qcom_smmu500_reset, }; -struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) +static const struct arm_smmu_impl qcom_adreno_smmu_impl = { + .init_context = qcom_adreno_smmu_init_context, + .def_domain_type = qcom_smmu_def_domain_type, + .reset = qcom_smmu500_reset, +}; + + +static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, + const struct arm_smmu_impl *impl) { struct qcom_smmu *qsmmu; @@ -75,8 +106,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) qsmmu->smmu = *smmu; - qsmmu->smmu.impl = &qcom_smmu_impl; + qsmmu->smmu.impl = impl; devm_kfree(smmu->dev, smmu); return &qsmmu->smmu; } + +struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) +{ + return qcom_smmu_create(smmu, &qcom_smmu_impl); +} + +struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu) +{ + return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl); +} diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d33cfe26b2f5..c417814f1d98 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -466,6 +466,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu); struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu); +struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu); int arm_mmu500_reset(struct arm_smmu_device *smmu); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 3/7] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
Every Qcom Adreno GPU has an embedded SMMU for its own use. These devices depend on unique features such as split pagetables, different stall/halt requirements and other settings. Identify them with a compatible string so that they can be identified in the arm-smmu implementation specific code. Signed-off-by: Jordan Crouse --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423..e52a1b146c97 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,10 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: Qcom Adreno GPUs implementing "arm,smmu-v2" +items: + - const: qcom,adreno-smmu + - const: qcom,smmu-v2 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 2/7] iommu/arm-smmu: Add support for split pagetables
Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected by the io-pgtable configuration. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 21 - drivers/iommu/arm-smmu.h | 25 +++-- 2 files changed, 35 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 8a3a6c8c887a..048de2681670 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr; cb->ttbr[1] = 0; } else { - cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; - cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, - cfg->asid); + cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, + cfg->asid); cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, -cfg->asid); + cfg->asid); + + if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + else + cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr; } } else { cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; @@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, /* Update the domain's page sizes to reflect the page table format */ domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; - domain->geometry.aperture_end = (1UL << ias) - 1; + + if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { + domain->geometry.aperture_start = ~0UL << ias; + domain->geometry.aperture_end = ~0UL; + } else { + domain->geometry.aperture_end = (1UL << ias) - 1; + } + domain->geometry.force_aperture = true; /* Initialise the context bank with our page table cfg */ diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index 38b041530a4f..5f2de20e883b 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -168,10 +168,12 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_TCR0x30 #define ARM_SMMU_TCR_EAE BIT(31) #define ARM_SMMU_TCR_EPD1 BIT(23) +#define ARM_SMMU_TCR_A1BIT(22) #define 
ARM_SMMU_TCR_TG0 GENMASK(15, 14) #define ARM_SMMU_TCR_SH0 GENMASK(13, 12) #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10) #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8) +#define ARM_SMMU_TCR_EPD0 BIT(7) #define ARM_SMMU_TCR_T0SZ GENMASK(5, 0) #define ARM_SMMU_VTCR_RES1 BIT(31) @@ -347,12 +349,23 @@ struct arm_smmu_domain { static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg) { - return ARM_SMMU_TCR_EPD1 | - FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) | - FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) | - FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) | - FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) | - FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz); + u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) | + FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) | + FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) | + FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) | + FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz); + + /* + * When TTBR1 is selected shift the TCR fields by 16 bits and disable + * translation in TTBR0 + */ + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { + tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1; + tcr |= ARM_SMMU_TCR_EPD0; + } else + tcr |= ARM_SMMU_TCR_EPD1; + + return tcr; } static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 1/7] iommu/arm-smmu: Pass io-pgtable config to implementation specific function
Construct the io-pgtable config before calling the implementation specific init_context function and pass it so the implementation specific function can get a chance to change it before the io-pgtable is created. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-impl.c | 3 ++- drivers/iommu/arm-smmu.c | 11 ++- drivers/iommu/arm-smmu.h | 3 ++- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b70..a20e426d81ac 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) return 0; } -static int cavium_init_context(struct arm_smmu_domain *smmu_domain) +static int cavium_init_context(struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg) { struct cavium_smmu *cs = container_of(smmu_domain->smmu, struct cavium_smmu, smmu); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705..8a3a6c8c887a 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, cfg->asid = cfg->cbndx; smmu_domain->smmu = smmu; - if (smmu->impl && smmu->impl->init_context) { - ret = smmu->impl->init_context(smmu_domain); - if (ret) - goto out_unlock; - } pgtbl_cfg = (struct io_pgtable_cfg) { .pgsize_bitmap = smmu->pgsize_bitmap, @@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; + if (smmu->impl && smmu->impl->init_context) { + ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg); + if (ret) + goto out_unlock; + } + if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d172c024be61..38b041530a4f 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -383,7 +383,8 @@ struct arm_smmu_impl { u64 val); 
int (*cfg_probe)(struct arm_smmu_device *smmu); int (*reset)(struct arm_smmu_device *smmu); - int (*init_context)(struct arm_smmu_domain *smmu_domain); + int (*init_context)(struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *cfg); void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync, int status); int (*def_domain_type)(struct device *dev); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 7/7] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU
Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable split pagetables. Signed-off-by: Jordan Crouse --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi index 8eb5a31346d2..8b15cd74e9ba 100644 --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -3556,7 +3556,7 @@ }; adreno_smmu: iommu@504 { - compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2"; + compatible = "qcom,adreno-smmu", "qcom,smmu-v2"; reg = <0 0x504 0 0x1>; #iommu-cells = <1>; #global-interrupts = <2>; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
Add a link to the pointer to the struct device that is attached to a domain. This makes it easy to get the pointer if it is needed in the implementation specific code. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 1 + drivers/iommu/arm-smmu.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 048de2681670..743d75b9ff3f 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -801,6 +801,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, cfg->asid = cfg->cbndx; smmu_domain->smmu = smmu; + smmu_domain->dev = dev; pgtbl_cfg = (struct io_pgtable_cfg) { .pgsize_bitmap = smmu->pgsize_bitmap, diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index 5f2de20e883b..d33cfe26b2f5 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -345,6 +345,7 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain; + struct device *dev; /* Device attached to this domain */ }; static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 0/7] iommu/arm-smmu: Enable split pagetable support
Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU SMMU. After email discussions [1] we opted to make an arm-smmu implementation specifically for the Adreno GPU and use that to enable split pagetable support and later other implementation specific bits that we need. On the hardware side this is very close to the same code from before [2]; only the TTBR1 quirk is turned on by the implementation and not a domain attribute. In drm/msm we use the returned size of the aperture as a clue to let us know which virtual address space we should use for global memory objects. There are two open items that you should be aware of. First, in the implementation specific code we have to check the compatible string of the device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4). I went back and forth trying to decide if I wanted to use the compatible string or the SID as the filter and settled on the compatible string, but I could be talked out of it. The other open item is that in drm/msm the hardware only uses 49 bits of the address space but arm-smmu expects the address to be sign extended all the way to 64 bits. This isn't a problem normally unless you look at the hardware registers that contain an IOVA, in which case the upper bits will be zero. I opted to restrict the internal drm/msm IOVA range to only 49 bits and then sign extend right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought that matching the hardware would be less confusing when debugging a hang. 
v8: Pass the attached device in the smmu_domain to the implementation specific functions [1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html [2] https://patchwork.kernel.org/patch/11482591/ Jordan Crouse (7): iommu/arm-smmu: Pass io-pgtable config to implementation specific function iommu/arm-smmu: Add support for split pagetables dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU iommu/arm-smmu: Add a pointer to the attached device to smmu_domain iommu/arm-smmu: Add implementation for the adreno GPU SMMU drm/msm: Set the global virtual address range from the IOMMU domain arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU .../devicetree/bindings/iommu/arm,smmu.yaml | 4 ++ arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +- drivers/gpu/drm/msm/msm_iommu.c | 7 +++ drivers/iommu/arm-smmu-impl.c | 6 ++- drivers/iommu/arm-smmu-qcom.c | 45 ++- drivers/iommu/arm-smmu.c | 33 +- drivers/iommu/arm-smmu.h | 30 ++--- 8 files changed, 117 insertions(+), 23 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 6/7] drm/msm: Set the global virtual address range from the IOMMU domain
Use the aperture settings from the IOMMU domain to set up the virtual address range for the GPU. This allows us to transparently deal with IOMMU side features (like split pagetables). Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++-- drivers/gpu/drm/msm/msm_iommu.c | 7 +++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 89673c7ed473..3e717c1ebb7f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu, struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type); struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu); struct msm_gem_address_space *aspace; + u64 start, size; - aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M, - 0xffffffff - SZ_16M); + /* +* Use the aperture start or SZ_16M, whichever is greater. This will +* ensure that we align with the allocated pagetable range while still +* allowing room in the lower 32 bits for GMEM and whatnot +*/ + start = max_t(u64, SZ_16M, iommu->geometry.aperture_start); + size = iommu->geometry.aperture_end - start + 1; + + aspace = msm_gem_address_space_create(mmu, "gpu", + start & GENMASK(48, 0), size); if (IS_ERR(aspace) && !IS_ERR(mmu)) mmu->funcs->destroy(mmu); diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..bbe129867590 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova, struct msm_iommu *iommu = to_msm_iommu(mmu); size_t ret; + /* The arm-smmu driver expects the addresses to be sign extended */ + if (iova & BIT(48)) + iova |= GENMASK(63, 49); + ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot); WARN_ON(!ret); @@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len) { 
struct msm_iommu *iommu = to_msm_iommu(mmu); + if (iova & BIT(48)) + iova |= GENMASK(63, 49); + iommu_unmap(iommu->domain, iova, len); return 0; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
On Thu, 11 Jun 2020 13:02:24 -0700 Jacob Pan wrote: > On Thu, 11 Jun 2020 11:08:16 -0600 > Alex Williamson wrote: > > > On Wed, 10 Jun 2020 21:12:15 -0700 > > Jacob Pan wrote: > > > > > IOMMU UAPI data has an argsz field which is filled by user. As the > > > data structures expands, argsz may change. As the UAPI data are > > > shared among different architectures, extensions of UAPI data could > > > be a result of one architecture which has no impact on another. > > > Therefore, these argsz santity checks are performed in the model > > > specific IOMMU drivers. This patch adds sanity checks in the VT-d > > > to ensure argsz passed by userspace matches feature flags and other > > > contents. > > > > > > Signed-off-by: Jacob Pan > > > --- > > > drivers/iommu/intel-iommu.c | 16 > > > drivers/iommu/intel-svm.c | 12 > > > 2 files changed, 28 insertions(+) > > > > > > diff --git a/drivers/iommu/intel-iommu.c > > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b > > > 100644 --- a/drivers/iommu/intel-iommu.c > > > +++ b/drivers/iommu/intel-iommu.c > > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct > > > iommu_domain *domain, struct device *dev, struct device_domain_info > > > *info; struct intel_iommu *iommu; > > > unsigned long flags; > > > + unsigned long minsz; > > > int cache_type; > > > u8 bus, devfn; > > > u16 did, sid; > > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct > > > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags > > > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL; > > > > > > + minsz = offsetofend(struct iommu_cache_invalidate_info, > > > padding); > > > > Would it still be better to look for the end of the last field that's > > actually used to avoid the code churn and oversights if/when the > > padding field does get used and renamed? > > > My thought was that if the padding gets partially re-purposed, the > remaining padding would still be valid for minsz check. 
The extension > rule ensures that there is no size change other the variable size union > at the end. So use padding would avoid the churn, or i am totally wrong? No, it's trying to predict the future either way. I figured checking minsz against the fields we actually consume allows complete use of the padding fields and provides a little leniency to the user. We'd need to be careful though that if those fields are later used by this driver, the code would still need to accept the smaller size. If the union was named rather than anonymous we could just use offsetof() to avoid directly referencing the padding fields. > > Per my comment on patch 1/, this also seems like where the device > > specific IOMMU driver should also have the responsibility of receiving > > a __user pointer to do the copy_from_user() here. vfio can't know > > which flags require which fields to make a UAPI with acceptable > > compatibility guarantees otherwise. > > > Right, VFIO cannot do compatibility guarantees, it is just seem to be > that VFIO has enough information to copy_from_user sanely & safely and > handle over to IOMMU. Please help define the roles/responsibilities in > my other email. Then I will follow the guideline. We can keep that part of the discussion in the other thread. 
Thanks, Alex > > > + if (inv_info->argsz < minsz) > > > + return -EINVAL; > > > + > > > + /* Sanity check user filled invalidation dat sizes */ > > > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR && > > > + inv_info->argsz != offsetofend(struct > > > iommu_cache_invalidate_info, > > > + addr_info)) > > > + return -EINVAL; > > > + > > > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID && > > > + inv_info->argsz != offsetofend(struct > > > iommu_cache_invalidate_info, > > > + pasid_info)) > > > + return -EINVAL; > > > + > > > spin_lock_irqsave(_domain_lock, flags); > > > spin_lock(>lock); > > > info = get_domain_info(dev); > > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c > > > index 35b43fe819ed..64dc2c66dfff 100644 > > > --- a/drivers/iommu/intel-svm.c > > > +++ b/drivers/iommu/intel-svm.c > > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain > > > *domain, struct device *dev, struct dmar_domain *dmar_domain; > > > struct intel_svm_dev *sdev; > > > struct intel_svm *svm; > > > + unsigned long minsz; > > > int ret = 0; > > > > > > if (WARN_ON(!iommu) || !data) > > > return -EINVAL; > > > > > > + /* > > > + * We mandate that no size change in IOMMU UAPI data > > > before the > > > + * variable size union at the end. > > > + */ > > > + minsz = offsetofend(struct iommu_gpasid_bind_data, > > > padding); > > > > Same. Thanks, > > > > Alex > > > > > + if (data->argsz < minsz) > > > + return -EINVAL; > > > + > > > if (data->version !=
Re: [PATCH v2 1/3] docs: IOMMU user API
On Thu, 11 Jun 2020 12:52:05 -0700 Jacob Pan wrote: > Hi Alex, > > On Thu, 11 Jun 2020 09:47:41 -0600 > Alex Williamson wrote: > > > On Wed, 10 Jun 2020 21:12:13 -0700 > > Jacob Pan wrote: > > > > > IOMMU UAPI is newly introduced to support communications between > > > guest virtual IOMMU and host IOMMU. There has been lots of > > > discussions on how it should work with VFIO UAPI and userspace in > > > general. > > > > > > This document is indended to clarify the UAPI design and usage. The > > > mechenics of how future extensions should be achieved are also > > > covered in this documentation. > > > > > > Signed-off-by: Liu Yi L > > > Signed-off-by: Jacob Pan > > > --- > > > Documentation/userspace-api/iommu.rst | 210 > > > ++ 1 file changed, 210 insertions(+) > > > create mode 100644 Documentation/userspace-api/iommu.rst > > > > > > diff --git a/Documentation/userspace-api/iommu.rst > > > b/Documentation/userspace-api/iommu.rst new file mode 100644 > > > index ..e95dc5a04a41 > > > --- /dev/null > > > +++ b/Documentation/userspace-api/iommu.rst > > > @@ -0,0 +1,210 @@ > > > +.. SPDX-License-Identifier: GPL-2.0 > > > +.. iommu: > > > + > > > += > > > +IOMMU Userspace API > > > += > > > + > > > +IOMMU UAPI is used for virtualization cases where communications > > > are +needed between physical and virtual IOMMU drivers. For native > > > +usage, IOMMU is a system device which does not need to communicate > > > +with user space directly. > > > + > > > +The primary use cases are guest Shared Virtual Address (SVA) and > > > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is > > > +required to communicate with the physical IOMMU in the host. > > > + > > > +.. contents:: :local: > > > + > > > +Functionalities > > > + > > > +Communications of user and kernel involve both directions. The > > > +supported user-kernel APIs are as follows: > > > + > > > +1. Alloc/Free PASID > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d) > > > +3. 
Bind/unbind guest PASID table (e.g. ARM sMMU) > > > +4. Invalidate IOMMU caches > > > +5. Service page request > > > + > > > +Requirements > > > + > > > +The IOMMU UAPIs are generic and extensible to meet the following > > > +requirements: > > > + > > > +1. Emulated and para-virtualised vIOMMUs > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) > > > +3. Extensions to the UAPI shall not break existing user space > > > + > > > +Interfaces > > > + > > > +Although the data structures defined in IOMMU UAPI are > > > self-contained, +there is no user API functions introduced. > > > Instead, IOMMU UAPI is +designed to work with existing user driver > > > frameworks such as VFIO. + > > > +Extension Rules & Precautions > > > +- > > > +When IOMMU UAPI gets extended, the data structures can *only* be > > > +modified in two ways: > > > + > > > +1. Adding new fields by re-purposing the padding[] field. No size > > > change. +2. Adding new union members at the end. May increase in > > > size. + > > > +No new fields can be added *after* the variable size union in that > > > it +will break backward compatibility when offset moves. In both > > > cases, a +new flag must be accompanied with a new field such that > > > the IOMMU +driver can process the data based on the new flag. > > > Version field is +only reserved for the unlikely event of UAPI > > > upgrade at its entirety. + > > > +It's *always* the caller's responsibility to indicate the size of > > > the +structure passed by setting argsz appropriately. > > > + > > > +When IOMMU UAPI extension results in size increase, user such as > > > VFIO +has to handle the following scenarios: > > > + > > > +1. User and kernel has exact size match > > > +2. An older user with older kernel header (smaller UAPI size) > > > running on a > > > + newer kernel (larger UAPI size) > > > +3. A newer user with newer kernel header (larger UAPI size) running > > > + on a older kernel. > > > +4. 
A malicious/misbehaving user pass illegal/invalid size but > > > within > > > + range. The data may contain garbage. > > > + > > > + > > > +Feature Checking > > > + > > > +While launching a guest with vIOMMU, it is important to ensure > > > that host +can support the UAPI data structures to be used for > > > vIOMMU-pIOMMU +communications. Without the upfront compatibility > > > checking, future +faults are difficult to report even in normal > > > conditions. For example, +TLB invalidations should always succeed > > > from vIOMMU's +perspective. There is no architectural way to report > > > back to the vIOMMU +if the UAPI data is incompatible. For this > > > reason the following IOMMU +UAPIs cannot fail: > > > + > > > +1. Free
Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
On Thu, 11 Jun 2020 11:08:16 -0600 Alex Williamson wrote: > On Wed, 10 Jun 2020 21:12:15 -0700 > Jacob Pan wrote: > > > IOMMU UAPI data has an argsz field which is filled by user. As the > > data structures expands, argsz may change. As the UAPI data are > > shared among different architectures, extensions of UAPI data could > > be a result of one architecture which has no impact on another. > > Therefore, these argsz santity checks are performed in the model > > specific IOMMU drivers. This patch adds sanity checks in the VT-d > > to ensure argsz passed by userspace matches feature flags and other > > contents. > > > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/intel-iommu.c | 16 > > drivers/iommu/intel-svm.c | 12 > > 2 files changed, 28 insertions(+) > > > > diff --git a/drivers/iommu/intel-iommu.c > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b > > 100644 --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct > > iommu_domain *domain, struct device *dev, struct device_domain_info > > *info; struct intel_iommu *iommu; > > unsigned long flags; > > + unsigned long minsz; > > int cache_type; > > u8 bus, devfn; > > u16 did, sid; > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct > > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags > > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL; > > > > + minsz = offsetofend(struct iommu_cache_invalidate_info, > > padding); > > Would it still be better to look for the end of the last field that's > actually used to avoid the code churn and oversights if/when the > padding field does get used and renamed? > My thought was that if the padding gets partially re-purposed, the remaining padding would still be valid for minsz check. The extension rule ensures that there is no size change other the variable size union at the end. So use padding would avoid the churn, or i am totally wrong? 
> Per my comment on patch 1/, this also seems like where the device > specific IOMMU driver should also have the responsibility of receiving > a __user pointer to do the copy_from_user() here. vfio can't know > which flags require which fields to make a UAPI with acceptable > compatibility guarantees otherwise. > Right, VFIO cannot do compatibility guarantees, it is just seem to be that VFIO has enough information to copy_from_user sanely & safely and handle over to IOMMU. Please help define the roles/responsibilities in my other email. Then I will follow the guideline. > > + if (inv_info->argsz < minsz) > > + return -EINVAL; > > + > > + /* Sanity check user filled invalidation dat sizes */ > > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR && > > + inv_info->argsz != offsetofend(struct > > iommu_cache_invalidate_info, > > + addr_info)) > > + return -EINVAL; > > + > > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID && > > + inv_info->argsz != offsetofend(struct > > iommu_cache_invalidate_info, > > + pasid_info)) > > + return -EINVAL; > > + > > spin_lock_irqsave(_domain_lock, flags); > > spin_lock(>lock); > > info = get_domain_info(dev); > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c > > index 35b43fe819ed..64dc2c66dfff 100644 > > --- a/drivers/iommu/intel-svm.c > > +++ b/drivers/iommu/intel-svm.c > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain > > *domain, struct device *dev, struct dmar_domain *dmar_domain; > > struct intel_svm_dev *sdev; > > struct intel_svm *svm; > > + unsigned long minsz; > > int ret = 0; > > > > if (WARN_ON(!iommu) || !data) > > return -EINVAL; > > > > + /* > > +* We mandate that no size change in IOMMU UAPI data > > before the > > +* variable size union at the end. > > +*/ > > + minsz = offsetofend(struct iommu_gpasid_bind_data, > > padding); > > Same. 
Thanks, > > Alex > > > + if (data->argsz < minsz) > > + return -EINVAL; > > + > > if (data->version != IOMMU_GPASID_BIND_VERSION_1 || > > data->format != IOMMU_PASID_FORMAT_INTEL_VTD) > > return -EINVAL; > > > > + if (data->argsz != offsetofend(struct > > iommu_gpasid_bind_data, vtd)) > > + return -EINVAL; > > + > > if (!dev_is_pci(dev)) > > return -ENOTSUPP; > > > [Jacob Pan] ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/3] docs: IOMMU user API
Hi Alex, On Thu, 11 Jun 2020 09:47:41 -0600 Alex Williamson wrote: > On Wed, 10 Jun 2020 21:12:13 -0700 > Jacob Pan wrote: > > > IOMMU UAPI is newly introduced to support communications between > > guest virtual IOMMU and host IOMMU. There has been lots of > > discussions on how it should work with VFIO UAPI and userspace in > > general. > > > > This document is indended to clarify the UAPI design and usage. The > > mechenics of how future extensions should be achieved are also > > covered in this documentation. > > > > Signed-off-by: Liu Yi L > > Signed-off-by: Jacob Pan > > --- > > Documentation/userspace-api/iommu.rst | 210 > > ++ 1 file changed, 210 insertions(+) > > create mode 100644 Documentation/userspace-api/iommu.rst > > > > diff --git a/Documentation/userspace-api/iommu.rst > > b/Documentation/userspace-api/iommu.rst new file mode 100644 > > index ..e95dc5a04a41 > > --- /dev/null > > +++ b/Documentation/userspace-api/iommu.rst > > @@ -0,0 +1,210 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > +.. iommu: > > + > > += > > +IOMMU Userspace API > > += > > + > > +IOMMU UAPI is used for virtualization cases where communications > > are +needed between physical and virtual IOMMU drivers. For native > > +usage, IOMMU is a system device which does not need to communicate > > +with user space directly. > > + > > +The primary use cases are guest Shared Virtual Address (SVA) and > > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is > > +required to communicate with the physical IOMMU in the host. > > + > > +.. contents:: :local: > > + > > +Functionalities > > + > > +Communications of user and kernel involve both directions. The > > +supported user-kernel APIs are as follows: > > + > > +1. Alloc/Free PASID > > +2. Bind/unbind guest PASID (e.g. Intel VT-d) > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU) > > +4. Invalidate IOMMU caches > > +5. 
Service page request > > + > > +Requirements > > + > > +The IOMMU UAPIs are generic and extensible to meet the following > > +requirements: > > + > > +1. Emulated and para-virtualised vIOMMUs > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) > > +3. Extensions to the UAPI shall not break existing user space > > + > > +Interfaces > > + > > +Although the data structures defined in IOMMU UAPI are > > self-contained, +there is no user API functions introduced. > > Instead, IOMMU UAPI is +designed to work with existing user driver > > frameworks such as VFIO. + > > +Extension Rules & Precautions > > +- > > +When IOMMU UAPI gets extended, the data structures can *only* be > > +modified in two ways: > > + > > +1. Adding new fields by re-purposing the padding[] field. No size > > change. +2. Adding new union members at the end. May increase in > > size. + > > +No new fields can be added *after* the variable size union in that > > it +will break backward compatibility when offset moves. In both > > cases, a +new flag must be accompanied with a new field such that > > the IOMMU +driver can process the data based on the new flag. > > Version field is +only reserved for the unlikely event of UAPI > > upgrade at its entirety. + > > +It's *always* the caller's responsibility to indicate the size of > > the +structure passed by setting argsz appropriately. > > + > > +When IOMMU UAPI extension results in size increase, user such as > > VFIO +has to handle the following scenarios: > > + > > +1. User and kernel has exact size match > > +2. An older user with older kernel header (smaller UAPI size) > > running on a > > + newer kernel (larger UAPI size) > > +3. A newer user with newer kernel header (larger UAPI size) running > > + on a older kernel. > > +4. A malicious/misbehaving user pass illegal/invalid size but > > within > > + range. The data may contain garbage. 
> > + > > + > > +Feature Checking > > + > > +While launching a guest with vIOMMU, it is important to ensure > > that host +can support the UAPI data structures to be used for > > vIOMMU-pIOMMU +communications. Without the upfront compatibility > > checking, future +faults are difficult to report even in normal > > conditions. For example, +TLB invalidations should always succeed > > from vIOMMU's +perspective. There is no architectural way to report > > back to the vIOMMU +if the UAPI data is incompatible. For this > > reason the following IOMMU +UAPIs cannot fail: > > + > > +1. Free PASID > > +2. Unbind guest PASID > > +3. Unbind guest PASID table (SMMU) > > +4. Cache invalidate > > +5. Page response > > + > > +User applications such as QEMU is expected to import kernel UAPI > > +headers. Only backward compatibility is supported. For example, an > > +older QEMU (with
Re: [PATCH v2 02/15] iommu: Report domain nesting info
On Thu, 11 Jun 2020 05:15:21 -0700 Liu Yi L wrote: > IOMMUs that support nesting translation needs report the capability info > to userspace, e.g. the format of first level/stage paging structures. > > Cc: Kevin Tian > CC: Jacob Pan > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan > --- > @Jean, Eric: as nesting was introduced for ARM, but looks like no actual > user of it. right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING > to retrieve nesting info? how about your opinions? > > include/linux/iommu.h | 1 + > include/uapi/linux/iommu.h | 34 ++ > 2 files changed, 35 insertions(+) > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 78a26ae..f6e4b49 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -126,6 +126,7 @@ enum iommu_attr { > DOMAIN_ATTR_FSL_PAMUV1, > DOMAIN_ATTR_NESTING,/* two stages of translation */ > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, > + DOMAIN_ATTR_NESTING_INFO, > DOMAIN_ATTR_MAX, > }; > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index 303f148..02eac73 100644 > --- a/include/uapi/linux/iommu.h > +++ b/include/uapi/linux/iommu.h > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { > }; > }; > > +struct iommu_nesting_info { > + __u32 size; > + __u32 format; > + __u32 features; > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > +#define IOMMU_NESTING_FEAT_BIND_PGTBL(1 << 1) > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2) > + __u32 flags; > + __u8data[]; > +}; > + > +/* > + * @flags: VT-d specific flags. Currently reserved for future > + * extension. > + * @addr_width: The output addr width of first level/stage translation > + * @pasid_bits: Maximum supported PASID bits, 0 represents no PASID > + * support. > + * @cap_reg: Describe basic capabilities as defined in VT-d capability > + * register. > + * @cap_mask:Mark valid capability bits in @cap_reg. 
> + * @ecap_reg:Describe the extended capabilities as defined in VT-d > + * extended capability register. > + * @ecap_mask: Mark the valid capability bits in @ecap_reg. Please explain this a little further, why do we need to tell userspace about cap/ecap register bits that aren't valid through this interface? Thanks, Alex > + */ > +struct iommu_nesting_info_vtd { > + __u32 flags; > + __u16 addr_width; > + __u16 pasid_bits; > + __u64 cap_reg; > + __u64 cap_mask; > + __u64 ecap_reg; > + __u64 ecap_mask; > +}; > + > #endif /* _UAPI_IOMMU_H */ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
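The iommu_nesting_info quoted above carries the vendor part (here VT-d) in a flexible data[] tail, so any consumer has to validate the advertised size before touching the payload. A reduced, hedged userspace model (field sets trimmed for brevity; not the real UAPI layout):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reduced stand-ins for the structures quoted in the patch. */
struct demo_nesting_info {
	uint32_t size;		/* total size, including data[] payload */
	uint32_t format;
	uint32_t features;
	uint32_t flags;
	uint8_t  data[];	/* vendor-specific blob */
};

struct demo_vtd_info {
	uint32_t flags;
	uint16_t addr_width;
	uint16_t pasid_bits;
};

/* Copy the vendor payload out only after checking the advertised size. */
static int demo_read_vtd(const struct demo_nesting_info *info,
			 struct demo_vtd_info *out)
{
	if (info->size < sizeof(*info) + sizeof(*out))
		return -1;
	memcpy(out, info->data, sizeof(*out));
	return 0;
}

/* Build a well-formed buffer and read it back. */
static uint16_t demo_addr_width(void)
{
	uint8_t raw[64] = { 0 };
	struct demo_nesting_info *info = (struct demo_nesting_info *)raw;
	struct demo_vtd_info vtd = { .addr_width = 48, .pasid_bits = 20 };
	struct demo_vtd_info out = { 0 };

	info->size = sizeof(*info) + sizeof(vtd);
	memcpy(info->data, &vtd, sizeof(vtd));
	return demo_read_vtd(info, &out) == 0 ? out.addr_width : 0;
}
```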
[patch for-5.8 1/4] dma-direct: always align allocation size in dma_direct_alloc_pages()
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted() works at page granularity. It's necessary to page align the allocation size in dma_direct_alloc_pages() for consistent behavior. This also fixes an issue when arch_dma_prep_coherent() is called on an unaligned allocation size for dma_alloc_need_uncached() when CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED is enabled. Cc: sta...@vger.kernel.org Signed-off-by: David Rientjes --- kernel/dma/direct.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -112,11 +112,12 @@ static inline bool dma_should_free_from_pool(struct device *dev, struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { - size_t alloc_size = PAGE_ALIGN(size); int node = dev_to_node(dev); struct page *page = NULL; u64 phys_limit; + VM_BUG_ON(!PAGE_ALIGNED(size)); + if (attrs & DMA_ATTR_NO_WARN) gfp |= __GFP_NOWARN; @@ -124,14 +125,14 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp &= ~__GFP_ZERO; gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, _limit); - page = dma_alloc_contiguous(dev, alloc_size, gfp); + page = dma_alloc_contiguous(dev, size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { - dma_free_contiguous(dev, page, alloc_size); + dma_free_contiguous(dev, page, size); page = NULL; } again: if (!page) - page = alloc_pages_node(node, gfp, get_order(alloc_size)); + page = alloc_pages_node(node, gfp, get_order(size)); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, size); page = NULL; @@ -158,8 +159,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; + size = PAGE_ALIGN(size); + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { - ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), 
, gfp); + ret = dma_alloc_from_pool(dev, size, , gfp); if (!ret) return NULL; goto done; @@ -183,10 +186,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, dma_alloc_need_uncached(dev, attrs)) || (IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) { /* remove any dirty cache lines on the kernel alias */ - arch_dma_prep_coherent(page, PAGE_ALIGN(size)); + arch_dma_prep_coherent(page, size); /* create a coherent mapping */ - ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), + ret = dma_common_contiguous_remap(page, size, dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); if (!ret) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
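The bug this patch fixes is easy to model: dma_alloc_contiguous() converts bytes to pages with `size >> PAGE_SHIFT`, which silently drops any sub-page remainder unless the caller aligns first. A standalone illustration (4K pages assumed; DEMO_ names are local stand-ins for the kernel macros):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_ALIGN(x) \
	(((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/* Bytes-to-pages as a `size >> PAGE_SHIFT` consumer computes it. */
static unsigned long demo_pages_truncating(size_t size)
{
	return size >> DEMO_PAGE_SHIFT;
}

/* The same conversion after the up-front PAGE_ALIGN() the patch adds. */
static unsigned long demo_pages_rounded(size_t size)
{
	return DEMO_PAGE_ALIGN(size) >> DEMO_PAGE_SHIFT;
}
```

With an unaligned request of PAGE_SIZE + 1 bytes, the truncating path sizes the allocation at one page while the caller needs two, which is exactly the inconsistency the patch removes by aligning once at entry.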
[patch for-5.8 4/4] dma-direct: add missing set_memory_decrypted() for coherent mapping
When a coherent mapping is created in dma_direct_alloc_pages(), it needs to be decrypted if the device requires unencrypted DMA before returning. Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers into dma-direct") Cc: sta...@vger.kernel.org # 5.5+ Signed-off-by: David Rientjes --- kernel/dma/direct.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -195,6 +195,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, __builtin_return_address(0)); if (!ret) goto out_free_pages; + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + } memset(ret, 0, size); goto done; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
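The added set_memory_decrypted() call operates on `1 << get_order(size)` pages, i.e. the whole power-of-two block backing the allocation rather than just the pages the caller asked for. A model of that conversion (get_order() re-implemented here purely for illustration; 4K pages assumed):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Smallest order such that (1 << order) pages cover @size bytes. */
static int demo_get_order(size_t size)
{
	size_t pages = (size + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

A 3-page request rounds up to order 2 (4 pages), so encryption-attribute changes must cover that full block, matching the `1 << get_order(size)` idiom used throughout this series.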
[patch for-5.8 2/4] dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
If arch_dma_set_uncached() fails after memory has been decrypted, it needs to be re-encrypted before freeing. Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more general") Cc: sta...@vger.kernel.org # 5.7 Signed-off-by: David Rientjes --- kernel/dma/direct.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -220,7 +220,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, arch_dma_prep_coherent(page, size); ret = arch_dma_set_uncached(ret, size); if (IS_ERR(ret)) - goto out_free_pages; + goto out_encrypt_pages; } done: if (force_dma_unencrypted(dev)) @@ -228,6 +228,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; +out_encrypt_pages: + if (force_dma_unencrypted(dev)) + set_memory_encrypted((unsigned long)page_address(page), +1 << get_order(size)); out_free_pages: dma_free_contiguous(dev, page, size); return NULL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch for-5.8 0/4] dma-direct: dma_direct_alloc_pages() fixes for AMD SEV
While debugging recently reported issues concerning DMA allocation practices when CONFIG_AMD_MEM_ENCRYPT is enabled, some curiosities arose when looking at dma_direct_alloc_pages() behavior. Fix these up. These are likely all stable material, so proposing for 5.8. --- kernel/dma/direct.c | 42 -- 1 file changed, 32 insertions(+), 10 deletions(-)

[patch for-5.8 3/4] dma-direct: check return value when encrypting or decrypting memory
__change_page_attr() can fail which will cause set_memory_encrypted() and set_memory_decrypted() to return non-zero. If the device requires unencrypted DMA memory and decryption fails, simply free the memory and fail. If attempting to re-encrypt in the failure path and that encryption fails, there is no alternative other than to leak the memory. Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code") Cc: sta...@vger.kernel.org # 4.17+ Signed-off-by: David Rientjes --- kernel/dma/direct.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -158,6 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, { struct page *page; void *ret; + int err; size = PAGE_ALIGN(size); @@ -210,8 +211,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, } ret = page_address(page); - if (force_dma_unencrypted(dev)) - set_memory_decrypted((unsigned long)ret, 1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + } memset(ret, 0, size); @@ -229,9 +234,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; out_encrypt_pages: - if (force_dma_unencrypted(dev)) - set_memory_encrypted((unsigned long)page_address(page), -1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_encrypted((unsigned long)page_address(page), + 1 << get_order(size)); + /* If memory cannot be re-encrypted, it must be leaked */ + if (err) + return NULL; + } out_free_pages: dma_free_contiguous(dev, page, size); return NULL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
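The combined effect of patches 2 and 3 is a strict unwind order on the error path: re-encrypt before freeing, and if re-encryption itself fails, leak the pages rather than return decrypted memory to the allocator. A toy model of that decision (stubbed results; illustrative only, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

enum demo_outcome { DEMO_FREED, DEMO_LEAKED };

/*
 * Model of the out_encrypt_pages/out_free_pages ladder: decide what
 * happens to the pages on the error path, given whether they were
 * decrypted and whether re-encryption succeeded.
 */
static enum demo_outcome demo_unwind(bool was_decrypted, bool reencrypt_ok)
{
	if (was_decrypted && !reencrypt_ok)
		return DEMO_LEAKED;	/* never free pages left decrypted */
	/* dma_free_contiguous() would run here */
	return DEMO_FREED;
}
```

Leaking is the lesser evil here: freeing a page whose encryption attribute no longer matches the direct map would hand a decrypted page to the next unsuspecting allocation.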
Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
On Wed, 10 Jun 2020 21:12:15 -0700 Jacob Pan wrote: > IOMMU UAPI data has an argsz field which is filled by user. As the data > structures expands, argsz may change. As the UAPI data are shared among > different architectures, extensions of UAPI data could be a result of > one architecture which has no impact on another. Therefore, these argsz > santity checks are performed in the model specific IOMMU drivers. This > patch adds sanity checks in the VT-d to ensure argsz passed by userspace > matches feature flags and other contents. > > Signed-off-by: Jacob Pan > --- > drivers/iommu/intel-iommu.c | 16 > drivers/iommu/intel-svm.c | 12 > 2 files changed, 28 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 27ebf4b9faef..c98b5109684b 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, > struct device *dev, > struct device_domain_info *info; > struct intel_iommu *iommu; > unsigned long flags; > + unsigned long minsz; > int cache_type; > u8 bus, devfn; > u16 did, sid; > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct iommu_domain > *domain, struct device *dev, > if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) > return -EINVAL; > > + minsz = offsetofend(struct iommu_cache_invalidate_info, padding); Would it still be better to look for the end of the last field that's actually used to avoid the code churn and oversights if/when the padding field does get used and renamed? Per my comment on patch 1/, this also seems like where the device specific IOMMU driver should also have the responsibility of receiving a __user pointer to do the copy_from_user() here. vfio can't know which flags require which fields to make a UAPI with acceptable compatibility guarantees otherwise. 
> + if (inv_info->argsz < minsz) > + return -EINVAL; > + > + /* Sanity check user filled invalidation dat sizes */ > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR && > + inv_info->argsz != offsetofend(struct > iommu_cache_invalidate_info, > + addr_info)) > + return -EINVAL; > + > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID && > + inv_info->argsz != offsetofend(struct > iommu_cache_invalidate_info, > + pasid_info)) > + return -EINVAL; > + > spin_lock_irqsave(_domain_lock, flags); > spin_lock(>lock); > info = get_domain_info(dev); > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c > index 35b43fe819ed..64dc2c66dfff 100644 > --- a/drivers/iommu/intel-svm.c > +++ b/drivers/iommu/intel-svm.c > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, > struct device *dev, > struct dmar_domain *dmar_domain; > struct intel_svm_dev *sdev; > struct intel_svm *svm; > + unsigned long minsz; > int ret = 0; > > if (WARN_ON(!iommu) || !data) > return -EINVAL; > > + /* > + * We mandate that no size change in IOMMU UAPI data before the > + * variable size union at the end. > + */ > + minsz = offsetofend(struct iommu_gpasid_bind_data, padding); Same. Thanks, Alex > + if (data->argsz < minsz) > + return -EINVAL; > + > if (data->version != IOMMU_GPASID_BIND_VERSION_1 || > data->format != IOMMU_PASID_FORMAT_INTEL_VTD) > return -EINVAL; > > + if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vtd)) > + return -EINVAL; > + > if (!dev_is_pci(dev)) > return -ENOTSUPP; > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
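The checks quoted above pivot on offsetofend(): argsz must at least cover the fixed header, and must land exactly at the end of the union member selected by the granularity field. A compilable model with a reduced structure (the layout is illustrative, not the real struct iommu_cache_invalidate_info):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

enum { DEMO_GRANU_PASID, DEMO_GRANU_ADDR };

/* Reduced stand-in for struct iommu_cache_invalidate_info. */
struct demo_inv_info {
	uint32_t argsz;
	uint32_t version;
	uint8_t  cache;
	uint8_t  granularity;
	uint8_t  padding[6];
	union {
		struct { uint64_t pasid; } pasid_info;
		struct { uint64_t pasid; uint64_t addr; } addr_info;
	} granu;
};

/* 0 if argsz matches the selected granularity, -1 otherwise. */
static int demo_check_argsz(uint32_t argsz, int granularity)
{
	if (argsz < offsetofend(struct demo_inv_info, padding))
		return -1;
	if (granularity == DEMO_GRANU_ADDR &&
	    argsz != offsetofend(struct demo_inv_info, granu.addr_info))
		return -1;
	if (granularity == DEMO_GRANU_PASID &&
	    argsz != offsetofend(struct demo_inv_info, granu.pasid_info))
		return -1;
	return 0;
}
```

Because the union is the last member, "argsz equals offsetofend of the selected member" is exactly the "no fields after the variable-size union" rule from the documentation patch expressed as a runtime check.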
Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data
On Wed, 10 Jun 2020 21:12:14 -0700 Jacob Pan wrote: > As IOMMU UAPI gets extended, user data size may increase. To support > backward compatibiliy, this patch introduces a size field to each UAPI > data structures. It is *always* the responsibility for the user to fill in > the correct size. Though at the same time, argsz is user provided data which we don't trust. The argsz field allows the user to indicate how much data they're providing, it's still the kernel's responsibility to validate whether it's correct and sufficient for the requested operation. Thanks, Alex > Specific scenarios for user data handling are documented in: > Documentation/userspace-api/iommu.rst > > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan > --- > include/uapi/linux/iommu.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index e907b7091a46..303f148a5cd7 100644 > --- a/include/uapi/linux/iommu.h > +++ b/include/uapi/linux/iommu.h > @@ -135,6 +135,7 @@ enum iommu_page_response_code { > > /** > * struct iommu_page_response - Generic page response information > + * @argsz: User filled size of this data > * @version: API version of this structure > * @flags: encodes whether the corresponding fields are valid > * (IOMMU_FAULT_PAGE_RESPONSE_* values) > @@ -143,6 +144,7 @@ enum iommu_page_response_code { > * @code: response code from iommu_page_response_code > */ > struct iommu_page_response { > + __u32 argsz; > #define IOMMU_PAGE_RESP_VERSION_11 > __u32 version; > #define IOMMU_PAGE_RESP_PASID_VALID (1 << 0) > @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info { > /** > * struct iommu_cache_invalidate_info - First level/stage invalidation > * information > + * @argsz: User filled size of this data > * @version: API version of this structure > * @cache: bitfield that allows to select which caches to invalidate > * @granularity: defines the lowest granularity used for the invalidation: > @@ -246,6 +249,7 @@ struct 
iommu_inv_pasid_info { > * must support the used granularity. > */ > struct iommu_cache_invalidate_info { > + __u32 argsz; > #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1 > __u32 version; > /* IOMMU paging structure cache */ > @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd { > > /** > * struct iommu_gpasid_bind_data - Information about device and guest PASID > binding > + * @argsz: User filled size of this data > * @version: Version of this data structure > * @format: PASID table entry format > * @flags: Additional information on guest bind request > @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd { > * PASID to host PASID based on this bind data. > */ > struct iommu_gpasid_bind_data { > + __u32 argsz; > #define IOMMU_GPASID_BIND_VERSION_1 1 > __u32 version; > #define IOMMU_PASID_FORMAT_INTEL_VTD 1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
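Alex's point — argsz is user-provided and therefore untrusted — reduces to one kernel-side rule: reject any argsz below the minimum the operation needs, and never copy more than min(argsz, kernel's known size). A minimal userspace model covering the four size-match scenarios from the documentation patch (all names invented, not kernel symbols):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of kernel-side argsz negotiation for a UAPI structure
 * following the argsz/flags/data[] pattern.
 *
 * @argsz: size the user claims to have passed
 * @minsz: smallest size the requested operation can work with
 * @ksize: size of the structure as the kernel was built
 *
 * Returns how many bytes the kernel may safely copy, or 0 to
 * reject the request.
 */
static size_t demo_copy_size(uint32_t argsz, size_t minsz, size_t ksize)
{
	if (argsz < minsz)
		return 0;			/* too small: reject */
	return argsz < ksize ? argsz : ksize;	/* never read past either side */
}
```

A newer user on an older kernel simply has its extra tail ignored; an older user on a newer kernel has the missing tail treated as absent, which is why every new field must be gated by a new flag.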
Re: [PATCH v2 1/3] docs: IOMMU user API
Hi Jon, On Thu, 11 Jun 2020 07:55:00 -0600 Jonathan Corbet wrote: > On Wed, 10 Jun 2020 21:12:13 -0700 > Jacob Pan wrote: > > A little nit but...this pattern: > > > +pattern below: > > + > > +:: > > + > > + struct { > > + __u32 argsz; > > + __u32 flags; > > + __u8 data[]; > > + } > > can be more concisely and attractively written as: > >pattern below:: > > struct { > ... > Will do. thanks! > Thanks, > > jon [Jacob Pan]
Re: [PATCH v2 1/3] docs: IOMMU user API
On Wed, 10 Jun 2020 21:12:13 -0700 Jacob Pan wrote: > IOMMU UAPI is newly introduced to support communications between guest > virtual IOMMU and host IOMMU. There has been lots of discussions on how > it should work with VFIO UAPI and userspace in general. > > This document is indended to clarify the UAPI design and usage. The > mechenics of how future extensions should be achieved are also covered > in this documentation. > > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan > --- > Documentation/userspace-api/iommu.rst | 210 > ++ > 1 file changed, 210 insertions(+) > create mode 100644 Documentation/userspace-api/iommu.rst > > diff --git a/Documentation/userspace-api/iommu.rst > b/Documentation/userspace-api/iommu.rst > new file mode 100644 > index ..e95dc5a04a41 > --- /dev/null > +++ b/Documentation/userspace-api/iommu.rst > @@ -0,0 +1,210 @@ > +.. SPDX-License-Identifier: GPL-2.0 > +.. iommu: > + > += > +IOMMU Userspace API > += > + > +IOMMU UAPI is used for virtualization cases where communications are > +needed between physical and virtual IOMMU drivers. For native > +usage, IOMMU is a system device which does not need to communicate > +with user space directly. > + > +The primary use cases are guest Shared Virtual Address (SVA) and > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is > +required to communicate with the physical IOMMU in the host. > + > +.. contents:: :local: > + > +Functionalities > + > +Communications of user and kernel involve both directions. The > +supported user-kernel APIs are as follows: > + > +1. Alloc/Free PASID > +2. Bind/unbind guest PASID (e.g. Intel VT-d) > +3. Bind/unbind guest PASID table (e.g. ARM sMMU) > +4. Invalidate IOMMU caches > +5. Service page request > + > +Requirements > + > +The IOMMU UAPIs are generic and extensible to meet the following > +requirements: > + > +1. Emulated and para-virtualised vIOMMUs > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) > +3. 
Extensions to the UAPI shall not break existing user space > + > +Interfaces > + > +Although the data structures defined in IOMMU UAPI are self-contained, > +there is no user API functions introduced. Instead, IOMMU UAPI is > +designed to work with existing user driver frameworks such as VFIO. > + > +Extension Rules & Precautions > +- > +When IOMMU UAPI gets extended, the data structures can *only* be > +modified in two ways: > + > +1. Adding new fields by re-purposing the padding[] field. No size change. > +2. Adding new union members at the end. May increase in size. > + > +No new fields can be added *after* the variable size union in that it > +will break backward compatibility when offset moves. In both cases, a > +new flag must be accompanied with a new field such that the IOMMU > +driver can process the data based on the new flag. Version field is > +only reserved for the unlikely event of UAPI upgrade at its entirety. > + > +It's *always* the caller's responsibility to indicate the size of the > +structure passed by setting argsz appropriately. > + > +When IOMMU UAPI extension results in size increase, user such as VFIO > +has to handle the following scenarios: > + > +1. User and kernel has exact size match > +2. An older user with older kernel header (smaller UAPI size) running on a > + newer kernel (larger UAPI size) > +3. A newer user with newer kernel header (larger UAPI size) running > + on a older kernel. > +4. A malicious/misbehaving user pass illegal/invalid size but within > + range. The data may contain garbage. > + > + > +Feature Checking > + > +While launching a guest with vIOMMU, it is important to ensure that host > +can support the UAPI data structures to be used for vIOMMU-pIOMMU > +communications. Without the upfront compatibility checking, future > +faults are difficult to report even in normal conditions. For example, > +TLB invalidations should always succeed from vIOMMU's > +perspective. 
There is no architectural way to report back to the vIOMMU > +if the UAPI data is incompatible. For this reason the following IOMMU > +UAPIs cannot fail: > + > +1. Free PASID > +2. Unbind guest PASID > +3. Unbind guest PASID table (SMMU) > +4. Cache invalidate > +5. Page response > + > +User applications such as QEMU is expected to import kernel UAPI > +headers. Only backward compatibility is supported. For example, an > +older QEMU (with older kernel header) can run on newer kernel. Newer > +QEMU (with new kernel header) may fail on older kernel. "Build your user application against newer kernels and it may break on older kernels" is not a great selling point of this UAPI. Clearly new features may not be available on older kernels and an
Re: [PATCH v2 1/3] docs: IOMMU user API
On Wed, 10 Jun 2020 21:12:13 -0700 Jacob Pan wrote: A little nit but...this pattern: > +pattern below: > + > +:: > + > + struct { > + __u32 argsz; > + __u32 flags; > + __u8 data[]; > + } can be more concisely and attractively written as: pattern below:: struct { ... Thanks, jon
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
On Thu, Jun 11, 2020 at 10:54:45AM +0800, Zhangfei Gao wrote: > On 2020/6/10 上午12:49, Bjorn Helgaas wrote: > > On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote: > > > On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao > > > wrote: > > > > On 2020/6/9 上午12:41, Bjorn Helgaas wrote: > > > > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote: > > > > > > On 2020/6/6 上午7:19, Bjorn Helgaas wrote: > > > > > > > > +++ b/drivers/iommu/iommu.c > > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device > > > > > > > > *dev, struct > > > > > > > > fwnode_handle *iommu_fwnode, > > > > > > > >fwspec->iommu_fwnode = iommu_fwnode; > > > > > > > >fwspec->ops = ops; > > > > > > > >dev_iommu_fwspec_set(dev, fwspec); > > > > > > > > + > > > > > > > > + if (dev_is_pci(dev)) > > > > > > > > + pci_fixup_device(pci_fixup_final, > > > > > > > > to_pci_dev(dev)); > > > > > > > > + > > > > > > > > > > > > > > > > Then pci_fixup_final will be called twice, the first in > > > > > > > > pci_bus_add_device. > > > > > > > > Here in iommu_fwspec_init is the second time, specifically for > > > > > > > > iommu_fwspec. > > > > > > > > Will send this when 5.8-rc1 is open. > > > > > > > Wait, this whole fixup approach seems wrong to me. No matter how > > > > > > > you > > > > > > > do the fixup, it's still a fixup, which means it requires ongoing > > > > > > > maintenance. Surely we don't want to have to add the > > > > > > > Vendor/Device ID > > > > > > > for every new AMBA device that comes along, do we? > > > > > > > > > > > > > Here the fake pci device has standard PCI cfg space, but physical > > > > > > implementation is base on AMBA > > > > > > They can provide pasid feature. > > > > > > However, > > > > > > 1, does not support tlp since they are not real pci devices. > > > > > > 2. 
does not support pri, instead support stall (provided by smmu) > > > > > > And stall is not a pci feature, so it is not described in struct > > > > > > pci_dev, > > > > > > but in struct iommu_fwspec. > > > > > > So we use this fixup to tell pci system that the devices can > > > > > > support stall, > > > > > > and hereby support pasid. > > > > > This did not answer my question. Are you proposing that we update a > > > > > quirk every time a new AMBA device is released? I don't think that > > > > > would be a good model. > > > > Yes, you are right, but we do not have any better idea yet. > > > > Currently we have three fake pci devices, which support stall and pasid. > > > > We have to let pci system know the device can support pasid, because of > > > > stall feature, though not support pri. > > > > Do you have any other ideas? > > > It sounds like the best way would be to allocate a PCI capability for it, > > > so > > > detection can be done through config space, at least in future devices, > > > or possibly after a firmware update if the config space in your system > > > is controlled by firmware somewhere. Once there is a proper mechanism > > > to do this, using fixups to detect the early devices that don't use that > > > should be uncontroversial. I have no idea what the process or timeline > > > is to add new capabilities into the PCIe specification, or if this one > > > would be acceptable to the PCI SIG at all. > > That sounds like a possibility. The spec already defines a > > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might > > be a candidate. > Will investigate this, thanks Bjorn FWIW, there's also a Vendor-Specific Capability that can appear in the first 256 bytes of config space (the Vendor-Specific Extended Capability must appear in the "Extended Configuration Space" from 0x100-0xfff). > > > If detection cannot be done through PCI config space, the next best > > > alternative is to pass auxiliary data through firmware. 
On DT based > > > machines, you can list non-hotpluggable PCIe devices and add custom > > > properties that could be read during device enumeration. I assume > > > ACPI has something similar, but I have not done that. > Yes, thanks Arnd > > ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate. I > > like this better than a PCI capability because the property you need > > to expose is not a PCI property. > _DSM may not workable, since it is working in runtime. > We need stall information in init stage, neither too early (after allocation > of iommu_fwspec) > nor too late (before arm_smmu_add_device ). I'm not aware of a restriction on when _DSM can be evaluated. I'm looking at ACPI v6.3, sec 9.1.1. Are you seeing something different? > By the way, It would be a long time if we need modify either pcie > spec or acpi spec. Can we use pci_fixup_device in iommu_fwspec_init > first, it is relatively simple and meet the requirement of platform > device using pasid, and they are already in product. Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires a spec change. Both can
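Arnd and Bjorn's suggestion is to detect the stall/PASID property through a Vendor-Specific Capability (capability ID 09h in the first 256 bytes of config space) instead of per-device fixups. A sketch of walking the classic capability list, assuming a raw 256-byte config-space snapshot (register offsets match the conventional PCI header layout; the helper names are invented):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PCI_STATUS		0x06
#define DEMO_PCI_STATUS_CAP_LIST	0x10
#define DEMO_PCI_CAPABILITY_LIST	0x34
#define DEMO_PCI_CAP_ID_VNDR	0x09	/* Vendor-Specific Capability */

/*
 * Walk the legacy capability list in a 256-byte config snapshot and
 * return the offset of the first vendor-specific capability, or 0.
 */
static uint8_t demo_find_vndr_cap(const uint8_t cfg[256])
{
	uint16_t status = cfg[DEMO_PCI_STATUS] |
			  (cfg[DEMO_PCI_STATUS + 1] << 8);
	uint8_t pos, loops = 48;	/* guard against malformed lists */

	if (!(status & DEMO_PCI_STATUS_CAP_LIST))
		return 0;
	pos = cfg[DEMO_PCI_CAPABILITY_LIST] & 0xfc;
	while (pos && loops--) {
		if (cfg[pos] == DEMO_PCI_CAP_ID_VNDR)
			return pos;
		pos = cfg[pos + 1] & 0xfc;	/* next-pointer, dword aligned */
	}
	return 0;
}

/* Build a snapshot with a PM capability chained to a vendor cap. */
static uint8_t demo_walk(void)
{
	uint8_t cfg[256] = { 0 };

	cfg[DEMO_PCI_STATUS] = DEMO_PCI_STATUS_CAP_LIST;
	cfg[DEMO_PCI_CAPABILITY_LIST] = 0x40;
	cfg[0x40] = 0x01;	/* power management capability... */
	cfg[0x41] = 0x50;	/* ...pointing at the next capability */
	cfg[0x50] = DEMO_PCI_CAP_ID_VNDR;
	cfg[0x51] = 0x00;	/* end of list */
	return demo_find_vndr_cap(cfg);
}
```

Once such a capability exists on future devices, the quirk table only has to carry the handful of early devices that predate it, which is the "uncontroversial fixup" case Arnd describes.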
[PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 81 +++-- include/linux/intel-iommu.h | 16 + 2 files changed, 95 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 25650ac..5415dc7 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5655,12 +5655,16 @@ static inline bool iommu_pasid_support(void) static inline bool nested_mode_support(void) { struct dmar_drhd_unit *drhd; - struct intel_iommu *iommu; + struct intel_iommu *iommu, *prev = NULL; bool ret = true; rcu_read_lock(); for_each_active_iommu(iommu, drhd) { - if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) { + if (!prev) + prev = iommu; + if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) || + (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) || + (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) { ret = false; break; } @@ -6069,11 +6073,84 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain, return ret; } +static int intel_iommu_get_nesting_info(struct iommu_domain *domain, + struct iommu_nesting_info *info) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK; + struct device_domain_info *domain_info; + struct iommu_nesting_info_vtd vtd; + unsigned long flags; + u32 size; + + if ((domain->type != IOMMU_DOMAIN_UNMANAGED) || + !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) + return -ENODEV; + + if (!info) + return -EINVAL; + + size = sizeof(struct iommu_nesting_info) + + sizeof(struct iommu_nesting_info_vtd); + /* +* if provided buffer size is not equal to the size, should +* return 0 and also the expected buffer size to caller. 
+*/ + if (info->size != size) { + info->size = size; + return 0; + } + + spin_lock_irqsave(_domain_lock, flags); + /* +* arbitrary select the first domain_info as all nesting +* related capabilities should be consistent across iommu +* units. +*/ + domain_info = list_first_entry(_domain->devices, + struct device_domain_info, link); + cap &= domain_info->iommu->cap; + ecap &= domain_info->iommu->ecap; + spin_unlock_irqrestore(_domain_lock, flags); + + info->format = IOMMU_PASID_FORMAT_INTEL_VTD; + info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID | +IOMMU_NESTING_FEAT_BIND_PGTBL | +IOMMU_NESTING_FEAT_CACHE_INVLD; + vtd.flags = 0; + vtd.addr_width = dmar_domain->gaw; + vtd.pasid_bits = ilog2(intel_pasid_max_id); + vtd.cap_reg = cap; + vtd.cap_mask = VTD_CAP_MASK; + vtd.ecap_reg = ecap; + vtd.ecap_mask = VTD_ECAP_MASK; + + memcpy(info->data, , sizeof(vtd)); + return 0; +} + +static int intel_iommu_domain_get_attr(struct iommu_domain *domain, + enum iommu_attr attr, void *data) +{ + switch (attr) { + case DOMAIN_ATTR_NESTING_INFO: + { + struct iommu_nesting_info *info = + (struct iommu_nesting_info *) data; + + return intel_iommu_get_nesting_info(domain, info); + } + default: + return -ENODEV; + } +} + const struct iommu_ops intel_iommu_ops = { .capable= intel_iommu_capable, .domain_alloc = intel_iommu_domain_alloc, .domain_free= intel_iommu_domain_free, .domain_set_attr= intel_iommu_domain_set_attr, + .domain_get_attr= intel_iommu_domain_get_attr, .attach_dev = intel_iommu_attach_device, .detach_dev = intel_iommu_detach_device, .aux_attach_dev = intel_iommu_aux_attach_device, diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 0b238c3..be48f4e 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -196,6 +196,22 @@ #define ecap_max_handle_mask(e) ((e >> 20) & 0xf) #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */ +/* Nesting Support Capability Alignment */ +#define VTD_CAP_FL1GP (1ULL << 56) +#define 
VTD_CAP_FL5LP (1ULL << 60) +#define VTD_ECAP_PRS (1ULL << 29) +#define VTD_ECAP_ERS (1ULL << 30) +#define VTD_ECAP_SRS (1ULL <<
[PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
When an IOMMU domain with nesting attribute is used for guest SVA, a system-wide PASID is allocated for binding with the device and the domain. For security reason, we need to check the PASID passsed from user-space. e.g. page table bind/unbind and PASID related cache invalidation. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 10 ++ drivers/iommu/intel-svm.c | 6 -- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 2d59a5d..25650ac 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5433,6 +5433,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, int granu = 0; u64 pasid = 0; u64 addr = 0; + void *pdata; granu = to_vtd_granularity(cache_type, inv_info->granularity); if (granu == -EINVAL) { @@ -5452,6 +5453,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID)) pasid = inv_info->addr_info.pasid; + pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL); + if (!pdata) { + ret = -EINVAL; + goto out_unlock; + } else if (IS_ERR(pdata)) { + ret = PTR_ERR(pdata); + goto out_unlock; + } + switch (BIT(cache_type)) { case IOMMU_CACHE_INV_TYPE_IOTLB: /* HW will ignore LSB bits based on address mask */ diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index bf55e2f..49059c1 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -332,7 +332,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, dmar_domain = to_dmar_domain(domain); mutex_lock(_mutex); - svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL); + svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL); if (IS_ERR(svm)) { ret = PTR_ERR(svm); goto out; @@ -450,6 +450,7 @@ int 
intel_svm_unbind_gpasid(struct iommu_domain *domain, struct iommu_gpasid_unbind_data *data) { struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); + struct dmar_domain *dmar_domain; struct intel_svm_dev *sdev; struct intel_svm *svm; int ret = -EINVAL; @@ -464,9 +465,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain, return -EINVAL; pasid = data->pasid; + dmar_domain = to_dmar_domain(domain); mutex_lock(_mutex); - svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL); + svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL); if (!svm) { ret = -EINVAL; goto out; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
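The ownership check added above hinges on ioasid_find()'s three-way return convention: NULL when the PASID is not in the given set, an ERR_PTR-encoded errno on lookup failure, and a valid pointer on success. Below is a minimal userspace sketch of that NULL-vs-ERR_PTR triage; the ERR_PTR/IS_ERR helpers are re-implemented here for illustration and are not the kernel's headers.

```c
#include <assert.h>
#include <errno.h>

/* Minimal re-implementation of the kernel's ERR_PTR convention:
 * pointers in the top 4095 bytes of the address space encode -errno. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static int dummy_entry; /* stands in for a real ioasid payload */

/* Mirrors the triage intel_iommu_sva_invalidate() performs on the
 * result of ioasid_find(dmar_domain->ioasid_sid, pasid, NULL). */
static int check_pasid_ownership(void *pdata)
{
	if (!pdata)
		return -EINVAL;             /* PASID not owned by this set */
	if (IS_ERR(pdata))
		return (int)PTR_ERR(pdata); /* propagate the lookup error */
	return 0;                           /* ownership confirmed */
}
```

The key point the patch relies on is that "absent from the set" and "lookup failed" are distinguishable, so an out-of-set PASID from a malicious user maps to a plain -EINVAL.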
[PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace
This patch exports iommu nesting capability info to user space through VFIO. User space is expected to check this info for supported uAPIs (e.g. PASID alloc/free, bind page table, and cache invalidation) and for the vendor-specific format of the first-level/stage page table that will be bound. The nesting info is available only after the nesting iommu type is set for a container. The current implementation imposes one limitation: a nesting container should include at most one group. The philosophy of a vfio container is that all groups/devices within the container share the same IOMMU context. When vSVA is enabled, one IOMMU context could include one 2nd-level address space and multiple 1st-level address spaces. While the 2nd-level address space is reasonably sharable by multiple groups, blindly sharing 1st-level address spaces across all groups within the container might instead break the guest's expectation. In the future a sub/super container concept might be introduced to allow partial address space sharing within an IOMMU context. But for now let's go with this restriction by requiring a singleton container for using nesting iommu features. The link below has the related discussion about this decision.
https://lkml.org/lkml/2020/5/15/1028 Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 73 + include/uapi/linux/vfio.h | 9 + 2 files changed, 82 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 402aad3..22432cf 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -71,6 +71,7 @@ struct vfio_iommu { unsigned intdma_avail; boolv2; boolnesting; + struct iommu_nesting_info *nesting_info; }; struct vfio_domain { @@ -125,6 +126,9 @@ struct vfio_regions { #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\ (!list_empty(>domain_list)) +#define IS_DOMAIN_IN_CONTAINER(iommu) ((iommu->external_domain) || \ +(!list_empty(>domain_list))) + static int put_pfn(unsigned long pfn, int prot); /* @@ -1641,6 +1645,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, } } + /* Nesting type container can include only one group */ + if (iommu->nesting && IS_DOMAIN_IN_CONTAINER(iommu)) { + mutex_unlock(>lock); + return -EINVAL; + } + group = kzalloc(sizeof(*group), GFP_KERNEL); domain = kzalloc(sizeof(*domain), GFP_KERNEL); if (!group || !domain) { @@ -1700,6 +1710,36 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, if (ret) goto out_domain; + /* Nesting cap info is available only after attaching */ + if (iommu->nesting) { + struct iommu_nesting_info tmp; + struct iommu_nesting_info *info; + + /* First get the size of vendor specific nesting info */ + ret = iommu_domain_get_attr(domain->domain, + DOMAIN_ATTR_NESTING_INFO, + ); + if (ret) + goto out_detach; + + info = kzalloc(tmp.size, GFP_KERNEL); + if (!info) { + ret = -ENOMEM; + goto out_detach; + } + + /* Now get the nesting info */ + info->size = tmp.size; + ret = iommu_domain_get_attr(domain->domain, + DOMAIN_ATTR_NESTING_INFO, + info); + if (ret) { + kfree(info); + goto out_detach; + } + 
iommu->nesting_info = info; + } + /* Get aperture info */ iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, ); @@ -1801,6 +1841,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, return 0; out_detach: + kfree(iommu->nesting_info); vfio_iommu_detach_group(domain, group); out_domain: iommu_domain_free(domain->domain); @@ -1998,6 +2039,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, vfio_iommu_unmap_unpin_all(iommu); else vfio_iommu_unmap_unpin_reaccount(iommu); + + kfree(iommu->nesting_info); } iommu_domain_free(domain->domain); list_del(>next); @@ -2190,6 +2233,30 @@ static int vfio_iommu_iova_build_caps(struct
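The attach path above uses a common two-call idiom for variable-size attributes: call the getter once with a fixed-size struct just to learn the full size (base struct plus vendor data), allocate, then call again to fill the whole buffer. Here is a hedged userspace-style sketch of that idiom with a stub standing in for iommu_domain_get_attr(DOMAIN_ATTR_NESTING_INFO); VENDOR_DATA_SIZE and the stub's fill values are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for struct iommu_nesting_info (see patch 02):
 * only the fields the size-query dance needs. */
struct nesting_info {
	unsigned int size;     /* total size incl. vendor data[] */
	unsigned int format;
	unsigned int features;
	unsigned int flags;
	unsigned char data[];  /* vendor specific part */
};

/* Stub attr getter: if the caller's buffer is too small,
 * only report the needed size; otherwise fill everything. */
#define VENDOR_DATA_SIZE 32
static int get_nesting_info(struct nesting_info *info)
{
	unsigned int needed = sizeof(*info) + VENDOR_DATA_SIZE;

	if (info->size < needed) {
		info->size = needed;  /* first call: report required size */
		return 0;
	}
	info->format = 1;             /* second call: fill in the data */
	info->features = 0x7;
	memset(info->data, 0xab, VENDOR_DATA_SIZE);
	return 0;
}

/* The two-call pattern used in vfio_iommu_type1_attach_group(). */
static struct nesting_info *fetch_nesting_info(void)
{
	struct nesting_info tmp = { .size = sizeof(tmp) }, *info;

	if (get_nesting_info(&tmp))    /* 1st call: size only */
		return NULL;
	info = calloc(1, tmp.size);
	if (!info)
		return NULL;
	info->size = tmp.size;
	if (get_nesting_info(info)) {  /* 2nd call: full contents */
		free(info);
		return NULL;
	}
	return info;
}
```

The same shape is reused throughout VFIO's capability-chain code, which is why the patch can query with a stack `tmp` before allocating.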
[PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
This patch allows user space to request PASID allocation/free, e.g. when serving the request from the guest. PASIDs that are not freed by userspace are automatically freed when the IOASID set is destroyed when the process exits. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Yi Sun Signed-off-by: Jacob Pan --- v1 -> v2: *) move the vfio_mm related code to be a separate module *) use a single structure for alloc/free, could support a range of PASIDs *) fetch vfio_mm at group_attach time instead of at iommu driver open time drivers/vfio/Kconfig| 1 + drivers/vfio/vfio_iommu_type1.c | 96 - drivers/vfio/vfio_pasid.c | 10 + include/linux/vfio.h| 6 +++ include/uapi/linux/vfio.h | 36 5 files changed, 147 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 3d8a108..95d90c6 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -2,6 +2,7 @@ config VFIO_IOMMU_TYPE1 tristate depends on VFIO + select VFIO_PASID if (X86) default n config VFIO_IOMMU_SPAPR_TCE diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 22432cf..a7b3a83 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -72,6 +72,7 @@ struct vfio_iommu { boolv2; boolnesting; struct iommu_nesting_info *nesting_info; + struct vfio_mm *vmm; }; struct vfio_domain { @@ -1615,6 +1616,17 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu, list_splice_tail(iova_copy, iova); } + +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu) +{ + if (iommu->vmm) { + vfio_mm_put(iommu->vmm); + iommu->vmm = NULL; + } + + kfree(iommu->nesting_info); +} + static int vfio_iommu_type1_attach_group(void *iommu_data, struct iommu_group *iommu_group) { @@ -1738,6 +1750,25 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, goto out_detach; } iommu->nesting_info = info; + + if
(info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) { + struct vfio_mm *vmm; + int sid; + + vmm = vfio_mm_get_from_task(current); + if (IS_ERR(vmm)) { + ret = PTR_ERR(vmm); + goto out_detach; + } + iommu->vmm = vmm; + + sid = vfio_mm_ioasid_sid(vmm); + ret = iommu_domain_set_attr(domain->domain, + DOMAIN_ATTR_IOASID_SID, + ); + if (ret) + goto out_detach; + } } /* Get aperture info */ @@ -1841,7 +1872,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, return 0; out_detach: - kfree(iommu->nesting_info); + if (iommu->nesting_info) + vfio_iommu_release_nesting_info(iommu); vfio_iommu_detach_group(domain, group); out_domain: iommu_domain_free(domain->domain); @@ -2040,7 +2072,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, else vfio_iommu_unmap_unpin_reaccount(iommu); - kfree(iommu->nesting_info); + if (iommu->nesting_info) + vfio_iommu_release_nesting_info(iommu); } iommu_domain_free(domain->domain); list_del(>next); @@ -2363,6 +2396,63 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu, } +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu, + unsigned int min, + unsigned int max) +{ + int ret = -ENOTSUPP; + + mutex_lock(>lock); + if (iommu->vmm) + ret = vfio_pasid_alloc(iommu->vmm, min, max); + mutex_unlock(>lock); + return ret; +} + +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, + unsigned int min, + unsigned int max) +{ + int ret = -ENOTSUPP; + + mutex_lock(>lock); + if (iommu->vmm) { + vfio_pasid_free_range(iommu->vmm, min, max); + ret = 0; + } + mutex_unlock(>lock); + return ret; +} + +static int
[PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache
This patch provides an interface allowing userspace to invalidate the IOMMU cache for the first-level page table. It is required when the first level IOMMU page table is not managed by the host kernel in the nested translation setup. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Eric Auger Signed-off-by: Jacob Pan --- v1 -> v2: *) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type" *) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn() *) vfio_dev_cache_inv_fn() always successful *) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP drivers/vfio/vfio_iommu_type1.c | 59 + include/uapi/linux/vfio.h | 3 +++ 2 files changed, 62 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index f1468a0..c233c8e 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2582,6 +2582,54 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu, return ret; } +static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_cache_invalidate_info *cache_info = + (struct iommu_cache_invalidate_info *) dc->data; + + iommu_cache_invalidate(dc->domain, dev, cache_info); + return 0; +} + +static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu, + struct iommu_cache_invalidate_info *cache_info) +{ + struct domain_capsule dc = { .data = cache_info }; + struct vfio_group *group; + struct vfio_domain *domain; + int ret = 0; + struct iommu_nesting_info *info; + + mutex_lock(>lock); + /* +* Cache invalidation is required for any nesting IOMMU, +* so no need to check system-wide PASID support.
+*/ + info = iommu->nesting_info; + if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) { + ret = -ENOTSUPP; + goto out_unlock; + } + + group = vfio_find_nesting_group(iommu); + if (!group) { + ret = -EINVAL; + goto out_unlock; + } + + domain = list_first_entry(>domain_list, + struct vfio_domain, next); + dc.group = group; + dc.domain = domain->domain; + iommu_group_for_each_dev(group->iommu_group, , +vfio_dev_cache_invalidate_fn); + +out_unlock: + mutex_unlock(>lock); + return ret; +} + static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu, unsigned long arg) { @@ -2607,6 +2655,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu, case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL: data_size = sizeof(struct iommu_gpasid_unbind_data); break; + case VFIO_IOMMU_NESTING_OP_CACHE_INVLD: + data_size = sizeof(struct iommu_cache_invalidate_info); + break; default: return -EINVAL; } @@ -2633,6 +2684,14 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu, case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL: ret = vfio_iommu_handle_pgtbl_op(iommu, false, data); break; + case VFIO_IOMMU_NESTING_OP_CACHE_INVLD: + { + struct iommu_cache_invalidate_info *cache_info = + (struct iommu_cache_invalidate_info *)data; + + ret = vfio_iommu_invalidate_cache(iommu, cache_info); + break; + } default: ret = -EINVAL; } diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index e4aa466..9a011d6 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -891,6 +891,8 @@ struct vfio_iommu_type1_pasid_request { * +-+---+ * | UNBIND_PGTBL| struct iommu_gpasid_unbind_data | * +-+---+ + * | CACHE_INVLD | struct iommu_cache_invalidate_info | + * +-+---+ * * returns: 0 on success, -errno on failure. 
*/ @@ -903,6 +905,7 @@ struct vfio_iommu_type1_nesting_op { #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL (0) #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL (1) +#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD (2) #define VFIO_IOMMU_NESTING_OP _IO(VFIO_TYPE, VFIO_BASE + 23) -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain
From the IOMMU's point of view, PASIDs allocated and managed by external components (e.g. VFIO) will be passed in for gpasid_bind/unbind operations. The IOMMU needs some knowledge to check PASID ownership, hence add an interface for those components to tell the IOMMU the PASID owner. In the latest kernel design, PASID ownership is managed by the IOASID set the PASID is allocated from. This patch adds support for setting the ioasid set ID on the domains used for nesting/vSVA. Subsequent SVA operations on the PASID will be checked against its IOASID set for proper ownership. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 16 include/linux/intel-iommu.h | 4 include/linux/iommu.h | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 5628e4b..2d59a5d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1788,6 +1788,7 @@ static struct dmar_domain *alloc_domain(int flags) if (first_level_by_default()) domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL; domain->has_iotlb_device = false; + domain->ioasid_sid = INVALID_IOASID_SET; INIT_LIST_HEAD(>devices); return domain; @@ -6035,6 +6036,21 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain, } spin_unlock_irqrestore(_domain_lock, flags); break; + case DOMAIN_ATTR_IOASID_SID: + if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) { + ret = -ENODEV; + break; + } + if ((dmar_domain->ioasid_sid != INVALID_IOASID_SET) && + (dmar_domain->ioasid_sid != (*(int *) data))) { + pr_warn_ratelimited("multi ioasid_set (%d:%d) setting", + dmar_domain->ioasid_sid, + (*(int *) data)); + ret = -EBUSY; + break; + } + dmar_domain->ioasid_sid = *(int *) data; + break; default: ret = -EINVAL; break; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 1e02624..29d1c6f 100644 --- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h @@ -548,6 +548,10 @@ struct dmar_domain { 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ u64 max_addr; /* maximum mapped address */ + int ioasid_sid; /* +* the ioasid set which tracks all +* PASIDs used by the domain. +*/ int default_pasid; /* * The default pasid used for non-SVM * traffic on mediated devices. diff --git a/include/linux/iommu.h b/include/linux/iommu.h index f6e4b49..57c46ae 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -127,6 +127,7 @@ enum iommu_attr { DOMAIN_ATTR_NESTING,/* two stages of translation */ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, DOMAIN_ATTR_NESTING_INFO, + DOMAIN_ATTR_IOASID_SID, DOMAIN_ATTR_MAX, }; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
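The DOMAIN_ATTR_IOASID_SID handler above implements set-once semantics: a domain's ioasid set id can be assigned once, re-assigning the same value is accepted, and a conflicting value is rejected with -EBUSY. A small userspace re-statement of just that check (the struct and function names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <errno.h>

#define INVALID_IOASID_SET (-1)

struct dmar_domain_sketch {
	int ioasid_sid;   /* ioasid set owning the domain's PASIDs */
};

/* Mirrors the DOMAIN_ATTR_IOASID_SID handling added to
 * intel_iommu_domain_set_attr(): assign once, allow an idempotent
 * re-assignment, reject a conflicting set id with -EBUSY. */
static int set_ioasid_sid(struct dmar_domain_sketch *dom, int sid)
{
	if (dom->ioasid_sid != INVALID_IOASID_SET &&
	    dom->ioasid_sid != sid)
		return -EBUSY;
	dom->ioasid_sid = sid;
	return 0;
}
```

This matters because a nesting domain may be attached more than once (e.g. on group re-attach), so the same sid must be tolerated while a different one signals two owners fighting over the domain.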
[PATCH v2 04/15] vfio: Add PASID allocation/free support
Shared Virtual Addressing (a.k.a. Shared Virtual Memory) allows sharing multiple process virtual address spaces with the device for a simplified programming model. A PASID is used to tag a virtual address space in DMA requests and to identify the related translation structure in the IOMMU. When a PASID-capable device is assigned to a VM, we want the same capability of using PASIDs to tag guest process virtual address spaces in order to achieve virtual SVA (vSVA). PASID management for the guest is vendor specific. Some vendors (e.g. Intel VT-d) require system-wide managed PASIDs across all devices, regardless of whether a device is used by the host or assigned to a guest. Other vendors (e.g. ARM SMMU) may allow PASIDs to be managed per-device and thus fully delegated to the guest for assigned devices. For system-wide managed PASIDs, this patch introduces a vfio module to handle explicit PASID alloc/free requests from the guest. Allocated PASIDs are associated with a process (or, mm_struct) in the IOASID core. A vfio_mm object is introduced to track the mm_struct. Multiple VFIO containers within a process share the same vfio_mm object. A quota mechanism is provided to prevent a malicious user from exhausting the available PASIDs. Currently the quota is a global parameter applied to all VFIO devices. In the future a per-device quota might be supported too.
Cc: Kevin Tian CC: Jacob Pan Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- v1 -> v2: *) added in v2, split from the pasid alloc/free support of v1 drivers/vfio/Kconfig | 5 ++ drivers/vfio/Makefile | 1 + drivers/vfio/vfio_pasid.c | 151 ++ include/linux/vfio.h | 28 + 4 files changed, 185 insertions(+) create mode 100644 drivers/vfio/vfio_pasid.c diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index fd17db9..3d8a108 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -19,6 +19,11 @@ config VFIO_VIRQFD depends on VFIO && EVENTFD default n +config VFIO_PASID + tristate + depends on IOASID && VFIO + default n + menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index de67c47..bb836a3 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o obj-$(CONFIG_VFIO) += vfio.o obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c new file mode 100644 index 000..dd5b6d1 --- /dev/null +++ b/drivers/vfio/vfio_pasid.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2020 Intel Corporation. 
+ * Author: Liu Yi L + * + */ + +#include +#include +#include +#include +#include +#include + +#define DRIVER_VERSION "0.1" +#define DRIVER_AUTHOR "Liu Yi L " +#define DRIVER_DESC "PASID management for VFIO bus drivers" + +#define VFIO_DEFAULT_PASID_QUOTA 1000 +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA; +module_param_named(pasid_quota, pasid_quota, uint, 0444); +MODULE_PARM_DESC(pasid_quota, +" Set the quota for max number of PASIDs that an application is allowed to request (default 1000)"); + +struct vfio_mm_token { + unsigned long long val; +}; + +struct vfio_mm { + struct kref kref; + struct vfio_mm_tokentoken; + int ioasid_sid; + int pasid_quota; + struct list_headnext; +}; + +static struct vfio_pasid { + struct mutexvfio_mm_lock; + struct list_headvfio_mm_list; +} vfio_pasid; + +/* called with vfio.vfio_mm_lock held */ +static void vfio_mm_release(struct kref *kref) +{ + struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref); + + list_del(>next); + mutex_unlock(_pasid.vfio_mm_lock); + ioasid_free_set(vmm->ioasid_sid, true); + kfree(vmm); +} + +void vfio_mm_put(struct vfio_mm *vmm) +{ + kref_put_mutex(>kref, vfio_mm_release, _pasid.vfio_mm_lock); +} + +static void vfio_mm_get(struct vfio_mm *vmm) +{ + kref_get(>kref); +} + +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) +{ + struct mm_struct *mm = get_task_mm(task); + struct vfio_mm *vmm; + unsigned long long val = (unsigned long long) mm; + int ret; + + mutex_lock(_pasid.vfio_mm_lock); + /* Search existing vfio_mm with current mm pointer */ + list_for_each_entry(vmm, _pasid.vfio_mm_list, next) { + if (vmm->token.val == val) { + vfio_mm_get(vmm); + goto
[PATCH v2 02/15] iommu: Report domain nesting info
IOMMUs that support nesting translation need to report the capability info to userspace, e.g. the format of the first level/stage paging structures. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- @Jean, Eric: nesting was introduced for ARM, but it looks like there is no actual user of it, right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING to retrieve nesting info? What are your opinions? include/linux/iommu.h | 1 + include/uapi/linux/iommu.h | 34 ++ 2 files changed, 35 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 78a26ae..f6e4b49 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -126,6 +126,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING,/* two stages of translation */ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, + DOMAIN_ATTR_NESTING_INFO, DOMAIN_ATTR_MAX, }; diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 303f148..02eac73 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { }; }; +struct iommu_nesting_info { + __u32 size; + __u32 format; + __u32 features; +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2) + __u32 flags; + __u8data[]; +}; + +/* + * @flags: VT-d specific flags. Currently reserved for future + * extension. + * @addr_width:The output addr width of first level/stage translation + * @pasid_bits:Maximum supported PASID bits, 0 represents no PASID + * support. + * @cap_reg: Describe basic capabilities as defined in VT-d capability + * register. + * @cap_mask: Mark valid capability bits in @cap_reg. + * @ecap_reg: Describe the extended capabilities as defined in VT-d + * extended capability register. + * @ecap_mask: Mark the valid capability bits in @ecap_reg.
+ */ +struct iommu_nesting_info_vtd { + __u32 flags; + __u16 addr_width; + __u16 pasid_bits; + __u64 cap_reg; + __u64 cap_mask; + __u64 ecap_reg; + __u64 ecap_mask; +}; + #endif /* _UAPI_IOMMU_H */ -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
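Since iommu_nesting_info carries the vendor part in a trailing flexible array, a consumer has to validate size and format before interpreting data[]. The sketch below mirrors the two uAPI layouts from this patch in plain stdint types and shows that validation; FORMAT_VTD is a made-up discriminator, since this patch does not define format values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace copies of the uAPI layouts from this patch. */
struct iommu_nesting_info {
	uint32_t size;      /* total size incl. vendor data[] */
	uint32_t format;
	uint32_t features;
#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0)
#define IOMMU_NESTING_FEAT_BIND_PGTBL    (1 << 1)
#define IOMMU_NESTING_FEAT_CACHE_INVLD   (1 << 2)
	uint32_t flags;
	uint8_t  data[];    /* vendor specific part */
};

struct iommu_nesting_info_vtd {
	uint32_t flags;
	uint16_t addr_width;
	uint16_t pasid_bits;
	uint64_t cap_reg;
	uint64_t cap_mask;
	uint64_t ecap_reg;
	uint64_t ecap_mask;
};

/* How a VT-d aware consumer might pull the vendor part out of data[];
 * validate the advertised size before trusting the trailing bytes. */
#define FORMAT_VTD 1   /* illustrative value, not from the patch */
static const struct iommu_nesting_info_vtd *
vtd_info(const struct iommu_nesting_info *info)
{
	if (info->format != FORMAT_VTD ||
	    info->size < sizeof(*info) + sizeof(struct iommu_nesting_info_vtd))
		return NULL;
	return (const void *)info->data;
}

/* Test helper: build a well-formed buffer the way a driver would. */
static struct iommu_nesting_info *make_vtd_info(uint16_t pasid_bits)
{
	size_t sz = sizeof(struct iommu_nesting_info) +
		    sizeof(struct iommu_nesting_info_vtd);
	struct iommu_nesting_info *info = calloc(1, sz);
	struct iommu_nesting_info_vtd *vtd = (void *)info->data;

	info->size = (uint32_t)sz;
	info->format = FORMAT_VTD;
	info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
			 IOMMU_NESTING_FEAT_CACHE_INVLD;
	vtd->pasid_bits = pasid_bits;
	return info;
}
```

Note that data[] starts at offset 16 (four __u32 fields), so the 8-byte-aligned VT-d struct can be overlaid without repacking.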
[PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest
This patch exposes the PCIe PASID capability to the guest for assigned devices. The existing vfio_pci driver hides it from the guest by setting the capability length to 0 in pci_ext_cap_length[]. This patch only exposes the PASID capability for devices which have the PCIe PASID extended structure in their configuration space. VFs will therefore not see the PASID capability, as a VF does not implement the PASID extended structure in its configuration space. Exposing it for VFs is a TODO for the future. Related discussion can be found at the link below: https://lkml.org/lkml/2020/4/7/693 Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L --- v1 -> v2: *) added in v2, but it was sent in a separate patchseries before drivers/vfio/pci/vfio_pci_config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index 90c0b80..4b9af99 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = { [PCI_EXT_CAP_ID_LTR]= PCI_EXT_CAP_LTR_SIZEOF, [PCI_EXT_CAP_ID_SECPCI] = 0, /* not yet */ [PCI_EXT_CAP_ID_PMUX] = 0, /* not yet */ - [PCI_EXT_CAP_ID_PASID] = 0, /* not yet */ + [PCI_EXT_CAP_ID_PASID] = PCI_EXT_CAP_PASID_SIZEOF, }; /* -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 14/15] vfio: Document dual stage control
From: Eric Auger The VFIO API was enhanced to support nested stage control: a bunch of new ioctls and a usage guideline. Let's document the process to follow to set up nested mode. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Eric Auger Signed-off-by: Liu Yi L --- v1 -> v2: *) new in v2; compared with Eric's original version, pasid table bind and fault reporting are removed as this series doesn't cover them. Original version from Eric: https://lkml.org/lkml/2020/3/20/700 Documentation/driver-api/vfio.rst | 64 +++ 1 file changed, 64 insertions(+) diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst index f1a4d3c..06224bd 100644 --- a/Documentation/driver-api/vfio.rst +++ b/Documentation/driver-api/vfio.rst @@ -239,6 +239,70 @@ group and can access them as follows:: /* Gratuitous device reset and go... */ ioctl(device, VFIO_DEVICE_RESET); +IOMMU Dual Stage Control + + +Some IOMMUs support 2 stages/levels of translation. Stage corresponds to +the ARM terminology while level corresponds to Intel's VT-d terminology. +In the following text we use either without distinction. + +This is useful when a virtual IOMMU is exposed to the guest and some +devices are assigned to the guest through VFIO. Then the guest OS can use +stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for +VM isolation (GPA -> HPA). + +Under dual stage translation, the guest gets ownership of the stage 1 page +tables and also owns stage 1 configuration structures. The hypervisor owns +the root configuration structure (for security reasons), including stage 2 +configuration. This works as long as configuration structures and page table +formats are compatible between the virtual IOMMU and the physical IOMMU.
+ +Assuming the HW supports it, this nested mode is selected by choosing the +VFIO_TYPE1_NESTING_IOMMU type through: + +ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU); + +This forces the hypervisor to use stage 2, leaving stage 1 available +for guest usage. The guest stage 1 format depends on the IOMMU vendor, and +so does the nesting configuration method. User space should +check the format and configuration method after setting the nesting type by +using: + +ioctl(container->fd, VFIO_IOMMU_GET_INFO, _info); + +Details can be found in Documentation/userspace-api/iommu.rst. For Intel +VT-d, each stage 1 page table is bound to the host by: + +nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL; +memcpy(_op->data, _data, sizeof(bind_data)); +ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op); + +As mentioned above, the guest OS may use stage 1 for GIOVA->GPA or GVA->GPA. +GVA->GPA page tables are available when PASID (Process Address Space ID) +is exposed to the guest, e.g. a guest with PASID-capable devices assigned. For +such page table binding, the bind_data should include PASID info, which +is allocated by the guest itself or by the host. This depends on the hardware vendor; +e.g. Intel VT-d requires the PASID to be allocated from the host. This requirement is +discoverable via VFIO_IOMMU_GET_INFO. User space could allocate a PASID from +the host by: + +req.flags = VFIO_IOMMU_ALLOC_PASID; +ioctl(container, VFIO_IOMMU_PASID_REQUEST, ); + +With the first stage/level page table bound to the host, the +guest stage 1 translation can be combined with the hypervisor stage 2 translation to +obtain the final address. + +When the guest invalidates stage 1 related caches, invalidations must be +forwarded to the host through + +nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD; +memcpy(_op->data, _data, sizeof(inv_data)); +ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op); + +Those invalidations can happen at various granularity levels, page, context, +...
+ VFIO User API --- -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
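The walkthrough above repeatedly uses the same idiom: set nesting_op->flags to the op code and memcpy the op-specific payload into nesting_op->data before issuing VFIO_IOMMU_NESTING_OP. Here is a compilable sketch of just that packing step, with a simplified stand-in for struct vfio_iommu_type1_nesting_op; the real layout comes from <linux/vfio.h>, the 64-byte data area is a guess for this sketch, and the payload is a placeholder rather than a real iommu_gpasid_bind_data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Op codes as defined in this series. */
#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL   (0)
#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL (1)
#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD  (2)

/* Simplified stand-in for struct vfio_iommu_type1_nesting_op. */
struct nesting_op {
	uint32_t argsz;
	uint32_t flags;    /* low bits select the op */
	uint8_t  data[64]; /* op specific payload; size is a guess here */
};

/* Packs op code + payload the way the documented
 * "nesting_op->flags = ...; memcpy(nesting_op->data, ...)" idiom does. */
static void fill_nesting_op(struct nesting_op *op, uint32_t code,
			    const void *payload, size_t len)
{
	memset(op, 0, sizeof(*op));
	op->argsz = sizeof(*op);  /* argsz lets the kernel version-check */
	op->flags = code;
	memcpy(op->data, payload, len);
}
```

In real code the struct would then be handed to ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op).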
[PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
The existing iommu_gpasid_bind_data is used for binding guest page tables to a specified PASID. To unwind that, an unbind_data structure is needed. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- include/uapi/linux/iommu.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 02eac73..46a7c57 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -332,6 +332,19 @@ struct iommu_gpasid_bind_data { }; }; +/** + * struct iommu_gpasid_unbind_data - Information about device and guest PASID + * unbinding + * @argsz: User filled size of this data + * @flags: Additional information on guest unbind request + * @pasid: Process address space ID used for the guest mm in host IOMMU + */ +struct iommu_gpasid_unbind_data { + __u32 argsz; + __u64 flags; + __u64 pasid; +}; + struct iommu_nesting_info { __u32 size; __u32 format; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
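For reference, here is how a userspace caller might populate the new structure before issuing an unbind through the VFIO nesting op; the struct is copied from the patch into plain stdint types, and since no flags bits are defined yet, flags stays zero.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the layout introduced by this patch. */
struct iommu_gpasid_unbind_data {
	uint32_t argsz;  /* user filled size of this data */
	uint64_t flags;  /* no bits defined yet, must be 0 */
	uint64_t pasid;  /* PASID used for the guest mm */
};

/* Populate the structure for an unbind request; argsz carries
 * sizeof() so the kernel can do forward-compatible size checks. */
static struct iommu_gpasid_unbind_data make_unbind(uint64_t pasid)
{
	struct iommu_gpasid_unbind_data d = {
		.argsz = sizeof(d),
		.flags = 0,
		.pasid = pasid,
	};
	return d;
}
```

The argsz-first convention mirrors the rest of the VFIO/IOMMU uAPI, which is what allows the structure to grow later without breaking old userspace.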
[PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID
Nesting translation allows two levels/stages of page tables, with the 1st level for guest translations (e.g. GVA->GPA) and the 2nd level for host translations (e.g. GPA->HPA). This patch adds an interface for binding guest page tables to a PASID. This PASID must have been allocated to user space before the binding request. CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- v1 -> v2: *) rename subject from "vfio/type1: Bind guest page tables to host" *) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/unbind guest page tables *) replaced vfio_iommu_for_each_dev() with a group level loop since this series enforces one group per container w/ nesting type as a start *) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn() *) vfio_dev_unbind_gpasid() always successful *) use vfio_mm->pasid_lock to avoid race between PASID free and page table bind/unbind Cc: Kevin Tian drivers/vfio/vfio_iommu_type1.c | 191 drivers/vfio/vfio_pasid.c | 30 +++ include/linux/vfio.h| 20 + include/uapi/linux/vfio.h | 30 +++ 4 files changed, 271 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index a7b3a83..f1468a0 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -130,6 +130,33 @@ struct vfio_regions { #define IS_DOMAIN_IN_CONTAINER(iommu) ((iommu->external_domain) || \ (!list_empty(>domain_list))) +struct domain_capsule { + struct vfio_group *group; + struct iommu_domain *domain; + void *data; +}; + +/* iommu->lock must be held */ +static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu) +{ + struct vfio_domain *d; + struct vfio_group *g, *group = NULL; + + if (!iommu->nesting_info) + return NULL; + + /* only support singleton container with nesting type */ + list_for_each_entry(d, >domain_list, next) { + list_for_each_entry(g, 
>group_list, next) { + if (!group) { + group = g; + break; + } + } + } + return group; +} + static int put_pfn(unsigned long pfn, int prot); /* @@ -2014,6 +2041,39 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu, return ret; } +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_bind_data *bind_data = + (struct iommu_gpasid_bind_data *) dc->data; + + return iommu_sva_bind_gpasid(dc->domain, dev, bind_data); +} + +static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_unbind_data *unbind_data = + (struct iommu_gpasid_unbind_data *) dc->data; + + iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data); + return 0; +} + +static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *) data; + struct iommu_gpasid_unbind_data unbind_data; + + unbind_data.argsz = sizeof(unbind_data); + unbind_data.flags = 0; + unbind_data.pasid = pasid; + + dc->data = _data; + iommu_group_for_each_dev(dc->group->iommu_group, +dc, vfio_dev_unbind_gpasid_fn); +} + static void vfio_iommu_type1_detach_group(void *iommu_data, struct iommu_group *iommu_group) { @@ -2055,6 +2115,21 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, if (!group) continue; + if (iommu->nesting_info && iommu->vmm && + (iommu->nesting_info->features & + IOMMU_NESTING_FEAT_BIND_PGTBL)) { + struct domain_capsule dc = { .group = group, +.domain = domain->domain, +.data = NULL }; + + /* +* Unbind page tables bound with system wide PASIDs +* which are allocated to user space. +*/ + vfio_mm_for_each_pasid(iommu->vmm, , + vfio_group_unbind_gpasid_fn); + } + vfio_iommu_detach_group(domain, group); list_del(>next); kfree(group); @@ -2453,6 +2528,120 @@ static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
[PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
In recent years, mediated device pass-through frameworks (e.g. vfio-mdev) have been used to achieve flexible device sharing across domains (e.g. VMs). There are also hardware-assisted mediated pass-through solutions from platform vendors, e.g. Intel VT-d scalable mode, which supports the Intel Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed mdevs, as there is IOMMU-enforced DMA isolation for them. In the kernel, IOMMU-backed mdevs are exposed to the IOMMU layer via the aux-domain concept, which means mdevs are protected by an iommu domain which is auxiliary to the domain that the kernel driver primarily uses for the DMA API. Details can be found in the KVM presentation below: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\ Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf This patch extends the NESTING_IOMMU ops to IOMMU-backed mdev devices. The main requirement is to use the auxiliary domain associated with the mdev. Cc: Kevin Tian CC: Jacob Pan CC: Jun Tian Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Liu Yi L --- v1 -> v2: *) check the iommu_device to ensure the handling mdev is IOMMU-backed drivers/vfio/vfio_iommu_type1.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index c233c8e..bcd7935 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2041,13 +2041,27 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu, return ret; } +static struct device *vfio_get_iommu_device(struct vfio_group *group, + struct device *dev) +{ + if (group->mdev_group) + return vfio_mdev_get_iommu_device(dev); + else + return dev; +} + static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) { struct domain_capsule *dc = (struct domain_capsule *)data; struct iommu_gpasid_bind_data *bind_data = (struct iommu_gpasid_bind_data *) dc->data; + struct device *iommu_device; + + iommu_device = vfio_get_iommu_device(dc->group, dev); + if (!iommu_device) + return -EINVAL; - return iommu_sva_bind_gpasid(dc->domain, dev, bind_data); + return iommu_sva_bind_gpasid(dc->domain, iommu_device, bind_data); } static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) @@ -2055,8 +2069,13 @@ static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) struct domain_capsule *dc = (struct domain_capsule *)data; struct iommu_gpasid_unbind_data *unbind_data = (struct iommu_gpasid_unbind_data *) dc->data; + struct device *iommu_device; + + iommu_device = vfio_get_iommu_device(dc->group, dev); + if (!iommu_device) + return 0; - iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data); + iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data); return 0; } @@ -2587,8 +2606,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data) struct domain_capsule *dc = (struct domain_capsule *)data; struct iommu_cache_invalidate_info *cache_info = (struct iommu_cache_invalidate_info *) dc->data; + struct device *iommu_device; + + iommu_device = vfio_get_iommu_device(dc->group, dev); + if (!iommu_device) + return -EINVAL; - iommu_cache_invalidate(dc->domain, dev, cache_info); + iommu_cache_invalidate(dc->domain, iommu_device, cache_info); return 0; } -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
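The key indirection in this patch is vfio_get_iommu_device(): for an mdev group, the IOMMU layer must be handed the backing physical device, and a NULL result means the mdev is not IOMMU-backed. The dispatch can be sketched as follows; the types here are illustrative stand-ins, not the kernel's, and the parent_iommu_dev field is a hypothetical placeholder for whatever the mdev framework returns.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; not the kernel's types. */
struct device { struct device *parent_iommu_dev; };
struct vfio_group { int mdev_group; };

/* For IOMMU-backed mdevs the IOMMU layer must see the parent physical
 * device, not the mdev itself.  Mirrors the patch's
 * vfio_get_iommu_device(); NULL means the mdev is not IOMMU-backed. */
static struct device *get_iommu_device(struct vfio_group *group,
                                       struct device *dev)
{
    if (group->mdev_group)
        return dev->parent_iommu_dev; /* may be NULL: not IOMMU-backed */
    return dev;
}
```

Callers then either fail the operation (bind, cache invalidate) or silently skip it (unbind) when NULL comes back, matching the patch's error handling.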
[PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid()
From: Yi Sun The current interface is good enough for SVA virtualization on an assigned physical PCI device, but when it comes to mediated devices, a physical device may be attached to multiple aux-domains. So this interface needs to pass in domain info; then the iommu driver is able to know which domain will be used for the 2nd stage translation of the nesting mode. This patch passes @domain for the above reason. Since this interface is supposed to serve unbind requests from user space, it should pass in unbind_data as well. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Yi Sun Signed-off-by: Liu Yi L --- drivers/iommu/intel-svm.c | 14 -- drivers/iommu/iommu.c | 4 ++-- include/linux/intel-iommu.h | 3 ++- include/linux/iommu.h | 8 +--- 4 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 6272da6..bf55e2f 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -445,16 +445,26 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, return ret; } -int intel_svm_unbind_gpasid(struct device *dev, int pasid) +int intel_svm_unbind_gpasid(struct iommu_domain *domain, + struct device *dev, + struct iommu_gpasid_unbind_data *data) { struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); struct intel_svm_dev *sdev; struct intel_svm *svm; int ret = -EINVAL; + unsigned long minsz; + int pasid; + + if (WARN_ON(!iommu) || !data) + return -EINVAL; - if (WARN_ON(!iommu)) + minsz = offsetofend(struct iommu_gpasid_unbind_data, pasid); + if (data->argsz < minsz || data->flags) return -EINVAL; + pasid = data->pasid; + mutex_lock(&pasid_mutex); svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL); if (!svm) { diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 374b34f..57aac03 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1955,12 +1955,12 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid); int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev, -ioasid_t pasid) + struct iommu_gpasid_unbind_data *data) { if (unlikely(!domain->ops->sva_unbind_gpasid)) return -ENODEV; - return domain->ops->sva_unbind_gpasid(dev, pasid); + return domain->ops->sva_unbind_gpasid(domain, dev, data); } EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 29d1c6f..0b238c3 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -737,7 +737,8 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu); int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, struct iommu_gpasid_bind_data *data); -int intel_svm_unbind_gpasid(struct device *dev, int pasid); +int intel_svm_unbind_gpasid(struct iommu_domain *domain, struct device *dev, + struct iommu_gpasid_unbind_data *data); struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata); void intel_svm_unbind(struct iommu_sva *handle); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 57c46ae..a19f063 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -325,7 +325,8 @@ struct iommu_ops { int (*sva_bind_gpasid)(struct iommu_domain *domain, struct device *dev, struct iommu_gpasid_bind_data *data); - int (*sva_unbind_gpasid)(struct device *dev, int pasid); + int (*sva_unbind_gpasid)(struct iommu_domain *domain, + struct device *dev, struct iommu_gpasid_unbind_data *data); int (*def_domain_type)(struct device *dev); @@ -459,7 +460,7 @@ extern int iommu_cache_invalidate(struct iommu_domain *domain, extern int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev, struct iommu_gpasid_bind_data *data); extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain, - struct device *dev, ioasid_t pasid); + struct device *dev, struct iommu_gpasid_unbind_data *data); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, @@ -1084,7 +1085,8 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain, } static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain, -
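The validation added to intel_svm_unbind_gpasid() follows the standard UAPI argsz pattern: compute minsz as offsetofend() of the last field the kernel consumes, then reject requests whose argsz is smaller or whose flags carry undefined bits. A self-contained sketch of that check is below; the struct layout is illustrative, not the real iommu_gpasid_unbind_data definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* offsetofend(): offset of the first byte past a member, as in the kernel. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Illustrative layout of an extensible UAPI struct (not the real one). */
struct gpasid_unbind_data {
    uint32_t argsz;      /* user-reported size of the whole struct */
    uint32_t flags;      /* must be 0 until a flag is defined */
    uint32_t pasid;
    uint32_t padding[4]; /* reserved for future fields */
};

/* Mirrors the checks the patch adds before touching user data. */
static int check_unbind_data(const struct gpasid_unbind_data *data)
{
    size_t minsz = offsetofend(struct gpasid_unbind_data, pasid);

    if (!data)
        return -1; /* -EINVAL in the kernel */
    if (data->argsz < minsz || data->flags)
        return -1;
    return 0;
}
```

Because minsz stops at pasid, an older userspace built against a smaller header still passes, while garbage argsz values and unknown flags are rejected up front.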
[PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl()
This patch refactors vfio_iommu_type1_ioctl() to use a switch instead of if-else, and gives each cmd its own helper function. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Suggested-by: Christoph Hellwig Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 183 +++- 1 file changed, 105 insertions(+), 78 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index cc1d647..402aad3 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2106,6 +2106,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu) return ret; } +static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, + unsigned long arg) +{ + switch (arg) { + case VFIO_TYPE1_IOMMU: + case VFIO_TYPE1v2_IOMMU: + case VFIO_TYPE1_NESTING_IOMMU: + return 1; + case VFIO_DMA_CC_IOMMU: + if (!iommu) + return 0; + return vfio_domains_have_iommu_cache(iommu); + default: + return 0; + } +} + static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps, struct vfio_iommu_type1_info_cap_iova_range *cap_iovas, size_t size) @@ -2173,110 +2190,120 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu, return ret; } -static long vfio_iommu_type1_ioctl(void *iommu_data, - unsigned int cmd, unsigned long arg) +static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu, +unsigned long arg) { - struct vfio_iommu *iommu = iommu_data; + struct vfio_iommu_type1_info info; unsigned long minsz; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + unsigned long capsz; + int ret; - if (cmd == VFIO_CHECK_EXTENSION) { - switch (arg) { - case VFIO_TYPE1_IOMMU: - case VFIO_TYPE1v2_IOMMU: - case VFIO_TYPE1_NESTING_IOMMU: - return 1; - case VFIO_DMA_CC_IOMMU: - if (!iommu) - return 0; - return vfio_domains_have_iommu_cache(iommu); - default: - return 0; - } - } else if (cmd == VFIO_IOMMU_GET_INFO) { - struct vfio_iommu_type1_info info; - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; - unsigned long capsz; - int ret; - - minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes); + minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes); - /* For backward compatibility, cannot require this */ - capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset); + /* For backward compatibility, cannot require this */ + capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset); - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; - if (info.argsz < minsz) - return -EINVAL; + if (info.argsz < minsz) + return -EINVAL; - if (info.argsz >= capsz) { - minsz = capsz; - info.cap_offset = 0; /* output, no-recopy necessary */ - } + if (info.argsz >= capsz) { + minsz = capsz; + info.cap_offset = 0; /* output, no-recopy necessary */ + } - info.flags = VFIO_IOMMU_INFO_PGSIZES; + info.flags = VFIO_IOMMU_INFO_PGSIZES; - info.iova_pgsizes = vfio_pgsize_bitmap(iommu); + info.iova_pgsizes = vfio_pgsize_bitmap(iommu); - ret = vfio_iommu_iova_build_caps(iommu, &caps); - if (ret) - return ret; + ret = vfio_iommu_iova_build_caps(iommu, &caps); + if (ret) + return ret; - if (caps.size) { - info.flags |= VFIO_IOMMU_INFO_CAPS; + if (caps.size) { + info.flags |= VFIO_IOMMU_INFO_CAPS; - if (info.argsz < sizeof(info) + caps.size) { - info.argsz = sizeof(info) + caps.size; - } else { - vfio_info_cap_shift(&caps, sizeof(info)); - if (copy_to_user((void __user *)arg + - sizeof(info), caps.buf, - caps.size)) { - kfree(caps.buf); -
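The shape of the refactor above — one small helper per command, with the top-level ioctl reduced to a flat switch — can be shown in a runnable sketch. The command numbers and struct here are hypothetical placeholders; the real values come from the VFIO UAPI headers.

```c
#include <assert.h>

/* Hypothetical command numbers for illustration only. */
enum { CMD_CHECK_EXTENSION = 1, CMD_GET_INFO = 2 };

struct my_iommu { int have_cache; };

/* One small helper per command, as in the refactored driver... */
static long do_check_extension(struct my_iommu *iommu, unsigned long arg)
{
    switch (arg) {
    case 100: /* e.g. VFIO_TYPE1_IOMMU: always supported */
        return 1;
    case 101: /* e.g. VFIO_DMA_CC_IOMMU: needs iommu state */
        return iommu ? iommu->have_cache : 0;
    default:
        return 0;
    }
}

static long do_get_info(struct my_iommu *iommu, unsigned long arg)
{
    (void)arg;
    return iommu ? 0 : -1;
}

/* ...and the top-level ioctl becomes a flat dispatch. */
long my_ioctl(struct my_iommu *iommu, unsigned int cmd, unsigned long arg)
{
    switch (cmd) {
    case CMD_CHECK_EXTENSION:
        return do_check_extension(iommu, arg);
    case CMD_GET_INFO:
        return do_get_info(iommu, arg);
    default:
        return -25; /* -ENOTTY: unknown ioctl */
    }
}
```

The payoff of this structure, as the later patches in the series show, is that new commands (e.g. NESTING_OP) are added as one new case plus one new helper instead of another branch in a long if-else chain.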
[PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on Intel platforms, allows address space sharing between device DMA and applications. SVA can reduce programming complexity and enhance security. This VFIO series is intended to expose SVA usage to VMs, i.e. sharing a guest application address space with passthru devices. This is called vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU changes. The IOMMU and QEMU changes are in separate series (listed in the "Related series"). The high-level architecture for SVA virtualization is as below; the key design of vSVA support is to utilize the dual-stage IOMMU translation (also known as IOMMU nesting translation) capability in the host IOMMU.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'

Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables Patch Overview: 1. a refactor to vfio_iommu_type1 ioctl (patch 0001) 2. reports IOMMU nesting info to userspace (patch 0002, 0003 and 0015) 3. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0006) 4. vfio support for binding guest page table to host (patch 0007, 0008, 0009, 0010) 5. vfio support for IOMMU cache invalidation from VMs (patch 0011) 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012) 7. expose PASID capability to VM (patch 0013) 8. add doc for VFIO dual stage control (patch 0014) The complete vSVA kernel upstream patches are divided into three phases: 1. Common APIs and PCI device direct assignment 2. IOMMU-backed Mediated Device assignment 3. Page Request Services (PRS) support This patchset is aiming for phase 1 and phase 2, and is based on Jacob's below series.
[PATCH v13 0/8] Nested Shared Virtual Address (SVA) VT-d support - merged https://lkml.org/lkml/2020/5/13/1582 [PATCH v2 0/3] IOMMU user API enhancement - wip https://lkml.org/lkml/2020/6/11/5 [PATCH 00/10] IOASID extensions for guest SVA - wip https://lkml.org/lkml/2020/3/25/874 The latest IOASID code added the below new interface for iterating over all PASIDs of an ioasid_set. The implementation has not been sent out yet as Jacob needs to do some cleanup; it can be found in branch vsva-linux-5.7-rc4-v2. int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data); The complete set for the current vSVA can be found in the below branch. This branch also includes some extra modifications to the IOASID core code and vt-d iommu driver cleanup patches. https://github.com/luxis1999/linux-vsva.git:vsva-linux-5.7-rc4-v2 The corresponding QEMU patch series is included in the below branch: https://github.com/luxis1999/qemu.git:vsva_5.7_rc4_qemu_rfcv6 Regards, Yi Liu Changelog: - Patch v1 -> Patch v2: a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph Hellwig. b) Re-sequence the patch series for better bisect support. c) Report IOMMU nesting cap info in detail instead of the format in v1. d) Enforce one group per nesting type container for the vfio iommu type1 driver. e) Build the vfio_mm related code from vfio.c into a separate vfio_pasid.ko. f) Add PASID ownership check in the IOMMU driver. g) Adopted to latest IOMMU UAPI design. Removed IOMMU UAPI version check. Added iommu_gpasid_unbind_data for unbind requests from userspace. h) Define a single ioctl:VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl and cache_invld. i) Document dual stage control in vfio.rst.
Patch v1: https://lore.kernel.org/linux-iommu/1584880325-10561-1-git-send-email-yi.l@intel.com/ - RFC v3 -> Patch v1: a) Address comments to the PASID request (alloc/free) path b) Report PASID alloc/free availability to user-space c) Add a vfio_iommu_type1 parameter to support pasid quota tuning d) Adjusted to the latest ioasid code implementation, e.g. removed the code for tracking the allocated PASIDs as the latest ioasid code will track it; VFIO could use ioasid_free_set() to free all
[PATCH 0/2] iommu/renesas: Add support for r8a77961
This patch series is based on next-20200611. Yoshihiro Shimoda (2): dt-bindings: iommu: renesas,ipmmu-vmsa: add r8a77961 support iommu/renesas: Add support for r8a77961 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 + drivers/iommu/ipmmu-vmsa.c | 6 +- 2 files changed, 6 insertions(+), 1 deletion(-) -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] dt-bindings: iommu: renesas, ipmmu-vmsa: add r8a77961 support
Add support for r8a77961 (R-Car M3-W+). Signed-off-by: Yoshihiro Shimoda --- Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml index 39675cf..e9d28a4 100644 --- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml @@ -35,6 +35,7 @@ properties: - renesas,ipmmu-r8a774c0 # RZ/G2E - renesas,ipmmu-r8a7795 # R-Car H3 - renesas,ipmmu-r8a7796 # R-Car M3-W + - renesas,ipmmu-r8a77961 # R-Car M3-W+ - renesas,ipmmu-r8a77965 # R-Car M3-N - renesas,ipmmu-r8a77970 # R-Car V3M - renesas,ipmmu-r8a77980 # R-Car V3H -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] iommu/renesas: Add support for r8a77961
Add support for r8a77961 (R-Car M3-W+). Signed-off-by: Yoshihiro Shimoda --- drivers/iommu/ipmmu-vmsa.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 4c2972f..b57b1f2 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -3,7 +3,7 @@ * IOMMU API for Renesas VMSA-compatible IPMMU * Author: Laurent Pinchart * - * Copyright (C) 2014 Renesas Electronics Corporation + * Copyright (C) 2014-2020 Renesas Electronics Corporation */ #include @@ -753,6 +753,7 @@ static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = { { .soc_id = "r8a774b1", }, { .soc_id = "r8a774c0", }, { .soc_id = "r8a7795", .revision = "ES3.*" }, + { .soc_id = "r8a77961", }, { .soc_id = "r8a77965", }, { .soc_id = "r8a77990", }, { .soc_id = "r8a77995", }, @@ -970,6 +971,9 @@ static const struct of_device_id ipmmu_of_ids[] = { .compatible = "renesas,ipmmu-r8a7796", .data = &ipmmu_features_rcar_gen3, }, { + .compatible = "renesas,ipmmu-r8a77961", + .data = &ipmmu_features_rcar_gen3, + }, { .compatible = "renesas,ipmmu-r8a77965", .data = &ipmmu_features_rcar_gen3, }, { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/3] docs: IOMMU user API
On Wed, 10 Jun 2020 21:12:13 -0700 Jacob Pan wrote: > IOMMU UAPI is newly introduced to support communications between guest > virtual IOMMU and host IOMMU. There has been lots of discussions on how > it should work with VFIO UAPI and userspace in general. > > This document is indended to clarify the UAPI design and usage. The > mechenics of how future extensions should be achieved are also covered mechanics > in this documentation. > > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan Mostly seems sensible. A few comments / queries inline. Jonathan > --- > Documentation/userspace-api/iommu.rst | 210 > ++ > 1 file changed, 210 insertions(+) > create mode 100644 Documentation/userspace-api/iommu.rst > > diff --git a/Documentation/userspace-api/iommu.rst > b/Documentation/userspace-api/iommu.rst > new file mode 100644 > index ..e95dc5a04a41 > --- /dev/null > +++ b/Documentation/userspace-api/iommu.rst > @@ -0,0 +1,210 @@ > +.. SPDX-License-Identifier: GPL-2.0 > +.. iommu: > + > += > +IOMMU Userspace API > += > + > +IOMMU UAPI is used for virtualization cases where communications are > +needed between physical and virtual IOMMU drivers. For native > +usage, IOMMU is a system device which does not need to communicate > +with user space directly. > + > +The primary use cases are guest Shared Virtual Address (SVA) and > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is wherein _a_ virtual IOMMU > +required to communicate with the physical IOMMU in the host. > + > +.. contents:: :local: > + > +Functionalities > + > +Communications of user and kernel involve both directions. The > +supported user-kernel APIs are as follows: > + > +1. Alloc/Free PASID > +2. Bind/unbind guest PASID (e.g. Intel VT-d) > +3. Bind/unbind guest PASID table (e.g. ARM sMMU) > +4. Invalidate IOMMU caches > +5. Service page request > + > +Requirements > + > +The IOMMU UAPIs are generic and extensible to meet the following > +requirements: > + > +1. 
Emulated and para-virtualised vIOMMUs > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) > +3. Extensions to the UAPI shall not break existing user space > + > +Interfaces > + > +Although the data structures defined in IOMMU UAPI are self-contained, > +there is no user API functions introduced. Instead, IOMMU UAPI is > +designed to work with existing user driver frameworks such as VFIO. > + > +Extension Rules & Precautions > +- > +When IOMMU UAPI gets extended, the data structures can *only* be > +modified in two ways: > + > +1. Adding new fields by re-purposing the padding[] field. No size change. > +2. Adding new union members at the end. May increase in size. > + > +No new fields can be added *after* the variable size union in that it > +will break backward compatibility when offset moves. In both cases, a > +new flag must be accompanied with a new field such that the IOMMU > +driver can process the data based on the new flag. Version field is > +only reserved for the unlikely event of UAPI upgrade at its entirety. > + > +It's *always* the caller's responsibility to indicate the size of the > +structure passed by setting argsz appropriately. > + > +When IOMMU UAPI extension results in size increase, user such as VFIO > +has to handle the following scenarios: > + > +1. User and kernel has exact size match > +2. An older user with older kernel header (smaller UAPI size) running on a > + newer kernel (larger UAPI size) > +3. A newer user with newer kernel header (larger UAPI size) running > + on a older kernel. > +4. A malicious/misbehaving user pass illegal/invalid size but within > + range. The data may contain garbage. > + > + > +Feature Checking > + > +While launching a guest with vIOMMU, it is important to ensure that host > +can support the UAPI data structures to be used for vIOMMU-pIOMMU > +communications. Without the upfront compatibility checking, future > +faults are difficult to report even in normal conditions. 
For example, > +TLB invalidations should always succeed from vIOMMU's > +perspective. This statement has me concerned. If a TLB invalidation fails, but is reported to the guest as successful do we have possible breaking of iommu isolation guarantees? If you get a TLB invalidation not happening, for some reason, that's a critical fault, isolate the device using the IOMMU or kill the VM. I'd reword it as "TLB invalidations should always succeed." As you mention, we should never get to this state anyway! > There is no architectural way to report back to the vIOMMU > +if the UAPI data is incompatible. For this reason the following IOMMU > +UAPIs cannot fail: > + > +1. Free PASID > +2. Unbind guest PASID > +3. Unbind
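The extension rules quoted in this review (repurpose padding for new fields, append new union members, never add fields after the variable-size union) can be made concrete with a toy struct layout. This is purely illustrative — the field names are invented, not any real IOMMU UAPI structure — but the offset invariants it asserts are exactly what the rules guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy v1 layout following the documented rules (illustrative only). */
struct toy_uapi_v1 {
    uint32_t argsz;
    uint32_t flags;      /* a new flag must gate every new field */
    uint32_t padding[2]; /* rule 1: future fields carve from here */
    union {              /* rule 2: new members append here; the union
                          * must stay last so earlier offsets never move */
        uint32_t vendor_a[2];
    };
};

/* Toy v2: one field repurposed from padding (no size change for the
 * fixed part), one union member appended (total size may grow). */
struct toy_uapi_v2 {
    uint32_t argsz;
    uint32_t flags;
    uint32_t new_field;  /* repurposed from padding[0] */
    uint32_t padding[1];
    union {
        uint32_t vendor_a[2];
        uint64_t vendor_b[2]; /* appended member */
    };
};
```

Because every pre-union offset is identical in v1 and v2, an old userspace binary keeps working against the new kernel; only argsz and a new flag tell the kernel which fields are actually populated.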
Re: [PATCH v2 1/3] docs: IOMMU user API
Hi Jacob, On 2020/6/11 12:12, Jacob Pan wrote: IOMMU UAPI is newly introduced to support communications between guest virtual IOMMU and host IOMMU. There has been lots of discussions on how it should work with VFIO UAPI and userspace in general. This document is indended to clarify the UAPI design and usage. The mechenics of how future extensions should be achieved are also covered in this documentation. Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- Documentation/userspace-api/iommu.rst | 210 ++ 1 file changed, 210 insertions(+) create mode 100644 Documentation/userspace-api/iommu.rst diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst new file mode 100644 index ..e95dc5a04a41 --- /dev/null +++ b/Documentation/userspace-api/iommu.rst @@ -0,0 +1,210 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. iommu: + += +IOMMU Userspace API += + +IOMMU UAPI is used for virtualization cases where communications are +needed between physical and virtual IOMMU drivers. For native +usage, IOMMU is a system device which does not need to communicate +with user space directly. + +The primary use cases are guest Shared Virtual Address (SVA) and +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is +required to communicate with the physical IOMMU in the host. + +.. contents:: :local: + +Functionalities + +Communications of user and kernel involve both directions. The +supported user-kernel APIs are as follows: + +1. Alloc/Free PASID +2. Bind/unbind guest PASID (e.g. Intel VT-d) +3. Bind/unbind guest PASID table (e.g. ARM sMMU) +4. Invalidate IOMMU caches +5. Service page request + +Requirements + +The IOMMU UAPIs are generic and extensible to meet the following +requirements: + +1. Emulated and para-virtualised vIOMMUs +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.) +3. 
Extensions to the UAPI shall not break existing user space + +Interfaces + +Although the data structures defined in IOMMU UAPI are self-contained, +there is no user API functions introduced. Instead, IOMMU UAPI is +designed to work with existing user driver frameworks such as VFIO. + +Extension Rules & Precautions +- +When IOMMU UAPI gets extended, the data structures can *only* be +modified in two ways: + +1. Adding new fields by re-purposing the padding[] field. No size change. +2. Adding new union members at the end. May increase in size. + +No new fields can be added *after* the variable size union in that it +will break backward compatibility when offset moves. In both cases, a +new flag must be accompanied with a new field such that the IOMMU +driver can process the data based on the new flag. Version field is +only reserved for the unlikely event of UAPI upgrade at its entirety. + +It's *always* the caller's responsibility to indicate the size of the +structure passed by setting argsz appropriately. + +When IOMMU UAPI extension results in size increase, user such as VFIO +has to handle the following scenarios: + +1. User and kernel has exact size match +2. An older user with older kernel header (smaller UAPI size) running on a + newer kernel (larger UAPI size) +3. A newer user with newer kernel header (larger UAPI size) running + on a older kernel. +4. A malicious/misbehaving user pass illegal/invalid size but within + range. The data may contain garbage. + + +Feature Checking + +While launching a guest with vIOMMU, it is important to ensure that host +can support the UAPI data structures to be used for vIOMMU-pIOMMU +communications. Without the upfront compatibility checking, future +faults are difficult to report even in normal conditions. For example, +TLB invalidations should always succeed from vIOMMU's +perspective. There is no architectural way to report back to the vIOMMU +if the UAPI data is incompatible. 
For this reason the following IOMMU +UAPIs cannot fail: + +1. Free PASID +2. Unbind guest PASID +3. Unbind guest PASID table (SMMU) +4. Cache invalidate +5. Page response + +User applications such as QEMU is expected to import kernel UAPI +headers. Only backward compatibility is supported. For example, an +older QEMU (with older kernel header) can run on newer kernel. Newer +QEMU (with new kernel header) may fail on older kernel. + +IOMMU vendor driver should report the below features to IOMMU UAPI +consumers (e.g. via VFIO). + +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID +2. IOMMU_NESTING_FEAT_BIND_PGTBL +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE +4. IOMMU_NESTING_FEAT_CACHE_INVLD +5. IOMMU_NESTING_FEAT_PAGE_REQUEST + +Take VFIO as example, upon request from VFIO user space (e.g. QEMU), +VFIO kernel code shall query IOMMU vendor driver for the support of +the above