Re: [PATCH v2] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.
Hello Konrad, On Tue, Jun 23, 2020 at 09:38:43AM -0400, Konrad Rzeszutek Wilk wrote: > On Mon, Apr 27, 2020 at 06:53:18PM +, Ashish Kalra wrote: > > Hello Konrad, > > > > On Mon, Mar 30, 2020 at 10:25:51PM +, Ashish Kalra wrote: > > > Hello Konrad, > > > > > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote: > > > > On Tue, Feb 04, 2020 at 07:35:00PM +, Ashish Kalra wrote: > > > > > Hello Konrad, > > > > > > > > > > Looking fwd. to your feedback regarding support of other memory > > > > > encryption architectures such as Power, S390, etc. > > > > > > > > > > Thanks, > > > > > Ashish > > > > > > > > > > On Fri, Jan 24, 2020 at 11:00:08PM +, Ashish Kalra wrote: > > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk > > > > > > wrote: > > > > > > > > > > > > > > > > Additional memory calculations based on # of PCI devices and > > > > > > > > their memory ranges will make it more complicated with so > > > > > > > > many other permutations and combinations to explore, it is > > > > > > > > essential to keep this patch as simple as possible by > > > > > > > > adjusting the bounce buffer size simply by determining it > > > > > > > > from the amount of provisioned guest memory. > > > > > > >> > > > > > > >> Please rework the patch to: > > > > > > >> > > > > > > >> - Use a log solution instead of the multiplication. > > > > > > >>Feel free to cap it at a sensible value. > > > > > > > > > > > > Ok. > > > > > > > > > > > > >> > > > > > > >> - Also the code depends on SWIOTLB calling in to the > > > > > > >>adjust_swiotlb_default_size which looks wrong. > > > > > > >> > > > > > > >>You should not adjust io_tlb_nslabs from > > > > > > >> swiotlb_size_or_default. > > > > > > > > > > > > >>That function's purpose is to report a value. > > > > > > >> > > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code. 
> > > > > > >> > > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect > > > > > > >> which would > > > > > > >>modify the io_tlb_nslabs (and set swiotbl=1?). > > > > > > > > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are > > > > > > x86-specific and this swiotlb buffer size adjustment is also needed > > > > > > for other memory encryption architectures like Power, S390, etc. > > > > > > > > Oh dear. That I hadn't considered. > > > > > > > > > > > > >> > > > > > > >>Actually you seem to be piggybacking on > > > > > > >> pci_swiotlb_detect_4gb - so > > > > > > >>perhaps add in this code ? Albeit it really should be in it's > > > > > > >> own > > > > > > >>file, not in arch/x86/kernel/pci-swiotlb.c > > > > > > > > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets > > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of > > > > > > sme_early_init() > > > > > > forces swiotlb on, but again this is all x86 architecture specific. > > > > > > > > Then it looks like the best bet is to do it from within swiotlb_init? > > > > We really can't do it from swiotlb_size_or_default - that function > > > > should just return a value and nothing else. > > > > > > > > > > Actually, we need to do it in swiotlb_size_or_default() as this gets > > > called by > > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to > > > reserve low crashkernel memory. If we adjust swiotlb size later in > > > swiotlb_init() which gets called later than reserve_crashkernel_low(), > > > then any swiotlb size changes/expansion will conflict/overlap with the > > > low memory reserved for crashkernel. > > > > > and will also potentially cause SWIOTLB buffer allocation failures. > > > > Do you have any feedback, comments on the above ? 
> The init boot chain looks like this:
>
>  initmem_init
>  pci_iommu_alloc
>      -> pci_swiotlb_detect_4gb
>      -> swiotlb_init
>
>  reserve_crashkernel
>      reserve_crashkernel_low
>          -> swiotlb_size_or_default
>  ..
>
> (rootfs code):
>  pci_iommu_init
>      -> a bunch of the other IOMMU late_init code gets called..
>      -> pci_swiotlb_late_init
>
> I have to say I am lost to how your patch fixes "If we adjust swiolb
> size later .. then any swiotlb size .. will overlap with the low memory
> reserved for crashkernel"?

Actually, as per the boot flow, setup_arch() calls reserve_crashkernel(), and pci_iommu_alloc() is invoked through mm_init()/mem_init() and not via initmem_init():

 start_kernel:
     ...
     setup_arch()
         reserve_crashkernel
             reserve_crashkernel_low
                 -> swiotlb_size_or_default
     ...
     mm_init()
         mem_init()
             pci_iommu_alloc
                 -> pci_swiotlb_detect_4gb
                 -> swiotlb_init

So as per the above boot flow, reserve_crashkernel() can get called before swiotlb_detect/init, and hence, if we don't
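For reference, the "log solution" Konrad asks for earlier in the thread can be sketched as follows. The 64 MB default matches the SWIOTLB default, but the scaling rule and the 512 MB cap are illustrative assumptions, not values from the actual patch:

```c
#include <assert.h>

#define SWIOTLB_DEFAULT_MB	64	/* kernel default bounce buffer size */
#define SWIOTLB_CAP_MB		512	/* illustrative cap, not from the patch */

/* Double the bounce buffer for every power of two of guest memory
 * above 1 GB, capped at a sensible maximum -- a sketch of sizing the
 * buffer logarithmically from provisioned guest memory rather than
 * by multiplication. */
static unsigned long swiotlb_adjusted_mb(unsigned long guest_mem_mb)
{
	unsigned long size_mb = SWIOTLB_DEFAULT_MB;

	while (guest_mem_mb > 1024 && size_mb < SWIOTLB_CAP_MB) {
		size_mb <<= 1;
		guest_mem_mb >>= 1;
	}
	return size_mb;
}
```

Because the result only doubles per power of two of guest memory, the reserved bounce buffer grows slowly and stays bounded, which is the point of the log approach over a linear multiplier.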
Re: [PATCH 5/7] iommu/vt-d: Fix devTLB flush for vSVA
On Tue, 23 Jun 2020 08:43:14 -0700 Jacob Pan wrote: > From: Liu Yi L > > For guest SVA usage, in order to optimize for less VMEXIT, guest > request of IOTLB flush also includes device TLB. > > On the host side, IOMMU driver performs IOTLB and implicit devTLB > invalidation. When PASID-selective granularity is requested by the > guest we need to derive the equivalent address range for devTLB > instead of using the address information in the UAPI data. The reason > for that is, unlike IOTLB flush, devTLB flush does not support > PASID-selective granularity. This is to say, we need to set the > following in the PASID based devTLB invalidation descriptor: > - entire 64 bit range in address ~(0x1 << 63) > - S bit = 1 (VT-d CH 6.5.2.6). > > Without this fix, device TLB flush range is not set properly for PASID > selective granularity. This patch also merged devTLB flush code for > both implicit and explicit cases. > > Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function") > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan > --- > drivers/iommu/intel/iommu.c | 25 + > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 96340da57075..5ea5732d5ec4 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -5408,7 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain > *domain, struct device *dev, sid = PCI_DEVID(bus, devfn); > > /* Size is only valid in address selective invalidation */ > - if (inv_info->granularity != IOMMU_INV_GRANU_PASID) > + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) > size = to_vtd_size(inv_info->addr_info.granule_size, > inv_info->addr_info.nb_granules); > > @@ -5417,6 +5417,7 @@ intel_iommu_sva_invalidate(struct iommu_domain > *domain, struct device *dev, IOMMU_CACHE_INV_TYPE_NR) { > int granu = 0; > u64 pasid = 0; > + u64 addr = 0; > > granu = to_vtd_granularity(cache_type, > inv_info->granularity); if (granu == -EINVAL) 
{ > @@ -5456,19 +5457,27 @@ intel_iommu_sva_invalidate(struct > iommu_domain *domain, struct device *dev, (granu == > QI_GRAN_NONG_PASID) ? -1 : 1 << size, inv_info->addr_info.flags & > IOMMU_INV_ADDR_FLAGS_LEAF); > + if (!info->ats_enabled) > + break; > /* >* Always flush device IOTLB if ATS is > enabled. vIOMMU >* in the guest may assume IOTLB flush is > inclusive, >* which is more efficient. >*/ > - if (info->ats_enabled) > - qi_flush_dev_iotlb_pasid(iommu, sid, > - info->pfsid, pasid, > - info->ats_qdep, > - > inv_info->addr_info.addr, > - size); > - break; > + fallthrough; > case IOMMU_CACHE_INV_TYPE_DEV_IOTLB: > + /* > + * There is no PASID selective flush for > device TLB, so > + * the equivalent of that is we set the size > to be the > + * entire range of 64 bit. User only > provides PASID info > + * without address info. So we set addr to 0. > + */ > + if (inv_info->granularity == > IOMMU_INV_GRANU_PASID) { > + size = 64 - VTD_PAGE_SHIFT; > + addr = 0; > + } else if (inv_info->granularity == > IOMMU_INV_GRANU_ADDR) > + addr = inv_info->addr_info.addr; > + > if (info->ats_enabled) > qi_flush_dev_iotlb_pasid(iommu, sid, > info->pfsid, pasid, addr should be used here. will fix in the next version. Baolu has pointed out this before but missed it here. Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
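The range derivation this fix implements — the devTLB has no PASID-selective granularity, so a PASID-selective request must be widened to the whole 64-bit space — can be sketched in isolation. The helper and enum names below are illustrative, not the driver's:

```c
#include <assert.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT 12

enum granu { GRANU_PASID, GRANU_ADDR };

/* Mirror of the fix above: for PASID-selective requests the devTLB
 * descriptor must cover the entire 64-bit range (addr = 0, size order
 * 64 - VTD_PAGE_SHIFT pages, with the S bit set in the descriptor);
 * address-selective requests pass the user-supplied address and size
 * order through unchanged. */
static void devtlb_flush_range(enum granu g, uint64_t req_addr,
			       uint64_t req_size_order,
			       uint64_t *addr, uint64_t *size_order)
{
	if (g == GRANU_PASID) {
		*size_order = 64 - VTD_PAGE_SHIFT; /* entire address space */
		*addr = 0;
	} else {
		*size_order = req_size_order;
		*addr = req_addr;
	}
}
```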
Re: [PATCH 5/7] iommu/vt-d: Fix devTLB flush for vSVA
Hi Jacob,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on iommu/next]
[also build test WARNING on linux/master linus/master v5.8-rc2 next-20200623]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-vt-d-Misc-tweaks-and-fixes-for-vSVA/20200623-233905
base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   drivers/iommu/intel/iommu.c: In function 'intel_iommu_sva_invalidate':
>> drivers/iommu/intel/iommu.c:5420:7: warning: variable 'addr' set but not used [-Wunused-but-set-variable]
    5420 |    u64 addr = 0;
         |        ^~~~

vim +/addr +5420 drivers/iommu/intel/iommu.c

  5370
  5371  #ifdef CONFIG_INTEL_IOMMU_SVM
  5372  static int
  5373  intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
  5374                             struct iommu_cache_invalidate_info *inv_info)
  5375  {
  5376          struct dmar_domain *dmar_domain = to_dmar_domain(domain);
  5377          struct device_domain_info *info;
  5378          struct intel_iommu *iommu;
  5379          unsigned long flags;
  5380          int cache_type;
  5381          u8 bus, devfn;
  5382          u16 did, sid;
  5383          int ret = 0;
  5384          u64 size = 0;
  5385
  5386          if (!inv_info || !dmar_domain ||
  5387              inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
  5388                  return -EINVAL;
  5389
  5390          if (!dev || !dev_is_pci(dev))
  5391                  return -ENODEV;
  5392
  5393          iommu = device_to_iommu(dev, &bus, &devfn);
  5394          if (!iommu)
  5395                  return -ENODEV;
  5396
  5397          if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
  5398                  return -EINVAL;
  5399
  5400          spin_lock_irqsave(&device_domain_lock, flags);
  5401          spin_lock(&iommu->lock);
  5402          info = get_domain_info(dev);
  5403          if (!info) {
  5404                  ret = -EINVAL;
  5405                  goto out_unlock;
  5406          }
  5407          did = dmar_domain->iommu_did[iommu->seq_id];
  5408          sid = PCI_DEVID(bus, devfn);
  5409
  5410          /* Size is only valid in address selective invalidation */
  5411          if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
  5412                  size = to_vtd_size(inv_info->addr_info.granule_size,
  5413                                     inv_info->addr_info.nb_granules);
  5414
  5415          for_each_set_bit(cache_type,
  5416                           (unsigned long *)&inv_info->cache,
  5417                           IOMMU_CACHE_INV_TYPE_NR) {
  5418                  int granu = 0;
  5419                  u64 pasid = 0;
> 5420                  u64 addr = 0;
  5421
  5422                  granu = to_vtd_granularity(cache_type, inv_info->granularity);
  5423                  if (granu == -EINVAL) {
  5424                          pr_err_ratelimited("Invalid cache type and granu combination %d/%d\n",
  5425                                             cache_type, inv_info->granularity);
  5426                          break;
  5427                  }
  5428
  5429                  /*
  5430                   * PASID is stored in different locations based on the
  5431                   * granularity.
  5432                   */
  5433                  if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
  5434                      (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
  5435                          pasid = inv_info->pasid_info.pasid;
  5436                  else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
  5437                           (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
  5438                          pasid = inv_info->addr_info.pasid;
  5439
  5440                  switch (BIT(cache_type)) {
  5441                  case IOMMU_CACHE_INV_TYPE_IOTLB:
  5442                          if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
  5443                              size &&
  5444                              (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
  5445                                  pr_err_ratelimited("Address out of range, 0x%llx, size order %llu\n",
  5446
[PATCH v3 2/5] iommu/uapi: Add argsz for user filled data
As IOMMU UAPI gets extended, user data size may increase. To support backward compatibility, this patch introduces a size field to each UAPI data structure. It is *always* the responsibility of the user to fill in the correct size. Specific scenarios for user data handling are documented in: Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L
Signed-off-by: Jacob Pan
---
 include/uapi/linux/iommu.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..303f148a5cd7 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
+	__u32	argsz;
 #define IOMMU_PAGE_RESP_VERSION_1	1
 	__u32	version;
 #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  *     information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  *     must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+	__u32	argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 	__u32	version;
 /* IOMMU paging structure cache */
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
+ * @argsz: User filled size of this data
  * @version:	Version of this data structure
  * @format:	PASID table entry format
  * @flags:	Additional information on guest bind request
@@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
  *     PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+	__u32	argsz;
 #define IOMMU_GPASID_BIND_VERSION_1	1
 	__u32	version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD	1
-- 
2.7.4
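On the user side, the new argsz field is simply set to the size of the structure the caller was compiled against. A hypothetical caller might look like this; the layout is abbreviated to the generic prefix for illustration, and a real program includes the UAPI header instead:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Abbreviated stand-in for the UAPI struct: only the generic prefix
 * is shown here. A real caller uses struct iommu_cache_invalidate_info
 * from <linux/iommu.h>; the field layout below is illustrative. */
struct inv_info_prefix {
	uint32_t argsz;
	uint32_t version;
	uint8_t  cache;
	uint8_t  granularity;
	uint8_t  padding[2];
};

/* The rule the patch establishes: the caller always reports the size
 * it was compiled with, and the kernel clamps or rejects from there. */
static void inv_info_init(struct inv_info_prefix *inv)
{
	memset(inv, 0, sizeof(*inv));
	inv->argsz = sizeof(*inv);
	inv->version = 1;
}
```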
[PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID
Guest SVA unbind data can come from either kernel or user space. If a user pointer is passed in, the IOMMU driver must copy the data from user space; if the unbind data is assembled in the kernel, it can be trusted and used directly. This patch creates a wrapper for unbind gpasid so that a user pointer can be parsed and sanitized before calling into the kernel unbind function. The common user data copy code is also consolidated.

Signed-off-by: Liu Yi L
Signed-off-by: Jacob Pan
---
 drivers/iommu/iommu.c | 70 ++-
 include/linux/iommu.h | 13 --
 2 files changed, 64 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4a025c429b41..595527e4c6b7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2010,19 +2010,15 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
-			  void __user *udata)
-{
-	struct iommu_gpasid_bind_data data;
+static int iommu_sva_prepare_bind_data(void __user *udata, bool bind,
+				       struct iommu_gpasid_bind_data *data)
+{
 	unsigned long minsz, maxsz;
 
-	if (unlikely(!domain->ops->sva_bind_gpasid))
-		return -ENODEV;
-
 	/* Current kernel data size is the max to be copied from user */
 	maxsz = sizeof(struct iommu_gpasid_bind_data);
-	memset((void *)&data, 0, maxsz);
+	memset((void *)data, 0, maxsz);
 
 	/*
 	 * No new spaces can be added before the variable sized union, the
@@ -2031,11 +2027,11 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
 
 	/* Copy minsz from user to get flags and argsz */
-	if (copy_from_user(&data, udata, minsz))
+	if (copy_from_user(data, udata, minsz))
 		return -EFAULT;
 
 	/* Fields before variable size union is mandatory */
-	if (data.argsz < minsz)
+	if (data->argsz < minsz)
 		return -EINVAL;
 	/*
 	 * User might be using a newer UAPI header, we shall let IOMMU vendor
@@ -2043,26 +2039,66 @@ iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	 * can be vendor specific, larger argsz could be the result of extension
 	 * for one vendor but it should not affect another vendor.
 	 */
-	if (data.argsz > maxsz)
-		data.argsz = maxsz;
+	if (data->argsz > maxsz)
+		data->argsz = maxsz;
+
+	/*
+	 * For unbind, we don't need any extra data, host PASID is included in
+	 * the minsz and that is all we need.
+	 */
+	if (!bind)
+		return 0;
 
 	/* Copy the remaining user data _after_ minsz */
-	if (copy_from_user((void *)&data + minsz, udata + minsz,
-			   data.argsz - minsz))
+	if (copy_from_user((void *)data + minsz, udata + minsz,
+			   data->argsz - minsz))
 		return -EFAULT;
+
+	return 0;
+}
+
+int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
+			  void __user *udata)
+{
+
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	ret = iommu_sva_prepare_bind_data(udata, true, &data);
+	if (ret)
+		return ret;
 
 	return domain->ops->sva_bind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-			    ioasid_t pasid)
+int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_gpasid_bind_data *data)
 {
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			    void __user *udata)
+{
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	ret = iommu_sva_prepare_bind_data(udata, false, &data);
+	if (ret)
+		return ret;
+
+	return __iommu_sva_unbind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a688fea42ae5..2567c33dc4e8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -437,7 +437,9 @@ extern int iommu_cache_invalidate(struct
[PATCH v3 3/5] iommu/uapi: Use named union for user data
IOMMU UAPI data size is filled by the user space which must be validated by the kernel. To ensure backward compatibility, user data can only be extended by either re-purposing padding bytes or extending the variable sized union at the end. No size change is allowed before the union. Therefore, the minimum size is the offset of the union. To use offsetof() on the union, we must make it named.

Link: https://lkml.org/lkml/2020/6/11/834
Signed-off-by: Jacob Pan
---
 drivers/iommu/intel/iommu.c | 24
 drivers/iommu/intel/svm.c | 2 +-
 include/uapi/linux/iommu.h | 4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50fc62413a35..59cba214ef13 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	/* Size is only valid in address selective invalidation */
 	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-		size = to_vtd_size(inv_info->addr_info.granule_size,
-				   inv_info->addr_info.nb_granules);
+		size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+				   inv_info->granu.addr_info.nb_granules);
 
 	for_each_set_bit(cache_type,
 			 (unsigned long *)&inv_info->cache,
@@ -5431,20 +5431,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	 * granularity.
 	 */
 	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-	    (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-		pasid = inv_info->pasid_info.pasid;
+	    (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
+		pasid = inv_info->granu.pasid_info.pasid;
 	else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-		 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
-		pasid = inv_info->addr_info.pasid;
+		 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
+		pasid = inv_info->granu.addr_info.pasid;
 
 	switch (BIT(cache_type)) {
 	case IOMMU_CACHE_INV_TYPE_IOTLB:
 		/* HW will ignore LSB bits based on address mask */
 		if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
 		    size &&
-		    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+		    (inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
 			WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
-				  inv_info->addr_info.addr, size);
+				  inv_info->granu.addr_info.addr, size);
 		}
 
 		/*
@@ -5452,9 +5452,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 		 * We use npages = -1 to indicate that.
 		 */
 		qi_flush_piotlb(iommu, did, pasid,
-				mm_to_dma_pfn(inv_info->addr_info.addr),
+				mm_to_dma_pfn(inv_info->granu.addr_info.addr),
 				(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
-				inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+				inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
 		if (!info->ats_enabled)
 			break;
@@ -5475,13 +5475,13 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			size = 64 - VTD_PAGE_SHIFT;
 			addr = 0;
 		} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-			addr = inv_info->addr_info.addr;
+			addr = inv_info->granu.addr_info.addr;
 
 		if (info->ats_enabled)
 			qi_flush_dev_iotlb_pasid(iommu, sid,
 						 info->pfsid, pasid,
 						 info->ats_qdep,
-						 inv_info->addr_info.addr,
+						 inv_info->granu.addr_info.addr,
 						 size);
 		else
 			pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index
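The offsetof() point in the commit message above is easy to demonstrate: naming the union gives the fixed prefix a member name whose offset is the minimum size. The layout below is a simplified stand-in, not the real UAPI struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With the union named 'granu', minsz can be computed as
 * offsetof(struct inv_info, granu); an anonymous union has no member
 * name to hand to offsetof(). Field sizes here are illustrative. */
struct inv_info {
	uint32_t argsz;
	uint32_t version;
	uint8_t  cache;
	uint8_t  granularity;
	uint8_t  padding[6];
	union {
		uint64_t pasid_info;
		uint64_t addr_info;
	} granu;
};
```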
[PATCH v3 1/5] docs: IOMMU user API
IOMMU UAPI is newly introduced to support communications between guest virtual IOMMU and host IOMMU. There have been lots of discussions on how it should work with VFIO UAPI and userspace in general. This document is intended to clarify the UAPI design and usage. The mechanics of how future extensions should be achieved are also covered in this documentation.

Signed-off-by: Liu Yi L
Signed-off-by: Jacob Pan
---
 Documentation/userspace-api/iommu.rst | 244 ++
 1 file changed, 244 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..f9e4ed90a413
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,244 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+===================
+IOMMU Userspace API
+===================
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+===============
+Communication between user and kernel goes in both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page requests
+
+Requirements
+============
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+==========
+Although the data structures defined in IOMMU UAPI are self-contained,
+no user API functions are introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-----------------------------
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable sized union, as that
+would break backward compatibility when the offset moves. In both
+cases, a new flag must accompany the new field such that the IOMMU
+driver can process the data based on the new flag. The version field
+is reserved only for the unlikely event of a wholesale UAPI upgrade.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+Though at the same time, argsz is user provided data which is not
+trusted. The argsz field allows the user to indicate how much data
+they're providing; it's still the kernel's responsibility to validate
+whether it's correct and sufficient for the requested operation.
+
+Compatibility Checking
+----------------------
+When IOMMU UAPI extension results in size increase, user such as VFIO
+has to handle the following cases:
+
+1. User and kernel have an exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on an older kernel.
+4. A malicious/misbehaving user passes an illegal/invalid size that is
+   nevertheless within range. The data may contain garbage.
+
+Feature Checking
+----------------
+While launching a guest with vIOMMU, it is important to ensure that the
+host can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without upfront compatibility checking, future faults
+are difficult to report even in normal conditions. For example, TLB
+invalidations should always succeed. There is no architectural way to
+report back to the vIOMMU if the UAPI data is incompatible. If that
+happens, in order to protect the IOMMU isolation guarantee, we have to
+resort to not giving completion status in vIOMMU. This may result in
+a VM hang.
+
+For this reason the following IOMMU UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+
+User applications such as QEMU are expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with older kernel header) can run on a newer
+kernel. A newer QEMU (with new kernel header) may refuse to initialize
+on an older kernel if new feature flags are not supported by the older
+kernel. Simply recompiling existing code with a newer kernel header
+should not be an issue in that
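The four compatibility cases listed above reduce to one clamp-and-check step in the kernel. A minimal sketch of that rule follows; the function name is illustrative and the return codes are simplified:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* minsz is the offset of the variable sized union (the mandatory
 * prefix), maxsz the kernel's current struct size. Cases 1 and 2
 * pass through, case 3 (newer user on older kernel) is clamped, and
 * a size below the mandatory prefix (part of case 4) is rejected. */
static long clamp_argsz(uint32_t user_argsz, size_t minsz, size_t maxsz)
{
	if (user_argsz < minsz)
		return -1;		/* prefix is mandatory */
	if (user_argsz > maxsz)
		return (long)maxsz;	/* newer user, clamp to what we know */
	return (long)user_argsz;	/* exact match or older user */
}
```

Garbage payloads within a valid size (the rest of case 4) are then left to per-flag validation in the vendor driver.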
[PATCH v3 0/5] IOMMU user API enhancement
IOMMU user API header was introduced to support nested DMA translation and related fault handling. The current UAPI data structures consist of three areas that cover the interactions between host kernel and guest:

- fault handling
- cache invalidation
- bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features. In the previous discussion, using a user-filled data size and feature flags was made the preferred approach over a unified version number. https://lkml.org/lkml/2020/1/29/45

In addition to introducing an argsz field to the data structures, this patchset also tries to document the UAPI design, usage, and extension rules. VT-d driver changes to utilize the new argsz field are included; VFIO usage is to follow.

Thanks, Jacob

Changelog:
v3:
- Rewrote backward compatibility rule to support existing code re-compiled with a newer kernel UAPI header that runs on an older kernel. Based on review comment from Alex W. https://lore.kernel.org/linux-iommu/20200611094741.6d118...@w520.home/
- Take user pointer directly in UAPI functions. Perform argsz check and copy_from_user() in the IOMMU driver. Eliminate the need for VFIO or other upper layers to parse IOMMU data.
- Create wrapper function for in-kernel users of UAPI functions

v2:
- Removed unified API version and helper
- Introduced argsz for each UAPI data
- Introduced UAPI doc

Jacob Pan (5):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Handle data and argsz filled by users
  iommu/uapi: Support both kernel and user unbind guest PASID

 Documentation/userspace-api/iommu.rst | 244 ++
 drivers/iommu/intel/iommu.c | 24 ++--
 drivers/iommu/intel/svm.c | 5 +-
 drivers/iommu/iommu.c | 138 +--
 include/linux/iommu.h | 20 ++-
 include/uapi/linux/iommu.h | 10 +-
 6 files changed, 413 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4
[PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users
IOMMU UAPI data has a user filled argsz field which indicates the data length comes with the API call. User data is not trusted, argsz must be validated based on the current kernel data size, mandatory data size, and feature flags. User data may also be extended, results in possible argsz increase. Backward compatibility is ensured based on size and flags checking. Details are documented in Documentation/userspace-api/iommu.rst This patch adds sanity checks in both IOMMU layer and vendor code, where VT-d is the only user for now. Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel/svm.c | 3 ++ drivers/iommu/iommu.c | 96 --- include/linux/iommu.h | 7 ++-- 3 files changed, 98 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 713b3a218483..237db56878c0 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, data->format != IOMMU_PASID_FORMAT_INTEL_VTD) return -EINVAL; + if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vendor.vtd)) + return -EINVAL; + if (!dev_is_pci(dev)) return -ENOTSUPP; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index d43120eb1dc5..4a025c429b41 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1951,22 +1951,108 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev) EXPORT_SYMBOL_GPL(iommu_attach_device); int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev, - struct iommu_cache_invalidate_info *inv_info) + void __user *uinfo) { + struct iommu_cache_invalidate_info inv_info; + unsigned long minsz, maxsz; + if (unlikely(!domain->ops->cache_invalidate)) return -ENODEV; - return domain->ops->cache_invalidate(domain, dev, inv_info); + /* Current kernel data size is the max to be copied from user */ + maxsz = sizeof(struct iommu_cache_invalidate_info); + memset((void *)_info, 0, maxsz); 
+ + /* +* No new spaces can be added before the variable sized union, the +* minimum size is the offset to the union. +*/ + minsz = offsetof(struct iommu_cache_invalidate_info, granu); + + /* Copy minsz from user to get flags and argsz */ + if (copy_from_user(_info, uinfo, minsz)) + return -EFAULT; + + /* Fields before variable size union is mandatory */ + if (inv_info.argsz < minsz) + return -EINVAL; + /* +* User might be using a newer UAPI header, we shall let IOMMU vendor +* driver decide on what size it needs. Since the UAPI data extension +* can be vendor specific, larger argsz could be the result of extension +* for one vendor but it should not affect another vendor. +*/ + /* +* User might be using a newer UAPI header which has a larger data +* size, we shall support the existing flags within the current +* size. +*/ + if (inv_info.argsz > maxsz) + inv_info.argsz = maxsz; + + /* Checking the exact argsz based on generic flags */ + if (inv_info.granularity == IOMMU_INV_GRANU_ADDR && + inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info, + granu.addr_info)) + return -EINVAL; + + if (inv_info.granularity == IOMMU_INV_GRANU_PASID && + inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info, + granu.pasid_info)) + return -EINVAL; + + /* Copy the remaining user data _after_ minsz */ + if (copy_from_user((void *)_info + minsz, uinfo + minsz, + inv_info.argsz - minsz)) + return -EFAULT; + + return domain->ops->cache_invalidate(domain, dev, _info); } EXPORT_SYMBOL_GPL(iommu_cache_invalidate); -int iommu_sva_bind_gpasid(struct iommu_domain *domain, - struct device *dev, struct iommu_gpasid_bind_data *data) +int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev, + void __user *udata) { + + struct iommu_gpasid_bind_data data; + unsigned long minsz, maxsz; + if (unlikely(!domain->ops->sva_bind_gpasid)) return -ENODEV; - return domain->ops->sva_bind_gpasid(domain, dev, data); + /* Current kernel data size is the max to be copied 
from user */ + maxsz = sizeof(struct iommu_gpasid_bind_data); + memset((void *)&data, 0, maxsz); + + /* +* No new spaces can be added before the variable sized union, the +* minimum
Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
On 2020-06-23 10:21, John Garry wrote: > On 23/06/2020 02:07, kernel test robot wrote: > > + Rikard, as the GENMASK compile-time sanity checks were added recently > >> Hi John, >> >> I love your patch! Perhaps something to improve: >> >> [auto build test WARNING on iommu/next] >> [If your patch is applied to the wrong git tree, kindly drop us a note. >> And when submitting patch, we suggest to use as documented in >> https://git-scm.com/docs/git-format-patch] >> >> url: >> https://github.com/0day-ci/linux/commits/John-Garry/iommu-arm-smmu-v3-Improve-cmdq-lock-efficiency/20200623-013438 >> >> >> base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git >> next >> config: arm64-randconfig-c024-20200622 (attached as .config) >> compiler: aarch64-linux-gcc (GCC) 9.3.0 >> >> If you fix the issue, kindly add following tag as appropriate >> Reported-by: kernel test robot >> >> All warnings (new ones prefixed by >>, old ones prefixed by <<): >> >> In file included from include/linux/bits.h:23, >> from include/linux/ioport.h:15, >> from include/linux/acpi.h:12, >> from drivers/iommu/arm-smmu-v3.c:12: >> drivers/iommu/arm-smmu-v3.c: In function 'arm_smmu_cmdq_issue_cmdlist': >> include/linux/bits.h:26:28: warning: comparison of unsigned expression >> < 0 is always false [-Wtype-limits] >> 26 | __builtin_constant_p((l) > (h)), (l) > (h), 0))) > > I'd say that GENMASK_INPUT_CHECK() should be able to handle a l=0 and > h=unsigned value, so I doubt this warn. > > Using GENMASK((int)cmdq->q.llq.max_n_shift, 0) resolves it, but it looks > like GENMASK_INPUT_CHECK() could be improved. That said, I think this particular case might be even better off dodging GENMASK() entirely, by doing something like this first. Untested... Robin. 
->8- Subject: [PATCH] iommu/arm-smmu-v3: Streamline queue calculations Beyond the initial queue setup based on the log2 values from ID registers, the log2 queue size is only ever used in the form of (1 << max_n_shift) to repeatedly recalculate the number of queue elements. Simply storing it in that form leads to slightly more efficient code, particularly in the low-level queue accessors where it counts most: add/remove: 0/0 grow/shrink: 1/7 up/down: 4/-120 (-116) Function old new delta arm_smmu_init_one_queue 360 364 +4 arm_smmu_priq_thread 512 508 -4 arm_smmu_evtq_thread 300 292 -8 __arm_smmu_cmdq_poll_set_valid_map.isra 296 288 -8 queue_remove_raw 180 164 -16 arm_smmu_gerror_handler 732 716 -16 arm_smmu_device_probe 4312 4284 -28 arm_smmu_cmdq_issue_cmdlist 1892 1852 -40 Total: Before=20135, After=20019, chg -0.58% Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-v3.c | 46 ++--- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index f578677a5c41..407cb9451a7a 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -185,8 +185,8 @@ #define ARM_SMMU_MEMATTR_DEVICE_nGnRE 0x1 #define ARM_SMMU_MEMATTR_OIWB 0xf -#define Q_IDX(llq, p) ((p) & ((1 << (llq)->max_n_shift) - 1)) -#define Q_WRP(llq, p) ((p) & (1 << (llq)->max_n_shift)) +#define Q_IDX(llq, p) ((p) & ((llq)->max_n - 1)) +#define Q_WRP(llq, p) ((p) & (llq)->max_n) #define Q_OVERFLOW_FLAG (1U << 31) #define Q_OVF(p) ((p) & Q_OVERFLOW_FLAG) #define Q_ENT(q, p) ((q)->base +\ @@ -531,7 +531,7 @@ struct arm_smmu_ll_queue { } atomic; u8 __pad[SMP_CACHE_BYTES]; } cacheline_aligned_in_smp; - u32 max_n_shift; + u32 max_n; }; struct arm_smmu_queue { @@ -771,7 +771,7 @@ static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n) cons = Q_IDX(q, q->cons); if (Q_WRP(q, q->prod) == Q_WRP(q, q->cons)) - space = q->max_n - (prod - cons); else space = cons - prod; @@ -1164,8 +1164,8 
@@ static void __arm_smmu_cmdq_poll_set_valid_map(struct arm_smmu_cmdq *cmdq, { u32 swidx, sbidx, ewidx, ebidx; struct arm_smmu_ll_queue llq = { - .max_n_shift
[kbuild] Re: [PATCH v7 33/36] rapidio: fix common struct sg_table related issues
Hi Marek, url: https://github.com/0day-ci/linux/commits/Marek-Szyprowski/DRM-fix-struct-sg_table-nents-vs-orig_nents-misuse/20200619-184302 base:ce2cc8efd7a40cbd17841add878cb691d0ce0bba config: i386-randconfig-m021-20200623 (attached as .config) compiler: gcc-9 (Debian 9.3.0-13) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot Reported-by: Dan Carpenter smatch warnings: drivers/rapidio/devices/rio_mport_cdev.c:939 rio_dma_transfer() error: uninitialized symbol 'nents'. # https://github.com/0day-ci/linux/commit/c99597eab54307f46248273962da0c23a9a88b76 git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout c99597eab54307f46248273962da0c23a9a88b76 vim +/nents +939 drivers/rapidio/devices/rio_mport_cdev.c e8de370188d098 Alexandre Bounine 2016-03-22 804 static int 4e1016dac1ccce Alexandre Bounine 2016-05-05 805 rio_dma_transfer(struct file *filp, u32 transfer_mode, e8de370188d098 Alexandre Bounine 2016-03-22 806enum rio_transfer_sync sync, enum dma_data_direction dir, e8de370188d098 Alexandre Bounine 2016-03-22 807struct rio_transfer_io *xfer) e8de370188d098 Alexandre Bounine 2016-03-22 808 { e8de370188d098 Alexandre Bounine 2016-03-22 809 struct mport_cdev_priv *priv = filp->private_data; e8de370188d098 Alexandre Bounine 2016-03-22 810 unsigned long nr_pages = 0; e8de370188d098 Alexandre Bounine 2016-03-22 811 struct page **page_list = NULL; e8de370188d098 Alexandre Bounine 2016-03-22 812 struct mport_dma_req *req; e8de370188d098 Alexandre Bounine 2016-03-22 813 struct mport_dev *md = priv->md; e8de370188d098 Alexandre Bounine 2016-03-22 814 struct dma_chan *chan; 67446283d89467 John Hubbard 2020-06-04 815 int ret; e8de370188d098 Alexandre Bounine 2016-03-22 816 int nents; ^ e8de370188d098 Alexandre Bounine 2016-03-22 817 e8de370188d098 Alexandre Bounine 2016-03-22 818 if (xfer->length == 0) e8de370188d098 Alexandre Bounine 2016-03-22 819 return -EINVAL; e8de370188d098 
Alexandre Bounine 2016-03-22 820 req = kzalloc(sizeof(*req), GFP_KERNEL); e8de370188d098 Alexandre Bounine 2016-03-22 821 if (!req) e8de370188d098 Alexandre Bounine 2016-03-22 822 return -ENOMEM; e8de370188d098 Alexandre Bounine 2016-03-22 823 e8de370188d098 Alexandre Bounine 2016-03-22 824 ret = get_dma_channel(priv); e8de370188d098 Alexandre Bounine 2016-03-22 825 if (ret) { e8de370188d098 Alexandre Bounine 2016-03-22 826 kfree(req); e8de370188d098 Alexandre Bounine 2016-03-22 827 return ret; e8de370188d098 Alexandre Bounine 2016-03-22 828 } c5157b76869ba9 Ioan Nicu 2018-04-20 829 chan = priv->dmach; c5157b76869ba9 Ioan Nicu 2018-04-20 830 c5157b76869ba9 Ioan Nicu 2018-04-20 831 kref_init(&req->refcount); c5157b76869ba9 Ioan Nicu 2018-04-20 832 init_completion(&req->req_comp); c5157b76869ba9 Ioan Nicu 2018-04-20 833 req->dir = dir; c5157b76869ba9 Ioan Nicu 2018-04-20 834 req->filp = filp; c5157b76869ba9 Ioan Nicu 2018-04-20 835 req->priv = priv; c5157b76869ba9 Ioan Nicu 2018-04-20 836 req->dmach = chan; c5157b76869ba9 Ioan Nicu 2018-04-20 837 req->sync = sync; e8de370188d098 Alexandre Bounine 2016-03-22 838 e8de370188d098 Alexandre Bounine 2016-03-22 839 /* e8de370188d098 Alexandre Bounine 2016-03-22 840* If parameter loc_addr != NULL, we are transferring data from/to e8de370188d098 Alexandre Bounine 2016-03-22 841* data buffer allocated in user-space: lock in memory user-space e8de370188d098 Alexandre Bounine 2016-03-22 842* buffer pages and build an SG table for DMA transfer request e8de370188d098 Alexandre Bounine 2016-03-22 843* e8de370188d098 Alexandre Bounine 2016-03-22 844* Otherwise (loc_addr == NULL) contiguous kernel-space buffer is e8de370188d098 Alexandre Bounine 2016-03-22 845* used for DMA data transfers: build single entry SG table using e8de370188d098 Alexandre Bounine 2016-03-22 846* offset within the internal buffer specified by handle parameter. 
e8de370188d098 Alexandre Bounine 2016-03-22 847*/ e8de370188d098 Alexandre Bounine 2016-03-22 848 if (xfer->loc_addr) { c4860ad6056483 Tvrtko Ursulin 2017-07-31 849 unsigned int offset; e8de370188d098 Alexandre Bounine 2016-03-22 850 long pinned; e8de370188d098 Alexandre Bounine 2016-03-22 851 c4860ad6056483 Tvrtko Ursulin 2017-07-31 852 offset = lower_32_bits(offset_in_page(xfer->loc_addr)); e8de370188d098 Alexandre Bounine
[PATCH 2/7] iommu/vt-d: Remove global page support in devTLB flush
Support for global pages in dev TLB invalidation was removed from the VT-d spec 3.0. This patch removes the corresponding bits for vSVA; a similar change was already made for native SVA. See the link below. Link: https://lkml.org/lkml/2019/8/26/651 Signed-off-by: Jacob Pan --- drivers/iommu/intel/dmar.c | 4 +--- drivers/iommu/intel/iommu.c | 4 ++-- include/linux/intel-iommu.h | 3 +-- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index cc46dff98fa0..d9f973fa1190 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1437,8 +1437,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, /* PASID-based device IOTLB Invalidate */ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u32 pasid, u16 qdep, u64 addr, - unsigned int size_order, u64 granu) + u32 pasid, u16 qdep, u64 addr, unsigned int size_order) { unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1); struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0}; @@ -1446,7 +1445,6 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid); - desc.qw1 = QI_DEV_EIOTLB_GLOB(granu); /* * If S bit is 0, we only flush a single page. 
If S bit is set, diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9129663a7406..96340da57075 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5466,7 +5466,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, info->pfsid, pasid, info->ats_qdep, inv_info->addr_info.addr, - size, granu); + size); break; case IOMMU_CACHE_INV_TYPE_DEV_IOTLB: if (info->ats_enabled) @@ -5474,7 +5474,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, info->pfsid, pasid, info->ats_qdep, inv_info->addr_info.addr, - size, granu); + size); else pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n"); break; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 729386ca8122..9a6614880773 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -380,7 +380,6 @@ enum { #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK) #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11) -#define QI_DEV_EIOTLB_GLOB(g) ((u64)(g) & 0x1) #define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xfffff) << 32) #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16) #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4) @@ -704,7 +703,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, u32 pasid, u16 qdep, u64 addr, - unsigned int size_order, u64 granu) + unsigned int size_order); void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid); -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/7] iommu/vt-d: Fix devTLB flush for vSVA
From: Liu Yi L For guest SVA usage, in order to optimize for fewer VMEXITs, a guest request for IOTLB flush also includes the device TLB. On the host side, the IOMMU driver performs IOTLB and implicit devTLB invalidation. When PASID-selective granularity is requested by the guest, we need to derive the equivalent address range for devTLB instead of using the address information in the UAPI data. The reason for that is, unlike IOTLB flush, devTLB flush does not support PASID-selective granularity. This is to say, we need to set the following in the PASID-based devTLB invalidation descriptor: - entire 64 bit range in address ~(0x1 << 63) - S bit = 1 (VT-d CH 6.5.2.6). Without this fix, the device TLB flush range is not set properly for PASID-selective granularity. This patch also merges the devTLB flush code for the implicit and explicit cases. Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function") Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel/iommu.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 96340da57075..5ea5732d5ec4 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5408,7 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, sid = PCI_DEVID(bus, devfn); /* Size is only valid in address selective invalidation */ - if (inv_info->granularity != IOMMU_INV_GRANU_PASID) + if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules); @@ -5417,6 +5417,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, IOMMU_CACHE_INV_TYPE_NR) { int granu = 0; u64 pasid = 0; + u64 addr = 0; granu = to_vtd_granularity(cache_type, inv_info->granularity); if (granu == -EINVAL) { @@ -5456,19 +5457,27 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, (granu == QI_GRAN_NONG_PASID) ? 
-1 : 1 << size, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF); + if (!info->ats_enabled) + break; /* * Always flush device IOTLB if ATS is enabled. vIOMMU * in the guest may assume IOTLB flush is inclusive, * which is more efficient. */ - if (info->ats_enabled) - qi_flush_dev_iotlb_pasid(iommu, sid, - info->pfsid, pasid, - info->ats_qdep, - inv_info->addr_info.addr, - size); - break; + fallthrough; case IOMMU_CACHE_INV_TYPE_DEV_IOTLB: + /* +* There is no PASID selective flush for device TLB, so +* the equivalent of that is we set the size to be the +* entire range of 64 bit. User only provides PASID info +* without address info. So we set addr to 0. +*/ + if (inv_info->granularity == IOMMU_INV_GRANU_PASID) { + size = 64 - VTD_PAGE_SHIFT; + addr = 0; + } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) + addr = inv_info->addr_info.addr; + if (info->ats_enabled) qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid, pasid, -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/7] iommu/vt-d: Enforce PASID devTLB field mask
From: Liu Yi L Set proper masks to avoid invalid input spilling over into reserved bits. Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- include/linux/intel-iommu.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 4100bd224f5c..729386ca8122 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -380,8 +380,8 @@ enum { #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK) #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11) -#define QI_DEV_EIOTLB_GLOB(g) ((u64)g) -#define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32) +#define QI_DEV_EIOTLB_GLOB(g) ((u64)(g) & 0x1) +#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xfffff) << 32) #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16) #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4) #define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \ -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/7] iommu/vt-d: Misc tweaks and fixes for vSVA
Hi Baolu and all, This is a series to address some of the issues we found in vSVA support. Most of the patches deal with exception handling; we also removed some bits that are not currently supported. Many thanks to Kevin Tian for his review. Jacob & Yi Jacob Pan (4): iommu/vt-d: Remove global page support in devTLB flush iommu/vt-d: Fix PASID devTLB invalidation iommu/vt-d: Warn on out-of-range invalidation address iommu/vt-d: Disable multiple GPASID-dev bind Liu Yi L (3): iommu/vt-d: Enforce PASID devTLB field mask iommu/vt-d: Handle non-page aligned address iommu/vt-d: Fix devTLB flush for vSVA drivers/iommu/intel/dmar.c | 23 ++- drivers/iommu/intel/iommu.c | 34 +- drivers/iommu/intel/pasid.c | 11 ++- drivers/iommu/intel/svm.c | 22 +- include/linux/intel-iommu.h | 5 ++--- 5 files changed, 60 insertions(+), 35 deletions(-) -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/7] iommu/vt-d: Handle non-page aligned address
From: Liu Yi L Address information for device TLB invalidation comes from userspace when the device is directly assigned to a guest with vIOMMU support. VT-d requires a page-aligned address. This patch checks and enforces that the address is page aligned; otherwise reserved bits can be set in the invalidation descriptor, and an unrecoverable fault will be reported due to the non-zero value in the reserved bits. Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel/dmar.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index d9f973fa1190..53f4e5003620 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1455,9 +1455,24 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, * Max Invs Pending (MIP) is set to 0 for now until we have DIT in * ECAP. */ - desc.qw1 |= addr & ~mask; - if (size_order) + if (addr & ~VTD_PAGE_MASK) + pr_warn_ratelimited("Invalidate non-page aligned address %llx\n", addr); + + if (size_order) { + /* Take page address */ + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr); + /* +* Existing 0s in address below size_order may be the least +* significant bit, we must set them to 1s to avoid having +* smaller size than desired. +*/ + desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT, + VTD_PAGE_SHIFT); + /* Clear size_order bit to indicate size */ + desc.qw1 &= ~mask; + /* Set the S bit to indicate flushing more than 1 page */ desc.qw1 |= QI_DEV_EIOTLB_SIZE; + } qi_submit_sync(iommu, &desc, 1, 0); } -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/7] iommu/vt-d: Fix PASID devTLB invalidation
DevTLB flush can be used for DMA requests both with and without PASIDs. The former uses PASID#0 (RID2PASID); the latter uses a non-zero PASID for SVA usage. This patch adds a check of the PASID value such that devTLB flush with PASID is used for the SVA case. This is more efficient in that multiple PASIDs can be used by a single device; when tearing down a PASID entry, we shall flush only the devTLB specific to that PASID. Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table") Signed-off-by: Jacob Pan --- drivers/iommu/intel/pasid.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index c81f0f17c6ba..3991a24539a1 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu, qdep = info->ats_qdep; pfsid = info->pfsid; - qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT); + /* +* When PASID 0 is used, it indicates RID2PASID (DMA request w/o PASID); +* devTLB flush w/o PASID should be used. For a non-zero PASID under +* SVA usage, the device could do DMA with multiple PASIDs. It is more +* efficient to flush the devTLB specific to the PASID. +*/ + if (pasid) + qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT); + else + qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT); } void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/7] iommu/vt-d: Warn on out-of-range invalidation address
For guest requested IOTLB invalidation, the address and mask are provided as part of the invalidation data. VT-d HW silently ignores any address bits below the mask. SW shall also allow such a case but give a warning if the address does not align with the mask. This patch relaxes the fault handling from an error to a warning and proceeds with the invalidation request using the given mask. Signed-off-by: Jacob Pan --- drivers/iommu/intel/iommu.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5ea5732d5ec4..50fc62413a35 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, switch (BIT(cache_type)) { case IOMMU_CACHE_INV_TYPE_IOTLB: + /* HW will ignore LSB bits based on address mask */ if (inv_info->granularity == IOMMU_INV_GRANU_ADDR && size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) { - pr_err_ratelimited("Address out of range, 0x%llx, size order %llu\n", - inv_info->addr_info.addr, size); - ret = -ERANGE; - goto out_unlock; + WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n", + inv_info->addr_info.addr, size); } /* -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 7/7] iommu/vt-d: Disable multiple GPASID-dev bind
For the unlikely use case where multiple aux domains from the same pdev are attached to a single guest and then bound to a single process (thus the same PASID) within that guest, we cannot easily support this case by refcounting the number of users, as there is only one SL page table per PASID while we have multiple aux domains and thus multiple SL page tables for the same PASID. Extra unbinding of a guest PASID can happen due to a race between normal and exception cases. Termination of one aux domain may affect others unless we actively track and switch aux domains to ensure the validity of SL page tables and TLB states in the shared PASID entry. Support for sharing second level PGDs across domains could reduce the complexity, but this is not available due to the limitations of the VFIO container architecture. We can revisit this decision once PGD sharing is available. Overall, the complexity and potential glitches do not warrant this unlikely use case, so it is removed by this patch. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel/svm.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 6c87c807a0ab..d386853121a2 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -277,20 +277,16 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, goto out; } + /* +* Do not allow multiple bindings of the same device-PASID since +* there is only one SL page table per PASID. We may revisit +* this once sharing PGDs across domains is supported. +*/ for_each_svm_dev(sdev, svm, dev) { - /* -* For devices with aux domains, we should allow -* multiple bind calls with the same PASID and pdev. 
-*/ - if (iommu_dev_feature_enabled(dev, - IOMMU_DEV_FEAT_AUX)) { - sdev->users++; - } else { - dev_warn_ratelimited(dev, -"Already bound with PASID %u\n", -svm->pasid); - ret = -EBUSY; - } + dev_warn_ratelimited(dev, +"Already bound with PASID %u\n", +svm->pasid); + ret = -EBUSY; goto out; } } else { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote: > Have studied _DSM method, two issues we met comparing using quirk. > > 1. Need change definition of either pci_host_bridge or pci_dev, like adding > member can_stall, > while pci system does not know stall now. > > a, pci devices do not have uuid: uuid need be described in dsdt, while pci > devices are not defined in dsdt. > so we have to use host bridge. PCI devices *can* be described in the DSDT. IIUC these particular devices are hardwired (not plug-in cards), so platform firmware can know about them and could describe them in the DSDT. > b, Parsing dsdt is in in pci subsystem. > Like drivers/acpi/pci_root.c: > obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), _acpi_dsm_guid, > 1, > IGNORE_PCI_BOOT_CONFIG_DSM, NULL); > > After parsing DSM in pci, we need record this info. > Currently, can_stall info is recorded in iommu_fwspec, > which is allocated in iommu_fwspec_init and called by iort_iommu_configure > for uefi. You can look for a _DSM wherever it is convenient for you. It could be in an AMBA shim layer. > 2. Guest kernel also need support sva. > Using quirk, the guest can boot with sva enabled, since quirk is > self-contained by kernel. > If using _DSM, a specific uefi or dtb has to be provided, > currently we can useQEMU_EFI.fd from apt install qemu-efi I don't quite understand what this means, but as I mentioned before, a quirk for a *limited* number of devices is OK, as long as there is a plan that removes the need for a quirk for future devices. E.g., if the next platform version ships with a DTB or firmware with a _DSM or other mechanism that enables the kernel to discover this information without a kernel change, it's fine to use a quirk to cover the early platform. The principles are: - I don't want to have to update a quirk for every new Device ID that needs this. - I don't really want to have to manage non-PCI information in the struct pci_dev. 
If this is AMBA- or IOMMU-related, it should be stored in a structure related to AMBA or the IOMMU. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
Den tis 23 juni 2020 12:21John Garry skrev: > On 23/06/2020 10:35, Rikard Falkeborn wrote: > > > > I'd say that GENMASK_INPUT_CHECK() should be able to handle a l=0 and > > h=unsigned value, so I doubt this warn. > > > > Using GENMASK((int)cmdq->q.llq.max_n_shift, 0) resolves it, but it > > looks > > like GENMASK_INPUT_CHECK() could be improved. > > > > > > Indeed it could, it is fixed in -next. > > ok, thanks for the pointer, but I still see this on today's -next with > this patch: > > make W=1 drivers/iommu/arm-smmu-v3.o > > Oh, ok thanks for reporting. I guess different gcc versions have different behaviour. I guess we'll have to change the comparison to (!((h) == (l) || (h) > (l))) instead (not sure I got all parenthesis and logic correct but you get the idea). I'm travelling and wont have time to look at this until next week though. Rikard In file included from ./include/linux/bits.h:23:0, > from ./include/linux/ioport.h:15, > from ./include/linux/acpi.h:12, > from drivers/iommu/arm-smmu-v3.c:12: > drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_cmdq_issue_cmdlist’: > ./include/linux/bits.h:27:7: warning: comparison of unsigned expression > < 0 is always false [-Wtype-limits] >(l) > (h), 0))) >^ > ./include/linux/build_bug.h:16:62: note: in definition of macro > ‘BUILD_BUG_ON_ZERO’ > #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); }))) > ^ > ./include/linux/bits.h:40:3: note: in expansion of macro > ‘GENMASK_INPUT_CHECK’ > (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) >^~~ > drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro ‘GENMASK’ > u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0); > > That's gcc 7.5.0 . > > Cheers, > John > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.
On Mon, Apr 27, 2020 at 06:53:18PM +, Ashish Kalra wrote: > Hello Konrad, > > On Mon, Mar 30, 2020 at 10:25:51PM +, Ashish Kalra wrote: > > Hello Konrad, > > > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote: > > > On Tue, Feb 04, 2020 at 07:35:00PM +, Ashish Kalra wrote: > > > > Hello Konrad, > > > > > > > > Looking fwd. to your feedback regarding support of other memory > > > > encryption architectures such as Power, S390, etc. > > > > > > > > Thanks, > > > > Ashish > > > > > > > > On Fri, Jan 24, 2020 at 11:00:08PM +, Ashish Kalra wrote: > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote: > > > > > > > > > > > > > > Additional memory calculations based on # of PCI devices and > > > > > > > their memory ranges will make it more complicated with so > > > > > > > many other permutations and combinations to explore, it is > > > > > > > essential to keep this patch as simple as possible by > > > > > > > adjusting the bounce buffer size simply by determining it > > > > > > > from the amount of provisioned guest memory. > > > > > >> > > > > > >> Please rework the patch to: > > > > > >> > > > > > >> - Use a log solution instead of the multiplication. > > > > > >>Feel free to cap it at a sensible value. > > > > > > > > > > Ok. > > > > > > > > > > >> > > > > > >> - Also the code depends on SWIOTLB calling in to the > > > > > >>adjust_swiotlb_default_size which looks wrong. > > > > > >> > > > > > >>You should not adjust io_tlb_nslabs from > > > > > >> swiotlb_size_or_default. > > > > > > > > > > >>That function's purpose is to report a value. > > > > > >> > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code. > > > > > >> > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect > > > > > >> which would > > > > > >>modify the io_tlb_nslabs (and set swiotbl=1?). 
> > > > > > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are > > > > > x86-specific and this swiotlb buffer size adjustment is also needed > > > > > for other memory encryption architectures like Power, S390, etc. > > > > > > Oh dear. That I hadn't considered. > > > > > > > > > > >> > > > > > >>Actually you seem to be piggybacking on pci_swiotlb_detect_4gb > > > > > >> - so > > > > > >>perhaps add in this code ? Albeit it really should be in it's > > > > > >> own > > > > > >>file, not in arch/x86/kernel/pci-swiotlb.c > > > > > > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init() > > > > > forces swiotlb on, but again this is all x86 architecture specific. > > > > > > Then it looks like the best bet is to do it from within swiotlb_init? > > > We really can't do it from swiotlb_size_or_default - that function > > > should just return a value and nothing else. > > > > > > > Actually, we need to do it in swiotlb_size_or_default() as this gets called > > by > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to > > reserve low crashkernel memory. If we adjust swiotlb size later in > > swiotlb_init() which gets called later than reserve_crashkernel_low(), > > then any swiotlb size changes/expansion will conflict/overlap with the > > low memory reserved for crashkernel. > > > and will also potentially cause SWIOTLB buffer allocation failures. > > Do you have any feedback, comments on the above ? The init boot chain looks like this: initmem_init pci_iommu_alloc -> pci_swiotlb_detect_4gb -> swiotlb_init reserve_crashkernel reserve_crashkernel_low -> swiotlb_size_or_default .. (rootfs code): pci_iommu_init -> a bunch of the other IOMMU late_init code gets called.. -> pci_swiotlb_late_init I have to say I am lost to how your patch fixes "If we adjust swiolb size later .. then any swiotlb size .. 
will overlap with the low memory reserved for crashkernel"?

Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
is the one changing the size? And hence it modifying the swiotlb size
will fix this problem? Aka _before_ all the other IOMMUs get their hands
on it?

If so, why not create an

  IOMMU_INIT(crashkernel_adjust_swiotlb, pci_swiotlb_detect_override, NULL, NULL);

and have crashkernel_adjust_swiotlb change the size of the swiotlb
buffer if conditions are found to require it?

You also may want to put a #define DEBUG in
arch/x86/kernel/pci-iommu_table.c to check whether the tree structure
of IOMMU entries is correct.

But still I am lost - if, say, the AMD one does decide for some unknown
reason to expand the SWIOTLB, you are still stuck with the 'overlap with
the low memory reserved' problem. Perhaps add a late_init that gets
called as the last one to validate this? And maybe if the swiotlb gets
turned off you also take proper
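Konrad's earlier request in this thread - "use a log solution instead of the multiplication. Feel free to cap it at a sensible value" - can be sketched outside the kernel. Everything below (function name, slab constants, the 1 GB cap) is an illustrative assumption, not the actual patch:

```c
/* Sketch only: scale the SWIOTLB slab count with log2 of guest memory
 * rather than multiplying, and cap it.  Constants are assumptions. */

#define IO_TLB_DEFAULT_NSLABS (32UL * 1024)   /* 64 MB with 2 KB slabs */
#define IO_TLB_MAX_NSLABS     (512UL * 1024)  /* cap the growth at 1 GB */

static unsigned long adjusted_nslabs(unsigned long guest_mem_gb)
{
	unsigned long nslabs = IO_TLB_DEFAULT_NSLABS;

	/* Double the buffer once per power-of-two step of guest memory
	 * above 1 GB, i.e. grow with log2(mem) instead of linearly. */
	while (guest_mem_gb > 1) {
		guest_mem_gb >>= 1;
		nslabs <<= 1;
	}
	return nslabs < IO_TLB_MAX_NSLABS ? nslabs : IO_TLB_MAX_NSLABS;
}
```

With these numbers a 4 GB guest gets a 256 MB bounce buffer and anything from 32 GB up hits the cap; the point is only the shape of the curve, not the specific values.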
Re: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
On Tue, Jun 23, 2020 at 12:16:55PM +0100, Robin Murphy wrote: > On 2020-06-23 11:29, Thierry Reding wrote: > [...] > > > diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > > > index c75b9d957b702..52c84c30f83e4 100644 > > > --- a/drivers/iommu/arm-smmu-impl.c > > > +++ b/drivers/iommu/arm-smmu-impl.c > > > @@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > > > arm_smmu_device *smmu) > > >*/ > > > switch (smmu->model) { > > > case ARM_MMU500: > > > + if (of_device_is_compatible(smmu->dev->of_node, > > > + "nvidia,tegra194-smmu-500")) > > > + return nvidia_smmu_impl_init(smmu); > > > > Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, > > perhaps? That way we avoid this somewhat odd check here. > > No, this is simply in the wrong place. The design here is that we pick up > anything related to the basic SMMU IP (model) first, then make any > platform-specific integration checks. That way a platform-specific init > function can see the model impl set and subclass it if necessary (although > nobody's actually done that yet). The setup for Cavium is just a short-cut > since their model is unique to their integration, so the lines get a bit > blurred and there's little benefit to trying to separate it out. > > In short, put this down below with the other of_device_is_compatible() > checks. > > > > smmu->impl = _mmu500_impl; > > > break; > > > case CAVIUM_SMMUV2: > > > diff --git a/drivers/iommu/arm-smmu-nvidia.c > > > b/drivers/iommu/arm-smmu-nvidia.c > > > > I wonder if it would be better to name this arm-smmu-tegra.c to make it > > clearer that this is for a Tegra chip. We do have regular expressions in > > MAINTAINERS that catch anything with "tegra" in it to make this easier. > > There was a notion that these would be grouped by vendor, but if there's a > strong preference for all NVIDIA-SoC-related stuff to be named "Tegra" then > I'm not going to complain too much. Maybe I was being overly cautious. 
I was just trying to avoid adding something called nvidia-arm-smmu which might eventually turn out to be ambiguous if there was ever a non-Tegra chip and the ARM SMMU implementation was not compatible with the one instantiated on Tegra. Note that I have no knowledge of such a chip being designed, so this may never actually become an issue. In either case, the compatible string already identifies this as Tegra- specific, so we could always change the driver name later on if we have to. > > > new file mode 100644 > > > index 0..dafc293a45217 > > > --- /dev/null > > > +++ b/drivers/iommu/arm-smmu-nvidia.c > > > @@ -0,0 +1,161 @@ > > > +// SPDX-License-Identifier: GPL-2.0-only > > > +// Nvidia ARM SMMU v2 implementation quirks > > > > s/Nvidia/NVIDIA/ > > > > > +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. > > > > I suppose this should now also include 2020. > > > > > + > > > +#define pr_fmt(fmt) "nvidia-smmu: " fmt > > > > Same here. Might be worth making this "tegra-smmu: " for consistency. > > On the other hand, a log prefix that is literally the name of a completely > unrelated driver seems more confusing to users than useful. Same for the > function naming - the tegra_smmu_* namespace is already owned by that > driver. The ARM SMMU replaced the Tegra SMMU on Tegra186 and later, so both drivers are never going to run concurrently. The only "problem" might be that both drivers have symbols with the same prefix, but since they don't export any of those symbols I don't see how that would become a real issue. But then again, the Tegra SMMU is also technically an NVIDIA SMMU, so sticking with the current name might also be confusing. Perhaps a good compromise would be to use a "tegra194{-,_}smmu" prefix instead, which would make this fairly specific to just Tegra194 (and compatible chips). 
That's a fairly common pattern we've been following on Tegra, as you can
for example see in drivers/gpio/gpio-tegra186.c,
drivers/dma/tegra210-adma.c, drivers/memory/tegra/tegra186-emc.c, etc.

Thierry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks
On Tue, Jun 23, 2020 at 12:30:16PM +0100, Robin Murphy wrote: > On 2020-06-23 09:36, Thierry Reding wrote: > [...] > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > > index 243bc4cb2705b..d720e1e191176 100644 > > > --- a/drivers/iommu/arm-smmu.c > > > +++ b/drivers/iommu/arm-smmu.c > > > @@ -673,6 +673,7 @@ static int arm_smmu_init_domain_context(struct > > > iommu_domain *domain, > > > enum io_pgtable_fmt fmt; > > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > > struct arm_smmu_cfg *cfg = _domain->cfg; > > > + irqreturn_t (*context_fault)(int irq, void *dev); > > > mutex_lock(_domain->init_mutex); > > > if (smmu_domain->smmu) > > > @@ -835,7 +836,9 @@ static int arm_smmu_init_domain_context(struct > > > iommu_domain *domain, > > >* handler seeing a half-initialised domain state. > > >*/ > > > irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx]; > > > - ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, > > > + context_fault = (smmu->impl && smmu->impl->context_fault) ? > > > + smmu->impl->context_fault : arm_smmu_context_fault; > > > > A simpler way might have been to assign arm_smmu_context_fault to all > > implementations. That way we wouldn't have to perform this check here > > and instead just always using smmu->impl->context_fault. > > But smmu->impl can still be NULL... > > Everything in impl, including the presence of impl itself, is optional, so > the notion of overriding a default with the same default doesn't really make > much sense, and would go against the pattern everywhere else. True. I had assumed that every implementation would set smmu->impl anyway, in which case there'd be little reason to use these default fallbacks since each implementation could simply directly refer to the exact implementation that it wants. Perhaps the above could be made a bit more palatable by using a standard if/else rather than the ternary operator? That would also more closely match the pattern elsewhere. 
Thierry
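The hook-selection question discussed above (ternary vs. if/else over an optional `impl` pointer) can be modeled in a few lines. The structures and handlers below are simplified stand-ins, not the real driver types:

```c
/* Minimal model of the impl-hook fallback pattern: smmu->impl, and each
 * hook inside it, is optional, so the core picks the override only when
 * both are present.  Types/names are simplified assumptions. */
#include <stddef.h>

typedef int irqreturn_t;
#define IRQ_NONE    0
#define IRQ_HANDLED 1

struct arm_smmu_impl {
	irqreturn_t (*context_fault)(int irq, void *dev);
};

struct arm_smmu_device {
	const struct arm_smmu_impl *impl;   /* may be NULL */
};

static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
{
	(void)irq; (void)dev;
	return IRQ_HANDLED;  /* default handler stand-in */
}

static irqreturn_t nvidia_context_fault(int irq, void *dev)
{
	(void)irq; (void)dev;
	return IRQ_HANDLED;  /* vendor override stand-in */
}

/* The if/else form suggested in the review, equivalent to the ternary
 * in the patch but matching the surrounding style. */
static irqreturn_t (*pick_context_fault(struct arm_smmu_device *smmu))(int, void *)
{
	if (smmu->impl && smmu->impl->context_fault)
		return smmu->impl->context_fault;
	else
		return arm_smmu_context_fault;
}
```

This also shows why "assign the default to every implementation" doesn't quite work: `impl` itself may legitimately be NULL, so the NULL check cannot be dropped.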
Re: [PATCH v2] dma-remap: Align the size in dma_common_*_remap()
On Tue, Jun 23, 2020 at 02:07:55PM +0200, Eric Auger wrote:
> Running a guest with a virtio-iommu protecting virtio devices
> is broken since commit 515e5b6d90d4 ("dma-mapping: use vmap insted
> of reimplementing it"). Before the conversion, the size was
> page aligned in __get_vm_area_node(). Doing so fixes the
> regression.
>
> Fixes: 515e5b6d90d4 ("dma-mapping: use vmap insted of reimplementing it")
> Signed-off-by: Eric Auger

Thanks, I've applied this locally, waiting for git.infradead.org to be
back up.
[PATCH v2] dma-remap: Align the size in dma_common_*_remap()
Running a guest with a virtio-iommu protecting virtio devices
is broken since commit 515e5b6d90d4 ("dma-mapping: use vmap insted
of reimplementing it"). Before the conversion, the size was
page aligned in __get_vm_area_node(). Doing so fixes the
regression.

Fixes: 515e5b6d90d4 ("dma-mapping: use vmap insted of reimplementing it")
Signed-off-by: Eric Auger
---
v1 -> v2:
- fix line over 80 characters
---
 kernel/dma/remap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index e739a6eea6e7..78b23f089cf1 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -24,7 +24,8 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 {
 	void *vaddr;
 
-	vaddr = vmap(pages, size >> PAGE_SHIFT, VM_DMA_COHERENT, prot);
+	vaddr = vmap(pages, PAGE_ALIGN(size) >> PAGE_SHIFT,
+		     VM_DMA_COHERENT, prot);
 	if (vaddr)
 		find_vm_area(vaddr)->pages = pages;
 	return vaddr;
@@ -37,7 +38,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 void *dma_common_contiguous_remap(struct page *page, size_t size,
 			pgprot_t prot, const void *caller)
 {
-	int count = size >> PAGE_SHIFT;
+	int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	struct page **pages;
	void *vaddr;
	int i;
-- 
2.20.1
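The arithmetic behind the fix is easy to see in isolation: for a size that is not a multiple of PAGE_SIZE, a plain right shift under-counts the pages that actually need mapping. PAGE_SIZE and PAGE_ALIGN are redefined below so the calculation can be tried outside the kernel:

```c
/* Why the PAGE_ALIGN() matters: a non-page-multiple size rounds DOWN
 * when shifted, dropping the final partial page.  Macros redefined
 * here for a user-space demo (4 KB pages assumed). */
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Page count as computed before and after the fix. */
static inline size_t pages_unaligned(size_t size)
{
	return size >> PAGE_SHIFT;
}

static inline size_t pages_aligned(size_t size)
{
	return PAGE_ALIGN(size) >> PAGE_SHIFT;
}
```

For size = 0x1800 (6 KB, spanning two 4 KB pages) the unaligned count is one page while two are actually touched, which is the regression the virtio-iommu guest hit.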
Re: [PATCH] dma-remap: Align the size in dma_common_*_remap()
On Mon, Jun 22, 2020 at 10:23:20PM +0200, Eric Auger wrote:
> Running a guest with a virtio-iommu protecting virtio devices
> is broken since commit 515e5b6d90d4 ("dma-mapping: use vmap insted
> of reimplementing it"). Before the conversion, the size was
> page aligned in __get_vm_area_node(). Doing so fixes the
> regression.
>
> Fixes: 515e5b6d90d4 ("dma-mapping: use vmap insted of reimplementing it")
> Signed-off-by: Eric Auger
> ---
>  kernel/dma/remap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
> index e739a6eea6e7..a3151a9b0c08 100644
> --- a/kernel/dma/remap.c
> +++ b/kernel/dma/remap.c
> @@ -24,7 +24,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
>  {
>  	void *vaddr;
>  
> -	vaddr = vmap(pages, size >> PAGE_SHIFT, VM_DMA_COHERENT, prot);
> +	vaddr = vmap(pages, PAGE_ALIGN(size) >> PAGE_SHIFT, VM_DMA_COHERENT, prot);

Please respin without going over 80 characters.
Re: [PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks
On 2020-06-23 09:36, Thierry Reding wrote: [...] diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..d720e1e191176 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -673,6 +673,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, enum io_pgtable_fmt fmt; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = _domain->cfg; + irqreturn_t (*context_fault)(int irq, void *dev); mutex_lock(_domain->init_mutex); if (smmu_domain->smmu) @@ -835,7 +836,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, * handler seeing a half-initialised domain state. */ irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx]; - ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, + context_fault = (smmu->impl && smmu->impl->context_fault) ? +smmu->impl->context_fault : arm_smmu_context_fault; A simpler way might have been to assign arm_smmu_context_fault to all implementations. That way we wouldn't have to perform this check here and instead just always using smmu->impl->context_fault. But smmu->impl can still be NULL... Everything in impl, including the presence of impl itself, is optional, so the notion of overriding a default with the same default doesn't really make much sense, and would go against the pattern everywhere else. Robin. 
+ ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED, "arm-smmu-context-fault", domain); if (ret < 0) { dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n", @@ -2107,6 +2110,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev) struct arm_smmu_device *smmu; struct device *dev = >dev; int num_irqs, i, err; + irqreturn_t (*global_fault)(int irq, void *dev); smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL); if (!smmu) { @@ -2193,9 +2197,12 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->num_context_irqs = smmu->num_context_banks; } + global_fault = (smmu->impl && smmu->impl->global_fault) ? + smmu->impl->global_fault : arm_smmu_global_fault; + Same as above. Thierry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
On 2020-06-23 11:29, Thierry Reding wrote: [...] diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..52c84c30f83e4 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu-500")) + return nvidia_smmu_impl_init(smmu); Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That way we avoid this somewhat odd check here. No, this is simply in the wrong place. The design here is that we pick up anything related to the basic SMMU IP (model) first, then make any platform-specific integration checks. That way a platform-specific init function can see the model impl set and subclass it if necessary (although nobody's actually done that yet). The setup for Cavium is just a short-cut since their model is unique to their integration, so the lines get a bit blurred and there's little benefit to trying to separate it out. In short, put this down below with the other of_device_is_compatible() checks. smmu->impl = _mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c I wonder if it would be better to name this arm-smmu-tegra.c to make it clearer that this is for a Tegra chip. We do have regular expressions in MAINTAINERS that catch anything with "tegra" in it to make this easier. There was a notion that these would be grouped by vendor, but if there's a strong preference for all NVIDIA-SoC-related stuff to be named "Tegra" then I'm not going to complain too much. new file mode 100644 index 0..dafc293a45217 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks s/Nvidia/NVIDIA/ +// Copyright (C) 2019 NVIDIA CORPORATION. 
All rights reserved. I suppose this should now also include 2020. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt Same here. Might be worth making this "tegra-smmu: " for consistency. On the other hand, a log prefix that is literally the name of a completely unrelated driver seems more confusing to users than useful. Same for the function naming - the tegra_smmu_* namespace is already owned by that driver. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
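Robin's point about ordering in arm_smmu_impl_init() - model-level setup first, then integration-specific compatible checks that may subclass the chosen impl - can be compressed into a small model. The types and strings below are simplified stand-ins, not the real driver structures:

```c
/* Model of the init ordering described above: (1) quirks tied to the
 * basic SMMU IP model, then (2) platform/integration checks that can
 * see and override the model impl.  Names are assumptions. */
#include <string.h>

enum smmu_model { GENERIC_SMMUV2, ARM_MMU500 };

struct arm_smmu_impl { const char *name; };

static const struct arm_smmu_impl arm_mmu500_impl = { "mmu500" };
static const struct arm_smmu_impl nvidia_smmu_impl = { "tegra194" };

struct arm_smmu_device {
	enum smmu_model model;
	const char *compatible;              /* stand-in for the DT node */
	const struct arm_smmu_impl *impl;
};

static struct arm_smmu_device *impl_init(struct arm_smmu_device *smmu)
{
	/* 1) quirks for the basic SMMU IP (the model) first */
	switch (smmu->model) {
	case ARM_MMU500:
		smmu->impl = &arm_mmu500_impl;
		break;
	default:
		break;
	}

	/* 2) then platform-specific integration checks, which may
	 * override or subclass whatever the model selected above */
	if (!strcmp(smmu->compatible, "nvidia,tegra194-smmu-500"))
		smmu->impl = &nvidia_smmu_impl;

	return smmu;
}
```

Placing the Tegra194 check in step (2) rather than inside the ARM_MMU500 case is exactly the relocation the review asks for.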
Re: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
On Thu, Jun 04, 2020 at 04:44:11PM -0700, Krishna Reddy wrote: > NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave s/soc/SoC/ > IOVA accesses across them. > Add NVIDIA implementation for dual ARM MMU-500s and add new compatible > string for Tegra194 soc. Same here. > > Signed-off-by: Krishna Reddy > --- > MAINTAINERS | 2 + > drivers/iommu/Makefile | 2 +- > drivers/iommu/arm-smmu-impl.c | 3 + > drivers/iommu/arm-smmu-nvidia.c | 161 > drivers/iommu/arm-smmu.h| 1 + > 5 files changed, 168 insertions(+), 1 deletion(-) > create mode 100644 drivers/iommu/arm-smmu-nvidia.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index 50659d76976b7..118da0893c964 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c > > TEGRA IOMMU DRIVERS > M: Thierry Reding > +R: Krishna Reddy > L: linux-te...@vger.kernel.org > S: Supported > F: drivers/iommu/tegra* > +F: drivers/iommu/arm-smmu-nvidia.c > > TEGRA KBC DRIVER > M: Laxman Dewangan > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 57cf4ba5e27cb..35542df00da72 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o > amd_iommu_quirks.o > obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o > obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o > obj-$(CONFIG_ARM_SMMU) += arm_smmu.o > -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o > +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o > obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o > obj-$(CONFIG_DMAR_TABLE) += dmar.o > obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o > diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > index c75b9d957b702..52c84c30f83e4 100644 > --- a/drivers/iommu/arm-smmu-impl.c > +++ b/drivers/iommu/arm-smmu-impl.c > @@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > arm_smmu_device *smmu) >*/ > switch 
(smmu->model) { > case ARM_MMU500: > + if (of_device_is_compatible(smmu->dev->of_node, > + "nvidia,tegra194-smmu-500")) > + return nvidia_smmu_impl_init(smmu); Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That way we avoid this somewhat odd check here. > smmu->impl = _mmu500_impl; > break; > case CAVIUM_SMMUV2: > diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c I wonder if it would be better to name this arm-smmu-tegra.c to make it clearer that this is for a Tegra chip. We do have regular expressions in MAINTAINERS that catch anything with "tegra" in it to make this easier. > new file mode 100644 > index 0..dafc293a45217 > --- /dev/null > +++ b/drivers/iommu/arm-smmu-nvidia.c > @@ -0,0 +1,161 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +// Nvidia ARM SMMU v2 implementation quirks s/Nvidia/NVIDIA/ > +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. I suppose this should now also include 2020. > + > +#define pr_fmt(fmt) "nvidia-smmu: " fmt Same here. Might be worth making this "tegra-smmu: " for consistency. > + > +#include > +#include > +#include > +#include > +#include > + > +#include "arm-smmu.h" > + > +/* Tegra194 has three ARM MMU-500 Instances. > + * Two of them are used together for Interleaved IOVA accesses and > + * used by Non-Isochronous Hw devices for SMMU translations. "non-isochronous", s/Hw/HW/ > + * Third one is used for SMMU translations from Isochronous HW devices. "isochronous" > + * It is possible to use this Implementation to program either "implementation" > + * all three or two of the instances identically as desired through > + * DT node. > + * > + * Programming all the three instances identically comes with redundant tlb s/tlb/TLB/ > + * invalidations as all three never need to be tlb invalidated for a HW > device. Same here. 
> + * > + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for "kernel" and "..., the SMMU device" > + * Isochornous HW devices should be added as a separate ARM MMU-500 device "isochronous" > + * in DT and be programmed independently for efficient tlb invalidates. "TLB" > + * > + */ > +#define MAX_SMMU_INSTANCES 3 > + > +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ USEC_PER_SEC? > +#define TLB_SPIN_COUNT 10 > + > +struct nvidia_smmu { > + struct arm_smmu_device smmu; > + unsigned intnum_inst; > + void __iomem*bases[MAX_SMMU_INSTANCES]; > +}; > + > +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) Making this static inline can make error messages more readable. > + > +#define nsmmu_page(smmu, inst, page) \ > +
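The review note that making `to_nvidia_smmu()` a static inline "can make error messages more readable" boils down to the usual macro-vs-inline container_of trade-off: the inline version type-checks its argument. A trimmed, user-space sketch (structures simplified, container_of reimplemented without the kernel's typeof check):

```c
/* static inline wrapper instead of:
 *   #define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
 * so the compiler checks the argument type and names the function in
 * diagnostics.  Structures are trimmed stand-ins for the demo. */
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct arm_smmu_device { int numpage; };

struct nvidia_smmu {
	struct arm_smmu_device smmu;   /* embedded base device */
	unsigned int num_inst;
};

static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *s)
{
	return container_of(s, struct nvidia_smmu, smmu);
}
```

Passing anything other than a `struct arm_smmu_device *` now fails with a clear type error at the call site, whereas the macro form would fail deep inside container_of's expansion.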
Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
On 23/06/2020 10:35, Rikard Falkeborn wrote:
> > I'd say that GENMASK_INPUT_CHECK() should be able to handle a l=0 and
> > h=unsigned value, so I doubt this warn.
> >
> > Using GENMASK((int)cmdq->q.llq.max_n_shift, 0) resolves it, but it
> > looks like GENMASK_INPUT_CHECK() could be improved.
>
> Indeed it could, it is fixed in -next.

ok, thanks for the pointer, but I still see this on today's -next with
this patch:

make W=1 drivers/iommu/arm-smmu-v3.o

In file included from ./include/linux/bits.h:23:0,
                 from ./include/linux/ioport.h:15,
                 from ./include/linux/acpi.h:12,
                 from drivers/iommu/arm-smmu-v3.c:12:
drivers/iommu/arm-smmu-v3.c: In function 'arm_smmu_cmdq_issue_cmdlist':
./include/linux/bits.h:27:7: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
   (l) > (h), 0)))
       ^
./include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
                                                              ^
./include/linux/bits.h:40:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   ^~~
drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro 'GENMASK'
  u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
                  ^~~

That's gcc 7.5.0.

Cheers,
John
Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
Hi Den tis 23 juni 2020 11:22John Garry skrev: > On 23/06/2020 02:07, kernel test robot wrote: > > + Rikard, as the GENMASK compile-time sanity checks were added recently > > > Hi John, > > > > I love your patch! Perhaps something to improve: > > > > [auto build test WARNING on iommu/next] > > [If your patch is applied to the wrong git tree, kindly drop us a note. > > And when submitting patch, we suggest to use as documented in > > https://git-scm.com/docs/git-format-patch] > > > > url: > https://github.com/0day-ci/linux/commits/John-Garry/iommu-arm-smmu-v3-Improve-cmdq-lock-efficiency/20200623-013438 > > base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git > next > > config: arm64-randconfig-c024-20200622 (attached as .config) > > compiler: aarch64-linux-gcc (GCC) 9.3.0 > > > > If you fix the issue, kindly add following tag as appropriate > > Reported-by: kernel test robot > > > > All warnings (new ones prefixed by >>, old ones prefixed by <<): > > > > In file included from include/linux/bits.h:23, > > from include/linux/ioport.h:15, > > from include/linux/acpi.h:12, > > from drivers/iommu/arm-smmu-v3.c:12: > > drivers/iommu/arm-smmu-v3.c: In function 'arm_smmu_cmdq_issue_cmdlist': > > include/linux/bits.h:26:28: warning: comparison of unsigned expression < > 0 is always false [-Wtype-limits] > > 26 | __builtin_constant_p((l) > (h)), (l) > (h), 0))) > > I'd say that GENMASK_INPUT_CHECK() should be able to handle a l=0 and > h=unsigned value, so I doubt this warn. > > Using GENMASK((int)cmdq->q.llq.max_n_shift, 0) resolves it, but it looks > like GENMASK_INPUT_CHECK() could be improved. > Indeed it could, it is fixed in -next. 
Rikard > |^ > > include/linux/build_bug.h:16:62: note: in definition of macro > 'BUILD_BUG_ON_ZERO' > > 16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); > }))) > > | ^ > > include/linux/bits.h:39:3: note: in expansion of macro > 'GENMASK_INPUT_CHECK' > > 39 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) > > | ^~~ > >>> drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro > 'GENMASK' > > 1404 | u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0); > > | ^~~ > > include/linux/bits.h:26:40: warning: comparison of unsigned expression < > 0 is always false [-Wtype-limits] > > 26 | __builtin_constant_p((l) > (h)), (l) > (h), 0))) > > |^ > > include/linux/build_bug.h:16:62: note: in definition of macro > 'BUILD_BUG_ON_ZERO' > > 16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); > }))) > > | ^ > > include/linux/bits.h:39:3: note: in expansion of macro > 'GENMASK_INPUT_CHECK' > > 39 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) > > | ^~~ > >>> drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro > 'GENMASK' > > 1404 | u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0); > > | ^~~ > > > > vim +/GENMASK +1404 drivers/iommu/arm-smmu-v3.c > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()
On 23/06/2020 02:07, kernel test robot wrote: + Rikard, as the GENMASK compile-time sanity checks were added recently Hi John, I love your patch! Perhaps something to improve: [auto build test WARNING on iommu/next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/John-Garry/iommu-arm-smmu-v3-Improve-cmdq-lock-efficiency/20200623-013438 base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next config: arm64-randconfig-c024-20200622 (attached as .config) compiler: aarch64-linux-gcc (GCC) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>, old ones prefixed by <<): In file included from include/linux/bits.h:23, from include/linux/ioport.h:15, from include/linux/acpi.h:12, from drivers/iommu/arm-smmu-v3.c:12: drivers/iommu/arm-smmu-v3.c: In function 'arm_smmu_cmdq_issue_cmdlist': include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] 26 | __builtin_constant_p((l) > (h)), (l) > (h), 0))) I'd say that GENMASK_INPUT_CHECK() should be able to handle a l=0 and h=unsigned value, so I doubt this warn. Using GENMASK((int)cmdq->q.llq.max_n_shift, 0) resolves it, but it looks like GENMASK_INPUT_CHECK() could be improved. 
|^ include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO' 16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); }))) | ^ include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK' 39 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) | ^~~ drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro 'GENMASK' 1404 | u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0); | ^~~ include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] 26 | __builtin_constant_p((l) > (h)), (l) > (h), 0))) |^ include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO' 16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); }))) | ^ include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK' 39 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) | ^~~ drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro 'GENMASK' 1404 | u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0); | ^~~ vim +/GENMASK +1404 drivers/iommu/arm-smmu-v3.c ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
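The warning in the thread comes from GENMASK's compile-time sanity check comparing `(l) > (h)` with `l == 0` and `h` an unsigned struct field, which gcc's -Wtype-limits (under W=1) flags as an always-false unsigned comparison. The workaround John mentions can be tried with a simplified, user-space `__GENMASK` (64-bit longs assumed, input check omitted):

```c
/* Simplified rendition of the kernel's __GENMASK for a user-space demo:
 * bits h..l set in an unsigned long.  64-bit longs are assumed and the
 * GENMASK_INPUT_CHECK() machinery is omitted. */

#define BITS_PER_LONG 64
#define __GENMASK(h, l) \
	((~0UL - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* John's workaround: casting the unsigned max_n_shift field to int makes
 * both sides of the real macro's (l) > (h) check signed, silencing
 * -Wtype-limits while producing the same mask. */
static unsigned long prod_mask_for(unsigned int max_n_shift)
{
	return __GENMASK((int)max_n_shift, 0);
}
```

The cast changes only the type seen by the sanity check, not the computed mask, which is why it was considered a workaround rather than a fix for GENMASK_INPUT_CHECK() itself.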
Re: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
On Thu, Jun 04, 2020 at 04:44:12PM -0700, Krishna Reddy wrote:
> Add binding for NVIDIA's Tegra194 Soc SMMU that is based
> on ARM MMU-500.
>
> Signed-off-by: Krishna Reddy
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index e3ef1c69d1326..8f7ffd248f303 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -37,6 +37,11 @@ properties:
>           - qcom,sc7180-smmu-500
>           - qcom,sdm845-smmu-500
>       - const: arm,mmu-500
> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> +    items:
> +      - enum:
> +          - nvdia,tegra194-smmu-500

The -500 suffix here seems a bit redundant since there's no other type
of SMMU in Tegra194, correct?

Thierry
Re: [PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks
On Thu, Jun 04, 2020 at 04:44:13PM -0700, Krishna Reddy wrote:
> Add global/context fault hooks to allow NVIDIA SMMU implementation
> handle faults across multiple SMMUs.
>
> Signed-off-by: Krishna Reddy
> ---
>  drivers/iommu/arm-smmu-nvidia.c | 100 
>  drivers/iommu/arm-smmu.c        |  11 +++-
>  drivers/iommu/arm-smmu.h        |   3 +
>  3 files changed, 112 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> index dafc293a45217..5999b6a770992 100644
> --- a/drivers/iommu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
>  	return 0;
>  }
>
> +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
> +{
> +	return container_of(dom, struct arm_smmu_domain, domain);
> +}
> +
> +static irqreturn_t nsmmu_global_fault_inst(int irq,
> +					   struct arm_smmu_device *smmu,
> +					   int inst)
> +{
> +	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
> +
> +	gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
> +	gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
> +				ARM_SMMU_GR0_sGFSYNR0);
> +	gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
> +				ARM_SMMU_GR0_sGFSYNR1);
> +	gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
> +				ARM_SMMU_GR0_sGFSYNR2);
> +
> +	if (!gfsr)
> +		return IRQ_NONE;
> +
> +	dev_err_ratelimited(smmu->dev,
> +			    "Unexpected global fault, this could be serious\n");
> +	dev_err_ratelimited(smmu->dev,
> +			    "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
> +			    gfsr, gfsynr0, gfsynr1, gfsynr2);
> +
> +	writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nsmmu_global_fault(int irq, void *dev)
> +{
> +	int inst;
> +	irqreturn_t irq_ret = IRQ_NONE;
> +	struct arm_smmu_device *smmu = dev;
> +
> +	for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
> +		irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
> +		if (irq_ret == IRQ_HANDLED)
> +			return irq_ret;
> +	}
> +
> +	return irq_ret;
> +}
> +
> +static irqreturn_t nsmmu_context_fault_bank(int irq,
> +					    struct arm_smmu_device *smmu,
> +					    int idx, int inst)
> +{
> +	u32 fsr, fsynr, cbfrsynra;
> +	unsigned long iova;
> +
> +	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> +	if (!(fsr & ARM_SMMU_FSR_FAULT))
> +		return IRQ_NONE;
> +
> +	fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
> +			      ARM_SMMU_CB_FSYNR0);
> +	iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
> +			     ARM_SMMU_CB_FAR);
> +	cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
> +				  ARM_SMMU_GR1_CBFRSYNRA(idx));
> +
> +	dev_err_ratelimited(smmu->dev,
> +			    "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> +			    fsr, iova, fsynr, cbfrsynra, idx);
> +
> +	writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
> +		       ARM_SMMU_CB_FSR);
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nsmmu_context_fault(int irq, void *dev)
> +{
> +	int inst, idx;
> +	irqreturn_t irq_ret = IRQ_NONE;
> +	struct iommu_domain *domain = dev;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
> +		/* Interrupt line shared between all context faults.
> +		 * Check for faults across all contexts.
> +		 */
> +		for (idx = 0; idx < smmu->num_context_banks; idx++) {
> +			irq_ret = nsmmu_context_fault_bank(irq, smmu,
> +							   idx, inst);
> +
> +			if (irq_ret == IRQ_HANDLED)
> +				return irq_ret;
> +		}
> +	}
> +
> +	return irq_ret;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
>  	.read_reg = nsmmu_read_reg,
>  	.write_reg = nsmmu_write_reg,
> @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
>  	.write_reg64 = nsmmu_write_reg64,
>  	.reset = nsmmu_reset,
>  	.tlb_sync = nsmmu_tlb_sync,
> +	.global_fault = nsmmu_global_fault,
> +	.context_fault = nsmmu_context_fault,
>  };
>
> struct arm_smmu_device *nvidia_smmu_impl_init(struct
Re: [PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot
On Thu, Jun 04, 2020 at 04:44:14PM -0700, Krishna Reddy wrote:
> >> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes
> >> address space '' of expression
>
> Reported-by: kbuild test robot
> Signed-off-by: Krishna Reddy
> ---
>  drivers/iommu/arm-smmu-nvidia.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This should be folded into the patch that introduced this error.

Thierry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/6] [PULL REQUEST] iommu/vt-d: fixes for v5.8
On Tue, Jun 23, 2020 at 07:13:39AM +0800, Lu Baolu wrote:
> Hi Joerg,
>
> Below fix patches have been piled up for v5.8. Please consider them for
> your fix branch.
>
> Best regards,
> Lu Baolu
>
> Lu Baolu (5):
>   iommu/vt-d: Make Intel SVM code 64-bit only
>   iommu/vt-d: Set U/S bit in first level page table by default
>   iommu/vt-d: Enable PCI ACS for platform opt in hint
>   iommu/vt-d: Update scalable mode paging structure coherency
>   iommu/vt-d: Fix misuse of iommu_domain_identity_map()
>
> Rajat Jain (1):
>   iommu/vt-d: Don't apply gfx quirks to untrusted devices
>
>  drivers/iommu/Kconfig       |  2 +-
>  drivers/iommu/intel/dmar.c  |  3 +-
>  drivers/iommu/intel/iommu.c | 59 -
>  include/linux/intel-iommu.h |  1 +
>  4 files changed, 56 insertions(+), 9 deletions(-)

Applied to iommu/fixes, thanks Baolu.
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
Hi, Joerg

On 2020/6/22 7:55 PM, Joerg Roedel wrote:
> On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> > +++ b/drivers/iommu/iommu.c
> > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> > 	fwspec->iommu_fwnode = iommu_fwnode;
> > 	fwspec->ops = ops;
> > 	dev_iommu_fwspec_set(dev, fwspec);
> > +
> > +	if (dev_is_pci(dev))
> > +		pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > +
>
> That's not going to fly. I don't think we should run the fixups twice,
> and they should not be run from IOMMU code.
>
> Is the only reason for this second pass that iommu_fwspec is not yet
> allocated when it runs the first time? I ask because it might be easier
> to just allocate the struct earlier then.

Thanks for looking into this.

Yes, that is the only reason for calling the fixups a second time, after
iommu_fwspec has been allocated. The first fixup-final pass runs very
early, in pci_bus_add_device.

If we allocate iommu_fwspec earlier, it could be done in pci_alloc_dev,
with ops still being assigned in iommu_fwspec_init. I have tested that
this works. Not sure if it is acceptable?

Alternatively, adding a can_stall flag to struct pci_dev would be simple
but ugly too, since PCI knows nothing about stalls today.

Thanks
Re: [PATCH v2 14/15] docs: fix references for DMA*.txt files
On Tue, Jun 23, 2020 at 09:09:10AM +0200, Mauro Carvalho Chehab wrote:
> As we moved those files to core-api, fix references to point
> to their newer locations.

Can we please just revert the RST conversion that I didn't ACK?
[PATCH v2 14/15] docs: fix references for DMA*.txt files
As we moved those files to core-api, fix references to point to their newer locations. Signed-off-by: Mauro Carvalho Chehab --- Documentation/PCI/pci.rst | 6 +++--- Documentation/block/biodoc.rst | 2 +- Documentation/bus-virt-phys-mapping.txt| 2 +- Documentation/core-api/dma-api.rst | 6 +++--- Documentation/core-api/dma-isa-lpc.rst | 2 +- Documentation/driver-api/usb/dma.rst | 6 +++--- .../translations/ko_KR/memory-barriers.txt | 6 +++--- arch/ia64/hp/common/sba_iommu.c| 12 ++-- arch/parisc/kernel/pci-dma.c | 2 +- arch/x86/include/asm/dma-mapping.h | 4 ++-- arch/x86/kernel/amd_gart_64.c | 2 +- drivers/parisc/sba_iommu.c | 14 +++--- include/linux/dma-mapping.h| 2 +- include/media/videobuf-dma-sg.h| 2 +- kernel/dma/debug.c | 2 +- 15 files changed, 35 insertions(+), 35 deletions(-) diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index 8c016d8c9862..d10d3fe604c5 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -265,7 +265,7 @@ Set the DMA mask size - .. note:: If anything below doesn't make sense, please refer to - Documentation/DMA-API.txt. This section is just a reminder that + :doc:`/core-api/dma-api`. This section is just a reminder that drivers need to indicate DMA capabilities of the device and is not an authoritative source for DMA interfaces. @@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are Setup shared control data - Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared) -memory. See Documentation/DMA-API.txt for a full description of +memory. See :doc:`/core-api/dma-api` for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device. @@ -421,7 +421,7 @@ owners if there is one. Then clean up "consistent" buffers which contain the control data. -See Documentation/DMA-API.txt for details on unmapping interfaces. +See :doc:`/core-api/dma-api` for details on unmapping interfaces. 
Unregister from other subsystems diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst index b964796ec9c7..ba7f45d0271c 100644 --- a/Documentation/block/biodoc.rst +++ b/Documentation/block/biodoc.rst @@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address do not have a corresponding kernel virtual address space mapping) and low-memory pages. -Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion +Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion on PCI high mem DMA aspects and mapping of scatter gather lists, and support for 64 bit PCI. diff --git a/Documentation/bus-virt-phys-mapping.txt b/Documentation/bus-virt-phys-mapping.txt index 4bb07c2f3e7d..c7bc99cd2e21 100644 --- a/Documentation/bus-virt-phys-mapping.txt +++ b/Documentation/bus-virt-phys-mapping.txt @@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers The virt_to_bus() and bus_to_virt() functions have been superseded by the functionality provided by the PCI DMA interface - (see Documentation/DMA-API-HOWTO.txt). They continue + (see :doc:`/core-api/dma-api-howto`). They continue to be documented below for historical purposes, but new code must not use them. --davidm 00/12/12 diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 2d8d2fed7317..63b4a2f20867 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device :Author: James E.J. Bottomley This document describes the DMA API. For a more gentle introduction -of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt. +of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. This API is split into two pieces. Part I describes the basic API. 
Part II describes extensions for supporting non-consistent memory @@ -471,7 +471,7 @@ without the _attrs suffixes, except that they pass an optional dma_attrs. The interpretation of DMA attributes is architecture-specific, and -each attribute should be documented in Documentation/DMA-attributes.txt. +each attribute should be documented in :doc:`/core-api/dma-attributes`. If dma_attrs are 0, the semantics of each of these functions is identical to those of the corresponding function @@ -484,7 +484,7 @@ for DMA:: #include /* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and - * documented in Documentation/DMA-attributes.txt
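The conversion this patch applies across all fifteen files is mechanical: a literal file path to a now-moved document is replaced with a Sphinx `:doc:` cross-reference to its new core-api location. A minimal before/after sketch of the pattern (illustrative, not an actual hunk from the patch):

```rst
.. Before: a plain file path, which Sphinx cannot resolve or hyperlink:

   See Documentation/DMA-API-HOWTO.txt for details.

.. After: a :doc: role pointing at the document's new location, which
   renders as a link in the generated HTML:

   See :doc:`/core-api/dma-api-howto` for details.
```

The leading `/` in the `:doc:` target makes the reference absolute from the documentation root, so it resolves the same way regardless of which file contains it.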
[PATCH v2 00/15] Documentation fixes
Hi Jon,

As requested, this is a rebase of a previous series posted on Jan, 15.

Since then, several patches got merged via other trees or became
obsolete. There were also 2 patches that fit better in the ReST
conversion patchset, so I'll send them in another series together with
the remaining ReST conversions. I also added the reviews/acks received.

So, the series was reduced from 29 to 15 patches.

Let's hope b4 will be able to properly handle this one.

Regards,
Mauro

Mauro Carvalho Chehab (15):
  mm: vmalloc.c: remove a kernel-doc annotation from a removed parameter
  net: dev: add a missing kernel-doc annotation
  net: netdevice.h: add a description for napi_defer_hard_irqs
  scripts/kernel-doc: parse __ETHTOOL_DECLARE_LINK_MODE_MASK
  net: pylink.h: add kernel-doc descriptions for new fields at phylink_config
  scripts/kernel-doc: handle function pointer prototypes
  fs: fs.h: fix a kernel-doc parameter description
  kcsan: fix a kernel-doc warning
  selftests/vm/keys: fix a broken reference at protection_keys.c
  docs: hugetlbpage.rst: fix some warnings
  docs: powerpc: fix some issues at vas-api.rst
  docs: driver-model: remove a duplicated markup at driver.rst
  docs: ABI: fix a typo when pointing to w1-generic.rst
  docs: fix references for DMA*.txt files
  docs: fs: proc.rst: convert a new chapter to ReST

 .../ABI/testing/sysfs-driver-w1_therm         |  2 +-
 Documentation/PCI/pci.rst                     |  6 +--
 Documentation/admin-guide/mm/hugetlbpage.rst  | 23 +++---
 Documentation/block/biodoc.rst                |  2 +-
 Documentation/bus-virt-phys-mapping.txt       |  2 +-
 Documentation/core-api/dma-api.rst            |  6 +--
 Documentation/core-api/dma-isa-lpc.rst        |  2 +-
 .../driver-api/driver-model/driver.rst        |  2 -
 Documentation/driver-api/usb/dma.rst          |  6 +--
 Documentation/filesystems/proc.rst            | 44 +--
 Documentation/powerpc/vas-api.rst             | 23 +++---
 .../translations/ko_KR/memory-barriers.txt    |  6 +--
 arch/ia64/hp/common/sba_iommu.c               | 12 ++--
 arch/parisc/kernel/pci-dma.c                  |  2 +-
 arch/x86/include/asm/dma-mapping.h            |  4 +-
 arch/x86/kernel/amd_gart_64.c                 |  2 +-
 drivers/parisc/sba_iommu.c                    | 14 +++---
 include/linux/dma-mapping.h                   |  2 +-
 include/linux/fs.h                            |  2 +-
 include/linux/kcsan-checks.h                  | 10 +++-
 include/linux/netdevice.h                     |  2 +
 include/linux/phylink.h                       |  4 ++
 include/media/videobuf-dma-sg.h               |  2 +-
 kernel/dma/debug.c                            |  2 +-
 mm/vmalloc.c                                  |  1 -
 net/core/dev.c                                |  1 +
 scripts/kernel-doc                            |  7 +++
 tools/testing/selftests/vm/protection_keys.c  |  2 +-
 28 files changed, 114 insertions(+), 79 deletions(-)

--
2.26.2
RE: [PATCH v2 14/15] vfio: Document dual stage control
> From: Stefan Hajnoczi > Sent: Monday, June 22, 2020 8:51 PM > > On Wed, Jun 17, 2020 at 06:27:27AM +, Liu, Yi L wrote: > > > From: Stefan Hajnoczi > > > Sent: Monday, June 15, 2020 5:41 PM > > > On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote: > > > > > > > From: Eric Auger > > > > > > > > The VFIO API was enhanced to support nested stage control: a bunch of > > > > new iotcls and usage guideline. > > > > > > > > Let's document the process to follow to set up nested mode. > > > > > > > > Cc: Kevin Tian > > > > CC: Jacob Pan > > > > Cc: Alex Williamson > > > > Cc: Eric Auger > > > > Cc: Jean-Philippe Brucker > > > > Cc: Joerg Roedel > > > > Cc: Lu Baolu > > > > Signed-off-by: Eric Auger > > > > Signed-off-by: Liu Yi L > > > > --- > > > > v1 -> v2: > > > > *) new in v2, compared with Eric's original version, pasid table bind > > > >and fault reporting is removed as this series doesn't cover them. > > > >Original version from Eric. > > > >https://lkml.org/lkml/2020/3/20/700 > > > > > > > > Documentation/driver-api/vfio.rst | 64 > > > > +++ > > > > 1 file changed, 64 insertions(+) > > > > > > > > diff --git a/Documentation/driver-api/vfio.rst > > > > b/Documentation/driver-api/vfio.rst > > > > index f1a4d3c..06224bd 100644 > > > > --- a/Documentation/driver-api/vfio.rst > > > > +++ b/Documentation/driver-api/vfio.rst > > > > @@ -239,6 +239,70 @@ group and can access them as follows:: > > > > /* Gratuitous device reset and go... */ > > > > ioctl(device, VFIO_DEVICE_RESET); > > > > > > > > +IOMMU Dual Stage Control > > > > + > > > > + > > > > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds > > > > +to the ARM terminology while level corresponds to Intel's VTD > > > > terminology. > > > > +In the following text we use either without distinction. > > > > + > > > > +This is useful when the guest is exposed with a virtual IOMMU and > > > > +some devices are assigned to the guest through VFIO. 
Then the guest > > > > +OS can use stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor > > > > +uses stage 2 for VM isolation (GPA -> HPA). > > > > + > > > > +Under dual stage translation, the guest gets ownership of the stage 1 > > > > +page tables and also owns stage 1 configuration structures. The > > > > +hypervisor owns the root configuration structure (for security > > > > +reason), including stage 2 configuration. This works as long > > > > +configuration structures and page table > > > > > > s/as long configuration/as long as configuration/ > > > > got it. > > > > > > > > > +format are compatible between the virtual IOMMU and the physical IOMMU. > > > > > > s/format/formats/ > > > > I see. > > > > > > + > > > > +Assuming the HW supports it, this nested mode is selected by choosing > > > > +the VFIO_TYPE1_NESTING_IOMMU type through: > > > > + > > > > +ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU); > > > > + > > > > +This forces the hypervisor to use the stage 2, leaving stage 1 > > > > +available for guest usage. The guest stage 1 format depends on IOMMU > > > > +vendor, and it is the same with the nesting configuration method. > > > > +User space should check the format and configuration method after > > > > +setting nesting type by > > > > +using: > > > > + > > > > +ioctl(container->fd, VFIO_IOMMU_GET_INFO, _info); > > > > + > > > > +Details can be found in Documentation/userspace-api/iommu.rst. For > > > > +Intel VT-d, each stage 1 page table is bound to host by: > > > > + > > > > +nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL; > > > > +memcpy(_op->data, _data, sizeof(bind_data)); > > > > +ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op); > > > > + > > > > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or > > > > GVA->GPA. > > > > +GVA->GPA page tables are available when PASID (Process Address Space > > > > +GVA->ID) > > > > +is exposed to guest. e.g. guest with PASID-capable devices assigned. 
> > > > +For such page table binding, the bind_data should include PASID info, > > > > +which is allocated by guest itself or by host. This depends on > > > > +hardware vendor e.g. Intel VT-d requires to allocate PASID from host. > > > > +This requirement is available by VFIO_IOMMU_GET_INFO. User space > > > > +could allocate PASID from host by: > > > > + > > > > +req.flags = VFIO_IOMMU_ALLOC_PASID; > > > > +ioctl(container, VFIO_IOMMU_PASID_REQUEST, ); > > > > > > It is not clear how the userspace application determines whether PASIDs > > > must be > > > allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the guest > > > itself > can > > > allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO but what exactly > > > should the userspace application check? > > > > For VT-d, spec 3.0 introduced Virtual Cmd interface for PASID allocation, > > guest request PASID from host if it detects the interface.