Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function
Hi Eric, Thanks for the review, I somehow missed it, my apologies. See comments below. On Tue, 12 Nov 2019 11:28:37 +0100 Auger Eric wrote: > Hi Jacob, > > On 10/24/19 9:55 PM, Jacob Pan wrote: > > When Shared Virtual Address (SVA) is enabled for a guest OS via > > vIOMMU, we need to provide invalidation support at IOMMU API and > > driver level. This patch adds Intel VT-d specific function to > > implement iommu passdown invalidate API for shared virtual address. > > > > The use case is for supporting caching structure invalidation > > of assigned SVM capable devices. Emulated IOMMU exposes queue > > invalidation capability and passes down all descriptors from the > > guest to the physical IOMMU. > > > > The assumption is that guest to host device ID mapping should be > > resolved prior to calling IOMMU driver. Based on the device handle, > > host IOMMU driver can replace certain fields before submit to the > > invalidation queue. > > > > Signed-off-by: Jacob Pan > > Signed-off-by: Ashok Raj > > Signed-off-by: Liu, Yi L > > --- > > drivers/iommu/intel-iommu.c | 170 > > 1 file changed, 170 > > insertions(+) > > > > diff --git a/drivers/iommu/intel-iommu.c > > b/drivers/iommu/intel-iommu.c index 5fab32fbc4b4..a73e76d6457a > > 100644 --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5491,6 +5491,175 @@ static void > > intel_iommu_aux_detach_device(struct iommu_domain *domain, > > aux_domain_remove_dev(to_dmar_domain(domain), dev); } > > > > +/* > > + * 2D array for converting and sanitizing IOMMU generic TLB > > granularity to > > + * VT-d granularity. Invalidation is typically included in the > > unmap operation > > + * as a result of DMA or VFIO unmap. However, for assigned device > > where guest > > + * could own the first level page tables without being shadowed by > > QEMU. In > above sentence needs to be rephrased. Yes, how about this: /* * 2D array for converting and sanitizing IOMMU generic TLB granularity to * VT-d granularity. Invalidation is typically included in the unmap operation * as a result of DMA or VFIO unmap. However, for assigned devices guest * owns the first level page tables. Invalidations of translation caches in the * guest are trapped and passed down to the host. * * vIOMMU in the guest will only expose first level page tables, therefore * we do not include IOTLB granularity for request without PASID (second level). * * For example, to find the VT-d granularity encoding for IOTLB > > + * this case there is no pass down unmap to the host IOMMU as a > > result of unmap > > + * in the guest. Only invalidations are trapped and passed down. > > + * In all cases, only first level TLB invalidation (request with > > PASID) can be > > + * passed down, therefore we do not include IOTLB granularity for > > request > > + * without PASID (second level). > > + * > > + * For an example, to find the VT-d granularity encoding for > > IOTLB > for example sounds better. > > + * type and page selective granularity within PASID: > > + * X: indexed by iommu cache type > > + * Y: indexed by enum iommu_inv_granularity > > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] > > + * > > + * Granu_map array indicates validity of the table. 1: valid, 0: > > invalid > > + * > > + */ > > +const static int > > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { > > + /* PASID based IOTLB, support PASID selective and page > > selective */ > I would rather use the generic terminology, ie. IOTLB invalidation > supports PASID and ADDR granularity Understood. My choice of terminology is based on VT-d spec and this is VT-d only code. Perhaps add the generic terms by the side? i.e. /* * PASID based IOTLB invalidation: PASID selective (per PASID), * page selective (address granularity) */ > > + {0, 1, 1},> + /* PASID based dev TLBs, only support > > all PASIDs or single PASID */ > Device IOLTB invalidation supports DOMAIN and PASID granularities > > + {1, 1, 0}, > > + /* PASID cache */ > PASID cache invalidation support DOMAIN and PASID granularity > > + {1, 1, 0} > > +}; > > + > > +const static u64 > > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = > > { > > + /* PASID based IOTLB */ > > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, > > + /* PASID based dev TLBs */ > > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, > > + /* PASID cache */ > > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, > > +}; > > + > > +static inline int to_vtd_granularity(int type, int granu, u64 > > *vtd_granu) > nit: this looks a bit weird to me to manipulate an u64 here. Why not > use a int Yes, should be int. > > +{ > > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= > > IOMMU_INV_GRANU_NR || > > + !inv_type_granu_map[type][granu]) > > + return -EINVAL; > > + > > + *vtd_granu = inv_type_granu_table[type][granu];> + > > + return 0;
Re: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function
On Wed, 12 Feb 2020 14:13:37 +0100 Auger Eric wrote: > Hi Jacob, > > On 1/29/20 7:01 AM, Jacob Pan wrote: > > When Shared Virtual Address (SVA) is enabled for a guest OS via > > vIOMMU, we need to provide invalidation support at IOMMU API and > > driver level. This patch adds Intel VT-d specific function to > > implement iommu passdown invalidate API for shared virtual address. > > > > The use case is for supporting caching structure invalidation > > of assigned SVM capable devices. Emulated IOMMU exposes queue > > invalidation capability and passes down all descriptors from the > > guest to the physical IOMMU. > > > > The assumption is that guest to host device ID mapping should be > > resolved prior to calling IOMMU driver. Based on the device handle, > > host IOMMU driver can replace certain fields before submit to the > > invalidation queue. > > > > Signed-off-by: Jacob Pan > > Signed-off-by: Ashok Raj > > Signed-off-by: Liu, Yi L > > I sent comments on the v7 in https://lkml.org/lkml/2019/11/12/266 > I don't see any of them taken into account and if I am not wrong we > did not discuss their (un)relevance on the ML ;-) > > I let you have a look at them then. > Sorry, I missed it. Let me reply to your original comments. Thanks! > Thanks > > Eric > > --- > > drivers/iommu/intel-iommu.c | 173 > > 1 file changed, 173 > > insertions(+) > > > > diff --git a/drivers/iommu/intel-iommu.c > > b/drivers/iommu/intel-iommu.c index 8a4136e805ac..b8aa6479b87f > > 100644 --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5605,6 +5605,178 @@ static void > > intel_iommu_aux_detach_device(struct iommu_domain *domain, > > aux_domain_remove_dev(to_dmar_domain(domain), dev); } > > > > +/* > > + * 2D array for converting and sanitizing IOMMU generic TLB > > granularity to > > + * VT-d granularity. Invalidation is typically included in the > > unmap operation > > + * as a result of DMA or VFIO unmap. However, for assigned device > > where guest > > + * could own the first level page tables without being shadowed by > > QEMU. In > > + * this case there is no pass down unmap to the host IOMMU as a > > result of unmap > > + * in the guest. Only invalidations are trapped and passed down. > > + * In all cases, only first level TLB invalidation (request with > > PASID) can be > > + * passed down, therefore we do not include IOTLB granularity for > > request > > + * without PASID (second level). > > + * > > + * For an example, to find the VT-d granularity encoding for IOTLB > > + * type and page selective granularity within PASID: > > + * X: indexed by iommu cache type > > + * Y: indexed by enum iommu_inv_granularity > > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] > > + * > > + * Granu_map array indicates validity of the table. 1: valid, 0: > > invalid > > + * > > + */ > > +const static int > > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { > > + /* PASID based IOTLB, support PASID selective and page > > selective */ > > + {0, 1, 1}, > > + /* PASID based dev TLBs, only support all PASIDs or single > > PASID */ > > + {1, 1, 0}, > > + /* PASID cache */ > > + {1, 1, 0} > > +}; > > + > > +const static u64 > > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = > > { > > + /* PASID based IOTLB */ > > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, > > + /* PASID based dev TLBs */ > > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, > > + /* PASID cache */ > > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, > > +}; > > + > > +static inline int to_vtd_granularity(int type, int granu, u64 > > *vtd_granu) +{ > > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= > > IOMMU_INV_GRANU_NR || > > + !inv_type_granu_map[type][granu]) > > + return -EINVAL; > > + > > + *vtd_granu = inv_type_granu_table[type][granu]; > > + > > + return 0; > > +} > > + > > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) > > +{ > > + u64 nr_pages = (granu_size * nr_granules) >> > > VTD_PAGE_SHIFT; + > > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 > > for 2MB, etc. > > +* IOMMU cache invalidate API passes granu_size in bytes, > > and number of > > +* granu size in contiguous memory. > > +*/ > > + return order_base_2(nr_pages); > > +} > > + > > +#ifdef CONFIG_INTEL_IOMMU_SVM > > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, > > + struct device *dev, struct > > iommu_cache_invalidate_info *inv_info) +{ > > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > > + struct device_domain_info *info; > > + struct intel_iommu *iommu; > > + unsigned long flags; > > + int cache_type; > > + u8 bus, devfn; > > + u16 did, sid; > > + int ret = 0; > > + u64 size; > > + > > + if (!inv_info || !dmar_domain || > > + inv_info->version != > > IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
Re: [PATCH V9 05/10] iommu/vt-d: Support flushing more translation cache types
Hi Eric, On Wed, 12 Feb 2020 13:55:25 +0100 Auger Eric wrote: > Hi Jacob, > > On 1/29/20 7:01 AM, Jacob Pan wrote: > > When Shared Virtual Memory is exposed to a guest via vIOMMU, > > scalable IOTLB invalidation may be passed down from outside IOMMU > > subsystems. This patch adds invalidation functions that can be used > > for additional translation cache types. > > > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/dmar.c| 33 + > > drivers/iommu/intel-pasid.c | 3 ++- > > include/linux/intel-iommu.h | 20 > > 3 files changed, 51 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > > index 071bb42bbbc5..206733ec8140 100644 > > --- a/drivers/iommu/dmar.c > > +++ b/drivers/iommu/dmar.c > > @@ -1411,6 +1411,39 @@ void qi_flush_piotlb(struct intel_iommu > > *iommu, u16 did, u32 pasid, u64 addr, qi_submit_sync(, iommu); > > } > > > > +/* PASID-based device IOTLB Invalidate */ > > +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, > > u16 pfsid, > > + u32 pasid, u16 qdep, u64 addr, unsigned > > size_order, u64 granu) +{ > > + struct qi_desc desc = {.qw2 = 0, .qw3 = 0}; > > + > > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | > > QI_DEV_EIOTLB_SID(sid) | > > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | > > + QI_DEV_IOTLB_PFSID(pfsid); > > + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu); > > + > > + /* If S bit is 0, we only flush a single page. If S bit is > > set, > > +* The least significant zero bit indicates the > > invalidation address > > +* range. VT-d spec 6.5.2.6. > > +* e.g. address bit 12[0] indicates 8KB, 13[0] indicates > > 16KB. > > +*/ > > + if (!size_order) { > > + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & > > ~QI_DEV_EIOTLB_SIZE; > > + } else { > > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + > > size_order); > > + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | > > QI_DEV_EIOTLB_SIZE; > > + } > > + qi_submit_sync(, iommu); > I made some comments in > https://lkml.org/lkml/2019/8/14/1311 > that do not seem to have been taken into account. Or do I miss > something? > I missed adding these changes. At the time Baolu was doing cache flush consolidation so I wasn't sure if I could use his code completely. This patch is on top of his consolidated flush code with what is still needed for vSVA. Then I forgot to address your comments. Sorry about that. > More generally having an individual history log would be useful and > speed up the review. > Will add history to each patch, e.g. like this? --- v8 -> v9 --- > Thanks > > Eric > > +} > > + > > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 > > granu, int pasid) +{ > > + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0}; > > + > > + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | > > QI_PC_GRAN(granu) | QI_PC_TYPE; > > + qi_submit_sync(, iommu); > > +} > > + > > /* > > * Disable Queued Invalidation interface. > > */ > > diff --git a/drivers/iommu/intel-pasid.c > > b/drivers/iommu/intel-pasid.c index bd067af4d20b..b100f51407f9 > > 100644 --- a/drivers/iommu/intel-pasid.c > > +++ b/drivers/iommu/intel-pasid.c > > @@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct > > intel_iommu *iommu, { > > struct qi_desc desc; > > > > - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | > > QI_PC_PASID(pasid); > > + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) | > > + QI_PC_PASID(pasid) | QI_PC_TYPE; > > desc.qw1 = 0; > > desc.qw2 = 0; > > desc.qw3 = 0; > > diff --git a/include/linux/intel-iommu.h > > b/include/linux/intel-iommu.h index b0ffecbc0dfc..dd9fa61689bc > > 100644 --- a/include/linux/intel-iommu.h > > +++ b/include/linux/intel-iommu.h > > @@ -332,7 +332,7 @@ enum { > > #define QI_IOTLB_GRAN(gran)(((u64)gran) >> > > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr) > > (((u64)addr) & VTD_PAGE_MASK) #define > > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define > > QI_IOTLB_AM(am) (((u8)am)) +#define > > QI_IOTLB_AM(am) (((u8)am) & 0x3f) > > #define QI_CC_FM(fm) (((u64)fm) << 48) > > #define QI_CC_SID(sid) (((u64)sid) << 32) > > @@ -351,16 +351,21 @@ enum { > > #define QI_PC_DID(did) (((u64)did) << 16) > > #define QI_PC_GRAN(gran) (((u64)gran) << 4) > > > > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) > > -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1)) > > +/* PASID cache invalidation granu */ > > +#define QI_PC_ALL_PASIDS 0 > > +#define QI_PC_PASID_SEL1 > > > > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) > > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6) > > -#define QI_EIOTLB_AM(am) (((u64)am)) > > +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f) > > #define QI_EIOTLB_PASID(pasid) (((u64)pasid)
[RFC PATCH] iommu/iova: Add a best-fit algorithm
From: Liam Mark Using the best-fit algorithm, instead of the first-fit algorithm, may reduce fragmentation when allocating IOVAs. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/dma-iommu.c | 17 +++ drivers/iommu/iova.c | 73 +-- include/linux/dma-iommu.h | 7 + include/linux/iova.h | 1 + 4 files changed, 96 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index a2e96a5..af08770 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct device *dev, if (unlikely(ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))) return iommu_attach_device(domain, dev); + return 0; +} + +/* + * Should be called prior to using dma-apis. + */ +int iommu_dma_enable_best_fit_algo(struct device *dev) +{ + struct iommu_domain *domain; + struct iova_domain *iovad; + + domain = iommu_get_domain_for_dev(dev); + if (!domain || !domain->iova_cookie) + return -EINVAL; + iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad; + iovad->best_fit = true; return 0; } +EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo); /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 0e6a953..716b05f 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad, iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR; rb_link_node(>anchor.node, NULL, >rbroot.rb_node); rb_insert_color(>anchor.node, >rbroot); + iovad->best_fit = false; init_iova_rcaches(iovad); } EXPORT_SYMBOL_GPL(init_iova_domain); @@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, return -ENOMEM; } +static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad, + unsigned long size, unsigned long limit_pfn, + struct iova *new, bool size_aligned) +{ + struct rb_node *curr, *prev; + struct iova *curr_iova, *prev_iova; + unsigned long flags; + unsigned long align_mask = ~0UL; + struct rb_node *candidate_rb_parent; + unsigned long new_pfn, candidate_pfn = ~0UL; + unsigned long gap, candidate_gap = ~0UL; + + if (size_aligned) + align_mask <<= limit_align(iovad, fls_long(size - 1)); + + /* Walk the tree backwards */ + spin_lock_irqsave(>iova_rbtree_lock, flags); + curr = >anchor.node; + prev = rb_prev(curr); + for (; prev; curr = prev, prev = rb_prev(curr)) { + curr_iova = rb_entry(curr, struct iova, node); + prev_iova = rb_entry(prev, struct iova, node); + + limit_pfn = min(limit_pfn, curr_iova->pfn_lo); + new_pfn = (limit_pfn - size) & align_mask; + gap = curr_iova->pfn_lo - prev_iova->pfn_hi - 1; + if ((limit_pfn >= size) && (new_pfn > prev_iova->pfn_hi) + && (gap < candidate_gap)) { + candidate_gap = gap; + candidate_pfn = new_pfn; + candidate_rb_parent = curr; + if (gap == size) + goto insert; + } + } + + curr_iova = rb_entry(curr, struct iova, node); + limit_pfn = min(limit_pfn, curr_iova->pfn_lo); + new_pfn = (limit_pfn - size) & align_mask; + gap = curr_iova->pfn_lo - iovad->start_pfn; + if (limit_pfn >= size && new_pfn >= iovad->start_pfn && + gap < candidate_gap) { + candidate_gap = gap; + candidate_pfn = new_pfn; + candidate_rb_parent = curr; + } + +insert: + if (candidate_pfn == ~0UL) { + spin_unlock_irqrestore(>iova_rbtree_lock, flags); + return -ENOMEM; + } + + /* pfn_lo will point to size aligned address if size_aligned is set */ + new->pfn_lo = candidate_pfn; + new->pfn_hi = new->pfn_lo + size - 1; + + /* If we have 'prev', it's a valid place to start the insertion. */ + iova_insert_rbtree(>rbroot, new, candidate_rb_parent); + spin_unlock_irqrestore(>iova_rbtree_lock, flags); + return 0; +} + static struct kmem_cache *iova_cache; static unsigned int iova_cache_users; static DEFINE_MUTEX(iova_cache_mutex); @@ -302,8 +366,13 @@ struct iova * if (!new_iova) return NULL; - ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn + 1, - new_iova, size_aligned); + if (iovad->best_fit) { + ret = __alloc_and_insert_iova_best_fit(iovad, size, +
[RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
From: Liam Mark Some devices have a memory map which contains gaps or holes. In order for the device to have as much IOVA space as possible, allow its driver to inform the DMA-IOMMU layer that it should not allocate addresses from these holes. Change-Id: I15bd1d313d889c2572d0eb2adecf6bebde3267f7 Signed-off-by: Isaac J. Manjarres --- drivers/iommu/dma-iommu.c | 28 include/linux/dma-iommu.h | 9 + 2 files changed, 37 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index a2e96a5..3b83e1a 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -368,6 +368,34 @@ static int iommu_dma_deferred_attach(struct device *dev, return 0; } +/* + * Should be called prior to using dma-apis + */ +int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base, + u64 size) +{ + struct iommu_domain *domain; + struct iommu_dma_cookie *cookie; + struct iova_domain *iovad; + unsigned long pfn_lo, pfn_hi; + + domain = iommu_get_domain_for_dev(dev); + if (!domain || !domain->iova_cookie) + return -EINVAL; + + cookie = domain->iova_cookie; + iovad = >iovad; + + /* iova will be freed automatically by put_iova_domain() */ + pfn_lo = iova_pfn(iovad, base); + pfn_hi = iova_pfn(iovad, base + size - 1); + if (!reserve_iova(iovad, pfn_lo, pfn_hi)) + return -EINVAL; + + return 0; +} +EXPORT_SYMBOL(iommu_dma_reserve_iova); + /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API *page flags. diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 2112f21..79eef7c 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -37,6 +37,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); +int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base, + u64 size); + #else /* CONFIG_IOMMU_DMA */ struct iommu_domain; @@ -78,5 +81,11 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_he { } +static inline int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base, +u64 size) +{ + return -ENODEV; +} + #endif /* CONFIG_IOMMU_DMA */ #endif /* __DMA_IOMMU_H */ -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/arm-smmu: fix module name for parameters
Commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module") introduced a side effect that changed the module name from arm-smmu to arm-smmu-mod. This breaks the users of kernel parameters for the driver (e.g. arm-smmu.disable_bypass). This patch changes the module name for parameters back to arm-smmu to be consistent with older kernel. Signed-off-by: Li Yang --- drivers/iommu/arm-smmu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 16c4b87af42b..8d5a19bfde5c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -58,6 +58,8 @@ #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 +#undef MODULE_PARAM_PREFIX +#define MODULE_PARAM_PREFIX "arm-smmu." static int force_stage; module_param(force_stage, int, S_IRUGO); MODULE_PARM_DESC(force_stage, -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
Hi Jerry, On 2020-02-14 8:13 pm, Jerry Snitselaar wrote: Hi Will, On a gigabyte system with Cavium CN8xx, when doing a fio test against an nvme drive we are seeing the following: [ 637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 Those "IOVAs" don't look much like IOVAs from the DMA allocator - if they were physical addresses, would they correspond to an expected region of the physical memory map? I would suspect that this is most likely misbehaviour in the NVMe driver (issuing a write to a non-DMA-mapped address), and the SMMU is just doing its job in blocking and reporting it. I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down further into 5.4-rc1. I don't know smmu or the code well, any thoughts on where to start digging into this? fio test that is being run is: #fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32 Just to clarify, do other tests work OK on the same device? Thanks, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
Hi Will, On a gigabyte system with Cavium CN8xx, when doing a fio test against an nvme drive we are seeing the following: [ 637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down further into 5.4-rc1. I don't know smmu or the code well, any thoughts on where to start digging into this? fio test that is being run is: #fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32 Regards, Jerry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm64 iommu groups issue
On 14/02/2020 2:09 pm, John Garry wrote: @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) /* Set up MSI IRQ domain */ pci_set_msi_domain(dev); + parent = dev->dev.parent; + if (parent && parent->bus == _bus_type) + device_link_add(>dev, parent, DL_FLAG_AUTOPROBE_CONSUMER); + /* Notifier could use PCI capabilities */ dev->match_driver = false; ret = device_add(>dev); -- This would work, but the problem is that if the port driver fails in probing - and not just for -EPROBE_DEFER - then the child device will never probe. This very thing happens on my dev board. However we could expand the device links API to cover this sort of scenario. Yes, that's an undesirable issue, but in fact I think it's mostly indicative that involving drivers in something which is designed to happen at a level below drivers is still fundamentally wrong and doomed to be fragile at best. Right, and even worse is that it relies on the port driver even existing at all. All this iommu group assignment should be taken outside device driver probe paths. However we could still consider device links for sync'ing the SMMU and each device probing. Yes, we should get that for DT now thanks to the of_devlink stuff, but cooking up some equivalent for IORT might be worthwhile. Another thought that crosses my mind is that when pci_device_group() walks up to the point of ACS isolation and doesn't find an existing group, it can still infer that everything it walked past *should* be put in the same group it's then eventually going to return. Unfortunately I can't see an obvious way for it to act on that knowledge, though, since recursive iommu_probe_device() is unlikely to end well. I'd be inclined not to change that code. As for alternatives, it looks pretty difficult to me to disassociate the group allocation from the dma_configure path. Indeed it's non-trivial, but it really does need cleaning up at some point. Having just had yet another spark, does something like the untested super-hack below work at all? I tried it and it doesn't (yet) work. Bleh - further reinforcement of the "ideas after 6PM are bad ideas" rule... So when we try iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(), the iommu_fwspec is still NULL for that device - this is not set until later when the device driver is going to finally probe in iort_iommu_xlate()->iommu_fwspec_init(), and it's too late... And this looks to be the reason for which current iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also. Of course, just adding a 'correct' add_device replay without the of_xlate process doesn't help at all. No wonder this looked suspiciously simpler than where the first idea left off... (on reflection, the core of this idea seems to be recycling the existing iommu_bus_init walk rather than building up a separate "waiting list", while forgetting that that wasn't the difficult part of the original idea anyway) On this current code mentioned, the principle of this seems wrong to me - we call bus_for_each_device(..., add_iommu_group) for the first SMMU in the system which probes, but we attempt to add_iommu_group() for all devices on the bus, even though the SMMU for that device may yet to have probed. Yes, iommu_bus_init() is one of the places still holding a deeply-ingrained assumption that the ops go live for all IOMMU instances at once, which is what warranted the further replay in of_iommu_configure() originally. Moving that out of of_platform_device_create() to support probe deferral is where the trouble really started. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] PCI: Add DMA configuration for virtual platforms
On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: Hardware platforms usually describe the IOMMU topology using either device-tree pointers or vendor-specific ACPI tables. For virtual platforms that don't provide a device-tree, the virtio-iommu device contains a description of the endpoints it manages. That information allows us to probe endpoints after the IOMMU is probed (possibly as late as userspace modprobe), provided it is discovered early enough. Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the endpoint is managed by a vIOMMU that will be loaded later, or 0 in any other case to avoid disturbing the normal DMA configuration methods. When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to virt_dma_configure() is compiled out. As long as the information is consistent, platforms can provide both a device-tree and a built-in topology, and the IOMMU infrastructure is able to deal with multiple DMA configuration methods. Urgh, it's already been established[1] that having IOMMU setup tied to DMA configuration at driver probe time is not just conceptually wrong but actually broken, so the concept here worries me a bit. In a world where of_iommu_configure() and friends are being called much earlier around iommu_probe_device() time, how badly will this fall apart? Robin. [1] https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/ Signed-off-by: Jean-Philippe Brucker --- drivers/pci/pci-driver.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 0454ca0e4e3f..69303a814f21 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -18,6 +18,7 @@ #include #include #include +#include #include "pci.h" #include "pcie/portdrv.h" @@ -1602,6 +1603,10 @@ static int pci_dma_configure(struct device *dev) struct device *bridge; int ret = 0; + ret = virt_dma_configure(dev); + if (ret) + return ret; + bridge = pci_get_host_bridge_device(to_pci_dev(dev)); if (IS_ENABLED(CONFIG_OF) && bridge->parent && ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: With the built-in topology description in place, x86 platforms can now use the virtio-iommu. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 068d4e0e3541..adcbda44d473 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -508,8 +508,9 @@ config HYPERV_IOMMU config VIRTIO_IOMMU bool "Virtio IOMMU driver" depends on VIRTIO=y - depends on ARM64 + depends on (ARM64 || X86) select IOMMU_API + select IOMMU_DMA Can that have an "if X86" for clarity? AIUI it's not necessary for virtio-iommu itself (and really shouldn't be), but is merely to satisfy the x86 arch code's expectation that IOMMU drivers bring their own DMA ops, right? Robin. select INTERVAL_TREE help Para-virtualised IOMMU driver with virtio. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed
Hi Daniel, sorry for the delay On Fri, 2020-02-14 at 17:02 +0800, Daniel Drake wrote: > From: Jon Derrick > > The PCI devices handled by intel-iommu may have a DMA requester on > another bus, such as VMD subdevices needing to use the VMD endpoint. > > The real DMA device is now used for the DMA mapping, but one case was > missed earlier: if the VMD device (and hence subdevices too) are under > IOMMU_DOMAIN_IDENTITY, mappings do not work. > > Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by > creating an iommu DMA mapping, and fall back on dma_direct_map_page() > for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY > case is broken when intel_page_page() handles a subdevice. intel_map_page? > > We observe that at iommu attach time, dmar_insert_one_dev_info() for > the subdevices will never set dev->archdata.iommu. This is because > that function uses find_domain() to check if there is already an IOMMU > for the device, and find_domain() then defers to the real DMA device > which does have one. Thus dmar_insert_one_dev_info() returns without > assigning dev->archdata.iommu. > > Then, later: > > 1. intel_map_page() checks if an IOMMU mapping is needed by calling >iommu_need_mapping() on the subdevice. identity_mapping() returns >false because dev->archdata.iommu is NULL, so this function >returns false indicating that mapping is needed. > 2. __intel_map_single() is called to create the mapping. > 3. __intel_map_single() calls find_domain(). This function now returns >the IDENTITY domain corresponding to the real DMA device. > 4. __intel_map_single() calls domain_get_iommu() on this "real" domain. >A failure is hit and the entire operation is aborted, because this >codepath is not intended to handle IDENTITY mappings: >if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA)) >return NULL; > > Fix this by using the real DMA device when checking if a mapping is > needed, while also considering the subdevice DMA mask. > The IDENTITY case will then directly fall back on dma_direct_map_page(). > > Reported-by: Daniel Drake > Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") > Signed-off-by: Daniel Drake > --- > > Notes: > v2: switch to Jon's approach instead. > > This problem was detected with a non-upstream patch > "PCI: Add Intel remapped NVMe device support" > (https://marc.info/?l=linux-ide=156015271021615=2) > > This patch creates PCI devices a bit like VMD, and hence > I believe VMD would hit this class of problem for any cases where > the VMD device is in the IDENTITY domain. (I presume the reason this > bug was not seen already there is that it is in a DMA iommu domain). > > However this hasn't actually been tested on VMD (don't have the hardware) > so if I've missed anything and/or it's not a real issue then feel free to > drop this patch. > > drivers/iommu/intel-iommu.c | 16 ++-- > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 9dc37672bf89..edbe2866b515 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -3582,19 +3582,23 @@ static struct dmar_domain > *get_private_domain_for_dev(struct device *dev) > /* Check if the dev needs to go through non-identity map and unmap process.*/ > static bool iommu_need_mapping(struct device *dev) > { > + u64 dma_mask, required_dma_mask; > int ret; > > if (iommu_dummy(dev)) > return false; > > - ret = identity_mapping(dev); > - if (ret) { > - u64 dma_mask = *dev->dma_mask; > + dma_mask = *dev->dma_mask; > + if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) > + dma_mask = dev->coherent_dma_mask; > + required_dma_mask = dma_direct_get_required_mask(dev); > > - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) > - dma_mask = dev->coherent_dma_mask; > + if (dev_is_pci(dev)) > + dev = _real_dma_dev(to_pci_dev(dev))->dev; > > - if (dma_mask >= dma_direct_get_required_mask(dev)) > + ret = identity_mapping(dev); > + if (ret) { > + if (dma_mask >= required_dma_mask) > return false; > > /* I think this might work better since it shortcuts the mask check in the non-identity case. Tests fine when VMD is forced into Identity domain. Feel free to add my sign-off for either patch you go with. Thanks, Jon diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 9dc3767..7ffd252 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3582,12 +3582,16 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev) /* Check if the dev needs to go through non-identity map and unmap
[PATCH] iommu/virtio: Build virtio-iommu as module
From: Jean-Philippe Brucker Now that the infrastructure changes are in place, enable virtio-iommu to be built as a module. Remove the redundant pci_request_acs() call, since it's not exported but is already invoked during DMA setup. Signed-off-by: Jean-Philippe Brucker --- This conflicts with the multiplatform work [1] since they both change Kconfig. Locally I have this patch applied on top of that series but there is no functional dependency between the two. [1] https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/ --- drivers/iommu/Kconfig| 4 ++-- drivers/iommu/virtio-iommu.c | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index adcbda44d473..bfd4e5fcd6aa 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -506,8 +506,8 @@ config HYPERV_IOMMU guests to run with x2APIC mode enabled. config VIRTIO_IOMMU - bool "Virtio IOMMU driver" - depends on VIRTIO=y + tristate "Virtio IOMMU driver" + depends on VIRTIO depends on (ARM64 || X86) select IOMMU_API select IOMMU_DMA diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index f18ba8e22ebd..5429c12c879b 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -1084,7 +1084,6 @@ static int viommu_probe(struct virtio_device *vdev) #ifdef CONFIG_PCI if (pci_bus_type.iommu_ops != _ops) { - pci_request_acs(); ret = bus_set_iommu(_bus_type, _ops); if (ret) goto err_unregister; -- 2.25.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.4 072/100] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index eb9937225d645..6c10f307a1c98 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1090,7 +1090,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.9 102/141] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 7bd98585d78d2..48d3820087881 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1103,7 +1103,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.14 131/186] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 09eb258a9a7de..29feafa8007fb 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1145,7 +1145,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.19 214/252] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
From: Lu Baolu [ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ] Address field in device TLB invalidation descriptor is qualified by the S field. If S field is zero, a single page at page address specified by address [63:12] is requested to be invalidated. If S field is set, the least significant bit in the address field with value 0b (say bit N) indicates the invalidation address range. The spec doesn't require the address [N - 1, 0] to be cleared, hence remove the unnecessary WARN_ON_ONCE(). Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order to invalidating all the cached mappings on an endpoint, and below overflow error will be triggered. [...] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3 shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] Reported-and-tested-by: Frank Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/dmar.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 7f9824b0609e7..72994d67bc5b9 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1345,7 +1345,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, struct qi_desc desc; if (mask) { - WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1)); addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1; desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE; } else -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.19 143/252] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA
From: Shameer Kolothum [ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ] CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA"). Add it back. Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA") Signed-off-by: Shameer Kolothum Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 2ab7100bcff12..eff1f3aa5ef43 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -810,6 +810,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31); break; case CMDQ_OP_TLBI_NH_VA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.19 177/252] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index eff1f3aa5ef43..6b7664052b5be 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1185,7 +1185,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.19 019/252] iommu/vt-d: Fix off-by-one in PASID allocation
From: Jacob Pan [ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ] PASID allocator uses IDR which is exclusive for the end of the allocation range. There is no need to decrement pasid_max. Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA") Reported-by: Eric Auger Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index fd8730b2cd46e..5944d3b4dca37 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -377,7 +377,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ ret = intel_pasid_alloc_id(svm, !!cap_caching_mode(iommu->cap), - pasid_max - 1, GFP_KERNEL); + pasid_max, GFP_KERNEL); if (ret < 0) { kfree(svm); kfree(sdev); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 395/459] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
From: Lu Baolu [ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ] Address field in device TLB invalidation descriptor is qualified by the S field. If S field is zero, a single page at page address specified by address [63:12] is requested to be invalidated. If S field is set, the least significant bit in the address field with value 0b (say bit N) indicates the invalidation address range. The spec doesn't require the address [N - 1, 0] to be cleared, hence remove the unnecessary WARN_ON_ONCE(). Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order to invalidating all the cached mappings on an endpoint, and below overflow error will be triggered. [...] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3 shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] Reported-and-tested-by: Frank Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/dmar.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index eecd6a4216672..7196cabafb252 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1351,7 +1351,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, struct qi_desc desc; if (mask) { - WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1)); addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1; desc.qw1 = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE; } else -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 322/459] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index ee8d48d863e16..ef6af714a7e64 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1643,7 +1643,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, STRTAB_STE_1_EATS_TRANS)); arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 271/459] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA
From: Shameer Kolothum [ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ] CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA"). Add it back. Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA") Signed-off-by: Shameer Kolothum Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index ed90361b84dc7..ee8d48d863e16 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -856,6 +856,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31); break; case CMDQ_OP_TLBI_NH_VA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 253/459] iommu/vt-d: Match CPU and IOMMU paging mode
From: Jacob Pan [ Upstream commit 79db7e1b4cf2a006f556099c13de3b12970fc6e3 ] When setting up first level page tables for sharing with CPU, we need to ensure IOMMU can support no less than the levels supported by the CPU. It is not adequate, as in the current code, to set up 5-level paging in PASID entry First Level Paging Mode(FLPM) solely based on CPU. Currently, intel_pasid_setup_first_level() is only used by native SVM code which already checks paging mode matches. However, future use of this helper function may not be limited to native SVM. https://lkml.org/lkml/2019/11/18/1037 Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface") Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-pasid.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 040a445be3009..e7cb0b8a73327 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -499,8 +499,16 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, } #ifdef CONFIG_X86 - if (cpu_feature_enabled(X86_FEATURE_LA57)) - pasid_set_flpm(pte, 1); + /* Both CPU and IOMMU paging mode need to match */ + if (cpu_feature_enabled(X86_FEATURE_LA57)) { + if (cap_5lp_support(iommu->cap)) { + pasid_set_flpm(pte, 1); + } else { + pr_err("VT-d has no 5-level paging support for CPU\n"); + pasid_clear_entry(pte); + return -EINVAL; + } + } #endif /* CONFIG_X86 */ pasid_set_domain_id(pte, did); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/3] virtio-iommu on non-devicetree platforms
Add topology description to the virtio-iommu driver and enable x86 platforms. Since the RFC [1] I've mostly given up on ACPI tables, since the internal discussions seem to have reached a dead end. The built-in topology description presented here isn't ideal, but it is simple to implement and doesn't impose a dependency on ACPI or device-tree, which can be beneficial to lightweight hypervisors. The built-in description is an array in the virtio config space. The driver parses the config space early and postpones endpoint probe until the virtio-iommu device is ready. Each element in the array describes either a PCI range or a single MMIO endpoint, and their associated endpoint IDs: struct virtio_iommu_topo_pci_range { __le16 type;/* 1: PCI range */ __le16 hierarchy; /* PCI domain number */ __le16 requester_start; /* First BDF */ __le16 requester_end; /* Last BDF */ __le32 endpoint_start; /* First endpoint ID */ }; struct virtio_iommu_topo_endpoint { __le16 type;/* 2: Endpoint */ __le16 reserved;/* 0 */ __le32 endpoint;/* Endpoint ID */ __le64 address; /* First MMIO address */ }; You can find the QEMU patches based on Eric's latest device on my virtio-iommu/devel branch [2]. I test on both x86 q35, and aarch64 virt machine with edk2. [1] https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/ [2] https://jpbrucker.net/git/qemu virtio-iommu/devel Jean-Philippe Brucker (3): iommu/virtio: Add topology description to virtio-iommu config space PCI: Add DMA configuration for virtual platforms iommu/virtio: Enable x86 support MAINTAINERS | 2 + drivers/iommu/Kconfig | 13 +- drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu-topology.c | 343 ++ drivers/iommu/virtio-iommu.c | 3 + drivers/pci/pci-driver.c | 5 + include/linux/virt_iommu.h| 19 ++ include/uapi/linux/virtio_iommu.h | 26 ++ 8 files changed, 411 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/virtio-iommu-topology.c create mode 100644 include/linux/virt_iommu.h -- 2.25.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/3] iommu/virtio: Add topology description to virtio-iommu config space
Platforms without device-tree do not currently have a method for describing the vIOMMU topology. Provide a topology description embedded into the virtio device. Use PCI FIXUP to probe the config space early, because we need to discover the topology before any DMA configuration takes place, and the virtio driver may be loaded much later. Since we discover the topology description when probing the PCI hierarchy, the virtual IOMMU cannot manage other platform devices discovered earlier. This solution isn't elegant nor foolproof, but is the best we can do at the moment and works with existing virtio-iommu implementations. It also enables an IOMMU for lightweight hypervisors that do not rely on firmware methods for booting. Signed-off-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- MAINTAINERS | 2 + drivers/iommu/Kconfig | 10 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu-topology.c | 343 ++ drivers/iommu/virtio-iommu.c | 3 + include/linux/virt_iommu.h| 19 ++ include/uapi/linux/virtio_iommu.h | 26 ++ 7 files changed, 404 insertions(+) create mode 100644 drivers/iommu/virtio-iommu-topology.c create mode 100644 include/linux/virt_iommu.h diff --git a/MAINTAINERS b/MAINTAINERS index 38fe2f3f7b6f..6b978b0d0c90 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17780,6 +17780,8 @@ M: Jean-Philippe Brucker L: virtualizat...@lists.linux-foundation.org S: Maintained F: drivers/iommu/virtio-iommu.c +F: drivers/iommu/virtio-iommu-topology.c +F: include/linux/virt_iommu.h F: include/uapi/linux/virtio_iommu.h VIRTUAL BOX GUEST DEVICE DRIVER diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index d2fade984999..068d4e0e3541 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -516,4 +516,14 @@ config VIRTIO_IOMMU Say Y here if you intend to run this kernel as a guest. +config VIRTIO_IOMMU_TOPOLOGY + bool "Topology properties for the virtio-iommu" + depends on VIRTIO_IOMMU + default y + help + Enable early probing of the virtio-iommu device, to detect the + built-in topology description. + + Say Y here if you intend to run this kernel as a guest. + endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 2104fb8afc06..f295cacf9c6e 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -37,3 +37,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c new file mode 100644 index ..e4ab49701df5 --- /dev/null +++ b/drivers/iommu/virtio-iommu-topology.c @@ -0,0 +1,343 @@ +// SPDX-License-Identifier: GPL-2.0 +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include + +struct viommu_cap_config { + u8 bar; + u32 length; /* structure size */ + u32 offset; /* structure offset within the bar */ +}; + +union viommu_topo_cfg { + __le16 type; + struct virtio_iommu_topo_pci_range pci; + struct virtio_iommu_topo_endpoint ep; +}; + +struct viommu_spec { + struct device *dev; /* transport device */ + struct fwnode_handle*fwnode; + struct iommu_ops*ops; + struct list_headlist; + size_t num_items; + /* The config array of length num_items follows */ + union viommu_topo_cfg cfg[]; +}; + +static LIST_HEAD(viommus); +static DEFINE_MUTEX(viommus_lock); + +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field) + +static inline int viommu_pci_find_capability(struct pci_dev *dev, u8 cfg_type, +struct viommu_cap_config *cap) +{ + int pos; + u8 bar; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); +pos > 0; +pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type; + + pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), ); + if (type != cfg_type) + continue; + + pci_read_config_byte(dev, pos + VPCI_FIELD(bar), ); + + /* Ignore structures with reserved BAR values */ + if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5) + continue; + + cap->bar = bar; + pci_read_config_dword(dev, pos + VPCI_FIELD(length), + >length); +
[PATCH 3/3] iommu/virtio: Enable x86 support
With the built-in topology description in place, x86 platforms can now use the virtio-iommu. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 068d4e0e3541..adcbda44d473 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -508,8 +508,9 @@ config HYPERV_IOMMU config VIRTIO_IOMMU bool "Virtio IOMMU driver" depends on VIRTIO=y - depends on ARM64 + depends on (ARM64 || X86) select IOMMU_API + select IOMMU_DMA select INTERVAL_TREE help Para-virtualised IOMMU driver with virtio. -- 2.25.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 254/459] iommu/vt-d: Avoid sending invalid page response
From: Jacob Pan [ Upstream commit 5f75585e19cc7018bf2016aa771632081ee2f313 ] Page responses should only be sent when last page in group (LPIG) or private data is present in the page request. This patch avoids sending invalid descriptors. Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support") Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index ff7a3f9add325..518d0b2d12afd 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -654,11 +654,10 @@ static irqreturn_t prq_event_thread(int irq, void *d) if (req->priv_data_present) memcpy(, req->priv_data, sizeof(req->priv_data)); + resp.qw2 = 0; + resp.qw3 = 0; + qi_submit_sync(, iommu); } - resp.qw2 = 0; - resp.qw3 = 0; - qi_submit_sync(, iommu); - head = (head + sizeof(*req)) & PRQ_RING_MASK; } -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 220/459] iommu/amd: Check feature support bit before accessing MSI capability registers
From: Suravee Suthikulpanit [ Upstream commit 813071438e83d338ba5cfe98b3b26c890dc0a6c0 ] The IOMMU MMIO access to MSI capability registers is available only if the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit is set if the EFR[XtSup] is set, which might not be the case. Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address low/high and MSI data registers via the MMIO. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 17 - drivers/iommu/amd_iommu_types.h | 1 + 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 483f7bc379fa8..61628c906ce11 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -147,7 +147,7 @@ bool amd_iommu_dump; bool amd_iommu_irq_remap __read_mostly; int amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC; -static int amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE; +static int amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; static bool amd_iommu_detected; static bool __initdata amd_iommu_disabled; @@ -1534,8 +1534,15 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET; if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY; - if (((h->efr_reg & (0x1 << IOMMU_EFR_XTSUP_SHIFT)) == 0)) - amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; + /* +* Note: Since iommu_update_intcapxt() leverages +* the IOMMU MMIO access to MSI capability block registers +* for MSI address lo/hi/data, we need to check both +* EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support. +*/ + if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) && + (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT))) + amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE; break; default: return -EINVAL; @@ -1996,8 +2003,8 @@ static int iommu_init_intcapxt(struct amd_iommu *iommu) struct irq_affinity_notify *notify = >intcapxt_notify; /** -* IntCapXT requires XTSup=1, which can be inferred -* amd_iommu_xt_mode. +* IntCapXT requires XTSup=1 and MsiCapMmioSup=1, +* which can be inferred from amd_iommu_xt_mode. */ if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE) return 0; diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index fc956479b94e6..1b4c340890662 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -383,6 +383,7 @@ /* IOMMU Extended Feature Register (EFR) */ #define IOMMU_EFR_XTSUP_SHIFT 2 #define IOMMU_EFR_GASUP_SHIFT 7 +#define IOMMU_EFR_MSICAPMMIOSUP_SHIFT 46 #define MAX_DOMAIN_ID 65536 -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 221/459] iommu/amd: Only support x2APIC with IVHD type 11h/40h
From: Suravee Suthikulpanit [ Upstream commit 966b753cf3969553ca50bacd2b8c4ddade5ecc9e ] Current implementation for IOMMU x2APIC support makes use of the MMIO access to MSI capability block registers, which requires checking EFR[MsiCapMmioSup]. However, only IVHD type 11h/40h contain the information, and not in the IVHD type 10h IOMMU feature reporting field. Since the BIOS in newer systems, which supports x2APIC, would normally contain IVHD type 11h/40h, remove the IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support x2APIC with IVHD type 11h/40h. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 2 -- drivers/iommu/amd_iommu_types.h | 1 - 2 files changed, 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 61628c906ce11..d7cbca8bf2cd4 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1523,8 +1523,6 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET; if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)) amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY; - if (((h->efr_attr & (0x1 << IOMMU_FEAT_XTSUP_SHIFT)) == 0)) - amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; break; case 0x11: case 0x40: diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 1b4c340890662..daeabd98c60e2 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -377,7 +377,6 @@ #define IOMMU_CAP_EFR 27 /* IOMMU Feature Reporting Field (for IVHD type 10h */ -#define IOMMU_FEAT_XTSUP_SHIFT 0 #define IOMMU_FEAT_GASUP_SHIFT 6 /* IOMMU Extended Feature Register (EFR) */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.4 196/459] PCI: Add nr_devfns parameter to pci_add_dma_alias()
From: James Sewart [ Upstream commit 09298542cd891b43778db1f65aa3613aa5a562eb ] Add a "nr_devfns" parameter to pci_add_dma_alias() so it can be used to create DMA aliases for a range of devfns. [bhelgaas: incorporate nr_devfns fix from James, update quirk_pex_vca_alias() and setup_aliases()] Signed-off-by: James Sewart Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu.c | 7 ++- drivers/pci/pci.c | 22 +- drivers/pci/quirks.c | 23 +-- include/linux/pci.h | 2 +- 4 files changed, 29 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 454695b372c8c..8bd5d608a82c2 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -272,11 +272,8 @@ static struct pci_dev *setup_aliases(struct device *dev) */ ivrs_alias = amd_iommu_alias_table[pci_dev_id(pdev)]; if (ivrs_alias != pci_dev_id(pdev) && - PCI_BUS_NUM(ivrs_alias) == pdev->bus->number) { - pci_add_dma_alias(pdev, ivrs_alias & 0xff); - pci_info(pdev, "Added PCI DMA alias %02x.%d\n", - PCI_SLOT(ivrs_alias), PCI_FUNC(ivrs_alias)); - } + PCI_BUS_NUM(ivrs_alias) == pdev->bus->number) + pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1); clone_aliases(pdev); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index cbf3d3889874c..981ae16f935bc 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5875,7 +5875,8 @@ EXPORT_SYMBOL_GPL(pci_pr3_present); /** * pci_add_dma_alias - Add a DMA devfn alias for a device * @dev: the PCI device for which alias is added - * @devfn: alias slot and function + * @devfn_from: alias slot and function + * @nr_devfns: number of subsequent devfns to alias * * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask * which is used to program permissible bus-devfn source addresses for DMA @@ -5891,8 +5892,13 @@ EXPORT_SYMBOL_GPL(pci_pr3_present); * cannot be left as a userspace activity). DMA aliases should therefore * be configured via quirks, such as the PCI fixup header quirk. */ -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn) +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned nr_devfns) { + int devfn_to; + + nr_devfns = min(nr_devfns, (unsigned) MAX_NR_DEVFNS - devfn_from); + devfn_to = devfn_from + nr_devfns - 1; + if (!dev->dma_alias_mask) dev->dma_alias_mask = bitmap_zalloc(MAX_NR_DEVFNS, GFP_KERNEL); if (!dev->dma_alias_mask) { @@ -5900,9 +5906,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn) return; } - set_bit(devfn, dev->dma_alias_mask); - pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n", -PCI_SLOT(devfn), PCI_FUNC(devfn)); + bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns); + + if (nr_devfns == 1) + pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n", + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from)); + else if (nr_devfns > 1) + pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n", + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from), + PCI_SLOT(devfn_to), PCI_FUNC(devfn_to)); } bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 7b6df2d8d6cde..67a9ad3734d18 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3927,7 +3927,7 @@ int pci_dev_specific_reset(struct pci_dev *dev, int probe) static void quirk_dma_func0_alias(struct pci_dev *dev) { if (PCI_FUNC(dev->devfn) != 0) - pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0)); + pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0), 1); } /* @@ -3941,7 +3941,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe476, quirk_dma_func0_alias); static void quirk_dma_func1_alias(struct pci_dev *dev) { if (PCI_FUNC(dev->devfn) != 1) - pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1)); + pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1), 1); } /* @@ -4026,7 +4026,7 @@ static void quirk_fixed_dma_alias(struct pci_dev *dev) id = pci_match_id(fixed_dma_alias_tbl, dev); if (id) - pci_add_dma_alias(dev, id->driver_data); + pci_add_dma_alias(dev, id->driver_data, 1); } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ADAPTEC2, 0x0285, quirk_fixed_dma_alias); @@ -4068,9 +4068,9 @@ DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias); */ static void quirk_mic_x200_dma_alias(struct pci_dev *pdev) { - pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0)); - pci_add_dma_alias(pdev,
[PATCH AUTOSEL 5.4 045/459] iommu/vt-d: Fix off-by-one in PASID allocation
From: Jacob Pan [ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ] PASID allocator uses IDR which is exclusive for the end of the allocation range. There is no need to decrement pasid_max. Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA") Reported-by: Eric Auger Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index dca88f9fdf29a..ff7a3f9add325 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -317,7 +317,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ ret = intel_pasid_alloc_id(svm, !!cap_caching_mode(iommu->cap), - pasid_max - 1, GFP_KERNEL); + pasid_max, GFP_KERNEL); if (ret < 0) { kfree(svm); kfree(sdev); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 459/542] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
From: Lu Baolu [ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ] Address field in device TLB invalidation descriptor is qualified by the S field. If S field is zero, a single page at page address specified by address [63:12] is requested to be invalidated. If S field is set, the least significant bit in the address field with value 0b (say bit N) indicates the invalidation address range. The spec doesn't require the address [N - 1, 0] to be cleared, hence remove the unnecessary WARN_ON_ONCE(). Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order to invalidating all the cached mappings on an endpoint, and below overflow error will be triggered. [...] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3 shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] Reported-and-tested-by: Frank Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/dmar.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 3acfa6a25fa29..fb66f717127d2 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1354,7 +1354,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, struct qi_desc desc; if (mask) { - WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1)); addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1; desc.qw1 = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE; } else -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 458/542] iommu/vt-d: Mark firmware tainted if RMRR fails sanity check
From: Barret Rhoden [ Upstream commit f5a68bb0752e0cf77c06f53f72258e7beb41381b ] RMRR entries describe memory regions that are DMA targets for devices outside the kernel's control. RMRR entries that fail the sanity check are pointing to regions of memory that the firmware did not tell the kernel are reserved or otherwise should not be used. Instead of aborting DMAR processing, this commit marks the firmware as tainted. These RMRRs will still be identity mapped, otherwise, some devices, e.x. graphic devices, will not work during boot. Signed-off-by: Barret Rhoden Signed-off-by: Lu Baolu Fixes: f036c7fa0ab60 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-iommu.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 541896ab3d086..dfedbb04f647d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4320,12 +4320,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) { struct acpi_dmar_reserved_memory *rmrr; struct dmar_rmrr_unit *rmrru; - int ret; rmrr = (struct acpi_dmar_reserved_memory *)header; - ret = arch_rmrr_sanity_check(rmrr); - if (ret) - return ret; + if (arch_rmrr_sanity_check(rmrr)) + WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND, + "Your BIOS is broken; bad RMRR [%#018Lx-%#018Lx]\n" + "BIOS vendor: %s; Ver: %s; Product Version: %s\n", + rmrr->base_address, rmrr->end_address, + dmi_get_system_info(DMI_BIOS_VENDOR), + dmi_get_system_info(DMI_BIOS_VERSION), + dmi_get_system_info(DMI_PRODUCT_VERSION)); rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL); if (!rmrru) -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 370/542] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
From: Will Deacon [ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 2f7680faba49e..6bd6a3f3f4710 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1643,7 +1643,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, STRTAB_STE_1_EATS_TRANS)); arm_smmu_sync_ste_for_sid(smmu, sid); - dst[0] = cpu_to_le64(val); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); arm_smmu_sync_ste_for_sid(smmu, sid); /* It's likely that we'll want to use the new STE soon */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 312/542] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA
From: Shameer Kolothum [ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ] CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA"). Add it back. Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA") Signed-off-by: Shameer Kolothum Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm-smmu-v3.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index effe72eb89e7f..2f7680faba49e 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -856,6 +856,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31); break; case CMDQ_OP_TLBI_NH_VA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 291/542] iommu/vt-d: Match CPU and IOMMU paging mode
From: Jacob Pan [ Upstream commit 79db7e1b4cf2a006f556099c13de3b12970fc6e3 ] When setting up first level page tables for sharing with CPU, we need to ensure IOMMU can support no less than the levels supported by the CPU. It is not adequate, as in the current code, to set up 5-level paging in PASID entry First Level Paging Mode(FLPM) solely based on CPU. Currently, intel_pasid_setup_first_level() is only used by native SVM code which already checks paging mode matches. However, future use of this helper function may not be limited to native SVM. https://lkml.org/lkml/2019/11/18/1037 Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface") Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-pasid.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 040a445be3009..e7cb0b8a73327 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -499,8 +499,16 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, } #ifdef CONFIG_X86 - if (cpu_feature_enabled(X86_FEATURE_LA57)) - pasid_set_flpm(pte, 1); + /* Both CPU and IOMMU paging mode need to match */ + if (cpu_feature_enabled(X86_FEATURE_LA57)) { + if (cap_5lp_support(iommu->cap)) { + pasid_set_flpm(pte, 1); + } else { + pr_err("VT-d has no 5-level paging support for CPU\n"); + pasid_clear_entry(pte); + return -EINVAL; + } + } #endif /* CONFIG_X86 */ pasid_set_domain_id(pte, did); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 292/542] iommu/vt-d: Avoid sending invalid page response
From: Jacob Pan [ Upstream commit 5f75585e19cc7018bf2016aa771632081ee2f313 ] Page responses should only be sent when last page in group (LPIG) or private data is present in the page request. This patch avoids sending invalid descriptors. Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support") Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index ff7a3f9add325..518d0b2d12afd 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -654,11 +654,10 @@ static irqreturn_t prq_event_thread(int irq, void *d) if (req->priv_data_present) memcpy(, req->priv_data, sizeof(req->priv_data)); + resp.qw2 = 0; + resp.qw3 = 0; + qi_submit_sync(, iommu); } - resp.qw2 = 0; - resp.qw3 = 0; - qi_submit_sync(, iommu); - head = (head + sizeof(*req)) & PRQ_RING_MASK; } -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 255/542] iommu/amd: Only support x2APIC with IVHD type 11h/40h
From: Suravee Suthikulpanit [ Upstream commit 966b753cf3969553ca50bacd2b8c4ddade5ecc9e ] Current implementation for IOMMU x2APIC support makes use of the MMIO access to MSI capability block registers, which requires checking EFR[MsiCapMmioSup]. However, only IVHD type 11h/40h contain the information, and not in the IVHD type 10h IOMMU feature reporting field. Since the BIOS in newer systems, which supports x2APIC, would normally contain IVHD type 11h/40h, remove the IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support x2APIC with IVHD type 11h/40h. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 2 -- drivers/iommu/amd_iommu_types.h | 1 - 2 files changed, 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 61628c906ce11..d7cbca8bf2cd4 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1523,8 +1523,6 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET; if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)) amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY; - if (((h->efr_attr & (0x1 << IOMMU_FEAT_XTSUP_SHIFT)) == 0)) - amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; break; case 0x11: case 0x40: diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index f8a7945f3df90..798e1533a1471 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -377,7 +377,6 @@ #define IOMMU_CAP_EFR 27 /* IOMMU Feature Reporting Field (for IVHD type 10h */ -#define IOMMU_FEAT_XTSUP_SHIFT 0 #define IOMMU_FEAT_GASUP_SHIFT 6 /* IOMMU Extended Feature Register (EFR) */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 256/542] iommu/iova: Silence warnings under memory pressure
From: Qian Cai [ Upstream commit 944c9175397476199d4dd1028d87ddc582c35ee8 ] When running heavy memory pressure workloads, this 5+ old system is throwing endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs which could cause an infinite loop and make no progress for days by the ongoimng memory reclaim. This is the counter part for Intel where the AMD part has already been merged. See the commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), so just call dev_err_once() there because even the "ratelimited" is too much, and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc(). hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed hpsa :03:00.0: DMAR: Allocating 1-page iova failed slab_out_of_memory: 66 callbacks suppressed SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 0: slabs: 1822, objs: 16398, free: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 1: slabs: 2051, objs: 18459, free: 31 node 0: slabs: 697, objs: 4182, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) node 1: slabs: 381, objs: 2286, free: 27 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 hpsa :03:00.0: DMAR: Allocating 1-page iova failed warn_alloc: 96 callbacks suppressed kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 Workqueue: kblockd blk_mq_run_work_fn Call Trace: dump_stack+0xa0/0xea warn_alloc.cold.94+0x8a/0x12d __alloc_pages_slowpath+0x1750/0x1870 __alloc_pages_nodemask+0x58a/0x710 alloc_pages_current+0x9c/0x110 alloc_slab_page+0xc9/0x760 allocate_slab+0x48f/0x5d0 new_slab+0x46/0x70 ___slab_alloc+0x4ab/0x7b0 __slab_alloc+0x43/0x70 kmem_cache_alloc+0x2dd/0x450 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) alloc_iova+0x33/0x210 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 alloc_iova_fast+0x62/0x3d1 node 1: slabs: 381, objs: 2286, free: 27 intel_alloc_iova+0xce/0xe0 intel_map_sg+0xed/0x410 scsi_dma_map+0xd7/0x160 scsi_queue_rq+0xbf7/0x1310 blk_mq_dispatch_rq_list+0x4d9/0xbc0 blk_mq_sched_dispatch_requests+0x24a/0x300 __blk_mq_run_hw_queue+0x156/0x230 blk_mq_run_work_fn+0x3b/0x40 process_one_work+0x579/0xb90 worker_thread+0x63/0x5b0 kthread+0x1e6/0x210 ret_from_fork+0x3a/0x50
[PATCH AUTOSEL 5.5 254/542] iommu/amd: Check feature support bit before accessing MSI capability registers
From: Suravee Suthikulpanit [ Upstream commit 813071438e83d338ba5cfe98b3b26c890dc0a6c0 ] The IOMMU MMIO access to MSI capability registers is available only if the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit is set if the EFR[XtSup] is set, which might not be the case. Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address low/high and MSI data registers via the MMIO. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_init.c | 17 - drivers/iommu/amd_iommu_types.h | 1 + 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 483f7bc379fa8..61628c906ce11 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -147,7 +147,7 @@ bool amd_iommu_dump; bool amd_iommu_irq_remap __read_mostly; int amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC; -static int amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE; +static int amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; static bool amd_iommu_detected; static bool __initdata amd_iommu_disabled; @@ -1534,8 +1534,15 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET; if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY; - if (((h->efr_reg & (0x1 << IOMMU_EFR_XTSUP_SHIFT)) == 0)) - amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE; + /* +* Note: Since iommu_update_intcapxt() leverages +* the IOMMU MMIO access to MSI capability block registers +* for MSI address lo/hi/data, we need to check both +* EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support. +*/ + if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) && + (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT))) + amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE; break; default: return -EINVAL; @@ -1996,8 +2003,8 @@ static int iommu_init_intcapxt(struct amd_iommu *iommu) struct irq_affinity_notify *notify = >intcapxt_notify; /** -* IntCapXT requires XTSup=1, which can be inferred -* amd_iommu_xt_mode. +* IntCapXT requires XTSup=1 and MsiCapMmioSup=1, +* which can be inferred from amd_iommu_xt_mode. */ if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE) return 0; diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index f52f59d5c6bd4..f8a7945f3df90 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -383,6 +383,7 @@ /* IOMMU Extended Feature Register (EFR) */ #define IOMMU_EFR_XTSUP_SHIFT 2 #define IOMMU_EFR_GASUP_SHIFT 7 +#define IOMMU_EFR_MSICAPMMIOSUP_SHIFT 46 #define MAX_DOMAIN_ID 65536 -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 5.5 051/542] iommu/vt-d: Fix off-by-one in PASID allocation
From: Jacob Pan [ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ] PASID allocator uses IDR which is exclusive for the end of the allocation range. There is no need to decrement pasid_max. Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA") Reported-by: Eric Auger Signed-off-by: Jacob Pan Reviewed-by: Eric Auger Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index dca88f9fdf29a..ff7a3f9add325 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -317,7 +317,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ ret = intel_pasid_alloc_id(svm, !!cap_caching_mode(iommu->cap), - pasid_max - 1, GFP_KERNEL); + pasid_max, GFP_KERNEL); if (ret < 0) { kfree(svm); kfree(sdev); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm64 iommu groups issue
@@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) /* Set up MSI IRQ domain */ pci_set_msi_domain(dev); + parent = dev->dev.parent; + if (parent && parent->bus == _bus_type) + device_link_add(>dev, parent, DL_FLAG_AUTOPROBE_CONSUMER); + /* Notifier could use PCI capabilities */ dev->match_driver = false; ret = device_add(>dev); -- This would work, but the problem is that if the port driver fails in probing - and not just for -EPROBE_DEFER - then the child device will never probe. This very thing happens on my dev board. However we could expand the device links API to cover this sort of scenario. Yes, that's an undesirable issue, but in fact I think it's mostly indicative that involving drivers in something which is designed to happen at a level below drivers is still fundamentally wrong and doomed to be fragile at best. Right, and even worse is that it relies on the port driver even existing at all. All this iommu group assignment should be taken outside device driver probe paths. However we could still consider device links for sync'ing the SMMU and each device probing. Another thought that crosses my mind is that when pci_device_group() walks up to the point of ACS isolation and doesn't find an existing group, it can still infer that everything it walked past *should* be put in the same group it's then eventually going to return. Unfortunately I can't see an obvious way for it to act on that knowledge, though, since recursive iommu_probe_device() is unlikely to end well. I'd be inclined not to change that code. As for alternatives, it looks pretty difficult to me to disassociate the group allocation from the dma_configure path. Indeed it's non-trivial, but it really does need cleaning up at some point. Having just had yet another spark, does something like the untested super-hack below work at all? I tried it and it doesn't (yet) work. So when we try iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(), the iommu_fwspec is still NULL for that device - this is not set until later when the device driver is going to finally probe in iort_iommu_xlate()->iommu_fwspec_init(), and it's too late... And this looks to be the reason for which current iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also. On this current code mentioned, the principle of this seems wrong to me - we call bus_for_each_device(..., add_iommu_group) for the first SMMU in the system which probes, but we attempt to add_iommu_group() for all devices on the bus, even though the SMMU for that device may yet to have probed. Thanks, John I doubt it's a viable direction to take in itself, but it could be food for thought if it at least proves the concept. Robin. ->8- diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index aa3ac2a03807..554cde76c766 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -3841,20 +3841,20 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops) int err; #ifdef CONFIG_PCI - if (pci_bus_type.iommu_ops != ops) { + if (1) { err = bus_set_iommu(_bus_type, ops); if (err) return err; } #endif #ifdef CONFIG_ARM_AMBA - if (amba_bustype.iommu_ops != ops) { + if (1) { err = bus_set_iommu(_bustype, ops); if (err) goto err_reset_pci_ops; } #endif - if (platform_bus_type.iommu_ops != ops) { + if (1) { err = bus_set_iommu(_bus_type, ops); if (err) goto err_reset_amba_ops; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 660eea8d1d2f..b81ae2b4d4fb 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1542,6 +1542,14 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops) return err; } +static int iommu_bus_replay(struct device *dev, void *data) +{ + if (dev->iommu_group) + return 0; + + return add_iommu_group(dev, data); +} + /** * bus_set_iommu - set iommu-callbacks for the bus * @bus: bus. @@ -1564,6 +1572,9 @@ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) return 0; } + if (bus->iommu_ops == ops) + return bus_for_each_dev(bus, NULL, NULL, iommu_bus_replay); + if (bus->iommu_ops != NULL) return -EBUSY; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed
Hi, On 2020/2/14 17:02, Daniel Drake wrote: From: Jon Derrick The PCI devices handled by intel-iommu may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. The real DMA device is now used for the DMA mapping, but one case was missed earlier: if the VMD device (and hence subdevices too) are under IOMMU_DOMAIN_IDENTITY, mappings do not work. Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an iommu DMA mapping, and fall back on dma_direct_map_page() for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken when intel_page_page() handles a subdevice. We observe that at iommu attach time, dmar_insert_one_dev_info() for the subdevices will never set dev->archdata.iommu. This is because that function uses find_domain() to check if there is already an IOMMU for the device, and find_domain() then defers to the real DMA device which does have one. Thus dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu. Then, later: 1. intel_map_page() checks if an IOMMU mapping is needed by calling iommu_need_mapping() on the subdevice. identity_mapping() returns false because dev->archdata.iommu is NULL, so this function returns false indicating that mapping is needed. 2. __intel_map_single() is called to create the mapping. 3. __intel_map_single() calls find_domain(). This function now returns the IDENTITY domain corresponding to the real DMA device. 4. __intel_map_single() calls domain_get_iommu() on this "real" domain. A failure is hit and the entire operation is aborted, because this codepath is not intended to handle IDENTITY mappings: if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA)) return NULL; Fix this by using the real DMA device when checking if a mapping is needed, while also considering the subdevice DMA mask. The IDENTITY case will then directly fall back on dma_direct_map_page(). Reported-by: Daniel Drake Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Signed-off-by: Daniel Drake Why not have the patch author's signed-off-by? Best regards, baolu --- Notes: v2: switch to Jon's approach instead. This problem was detected with a non-upstream patch "PCI: Add Intel remapped NVMe device support" (https://marc.info/?l=linux-ide=156015271021615=2) This patch creates PCI devices a bit like VMD, and hence I believe VMD would hit this class of problem for any cases where the VMD device is in the IDENTITY domain. (I presume the reason this bug was not seen already there is that it is in a DMA iommu domain). However this hasn't actually been tested on VMD (don't have the hardware) so if I've missed anything and/or it's not a real issue then feel free to drop this patch. drivers/iommu/intel-iommu.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 9dc37672bf89..edbe2866b515 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3582,19 +3582,23 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev) /* Check if the dev needs to go through non-identity map and unmap process.*/ static bool iommu_need_mapping(struct device *dev) { + u64 dma_mask, required_dma_mask; int ret; if (iommu_dummy(dev)) return false; - ret = identity_mapping(dev); - if (ret) { - u64 dma_mask = *dev->dma_mask; + dma_mask = *dev->dma_mask; + if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) + dma_mask = dev->coherent_dma_mask; + required_dma_mask = dma_direct_get_required_mask(dev); - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) - dma_mask = dev->coherent_dma_mask; + if (dev_is_pci(dev)) + dev = _real_dma_dev(to_pci_dev(dev))->dev; - if (dma_mask >= dma_direct_get_required_mask(dev)) + ret = identity_mapping(dev); + if (ret) { + if (dma_mask >= required_dma_mask) return false; /* ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/1] iommu/amd: Fix the configuration of GCR3 table root pointer
From: Adrian Huang The SPA of the GCR3 table root pointer[51:31] masks 20 bits. However, this requires 21 bits (Please see the AMD IOMMU specification). This leads to the potential failure when the bit 51 of SPA of the GCR3 table root pointer is 1'. Signed-off-by: Adrian Huang --- drivers/iommu/amd_iommu_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index f8d01d6b00da..ca8c4522045b 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -348,7 +348,7 @@ #define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x7ULL) #define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ULL) -#define DTE_GCR3_VAL_C(x) (((x) >> 31) & 0xfULL) +#define DTE_GCR3_VAL_C(x) (((x) >> 31) & 0x1fULL) #define DTE_GCR3_INDEX_A 0 #define DTE_GCR3_INDEX_B 1 -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed
From: Jon Derrick The PCI devices handled by intel-iommu may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. The real DMA device is now used for the DMA mapping, but one case was missed earlier: if the VMD device (and hence subdevices too) are under IOMMU_DOMAIN_IDENTITY, mappings do not work. Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an iommu DMA mapping, and fall back on dma_direct_map_page() for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken when intel_page_page() handles a subdevice. We observe that at iommu attach time, dmar_insert_one_dev_info() for the subdevices will never set dev->archdata.iommu. This is because that function uses find_domain() to check if there is already an IOMMU for the device, and find_domain() then defers to the real DMA device which does have one. Thus dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu. Then, later: 1. intel_map_page() checks if an IOMMU mapping is needed by calling iommu_need_mapping() on the subdevice. identity_mapping() returns false because dev->archdata.iommu is NULL, so this function returns false indicating that mapping is needed. 2. __intel_map_single() is called to create the mapping. 3. __intel_map_single() calls find_domain(). This function now returns the IDENTITY domain corresponding to the real DMA device. 4. __intel_map_single() calls domain_get_iommu() on this "real" domain. A failure is hit and the entire operation is aborted, because this codepath is not intended to handle IDENTITY mappings: if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA)) return NULL; Fix this by using the real DMA device when checking if a mapping is needed, while also considering the subdevice DMA mask. The IDENTITY case will then directly fall back on dma_direct_map_page(). Reported-by: Daniel Drake Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Signed-off-by: Daniel Drake --- Notes: v2: switch to Jon's approach instead. This problem was detected with a non-upstream patch "PCI: Add Intel remapped NVMe device support" (https://marc.info/?l=linux-ide=156015271021615=2) This patch creates PCI devices a bit like VMD, and hence I believe VMD would hit this class of problem for any cases where the VMD device is in the IDENTITY domain. (I presume the reason this bug was not seen already there is that it is in a DMA iommu domain). However this hasn't actually been tested on VMD (don't have the hardware) so if I've missed anything and/or it's not a real issue then feel free to drop this patch. drivers/iommu/intel-iommu.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 9dc37672bf89..edbe2866b515 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3582,19 +3582,23 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev) /* Check if the dev needs to go through non-identity map and unmap process.*/ static bool iommu_need_mapping(struct device *dev) { + u64 dma_mask, required_dma_mask; int ret; if (iommu_dummy(dev)) return false; - ret = identity_mapping(dev); - if (ret) { - u64 dma_mask = *dev->dma_mask; + dma_mask = *dev->dma_mask; + if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) + dma_mask = dev->coherent_dma_mask; + required_dma_mask = dma_direct_get_required_mask(dev); - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) - dma_mask = dev->coherent_dma_mask; + if (dev_is_pci(dev)) + dev = _real_dma_dev(to_pci_dev(dev))->dev; - if (dma_mask >= dma_direct_get_required_mask(dev)) + ret = identity_mapping(dev); + if (ret) { + if (dma_mask >= required_dma_mask) return false; /* -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu