Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

2020-02-14 Thread Jacob Pan
Hi Eric,

Thanks for the review, I somehow missed it, my apologies. See comments
below.

On Tue, 12 Nov 2019 11:28:37 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 10/24/19 9:55 PM, Jacob Pan wrote:
> > When Shared Virtual Address (SVA) is enabled for a guest OS via
> > vIOMMU, we need to provide invalidation support at IOMMU API and
> > driver level. This patch adds Intel VT-d specific function to
> > implement iommu passdown invalidate API for shared virtual address.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the
> > guest to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submitting to the
> > invalidation queue.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/intel-iommu.c | 170 
> >  1 file changed, 170 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 5fab32fbc4b4..a73e76d6457a 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> >  	aux_domain_remove_dev(to_dmar_domain(domain), dev);
> >  }
> >  
> > +/*
> > + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> > + * VT-d granularity. Invalidation is typically included in the unmap operation
> > + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> > + * could own the first level page tables without being shadowed by QEMU. In  
> above sentence needs to be rephrased.
Yes, how about this:
/*
 * 2D array for converting and sanitizing IOMMU generic TLB granularity to
 * VT-d granularity. Invalidation is typically included in the unmap
 * operation as a result of DMA or VFIO unmap. However, for assigned devices
 * guest owns the first level page tables. Invalidations of translation
 * caches in the guest are trapped and passed down to the host.
 *
 * vIOMMU in the guest will only expose first level page tables, therefore
 * we do not include IOTLB granularity for request without PASID (second
 * level).
 *
 * For example, to find the VT-d granularity encoding for IOTLB

> > + * this case there is no pass down unmap to the host IOMMU as a result of unmap
> > + * in the guest. Only invalidations are trapped and passed down.
> > + * In all cases, only first level TLB invalidation (request with PASID) can be
> > + * passed down, therefore we do not include IOTLB granularity for request
> > + * without PASID (second level).
> > + *
> > + * For an example, to find the VT-d granularity encoding for IOTLB  
> for example
sounds better.

> > + * type and page selective granularity within PASID:
> > + * X: indexed by iommu cache type
> > + * Y: indexed by enum iommu_inv_granularity
> > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> > + *
> > + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> > + *
> > + */
> > +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > +	/* PASID based IOTLB, support PASID selective and page selective */  
> I would rather use the generic terminology, ie. IOTLB invalidation
> supports PASID and ADDR granularity
Understood. My choice of terminology is based on VT-d spec and this is
VT-d only code. Perhaps add the generic terms by the side? i.e.
/*
 * PASID based IOTLB invalidation: PASID selective (per PASID),
 * page selective (address granularity)
 */

> > +	{0, 1, 1},
> > +	/* PASID based dev TLBs, only support all PASIDs or single PASID */  
> Device IOTLB invalidation supports DOMAIN and PASID granularities
> > +   {1, 1, 0},
> > +   /* PASID cache */  
> PASID cache invalidation supports DOMAIN and PASID granularity
> > +   {1, 1, 0}
> > +};
> > +
> > +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > +   /* PASID based IOTLB */
> > +   {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> > +   /* PASID based dev TLBs */
> > +   {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> > +   /* PASID cache */
> > +   {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> > +};
> > +
> > +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)  
> nit: it looks a bit weird to me to manipulate a u64 here. Why not
> use an int?
Yes, should be int.
> > +{
> > +	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> > +	    !inv_type_granu_map[type][granu])
> > +		return -EINVAL;
> > +
> > +	*vtd_granu = inv_type_granu_table[type][granu];
> > +
> > +	return 0;
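
For illustration, a minimal caller sketch for the 2D lookup above (the
surrounding code is hypothetical; the enums are the IOMMU UAPI values named
in the patch):

	u64 granu;
	int ret;

	/* IOTLB invalidation, page selective within a PASID */
	ret = to_vtd_granularity(IOMMU_CACHE_INV_TYPE_IOTLB,
				 IOMMU_INV_GRANU_ADDR, &granu);
	if (ret)
		return ret;	/* combination marked invalid in inv_type_granu_map */
	/* granu now holds QI_GRAN_PSI_PASID, per inv_type_granu_table */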

Re: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function

2020-02-14 Thread Jacob Pan
On Wed, 12 Feb 2020 14:13:37 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 1/29/20 7:01 AM, Jacob Pan wrote:
> > When Shared Virtual Address (SVA) is enabled for a guest OS via
> > vIOMMU, we need to provide invalidation support at IOMMU API and
> > driver level. This patch adds Intel VT-d specific function to
> > implement iommu passdown invalidate API for shared virtual address.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the
> > guest to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submitting to the
> > invalidation queue.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Liu, Yi L   
> 
> I sent comments on the v7 in https://lkml.org/lkml/2019/11/12/266
> I don't see any of them taken into account and if I am not wrong we
> did not discuss their (un)relevance on the ML ;-)
> 
> I let you have a look at them then.
> 
Sorry, I missed it. Let me reply to your original comments.
Thanks!

> Thanks
> 
> Eric
> > ---
> >  drivers/iommu/intel-iommu.c | 173 
> >  1 file changed, 173 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 8a4136e805ac..b8aa6479b87f 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5605,6 +5605,178 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> >  	aux_domain_remove_dev(to_dmar_domain(domain), dev);
> >  }
> >  
> > +/*
> > + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> > + * VT-d granularity. Invalidation is typically included in the unmap operation
> > + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> > + * could own the first level page tables without being shadowed by QEMU. In
> > + * this case there is no pass down unmap to the host IOMMU as a result of unmap
> > + * in the guest. Only invalidations are trapped and passed down.
> > + * In all cases, only first level TLB invalidation (request with PASID) can be
> > + * passed down, therefore we do not include IOTLB granularity for request
> > + * without PASID (second level).
> > + *
> > + * For an example, to find the VT-d granularity encoding for IOTLB
> > + * type and page selective granularity within PASID:
> > + * X: indexed by iommu cache type
> > + * Y: indexed by enum iommu_inv_granularity
> > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> > + *
> > + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> > + *
> > + */
> > +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > +	/* PASID based IOTLB, support PASID selective and page selective */
> > +   {0, 1, 1},
> > +	/* PASID based dev TLBs, only support all PASIDs or single PASID */
> > +   {1, 1, 0},
> > +   /* PASID cache */
> > +   {1, 1, 0}
> > +};
> > +
> > +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > +   /* PASID based IOTLB */
> > +   {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> > +   /* PASID based dev TLBs */
> > +   {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> > +   /* PASID cache */
> > +   {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> > +};
> > +
> > +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
> > +{
> > +	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> > +   !inv_type_granu_map[type][granu])
> > +   return -EINVAL;
> > +
> > +   *vtd_granu = inv_type_granu_table[type][granu];
> > +
> > +   return 0;
> > +}
> > +
> > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> > +{
> > +	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> > +
> > +	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> > +	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
> > +	 * granu size in contiguous memory.
> > +	 */
> > +   return order_base_2(nr_pages);
> > +}
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> > +{
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct device_domain_info *info;
> > +   struct intel_iommu *iommu;
> > +   unsigned long flags;
> > +   int cache_type;
> > +   u8 bus, devfn;
> > +   u16 did, sid;
> > +   int ret = 0;
> > +   u64 size;
> > +
> > +   if (!inv_info || !dmar_domain ||
> > +	    inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
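
As a worked example of the size encoding in to_vtd_size() above (not part of
the patch): granu_size = 4KB with nr_granules = 512 gives nr_pages = 512 and
order_base_2(512) = 9, i.e. the 2MB encoding from the comment; a
non-power-of-two count rounds up, e.g. 3 pages yields order_base_2(3) = 2,
which covers 16KB:

	to_vtd_size(SZ_4K, 512);	/* returns 9: 2MB */
	to_vtd_size(SZ_4K, 3);		/* returns 2: rounded up to 16KB */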

Re: [PATCH V9 05/10] iommu/vt-d: Support flushing more translation cache types

2020-02-14 Thread Jacob Pan
Hi Eric,

On Wed, 12 Feb 2020 13:55:25 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 1/29/20 7:01 AM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dmar.c        | 33 +
> >  drivers/iommu/intel-pasid.c |  3 ++-
> >  include/linux/intel-iommu.h | 20 
> >  3 files changed, 51 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 071bb42bbbc5..206733ec8140 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1411,6 +1411,39 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
> >  	qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> > +		u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu)
> > +{
> > +   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > +	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> > +		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > +		QI_DEV_IOTLB_PFSID(pfsid);
> > +	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > +	/* If S bit is 0, we only flush a single page. If S bit is set,
> > +	 * The least significant zero bit indicates the invalidation address
> > +	 * range. VT-d spec 6.5.2.6.
> > +	 * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
> > +	 */
> > +	if (!size_order) {
> > +		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
> > +	} else {
> > +		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
> > +		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
> > +	}
> > +	qi_submit_sync(&desc, iommu);  
> I made some comments in
> https://lkml.org/lkml/2019/8/14/1311
> that do not seem to have been taken into account. Or do I miss
> something?
> 
I missed adding these changes. At the time Baolu was doing cache flush
consolidation so I wasn't sure if I could use his code completely. This
patch is on top of his consolidated flush code with what is still
needed for vSVA. Then I forgot to address your comments. Sorry about
that.

> More generally having an individual history log would be useful and
> speed up the review.
> 
Will add history to each patch, e.g. like this?
---
v8 -> v9
---
> Thanks
> 
> Eric
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
> > +{
> > +	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> > +
> > +	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
> > +	qi_submit_sync(&desc, iommu);
> > +}
> > +
> >  /*
> >   * Disable Queued Invalidation interface.
> >   */
> > diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> > index bd067af4d20b..b100f51407f9 100644
> > --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> >  {
> > struct qi_desc desc;
> >  
> > -	desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> > +	desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> > +		QI_PC_PASID(pasid) | QI_PC_TYPE;
> > desc.qw1 = 0;
> > desc.qw2 = 0;
> > desc.qw3 = 0;
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index b0ffecbc0dfc..dd9fa61689bc 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -332,7 +332,7 @@ enum {
> >  #define QI_IOTLB_GRAN(gran)	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
> >  #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
> >  #define QI_IOTLB_IH(ih)	(((u64)ih) << 6)
> > -#define QI_IOTLB_AM(am)	(((u8)am))
> > +#define QI_IOTLB_AM(am)	(((u8)am) & 0x3f)
> >  #define QI_CC_FM(fm)   (((u64)fm) << 48)
> >  #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -351,16 +351,21 @@ enum {
> >  #define QI_PC_DID(did)		(((u64)did) << 16)
> >  #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
> >  
> > -#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS	0
> > +#define QI_PC_PASID_SEL		1
> >  
> >  #define QI_EIOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
> >  #define QI_EIOTLB_IH(ih)   (((u64)ih) << 6)
> > -#define QI_EIOTLB_AM(am)   (((u64)am))
> > +#define QI_EIOTLB_AM(am)   (((u64)am) & 0x3f)
> >  #define QI_EIOTLB_PASID(pasid) (((u64)pasid) 
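
A worked example of the S-bit encoding discussed in qi_flush_dev_iotlb_pasid()
above, per VT-d spec 6.5.2.6 (illustrative addresses only): with S = 0 the
descriptor names a single 4KB page; with S = 1 the least significant zero bit
of the address selects the range, so for a base of 0x10000000 an 8KB
invalidation keeps bit 12 clear (address 0x10000000), while a 16KB
invalidation sets bit 12 and keeps bit 13 clear (address 0x10001000).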

[RFC PATCH] iommu/iova: Add a best-fit algorithm

2020-02-14 Thread Isaac J. Manjarres
From: Liam Mark 

Using the best-fit algorithm, instead of the first-fit
algorithm, may reduce fragmentation when allocating
IOVAs.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/dma-iommu.c | 17 +++
 drivers/iommu/iova.c      | 73 +--
 include/linux/dma-iommu.h |  7 +
 include/linux/iova.h      |  1 +
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5..af08770 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct device *dev,
 	if (unlikely(ops->is_attach_deferred &&
 			ops->is_attach_deferred(domain, dev)))
 		return iommu_attach_device(domain, dev);
+   return 0;
+}
+
+/*
+ * Should be called prior to using dma-apis.
+ */
+int iommu_dma_enable_best_fit_algo(struct device *dev)
+{
+   struct iommu_domain *domain;
+   struct iova_domain *iovad;
+
+   domain = iommu_get_domain_for_dev(dev);
+   if (!domain || !domain->iova_cookie)
+   return -EINVAL;
 
+   iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+   iovad->best_fit = true;
return 0;
 }
+EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);
 
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..716b05f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
 	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
 	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+   iovad->best_fit = false;
init_iova_rcaches(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
return -ENOMEM;
 }
 
+static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn,
+   struct iova *new, bool size_aligned)
+{
+   struct rb_node *curr, *prev;
+   struct iova *curr_iova, *prev_iova;
+   unsigned long flags;
+   unsigned long align_mask = ~0UL;
+   struct rb_node *candidate_rb_parent;
+   unsigned long new_pfn, candidate_pfn = ~0UL;
+   unsigned long gap, candidate_gap = ~0UL;
+
+   if (size_aligned)
+   align_mask <<= limit_align(iovad, fls_long(size - 1));
+
+   /* Walk the tree backwards */
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+	curr = &iovad->anchor.node;
+   prev = rb_prev(curr);
+   for (; prev; curr = prev, prev = rb_prev(curr)) {
+   curr_iova = rb_entry(curr, struct iova, node);
+   prev_iova = rb_entry(prev, struct iova, node);
+
+   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+   new_pfn = (limit_pfn - size) & align_mask;
+   gap = curr_iova->pfn_lo - prev_iova->pfn_hi - 1;
+   if ((limit_pfn >= size) && (new_pfn > prev_iova->pfn_hi)
+   && (gap < candidate_gap)) {
+   candidate_gap = gap;
+   candidate_pfn = new_pfn;
+   candidate_rb_parent = curr;
+   if (gap == size)
+   goto insert;
+   }
+   }
+
+   curr_iova = rb_entry(curr, struct iova, node);
+   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+   new_pfn = (limit_pfn - size) & align_mask;
+   gap = curr_iova->pfn_lo - iovad->start_pfn;
+   if (limit_pfn >= size && new_pfn >= iovad->start_pfn &&
+   gap < candidate_gap) {
+   candidate_gap = gap;
+   candidate_pfn = new_pfn;
+   candidate_rb_parent = curr;
+   }
+
+insert:
+   if (candidate_pfn == ~0UL) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+
+   /* pfn_lo will point to size aligned address if size_aligned is set */
+   new->pfn_lo = candidate_pfn;
+   new->pfn_hi = new->pfn_lo + size - 1;
+
+   /* If we have 'prev', it's a valid place to start the insertion. */
+	iova_insert_rbtree(&iovad->rbroot, new, candidate_rb_parent);
+	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return 0;
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -302,8 +366,13 @@ struct iova *
if (!new_iova)
return NULL;
 
-   ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn + 1,
-   new_iova, size_aligned);
+   if (iovad->best_fit) {
+   ret = __alloc_and_insert_iova_best_fit(iovad, size,
+   
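
A usage sketch for the new allocator knob (hypothetical driver code; assumes
the device is already attached to a DMA-IOMMU domain):

	static int foo_probe(struct platform_device *pdev)
	{
		/* Must run before any dma_map_*() call on this device */
		if (iommu_dma_enable_best_fit_algo(&pdev->dev))
			dev_warn(&pdev->dev, "keeping first-fit IOVA allocation\n");

		return 0;
	}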

[RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-14 Thread Isaac J. Manjarres
From: Liam Mark 

Some devices have a memory map which contains gaps or holes.
In order for the device to have as much IOVA space as possible,
allow its driver to inform the DMA-IOMMU layer that it should
not allocate addresses from these holes.

Change-Id: I15bd1d313d889c2572d0eb2adecf6bebde3267f7
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/dma-iommu.c | 28 
 include/linux/dma-iommu.h |  9 +
 2 files changed, 37 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5..3b83e1a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -368,6 +368,34 @@ static int iommu_dma_deferred_attach(struct device *dev,
 	return 0;
 }
 
+/*
+ * Should be called prior to using dma-apis
+ */
+int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base,
+  u64 size)
+{
+   struct iommu_domain *domain;
+   struct iommu_dma_cookie *cookie;
+   struct iova_domain *iovad;
+   unsigned long pfn_lo, pfn_hi;
+
+   domain = iommu_get_domain_for_dev(dev);
+   if (!domain || !domain->iova_cookie)
+   return -EINVAL;
+
+   cookie = domain->iova_cookie;
+	iovad = &cookie->iovad;
+
+   /* iova will be freed automatically by put_iova_domain() */
+   pfn_lo = iova_pfn(iovad, base);
+   pfn_hi = iova_pfn(iovad, base + size - 1);
+   if (!reserve_iova(iovad, pfn_lo, pfn_hi))
+   return -EINVAL;
+
+   return 0;
+}
+EXPORT_SYMBOL(iommu_dma_reserve_iova);
+
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
  *page flags.
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 2112f21..79eef7c 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -37,6 +37,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
+int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base,
+  u64 size);
+
 #else /* CONFIG_IOMMU_DMA */
 
 struct iommu_domain;
@@ -78,5 +81,11 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 }
 
+static inline int iommu_dma_reserve_iova(struct device *dev, dma_addr_t base,
+u64 size)
+{
+   return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_DMA */
 #endif /* __DMA_IOMMU_H */
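
A usage sketch for the proposed API (hypothetical driver code and addresses;
assumes the device is attached to a DMA-IOMMU domain and that this runs
before any DMA API use):

	/* Carve a hole at 0x80000000..0x8fffffff out of the device's IOVAs */
	int ret = iommu_dma_reserve_iova(dev, 0x80000000, SZ_256M);

	if (ret)
		dev_err(dev, "failed to reserve IOVA hole\n");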
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH] iommu/arm-smmu: fix module name for parameters

2020-02-14 Thread Li Yang
Commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module")
introduced a side effect that changed the module name from arm-smmu to
arm-smmu-mod.  This breaks the users of kernel parameters for the driver
(e.g. arm-smmu.disable_bypass).  This patch changes the module name for
parameters back to arm-smmu to be consistent with older kernel.

Signed-off-by: Li Yang 
---
 drivers/iommu/arm-smmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..8d5a19bfde5c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -58,6 +58,8 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "arm-smmu."
 static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
-- 
2.17.1
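
For reference, with the prefix restored an existing command line such as:

	arm-smmu.disable_bypass=1

keeps working when the driver is built as a module; without this patch a
module build would expect arm-smmu-mod.disable_bypass instead.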



Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1

2020-02-14 Thread Robin Murphy

Hi Jerry,

On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:

Hi Will,

On a gigabyte system with Cavium CN8xx, when doing a fio test against
an nvme drive we are seeing the following:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7


Those "IOVAs" don't look much like IOVAs from the DMA allocator - if 
they were physical addresses, would they correspond to an expected 
region of the physical memory map?


I would suspect that this is most likely misbehaviour in the NVMe driver 
(issuing a write to a non-DMA-mapped address), and the SMMU is just 
doing its job in blocking and reporting it.


I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I 
couldn't narrow it down further into 5.4-rc1.
I don't know smmu or the code well, any thoughts on where to start 
digging into this?


fio test that is being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32


Just to clarify, do other tests work OK on the same device?

Thanks,
Robin.

arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1

2020-02-14 Thread Jerry Snitselaar

Hi Will,

On a gigabyte system with Cavium CN8xx, when doing a fio test against
an nvme drive we are seeing the following:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7

I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't 
narrow it down further into 5.4-rc1.
I don't know smmu or the code well, any thoughts on where to start digging into 
this?

fio test that is being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32


Regards,
Jerry



Re: arm64 iommu groups issue

2020-02-14 Thread Robin Murphy

On 14/02/2020 2:09 pm, John Garry wrote:


@@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Set up MSI IRQ domain */
 	pci_set_msi_domain(dev);
 
+	parent = dev->dev.parent;
+	if (parent && parent->bus == &pci_bus_type)
+		device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
+
 	/* Notifier could use PCI capabilities */
 	dev->match_driver = false;
 	ret = device_add(&dev->dev);
--

This would work, but the problem is that if the port driver fails in
probing - and not just for -EPROBE_DEFER - then the child device will
never probe. This very thing happens on my dev board. However we could
expand the device links API to cover this sort of scenario.


Yes, that's an undesirable issue, but in fact I think it's mostly
indicative that involving drivers in something which is designed to
happen at a level below drivers is still fundamentally wrong and doomed
to be fragile at best.


Right, and even worse is that it relies on the port driver even existing 
at all.


All this iommu group assignment should be taken outside device driver 
probe paths.


However we could still consider device links for sync'ing the SMMU and 
each device probing.


Yes, we should get that for DT now thanks to the of_devlink stuff, but 
cooking up some equivalent for IORT might be worthwhile.



Another thought that crosses my mind is that when pci_device_group()
walks up to the point of ACS isolation and doesn't find an existing
group, it can still infer that everything it walked past *should* be put
in the same group it's then eventually going to return. Unfortunately I
can't see an obvious way for it to act on that knowledge, though, since
recursive iommu_probe_device() is unlikely to end well.


I'd be inclined not to change that code.




As for alternatives, it looks pretty difficult to me to disassociate the
group allocation from the dma_configure path.


Indeed it's non-trivial, but it really does need cleaning up at some 
point.


Having just had yet another spark, does something like the untested
super-hack below work at all? 


I tried it and it doesn't (yet) work.


Bleh - further reinforcement of the "ideas after 6PM are bad ideas" rule...

So when we try 
iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(), 

the iommu_fwspec is still NULL for that device - this is not set until 
later when the device driver is going to finally probe in 
iort_iommu_xlate()->iommu_fwspec_init(), and it's too late...


And this looks to be the reason for which current 
iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also.


Of course, just adding a 'correct' add_device replay without the 
of_xlate process doesn't help at all. No wonder this looked suspiciously 
simpler than where the first idea left off...


(on reflection, the core of this idea seems to be recycling the existing 
iommu_bus_init walk rather than building up a separate "waiting list", 
while forgetting that that wasn't the difficult part of the original 
idea anyway)


On this current code mentioned, the principle of this seems wrong to me 
- we call bus_for_each_device(..., add_iommu_group) for the first SMMU 
in the system which probes, but we attempt to add_iommu_group() for all 
devices on the bus, even though the SMMU for that device may yet to have 
probed.


Yes, iommu_bus_init() is one of the places still holding a 
deeply-ingrained assumption that the ops go live for all IOMMU instances 
at once, which is what warranted the further replay in 
of_iommu_configure() originally. Moving that out of 
of_platform_device_create() to support probe deferral is where the 
trouble really started.


Robin.

Re: [PATCH 2/3] PCI: Add DMA configuration for virtual platforms

2020-02-14 Thread Robin Murphy

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

Hardware platforms usually describe the IOMMU topology using either
device-tree pointers or vendor-specific ACPI tables.  For virtual
platforms that don't provide a device-tree, the virtio-iommu device
contains a description of the endpoints it manages.  That information
allows us to probe endpoints after the IOMMU is probed (possibly as late
as userspace modprobe), provided it is discovered early enough.

Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
other case to avoid disturbing the normal DMA configuration methods.
When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
virt_dma_configure() is compiled out.

As long as the information is consistent, platforms can provide both a
device-tree and a built-in topology, and the IOMMU infrastructure is
able to deal with multiple DMA configuration methods.


Urgh, it's already been established[1] that having IOMMU setup tied to 
DMA configuration at driver probe time is not just conceptually wrong 
but actually broken, so the concept here worries me a bit. In a world 
where of_iommu_configure() and friends are being called much earlier 
around iommu_probe_device() time, how badly will this fall apart?


Robin.

[1] 
https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/



Signed-off-by: Jean-Philippe Brucker 
---
  drivers/pci/pci-driver.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 0454ca0e4e3f..69303a814f21 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "pci.h"
  #include "pcie/portdrv.h"
  
@@ -1602,6 +1603,10 @@ static int pci_dma_configure(struct device *dev)
 	struct device *bridge;
 	int ret = 0;
 
+	ret = virt_dma_configure(dev);
+	if (ret)
+		return ret;
+
 	bridge = pci_get_host_bridge_device(to_pci_dev(dev));
 
 	if (IS_ENABLED(CONFIG_OF) && bridge->parent &&





Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-14 Thread Robin Murphy

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/Kconfig | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
  config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for 
virtio-iommu itself (and really shouldn't be), but is merely to satisfy 
the x86 arch code's expectation that IOMMU drivers bring their own DMA 
ops, right?


Robin.


select INTERVAL_TREE
help
  Para-virtualised IOMMU driver with virtio.




Re: [PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-14 Thread Derrick, Jonathan
Hi Daniel, sorry for the delay


On Fri, 2020-02-14 at 17:02 +0800, Daniel Drake wrote:
> From: Jon Derrick 
> 
> The PCI devices handled by intel-iommu may have a DMA requester on
> another bus, such as VMD subdevices needing to use the VMD endpoint.
> 
> The real DMA device is now used for the DMA mapping, but one case was
> missed earlier: if the VMD device (and hence subdevices too) are under
> IOMMU_DOMAIN_IDENTITY, mappings do not work.
> 
> Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
> creating an iommu DMA mapping, and fall back on dma_direct_map_page()
> for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY
> case is broken when intel_page_page() handles a subdevice.
intel_map_page?


> 
> We observe that at iommu attach time, dmar_insert_one_dev_info() for
> the subdevices will never set dev->archdata.iommu. This is because
> that function uses find_domain() to check if there is already an IOMMU
> for the device, and find_domain() then defers to the real DMA device
> which does have one. Thus dmar_insert_one_dev_info() returns without
> assigning dev->archdata.iommu.
> 
> Then, later:
> 
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>iommu_need_mapping() on the subdevice. identity_mapping() returns
>false because dev->archdata.iommu is NULL, so this function
>returns false indicating that mapping is needed.
> 2. __intel_map_single() is called to create the mapping.
> 3. __intel_map_single() calls find_domain(). This function now returns
>the IDENTITY domain corresponding to the real DMA device.
> 4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
>A failure is hit and the entire operation is aborted, because this
>codepath is not intended to handle IDENTITY mappings:
>if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
>return NULL;
> 
> Fix this by using the real DMA device when checking if a mapping is
> needed, while also considering the subdevice DMA mask.
> The IDENTITY case will then directly fall back on dma_direct_map_page().
> 
> Reported-by: Daniel Drake 
> Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
> Signed-off-by: Daniel Drake 
> ---
> 
> Notes:
> v2: switch to Jon's approach instead.
> 
> This problem was detected with a non-upstream patch
> "PCI: Add Intel remapped NVMe device support"
> (https://marc.info/?l=linux-ide&m=156015271021615&w=2)
> 
> This patch creates PCI devices a bit like VMD, and hence
> I believe VMD would hit this class of problem for any cases where
> the VMD device is in the IDENTITY domain. (I presume the reason this
> bug was not seen already there is that it is in a DMA iommu domain).
> 
> However this hasn't actually been tested on VMD (don't have the hardware)
> so if I've missed anything and/or it's not a real issue then feel free to
> drop this patch.
> 
>  drivers/iommu/intel-iommu.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9dc37672bf89..edbe2866b515 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3582,19 +3582,23 @@ static struct dmar_domain 
> *get_private_domain_for_dev(struct device *dev)
>  /* Check if the dev needs to go through non-identity map and unmap process.*/
>  static bool iommu_need_mapping(struct device *dev)
>  {
> + u64 dma_mask, required_dma_mask;
>   int ret;
>  
>   if (iommu_dummy(dev))
>   return false;
>  
> - ret = identity_mapping(dev);
> - if (ret) {
> - u64 dma_mask = *dev->dma_mask;
> + dma_mask = *dev->dma_mask;
> + if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
> + dma_mask = dev->coherent_dma_mask;
> + required_dma_mask = dma_direct_get_required_mask(dev);
>  
> - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
> - dma_mask = dev->coherent_dma_mask;
> > +	if (dev_is_pci(dev))
> > +		dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
>  
> - if (dma_mask >= dma_direct_get_required_mask(dev))
> + ret = identity_mapping(dev);
> + if (ret) {
> + if (dma_mask >= required_dma_mask)
>   return false;
>  
>   /*



I think this might work better since it shortcuts the mask check in the
non-identity case. Tests fine when VMD is forced into Identity domain.
Feel free to add my sign-off for either patch you go with.

Thanks,
Jon



diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc3767..7ffd252 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,12 +3582,16 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap

[PATCH] iommu/virtio: Build virtio-iommu as module

2020-02-14 Thread Jean-Philippe Brucker
From: Jean-Philippe Brucker 

Now that the infrastructure changes are in place, enable virtio-iommu to
be built as a module. Remove the redundant pci_request_acs() call, since
it's not exported but is already invoked during DMA setup.

Signed-off-by: Jean-Philippe Brucker 
---
This conflicts with the multiplatform work [1] since they both change
Kconfig. Locally I have this patch applied on top of that series but
there is no functional dependency between the two.

[1] 
https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
---
 drivers/iommu/Kconfig| 4 ++--
 drivers/iommu/virtio-iommu.c | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index adcbda44d473..bfd4e5fcd6aa 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -506,8 +506,8 @@ config HYPERV_IOMMU
  guests to run with x2APIC mode enabled.
 
 config VIRTIO_IOMMU
-   bool "Virtio IOMMU driver"
-   depends on VIRTIO=y
+   tristate "Virtio IOMMU driver"
+   depends on VIRTIO
depends on (ARM64 || X86)
select IOMMU_API
select IOMMU_DMA
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index f18ba8e22ebd..5429c12c879b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1084,7 +1084,6 @@ static int viommu_probe(struct virtio_device *vdev)
 
 #ifdef CONFIG_PCI
 	if (pci_bus_type.iommu_ops != &viommu_ops) {
-		pci_request_acs();
 		ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
if (ret)
goto err_unregister;
-- 
2.25.0



[PATCH AUTOSEL 4.4 072/100] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index eb9937225d645..6c10f307a1c98 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1090,7 +1090,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
}
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1
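
A generic sketch of the store-tearing hazard being avoided (illustrative
only, not SMMU code; ste and val are hypothetical):

	u64 *ste = ...;		/* pointer into a table shared with hardware */
	u64 val = ...;

	ste[0] = val;		/* the compiler may split this into two 32-bit
				 * stores, so a concurrent walker could see a
				 * valid bit paired with stale fields */
	WRITE_ONCE(ste[0], val);	/* forces a single 64-bit access */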



[PATCH AUTOSEL 4.9 102/141] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7bd98585d78d2..48d3820087881 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1103,7 +1103,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
}
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1



[PATCH AUTOSEL 4.14 131/186] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 09eb258a9a7de..29feafa8007fb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1145,7 +1145,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
}
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1



[PATCH AUTOSEL 4.19 214/252] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()

2020-02-14 Thread Sasha Levin
From: Lu Baolu 

[ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ]

Address field in device TLB invalidation descriptor is qualified
by the S field. If S field is zero, a single page at page address
specified by address [63:12] is requested to be invalidated. If S
field is set, the least significant bit in the address field with
value 0b (say bit N) indicates the invalidation address range. The
spec doesn't require the address [N - 1, 0] to be cleared, hence
remove the unnecessary WARN_ON_ONCE().

Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order
to invalidate all the cached mappings on an endpoint, and the below
overflow error will be triggered.

[...]
UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3
shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]

Reported-and-tested-by: Frank 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/dmar.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 7f9824b0609e7..72994d67bc5b9 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1345,7 +1345,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
struct qi_desc desc;
 
if (mask) {
-   WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
} else
-- 
2.20.1
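
The arithmetic behind the removed warning, for illustration: with
VTD_PAGE_SHIFT = 12 and mask = MAX_AGAW_PFN_WIDTH (64 - VTD_PAGE_SHIFT = 52),
the WARN_ON_ONCE computed 1ULL << (12 + 52) = 1ULL << 64, which is undefined
behaviour for a 64-bit type and is exactly the shift exponent in the UBSAN
report quoted above. The retained line shifts by 12 + 52 - 1 = 63, which is
well defined.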



[PATCH AUTOSEL 4.19 143/252] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA

2020-02-14 Thread Sasha Levin
From: Shameer Kolothum 

[ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ]

CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since
commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask
for CMD_TLBI_S2_IPA"). Add it back.

Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for 
CMD_TLBI_S2_IPA")
Signed-off-by: Shameer Kolothum 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2ab7100bcff12..eff1f3aa5ef43 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -810,6 +810,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
break;
case CMDQ_OP_TLBI_NH_VA:
+   cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
-- 
2.20.1



[PATCH AUTOSEL 4.19 177/252] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index eff1f3aa5ef43..6b7664052b5be 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1185,7 +1185,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
}
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1



[PATCH AUTOSEL 4.19 019/252] iommu/vt-d: Fix off-by-one in PASID allocation

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ]

PASID allocator uses IDR which is exclusive for the end of the
allocation range. There is no need to decrement pasid_max.

Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
Reported-by: Eric Auger 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index fd8730b2cd46e..5944d3b4dca37 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -377,7 +377,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
ret = intel_pasid_alloc_id(svm,
   !!cap_caching_mode(iommu->cap),
-  pasid_max - 1, GFP_KERNEL);
+  pasid_max, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
kfree(sdev);
-- 
2.20.1
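
The fix follows from the IDR interface mentioned in the commit message, where
the end of the allocation range is exclusive (sketch of the underlying call;
idr, ptr and start are placeholders):

	/* Allocates an ID in [start, end), so passing pasid_max as the end
	 * already excludes pasid_max itself - no need for pasid_max - 1. */
	id = idr_alloc(&idr, ptr, start, pasid_max, GFP_KERNEL);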



[PATCH AUTOSEL 5.4 395/459] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()

2020-02-14 Thread Sasha Levin
From: Lu Baolu 

[ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ]

Address field in device TLB invalidation descriptor is qualified
by the S field. If S field is zero, a single page at page address
specified by address [63:12] is requested to be invalidated. If S
field is set, the least significant bit in the address field with
value 0b (say bit N) indicates the invalidation address range. The
spec doesn't require the address [N - 1, 0] to be cleared, hence
remove the unnecessary WARN_ON_ONCE().

Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order
to invalidate all the cached mappings on an endpoint, and the below
overflow error will be triggered.

[...]
UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3
shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]

Reported-and-tested-by: Frank 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/dmar.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index eecd6a4216672..7196cabafb252 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1351,7 +1351,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
struct qi_desc desc;
 
if (mask) {
-   WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
desc.qw1 = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
} else
-- 
2.20.1



[PATCH AUTOSEL 5.4 322/459] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ee8d48d863e16..ef6af714a7e64 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1643,7 +1643,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 STRTAB_STE_1_EATS_TRANS));
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1



[PATCH AUTOSEL 5.4 271/459] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA

2020-02-14 Thread Sasha Levin
From: Shameer Kolothum 

[ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ]

CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since
commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask
for CMD_TLBI_S2_IPA"). Add it back.

Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for 
CMD_TLBI_S2_IPA")
Signed-off-by: Shameer Kolothum 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ed90361b84dc7..ee8d48d863e16 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -856,6 +856,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
break;
case CMDQ_OP_TLBI_NH_VA:
+   cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
-- 
2.20.1



[PATCH AUTOSEL 5.4 253/459] iommu/vt-d: Match CPU and IOMMU paging mode

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 79db7e1b4cf2a006f556099c13de3b12970fc6e3 ]

When setting up first level page tables for sharing with CPU, we need
to ensure IOMMU can support no less than the levels supported by the
CPU.

It is not adequate, as in the current code, to set up 5-level paging
in PASID entry First Level Paging Mode (FLPM) solely based on the CPU.

Currently, intel_pasid_setup_first_level() is only used by native SVM
code which already checks paging mode matches. However, future use of
this helper function may not be limited to native SVM.
https://lkml.org/lkml/2019/11/18/1037

Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-pasid.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 040a445be3009..e7cb0b8a73327 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -499,8 +499,16 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
}
 
 #ifdef CONFIG_X86
-   if (cpu_feature_enabled(X86_FEATURE_LA57))
-   pasid_set_flpm(pte, 1);
+   /* Both CPU and IOMMU paging mode need to match */
+   if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+   if (cap_5lp_support(iommu->cap)) {
+   pasid_set_flpm(pte, 1);
+   } else {
+   pr_err("VT-d has no 5-level paging support for CPU\n");
+   pasid_clear_entry(pte);
+   return -EINVAL;
+   }
+   }
 #endif /* CONFIG_X86 */
 
pasid_set_domain_id(pte, did);
-- 
2.20.1



[PATCH 0/3] virtio-iommu on non-devicetree platforms

2020-02-14 Thread Jean-Philippe Brucker
Add topology description to the virtio-iommu driver and enable x86
platforms. Since the RFC [1] I've mostly given up on ACPI tables, as
the internal discussions seem to have reached a dead end. The built-in
topology description presented here isn't ideal, but it is simple to
implement and doesn't impose a dependency on ACPI or device-tree, which
can be beneficial to lightweight hypervisors.

The built-in description is an array in the virtio config space. The
driver parses the config space early and postpones endpoint probe until
the virtio-iommu device is ready. Each element in the array describes
either a PCI range or a single MMIO endpoint, and their associated
endpoint IDs:

struct virtio_iommu_topo_pci_range {
	__le16	type;			/* 1: PCI range */
	__le16	hierarchy;		/* PCI domain number */
	__le16	requester_start;	/* First BDF */
	__le16	requester_end;		/* Last BDF */
	__le32	endpoint_start;		/* First endpoint ID */
};

struct virtio_iommu_topo_endpoint {
	__le16	type;			/* 2: Endpoint */
	__le16	reserved;		/* 0 */
	__le32	endpoint;		/* Endpoint ID */
	__le64	address;		/* First MMIO address */
};
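
To make the layout concrete, here is a minimal sketch of how a guest
driver could walk such an array once it has been copied out of config
space (the union type mirrors the one in patch 1; the dispatch bodies
and stride handling are elided, so treat this as illustration only):

	static void viommu_parse_topology(const union viommu_topo_cfg *cfg,
					  size_t num_items)
	{
		size_t i;

		for (i = 0; i < num_items; i++) {
			switch (le16_to_cpu(cfg[i].type)) {
			case 1: /* PCI range */
				/* Map BDFs requester_start..requester_end to
				   endpoint IDs starting at endpoint_start. */
				break;
			case 2: /* single endpoint */
				/* Match an MMIO device by its first address. */
				break;
			default: /* Ignore unknown structure types. */
				break;
			}
		}
	}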

You can find the QEMU patches based on Eric's latest device on my
virtio-iommu/devel branch [2]. I test on both x86 q35, and aarch64 virt
machine with edk2.

[1] 
https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
[2] https://jpbrucker.net/git/qemu virtio-iommu/devel

Jean-Philippe Brucker (3):
  iommu/virtio: Add topology description to virtio-iommu config space
  PCI: Add DMA configuration for virtual platforms
  iommu/virtio: Enable x86 support

 MAINTAINERS   |   2 +
 drivers/iommu/Kconfig |  13 +-
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu-topology.c | 343 ++
 drivers/iommu/virtio-iommu.c  |   3 +
 drivers/pci/pci-driver.c  |   5 +
 include/linux/virt_iommu.h|  19 ++
 include/uapi/linux/virtio_iommu.h |  26 ++
 8 files changed, 411 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virt_iommu.h

-- 
2.25.0



[PATCH 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-02-14 Thread Jean-Philippe Brucker
Platforms without device-tree do not currently have a method for
describing the vIOMMU topology. Provide a topology description embedded
into the virtio device.

Use PCI FIXUP to probe the config space early, because we need to
discover the topology before any DMA configuration takes place, and the
virtio driver may be loaded much later. Since we discover the topology
description when probing the PCI hierarchy, the virtual IOMMU cannot
manage other platform devices discovered earlier.

This solution is neither elegant nor foolproof, but it is the best we
can do at the moment and works with existing virtio-iommu
implementations. It also
enables an IOMMU for lightweight hypervisors that do not rely on
firmware methods for booting.
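
For reference, the early probing could be wired up as a standard PCI
fixup so that it runs during bus enumeration, before DMA configuration
of other devices; the fixup stage, device ID and the
viommu_pci_parse_topology() helper below are illustrative guesses, not
necessarily the literal code in this patch:

	static void viommu_pci_parse_topology(struct pci_dev *dev)
	{
		/* Locate the vendor capability and copy out the
		   topology array for later endpoint lookups. */
	}
	/* 0x1040 + 23 is the modern virtio-iommu PCI device ID. */
	DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + 23,
				viommu_pci_parse_topology);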

Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   2 +
 drivers/iommu/Kconfig |  10 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu-topology.c | 343 ++
 drivers/iommu/virtio-iommu.c  |   3 +
 include/linux/virt_iommu.h|  19 ++
 include/uapi/linux/virtio_iommu.h |  26 ++
 7 files changed, 404 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virt_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3f7b6f..6b978b0d0c90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17780,6 +17780,8 @@ M:  Jean-Philippe Brucker 
 L: virtualizat...@lists.linux-foundation.org
 S: Maintained
 F: drivers/iommu/virtio-iommu.c
+F: drivers/iommu/virtio-iommu-topology.c
+F: include/linux/virt_iommu.h
 F: include/uapi/linux/virtio_iommu.h
 
 VIRTUAL BOX GUEST DEVICE DRIVER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..068d4e0e3541 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -516,4 +516,14 @@ config VIRTIO_IOMMU
 
  Say Y here if you intend to run this kernel as a guest.
 
+config VIRTIO_IOMMU_TOPOLOGY
+   bool "Topology properties for the virtio-iommu"
+   depends on VIRTIO_IOMMU
+   default y
+   help
+ Enable early probing of the virtio-iommu device, to detect the
+ built-in topology description.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 2104fb8afc06..f295cacf9c6e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c
new file mode 100644
index ..e4ab49701df5
--- /dev/null
+++ b/drivers/iommu/virtio-iommu-topology.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct viommu_cap_config {
+   u8 bar;
+   u32 length; /* structure size */
+   u32 offset; /* structure offset within the bar */
+};
+
+union viommu_topo_cfg {
+   __le16  type;
+   struct virtio_iommu_topo_pci_range  pci;
+   struct virtio_iommu_topo_endpoint   ep;
+};
+
+struct viommu_spec {
+   struct device   *dev; /* transport device */
+   struct fwnode_handle*fwnode;
+   struct iommu_ops*ops;
+   struct list_headlist;
+   size_t  num_items;
+   /* The config array of length num_items follows */
+   union viommu_topo_cfg   cfg[];
+};
+
+static LIST_HEAD(viommus);
+static DEFINE_MUTEX(viommus_lock);
+
+#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
+
+static inline int viommu_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
+struct viommu_cap_config *cap)
+{
+   int pos;
+   u8 bar;
+
+   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+pos > 0;
+pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+   u8 type;
+
+   pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
+   if (type != cfg_type)
+   continue;
+
+   pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
+
+   /* Ignore structures with reserved BAR values */
+   if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5)
+   continue;
+
+   cap->bar = bar;
+   pci_read_config_dword(dev, pos + VPCI_FIELD(length),
+ &cap->length);
+   

[PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-14 Thread Jean-Philippe Brucker
With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
 config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA
select INTERVAL_TREE
help
  Para-virtualised IOMMU driver with virtio.
-- 
2.25.0



[PATCH AUTOSEL 5.4 254/459] iommu/vt-d: Avoid sending invalid page response

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 5f75585e19cc7018bf2016aa771632081ee2f313 ]

Page responses should only be sent when the last page in group (LPIG)
flag or private data is present in the page request. This patch avoids
sending invalid descriptors.
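
In code terms, the response construction and submission now stay inside
the existing condition; a simplified sketch of the resulting flow:

	if (req->lpig || req->priv_data_present) {
		/* ... build resp from the request ... */
		resp.qw2 = 0;
		resp.qw3 = 0;
		qi_submit_sync(&resp, iommu);
	}
	/* Otherwise no page response descriptor is submitted at all. */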

Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-svm.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index ff7a3f9add325..518d0b2d12afd 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -654,11 +654,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (req->priv_data_present)
				memcpy(&resp.qw2, req->priv_data,
   sizeof(req->priv_data));
+   resp.qw2 = 0;
+   resp.qw3 = 0;
+   qi_submit_sync(&resp, iommu);
}
-   resp.qw2 = 0;
-   resp.qw3 = 0;
-   qi_submit_sync(&resp, iommu);
-
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
 
-- 
2.20.1



[PATCH AUTOSEL 5.4 220/459] iommu/amd: Check feature support bit before accessing MSI capability registers

2020-02-14 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 813071438e83d338ba5cfe98b3b26c890dc0a6c0 ]

The IOMMU MMIO access to MSI capability registers is available only if
the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit
is set if the EFR[XtSup] is set, which might not be the case.

Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address
low/high and MSI data registers via the MMIO.

Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c  | 17 -
 drivers/iommu/amd_iommu_types.h |  1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 483f7bc379fa8..61628c906ce11 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -147,7 +147,7 @@ bool amd_iommu_dump;
 bool amd_iommu_irq_remap __read_mostly;
 
 int amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC;
-static int amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
+static int amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
 
 static bool amd_iommu_detected;
 static bool __initdata amd_iommu_disabled;
@@ -1534,8 +1534,15 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
-   if (((h->efr_reg & (0x1 << IOMMU_EFR_XTSUP_SHIFT)) == 0))
-   amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
+   /*
+* Note: Since iommu_update_intcapxt() leverages
+* the IOMMU MMIO access to MSI capability block registers
+* for MSI address lo/hi/data, we need to check both
+* EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
+*/
+   if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
+   (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+   amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
break;
default:
return -EINVAL;
@@ -1996,8 +2003,8 @@ static int iommu_init_intcapxt(struct amd_iommu *iommu)
	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
/**
-* IntCapXT requires XTSup=1, which can be inferred
-* amd_iommu_xt_mode.
+* IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
+* which can be inferred from amd_iommu_xt_mode.
 */
if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
return 0;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index fc956479b94e6..1b4c340890662 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -383,6 +383,7 @@
 /* IOMMU Extended Feature Register (EFR) */
 #define IOMMU_EFR_XTSUP_SHIFT  2
 #define IOMMU_EFR_GASUP_SHIFT  7
+#define IOMMU_EFR_MSICAPMMIOSUP_SHIFT  46
 
 #define MAX_DOMAIN_ID 65536
 
-- 
2.20.1



[PATCH AUTOSEL 5.4 221/459] iommu/amd: Only support x2APIC with IVHD type 11h/40h

2020-02-14 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 966b753cf3969553ca50bacd2b8c4ddade5ecc9e ]

The current implementation for IOMMU x2APIC support makes use of
the MMIO access to MSI capability block registers, which requires
checking EFR[MsiCapMmioSup]. However, only IVHD types 11h/40h contain
this information; it is not present in the IVHD type 10h IOMMU feature
reporting field. Since the BIOS in newer systems that support x2APIC
would normally contain IVHD type 11h/40h, remove the
IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support
x2APIC with IVHD type 11h/40h.

Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c  | 2 --
 drivers/iommu/amd_iommu_types.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 61628c906ce11..d7cbca8bf2cd4 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1523,8 +1523,6 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
-   if (((h->efr_attr & (0x1 << IOMMU_FEAT_XTSUP_SHIFT)) == 0))
-   amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
break;
case 0x11:
case 0x40:
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 1b4c340890662..daeabd98c60e2 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -377,7 +377,6 @@
 #define IOMMU_CAP_EFR 27
 
 /* IOMMU Feature Reporting Field (for IVHD type 10h */
-#define IOMMU_FEAT_XTSUP_SHIFT 0
 #define IOMMU_FEAT_GASUP_SHIFT 6
 
 /* IOMMU Extended Feature Register (EFR) */
-- 
2.20.1



[PATCH AUTOSEL 5.4 196/459] PCI: Add nr_devfns parameter to pci_add_dma_alias()

2020-02-14 Thread Sasha Levin
From: James Sewart 

[ Upstream commit 09298542cd891b43778db1f65aa3613aa5a562eb ]

Add a "nr_devfns" parameter to pci_add_dma_alias() so it can be used to
create DMA aliases for a range of devfns.

[bhelgaas: incorporate nr_devfns fix from James, update
quirk_pex_vca_alias() and setup_aliases()]
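
As a usage sketch (the quirk and slot number below are made up purely
to show the new signature):

	static void quirk_alias_whole_slot(struct pci_dev *pdev)
	{
		/* Hypothetical: alias all eight functions of slot 0x10. */
		pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0), 8);
	}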
Signed-off-by: James Sewart 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu.c |  7 ++-
 drivers/pci/pci.c | 22 +-
 drivers/pci/quirks.c  | 23 +--
 include/linux/pci.h   |  2 +-
 4 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 454695b372c8c..8bd5d608a82c2 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -272,11 +272,8 @@ static struct pci_dev *setup_aliases(struct device *dev)
 */
ivrs_alias = amd_iommu_alias_table[pci_dev_id(pdev)];
if (ivrs_alias != pci_dev_id(pdev) &&
-   PCI_BUS_NUM(ivrs_alias) == pdev->bus->number) {
-   pci_add_dma_alias(pdev, ivrs_alias & 0xff);
-   pci_info(pdev, "Added PCI DMA alias %02x.%d\n",
-   PCI_SLOT(ivrs_alias), PCI_FUNC(ivrs_alias));
-   }
+   PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
+   pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
clone_aliases(pdev);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cbf3d3889874c..981ae16f935bc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5875,7 +5875,8 @@ EXPORT_SYMBOL_GPL(pci_pr3_present);
 /**
  * pci_add_dma_alias - Add a DMA devfn alias for a device
  * @dev: the PCI device for which alias is added
- * @devfn: alias slot and function
+ * @devfn_from: alias slot and function
+ * @nr_devfns: number of subsequent devfns to alias
  *
  * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
  * which is used to program permissible bus-devfn source addresses for DMA
@@ -5891,8 +5892,13 @@ EXPORT_SYMBOL_GPL(pci_pr3_present);
  * cannot be left as a userspace activity).  DMA aliases should therefore
  * be configured via quirks, such as the PCI fixup header quirk.
  */
-void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
+void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned nr_devfns)
 {
+   int devfn_to;
+
+   nr_devfns = min(nr_devfns, (unsigned) MAX_NR_DEVFNS - devfn_from);
+   devfn_to = devfn_from + nr_devfns - 1;
+
if (!dev->dma_alias_mask)
dev->dma_alias_mask = bitmap_zalloc(MAX_NR_DEVFNS, GFP_KERNEL);
if (!dev->dma_alias_mask) {
@@ -5900,9 +5906,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
return;
}
 
-   set_bit(devfn, dev->dma_alias_mask);
-   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
-PCI_SLOT(devfn), PCI_FUNC(devfn));
+   bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns);
+
+   if (nr_devfns == 1)
+   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from));
+   else if (nr_devfns > 1)
+   pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from),
+   PCI_SLOT(devfn_to), PCI_FUNC(devfn_to));
 }
 
 bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7b6df2d8d6cde..67a9ad3734d18 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3927,7 +3927,7 @@ int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 static void quirk_dma_func0_alias(struct pci_dev *dev)
 {
if (PCI_FUNC(dev->devfn) != 0)
-   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0), 1);
 }
 
 /*
@@ -3941,7 +3941,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe476, quirk_dma_func0_alias);
 static void quirk_dma_func1_alias(struct pci_dev *dev)
 {
if (PCI_FUNC(dev->devfn) != 1)
-   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1));
+   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1), 1);
 }
 
 /*
@@ -4026,7 +4026,7 @@ static void quirk_fixed_dma_alias(struct pci_dev *dev)
 
id = pci_match_id(fixed_dma_alias_tbl, dev);
if (id)
-   pci_add_dma_alias(dev, id->driver_data);
+   pci_add_dma_alias(dev, id->driver_data, 1);
 }
 
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ADAPTEC2, 0x0285, quirk_fixed_dma_alias);
@@ -4068,9 +4068,9 @@ DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias);
  */
 static void quirk_mic_x200_dma_alias(struct pci_dev *pdev)
 {
-   pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0));
-   pci_add_dma_alias(pdev, 

[PATCH AUTOSEL 5.4 045/459] iommu/vt-d: Fix off-by-one in PASID allocation

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ]

The PASID allocator uses an IDR, which is exclusive at the end of the
allocation range. There is no need to decrement pasid_max.
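
For reference, IDR ranges are half-open, so the end bound is already
exclusive:

	/* idr_alloc() hands out IDs in [start, end), so passing pasid_max
	   already excludes pasid_max itself; passing pasid_max - 1 would
	   needlessly shrink the usable range by one. */
	id = idr_alloc(&idr, entry, 1, pasid_max, GFP_KERNEL);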

Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
Reported-by: Eric Auger 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index dca88f9fdf29a..ff7a3f9add325 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -317,7 +317,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
ret = intel_pasid_alloc_id(svm,
   !!cap_caching_mode(iommu->cap),
-  pasid_max - 1, GFP_KERNEL);
+  pasid_max, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
kfree(sdev);
-- 
2.20.1



[PATCH AUTOSEL 5.5 459/542] iommu/vt-d: Remove unnecessary WARN_ON_ONCE()

2020-02-14 Thread Sasha Levin
From: Lu Baolu 

[ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ]

The address field in the device TLB invalidation descriptor is
qualified by the S field. If the S field is zero, a single page at the
page address specified by address [63:12] is requested to be
invalidated. If the S field is set, the least significant bit in the
address field with value 0b (say bit N) indicates the invalidation
address range. The spec doesn't require address bits [N - 1, 0] to be
cleared, hence remove the unnecessary WARN_ON_ONCE().

Otherwise, a caller that sets "mask = MAX_AGAW_PFN_WIDTH" in order to
invalidate all the cached mappings on an endpoint would trigger the
overflow error below.

[...]
UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3
shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]
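
A worked example of both the encoding and the overflow, assuming
VTD_PAGE_SHIFT == 12 and MAX_AGAW_PFN_WIDTH == 64 - 12 == 52:

	/* mask == 1 (two pages): addr |= (1ULL << 12) - 1 sets bits [11:0]
	   and leaves bit 12 as the lowest 0 bit, encoding a 2^13-byte,
	   i.e. two-page, range.

	   mask == 52: the removed WARN_ON_ONCE() computed
	   1ULL << (VTD_PAGE_SHIFT + mask) == 1ULL << 64, exactly the
	   undefined 64-bit shift UBSAN reports above. */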

Reported-and-tested-by: Frank 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/dmar.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 3acfa6a25fa29..fb66f717127d2 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1354,7 +1354,6 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
struct qi_desc desc;
 
if (mask) {
-   WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
desc.qw1 = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
} else
-- 
2.20.1



[PATCH AUTOSEL 5.5 458/542] iommu/vt-d: Mark firmware tainted if RMRR fails sanity check

2020-02-14 Thread Sasha Levin
From: Barret Rhoden 

[ Upstream commit f5a68bb0752e0cf77c06f53f72258e7beb41381b ]

RMRR entries describe memory regions that are DMA targets for devices
outside the kernel's control.

RMRR entries that fail the sanity check are pointing to regions of
memory that the firmware did not tell the kernel are reserved or
otherwise should not be used.

Instead of aborting DMAR processing, this commit marks the firmware
as tainted. These RMRRs will still be identity mapped; otherwise some
devices, e.g. graphics devices, will not work during boot.

Signed-off-by: Barret Rhoden 
Signed-off-by: Lu Baolu 
Fixes: f036c7fa0ab60 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-iommu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 541896ab3d086..dfedbb04f647d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4320,12 +4320,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_reserved_memory *rmrr;
struct dmar_rmrr_unit *rmrru;
-   int ret;
 
rmrr = (struct acpi_dmar_reserved_memory *)header;
-   ret = arch_rmrr_sanity_check(rmrr);
-   if (ret)
-   return ret;
+   if (arch_rmrr_sanity_check(rmrr))
+   WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
+  "Your BIOS is broken; bad RMRR [%#018Lx-%#018Lx]\n"
+  "BIOS vendor: %s; Ver: %s; Product Version: %s\n",
+  rmrr->base_address, rmrr->end_address,
+  dmi_get_system_info(DMI_BIOS_VENDOR),
+  dmi_get_system_info(DMI_BIOS_VERSION),
+  dmi_get_system_info(DMI_PRODUCT_VERSION));
 
rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
if (!rmrru)
-- 
2.20.1



[PATCH AUTOSEL 5.5 370/542] iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE

2020-02-14 Thread Sasha Levin
From: Will Deacon 

[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ]

If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.
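
As an illustration of the hazard (a sketch, not an observed
miscompilation):

	dst[0] = cpu_to_le64(val);	/* may legally tear into two 32-bit
					   stores; a concurrent SMMU walk
					   could then see V=1 with stale
					   fields */
	WRITE_ONCE(dst[0], cpu_to_le64(val));	/* one 64-bit access */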

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2f7680faba49e..6bd6a3f3f4710 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1643,7 +1643,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 STRTAB_STE_1_EATS_TRANS));
 
arm_smmu_sync_ste_for_sid(smmu, sid);
-   dst[0] = cpu_to_le64(val);
+   /* See comment in arm_smmu_write_ctx_desc() */
+   WRITE_ONCE(dst[0], cpu_to_le64(val));
arm_smmu_sync_ste_for_sid(smmu, sid);
 
/* It's likely that we'll want to use the new STE soon */
-- 
2.20.1



[PATCH AUTOSEL 5.5 312/542] iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA

2020-02-14 Thread Sasha Levin
From: Shameer Kolothum 

[ Upstream commit 935d43ba272e0001f8ef446a3eff15d8175cb11b ]

CMDQ_OP_TLBI_NH_VA requires VMID and this was missing since
commit 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask
for CMD_TLBI_S2_IPA"). Add it back.

Fixes: 1c27df1c0a82 ("iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA")
Signed-off-by: Shameer Kolothum 
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index effe72eb89e7f..2f7680faba49e 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -856,6 +856,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
break;
case CMDQ_OP_TLBI_NH_VA:
+   cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
-- 
2.20.1



[PATCH AUTOSEL 5.5 291/542] iommu/vt-d: Match CPU and IOMMU paging mode

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 79db7e1b4cf2a006f556099c13de3b12970fc6e3 ]

When setting up first level page tables for sharing with the CPU, we
need to ensure the IOMMU supports no fewer paging levels than the CPU.

It is not adequate, as in the current code, to set up 5-level paging
in the PASID entry First Level Paging Mode (FLPM) based solely on the
CPU feature flag.

Currently, intel_pasid_setup_first_level() is only used by native SVM
code which already checks paging mode matches. However, future use of
this helper function may not be limited to native SVM.
https://lkml.org/lkml/2019/11/18/1037

Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-pasid.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 040a445be3009..e7cb0b8a73327 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -499,8 +499,16 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
}
 
 #ifdef CONFIG_X86
-   if (cpu_feature_enabled(X86_FEATURE_LA57))
-   pasid_set_flpm(pte, 1);
+   /* Both CPU and IOMMU paging mode need to match */
+   if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+   if (cap_5lp_support(iommu->cap)) {
+   pasid_set_flpm(pte, 1);
+   } else {
+   pr_err("VT-d has no 5-level paging support for CPU\n");
+   pasid_clear_entry(pte);
+   return -EINVAL;
+   }
+   }
 #endif /* CONFIG_X86 */
 
pasid_set_domain_id(pte, did);
-- 
2.20.1



[PATCH AUTOSEL 5.5 292/542] iommu/vt-d: Avoid sending invalid page response

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 5f75585e19cc7018bf2016aa771632081ee2f313 ]

Page responses should only be sent when the last page in group (LPIG)
flag or private data is present in the page request. This patch avoids
sending invalid descriptors.

Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support")
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-svm.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index ff7a3f9add325..518d0b2d12afd 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -654,11 +654,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (req->priv_data_present)
				memcpy(&resp.qw2, req->priv_data,
   sizeof(req->priv_data));
+   resp.qw2 = 0;
+   resp.qw3 = 0;
+   qi_submit_sync(&resp, iommu);
}
-   resp.qw2 = 0;
-   resp.qw3 = 0;
-   qi_submit_sync(&resp, iommu);
-
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
 
-- 
2.20.1



[PATCH AUTOSEL 5.5 255/542] iommu/amd: Only support x2APIC with IVHD type 11h/40h

2020-02-14 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 966b753cf3969553ca50bacd2b8c4ddade5ecc9e ]

The current implementation for IOMMU x2APIC support makes use of
the MMIO access to MSI capability block registers, which requires
checking EFR[MsiCapMmioSup]. However, only IVHD types 11h/40h contain
this information; it is not present in the IVHD type 10h IOMMU feature
reporting field. Since the BIOS in newer systems that support x2APIC
would normally contain IVHD type 11h/40h, remove the
IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support
x2APIC with IVHD type 11h/40h.

Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c  | 2 --
 drivers/iommu/amd_iommu_types.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 61628c906ce11..d7cbca8bf2cd4 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1523,8 +1523,6 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
-   if (((h->efr_attr & (0x1 << IOMMU_FEAT_XTSUP_SHIFT)) == 0))
-   amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
break;
case 0x11:
case 0x40:
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f8a7945f3df90..798e1533a1471 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -377,7 +377,6 @@
 #define IOMMU_CAP_EFR 27
 
 /* IOMMU Feature Reporting Field (for IVHD type 10h */
-#define IOMMU_FEAT_XTSUP_SHIFT 0
 #define IOMMU_FEAT_GASUP_SHIFT 6
 
 /* IOMMU Extended Feature Register (EFR) */
-- 
2.20.1



[PATCH AUTOSEL 5.5 256/542] iommu/iova: Silence warnings under memory pressure

2020-02-14 Thread Sasha Levin
From: Qian Cai 

[ Upstream commit 944c9175397476199d4dd1028d87ddc582c35ee8 ]

When running heavy memory pressure workloads, this 5+ year old system
throws the endless warnings below because disk IO is too slow to
recover from swapping. Since the volume from alloc_iova_fast() could
be large, once it calls printk() it will trigger disk IO (writing to
the log files) and pending softirqs, which could cause an infinite
loop and let the ongoing memory reclaim make no progress for days.
This is the counterpart for Intel; the AMD part has already been
merged, see commit 3d708895325b ("iommu/amd: Silence warnings under
memory pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), just call dev_err_once() there, because even the
"ratelimited" variant is too much, and silence the one in
alloc_iova_mem() to avoid the expensive warn_alloc().
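
The per-device one-shot report mentioned above would look roughly like
this in intel_alloc_iova() (a sketch):

	dev_err_once(dev, "Allocating %ld-page iova failed\n", nrpages);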

 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 slab_out_of_memory: 66 callbacks suppressed
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 0: slabs: 1822, objs: 16398, free: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 1: slabs: 2051, objs: 18459, free: 31
   node 0: slabs: 697, objs: 4182, free: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   node 1: slabs: 381, objs: 2286, free: 27
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 warn_alloc: 96 callbacks suppressed
 kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB
 Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
 Workqueue: kblockd blk_mq_run_work_fn
 Call Trace:
  dump_stack+0xa0/0xea
  warn_alloc.cold.94+0x8a/0x12d
  __alloc_pages_slowpath+0x1750/0x1870
  __alloc_pages_nodemask+0x58a/0x710
  alloc_pages_current+0x9c/0x110
  alloc_slab_page+0xc9/0x760
  allocate_slab+0x48f/0x5d0
  new_slab+0x46/0x70
  ___slab_alloc+0x4ab/0x7b0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2dd/0x450
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  alloc_iova+0x33/0x210
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
  alloc_iova_fast+0x62/0x3d1
   node 1: slabs: 381, objs: 2286, free: 27
  intel_alloc_iova+0xce/0xe0
  intel_map_sg+0xed/0x410
  scsi_dma_map+0xd7/0x160
  scsi_queue_rq+0xbf7/0x1310
  blk_mq_dispatch_rq_list+0x4d9/0xbc0
  blk_mq_sched_dispatch_requests+0x24a/0x300
  __blk_mq_run_hw_queue+0x156/0x230
  blk_mq_run_work_fn+0x3b/0x40
  process_one_work+0x579/0xb90
  worker_thread+0x63/0x5b0
  kthread+0x1e6/0x210
  ret_from_fork+0x3a/0x50
 

[PATCH AUTOSEL 5.5 254/542] iommu/amd: Check feature support bit before accessing MSI capability registers

2020-02-14 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 813071438e83d338ba5cfe98b3b26c890dc0a6c0 ]

The IOMMU MMIO access to MSI capability registers is available only if
the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit
is set if the EFR[XtSup] is set, which might not be the case.

Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address
low/high and MSI data registers via the MMIO.

Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c  | 17 -
 drivers/iommu/amd_iommu_types.h |  1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 483f7bc379fa8..61628c906ce11 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -147,7 +147,7 @@ bool amd_iommu_dump;
 bool amd_iommu_irq_remap __read_mostly;
 
 int amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC;
-static int amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
+static int amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
 
 static bool amd_iommu_detected;
 static bool __initdata amd_iommu_disabled;
@@ -1534,8 +1534,15 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
-   if (((h->efr_reg & (0x1 << IOMMU_EFR_XTSUP_SHIFT)) == 0))
-   amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
+   /*
+* Note: Since iommu_update_intcapxt() leverages
+* the IOMMU MMIO access to MSI capability block registers
+* for MSI address lo/hi/data, we need to check both
+* EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
+*/
+   if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
+   (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+   amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
break;
default:
return -EINVAL;
@@ -1996,8 +2003,8 @@ static int iommu_init_intcapxt(struct amd_iommu *iommu)
	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
/**
-* IntCapXT requires XTSup=1, which can be inferred
-* amd_iommu_xt_mode.
+* IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
+* which can be inferred from amd_iommu_xt_mode.
 */
if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
return 0;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f52f59d5c6bd4..f8a7945f3df90 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -383,6 +383,7 @@
 /* IOMMU Extended Feature Register (EFR) */
 #define IOMMU_EFR_XTSUP_SHIFT  2
 #define IOMMU_EFR_GASUP_SHIFT  7
+#define IOMMU_EFR_MSICAPMMIOSUP_SHIFT  46
 
 #define MAX_DOMAIN_ID 65536
 
-- 
2.20.1



[PATCH AUTOSEL 5.5 051/542] iommu/vt-d: Fix off-by-one in PASID allocation

2020-02-14 Thread Sasha Levin
From: Jacob Pan 

[ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ]

The PASID allocator uses an IDR, which is exclusive at the end of the
allocation range. There is no need to decrement pasid_max.

Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
Reported-by: Eric Auger 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index dca88f9fdf29a..ff7a3f9add325 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -317,7 +317,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
ret = intel_pasid_alloc_id(svm,
   !!cap_caching_mode(iommu->cap),
-  pasid_max - 1, GFP_KERNEL);
+  pasid_max, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
kfree(sdev);
-- 
2.20.1



Re: arm64 iommu groups issue

2020-02-14 Thread John Garry


@@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
   /* Set up MSI IRQ domain */
   pci_set_msi_domain(dev);

+    parent = dev->dev.parent;
+    if (parent && parent->bus == &pci_bus_type)
+    device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
+
   /* Notifier could use PCI capabilities */
   dev->match_driver = false;
   ret = device_add(&dev->dev);
--

This would work, but the problem is that if the port driver fails in
probing - and not just for -EPROBE_DEFER - then the child device will
never probe. This very thing happens on my dev board. However we could
expand the device links API to cover this sort of scenario.


Yes, that's an undesirable issue, but in fact I think it's mostly
indicative that involving drivers in something which is designed to
happen at a level below drivers is still fundamentally wrong and doomed
to be fragile at best.


Right, and even worse is that it relies on the port driver even existing 
at all.


All this iommu group assignment should be taken outside device driver 
probe paths.


However we could still consider device links for sync'ing the SMMU and 
each device probing.




Another thought that crosses my mind is that when pci_device_group()
walks up to the point of ACS isolation and doesn't find an existing
group, it can still infer that everything it walked past *should* be put
in the same group it's then eventually going to return. Unfortunately I
can't see an obvious way for it to act on that knowledge, though, since
recursive iommu_probe_device() is unlikely to end well.


I'd be inclined not to change that code.




As for alternatives, it looks pretty difficult to me to disassociate the
group allocation from the dma_configure path.


Indeed it's non-trivial, but it really does need cleaning up at some point.

Having just had yet another spark, does something like the untested
super-hack below work at all? 


I tried it and it doesn't (yet) work.

So when we try
iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(),
the iommu_fwspec is still NULL for that device - it is not set until
later, when the device driver is finally about to probe, in
iort_iommu_xlate()->iommu_fwspec_init(), and by then it's too late...


And this looks to be the reason why the current
iommu_bus_init()->bus_for_each_device(..., add_iommu_group) also fails.


On the current code mentioned, the principle seems wrong to me - we
call bus_for_each_device(..., add_iommu_group) for the first SMMU in
the system which probes, but we attempt add_iommu_group() for all
devices on the bus, even though the SMMU for a given device may not
yet have probed.


Thanks,
John

I doubt it's a viable direction to take in itself, but it could be food
for thought if it at least proves the concept.

Robin.

->8-
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index aa3ac2a03807..554cde76c766 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3841,20 +3841,20 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
int err;

   #ifdef CONFIG_PCI
-   if (pci_bus_type.iommu_ops != ops) {
+   if (1) {
		err = bus_set_iommu(&pci_bus_type, ops);
if (err)
return err;
}
   #endif
   #ifdef CONFIG_ARM_AMBA
-   if (amba_bustype.iommu_ops != ops) {
+   if (1) {
		err = bus_set_iommu(&amba_bustype, ops);
if (err)
goto err_reset_pci_ops;
}
   #endif
-   if (platform_bus_type.iommu_ops != ops) {
+   if (1) {
		err = bus_set_iommu(&platform_bus_type, ops);
if (err)
goto err_reset_amba_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 660eea8d1d2f..b81ae2b4d4fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1542,6 +1542,14 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
return err;
   }

+static int iommu_bus_replay(struct device *dev, void *data)
+{
+   if (dev->iommu_group)
+   return 0;
+
+   return add_iommu_group(dev, data);
+}
+
   /**
* bus_set_iommu - set iommu-callbacks for the bus
* @bus: bus.
@@ -1564,6 +1572,9 @@ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
return 0;
}

+   if (bus->iommu_ops == ops)
+   return bus_for_each_dev(bus, NULL, NULL, iommu_bus_replay);
+
if (bus->iommu_ops != NULL)
return -EBUSY;




Re: [PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-14 Thread Lu Baolu

Hi,

On 2020/2/14 17:02, Daniel Drake wrote:

From: Jon Derrick 

The PCI devices handled by intel-iommu may have a DMA requester on
another bus, such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was
missed earlier: if the VMD device (and hence subdevices too) are under
IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
creating an iommu DMA mapping, and fall back on dma_direct_map_page()
for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY
case is broken when intel_map_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for
the subdevices will never set dev->archdata.iommu. This is because
that function uses find_domain() to check if there is already an IOMMU
for the device, and find_domain() then defers to the real DMA device
which does have one. Thus dmar_insert_one_dev_info() returns without
assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
iommu_need_mapping() on the subdevice. identity_mapping() returns
false because dev->archdata.iommu is NULL, so this function
returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
A failure is hit and the entire operation is aborted, because this
codepath is not intended to handle IDENTITY mappings:
if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
return NULL;

Fix this by using the real DMA device when checking if a mapping is
needed, while also considering the subdevice DMA mask.
The IDENTITY case will then directly fall back on dma_direct_map_page().

Reported-by: Daniel Drake 
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Daniel Drake 


Why not have the patch author's signed-off-by?

Best regards,
baolu


---

Notes:
 v2: switch to Jon's approach instead.
 
 This problem was detected with a non-upstream patch

 "PCI: Add Intel remapped NVMe device support"
 (https://marc.info/?l=linux-ide=156015271021615=2)
 
 This patch creates PCI devices a bit like VMD, and hence

 I believe VMD would hit this class of problem for any cases where
 the VMD device is in the IDENTITY domain. (I presume the reason this
 bug was not seen already there is that it is in a DMA iommu domain).
 
 However this hasn't actually been tested on VMD (don't have the hardware)

 so if I've missed anything and/or it's not a real issue then feel free to
 drop this patch.

  drivers/iommu/intel-iommu.c | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..edbe2866b515 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,19 +3582,23 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
  /* Check if the dev needs to go through non-identity map and unmap process.*/
  static bool iommu_need_mapping(struct device *dev)
  {
+   u64 dma_mask, required_dma_mask;
int ret;
  
  	if (iommu_dummy(dev))

return false;
  
-	ret = identity_mapping(dev);

-   if (ret) {
-   u64 dma_mask = *dev->dma_mask;
+   dma_mask = *dev->dma_mask;
+   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
+   dma_mask = dev->coherent_dma_mask;
+   required_dma_mask = dma_direct_get_required_mask(dev);
  
-		if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)

-   dma_mask = dev->coherent_dma_mask;
+   if (dev_is_pci(dev))
+   dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
  
-		if (dma_mask >= dma_direct_get_required_mask(dev))

+   ret = identity_mapping(dev);
+   if (ret) {
+   if (dma_mask >= required_dma_mask)
return false;
  
  		/*





[PATCH 1/1] iommu/amd: Fix the configuration of GCR3 table root pointer

2020-02-14 Thread Adrian Huang
From: Adrian Huang 

The mask for the SPA of the GCR3 table root pointer [51:31] covers
only 20 bits, but the field requires 21 bits (please see the AMD IOMMU
specification). This leads to a potential failure when bit 51 of the
SPA of the GCR3 table root pointer is 1.
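
The arithmetic, for reference: bits [51:31] span 51 - 31 + 1 = 21 bits,
so after the ">> 31" shift the mask must keep 21 bits:

	u64 gcr3 = 1ULL << 51;		/* hypothetical root, bit 51 set */
	(gcr3 >> 31) & 0xfffffULL;	/* old 20-bit mask: result 0, lost */
	(gcr3 >> 31) & 0x1fffffULL;	/* new 21-bit mask: bit preserved */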

Signed-off-by: Adrian Huang 
---
 drivers/iommu/amd_iommu_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f8d01d6b00da..ca8c4522045b 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -348,7 +348,7 @@
 
 #define DTE_GCR3_VAL_A(x)  (((x) >> 12) & 0x00007ULL)
 #define DTE_GCR3_VAL_B(x)  (((x) >> 15) & 0x0ffffULL)
-#define DTE_GCR3_VAL_C(x)  (((x) >> 31) & 0xfffffULL)
+#define DTE_GCR3_VAL_C(x)  (((x) >> 31) & 0x1fffffULL)
 
 #define DTE_GCR3_INDEX_A   0
 #define DTE_GCR3_INDEX_B   1
-- 
2.17.1



[PATCH v2] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-14 Thread Daniel Drake
From: Jon Derrick 

The PCI devices handled by intel-iommu may have a DMA requester on
another bus, such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was
missed earlier: if the VMD device (and hence subdevices too) are under
IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
creating an iommu DMA mapping, and fall back on dma_direct_map_page()
for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY
case is broken when intel_map_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for
the subdevices will never set dev->archdata.iommu. This is because
that function uses find_domain() to check if there is already an IOMMU
for the device, and find_domain() then defers to the real DMA device
which does have one. Thus dmar_insert_one_dev_info() returns without
assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. identity_mapping() returns
   false because dev->archdata.iommu is NULL, so this function
   returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
   the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
   A failure is hit and the entire operation is aborted, because this
   codepath is not intended to handle IDENTITY mappings:
   if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
   return NULL;

Fix this by using the real DMA device when checking if a mapping is
needed, while also considering the subdevice DMA mask.
The IDENTITY case will then directly fall back on dma_direct_map_page().

Reported-by: Daniel Drake 
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Daniel Drake 
---

Notes:
v2: switch to Jon's approach instead.

This problem was detected with a non-upstream patch
"PCI: Add Intel remapped NVMe device support"
(https://marc.info/?l=linux-ide=156015271021615=2)

This patch creates PCI devices a bit like VMD, and hence
I believe VMD would hit this class of problem for any cases where
the VMD device is in the IDENTITY domain. (I presume the reason this
bug was not seen already there is that it is in a DMA iommu domain).

However this hasn't actually been tested on VMD (don't have the hardware)
so if I've missed anything and/or it's not a real issue then feel free to
drop this patch.

 drivers/iommu/intel-iommu.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..edbe2866b515 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,19 +3582,23 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
+   u64 dma_mask, required_dma_mask;
int ret;
 
if (iommu_dummy(dev))
return false;
 
-   ret = identity_mapping(dev);
-   if (ret) {
-   u64 dma_mask = *dev->dma_mask;
+   dma_mask = *dev->dma_mask;
+   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
+   dma_mask = dev->coherent_dma_mask;
+   required_dma_mask = dma_direct_get_required_mask(dev);
 
-   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
-   dma_mask = dev->coherent_dma_mask;
+   if (dev_is_pci(dev))
+   dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
 
-   if (dma_mask >= dma_direct_get_required_mask(dev))
+   ret = identity_mapping(dev);
+   if (ret) {
+   if (dma_mask >= required_dma_mask)
return false;
 
/*
-- 
2.20.1
