Re: [PATCH] iommu/vt-d: Fix sysfs leak in alloc_domain()

2021-05-20 Thread Rolf Eike Beer
Am Donnerstag, 22. April 2021, 07:34:17 CEST schrieb Lu Baolu:
> Hi Rolf,
> 
> On 4/22/21 1:39 PM, Rolf Eike Beer wrote:
> > iommu_device_sysfs_add() is called before, so it has to be cleaned up on
> > subsequent errors.
> > 
> > Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
> > Cc: sta...@vger.kernel.org # 4.11.x
> > Signed-off-by: Rolf Eike Beer 
> 
> Acked-by: Lu Baolu 

Ping. Who is going to pick this up, please?

Eike
-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055

emlix - smart embedded open source



Re: [RFC PATCH v1 0/2] iommu/arm-smmu-v3: Add some parameter check in __arm_smmu_tlb_inv_range()

2021-05-20 Thread Kunkun Jiang

Hi Robin,

On 2021/5/19 18:01, Robin Murphy wrote:

On 2021-05-19 10:43, Kunkun Jiang wrote:

Hi all,

This set of patches fixes some errors that I hit when testing the SMMU nested
mode.


Test scenario description:
guest kernel: 4KB translation granule
host kernel: 16KB translation granule

errors:
1. encountered an endless loop in __arm_smmu_tlb_inv_range because
num_pages is 0
2. encountered CERROR_ILL because the fields of the TLB invalidation
command are as follows: TG = 2, NUM = 0, SCALE = 0, TTL = 0. The
combination is exactly the kind of reserved combination pointed
out in the SMMUv3 spec (pages 143-144, version D.a)

In my opinion, it is more appropriate to add a parameter check in
__arm_smmu_tlb_inv_range(), although these problems only appeared
when I tested the SMMU nested mode. What do you think?
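
A minimal sketch of the kind of checks meant here (the helper name and exact
placement are illustrative only, not taken from the actual patches; it relies
on the driver-internal struct arm_smmu_domain):

static bool arm_smmu_inv_range_sane(size_t size, size_t granule,
                                    struct arm_smmu_domain *smmu_domain)
{
        /* A zero-length (or sub-granule) request would loop forever below. */
        if (!size || size < granule)
                return false;

        /*
         * A granule the SMMU does not support encodes into a reserved
         * TG/NUM/SCALE/TTL combination and triggers CERROR_ILL.
         */
        if (!(granule & smmu_domain->domain.pgsize_bitmap))
                return false;

        return true;
}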


FWIW I think it would be better to fix the caller to not issue broken 
commands in the first place. The kernel shouldn't do so for itself 
(and definitely needs fixing if it ever does), so it sounds like the 
nesting implementation needs to do a bit more validation of what it's 
passing through.

Thanks for your reply.
I will report these errors to Eric and discuss how to fix them.

Thanks,
Kunkun Jiang


Robin.


This series include patches as below:
Patch 1:
- align the invalid range with leaf page size upwards when smmu
supports RIL

Patch 2:
- add a check to standardize granule size when smmu supports RIL

Kunkun Jiang (2):
   iommu/arm-smmu-v3: Align invalid range with leaf page size upwards
 when support RIL
   iommu/arm-smmu-v3: Standardize granule size when support RIL

  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +
  1 file changed, 9 insertions(+)


.




Re: [PATCH v15 07/12] iommu/smmuv3: Implement cache_invalidate

2021-05-20 Thread Kunkun Jiang

Hi Eric,

On 2021/4/11 19:12, Eric Auger wrote:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---
v14 -> v15:
- remove the redundant arm_smmu_cmdq_issue_sync(smmu)
   in IOMMU_INV_GRANU_ADDR case (Zenghui)
- if RIL is not supported by the host, make sure the granule_size
   that is passed by the userspace is supported or fix it
   (Chenxiang)

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
   granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +
  1 file changed, 89 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 56a301fbe75a..bfc112cc0d38 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2961,6 +2961,94 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t granule_size  = info->granule_size;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+   int tg;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   tg = __ffs(granule_size);
+   if (granule_size & ~(1 << tg))
+   return -EINVAL;
+   /*
+* When RIL is not supported, make sure the granule size that is
+* passed is supported. In RIL mode, this is enforced in
+* __arm_smmu_tlb_inv_range()
+*/
+   if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+   !(granule_size & smmu_domain->domain.pgsize_bitmap)) {
+   tg = __ffs(smmu_domain->domain.pgsize_bitmap);
+   granule_size = 1 << tg;
+   size = size >> tg;
+   }
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ granule_size, leaf,
+ info->archid, smmu_domain);

I encountered some errors when I tested the SMMU nested mode.

Test scenario description:
guest kernel: 4KB translation granule
host kernel: 16KB translation granule

errors:
1. encountered an endless loop in __arm_smmu_tlb_inv_range because
num_pages is 0
2. encountered CERROR_ILL because the fields of the TLB invalidation
command are as follows: TG = 2, NUM = 0, SCALE = 0, TTL = 0. The
combination is exactly the kind of reserved combination pointed
out in the SMMUv3 spec (pages 143-144, version D.a)

According to my analysis, we should do a bit more validation on the
'size' and 'granule_size' when SMMU support

Re: [RFC PATCH v3 7/8] vfio/type1: Add selective DMA faulting support

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:19 +0800
> Shenming Lu  wrote:
> 
>> Some devices only allow selective DMA faulting. Similar to the selective
>> dirty page tracking, the vendor driver can call vfio_pin_pages() to
>> indicate the non-faultable scope; we add a new struct vfio_range to
>> record it. Then, when the IOPF handler receives any page request out
>> of the scope, we can directly return with an invalid response.
> 
> Seems like this highlights a deficiency in the design, that the user
> can't specify mappings as iopf enabled or disabled.  Also, if the
> vendor driver has pinned pages within the range, shouldn't that prevent
> them from faulting in the first place?  Why do we need yet more
> tracking structures?  Pages pinned by the vendor driver need to count
> against the user's locked memory limits regardless of iopf.  Thanks,

Currently we only have a vfio_pfn struct to track the external pinned pages
(single page granularity), so I add a vfio_range struct for efficient lookup.
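
For illustration, the range tracking could take a shape like this (field and
helper names are invented here, not necessarily what the patch uses):

struct vfio_range {
        struct rb_node  node;
        dma_addr_t      base_iova;
        size_t          span;
};

/* Return true if @iova falls inside a pinned (non-faultable) range. */
static bool vfio_iova_is_pinned(struct rb_root *ranges, dma_addr_t iova)
{
        struct rb_node *n = ranges->rb_node;

        while (n) {
                struct vfio_range *r = rb_entry(n, struct vfio_range, node);

                if (iova < r->base_iova)
                        n = n->rb_left;
                else if (iova >= r->base_iova + r->span)
                        n = n->rb_right;
                else
                        return true;
        }
        return false;
}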

Yeah, by this patch, for the non-pinned scope, we can directly return INVALID,
but for the pinned(non-faultable) scope, tracking the pinned range doesn't seem
to help more...

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:18 +0800
> Shenming Lu  wrote:
> 
>> If IOPF is enabled for the VFIO container, there is no need to statically
>> pin and map the entire DMA range; we can do it on demand, and unmap
>> according to the IOPF mapped bitmap when removing the vfio_dma.
>>
>> Note that we still mark all pages dirty even if IOPF is enabled; we may
>> add IOPF-based fine-grained dirty tracking support in the future.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 38 +++--
>>  1 file changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 7df5711e743a..dcc93c3b258c 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -175,6 +175,7 @@ struct vfio_iopf_group {
>>  #define IOPF_MAPPED_BITMAP_GET(dma, i)  \
>>((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG]
>> \
>> >> ((i) % BITS_PER_LONG)) & 0x1)  
>> +#define IOPF_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
>>  
>>  #define WAITED 1
>>  
>> @@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>>   * already pinned and accounted. Accouting should be done if there is no
>>   * iommu capable domain in the container.
>>   */
>> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
>> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
>> +iommu->iopf_enabled;
>>  
>>  for (i = 0; i < npage; i++) {
>>  struct vfio_pfn *vpfn;
>> @@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>>  
>>  mutex_lock(&iommu->lock);
>>  
>> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
>> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
>> +iommu->iopf_enabled;
> 
> pin/unpin are actually still pinning pages, why does iopf exempt them
> from accounting?

If iopf_enabled is true, do_accounting will be true too, we will account
the external pinned pages?

> 
> 
>>  for (i = 0; i < npage; i++) {
>>  struct vfio_dma *dma;
>>  dma_addr_t iova;
>> @@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>>  if (!dma->size)
>>  return 0;
>>  
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
>>  return 0;
>>  
>>  /*
>> @@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>>  }
>>  }
>>  
>> +static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +{
>> +vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size);
>> +
>> +kfree(dma->iopf_mapped_bitmap);
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>> +if (iommu->iopf_enabled)
>> +vfio_dma_clean_iopf(iommu, dma);
>>  put_task_struct(dma->task);
>>  vfio_dma_bitmap_free(dma);
>>  if (dma->vaddr_invalid) {
>> @@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>>   * mark all pages dirty if any IOMMU capable device is not able
>>   * to report dirty pages and all pages are pinned and mapped.
>>   */
>> -if (iommu->num_non_pinned_groups && dma->iommu_mapped)
>> +if (iommu->num_non_pinned_groups &&
>> +(dma->iommu_mapped || iommu->iopf_enabled))
>>  bitmap_set(dma->bitmap, 0, nbits);
> 
> This seems like really poor integration of iopf into dirty page
> tracking.  I'd expect dirty logging to flush the mapped pages and
> write faults to mark pages dirty.  Shouldn't the fault handler also
> provide only the access faulted, so for example a read fault wouldn't
> mark the page dirty?
I just want to keep the behavior here as before: if IOPF is enabled, we
will still mark all pages dirty.

We can distinguish between write and read faults in the fault handler,
so there is a way to add IOPF-based fine-grained dirty tracking support...
But I am not sure whether there is a need to implement this; can we
consider it in the future?

> 
>>  
>>  if (shift) {
>> @@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  goto out_unlock;
>>  }
>>  
>> +if (iommu->iopf_enabled) {
>> +dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES(
>> +size >> PAGE_SHIFT), 
>> GFP_KERNEL);
>> +if (!dma->iopf_mapped_bitmap) {
>> +ret = -ENOMEM;
>> +kfree(dma);
>> +goto out_un

Re: [RFC PATCH v3 2/8] vfio/type1: Add a page fault handler

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:14 +0800
> Shenming Lu  wrote:
> 
>> VFIO manages the DMA mapping itself. To support IOPF (on-demand paging)
>> for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to
>> serve the reported page faults from the IOMMU driver.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 114 
>>  1 file changed, 114 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 45cbfd4879a5..ab0ff60ee207 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -101,6 +101,7 @@ struct vfio_dma {
>>  struct task_struct  *task;
>>  struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
>>  unsigned long   *bitmap;
>> +unsigned long   *iopf_mapped_bitmap;
>>  };
>>  
>>  struct vfio_batch {
>> @@ -141,6 +142,16 @@ struct vfio_regions {
>>  size_t len;
>>  };
>>  
>> +/* A global IOPF enabled group list */
>> +static struct rb_root iopf_group_list = RB_ROOT;
>> +static DEFINE_MUTEX(iopf_group_list_lock);
>> +
>> +struct vfio_iopf_group {
>> +struct rb_node  node;
>> +struct iommu_group  *iommu_group;
>> +struct vfio_iommu   *iommu;
>> +};
>> +
>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
>>  (!list_empty(&iommu->domain_list))
>>  
>> @@ -157,6 +168,10 @@ struct vfio_regions {
>>  #define DIRTY_BITMAP_PAGES_MAX   ((u64)INT_MAX)
>>  #define DIRTY_BITMAP_SIZE_MAX
>> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>>  
>> +#define IOPF_MAPPED_BITMAP_GET(dma, i)  \
>> +  ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \
>> +   >> ((i) % BITS_PER_LONG)) & 0x1)
> 
> 
> Can't we just use test_bit()?

Yeah, we can use it.
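
For reference, the test_bit() form being suggested could be as simple as the
following (the helper name is made up):

static inline bool iopf_mapped_bitmap_get(struct vfio_dma *dma,
                                          unsigned long pgoff)
{
        return test_bit(pgoff, dma->iopf_mapped_bitmap);
}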

> 
> 
>> +
>>  #define WAITED 1
>>  
>>  static int put_pfn(unsigned long pfn, int prot);
>> @@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>>  return ret;
>>  }
>>  
>> +/*
>> + * Helper functions for iopf_group_list
>> + */
>> +static struct vfio_iopf_group *
>> +vfio_find_iopf_group(struct iommu_group *iommu_group)
>> +{
>> +struct vfio_iopf_group *iopf_group;
>> +struct rb_node *node;
>> +
>> +mutex_lock(&iopf_group_list_lock);
>> +
>> +node = iopf_group_list.rb_node;
>> +
>> +while (node) {
>> +iopf_group = rb_entry(node, struct vfio_iopf_group, node);
>> +
>> +if (iommu_group < iopf_group->iommu_group)
>> +node = node->rb_left;
>> +else if (iommu_group > iopf_group->iommu_group)
>> +node = node->rb_right;
>> +else
>> +break;
>> +}
>> +
>> +mutex_unlock(&iopf_group_list_lock);
>> +return node ? iopf_group : NULL;
>> +}
> 
> This looks like a pretty heavy weight operation per DMA fault.
> 
> I'm also suspicious of this validity of this iopf_group after we've
> dropped the locking, the ordering of patches makes this very confusing.

My thought was to include the handling of DMA faults completely in the type1
backend by introducing the vfio_iopf_group struct. But it seems that introducing
a struct with an unknown lifecycle causes more problems...
I will use the path from vfio-core as in the v2 for simplicity and validity.

Sorry for the confusion, I will restructure the series later. :-)

> 
>> +
>>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  {
>>  struct mm_struct *mm;
>> @@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  return -EINVAL;
>>  }
>>  
>> +/* VFIO I/O Page Fault handler */
>> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
> 
> From the comment, this seems like the IOMMU fault handler (the
> construction of this series makes this difficult to follow) and
> eventually it handles more than DMA mapping, for example transferring
> faults to the device driver.  "dma_map_iopf" seems like a poorly scoped
> name.

Maybe just call it dev_fault_handler?

> 
>> +{
>> +struct device *dev = (struct device *)data;
>> +struct iommu_group *iommu_group;
>> +struct vfio_iopf_group *iopf_group;
>> +struct vfio_iommu *iommu;
>> +struct vfio_dma *dma;
>> +dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
>> +int access_flags = 0;
>> +unsigned long bit_offset, vaddr, pfn;
>> +int ret;
>> +enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>> +struct iommu_page_response resp = {0};
>> +
>> +if (fault->type != IOMMU_FAULT_PAGE_REQ)
>> +return -EOPNOTSUPP;
>> +
>> +iommu_group = iommu_group_get(dev);
>> +if (!iommu_group)
>> +return -ENODEV;
>> +
>> +iopf_group = vfio_find_iopf_grou

Re: [RFC PATCH v3 5/8] vfio/type1: VFIO_IOMMU_ENABLE_IOPF

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:17 +0800
> Shenming Lu  wrote:
> 
>> Since enabling IOPF for devices may lead to a slow ramp up of performance,
>> we add an ioctl VFIO_IOMMU_ENABLE_IOPF to make it configurable. And the
>> IOPF enabling of a VFIO device includes setting IOMMU_DEV_FEAT_IOPF and
>> registering the VFIO IOPF handler.
>>
>> Note that VFIO_IOMMU_DISABLE_IOPF is not supported since there may be
>> inflight page faults when disabling.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 223 +++-
>>  include/uapi/linux/vfio.h   |   6 +
>>  2 files changed, 226 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 01e296c6dc9e..7df5711e743a 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -71,6 +71,7 @@ struct vfio_iommu {
>>  struct rb_root  dma_list;
>>  struct blocking_notifier_head notifier;
>>  struct mmu_notifier mn;
>> +struct mm_struct*mm;
> 
> We currently have no requirement that a single mm is used for all
> container mappings.  Does enabling IOPF impose such a requirement?
> Shouldn't MAP/UNMAP enforce that?

Did you mean that there is a possibility that each vfio_dma in a
container has a different mm? If so, could we register one MMU
notifier for each vfio_dma in the MAP ioctl?

> 
>>  unsigned intdma_avail;
>>  unsigned intvaddr_invalid_count;
>>  uint64_tpgsize_bitmap;
>> @@ -81,6 +82,7 @@ struct vfio_iommu {
>>  booldirty_page_tracking;
>>  boolpinned_page_dirty_scope;
>>  boolcontainer_open;
>> +booliopf_enabled;
>>  };
>>  
>>  struct vfio_domain {
>> @@ -461,6 +463,38 @@ vfio_find_iopf_group(struct iommu_group *iommu_group)
>>  return node ? iopf_group : NULL;
>>  }
>>  
>> +static void vfio_link_iopf_group(struct vfio_iopf_group *new)
>> +{
>> +struct rb_node **link, *parent = NULL;
>> +struct vfio_iopf_group *iopf_group;
>> +
>> +mutex_lock(&iopf_group_list_lock);
>> +
>> +link = &iopf_group_list.rb_node;
>> +
>> +while (*link) {
>> +parent = *link;
>> +iopf_group = rb_entry(parent, struct vfio_iopf_group, node);
>> +
>> +if (new->iommu_group < iopf_group->iommu_group)
>> +link = &(*link)->rb_left;
>> +else
>> +link = &(*link)->rb_right;
>> +}
>> +
>> +rb_link_node(&new->node, parent, link);
>> +rb_insert_color(&new->node, &iopf_group_list);
>> +
>> +mutex_unlock(&iopf_group_list_lock);
>> +}
>> +
>> +static void vfio_unlink_iopf_group(struct vfio_iopf_group *old)
>> +{
>> +mutex_lock(&iopf_group_list_lock);
>> +rb_erase(&old->node, &iopf_group_list);
>> +mutex_unlock(&iopf_group_list_lock);
>> +}
>> +
>>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  {
>>  struct mm_struct *mm;
>> @@ -2363,6 +2397,68 @@ static void vfio_iommu_iova_insert_copy(struct 
>> vfio_iommu *iommu,
>>  list_splice_tail(iova_copy, iova);
>>  }
>>  
>> +static int vfio_dev_domian_nested(struct device *dev, int *nested)
> 
> 
> s/domian/domain/
> 
> 
>> +{
>> +struct iommu_domain *domain;
>> +
>> +domain = iommu_get_domain_for_dev(dev);
>> +if (!domain)
>> +return -ENODEV;
>> +
>> +return iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, nested);
> 
> 
> Wouldn't this be easier to use if it returned bool, false on -errno?

Make sense.
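
A bool-returning variant along those lines might look like this (sketch only):

static bool vfio_dev_domain_nested(struct device *dev)
{
        struct iommu_domain *domain;
        int nested;

        domain = iommu_get_domain_for_dev(dev);
        if (!domain)
                return false;

        if (iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nested))
                return false;

        return nested;
}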

> 
> 
>> +}
>> +
>> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data);
>> +
>> +static int dev_enable_iopf(struct device *dev, void *data)
>> +{
>> +int *enabled_dev_cnt = data;
>> +int nested;
>> +u32 flags;
>> +int ret;
>> +
>> +ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
>> +if (ret)
>> +return ret;
>> +
>> +ret = vfio_dev_domian_nested(dev, &nested);
>> +if (ret)
>> +goto out_disable;
>> +
>> +if (nested)
>> +flags = FAULT_REPORT_NESTED_L2;
>> +else
>> +flags = FAULT_REPORT_FLAT;
>> +
>> +ret = iommu_register_device_fault_handler(dev,
>> +vfio_iommu_type1_dma_map_iopf, flags, dev);
>> +if (ret)
>> +goto out_disable;
>> +
>> +(*enabled_dev_cnt)++;
>> +return 0;
>> +
>> +out_disable:
>> +iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
>> +return ret;
>> +}
>> +
>> +static int dev_disable_iopf(struct device *dev, void *data)
>> +{
>> +int *enabled_dev_cnt = data;
>> +
>> +if (enabled_dev_cnt && *enabled_dev_cnt <= 0)
>> +return -1;
> 
> 
> Use an errno.>
> 
>> +
>> +WARN_ON(iommu_unregister_device_fault_handler(dev));
>> +WARN_ON(iommu_dev_disable_fea

Re: [RFC PATCH v3 4/8] vfio/type1: Pre-map more pages than requested in the IOPF handling

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:16 +0800
> Shenming Lu  wrote:
> 
>> To optimize for fewer page fault handlings, we can pre-map more pages
>> than requested at once.
>>
>> Note that IOPF_PREMAP_LEN is just an arbitrary value for now, which we
>> could try further tuning.
> 
> I'd prefer that the series introduced full end-to-end functionality
> before trying to improve performance.  The pre-map value seems
> arbitrary here and as noted in the previous patch, the IOMMU API does
> not guarantee unmaps of ranges smaller than the original mapping.  This
> would need to map with single page granularity in order to guarantee
> page granularity at the mmu notifier when the IOMMU supports
> superpages.  Thanks,

Yeah, I will drop this patch in the current stage.

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:15 +0800
> Shenming Lu  wrote:
> 
>> To avoid pinning pages when they are mapped in IOMMU page tables, we
>> add an MMU notifier to tell the addresses which are no longer valid
>> and try to unmap them.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 112 +++-
>>  1 file changed, 109 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index ab0ff60ee207..1cb9d1f2717b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -69,6 +70,7 @@ struct vfio_iommu {
>>  struct mutexlock;
>>  struct rb_root  dma_list;
>>  struct blocking_notifier_head notifier;
>> +struct mmu_notifier mn;
>>  unsigned intdma_avail;
>>  unsigned intvaddr_invalid_count;
>>  uint64_tpgsize_bitmap;
>> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>>  return unlocked;
>>  }
>>  
>> +/* Unmap the IOPF mapped pages in the specified range. */
>> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>> +struct vfio_dma *dma,
>> +dma_addr_t start, dma_addr_t end)
>> +{
>> +struct iommu_iotlb_gather *gathers;
>> +struct vfio_domain *d;
>> +int i, num_domains = 0;
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +num_domains++;
>> +
>> +gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
>> +if (gathers) {
>> +for (i = 0; i < num_domains; i++)
>> +iommu_iotlb_gather_init(&gathers[i]);
>> +}
> 
> 
> If we're always serialized on iommu->lock, would it make sense to have
> gathers pre-allocated on the vfio_iommu object?

Yeah, we can do it.

> 
>> +
>> +while (start < end) {
>> +unsigned long bit_offset;
>> +size_t len;
>> +
>> +bit_offset = (start - dma->iova) >> PAGE_SHIFT;
>> +
>> +for (len = 0; start + len < end; len += PAGE_SIZE) {
>> +if (!IOPF_MAPPED_BITMAP_GET(dma,
>> +bit_offset + (len >> PAGE_SHIFT)))
>> +break;
> 
> 
> There are bitmap helpers for this, find_first_bit(),
> find_next_zero_bit().

Thanks for the hint. :-)
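
A sketch of the same length computation with those bitmap helpers (variable
names follow the quoted loop above; this is not the actual respin):

unsigned long last = bit_offset + ((end - start) >> PAGE_SHIFT);
unsigned long next_clear = find_next_zero_bit(dma->iopf_mapped_bitmap,
                                              last, bit_offset);
size_t len = (next_clear - bit_offset) << PAGE_SHIFT;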

> 
> 
>> +}
>> +
>> +if (len) {
>> +i = 0;
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +size_t unmapped;
>> +
>> +if (gathers)
>> +unmapped = iommu_unmap_fast(d->domain,
>> +start, len,
>> +
>> &gathers[i++]);
>> +else
>> +unmapped = iommu_unmap(d->domain,
>> +   start, len);
>> +
>> +if (WARN_ON(unmapped != len))
> 
> The IOMMU API does not guarantee arbitrary unmap sizes, this will
> trigger and this exit path is wrong.  If we've already unmapped the
> IOMMU, shouldn't we proceed with @unmapped rather than @len so the
> device can re-fault the extra mappings?  Otherwise the IOMMU state
> doesn't match the iopf bitmap state.

OK, I will correct it.

And can we assume that the @unmapped values (returned from iommu_unmap)
of all domains are the same (since all domains share the same 
iopf_mapped_bitmap)?
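
One possible shape for that correction, leaving the multi-domain question
above aside (sketch only):

size_t unmapped = iommu_unmap(d->domain, start, len);

if (WARN_ON(!unmapped))
        goto out;

/*
 * Keep the bitmap in sync with what was actually unmapped so the
 * device can simply re-fault anything that is still mapped.
 */
bitmap_clear(dma->iopf_mapped_bitmap, bit_offset, unmapped >> PAGE_SHIFT);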

> 
>> +goto out;
>> +}
>> +
>> +bitmap_clear(dma->iopf_mapped_bitmap,
>> + bit_offset, len >> PAGE_SHIFT);
>> +
>> +cond_resched();
>> +}
>> +
>> +start += (len + PAGE_SIZE);
>> +}
>> +
>> +out:
>> +if (gathers) {
>> +i = 0;
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +iommu_iotlb_sync(d->domain, &gathers[i++]);
>> +
>> +kfree(gathers);
>> +}
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>>  
>>  vaddr = iova - dma->iova + dma->vaddr;
>>  
>> -if (vfio_pin_page_external(dma, vaddr, &pfn, true))
>> +if (vfio_pin_page_external(dma, vaddr, &pfn, false))
>>  goto out_invalid;
>>  
>>   

Re: [RFC PATCH v3 1/8] iommu: Evolve the device fault reporting framework

2021-05-20 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:13 +0800
> Shenming Lu  wrote:
> 
>> This patch follows the discussion here:
>>
>> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
>>
>> Besides SVA/vSVA, users such as VFIO may also enable (2nd level) IOPF to remove
>> the pinning restriction. In order to better support more scenarios of using
>> device faults, we extend iommu_register_fault_handler() with flags and
>> introduce FAULT_REPORT_ to describe the device fault reporting capability
>> under a specific configuration.
>>
>> Note that we don't further distinguish recoverable and unrecoverable faults
>> by flags in the fault reporting cap; having PAGE_FAULT_REPORT_ +
>> UNRECOV_FAULT_REPORT_ does not seem like a clean way.
>>
>> In addition, still take VFIO as an example, in nested mode, the 1st level
>> and 2nd level fault reporting may be configured separately and currently
>> each device can only register one iommu dev fault handler, so we add a
>> handler update interface for this.
> 
> 
> IIUC, you're introducing flags for the fault handler callout, which
> essentially allows the IOMMU layer to filter which types of faults the
> handler can receive.  You then need an update function to modify those
> flags.  Why can't the handler itself perform this filtering?  For
> instance in your vfio example, the handler registered by the type1
> backend could itself return fault until the fault transfer path to the
> device driver is established.  Thanks,

As discussed in [1]:

In nested IOPF, we have to figure out whether a fault comes from L1 or L2.
An SMMU stall event comes with this information, but a PRI page request doesn't.
The IOMMU driver can walk the page tables to find out the level information.
If the walk terminates at L1, further inject the fault to the guest. Otherwise fix the
fault at L2 in VFIO. It's not efficient compared to hardware-provided info.

And in the pinned case, if VFIO can tell the IOMMU driver that no L2 fault is
expected, there is no need to walk the page tables in the IOMMU driver?

[1] 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210108145217.2254447-4-jean-phili...@linaro.org/

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 0/8] Add IOPF support for VFIO passthrough

2021-05-20 Thread Shenming Lu
Hi Alex,

On 2021/5/19 2:57, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:12 +0800
> Shenming Lu  wrote:
> 
>> Hi,
>>
>> Requesting for your comments and suggestions. :-)
>>
>> The static pinning and mapping problem in VFIO and possible solutions
>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>> Page Fault support for VFIO devices. Different from those relatively
>> complicated software approaches such as presenting a vIOMMU that provides
>> the DMA buffer information (might include para-virtualized optimizations),
>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>> the IOMMU driver has already been implemented in SVA [3]. So we add IOPF
>> support for VFIO passthrough based on the IOPF part of SVA in this series.
> 
> The SVA proposals are being reworked to make use of a new IOASID
> object, it's not clear to me that this shouldn't also make use of that
> work as it does a significant expansion of the type1 IOMMU with fault
> handlers that would duplicate new work using that new model.

It seems that the IOASID extension for guest SVA would not affect this series;
will we do any host-guest IOASID translation in the VFIO fault handler?

> 
>> We have measured its performance with UADK [4] (passthrough an accelerator
>> to a VM(1U16G)) on Hisilicon Kunpeng920 board (and compared with host SVA):
>>
>> Run hisi_sec_test...
>>  - with varying sending times and message lengths
>>  - with/without IOPF enabled (speed slowdown)
>>
>> when msg_len = 1MB (and PREMAP_LEN (in Patch 4) = 1):
>>                  slowdown (num of faults)
>>  times   VFIO IOPF       host SVA
>>  1       63.4% (518)     82.8% (512)
>>  100     22.9% (1058)    47.9% (1024)
>>  1000    2.6% (1071)     8.5% (1024)
>>
>> when msg_len = 10MB (and PREMAP_LEN = 512):
>>                  slowdown (num of faults)
>>  times   VFIO IOPF
>>  1       32.6% (13)
>>  100     3.5% (26)
>>  1000    1.6% (26)
> 
> It seems like this is only an example that you can make a benchmark
> show anything you want.  The best results would be to pre-map
> everything, which is what we have without this series.  What is an
> acceptable overhead to incur to avoid page pinning?  What if userspace
> had more fine grained control over which mappings were available for
> faulting and which were statically mapped?  I don't really see what
> sense the pre-mapping range makes.  If I assume the user is QEMU in a
> non-vIOMMU configuration, pre-mapping the beginning of each RAM section
> doesn't make any logical sense relative to device DMA.

As you said in Patch 4, we can introduce full end-to-end functionality
before trying to improve performance, and I will drop the pre-mapping patch
in the current stage...

Is there a need for userspace to have more fine-grained control over which
mappings are available for faulting? If so, we may evolve the MAP ioctl
to support specifying the faulting range, for example as sketched below.
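
Purely as an illustration of that idea, a faulting-range flag on the existing
MAP ioctl might look like the snippet below; the flag name, its bit value and
the overall approach are hypothetical, nothing like this exists today:

#define VFIO_DMA_MAP_FLAG_IOPF  (1 << 3)        /* hypothetical flag */

struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
                 VFIO_DMA_MAP_FLAG_IOPF,        /* this range may be faulted in */
        .vaddr = (__u64)(uintptr_t)buf,         /* buf/iova/size: caller's values */
        .iova  = iova,
        .size  = size,
};

ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);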

As for the overhead of IOPF, it is unavoidable if enabling on-demand paging
(and page faults occur almost only when first accessing), and it seems that
there is a large optimization space compared to CPU page faulting.

Thanks,
Shenming

> 
> Comments per patch to follow.  Thanks,
> 
> Alex
> 
> 
>> History:
>>
>> v2 -> v3
>>  - Nit fixes.
>>  - No reason to disable reporting the unrecoverable faults. (baolu)
>>  - Maintain a global IOPF enabled group list.
>>  - Split the pre-mapping optimization to be a separate patch.
>>  - Add selective faulting support (use vfio_pin_pages to indicate the
>>non-faultable scope and add a new struct vfio_range to record it,
>>untested). (Kevin)
>>
>> v1 -> v2
>>  - Numerous improvements following the suggestions. Thanks a lot to all
>>of you.
>>
>> Note that PRI is not supported at the moment since there is no hardware.
>>
>> Links:
>> [1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS,
>> 2016.
>> [2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer 
>> Tracking
>> for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020.
>> [3] 
>> https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210401154718.307519-1-jean-phili...@linaro.org/
>> [4] https://github.com/Linaro/uadk
>>
>> Thanks,
>> Shenming
>>
>>
>> Shenming Lu (8):
>>   iommu: Evolve the device fault reporting framework
>>   vfio/type1: Add a page fault handler
>>   vfio/type1: Add an MMU notifier to avoid pinning
>>   vfio/type1: Pre-map more pages than requested in the IOPF handling
>>   vfio/type1: VFIO_IOMMU_ENABLE_IOPF
>>   vfio/type1: No need to statically pin and map if IOPF enabled
>>   vfio/type1: Add selective DMA faulting support
>>   vfio: Add nested IOPF support
>>
>>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |3 +-
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   18 +-
>>  drivers/iommu/iommu.c |   56 +-
>>  drivers/vf

[PATCH] dma-iommu: Add a check to avoid dereference null pointer in function iommu_dma_map_sg()

2021-05-20 Thread chenxiang
From: Xiang Chen 

The issue is reported by the tool TscanCode: it is possible to dereference
a null pointer when prev is NULL, which is its initial value.

Signed-off-by: Xiang Chen 
---
 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4cb63b2..88a4f34 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1042,7 +1042,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 *   iova_len == 0, thus we cannot dereference prev the first
 *   time through here (i.e. before it has a meaningful value).
 */
-   if (pad_len && pad_len < s_length - 1) {
+   if (prev && pad_len && pad_len < s_length - 1) {
prev->length += pad_len;
iova_len += pad_len;
}
-- 
2.8.1



[PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-05-20 Thread Wang Xingang
From: Xingang Wang 

When booting with devicetree, pci_request_acs() is called after the
enumeration and initialization of PCI devices, thus ACS is not
enabled. ACS should be enabled when an IOMMU is detected for the
PCI host bridge, so add a check for the IOMMU before probing the PCI host
and call pci_request_acs() to make sure ACS will be enabled when
enumerating PCI devices.

Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage")
Signed-off-by: Xingang Wang 
---
 drivers/iommu/of_iommu.c | 1 -
 drivers/pci/of.c | 8 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..54a14da242cc 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
.np = master_np,
};
 
-   pci_request_acs();
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, &info);
} else {
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index da5b414d585a..2313c3f848b0 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
 
 int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
 {
-   if (!dev->of_node)
+   struct device_node *node = dev->of_node;
+
+   if (!node)
return 0;
 
+   /* Detect IOMMU and make sure ACS will be enabled */
+   if (of_property_read_bool(node, "iommu-map"))
+   pci_request_acs();
+
bridge->swizzle_irq = pci_common_swizzle;
bridge->map_irq = of_irq_parse_and_map_pci;
 
-- 
2.19.1



Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

2021-05-20 Thread Rob Herring
On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Reserved memory region phandle references can be accompanied by a
> specifier that provides additional information about how that specific
> reference should be treated.
> 
> One use-case is to mark a memory region as needing an identity mapping
> in the system's IOMMU for the device that references the region. This is
> needed for example when the bootloader has set up hardware (such as a
> display controller) to actively access a memory region (e.g. a boot
> splash screen framebuffer) during boot. The operating system can use the
> identity mapping flag from the specifier to make sure an IOMMU identity
> mapping is set up for the framebuffer before IOMMU translations are
> enabled for the display controller.
> 
> Signed-off-by: Thierry Reding 
> ---
>  .../reserved-memory/reserved-memory.txt   | 21 +++
>  include/dt-bindings/reserved-memory.h |  8 +++
>  2 files changed, 29 insertions(+)
>  create mode 100644 include/dt-bindings/reserved-memory.h

Sorry for being slow on this. I have 2 concerns.

First, this creates an ABI issue. A DT with cells in 'memory-region' 
will not be understood by an existing OS. I'm less concerned about this 
if we address that with a stable fix. (Though I'm pretty sure we've 
naively added #?-cells in the past ignoring this issue.)

Second, it could be the bootloader setting up the reserved region. If a 
node already has 'memory-region', then adding more regions is more 
complicated compared to adding new properties. And defining what each 
memory-region entry is or how many in schemas is impossible.

Both could be addressed with a new property. Perhaps something like 
'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is 
appropriate given this is entirely because of the IOMMU being in the 
mix. I might feel differently if we had other uses for cells, but I 
don't really see it in this case. 

Rob


Re: i915 and swiotlb_max_segment

2021-05-20 Thread Konrad Rzeszutek Wilk
On Mon, May 10, 2021 at 05:25:25PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> swiotlb_max_segment is a rather strange "API" export by swiotlb.c,
> and i915 is the only (remaining) user.
> 
> swiotlb_max_segment returns 0 if swiotlb is not in use, 1 if
> SWIOTLB_FORCE is set or swiotlb-zen is set, and the swiotlb segment
> size when swiotlb is otherwise enabled.
> 
> i915 then uses it to:
> 
>  a) decided on the max order in i915_gem_object_get_pages_internal
>  b) decide on a max segment size in i915_sg_segment_size
> 
> for a) it really seems i915 should switch to dma_alloc_noncoherent
> or dma_alloc_noncontigous ASAP instead of using alloc_page and
> streaming DMA mappings.  Any chance I could trick one of the i915
> maintaines into doing just that given that the callchain is not
> exactly trivial?
> 
> For b) I'm not sure swiotlb and i915 really agree on the meaning
> of the value.  swiotlb_set_max_segment basically returns the entire
> size of the swiotlb buffer, while i915 seems to use it to limit
> the size each scatterlist entry.  It seems like dma_max_mapping_size
> might be the best value to use here.

Yes. The background behind that was SWIOTLB would fail because well, the
size of the sg was too large. And some way to limit it to max size
was needed - the dma_max_mapping_size "should" be just fine.
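
For what it's worth, the i915 side could then boil down to something like the
following sketch (not an actual i915 patch; the real helper has a different
signature):

/* Bound each scatterlist segment by what the DMA layer can actually map. */
static unsigned int i915_max_segment_size(struct device *dev)
{
        return min_t(size_t, dma_max_mapping_size(dev), UINT_MAX);
}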

> 
> Once that is fixed I'd like to kill off swiotlb_max_segment as it is
> a horribly confusing API.


Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook

2021-05-20 Thread Jason Gunthorpe
On Thu, May 20, 2021 at 03:13:55PM +0100, Robin Murphy wrote:

> By "mdev-like" I mean it's very similar in shape to the general SIOV-style
> mediated device concept - i.e. a physical device with an awareness of
> operating on multiple contexts at once, using a Substream ID/PASID for each
> one - but instead of exposing control of the contexts to anyone else, they
> remain hidden behind the kernel driver which already has its own abstracted
> uAPI, so overall it ends up as more just internal housekeeping than any
> actual mediation. We were looking at the mdev code for inspiration, but
> directly using it was never the plan.

Well:
 - Who maps memory into the IOASID (ie the specific sub stream id)?
 - What memory must be mapped?
 - Who triggers DMA to this memory?
 
> The driver simply needs to keep track of the domains and PASIDs -
> when a process submits some work, it can look up the relevant
> domain, iommu_map() the user pages to the right addresses, dma_map()
> them for coherency, then poke in the PASID as part of scheduling the
> work on the physical device.

If you are doing stuff like this then the /dev/ioasid is what you
actually want. The userprocess can create its own IOASID, program the
io page tables for that IOASID to point to pages as it wants and then
just hand over a fully instantiated io page table to the device
driver.

What you are describing is the literal use case of /dev/ioasid - a
clean seperation of managing the IOMMU related parts through
/dev/ioasid and the device driver itself is only concerned with
generating device DMA that has the proper PASID/substream tag.

The entire point is to not duplicate all the iommu code you are
describing having written into every driver that just wants an IOASID.

In particular, you are talking about having a substream capable device
and driver but your driver's uAPI is so limited it can't address the
full range of substream configurations:

 - A substream pointing at an SVA
 - A substream pointing at an IO page table nested under another
 - A substream pointing at an IOMMU page table shared by many users

And more. Which is bad.

> > We already talked about this on the "how to use PASID from the kernel"
> > thread.
> 
> Do you have a pointer to the right thread so I can catch up? It's not the
> easiest thing to search for on lore amongst all the other PASID-related
> business :(

Somewhere in here:

http://lore.kernel.org/r/20210517143758.gp1002...@nvidia.com

> FWIW my non-SVA view is that a PASID is merely an index into a set of
> iommu_domains, and in that context it doesn't even really matter *who*
> allocates them, only that the device driver and IOMMU driver are in sync :)

Right, this is where /dev/ioasid is going.

However it gets worked out at the kAPI level in the iommu layer the
things you asked for are intended to be solved, and lots more.

Jason


Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook

2021-05-20 Thread Robin Murphy

On 2021-05-20 00:24, Jason Gunthorpe wrote:

On Wed, May 19, 2021 at 11:12:46PM +, Tian, Kevin wrote:

From: Jason Gunthorpe 
Sent: Thursday, May 20, 2021 2:07 AM

On Wed, May 19, 2021 at 04:23:21PM +0100, Robin Murphy wrote:

On 2021-05-17 16:35, Joerg Roedel wrote:

On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote:

Well, I'm sorry, but there is a huge other thread talking about the
IOASID design in great detail and why this is all needed. Jumping into
this thread without context and basically rejecting all the
conclusions that were reached over the last several weeks is really
not helpful - especially since your objection is not technical.

I think you should wait for Intel to put together the /dev/ioasid uAPI
proposal and the example use cases it should address then you can give
feedback there, with proper context.


Yes, I think the next step is that someone who read the whole thread
writes up the conclusions and a rough /dev/ioasid API proposal, also
mentioning the use-cases it addresses. Based on that we can discuss the
implications this needs to have for IOMMU-API and code.

  From the use-cases I know the mdev concept is just fine. But if there is
a more generic one we can talk about it.


Just to add another voice here, I have some colleagues working on drivers
where they want to use SMMU Substream IDs for a single hardware block

to

operate on multiple iommu_domains managed entirely within the
kernel.


If it is entirely within the kernel I'm confused how mdev gets
involved? mdev is only for vfio which is userspace.


By "mdev-like" I mean it's very similar in shape to the general 
SIOV-style mediated device concept - i.e. a physical device with an 
awareness of operating on multiple contexts at once, using a Substream 
ID/PASID for each one - but instead of exposing control of the contexts 
to anyone else, they remain hidden behind the kernel driver which 
already has its own abstracted uAPI, so overall it ends up as more just 
internal housekeeping than any actual mediation. We were looking at the 
mdev code for inspiration, but directly using it was never the plan.



Just add some background. aux domain is used to support mdev but they
are not tied together.


[ yes, technically my comments are relevant to patch #4, but the 
discussion was here, so... :) ]



Literally aux domain just implies that there could be
multiple domains attached to a device then when one of them becomes
the primary all the remaining are deemed as auxiliary. From this angle it
doesn't matter whether the requirement of multiple domains come from
user or kernel.


You can't entirely use aux domain from inside the kernel because you
can't compose it with the DMA API unless you also attach it to some
struct device, and where will the struct device come from?


DMA mapping would still be done using the physical device - where this 
model diverges from mdev is that it doesn't need to fake up a struct 
device to represent each context since they aren't exposed to anyone 
else. Assume the driver already has some kind of token to represent each 
client process, so it just allocates an iommu_domain for a client 
context and does an iommu_aux_attach_dev() to hook it up to some PASID 
(which again nobody else ever sees). The driver simply needs to keep 
track of the domains and PASIDs - when a process submits some work, it 
can look up the relevant domain, iommu_map() the user pages to the right 
addresses, dma_map() them for coherency, then poke in the PASID as part 
of scheduling the work on the physical device.
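
Condensed into code, that flow might look roughly like this (error unwinding
and the driver's own bookkeeping omitted; struct my_client stands in for the
driver's per-process token):

static int my_client_bind(struct device *dev, struct my_client *client)
{
        client->domain = iommu_domain_alloc(dev->bus);
        if (!client->domain)
                return -ENOMEM;

        /* Attach as an auxiliary domain; the IOMMU driver picks the PASID. */
        if (iommu_aux_attach_device(client->domain, dev))
                return -ENODEV;

        client->pasid = iommu_aux_get_pasid(client->domain, dev);

        /*
         * Per work submission: iommu_map() the user pages into
         * client->domain, dma-map them for coherency, and tag the job
         * with client->pasid when handing it to the hardware.
         */
        return 0;
}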



We already talked about this on the "how to use PASID from the kernel"
thread.


Do you have a pointer to the right thread so I can catch up? It's not 
the easiest thing to search for on lore amongst all the other 
PASID-related business :(



If Robin just wants to use a stream ID from a kernel driver then that
API to make a PASID == RID seems like a better answer for kernel DMA
than aux domains is.


No, that's not the model - the device has a single Stream ID (RID), and 
it wants multiple Substream IDs (PASIDs) hanging off that for distinct 
client contexts; it can still generate non-PASID traffic for stuff like 
loading its firmware (the regular iommu_domain might be 
explicitly-managed or might be automatic via iommu-dma - it doesn’t 
really matter in this context). Aux domains really were a perfect fit 
conceptually, even if the edges were a bit rough.


Now, much as I’d like a stable upstream solution, I can't argue based on 
this particular driver, since the PASID functionality is still in 
development, and there seems little likelihood of it being upstreamed 
either way (the driver belongs to a product team rather than the OSS 
group I'm part of; I'm just helping them with the SMMU angle). If 
designing something around aux domains is a dead-end then we (Arm) will 
probably just prototype our thing using downstream patches to the SMMU 
driver for now. However given the clear overlap wit

Re: [PATCH v3] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-05-20 Thread Rob Herring
On Thu, May 20, 2021 at 2:28 AM Wang Xingang  wrote:
>
> From: Xingang Wang 
>
> When booting with devicetree, pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus ACS is not
> enabled. ACS should be enabled when an IOMMU is detected for the
> PCI host bridge, so add a check for the IOMMU before probing the PCI host
> and call pci_request_acs() to make sure ACS will be enabled when
> enumerating PCI devices.
>
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c |  1 -
>  drivers/pci/controller/pci-host-common.c | 17 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
> .np = master_np,
> };
>
> -   pci_request_acs();
> err = pci_for_each_dma_alias(to_pci_dev(dev),
>  of_pci_iommu_init, &info);
> } else {
> diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> index d3924a44db02..5904ad0bd9ae 100644
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c

This file is generally only for ECAM compliant hosts. Are those the
only hosts we need to support this? From the looks of dts files with
iommu-map, that would be dropping support in lots of cases.

Perhaps in devm_of_pci_bridge_init() or one of the functions it calls
is the better place.

> @@ -49,6 +49,21 @@ static struct pci_config_window *gen_pci_init(struct device *dev,
> return cfg;
>  }
>
> +static void pci_host_enable_acs(struct pci_host_bridge *bridge)
> +{
> +   struct device_node *np = bridge->dev.parent->of_node;
> +   static bool acs_enabled;
> +
> +   if (!np || acs_enabled)
> +   return;
> +
> +   /* Detect IOMMU and make sure ACS will be enabled */
> +   if (of_property_read_bool(np, "iommu-map")) {
> +   acs_enabled = true;
> +   pci_request_acs();

Given this function just sets a variable, I don't think you need the
static acs_enabled here.

> +   }
> +}
> +
>  int pci_host_common_probe(struct platform_device *pdev)
>  {
> struct device *dev = &pdev->dev;
> @@ -81,6 +96,8 @@ int pci_host_common_probe(struct platform_device *pdev)
> bridge->ops = (struct pci_ops *)&ops->pci_ops;
> bridge->msi_domain = true;
>
> +   pci_host_enable_acs(bridge);
> +
> return pci_host_probe(bridge);
>  }
>  EXPORT_SYMBOL_GPL(pci_host_common_probe);
> --
> 2.19.1
>


Re: [PATCH 3/3] lib/vsprintf: Use pr_crit() instead of long fancy messages

2021-05-20 Thread Petr Mladek via iommu
On Mon 2021-05-17 08:21:12, Geert Uytterhoeven wrote:
> On Wed, Mar 31, 2021 at 11:59 AM Geert Uytterhoeven
>  wrote:
> > While long fancy messages have a higher probability of being seen than
> > small messages, they may scroll off the screen fast, if visible at all,
> > and may still be missed.  In addition, they increase boot time and
> > kernel size.
> >
> > The correct mechanism to increase importance of a kernel message is not
> > to draw fancy boxes with more text, but to shout louder, i.e. increase
> > the message's reporting level.  Making sure the administrator of the
> > system is aware of such a message is a system policy, and is the
> > responsibility of a user-space log daemon.
> >
> > Fix this by increasing the reporting level from KERN_WARNING to
> > KERN_CRIT, and removing irrelevant text and graphics.
> >
> > This reduces kernel size by ca. 0.5 KiB.
> >
> > Fixes: 5ead723a20e0447b ("lib/vsprintf: no_hash_pointers prints all 
> > addresses as unhashed")
> > Signed-off-by: Geert Uytterhoeven 
> 
> No comments?
> Unlike the cases handled by the other two patches in this series,
> this one cannot be configured out.

IMHO, the best solution would be to create a generic API for
eye-catching messages.

I am sure that WARN() is misused in many locations all over the kernel
because people just wanted an eye-catching message; for example, see
https://lore.kernel.org/r/2149df3f542d25ce15d049e81d6188bb7198478c.ca...@fi.rohmeurope.com

It might be a win-win solution.

Best Regards,
Petr


Re: Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace

2021-05-20 Thread Yongji Xie
On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin  wrote:
>
> On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> > This series introduces a framework, which can be used to implement
> > vDPA Devices in a userspace program. The work consists of two parts:
> > control path forwarding and data path offloading.
> >
> > In the control path, the VDUSE driver will make use of a message
> > mechanism to forward the config operations from the vdpa bus driver
> > to userspace. Userspace can use read()/write() to receive/reply to
> > those control messages.
> >
> > In the data path, the core is mapping dma buffer into VDUSE
> > daemon's address space, which can be implemented in different ways
> > depending on the vdpa bus to which the vDPA device is attached.
> >
> > In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with
> > a bounce-buffering mechanism to achieve that. And in the vhost-vdpa case, the dma
> > buffer resides in a userspace memory region which can be shared with the
> > VDUSE userspace process via transferring the shmfd.
> >
> > The details and our use case are shown below:
> >
> > [Architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
> >  virtio-vdpa driver) and a QEMU VM (/dev/vhost-vdpa-x -> vhost-vdpa
> >  driver) both sit on the vdpa bus served by the vduse driver; the
> >  VDUSE daemon provides the vDPA device emulation and a block driver
> >  that reaches the Remote Storages over TCP/IP through the NIC.]
> >
> > We make use of it to implement a block device connecting to
> > our distributed storage, which can be used both in containers and
> > VMs. Thus, we can have a unified technology stack in these two cases.
> >
> > To 

Re: [PATCH v2 00/15] dma mapping/iommu: Allow IOMMU IOVA rcache range to be configured

2021-05-20 Thread John Garry

On 10/05/2021 15:17, John Garry wrote:

Hi Robin, guys,

A friendly reminder on this one...

Thanks


For streaming DMA mappings that involve an IOMMU and whose IOVA length
regularly exceeds the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.

This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry
from last rb tree node if iova search fails"), as discussed at [0].

IOVAs which cannot be cached are highly involved in the IOVA aging issue,
as discussed at [1].

This series allows the IOVA rcache range to be configured, so that we may
cache all IOVAs per domain, thus improving performance.

A new IOMMU group sysfs file is added - max_opt_dma_size - which is used
indirectly to configure the IOVA rcache range:
/sys/kernel/iommu_groups/X/max_opt_dma_size

This file is updated the same way as the IOMMU group default domain type,
i.e. the only device in the group must be unbound first. However, the
IOMMU default domain is reallocated on device driver reprobe, not
immediately.

In addition, we keep (from the v1 series) the DMA mapping API to allow the
DMA max optimised size to be set from an LLDD, though how it works is now
quite different. When the LLDD calls this during probe, once the value is
successfully recorded we return -EPROBE_DEFER. On reprobe, the IOMMU group
default domain is reallocated, and the new IOVA domain rcache upper limit is
set according to that DMA max optimised size. As such, we don't operate on a
live IOMMU domain.

Note that the DMA mapping API frontend is not strictly required, but it saves
the LLDD from calling IOMMU APIs directly, which is not preferred.
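
As a rough sketch of how an LLDD would consume this frontend (the exact
prototype is not spelled out in this cover letter, so the signature and the
SZ_1M value below are assumptions for illustration only):

	#include <linux/dma-mapping.h>
	#include <linux/platform_device.h>
	#include <linux/sizes.h>

	static int my_lldd_probe(struct platform_device *pdev)
	{
		int ret;

		/*
		 * Record the largest DMA mapping length the driver wants the
		 * IOVA rcache to cover. On the first successful call this is
		 * expected to return -EPROBE_DEFER, so that the IOMMU group
		 * default domain can be reallocated before the driver is
		 * re-probed.
		 */
		ret = dma_set_max_opt_size(&pdev->dev, SZ_1M);
		if (ret)
			return ret;

		/* ... normal probe continues with the reconfigured rcache ... */
		return 0;
	}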

Some figures for storage scenario:
v5.13-rc1 baseline: 1200K IOPS
With series:1800K IOPS

All above are for IOMMU strict mode. Non-strict mode gives ~1800K IOPS in
all scenarios.

Patch breakdown:
1-11: Add support for setting DMA max optimised size via sysfs
12-15: Add support for setting DMA max optimised size from LLDD

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.ga...@huawei.com/

Differences to v1:
- Many
- Change method to not operate on a 'live' IOMMU domain:
- rather, force the device driver to be re-probed once
  dma_max_opt_size is set, and reconfigure a new IOMMU group then
- Add iommu sysfs max_dma_opt_size file, and allow updating it the same way
   as the group type is changed

John Garry (15):
   iommu: Refactor iommu_group_store_type()
   iova: Allow rcache range upper limit to be flexible
   iommu: Allow max opt DMA len be set for a group via sysfs
   iommu: Add iommu_group_get_max_opt_dma_size()
   iova: Add iova_domain_len_is_cached()
   iommu: Allow iommu_change_dev_def_domain() realloc default domain for
 same type
   iommu: Add iommu_realloc_dev_group()
   dma-iommu: Add iommu_reconfig_dev_group_dma()
   iova: Add init_iova_domain_ext()
   dma-iommu: Use init_iova_domain_ext() for IOVA domain init
   dma-iommu: Reconfig group domain
   iommu: Add iommu_set_dev_dma_opt_size()
   dma-mapping: Add dma_set_max_opt_size()
   dma-iommu: Add iommu_dma_set_opt_size()
   scsi: hisi_sas: Set max optimal DMA size for v3 hw

  drivers/iommu/dma-iommu.c  |  51 +-
  drivers/iommu/iommu.c  | 231 +++--
  drivers/iommu/iova.c   |  61 +--
  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   5 +
  include/linux/dma-iommu.h  |   4 +
  include/linux/dma-map-ops.h|   1 +
  include/linux/dma-mapping.h|   8 +
  include/linux/iommu.h  |  19 ++
  include/linux/iova.h   |  21 ++-
  kernel/dma/mapping.c   |  11 ++
  10 files changed, 344 insertions(+), 68 deletions(-)





[PATCH v3] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-05-20 Thread Wang Xingang
From: Xingang Wang 

When booting with devicetree, pci_request_acs() is called after PCI
devices have been enumerated and initialized, so ACS is not enabled.
ACS should be enabled when an IOMMU is detected for the PCI host
bridge, so add a check for the IOMMU before the PCI host probe and call
pci_request_acs() to make sure ACS will be enabled when enumerating PCI
devices.

Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
configuring IOMMU linkage")
Signed-off-by: Xingang Wang 
---
 drivers/iommu/of_iommu.c |  1 -
 drivers/pci/controller/pci-host-common.c | 17 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..54a14da242cc 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
.np = master_np,
};
 
-   pci_request_acs();
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, &info);
} else {
diff --git a/drivers/pci/controller/pci-host-common.c 
b/drivers/pci/controller/pci-host-common.c
index d3924a44db02..5904ad0bd9ae 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -49,6 +49,21 @@ static struct pci_config_window *gen_pci_init(struct device 
*dev,
return cfg;
 }
 
+static void pci_host_enable_acs(struct pci_host_bridge *bridge)
+{
+   struct device_node *np = bridge->dev.parent->of_node;
+   static bool acs_enabled;
+
+   if (!np || acs_enabled)
+   return;
+
+   /* Detect IOMMU and make sure ACS will be enabled */
+   if (of_property_read_bool(np, "iommu-map")) {
+   acs_enabled = true;
+   pci_request_acs();
+   }
+}
+
 int pci_host_common_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -81,6 +96,8 @@ int pci_host_common_probe(struct platform_device *pdev)
bridge->ops = (struct pci_ops *)&ops->pci_ops;
bridge->msi_domain = true;
 
+   pci_host_enable_acs(bridge);
+
return pci_host_probe(bridge);
 }
 EXPORT_SYMBOL_GPL(pci_host_common_probe);
-- 
2.19.1



[PATCH -next 1/3] iommu/arm-smmu-v3: fix missing a blank line after declarations

2021-05-20 Thread Bixuan Cui
Fixes checkpatch warnings in arm-smmu-v3.c:
WARNING: Missing a blank line after declarations

Signed-off-by: Bixuan Cui 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..4f184119c26d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -151,6 +151,7 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q)
 static void queue_inc_cons(struct arm_smmu_ll_queue *q)
 {
u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
+
q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
 }
 
@@ -176,6 +177,7 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q)
 static u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n)
 {
u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n;
+
return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
 }
 
@@ -1895,6 +1897,7 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
mutex_unlock(&arm_smmu_asid_lock);
} else {
struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
+
if (cfg->vmid)
arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
}
@@ -2724,6 +2727,7 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device 
*smmu,
 static void arm_smmu_cmdq_free_bitmap(void *data)
 {
unsigned long *bitmap = data;
+
bitmap_free(bitmap);
 }
 
@@ -2939,6 +2943,7 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device 
*smmu, u32 set, u32 clr)
 static void arm_smmu_free_msis(void *data)
 {
struct device *dev = data;
+
platform_msi_domain_free_irqs(dev);
 }
 
-- 
2.17.1



[PATCH -next 0/3] iommu/arm-smmu-v3: clean up some code style issues

2021-05-20 Thread Bixuan Cui
Clean up some code style issues.

Bixuan Cui (3):
  iommu/arm-smmu-v3: fix missing a blank line after declarations
  iommu/arm-smmu-v3: Change *array into *const array
  iommu/arm-smmu-v3: Prefer unsigned int to bare use of unsigned

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

-- 
2.17.1



[PATCH -next 3/3] iommu/arm-smmu-v3: Prefer unsigned int to bare use of unsigned

2021-05-20 Thread Bixuan Cui
Fix checkpatch warning in arm-smmu-v3.c:
Prefer 'unsigned int' to bare use of 'unsigned'

Signed-off-by: Bixuan Cui 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 51ce44fe550c..725b099d0652 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1827,7 +1827,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
}
 }
 
-static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
+static struct iommu_domain *arm_smmu_domain_alloc(unsigned int type)
 {
struct arm_smmu_domain *smmu_domain;
 
-- 
2.17.1



[PATCH -next 2/3] iommu/arm-smmu-v3: Change *array into *const array

2021-05-20 Thread Bixuan Cui
Fix checkpatch warning in arm-smmu-v3.c:
static const char * array should probably be static const char * const

Signed-off-by: Bixuan Cui 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4f184119c26d..51ce44fe550c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -354,7 +354,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct 
arm_smmu_device *smmu,
 
 static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 {
-   static const char *cerror_str[] = {
+   static const char * const cerror_str[] = {
[CMDQ_ERR_CERROR_NONE_IDX]  = "No error",
[CMDQ_ERR_CERROR_ILL_IDX]   = "Illegal command",
[CMDQ_ERR_CERROR_ABT_IDX]   = "Abort on command fetch",
-- 
2.17.1



Re: Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space

2021-05-20 Thread Yongji Xie
On Thu, May 20, 2021 at 1:43 PM Michael S. Tsirkin  wrote:
>
> On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> > On Wed, May 19, 2021 at 10:42 PM Dan Carpenter  
> > wrote:
> > >
> > > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji  
> > > > wrote:
> > > > >
> > > > > This ensures that we will not use an invalid block size
> > > > > in config space (might come from an untrusted device).
> > >
> > > I looked at whether I should add this as an untrusted function so that Smatch
> > > could find these sorts of bugs but this is reading data from the host so
> > > there has to be some level of trust...
> > >
> >
> > It would be great if Smatch could detect this case. The
> > data might be trusted in traditional VM cases. But now the data can be
> > read from a userspace daemon when VDUSE is enabled.
> >
> > > I should add some more untrusted data kvm functions to Smatch.  Right
> > > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > > just now.
> > >
> > > > >
> > > > > Signed-off-by: Xie Yongji 
> > > > > ---
> > > > >  drivers/block/virtio_blk.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > > --- a/drivers/block/virtio_blk.c
> > > > > +++ b/drivers/block/virtio_blk.c
> > > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device 
> > > > > *vdev)
> > > > > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > > >struct virtio_blk_config, blk_size,
> > > > >&blk_size);
> > > > > -   if (!err)
> > > > > +   if (!err && blk_size > 0 && blk_size <= max_size)
> > > >
> > > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > > boundary in the new version.
> > >
> > > What does this bug look like to the user?
> >
> > The kernel will panic if the block size is larger than PAGE_SIZE.
>
> Kernel panic at this point is par for the course IMHO.

But it seems better if we can avoid this kind of panic, because it
might also be triggered by a buggy VDUSE daemon.

> Let's focus on eliminating data corruption for starters.

OK, now the incorrect used length might cause data corruption in
virtio-net and virtio-console drivers as I mentioned in another mail.
I will send a fix ASAP.
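
For reference, the tightened check being discussed would look roughly like the
fragment below (taken in the context of virtblk_probe(); using PAGE_SIZE as
the upper bound follows the earlier reply, while the power-of-two requirement
is an extra assumption, not something settled in this thread):

	/* Only accept a sane block size from the (possibly untrusted) config. */
	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
				   struct virtio_blk_config, blk_size,
				   &blk_size);
	if (!err && blk_size > 0 && blk_size <= PAGE_SIZE &&
	    is_power_of_2(blk_size))
		blk_queue_logical_block_size(q, blk_size);
	else
		blk_size = queue_logical_block_size(q);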

Thanks,
Yongji


Re: Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-05-20 Thread Yongji Xie
On Thu, May 20, 2021 at 2:28 PM Al Viro  wrote:
>
> On Mon, May 17, 2021 at 05:55:12PM +0800, Xie Yongji wrote:
>
> > + case VDUSE_IOTLB_GET_FD: {
> > + struct vduse_iotlb_entry entry;
> > + struct vhost_iotlb_map *map;
> > + struct vdpa_map_file *map_file;
> > + struct vduse_iova_domain *domain = dev->domain;
> > + struct file *f = NULL;
> > +
> > + ret = -EFAULT;
> > + if (copy_from_user(&entry, argp, sizeof(entry)))
> > + break;
>
> return -EFAULT;
> surely?
> > +
> > + ret = -EINVAL;
> > + if (entry.start > entry.last)
> > + break;
>
> ... and similar here, etc.
>

OK.
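
In other words, the early parameter checks can just return their error code
directly instead of funnelling everything through the shared ret/break
pattern. A minimal sketch of the reworked start of this ioctl case (for
illustration only, not the final patch):

	case VDUSE_IOTLB_GET_FD: {
		struct vduse_iotlb_entry entry;

		/* Fail fast: nothing needs to be cleaned up at this point. */
		if (copy_from_user(&entry, argp, sizeof(entry)))
			return -EFAULT;

		if (entry.start > entry.last)
			return -EINVAL;

		/* ... iotlb lookup and fd installation continue as before ... */
	}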

> > + spin_lock(&domain->iotlb_lock);
> > + map = vhost_iotlb_itree_first(domain->iotlb,
> > +   entry.start, entry.last);
> > + if (map) {
> > + map_file = (struct vdpa_map_file *)map->opaque;
> > + f = get_file(map_file->file);
> > + entry.offset = map_file->offset;
> > + entry.start = map->start;
> > + entry.last = map->last;
> > + entry.perm = map->perm;
> > + }
> > + spin_unlock(&domain->iotlb_lock);
> > + ret = -EINVAL;
> > + if (!f)
> > + break;
> > +
> > + ret = -EFAULT;
> > + if (copy_to_user(argp, &entry, sizeof(entry))) {
> > + fput(f);
> > + break;
> > + }
> > + ret = receive_fd(f, perm_to_file_flags(entry.perm));
> > + fput(f);
> > + break;
>
> IDGI.  The main difference between receive_fd() and plain old
> get_unused_fd_flags() + fd_install() is __receive_sock() call.
> Which does nothing whatsoever in case of non-sockets.  Can you
> get a socket here?
>

Actually what I want here is the security_file_receive() hook in receive_fd().

Thanks,
Yongji