Re: [PATCH] iommu/vt-d: Fix sysfs leak in alloc_domain()
On Thursday, 22 April 2021 at 07:34:17 CEST, Lu Baolu wrote:
> Hi Rolf,
>
> On 4/22/21 1:39 PM, Rolf Eike Beer wrote:
>> iommu_device_sysfs_add() is called before, so it has to be cleaned up on
>> subsequent errors.
>>
>> Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
>> Cc: sta...@vger.kernel.org # 4.11.x
>> Signed-off-by: Rolf Eike Beer
>
> Acked-by: Lu Baolu

Ping. Who is going to pick this up, please?

Eike

-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v1 0/2] iommu/arm-smmu-v3: Add some parameter check in __arm_smmu_tlb_inv_range()
Hi Robin,

On 2021/5/19 18:01, Robin Murphy wrote:
> On 2021-05-19 10:43, Kunkun Jiang wrote:
>> Hi all,
>>
>> This set of patches solves some errors I hit when testing SMMU nested mode.
>>
>> Test scenario description:
>> guest kernel: 4KB translation granule
>> host kernel: 16KB translation granule
>>
>> Errors:
>> 1. encountered an endless loop in __arm_smmu_tlb_inv_range() because
>>    num_pages is 0
>> 2. encountered CERROR_ILL because the fields of the TLB invalidation
>>    command are as follows: TG = 2, NUM = 0, SCALE = 0, TTL = 0. This is
>>    exactly the kind of reserved combination pointed out in the SMMUv3
>>    spec (pages 143-144, version D.a)
>>
>> In my opinion, it is more appropriate to add parameter checks in
>> __arm_smmu_tlb_inv_range(), although these problems only appeared when
>> I tested SMMU nested mode. What do you think?
>
> FWIW I think it would be better to fix the caller to not issue broken
> commands in the first place. The kernel shouldn't do so for itself (and
> definitely needs fixing if it ever does), so it sounds like the nesting
> implementation needs to do a bit more validation of what it's passing
> through.

Thanks for your reply. I will report these errors to Eric and discuss how
to fix them.

Thanks,
Kunkun Jiang

> Robin.
>
>> This series includes the following patches:
>> Patch 1:
>>  - align the invalid range with the leaf page size upwards when the SMMU
>>    supports RIL
>> Patch 2:
>>  - add a check to standardize the granule size when the SMMU supports RIL
>>
>> Kunkun Jiang (2):
>>   iommu/arm-smmu-v3: Align invalid range with leaf page size upwards
>>     when support RIL
>>   iommu/arm-smmu-v3: Standardize granule size when support RIL
>>
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +
>>  1 file changed, 9 insertions(+)
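The failure mode is easy to reproduce in isolation: with a 16KB host granule and a 4KB guest invalidation, the size in host-granule pages is 0, and an unchecked granule produces a reserved command encoding. A minimal userspace sketch of the parameter checks being proposed (hypothetical helper, not the actual SMMUv3 driver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the checks under discussion: reject a granule
 * that is not a power of two or not supported by the domain's page-size
 * bitmap, and never issue a range invalidation for zero pages (the
 * endless-loop / NUM=0 case described above). */
bool tlbi_range_valid(size_t size, size_t granule, unsigned long pgsize_bitmap)
{
    if (granule == 0 || (granule & (granule - 1)))
        return false;               /* not a power of two */
    if (!(granule & pgsize_bitmap))
        return false;               /* granule unsupported by this domain */
    if (size / granule == 0)
        return false;               /* num_pages would be 0 */
    return true;
}
```

Note that, per Robin's point above, the preferred place for such validation is the caller (the nesting implementation) rather than __arm_smmu_tlb_inv_range() itself; the sketch only illustrates what must hold before a command is issued.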
Re: [PATCH v15 07/12] iommu/smmuv3: Implement cache_invalidate
Hi Eric, On 2021/4/11 19:12, Eric Auger wrote: Implement domain-selective, pasid selective and page-selective IOTLB invalidations. Signed-off-by: Eric Auger --- v4 -> v15: - remove the redundant arm_smmu_cmdq_issue_sync(smmu) in IOMMU_INV_GRANU_ADDR case (Zenghui) - if RIL is not supported by the host, make sure the granule_size that is passed by the userspace is supported or fix it (Chenxiang) v13 -> v14: - Add domain invalidation - do global inval when asid is not provided with addr granularity v7 -> v8: - ASID based invalidation using iommu_inv_pasid_info - check ARCHID/PASID flags in addr based invalidation - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync v6 -> v7 - check the uapi version v3 -> v4: - adapt to changes in the uapi - add support for leaf parameter - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context anymore v2 -> v3: - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync v1 -> v2: - properly pass the asid --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 + 1 file changed, 89 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 56a301fbe75a..bfc112cc0d38 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2961,6 +2961,94 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain) mutex_unlock(&smmu_domain->init_mutex); } +static int +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev, + struct iommu_cache_invalidate_info *inv_info) +{ + struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL}; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED) + return -EINVAL; + + if (!smmu) + return -EINVAL; + + if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1) + return -EINVAL; + + if (inv_info->cache & 
IOMMU_CACHE_INV_TYPE_PASID || + inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) { + return -ENOENT; + } + + if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB)) + return -EINVAL; + + /* IOTLB invalidation */ + + switch (inv_info->granularity) { + case IOMMU_INV_GRANU_PASID: + { + struct iommu_inv_pasid_info *info = + &inv_info->granu.pasid_info; + + if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID) + return -ENOENT; + if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID)) + return -EINVAL; + + __arm_smmu_tlb_inv_context(smmu_domain, info->archid); + return 0; + } + case IOMMU_INV_GRANU_ADDR: + { + struct iommu_inv_addr_info *info = &inv_info->granu.addr_info; + size_t granule_size = info->granule_size; + size_t size = info->nb_granules * info->granule_size; + bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF; + int tg; + + if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID) + return -ENOENT; + + if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID)) + break; + + tg = __ffs(granule_size); + if (granule_size & ~(1 << tg)) + return -EINVAL; + /* +* When RIL is not supported, make sure the granule size that is +* passed is supported. In RIL mode, this is enforced in +* __arm_smmu_tlb_inv_range() +*/ + if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) && + !(granule_size & smmu_domain->domain.pgsize_bitmap)) { + tg = __ffs(smmu_domain->domain.pgsize_bitmap); + granule_size = 1 << tg; + size = size >> tg; + } + + arm_smmu_tlb_inv_range_domain(info->addr, size, + granule_size, leaf, + info->archid, smmu_domain); I encountered some errors when I tested the SMMU nested mode. Test scenario description: guest kernel: 4KB translation granule host kernel: 16KB translation granule errors: 1. encountered an endless loop in __arm_smmu_tlb_inv_range because num_pages is 0 2. encountered CERROR_ILL because the fields of TLB invalidation command are as follow: TG = 2, NUM = 0, SCALE = 0, TTL = 0. 
The combination is exactly the kind of reserved combination pointed out in the SMMUv3 spec(page 143-144, version D.a) According to my analysis, we should do a bit more validation on the 'size' and 'granule_size' when SMMU support
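The kind of fixup being discussed can be sketched as follows: when the granule passed from userspace (e.g. the guest's 4KB) is not supported by the host SMMU (e.g. a 16KB-granule host), fall back to the host's smallest supported granule and round the total range up so nothing is left un-invalidated. Hypothetical helper names; this is not the patch's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: rescale (size, granule) so that granule is
 * host-supported, rounding the covered range up so the invalidation is
 * never smaller than what was requested. */
void fixup_inv_range(size_t *size, size_t *granule, unsigned long pgsize_bitmap)
{
    size_t host_granule = pgsize_bitmap & -pgsize_bitmap; /* lowest set bit */

    if (*granule & pgsize_bitmap)
        return;                                   /* already supported */

    /* round total bytes up to a whole number of host granules */
    *size = (*size + host_granule - 1) & ~(host_granule - 1);
    *granule = host_granule;
}
```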
Re: [RFC PATCH v3 7/8] vfio/type1: Add selective DMA faulting support
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:19 +0800
> Shenming Lu wrote:
>
>> Some devices only allow selective DMA faulting. Similar to the selective
>> dirty page tracking, the vendor driver can call vfio_pin_pages() to
>> indicate the non-faultable scope; we add a new struct vfio_range to
>> record it, then when the IOPF handler receives any page request out of
>> the scope, we can directly return with an invalid response.
>
> Seems like this highlights a deficiency in the design, that the user
> can't specify mappings as iopf enabled or disabled. Also, if the
> vendor driver has pinned pages within the range, shouldn't that prevent
> them from faulting in the first place? Why do we need yet more
> tracking structures? Pages pinned by the vendor driver need to count
> against the user's locked memory limits regardless of iopf. Thanks,

Currently we only have a vfio_pfn struct to track the externally pinned
pages (single-page granularity), so I added a vfio_range struct for
efficient lookup.

Yeah, with this patch, for the non-pinned scope we can directly return
INVALID, but for the pinned (non-faultable) scope, tracking the pinned
range doesn't seem to help more...

Thanks,
Shenming

> Alex
Re: [RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled
On 2021/5/19 2:58, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:18 +0800 > Shenming Lu wrote: > >> If IOPF enabled for the VFIO container, there is no need to statically >> pin and map the entire DMA range, we can do it on demand. And unmap >> according to the IOPF mapped bitmap when removing vfio_dma. >> >> Note that we still mark all pages dirty even if IOPF enabled, we may >> add IOPF-based fine grained dirty tracking support in the future. >> >> Signed-off-by: Shenming Lu >> --- >> drivers/vfio/vfio_iommu_type1.c | 38 +++-- >> 1 file changed, 32 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c >> b/drivers/vfio/vfio_iommu_type1.c >> index 7df5711e743a..dcc93c3b258c 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -175,6 +175,7 @@ struct vfio_iopf_group { >> #define IOPF_MAPPED_BITMAP_GET(dma, i) \ >>((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] >> \ >> >> ((i) % BITS_PER_LONG)) & 0x1) >> +#define IOPF_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n) >> >> #define WAITED 1 >> >> @@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, >> * already pinned and accounted. Accouting should be done if there is no >> * iommu capable domain in the container. >> */ >> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu); >> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || >> +iommu->iopf_enabled; >> >> for (i = 0; i < npage; i++) { >> struct vfio_pfn *vpfn; >> @@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void >> *iommu_data, >> >> mutex_lock(&iommu->lock); >> >> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu); >> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || >> +iommu->iopf_enabled; > > pin/unpin are actually still pinning pages, why does iopf exempt them > from accounting? If iopf_enabled is true, do_accounting will be true too, we will account the external pinned pages? 
> > >> for (i = 0; i < npage; i++) { >> struct vfio_dma *dma; >> dma_addr_t iova; >> @@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, >> struct vfio_dma *dma, >> if (!dma->size) >> return 0; >> >> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) >> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled) >> return 0; >> >> /* >> @@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct >> vfio_iommu *iommu, >> } >> } >> >> +static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma >> *dma) >> +{ >> +vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size); >> + >> +kfree(dma->iopf_mapped_bitmap); >> +} >> + >> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) >> { >> WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)); >> vfio_unmap_unpin(iommu, dma, true); >> vfio_unlink_dma(iommu, dma); >> +if (iommu->iopf_enabled) >> +vfio_dma_clean_iopf(iommu, dma); >> put_task_struct(dma->task); >> vfio_dma_bitmap_free(dma); >> if (dma->vaddr_invalid) { >> @@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, >> struct vfio_iommu *iommu, >> * mark all pages dirty if any IOMMU capable device is not able >> * to report dirty pages and all pages are pinned and mapped. >> */ >> -if (iommu->num_non_pinned_groups && dma->iommu_mapped) >> +if (iommu->num_non_pinned_groups && >> +(dma->iommu_mapped || iommu->iopf_enabled)) >> bitmap_set(dma->bitmap, 0, nbits); > > This seems like really poor integration of iopf into dirty page > tracking. I'd expect dirty logging to flush the mapped pages and > write faults to mark pages dirty. Shouldn't the fault handler also > provide only the access faulted, so for example a read fault wouldn't > mark the page dirty? I just want to keep the behavior here as before, if IOPF enabled, we will still mark all pages dirty. 
We can distinguish between write and read faults in the fault handler, so there is a way to add IOPF-based fine grained dirty tracking support... But I am not sure whether there is a need to implement this, we can consider this in the future? > >> >> if (shift) { >> @@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> goto out_unlock; >> } >> >> +if (iommu->iopf_enabled) { >> +dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES( >> +size >> PAGE_SHIFT), >> GFP_KERNEL); >> +if (!dma->iopf_mapped_bitmap) { >> +ret = -ENOMEM; >> +kfree(dma); >> +goto out_un
Re: [RFC PATCH v3 2/8] vfio/type1: Add a page fault handler
On 2021/5/19 2:58, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:14 +0800 > Shenming Lu wrote: > >> VFIO manages the DMA mapping itself. To support IOPF (on-demand paging) >> for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to >> serve the reported page faults from the IOMMU driver. >> >> Signed-off-by: Shenming Lu >> --- >> drivers/vfio/vfio_iommu_type1.c | 114 >> 1 file changed, 114 insertions(+) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c >> b/drivers/vfio/vfio_iommu_type1.c >> index 45cbfd4879a5..ab0ff60ee207 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -101,6 +101,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> +unsigned long *iopf_mapped_bitmap; >> }; >> >> struct vfio_batch { >> @@ -141,6 +142,16 @@ struct vfio_regions { >> size_t len; >> }; >> >> +/* A global IOPF enabled group list */ >> +static struct rb_root iopf_group_list = RB_ROOT; >> +static DEFINE_MUTEX(iopf_group_list_lock); >> + >> +struct vfio_iopf_group { >> +struct rb_node node; >> +struct iommu_group *iommu_group; >> +struct vfio_iommu *iommu; >> +}; >> + >> #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ >> (!list_empty(&iommu->domain_list)) >> >> @@ -157,6 +168,10 @@ struct vfio_regions { >> #define DIRTY_BITMAP_PAGES_MAX ((u64)INT_MAX) >> #define DIRTY_BITMAP_SIZE_MAX >> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX) >> >> +#define IOPF_MAPPED_BITMAP_GET(dma, i) \ >> + ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] >> \ >> + >> ((i) % BITS_PER_LONG)) & 0x1) > > > Can't we just use test_bit()? Yeah, we can use it. 
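The reviewer's suggestion is straightforward to see: the open-coded IOPF_MAPPED_BITMAP_GET() macro computes exactly what test_bit() already provides. A standalone userspace re-implementation for illustration (not the kernel's optimized, atomic-safe test_bit()):

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Open-coded form, as in the patch's IOPF_MAPPED_BITMAP_GET() macro. */
int bitmap_get(const unsigned long *map, unsigned long i)
{
    return (map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 0x1;
}

/* test_bit()-style equivalent: same result, clearer intent. */
int my_test_bit(unsigned long nr, const unsigned long *addr)
{
    return 1UL & (addr[nr / BITS_PER_LONG] >> (nr & (BITS_PER_LONG - 1)));
}
```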
> > >> + >> #define WAITED 1 >> >> static int put_pfn(unsigned long pfn, int prot); >> @@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, >> struct vfio_pfn *vpfn) >> return ret; >> } >> >> +/* >> + * Helper functions for iopf_group_list >> + */ >> +static struct vfio_iopf_group * >> +vfio_find_iopf_group(struct iommu_group *iommu_group) >> +{ >> +struct vfio_iopf_group *iopf_group; >> +struct rb_node *node; >> + >> +mutex_lock(&iopf_group_list_lock); >> + >> +node = iopf_group_list.rb_node; >> + >> +while (node) { >> +iopf_group = rb_entry(node, struct vfio_iopf_group, node); >> + >> +if (iommu_group < iopf_group->iommu_group) >> +node = node->rb_left; >> +else if (iommu_group > iopf_group->iommu_group) >> +node = node->rb_right; >> +else >> +break; >> +} >> + >> +mutex_unlock(&iopf_group_list_lock); >> +return node ? iopf_group : NULL; >> +} > > This looks like a pretty heavy weight operation per DMA fault. > > I'm also suspicious of this validity of this iopf_group after we've > dropped the locking, the ordering of patches makes this very confusing. My thought was to include the handling of DMA faults completely in the type1 backend by introducing the vfio_iopf_group struct. But it seems that introducing a struct with an unknown lifecycle causes more problems... I will use the path from vfio-core as in the v2 for simplicity and validity. Sorry for the confusing, I will reconstruct the series later. 
:-) > >> + >> static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >> { >> struct mm_struct *mm; >> @@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct >> vfio_iommu *iommu, >> return -EINVAL; >> } >> >> +/* VFIO I/O Page Fault handler */ >> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void >> *data) > >>From the comment, this seems like the IOMMU fault handler (the > construction of this series makes this difficult to follow) and > eventually it handles more than DMA mapping, for example transferring > faults to the device driver. "dma_map_iopf" seems like a poorly scoped > name. Maybe just call it dev_fault_handler? > >> +{ >> +struct device *dev = (struct device *)data; >> +struct iommu_group *iommu_group; >> +struct vfio_iopf_group *iopf_group; >> +struct vfio_iommu *iommu; >> +struct vfio_dma *dma; >> +dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE); >> +int access_flags = 0; >> +unsigned long bit_offset, vaddr, pfn; >> +int ret; >> +enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; >> +struct iommu_page_response resp = {0}; >> + >> +if (fault->type != IOMMU_FAULT_PAGE_REQ) >> +return -EOPNOTSUPP; >> + >> +iommu_group = iommu_group_get(dev); >> +if (!iommu_group) >> +return -ENODEV; >> + >> +iopf_group = vfio_find_iopf_grou
Re: [RFC PATCH v3 5/8] vfio/type1: VFIO_IOMMU_ENABLE_IOPF
On 2021/5/19 2:58, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:17 +0800 > Shenming Lu wrote: > >> Since enabling IOPF for devices may lead to a slow ramp up of performance, >> we add an ioctl VFIO_IOMMU_ENABLE_IOPF to make it configurable. And the >> IOPF enabling of a VFIO device includes setting IOMMU_DEV_FEAT_IOPF and >> registering the VFIO IOPF handler. >> >> Note that VFIO_IOMMU_DISABLE_IOPF is not supported since there may be >> inflight page faults when disabling. >> >> Signed-off-by: Shenming Lu >> --- >> drivers/vfio/vfio_iommu_type1.c | 223 +++- >> include/uapi/linux/vfio.h | 6 + >> 2 files changed, 226 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c >> b/drivers/vfio/vfio_iommu_type1.c >> index 01e296c6dc9e..7df5711e743a 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -71,6 +71,7 @@ struct vfio_iommu { >> struct rb_root dma_list; >> struct blocking_notifier_head notifier; >> struct mmu_notifier mn; >> +struct mm_struct*mm; > > We currently have no requirement that a single mm is used for all > container mappings. Does enabling IOPF impose such a requirement? > Shouldn't MAP/UNMAP enforce that? Did you mean that there is a possibility that each vfio_dma in a container has a different mm? If so, could we register one MMU notifier for each vfio_dma in the MAP ioctl? > >> unsigned intdma_avail; >> unsigned intvaddr_invalid_count; >> uint64_tpgsize_bitmap; >> @@ -81,6 +82,7 @@ struct vfio_iommu { >> booldirty_page_tracking; >> boolpinned_page_dirty_scope; >> boolcontainer_open; >> +booliopf_enabled; >> }; >> >> struct vfio_domain { >> @@ -461,6 +463,38 @@ vfio_find_iopf_group(struct iommu_group *iommu_group) >> return node ? 
iopf_group : NULL; >> } >> >> +static void vfio_link_iopf_group(struct vfio_iopf_group *new) >> +{ >> +struct rb_node **link, *parent = NULL; >> +struct vfio_iopf_group *iopf_group; >> + >> +mutex_lock(&iopf_group_list_lock); >> + >> +link = &iopf_group_list.rb_node; >> + >> +while (*link) { >> +parent = *link; >> +iopf_group = rb_entry(parent, struct vfio_iopf_group, node); >> + >> +if (new->iommu_group < iopf_group->iommu_group) >> +link = &(*link)->rb_left; >> +else >> +link = &(*link)->rb_right; >> +} >> + >> +rb_link_node(&new->node, parent, link); >> +rb_insert_color(&new->node, &iopf_group_list); >> + >> +mutex_unlock(&iopf_group_list_lock); >> +} >> + >> +static void vfio_unlink_iopf_group(struct vfio_iopf_group *old) >> +{ >> +mutex_lock(&iopf_group_list_lock); >> +rb_erase(&old->node, &iopf_group_list); >> +mutex_unlock(&iopf_group_list_lock); >> +} >> + >> static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) >> { >> struct mm_struct *mm; >> @@ -2363,6 +2397,68 @@ static void vfio_iommu_iova_insert_copy(struct >> vfio_iommu *iommu, >> list_splice_tail(iova_copy, iova); >> } >> >> +static int vfio_dev_domian_nested(struct device *dev, int *nested) > > > s/domian/domain/ > > >> +{ >> +struct iommu_domain *domain; >> + >> +domain = iommu_get_domain_for_dev(dev); >> +if (!domain) >> +return -ENODEV; >> + >> +return iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, nested); > > > Wouldn't this be easier to use if it returned bool, false on -errno? Make sense. 
> > >> +} >> + >> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void >> *data); >> + >> +static int dev_enable_iopf(struct device *dev, void *data) >> +{ >> +int *enabled_dev_cnt = data; >> +int nested; >> +u32 flags; >> +int ret; >> + >> +ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF); >> +if (ret) >> +return ret; >> + >> +ret = vfio_dev_domian_nested(dev, &nested); >> +if (ret) >> +goto out_disable; >> + >> +if (nested) >> +flags = FAULT_REPORT_NESTED_L2; >> +else >> +flags = FAULT_REPORT_FLAT; >> + >> +ret = iommu_register_device_fault_handler(dev, >> +vfio_iommu_type1_dma_map_iopf, flags, dev); >> +if (ret) >> +goto out_disable; >> + >> +(*enabled_dev_cnt)++; >> +return 0; >> + >> +out_disable: >> +iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF); >> +return ret; >> +} >> + >> +static int dev_disable_iopf(struct device *dev, void *data) >> +{ >> +int *enabled_dev_cnt = data; >> + >> +if (enabled_dev_cnt && *enabled_dev_cnt <= 0) >> +return -1; > > > Use an errno.> > >> + >> +WARN_ON(iommu_unregister_device_fault_handler(dev)); >> +WARN_ON(iommu_dev_disable_fea
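The dev_enable_iopf()/dev_disable_iopf() pair above implements an enable-with-rollback pattern: walk every device in the container, and if enabling fails midway, disable exactly the devices already enabled (tracked by enabled_dev_cnt). A self-contained sketch of that pattern with a hypothetical device model (not the VFIO code itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device: fail_enable simulates a device whose IOPF
 * feature cannot be enabled. */
struct dev {
    bool iopf_enabled;
    bool fail_enable;
};

static int enable_one(struct dev *d)
{
    if (d->fail_enable)
        return -1;
    d->iopf_enabled = true;
    return 0;
}

/* Enable IOPF on all devices; on failure, roll back only the ones that
 * were successfully enabled, mirroring the enabled_dev_cnt bookkeeping. */
int enable_all(struct dev *devs, int n)
{
    int enabled = 0;

    for (int i = 0; i < n; i++) {
        if (enable_one(&devs[i]) != 0) {
            for (int j = 0; j < enabled; j++)
                devs[j].iopf_enabled = false;
            return -1;
        }
        enabled++;
    }
    return 0;
}
```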
Re: [RFC PATCH v3 4/8] vfio/type1: Pre-map more pages than requested in the IOPF handling
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:16 +0800
> Shenming Lu wrote:
>
>> To optimize for fewer page fault handlings, we can pre-map more pages
>> than requested at once.
>>
>> Note that IOPF_PREMAP_LEN is just an arbitrary value for now, which we
>> could try further tuning.
>
> I'd prefer that the series introduced full end-to-end functionality
> before trying to improve performance. The pre-map value seems
> arbitrary here and, as noted in the previous patch, the IOMMU API does
> not guarantee unmaps of ranges smaller than the original mapping. This
> would need to map with single-page granularity in order to guarantee
> page granularity at the mmu notifier when the IOMMU supports
> superpages. Thanks,

Yeah, I will drop this patch in the current stage.

Thanks,
Shenming

> Alex
Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning
On 2021/5/19 2:58, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:15 +0800 > Shenming Lu wrote: > >> To avoid pinning pages when they are mapped in IOMMU page tables, we >> add an MMU notifier to tell the addresses which are no longer valid >> and try to unmap them. >> >> Signed-off-by: Shenming Lu >> --- >> drivers/vfio/vfio_iommu_type1.c | 112 +++- >> 1 file changed, 109 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c >> b/drivers/vfio/vfio_iommu_type1.c >> index ab0ff60ee207..1cb9d1f2717b 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -40,6 +40,7 @@ >> #include >> #include >> #include >> +#include >> >> #define DRIVER_VERSION "0.2" >> #define DRIVER_AUTHOR "Alex Williamson " >> @@ -69,6 +70,7 @@ struct vfio_iommu { >> struct mutexlock; >> struct rb_root dma_list; >> struct blocking_notifier_head notifier; >> +struct mmu_notifier mn; >> unsigned intdma_avail; >> unsigned intvaddr_invalid_count; >> uint64_tpgsize_bitmap; >> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu >> *iommu, struct vfio_dma *dma, >> return unlocked; >> } >> >> +/* Unmap the IOPF mapped pages in the specified range. */ >> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu, >> +struct vfio_dma *dma, >> +dma_addr_t start, dma_addr_t end) >> +{ >> +struct iommu_iotlb_gather *gathers; >> +struct vfio_domain *d; >> +int i, num_domains = 0; >> + >> +list_for_each_entry(d, &iommu->domain_list, next) >> +num_domains++; >> + >> +gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL); >> +if (gathers) { >> +for (i = 0; i < num_domains; i++) >> +iommu_iotlb_gather_init(&gathers[i]); >> +} > > > If we're always serialized on iommu->lock, would it make sense to have > gathers pre-allocated on the vfio_iommu object? Yeah, we can do it. 
> >> + >> +while (start < end) { >> +unsigned long bit_offset; >> +size_t len; >> + >> +bit_offset = (start - dma->iova) >> PAGE_SHIFT; >> + >> +for (len = 0; start + len < end; len += PAGE_SIZE) { >> +if (!IOPF_MAPPED_BITMAP_GET(dma, >> +bit_offset + (len >> PAGE_SHIFT))) >> +break; > > > There are bitmap helpers for this, find_first_bit(), > find_next_zero_bit(). Thanks for the hint. :-) > > >> +} >> + >> +if (len) { >> +i = 0; >> +list_for_each_entry(d, &iommu->domain_list, next) { >> +size_t unmapped; >> + >> +if (gathers) >> +unmapped = iommu_unmap_fast(d->domain, >> +start, len, >> + >> &gathers[i++]); >> +else >> +unmapped = iommu_unmap(d->domain, >> + start, len); >> + >> +if (WARN_ON(unmapped != len)) > > The IOMMU API does not guarantee arbitrary unmap sizes, this will > trigger and this exit path is wrong. If we've already unmapped the > IOMMU, shouldn't we proceed with @unmapped rather than @len so the > device can re-fault the extra mappings? Otherwise the IOMMU state > doesn't match the iopf bitmap state. OK, I will correct it. And can we assume that the @unmapped values (returned from iommu_unmap) of all domains are the same (since all domains share the same iopf_mapped_bitmap)? 
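The bitmap-helper suggestion above replaces the bit-by-bit loop: finding the length of the current run of mapped pages is one find_next_zero_bit() call. A simplified userspace sketch of that helper (not the kernel's word-at-a-time implementation):

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Sketch of find_next_zero_bit(): return the index of the first clear bit
 * at or after `offset`, or `size` if every bit in range is set. The run of
 * mapped pages starting at `offset` then has length
 * next_zero_bit(map, size, offset) - offset. */
unsigned long next_zero_bit(const unsigned long *map, unsigned long size,
                            unsigned long offset)
{
    while (offset < size) {
        if (!((map[offset / BITS_PER_LONG] >> (offset % BITS_PER_LONG)) & 1))
            return offset;
        offset++;
    }
    return size;
}
```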
> >> +goto out; >> +} >> + >> +bitmap_clear(dma->iopf_mapped_bitmap, >> + bit_offset, len >> PAGE_SHIFT); >> + >> +cond_resched(); >> +} >> + >> +start += (len + PAGE_SIZE); >> +} >> + >> +out: >> +if (gathers) { >> +i = 0; >> +list_for_each_entry(d, &iommu->domain_list, next) >> +iommu_iotlb_sync(d->domain, &gathers[i++]); >> + >> +kfree(gathers); >> +} >> +} >> + >> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) >> { >> WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)); >> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct >> iommu_fault *fault, void *data) >> >> vaddr = iova - dma->iova + dma->vaddr; >> >> -if (vfio_pin_page_external(dma, vaddr, &pfn, true)) >> +if (vfio_pin_page_external(dma, vaddr, &pfn, false)) >> goto out_invalid; >> >>
Re: [RFC PATCH v3 1/8] iommu: Evolve the device fault reporting framework
On 2021/5/19 2:58, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:13 +0800 > Shenming Lu wrote: > >> This patch follows the discussion here: >> >> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/ >> >> Besides SVA/vSVA, such as VFIO may also enable (2nd level) IOPF to remove >> pinning restriction. In order to better support more scenarios of using >> device faults, we extend iommu_register_fault_handler() with flags and >> introduce FAULT_REPORT_ to describe the device fault reporting capability >> under a specific configuration. >> >> Note that we don't further distinguish recoverable and unrecoverable faults >> by flags in the fault reporting cap, having PAGE_FAULT_REPORT_ + >> UNRECOV_FAULT_REPORT_ seems not a clean way. >> >> In addition, still take VFIO as an example, in nested mode, the 1st level >> and 2nd level fault reporting may be configured separately and currently >> each device can only register one iommu dev fault handler, so we add a >> handler update interface for this. > > > IIUC, you're introducing flags for the fault handler callout, which > essentially allows the IOMMU layer to filter which types of faults the > handler can receive. You then need an update function to modify those > flags. Why can't the handler itself perform this filtering? For > instance in your vfio example, the handler registered by the type1 > backend could itself return fault until the fault transfer path to the > device driver is established. Thanks, As discussed in [1]: In nested IOPF, we have to figure out whether a fault comes from L1 or L2. A SMMU stall event comes with this information, but a PRI page request doesn't. The IOMMU driver can walk the page tables to find out the level information. If the walk terminates at L1, further inject to the guest. Otherwise fix the fault at L2 in VFIO. It's not efficient compared to hardware-provided info. 
And in the pinned case, if VFIO can tell the IOMMU driver that no L2 fault
is expected, there is no need to walk the page tables in the IOMMU driver?

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210108145217.2254447-4-jean-phili...@linaro.org/

Thanks,
Shenming

> Alex
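The routing decision described in [1] can be summarized in a few lines: a nested fault that terminates at stage 1 is injected into the guest, while a stage-2 fault is fixed up locally by the host (VFIO). A hypothetical sketch of that dispatch, with invented enum names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the nested-fault routing discussed above. */
enum fault_level  { FAULT_L1, FAULT_L2 };
enum fault_action { INJECT_TO_GUEST, FIX_IN_HOST };

enum fault_action route_nested_fault(enum fault_level level, bool l2_pinned)
{
    if (level == FAULT_L1)
        return INJECT_TO_GUEST;       /* guest owns the stage-1 tables */
    /* If VFIO has told the IOMMU driver that all L2 mappings are pinned,
     * an L2 fault here would be unexpected; otherwise fix it in the host. */
    (void)l2_pinned;
    return FIX_IN_HOST;
}
```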
Re: [RFC PATCH v3 0/8] Add IOPF support for VFIO passthrough
Hi Alex, On 2021/5/19 2:57, Alex Williamson wrote: > On Fri, 9 Apr 2021 11:44:12 +0800 > Shenming Lu wrote: > >> Hi, >> >> Requesting for your comments and suggestions. :-) >> >> The static pinning and mapping problem in VFIO and possible solutions >> have been discussed a lot [1, 2]. One of the solutions is to add I/O >> Page Fault support for VFIO devices. Different from those relatively >> complicated software approaches such as presenting a vIOMMU that provides >> the DMA buffer information (might include para-virtualized optimizations), >> IOPF mainly depends on the hardware faulting capability, such as the PCIe >> PRI extension or Arm SMMU stall model. What's more, the IOPF support in >> the IOMMU driver has already been implemented in SVA [3]. So we add IOPF >> support for VFIO passthrough based on the IOPF part of SVA in this series. > > The SVA proposals are being reworked to make use of a new IOASID > object, it's not clear to me that this shouldn't also make use of that > work as it does a significant expansion of the type1 IOMMU with fault > handlers that would duplicate new work using that new model. It seems that the IOASID extension for guest SVA would not affect this series, will we do any host-guest IOASID translation in the VFIO fault handler? > >> We have measured its performance with UADK [4] (passthrough an accelerator >> to a VM(1U16G)) on Hisilicon Kunpeng920 board (and compared with host SVA): >> >> Run hisi_sec_test... 
>> - with varying sending times and message lengths >> - with/without IOPF enabled (speed slowdown) >> >> when msg_len = 1MB (and PREMAP_LEN (in Patch 4) = 1): >> slowdown (num of faults) >> times VFIO IOPF host SVA >> 1 63.4% (518)82.8% (512) >> 10022.9% (1058) 47.9% (1024) >> 1000 2.6% (1071)8.5% (1024) >> >> when msg_len = 10MB (and PREMAP_LEN = 512): >> slowdown (num of faults) >> times VFIO IOPF >> 1 32.6% (13) >> 1003.5% (26) >> 1000 1.6% (26) > > It seems like this is only an example that you can make a benchmark > show anything you want. The best results would be to pre-map > everything, which is what we have without this series. What is an > acceptable overhead to incur to avoid page pinning? What if userspace > had more fine grained control over which mappings were available for > faulting and which were statically mapped? I don't really see what > sense the pre-mapping range makes. If I assume the user is QEMU in a > non-vIOMMU configuration, pre-mapping the beginning of each RAM section > doesn't make any logical sense relative to device DMA. As you said in Patch 4, we can introduce full end-to-end functionality before trying to improve performance, and I will drop the pre-mapping patch in the current stage... Is there a need that userspace wants more fine grained control over which mappings are available for faulting? If so, we may evolve the MAP ioctl to support for specifying the faulting range. As for the overhead of IOPF, it is unavoidable if enabling on-demand paging (and page faults occur almost only when first accessing), and it seems that there is a large optimization space compared to CPU page faulting. Thanks, Shenming > > Comments per patch to follow. Thanks, > > Alex > > >> History: >> >> v2 -> v3 >> - Nit fixes. >> - No reason to disable reporting the unrecoverable faults. (baolu) >> - Maintain a global IOPF enabled group list. >> - Split the pre-mapping optimization to be a separate patch. 
>> - Add selective faulting support (use vfio_pin_pages to indicate the >>non-faultable scope and add a new struct vfio_range to record it, >>untested). (Kevin) >> >> v1 -> v2 >> - Numerous improvements following the suggestions. Thanks a lot to all >>of you. >> >> Note that PRI is not supported at the moment since there is no hardware. >> >> Links: >> [1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS, >> 2016. >> [2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer >> Tracking >> for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020. >> [3] >> https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210401154718.307519-1-jean-phili...@linaro.org/ >> [4] https://github.com/Linaro/uadk >> >> Thanks, >> Shenming >> >> >> Shenming Lu (8): >> iommu: Evolve the device fault reporting framework >> vfio/type1: Add a page fault handler >> vfio/type1: Add an MMU notifier to avoid pinning >> vfio/type1: Pre-map more pages than requested in the IOPF handling >> vfio/type1: VFIO_IOMMU_ENABLE_IOPF >> vfio/type1: No need to statically pin and map if IOPF enabled >> vfio/type1: Add selective DMA faulting support >> vfio: Add nested IOPF support >> >> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |3 +- >> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +- >> drivers/iommu/iommu.c | 56 +- >> drivers/vf
[PATCH] dma-iommu: Add a check to avoid dereference null pointer in function iommu_dma_map_sg()
From: Xiang Chen

The issue is reported by the tool TscanCode: it is possible to dereference a null pointer, since prev is used while still at its initial value of NULL.

Signed-off-by: Xiang Chen
---
 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4cb63b2..88a4f34 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1042,7 +1042,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	 * iova_len == 0, thus we cannot dereference prev the first
 	 * time through here (i.e. before it has a meaningful value).
 	 */
-	if (pad_len && pad_len < s_length - 1) {
+	if (prev && pad_len && pad_len < s_length - 1) {
 		prev->length += pad_len;
 		iova_len += pad_len;
 	}
-- 
2.8.1
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
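The pattern being fixed here can be shown with a standalone userspace model (hypothetical names, not the actual dma-iommu code; the per-segment pad_len recomputation is a placeholder):

```c
#include <stddef.h>

struct seg { size_t length; };

/*
 * Model of the merge loop in iommu_dma_map_sg(): "prev" starts out
 * NULL, so the pad_len branch must not touch prev->length on the first
 * iteration.  Returns the total accumulated IOVA length.
 */
static size_t merge_segments(struct seg *segs, int n, size_t pad_first)
{
	struct seg *prev = NULL;
	size_t iova_len = 0;
	size_t pad_len = pad_first;	/* may be nonzero before the first segment */
	int i;

	for (i = 0; i < n; i++) {
		size_t s_length = segs[i].length;

		/* the fix: guard prev before dereferencing it */
		if (prev && pad_len && pad_len < s_length - 1) {
			prev->length += pad_len;
			iova_len += pad_len;
		}
		iova_len += s_length;
		prev = &segs[i];
		pad_len = 1;	/* placeholder recomputation per segment */
	}
	return iova_len;
}
```

Without the `prev &&` guard, a nonzero pad_len before the first segment would dereference NULL.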
[PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices
From: Xingang Wang When booting with devicetree, the pci_request_acs() is called after the enumeration and initialization of PCI devices, thus the ACS is not enabled. And ACS should be enabled when IOMMU is detected for the PCI host bridge, so add check for IOMMU before probe of PCI host and call pci_request_acs() to make sure ACS will be enabled when enumerating PCI devices. Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage") Signed-off-by: Xingang Wang --- drivers/iommu/of_iommu.c | 1 - drivers/pci/of.c | 8 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index a9d2df001149..54a14da242cc 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, .np = master_np, }; - pci_request_acs(); err = pci_for_each_dma_alias(to_pci_dev(dev), of_pci_iommu_init, &info); } else { diff --git a/drivers/pci/of.c b/drivers/pci/of.c index da5b414d585a..2313c3f848b0 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev, int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge) { - if (!dev->of_node) + struct device_node *node = dev->of_node; + + if (!node) return 0; + /* Detect IOMMU and make sure ACS will be enabled */ + if (of_property_read_bool(node, "iommu-map")) + pci_request_acs(); + bridge->swizzle_irq = pci_common_swizzle; bridge->map_irq = of_irq_parse_and_map_pci; -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier
On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote: > From: Thierry Reding > > Reserved memory region phandle references can be accompanied by a > specifier that provides additional information about how that specific > reference should be treated. > > One use-case is to mark a memory region as needing an identity mapping > in the system's IOMMU for the device that references the region. This is > needed for example when the bootloader has set up hardware (such as a > display controller) to actively access a memory region (e.g. a boot > splash screen framebuffer) during boot. The operating system can use the > identity mapping flag from the specifier to make sure an IOMMU identity > mapping is set up for the framebuffer before IOMMU translations are > enabled for the display controller. > > Signed-off-by: Thierry Reding > --- > .../reserved-memory/reserved-memory.txt | 21 +++ > include/dt-bindings/reserved-memory.h | 8 +++ > 2 files changed, 29 insertions(+) > create mode 100644 include/dt-bindings/reserved-memory.h Sorry for being slow on this. I have 2 concerns. First, this creates an ABI issue. A DT with cells in 'memory-region' will not be understood by an existing OS. I'm less concerned about this if we address that with a stable fix. (Though I'm pretty sure we've naively added #?-cells in the past ignoring this issue.) Second, it could be the bootloader setting up the reserved region. If a node already has 'memory-region', then adding more regions is more complicated compared to adding new properties. And defining what each memory-region entry is or how many in schemas is impossible. Both could be addressed with a new property. Perhaps something like 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is appropriate given this is entirely because of the IOMMU being in the mix. I might feel differently if we had other uses for cells, but I don't really see it in this case. 
Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
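Rob's suggested alternative could look like this in a devicetree fragment (a hypothetical sketch: the node names and addresses are made up, and the `iommu-memory-region` property is only a proposal from this thread, not an accepted binding):

```dts
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		fb: framebuffer@80000000 {
			reg = <0x0 0x80000000 0x0 0x00800000>;
			no-map;
		};
	};

	display@52200000 {
		/* existing plain phandle references remain unchanged */
		memory-region = <&fb>;
		/* proposed: regions needing an IOMMU identity mapping */
		iommu-memory-region = <&fb>;
	};
};
```

This avoids the ABI problem of adding specifier cells to `memory-region`, since an older OS simply ignores the unknown property.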
Re: i915 and swiotlb_max_segment
On Mon, May 10, 2021 at 05:25:25PM +0200, Christoph Hellwig wrote: > Hi all, > > swiotlb_max_segment is a rather strange "API" exported by swiotlb.c, > and i915 is the only (remaining) user. > > swiotlb_max_segment returns 0 if swiotlb is not in use, 1 if > SWIOTLB_FORCE is set or swiotlb-zen is set, and the swiotlb segment > size when swiotlb is otherwise enabled. > > i915 then uses it to: > > a) decide on the max order in i915_gem_object_get_pages_internal > b) decide on a max segment size in i915_sg_segment_size > > for a) it really seems i915 should switch to dma_alloc_noncoherent > or dma_alloc_noncontiguous ASAP instead of using alloc_page and > streaming DMA mappings. Any chance I could trick one of the i915 > maintainers into doing just that given that the callchain is not > exactly trivial? > > For b) I'm not sure swiotlb and i915 really agree on the meaning > of the value. swiotlb_set_max_segment basically returns the entire > size of the swiotlb buffer, while i915 seems to use it to limit > the size of each scatterlist entry. It seems like dma_max_mapping_size > might be the best value to use here. Yes. The background behind that was SWIOTLB would fail because, well, the size of the sg was too large. And some way to limit it to a max size was needed - the dma_max_mapping_size "should" be just fine. > > Once that is fixed I'd like to kill off swiotlb_max_segment as it is > a horribly confusing API. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
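The proposed switch for (b) can be modeled outside the kernel roughly as follows (a sketch under stated assumptions: the function name, the 4 KiB page size, and the clamping details are this example's inventions, not the actual i915 change):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model: derive the scatterlist segment-size limit from the DMA
 * layer's maximum mapping size (what dma_max_mapping_size() would
 * report) instead of swiotlb_max_segment().  SIZE_MAX stands for
 * "no limit", as when no bounce buffering is in use.
 */
static uint32_t sg_segment_size(size_t dma_max)
{
	size_t size = dma_max;

	/* scatterlist entry lengths are 32-bit, so clamp... */
	if (size > UINT32_MAX)
		size = UINT32_MAX;
	/* ...and keep the limit page-aligned (assume 4 KiB pages) */
	return (uint32_t)(size & ~(size_t)4095);
}
```

The point of the model: the limit applies per scatterlist entry, which matches what i915 wants, whereas swiotlb_max_segment reports the whole bounce buffer size.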
Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook
On Thu, May 20, 2021 at 03:13:55PM +0100, Robin Murphy wrote: > By "mdev-like" I mean it's very similar in shape to the general SIOV-style > mediated device concept - i.e. a physical device with an awareness of > operating on multiple contexts at once, using a Substream ID/PASID for each > one - but instead of exposing control of the contexts to anyone else, they > remain hidden behind the kernel driver which already has its own abstracted > uAPI, so overall it ends up as more just internal housekeeping than any > actual mediation. We were looking at the mdev code for inspiration, but > directly using it was never the plan. Well: - Who maps memory into the IOASID (ie the specific sub stream id)? - What memory must be mapped? - Who triggers DMA to this memory? > The driver simply needs to keep track of the domains and PASIDs - > when a process submits some work, it can look up the relevant > domain, iommu_map() the user pages to the right addresses, dma_map() > them for coherency, then poke in the PASID as part of scheduling the > work on the physical device. If you are doing stuff like this then the /dev/ioasid is what you actually want. The userprocess can create its own IOASID, program the io page tables for that IOASID to point to pages as it wants and then just hand over a fully instantiated io page table to the device driver. What you are describing is the literal use case of /dev/ioasid - a clean seperation of managing the IOMMU related parts through /dev/ioasid and the device driver itself is only concerned with generating device DMA that has the proper PASID/substream tag. The entire point is to not duplicate all the iommu code you are describing having written into every driver that just wants an IOASID. 
In particular, you are talking about having a substream capable device and driver but your driver's uAPI is so limited it can't address the full range of substream configurations:
- A substream pointing at an SVA
- A substream pointing at an IO page table nested under another
- A substream pointing at an IOMMU page table shared by many users
And more. Which is bad. > > We already talked about this on the "how to use PASID from the kernel" > > thread. > > Do you have a pointer to the right thread so I can catch up? It's not the > easiest thing to search for on lore amongst all the other PASID-related > business :( Somewhere in here: http://lore.kernel.org/r/20210517143758.gp1002...@nvidia.com > FWIW my non-SVA view is that a PASID is merely an index into a set of > iommu_domains, and in that context it doesn't even really matter *who* > allocates them, only that the device driver and IOMMU driver are in sync :) Right, this is where /dev/ioasid is going. However it gets worked out at the kAPI level in the iommu layer, the things you asked for are intended to be solved, and lots more. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
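The "PASID is merely an index into a set of iommu_domains" view quoted above can be sketched as plain bookkeeping (a userspace model with made-up types, not kernel code):

```c
#include <stddef.h>

#define MAX_PASID 16

struct iommu_domain { int id; };	/* stand-in for the kernel struct */

/*
 * Hypothetical driver-side bookkeeping: one iommu_domain pointer per
 * client context, indexed by PASID.
 */
struct pasid_table {
	struct iommu_domain *dom[MAX_PASID];
};

/* Allocate a free PASID slot for a domain; -1 when the table is full. */
static int pasid_alloc(struct pasid_table *t, struct iommu_domain *d)
{
	int p;

	for (p = 0; p < MAX_PASID; p++) {
		if (!t->dom[p]) {
			t->dom[p] = d;
			return p;	/* value to program into the work descriptor */
		}
	}
	return -1;
}

static struct iommu_domain *pasid_lookup(struct pasid_table *t, int pasid)
{
	if (pasid < 0 || pasid >= MAX_PASID)
		return NULL;
	return t->dom[pasid];
}

static void pasid_free(struct pasid_table *t, int pasid)
{
	if (pasid >= 0 && pasid < MAX_PASID)
		t->dom[pasid] = NULL;
}
```

In this picture the only real requirement is the one Robin states: the device driver and the IOMMU driver agree on which index maps to which domain.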
Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook
On 2021-05-20 00:24, Jason Gunthorpe wrote: On Wed, May 19, 2021 at 11:12:46PM +, Tian, Kevin wrote: From: Jason Gunthorpe Sent: Thursday, May 20, 2021 2:07 AM On Wed, May 19, 2021 at 04:23:21PM +0100, Robin Murphy wrote: On 2021-05-17 16:35, Joerg Roedel wrote: On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote: Well, I'm sorry, but there is a huge other thread talking about the IOASID design in great detail and why this is all needed. Jumping into this thread without context and basically rejecting all the conclusions that were reached over the last several weeks is really not helpful - especially since your objection is not technical. I think you should wait for Intel to put together the /dev/ioasid uAPI proposal and the example use cases it should address then you can give feedback there, with proper context. Yes, I think the next step is that someone who read the whole thread writes up the conclusions and a rough /dev/ioasid API proposal, also mentioning the use-cases it addresses. Based on that we can discuss the implications this needs to have for IOMMU-API and code. From the use-cases I know the mdev concept is just fine. But if there is a more generic one we can talk about it. Just to add another voice here, I have some colleagues working on drivers where they want to use SMMU Substream IDs for a single hardware block to operate on multiple iommu_domains managed entirely within the kernel. If it is entirely within the kernel I'm confused how mdev gets involved? mdev is only for vfio which is userspace. By "mdev-like" I mean it's very similar in shape to the general SIOV-style mediated device concept - i.e. 
a physical device with an awareness of operating on multiple contexts at once, using a Substream ID/PASID for each one - but instead of exposing control of the contexts to anyone else, they remain hidden behind the kernel driver which already has its own abstracted uAPI, so overall it ends up as more just internal housekeeping than any actual mediation. We were looking at the mdev code for inspiration, but directly using it was never the plan. Just add some background. aux domain is used to support mdev but they are not tied together. [ yes, technically my comments are relevant to patch #4, but the discussion was here, so... :) ] Literally aux domain just implies that there could be multiple domains attached to a device then when one of them becomes the primary all the remaining are deemed as auxiliary. From this angle it doesn't matter whether the requirement of multiple domains come from user or kernel. You can't entirely use aux domain from inside the kernel because you can't compose it with the DMA API unless you also attach it to some struct device, and where will the struct device come from? DMA mapping would still be done using the physical device - where this model diverges from mdev is that it doesn't need to fake up a struct device to represent each context since they aren't exposed to anyone else. Assume the driver already has some kind of token to represent each client process, so it just allocates an iommu_domain for a client context and does an iommu_aux_attach_dev() to hook it up to some PASID (which again nobody else ever sees). The driver simply needs to keep track of the domains and PASIDs - when a process submits some work, it can look up the relevant domain, iommu_map() the user pages to the right addresses, dma_map() them for coherency, then poke in the PASID as part of scheduling the work on the physical device. We already talked about this on the "how to use PASID from the kernel" thread. 
Do you have a pointer to the right thread so I can catch up? It's not the easiest thing to search for on lore amongst all the other PASID-related business :( If Robin just wants to use a stream ID from a kernel driver then that API to make a PASID == RID seems like a better answer for kernel DMA than aux domains is. No, that's not the model - the device has a single Stream ID (RID), and it wants multiple Substream IDs (PASIDs) hanging off that for distinct client contexts; it can still generate non-PASID traffic for stuff like loading its firmware (the regular iommu_domain might be explicitly-managed or might be automatic via iommu-dma - it doesn’t really matter in this context). Aux domains really were a perfect fit conceptually, even if the edges were a bit rough. Now, much as I’d like a stable upstream solution, I can't argue based on this particular driver, since the PASID functionality is still in development, and there seems little likelihood of it being upstreamed either way (the driver belongs to a product team rather than the OSS group I'm part of; I'm just helping them with the SMMU angle). If designing something around aux domains is a dead-end then we (Arm) will probably just prototype our thing using downstream patches to the SMMU driver for now. However given the clear overlap wit
Re: [PATCH v3] iommu/of: Fix pci_request_acs() before enumerating PCI devices
On Thu, May 20, 2021 at 2:28 AM Wang Xingang wrote: > > From: Xingang Wang > > When booting with devicetree, the pci_request_acs() is called after the > enumeration and initialization of PCI devices, thus the ACS is not > enabled. And ACS should be enabled when IOMMU is detected for the > PCI host bridge, so add check for IOMMU before probe of PCI host and call > pci_request_acs() to make sure ACS will be enabled when enumerating PCI > devices. > > Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when > configuring IOMMU linkage") > Signed-off-by: Xingang Wang > --- > drivers/iommu/of_iommu.c | 1 - > drivers/pci/controller/pci-host-common.c | 17 + > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c > index a9d2df001149..54a14da242cc 100644 > --- a/drivers/iommu/of_iommu.c > +++ b/drivers/iommu/of_iommu.c > @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device > *dev, > .np = master_np, > }; > > - pci_request_acs(); > err = pci_for_each_dma_alias(to_pci_dev(dev), > of_pci_iommu_init, &info); > } else { > diff --git a/drivers/pci/controller/pci-host-common.c > b/drivers/pci/controller/pci-host-common.c > index d3924a44db02..5904ad0bd9ae 100644 > --- a/drivers/pci/controller/pci-host-common.c > +++ b/drivers/pci/controller/pci-host-common.c This file is generally only for ECAM compliant hosts. Are those the only hosts we need to support this? From the looks of dts files with iommu-map, that would be dropping support in lots of cases. Perhaps in devm_of_pci_bridge_init() or one of the functions it calls is the better place. 
> @@ -49,6 +49,21 @@ static struct pci_config_window *gen_pci_init(struct > device *dev, > return cfg; > } > > +static void pci_host_enable_acs(struct pci_host_bridge *bridge) > +{ > + struct device_node *np = bridge->dev.parent->of_node; > + static bool acs_enabled; > + > + if (!np || acs_enabled) > + return; > + > + /* Detect IOMMU and make sure ACS will be enabled */ > + if (of_property_read_bool(np, "iommu-map")) { > + acs_enabled = true; > + pci_request_acs(); Given this function just sets a variable, I don't think you need the static acs_enabled here. > + } > +} > + > int pci_host_common_probe(struct platform_device *pdev) > { > struct device *dev = &pdev->dev; > @@ -81,6 +96,8 @@ int pci_host_common_probe(struct platform_device *pdev) > bridge->ops = (struct pci_ops *)&ops->pci_ops; > bridge->msi_domain = true; > > + pci_host_enable_acs(bridge); > + > return pci_host_probe(bridge); > } > EXPORT_SYMBOL_GPL(pci_host_common_probe); > -- > 2.19.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/3] lib/vsprintf: Use pr_crit() instead of long fancy messages
On Mon 2021-05-17 08:21:12, Geert Uytterhoeven wrote: > On Wed, Mar 31, 2021 at 11:59 AM Geert Uytterhoeven > wrote: > > While long fancy messages have a higher probability of being seen than > > small messages, they may scroll off the screen fast, if visible at all, > > and may still be missed. In addition, they increase boot time and > > kernel size. > > > > The correct mechanism to increase importance of a kernel message is not > > to draw fancy boxes with more text, but to shout louder, i.e. increase > > the message's reporting level. Making sure the administrator of the > > system is aware of such a message is a system policy, and is the > > responsibility of a user-space log daemon. > > > > Fix this by increasing the reporting level from KERN_WARNING to > > KERN_CRIT, and removing irrelevant text and graphics. > > > > This reduces kernel size by ca. 0.5 KiB. > > > > Fixes: 5ead723a20e0447b ("lib/vsprintf: no_hash_pointers prints all > > addresses as unhashed") > > Signed-off-by: Geert Uytterhoeven > > No comments? > Unlike the cases handled by the other two patches in this series, > this one cannot be configured out. IMHO, the best solution would be to create a generic API for eye-catching messages. I am sure that WARN() is misused in many locations all over the kernel because people just wanted an eye-catching message, for example, see https://lore.kernel.org/r/2149df3f542d25ce15d049e81d6188bb7198478c.ca...@fi.rohmeurope.com It might be a win-win solution. Best Regards, Petr ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace
On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin wrote: > > On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote: > > This series introduces a framework, which can be used to implement > > vDPA Devices in a userspace program. The work consists of two parts: > > control path forwarding and data path offloading. > > > > In the control path, the VDUSE driver will make use of a message > > mechanism to forward the config operation from the vdpa bus driver > > to userspace. Userspace can use read()/write() to receive/reply to > > those control messages. > > > > In the data path, the core is mapping the dma buffer into the VDUSE > > daemon's address space, which can be implemented in different ways > > depending on the vdpa bus to which the vDPA device is attached. > > > > In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with > > a bounce-buffering mechanism to achieve that. And in the vhost-vdpa case, the dma > > buffer resides in a userspace memory region which can be shared with the > > VDUSE userspace process via transferring the shmfd. 
> > The details and our use case are shown below:
> >
> > [ASCII architecture diagram, garbled in this archive. It shows two
> > front ends - a container using /dev/vdx via the virtio-blk driver and
> > the virtio-vdpa driver, and a QEMU VM using /dev/vhost-vdpa-x via the
> > vhost-vdpa driver - both sitting on the vdpa bus above the vduse
> > driver, which forwards to the userspace VDUSE daemon performing vDPA
> > block device emulation and speaking TCP/IP to the remote storages.]
> >
> > We make use of it to implement a block device connecting to
> > our distributed storage, which can be used both in containers and
> > VMs. Thus, we can have a unified technology stack in these two cases.
> >
> > To
Re: [PATCH v2 00/15] dma mapping/iommu: Allow IOMMU IOVA rcache range to be configured
On 10/05/2021 15:17, John Garry wrote: Hi Robin, guys, A friendly reminder on this one... Thanks For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails"), as discussed at [0]. IOVAs which cannot be cached are highly involved in the IOVA aging issue, as discussed at [1]. This series allows the IOVA rcache range be configured, so that we may cache all IOVAs per domain, thus improving performance. A new IOMMU group sysfs file is added - max_opt_dma_size - which is used indirectly to configure the IOVA rcache range: /sys/kernel/iommu_groups/X/max_opt_dma_size This file is updated same as how the IOMMU group default domain type is updated, i.e. must unbind the only device in the group first. However, the IOMMU default domain is reallocated in the device driver reprobe, and not immediately. In addition, we keep (from v1 series) the DMA mapping API to allow DMA max optimised size be set from a LLDD. How it works is a lot different. When the LLDD calls this during probe, once the value is successfully recorded, we return -EDEFER_PROBE. In the reprobe, the IOMM group default domain is reallocated, and the new IOVA domain rcache upper limit is set according to that DMA max optimised size. As such, we don't operate on a live IOMMU domain. Note that the DMA mapping API frontend is not strictly required, but saves the LLDD calling IOMMU APIs directly, that being not preferred. Some figures for storage scenario: v5.13-rc1 baseline: 1200K IOPS With series:1800K IOPS All above are for IOMMU strict mode. Non-strict mode gives ~1800K IOPS in all scenarios. 
Patch breakdown:
1-11:  Add support for setting DMA max optimised size via sysfs
12-15: Add support for setting DMA max optimised size from LLDD

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.ga...@huawei.com/

Differences to v1:
- Many
- Change method to not operate on a 'live' IOMMU domain:
  - rather, force device driver to be re-probed once dma_max_opt_size is set, and reconfig a new IOMMU group then
- Add iommu sysfs max_dma_opt_size file, and allow updating same as how group type is changed

John Garry (15):
  iommu: Refactor iommu_group_store_type()
  iova: Allow rcache range upper limit to be flexible
  iommu: Allow max opt DMA len be set for a group via sysfs
  iommu: Add iommu_group_get_max_opt_dma_size()
  iova: Add iova_domain_len_is_cached()
  iommu: Allow iommu_change_dev_def_domain() realloc default domain for same type
  iommu: Add iommu_realloc_dev_group()
  dma-iommu: Add iommu_reconfig_dev_group_dma()
  iova: Add init_iova_domain_ext()
  dma-iommu: Use init_iova_domain_ext() for IOVA domain init
  dma-iommu: Reconfig group domain
  iommu: Add iommu_set_dev_dma_opt_size()
  dma-mapping: Add dma_set_max_opt_size()
  dma-iommu: Add iommu_dma_set_opt_size()
  scsi: hisi_sas: Set max optimal DMA size for v3 hw

 drivers/iommu/dma-iommu.c              |  51 +-
 drivers/iommu/iommu.c                  | 231 +++--
 drivers/iommu/iova.c                   |  61 +--
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   5 +
 include/linux/dma-iommu.h              |   4 +
 include/linux/dma-map-ops.h            |   1 +
 include/linux/dma-mapping.h            |   8 +
 include/linux/iommu.h                  |  19 ++
 include/linux/iova.h                   |  21 ++-
 kernel/dma/mapping.c                   |  11 ++
 10 files changed, 344 insertions(+), 68 deletions(-)

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
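The rcache upper-limit idea from this cover letter can be modeled in isolation (a sketch: the constant 6 mirrors the upstream IOVA_RANGE_CACHE_MAX_SIZE default, and the mutable global stands in for the proposed per-domain, sysfs-configurable setting):

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assume 4 KiB pages */

/*
 * Upstream iova.c caches allocations whose size order is below this
 * limit; 6 gives a ceiling of 32 pages (128 KiB with 4 KiB pages).
 * The series would make this adjustable per IOMMU domain.
 */
static unsigned int rcache_max_order = 6;

/* Smallest order such that 2^order pages cover len bytes. */
static unsigned int order_of_pages(size_t len)
{
	size_t pages = (len + (1u << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Model of the cacheability test: an IOVA allocation hits the rcache
 * only if its order is below the (now adjustable) upper limit.
 */
static bool iova_len_is_cached(size_t len)
{
	return order_of_pages(len) < rcache_max_order;
}
```

Mappings above the limit always take the slow rbtree path, which is why raising the limit for drivers with large regular mappings recovers the IOPS quoted above.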
[PATCH v3] iommu/of: Fix pci_request_acs() before enumerating PCI devices
From: Xingang Wang When booting with devicetree, the pci_request_acs() is called after the enumeration and initialization of PCI devices, thus the ACS is not enabled. And ACS should be enabled when IOMMU is detected for the PCI host bridge, so add check for IOMMU before probe of PCI host and call pci_request_acs() to make sure ACS will be enabled when enumerating PCI devices. Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage") Signed-off-by: Xingang Wang --- drivers/iommu/of_iommu.c | 1 - drivers/pci/controller/pci-host-common.c | 17 + 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index a9d2df001149..54a14da242cc 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, .np = master_np, }; - pci_request_acs(); err = pci_for_each_dma_alias(to_pci_dev(dev), of_pci_iommu_init, &info); } else { diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c index d3924a44db02..5904ad0bd9ae 100644 --- a/drivers/pci/controller/pci-host-common.c +++ b/drivers/pci/controller/pci-host-common.c @@ -49,6 +49,21 @@ static struct pci_config_window *gen_pci_init(struct device *dev, return cfg; } +static void pci_host_enable_acs(struct pci_host_bridge *bridge) +{ + struct device_node *np = bridge->dev.parent->of_node; + static bool acs_enabled; + + if (!np || acs_enabled) + return; + + /* Detect IOMMU and make sure ACS will be enabled */ + if (of_property_read_bool(np, "iommu-map")) { + acs_enabled = true; + pci_request_acs(); + } +} + int pci_host_common_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -81,6 +96,8 @@ int pci_host_common_probe(struct platform_device *pdev) bridge->ops = (struct pci_ops *)&ops->pci_ops; bridge->msi_domain = true; + pci_host_enable_acs(bridge); + return pci_host_probe(bridge); } 
EXPORT_SYMBOL_GPL(pci_host_common_probe); -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -next 1/3] iommu/arm-smmu-v3: fix missing a blank line after declarations
Fixes checkpatch warnings in arm-smmu-v3.c: WARNING: Missing a blank line after declarations Signed-off-by: Bixuan Cui --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 54b2f27b81d4..4f184119c26d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -151,6 +151,7 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q) static void queue_inc_cons(struct arm_smmu_ll_queue *q) { u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1; + q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons); } @@ -176,6 +177,7 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q) static u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n) { u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n; + return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod); } @@ -1895,6 +1897,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) mutex_unlock(&arm_smmu_asid_lock); } else { struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; + if (cfg->vmid) arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid); } @@ -2724,6 +2727,7 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu, static void arm_smmu_cmdq_free_bitmap(void *data) { unsigned long *bitmap = data; + bitmap_free(bitmap); } @@ -2939,6 +2943,7 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr) static void arm_smmu_free_msis(void *data) { struct device *dev = data; + platform_msi_domain_free_irqs(dev); } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -next 0/3] iommu/arm-smmu-v3: clean up some code style issues
Clean up some code style issues. Bixuan Cui (3): iommu/arm-smmu-v3: fix missing a blank line after declarations iommu/arm-smmu-v3: Change *array into *const array iommu/arm-smmu-v3: Prefer unsigned int to bare use of unsigned drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -next 3/3] iommu/arm-smmu-v3: Prefer unsigned int to bare use of unsigned
Fix checkpatch warning in arm-smmu-v3.c: Prefer 'unsigned int' to bare use of 'unsigned' Signed-off-by: Bixuan Cui --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 51ce44fe550c..725b099d0652 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1827,7 +1827,7 @@ static bool arm_smmu_capable(enum iommu_cap cap) } } -static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) +static struct iommu_domain *arm_smmu_domain_alloc(unsigned int type) { struct arm_smmu_domain *smmu_domain; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -next 2/3] iommu/arm-smmu-v3: Change *array into *const array
Fix checkpatch warning in arm-smmu-v3.c: static const char * array should probably be static const char * const Signed-off-by: Bixuan Cui --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4f184119c26d..51ce44fe550c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -354,7 +354,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu, static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) { - static const char *cerror_str[] = { + static const char * const cerror_str[] = { [CMDQ_ERR_CERROR_NONE_IDX] = "No error", [CMDQ_ERR_CERROR_ILL_IDX] = "Illegal command", [CMDQ_ERR_CERROR_ABT_IDX] = "Abort on command fetch", -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space
On Thu, May 20, 2021 at 1:43 PM Michael S. Tsirkin wrote:
>
> On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> > On Wed, May 19, 2021 at 10:42 PM Dan Carpenter wrote:
> > >
> > > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji wrote:
> > > > >
> > > > > This ensures that we will not use an invalid block size
> > > > > in config space (might come from an untrusted device).
> > >
> > > I looked at if I should add this as an untrusted function so that Smatch
> > > could find these sorts of bugs but this is reading data from the host so
> > > there has to be some level of trust...
> > >
> >
> > It would be great if Smatch could detect this case if possible. The
> > data might be trusted in traditional VM cases. But now the data can be
> > read from a userspace daemon when VDUSE is enabled.
> >
> > > I should add some more untrusted data kvm functions to Smatch. Right
> > > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > > just now.
> > >
> > > > >
> > > > > Signed-off-by: Xie Yongji
> > > > > ---
> > > > >  drivers/block/virtio_blk.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > > --- a/drivers/block/virtio_blk.c
> > > > > +++ b/drivers/block/virtio_blk.c
> > > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > >  	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > > >  				   struct virtio_blk_config, blk_size,
> > > > >  				   &blk_size);
> > > > > -	if (!err)
> > > > > +	if (!err && blk_size > 0 && blk_size <= max_size)
> > > >
> > > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > > boundary in the new version.
> > >
> > > What does this bug look like to the user?
> >
> > The kernel will panic if the block size is larger than PAGE_SIZE.
>
> Kernel panic at this point is par for the course IMHO.

But it seems better if we can avoid this kind of panic. Because this
might also be triggered by a buggy VDUSE daemon.

> Let's focus on eliminating data corruption for starters.

OK, now the incorrect used length might cause data corruption in
virtio-net and virtio-console drivers as I mentioned in another mail.
I will send a fix ASAP.

Thanks,
Yongji
Re: Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
On Thu, May 20, 2021 at 2:28 PM Al Viro wrote:
>
> On Mon, May 17, 2021 at 05:55:12PM +0800, Xie Yongji wrote:
>
> > +	case VDUSE_IOTLB_GET_FD: {
> > +		struct vduse_iotlb_entry entry;
> > +		struct vhost_iotlb_map *map;
> > +		struct vdpa_map_file *map_file;
> > +		struct vduse_iova_domain *domain = dev->domain;
> > +		struct file *f = NULL;
> > +
> > +		ret = -EFAULT;
> > +		if (copy_from_user(&entry, argp, sizeof(entry)))
> > +			break;
>
> 	return -EFAULT;
> surely?
>
> > +
> > +		ret = -EINVAL;
> > +		if (entry.start > entry.last)
> > +			break;
>
> ... and similar here, etc.

OK.

> > +		spin_lock(&domain->iotlb_lock);
> > +		map = vhost_iotlb_itree_first(domain->iotlb,
> > +					      entry.start, entry.last);
> > +		if (map) {
> > +			map_file = (struct vdpa_map_file *)map->opaque;
> > +			f = get_file(map_file->file);
> > +			entry.offset = map_file->offset;
> > +			entry.start = map->start;
> > +			entry.last = map->last;
> > +			entry.perm = map->perm;
> > +		}
> > +		spin_unlock(&domain->iotlb_lock);
> > +		ret = -EINVAL;
> > +		if (!f)
> > +			break;
> > +
> > +		ret = -EFAULT;
> > +		if (copy_to_user(argp, &entry, sizeof(entry))) {
> > +			fput(f);
> > +			break;
> > +		}
> > +		ret = receive_fd(f, perm_to_file_flags(entry.perm));
> > +		fput(f);
> > +		break;
>
> IDGI. The main difference between receive_fd() and plain old
> get_unused_fd_flags() + fd_install() is __receive_sock() call.
> Which does nothing whatsoever in case of non-sockets. Can you
> get a socket here?

Actually what I want here is the security_file_receive() hook in
receive_fd().

Thanks,
Yongji