Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function
On 28/12/17 19:25, Jacob Pan wrote: [...] >>> + * @size: 2^size of 4K pages, 0 for 4k, 9 for 2MB, >>> etc. >> >> Having only power of two invalidation seems too restrictive for a >> software interface. You might have the same problem as above, where >> the guest or userspace needs to send lots of invalidation requests, >> They could be multiplexed by passing an arbitrary range instead. How >> about making @size a __u64? >> > Sure if you have such need for non power of two. So it will be __u64 of > 4k pages? 4k granule would work for us right now, but other architectures may plan to support arbitrary sizes. The map/unmap API does support arbitrary sizes, so it might be better to have a byte granularity in @size. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function
On Fri, 24 Nov 2017 12:04:31 + Jean-Philippe Bruckerwrote: > Hi, > > On 17/11/17 18:55, Jacob Pan wrote: > > From: "Liu, Yi L" > > > > When an SVM capable device is assigned to a guest, the first level > > page tables are owned by the guest and the guest PASID table > > pointer is linked to the device context entry of the physical IOMMU. > > > > Host IOMMU driver has no knowledge of caching structure updates > > unless the guest invalidation activities are passed down to the > > host. The primary usage is derived from emulated IOMMU in the > > guest, where QEMU can trap invalidation activities before passing > > them down to the host/physical IOMMU. > > Since the invalidation data are obtained from user space and will be > > written into physical IOMMU, we must allow security check at various > > layers. Therefore, generic invalidation data format are proposed > > here, model specific IOMMU drivers need to convert them into their > > own format. > > > > Signed-off-by: Liu, Yi L > > Signed-off-by: Jacob Pan > > Signed-off-by: Ashok Raj > [...] > > #endif /* __LINUX_IOMMU_H */ > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > > index 651ad5d..039ba36 100644 > > --- a/include/uapi/linux/iommu.h > > +++ b/include/uapi/linux/iommu.h > > @@ -36,4 +36,66 @@ struct pasid_table_config { > > }; > > }; > > > > +enum iommu_inv_granularity { > > + IOMMU_INV_GRANU_GLOBAL, /* all TLBs > > invalidated */ > > + IOMMU_INV_GRANU_DOMAIN, /* all TLBs > > associated with a domain */ > > + IOMMU_INV_GRANU_DEVICE, /* caching > > structure associated with a > > +* device ID > > +*/ > > I thought you were planning on removing these? If we do need global > invalidation, for example the guest clears the whole PASID table and > doesn't want to send individual GRANU_ALL_PASID invalidations, maybe > keep only GRANU_DOMAIN? > yes, we can remove global and keep domain & pasid. > > + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with > > a domain */ > > + IOMMU_INV_GRANU_ALL_PASID, /* cache of a given > > PASID */ > > + IOMMU_INV_GRANU_PASID_SEL, /* only invalidate > > specified PASID */ > > GRANU_PASID_SEL seems redundant, don't you already get it by default > with GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with > IOMMU_INVALIDATE_PASID_TAGGED flag)? > yes, you can deduce from certain combinations of flags. My thinking was for an easy look up from generic flags to model specific fields. Same as the one below. I will try to consolidate based on your input in the next version. > > + > > + IOMMU_INV_GRANU_NG_ALL_PASID, /* non-global within > > all PASIDs */ > > + IOMMU_INV_GRANU_NG_PASID, /* non-global within a > > PASIDs */ > > Don't you get the "NG" behavior by not passing the > IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below? > > > + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective > > within a PASID */ > > And don't you get this with > GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED? > > > + IOMMU_INV_NR_GRANU, > > +}; > > + > > +enum iommu_inv_type { > > + IOMMU_INV_TYPE_DTLB,/* device IOTLB */ > > + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache > > */ > > + IOMMU_INV_TYPE_PASID, /* PASID cache */ > > + IOMMU_INV_TYPE_CONTEXT, /* device context entry > > cache */ > > + IOMMU_INV_NR_TYPE > > +}; > > When the guest removes a PASID entry, it would have to send DTLB, TLB > and PASID invalidations separately? Could we define this inv_type as > cumulative, to avoid redundant invalidation requests: > That is a good idea, but it will require some change to VT-d driver. For emulated IOMMU and current VT-d driver, we do send separate requests for PASID cache, followed by IOTLB/DTLB invalidation. But we do have a caching mode capability bit to tell the driver whether it is running on a real IOMMU or not. So we can combine and reduce invalidation overhead as you said below. Not sure about AMD though? > * TYPE_DTLB only invalidates ATC entries. > * TYPE_TLB invalidates both ATC and IOTLB entries. > * TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and > also the PASID cache entry. Sounds good to me. > * TYPE_CONTEXT invalidates all. Although is it needed by userspace or > just here fore completeness? "CONTEXT" is specific to VT-d (doesn't > exist on AMD and has a different meaning on SMMU), how about "DEVICE" > instead? It is here for completeness. context entry is set during bind/unbind pasid table call. I can remove it for now. > > This is important because invalidation will probably become the > bottleneck. The guest shouldn't have to send DTLB and TLB invalidation > separately after each unmapping. > Agreed, i will change the VT-d driver to accommodate that. i.e. For
Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function
A quick update on invalidations before I leave for holidays, since we're struggling to define useful semantics. I worked on the virtio-iommu prototype for vSVA, so I tried to pin down what I think is needed for vSVA invalidation in the host. I don't know whether the VT-d and AMD emulations can translate all of this from guest commands. Scope selects which entries are invalidated, and flags cherry-pick what caches to invalidate. For example a guest might remove GBs of sparse mappings, and decide that it would be quicker to invalidate the whole context instead of one at a time. Then it would set only flags = (TLB | DEV_TLB) with scope = PASID. If the guest clears one entry in the PASID table, then it would send scope = PASID and flags = (LEAF | CONFIG | TLB | DEV_TLB). On an ARM system the guest can invalidate TLBs with CPU instructions, but can't invalidate ATCs. So it would send an invalidate with flags = (LEAF | TLB) and scope = VA. enum iommu_sva_inval_scope { IOMMU_INVALIDATE_DOMAIN = 1, IOMMU_INVALIDATE_PASID, IOMMU_INVALIDATE_VA, }; /* Only invalidate leaf entry. Applies to PASID table if scope == PASID or * page tables if scope == VA. */ #define IOMMU_INVALIDATE_LEAF (1 << 0) /* Invalidate cached PASID table configuration */ #define IOMMU_INVALIDATE_CONFIG (1 << 1) /* Invalidate IOTLBs */ #define IOMMU_INVALIDATE_TLB(1 << 2) /* Invalidate ATCs */ #define IOMMU_INVALIDATE_DEV_TLB(1 << 3) /* + Need a global flag? */ struct iommu_sva_invalidate { enum iommu_sva_inval_scope scope; u32 flags; u32 pasid; u64 iova; u64 size; /* Arch-specific, format is determined at bind time */ union { struct { u16 asid; u8 granule; } arm; } }; ARM needs two more fields. A 16-bit @asid (Address Space ID) targets TLB entries and may be different from the PASID (up to the guest to decide), which targets ATC and config entries. @granule is the TLB granule that we're invalidating. For instance if the guest just unmapped a few 2M huge pages, it sets @granule to 21 bits, so we issue less invalidation commands, since we only need to evict huge TLB entries. I'm not sure about other architecture but I'd be surprised if this wasn't more common. Should we move it to the common part? int iommu_sva_invalidate(struct iommu_domain *domain, struct iommu_sva_invalidate *inval); And so the host driver implementation is roughly: -- bool leaf = flags & IOMMU_INVALIDATE_LEAF; bool config = flags & IOMMU_INVALIDATE_CONFIG; bool tlb= flags & IOMMU_INVALIDATE_TLB; bool atc= flags & IOMMU_INVALIDATE_DEV_TLB; if (config) { switch (scope) { case IOMMU_INVALIDATE_PASID: inval_cached_pasid_entry(domain, pasid, leaf); break; case IOMMU_INVALIDATE_DOMAIN: inval_all_cached_pasid_entries(domain); break; default: return -EINVAL; } /* Wait for caches to be clean, then invalidate TLBs */ sync_commands(); } if (tlb) { switch (scope) { case IOMMU_INVALIDATE_VA: inval_tlb_entries(domain, asid, iova, size, granule, leaf); break; case IOMMU_INVALIDATE_PASID: inval_all_tlb_entries_for_asid(domain, asid); break; case IOMMU_INVALIDATE_DOMAIN: inval_all_tlb_entries(domain); break; default: return -EINVAL; } /* Wait for TLBs to be clean, then invalidate ATCs. */ sync_commands(); } if (atc) { /* ATC invalidations are sent to all devices in the domain */ switch (scope) { case IOMMU_INVALIDATE_VA: inval_atc_entries(domain, pasid, iova, size); break; case IOMMU_INVALIDATE_PASID: /* Covers the full address space */ inval_all_atc_entries_for_pasid(domain, pasid); break; case IOMMU_INVALIDATE_DOMAIN: /* Set Global Invalidate */ inval_all_atc_entries(domain); break; default: return -EINVAL; } sync_commands(); } /* Then return to guest. */ -- I think this covers what we need and allows userspace or the guest to gather multiple invalidations into a single request/ioctl. I don't think per-device ATC invalidation is needed, but might be wrong. According to ATS it is implicit when the
Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function
Hi, On 17/11/17 18:55, Jacob Pan wrote: > From: "Liu, Yi L"> > When an SVM capable device is assigned to a guest, the first level page > tables are owned by the guest and the guest PASID table pointer is > linked to the device context entry of the physical IOMMU. > > Host IOMMU driver has no knowledge of caching structure updates unless > the guest invalidation activities are passed down to the host. The > primary usage is derived from emulated IOMMU in the guest, where QEMU > can trap invalidation activities before passing them down to the > host/physical IOMMU. > Since the invalidation data are obtained from user space and will be > written into physical IOMMU, we must allow security check at various > layers. Therefore, generic invalidation data format are proposed here, > model specific IOMMU drivers need to convert them into their own format. > > Signed-off-by: Liu, Yi L > Signed-off-by: Jacob Pan > Signed-off-by: Ashok Raj [...] > #endif /* __LINUX_IOMMU_H */ > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index 651ad5d..039ba36 100644 > --- a/include/uapi/linux/iommu.h > +++ b/include/uapi/linux/iommu.h > @@ -36,4 +36,66 @@ struct pasid_table_config { > }; > }; > > +enum iommu_inv_granularity { > + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */ > + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */ > + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a > + * device ID > + */ I thought you were planning on removing these? If we do need global invalidation, for example the guest clears the whole PASID table and doesn't want to send individual GRANU_ALL_PASID invalidations, maybe keep only GRANU_DOMAIN? > + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */ > + IOMMU_INV_GRANU_ALL_PASID, /* cache of a given PASID */ > + IOMMU_INV_GRANU_PASID_SEL, /* only invalidate specified PASID */ GRANU_PASID_SEL seems redundant, don't you already get it by default with GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with IOMMU_INVALIDATE_PASID_TAGGED flag)? > + > + IOMMU_INV_GRANU_NG_ALL_PASID, /* non-global within all PASIDs */ > + IOMMU_INV_GRANU_NG_PASID, /* non-global within a PASIDs */ Don't you get the "NG" behavior by not passing the IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below? > + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */ And don't you get this with GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED? > + IOMMU_INV_NR_GRANU, > +}; > + > +enum iommu_inv_type { > + IOMMU_INV_TYPE_DTLB,/* device IOTLB */ > + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache */ > + IOMMU_INV_TYPE_PASID, /* PASID cache */ > + IOMMU_INV_TYPE_CONTEXT, /* device context entry cache */ > + IOMMU_INV_NR_TYPE > +}; When the guest removes a PASID entry, it would have to send DTLB, TLB and PASID invalidations separately? Could we define this inv_type as cumulative, to avoid redundant invalidation requests: * TYPE_DTLB only invalidates ATC entries. * TYPE_TLB invalidates both ATC and IOTLB entries. * TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and also the PASID cache entry. * TYPE_CONTEXT invalidates all. Although is it needed by userspace or just here fore completeness? "CONTEXT" is specific to VT-d (doesn't exist on AMD and has a different meaning on SMMU), how about "DEVICE" instead? This is important because invalidation will probably become the bottleneck. The guest shouldn't have to send DTLB and TLB invalidation separately after each unmapping. > +/** > + * Translation cache invalidation header that contains mandatory meta data. > + * @version: info format version, expecting future extesions > + * @type:type of translation cache to be invalidated > + */ > +struct tlb_invalidate_hdr { > + __u32 version; > +#define TLB_INV_HDR_VERSION_1 1 > + enum iommu_inv_type type; > +}; > + > +/** > + * Translation cache invalidation information, contains generic IOMMU > + * data which can be parsed based on model ID by model specific drivers. > + * > + * @granularity: requested invalidation granularity, type dependent > + * @size:2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. Having only power of two invalidation seems too restrictive for a software interface. You might have the same problem as above, where the guest or userspace needs to send lots of invalidation requests, They could be multiplexed by passing an arbitrary range instead. How about making @size a __u64? > + * @pasid: processor address space ID value per PCI spec. > + * @addr:page address to be invalidated > + * @flagsIOMMU_INVALIDATE_PASID_TAGGED: DMA with PASID tagged, > + *
[PATCH v3 03/16] iommu: introduce iommu invalidate API function
From: "Liu, Yi L"When an SVM capable device is assigned to a guest, the first level page tables are owned by the guest and the guest PASID table pointer is linked to the device context entry of the physical IOMMU. Host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are passed down to the host. The primary usage is derived from emulated IOMMU in the guest, where QEMU can trap invalidation activities before passing them down to the host/physical IOMMU. Since the invalidation data are obtained from user space and will be written into physical IOMMU, we must allow security check at various layers. Therefore, generic invalidation data format are proposed here, model specific IOMMU drivers need to convert them into their own format. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj --- drivers/iommu/iommu.c | 14 +++ include/linux/iommu.h | 12 + include/uapi/linux/iommu.h | 62 ++ 3 files changed, 88 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c7e0d64..829e9e9 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1341,6 +1341,20 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); +int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + int ret = 0; + + if (unlikely(!domain->ops->sva_invalidate)) + return -ENODEV; + + ret = domain->ops->sva_invalidate(domain, dev, inv_info); + + return ret; +} +EXPORT_SYMBOL_GPL(iommu_sva_invalidate); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0f6f6c5..da684a7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -190,6 +190,7 @@ struct iommu_resv_region { * @pgsize_bitmap: bitmap of all possible supported page sizes * @bind_pasid_table: bind pasid table pointer for guest SVM * @unbind_pasid_table: unbind pasid table pointer and restore defaults + * @sva_invalidate: invalidate translation caches of shared virtual address */ struct iommu_ops { bool (*capable)(enum iommu_cap); @@ -243,6 +244,8 @@ struct iommu_ops { struct pasid_table_config *pasidt_binfo); void (*unbind_pasid_table)(struct iommu_domain *domain, struct device *dev); + int (*sva_invalidate)(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); unsigned long pgsize_bitmap; }; @@ -309,6 +312,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, struct pasid_table_config *pasidt_binfo); extern void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev); +extern int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); + extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); @@ -720,6 +726,12 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) { } +static inline int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + return -EINVAL; +} + #endif /* CONFIG_IOMMU_API */ #endif /* __LINUX_IOMMU_H */ diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 651ad5d..039ba36 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -36,4 +36,66 @@ struct pasid_table_config { }; }; +enum iommu_inv_granularity { + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */ + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */ + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a +* device ID +*/ + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */ + IOMMU_INV_GRANU_ALL_PASID, /* cache of a given PASID */ + IOMMU_INV_GRANU_PASID_SEL, /* only invalidate specified PASID */ + + IOMMU_INV_GRANU_NG_ALL_PASID, /* non-global within all PASIDs */ + IOMMU_INV_GRANU_NG_PASID, /* non-global within a PASIDs */ + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */ + IOMMU_INV_NR_GRANU, +}; + +enum iommu_inv_type { +