Re: No check of the size passed to unmap_single in swiotlb
..snip.. > > There doesn't seem to be much good reason for SWIOTLB to be more special > > than other DMA API backends, and not all of them have enough internal state > > to > > be able to make such a check. It's also not necessarily possible to "prevent > > damage" anyway - if a driver does pass a bogus size for > > dma_unmap_single(..., > > DMA_FROM_DEVICE), SWIOTLB might be able to keep itself internally > > consistent, > > but it still can't prevent the arch code in the middle from invalidating > > the wrong > > cache lines and potentially corrupting adjacent memory. > > > > In short, trying to work around broken drivers is a much worse idea than > > just > > fixing those drivers, and that's what we already have dma-debug for. > > > > Robin. > > Hi Robin, > > I agree that hacking kernel to fix broken drivers is not acceptable, actually > we spent days to fight driver supplier with this, they do not want to change > their code and want fix it directly in kernel. You could upstream the driver? That way it will be fixed. I am not sure if you are aware but there is a staging directory as well in Linux. > > I tried Dma-debug yesterday, it works very well, but I think only the size > mismatch check may not be enough for the map entry corrupt situation, some > run-time warning may be better when the real corruption happen. > > For most of the dma-api backend, the size mismatch may do no harm at all, and > even in SWIOTLB itself when the bounce buffer is not used, the size mismatch > do no harm either. In our case, the same buggy driver works well when board > has 2G DDR, but panic frequently in 4G DDR because of the use of bounce > buffer and these corrupted map entries. it is hard to catch this kind of > bugs, for when the corruption happen, the kernel has all kind of reasons to > panic, but not even one may directly point to the real source. > > Add the warning messages is a big convenience for figure this kind of issues, > at least to me and the AP driver supplier, such warnings may save weeks of > hard debug time. I would prefer that all of those warning messages go in the DMA debug API - as you can have multiple DMA different backends. In other words, instead of being in one implementation it would make much more sense to be in the generic API layer. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function
Hi, On 17/11/17 18:55, Jacob Pan wrote: > From: "Liu, Yi L" > > When an SVM capable device is assigned to a guest, the first level page > tables are owned by the guest and the guest PASID table pointer is > linked to the device context entry of the physical IOMMU. > > Host IOMMU driver has no knowledge of caching structure updates unless > the guest invalidation activities are passed down to the host. The > primary usage is derived from emulated IOMMU in the guest, where QEMU > can trap invalidation activities before passing them down to the > host/physical IOMMU. > Since the invalidation data are obtained from user space and will be > written into physical IOMMU, we must allow security check at various > layers. Therefore, generic invalidation data format are proposed here, > model specific IOMMU drivers need to convert them into their own format. > > Signed-off-by: Liu, Yi L > Signed-off-by: Jacob Pan > Signed-off-by: Ashok Raj [...] > #endif /* __LINUX_IOMMU_H */ > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index 651ad5d..039ba36 100644 > --- a/include/uapi/linux/iommu.h > +++ b/include/uapi/linux/iommu.h > @@ -36,4 +36,66 @@ struct pasid_table_config { > }; > }; > > +enum iommu_inv_granularity { > + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */ > + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */ > + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a > + * device ID > + */ I thought you were planning on removing these? If we do need global invalidation, for example the guest clears the whole PASID table and doesn't want to send individual GRANU_ALL_PASID invalidations, maybe keep only GRANU_DOMAIN? > + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */ > + IOMMU_INV_GRANU_ALL_PASID, /* cache of a given PASID */ > + IOMMU_INV_GRANU_PASID_SEL, /* only invalidate specified PASID */ GRANU_PASID_SEL seems redundant, don't you already get it by default with GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with IOMMU_INVALIDATE_PASID_TAGGED flag)? > + > + IOMMU_INV_GRANU_NG_ALL_PASID, /* non-global within all PASIDs */ > + IOMMU_INV_GRANU_NG_PASID, /* non-global within a PASIDs */ Don't you get the "NG" behavior by not passing the IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below? > + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */ And don't you get this with GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED? > + IOMMU_INV_NR_GRANU, > +}; > + > +enum iommu_inv_type { > + IOMMU_INV_TYPE_DTLB,/* device IOTLB */ > + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache */ > + IOMMU_INV_TYPE_PASID, /* PASID cache */ > + IOMMU_INV_TYPE_CONTEXT, /* device context entry cache */ > + IOMMU_INV_NR_TYPE > +}; When the guest removes a PASID entry, it would have to send DTLB, TLB and PASID invalidations separately? Could we define this inv_type as cumulative, to avoid redundant invalidation requests: * TYPE_DTLB only invalidates ATC entries. * TYPE_TLB invalidates both ATC and IOTLB entries. * TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and also the PASID cache entry. * TYPE_CONTEXT invalidates all. Although is it needed by userspace or just here fore completeness? "CONTEXT" is specific to VT-d (doesn't exist on AMD and has a different meaning on SMMU), how about "DEVICE" instead? This is important because invalidation will probably become the bottleneck. The guest shouldn't have to send DTLB and TLB invalidation separately after each unmapping. > +/** > + * Translation cache invalidation header that contains mandatory meta data. > + * @version: info format version, expecting future extesions > + * @type:type of translation cache to be invalidated > + */ > +struct tlb_invalidate_hdr { > + __u32 version; > +#define TLB_INV_HDR_VERSION_1 1 > + enum iommu_inv_type type; > +}; > + > +/** > + * Translation cache invalidation information, contains generic IOMMU > + * data which can be parsed based on model ID by model specific drivers. > + * > + * @granularity: requested invalidation granularity, type dependent > + * @size:2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. Having only power of two invalidation seems too restrictive for a software interface. You might have the same problem as above, where the guest or userspace needs to send lots of invalidation requests, They could be multiplexed by passing an arbitrary range instead. How about making @size a __u64? > + * @pasid: processor address space ID value per PCI spec. > + * @addr:page address to be invalidated > + * @flagsIOMMU_INVALIDATE_PASID_TAGGED: DMA with PASID tagged, > + * @pasid validity can be > + * deduced from @granul
Re: [PATCH v3 08/16] iommu: introduce device fault data
Hi Jacob, On 17/11/17 18:55, Jacob Pan wrote: > Device faults detected by IOMMU can be reported outside IOMMU > subsystem for further processing. This patch intends to provide > a generic device fault data such that device drivers can be > communicated with IOMMU faults without model specific knowledge. > > The proposed format is the result of discussion at: > https://lkml.org/lkml/2017/11/10/291 > Part of the code is based on Jean-Philippe Brucker's patchset > (https://patchwork.kernel.org/patch/9989315/). > > The assumption is that model specific IOMMU driver can filter and > handle most of the internal faults if the cause is within IOMMU driver > control. Therefore, the fault reasons can be reported are grouped > and generalized based common specifications such as PCI ATS. > > Signed-off-by: Jacob Pan > Signed-off-by: Liu, Yi L > Signed-off-by: Ashok Raj This looks good from my point of view. And since it's not UAPI, we can always update it if it turns out that device drivers need more information. [...] > +/** > + * struct iommu_fault_event - Generic per device fault data > + * > + * - PCI and non-PCI devices > + * - Recoverable faults (e.g. page request), information based on PCI ATS > + * and PASID spec. > + * - Un-recoverable faults of device interest > + * - DMA remapping and IRQ remapping faults > + > + * @type contains fault type. > + * @reason fault reasons if relevant outside IOMMU driver, IOMMU driver > internal > + * faults are not reported > + * @addr: tells the offending page address > + * @pasid: contains process address space ID, used in shared virtual > memory(SVM) > + * @rid: requestor ID This comment can be removed Thanks, Jean > + * @page_req_group_id: page request group index > + * @last_req: last request in a page request group > + * @pasid_valid: indicates if the PRQ has a valid PASID > + * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, > IOMMU_FAULT_WRITE > + * @device_private: if present, uniquely identify device-specific > + * private data for an individual page request. > + * @iommu_private: used by the IOMMU driver for storing fault-specific > + * data. Users should not modify this field before > + * sending the fault response. > + */ > +struct iommu_fault_event { > + enum iommu_fault_type type; > + enum iommu_fault_reason reason; > + u64 addr; > + u32 pasid; > + u32 page_req_group_id : 9; > + u32 last_req : 1; > + u32 pasid_valid : 1; > + u32 prot; > + u64 device_private; > + u64 iommu_private; > +}; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 01/16] iommu: introduce bind_pasid_table API function
On 17/11/17 18:54, Jacob Pan wrote: > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) > use in the guest: > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html > > As part of the proposed architecture, when an SVM capable PCI > device is assigned to a guest, nested mode is turned on. Guest owns the > first level page tables (request with PASID) which performs GVA->GPA > translation. Second level page tables are owned by the host for GPA->HPA > translation for both request with and without PASID. > > A new IOMMU driver interface is therefore needed to perform tasks as > follows: > * Enable nested translation and appropriate translation type > * Assign guest PASID table pointer (in GPA) and size to host IOMMU > > This patch introduces new API functions to perform bind/unbind guest PASID > tables. Based on common data, model specific IOMMU drivers can be extended > to perform the specific steps for binding pasid table of assigned devices. > [...] > > #define IOMMU_READ (1 << 0) > #define IOMMU_WRITE (1 << 1) > @@ -187,6 +188,8 @@ struct iommu_resv_region { > * @domain_get_windows: Return the number of windows for a domain > * @of_xlate: add OF master IDs to iommu grouping > * @pgsize_bitmap: bitmap of all possible supported page sizes > + * @bind_pasid_table: bind pasid table pointer for guest SVM > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults I was wondering, are you planning on using the IOMMU_DOMAIN_NESTING attribute? It differentiates a domain that supports bind/unbind_pasid_table and map/unmap GPA (virt SVM), from the domain that supports bind/unbind individual PASIDs and map/unmap IOVA (host SVM)? Users can set this attribute by using the VFIO_TYPE1_NESTING_IOMMU type instead of VFIO_TYPE1v2_IOMMU, which seems ideal for what we're trying to do. [...] > +/** > + * PASID table data used to bind guest PASID table to the host IOMMU. This > will > + * enable guest managed first level page tables. > + * @version: for future extensions and identification of the data format > + * @bytes: size of this structure > + * @base_ptr:PASID table pointer > + * @pasid_bits: number of bits supported in the guest PASID table, must > be less > + * or equal than the host supported PASID size. Why remove the @model parameter? > + */ > +struct pasid_table_config { > + __u32 version; > +#define PASID_TABLE_CFG_VERSION 1 > + __u32 bytes; > + __u64 base_ptr; > + __u8 pasid_bits; > + /* reserved for extension of vendor specific config */ > + union { > + struct { > + /* ARM specific fields */ > + bool pasid0_dma_no_pasid; > + } arm; I think @model is still required for sanity check, but could you remove the whole union for the moment? Other parameters will be needed and I'm still thinking about it, so I'll add the arm struct back in a future patch. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 15/16] iommu: introduce page response function
On 17/11/17 18:55, Jacob Pan wrote: > When nested translation is turned on and guest owns the > first level page tables, device page request can be forwared > to the guest for handling faults. As the page response returns > by the guest, IOMMU driver on the host need to process the > response which informs the device and completes the page request > transaction. > > This patch introduces generic API function for page response > passing from the guest or other in-kernel users. The definitions of > the generic data is based on PCI ATS specification not limited to > any vendor.> > Signed-off-by: Jacob Pan > --- > drivers/iommu/iommu.c | 14 ++ > include/linux/iommu.h | 42 ++ > 2 files changed, 56 insertions(+) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 97b7990..7aefb40 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1416,6 +1416,20 @@ int iommu_sva_invalidate(struct iommu_domain *domain, > } > EXPORT_SYMBOL_GPL(iommu_sva_invalidate); > > +int iommu_page_response(struct iommu_domain *domain, struct device *dev, > + struct page_response_msg *msg) I think it's simpler, both for IOMMU and device drivers, to pass the exact structure received in the fault handler back to the IOMMU driver, along with a separate response status. So maybe int iommu_page_response(struct iommu_domain *domain, struct device *dev, struct iommu_fault_event *event, u32 response) And then you'd just need to define the IOMMU_PAGE_RESPONSE_* values. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/2] iommu/ipmmu-vmsa: Add r8a779(70|95) DT bindings
Update the IPMMU DT binding documentation to include the r8a77970 (R-Car V3M) and r8a77995 (R-Car D3) compat strings. Based on work for r8a7796 by Magnus Damm. Signed-off-by: Simon Horman Reviewed-by: Geert Uytterhoeven --- v2 * Correct typo: R8A7770 -> R8A77970 --- Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt index 98ae900a7a76..8e4e89c1bd48 100644 --- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt @@ -17,6 +17,8 @@ Required Properties: - "renesas,ipmmu-r8a7794" for the R8A7794 (R-Car E2) IPMMU. - "renesas,ipmmu-r8a7795" for the R8A7795 (R-Car H3) IPMMU. - "renesas,ipmmu-r8a7796" for the R8A7796 (R-Car M3-W) IPMMU. +- "renesas,ipmmu-r8a77970" for the R8A77970 (R-Car V3M) IPMMU. +- "renesas,ipmmu-r8a77995" for the R8A77995 (R-Car D3) IPMMU. - "renesas,ipmmu-vmsa" for generic R-Car Gen2 VMSA-compatible IPMMU. - reg: Base address and size of the IPMMU registers. -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/2] iommu/ipmmu-vmsa: Add r8a7796 DT binding
From: Magnus Damm Update the IPMMU DT binding documentation to include the r8a7796 compat string for R-Car M3-W. Signed-off-by: Magnus Damm Acked-by: Laurent Pinchart Acked-by: Rob Herring Acked-by: Simon Horman Acked-by: Geert Uytterhoeven --- Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt index 3ed027cfca95..98ae900a7a76 100644 --- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt @@ -16,6 +16,7 @@ Required Properties: - "renesas,ipmmu-r8a7793" for the R8A7793 (R-Car M2-N) IPMMU. - "renesas,ipmmu-r8a7794" for the R8A7794 (R-Car E2) IPMMU. - "renesas,ipmmu-r8a7795" for the R8A7795 (R-Car H3) IPMMU. +- "renesas,ipmmu-r8a7796" for the R8A7796 (R-Car M3-W) IPMMU. - "renesas,ipmmu-vmsa" for generic R-Car Gen2 VMSA-compatible IPMMU. - reg: Base address and size of the IPMMU registers. -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/2] iommu/ipmmu-vmsa: Add r8a779(6|70|95) DT bindings
Update the IPMMU DT binding documentation to include r8a7796 (R-Car M3-W), r8a77970 (R-Car V3M) and r8a77995 (R-Car D3) compat strings. This patch-set is comprised of patches previously included in * [PATCH v4 0/3] iommu/ipmmu-vmsa: r8a7796 support V4 * [PATCH 0/2] iommu/ipmmu-vmsa: r8a779(70|95) support Those series also include driver updates to implement the bindings, however, it while it is felt that the bindings themselves are ready to be merged we feel that the driver updates are not at that stage yet. Changes since v1: * Correct typo in r8a779(70|95) patch: R8A7770 -> R8A77970 Magnus Damm (1): iommu/ipmmu-vmsa: Add r8a7796 DT binding Simon Horman (1): iommu/ipmmu-vmsa: Add r8a779(70|95) DT bindings Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 3 +++ 1 file changed, 3 insertions(+) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory
On 24/11/17 08:23, Bob Liu wrote: > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a >> bond between a container and a process address space, identified by a >> device-specific ID named PASID. This allows the device to target DMA >> transactions at the process virtual addresses without a need for mapping >> and unmapping buffers explicitly in the IOMMU. The process page tables are >> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to >> handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a >> PASID. >> > > How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)? > e.g always bind to current process in SET_IOMMU. > > Not sure about the real use case. I guess you could introduce a new VFIO IOMMU type for this. I think this would be useful for SVA without PASID: if the device supports I/O page faults, use SET_IOMMU with a VFIO_SVA_IOMMU type (for example) and the process is bound automatically to the default translation context of the device. This requires a new IOMMU type because the MAP/UNMAP ioctl won't work anymore. I'm not keen on introducing loads of new features in the APIs at the moment, because I only have the IOMMU point of view, not many endpoint users. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory
From: Jaewon Kim > Sent: 24 November 2017 05:59 > > dma-coherent uses bitmap APIs which internally consider align based on the > requested size. If most of allocations are small size like KBs, using > alignment scheme seems to be good for anti-fragmentation. But if large > allocation are commonly used, then an allocation could be failed because > of the alignment. To avoid the allocation failure, we had to increase total > size. > > This is a example, total size is 30MB, only few memory at front is being > used, and 9MB is being requsted. Then 9MB will be aligned to 16MB. The > first try on offset 0MB will be failed because others already are using > them. The second try on offset 16MB will be failed because of ouf of bound. > > So if the alignment is not necessary on a specific dma-coherent memory > region, we can set no-align property. Then dma-coherent will ignore the > alignment only for the memory region. ISTM that the alignment needs to be a property of the request, not of the device. Certainly the device driver code is most likely to know the specific alignment requirements of any specific allocation. We've some hardware that would need large allocations to be 16k aligned. We actually use multiple 16k allocations because any large buffers are accessed directly from userspace (mmap and vm_iomap_memory) and the card has its own page tables (with 16k pages). David ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory
On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a > bond between a container and a process address space, identified by a > device-specific ID named PASID. This allows the device to target DMA > transactions at the process virtual addresses without a need for mapping > and unmapping buffers explicitly in the IOMMU. The process page tables are > shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to > handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a > PASID. > How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)? e.g always bind to current process in SET_IOMMU. Not sure about the real use case. > Signed-off-by: Jean-Philippe Brucker > --- > drivers/vfio/vfio_iommu_type1.c | 243 > +++- > include/uapi/linux/vfio.h | 69 > 2 files changed, 311 insertions(+), 1 deletion(-) > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu