RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> From: Raj, Ashok
> Sent: Friday, May 15, 2020 11:20 PM
>
> On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote:
> > Hi, Alex,
> >
> > When working on an updated version Yi and I found a design open
> > which needs your guidance.
> >
> > In concept, nested translation can be incarnated as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. It means one container
> > is sufficient to include all SVA-capable devices assigned to the same
> > guest, as there is just one iova space (GPA->HPA) to be managed by the
> > vfio iommu map/unmap api. GVA->GPA page tables are bound through the
> > new api introduced in this patch. It is different from legacy shadow
> > translation, which merges GIOVA->GPA->HPA into GIOVA->HPA, thus each
> > device requires its own iova space and must be in a separate container.
> >
> > However, supporting multiple SVA-capable devices in one container
> > imposes one challenge. Now the bind_guest_pgtbl is implemented as an
> > iommu type1 api. From the definition of iommu type1, any operation
> > should be applied to all devices within the same container, just like
> > dma map/unmap. However this philosophy is incorrect regarding
> > page table binding. We must follow the guest's binding requirements
> > between its page tables and assigned devices, otherwise every bound
> > address space is suddenly accessible by all assigned devices, thus
> > creating security holes.
>
> The above 2 paragraphs are a bit confusing :-) but what really matters
> is the below:
>
> > Do you think it's possible to extend the iommu api to accept
> > device-specific cmds? If not, moving it to vfio-pci or vfio-mdev sounds
> > also problematic, as PASID and page tables are IOMMU things which
> > are not touched by vfio device drivers today.
>
> All you are referring to is that when the admin groups multiple devices
> in a single container, you are saying you can't give isolation to them
> for SVA purposes.
> This is logically equivalent to a single group with multiple devices:
>
> - such as devices behind a p2p bridge
> - MFDs, or devices behind switches or hierarchies without ACS
>   support, for instance.
>
> By limitation you mean, disable PASID on those devices in a single
> container?

the limitation means disabling the nesting capability in such a container,
and yes, it implies not exposing the PASID capability to userspace too.

> what about ATS?

ATS is possibly fine. VFIO exposes it to userspace even today.

Thanks
Kevin

> > Alternatively can we impose the limitation that nesting APIs can
> > only be enabled for singleton containers which contain only one device?
> > This basically falls back to the state of legacy shadow translation
> > vIOMMU, and our current Qemu vIOMMU implementation actually does it
> > this way regardless of whether nesting is used. Do you think such a
> > tradeoff is acceptable as a starting point?
> >
> > Thanks
> > Kevin
> >
> > > -Original Message-
> > > From: Liu, Yi L
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > > To: alex.william...@redhat.com; eric.au...@redhat.com
> > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > > ; Tian, Jun J ; Sun, Yi Y
> > > ; jean-phili...@linaro.org; pet...@redhat.com;
> > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Wu, Hao
> > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > >
> > > From: Liu Yi L
> > >
> > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > > hardware IOMMUs that have nesting DMA translation (a.k.a. dual-stage
> > > address translation). For such hardware IOMMUs, there are two
> > > stages/levels of address translation, and software may let
> > > userspace/VM own the first-level/stage-1 translation structures.
> > > An example of such usage is vSVA (virtual Shared Virtual Addressing).
> > > The VM owns the first-level/stage-1 translation structures and binds
> > > the structures to the host; the hardware IOMMU then utilizes nesting
> > > translation when doing DMA translation for the devices behind such
> > > hardware IOMMU.
> > >
> > > This patch adds vfio support for binding the guest translation
> > > (a.k.a. stage-1) structure to the host iommu. And for
> > > VFIO_TYPE1_NESTING_IOMMU, not only is binding the guest page table
> > > needed, it also requires exposing an interface to the guest for
> > > iommu cache invalidation when the guest modifies the
> > > first-level/stage-1 translation structures, since hardware needs to
> > > be notified to flush stale iotlbs. This will be introduced in the
> > > next patch.
> > >
> > > In this patch, guest page table bind and unbind are done by using the
> > > flags VFIO_IOMMU_BIND_GUEST_PGTBL and
> > > VFIO_IOMMU_UNBIND_GUEST_PGTBL under the IOCTL
> > > VFIO_IOMMU_BIND; the bind/unbind data are conveyed by
> > > struct iommu_gpasid_bind_data. Before binding the guest page table
> > > to the host, the VM should have got a PASID allocated by the host via
> > >
RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> From: Alex Williamson
> Sent: Saturday, May 16, 2020 1:59 AM
>
> On Fri, 15 May 2020 07:39:14 +
> "Tian, Kevin" wrote:
>
> > Hi, Alex,
> >
> > When working on an updated version Yi and I found a design open
> > which needs your guidance.
> >
> > In concept, nested translation can be incarnated as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. It means one container
> > is sufficient to include all SVA-capable devices assigned to the same
> > guest, as there is just one iova space (GPA->HPA) to be managed by the
> > vfio iommu map/unmap api. GVA->GPA page tables are bound through the
> > new api introduced in this patch. It is different from legacy shadow
> > translation, which merges GIOVA->GPA->HPA into GIOVA->HPA, thus each
> > device requires its own iova space and must be in a separate container.
>
> I think you've redefined the container as specifically IOVA context,
> but I think a container is actually more of a shared IOMMU context
> between groups and devices within those groups. Userspace is under no
> obligation to share a container between groups and the kernel is under
> no obligation to let groups share a container.

oh, sorry, I didn't redefine anything here. I just described the usage
examples of containers when a vIOMMU is used. Before vSVA there is just one
address space per IOMMU context, which is what the current vfio iommu api
is designed for. Now, when extending it to support vSVA, there are
two layers of multiple address spaces per IOMMU context, and the sharing
possibilities of those address spaces are different. Then this open is about
whether we want to allow partial sharing of IOMMU context within a
container, i.e. sharing the 2nd level but allowing binding to different
1st levels.

> > However, supporting multiple SVA-capable devices in one container
> > imposes one challenge. Now the bind_guest_pgtbl is implemented as an
> > iommu type1 api.
> > From the definition of iommu type1, any operation
> > should be applied to all devices within the same container, just like
> > dma map/unmap. However this philosophy is incorrect regarding
> > page table binding. We must follow the guest's binding requirements
> > between its page tables and assigned devices, otherwise every bound
> > address space is suddenly accessible by all assigned devices, thus
> > creating security holes.
>
> Is that a security hole, or is that simply the nature of a container?
> Containers are meant to allow a user to share an IOMMU context between
> groups of devices. Sharing that context necessarily implies that the
> user is granting the devices access to those address spaces.

It's the nature of the container. I just tried to state that the current
api is insufficient if we want to allow partial sharing within a container.

> > Do you think it's possible to extend the iommu api to accept
> > device-specific cmds? If not, moving it to vfio-pci or vfio-mdev sounds
> > also problematic, as PASID and page tables are IOMMU things which
> > are not touched by vfio device drivers today.
>
> Is this really that different from a group holding multiple devices,
> each of which has a unique requester ID as seen by the IOMMU? The
> IOMMU can support separate contexts per device, but the topology
> restricts us such that those contexts are not fully isolated. So we've
> imposed the group restriction on ourselves to reflect that. If a user
> wants to share an IOMMU context between devices, maybe that precludes
> the user from making use of nested translation.

Agree.

> > Alternatively can we impose the limitation that nesting APIs can
> > only be enabled for singleton containers which contain only one device?
> > This basically falls back to the state of legacy shadow translation
> > vIOMMU, and our current Qemu vIOMMU implementation actually does it
> > this way regardless of whether nesting is used.
Do you think > > whether such tradeoff is acceptable as a starting point? > > I think it's an acceptable starting point. It seems what you're > describing is separating a monolithic notion of IOMMU context into > multiple layers, so we can share them at different points, ex. share a > GPA->HPA context among multiple different GVA->GPA contexts. That > seems to imply something like the sub/super-container idea that's been > tossed around before, but never been fully defined or explored. Thanks, yes, that'd be a future extension and we'll start with singleton limitation for now. Thanks Kevin > > > > -Original Message- > > > From: Liu, Yi L > > > Sent: Sunday, March 22, 2020 8:32 PM > > > To: alex.william...@redhat.com; eric.au...@redhat.com > > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com; > > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L > > > ; Tian, Jun J ; Sun, Yi Y > > > ; jean-phili...@linaro.org; pet...@redhat.com; > > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux- > > > ker...@vger.kernel.org; Wu, Hao > > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables
RE: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
> Subject: Re: Constantly map and unmap of streaming DMA buffers with
> IOMMU backend might cause serious performance problem
>
> On 2020-05-15 09:19, Song Bao Hua wrote:
> [ snip... nice analysis, but ultimately it's still "doing stuff has more
> overhead than not doing stuff" ]
>
> > I am thinking of several possible ways of decreasing or removing the
> > latency of DMA map/unmap for every single DMA transfer. Meanwhile,
> > "non-strict", as an existing option with possible safety issues, I
> > won't discuss in this mail.
>
> But passthrough and non-strict mode *specifically exist* for the cases
> where performance is the most important concern - streaming DMA with an
> IOMMU in the middle has an unavoidable tradeoff between performance and
> isolation, so dismissing that out of hand is not a good way to start
> making this argument.

I do understand there is a tradeoff between performance and isolation.
However, users might ask for performance while still needing isolation. In
passthrough mode, the whole of memory might be accessible by DMA. In
non-strict mode, a buffer could still be mapped in the IOMMU when users
have returned it to buddy and the buffer has even been allocated by another
user.

> > 1. provide bounce coherent buffers for streaming buffers.
> > As the coherent buffers keep the status of mapping, we can remove the
> > overhead of map and unmap for each single DMA operation. However, this
> > solution requires a memory copy between streaming buffers and bounce
> > buffers. Thus it will work only if copy is faster than map/unmap.
> > Meanwhile, it will consume much more memory bandwidth.
>
> I'm struggling to understand how that would work, can you explain it in
> more detail?

lower-layer drivers maintain some reusable coherent buffers. For the TX
path, drivers copy the streaming buffer to a coherent buffer, then do DMA;
for the RX path, drivers do DMA into a coherent buffer, then copy to the
streaming buffer.
> > 2. make upper-layer kernel components aware of the pain of iommu
> > map/unmap
> >
> > upper-layer fs, mm, networks can somehow let the lower-layer drivers
> > know the end of the life cycle of sg buffers. In the zswap case, I
> > have seen zswap always use the same 2 pages as the destination buffers
> > to save compressed pages, but the compressor driver still has to
> > constantly map and unmap those same two pages for every single
> > compression, since zswap and zip drivers are working in two completely
> > different software layers.
> >
> > I am thinking of some way as below; upper-layer kernel code can call:
> > sg_init_table();
> > sg_mark_reusable();
> > /* use the buffer many times */
> >
> > sg_mark_stop_reuse();
> >
> > After that, if low-level drivers see the "reusable" flag, they will
> > realize the buffer can be used multiple times and will not do
> > map/unmap every time. It means upper-layer components will further use
> > the buffers and the same buffers will probably be given to lower-layer
> > drivers for new DMA transfers later. When upper-layer code sets
> > "stop_reuse", the lower-layer driver will unmap the sg buffers,
> > possibly by providing an unmap callback to upper-layer components. For
> > the zswap case, I have seen the same buffers are always re-used and
> > the zip driver maps and unmaps them again and again. Shortly after the
> > buffer is unmapped, it will be mapped in the next transmission, almost
> > without any time gap between unmap and map. If zswap can set the
> > "reusable" flag, the zip driver will save a lot of time.
> >
> > Meanwhile, for the safety of buffers, lower-layer drivers need to make
> > certain the buffers have already been unmapped in the iommu before
> > those buffers go back to buddy for other users.
>
> That sounds like it would only have benefit in a very small set of
> specific circumstances, and would be very difficult to generalise to
> buffers that are mapped via dma_map_page() or dma_map_single().

Yes, indeed.
Hopefully the small set of specific circumstances will encourage more
upper-layer consumers to reuse buffers; then the "reusable" flag can extend
to more common cases, such as page and single buffers.

> Furthermore, a high-level API that affects a low-level driver's
> interpretation of mid-layer API calls without the mid-layer's knowledge
> sounds like a hideous abomination of anti-design. If a mid-layer API
> lends itself to inefficiency at the lower level, it would seem a lot
> cleaner and more robust to extend *that* API for stateful buffer reuse.

Absolutely agree. I didn't say the method is elegant. For the moment, maybe
"reuse" can get started with a small case like zswap. After a while, it is
possible more users will be encouraged to do some optimization for buffer
reuse, understanding the suffering of lower-layer drivers. Then those
performance problems might be solved case by case. On the other hand, it is
always the freedom of upper-layer code to indicate "reuse" or not. If they
don't say anything about reuse, lower-layer drivers can simply do map and
unmap.

> Failing that, it
Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
On 2020-05-15 22:33, Song Bao Hua wrote:

Subject: Re: Constantly map and unmap of streaming DMA buffers with IOMMU
backend might cause serious performance problem

On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:

Meanwhile, for the safety of buffers, lower-layer drivers need to make
certain the buffers have already been unmapped in iommu before those
buffers go back to buddy for other users.

That sounds like it would only have benefit in a very small set of specific
circumstances, and would be very difficult to generalise to buffers that
are mapped via dma_map_page() or dma_map_single(). Furthermore, a
high-level API that affects a low-level driver's interpretation of
mid-layer API calls without the mid-layer's knowledge sounds like a hideous
abomination of anti-design. If a mid-layer API lends itself to inefficiency
at the lower level, it would seem a lot cleaner and more robust to extend
*that* API for stateful buffer reuse. Failing that, it might possibly be
appropriate to approach this at the driver level - many of the cleverer
network drivers already implement buffer pools to recycle mapped SKBs
internally, couldn't the "zip driver" simply try doing something like that
for itself?

Exactly. If your upper consumer of the DMA API keeps reusing the same
pages, just map them once and use dma_sync_* to transfer ownership as
needed.

The problem is that the lower-layer drivers don't know if the upper
consumer keeps reusing the same pages. They are running in different
software layers. For example, the consumer is here in mm/zswap.c:

static int zswap_frontswap_store(unsigned type, pgoff_t offset,
				 struct page *page)
{
	...
	/* compress */
	dst = get_cpu_var(zswap_dstmem);
	...
	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, );
	...
}

But the lower-layer driver is in drivers/crypto/...

Meanwhile, the lower-layer driver couldn't cache the pointers of buffer
addresses coming from consumers to detect if the upper layer is using the
same page.
Because the same page might come from different users or come from the different stages of the same user with different permissions. Indeed the driver can't cache arbitrary pointers, but if typical buffers are small enough it can copy the data into its own already-mapped page, dma_sync it, and perform the DMA operation from there. That might even be more or less what your first suggestion was, but I'm still not quite sure. For example, consumer A uses the buffer as destination, then returns it to buddy, but consumer B gets the same buffer and uses it as source. Another possibility is Consumer A uses the buffer, returns it to buddy, after some time, it allocates a buffer again, but gets the same buffer from buddy like before. For the safety of the buffer, lower-layer driver must guarantee the buffer is unmapped when the buffer returns to buddy. I think only the upper-layer consumer knows if it is reusing the buffer. Right, and if reusing buffers is common in crypto callers, then there's an argument for "set up reusable buffer", "process updated buffer" and "clean up buffer" operations to be added to the crypto API itself, such that the underlying drivers can then optimise for DMA usage in a robust and obvious way if they want to (or just implement the setup and teardown as no-ops and still do a full map/unmap in each "process" call if they don't). Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
> Subject: Re: Constantly map and unmap of streaming DMA buffers with
> IOMMU backend might cause serious performance problem
>
> On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:
> >> Meanwhile, for the safety of buffers, lower-layer drivers need to
> >> make certain the buffers have already been unmapped in iommu before
> >> those buffers go back to buddy for other users.
> >
> > That sounds like it would only have benefit in a very small set of
> > specific circumstances, and would be very difficult to generalise to
> > buffers that are mapped via dma_map_page() or dma_map_single().
> > Furthermore, a high-level API that affects a low-level driver's
> > interpretation of mid-layer API calls without the mid-layer's
> > knowledge sounds like a hideous abomination of anti-design. If a
> > mid-layer API lends itself to inefficiency at the lower level, it
> > would seem a lot cleaner and more robust to extend *that* API for
> > stateful buffer reuse. Failing that, it might possibly be appropriate
> > to approach this at the driver level - many of the cleverer network
> > drivers already implement buffer pools to recycle mapped SKBs
> > internally, couldn't the "zip driver" simply try doing something like
> > that for itself?
>
> Exactly. If your upper consumer of the DMA API keeps reusing the same
> pages, just map them once and use dma_sync_* to transfer ownership as
> needed.

The problem is that the lower-layer drivers don't know if the upper
consumer keeps reusing the same pages. They are running in different
software layers. For example, the consumer is here in mm/zswap.c:

static int zswap_frontswap_store(unsigned type, pgoff_t offset,
				 struct page *page)
{
	...
	/* compress */
	dst = get_cpu_var(zswap_dstmem);
	...
	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, );
	...
}

But the lower-layer driver is in drivers/crypto/...
Meanwhile, the lower-layer driver couldn't cache the pointers of buffer
addresses coming from consumers to detect if the upper layer is using the
same page, because the same page might come from different users or come
from different stages of the same user with different permissions.

For example, consumer A uses the buffer as a destination, then returns it
to buddy, but consumer B gets the same buffer and uses it as a source.
Another possibility is that consumer A uses the buffer, returns it to
buddy, and after some time allocates a buffer again, but gets the same
buffer from buddy as before.

For the safety of the buffer, the lower-layer driver must guarantee the
buffer is unmapped when the buffer returns to buddy. I think only the
upper-layer consumer knows if it is reusing the buffer.

Thanks
Barry
Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices
On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote:
> Add pci_ats_supported(), which checks whether a device has an ATS
> capability, and whether it is trusted. A device is untrusted if it is
> plugged into an external-facing port such as Thunderbolt and could
> spoof an existing device to exploit weaknesses in the IOMMU
> configuration. PCIe ATS is one such weakness, since it allows
> endpoints to cache IOMMU translations and emit transactions with
> 'Translated' Address Type (10b) that partially bypass the IOMMU
> translation.
>
> The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
> with 'Translated' Address Type for untrusted devices. Add the check to
> pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
> it.
>
> By checking ats_cap, the pci_ats_supported() helper also returns whether
> ATS was globally disabled with pci=noats, and could later include more
> things, for example whether the whole PCIe hierarchy down to the
> endpoint supports ATS.
>
> Signed-off-by: Jean-Philippe Brucker

Acked-by: Bjorn Helgaas

> ---
>  include/linux/pci-ats.h |  3 +++
>  drivers/pci/ats.c       | 18 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index d08f0869f1213e..f75c307f346de9 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,11 +6,14 @@
>
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
> +bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
>  #else /* CONFIG_PCI_ATS */
> +static inline bool pci_ats_supported(struct pci_dev *d)
> +{ return false; }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps)
>  { return -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1fc..15fa0c37fd8e44 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
>  	dev->ats_cap = pos;
>  }
>
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> +	if (!dev->ats_cap)
> +		return false;
> +
> +	return !dev->untrusted;
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_supported);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>  	u16 ctrl;
>  	struct pci_dev *pdev;
>
> -	if (!dev->ats_cap)
> +	if (!pci_ats_supported(dev))
>  		return -EINVAL;
>
>  	if (WARN_ON(dev->ats_enabled))
> --
> 2.26.2
Re: [PATCH] iommu: Implement deferred domain attachment
On 2020-05-15 19:26, Joerg Roedel wrote:

On Fri, May 15, 2020 at 05:28:53PM +0100, Robin Murphy wrote:

On 2020-05-15 17:14, Joerg Roedel wrote:

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..403fda04ea98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
 		return 0;
 	if (unlikely(ops->is_attach_deferred &&
-			ops->is_attach_deferred(domain, dev)))
-		return iommu_attach_device(domain, dev);
+		     ops->is_attach_deferred(domain, dev)))
+		return iommu_attach_device_no_defer(domain, dev);

Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid
having to formalise a public interface that nobody else should ever use
anyway?

That would omit the ops->attach_dev != NULL check and the trace-point on
device attach. Besides that, it would be a layering violation. But the
function is of course entirely internal to the iommu subsystem and is a
good candidate to be moved to a header file in drivers/iommu.

Sure, checking the pointer before calling was implied, but the tracepoint
is a good argument, I'd forgotten about that :)

@@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	mutex_lock(&group->mutex);
 	list_add_tail(&device->list, &group->devices);
-	if (group->domain)
-		ret = __iommu_attach_device(group->domain, dev);
+	domain = group->domain;
+	if (domain && (!domain->ops->is_attach_deferred ||
+		       !domain->ops->is_attach_deferred(domain, dev)))
+		ret = __iommu_attach_device(domain, dev);
+	}
 	mutex_unlock(&group->mutex);
 	if (ret)
 		goto err_put_group;

No, doing this in iommu_group_add_device() doesn't solve the problem. The
attach must not happen before a device driver took control of the device
and silenced any DMA initiated by the old kernel. At probe time this isn't
guaranteed.
But that's not what this is; this is (supposed to be) the exact same "don't
actually perform the attach yet" logic as before, just restricting it to
default domains in the one place that it actually needs to be, so as not to
fundamentally bugger up iommu_attach_device() in a way that prevents it
from working as expected at the correct point later.

Thinking a bit more, consider if the driver resets the device then attaches
it straight to its own unmanaged domain rather than calling any DMA ops
(e.g. VFIO?) - it looks like that would also be totally broken right now,
and no amount of bodges in iommu-dma is going to help there.

Robin.
Re: [PATCH 2/2] dt-bindings: arm-smmu: Add sc7180 compatible string
Hi,

On Fri, May 1, 2020 at 3:30 AM Sharat Masetty wrote:
>
> This patch simply adds a new compatible string for SC7180 platform.
>
> Signed-off-by: Sharat Masetty
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index 6515dbe..986098b 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -28,6 +28,7 @@ properties:
>        - enum:
>            - qcom,msm8996-smmu-v2
>            - qcom,msm8998-smmu-v2
> +          - qcom,sc7180-smmu-v2
>            - qcom,sdm845-smmu-v2
>        - const: qcom,smmu-v2

Is anything blocking this patch from landing now?

-Doug
RE: [PATCH] iommu: Remove functions that support private domain
Hi Joerg,

> -Original Message-
> From: Joerg Roedel
> Sent: Friday, May 15, 2020 2:59 AM
> To: Prakhya, Sai Praneeth
> Cc: iommu@lists.linux-foundation.org; Lu Baolu
> Subject: Re: [PATCH] iommu: Remove functions that support private domain
>
> On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote:
> > +static int is_driver_bound(struct device *dev, void *not_used)
> > +{
> > +	int ret = 0;
> > +
> > +	device_lock(dev);
> > +	if (device_is_bound(dev))
> > +		ret = 1;
> > +	device_unlock(dev);
> > +	return ret;
> > +}
>
> This locks only one device, so without lock-conversion there could be a
> driver probe after the device_unlock(), while we are probing the other
> devices of the group.
>
> > [SNIP]
> >
> > +	/*
> > +	 * Check if any device in the group still has a driver bound to it.
> > +	 * This might race with device driver probing code and unfortunately
> > +	 * there is no clean way out of that either, locking all devices in
> > +	 * the group and then doing the re-attach will introduce a
> > +	 * lock-inversion with group->mutex - Joerg.
> > +	 */
> > +	if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) {
> > +		pr_err("Active drivers exist for devices in the group\n");
> > +		return -EBUSY;
> > +	}
>
> The lock inversion comes into the picture when this code is called from
> the device(-driver) core through the bus-notifiers. The notifiers are
> called with the device already locked.

Makes sense. I will look through that code.

> > Another question I have is.. if it's racy then it should be racy even
> > for one-device iommu groups.. right? Why would it be racy only with
> > multiple-device iommu groups?
>
> Valid point. So the device needs to be locked _while_ the default domain
> change happens. If triggered by sysfs there should be no locking
> problems, I guess. But you better try it out.

I will try this out and will update you.

Regards,
Sai
Re: [PATCH] iommu: Implement deferred domain attachment
On Fri, May 15, 2020 at 05:28:53PM +0100, Robin Murphy wrote:
> On 2020-05-15 17:14, Joerg Roedel wrote:
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index ba128d1cdaee..403fda04ea98 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
> >  		return 0;
> >  	if (unlikely(ops->is_attach_deferred &&
> > -			ops->is_attach_deferred(domain, dev)))
> > -		return iommu_attach_device(domain, dev);
> > +		     ops->is_attach_deferred(domain, dev)))
> > +		return iommu_attach_device_no_defer(domain, dev);
>
> Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid
> having to formalise a public interface that nobody else should ever use
> anyway?

That would omit the ops->attach_dev != NULL check and the trace-point on
device attach. Besides that, it would be a layering violation. But the
function is of course entirely internal to the iommu subsystem and is a
good candidate to be moved to a header file in drivers/iommu.

> @@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  	mutex_lock(&group->mutex);
>  	list_add_tail(&device->list, &group->devices);
> -	if (group->domain)
> -		ret = __iommu_attach_device(group->domain, dev);
> +	domain = group->domain;
> +	if (domain && (!domain->ops->is_attach_deferred ||
> +		       !domain->ops->is_attach_deferred(domain, dev)))
> +		ret = __iommu_attach_device(domain, dev);
> +	}
>  	mutex_unlock(&group->mutex);
>  	if (ret)
>  		goto err_put_group;

No, doing this in iommu_group_add_device() doesn't solve the problem. The
attach must not happen before a device driver took control of the device
and silenced any DMA initiated by the old kernel. At probe time this isn't
guaranteed.

Joerg
Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
On Fri, 15 May 2020 07:39:14 + "Tian, Kevin" wrote: > Hi, Alex, > > When working on an updated version Yi and I found an design open > which needs your guidance. > > In concept nested translation can be incarnated as one GPA->HPA page > table and multiple GVA->GPA page tables per VM. It means one container > is sufficient to include all SVA-capable devices assigned to the same guest, > as there is just one iova space (GPA->HPA) to be managed by vfio iommu > map/unmap api. GVA->GPA page tables are bound through the new api > introduced in this patch. It is different from legacy shadow translation > which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device requires > its own iova space and must be in a separate container. I think you've redefined the container as specifically IOVA context, but I think a container is actually more of a shared IOMMU context between groups and devices within those groups. Userspace is under no obligation to share a container between groups and the kernel is under no obligation to let groups share a container. > However supporting multiple SVA-capable devices in one container > imposes one challenge. Now the bind_guest_pgtbl is implemented as > iommu type1 api. From the definition of iommu type 1 any operation > should be applied to all devices within the same container, just like > dma map/unmap. However this philosophy is incorrect regarding to > page table binding. We must follow the guest binding requirements > between its page tables and assigned devices, otherwise every bound > address space is suddenly accessible by all assigned devices thus creating > security holes. Is that a security hole, or is that simply the nature of a container? Containers are meant to allow a user to share an IOMMU context between groups of devices. Sharing that context necessarily implies that the user is granting the devices access to those address spaces. > Do you think whether it's possible to extend iommu api to accept > device specific cmd? 
If not, moving it to vfio-pci or vfio-mdev sounds > also problematic, as PASID and page tables are IOMMU things which > are not touched by vfio device drivers today. Is this really that different from a group holding multiple devices, each of which has a unique requester ID as seen by the IOMMU? The IOMMU can support separate contexts per device, but the topology restricts us that those contexts are not fully isolated. So we've imposed the group restriction on ourselves to reflect that. If a user wants to share an IOMMU context between devices, maybe that precludes the user from making use of nested translation. > Alternatively can we impose the limitation that nesting APIs can be > only enabled for singleton containers which contains only one device? > This basically falls back to the state of legacy shadow translation > vIOMMU. and our current Qemu vIOMMU implementation actually > does this way regardless of whether nesting is used. Do you think > whether such tradeoff is acceptable as a starting point? I think it's an acceptable starting point. It seems what you're describing is separating a monolithic notion of IOMMU context into multiple layers, so we can share them at different points, ex. share a GPA->HPA context among multiple different GVA->GPA contexts. That seems to imply something like the sub/super-container idea that's been tossed around before, but never been fully defined or explored. 
Thanks, Alex > > -Original Message- > > From: Liu, Yi L > > Sent: Sunday, March 22, 2020 8:32 PM > > To: alex.william...@redhat.com; eric.au...@redhat.com > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com; > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L > > ; Tian, Jun J ; Sun, Yi Y > > ; jean-phili...@linaro.org; pet...@redhat.com; > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux- > > ker...@vger.kernel.org; Wu, Hao > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host > > > > From: Liu Yi L > > > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by > > hardware > > IOMMUs that have nesting DMA translation (a.k.a dual stage address > > translation). For such hardware IOMMUs, there are two stages/levels of > > address translation, and software may let userspace/VM to own the first- > > level/stage-1 translation structures. Example of such usage is vSVA ( > > virtual Shared Virtual Addressing). VM owns the first-level/stage-1 > > translation structures and bind the structures to host, then hardware > > IOMMU would utilize nesting translation when doing DMA translation fo > > the devices behind such hardware IOMMU. > > > > This patch adds vfio support for binding guest translation (a.k.a stage 1) > > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only > > bind > > guest page table is needed, it also requires to expose interface to guest > > for iommu cache invalidation when guest modified the first-level/stage-1 > > translation structures since
Re: [PATCH 03/14] docs: fix references for DMA*.txt files
On Fri, 1 May 2020 17:37:47 +0200 Mauro Carvalho Chehab wrote: > As we moved those files to core-api, fix references to point > to their newer locations. > > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/PCI/pci.rst | 6 +++--- > Documentation/block/biodoc.rst | 2 +- > Documentation/core-api/bus-virt-phys-mapping.rst | 2 +- > Documentation/core-api/dma-api.rst | 6 +++--- > Documentation/core-api/dma-isa-lpc.rst | 2 +- > Documentation/driver-api/usb/dma.rst | 6 +++--- > Documentation/memory-barriers.txt | 6 +++--- > .../translations/ko_KR/memory-barriers.txt | 6 +++--- > arch/ia64/hp/common/sba_iommu.c| 12 ++-- > arch/parisc/kernel/pci-dma.c | 2 +- > arch/x86/include/asm/dma-mapping.h | 4 ++-- > arch/x86/kernel/amd_gart_64.c | 2 +- > drivers/parisc/sba_iommu.c | 14 +++--- > include/linux/dma-mapping.h| 2 +- > include/media/videobuf-dma-sg.h| 2 +- > kernel/dma/debug.c | 2 +- > 16 files changed, 38 insertions(+), 38 deletions(-) ...of course, not applying patch 2 causes this one to not apply, so holding off for now... jon
Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement
Hi, On Fri, May 15, 2020 at 10:19:49AM -0700, Raj, Ashok wrote: > On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote: > > Can you please lift the untrusted flag into struct device? It really > > isn't a PCI specific concept, and we should not have code poking into > > pci_dev all over the iommu code. > > Just for clarification: All IOMMU's today mostly pertain to only PCI devices > and for devices that aren't PCI like HPET for instance we give a PCI > identifier. Facilities like ATS for instance require the platform to have > an IOMMU. > > what additional problems does moving this to struct device solve? ATS is PCI specific, but IOMMUs certainly aren't! The vast majority of IOMMUs deployed in arm/arm64 SoCs are /not/ using any sort of PCI. Will
Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement
Hi Christoph On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote: > Can you please lift the untrusted flag into struct device? It really > isn't a PCI specific concept, and we should not have code poking into > pci_dev all over the iommu code. Just for clarification: All IOMMU's today mostly pertain to only PCI devices and for devices that aren't PCI like HPET for instance we give a PCI identifier. Facilities like ATS for instance require the platform to have an IOMMU. what additional problems does moving this to struct device solve? Cheers, Ashok
Re: [PATCH] iommu: Implement deferred domain attachment
On 2020-05-15 17:14, Joerg Roedel wrote: On Fri, May 15, 2020 at 04:42:23PM +0100, Robin Murphy wrote: struct iommu_domain *iommu_get_dma_domain(struct device *dev) { - return dev->iommu_group->default_domain; + struct iommu_domain *domain = dev->iommu_group->default_domain; + + if (__iommu_is_attach_deferred(domain, dev)) + __iommu_attach_device_no_defer(domain, dev); This raises a red flag, since iommu-dma already has explicit deferred attach handling where it should need it, immediately after this is called to retrieve the domain. The whole thing smells to me like we should have an explicit special-case in iommu_probe_device() rather than hooking __iommu_attach_device() in general then having to bodge around the fallout elsewhere. Good point, I missed that. But it didn't work for its only user, the AMD IOMMU driver, the reason is that it calls iommu_attach_device(), which in its code-path checks for deferred attaching again and bails out, without do the real attachment. But below updated fix should work. Jerry, could you please test it again? From 4e262dedcd36c7572312c65e66416da74fc78047 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Fri, 15 May 2020 11:25:03 +0200 Subject: [PATCH] iommu: Fix deferred domain attachment The IOMMU core code has support for deferring the attachment of a domain to a device. This is needed in kdump kernels where the new domain must not be attached to a device before the device driver takes it over. When the AMD IOMMU driver got converted to use the dma-iommu implementation, the deferred attaching got lost. The code in dma-iommu.c has support for deferred attaching, but it calls into iommu_attach_device() to actually do it. But iommu_attach_device() will check if the device should be deferred in it code-path and do nothing, breaking deferred attachment. Provide a function in IOMMU core code to reliably attach a device to a domain without any deferred checks and also without other safe-guards. 
Cc: Jerry Snitselaar Cc: Tom Murphy Reported-by: Jerry Snitselaar Fixes: 7959b6f8 ("iommu/dma-iommu: Handle deferred devices") Signed-off-by: Joerg Roedel --- drivers/iommu/dma-iommu.c | 4 ++-- drivers/iommu/iommu.c | 37 - include/linux/iommu.h | 2 ++ 3 files changed, 36 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ba128d1cdaee..403fda04ea98 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev, return 0; if (unlikely(ops->is_attach_deferred && - ops->is_attach_deferred(domain, dev))) - return iommu_attach_device(domain, dev); +ops->is_attach_deferred(domain, dev))) + return iommu_attach_device_no_defer(domain, dev); Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid having to formalise a public interface that nobody else should ever use anyway? That said, unless I've entirely misunderstood the situation I still think that something like the below makes more sense (apologies for broken whitespace). Robin. 
->8- diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2b471419e26c..1a52e530774c 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -704,6 +704,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev) { int ret, i = 0; struct group_device *device; + struct iommu_domain *domain; device = kzalloc(sizeof(*device), GFP_KERNEL); if (!device) @@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev) mutex_lock(&group->mutex); list_add_tail(&device->list, &group->devices); - if (group->domain) - ret = __iommu_attach_device(group->domain, dev); + domain = group->domain; + if (domain && (!domain->ops->is_attach_deferred || + !domain->ops->is_attach_deferred(domain, dev))) + ret = __iommu_attach_device(domain, dev); + } mutex_unlock(&group->mutex); if (ret) goto err_put_group; @@ -1652,9 +1656,6 @@ static int __iommu_attach_device(struct iommu_domain *domain, struct device *dev) { int ret; - if ((domain->ops->is_attach_deferred != NULL) && - domain->ops->is_attach_deferred(domain, dev)) - return 0; if (unlikely(domain->ops->attach_dev == NULL)) return -ENODEV;
Re: [PATCH] iommu: Implement deferred domain attachment
On Fri, May 15, 2020 at 04:42:23PM +0100, Robin Murphy wrote: > > struct iommu_domain *iommu_get_dma_domain(struct device *dev) > > { > > - return dev->iommu_group->default_domain; > > + struct iommu_domain *domain = dev->iommu_group->default_domain; > > + > > + if (__iommu_is_attach_deferred(domain, dev)) > > + __iommu_attach_device_no_defer(domain, dev); > > This raises a red flag, since iommu-dma already has explicit deferred attach > handling where it should need it, immediately after this is called to > retrieve the domain. The whole thing smells to me like we should have an > explicit special-case in iommu_probe_device() rather than hooking > __iommu_attach_device() in general then having to bodge around the fallout > elsewhere. Good point, I missed that. But it didn't work for its only user, the AMD IOMMU driver; the reason is that it calls iommu_attach_device(), which in its code-path checks for deferred attaching again and bails out, without doing the real attachment. But the below updated fix should work. Jerry, could you please test it again? From 4e262dedcd36c7572312c65e66416da74fc78047 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Fri, 15 May 2020 11:25:03 +0200 Subject: [PATCH] iommu: Fix deferred domain attachment The IOMMU core code has support for deferring the attachment of a domain to a device. This is needed in kdump kernels where the new domain must not be attached to a device before the device driver takes it over. When the AMD IOMMU driver got converted to use the dma-iommu implementation, the deferred attaching got lost. The code in dma-iommu.c has support for deferred attaching, but it calls into iommu_attach_device() to actually do it. But iommu_attach_device() will check if the device should be deferred in its code-path and do nothing, breaking deferred attachment. Provide a function in IOMMU core code to reliably attach a device to a domain without any deferred checks and also without other safe-guards. 
Cc: Jerry Snitselaar Cc: Tom Murphy Reported-by: Jerry Snitselaar Fixes: 7959b6f8 ("iommu/dma-iommu: Handle deferred devices") Signed-off-by: Joerg Roedel --- drivers/iommu/dma-iommu.c | 4 ++-- drivers/iommu/iommu.c | 37 - include/linux/iommu.h | 2 ++ 3 files changed, 36 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ba128d1cdaee..403fda04ea98 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev, return 0; if (unlikely(ops->is_attach_deferred && - ops->is_attach_deferred(domain, dev))) - return iommu_attach_device(domain, dev); +ops->is_attach_deferred(domain, dev))) + return iommu_attach_device_no_defer(domain, dev); return 0; } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4050569188be..91dbdbc6d640 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -1889,13 +1890,19 @@ void iommu_domain_free(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_free); -static int __iommu_attach_device(struct iommu_domain *domain, -struct device *dev) +static bool __iommu_is_attach_deferred(struct iommu_domain *domain, + struct device *dev) +{ + if (!domain->ops->is_attach_deferred) + return false; + + return domain->ops->is_attach_deferred(domain, dev); +} + +static int __iommu_attach_device_no_defer(struct iommu_domain *domain, + struct device *dev) { int ret; - if ((domain->ops->is_attach_deferred != NULL) && - domain->ops->is_attach_deferred(domain, dev)) - return 0; if (unlikely(domain->ops->attach_dev == NULL)) return -ENODEV; @@ -1903,9 +1910,29 @@ static int __iommu_attach_device(struct iommu_domain *domain, ret = domain->ops->attach_dev(domain, dev); if (!ret) trace_attach_device_to_domain(dev); + return ret; } +static int __iommu_attach_device(struct iommu_domain *domain, +struct 
device *dev) +{ + if (__iommu_is_attach_deferred(domain, dev)) + return 0; + + return __iommu_attach_device_no_defer(domain, dev); +} + +int iommu_attach_device_no_defer(struct iommu_domain *domain, +struct device *dev) +{ + /* Safe-Guard to only call this when needed */ + if (!is_kdump_kernel()) + return -ENODEV; + + return __iommu_attach_device_no_defer(domain, dev); +} + int iommu_attach_device(struct iommu_domain *domain, struct device *dev) {
Re: [PATCH] iommu: Remove functions that support private domain
On Fri, May 15, 2020 at 08:55:42PM +0800, Lu Baolu wrote: > It seems that we can do like this: > > [1] mutex_lock(&group->lock) > [2] for_each_group_device(device_lock()) > [3] if (for_each_group_device(!device_is_bound())) > change_default_domain() > [4] for_each_group_device_reverse(device_unlock()) > [5] mutex_unlock(&group->lock) The problem here is that I am pretty sure we also have: device_lock() /* from device/driver core code */ -> bus_notifier() -> iommu_bus_notifier() -> ... -> mutex_lock(&group->lock) Which would cause lock-inversion with the above code. Joerg
Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement
Can you please lift the untrusted flag into struct device? It really isn't a PCI specific concept, and we should not have code poking into pci_dev all over the iommu code.
Re: [PATCH] iommu: Implement deferred domain attachment
On 2020-05-15 10:45, Joerg Roedel wrote: From: Joerg Roedel The IOMMU core code has support for deferring the attachment of a domain to a device. This is needed in kdump kernels where the new domain must not be attached to a device before the device driver takes it over. But this needs support from the dma-ops code too, to actually do the late attachment when there are DMA-API calls for the device. This got lost in the AMD IOMMU driver after converting it to the dma-iommu code. Do the late attachment in the dma-iommu code-path to fix the issue. Cc: Jerry Snitselaar Cc: Tom Murphy Reported-by: Jerry Snitselaar Tested-by: Jerry Snitselaar Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api") Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4050569188be..f54ebb964271 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_free); -static int __iommu_attach_device(struct iommu_domain *domain, -struct device *dev) +static bool __iommu_is_attach_deferred(struct iommu_domain *domain, + struct device *dev) +{ + if (!domain->ops->is_attach_deferred) + return false; + + return domain->ops->is_attach_deferred(domain, dev); +} + +static int __iommu_attach_device_no_defer(struct iommu_domain *domain, + struct device *dev) { int ret; - if ((domain->ops->is_attach_deferred != NULL) && - domain->ops->is_attach_deferred(domain, dev)) - return 0; if (unlikely(domain->ops->attach_dev == NULL)) return -ENODEV; @@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain *domain, ret = domain->ops->attach_dev(domain, dev); if (!ret) trace_attach_device_to_domain(dev); + return ret; } +static int __iommu_attach_device(struct iommu_domain *domain, +struct device *dev) +{ + if 
(__iommu_is_attach_deferred(domain, dev)) + return 0; + + return __iommu_attach_device_no_defer(domain, dev); +} + int iommu_attach_device(struct iommu_domain *domain, struct device *dev) { struct iommu_group *group; @@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev); */ struct iommu_domain *iommu_get_dma_domain(struct device *dev) { - return dev->iommu_group->default_domain; + struct iommu_domain *domain = dev->iommu_group->default_domain; + + if (__iommu_is_attach_deferred(domain, dev)) + __iommu_attach_device_no_defer(domain, dev); This raises a red flag, since iommu-dma already has explicit deferred attach handling where it should need it, immediately after this is called to retrieve the domain. The whole thing smells to me like we should have an explicit special-case in iommu_probe_device() rather than hooking __iommu_attach_device() in general then having to bodge around the fallout elsewhere. Robin. + + return domain; } /*
Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote: > Hi, Alex, > > When working on an updated version Yi and I found an design open > which needs your guidance. > > In concept nested translation can be incarnated as one GPA->HPA page > table and multiple GVA->GPA page tables per VM. It means one container > is sufficient to include all SVA-capable devices assigned to the same guest, > as there is just one iova space (GPA->HPA) to be managed by vfio iommu > map/unmap api. GVA->GPA page tables are bound through the new api > introduced in this patch. It is different from legacy shadow translation > which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device requires > its own iova space and must be in a separate container. > > However supporting multiple SVA-capable devices in one container > imposes one challenge. Now the bind_guest_pgtbl is implemented as > iommu type1 api. From the definition of iommu type 1 any operation > should be applied to all devices within the same container, just like > dma map/unmap. However this philosophy is incorrect regarding to > page table binding. We must follow the guest binding requirements > between its page tables and assigned devices, otherwise every bound > address space is suddenly accessible by all assigned devices thus creating > security holes. The above 2 paragraphs are bit confusing :-) but what really matters is the below: > > Do you think whether it's possible to extend iommu api to accept > device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds > also problematic, as PASID and page tables are IOMMU things which > are not touched by vfio device drivers today. All you are referring to is when admin groups multiple devices in a single container, you are saying you can't give isolation to them for SVA purposes. This is logically equivalent to a single group with multiple devices. 
- Such as devices behind p2p bridge - MFD's or devices behind switches or hieararchies without ACS support for instance. By limitation you mean, disable PASID on those devices in a single container? what about ATS? > > Alternatively can we impose the limitation that nesting APIs can be > only enabled for singleton containers which contains only one device? > This basically falls back to the state of legacy shadow translation > vIOMMU. and our current Qemu vIOMMU implementation actually > does this way regardless of whether nesting is used. Do you think > whether such tradeoff is acceptable as a starting point? > > Thanks > Kevin > > > -Original Message- > > From: Liu, Yi L > > Sent: Sunday, March 22, 2020 8:32 PM > > To: alex.william...@redhat.com; eric.au...@redhat.com > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com; > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L > > ; Tian, Jun J ; Sun, Yi Y > > ; jean-phili...@linaro.org; pet...@redhat.com; > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux- > > ker...@vger.kernel.org; Wu, Hao > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host > > > > From: Liu Yi L > > > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by > > hardware > > IOMMUs that have nesting DMA translation (a.k.a dual stage address > > translation). For such hardware IOMMUs, there are two stages/levels of > > address translation, and software may let userspace/VM to own the first- > > level/stage-1 translation structures. Example of such usage is vSVA ( > > virtual Shared Virtual Addressing). VM owns the first-level/stage-1 > > translation structures and bind the structures to host, then hardware > > IOMMU would utilize nesting translation when doing DMA translation fo > > the devices behind such hardware IOMMU. > > > > This patch adds vfio support for binding guest translation (a.k.a stage 1) > > structure to host iommu. 
And for VFIO_TYPE1_NESTING_IOMMU, not only > > bind > > guest page table is needed, it also requires to expose interface to guest > > for iommu cache invalidation when guest modified the first-level/stage-1 > > translation structures since hardware needs to be notified to flush stale > > iotlbs. This would be introduced in next patch. > > > > In this patch, guest page table bind and unbind are done by using flags > > VFIO_IOMMU_BIND_GUEST_PGTBL and > > VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL > > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by > > struct iommu_gpasid_bind_data. Before binding guest page table to host, > > VM should have got a PASID allocated by host via > > VFIO_IOMMU_PASID_REQUEST. > > > > Bind guest translation structures (here is guest page table) to host > > are the first step to setup vSVA (Virtual Shared Virtual Addressing). > > > > Cc: Kevin Tian > > CC: Jacob Pan > > Cc: Alex Williamson > > Cc: Eric Auger > > Cc: Jean-Philippe Brucker > > Signed-off-by: Jean-Philippe Brucker > > Signed-off-by: Liu Yi L > > Signed-off-by: Jacob Pan > > --- > > drivers/vfio/vfio_iommu_type1.c | 158 > >
Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote: >> Meanwhile, for the safety of buffers, lower-layer drivers need to make >> certain the buffers have already been unmapped in iommu before those buffers >> go back to buddy for other users. > > That sounds like it would only have benefit in a very small set of specific > circumstances, and would be very difficult to generalise to buffers that > are mapped via dma_map_page() or dma_map_single(). Furthermore, a > high-level API that affects a low-level driver's interpretation of > mid-layer API calls without the mid-layer's knowledge sounds like a hideous > abomination of anti-design. If a mid-layer API lends itself to inefficiency > at the lower level, it would seem a lot cleaner and more robust to extend > *that* API for stateful buffer reuse. Failing that, it might possibly be > appropriate to approach this at the driver level - many of the cleverer > network drivers already implement buffer pools to recycle mapped SKBs > internally, couldn't the "zip driver" simply try doing something like that > for itself? Exactly. If your upper consumer of the DMA API keeps reusing the same pages just map them once and use dma_sync_* to transfer ownership as needed.
Re: [PATCH] iommu: Implement deferred domain attachment
On Fri, May 15, 2020 at 09:51:03PM +0800, Lu Baolu wrote: > On 2020/5/15 17:45, Joerg Roedel wrote: > > struct iommu_domain *iommu_get_dma_domain(struct device *dev) > > { > > - return dev->iommu_group->default_domain; > > + struct iommu_domain *domain = dev->iommu_group->default_domain; > > + > > + if (__iommu_is_attach_deferred(domain, dev)) > > + __iommu_attach_device_no_defer(domain, dev); > > It seems that the return value needs to be checked. The default domain > is invalid if attach() failed. True, I looked at that, the callers can't handle returning NULL here, so I kept it this way for now. The outcome is that DMA will fail, but otherwise we'd see a NULL-ptr dereference really quickly after returning from that function. Bottom line: This needs to be cleaned up separately. Regards, Joerg
Re: [PATCH] iommu: Implement deferred domain attachment
Hi Joerg, On 2020/5/15 17:45, Joerg Roedel wrote: From: Joerg Roedel The IOMMU core code has support for deferring the attachment of a domain to a device. This is needed in kdump kernels where the new domain must not be attached to a device before the device driver takes it over. But this needs support from the dma-ops code too, to actually do the late attachment when there are DMA-API calls for the device. This got lost in the AMD IOMMU driver after converting it to the dma-iommu code. Do the late attachment in the dma-iommu code-path to fix the issue. Cc: Jerry Snitselaar Cc: Tom Murphy Reported-by: Jerry Snitselaar Tested-by: Jerry Snitselaar Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api") Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4050569188be..f54ebb964271 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_free); -static int __iommu_attach_device(struct iommu_domain *domain, -struct device *dev) +static bool __iommu_is_attach_deferred(struct iommu_domain *domain, + struct device *dev) +{ + if (!domain->ops->is_attach_deferred) + return false; + + return domain->ops->is_attach_deferred(domain, dev); +} + +static int __iommu_attach_device_no_defer(struct iommu_domain *domain, + struct device *dev) { int ret; - if ((domain->ops->is_attach_deferred != NULL) && - domain->ops->is_attach_deferred(domain, dev)) - return 0; if (unlikely(domain->ops->attach_dev == NULL)) return -ENODEV; @@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain *domain, ret = domain->ops->attach_dev(domain, dev); if (!ret) trace_attach_device_to_domain(dev); + return ret; } +static int __iommu_attach_device(struct iommu_domain *domain, +struct device *dev) +{ + if 
(__iommu_is_attach_deferred(domain, dev)) + return 0; + + return __iommu_attach_device_no_defer(domain, dev); +} + int iommu_attach_device(struct iommu_domain *domain, struct device *dev) { struct iommu_group *group; @@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev); */ struct iommu_domain *iommu_get_dma_domain(struct device *dev) { - return dev->iommu_group->default_domain; + struct iommu_domain *domain = dev->iommu_group->default_domain; + + if (__iommu_is_attach_deferred(domain, dev)) + __iommu_attach_device_no_defer(domain, dev); It seems that the return value needs to be checked. The default domain is invalid if attach() failed. Best regards, baolu
Re: [PATCH] iommu: Remove functions that support private domain
Hi, On 2020/5/15 17:59, Joerg Roedel wrote: On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote: +static int is_driver_bound(struct device *dev, void *not_used) +{ + int ret = 0; + + device_lock(dev); + if (device_is_bound(dev)) + ret = 1; + device_unlock(dev); + return ret; +} This locks only one device, so without lock-conversion there could be a driver probe after the device_unlock(), while we are probing the other devices of the group. [SNIP] + /* +* Check if any device in the group still has a driver binded to it. +* This might race with device driver probing code and unfortunately +* there is no clean way out of that either, locking all devices in the +* group and then do the re-attach will introduce a lock-inversion with +* group->mutex - Joerg. +*/ + if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) { + pr_err("Active drivers exist for devices in the group\n"); + return -EBUSY; + } The lock inversion comes into the picture when this code is called from device(-driver) core through the bus-notifiers. The notifiers are called with the device already locked. Another question I have is.. if it's racy then it should be racy even for one device iommu groups.. right? Why would it be racy only with multiple devices iommu group? Valid point. So the device needs to be locked _while_ the default domain change happens. If triggered by sysfs there should be no locking problems, I guess. But you better try it out. It seems that we can do like this: [1] mutex_lock(&group->lock) [2] for_each_group_device(device_lock()) [3] if (for_each_group_device(!device_is_bound())) change_default_domain() [4] for_each_group_device_reverse(device_unlock()) [5] mutex_unlock(&group->lock) A possible problem exists at step 2 when another thread is trying to lock devices in the reverse order at the same time. Best regards, baolu
Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()
On Fri, May 15, 2020 at 02:11:24PM +0200, Jean-Philippe Brucker wrote: > On Fri, May 15, 2020 at 02:01:50PM +0200, Joerg Roedel wrote: > > Hmm, but per patch 1, pci_ats_supported() does not check > > pci_ats_disabled(), or do I miss something? > > The commit message isn't clear. pci_ats_init() sets dev->ats_cap only if > !pci_ats_disabled(), so checking dev->ats_cap in pci_ats_supported() > takes pci_ats_disabled() into account. Right, so the patch is fine: Reviewed-by: Joerg Roedel
Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()
On Fri, May 15, 2020 at 02:01:50PM +0200, Joerg Roedel wrote: > On Fri, May 15, 2020 at 12:44:00PM +0200, Jean-Philippe Brucker wrote: > > The pci_ats_supported() function checks if a device supports ATS and is > > allowed to use it. In addition to checking that the device has an ATS > > capability and that the global pci=noats is not set > > (pci_ats_disabled()), it also checks if a device is untrusted. > > Hmm, but per patch 1, pci_ats_supported() does not check > pci_ats_disabled(), or do I miss something? The commit message isn't clear. pci_ats_init() sets dev->ats_cap only if !pci_ats_disabled(), so checking dev->ats_cap in pci_ats_supported() takes pci_ats_disabled() into account. Thanks, Jean
Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
On 2020-05-15 09:19, Song Bao Hua wrote: [ snip... nice analysis, but ultimately it's still "doing stuff has more overhead than not doing stuff" ] I am thinking of several possible ways to decrease or remove the latency of DMA map/unmap for every single DMA transfer. Meanwhile, "non-strict" mode is an existing option with possible safety issues, so I won't discuss it in this mail. But passthrough and non-strict mode *specifically exist* for the cases where performance is the most important concern - streaming DMA with an IOMMU in the middle has an unavoidable tradeoff between performance and isolation, so dismissing that out of hand is not a good way to start making this argument. 1. Provide bounce coherent buffers for streaming buffers. As the coherent buffers keep their mapping, we can remove the overhead of map and unmap for each single DMA operation. However, this solution requires a memory copy between the streaming buffers and the bounce buffers, so it will only win if the copy is faster than map/unmap. Meanwhile, it will consume much more memory bandwidth. I'm struggling to understand how that would work, can you explain it in more detail? 2. Make upper-layer kernel components aware of the pain of iommu map/unmap. Upper-layer fs, mm and networking code can somehow let the lower-layer drivers know the end of the life cycle of sg buffers. In the zswap case, I have seen zswap always use the same 2 pages as the destination buffers to save compressed pages, but the compressor driver still has to constantly map and unmap those same two pages for every single compression, since zswap and zip drivers work in two completely different software layers. I am thinking of some way as below; upper-layer kernel code can call: sg_init_table(); sg_mark_reusable(); /* use the buffer many times */ sg_mark_stop_reuse(); After that, if low-level drivers see the "reusable" flag, they will realize the buffer can be used multiple times and will not do map/unmap every time.
It means upper-layer components will further use the buffers and the same buffers will probably be given to lower-layer drivers for new DMA transfers later. When upper-layer code sets "stop_reuse", the lower-layer driver will unmap the sg buffers, possibly by providing an unmap callback to the upper-layer components. For the zswap case, I have seen the same buffers are always re-used and the zip driver maps and unmaps them again and again. Shortly after the buffer is unmapped, it is mapped again for the next transmission, almost without any time gap between unmap and map. If zswap can set the "reusable" flag, the zip driver will save a lot of time. Meanwhile, for the safety of buffers, lower-layer drivers need to make certain the buffers have already been unmapped in the IOMMU before those buffers go back to buddy for other users. That sounds like it would only have benefit in a very small set of specific circumstances, and would be very difficult to generalise to buffers that are mapped via dma_map_page() or dma_map_single(). Furthermore, a high-level API that affects a low-level driver's interpretation of mid-layer API calls without the mid-layer's knowledge sounds like a hideous abomination of anti-design. If a mid-layer API lends itself to inefficiency at the lower level, it would seem a lot cleaner and more robust to extend *that* API for stateful buffer reuse. Failing that, it might possibly be appropriate to approach this at the driver level - many of the cleverer network drivers already implement buffer pools to recycle mapped SKBs internally, couldn't the "zip driver" simply try doing something like that for itself? Robin.
Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()
On Fri, May 15, 2020 at 12:44:00PM +0200, Jean-Philippe Brucker wrote: > The pci_ats_supported() function checks if a device supports ATS and is > allowed to use it. In addition to checking that the device has an ATS > capability and that the global pci=noats is not set > (pci_ats_disabled()), it also checks if a device is untrusted. Hmm, but per patch 1, pci_ats_supported() does not check pci_ats_disabled(), or do I miss something? Joerg
Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices
Hi Jean-Philippe, thanks for doing this! On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote: > Add pci_ats_supported(), which checks whether a device has an ATS > capability, and whether it is trusted. A device is untrusted if it is > plugged into an external-facing port such as Thunderbolt and could > spoof an existing device to exploit weaknesses in the IOMMU > configuration. PCIe ATS is one such weakness since it allows > endpoints to cache IOMMU translations and emit transactions with > 'Translated' Address Type (10b) that partially bypass the IOMMU > translation. > > The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions > with 'Translated' Address Type for untrusted devices. Add the check to > pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from > it. > > By checking ats_cap, the pci_ats_supported() helper also returns whether > ATS was globally disabled with pci=noats, and could later include more > things, for example whether the whole PCIe hierarchy down to the > endpoint supports ATS. 
> > Signed-off-by: Jean-Philippe Brucker > --- > include/linux/pci-ats.h | 3 +++ > drivers/pci/ats.c | 18 +- > 2 files changed, 20 insertions(+), 1 deletion(-) > > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h > index d08f0869f1213e..f75c307f346de9 100644 > --- a/include/linux/pci-ats.h > +++ b/include/linux/pci-ats.h > @@ -6,11 +6,14 @@ > > #ifdef CONFIG_PCI_ATS > /* Address Translation Service */ > +bool pci_ats_supported(struct pci_dev *dev); > int pci_enable_ats(struct pci_dev *dev, int ps); > void pci_disable_ats(struct pci_dev *dev); > int pci_ats_queue_depth(struct pci_dev *dev); > int pci_ats_page_aligned(struct pci_dev *dev); > #else /* CONFIG_PCI_ATS */ > +static inline bool pci_ats_supported(struct pci_dev *d) > +{ return false; } > static inline int pci_enable_ats(struct pci_dev *d, int ps) > { return -ENODEV; } > static inline void pci_disable_ats(struct pci_dev *d) { } > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > index 390e92f2d8d1fc..15fa0c37fd8e44 100644 > --- a/drivers/pci/ats.c > +++ b/drivers/pci/ats.c > @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev) > dev->ats_cap = pos; > } > > +/** > + * pci_ats_supported - check if the device can use ATS > + * @dev: the PCI device > + * > + * Returns true if the device supports ATS and is allowed to use it, false > + * otherwise. > + */ > +bool pci_ats_supported(struct pci_dev *dev) > +{ > + if (!dev->ats_cap) > + return false; > + > + return !dev->untrusted; dev->untrusted is an 'unsigned int :1', so while this works I would prefer 'return (dev->untrusted == 0);' here, to be more type-safe. 
With that changed: Reviewed-by: Joerg Roedel > +} > +EXPORT_SYMBOL_GPL(pci_ats_supported); > + > /** > * pci_enable_ats - enable the ATS capability > * @dev: the PCI device > @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps) > u16 ctrl; > struct pci_dev *pdev; > > - if (!dev->ats_cap) > + if (!pci_ats_supported(dev)) > return -EINVAL; > > if (WARN_ON(dev->ats_enabled)) > -- > 2.26.2
[PATCH 4/4] iommu/vt-d: Use pci_ats_supported()
The pci_ats_supported() helper checks if a device supports ATS and is allowed to use it. By checking the ATS capability it also integrates the pci_ats_disabled() check from pci_ats_init(). Simplify the vt-d checks. Acked-by: Lu Baolu Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/intel-iommu.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 0182cff2c7ac75..ed21ce6d123810 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1454,8 +1454,7 @@ static void iommu_enable_dev_iotlb(struct device_domain_info *info) !pci_reset_pri(pdev) && !pci_enable_pri(pdev, 32)) info->pri_enabled = 1; #endif - if (!pdev->untrusted && info->ats_supported && - pci_ats_page_aligned(pdev) && + if (info->ats_supported && pci_ats_page_aligned(pdev) && !pci_enable_ats(pdev, VTD_PAGE_SHIFT)) { info->ats_enabled = 1; domain_update_iotlb(info->domain); @@ -2611,10 +2610,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, if (dev && dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(info->dev); - if (!pdev->untrusted && - !pci_ats_disabled() && - ecap_dev_iotlb_support(iommu->ecap) && - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS) && + if (ecap_dev_iotlb_support(iommu->ecap) && + pci_ats_supported(pdev) && dmar_find_matched_atsr_unit(pdev)) info->ats_supported = 1; -- 2.26.2
[PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices
Add pci_ats_supported(), which checks whether a device has an ATS capability, and whether it is trusted. A device is untrusted if it is plugged into an external-facing port such as Thunderbolt and could spoof an existing device to exploit weaknesses in the IOMMU configuration. PCIe ATS is one such weakness since it allows endpoints to cache IOMMU translations and emit transactions with 'Translated' Address Type (10b) that partially bypass the IOMMU translation. The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions with 'Translated' Address Type for untrusted devices. Add the check to pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from it. By checking ats_cap, the pci_ats_supported() helper also returns whether ATS was globally disabled with pci=noats, and could later include more things, for example whether the whole PCIe hierarchy down to the endpoint supports ATS. Signed-off-by: Jean-Philippe Brucker --- include/linux/pci-ats.h | 3 +++ drivers/pci/ats.c | 18 +- 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h index d08f0869f1213e..f75c307f346de9 100644 --- a/include/linux/pci-ats.h +++ b/include/linux/pci-ats.h @@ -6,11 +6,14 @@ #ifdef CONFIG_PCI_ATS /* Address Translation Service */ +bool pci_ats_supported(struct pci_dev *dev); int pci_enable_ats(struct pci_dev *dev, int ps); void pci_disable_ats(struct pci_dev *dev); int pci_ats_queue_depth(struct pci_dev *dev); int pci_ats_page_aligned(struct pci_dev *dev); #else /* CONFIG_PCI_ATS */ +static inline bool pci_ats_supported(struct pci_dev *d) +{ return false; } static inline int pci_enable_ats(struct pci_dev *d, int ps) { return -ENODEV; } static inline void pci_disable_ats(struct pci_dev *d) { } diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c index 390e92f2d8d1fc..15fa0c37fd8e44 100644 --- a/drivers/pci/ats.c +++ b/drivers/pci/ats.c @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev) dev->ats_cap =
pos; } +/** + * pci_ats_supported - check if the device can use ATS + * @dev: the PCI device + * + * Returns true if the device supports ATS and is allowed to use it, false + * otherwise. + */ +bool pci_ats_supported(struct pci_dev *dev) +{ + if (!dev->ats_cap) + return false; + + return !dev->untrusted; +} +EXPORT_SYMBOL_GPL(pci_ats_supported); + /** * pci_enable_ats - enable the ATS capability * @dev: the PCI device @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps) u16 ctrl; struct pci_dev *pdev; - if (!dev->ats_cap) + if (!pci_ats_supported(dev)) return -EINVAL; if (WARN_ON(dev->ats_enabled)) -- 2.26.2
[PATCH 3/4] iommu/arm-smmu-v3: Use pci_ats_supported()
The new pci_ats_supported() function checks if a device supports ATS and is allowed to use it. Signed-off-by: Jean-Philippe Brucker --- I dropped the Ack because I slightly changed the patch to keep the fwspec check, since last version: https://lore.kernel.org/linux-iommu/20200311124506.208376-8-jean-phili...@linaro.org/ --- drivers/iommu/arm-smmu-v3.c | 20 +--- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 82508730feb7a1..39b935e86ab203 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2652,26 +2652,16 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) } } -#ifdef CONFIG_PCI_ATS static bool arm_smmu_ats_supported(struct arm_smmu_master *master) { - struct pci_dev *pdev; + struct device *dev = master->dev; struct arm_smmu_device *smmu = master->smmu; - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev); - - if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) || - !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) || pci_ats_disabled()) - return false; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); - pdev = to_pci_dev(master->dev); - return !pdev->untrusted && pdev->ats_cap; + return (smmu->features & ARM_SMMU_FEAT_ATS) && + !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) && + dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); } -#else -static bool arm_smmu_ats_supported(struct arm_smmu_master *master) -{ - return false; -} -#endif static void arm_smmu_enable_ats(struct arm_smmu_master *master) { -- 2.26.2
[PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement
I sent these in March as part of ATS enablement for device-tree [1], but haven't found the time to address the largest comment on that series about consolidating the root bridge ATS support between the different ACPI tables. I'm resending only the bits that consolidate the 'untrusted' check for ATS, since there have been more discussions about this [2]. Patch 1 moves the 'untrusted' check to drivers/pci/ats.c and patches 2-4 modify the ATS-capable IOMMU drivers. The only functional change should be to the AMD IOMMU driver. With this change all IOMMU drivers block 'Translated' PCIe transactions and Translation Requests from untrusted devices. [1] https://lore.kernel.org/linux-iommu/20200311124506.208376-1-jean-phili...@linaro.org/ [2] https://lore.kernel.org/linux-pci/20200513151929.GA38418@bjorn-Precision-5520/ Jean-Philippe Brucker (4): PCI/ATS: Only enable ATS for trusted devices iommu/amd: Use pci_ats_supported() iommu/arm-smmu-v3: Use pci_ats_supported() iommu/vt-d: Use pci_ats_supported() include/linux/pci-ats.h | 3 +++ drivers/iommu/amd_iommu.c | 12 drivers/iommu/arm-smmu-v3.c | 20 +--- drivers/iommu/intel-iommu.c | 9 +++-- drivers/pci/ats.c | 18 +- 5 files changed, 32 insertions(+), 30 deletions(-) -- 2.26.2
[PATCH 2/4] iommu/amd: Use pci_ats_supported()
The pci_ats_supported() function checks if a device supports ATS and is allowed to use it. In addition to checking that the device has an ATS capability and that the global pci=noats is not set (pci_ats_disabled()), it also checks if a device is untrusted. A device is untrusted if it is plugged into an external-facing port such as Thunderbolt and could be spoofing an existing device to exploit weaknesses in the IOMMU configuration. By calling pci_ats_supported() we keep DTE[I]=0 for untrusted devices and abort transactions with Pretranslated Addresses. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/amd_iommu.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 1dc3718560d0e8..8b7a9e811d33a6 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -313,16 +313,15 @@ static struct iommu_group *acpihid_device_group(struct device *dev) static bool pci_iommuv2_capable(struct pci_dev *pdev) { static const int caps[] = { - PCI_EXT_CAP_ID_ATS, PCI_EXT_CAP_ID_PRI, PCI_EXT_CAP_ID_PASID, }; int i, pos; - if (pci_ats_disabled()) + if (!pci_ats_supported(pdev)) return false; - for (i = 0; i < 3; ++i) { + for (i = 0; i < 2; ++i) { pos = pci_find_ext_capability(pdev, caps[i]); if (pos == 0) return false; @@ -3150,11 +3149,8 @@ int amd_iommu_device_info(struct pci_dev *pdev, memset(info, 0, sizeof(*info)); - if (!pci_ats_disabled()) { - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS); - if (pos) - info->flags |= AMD_IOMMU_DEVICE_FLAG_ATS_SUP; - } + if (pci_ats_supported(pdev)) + info->flags |= AMD_IOMMU_DEVICE_FLAG_ATS_SUP; pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); if (pos) -- 2.26.2
Re: [PATCH v5 30/38] dmabuf: fix common struct sg_table related issues
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c > index acb26c6..89e293b 100644 > --- a/drivers/dma-buf/udmabuf.c > +++ b/drivers/dma-buf/udmabuf.c > @@ -63,10 +63,9 @@ static struct sg_table *get_sg_table(struct device *dev, > struct dma_buf *buf, > GFP_KERNEL); > if (ret < 0) > goto err; > - if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) { > - ret = -EINVAL; > + ret = dma_map_sgtable(dev, sg, direction, 0); > + if (ret < 0) > goto err; > - } > return sg; > > err: > @@ -78,7 +77,7 @@ static struct sg_table *get_sg_table(struct device *dev, > struct dma_buf *buf, > static void put_sg_table(struct device *dev, struct sg_table *sg, >enum dma_data_direction direction) > { > - dma_unmap_sg(dev, sg->sgl, sg->nents, direction); > + dma_unmap_sgtable(dev, sg, direction, 0); > sg_free_table(sg); > kfree(sg); > } Easy straightforward conversion. Acked-by: Gerd Hoffmann take care, Gerd
Re: [PATCH v5 25/38] drm: virtio: fix common struct sg_table related issues
On Wed, May 13, 2020 at 03:32:32PM +0200, Marek Szyprowski wrote: > The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function > returns the number of the created entries in the DMA address space. > However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and > dma_unmap_sg must be called with the original number of the entries > passed to the dma_map_sg(). > > struct sg_table is a common structure used for describing a non-contiguous > memory buffer, used commonly in the DRM and graphics subsystems. It > consists of a scatterlist with memory pages and DMA addresses (sgl entry), > as well as the number of scatterlist entries: CPU pages (orig_nents entry) > and DMA mapped pages (nents entry). > > It turned out that it was a common mistake to misuse nents and orig_nents > entries, calling DMA-mapping functions with a wrong number of entries or > ignoring the number of mapped entries returned by the dma_map_sg() > function. > > To avoid such issues, let's use common dma-mapping wrappers operating > directly on the struct sg_table objects and use scatterlist page > iterators where possible. This, almost always, hides references to the > nents and orig_nents entries, making the code robust, easier to follow > and copy/paste safe. Looks all sane. Acked-by: Gerd Hoffmann take care, Gerd
Re: [PATCH v2 23/33] iommu/mediatek-v1 Convert to probe/release_device() call-backs
Hi, On Fri, May 15, 2020 at 03:44:59PM +0800, Yong Wu wrote: > On Tue, 2020-04-14 at 15:15 +0200, Joerg Roedel wrote: > > - return iommu_device_link(&data->iommu, dev); > > + err = arm_iommu_attach_device(dev, mtk_mapping); > > + if (err) > > + dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not > > work\n"); > > > Hi Joerg, > > Thanks very much for this patch. > > This arm_iommu_attach_device is called just as we expected. > > But it will fail in this callstack as the group->mutex was tried to > be re-locked... > > [] (iommu_attach_device) from [] > (__arm_iommu_attach_device+0x34/0x90) > [] (__arm_iommu_attach_device) from [] > (arm_iommu_attach_device+0xc/0x20) > [] (arm_iommu_attach_device) from [] > (mtk_iommu_probe_finalize+0x34/0x50) > [] (mtk_iommu_probe_finalize) from [] > (bus_iommu_probe+0x2a8/0x2c4) > [] (bus_iommu_probe) from [] (bus_set_iommu > +0x88/0xd4) > [] (bus_set_iommu) from [] (mtk_iommu_probe > +0x2f8/0x364) Thanks for the report. Is https://lore.kernel.org/lkml/1589530123-30240-1-git-send-email-yong...@mediatek.com/ the fix for this issue, or is something else required? Thanks, Joerg
Re: [PATCH] iommu: Remove functions that support private domain
On Wed, May 13, 2020 at 03:47:21PM -0700, Sai Praneeth Prakhya wrote: > After moving iommu_group setup to iommu core code [1][2] and removing > private domain support in vt-d [3], there are no users for functions such > as iommu_request_dm_for_dev(), iommu_request_dma_domain_for_dev() and > request_default_domain_for_dev(). So, remove these functions. > > [1] commit dce8d6964ebd ("iommu/amd: Convert to probe/release_device() > call-backs") > [2] commit e5d1841f18b2 ("iommu/vt-d: Convert to probe/release_device() > call-backs") > [3] commit 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA > domain") > > Cc: Joerg Roedel > Cc: Lu Baolu > Signed-off-by: Sai Praneeth Prakhya > --- > drivers/iommu/iommu.c | 65 --- > include/linux/iommu.h | 12 > 2 files changed, 77 deletions(-) Applied that patch to the x86/vt-d branch.
Re: [PATCH] iommu: Remove functions that support private domain
On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote: > +static int is_driver_bound(struct device *dev, void *not_used) > +{ > + int ret = 0; > + > + device_lock(dev); > + if (device_is_bound(dev)) > + ret = 1; > + device_unlock(dev); > + return ret; > +} This locks only one device, so without lock-conversion there could be a driver probe after the device_unlock(), while we are probing the other devices of the group. > [SNIP] > > + /* > + * Check if any device in the group still has a driver bound to it. > + * This might race with device driver probing code and unfortunately > + * there is no clean way out of that either, locking all devices in the > + * group and then do the re-attach will introduce a lock-inversion with > + * group->mutex - Joerg. > + */ > + if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) { > + pr_err("Active drivers exist for devices in the group\n"); > + return -EBUSY; > + } The lock inversion comes into the picture when this code is called from device(-driver) core through the bus-notifiers. The notifiers are called with the device already locked. > Another question I have is.. if it's racy then it should be racy even > for one device iommu groups.. right? Why would it be racy only with > multiple devices iommu group? Valid point. So the device needs to be locked _while_ the default domain change happens. If triggered by sysfs there should be no locking problems, I guess. But you better try it out. Regards, Joerg
Re: [PATCH v4 5/5] drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU
Hi, On Wed 13 May 20, 16:07, Maxime Ripard wrote: > The main DRM device is actually a virtual device so it doesn't have the > iommus property, which is instead on the DMA masters, in this case the > mixers. > > Add a call to of_dma_configure with the mixers DT node but on the DRM > virtual device to configure it in the same way as the mixers. Although I'm not very familiar with the DMA API, this looks legit to me and matches what's already done in sun4i_backend for the interconnect. So: Reviewed-by: Paul Kocialkowski Cheers, Paul > Signed-off-by: Maxime Ripard > --- > drivers/gpu/drm/sun4i/sun8i_mixer.c | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/drivers/gpu/drm/sun4i/sun8i_mixer.c > b/drivers/gpu/drm/sun4i/sun8i_mixer.c > index 56cc037fd312..cc4fb916318f 100644 > --- a/drivers/gpu/drm/sun4i/sun8i_mixer.c > +++ b/drivers/gpu/drm/sun4i/sun8i_mixer.c > @@ -363,6 +363,19 @@ static int sun8i_mixer_bind(struct device *dev, struct > device *master, > mixer->engine.ops = _engine_ops; > mixer->engine.node = dev->of_node; > > + if (of_find_property(dev->of_node, "iommus", NULL)) { > + /* > + * This assumes we have the same DMA constraints for > + * all the mixers in our pipeline. This sounds > + * bad, but it has always been the case for us, and > + * DRM doesn't do per-device allocation either, so we > + * would need to fix DRM first... > + */ > + ret = of_dma_configure(drm->dev, dev->of_node, true); > + if (ret) > + return ret; > + } > + > /* >* While this function can fail, we shouldn't do anything >* if this happens. 
Some early DE2 DT entries don't provide > -- > git-series 0.9.1 > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- Paul Kocialkowski, Bootlin Embedded Linux and kernel engineering https://bootlin.com
[PATCH] iommu: Implement deferred domain attachment
From: Joerg Roedel The IOMMU core code has support for deferring the attachment of a domain to a device. This is needed in kdump kernels where the new domain must not be attached to a device before the device driver takes it over. But this needs support from the dma-ops code too, to actually do the late attachment when there are DMA-API calls for the device. This got lost in the AMD IOMMU driver after converting it to the dma-iommu code. Do the late attachment in the dma-iommu code-path to fix the issue. Cc: Jerry Snitselaar Cc: Tom Murphy Reported-by: Jerry Snitselaar Tested-by: Jerry Snitselaar Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api") Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4050569188be..f54ebb964271 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_free); -static int __iommu_attach_device(struct iommu_domain *domain, -struct device *dev) +static bool __iommu_is_attach_deferred(struct iommu_domain *domain, + struct device *dev) +{ + if (!domain->ops->is_attach_deferred) + return false; + + return domain->ops->is_attach_deferred(domain, dev); +} + +static int __iommu_attach_device_no_defer(struct iommu_domain *domain, + struct device *dev) { int ret; - if ((domain->ops->is_attach_deferred != NULL) && - domain->ops->is_attach_deferred(domain, dev)) - return 0; if (unlikely(domain->ops->attach_dev == NULL)) return -ENODEV; @@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain *domain, ret = domain->ops->attach_dev(domain, dev); if (!ret) trace_attach_device_to_domain(dev); + return ret; } +static int __iommu_attach_device(struct iommu_domain *domain, +struct device *dev) +{ + if (__iommu_is_attach_deferred(domain, dev)) + return 0; + + return 
__iommu_attach_device_no_defer(domain, dev); +} + int iommu_attach_device(struct iommu_domain *domain, struct device *dev) { struct iommu_group *group; @@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev); */ struct iommu_domain *iommu_get_dma_domain(struct device *dev) { - return dev->iommu_group->default_domain; + struct iommu_domain *domain = dev->iommu_group->default_domain; + + if (__iommu_is_attach_deferred(domain, dev)) + __iommu_attach_device_no_defer(domain, dev); + + return domain; } /* -- 2.25.1
Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
Hi Russell & All, In many streaming DMA map/unmap use cases, lower-layer device drivers have no idea how and when single/sg buffers are allocated and freed by the upper-layer filesystem, network protocol, mm management system etc. So the only thing device drivers can do is constantly map the buffer before DMA begins and unmap it when DMA is done. This dramatically increases the latency of dma_map_single/sg and dma_unmap_single/sg when these APIs are bound to the IOMMU backend: for each map, the IOMMU driver needs to allocate an IOVA and set up the mapping in the IOMMU, and for each unmap, it needs to free the IOVA and tear the mapping down in the IOMMU hardware. When devices performing DMA are super-fast, for example on 100GbE networks, the streaming DMA map/unmap latency can become a critical system bottleneck. In comparison to the streaming DMA APIs, DMA consistent APIs using the IOMMU backend may show much better performance, as the map is done when the buffer is allocated and the unmap is done when the buffer is freed. DMA can be done multiple times before the buffers are freed by dma_free_coherent(). There is no per-transfer map and unmap overhead as with the streaming APIs. The typical work flow is like dma_alloc_coherent -> doing DMA -> doing DMA -> doing DMA -> /* DMA many times */ dma_free_coherent However, the typical work flow for streaming DMA is like dma_map_sg -> doing DMA -> dma_unmap_sg -> dma_map_sg -> doing DMA -> dma_unmap_sg -> dma_map_sg -> doing DMA -> dma_unmap_sg -> /* map, DMA transfer, unmap many times */ Even though upper-layer software might use the same buffers multiple times, for each single DMA transfer, map and unmap still need to be done by lower-level drivers, as they don't know this fact. A possible routine to improve the performance of the streaming APIs is like: dma_map_sg -> dma_sync_sg_for_device -> doing DMA -> dma_sync_sg_for_device -> doing DMA -> dma_sync_sg_for_device -> doing DMA -> ... 
->/* sync between DMA and CPU many times */ dma_unmap_sg For every single DMA, software only needs to do sync operations which are much lighter that map and unmap. But this case is often not applicable to device drivers as the buffers usually come from the upper-layer filesystem, network protocol, mm management system etc. Device drivers have to work with the assumption that the buffer will be freed immediately after DMA is done. However, for those device drivers which are able to allocate and free the DMA stream buffers by themselves, they will get benefits of reusing the same buffers for doing DMA multiple times without map/unmap overhead. I collected some latency data for iommu_dma_map_sg and iommu_dma_unmap_sg. In the test case, zswap is calling acomp APIs to compress/decompress pages, and comp/decomp is done by lower-level hardware ZIP driver. root@ubuntu:/usr/share/bcc/tools# ./funclatency iommu_dma_map_sg Tracing 1 functions for "iommu_dma_map_sg"... Hit Ctrl-C to end. ^C nsecs : count distribution 0 -> 1 : 0|| 2 -> 3 : 0|| 4 -> 7 : 0|| 8 -> 15 : 0|| 16 -> 31 : 0|| 32 -> 63 : 0|| 64 -> 127: 0|| 128 -> 255: 0|| 256 -> 511: 0|| 512 -> 1023 : 0|| 1024 -> 2047 : 2274570 |*** | 2048 -> 4095 : 3896310 || 4096 -> 8191 : 74499|| 8192 -> 16383 : 4475 || 16384 -> 32767 : 1519 || 32768 -> 65535 : 480 || 65536 -> 131071 : 286 || 131072 -> 262143 : 18 || 262144 -> 524287 : 2|| root@ubuntu:/usr/share/bcc/tools# ./funclatency iommu_dma_unmap_sg Tracing 1 functions for "iommu_dma_unmap_sg"... Hit Ctrl-C to end. ^C nsecs : count distribution 0 -> 1 : 0|| 2 -> 3 : 0|| 4 -> 7 : 0|
Re: [PATCH v6] iommu/arm-smmu-qcom: Request direct mapping for modem device
Hey Will,

On 2020-05-11 23:25, Sibi Sankar wrote:

The modem remote processor has two access paths to DDR. One path is directly connected to DDR and the other goes through an SMMU. The SMMU path is configured as a direct mapping because it is used by various peripherals in the modem subsystem. Typically this direct mapping is configured statically at EL2 by QHEE (Qualcomm's Hypervisor Execution Environment) before the kernel is entered. In certain firmware configurations, especially when the kernel is already in full control of the SMMU, programming the modem SIDs is deferred to the kernel. Let's add compatibles here so that we can have the kernel program the SIDs for the modem in these cases.

Signed-off-by: Sibi Sankar
---
Now that the patch is reworded, can you please pick it up, since it's the only pending patch from the series.

V6
* Rebased on Will's for-joerg/arm-smmu/updates
* Reword commit message and add more details [Stephen]

 drivers/iommu/arm-smmu-qcom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 5bedf21587a56..cf01d0215a397 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -17,7 +17,9 @@ static const struct of_device_id qcom_smmu_client_of_match[] = {
 	{ .compatible = "qcom,mdp4" },
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
+	{ .compatible = "qcom,sc7180-mss-pil" },
 	{ .compatible = "qcom,sdm845-mdss" },
+	{ .compatible = "qcom,sdm845-mss-pil" },
 	{ }
 };
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/mediatek-v1: Add def_domain_type
The MediaTek V1 IOMMU is an arm32 IOMMU whose default domain type is IOMMU_DOMAIN_UNMANAGED. Add this callback so that bus_iommu_probe can enter "probe_finalize". The iommu framework creates an iommu domain for each device, but all the devices share one iommu domain here, thus we skip all the other domains in "attach_device" except the domain we create internally with arm_iommu_create_mapping. Also a minor change: in attach_device, "data" is never NULL, so remove "if (!data) return".

Signed-off-by: Yong Wu
---
a. rebase on linux-next.
b. After this patch and with the mutex issue fixed (locally I only move mutex_unlock(&group->mutex) before __iommu_group_dma_attach(group)), mtk_iommu_v1.c works normally.
---
 drivers/iommu/mtk_iommu_v1.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 7bdd74c..f353b07 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -265,10 +265,13 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 {
 	struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+	struct dma_iommu_mapping *mtk_mapping;
 	int ret;

-	if (!data)
-		return -ENODEV;
+	/* Only allow the domain created internally. */
+	mtk_mapping = data->dev->archdata.iommu;
+	if (mtk_mapping->domain != domain)
+		return 0;

 	if (!data->m4u_dom) {
 		data->m4u_dom = dom;
@@ -288,9 +291,6 @@ static void mtk_iommu_detach_device(struct iommu_domain *domain,
 {
 	struct mtk_iommu_data *data = dev_iommu_priv_get(dev);

-	if (!data)
-		return;
-
 	mtk_iommu_config(data, dev, false);
 }

@@ -416,6 +416,11 @@ static int mtk_iommu_create_mapping(struct device *dev,
 	return 0;
 }

+static int mtk_iommu_def_domain_type(struct device *dev)
+{
+	return IOMMU_DOMAIN_UNMANAGED;
+}
+
 static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -525,6 +530,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
 	.probe_device = mtk_iommu_probe_device,
 	.probe_finalize = mtk_iommu_probe_finalize,
 	.release_device = mtk_iommu_release_device,
+	.def_domain_type = mtk_iommu_def_domain_type,
 	.device_group = generic_device_group,
 	.pgsize_bitmap = ~0UL << MT2701_IOMMU_PAGE_SHIFT,
 };
--
1.9.1
Re: [PATCH v2 23/33] iommu/mediatek-v1 Convert to probe/release_device() call-backs
On Tue, 2020-04-14 at 15:15 +0200, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Convert the Mediatek-v1 IOMMU driver to use the probe_device() and
> release_device() call-backs of iommu_ops, so that the iommu core code
> does the group and sysfs setup.
>
> Signed-off-by: Joerg Roedel
> ---
>  drivers/iommu/mtk_iommu_v1.c | 50 +++-
>  1 file changed, 20 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index a31be05601c9..7bdd74c7cb9f 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -416,14 +416,12 @@ static int mtk_iommu_create_mapping(struct device *dev,
>  	return 0;
>  }
>
> -static int mtk_iommu_add_device(struct device *dev)
> +static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
>  {
>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -	struct dma_iommu_mapping *mtk_mapping;
>  	struct of_phandle_args iommu_spec;
>  	struct of_phandle_iterator it;
>  	struct mtk_iommu_data *data;
> -	struct iommu_group *group;
>  	int err;
>
>  	of_for_each_phandle(&it, err, dev->of_node, "iommus",
> @@ -442,35 +440,28 @@ static int mtk_iommu_add_device(struct device *dev)
>  	}
>
>  	if (!fwspec || fwspec->ops != &mtk_iommu_ops)
> -		return -ENODEV; /* Not a iommu client device */
> +		return ERR_PTR(-ENODEV); /* Not a iommu client device */
>
> -	/*
> -	 * This is a short-term bodge because the ARM DMA code doesn't
> -	 * understand multi-device groups, but we have to call into it
> -	 * successfully (and not just rely on a normal IOMMU API attach
> -	 * here) in order to set the correct DMA API ops on @dev.
> -	 */
> -	group = iommu_group_alloc();
> -	if (IS_ERR(group))
> -		return PTR_ERR(group);
> +	data = dev_iommu_priv_get(dev);
>
> -	err = iommu_group_add_device(group, dev);
> -	iommu_group_put(group);
> -	if (err)
> -		return err;
> +	return &data->iommu;
> +}
>
> -	data = dev_iommu_priv_get(dev);
> +static void mtk_iommu_probe_finalize(struct device *dev)
> +{
> +	struct dma_iommu_mapping *mtk_mapping;
> +	struct mtk_iommu_data *data;
> +	int err;
> +
> +	data = dev_iommu_priv_get(dev);
>  	mtk_mapping = data->dev->archdata.iommu;
> -	err = arm_iommu_attach_device(dev, mtk_mapping);
> -	if (err) {
> -		iommu_group_remove_device(dev);
> -		return err;
> -	}
>
> -	return iommu_device_link(&data->iommu, dev);
> +	err = arm_iommu_attach_device(dev, mtk_mapping);
> +	if (err)
> +		dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not work\n");

Hi Joerg,

Thanks very much for this patch. This arm_iommu_attach_device is called just as we expected. But it will fail in this callstack, because group->mutex is re-locked...

[] (iommu_attach_device) from [] (__arm_iommu_attach_device+0x34/0x90)
[] (__arm_iommu_attach_device) from [] (arm_iommu_attach_device+0xc/0x20)
[] (arm_iommu_attach_device) from [] (mtk_iommu_probe_finalize+0x34/0x50)
[] (mtk_iommu_probe_finalize) from [] (bus_iommu_probe+0x2a8/0x2c4)
[] (bus_iommu_probe) from [] (bus_set_iommu+0x88/0xd4)
[] (bus_set_iommu) from [] (mtk_iommu_probe+0x2f8/0x364)

> }
>
> -static void mtk_iommu_remove_device(struct device *dev)
> +static void mtk_iommu_release_device(struct device *dev)
> {
>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>  	struct mtk_iommu_data *data;
> @@ -479,9 +470,6 @@ static void mtk_iommu_remove_device(struct device *dev)
>  		return;
>
>  	data = dev_iommu_priv_get(dev);
> -	iommu_device_unlink(&data->iommu, dev);
> -
> -	iommu_group_remove_device(dev);
>  	iommu_fwspec_free(dev);
> }
>
> @@ -534,8 +522,10 @@ static const struct iommu_ops mtk_iommu_ops = {
>  	.map		= mtk_iommu_map,
>  	.unmap		= mtk_iommu_unmap,
>  	.iova_to_phys	= mtk_iommu_iova_to_phys,
> -	.add_device	= mtk_iommu_add_device,
> -	.remove_device	= mtk_iommu_remove_device,
> +	.probe_device	= mtk_iommu_probe_device,
> +	.probe_finalize = mtk_iommu_probe_finalize,
> +	.release_device = mtk_iommu_release_device,
> +	.device_group	= generic_device_group,
>  	.pgsize_bitmap	= ~0UL << MT2701_IOMMU_PAGE_SHIFT,
> };
(a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
Hi, Alex,

When working on an updated version, Yi and I found a design open which needs your guidance.

In concept, nested translation can be incarnated as one GPA->HPA page table and multiple GVA->GPA page tables per VM. It means one container is sufficient to include all SVA-capable devices assigned to the same guest, as there is just one iova space (GPA->HPA) to be managed by the vfio iommu map/unmap api. GVA->GPA page tables are bound through the new api introduced in this patch. This is different from legacy shadow translation, which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires its own iova space and must be in a separate container.

However, supporting multiple SVA-capable devices in one container imposes one challenge. Currently bind_guest_pgtbl is implemented as an iommu type1 api. By the definition of iommu type1, any operation should be applied to all devices within the same container, just like dma map/unmap. However, this philosophy is incorrect regarding page table binding. We must follow the guest's binding requirements between its page tables and assigned devices; otherwise every bound address space is suddenly accessible by all assigned devices, creating security holes.

Do you think it's possible to extend the iommu api to accept device-specific commands? If not, moving it to vfio-pci or vfio-mdev sounds also problematic, as PASIDs and page tables are IOMMU things which are not touched by vfio device drivers today.

Alternatively, can we impose the limitation that nesting APIs can only be enabled for singleton containers which contain only one device? This basically falls back to the state of legacy shadow translation vIOMMU, and our current Qemu vIOMMU implementation actually does it this way regardless of whether nesting is used. Do you think such a tradeoff is acceptable as a starting point?
Thanks
Kevin

> -Original Message-
> From: Liu, Yi L
> Sent: Sunday, March 22, 2020 8:32 PM
> To: alex.william...@redhat.com; eric.au...@redhat.com
> Cc: Tian, Kevin ; jacob.jun@linux.intel.com; j...@8bytes.org;
> Raj, Ashok ; Liu, Yi L ; Tian, Jun J ; Sun, Yi Y ;
> jean-phili...@linaro.org; pet...@redhat.com;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> linux-ker...@vger.kernel.org; Wu, Hao
> Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
>
> From: Liu Yi L
>
> VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
> IOMMUs that have nesting DMA translation (a.k.a dual stage address
> translation). For such hardware IOMMUs, there are two stages/levels of
> address translation, and software may let userspace/VM own the
> first-level/stage-1 translation structures. An example of such usage is
> vSVA (virtual Shared Virtual Addressing): the VM owns the
> first-level/stage-1 translation structures and binds the structures to
> the host, then the hardware IOMMU uses nesting translation when doing
> DMA translation for the devices behind such a hardware IOMMU.
>
> This patch adds vfio support for binding the guest translation (a.k.a
> stage 1) structure to the host iommu. And for VFIO_TYPE1_NESTING_IOMMU,
> not only is binding the guest page table needed, it is also required to
> expose an interface to the guest for iommu cache invalidation when the
> guest modifies the first-level/stage-1 translation structures, since
> hardware needs to be notified to flush stale iotlbs. This will be
> introduced in the next patch.
>
> In this patch, guest page table bind and unbind are done by using the
> flags VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL
> under the IOCTL VFIO_IOMMU_BIND; the bind/unbind data are conveyed by
> struct iommu_gpasid_bind_data. Before binding a guest page table to the
> host, the VM should have got a PASID allocated by the host via
> VFIO_IOMMU_PASID_REQUEST.
>
> Binding guest translation structures (here the guest page table) to the
> host is the first step to set up vSVA (Virtual Shared Virtual
> Addressing).
>
> Cc: Kevin Tian
> CC: Jacob Pan
> Cc: Alex Williamson
> Cc: Eric Auger
> Cc: Jean-Philippe Brucker
> Signed-off-by: Jean-Philippe Brucker
> Signed-off-by: Liu Yi L
> Signed-off-by: Jacob Pan
> ---
>  drivers/vfio/vfio_iommu_type1.c | 158
>  include/uapi/linux/vfio.h       | 46
>  2 files changed, 204 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 82a9e0b..a877747 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -130,6 +130,33 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>  		(!list_empty(&iommu->domain_list))
>
> +struct domain_capsule {
> +	struct iommu_domain *domain;
> +	void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> +			int (*fn)(struct device *dev, void *data),
> +			void *data)