RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
> From: Raj, Ashok 
> Sent: Friday, May 15, 2020 11:20 PM
> 
> On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote:
> > Hi, Alex,
> >
> > While working on an updated version, Yi and I found an open design
> > question which needs your guidance.
> >
> > Conceptually, nested translation can be realized as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. This means one container
> > is sufficient to include all SVA-capable devices assigned to the same guest,
> > as there is just one IOVA space (GPA->HPA) to be managed by the vfio iommu
> > map/unmap API. GVA->GPA page tables are bound through the new API
> > introduced in this patch. This differs from legacy shadow translation,
> > which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires
> > its own IOVA space and must be in a separate container.
> >
> > However, supporting multiple SVA-capable devices in one container
> > imposes a challenge. Currently bind_guest_pgtbl is implemented as an
> > iommu type1 API. By the definition of iommu type1, any operation
> > should be applied to all devices within the same container, just like
> > DMA map/unmap. However, this philosophy is wrong for page table
> > binding. We must follow the guest's binding between its page tables
> > and assigned devices; otherwise every bound address space suddenly
> > becomes accessible to all assigned devices, creating security holes.
> 
> The above 2 paragraphs are a bit confusing :-) but what really matters
> is the below:
> >
> > Do you think it's possible to extend the iommu API to accept
> > device-specific commands? If not, moving it to vfio-pci or vfio-mdev
> > also sounds problematic, as PASIDs and page tables are IOMMU things
> > which are not touched by vfio device drivers today.
> 
> All you are referring to is that when the admin groups multiple devices
> in a single container, you can't give them isolation for SVA purposes.
> This is logically equivalent to a single group with multiple devices:
> 
>   - such as devices behind a P2P bridge
>   - MFDs, or devices behind switches or hierarchies without ACS
> support, for instance.
> 
> By limitation you mean disabling PASID on those devices in a single
> container?

The limitation means disabling the nesting capability in such a container,
and yes, it implies not exposing the PASID capability to userspace either.

> 
> what about ATS?

ATS is possibly fine. VFIO exposes it to userspace even today.

Thanks
Kevin

> 
> >
> > Alternatively, can we impose the limitation that nesting APIs can
> > only be enabled for singleton containers which contain only one device?
> > This basically falls back to the state of the legacy shadow-translation
> > vIOMMU, and our current QEMU vIOMMU implementation actually
> > works this way regardless of whether nesting is used. Do you think
> > such a tradeoff is acceptable as a starting point?
> >
> > Thanks
> > Kevin
> >
> > > -Original Message-
> > > From: Liu, Yi L 
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > > To: alex.william...@redhat.com; eric.au...@redhat.com
> > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > > ; Tian, Jun J ; Sun, Yi Y
> > > ; jean-phili...@linaro.org; pet...@redhat.com;
> > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Wu, Hao 
> > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > >
> > > From: Liu Yi L 
> > >
> > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > > hardware IOMMUs that have nesting DMA translation (a.k.a. dual-stage
> > > address translation). For such hardware IOMMUs, there are two
> > > stages/levels of address translation, and software may let userspace/the
> > > VM own the first-level/stage-1 translation structures. An example of such
> > > usage is vSVA (virtual Shared Virtual Addressing). The VM owns the
> > > first-level/stage-1 translation structures and binds the structures to
> > > the host; the hardware IOMMU then uses nesting translation when doing DMA
> > > translation for the devices behind it.
> > >
> > > This patch adds vfio support for binding guest translation (a.k.a.
> > > stage-1) structures to the host iommu. For VFIO_TYPE1_NESTING_IOMMU, not
> > > only is binding the guest page table needed, an interface must also be
> > > exposed to the guest for iommu cache invalidation when the guest modifies
> > > the first-level/stage-1 translation structures, since hardware needs to
> > > be notified to flush stale iotlbs. This is introduced in the next patch.
> > >
> > > In this patch, guest page table bind and unbind are done using the
> > > flags VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL under
> > > the VFIO_IOMMU_BIND ioctl; the bind/unbind data are conveyed by
> > > struct iommu_gpasid_bind_data. Before binding a guest page table to the
> > > host, the VM should have got a PASID allocated by the host via
> > > 

RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Saturday, May 16, 2020 1:59 AM
> 
> On Fri, 15 May 2020 07:39:14 +
> "Tian, Kevin"  wrote:
> 
> > Hi, Alex,
> >
> > While working on an updated version, Yi and I found an open design
> > question which needs your guidance.
> >
> > Conceptually, nested translation can be realized as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. This means one container
> > is sufficient to include all SVA-capable devices assigned to the same guest,
> > as there is just one IOVA space (GPA->HPA) to be managed by the vfio iommu
> > map/unmap API. GVA->GPA page tables are bound through the new API
> > introduced in this patch. This differs from legacy shadow translation,
> > which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires
> > its own IOVA space and must be in a separate container.
> 
> I think you've redefined the container as specifically IOVA context,
> but I think a container is actually more of a shared IOMMU context
> between groups and devices within those groups.  Userspace is under no
> obligation to share a container between groups and the kernel is under
> no obligation to let groups share a container.

Oh, sorry, I didn't redefine anything here. I was just describing usage
examples of containers when a vIOMMU is used. Before vSVA there is just one
address space per IOMMU context, which is what the current vfio iommu API is
designed for. When extending it to support vSVA, there are two layers of
multiple address spaces per IOMMU context, and the sharing possibilities of
those address spaces differ. This open question is about whether we want to
allow partial sharing of IOMMU context within a container, i.e. sharing the
2nd level while allowing binding to different 1st levels.

> 
> > However, supporting multiple SVA-capable devices in one container
> > imposes a challenge. Currently bind_guest_pgtbl is implemented as an
> > iommu type1 API. By the definition of iommu type1, any operation
> > should be applied to all devices within the same container, just like
> > DMA map/unmap. However, this philosophy is wrong for page table
> > binding. We must follow the guest's binding between its page tables
> > and assigned devices; otherwise every bound address space suddenly
> > becomes accessible to all assigned devices, creating security holes.
> 
> Is that a security hole, or is that simply the nature of a container?
> Containers are meant to allow a user to share an IOMMU context between
> groups of devices.  Sharing that context necessarily implies that the
> user is granting the devices access to those address spaces.

It's the nature of the container. I was just trying to state that the current
API is insufficient if we want to allow partial sharing within a container.

> 
> > Do you think it's possible to extend the iommu API to accept
> > device-specific commands? If not, moving it to vfio-pci or vfio-mdev
> > also sounds problematic, as PASIDs and page tables are IOMMU things
> > which are not touched by vfio device drivers today.
> 
> Is this really that different from a group holding multiple devices,
> each of which has a unique requester ID as seen by the IOMMU?  The
> IOMMU can support separate contexts per device, but the topology
> restricts us that those contexts are not fully isolated.  So we've
> imposed the group restriction on ourselves to reflect that.  If a user
> wants to share an IOMMU context between devices, maybe that precludes
> the user from making use of nested translation.

Agree.

> 
> > Alternatively, can we impose the limitation that nesting APIs can
> > only be enabled for singleton containers which contain only one device?
> > This basically falls back to the state of the legacy shadow-translation
> > vIOMMU, and our current QEMU vIOMMU implementation actually
> > works this way regardless of whether nesting is used. Do you think
> > such a tradeoff is acceptable as a starting point?
> 
> I think it's an acceptable starting point.  It seems what you're
> describing is separating a monolithic notion of IOMMU context into
> multiple layers, so we can share them at different points, ex. share a
> GPA->HPA context among multiple different GVA->GPA contexts.  That
> seems to imply something like the sub/super-container idea that's been
> tossed around before, but never been fully defined or explored.  Thanks,

Yes, that'd be a future extension; we'll start with the singleton
limitation for now.
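
For reference, a minimal sketch of what that singleton restriction could look
like on the vfio type1 side. The helper below and the place it would be called
from (the nesting/bind paths) are assumptions for illustration only, not code
from this series; the struct and field names are taken from today's
vfio_iommu_type1.c, but treat the whole thing as a sketch:

static bool vfio_iommu_is_singleton(struct vfio_iommu *iommu)
{
	struct vfio_domain *domain;
	struct vfio_group *group;
	int groups = 0;

	/* count the iommu groups attached to this container */
	list_for_each_entry(domain, &iommu->domain_list, next)
		list_for_each_entry(group, &domain->group_list, next)
			groups++;

	/*
	 * A complete check would also verify that the single group holds a
	 * single device, e.g. by counting with iommu_group_for_each_dev().
	 */
	return groups == 1;
}

The nesting/bind ioctl paths would then return an error (e.g. -EINVAL) whenever
this check fails, so guest page table binding is only ever exposed on singleton
containers.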

Thanks
Kevin

> 
> > > -Original Message-
> > > From: Liu, Yi L 
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > > To: alex.william...@redhat.com; eric.au...@redhat.com
> > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > > ; Tian, Jun J ; Sun, Yi Y
> > > ; jean-phili...@linaro.org; pet...@redhat.com;
> > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Wu, Hao 
> > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables 

RE: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread Song Bao Hua
> Subject: Re: Constantly map and unmap of streaming DMA buffers with
> IOMMU backend might cause serious performance problem
> 
> On 2020-05-15 09:19, Song Bao Hua wrote:
> [ snip... nice analysis, but ultimately it's still "doing stuff has more
> overhead than not doing stuff" ]
> 
> > I am thinking about several possible ways of decreasing or removing the
> > latency of DMA map/unmap for every single DMA transfer. Meanwhile, since
> > "non-strict" is an existing option with possible safety issues, I won't
> > discuss it in this mail.
> 
> But passthrough and non-strict mode *specifically exist* for the cases where
> performance is the most important concern - streaming DMA with an IOMMU
> in the middle has an unavoidable tradeoff between performance and isolation,
> so dismissing that out of hand is not a good way to start making this
> argument.

I do understand there is a tradeoff between performance and isolation.
However, users might ask for performance while still requiring isolation.
In passthrough mode, the whole of memory may be accessible by DMA. In
non-strict mode, a buffer could still be mapped in the IOMMU after its user
has returned it to the buddy allocator, and the buffer may even have been
reallocated to another user.

> 
> > 1. Provide bounce coherent buffers for streaming buffers.
> > As the coherent buffers stay mapped, we can remove the overhead of map and
> > unmap for each single DMA operation. However, this solution requires a
> > memory copy between streaming buffers and bounce buffers, so it will only
> > pay off if the copy is faster than map/unmap. Meanwhile, it will consume
> > much more memory bandwidth.
> 
> I'm struggling to understand how that would work, can you explain it in more
> detail?

Lower-layer drivers maintain some reusable coherent buffers.
For the TX path, the driver copies the streaming buffer into a coherent buffer and then does DMA;
for the RX path, the driver does DMA into a coherent buffer and then copies it into the streaming buffer.
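
A rough sketch of that idea, with invented driver names; hw_submit_tx() stands
in for the device-specific submission path and is not a real API. The coherent
buffer is allocated and mapped once, so per-request map/unmap is replaced by a
copy:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/string.h>

struct bounce_buf {
	void		*vaddr;	/* CPU address of the coherent buffer */
	dma_addr_t	dma;	/* device address, valid for the buffer's lifetime */
	size_t		size;
};

int hw_submit_tx(struct device *dev, dma_addr_t addr, size_t len); /* placeholder */

static int bounce_buf_init(struct device *dev, struct bounce_buf *bb, size_t size)
{
	bb->vaddr = dma_alloc_coherent(dev, size, &bb->dma, GFP_KERNEL);
	if (!bb->vaddr)
		return -ENOMEM;
	bb->size = size;
	return 0;
}

/* TX: copy the streaming buffer and hand the pre-mapped address to hardware */
static int bounce_tx(struct device *dev, struct bounce_buf *bb,
		     const void *src, size_t len)
{
	if (len > bb->size)
		return -EINVAL;
	memcpy(bb->vaddr, src, len);
	return hw_submit_tx(dev, bb->dma, len);
}

static void bounce_buf_free(struct device *dev, struct bounce_buf *bb)
{
	dma_free_coherent(dev, bb->size, bb->vaddr, bb->dma);
}

Whether this wins depends entirely on the copy cost versus the map/unmap cost,
as noted above.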

> 
> > 2. Make upper-layer kernel components aware of the pain of iommu
> > map/unmap. Upper-layer fs, mm and networking code could somehow let the
> > lower-layer drivers know the end of the life cycle of sg buffers. In the
> > zswap case, I have seen zswap always use the same two pages as the
> > destination buffers for saving compressed pages, but the compressor driver
> > still has to constantly map and unmap those same two pages for every single
> > compression, since the zswap and zip drivers work in two completely
> > different software layers.
> >
> > I am thinking of something like the below; upper-layer kernel code can call:
> > sg_init_table();
> > sg_mark_reusable();
> >  /* use the buffer many times */
> > 
> > sg_mark_stop_reuse();
> >
> > After that, if a low-level driver sees the "reusable" flag, it knows the
> > buffer can be used multiple times and will not do map/unmap every time. It
> > means upper-layer components will keep using the buffers, and the same
> > buffers will probably be given to lower-layer drivers for new DMA transfers
> > later. When upper-layer code sets "stop_reuse", the lower-layer driver
> > unmaps the sg buffers, possibly via an unmap callback provided to the
> > upper-layer components. For the zswap case, I have seen the same buffers
> > always being re-used while the zip driver maps and unmaps them again and
> > again. Shortly after a buffer is unmapped, it is mapped again for the next
> > transmission, almost without any time gap between unmap and map. If zswap
> > could set the "reusable" flag, the zip driver would save a lot of time.
> > Meanwhile, for the safety of the buffers, lower-layer drivers need to make
> > certain the buffers have already been unmapped in the iommu before those
> > buffers go back to the buddy allocator for other users.
> 
> That sounds like it would only have benefit in a very small set of specific
> circumstances, and would be very difficult to generalise to buffers that are
> mapped via dma_map_page() or dma_map_single().

Yes, indeed. Hopefully the small set of specific circumstances will encourage
more upper-layer consumers to reuse buffers; then the "reusable" flag can be
extended to more common cases, such as dma_map_page() and dma_map_single()
buffers.
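
To make the proposed flag concrete, here is a sketch of how a lower-layer
driver could honour it. sg_is_reusable() and the stop-reuse callback do not
exist today; they are stand-ins for the proposed sg_mark_reusable() /
sg_mark_stop_reuse() API, and a non-coherent device would additionally need
dma_sync_sg_*() calls around each transfer while the mapping is being reused:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* one cached long-lived mapping per driver context (hypothetical) */
struct sg_reuse_cache {
	struct scatterlist	*sgl;	/* table we already mapped */
	int			nents;	/* number of mapped entries */
	enum dma_data_direction	dir;
};

static int drv_map_request(struct device *dev, struct sg_reuse_cache *cache,
			   struct scatterlist *sgl, int nents,
			   enum dma_data_direction dir)
{
	/* sg_is_reusable() is the proposed, non-existent flag check */
	if (sg_is_reusable(sgl) && cache->sgl == sgl && cache->dir == dir)
		return cache->nents;	/* reuse the mapping, skip dma_map_sg() */

	nents = dma_map_sg(dev, sgl, nents, dir);
	if (nents && sg_is_reusable(sgl) && !cache->sgl) {
		cache->sgl = sgl;
		cache->nents = nents;
		cache->dir = dir;
	}
	return nents;
}

/* called back when the upper layer does the proposed sg_mark_stop_reuse() */
static void drv_stop_reuse(struct device *dev, struct sg_reuse_cache *cache)
{
	if (cache->sgl)
		dma_unmap_sg(dev, cache->sgl, cache->nents, cache->dir);
	cache->sgl = NULL;
}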

> Furthermore, a high-level API that affects a low-level driver's
> interpretation of mid-layer API calls without the mid-layer's knowledge
> sounds like a hideous abomination of anti-design. If a mid-layer API lends
> itself to inefficiency at the lower level, it would seem a lot cleaner and
> more robust to extend *that* API for stateful buffer reuse.

I absolutely agree. I didn't say the method is elegant. For the moment, maybe
"reuse" could get started with a small case like zswap. After a while, more
users might be encouraged to optimize for buffer reuse, understanding the
suffering of lower-layer drivers; then those performance problems could be
solved case by case.
On the other hand, it is always up to the upper-layer code whether to indicate
"reuse" or not. If it doesn't say anything about reuse, lower-layer drivers
can simply map and unmap as today.

> Failing that, it 

Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread Robin Murphy

On 2020-05-15 22:33, Song Bao Hua wrote:

Subject: Re: Constantly map and unmap of streaming DMA buffers with
IOMMU backend might cause serious performance problem

On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:

Meanwhile, for the safety of buffers, lower-layer drivers need to make

certain the buffers have already been unmapped in iommu before those
buffers go back to buddy for other users.


That sounds like it would only have benefit in a very small set of specific
circumstances, and would be very difficult to generalise to buffers that
are mapped via dma_map_page() or dma_map_single(). Furthermore, a
high-level API that affects a low-level driver's interpretation of
mid-layer API calls without the mid-layer's knowledge sounds like a hideous
abomination of anti-design. If a mid-layer API lends itself to inefficiency
at the lower level, it would seem a lot cleaner and more robust to extend
*that* API for stateful buffer reuse. Failing that, it might possibly be
appropriate to approach this at the driver level - many of the cleverer
network drivers already implement buffer pools to recycle mapped SKBs
internally, couldn't the "zip driver" simply try doing something like that
for itself?


Exactly.  If your upper consumer of the DMA API keeps reusing the same
pages just map them once and use dma_sync_* to transfer ownership as
needed.


The problem is that the lower-layer drivers don't know if the upper consumer
keeps reusing the same pages. They are running in different software layers.
For example, the consumer is here in mm/zswap.c:
static int zswap_frontswap_store(unsigned type, pgoff_t offset,
struct page *page)
{
...
/* compress */
dst = get_cpu_var(zswap_dstmem);
...
ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
...
}

But the lower-layer driver is in drivers/crypto/...

Meanwhile, the lower-layer driver can't cache the buffer pointers coming from
consumers to detect whether the upper layer is reusing the same page, because
the same page might come from different users, or from different stages of the
same user with different permissions.


Indeed the driver can't cache arbitrary pointers, but if typical buffers 
are small enough it can copy the data into its own already-mapped page, 
dma_sync it, and perform the DMA operation from there. That might even 
be more or less what your first suggestion was, but I'm still not quite 
sure.
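
As a sketch of that approach (names invented; hw_queue_compress() stands in
for the real submission path): the driver maps one page of its own once, and
each request then only costs a memcpy plus a dma_sync on the long-lived
mapping:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/string.h>

struct drv_staging {
	struct page	*page;
	void		*vaddr;
	dma_addr_t	dma;	/* mapped once, reused for every request */
};

int hw_queue_compress(struct device *dev, dma_addr_t src, size_t len); /* placeholder */

static int drv_staging_init(struct device *dev, struct drv_staging *st)
{
	st->page = alloc_page(GFP_KERNEL);
	if (!st->page)
		return -ENOMEM;
	st->vaddr = page_address(st->page);
	st->dma = dma_map_page(dev, st->page, 0, PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, st->dma)) {
		__free_page(st->page);
		return -ENOMEM;
	}
	return 0;
}

static int drv_submit(struct device *dev, struct drv_staging *st,
		      const void *src, size_t len)
{
	if (len > PAGE_SIZE)
		return -EINVAL;
	/* take the buffer back from the device, fill it, hand it back */
	dma_sync_single_for_cpu(dev, st->dma, len, DMA_TO_DEVICE);
	memcpy(st->vaddr, src, len);
	dma_sync_single_for_device(dev, st->dma, len, DMA_TO_DEVICE);
	return hw_queue_compress(dev, st->dma, len);
}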



For example, consumer A uses the buffer as a destination and then returns it
to the buddy allocator, but consumer B gets the same buffer and uses it as a
source.

Another possibility: consumer A uses the buffer, returns it to the buddy
allocator, and after some time allocates a buffer again and happens to get the
same buffer back.

For the safety of the buffer, the lower-layer driver must guarantee the buffer
is unmapped when it returns to the buddy allocator.

I think only the upper-layer consumer knows whether it is reusing the buffer.


Right, and if reusing buffers is common in crypto callers, then there's 
an argument for "set up reusable buffer", "process updated buffer" and 
"clean up buffer" operations to be added to the crypto API itself, such 
that the underlying drivers can then optimise for DMA usage in a robust 
and obvious way if they want to (or just implement the setup and 
teardown as no-ops and still do a full map/unmap in each "process" call 
if they don't).


Robin.


RE: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread Song Bao Hua
> Subject: Re: Constantly map and unmap of streaming DMA buffers with
> IOMMU backend might cause serious performance problem
> 
> On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:
> > Meanwhile, for the safety of buffers, lower-layer drivers need to make
> > certain the buffers have already been unmapped in iommu before those
> > buffers go back to buddy for other users.
> >
> > That sounds like it would only have benefit in a very small set of specific
> > circumstances, and would be very difficult to generalise to buffers that
> > are mapped via dma_map_page() or dma_map_single(). Furthermore, a
> > high-level API that affects a low-level driver's interpretation of
> > mid-layer API calls without the mid-layer's knowledge sounds like a hideous
> > abomination of anti-design. If a mid-layer API lends itself to inefficiency
> > at the lower level, it would seem a lot cleaner and more robust to extend
> > *that* API for stateful buffer reuse. Failing that, it might possibly be
> > appropriate to approach this at the driver level - many of the cleverer
> > network drivers already implement buffer pools to recycle mapped SKBs
> > internally, couldn't the "zip driver" simply try doing something like that
> > for itself?
> 
> Exactly.  If your upper consumer of the DMA API keeps reusing the same
> pages just map them once and use dma_sync_* to transfer ownership as
> needed.

The problem is that the lower-layer drivers don't know if the upper consumer
keeps reusing the same pages. They are running in different software layers.
For example, the consumer is here in mm/zswap.c:
static int zswap_frontswap_store(unsigned type, pgoff_t offset,
struct page *page)
{
...
/* compress */
dst = get_cpu_var(zswap_dstmem);
...
ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
...
}

But the lower-layer driver is in drivers/crypto/...

Meanwhile, the lower-layer driver can't cache the buffer pointers coming from
consumers to detect whether the upper layer is reusing the same page, because
the same page might come from different users, or from different stages of the
same user with different permissions.
 
For example, consumer A uses the buffer as a destination and then returns it
to the buddy allocator, but consumer B gets the same buffer and uses it as a
source.

Another possibility: consumer A uses the buffer, returns it to the buddy
allocator, and after some time allocates a buffer again and happens to get the
same buffer back.

For the safety of the buffer, the lower-layer driver must guarantee the buffer
is unmapped when it returns to the buddy allocator.

I think only the upper-layer consumer knows whether it is reusing the buffer.

Thanks
Barry




Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices

2020-05-15 Thread Bjorn Helgaas
On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote:
> Add pci_ats_supported(), which checks whether a device has an ATS
> capability, and whether it is trusted.  A device is untrusted if it is
> plugged into an external-facing port such as Thunderbolt and could
> spoof an existing device to exploit weaknesses in the IOMMU
> configuration.  PCIe ATS is one such weakness, since it allows
> endpoints to cache IOMMU translations and emit transactions with
> 'Translated' Address Type (10b) that partially bypass the IOMMU
> translation.
> 
> The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
> with 'Translated' Address Type for untrusted devices.  Add the check to
> pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
> it.
> 
> By checking ats_cap, the pci_ats_supported() helper also returns whether
> ATS was globally disabled with pci=noats, and could later include more
> things, for example whether the whole PCIe hierarchy down to the
> endpoint supports ATS.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci-ats.h |  3 +++
>  drivers/pci/ats.c   | 18 +++++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index d08f0869f1213e..f75c307f346de9 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,11 +6,14 @@
>  
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
> +bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
>  #else /* CONFIG_PCI_ATS */
> +static inline bool pci_ats_supported(struct pci_dev *d)
> +{ return false; }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps)
>  { return -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1fc..15fa0c37fd8e44 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
>   dev->ats_cap = pos;
>  }
>  
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> + if (!dev->ats_cap)
> + return false;
> +
> + return !dev->untrusted;
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_supported);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   u16 ctrl;
>   struct pci_dev *pdev;
>  
> - if (!dev->ats_cap)
> + if (!pci_ats_supported(dev))
>   return -EINVAL;
>  
>   if (WARN_ON(dev->ats_enabled))
> -- 
> 2.26.2
> 
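
For context, the caller-side effect is that IOMMU drivers no longer need to
open-code the untrusted/capability checks before enabling ATS. A hedged sketch
of what a driver's ATS setup can reduce to (the surrounding function is
invented for illustration):

#include <linux/pci.h>
#include <linux/pci-ats.h>

static void example_iommu_enable_ats(struct pci_dev *pdev)
{
	/*
	 * pci_ats_supported() covers "has an ATS capability", "not globally
	 * disabled with pci=noats" and "not an untrusted/external device",
	 * so no separate pdev->untrusted check is needed here.
	 */
	if (!pci_ats_supported(pdev))
		return;

	if (pci_enable_ats(pdev, PAGE_SHIFT))
		dev_info(&pdev->dev, "failed to enable ATS\n");
}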


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Robin Murphy

On 2020-05-15 19:26, Joerg Roedel wrote:

On Fri, May 15, 2020 at 05:28:53PM +0100, Robin Murphy wrote:

On 2020-05-15 17:14, Joerg Roedel wrote:

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..403fda04ea98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
return 0;
if (unlikely(ops->is_attach_deferred &&
-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
+ops->is_attach_deferred(domain, dev)))
+   return iommu_attach_device_no_defer(domain, dev);


Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid
having to formalise a public interface that nobody else should ever use
anyway?


That would omit the ops->attach_dev != NULL check and the trace-point on
device attach. Besides that, it would be a layering violation. But the
function is of course entirely internal to the iommu subsystem and is a
good candidate to be moved to a header file in drivers/iommu.


Sure, checking the pointer before calling was implied, but the 
tracepoint is a good argument, I'd forgotten about that :)



@@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group *group,
struct device *dev)

 mutex_lock(&group->mutex);
 list_add_tail(&device->list, &group->devices);
-   if (group->domain)
-   ret = __iommu_attach_device(group->domain, dev);
+   domain = group->domain;
+   if (domain && (!domain->ops->is_attach_deferred ||
+  !domain->ops->is_attach_deferred(domain, dev)))
+   ret = __iommu_attach_device(domain, dev);
+   }
 mutex_unlock(&group->mutex);
 if (ret)
 goto err_put_group;


No, doing this in iommu_group_add_device() doesn't solve the problem.
The attach must not happen before a device driver took control of the
device and silenced any DMA initiated by the old kernel. At probe time
this isn't guaranteed.


But that's not what this is; this is (supposed to be) the exact same 
"don't actually perform the attach yet" logic as before, just 
restricting it to default domains in the one place that it actually 
needs to be, so as not to fundamentally bugger up iommu_attach_device() 
in a way that prevents it from working as expected at the correct point 
later.


Thinking a bit more, consider if the driver resets the device then 
attaches it straight to its own unmanaged domain rather than calling any 
DMA ops (e.g. VFIO?) - it looks like that would also be totally broken 
right now, and no amount of bodges in iommu-dma is going to help there.


Robin.


Re: [PATCH 2/2] dt-bindings: arm-smmu: Add sc7180 compatible string

2020-05-15 Thread Doug Anderson
Hi,

On Fri, May 1, 2020 at 3:30 AM Sharat Masetty  wrote:
>
> This patch simply adds a new compatible string for SC7180 platform.
>
> Signed-off-by: Sharat Masetty 
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index 6515dbe..986098b 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -28,6 +28,7 @@ properties:
>- enum:
>- qcom,msm8996-smmu-v2
>- qcom,msm8998-smmu-v2
> +  - qcom,sc7180-smmu-v2
>- qcom,sdm845-smmu-v2
>- const: qcom,smmu-v2

Is anything blocking this patch from landing now?

-Doug


RE: [PATCH] iommu: Remove functions that support private domain

2020-05-15 Thread Prakhya, Sai Praneeth
Hi Joerg,

> -Original Message-
> From: Joerg Roedel 
> Sent: Friday, May 15, 2020 2:59 AM
> To: Prakhya, Sai Praneeth 
> Cc: iommu@lists.linux-foundation.org; Lu Baolu 
> Subject: Re: [PATCH] iommu: Remove functions that support private domain
> 
> On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote:
> > +static int is_driver_bound(struct device *dev, void *not_used) {
> > +   int ret = 0;
> > +
> > +   device_lock(dev);
> > +   if (device_is_bound(dev))
> > +   ret = 1;
> > +   device_unlock(dev);
> > +   return ret;
> > +}
> 
> This locks only one device, so without lock-conversion there could be a driver
> probe after the device_unlock(), while we are probing the other devices of the
> group.
> 
> > [SNIP]
> >
> > +   /*
> > +* Check if any device in the group still has a driver binded to it.
> > +* This might race with device driver probing code and unfortunately
> > +* there is no clean way out of that either, locking all devices in the
> > +* group and then do the re-attach will introduce a lock-inversion with
> > +* group->mutex - Joerg.
> > +*/
> > +   if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) {
> > +   pr_err("Active drivers exist for devices in the group\n");
> > +   return -EBUSY;
> > +   }
> 
> The lock inversion comes into the picture when this code is called from
> device(-driver) core through the bus-notifiers. The notifiers are called
> with the device already locked.

Makes sense. I will look through that code.

> > Another question I have is: if it's racy, then it should be racy even
> > for single-device iommu groups, right? Why would it be racy only with
> > multi-device iommu groups?
> 
> Valid point. So the device needs to be locked _while_ the default domain
> change happens. If triggered by sysfs there should be no locking problems,
> I guess. But you'd better try it out.

I will try this out and will update you.

Regards,
Sai
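
A rough sketch of the sysfs-triggered flow being discussed, assuming a
single-device group; iommu_change_default_domain() is a placeholder name for
whatever helper ends up performing the actual switch:

static int try_change_default_domain(struct iommu_group *group,
				     struct device *dev, unsigned int type)
{
	int ret = -EBUSY;

	/*
	 * Hold the device lock across the change so a driver cannot bind
	 * concurrently. This is only safe from the sysfs path; the
	 * bus-notifier path already holds the device lock, as discussed above.
	 */
	device_lock(dev);
	if (!device_is_bound(dev))
		ret = iommu_change_default_domain(group, type); /* placeholder */
	device_unlock(dev);

	return ret;
}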


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 05:28:53PM +0100, Robin Murphy wrote:
> On 2020-05-15 17:14, Joerg Roedel wrote:
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index ba128d1cdaee..403fda04ea98 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
> > return 0;
> > if (unlikely(ops->is_attach_deferred &&
> > -   ops->is_attach_deferred(domain, dev)))
> > -   return iommu_attach_device(domain, dev);
> > +ops->is_attach_deferred(domain, dev)))
> > +   return iommu_attach_device_no_defer(domain, dev);
> 
> Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid
> having to formalise a public interface that nobody else should ever use
> anyway?

That would omit the ops->attach_dev != NULL check and the trace-point on
device attach. Besides that, it would be a layering violation. But the
function is of course entirely internal to the iommu subsystem and is a
good candidate to be moved to a header file in drivers/iommu.

> @@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
> 
> mutex_lock(&group->mutex);
> list_add_tail(&device->list, &group->devices);
> -   if (group->domain)
> -   ret = __iommu_attach_device(group->domain, dev);
> +   domain = group->domain;
> +   if (domain && (!domain->ops->is_attach_deferred ||
> +  !domain->ops->is_attach_deferred(domain, dev)))
> +   ret = __iommu_attach_device(domain, dev);
> +   }
> mutex_unlock(&group->mutex);
> if (ret)
> goto err_put_group;

No, doing this in iommu_group_add_device() doesn't solve the problem.
The attach must not happen before a device driver took control of the
device and silenced any DMA initiated by the old kernel. At probe time
this isn't guaranteed.


Joerg



Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Alex Williamson
On Fri, 15 May 2020 07:39:14 +
"Tian, Kevin"  wrote:

> Hi, Alex,
> 
> While working on an updated version, Yi and I found an open design
> question which needs your guidance.
> 
> Conceptually, nested translation can be realized as one GPA->HPA page
> table and multiple GVA->GPA page tables per VM. This means one container
> is sufficient to include all SVA-capable devices assigned to the same guest,
> as there is just one IOVA space (GPA->HPA) to be managed by the vfio iommu
> map/unmap API. GVA->GPA page tables are bound through the new API
> introduced in this patch. This differs from legacy shadow translation,
> which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires
> its own IOVA space and must be in a separate container.

I think you've redefined the container as specifically IOVA context,
but I think a container is actually more of a shared IOMMU context
between groups and devices within those groups.  Userspace is under no
obligation to share a container between groups and the kernel is under
no obligation to let groups share a container.

> However, supporting multiple SVA-capable devices in one container
> imposes a challenge. Currently bind_guest_pgtbl is implemented as an
> iommu type1 API. By the definition of iommu type1, any operation
> should be applied to all devices within the same container, just like
> DMA map/unmap. However, this philosophy is wrong for page table
> binding. We must follow the guest's binding between its page tables
> and assigned devices; otherwise every bound address space suddenly
> becomes accessible to all assigned devices, creating security holes.

Is that a security hole, or is that simply the nature of a container?
Containers are meant to allow a user to share an IOMMU context between
groups of devices.  Sharing that context necessarily implies that the
user is granting the devices access to those address spaces.

> Do you think it's possible to extend the iommu API to accept
> device-specific commands? If not, moving it to vfio-pci or vfio-mdev
> also sounds problematic, as PASIDs and page tables are IOMMU things
> which are not touched by vfio device drivers today.

Is this really that different from a group holding multiple devices,
each of which has a unique requester ID as seen by the IOMMU?  The
IOMMU can support separate contexts per device, but the topology
restricts us that those contexts are not fully isolated.  So we've
imposed the group restriction on ourselves to reflect that.  If a user
wants to share an IOMMU context between devices, maybe that precludes
the user from making use of nested translation.
 
> Alternatively, can we impose the limitation that nesting APIs can
> only be enabled for singleton containers which contain only one device?
> This basically falls back to the state of the legacy shadow-translation
> vIOMMU, and our current QEMU vIOMMU implementation actually
> works this way regardless of whether nesting is used. Do you think
> such a tradeoff is acceptable as a starting point?

I think it's an acceptable starting point.  It seems what you're
describing is separating a monolithic notion of IOMMU context into
multiple layers, so we can share them at different points, ex. share a
GPA->HPA context among multiple different GVA->GPA contexts.  That
seems to imply something like the sub/super-container idea that's been
tossed around before, but never been fully defined or explored.  Thanks,

Alex

> > -Original Message-
> > From: Liu, Yi L 
> > Sent: Sunday, March 22, 2020 8:32 PM
> > To: alex.william...@redhat.com; eric.au...@redhat.com
> > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > ; Tian, Jun J ; Sun, Yi Y
> > ; jean-phili...@linaro.org; pet...@redhat.com;
> > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Wu, Hao 
> > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > 
> > From: Liu Yi L 
> > 
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > hardware IOMMUs that have nesting DMA translation (a.k.a. dual-stage
> > address translation). For such hardware IOMMUs, there are two
> > stages/levels of address translation, and software may let userspace/the
> > VM own the first-level/stage-1 translation structures. An example of such
> > usage is vSVA (virtual Shared Virtual Addressing). The VM owns the
> > first-level/stage-1 translation structures and binds the structures to
> > the host; the hardware IOMMU then uses nesting translation when doing DMA
> > translation for the devices behind it.
> > 
> > This patch adds vfio support for binding guest translation (a.k.a.
> > stage-1) structures to the host iommu. For VFIO_TYPE1_NESTING_IOMMU, not
> > only is binding the guest page table needed, an interface must also be
> > exposed to the guest for iommu cache invalidation when the guest modifies
> > the first-level/stage-1 translation structures, since 

Re: [PATCH 03/14] docs: fix references for DMA*.txt files

2020-05-15 Thread Jonathan Corbet
On Fri,  1 May 2020 17:37:47 +0200
Mauro Carvalho Chehab  wrote:

> As we moved those files to core-api, fix references to point
> to their newer locations.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/PCI/pci.rst  |  6 +++---
>  Documentation/block/biodoc.rst |  2 +-
>  Documentation/core-api/bus-virt-phys-mapping.rst   |  2 +-
>  Documentation/core-api/dma-api.rst |  6 +++---
>  Documentation/core-api/dma-isa-lpc.rst |  2 +-
>  Documentation/driver-api/usb/dma.rst   |  6 +++---
>  Documentation/memory-barriers.txt  |  6 +++---
>  .../translations/ko_KR/memory-barriers.txt |  6 +++---
>  arch/ia64/hp/common/sba_iommu.c| 12 ++--
>  arch/parisc/kernel/pci-dma.c   |  2 +-
>  arch/x86/include/asm/dma-mapping.h |  4 ++--
>  arch/x86/kernel/amd_gart_64.c  |  2 +-
>  drivers/parisc/sba_iommu.c | 14 +++---
>  include/linux/dma-mapping.h|  2 +-
>  include/media/videobuf-dma-sg.h|  2 +-
>  kernel/dma/debug.c |  2 +-
>  16 files changed, 38 insertions(+), 38 deletions(-)

...of course, not applying patch 2 causes this one to not apply, so
holding off for now...

jon


Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement

2020-05-15 Thread Will Deacon
Hi,

On Fri, May 15, 2020 at 10:19:49AM -0700, Raj, Ashok wrote:
> On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote:
> > Can you please lift the untrusted flag into struct device?  It really
> > isn't a PCI specific concept, and we should not have code poking into
> > pci_dev all over the iommu code.
> 
> Just for clarification: all IOMMUs today mostly pertain only to PCI devices,
> and devices that aren't PCI, like HPET for instance, are given a PCI
> identifier. Facilities like ATS, for instance, require the platform to have
> an IOMMU.
> 
> what additional problems does moving this to struct device solve?

ATS is PCI specific, but IOMMUs certainly aren't! The vast majority of
IOMMUs deployed in arm/arm64 SoCs are /not/ using any sort of PCI.

Will


Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement

2020-05-15 Thread Raj, Ashok
Hi Christoph

On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote:
> Can you please lift the untrusted flag into struct device?  It really
> isn't a PCI specific concept, and we should not have code poking into
> pci_dev all over the iommu code.

Just for clarification: all IOMMUs today mostly pertain only to PCI devices,
and devices that aren't PCI, like HPET for instance, are given a PCI
identifier. Facilities like ATS, for instance, require the platform to have
an IOMMU.

what additional problems does moving this to struct device solve?

Cheers,
Ashok



Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Robin Murphy

On 2020-05-15 17:14, Joerg Roedel wrote:

On Fri, May 15, 2020 at 04:42:23PM +0100, Robin Murphy wrote:

   struct iommu_domain *iommu_get_dma_domain(struct device *dev)
   {
-   return dev->iommu_group->default_domain;
+   struct iommu_domain *domain = dev->iommu_group->default_domain;
+
+   if (__iommu_is_attach_deferred(domain, dev))
+   __iommu_attach_device_no_defer(domain, dev);


This raises a red flag, since iommu-dma already has explicit deferred attach
handling where it should need it, immediately after this is called to
retrieve the domain. The whole thing smells to me like we should have an
explicit special-case in iommu_probe_device() rather than hooking
__iommu_attach_device() in general then having to bodge around the fallout
elsewhere.


Good point, I missed that. But it didn't work for its only user, the
AMD IOMMU driver; the reason is that it calls iommu_attach_device(),
which in its code path checks for deferred attaching again and bails
out without doing the real attachment.

But below updated fix should work. Jerry, could you please test it
again?

 From 4e262dedcd36c7572312c65e66416da74fc78047 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Fri, 15 May 2020 11:25:03 +0200
Subject: [PATCH] iommu: Fix deferred domain attachment

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

When the AMD IOMMU driver got converted to use the dma-iommu
implementation, the deferred attaching got lost. The code in
dma-iommu.c has support for deferred attaching, but it calls into
iommu_attach_device() to actually do it. But iommu_attach_device()
will check if the device should be deferred in its code path and do
nothing, breaking deferred attachment.

Provide a function in IOMMU core code to reliably attach a device to a
domain without any deferred checks and also without other safe-guards.

Cc: Jerry Snitselaar 
Cc: Tom Murphy 
Reported-by: Jerry Snitselaar 
Fixes: 7959b6f8 ("iommu/dma-iommu: Handle deferred devices")
Signed-off-by: Joerg Roedel 
---
  drivers/iommu/dma-iommu.c |  4 ++--
  drivers/iommu/iommu.c | 37 -
  include/linux/iommu.h |  2 ++
  3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..403fda04ea98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
return 0;
  
  	if (unlikely(ops->is_attach_deferred &&

-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
+ops->is_attach_deferred(domain, dev)))
+   return iommu_attach_device_no_defer(domain, dev);


Wouldn't it be simpler to just invoke ops->attach_dev directly and avoid 
having to formalise a public interface that nobody else should ever use 
anyway?


That said, unless I've entirely misunderstood the situation I still 
think that something like the below makes more sense (apologies for 
broken whitespace).


Robin.

->8-
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..1a52e530774c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -704,6 +704,7 @@ int iommu_group_add_device(struct iommu_group 
*group, struct device *dev)

 {
int ret, i = 0;
struct group_device *device;
+   struct iommu_domain *domain;

device = kzalloc(sizeof(*device), GFP_KERNEL);
if (!device)
@@ -746,8 +747,11 @@ int iommu_group_add_device(struct iommu_group 
*group, struct device *dev)


 mutex_lock(&group->mutex);
 list_add_tail(&device->list, &group->devices);
-   if (group->domain)
-   ret = __iommu_attach_device(group->domain, dev);
+   domain = group->domain;
+   if (domain && (!domain->ops->is_attach_deferred ||
+  !domain->ops->is_attach_deferred(domain, dev)))
+   ret = __iommu_attach_device(domain, dev);
+   }
 mutex_unlock(&group->mutex);
if (ret)
goto err_put_group;
@@ -1652,9 +1656,6 @@ static int __iommu_attach_device(struct 
iommu_domain *domain,

 struct device *dev)
 {
int ret;
-   if ((domain->ops->is_attach_deferred != NULL) &&
-   domain->ops->is_attach_deferred(domain, dev))
-   return 0;

if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 04:42:23PM +0100, Robin Murphy wrote:
> >   struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> >   {
> > -   return dev->iommu_group->default_domain;
> > +   struct iommu_domain *domain = dev->iommu_group->default_domain;
> > +
> > +   if (__iommu_is_attach_deferred(domain, dev))
> > +   __iommu_attach_device_no_defer(domain, dev);
> 
> This raises a red flag, since iommu-dma already has explicit deferred attach
> handling where it should need it, immediately after this is called to
> retrieve the domain. The whole thing smells to me like we should have an
> explicit special-case in iommu_probe_device() rather than hooking
> __iommu_attach_device() in general then having to bodge around the fallout
> elsewhere.

Good point, I missed that. But it didn't work for its only user, the
AMD IOMMU driver; the reason is that it calls iommu_attach_device(),
which in its code path checks for deferred attaching again and bails
out without doing the real attachment.

But below updated fix should work. Jerry, could you please test it
again?

From 4e262dedcd36c7572312c65e66416da74fc78047 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Fri, 15 May 2020 11:25:03 +0200
Subject: [PATCH] iommu: Fix deferred domain attachment

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

When the AMD IOMMU driver got converted to use the dma-iommu
implementation, the deferred attaching got lost. The code in
dma-iommu.c has support for deferred attaching, but it calls into
iommu_attach_device() to actually do it. But iommu_attach_device()
will check if the device should be deferred in its code path and do
nothing, breaking deferred attachment.

Provide a function in IOMMU core code to reliably attach a device to a
domain without any deferred checks and also without other safe-guards.

Cc: Jerry Snitselaar 
Cc: Tom Murphy 
Reported-by: Jerry Snitselaar 
Fixes: 7959b6f8 ("iommu/dma-iommu: Handle deferred devices")
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/dma-iommu.c |  4 ++--
 drivers/iommu/iommu.c | 37 -
 include/linux/iommu.h |  2 ++
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..403fda04ea98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -362,8 +362,8 @@ static int iommu_dma_deferred_attach(struct device *dev,
return 0;
 
if (unlikely(ops->is_attach_deferred &&
-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
+ops->is_attach_deferred(domain, dev)))
+   return iommu_attach_device_no_defer(domain, dev);
 
return 0;
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4050569188be..91dbdbc6d640 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -1889,13 +1890,19 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
-static int __iommu_attach_device(struct iommu_domain *domain,
-struct device *dev)
+static bool __iommu_is_attach_deferred(struct iommu_domain *domain,
+  struct device *dev)
+{
+   if (!domain->ops->is_attach_deferred)
+   return false;
+
+   return domain->ops->is_attach_deferred(domain, dev);
+}
+
+static int __iommu_attach_device_no_defer(struct iommu_domain *domain,
+ struct device *dev)
 {
int ret;
-   if ((domain->ops->is_attach_deferred != NULL) &&
-   domain->ops->is_attach_deferred(domain, dev))
-   return 0;
 
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;
@@ -1903,9 +1910,29 @@ static int __iommu_attach_device(struct iommu_domain 
*domain,
ret = domain->ops->attach_dev(domain, dev);
if (!ret)
trace_attach_device_to_domain(dev);
+
return ret;
 }
 
+static int __iommu_attach_device(struct iommu_domain *domain,
+struct device *dev)
+{
+   if (__iommu_is_attach_deferred(domain, dev))
+   return 0;
+
+   return __iommu_attach_device_no_defer(domain, dev);
+}
+
+int iommu_attach_device_no_defer(struct iommu_domain *domain,
+struct device *dev)
+{
+   /* Safe-Guard to only call this when needed */
+   if (!is_kdump_kernel())
+   return -ENODEV;
+
+   return __iommu_attach_device_no_defer(domain, dev);
+}
+
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 {
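
For background, the driver-side half of this mechanism is the
is_attach_deferred() callback. A simplified sketch of what such a callback
typically looks like (the per-device data type, lookup helper and field name
are invented here; the real AMD and VT-d implementations differ in detail):

struct example_dev_data {
	bool defer_attach;	/* set at probe time when running in a kdump kernel */
};

struct example_dev_data *example_get_dev_data(struct device *dev); /* driver-private lookup */

/*
 * Report whether attaching @dev to @domain should be postponed, so the old
 * kernel's ongoing DMA keeps working until the new device driver has taken
 * the device over.
 */
static bool example_is_attach_deferred(struct iommu_domain *domain,
				       struct device *dev)
{
	struct example_dev_data *dev_data = example_get_dev_data(dev);

	return dev_data && dev_data->defer_attach;
}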

Re: [PATCH] iommu: Remove functions that support private domain

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 08:55:42PM +0800, Lu Baolu wrote:
> It seems that we can do like this:
> 
> [1] mutex_lock(&group->lock)
> [2] for_each_group_device(device_lock())
> [3] if (for_each_group_device(!device_is_bound()))
>   change_default_domain()
> [4] for_each_group_device_reverse(device_unlock())
> [5] mutex_unlock(&group->lock)

The problem here is that I am pretty sure we also have:

device_lock() /* from device/driver core code */
-> bus_notifier()
  -> iommu_bus_notifier()
-> ...
  -> mutex_lock(&group->lock)

Which would cause lock-inversion with the above code.
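
Spelled out, the two orderings that collide look roughly like this (simplified
illustration only; the function bodies just stand in for the paths described
above, using the iommu_group's mutex):

/* Path A: proposed default-domain change */
static void path_a(struct iommu_group *group, struct device *dev)
{
	mutex_lock(&group->mutex);	/* taken first */
	device_lock(dev);		/* taken second */
	/* ... check bindings / change the default domain ... */
	device_unlock(dev);
	mutex_unlock(&group->mutex);
}

/* Path B: driver core -> bus notifier -> IOMMU core */
static void path_b(struct iommu_group *group, struct device *dev)
{
	/* device_lock(dev) is already held by the driver core here */
	mutex_lock(&group->mutex);	/* opposite order => possible deadlock */
	/* ... iommu_bus_notifier() work ... */
	mutex_unlock(&group->mutex);
}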


Joerg


Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement

2020-05-15 Thread Christoph Hellwig
Can you please lift the untrusted flag into struct device?  It really
isn't a PCI specific concept, and we should not have code poking into
pci_dev all over the iommu code.


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Robin Murphy

On 2020-05-15 10:45, Joerg Roedel wrote:

From: Joerg Roedel 

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

But this needs support from the dma-ops code too, to actually do the
late attachment when there are DMA-API calls for the device. This got
lost in the AMD IOMMU driver after converting it to the dma-iommu code.

Do the late attachment in the dma-iommu code-path to fix the issue.

Cc: Jerry Snitselaar 
Cc: Tom Murphy 
Reported-by: Jerry Snitselaar 
Tested-by: Jerry Snitselaar 
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Signed-off-by: Joerg Roedel 
---
  drivers/iommu/iommu.c | 33 +++--
  1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4050569188be..f54ebb964271 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_free);
  
-static int __iommu_attach_device(struct iommu_domain *domain,

-struct device *dev)
+static bool __iommu_is_attach_deferred(struct iommu_domain *domain,
+  struct device *dev)
+{
+   if (!domain->ops->is_attach_deferred)
+   return false;
+
+   return domain->ops->is_attach_deferred(domain, dev);
+}
+
+static int __iommu_attach_device_no_defer(struct iommu_domain *domain,
+ struct device *dev)
  {
int ret;
-   if ((domain->ops->is_attach_deferred != NULL) &&
-   domain->ops->is_attach_deferred(domain, dev))
-   return 0;
  
  	if (unlikely(domain->ops->attach_dev == NULL))

return -ENODEV;
@@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain 
*domain,
ret = domain->ops->attach_dev(domain, dev);
if (!ret)
trace_attach_device_to_domain(dev);
+
return ret;
  }
  
+static int __iommu_attach_device(struct iommu_domain *domain,

+struct device *dev)
+{
+   if (__iommu_is_attach_deferred(domain, dev))
+   return 0;
+
+   return __iommu_attach_device_no_defer(domain, dev);
+}
+
  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
  {
struct iommu_group *group;
@@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
   */
  struct iommu_domain *iommu_get_dma_domain(struct device *dev)
  {
-   return dev->iommu_group->default_domain;
+   struct iommu_domain *domain = dev->iommu_group->default_domain;
+
+   if (__iommu_is_attach_deferred(domain, dev))
+   __iommu_attach_device_no_defer(domain, dev);


This raises a red flag, since iommu-dma already has explicit deferred 
attach handling where it should need it, immediately after this is 
called to retrieve the domain. The whole thing smells to me like we 
should have an explicit special-case in iommu_probe_device() rather than 
hooking __iommu_attach_device() in general then having to bodge around 
the fallout elsewhere.


Robin.


+
+   return domain;
  }
  
  /*





Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Raj, Ashok
On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote:
> Hi, Alex,
> 
> While working on an updated version, Yi and I found an open design
> question which needs your guidance.
> 
> Conceptually, nested translation can be realized as one GPA->HPA page
> table and multiple GVA->GPA page tables per VM. This means one container
> is sufficient to include all SVA-capable devices assigned to the same guest,
> as there is just one IOVA space (GPA->HPA) to be managed by the vfio iommu
> map/unmap API. GVA->GPA page tables are bound through the new API
> introduced in this patch. This differs from legacy shadow translation,
> which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires
> its own IOVA space and must be in a separate container.
> 
> However, supporting multiple SVA-capable devices in one container
> imposes a challenge. Currently bind_guest_pgtbl is implemented as an
> iommu type1 API. By the definition of iommu type1, any operation
> should be applied to all devices within the same container, just like
> DMA map/unmap. However, this philosophy is wrong for page table
> binding. We must follow the guest's binding between its page tables
> and assigned devices; otherwise every bound address space suddenly
> becomes accessible to all assigned devices, creating security holes.

The above 2 paragraphs are a bit confusing :-) but what really matters
is the below:
> 
> Do you think it's possible to extend the iommu API to accept
> device-specific commands? If not, moving it to vfio-pci or vfio-mdev
> also sounds problematic, as PASIDs and page tables are IOMMU things
> which are not touched by vfio device drivers today.

All you are referring to is that when the admin groups multiple devices in a
single container, you can't give them isolation for SVA purposes. This is
logically equivalent to a single group with multiple devices:

- such as devices behind a P2P bridge
- MFDs, or devices behind switches or hierarchies without ACS
  support, for instance.

By limitation you mean disabling PASID on those devices in a single
container?

what about ATS? 

> 
> Alternatively, can we impose the limitation that nesting APIs can
> only be enabled for singleton containers which contain only one device?
> This basically falls back to the state of the legacy shadow-translation
> vIOMMU, and our current QEMU vIOMMU implementation actually
> works this way regardless of whether nesting is used. Do you think
> such a tradeoff is acceptable as a starting point?
> 
> Thanks
> Kevin
> 
> > -Original Message-
> > From: Liu, Yi L 
> > Sent: Sunday, March 22, 2020 8:32 PM
> > To: alex.william...@redhat.com; eric.au...@redhat.com
> > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > ; Tian, Jun J ; Sun, Yi Y
> > ; jean-phili...@linaro.org; pet...@redhat.com;
> > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Wu, Hao 
> > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> >
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > hardware
> > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > translation). For such hardware IOMMUs, there are two stages/levels of
> > address translation, and software may let userspace/VM to own the first-
> > level/stage-1 translation structures. Example of such usage is vSVA (
> > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > translation structures and bind the structures to host, then hardware
> > IOMMU would utilize nesting translation when doing DMA translation fo
> > the devices behind such hardware IOMMU.
> >
> > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> > bind
> > guest page table is needed, it also requires to expose interface to guest
> > for iommu cache invalidation when guest modified the first-level/stage-1
> > translation structures since hardware needs to be notified to flush stale
> > iotlbs. This would be introduced in next patch.
> >
> > In this patch, guest page table bind and unbind are done by using flags
> > VFIO_IOMMU_BIND_GUEST_PGTBL and
> > VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > struct iommu_gpasid_bind_data. Before binding guest page table to host,
> > VM should have got a PASID allocated by host via
> > VFIO_IOMMU_PASID_REQUEST.
> >
> > Bind guest translation structures (here is guest page table) to host
> > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 158
> > 

Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread h...@lst.de
On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:
>> Meanwhile, for the safety of buffers, lower-layer drivers need to make 
>> certain the buffers have already been unmapped in iommu before those buffers 
>> go back to buddy for other users.
>
> That sounds like it would only have benefit in a very small set of specific 
> circumstances, and would be very difficult to generalise to buffers that 
> are mapped via dma_map_page() or dma_map_single(). Furthermore, a 
> high-level API that affects a low-level driver's interpretation of 
> mid-layer API calls without the mid-layer's knowledge sounds like a hideous 
> abomination of anti-design. If a mid-layer API lends itself to inefficiency 
> at the lower level, it would seem a lot cleaner and more robust to extend 
> *that* API for stateful buffer reuse. Failing that, it might possibly be 
> appropriate to approach this at the driver level - many of the cleverer 
> network drivers already implement buffer pools to recycle mapped SKBs 
> internally, couldn't the "zip driver" simply try doing something like that 
> for itself?

Exactly.  If your upper consumer of the DMA API keeps reusing the same
pages, just map them once and use dma_sync_* to transfer ownership as
needed.
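
A minimal sketch of that pattern, assuming a driver that keeps reusing a
single page for its transfers (all names here are made up for illustration;
error unwinding and locking are omitted):

#include <linux/dma-mapping.h>

/* Map the reused page once, at setup time. */
static int reuse_buf_setup(struct device *dev, struct page *page,
			   dma_addr_t *dma)
{
	*dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, *dma))
		return -ENOMEM;
	return 0;
}

/* Per transfer: only transfer ownership, no iova allocation or freeing. */
static void reuse_buf_one_transfer(struct device *dev, dma_addr_t dma)
{
	dma_sync_single_for_device(dev, dma, PAGE_SIZE, DMA_BIDIRECTIONAL);
	/* ... submit the DMA and wait for completion ... */
	dma_sync_single_for_cpu(dev, dma, PAGE_SIZE, DMA_BIDIRECTIONAL);
}

/* Unmap once, when the buffer is finally retired. */
static void reuse_buf_teardown(struct device *dev, dma_addr_t dma)
{
	dma_unmap_page(dev, dma, PAGE_SIZE, DMA_BIDIRECTIONAL);
}

The iova allocation and IOMMU mapping cost is then paid once per buffer
rather than once per transfer, which is the coherent-API-like behaviour
discussed in this thread.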


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 09:51:03PM +0800, Lu Baolu wrote:
> On 2020/5/15 17:45, Joerg Roedel wrote:
> >   struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> >   {
> > -   return dev->iommu_group->default_domain;
> > +   struct iommu_domain *domain = dev->iommu_group->default_domain;
> > +
> > +   if (__iommu_is_attach_deferred(domain, dev))
> > +   __iommu_attach_device_no_defer(domain, dev);
> 
> It seems that the return value needs to be checked. The default domain
> is invalid if attach() failed.

True, I looked at that, but the callers can't handle a NULL return here, so
I kept it this way for now. The outcome is that DMA will fail, but
otherwise we'd see a NULL-ptr dereference really quickly after returning
from that function.

Bottom line: This needs to be cleaned up separately.

Regards,

Joerg
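
For illustration only, the kind of separate cleanup hinted at here might let
iommu_get_dma_domain() report a failed deferred attach and make its dma-iommu
callers check for NULL; a rough sketch, not part of the applied patch:

struct iommu_domain *iommu_get_dma_domain(struct device *dev)
{
	struct iommu_domain *domain = dev->iommu_group->default_domain;

	/* Hypothetical follow-up: propagate a failed deferred attach. */
	if (__iommu_is_attach_deferred(domain, dev) &&
	    __iommu_attach_device_no_defer(domain, dev))
		return NULL;

	return domain;
}

Each caller in the dma-iommu map paths would then have to bail out (for
example by returning DMA_MAPPING_ERROR) when it gets NULL back.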


Re: [PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Lu Baolu

Hi Joerg,

On 2020/5/15 17:45, Joerg Roedel wrote:

From: Joerg Roedel 

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

But this needs support from the dma-ops code too, to actually do the
late attachment when there are DMA-API calls for the device. This got
lost in the AMD IOMMU driver after converting it to the dma-iommu code.

Do the late attachment in the dma-iommu code-path to fix the issue.

Cc: Jerry Snitselaar 
Cc: Tom Murphy 
Reported-by: Jerry Snitselaar 
Tested-by: Jerry Snitselaar 
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Signed-off-by: Joerg Roedel 
---
  drivers/iommu/iommu.c | 33 +++--
  1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4050569188be..f54ebb964271 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_free);
  
-static int __iommu_attach_device(struct iommu_domain *domain,

-struct device *dev)
+static bool __iommu_is_attach_deferred(struct iommu_domain *domain,
+  struct device *dev)
+{
+   if (!domain->ops->is_attach_deferred)
+   return false;
+
+   return domain->ops->is_attach_deferred(domain, dev);
+}
+
+static int __iommu_attach_device_no_defer(struct iommu_domain *domain,
+ struct device *dev)
  {
int ret;
-   if ((domain->ops->is_attach_deferred != NULL) &&
-   domain->ops->is_attach_deferred(domain, dev))
-   return 0;
  
  	if (unlikely(domain->ops->attach_dev == NULL))

return -ENODEV;
@@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain 
*domain,
ret = domain->ops->attach_dev(domain, dev);
if (!ret)
trace_attach_device_to_domain(dev);
+
return ret;
  }
  
+static int __iommu_attach_device(struct iommu_domain *domain,

+struct device *dev)
+{
+   if (__iommu_is_attach_deferred(domain, dev))
+   return 0;
+
+   return __iommu_attach_device_no_defer(domain, dev);
+}
+
  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
  {
struct iommu_group *group;
@@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
   */
  struct iommu_domain *iommu_get_dma_domain(struct device *dev)
  {
-   return dev->iommu_group->default_domain;
+   struct iommu_domain *domain = dev->iommu_group->default_domain;
+
+   if (__iommu_is_attach_deferred(domain, dev))
+   __iommu_attach_device_no_defer(domain, dev);


It seems that the return value needs to be checked. The default domain
is invalid if attach() failed.

Best regards,
baolu


Re: [PATCH] iommu: Remove functions that support private domain

2020-05-15 Thread Lu Baolu

Hi,

On 2020/5/15 17:59, Joerg Roedel wrote:

On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote:

+static int is_driver_bound(struct device *dev, void *not_used)
+{
+   int ret = 0;
+
+   device_lock(dev);
+   if (device_is_bound(dev))
+   ret = 1;
+   device_unlock(dev);
+   return ret;
+}


This locks only one device, so without lock-conversion there could be a
driver probe after the device_unlock(), while we are probing the other
devices of the group.


[SNIP]

+   /*
+* Check if any device in the group still has a driver binded to it.
+* This might race with device driver probing code and unfortunately
+* there is no clean way out of that either, locking all devices in the
+* group and then do the re-attach will introduce a lock-inversion with
+* group->mutex - Joerg.
+*/
+   if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) {
+   pr_err("Active drivers exist for devices in the group\n");
+   return -EBUSY;
+   }


The lock inversion comes into the picture when this code is called from
device(-driver) core through the bus-notifiers. The notifiers are called
with the device already locked.


Another question I have is.. if it's racy then it should be racy even
for one device iommu groups.. right? Why would it be racy only with
multiple devices iommu group?


Valid point. So the device needs to be locked _while_ the default domain
change happens. If triggered by sysfs there should be no locking
problems, I guess. But you better try it out.


It seems that we can do like this:

[1] mutex_lock(&group->mutex)
[2] for_each_group_device(device_lock())
[3] if (for_each_group_device(!device_is_bound()))
        change_default_domain()
[4] for_each_group_device_reverse(device_unlock())
[5] mutex_unlock(&group->mutex)

A possible problem exists at step 2 when another thread is trying to
lock devices in the reverse order at the same time.

Best regards,
baolu
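
For reference, that proposed sequence expressed as code might look roughly
like the sketch below. for_each_group_device(), for_each_group_device_reverse(),
group_has_bound_driver() and change_default_domain() are hypothetical helpers
standing in for a walk of the group's device list; the point is only the
intended lock ordering, and the step-2 race described above still applies:

static int iommu_group_try_change_default_domain(struct iommu_group *group)
{
	struct device *dev;
	int ret = -EBUSY;

	mutex_lock(&group->mutex);			/* [1] */

	for_each_group_device(group, dev)		/* [2] */
		device_lock(dev);

	if (!group_has_bound_driver(group))		/* [3] */
		ret = change_default_domain(group);

	for_each_group_device_reverse(group, dev)	/* [4] */
		device_unlock(dev);

	mutex_unlock(&group->mutex);			/* [5] */

	return ret;
}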


Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 02:11:24PM +0200, Jean-Philippe Brucker wrote:
> On Fri, May 15, 2020 at 02:01:50PM +0200, Joerg Roedel wrote:
> > Hmm, but per patch 1, pci_ats_supported() does not check
> > pci_ats_disabled(), or do I miss something?
> 
> The commit message isn't clear. pci_ats_init() sets dev->ats_cap only if
> !pci_ats_disabled(), so checking dev->ats_cap in pci_ats_supported()
> takes pci_ats_disabled() into account.

Right, so the patch is fine:

Reviewed-by: Joerg Roedel 


Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()

2020-05-15 Thread Jean-Philippe Brucker
On Fri, May 15, 2020 at 02:01:50PM +0200, Joerg Roedel wrote:
> On Fri, May 15, 2020 at 12:44:00PM +0200, Jean-Philippe Brucker wrote:
> > The pci_ats_supported() function checks if a device supports ATS and is
> > allowed to use it. In addition to checking that the device has an ATS
> > capability and that the global pci=noats is not set
> > (pci_ats_disabled()), it also checks if a device is untrusted.
> 
> Hmm, but per patch 1, pci_ats_supported() does not check
> pci_ats_disabled(), or do I miss something?

The commit message isn't clear. pci_ats_init() sets dev->ats_cap only if
!pci_ats_disabled(), so checking dev->ats_cap in pci_ats_supported()
takes pci_ats_disabled() into account.

Thanks,
Jean
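
For reference, the ordering being relied on is roughly the following
(simplified from drivers/pci/ats.c as of this thread):

void pci_ats_init(struct pci_dev *dev)
{
	int pos;

	if (pci_ats_disabled())
		return;		/* pci=noats: ats_cap stays zero */

	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
	if (!pos)
		return;

	dev->ats_cap = pos;
}

So the "if (!dev->ats_cap) return false;" check in pci_ats_supported() covers
the pci=noats case as well.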


Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread Robin Murphy

On 2020-05-15 09:19, Song Bao Hua wrote:
[ snip... nice analysis, but ultimately it's still "doing stuff has more 
overhead than not doing stuff" ]



I am thinking several possible ways on decreasing or removing the latency of DMA 
map/unmap for every single DMA transfer. Meanwhile, "non-strict" as an existing 
option with possible safety issues, I won't discuss it in this mail.


But passthrough and non-strict mode *specifically exist* for the cases 
where performance is the most important concern - streaming DMA with an 
IOMMU in the middle has an unavoidable tradeoff between performance and 
isolation, so dismissing that out of hand is not a good way to start 
making this argument.



1. provide bounce coherent buffers for streaming buffers.
As the coherent buffers keep the status of mapping, we can remove the overhead 
of map and unmap for each single DMA operations. However, this solution 
requires memory copy between stream buffers and bounce buffers. Thus it will 
work only if copy is faster than map/unmap. Meanwhile, it will consume much 
more memory bandwidth.


I'm struggling to understand how that would work, can you explain it in 
more detail?



2.make upper-layer kernel components aware of the pain of iommu map/unmap
upper-layer fs, mm, networks can somehow let the lower-layer drivers know the 
end of the life cycle of sg buffers. In zswap case, I have seen zswap always 
use the same 2 pages as the destination buffers to save compressed page, but 
the compressor driver still has to constantly map and unmap those same two 
pages for every single compression since zswap and zip drivers are working in 
two completely different software layers.

I am thinking some way as below, upper-layer kernel code can call:
sg_init_table();
sg_mark_reusable();
 /* use the buffer many times */

sg_mark_stop_reuse();

After that, if low level drivers see "reusable" flag, it will realize the buffer can be used 
multiple times and will not do map/unmap every time. it means upper-layer components will further use the 
buffers and the same buffers will probably be given to lower-layer drivers for new DMA transfer later. When 
upper-layer code sets " stop_reuse", lower-layer driver will unmap the sg buffers, possibly by 
providing a unmap-callback to upper-layer components. For zswap case, I have seen the same buffers are always 
re-used and zip driver maps and unmaps it again and again. Shortly after the buffer is unmapped, it will be 
mapped in the next transmission, almost without any time gap between unmap and map. In case zswap can set the 
"reusable" flag, zip driver will save a lot of time.
Meanwhile, for the safety of buffers, lower-layer drivers need to make certain 
the buffers have already been unmapped in iommu before those buffers go back to 
buddy for other users.


That sounds like it would only have benefit in a very small set of 
specific circumstances, and would be very difficult to generalise to 
buffers that are mapped via dma_map_page() or dma_map_single(). 
Furthermore, a high-level API that affects a low-level driver's 
interpretation of mid-layer API calls without the mid-layer's knowledge 
sounds like a hideous abomination of anti-design. If a mid-layer API 
lends itself to inefficiency at the lower level, it would seem a lot 
cleaner and more robust to extend *that* API for stateful buffer reuse. 
Failing that, it might possibly be appropriate to approach this at the 
driver level - many of the cleverer network drivers already implement 
buffer pools to recycle mapped SKBs internally, couldn't the "zip 
driver" simply try doing something like that for itself?


Robin.
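
A very rough sketch of the driver-level recycling suggested above, keeping a
small pool of already-mapped buffers so that the per-request cost is a sync
rather than a map/unmap (all names are invented for illustration; locking and
error unwinding are omitted):

#include <linux/dma-mapping.h>
#include <linux/slab.h>

#define POOL_SIZE	8
#define POOL_BUF_SIZE	PAGE_SIZE

struct pool_buf {
	void		*cpu;
	dma_addr_t	dma;
	bool		in_use;
};

static struct pool_buf pool[POOL_SIZE];

/* Allocate and map every pool buffer once, at probe time. */
static int pool_init(struct device *dev)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		pool[i].cpu = kmalloc(POOL_BUF_SIZE, GFP_KERNEL);
		if (!pool[i].cpu)
			return -ENOMEM;
		pool[i].dma = dma_map_single(dev, pool[i].cpu, POOL_BUF_SIZE,
					     DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, pool[i].dma))
			return -ENOMEM;
	}
	return 0;
}

/* Per request: take a pre-mapped buffer and just hand it to the device. */
static struct pool_buf *pool_get(struct device *dev)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = true;
			dma_sync_single_for_device(dev, pool[i].dma,
						   POOL_BUF_SIZE,
						   DMA_BIDIRECTIONAL);
			return &pool[i];
		}
	}
	return NULL;	/* pool empty: fall back to a plain dma_map_single() */
}

static void pool_put(struct device *dev, struct pool_buf *buf)
{
	dma_sync_single_for_cpu(dev, buf->dma, POOL_BUF_SIZE,
				DMA_BIDIRECTIONAL);
	buf->in_use = false;
}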


Re: [PATCH 2/4] iommu/amd: Use pci_ats_supported()

2020-05-15 Thread Joerg Roedel
On Fri, May 15, 2020 at 12:44:00PM +0200, Jean-Philippe Brucker wrote:
> The pci_ats_supported() function checks if a device supports ATS and is
> allowed to use it. In addition to checking that the device has an ATS
> capability and that the global pci=noats is not set
> (pci_ats_disabled()), it also checks if a device is untrusted.

Hmm, but per patch 1, pci_ats_supported() does not check
pci_ats_disabled(), or do I miss something?


Joerg


Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices

2020-05-15 Thread Joerg Roedel
Hi Jean-Philippe,

thanks for doing this!

On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote:
> Add pci_ats_supported(), which checks whether a device has an ATS
> capability, and whether it is trusted.  A device is untrusted if it is
> plugged into an external-facing port such as Thunderbolt and could be
> spoof an existing device to exploit weaknesses in the IOMMU
> configuration.  PCIe ATS is one such weaknesses since it allows
> endpoints to cache IOMMU translations and emit transactions with
> 'Translated' Address Type (10b) that partially bypass the IOMMU
> translation.
> 
> The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
> with 'Translated' Address Type for untrusted devices.  Add the check to
> pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
> it.
> 
> By checking ats_cap, the pci_ats_supported() helper also returns whether
> ATS was globally disabled with pci=noats, and could later include more
> things, for example whether the whole PCIe hierarchy down to the
> endpoint supports ATS.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  include/linux/pci-ats.h |  3 +++
>  drivers/pci/ats.c   | 18 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index d08f0869f1213e..f75c307f346de9 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,11 +6,14 @@
>  
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
> +bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
>  #else /* CONFIG_PCI_ATS */
> +static inline bool pci_ats_supported(struct pci_dev *d)
> +{ return false; }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps)
>  { return -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1fc..15fa0c37fd8e44 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
>   dev->ats_cap = pos;
>  }
>  
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> + if (!dev->ats_cap)
> + return false;
> +
> + return !dev->untrusted;

dev->untrusted is an 'unsigned int :1', so while this works I would
prefer 'return (dev->untrusted == 0);' here, to be more type-safe.

With that changed:

Reviewed-by: Joerg Roedel 

> +}
> +EXPORT_SYMBOL_GPL(pci_ats_supported);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   u16 ctrl;
>   struct pci_dev *pdev;
>  
> - if (!dev->ats_cap)
> + if (!pci_ats_supported(dev))
>   return -EINVAL;
>  
>   if (WARN_ON(dev->ats_enabled))
> -- 
> 2.26.2


[PATCH 4/4] iommu/vt-d: Use pci_ats_supported()

2020-05-15 Thread Jean-Philippe Brucker
The pci_ats_supported() helper checks if a device supports ATS and is
allowed to use it. By checking the ATS capability it also integrates the
pci_ats_disabled() check from pci_ats_init(). Simplify the vt-d checks.

Acked-by: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/intel-iommu.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0182cff2c7ac75..ed21ce6d123810 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1454,8 +1454,7 @@ static void iommu_enable_dev_iotlb(struct 
device_domain_info *info)
!pci_reset_pri(pdev) && !pci_enable_pri(pdev, 32))
info->pri_enabled = 1;
 #endif
-   if (!pdev->untrusted && info->ats_supported &&
-   pci_ats_page_aligned(pdev) &&
+   if (info->ats_supported && pci_ats_page_aligned(pdev) &&
!pci_enable_ats(pdev, VTD_PAGE_SHIFT)) {
info->ats_enabled = 1;
domain_update_iotlb(info->domain);
@@ -2611,10 +2610,8 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
 
-   if (!pdev->untrusted &&
-   !pci_ats_disabled() &&
-   ecap_dev_iotlb_support(iommu->ecap) &&
-   pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS) &&
+   if (ecap_dev_iotlb_support(iommu->ecap) &&
+   pci_ats_supported(pdev) &&
dmar_find_matched_atsr_unit(pdev))
info->ats_supported = 1;
 
-- 
2.26.2



[PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices

2020-05-15 Thread Jean-Philippe Brucker
Add pci_ats_supported(), which checks whether a device has an ATS
capability, and whether it is trusted.  A device is untrusted if it is
plugged into an external-facing port such as Thunderbolt and could
spoof an existing device to exploit weaknesses in the IOMMU
configuration.  PCIe ATS is one such weakness since it allows
endpoints to cache IOMMU translations and emit transactions with
'Translated' Address Type (10b) that partially bypass the IOMMU
translation.

The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
with 'Translated' Address Type for untrusted devices.  Add the check to
pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
it.

By checking ats_cap, the pci_ats_supported() helper also returns whether
ATS was globally disabled with pci=noats, and could later include more
things, for example whether the whole PCIe hierarchy down to the
endpoint supports ATS.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/pci-ats.h |  3 +++
 drivers/pci/ats.c   | 18 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index d08f0869f1213e..f75c307f346de9 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -6,11 +6,14 @@
 
 #ifdef CONFIG_PCI_ATS
 /* Address Translation Service */
+bool pci_ats_supported(struct pci_dev *dev);
 int pci_enable_ats(struct pci_dev *dev, int ps);
 void pci_disable_ats(struct pci_dev *dev);
 int pci_ats_queue_depth(struct pci_dev *dev);
 int pci_ats_page_aligned(struct pci_dev *dev);
 #else /* CONFIG_PCI_ATS */
+static inline bool pci_ats_supported(struct pci_dev *d)
+{ return false; }
 static inline int pci_enable_ats(struct pci_dev *d, int ps)
 { return -ENODEV; }
 static inline void pci_disable_ats(struct pci_dev *d) { }
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 390e92f2d8d1fc..15fa0c37fd8e44 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
dev->ats_cap = pos;
 }
 
+/**
+ * pci_ats_supported - check if the device can use ATS
+ * @dev: the PCI device
+ *
+ * Returns true if the device supports ATS and is allowed to use it, false
+ * otherwise.
+ */
+bool pci_ats_supported(struct pci_dev *dev)
+{
+   if (!dev->ats_cap)
+   return false;
+
+   return !dev->untrusted;
+}
+EXPORT_SYMBOL_GPL(pci_ats_supported);
+
 /**
  * pci_enable_ats - enable the ATS capability
  * @dev: the PCI device
@@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
u16 ctrl;
struct pci_dev *pdev;
 
-   if (!dev->ats_cap)
+   if (!pci_ats_supported(dev))
return -EINVAL;
 
if (WARN_ON(dev->ats_enabled))
-- 
2.26.2



[PATCH 3/4] iommu/arm-smmu-v3: Use pci_ats_supported()

2020-05-15 Thread Jean-Philippe Brucker
The new pci_ats_supported() function checks if a device supports ATS and
is allowed to use it.

Signed-off-by: Jean-Philippe Brucker 
---
I dropped the Ack because I slightly changed the patch to keep the
fwspec check, since last version:
https://lore.kernel.org/linux-iommu/20200311124506.208376-8-jean-phili...@linaro.org/
---
 drivers/iommu/arm-smmu-v3.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7a1..39b935e86ab203 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2652,26 +2652,16 @@ static void arm_smmu_install_ste_for_dev(struct 
arm_smmu_master *master)
}
 }
 
-#ifdef CONFIG_PCI_ATS
 static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
 {
-   struct pci_dev *pdev;
+   struct device *dev = master->dev;
struct arm_smmu_device *smmu = master->smmu;
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
-
-   if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
-   !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) || pci_ats_disabled())
-   return false;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 
-   pdev = to_pci_dev(master->dev);
-   return !pdev->untrusted && pdev->ats_cap;
+   return (smmu->features & ARM_SMMU_FEAT_ATS) &&
+   !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) &&
+   dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev));
 }
-#else
-static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
-{
-   return false;
-}
-#endif
 
 static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 {
-- 
2.26.2



[PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement

2020-05-15 Thread Jean-Philippe Brucker
I sent these in March as part of ATS enablement for device-tree [1], but
haven't found the time to address the largest comment on that series
about consolidating the root bridge ATS support between the different
ACPI tables.

I'm resending only the bits that consolidate the 'untrusted' check for
ATS, since there have been more discussions about this [2]. Patch 1
moves the 'untrusted' check to drivers/pci/ats.c and patches 2-4 modify
the ATS-capable IOMMU drivers.

The only functional change should be to the AMD IOMMU driver. With this
change all IOMMU drivers block 'Translated' PCIe transactions and
Translation Requests from untrusted devices.

[1] 
https://lore.kernel.org/linux-iommu/20200311124506.208376-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-pci/20200513151929.GA38418@bjorn-Precision-5520/

Jean-Philippe Brucker (4):
  PCI/ATS: Only enable ATS for trusted devices
  iommu/amd: Use pci_ats_supported()
  iommu/arm-smmu-v3: Use pci_ats_supported()
  iommu/vt-d: Use pci_ats_supported()

 include/linux/pci-ats.h |  3 +++
 drivers/iommu/amd_iommu.c   | 12 
 drivers/iommu/arm-smmu-v3.c | 20 +---
 drivers/iommu/intel-iommu.c |  9 +++--
 drivers/pci/ats.c   | 18 +-
 5 files changed, 32 insertions(+), 30 deletions(-)

-- 
2.26.2



[PATCH 2/4] iommu/amd: Use pci_ats_supported()

2020-05-15 Thread Jean-Philippe Brucker
The pci_ats_supported() function checks if a device supports ATS and is
allowed to use it. In addition to checking that the device has an ATS
capability and that the global pci=noats is not set
(pci_ats_disabled()), it also checks if a device is untrusted.

A device is untrusted if it is plugged into an external-facing port such
as Thunderbolt and could be spoofing an existing device to exploit
weaknesses in the IOMMU configuration. By calling pci_ats_supported() we
keep DTE[I]=0 for untrusted devices and abort transactions with
Pretranslated Addresses.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/amd_iommu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 1dc3718560d0e8..8b7a9e811d33a6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -313,16 +313,15 @@ static struct iommu_group *acpihid_device_group(struct 
device *dev)
 static bool pci_iommuv2_capable(struct pci_dev *pdev)
 {
static const int caps[] = {
-   PCI_EXT_CAP_ID_ATS,
PCI_EXT_CAP_ID_PRI,
PCI_EXT_CAP_ID_PASID,
};
int i, pos;
 
-   if (pci_ats_disabled())
+   if (!pci_ats_supported(pdev))
return false;
 
-   for (i = 0; i < 3; ++i) {
+   for (i = 0; i < 2; ++i) {
pos = pci_find_ext_capability(pdev, caps[i]);
if (pos == 0)
return false;
@@ -3150,11 +3149,8 @@ int amd_iommu_device_info(struct pci_dev *pdev,
 
memset(info, 0, sizeof(*info));
 
-   if (!pci_ats_disabled()) {
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS);
-   if (pos)
-   info->flags |= AMD_IOMMU_DEVICE_FLAG_ATS_SUP;
-   }
+   if (pci_ats_supported(pdev))
+   info->flags |= AMD_IOMMU_DEVICE_FLAG_ATS_SUP;
 
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (pos)
-- 
2.26.2



Re: [PATCH v5 30/38] dmabuf: fix common struct sg_table related issues

2020-05-15 Thread Gerd Hoffmann
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index acb26c6..89e293b 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -63,10 +63,9 @@ static struct sg_table *get_sg_table(struct device *dev, 
> struct dma_buf *buf,
>   GFP_KERNEL);
>   if (ret < 0)
>   goto err;
> - if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) {
> - ret = -EINVAL;
> + ret = dma_map_sgtable(dev, sg, direction, 0);
> + if (ret < 0)
>   goto err;
> - }
>   return sg;
>  
>  err:
> @@ -78,7 +77,7 @@ static struct sg_table *get_sg_table(struct device *dev, 
> struct dma_buf *buf,
>  static void put_sg_table(struct device *dev, struct sg_table *sg,
>enum dma_data_direction direction)
>  {
> - dma_unmap_sg(dev, sg->sgl, sg->nents, direction);
> + dma_unmap_sgtable(dev, sg, direction, 0);
>   sg_free_table(sg);
>   kfree(sg);
>  }

Easy straightforward conversion.

Acked-by: Gerd Hoffmann 

take care,
  Gerd



Re: [PATCH v5 25/38] drm: virtio: fix common struct sg_table related issues

2020-05-15 Thread Gerd Hoffmann
On Wed, May 13, 2020 at 03:32:32PM +0200, Marek Szyprowski wrote:
> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
> returns the number of the created entries in the DMA address space.
> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
> dma_unmap_sg must be called with the original number of the entries
> passed to the dma_map_sg().
> 
> struct sg_table is a common structure used for describing a non-contiguous
> memory buffer, used commonly in the DRM and graphics subsystems. It
> consists of a scatterlist with memory pages and DMA addresses (sgl entry),
> as well as the number of scatterlist entries: CPU pages (orig_nents entry)
> and DMA mapped pages (nents entry).
> 
> It turned out that it was a common mistake to misuse nents and orig_nents
> entries, calling DMA-mapping functions with a wrong number of entries or
> ignoring the number of mapped entries returned by the dma_map_sg()
> function.
> 
> To avoid such issues, lets use a common dma-mapping wrappers operating
> directly on the struct sg_table objects and use scatterlist page
> iterators where possible. This, almost always, hides references to the
> nents and orig_nents entries, making the code robust, easier to follow
> and copy/paste safe.

Looks all sane.

Acked-by: Gerd Hoffmann 

take care,
  Gerd



Re: [PATCH v2 23/33] iommu/mediatek-v1 Convert to probe/release_device() call-backs

2020-05-15 Thread Joerg Roedel
Hi,

On Fri, May 15, 2020 at 03:44:59PM +0800, Yong Wu wrote:
> On Tue, 2020-04-14 at 15:15 +0200, Joerg Roedel wrote:
> > -   return iommu_device_link(>iommu, dev);
> > +   err = arm_iommu_attach_device(dev, mtk_mapping);
> > +   if (err)
> > +   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not 
> > work\n");
> 
> 
> Hi Joerg,
> 
>  Thanks very much for this patch.
> 
>  This arm_iommu_attach_device is called just as we expected.
> 
>  But it will fail in this callstack as the group->mutex was tried to
> be re-locked...
> 
> [] (iommu_attach_device) from []
> (__arm_iommu_attach_device+0x34/0x90)
> [] (__arm_iommu_attach_device) from []
> (arm_iommu_attach_device+0xc/0x20)
> [] (arm_iommu_attach_device) from []
> (mtk_iommu_probe_finalize+0x34/0x50)
> [] (mtk_iommu_probe_finalize) from []
> (bus_iommu_probe+0x2a8/0x2c4)
> [] (bus_iommu_probe) from [] (bus_set_iommu
> +0x88/0xd4)
> [] (bus_set_iommu) from [] (mtk_iommu_probe
> +0x2f8/0x364)

Thanks for the report. Is

https://lore.kernel.org/lkml/1589530123-30240-1-git-send-email-yong...@mediatek.com/

the fix for this issue, or is something else required?


Thanks,

Joerg


Re: [PATCH] iommu: Remove functions that support private domain

2020-05-15 Thread Joerg Roedel
On Wed, May 13, 2020 at 03:47:21PM -0700, Sai Praneeth Prakhya wrote:
> After moving iommu_group setup to iommu core code [1][2] and removing
> private domain support in vt-d [3], there are no users for functions such
> as iommu_request_dm_for_dev(), iommu_request_dma_domain_for_dev() and
> request_default_domain_for_dev(). So, remove these functions.
> 
> [1] commit dce8d6964ebd ("iommu/amd: Convert to probe/release_device()
> call-backs")
> [2] commit e5d1841f18b2 ("iommu/vt-d: Convert to probe/release_device()
> call-backs")
> [3] commit 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA
> domain")
> 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Sai Praneeth Prakhya 
> ---
>  drivers/iommu/iommu.c | 65 ---
>  include/linux/iommu.h | 12 
>  2 files changed, 77 deletions(-)

Applied that patch to the x86/vt-d branch.



Re: [PATCH] iommu: Remove functions that support private domain

2020-05-15 Thread Joerg Roedel
On Thu, May 14, 2020 at 11:12:52PM +, Prakhya, Sai Praneeth wrote:
> +static int is_driver_bound(struct device *dev, void *not_used)
> +{
> + int ret = 0;
> +
> + device_lock(dev);
> + if (device_is_bound(dev))
> + ret = 1;
> + device_unlock(dev);
> + return ret;
> +}

This locks only one device, so without lock-conversion there could be a
driver probe after the device_unlock(), while we are probing the other
devices of the group.

> [SNIP]
> 
> + /*
> +  * Check if any device in the group still has a driver binded to it.
> +  * This might race with device driver probing code and unfortunately
> +  * there is no clean way out of that either, locking all devices in the
> +  * group and then do the re-attach will introduce a lock-inversion with
> +  * group->mutex - Joerg.
> +  */
> + if (iommu_group_for_each_dev(group, NULL, is_driver_bound)) {
> + pr_err("Active drivers exist for devices in the group\n");
> + return -EBUSY;
> + }

The lock inversion comes into the picture when this code is called from
device(-driver) core through the bus-notifiers. The notifiers are called
with the device already locked.

> Another question I have is.. if it's racy then it should be racy even
> for one device iommu groups.. right? Why would it be racy only with
> multiple devices iommu group?

Valid point. So the device needs to be locked _while_ the default domain
change happens. If triggered by sysfs there should be no locking
problems, I guess. But you better try it out.

Regards,

Joerg


Re: [PATCH v4 5/5] drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU

2020-05-15 Thread Paul Kocialkowski
Hi,

On Wed 13 May 20, 16:07, Maxime Ripard wrote:
> The main DRM device is actually a virtual device so it doesn't have the
> iommus property, which is instead on the DMA masters, in this case the
> mixers.
> 
> Add a call to of_dma_configure with the mixers DT node but on the DRM
> virtual device to configure it in the same way than the mixers.

Although I'm not very familiar with the DMA API, this looks legit to me and
matches what's already done in sun4i_backend for the interconnect. So:

Reviewed-by: Paul Kocialkowski 

Cheers,

Paul

> Signed-off-by: Maxime Ripard 
> ---
>  drivers/gpu/drm/sun4i/sun8i_mixer.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/sun4i/sun8i_mixer.c 
> b/drivers/gpu/drm/sun4i/sun8i_mixer.c
> index 56cc037fd312..cc4fb916318f 100644
> --- a/drivers/gpu/drm/sun4i/sun8i_mixer.c
> +++ b/drivers/gpu/drm/sun4i/sun8i_mixer.c
> @@ -363,6 +363,19 @@ static int sun8i_mixer_bind(struct device *dev, struct 
> device *master,
>   mixer->engine.ops = _engine_ops;
>   mixer->engine.node = dev->of_node;
>  
> + if (of_find_property(dev->of_node, "iommus", NULL)) {
> + /*
> +  * This assume we have the same DMA constraints for
> +  * all our the mixers in our pipeline. This sounds
> +  * bad, but it has always been the case for us, and
> +  * DRM doesn't do per-device allocation either, so we
> +  * would need to fix DRM first...
> +  */
> + ret = of_dma_configure(drm->dev, dev->of_node, true);
> + if (ret)
> + return ret;
> + }
> +
>   /*
>* While this function can fail, we shouldn't do anything
>* if this happens. Some early DE2 DT entries don't provide
> -- 
> git-series 0.9.1
> 

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com



[PATCH] iommu: Implement deferred domain attachment

2020-05-15 Thread Joerg Roedel
From: Joerg Roedel 

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

But this needs support from the dma-ops code too, to actually do the
late attachment when there are DMA-API calls for the device. This got
lost in the AMD IOMMU driver after converting it to the dma-iommu code.

Do the late attachment in the dma-iommu code-path to fix the issue.

Cc: Jerry Snitselaar 
Cc: Tom Murphy 
Reported-by: Jerry Snitselaar 
Tested-by: Jerry Snitselaar 
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4050569188be..f54ebb964271 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1889,13 +1889,19 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
-static int __iommu_attach_device(struct iommu_domain *domain,
-struct device *dev)
+static bool __iommu_is_attach_deferred(struct iommu_domain *domain,
+  struct device *dev)
+{
+   if (!domain->ops->is_attach_deferred)
+   return false;
+
+   return domain->ops->is_attach_deferred(domain, dev);
+}
+
+static int __iommu_attach_device_no_defer(struct iommu_domain *domain,
+ struct device *dev)
 {
int ret;
-   if ((domain->ops->is_attach_deferred != NULL) &&
-   domain->ops->is_attach_deferred(domain, dev))
-   return 0;
 
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;
@@ -1903,9 +1909,19 @@ static int __iommu_attach_device(struct iommu_domain 
*domain,
ret = domain->ops->attach_dev(domain, dev);
if (!ret)
trace_attach_device_to_domain(dev);
+
return ret;
 }
 
+static int __iommu_attach_device(struct iommu_domain *domain,
+struct device *dev)
+{
+   if (__iommu_is_attach_deferred(domain, dev))
+   return 0;
+
+   return __iommu_attach_device_no_defer(domain, dev);
+}
+
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 {
struct iommu_group *group;
@@ -2023,7 +2039,12 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
  */
 struct iommu_domain *iommu_get_dma_domain(struct device *dev)
 {
-   return dev->iommu_group->default_domain;
+   struct iommu_domain *domain = dev->iommu_group->default_domain;
+
+   if (__iommu_is_attach_deferred(domain, dev))
+   __iommu_attach_device_no_defer(domain, dev);
+
+   return domain;
 }
 
 /*
-- 
2.25.1



Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

2020-05-15 Thread Song Bao Hua
Hi Russell & All,

In many streaming DMA map/unmap use cases, lower-layer device drivers have
no idea how and when single/sg buffers are allocated and freed by the
upper-layer filesystem, network protocol, memory management code etc. So the
only thing device drivers can do is map the buffer before each DMA transfer
begins and unmap it when the transfer is done.

This dramatically increases the latency of dma_map_single/sg and
dma_unmap_single/sg when these APIs are backed by an IOMMU: for each map, the
iommu driver needs to allocate an iova and install the mapping in the IOMMU,
and for each unmap it needs to free the iova and tear the mapping down in the
iommu hardware. When the devices performing DMA are very fast, for example on
100GbE networks, the streaming map/unmap latency can become a critical system
bottleneck.

In comparison to DMA streaming APIs, DMA consistent APIs using IOMMU backend 
may show much better performance as the map is done when the buffer is 
allocated and unmap is done when the buffer is freed. DMA can be done multiple 
times before the buffers are freed by dma_free_coherent(). There is no such map 
and unmap overhead for each separate DMA transfer as streaming APIs. The 
typical work flow is like
dma_alloc_coherent-> 
doing DMA -> 
doing DMA ->
doing DMA ->
 /* DMA many times */
dma_free_coherent

However, the typical work flow for streaming DMA is like
dma_map_sg -> doing DMA -> dma_unmap_sg -> 
dma_map_sg -> doing DMA -> dma_unmap_sg ->  
dma_map_sg -> doing DMA -> dma_unmap_sg ->  
 /* map, DMA transfer, unmap many times */

Even though upper-layer software might use the same buffers multiple times, for 
each single DMA transmission, map and unmap still need to be done by 
lower-level drivers as lower-layer drivers don't know this fact.

A possible routine to improve the performance of stream APIs is like:
dma_map_sg -> 
dma_sync_sg_for_device -> doing DMA -> 
dma_sync_sg_for_device -> doing DMA -> 
dma_sync_sg_for_device -> doing DMA -> 
... ->/* sync between DMA and CPU many times */
dma_unmap_sg

For every single DMA, software only needs to do sync operations, which are much
lighter than map and unmap. But this case is often not applicable to device
drivers as the buffers usually come from the upper-layer filesystem, network 
protocol, mm management system etc. Device drivers have to work with the 
assumption that the buffer will be freed immediately after DMA is done. 
However, for those device drivers which are able to allocate and free the DMA 
stream buffers by themselves, they will get benefits of reusing the same 
buffers for doing DMA multiple times without map/unmap overhead.
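
A short sketch of that reuse pattern with the existing streaming API (names
are illustrative; note that the sync and unmap calls take the original nents
passed to dma_map_sg(), not its return value):

static int run_many_jobs(struct device *dev, struct sg_table *sgt, int jobs)
{
	int i;

	/* Map once, up front. */
	if (!dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL))
		return -ENOMEM;

	for (i = 0; i < jobs; i++) {
		dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents,
				       DMA_BIDIRECTIONAL);
		/* ... submit one job to the device and wait for it ... */
		dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents,
				    DMA_BIDIRECTIONAL);
	}

	/* Unmap once, at the end of the buffer's life. */
	dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	return 0;
}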

I collected some latency data for iommu_dma_map_sg and iommu_dma_unmap_sg. In 
the test case, zswap is calling acomp APIs to compress/decompress pages, and 
comp/decomp is done by lower-level hardware ZIP driver.
root@ubuntu:/usr/share/bcc/tools# ./funclatency iommu_dma_map_sg
Tracing 1 functions for "iommu_dma_map_sg"... Hit Ctrl-C to end.
^C
 nsecs   : count distribution
 0 -> 1  : 0||
 2 -> 3  : 0||
 4 -> 7  : 0||
 8 -> 15 : 0||
16 -> 31 : 0||
32 -> 63 : 0||
64 -> 127: 0||
   128 -> 255: 0||
   256 -> 511: 0||
   512 -> 1023   : 0||
  1024 -> 2047   : 2274570  |*** |
  2048 -> 4095   : 3896310  ||
  4096 -> 8191   : 74499||
  8192 -> 16383  : 4475 ||
 16384 -> 32767  : 1519 ||
 32768 -> 65535  : 480  ||
 65536 -> 131071 : 286  ||
131072 -> 262143 : 18   ||
262144 -> 524287 : 2||

root@ubuntu:/usr/share/bcc/tools# ./funclatency iommu_dma_unmap_sg
Tracing 1 functions for "iommu_dma_unmap_sg"... Hit Ctrl-C to end.
^C
 nsecs   : count distribution
 0 -> 1  : 0||
 2 -> 3  : 0||
 4 -> 7  : 0|  

Re: [PATCH v6] iommu/arm-smmu-qcom: Request direct mapping for modem device

2020-05-15 Thread Sibi Sankar

Hey Will,

On 2020-05-11 23:25, Sibi Sankar wrote:

The modem remote processor has two access paths to DDR. One path is
directly connected to DDR and another path goes through an SMMU. The
SMMU path is configured to be a direct mapping because it's used by
various peripherals in the modem subsystem. Typically this direct
mapping is configured statically at EL2 by QHEE (Qualcomm's Hypervisor
Execution Environment) before the kernel is entered.

In certain firmware configurations, especially when the kernel is already
in full control of the SMMU, defer programming the modem SIDs to the
kernel. Let's add compatibles here so that we can have the kernel
program the SIDs for the modem in these cases.

Signed-off-by: Sibi Sankar 
---


Now that the patch is reworded, can
you please pick it up, since it's the
only pending patch from the series?



V6
 * Rebased on Will's for-joerg/arm-smmu/updates
 * Reword commit message and add more details [Stephen]

 drivers/iommu/arm-smmu-qcom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c 
b/drivers/iommu/arm-smmu-qcom.c

index 5bedf21587a56..cf01d0215a397 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -17,7 +17,9 @@ static const struct of_device_id
qcom_smmu_client_of_match[] = {
{ .compatible = "qcom,mdp4" },
{ .compatible = "qcom,mdss" },
{ .compatible = "qcom,sc7180-mdss" },
+   { .compatible = "qcom,sc7180-mss-pil" },
{ .compatible = "qcom,sdm845-mdss" },
+   { .compatible = "qcom,sdm845-mss-pil" },
{ }
 };


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


[PATCH] iommu/mediatek-v1: Add def_domain_type

2020-05-15 Thread Yong Wu
The MediaTek V1 IOMMU is an arm32 IOMMU whose default domain type is
IOMMU_DOMAIN_UNMANAGED. Add this so that bus_iommu_probe can
enter "probe_finalize".

The iommu framework will create an iommu domain for each device.
But all the devices share one iommu domain here, thus we skip all the
other domains in "attach_device" except the domain we create
internally with arm_iommu_create_mapping.

Also a minor change: in attach_device, "data" is never NULL, so
remove "if (!data) return".

Signed-off-by: Yong Wu 
---
a. rebase on linux-next.
b. After this patch, and with the mutex issue fixed (locally I only move
   mutex_unlock(&group->mutex) before __iommu_group_dma_attach(group)),
   mtk_iommu_v1.c works normally.
---
 drivers/iommu/mtk_iommu_v1.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 7bdd74c..f353b07 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -265,10 +265,13 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   struct dma_iommu_mapping *mtk_mapping;
int ret;
 
-   if (!data)
-   return -ENODEV;
+   /* Only allow the domain created internally. */
+   mtk_mapping = data->dev->archdata.iommu;
+   if (mtk_mapping->domain != domain)
+   return 0;
 
if (!data->m4u_dom) {
data->m4u_dom = dom;
@@ -288,9 +291,6 @@ static void mtk_iommu_detach_device(struct iommu_domain 
*domain,
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
 
-   if (!data)
-   return;
-
mtk_iommu_config(data, dev, false);
 }
 
@@ -416,6 +416,11 @@ static int mtk_iommu_create_mapping(struct device *dev,
return 0;
 }
 
+static int mtk_iommu_def_domain_type(struct device *dev)
+{
+   return IOMMU_DOMAIN_UNMANAGED;
+}
+
 static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -525,6 +530,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
.probe_device   = mtk_iommu_probe_device,
.probe_finalize = mtk_iommu_probe_finalize,
.release_device = mtk_iommu_release_device,
+   .def_domain_type = mtk_iommu_def_domain_type,
.device_group   = generic_device_group,
.pgsize_bitmap  = ~0UL << MT2701_IOMMU_PAGE_SHIFT,
 };
-- 
1.9.1


Re: [PATCH v2 23/33] iommu/mediatek-v1 Convert to probe/release_device() call-backs

2020-05-15 Thread Yong Wu
On Tue, 2020-04-14 at 15:15 +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Convert the Mediatek-v1 IOMMU driver to use the probe_device() and
> release_device() call-backs of iommu_ops, so that the iommu core code
> does the group and sysfs setup.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/mtk_iommu_v1.c | 50 +++-
>  1 file changed, 20 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index a31be05601c9..7bdd74c7cb9f 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -416,14 +416,12 @@ static int mtk_iommu_create_mapping(struct device *dev,
>   return 0;
>  }
>  
> -static int mtk_iommu_add_device(struct device *dev)
> +static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
>  {
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> - struct dma_iommu_mapping *mtk_mapping;
>   struct of_phandle_args iommu_spec;
>   struct of_phandle_iterator it;
>   struct mtk_iommu_data *data;
> - struct iommu_group *group;
>   int err;
>  
>   of_for_each_phandle(, err, dev->of_node, "iommus",
> @@ -442,35 +440,28 @@ static int mtk_iommu_add_device(struct device *dev)
>   }
>  
>   if (!fwspec || fwspec->ops != _iommu_ops)
> - return -ENODEV; /* Not a iommu client device */
> + return ERR_PTR(-ENODEV); /* Not a iommu client device */
>  
> - /*
> -  * This is a short-term bodge because the ARM DMA code doesn't
> -  * understand multi-device groups, but we have to call into it
> -  * successfully (and not just rely on a normal IOMMU API attach
> -  * here) in order to set the correct DMA API ops on @dev.
> -  */
> - group = iommu_group_alloc();
> - if (IS_ERR(group))
> - return PTR_ERR(group);
> + data = dev_iommu_priv_get(dev);
>  
> - err = iommu_group_add_device(group, dev);
> - iommu_group_put(group);
> - if (err)
> - return err;
> + return >iommu;
> +}
>  
> - data = dev_iommu_priv_get(dev);
> +static void mtk_iommu_probe_finalize(struct device *dev)
> +{
> + struct dma_iommu_mapping *mtk_mapping;
> + struct mtk_iommu_data *data;
> + int err;
> +
> + data= dev_iommu_priv_get(dev);
>   mtk_mapping = data->dev->archdata.iommu;
> - err = arm_iommu_attach_device(dev, mtk_mapping);
> - if (err) {
> - iommu_group_remove_device(dev);
> - return err;
> - }
>  
> - return iommu_device_link(>iommu, dev);
> + err = arm_iommu_attach_device(dev, mtk_mapping);
> + if (err)
> + dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not 
> work\n");


Hi Joerg,

 Thanks very much for this patch.

 This arm_iommu_attach_device is called just as we expected.

 But it will fail in this call stack, as group->mutex ends up being
re-locked...

[] (iommu_attach_device) from []
(__arm_iommu_attach_device+0x34/0x90)
[] (__arm_iommu_attach_device) from []
(arm_iommu_attach_device+0xc/0x20)
[] (arm_iommu_attach_device) from []
(mtk_iommu_probe_finalize+0x34/0x50)
[] (mtk_iommu_probe_finalize) from []
(bus_iommu_probe+0x2a8/0x2c4)
[] (bus_iommu_probe) from [] (bus_set_iommu
+0x88/0xd4)
[] (bus_set_iommu) from [] (mtk_iommu_probe
+0x2f8/0x364)


>  }
>  
> -static void mtk_iommu_remove_device(struct device *dev)
> +static void mtk_iommu_release_device(struct device *dev)
>  {
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   struct mtk_iommu_data *data;
> @@ -479,9 +470,6 @@ static void mtk_iommu_remove_device(struct device *dev)
>   return;
>  
>   data = dev_iommu_priv_get(dev);
> - iommu_device_unlink(>iommu, dev);
> -
> - iommu_group_remove_device(dev);
>   iommu_fwspec_free(dev);
>  }
>  
> @@ -534,8 +522,10 @@ static const struct iommu_ops mtk_iommu_ops = {
>   .map= mtk_iommu_map,
>   .unmap  = mtk_iommu_unmap,
>   .iova_to_phys   = mtk_iommu_iova_to_phys,
> - .add_device = mtk_iommu_add_device,
> - .remove_device  = mtk_iommu_remove_device,
> + .probe_device   = mtk_iommu_probe_device,
> + .probe_finalize = mtk_iommu_probe_finalize,
> + .release_device = mtk_iommu_release_device,
> + .device_group   = generic_device_group,
>   .pgsize_bitmap  = ~0UL << MT2701_IOMMU_PAGE_SHIFT,
>  };
>  



(a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
Hi, Alex,

When working on an updated version Yi and I found an design open
which needs your guidance.

In concept nested translation can be incarnated as one GPA->HPA page
table and multiple GVA->GPA page tables per VM. It means one container
is sufficient to include all SVA-capable devices assigned to the same guest,
as there is just one iova space (GPA->HPA) to be managed by vfio iommu
map/unmap api. GVA->GPA page tables are bound through the new api
introduced in this patch. It is different from legacy shadow translation 
which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device requires 
its own iova space and must be in a separate container.

However supporting multiple SVA-capable devices in one container 
imposes one challenge. Now the bind_guest_pgtbl is implemented as
iommu type1 api. From the definition of iommu type 1 any operation
should be applied to all devices within the same container, just like
dma map/unmap. However this philosophy is incorrect regarding to
page table binding. We must follow the guest binding requirements 
between its page tables and assigned devices, otherwise every bound
address space is suddenly accessible by all assigned devices thus creating
security holes.

Do you think whether it's possible to extend iommu api to accept
device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds
also problematic, as PASID and page tables are IOMMU things which
are not touched by vfio device drivers today.

Alternatively can we impose the limitation that nesting APIs can be
only enabled for singleton containers which contains only one device?
This basically falls back to the state of legacy shadow translation 
vIOMMU. and our current Qemu vIOMMU implementation actually 
does this way regardless of whether nesting is used. Do you think 
whether such tradeoff is acceptable as a starting point?

Thanks
Kevin

> -Original Message-
> From: Liu, Yi L 
> Sent: Sunday, March 22, 2020 8:32 PM
> To: alex.william...@redhat.com; eric.au...@redhat.com
> Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> ; Tian, Jun J ; Sun, Yi Y
> ; jean-phili...@linaro.org; pet...@redhat.com;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Wu, Hao 
> Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> From: Liu Yi L 
> 
> VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> hardware
> IOMMUs that have nesting DMA translation (a.k.a dual stage address
> translation). For such hardware IOMMUs, there are two stages/levels of
> address translation, and software may let userspace/VM to own the first-
> level/stage-1 translation structures. Example of such usage is vSVA (
> virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> translation structures and bind the structures to host, then hardware
> IOMMU would utilize nesting translation when doing DMA translation fo
> the devices behind such hardware IOMMU.
> 
> This patch adds vfio support for binding guest translation (a.k.a stage 1)
> structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> bind
> guest page table is needed, it also requires to expose interface to guest
> for iommu cache invalidation when guest modified the first-level/stage-1
> translation structures since hardware needs to be notified to flush stale
> iotlbs. This would be introduced in next patch.
> 
> In this patch, guest page table bind and unbind are done by using flags
> VFIO_IOMMU_BIND_GUEST_PGTBL and
> VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> struct iommu_gpasid_bind_data. Before binding guest page table to host,
> VM should have got a PASID allocated by host via
> VFIO_IOMMU_PASID_REQUEST.
> 
> Bind guest translation structures (here is guest page table) to host
> are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 158
> 
>  include/uapi/linux/vfio.h   |  46 
>  2 files changed, 204 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 82a9e0b..a877747 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -130,6 +130,33 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(>domain_list))
> 
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)
>