Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Geert Uytterhoeven
Hi Morimoto-san,

On Fri, Sep 14, 2018 at 2:53 AM Kuninori Morimoto
 wrote:
> > > Before commit 6c92d5a2744e2761, the driver would forcibly ignore
> > > non-SSI modules and try to use PIO mode.
> > > I'm not sure whether that was "kind support" or "overkill support".
> > >
> > > After this patch, DMA is required; otherwise, probe will fail.
> > > The DT shouldn't list non-SSI modules if you want to use PIO mode.
> > >
> > > + /* use PIO mode */
> > > - playback = <&ssi0 &src0 &dvc0>;
> > > + playback = <&ssi0>;
> >
> > OK, so falling back to PIO was really a "best effort" fallback, and the user
> > should really enable DMA?
>
> Yeah, I think this is better.
> PIO fallback is of course "nice to have" if possible.
> But because of this new IOMMU patch, keeping the feature
> would need a big, complicated patch for only a small effect, I think.
> Thus, I think this is the time to remove it.
> Do you agree?

You're the rcar-sound expert ;-)
If you think there's not much value in keeping PIO fallback, it can be
removed.

We can change behavior once these dependency issues have been resolved
in a generic way.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Tian, Kevin
> From: Jerome Glisse
> Sent: Thursday, September 13, 2018 10:52 PM
>
[...]
> AFAIK, on x86 and PPC at least, all PCIe devices are in the same group
> by default at boot or at least all devices behind the same bridge.

The group topology reflects a physical hierarchy limitation; it does not
change across boots. Please note that an IOMMU group defines the minimal
isolation boundary - all devices within the same group must be attached
to the same IOMMU domain, or address space, because the IOMMU physically
cannot differentiate DMAs coming from those devices. Devices behind a
legacy PCI-X bridge are one example. Other examples include devices
behind a PCIe switch port which doesn't support ACS and thus cannot
route p2p transactions to the IOMMU. For a typical PCIe endpoint (with
all upstream ports supporting ACS), you'll get one device per group.

Today one IOMMU group is attached to only one IOMMU domain. In the
future one group may attach to multiple domains, per the aux domain
concept being discussed in another thread.

> 
> Maybe there are kernel options to avoid that, and the userspace init
> program can definitely re-arrange that based on sysadmin policy.

I don't think there is such an option, as it would break the isolation
model enforced by the IOMMU.

[...]
> > > That is why i am being pedantic :) on making sure there is good reasons
> > > to do what you do inside VFIO. I do believe that we want a common
> frame-
> > > work like the one you are proposing but i do not believe it should be
> > > part of VFIO given the baggages it comes with and that are not relevant
> > > to the use cases for this kind of devices.
> >

The purpose of VFIO is clear - it is the kernel portal for granting
generic device resources (MMIO, IRQs, etc.) to user space. VFIO doesn't
care what exactly a resource is used for (queue, command register,
etc.). If pursuing the VFIO path really is necessary, maybe such a
common framework should live in user space, getting all granted
resources from the kernel driver through VFIO and then providing
accelerator services to other processes?

Thanks
Kevin


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Kenneth Lee
On Thu, Sep 13, 2018 at 10:51:50AM -0400, Jerome Glisse wrote:
> Date: Thu, 13 Sep 2018 10:51:50 -0400
> From: Jerome Glisse 
> To: Kenneth Lee 
> CC: Kenneth Lee , Alex Williamson
>  , Herbert Xu ,
>  k...@vger.kernel.org, Jonathan Corbet , Greg Kroah-Hartman
>  , Zaibo Xu ,
>  linux-...@vger.kernel.org, Sanjay Kumar , Hao
>  Fang , linux-ker...@vger.kernel.org,
>  linux...@huawei.com, iommu@lists.linux-foundation.org, "David S . Miller"
>  , linux-cry...@vger.kernel.org, Zhou Wang
>  , Philippe Ombredanne ,
>  Thomas Gleixner , Joerg Roedel ,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu 
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.1 (2018-07-13)
> Message-ID: <20180913145149.gb3...@redhat.com>
> 
> On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote:
> > On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> > > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex 
> > > > > > > > > > > > Williamson wrote:
> > > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth 
> > > > > > > > > > > > > > Lee wrote:
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it
> > > > > > > > > > does "copy_from_user" on the user memory into the kernel.
> > > > > > > > > > That is not what we need. What we try to get is: the user
> > > > > > > > > > application does something on its data and pushes it away
> > > > > > > > > > to the accelerator, saying: "I'm tired, it is your turn to
> > > > > > > > > > do the job...". Then the accelerator has the memory,
> > > > > > > > > > referring to any portion of it with the same VAs as the
> > > > > > > > > > application, even if the VAs are stored inside the memory
> > > > > > > > > > itself.
> > > > > > > > > 
> > > > > > > > > You were not looking at the right place; see
> > > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c. It does GUP and
> > > > > > > > > creates a GEM object; AFAICR you can wrap that GEM object
> > > > > > > > > into a dma-buf object.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Thank you for directing me to this implementation. It is
> > > > > > > > interesting :).
> > > > > > > > 
> > > > > > > > But it does not yet solve my problem. If I understand it
> > > > > > > > right, the userptr in i915 does the following:
> > > > > > > > 
> > > > > > > > 1. The user process sets a user pointer with size to the kernel 
> > > > > > > > via ioctl.
> > > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm 
> > > > > > > > for further
> > > > > > > >reference.
> > > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the 
> > > > > > > > device. So the data
> > > > > > > >can be shared between the user space and the hardware.
> > > > > > > > 
> > > > > > > > But my scenario is: 
> > > > > > > > 
> > > > > > > > 1. The user process has some data in user space, pointed to
> > > > > > > >    by a pointer, say ptr1. And within that memory there may
> > > > > > > >    be some other pointers; let's say one of them is ptr2.
> > > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO
> > > > > > > >    space. And the hardware must refer to ptr1 and ptr2
> > > > > > > >    *directly* for data.
> > > > > > > > 
> > > > > > > > Userptr lets the hardware and the process share the same
> > > > > > > > memory space. But I need them to share the same *address
> > > > > > > > space*. So an IOMMU is a MUST for WarpDrive; NOIOMMU mode, as
> > > > > > > > Jean said, is just for verifying that some of the procedure
> > > > > > > > is OK.
> > > > > > > 
> > > > > > > So, to be 100% clear: should we _ignore_ the non-SVA/SVM case?
> > > > > > > If so, then wait for the necessary SVA/SVM support to land and
> > > > > > > do WarpDrive without a non-SVA/SVM path.
> > > > > > > 
> > > > > > 
> > > > > > I think we should clarify the concept of SVA/SVM here. As I
> > > > > > understand it, Shared Virtual Address/Memory means: any virtual
> > > > > > address 

RE: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-13 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Friday, September 14, 2018 10:47 AM
> 
> Hi,
> 
> On 09/13/2018 01:54 AM, Jean-Philippe Brucker wrote:
> > On 12/09/2018 03:42, Lu Baolu wrote:
> >> Hi,
> >>
> >> On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
> >>> Hi,
> >>>
> >>> On 30/08/2018 05:09, Lu Baolu wrote:
>  Below APIs are introduced in the IOMMU glue for device drivers to
> use
>  the finer granularity translation.
> 
>  * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
>  - Represents the ability for supporting multiple domains per device
>    (a.k.a. finer granularity translations) of the IOMMU hardware.
> >>> iommu_capable() cannot represent hardware capabilities, we need
> >>> something else for systems with multiple IOMMUs that have different
> >>> caps. How about iommu_domain_get_attr on the device's domain
> instead?
> >> Domain is not a good choice for a per-IOMMU cap query. A domain
> >> might be attached to devices belonging to different IOMMUs.
> >>
> >> How about an API with a device structure as the parameter? A device
> >> always belongs to a specific IOMMU. This API is supposed to be used
> >> by the device driver.
> > Ah right, domain attributes won't work. Your suggestion seems more
> > suitable, but maybe users can simply try to enable auxiliary domains
> > first, and conclude that the IOMMU doesn't support it if it returns
> > an error.
> >
> 
> Some drivers might want to check whether the hardware supports
> AUX_DOMAIN during the driver probe stage, but don't want to enable
> AUX_DOMAIN at that time. One reasonable use case: a driver checks the
> AUX_DOMAIN cap during probe and exposes different sysfs nodes
> according to whether AUX_DOMAIN is supported; AUX_DOMAIN is then
> enabled or disabled at run time through a sysfs node. With this
> consideration, we still need an API to check the cap.
> 
> How about
> 
> * iommu_check_aux_domain(struct device *dev)
> - Check whether the iommu driver supports multiple domains on @dev.
> 

Maybe generalize it as iommu_check_attr() with aux_domain as a flag, in
case other IOMMU checks are introduced in the future - hinted by Jean's
comment on the iommu_sva_device_init part.
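To make the shape of that suggestion concrete, here is a hypothetical
sketch. Neither iommu_check_attr() nor the enum below exists in mainline;
all names are illustrative only, standing in for whatever the final API
would be called.

```c
/* Hypothetical per-device capability check, generalized beyond aux domains
 * as suggested above. Illustrative names - not a real kernel interface. */
enum iommu_dev_attr {
	IOMMU_DEV_ATTR_AUX_DOMAIN,	/* multiple domains per device */
	/* room for future per-device checks */
};

/* Returns true if the IOMMU behind @dev supports @attr. */
bool iommu_check_attr(struct device *dev, enum iommu_dev_attr attr);

/* Probe-time usage from the sysfs use case above: check the cap without
 * enabling it, and expose different sysfs nodes accordingly. */
static int foo_probe(struct device *dev)
{
	if (iommu_check_attr(dev, IOMMU_DEV_ATTR_AUX_DOMAIN))
		foo_create_aux_sysfs_nodes(dev);	/* hypothetical helper */
	return 0;
}
```

The flag-based form keeps a single entry point while letting future
per-device queries reuse the same call.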

Thanks
Kevin


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-13 Thread Lu Baolu

Hi,

On 09/13/2018 01:54 AM, Jean-Philippe Brucker wrote:

On 12/09/2018 03:42, Lu Baolu wrote:

Hi,

On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:

Hi,

On 30/08/2018 05:09, Lu Baolu wrote:

Below APIs are introduced in the IOMMU glue for device drivers to use
the finer granularity translation.

* iommu_capable(IOMMU_CAP_AUX_DOMAIN)
- Represents the ability for supporting multiple domains per device
  (a.k.a. finer granularity translations) of the IOMMU hardware.

iommu_capable() cannot represent hardware capabilities, we need
something else for systems with multiple IOMMUs that have different
caps. How about iommu_domain_get_attr on the device's domain instead?

Domain is not a good choice for a per-IOMMU cap query. A domain might be
attached to devices belonging to different IOMMUs.

How about an API with a device structure as the parameter? A device
always belongs to a specific IOMMU. This API is supposed to be used by
the device driver.

Ah right, domain attributes won't work. Your suggestion seems more
suitable, but maybe users can simply try to enable auxiliary domains
first, and conclude that the IOMMU doesn't support it if it returns an error.



Some drivers might want to check whether the hardware supports AUX_DOMAIN
during the driver probe stage, but don't want to enable AUX_DOMAIN at
that time. One reasonable use case: a driver checks the AUX_DOMAIN cap
during probe and exposes different sysfs nodes according to whether
AUX_DOMAIN is supported; AUX_DOMAIN is then enabled or disabled at run
time through a sysfs node. With this consideration, we still need an API
to check the cap.

How about

* iommu_check_aux_domain(struct device *dev)
   - Check whether the iommu driver supports multiple domains on @dev.

Best regards,
Lu Baolu





Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Kuninori Morimoto


Hi Geert

> > ...which is in fact the exact same problem that the IOMMU code has -
> > might it make sense to give of_dma_request_slave_channel() similar
> > (optional?) driver_deferred_probe_check_state() logic, i.e. "if my DMAC
> > driver's not shown up by this point, assume it's not built-in and go on
> > without it". Of course it is somewhat easier for IOMMU drivers as
> > there's zero chance of those popping up as modules later on.
> 
> It may solve the issue in some cases.  But only if the dmac is reprobed
> before the DMA slave driver, which is not guaranteed.
> 
> BTW, it seems e.g. i2c and serial suffer from the same problem, and fall
> back to PIO. However, these drivers try to obtain the DMA channel when
> used, not during probe, so they start using DMA after the dmac has been
> probed successfully.

Actually, the sound driver gets its DMA channel when used.
But it checks DMA availability at probe time (= tries to get a channel,
and releases it again right away).
Another approach on the sound driver side would be to not check it at
probe time at all.
Best regards
---
Kuninori Morimoto


Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Kuninori Morimoto


Hi Geert

> > Before commit 6c92d5a2744e2761, the driver would forcibly ignore
> > non-SSI modules and try to use PIO mode.
> > I'm not sure whether that was "kind support" or "overkill support".
> >
> > After this patch, DMA is required; otherwise, probe will fail.
> > The DT shouldn't list non-SSI modules if you want to use PIO mode.
> >
> > + /* use PIO mode */
> > - playback = <&ssi0 &src0 &dvc0>;
> > + playback = <&ssi0>;
> >
> 
> OK, so falling back to PIO was really a "best effort" fallback, and the user
> should really enable DMA?

Yeah, I think this is better.
PIO fallback is of course "nice to have" if possible.
But because of this new IOMMU patch, keeping the feature
would need a big, complicated patch for only a small effect, I think.
Thus, I think this is the time to remove it.
Do you agree?

Best regards
---
Kuninori Morimoto


RE: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-13 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, September 13, 2018 11:03 PM
> 
> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer granularity
> >>> second level translation with no impact on architectures which only
> >>> support Source ID or the similar granularity.
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> Nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use-cases of IOMMU aware mediated device
> >> (assigning mdevs to userspace drivers, for example)
> >
> > Yes. The aux domain concept applies only to vt-d rev3, which
> > introduces scalable mode. Care is taken to avoid breaking usages on
> > existing architectures.
> >
> > One note: assigning mdevs to user space alone doesn't imply IOMMU
> > awareness. All existing mdev usages use software or proprietary
> > methods to isolate DMA. There is only one potential IOMMU-aware mdev
> > usage we talked about that does not rely on vt-d rev3 scalable mode -
> > wrapping a random PCI device into a single mdev instance (no
> > sharing). In that case the mdev inherits the RID from its parent PCI
> > device, and is thus isolated by the IOMMU at RID granularity. Our RFC
> > supports this usage too. In VFIO the two usages (PASID-based and
> > RID-based) use the same code path, i.e. always binding the domain to
> > the parent device of the mdev. But within the IOMMU they go down
> > different paths: PASID-based goes to the aux domain, since
> > iommu_enable_aux_domain has been called on that device; RID-based
> > follows the existing unmanaged domain path, as if it were parent
> > device assignment.
> 
> For Arm SMMU we're more interested in the PASID-granular case than the
> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
> mode, the following example can be implemented with an SMMUv3, since
> it
> only needs PASID-granular first-level translation:
> 
> We have a PCI function that supports PASID, and can be partitioned into
> multiple isolated entities, mdevs. Each mdev has an MMIO frame, an MSI
> vector and a PASID.
> 
> Different processes (userspace drivers, not QEMU) each open one mdev. A
> process controlling one mdev has two ways of doing DMA:
> 
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
> creates an auxiliary domain for the mdev, with PASID #35. The process
> creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls
> iommu_map on
> the auxiliary domain. The IOMMU driver populates the pgtables associated
> with PASID #35.
> 
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process
> address
> space to the device, gets PASID #35. Simpler, but not everyone wants to
> use SVA, especially not userspace drivers which need the highest
> performance.
> 
> 
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the guest.

The former, yes, applies to the aux domain concept. The latter doesn't -
you have only one second level per device. The whole PASID table being
managed by the guest means you assign the whole device to the guest,
which is not the aux domain concept here.

> 
> So (2) would use iommu_sva_bind_device(), but (1) needs something else.
> Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
> second-level or nested translation? It seems silly to use a different
> API for first-level, since the flow in userspace and VFIO is the same as
> your second-level case as far as MAP_DMA ioctl goes. The difference is
> that in your case the auxiliary domain supports an additional operation
> which binds first-level page tables. An auxiliary domain that only
> supports first-level wouldn't support this operation, but it can still
> implement iommu_map/unmap/etc.

Thanks for correcting me on this. You are right that aux domain
shouldn't impose such a limitation to second-level or nested only. We
define an aux domain as a normal domain (aux takes effect only when
attaching to a device), so it should support all capabilities possible
on a normal domain.

BTW, I'm not sure whether you looked at my comment on patch 8/10. I
explained the rationale for why aux domain doesn't interfere with
existing default domain usage; on quick thought, the above example might
not make a difference, but I need your confirmation here. :-)

> 
> 
> Another note: if for some reason you did want to allow userspace to
> choose between first-level or second-level, you could implement the
> VFIO_TYPE1_NESTING_IOMMU container. It acts like a
> VFIO_TYPE1v2_IOMMU,
> but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So
> DMA_MAP
> ioctl on a NESTING container woul

Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-13 Thread Raj, Ashok
On Thu, Sep 13, 2018 at 04:03:01PM +0100, Jean-Philippe Brucker wrote:
> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer granularity
> >>> second level translation with no impact on architectures which only
> >>> support Source ID or the similar granularity.
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> Nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use-cases of IOMMU aware mediated device
> >> (assigning mdevs to userspace drivers, for example)
> > 
> > Yes. The aux domain concept applies only to vt-d rev3, which
> > introduces scalable mode. Care is taken to avoid breaking usages on
> > existing architectures.
> >
> > One note: assigning mdevs to user space alone doesn't imply IOMMU
> > awareness. All existing mdev usages use software or proprietary
> > methods to isolate DMA. There is only one potential IOMMU-aware mdev
> > usage we talked about that does not rely on vt-d rev3 scalable mode -
> > wrapping a random PCI device into a single mdev instance (no
> > sharing). In that case the mdev inherits the RID from its parent PCI
> > device, and is thus isolated by the IOMMU at RID granularity. Our RFC
> > supports this usage too. In VFIO the two usages (PASID-based and
> > RID-based) use the same code path, i.e. always binding the domain to
> > the parent device of the mdev. But within the IOMMU they go down
> > different paths: PASID-based goes to the aux domain, since
> > iommu_enable_aux_domain has been called on that device; RID-based
> > follows the existing unmanaged domain path, as if it were parent
> > device assignment.
> 
> For Arm SMMU we're more interested in the PASID-granular case than the
> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
> mode, the following example can be implemented with an SMMUv3, since it
> only needs PASID-granular first-level translation:

You are right, you can simply use the first level as IOVA for every
PASID.

The only issue arises when you need to assign that to a guest: you would
be required to shadow the 1st level. If you have a 2nd level, the
per-PASID first level can be managed in the guest and doesn't need to be
shadowed.

> 
> We have a PCI function that supports PASID, and can be partitioned into
> multiple isolated entities, mdevs. Each mdev has an MMIO frame, an MSI
> vector and a PASID.
> 
> Different processes (userspace drivers, not QEMU) each open one mdev. A
> process controlling one mdev has two ways of doing DMA:
> 
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
> creates an auxiliary domain for the mdev, with PASID #35. The process
> creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls iommu_map on
> the auxiliary domain. The IOMMU driver populates the pgtables associated
> with PASID #35.
> 
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process address
> space to the device, gets PASID #35. Simpler, but not everyone wants to
> use SVA, especially not userspace drivers which need the highest
> performance.
> 
> 
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the guest.
> 
> So (2) would use iommu_sva_bind_device(), but (1) needs something else.
> Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
> second-level or nested translation? It seems silly to use a different
> API for first-level, since the flow in userspace and VFIO is the same as
> your second-level case as far as MAP_DMA ioctl goes. The difference is
> that in your case the auxiliary domain supports an additional operation
> which binds first-level page tables. An auxiliary domain that only
> supports first-level wouldn't support this operation, but it can still
> implement iommu_map/unmap/etc.
> 
> 
> Another note: if for some reason you did want to allow userspace to
> choose between first-level or second-level, you could implement the
> VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
> but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So DMA_MAP
> ioctl on a NESTING container would populate second-level, and DMA_MAP on
> a normal container populates first-level. But if you're always going to
> use second-level by default, the distinction isn't necessary.

Where is the nesting attribute specified? In vt-d2 it was part of the
context entry, so it also meant all PASIDs were nested. In vt-d3 it's
part of the PASID context.

It seems unsafe to share PASIDs between different VMs, since any request
without a PASID has only one mapping.

> 
> 
> >> Sounds good, I'll drop the private PASID patch if we can figure out a
> >> solut

[PATCH v6 6/7] iommu/arm-smmu: Support non-strict mode

2018-09-13 Thread Robin Murphy
All we need is to wire up .flush_iotlb_all properly and implement the
domain attribute, and iommu-dma and io-pgtable-arm will do the rest for
us. Rather than bother implementing it for v7s format for the highly
unlikely chance of that being relevant, we can simply hide the
non-strict flag from io-pgtable for that combination just so anyone who
does actually try it will simply get over-invalidation instead of
failure to initialise domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 43 +---
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fd1b80ef9490..c727080e7acd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -246,6 +246,7 @@ struct arm_smmu_domain {
const struct iommu_gather_ops   *tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
+   boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
struct iommu_domain domain;
@@ -863,6 +864,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH32_S)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
smmu_domain->smmu = smmu;
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops) {
@@ -1252,6 +1256,14 @@ static size_t arm_smmu_unmap(struct iommu_domain 
*domain, unsigned long iova,
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->tlb_ops)
+   smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -1470,13 +1482,17 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return -EINVAL;
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
default:
return -ENODEV;
}
@@ -1488,13 +1504,15 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(&smmu_domain->init_mutex);
 
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
if (smmu_domain->smmu) {
ret = -EPERM;
goto out_unlock;
@@ -1506,6 +1524,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
break;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   smmu_domain->non_strict = *(int *)data;
+   if (smmu_domain->pgtbl_ops)
+   io_pgtable_set_non_strict(smmu_domain->pgtbl_ops,
+ smmu_domain->non_strict);
+   break;
default:
ret = -ENODEV;
}
@@ -1562,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
-   .flush_iotlb_all= arm_smmu_iotlb_sync,
+   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
-- 
2.19.0.dirty


[PATCH v6 7/7] iommu/dma: Add bootup option "iommu.non_strict"

2018-09-13 Thread Robin Murphy
From: Zhen Lei 

Add a boot option so that the system administrator can choose which mode
is used. The default mode is strict.

Signed-off-by: Zhen Lei 
[rm: make it a generic iommu-dma feature]
Signed-off-by: Robin Murphy 
---
 .../admin-guide/kernel-parameters.txt  | 13 +
 drivers/iommu/dma-iommu.c  | 18 ++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9871e649ffef..406b91759b62 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1749,6 +1749,19 @@
nobypass[PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
 
+   iommu.non_strict=   [ARM64]
+   Format: { "0" | "1" }
+   0 - strict mode, default.
+   Release IOVAs only after the related TLB entries
+   have been completely invalidated.
+   1 - non-strict mode.
+   Defer TLB invalidation and release the memory
+   first. This is good for scatter-gather performance
+   but lacks full isolation: an untrusted device can
+   access the reused memory because its TLB entries
+   may still be valid. Please consider this carefully
+   before choosing this mode. Note that VFIO will
+   always use strict mode.
+
iommu.passthrough=
[ARM64] Configure DMA to bypass the IOMMU by default.
Format: { "0" | "1" }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d91849fe4ebe..04d4c5453acd 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -62,6 +62,24 @@ struct iommu_dma_cookie {
 
 static bool iommu_dma_non_strict __read_mostly;
 
+static int __init iommu_dma_setup(char *str)
+{
+   int ret;
+
+   ret = kstrtobool(str, &iommu_dma_non_strict);
+   if (ret)
+   return ret;
+
+   if (iommu_dma_non_strict) {
+   pr_warn("WARNING: iommu non-strict mode is chosen.\n"
+   "It's good for scatter-gather performance but lacks full isolation\n");
+   add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+   }
+
+   return 0;
+}
+early_param("iommu.non_strict", iommu_dma_setup);
+
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 {
if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
-- 
2.19.0.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 5/7] iommu/arm-smmu-v3: Add support for non-strict mode

2018-09-13 Thread Robin Murphy
From: Zhen Lei 

Dynamically choose strict or non-strict mode for page table config based
on the iommu domain type.

Signed-off-by: Zhen Lei 
[rm: convert to domain attribute]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f10c852479fc..e2f0e4a3374d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -612,6 +612,7 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
 
struct io_pgtable_ops   *pgtbl_ops;
+   boolnon_strict;
 
enum arm_smmu_domain_stage  stage;
union {
@@ -1633,6 +1634,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
if (smmu->features & ARM_SMMU_FEAT_COHERENCY)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -1934,13 +1938,17 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return -EINVAL;
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
default:
return -ENODEV;
}
@@ -1952,13 +1960,15 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(&smmu_domain->init_mutex);
 
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
if (smmu_domain->smmu) {
ret = -EPERM;
goto out_unlock;
@@ -1970,6 +1980,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
break;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   smmu_domain->non_strict = *(int *)data;
+   if (smmu_domain->pgtbl_ops)
+   io_pgtable_set_non_strict(smmu_domain->pgtbl_ops,
+ smmu_domain->non_strict);
+   break;
default:
ret = -ENODEV;
}
-- 
2.19.0.dirty



[PATCH v6 4/7] iommu/io-pgtable: Add helper for toggling non-strict mode

2018-09-13 Thread Robin Murphy
Since this might become a repeated idiom in drivers, let's add a tidy
encapsulation from the outset.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/io-pgtable.c | 9 +
 drivers/iommu/io-pgtable.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 127558d83667..af9abe52cf06 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -77,3 +77,12 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
io_pgtable_tlb_flush_all(iop);
io_pgtable_init_table[iop->fmt]->free(iop);
 }
+
+void io_pgtable_set_non_strict(struct io_pgtable_ops *ops, bool val)
+{
+   struct io_pgtable *iop = io_pgtable_ops_to_pgtable(ops);
+
+   if (val)
+   iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   else
+   iop->cfg.quirks &= ~IO_PGTABLE_QUIRK_NON_STRICT;
+}
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 47d5ae559329..0602a132655c 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -153,6 +153,7 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
  */
 void free_io_pgtable_ops(struct io_pgtable_ops *ops);
 
+void io_pgtable_set_non_strict(struct io_pgtable_ops *ops, bool val);
 
 /*
  * Internal structures for page table allocator implementations.
-- 
2.19.0.dirty



[PATCH v6 2/7] iommu/dma: Add support for non-strict mode

2018-09-13 Thread Robin Murphy
From: Zhen Lei 

1. Save the related domain pointer in struct iommu_dma_cookie, so that the
   iovad can call domain->ops->flush_iotlb_all to flush the TLB.
2. During the iommu domain initialization phase, check the domain attribute
   to determine whether non-strict mode is supported. If so, call
   init_iova_flush_queue to register the iovad->flush_cb callback.
3. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
   -->iommu_dma_free_iova. If the domain is non-strict, call queue_iova to
   defer the iova freeing, and omit the iommu_tlb_sync operation.

Signed-off-by: Zhen Lei 
[rm: convert raw boolean to domain attribute]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 31 ++-
 include/linux/iommu.h |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..d91849fe4ebe 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,8 +55,13 @@ struct iommu_dma_cookie {
};
struct list_headmsi_page_list;
spinlock_t  msi_lock;
+
+   /* Only assigned in non-strict mode, otherwise it's NULL */
+   struct iommu_domain *domain;
 };
 
+static bool iommu_dma_non_strict __read_mostly;
+
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 {
if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
@@ -257,6 +262,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
 }
 
+static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
+{
+   struct iommu_dma_cookie *cookie;
+   struct iommu_domain *domain;
+
+   cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+   domain = cookie->domain;
+
+   domain->ops->flush_iotlb_all(domain);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -275,6 +291,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;
+   int attr = 1;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
return -EINVAL;
@@ -308,6 +325,13 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
+
+   if (iommu_dma_non_strict && !iommu_domain_set_attr(domain,
+   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr)) {
+   cookie->domain = domain;
+   init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
+   }
+
if (!dev)
return 0;
 
@@ -393,6 +417,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
/* The MSI case is only ever cleaning up its most recent allocation */
if (cookie->type == IOMMU_DMA_MSI_COOKIE)
cookie->msi_iova -= size;
+   else if (cookie->domain)/* non-strict mode */
+   queue_iova(iovad, iova_pfn(iovad, iova),
+   size >> iova_shift(iovad), 0);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
@@ -408,7 +435,9 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
 
-   WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
+   if (!cookie->domain)
+   iommu_tlb_sync(domain);
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..decabe8e8dbe 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMU_ENABLE,
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
+   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.19.0.dirty



[PATCH v6 3/7] iommu/io-pgtable-arm: Add support for non-strict mode

2018-09-13 Thread Robin Murphy
From: Zhen Lei 

To support non-strict mode, issue TLBI and sync only in strict mode,
except for non-leaf invalidations, since page table updates themselves
must always be synchronous.

To save having to reason about it too much, make sure the invalidation
in arm_lpae_split_blk_unmap() just performs its own unconditional sync
to minimise the window in which we're technically violating the break-
before-make requirement on a live mapping. This might work out redundant
with an outer-level sync for strict unmaps, but we'll never be splitting
blocks on a DMA fastpath anyway.

Signed-off-by: Zhen Lei 
[rm: tweak comment, commit message, and split_blk_unmap logic]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/io-pgtable-arm.c | 9 ++---
 drivers/iommu/io-pgtable.h | 5 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 2f79efd16a05..5b915aab7fd3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -576,6 +576,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
tablep = iopte_deref(pte, data);
} else if (unmap_idx >= 0) {
io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+   io_pgtable_tlb_sync(&data->iop);
return size;
}
 
@@ -609,7 +610,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
io_pgtable_tlb_sync(iop);
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
-   } else {
+   } else if (!(iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT)) {
io_pgtable_tlb_add_flush(iop, iova, size, size, true);
}
 
@@ -771,7 +772,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
u64 reg;
struct arm_lpae_io_pgtable *data;
 
-   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -863,7 +865,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
struct arm_lpae_io_pgtable *data;
 
/* The NS quirk doesn't apply at stage 2 */
-   if (cfg->quirks & ~IO_PGTABLE_QUIRK_NO_DMA)
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df79093cad9..47d5ae559329 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -71,12 +71,17 @@ struct io_pgtable_cfg {
 *  be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
 *  software-emulated IOMMU), such that pagetable updates need not
 *  be treated as explicit DMA data.
+*
+* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
+*  on unmap, for DMA domains using the flush queue mechanism for
+*  delayed invalidation.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
#define IO_PGTABLE_QUIRK_TLBI_ON_MAPBIT(2)
#define IO_PGTABLE_QUIRK_ARM_MTK_4GBBIT(3)
#define IO_PGTABLE_QUIRK_NO_DMA BIT(4)
+   #define IO_PGTABLE_QUIRK_NON_STRICT BIT(5)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.0.dirty



[PATCH v6 0/7] Add non-strict mode support for iommu-dma

2018-09-13 Thread Robin Murphy
Hi all,

Since we'd like to get this polished up and merged and Leizhen has other
commitments, here's v6 of the previous series[1] wherein I address all
my own feedback :)

The principal change is that I've inverted things slightly such that
it's now a generic domain attribute controlled by iommu-dma given the
necessary support from individual IOMMU drivers. That way we can easily
enable other drivers straight away, as I've done for SMMUv2 here (which
also allowed me to give it a quick test with MMU-401s on a Juno board).
Otherwise it's really just cosmetic cleanup and rebasing onto Will's
pending SMMU queue.

Robin.

[1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25150.html


Robin Murphy (2):
  iommu/io-pgtable: Add helper for toggling non-strict mode
  iommu/arm-smmu: Support non-strict mode

Zhen Lei (5):
  iommu/arm-smmu-v3: Implement flush_iotlb_all hook
  iommu/dma: Add support for non-strict mode
  iommu/io-pgtable-arm: Add support for non-strict mode
  iommu/arm-smmu-v3: Add support for non-strict mode
  iommu/dma: Add bootup option "iommu.non_strict"

 .../admin-guide/kernel-parameters.txt | 13 +
 drivers/iommu/arm-smmu-v3.c   | 43 +---
 drivers/iommu/arm-smmu.c  | 43 +---
 drivers/iommu/dma-iommu.c | 49 ++-
 drivers/iommu/io-pgtable-arm.c|  9 ++--
 drivers/iommu/io-pgtable.c|  9 
 drivers/iommu/io-pgtable.h|  6 +++
 include/linux/iommu.h |  1 +
 8 files changed, 155 insertions(+), 18 deletions(-)

-- 
2.19.0.dirty



[PATCH v6 1/7] iommu/arm-smmu-v3: Implement flush_iotlb_all hook

2018-09-13 Thread Robin Murphy
From: Zhen Lei 

.flush_iotlb_all is currently stubbed to arm_smmu_iotlb_sync() since the
only time it would ever need to actually do anything is for callers
doing their own explicit batching, e.g.:

iommu_unmap_fast(domain, ...);
iommu_unmap_fast(domain, ...);
iommu_iotlb_flush_all(domain, ...);

where since io-pgtable still issues the TLBI commands implicitly in the
unmap instead of implementing .iotlb_range_add, the "flush" only needs
to ensure completion of those already-in-flight invalidations.

However, we're about to start using it in anger with flush queues, so
let's get a proper implementation wired up.

Signed-off-by: Zhen Lei 
Reviewed-by: Robin Murphy 
[rm: expand commit message]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e395f1ff3f81..f10c852479fc 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1781,6 +1781,14 @@ arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->smmu)
+   arm_smmu_tlb_inv_context(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -2008,7 +2016,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
-   .flush_iotlb_all= arm_smmu_iotlb_sync,
+   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
-- 
2.19.0.dirty



[RFC PATCH 2/3] dma-mapping: clear dma_ops pointer also on ARM

2018-09-13 Thread Wolfram Sang
The generic fallback of arch_teardown_dma_ops() clears the dma_ops
pointer but the ARM specific version does not. Rename the generic one,
so it can be called from ARM specific one, too. This will ensure
consistent behaviour.

Signed-off-by: Wolfram Sang 
---
 arch/arm/mm/dma-mapping.c   | 6 +++---
 include/linux/dma-mapping.h | 5 +++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e3b04786838f..466b0242e8af 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2396,8 +2396,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-   if (!dev->archdata.dma_ops_setup)
-   return;
+   if (dev->archdata.dma_ops_setup)
+   arm_teardown_iommu_dma_ops(dev);
 
-   arm_teardown_iommu_dma_ops(dev);
+   generic_teardown_dma_ops(dev);
 }
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index eafd6f318e78..020512cb7f0e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -663,11 +663,12 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
  bool coherent) { }
 #endif
 
-#ifndef arch_teardown_dma_ops
-static inline void arch_teardown_dma_ops(struct device *dev)
+static inline void generic_teardown_dma_ops(struct device *dev)
 {
dev->dma_ops = NULL;
 }
+#ifndef arch_teardown_dma_ops
+#define arch_teardown_dma_ops generic_teardown_dma_ops
 #endif
 
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
-- 
2.18.0



[RFC PATCH 3/3] dma-mapping: clear dma_parms on teardown, too

2018-09-13 Thread Wolfram Sang
While sanitizing the dma_ops pointer on teardown, do the same for
dma_parms, too. Rename the function to have a more generic name.

Signed-off-by: Wolfram Sang 
---
 arch/arm/mm/dma-mapping.c   | 2 +-
 include/linux/dma-mapping.h | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 466b0242e8af..bcf77bc0423f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2399,5 +2399,5 @@ void arch_teardown_dma_ops(struct device *dev)
if (dev->archdata.dma_ops_setup)
arm_teardown_iommu_dma_ops(dev);
 
-   generic_teardown_dma_ops(dev);
+   generic_teardown_dma(dev);
 }
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 020512cb7f0e..6a2d8779b1d8 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -663,12 +663,13 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
  bool coherent) { }
 #endif
 
-static inline void generic_teardown_dma_ops(struct device *dev)
+static inline void generic_teardown_dma(struct device *dev)
 {
dev->dma_ops = NULL;
+   dev->dma_parms = NULL;
 }
 #ifndef arch_teardown_dma_ops
-#define arch_teardown_dma_ops generic_teardown_dma_ops
+#define arch_teardown_dma_ops generic_teardown_dma
 #endif
 
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
-- 
2.18.0



[RFC PATCH 1/3] ARM: dma-mapping: update comment about handling dma_ops when detaching from IOMMU

2018-09-13 Thread Wolfram Sang
Update the comment because we don't set the pointer to NULL anymore.
Also use the correct pointer name 'dma_ops' instead of 'dma_map_ops'.

Fixes: 1874619a7df4 ("ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()")
Signed-off-by: Wolfram Sang 
---
 arch/arm/mm/dma-mapping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 66566472c153..e3b04786838f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2287,7 +2287,7 @@ EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
  * @dev: valid struct device pointer
  *
  * Detaches the provided device from a previously attached map.
- * This voids the dma operations (dma_map_ops pointer)
+ * This overwrites the dma_ops pointer with appropriate non-IOMMU ops.
  */
 void arm_iommu_detach_device(struct device *dev)
 {
-- 
2.18.0



[RFC PATCH 0/3] dma-mapping: clear dangling pointers consistently

2018-09-13 Thread Wolfram Sang
When working with dma_set_max_seg_size(), I noticed issues with a
dangling dma_parms pointer. I saw Christoph just worked on handling
something similar for the dma_ops pointer, too. I came up with three
patches on top of Christoph's work which I send out for discussion here:

Patch 1 fixes a meanwhile stale comment.

Patch 2 makes clearing the dma_ops pointer more consistent because it
was missed on the custom ARM implementation to the best of my knowledge.

Patch 3 generalizes teardown_dma_ops to teardown_dma, so that clearing
dma_parms can be added there.

All these patches are based on the dma-mapping for-next branch. They are
build tested and runtime tested on a Renesas Salvator-XS board (R-Car
M3-N, ARM64) and Renesas Lager board (R-Car H2, ARM32).

A branch containing these (and dma_parms additions for the above boards)
can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/sdhi/set_max_seg

Looking forward to opinions.

Regards,

   Wolfram


Wolfram Sang (3):
  ARM: dma-mapping: update comment about handling dma_ops when detaching
from IOMMU
  dma-mapping: clear dma_ops pointer also on ARM
  dma-mapping: clear dma_parms on teardown, too

 arch/arm/mm/dma-mapping.c   | 8 
 include/linux/dma-mapping.h | 6 --
 2 files changed, 8 insertions(+), 6 deletions(-)

-- 
2.18.0



Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-13 Thread Jean-Philippe Brucker
On 13/09/2018 01:19, Tian, Kevin wrote:
>>> This is proposed for architectures which support finer granularity
>>> second level translation with no impact on architectures which only
>>> support Source ID or the similar granularity.
>>
>> Just to be clear, in this paragraph you're only referring to the
>> Nested/second-level translation for mdev, which is specific to vt-d
>> rev3? Other architectures can still do first-level translation with
>> PASID, to support some use-cases of IOMMU aware mediated device
>> (assigning mdevs to userspace drivers, for example)
> 
> yes. aux domain concept applies only to vt-d rev3 which introduces
> scalable mode. Care is taken to avoid breaking usages on existing
> architectures.
> 
> one note. Assigning mdevs to user space alone doesn't imply IOMMU
> aware. All existing mdev usages use software or proprietary methods to
> isolate DMA. There is only one potential IOMMU aware mdev usage 
> which we talked not rely on vt-d rev3 scalable mode - wrap a random 
> PCI device into a single mdev instance (no sharing). In that case mdev 
> inherits RID from parent PCI device, thus is isolated by IOMMU in RID 
> granular. Our RFC supports this usage too. In VFIO two usages (PASID-
> based and RID-based) use same code path, i.e. always binding domain to
> the parent device of mdev. But within IOMMU they go different paths.
> PASID-based will go to aux-domain as iommu_enable_aux_domain
> has been called on that device. RID-based will follow existing 
> unmanaged domain path, as if it is parent device assignment.

For Arm SMMU we're more interested in the PASID-granular case than the
RID-granular one. It doesn't necessarily require vt-d rev3 scalable
mode, the following example can be implemented with an SMMUv3, since it
only needs PASID-granular first-level translation:

We have a PCI function that supports PASID, and can be partitioned into
multiple isolated entities, mdevs. Each mdev has an MMIO frame, an MSI
vector and a PASID.

Different processes (userspace drivers, not QEMU) each open one mdev. A
process controlling one mdev has two ways of doing DMA:

(1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
creates an auxiliary domain for the mdev, with PASID #35. The process
creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls iommu_map on
the auxiliary domain. The IOMMU driver populates the pgtables associated
with PASID #35.

(2) SVA. One way of doing it: the process uses a new
"VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process address
space to the device, gets PASID #35. Simpler, but not everyone wants to
use SVA, especially not userspace drivers which need the highest
performance.


This example only needs to modify first-level translation, and works
with SMMUv3. The kernel here could be the host, in which case
second-level translation is disabled in the SMMU, or it could be the
guest, in which case second-level mappings are created by QEMU and
first-level translation is managed by assigning PASID tables to the guest.

So (2) would use iommu_sva_bind_device(), but (1) needs something else.
Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
second-level or nested translation? It seems silly to use a different
API for first-level, since the flow in userspace and VFIO is the same as
your second-level case as far as MAP_DMA ioctl goes. The difference is
that in your case the auxiliary domain supports an additional operation
which binds first-level page tables. An auxiliary domain that only
supports first-level wouldn't support this operation, but it can still
implement iommu_map/unmap/etc.


Another note: if for some reason you did want to allow userspace to
choose between first-level or second-level, you could implement the
VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So DMA_MAP
ioctl on a NESTING container would populate second-level, and DMA_MAP on
a normal container populates first-level. But if you're always going to
use second-level by default, the distinction isn't necessary.


>> Sounds good, I'll drop the private PASID patch if we can figure out a
>> solution to the attach/detach_dev problem discussed on patch 8/10
>>
> 
> Can you elaborate a bit on private PASID usage? what is the
> high level flow on it? 
> 
> Again based on earlier explanation, aux domain is specific to IOMMU
> architecture supporting vtd scalable mode-like capability, which allows
> separate 2nd/1st level translations per PASID. Need a better understanding
> how private PASID is relevant here.

Private PASIDs are used for doing iommu_map/iommu_unmap on PASIDs
(first-level translation):
https://www.spinics.net/lists/dri-devel/msg177003.html As above, some
people don't want SVA, some can't do it, some may even want a few
private address spaces just for their kernel driver. They need a way to
allocate PASIDs and do iommu_map/iommu_unmap on them, without binding 

Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Jerome Glisse
On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote:
> On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee 
> > > > > > > > > > > > > wrote:
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > > "copy_from_user" the
> > > > > > > > > user memory to the kernel. That is not what we need. What we 
> > > > > > > > > try to get is: the
> > > > > > > > > user application do something on its data, and push it away 
> > > > > > > > > to the accelerator,
> > > > > > > > > and says: "I'm tied, it is your turn to do the job...". Then 
> > > > > > > > > the accelerator has
> > > > > > > > > the memory, referring any portion of it with the same VAs of 
> > > > > > > > > the application,
> > > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > > 
> > > > > > > > You were not looking at right place see 
> > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > > It does GUP and create GEM object AFAICR you can wrap that GEM 
> > > > > > > > object into a
> > > > > > > > dma buffer object.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thank you for directing me to this implementation. It is 
> > > > > > > interesting:).
> > > > > > > 
> > > > > > > But it is not yet solve my problem. If I understand it right, the 
> > > > > > > userptr in
> > > > > > > i915 do the following:
> > > > > > > 
> > > > > > > 1. The user process sets a user pointer with size to the kernel 
> > > > > > > via ioctl.
> > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm 
> > > > > > > for further
> > > > > > >reference.
> > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the 
> > > > > > > device. So the data
> > > > > > >can be shared between the user space and the hardware.
> > > > > > > 
> > > > > > > But my scenario is: 
> > > > > > > 
> > > > > > > 1. The user process has some data in the user space, pointed by a 
> > > > > > > pointer, say
> > > > > > >ptr1. And within the memory, there may be some other pointers, 
> > > > > > > let's say one
> > > > > > >of them is ptr2.
> > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO 
> > > > > > > space. And the
> > > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > > 
> > > > > > > Userptr lets the hardware and process share the same memory 
> > > > > > > space. But I need
> > > > > > > them to share the same *address space*. So IOMMU is a MUST for 
> > > > > > > WarpDrive,
> > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > > procedure is OK.
> > > > > > 
> > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > > without non SVA/SVM path.
> > > > > > 
> > > > > 
> > > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > > understanding, Share
> > > > > Virtual Address/Memory means: any virtual address in a process can be 
> > > > > used by
> > > > > device at the same time. This requires IOMMU device to support PASID. 
> > > > > And
> > > > > optionally, it requires the feature of page-fault-from-device.
> > > > 
> > > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, access
> > > > to range that are MMIO map ie CPU page table pointing to IO memory, IIRC
> > > > it is undefined what happens on some platform for a device trying to
> > > > access those using SVA/SVM.
> > > > 
> > > > 
> > > > > But before the feature is settled down, IOMMU can be used immediately 
> > > > > in the
> > > > > current kernel. That make it possible to assign ONE process's virtual 
> > > > > addresses
> > > > > to the device's IOMMU page table with GUP. This make WarpDrive work 
> > > > > well for one
> > > > > process.
> > > > 
> > > > UH ? How ? You want to GUP _every_ single valid address in the process
> > > > and map it to the device ? How do you handle new vm

Re: [PATCH v2] drivers/vfio: Allow type-1 IOMMU instantiation with all ARM/ARM64 IOMMUs

2018-09-13 Thread Robin Murphy

On 13/09/18 14:15, Geert Uytterhoeven wrote:

Currently the type-1 IOMMU instantiation depends on "ARM_SMMU ||
ARM_SMMU_V3", while it applies to other ARM/ARM64 platforms with an
IOMMU (e.g. Renesas VMSA-compatible IPMMUs).

Instead of extending the list of IOMMU types on ARM platforms, replace
the list by "ARM || ARM64", like other architectures do.  The feature is
still restricted to ARM/ARM64 platforms with an IOMMU by the dependency
on IOMMU_API.


Reviewed-by: Robin Murphy 


Signed-off-by: Geert Uytterhoeven 
---
Tested with sata_rcar on Renesas R-Car H3 ES2.0.

This causes a trivial merge conflict with commit c01eaa95ad30897b ("Make
anon_inodes unconditional") in vfs/for-next.

v2:
   - Make the feature just depend on ARM || ARM64, instead of adding yet
 another IPMMU_VMSA dependency, as suggested by Robin Murphy
 .
---
  drivers/vfio/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb59bef..9de5ed38da830a91 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
  menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] drivers/vfio: Allow type-1 IOMMU instantiation with all ARM/ARM64 IOMMUs

2018-09-13 Thread Geert Uytterhoeven
Currently the type-1 IOMMU instantiation depends on "ARM_SMMU ||
ARM_SMMU_V3", while it applies to other ARM/ARM64 platforms with an
IOMMU (e.g. Renesas VMSA-compatible IPMMUs).

Instead of extending the list of IOMMU types on ARM platforms, replace
the list by "ARM || ARM64", like other architectures do.  The feature is
still restricted to ARM/ARM64 platforms with an IOMMU by the dependency
on IOMMU_API.

Signed-off-by: Geert Uytterhoeven 
---
Tested with sata_rcar on Renesas R-Car H3 ES2.0.

This causes a trivial merge conflict with commit c01eaa95ad30897b ("Make
anon_inodes unconditional") in vfs/for-next.

v2:
  - Make the feature just depend on ARM || ARM64, instead of adding yet
another IPMMU_VMSA dependency, as suggested by Robin Murphy
.
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb59bef..9de5ed38da830a91 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.
-- 
2.17.1



Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Geert Uytterhoeven
Hi Robin,

On Thu, Sep 13, 2018 at 12:12 PM Robin Murphy  wrote:
> On 13/09/18 10:00, Geert Uytterhoeven wrote:
> [...]
> > The main issue is that if of_dma_find_controller() fails, a DMA slave driver
> > cannot distinguish between dmac not yet probed successfully, and dmac
> > driver not present.
>
> ...which is in fact the exact same problem that the IOMMU code has -
> might it make sense to give of_dma_request_slave_channel() similar
> (optional?) driver_deferred_probe_check_state() logic, i.e. "if my DMAC
> driver's not shown up by this point, assume it's not built-in and go on
> without it". Of course it is somewhat easier for IOMMU drivers as
> there's zero chance of those popping up as modules later on.

It may solve the issue in some cases, but only if the dmac is reprobed
before the DMA slave driver, which is not guaranteed.

BTW, it seems e.g. i2c and serial suffer from the same problem, and fall
back to PIO. However, these drivers try to obtain the DMA channel when
used, not during probe, so they start using DMA after the dmac has been
probed successfully.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Robin Murphy

On 13/09/18 10:00, Geert Uytterhoeven wrote:
[...]

The main issue is that if of_dma_find_controller() fails, a DMA slave driver
cannot distinguish between dmac not yet probed successfully, and dmac
driver not present.


...which is in fact the exact same problem that the IOMMU code has - 
might it make sense to give of_dma_request_slave_channel() similar 
(optional?) driver_deferred_probe_check_state() logic, i.e. "if my DMAC 
driver's not shown up by this point, assume it's not built-in and go on 
without it". Of course it is somewhat easier for IOMMU drivers as 
there's zero chance of those popping up as modules later on.



However, as we rely more and more on probe deferral, this will cause more
issues in the future.

I guess the proper solution is to take into account dependencies described
in DT before probing devices, i.e. devices with phandles should always be
probed after the targets of these phandles?
Complication:
   - Circular dependencies,
   - Optional phandle targets (dmacs, oops).


Yeah, a proper dependency graph would be ideal, but it's certainly not 
trivial. There was an interesting proposal a couple of years ago of a 
half-way approach where various phandle-chasing routines could actively 
try to probe the target, but I think that got stymied on the fact that a 
single OF node may represent multiple different driver-level providers, 
so it's tricky to tell whether the *right* dependency has been satisfied 
without detailed knowledge of the individual bindings and driver 
implementations. That's probably still an issue for other approaches 
too, sadly.


Robin.


Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Geert Uytterhoeven
Hi Mark,

On Wed, Sep 12, 2018 at 5:51 PM Mark Brown  wrote:
> On Tue, Sep 11, 2018 at 11:43:47AM +0200, Geert Uytterhoeven wrote:
> > So it seems the audio DMAC is deferred a second time, before the iommu 
> > driver
> > probed.
>
> Shouldn't there be at least one more round of deferred probe triggers
> after the IOMMU probes?

My statement above was incorrect. Adding more debug info allowed me to
gain more insight:

A. If CONFIG_IPMMU_VMSA=y, everything is fine:

  1. ipmmu probes,
  2. audio-dmac probes,
  3. audio probes, using DMA

B. If CONFIG_IPMMU_VMSA=n, the sound driver falls back to PIO, which
   Morimoto-san worked around by commit 6c92d5a2744e2761 ("ASoC: rsnd:
   don't fallback to PIO mode when -EPROBE_DEFER")[1]

  1. ipmmu is not probed: driver not enabled => OK
  2. audio-dmac is not probed: iommu not found
  3. audio fails to probe (3 times, hence just ignoring the first one is
 not sufficient):
"of_dma_find_controller: can't find DMA controller
 /soc/dma-controller@ec70"
=> fell back to PIO before [1]
=> -EPROBE_DEFER since [1]
  4. audio-dmac probes
  of_iommu_xlate() calls driver_deferred_probe_check_state(), leading
  to ("ignoring dependency for device, assuming no driver")

  If step 3 returned -EPROBE_DEFER, there's one additional step:
  5. audio reprobes successfully, using DMA.

While step 2 (DMA-capable device not probed because iommu not found)
is correct for devices with a builtin dmac, it causes issues with external
dmacs, as the dmac probe may be postponed after the DMA slave driver
probe.

C. Before commit ac6bbf0cdf4206c5 ("iommu: Remove IOMMU_OF_DECLARE"),
with CONFIG_IPMMU_VMSA=n:

  1. ipmmu is not probed: driver not enabled => OK
  2. audio-dmac probes => OK
  Missing IOMMU ignored, as per
  !ops && !of_iommu_driver_present(iommu_spec->np)
  in of_iommu_xlate(), which is really
  !ops && !of_match_node(&__iommu_of_table, np))
  3. audio probes, using DMA
  Note: if of_dma_find_controller() would have failed (e.g. missing
  dmac driver), the audio driver fell back to PIO here.

As per Morimoto-san's recent reply, audio really needs DMA, so the current
behavior of B.5. is OK. However, this may not be true for other devices
(e.g. SPI, i2c, serial, ...), where the dmac is optional.

The main issue is that if of_dma_find_controller() fails, a DMA slave driver
cannot distinguish between dmac not yet probed successfully, and dmac
driver not present.

However, as we rely more and more on probe deferral, this will cause more
issues in the future.

I guess the proper solution is to take into account dependencies described
in DT before probing devices, i.e. devices with phandles should always be
probed after the targets of these phandles?
Complication:
  - Circular dependencies,
  - Optional phandle targets (dmacs, oops).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: of_dma_request_slave_channel() failed ?

2018-09-13 Thread Geert Uytterhoeven
Hi Morimoto-san,

On Thu, Sep 13, 2018 at 7:48 AM Kuninori Morimoto
 wrote:
> > > -
> > > commit ac6bbf0cdf4206c517ac9789814c23e372ebce4d
> > > Author: Rob Herring 
> > > Date:   Mon Jul 9 09:41:52 2018 -0600
> > >
> > > iommu: Remove IOMMU_OF_DECLARE
> > >
> > > Now that we use the driver core to stop deferred probe for missing
> > > drivers, IOMMU_OF_DECLARE can be removed.
> > >
> > > This is slightly less optimal than having a list of built-in drivers 
> > > in
> > > that we'll now defer probe twice before giving up. This shouldn't 
> > > have a
> > > significant impact on boot times as past discussions about deferred
> > > probe have given no evidence of deferred probe having a substantial
> > > impact.
> > > ...
> > > -
> (snip)
> > I assume you wrote commit 6c92d5a2744e2761 ("ASoC: rsnd: don't fallback
> > to PIO mode when -EPROBE_DEFER") to work around this?
>
> Yes, it is work around for it.
>
> > While this got rid of the error messages, and postpones sound initialization
> > until the audio DMAC is available, it does mean that the driver will _never_
> > fall back to PIO.
> >
> > I.e. if CONFIG_RCAR_DMAC=n, audio will fail to probe instead of falling
> > back to PIO.
>
> If I focus only for sound here, the purpose of PIO mode is
> checking basic HW connection, clock settings, etc.
> Without DMAC support, it can't use many sound features.
> And PIO mode is supporting "SSI" only.
>
> If DT has SRC/CTU/DVC settings for sound,
> it needs DMA support anyway.
>
> &rcar_sound {
> ...
> ports {
> rsnd_port0: port@0 {
> rsnd_endpoint0: endpoint {
> ...
> =>  playback = <&ssi0 &src0 &dvc0>;
> };
> };
> };
> };
>
> Before 6c92d5a2744e2761 patch, driver will forcibly ignore
> non-SSI modules, and try to use PIO mode.
> I'm not sure it is "kindly support" or "overkill support".
>
> After this patch, it needs DMA, otherwise, probe will be failed.
> DT shouldn't have non-SSI modules if you want to use PIO mode.
>
> + /* use PIO mode */
> - playback = <&ssi0 &src0 &dvc0>;
> + playback = <&ssi0>;
>

OK, so falling back to PIO was really a "best effort" fallback, and the user
should really enable DMA?

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Kenneth Lee
On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee 
> > > > > > > > > > > > wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > "copy_from_user" the
> > > > > > > > user memory to the kernel. That is not what we need. What we 
> > > > > > > > try to get is: the
> > > > > > > > user application do something on its data, and push it away to 
> > > > > > > > the accelerator,
> > > > > > > > and says: "I'm tied, it is your turn to do the job...". Then 
> > > > > > > > the accelerator has
> > > > > > > > the memory, referring any portion of it with the same VAs of 
> > > > > > > > the application,
> > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > 
> > > > > > > You were not looking at right place see 
> > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > It does GUP and create GEM object AFAICR you can wrap that GEM 
> > > > > > > object into a
> > > > > > > dma buffer object.
> > > > > > > 
> > > > > > 
> > > > > > Thank you for directing me to this implementation. It is 
> > > > > > interesting:).
> > > > > > 
> > > > > > But it is not yet solve my problem. If I understand it right, the 
> > > > > > userptr in
> > > > > > i915 do the following:
> > > > > > 
> > > > > > 1. The user process sets a user pointer with size to the kernel via 
> > > > > > ioctl.
> > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm for 
> > > > > > further
> > > > > >reference.
> > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the device. 
> > > > > > So the data
> > > > > >can be shared between the user space and the hardware.
> > > > > > 
> > > > > > But my scenario is: 
> > > > > > 
> > > > > > 1. The user process has some data in the user space, pointed by a 
> > > > > > pointer, say
> > > > > >ptr1. And within the memory, there may be some other pointers, 
> > > > > > let's say one
> > > > > >of them is ptr2.
> > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO space. 
> > > > > > And the
> > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > 
> > > > > > Userptr lets the hardware and process share the same memory space. 
> > > > > > But I need
> > > > > > them to share the same *address space*. So IOMMU is a MUST for 
> > > > > > WarpDrive,
> > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > procedure is OK.
> > > > > 
> > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > without non SVA/SVM path.
> > > > > 
> > > > 
> > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > understanding, Share
> > > > Virtual Address/Memory means: any virtual address in a process can be 
> > > > used by
> > > > device at the same time. This requires IOMMU device to support PASID. 
> > > > And
> > > > optionally, it requires the feature of page-fault-from-device.
> > > 
> > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, access
> > > to range that are MMIO map ie CPU page table pointing to IO memory, IIRC
> > > it is undefined what happens on some platform for a device trying to
> > > access those using SVA/SVM.
> > > 
> > > 
> > >

RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-13 Thread Tian, Kevin
> From: Christian König
> Sent: Friday, September 7, 2018 4:56 PM
> 
> 5. It would be nice to have to allocate multiple PASIDs for the same
> process address space.
>          E.g. some teams at AMD want to use a separate GPU address space
> for their userspace client library. I'm still trying to avoid that, but
> it is perfectly possible that we are going to need that.
>          Additional to that it is sometimes quite useful for debugging
> to isolate where exactly an incorrect access (segfault) is coming from.
> 
> Let me know if there are some problems with that, especially I want to
> know if there is pushback on #5 so that I can forward that :)
> 

We have a similar requirement, except that it is "multiple PASIDs for
the same process" instead of "for the same process address space".

Intel VT-d adopts a 'true' system-wide PASID allocation policy,
spanning both host processes and guest processes. As Jacob explains,
there will be a virtual command register on the virtual VT-d, through
which the guest IOMMU driver requests system-wide PASIDs allocated
by the host IOMMU driver.

With that design, Qemu represents all guest processes on the host
side, and thus gets "multiple PASIDs allocated for the same process".
However, instead of binding all PASIDs to Qemu's single host address
space, each PASID entry points to a guest address space when it is
used by a guest process.

Thanks
Kevin

RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-13 Thread Tian, Kevin
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Saturday, September 8, 2018 5:25 AM
> > > iommu-sva expects everywhere that the device has an iommu_domain,
> > > it's the first thing we check on entry. Bypassing all of this would
> > > call idr_alloc() directly, and wouldn't have any code in common
> > > with the current iommu-sva. So it seems like you need a layer on
> > > top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
> > > but I don't think it should be in drivers/iommu/
> >
> > In this case I question if the PASID handling should be under
> > drivers/iommu at all.
> >
> > See I can have a mix of VM context which are bound to processes (some
> > few) and VM contexts which are standalone and doesn't care for a
> > process address space. But for each VM context I need a distinct
> > PASID for the hardware to work.

I'm confused about VM context vs. process. Is VM referring to Virtual
Machine or something else? If so, I don't understand the binding part:
what VM context is bound to a (host?) process?

> >
> > I can live if we say if IOMMU is completely disabled we use a simple
> > ida to allocate them, but when IOMMU is enabled I certainly need a
> > way to reserve a PASID without an associated process.
> >
> VT-d would also have such requirement. There is a virtual command
> register for allocate and free PASID for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate PASID w/o mm.
> 

VT-d is a bit different. On the host side, PASID allocation always
happens in Qemu's context, so those PASIDs are recorded with the Qemu
process, though the entries may point to guest page tables instead of
Qemu's host mm.

Thanks
Kevin