RE: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers

2018-09-14 Thread Tian, Kevin
> From: Jean-Philippe Brucker
> Sent: Friday, September 14, 2018 10:46 PM
> 
> On 13/09/2018 01:35, Tian, Kevin wrote:
> >>> Let's consider it from the API point of view.
> >>>
> >>> We have iommu_a(de)ttach_device() APIs to attach or detach a domain
> >>> to/from a device. We should avoid applying a limitation of "these are
> >>> only for single domain case, for multiple domains, use another API".
> >>
> >> That's an idealized view of the API, the actual semantics aren't as
> >> simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
> >> domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev attaches
> >> an unmanaged domain, detach_dev reattaches the default DMA domain, and
> >> the detach_dev IOMMU op is not called. We can't change that now, so it's
> >> difficult to add more functionality to attach_dev and detach_dev.
> >>
> >
> > Now we have four possible usages on a(de)ttach_device:
> >
> > 1) Normal DMA API path i.e. IOMMU_DOMAIN_DMA, for DMA operations
> > in host/parent device driver;
> >
> > 2) VFIO passthru path, when the physical device is assigned to user space
> > or guest driver
> >
> > 3) mdev passthru path 1, when mdev is assigned to user space or guest
> > driver. Here mdev is a wrap on a random PCI device
> >
> > 4) mdev passthru path 2, when mdev is assigned to user space or guest
> > driver. Here mdev is at a smaller granularity (e.g. tagged by PASID) as
> > supported by vt-d scalable mode
> >
> > 1) and 2) are existing usages. What you described (unmanaged vs. default)
> > falls into this category.
> >
> > 3) is actually the same as 2) at the IOMMU layer: sort of passing through
> > the DMA capability to the guest. Though there is still a parent driver,
> > the parent driver itself should not do DMA - i.e. unmanaged on the host
> > side.
> >
> > 4) is a new code path introduced in the IOMMU layer, which is what
> > vfio_a(de)tach_aux_domain is trying to address. In this case the parent
> > device can still have its own DMA capability, using existing code path 1)
> > (thus the default domain still applies), and the parent device can have
> > multiple aux domains (and associated structures), using code path 4).
> 
> We still have the problem that detach() doesn't propagate to the IOMMU
> driver. Here's the current flow, without mdev:
> 
> 1) At boot, the PF's (parent device) driver is probed, and the IOMMU
> core sets up a default DMA domain, to be used by dma_alloc by the PF's
> driver.
> -> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)
> 
> 2) or 3) Later userspace wants to control the PF, replaces the PF's
> driver with vfio-pci. When userspace creates a container, VFIO allocates
> a new IOMMU domain, and calls iommu_attach_group.
> -> iommu.c calls domain->ops->attach_dev(domain, dev)
> This detaches the PF from the default domain, and attaches it to the new
> domain.
> 
> 1) When the container is closed, VFIO calls iommu_detach_group. This
> detaches the PF from its current domain, and attaches it back to the
> default domain.
> -> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)
> 
> -
> Now with mdev, we still attach the DMA domain in 1). Then:
> 
> 4) Userspace opens an mdev and creates a container. VFIO enables aux
> domain for the device. VFIO allocates a new IOMMU domain, and calls
> iommu_attach_device(domain1, parent_dev).
> -> iommu.c calls domain->ops->attach_dev(domain1, dev)
> Because the device is in "aux domain" state, the IOMMU driver does not
> detach from the default domain, but instead allocates a PASID and
> attaches the aux domain. (Side note: for SMMU we couldn't detach from
> the default domain, because we need it for MSI mappings.)

Same for VT-d. We don't require the parent driver to detach its domain
in 1). The parent driver can have its own DMA capability while aux
domains are enabled in parallel.

> 
> 4) Userspace opens another mdev.
> -> iommu.c calls domain->ops->attach_dev(domain2, dev)

Another mdev in the same VFIO container or a different one? I assume the
latter, since you mentioned a new domain2.

> 
> 1)? When the container is closed, VFIO calls
> iommu_detach_device(domain2, parent_dev)
> -> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)
> Given that auxiliary domains are attached, the IOMMU driver could deduce
> that this actually means "detach an auxiliary domain". But which one?

I didn't get this one. There is no need to stick to the 1) behavior for
4), i.e. the below is expected:
domain2->ops->detach_dev(domain2, dev)

Why can't ARM implement a detach_dev for aux domains too? My feeling is
that the default-domain twist is conceptually only for switching between
1/2/3.

> 
> So the proposed interface doesn't seem to work as is. If we want to use
> iommu_attach/detach_device for auxiliary domains, the existing behavior
> of iommu.c, and IOMMU drivers that rely on it, have to change. Any
> change I can think of right now seems more daunting than introducing a
> different path for auxiliary domains, like iommu_attach_aux_domain for
> example.

Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-14 Thread Jacob Pan
On Thu, 13 Sep 2018 16:03:01 +0100
Jean-Philippe Brucker  wrote:

> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer granularity
> >>> second level translation with no impact on architectures which
> >>> only support Source ID or the similar granularity.  
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> Nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use-cases of IOMMU aware mediated device
> >> (assigning mdevs to userspace drivers, for example)  
> > 
> > yes. aux domain concept applies only to vt-d rev3 which introduces
> > scalable mode. Care is taken to avoid breaking usages on existing
> > architectures.
> > 
> > one note. Assigning mdevs to user space alone doesn't imply IOMMU
> > aware. All existing mdev usages use software or proprietary methods
> > to isolate DMA. There is only one potential IOMMU-aware mdev usage
> > we talked about which does not rely on vt-d rev3 scalable mode - wrap
> > a random PCI device into a single mdev instance (no sharing). In that
> > case the mdev inherits the RID from the parent PCI device, and is thus
> > isolated by the IOMMU at RID granularity. Our RFC supports this usage
> > too. In VFIO the two usages (PASID-based and RID-based) use the same
> > code path, i.e. always binding the domain to the parent device of the
> > mdev. But within the IOMMU they go down different paths. PASID-based
> > will go to the aux domain, as iommu_enable_aux_domain has been called
> > on that device. RID-based will follow the existing unmanaged-domain
> > path, as if it were parent device assignment.
> 
> For Arm SMMU we're more interested in the PASID-granular case than the
> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
> mode, the following example can be implemented with an SMMUv3, since
> it only needs PASID-granular first-level translation:
> 
> We have a PCI function that supports PASID, and can be partitioned
> into multiple isolated entities, mdevs. Each mdev has an MMIO frame,
> an MSI vector and a PASID.
> 
> Different processes (userspace drivers, not QEMU) each open one mdev.
> A process controlling one mdev has two ways of doing DMA:
> 
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
> creates an auxiliary domain for the mdev, with PASID #35. The process
> creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls iommu_map on
> the auxiliary domain. The IOMMU driver populates the pgtables
> associated with PASID #35.
> 
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process
> address space to the device, gets PASID #35. Simpler, but not
> everyone wants to use SVA, especially not userspace drivers which
> need the highest performance.
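
A minimal userspace sketch of flow (1) above, assuming the mdev's VFIO
group has already been added to the container (illustrative values only,
not part of the quoted proposal):

	int container = open("/dev/vfio/vfio", O_RDWR);

	/* Select the type1 v2 IOMMU backend for this container */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

	/* Map a process buffer at a fixed IOVA for the mdev to use */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)(uintptr_t)buf,	/* process virtual address */
		.iova  = 0x100000,		/* address the device will use */
		.size  = map_size,
	};
	ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

In the auxiliary-domain case, VFIO would then call iommu_map on the mdev's
aux domain rather than on the parent device's domain, as described above.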
> 
> 
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the
> guest.
There is a difference in the case of guest SVA. VT-d v3 will bind the
guest PASID and guest CR3 instead of the guest PASID table, then turn on
nesting. In the case of mdev, the second level is obtained from the aux
domain which was set up for the default PASID. Or in the case of a PCI
device, the second level is harvested from RID2PASID.

> So (2) would use iommu_sva_bind_device(), 
We would need something different than that for guest bind, just to show
the two cases:

int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
			  int *pasid, unsigned long flags, void *drvdata)

(WIP)
int sva_bind_gpasid(struct device *dev, struct gpasid_bind_data *data)
where:
/**
 * struct gpasid_bind_data - Information about device and guest PASID binding
 * @pasid:  Process address space ID used for the guest mm
 * @addr_width: Guest address width. Paging mode can also be derived.
 * @gcr3:   Guest CR3 value from guest mm
 */
struct gpasid_bind_data {
__u32 pasid;
__u64 gcr3;
__u32 addr_width;
__u32 flags;
#define IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor request */
};
Perhaps there is room to merge with io_mm, but the life-cycle management
of guest PASID and host PASID will be different if you rely on the mm
release callback rather than the FD.
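
To make the second interface concrete, a hypothetical caller (e.g. a
vIOMMU backend handling a guest bind request) might use the WIP
sva_bind_gpasid() above roughly as follows; purely illustrative, based on
the sketched signature, and not part of the RFC:

	static int bind_guest_pasid(struct device *dev, u32 guest_pasid,
				    u64 guest_cr3)
	{
		struct gpasid_bind_data data = {
			.pasid      = guest_pasid, /* PASID used for the guest mm */
			.gcr3       = guest_cr3,   /* guest first-level table root */
			.addr_width = 48,
			.flags      = 0,
		};

		return sva_bind_gpasid(dev, &data);
	}

The host would then plug the guest's first level under the second level
it already maintains (the aux domain's, or RID2PASID's) and enable
nesting, as described above.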

> but (1) needs something
> else. Aren't auxiliary domains suitable for (1)? Why limit auxiliary
> domain to second-level or nested translation? It seems silly to use a
> different API for first-level, since the flow in userspace and VFIO
> is the same as your second-level case as far as MAP_DMA ioctl goes.
> The difference is that in your case the auxiliary domain supports an
> additional operation which binds first-level page tables. An
> auxiliary domain that only supports first-level wouldn't support this
> operation, but it can still implement iommu_map/unmap/etc.

Re: [RFC PATCH 1/3] ARM: dma-mapping: update comment about handling dma_ops when detaching from IOMMU

2018-09-14 Thread Wolfram Sang
On Fri, Sep 14, 2018 at 02:15:01PM +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 13, 2018 at 5:18 PM Wolfram Sang
>  wrote:
> > Update the comment because we don't set the pointer to NULL anymore.
> > Also use the correct pointer name 'dma_ops' instead of 'dma_map_ops'.
> >
> > Fixes: 1874619a7df4 ("ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()")
> > Signed-off-by: Wolfram Sang 
> 
> Nice catch!
> 
> Reviewed-by: Geert Uytterhoeven 

Thanks, so I sent this separately via ALKML now.




Re: [RFC PATCH 3/3] dma-mapping: clear dma_parms on teardown, too

2018-09-14 Thread Wolfram Sang
On Fri, Sep 14, 2018 at 02:23:51PM +0100, Robin Murphy wrote:
> On 13/09/18 16:17, Wolfram Sang wrote:
> > While sanitizig the pointer for dma_ops on teardown, do the same for
> > dma_parms, too. Rename the function to have a more generic name.
> 
> Upon closer inspection, it looks like there are some cases (at least PCI and
> MFD) where dma_parms is installed by the parent/bus at device creation, and
> therefore remains valid and *would* be expected to persist across the child
> device's driver unbinding and rebinding - seems this is more complex than I
> first thought, sorry.

No problem. I am thankful I understand more details now. So, the life
cycle of dma_parms is truly that of the device. Which means that drivers
using devm_kzalloc in their probe should ideally clear the pointer on
unbind. Otherwise, it is not possible to say whether a non-NULL dma_parms
is intended or dangling. Which makes my initial dmam_set_dma_parms()
approach look not too bad, I'd think?

However, I don't want to push hard with this one. If you think this is
too academic, I'll just leave it for now...
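
For reference, such a managed helper could be as simple as the following
devres-based sketch (named after the earlier proposal; an illustration of
the idea, not the actual patch):

	static void dmam_dma_parms_release(struct device *dev, void *res)
	{
		dev->dma_parms = NULL;	/* cleared automatically on unbind */
	}

	int dmam_set_dma_parms(struct device *dev,
			       struct device_dma_parameters *parms)
	{
		void *res;

		res = devres_alloc(dmam_dma_parms_release, 0, GFP_KERNEL);
		if (!res)
			return -ENOMEM;

		dev->dma_parms = parms;
		devres_add(dev, res);
		return 0;
	}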




Re: [PATCH, for-4.19] dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration

2018-09-14 Thread Christoph Hellwig
On Fri, Sep 14, 2018 at 04:44:15PM +0100, Robin Murphy wrote:
> On 14/09/18 11:08, Christoph Hellwig wrote:
>> Any chance to get a review for this?
>
> So without this, the select does nothing, 
> CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL is never defined, and BMIPS gets the 
> static inline stub and never flushes the RAC when it should?

Exactly!


Re: [PATCH, for-4.19] dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration

2018-09-14 Thread Robin Murphy

On 14/09/18 11:08, Christoph Hellwig wrote:

Any chance to get a review for this?


So without this, the select does nothing, 
CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL is never defined, and BMIPS gets 
the static inline stub and never flushes the RAC when it should? I don't 
know enough MIPS to consider even compile-testing to check it, but that 
sounds like a legitimate fix to me.


Robin.


On Tue, Sep 11, 2018 at 11:00:49AM +0200, Christoph Hellwig wrote:

The patch adding the infrastructure failed to actually add the symbol
declaration, oops..

Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook")
Signed-off-by: Christoph Hellwig 
---
  kernel/dma/Kconfig | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9bd54304446f..1b1d63b3634b 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -23,6 +23,9 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU
bool
select NEED_DMA_MAP_STATE
  
+config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
+   bool
+
  config DMA_DIRECT_OPS
bool
depends on HAS_DMA
--
2.18.0


---end quoted text---




Re: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers

2018-09-14 Thread Jean-Philippe Brucker
On 13/09/2018 01:35, Tian, Kevin wrote:
>>> Let's consider it from the API point of view.
>>>
>>> We have iommu_a(de)ttach_device() APIs to attach or detach a domain
>>> to/from a device. We should avoid applying a limitation of "these are
>>> only for single domain case, for multiple domains, use another API".
>>
>> That's an idealized view of the API, the actual semantics aren't as
>> simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
>> domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev
>> attaches
>> an unmanaged domain, detach_dev reattaches the default DMA domain,
>> and
>> the detach_dev IOMMU op is not called. We can't change that now, so it's
>> difficult to add more functionality to attach_dev and detach_dev.
>>
> 
> Now we have four possible usages on a(de)ttach_device:
> 
> 1) Normal DMA API path i.e. IOMMU_DOMAIN_DMA, for DMA operations
> in host/parent device driver;
> 
> 2) VFIO passthru path, when the physical device is assigned to user space
> or guest driver
> 
> 3) mdev passthru path 1, when mdev is assigned to user space or guest
> driver. Here mdev is a wrap on a random PCI device
> 
> 4) mdev passthru path 2, when mdev is assigned to user space or guest
> driver. Here mdev is at a smaller granularity (e.g. tagged by PASID) as
> supported by vt-d scalable mode
> 
> 1) and 2) are existing usages. What you described (unmanaged vs. default)
> falls into this category.
> 
> 3) is actually the same as 2) at the IOMMU layer: sort of passing through
> the DMA capability to the guest. Though there is still a parent driver,
> the parent driver itself should not do DMA - i.e. unmanaged on the host
> side.
> 
> 4) is a new code path introduced in the IOMMU layer, which is what
> vfio_a(de)tach_aux_domain is trying to address. In this case the parent
> device can still have its own DMA capability, using existing code path 1)
> (thus the default domain still applies), and the parent device can have
> multiple aux domains (and associated structures), using code path 4).

We still have the problem that detach() doesn't propagate to the IOMMU
driver. Here's the current flow, without mdev:

1) At boot, the PF's (parent device) driver is probed, and the IOMMU
core sets up a default DMA domain, to be used by dma_alloc by the PF's
driver.
-> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)

2) or 3) Later userspace wants to control the PF, replaces the PF's
driver with vfio-pci. When userspace creates a container, VFIO allocates
a new IOMMU domain, and calls iommu_attach_group.
-> iommu.c calls domain->ops->attach_dev(domain, dev)
This detaches the PF from the default domain, and attaches it to the new
domain.

1) When the container is closed, VFIO calls iommu_detach_group. This
detaches the PF from its current domain, and attaches it back to the
default domain.
-> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)
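
In code, the flow above boils down to roughly the following sequence on
the VFIO side (a simplified sketch of the existing type1 behavior; error
handling omitted):

	/* Container creation: allocate an unmanaged domain and attach.
	 * iommu.c implicitly moves the group off its default DMA domain. */
	domain = iommu_domain_alloc(bus);
	iommu_attach_group(domain, group);

	/* ... userspace maps and unmaps via VFIO_IOMMU_MAP_DMA ... */

	/* Container close: iommu.c reattaches the default domain */
	iommu_detach_group(domain, group);
	iommu_domain_free(domain);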

-
Now with mdev, we still attach the DMA domain in 1). Then:

4) Userspace opens an mdev and creates a container. VFIO enables aux
domain for the device. VFIO allocates a new IOMMU domain, and calls
iommu_attach_device(domain1, parent_dev).
-> iommu.c calls domain->ops->attach_dev(domain1, dev)
Because the device is in "aux domain" state, the IOMMU driver does not
detach from the default domain, but instead allocates a PASID and
attaches the aux domain. (Side note: for SMMU we couldn't detach from
the default domain, because we need it for MSI mappings.)

4) Userspace opens another mdev.
-> iommu.c calls domain->ops->attach_dev(domain2, dev)

1)? When the container is closed, VFIO calls
iommu_detach_device(domain2, parent_dev)
-> iommu.c calls default_domain->ops->attach_dev(default_domain, dev)
Given that auxiliary domains are attached, the IOMMU driver could deduce
that this actually means "detach an auxiliary domain". But which one?

So the proposed interface doesn't seem to work as is. If we want to use
iommu_attach/detach_device for auxiliary domains, the existing behavior
of iommu.c, and IOMMU drivers that rely on it, have to change. Any
change I can think of right now seems more daunting than introducing a
different path for auxiliary domains, like iommu_attach_aux_domain for
example.
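
For illustration, such a separate path could be as small as this (the
names are hypothetical, not an API proposed by the series):

	int iommu_attach_aux_domain(struct iommu_domain *domain,
				    struct device *dev);
	void iommu_detach_aux_domain(struct iommu_domain *domain,
				     struct device *dev);

Because the detach variant takes the domain explicitly, the IOMMU driver
knows exactly which auxiliary domain (and therefore which PASID) to tear
down, instead of having to deduce it as described above.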

Thanks,
Jean


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-14 Thread Jean-Philippe Brucker
>> This example only needs to modify first-level translation, and works
>> with SMMUv3. The kernel here could be the host, in which case
>> second-level translation is disabled in the SMMU, or it could be the
>> guest, in which case second-level mappings are created by QEMU and
>> first-level translation is managed by assigning PASID tables to the guest.
> 
> The former, yes, applies to the aux domain concept. The latter doesn't -
> you have only one second level per device. A whole PASID table managed
> by the guest means you assign the whole device to the guest, which is
> not the concept of aux domain here.

Right, in the latter case, the host uses a "normal" domain to assign the
whole PCI function to the guest. But the guest can still use auxiliary
domains like in my example, to sub-assign the PCI function to different
guest userspace applications.

>> So (2) would use iommu_sva_bind_device(), but (1) needs something else.
>> Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
>> second-level or nested translation? It seems silly to use a different
>> API for first-level, since the flow in userspace and VFIO is the same as
>> your second-level case as far as MAP_DMA ioctl goes. The difference is
>> that in your case the auxiliary domain supports an additional operation
>> which binds first-level page tables. An auxiliary domain that only
>> supports first-level wouldn't support this operation, but it can still
>> implement iommu_map/unmap/etc.
> 
> Thanks for correcting me on this. You are right that aux domain shouldn't
> impose such a limitation on second-level or nested only. We define an aux
> domain as a normal domain (aux takes effect only when attaching to a
> device), thus it should support all capabilities possible on a normal
> domain.
> 
> btw I'm not sure whether you looked at my comment on patch 8/10. I
> explained the rationale why aux domain doesn't interfere with the existing
> default domain usage; on quick thought the above example might not make a
> difference, but I need your confirmation here. :-)

Yes sorry, I didn't have time to answer, will do it now

Thanks,
Jean


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-14 Thread Jean-Philippe Brucker
On 13/09/2018 17:55, Raj, Ashok wrote:
>> For Arm SMMU we're more interested in the PASID-granular case than the
>> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
>> mode, the following example can be implemented with an SMMUv3, since it
>> only needs PASID-granular first-level translation:
> 
> You are right, you can simply use the first level as IOVA for every PASID.
> 
> Only issue becomes when you need to assign that to a guest, you would be 
> required
> to shadow the 1st level. If you have a 2nd level per-pasid first level can
> be managed in guest and don't require to shadow them.

Right, for us assigning a PASID-granular mdev to a guest requires shadowing.

>> Another note: if for some reason you did want to allow userspace to
>> choose between first-level or second-level, you could implement the
>> VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
>> but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So DMA_MAP
>> ioctl on a NESTING container would populate second-level, and DMA_MAP on
>> a normal container populates first-level. But if you're always going to
>> use second-level by default, the distinction isn't necessary.
> 
> Where is the nesting attribute specified? in vt-d2 it was part of context
> entry, so also meant all PASID's are nested now. In vt-d3 its part of
> PASID context.

I don't think the nesting attribute is described in details anywhere.
The SMMU drivers use it to know if they should create first- or
second-level mappings. At the moment QEMU always uses
VFIO_TYPE1v2_IOMMU, but Eric Auger is proposing a patch that adds
VFIO_TYPE1_NESTING_IOMMU to QEMU:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg559820.html

> It seems unsafe to share PASIDs with different VMs since any request
> w/o PASID has only one mapping.

Which case are you talking about? It might be more confusing than
helpful, but here's my understanding of what we can assign to a guest:

               |  no vIOMMU  | vIOMMU no PASID  | vIOMMU with PASID
 --------------+-------------+------------------+-------------------
  VF           |     ok      |  shadow or nest  |  nest
  mdev, SMMUv3 |     ok      |  shadow          |  shadow + PV (?)
  mdev, vt-d3  |     ok      |  nest            |  nest + PV

The first line, assigning a PCI VF to a guest is the "basic" vfio-pci
case. Currently in QEMU it works by shadowing first-level translation.
We still have to upstream nested translation for that case. Vt-d2 didn't
support nested without PASID, vt-d3 offers RID_PASID for this. On SMMUv3
the PASID table is assigned to the guest, whereas on vt-d3 the host
manages the PASID table and individual page tables are assigned to the
guest.

Assigning an mdev (here I'm talking about the PASID-granular partition
of a VF, not the whole RID-granular VF wrapped by an mdev) could be done
by shadowing first-level translation on SMMUv3. It cannot do nested
since the VF has a single set of second-level page tables, which cannot
be used when mdevs are assigned to different VMs. Vt-d3 has one set of
second-level page tables per PASID, so it can do nested.

Since the parent device has a single PASID space, allowing the guest to
use multiple PASIDs for one mdev requires paravirtual allocation of
PASIDs (last column). Vt-d3 uses the Virtual Command Registers for that.
I assume that it is safe because the host is in charge of programming
PASIDs in the parent device, so the guest couldn't use a PASID allocated
to another mdev, but I don't know what the device's programming model
would look like. Anyway I don't think guest PASID is tackled by this
series (right?) and I don't intend to work on it for SMMUv3 (shadowing
stage-1 for vSVA seems like a bad idea...)

Does this seem accurate?

Thanks,
Jean


[PATCH v7 6/6] iommu/arm-smmu: Support non-strict mode

2018-09-14 Thread Robin Murphy
All we need is to wire up .flush_iotlb_all properly and implement the
domain attribute, and iommu-dma and io-pgtable-arm will do the rest for
us. Rather than bother implementing it for v7s format for the highly
unlikely chance of that being relevant, we can simply hide the
non-strict flag from io-pgtable for that combination just so anyone who
does actually try it will simply get over-invalidation instead of
failure to initialise domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 40 +---
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fd1b80ef9490..aa5be334753b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -246,6 +246,7 @@ struct arm_smmu_domain {
const struct iommu_gather_ops   *tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
+   boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
struct iommu_domain domain;
@@ -863,6 +864,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH32_S)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
smmu_domain->smmu = smmu;
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops) {
@@ -1252,6 +1256,14 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->tlb_ops)
+   smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -1470,13 +1482,17 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return -EINVAL;
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
default:
return -ENODEV;
}
@@ -1488,13 +1504,15 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(&smmu_domain->init_mutex);
 
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
if (smmu_domain->smmu) {
ret = -EPERM;
goto out_unlock;
@@ -1505,6 +1523,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
else
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+   break;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   smmu_domain->non_strict = *(int *)data;
break;
default:
ret = -ENODEV;
@@ -1562,7 +1588,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
-   .flush_iotlb_all= arm_smmu_iotlb_sync,
+   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
-- 
2.19.0.dirty



[PATCH v7 5/6] iommu/arm-smmu-v3: Add support for non-strict mode

2018-09-14 Thread Robin Murphy
From: Zhen Lei 

Dynamically choose strict or non-strict mode for page table config based
on the iommu domain type.

Signed-off-by: Zhen Lei 
[rm: convert to domain attribute]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f10c852479fc..7bbfa5f7ce8e 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -612,6 +612,7 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
 
struct io_pgtable_ops   *pgtbl_ops;
+   boolnon_strict;
 
enum arm_smmu_domain_stage  stage;
union {
@@ -1633,6 +1634,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
if (smmu->features & ARM_SMMU_FEAT_COHERENCY)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -1934,13 +1938,17 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return -EINVAL;
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
default:
return -ENODEV;
}
@@ -1952,13 +1960,15 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(&smmu_domain->init_mutex);
 
switch (attr) {
case DOMAIN_ATTR_NESTING:
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
if (smmu_domain->smmu) {
ret = -EPERM;
goto out_unlock;
@@ -1969,6 +1979,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
else
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+   break;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   smmu_domain->non_strict = *(int *)data;
break;
default:
ret = -ENODEV;
-- 
2.19.0.dirty



[PATCH v7 4/6] iommu: Add bootup option "iommu.non_strict"

2018-09-14 Thread Robin Murphy
From: Zhen Lei 

Add a bootup option so that the system manager can choose which mode to
use. The default mode is strict.

Signed-off-by: Zhen Lei 
[rm: move handling out of SMMUv3 driver]
Signed-off-by: Robin Murphy 
---
 .../admin-guide/kernel-parameters.txt | 13 ++
 drivers/iommu/iommu.c | 26 +++
 2 files changed, 39 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9871e649ffef..406b91759b62 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1749,6 +1749,19 @@
nobypass[PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
 
+   iommu.non_strict=   [ARM64]
+   Format: { "0" | "1" }
+   0 - strict mode, default.
+   Release IOVAs after the related TLBs are invalid
+   completely.
+   1 - non-strict mode.
+   Put off TLBs invalidation and release memory first.
+   It's good for scatter-gather performance but lacks
+   full isolation, an untrusted device can access the
+   reused memory because the TLBs may still valid.
+   Please take full consideration before choosing this
+   mode. Note that, VFIO will always use strict mode.
+
iommu.passthrough=
[ARM64] Configure DMA to bypass the IOMMU by default.
Format: { "0" | "1" }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..2cabd0c0a4f3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -41,6 +41,7 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
+static bool iommu_dma_non_strict __read_mostly;
 
 struct iommu_callback_data {
const struct iommu_ops *ops;
@@ -131,6 +132,24 @@ static int __init iommu_set_def_domain_type(char *str)
 }
 early_param("iommu.passthrough", iommu_set_def_domain_type);
 
+static int __init iommu_dma_setup(char *str)
+{
+   int ret;
+
+   ret = kstrtobool(str, &iommu_dma_non_strict);
+   if (ret)
+   return ret;
+
+   if (iommu_dma_non_strict) {
+   pr_warn("WARNING: iommu non-strict mode is chosen.\n"
+   "It's good for scatter-gather performance but lacks 
full isolation\n");
+   add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+   }
+
+   return 0;
+}
+early_param("iommu.non_strict", iommu_dma_setup);
+
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 struct attribute *__attr, char *buf)
 {
@@ -1072,6 +1091,13 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
group->default_domain = dom;
if (!group->domain)
group->domain = dom;
+
+   if (dom && iommu_dma_non_strict) {
+   int attr = 1;
+   iommu_domain_set_attr(dom,
+ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ &attr);
+   }
}
 
ret = iommu_group_add_device(group, dev);
-- 
2.19.0.dirty



[PATCH v7 3/6] iommu/io-pgtable-arm: Add support for non-strict mode

2018-09-14 Thread Robin Murphy
From: Zhen Lei 

To support non-strict mode, now we only TLBI and sync for strict mode,
except for non-leaf invalidations since page table updates themselves
must always be synchronous.

To save having to reason about it too much, make sure the invalidation
in arm_lpae_split_blk_unmap() just performs its own unconditional sync
to minimise the window in which we're technically violating the break-
before-make requirement on a live mapping. This might work out redundant
with an outer-level sync for strict unmaps, but we'll never be splitting
blocks on a DMA fastpath anyway.

Signed-off-by: Zhen Lei 
[rm: tweak comment, commit message, and split_blk_unmap logic]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/io-pgtable-arm.c | 9 ++---
 drivers/iommu/io-pgtable.h | 5 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 2f79efd16a05..5b915aab7fd3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -576,6 +576,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
tablep = iopte_deref(pte, data);
} else if (unmap_idx >= 0) {
io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+   io_pgtable_tlb_sync(&data->iop);
return size;
}
 
@@ -609,7 +610,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
io_pgtable_tlb_sync(iop);
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
-   } else {
+   } else if (!(iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT)) {
io_pgtable_tlb_add_flush(iop, iova, size, size, true);
}
 
@@ -771,7 +772,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
u64 reg;
struct arm_lpae_io_pgtable *data;
 
-   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -863,7 +865,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
struct arm_lpae_io_pgtable *data;
 
/* The NS quirk doesn't apply at stage 2 */
-   if (cfg->quirks & ~IO_PGTABLE_QUIRK_NO_DMA)
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df79093cad9..47d5ae559329 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -71,12 +71,17 @@ struct io_pgtable_cfg {
 *  be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
 *  software-emulated IOMMU), such that pagetable updates need not
 *  be treated as explicit DMA data.
+*
+* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
+*  on unmap, for DMA domains using the flush queue mechanism for
+*  delayed invalidation.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
#define IO_PGTABLE_QUIRK_TLBI_ON_MAP    BIT(2)
#define IO_PGTABLE_QUIRK_ARM_MTK_4GB    BIT(3)
#define IO_PGTABLE_QUIRK_NO_DMA BIT(4)
+   #define IO_PGTABLE_QUIRK_NON_STRICT BIT(5)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.0.dirty



[PATCH v7 0/6] Add non-strict mode support for iommu-dma

2018-09-14 Thread Robin Murphy
Hi all,

Since we'd like to get this polished up and merged and Leizhen has other
commitments, here's v7 of the previous series[1] wherein I address all
my own feedback :) This is a quick tweak of the v6 I sent yesterday
since I figured out slightly too late a much neater way of setting the
attribute at the appropriate time.

The principal change is that I've inverted things slightly such that
it's now a generic domain attribute controlled by iommu-dma given the
necessary support from individual IOMMU drivers. That way we can easily
enable other drivers straight away, as I've done for SMMUv2 here (which
also allowed me to give it a quick test with MMU-401s on a Juno board).
Otherwise it's really just cosmetic cleanup and rebasing onto Will's
pending SMMU queue.

Robin.

[1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25150.html


Robin Murphy (1):
  iommu/arm-smmu: Support non-strict mode

Zhen Lei (5):
  iommu/arm-smmu-v3: Implement flush_iotlb_all hook
  iommu/dma: Add support for non-strict mode
  iommu/io-pgtable-arm: Add support for non-strict mode
  iommu: Add bootup option "iommu.non_strict"
  iommu/arm-smmu-v3: Add support for non-strict mode

 .../admin-guide/kernel-parameters.txt | 13 ++
 drivers/iommu/arm-smmu-v3.c   | 40 +++
 drivers/iommu/arm-smmu.c  | 40 +++
 drivers/iommu/dma-iommu.c | 29 +-
 drivers/iommu/io-pgtable-arm.c|  9 +++--
 drivers/iommu/io-pgtable.h|  5 +++
 drivers/iommu/iommu.c | 26 
 include/linux/iommu.h |  1 +
 8 files changed, 145 insertions(+), 18 deletions(-)

-- 
2.19.0.dirty



[PATCH v7 1/6] iommu/arm-smmu-v3: Implement flush_iotlb_all hook

2018-09-14 Thread Robin Murphy
From: Zhen Lei 

.flush_iotlb_all is currently stubbed to arm_smmu_iotlb_sync() since the
only time it would ever need to actually do anything is for callers
doing their own explicit batching, e.g.:

iommu_unmap_fast(domain, ...);
iommu_unmap_fast(domain, ...);
iommu_iotlb_flush_all(domain, ...);

where since io-pgtable still issues the TLBI commands implicitly in the
unmap instead of implementing .iotlb_range_add, the "flush" only needs
to ensure completion of those already-in-flight invalidations.

However, we're about to start using it in anger with flush queues, so
let's get a proper implementation wired up.

Signed-off-by: Zhen Lei 
Reviewed-by: Robin Murphy 
[rm: expand commit message]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e395f1ff3f81..f10c852479fc 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1781,6 +1781,14 @@ arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->smmu)
+   arm_smmu_tlb_inv_context(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -2008,7 +2016,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
-   .flush_iotlb_all= arm_smmu_iotlb_sync,
+   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
-- 
2.19.0.dirty



[PATCH v7 2/6] iommu/dma: Add support for non-strict mode

2018-09-14 Thread Robin Murphy
From: Zhen Lei 

1. Save the related domain pointer in struct iommu_dma_cookie, and make the
   iovad capable of calling domain->ops->flush_iotlb_all to flush the TLB.
2. During the iommu domain initialization phase, check whether non-strict
   mode is enabled for the domain. If so, call init_iova_flush_queue to
   register the iovad->flush_cb callback.
3. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
   --> iommu_dma_free_iova. If the domain is non-strict, call queue_iova to
   put off the iova freeing, and omit the iommu_tlb_sync operation.

Signed-off-by: Zhen Lei 
[rm: convert raw boolean to domain attribute]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 29 -
 include/linux/iommu.h |  1 +
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..092e6926dc3c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,6 +55,9 @@ struct iommu_dma_cookie {
};
struct list_headmsi_page_list;
spinlock_t  msi_lock;
+
+   /* Only be assigned in non-strict mode, otherwise it's NULL */
+   struct iommu_domain *domain;
 };
 
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
@@ -257,6 +260,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
 }
 
+static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
+{
+   struct iommu_dma_cookie *cookie;
+   struct iommu_domain *domain;
+
+   cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+   domain = cookie->domain;
+
+   domain->ops->flush_iotlb_all(domain);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -275,6 +289,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;
+   int attr = 1;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
return -EINVAL;
@@ -308,6 +323,13 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
+
+   if (!iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   &attr) && attr) {
+   cookie->domain = domain;
+   init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
+   }
+
if (!dev)
return 0;
 
@@ -393,6 +415,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
/* The MSI case is only ever cleaning up its most recent allocation */
if (cookie->type == IOMMU_DMA_MSI_COOKIE)
cookie->msi_iova -= size;
+   else if (cookie->domain)        /* non-strict mode */
+   queue_iova(iovad, iova_pfn(iovad, iova),
+   size >> iova_shift(iovad), 0);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
@@ -408,7 +433,9 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
 
-   WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
+   if (!cookie->domain)
+   iommu_tlb_sync(domain);
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..decabe8e8dbe 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMU_ENABLE,
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
+   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.19.0.dirty



Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-14 Thread Jerome Glisse
On Fri, Sep 14, 2018 at 06:50:55AM +, Tian, Kevin wrote:
> > From: Jerome Glisse
> > Sent: Thursday, September 13, 2018 10:52 PM
> >
> [...]
> > AFAIK, on x86 and PPC at least, all PCIe devices are in the same group
> > by default at boot, or at least all devices behind the same bridge.
> 
> the group thing reflects a physical hierarchy limitation, not changed
> across boots. Please note the iommu group defines the minimal isolation
> boundary - all devices within the same group must be attached to the
> same iommu domain or address space, because physically the IOMMU
> cannot differentiate DMAs coming out of those devices. Devices behind
> a legacy PCI-X bridge are one example. Other examples include devices
> behind a PCIe switch port which doesn't support ACS and thus cannot
> route p2p transactions to the IOMMU. If talking about typical PCIe
> endpoints (with upstream ports all supporting ACS), you'll get
> one device per group.
> 
> One iommu group today is attached to only one iommu domain.
> In the future one group may attach to multiple domains, as the
> aux domain concept being discussed in another thread.

Thanks for the info.

> 
> > 
> > Maybe there are kernel options to avoid that, and the userspace init
> > program can definitely re-arrange that based on sysadmin policy.
> 
> I don't think there is such an option, as it may break the isolation
> model enabled by the IOMMU.
> 
> [...]
> > > > That is why i am being pedantic :) on making sure there is good reasons
> > > > to do what you do inside VFIO. I do believe that we want a common
> > frame-
> > > > work like the one you are proposing but i do not believe it should be
> > > > part of VFIO given the baggages it comes with and that are not relevant
> > > > to the use cases for this kind of devices.
> > >
> 
> The purpose of VFIO is clear - the kernel portal for granting generic
> device resources (mmio, irq, etc.) to user space. VFIO doesn't care
> what exactly a resource is used for (queue, cmd reg, etc.). If really
> pursuing the VFIO path is necessary, maybe such a common framework
> should live in user space, which gets all granted resources from the
> kernel driver through VFIO and then provides accelerator services to
> other processes?

Except that many existing device drivers fall under that description
(i.e. exposing mmio, command queues, ...) and are not under VFIO.

Up to mdev, VFIO was all about handing a full device to userspace AFAIK.
With the introduction of mdev a host kernel driver can "slice" its
device and share it through VFIO to userspace. Note that in that case
it might never hand over any mmio, irq, ...; the host driver might just
be handing over memory, and would be polling from it to schedule on
the real hardware.


The question I am asking about WarpDrive is whether being in VFIO is
necessary, as I do not see the requirement myself.

Cheers,
Jérôme


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-14 Thread Jerome Glisse
On Fri, Sep 14, 2018 at 11:12:01AM +0800, Kenneth Lee wrote:
> On Thu, Sep 13, 2018 at 10:51:50AM -0400, Jerome Glisse wrote:
> > On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote:
> > > On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> > > > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > > > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex 
> > > > > > > > > > > > > Williamson wrote:
> > > > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth 
> > > > > > > > > > > > > > > Lee wrote:
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > > > > "copy_from_user" the
> > > > > > > > > > > user memory to the kernel. That is not what we need. What 
> > > > > > > > > > > we try to get is: the
> > > > > > > > > > > user application do something on its data, and push it 
> > > > > > > > > > > away to the accelerator,
> > > > > > > > > > > and says: "I'm tied, it is your turn to do the job...". 
> > > > > > > > > > > Then the accelerator has
> > > > > > > > > > > the memory, referring any portion of it with the same VAs 
> > > > > > > > > > > of the application,
> > > > > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > > > > 
> > > > > > > > > > You were not looking at right place see 
> > > > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > > > > It does GUP and create GEM object AFAICR you can wrap that 
> > > > > > > > > > GEM object into a
> > > > > > > > > > dma buffer object.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thank you for directing me to this implementation. It is 
> > > > > > > > > interesting:).
> > > > > > > > > 
> > > > > > > > > But it is not yet solve my problem. If I understand it right, 
> > > > > > > > > the userptr in
> > > > > > > > > i915 do the following:
> > > > > > > > > 
> > > > > > > > > 1. The user process sets a user pointer with size to the 
> > > > > > > > > kernel via ioctl.
> > > > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's 
> > > > > > > > > mm for further
> > > > > > > > >reference.
> > > > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the 
> > > > > > > > > device. So the data
> > > > > > > > >can be shared between the user space and the hardware.
> > > > > > > > > 
> > > > > > > > > But my scenario is: 
> > > > > > > > > 
> > > > > > > > > 1. The user process has some data in the user space, pointed 
> > > > > > > > > by a pointer, say
> > > > > > > > >ptr1. And within the memory, there may be some other 
> > > > > > > > > pointers, let's say one
> > > > > > > > >of them is ptr2.
> > > > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO 
> > > > > > > > > space. And the
> > > > > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > > > > 
> > > > > > > > > Userptr lets the hardware and process share the same memory 
> > > > > > > > > space. But I need
> > > > > > > > > them to share the same *address space*. So IOMMU is a MUST 
> > > > > > > > > for WarpDrive,
> > > > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > > > > procedure is OK.
> > > > > > > > 
> > > > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > > > > without non SVA/SVM path.
> > > > > > > > 
> > > > > > > 
> > > > > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > > > > understanding, Share
> > > > > > > Virtual Address/Memory means: any virtual address in a process 
> > > > > > > can be used by
> > > > > > > device at the same time. This requires IOMMU device to support 
> > > > > > > PASID. And
> > > > > > > optionally, it requires the feature of page-fault-from-device.
> > > > > > 
> > > > > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, 
> > > > > > access
> > > > > > to range that are MMIO map ie CPU page table pointing to IO memory, 
> > > > > > IIRC
> > > > > > it is undefined what happens on 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-14 Thread Auger Eric
Hi Jacob,

On 5/11/18 10:54 PM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When the IOMMU is enabled, faults such as DMA
> related transactions are detected by the IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> Faults detected by the IOMMU are based on the transaction's source ID,
> which can be reported on a per-device basis, regardless of whether the
> device is a PCI device or not.
> 
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults (e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
> 
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other in-kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault
> + * event and data as argument. The handler should return 0 on success. If the
> + * fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response codes:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(&param->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + 
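[The message is truncated here in the archive.] To make the usage above
concrete, here is a hedged sketch, not part of the patch, of how a consumer
such as VFIO or a device driver might use the new API. The handler shape and
the IOMMU_FAULT_PAGE_REQ / iommu_page_response() names follow the kernel-doc
quoted above; the exact struct layout and the helper my_resolve_page_request()
are assumptions for illustration only.

    #include <linux/iommu.h>

    /* called by the IOMMU core for each fault reported on this device */
    static int my_fault_handler(struct iommu_fault_event *evt, void *data)
    {
            struct my_ctx *ctx = data;      /* private data from registration */

            /* unrecoverable fault: decline, let the IOMMU driver clean up */
            if (evt->fault.type != IOMMU_FAULT_PAGE_REQ)
                    return -ENOSYS;

            /* recoverable page request: fault the page in, then answer with
             * IOMMU_PAGE_RESP_SUCCESS/INVALID via iommu_page_response() */
            return my_resolve_page_request(ctx, evt);
    }

    /* at setup time, e.g. driver probe or VFIO container attach */
    ret = iommu_register_device_fault_handler(dev, my_fault_handler, ctx);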

Re: [RFC PATCH 3/3] dma-mapping: clear dma_parms on teardown, too

2018-09-14 Thread Robin Murphy

On 13/09/18 16:17, Wolfram Sang wrote:

While sanitizing the pointer for dma_ops on teardown, do the same for
dma_parms, too. Rename the function to have a more generic name.


Upon closer inspection, it looks like there are some cases (at least PCI 
and MFD) where dma_parms is installed by the parent/bus at device 
creation, and therefore remains valid and *would* be expected to persist 
across the child device's driver unbinding and rebinding - seems this is 
more complex than I first thought, sorry.


Robin.
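
For the PCI case Robin mentions, a hedged paraphrase of what the PCI core
does at device creation (from memory of drivers/pci/probe.c around this era;
the exact location may differ):

    #include <linux/pci.h>

    static void pci_install_dma_parms(struct pci_dev *pdev)
    {
            /*
             * dma_parms points at storage embedded in struct pci_dev, so it
             * stays valid for the lifetime of the device - across any number
             * of driver unbind/rebind cycles. Clearing it in
             * arch_teardown_dma_ops() would wipe out state a rebound driver
             * relies on (e.g. dma_set_max_seg_size()).
             */
            pdev->dev.dma_parms = &pdev->dma_parms;
    }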


Signed-off-by: Wolfram Sang 
---
  arch/arm/mm/dma-mapping.c   | 2 +-
  include/linux/dma-mapping.h | 5 +++--
  2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 466b0242e8af..bcf77bc0423f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2399,5 +2399,5 @@ void arch_teardown_dma_ops(struct device *dev)
if (dev->archdata.dma_ops_setup)
arm_teardown_iommu_dma_ops(dev);
  
-	generic_teardown_dma_ops(dev);
+	generic_teardown_dma(dev);
  }
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 020512cb7f0e..6a2d8779b1d8 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -663,12 +663,13 @@ static inline void arch_setup_dma_ops(struct device *dev, 
u64 dma_base,
  bool coherent) { }
  #endif
  
-static inline void generic_teardown_dma_ops(struct device *dev)
+static inline void generic_teardown_dma(struct device *dev)
  {
dev->dma_ops = NULL;
+   dev->dma_parms = NULL;
  }
  #ifndef arch_teardown_dma_ops
-#define arch_teardown_dma_ops generic_teardown_dma_ops
+#define arch_teardown_dma_ops generic_teardown_dma
  #endif
  
  static inline unsigned int dma_get_max_seg_size(struct device *dev)





Re: [RFC PATCH 3/3] dma-mapping: clear dma_parms on teardown, too

2018-09-14 Thread Christoph Hellwig
Same here.  I don't even think we really need to clear the ops to
start with, but if we want to do it I'd just do it directly in
the two callers.


Re: [RFC PATCH 2/3] dma-mapping: clear dma_ops pointer also on ARM

2018-09-14 Thread Christoph Hellwig
On Thu, Sep 13, 2018 at 05:17:15PM +0200, Wolfram Sang wrote:
> The generic fallback of arch_teardown_dma_ops() clears the dma_ops
> pointer but the ARM specific version does not. Rename the generic one,
> so it can be called from ARM specific one, too. This will ensure
> consistent behaviour.

Hmm.  I'd rather remove your new generic version entirely and move
the ops clearing into the two callers.


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-14 Thread Kenneth Lee
On Fri, Sep 14, 2018 at 06:50:55AM +0000, Tian, Kevin wrote:
> 
> > From: Jerome Glisse
> > Sent: Thursday, September 13, 2018 10:52 PM
> >
> [...]
>  > AFAIK, on x86 and PPC at least, all PCIE devices are in the same group
> > by default at boot or at least all devices behind the same bridge.
> 
> The group thing reflects a physical hierarchy limitation and doesn't
> change across boots. Please note the iommu group defines the minimal
> isolation boundary - all devices within the same group must be attached
> to the same iommu domain or address space, because physically the IOMMU
> cannot differentiate DMAs coming out of those devices. Devices behind a
> legacy PCI-X bridge are one example. Other examples include devices
> behind a PCIe switch port which doesn't support ACS and thus cannot
> route p2p transactions to the IOMMU. If talking about a typical PCIe
> endpoint (with all upstream ports supporting ACS), you'll get one
> device per group.
> 
> One iommu group today is attached to only one iommu domain.
> In the future one group may attach to multiple domains, per the
> aux domain concept being discussed in another thread.
> 
> > 
> > Maybe there are kernel options to avoid that, and the userspace init
> > program can definitely re-arrange that based on sysadmin policy.
> 
> I don't think there is such an option, as it may break the isolation
> model enforced by the IOMMU.
> 
> [...]
> > > > That is why i am being pedantic :) on making sure there is good reasons
> > > > to do what you do inside VFIO. I do believe that we want a common
> > frame-
> > > > work like the one you are proposing but i do not believe it should be
> > > > part of VFIO given the baggages it comes with and that are not relevant
> > > > to the use cases for this kind of devices.
> > >
> 
> The purpose of VFIO is clear - it is the kernel portal for granting
> generic device resources (mmio, irq, etc.) to user space. VFIO doesn't
> care what exactly a resource is used for (queue, cmd reg, etc.). If
> pursuing the VFIO path is really necessary, maybe such a common
> framework should live in user space, getting all granted resources from
> the kernel driver through VFIO and then providing accelerator services
> to other processes?

Yes. I think this is exactly what WarpDrive is now doing. This patch just
lets the type1 driver use the parent IOMMU for mdev.

> 
> Thanks
> Kevin

-- 
-Kenneth(Hisilicon)




Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention

2018-09-14 Thread Will Deacon
Hi Robin,

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
> 
> These are a couple of quick patches based on the idea I had back when
> first implementing iommu-dma, but didn't have any way to justify at the
> time. However, the reports of 10-25% better networking performance on v1
> suggest that it's very worthwhile (and far more significant than I ever
> would have guessed).
> 
> As far as merging goes, I don't mind at all whether this goes via IOMMU,
> or via dma-mapping provided Joerg's happy to ack it.

I think it makes most sense for Joerg to take this series via his tree.

Anyway, I've been running them on my TX2 box and things are happy enough,
so:

Tested-by: Will Deacon 

Will
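
For readers skimming the thread, the crux of the series is a refcount-free
fast-path lookup of the default DMA domain. A hedged sketch of the idea,
close to but not necessarily identical with what the patches add (this
would live in drivers/iommu/iommu.c, where struct iommu_group is visible):

    /* The default DMA domain lives exactly as long as the group, so the
     * DMA ops fast path can look it up without the refcount dance of
     * iommu_get_domain_for_dev(). */
    struct iommu_domain *iommu_get_dma_domain(struct device *dev)
    {
            return dev->iommu_group->default_domain;
    }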


Re: [RFC PATCH 1/3] ARM: dma-mapping: update comment about handling dma_ops when detaching from IOMMU

2018-09-14 Thread Geert Uytterhoeven
On Thu, Sep 13, 2018 at 5:18 PM Wolfram Sang
 wrote:
> Update the comment because we don't set the pointer to NULL anymore.
> Also use the correct pointer name 'dma_ops' instead of 'dma_map_ops'.
>
> Fixes: 1874619a7df4 ("ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()")
> Signed-off-by: Wolfram Sang 

Nice catch!

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH, for-4.19] dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration

2018-09-14 Thread Christoph Hellwig
Any chance of getting a review for this?

On Tue, Sep 11, 2018 at 11:00:49AM +0200, Christoph Hellwig wrote:
> The patch adding the infrastructure failed to actually add the symbol
> declaration, oops..
> 
> Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook")
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/Kconfig | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index 9bd54304446f..1b1d63b3634b 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -23,6 +23,9 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU
>   bool
>   select NEED_DMA_MAP_STATE
>  
> +config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
> + bool
> +
>  config DMA_DIRECT_OPS
>   bool
>   depends on HAS_DMA
> -- 
> 2.18.0
> 


merge dma_direct_ops and dma_noncoherent_ops v3

2018-09-14 Thread Christoph Hellwig
While most architectures are either always or never dma coherent for a
given build, the arm, arm64, mips and soon arc architectures can have
different dma coherent settings on a per-device basis.  Additionally
some mips builds can decide at boot time if dma is coherent or not.

I've started to look into handling noncoherent dma in swiotlb, and
moving the dma-iommu ops into common code [1], and for that we need a
generic way to check if a given device is coherent or not.  Moving
this flag into struct device also simplifies the conditionally coherent
architecture implementations.

These patches are also available in a git tree given that they have
a few previous posted dependencies:

git://git.infradead.org/users/hch/misc.git dma-direct-noncoherent-merge

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-noncoherent-merge

Changes since v2:
 - return bool from dev_is_dma_coherent

Changes since v1:
 - rebased to the latest Linus' tree which includes coherent dma support
   for arc
 - a couple tidyups suggested by Paul Burton


[PATCH 4/6] dma-mapping: merge direct and noncoherent ops

2018-09-14 Thread Christoph Hellwig
All the cache maintenance is already stubbed out when not enabled,
but merging the two allows us to nicely handle the case where
cache maintenance is required for some devices, but not others.

Signed-off-by: Christoph Hellwig 
Acked-by: Paul Burton  # MIPS parts
---
 arch/arc/Kconfig |   2 +-
 arch/arc/mm/dma.c|  16 ++--
 arch/arm/mm/dma-mapping-nommu.c  |   5 +-
 arch/c6x/Kconfig |   2 +-
 arch/hexagon/Kconfig |   2 +-
 arch/m68k/Kconfig|   2 +-
 arch/microblaze/Kconfig  |   2 +-
 arch/mips/Kconfig|   1 -
 arch/mips/include/asm/dma-mapping.h  |   2 -
 arch/mips/jazz/jazzdma.c |   6 +-
 arch/mips/mm/dma-noncoherent.c   |  29 ++-
 arch/nds32/Kconfig   |   2 +-
 arch/nios2/Kconfig   |   2 +-
 arch/openrisc/Kconfig|   2 +-
 arch/parisc/Kconfig  |   2 +-
 arch/parisc/kernel/setup.c   |   2 +-
 arch/sh/Kconfig  |   3 +-
 arch/sparc/Kconfig   |   2 +-
 arch/sparc/include/asm/dma-mapping.h |   4 +-
 arch/x86/kernel/amd_gart_64.c|   6 +-
 arch/xtensa/Kconfig  |   2 +-
 include/asm-generic/dma-mapping.h|   9 --
 include/linux/dma-direct.h   |   4 +
 include/linux/dma-mapping.h  |   1 -
 include/linux/dma-noncoherent.h  |   5 --
 kernel/dma/Kconfig   |   9 +-
 kernel/dma/Makefile  |   1 -
 kernel/dma/direct.c  | 121 +--
 kernel/dma/noncoherent.c | 106 ---
 29 files changed, 160 insertions(+), 192 deletions(-)
 delete mode 100644 kernel/dma/noncoherent.c

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index b4441b0764d7..ca03694d518a 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -17,7 +17,7 @@ config ARC
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
-   select DMA_NONCOHERENT_OPS
+   select DMA_DIRECT_OPS
select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index c75d5c3470e3..535ed4a068ef 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -167,7 +167,7 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t 
paddr,
 }
 
 /*
- * Plug in coherent or noncoherent dma ops
+ * Plug in direct dma map ops.
  */
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
@@ -175,13 +175,11 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
/*
 * IOC hardware snoops all DMA traffic keeping the caches consistent
 * with memory - eliding need for any explicit cache maintenance of
-* DMA buffers - so we can use dma_direct cache ops.
+* DMA buffers.
 */
-   if (is_isa_arcv2() && ioc_enable && coherent) {
-   set_dma_ops(dev, &dma_direct_ops);
-   dev_info(dev, "use dma_direct_ops cache ops\n");
-   } else {
-   set_dma_ops(dev, &dma_noncoherent_ops);
-   dev_info(dev, "use dma_noncoherent_ops cache ops\n");
-   }
+   if (is_isa_arcv2() && ioc_enable && coherent)
+   dev->dma_coherent = true;
+
+   dev_info(dev, "use %scoherent DMA ops\n",
+            dev->dma_coherent ? "" : "non");
 }
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index aa7aba302e76..0ad156f9985b 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -47,7 +47,8 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t 
size,
 */
 
if (attrs & DMA_ATTR_NON_CONSISTENT)
-   return dma_direct_alloc(dev, size, dma_handle, gfp, attrs);
+   return dma_direct_alloc_pages(dev, size, dma_handle, gfp,
+   attrs);
 
ret = dma_alloc_from_global_coherent(size, dma_handle);
 
@@ -70,7 +71,7 @@ static void arm_nommu_dma_free(struct device *dev, size_t 
size,
   unsigned long attrs)
 {
if (attrs & DMA_ATTR_NON_CONSISTENT) {
-   dma_direct_free(dev, size, cpu_addr, dma_addr, attrs);
+   dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
} else {
int ret = dma_release_from_global_coherent(get_order(size),
   cpu_addr);
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index a641b0bf1611..f65a084607fd 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -9,7 +9,7 @@ config C6X
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select CLKDEV_LOOKUP
-   select DMA_NONCOHERENT_OPS
+   
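
[The diff is truncated here in the archive.] To illustrate the merged
behaviour the commit message above describes, a hedged sketch, simplified
from what kernel/dma/direct.c looks like after this series (details may
differ): one set of DMA ops, with cache maintenance keyed off the
per-device coherence flag.

    #include <linux/dma-direct.h>
    #include <linux/dma-noncoherent.h>

    static void dma_direct_sync_single_for_device(struct device *dev,
                    dma_addr_t addr, size_t size, enum dma_data_direction dir)
    {
            /* coherent devices skip the (stubbed-out) cache maintenance */
            if (dev_is_dma_coherent(dev))
                    return;
            arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir);
    }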

[PATCH 1/6] dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration

2018-09-14 Thread Christoph Hellwig
The patch adding the infrastructure failed to actually add the symbol
declaration, oops..

Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook")
Signed-off-by: Christoph Hellwig 
---
 kernel/dma/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9bd54304446f..1b1d63b3634b 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -23,6 +23,9 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU
bool
select NEED_DMA_MAP_STATE
 
+config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
+   bool
+
 config DMA_DIRECT_OPS
bool
depends on HAS_DMA
-- 
2.18.0
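
For context, the symbol only gates an optional arch hook: an architecture
selects ARCH_HAS_SYNC_DMA_FOR_CPU_ALL in its Kconfig and then provides the
function below. A hedged sketch with the signature as I recall it from
include/linux/dma-noncoherent.h in this series; the helper it calls is
hypothetical.

    #include <linux/dma-noncoherent.h>

    /* Invoked from the dma_sync_*_for_cpu paths when the platform needs a
     * global flush (e.g. a write buffer shared by all DMA masters) rather
     * than per-buffer maintenance. */
    void arch_sync_dma_for_cpu_all(struct device *dev)
    {
            myarch_flush_global_dma_write_buffer();  /* hypothetical helper */
    }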



[PATCH 5/6] dma-mapping: consolidate the dma mmap implementations

2018-09-14 Thread Christoph Hellwig
The only functional difference (modulo a few missing fixes in the arch
code) is that architectures without coherent caches need a hook to
convert a virtual or dma address into a pfn, given that we don't have
the kernel linear mapping available for the otherwise easy virt_to_page
call.  As a side effect we can support mmap of the per-device coherent
area even on architectures not providing the callback, and we make the
previously dangerous default method dma_common_mmap actually safe for
non-coherent architectures by rejecting it without the right helper.

In addition to that we need a hook so that some architectures can
override the protection bits when mmapping a dma coherent allocation.

Signed-off-by: Christoph Hellwig 
Acked-by: Paul Burton  # MIPS parts
---
 arch/arc/Kconfig  |  2 +-
 arch/arc/mm/dma.c | 25 +++--
 arch/arm/mm/dma-mapping-nommu.c   |  2 +-
 arch/microblaze/Kconfig   |  2 +-
 arch/microblaze/include/asm/pgtable.h |  2 --
 arch/microblaze/kernel/dma.c  | 22 --
 arch/microblaze/mm/consistent.c   |  3 ++-
 arch/mips/Kconfig |  3 ++-
 arch/mips/jazz/jazzdma.c  |  1 -
 arch/mips/mm/dma-noncoherent.c| 32 ---
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/dma-mapping.h   |  5 +++--
 include/linux/dma-noncoherent.h   | 10 +++--
 kernel/dma/Kconfig| 10 +
 kernel/dma/direct.c   | 11 -
 kernel/dma/mapping.c  | 32 ++-
 16 files changed, 58 insertions(+), 106 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ca03694d518a..3d9bdecfa52d 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -9,6 +9,7 @@
 config ARC
def_bool y
select ARC_TIMERS
+   select ARCH_HAS_DMA_COHERENT_TO_PFN
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
@@ -18,7 +19,6 @@ config ARC
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_DIRECT_OPS
-   select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 535ed4a068ef..db203ff69ccf 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -84,29 +84,10 @@ void arch_dma_free(struct device *dev, size_t size, void 
*vaddr,
__free_pages(page, get_order(size));
 }
 
-int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
+long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
+   dma_addr_t dma_addr)
 {
-   unsigned long user_count = vma_pages(vma);
-   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long pfn = __phys_to_pfn(dma_addr);
-   unsigned long off = vma->vm_pgoff;
-   int ret = -ENXIO;
-
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-
-   if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
-   return ret;
-
-   if (off < count && user_count <= (count - off)) {
-   ret = remap_pfn_range(vma, vma->vm_start,
- pfn + off,
- user_count << PAGE_SHIFT,
- vma->vm_page_prot);
-   }
-
-   return ret;
+   return __phys_to_pfn(dma_addr);
 }
 
 /*
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 0ad156f9985b..712416ecd8e6 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -91,7 +91,7 @@ static int arm_nommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
 	if (dma_mmap_from_global_coherent(vma, cpu_addr, size, &ret))
return ret;
 
-   return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
+   return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
 
 
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 0f48ab6a8070..164a4857737a 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -1,6 +1,7 @@
 config MICROBLAZE
def_bool y
select ARCH_NO_SWAP
+   select ARCH_HAS_DMA_COHERENT_TO_PFN if MMU
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
@@ -12,7 +13,6 @@ config MICROBLAZE
select CLONE_BACKWARDS3
select COMMON_CLK
select DMA_DIRECT_OPS
-   select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_DEVICES
diff --git a/arch/microblaze/include/asm/pgtable.h 
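
[The diff is truncated here in the archive.] Stepping back from the diff:
the consolidated generic mmap path reduces, roughly, to the following. This
is a simplified, hedged sketch of the kernel/dma/mapping.c logic after this
patch; the real code also honours dma_mmap_from_dev_coherent(), the arch
pgprot hook, and the vm_pgoff bounds checks.

    int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
                    void *cpu_addr, dma_addr_t dma_addr, size_t size,
                    unsigned long attrs)
    {
            unsigned long pfn;

            if (!dev_is_dma_coherent(dev)) {
                    /* no kernel linear mapping: ask the arch for the pfn */
                    if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN))
                            return -ENXIO;
                    pfn = arch_dma_coherent_to_pfn(dev, cpu_addr, dma_addr);
            } else {
                    pfn = page_to_pfn(virt_to_page(cpu_addr));
            }

            return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
                                   vma_pages(vma) << PAGE_SHIFT,
                                   vma->vm_page_prot);
    }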

[PATCH 6/6] dma-mapping: support non-coherent devices in dma_common_get_sgtable

2018-09-14 Thread Christoph Hellwig
We can use the arch_dma_coherent_to_pfn hook to provide a ->get_sgtable
implementation.  Note that this isn't an endorsement of this interface
(which is a horribly bad idea), but it is required to move arm64 over
to the generic code without a loss of functionality.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c   |  2 +-
 include/linux/dma-mapping.h |  7 ---
 kernel/dma/mapping.c| 23 ---
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 470757ea..28819a0e61d0 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -689,7 +689,7 @@ xen_swiotlb_get_sgtable(struct device *dev, struct sg_table 
*sgt,
   handle, size, attrs);
}
 #endif
-   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size);
+   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs);
 }
 
 static int xen_swiotlb_mapping_error(struct device *dev, dma_addr_t dma_addr)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index c3378d4e0d57..bd81e74cca7b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -483,8 +483,8 @@ dma_mmap_attrs(struct device *dev, struct vm_area_struct 
*vma, void *cpu_addr,
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
 int
-dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
-  void *cpu_addr, dma_addr_t dma_addr, size_t size);
+dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr,
+   dma_addr_t dma_addr, size_t size, unsigned long attrs);
 
 static inline int
 dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
@@ -496,7 +496,8 @@ dma_get_sgtable_attrs(struct device *dev, struct sg_table 
*sgt, void *cpu_addr,
if (ops->get_sgtable)
return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
attrs);
-   return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size);
+   return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
+   attrs);
 }
 
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 42fd73aca305..58dec7a92b7b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -202,17 +202,26 @@ EXPORT_SYMBOL(dmam_release_declared_memory);
  * Create scatter-list for the already allocated DMA buffer.
  */
 int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
-void *cpu_addr, dma_addr_t handle, size_t size)
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
 {
-   struct page *page = virt_to_page(cpu_addr);
+   struct page *page;
int ret;
 
-   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-   if (unlikely(ret))
-   return ret;
+   if (!dev_is_dma_coherent(dev)) {
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN))
+   return -ENXIO;
 
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-   return 0;
+   page = pfn_to_page(arch_dma_coherent_to_pfn(dev, cpu_addr,
+   dma_addr));
+   } else {
+   page = virt_to_page(cpu_addr);
+   }
+
+   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+   if (!ret)
+   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+   return ret;
 }
 EXPORT_SYMBOL(dma_common_get_sgtable);
 
-- 
2.18.0
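
As a usage note, not from the patch: the typical consumer of this interface
is a driver exporting a coherent allocation as a scatterlist, e.g. for
dma-buf. A hedged sketch using the dma_get_sgtable() wrapper shown in the
diff above:

    #include <linux/dma-mapping.h>

    static int export_coherent_buffer(struct device *dev, size_t size)
    {
            struct sg_table sgt;
            dma_addr_t dma_addr;
            void *cpu_addr;
            int ret;

            cpu_addr = dma_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
            if (!cpu_addr)
                    return -ENOMEM;

            /* with this patch, this also works for non-coherent devices,
             * provided the arch implements arch_dma_coherent_to_pfn() */
            ret = dma_get_sgtable(dev, &sgt, cpu_addr, dma_addr, size);
            if (ret)
                    dma_free_coherent(dev, size, cpu_addr, dma_addr);
            return ret;
    }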



[PATCH 3/6] dma-mapping: move the dma_coherent flag to struct device

2018-09-14 Thread Christoph Hellwig
Various architectures support both coherent and non-coherent dma on a
per-device basis.  Move the dma_noncoherent flag from the mips archdata
field to struct device proper to prepare the infrastructure for reuse on
other architectures.

Signed-off-by: Christoph Hellwig 
Acked-by: Paul Burton 
Acked-by: Greg Kroah-Hartman 
---
 arch/mips/Kconfig |  1 +
 arch/mips/include/asm/Kbuild  |  1 +
 arch/mips/include/asm/device.h| 19 
 arch/mips/include/asm/dma-coherence.h |  6 +
 arch/mips/include/asm/dma-mapping.h   |  2 +-
 arch/mips/mm/dma-noncoherent.c| 32 +--
 include/linux/device.h|  7 ++
 include/linux/dma-noncoherent.h   | 16 ++
 kernel/dma/Kconfig|  3 +++
 9 files changed, 41 insertions(+), 46 deletions(-)
 delete mode 100644 arch/mips/include/asm/device.h

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 0b25180028b8..54c52bd0d9d3 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_UPROBES
bool
 
 config DMA_MAYBE_COHERENT
+   select ARCH_HAS_DMA_COHERENCE_H
select DMA_NONCOHERENT
bool
 
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 58351e48421e..9a81e72119da 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -1,6 +1,7 @@
 # MIPS headers
 generic-$(CONFIG_GENERIC_CSUM) += checksum.h
 generic-y += current.h
+generic-y += device.h
 generic-y += dma-contiguous.h
 generic-y += emergency-restart.h
 generic-y += export.h
diff --git a/arch/mips/include/asm/device.h b/arch/mips/include/asm/device.h
deleted file mode 100644
index 6aa796f1081a..000000000000
--- a/arch/mips/include/asm/device.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * Arch specific extensions to struct device
- *
- * This file is released under the GPLv2
- */
-#ifndef _ASM_MIPS_DEVICE_H
-#define _ASM_MIPS_DEVICE_H
-
-struct dev_archdata {
-#ifdef CONFIG_DMA_PERDEV_COHERENT
-   /* Non-zero if DMA is coherent with CPU caches */
-   bool dma_coherent;
-#endif
-};
-
-struct pdev_archdata {
-};
-
-#endif /* _ASM_MIPS_DEVICE_H*/
diff --git a/arch/mips/include/asm/dma-coherence.h 
b/arch/mips/include/asm/dma-coherence.h
index 8eda48748ed5..5eaa1fcc878a 100644
--- a/arch/mips/include/asm/dma-coherence.h
+++ b/arch/mips/include/asm/dma-coherence.h
@@ -20,6 +20,12 @@ enum coherent_io_user_state {
 #elif defined(CONFIG_DMA_MAYBE_COHERENT)
 extern enum coherent_io_user_state coherentio;
 extern int hw_coherentio;
+
+static inline bool dev_is_dma_coherent(struct device *dev)
+{
+   return coherentio == IO_COHERENCE_ENABLED ||
+   (coherentio == IO_COHERENCE_DEFAULT && hw_coherentio);
+}
 #else
 #ifdef CONFIG_DMA_NONCOHERENT
 #define coherentio IO_COHERENCE_DISABLED
diff --git a/arch/mips/include/asm/dma-mapping.h 
b/arch/mips/include/asm/dma-mapping.h
index e81c4e97ff1a..40d825c779de 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -25,7 +25,7 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 
dma_base,
  bool coherent)
 {
 #ifdef CONFIG_DMA_PERDEV_COHERENT
-   dev->archdata.dma_coherent = coherent;
+   dev->dma_coherent = coherent;
 #endif
 }
 
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 2aca1236af36..d408ac51f56c 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -14,26 +14,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_DMA_PERDEV_COHERENT
-static inline int dev_is_coherent(struct device *dev)
-{
-   return dev->archdata.dma_coherent;
-}
-#else
-static inline int dev_is_coherent(struct device *dev)
-{
-   switch (coherentio) {
-   default:
-   case IO_COHERENCE_DEFAULT:
-   return hw_coherentio;
-   case IO_COHERENCE_ENABLED:
-   return 1;
-   case IO_COHERENCE_DISABLED:
-   return 0;
-   }
-}
-#endif /* CONFIG_DMA_PERDEV_COHERENT */
-
 /*
  * The affected CPUs below in 'cpu_needs_post_dma_flush()' can speculatively
  * fill random cachelines with stale data at any time, requiring an extra
@@ -49,7 +29,7 @@ static inline int dev_is_coherent(struct device *dev)
  */
 static inline bool cpu_needs_post_dma_flush(struct device *dev)
 {
-   if (dev_is_coherent(dev))
+   if (dev_is_dma_coherent(dev))
return false;
 
switch (boot_cpu_type()) {
@@ -76,7 +56,7 @@ void *arch_dma_alloc(struct device *dev, size_t size,
if (!ret)
return NULL;
 
-   if (!dev_is_coherent(dev) && !(attrs & DMA_ATTR_NON_CONSISTENT)) {
+   if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_NON_CONSISTENT)) {
dma_cache_wback_inv((unsigned long) ret, size);
ret = (void *)UNCAC_ADDR(ret);
}
@@ -87,7 +67,7 @@ void *arch_dma_alloc(struct device *dev, 
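
[The diff is truncated here in the archive.] For reference, the generic
helper this patch introduces boils down to reading the new struct device
flag. Reconstructed from the series, so treat the details as approximate:

    /* include/linux/dma-noncoherent.h (per this patch): architectures
     * without their own dma-coherence.h simply read the new flag */
    static inline bool dev_is_dma_coherent(struct device *dev)
    {
            return dev->dma_coherent;
    }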

[PATCH 2/6] MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT

2018-09-14 Thread Christoph Hellwig
While both options select a form of conditional DMA coherence, they don't
actually share any code in the implementation, so untangle them.

Signed-off-by: Christoph Hellwig 
Acked-by: Paul Burton 
---
 arch/mips/Kconfig|  2 +-
 arch/mips/kernel/setup.c |  2 +-
 arch/mips/mm/c-r4k.c | 17 -
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 35511999156a..0b25180028b8 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -,7 +,7 @@ config DMA_MAYBE_COHERENT
 
 config DMA_PERDEV_COHERENT
bool
-   select DMA_MAYBE_COHERENT
+   select DMA_NONCOHERENT
 
 config DMA_NONCOHERENT
bool
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index c71d1eb7da59..6d840a44fa36 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -1067,7 +1067,7 @@ static int __init debugfs_mips(void)
 arch_initcall(debugfs_mips);
 #endif
 
-#if defined(CONFIG_DMA_MAYBE_COHERENT) && !defined(CONFIG_DMA_PERDEV_COHERENT)
+#ifdef CONFIG_DMA_MAYBE_COHERENT
 /* User defined DMA coherency from command line. */
 enum coherent_io_user_state coherentio = IO_COHERENCE_DEFAULT;
 EXPORT_SYMBOL_GPL(coherentio);
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index a9ef057c79fe..05bd77727fb9 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1955,22 +1955,21 @@ void r4k_cache_init(void)
__flush_icache_user_range   = r4k_flush_icache_user_range;
__local_flush_icache_user_range = local_r4k_flush_icache_user_range;
 
-#if defined(CONFIG_DMA_NONCOHERENT) || defined(CONFIG_DMA_MAYBE_COHERENT)
-# if defined(CONFIG_DMA_PERDEV_COHERENT)
-   if (0) {
-# else
-   if ((coherentio == IO_COHERENCE_ENABLED) ||
-   ((coherentio == IO_COHERENCE_DEFAULT) && hw_coherentio)) {
-# endif
+#ifdef CONFIG_DMA_NONCOHERENT
+#ifdef CONFIG_DMA_MAYBE_COHERENT
+   if (coherentio == IO_COHERENCE_ENABLED ||
+   (coherentio == IO_COHERENCE_DEFAULT && hw_coherentio)) {
_dma_cache_wback_inv= (void *)cache_noop;
_dma_cache_wback= (void *)cache_noop;
_dma_cache_inv  = (void *)cache_noop;
-   } else {
+   } else
+#endif /* CONFIG_DMA_MAYBE_COHERENT */
+   {
_dma_cache_wback_inv= r4k_dma_cache_wback_inv;
_dma_cache_wback= r4k_dma_cache_wback_inv;
_dma_cache_inv  = r4k_dma_cache_inv;
}
-#endif
+#endif /* CONFIG_DMA_NONCOHERENT */
 
build_clear_page();
build_copy_page();
-- 
2.18.0



Re: of_dma_request_slave_channel() failed ?

2018-09-14 Thread Geert Uytterhoeven
Hi Morimoto-san,

On Fri, Sep 14, 2018 at 2:53 AM Kuninori Morimoto
 wrote:
> > > Before the 6c92d5a2744e2761 patch, the driver would forcibly ignore
> > > non-SSI modules and try to use PIO mode.
> > > I'm not sure whether that was "kind support" or "overkill support".
> > >
> > > After this patch, it needs DMA; otherwise, probe will fail.
> > > DT shouldn't have non-SSI modules if you want to use PIO mode.
> > >
> > > + /* use PIO mode */
> > > - playback = <  >;
> > > + playback = <>;
> >
> > OK, so falling back to PIO was really a "best effort" fallback, and the user
> > should really enable DMA?
>
> Yeah, I think this is better.
> PIO fallback is of course "nice to have" if possible.
> But, because of this new iommu patch, keeping this feature
> needs a big, complicated patch for a small effect, I think.
> Thus, I think this is the time to remove this feature.
> Do you agree?

You're the rcar-sound expert ;-)
If you think there's not much value in keeping PIO fallback, it can be
removed.

We can change behavior once these dependency issues have been resolved
in a generic way.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


RE: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-14 Thread Tian, Kevin
> From: Jerome Glisse
> Sent: Thursday, September 13, 2018 10:52 PM
>
[...]
 > AFAIK, on x86 and PPC at least, all PCIE devices are in the same group
> by default at boot or at least all devices behind the same bridge.

The group thing reflects a physical hierarchy limitation and doesn't
change across boots. Please note the iommu group defines the minimal
isolation boundary - all devices within the same group must be attached
to the same iommu domain or address space, because physically the IOMMU
cannot differentiate DMAs coming out of those devices. Devices behind a
legacy PCI-X bridge are one example. Other examples include devices
behind a PCIe switch port which doesn't support ACS and thus cannot
route p2p transactions to the IOMMU. If talking about a typical PCIe
endpoint (with all upstream ports supporting ACS), you'll get one
device per group.

One iommu group today is attached to only one iommu domain.
In the future one group may attach to multiple domains, per the
aux domain concept being discussed in another thread.
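
To make the group concept concrete, a small hedged illustration using the
existing public IOMMU API (iommu_group_get/iommu_group_id/iommu_group_put
are real interfaces; the function itself is hypothetical):

    #include <linux/iommu.h>

    static void show_isolation_boundary(struct device *dev)
    {
            struct iommu_group *group = iommu_group_get(dev);

            if (!group) {
                    dev_info(dev, "no iommu group (no translation)\n");
                    return;
            }
            /* every device reporting the same group id must share one
             * iommu domain / address space */
            dev_info(dev, "iommu group %d\n", iommu_group_id(group));
            iommu_group_put(group);
    }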

> 
> Maybe there are kernel options to avoid that, and the userspace init
> program can definitely re-arrange that based on sysadmin policy.

I don't think there is such an option, as it may break the isolation
model enforced by the IOMMU.

[...]
> > > That is why i am being pedantic :) on making sure there is good reasons
> > > to do what you do inside VFIO. I do believe that we want a common
> frame-
> > > work like the one you are proposing but i do not believe it should be
> > > part of VFIO given the baggages it comes with and that are not relevant
> > > to the use cases for this kind of devices.
> >

The purpose of VFIO is clear - it is the kernel portal for granting
generic device resources (mmio, irq, etc.) to user space. VFIO doesn't
care what exactly a resource is used for (queue, cmd reg, etc.). If
pursuing the VFIO path is really necessary, maybe such a common
framework should live in user space, getting all granted resources from
the kernel driver through VFIO and then providing accelerator services
to other processes?

Thanks
Kevin