Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-02 Thread David Gibson
On Fri, Apr 29, 2022 at 09:48:38AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 04:20:36PM +1000, David Gibson wrote:
> 
> > > I think PPC and S390 are solving the same problem here. I think S390
> > > is going to go to a SW nested model where it has an iommu_domain
> > > controlled by iommufd that is populated with the pinned pages, eg
> > > stored in an xarray.
> > > 
> > > Then the performance map/unmap path is simply copying pages from the
> > > xarray to the real IOPTEs - and this would be modeled as a nested
> > > iommu_domain with a SW vIOPTE walker instead of a HW vIOPTE walker.
> > > 
> > > Perhaps this is agreeable for PPC too?
> > 
> > Uh.. maybe?  Note that I'm making these comments based on working on
> > this some years ago (the initial VFIO for ppc implementation in
> > particular).  I'm no longer actively involved in ppc kernel work.
> 
> OK
>  
> > > > 3) "dynamic DMA windows" (DDW).  The IBM IOMMU hardware allows for 2 
> > > > IOVA
> > > > windows, which aren't contiguous with each other.  The base addresses
> > > > of each of these are fixed, but the size of each window, the pagesize
> > > > (i.e. granularity) of each window and the number of levels in the
> > > > IOMMU pagetable are runtime configurable.  Because it's true in the
> > > > hardware, it's also true of the vIOMMU interface defined by the IBM
> > > > hypervisor (and adopted by KVM as well).  So, guests can request
> > > > changes in how these windows are handled.  Typical Linux guests will
> > > > use the "low" window (IOVA 0..2GiB) dynamically, and the high window
> > > > (IOVA 1<<60..???) to map all of RAM.  However, as a hypervisor we
> > > > can't count on that; the guest can use them however it wants.
> > > 
> > > As part of nesting iommufd will have a 'create iommu_domain using
> > > iommu driver specific data' primitive.
> > > 
> > > The driver specific data for PPC can include a description of these
> > > windows so the PPC specific qemu driver can issue this new ioctl
> > > using the information provided by the guest.
> > 
> > Hmm.. not sure if that works.  At the moment, qemu (for example) needs
> > to set up the domains/containers/IOASes as it constructs the machine,
> > because that's based on the virtual hardware topology.  Initially they
> > use the default windows (0..2GiB first window, second window
> > disabled).  Only once the guest kernel is up and running does it issue
> > the hypercalls to set the final windows as it prefers.  In theory the
> > guest could change them during runtime though it's unlikely in
> > practice.  They could change during machine lifetime in practice,
> > though, if you rebooted from one guest kernel to another that uses a
> > different configuration.
> > 
> > *Maybe* IOAS construction can be deferred somehow, though I'm not sure
> > because the assigned devices need to live somewhere.
> 
> This is a general requirement for all the nesting implementations, we
> start out with some default nested page table and then later the VM
> does the vIOMMU call to change it. So nesting will have to come along
> with some kind of 'switch domains IOCTL'
> 
> In this case I would guess PPC could do the same and start out with a
> small (nested) iommu_domain and then create the VM's desired
> iommu_domain from the hypercall, and switch to it.
> 
> It is a bit more CPU work since maps in the lower range would have to
> be copied over, but conceptually the model matches the HW nesting.

Ah.. ok.  IIUC what you're saying is that the kernel-side IOASes have
fixed windows, but we fake dynamic windows in the userspace
implementation by flipping the devices over to a new IOAS with the new
windows.  Is that right?

Where exactly would the windows be specified?  My understanding was
that when creating a back-end specific IOAS, that would typically be
for the case where you're using a user / guest managed IO pagetable,
with the backend specifying the format for that.  In the ppc case we'd
need to specify the windows, but we'd still need the IOAS_MAP/UNMAP
operations to manage the mappings.  The PAPR vIOMMU is
paravirtualized: all updates come via hypercalls, so there's no
user/guest-managed data structure.
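
To make this concrete, the driver-specific creation data could carry just
the window geometry, with mappings still going through IOAS_MAP/UNMAP.  A
sketch only - none of these structures or names exist in the iommufd uAPI,
they are invented here for illustration:

    /* Hypothetical PPC-specific IOAS creation payload. */
    struct iommu_ioas_ppc_window {
            __aligned_u64 start_iova;       /* fixed base address */
            __aligned_u64 size;             /* runtime-configurable size */
            __u32 pagesize_log2;            /* granularity of this window */
            __u32 levels;                   /* IOMMU pagetable depth */
    };

    struct iommu_ioas_alloc_ppc {
            __u32 size;                     /* sizeof(struct), for extension */
            __u32 flags;
            struct iommu_ioas_ppc_window window[2]; /* low + high window */
    };

qemu would issue this whenever the guest reconfigures the windows via the
DDW hypercalls, and keep using IOAS_MAP/IOAS_UNMAP for the individual
mapping updates.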

That should work from the point of view of the userspace and guest
side interfaces.  It might be fiddly from the point of view of the
back end.  The ppc iommu doesn't really have the notion of
configurable domains - instead the address spaces are the hardware or
firmware fixed PEs, so they have a fixed set of devices.  At the bare
metal level it's possible to sort of do domains by making the actual
pagetable pointers for several PEs point to a common place.

However, in the future, nested KVM under PowerVM is likely to be the
norm.  In that situation the L1 as well as the L2 only has the
paravirtualized interfaces, which don't have any notion of domains,
only PEs.  All updates take place via hypercalls which explicitly
specify a PE (strictly speaking they take a "Logical IO Bus Number"

Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-02 Thread David Gibson
On Fri, Apr 29, 2022 at 09:50:30AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 04:22:56PM +1000, David Gibson wrote:
> > On Fri, Apr 29, 2022 at 01:21:30AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe 
> > > > Sent: Thursday, April 28, 2022 11:11 PM
> > > > 
> > > > 
> > > > > 3) "dynamic DMA windows" (DDW).  The IBM IOMMU hardware allows for
> > > > 2 IOVA
> > > > > windows, which aren't contiguous with each other.  The base addresses
> > > > > of each of these are fixed, but the size of each window, the pagesize
> > > > > (i.e. granularity) of each window and the number of levels in the
> > > > > IOMMU pagetable are runtime configurable.  Because it's true in the
> > > > > hardware, it's also true of the vIOMMU interface defined by the IBM
> > > > > hypervisor (and adopted by KVM as well).  So, guests can request
> > > > > changes in how these windows are handled.  Typical Linux guests will
> > > > > use the "low" window (IOVA 0..2GiB) dynamically, and the high window
> > > > > (IOVA 1<<60..???) to map all of RAM.  However, as a hypervisor we
> > > > > can't count on that; the guest can use them however it wants.
> > > > 
> > > > As part of nesting iommufd will have a 'create iommu_domain using
> > > > iommu driver specific data' primitive.
> > > > 
> > > > The driver specific data for PPC can include a description of these
> > > > windows so the PPC specific qemu driver can issue this new ioctl
> > > > using the information provided by the guest.
> > > > 
> > > > The main issue is that internally to the iommu subsystem the
> > > > iommu_domain aperture is assumed to be a single window. This kAPI will
> > > > have to be improved to model the PPC multi-window iommu_domain.
> > > > 
> > > 
> > > From the point of view of nesting, probably each window can be a
> > > separate domain, and then the existing aperture should still work?
> > 
> > Maybe.  There might be several different ways to represent it, but the
> > vital piece is that any individual device (well, group, technically)
> > must atomically join/leave both windows at once.
> 
> I'm not keen on the multi-iommu_domains because it means we have to
> create the idea that a device can be attached to multiple
> iommu_domains, which we don't have at all today.
> 
> The iommu_domain, on the other hand, allows PPC to implement its
> special rules, like the atomicity requirement above.

I tend to agree; I think extending the iommu domain concept to
incorporate multiple windows makes more sense than extending to allow
multiple domains per device.  I'm just saying there might be other
ways of representing this, and that's not a sticking point for me as
long as the right properties can be preserved.
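
For scale, the kAPI side might be no more than generalising the existing
single-window struct iommu_domain_geometry - again purely a sketch, not an
existing or proposed interface:

    /* Sketch: today struct iommu_domain_geometry describes one
     * [aperture_start, aperture_end] range; a multi-window variant
     * (hypothetical) could look roughly like this. */
    struct iommu_domain_window {
            dma_addr_t start;
            dma_addr_t end;
    };

    struct iommu_domain_geometry_multi {
            unsigned int num_windows;
            struct iommu_domain_window window[2];   /* PPC: low + high */
            bool force_aperture;    /* DMA only allowed inside windows */
    };

Attach/detach would still operate on the whole domain, which is what
preserves the atomic join/leave-both-windows property.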

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH] drm/rockchip: Refactor IOMMU initialisation

2022-05-02 Thread Heiko Stuebner
On Tue, 5 Apr 2022 15:32:50 +0100, Robin Murphy wrote:
> Defer the IOMMU domain setup until after successfully binding
> components, so we can figure out IOMMU support directly from the VOP
> devices themselves, rather than manually inferring it from the DT (which
> also fails to account for whether the IOMMU driver is actually loaded).
> Although this is somewhat of a logical cleanup, the main motivation is
> to prepare for a change in the iommu_domain_alloc() interface.

Applied, thanks!

[1/1] drm/rockchip: Refactor IOMMU initialisation
  commit: 421be3ee36a497949a4b564cd1e4f7f9fe755f57

Additionally tested on rk3288, rk3399 and px30.

Best regards,
-- 
Heiko Stuebner 


Re: [PATCH 4/5] iommu/vt-d: Remove domain_update_iommu_snooping()

2022-05-02 Thread Jacob Pan
Hi BaoLu,

On Sun, 1 May 2022 19:24:33 +0800, Lu Baolu wrote:

> The IOMMU force snooping capability is not required to be consistent
> among all the IOMMUs anymore. Remove force snooping capability check
> in the IOMMU hot-add path, and domain_update_iommu_snooping() becomes
> dead code now.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 34 +-
>  1 file changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 3c1c228f9031..d5808495eb64 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -533,33 +533,6 @@ static void domain_update_iommu_coherency(struct dmar_domain *domain)
>  	rcu_read_unlock();
>  }
>  
> -static bool domain_update_iommu_snooping(struct intel_iommu *skip)
> -{
> - struct dmar_drhd_unit *drhd;
> - struct intel_iommu *iommu;
> - bool ret = true;
> -
> - rcu_read_lock();
> - for_each_active_iommu(iommu, drhd) {
> - if (iommu != skip) {
> -		/*
> -		 * If the hardware is operating in the scalable mode,
> -		 * the snooping control is always supported since we
> -		 * always set PASID-table-entry.PGSNP bit if the domain
> -		 * is managed outside (UNMANAGED).
> -		 */
> - if (!sm_supported(iommu) &&
> - !ecap_sc_support(iommu->ecap)) {
> - ret = false;
> - break;
> - }
> - }
> - }
> - rcu_read_unlock();
> -
> - return ret;
> -}
> -
>  static int domain_update_iommu_superpage(struct dmar_domain *domain,
>struct intel_iommu *skip)
>  {
> @@ -3593,12 +3566,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
>  			iommu->name);
>  		return -ENXIO;
>  	}
> - if (!ecap_sc_support(iommu->ecap) &&
> - domain_update_iommu_snooping(iommu)) {
> - pr_warn("%s: Doesn't support snooping.\n",
> - iommu->name);
> - return -ENXIO;
> - }
> +
Maybe I missed it in earlier patches; can this bit also be deleted?

struct dmar_domain {
	u8 iommu_snooping: 1;	/* indicate snooping control feature */

>   sp = domain_update_iommu_superpage(NULL, iommu) - 1;
>   if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
>   pr_warn("%s: Doesn't support large page.\n",


Thanks,

Jacob


Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-02 Thread Jacob Pan
Hi BaoLu,

On Sun, 1 May 2022 19:24:32 +0800, Lu Baolu wrote:

> As domain->force_snooping only impacts the devices attached with the
> domain, there's no need to check against all IOMMU units. At the same
> time, for a brand new domain (hasn't been attached to any device), the
> force_snooping field could be set, but the attach_dev callback will
> return failure if it wants to attach to a device whose IOMMU has no
> snoop control capability.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.h |  2 ++
>  drivers/iommu/intel/iommu.c | 50 -
>  drivers/iommu/intel/pasid.c | 18 +
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index ab4408c824a5..583ea67fc783 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  					 bool fault_ignore);
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
>  void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 98050943d863..3c1c228f9031 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4554,13 +4554,61 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
>  	return phys;
>  }
>  
> +static bool domain_support_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> + bool support = true;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	if (list_empty(&domain->devices))
> +		goto out;
> +
> +	list_for_each_entry(info, &domain->devices, link) {
> +		if (!ecap_sc_support(info->iommu->ecap)) {
> +			support = false;
> +			break;
> +		}
> +	}
Why not just check the dmar_domain->force_snooping flag? Devices wouldn't
be able to attach if !ecap_sc, right?

> +out:
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +	return support;
> +}
> +
> +static void domain_set_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> +
> + /*
> +  * Second level page table supports per-PTE snoop control. The
> +  * iommu_map() interface will handle this by setting SNP bit.
> +  */
> + if (!domain_use_first_level(domain))
> + return;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	if (list_empty(&domain->devices))
> +		goto out_unlock;
> +
> +	list_for_each_entry(info, &domain->devices, link)
> +		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
> +						     PASID_RID2PASID);
> +
I guess other DMA API PASIDs need to have the SC bit set as well. I will
keep this in mind for my DMA API PASID patch.

> +out_unlock:
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +}
> +
>  static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
>  {
>   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>  
> - if (!domain_update_iommu_snooping(NULL))
> + if (!domain_support_force_snooping(dmar_domain))
>   return false;
> +
> + domain_set_force_snooping(dmar_domain);
>   dmar_domain->force_snooping = true;
> +
nit: spurious change
>   return true;
>  }
>  
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index f8d215d85695..815c744e6a34 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>   return 0;
>  }
> +
> +/*
> + * Set the page snoop control for a pasid entry which has been set up.
> + */
> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid)
> +{
> + struct pasid_entry *pte;
> + u16 did;
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte || !pasid_pte_is_present(pte)))
> + return;
> +
> + pasid_set_pgsnp(pte);
> + did = pasid_get_domain_id(pte);
> + pasid_flush_caches(iommu, pte, pasid, did);
> +}


Thanks,

Jacob


Re: [PATCH 04/10] ARM/dma-mapping: remove the unused virt_to_dma helper

2022-05-02 Thread Arnd Bergmann
On Mon, May 2, 2022 at 6:43 PM Christoph Hellwig  wrote:
>
> virt_to_dma was only used by the now removed dmabounce code.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Arnd Bergmann 


Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-02 Thread Jason Gunthorpe via iommu
On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> On Fri, 29 Apr 2022 05:45:20 +0000
> "Tian, Kevin"  wrote:
> > > From: Joao Martins 
> > >  3) Unmapping an IOVA range while returning its dirty bit prior to
> > > unmap. This case is specific to the non-nested vIOMMU case, where an
> > > erroneous guest (or device) is DMAing to an address being unmapped at
> > > the same time.
> > 
> > an erroneous attempt like above cannot anticipate which DMAs can
> > succeed in that window thus the end behavior is undefined. For an
> > undefined behavior nothing will be broken by losing some bits dirtied
> > in the window between reading back dirty bits of the range and
> > actually calling unmap. From guest p.o.v. all those are black-box
> > hardware logic to serve a virtual iotlb invalidation request which just
> > cannot be completed in one cycle.
> > 
> > Hence in reality probably this is not required except to meet vfio
> > compat requirement. Just in concept returning dirty bits at unmap
> > is more accurate.
> > 
> > I'm slightly inclined to abandon it in iommufd uAPI.
> 
> Sorry, I'm not following why an unmap with returned dirty bitmap
> operation is specific to a vIOMMU case, or in fact indicative of some
> sort of erroneous, racy behavior of guest or device.

It is being compared against the alternative which is to explicitly
query dirty then do a normal unmap as two system calls and permit a
race.

The only case with any difference is if the guest is racing DMA with
the unmap - in which case it is already indeterminate for the guest if
the DMA will be completed or not. 

e.g. in the vIOMMU case, if the guest races DMA with unmap then we are
already fine with throwing away that DMA, because that is how the race
resolves during non-migration situations - so resolving it by throwing
away the DMA during migration is OK too.

> We need the flexibility to support memory hot-unplug operations
> during migration,

I would have thought that hot-unplug during migration would simply
discard all the data - how does it use the dirty bitmap?

> This was implemented as a single operation specifically to avoid
> races where ongoing access may be available after retrieving a
> snapshot of the bitmap.  Thanks,

The issue is the cost.

On a real iommu, eliminating the race is expensive as we have to
write-protect the pages before querying dirty, which seems to cost an
extra IOTLB flush.

It is not clear if paying this cost to become atomic is actually
something any use case needs.

So, I suggest we think about a 3rd op 'write protect and clear
dirties' that will be followed by a normal unmap - the extra op will
have the extra overhead and userspace can decide if it wants to pay it
or not vs the non-atomic read-dirties operation. And let's have a use
case where this must be atomic before we implement it.
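
Roughly, the userspace sequence would then be (a sketch; the write-protect
ioctl and struct names are invented here, only the IOAS unmap resembles
anything in this RFC):

    /* 1. Write-protect the range and read+clear its dirty bits -
     *    hypothetical IOMMU_DIRTY_WRPROTECT_READ op. */
    ioctl(iommufd, IOMMU_DIRTY_WRPROTECT_READ, &wrprot_cmd);

    /* 2. Fold the returned bitmap into the migration dirty log. */
    merge_dirty_bitmap(migration_log, wrprot_cmd.bitmap);

    /* 3. Plain unmap; no new dirties can appear after step 1. */
    ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_cmd);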

The downside is we lose a little bit of efficiency by unbundling
these steps; the upside is that it doesn't require quite as many
special iommu_domain/etc paths.

(Also Joao, you should probably have a read-but-do-not-clear dirty
operation, with the idea that the next operation will be unmap - then
maybe we can avoid the IOTLB flushing..)

Jason


Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-02 Thread Alex Williamson
On Fri, 29 Apr 2022 05:45:20 +0000
"Tian, Kevin"  wrote:
> > From: Joao Martins 
> >  3) Unmapping an IOVA range while returning its dirty bit prior to
> > unmap. This case is specific to the non-nested vIOMMU case, where an
> > erroneous guest (or device) is DMAing to an address being unmapped at
> > the same time.
> 
> an erroneous attempt like above cannot anticipate which DMAs can
> succeed in that window thus the end behavior is undefined. For an
> undefined behavior nothing will be broken by losing some bits dirtied
> in the window between reading back dirty bits of the range and
> actually calling unmap. From guest p.o.v. all those are black-box
> hardware logic to serve a virtual iotlb invalidation request which just
> cannot be completed in one cycle.
> 
> Hence in reality probably this is not required except to meet vfio
> compat requirement. Just in concept returning dirty bits at unmap
> is more accurate.
> 
> I'm slightly inclined to abandon it in iommufd uAPI.

Sorry, I'm not following why an unmap with returned dirty bitmap
operation is specific to a vIOMMU case, or in fact indicative of some
sort of erroneous, racy behavior of guest or device.  We need the
flexibility to support memory hot-unplug operations during migration,
but even in the vIOMMU case, isn't it fair for the VMM to ask whether a
device dirtied the range being unmapped?  This was implemented as a
single operation specifically to avoid races where ongoing access may be
available after retrieving a snapshot of the bitmap.  Thanks,

Alex



Re: [PATCH 1/4] dt-bindings: mmc: sdhci-msm: Document the SDX65 compatible

2022-05-02 Thread Rob Herring
On Mon, 02 May 2022 14:07:42 +0530, Rohit Agarwal wrote:
> The SDHCI controller on SDX65 is based on MSM SDHCI v5 IP. Hence,
> document the compatible with "qcom,sdhci-msm-v5" as the fallback.
> 
> Signed-off-by: Rohit Agarwal 
> ---
>  Documentation/devicetree/bindings/mmc/sdhci-msm.yaml | 1 +
>  1 file changed, 1 insertion(+)
> 

Acked-by: Rob Herring 


[PATCH 01/10] ARM: sa1100/assabet: move dmabounce hack to ohci driver

2022-05-02 Thread Christoph Hellwig
From: Arnd Bergmann 

The sa1111 platform is one of the two remaining users of the old Arm
specific "dmabounce" code, which is an earlier implementation of the
generic swiotlb.

Linus Walleij submitted a patch that removes dmabounce support from
the ixp4xx, and I had a look at the other user, which is the sa1111
companion chip.

Looking at how dmabounce is used, I could narrow it down to one driver
on three machines:

 - dmabounce is only initialized on assabet/neponset, jornada720 and
   badge4, which are the platforms that have an sa1111 and support
   DMA on it.

 - All three of these suffer from "erratum #7" that requires only
   doing DMA to half the memory sections based on one of the address
   lines, in addition, the neponset also can't DMA to the RAM that
   is connected to SA1111 itself.

 - the pxa lubbock machine also has sa1111, but does not support DMA
   on it and does not set dmabounce.

 - only the OHCI and audio devices on sa1111 support DMA, but as
   there is no audio driver for this hardware, only OHCI remains.

In the OHCI code, I noticed that two other platforms already have
a local bounce buffer support in the form of the "local_mem"
allocator. Specifically, TMIO and SM501 use this on a few other ARM
boards with 16KB or 128KB of local SRAM that can be accessed from the
OHCI and from the CPU.

While this is not the same problem as on sa1111, I could not find a
reason why we can't re-use the existing implementation but replace the
physical SRAM address mapping with a locally allocated DMA buffer.

There are two main downsides:

 - rather than using a dynamically sized pool, this buffer needs
   to be allocated at probe time using a fixed size. Without
   having any idea of what it should be, I picked a size of
   64KB, which is between what the other two OHCI front-ends use
   in their SRAM. If anyone has a better idea what that size
   is reasonable, this can be trivially changed.

 - Previously, only USB transfers to unaddressable memory needed
   to go through the bounce buffer, now all of them do, which may
   impact runtime performance for USB endpoints that do a lot of
   transfers.

On the upside, the local_mem support uses write-combining buffers,
which should be a bit faster for transfers to the device compared to
normal uncached coherent memory as used in dmabounce.
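
The resulting probe-time setup is small; roughly the following, reusing
the existing usb_hcd_setup_local_mem() helper (treat the exact arguments
and the allocate-on-zero behaviour as approximations of this series):

    /* Sketch: back the OHCI local-memory gen_pool with a 64KB coherent
     * buffer instead of device-local SRAM.  Passing 0 for phys/dma asks
     * the core to allocate the buffer itself. */
    ret = usb_hcd_setup_local_mem(hcd, 0, 0, SZ_64K);
    if (ret)
            goto err_unmap;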

Cc: Linus Walleij 
Cc: Russell King 
Cc: Christoph Hellwig 
Cc: Laurentiu Tudor 
Cc: linux-...@vger.kernel.org
Signed-off-by: Arnd Bergmann 
Reviewed-by: Greg Kroah-Hartman 
Acked-by: Alan Stern 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/Kconfig|  2 +-
 arch/arm/common/sa1111.c   | 64 --
 drivers/usb/core/hcd.c | 17 +++--
 drivers/usb/host/ohci-sa1111.c | 25 +
 4 files changed, 40 insertions(+), 68 deletions(-)

diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index c8e198631d418..bc158fd227e12 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 config SA1111
bool
-   select DMABOUNCE if !ARCH_PXA
+   select ZONE_DMA if ARCH_SA1100
 
 config DMABOUNCE
bool
diff --git a/arch/arm/common/sa1111.c b/arch/arm/common/sa1111.c
index 5367f03beb468..eaf2b527b0673 100644
--- a/arch/arm/common/sa1111.c
+++ b/arch/arm/common/sa1111.c
@@ -1386,70 +1386,9 @@ void sa1111_driver_unregister(struct sa1111_driver *driver)
 }
 EXPORT_SYMBOL(sa1111_driver_unregister);
 
-#ifdef CONFIG_DMABOUNCE
-/*
- * According to the "Intel StrongARM SA-1111 Microprocessor Companion
- * Chip Specification Update" (June 2000), erratum #7, there is a
- * significant bug in the SA1111 SDRAM shared memory controller.  If
- * an access to a region of memory above 1MB relative to the bank base,
- * it is important that address bit 10 _NOT_ be asserted. Depending
- * on the configuration of the RAM, bit 10 may correspond to one
- * of several different (processor-relative) address bits.
- *
- * This routine only identifies whether or not a given DMA address
- * is susceptible to the bug.
- *
- * This should only get called for sa1111_device types due to the
- * way we configure our device dma_masks.
- */
-static int sa1111_needs_bounce(struct device *dev, dma_addr_t addr, size_t size)
-{
-	/*
-	 * Section 4.6 of the "Intel StrongARM SA-1111 Development Module
-	 * User's Guide" mentions that jumpers R51 and R52 control the
-	 * target of SA-1111 DMA (either SDRAM bank 0 on Assabet, or
-	 * SDRAM bank 1 on Neponset). The default configuration selects
-	 * Assabet, so any address in bank 1 is necessarily invalid.
-	 */
-	return (machine_is_assabet() || machine_is_pfs168()) &&
-		(addr >= 0xc8000000 || (addr + size) >= 0xc8000000);
-}
-
-static int sa1111_notifier_call(struct notifier_block *n, unsigned long action,
-	void *data)
-{
-	struct sa1111_dev *dev = to_sa1111_device(data);
-   

[PATCH 07/10] ARM/dma-mapping: use dma-direct unconditionally

2022-05-02 Thread Christoph Hellwig
Use dma-direct unconditionally on arm.  It has already been used for
some time for LPAE and nommu configurations.

This mostly changes the streaming mapping implementation and the (simple)
coherent allocator for devices that are DMA coherent.  The existing
complex allocator for uncached mappings for non-coherent devices is still
used as is, via the arch_dma_alloc/arch_dma_free hooks.
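
For reference, the hooks in question are the generic ones declared in
include/linux/dma-map-ops.h, which arch/arm/mm/dma-mapping.c keeps
implementing for the non-coherent remapping allocator:

    void *arch_dma_alloc(struct device *dev, size_t size,
                    dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
    void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
                    dma_addr_t dma_handle, unsigned long attrs);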

Signed-off-by: Christoph Hellwig 
Reviewed-by: Arnd Bergmann 
Acked-by: Andre Przywara  [highbank]
---
 arch/arm/Kconfig   |   4 +-
 arch/arm/include/asm/dma-mapping.h |  24 --
 arch/arm/mach-highbank/highbank.c  |   2 +-
 arch/arm/mach-mvebu/coherency.c|   2 +-
 arch/arm/mm/dma-mapping.c  | 365 ++---
 5 files changed, 19 insertions(+), 378 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-mapping.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ac2e3e4957d66..3258630beee6c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -19,8 +19,8 @@ config ARM
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
-   select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB || !MMU
-   select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB || !MMU
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_TEARDOWN_DMA_OPS if MMU
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
deleted file mode 100644
index 6427b934bd11c..0
--- a/arch/arm/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASMARM_DMA_MAPPING_H
-#define ASMARM_DMA_MAPPING_H
-
-#ifdef __KERNEL__
-
-#include 
-#include 
-
-#include 
-#include 
-
-extern const struct dma_map_ops arm_dma_ops;
-extern const struct dma_map_ops arm_coherent_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-	if (IS_ENABLED(CONFIG_MMU) && !IS_ENABLED(CONFIG_ARM_LPAE))
-		return &arm_dma_ops;
-	return NULL;
-}
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/mach-highbank/highbank.c b/arch/arm/mach-highbank/highbank.c
index db607955a7e45..5d4f977ac7d2a 100644
--- a/arch/arm/mach-highbank/highbank.c
+++ b/arch/arm/mach-highbank/highbank.c
@@ -98,7 +98,7 @@ static int highbank_platform_notifier(struct notifier_block *nb,
if (of_property_read_bool(dev->of_node, "dma-coherent")) {
val = readl(sregs_base + reg);
writel(val | 0xff01, sregs_base + reg);
-	set_dma_ops(dev, &arm_coherent_dma_ops);
+   dev->dma_coherent = true;
}
 
return NOTIFY_OK;
diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 49e3c8d20c2fa..865ac4bc060df 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -98,7 +98,7 @@ static int mvebu_hwcc_notifier(struct notifier_block *nb,
 
if (event != BUS_NOTIFY_ADD_DEVICE)
return NOTIFY_DONE;
-	set_dma_ops(dev, &arm_coherent_dma_ops);
+   dev->dma_coherent = true;
 
return NOTIFY_OK;
 }
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e09d79a328fa1..0f76222cbcbb9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -103,79 +103,8 @@ static struct arm_dma_buffer *arm_dma_buffer_find(void *virt)
  * before transfers and delay cache invalidation until transfer completion.
  *
  */
-static void __dma_page_cpu_to_dev(struct page *, unsigned long,
-   size_t, enum dma_data_direction);
-static void __dma_page_dev_to_cpu(struct page *, unsigned long,
-   size_t, enum dma_data_direction);
-
-/**
- * arm_dma_map_page - map a portion of a page for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- */
-static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
-unsigned long offset, size_t size, enum dma_data_direction dir,
-unsigned long attrs)
-{
-   if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-   __dma_page_cpu_to_dev(page, offset, size, dir);
-   return phys_to_dma(dev, page_to_phys(page) + offset);
-}
-
-static dma_addr_t arm_coherent_dma_map_page(struct device *dev, struct page *page,
-unsigned long offset, size_t size, enum dma_data_direction dir,
-unsigned long attrs)

[PATCH 03/10] ARM/dma-mapping: mark various dma-mapping routines static in dma-mapping.c

2022-05-02 Thread Christoph Hellwig
With the dmabounce removal these aren't used outside of dma-mapping.c,
so mark them static.  Move the dma_map_ops declarations down a bit
to avoid lots of forward declarations.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Arnd Bergmann 
---
 arch/arm/include/asm/dma-mapping.h |  75 --
 arch/arm/mm/dma-mapping.c  | 100 +
 2 files changed, 46 insertions(+), 129 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 1e015a7ad86aa..6427b934bd11c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -20,80 +20,5 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
-/**
- * arm_dma_alloc - allocate consistent memory for DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: required memory size
- * @handle: bus-specific DMA address
- * @attrs: optinal attributes that specific mapping properties
- *
- * Allocate some memory for a device for performing DMA.  This function
- * allocates pages, and will return the CPU-viewed address, and sets @handle
- * to be the device-viewed address.
- */
-extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
-  gfp_t gfp, unsigned long attrs);
-
-/**
- * arm_dma_free - free memory allocated by arm_dma_alloc
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: size of memory originally requested in dma_alloc_coherent
- * @cpu_addr: CPU-view address returned from dma_alloc_coherent
- * @handle: device-view address returned from dma_alloc_coherent
- * @attrs: optinal attributes that specific mapping properties
- *
- * Free (and unmap) a DMA buffer previously allocated by
- * arm_dma_alloc().
- *
- * References to memory and mappings associated with cpu_addr/handle
- * during and after this call executing are illegal.
- */
-extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
-dma_addr_t handle, unsigned long attrs);
-
-/**
- * arm_dma_mmap - map a coherent DMA allocation into user space
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @vma: vm_area_struct describing requested user mapping
- * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent
- * @handle: device-view address returned from dma_alloc_coherent
- * @size: size of memory originally requested in dma_alloc_coherent
- * @attrs: optinal attributes that specific mapping properties
- *
- * Map a coherent DMA buffer previously allocated by dma_alloc_coherent
- * into user space.  The coherent DMA buffer must not be freed by the
- * driver until the user space mapping has been released.
- */
-extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs);
-
-/*
- * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
- * and utilize bounce buffers as needed to work around limited DMA windows.
- *
- * On the SA-1111, a bug limits DMA to only certain regions of RAM.
- * On the IXP425, the PCI inbound window is 64MB (256MB total RAM)
- * On some ADI engineering systems, PCI inbound window is 32MB (12MB total RAM)
- *
- * The following are helper functions used by the dmabounce subystem
- *
- */
-
-/*
- * The scatter list versions of the above methods.
- */
-extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
-   enum dma_data_direction, unsigned long attrs);
-extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
-   enum dma_data_direction, unsigned long attrs);
-extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
-   enum dma_data_direction);
-extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
-   enum dma_data_direction);
-extern int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs);
-
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82ffac621854f..0ee5adbdd3f1d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -193,50 +193,6 @@ static int arm_dma_supported(struct device *dev, u64 mask)
return dma_to_pfn(dev, mask) >= max_dma_pfn;
 }
 
-const struct dma_map_ops arm_dma_ops = {
-   .alloc  = arm_dma_alloc,
-   .free   = arm_dma_free,
-   .alloc_pages= dma_direct_alloc_pages,
-   .free_pages = dma_direct_free_pages,
-   .mmap   = arm_dma_mmap,
-   .get_sgtable= arm_dma_get_sgtable,
-   .map_page   = 

[PATCH 06/10] ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default

2022-05-02 Thread Christoph Hellwig
Only the footbridge platforms provide their own DMA address translation
helpers, so switch to the generic version for all other platforms, and
consolidate the footbridge implementation to remove two levels of
indirection.
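
The generic helpers being switched to live in include/linux/dma-direct.h
and, leaving aside the memory-encryption handling, boil down to:

    /* Paraphrased: apply the device's dma_range_map offset if one
     * exists, otherwise the mapping is 1:1. */
    static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
    {
            if (dev->dma_range_map)
                    return translate_phys_to_dma(dev, paddr);
            return paddr;
    }

which is why footbridge only needs to provide its own out-of-line
phys_to_dma()/dma_to_phys() via ARCH_HAS_PHYS_TO_DMA.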

Signed-off-by: Christoph Hellwig 
Reviewed-by: Arnd Bergmann 
---
 arch/arm/Kconfig  |  1 -
 arch/arm/include/asm/dma-direct.h | 41 +--
 arch/arm/include/asm/memory.h |  2 -
 arch/arm/mach-footbridge/Kconfig  |  1 +
 arch/arm/mach-footbridge/common.c | 19 +
 .../mach-footbridge/include/mach/dma-direct.h |  8 
 .../arm/mach-footbridge/include/mach/memory.h |  4 --
 7 files changed, 21 insertions(+), 55 deletions(-)
 create mode 100644 arch/arm/mach-footbridge/include/mach/dma-direct.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2e8091e2d8a86..ac2e3e4957d66 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,7 +15,6 @@ config ARM
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
-   select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index 6fd52713b5d12..4f7bcde03abb5 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -1,40 +1 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASM_ARM_DMA_DIRECT_H
-#define ASM_ARM_DMA_DIRECT_H 1
-
-#include 
-
-/*
- * dma_to_pfn/pfn_to_dma are architecture private
- * functions used internally by the DMA-mapping API to provide DMA
- * addresses. They must not be used by drivers.
- */
-static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
-{
-   if (dev && dev->dma_range_map)
-   pfn = PFN_DOWN(translate_phys_to_dma(dev, PFN_PHYS(pfn)));
-   return (dma_addr_t)__pfn_to_bus(pfn);
-}
-
-static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
-{
-   unsigned long pfn = __bus_to_pfn(addr);
-
-   if (dev && dev->dma_range_map)
-   pfn = PFN_DOWN(translate_dma_to_phys(dev, PFN_PHYS(pfn)));
-   return pfn;
-}
-
-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
-{
-   unsigned int offset = paddr & ~PAGE_MASK;
-   return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
-}
-
-static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
-{
-   unsigned int offset = dev_addr & ~PAGE_MASK;
-   return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
-}
-
-#endif /* ASM_ARM_DMA_DIRECT_H */
+#include 
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index f673e13e0f942..a55a9038abc89 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -378,8 +378,6 @@ static inline unsigned long __virt_to_idmap(unsigned long x)
 #ifndef __virt_to_bus
 #define __virt_to_bus  __virt_to_phys
 #define __bus_to_virt  __phys_to_virt
-#define __pfn_to_bus(x)__pfn_to_phys(x)
-#define __bus_to_pfn(x)__phys_to_pfn(x)
 #endif
 
 /*
diff --git a/arch/arm/mach-footbridge/Kconfig b/arch/arm/mach-footbridge/Kconfig
index 728aff93fba9d..b5bbdcf2e4896 100644
--- a/arch/arm/mach-footbridge/Kconfig
+++ b/arch/arm/mach-footbridge/Kconfig
@@ -60,6 +60,7 @@ endmenu
 
 # Footbridge support
 config FOOTBRIDGE
+   select ARCH_HAS_PHYS_TO_DMA
bool
 
 # Footbridge in host mode
diff --git a/arch/arm/mach-footbridge/common.c b/arch/arm/mach-footbridge/common.c
index 322495df271d5..5020eb96b025d 100644
--- a/arch/arm/mach-footbridge/common.c
+++ b/arch/arm/mach-footbridge/common.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -335,17 +336,19 @@ unsigned long __bus_to_virt(unsigned long res)
return res;
 }
 EXPORT_SYMBOL(__bus_to_virt);
-
-unsigned long __pfn_to_bus(unsigned long pfn)
+#else
+static inline unsigned long fb_bus_sdram_offset(void)
 {
-   return __pfn_to_phys(pfn) + (fb_bus_sdram_offset() - PHYS_OFFSET);
+   return BUS_OFFSET;
 }
-EXPORT_SYMBOL(__pfn_to_bus);
+#endif /* CONFIG_FOOTBRIDGE_ADDIN */
 
-unsigned long __bus_to_pfn(unsigned long bus)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-   return __phys_to_pfn(bus - (fb_bus_sdram_offset() - PHYS_OFFSET));
+   return paddr + (fb_bus_sdram_offset() - PHYS_OFFSET);
 }
-EXPORT_SYMBOL(__bus_to_pfn);
 
-#endif
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+   return dev_addr - (fb_bus_sdram_offset() - PHYS_OFFSET);
+}
diff --git a/arch/arm/mach-footbridge/include/mach/dma-direct.h b/arch/arm/mach-footbridge/include/mach/dma-direct.h
new file mode 100644
index 0..01f9e8367c009
--- /dev/null
+++ b/arch/arm/mach-footbridge/include/mach/dma-direct.h
@@ 

[PATCH 05/10] ARM/dma-mapping: use dma_to_phys/phys_to_dma in the dma-mapping code

2022-05-02 Thread Christoph Hellwig
Use the helpers as expected by the dma-direct code in the old arm
dma-mapping code to ease a gradual switch to the common DMA code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Arnd Bergmann 
---
 arch/arm/mm/dma-mapping.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0ee5adbdd3f1d..e09d79a328fa1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -128,14 +128,14 @@ static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
 {
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_cpu_to_dev(page, offset, size, dir);
-   return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+   return phys_to_dma(dev, page_to_phys(page) + offset);
 }
 
 static dma_addr_t arm_coherent_dma_map_page(struct device *dev, struct page *page,
 unsigned long offset, size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
-   return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+   return phys_to_dma(dev, page_to_phys(page) + offset);
 }
 
 /**
@@ -156,7 +156,7 @@ static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-   __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+   __dma_page_dev_to_cpu(phys_to_page(dma_to_phys(dev, handle)),
  handle & ~PAGE_MASK, size, dir);
 }
 
@@ -164,7 +164,7 @@ static void arm_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
unsigned int offset = handle & (PAGE_SIZE - 1);
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle-offset));
__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -172,7 +172,7 @@ static void arm_dma_sync_single_for_device(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
unsigned int offset = handle & (PAGE_SIZE - 1);
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle-offset));
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -190,7 +190,7 @@ static int arm_dma_supported(struct device *dev, u64 mask)
 * Translate the device's DMA mask to a PFN limit.  This
 * PFN number includes the page which we can DMA to.
 */
-   return dma_to_pfn(dev, mask) >= max_dma_pfn;
+   return PHYS_PFN(dma_to_phys(dev, mask)) >= max_dma_pfn;
 }
 
 static void __dma_clear_buffer(struct page *page, size_t size, int coherent_flag)
@@ -681,7 +681,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
if (page) {
unsigned long flags;
 
-   *handle = pfn_to_dma(dev, page_to_pfn(page));
+   *handle = phys_to_dma(dev, page_to_phys(page));
buf->virt = args.want_vaddr ? addr : page;
 
spin_lock_irqsave(_dma_bufs_lock, flags);
@@ -721,7 +721,7 @@ static int __arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
int ret = -ENXIO;
unsigned long nr_vma_pages = vma_pages(vma);
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long pfn = dma_to_pfn(dev, dma_addr);
+   unsigned long pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
unsigned long off = vma->vm_pgoff;
 
	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
@@ -762,7 +762,7 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
   dma_addr_t handle, unsigned long attrs,
   bool is_coherent)
 {
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle));
struct arm_dma_buffer *buf;
struct arm_dma_free_args args = {
.dev = dev,
@@ -796,15 +796,15 @@ static int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 void *cpu_addr, dma_addr_t handle, size_t size,
 unsigned long attrs)
 {
-   unsigned long pfn = dma_to_pfn(dev, handle);
+   phys_addr_t paddr = dma_to_phys(dev, handle);
struct page *page;
int ret;
 
/* If the PFN is not valid, we do not have a struct page */
-   if (!pfn_valid(pfn))
+   if (!pfn_valid(PHYS_PFN(paddr)))
return -ENXIO;
 
-   page = pfn_to_page(pfn);
+   page = phys_to_page(paddr);
 
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
if (unlikely(ret))
-- 
2.30.2


[PATCH 09/10] ARM/dma-mapping: consolidate IOMMU ops callbacks

2022-05-02 Thread Christoph Hellwig
From: Robin Murphy 

Merge the coherent and non-coherent callbacks down to a single
implementation each, relying on the generic dev->dma_coherent
flag at the points where the difference matters.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 238 +-
 1 file changed, 55 insertions(+), 183 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b0095b84a581..10e5e5800d784 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1079,13 +1079,13 @@ static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
__free_from_pool(cpu_addr, size);
 }
 
-static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs,
-   int coherent_flag)
+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
struct page **pages;
void *addr = NULL;
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
 
*handle = DMA_MAPPING_ERROR;
size = PAGE_ALIGN(size);
@@ -1128,19 +1128,7 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
return NULL;
 }
 
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, NORMAL);
-}
-
-static void *arm_coherent_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, COHERENT);
-}
-
-static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
@@ -1154,35 +1142,24 @@ static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma
if (vma->vm_pgoff >= nr_pages)
return -ENXIO;
 
+   if (!dev->dma_coherent)
+   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+
err = vm_map_pages(vma, pages, nr_pages);
if (err)
pr_err("Remapping memory failed: %d\n", err);
 
return err;
 }
-static int arm_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-	return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-static int arm_coherent_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-	return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
 
 /*
  * free a page as defined by the above mapping.
  * Must not be called with IRQs disabled.
  */
-static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs, int coherent_flag)
+static void arm_iommu_free_attrs(struct device *dev, size_t size, void 
*cpu_addr,
+   dma_addr_t handle, unsigned long attrs)
 {
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
struct page **pages;
size = PAGE_ALIGN(size);
 
@@ -1204,19 +1181,6 @@ static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_ad
__iommu_free_buffer(dev, pages, size, attrs);
 }
 
-static void arm_iommu_free_attrs(struct device *dev, size_t size,
-void *cpu_addr, dma_addr_t handle,
-unsigned long attrs)
-{
-   __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, NORMAL);
-}
-
-static void arm_coherent_iommu_free_attrs(struct device *dev, size_t size,
-   void *cpu_addr, dma_addr_t handle, unsigned long attrs)
-{
-   __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, COHERENT);
-}
-
 static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
 void *cpu_addr, dma_addr_t dma_addr,
 size_t size, unsigned long attrs)
@@ -1236,8 +1200,7 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
  */
 static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
  size_t size, dma_addr_t *handle,
- enum dma_data_direction dir, unsigned long attrs,
- bool is_coherent)
+ enum 

[PATCH 02/10] ARM/dma-mapping: remove dmabounce

2022-05-02 Thread Christoph Hellwig
Remove the now unused dmabounce code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Arnd Bergmann 
---
 arch/arm/common/Kconfig|   4 -
 arch/arm/common/Makefile   |   1 -
 arch/arm/common/dmabounce.c| 582 -
 arch/arm/include/asm/device.h  |   3 -
 arch/arm/include/asm/dma-mapping.h |  29 --
 5 files changed, 619 deletions(-)
 delete mode 100644 arch/arm/common/dmabounce.c

diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index bc158fd227e12..d2fdb1796f488 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -3,10 +3,6 @@ config SA1111
bool
select ZONE_DMA if ARCH_SA1100
 
-config DMABOUNCE
-   bool
-   select ZONE_DMA
-
 config KRAIT_L2_ACCESSORS
bool
 
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index 8cd574be94cfe..7bae8cbaafe78 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -6,7 +6,6 @@
 obj-y  += firmware.o
 
 obj-$(CONFIG_SA1111)			+= sa1111.o
-obj-$(CONFIG_DMABOUNCE)+= dmabounce.o
 obj-$(CONFIG_KRAIT_L2_ACCESSORS) += krait-l2-accessors.o
 obj-$(CONFIG_SHARP_LOCOMO) += locomo.o
 obj-$(CONFIG_SHARP_PARAM)  += sharpsl_param.o
diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
deleted file mode 100644
index 7996c04393d50..0
--- a/arch/arm/common/dmabounce.c
+++ /dev/null
@@ -1,582 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- *  arch/arm/common/dmabounce.c
- *
- *  Special dma_{map/unmap/dma_sync}_* routines for systems that have
- *  limited DMA windows. These functions utilize bounce buffers to
- *  copy data to/from buffers located outside the DMA region. This
- *  only works for systems in which DMA memory is at the bottom of
- *  RAM, the remainder of memory is at the top and the DMA memory
- *  can be marked as ZONE_DMA. Anything beyond that such as discontiguous
- *  DMA windows will require custom implementations that reserve memory
- *  areas at early bootup.
- *
- *  Original version by Brad Parker (b...@heeltoe.com)
- *  Re-written by Christopher Hoover 
- *  Made generic by Deepak Saxena 
- *
- *  Copyright (C) 2002 Hewlett Packard Company.
- *  Copyright (C) 2004 MontaVista Software, Inc.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-#undef STATS
-
-#ifdef STATS
-#define DO_STATS(X) do { X ; } while (0)
-#else
-#define DO_STATS(X) do { } while (0)
-#endif
-
-/* ** */
-
-struct safe_buffer {
-   struct list_head node;
-
-   /* original request */
-   void*ptr;
-   size_t  size;
-   int direction;
-
-   /* safe buffer info */
-   struct dmabounce_pool *pool;
-   void*safe;
-   dma_addr_t  safe_dma_addr;
-};
-
-struct dmabounce_pool {
-   unsigned long   size;
-   struct dma_pool *pool;
-#ifdef STATS
-   unsigned long   allocs;
-#endif
-};
-
-struct dmabounce_device_info {
-   struct device *dev;
-   struct list_head safe_buffers;
-#ifdef STATS
-   unsigned long total_allocs;
-   unsigned long map_op_count;
-   unsigned long bounce_count;
-   int attr_res;
-#endif
-   struct dmabounce_pool   small;
-   struct dmabounce_pool   large;
-
-   rwlock_t lock;
-
-   int (*needs_bounce)(struct device *, dma_addr_t, size_t);
-};
-
-#ifdef STATS
-static ssize_t dmabounce_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
-   struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
-   return sprintf(buf, "%lu %lu %lu %lu %lu %lu\n",
-   device_info->small.allocs,
-   device_info->large.allocs,
-   device_info->total_allocs - device_info->small.allocs -
-   device_info->large.allocs,
-   device_info->total_allocs,
-   device_info->map_op_count,
-   device_info->bounce_count);
-}
-
-static DEVICE_ATTR(dmabounce_stats, 0400, dmabounce_show, NULL);
-#endif
-
-
-/* allocate a 'safe' buffer and keep track of it */
-static inline struct safe_buffer *
-alloc_safe_buffer(struct dmabounce_device_info *device_info, void *ptr,
- size_t size, enum dma_data_direction dir)
-{
-   struct safe_buffer *buf;
-   struct dmabounce_pool *pool;
-   struct device *dev = device_info->dev;
-   unsigned long flags;
-
-   dev_dbg(dev, "%s(ptr=%p, size=%d, dir=%d)\n",
-   __func__, ptr, size, dir);
-
-	if (size <= device_info->small.size) {
-		pool = &device_info->small;
-	} else if (size <= device_info->large.size) {
-		pool = &device_info->large;
-	} else {
-		pool = NULL;
-	}
-
-   buf = kmalloc(sizeof(struct safe_buffer), 

[PATCH 04/10] ARM/dma-mapping: remove the unused virt_to_dma helper

2022-05-02 Thread Christoph Hellwig
virt_to_dma was only used by the now removed dmabounce code.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-direct.h | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index 77fcb7ee5ec90..6fd52713b5d12 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 
 /*
- * dma_to_pfn/pfn_to_dma/virt_to_dma are architecture private
+ * dma_to_pfn/pfn_to_dma are architecture private
  * functions used internally by the DMA-mapping API to provide DMA
  * addresses. They must not be used by drivers.
  */
@@ -25,14 +25,6 @@ static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
return pfn;
 }
 
-static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
-{
-   if (dev)
-   return pfn_to_dma(dev, virt_to_pfn(addr));
-
-   return (dma_addr_t)__virt_to_bus((unsigned long)(addr));
-}
-
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
unsigned int offset = paddr & ~PAGE_MASK;
-- 
2.30.2



fully convert arm to use dma-direct v2

2022-05-02 Thread Christoph Hellwig
Hi all,

arm is the last platform not using the dma-direct code for directly
mapped DMA.  With the dmabounce removal from Arnd we can easily switch
arm to always use dma-direct now (it already does for LPAE configs
and nommu).  I'd love to merge this series through the dma-mapping tree
as it gives us the opportunity for additional core dma-mapping
improvements.

Changes since v1:
 - remove another unused function
 - improve a few commit logs
 - add three more patches from Robin

Diffstat:
 arch/arm/common/dmabounce.c  |  582 -
 arch/arm/include/asm/dma-mapping.h   |  128 ---
 b/arch/arm/Kconfig   |5 
 b/arch/arm/common/Kconfig|6 
 b/arch/arm/common/Makefile   |1 
 b/arch/arm/common/sa1111.c   |   64 -
 b/arch/arm/include/asm/device.h  |3 
 b/arch/arm/include/asm/dma-direct.h  |   49 -
 b/arch/arm/include/asm/memory.h  |2 
 b/arch/arm/mach-footbridge/Kconfig   |1 
 b/arch/arm/mach-footbridge/common.c  |   19 
 b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8 
 b/arch/arm/mach-footbridge/include/mach/memory.h |4 
 b/arch/arm/mach-highbank/highbank.c  |2 
 b/arch/arm/mach-mvebu/coherency.c|2 
 b/arch/arm/mm/dma-mapping.c  |  649 ++-
 b/drivers/usb/core/hcd.c |   17 
 b/drivers/usb/host/ohci-sa1111.c |   25 
 18 files changed, 137 insertions(+), 1430 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 08/10] ARM/dma-mapping: drop .dma_supported for IOMMU ops

2022-05-02 Thread Christoph Hellwig
From: Robin Murphy 

When an IOMMU is present, we trust that it should be capable
of remapping any physical memory, and since the device masks
represent the input (virtual) addresses to the IOMMU it makes
no sense to validate them against physical PFNs anyway.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0f76222cbcbb9..6b0095b84a581 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -104,25 +104,6 @@ static struct arm_dma_buffer *arm_dma_buffer_find(void 
*virt)
  *
  */
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-/*
- * Return whether the given device DMA address mask can be supported
- * properly.  For example, if your device can only drive the low 24-bits
- * during bus mastering, then you would pass 0x00ffffff as the mask
- * to this function.
- */
-static int arm_dma_supported(struct device *dev, u64 mask)
-{
-   unsigned long max_dma_pfn = min(max_pfn - 1, arm_dma_pfn_limit);
-
-   /*
-* Translate the device's DMA mask to a PFN limit.  This
-* PFN number includes the page which we can DMA to.
-*/
-   return PHYS_PFN(dma_to_phys(dev, mask)) >= max_dma_pfn;
-}
-#endif
-
 static void __dma_clear_buffer(struct page *page, size_t size, int 
coherent_flag)
 {
/*
@@ -1681,8 +1662,6 @@ static const struct dma_map_ops iommu_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 static const struct dma_map_ops iommu_coherent_ops = {
@@ -1699,8 +1678,6 @@ static const struct dma_map_ops iommu_coherent_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 /**
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 10/10] ARM/dma-mapping: merge IOMMU ops

2022-05-02 Thread Christoph Hellwig
From: Robin Murphy 

The dma_sync_* operations are now the only difference between the
coherent and non-coherent IOMMU ops. Some minor tweaks to make those
safe for coherent devices with minimal overhead, and we can condense
down to a single set of DMA ops.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 37 +
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 10e5e5800d784..dd46cce61579a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1341,6 +1341,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
@@ -1360,6 +1363,9 @@ static void arm_iommu_sync_sg_for_device(struct device 
*dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
@@ -1493,12 +1499,13 @@ static void arm_iommu_sync_single_for_cpu(struct device 
*dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, 
iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -1507,12 +1514,13 @@ static void arm_iommu_sync_single_for_device(struct 
device *dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, 
iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -1536,22 +1544,6 @@ static const struct dma_map_ops iommu_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 };
 
-static const struct dma_map_ops iommu_coherent_ops = {
-   .alloc  = arm_iommu_alloc_attrs,
-   .free   = arm_iommu_free_attrs,
-   .mmap   = arm_iommu_mmap_attrs,
-   .get_sgtable= arm_iommu_get_sgtable,
-
-   .map_page   = arm_iommu_map_page,
-   .unmap_page = arm_iommu_unmap_page,
-
-   .map_sg = arm_iommu_map_sg,
-   .unmap_sg   = arm_iommu_unmap_sg,
-
-   .map_resource   = arm_iommu_map_resource,
-   .unmap_resource = arm_iommu_unmap_resource,
-};
-
 /**
  * arm_iommu_create_mapping
  * @bus: pointer to the bus holding the client device (for IOMMU calls)
@@ -1750,10 +1742,7 @@ static void arm_setup_iommu_dma_ops(struct device *dev, 
u64 dma_base, u64 size,
return;
}
 
-   if (coherent)
-   set_dma_ops(dev, &iommu_coherent_ops);
-   else
-   set_dma_ops(dev, &iommu_ops);
+   set_dma_ops(dev, &iommu_ops);
 }
 
 static void arm_teardown_iommu_dma_ops(struct device *dev)
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

2022-05-02 Thread Jason Gunthorpe via iommu
On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
> On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> > Hi Joerg,
> > 
> > This is a resend version of v8 posted here:
> > https://lore.kernel.org/linux-iommu/20220308054421.847385-1-baolu...@linux.intel.com/
> > as we discussed in this thread:
> > https://lore.kernel.org/linux-iommu/yk%2fq1bgn8pc5h...@8bytes.org/
> > 
> > All patches can be applied perfectly except this one:
> >  - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
> > It conflicts with below refactoring commit:
> >  - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
> > The conflict has been fixed in this post.
> > 
> > No functional changes in this series. I suppress cc-ing this series to
> > all v8 reviewers in order to avoid spam.
> > 
> > Please consider it for your iommu tree.
> 
> Reverting this series fixed a use-after-free while doing SR-IOV.
> 
>  BUG: KASAN: use-after-free in __lock_acquire
>  Read of size 8 at addr ffff080279825d78 by task qemu-system-aar/22429
>  CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 
> 5.18.0-rc5-next-20220502 #69
>  Call trace:
>   dump_backtrace
>   show_stack
>   dump_stack_lvl
>   print_address_description.constprop.0
>   print_report
>   kasan_report
>   __asan_report_load8_noabort
>   __lock_acquire
>   lock_acquire.part.0
>   lock_acquire
>   _raw_spin_lock_irqsave
>   arm_smmu_detach_dev
>   arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
>   arm_smmu_attach_dev

Hum.

So what has happened is that VFIO does this sequence:

 iommu_detach_group()
 iommu_domain_free()
 iommu_group_release_dma_owner()

Which, I think should be valid, API wise.

From what I can see reading the code SMMUv3 blows up above because it
doesn't have a detach_dev op:

.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = arm_smmu_attach_dev,
.map_pages  = arm_smmu_map_pages,
.unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.enable_nesting = arm_smmu_enable_nesting,
.free   = arm_smmu_domain_free,
}

But it is internally tracking the domain inside the master - so when
the next domain is attached it does this:

static void arm_smmu_detach_dev(struct arm_smmu_master *master)
{
struct arm_smmu_domain *smmu_domain = master->domain;

spin_lock_irqsave(&smmu_domain->devices_lock, flags);

And explodes as the domain has been freed but master->domain was not
NULL'd.

It worked before because iommu_detach_group() used to attach the
default group and that was before the domain was freed in the above
sequence.

I'm guessing SMMUv3 needs to call its arm_smmu_detach_dev(master) from
the detach_dev op and null its cached copy of the domain, but I don't
know this driver.. Robin?
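
Something along these lines, perhaps (completely untested, just to
sketch the shape):

	static void arm_smmu_detach_dev_op(struct iommu_domain *domain,
					   struct device *dev)
	{
		struct arm_smmu_master *master = dev_iommu_priv_get(dev);

		/* existing helper; it must also leave master->domain
		 * NULL'd so a subsequent domain free can't be chased */
		arm_smmu_detach_dev(master);
	}

plus a '.detach_dev = arm_smmu_detach_dev_op,' entry in the
default_domain_ops above.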

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

2022-05-02 Thread Qian Cai
On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> This is a resend version of v8 posted here:
> https://lore.kernel.org/linux-iommu/20220308054421.847385-1-baolu...@linux.intel.com/
> as we discussed in this thread:
> https://lore.kernel.org/linux-iommu/yk%2fq1bgn8pc5h...@8bytes.org/
> 
> All patches can be applied perfectly except this one:
>  - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
> It conflicts with below refactoring commit:
>  - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
> The conflict has been fixed in this post.
> 
> No functional changes in this series. I suppress cc-ing this series to
> all v8 reviewers in order to avoid spam.
> 
> Please consider it for your iommu tree.

Reverting this series fixed a use-after-free while doing SR-IOV.

 BUG: KASAN: use-after-free in __lock_acquire
 Read of size 8 at addr ffff080279825d78 by task qemu-system-aar/22429
 CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 5.18.0-rc5-next-20220502 
#69
 Call trace:
  dump_backtrace
  show_stack
  dump_stack_lvl
  print_address_description.constprop.0
  print_report
  kasan_report
  __asan_report_load8_noabort
  __lock_acquire
  lock_acquire.part.0
  lock_acquire
  _raw_spin_lock_irqsave
  arm_smmu_detach_dev
  arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
  arm_smmu_attach_dev
  __iommu_attach_group
  __iommu_attach_device at drivers/iommu/iommu.c:1942
  (inlined by) iommu_group_do_attach_device at drivers/iommu/iommu.c:2058
  (inlined by) __iommu_group_for_each_dev at drivers/iommu/iommu.c:989
  (inlined by) __iommu_attach_group at drivers/iommu/iommu.c:2069
  iommu_group_release_dma_owner
  __vfio_group_unset_container
  vfio_group_try_dissolve_container
  vfio_group_put_external_user
  kvm_vfio_destroy
  kvm_destroy_vm
  kvm_vm_release
  __fput
  fput
  task_work_run
  do_exit
  do_group_exit
  get_signal
  do_signal
  do_notify_resume
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Allocated by task 22427:
  kasan_save_stack
  __kasan_kmalloc
  kmem_cache_alloc_trace
  arm_smmu_domain_alloc
  iommu_domain_alloc
  vfio_iommu_type1_attach_group
  vfio_ioctl_set_iommu
  vfio_fops_unl_ioctl
  __arm64_sys_ioctl
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Freed by task 22429:
  kasan_save_stack
  kasan_set_track
  kasan_set_free_info
  kasan_slab_free
  __kasan_slab_free
  slab_free_freelist_hook
  kfree
  arm_smmu_domain_free
  arm_smmu_domain_free at iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2067
  iommu_domain_free
  vfio_iommu_type1_detach_group
  __vfio_group_unset_container
  vfio_group_try_dissolve_container
  vfio_group_put_external_user
  kvm_vfio_destroy
  kvm_destroy_vm
  kvm_vm_release
  __fput
  fput
  task_work_run
  do_exit
  do_group_exit
  get_signal
  do_signal
  do_notify_resume
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: dart: Add missing module owner to ops structure

2022-05-02 Thread Sven Peter
On Mon, May 2, 2022, at 11:22, Hector Martin wrote:
> This is required to make loading this as a module work.
>
> Signed-off-by: Hector Martin 

this could probably use a 
Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver")

but otherwise
Reviewed-by: Sven Peter 


Thanks,

Sven
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/7] dt-bindings: iommu: renesas,ipmmu-vmsa: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
index 8854569ca3a6c949..786f85a97077d79c 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
@@ -43,10 +43,10 @@ properties:
   - renesas,ipmmu-r8a77980 # R-Car V3H
   - renesas,ipmmu-r8a77990 # R-Car E3
   - renesas,ipmmu-r8a77995 # R-Car D3
-  - renesas,ipmmu-r8a779a0 # R-Car V3U
   - items:
   - enum:
-  - renesas,ipmmu-r8a779f0 # R-Car S4-8
+  - renesas,ipmmu-r8a779a0   # R-Car V3U
+  - renesas,ipmmu-r8a779f0   # R-Car S4-8
   - const: renesas,rcar-gen4-ipmmu-vmsa  # R-Car Gen4
 
   reg:
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/7] dt-bindings: serial: renesas,scif: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/serial/renesas,scif.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/serial/renesas,scif.yaml 
b/Documentation/devicetree/bindings/serial/renesas,scif.yaml
index 5d37f8f189fbe1d1..90fe45265fbc64cc 100644
--- a/Documentation/devicetree/bindings/serial/renesas,scif.yaml
+++ b/Documentation/devicetree/bindings/serial/renesas,scif.yaml
@@ -60,12 +60,12 @@ properties:
   - renesas,scif-r8a77980 # R-Car V3H
   - renesas,scif-r8a77990 # R-Car E3
   - renesas,scif-r8a77995 # R-Car D3
-  - renesas,scif-r8a779a0 # R-Car V3U
   - const: renesas,rcar-gen3-scif # R-Car Gen3 and RZ/G2
   - const: renesas,scif   # generic SCIF compatible UART
 
   - items:
   - enum:
+  - renesas,scif-r8a779a0 # R-Car V3U
   - renesas,scif-r8a779f0 # R-Car S4-8
   - const: renesas,rcar-gen4-scif # R-Car Gen4
   - const: renesas,scif   # generic SCIF compatible UART
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/7] dt-bindings: serial: renesas,hscif: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/serial/renesas,hscif.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/serial/renesas,hscif.yaml 
b/Documentation/devicetree/bindings/serial/renesas,hscif.yaml
index d688a07a0e2970cd..87180d95cd4c5d14 100644
--- a/Documentation/devicetree/bindings/serial/renesas,hscif.yaml
+++ b/Documentation/devicetree/bindings/serial/renesas,hscif.yaml
@@ -51,12 +51,12 @@ properties:
   - renesas,hscif-r8a77980 # R-Car V3H
   - renesas,hscif-r8a77990 # R-Car E3
   - renesas,hscif-r8a77995 # R-Car D3
-  - renesas,hscif-r8a779a0 # R-Car V3U
   - const: renesas,rcar-gen3-hscif # R-Car Gen3 and RZ/G2
   - const: renesas,hscif   # generic HSCIF compatible UART
 
   - items:
   - enum:
+  - renesas,hscif-r8a779a0 # R-Car V3U
   - renesas,hscif-r8a779g0 # R-Car V4H
   - const: renesas,rcar-gen4-hscif # R-Car Gen4
   - const: renesas,hscif   # generic HSCIF compatible UART
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 7/7] dt-bindings: watchdog: renesas,wdt: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml 
b/Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml
index 77ee7c4b8067f506..1fa243052327bffe 100644
--- a/Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml
+++ b/Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml
@@ -59,11 +59,11 @@ properties:
   - renesas,r8a77980-wdt # R-Car V3H
   - renesas,r8a77990-wdt # R-Car E3
   - renesas,r8a77995-wdt # R-Car D3
-  - renesas,r8a779a0-wdt # R-Car V3U
   - const: renesas,rcar-gen3-wdt # R-Car Gen3 and RZ/G2
 
   - items:
   - enum:
+  - renesas,r8a779a0-wdt # R-Car V3U
   - renesas,r8a779f0-wdt # R-Car S4-8
   - const: renesas,rcar-gen4-wdt # R-Car Gen4
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/7] dt-bindings: renesas: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Hi all,

Despite the name, R-Car V3U is the first member of the R-Car Gen4
family[1].

Hence this patch series updates various DT binding documents to move
compatible values for R-Car V3U devices to R-Car Gen4 sections, in
bindings where the latter already exist.  Other DT binding documents
will be updated progressively, after adding support for more SoCs in the
R-Car Gen4 family.

These patches are intended to be taken by DT or subsystem maintainers.
Separate patches to update the DTS file[2] and SoC identification code[3]
are in-flight.

Thanks for your comments!

[1] 
https://www.renesas.com/eu/en/products/automotive-products/automotive-system-chips-socs/r-car-v3u-best-class-r-car-v3u-asil-d-system-chip-automated-driving
[2] [PATCH] arm64: dts: renesas: r8a779a0: Update to R-Car Gen4 compatible 
values

https://lore.kernel.org/73cea9d5e1a6639422c67e4df4285042e31c9fd5.1651497071.git.geert+rene...@glider.be
[3] [PATCH] soc: renesas: R-Car V3U is R-Car Gen4

https://lore.kernel.org/2bbecad7b6c24c0d5c1797b3f7f0733d5ba33842.1651497066.git.geert+rene...@glider.be

Geert Uytterhoeven (7):
  dt-bindings: gpio: renesas,rcar-gpio: R-Car V3U is R-Car Gen4
  dt-bindings: i2c: renesas,rcar-i2c: R-Car V3U is R-Car Gen4
  dt-bindings: iommu: renesas,ipmmu-vmsa: R-Car V3U is R-Car Gen4
  dt-bindings: renesas,rcar-dmac: R-Car V3U is R-Car Gen4
  dt-bindings: serial: renesas,hscif: R-Car V3U is R-Car Gen4
  dt-bindings: serial: renesas,scif: R-Car V3U is R-Car Gen4
  dt-bindings: watchdog: renesas,wdt: R-Car V3U is R-Car Gen4

 .../devicetree/bindings/dma/renesas,rcar-dmac.yaml | 10 --
 .../devicetree/bindings/gpio/renesas,rcar-gpio.yaml|  4 +---
 .../devicetree/bindings/i2c/renesas,rcar-i2c.yaml  |  2 +-
 .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml  |  4 ++--
 .../devicetree/bindings/serial/renesas,hscif.yaml  |  2 +-
 .../devicetree/bindings/serial/renesas,scif.yaml   |  2 +-
 .../devicetree/bindings/watchdog/renesas,wdt.yaml  |  2 +-
 7 files changed, 11 insertions(+), 15 deletions(-)

-- 
2.25.1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/7] dt-bindings: gpio: renesas,rcar-gpio: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml 
b/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml
index 0681a4790cd62e23..75e5da6a7cc04bbd 100644
--- a/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml
+++ b/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml
@@ -48,11 +48,9 @@ properties:
   - renesas,gpio-r8a77995 # R-Car D3
   - const: renesas,rcar-gen3-gpio # R-Car Gen3 or RZ/G2
 
-  - items:
-  - const: renesas,gpio-r8a779a0  # R-Car V3U
-
   - items:
   - enum:
+  - renesas,gpio-r8a779a0 # R-Car V3U
   - renesas,gpio-r8a779f0 # R-Car S4-8
   - const: renesas,rcar-gen4-gpio # R-Car Gen4
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/7] dt-bindings: renesas,rcar-dmac: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 .../devicetree/bindings/dma/renesas,rcar-dmac.yaml | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.yaml 
b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.yaml
index 7c6badf39921b660..7202cd68e7597dc2 100644
--- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.yaml
+++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.yaml
@@ -42,11 +42,10 @@ properties:
   - const: renesas,rcar-dmac
 
   - items:
-  - const: renesas,dmac-r8a779a0 # R-Car V3U
-
-  - items:
-  - const: renesas,dmac-r8a779f0 # R-Car S4-8
-  - const: renesas,rcar-gen4-dmac
+  - enum:
+  - renesas,dmac-r8a779a0 # R-Car V3U
+  - renesas,dmac-r8a779f0 # R-Car S4-8
+  - const: renesas,rcar-gen4-dmac # R-Car Gen4
 
   reg: true
 
@@ -121,7 +120,6 @@ if:
 compatible:
   contains:
 enum:
-  - renesas,dmac-r8a779a0
   - renesas,rcar-gen4-dmac
 then:
   properties:
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/7] dt-bindings: i2c: renesas,rcar-i2c: R-Car V3U is R-Car Gen4

2022-05-02 Thread Geert Uytterhoeven
Despite the name, R-Car V3U is the first member of the R-Car Gen4
family.  I2C on R-Car V3U also supports some extra features (e.g. Slave
Clock Stretch Select), which are supported by other R-Car Gen4 SoCs, but
not by any other R-Car Gen3 SoC.

Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/i2c/renesas,rcar-i2c.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/i2c/renesas,rcar-i2c.yaml 
b/Documentation/devicetree/bindings/i2c/renesas,rcar-i2c.yaml
index c30107833a5145f9..f9929578c7613f07 100644
--- a/Documentation/devicetree/bindings/i2c/renesas,rcar-i2c.yaml
+++ b/Documentation/devicetree/bindings/i2c/renesas,rcar-i2c.yaml
@@ -46,11 +46,11 @@ properties:
   - renesas,i2c-r8a77980 # R-Car V3H
   - renesas,i2c-r8a77990 # R-Car E3
   - renesas,i2c-r8a77995 # R-Car D3
-  - renesas,i2c-r8a779a0 # R-Car V3U
   - const: renesas,rcar-gen3-i2c # R-Car Gen3 and RZ/G2
 
   - items:
   - enum:
+  - renesas,i2c-r8a779a0 # R-Car V3U
   - renesas,i2c-r8a779f0 # R-Car S4-8
   - const: renesas,rcar-gen4-i2c # R-Car Gen4
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] iommu/vt-d: Remove hard coding PGSNP bit in PASID entries

2022-05-02 Thread Jason Gunthorpe via iommu
On Sun, May 01, 2022 at 07:24:34PM +0800, Lu Baolu wrote:
> As enforce_cache_coherency has been introduced into the iommu_domain_ops,
> the kernel component which owns the iommu domain is able to opt-in its
> requirement for force snooping support. The iommu driver has no need to
> hard code the page snoop control bit in the PASID table entries anymore.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.h | 1 -
>  drivers/iommu/intel/iommu.c | 3 ---
>  drivers/iommu/intel/pasid.c | 6 --
>  3 files changed, 10 deletions(-)

It seems fine, but as in the other email where do we do
pasid_set_pgsnp() for a new device attach on an already no-snoop domain?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/5] iommu/vt-d: Remove domain_update_iommu_snooping()

2022-05-02 Thread Jason Gunthorpe via iommu
On Sun, May 01, 2022 at 07:24:33PM +0800, Lu Baolu wrote:
> The IOMMU force snooping capability is not required to be consistent
> among all the IOMMUs anymore. Remove force snooping capability check
> in the IOMMU hot-add path and domain_update_iommu_snooping() becomes
> a dead code now.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 34 +-
>  1 file changed, 1 insertion(+), 33 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-02 Thread Jason Gunthorpe via iommu
On Sun, May 01, 2022 at 07:24:32PM +0800, Lu Baolu wrote:
> +static bool domain_support_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> + bool support = true;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (list_empty(&domain->devices))
> + goto out;

Why? list_for_each_entry will just do nothing..

> + list_for_each_entry(info, &domain->devices, link) {
> + if (!ecap_sc_support(info->iommu->ecap)) {
> + support = false;
> + break;
> + }
> + }
> +out:
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + return support;
> +}
> +
> +static void domain_set_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> +
> + /*
> +  * Second level page table supports per-PTE snoop control. The
> +  * iommu_map() interface will handle this by setting SNP bit.
> +  */
> + if (!domain_use_first_level(domain))
> + return;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (list_empty(&domain->devices))
> + goto out_unlock;
> +
> + list_for_each_entry(info, &domain->devices, link)
> + intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
> +  PASID_RID2PASID);
> +
> +out_unlock:
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +}
> +
>  static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
>  {
>   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>  
> - if (!domain_update_iommu_snooping(NULL))
> + if (!domain_support_force_snooping(dmar_domain))
>   return false;

Maybe exit early if force_snooping = true?

> + domain_set_force_snooping(dmar_domain);
>   dmar_domain->force_snooping = true;
> +
>   return true;
>  }
>  
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index f8d215d85695..815c744e6a34 100644
> +++ b/drivers/iommu/intel/pasid.c
> @@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>  
>   return 0;
>  }
> +
> +/*
> + * Set the page snoop control for a pasid entry which has been set up.
> + */

So the 'first level' is only used with pasid?

> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid)
> +{
> + struct pasid_entry *pte;
> + u16 did;
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte || !pasid_pte_is_present(pte)))
> + return;
> +
> + pasid_set_pgsnp(pte);

Doesn't this need to be done in other places too, like when a new attach
is made? Patch 5 removed it, but should that be made if
domain->force_snooping?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/5] iommu/vt-d: Set SNP bit only in second-level page table entries

2022-05-02 Thread Jason Gunthorpe via iommu
On Sun, May 01, 2022 at 07:24:31PM +0800, Lu Baolu wrote:
> The SNP bit is only valid for second-level PTEs. Setting this bit in the
> first-level PTEs has no functional impact because the Intel IOMMU always
> ignores the same bit in first-level PTEs. Anyway, let's check the page
> table type before setting SNP bit in PTEs to make the code more readable.

Shouldn't this be tested before setting force_snooping and not during
every map?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/5] iommu/vt-d: Block force-snoop domain attaching if no SC support

2022-05-02 Thread Jason Gunthorpe via iommu
On Sun, May 01, 2022 at 07:24:30PM +0800, Lu Baolu wrote:
> In the attach_dev callback of the default domain ops, if the domain has
> been set force_snooping, but the iommu hardware of the device does not
> support SC(Snoop Control) capability, the callback should block it and
> return a corresponding error code.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to reduce spinlock overhead
among a device's queues. Each device may allocate its own IO TLB mem and
set up child IO TLB mems according to its queue count. The swiotlb code
then allocates bounce buffers among the child IO TLB mems iteratively.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  7 +++
 kernel/dma/swiotlb.c| 97 -
 2 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..4a3f6a7b4b7e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @child_nslot:The number of IO TLB slot in the child IO TLB mem.
+ * @num_child:  The child io tlb mem number in the pool.
+ * @child_start:The child index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +105,10 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_start;
+   struct io_tlb_mem *child;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..32e8f42530b6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -207,6 +207,26 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->force_bounce = true;
 
	spin_lock_init(&mem->lock);
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize child IO TLB mems, dividing the IO TLB pool
+* evenly among the children. Reuse the parent mem->slots
+* array for the child mem->slots.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+   mem->child[i].slots = mem->slots + i * mem->child_nslot;
+   mem->child[i].num_child = 0;
+
+   swiotlb_init_io_tlb_mem(&mem->child[i],
+   start + ((i * mem->child_nslot) << 
IO_TLB_SHIFT),
+   mem->child_nslot, late_alloc);
+   }
+   }
+
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
@@ -336,16 +356,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(array_size(sizeof(*mem->slots), nslabs)));
-   if (!mem->slots) {
-   free_pages((unsigned long)vstart, order);
-   return -ENOMEM;
-   }
+   if (!mem->slots)
+   goto error_slots;
 
set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
 
swiotlb_print_info();
return 0;
+
+error_slots:
+   free_pages((unsigned long)vstart, order);
+   return -ENOMEM;
 }
 
 void __init swiotlb_exit(void)
@@ -483,10 +505,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
- size_t alloc_size, unsigned int alloc_align_mask)
+static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
+struct device *dev, phys_addr_t orig_addr,
+size_t alloc_size,
+unsigned int alloc_align_mask)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -565,6 +588,46 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
return index;
 }
 
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size, unsigned int alloc_align_mask)
+{
+   struct io_tlb_mem *mem = 

[RFC PATCH V2 0/2] swiotlb: Add child io tlb mem support

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch set adds child IO TLB mem support to reduce spinlock overhead
among a device's queues. Each device may allocate its own IO TLB mem and
set up child IO TLB mems according to its queue count. The number of
child IO TLB mems may be set equal to the device queue count, which
helps resolve swiotlb spinlock overhead among devices and queues.

Patch 2 introduces the IO TLB block concept and a swiotlb_device_allocate()
API to allocate a per-device swiotlb bounce buffer. The new API accepts
the queue number as the number of child IO TLB mems to set up for the
device's IO TLB mem.
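
For reference, the driver-side usage would look roughly like this
(sketch only; the queue count and size here are made-up values):

	/* probe: one child IO TLB mem per hardware queue */
	ret = swiotlb_device_allocate(&pdev->dev, nr_queues, 32UL << 20);
	if (ret)
		dev_warn(&pdev->dev, "no per-device swiotlb, using shared pool\n");

	/* remove */
	swiotlb_device_free(&pdev->dev);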

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  Swiotlb: Add device bounce buffer allocation interface

 include/linux/swiotlb.h |  40 ++
 kernel/dma/swiotlb.c| 290 ++--
 2 files changed, 317 insertions(+), 13 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH V2 2/2] Swiotlb: Add device bounce buffer allocation interface

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

In an SEV/TDX confidential VM, device DMA transactions need to use the
swiotlb bounce buffer to share data with the host/hypervisor. The
swiotlb spinlock introduces overhead among devices if they share an io
tlb mem. To avoid this issue, introduce swiotlb_device_allocate() to
allocate a device bounce buffer from the default io tlb pool and set up
child IO tlb mems for per-queue bounce buffer allocation according to
the input queue number. A device may have multiple io queues, and
setting up the same number of child io tlb mems may help resolve
spinlock overhead among the queues.

Introduce an IO TLB block unit (2MB, i.e. 8 segments of 256KB each) to
allocate big bounce buffers from the default pool for devices; an IO
TLB segment (256KB) is too small for that.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  35 +++-
 kernel/dma/swiotlb.c| 195 +++-
 2 files changed, 225 insertions(+), 5 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4a3f6a7b4b7e..efd29e884fd7 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows a device to allocate bounce buffers from the
+ * default io tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,9 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @child_nslot:The number of IO TLB slot in the child IO TLB mem.
  * @num_child:  The child io tlb mem number in the pool.
+ * @child_nslot:The number of IO TLB slot in the child IO TLB mem.
+ * @child_nblock:The number of IO TLB block in the child IO TLB mem.
  * @child_start:The child index to start searching in the next round.
+ * @block_start:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -107,8 +117,16 @@ struct io_tlb_mem {
bool for_alloc;
unsigned int num_child;
unsigned int child_nslot;
+   unsigned int child_nblock;
unsigned int child_start;
+   unsigned int block_index;
struct io_tlb_mem *child;
+   struct io_tlb_mem *parent;
+   struct io_tlb_block {
+   size_t alloc_size;
+   unsigned long start_slot;
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -137,6 +155,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -169,6 +191,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_device_free(struct device *dev)
+{
+}
+
+static inline int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 32e8f42530b6..f8a0711cd9de 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
mem->nslabs = nslabs;
mem->start = start;
@@ -210,6 +211,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
 
if (mem->num_child) {
mem->child_nslot = nslabs / mem->num_child;
+   mem->child_nblock = block_num / mem->num_child;
mem->child_start = 0;
 
/*
@@ -219,15 +221,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
 */
for (i = 0; i < mem->num_child; i++) {
mem->child[i].slots = mem->slots + i * mem->child_nslot;
-   mem->child[i].num_child = 0;
+   mem->child[i].block = 

Re: [PATCH RFC 18/19] iommu/intel: Access/Dirty bit support for SL domains

2022-05-02 Thread Joao Martins
On 4/30/22 07:12, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct 
>> iommu_domain *domain,
>>  }
>>   }
>>   
>> +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
>> +  bool enable)
>> +{
>> +struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +struct device_domain_info *info;
>> +unsigned long flags;
>> +int ret = -EINVAL;
> 
>   if (domain_use_first_level(dmar_domain))
>   return -EOPNOTSUPP;
> 
Will add.

>> +
>> +spin_lock_irqsave(&device_domain_lock, flags);
>> +if (list_empty(&dmar_domain->devices)) {
>> +spin_unlock_irqrestore(&device_domain_lock, flags);
>> +return ret;
>> +}
> 
> I agreed with Kevin's suggestion in his reply.
> 
/me nods

>> +
>> +list_for_each_entry(info, &dmar_domain->devices, link) {
>> +if (!info->dev || (info->domain != dmar_domain))
>> +continue;
> 
> This check is redundant.
> 

I'll drop it.

>> +
>> +/* Dirty tracking is second-stage level SM only */
>> +if ((info->domain && domain_use_first_level(info->domain)) ||
>> +!ecap_slads(info->iommu->ecap) ||
>> +!sm_supported(info->iommu) || !intel_iommu_sm) {
>> +ret = -EOPNOTSUPP;
>> +continue;
> 
> Perhaps break and return -EOPNOTSUPP directly here? We are not able to
> support a mixed mode, right?
> 
Correct, I should return early here.

>> +}
>> +
>> +ret = intel_pasid_setup_dirty_tracking(info->iommu, 
>> info->domain,
>> + info->dev, PASID_RID2PASID,
>> + enable);
>> +if (ret)
>> +break;
>> +}
>> +spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> +/*
>> + * We need to flush context TLB and IOTLB with any cached translations
>> + * to force the incoming DMA requests to have their IOTLB entries tagged
>> + * with A/D bits
>> + */
>> +intel_flush_iotlb_all(domain);
>> +return ret;
>> +}
> 
> Best regards,
> baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 04/19] iommu: Add an unmap API that returns dirtied IOPTEs

2022-05-02 Thread Joao Martins
On 4/30/22 06:12, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> Today, the dirty state is lost and the page wouldn't be migrated to
>> destination potentially leading the guest into error.
>>
>> Add an unmap API that reads the dirty bit and sets it in the
>> user passed bitmap. This unmap iommu API tackles a potentially
>> racy update to the dirty bit *when* doing DMA on a iova that is
>> being unmapped at the same time.
>>
>> The new unmap_read_dirty/unmap_pages_read_dirty does not replace
>> the unmap pages, but rather only when explicit called with an dirty
>> bitmap data passed in.
>>
>> It could be said that the guest is buggy and rather than a special unmap
>> path tackling the theoretical race ... it would suffice fetching the
>> dirty bits (with GET_DIRTY_IOVA), and then unmap the IOVA.
> 
> I am not sure whether this API could solve the race.
> 

Yeah, it doesn't fully solve the race, as DMA writes can still
potentially occur until the IOMMU needs to re-walk the page tables
(i.e. after the IOTLB flush).


> size_t iommu_unmap(struct iommu_domain *domain,
> unsigned long iova, size_t size)
> {
>  struct iommu_iotlb_gather iotlb_gather;
>  size_t ret;
> 
>  iommu_iotlb_gather_init(&iotlb_gather);
>  ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
>  iommu_iotlb_sync(domain, &iotlb_gather);
> 
>  return ret;
> }
> 
> The PTEs are cleared before iotlb invalidation. What if a DMA write
> happens after PTE clearing and before the iotlb invalidation with the
> PTE happening to be cached?


Yeap. Jason/Robin also reiterated similarly.

To fully handle this we need to force the PTEs read-only, and check the
dirty bit after. So perhaps if we want to go to the extent of fully
stopping DMA -- which none of the unmap APIs ever guarantee -- we need
more of a write-protect API that optionally fetches the dirties. And
then the unmap remains as is (prior to this series).
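
Roughly, the ordering would be (hypothetical helper names -- no such
write-protect API exists today):

	iommu_write_protect(domain, iova, size);	/* PTEs read-only */
	iommu_flush_iotlb_all(domain);		/* drop cached writable entries */
	iommu_read_and_clear_dirty(domain, iova, size, &bitmap); /* now stable */
	iommu_unmap(domain, iova, size);	/* unchanged from today */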

Now, whether this race is worth solving isn't clear (bearing in mind
that solving it will add a lot of overhead), and git/mailing-list
archaeology doesn't answer whether this was ever useful in practice :(

Joao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 03/19] iommufd: Dirty tracking data support

2022-05-02 Thread Joao Martins
On 4/30/22 05:11, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> Add an IO pagetable API iopt_read_and_clear_dirty_data() that
>> performs the reading of dirty IOPTEs for a given IOVA range and
>> then copying back to userspace from each area-internal bitmap.
>>
>> Underneath it uses the IOMMU equivalent API which will read the
>> dirty bits, as well as atomically clearing the IOPTE dirty bit
>> and flushing the IOTLB at the end. The dirty bitmaps pass an
>> iotlb_gather to allow batching the dirty-bit updates.
>>
>> Most of the complexity, though, is in the handling of the user
>> bitmaps to avoid copies back and forth. The bitmap user addresses
>> need to be iterated through, pinned and then passing the pages
>> into iommu core. The amount of bitmap data passed at a time for a
>> read_and_clear_dirty() is 1 page worth of pinned base page
>> pointers. That equates to 16M bits, or rather 64G of data that
>> can be returned as 'dirtied'. We flush the IOTLB at the end of
>> the whole scanned IOVA range, to defer as much as possible the
>> potential DMA performance penalty.
>>
>> Signed-off-by: Joao Martins 
>> ---
>>   drivers/iommu/iommufd/io_pagetable.c| 169 
>>   drivers/iommu/iommufd/iommufd_private.h |  44 ++
>>   2 files changed, 213 insertions(+)
>>
>> diff --git a/drivers/iommu/iommufd/io_pagetable.c 
>> b/drivers/iommu/iommufd/io_pagetable.c
>> index f4609ef369e0..835b5040fce9 100644
>> --- a/drivers/iommu/iommufd/io_pagetable.c
>> +++ b/drivers/iommu/iommufd/io_pagetable.c
>> @@ -14,6 +14,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   
>>   #include "io_pagetable.h"
>>   
>> @@ -347,6 +348,174 @@ int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>>  return ret;
>>   }
>>   
>> +int iommufd_dirty_iter_init(struct iommufd_dirty_iter *iter,
>> +struct iommufd_dirty_data *bitmap)
>> +{
>> +struct iommu_dirty_bitmap *dirty = &iter->dirty;
>> +unsigned long bitmap_len;
>> +
>> +bitmap_len = dirty_bitmap_bytes(bitmap->length >> dirty->pgshift);
>> +
>> +import_single_range(WRITE, bitmap->data, bitmap_len,
>> +&iter->bitmap_iov, &iter->bitmap_iter);
>> +iter->iova = bitmap->iova;
>> +
>> +/* Can record up to 64G at a time */
>> +dirty->pages = (struct page **) __get_free_page(GFP_KERNEL);
>> +
>> +return !dirty->pages ? -ENOMEM : 0;
>> +}
>> +
>> +void iommufd_dirty_iter_free(struct iommufd_dirty_iter *iter)
>> +{
>> +struct iommu_dirty_bitmap *dirty = &iter->dirty;
>> +
>> +if (dirty->pages) {
>> +free_page((unsigned long) dirty->pages);
>> +dirty->pages = NULL;
>> +}
>> +}
>> +
>> +bool iommufd_dirty_iter_done(struct iommufd_dirty_iter *iter)
>> +{
>> +return iov_iter_count(&iter->bitmap_iter) > 0;
>> +}
>> +
>> +static inline unsigned long iommufd_dirty_iter_bytes(struct 
>> iommufd_dirty_iter *iter)
>> +{
>> +unsigned long left = iter->bitmap_iter.count - 
>> iter->bitmap_iter.iov_offset;
>> +
>> +left = min_t(unsigned long, left, (iter->dirty.npages << PAGE_SHIFT));
>> +
>> +return left;
>> +}
>> +
>> +unsigned long iommufd_dirty_iova_length(struct iommufd_dirty_iter *iter)
>> +{
>> +unsigned long left = iommufd_dirty_iter_bytes(iter);
>> +
>> +return ((BITS_PER_BYTE * left) << iter->dirty.pgshift);
>> +}
>> +
>> +unsigned long iommufd_dirty_iova(struct iommufd_dirty_iter *iter)
>> +{
>> +unsigned long skip = iter->bitmap_iter.iov_offset;
>> +
>> +return iter->iova + ((BITS_PER_BYTE * skip) << iter->dirty.pgshift);
>> +}
>> +
>> +void iommufd_dirty_iter_advance(struct iommufd_dirty_iter *iter)
>> +{
>> +iov_iter_advance(&iter->bitmap_iter, iommufd_dirty_iter_bytes(iter));
>> +}
>> +
>> +void iommufd_dirty_iter_put(struct iommufd_dirty_iter *iter)
>> +{
>> +struct iommu_dirty_bitmap *dirty = &iter->dirty;
>> +
>> +if (dirty->npages)
>> +unpin_user_pages(dirty->pages, dirty->npages);
>> +}
>> +
>> +int iommufd_dirty_iter_get(struct iommufd_dirty_iter *iter)
>> +{
>> +struct iommu_dirty_bitmap *dirty = &iter->dirty;
>> +unsigned long npages;
>> +unsigned long ret;
>> +void *addr;
>> +
>> +addr = iter->bitmap_iov.iov_base + iter->bitmap_iter.iov_offset;
>> +npages = iov_iter_npages(&iter->bitmap_iter,
>> + PAGE_SIZE / sizeof(struct page *));
>> +
>> +ret = pin_user_pages_fast((unsigned long) addr, npages,
>> +  FOLL_WRITE, dirty->pages);
>> +if (ret <= 0)
>> +return -EINVAL;
>> +
>> +dirty->npages = ret;
>> +dirty->iova = iommufd_dirty_iova(iter);
>> +dirty->start_offset = offset_in_page(addr);
>> +return 0;
>> +}
>> +
>> +static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
>> +  struct iommufd_dirty_data *bitmap)
> 
> This looks more like a helper in the iommu core. How about
> 
>   iommufd_read_clear_domain_dirty()?
> 
Heh, I guess that's more 

Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-05-02 Thread Joao Martins
On 4/30/22 00:51, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>> +struct iommu_domain *domain, bool enable)
>> +{
>> +struct iommu_domain *dom;
>> +unsigned long index;
>> +int ret = -EOPNOTSUPP;
>> +
>> +down_write(&iopt->iova_rwsem);
>> +if (!domain) {
>> +down_write(&iopt->domains_rwsem);
>> +xa_for_each(&iopt->domains, index, dom) {
>> +ret = iommu_set_dirty_tracking(dom, iopt, enable);
>> +if (ret < 0)
>> +break;
> 
> Do you need to roll back to the original state before return failure?
> Partial domains have already had dirty bit tracking enabled.
> 
Yeap, will fix the unwinding for next iteration.
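
Something like this, probably (sketch only, untested):

	/* on failure, walk the domains that were already enabled
	 * and turn tracking back off before returning the error */
	xa_for_each(&iopt->domains, index, dom) {
		if (dom == failed_dom)	/* the domain that errored out */
			break;
		iommu_set_dirty_tracking(dom, iopt, false);
	}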

>> +}
>> +up_write(&iopt->domains_rwsem);
>> +} else {
>> +ret = iommu_set_dirty_tracking(domain, iopt, enable);
>> +}
>> +
>> +up_write(&iopt->iova_rwsem);
>> +return ret;
>> +}
> 
> Best regards,
> baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 15/19] iommu/arm-smmu-v3: Add set_dirty_tracking_range() support

2022-05-02 Thread Joao Martins
[my mua made the message a tad crooked with the quotations]

On 5/2/22 12:52, Joao Martins wrote:
> On 4/29/22 20:20, Robin Murphy wrote:
>> On 2022-04-29 17:40, Joao Martins wrote:
>>> On 4/29/22 17:11, Jason Gunthorpe wrote:
 On Fri, Apr 29, 2022 at 03:45:23PM +0100, Joao Martins wrote:
> On 4/29/22 13:23, Jason Gunthorpe wrote:
>> On Fri, Apr 29, 2022 at 01:06:06PM +0100, Joao Martins wrote:
>>
 TBH I'd be inclined to just enable DBM unconditionally in
 arm_smmu_domain_finalise() if the SMMU supports it. Trying to toggle it
 dynamically (especially on a live domain) seems more trouble than it's
 worth.
>>>
>>> Hmmm, but then it would strip userland/VMM from any sort of control 
>>> (contrary
>>> to what we can do on the CPU/KVM side). e.g. the first time you do
>>> GET_DIRTY_IOVA it would return all dirtied IOVAs since the beginning
>>> of guest time, as opposed to those only after you enabled 
>>> dirty-tracking.
>>
>> It just means that on SMMU the start tracking op clears all the dirty
>> bits.
>>
> Hmm, OK. But aren't really picking a poison here? On ARM it's the 
> difference
> from switching the setting the DBM bit and put the IOPTE as 
> writeable-clean (which
> is clearing another bit) versus read-and-clear-when-dirty-track-start 
> which means
> we need to re-walk the pagetables to clear one bit.

 Yes, I don't think an iopte walk is avoidable?

>>> Correct -- exactly why I am still leaning towards enabling the DBM bit
>>> only at start
>>> versus enabling DBM at domain-creation while clearing dirty at start.
>>
>> I'd say it's largely down to whether you want the bother of 
>> communicating a dynamic behaviour change into io-pgtable. The big 
>> advantage of having it just use DBM all the time is that you don't have 
>> to do that, and the "start tracking" operation is then nothing more than 
>> a normal "read and clear" operation but ignoring the read result.
>>
>> At this point I'd much rather opt for simplicity, and leave the fancier 
>> stuff to revisit later if and when somebody does demonstrate a 
>> significant overhead from using DBM when not strictly needed.
> OK -- I did get the code simplicity part[*]. Albeit my concern is that last
> point: if there's anything fundamentally affecting DMA performance then
> any SMMU user would see it even if they don't care at all about DBM (i.e. 
> regular
> baremetal/non-vm iommu usage).
> 

I can switch the SMMUv3 one to the always-enabled DBM bit.
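
With always-on DBM the op would collapse to roughly this (hypothetical
helper name, sketch only):

	static int arm_smmu_set_dirty_tracking(struct iommu_domain *domain,
					       bool enable)
	{
		if (!enable)
			return 0;	/* DBM stays set; nothing to tear down */

		/* a read-and-clear walk whose result we simply discard */
		return arm_smmu_clear_dirty_walk(domain);
	}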

> [*] It was how I had this initially PoC-ed. And really all IOMMU drivers 
> dirty tracking
> could be simplified to be always-enabled, and start/stop is essentially 
> flushing/clearing
> dirties. Albeit I like that this is only really used (by hardware) when 
> needed and any
> other DMA user isn't affected.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 15/19] iommu/arm-smmu-v3: Add set_dirty_tracking_range() support

2022-05-02 Thread Joao Martins
On 4/29/22 20:20, Robin Murphy wrote:
> On 2022-04-29 17:40, Joao Martins wrote:
>> On 4/29/22 17:11, Jason Gunthorpe wrote:
>>> On Fri, Apr 29, 2022 at 03:45:23PM +0100, Joao Martins wrote:
 On 4/29/22 13:23, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 01:06:06PM +0100, Joao Martins wrote:
>
>>> TBH I'd be inclined to just enable DBM unconditionally in
>>> arm_smmu_domain_finalise() if the SMMU supports it. Trying to toggle it
>>> dynamically (especially on a live domain) seems more trouble than it's
>>> worth.
>>
>> Hmmm, but then it would strip userland/VMM from any sort of control 
>> (contrary
>> to what we can do on the CPU/KVM side). e.g. the first time you do
>> GET_DIRTY_IOVA it would return all dirtied IOVAs since the beginning
>> of guest time, as opposed to those only after you enabled dirty-tracking.
>
> It just means that on SMMU the start tracking op clears all the dirty
> bits.
>
 Hmm, OK. But aren't really picking a poison here? On ARM it's the 
 difference
 from switching the setting the DBM bit and put the IOPTE as 
 writeable-clean (which
 is clearing another bit) versus read-and-clear-when-dirty-track-start 
 which means
 we need to re-walk the pagetables to clear one bit.
>>>
>>> Yes, I don't think an iopte walk is avoidable?
>>>
>> Correct -- exactly why I am still leaning towards enabling the DBM bit
>> only at start
>> versus enabling DBM at domain-creation while clearing dirty at start.
> 
> I'd say it's largely down to whether you want the bother of 
> communicating a dynamic behaviour change into io-pgtable. The big 
> advantage of having it just use DBM all the time is that you don't have 
> to do that, and the "start tracking" operation is then nothing more than 
> a normal "read and clear" operation but ignoring the read result.
> 
> At this point I'd much rather opt for simplicity, and leave the fancier 
> stuff to revisit later if and when somebody does demonstrate a 
> significant overhead from using DBM when not strictly needed.
> OK -- I did get the code simplicity part[*]. Albeit my concern is that last
point: if there's anything fundamentally affecting DMA performance then
any SMMU user would see it even if they don't care at all about DBM (i.e. 
regular
baremetal/non-vm iommu usage).

[*] It was how I had this initially PoC-ed. And really all IOMMU drivers dirty 
tracking
could be simplified to be always-enabled, and start/stop is essentially 
flushing/clearing
dirties. Albeit I like that this is only really used (by hardware) when needed 
and any
other DMA user isn't affected.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 10/37] iommu/amd: Introduce per PCI segment last_bdf

2022-05-02 Thread Joerg Roedel
Hi Vasant,

On Fri, Apr 29, 2022 at 08:15:49PM +0530, Vasant Hegde wrote:
> We still need to parse IVHD to find max devices supported by each PCI segment
> (same as the way its doing it today). Hence we need all these variables.

From what I have seen over the past few years, the IVRS tables enumerate the
whole PCI segment, up to device ff:1f.7. This results in the maximum
being allocated for all data structures anyway. Therefore we can
probably think about skipping the scan to find the largest bdf and just
assume it is ff:1f.7, saving us all the size-tracking variables?
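
That would simply be the all-ones 16-bit BDF (constant name made up
here, for illustration):

	#define MAX_SEG_BDF	((0xff << 8) | (0x1f << 3) | 0x7)	/* == 0xffff */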

Regards,

Joerg


[PATCH] iommu: dart: Add missing module owner to ops structure

2022-05-02 Thread Hector Martin
This is required to make loading this as a module work.

Signed-off-by: Hector Martin 
---
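(Reviewer note, not part of the commit: the iommu core takes a module
reference while a device is bound to these ops, along the lines of the
sketch below -- from memory, not verbatim. With .owner left NULL,
try_module_get(NULL) trivially succeeds, so apple-dart.ko could be
unloaded while devices still reference apple_dart_iommu_ops.)

	/* core-side sketch: refuse to probe if the driver is unloading */
	if (!try_module_get(ops->owner))
		goto err_free;
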
 drivers/iommu/apple-dart.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index decafb07ad08..a10a73282014 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -773,6 +773,7 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.put_resv_regions = generic_iommu_put_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */
+	.owner = THIS_MODULE,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev = apple_dart_attach_dev,
 		.detach_dev = apple_dart_detach_dev,
-- 
2.35.1



Re: [PATCH v2] iommu: fix an incorrect NULL check on list iterator

2022-05-02 Thread kernel test robot
Hi Xiaomeng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on joro-iommu/next]
[also build test ERROR on v5.18-rc5 next-20220429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Xiaomeng-Tong/iommu-fix-an-incorrect-NULL-check-on-list-iterator/20220501-211400
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm-allmodconfig (https://download.01.org/0day-ci/archive/20220502/202205021754.gethfnns-...@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/99e334beef5d5be25ed19d3142d16000f0a1986d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xiaomeng-Tong/iommu-fix-an-incorrect-NULL-check-on-list-iterator/20220501-211400
        git checkout 99e334beef5d5be25ed19d3142d16000f0a1986d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/iommu/msm_iommu.c: In function 'qcom_iommu_of_xlate':
>> drivers/iommu/msm_iommu.c:629:17: error: 'ret' undeclared (first use in this function); did you mean 'net'?
     629 |                 ret = -ENODEV;
         |                 ^~~
         |                 net
   drivers/iommu/msm_iommu.c:629:17: note: each undeclared identifier is reported only once for each function it appears in
   drivers/iommu/msm_iommu.c:638:1: error: control reaches end of non-void function [-Werror=return-type]
     638 | }
         | ^
   cc1: some warnings being treated as errors


vim +629 drivers/iommu/msm_iommu.c

f78ebca8ff3d61 Sricharan R   2016-06-13  614
f78ebca8ff3d61 Sricharan R   2016-06-13  615  static int qcom_iommu_of_xlate(struct device *dev,
f78ebca8ff3d61 Sricharan R   2016-06-13  616                                 struct of_phandle_args *spec)
f78ebca8ff3d61 Sricharan R   2016-06-13  617  {
99e334beef5d5b Xiaomeng Tong 2022-05-01  618      struct msm_iommu_dev *iommu = NULL, *iter;
f78ebca8ff3d61 Sricharan R   2016-06-13  619      unsigned long flags;
f78ebca8ff3d61 Sricharan R   2016-06-13  620
f78ebca8ff3d61 Sricharan R   2016-06-13  621      spin_lock_irqsave(&msm_iommu_lock, flags);
99e334beef5d5b Xiaomeng Tong 2022-05-01  622      list_for_each_entry(iter, &msm_iommu_devices, dev_node)
99e334beef5d5b Xiaomeng Tong 2022-05-01  623          if (iter->dev->of_node == spec->np) {
99e334beef5d5b Xiaomeng Tong 2022-05-01  624              iommu = iter;
f78ebca8ff3d61 Sricharan R   2016-06-13  625              break;
99e334beef5d5b Xiaomeng Tong 2022-05-01  626          }
f78ebca8ff3d61 Sricharan R   2016-06-13  627
99e334beef5d5b Xiaomeng Tong 2022-05-01  628      if (!iommu) {
f78ebca8ff3d61 Sricharan R   2016-06-13 @629          ret = -ENODEV;
f78ebca8ff3d61 Sricharan R   2016-06-13  630          goto fail;
f78ebca8ff3d61 Sricharan R   2016-06-13  631      }
f78ebca8ff3d61 Sricharan R   2016-06-13  632
bb5bdc5ab7f133 Xiaoke Wang   2022-04-28  633      ret = insert_iommu_master(dev, &iommu, spec);
f78ebca8ff3d61 Sricharan R   2016-06-13  634  fail:
f78ebca8ff3d61 Sricharan R   2016-06-13  635      spin_unlock_irqrestore(&msm_iommu_lock, flags);
f78ebca8ff3d61 Sricharan R   2016-06-13  636
f78ebca8ff3d61 Sricharan R   2016-06-13  637      return ret;
f78ebca8ff3d61 Sricharan R   2016-06-13  638  }
f78ebca8ff3d61 Sricharan R   2016-06-13  639
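
A sketch of what the fixed function presumably needs (reconstructed from
the excerpt above, untested -- the list-iterator conversion simply lost
the 'ret' declaration):

static int qcom_iommu_of_xlate(struct device *dev,
			       struct of_phandle_args *spec)
{
	struct msm_iommu_dev *iommu = NULL, *iter;
	unsigned long flags;
	int ret;	/* the missing declaration */

	spin_lock_irqsave(&msm_iommu_lock, flags);
	list_for_each_entry(iter, &msm_iommu_devices, dev_node)
		if (iter->dev->of_node == spec->np) {
			iommu = iter;
			break;
		}

	if (!iommu) {
		ret = -ENODEV;
		goto fail;
	}

	ret = insert_iommu_master(dev, &iommu, spec);
fail:
	spin_unlock_irqrestore(&msm_iommu_lock, flags);

	return ret;
}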

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


[PATCH 3/4] dt-bindings: arm-smmu: Add binding for SDX65 SMMU

2022-05-02 Thread Rohit Agarwal
Add devicetree binding for Qualcomm SDX65 SMMU.

Signed-off-by: Rohit Agarwal 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index da5381c..1f99bff 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -39,6 +39,7 @@ properties:
           - qcom,sc8180x-smmu-500
           - qcom,sdm845-smmu-500
           - qcom,sdx55-smmu-500
+          - qcom,sdx65-smmu-500
           - qcom,sm6350-smmu-500
           - qcom,sm8150-smmu-500
           - qcom,sm8250-smmu-500
-- 
2.7.4



[PATCH 2/4] mmc: host/sdhci-msm: Add compatible string check for sdx65

2022-05-02 Thread Rohit Agarwal
Add sdx65 SoC specific compatible string check inside qcom
'sdhci-msm' controller driver.

Signed-off-by: Rohit Agarwal 
---
 drivers/mmc/host/sdhci-msm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index fd8b4a9..65661ad 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -2453,6 +2453,7 @@ static const struct of_device_id sdhci_msm_dt_match[] = {
 	 */
 	{.compatible = "qcom,qcs404-sdhci", .data = &sdhci_msm_v5_var},
 	{.compatible = "qcom,sdx55-sdhci",  .data = &sdhci_msm_v5_var},
+	{.compatible = "qcom,sdx65-sdhci",  .data = &sdhci_msm_v5_var},
 	{.compatible = "qcom,sdm630-sdhci", .data = &sdhci_msm_v5_var},
 	{.compatible = "qcom,sm6125-sdhci", .data = &sdhci_msm_v5_var},
 	{.compatible = "qcom,sm6350-sdhci", .data = &sdhci_msm_v5_var},
-- 
2.7.4



[PATCH 1/4] dt-bindings: mmc: sdhci-msm: Document the SDX65 compatible

2022-05-02 Thread Rohit Agarwal
The SDHCI controller on SDX65 is based on MSM SDHCI v5 IP. Hence,
document the compatible with "qcom,sdhci-msm-v5" as the fallback.

Signed-off-by: Rohit Agarwal 
---
 Documentation/devicetree/bindings/mmc/sdhci-msm.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-msm.yaml 
b/Documentation/devicetree/bindings/mmc/sdhci-msm.yaml
index da42a88..e423633 100644
--- a/Documentation/devicetree/bindings/mmc/sdhci-msm.yaml
+++ b/Documentation/devicetree/bindings/mmc/sdhci-msm.yaml
@@ -33,6 +33,7 @@ properties:
           - qcom,sdm630-sdhci
           - qcom,sdm845-sdhci
           - qcom,sdx55-sdhci
+          - qcom,sdx65-sdhci
           - qcom,sm6125-sdhci
           - qcom,sm6350-sdhci
           - qcom,sm8150-sdhci
-- 
2.7.4



[PATCH 4/4] ARM: dts: qcom: sdx65: Add Shared memory manager support

2022-05-02 Thread Rohit Agarwal
Add smem node to support shared memory manager on SDX65 platform.

Signed-off-by: Rohit Agarwal 
---
 arch/arm/boot/dts/qcom-sdx65.dtsi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
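
(Context, not part of the patch: hwspinlock 3 in the TCSR mutex block is
what the smem driver uses to serialize allocations against remote
processors -- roughly as below, assuming the usual hwspinlock API:

	ret = hwspin_lock_timeout_irqsave(smem->hwlock, HWSPINLOCK_TIMEOUT, &flags);
	if (ret)
		return ret;
	/* ... allocate or look up the smem item ... */
	hwspin_unlock_irqrestore(smem->hwlock, &flags);
)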

diff --git a/arch/arm/boot/dts/qcom-sdx65.dtsi 
b/arch/arm/boot/dts/qcom-sdx65.dtsi
index 210e55c..57bda62 100644
--- a/arch/arm/boot/dts/qcom-sdx65.dtsi
+++ b/arch/arm/boot/dts/qcom-sdx65.dtsi
@@ -87,8 +87,10 @@
 		};
 
 		smem_mem: memory@8fe20000 {
-			no-map;
+			compatible = "qcom,smem";
 			reg = <0x8fe20000 0xc0000>;
+			hwlocks = <&tcsr_mutex 3>;
+			no-map;
 		};
 
 		cmd_db: reserved-memory@8fee0000 {
-- 
2.7.4



[PATCH 0/4] SDHCI and SMEM updates for sdx65.

2022-05-02 Thread Rohit Agarwal
Hello,

This series adds some patches addressing comments from Bjorn and Ulf.
The patches are from the original series "SDX65 devicetree updates" that
needed some changes.

Thanks,
Rohit.

Rohit Agarwal (4):
  dt-bindings: mmc: sdhci-msm: Document the SDX65 compatible
  mmc: host/sdhci-msm: Add compatible string check for sdx65
  dt-bindings: arm-smmu: Add binding for SDX65 SMMU
  ARM: dts: qcom: sdx65: Add Shared memory manager support

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
 Documentation/devicetree/bindings/mmc/sdhci-msm.yaml  | 1 +
 arch/arm/boot/dts/qcom-sdx65.dtsi | 4 +++-
 drivers/mmc/host/sdhci-msm.c  | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)

-- 
2.7.4
