Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Alex Williamson
On Thu, 7 Apr 2022 12:23:31 -0300
Jason Gunthorpe  wrote:

> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> 
> > For the specific case of overriding PCIe No Snoop (which is more problematic
> > from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> > solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> > the endpoint's PCIe capability?  
> 
> Ideally.
> 
> That was rediscussed recently; apparently there are non-compliant
> devices and drivers that just ignore the bit.
> 
> Presumably this is why x86 had to move to an IOMMU-enforced feature...

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device-specific means to effect the change anyway.  GPUs
especially have various back channels to config space.  Thanks,

Alex



Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Robin Murphy

On 2022-04-07 20:08, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
>> On 2022-04-07 18:43, Jason Gunthorpe wrote:
>>> On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
>>>> At a glance, this all looks about the right shape to me now, thanks!
>>> 
>>> Thanks!
>>> 
>>>> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
>>>> my Thunderbolt series, but we can figure that out in a couple of weeks once
>>> 
>>> Yes, this does help that because now the only iommu_capable call is
>>> in a context where a device is available :)
>> 
>> Derp, of course I have *two* VFIO patches waiting, the other one touching
>> the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, which, much
>> as I hate it and would love to boot all that stuff over to
>> drivers/irqchip,
> 
> Oh me too...
> 
>> it's not in my way so I'm leaving it be for now). I'll have to rebase that
>> anyway, so merging this as-is is absolutely fine!
> 
> This might help your effort - after this series and this below there
> are no 'bus' users of iommu_capable left at all.

Thanks, but I still need a device for the iommu_domain_alloc() as well, 
so at that point the interrupt check is OK to stay where it is. I 
figured out a locking strategy per my original idea that seems pretty 
clean; it just needs vfio_group_viable() to go away first:

https://gitlab.arm.com/linux-arm/linux-rm/-/commit/c6057da9f6b5f4b0fb67c6e647d2f8f76a6177fc

Cheers,
Robin.


Re: [PATCH v5 1/2] PCI: ACPI: Support Microsoft's "DmaProperty"

2022-04-07 Thread Bjorn Helgaas
In subject,

  PCI/ACPI: ...

would be consistent with previous history (at least things coming
through the PCI tree :)).

On Fri, Mar 25, 2022 at 11:46:08AM -0700, Rajat Jain wrote:
> The "DmaProperty" is supported and documented by Microsoft here:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

Here's a more specific link (could probably be referenced below to
avoid cluttering the text here):

https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-internal-pcie-ports-accessible-to-users-and-requiring-dma-protection

> They use this property for DMA protection:
> https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt
> 
> Support the "DmaProperty" with the same semantics. This is useful for
> internal PCI devices that do not hang off a PCIe root port, but offer
> an attack surface for DMA attacks (e.g. internal network devices).

Same semantics as what?

The MS description of "ExternalFacingPort" says:

  This ACPI object enables the operating system to identify externally
  exposed PCIe hierarchies, such as Thunderbolt.

and "DmaProperty" says:

  This ACPI object enables the operating system to identify internal
  PCIe hierarchies that are easily accessible by users (such as,
  Laptop M.2 PCIe slots accessible by way of a latch) and require
  protection by the OS Kernel DMA Protection mechanism.

I don't really understand why they called out "laptop M.2 PCIe slots"
here.  Is the idea that those are more accessible than a standard
internal PCIe slot?  Seems like a pretty small distinction to me.

I can understand your example of internal network devices adding an
attack surface.  But I don't see how "DmaProperty" helps identify
those.  Wouldn't a NIC in a standard internal PCIe slot add the same
attack surface?

> Signed-off-by: Rajat Jain 
> Reviewed-by: Mika Westerberg 
> ---
> v5: * Reorder the patches in the series
> v4: * Add the GUID. 
> * Update the comment and commitlog.
> v3: * Use Microsoft's documented property "DmaProperty"
> * Resctrict to ACPI only
> 
>  drivers/acpi/property.c |  3 +++
>  drivers/pci/pci-acpi.c  | 16 
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index d0986bda2964..20603cacc28d 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -48,6 +48,9 @@ static const guid_t prp_guids[] = {
>   /* Storage device needs D3 GUID: 5025030f-842f-4ab4-a561-99a5189762d0 */
>   GUID_INIT(0x5025030f, 0x842f, 0x4ab4,
> 0xa5, 0x61, 0x99, 0xa5, 0x18, 0x97, 0x62, 0xd0),
> + /* DmaProperty for PCI devices GUID: 70d24161-6dd5-4c9e-8070-705531292865 */
> + GUID_INIT(0x70d24161, 0x6dd5, 0x4c9e,
> +   0x80, 0x70, 0x70, 0x55, 0x31, 0x29, 0x28, 0x65),
>  };
>  
>  /* ACPI _DSD data subnodes GUID: dbb8e3e6-5886-4ba6-8795-1319f52a966b */
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1f15ab7eabf8..378e05096c52 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1350,12 +1350,28 @@ static void pci_acpi_set_external_facing(struct pci_dev *dev)
>   dev->external_facing = 1;
>  }
>  
> +static void pci_acpi_check_for_dma_protection(struct pci_dev *dev)

I try to avoid function names like *_check_*() because they don't give
any hint about whether there's a side effect or what direction things
are going.  I prefer things that return a value or make sense when
used as a predicate.  Maybe something like this?

  int pci_dev_has_dma_property(struct pci_dev *dev)

  dev->untrusted |= pci_dev_has_dma_property(pci_dev);
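
To make the suggestion concrete, here is a rough, untested sketch of what
such a predicate could look like if it mirrored the Root-Port-scope
handling of "ExternalFacingPort" and checked for value == 1; the details
are a hypothetical illustration, not code from this thread:

static bool pci_dev_has_dma_property(struct pci_dev *dev)
{
	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
	const union acpi_object *obj;

	/* Per the MS doc, only trust the property in Root Port device scope */
	if (!adev || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
		return false;

	if (acpi_dev_get_property(adev, "DmaProperty", ACPI_TYPE_INTEGER, &obj))
		return false;

	return obj->integer.value == 1;
}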

> +{
> + u8 val;
> +
> + /*
> +  * Property also used by Microsoft Windows for same purpose,
> +  * (to implement DMA protection from a device, using the IOMMU).
> +  */
> + if (device_property_read_u8(&dev->dev, "DmaProperty", &val))

The MS web page says a _DSD with this property must be implemented in
the Root Port device scope, but we don't enforce that here.  We *do*
enforce it in pci_acpi_set_untrusted().  Shouldn't we do the same
here?

We currently look at three properties from the same _DSD:

  DmaProperty
  ExternalFacingPort
  HotPlugSupportInD3

For "HotPlugSupportInD3", we check that "value == 1".  For
"ExternalFacingPort", we check that it's non-zero.  The MS doc isn't
explicit about the values, but shows "1" in the sample ASL.  I think
we should handle all three cases the same.

The first two use device_property_read_u8(); the last uses
acpi_dev_get_property().  Again, I think they should all be the same.

acpi_dev_get_property() is easier for me to read because there are
slightly fewer layers of abstraction between _DSD and
acpi_dev_get_property().

But IIUC, device_property_read_u8() works for either ACPI or DT
properties, and maybe there is interest in using this for DT systems.
None of these appear in any in

Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Jason Gunthorpe via iommu
On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
> On 2022-04-07 18:43, Jason Gunthorpe wrote:
> > On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
> > > At a glance, this all looks about the right shape to me now, thanks!
> > 
> > Thanks!
> > 
> > > Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
> > > my Thunderbolt series, but we can figure that out in a couple of weeks 
> > > once
> > 
> > Yes, this does help that because now the only iommu_capable call is
> > in a context where a device is available :)
> 
> Derp, of course I have *two* VFIO patches waiting, the other one touching
> the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, which, much
> as I hate it and would love to boot all that stuff over to
> drivers/irqchip,

Oh me too...

> it's not in my way so I'm leaving it be for now). I'll have to rebase that
> anyway, so merging this as-is is absolutely fine!

This might help your effort - after this series and this below there
are no 'bus' users of iommu_capable left at all.

From 55d72be40bc0a031711e318c8dd1cb60673d9eca Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe 
Date: Thu, 7 Apr 2022 16:00:50 -0300
Subject: [PATCH] vfio: Move the IOMMU_CAP_INTR_REMAP to a context with a
 struct device

This check is done to ensure that the device we want to use is fully
isolated and the platform does not allow the device's MemWr TLPs to
trigger unauthorized MSIs.

Instead of doing it in the container context where we only have a group,
move the check to open_device where we actually know the device.

This is still security safe as userspace cannot trigger MemWr TLPs
without obtaining a device fd.

Signed-off-by: Jason Gunthorpe 
---
 drivers/vfio/vfio.c |  9 +
 drivers/vfio/vfio.h |  1 +
 drivers/vfio/vfio_iommu_type1.c | 28 +---
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 9edad767cfdad3..8db5cea1dc1d75 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1355,6 +1355,15 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
if (IS_ERR(device))
return PTR_ERR(device);
 
+   /* Confirm this device is compatible with the container */
+   if (group->type == VFIO_IOMMU &&
+   group->container->iommu_driver->ops->device_ok) {
+   ret = group->container->iommu_driver->ops->device_ok(
+   group->container->iommu_data, device->dev);
+   if (ret)
+   goto err_device_put;
+   }
+
if (!try_module_get(device->dev->driver->owner)) {
ret = -ENODEV;
goto err_device_put;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a6713022115155..3db60de71d42eb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -66,6 +66,7 @@ struct vfio_iommu_driver_ops {
   struct iommu_group *group);
void(*notify)(void *iommu_data,
  enum vfio_iommu_notify_type event);
+   int (*device_ok)(void *iommu_data, struct device *device);
 };
 
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e35759..5e966fb0ab9202 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2153,6 +2153,21 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
list_splice_tail(iova_copy, iova);
 }
 
+static int vfio_iommu_device_ok(void *iommu_data, struct device *device)
+{
+   bool msi_remap;
+
+   msi_remap = irq_domain_check_msi_remap() ||
+   iommu_capable(device->bus, IOMMU_CAP_INTR_REMAP);
+
+   if (!allow_unsafe_interrupts && !msi_remap) {
+   pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
+   __func__);
+   return -EPERM;
+   }
+   return 0;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group, enum vfio_group_type type)
 {
@@ -2160,7 +2175,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_iommu_group *group;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
-   bool resv_msi, msi_remap;
+   bool resv_msi;
phys_addr_t resv_msi_base = 0;
struct iommu_domain_geometry *geo;
LIST_HEAD(iova_copy);
@@ -2257,16 +2272,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);
 
-   msi_remap = irq_domain_check_msi_remap() ||
-   iommu_capable(bus, IOMMU_

Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Robin Murphy

On 2022-04-07 18:43, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
>> At a glance, this all looks about the right shape to me now, thanks!
> 
> Thanks!
> 
>> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
>> my Thunderbolt series, but we can figure that out in a couple of weeks once
> 
> Yes, this does help that because now the only iommu_capable call is
> in a context where a device is available :)

Derp, of course I have *two* VFIO patches waiting, the other one 
touching the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, 
which, much as I hate it and would love to boot all that stuff over to 
drivers/irqchip, it's not in my way so I'm leaving it be for now). I'll 
have to rebase that anyway, so merging this as-is is absolutely fine!


Cheers,
Robin.


Re: [PATCH] drm/tegra: Stop using iommu_present()

2022-04-07 Thread Dmitry Osipenko
On 4/6/22 21:06, Robin Murphy wrote:
> On 2022-04-06 15:32, Dmitry Osipenko wrote:
>> On 4/5/22 17:19, Robin Murphy wrote:
>>> Remove the pointless check. host1x_drm_wants_iommu() cannot return true
>>> unless an IOMMU exists for the host1x platform device, which at the
>>> moment
>>> means the iommu_present() test could never fail.
>>>
>>> Signed-off-by: Robin Murphy 
>>> ---
>>>   drivers/gpu/drm/tegra/drm.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>> index 9464f522e257..bc4321561400 100644
>>> --- a/drivers/gpu/drm/tegra/drm.c
>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>> @@ -1149,7 +1149,7 @@ static int host1x_drm_probe(struct
>>> host1x_device *dev)
>>>   goto put;
>>>   }
>>>   -    if (host1x_drm_wants_iommu(dev) &&
>>> iommu_present(&platform_bus_type)) {
>>> +    if (host1x_drm_wants_iommu(dev)) {
>>>   tegra->domain = iommu_domain_alloc(&platform_bus_type);
>>>   if (!tegra->domain) {
>>>   err = -ENOMEM;
>>
>> host1x_drm_wants_iommu() returns true if there is no IOMMU for the
>> host1x platform device of Tegra20/30 SoCs.
> 
> Ah, apparently this is another example of what happens when I write
> patches late on a Friday night...
> 
> So on second look, what we want to ascertain here is whether dev has an
> IOMMU, but only if the host1x parent is not addressing-limited, either
> because it can also use the IOMMU, or because all possible addresses are
> small enough anyway, right? 

Yes

> Are we specifically looking for the host1x
> having a DMA-API-managed domain, or can that also end up using the
> tegra->domain or another unmanaged domain too?

We have host1x DMA that could have:

1. No IOMMU domain, depending on kernel/DT config
2. Managed domain, on newer SoCs
3. Unmanaged domain, on older SoCs

We have Tegra DRM devices which can:

1. Be attached to a shared unmanaged tegra->domain, on older SoCs
2. Have own managed domains, on newer SoCs

> I can't quite figure out
> from the comments whether it's physical addresses, IOVAs, or both that
> we're concerned with here.

Tegra DRM allocates buffers and submits jobs to h/w using host1x's
channel DMA. DRM framebuffers' addresses are inserted into host1x
command buffers by the kernel driver, and addresses beyond the 32-bit
space need to be treated specially; we don't support such addresses upstream.

The IOMMU AS is limited to 32 bits on Tegra in the upstream kernel for
pre-T186 SoCs; this hides 64-bit addresses from host1x. Post-T186 SoCs
have extra features that allow the kernel driver not to bother about
addresses.

For newer ARM64 SoCs, the Tegra drivers assume that an IOMMU is always
present, to simplify things.
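
As a minimal sketch (an illustration, not code from this thread) of the
distinction being discussed, the probe path could ask the IOMMU core what
kind of domain the device actually ended up with, assuming the question
is whether it got a DMA-API-managed domain:

#include <linux/iommu.h>

static bool dev_has_dma_api_iommu_domain(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (!domain)		/* no IOMMU translation at all */
		return false;

	/* DMA-API-managed domain vs. an unmanaged one like tegra->domain */
	return domain->type == IOMMU_DOMAIN_DMA;
}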

Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Jason Gunthorpe via iommu
On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
> At a glance, this all looks about the right shape to me now, thanks!

Thanks!

> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
> my Thunderbolt series, but we can figure that out in a couple of weeks once

Yes, this does help that because now the only iommu_capable call is
in a context where a device is available :)

> Joerg starts queueing 5.19 material. I've got another VFIO patch waiting for
> the DMA ownership series to land anyway, so it's hardly the end of the world
> if I have to come back to follow up on this one too.

Hopefully Joerg will start soon, I also have patches written waiting
for the DMA ownership series.

Jason


Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Robin Murphy

On 2022-04-07 16:23, Jason Gunthorpe wrote:
> PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of its
> transactions as a micro-optimization.
> 
> However, the driver is now responsible for synchronizing the CPU cache
> with the DMA that bypassed it. On x86 this may be done through the wbinvd
> instruction, and the i915 GPU driver is the only Linux DMA driver that
> calls it.
> 
> The problem is that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running in the
> guest is not aware that wbinvd doesn't work, it may still cause the device
> to set the no-snoop bit and the platform will bypass the CPU cache.
> Without a working wbinvd there is no way to re-synchronize the CPU cache
> and the driver in the VM has data corruption.
> 
> Thus, we see a general direction on x86 that the IOMMU HW is able to block
> the no-snoop bit in the TLP. This NOPs the optimization and allows KVM
> to NOP the wbinvd without causing any data corruption.
> 
> This control for the Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused. The AMD IOMMU has the same feature and same IOPTE bits,
> however it unconditionally blocks no-snoop.
> 
> Change it so that:
>   - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
>     device. It is used by the DMA API/VFIO/etc when the user of the
>     iommu_domain will not be doing manual cache coherency operations.
> 
>   - IOMMU_CAP_CACHE_COHERENCY indicates if IOMMU_CACHE can be used with the
>     device.
> 
>   - The new optional domain op enforce_cache_coherency() will cause the
>     entire domain to block no-snoop requests - ie there is no way for any
>     device attached to the domain to opt out of the IOMMU_CACHE behavior.
>     This is permanent on the domain and must apply to any future devices
>     attached to it.
> 
> Ideally an iommu driver should implement enforce_cache_coherency() so that
> by default DMA API domains allow the no-snoop optimization. This leaves it
> available to kernel drivers like i915. VFIO will call
> enforce_cache_coherency() before establishing any mappings and the domain
> should then permanently block no-snoop.
> 
> If enforce_cache_coherency() fails, VFIO will communicate back through to
> KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
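
As a minimal sketch of the calling pattern described above, assuming the
op lands on struct iommu_domain_ops as this cover letter proposes
(illustrative, not the merged code):

static bool vfio_try_enforce_coherency(struct iommu_domain *domain)
{
	/* Ask the IOMMU driver to permanently block no-snoop TLPs */
	if (domain->ops->enforce_cache_coherency)
		return domain->ops->enforce_cache_coherency(domain);

	/*
	 * No enforcement available; the caller would fall back to the
	 * kvm_arch_register_noncoherent_dma() path so the guest keeps
	 * a working wbinvd.
	 */
	return false;
}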

> While other iommu drivers are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in doing
> so right now.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/intel_no_snoop
> 
> v2:
>   - Abandon removing IOMMU_CAP_CACHE_COHERENCY - instead make it the cap
>     flag that indicates IOMMU_CACHE is supported
>   - Put the VFIO tests for IOMMU_CACHE at VFIO device registration
>   - In the Intel driver remove the domain->iommu_snooping value; this is
>     global, not per-domain

At a glance, this all looks about the right shape to me now, thanks!

Ideally I'd hope patch #4 could go straight to device_iommu_capable() 
from my Thunderbolt series, but we can figure that out in a couple of 
weeks once Joerg starts queueing 5.19 material. I've got another VFIO 
patch waiting for the DMA ownership series to land anyway, so it's 
hardly the end of the world if I have to come back to follow up on this 
one too.

For the series,

Acked-by: Robin Murphy 

> v1: 
> https://lore.kernel.org/r/0-v1-ef02c60ddb76+12ca2-intel_no_snoop_...@nvidia.com
> 
> Jason Gunthorpe (4):
>   iommu: Introduce the domain op enforce_cache_coherency()
>   vfio: Move the Intel no-snoop control off of IOMMU_CACHE
>   iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for
>     IOMMU_CACHE
>   vfio: Require that devices support DMA cache coherence
> 
>  drivers/iommu/amd/iommu.c   |  7 +++
>  drivers/iommu/intel/iommu.c | 17 +
>  drivers/vfio/vfio.c |  7 +++
>  drivers/vfio/vfio_iommu_type1.c | 30 +++---
>  include/linux/intel-iommu.h |  2 +-
>  include/linux/iommu.h   |  7 +--
>  6 files changed, 52 insertions(+), 18 deletions(-)
> 
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17



[PATCH v6 21/21] nvme-pci: allow mmaping the CMB in userspace

2022-04-07 Thread Logan Gunthorpe
Allow userspace to obtain CMB memory by mmaping the controller's
char device. The mmap call allocates and returns a hunk of CMB memory,
(the offset is ignored) so userspace does not have control over the
address within the CMB.

A VMA allocated in this way will only be usable by drivers that set
FOLL_PCI_P2PDMA when calling GUP, and inter-device support will be
checked the first time the pages are mapped for DMA.

Currently this is only supported by O_DIRECT to a PCI NVMe device
or through the NVMe passthrough IOCTL.
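
As a rough userspace illustration of that flow (the device path, file
name, and sizes are hypothetical, and error handling is trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int nvme = open("/dev/nvme0", O_RDWR);
	int data = open("/mnt/nvme/file", O_RDWR | O_DIRECT);

	/* The offset is ignored; the kernel picks the CMB address */
	void *cmb = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_SHARED, nvme, 0);
	if (cmb == MAP_FAILED)
		return 1;

	/* Use the CMB allocation as an O_DIRECT buffer */
	ssize_t n = pread(data, cmb, 4096, 0);

	munmap(cmb, 4096);
	close(data);
	close(nvme);
	return n == 4096 ? 0 : 1;
}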

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/core.c | 15 +++
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 17 +
 3 files changed, 34 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bbc276dda49f..1fd3372c2c18 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3114,6 +3114,10 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
}
 
file->private_data = ctrl;
+
+   if (ctrl->ops->cdev_file_open)
+   ctrl->ops->cdev_file_open(ctrl, file);
+
return 0;
 }
 
@@ -3127,12 +3131,23 @@ static int nvme_dev_release(struct inode *inode, struct file *file)
return 0;
 }
 
+static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct nvme_ctrl *ctrl = file->private_data;
+
+   if (!ctrl->ops->mmap_cmb)
+   return -ENODEV;
+
+   return ctrl->ops->mmap_cmb(ctrl, vma);
+}
+
 static const struct file_operations nvme_dev_fops = {
.owner  = THIS_MODULE,
.open   = nvme_dev_open,
.release= nvme_dev_release,
.unlocked_ioctl = nvme_dev_ioctl,
.compat_ioctl   = compat_ptr_ioctl,
+   .mmap   = nvme_dev_mmap,
 };
 
 static ssize_t nvme_sysfs_reset(struct device *dev,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7d97bfb2a9e2..24fbcd274c64 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -497,6 +497,8 @@ struct nvme_ctrl_ops {
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
+   void (*cdev_file_open)(struct nvme_ctrl *ctrl, struct file *file);
+   int (*mmap_cmb)(struct nvme_ctrl *ctrl, struct vm_area_struct *vma);
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 07412116d4d1..5946244e0295 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2965,6 +2965,21 @@ static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
return dma_pci_p2pdma_supported(dev->dev);
 }
 
+static void nvme_pci_cdev_file_open(struct nvme_ctrl *ctrl, struct file *file)
+{
+   struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+   pci_p2pdma_file_open(pdev, file);
+}
+
+static int nvme_pci_mmap_cmb(struct nvme_ctrl *ctrl,
+struct vm_area_struct *vma)
+{
+   struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+   return pci_mmap_p2pmem(pdev, vma);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
@@ -2976,6 +2991,8 @@ static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
.supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
+   .cdev_file_open = nvme_pci_cdev_file_open,
+   .mmap_cmb   = nvme_pci_mmap_cmb,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.30.2



[PATCH v6 12/21] RDMA/rw: drop pci_p2pdma_[un]map_sg()

2022-04-07 Thread Logan Gunthorpe
dma_map_sg() now supports the use of P2PDMA pages, so pci_p2pdma_map_sg()
is no longer necessary and may be dropped. This means the
rdma_rw_[un]map_sg() helpers are no longer necessary either. Remove it all.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 drivers/infiniband/core/rw.c | 45 
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 4d98f931a13d..8367974b7998 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -274,33 +274,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg)))
-   pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-   else
-   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sgtable(struct ib_device *dev, struct sg_table *sgt,
-  enum dma_data_direction dir)
-{
-   int nents;
-
-   if (is_pci_p2pdma_page(sg_page(sgt->sgl))) {
-   if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-   return 0;
-   nents = pci_p2pdma_map_sg(dev->dma_device, sgt->sgl,
- sgt->orig_nents, dir);
-   if (!nents)
-   return -EIO;
-   sgt->nents = nents;
-   return 0;
-   }
-   return ib_dma_map_sgtable_attrs(dev, sgt, dir, 0);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:   context to initialize
@@ -327,7 +300,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
};
int ret;
 
-   ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
if (ret)
return ret;
sg_cnt = sgt.nents;
@@ -366,7 +339,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
return ret;
 
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -414,12 +387,12 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
return -EINVAL;
}
 
-   ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
if (ret)
return ret;
 
if (prot_sg_cnt) {
-   ret = rdma_rw_map_sgtable(dev, &prot_sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &prot_sgt, dir, 0);
if (ret)
goto out_unmap_sg;
}
@@ -486,9 +459,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
kfree(ctx->reg);
 out_unmap_prot_sg:
if (prot_sgt.nents)
-   rdma_rw_unmap_sg(dev, prot_sgt.sgl, prot_sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &prot_sgt, dir, 0);
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -621,7 +594,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
break;
}
 
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -649,8 +622,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
kfree(ctx->reg);
 
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
-- 
2.30.2



[PATCH v6 00/21] Userspace P2PDMA with O_DIRECT NVMe devices

2022-04-07 Thread Logan Gunthorpe
Hi,

This patchset continues my work to add userspace P2PDMA access using
O_DIRECT NVMe devices. This posting contains some minor fixes and a
rebase onto v5.18-rc1 which contains cleanup from Christoph around
free_zone_device_page() that helps to enable this patchset. The
previous posting was here[1].

The patchset enables userspace P2PDMA by allowing userspace to mmap()
allocated chunks of the CMB. The resulting VMA can be passed only
to O_DIRECT IO on NVMe backed files or block devices. A flag is added
to GUP() in Patch <>, then Patches <> through <> wire this flag up based
on whether the block queue indicates P2PDMA support. Patches <>
through <> enable the CMB to be mapped into userspace by mmaping
the nvme char device.

This is relatively straightforward, however the one significant
problem is that, presently, pci_p2pdma_map_sg() requires a homogeneous
SGL with all P2PDMA pages or all regular pages. Enhancing GUP to
support enforcing this rule would require a huge hack that I don't
expect would be all that palatable. So the first 13 patches add
support for P2PDMA pages to dma_map_sg[table]() to the dma-direct
and dma-iommu implementations. Thus systems without an IOMMU plus
Intel and AMD IOMMUs are supported. (Other IOMMU implementations would
then be unsupported, notably ARM and PowerPC, but support would be added
when they convert to dma-iommu.)

dma_map_sgtable() is preferred when dealing with P2PDMA memory as it
will return -EREMOTEIO when the DMA device cannot map specific P2PDMA
pages based on the existing rules in calc_map_type_and_dist().
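
For illustration, a caller of the preferred interface can tell the
P2PDMA rejection apart from other failures (a sketch with a made-up
wrapper name, not code from the series):

#include <linux/dma-mapping.h>

static int map_sgtable_maybe_p2p(struct device *dev, struct sg_table *sgt)
{
	int ret = dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);

	/* -EREMOTEIO: these P2PDMA pages cannot reach this device */
	if (ret == -EREMOTEIO)
		dev_dbg(dev, "P2PDMA mapping rejected, caller must fall back\n");

	return ret;	/* 0 on success, negative errno otherwise */
}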

The other issue is dma_unmap_sg() needs a flag to determine whether a
given dma_addr_t was mapped regularly or as a PCI bus address. To allow
this, a third flag is added to the page_link field in struct
scatterlist. This effectively means support for P2PDMA will now depend
on CONFIG_64BIT.

Feedback welcome.

This series is based on v5.18-rc1. A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_user_cmb_v6

Thanks,

Logan

[1] lkml.kernel.org/r/20220128002614.6136-1-log...@deltatee.com

--

Changes since v5:
  - Rebased onto v5.18-rc1 which includes Christoph's cleanup to
free_zone_device_page() (similar to Ralph's patch).
  - Fix bug with concurrent first calls to pci_p2pdma_vma_fault()
that caused a double allocation and lost p2p memory. Noticed
by Andrew Maier.
  - Collected a Reviewed-by tag from Chaitanya.
  - Numerous minor fixes to commit messages

Changes since v4:
  - Rebase onto v5.17-rc1.
  - Included Ralph Cambell's patches which removes the ZONE_DEVICE page
reference count offset. This is just to demonstrate that this
series is compatible with that direction.
  - Added a comment in pci_p2pdma_map_sg_attrs(), per Chaitanya and
included his Reviewed-by tags.
  - Patch 1 in the last series which cleaned up scatterlist.h
has been upstreamed.
  - Dropped NEED_SG_DMA_BUS_ADDR_FLAG seeing depends on doesn't
work with selected symbols, per Christoph.
  - Switched iov_iter_get_pages_[alloc_]flags to be exported with
EXPORT_SYMBOL_GPL, per Christoph.
  - Renamed zone_device_pages_are_mergeable() to
zone_device_pages_have_same_pgmap(), per Christoph.
  - Renamed .mmap_file_open operation in nvme_ctrl_ops to
cdev_file_open(), per Christoph.

Changes since v3:
  - Add some comment and commit message cleanups I had missed for v3,
also moved the prototypes for some of the p2pdma helpers to
dma-map-ops.h (which I missed in v3 and was suggested in v2).
  - Add separate cleanup patch for scatterlist.h and change the macros
to functions. (Suggested by Chaitanya and Jason, respectively)
  - Rename sg_dma_mark_pci_p2pdma() and sg_is_dma_pci_p2pdma() to
sg_dma_mark_bus_address() and sg_is_dma_bus_address() which
is a more generic name (As requested by Jason)
  - Fixes to some comments and commit messages as suggested by Bjorn
and Jason.
  - Ensure swiotlb is not used with P2PDMA pages. (Per Jason)
  - The sgtable coversion in RDMA was split out and sent upstream
separately, the new patch is only the removal. (Per Jason)
  - Moved the FOLL_PCI_P2PDMA check outside of get_dev_pagemap() as
Jason suggested this will be removed in the near term.
  - Add two patches to ensure that zone device pages with different
pgmaps are never merged in the block layer or
sg_alloc_append_table_from_pages() (Per Jason)
  - Ensure synchronize_rcu() or call_rcu() is used before returning
    pages to the genalloc. (Jason pointed out that pages are not
    guaranteed to be unused in all architectures until at least
    after an RCU grace period, and that synchronize_rcu() was likely
    too slow to use in the vma close operation.)
  - Collected Acks and Reviews by Bjorn, Jason and Max.

Logan Gunthorpe (21):
  lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Expose pci_p2pdma_map_ty

[PATCH v6 02/21] PCI/P2PDMA: Attempt to set map_type if it has not been set

2022-04-07 Thread Logan Gunthorpe
Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.

This change will calculate the mapping type if it hasn't been
pre-calculated, so it is no longer invalid to call pci_p2pdma_map_sg()
before the mapping type is calculated; drop the WARN_ON for that case.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/pci/p2pdma.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 30b1df3c9d2f..c3a68e82cf36 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -849,6 +849,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
struct pci_dev *client;
struct pci_p2pdma *p2pdma;
+   int dist;
 
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
@@ -865,6 +866,10 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
type = xa_to_value(xa_load(&p2pdma->map_types,
   map_types_idx(client)));
rcu_read_unlock();
+
+   if (type == PCI_P2PDMA_MAP_UNKNOWN)
+   return calc_map_type_and_dist(provider, client, &dist, true);
+
return type;
 }
 
@@ -907,7 +912,7 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
case PCI_P2PDMA_MAP_BUS_ADDR:
return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
default:
-   WARN_ON_ONCE(1);
+   /* Mapping is not supported */
return 0;
}
 }
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 16/21] block: add check when merging zone device pages

2022-04-07 Thread Logan Gunthorpe
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment.

Add a helper to determine if zone device pages are mergeable and use
it in page_is_mergeable(). The helper returns true either if both
pages are not zone device pages or if both pages are zone device
pages with the same pgmap.

Signed-off-by: Logan Gunthorpe 
---
 block/bio.c|  2 ++
 include/linux/mm.h | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index cdd7b2915c53..3406c0450db3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -834,6 +834,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
return false;
+   if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+   return false;
 
*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
if (*same_page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 14ef41af8b77..fb2264a17e4a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1108,6 +1108,24 @@ static inline bool is_zone_device_page(const struct page *page)
 {
return page_zonenum(page) == ZONE_DEVICE;
 }
+
+/*
+ * Consecutive zone device pages should not be merged into the same sgl
+ * or bvec segment with other types of pages or if they belong to different
+ * pgmaps. Otherwise getting the pgmap of a given segment is not possible
+ * without scanning the entire segment. This helper returns true either if
+ * both pages are not zone device pages or both pages are zone device pages
+ * with the same pgmap.
+ */
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+   if (is_zone_device_page(a) != is_zone_device_page(b))
+   return false;
+   if (!is_zone_device_page(a))
+   return true;
+   return a->pgmap == b->pgmap;
+}
 extern void memmap_init_zone_device(struct zone *, unsigned long,
unsigned long, struct dev_pagemap *);
 #else
@@ -1115,6 +1133,11 @@ static inline bool is_zone_device_page(const struct page *page)
 {
return false;
 }
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+   return true;
+}
 #endif
 
 static inline bool folio_is_zone_device(const struct folio *folio)
-- 
2.30.2



[PATCH v6 14/21] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages

2022-04-07 Thread Logan Gunthorpe
GUP callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
allow obtaining P2PDMA pages. If GUP is called without the flag and a
P2PDMA page is found, it will return an error.

FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set.

Signed-off-by: Logan Gunthorpe 
---
 include/linux/mm.h |  1 +
 mm/gup.c   | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e34edb775334..14ef41af8b77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2936,6 +2936,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
 #define FOLL_PIN   0x4 /* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY 0x8 /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_PCI_P2PDMA0x10 /* allow returning PCI P2PDMA pages */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index f598a037eb04..0af6f802ca38 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -490,6 +490,12 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
page = pte_page(pte);
else
goto no_page;
+
+   if (unlikely(!(flags & FOLL_PCI_P2PDMA) &&
+is_pci_p2pdma_page(page))) {
+   page = ERR_PTR(-EREMOTEIO);
+   goto out;
+   }
} else if (unlikely(!page)) {
if (flags & FOLL_DUMP) {
/* Avoid special (like zero) pages in core dumps */
@@ -919,6 +925,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
return -EOPNOTSUPP;
 
+   if ((gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_PCI_P2PDMA))
+   return -EOPNOTSUPP;
+
if (vma_is_secretmem(vma))
return -EFAULT;
 
@@ -2184,6 +2193,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
 
+   if (unlikely(pte_devmap(pte) && !(flags & FOLL_PCI_P2PDMA) &&
+is_pci_p2pdma_page(page)))
+   goto pte_unmap;
+
folio = try_grab_folio(page, 1, flags);
if (!folio)
goto pte_unmap;
@@ -2258,6 +2271,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
undo_dev_pagemap(nr, nr_start, flags, pages);
break;
}
+
+   if (!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
+   undo_dev_pagemap(nr, nr_start, flags, pages);
+   break;
+   }
+
SetPageReferenced(page);
pages[*nr] = page;
if (unlikely(!try_grab_page(page, flags))) {
@@ -2729,7 +2748,8 @@ static int internal_get_user_pages_fast(unsigned long start,
 
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
   FOLL_FORCE | FOLL_PIN | FOLL_GET |
-  FOLL_FAST_ONLY | FOLL_NOFAULT)))
+  FOLL_FAST_ONLY | FOLL_NOFAULT |
+  FOLL_PCI_P2PDMA)))
return -EINVAL;
 
if (gup_flags & FOLL_PIN)
-- 
2.30.2



[PATCH v6 19/21] block: set FOLL_PCI_P2PDMA in bio_map_user_iov()

2022-04-07 Thread Logan Gunthorpe
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be
passed from userspace and enables the NVMe passthru requests to
use P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
---
 block/blk-map.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index c7f71d83eff1..85baf922a0e8 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -234,6 +234,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
gfp_t gfp_mask)
 {
unsigned int max_sectors = queue_max_hw_sectors(rq->q);
+   unsigned int flags = 0;
struct bio *bio;
int ret;
int j;
@@ -246,13 +247,17 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
return -ENOMEM;
bio->bi_opf |= req_op(rq);
 
+   if (blk_queue_pci_p2pdma(rq->q))
+   flags |= FOLL_PCI_P2PDMA;
+
while (iov_iter_count(iter)) {
struct page **pages;
ssize_t bytes;
size_t offs, added = 0;
int npages;
 
-   bytes = iov_iter_get_pages_alloc(iter, &pages, LONG_MAX, &offs);
+   bytes = iov_iter_get_pages_alloc_flags(iter, &pages, LONG_MAX,
+  &offs, flags);
if (unlikely(bytes <= 0)) {
ret = bytes ? bytes : -EFAULT;
goto out_unmap;
-- 
2.30.2



[PATCH v6 04/21] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

2022-04-07 Thread Logan Gunthorpe
Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes a scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoid needing to lookup into the xarray for
every page.

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Prototypes for these helpers are added to dma-map-ops.h as they are only
useful to dma map implementations and don't need to pollute the public
pci-p2pdma header.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c| 59 +
 include/linux/dma-map-ops.h | 21 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 8573bf9d651a..9032c2ed2cdf 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -946,6 +946,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared outside of the for_each_sg()
+ * loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ *
+ * This is a helper to be used by non-IOMMU dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns the type of mapping used and maps the page if the type is
+ * PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist *sg)
+{
+   if (state->pgmap != sg_page(sg)->pgmap) {
+   state->pgmap = sg_page(sg)->pgmap;
+   state->map = pci_p2pdma_map_type(state->pgmap, dev);
+   state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+   }
+
+   if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
+   sg->dma_address = sg_phys(sg) + state->bus_off;
+   sg_dma_len(sg) = sg->length;
+   sg_dma_mark_bus_address(sg);
+   }
+
+   return state->map;
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment predetermined to
+ * be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a DMA address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the DMA address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+   dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+   sg_dma_len(dma_sg) = pg_sg->length;
+   sg_dma_mark_bus_address(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  * to enable p2pdma
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index d693a0e33bac..752f91e5eb5d 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -413,15 +413,36 @@ enum pci_p2pdma_map_type {
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
 };
 
+struct pci_p2pdma_map_state {
+   struct dev_pagemap *pgmap;
+   int map;
+   u64 bus_off;
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 struct device *dev);
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist *sg);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg);
 #else /* CONFIG_PCI_P2PDMA */
 static inline enum pci_p2pdma_map_type
 pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
 {
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist *

[PATCH v6 01/21] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2022-04-07 Thread Logan Gunthorpe
Make use of the third free LSB in scatterlist's page_link on 64bit systems.

The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
given SGL segment's dma_address points to a PCI bus address.
dma_unmap_sg_p2pdma() will need to perform different cleanup when a
segment is marked as a bus address.

The new bit will only be used when CONFIG_PCI_P2PDMA is set; this means
PCI P2PDMA will require CONFIG_64BIT. This should be acceptable as the
majority of P2PDMA use cases are restricted to newer root complexes and
roughly require the extra address space for memory BARs used in the
transactions.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/pci/Kconfig |  5 +
 include/linux/scatterlist.h | 44 -
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 133c73207782..5cc7cba1941f 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -164,6 +164,11 @@ config PCI_PASID
 config PCI_P2PDMA
bool "PCI peer-to-peer transfer support"
depends on ZONE_DEVICE
+   #
+   # The need for the scatterlist DMA bus address flag means PCI P2PDMA
+   # requires 64bit
+   #
+   depends on 64BIT
select GENERIC_ALLOCATOR
help
  Enables drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 7ff9d6386c12..6561ca8aead8 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -64,12 +64,24 @@ struct sg_append_table {
 #define SG_CHAIN   0x01UL
 #define SG_END 0x02UL
 
+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a
+ * bus address when doing P2PDMA.
+ */
+#ifdef CONFIG_PCI_P2PDMA
+#define SG_DMA_BUS_ADDRESS 0x04UL
+static_assert(__alignof__(struct page) >= 8);
+#else
+#define SG_DMA_BUS_ADDRESS 0x00UL
+#endif
+
 /*
  * We overload the LSB of the page pointer to indicate whether it's
  * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END)
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_BUS_ADDRESS)
 
 static inline unsigned int __sg_flags(struct scatterlist *sg)
 {
@@ -91,6 +103,11 @@ static inline bool sg_is_last(struct scatterlist *sg)
return __sg_flags(sg) & SG_END;
 }
 
+static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
+{
+   return __sg_flags(sg) & SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_assign_page - Assign a given page to an SG entry
  * @sg:SG entry
@@ -245,6 +262,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
sg->page_link &= ~SG_END;
 }
 
+/**
+ * sg_dma_mark_bus_address - Mark the scatterlist entry as a bus address
+ * @sg: SG entry
+ *
+ * Description:
+ *   Marks the passed in sg entry to indicate that the dma_address is
+ *   a bus address and doesn't need to be unmapped.
+ **/
+static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
+{
+   sg->page_link |= SG_DMA_BUS_ADDRESS;
+}
+
+/**
+ * sg_dma_unmark_bus_address - Unmark the scatterlist entry as a bus address
+ * @sg: SG entry
+ *
+ * Description:
+ *   Clears the bus address mark.
+ **/
+static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
+{
+   sg->page_link &= ~SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
-- 
2.30.2


[PATCH v6 07/21] dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support

2022-04-07 Thread Logan Gunthorpe
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/dma-map-ops.h | 10 ++
 include/linux/dma-mapping.h |  5 +
 kernel/dma/mapping.c| 18 ++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 752f91e5eb5d..4d4161d58ce0 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -11,7 +11,17 @@
 
 struct cma;
 
+/*
+ * Values for struct dma_map_ops.flags:
+ *
+ * DMA_F_PCI_P2PDMA_SUPPORTED: Indicates the dma_map_ops implementation can
+ * handle PCI P2PDMA pages in the map_sg/unmap_sg operation.
+ */
+#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0)
+
 struct dma_map_ops {
+   unsigned int flags;
+
void *(*alloc)(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..f7c61b2b4b5e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -140,6 +140,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -250,6 +251,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
 {
return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   return false;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9f65d1041638..21793506fdb6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -722,6 +722,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   /* if ops is not set, dma direct will be used which supports P2PDMA */
+   if (!ops)
+   return true;
+
+   /*
+* Note: dma_ops_bypass is not checked here because P2PDMA should
+* not be used with dma mapping ops that do not have support even
+* if the specific device is bypassing them.
+*/
+
+   return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.30.2



[PATCH v6 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-04-07 Thread Logan Gunthorpe
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through
 the root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
 normally with an IOMMU IOVA.
  3) It is not possible for the two devices to communicate and thus
 the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_bus_address() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 drivers/iommu/dma-iommu.c | 68 +++
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..ef86f2b573d1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1045,6 +1046,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
+   if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -1141,6 +1152,8 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -1176,6 +1189,35 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
 
+   if (is_pci_p2pdma_page(sg_page(s))) {
+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev);
+   }
+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected
+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Mapping through host bridge should be
+* mapped with regular IOVAs, thus we
+* do nothing here and continue below.
+*/
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1198,6 +1240,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
+   if (!iova_len)
+   return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova) {
ret = -ENOMEM;
@@ -1219,7 +1264,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 out_restore_sg:
__invalidate_sg(sg, nents);
 out:
-   if (ret != -ENOMEM)
+   if (ret != -ENOMEM && ret != -EREMOTEIO)
return -EINVAL;
re

[PATCH v6 13/21] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

2022-04-07 Thread Logan Gunthorpe
This interface is superseded by dma_map_sg(), which now handles
heterogeneous scatterlists. There are no longer any users, so remove it.
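
Former callers migrate along these lines (a sketch, not taken from any
particular caller):

	/* Before (removed in this patch): */
	nents = pci_p2pdma_map_sg(dev, sgt->sgl, sgt->orig_nents, dir);

	/* After: the generic sgtable API handles P2PDMA pages itself. */
	ret = dma_map_sgtable(dev, sgt, dir, 0);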

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Max Gurtovoy 
---
 drivers/pci/p2pdma.c   | 66 --
 include/linux/pci-p2pdma.h | 27 
 2 files changed, 93 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 9032c2ed2cdf..4d3cab9da748 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -880,72 +880,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
return type;
 }
 
-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
-   struct device *dev, struct scatterlist *sg, int nents)
-{
-   struct scatterlist *s;
-   int i;
-
-   for_each_sg(sg, s, nents, i) {
-   s->dma_address = sg_phys(s) + p2p_pgmap->bus_offset;
-   sg_dma_len(s) = s->length;
-   }
-
-   return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   struct pci_p2pdma_pagemap *p2p_pgmap =
-   to_p2p_pgmap(sg_page(sg)->pgmap);
-
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
-   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-   return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
-   case PCI_P2PDMA_MAP_BUS_ADDR:
-   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
-   default:
-   /* Mapping is not Supported */
-   return 0;
-   }
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- * mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   enum pci_p2pdma_map_type map_type;
-
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
-
-   if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
-   dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
 /**
  * pci_p2pdma_map_segment - map an sg segment determining the mapping type
  * @state: State structure that should be declared outside of the for_each_sg()
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..2c07aa6b7665 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -30,10 +30,6 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev 
*pdev,
 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
 int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
@@ -83,17 +79,6 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-}
 static inline int pci_p2pdma_enable_store(const char *page,
struct pci_dev **p2p_dev, bool *use_p2pdma)
 {
@@ -119,16 +104,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct 
device *client)
return pci_p2pmem_find_many(&client, 1);
 }
 
-static inline int pci_p2pdma_map_sg(struct device 

[PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-04-07 Thread Logan Gunthorpe
Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
a hunk of p2pmem into userspace.

Pages are allocated from the genalloc in bulk with their reference
count set to one. They are returned to the genalloc when the page is put
through p2pdma_page_free() (the reference count is once again set to 1
in free_zone_device_page()).

The VMA does not take a reference to the pages when they are inserted
with vmf_insert_mixed() (which is necessary for zone device pages), so
the backing P2P memory is stored in a structure in vm_private_data.

A pseudo mount is used to allocate an inode for each PCI device. The
inode's address_space is used in the file doing the mmap so that all
VMAs are collected and can be unmapped if the PCI device is unbound.
After unmapping, the VMAs are iterated through and their pages are
put so the device can continue to be unbound. An active flag is used
to signal to VMAs not to allocate any further P2P memory once the
removal process starts. Concurrent access to the flag is synchronized
with RCU.

The VMAs and inode will survive after the unbind of the device, but no
pages will be present in the VMA and a subsequent access will result
in a SIGBUS error.
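
The intended userspace flow is roughly the sketch below; the fd
providing the mmap is an assumption (it would come from whatever
character device wires its mmap handler to pci_mmap_p2pmem()) and is
not defined by this patch:

	int blk_fd = open("/dev/nvme0n1", O_RDWR | O_DIRECT);
	void *p2p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 p2pmem_fd, 0);	/* p2pmem_fd: hypothetical provider */

	/* The O_DIRECT read DMAs straight into the device's P2P memory. */
	if (p2p != MAP_FAILED)
		read(blk_fd, p2p, len);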

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c   | 340 -
 include/linux/pci-p2pdma.h |  11 ++
 include/uapi/linux/magic.h |   1 +
 3 files changed, 350 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 4d3cab9da748..cce4c7b6dd75 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -17,14 +17,19 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
struct xarray map_types;
+   struct inode *inode;
+   bool active;
 };
 
 struct pci_p2pdma_pagemap {
@@ -33,6 +38,17 @@ struct pci_p2pdma_pagemap {
u64 bus_offset;
 };
 
+struct pci_p2pdma_map {
+   struct kref ref;
+   struct rcu_head rcu;
+   struct pci_dev *pdev;
+   struct inode *inode;
+   size_t len;
+
+   spinlock_t kaddr_lock;
+   void *kaddr;
+};
+
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
 {
return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
@@ -101,6 +117,38 @@ static const struct attribute_group p2pmem_group = {
.name = "p2pmem",
 };
 
+/*
+ * P2PDMA internal mount
+ * Fake an internal VFS mount-point in order to allocate struct address_space
+ * mappings to remove VMAs on unbind events.
+ */
+static int pci_p2pdma_fs_cnt;
+static struct vfsmount *pci_p2pdma_fs_mnt;
+
+static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
+{
+   return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type pci_p2pdma_fs_type = {
+   .name = "p2dma",
+   .owner = THIS_MODULE,
+   .init_fs_context = pci_p2pdma_fs_init_fs_context,
+   .kill_sb = kill_anon_super,
+};
+
+static void p2pdma_page_free(struct page *page)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
+
+   gen_pool_free(pgmap->provider->p2pdma->pool,
+ (uintptr_t)page_to_virt(page), PAGE_SIZE);
+}
+
+static const struct dev_pagemap_ops p2pdma_pgmap_ops = {
+   .page_free = p2pdma_page_free,
+};
+
 static void pci_p2pdma_release(void *data)
 {
struct pci_dev *pdev = data;
@@ -117,6 +165,9 @@ static void pci_p2pdma_release(void *data)
gen_pool_destroy(p2pdma->pool);
sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
xa_destroy(&p2pdma->map_types);
+
+   iput(p2pdma->inode);
+   simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 }
 
 static int pci_p2pdma_setup(struct pci_dev *pdev)
@@ -134,17 +185,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
if (!p2p->pool)
goto out;
 
-   error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+   error = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
+ &pci_p2pdma_fs_cnt);
if (error)
goto out_pool_destroy;
 
+   p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
+   if (IS_ERR(p2p->inode)) {
+   error = -ENOMEM;
+   goto out_unpin_fs;
+   }
+
+   error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+   if (error)
+   goto out_put_inode;
+
error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
if (error)
-   goto out_pool_destroy;
+   goto out_put_inode;
 
rcu_assign_pointer(pdev->p2pdma, p2p);
return 0;
 
+out_put_inode:
+   iput(p2p->inode);
+out_unpin_fs:
+   simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 out_pool_destroy:
gen_pool_destroy(p2p->pool);
 out:
@@ -152,6 +218,

[PATCH v6 18/21] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()

2022-04-07 Thread Logan Gunthorpe
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be passed
from userspace and enables the O_DIRECT path in iomap based filesystems
and direct to block devices.

Signed-off-by: Logan Gunthorpe 
---
 block/bio.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 3406c0450db3..271a720a6dc1 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1149,6 +1149,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
bool same_page = false;
+   unsigned int flags = 0;
ssize_t size, left;
unsigned len, i;
size_t offset;
@@ -1161,7 +1162,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-   size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+   if (bio->bi_bdev && bio->bi_bdev->bd_disk &&
+   blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
+   flags |= FOLL_PCI_P2PDMA;
+
+   size = iov_iter_get_pages_flags(iter, pages, LONG_MAX, nr_pages,
+   &offset, flags);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
 
-- 
2.30.2


[PATCH v6 15/21] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()

2022-04-07 Thread Logan Gunthorpe
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
which take a flags argument that is passed to get_user_pages_fast().

This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
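
Callers pass the extra gup flags straight through, e.g. (this mirrors
the block layer caller added later in this series):

	size = iov_iter_get_pages_flags(iter, pages, LONG_MAX, nr_pages,
					&offset, FOLL_PCI_P2PDMA);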

Signed-off-by: Logan Gunthorpe 
---
 include/linux/uio.h |  6 ++
 lib/iov_iter.c  | 25 +++--
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..ddf9e4cf4a59 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -232,8 +232,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int 
direction, struct pipe_inode
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t 
count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray 
*xarray,
 loff_t start, size_t count);
+ssize_t iov_iter_get_pages_flags(struct iov_iter *i, struct page **pages,
+   size_t maxsize, unsigned maxpages, size_t *start,
+   unsigned int gup_flags);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
+ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i,
+   struct page ***pages, size_t maxsize, size_t *start,
+   unsigned int gup_flags);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..9bf6e3af5120 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1515,9 +1515,9 @@ static struct page *first_bvec_segment(const struct 
iov_iter *i,
return page;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
+ssize_t iov_iter_get_pages_flags(struct iov_iter *i,
   struct page **pages, size_t maxsize, unsigned maxpages,
-  size_t *start)
+  size_t *start, unsigned int gup_flags)
 {
size_t len;
int n, res;
@@ -1528,7 +1528,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
return 0;
 
if (likely(iter_is_iovec(i))) {
-   unsigned int gup_flags = 0;
unsigned long addr;
 
if (iov_iter_rw(i) != WRITE)
@@ -1558,6 +1557,13 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
return iter_xarray_get_pages(i, pages, maxsize, maxpages, 
start);
return -EFAULT;
 }
+EXPORT_SYMBOL_GPL(iov_iter_get_pages_flags);
+
+ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
+  size_t maxsize, unsigned maxpages, size_t *start)
+{
+   return iov_iter_get_pages_flags(i, pages, maxsize, maxpages, start, 0);
+}
 EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
@@ -1640,9 +1646,9 @@ static ssize_t iter_xarray_get_pages_alloc(struct 
iov_iter *i,
return actual;
 }
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i,
   struct page ***pages, size_t maxsize,
-  size_t *start)
+  size_t *start, unsigned int gup_flags)
 {
struct page **p;
size_t len;
@@ -1654,7 +1660,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
return 0;
 
if (likely(iter_is_iovec(i))) {
-   unsigned int gup_flags = 0;
unsigned long addr;
 
if (iov_iter_rw(i) != WRITE)
@@ -1667,6 +1672,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
p = get_pages_array(n);
if (!p)
return -ENOMEM;
+
res = get_user_pages_fast(addr, n, gup_flags, p);
if (unlikely(res <= 0)) {
kvfree(p);
@@ -1694,6 +1700,13 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
return -EFAULT;
 }
+EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc_flags);
+
+ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
+size_t maxsize, size_t *start)
+{
+   return iov_iter_get_pages_alloc_flags(i, pages, maxsize, start, 0);
+}
 EXPORT_SYMBOL(iov_iter_get_pages_alloc);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
-- 
2.30.2


[PATCH v6 03/21] PCI/P2PDMA: Expose pci_p2pdma_map_type()

2022-04-07 Thread Logan Gunthorpe
pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it will need to determine the mapping type
ahead of actually doing the mapping to create the actual IOMMU mapping.

Prototypes for this helper are added to dma-map-ops.h as they are only
useful to dma map implementations and don't need to pollute the public
pci-p2pdma header.
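
A .map_sg() implementation is expected to use the helper roughly as
follows (a sketch; the dma-iommu patch elsewhere in this series takes
this shape):

	/* Classify a P2PDMA segment before deciding how to map it. */
	if (is_pci_p2pdma_page(sg_page(sg))) {
		switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
		case PCI_P2PDMA_MAP_BUS_ADDR:
			/* use the PCI bus address, skip IOVA setup */
			break;
		case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
			/* map normally through the host bridge */
			break;
		default:
			return -EREMOTEIO;	/* transfer not possible */
		}
	}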

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/pci/p2pdma.c| 25 +
 include/linux/dma-map-ops.h | 45 +
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index c3a68e82cf36..8573bf9d651a 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -10,6 +10,7 @@
 
 #define pr_fmt(fmt) "pci-p2pdma: " fmt
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -20,13 +21,6 @@
 #include 
 #include 
 
-enum pci_p2pdma_map_type {
-   PCI_P2PDMA_MAP_UNKNOWN = 0,
-   PCI_P2PDMA_MAP_NOT_SUPPORTED,
-   PCI_P2PDMA_MAP_BUS_ADDR,
-   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
@@ -842,8 +836,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-   struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ * a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ *
+ * Returns one of:
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
+ * using the CPU physical address (in dma-direct) or an IOVA
+ * mapping for the IOMMU.
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+struct device *dev)
 {
enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..d693a0e33bac 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -379,4 +379,49 @@ static inline void debug_dma_dump_mappings(struct device 
*dev)
 
 extern const struct dma_map_ops dma_dummy_ops;
 
+enum pci_p2pdma_map_type {
+   /*
+* PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
+* type hasn't been calculated yet. Functions that return this enum
+* never return this value.
+*/
+   PCI_P2PDMA_MAP_UNKNOWN = 0,
+
+   /*
+* PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
+* traverse the host bridge and the host bridge is not in the
+* allowlist. DMA Mapping routines should return an error when
+* this is returned.
+*/
+   PCI_P2PDMA_MAP_NOT_SUPPORTED,
+
+   /*
+* PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to
+* each other directly through a PCI switch and the transaction will
+* not traverse the host bridge. Such a mapping should program
+* the DMA engine with PCI bus addresses.
+*/
+   PCI_P2PDMA_MAP_BUS_ADDR,
+
+   /*
+* PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
+* to each other, but the transaction traverses a host bridge on the
+* allowlist. In this case, a normal mapping either with CPU physical
+* addresses (in the case of dma-direct) or IOVA addresses (in the
+* case of IOMMUs) should be used to program the DMA engine.
+*/
+   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
+#ifdef CONFIG_PCI_P2PDMA
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+struct device *dev);
+#else /* CONFIG_PCI_P2PDMA */
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
+{
+   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #endif /* _LINUX_DMA_MAP_OPS_H */
-- 
2.30.2


[PATCH v6 10/21] nvme-pci: convert to using dma_map_sgtable()

2022-04-07 Thread Logan Gunthorpe
The dma_map operations now support P2PDMA pages directly. So remove
the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
to dma_map_sgtable().

dma_map_sgtable() returns more complete error codes than dma_map_sg()
and allows differentiating -EREMOTEIO errors when an unsupported
P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET
so the request isn't retried.
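
The mapping-path translation described above is roughly (a sketch; the
corresponding hunk is not visible in the truncated diff below):

	rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
			     DMA_ATTR_NO_WARN);
	if (rc) {
		if (rc == -EREMOTEIO)
			return BLK_STS_TARGET;	/* unsupported P2PDMA */
		return BLK_STS_RESOURCE;
	}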

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/nvme/host/pci.c | 69 +
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fec4c7191310..07412116d4d1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -230,11 +230,10 @@ struct nvme_iod {
bool use_sgl;
int aborted;
int npages; /* In the PRP list. 0 means small pool in use */
-   int nents;  /* Used in scatterlist */
dma_addr_t first_dma;
unsigned int dma_len;   /* length of single DMA segment mapping */
dma_addr_t meta_dma;
-   struct scatterlist *sg;
+   struct sg_table sgt;
 };
 
 static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
@@ -524,7 +523,7 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 static void **nvme_pci_iod_list(struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-   return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
+   return (void **)(iod->sgt.sgl + blk_rq_nr_phys_segments(req));
 }
 
 static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
@@ -576,17 +575,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
}
 }
 
-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
-   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-   rq_dma_dir(req));
-   else
-   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -597,9 +585,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
return;
}
 
-   WARN_ON_ONCE(!iod->nents);
+   WARN_ON_ONCE(!iod->sgt.nents);
+
+   dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
 
-   nvme_unmap_sg(dev, req);
if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
  iod->first_dma);
@@ -607,7 +596,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
nvme_free_sgls(dev, req);
else
nvme_free_prps(dev, req);
-   mempool_free(iod->sg, dev->iod_mempool);
+   mempool_free(iod->sgt.sgl, dev->iod_mempool);
 }
 
 static void nvme_print_sgl(struct scatterlist *sgl, int nents)
@@ -630,7 +619,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
int length = blk_rq_payload_bytes(req);
-   struct scatterlist *sg = iod->sg;
+   struct scatterlist *sg = iod->sgt.sgl;
int dma_len = sg_dma_len(sg);
u64 dma_addr = sg_dma_address(sg);
int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
@@ -703,16 +692,16 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
dma_len = sg_dma_len(sg);
}
 done:
-   cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+   cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl));
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
return BLK_STS_OK;
 free_prps:
nvme_free_prps(dev, req);
return BLK_STS_RESOURCE;
 bad_sgl:
-   WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents),
+   WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
"Invalid SGL for payload:%d nents:%d\n",
-   blk_rq_payload_bytes(req), iod->nents);
+   blk_rq_payload_bytes(req), iod->sgt.nents);
return BLK_STS_IOERR;
 }
 
@@ -738,12 +727,13 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc 
*sge,
 }
 
 static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
-   struct request *req, struct nvme_rw_command *cmd, int entries)
+   struct request *req, struct nvme_rw_command *cmd)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
struct nvme_sgl_desc *sg_list;
-   struct scatterlist *sg = iod->sg;
+   struct scatterlist *sg = iod->sgt.sgl;
+   unsigned int entries = iod->sgt.nents;
dma_addr_t sgl_dma;
int i = 0;
 

[PATCH v6 05/21] dma-mapping: allow EREMOTEIO return code for P2PDMA transfers

2022-04-07 Thread Logan Gunthorpe
Add an -EREMOTEIO error return to dma_map_sgtable(), which will be used
by .map_sg() implementations that detect P2PDMA pages that the
underlying DMA device cannot access.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 kernel/dma/mapping.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b74..9f65d1041638 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -197,7 +197,7 @@ static int __dma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
if (ents > 0)
debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
- ents != -EIO))
+ ents != -EIO && ents != -EREMOTEIO))
return -EIO;
 
return ents;
@@ -255,6 +255,8 @@ EXPORT_SYMBOL(dma_map_sg_attrs);
  * complete the mapping. Should succeed if retried later.
  *   -EIO  Legacy error code with an unknown meaning. eg. this is
  * returned if a lower level call returned DMA_MAPPING_ERROR.
+ *   -EREMOTEIOThe DMA device cannot access P2PDMA memory specified in
+ * the sg_table. This will not succeed if retried.
  */
 int dma_map_sgtable(struct device *dev, struct sg_table *sgt,
enum dma_data_direction dir, unsigned long attrs)
-- 
2.30.2


[PATCH v6 06/21] dma-direct: support PCI P2PDMA pages in dma-direct map_sg

2022-04-07 Thread Logan Gunthorpe
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through the
 root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
 normally, as though it were a CPU physical address
  3) It is not possible for the two devices to communicate and thus
 the mapping operation should fail (and it will return -EREMOTEIO).

SGL segments that contain PCI bus addresses are marked with
sg_dma_mark_bus_address() and are ignored when unmapped.

P2PDMA mappings also fail if swiotlb needs to be used on the
mapping.
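
For example, a single request may now mix host and P2P memory in one
scatterlist and map it with one call (a sketch; the pages and sgt/sgl
variables are placeholders, with sgt.sgl == sgl):

	sg_init_table(sgl, 2);
	sg_set_page(&sgl[0], host_page, PAGE_SIZE, 0);	/* regular RAM */
	sg_set_page(&sgl[1], p2p_page, PAGE_SIZE, 0);	/* pci_alloc_p2pmem() */

	ret = dma_map_sgtable(dev, &sgt, DMA_TO_DEVICE, 0);
	if (ret == -EREMOTEIO)
		return ret;	/* this device pair cannot do P2PDMA */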

Signed-off-by: Logan Gunthorpe 
---
 kernel/dma/direct.c | 43 +--
 kernel/dma/direct.h |  8 +++-
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 9743c6ccce1a..33b838a3ccb2 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -461,29 +461,60 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
 }
 
+/*
+ * Unmaps segments, except for ones marked as pci_p2pdma which do not
+ * require any further action as they contain a bus address.
+ */
 void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgl, sg, nents, i)
-   dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
-attrs);
+   for_each_sg(sgl, sg, nents, i) {
+   if (sg_is_dma_bus_address(sg))
+   sg_dma_unmark_bus_address(sg);
+   else
+   dma_direct_unmap_page(dev, sg->dma_address,
+ sg_dma_len(sg), dir, attrs);
+   }
 }
 #endif
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
 {
-   int i;
+   struct pci_p2pdma_map_state p2pdma_state = {};
+   enum pci_p2pdma_map_type map;
struct scatterlist *sg;
+   int i, ret;
 
for_each_sg(sgl, sg, nents, i) {
+   if (is_pci_p2pdma_page(sg_page(sg))) {
+   map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg);
+   switch (map) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Any P2P mapping that traverses the PCI
+* host bridge must be mapped with CPU physical
+* address and not PCI bus addresses. This is
+* done with dma_direct_map_page() below.
+*/
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_unmap;
+   }
+   }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
-   if (sg->dma_address == DMA_MAPPING_ERROR)
+   if (sg->dma_address == DMA_MAPPING_ERROR) {
+   ret = -EIO;
goto out_unmap;
+   }
sg_dma_len(sg) = sg->length;
}
 
@@ -491,7 +522,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
 
 out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-   return -EIO;
+   return ret;
 }
 
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 4632b0f4f72e..81b213409ce8 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -8,6 +8,7 @@
 #define _KERNEL_DMA_DIRECT_H
 
 #include 
+#include 
 
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
@@ -87,10 +88,15 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (is_swiotlb_force_bounce(dev))
+   if (is_swiotlb_force_bounce(dev)) {
+   if (is_pci_p2pdma_page(page))
+   return DMA_MAPPING_ERROR;
return swiotlb_map(dev, phys, size, dir, attrs);
+   }
 
if (unlikely(!dma_capab

[PATCH v6 17/21] lib/scatterlist: add check when merging zone device pages

2022-04-07 Thread Logan Gunthorpe
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment. The helper returns true if both
pages are not zone device pages, or if both are zone device pages with
the same pgmap.

Factor out the check for page mergeability into a pages_are_mergeable()
helper and add a check using zone_device_pages_have_same_pgmap().
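
zone_device_pages_have_same_pgmap() is introduced by an earlier patch
in this series (not shown here); conceptually it amounts to (a sketch):

	static inline bool zone_device_pages_have_same_pgmap(
			const struct page *a, const struct page *b)
	{
		if (is_zone_device_page(a) != is_zone_device_page(b))
			return false;
		if (!is_zone_device_page(a))
			return true;
		return a->pgmap == b->pgmap;
	}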

Signed-off-by: Logan Gunthorpe 
---
 lib/scatterlist.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index d5e82e4a57ad..af53a0984f76 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -410,6 +410,15 @@ static struct scatterlist *get_next_sg(struct 
sg_append_table *table,
return new_sg;
 }
 
+static bool pages_are_mergeable(struct page *a, struct page *b)
+{
+   if (page_to_pfn(a) != page_to_pfn(b) + 1)
+   return false;
+   if (!zone_device_pages_have_same_pgmap(a, b))
+   return false;
+   return true;
+}
+
 /**
  * sg_alloc_append_table_from_pages - Allocate and initialize an append sg
  *table from an array of pages
@@ -447,6 +456,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
unsigned int chunks, cur_page, seg_len, i, prv_len = 0;
unsigned int added_nents = 0;
struct scatterlist *s = sgt_append->prv;
+   struct page *last_pg;
 
/*
 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
@@ -460,21 +470,17 @@ int sg_alloc_append_table_from_pages(struct 
sg_append_table *sgt_append,
return -EOPNOTSUPP;
 
if (sgt_append->prv) {
-   unsigned long paddr =
-   (page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE +
-sgt_append->prv->offset + sgt_append->prv->length) /
-   PAGE_SIZE;
-
if (WARN_ON(offset))
return -EINVAL;
 
/* Merge contiguous pages into the last SG */
prv_len = sgt_append->prv->length;
-   while (n_pages && page_to_pfn(pages[0]) == paddr) {
+   last_pg = sg_page(sgt_append->prv);
+   while (n_pages && pages_are_mergeable(last_pg, pages[0])) {
if (sgt_append->prv->length + PAGE_SIZE > max_segment)
break;
sgt_append->prv->length += PAGE_SIZE;
-   paddr++;
+   last_pg = pages[0];
pages++;
n_pages--;
}
@@ -488,7 +494,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
for (i = 1; i < n_pages; i++) {
seg_len += PAGE_SIZE;
if (seg_len >= max_segment ||
-   page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
+   !pages_are_mergeable(pages[i], pages[i - 1])) {
chunks++;
seg_len = 0;
}
@@ -504,8 +510,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
for (j = cur_page + 1; j < n_pages; j++) {
seg_len += PAGE_SIZE;
if (seg_len >= max_segment ||
-   page_to_pfn(pages[j]) !=
-   page_to_pfn(pages[j - 1]) + 1)
+   !pages_are_mergeable(pages[j], pages[j - 1]))
break;
}
 
-- 
2.30.2


[PATCH v6 11/21] RDMA/core: introduce ib_dma_pci_p2p_dma_supported()

2022-04-07 Thread Logan Gunthorpe
Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
if a given ib_device can be used in P2PDMA transfers. This ensures
the ib_device is not using virt_dma and also that the underlying
dma_device supports P2PDMA.

Use the new helper in nvme-rdma to replace the existing check for
ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
switching away from pci_p2pdma_[un]map_sg().

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Max Gurtovoy 
---
 drivers/nvme/target/rdma.c |  2 +-
 include/rdma/ib_verbs.h| 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 2fab0b219b25..12258f87ccc8 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -415,7 +415,7 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device 
*ndev,
if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
goto out_free_rsp;
 
-   if (!ib_uses_virt_dma(ndev->device))
+   if (ib_dma_pci_p2p_dma_supported(ndev->device))
r->req.p2p_client = &ndev->device->dev;
r->send_sge.length = sizeof(*r->req.cqe);
r->send_sge.lkey = ndev->pd->local_dma_lkey;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 69d883f7fb41..79609ab73014 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4003,6 +4003,17 @@ static inline bool ib_uses_virt_dma(struct ib_device 
*dev)
return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
 }
 
+/*
+ * Check if an IB device's underlying DMA mapping supports P2PDMA transfers.
+ */
+static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
+{
+   if (ib_uses_virt_dma(dev))
+   return false;
+
+   return dma_pci_p2pdma_supported(dev->dma_device);
+}
+
 /**
  * ib_dma_mapping_error - check a DMA addr for error
  * @dev: The device for which the dma_addr was created
-- 
2.30.2


[PATCH v6 09/21] nvme-pci: check DMA ops when indicating support for PCI P2PDMA

2022-04-07 Thread Logan Gunthorpe
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 11 +--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index efb85c6d8e2d..bbc276dda49f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3912,7 +3912,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
-   if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+   if (ctrl->ops->supports_pci_p2pdma &&
+   ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
 
ns->ctrl = ctrl;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1393bbf82d71..7d97bfb2a9e2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -489,7 +489,6 @@ struct nvme_ctrl_ops {
unsigned int flags;
 #define NVME_F_FABRICS (1 << 0)
 #define NVME_F_METADATA_SUPPORTED  (1 << 1)
-#define NVME_F_PCI_P2PDMA  (1 << 2)
int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -497,6 +496,7 @@ struct nvme_ctrl_ops {
void (*submit_async_event)(struct nvme_ctrl *ctrl);
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+   bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d817ca17463e..fec4c7191310 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2969,17 +2969,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, 
char *buf, int size)
return snprintf(buf, size, "%s\n", dev_name(&pdev->dev));
 }
 
+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+   struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+   return dma_pci_p2pdma_supported(dev->dev);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
-   .flags  = NVME_F_METADATA_SUPPORTED |
- NVME_F_PCI_P2PDMA,
+   .flags  = NVME_F_METADATA_SUPPORTED,
.reg_read32 = nvme_pci_reg_read32,
.reg_write32= nvme_pci_reg_write32,
.reg_read64 = nvme_pci_reg_read64,
.free_ctrl  = nvme_pci_free_ctrl,
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
+   .supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.30.2


Re: [PATCH net-next] sfc: Stop using iommu_present()

2022-04-07 Thread Edward Cree
On 05/04/2022 14:40, Robin Murphy wrote:
> Even if an IOMMU might be present for some PCI segment in the system,
> that doesn't necessarily mean it provides translation for the device
> we care about. It appears that what we care about here is specifically
> whether DMA mapping ops involve any IOMMU overhead or not, so check for
> translation actually being active for our device.
> 
> Signed-off-by: Robin Murphy 
Acked-by: Edward Cree 

Re: [PATCH v9 00/11] ACPI/IORT: Support for IORT RMR node

2022-04-07 Thread Steven Price
Hi Shameer,

I've given it a spin on my Juno board and the EFIFB works fine with it.
However I am getting a warning:

ACPI: IORT: [Firmware Bug]: RMR descriptor[0xf80d] with zero length,
continue anyway

Which on examination of the IORT table is correct - the firmware does
indeed seem to have a bug and the length in the IORT table is 0;
hopefully I can get that fixed. However, since it "all works", that points
out that the reserved memory region isn't actually used. Instead the
existing entries from the SMMU are reused (even though they don't match
the address/size region in the RMR).

I'm not sure if that matters but I thought it worth pointing out, as
it's not currently spelt out that the RMR descriptor's content is
actually ignored.

Anyway, FWIW:

Tested-by: Steven Price 

Steve

On 04/04/2022 13:41, Shameer Kolothum wrote:
> Hi
> 
> v8 --> v9
>  - Adressed comments from Robin on interfaces as discussed here[0].
>  - Addressed comments from Lorenzo.
>  
> Though functionally there aren't any major changes, the interfaces have
> changed from v8, so I have not included the T-by tags from
> Steve and Eric yet (many thanks for those). I'd appreciate it if you could
> give this a spin and let me know.
> 
> (The revised ACPICA pull request for IORT E.d related changes is
> here[1] and this is now merged to acpica:master.)
> 
> Please take a look and let me know your thoughts.
> 
> Thanks,
> Shameer
> [0] 
> https://lore.kernel.org/linux-arm-kernel/c982f1d7-c565-769a-abae-79c962969...@arm.com/
> [1] https://github.com/acpica/acpica/pull/765
> 
> From old:
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these
> controllers make use of host memory for various caching related
> purposes and when SMMU is enabled the iMR firmware fails to
> access these memory regions as there is no mapping for them.
> IORT RMR provides a way for UEFI to describe and report these
> memory regions so that the kernel can make a unity mapping for
> these in SMMU.
> 
> Change History:
> 
> v7 --> v8
>   - Patch #1 has temp definitions for RMR related changes till
> the ACPICA header changes are part of kernel.
>   - No early parsing of RMR node info and is only parsed at the
> time of use.
>   - Changes to the RMR get/put API format compared to the
> previous version.
>   - Support for RMR descriptor shared by multiple stream IDs.
> 
> v6 --> v7
>  -fix pointed out by Steve to the SMMUv2 SMR bypass install in patch #8.
> 
> v5 --> v6
> - Addressed comments from Robin & Lorenzo.
>   : Moved iort_parse_rmr() to acpi_iort_init() from
> iort_init_platform_devices().
>   : Removed use of struct iort_rmr_entry during the initial
> parse. Using struct iommu_resv_region instead.
>   : Report RMR address alignment and overlap errors, but continue.
>   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
>   on Type of RMR region. Suggested by Jon N.
> 
> v4 --> v5
>  -Added a fw_data union to struct iommu_resv_region and removed
>   struct iommu_rmr (Based on comments from Joerg/Robin).
>  -Added iommu_put_rmrs() to release mem.
>  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
>   yet because of the above changes.
> 
> v3 -->v4
> -Included the SMMUv2 SMR bypass install changes suggested by
>  Steve(patch #7)
> -As per Robin's comments, RMR reserve implementation is now
>  more generic  (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1
> 
> RFC v2 --> v3
>  -Dropped RFC tag as the ACPICA header changes are now ready to be
>   part of 5.13[0]. But this series still has a dependency on that patch.
>  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
>   PCIe).
>  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>   discussion here[1].
>  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> 
> Jon Nettleton (1):
>   iommu/arm-smmu: Get associated RMR info and install bypass SMR
> 
> Shameer Kolothum (10):
>   ACPI/IORT: Add temporary RMR node flag definitions
>   iommu: Introduce a union to struct iommu_resv_region
>   ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void
>   ACPI/IORT: Provide a generic helper to retrieve reserve regions
>   iommu/dma: Introduce a helper to remove reserved regions
>   ACPI/IORT: Add support to retrieve IORT RMR reserved regions
>   ACPI/IORT: Add a helper to retrieve RMR info directly
>   iommu/arm-smmu-v3: Introduce strtab init helper
>   iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force
> bypass
>   iommu/arm-smmu-v3: Get associated RMR info and install bypass STE
> 
>  drivers/acpi/arm64/iort.c   | 369 ++--
>  drivers/iommu/apple-dart.c  |   2 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  80 -
>  drivers/i

Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
>> My take is that the drivers using this API are doing it to make sure
>> their HW blocks are setup in a way that is consistent with the DMA API
>> they are also using, and run in constrained embedded-style
>> environments that know the firmware support is present.
>>
>> So in the end it does not seem suitable right now for linking to
>> IOMMU_CACHE..
>
> That seems a pretty good summary - I think they're basically all "firmware 
> told Linux I'm coherent so I'd better act coherent" cases, but that still 
> doesn't necessarily mean that they're *forced* to respect that.

Yes. And the interface is horribly misnamed for that.  I'll see what
I can do to clean this up as I've noticed various other not very
nice things in that area.

> One of the 
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force 
> DMA cache maintenance for coherent devices, primarily to hook up in 
> Panfrost (where there is a bit of a performance to claw back on the 
> coherent AmLogic SoCs by leaving certain buffers non-cacheable).

This has been an explicit request from the amdgpu folks and has thus been
on my TODO list for quite a while as well.  Note that I don't think it
should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages
as the drivers that want non-snoop generally also want to actually
be able to deal with pages.

[PATCH v2 3/4] iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE

2022-04-07 Thread Jason Gunthorpe via iommu
While the comment was correct that this flag was intended to convey the
IOMMU's ability to block no-snoop, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.

Now that the Intel driver is using enforce_cache_coherency(), update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE.  Fix the Intel driver to return true since IOMMU_CACHE always
works.

The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver-controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.
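
For reference, the check those users make is essentially (a sketch;
dev is a placeholder for the device being assigned):

	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY))
		return -EPERM;	/* can't guarantee coherent DMA */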

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/intel/iommu.c | 2 +-
 include/linux/iommu.h   | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8f3674e997df06..14ba185175e9ec 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4556,7 +4556,7 @@ static bool intel_iommu_enforce_cache_coherency(struct 
iommu_domain *domain)
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
-   return domain_update_iommu_snooping(NULL);
+   return true;
if (cap == IOMMU_CAP_INTR_REMAP)
return irq_remapping_enabled == 1;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fe4f24c469c373..fd58f7adc52796 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -103,8 +103,7 @@ static inline bool iommu_is_dma_domain(struct iommu_domain 
*domain)
 }
 
 enum iommu_cap {
-   IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
-  transactions */
+   IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU_CACHE is supported */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
 };
-- 
2.35.1


[PATCH v2 2/4] vfio: Move the Intel no-snoop control off of IOMMU_CACHE

2022-04-07 Thread Jason Gunthorpe via iommu
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache
coherent" and is used by the DMA API. The definition allows for special
non-coherent DMA to exist - i.e. processing of the no-snoop flag in PCIe
TLPs - so long as this behavior is opt-in by the device driver.

The flag is mainly used by the DMA API to synchronize the IOMMU setting
with the expected cache behavior of the DMA master. eg based on
dev_is_dma_coherent() in some case.

For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be
cache coherent' which has the practical effect of causing the IOMMU to
ignore the no-snoop bit in a PCIe TLP.

x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag.

Instead use the new domain op enforce_cache_coherency() which causes every
IOPTE created in the domain to have the no-snoop blocking behavior.

Reconfigure VFIO to always use IOMMU_CACHE and call
enforce_cache_coherency() to operate the special Intel behavior.

Remove the IOMMU_CACHE test from Intel IOMMU.

Ultimately VFIO plumbs the result of enforce_cache_coherency() back into
the x86 platform code through kvm_arch_register_noncoherent_dma() which
controls if the WBINVD instruction is available in the guest. No other
arch implements kvm_arch_register_noncoherent_dma().
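
The attach-time call that sets the new vfio_domain bit (the bit is
visible in the struct hunk below; the hunk with the call itself is cut
off in this archive) is along these lines (a sketch):

	domain->enforce_cache_coherency =
		iommu_enforce_cache_coherency(domain->domain);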

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/intel/iommu.c |  7 ++-
 drivers/vfio/vfio_iommu_type1.c | 30 +++---
 include/linux/intel-iommu.h |  1 -
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f08611a6cc4799..8f3674e997df06 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -641,7 +641,6 @@ static unsigned long domain_super_pgsize_bitmap(struct 
dmar_domain *domain)
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
domain_update_iommu_coherency(domain);
-   domain->iommu_snooping = domain_update_iommu_snooping(NULL);
domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
 
/*
@@ -4283,7 +4282,6 @@ static int md_domain_init(struct dmar_domain *domain, int 
guest_width)
domain->agaw = width_to_agaw(adjust_width);
 
domain->iommu_coherency = false;
-   domain->iommu_snooping = false;
domain->iommu_superpage = 0;
domain->max_addr = 0;
 
@@ -4422,8 +4420,7 @@ static int intel_iommu_map(struct iommu_domain *domain,
prot |= DMA_PTE_READ;
if (iommu_prot & IOMMU_WRITE)
prot |= DMA_PTE_WRITE;
-   if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) ||
-   dmar_domain->enforce_no_snoop)
+   if (dmar_domain->enforce_no_snoop)
prot |= DMA_PTE_SNP;
 
max_addr = iova + size;
@@ -4550,7 +4547,7 @@ static bool intel_iommu_enforce_cache_coherency(struct 
iommu_domain *domain)
 {
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 
-   if (!dmar_domain->iommu_snooping)
+   if (!domain_update_iommu_snooping(NULL))
return false;
dmar_domain->enforce_no_snoop = true;
return true;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9394aa9444c10c..c13b9290e35759 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -84,8 +84,8 @@ struct vfio_domain {
struct iommu_domain *domain;
struct list_headnext;
struct list_headgroup_list;
-   int prot;   /* IOMMU_CACHE */
-   boolfgsp;   /* Fine-grained super pages */
+   boolfgsp : 1;   /* Fine-grained super pages */
+   boolenforce_cache_coherency : 1;
 };
 
 struct vfio_dma {
@@ -1461,7 +1461,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, 
dma_addr_t iova,
 
list_for_each_entry(d, &iommu->domain_list, next) {
ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
-   npage << PAGE_SHIFT, prot | d->prot);
+   npage << PAGE_SHIFT, prot | IOMMU_CACHE);
if (ret)
goto unwind;
 
@@ -1771,7 +1771,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
}
 
ret = iommu_map(domain->domain, iova, phys,
-   size, dma->prot | domain->prot);
+   size, dma->prot | IOMMU_CACHE);
if (ret) {
if (!dma->iommu_mapped) {
vfio_unpin_pages_remote(dma, iova,
@@ -1859,7 +1859,7 @@ static void vfio_test_domain_fgsp(struct vfio_domain 
*domain)
return;
 
ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
-   IOMMU_READ | I

[PATCH v2 4/4] vfio: Require that devices support DMA cache coherence

2022-04-07 Thread Jason Gunthorpe via iommu
IOMMU_CACHE means that normal DMAs do not require any additional coherency
mechanism and is the basic uAPI that VFIO exposes to userspace. For
instance VFIO applications like DPDK will not work if additional coherency
operations are required.

Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before
allowing an IOMMU backed VFIO device to be created.

Signed-off-by: Jason Gunthorpe 
---
 drivers/vfio/vfio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e72..9edad767cfdad3 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device *device,
 
 int vfio_register_group_dev(struct vfio_device *device)
 {
+   /*
+* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
+* restore cache coherency.
+*/
+   if (!iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY))
+   return -EINVAL;
+
return __vfio_register_dev(device,
vfio_group_find_or_alloc(device->dev));
 }
-- 
2.35.1


[PATCH v2 1/4] iommu: Introduce the domain op enforce_cache_coherency()

2022-04-07 Thread Jason Gunthorpe via iommu
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.

Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit that, when set, causes
PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
the no-snoop bit was clear.

The new API is triggered by calling enforce_cache_coherency() before
mapping any IOVA to the domain, which globally switches on no-snoop
blocking. This allows other implementations that might block no-snoop
globally and outside the IOPTE - AMD also documents such a HW capability.

Leave AMD out of sync with Intel and have it block no-snoop even for
in-kernel users. This can be trivially resolved in a follow up patch.

Only VFIO will call this new API.
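
The caller-facing wrapper is presumably of this shape (a sketch; the
include/linux/iommu.h hunk is truncated below):

	static inline bool
	iommu_enforce_cache_coherency(struct iommu_domain *domain)
	{
		return domain->ops->enforce_cache_coherency ?
			domain->ops->enforce_cache_coherency(domain) : false;
	}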

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/amd/iommu.c   |  7 +++
 drivers/iommu/intel/iommu.c | 14 +-
 include/linux/intel-iommu.h |  1 +
 include/linux/iommu.h   |  4 
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e61..e500b487eb3429 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2271,6 +2271,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
return 0;
 }
 
+static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+{
+   /* IOMMU_PTE_FC is always set */
+   return true;
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -2293,6 +2299,7 @@ const struct iommu_ops amd_iommu_ops = {
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
.iotlb_sync = amd_iommu_iotlb_sync,
.free   = amd_iommu_domain_free,
+   .enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
}
 };
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942b8..f08611a6cc4799 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4422,7 +4422,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
prot |= DMA_PTE_READ;
if (iommu_prot & IOMMU_WRITE)
prot |= DMA_PTE_WRITE;
-   if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
+   if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) ||
+   dmar_domain->enforce_no_snoop)
prot |= DMA_PTE_SNP;
 
max_addr = iova + size;
@@ -4545,6 +4546,16 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
iommu_domain *domain,
return phys;
 }
 
+static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+   if (!dmar_domain->iommu_snooping)
+   return false;
+   dmar_domain->enforce_no_snoop = true;
+   return true;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -4898,6 +4909,7 @@ const struct iommu_ops intel_iommu_ops = {
.iotlb_sync = intel_iommu_tlb_sync,
.iova_to_phys   = intel_iommu_iova_to_phys,
.free   = intel_iommu_domain_free,
+   .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
}
 };
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d0014..1f930c0c225d94 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -540,6 +540,7 @@ struct dmar_domain {
u8 has_iotlb_device: 1;
u8 iommu_coherency: 1;  /* indicate coherency of iommu access */
u8 iommu_snooping: 1;   /* indicate snooping control feature */
+   u8 enforce_no_snoop : 1;/* Create IOPTEs with snoop control */
 
struct list_head devices;   /* all devices' list */
struct iova_domain iovad;   /* iova's that belong to this domain */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1ac..fe4f24c469c373 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -272,6 +272,9 @@ struct iommu_ops {
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *queue
  * @iova_to_phys: translate iova to physical address
+ * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
+ *   including no-snoop TLPs on PCIe or other platform
+ *   specific mechanisms.
  * @enable_nesting: Enable nesting
  * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
  * @free: Release the domain after use.
@@ -300,6 +303,7 @@ struct iommu_domain_ops {
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
dma_addr_t i

[PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Jason Gunthorpe via iommu
PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
by a platform as bypassing elements in the DMA coherent CPU cache
hierarchy. A driver can command a device to set this bit on some of its
transactions as a micro-optimization.

However, the driver is now responsible to synchronize the CPU cache with
the DMA that bypassed it. On x86 this may be done through the wbinvd
instruction, and the i915 GPU driver is the only Linux DMA driver that
calls it.
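
As a concrete example of that responsibility, a driver that directed its
device to do no-snoop writes into a buffer has to flush the CPU cache by
hand before reading that buffer back. Something like the following, where
the two x86 flush helpers are real and the device-side calls are purely
hypothetical:

	issue_no_snoop_dma(dev, buf, size);	/* hypothetical device work */
	wait_for_dma_complete(dev);		/* hypothetical */

	/* CPU cachelines covering buf may now be stale: */
	clflush_cache_range(buf, size);		/* flush just this range, or... */
	wbinvd_on_all_cpus();			/* ...the big hammer i915 uses */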

The problem comes that KVM on x86 will normally disable the wbinvd
instruction in the guest and render it a NOP. As the driver running in the
guest is not aware the wbinvd doesn't work it may still cause the device
to set the no-snoop bit and the platform will bypass the CPU cache.
Without a working wbinvd there is no way to re-synchronize the CPU cache
and the driver in the VM has data corruption.

Thus, we see a general direction on x86 that the IOMMU HW is able to block
the no-snoop bit in the TLP. This NOPs the optimization and allows KVM
to NOP the wbinvd without causing any data corruption.

This control for Intel IOMMU was exposed by using IOMMU_CACHE and
IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
meanings and usages beyond blocking no-snoop and the whole thing has
become confused. AMD IOMMU has the same feature and same IOPTE bits
however it unconditionally blocks no-snoop.

Change it so that:
 - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
   device. It is used by the DMA API/VFIO/etc when the user of the
   iommu_domain will not be doing manual cache coherency operations.

 - IOMMU_CAP_CACHE_COHERENCY indicates if IOMMU_CACHE can be used with the
   device.

 - The new optional domain op enforce_cache_coherency() will cause the
   entire domain to block no-snoop requests - ie there is no way for any
   device attached to the domain to opt out of the IOMMU_CACHE behavior.
   This is permanent on the domain and must apply to any future devices
   attached to it.

Ideally an iommu driver should implement enforce_cache_coherency() so that
by default DMA API domains allow the no-snoop optimization. This leaves it
available to kernel drivers like i915. VFIO will call
enforce_cache_coherency() before establishing any mappings and the domain
should then permanently block no-snoop.

If enforce_cache_coherency() fails VFIO will communicate back through to
KVM into the arch code via kvm_arch_register_noncoherent_dma()
(only implemented by x86) which triggers a working wbinvd to be made
available to the VM.
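
In other words the VFIO side reduces to roughly this (a sketch; in reality
the notification travels indirectly via the kvm-vfio device rather than a
direct call, and the kvm pointer is assumed to be at hand):

	if (!domain->ops->enforce_cache_coherency ||
	    !domain->ops->enforce_cache_coherency(domain))
		/* no-snoop DMA can bypass the cache: give the VM a real wbinvd */
		kvm_arch_register_noncoherent_dma(kvm);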

While other iommu drivers are certainly welcome to implement
enforce_cache_coherency(), it is not clear there is any benefit in doing
so right now.

This is on github: https://github.com/jgunthorpe/linux/commits/intel_no_snoop

v2:
 - Abandon removing IOMMU_CAP_CACHE_COHERENCY - instead make it the cap
   flag that indicates IOMMU_CACHE is supported
 - Put the VFIO tests for IOMMU_CACHE at VFIO device registration
 - In the Intel driver remove the domain->iommu_snooping value, this is
   global not per-domain
v1: 
https://lore.kernel.org/r/0-v1-ef02c60ddb76+12ca2-intel_no_snoop_...@nvidia.com

Jason Gunthorpe (4):
  iommu: Introduce the domain op enforce_cache_coherency()
  vfio: Move the Intel no-snoop control off of IOMMU_CACHE
  iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for
IOMMU_CACHE
  vfio: Require that devices support DMA cache coherence

 drivers/iommu/amd/iommu.c   |  7 +++
 drivers/iommu/intel/iommu.c | 17 +
 drivers/vfio/vfio.c |  7 +++
 drivers/vfio/vfio_iommu_type1.c | 30 +++---
 include/linux/intel-iommu.h |  2 +-
 include/linux/iommu.h   |  7 +--
 6 files changed, 52 insertions(+), 18 deletions(-)


base-commit: 3123109284176b1532874591f7c81f3837bbdc17
-- 
2.35.1



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Jason Gunthorpe via iommu
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:

> For the specific case of overriding PCIe No Snoop (which is more problematic
> from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> the endpoint's PCIe capability?

Ideally.

That was rediscussed recently, apparently there are non-compliant
devices and drivers that just ignore the bit. 

Presumably this is why x86 had to move to an IOMMU enforced feature..

> That seems a pretty good summary - I think they're basically all "firmware
> told Linux I'm coherent so I'd better act coherent" cases, but that still
> doesn't necessarily mean that they're *forced* to respect that. One of the
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force
> DMA cache maintenance for coherent devices, primarily to hook up in Panfrost
> (where there is a bit of performance to claw back on the coherent AmLogic
> SoCs by leaving certain buffers non-cacheable).

It would be great to see that in a way that could bring in the few other
GPU drivers doing no-snoop to a formal DMA API instead of hacking
their own stuff with wbinvd calls or whatever.

Thanks,
Jason


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Robin Murphy

On 2022-04-07 14:59, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:18:48AM +, Tian, Kevin wrote:
>>> From: Jason Gunthorpe 
>>> Sent: Thursday, April 7, 2022 1:17 AM
>>>
>>> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
>>>> On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
>>>>> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
>>>>>> On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
>>>>>>>> Oh, I didn't know about device_get_dma_attr()..
>>>>>>
>>>>>> Which is completely broken for any non-OF, non-ACPI platform.
>>>>>
>>>>> I saw that, but I spent some time searching and could not find an
>>>>> iommu driver that would load independently of OF or ACPI. ie no IOMMU
>>>>> platform drivers are created by board files. Things like Intel/AMD
>>>>> discover only from ACPI, etc.
>>
>> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
>> ACPI, but there is no ACPI description for PCI devices i.e. the current
>> logic of device_get_dma_attr() cannot be used on PCI devices.
>
> Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
> DEV_DMA_NOT_SUPPORTED?

I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA
method was actually present to say otherwise), based on
acpi_init_coherency(), but I only know for sure what happens on arm64.

> I think I should give up on this and just redefine the existing iommu
> cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.


TBH I don't see any issue with the current name, but I'd certainly be happy
to nail down a specific definition for it, along the lines of "this 
means that IOMMU_CACHE mappings are generally coherent". That works for 
things like Arm's S2FWB making it OK to assign an otherwise-non-coherent 
device without extra hassle.


For the specific case of overriding PCIe No Snoop (which is more 
problematic from an Arm SMMU PoV) when assigning to a VM, would that not 
be easier solved by just having vfio-pci clear the "Enable No Snoop" 
control bit in the endpoint's PCIe capability?
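
Concretely that would be little more than a one-liner in vfio-pci, using
the existing PCI core accessor and DEVCTL bit (a sketch, not a tested
patch; vdev->pdev is assumed to be vfio-pci's usual struct pci_dev
pointer):

	pcie_capability_clear_word(vdev->pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);

modulo the non-compliant devices mentioned elsewhere in the thread that
simply ignore the bit.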



>>> We could alternatively use existing device_get_dma_attr() as a default
>>> with an iommu wrapper and push the exception down through the iommu
>>> driver and s390 can override it.
>>
>> if going this way probably device_get_dma_attr() should be renamed to
>> device_fwnode_get_dma_attr() instead to make it clearer?
>
> I'm looking at the few users:
>
> drivers/ata/ahci_ceva.c
> drivers/ata/ahci_qoriq.c
>  - These are ARM only drivers. They are trying to copy the dma-coherent
>    property from its DT/ACPI definition to internal register settings
>    which look like they tune how the AXI bus transactions are created.
>
>    I'm guessing the SATA IP block's AXI interface can be configured to
>    generate coherent or non-coherent requests and it has to be set
>    in a way that is consistent with the SOC architecture and match
>    what the DMA API expects the device will do.
>
> drivers/crypto/ccp/sp-platform.c
>  - Only used on ARM64 and also programs a HW register similar to the
>    sata drivers. Refuses to work if the FW property is not present.
>
> drivers/net/ethernet/amd/xgbe/xgbe-platform.c
>  - Seems to be configuring another ARM AXI block
>
> drivers/gpu/drm/panfrost/panfrost_drv.c
>  - Robin's commit comment here is good, and one of the things this
>    controls is if the coherent_walk is set for the io-pgtable-arm.c
>    code which avoids DMA API calls
>
> drivers/gpu/drm/tegra/uapi.c
>  - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.
>
> My take is that the drivers using this API are doing it to make sure
> their HW blocks are setup in a way that is consistent with the DMA API
> they are also using, and run in constrained embedded-style
> environments that know the firmware support is present.
>
> So in the end it does not seem suitable right now for linking to
> IOMMU_CACHE..


That seems a pretty good summary - I think they're basically all 
"firmware told Linux I'm coherent so I'd better act coherent" cases, but 
that still doesn't necessarily mean that they're *forced* to respect 
that. One of the things on my to-do list is to try adding a 
DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent 
devices, primarily to hook up in Panfrost (where there is a bit of
performance to claw back on the coherent AmLogic SoCs by leaving certain
buffers non-cacheable).


Cheers,
Robin.


Re: [PATCH 0/5] Make the iommu driver no-snoop block feature consistent

2022-04-07 Thread Jason Gunthorpe via iommu
On Wed, Apr 06, 2022 at 06:52:04AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, April 6, 2022 12:16 AM
> > 
> > PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> > by a platform as bypassing elements in the DMA coherent CPU cache
> > hierarchy. A driver can command a device to set this bit on some of its
> > transactions as a micro-optimization.
> > 
> > However, the driver is now responsible to synchronize the CPU cache with
> > the DMA that bypassed it. On x86 this is done through the wbinvd
> > instruction, and the i915 GPU driver is the only Linux DMA driver that
> > calls it.
> 
> More accurately x86 supports both unprivileged clflush instructions
> to invalidate one cacheline and a privileged wbinvd instruction to
> invalidate the entire cache. Replacing 'this is done' with 'this may
> be done' is clearer.
> 
> > 
> > The problem comes that KVM on x86 will normally disable the wbinvd
> > instruction in the guest and render it a NOP. As the driver running in the
> > guest is not aware the wbinvd doesn't work it may still cause the device
> > to set the no-snoop bit and the platform will bypass the CPU cache.
> > Without a working wbinvd there is no way to re-synchronize the CPU cache
> > and the driver in the VM has data corruption.
> > 
> > Thus, we see a general direction on x86 that the IOMMU HW is able to block
> > the no-snoop bit in the TLP. This NOPs the optimization and allows KVM
> > to NOP the wbinvd without causing any data corruption.
> > 
> > This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> > IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> > meanings and usages beyond blocking no-snoop and the whole thing has
> > become confused.
>
> Also point out your finding about AMD IOMMU?

Done, thanks

Jason


Re: [PATCH 2/5] vfio: Require that devices support DMA cache coherence

2022-04-07 Thread Jason Gunthorpe via iommu
On Wed, Apr 06, 2022 at 07:02:36AM +, Tian, Kevin wrote:

> > So like this:
> > 
> >  int vfio_register_group_dev(struct vfio_device *device)
> >  {
> > +   if (!dev_is_dma_coherent(device->dev))
> > +   return -EINVAL;
> > +
> > return __vfio_register_dev(device,
> > vfio_group_find_or_alloc(device->dev));
> >  }
> > 
> > I fixed it up.
> > 
> 
> if that is the case should it also apply to usnic and vdpa in the first
> patch (i.e. fail the probe)?

Ideally, but I don't want to mess with existing logic in these
drivers..

Thanks,
Jason


Re: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs

2022-04-07 Thread John Garry via iommu

On 07/04/2022 09:21, Leizhen (ThunderTown) wrote:



On 2022/4/4 19:27, John Garry wrote:

Add support to allow the maximum optimised DMA len be set for an IOMMU
group via sysfs.

This is much the same with the method to change the default domain type
for a group.

Signed-off-by: John Garry 
---
  .../ABI/testing/sysfs-kernel-iommu_groups | 16 +
  drivers/iommu/iommu.c | 59 ++-
  include/linux/iommu.h |  6 ++
  3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index b15af6a5bc08..ed6f72794f6c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -63,3 +63,19 @@ Description: /sys/kernel/iommu_groups/<grp_id>/type shows the type of default
system could lead to catastrophic effects (the users might
need to reboot the machine to get it to normal state). So, it's
expected that the users understand what they're doing.
+
+What:  /sys/kernel/iommu_groups/<grp_id>/max_opt_dma_size
+Date:  Feb 2022
+KernelVersion: v5.18
+Contact:   iommu@lists.linux-foundation.org
+Description:   /sys/kernel/iommu_groups/<grp_id>/max_opt_dma_size shows the
+   max optimised DMA size for the default IOMMU domain associated
+   with the group.
+   Each IOMMU domain has an IOVA domain. The IOVA domain caches
+   IOVAs up to a certain size as a performance optimisation.
+   This sysfs file allows the range of the IOVA domain caching to
+   be set, such that larger-than-default IOVAs may be cached.
+   A value of 0 means that the default caching range is chosen.
+   A privileged user could request the kernel to change the range
+   by writing to this file. For this to happen, the same rules
+   and procedure apply as in changing the default domain type.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10bb10c2a210..7c7258f19bed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,7 @@ struct iommu_group {
struct iommu_domain *default_domain;
struct iommu_domain *domain;
struct list_head entry;
+   size_t max_opt_dma_size;
  };
  
  struct group_device {

@@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
  static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
  static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+ const char *buf,
+ size_t count);
  
  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\

  struct iommu_group_attribute iommu_group_attr_##_name =   \
@@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group *group,
return strlen(type);
  }
  
+static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group,
+						 char *buf)
+{
+   return sprintf(buf, "%zu\n", group->max_opt_dma_size);
+}
+
  static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
  
  static IOMMU_GROUP_ATTR(reserved_regions, 0444,

@@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444,
  static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
iommu_group_store_type);
  
+static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, iommu_group_show_max_opt_dma_size,
+			iommu_group_store_max_opt_dma_size);
+
  static void iommu_group_release(struct kobject *kobj)
  {
struct iommu_group *group = to_iommu_group(kobj);
@@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void)
if (ret)
return ERR_PTR(ret);
  
+	ret = iommu_group_create_file(group, &iommu_group_attr_max_opt_dma_size);
+	if (ret)
+   return ERR_PTR(ret);
+
pr_debug("Allocated group %d\n", group->id);
  
  	return group;

@@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
return dev->iommu_group->default_domain;
  }
  
+size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
+{
+   return group->max_opt_dma_size;
+}
+
  /*
   * IOMMU groups are really the natural working unit of the IOMMU, but
   * the IOMMU API works on domains and devices.  Bridge that gap by
@@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  * @prev_dev: The device in the group (this is used to make sure that the device
   * hasn't changed after the caller has called this function)
   * @type: The type of the new default domain that ge

Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Jason Gunthorpe via iommu
On Thu, Apr 07, 2022 at 07:18:48AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Thursday, April 7, 2022 1:17 AM
> > 
> > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > > >
> > > > > Which is completely broken for any non-OF, non-ACPI platform.
> > > >
> > > > I saw that, but I spent some time searching and could not find an
> > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > > platform drivers are created by board files. Things like Intel/AMD
> > > > discover only from ACPI, etc.
> 
> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
> ACPI, but there is no ACPI description for PCI devices i.e. the current
> logic of device_get_dma_attr() cannot be used on PCI devices. 

Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
DEV_DMA_NOT_SUPPORTED?

I think I should give up on this and just redefine the existing iommu
cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.

> > We could alternatively use existing device_get_dma_attr() as a default
> > with an iommu wrapper and push the exception down through the iommu
> > driver and s390 can override it.
> > 
> 
> if going this way probably device_get_dma_attr() should be renamed to
> device_fwnode_get_dma_attr() instead to make it clearer?

I'm looking at the few users:

drivers/ata/ahci_ceva.c
drivers/ata/ahci_qoriq.c
 - These are ARM only drivers. They are trying to copy the dma-coherent
   property from its DT/ACPI definition to internal register settings
   which look like they tune how the AXI bus transactions are created.

   I'm guessing the SATA IP block's AXI interface can be configured to
   generate coherent or non-coherent requests and it has to be set
   in a way that is consistent with the SOC architecture and match
   what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c
 - Only used on ARM64 and also programs a HW register similar to the
   sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c
 - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c
 - Robin's commit comment here is good, and one of the things this
   controls is if the coherent_walk is set for the io-pgtable-arm.c
   code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c
 - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.

My take is that the drivers using this API are doing it to make sure
their HW blocks are setup in a way that is consistent with the DMA API
they are also using, and run in constrained embedded-style
environments that know the firmware support is present.

So in the end it does not seem suitable right now for linking to
IOMMU_CACHE..
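
For illustration, the pattern those first few drivers follow boils down to
something like this, where the register and bit are made up and only
device_get_dma_attr()/DEV_DMA_COHERENT are real:

	if (device_get_dma_attr(dev) == DEV_DMA_COHERENT)
		val |= AXI_CFG_COHERENT;	/* hypothetical: emit coherent AXI */
	writel(val, base + AXI_CFG_REG);	/* hypothetical register */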

Jason


Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 02:53:38PM +0100, Robin Murphy wrote:
> > Why can't this just go into generic_iommu_put_resv_regions?  The idea
> > that the iommu low-level drivers need to call into dma-iommu which is
> > a consumer of the IOMMU API is odd.  Especially if that just calls out
> > to ACPI code and generic IOMMU code only anyway.
> 
> Because assuming ACPI means IORT is not generic. Part of the aim in adding
> the union to iommu_resv_region is that stuff like AMD's unity_map_entry and
> Intel's dmar_rmrr_unit can be folded into it as well, and their reserved
> region handling correspondingly simplified too.
> 
> The iommu_dma_{get,put}_resv_region() helpers are kind of intended to be
> specific to the fwnode mechanism which deals with IORT and devicetree (once
> the reserved region bindings are fully worked out).

But IORT is not driver-specific code.  So we'll need a USE_IORT flag
somewhere in core IOMMU code instead of trying to stuff this into
driver operations.  And dma-iommu most certainly implies IORT even
less than ACPI.

Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-07 Thread Robin Murphy

On 2022-04-07 14:28, Christoph Hellwig wrote:
>> +static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc, u32 count)
>
> Overly long line.
>
>>  void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list)
>>  {
>> +	if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode))
>> +		iort_iommu_put_resv_regions(dev, list);
>> +
>>  	generic_iommu_put_resv_regions(dev, list);
>>  }
>
> Why can't this just go into generic_iommu_put_resv_regions?  The idea
> that the iommu low-level drivers need to call into dma-iommu which is
> a consumer of the IOMMU API is odd.  Especially if that just calls out
> to ACPI code and generic IOMMU code only anyway.


Because assuming ACPI means IORT is not generic. Part of the aim in 
adding the union to iommu_resv_region is that stuff like AMD's 
unity_map_entry and Intel's dmar_rmrr_unit can be folded into it as 
well, and their reserved region handling correspondingly simplified too.


The iommu_dma_{get,put}_resv_region() helpers are kind of intended to be 
specific to the fwnode mechanism which deals with IORT and devicetree 
(once the reserved region bindings are fully worked out).


Thanks,
Robin.


Re: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()

2022-04-07 Thread John Garry via iommu

On 07/04/2022 09:27, Leizhen (ThunderTown) wrote:




Thanks for having a look



On 2022/4/4 19:27, John Garry wrote:

Add max opt argument to iova_domain_init_rcaches(), and use it to set the
rcaches range.

Also fix up all users to set this value (at 0, meaning use default),
including a wrapper for that, iova_domain_init_rcaches_default().

For dma-iommu.c we derive the iova_len argument from the IOMMU group
max opt DMA size.

Signed-off-by: John Garry 
---
  drivers/iommu/dma-iommu.c| 15 ++-
  drivers/iommu/iova.c | 19 ---
  drivers/vdpa/vdpa_user/iova_domain.c |  4 ++--
  include/linux/iova.h |  3 ++-
  4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 42ca42ff1b5d..19f35624611c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   size_t max_opt_dma_size;
+   unsigned long iova_len = 0;
int ret;
  
  	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)

@@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}
  
  	init_iova_domain(iovad, 1UL << order, base_pfn);

-   ret = iova_domain_init_rcaches(iovad);
+
+   max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
+   if (max_opt_dma_size) {
+   unsigned long shift = __ffs(1UL << order);
+
+   iova_len = roundup_pow_of_two(max_opt_dma_size);
+   iova_len >>= shift;
+   if (!iova_len)
+   iova_len = 1;


> How about move "iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);"
> here? So that, iova_domain_init_rcaches() can remain the same. And
> iova_domain_init_rcaches_default() does not need to be added.

I see your idea. I will say that I would rather not add
iova_domain_init_rcaches_default(). But personally I think it's better
to setup all rcache stuff only once and inside
iova_domain_init_rcaches(), as it is today.

In addition, it doesn't look reasonable to expose iova_len_to_rcache_max().

But maybe it's ok. Other opinion would be welcome...

Thanks,
John


+   }
+
+   ret = iova_domain_init_rcaches(iovad, iova_len);
if (ret)
return ret;
  
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c

index 5c22b9187b79..d65e79e132ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
mag->pfns[mag->size++] = pfn;
  }
  
-int iova_domain_init_rcaches(struct iova_domain *iovad)

+static unsigned long iova_len_to_rcache_max(unsigned long iova_len)
+{
+   return order_base_2(iova_len) + 1;
+}
+
+int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len)
  {
unsigned int cpu;
int i, ret;
  
-	iovad->rcache_max_size = 6; /* Arbitrarily high default */

+   if (iova_len)
+   iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
+   else
+   iovad->rcache_max_size = 6; /* Arbitrarily high default */
  
  	iovad->rcaches = kcalloc(iovad->rcache_max_size,

 sizeof(struct iova_rcache),
@@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
free_iova_rcaches(iovad);
return ret;
  }
-EXPORT_SYMBOL_GPL(iova_domain_init_rcaches);
+
+int iova_domain_init_rcaches_default(struct iova_domain *iovad)
+{
+   return iova_domain_init_rcaches(iovad, 0);
+}
+EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default);
  
  /*

   * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 6daa3978d290..3a2acef98a4a 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t bounce_size)
spin_lock_init(&domain->iotlb_lock);
init_iova_domain(&domain->stream_iovad,
PAGE_SIZE, IOVA_START_PFN);
-   ret = iova_domain_init_rcaches(&domain->stream_iovad);
+   ret = iova_domain_init_rcaches_default(&domain->stream_iovad);
if (ret)
goto err_iovad_stream;
init_iova_domain(&domain->consistent_iovad,
PAGE_SIZE, bounce_pfns);
-   ret = iova_domain_init_rcaches(&domain->consistent_iovad);
+   ret = iova_domain_init_rcaches_default(&domain->consistent_iovad);
if (ret)
goto err_iovad_consistent;
  
diff --git a/include/linux/iova.h b/include/linux/iova.h

index 02f7222fa85a..56281

Re: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 02:31:44PM +0100, Robin Murphy wrote:
> FWIW it's also broken for another niche case where
> iommu_default_passthrough() == false at init, but the user later changes a
> 32-bit device's default domain type to passthrough via sysfs, such that it
> starts needing regular dma-direct bouncing.

Yeah.

We also have yet another issue:  swiotlb is not allocated if there is
no memory outside the 4GB physical address space.  I think I can fix
that easily after my swiotlb init series goes in, before that it
would be a bit of a mess spread over all the architectures.


Re: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases

2022-04-07 Thread Robin Murphy

On 2022-04-04 21:47, Mario Limonciello wrote:
> Previously the AMD IOMMU would only enable SWIOTLB in certain
> circumstances:
>  * IOMMU in passthrough mode
>  * SME enabled
>
> This logic however doesn't work when an untrusted device is plugged in
> that doesn't do page aligned DMA transactions.  The expectation is
> that a bounce buffer is used for those transactions.
>
> This fails like this:
>
> swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)
>
> That happens because the bounce buffers have been allocated, then freed
> during startup, but the bounce buffering code expects that all IOMMUs
> have left it enabled.
>
> Remove the criteria to set up bounce buffers on AMD systems to ensure
> they're always available for supporting untrusted devices.

FWIW it's also broken for another niche case where
iommu_default_passthrough() == false at init, but the user later changes
a 32-bit device's default domain type to passthrough via sysfs, such
that it starts needing regular dma-direct bouncing.

Reviewed-by: Robin Murphy 

> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Mario Limonciello 
> ---
> v1->v2:
>  * Enable swiotlb for AMD instead of ignoring it for inactive
>
>  drivers/iommu/amd/iommu.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a1ada7bff44e..079694f894b8 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1838,17 +1838,10 @@ void amd_iommu_domain_update(struct protection_domain *domain)
> 	amd_iommu_domain_flush_complete(domain);
>  }
>
> -static void __init amd_iommu_init_dma_ops(void)
> -{
> -	swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
> -}
> -
>  int __init amd_iommu_init_api(void)
>  {
> 	int err;
>
> -	amd_iommu_init_dma_ops();
> -
> 	err = bus_set_iommu(&pci_bus_type, &amd_iommu_ops);
> 	if (err)
> 		return err;



Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-07 Thread Christoph Hellwig
> +static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc, u32 count)

Overly long line.

>  void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list)
>  {
> + if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode))
> + iort_iommu_put_resv_regions(dev, list);
> +
>   generic_iommu_put_resv_regions(dev, list);
>  }

Why can't this just go into generic_iommu_put_resv_regions?  The idea
that the iommu low-level drivers need to call into dma-iommu which is
a consumer of the IOMMU API is odd.  Especially if that just calls out
to ACPI code and generic IOMMU code only anyway.


Re: [PATCH v9 05/11] iommu/dma: Introduce a helper to remove reserved regions

2022-04-07 Thread Christoph Hellwig
>  
> +static inline void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list)
> +{
> +}
> +
>  #endif   /* CONFIG_IOMMU_DMA */

This changes behavior when CONFIG_IOMMU_DMA is not set.  So e.g. on
ARM all the drivers that are using the new helper now fail to release
reserved regions.
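
For reference, what the empty stub ends up skipping is essentially the
generic helper's cleanup - from memory it is just this (shape only, check
the tree):

	void generic_iommu_put_resv_regions(struct device *dev,
					    struct list_head *list)
	{
		struct iommu_resv_region *entry, *next;

		list_for_each_entry_safe(entry, next, list, list)
			kfree(entry);
	}

so on ARM with CONFIG_IOMMU_DMA=n every region on the list is now leaked.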


Re: [PATCH 3/4] iommu: remove the put_resv_regions method

2022-04-07 Thread Christoph Hellwig
On Thu, Apr 07, 2022 at 11:18:20AM +0100, Robin Murphy wrote:
> On 2022-04-07 07:26, Christoph Hellwig wrote:
>> All drivers that implement get_resv_regions just use
>> generic_put_resv_regions to implement the put side.  Remove the
>> indirections and document the allocations constraints.
>
> Unfortunately we need to keep this one for now, as the belated IORT RMR 
> support will finally be the first real user[1][2].
>
> Robin.
>
> [1] 
> https://lore.kernel.org/linux-iommu/20220404124209.1086-6-shameerali.kolothum.th...@huawei.com/
> [2] 
> https://lore.kernel.org/linux-iommu/20220404124209.1086-7-shameerali.kolothum.th...@huawei.com/

What these patches do looks wrong to me.  I'll comment there.


Re: [PATCH] iommu/omap: Fix regression in probe for NULL pointer dereference

2022-04-07 Thread H. Nikolaus Schaller
Hi,

> Am 07.04.2022 um 07:39 schrieb Tony Lindgren :
> 
> Hi,
> 
> * Tony Lindgren  [220331 09:21]:
>> Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started
>> triggering a NULL pointer dereference for some omap variants:
>> 
>> __iommu_probe_device from probe_iommu_group+0x2c/0x38
>> probe_iommu_group from bus_for_each_dev+0x74/0xbc
>> bus_for_each_dev from bus_iommu_probe+0x34/0x2e8
>> bus_iommu_probe from bus_set_iommu+0x80/0xc8
>> bus_set_iommu from omap_iommu_init+0x88/0xcc
>> omap_iommu_init from do_one_initcall+0x44/0x24
>> 
>> This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV)
>> as noted by Jason Gunthorpe .
>> 
>> Looks like the regression already happened with an earlier commit
>> 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs")
>> that changed the function return type and missed converting one place.
> 
> Can you guys please get this fix into the -rc series? Or ack it so
> I can pick it up into my fixes branch?
> 
> Without this fix booting is failing for a bunch of devices.

Yes, I can confirm that v5.18-rc1 does not even boot the GTA04 (omap3)
and OMAP5UEVM to any activity without this patch.

Seems to be an urgent fix.
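
For anyone following along, the one missed conversion described above is
presumably of this shape (a sketch inferred from the commit message, not
the actual diff):

	if (num_iommus < 0)
-		return 0;
+		return ERR_PTR(-ENODEV);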

BR and thanks,
Nikolaus


Re: [PATCH v2 2/2] dma-iommu: Check that swiotlb is active before trying to use it

2022-04-07 Thread Robin Murphy

On 2022-04-04 21:47, Mario Limonciello via iommu wrote:
> If the IOMMU is in use and an untrusted device is connected to an external
> facing port but the address requested isn't page aligned, the kernel will
> attempt to use bounce buffers.
>
> If for some reason the bounce buffers have not been allocated this is a
> problem that should be made apparent to the user.

Reviewed-by: Robin Murphy 

> Signed-off-by: Mario Limonciello 
> ---
> v1->v2:
>  * Move error message into the caller
>
>  drivers/iommu/dma-iommu.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..1ca85d37eeab 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -971,6 +971,11 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> 		void *padding_start;
> 		size_t padding_size, aligned_size;
>
> +		if (!is_swiotlb_active(dev)) {
> +			dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
> +			return DMA_MAPPING_ERROR;
> +		}
> +
> 		aligned_size = iova_align(iovad, size);
> 		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> 					      iova_mask(iovad), dir, attrs);



[PATCH v7 0/7] Add support for HiSilicon PCIe Tune and Trace device

2022-04-07 Thread Yicong Yang via iommu
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe traffic (tune),
and trace the TLP headers (trace).

PTT tune is designed for monitoring and adjusting PCIe link parameters.
We provide several parameters of the PCIe link. Through the driver,
users can adjust the value of a certain parameter to affect the PCIe link
for the purpose of enhancing the performance in certain situations.

PTT trace is designed for dumping the TLP headers to the memory, which
can be used to analyze the transactions and usage condition of the PCIe
Link. Users can choose filters to trace headers, by either requester
ID, or those downstream of a set of Root Ports on the same core of the
PTT device. It's also supported to trace the headers of certain type and
of certain direction.

The driver registers a PMU device for each PTT device. The trace can
be used through `perf record` and the traced headers can be decoded
by `perf report`. The perf command support for the device is also
added in this patchset. The tune can be used through the sysfs
attributes of related PMU device. See the documentation for the
detailed usage.

Change since v6:
- Fix W=1 errors reported by lkp test, thanks

Change since v5:
- Squash the PMU patch into PATCH 2 suggested by John
- refine the commit message of PATCH 1 and some comments
Link: 
https://lore.kernel.org/lkml/20220308084930.5142-1-yangyic...@hisilicon.com/

Change since v4:
Address the comments from Jonathan, John and Ma Ca, thanks.
- Use devm* also for allocating the DMA buffers
- Remove the IRQ handler stub in Patch 2
- Make functions waiting for hardware state return boolean
- Manual remove the PMU device as it should be removed first
- Modifier the orders in probe and removal to make them matched well
- Make available {directions,type,format} array const and non-global
- Using the right filter list in filters show and well protect the
  list with mutex
- Record the trace status with a boolean @started rather than enum
- Optimize the process of finding the PTT devices of the perf-tool
Link: 
https://lore.kernel.org/linux-pci/20220221084307.33712-1-yangyic...@hisilicon.com/

Change since v3:
Address the comments from Jonathan and John, thanks.
- drop members in the common struct which can be get on the fly
- reduce buffer struct and organize the buffers with array instead of list
- reduce the DMA reset wait time to avoid long time busy loop
- split the available_filters sysfs attribute into two files, for root port
  and requester respectively. Update the documentation accordingly
- make IOMMU mapping check earlier in probe to avoid race condition. Also
  make IOMMU quirk patch prior to driver in the series
- Cleanups and typos fixes from John and Jonathan
Link: 
https://lore.kernel.org/linux-pci/20220124131118.17887-1-yangyic...@hisilicon.com/

Change since v2:
- address the comments from Mathieu, thanks.
  - rename the directory to ptt to match the function of the device
  - spinoff the declarations to a separate header
  - split the trace function to several patches
  - some other comments.
- make default smmu domain type of PTT device to identity
  Drop the RMR as it's not recommended and use an iommu_def_domain_type
  quirk to passthrough the device DMA as suggested by Robin. 
Link: 
https://lore.kernel.org/linux-pci/2026090625.53702-1-yangyic...@hisilicon.com/

Change since v1:
- switch the user interface of trace to perf from debugfs
- switch the user interface of tune to sysfs from debugfs
- add perf tool support to start trace and decode the trace data
- address the comments of documentation from Bjorn
- add RMR[1] support of the device as trace works in RMR mode or
  direct DMA mode. RMR support is achieved by common APIs rather
  than the APIs implemented in [1].
Link: 
https://lore.kernel.org/lkml/1618654631-42454-1-git-send-email-yangyic...@hisilicon.com/
[1] 
https://lore.kernel.org/linux-acpi/20210805080724.480-1-shameerali.kolothum.th...@huawei.com/

Qi Liu (1):
  perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

Yicong Yang (6):
  iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to
identity
  hwtracing: Add trace function support for HiSilicon PCIe Tune and
Trace device
  hisi_ptt: Add support for dynamically updating the filter list
  hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace
device
  docs: Add HiSilicon PTT device driver documentation
  MAINTAINERS: Add maintainer for HiSilicon PTT driver

 Documentation/trace/hisi-ptt.rst  |  303 +
 MAINTAINERS   |7 +
 drivers/Makefile  |1 +
 drivers/hwtracing/Kconfig |2 +
 drivers/hwtracing/ptt/Kconfig |   12 +
 drivers/hwtracing/ptt/Makefile|2 +
 drivers/hwtracing/ptt/hisi_ptt.c  | 1161 ++

[PATCH v7 7/7] MAINTAINERS: Add maintainer for HiSilicon PTT driver

2022-04-07 Thread Yicong Yang via iommu
Add maintainer for driver and documentation of HiSilicon PTT device.

Signed-off-by: Yicong Yang 
Reviewed-by: Jonathan Cameron 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fd768d43e048..d30a1698251c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8858,6 +8858,13 @@ F:   Documentation/admin-guide/perf/hisi-pcie-pmu.rst
 F: Documentation/admin-guide/perf/hisi-pmu.rst
 F: drivers/perf/hisilicon
 
+HISILICON PTT DRIVER
+M: Yicong Yang 
+L: linux-ker...@vger.kernel.org
+S: Maintained
+F: Documentation/trace/hisi-ptt.rst
+F: drivers/hwtracing/ptt/
+
 HISILICON QM AND ZIP Controller DRIVER
 M: Zhou Wang 
 L: linux-cry...@vger.kernel.org
-- 
2.24.0



[PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-04-07 Thread Yicong Yang via iommu
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex integrated
Endpoint (RCiEP) device, providing the capability to dynamically monitor and
tune the PCIe traffic, and trace the TLP headers.

Add the driver for the device to enable the trace function. Register a PMU
device for PTT trace, then users can use the trace through the perf command.
The driver makes use of perf AUX trace and supports the following events to
configure the trace:

- filter: select Root port or Endpoint to trace
- type: select the type of traced TLP headers
- direction: select the direction of traced TLP headers
- format: select the data format of the traced TLP headers

This patch adds the driver part of PTT trace. The perf command support of
PTT trace is added in the following patch.

Signed-off-by: Yicong Yang 
Reviewed-by: Jonathan Cameron 
---
 drivers/Makefile |   1 +
 drivers/hwtracing/Kconfig|   2 +
 drivers/hwtracing/ptt/Kconfig|  12 +
 drivers/hwtracing/ptt/Makefile   |   2 +
 drivers/hwtracing/ptt/hisi_ptt.c | 874 +++
 drivers/hwtracing/ptt/hisi_ptt.h | 166 ++
 6 files changed, 1057 insertions(+)
 create mode 100644 drivers/hwtracing/ptt/Kconfig
 create mode 100644 drivers/hwtracing/ptt/Makefile
 create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
 create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h

diff --git a/drivers/Makefile b/drivers/Makefile
index 020780b6b4d2..662d50599467 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -175,6 +175,7 @@ obj-$(CONFIG_USB4)  += thunderbolt/
 obj-$(CONFIG_CORESIGHT)+= hwtracing/coresight/
 obj-y  += hwtracing/intel_th/
 obj-$(CONFIG_STM)  += hwtracing/stm/
+obj-$(CONFIG_HISI_PTT) += hwtracing/ptt/
 obj-$(CONFIG_ANDROID)  += android/
 obj-$(CONFIG_NVMEM)+= nvmem/
 obj-$(CONFIG_FPGA) += fpga/
diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
index 13085835a636..911ee977103c 100644
--- a/drivers/hwtracing/Kconfig
+++ b/drivers/hwtracing/Kconfig
@@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
 
 source "drivers/hwtracing/intel_th/Kconfig"
 
+source "drivers/hwtracing/ptt/Kconfig"
+
 endmenu
diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
new file mode 100644
index ..8902a6f27563
--- /dev/null
+++ b/drivers/hwtracing/ptt/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HISI_PTT
+   tristate "HiSilicon PCIe Tune and Trace Device"
+   depends on ARM64 || (COMPILE_TEST && 64BIT)
+   depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS
+   help
+ HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
+ device, and it provides support for PCIe traffic tuning and
+ tracing TLP headers to the memory.
+
+ This driver can also be built as a module. If so, the module
+ will be called hisi_ptt.
diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
new file mode 100644
index ..908c09a98161
--- /dev/null
+++ b/drivers/hwtracing/ptt/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
new file mode 100644
index ..242b41870380
--- /dev/null
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -0,0 +1,874 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for HiSilicon PCIe tune and trace device
+ *
+ * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
+ * Author: Yicong Yang 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hisi_ptt.h"
+
+static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
+{
+   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
+   return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
+
+   return PCI_DEVID(pdev->bus->number, pdev->devfn);
+}
+
+static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
+{
+   u32 val;
+
+   return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
+ val, val & HISI_PTT_TRACE_IDLE,
+ HISI_PTT_WAIT_POLL_INTERVAL_US,
+ HISI_PTT_WAIT_TRACE_TIMEOUT_US);
+}
+
+static bool hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt)
+{
+   u32 val;
+
+	return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_WR_STS,
+					  val, !val, HISI_PTT_RESET_POLL_INTERVAL_US,
+					  HISI_PTT_RESET_TIMEOUT_US);
+}
+
+static void hisi_ptt_free_trace_buf(struct hisi_ptt *hisi_ptt)
+{
+   struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
+   struct device *dev = &hisi_ptt->pdev->dev;
+   int i;
+
+   if (!ctrl-

[PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-04-07 Thread Yicong Yang via iommu
From: Qi Liu 

'perf record' and 'perf report --dump-raw-trace' supported in this
patch.

Example usage:

Output will contain raw PTT data and its textual representation, such
as:

0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40  offset: 0
ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
.  0000: 00 00 00 00 Prefix
.  0004: 08 20 00 60 Header DW0
.  0008: ff 02 00 01 Header DW1
.  000c: 20 08 00 00 Header DW2
.  0010: 10 e7 44 ab Header DW3
.  0014: 2a a8 1e 01 Time
.  0020: 00 00 00 00 Prefix
.  0024: 01 00 00 60 Header DW0
.  0028: 0f 1e 00 01 Header DW1
.  002c: 04 00 00 00 Header DW2
.  0030: 40 00 81 02 Header DW3
.  0034: ee 02 00 00 Time


Signed-off-by: Qi Liu 
Signed-off-by: Yicong Yang 
---
 tools/perf/arch/arm/util/auxtrace.c   |  76 +-
 tools/perf/arch/arm/util/pmu.c|   3 +
 tools/perf/arch/arm64/util/Build  |   2 +-
 tools/perf/arch/arm64/util/hisi_ptt.c | 195 
 tools/perf/util/Build |   2 +
 tools/perf/util/auxtrace.c|   4 +
 tools/perf/util/auxtrace.h|   1 +
 tools/perf/util/hisi-ptt-decoder/Build|   1 +
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c   | 170 ++
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h   |  28 +++
 tools/perf/util/hisi_ptt.c| 218 ++
 tools/perf/util/hisi_ptt.h|  28 +++
 12 files changed, 724 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
 create mode 100644 tools/perf/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi_ptt.h

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 5fc6a2a3dbc5..393f5757c039 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -4,9 +4,11 @@
  * Author: Mathieu Poirier 
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include "../../../util/auxtrace.h"
 #include "../../../util/debug.h"
@@ -14,6 +16,7 @@
 #include "../../../util/pmu.h"
 #include "cs-etm.h"
 #include "arm-spe.h"
+#include "hisi_ptt.h"
 
 static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 {
@@ -50,6 +53,58 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
return arm_spe_pmus;
 }
 
+static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
+{
+   const char *sysfs = sysfs__mountpoint();
+   struct perf_pmu **hisi_ptt_pmus = NULL;
+   struct dirent *dent;
+   char path[PATH_MAX];
+   DIR *dir = NULL;
+   int idx = 0;
+
+   snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH, sysfs);
+   dir = opendir(path);
+   if (!dir) {
+   pr_err("can't read directory '%s'\n", EVENT_SOURCE_DEVICE_PATH);
+   *err = -EINVAL;
+   goto out;
+   }
+
+   while ((dent = readdir(dir))) {
+   if (strstr(dent->d_name, HISI_PTT_PMU_NAME))
+   (*nr_ptts)++;
+   }
+
+   if (!(*nr_ptts))
+   goto out;
+
+   hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
+   if (!hisi_ptt_pmus) {
+   pr_err("hisi_ptt alloc failed\n");
+   *err = -ENOMEM;
+   goto out;
+   }
+
+   rewinddir(dir);
+   while ((dent = readdir(dir))) {
+   if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < 
(*nr_ptts)) {
+   hisi_ptt_pmus[idx] = perf_pmu__find(dent->d_name);
+   if (hisi_ptt_pmus[idx]) {
+   pr_debug2("%s %d: hisi_ptt_pmu %d type %d name 
%s\n",
+   __func__, __LINE__, idx,
+   hisi_ptt_pmus[idx]->type,
+   hisi_ptt_pmus[idx]->name);
+   idx++;
+   }
+
+   }
+   }
+
+out:
+   closedir(dir);
+   return hisi_ptt_pmus;
+}
+
 struct auxtrace_record
 *auxtrace_record__init(struct evlist *evlist, int *err)
 {
@@ -57,8 +112,12 @@ struct auxtrace_record
struct evsel *evsel;
bool found_etm = false;
struct perf_pmu *found_spe = NULL;
+   struct perf_pmu *found_p

[PATCH v7 4/7] hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device

2022-04-07 Thread Yicong Yang via iommu
Add tune function for the HiSilicon Tune and Trace device. The interface
of tune is exposed through sysfs attributes of PTT PMU device.

Signed-off-by: Yicong Yang 
Reviewed-by: Jonathan Cameron 
---
 drivers/hwtracing/ptt/hisi_ptt.c | 154 +++
 drivers/hwtracing/ptt/hisi_ptt.h |  20 
 2 files changed, 174 insertions(+)

diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index b1958ac20372..331c8e43cd17 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -21,6 +21,159 @@
 
 #include "hisi_ptt.h"
 
+static bool hisi_ptt_wait_tuning_finish(struct hisi_ptt *hisi_ptt)
+{
+   u32 val;
+
+   return !readl_poll_timeout(hisi_ptt->iobase + HISI_PTT_TUNING_INT_STAT,
+ val, !(val & HISI_PTT_TUNING_INT_STAT_MASK),
+ HISI_PTT_WAIT_POLL_INTERVAL_US,
+ HISI_PTT_WAIT_TUNE_TIMEOUT_US);
+}
+
+static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt,
+ u32 event, u16 *data)
+{
+   u32 reg;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+   reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB);
+   reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB,
+ event);
+   writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+
+   /* Write all 1 to indicates it's the read process */
+   writel(~0U, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+
+   if (!hisi_ptt_wait_tuning_finish(hisi_ptt))
+   return -ETIMEDOUT;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+   reg &= HISI_PTT_TUNING_DATA_VAL_MASK;
+   *data = FIELD_GET(HISI_PTT_TUNING_DATA_VAL_MASK, reg);
+
+   return 0;
+}
+
+static int hisi_ptt_tune_data_set(struct hisi_ptt *hisi_ptt,
+ u32 event, u16 data)
+{
+   u32 reg;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+   reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB);
+   reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB,
+ event);
+   writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+
+   writel(FIELD_PREP(HISI_PTT_TUNING_DATA_VAL_MASK, data),
+  hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+
+   if (!hisi_ptt_wait_tuning_finish(hisi_ptt))
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+static ssize_t hisi_ptt_tune_attr_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
+   struct dev_ext_attribute *ext_attr;
+   struct hisi_ptt_tune_desc *desc;
+   int ret;
+   u16 val;
+
+   ext_attr = container_of(attr, struct dev_ext_attribute, attr);
+   desc = ext_attr->var;
+
+   if (!mutex_trylock(&hisi_ptt->mutex))
+   return -EBUSY;
+
+   ret = hisi_ptt_tune_data_get(hisi_ptt, desc->event_code, &val);
+
+   mutex_unlock(&hisi_ptt->mutex);
+   return ret ? ret : sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t hisi_ptt_tune_attr_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
+   struct dev_ext_attribute *ext_attr;
+   struct hisi_ptt_tune_desc *desc;
+   int ret;
+   u16 val;
+
+   ext_attr = container_of(attr, struct dev_ext_attribute, attr);
+   desc = ext_attr->var;
+
+   if (kstrtou16(buf, 10, &val))
+   return -EINVAL;
+
+   if (!mutex_trylock(&hisi_ptt->mutex))
+   return -EBUSY;
+
+   ret = hisi_ptt_tune_data_set(hisi_ptt, desc->event_code, val);
+
+   mutex_unlock(&hisi_ptt->mutex);
+   return ret ? ret : count;
+}
+
+#define HISI_PTT_TUNE_ATTR(_name, _val, _show, _store) \
+   static struct hisi_ptt_tune_desc _name##_desc = {   \
+   .name = #_name, \
+   .event_code = _val, \
+   };  \
+   static struct dev_ext_attribute hisi_ptt_##_name##_attr = { \
+   .attr   = __ATTR(_name, 0600, _show, _store),   \
+   .var= &_name##_desc,\
+   }
+
+#define HISI_PTT_TUNE_ATTR_COMMON(_name, _val) \
+   HISI_PTT_TUNE_ATTR(_name, _val, \
+  hisi_ptt_tune_attr_show, \
+  hisi_ptt_tune_attr_store)
+
+/*
+ * The value of the tuning event is composed of two parts: main event code in bit[0,15]

[PATCH v7 3/7] hisi_ptt: Add support for dynamically updating the filter list

2022-04-07 Thread Yicong Yang via iommu
The PCIe devices supported by the PTT trace can be removed/rescanned by
hotplug or through sysfs.  Add support for dynamically updating the
available filter list by registering a PCI bus notifier block. Then users
can always get the latest information about the available tracing filters,
and the driver can block invalid filters whose related devices no longer
exist in the system.

Signed-off-by: Yicong Yang 
Reviewed-by: Jonathan Cameron 
---
 drivers/hwtracing/ptt/hisi_ptt.c | 159 ---
 drivers/hwtracing/ptt/hisi_ptt.h |  34 +++
 2 files changed, 180 insertions(+), 13 deletions(-)

diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 242b41870380..b1958ac20372 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -270,27 +270,121 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt)
return 0;
 }
 
-static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
+static void hisi_ptt_update_filters(struct work_struct *work)
 {
+   struct delayed_work *delayed_work = to_delayed_work(work);
+   struct hisi_ptt_filter_update_info info;
struct hisi_ptt_filter_desc *filter;
-   struct hisi_ptt *hisi_ptt = data;
struct list_head *target_list;
+   struct hisi_ptt *hisi_ptt;
 
-   target_list = pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ?
- &hisi_ptt->port_filters : &hisi_ptt->req_filters;
+   hisi_ptt = container_of(delayed_work, struct hisi_ptt, work);
 
-   filter = kzalloc(sizeof(*filter), GFP_KERNEL);
-   if (!filter) {
-   pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
-   return -ENOMEM;
+   if (!mutex_trylock(&hisi_ptt->mutex)) {
+   schedule_delayed_work(&hisi_ptt->work, HISI_PTT_WORK_DELAY_MS);
+   return;
}
 
-   filter->pdev = pdev;
-   list_add_tail(&filter->list, target_list);
+   while (kfifo_get(&hisi_ptt->filter_update_kfifo, &info)) {
+   bool is_port = pci_pcie_type(info.pdev) == PCI_EXP_TYPE_ROOT_PORT;
+   u16 val = hisi_ptt_get_filter_val(info.pdev);
 
-   /* Update the available port mask */
-   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
-   hisi_ptt->port_mask |= hisi_ptt_get_filter_val(pdev);
+   target_list = is_port ? &hisi_ptt->port_filters : &hisi_ptt->req_filters;
+
+   if (info.is_add) {
+   filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+   if (!filter) {
+   pci_err(hisi_ptt->pdev, "failed to add filter 
%s\n",
+   pci_name(info.pdev));
+   continue;
+   }
+
+   filter->pdev = info.pdev;
+   list_add_tail(&filter->list, target_list);
+   } else {
+   list_for_each_entry(filter, target_list, list)
+   if (hisi_ptt_get_filter_val(filter->pdev) == val) {
+   list_del(&filter->list);
+   kfree(filter);
+   break;
+   }
+   }
+
+   /* Update the available port mask */
+   if (!is_port)
+   continue;
+
+   if (info.is_add)
+   hisi_ptt->port_mask |= val;
+   else
+   hisi_ptt->port_mask &= ~val;
+   }
+
+   mutex_unlock(&hisi_ptt->mutex);
+}
+
+static void hisi_ptt_update_fifo_in(struct hisi_ptt *hisi_ptt,
+   struct hisi_ptt_filter_update_info *info)
+{
+   struct pci_dev *root_port = pcie_find_root_port(info->pdev);
+   u32 port_devid;
+
+   if (!root_port)
+   return;
+
+   port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn);
+   if (port_devid < hisi_ptt->lower ||
+   port_devid > hisi_ptt->upper)
+   return;
+
+   if (kfifo_in_spinlocked(&hisi_ptt->filter_update_kfifo, info, 1,
+   &hisi_ptt->filter_update_lock))
+   schedule_delayed_work(&hisi_ptt->work, 0);
+   else
+   pci_warn(hisi_ptt->pdev,
+"filter update fifo overflow for target %s\n",
+pci_name(info->pdev));
+}
+
+/*
+ * A PCI bus notifier is used here for dynamically updating the filter
+ * list.
+ */
+static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+   struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb);
+   struct hisi_ptt_filter_update_info info;
+   struct device *dev = data;
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   info.pdev

[PATCH v7 6/7] docs: Add HiSilicon PTT device driver documentation

2022-04-07 Thread Yicong Yang via iommu
Document the introduction and usage of the HiSilicon PTT device driver.

Signed-off-by: Yicong Yang 
Reviewed-by: Jonathan Cameron 
---
 Documentation/trace/hisi-ptt.rst | 303 +++
 1 file changed, 303 insertions(+)
 create mode 100644 Documentation/trace/hisi-ptt.rst

diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
new file mode 100644
index ..13677705ee1f
--- /dev/null
+++ b/Documentation/trace/hisi-ptt.rst
@@ -0,0 +1,303 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+HiSilicon PCIe Tune and Trace device
+====================================
+
+Introduction
+============
+
+HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
+integrated Endpoint (RCiEP) device, providing the capability
+to dynamically monitor and tune the PCIe link's events (tune),
+and trace the TLP headers (trace). The two functions are independent,
+but it is recommended to use them together to analyze and enhance the
+PCIe link's performance.
+
+On Kunpeng 930 SoC, the PCIe Root Complex is composed of several
+PCIe cores. Each PCIe core includes several Root Ports and a PTT
+RCiEP, like below. The PTT device is capable of tuning and
+tracing the links of the PCIe core.
+::
+  +--Core 0---+
+  |   |   [   PTT   ] |
+  |   |   [Root Port]---[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+Root Complex  |--Core 1---+
+  |   |   [   PTT   ] |
+  |   |   [Root Port]---[ Switch ]---[Endpoint]
+  |   |   [Root Port]---[Endpoint] `-[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+  +---+
+
+The PTT device driver registers one PMU device for each PTT device.
+The name of each PTT device is composed of the 'hisi_ptt' prefix with
+the id of the SICL and the Core where it is located. The Kunpeng 930
+SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and
+IO dies (SICL, Super I/O Cluster), where there's one PCIe Root
+Complex for each SICL.
+::
+/sys/devices/hisi_ptt_
+
+Tune
+====
+
+PTT tune is designed for monitoring and adjusting PCIe link parameters (events).
+Currently we support events in 4 classes. The scope of the events
+covers the PCIe core to which the PTT device belongs.
+
+Each event is presented as a file under $(PTT PMU dir)/tune, and
+a simple open/read/write/close cycle will be used to tune the event.
+::
+$ cd /sys/devices/hisi_ptt_/tune
+$ ls
+qos_tx_cpl    qos_tx_np    qos_tx_p
+tx_path_rx_req_alloc_buf_level
+tx_path_tx_req_alloc_buf_level
+$ cat qos_tx_cpl
+1
+$ echo 2 > qos_tx_cpl
+$ cat qos_tx_cpl
+2
+
+The current value (numerical) of the event can simply be read
+from the file, and the desired value written to the file to tune it.
+
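+A minimal C sketch of the same open/read/write/close cycle (illustrative
+only: error handling is omitted and the event name is just an example)::
+
+    #include <fcntl.h>
+    #include <unistd.h>
+
+    int main(void)
+    {
+        /* run from within $(PTT PMU dir)/tune */
+        int fd = open("qos_tx_cpl", O_RDWR);
+        char buf[16] = {0};
+
+        read(fd, buf, sizeof(buf) - 1);  /* current value, e.g. "1\n" */
+        lseek(fd, 0, SEEK_SET);
+        write(fd, "2", 1);               /* tune the event to level 2 */
+        close(fd);
+        return 0;
+    }
+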
+1. Tx path QoS control
+----------------------
+
+The following files are provided to tune the QoS of the tx path of
+the PCIe core.
+
+- qos_tx_cpl: weight of Tx completion TLPs
+- qos_tx_np: weight of Tx non-posted TLPs
+- qos_tx_p: weight of Tx posted TLPs
+
+The weight influences the proportion of certain packets on the PCIe link.
+For example, in a storage scenario, increase the proportion
+of the completion packets on the link to enhance performance, as
+more completions are consumed.
+
+The available tune data of these events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
+
+2. Tx path buffer control
+-------------------------
+
+The following files are provided to tune the buffers of the Tx path of the PCIe core.
+
+- tx_path_rx_req_alloc_buf_level: watermark of Rx requested
+- tx_path_tx_req_alloc_buf_level: watermark of Tx requested
+
+These events influence the watermark of the buffer allocated for each
+type. Rx means inbound while Tx means outbound. The packets will
+be stored in the buffer first and then transmitted either when the
+watermark is reached or when a timeout occurs. For a busy direction,
+you should increase the related buffer watermark to avoid frequent
+posting and thus enhance performance. In most cases just keep the default value.
+
+The available tune data of above events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
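+
+For example, to raise the Rx-side watermark (the same read/write cycle as
+for the QoS events applies)::
+
+$ echo 2 > tx_path_rx_req_alloc_buf_level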
+
+Trace
+=====
+
+PTT trace is designed for dumping the TLP headers to memory, which
+can be used to analyze the transactions and usage condition of the PCIe
+Link. You can choose to filter the traced headers by either requester ID,
+or by those downstream of a set of Root Ports on the same core as the PTT
+device. It's also supported to trace the headers of certain type and of

[PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity

2022-04-07 Thread Yicong Yang via iommu
The DMA operations of the HiSilicon PTT device can only work properly
with identity mappings. So add a quirk for the device to force its
default domain to passthrough.

Signed-off-by: Yicong Yang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..5ec15ae2a9b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2839,6 +2839,21 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
}
 }
 
+#define IS_HISI_PTT_DEVICE(pdev)   ((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \
+(pdev)->device == 0xa12e)
+
+static int arm_smmu_def_domain_type(struct device *dev)
+{
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (IS_HISI_PTT_DEVICE(pdev))
+   return IOMMU_DOMAIN_IDENTITY;
+   }
+
+   return 0;
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -2856,6 +2871,7 @@ static struct iommu_ops arm_smmu_ops = {
.sva_unbind = arm_smmu_sva_unbind,
.sva_get_pasid  = arm_smmu_sva_get_pasid,
.page_response  = arm_smmu_page_response,
+   .def_domain_type= arm_smmu_def_domain_type,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
.owner  = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] x86/msi: Fix msi message data shadow struct

2022-04-07 Thread Reto Buerki
The x86 MSI message data is 32 bits in total and is either in
compatibility or remappable format, see Intel Virtualization Technology
for Directed I/O, section 5.1.2.

Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
Co-developed-by: Adrian-Ken Rueegsegger 
Signed-off-by: Adrian-Ken Rueegsegger 
Signed-off-by: Reto Buerki 
---
 arch/x86/include/asm/msi.h | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index b85147d75626..d71c7e8b738d 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 /* Structs and defines for the X86 specific MSI message format */
 
 typedef struct x86_msi_data {
-   u32 vector  :  8,
-   delivery_mode   :  3,
-   dest_mode_logical   :  1,
-   reserved:  2,
-   active_low  :  1,
-   is_level:  1;
-
-   u32 dmar_subhandle;
+   union {
+   struct {
+   u32 vector  :  8,
+   delivery_mode   :  3,
+   dest_mode_logical   :  1,
+   reserved:  2,
+   active_low  :  1,
+   is_level:  1;
+   };
+   u32 dmar_subhandle;
+   };
 } __attribute__ ((packed)) arch_msi_msg_data_t;
 #define arch_msi_msg_data  x86_msi_data
 
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] iommu: remove the put_resv_regions method

2022-04-07 Thread Robin Murphy

On 2022-04-07 07:26, Christoph Hellwig wrote:

All drivers that implement get_resv_regions just use
generic_put_resv_regions to implement the put side.  Remove the
indirections and document the allocation constraints.


Unfortunately we need to keep this one for now, as the belated IORT RMR 
support will finally be the first real user[1][2].


Robin.

[1] https://lore.kernel.org/linux-iommu/20220404124209.1086-6-shameerali.kolothum.th...@huawei.com/
[2] https://lore.kernel.org/linux-iommu/20220404124209.1086-7-shameerali.kolothum.th...@huawei.com/



Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   |  1 -
  drivers/iommu/apple-dart.c  |  1 -
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  1 -
  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  1 -
  drivers/iommu/intel/iommu.c |  1 -
  drivers/iommu/iommu.c   | 20 +---
  drivers/iommu/mtk_iommu.c   |  1 -
  drivers/iommu/virtio-iommu.c|  5 ++---
  include/linux/iommu.h   |  4 
  9 files changed, 3 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e61..7011b46022dcbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2279,7 +2279,6 @@ const struct iommu_ops amd_iommu_ops = {
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
.get_resv_regions = amd_iommu_get_resv_regions,
-   .put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.def_domain_type = amd_iommu_def_domain_type,
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index decafb07ad0831..a45ad9ade0dba6 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -771,7 +771,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
.of_xlate = apple_dart_of_xlate,
.def_domain_type = apple_dart_def_domain_type,
.get_resv_regions = apple_dart_get_resv_regions,
-   .put_resv_regions = generic_iommu_put_resv_regions,
.pgsize_bitmap = -1UL, /* Restricted during dart probe */
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = apple_dart_attach_dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 36461fb46d436c..1ea184bbf750a6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2847,7 +2847,6 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
.dev_enable_feat= arm_smmu_dev_enable_feature,
.dev_disable_feat   = arm_smmu_dev_disable_feature,
.sva_bind   = arm_smmu_sva_bind,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc13..41da1275689ebd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1589,7 +1589,6 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
.def_domain_type= arm_smmu_def_domain_type,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
.owner  = THIS_MODULE,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942b8..cafe50cb484cd5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4875,7 +4875,6 @@ const struct iommu_ops intel_iommu_ops = {
.probe_finalize = intel_iommu_probe_finalize,
.release_device = intel_iommu_release_device,
.get_resv_regions   = intel_iommu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
.device_group   = intel_iommu_device_group,
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6ce73f35c43aac..2e1f7d1cf74793 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2584,31 +2584,13 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
  }
  
  void iommu_put_resv_regions(struct device *dev, struct list_head *list)

-{
-   const struct iommu_ops *ops = dev_iommu_ops(dev);
-
-   if (ops->put_resv_regions)
- 

Re: [PATCH v3 2/2] iommu/mediatek: Add mt8186 iommu support

2022-04-07 Thread AngeloGioacchino Del Regno

On 07/04/22 10:32, Yong Wu wrote:

Add mt8186 iommu support.

Signed-off-by: Anan Sun 
Signed-off-by: Yong Wu 
Reviewed-by: Matthias Brugger 


Reviewed-by: AngeloGioacchino Del Regno 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/2] dt-bindings: mediatek: mt8186: Add binding for MM iommu

2022-04-07 Thread AngeloGioacchino Del Regno

On 07/04/22 10:32, Yong Wu wrote:

Add mt8186 iommu binding. "-mm" means the iommu is for Multimedia.

Signed-off-by: Yong Wu 
Acked-by: Krzysztof Kozlowski 
Reviewed-by: Rob Herring 
Reviewed-by: Matthias Brugger 


Reviewed-by: AngeloGioacchino Del Regno 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Niklas Schnelle
On Wed, 2022-04-06 at 14:17 -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > > 
> > > > Which is completely broken for any non-OF, non-ACPI plaform.
> > > 
> > > I saw that, but I spent some time searching and could not find an
> > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > platform drivers are created by board files. Things like Intel/AMD
> > > discover only from ACPI, etc.
> > 
> > s390?
> 
> Ah, I missed looking in s390, hyperv and virtio.. 
> 
> hyperv is not creating iommu_domains, just IRQ remapping
> 
> virtio is using OF
> 
> And s390 indeed doesn't obviously have OF or ACPI parts..
> 
> This seems like it would be consistent with other things:
> 
> enum dev_dma_attr device_get_dma_attr(struct device *dev)
> {
>   const struct fwnode_handle *fwnode = dev_fwnode(dev);
>   struct acpi_device *adev = to_acpi_device_node(fwnode);
> 
>   if (is_of_node(fwnode)) {
>   if (of_dma_is_coherent(to_of_node(fwnode)))
>   return DEV_DMA_COHERENT;
>   return DEV_DMA_NON_COHERENT;
>   } else if (adev) {
>   return acpi_get_dma_attr(adev);
>   }
> 
>   /* Platform is always DMA coherent */
>   if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
>   !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
>   !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
>   device_iommu_mapped(dev))
>   return DEV_DMA_COHERENT;
>   return DEV_DMA_NOT_SUPPORTED;
> }
> EXPORT_SYMBOL_GPL(device_get_dma_attr);
> 
> ie s390 has no of or acpi but the entire platform is known DMA
> coherent at config time so allow it. Not sure we need the
> device_iommu_mapped() or not.

I only took a short look but I think the device_iommu_mapped() call in
there is wrong. On s390 PCI always goes through IOMMU hardware both
when using the IOMMU API and when using the DMA API and this hardware
is always coherent. This is even true for s390 guests in QEMU/KVM and
under the z/VM hypervisor. As far as I can see device_iommu_mapped()'s
check for dev->iommu_group would only work while the device is under
IOMMU API control, not DMA API, no?
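
To illustrate, I mean something like this (just a rough sketch based on
the snippet you posted above, not tested):

enum dev_dma_attr device_get_dma_attr(struct device *dev)
{
	const struct fwnode_handle *fwnode = dev_fwnode(dev);
	struct acpi_device *adev = to_acpi_device_node(fwnode);

	if (is_of_node(fwnode)) {
		if (of_dma_is_coherent(to_of_node(fwnode)))
			return DEV_DMA_COHERENT;
		return DEV_DMA_NON_COHERENT;
	} else if (adev) {
		return acpi_get_dma_attr(adev);
	}

	/* Platforms without OF/ACPI (e.g. s390) are always DMA coherent */
	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL))
		return DEV_DMA_COHERENT;
	return DEV_DMA_NOT_SUPPORTED;
}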

Also, while it is true that s390 *hardware* devices are always cache
coherent there is also the case that SWIOTLB is used for protected
virtualization and then cache flushing APIs must be used.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 4/7] hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device

2022-04-07 Thread Yicong Yang via iommu
On 2022/4/7 12:28, kernel test robot wrote:
> Hi Yicong,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on joro-iommu/next]
> [also build test WARNING on linus/master linux/master v5.18-rc1 next-20220406]
> [cannot apply to tip/perf/core]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Yicong-Yang/Add-support-for-HiSilicon-PCIe-Tune-and-Trace-device/20220406-200044
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
> config: alpha-allyesconfig 
> (https://download.01.org/0day-ci/archive/20220407/202204071201.acepulor-...@intel.com/config)
> compiler: alpha-linux-gcc (GCC) 11.2.0
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # 
> https://github.com/intel-lab-lkp/linux/commit/9400668b70cbcd5ec74a52f043c3a333b80135f8
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Yicong-Yang/Add-support-for-HiSilicon-PCIe-Tune-and-Trace-device/20220406-200044
> git checkout 9400668b70cbcd5ec74a52f043c3a333b80135f8
> # save the config file to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
> O=build_dir ARCH=alpha SHELL=/bin/bash drivers/hwtracing/ptt/
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All warnings (new ones prefixed by >>):
> 
>drivers/hwtracing/ptt/hisi_ptt.c: In function 'hisi_ptt_tune_data_get':
>>> drivers/hwtracing/ptt/hisi_ptt.c:46:16: warning: conversion from 'long 
>>> unsigned int' to 'u32' {aka 'unsigned int'} changes value from 
>>> '18446744073709551615' to '4294967295' [-Woverflow]
>   46 | writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
>  |^~~~

Thanks for the report. Using ~0U will fix this.
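
i.e.:

	writel(~0U, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);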

>drivers/hwtracing/ptt/hisi_ptt.c: At top level:
>drivers/hwtracing/ptt/hisi_ptt.c:1131:6: warning: no previous prototype 
> for 'hisi_ptt_remove' [-Wmissing-prototypes]
> 1131 | void hisi_ptt_remove(struct pci_dev *pdev)
>  |  ^~~
> 

For here I missed the static keyword. Will fix, thanks.
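
i.e. the declaration becomes:

	static void hisi_ptt_remove(struct pci_dev *pdev)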

> 
> vim +46 drivers/hwtracing/ptt/hisi_ptt.c
> 
> 33
> 34static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt,
> 35  u32 event, u16 *data)
> 36{
> 37u32 reg;
> 38
> 39reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
> 40reg &= ~(HISI_PTT_TUNING_CTRL_CODE | 
> HISI_PTT_TUNING_CTRL_SUB);
> 41reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | 
> HISI_PTT_TUNING_CTRL_SUB,
> 42  event);
> 43writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
> 44
> 45/* Write all 1 to indicates it's the read process */
>   > 46writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
> 47
> 48if (!hisi_ptt_wait_tuning_finish(hisi_ptt))
> 49return -ETIMEDOUT;
> 50
> 51reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
> 52reg &= HISI_PTT_TUNING_DATA_VAL_MASK;
> 53*data = FIELD_GET(HISI_PTT_TUNING_DATA_VAL_MASK, reg);
> 54
> 55return 0;
> 56}
> 57
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/2] iommu/mediatek: Add mt8186 iommu support

2022-04-07 Thread Yong Wu via iommu
Add mt8186 iommu support.

Signed-off-by: Anan Sun 
Signed-off-by: Yong Wu 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 22c95ed78b3c..8d2b6dc89177 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -160,6 +160,7 @@ enum mtk_iommu_plat {
M4U_MT8167,
M4U_MT8173,
M4U_MT8183,
+   M4U_MT8186,
M4U_MT8192,
M4U_MT8195,
 };
@@ -1429,6 +1430,21 @@ static const struct mtk_iommu_plat_data mt8183_data = {
.larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
 };
 
+static const struct mtk_iommu_plat_data mt8186_data_mm = {
+   .m4u_plat   = M4U_MT8186,
+   .flags  = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
+ WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE |
+ MTK_IOMMU_TYPE_MM,
+   .larbid_remap   = {{0}, {1, MTK_INVALID_LARBID, 8}, {4}, {7}, {2}, {9, 11, 19, 20},
+  {MTK_INVALID_LARBID, 14, 16},
+  {MTK_INVALID_LARBID, 13, MTK_INVALID_LARBID, 17}},
+   .inv_sel_reg= REG_MMU_INV_SEL_GEN2,
+   .banks_num  = 1,
+   .banks_enable   = {true},
+   .iova_region= mt8192_multi_dom,
+   .iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+};
+
 static const struct mtk_iommu_plat_data mt8192_data = {
.m4u_plat   = M4U_MT8192,
.flags  = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
@@ -1498,6 +1514,7 @@ static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data},
{ .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
{ .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
+   { .compatible = "mediatek,mt8186-iommu-mm",.data = 
&mt8186_data_mm}, /* mm: m4u */
{ .compatible = "mediatek,mt8192-m4u", .data = &mt8192_data},
{ .compatible = "mediatek,mt8195-iommu-infra", .data = 
&mt8195_data_infra},
{ .compatible = "mediatek,mt8195-iommu-vdo",   .data = 
&mt8195_data_vdo},
-- 
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/2] dt-bindings: mediatek: mt8186: Add binding for MM iommu

2022-04-07 Thread Yong Wu via iommu
Add mt8186 iommu binding. "-mm" means the iommu is for Multimedia.

Signed-off-by: Yong Wu 
Acked-by: Krzysztof Kozlowski 
Reviewed-by: Rob Herring 
Reviewed-by: Matthias Brugger 
---
 .../bindings/iommu/mediatek,iommu.yaml|   4 +
 .../dt-bindings/memory/mt8186-memory-port.h   | 217 ++
 2 files changed, 221 insertions(+)
 create mode 100644 include/dt-bindings/memory/mt8186-memory-port.h

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index eed59ec00e78..91a3629a8e6e 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -76,6 +76,7 @@ properties:
   - mediatek,mt8167-m4u  # generation two
   - mediatek,mt8173-m4u  # generation two
   - mediatek,mt8183-m4u  # generation two
+  - mediatek,mt8186-iommu-mm # generation two
   - mediatek,mt8192-m4u  # generation two
   - mediatek,mt8195-iommu-vdo# generation two
   - mediatek,mt8195-iommu-vpp# generation two
@@ -122,6 +123,7 @@ properties:
   dt-binding/memory/mt8167-larb-port.h for mt8167,
   dt-binding/memory/mt8173-larb-port.h for mt8173,
   dt-binding/memory/mt8183-larb-port.h for mt8183,
+  dt-binding/memory/mt8186-memory-port.h for mt8186,
   dt-binding/memory/mt8192-larb-port.h for mt8192.
   dt-binding/memory/mt8195-memory-port.h for mt8195.
 
@@ -143,6 +145,7 @@ allOf:
   - mediatek,mt2701-m4u
   - mediatek,mt2712-m4u
   - mediatek,mt8173-m4u
+  - mediatek,mt8186-iommu-mm
   - mediatek,mt8192-m4u
   - mediatek,mt8195-iommu-vdo
   - mediatek,mt8195-iommu-vpp
@@ -155,6 +158,7 @@ allOf:
   properties:
 compatible:
   enum:
+- mediatek,mt8186-iommu-mm
 - mediatek,mt8192-m4u
 - mediatek,mt8195-iommu-vdo
 - mediatek,mt8195-iommu-vpp
diff --git a/include/dt-bindings/memory/mt8186-memory-port.h b/include/dt-bindings/memory/mt8186-memory-port.h
new file mode 100644
index ..2bc6e4433048
--- /dev/null
+++ b/include/dt-bindings/memory/mt8186-memory-port.h
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 MediaTek Inc.
+ *
+ * Author: Anan Sun 
+ * Author: Yong Wu 
+ */
+#ifndef _DT_BINDINGS_MEMORY_MT8186_LARB_PORT_H_
+#define _DT_BINDINGS_MEMORY_MT8186_LARB_PORT_H_
+
+#include 
+
+/*
+ * MM IOMMU supports 16GB dma address. We separate it into four ranges:
+ * 0 ~ 4G; 4G ~ 8G; 8G ~ 12G; 12G ~ 16G, and we can adjust which of these
+ * regions each master is located in. BUT:
+ * a) Make sure all the ports inside a larb are in one range.
+ * b) The iova of any master can NOT cross the 4G/8G/12G boundary.
+ *
+ * This is the suggested mapping in this SoC:
+ *
+ * modulesdma-address-region   larbs-ports
+ * disp 0 ~ 4G  larb0/1/2
+ * vcodec  4G ~ 8G  larb4/7
+ * cam/mdp 8G ~ 12G the other larbs.
+ * N/A 12G ~ 16G
+ * CCU0   0x24000_ ~ 0x243ff_   larb13: port 9/10
+ * CCU1   0x24400_ ~ 0x247ff_   larb14: port 4/5
+ */
+
+/* MM IOMMU ports */
+/* LARB 0 -- MMSYS */
+#define IOMMU_PORT_L0_DISP_POSTMASK0   MTK_M4U_ID(0, 0)
+#define IOMMU_PORT_L0_REVERSED MTK_M4U_ID(0, 1)
+#define IOMMU_PORT_L0_OVL_RDMA0MTK_M4U_ID(0, 2)
+#define IOMMU_PORT_L0_DISP_FAKE0   MTK_M4U_ID(0, 3)
+
+/* LARB 1 -- MMSYS */
+#define IOMMU_PORT_L1_DISP_RDMA1   MTK_M4U_ID(1, 0)
+#define IOMMU_PORT_L1_OVL_2L_RDMA0 MTK_M4U_ID(1, 1)
+#define IOMMU_PORT_L1_DISP_RDMA0   MTK_M4U_ID(1, 2)
+#define IOMMU_PORT_L1_DISP_WDMA0   MTK_M4U_ID(1, 3)
+#define IOMMU_PORT_L1_DISP_FAKE1   MTK_M4U_ID(1, 4)
+
+/* LARB 2 -- MMSYS */
+#define IOMMU_PORT_L2_MDP_RDMA0MTK_M4U_ID(2, 0)
+#define IOMMU_PORT_L2_MDP_RDMA1MTK_M4U_ID(2, 1)
+#define IOMMU_PORT_L2_MDP_WROT0MTK_M4U_ID(2, 2)
+#define IOMMU_PORT_L2_MDP_WROT1MTK_M4U_ID(2, 3)
+#define IOMMU_PORT_L2_DISP_FAKE0   MTK_M4U_ID(2, 4)
+
+/* LARB 4 -- VDEC */
+#define IOMMU_PORT_L4_HW_VDEC_MC_EXT   MTK_M4U_ID(4, 0)
+#define IOMMU_PORT_L4_HW_VDEC_UFO_EXT  MTK_M4U_ID(4, 1)
+#define IOMMU_PORT_L4_HW_VDEC_PP_EXT   MTK_M4U_ID(4, 2)
+#define IOMMU_PORT_L4_HW_VDEC_PRED_RD_EXT  MTK_M4U_ID(4, 3)
+#define IOMMU_PORT_L4_HW_VDEC_PRED_WR_EXT  MTK_M4U_ID(4, 4)
+#define IOMMU_PORT_L4_HW_VDEC_PPWRAP_EXT   MTK_M4U_ID(4, 5)
+#define IOMMU_PORT_L4_HW_VDEC_TILE_EXT MTK_M4U_ID(4, 6)
+#define IOMMU_PORT_L4_HW_VDEC_VLD_EXT  MTK_M4U_ID(4, 7)
+#define IOMMU_PORT_L4_HW_VDEC_VLD2_EXT MTK_M4U_ID(4, 8)
+#define IOMMU_PORT_L4_HW_VDEC_AVC_MV_EXT   MTK_M4U_ID(4, 9)
+#define IOMMU_PORT_L4_HW_VDEC_UFO_ENC_EXT  MTK_M4U_ID(4, 10)
+#define IOMMU_PORT_L4_HW_VDEC_

[PATCH v3 0/2] MT8186 IOMMU SUPPORT

2022-04-07 Thread Yong Wu via iommu
This patchset adds mt8186 iommu support.

Change note:
v3: Rebase on v5.18-rc1 and mt8195 iommu v6:

https://lore.kernel.org/linux-iommu/20220407075726.17771-1-yong...@mediatek.com/

v2: 
https://lore.kernel.org/linux-iommu/20220223072402.17518-1-yong...@mediatek.com/
a) Based on v5.17-rc1 and mt8195 iommu v5.
b) Added a comment "mm: m4u" in the code for readability.

v1: 
https://lore.kernel.org/linux-mediatek/20220125093244.18230-1-yong...@mediatek.com/

Yong Wu (2):
  dt-bindings: mediatek: mt8186: Add binding for MM iommu
  iommu/mediatek: Add mt8186 iommu support

 .../bindings/iommu/mediatek,iommu.yaml|   4 +
 drivers/iommu/mtk_iommu.c |  17 ++
 .../dt-bindings/memory/mt8186-memory-port.h   | 217 ++
 3 files changed, 238 insertions(+)
 create mode 100644 include/dt-bindings/memory/mt8186-memory-port.h

-- 
2.18.0


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()

2022-04-07 Thread Leizhen (ThunderTown) via iommu



On 2022/4/4 19:27, John Garry wrote:
> Add max opt argument to iova_domain_init_rcaches(), and use it to set the
> rcaches range.
> 
> Also fix up all users to set this value (at 0, meaning use default),
> including a wrapper for that, iova_domain_init_rcaches_default().
> 
> For dma-iommu.c we derive the iova_len argument from the IOMMU group
> max opt DMA size.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/dma-iommu.c| 15 ++-
>  drivers/iommu/iova.c | 19 ---
>  drivers/vdpa/vdpa_user/iova_domain.c |  4 ++--
>  include/linux/iova.h |  3 ++-
>  4 files changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 42ca42ff1b5d..19f35624611c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   unsigned long order, base_pfn;
>   struct iova_domain *iovad;
> + size_t max_opt_dma_size;
> + unsigned long iova_len = 0;
>   int ret;
>  
>   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
> @@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   }
>  
>   init_iova_domain(iovad, 1UL << order, base_pfn);
> - ret = iova_domain_init_rcaches(iovad);
> +
> + max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
> + if (max_opt_dma_size) {
> + unsigned long shift = __ffs(1UL << order);
> +
> + iova_len = roundup_pow_of_two(max_opt_dma_size);
> + iova_len >>= shift;
> + if (!iova_len)
> + iova_len = 1;

How about moving "iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);"
here? That way, iova_domain_init_rcaches() can remain the same, and
iova_domain_init_rcaches_default() does not need to be added.
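
Something like this (untested; it assumes iova_len_to_rcache_max() is made
available to dma-iommu.c and that iova_domain_init_rcaches() only applies
its default when rcache_max_size is still zero):

	max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
	if (max_opt_dma_size) {
		unsigned long shift = __ffs(1UL << order);

		iova_len = roundup_pow_of_two(max_opt_dma_size) >> shift;
		if (!iova_len)
			iova_len = 1;
		iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
	}

	ret = iova_domain_init_rcaches(iovad);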

> + }
> +
> + ret = iova_domain_init_rcaches(iovad, iova_len);
>   if (ret)
>   return ret;
>  
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 5c22b9187b79..d65e79e132ee 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
>   mag->pfns[mag->size++] = pfn;
>  }
>  
> -int iova_domain_init_rcaches(struct iova_domain *iovad)
> +static unsigned long iova_len_to_rcache_max(unsigned long iova_len)
> +{
> + return order_base_2(iova_len) + 1;
> +}
> +
> +int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len)
>  {
>   unsigned int cpu;
>   int i, ret;
>  
> - iovad->rcache_max_size = 6; /* Arbitrarily high default */
> + if (iova_len)
> + iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
> + else
> + iovad->rcache_max_size = 6; /* Arbitrarily high default */
>  
>   iovad->rcaches = kcalloc(iovad->rcache_max_size,
>sizeof(struct iova_rcache),
> @@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
>   free_iova_rcaches(iovad);
>   return ret;
>  }
> -EXPORT_SYMBOL_GPL(iova_domain_init_rcaches);
> +
> +int iova_domain_init_rcaches_default(struct iova_domain *iovad)
> +{
> + return iova_domain_init_rcaches(iovad, 0);
> +}
> +EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default);
>  
>  /*
>   * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 6daa3978d290..3a2acef98a4a 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t bounce_size)
>   spin_lock_init(&domain->iotlb_lock);
>   init_iova_domain(&domain->stream_iovad,
>   PAGE_SIZE, IOVA_START_PFN);
> - ret = iova_domain_init_rcaches(&domain->stream_iovad);
> + ret = iova_domain_init_rcaches_default(&domain->stream_iovad);
>   if (ret)
>   goto err_iovad_stream;
>   init_iova_domain(&domain->consistent_iovad,
>   PAGE_SIZE, bounce_pfns);
> - ret = iova_domain_init_rcaches(&domain->consistent_iovad);
> + ret = iova_domain_init_rcaches_default(&domain->consistent_iovad);
>   if (ret)
>   goto err_iovad_consistent;
>  
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 02f7222fa85a..56281434ce0c 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -95,7 +95,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
>   unsigned long pfn_hi);
>  void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>   unsigned long start_pfn);
> -int iov

Re: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs

2022-04-07 Thread Leizhen (ThunderTown) via iommu



On 2022/4/4 19:27, John Garry wrote:
> Add support to allow the maximum optimised DMA len to be set for an IOMMU
> group via sysfs.
> 
> This is much the same with the method to change the default domain type
> for a group.
> 
> Signed-off-by: John Garry 
> ---
>  .../ABI/testing/sysfs-kernel-iommu_groups | 16 +
>  drivers/iommu/iommu.c | 59 ++-
>  include/linux/iommu.h |  6 ++
>  3 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> index b15af6a5bc08..ed6f72794f6c 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> +++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> @@ -63,3 +63,19 @@ Description:   /sys/kernel/iommu_groups//type shows the type of default
>   system could lead to catastrophic effects (the users might
>   need to reboot the machine to get it to normal state). So, it's
>   expected that the users understand what they're doing.
> +
> +What:  /sys/kernel/iommu_groups//max_opt_dma_size
> +Date:  Feb 2022
> +KernelVersion:   v5.18
> +Contact: iommu@lists.linux-foundation.org
> +Description: /sys/kernel/iommu_groups//max_opt_dma_size shows the
> + max optimised DMA size for the default IOMMU domain associated
> + with the group.
> + Each IOMMU domain has an IOVA domain. The IOVA domain caches
> + IOVAs up to a certain size as a performance optimisation.
> + This sysfs file allows the range of the IOVA domain caching to
> + be set, such that larger than default IOVAs may be cached.
> + A value of 0 means that the default caching range is chosen.
> + A privileged user could request the kernel to change the range
> + by writing to this file. For this to happen, the same rules
> + and procedure apply as in changing the default domain type.
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 10bb10c2a210..7c7258f19bed 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -48,6 +48,7 @@ struct iommu_group {
>   struct iommu_domain *default_domain;
>   struct iommu_domain *domain;
>   struct list_head entry;
> + size_t max_opt_dma_size;
>  };
>  
>  struct group_device {
> @@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
>  static ssize_t iommu_group_store_type(struct iommu_group *group,
> const char *buf, size_t count);
> +static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
> +   const char *buf,
> +   size_t count);
>  
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\
>  struct iommu_group_attribute iommu_group_attr_##_name =  \
> @@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group *group,
>   return strlen(type);
>  }
>  
> +static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group,
> +  char *buf)
> +{
> + return sprintf(buf, "%zu\n", group->max_opt_dma_size);
> +}
> +
>  static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
>  
>  static IOMMU_GROUP_ATTR(reserved_regions, 0444,
> @@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444,
>  static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
>   iommu_group_store_type);
>  
> +static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, iommu_group_show_max_opt_dma_size,
> + iommu_group_store_max_opt_dma_size);
> +
>  static void iommu_group_release(struct kobject *kobj)
>  {
>   struct iommu_group *group = to_iommu_group(kobj);
> @@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void)
>   if (ret)
>   return ERR_PTR(ret);
>  
> + ret = iommu_group_create_file(group, &iommu_group_attr_max_opt_dma_size);
> + if (ret)
> + return ERR_PTR(ret);
> +
>   pr_debug("Allocated group %d\n", group->id);
>  
>   return group;
> @@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
>   return dev->iommu_group->default_domain;
>  }
>  
> +size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
> +{
> + return group->max_opt_dma_size;
> +}
> +
>  /*
>   * IOMMU groups are really the natural working unit of the IOMMU, but
>   * the IOMMU API works on domains and devices.  Bridge that gap by
> @@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
>   * @prev_dev: The device in the group (this is used to make sure that the 
> device
>   

RE: [PATCH 4/4] iommu/arm-smmu-v3: cleanup arm_smmu_dev_{enable, disable}_feature

2022-04-07 Thread Tian, Kevin
> From: Christoph Hellwig
> Sent: Thursday, April 7, 2022 2:26 PM
> 
> Fold arm_smmu_dev_has_feature and arm_smmu_dev_feature_enabled into
> the main methods.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 57 ++---
>  1 file changed, 15 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1ea184bbf750a6..8e201c660139ae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2760,58 +2760,27 @@ static void arm_smmu_get_resv_regions(struct device *dev,
>   iommu_dma_get_resv_regions(dev, head);
>  }
> 
> -static bool arm_smmu_dev_has_feature(struct device *dev,
> -  enum iommu_dev_features feat)
> -{
> - struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> -
> - if (!master)
> - return false;
> -
> - switch (feat) {
> - case IOMMU_DEV_FEAT_IOPF:
> - return arm_smmu_master_iopf_supported(master);
> - case IOMMU_DEV_FEAT_SVA:
> - return arm_smmu_master_sva_supported(master);
> - default:
> - return false;
> - }
> -}
> -
> -static bool arm_smmu_dev_feature_enabled(struct device *dev,
> -  enum iommu_dev_features feat)
> -{
> - struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> -
> - if (!master)
> - return false;
> -
> - switch (feat) {
> - case IOMMU_DEV_FEAT_IOPF:
> - return master->iopf_enabled;
> - case IOMMU_DEV_FEAT_SVA:
> - return arm_smmu_master_sva_enabled(master);
> - default:
> - return false;
> - }
> -}
> -
>  static int arm_smmu_dev_enable_feature(struct device *dev,
>  enum iommu_dev_features feat)
>  {
>   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> 
> - if (!arm_smmu_dev_has_feature(dev, feat))
> - return -ENODEV;
> -
> - if (arm_smmu_dev_feature_enabled(dev, feat))
> - return -EBUSY;
> + if (!master)
> + return -EINVAL;

The old logic returns -ENODEV but it's changed to -EINVAL here. Is that
intended? If yes, it's probably worth mentioning in the patch description,
though it's just a small semantic change.
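
i.e. keeping the old semantics would just be:

	if (!master)
		return -ENODEV;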

> 
>   switch (feat) {
>   case IOMMU_DEV_FEAT_IOPF:
> + if (!arm_smmu_master_iopf_supported(master))
> + return -EINVAL;
> + if (master->iopf_enabled)
> + return -EBUSY;
>   master->iopf_enabled = true;
>   return 0;
>   case IOMMU_DEV_FEAT_SVA:
> + if (!arm_smmu_master_sva_supported(master))
> + return -EINVAL;
> + if (arm_smmu_master_sva_enabled(master))
> + return -EBUSY;
>   return arm_smmu_master_enable_sva(master);
>   default:
>   return -EINVAL;
> @@ -2823,16 +2792,20 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
>  {
>   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> 
> - if (!arm_smmu_dev_feature_enabled(dev, feat))
> + if (!master)
>   return -EINVAL;
> 
>   switch (feat) {
>   case IOMMU_DEV_FEAT_IOPF:
> + if (!master->iopf_enabled)
> + return -EINVAL;
>   if (master->sva_enabled)
>   return -EBUSY;
>   master->iopf_enabled = false;
>   return 0;
>   case IOMMU_DEV_FEAT_SVA:
> + if (!arm_smmu_master_sva_enabled(master))
> + return -EINVAL;
>   return arm_smmu_master_disable_sva(master);
>   default:
>   return -EINVAL;
> --
> 2.30.2
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 3/4] iommu: remove the put_resv_regions method

2022-04-07 Thread Tian, Kevin
> From: Christoph Hellwig
> Sent: Thursday, April 7, 2022 2:26 PM
> 
> All drivers that implement get_resv_regions just use
> generic_put_resv_regions to implement the put side.  Remove the
> indirections and document the allocation constraints.
> 

It looks like the documentation is gone after the removal:

>  void iommu_put_resv_regions(struct device *dev, struct list_head *list)
> -{
> - const struct iommu_ops *ops = dev_iommu_ops(dev);
> -
> - if (ops->put_resv_regions)
> - ops->put_resv_regions(dev, list);
> -}
> -
> -/**
> - * generic_iommu_put_resv_regions - Reserved region driver helper
> - * @dev: device for which to free reserved regions
> - * @list: reserved region list for device
> - *
> - * IOMMU drivers can use this to implement their .put_resv_regions() callback
> - * for simple reservations. Memory allocated for each reserved region will be
> - * freed. If an IOMMU driver allocates additional resources per region, it is
> - * going to have to implement a custom callback.
> - */
> -void generic_iommu_put_resv_regions(struct device *dev, struct list_head *list)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 34/34] iommu/mediatek: mt8195: Enable multi banks for infra iommu

2022-04-07 Thread Yong Wu via iommu
Enable the multi-bank function for infra-iommu. We put PCIe in bank0
and USB in the last bank (bank4). We don't use the other banks
currently, so disable them.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 027bbbced80d..22c95ed78b3c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1449,8 +1449,11 @@ static const struct mtk_iommu_plat_data mt8195_data_infra = {
MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
.pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
.inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
-   .banks_num= 1,
-   .banks_enable = {true},
+   .banks_num= 5,
+   .banks_enable = {true, false, false, false, true},
+   .banks_portmsk= {[0] = GENMASK(19, 16), /* PCIe */
+[4] = GENMASK(31, 20), /* USB */
+   },
.iova_region  = single_domain,
.iova_region_nr   = ARRAY_SIZE(single_domain),
 };
-- 
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 33/34] iommu/mediatek: Backup/restore registers for multi banks

2022-04-07 Thread Yong Wu via iommu
Each bank has some independent registers, thus back up/restore them
for each bank when suspending and resuming.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 46 ++-
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 028dc642a31e..027bbbced80d 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -173,11 +173,12 @@ struct mtk_iommu_suspend_reg {
u32 misc_ctrl;
u32 dcm_dis;
u32 ctrl_reg;
-   u32 int_control0;
-   u32 int_main_control;
-   u32 ivrp_paddr;
u32 vld_pa_rng;
u32 wr_len_ctrl;
+
+   u32 int_control[MTK_IOMMU_BANK_MAX];
+   u32 int_main_control[MTK_IOMMU_BANK_MAX];
+   u32 ivrp_paddr[MTK_IOMMU_BANK_MAX];
 };
 
 struct mtk_iommu_plat_data {
@@ -1292,16 +1293,23 @@ static int __maybe_unused mtk_iommu_runtime_suspend(struct device *dev)
 {
struct mtk_iommu_data *data = dev_get_drvdata(dev);
struct mtk_iommu_suspend_reg *reg = &data->reg;
-   void __iomem *base = data->bank[0].base;
+   void __iomem *base;
+   int i = 0;
 
+   base = data->bank[i].base;
reg->wr_len_ctrl = readl_relaxed(base + REG_MMU_WR_LEN_CTRL);
reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL);
reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
-   reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0);
-   reg->int_main_control = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL);
-   reg->ivrp_paddr = readl_relaxed(base + REG_MMU_IVRP_PADDR);
reg->vld_pa_rng = readl_relaxed(base + REG_MMU_VLD_PA_RNG);
+   do {
+   if (!data->plat_data->banks_enable[i])
+   continue;
+   base = data->bank[i].base;
+   reg->int_control[i] = readl_relaxed(base + REG_MMU_INT_CONTROL0);
+   reg->int_main_control[i] = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL);
+   reg->ivrp_paddr[i] = readl_relaxed(base + REG_MMU_IVRP_PADDR);
+   } while (++i < data->plat_data->banks_num);
clk_disable_unprepare(data->bclk);
return 0;
 }
@@ -1310,9 +1318,9 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
 {
struct mtk_iommu_data *data = dev_get_drvdata(dev);
struct mtk_iommu_suspend_reg *reg = &data->reg;
-   struct mtk_iommu_domain *m4u_dom = data->bank[0].m4u_dom;
-   void __iomem *base = data->bank[0].base;
-   int ret;
+   struct mtk_iommu_domain *m4u_dom;
+   void __iomem *base;
+   int ret, i = 0;
 
ret = clk_prepare_enable(data->bclk);
if (ret) {
@@ -1324,18 +1332,26 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
 * Upon first resume, only enable the clk and return, since the values of the
 * registers are not yet set.
 */
-   if (!m4u_dom)
+   if (!reg->wr_len_ctrl)
return 0;
 
+   base = data->bank[i].base;
writel_relaxed(reg->wr_len_ctrl, base + REG_MMU_WR_LEN_CTRL);
writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL);
writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
-   writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0);
-   writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
-   writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
-   writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR);
+   do {
+   m4u_dom = data->bank[i].m4u_dom;
+   if (!data->plat_data->banks_enable[i] || !m4u_dom)
+   continue;
+   base = data->bank[i].base;
+   writel_relaxed(reg->int_control[i], base + REG_MMU_INT_CONTROL0);
+   writel_relaxed(reg->int_main_control[i], base + REG_MMU_INT_MAIN_CONTROL);
+   writel_relaxed(reg->ivrp_paddr[i], base + REG_MMU_IVRP_PADDR);
+   writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
+  base + REG_MMU_PT_BASE_ADDR);
+   } while (++i < data->plat_data->banks_num);
 
/*
 * Users may allocate dma buffer before they call pm_runtime_get,
-- 
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 32/34] iommu/mediatek: Initialise/Remove for multi bank dev

2022-04-07 Thread Yong Wu via iommu
The registers for each bank of the IOMMU are laid out in order; the
delta is 0x1000. Initialise the base for each bank.

For all the previous SoCs we only have bank0, thus use "do {} while()"
so that bank0 is always handled.

When removing the device, not all the banks are necessarily initialised;
it depends on whether there are masters for that bank.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 44 ++-
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d42b3d35a36e..028dc642a31e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -113,6 +113,7 @@
 #define F_MMU_INT_ID_PORT_ID(a)(((a) >> 2) & 0x1f)
 
 #define MTK_PROTECT_PA_ALIGN   256
+#define MTK_IOMMU_BANK_SZ  0x1000
 
 #define PERICFG_IOMMU_10x714
 
@@ -1104,7 +1105,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
struct component_match  *match = NULL;
struct regmap   *infracfg;
void*protect;
-   int ret, banks_num;
+   int ret, banks_num, i = 0;
u32 val;
char*p;
struct mtk_iommu_bank_data *bank;
@@ -1145,27 +1146,36 @@ static int mtk_iommu_probe(struct platform_device *pdev)
data->enable_4GB = !!(val & F_DDR_4GB_SUPPORT_EN);
}
 
+   banks_num = data->plat_data->banks_num;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (resource_size(res) < banks_num * MTK_IOMMU_BANK_SZ) {
+   dev_err(dev, "banknr %d. res %pR is not enough.\n", banks_num, 
res);
+   return -EINVAL;
+   }
base = devm_ioremap_resource(dev, res);
if (IS_ERR(base))
return PTR_ERR(base);
ioaddr = res->start;
 
-   banks_num = data->plat_data->banks_num;
data->bank = devm_kmalloc(dev, banks_num * sizeof(*data->bank), GFP_KERNEL);
if (!data->bank)
return -ENOMEM;
 
-   bank = &data->bank[0];
-   bank->id = 0;
-   bank->base = base;
-   bank->m4u_dom = NULL;
-   bank->irq = platform_get_irq(pdev, 0);
-   if (bank->irq < 0)
-   return bank->irq;
-   bank->parent_dev = dev;
-   bank->parent_data = data;
-   spin_lock_init(&bank->tlb_lock);
+   do {
+   if (!data->plat_data->banks_enable[i])
+   continue;
+   bank = &data->bank[i];
+   bank->id = i;
+   bank->base = base + i * MTK_IOMMU_BANK_SZ;
+   bank->m4u_dom = NULL;
+
+   bank->irq = platform_get_irq(pdev, i);
+   if (bank->irq < 0)
+   return bank->irq;
+   bank->parent_dev = dev;
+   bank->parent_data = data;
+   spin_lock_init(&bank->tlb_lock);
+   } while (++i < banks_num);
 
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_BCLK)) {
data->bclk = devm_clk_get(dev, "bclk");
@@ -1251,7 +1261,8 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 static int mtk_iommu_remove(struct platform_device *pdev)
 {
struct mtk_iommu_data *data = platform_get_drvdata(pdev);
-   struct mtk_iommu_bank_data *bank = &data->bank[0];
+   struct mtk_iommu_bank_data *bank;
+   int i;
 
iommu_device_sysfs_remove(&data->iommu);
iommu_device_unregister(&data->iommu);
@@ -1268,7 +1279,12 @@ static int mtk_iommu_remove(struct platform_device *pdev)
 #endif
}
pm_runtime_disable(&pdev->dev);
-   devm_free_irq(&pdev->dev, bank->irq, bank);
+   for (i = 0; i < data->plat_data->banks_num; i++) {
+   bank = &data->bank[i];
+   if (!bank->m4u_dom)
+   continue;
+   devm_free_irq(&pdev->dev, bank->irq, bank);
+   }
return 0;
 }
 
-- 
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 31/34] iommu/mediatek: Get the proper bankid for multi banks

2022-04-07 Thread Yong Wu via iommu
We preassign some ports to a special bank via the newly defined
banks_portmsk. Putting it in the plat_data means it is not expected
to be adjusted dynamically.

If the iommu id in the iommu consumer's dtsi node is inside this
banks_portmsk, then we switch it to this special iommu bank, and
initialise the IOMMU bank HW.

Each bank has an independent pgtable (4GB iova range). Each bank
is an independent iommu domain/group. Currently we don't separate
different iova ranges inside a bank.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 39 ---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0828cff97625..d42b3d35a36e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -191,6 +191,7 @@ struct mtk_iommu_plat_data {
 
u8  banks_num;
boolbanks_enable[MTK_IOMMU_BANK_MAX];
+   unsigned intbanks_portmsk[MTK_IOMMU_BANK_MAX];
unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
 };
 
@@ -467,6 +468,30 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static unsigned int mtk_iommu_get_bank_id(struct device *dev,
+ const struct mtk_iommu_plat_data *plat_data)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   unsigned int i, portmsk = 0, bankid = 0;
+
+   if (plat_data->banks_num == 1)
+   return bankid;
+
+   for (i = 0; i < fwspec->num_ids; i++)
+   portmsk |= BIT(MTK_M4U_TO_PORT(fwspec->ids[i]));
+
+   for (i = 0; i < plat_data->banks_num && i < MTK_IOMMU_BANK_MAX; i++) {
+   if (!plat_data->banks_enable[i])
+   continue;
+
+   if (portmsk & plat_data->banks_portmsk[i]) {
+   bankid = i;
+   break;
+   }
+   }
+   return bankid; /* default is 0 */
+}
+
 static int mtk_iommu_get_iova_region_id(struct device *dev,
const struct mtk_iommu_plat_data *plat_data)
 {
@@ -619,13 +644,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
struct list_head *hw_list = data->hw_list;
struct device *m4udev = data->dev;
struct mtk_iommu_bank_data *bank;
-   unsigned int bankid = 0;
+   unsigned int bankid;
int ret, region_id;
 
region_id = mtk_iommu_get_iova_region_id(dev, data->plat_data);
if (region_id < 0)
return region_id;
 
+   bankid = mtk_iommu_get_bank_id(dev, data->plat_data);
mutex_lock(&dom->mutex);
if (!dom->bank) {
/* Data is in the frstdata in sharing pgtable case. */
@@ -802,6 +828,7 @@ static struct iommu_group *mtk_iommu_device_group(struct device *dev)
struct mtk_iommu_data *c_data = dev_iommu_priv_get(dev), *data;
struct list_head *hw_list = c_data->hw_list;
struct iommu_group *group;
+   unsigned int bankid, groupid;
int regionid;
 
data = mtk_iommu_get_frst_data(hw_list);
@@ -812,12 +839,18 @@ static struct iommu_group *mtk_iommu_device_group(struct 
device *dev)
if (regionid < 0)
return ERR_PTR(regionid);
 
+   bankid = mtk_iommu_get_bank_id(dev, data->plat_data);
mutex_lock(&data->mutex);
-   group = data->m4u_group[regionid];
+	/*
+	 * If the bank function is enabled, each bank is an iommu group/domain;
+	 * otherwise, each iova region is an iommu group/domain.
+	 */
+   groupid = bankid ? bankid : regionid;
+   group = data->m4u_group[groupid];
if (!group) {
group = iommu_group_alloc();
if (!IS_ERR(group))
-   data->m4u_group[regionid] = group;
+   data->m4u_group[groupid] = group;
} else {
iommu_group_ref_get(group);
}
-- 
2.18.0



[PATCH v6 30/34] iommu/mediatek: Change the domid to iova_region_id

2022-04-07 Thread Yong Wu via iommu
Prepare for adding bankid; no functional change.

In the previous SoCs, each iova_region is a domain; in the multi-bank
case, each bank is a domain, so the original function name
"mtk_iommu_get_domain_id" is no longer accurate. Use "iova_region_id"
instead of "domain_id".

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 46 +++
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 9c27b99ca0cd..0828cff97625 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -467,8 +467,8 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-static int mtk_iommu_get_domain_id(struct device *dev,
-  const struct mtk_iommu_plat_data *plat_data)
+static int mtk_iommu_get_iova_region_id(struct device *dev,
+					const struct mtk_iommu_plat_data *plat_data)
 {
const struct mtk_iommu_iova_region *rgn = plat_data->iova_region;
const struct bus_dma_region *dma_rgn = dev->dma_range_map;
@@ -498,7 +498,7 @@ static int mtk_iommu_get_domain_id(struct device *dev,
 }
 
 static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
-   bool enable, unsigned int domid)
+   bool enable, unsigned int regionid)
 {
struct mtk_smi_larb_iommu*larb_mmu;
unsigned int larbid, portid;
@@ -514,12 +514,12 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, 
struct device *dev,
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
larb_mmu = &data->larb_imu[larbid];
 
-   region = data->plat_data->iova_region + domid;
+   region = data->plat_data->iova_region + regionid;
		larb_mmu->bank[portid] = upper_32_bits(region->iova_base);

-		dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n",
+		dev_dbg(dev, "%s iommu for larb(%s) port %d region %d rgn-bank %d.\n",
			enable ? "enable" : "disable", dev_name(larb_mmu->dev),
-   portid, domid, larb_mmu->bank[portid]);
+   portid, regionid, larb_mmu->bank[portid]);
 
if (enable)
larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
@@ -545,7 +545,7 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, 
struct device *dev,
 
 static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 struct mtk_iommu_data *data,
-unsigned int domid)
+unsigned int region_id)
 {
const struct mtk_iommu_iova_region *region;
struct mtk_iommu_domain *m4u_dom;
@@ -584,7 +584,7 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom,
 
 update_iova_region:
/* Update the iova region for this domain */
-   region = data->plat_data->iova_region + domid;
+   region = data->plat_data->iova_region + region_id;
dom->domain.geometry.aperture_start = region->iova_base;
	dom->domain.geometry.aperture_end = region->iova_base + region->size - 1;
dom->domain.geometry.force_aperture = true;
@@ -620,18 +620,18 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
struct device *m4udev = data->dev;
struct mtk_iommu_bank_data *bank;
unsigned int bankid = 0;
-   int ret, domid;
+   int ret, region_id;
 
-   domid = mtk_iommu_get_domain_id(dev, data->plat_data);
-   if (domid < 0)
-   return domid;
+   region_id = mtk_iommu_get_iova_region_id(dev, data->plat_data);
+   if (region_id < 0)
+   return region_id;
 
mutex_lock(&dom->mutex);
if (!dom->bank) {
/* Data is in the frstdata in sharing pgtable case. */
frstdata = mtk_iommu_get_frst_data(hw_list);
 
-   ret = mtk_iommu_domain_finalise(dom, frstdata, domid);
+   ret = mtk_iommu_domain_finalise(dom, frstdata, region_id);
if (ret) {
mutex_unlock(&dom->mutex);
return -ENODEV;
@@ -662,7 +662,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
}
mutex_unlock(&data->mutex);
 
-   return mtk_iommu_config(data, dev, true, domid);
+   return mtk_iommu_config(data, dev, true, region_id);
 
 err_unlock:
mutex_unlock(&data->mutex);
@@ -802,22 +802,22 @@ static struct iommu_group *mtk_iommu_device_group(struct 
device *dev)
struct mtk_iommu_data *c_data = dev_iommu_priv_get(dev), *data;
struct list_head *hw_list = c_data->hw_list;
struct iom

[PATCH v6 28/34] iommu/mediatek: Add mtk_iommu_bank_data structure

2022-04-07 Thread Yong Wu via iommu
Prepare for supporting multiple banks in the IOMMU HW; no functional
change.

Add a new structure (mtk_iommu_bank_data) for each bank. Each bank has
its own independent HW base/IRQ/tlb-range ops, and each bank has its
own iommu domain (independent pgtable); thus, also move the domain
information into it.

In previous SoCs, we have only one bank, which can be treated as bank0
(bankid is always 0 on those SoCs).

After adding this structure, the tlb operations and the irq handler
can take bank_data as a parameter.
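
Condensed sketch (a paraphrase of the hunk below, not a separate API)
of how the range flush now walks the same bank id on every IOMMU HW
that shares the pgtable:

	list_for_each_entry(data, bank->parent_data->hw_list, list) {
		struct mtk_iommu_bank_data *curbank = &data->bank[bank->id];

		spin_lock_irqsave(&curbank->tlb_lock, flags);
		/* program the invalidate range via curbank->base */
		spin_unlock_irqrestore(&curbank->tlb_lock, flags);
	}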

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 179 +-
 1 file changed, 117 insertions(+), 62 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d46eb745492f..f2a29399f10f 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -151,6 +151,7 @@
 #define MTK_LARB_SUBCOM_MAX8
 
 #define MTK_IOMMU_GROUP_MAX8
+#define MTK_IOMMU_BANK_MAX 5
 
 enum mtk_iommu_plat {
M4U_MT2712,
@@ -187,25 +188,36 @@ struct mtk_iommu_plat_data {
struct list_head*hw_list;
unsigned intiova_region_nr;
const struct mtk_iommu_iova_region  *iova_region;
+
+   u8  banks_num;
+   boolbanks_enable[MTK_IOMMU_BANK_MAX];
unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
 };
 
-struct mtk_iommu_data {
+struct mtk_iommu_bank_data {
	void __iomem			*base;
int irq;
+   u8  id;
+   struct device   *parent_dev;
+   struct mtk_iommu_data   *parent_data;
+   spinlock_t  tlb_lock; /* lock for tlb range flush */
+   struct mtk_iommu_domain *m4u_dom; /* Each bank has a domain */
+};
+
+struct mtk_iommu_data {
struct device   *dev;
struct clk  *bclk;
phys_addr_t protect_base; /* protect memory base */
struct mtk_iommu_suspend_regreg;
-   struct mtk_iommu_domain *m4u_dom;
struct iommu_group  *m4u_group[MTK_IOMMU_GROUP_MAX];
	bool				enable_4GB;
-   spinlock_t  tlb_lock; /* lock for tlb range flush */
 
struct iommu_device iommu;
const struct mtk_iommu_plat_data *plat_data;
struct device   *smicomm_dev;
 
+   struct mtk_iommu_bank_data  *bank;
+
	struct dma_iommu_mapping	*mapping; /* For mtk_iommu_v1.c */
struct regmap   *pericfg;
 
@@ -225,7 +237,7 @@ struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
struct io_pgtable_ops   *iop;
 
-   struct mtk_iommu_data   *data;
+   struct mtk_iommu_bank_data  *bank;
struct iommu_domain domain;
 
	struct mutex			mutex; /* Protect "data" in this structure */
@@ -311,20 +323,24 @@ static struct mtk_iommu_domain *to_mtk_domain(struct 
iommu_domain *dom)
 
 static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 {
-   void __iomem *base = data->base;
+	/* Tlb flush all is always done in bank0. */
+   struct mtk_iommu_bank_data *bank = &data->bank[0];
+   void __iomem *base = bank->base;
unsigned long flags;
 
-   spin_lock_irqsave(&data->tlb_lock, flags);
+   spin_lock_irqsave(&bank->tlb_lock, flags);
	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + data->plat_data->inv_sel_reg);
writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
wmb(); /* Make sure the tlb flush all done */
-   spin_unlock_irqrestore(&data->tlb_lock, flags);
+   spin_unlock_irqrestore(&bank->tlb_lock, flags);
 }
 
 static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-  struct mtk_iommu_data *data)
+  struct mtk_iommu_bank_data *bank)
 {
-   struct list_head *head = data->hw_list;
+   struct list_head *head = bank->parent_data->hw_list;
+   struct mtk_iommu_bank_data *curbank;
+   struct mtk_iommu_data *data;
bool check_pm_status;
unsigned long flags;
void __iomem *base;
@@ -354,9 +370,10 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
continue;
}
 
-   base = data->base;
+   curbank = &data->bank[bank->id];
+   base = curbank->base;
 
-   spin_lock_irqsave(&data->tlb_lock, flags);
+   spin_lock_irqsave(&curbank->tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
   base + data->plat_data->inv_sel_reg);
 
@@ -371,7 +388,7 @@ static void mtk

[PATCH v6 29/34] iommu/mediatek: Initialise bank HW for each a bank

2022-04-07 Thread Yong Wu via iommu
The mt8195 IOMMU HW supports up to 5 banks; its register layout looks
like this:

 -----------------------------------------
 |bank0  | bank1 | bank2 | bank3 | bank4 |
 -----------------------------------------
 |global |
 |control|              null
 |regs   |
 -----------------------------------------
 |bank   |bank   |bank   |bank   |bank   |
 |regs   |regs   |regs   |regs   |regs   |
 |       |       |       |       |       |
 -----------------------------------------

Each bank has some bank-specific registers and shares bank0's global
control registers. This patch initialises the bank HW according to the
bankid.

In hw_init, we always initialise bank0's control registers, since we
don't know whether bank0 itself has already been initialised.

Additionally, each bank's register base is at a fixed 0x1000 offset
from the previous one, i.e. bank[x + 1] = bank[x] + 0x1000.
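
For illustration only (the helper below is hypothetical, not part of
this patch), a bank's MMIO base can be derived from bank0's base like:

	#define MTK_IOMMU_BANK_SZ	0x1000

	static void __iomem *example_bank_base(struct mtk_iommu_data *data,
					       unsigned int bankid)
	{
		return data->bank[0].base + bankid * MTK_IOMMU_BANK_SZ;
	}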

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f2a29399f10f..9c27b99ca0cd 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -259,7 +259,7 @@ static void mtk_iommu_unbind(struct device *dev)
 
 static const struct iommu_ops mtk_iommu_ops;
 
-static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
+static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int bankid);
 
 #define MTK_IOMMU_TLB_ADDR(iova) ({\
dma_addr_t _addr = iova;\
@@ -642,12 +642,14 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 
mutex_lock(&data->mutex);
bank = &data->bank[bankid];
-   if (!bank->m4u_dom) { /* Initialize the M4U HW */
+	if (!bank->m4u_dom) { /* Initialize the M4U HW for each BANK */
ret = pm_runtime_resume_and_get(m4udev);
-   if (ret < 0)
+   if (ret < 0) {
+   dev_err(m4udev, "pm get fail(%d) in attach.\n", ret);
goto err_unlock;
+   }
 
-   ret = mtk_iommu_hw_init(data);
+   ret = mtk_iommu_hw_init(data, bankid);
if (ret) {
pm_runtime_put(m4udev);
goto err_unlock;
@@ -897,11 +899,16 @@ static const struct iommu_ops mtk_iommu_ops = {
}
 };
 
-static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
+static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int bankid)
 {
+   const struct mtk_iommu_bank_data *bankx = &data->bank[bankid];
const struct mtk_iommu_bank_data *bank0 = &data->bank[0];
u32 regval;
 
+	/*
+	 * Global control settings are in bank0. May re-init these global
+	 * registers since we can't be sure whether there are bank0 consumers.
+	 */
if (data->plat_data->m4u_plat == M4U_MT8173) {
regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
 F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
@@ -944,13 +951,14 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
}
writel_relaxed(regval, bank0->base + REG_MMU_MISC_CTRL);
 
+   /* Independent settings for each bank */
regval = F_L2_MULIT_HIT_EN |
F_TABLE_WALK_FAULT_INT_EN |
F_PREETCH_FIFO_OVERFLOW_INT_EN |
F_MISS_FIFO_OVERFLOW_INT_EN |
F_PREFETCH_FIFO_ERR_INT_EN |
F_MISS_FIFO_ERR_INT_EN;
-   writel_relaxed(regval, bank0->base + REG_MMU_INT_CONTROL0);
+   writel_relaxed(regval, bankx->base + REG_MMU_INT_CONTROL0);
 
regval = F_INT_TRANSLATION_FAULT |
F_INT_MAIN_MULTI_HIT_FAULT |
@@ -959,19 +967,19 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
F_INT_TLB_MISS_FAULT |
F_INT_MISS_TRANSACTION_FIFO_FAULT |
F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
-   writel_relaxed(regval, bank0->base + REG_MMU_INT_MAIN_CONTROL);
+   writel_relaxed(regval, bankx->base + REG_MMU_INT_MAIN_CONTROL);
 
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR))
regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
else
regval = lower_32_bits(data->protect_base) |
 upper_32_bits(data->protect_base);
-   writel_relaxed(regval, bank0->base + REG_MMU_IVRP_PADDR);
+   writel_relaxed(regval, bankx->base + REG_MMU_IVRP_PADDR);
 
-   if (devm_request_irq(bank0->parent_dev, bank0->irq, mtk_iommu_isr, 0,
-dev_name(bank0->parent_dev), (void *)bank0)) {
-   writel_relaxed(0, bank0->base + REG_MMU_PT_BASE_ADDR);
-		dev_err(bank0->parent_dev, "Failed @ IRQ-%d Request\n", bank0->irq);
+   if (devm_request_irq(bankx->parent_dev, bankx->irq, mtk_iomm

[PATCH v6 27/34] iommu/mediatek-v1: Just rename mtk_iommu to mtk_iommu_v1

2022-04-07 Thread Yong Wu via iommu
No functional change. Just rename these for readability and to
differentiate this file from mtk_iommu.c.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu_v1.c | 211 +--
 1 file changed, 103 insertions(+), 108 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 3d1f0897d1cc..62669e60991f 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -85,53 +85,53 @@
  */
 #define M2701_IOMMU_PGT_SIZE   SZ_4M
 
-struct mtk_iommu_suspend_reg {
+struct mtk_iommu_v1_suspend_reg {
u32 standard_axi_mode;
u32 dcm_dis;
u32 ctrl_reg;
u32 int_control0;
 };
 
-struct mtk_iommu_data {
+struct mtk_iommu_v1_data {
	void __iomem			*base;
int irq;
struct device   *dev;
struct clk  *bclk;
phys_addr_t protect_base; /* protect memory base */
-   struct mtk_iommu_domain *m4u_dom;
+   struct mtk_iommu_v1_domain  *m4u_dom;
 
struct iommu_device iommu;
	struct dma_iommu_mapping	*mapping;
struct mtk_smi_larb_iommu   larb_imu[MTK_LARB_NR_MAX];
 
-	struct mtk_iommu_suspend_reg	reg;
+   struct mtk_iommu_v1_suspend_reg reg;
 };
 
-struct mtk_iommu_domain {
+struct mtk_iommu_v1_domain {
spinlock_t  pgtlock; /* lock for page table */
struct iommu_domain domain;
u32 *pgt_va;
dma_addr_t  pgt_pa;
-   struct mtk_iommu_data   *data;
+	struct mtk_iommu_v1_data	*data;
 };
 
-static int mtk_iommu_bind(struct device *dev)
+static int mtk_iommu_v1_bind(struct device *dev)
 {
-   struct mtk_iommu_data *data = dev_get_drvdata(dev);
+   struct mtk_iommu_v1_data *data = dev_get_drvdata(dev);
 
return component_bind_all(dev, &data->larb_imu);
 }
 
-static void mtk_iommu_unbind(struct device *dev)
+static void mtk_iommu_v1_unbind(struct device *dev)
 {
-   struct mtk_iommu_data *data = dev_get_drvdata(dev);
+   struct mtk_iommu_v1_data *data = dev_get_drvdata(dev);
 
component_unbind_all(dev, &data->larb_imu);
 }
 
-static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
+static struct mtk_iommu_v1_domain *to_mtk_domain(struct iommu_domain *dom)
 {
-   return container_of(dom, struct mtk_iommu_domain, domain);
+   return container_of(dom, struct mtk_iommu_v1_domain, domain);
 }
 
 static const int mt2701_m4u_in_larb[] = {
@@ -157,7 +157,7 @@ static inline int mt2701_m4u_to_port(int id)
return id - mt2701_m4u_in_larb[larb];
 }
 
-static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
+static void mtk_iommu_v1_tlb_flush_all(struct mtk_iommu_v1_data *data)
 {
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
data->base + REG_MMU_INV_SEL);
@@ -165,8 +165,8 @@ static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data 
*data)
wmb(); /* Make sure the tlb flush all done */
 }
 
-static void mtk_iommu_tlb_flush_range(struct mtk_iommu_data *data,
-   unsigned long iova, size_t size)
+static void mtk_iommu_v1_tlb_flush_range(struct mtk_iommu_v1_data *data,
+unsigned long iova, size_t size)
 {
int ret;
u32 tmp;
@@ -184,16 +184,16 @@ static void mtk_iommu_tlb_flush_range(struct 
mtk_iommu_data *data,
if (ret) {
dev_warn(data->dev,
 "Partial TLB flush timed out, falling back to full 
flush\n");
-   mtk_iommu_tlb_flush_all(data);
+   mtk_iommu_v1_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
 }
 
-static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
+static irqreturn_t mtk_iommu_v1_isr(int irq, void *dev_id)
 {
-   struct mtk_iommu_data *data = dev_id;
-   struct mtk_iommu_domain *dom = data->m4u_dom;
+   struct mtk_iommu_v1_data *data = dev_id;
+   struct mtk_iommu_v1_domain *dom = data->m4u_dom;
u32 int_state, regval, fault_iova, fault_pa;
unsigned int fault_larb, fault_port;
 
@@ -223,13 +223,13 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
regval |= F_INT_CLR_BIT;
writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL);
 
-   mtk_iommu_tlb_flush_all(data);
+   mtk_iommu_v1_tlb_flush_all(data);
 
return IRQ_HANDLED;
 }
 
-static void mtk_iommu_config(struct mtk_iommu_data *data,
-struct device *dev, bool enable)
+static void mtk_iommu_v1_config(struct mtk_iommu_v1_data *data,
+   struct device *dev, boo

[PATCH v6 26/34] iommu/mediatek: Remove mtk_iommu.h

2022-04-07 Thread Yong Wu via iommu
Currently only a suspend-register structure is left in the header
file. There is no need to keep a header file just for this. Move the
structure into each .c file and remove the header.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c| 14 +-
 drivers/iommu/mtk_iommu.h| 32 
 drivers/iommu/mtk_iommu_v1.c | 11 ---
 3 files changed, 21 insertions(+), 36 deletions(-)
 delete mode 100644 drivers/iommu/mtk_iommu.h

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index ab3b1aedfdc3..d46eb745492f 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -30,7 +31,7 @@
 #include 
 #include 
 
-#include "mtk_iommu.h"
+#include 
 
 #define REG_MMU_PT_BASE_ADDR   0x000
 #define MMU_PT_ADDR_MASK   GENMASK(31, 7)
@@ -166,6 +167,17 @@ struct mtk_iommu_iova_region {
unsigned long long  size;
 };
 
+struct mtk_iommu_suspend_reg {
+   u32 misc_ctrl;
+   u32 dcm_dis;
+   u32 ctrl_reg;
+   u32 int_control0;
+   u32 int_main_control;
+   u32 ivrp_paddr;
+   u32 vld_pa_rng;
+   u32 wr_len_ctrl;
+};
+
 struct mtk_iommu_plat_data {
enum mtk_iommu_plat m4u_plat;
u32 flags;
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
deleted file mode 100644
index 305243e18aa9..
--- a/drivers/iommu/mtk_iommu.h
+++ /dev/null
@@ -1,32 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (c) 2015-2016 MediaTek Inc.
- * Author: Honghui Zhang 
- */
-
-#ifndef _MTK_IOMMU_H_
-#define _MTK_IOMMU_H_
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct mtk_iommu_suspend_reg {
-   union {
-   u32 standard_axi_mode;/* v1 */
-   u32 misc_ctrl;/* v2 */
-   };
-   u32 dcm_dis;
-   u32 ctrl_reg;
-   u32 int_control0;
-   u32 int_main_control;
-   u32 ivrp_paddr;
-   u32 vld_pa_rng;
-   u32 wr_len_ctrl;
-};
-
-#endif
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 6d1c09c91e1f..3d1f0897d1cc 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -7,7 +7,6 @@
  *
  * Based on driver/iommu/mtk_iommu.c
  */
-#include 
 #include 
 #include 
 #include 
@@ -28,10 +27,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
-#include "mtk_iommu.h"
 
 #define REG_MMU_PT_BASE_ADDR   0x000
 
@@ -87,6 +85,13 @@
  */
 #define M2701_IOMMU_PGT_SIZE   SZ_4M
 
+struct mtk_iommu_suspend_reg {
+   u32 standard_axi_mode;
+   u32 dcm_dis;
+   u32 ctrl_reg;
+   u32 int_control0;
+};
+
 struct mtk_iommu_data {
	void __iomem		*base;
int irq;
-- 
2.18.0



[PATCH v6 25/34] iommu/mediatek: Separate mtk_iommu_data for v1 and v2

2022-04-07 Thread Yong Wu via iommu
Prepare for adding the structure "mtk_iommu_bank_data"; no functional
change. The mtk_iommu_domain structures in v1 and v2 are different, so
we cannot simply treat the current data as bank[0] in v1.

We currently have no plan to add new SoCs to v1. In order to avoid
affecting v1 while adding many new features to v2, completely separate
v1 and v2 in this patch; many of the structures are only used by v2.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c| 82 +---
 drivers/iommu/mtk_iommu.h| 81 ---
 drivers/iommu/mtk_iommu_v1.c | 29 +
 3 files changed, 106 insertions(+), 86 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6b238ad55cbe..ab3b1aedfdc3 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -146,6 +146,69 @@
 
 #define MTK_INVALID_LARBID MTK_LARB_NR_MAX
 
+#define MTK_LARB_COM_MAX   8
+#define MTK_LARB_SUBCOM_MAX8
+
+#define MTK_IOMMU_GROUP_MAX8
+
+enum mtk_iommu_plat {
+   M4U_MT2712,
+   M4U_MT6779,
+   M4U_MT8167,
+   M4U_MT8173,
+   M4U_MT8183,
+   M4U_MT8192,
+   M4U_MT8195,
+};
+
+struct mtk_iommu_iova_region {
+   dma_addr_t  iova_base;
+   unsigned long long  size;
+};
+
+struct mtk_iommu_plat_data {
+   enum mtk_iommu_plat m4u_plat;
+   u32 flags;
+   u32 inv_sel_reg;
+
+	char				*pericfg_comp_str;
+   struct list_head*hw_list;
+   unsigned intiova_region_nr;
+   const struct mtk_iommu_iova_region  *iova_region;
+   unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
+};
+
+struct mtk_iommu_data {
+	void __iomem			*base;
+	int				irq;
+	struct device			*dev;
+	struct clk			*bclk;
+	phys_addr_t			protect_base; /* protect memory base */
+	struct mtk_iommu_suspend_reg	reg;
+	struct mtk_iommu_domain		*m4u_dom;
+	struct iommu_group		*m4u_group[MTK_IOMMU_GROUP_MAX];
+	bool				enable_4GB;
+	spinlock_t			tlb_lock; /* lock for tlb range flush */
+
+	struct iommu_device		iommu;
+	const struct mtk_iommu_plat_data *plat_data;
+	struct device			*smicomm_dev;
+
+	struct dma_iommu_mapping	*mapping; /* For mtk_iommu_v1.c */
+	struct regmap			*pericfg;
+
+	struct mutex			mutex; /* Protect m4u_group/m4u_dom above */
+
+	/*
+	 * In the sharing pgtable case, list data->list to the global list
+	 * like m4ulist. In the non-sharing pgtable case, list data->list to
+	 * its own hw_list_head.
+	 */
+	struct list_head		*hw_list;
+	struct list_head		hw_list_head;
+	struct list_head		list;
+	struct mtk_smi_larb_iommu	larb_imu[MTK_LARB_NR_MAX];
+};
+
 struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
struct io_pgtable_ops   *iop;
@@ -156,6 +219,20 @@ struct mtk_iommu_domain {
struct mutexmutex; /* Protect "data" in this 
structure */
 };
 
+static int mtk_iommu_bind(struct device *dev)
+{
+   struct mtk_iommu_data *data = dev_get_drvdata(dev);
+
+   return component_bind_all(dev, &data->larb_imu);
+}
+
+static void mtk_iommu_unbind(struct device *dev)
+{
+   struct mtk_iommu_data *data = dev_get_drvdata(dev);
+
+   component_unbind_all(dev, &data->larb_imu);
+}
+
 static const struct iommu_ops mtk_iommu_ops;
 
 static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
@@ -193,11 +270,6 @@ static LIST_HEAD(m4ulist); /* List all the M4U HWs */
 
 #define for_each_m4u(data, head)  list_for_each_entry(data, head, list)
 
-struct mtk_iommu_iova_region {
-   dma_addr_t  iova_base;
-   unsigned long long  size;
-};
-
 static const struct mtk_iommu_iova_region single_domain[] = {
{.iova_base = 0,.size = SZ_4G},
 };
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index f2ee11cd254a..305243e18aa9 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -7,23 +7,14 @@
 #ifndef _MTK_IOMMU_H_
 #define _MTK_IOMMU_H_
 
-#include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
 
-#define MTK_LARB_COM_MAX   8
-#define MTK_LARB_SUBCOM_MAX8
-
-#define MTK_IOMMU_GROUP_MAX8
-
 struct mtk_iommu_suspend_reg {
union {
u32 standard_axi_mode;/* v1 */
@@ -38,76 +29,4 @@ struct mtk_iommu_suspend_reg {
u32 wr_len_ctrl;
 };
 
-enum mt

[PATCH v6 24/34] iommu/mediatek: Just move code position in hw_init

2022-04-07 Thread Yong Wu via iommu
No functional change either; prepare for the mt8195 IOMMU bank
support. Some global control settings live in bank0 while the other
banks have their own independent settings. Here we only move the code
so that the global control settings and the bank-independent registers
are grouped together.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index cb99c1d01f28..6b238ad55cbe 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -803,30 +803,6 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
}
writel_relaxed(regval, data->base + REG_MMU_CTRL_REG);
 
-   regval = F_L2_MULIT_HIT_EN |
-   F_TABLE_WALK_FAULT_INT_EN |
-   F_PREETCH_FIFO_OVERFLOW_INT_EN |
-   F_MISS_FIFO_OVERFLOW_INT_EN |
-   F_PREFETCH_FIFO_ERR_INT_EN |
-   F_MISS_FIFO_ERR_INT_EN;
-   writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL0);
-
-   regval = F_INT_TRANSLATION_FAULT |
-   F_INT_MAIN_MULTI_HIT_FAULT |
-   F_INT_INVALID_PA_FAULT |
-   F_INT_ENTRY_REPLACEMENT_FAULT |
-   F_INT_TLB_MISS_FAULT |
-   F_INT_MISS_TRANSACTION_FIFO_FAULT |
-   F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
-   writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL);
-
-   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR))
-   regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
-   else
-   regval = lower_32_bits(data->protect_base) |
-upper_32_bits(data->protect_base);
-   writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
-
if (data->enable_4GB &&
MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_VLD_PA_RNG)) {
/*
@@ -860,6 +836,30 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
}
writel_relaxed(regval, data->base + REG_MMU_MISC_CTRL);
 
+   regval = F_L2_MULIT_HIT_EN |
+   F_TABLE_WALK_FAULT_INT_EN |
+   F_PREETCH_FIFO_OVERFLOW_INT_EN |
+   F_MISS_FIFO_OVERFLOW_INT_EN |
+   F_PREFETCH_FIFO_ERR_INT_EN |
+   F_MISS_FIFO_ERR_INT_EN;
+   writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL0);
+
+   regval = F_INT_TRANSLATION_FAULT |
+   F_INT_MAIN_MULTI_HIT_FAULT |
+   F_INT_INVALID_PA_FAULT |
+   F_INT_ENTRY_REPLACEMENT_FAULT |
+   F_INT_TLB_MISS_FAULT |
+   F_INT_MISS_TRANSACTION_FIFO_FAULT |
+   F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
+   writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL);
+
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR))
+   regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
+   else
+   regval = lower_32_bits(data->protect_base) |
+upper_32_bits(data->protect_base);
+   writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
+
if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
 dev_name(data->dev), (void *)data)) {
writel_relaxed(0, data->base + REG_MMU_PT_BASE_ADDR);
-- 
2.18.0



[PATCH v6 23/34] iommu/mediatek: Only adjust code about register base

2022-04-07 Thread Yong Wu via iommu
No functional change. Use "base" instead of the data->base. This is
avoid to touch too many lines in the next patches.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 51 +--
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 7a8d9dda7361..cb99c1d01f28 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -227,12 +227,12 @@ static struct mtk_iommu_domain *to_mtk_domain(struct 
iommu_domain *dom)
 
 static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 {
+   void __iomem *base = data->base;
unsigned long flags;
 
spin_lock_irqsave(&data->tlb_lock, flags);
-   writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
-  data->base + data->plat_data->inv_sel_reg);
-   writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + data->plat_data->inv_sel_reg);
+   writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
wmb(); /* Make sure the tlb flush all done */
spin_unlock_irqrestore(&data->tlb_lock, flags);
 }
@@ -243,6 +243,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
struct list_head *head = data->hw_list;
bool check_pm_status;
unsigned long flags;
+   void __iomem *base;
int ret;
u32 tmp;
 
@@ -269,23 +270,23 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
continue;
}
 
+   base = data->base;
+
spin_lock_irqsave(&data->tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
-  data->base + data->plat_data->inv_sel_reg);
+  base + data->plat_data->inv_sel_reg);
 
-   writel_relaxed(MTK_IOMMU_TLB_ADDR(iova),
-  data->base + REG_MMU_INVLD_START_A);
+		writel_relaxed(MTK_IOMMU_TLB_ADDR(iova), base + REG_MMU_INVLD_START_A);
writel_relaxed(MTK_IOMMU_TLB_ADDR(iova + size - 1),
-  data->base + REG_MMU_INVLD_END_A);
-   writel_relaxed(F_MMU_INV_RANGE,
-  data->base + REG_MMU_INVALIDATE);
+  base + REG_MMU_INVLD_END_A);
+   writel_relaxed(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE);
 
/* tlb sync */
-   ret = readl_poll_timeout_atomic(data->base + REG_MMU_CPE_DONE,
+   ret = readl_poll_timeout_atomic(base + REG_MMU_CPE_DONE,
tmp, tmp != 0, 10, 1000);
 
/* Clear the CPE status */
-   writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
+   writel_relaxed(0, base + REG_MMU_CPE_DONE);
spin_unlock_irqrestore(&data->tlb_lock, flags);
 
if (ret) {
@@ -305,23 +306,25 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
struct mtk_iommu_domain *dom = data->m4u_dom;
	unsigned int fault_larb = MTK_INVALID_LARBID, fault_port = 0, sub_comm = 0;
u32 int_state, regval, va34_32, pa34_32;
+   const struct mtk_iommu_plat_data *plat_data = data->plat_data;
+   void __iomem *base = data->base;
u64 fault_iova, fault_pa;
bool layer, write;
 
/* Read error info from registers */
-   int_state = readl_relaxed(data->base + REG_MMU_FAULT_ST1);
+   int_state = readl_relaxed(base + REG_MMU_FAULT_ST1);
if (int_state & F_REG_MMU0_FAULT_MASK) {
-   regval = readl_relaxed(data->base + REG_MMU0_INT_ID);
-   fault_iova = readl_relaxed(data->base + REG_MMU0_FAULT_VA);
-   fault_pa = readl_relaxed(data->base + REG_MMU0_INVLD_PA);
+   regval = readl_relaxed(base + REG_MMU0_INT_ID);
+   fault_iova = readl_relaxed(base + REG_MMU0_FAULT_VA);
+   fault_pa = readl_relaxed(base + REG_MMU0_INVLD_PA);
} else {
-   regval = readl_relaxed(data->base + REG_MMU1_INT_ID);
-   fault_iova = readl_relaxed(data->base + REG_MMU1_FAULT_VA);
-   fault_pa = readl_relaxed(data->base + REG_MMU1_INVLD_PA);
+   regval = readl_relaxed(base + REG_MMU1_INT_ID);
+   fault_iova = readl_relaxed(base + REG_MMU1_FAULT_VA);
+   fault_pa = readl_relaxed(base + REG_MMU1_INVLD_PA);
}
layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
-   if (MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN)) {
+   if (MTK_IOMMU_HAS_FLAG(plat_data, IOVA_34_EN)) {
va34_32 = FIELD_GET(F_MMU_INVAL_VA_34_32_MASK, fault_iova);
fault_iova = fault_iova & F_MMU_INVAL_VA_31_12_MASK;
  

[PATCH v6 22/34] iommu/mediatek: Add mt8195 support

2022-04-07 Thread Yong Wu via iommu
mt8195 has 3 IOMMUs: 2 MM IOMMUs, one for vdo and the other for vpp,
plus 1 INFRA IOMMU.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 43 +++
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 763e912d0a67..7a8d9dda7361 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1234,6 +1234,46 @@ static const struct mtk_iommu_plat_data mt8192_data = {
   {0, 14, 16}, {0, 13, 18, 17}},
 };
 
+static const struct mtk_iommu_plat_data mt8195_data_infra = {
+   .m4u_plat = M4U_MT8195,
+   .flags= WR_THROT_EN | DCM_DISABLE | PM_CLK_AO |
+   MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
+   .pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
+   .iova_region  = single_domain,
+   .iova_region_nr   = ARRAY_SIZE(single_domain),
+};
+
+static const struct mtk_iommu_plat_data mt8195_data_vdo = {
+   .m4u_plat   = M4U_MT8195,
+   .flags  = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
+ WR_THROT_EN | NOT_STD_AXI_MODE | IOVA_34_EN |
+ SHARE_PGTABLE | MTK_IOMMU_TYPE_MM,
+   .hw_list= &m4ulist,
+   .inv_sel_reg= REG_MMU_INV_SEL_GEN2,
+   .iova_region= mt8192_multi_dom,
+   .iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+   .larbid_remap   = {{2, 0}, {21}, {24}, {7}, {19}, {9, 10, 11},
+  {13, 17, 15/* 17b */, 25}, {5}},
+};
+
+static const struct mtk_iommu_plat_data mt8195_data_vpp = {
+   .m4u_plat   = M4U_MT8195,
+   .flags  = HAS_BCLK | HAS_SUB_COMM_3BITS | OUT_ORDER_WR_EN |
+ WR_THROT_EN | NOT_STD_AXI_MODE | IOVA_34_EN |
+ SHARE_PGTABLE | MTK_IOMMU_TYPE_MM,
+   .hw_list= &m4ulist,
+   .inv_sel_reg= REG_MMU_INV_SEL_GEN2,
+   .iova_region= mt8192_multi_dom,
+   .iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+   .larbid_remap   = {{1}, {3},
+			   {22, MTK_INVALID_LARBID, MTK_INVALID_LARBID, MTK_INVALID_LARBID, 23},
+  {8}, {20}, {12},
+  /* 16: 16a; 29: 16b; 30: CCUtop0; 31: CCUtop1 */
+  {14, 16, 29, 26, 30, 31, 18},
+			   {4, MTK_INVALID_LARBID, MTK_INVALID_LARBID, MTK_INVALID_LARBID, 6}},
+};
+
 static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
{ .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data},
@@ -1241,6 +1281,9 @@ static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
{ .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
{ .compatible = "mediatek,mt8192-m4u", .data = &mt8192_data},
+   { .compatible = "mediatek,mt8195-iommu-infra", .data = 
&mt8195_data_infra},
+   { .compatible = "mediatek,mt8195-iommu-vdo",   .data = 
&mt8195_data_vdo},
+   { .compatible = "mediatek,mt8195-iommu-vpp",   .data = 
&mt8195_data_vpp},
{}
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 56838fad8c73..f2ee11cd254a 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -46,6 +46,7 @@ enum mtk_iommu_plat {
M4U_MT8173,
M4U_MT8183,
M4U_MT8192,
+   M4U_MT8195,
 };
 
 struct mtk_iommu_iova_region;
-- 
2.18.0



[PATCH v6 21/34] iommu/mediatek: Add PCIe support

2022-04-07 Thread Yong Wu via iommu
Currently the code for of_iommu_configure_dev_id looks like this:

static int of_iommu_configure_dev_id(struct device_node *master_np,
				     struct device *dev,
				     const u32 *id)
{
	struct of_phandle_args iommu_spec = { .args_count = 1 };

	err = of_map_id(master_np, *id, "iommu-map",
			"iommu-map-mask", &iommu_spec.np,
			iommu_spec.args);
	...
}

It supports only one id output, BUT our PCIe HW has two IDs (one for
writing, the other for reading). I'm not sure whether we should change
of_map_id to output up to MAX_PHANDLE_ARGS ids.

Here, solve it inside our own driver instead: in the PCIe case, enable
one more bit.

Not all infra IOMMUs support PCIe, thus add a PCIe support flag here.
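
As a worked example (the port id 2 below is arbitrary, not taken from
any real binding): for a PCIe master whose single output id maps to
port 2, the hunk below ends up setting both the write and the read
enable bits in PERICFG_IOMMU_1, i.e.:

	peri_mmuen_msk = BIT(2);		/* write id */
	peri_mmuen_msk |= BIT(2 + 1);		/* read id = write id + 1 */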

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d975c892b332..763e912d0a67 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -134,6 +135,7 @@
 #define MTK_IOMMU_TYPE_MASK(0x3 << 13)
 /* PM and clock always on. e.g. infra iommu */
 #define PM_CLK_AO  BIT(15)
+#define IFA_IOMMU_PCIE_SUPPORT BIT(16)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x)  (!!(((pdata)->flags) & (_x)))
 
@@ -420,8 +422,11 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, 
struct device *dev,
larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
} else if (MTK_IOMMU_IS_TYPE(data->plat_data, 
MTK_IOMMU_TYPE_INFRA)) {
peri_mmuen_msk = BIT(portid);
-   peri_mmuen = enable ? peri_mmuen_msk : 0;
+			/* PCI dev has only one output id, enable the next writing bit for PCIe */
+   if (dev_is_pci(dev))
+   peri_mmuen_msk |= BIT(portid + 1);
 
+   peri_mmuen = enable ? peri_mmuen_msk : 0;
ret = regmap_update_bits(data->pericfg, PERICFG_IOMMU_1,
 peri_mmuen_msk, peri_mmuen);
if (ret)
@@ -1052,6 +1057,15 @@ static int mtk_iommu_probe(struct platform_device *pdev)
		ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
if (ret)
goto out_bus_set_null;
+   } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
+		   MTK_IOMMU_HAS_FLAG(data->plat_data, IFA_IOMMU_PCIE_SUPPORT)) {
+#ifdef CONFIG_PCI
+   if (!iommu_present(&pci_bus_type)) {
+   ret = bus_set_iommu(&pci_bus_type, &mtk_iommu_ops);
+   if (ret) /* PCIe fail don't affect platform_bus. */
+   goto out_list_del;
+   }
+#endif
}
return ret;
 
@@ -1082,6 +1096,11 @@ static int mtk_iommu_remove(struct platform_device *pdev)
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
device_link_remove(data->smicomm_dev, &pdev->dev);
component_master_del(&pdev->dev, &mtk_iommu_com_ops);
+   } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
+		   MTK_IOMMU_HAS_FLAG(data->plat_data, IFA_IOMMU_PCIE_SUPPORT)) {
+#ifdef CONFIG_PCI
+   bus_set_iommu(&pci_bus_type, NULL);
+#endif
}
pm_runtime_disable(&pdev->dev);
devm_free_irq(&pdev->dev, data->irq, data);
-- 
2.18.0



[PATCH v6 20/34] iommu/mediatek: Add infra iommu support

2022-04-07 Thread Yong Wu via iommu
The infra iommu enable bits in mt8195 are in the pericfg register
segment; use regmap to update them.

If an infra iommu master triggers a translation fault, there is no
larbid/portid for it, thus print out the whole register value.

Since regmap_update_bits may fail, add a return value to
mtk_iommu_config.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 36 +---
 drivers/iommu/mtk_iommu.h |  2 ++
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index afb77a530f32..d975c892b332 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -112,6 +112,8 @@
 
 #define MTK_PROTECT_PA_ALIGN   256
 
+#define PERICFG_IOMMU_10x714
+
 #define HAS_4GB_MODE   BIT(0)
 /* HW will use the EMI clock if there isn't the "bclk". */
 #define HAS_BCLK   BIT(1)
@@ -343,8 +345,8 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
   write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
dev_err_ratelimited(
data->dev,
-   "fault type=0x%x iova=0x%llx pa=0x%llx larb=%d port=%d 
layer=%d %s\n",
-   int_state, fault_iova, fault_pa, fault_larb, fault_port,
+   "fault type=0x%x iova=0x%llx pa=0x%llx 
master=0x%x(larb=%d port=%d) layer=%d %s\n",
+   int_state, fault_iova, fault_pa, regval, fault_larb, 
fault_port,
layer, write ? "write" : "read");
}
 
@@ -388,14 +390,15 @@ static int mtk_iommu_get_domain_id(struct device *dev,
return -EINVAL;
 }
 
-static void mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
-bool enable, unsigned int domid)
+static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
+   bool enable, unsigned int domid)
 {
struct mtk_smi_larb_iommu*larb_mmu;
unsigned int larbid, portid;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
const struct mtk_iommu_iova_region *region;
-   int i;
+   u32 peri_mmuen, peri_mmuen_msk;
+   int i, ret = 0;
 
for (i = 0; i < fwspec->num_ids; ++i) {
larbid = MTK_M4U_TO_LARB(fwspec->ids[i]);
@@ -415,8 +418,19 @@ static void mtk_iommu_config(struct mtk_iommu_data *data, 
struct device *dev,
larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
else
larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
+		} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA)) {
+   peri_mmuen_msk = BIT(portid);
+   peri_mmuen = enable ? peri_mmuen_msk : 0;
+
+   ret = regmap_update_bits(data->pericfg, PERICFG_IOMMU_1,
+peri_mmuen_msk, peri_mmuen);
+   if (ret)
+   dev_err(dev, "%s iommu(%s) inframaster 0x%x 
fail(%d).\n",
+   enable ? "enable" : "disable",
+   dev_name(data->dev), peri_mmuen_msk, 
ret);
}
}
+   return ret;
 }
 
 static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
@@ -531,8 +545,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
}
mutex_unlock(&data->mutex);
 
-   mtk_iommu_config(data, dev, true, domid);
-   return 0;
+   return mtk_iommu_config(data, dev, true, domid);
 
 err_unlock:
mutex_unlock(&data->mutex);
@@ -995,6 +1008,15 @@ static int mtk_iommu_probe(struct platform_device *pdev)
ret = mtk_iommu_mm_dts_parse(dev, &match, data);
if (ret)
goto out_runtime_disable;
+   } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
+  data->plat_data->pericfg_comp_str) {
+		infracfg = syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str);
+   if (IS_ERR(infracfg)) {
+   ret = PTR_ERR(infracfg);
+   goto out_runtime_disable;
+   }
+
+   data->pericfg = infracfg;
}
 
platform_set_drvdata(pdev, data);
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index f41e32252056..56838fad8c73 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -55,6 +55,7 @@ struct mtk_iommu_plat_data {
u32 flags;
u32 inv_sel_reg;
 
+	char				*pericfg_comp_str;
struct list_head*hw_list;
unsigned intiova_region_nr;

[PATCH v6 18/34] iommu/mediatek: Allow IOMMU_DOMAIN_UNMANAGED for PCIe VFIO

2022-04-07 Thread Yong Wu via iommu
Allow the type IOMMU_DOMAIN_UNMANAGED, since vfio_iommu_type1.c always
calls iommu_domain_alloc. The PCIe EP works fine when going through
vfio.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2746e178d2be..b737613e9046 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -446,7 +446,7 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned 
type)
 {
struct mtk_iommu_domain *dom;
 
-   if (type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
return NULL;
 
dom = kzalloc(sizeof(*dom), GFP_KERNEL);
-- 
2.18.0



[PATCH v6 16/34] iommu/mediatek: Contain MM IOMMU flow with the MM TYPE

2022-04-07 Thread Yong Wu via iommu
Prepare for supporting INFRA_IOMMU and APU_IOMMU later.

The Infra IOMMU and APU IOMMU don't have the "larb"/"port" concept,
thus use the MM flag to contain the MM_IOMMU-specific flow. Also, move
the big chunk of code that parses "mediatek,larbs" into a function;
this is only needed for the MM IOMMU, and all the current SoCs are
MM_IOMMU.

The device link between the iommu consumer device and the smi-larb
device is only needed in the MM iommu case.

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 212 ++
 1 file changed, 121 insertions(+), 91 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 642949aad47e..b048986913b9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -138,6 +138,8 @@
 #define MTK_IOMMU_IS_TYPE(pdata, _x)   MTK_IOMMU_HAS_FLAG_MASK(pdata, _x,\
MTK_IOMMU_TYPE_MASK)
 
+#define MTK_INVALID_LARBID MTK_LARB_NR_MAX
+
 struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
struct io_pgtable_ops   *iop;
@@ -274,7 +276,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 {
struct mtk_iommu_data *data = dev_id;
struct mtk_iommu_domain *dom = data->m4u_dom;
-   unsigned int fault_larb, fault_port, sub_comm = 0;
+	unsigned int fault_larb = MTK_INVALID_LARBID, fault_port = 0, sub_comm = 0;
u32 int_state, regval, va34_32, pa34_32;
u64 fault_iova, fault_pa;
bool layer, write;
@@ -300,17 +302,19 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
fault_pa |= (u64)pa34_32 << 32;
 
-   fault_port = F_MMU_INT_ID_PORT_ID(regval);
-   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) {
-   fault_larb = F_MMU_INT_ID_COMM_ID(regval);
-   sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
-   } else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) {
-   fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval);
-   sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval);
-   } else {
-   fault_larb = F_MMU_INT_ID_LARB_ID(regval);
+   if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
+   fault_port = F_MMU_INT_ID_PORT_ID(regval);
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) {
+   fault_larb = F_MMU_INT_ID_COMM_ID(regval);
+   sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
+		} else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) {
+   fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval);
+   sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval);
+   } else {
+   fault_larb = F_MMU_INT_ID_LARB_ID(regval);
+   }
+		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
}
-   fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 
if (report_iommu_fault(&dom->domain, data->dev, fault_iova,
   write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
@@ -374,19 +378,21 @@ static void mtk_iommu_config(struct mtk_iommu_data *data, 
struct device *dev,
larbid = MTK_M4U_TO_LARB(fwspec->ids[i]);
portid = MTK_M4U_TO_PORT(fwspec->ids[i]);
 
-   larb_mmu = &data->larb_imu[larbid];
+   if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
+   larb_mmu = &data->larb_imu[larbid];
 
-   region = data->plat_data->iova_region + domid;
-   larb_mmu->bank[portid] = upper_32_bits(region->iova_base);
+   region = data->plat_data->iova_region + domid;
+			larb_mmu->bank[portid] = upper_32_bits(region->iova_base);
 
-   dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n",
-   enable ? "enable" : "disable", dev_name(larb_mmu->dev),
-   portid, domid, larb_mmu->bank[portid]);
+   dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank 
%d.\n",
+   enable ? "enable" : "disable", 
dev_name(larb_mmu->dev),
+   portid, domid, larb_mmu->bank[portid]);
 
-   if (enable)
-   larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
-   else
-   larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
+   if (enable)
+   larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
+   else
+   larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
+   }
}
 }
 
@@ -593,6 +599,9 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
 
 

[PATCH v6 19/34] iommu/mediatek: Add a PM_CLK_AO flag for infra iommu

2022-04-07 Thread Yong Wu via iommu
The power/clock of the infra iommu is always on, and it doesn't have a
device link with its master devices, so the infra iommu device's PM
status is never active; thus add a PM_CLK_AO flag for the infra iommu.

The tlb operation is a bit subtle here; there are 2 special cases.
Comment on them in the code.
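
Condensed sketch of the resulting flush policy (a paraphrase of the
hunk below, not new driver logic):

	check_pm_status = !MTK_IOMMU_HAS_FLAG(data->plat_data, PM_CLK_AO);
	if (check_pm_status && pm_runtime_get_if_in_use(data->dev) <= 0)
		continue;	/* inactive: rely on tlb-flush-all at resume */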

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b737613e9046..afb77a530f32 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -130,6 +130,8 @@
 #define MTK_IOMMU_TYPE_MM  (0x0 << 13)
 #define MTK_IOMMU_TYPE_INFRA   (0x1 << 13)
 #define MTK_IOMMU_TYPE_MASK(0x3 << 13)
+/* PM and clock always on. e.g. infra iommu */
+#define PM_CLK_AO  BIT(15)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x)  (!!(((pdata)->flags) & (_x)))
 
@@ -235,13 +237,33 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
   struct mtk_iommu_data *data)
 {
struct list_head *head = data->hw_list;
+   bool check_pm_status;
unsigned long flags;
int ret;
u32 tmp;
 
for_each_m4u(data, head) {
-   if (pm_runtime_get_if_in_use(data->dev) <= 0)
-   continue;
+		/*
+		 * To avoid resuming the iommu device frequently when the
+		 * iommu device is not active, it doesn't always call
+		 * pm_runtime_get here; the tlb flush then depends on the
+		 * tlb-flush-all done in the runtime resume.
+		 *
+		 * There are 2 special cases:
+		 *
+		 * Case1: The iommu dev doesn't have a power domain but has
+		 * bclk. This case should also avoid the tlb flush while the
+		 * dev is not active, to mute the tlb timeout log, like mt8173.
+		 *
+		 * Case2: The power/clock of the infra iommu is always on, and
+		 * it doesn't have a device link with the master devices. This
+		 * case should avoid the PM status check.
+		 */
+		check_pm_status = !MTK_IOMMU_HAS_FLAG(data->plat_data, PM_CLK_AO);
+
+   if (check_pm_status) {
+   if (pm_runtime_get_if_in_use(data->dev) <= 0)
+   continue;
+   }
 
spin_lock_irqsave(&data->tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
@@ -268,7 +290,8 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
mtk_iommu_tlb_flush_all(data);
}
 
-   pm_runtime_put(data->dev);
+   if (check_pm_status)
+   pm_runtime_put(data->dev);
}
 }
 
-- 
2.18.0



[PATCH v6 17/34] iommu/mediatek: Adjust device link when it is sub-common

2022-04-07 Thread Yong Wu via iommu
For the MM IOMMU, we always add a device link between smi-common and
the IOMMU HW. mt8195 introduces smi-sub-common. Thus, if the node is a
sub-common, we still need to look up one more level to get smi-common,
then add the device link.
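
A hypothetical chain this walks (node names are made up, the property
layout follows the binding described above):

	/*
	 *   larb5:          mediatek,smi = <&smi_sub_common>;
	 *   smi_sub_common: mediatek,smi = <&smi_common>;
	 *   smi_common:     (no mediatek,smi property)
	 *
	 * The first of_parse_phandle() returns smi_sub_common; the second
	 * one also succeeds, so smi_common becomes the device-link target.
	 */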

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b048986913b9..2746e178d2be 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -834,7 +834,7 @@ static const struct component_master_ops mtk_iommu_com_ops 
= {
 static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match 
**match,
  struct mtk_iommu_data *data)
 {
-   struct device_node *larbnode, *smicomm_node;
+   struct device_node *larbnode, *smicomm_node, *smi_subcomm_node;
struct platform_device *plarbdev;
struct device_link *link;
int i, larb_nr, ret;
@@ -874,11 +874,21 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, 
struct component_match **m
component_compare_of, larbnode);
}
 
-   /* Get smi-common dev from the last larb. */
-   smicomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
-   if (!smicomm_node)
+   /* Get smi-(sub)-common dev from the last larb. */
+   smi_subcomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
+   if (!smi_subcomm_node)
return -EINVAL;
 
+	/*
+	 * It may have two levels of smi-common: the node is smi-sub-common
+	 * if it has another mediatek,smi property; otherwise it is smi-common.
+	 */
+   smicomm_node = of_parse_phandle(smi_subcomm_node, "mediatek,smi", 0);
+   if (smicomm_node)
+   of_node_put(smi_subcomm_node);
+   else
+   smicomm_node = smi_subcomm_node;
+
plarbdev = of_find_device_by_node(smicomm_node);
of_node_put(smicomm_node);
data->smicomm_dev = &plarbdev->dev;
-- 
2.18.0



[PATCH v6 15/34] iommu/mediatek: Add IOMMU_TYPE flag

2022-04-07 Thread Yong Wu via iommu
Add an IOMMU_TYPE definition. In mt8195 we have another IOMMU_TYPE,
the infra iommu, and there will also be an APU_IOMMU later; thus, use
2 bits for the IOMMU_TYPE.
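
A small worked example of why a masked compare is needed (values from
this patch; note MTK_IOMMU_TYPE_MM is 0, so a plain "has flag" test
could never match it). Assume pdata->flags = MTK_IOMMU_TYPE_INFRA |
WR_THROT_EN, then:

	MTK_IOMMU_IS_TYPE(pdata, MTK_IOMMU_TYPE_INFRA)
		/* ((flags & (0x3 << 13)) == (0x1 << 13)) -> true  */
	MTK_IOMMU_IS_TYPE(pdata, MTK_IOMMU_TYPE_MM)
		/* ((flags & (0x3 << 13)) == (0x0 << 13)) -> false */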

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 84d661e0b371..642949aad47e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -126,9 +126,17 @@
 #define SHARE_PGTABLE  BIT(10) /* 2 HW share pgtable */
 #define DCM_DISABLEBIT(11)
 #define NOT_STD_AXI_MODE   BIT(12)
+/* 2 bits: iommu type */
+#define MTK_IOMMU_TYPE_MM  (0x0 << 13)
+#define MTK_IOMMU_TYPE_INFRA   (0x1 << 13)
+#define MTK_IOMMU_TYPE_MASK(0x3 << 13)
 
-#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
-	((((pdata)->flags) & (_x)) == (_x))
+#define MTK_IOMMU_HAS_FLAG(pdata, _x)  (!!(((pdata)->flags) & (_x)))
+
+#define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask)   \
+	((((pdata)->flags) & (mask)) == (_x))
+#define MTK_IOMMU_IS_TYPE(pdata, _x)   MTK_IOMMU_HAS_FLAG_MASK(pdata, _x,\
+   MTK_IOMMU_TYPE_MASK)
 
 struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
-- 
2.18.0



[PATCH v6 14/34] iommu/mediatek: Add SUB_COMMON_3BITS flag

2022-04-07 Thread Yong Wu via iommu
In previous SoCs, the sub-common id occupies 2 bits; mt8195's
sub-common id has 3 bits. Add a new flag for this and rename the
previous flag to _2BITS. For readability, put these two flags together
and move the other flags accordingly. No functional change.
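
A worked decode with the new 3-bit layout (regval is a made-up value,
not from real HW):

	u32 regval = (5 << 10) | (3 << 7) | (9 << 2);

	F_MMU_INT_ID_COMM_ID_EXT(regval)	/* (regval >> 10) & 0x7  = 5 */
	F_MMU_INT_ID_SUB_COMM_ID_EXT(regval)	/* (regval >> 7)  & 0x7  = 3 */
	F_MMU_INT_ID_PORT_ID(regval)		/* (regval >> 2)  & 0x1f = 9 */

so fault_larb becomes larbid_remap[5][3].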

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 26 --
 drivers/iommu/mtk_iommu.h |  2 +-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d59f6857a9df..84d661e0b371 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -105,6 +105,8 @@
 #define REG_MMU1_INT_ID0x154
 #define F_MMU_INT_ID_COMM_ID(a)(((a) >> 9) & 0x7)
 #define F_MMU_INT_ID_SUB_COMM_ID(a)(((a) >> 7) & 0x3)
+#define F_MMU_INT_ID_COMM_ID_EXT(a)(((a) >> 10) & 0x7)
+#define F_MMU_INT_ID_SUB_COMM_ID_EXT(a)(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_LARB_ID(a)(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_PORT_ID(a)(((a) >> 2) & 0x1f)
 
@@ -116,13 +118,14 @@
 #define HAS_VLD_PA_RNG BIT(2)
 #define RESET_AXI  BIT(3)
 #define OUT_ORDER_WR_ENBIT(4)
-#define HAS_SUB_COMM   BIT(5)
-#define WR_THROT_ENBIT(6)
-#define HAS_LEGACY_IVRP_PADDR  BIT(7)
-#define IOVA_34_EN BIT(8)
-#define SHARE_PGTABLE  BIT(9) /* 2 HW share pgtable */
-#define DCM_DISABLEBIT(10)
-#define NOT_STD_AXI_MODE   BIT(11)
+#define HAS_SUB_COMM_2BITS BIT(5)
+#define HAS_SUB_COMM_3BITS BIT(6)
+#define WR_THROT_ENBIT(7)
+#define HAS_LEGACY_IVRP_PADDR  BIT(8)
+#define IOVA_34_EN BIT(9)
+#define SHARE_PGTABLE  BIT(10) /* 2 HW share pgtable */
+#define DCM_DISABLEBIT(11)
+#define NOT_STD_AXI_MODE   BIT(12)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))
@@ -290,9 +293,12 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
fault_pa |= (u64)pa34_32 << 32;
 
fault_port = F_MMU_INT_ID_PORT_ID(regval);
-   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) {
fault_larb = F_MMU_INT_ID_COMM_ID(regval);
sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
+   } else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) {
+   fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval);
+   sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval);
} else {
fault_larb = F_MMU_INT_ID_LARB_ID(regval);
}
@@ -1069,7 +1075,7 @@ static const struct mtk_iommu_plat_data mt2712_data = {
 
 static const struct mtk_iommu_plat_data mt6779_data = {
.m4u_plat  = M4U_MT6779,
-   .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+   .flags = HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN | WR_THROT_EN |
 NOT_STD_AXI_MODE,
.inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
.iova_region   = single_domain,
@@ -1107,7 +1113,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
 
 static const struct mtk_iommu_plat_data mt8192_data = {
.m4u_plat   = M4U_MT8192,
-   .flags  = HAS_BCLK | HAS_SUB_COMM | OUT_ORDER_WR_EN |
+   .flags  = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
  WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE,
.inv_sel_reg= REG_MMU_INV_SEL_GEN2,
.iova_region= mt8192_multi_dom,
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index dc868fce0d2a..f41e32252056 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -20,7 +20,7 @@
 #include 
 
 #define MTK_LARB_COM_MAX   8
-#define MTK_LARB_SUBCOM_MAX4
+#define MTK_LARB_SUBCOM_MAX8
 
 #define MTK_IOMMU_GROUP_MAX8
 
-- 
2.18.0



[PATCH v6 13/34] iommu/mediatek: Always enable output PA over 32bits in isr

2022-04-07 Thread Yong Wu via iommu
Currently, parsing the output PA[32:33] in the ISR is guarded by the
IOVA_34 flag. This is not right: IOVA_34 has no relation to PA[32:33],
since a 32-bit IOVA can still map to a PA with bits [32:33] set. Move
the PA parsing out from under the flag.

No Fixes tag is needed, since currently only mt8192 uses this
calculation and it always has the IOVA_34 flag.

This prepares for IOMMUs that still use 32-bit IOVAs but whose DRAM
size may exceed 4GB.
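
For illustration only (not part of the patch), a stand-alone sketch of the
reassembly that this change makes unconditional. FIELD_GET() is open-coded;
the mask position and sample values are hypothetical, not the driver's real
register layout:

#include <stdio.h>

#define PA_34_32_MASK	(0x7U << 6)	/* 3-bit PA[34:32] field, made-up spot */
#define PA_34_32_SHIFT	6

int main(void)
{
	unsigned int fault_iova = (0x5U << PA_34_32_SHIFT) | 0x1000;
	unsigned long long fault_pa = 0xabcd000;	/* low bits from HW */
	unsigned int pa34_32 = (fault_iova & PA_34_32_MASK) >> PA_34_32_SHIFT;

	/* now done whether or not IOVA_34_EN is set: a 32-bit IOVA can
	 * still translate to a PA above 4GB */
	fault_pa |= (unsigned long long)pa34_32 << 32;
	printf("fault_pa = %#llx\n", fault_pa);	/* 0x50abcd000 */
	return 0;
}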

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 1f3fe3276aa0..d59f6857a9df 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -283,11 +283,11 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
if (MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN)) {
va34_32 = FIELD_GET(F_MMU_INVAL_VA_34_32_MASK, fault_iova);
-   pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
fault_iova = fault_iova & F_MMU_INVAL_VA_31_12_MASK;
fault_iova |= (u64)va34_32 << 32;
-   fault_pa |= (u64)pa34_32 << 32;
}
+   pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
+   fault_pa |= (u64)pa34_32 << 32;
 
fault_port = F_MMU_INT_ID_PORT_ID(regval);
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
-- 
2.18.0



[PATCH v6 12/34] iommu/mediatek: Remove the granule in the tlb flush

2022-04-07 Thread Yong Wu via iommu
The MediaTek IOMMU does not use the granule when flushing the TLB.
Remove this unused parameter.
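
For illustration only (not part of the patch): the flush length is fully
determined by the gather range, so the per-page granule carries no extra
information. A sketch with hypothetical values:

#include <stdio.h>

struct gather_range {
	unsigned long start;
	unsigned long end;	/* inclusive, as in the gather structure */
};

int main(void)
{
	struct gather_range g = { .start = 0x100000, .end = 0x102fff };
	unsigned long length = g.end - g.start + 1;	/* 0x3000 bytes */

	/* the whole range is flushed in one shot; no granule needed */
	printf("flush [%#lx, %#lx], len %#lx\n", g.start, g.end, length);
	return 0;
}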

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index e7008a20ec74..1f3fe3276aa0 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -219,7 +219,6 @@ static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 }
 
 static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-  size_t granule,
   struct mtk_iommu_data *data)
 {
struct list_head *head = data->hw_list;
@@ -541,8 +540,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
size_t length = gather->end - gather->start + 1;
 
-   mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
-  dom->data);
+   mtk_iommu_tlb_flush_range_sync(gather->start, length, dom->data);
 }
 
 static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
@@ -550,7 +548,7 @@ static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
 {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
-   mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
+   mtk_iommu_tlb_flush_range_sync(iova, size, dom->data);
 }
 
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
-- 
2.18.0



[PATCH v6 11/34] iommu/mediatek: Add a flag NON_STD_AXI

2022-04-07 Thread Yong Wu via iommu
Add a new flag, NOT_STD_AXI_MODE; all the previous SoCs support it.
This prepares for adding the infra and APU IOMMUs, which do not
support it.
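
For illustration only (not part of the patch), a sketch of the flag-gated
read-modify-write in mtk_iommu_hw_init(). The MMIO accessors are stubbed
with a plain variable and the initial register value is hypothetical:

#include <stdio.h>

#define NOT_STD_AXI_MODE		(1U << 11)
#define F_MMU_STANDARD_AXI_MODE_MASK	((1U << 3) | (1U << 19))

static unsigned int misc_ctrl = 0x80088;	/* stand-in for REG_MMU_MISC_CTRL */

static void hw_init(unsigned int plat_flags)
{
	unsigned int regval = misc_ctrl;	/* readl_relaxed() */

	/* only SoCs that set the flag leave standard AXI mode; infra and
	 * APU IOMMUs will keep these bits untouched */
	if (plat_flags & NOT_STD_AXI_MODE)
		regval &= ~F_MMU_STANDARD_AXI_MODE_MASK;
	misc_ctrl = regval;			/* writel_relaxed() */
}

int main(void)
{
	hw_init(NOT_STD_AXI_MODE);
	printf("MISC_CTRL = %#x\n", misc_ctrl);	/* bits 3 and 19 cleared */
	return 0;
}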

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 92f172a772d1..e7008a20ec74 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -122,6 +122,7 @@
 #define IOVA_34_EN BIT(8)
 #define SHARE_PGTABLE  BIT(9) /* 2 HW share pgtable */
 #define DCM_DISABLE			BIT(10)
+#define NOT_STD_AXI_MODE   BIT(11)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))
@@ -785,7 +786,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
regval = 0;
} else {
regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
-   regval &= ~F_MMU_STANDARD_AXI_MODE_MASK;
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, NOT_STD_AXI_MODE))
+   regval &= ~F_MMU_STANDARD_AXI_MODE_MASK;
if (MTK_IOMMU_HAS_FLAG(data->plat_data, OUT_ORDER_WR_EN))
regval &= ~F_MMU_IN_ORDER_WR_EN_MASK;
}
@@ -1058,7 +1060,8 @@ static const struct dev_pm_ops mtk_iommu_pm_ops = {
 
 static const struct mtk_iommu_plat_data mt2712_data = {
.m4u_plat = M4U_MT2712,
-	.flags        = HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG | SHARE_PGTABLE,
+	.flags        = HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG | SHARE_PGTABLE |
+			NOT_STD_AXI_MODE,
.hw_list  = &m4ulist,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.iova_region  = single_domain,
@@ -1068,7 +1071,8 @@ static const struct mtk_iommu_plat_data mt2712_data = {
 
 static const struct mtk_iommu_plat_data mt6779_data = {
.m4u_plat  = M4U_MT6779,
-   .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
+   .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+		 NOT_STD_AXI_MODE,
.inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
.iova_region   = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
@@ -1077,7 +1081,7 @@ static const struct mtk_iommu_plat_data mt6779_data = {
 
 static const struct mtk_iommu_plat_data mt8167_data = {
.m4u_plat = M4U_MT8167,
-	.flags        = RESET_AXI | HAS_LEGACY_IVRP_PADDR,
+	.flags        = RESET_AXI | HAS_LEGACY_IVRP_PADDR | NOT_STD_AXI_MODE,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
@@ -1087,7 +1091,7 @@ static const struct mtk_iommu_plat_data mt8167_data = {
 static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
	.flags        = HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
-   HAS_LEGACY_IVRP_PADDR,
+   HAS_LEGACY_IVRP_PADDR | NOT_STD_AXI_MODE,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
@@ -1106,7 +1110,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
 static const struct mtk_iommu_plat_data mt8192_data = {
.m4u_plat   = M4U_MT8192,
.flags  = HAS_BCLK | HAS_SUB_COMM | OUT_ORDER_WR_EN |
- WR_THROT_EN | IOVA_34_EN,
+ WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE,
.inv_sel_reg= REG_MMU_INV_SEL_GEN2,
.iova_region= mt8192_multi_dom,
.iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
-- 
2.18.0



[PATCH v6 10/34] iommu/mediatek: Add a flag DCM_DISABLE

2022-04-07 Thread Yong Wu via iommu
The infra IOMMU requires DCM to be disabled. Add a new flag for this.
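
For illustration only (not part of the patch), a sketch of the flag-gated
write this adds to mtk_iommu_hw_init(), with the MMIO register stubbed by a
plain variable:

#include <stdio.h>

#define DCM_DISABLE	(1U << 10)
#define F_MMU_DCM	(1U << 8)

static unsigned int dcm_dis_reg;	/* stand-in for REG_MMU_DCM_DIS */

static void hw_init(unsigned int plat_flags)
{
	/* the infra IOMMU sets DCM_DISABLE and writes F_MMU_DCM; every
	 * other SoC keeps writing 0, as before */
	dcm_dis_reg = (plat_flags & DCM_DISABLE) ? F_MMU_DCM : 0;
}

int main(void)
{
	hw_init(DCM_DISABLE);
	printf("REG_MMU_DCM_DIS = %#x (infra)\n", dcm_dis_reg);
	hw_init(0);
	printf("REG_MMU_DCM_DIS = %#x (others)\n", dcm_dis_reg);
	return 0;
}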

Signed-off-by: Yong Wu 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d91a0c138536..92f172a772d1 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -51,6 +51,8 @@
 #define F_MMU_STANDARD_AXI_MODE_MASK   (BIT(3) | BIT(19))
 
 #define REG_MMU_DCM_DIS			0x050
+#define F_MMU_DCM			BIT(8)
+
 #define REG_MMU_WR_LEN_CTRL		0x054
 #define F_MMU_WR_THROT_DIS_MASK		(BIT(5) | BIT(21))
 
@@ -119,6 +121,7 @@
 #define HAS_LEGACY_IVRP_PADDR  BIT(7)
 #define IOVA_34_EN BIT(8)
 #define SHARE_PGTABLE  BIT(9) /* 2 HW share pgtable */
+#define DCM_DISABLE			BIT(10)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))
@@ -765,7 +768,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
regval = F_MMU_VLD_PA_RNG(7, 4);
writel_relaxed(regval, data->base + REG_MMU_VLD_PA_RNG);
}
-   writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, DCM_DISABLE))
+   writel_relaxed(F_MMU_DCM, data->base + REG_MMU_DCM_DIS);
+   else
+   writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
+
if (MTK_IOMMU_HAS_FLAG(data->plat_data, WR_THROT_EN)) {
/* write command throttling mode */
regval = readl_relaxed(data->base + REG_MMU_WR_LEN_CTRL);
-- 
2.18.0



  1   2   >