Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, 7 Apr 2022 12:23:31 -0300 Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> > For the specific case of overriding PCIe No Snoop (which is more
> > problematic from an Arm SMMU PoV) when assigning to a VM, would that
> > not be easier solved by just having vfio-pci clear the "Enable No
> > Snoop" control bit in the endpoint's PCIe capability?
>
> Ideally.
>
> That was rediscussed recently, apparently there are non-compliant
> devices and drivers that just ignore the bit.
>
> Presumably this is why x86 had to move to an IOMMU enforced feature..

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device-specific means to effect the change anyway. GPUs
especially have various back channels to config space.

Thanks,
Alex
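For reference, the approach Robin floats above corresponds to clearing the
"Enable No Snoop" bit in the endpoint's PCIe Device Control register. A
minimal sketch of what that could look like, assuming the standard
pcie_capability_*() accessors (a sketch only; per the thread, non-compliant
devices that ignore the bit defeat this, which is why the series does not
rely on it):

	#include <linux/pci.h>

	/*
	 * Hypothetical helper: stop an endpoint from issuing No Snoop TLPs
	 * by clearing PCI_EXP_DEVCTL_NOSNOOP_EN. Devices with back channels
	 * to config space (as noted above) can still bypass this.
	 */
	static int vfio_pci_block_nosnoop(struct pci_dev *pdev)
	{
		if (!pci_is_pcie(pdev))
			return -ENODEV;

		return pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
						  PCI_EXP_DEVCTL_NOSNOOP_EN);
	}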
Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
On 2022-04-07 20:08, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
>> On 2022-04-07 18:43, Jason Gunthorpe wrote:
>>> On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
>>>> At a glance, this all looks about the right shape to me now, thanks!
>>>
>>> Thanks!
>>>
>>>> Ideally I'd hope patch #4 could go straight to device_iommu_capable()
>>>> from my Thunderbolt series, but we can figure that out in a couple of
>>>> weeks once
>>>
>>> Yes, this does helps that because now the only iommu_capable call is
>>> in a context where a device is available :)
>>
>> Derp, of course I have *two* VFIO patches waiting, the other one
>> touching the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP,
>> which, much as I hate it and would love to boot all that stuff over to
>> drivers/irqchip,
>
> Oh me too...
>
>> it's not in my way so I'm leaving it be for now). I'll have to rebase
>> that anyway, so merging this as-is is absolutely fine!
>
> This might help your effort - after this series and this below there
> are no 'bus' users of iommu_capable left at all.

Thanks, but I still need a device for the iommu_domain_alloc() as well,
so at that point the interrupt check is OK to stay where it is. I
figured out a locking strategy per my original idea that seems pretty
clean, it just needs vfio_group_viable() to go away first:

https://gitlab.arm.com/linux-arm/linux-rm/-/commit/c6057da9f6b5f4b0fb67c6e647d2f8f76a6177fc

Cheers,
Robin.
Re: [PATCH v5 1/2] PCI: ACPI: Support Microsoft's "DmaProperty"
In subject, "PCI/ACPI: " would be consistent with previous history (at
least things coming through the PCI tree :)).

On Fri, Mar 25, 2022 at 11:46:08AM -0700, Rajat Jain wrote:
> The "DmaProperty" is supported and documented by Microsoft here:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

Here's a more specific link (could probably be referenced below to
avoid cluttering the text here):

https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-internal-pcie-ports-accessible-to-users-and-requiring-dma-protection

> They use this property for DMA protection:
> https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt
>
> Support the "DmaProperty" with the same semantics. This is useful for
> internal PCI devices that do not hang off a PCIe rootport, but offer
> an attack surface for DMA attacks (e.g. internal network devices).

Same semantics as what?

The MS description of "ExternalFacingPort" says:

  This ACPI object enables the operating system to identify externally
  exposed PCIe hierarchies, such as Thunderbolt.

and "DmaProperty" says:

  This ACPI object enables the operating system to identify internal
  PCIe hierarchies that are easily accessible by users (such as,
  Laptop M.2 PCIe slots accessible by way of a latch) and require
  protection by the OS Kernel DMA Protection mechanism.

I don't really understand why they called out "laptop M.2 PCIe slots"
here. Is the idea that those are more accessible than a standard
internal PCIe slot? Seems like a pretty small distinction to me.

I can understand your example of internal network devices adding an
attack surface. But I don't see how "DmaProperty" helps identify
those. Wouldn't a NIC in a standard internal PCIe slot add the same
attack surface?

> Signed-off-by: Rajat Jain
> Reviewed-by: Mika Westerberg
> ---
> v5: * Reorder the patches in the series
> v4: * Add the GUID.
>     * Update the comment and commitlog.
> v3: * Use Microsoft's documented property "DmaProperty"
>     * Resctrict to ACPI only
>
>  drivers/acpi/property.c |  3 +++
>  drivers/pci/pci-acpi.c  | 16 ++++++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index d0986bda2964..20603cacc28d 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -48,6 +48,9 @@ static const guid_t prp_guids[] = {
>  	/* Storage device needs D3 GUID: 5025030f-842f-4ab4-a561-99a5189762d0 */
>  	GUID_INIT(0x5025030f, 0x842f, 0x4ab4,
>  		  0xa5, 0x61, 0x99, 0xa5, 0x18, 0x97, 0x62, 0xd0),
> +	/* DmaProperty for PCI devices GUID: 70d24161-6dd5-4c9e-8070-705531292865 */
> +	GUID_INIT(0x70d24161, 0x6dd5, 0x4c9e,
> +		  0x80, 0x70, 0x70, 0x55, 0x31, 0x29, 0x28, 0x65),
>  };
>
>  /* ACPI _DSD data subnodes GUID: dbb8e3e6-5886-4ba6-8795-1319f52a966b */
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1f15ab7eabf8..378e05096c52 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1350,12 +1350,28 @@ static void pci_acpi_set_external_facing(struct pci_dev *dev)
>  		dev->external_facing = 1;
>  }
>
> +static void pci_acpi_check_for_dma_protection(struct pci_dev *dev)

I try to avoid function names like *_check_*() because they don't give
any hint about whether there's a side effect or what direction things
are going. I prefer things that return a value or make sense when used
as a predicate. Maybe something like this?

  int pci_dev_has_dma_property(struct pci_dev *dev)

  dev->untrusted |= pci_dev_has_dma_property(pci_dev);

> +{
> +	u8 val;
> +
> +	/*
> +	 * Property also used by Microsoft Windows for same purpose,
> +	 * (to implement DMA protection from a device, using the IOMMU).
> +	 */
> +	if (device_property_read_u8(&dev->dev, "DmaProperty", &val))

The MS web page says a _DSD with this property must be implemented in
the Root Port device scope, but we don't enforce that here. We *do*
enforce it in pci_acpi_set_untrusted(). Shouldn't we do the same here?

We currently look at three properties from the same _DSD:

  DmaProperty
  ExternalFacingPort
  HotPlugSupportInD3

For "HotPlugSupportInD3", we check that "value == 1". For
"ExternalFacingPort", we check that it's non-zero. The MS doc isn't
explicit about the values, but shows "1" in the sample ASL. I think we
should handle all three cases the same.

The first two use device_property_read_u8(); the last uses
acpi_dev_get_property(). Again, I think they should all be the same.
acpi_dev_get_property() is easier for me to read because there are
slightly fewer layers of abstraction between _DSD and
acpi_dev_get_property(). But IIUC, device_property_read_u8() works for
either ACPI or DT properties, and maybe there is interest in using
this for DT systems. None of these appear in any in
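Filling out Bjorn's suggestion, a minimal sketch of the predicate shape he
describes, reusing the device_property_read_u8() path from the patch (the
name pci_dev_has_dma_property() and the strict "value == 1" check simply
illustrate the review comments, and the root-port-scope enforcement he asks
for is left out here):

	/*
	 * Sketch of the suggested predicate style: no side effects, just
	 * report whether the ACPI "DmaProperty" is present and set to 1.
	 */
	static bool pci_dev_has_dma_property(struct pci_dev *dev)
	{
		u8 val;

		if (device_property_read_u8(&dev->dev, "DmaProperty", &val))
			return false;

		return val == 1;
	}

	/* The caller would then combine it with the existing logic: */
	/*	pci_dev->untrusted |= pci_dev_has_dma_property(pci_dev); */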
Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
> On 2022-04-07 18:43, Jason Gunthorpe wrote:
> > On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
> > > At a glance, this all looks about the right shape to me now, thanks!
> >
> > Thanks!
> >
> > > Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
> > > my Thunderbolt series, but we can figure that out in a couple of weeks
> > > once
> >
> > Yes, this does helps that because now the only iommu_capable call is
> > in a context where a device is available :)
>
> Derp, of course I have *two* VFIO patches waiting, the other one touching
> the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, which, much
> as I hate it and would love to boot all that stuff over to
> drivers/irqchip,

Oh me too...

> it's not in my way so I'm leaving it be for now). I'll have to rebase that
> anyway, so merging this as-is is absolutely fine!

This might help your effort - after this series and this below there
are no 'bus' users of iommu_capable left at all.

From 55d72be40bc0a031711e318c8dd1cb60673d9eca Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe
Date: Thu, 7 Apr 2022 16:00:50 -0300
Subject: [PATCH] vfio: Move the IOMMU_CAP_INTR_REMAP to a context with a
 struct device

This check is done to ensure that the device we want to use is fully
isolated and the platform does not allow the device's MemWr TLPs to
trigger unauthorized MSIs.

Instead of doing it in the container context where we only have a
group, move the check to open_device where we actually know the
device.

This is still security safe as userspace cannot trigger an MemWr TLPs
without obtaining a device fd.

Signed-off-by: Jason Gunthorpe
---
 drivers/vfio/vfio.c             |  9 +++++++++
 drivers/vfio/vfio.h             |  1 +
 drivers/vfio/vfio_iommu_type1.c | 28 +++++++++++++++++-----------
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 9edad767cfdad3..8db5cea1dc1d75 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1355,6 +1355,15 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	if (IS_ERR(device))
 		return PTR_ERR(device);
 
+	/* Confirm this device is compatible with the container */
+	if (group->type == VFIO_IOMMU &&
+	    group->container->iommu_driver->ops->device_ok) {
+		ret = group->container->iommu_driver->ops->device_ok(
+			group->container->iommu_data, device->dev);
+		if (ret)
+			goto err_device_put;
+	}
+
 	if (!try_module_get(device->dev->driver->owner)) {
 		ret = -ENODEV;
 		goto err_device_put;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a6713022115155..3db60de71d42eb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -66,6 +66,7 @@ struct vfio_iommu_driver_ops {
 				   struct iommu_group *group);
 	void		(*notify)(void *iommu_data,
 				  enum vfio_iommu_notify_type event);
+	int		(*device_ok)(void *iommu_data, struct device *device);
 };
 
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e35759..5e966fb0ab9202 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2153,6 +2153,21 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static int vfio_iommu_device_ok(void *iommu_data, struct device *device)
+{
+	bool msi_remap;
+
+	msi_remap = irq_domain_check_msi_remap() ||
+		    iommu_capable(device->bus, IOMMU_CAP_INTR_REMAP);
+
+	if (!allow_unsafe_interrupts && !msi_remap) {
+		pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
+			__func__);
+		return -EPERM;
+	}
+	return 0;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 		struct iommu_group *iommu_group, enum vfio_group_type type)
 {
@@ -2160,7 +2175,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_iommu_group *group;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	bool resv_msi, msi_remap;
+	bool resv_msi;
 	phys_addr_t resv_msi_base = 0;
 	struct iommu_domain_geometry *geo;
 	LIST_HEAD(iova_copy);
@@ -2257,16 +2272,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
 
-	msi_remap = irq_domain_check_msi_remap() ||
-		    iommu_capable(bus, IOMMU_
Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
On 2022-04-07 18:43, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
>> At a glance, this all looks about the right shape to me now, thanks!
>
> Thanks!
>
>> Ideally I'd hope patch #4 could go straight to device_iommu_capable()
>> from my Thunderbolt series, but we can figure that out in a couple of
>> weeks once
>
> Yes, this does helps that because now the only iommu_capable call is
> in a context where a device is available :)

Derp, of course I have *two* VFIO patches waiting, the other one
touching the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP,
which, much as I hate it and would love to boot all that stuff over to
drivers/irqchip, it's not in my way so I'm leaving it be for now). I'll
have to rebase that anyway, so merging this as-is is absolutely fine!

Cheers,
Robin.
Re: [PATCH] drm/tegra: Stop using iommu_present()
On 4/6/22 21:06, Robin Murphy wrote:
> On 2022-04-06 15:32, Dmitry Osipenko wrote:
>> On 4/5/22 17:19, Robin Murphy wrote:
>>> Remove the pointless check. host1x_drm_wants_iommu() cannot return
>>> true unless an IOMMU exists for the host1x platform device, which at
>>> the moment means the iommu_present() test could never fail.
>>>
>>> Signed-off-by: Robin Murphy
>>> ---
>>>  drivers/gpu/drm/tegra/drm.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>> index 9464f522e257..bc4321561400 100644
>>> --- a/drivers/gpu/drm/tegra/drm.c
>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>> @@ -1149,7 +1149,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
>>>  		goto put;
>>>  	}
>>>  
>>> -	if (host1x_drm_wants_iommu(dev) && iommu_present(&platform_bus_type)) {
>>> +	if (host1x_drm_wants_iommu(dev)) {
>>>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
>>>  		if (!tegra->domain) {
>>>  			err = -ENOMEM;
>>
>> host1x_drm_wants_iommu() returns true if there is no IOMMU for the
>> host1x platform device of Tegra20/30 SoCs.
>
> Ah, apparently this is another example of what happens when I write
> patches late on a Friday night...
>
> So on second look, what we want to ascertain here is whether dev has an
> IOMMU, but only if the host1x parent is not addressing-limited, either
> because it can also use the IOMMU, or because all possible addresses
> are small enough anyway, right?

Yes

> Are we specifically looking for the host1x having a DMA-API-managed
> domain, or can that also end up using the tegra->domain or another
> unmanaged domain too?

We have host1x DMA that could have:

1. No IOMMU domain, depending on kernel/DT config
2. Managed domain, on newer SoCs
3. Unmanaged domain, on older SoCs

We have Tegra DRM devices which can:

1. Be attached to a shared unmanaged tegra->domain, on older SoCs
2. Have own managed domains, on newer SoCs

> I can't quite figure out from the comments whether it's physical
> addresses, IOVAs, or both that we're concerned with here.

Tegra DRM allocates buffers and submits jobs to h/w using host1x's
channel DMA. DRM framebuffers' addresses are inserted into host1x
command buffers by the kernel driver, and addresses beyond the 32-bit
space need to be treated specially; we don't support such addresses
upstream.

The IOMMU AS is limited to 32 bits in the upstream kernel for pre-T186
Tegra SoCs, which hides 64-bit addresses from host1x. Post-T186 SoCs
have extra features that mean the kernel driver doesn't need to worry
about addresses.

For newer ARM64 SoCs, the Tegra drivers assume an IOMMU is always
present, to simplify things.
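For reference, the three host1x cases Dmitry lists can be distinguished at
runtime from the device's attached domain. A sketch only, using the generic
iommu_get_domain_for_dev() API (the classification names are illustrative,
and this does not address the host1x addressing-limit question the thread
is actually debating):

	#include <linux/iommu.h>

	/* Illustrative: classify which host1x case applies to a device */
	static const char *host1x_iommu_case(struct device *dev)
	{
		struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

		if (!domain)
			return "no IOMMU domain";
		if (domain->type == IOMMU_DOMAIN_DMA)
			return "DMA-API-managed domain";
		return "unmanaged domain (e.g. shared tegra->domain)";
	}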
Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
> At a glance, this all looks about the right shape to me now, thanks!

Thanks!

> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
> my Thunderbolt series, but we can figure that out in a couple of weeks once

Yes, this does helps that because now the only iommu_capable call is
in a context where a device is available :)

> Joerg starts queueing 5.19 material. I've got another VFIO patch waiting
> for the DMA ownership series to land anyway, so it's hardly the end of
> the world if I have to come back to follow up on this one too.

Hopefully Joerg will start soon, I also have patches written waiting
for the DMA ownership series.

Jason
Re: [PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
On 2022-04-07 16:23, Jason Gunthorpe wrote:
> PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of
> its transactions as a micro-optimization.
>
> However, the driver is now responsible to synchronize the CPU cache
> with the DMA that bypassed it. On x86 this may be done through the
> wbinvd instruction, and the i915 GPU driver is the only Linux DMA
> driver that calls it.
>
> The problem comes that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running
> in the guest is not aware the wbinvd doesn't work it may still cause
> the device to set the no-snoop bit and the platform will bypass the
> CPU cache. Without a working wbinvd there is no way to re-synchronize
> the CPU cache and the driver in the VM has data corruption.
>
> Thus, we see a general direction on x86 that the IOMMU HW is able to
> block the no-snoop bit in the TLP. This NOP's the optimization and
> allows KVM to NOP the wbinvd without causing any data corruption.
>
> This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused. AMD IOMMU has the same feature and same IOPTE bits
> however it unconditionally blocks no-snoop.
>
> Change it so that:
>  - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
>    device. It is used by the DMA API/VFIO/etc when the user of the
>    iommu_domain will not be doing manual cache coherency operations.
>
>  - IOMMU_CAP_CACHE_COHERENCY indicates if IOMMU_CACHE can be used
>    with the device.
>
>  - The new optional domain op enforce_cache_coherency() will cause
>    the entire domain to block no-snoop requests - ie there is no way
>    for any device attached to the domain to opt out of the
>    IOMMU_CACHE behavior. This is permanent on the domain and must
>    apply to any future devices attached to it.
>
> Ideally an iommu driver should implement enforce_cache_coherency() so
> that by default DMA API domains allow the no-snoop optimization. This
> leaves it available to kernel drivers like i915. VFIO will call
> enforce_cache_coherency() before establishing any mappings and the
> domain should then permanently block no-snoop.
>
> If enforce_cache_coherency() fails VFIO will communicate back through
> to KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
>
> While other iommu drivers are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in
> doing so right now.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/intel_no_snoop
>
> v2:
>  - Abandon removing IOMMU_CAP_CACHE_COHERENCY - instead make it the
>    cap flag that indicates IOMMU_CACHE is supported
>  - Put the VFIO tests for IOMMU_CACHE at VFIO device registration
>  - In the Intel driver remove the domain->iommu_snooping value, this
>    is global not per-domain

At a glance, this all looks about the right shape to me now, thanks!

Ideally I'd hope patch #4 could go straight to device_iommu_capable()
from my Thunderbolt series, but we can figure that out in a couple of
weeks once Joerg starts queueing 5.19 material. I've got another VFIO
patch waiting for the DMA ownership series to land anyway, so it's
hardly the end of the world if I have to come back to follow up on
this one too.

For the series,

Acked-by: Robin Murphy

> v1: https://lore.kernel.org/r/0-v1-ef02c60ddb76+12ca2-intel_no_snoop_...@nvidia.com
>
> Jason Gunthorpe (4):
>   iommu: Introduce the domain op enforce_cache_coherency()
>   vfio: Move the Intel no-snoop control off of IOMMU_CACHE
>   iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for
>     IOMMU_CACHE
>   vfio: Require that devices support DMA cache coherence
>
>  drivers/iommu/amd/iommu.c       |  7 +++++++
>  drivers/iommu/intel/iommu.c     | 17 +++++++++--------
>  drivers/vfio/vfio.c             |  7 +++++++
>  drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++-----------
>  include/linux/intel-iommu.h     |  2 +-
>  include/linux/iommu.h           |  7 +++++--
>  6 files changed, 52 insertions(+), 18 deletions(-)
>
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17
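To make the flow in the cover letter concrete, a hedged sketch of the
VFIO-side sequence it describes (iommu_enforce_cache_coherency() is
assumed here to be the wrapper around the new optional domain op; the
surrounding plumbing between VFIO and KVM is simplified):

	/*
	 * Sketch: ask the domain to permanently block no-snoop before
	 * creating any mappings; if it cannot, tell KVM the device DMA
	 * is non-coherent so a working wbinvd is exposed to the VM.
	 */
	static void vfio_handle_noncoherent_dma(struct iommu_domain *domain,
						struct kvm *kvm)
	{
		if (!iommu_enforce_cache_coherency(domain))
			kvm_arch_register_noncoherent_dma(kvm);
	}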
[PATCH v6 21/21] nvme-pci: allow mmaping the CMB in userspace
Allow userspace to obtain CMB memory by mmaping the controller's char
device. The mmap call allocates and returns a hunk of CMB memory (the
offset is ignored), so userspace does not have control over the address
within the CMB.

A VMA allocated in this way will only be usable by drivers that set
FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be
checked the first time the pages are mapped for DMA.

Currently this is only supported by O_DIRECT to a PCI NVMe device
or through the NVMe passthrough IOCTL.

Signed-off-by: Logan Gunthorpe
---
 drivers/nvme/host/core.c | 15 +++++++++++++++
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 17 +++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bbc276dda49f..1fd3372c2c18 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3114,6 +3114,10 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
 	}
 
 	file->private_data = ctrl;
+
+	if (ctrl->ops->cdev_file_open)
+		ctrl->ops->cdev_file_open(ctrl, file);
+
 	return 0;
 }
 
@@ -3127,12 +3131,23 @@ static int nvme_dev_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct nvme_ctrl *ctrl = file->private_data;
+
+	if (!ctrl->ops->mmap_cmb)
+		return -ENODEV;
+
+	return ctrl->ops->mmap_cmb(ctrl, vma);
+}
+
 static const struct file_operations nvme_dev_fops = {
 	.owner		= THIS_MODULE,
 	.open		= nvme_dev_open,
 	.release	= nvme_dev_release,
 	.unlocked_ioctl	= nvme_dev_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
+	.mmap		= nvme_dev_mmap,
 };
 
 static ssize_t nvme_sysfs_reset(struct device *dev,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7d97bfb2a9e2..24fbcd274c64 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -497,6 +497,8 @@ struct nvme_ctrl_ops {
 	void (*delete_ctrl)(struct nvme_ctrl *ctrl);
 	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
 	bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
+	void (*cdev_file_open)(struct nvme_ctrl *ctrl, struct file *file);
+	int (*mmap_cmb)(struct nvme_ctrl *ctrl, struct vm_area_struct *vma);
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 07412116d4d1..5946244e0295 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2965,6 +2965,21 @@ static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
 	return dma_pci_p2pdma_supported(dev->dev);
 }
 
+static void nvme_pci_cdev_file_open(struct nvme_ctrl *ctrl, struct file *file)
+{
+	struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+	pci_p2pdma_file_open(pdev, file);
+}
+
+static int nvme_pci_mmap_cmb(struct nvme_ctrl *ctrl,
+			     struct vm_area_struct *vma)
+{
+	struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+	return pci_mmap_p2pmem(pdev, vma);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
@@ -2976,6 +2991,8 @@ static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.submit_async_event	= nvme_pci_submit_async_event,
 	.get_address		= nvme_pci_get_address,
 	.supports_pci_p2pdma	= nvme_pci_supports_pci_p2pdma,
+	.cdev_file_open		= nvme_pci_cdev_file_open,
+	.mmap_cmb		= nvme_pci_mmap_cmb,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.30.2
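A hedged userspace sketch of the flow this patch enables, assuming the
controller char device is /dev/nvme0 and a namespace block device on the
same controller (device paths and sizes are illustrative):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int cfd = open("/dev/nvme0", O_RDWR);	/* controller cdev */
		int dfd = open("/dev/nvme0n1", O_RDWR | O_DIRECT);
		size_t len = 2 * 1024 * 1024;

		/* offset is ignored; the kernel picks the CMB address */
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_SHARED, cfd, 0);
		if (buf == MAP_FAILED)
			return 1;

		/* O_DIRECT read into CMB memory exercises the P2PDMA path */
		if (pread(dfd, buf, len, 0) < 0)
			perror("pread");

		munmap(buf, len);
		close(dfd);
		close(cfd);
		return 0;
	}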
[PATCH v6 12/21] RDMA/rw: drop pci_p2pdma_[un]map_sg()
dma_map_sg() now supports the use of P2PDMA pages so
pci_p2pdma_map_sg() is no longer necessary and may be dropped. This
means the rdma_rw_[un]map_sg() helpers are no longer necessary. Remove
it all.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Jason Gunthorpe
---
 drivers/infiniband/core/rw.c | 45 +++++++-----------------------------
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 4d98f931a13d..8367974b7998 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -274,33 +274,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-			     u32 sg_cnt, enum dma_data_direction dir)
-{
-	if (is_pci_p2pdma_page(sg_page(sg)))
-		pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-	else
-		ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sgtable(struct ib_device *dev, struct sg_table *sgt,
-			       enum dma_data_direction dir)
-{
-	int nents;
-
-	if (is_pci_p2pdma_page(sg_page(sgt->sgl))) {
-		if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-			return 0;
-		nents = pci_p2pdma_map_sg(dev->dma_device, sgt->sgl,
-					  sgt->orig_nents, dir);
-		if (!nents)
-			return -EIO;
-		sgt->nents = nents;
-		return 0;
-	}
-	return ib_dma_map_sgtable_attrs(dev, sgt, dir, 0);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:	context to initialize
@@ -327,7 +300,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
 	};
 	int ret;
 
-	ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+	ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
 	if (ret)
 		return ret;
 	sg_cnt = sgt.nents;
@@ -366,7 +339,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
 	return ret;
 
 out_unmap_sg:
-	rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+	ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -414,12 +387,12 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		return -EINVAL;
 	}
 
-	ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+	ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
 	if (ret)
 		return ret;
 
 	if (prot_sg_cnt) {
-		ret = rdma_rw_map_sgtable(dev, &prot_sgt, dir);
+		ret = ib_dma_map_sgtable_attrs(dev, &prot_sgt, dir, 0);
 		if (ret)
 			goto out_unmap_sg;
 	}
@@ -486,9 +459,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	kfree(ctx->reg);
 out_unmap_prot_sg:
 	if (prot_sgt.nents)
-		rdma_rw_unmap_sg(dev, prot_sgt.sgl, prot_sgt.orig_nents, dir);
+		ib_dma_unmap_sgtable_attrs(dev, &prot_sgt, dir, 0);
 out_unmap_sg:
-	rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+	ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -621,7 +594,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		break;
 	}
 
-	rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -649,8 +622,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	kfree(ctx->reg);
 
 	if (prot_sg_cnt)
-		rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-	rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+		ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
-- 
2.30.2
[PATCH v6 00/21] Userspace P2PDMA with O_DIRECT NVMe devices
Hi,

This patchset continues my work to add userspace P2PDMA access using
O_DIRECT NVMe devices. This posting contains some minor fixes and a
rebase onto v5.18-rc1, which contains cleanup from Christoph around
free_zone_device_page() that helps to enable this patchset. The
previous posting was here [1].

The patchset enables userspace P2PDMA by allowing userspace to mmap()
allocated chunks of the CMB. The resulting VMA can be passed only to
O_DIRECT IO on NVMe backed files or block devices. A flag is added to
GUP() in Patch <>, then Patches <> through <> wire this flag up based
on whether the block queue indicates P2PDMA support. Patches <>
through <> enable the CMB to be mapped into userspace by mmaping the
nvme char device.

This is relatively straightforward; however, the one significant
problem is that, presently, pci_p2pdma_map_sg() requires a homogeneous
SGL with all P2PDMA pages or all regular pages. Enhancing GUP to
support enforcing this rule would require a huge hack that I don't
expect would be all that palatable. So the first 13 patches add
support for P2PDMA pages to dma_map_sg[table]() in the dma-direct and
dma-iommu implementations. Thus systems without an IOMMU plus Intel
and AMD IOMMUs are supported. (Other IOMMU implementations would then
be unsupported, notably ARM and PowerPC, but support would be added
when they convert to dma-iommu.)

dma_map_sgtable() is preferred when dealing with P2PDMA memory as it
will return -EREMOTEIO when the DMA device cannot map specific P2PDMA
pages based on the existing rules in calc_map_type_and_dist().

The other issue is that dma_unmap_sg() needs a flag to determine
whether a given dma_addr_t was mapped regularly or as a PCI bus
address. To allow this, a third flag is added to the page_link field
in struct scatterlist. This effectively means support for P2PDMA will
now depend on CONFIG_64BIT.

Feedback welcome.

This series is based on v5.18-rc1. A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem/ p2pdma_user_cmb_v6

Thanks,

Logan

[1] lkml.kernel.org/r/20220128002614.6136-1-log...@deltatee.com

--

Changes since v5:
  - Rebased onto v5.18-rc1 which includes Christoph's cleanup to
    free_zone_device_page() (similar to Ralph's patch).
  - Fix bug with concurrent first calls to pci_p2pdma_vma_fault()
    that caused a double allocation and lost p2p memory. Noticed
    by Andrew Maier.
  - Collected a Reviewed-by tag from Chaitanya.
  - Numerous minor fixes to commit messages

Changes since v4:
  - Rebase onto v5.17-rc1.
  - Included Ralph Cambell's patches which removes the ZONE_DEVICE
    page reference count offset. This is just to demonstrate that
    this series is compatible with that direction.
  - Added a comment in pci_p2pdma_map_sg_attrs(), per Chaitanya and
    included his Reviewed-by tags.
  - Patch 1 in the last series which cleaned up scatterlist.h has
    been upstreamed.
  - Dropped NEED_SG_DMA_BUS_ADDR_FLAG seeing "depends on" doesn't
    work with selected symbols, per Christoph.
  - Switched iov_iter_get_pages_[alloc_]flags to be exported with
    EXPORT_SYMBOL_GPL, per Christoph.
  - Renamed zone_device_pages_are_mergeable() to
    zone_device_pages_have_same_pgmap(), per Christoph.
  - Renamed .mmap_file_open operation in nvme_ctrl_ops to
    cdev_file_open(), per Christoph.

Changes since v3:
  - Add some comment and commit message cleanups I had missed for v3,
    also moved the prototypes for some of the p2pdma helpers to
    dma-map-ops.h (which I missed in v3 and was suggested in v2).
  - Add separate cleanup patch for scatterlist.h and change the
    macros to functions. (Suggested by Chaitanya and Jason,
    respectively)
  - Rename sg_dma_mark_pci_p2pdma() and sg_is_dma_pci_p2pdma() to
    sg_dma_mark_bus_address() and sg_is_dma_bus_address() which is a
    more generic name (As requested by Jason)
  - Fixes to some comments and commit messages as suggested by Bjorn
    and Jason.
  - Ensure swiotlb is not used with P2PDMA pages. (Per Jason)
  - The sgtable conversion in RDMA was split out and sent upstream
    separately, the new patch is only the removal. (Per Jason)
  - Moved the FOLL_PCI_P2PDMA check outside of get_dev_pagemap() as
    Jason suggested this will be removed in the near term.
  - Add two patches to ensure that zone device pages with different
    pgmaps are never merged in the block layer or
    sg_alloc_append_table_from_pages(). (Per Jason)
  - Ensure synchronize_rcu() or call_rcu() is used before returning
    pages to the genalloc. (Jason pointed out that pages are not
    guaranteed to be unused in all architectures until at least after
    an RCU grace period, and that synchronize_rcu() was likely too
    slow to use in the vma close operation.)
  - Collected Acks and Reviews by Bjorn, Jason and Max.

Logan Gunthorpe (21):
  lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Expose pci_p2pdma_map_ty
[PATCH v6 02/21] PCI/P2PDMA: Attempt to set map_type if it has not been set
Attempt to find the mapping type for P2PDMA pages on the first DMA map
attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no way to
ensure the path was checked ahead of time.

This change will calculate the mapping type if it hasn't been
pre-calculated, so it is no longer invalid to call pci_p2pdma_map_sg()
before the mapping type is calculated; drop the WARN_ON for that case.

Signed-off-by: Logan Gunthorpe
Acked-by: Bjorn Helgaas
Reviewed-by: Chaitanya Kulkarni
---
 drivers/pci/p2pdma.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 30b1df3c9d2f..c3a68e82cf36 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -849,6 +849,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
 	struct pci_dev *client;
 	struct pci_p2pdma *p2pdma;
+	int dist;
 
 	if (!provider->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
@@ -865,6 +866,10 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 	type = xa_to_value(xa_load(&p2pdma->map_types,
 				   map_types_idx(client)));
 	rcu_read_unlock();
+
+	if (type == PCI_P2PDMA_MAP_UNKNOWN)
+		return calc_map_type_and_dist(provider, client, &dist, true);
+
 	return type;
 }
 
@@ -907,7 +912,7 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	case PCI_P2PDMA_MAP_BUS_ADDR:
 		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
 	default:
-		WARN_ON_ONCE(1);
+		/* Mapping is not Supported */
 		return 0;
 	}
 }
-- 
2.30.2
[PATCH v6 16/21] block: add check when merging zone device pages
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to
different pgmaps. Otherwise getting the pgmap of a given segment is
not possible without scanning the entire segment.

Add a helper to determine if zone device pages are mergeable and use
it in page_is_mergeable(). The helper returns true either if both
pages are not zone device pages or if both pages are zone device pages
with the same pgmap.

Signed-off-by: Logan Gunthorpe
---
 block/bio.c        |  2 ++
 include/linux/mm.h | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index cdd7b2915c53..3406c0450db3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -834,6 +834,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
+	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		return false;
 
 	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
 	if (*same_page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 14ef41af8b77..fb2264a17e4a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1108,6 +1108,24 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return page_zonenum(page) == ZONE_DEVICE;
 }
+
+/*
+ * Consecutive zone device pages should not be merged into the same sgl
+ * or bvec segment with other types of pages or if they belong to different
+ * pgmaps. Otherwise getting the pgmap of a given segment is not possible
+ * without scanning the entire segment. This helper returns true either if
+ * both pages are not zone device pages or both pages are zone device pages
+ * with the same pgmap.
+ */
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	if (is_zone_device_page(a) != is_zone_device_page(b))
+		return false;
+	if (!is_zone_device_page(a))
+		return true;
+	return a->pgmap == b->pgmap;
+}
 extern void memmap_init_zone_device(struct zone *, unsigned long,
 				    unsigned long, struct dev_pagemap *);
 #else
@@ -1115,6 +1133,11 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return false;
 }
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	return true;
+}
 #endif
 
 static inline bool folio_is_zone_device(const struct folio *folio)
-- 
2.30.2
[PATCH v6 14/21] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
GUP callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA
to allow obtaining P2PDMA pages. If GUP is called without the flag and
a P2PDMA page is found, it will return an error.

FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set.

Signed-off-by: Logan Gunthorpe
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e34edb775334..14ef41af8b77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2936,6 +2936,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index f598a037eb04..0af6f802ca38 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -490,6 +490,12 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = pte_page(pte);
 		else
 			goto no_page;
+
+		if (unlikely(!(flags & FOLL_PCI_P2PDMA) &&
+			     is_pci_p2pdma_page(page))) {
+			page = ERR_PTR(-EREMOTEIO);
+			goto out;
+		}
 	} else if (unlikely(!page)) {
 		if (flags & FOLL_DUMP) {
 			/* Avoid special (like zero) pages in core dumps */
@@ -919,6 +925,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if ((gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_PCI_P2PDMA))
+		return -EOPNOTSUPP;
+
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
 
@@ -2184,6 +2193,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
+		if (unlikely(pte_devmap(pte) && !(flags & FOLL_PCI_P2PDMA) &&
+			     is_pci_p2pdma_page(page)))
+			goto pte_unmap;
+
 		folio = try_grab_folio(page, 1, flags);
 		if (!folio)
 			goto pte_unmap;
@@ -2258,6 +2271,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 			undo_dev_pagemap(nr, nr_start, flags, pages);
 			break;
 		}
+
+		if (!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
+			undo_dev_pagemap(nr, nr_start, flags, pages);
+			break;
+		}
+
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		if (unlikely(!try_grab_page(page, flags))) {
@@ -2729,7 +2748,8 @@ static int internal_get_user_pages_fast(unsigned long start,
 
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
-				       FOLL_FAST_ONLY | FOLL_NOFAULT)))
+				       FOLL_FAST_ONLY | FOLL_NOFAULT |
+				       FOLL_PCI_P2PDMA)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
-- 
2.30.2
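A short sketch of how an in-kernel GUP caller might opt in, assuming a
pinned-pages path similar to the iov_iter users later in the series (the
helper name and the surrounding buffer handling are illustrative):

	#include <linux/mm.h>

	/*
	 * Illustrative only: pin a user buffer while allowing P2PDMA pages.
	 * Without FOLL_PCI_P2PDMA, hitting a P2PDMA page yields -EREMOTEIO,
	 * and FOLL_LONGTERM must not be combined with the new flag.
	 */
	static int pin_buffer_allowing_p2pdma(unsigned long uaddr,
					      int nr_pages,
					      struct page **pages)
	{
		return pin_user_pages_fast(uaddr, nr_pages,
					   FOLL_WRITE | FOLL_PCI_P2PDMA,
					   pages);
	}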
[PATCH v6 19/21] block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be passed
from userspace and enables the NVMe passthru requests to use P2PDMA
pages.

Signed-off-by: Logan Gunthorpe
---
 block/blk-map.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index c7f71d83eff1..85baf922a0e8 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -234,6 +234,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		gfp_t gfp_mask)
 {
 	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
+	unsigned int flags = 0;
 	struct bio *bio;
 	int ret;
 	int j;
@@ -246,13 +247,17 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		return -ENOMEM;
 	bio->bi_opf |= req_op(rq);
 
+	if (blk_queue_pci_p2pdma(rq->q))
+		flags |= FOLL_PCI_P2PDMA;
+
 	while (iov_iter_count(iter)) {
 		struct page **pages;
 		ssize_t bytes;
 		size_t offs, added = 0;
 		int npages;
 
-		bytes = iov_iter_get_pages_alloc(iter, &pages, LONG_MAX, &offs);
+		bytes = iov_iter_get_pages_alloc_flags(iter, &pages, LONG_MAX,
+						       &offs, flags);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
-- 
2.30.2
[PATCH v6 04/21] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes a scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the pgmap
across calls and avoid needing to lookup into the xarray for every
page.

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Prototypes for these helpers are added to dma-map-ops.h as they are
only useful to dma map implementations and don't need to pollute the
public pci-p2pdma header.

Signed-off-by: Logan Gunthorpe
Acked-by: Bjorn Helgaas
---
 drivers/pci/p2pdma.c        | 59 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-map-ops.h | 21 ++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 8573bf9d651a..9032c2ed2cdf 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -946,6 +946,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared outside of the for_each_sg()
+ *	loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ *
+ * This is a helper to be used by non-IOMMU dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns the type of mapping used and maps the page if the type is
+ * PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+		       struct scatterlist *sg)
+{
+	if (state->pgmap != sg_page(sg)->pgmap) {
+		state->pgmap = sg_page(sg)->pgmap;
+		state->map = pci_p2pdma_map_type(state->pgmap, dev);
+		state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+	}
+
+	if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
+		sg->dma_address = sg_phys(sg) + state->bus_off;
+		sg_dma_len(sg) = sg->length;
+		sg_dma_mark_bus_address(sg);
+	}
+
+	return state->map;
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
+ *	be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a DMA address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the DMA address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+				struct scatterlist *dma_sg)
+{
+	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+	dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+	sg_dma_len(dma_sg) = pg_sg->length;
+	sg_dma_mark_bus_address(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  *		to enable p2pdma
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index d693a0e33bac..752f91e5eb5d 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -413,15 +413,36 @@ enum pci_p2pdma_map_type {
 	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
 };
 
+struct pci_p2pdma_map_state {
+	struct dev_pagemap *pgmap;
+	int map;
+	u64 bus_off;
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 					     struct device *dev);
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+		       struct scatterlist *sg);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+				struct scatterlist *dma_sg);
 #else /* CONFIG_PCI_P2PDMA */
 static inline enum pci_p2pdma_map_type
 pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
 {
 	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+		       struct scatterlist *
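A sketch of how a simple (non-IOMMU) dma_map_sg() implementation might
consume pci_p2pdma_map_segment(), following the pattern the commit message
describes for dma-direct (the function name is illustrative and error
unwinding is elided):

	/*
	 * Illustrative consumer: walk the SGL, letting the state struct
	 * cache the pgmap lookup, and only map the segments the helper
	 * leaves to us.
	 */
	static int example_map_sg(struct device *dev, struct scatterlist *sgl,
				  int nents, enum dma_data_direction dir)
	{
		struct pci_p2pdma_map_state p2pdma_state = {};
		struct scatterlist *sg;
		int i;

		for_each_sg(sgl, sg, nents, i) {
			if (is_pci_p2pdma_page(sg_page(sg))) {
				switch (pci_p2pdma_map_segment(&p2pdma_state,
							       dev, sg)) {
				case PCI_P2PDMA_MAP_BUS_ADDR:
					continue; /* already mapped */
				case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
					break;    /* map normally below */
				default:
					return -EREMOTEIO;
				}
			}
			/* ... regular dma_map_page()-style mapping here ... */
		}
		return nents;
	}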
[PATCH v6 01/21] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
Make use of the third free LSB in scatterlist's page_link on 64bit
systems.

The extra bit will be used by dma_[un]map_sg_p2pdma() to determine
when a given SGL segment's dma_address points to a PCI bus address.
dma_unmap_sg_p2pdma() will need to perform different cleanup when a
segment is marked as a bus address.

The new bit will only be used when CONFIG_PCI_P2PDMA is set; this
means PCI P2PDMA will require CONFIG_64BIT. This should be acceptable
as the majority of P2PDMA use cases are restricted to newer root
complexes and roughly require the extra address space for memory BARs
used in the transactions.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Chaitanya Kulkarni
---
 drivers/pci/Kconfig         |  5 +++++
 include/linux/scatterlist.h | 44 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 133c73207782..5cc7cba1941f 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -164,6 +164,11 @@ config PCI_PASID
 config PCI_P2PDMA
 	bool "PCI peer-to-peer transfer support"
 	depends on ZONE_DEVICE
+	#
+	# The need for the scatterlist DMA bus address flag means PCI P2PDMA
+	# requires 64bit
+	#
+	depends on 64BIT
 	select GENERIC_ALLOCATOR
 	help
 	  Enables drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 7ff9d6386c12..6561ca8aead8 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -64,12 +64,24 @@ struct sg_append_table {
 #define SG_CHAIN		0x01UL
 #define SG_END			0x02UL
 
+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a
+ * bus address when doing P2PDMA.
+ */
+#ifdef CONFIG_PCI_P2PDMA
+#define SG_DMA_BUS_ADDRESS	0x04UL
+static_assert(__alignof__(struct page) >= 8);
+#else
+#define SG_DMA_BUS_ADDRESS	0x00UL
+#endif
+
 /*
  * We overload the LSB of the page pointer to indicate whether it's
  * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END)
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_BUS_ADDRESS)
 
 static inline unsigned int __sg_flags(struct scatterlist *sg)
 {
@@ -91,6 +103,11 @@ static inline bool sg_is_last(struct scatterlist *sg)
 	return __sg_flags(sg) & SG_END;
 }
 
+static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
+{
+	return __sg_flags(sg) & SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_assign_page - Assign a given page to an SG entry
  * @sg:		SG entry
@@ -245,6 +262,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+/**
+ * sg_dma_mark_bus_address - Mark the scatterlist entry as a bus address
+ * @sg:		SG entry
+ *
+ * Description:
+ *   Marks the passed in sg entry to indicate that the dma_address is
+ *   a bus address and doesn't need to be unmapped.
+ **/
+static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
+{
+	sg->page_link |= SG_DMA_BUS_ADDRESS;
+}
+
+/**
+ * sg_dma_unmark_bus_address - Unmark the scatterlist entry as a bus address
+ * @sg:		SG entry
+ *
+ * Description:
+ *   Clears the bus address mark.
+ **/
+static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
+{
+	sg->page_link &= ~SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_phys - Return physical address of an sg entry
  * @sg:	     SG entry
-- 
2.30.2
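A sketch of the unmap-side pattern the new flag enables, loosely following
the dma-direct changes elsewhere in the series (the function name is
illustrative, and dma_unmap_page_attrs() stands in for whatever the
implementation does for regular segments):

	/*
	 * Illustrative unmap loop: bus-address segments were never mapped
	 * through the direct/IOMMU path, so they only need the mark cleared.
	 */
	static void example_unmap_sg(struct device *dev,
				     struct scatterlist *sgl, int nents,
				     enum dma_data_direction dir)
	{
		struct scatterlist *sg;
		int i;

		for_each_sg(sgl, sg, nents, i) {
			if (sg_is_dma_bus_address(sg))
				sg_dma_unmark_bus_address(sg);
			else
				dma_unmap_page_attrs(dev, sg->dma_address,
						     sg_dma_len(sg), dir, 0);
		}
	}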
[PATCH v6 07/21] dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA. Also, add a helper to check if a
device supports PCI P2PDMA.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Jason Gunthorpe
---
 include/linux/dma-map-ops.h | 10 ++++++++++
 include/linux/dma-mapping.h |  5 +++++
 kernel/dma/mapping.c        | 18 ++++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 752f91e5eb5d..4d4161d58ce0 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -11,7 +11,17 @@
 
 struct cma;
 
+/*
+ * Values for struct dma_map_ops.flags:
+ *
+ * DMA_F_PCI_P2PDMA_SUPPORTED: Indicates the dma_map_ops implementation can
+ * handle PCI P2PDMA pages in the map_sg/unmap_sg operation.
+ */
+#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
+
 struct dma_map_ops {
+	unsigned int flags;
+
 	void *(*alloc)(struct device *dev, size_t size,
 			dma_addr_t *dma_handle, gfp_t gfp,
 			unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..f7c61b2b4b5e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -140,6 +140,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 		unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -250,6 +251,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
 {
 	return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+	return false;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
 	return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9f65d1041638..21793506fdb6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -722,6 +722,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	/* if ops is not set, dma direct will be used which supports P2PDMA */
+	if (!ops)
+		return true;
+
+	/*
+	 * Note: dma_ops_bypass is not checked here because P2PDMA should
+	 * not be used with dma mapping ops that do not have support even
+	 * if the specific device is bypassing them.
+	 */
+
+	return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.30.2
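A brief sketch of both sides of this contract, using a hypothetical
dma_map_ops instance (the dma-iommu patch later in the series opts in the
same way; everything but the flag is elided here):

	/* Provider side: an implementation that can handle P2PDMA pages */
	static const struct dma_map_ops example_dma_ops = {
		.flags = DMA_F_PCI_P2PDMA_SUPPORTED,
		/* .map_sg / .unmap_sg / etc. elided */
	};

	/* Consumer side: gate a P2PDMA path on the new helper */
	static bool can_use_p2pdma(struct device *dev)
	{
		return dma_pci_p2pdma_supported(dev);
	}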
[PATCH v6 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
When a PCI P2PDMA page is seen, set the IOVA length of the segment to
zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through
     the root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be
     mapped normally with an IOMMU IOVA.
  3) It is not possible for the two devices to communicate and thus
     the mapping operation should fail (and it will return
     -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_bus_address() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Jason Gunthorpe
---
 drivers/iommu/dma-iommu.c | 68 +++++++++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..ef86f2b573d1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include <linux/iova.h>
 #include <linux/irq.h>
 #include <linux/mm.h>
+#include <linux/pci-p2pdma.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/scatterlist.h>
@@ -1045,6 +1046,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 
+		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+			if (i > 0)
+				cur = sg_next(cur);
+
+			pci_p2pdma_map_bus_segment(s, cur);
+			count++;
+			cur_len = 0;
+			continue;
+		}
+
 		/*
 		 * Now fill in the real DMA data. If...
 		 * - there is a valid output segment to append to
@@ -1141,6 +1152,8 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
 	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+	struct dev_pagemap *pgmap = NULL;
+	enum pci_p2pdma_map_type map_type;
 	dma_addr_t iova;
 	size_t iova_len = 0;
 	unsigned long mask = dma_get_seg_boundary(dev);
@@ -1176,6 +1189,35 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		s_length = iova_align(iovad, s_length + s_iova_off);
 		s->length = s_length;
 
+		if (is_pci_p2pdma_page(sg_page(s))) {
+			if (sg_page(s)->pgmap != pgmap) {
+				pgmap = sg_page(s)->pgmap;
+				map_type = pci_p2pdma_map_type(pgmap, dev);
+			}
+
+			switch (map_type) {
+			case PCI_P2PDMA_MAP_BUS_ADDR:
+				/*
+				 * A zero length will be ignored by
+				 * iommu_map_sg() and then can be detected
+				 * in __finalise_sg() to actually map the
+				 * bus address.
+				 */
+				s->length = 0;
+				continue;
+			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+				/*
+				 * Mapping through host bridge should be
+				 * mapped with regular IOVAs, thus we
+				 * do nothing here and continue below.
+				 */
+				break;
+			default:
+				ret = -EREMOTEIO;
+				goto out_restore_sg;
+			}
+		}
+
 		/*
 		 * Due to the alignment of our single IOVA allocation, we can
 		 * depend on these assumptions about the segment boundary mask:
@@ -1198,6 +1240,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		prev = s;
 	}
 
+	if (!iova_len)
+		return __finalise_sg(dev, sg, nents, 0);
+
 	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
 	if (!iova) {
 		ret = -ENOMEM;
@@ -1219,7 +1264,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 out_restore_sg:
 	__invalidate_sg(sg, nents);
 out:
-	if (ret != -ENOMEM)
+	if (ret != -ENOMEM && ret != -EREMOTEIO)
 		return -EINVAL;
 	re
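For callers, the visible contract of this change is the dma_map_sgtable()
return value. A hedged sketch of the fallback pattern a consumer might use
(the function names, including the declared fallback, are purely
illustrative):

	/* Hypothetical fallback, declared only for illustration */
	int example_fallback_to_host_memory(struct sg_table *sgt);

	/*
	 * Illustrative caller: try the P2PDMA-capable mapping and fall
	 * back to host-memory buffers if the topology can't route it.
	 */
	static int example_map(struct device *dev, struct sg_table *sgt)
	{
		int ret = dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);

		if (ret == -EREMOTEIO)
			return example_fallback_to_host_memory(sgt);

		return ret;
	}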
[PATCH v6 13/21] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
This interface is superseded by support in dma_map_sg() which now supports heterogeneous scatterlists. There are no longer any users, so remove it. Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Reviewed-by: Jason Gunthorpe Reviewed-by: Max Gurtovoy --- drivers/pci/p2pdma.c | 66 -- include/linux/pci-p2pdma.h | 27 2 files changed, 93 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 9032c2ed2cdf..4d3cab9da748 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -880,72 +880,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, return type; } -static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap, - struct device *dev, struct scatterlist *sg, int nents) -{ - struct scatterlist *s; - int i; - - for_each_sg(sg, s, nents, i) { - s->dma_address = sg_phys(s) + p2p_pgmap->bus_offset; - sg_dma_len(s) = s->length; - } - - return nents; -} - -/** - * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA - * @dev: device doing the DMA request - * @sg: scatter list to map - * @nents: elements in the scatterlist - * @dir: DMA direction - * @attrs: DMA attributes passed to dma_map_sg() (if called) - * - * Scatterlists mapped with this function should be unmapped using - * pci_p2pdma_unmap_sg_attrs(). - * - * Returns the number of SG entries mapped or 0 on error. - */ -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs) -{ - struct pci_p2pdma_pagemap *p2p_pgmap = - to_p2p_pgmap(sg_page(sg)->pgmap); - - switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) { - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: - return dma_map_sg_attrs(dev, sg, nents, dir, attrs); - case PCI_P2PDMA_MAP_BUS_ADDR: - return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents); - default: - /* Mapping is not Supported */ - return 0; - } -} -EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs); - -/** - * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was - * mapped with pci_p2pdma_map_sg() - * @dev: device doing the DMA request - * @sg: scatter list to map - * @nents: number of elements returned by pci_p2pdma_map_sg() - * @dir: DMA direction - * @attrs: DMA attributes passed to dma_unmap_sg() (if called) - */ -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs) -{ - enum pci_p2pdma_map_type map_type; - - map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev); - - if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) - dma_unmap_sg_attrs(dev, sg, nents, dir, attrs); -} -EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs); - /** * pci_p2pdma_map_segment - map an sg segment determining the mapping type * @state: State structure that should be declared outside of the for_each_sg() diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 8318a97c9c61..2c07aa6b7665 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -30,10 +30,6 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev, unsigned int *nents, u32 length); void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl); void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs); -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs); int 
pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev, bool *use_p2pdma); ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev, @@ -83,17 +79,6 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev, static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) { } -static inline int pci_p2pdma_map_sg_attrs(struct device *dev, - struct scatterlist *sg, int nents, enum dma_data_direction dir, - unsigned long attrs) -{ - return 0; -} -static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev, - struct scatterlist *sg, int nents, enum dma_data_direction dir, - unsigned long attrs) -{ -} static inline int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev, bool *use_p2pdma) { @@ -119,16 +104,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct device *client) return pci_p2pmem_find_many(&client, 1); } -static inline int pci_p2pdma_map_sg(struct device
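For reference, a minimal sketch of the pattern former callers move to, assuming the caller already builds an sg_table; the function and variable names here are illustrative, not from the patch:

    #include <linux/dma-mapping.h>

    static int example_map_request(struct device *dev, struct sg_table *sgt,
                                   enum dma_data_direction dir)
    {
        int ret;

        /* dma_map_sgtable() now handles P2PDMA segments itself */
        ret = dma_map_sgtable(dev, sgt, dir, 0);
        if (ret == -EREMOTEIO)
            return ret;    /* device cannot reach the P2P memory */
        if (ret)
            return ret;

        /* ... program the hardware with the mapped DMA addresses ... */

        dma_unmap_sgtable(dev, sgt, dir, 0);
        return 0;
    }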
[PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()
Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap a hunk of p2pmem into userspace. Pages are allocated from the genalloc in bulk with their reference count set to one. They are returned to the genalloc when the page is put through p2pdma_page_free() (the reference count is once again set to 1 in free_zone_device_page()). The VMA does not take a reference to the pages when they are inserted with vmf_insert_mixed() (which is necessary for zone device pages), so the backing P2P memory is stored in a structure in vm_private_data. A pseudo mount is used to allocate an inode for each PCI device. The inode's address_space is used in the file doing the mmap so that all VMAs are collected and can be unmapped if the PCI device is unbound. After unmapping, the VMAs are iterated through and their pages are put so the device can continue to be unbound. An active flag is used to signal to VMAs not to allocate any further P2P memory once the removal process starts. Concurrent access to the flag is synchronized with an RCU lock. The VMAs and inode will survive after the unbind of the device, but no pages will be present in the VMA and a subsequent access will result in a SIGBUS error.

Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas --- drivers/pci/p2pdma.c | 340 - include/linux/pci-p2pdma.h | 11 ++ include/uapi/linux/magic.h | 1 + 3 files changed, 350 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4d3cab9da748..cce4c7b6dd75 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -17,14 +17,19 @@ #include #include #include +#include +#include #include #include #include +#include struct pci_p2pdma { struct gen_pool *pool; bool p2pmem_published; struct xarray map_types; + struct inode *inode; + bool active; }; struct pci_p2pdma_pagemap { @@ -33,6 +38,17 @@ struct pci_p2pdma_pagemap { u64 bus_offset; }; +struct pci_p2pdma_map { + struct kref ref; + struct rcu_head rcu; + struct pci_dev *pdev; + struct inode *inode; + size_t len; + + spinlock_t kaddr_lock; + void *kaddr; +}; + static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap) { return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap); @@ -101,6 +117,38 @@ static const struct attribute_group p2pmem_group = { .name = "p2pmem", }; +/* + * P2PDMA internal mount + * Fake an internal VFS mount-point in order to allocate struct address_space + * mappings to remove VMAs on unbind events. + */ +static int pci_p2pdma_fs_cnt; +static struct vfsmount *pci_p2pdma_fs_mnt; + +static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc) +{ + return init_pseudo(fc, P2PDMA_MAGIC) ?
0 : -ENOMEM; +} + +static struct file_system_type pci_p2pdma_fs_type = { + .name = "p2dma", + .owner = THIS_MODULE, + .init_fs_context = pci_p2pdma_fs_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static void p2pdma_page_free(struct page *page) +{ + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap); + + gen_pool_free(pgmap->provider->p2pdma->pool, + (uintptr_t)page_to_virt(page), PAGE_SIZE); +} + +static const struct dev_pagemap_ops p2pdma_pgmap_ops = { + .page_free = p2pdma_page_free, +}; + static void pci_p2pdma_release(void *data) { struct pci_dev *pdev = data; @@ -117,6 +165,9 @@ static void pci_p2pdma_release(void *data) gen_pool_destroy(p2pdma->pool); sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group); xa_destroy(&p2pdma->map_types); + + iput(p2pdma->inode); + simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt); } static int pci_p2pdma_setup(struct pci_dev *pdev) @@ -134,17 +185,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) if (!p2p->pool) goto out; - error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); + error = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt, + &pci_p2pdma_fs_cnt); if (error) goto out_pool_destroy; + p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb); + if (IS_ERR(p2p->inode)) { + error = -ENOMEM; + goto out_unpin_fs; + } + + error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); + if (error) + goto out_put_inode; + error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); if (error) - goto out_pool_destroy; + goto out_put_inode; rcu_assign_pointer(pdev->p2pdma, p2p); return 0; +out_put_inode: + iput(p2p->inode); +out_unpin_fs: + simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt); out_pool_destroy: gen_pool_destroy(p2p->pool); out: @@ -152,6 +218,
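To make the intended use concrete, here is a hypothetical userspace sketch: a driver that wires pci_mmap_p2pmem() into its file_operations->mmap would let an application map device memory like this. The device node path is invented for illustration:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 2 * 1024 * 1024;
        int fd = open("/dev/example_p2pmem", O_RDWR); /* hypothetical node */
        void *buf;

        if (fd < 0)
            return 1;

        /* pages come from the device's p2pmem genalloc, not system RAM */
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;

        /* buf may now be handed to e.g. O_DIRECT I/O; after the PCI
         * device is unbound, touching it raises SIGBUS as described */
        munmap(buf, len);
        close(fd);
        return 0;
    }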
[PATCH v6 18/21] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be passed from userspace and enables the O_DIRECT path in iomap based filesystems and direct to block devices. Signed-off-by: Logan Gunthorpe --- block/bio.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/block/bio.c b/block/bio.c index 3406c0450db3..271a720a6dc1 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1149,6 +1149,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; struct page **pages = (struct page **)bv; bool same_page = false; + unsigned int flags = 0; ssize_t size, left; unsigned len, i; size_t offset; @@ -1161,7 +1162,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2); pages += entries_left * (PAGE_PTRS_PER_BVEC - 1); - size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset); + if (bio->bi_bdev && bio->bi_bdev->bd_disk && + blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue)) + flags |= FOLL_PCI_P2PDMA; + + size = iov_iter_get_pages_flags(iter, pages, LONG_MAX, nr_pages, + &offset, flags); if (unlikely(size <= 0)) return size ? size : -EFAULT; -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
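For reference, a sketch of the other half of the handshake, assuming a block driver's setup code; the surrounding function is hypothetical, while dma_pci_p2pdma_supported() is the helper used elsewhere in this series:

    /* only queues that advertise P2PDMA support get FOLL_PCI_P2PDMA
     * on the O_DIRECT path above */
    static void example_init_queue(struct device *dma_dev,
                                   struct request_queue *q)
    {
        if (dma_pci_p2pdma_supported(dma_dev))
            blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, q);
    }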
[PATCH v6 15/21] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags() which take a flags argument that is passed to get_user_pages_fast(). This is so that FOLL_PCI_P2PDMA can be passed when appropriate. Signed-off-by: Logan Gunthorpe --- include/linux/uio.h | 6 ++ lib/iov_iter.c | 25 +++-- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 739285fe5a2f..ddf9e4cf4a59 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -232,8 +232,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, loff_t start, size_t count); +ssize_t iov_iter_get_pages_flags(struct iov_iter *i, struct page **pages, + size_t maxsize, unsigned maxpages, size_t *start, + unsigned int gup_flags); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); +ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i, + struct page ***pages, size_t maxsize, size_t *start, + unsigned int gup_flags); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start); int iov_iter_npages(const struct iov_iter *i, int maxpages); diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 6dd5330f7a99..9bf6e3af5120 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1515,9 +1515,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i, return page; } -ssize_t iov_iter_get_pages(struct iov_iter *i, +ssize_t iov_iter_get_pages_flags(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, - size_t *start) + size_t *start, unsigned int gup_flags) { size_t len; int n, res; @@ -1528,7 +1528,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, return 0; if (likely(iter_is_iovec(i))) { - unsigned int gup_flags = 0; unsigned long addr; if (iov_iter_rw(i) != WRITE) @@ -1558,6 +1557,13 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); return -EFAULT; } +EXPORT_SYMBOL_GPL(iov_iter_get_pages_flags); + +ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, + size_t maxsize, unsigned maxpages, size_t *start) +{ + return iov_iter_get_pages_flags(i, pages, maxsize, maxpages, start, 0); +} EXPORT_SYMBOL(iov_iter_get_pages); static struct page **get_pages_array(size_t n) @@ -1640,9 +1646,9 @@ static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i, return actual; } -ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, +ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i, struct page ***pages, size_t maxsize, - size_t *start) + size_t *start, unsigned int gup_flags) { struct page **p; size_t len; @@ -1654,7 +1660,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, return 0; if (likely(iter_is_iovec(i))) { - unsigned int gup_flags = 0; unsigned long addr; if (iov_iter_rw(i) != WRITE) @@ -1667,6 +1672,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, p = get_pages_array(n); if (!p) return -ENOMEM; + res = get_user_pages_fast(addr, n, gup_flags, p); if (unlikely(res <= 0)) { kvfree(p); @@ -1694,6 +1700,13 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, return iter_xarray_get_pages_alloc(i, pages, maxsize, start); return -EFAULT; } +EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc_flags); + +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, 
+size_t maxsize, size_t *start) +{ + return iov_iter_get_pages_alloc_flags(i, pages, maxsize, start, 0); +} EXPORT_SYMBOL(iov_iter_get_pages_alloc); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
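A sketch of a caller of the new interface, mirroring what __bio_iov_iter_get_pages() does in patch 18; the allow_p2pdma condition is a stand-in for whatever policy the caller applies:

    unsigned int gup_flags = 0;
    ssize_t size;

    if (allow_p2pdma)
        gup_flags |= FOLL_PCI_P2PDMA;

    size = iov_iter_get_pages_flags(iter, pages, LONG_MAX, nr_pages,
                                    &offset, gup_flags);
    if (size <= 0)
        return size ? size : -EFAULT;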
[PATCH v6 03/21] PCI/P2PDMA: Expose pci_p2pdma_map_type()
pci_p2pdma_map_type() will be needed by the dma-iommu map_sg implementation because it needs to determine the mapping type before performing the actual IOMMU mapping. Prototypes for this helper are added to dma-map-ops.h as they are only useful to dma map implementations and don't need to pollute the public pci-p2pdma header.

Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni --- drivers/pci/p2pdma.c | 25 + include/linux/dma-map-ops.h | 45 + 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index c3a68e82cf36..8573bf9d651a 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -10,6 +10,7 @@ #define pr_fmt(fmt) "pci-p2pdma: " fmt #include +#include #include #include #include @@ -20,13 +21,6 @@ #include #include -enum pci_p2pdma_map_type { - PCI_P2PDMA_MAP_UNKNOWN = 0, - PCI_P2PDMA_MAP_NOT_SUPPORTED, - PCI_P2PDMA_MAP_BUS_ADDR, - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, -}; - struct pci_p2pdma { struct gen_pool *pool; bool p2pmem_published; struct xarray map_types; @@ -842,8 +836,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, - struct device *dev) +/** + * pci_p2pdma_map_type - return the type of mapping that should be used for + * a given device and pgmap + * @pgmap: the pagemap of a page to determine the mapping type for + * @dev: device that is mapping the page + * + * Returns one of: + * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done + * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally + * using the CPU physical address (in dma-direct) or an IOVA + * mapping for the IOMMU. + */ +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, +struct device *dev) { enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED; struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider; diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..d693a0e33bac 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -379,4 +379,49 @@ static inline void debug_dma_dump_mappings(struct device *dev) extern const struct dma_map_ops dma_dummy_ops; +enum pci_p2pdma_map_type { + /* +* PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping +* type hasn't been calculated yet. Functions that return this enum +* never return this value. +*/ + PCI_P2PDMA_MAP_UNKNOWN = 0, + + /* +* PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will +* traverse the host bridge and the host bridge is not in the +* allowlist. DMA Mapping routines should return an error when +* this is returned. +*/ + PCI_P2PDMA_MAP_NOT_SUPPORTED, + + /* +* PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to +* each other directly through a PCI switch and the transaction will +* not traverse the host bridge. Such a mapping should program +* the DMA engine with PCI bus addresses. +*/ + PCI_P2PDMA_MAP_BUS_ADDR, + + /* +* PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk +* to each other, but the transaction traverses a host bridge on the +* allowlist. In this case, a normal mapping either with CPU physical +* addresses (in the case of dma-direct) or IOVA addresses (in the +* case of IOMMUs) should be used to program the DMA engine.
+*/ + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, +}; + +#ifdef CONFIG_PCI_P2PDMA +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, +struct device *dev); +#else /* CONFIG_PCI_P2PDMA */ +static inline enum pci_p2pdma_map_type +pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev) +{ + return PCI_P2PDMA_MAP_NOT_SUPPORTED; +} +#endif /* CONFIG_PCI_P2PDMA */ + #endif /* _LINUX_DMA_MAP_OPS_H */ -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
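To show the intended consumer, a rough sketch of how a .map_sg implementation might use the newly exported helper; this is simplified, the real dma-iommu code later in the series is more involved, and the function name is illustrative:

    static int example_iommu_map_sg(struct device *dev,
                                    struct scatterlist *sgl, int nents)
    {
        struct scatterlist *sg;
        int i;

        for_each_sg(sgl, sg, nents, i) {
            if (!is_pci_p2pdma_page(sg_page(sg)))
                continue;    /* normal page, map via IOVA */

            switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
            case PCI_P2PDMA_MAP_BUS_ADDR:
                /* use the PCI bus address, bypass the IOMMU */
                break;
            case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                /* map through the IOMMU like a normal page */
                break;
            default:
                return -EREMOTEIO;
            }
        }
        return nents;
    }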
[PATCH v6 10/21] nvme-pci: convert to using dma_map_sgtable()
The dma_map operations now support P2PDMA pages directly. So remove the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls to dma_map_sgtable(). dma_map_sgtable() returns more complete error codes than dma_map_sg() and allows differentiating EREMOTEIO errors in case an unsupported P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET so the request isn't retried. Signed-off-by: Logan Gunthorpe Reviewed-by: Max Gurtovoy Reviewed-by: Chaitanya Kulkarni --- drivers/nvme/host/pci.c | 69 + 1 file changed, 29 insertions(+), 40 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index fec4c7191310..07412116d4d1 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -230,11 +230,10 @@ struct nvme_iod { bool use_sgl; int aborted; int npages; /* In the PRP list. 0 means small pool in use */ - int nents; /* Used in scatterlist */ dma_addr_t first_dma; unsigned int dma_len; /* length of single DMA segment mapping */ dma_addr_t meta_dma; - struct scatterlist *sg; + struct sg_table sgt; }; static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev) @@ -524,7 +523,7 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx) static void **nvme_pci_iod_list(struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - return (void **)(iod->sg + blk_rq_nr_phys_segments(req)); + return (void **)(iod->sgt.sgl + blk_rq_nr_phys_segments(req)); } static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req) @@ -576,17 +575,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req) } } -static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req) -{ - struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - - if (is_pci_p2pdma_page(sg_page(iod->sg))) - pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents, - rq_dma_dir(req)); - else - dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req)); -} - static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); @@ -597,9 +585,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) return; } - WARN_ON_ONCE(!iod->nents); + WARN_ON_ONCE(!iod->sgt.nents); + + dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); - nvme_unmap_sg(dev, req); if (iod->npages == 0) dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0], iod->first_dma); @@ -607,7 +596,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) nvme_free_sgls(dev, req); else nvme_free_prps(dev, req); - mempool_free(iod->sg, dev->iod_mempool); + mempool_free(iod->sgt.sgl, dev->iod_mempool); } static void nvme_print_sgl(struct scatterlist *sgl, int nents) @@ -630,7 +619,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct dma_pool *pool; int length = blk_rq_payload_bytes(req); - struct scatterlist *sg = iod->sg; + struct scatterlist *sg = iod->sgt.sgl; int dma_len = sg_dma_len(sg); u64 dma_addr = sg_dma_address(sg); int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1); @@ -703,16 +692,16 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, dma_len = sg_dma_len(sg); } done: - cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg)); + cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl)); cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma); return BLK_STS_OK; free_prps: nvme_free_prps(dev, req); return BLK_STS_RESOURCE; bad_sgl: - WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents), + WARN(DO_ONCE(nvme_print_sgl, 
iod->sgt.sgl, iod->sgt.nents), "Invalid SGL for payload:%d nents:%d\n", - blk_rq_payload_bytes(req), iod->nents); + blk_rq_payload_bytes(req), iod->sgt.nents); return BLK_STS_IOERR; } @@ -738,12 +727,13 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge, } static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev, - struct request *req, struct nvme_rw_command *cmd, int entries) + struct request *req, struct nvme_rw_command *cmd) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct dma_pool *pool; struct nvme_sgl_desc *sg_list; - struct scatterlist *sg = iod->sg; + struct scatterlist *sg = iod->sgt.sgl; + unsigned int entries = iod->sgt.nents; dma_addr_t sgl_dma; int i = 0;
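The hunk that performs the new mapping is truncated above; per the commit message, the shape of the error handling in nvme_map_data() is roughly the following sketch, translating -EREMOTEIO from dma_map_sgtable() into BLK_STS_TARGET so the block layer does not retry an impossible P2P transfer:

    rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
                         DMA_ATTR_NO_WARN);
    if (rc) {
        if (rc == -EREMOTEIO)
            ret = BLK_STS_TARGET;    /* unsupported P2P: don't retry */
        goto out_free_sg;
    }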
[PATCH v6 05/21] dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
Add EREMOTEIO error return to dma_map_sgtable() which will be used by .map_sg() implementations that detect P2PDMA pages that the underlying DMA device cannot access. Signed-off-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe --- kernel/dma/mapping.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index db7244291b74..9f65d1041638 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -197,7 +197,7 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, if (ents > 0) debug_dma_map_sg(dev, sg, nents, ents, dir, attrs); else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM && - ents != -EIO)) + ents != -EIO && ents != -EREMOTEIO)) return -EIO; return ents; @@ -255,6 +255,8 @@ EXPORT_SYMBOL(dma_map_sg_attrs); * complete the mapping. Should succeed if retried later. * -EIO Legacy error code with an unknown meaning. eg. this is * returned if a lower level call returned DMA_MAPPING_ERROR. + * -EREMOTEIOThe DMA device cannot access P2PDMA memory specified in + * the sg_table. This will not succeed if retried. */ int dma_map_sgtable(struct device *dev, struct sg_table *sgt, enum dma_data_direction dir, unsigned long attrs) -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
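A short sketch of how a caller distinguishes the documented return codes, including the new one; illustrative only:

    ret = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
    switch (ret) {
    case 0:
        break;        /* mapped; sgt->nents segments are valid */
    case -ENOMEM:
        /* resource exhaustion: may succeed if retried later */
        break;
    case -EREMOTEIO:
        /* device cannot reach the P2PDMA memory: never retry */
        break;
    default:
        /* -EINVAL / -EIO: give up */
        break;
    }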
[PATCH v6 06/21] dma-direct: support PCI P2PDMA pages in dma-direct map_sg
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map PCI P2PDMA pages directly without a hack in the callers. This allows for heterogeneous SGLs that contain both P2PDMA and regular pages. A P2PDMA page may have three possible outcomes when being mapped: 1) If the data path between the two devices doesn't go through the root port, then it should be mapped with a PCI bus address 2) If the data path goes through the host bridge, it should be mapped normally, as though it were a CPU physical address 3) It is not possible for the two devices to communicate and thus the mapping operation should fail (and it will return -EREMOTEIO). SGL segments that contain PCI bus addresses are marked with sg_dma_mark_pci_p2pdma() and are ignored when unmapped. P2PDMA mappings are also failed if swiotlb needs to be used on the mapping. Signed-off-by: Logan Gunthorpe --- kernel/dma/direct.c | 43 +-- kernel/dma/direct.h | 8 +++- 2 files changed, 44 insertions(+), 7 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 9743c6ccce1a..33b838a3ccb2 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -461,29 +461,60 @@ void dma_direct_sync_sg_for_cpu(struct device *dev, arch_sync_dma_for_cpu_all(); } +/* + * Unmaps segments, except for ones marked as pci_p2pdma which do not + * require any further action as they contain a bus address. + */ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs) { struct scatterlist *sg; int i; - for_each_sg(sgl, sg, nents, i) - dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, -attrs); + for_each_sg(sgl, sg, nents, i) { + if (sg_is_dma_bus_address(sg)) + sg_dma_unmark_bus_address(sg); + else + dma_direct_unmap_page(dev, sg->dma_address, + sg_dma_len(sg), dir, attrs); + } } #endif int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs) { - int i; + struct pci_p2pdma_map_state p2pdma_state = {}; + enum pci_p2pdma_map_type map; struct scatterlist *sg; + int i, ret; for_each_sg(sgl, sg, nents, i) { + if (is_pci_p2pdma_page(sg_page(sg))) { + map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg); + switch (map) { + case PCI_P2PDMA_MAP_BUS_ADDR: + continue; + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: + /* +* Any P2P mapping that traverses the PCI +* host bridge must be mapped with CPU physical +* address and not PCI bus addresses. This is +* done with dma_direct_map_page() below. 
+*/ + break; + default: + ret = -EREMOTEIO; + goto out_unmap; + } + } + sg->dma_address = dma_direct_map_page(dev, sg_page(sg), sg->offset, sg->length, dir, attrs); - if (sg->dma_address == DMA_MAPPING_ERROR) + if (sg->dma_address == DMA_MAPPING_ERROR) { + ret = -EIO; goto out_unmap; + } sg_dma_len(sg) = sg->length; } @@ -491,7 +522,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, out_unmap: dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC); - return -EIO; + return ret; } dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr, diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h index 4632b0f4f72e..81b213409ce8 100644 --- a/kernel/dma/direct.h +++ b/kernel/dma/direct.h @@ -8,6 +8,7 @@ #define _KERNEL_DMA_DIRECT_H #include +#include int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, @@ -87,10 +88,15 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, phys_addr_t phys = page_to_phys(page) + offset; dma_addr_t dma_addr = phys_to_dma(dev, phys); - if (is_swiotlb_force_bounce(dev)) + if (is_swiotlb_force_bounce(dev)) { + if (is_pci_p2pdma_page(page)) + return DMA_MAPPING_ERROR; return swiotlb_map(dev, phys, size, dir, attrs); + } if (unlikely(!dma_capab
[PATCH v6 17/21] lib/scatterlist: add check when merging zone device pages
Consecutive zone device pages should not be merged into the same sgl or bvec segment with other types of pages or if they belong to different pgmaps. Otherwise getting the pgmap of a given segment is not possible without scanning the entire segment. The zone_device_pages_have_same_pgmap() helper returns true if either both pages are not zone device pages or both are zone device pages with the same pgmap. Factor out the check for page mergeability into a pages_are_mergeable() helper and add a check with zone_device_pages_have_same_pgmap().

Signed-off-by: Logan Gunthorpe --- lib/scatterlist.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c index d5e82e4a57ad..af53a0984f76 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -410,6 +410,15 @@ static struct scatterlist *get_next_sg(struct sg_append_table *table, return new_sg; } +static bool pages_are_mergeable(struct page *a, struct page *b) +{ + if (page_to_pfn(a) != page_to_pfn(b) + 1) + return false; + if (!zone_device_pages_have_same_pgmap(a, b)) + return false; + return true; +} + /** * sg_alloc_append_table_from_pages - Allocate and initialize an append sg *table from an array of pages @@ -447,6 +456,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append, unsigned int chunks, cur_page, seg_len, i, prv_len = 0; unsigned int added_nents = 0; struct scatterlist *s = sgt_append->prv; + struct page *last_pg; /* * The algorithm below requires max_segment to be aligned to PAGE_SIZE @@ -460,21 +470,17 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append, return -EOPNOTSUPP; if (sgt_append->prv) { - unsigned long paddr = - (page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE + -sgt_append->prv->offset + sgt_append->prv->length) / - PAGE_SIZE; - if (WARN_ON(offset)) return -EINVAL; /* Merge contiguous pages into the last SG */ prv_len = sgt_append->prv->length; - while (n_pages && page_to_pfn(pages[0]) == paddr) { + last_pg = sg_page(sgt_append->prv); + while (n_pages && pages_are_mergeable(last_pg, pages[0])) { if (sgt_append->prv->length + PAGE_SIZE > max_segment) break; sgt_append->prv->length += PAGE_SIZE; - paddr++; + last_pg = pages[0]; pages++; n_pages--; } @@ -488,7 +494,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append, for (i = 1; i < n_pages; i++) { seg_len += PAGE_SIZE; if (seg_len >= max_segment || - page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) { + !pages_are_mergeable(pages[i], pages[i - 1])) { chunks++; seg_len = 0; } @@ -504,8 +510,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append, for (j = cur_page + 1; j < n_pages; j++) { seg_len += PAGE_SIZE; if (seg_len >= max_segment || - page_to_pfn(pages[j]) != - page_to_pfn(pages[j - 1]) + 1) + !pages_are_mergeable(pages[j], pages[j - 1])) break; } -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
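The zone_device_pages_have_same_pgmap() helper is added earlier in the series and not shown here; presumably it is shaped roughly like the following sketch:

    static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
                                                         const struct page *b)
    {
        if (is_zone_device_page(a) != is_zone_device_page(b))
            return false;
        if (!is_zone_device_page(a))
            return true;    /* neither is a zone device page */
        return a->pgmap == b->pgmap;
    }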
[PATCH v6 11/21] RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
Introduce the helper function ib_dma_pci_p2p_dma_supported() to check if a given ib_device can be used in P2PDMA transfers. This ensures the ib_device is not using virt_dma and also that the underlying dma_device supports P2PDMA. Use the new helper in nvme-rdma to replace the existing check for ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows switching away from pci_p2pdma_[un]map_sg(). Signed-off-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Max Gurtovoy --- drivers/nvme/target/rdma.c | 2 +- include/rdma/ib_verbs.h| 11 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 2fab0b219b25..12258f87ccc8 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -415,7 +415,7 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev, if (ib_dma_mapping_error(ndev->device, r->send_sge.addr)) goto out_free_rsp; - if (!ib_uses_virt_dma(ndev->device)) + if (ib_dma_pci_p2p_dma_supported(ndev->device)) r->req.p2p_client = &ndev->device->dev; r->send_sge.length = sizeof(*r->req.cqe); r->send_sge.lkey = ndev->pd->local_dma_lkey; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 69d883f7fb41..79609ab73014 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -4003,6 +4003,17 @@ static inline bool ib_uses_virt_dma(struct ib_device *dev) return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device; } +/* + * Check if a IB device's underlying DMA mapping supports P2PDMA transfers. + */ +static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev) +{ + if (ib_uses_virt_dma(dev)) + return false; + + return dma_pci_p2pdma_supported(dev->dma_device); +} + /** * ib_dma_mapping_error - check a DMA addr for error * @dev: The device for which the dma_addr was created -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 09/21] nvme-pci: check DMA ops when indicating support for PCI P2PDMA
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops flags can be checked for PCI P2PDMA support. Signed-off-by: Logan Gunthorpe Reviewed-by: Chaitanya Kulkarni --- drivers/nvme/host/core.c | 3 ++- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/host/pci.c | 11 +-- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index efb85c6d8e2d..bbc276dda49f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3912,7 +3912,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid, blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue); blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue); - if (ctrl->ops->flags & NVME_F_PCI_P2PDMA) + if (ctrl->ops->supports_pci_p2pdma && + ctrl->ops->supports_pci_p2pdma(ctrl)) blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue); ns->ctrl = ctrl; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 1393bbf82d71..7d97bfb2a9e2 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -489,7 +489,6 @@ struct nvme_ctrl_ops { unsigned int flags; #define NVME_F_FABRICS (1 << 0) #define NVME_F_METADATA_SUPPORTED (1 << 1) -#define NVME_F_PCI_P2PDMA (1 << 2) int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val); int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val); int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val); @@ -497,6 +496,7 @@ struct nvme_ctrl_ops { void (*submit_async_event)(struct nvme_ctrl *ctrl); void (*delete_ctrl)(struct nvme_ctrl *ctrl); int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); + bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl); }; /* diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index d817ca17463e..fec4c7191310 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2969,17 +2969,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size) return snprintf(buf, size, "%s\n", dev_name(&pdev->dev)); } +static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl) +{ + struct nvme_dev *dev = to_nvme_dev(ctrl); + + return dma_pci_p2pdma_supported(dev->dev); +} + static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = { .name = "pcie", .module = THIS_MODULE, - .flags = NVME_F_METADATA_SUPPORTED | - NVME_F_PCI_P2PDMA, + .flags = NVME_F_METADATA_SUPPORTED, .reg_read32 = nvme_pci_reg_read32, .reg_write32= nvme_pci_reg_write32, .reg_read64 = nvme_pci_reg_read64, .free_ctrl = nvme_pci_free_ctrl, .submit_async_event = nvme_pci_submit_async_event, .get_address= nvme_pci_get_address, + .supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma, }; static int nvme_dev_map(struct nvme_dev *dev) -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH net-next] sfc: Stop using iommu_present()
On 05/04/2022 14:40, Robin Murphy wrote: > Even if an IOMMU might be present for some PCI segment in the system, > that doesn't necessarily mean it provides translation for the device > we care about. It appears that what we care about here is specifically > whether DMA mapping ops involve any IOMMU overhead or not, so check for > translation actually being active for our device. > > Signed-off-by: Robin Murphy Acked-by: Edward Cree ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
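For reference, the shape of the change, with context simplified; the exact hunk lives in the sfc driver and the efx expression is an assumption about the driver's context structure:

    -	if (iommu_present(&pci_bus_type))
    +	if (device_iommu_mapped(&efx->pci_dev->dev))
    		/* assume IOMMU translation overhead on our DMA path */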
Re: [PATCH v9 00/11] ACPI/IORT: Support for IORT RMR node
Hi Shameer, I've given it a spin on my Juno board and the EFIFB works fine with it. However I am getting a warning: ACPI: IORT: [Firmware Bug]: RMR descriptor[0xf80d] with zero length, continue anyway Which on examination of the IORT table is correct - the firmware does indeed seem to have a bug and the length in the IORT table is 0; hopefully I can get that fixed.

However since it "all works" that points out that the reserved memory region isn't actually used. Instead the existing entries from the SMMU are reused (even though they don't match the address/size region in the RMR). I'm not sure if that matters but I thought it worth pointing out as it's not currently spelt out that the RMR descriptor's content is actually ignored.

Anyway, FWIW: Tested-by: Steven Price Steve

On 04/04/2022 13:41, Shameer Kolothum wrote: > Hi > > v8 --> v9 > - Addressed comments from Robin on interfaces as discussed here[0]. > - Addressed comments from Lorenzo. > > Though functionally there aren't any major changes, the interfaces have > changed from v8 and for that reason I have not included the T-by tags from > Steve and Eric yet (many thanks for those). Appreciate it if you could > give this a spin and let me know. > > (The revised ACPICA pull request for IORT E.d related changes is > here[1] and this is now merged to acpica:master.) > > Please take a look and let me know your thoughts. > > Thanks, > Shameer > [0] > https://lore.kernel.org/linux-arm-kernel/c982f1d7-c565-769a-abae-79c962969...@arm.com/ > [1] https://github.com/acpica/acpica/pull/765 > > From old: > We have faced issues with 3408iMR RAID controller cards which > fail to boot when SMMU is enabled. This is because these > controllers make use of host memory for various caching related > purposes and when SMMU is enabled the iMR firmware fails to > access these memory regions as there is no mapping for them. > IORT RMR provides a way for UEFI to describe and report these > memory regions so that the kernel can make a unity mapping for > these in SMMU. > > Change History: > > v7 --> v8 > - Patch #1 has temp definitions for RMR related changes till > the ACPICA header changes are part of the kernel. > - No early parsing of RMR node info and is only parsed at the > time of use. > - Changes to the RMR get/put API format compared to the > previous version. > - Support for RMR descriptor shared by multiple stream IDs. > > v6 --> v7 > -fix pointed out by Steve to the SMMUv2 SMR bypass install in patch #8. > > v5 --> v6 > - Addressed comments from Robin & Lorenzo. > : Moved iort_parse_rmr() to acpi_iort_init() from > iort_init_platform_devices(). > : Removed use of struct iort_rmr_entry during the initial > parse. Using struct iommu_resv_region instead. > : Report RMR address alignment and overlap errors, but continue. > : Reworked arm_smmu_init_bypass_stes() (patch # 6). > - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8). > - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based > on Type of RMR region. Suggested by Jon N. > > v4 --> v5 > -Added a fw_data union to struct iommu_resv_region and removed > struct iommu_rmr (Based on comments from Joerg/Robin). > -Added iommu_put_rmrs() to release mem. > -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by > yet because of the above changes. > > v3 --> v4 > -Included the SMMUv2 SMR bypass install changes suggested by > Steve (patch #7) > -As per Robin's comments, RMR reserve implementation is now > more generic (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1 > > RFC v2 --> v3 > -Dropped RFC tag as the ACPICA header changes are now ready to be > part of 5.13[0]. But this series still has a dependency on that patch. > -Added IORT E.b related changes(node flags, _DSM function 5 checks for > PCIe). > -Changed RMR to stream id mapping from M:N to M:1 as per the spec and > discussion here[1]. > -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!) > > Jon Nettleton (1): > iommu/arm-smmu: Get associated RMR info and install bypass SMR > > Shameer Kolothum (10): > ACPI/IORT: Add temporary RMR node flag definitions > iommu: Introduce a union to struct iommu_resv_region > ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void > ACPI/IORT: Provide a generic helper to retrieve reserve regions > iommu/dma: Introduce a helper to remove reserved regions > ACPI/IORT: Add support to retrieve IORT RMR reserved regions > ACPI/IORT: Add a helper to retrieve RMR info directly > iommu/arm-smmu-v3: Introduce strtab init helper > iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force > bypass > iommu/arm-smmu-v3: Get associated RMR info and install bypass STE > > drivers/acpi/arm64/iort.c | 369 ++-- > drivers/iommu/apple-dart.c | 2 +- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 80 - > drivers/i
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: >> My take is that the drivers using this API are doing it to make sure >> their HW blocks are set up in a way that is consistent with the DMA API >> they are also using, and run in constrained embedded-style >> environments that know the firmware support is present. >> >> So in the end it does not seem suitable right now for linking to >> IOMMU_CACHE.. > > That seems a pretty good summary - I think they're basically all "firmware > told Linux I'm coherent so I'd better act coherent" cases, but that still > doesn't necessarily mean that they're *forced* to respect that.

Yes. And the interface is horribly misnamed for that. I'll see what I can do to clean this up as I've noticed various other not very nice things in that area.

> One of the > things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force > DMA cache maintenance for coherent devices, primarily to hook up in > Panfrost (where there is a bit of performance to claw back on the > coherent AmLogic SoCs by leaving certain buffers non-cacheable).

This has been an explicit request from the amdgpu folks and has thus been on my TODO list for quite a while as well. Note that I don't think it should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages as the drivers that want non-snoop generally also want to actually be able to deal with pages. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
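Purely to illustrate the proposal being discussed: DMA_ATTR_NO_SNOOP does not exist, and today's dma_alloc_pages() takes a gfp_t rather than an attrs argument, so plumbing it in would need a new variant along these hypothetical lines:

    /* hypothetical -- neither the attr nor this variant exist today */
    page = dma_alloc_pages_attrs(dev, size, &dma_handle,
                                 DMA_BIDIRECTIONAL, GFP_KERNEL,
                                 DMA_ATTR_NO_SNOOP);
    /* the driver then owns cache maintenance for these pages, e.g.: */
    dma_sync_single_for_device(dev, dma_handle, size, DMA_TO_DEVICE);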
[PATCH v2 3/4] iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE
While the comment was correct that this flag was intended to convey the block no-snoop support in the IOMMU, it has become widely implemented and used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel driver was different. Now that the Intel driver is using enforce_cache_coherency() update the comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about IOMMU_CACHE. Fix the Intel driver to return true since IOMMU_CACHE always works. The two places that test this flag, usnic and vdpa, are both assigning userspace pages to a driver controlled iommu_domain and require IOMMU_CACHE behavior as they offer no way for userspace to synchronize caches. Signed-off-by: Jason Gunthorpe --- drivers/iommu/intel/iommu.c | 2 +- include/linux/iommu.h | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 8f3674e997df06..14ba185175e9ec 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4556,7 +4556,7 @@ static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) static bool intel_iommu_capable(enum iommu_cap cap) { if (cap == IOMMU_CAP_CACHE_COHERENCY) - return domain_update_iommu_snooping(NULL); + return true; if (cap == IOMMU_CAP_INTR_REMAP) return irq_remapping_enabled == 1; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fe4f24c469c373..fd58f7adc52796 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -103,8 +103,7 @@ static inline bool iommu_is_dma_domain(struct iommu_domain *domain) } enum iommu_cap { - IOMMU_CAP_CACHE_COHERENCY, /* IOMMU can enforce cache coherent DMA - transactions */ + IOMMU_CAP_CACHE_COHERENCY, /* IOMMU_CACHE is supported */ IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */ IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */ }; -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/4] vfio: Move the Intel no-snoop control off of IOMMU_CACHE
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache coherent" and is used by the DMA API. The definition allows for special non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe TLPs - so long as this behavior is opt-in by the device driver. The flag is mainly used by the DMA API to synchronize the IOMMU setting with the expected cache behavior of the DMA master, e.g. based on dev_is_dma_coherent() in some cases.

For the Intel IOMMU, IOMMU_CACHE was redefined to mean 'force all DMA to be cache coherent' which has the practical effect of causing the IOMMU to ignore the no-snoop bit in a PCIe TLP. x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag. Instead use the new domain op enforce_cache_coherency() which causes every IOPTE created in the domain to have the no-snoop blocking behavior.

Reconfigure VFIO to always use IOMMU_CACHE and call enforce_cache_coherency() to operate the special Intel behavior. Remove the IOMMU_CACHE test from Intel IOMMU. Ultimately VFIO plumbs the result of enforce_cache_coherency() back into the x86 platform code through kvm_arch_register_noncoherent_dma() which controls if the WBINVD instruction is available in the guest. No other arch implements kvm_arch_register_noncoherent_dma().

Signed-off-by: Jason Gunthorpe --- drivers/iommu/intel/iommu.c | 7 ++- drivers/vfio/vfio_iommu_type1.c | 30 +++--- include/linux/intel-iommu.h | 1 - 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index f08611a6cc4799..8f3674e997df06 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -641,7 +641,6 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain) static void domain_update_iommu_cap(struct dmar_domain *domain) { domain_update_iommu_coherency(domain); - domain->iommu_snooping = domain_update_iommu_snooping(NULL); domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL); /* @@ -4283,7 +4282,6 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width) domain->agaw = width_to_agaw(adjust_width); domain->iommu_coherency = false; - domain->iommu_snooping = false; domain->iommu_superpage = 0; domain->max_addr = 0; @@ -4422,8 +4420,7 @@ static int intel_iommu_map(struct iommu_domain *domain, prot |= DMA_PTE_READ; if (iommu_prot & IOMMU_WRITE) prot |= DMA_PTE_WRITE; - if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) || - dmar_domain->enforce_no_snoop) + if (dmar_domain->enforce_no_snoop) prot |= DMA_PTE_SNP; max_addr = iova + size; @@ -4550,7 +4547,7 @@ static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); - if (!dmar_domain->iommu_snooping) + if (!domain_update_iommu_snooping(NULL)) return false; dmar_domain->enforce_no_snoop = true; return true; diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 9394aa9444c10c..c13b9290e35759 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -84,8 +84,8 @@ struct vfio_domain { struct iommu_domain *domain; struct list_head next; struct list_head group_list; - int prot; /* IOMMU_CACHE */ - bool fgsp; /* Fine-grained super pages */ + bool fgsp : 1; /* Fine-grained super pages */ + bool enforce_cache_coherency : 1; }; struct vfio_dma { @@ -1461,7 +1461,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova, list_for_each_entry(d, &iommu->domain_list, next) { ret =
iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT, - npage << PAGE_SHIFT, prot | d->prot); + npage << PAGE_SHIFT, prot | IOMMU_CACHE); if (ret) goto unwind; @@ -1771,7 +1771,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu, } ret = iommu_map(domain->domain, iova, phys, - size, dma->prot | domain->prot); + size, dma->prot | IOMMU_CACHE); if (ret) { if (!dma->iommu_mapped) { vfio_unpin_pages_remote(dma, iova, @@ -1859,7 +1859,7 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain) return; ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2, - IOMMU_READ | I
[PATCH v2 4/4] vfio: Require that devices support DMA cache coherence
IOMMU_CACHE means that normal DMAs do not require any additional coherency mechanism and is the basic uAPI that VFIO exposes to userspace. For instance VFIO applications like DPDK will not work if additional coherency operations are required. Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before allowing an IOMMU backed VFIO device to be created. Signed-off-by: Jason Gunthorpe --- drivers/vfio/vfio.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index a4555014bd1e72..9edad767cfdad3 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device *device, int vfio_register_group_dev(struct vfio_device *device) { + /* +* VFIO always sets IOMMU_CACHE because we offer no way for userspace to +* restore cache coherency. +*/ + if (!iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY)) + return -EINVAL; + return __vfio_register_dev(device, vfio_group_find_or_alloc(device->dev)); } -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/4] iommu: Introduce the domain op enforce_cache_coherency()
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU. Currently only Intel and AMD IOMMUs are known to support this feature. They both implement it as an IOPTE bit that, when set, will cause PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though the no-snoop bit was clear.

The new API is triggered by calling enforce_cache_coherency() before mapping any IOVA to the domain, which globally switches on no-snoop blocking. This allows other implementations that might block no-snoop globally and outside the IOPTE - AMD also documents such a HW capability. Leave AMD out of sync with Intel and have it block no-snoop even for in-kernel users. This can be trivially resolved in a follow-up patch. Only VFIO will call this new API.

Signed-off-by: Jason Gunthorpe --- drivers/iommu/amd/iommu.c | 7 +++ drivers/iommu/intel/iommu.c | 14 +- include/linux/intel-iommu.h | 1 + include/linux/iommu.h | 4 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a1ada7bff44e61..e500b487eb3429 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2271,6 +2271,12 @@ static int amd_iommu_def_domain_type(struct device *dev) return 0; } +static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain) +{ + /* IOMMU_PTE_FC is always set */ + return true; +} + const struct iommu_ops amd_iommu_ops = { .capable = amd_iommu_capable, .domain_alloc = amd_iommu_domain_alloc, @@ -2293,6 +2299,7 @@ const struct iommu_ops amd_iommu_ops = { .flush_iotlb_all = amd_iommu_flush_iotlb_all, .iotlb_sync = amd_iommu_iotlb_sync, .free = amd_iommu_domain_free, + .enforce_cache_coherency = amd_iommu_enforce_cache_coherency, } }; diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index df5c62ecf942b8..f08611a6cc4799 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4422,7 +4422,8 @@ static int intel_iommu_map(struct iommu_domain *domain, prot |= DMA_PTE_READ; if (iommu_prot & IOMMU_WRITE) prot |= DMA_PTE_WRITE; - if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) + if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) || + dmar_domain->enforce_no_snoop) prot |= DMA_PTE_SNP; max_addr = iova + size; @@ -4545,6 +4546,16 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain, return phys; } +static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + + if (!dmar_domain->iommu_snooping) + return false; + dmar_domain->enforce_no_snoop = true; + return true; +} + static bool intel_iommu_capable(enum iommu_cap cap) { if (cap == IOMMU_CAP_CACHE_COHERENCY) @@ -4898,6 +4909,7 @@ const struct iommu_ops intel_iommu_ops = { .iotlb_sync = intel_iommu_tlb_sync, .iova_to_phys = intel_iommu_iova_to_phys, .free = intel_iommu_domain_free, + .enforce_cache_coherency = intel_iommu_enforce_cache_coherency, } }; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 2f9891cb3d0014..1f930c0c225d94 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -540,6 +540,7 @@ struct dmar_domain { u8 has_iotlb_device: 1; u8 iommu_coherency: 1; /* indicate coherency of iommu access */ u8 iommu_snooping: 1; /* indicate snooping control feature */ + u8 enforce_no_snoop : 1; /* Create IOPTEs with snoop control */ struct list_head devices; /* all devices' list */ struct
iova_domain iovad; /* iova's that belong to this domain */ diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9208eca4b0d1ac..fe4f24c469c373 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -272,6 +272,9 @@ struct iommu_ops { * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush *queue * @iova_to_phys: translate iova to physical address + * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, + * including no-snoop TLPs on PCIe or other platform + * specific mechanisms. * @enable_nesting: Enable nesting * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @free: Release the domain after use. @@ -300,6 +303,7 @@ struct iommu_domain_ops { phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t i
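A sketch of the calling convention from the consumer side (VFIO, in patch 2 of this series), invoked before any mappings are established in the domain; the wrapper function is illustrative:

    static bool example_enforce_coherency(struct iommu_domain *domain)
    {
        /* optional op: absent means the driver cannot block no-snoop */
        if (domain->ops->enforce_cache_coherency)
            return domain->ops->enforce_cache_coherency(domain);
        return false;
    }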
[PATCH v2 0/4] Make the iommu driver no-snoop block feature consistent
PCIe defines a 'no-snoop' bit in each TLP which is usually implemented by a platform as bypassing elements in the DMA coherent CPU cache hierarchy. A driver can command a device to set this bit on some of its transactions as a micro-optimization. However, the driver is now responsible for synchronizing the CPU cache with the DMA that bypassed it. On x86 this may be done through the wbinvd instruction, and the i915 GPU driver is the only Linux DMA driver that calls it.

The problem is that KVM on x86 will normally disable the wbinvd instruction in the guest and render it a NOP. As the driver running in the guest is not aware the wbinvd doesn't work, it may still cause the device to set the no-snoop bit and the platform will bypass the CPU cache. Without a working wbinvd there is no way to re-synchronize the CPU cache and the driver in the VM has data corruption.

Thus, we see a general direction on x86 that the IOMMU HW is able to block the no-snoop bit in the TLP. This NOP's the optimization and allows KVM to NOP the wbinvd without causing any data corruption.

This control for the Intel IOMMU was exposed by using IOMMU_CACHE and IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple meanings and usages beyond blocking no-snoop and the whole thing has become confused. AMD IOMMU has the same feature and same IOPTE bits, however it unconditionally blocks no-snoop.

Change it so that:
- IOMMU_CACHE is only about the DMA coherence of normal DMAs from a device. It is used by the DMA API/VFIO/etc when the user of the iommu_domain will not be doing manual cache coherency operations.
- IOMMU_CAP_CACHE_COHERENCY indicates if IOMMU_CACHE can be used with the device.
- The new optional domain op enforce_cache_coherency() will cause the entire domain to block no-snoop requests - ie there is no way for any device attached to the domain to opt out of the IOMMU_CACHE behavior. This is permanent on the domain and must apply to any future devices attached to it.

Ideally an iommu driver should implement enforce_cache_coherency() so that by default DMA API domains allow the no-snoop optimization. This leaves it available to kernel drivers like i915. VFIO will call enforce_cache_coherency() before establishing any mappings and the domain should then permanently block no-snoop. If enforce_cache_coherency() fails, VFIO will communicate back through to KVM into the arch code via kvm_arch_register_noncoherent_dma() (only implemented by x86), which triggers a working wbinvd to be made available to the VM.

While other iommu drivers are certainly welcome to implement enforce_cache_coherency(), it is not clear there is any benefit in doing so right now.
This is on github: https://github.com/jgunthorpe/linux/commits/intel_no_snoop v2: - Abandon removing IOMMU_CAP_CACHE_COHERENCY - instead make it the cap flag that indicates IOMMU_CACHE is supported - Put the VFIO tests for IOMMU_CACHE at VFIO device registration - In the Intel driver remove the domain->iommu_snooping value, this is global not per-domain v1: https://lore.kernel.org/r/0-v1-ef02c60ddb76+12ca2-intel_no_snoop_...@nvidia.com Jason Gunthorpe (4): iommu: Introduce the domain op enforce_cache_coherency() vfio: Move the Intel no-snoop control off of IOMMU_CACHE iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE vfio: Require that devices support DMA cache coherence drivers/iommu/amd/iommu.c | 7 +++ drivers/iommu/intel/iommu.c | 17 + drivers/vfio/vfio.c | 7 +++ drivers/vfio/vfio_iommu_type1.c | 30 +++--- include/linux/intel-iommu.h | 2 +- include/linux/iommu.h | 7 +-- 6 files changed, 52 insertions(+), 18 deletions(-) base-commit: 3123109284176b1532874591f7c81f3837bbdc17 -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: > For the specific case of overriding PCIe No Snoop (which is more problematic > from an Arm SMMU PoV) when assigning to a VM, would that not be easier > solved by just having vfio-pci clear the "Enable No Snoop" control bit in > the endpoint's PCIe capability? Ideally. That was rediscussed recently, apparently there are non-compliant devices and drivers that just ignore the bit. Presumably this is why x86 had to move to an IOMMU enforced feature.. > That seems a pretty good summary - I think they're basically all "firmware > told Linux I'm coherent so I'd better act coherent" cases, but that still > doesn't necessarily mean that they're *forced* to respect that. One of the > things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force > DMA cache maintenance for coherent devices, primarily to hook up in Panfrost > (where there is a bit of a performance to claw back on the coherent AmLogic > SoCs by leaving certain buffers non-cacheable). It would be great to see that in a way that could bring in the few other GPU drivers doing no-snoop to a formal DMA API instead of hacking their own stuff with wbinvd calls or whatever. Thanks, Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On 2022-04-07 14:59, Jason Gunthorpe wrote: On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote: From: Jason Gunthorpe Sent: Thursday, April 7, 2022 1:17 AM On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: Oh, I didn't know about device_get_dma_attr().. Which is completely broken for any non-OF, non-ACPI platform. I saw that, but I spent some time searching and could not find an iommu driver that would load independently of OF or ACPI. ie no IOMMU platform drivers are created by board files. Things like Intel/AMD discover only from ACPI, etc. Intel discovers IOMMUs (and optionally ACPI namespace devices) from ACPI, but there is no ACPI description for PCI devices i.e. the current logic of device_get_dma_attr() cannot be used on PCI devices. Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or DEV_DMA_NOT_SUPPORTED? I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA method was actually present to say otherwise), based on acpi_init_coherency(), but I only know for sure what happens on arm64. I think I should give up on this and just redefine the existing iommu cap flag to IOMMU_CAP_CACHE_SUPPORTED or something. TBH I don't see any issue with the current name, but I'd certainly be happy to nail down a specific definition for it, along the lines of "this means that IOMMU_CACHE mappings are generally coherent". That works for things like Arm's S2FWB making it OK to assign an otherwise-non-coherent device without extra hassle. For the specific case of overriding PCIe No Snoop (which is more problematic from an Arm SMMU PoV) when assigning to a VM, would that not be easier solved by just having vfio-pci clear the "Enable No Snoop" control bit in the endpoint's PCIe capability? We could alternatively use the existing device_get_dma_attr() as a default with an iommu wrapper and push the exception down through the iommu driver and s390 can override it. if going this way probably device_get_dma_attr() should be renamed to device_fwnode_get_dma_attr() instead to make it clearer? I'm looking at the few users:

drivers/ata/ahci_ceva.c drivers/ata/ahci_qoriq.c - These are ARM-only drivers. They are trying to copy the dma-coherent property from its DT/ACPI definition to internal register settings which look like they tune how the AXI bus transactions are created. I'm guessing the SATA IP block's AXI interface can be configured to generate coherent or non-coherent requests and it has to be set in a way that is consistent with the SOC architecture and match what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c - Only used on ARM64 and also programs a HW register similar to the sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c - Robin's commit comment here is good, and one of the things this controls is if the coherent_walk is set for the io-pgtable-arm.c code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.
My take is that the drivers using this API are doing it to make sure their HW blocks are set up in a way that is consistent with the DMA API they are also using, and run in constrained embedded-style environments that know the firmware support is present. So in the end it does not seem suitable right now for linking to IOMMU_CACHE.. That seems a pretty good summary - I think they're basically all "firmware told Linux I'm coherent so I'd better act coherent" cases, but that still doesn't necessarily mean that they're *forced* to respect that. One of the things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent devices, primarily to hook up in Panfrost (where there is a bit of performance to claw back on the coherent AmLogic SoCs by leaving certain buffers non-cacheable). Cheers, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
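[Editor's illustration] For readers following along, the pattern those platform drivers use is roughly the following (a condensed sketch of the ahci_qoriq-style usage, not verbatim driver code; the register programming is elided):

        #include <linux/property.h>

        static int sketch_setup_coherency(struct device *dev)
        {
                enum dev_dma_attr attr = device_get_dma_attr(dev);

                switch (attr) {
                case DEV_DMA_COHERENT:
                        /* program the IP block's AXI interface for coherent requests */
                        return 0;
                case DEV_DMA_NON_COHERENT:
                        /* program it for non-coherent requests instead */
                        return 0;
                default:
                        /* DEV_DMA_NOT_SUPPORTED: no OF/ACPI description available */
                        return -ENODEV;
                }
        }

This is exactly the "copy the firmware property into hardware so it matches what the DMA API expects" behavior Jason describes above.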
Re: [PATCH 0/5] Make the iommu driver no-snoop block feature consistent
On Wed, Apr 06, 2022 at 06:52:04AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Wednesday, April 6, 2022 12:16 AM
> >
> > PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> > by a platform as bypassing elements in the DMA coherent CPU cache
> > hierarchy. A driver can command a device to set this bit on some of its
> > transactions as a micro-optimization.
> >
> > However, the driver is now responsible to synchronize the CPU cache with
> > the DMA that bypassed it. On x86 this is done through the wbinvd
> > instruction, and the i915 GPU driver is the only Linux DMA driver that
> > calls it.
>
> More accurately x86 supports both unprivileged clflush instructions
> to invalidate one cacheline and a privileged wbinvd instruction to
> invalidate the entire cache. Replacing 'this is done' with 'this may
> be done' is clearer.
>
> > The problem comes that KVM on x86 will normally disable the wbinvd
> > instruction in the guest and render it a NOP. As the driver running in the
> > guest is not aware the wbinvd doesn't work it may still cause the device
> > to set the no-snoop bit and the platform will bypass the CPU cache.
> > Without a working wbinvd there is no way to re-synchronize the CPU cache
> > and the driver in the VM has data corruption.
> >
> > Thus, we see a general direction on x86 that the IOMMU HW is able to block
> > the no-snoop bit in the TLP. This NOPs the optimization and allows KVM
> > to NOP the wbinvd without causing any data corruption.
> >
> > This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> > IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> > meanings and usages beyond blocking no-snoop and the whole thing has
> > become confused.
>
> Also point out your finding about AMD IOMMU?

Done, thanks

Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
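[Editor's illustration] For context on Kevin's point about clflush: a driver that knows the extent of the no-snoop DMA can flush just the affected lines with the unprivileged cache-line flushes instead of the global wbinvd, e.g. via the x86 helper clflush_cache_range(). A minimal sketch, assuming a directly mapped kernel buffer:

        #include <asm/cacheflush.h>

        /*
         * Sketch: after a device wrote 'size' bytes at 'vaddr' using
         * no-snoop TLPs, invalidate only the affected cache lines
         * rather than the entire cache hierarchy.
         */
        static void sync_after_nosnoop_dma(void *vaddr, unsigned int size)
        {
                clflush_cache_range(vaddr, size);
        }

Unlike wbinvd, this path is not disabled by KVM in a guest, which is why it matters for the discussion above.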
Re: [PATCH 2/5] vfio: Require that devices support DMA cache coherence
On Wed, Apr 06, 2022 at 07:02:36AM +0000, Tian, Kevin wrote:
> > So like this:
> >
> > int vfio_register_group_dev(struct vfio_device *device)
> > {
> > +	if (!dev_is_dma_coherent(device->dev))
> > +		return -EINVAL;
> > +
> > 	return __vfio_register_dev(device,
> > 		vfio_group_find_or_alloc(device->dev));
> > }
> >
> > I fixed it up.
>
> if that is the case should it also apply to usnic and vdpa in the first
> patch (i.e. fail the probe)?

Ideally, but I don't want to mess with existing logic in these drivers..

Thanks, Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs
On 07/04/2022 09:21, Leizhen (ThunderTown) wrote: On 2022/4/4 19:27, John Garry wrote: Add support to allow the maximum optimised DMA len be set for an IOMMU group via sysfs. This is much the same with the method to change the default domain type for a group. Signed-off-by: John Garry --- .../ABI/testing/sysfs-kernel-iommu_groups | 16 + drivers/iommu/iommu.c | 59 ++- include/linux/iommu.h | 6 ++ 3 files changed, 79 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups index b15af6a5bc08..ed6f72794f6c 100644 --- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups +++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups @@ -63,3 +63,19 @@ Description: /sys/kernel/iommu_groups//type shows the type of default system could lead to catastrophic effects (the users might need to reboot the machine to get it to normal state). So, it's expected that the users understand what they're doing. + +What: /sys/kernel/iommu_groups//max_opt_dma_size +Date: Feb 2022 +KernelVersion: v5.18 +Contact: iommu@lists.linux-foundation.org +Description: /sys/kernel/iommu_groups//max_opt_dma_size shows the + max optimised DMA size for the default IOMMU domain associated + with the group. + Each IOMMU domain has an IOVA domain. The IOVA domain caches + IOVAs upto a certain size as a performance optimisation. + This sysfs file allows the range of the IOVA domain caching be + set, such that larger than default IOVAs may be cached. + A value of 0 means that the default caching range is chosen. + A privileged user could request the kernel the change the range + by writing to this file. For this to happen, the same rules + and procedure applies as in changing the default domain type. diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 10bb10c2a210..7c7258f19bed 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -48,6 +48,7 @@ struct iommu_group { struct iommu_domain *default_domain; struct iommu_domain *domain; struct list_head entry; + size_t max_opt_dma_size; }; struct group_device { @@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group, static struct iommu_group *iommu_group_get_for_dev(struct device *dev); static ssize_t iommu_group_store_type(struct iommu_group *group, const char *buf, size_t count); +static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group, + const char *buf, + size_t count); #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ struct iommu_group_attribute iommu_group_attr_##_name = \ @@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group *group, return strlen(type); } +static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group, +char *buf) +{ + return sprintf(buf, "%zu\n", group->max_opt_dma_size); +} + static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL); static IOMMU_GROUP_ATTR(reserved_regions, 0444, @@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444, static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type, iommu_group_store_type); +static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, iommu_group_show_max_opt_dma_size, + iommu_group_store_max_opt_dma_size); + static void iommu_group_release(struct kobject *kobj) { struct iommu_group *group = to_iommu_group(kobj); @@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void) if (ret) return ERR_PTR(ret); + ret = iommu_group_create_file(group, &iommu_group_attr_max_opt_dma_size); + if (ret) + return 
ERR_PTR(ret); + pr_debug("Allocated group %d\n", group->id); return group; @@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev) return dev->iommu_group->default_domain; } +size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group) +{ + return group->max_opt_dma_size; +} + /* * IOMMU groups are really the natural working unit of the IOMMU, but * the IOMMU API works on domains and devices. Bridge that gap by @@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); * @prev_dev: The device in the group (this is used to make sure that the device * hasn't changed after the caller has called this function) * @type: The type of the new default domain that ge
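[Editor's illustration] A hypothetical walkthrough of the attribute added above (the group number and size are invented for illustration; per the ABI text, the same rules and unbind/rebind procedure as for changing the default domain type apply):

        $ cat /sys/kernel/iommu_groups/7/max_opt_dma_size
        0
        $ echo 1048576 > /sys/kernel/iommu_groups/7/max_opt_dma_size
        $ cat /sys/kernel/iommu_groups/7/max_opt_dma_size
        1048576

After this, IOVAs up to 1 MiB would be eligible for rcache caching once the group's default domain is rebuilt.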
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Thursday, April 7, 2022 1:17 AM
> >
> > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > > >
> > > > > Which is completely broken for any non-OF, non-ACPI platform.
> > > >
> > > > I saw that, but I spent some time searching and could not find an
> > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > > platform drivers are created by board files. Things like Intel/AMD
> > > > discover only from ACPI, etc.
>
> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
> ACPI, but there is no ACPI description for PCI devices i.e. the current
> logic of device_get_dma_attr() cannot be used on PCI devices.

Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or DEV_DMA_NOT_SUPPORTED?

I think I should give up on this and just redefine the existing iommu cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.

> > We could alternatively use the existing device_get_dma_attr() as a default
> > with an iommu wrapper and push the exception down through the iommu
> > driver and s390 can override it.
>
> if going this way probably device_get_dma_attr() should be renamed to
> device_fwnode_get_dma_attr() instead to make it clearer?

I'm looking at the few users:

drivers/ata/ahci_ceva.c
drivers/ata/ahci_qoriq.c
 - These are ARM-only drivers. They are trying to copy the dma-coherent property from its DT/ACPI definition to internal register settings which look like they tune how the AXI bus transactions are created. I'm guessing the SATA IP block's AXI interface can be configured to generate coherent or non-coherent requests and it has to be set in a way that is consistent with the SOC architecture and match what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c
 - Only used on ARM64 and also programs a HW register similar to the sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c
 - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c
 - Robin's commit comment here is good, and one of the things this controls is if the coherent_walk is set for the io-pgtable-arm.c code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c
 - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.

My take is that the drivers using this API are doing it to make sure their HW blocks are set up in a way that is consistent with the DMA API they are also using, and run in constrained embedded-style environments that know the firmware support is present.

So in the end it does not seem suitable right now for linking to IOMMU_CACHE..

Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions
On Thu, Apr 07, 2022 at 02:53:38PM +0100, Robin Murphy wrote:
> > Why can't this just go into generic_iommu_put_resv_regions? The idea
> > that the iommu low-level drivers need to call into dma-iommu which is
> > a consumer of the IOMMU API is odd. Especially if that just calls out
> > to ACPI code and generic IOMMU code only anyway.
>
> Because assuming ACPI means IORT is not generic. Part of the aim in adding
> the union to iommu_resv_region is that stuff like AMD's unity_map_entry and
> Intel's dmar_rmrr_unit can be folded into it as well, and their reserved
> region handling correspondingly simplified too.
>
> The iommu_dma_{get,put}_resv_region() helpers are kind of intended to be
> specific to the fwnode mechanism which deals with IORT and devicetree (once
> the reserved region bindings are fully worked out).

But IORT is not driver-specific code. So we'll need a USE_IORT flag somewhere in core IOMMU code instead of trying to stuff this into driver operations. And dma-iommu most certainly implies IORT even less than ACPI. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions
On 2022-04-07 14:28, Christoph Hellwig wrote: +static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc, u32 count) Overly long line. void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list) { + if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode)) + iort_iommu_put_resv_regions(dev, list); + generic_iommu_put_resv_regions(dev, list); } Why can't this just go into generic_iommu_put_resv_regions? The idea that the iommu low-level drivers need to call into dma-iommu which is a consumer of the IOMMU API is odd. Especially if that just calls out to ACPI code and generic IOMMU code only anyway. Because assuming ACPI means IORT is not generic. Part of the aim in adding the union to iommu_resv_region is that stuff like AMD's unity_map_entry and Intel's dmar_rmrr_unit can be folded into it as well, and their reserved region handling correspondingly simplified too. The iommu_dma_{get,put}_resv_region() helpers are kind of intended to be specific to the fwnode mechanism which deals with IORT and devicetree (once the reserved region bindings are fully worked out). Thanks, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
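[Editor's illustration] For readers following the thread, the consumer-side pairing these helpers plug into looks roughly like this (a sketch of the existing iommu_get/put_resv_regions pattern, with an invented caller name):

        #include <linux/iommu.h>

        static void sketch_walk_resv_regions(struct device *dev)
        {
                struct iommu_resv_region *region;
                LIST_HEAD(resv_regions);

                iommu_get_resv_regions(dev, &resv_regions);

                /* e.g. carve the ranges out of the IOVA allocator */
                list_for_each_entry(region, &resv_regions, list)
                        dev_info(dev, "resv [%pa, +%zx] type %d\n",
                                 &region->start, region->length, region->type);

                /* the put side must free whatever the driver allocated */
                iommu_put_resv_regions(dev, &resv_regions);
        }

The debate above is about which layer (driver op, dma-iommu, or core code) should own the freeing on the put side when IORT-sourced regions are in the list.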
Re: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()
On 07/04/2022 09:27, Leizhen (ThunderTown) wrote: Thanks for having a look On 2022/4/4 19:27, John Garry wrote: Add max opt argument to iova_domain_init_rcaches(), and use it to set the rcaches range. Also fix up all users to set this value (at 0, meaning use default), including a wrapper for that, iova_domain_init_rcaches_default(). For dma-iommu.c we derive the iova_len argument from the IOMMU group max opt DMA size. Signed-off-by: John Garry --- drivers/iommu/dma-iommu.c| 15 ++- drivers/iommu/iova.c | 19 --- drivers/vdpa/vdpa_user/iova_domain.c | 4 ++-- include/linux/iova.h | 3 ++- 4 files changed, 34 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 42ca42ff1b5d..19f35624611c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; struct iova_domain *iovad; + size_t max_opt_dma_size; + unsigned long iova_len = 0; int ret; if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) @@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, } init_iova_domain(iovad, 1UL << order, base_pfn); - ret = iova_domain_init_rcaches(iovad); + + max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group); + if (max_opt_dma_size) { + unsigned long shift = __ffs(1UL << order); + + iova_len = roundup_pow_of_two(max_opt_dma_size); + iova_len >>= shift; + if (!iova_len) + iova_len = 1; How about move "iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);" here? So that, iova_domain_init_rcaches() can remain the same. And iova_domain_init_rcaches_default() does not need to be added. I see your idea. I will say that I would rather not add iova_domain_init_rcaches_default(). But personally I think it's better to setup all rcache stuff only once and inside iova_domain_init_rcaches(), as it is today. In addition, it doesn't look reasonable to expose iova_len_to_rcache_max(). But maybe it's ok. Other opinion would be welcome... 
Thanks, John + } + + ret = iova_domain_init_rcaches(iovad, iova_len); if (ret) return ret; diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 5c22b9187b79..d65e79e132ee 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn) mag->pfns[mag->size++] = pfn; } -int iova_domain_init_rcaches(struct iova_domain *iovad) +static unsigned long iova_len_to_rcache_max(unsigned long iova_len) +{ + return order_base_2(iova_len) + 1; +} + +int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len) { unsigned int cpu; int i, ret; - iovad->rcache_max_size = 6; /* Arbitrarily high default */ + if (iova_len) + iovad->rcache_max_size = iova_len_to_rcache_max(iova_len); + else + iovad->rcache_max_size = 6; /* Arbitrarily high default */ iovad->rcaches = kcalloc(iovad->rcache_max_size, sizeof(struct iova_rcache), @@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad) free_iova_rcaches(iovad); return ret; } -EXPORT_SYMBOL_GPL(iova_domain_init_rcaches); + +int iova_domain_init_rcaches_default(struct iova_domain *iovad) +{ + return iova_domain_init_rcaches(iovad, 0); +} +EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default); /* * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c index 6daa3978d290..3a2acef98a4a 100644 --- a/drivers/vdpa/vdpa_user/iova_domain.c +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t bounce_size) spin_lock_init(&domain->iotlb_lock); init_iova_domain(&domain->stream_iovad, PAGE_SIZE, IOVA_START_PFN); - ret = iova_domain_init_rcaches(&domain->stream_iovad); + ret = iova_domain_init_rcaches_default(&domain->stream_iovad); if (ret) goto err_iovad_stream; init_iova_domain(&domain->consistent_iovad, PAGE_SIZE, bounce_pfns); - ret = iova_domain_init_rcaches(&domain->consistent_iovad); + ret = iova_domain_init_rcaches_default(&domain->consistent_iovad); if (ret) goto err_iovad_consistent; diff --git a/include/linux/iova.h b/include/linux/iova.h index 02f7222fa85a..56281
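[Editor's illustration] To make the sizing in this patch concrete, a worked example under its formulas, assuming a 4 KiB IOVA granule (shift = 12) and max_opt_dma_size = 1 MiB:

        iova_len        = roundup_pow_of_two(0x100000) >> 12 = 256
        rcache_max_size = order_base_2(256) + 1 = 9

so IOVAs of up to 256 granules (1 MiB) get rcache entries, versus the default rcache_max_size of 6, which only caches IOVAs up to 32 granules (128 KiB with 4 KiB pages).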
Re: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases
On Thu, Apr 07, 2022 at 02:31:44PM +0100, Robin Murphy wrote:
> FWIW it's also broken for another niche case where
> iommu_default_passthrough() == false at init, but the user later changes a
> 32-bit device's default domain type to passthrough via sysfs, such that it
> starts needing regular dma-direct bouncing.

Yeah. We also have yet another issue: swiotlb is not allocated if there is no memory outside the 4GB physical address space. I think I can fix that easily after my swiotlb init series goes in; before that it would be a bit of a mess spread over all the architectures. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases
On 2022-04-04 21:47, Mario Limonciello wrote: Previously the AMD IOMMU would only enable SWIOTLB in certain circumstances: * IOMMU in passthrough mode * SME enabled This logic however doesn't work when an untrusted device is plugged in that doesn't do page aligned DMA transactions. The expectation is that a bounce buffer is used for those transactions. This fails like this: swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots) That happens because the bounce buffers have been allocated, followed by freed during startup but the bounce buffering code expects that all IOMMUs have left it enabled. Remove the criteria to set up bounce buffers on AMD systems to ensure they're always available for supporting untrusted devices. FWIW it's also broken for another niche case where iommu_default_passthrough() == false at init, but the user later changes a 32-bit device's default domain type to passthrough via sysfs, such that it starts needing regular dma-direct bouncing. Reviewed-by: Robin Murphy Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Suggested-by: Christoph Hellwig Signed-off-by: Mario Limonciello --- v1->v2: * Enable swiotlb for AMD instead of ignoring it for inactive drivers/iommu/amd/iommu.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a1ada7bff44e..079694f894b8 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1838,17 +1838,10 @@ void amd_iommu_domain_update(struct protection_domain *domain) amd_iommu_domain_flush_complete(domain); } -static void __init amd_iommu_init_dma_ops(void) -{ - swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; -} - int __init amd_iommu_init_api(void) { int err; - amd_iommu_init_dma_ops(); - err = bus_set_iommu(&pci_bus_type, &amd_iommu_ops); if (err) return err; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions
> +static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc, u32 > count) Overly long line. > void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list) > { > + if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode)) > + iort_iommu_put_resv_regions(dev, list); > + > generic_iommu_put_resv_regions(dev, list); > } Why can't this just go into generic_iommu_put_resv_regions? The idea that the iommu low-level drivers need to call into dma-iommu which is a consumer of the IOMMU API is odd. Especially if that just calls out to ACPI code and generic IOMMU code only anyway. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 05/11] iommu/dma: Introduce a helper to remove reserved regions
> > +static inline void iommu_dma_put_resv_regions(struct device *dev, struct > list_head *list) > +{ > +} > + > #endif /* CONFIG_IOMMU_DMA */ This changes behavior when CONFIG_IOMMU_DMA is not set. So e.g. on ARM all the drivers that are using the new helper now fail to release reserved regions. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
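[Editor's illustration] One possible shape of the fix being asked for here (a sketch, not the actual respin) is to have the !CONFIG_IOMMU_DMA stub fall back to the generic helper so reserved regions are still freed:

        #else /* !CONFIG_IOMMU_DMA */

        static inline void iommu_dma_put_resv_regions(struct device *dev,
                                                      struct list_head *list)
        {
                /* still free what ->get_resv_regions() allocated, e.g. on ARM */
                generic_iommu_put_resv_regions(dev, list);
        }

        #endif /* CONFIG_IOMMU_DMA */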
Re: [PATCH 3/4] iommu: remove the put_resv_regions method
On Thu, Apr 07, 2022 at 11:18:20AM +0100, Robin Murphy wrote:
> On 2022-04-07 07:26, Christoph Hellwig wrote:
>> All drivers that implement get_resv_regions just use
>> generic_put_resv_regions to implement the put side. Remove the
>> indirections and document the allocations constraints.
>
> Unfortunately we need to keep this one for now, as the belated IORT RMR
> support will finally be the first real user[1][2].
>
> Robin.
>
> [1] https://lore.kernel.org/linux-iommu/20220404124209.1086-6-shameerali.kolothum.th...@huawei.com/
> [2] https://lore.kernel.org/linux-iommu/20220404124209.1086-7-shameerali.kolothum.th...@huawei.com/

What these patches do looks wrong to me. I'll comment there. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/omap: Fix regression in probe for NULL pointer dereference
Hi, > Am 07.04.2022 um 07:39 schrieb Tony Lindgren : > > Hi, > > * Tony Lindgren [220331 09:21]: >> Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started >> triggering a NULL pointer dereference for some omap variants: >> >> __iommu_probe_device from probe_iommu_group+0x2c/0x38 >> probe_iommu_group from bus_for_each_dev+0x74/0xbc >> bus_for_each_dev from bus_iommu_probe+0x34/0x2e8 >> bus_iommu_probe from bus_set_iommu+0x80/0xc8 >> bus_set_iommu from omap_iommu_init+0x88/0xcc >> omap_iommu_init from do_one_initcall+0x44/0x24 >> >> This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV) >> as noted by Jason Gunthorpe . >> >> Looks like the regression already happened with an earlier commit >> 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") >> that changed the function return type and missed converting one place. > > Can you guys please get this fix into the -rc series? Or ack it so > I can pick it up into my fixes branch? > > Without this fix booting is failing for a bunch of devices. Yes, I can confirm that v5.18-rc1 does not even boot the GTA04 (omap3) and OMAP5UEVM to any activity without this patch. Seems to be an urgent fix. BR and thanks, Nikolaus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
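[Editor's illustration] For reference, the shape of the fix under discussion, reconstructed from the commit message (the surrounding context is illustrative; see Tony's actual patch for the real hunk): the probe/release_device() conversion left one error path in omap_iommu_probe_device() returning 0 (i.e. NULL) where the probe_device() callback must return ERR_PTR(-ENODEV) for devices without an OMAP IOMMU, which __iommu_probe_device() then dereferences:

        -	return 0;
        +	return ERR_PTR(-ENODEV);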
Re: [PATCH v2 2/2] dma-iommu: Check that swiotlb is active before trying to use it
On 2022-04-04 21:47, Mario Limonciello via iommu wrote: If the IOMMU is in use and an untrusted device is connected to an external facing port, a transaction whose requested address isn't page aligned will cause the kernel to attempt to use bounce buffers. If for some reason the bounce buffers have not been allocated, this is a problem that should be made apparent to the user. Reviewed-by: Robin Murphy Signed-off-by: Mario Limonciello --- v1->v2: * Move error message into the caller drivers/iommu/dma-iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 09f6e1c0f9c0..1ca85d37eeab 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -971,6 +971,11 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, void *padding_start; size_t padding_size, aligned_size; + if (!is_swiotlb_active(dev)) { + dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n"); + return DMA_MAPPING_ERROR; + } + aligned_size = iova_align(iovad, size); phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size, iova_mask(iovad), dir, attrs); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 0/7] Add support for HiSilicon PCIe Tune and Trace device
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex integrated Endpoint (RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic (tune), and trace the TLP headers (trace).

PTT tune is designed for monitoring and adjusting PCIe link parameters. We provide several parameters of the PCIe link. Through the driver, users can adjust the value of a certain parameter to affect the PCIe link for the purpose of enhancing the performance in certain situations.

PTT trace is designed for dumping the TLP headers to the memory, which can be used to analyze the transactions and usage condition of the PCIe Link. Users can choose filters to trace headers, by either requester ID, or those downstream of a set of Root Ports on the same core of the PTT device. Tracing only headers of a certain type and of a certain direction is also supported.

The driver registers a PMU device for each PTT device. The trace can be used through `perf record` and the traced headers can be decoded by `perf report`. The perf command support for the device is also added in this patchset. The tune can be used through the sysfs attributes of the related PMU device. See the documentation for the detailed usage.

Changes since v6:
- Fix W=1 errors reported by the lkp test, thanks

Changes since v5:
- Squash the PMU patch into PATCH 2 as suggested by John
- Refine the commit message of PATCH 1 and some comments
Link: https://lore.kernel.org/lkml/20220308084930.5142-1-yangyic...@hisilicon.com/

Changes since v4:
Address the comments from Jonathan, John and Ma Ca, thanks.
- Use devm* also for allocating the DMA buffers
- Remove the IRQ handler stub in Patch 2
- Make functions waiting for hardware state return boolean
- Manually remove the PMU device as it should be removed first
- Modify the ordering in probe and removal so that they match
- Make the available {directions,type,format} arrays const and non-global
- Use the right filter list in the filters show and protect the list with a mutex
- Record the trace status with a boolean @started rather than an enum
- Optimize the process of finding the PTT devices in the perf tool
Link: https://lore.kernel.org/linux-pci/20220221084307.33712-1-yangyic...@hisilicon.com/

Changes since v3:
Address the comments from Jonathan and John, thanks.
- Drop members in the common struct which can be obtained on the fly
- Reduce the buffer struct and organize the buffers with an array instead of a list
- Reduce the DMA reset wait time to avoid a long busy loop
- Split the available_filters sysfs attribute into two files, for root port and requester respectively. Update the documentation accordingly
- Make the IOMMU mapping check earlier in probe to avoid a race condition. Also make the IOMMU quirk patch prior to the driver in the series
- Cleanups and typo fixes from John and Jonathan
Link: https://lore.kernel.org/linux-pci/20220124131118.17887-1-yangyic...@hisilicon.com/

Changes since v2:
- Address the comments from Mathieu, thanks.
- Rename the directory to ptt to match the function of the device
- Spin off the declarations to a separate header
- Split the trace function into several patches
- Some other comments.
- Make the default SMMU domain type of the PTT device identity. Drop the RMR as it's not recommended and use an iommu_def_domain_type quirk to passthrough the device DMA as suggested by Robin.
Link: https://lore.kernel.org/linux-pci/2026090625.53702-1-yangyic...@hisilicon.com/ Change since v1: - switch the user interface of trace to perf from debugfs - switch the user interface of tune to sysfs from debugfs - add perf tool support to start trace and decode the trace data - address the comments of documentation from Bjorn - add RMR[1] support of the device as trace works in RMR mode or direct DMA mode. RMR support is achieved by common APIs rather than the APIs implemented in [1]. Link: https://lore.kernel.org/lkml/1618654631-42454-1-git-send-email-yangyic...@hisilicon.com/ [1] https://lore.kernel.org/linux-acpi/20210805080724.480-1-shameerali.kolothum.th...@huawei.com/ Qi Liu (1): perf tool: Add support for HiSilicon PCIe Tune and Trace device driver Yicong Yang (6): iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device hisi_ptt: Add support for dynamically updating the filter list hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device docs: Add HiSilicon PTT device driver documentation MAINTAINERS: Add maintainer for HiSilicon PTT driver Documentation/trace/hisi-ptt.rst | 303 + MAINTAINERS |7 + drivers/Makefile |1 + drivers/hwtracing/Kconfig |2 + drivers/hwtracing/ptt/Kconfig | 12 + drivers/hwtracing/ptt/Makefile|2 + drivers/hwtracing/ptt/hisi_ptt.c | 1161 ++
[PATCH v7 7/7] MAINTAINERS: Add maintainer for HiSilicon PTT driver
Add maintainer for driver and documentation of HiSilicon PTT device. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron --- MAINTAINERS | 7 +++ 1 file changed, 7 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index fd768d43e048..d30a1698251c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8858,6 +8858,13 @@ F: Documentation/admin-guide/perf/hisi-pcie-pmu.rst F: Documentation/admin-guide/perf/hisi-pmu.rst F: drivers/perf/hisilicon +HISILICON PTT DRIVER +M: Yicong Yang +L: linux-ker...@vger.kernel.org +S: Maintained +F: Documentation/trace/hisi-ptt.rst +F: drivers/hwtracing/ptt/ + HISILICON QM AND ZIP Controller DRIVER M: Zhou Wang L: linux-cry...@vger.kernel.org -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated Endpoint(RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic, and trace the TLP headers. Add the driver for the device to enable the trace function. Register PMU device of PTT trace, then users can use trace through perf command. The driver makes use of perf AUX trace and support following events to configure the trace: - filter: select Root port or Endpoint to trace - type: select the type of traced TLP headers - direction: select the direction of traced TLP headers - format: select the data format of the traced TLP headers This patch adds the driver part of PTT trace. The perf command support of PTT trace is added in the following patch. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron --- drivers/Makefile | 1 + drivers/hwtracing/Kconfig| 2 + drivers/hwtracing/ptt/Kconfig| 12 + drivers/hwtracing/ptt/Makefile | 2 + drivers/hwtracing/ptt/hisi_ptt.c | 874 +++ drivers/hwtracing/ptt/hisi_ptt.h | 166 ++ 6 files changed, 1057 insertions(+) create mode 100644 drivers/hwtracing/ptt/Kconfig create mode 100644 drivers/hwtracing/ptt/Makefile create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h diff --git a/drivers/Makefile b/drivers/Makefile index 020780b6b4d2..662d50599467 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4) += thunderbolt/ obj-$(CONFIG_CORESIGHT)+= hwtracing/coresight/ obj-y += hwtracing/intel_th/ obj-$(CONFIG_STM) += hwtracing/stm/ +obj-$(CONFIG_HISI_PTT) += hwtracing/ptt/ obj-$(CONFIG_ANDROID) += android/ obj-$(CONFIG_NVMEM)+= nvmem/ obj-$(CONFIG_FPGA) += fpga/ diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig index 13085835a636..911ee977103c 100644 --- a/drivers/hwtracing/Kconfig +++ b/drivers/hwtracing/Kconfig @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig" source "drivers/hwtracing/intel_th/Kconfig" +source "drivers/hwtracing/ptt/Kconfig" + endmenu diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig new file mode 100644 index ..8902a6f27563 --- /dev/null +++ b/drivers/hwtracing/ptt/Kconfig @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-2.0-only +config HISI_PTT + tristate "HiSilicon PCIe Tune and Trace Device" + depends on ARM64 || (COMPILE_TEST && 64BIT) + depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS + help + HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP + device, and it provides support for PCIe traffic tuning and + tracing TLP headers to the memory. + + This driver can also be built as a module. If so, the module + will be called hisi_ptt. diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile new file mode 100644 index ..908c09a98161 --- /dev/null +++ b/drivers/hwtracing/ptt/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c new file mode 100644 index ..242b41870380 --- /dev/null +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -0,0 +1,874 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for HiSilicon PCIe tune and trace device + * + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. 
+ * Author: Yicong Yang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "hisi_ptt.h" + +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev) +{ + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) + return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn))); + + return PCI_DEVID(pdev->bus->number, pdev->devfn); +} + +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS, + val, val & HISI_PTT_TRACE_IDLE, + HISI_PTT_WAIT_POLL_INTERVAL_US, + HISI_PTT_WAIT_TRACE_TIMEOUT_US); +} + +static bool hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_WR_STS, + val, !val, HISI_PTT_RESET_POLL_INTERVAL_US, + HISI_PTT_RESET_TIMEOUT_US); +} + +static void hisi_ptt_free_trace_buf(struct hisi_ptt *hisi_ptt) +{ + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; + struct device *dev = &hisi_ptt->pdev->dev; + int i; + + if (!ctrl-
[PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver
From: Qi Liu 'perf record' and 'perf report --dump-raw-trace' supported in this patch. Example usage: Output will contain raw PTT data and its textual representation, such as: 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . : 00 00 00 00 Prefix . 0004: 08 20 00 60 Header DW0 . 0008: ff 02 00 01 Header DW1 . 000c: 20 08 00 00 Header DW2 . 0010: 10 e7 44 ab Header DW3 . 0014: 2a a8 1e 01 Time . 0020: 00 00 00 00 Prefix . 0024: 01 00 00 60 Header DW0 . 0028: 0f 1e 00 01 Header DW1 . 002c: 04 00 00 00 Header DW2 . 0030: 40 00 81 02 Header DW3 . 0034: ee 02 00 00 Time Signed-off-by: Qi Liu Signed-off-by: Yicong Yang --- tools/perf/arch/arm/util/auxtrace.c | 76 +- tools/perf/arch/arm/util/pmu.c| 3 + tools/perf/arch/arm64/util/Build | 2 +- tools/perf/arch/arm64/util/hisi_ptt.c | 195 tools/perf/util/Build | 2 + tools/perf/util/auxtrace.c| 4 + tools/perf/util/auxtrace.h| 1 + tools/perf/util/hisi-ptt-decoder/Build| 1 + .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c | 170 ++ .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h | 28 +++ tools/perf/util/hisi_ptt.c| 218 ++ tools/perf/util/hisi_ptt.h| 28 +++ 12 files changed, 724 insertions(+), 4 deletions(-) create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c create mode 100644 tools/perf/util/hisi-ptt-decoder/Build create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h create mode 100644 tools/perf/util/hisi_ptt.c create mode 100644 tools/perf/util/hisi_ptt.h diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c index 5fc6a2a3dbc5..393f5757c039 100644 --- a/tools/perf/arch/arm/util/auxtrace.c +++ b/tools/perf/arch/arm/util/auxtrace.c @@ -4,9 +4,11 @@ * Author: Mathieu Poirier */ +#include #include #include #include +#include #include "../../../util/auxtrace.h" #include "../../../util/debug.h" @@ -14,6 +16,7 @@ #include "../../../util/pmu.h" #include "cs-etm.h" #include "arm-spe.h" +#include "hisi_ptt.h" static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) { @@ -50,6 +53,58 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) return arm_spe_pmus; } +static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err) +{ + const char *sysfs = sysfs__mountpoint(); + struct perf_pmu **hisi_ptt_pmus = NULL; + struct dirent *dent; + char path[PATH_MAX]; + DIR *dir = NULL; + int idx = 0; + + snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH, sysfs); + dir = opendir(path); + if (!dir) { + pr_err("can't read directory '%s'\n", EVENT_SOURCE_DEVICE_PATH); + *err = -EINVAL; + goto out; + } + + while ((dent = readdir(dir))) { + if (strstr(dent->d_name, HISI_PTT_PMU_NAME)) + (*nr_ptts)++; + } + + if (!(*nr_ptts)) + goto out; + + hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts)); + if (!hisi_ptt_pmus) { + pr_err("hisi_ptt alloc failed\n"); + *err = -ENOMEM; + goto out; + } + + rewinddir(dir); + while ((dent = readdir(dir))) { + if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < (*nr_ptts)) { + hisi_ptt_pmus[idx] = perf_pmu__find(dent->d_name); + if (hisi_ptt_pmus[idx]) { + pr_debug2("%s %d: hisi_ptt_pmu %d type %d name %s\n", + __func__, __LINE__, idx, + hisi_ptt_pmus[idx]->type, + hisi_ptt_pmus[idx]->name); + idx++; + } + + } + } + +out: + closedir(dir); + return hisi_ptt_pmus; +} + struct auxtrace_record *auxtrace_record__init(struct evlist *evlist, int *err) { @@ -57,8 +112,12 @@ struct 
auxtrace_record struct evsel *evsel; bool found_etm = false; struct perf_pmu *found_spe = NULL; + struct perf_pmu *found_p
[PATCH v7 4/7] hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device
Add tune function for the HiSilicon Tune and Trace device. The interface of tune is exposed through sysfs attributes of PTT PMU device. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron --- drivers/hwtracing/ptt/hisi_ptt.c | 154 +++ drivers/hwtracing/ptt/hisi_ptt.h | 20 2 files changed, 174 insertions(+) diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index b1958ac20372..331c8e43cd17 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -21,6 +21,159 @@ #include "hisi_ptt.h" +static bool hisi_ptt_wait_tuning_finish(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return !readl_poll_timeout(hisi_ptt->iobase + HISI_PTT_TUNING_INT_STAT, + val, !(val & HISI_PTT_TUNING_INT_STAT_MASK), + HISI_PTT_WAIT_POLL_INTERVAL_US, + HISI_PTT_WAIT_TUNE_TIMEOUT_US); +} + +static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt, + u32 event, u16 *data) +{ + u32 reg; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + event); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + + /* Write all 1 to indicates it's the read process */ + writel(~0U, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (!hisi_ptt_wait_tuning_finish(hisi_ptt)) + return -ETIMEDOUT; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + reg &= HISI_PTT_TUNING_DATA_VAL_MASK; + *data = FIELD_GET(HISI_PTT_TUNING_DATA_VAL_MASK, reg); + + return 0; +} + +static int hisi_ptt_tune_data_set(struct hisi_ptt *hisi_ptt, + u32 event, u16 data) +{ + u32 reg; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + event); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + + writel(FIELD_PREP(HISI_PTT_TUNING_DATA_VAL_MASK, data), + hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (!hisi_ptt_wait_tuning_finish(hisi_ptt)) + return -ETIMEDOUT; + + return 0; +} + +static ssize_t hisi_ptt_tune_attr_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + int ret; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + if (!mutex_trylock(&hisi_ptt->mutex)) + return -EBUSY; + + ret = hisi_ptt_tune_data_get(hisi_ptt, desc->event_code, &val); + + mutex_unlock(&hisi_ptt->mutex); + return ret ? ret : sysfs_emit(buf, "%u\n", val); +} + +static ssize_t hisi_ptt_tune_attr_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + int ret; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + if (kstrtou16(buf, 10, &val)) + return -EINVAL; + + if (!mutex_trylock(&hisi_ptt->mutex)) + return -EBUSY; + + ret = hisi_ptt_tune_data_set(hisi_ptt, desc->event_code, val); + + mutex_unlock(&hisi_ptt->mutex); + return ret ? 
ret : count; +} + +#define HISI_PTT_TUNE_ATTR(_name, _val, _show, _store) \ + static struct hisi_ptt_tune_desc _name##_desc = { \ + .name = #_name, \ + .event_code = _val, \ + }; \ + static struct dev_ext_attribute hisi_ptt_##_name##_attr = { \ + .attr = __ATTR(_name, 0600, _show, _store), \ + .var= &_name##_desc,\ + } + +#define HISI_PTT_TUNE_ATTR_COMMON(_name, _val) \ + HISI_PTT_TUNE_ATTR(_name, _val, \ + hisi_ptt_tune_attr_show, \ + hisi_ptt_tune_attr_store) + +/* + * The value of the tuning event are composed of two parts: main event code in bit[0,15]
[PATCH v7 3/7] hisi_ptt: Add support for dynamically updating the filter list
The PCIe devices supported by the PTT trace can be removed/rescanned by hotplug or through sysfs. Add support for dynamically updating the available filter list by registering a PCI bus notifier block. Then user can always get latest information about available tracing filters and driver can block the invalid filters of which related devices no longer exist in the system. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron --- drivers/hwtracing/ptt/hisi_ptt.c | 159 --- drivers/hwtracing/ptt/hisi_ptt.h | 34 +++ 2 files changed, 180 insertions(+), 13 deletions(-) diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index 242b41870380..b1958ac20372 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -270,27 +270,121 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt) return 0; } -static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) +static void hisi_ptt_update_filters(struct work_struct *work) { + struct delayed_work *delayed_work = to_delayed_work(work); + struct hisi_ptt_filter_update_info info; struct hisi_ptt_filter_desc *filter; - struct hisi_ptt *hisi_ptt = data; struct list_head *target_list; + struct hisi_ptt *hisi_ptt; - target_list = pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ? - &hisi_ptt->port_filters : &hisi_ptt->req_filters; + hisi_ptt = container_of(delayed_work, struct hisi_ptt, work); - filter = kzalloc(sizeof(*filter), GFP_KERNEL); - if (!filter) { - pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev)); - return -ENOMEM; + if (!mutex_trylock(&hisi_ptt->mutex)) { + schedule_delayed_work(&hisi_ptt->work, HISI_PTT_WORK_DELAY_MS); + return; } - filter->pdev = pdev; - list_add_tail(&filter->list, target_list); + while (kfifo_get(&hisi_ptt->filter_update_kfifo, &info)) { + bool is_port = pci_pcie_type(info.pdev) == PCI_EXP_TYPE_ROOT_PORT; + u16 val = hisi_ptt_get_filter_val(info.pdev); - /* Update the available port mask */ - if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) - hisi_ptt->port_mask |= hisi_ptt_get_filter_val(pdev); + target_list = is_port ? &hisi_ptt->port_filters : &hisi_ptt->req_filters; + + if (info.is_add) { + filter = kzalloc(sizeof(*filter), GFP_KERNEL); + if (!filter) { + pci_err(hisi_ptt->pdev, "failed to add filter %s\n", + pci_name(info.pdev)); + continue; + } + + filter->pdev = info.pdev; + list_add_tail(&filter->list, target_list); + } else { + list_for_each_entry(filter, target_list, list) + if (hisi_ptt_get_filter_val(filter->pdev) == val) { + list_del(&filter->list); + kfree(filter); + break; + } + } + + /* Update the available port mask */ + if (!is_port) + continue; + + if (info.is_add) + hisi_ptt->port_mask |= val; + else + hisi_ptt->port_mask &= ~val; + } + + mutex_unlock(&hisi_ptt->mutex); +} + +static void hisi_ptt_update_fifo_in(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_update_info *info) +{ + struct pci_dev *root_port = pcie_find_root_port(info->pdev); + u32 port_devid; + + if (!root_port) + return; + + port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn); + if (port_devid < hisi_ptt->lower || + port_devid > hisi_ptt->upper) + return; + + if (kfifo_in_spinlocked(&hisi_ptt->filter_update_kfifo, info, 1, + &hisi_ptt->filter_update_lock)) + schedule_delayed_work(&hisi_ptt->work, 0); + else + pci_warn(hisi_ptt->pdev, +"filter update fifo overflow for target %s\n", +pci_name(info->pdev)); +} + +/* + * A PCI bus notifier is used here for dynamically updating the filter + * list. 
+ */ +static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action, + void *data) +{ + struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb); + struct hisi_ptt_filter_update_info info; + struct device *dev = data; + struct pci_dev *pdev = to_pci_dev(dev); + + info.pdev
[PATCH v7 6/7] docs: Add HiSilicon PTT device driver documentation
Document the introduction and usage of HiSilicon PTT device driver. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron --- Documentation/trace/hisi-ptt.rst | 303 +++ 1 file changed, 303 insertions(+) create mode 100644 Documentation/trace/hisi-ptt.rst diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst new file mode 100644 index ..13677705ee1f --- /dev/null +++ b/Documentation/trace/hisi-ptt.rst @@ -0,0 +1,303 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +HiSilicon PCIe Tune and Trace device +== + +Introduction + + +HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex +integrated Endpoint (RCiEP) device, providing the capability +to dynamically monitor and tune the PCIe link's events (tune), +and trace the TLP headers (trace). The two functions are independent, +but is recommended to use them together to analyze and enhance the +PCIe link's performance. + +On Kunpeng 930 SoC, the PCIe Root Complex is composed of several +PCIe cores. Each PCIe core includes several Root Ports and a PTT +RCiEP, like below. The PTT device is capable of tuning and +tracing the links of the PCIe core. +:: + +--Core 0---+ + | | [ PTT ] | + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] +Root Complex |--Core 1---+ + | | [ PTT ] | + | | [Root Port]---[ Switch ]---[Endpoint] + | | [Root Port]---[Endpoint] `-[Endpoint] + | | [Root Port]---[Endpoint] + +---+ + +The PTT device driver registers one PMU device for each PTT device. +The name of each PTT device is composed of 'hisi_ptt' prefix with +the id of the SICL and the Core where it locates. The Kunpeng 930 +SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and +IO dies (SICL, Super I/O Cluster), where there's one PCIe Root +Complex for each SICL. +:: +/sys/devices/hisi_ptt_ + +Tune + + +PTT tune is designed for monitoring and adjusting PCIe link parameters (events). +Currently we support events in 4 classes. The scope of the events +covers the PCIe core to which the PTT device belongs. + +Each event is presented as a file under $(PTT PMU dir)/tune, and +a simple open/read/write/close cycle will be used to tune the event. +:: +$ cd /sys/devices/hisi_ptt_/tune +$ ls +qos_tx_cplqos_tx_npqos_tx_p +tx_path_rx_req_alloc_buf_level +tx_path_tx_req_alloc_buf_level +$ cat qos_tx_dp +1 +$ echo 2 > qos_tx_dp +$ cat qos_tx_dp +2 + +Current value (numerical value) of the event can be simply read +from the file, and the desired value written to the file to tune. + +1. Tx path QoS control + + +The following files are provided to tune the QoS of the tx path of +the PCIe core. + +- qos_tx_cpl: weight of Tx completion TLPs +- qos_tx_np: weight of Tx non-posted TLPs +- qos_tx_p: weight of Tx posted TLPs + +The weight influences the proportion of certain packets on the PCIe link. +For example, for the storage scenario, increase the proportion +of the completion packets on the link to enhance the performance as +more completions are consumed. + +The available tune data of these events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +2. Tx path buffer control +- + +Following files are provided to tune the buffer of tx path of the PCIe core. + +- tx_path_rx_req_alloc_buf_level: watermark of Rx requested +- tx_path_tx_req_alloc_buf_level: watermark of Tx requested + +These events influence the watermark of the buffer allocated for each +type. 
Rx means the inbound while Tx means outbound. The packets will +be stored in the buffer first and then transmitted either when the +watermark reached or when timed out. For a busy direction, you should +increase the related buffer watermark to avoid frequently posting and +thus enhance the performance. In most cases just keep the default value. + +The available tune data of above events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +Trace += + +PTT trace is designed for dumping the TLP headers to the memory, which +can be used to analyze the transactions and usage condition of the PCIe +Link. You can choose to filter the traced headers by either requester ID, +or those downstream of a set of Root Ports on the same core of the PTT +device. It's also supported to trace the headers of certain type and of
[PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity
The DMA operations of HiSilicon PTT device can only work properly with identical mappings. So add a quirk for the device to force the domain as passthrough. Signed-off-by: Yicong Yang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 627a3ed5ee8f..5ec15ae2a9b1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2839,6 +2839,21 @@ static int arm_smmu_dev_disable_feature(struct device *dev, } } +#define IS_HISI_PTT_DEVICE(pdev) ((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \ +(pdev)->device == 0xa12e) + +static int arm_smmu_def_domain_type(struct device *dev) +{ + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + + if (IS_HISI_PTT_DEVICE(pdev)) + return IOMMU_DOMAIN_IDENTITY; + } + + return 0; +} + static struct iommu_ops arm_smmu_ops = { .capable= arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, @@ -2856,6 +2871,7 @@ static struct iommu_ops arm_smmu_ops = { .sva_unbind = arm_smmu_sva_unbind, .sva_get_pasid = arm_smmu_sva_get_pasid, .page_response = arm_smmu_page_response, + .def_domain_type= arm_smmu_def_domain_type, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] x86/msi: Fix msi message data shadow struct
The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2. Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs") Co-developed-by: Adrian-Ken Rueegsegger Signed-off-by: Adrian-Ken Rueegsegger Signed-off-by: Reto Buerki --- arch/x86/include/asm/msi.h | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h index b85147d75626..d71c7e8b738d 100644 --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, /* Structs and defines for the X86 specific MSI message format */ typedef struct x86_msi_data { - u32 vector : 8, - delivery_mode : 3, - dest_mode_logical : 1, - reserved: 2, - active_low : 1, - is_level: 1; - - u32 dmar_subhandle; + union { + struct { + u32 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + reserved: 2, + active_low : 1, + is_level: 1; + }; + u32 dmar_subhandle; + }; } __attribute__ ((packed)) arch_msi_msg_data_t; #define arch_msi_msg_data x86_msi_data -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
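The invariant the fix restores is that both union members alias the same 32
bits. A standalone, compilable mirror of the patched layout that checks this
(illustrative only, not the kernel header itself):

  #include <stdint.h>

  /* Userspace mirror of the patched x86_msi_data layout, for illustration */
  typedef struct x86_msi_data {
  	union {
  		struct {
  			uint32_t vector			: 8,
  				 delivery_mode		: 3,
  				 dest_mode_logical	: 1,
  				 reserved		: 2,
  				 active_low		: 1,
  				 is_level		: 1;
  		};
  		uint32_t dmar_subhandle;
  	};
  } __attribute__((packed)) arch_msi_msg_data_t;

  /* The point of the fix: the shadow struct must stay exactly 32 bits */
  _Static_assert(sizeof(arch_msi_msg_data_t) == sizeof(uint32_t),
  	       "MSI message data shadow must be 32 bits");

  int main(void)
  {
  	return 0;
  }

Before the fix, the bitfield word and dmar_subhandle were laid out back to
back, making the shadow struct 64 bits and out of sync with the hardware
format; the union collapses them onto the same word.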
Re: [PATCH 3/4] iommu: remove the put_resv_regions method
On 2022-04-07 07:26, Christoph Hellwig wrote: All drivers that implement get_resv_regions just use generic_put_resv_regions to implement the put side. Remove the indirections and document the allocations constraints. Unfortunately we need to keep this one for now, as the belated IORT RMR support will finally be the first real user[1][2]. Robin. [1] https://lore.kernel.org/linux-iommu/20220404124209.1086-6-shameerali.kolothum.th...@huawei.com/ [2] https://lore.kernel.org/linux-iommu/20220404124209.1086-7-shameerali.kolothum.th...@huawei.com/ Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 1 - drivers/iommu/apple-dart.c | 1 - drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 - drivers/iommu/intel/iommu.c | 1 - drivers/iommu/iommu.c | 20 +--- drivers/iommu/mtk_iommu.c | 1 - drivers/iommu/virtio-iommu.c| 5 ++--- include/linux/iommu.h | 4 9 files changed, 3 insertions(+), 32 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a1ada7bff44e61..7011b46022dcbb 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2279,7 +2279,6 @@ const struct iommu_ops amd_iommu_ops = { .probe_finalize = amd_iommu_probe_finalize, .device_group = amd_iommu_device_group, .get_resv_regions = amd_iommu_get_resv_regions, - .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, .pgsize_bitmap = AMD_IOMMU_PGSIZES, .def_domain_type = amd_iommu_def_domain_type, diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index decafb07ad0831..a45ad9ade0dba6 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -771,7 +771,6 @@ static const struct iommu_ops apple_dart_iommu_ops = { .of_xlate = apple_dart_of_xlate, .def_domain_type = apple_dart_def_domain_type, .get_resv_regions = apple_dart_get_resv_regions, - .put_resv_regions = generic_iommu_put_resv_regions, .pgsize_bitmap = -1UL, /* Restricted during dart probe */ .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = apple_dart_attach_dev, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 36461fb46d436c..1ea184bbf750a6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2847,7 +2847,6 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, - .put_resv_regions = generic_iommu_put_resv_regions, .dev_enable_feat= arm_smmu_dev_enable_feature, .dev_disable_feat = arm_smmu_dev_disable_feature, .sva_bind = arm_smmu_sva_bind, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 568cce590ccc13..41da1275689ebd 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1589,7 +1589,6 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, - .put_resv_regions = generic_iommu_put_resv_regions, .def_domain_type= arm_smmu_def_domain_type, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index df5c62ecf942b8..cafe50cb484cd5 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4875,7 +4875,6 @@ const struct iommu_ops intel_iommu_ops = { 
.probe_finalize = intel_iommu_probe_finalize, .release_device = intel_iommu_release_device, .get_resv_regions = intel_iommu_get_resv_regions, - .put_resv_regions = generic_iommu_put_resv_regions, .device_group = intel_iommu_device_group, .dev_enable_feat= intel_iommu_dev_enable_feat, .dev_disable_feat = intel_iommu_dev_disable_feat, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 6ce73f35c43aac..2e1f7d1cf74793 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2584,31 +2584,13 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list) } void iommu_put_resv_regions(struct device *dev, struct list_head *list) -{ - const struct iommu_ops *ops = dev_iommu_ops(dev); - - if (ops->put_resv_regions) -
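The caller-facing API is unchanged by this series; only the driver-side
indirection goes away, with iommu_put_resv_regions() now freeing the entries
directly. A rough sketch of the usual consumer pattern (not a verbatim
excerpt from any driver):

  /* Sketch: walk a device's reserved regions, then release them */
  static void show_resv_regions(struct device *dev)
  {
  	struct iommu_resv_region *region;
  	LIST_HEAD(regions);

  	iommu_get_resv_regions(dev, &regions);
  	list_for_each_entry(region, &regions, list)
  		dev_info(dev, "resv region %pa length %zu\n",
  			 &region->start, region->length);
  	iommu_put_resv_regions(dev, &regions);
  }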
Re: [PATCH v3 2/2] iommu/mediatek: Add mt8186 iommu support
On 07/04/22 10:32, Yong Wu wrote:
> Add mt8186 iommu support.
>
> Signed-off-by: Anan Sun
> Signed-off-by: Yong Wu
> Reviewed-by: Matthias Brugger

Reviewed-by: AngeloGioacchino Del Regno
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/2] dt-bindings: mediatek: mt8186: Add binding for MM iommu
On 07/04/22 10:32, Yong Wu wrote:
> Add mt8186 iommu binding. "-mm" means the iommu is for Multimedia.
>
> Signed-off-by: Yong Wu
> Acked-by: Krzysztof Kozlowski
> Reviewed-by: Rob Herring
> Reviewed-by: Matthias Brugger

Reviewed-by: AngeloGioacchino Del Regno
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Wed, 2022-04-06 at 14:17 -0300, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > > > I saw that, but I spent some time searching and could not find an > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > > platform drivers are created by board files. Things like Intel/AMD > > > discover only from ACPI, etc. > > > > s390? > > Ah, I missed looking in s390, hyperv and virtio.. > > hyperv is not creating iommu_domains, just IRQ remapping > > virtio is using OF > > And s390 indeed doesn't obviously have OF or ACPI parts.. > > This seems like it would be consistent with other things: > > enum dev_dma_attr device_get_dma_attr(struct device *dev) > { > const struct fwnode_handle *fwnode = dev_fwnode(dev); > struct acpi_device *adev = to_acpi_device_node(fwnode); > > if (is_of_node(fwnode)) { > if (of_dma_is_coherent(to_of_node(fwnode))) > return DEV_DMA_COHERENT; > return DEV_DMA_NON_COHERENT; > } else if (adev) { > return acpi_get_dma_attr(adev); > } > > /* Platform is always DMA coherent */ > if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) && > device_iommu_mapped(dev)) > return DEV_DMA_COHERENT; > return DEV_DMA_NOT_SUPPORTED; > } > EXPORT_SYMBOL_GPL(device_get_dma_attr); > > ie s390 has no of or acpi but the entire platform is known DMA > coherent at config time so allow it. Not sure we need the > device_iommu_mapped() or not. I only took a short look but I think the device_iommu_mapped() call in there is wrong. On s390 PCI always goes through IOMMU hardware both when using the IOMMU API and when using the DMA API and this hardware is always coherent. This is even true for s390 guests in QEMU/KVM and under the z/VM hypervisor. As far as I can see device_iommu_mapped()'s check for dev->iommu_group would only work while the device is under IOMMU API control not DMA API, no? Also, while it is true that s390 *hardware* devices are always cache coherent there is also the case that SWIOTLB is used for protected virtualization and then cache flushing APIs must be used. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
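Either way, the consumer side of the proposed helper stays trivial; a hedged
sketch of a coherency check built on it (the wrapper name is invented for
illustration and is not part of the series):

  /* Hypothetical wrapper, not part of the series */
  static bool dev_fw_reports_dma_coherent(struct device *dev)
  {
  	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
  }

Note this intentionally treats DEV_DMA_NOT_SUPPORTED the same as
non-coherent, matching the cautious default discussed above for platforms
like s390 where SWIOTLB may be in use.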
Re: [PATCH v6 4/7] hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device
On 2022/4/7 12:28, kernel test robot wrote: > Hi Yicong, > > I love your patch! Perhaps something to improve: > > [auto build test WARNING on joro-iommu/next] > [also build test WARNING on linus/master linux/master v5.18-rc1 next-20220406] > [cannot apply to tip/perf/core] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: > https://github.com/intel-lab-lkp/linux/commits/Yicong-Yang/Add-support-for-HiSilicon-PCIe-Tune-and-Trace-device/20220406-200044 > base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next > config: alpha-allyesconfig > (https://download.01.org/0day-ci/archive/20220407/202204071201.acepulor-...@intel.com/config) > compiler: alpha-linux-gcc (GCC) 11.2.0 > reproduce (this is a W=1 build): > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # > https://github.com/intel-lab-lkp/linux/commit/9400668b70cbcd5ec74a52f043c3a333b80135f8 > git remote add linux-review https://github.com/intel-lab-lkp/linux > git fetch --no-tags linux-review > Yicong-Yang/Add-support-for-HiSilicon-PCIe-Tune-and-Trace-device/20220406-200044 > git checkout 9400668b70cbcd5ec74a52f043c3a333b80135f8 > # save the config file to linux build tree > mkdir build_dir > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross > O=build_dir ARCH=alpha SHELL=/bin/bash drivers/hwtracing/ptt/ > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot > > All warnings (new ones prefixed by >>): > >drivers/hwtracing/ptt/hisi_ptt.c: In function 'hisi_ptt_tune_data_get': >>> drivers/hwtracing/ptt/hisi_ptt.c:46:16: warning: conversion from 'long >>> unsigned int' to 'u32' {aka 'unsigned int'} changes value from >>> '18446744073709551615' to '4294967295' [-Woverflow] > 46 | writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); > |^~~~ Thanks for the report. using of ~0U will fix this. >drivers/hwtracing/ptt/hisi_ptt.c: At top level: >drivers/hwtracing/ptt/hisi_ptt.c:1131:6: warning: no previous prototype > for 'hisi_ptt_remove' [-Wmissing-prototypes] > 1131 | void hisi_ptt_remove(struct pci_dev *pdev) > | ^~~ > for here I missed the static identifier. will fix. thanks. > > vim +46 drivers/hwtracing/ptt/hisi_ptt.c > > 33 > 34static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt, > 35 u32 event, u16 *data) > 36{ > 37u32 reg; > 38 > 39reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); > 40reg &= ~(HISI_PTT_TUNING_CTRL_CODE | > HISI_PTT_TUNING_CTRL_SUB); > 41reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | > HISI_PTT_TUNING_CTRL_SUB, > 42 event); > 43writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); > 44 > 45/* Write all 1 to indicates it's the read process */ > > 46writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); > 47 > 48if (!hisi_ptt_wait_tuning_finish(hisi_ptt)) > 49return -ETIMEDOUT; > 50 > 51reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA); > 52reg &= HISI_PTT_TUNING_DATA_VAL_MASK; > 53*data = FIELD_GET(HISI_PTT_TUNING_DATA_VAL_MASK, reg); > 54 > 55return 0; > 56} > 57 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
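The truncation the robot flags is easy to reproduce in isolation on any LP64
target, which shows why ~0U is the right constant for a 32-bit MMIO write:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
  	/* On LP64 targets such as alpha, ~0UL is 64 bits wide, so this
  	 * assignment truncates and triggers gcc's -Woverflow warning. */
  	uint32_t bad = ~0UL;
  	/* ~0U is exactly 32 one-bits: same register value, no warning. */
  	uint32_t good = ~0U;

  	printf("%#x %#x\n", bad, good);
  	return 0;
  }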
[PATCH v3 2/2] iommu/mediatek: Add mt8186 iommu support
Add mt8186 iommu support.

Signed-off-by: Anan Sun
Signed-off-by: Yong Wu
Reviewed-by: Matthias Brugger
---
 drivers/iommu/mtk_iommu.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 22c95ed78b3c..8d2b6dc89177 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -160,6 +160,7 @@ enum mtk_iommu_plat {
 	M4U_MT8167,
 	M4U_MT8173,
 	M4U_MT8183,
+	M4U_MT8186,
 	M4U_MT8192,
 	M4U_MT8195,
 };
@@ -1429,6 +1430,21 @@ static const struct mtk_iommu_plat_data mt8183_data = {
 	.larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
 };
 
+static const struct mtk_iommu_plat_data mt8186_data_mm = {
+	.m4u_plat       = M4U_MT8186,
+	.flags          = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
+			  WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE |
+			  MTK_IOMMU_TYPE_MM,
+	.larbid_remap   = {{0}, {1, MTK_INVALID_LARBID, 8}, {4}, {7}, {2}, {9, 11, 19, 20},
+			   {MTK_INVALID_LARBID, 14, 16},
+			   {MTK_INVALID_LARBID, 13, MTK_INVALID_LARBID, 17}},
+	.inv_sel_reg    = REG_MMU_INV_SEL_GEN2,
+	.banks_num      = 1,
+	.banks_enable   = {true},
+	.iova_region    = mt8192_multi_dom,
+	.iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+};
+
 static const struct mtk_iommu_plat_data mt8192_data = {
 	.m4u_plat       = M4U_MT8192,
 	.flags          = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
@@ -1498,6 +1514,7 @@ static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data},
 	{ .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
 	{ .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
+	{ .compatible = "mediatek,mt8186-iommu-mm",    .data = &mt8186_data_mm}, /* mm: m4u */
 	{ .compatible = "mediatek,mt8192-m4u", .data = &mt8192_data},
 	{ .compatible = "mediatek,mt8195-iommu-infra", .data = &mt8195_data_infra},
 	{ .compatible = "mediatek,mt8195-iommu-vdo",   .data = &mt8195_data_vdo},
--
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/2] dt-bindings: mediatek: mt8186: Add binding for MM iommu
Add mt8186 iommu binding. "-mm" means the iommu is for Multimedia. Signed-off-by: Yong Wu Acked-by: Krzysztof Kozlowski Reviewed-by: Rob Herring Reviewed-by: Matthias Brugger --- .../bindings/iommu/mediatek,iommu.yaml| 4 + .../dt-bindings/memory/mt8186-memory-port.h | 217 ++ 2 files changed, 221 insertions(+) create mode 100644 include/dt-bindings/memory/mt8186-memory-port.h diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml index eed59ec00e78..91a3629a8e6e 100644 --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml @@ -76,6 +76,7 @@ properties: - mediatek,mt8167-m4u # generation two - mediatek,mt8173-m4u # generation two - mediatek,mt8183-m4u # generation two + - mediatek,mt8186-iommu-mm # generation two - mediatek,mt8192-m4u # generation two - mediatek,mt8195-iommu-vdo# generation two - mediatek,mt8195-iommu-vpp# generation two @@ -122,6 +123,7 @@ properties: dt-binding/memory/mt8167-larb-port.h for mt8167, dt-binding/memory/mt8173-larb-port.h for mt8173, dt-binding/memory/mt8183-larb-port.h for mt8183, + dt-binding/memory/mt8186-memory-port.h for mt8186, dt-binding/memory/mt8192-larb-port.h for mt8192. dt-binding/memory/mt8195-memory-port.h for mt8195. @@ -143,6 +145,7 @@ allOf: - mediatek,mt2701-m4u - mediatek,mt2712-m4u - mediatek,mt8173-m4u + - mediatek,mt8186-iommu-mm - mediatek,mt8192-m4u - mediatek,mt8195-iommu-vdo - mediatek,mt8195-iommu-vpp @@ -155,6 +158,7 @@ allOf: properties: compatible: enum: +- mediatek,mt8186-iommu-mm - mediatek,mt8192-m4u - mediatek,mt8195-iommu-vdo - mediatek,mt8195-iommu-vpp diff --git a/include/dt-bindings/memory/mt8186-memory-port.h b/include/dt-bindings/memory/mt8186-memory-port.h new file mode 100644 index ..2bc6e4433048 --- /dev/null +++ b/include/dt-bindings/memory/mt8186-memory-port.h @@ -0,0 +1,217 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 MediaTek Inc. + * + * Author: Anan Sun + * Author: Yong Wu + */ +#ifndef _DT_BINDINGS_MEMORY_MT8186_LARB_PORT_H_ +#define _DT_BINDINGS_MEMORY_MT8186_LARB_PORT_H_ + +#include + +/* + * MM IOMMU supports 16GB dma address. We separate it to four ranges: + * 0 ~ 4G; 4G ~ 8G; 8G ~ 12G; 12G ~ 16G, we could adjust these masters + * locate in anyone region. BUT: + * a) Make sure all the ports inside a larb are in one range. + * b) The iova of any master can NOT cross the 4G/8G/12G boundary. + * + * This is the suggested mapping in this SoC: + * + * modulesdma-address-region larbs-ports + * disp 0 ~ 4G larb0/1/2 + * vcodec 4G ~ 8G larb4/7 + * cam/mdp 8G ~ 12G the other larbs. 
+ * N/A 12G ~ 16G + * CCU0 0x24000_ ~ 0x243ff_ larb13: port 9/10 + * CCU1 0x24400_ ~ 0x247ff_ larb14: port 4/5 + */ + +/* MM IOMMU ports */ +/* LARB 0 -- MMSYS */ +#define IOMMU_PORT_L0_DISP_POSTMASK0 MTK_M4U_ID(0, 0) +#define IOMMU_PORT_L0_REVERSED MTK_M4U_ID(0, 1) +#define IOMMU_PORT_L0_OVL_RDMA0MTK_M4U_ID(0, 2) +#define IOMMU_PORT_L0_DISP_FAKE0 MTK_M4U_ID(0, 3) + +/* LARB 1 -- MMSYS */ +#define IOMMU_PORT_L1_DISP_RDMA1 MTK_M4U_ID(1, 0) +#define IOMMU_PORT_L1_OVL_2L_RDMA0 MTK_M4U_ID(1, 1) +#define IOMMU_PORT_L1_DISP_RDMA0 MTK_M4U_ID(1, 2) +#define IOMMU_PORT_L1_DISP_WDMA0 MTK_M4U_ID(1, 3) +#define IOMMU_PORT_L1_DISP_FAKE1 MTK_M4U_ID(1, 4) + +/* LARB 2 -- MMSYS */ +#define IOMMU_PORT_L2_MDP_RDMA0MTK_M4U_ID(2, 0) +#define IOMMU_PORT_L2_MDP_RDMA1MTK_M4U_ID(2, 1) +#define IOMMU_PORT_L2_MDP_WROT0MTK_M4U_ID(2, 2) +#define IOMMU_PORT_L2_MDP_WROT1MTK_M4U_ID(2, 3) +#define IOMMU_PORT_L2_DISP_FAKE0 MTK_M4U_ID(2, 4) + +/* LARB 4 -- VDEC */ +#define IOMMU_PORT_L4_HW_VDEC_MC_EXT MTK_M4U_ID(4, 0) +#define IOMMU_PORT_L4_HW_VDEC_UFO_EXT MTK_M4U_ID(4, 1) +#define IOMMU_PORT_L4_HW_VDEC_PP_EXT MTK_M4U_ID(4, 2) +#define IOMMU_PORT_L4_HW_VDEC_PRED_RD_EXT MTK_M4U_ID(4, 3) +#define IOMMU_PORT_L4_HW_VDEC_PRED_WR_EXT MTK_M4U_ID(4, 4) +#define IOMMU_PORT_L4_HW_VDEC_PPWRAP_EXT MTK_M4U_ID(4, 5) +#define IOMMU_PORT_L4_HW_VDEC_TILE_EXT MTK_M4U_ID(4, 6) +#define IOMMU_PORT_L4_HW_VDEC_VLD_EXT MTK_M4U_ID(4, 7) +#define IOMMU_PORT_L4_HW_VDEC_VLD2_EXT MTK_M4U_ID(4, 8) +#define IOMMU_PORT_L4_HW_VDEC_AVC_MV_EXT MTK_M4U_ID(4, 9) +#define IOMMU_PORT_L4_HW_VDEC_UFO_ENC_EXT MTK_M4U_ID(4, 10) +#define IOMMU_PORT_L4_HW_VDEC_
[PATCH v3 0/2] MT8186 IOMMU SUPPORT
This patchset adds mt8186 iommu support. Change note: v3: Rebase on v5.18-rc1 and mt8195 iommu v6: https://lore.kernel.org/linux-iommu/20220407075726.17771-1-yong...@mediatek.com/ v2: https://lore.kernel.org/linux-iommu/20220223072402.17518-1-yong...@mediatek.com/ a)Base on v5.17-rc1 and mt8195 iommu v5. b)Add a comment "mm: m4u" in the code for readable. v1: https://lore.kernel.org/linux-mediatek/20220125093244.18230-1-yong...@mediatek.com/ Yong Wu (2): dt-bindings: mediatek: mt8186: Add binding for MM iommu iommu/mediatek: Add mt8186 iommu support .../bindings/iommu/mediatek,iommu.yaml| 4 + drivers/iommu/mtk_iommu.c | 17 ++ .../dt-bindings/memory/mt8186-memory-port.h | 217 ++ 3 files changed, 238 insertions(+) create mode 100644 include/dt-bindings/memory/mt8186-memory-port.h -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()
On 2022/4/4 19:27, John Garry wrote: > Add max opt argument to iova_domain_init_rcaches(), and use it to set the > rcaches range. > > Also fix up all users to set this value (at 0, meaning use default), > including a wrapper for that, iova_domain_init_rcaches_default(). > > For dma-iommu.c we derive the iova_len argument from the IOMMU group > max opt DMA size. > > Signed-off-by: John Garry > --- > drivers/iommu/dma-iommu.c| 15 ++- > drivers/iommu/iova.c | 19 --- > drivers/vdpa/vdpa_user/iova_domain.c | 4 ++-- > include/linux/iova.h | 3 ++- > 4 files changed, 34 insertions(+), 7 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 42ca42ff1b5d..19f35624611c 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain > *domain, dma_addr_t base, > struct iommu_dma_cookie *cookie = domain->iova_cookie; > unsigned long order, base_pfn; > struct iova_domain *iovad; > + size_t max_opt_dma_size; > + unsigned long iova_len = 0; > int ret; > > if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) > @@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain > *domain, dma_addr_t base, > } > > init_iova_domain(iovad, 1UL << order, base_pfn); > - ret = iova_domain_init_rcaches(iovad); > + > + max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group); > + if (max_opt_dma_size) { > + unsigned long shift = __ffs(1UL << order); > + > + iova_len = roundup_pow_of_two(max_opt_dma_size); > + iova_len >>= shift; > + if (!iova_len) > + iova_len = 1; How about move "iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);" here? So that, iova_domain_init_rcaches() can remain the same. And iova_domain_init_rcaches_default() does not need to be added. 
> + } > + > + ret = iova_domain_init_rcaches(iovad, iova_len); > if (ret) > return ret; > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index 5c22b9187b79..d65e79e132ee 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine > *mag, unsigned long pfn) > mag->pfns[mag->size++] = pfn; > } > > -int iova_domain_init_rcaches(struct iova_domain *iovad) > +static unsigned long iova_len_to_rcache_max(unsigned long iova_len) > +{ > + return order_base_2(iova_len) + 1; > +} > + > +int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long > iova_len) > { > unsigned int cpu; > int i, ret; > > - iovad->rcache_max_size = 6; /* Arbitrarily high default */ > + if (iova_len) > + iovad->rcache_max_size = iova_len_to_rcache_max(iova_len); > + else > + iovad->rcache_max_size = 6; /* Arbitrarily high default */ > > iovad->rcaches = kcalloc(iovad->rcache_max_size, >sizeof(struct iova_rcache), > @@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad) > free_iova_rcaches(iovad); > return ret; > } > -EXPORT_SYMBOL_GPL(iova_domain_init_rcaches); > + > +int iova_domain_init_rcaches_default(struct iova_domain *iovad) > +{ > + return iova_domain_init_rcaches(iovad, 0); > +} > +EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default); > > /* > * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c > b/drivers/vdpa/vdpa_user/iova_domain.c > index 6daa3978d290..3a2acef98a4a 100644 > --- a/drivers/vdpa/vdpa_user/iova_domain.c > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > @@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t > bounce_size) > spin_lock_init(&domain->iotlb_lock); > init_iova_domain(&domain->stream_iovad, > PAGE_SIZE, IOVA_START_PFN); > - ret = iova_domain_init_rcaches(&domain->stream_iovad); > + ret = iova_domain_init_rcaches_default(&domain->stream_iovad); > if (ret) > goto err_iovad_stream; > init_iova_domain(&domain->consistent_iovad, > PAGE_SIZE, bounce_pfns); > - ret = iova_domain_init_rcaches(&domain->consistent_iovad); > + ret = iova_domain_init_rcaches_default(&domain->consistent_iovad); > if (ret) > goto err_iovad_consistent; > > diff --git a/include/linux/iova.h b/include/linux/iova.h > index 02f7222fa85a..56281434ce0c 100644 > --- a/include/linux/iova.h > +++ b/include/linux/iova.h > @@ -95,7 +95,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, > unsigned long pfn_lo, > unsigned long pfn_hi); > void init_iova_domain(struct iova_domain *iovad, unsigned long granule, > unsigned long start_pfn); > -int iov
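A sketch of the restructuring suggested above (untested; it assumes
iova_len_to_rcache_max() is made visible to dma-iommu.c, and that
iova_domain_init_rcaches() then applies its default of 6 only when
rcache_max_size is still zero):

  	/* In iommu_dma_init_domain(), per the review comment above */
  	init_iova_domain(iovad, 1UL << order, base_pfn);

  	max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
  	if (max_opt_dma_size) {
  		unsigned long shift = __ffs(1UL << order);
  		unsigned long iova_len;

  		iova_len = roundup_pow_of_two(max_opt_dma_size) >> shift;
  		if (!iova_len)
  			iova_len = 1;
  		iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
  	}

  	ret = iova_domain_init_rcaches(iovad);

This keeps the iova_domain_init_rcaches() signature unchanged for the other
callers, at the cost of exporting one small helper from iova.c.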
Re: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs
On 2022/4/4 19:27, John Garry wrote: > Add support to allow the maximum optimised DMA len be set for an IOMMU > group via sysfs. > > This is much the same with the method to change the default domain type > for a group. > > Signed-off-by: John Garry > --- > .../ABI/testing/sysfs-kernel-iommu_groups | 16 + > drivers/iommu/iommu.c | 59 ++- > include/linux/iommu.h | 6 ++ > 3 files changed, 79 insertions(+), 2 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups > b/Documentation/ABI/testing/sysfs-kernel-iommu_groups > index b15af6a5bc08..ed6f72794f6c 100644 > --- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups > +++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups > @@ -63,3 +63,19 @@ Description: /sys/kernel/iommu_groups//type > shows the type of default > system could lead to catastrophic effects (the users might > need to reboot the machine to get it to normal state). So, it's > expected that the users understand what they're doing. > + > +What:/sys/kernel/iommu_groups//max_opt_dma_size > +Date:Feb 2022 > +KernelVersion: v5.18 > +Contact: iommu@lists.linux-foundation.org > +Description: /sys/kernel/iommu_groups//max_opt_dma_size shows the > + max optimised DMA size for the default IOMMU domain associated > + with the group. > + Each IOMMU domain has an IOVA domain. The IOVA domain caches > + IOVAs upto a certain size as a performance optimisation. > + This sysfs file allows the range of the IOVA domain caching be > + set, such that larger than default IOVAs may be cached. > + A value of 0 means that the default caching range is chosen. > + A privileged user could request the kernel the change the range > + by writing to this file. For this to happen, the same rules > + and procedure applies as in changing the default domain type. 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 10bb10c2a210..7c7258f19bed 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -48,6 +48,7 @@ struct iommu_group { > struct iommu_domain *default_domain; > struct iommu_domain *domain; > struct list_head entry; > + size_t max_opt_dma_size; > }; > > struct group_device { > @@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct > iommu_group *group, > static struct iommu_group *iommu_group_get_for_dev(struct device *dev); > static ssize_t iommu_group_store_type(struct iommu_group *group, > const char *buf, size_t count); > +static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group, > + const char *buf, > + size_t count); > > #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\ > struct iommu_group_attribute iommu_group_attr_##_name = \ > @@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group > *group, > return strlen(type); > } > > +static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group, > + char *buf) > +{ > + return sprintf(buf, "%zu\n", group->max_opt_dma_size); > +} > + > static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL); > > static IOMMU_GROUP_ATTR(reserved_regions, 0444, > @@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444, > static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type, > iommu_group_store_type); > > +static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, > iommu_group_show_max_opt_dma_size, > + iommu_group_store_max_opt_dma_size); > + > static void iommu_group_release(struct kobject *kobj) > { > struct iommu_group *group = to_iommu_group(kobj); > @@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void) > if (ret) > return ERR_PTR(ret); > > + ret = iommu_group_create_file(group, > &iommu_group_attr_max_opt_dma_size); > + if (ret) > + return ERR_PTR(ret); > + > pr_debug("Allocated group %d\n", group->id); > > return group; > @@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct > device *dev) > return dev->iommu_group->default_domain; > } > > +size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group) > +{ > + return group->max_opt_dma_size; > +} > + > /* > * IOMMU groups are really the natural working unit of the IOMMU, but > * the IOMMU API works on domains and devices. Bridge that gap by > @@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); > * @prev_dev: The device in the group (this is used to make sure that the > device >
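Usage then mirrors the existing type attribute; a hypothetical userspace
sketch (the group id and size are invented for illustration, and per the ABI
text the group's devices must be unbound first, following the same procedure
as changing the default domain type):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
  	/* Hypothetical group id */
  	const char *attr = "/sys/kernel/iommu_groups/7/max_opt_dma_size";
  	const char *size = "1048576";	/* request caching of IOVAs up to 1 MiB */
  	int fd = open(attr, O_WRONLY);

  	if (fd < 0) {
  		perror("open");
  		return 1;
  	}
  	if (write(fd, size, strlen(size)) < 0)
  		perror("write");
  	close(fd);
  	return 0;
  }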
RE: [PATCH 4/4] iommu/arm-smmu-v3: cleanup arm_smmu_dev_{enable, disable}_feature
> From: Christoph Hellwig > Sent: Thursday, April 7, 2022 2:26 PM > > Fold the arm_smmu_dev_has_feature arm_smmu_dev_feature_enabled > into > the main methods. > > Signed-off-by: Christoph Hellwig > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 57 ++--- > 1 file changed, 15 insertions(+), 42 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 1ea184bbf750a6..8e201c660139ae 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -2760,58 +2760,27 @@ static void arm_smmu_get_resv_regions(struct > device *dev, > iommu_dma_get_resv_regions(dev, head); > } > > -static bool arm_smmu_dev_has_feature(struct device *dev, > - enum iommu_dev_features feat) > -{ > - struct arm_smmu_master *master = dev_iommu_priv_get(dev); > - > - if (!master) > - return false; > - > - switch (feat) { > - case IOMMU_DEV_FEAT_IOPF: > - return arm_smmu_master_iopf_supported(master); > - case IOMMU_DEV_FEAT_SVA: > - return arm_smmu_master_sva_supported(master); > - default: > - return false; > - } > -} > - > -static bool arm_smmu_dev_feature_enabled(struct device *dev, > - enum iommu_dev_features feat) > -{ > - struct arm_smmu_master *master = dev_iommu_priv_get(dev); > - > - if (!master) > - return false; > - > - switch (feat) { > - case IOMMU_DEV_FEAT_IOPF: > - return master->iopf_enabled; > - case IOMMU_DEV_FEAT_SVA: > - return arm_smmu_master_sva_enabled(master); > - default: > - return false; > - } > -} > - > static int arm_smmu_dev_enable_feature(struct device *dev, > enum iommu_dev_features feat) > { > struct arm_smmu_master *master = dev_iommu_priv_get(dev); > > - if (!arm_smmu_dev_has_feature(dev, feat)) > - return -ENODEV; > - > - if (arm_smmu_dev_feature_enabled(dev, feat)) > - return -EBUSY; > + if (!master) > + return -EINVAL; Old logic returns -ENODEV but it's changed to -EINVAL here. Is it intended? If yes, probably mention it in the patch description though just a small semantics change. > > switch (feat) { > case IOMMU_DEV_FEAT_IOPF: > + if (!arm_smmu_master_iopf_supported(master)) > + return -EINVAL; > + if (master->iopf_enabled) > + return -EBUSY; > master->iopf_enabled = true; > return 0; > case IOMMU_DEV_FEAT_SVA: > + if (!arm_smmu_master_sva_supported(master)) > + return -EINVAL; > + if (arm_smmu_master_sva_enabled(master)) > + return -EBUSY; > return arm_smmu_master_enable_sva(master); > default: > return -EINVAL; > @@ -2823,16 +2792,20 @@ static int > arm_smmu_dev_disable_feature(struct device *dev, > { > struct arm_smmu_master *master = dev_iommu_priv_get(dev); > > - if (!arm_smmu_dev_feature_enabled(dev, feat)) > + if (!master) > return -EINVAL; > > switch (feat) { > case IOMMU_DEV_FEAT_IOPF: > + if (!master->iopf_enabled) > + return -EINVAL; > if (master->sva_enabled) > return -EBUSY; > master->iopf_enabled = false; > return 0; > case IOMMU_DEV_FEAT_SVA: > + if (!arm_smmu_master_sva_enabled(master)) > + return -EINVAL; > return arm_smmu_master_disable_sva(master); > default: > return -EINVAL; > -- > 2.30.2 > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 3/4] iommu: remove the put_resv_regions method
> From: Christoph Hellwig
> Sent: Thursday, April 7, 2022 2:26 PM
>
> All drivers that implement get_resv_regions just use
> generic_put_resv_regions to implement the put side. Remove the
> indirections and document the allocations constraints.

It looks like there is no documentation left after the removal:

> void iommu_put_resv_regions(struct device *dev, struct list_head *list)
> -{
> -	const struct iommu_ops *ops = dev_iommu_ops(dev);
> -
> -	if (ops->put_resv_regions)
> -		ops->put_resv_regions(dev, list);
> -}
> -
> -/**
> - * generic_iommu_put_resv_regions - Reserved region driver helper
> - * @dev: device for which to free reserved regions
> - * @list: reserved region list for device
> - *
> - * IOMMU drivers can use this to implement their .put_resv_regions() callback
> - * for simple reservations. Memory allocated for each reserved region will be
> - * freed. If an IOMMU driver allocates additional resources per region, it is
> - * going to have to implement a custom callback.
> - */
> -void generic_iommu_put_resv_regions(struct device *dev, struct list_head *list)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 34/34] iommu/mediatek: mt8195: Enable multi banks for infra iommu
Enable the multi-bank function for the infra-iommu. We put PCIe in bank0 and USB in the last bank (bank4). We don't use the other banks currently, so disable them.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 027bbbced80d..22c95ed78b3c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1449,8 +1449,11 @@ static const struct mtk_iommu_plat_data mt8195_data_infra = {
 			  MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
 	.pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
 	.inv_sel_reg      = REG_MMU_INV_SEL_GEN2,
-	.banks_num        = 1,
-	.banks_enable     = {true},
+	.banks_num        = 5,
+	.banks_enable     = {true, false, false, false, true},
+	.banks_portmsk    = {[0] = GENMASK(19, 16), /* PCIe */
+			     [4] = GENMASK(31, 20), /* USB */
+			    },
 	.iova_region      = single_domain,
 	.iova_region_nr   = ARRAY_SIZE(single_domain),
 };
--
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 33/34] iommu/mediatek: Backup/restore registers for multi banks
Each bank has some independent registers. thus backup/restore them for each a bank when suspend and resume. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 46 ++- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 028dc642a31e..027bbbced80d 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -173,11 +173,12 @@ struct mtk_iommu_suspend_reg { u32 misc_ctrl; u32 dcm_dis; u32 ctrl_reg; - u32 int_control0; - u32 int_main_control; - u32 ivrp_paddr; u32 vld_pa_rng; u32 wr_len_ctrl; + + u32 int_control[MTK_IOMMU_BANK_MAX]; + u32 int_main_control[MTK_IOMMU_BANK_MAX]; + u32 ivrp_paddr[MTK_IOMMU_BANK_MAX]; }; struct mtk_iommu_plat_data { @@ -1292,16 +1293,23 @@ static int __maybe_unused mtk_iommu_runtime_suspend(struct device *dev) { struct mtk_iommu_data *data = dev_get_drvdata(dev); struct mtk_iommu_suspend_reg *reg = &data->reg; - void __iomem *base = data->bank[0].base; + void __iomem *base; + int i = 0; + base = data->bank[i].base; reg->wr_len_ctrl = readl_relaxed(base + REG_MMU_WR_LEN_CTRL); reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL); reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS); reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG); - reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0); - reg->int_main_control = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL); - reg->ivrp_paddr = readl_relaxed(base + REG_MMU_IVRP_PADDR); reg->vld_pa_rng = readl_relaxed(base + REG_MMU_VLD_PA_RNG); + do { + if (!data->plat_data->banks_enable[i]) + continue; + base = data->bank[i].base; + reg->int_control[i] = readl_relaxed(base + REG_MMU_INT_CONTROL0); + reg->int_main_control[i] = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL); + reg->ivrp_paddr[i] = readl_relaxed(base + REG_MMU_IVRP_PADDR); + } while (++i < data->plat_data->banks_num); clk_disable_unprepare(data->bclk); return 0; } @@ -1310,9 +1318,9 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) { struct mtk_iommu_data *data = dev_get_drvdata(dev); struct mtk_iommu_suspend_reg *reg = &data->reg; - struct mtk_iommu_domain *m4u_dom = data->bank[0].m4u_dom; - void __iomem *base = data->bank[0].base; - int ret; + struct mtk_iommu_domain *m4u_dom; + void __iomem *base; + int ret, i = 0; ret = clk_prepare_enable(data->bclk); if (ret) { @@ -1324,18 +1332,26 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) * Uppon first resume, only enable the clk and return, since the values of the * registers are not yet set. 
*/ - if (!m4u_dom) + if (!reg->wr_len_ctrl) return 0; + base = data->bank[i].base; writel_relaxed(reg->wr_len_ctrl, base + REG_MMU_WR_LEN_CTRL); writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL); writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS); writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG); - writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0); - writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL); - writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR); writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG); - writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR); + do { + m4u_dom = data->bank[i].m4u_dom; + if (!data->plat_data->banks_enable[i] || !m4u_dom) + continue; + base = data->bank[i].base; + writel_relaxed(reg->int_control[i], base + REG_MMU_INT_CONTROL0); + writel_relaxed(reg->int_main_control[i], base + REG_MMU_INT_MAIN_CONTROL); + writel_relaxed(reg->ivrp_paddr[i], base + REG_MMU_IVRP_PADDR); + writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, + base + REG_MMU_PT_BASE_ADDR); + } while (++i < data->plat_data->banks_num); /* * Users may allocate dma buffer before they call pm_runtime_get, -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 32/34] iommu/mediatek: Initialise/Remove for multi bank dev
The registers for each bank of the IOMMU base are in order, delta is 0x1000. Initialise the base for each bank. For all the previous SoC, we only have bank0. thus use "do {} while()" to allow bank0 always go. When removing the device, Not always all the banks are initialised, it depend on if there is masters for that bank. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 44 ++- 1 file changed, 30 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d42b3d35a36e..028dc642a31e 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -113,6 +113,7 @@ #define F_MMU_INT_ID_PORT_ID(a)(((a) >> 2) & 0x1f) #define MTK_PROTECT_PA_ALIGN 256 +#define MTK_IOMMU_BANK_SZ 0x1000 #define PERICFG_IOMMU_10x714 @@ -1104,7 +1105,7 @@ static int mtk_iommu_probe(struct platform_device *pdev) struct component_match *match = NULL; struct regmap *infracfg; void*protect; - int ret, banks_num; + int ret, banks_num, i = 0; u32 val; char*p; struct mtk_iommu_bank_data *bank; @@ -1145,27 +1146,36 @@ static int mtk_iommu_probe(struct platform_device *pdev) data->enable_4GB = !!(val & F_DDR_4GB_SUPPORT_EN); } + banks_num = data->plat_data->banks_num; res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (resource_size(res) < banks_num * MTK_IOMMU_BANK_SZ) { + dev_err(dev, "banknr %d. res %pR is not enough.\n", banks_num, res); + return -EINVAL; + } base = devm_ioremap_resource(dev, res); if (IS_ERR(base)) return PTR_ERR(base); ioaddr = res->start; - banks_num = data->plat_data->banks_num; data->bank = devm_kmalloc(dev, banks_num * sizeof(*data->bank), GFP_KERNEL); if (!data->bank) return -ENOMEM; - bank = &data->bank[0]; - bank->id = 0; - bank->base = base; - bank->m4u_dom = NULL; - bank->irq = platform_get_irq(pdev, 0); - if (bank->irq < 0) - return bank->irq; - bank->parent_dev = dev; - bank->parent_data = data; - spin_lock_init(&bank->tlb_lock); + do { + if (!data->plat_data->banks_enable[i]) + continue; + bank = &data->bank[i]; + bank->id = i; + bank->base = base + i * MTK_IOMMU_BANK_SZ; + bank->m4u_dom = NULL; + + bank->irq = platform_get_irq(pdev, i); + if (bank->irq < 0) + return bank->irq; + bank->parent_dev = dev; + bank->parent_data = data; + spin_lock_init(&bank->tlb_lock); + } while (++i < banks_num); if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_BCLK)) { data->bclk = devm_clk_get(dev, "bclk"); @@ -1251,7 +1261,8 @@ static int mtk_iommu_probe(struct platform_device *pdev) static int mtk_iommu_remove(struct platform_device *pdev) { struct mtk_iommu_data *data = platform_get_drvdata(pdev); - struct mtk_iommu_bank_data *bank = &data->bank[0]; + struct mtk_iommu_bank_data *bank; + int i; iommu_device_sysfs_remove(&data->iommu); iommu_device_unregister(&data->iommu); @@ -1268,7 +1279,12 @@ static int mtk_iommu_remove(struct platform_device *pdev) #endif } pm_runtime_disable(&pdev->dev); - devm_free_irq(&pdev->dev, bank->irq, bank); + for (i = 0; i < data->plat_data->banks_num; i++) { + bank = &data->bank[i]; + if (!bank->m4u_dom) + continue; + devm_free_irq(&pdev->dev, bank->irq, bank); + } return 0; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 31/34] iommu/mediatek: Get the proper bankid for multi banks
We preassign some ports in a special bank via the new defined banks_portmsk. Put it in the plat_data means it is not expected to be adjusted dynamically. If the iommu id in the iommu consumer's dtsi node is inside this banks_portmsk, then we switch it to this special iommu bank, and initialise the IOMMU bank HW. Each a bank has the independent pgtable(4GB iova range). Each a bank is a independent iommu domain/group. Currently we don't separate different iova ranges inside a bank. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 39 --- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 0828cff97625..d42b3d35a36e 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -191,6 +191,7 @@ struct mtk_iommu_plat_data { u8 banks_num; boolbanks_enable[MTK_IOMMU_BANK_MAX]; + unsigned intbanks_portmsk[MTK_IOMMU_BANK_MAX]; unsigned char larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX]; }; @@ -467,6 +468,30 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) return IRQ_HANDLED; } +static unsigned int mtk_iommu_get_bank_id(struct device *dev, + const struct mtk_iommu_plat_data *plat_data) +{ + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + unsigned int i, portmsk = 0, bankid = 0; + + if (plat_data->banks_num == 1) + return bankid; + + for (i = 0; i < fwspec->num_ids; i++) + portmsk |= BIT(MTK_M4U_TO_PORT(fwspec->ids[i])); + + for (i = 0; i < plat_data->banks_num && i < MTK_IOMMU_BANK_MAX; i++) { + if (!plat_data->banks_enable[i]) + continue; + + if (portmsk & plat_data->banks_portmsk[i]) { + bankid = i; + break; + } + } + return bankid; /* default is 0 */ +} + static int mtk_iommu_get_iova_region_id(struct device *dev, const struct mtk_iommu_plat_data *plat_data) { @@ -619,13 +644,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, struct list_head *hw_list = data->hw_list; struct device *m4udev = data->dev; struct mtk_iommu_bank_data *bank; - unsigned int bankid = 0; + unsigned int bankid; int ret, region_id; region_id = mtk_iommu_get_iova_region_id(dev, data->plat_data); if (region_id < 0) return region_id; + bankid = mtk_iommu_get_bank_id(dev, data->plat_data); mutex_lock(&dom->mutex); if (!dom->bank) { /* Data is in the frstdata in sharing pgtable case. */ @@ -802,6 +828,7 @@ static struct iommu_group *mtk_iommu_device_group(struct device *dev) struct mtk_iommu_data *c_data = dev_iommu_priv_get(dev), *data; struct list_head *hw_list = c_data->hw_list; struct iommu_group *group; + unsigned int bankid, groupid; int regionid; data = mtk_iommu_get_frst_data(hw_list); @@ -812,12 +839,18 @@ static struct iommu_group *mtk_iommu_device_group(struct device *dev) if (regionid < 0) return ERR_PTR(regionid); + bankid = mtk_iommu_get_bank_id(dev, data->plat_data); mutex_lock(&data->mutex); - group = data->m4u_group[regionid]; + /* +* If the bank function is enabled, each a bank is a iommu group/domain. +* otherwise, each a iova region is a iommu group/domain. +*/ + groupid = bankid ? bankid : regionid; + group = data->m4u_group[groupid]; if (!group) { group = iommu_group_alloc(); if (!IS_ERR(group)) - data->m4u_group[regionid] = group; + data->m4u_group[groupid] = group; } else { iommu_group_ref_get(group); } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
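To see the selection logic in action outside the kernel, here is a small
self-contained demo using the mt8195 infra masks from patch 34/34 (PCIe
ports in bank 0, USB in bank 4); the port value is made up for illustration:

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define GENMASK(h, l)	(((~0U) << (l)) & (~0U >> (31 - (h))))

  int main(void)
  {
  	const uint32_t banks_portmsk[5] = {
  		[0] = GENMASK(19, 16),	/* PCIe ports */
  		[4] = GENMASK(31, 20),	/* USB ports */
  	};
  	const bool banks_enable[5] = { true, false, false, false, true };
  	uint32_t portmsk = 1U << 21;	/* an example USB port id */
  	unsigned int i, bankid = 0;	/* default is bank 0 */

  	for (i = 0; i < 5; i++) {
  		if (!banks_enable[i])
  			continue;
  		if (portmsk & banks_portmsk[i]) {
  			bankid = i;
  			break;
  		}
  	}
  	printf("port mask %#x -> bank %u\n", portmsk, bankid);
  	return 0;
  }

Running it prints "port mask 0x200000 -> bank 4", matching what
mtk_iommu_get_bank_id() in the patch would compute for that port.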
[PATCH v6 30/34] iommu/mediatek: Change the domid to iova_region_id
Prepare for adding bankid, also no functional change. In the previous SoC, each a iova_region is a domain; In the multi-banks case, each a bank is a domain, then the original function name "mtk_iommu_get_domain_id" is not proper. Use "iova_region_id" instead of "domain_id". Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 46 +++ 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 9c27b99ca0cd..0828cff97625 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -467,8 +467,8 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) return IRQ_HANDLED; } -static int mtk_iommu_get_domain_id(struct device *dev, - const struct mtk_iommu_plat_data *plat_data) +static int mtk_iommu_get_iova_region_id(struct device *dev, + const struct mtk_iommu_plat_data *plat_data) { const struct mtk_iommu_iova_region *rgn = plat_data->iova_region; const struct bus_dma_region *dma_rgn = dev->dma_range_map; @@ -498,7 +498,7 @@ static int mtk_iommu_get_domain_id(struct device *dev, } static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev, - bool enable, unsigned int domid) + bool enable, unsigned int regionid) { struct mtk_smi_larb_iommu*larb_mmu; unsigned int larbid, portid; @@ -514,12 +514,12 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev, if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) { larb_mmu = &data->larb_imu[larbid]; - region = data->plat_data->iova_region + domid; + region = data->plat_data->iova_region + regionid; larb_mmu->bank[portid] = upper_32_bits(region->iova_base); - dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n", + dev_dbg(dev, "%s iommu for larb(%s) port %d region %d rgn-bank %d.\n", enable ? "enable" : "disable", dev_name(larb_mmu->dev), - portid, domid, larb_mmu->bank[portid]); + portid, regionid, larb_mmu->bank[portid]); if (enable) larb_mmu->mmu |= MTK_SMI_MMU_EN(portid); @@ -545,7 +545,7 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev, static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom, struct mtk_iommu_data *data, -unsigned int domid) +unsigned int region_id) { const struct mtk_iommu_iova_region *region; struct mtk_iommu_domain *m4u_dom; @@ -584,7 +584,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom, update_iova_region: /* Update the iova region for this domain */ - region = data->plat_data->iova_region + domid; + region = data->plat_data->iova_region + region_id; dom->domain.geometry.aperture_start = region->iova_base; dom->domain.geometry.aperture_end = region->iova_base + region->size - 1; dom->domain.geometry.force_aperture = true; @@ -620,18 +620,18 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, struct device *m4udev = data->dev; struct mtk_iommu_bank_data *bank; unsigned int bankid = 0; - int ret, domid; + int ret, region_id; - domid = mtk_iommu_get_domain_id(dev, data->plat_data); - if (domid < 0) - return domid; + region_id = mtk_iommu_get_iova_region_id(dev, data->plat_data); + if (region_id < 0) + return region_id; mutex_lock(&dom->mutex); if (!dom->bank) { /* Data is in the frstdata in sharing pgtable case. 
*/ frstdata = mtk_iommu_get_frst_data(hw_list); - ret = mtk_iommu_domain_finalise(dom, frstdata, domid); + ret = mtk_iommu_domain_finalise(dom, frstdata, region_id); if (ret) { mutex_unlock(&dom->mutex); return -ENODEV; @@ -662,7 +662,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, } mutex_unlock(&data->mutex); - return mtk_iommu_config(data, dev, true, domid); + return mtk_iommu_config(data, dev, true, region_id); err_unlock: mutex_unlock(&data->mutex); @@ -802,22 +802,22 @@ static struct iommu_group *mtk_iommu_device_group(struct device *dev) struct mtk_iommu_data *c_data = dev_iommu_priv_get(dev), *data; struct list_head *hw_list = c_data->hw_list; struct iom
[PATCH v6 28/34] iommu/mediatek: Add mtk_iommu_bank_data structure
Prepare for supporting multi-banks for the IOMMU HW, No functional change. Add a new structure(mtk_iommu_bank_data) for each a bank. Each a bank have the independent HW base/IRQ/tlb-range ops, and each a bank has its special iommu-domain(independent pgtable), thus, also move the domain information into it. In previous SoC, we have only one bank which could be treated as bank0( bankid always is 0 for the previous SoC). After adding this structure, the tlb operations and irq could use bank_data as parameter. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 179 +- 1 file changed, 117 insertions(+), 62 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d46eb745492f..f2a29399f10f 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -151,6 +151,7 @@ #define MTK_LARB_SUBCOM_MAX8 #define MTK_IOMMU_GROUP_MAX8 +#define MTK_IOMMU_BANK_MAX 5 enum mtk_iommu_plat { M4U_MT2712, @@ -187,25 +188,36 @@ struct mtk_iommu_plat_data { struct list_head*hw_list; unsigned intiova_region_nr; const struct mtk_iommu_iova_region *iova_region; + + u8 banks_num; + boolbanks_enable[MTK_IOMMU_BANK_MAX]; unsigned char larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX]; }; -struct mtk_iommu_data { +struct mtk_iommu_bank_data { void __iomem*base; int irq; + u8 id; + struct device *parent_dev; + struct mtk_iommu_data *parent_data; + spinlock_t tlb_lock; /* lock for tlb range flush */ + struct mtk_iommu_domain *m4u_dom; /* Each bank has a domain */ +}; + +struct mtk_iommu_data { struct device *dev; struct clk *bclk; phys_addr_t protect_base; /* protect memory base */ struct mtk_iommu_suspend_regreg; - struct mtk_iommu_domain *m4u_dom; struct iommu_group *m4u_group[MTK_IOMMU_GROUP_MAX]; boolenable_4GB; - spinlock_t tlb_lock; /* lock for tlb range flush */ struct iommu_device iommu; const struct mtk_iommu_plat_data *plat_data; struct device *smicomm_dev; + struct mtk_iommu_bank_data *bank; + struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */ struct regmap *pericfg; @@ -225,7 +237,7 @@ struct mtk_iommu_domain { struct io_pgtable_cfg cfg; struct io_pgtable_ops *iop; - struct mtk_iommu_data *data; + struct mtk_iommu_bank_data *bank; struct iommu_domain domain; struct mutexmutex; /* Protect "data" in this structure */ @@ -311,20 +323,24 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom) static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data) { - void __iomem *base = data->base; + /* Tlb flush all always is in bank0. 
*/ + struct mtk_iommu_bank_data *bank = &data->bank[0]; + void __iomem *base = bank->base; unsigned long flags; - spin_lock_irqsave(&data->tlb_lock, flags); + spin_lock_irqsave(&bank->tlb_lock, flags); writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + data->plat_data->inv_sel_reg); writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE); wmb(); /* Make sure the tlb flush all done */ - spin_unlock_irqrestore(&data->tlb_lock, flags); + spin_unlock_irqrestore(&bank->tlb_lock, flags); } static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size, - struct mtk_iommu_data *data) + struct mtk_iommu_bank_data *bank) { - struct list_head *head = data->hw_list; + struct list_head *head = bank->parent_data->hw_list; + struct mtk_iommu_bank_data *curbank; + struct mtk_iommu_data *data; bool check_pm_status; unsigned long flags; void __iomem *base; @@ -354,9 +370,10 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size, continue; } - base = data->base; + curbank = &data->bank[bank->id]; + base = curbank->base; - spin_lock_irqsave(&data->tlb_lock, flags); + spin_lock_irqsave(&curbank->tlb_lock, flags); writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + data->plat_data->inv_sel_reg); @@ -371,7 +388,7 @@ static void mtk
[PATCH v6 29/34] iommu/mediatek: Initialise bank HW for each a bank
The mt8195 IOMMU HW supports up to 5 banks, and regarding the banks' registers, it looks like:

   +-------+-------+-------+-------+-------+
   | bank0 | bank1 | bank2 | bank3 | bank4 |
   +-------+-------+-------+-------+-------+
   |global |
   |control|              null
   |regs   |
   +-------+-------+-------+-------+-------+
   | bank  | bank  | bank  | bank  | bank  |
   | regs  | regs  | regs  | regs  | regs  |
   |       |       |       |       |       |
   +-------+-------+-------+-------+-------+

Each bank has some special bank registers, and it shares bank0's global control registers. This patch initialises the bank HW with the bankid. In hw_init, we always initialise bank0's control registers since we don't know whether bank0 has been initialised.

Additionally, each bank's register base is always at a delta of 0x1000, i.e. bank[x + 1] = bank[x] + 0x1000.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f2a29399f10f..9c27b99ca0cd 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -259,7 +259,7 @@ static void mtk_iommu_unbind(struct device *dev)
 
 static const struct iommu_ops mtk_iommu_ops;
 
-static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
+static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int bankid);
 
 #define MTK_IOMMU_TLB_ADDR(iova) ({					\
 	dma_addr_t _addr = iova;					\
@@ -642,12 +642,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 
 	mutex_lock(&data->mutex);
 	bank = &data->bank[bankid];
-	if (!bank->m4u_dom) { /* Initialize the M4U HW */
+	if (!bank->m4u_dom) { /* Initialize the M4U HW for each BANK */
 		ret = pm_runtime_resume_and_get(m4udev);
-		if (ret < 0)
+		if (ret < 0) {
+			dev_err(m4udev, "pm get fail(%d) in attach.\n", ret);
 			goto err_unlock;
+		}
 
-		ret = mtk_iommu_hw_init(data);
+		ret = mtk_iommu_hw_init(data, bankid);
 		if (ret) {
 			pm_runtime_put(m4udev);
 			goto err_unlock;
@@ -897,11 +899,16 @@ static const struct iommu_ops mtk_iommu_ops = {
 	}
 };
 
-static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
+static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int bankid)
 {
+	const struct mtk_iommu_bank_data *bankx = &data->bank[bankid];
 	const struct mtk_iommu_bank_data *bank0 = &data->bank[0];
 	u32 regval;
 
+	/*
+	 * Global control settings are in bank0. We may re-init these global
+	 * registers since we are not sure whether there are bank0 consumers.
+*/ if (data->plat_data->m4u_plat == M4U_MT8173) { regval = F_MMU_PREFETCH_RT_REPLACE_MOD | F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173; @@ -944,13 +951,14 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data) } writel_relaxed(regval, bank0->base + REG_MMU_MISC_CTRL); + /* Independent settings for each bank */ regval = F_L2_MULIT_HIT_EN | F_TABLE_WALK_FAULT_INT_EN | F_PREETCH_FIFO_OVERFLOW_INT_EN | F_MISS_FIFO_OVERFLOW_INT_EN | F_PREFETCH_FIFO_ERR_INT_EN | F_MISS_FIFO_ERR_INT_EN; - writel_relaxed(regval, bank0->base + REG_MMU_INT_CONTROL0); + writel_relaxed(regval, bankx->base + REG_MMU_INT_CONTROL0); regval = F_INT_TRANSLATION_FAULT | F_INT_MAIN_MULTI_HIT_FAULT | @@ -959,19 +967,19 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data) F_INT_TLB_MISS_FAULT | F_INT_MISS_TRANSACTION_FIFO_FAULT | F_INT_PRETETCH_TRANSATION_FIFO_FAULT; - writel_relaxed(regval, bank0->base + REG_MMU_INT_MAIN_CONTROL); + writel_relaxed(regval, bankx->base + REG_MMU_INT_MAIN_CONTROL); if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR)) regval = (data->protect_base >> 1) | (data->enable_4GB << 31); else regval = lower_32_bits(data->protect_base) | upper_32_bits(data->protect_base); - writel_relaxed(regval, bank0->base + REG_MMU_IVRP_PADDR); + writel_relaxed(regval, bankx->base + REG_MMU_IVRP_PADDR); - if (devm_request_irq(bank0->parent_dev, bank0->irq, mtk_iommu_isr, 0, -dev_name(bank0->parent_dev), (void *)bank0)) { - writel_relaxed(0, bank0->base + REG_MMU_PT_BASE_ADDR); - dev_err(bank0->parent_dev, "Failed @ IRQ-%d Request\n", bank0->irq); + if (devm_request_irq(bankx->parent_dev, bankx->irq, mtk_iomm
[PATCH v6 27/34] iommu/mediatek-v1: Just rename mtk_iommu to mtk_iommu_v1
No functional change. Just rename this for readable. Differentiate this from mtk_iommu.c Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu_v1.c | 211 +-- 1 file changed, 103 insertions(+), 108 deletions(-) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 3d1f0897d1cc..62669e60991f 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -85,53 +85,53 @@ */ #define M2701_IOMMU_PGT_SIZE SZ_4M -struct mtk_iommu_suspend_reg { +struct mtk_iommu_v1_suspend_reg { u32 standard_axi_mode; u32 dcm_dis; u32 ctrl_reg; u32 int_control0; }; -struct mtk_iommu_data { +struct mtk_iommu_v1_data { void __iomem*base; int irq; struct device *dev; struct clk *bclk; phys_addr_t protect_base; /* protect memory base */ - struct mtk_iommu_domain *m4u_dom; + struct mtk_iommu_v1_domain *m4u_dom; struct iommu_device iommu; struct dma_iommu_mapping*mapping; struct mtk_smi_larb_iommu larb_imu[MTK_LARB_NR_MAX]; - struct mtk_iommu_suspend_regreg; + struct mtk_iommu_v1_suspend_reg reg; }; -struct mtk_iommu_domain { +struct mtk_iommu_v1_domain { spinlock_t pgtlock; /* lock for page table */ struct iommu_domain domain; u32 *pgt_va; dma_addr_t pgt_pa; - struct mtk_iommu_data *data; + struct mtk_iommu_v1_data*data; }; -static int mtk_iommu_bind(struct device *dev) +static int mtk_iommu_v1_bind(struct device *dev) { - struct mtk_iommu_data *data = dev_get_drvdata(dev); + struct mtk_iommu_v1_data *data = dev_get_drvdata(dev); return component_bind_all(dev, &data->larb_imu); } -static void mtk_iommu_unbind(struct device *dev) +static void mtk_iommu_v1_unbind(struct device *dev) { - struct mtk_iommu_data *data = dev_get_drvdata(dev); + struct mtk_iommu_v1_data *data = dev_get_drvdata(dev); component_unbind_all(dev, &data->larb_imu); } -static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom) +static struct mtk_iommu_v1_domain *to_mtk_domain(struct iommu_domain *dom) { - return container_of(dom, struct mtk_iommu_domain, domain); + return container_of(dom, struct mtk_iommu_v1_domain, domain); } static const int mt2701_m4u_in_larb[] = { @@ -157,7 +157,7 @@ static inline int mt2701_m4u_to_port(int id) return id - mt2701_m4u_in_larb[larb]; } -static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data) +static void mtk_iommu_v1_tlb_flush_all(struct mtk_iommu_v1_data *data) { writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, data->base + REG_MMU_INV_SEL); @@ -165,8 +165,8 @@ static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data) wmb(); /* Make sure the tlb flush all done */ } -static void mtk_iommu_tlb_flush_range(struct mtk_iommu_data *data, - unsigned long iova, size_t size) +static void mtk_iommu_v1_tlb_flush_range(struct mtk_iommu_v1_data *data, +unsigned long iova, size_t size) { int ret; u32 tmp; @@ -184,16 +184,16 @@ static void mtk_iommu_tlb_flush_range(struct mtk_iommu_data *data, if (ret) { dev_warn(data->dev, "Partial TLB flush timed out, falling back to full flush\n"); - mtk_iommu_tlb_flush_all(data); + mtk_iommu_v1_tlb_flush_all(data); } /* Clear the CPE status */ writel_relaxed(0, data->base + REG_MMU_CPE_DONE); } -static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) +static irqreturn_t mtk_iommu_v1_isr(int irq, void *dev_id) { - struct mtk_iommu_data *data = dev_id; - struct mtk_iommu_domain *dom = data->m4u_dom; + struct mtk_iommu_v1_data *data = dev_id; + struct mtk_iommu_v1_domain *dom = data->m4u_dom; u32 int_state, regval, fault_iova, fault_pa; unsigned int fault_larb, fault_port; @@ -223,13 +223,13 @@ 
static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) regval |= F_INT_CLR_BIT; writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL); - mtk_iommu_tlb_flush_all(data); + mtk_iommu_v1_tlb_flush_all(data); return IRQ_HANDLED; } -static void mtk_iommu_config(struct mtk_iommu_data *data, -struct device *dev, bool enable) +static void mtk_iommu_v1_config(struct mtk_iommu_v1_data *data, + struct device *dev, boo
[PATCH v6 26/34] iommu/mediatek: Remove mtk_iommu.h
Currently there is a suspend structure in the header file. It's no need to keep a header file only for this. Move these into the c file and rm this header file. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c| 14 +- drivers/iommu/mtk_iommu.h| 32 drivers/iommu/mtk_iommu_v1.c | 11 --- 3 files changed, 21 insertions(+), 36 deletions(-) delete mode 100644 drivers/iommu/mtk_iommu.h diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index ab3b1aedfdc3..d46eb745492f 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -30,7 +31,7 @@ #include #include -#include "mtk_iommu.h" +#include #define REG_MMU_PT_BASE_ADDR 0x000 #define MMU_PT_ADDR_MASK GENMASK(31, 7) @@ -166,6 +167,17 @@ struct mtk_iommu_iova_region { unsigned long long size; }; +struct mtk_iommu_suspend_reg { + u32 misc_ctrl; + u32 dcm_dis; + u32 ctrl_reg; + u32 int_control0; + u32 int_main_control; + u32 ivrp_paddr; + u32 vld_pa_rng; + u32 wr_len_ctrl; +}; + struct mtk_iommu_plat_data { enum mtk_iommu_plat m4u_plat; u32 flags; diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h deleted file mode 100644 index 305243e18aa9.. --- a/drivers/iommu/mtk_iommu.h +++ /dev/null @@ -1,32 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (c) 2015-2016 MediaTek Inc. - * Author: Honghui Zhang - */ - -#ifndef _MTK_IOMMU_H_ -#define _MTK_IOMMU_H_ - -#include -#include -#include -#include -#include -#include -#include - -struct mtk_iommu_suspend_reg { - union { - u32 standard_axi_mode;/* v1 */ - u32 misc_ctrl;/* v2 */ - }; - u32 dcm_dis; - u32 ctrl_reg; - u32 int_control0; - u32 int_main_control; - u32 ivrp_paddr; - u32 vld_pa_rng; - u32 wr_len_ctrl; -}; - -#endif diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 6d1c09c91e1f..3d1f0897d1cc 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -7,7 +7,6 @@ * * Based on driver/iommu/mtk_iommu.c */ -#include #include #include #include @@ -28,10 +27,9 @@ #include #include #include -#include +#include #include #include -#include "mtk_iommu.h" #define REG_MMU_PT_BASE_ADDR 0x000 @@ -87,6 +85,13 @@ */ #define M2701_IOMMU_PGT_SIZE SZ_4M +struct mtk_iommu_suspend_reg { + u32 standard_axi_mode; + u32 dcm_dis; + u32 ctrl_reg; + u32 int_control0; +}; + struct mtk_iommu_data { void __iomem*base; int irq; -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
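The structure being moved exists only to park register values across a
suspend/resume cycle. For readers new to the driver, a minimal sketch of
that pattern follows; the helper names are invented here, and the two
offsets are illustrative stand-ins for the driver's REG_MMU_* values.

    #include <linux/io.h>
    #include <linux/types.h>

    struct suspend_regs {
            u32 ctrl_reg;
            u32 int_control0;
    };

    /* Save on suspend, while the block is still powered... */
    static void regs_save(void __iomem *base, struct suspend_regs *r)
    {
            r->ctrl_reg     = readl_relaxed(base + 0x110);
            r->int_control0 = readl_relaxed(base + 0x120);
    }

    /* ...and write everything back on resume, since the block lost state. */
    static void regs_restore(void __iomem *base, const struct suspend_regs *r)
    {
            writel_relaxed(r->ctrl_reg,     base + 0x110);
            writel_relaxed(r->int_control0, base + 0x120);
    }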
[PATCH v6 25/34] iommu/mediatek: Separate mtk_iommu_data for v1 and v2
Prepare for adding the structure "mtk_iommu_bank_data". No functional change. The mtk_iommu_domain in v1 and v2 are different, we could not add current data as bank[0] in v1 simplistically. Currently we have no plan to add new SoC for v1, in order to avoid affect v1 when we add many new features for v2, I totally separate v1 and v2 in this patch, there are many structures only for v2. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c| 82 +--- drivers/iommu/mtk_iommu.h| 81 --- drivers/iommu/mtk_iommu_v1.c | 29 + 3 files changed, 106 insertions(+), 86 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 6b238ad55cbe..ab3b1aedfdc3 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -146,6 +146,69 @@ #define MTK_INVALID_LARBID MTK_LARB_NR_MAX +#define MTK_LARB_COM_MAX 8 +#define MTK_LARB_SUBCOM_MAX8 + +#define MTK_IOMMU_GROUP_MAX8 + +enum mtk_iommu_plat { + M4U_MT2712, + M4U_MT6779, + M4U_MT8167, + M4U_MT8173, + M4U_MT8183, + M4U_MT8192, + M4U_MT8195, +}; + +struct mtk_iommu_iova_region { + dma_addr_t iova_base; + unsigned long long size; +}; + +struct mtk_iommu_plat_data { + enum mtk_iommu_plat m4u_plat; + u32 flags; + u32 inv_sel_reg; + + char*pericfg_comp_str; + struct list_head*hw_list; + unsigned intiova_region_nr; + const struct mtk_iommu_iova_region *iova_region; + unsigned char larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX]; +}; + +struct mtk_iommu_data { + void __iomem*base; + int irq; + struct device *dev; + struct clk *bclk; + phys_addr_t protect_base; /* protect memory base */ + struct mtk_iommu_suspend_regreg; + struct mtk_iommu_domain *m4u_dom; + struct iommu_group *m4u_group[MTK_IOMMU_GROUP_MAX]; + boolenable_4GB; + spinlock_t tlb_lock; /* lock for tlb range flush */ + + struct iommu_device iommu; + const struct mtk_iommu_plat_data *plat_data; + struct device *smicomm_dev; + + struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */ + struct regmap *pericfg; + + struct mutexmutex; /* Protect m4u_group/m4u_dom above */ + + /* +* In the sharing pgtable case, list data->list to the global list like m4ulist. +* In the non-sharing pgtable case, list data->list to the itself hw_list_head. 
+*/ + struct list_head*hw_list; + struct list_headhw_list_head; + struct list_headlist; + struct mtk_smi_larb_iommu larb_imu[MTK_LARB_NR_MAX]; +}; + struct mtk_iommu_domain { struct io_pgtable_cfg cfg; struct io_pgtable_ops *iop; @@ -156,6 +219,20 @@ struct mtk_iommu_domain { struct mutexmutex; /* Protect "data" in this structure */ }; +static int mtk_iommu_bind(struct device *dev) +{ + struct mtk_iommu_data *data = dev_get_drvdata(dev); + + return component_bind_all(dev, &data->larb_imu); +} + +static void mtk_iommu_unbind(struct device *dev) +{ + struct mtk_iommu_data *data = dev_get_drvdata(dev); + + component_unbind_all(dev, &data->larb_imu); +} + static const struct iommu_ops mtk_iommu_ops; static int mtk_iommu_hw_init(const struct mtk_iommu_data *data); @@ -193,11 +270,6 @@ static LIST_HEAD(m4ulist); /* List all the M4U HWs */ #define for_each_m4u(data, head) list_for_each_entry(data, head, list) -struct mtk_iommu_iova_region { - dma_addr_t iova_base; - unsigned long long size; -}; - static const struct mtk_iommu_iova_region single_domain[] = { {.iova_base = 0,.size = SZ_4G}, }; diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h index f2ee11cd254a..305243e18aa9 100644 --- a/drivers/iommu/mtk_iommu.h +++ b/drivers/iommu/mtk_iommu.h @@ -7,23 +7,14 @@ #ifndef _MTK_IOMMU_H_ #define _MTK_IOMMU_H_ -#include -#include #include #include #include #include -#include #include -#include #include #include -#define MTK_LARB_COM_MAX 8 -#define MTK_LARB_SUBCOM_MAX8 - -#define MTK_IOMMU_GROUP_MAX8 - struct mtk_iommu_suspend_reg { union { u32 standard_axi_mode;/* v1 */ @@ -38,76 +29,4 @@ struct mtk_iommu_suspend_reg { u32 wr_len_ctrl; }; -enum mt
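The hw_list indirection described in the comment above is the interesting
part of this patch: SoCs with SHARE_PGTABLE point hw_list at one global
list, while the others point it at their own private head. A sketch of
that setup, with made-up type and helper names, assuming the semantics
stated in the comment:

    #include <linux/list.h>

    static LIST_HEAD(m4ulist);  /* global list for SHARE_PGTABLE SoCs */

    struct iommu_inst {
            struct list_head *hw_list;      /* which list "list" lives on */
            struct list_head hw_list_head;  /* private head otherwise */
            struct list_head list;
    };

    static void inst_register(struct iommu_inst *inst, bool share_pgtable)
    {
            INIT_LIST_HEAD(&inst->hw_list_head);
            /*
             * Shared case: every instance chains onto the one global list,
             * so a TLB flush can walk all the HWs sharing the pagetable.
             */
            inst->hw_list = share_pgtable ? &m4ulist : &inst->hw_list_head;
            list_add_tail(&inst->list, inst->hw_list);
    }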
[PATCH v6 24/34] iommu/mediatek: Just move code position in hw_init
No functional change too, prepare for mt8195 IOMMU support bank functions. Some global control settings are in bank0 while the other banks have their bank independent setting. Here only move the global control settings and the independent registers together. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 48 +++ 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index cb99c1d01f28..6b238ad55cbe 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -803,30 +803,6 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data) } writel_relaxed(regval, data->base + REG_MMU_CTRL_REG); - regval = F_L2_MULIT_HIT_EN | - F_TABLE_WALK_FAULT_INT_EN | - F_PREETCH_FIFO_OVERFLOW_INT_EN | - F_MISS_FIFO_OVERFLOW_INT_EN | - F_PREFETCH_FIFO_ERR_INT_EN | - F_MISS_FIFO_ERR_INT_EN; - writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL0); - - regval = F_INT_TRANSLATION_FAULT | - F_INT_MAIN_MULTI_HIT_FAULT | - F_INT_INVALID_PA_FAULT | - F_INT_ENTRY_REPLACEMENT_FAULT | - F_INT_TLB_MISS_FAULT | - F_INT_MISS_TRANSACTION_FIFO_FAULT | - F_INT_PRETETCH_TRANSATION_FIFO_FAULT; - writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL); - - if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR)) - regval = (data->protect_base >> 1) | (data->enable_4GB << 31); - else - regval = lower_32_bits(data->protect_base) | -upper_32_bits(data->protect_base); - writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR); - if (data->enable_4GB && MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_VLD_PA_RNG)) { /* @@ -860,6 +836,30 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data) } writel_relaxed(regval, data->base + REG_MMU_MISC_CTRL); + regval = F_L2_MULIT_HIT_EN | + F_TABLE_WALK_FAULT_INT_EN | + F_PREETCH_FIFO_OVERFLOW_INT_EN | + F_MISS_FIFO_OVERFLOW_INT_EN | + F_PREFETCH_FIFO_ERR_INT_EN | + F_MISS_FIFO_ERR_INT_EN; + writel_relaxed(regval, data->base + REG_MMU_INT_CONTROL0); + + regval = F_INT_TRANSLATION_FAULT | + F_INT_MAIN_MULTI_HIT_FAULT | + F_INT_INVALID_PA_FAULT | + F_INT_ENTRY_REPLACEMENT_FAULT | + F_INT_TLB_MISS_FAULT | + F_INT_MISS_TRANSACTION_FIFO_FAULT | + F_INT_PRETETCH_TRANSATION_FIFO_FAULT; + writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL); + + if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_LEGACY_IVRP_PADDR)) + regval = (data->protect_base >> 1) | (data->enable_4GB << 31); + else + regval = lower_32_bits(data->protect_base) | +upper_32_bits(data->protect_base); + writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR); + if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0, dev_name(data->dev), (void *)data)) { writel_relaxed(0, data->base + REG_MMU_PT_BASE_ADDR); -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 23/34] iommu/mediatek: Only adjust code about register base
No functional change. Use "base" instead of the data->base. This is avoid to touch too many lines in the next patches. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 51 +-- 1 file changed, 27 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 7a8d9dda7361..cb99c1d01f28 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -227,12 +227,12 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom) static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data) { + void __iomem *base = data->base; unsigned long flags; spin_lock_irqsave(&data->tlb_lock, flags); - writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, - data->base + data->plat_data->inv_sel_reg); - writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE); + writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + data->plat_data->inv_sel_reg); + writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE); wmb(); /* Make sure the tlb flush all done */ spin_unlock_irqrestore(&data->tlb_lock, flags); } @@ -243,6 +243,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size, struct list_head *head = data->hw_list; bool check_pm_status; unsigned long flags; + void __iomem *base; int ret; u32 tmp; @@ -269,23 +270,23 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size, continue; } + base = data->base; + spin_lock_irqsave(&data->tlb_lock, flags); writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, - data->base + data->plat_data->inv_sel_reg); + base + data->plat_data->inv_sel_reg); - writel_relaxed(MTK_IOMMU_TLB_ADDR(iova), - data->base + REG_MMU_INVLD_START_A); + writel_relaxed(MTK_IOMMU_TLB_ADDR(iova), base + REG_MMU_INVLD_START_A); writel_relaxed(MTK_IOMMU_TLB_ADDR(iova + size - 1), - data->base + REG_MMU_INVLD_END_A); - writel_relaxed(F_MMU_INV_RANGE, - data->base + REG_MMU_INVALIDATE); + base + REG_MMU_INVLD_END_A); + writel_relaxed(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE); /* tlb sync */ - ret = readl_poll_timeout_atomic(data->base + REG_MMU_CPE_DONE, + ret = readl_poll_timeout_atomic(base + REG_MMU_CPE_DONE, tmp, tmp != 0, 10, 1000); /* Clear the CPE status */ - writel_relaxed(0, data->base + REG_MMU_CPE_DONE); + writel_relaxed(0, base + REG_MMU_CPE_DONE); spin_unlock_irqrestore(&data->tlb_lock, flags); if (ret) { @@ -305,23 +306,25 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) struct mtk_iommu_domain *dom = data->m4u_dom; unsigned int fault_larb = MTK_INVALID_LARBID, fault_port = 0, sub_comm = 0; u32 int_state, regval, va34_32, pa34_32; + const struct mtk_iommu_plat_data *plat_data = data->plat_data; + void __iomem *base = data->base; u64 fault_iova, fault_pa; bool layer, write; /* Read error info from registers */ - int_state = readl_relaxed(data->base + REG_MMU_FAULT_ST1); + int_state = readl_relaxed(base + REG_MMU_FAULT_ST1); if (int_state & F_REG_MMU0_FAULT_MASK) { - regval = readl_relaxed(data->base + REG_MMU0_INT_ID); - fault_iova = readl_relaxed(data->base + REG_MMU0_FAULT_VA); - fault_pa = readl_relaxed(data->base + REG_MMU0_INVLD_PA); + regval = readl_relaxed(base + REG_MMU0_INT_ID); + fault_iova = readl_relaxed(base + REG_MMU0_FAULT_VA); + fault_pa = readl_relaxed(base + REG_MMU0_INVLD_PA); } else { - regval = readl_relaxed(data->base + REG_MMU1_INT_ID); - fault_iova = readl_relaxed(data->base + REG_MMU1_FAULT_VA); - fault_pa = readl_relaxed(data->base + REG_MMU1_INVLD_PA); + regval = readl_relaxed(base + REG_MMU1_INT_ID); + fault_iova = 
readl_relaxed(base + REG_MMU1_FAULT_VA); + fault_pa = readl_relaxed(base + REG_MMU1_INVLD_PA); } layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT; write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT; - if (MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN)) { + if (MTK_IOMMU_HAS_FLAG(plat_data, IOVA_34_EN)) { va34_32 = FIELD_GET(F_MMU_INVAL_VA_34_32_MASK, fault_iova); fault_iova = fault_iova & F_MMU_INVAL_VA_31_12_MASK;
[PATCH v6 22/34] iommu/mediatek: Add mt8195 support
mt8195 has 3 IOMMUs: 2 MM IOMMUs, one for vdo and the other for vpp, plus
1 INFRA IOMMU.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 763e912d0a67..7a8d9dda7361 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1234,6 +1234,46 @@ static const struct mtk_iommu_plat_data mt8192_data = {
 			  {0, 14, 16}, {0, 13, 18, 17}},
 };
 
+static const struct mtk_iommu_plat_data mt8195_data_infra = {
+	.m4u_plat	  = M4U_MT8195,
+	.flags            = WR_THROT_EN | DCM_DISABLE | PM_CLK_AO |
+			    MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
+	.pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
+	.inv_sel_reg      = REG_MMU_INV_SEL_GEN2,
+	.iova_region      = single_domain,
+	.iova_region_nr   = ARRAY_SIZE(single_domain),
+};
+
+static const struct mtk_iommu_plat_data mt8195_data_vdo = {
+	.m4u_plat	= M4U_MT8195,
+	.flags          = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
+			  WR_THROT_EN | NOT_STD_AXI_MODE | IOVA_34_EN |
+			  SHARE_PGTABLE | MTK_IOMMU_TYPE_MM,
+	.hw_list        = &m4ulist,
+	.inv_sel_reg    = REG_MMU_INV_SEL_GEN2,
+	.iova_region    = mt8192_multi_dom,
+	.iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+	.larbid_remap   = {{2, 0}, {21}, {24}, {7}, {19}, {9, 10, 11},
+			   {13, 17, 15 /* 17b */, 25}, {5}},
+};
+
+static const struct mtk_iommu_plat_data mt8195_data_vpp = {
+	.m4u_plat	= M4U_MT8195,
+	.flags          = HAS_BCLK | HAS_SUB_COMM_3BITS | OUT_ORDER_WR_EN |
+			  WR_THROT_EN | NOT_STD_AXI_MODE | IOVA_34_EN |
+			  SHARE_PGTABLE | MTK_IOMMU_TYPE_MM,
+	.hw_list        = &m4ulist,
+	.inv_sel_reg    = REG_MMU_INV_SEL_GEN2,
+	.iova_region    = mt8192_multi_dom,
+	.iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+	.larbid_remap   = {{1}, {3},
+			   {22, MTK_INVALID_LARBID, MTK_INVALID_LARBID, MTK_INVALID_LARBID, 23},
+			   {8}, {20}, {12},
+			   /* 16: 16a; 29: 16b; 30: CCUtop0; 31: CCUtop1 */
+			   {14, 16, 29, 26, 30, 31, 18},
+			   {4, MTK_INVALID_LARBID, MTK_INVALID_LARBID, MTK_INVALID_LARBID, 6}},
+};
+
 static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
 	{ .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data},
@@ -1241,6 +1281,9 @@ static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
 	{ .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
 	{ .compatible = "mediatek,mt8192-m4u", .data = &mt8192_data},
+	{ .compatible = "mediatek,mt8195-iommu-infra", .data = &mt8195_data_infra},
+	{ .compatible = "mediatek,mt8195-iommu-vdo",   .data = &mt8195_data_vdo},
+	{ .compatible = "mediatek,mt8195-iommu-vpp",   .data = &mt8195_data_vpp},
 	{}
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 56838fad8c73..f2ee11cd254a 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -46,6 +46,7 @@ enum mtk_iommu_plat {
 	M4U_MT8173,
 	M4U_MT8183,
 	M4U_MT8192,
+	M4U_MT8195,
 };
 
 struct mtk_iommu_iova_region;
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 21/34] iommu/mediatek: Add PCIe support
Currently the code for of_iommu_configure_dev_id looks like this:

    static int of_iommu_configure_dev_id(struct device_node *master_np,
                                         struct device *dev,
                                         const u32 *id)
    {
            struct of_phandle_args iommu_spec = { .args_count = 1 };

            err = of_map_id(master_np, *id, "iommu-map",
                            "iommu-map-mask", &iommu_spec.np,
                            iommu_spec.args);
            ...
    }

It supports only one id output, BUT our PCIe HW has two IDs (one for
writing, the other for reading). I'm not sure whether we should change
of_map_id to output up to MAX_PHANDLE_ARGS ids, so add the workaround in
our own driver: in the PCIe case, enable one more bit.

Not all infra IOMMUs support PCIe, thus add a PCIe support flag here.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d975c892b332..763e912d0a67 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include <linux/pci.h>
 #include
 #include
 #include
@@ -134,6 +135,7 @@
 #define MTK_IOMMU_TYPE_MASK		(0x3 << 13)
 /* PM and clock always on. e.g. infra iommu */
 #define PM_CLK_AO			BIT(15)
+#define IFA_IOMMU_PCIE_SUPPORT		BIT(16)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x)	(!!(((pdata)->flags) & (_x)))
 
@@ -420,8 +422,11 @@ static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
 			larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
 	} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA)) {
 		peri_mmuen_msk = BIT(portid);
-		peri_mmuen = enable ? peri_mmuen_msk : 0;
 
+		/* PCI dev has only one output id, enable the next writing bit for PCIe */
+		if (dev_is_pci(dev))
+			peri_mmuen_msk |= BIT(portid + 1);
+		peri_mmuen = enable ? peri_mmuen_msk : 0;
 		ret = regmap_update_bits(data->pericfg, PERICFG_IOMMU_1,
 					 peri_mmuen_msk, peri_mmuen);
 		if (ret)
@@ -1052,6 +1057,15 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 		ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
 		if (ret)
 			goto out_bus_set_null;
+	} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
+		   MTK_IOMMU_HAS_FLAG(data->plat_data, IFA_IOMMU_PCIE_SUPPORT)) {
+#ifdef CONFIG_PCI
+		if (!iommu_present(&pci_bus_type)) {
+			ret = bus_set_iommu(&pci_bus_type, &mtk_iommu_ops);
+			if (ret) /* A PCIe failure doesn't affect platform_bus. */
+				goto out_list_del;
+		}
+#endif
 	}
 	return ret;
 
@@ -1082,6 +1096,11 @@ static int mtk_iommu_remove(struct platform_device *pdev)
 	if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
 		device_link_remove(data->smicomm_dev, &pdev->dev);
 		component_master_del(&pdev->dev, &mtk_iommu_com_ops);
+	} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
+		   MTK_IOMMU_HAS_FLAG(data->plat_data, IFA_IOMMU_PCIE_SUPPORT)) {
+#ifdef CONFIG_PCI
+		bus_set_iommu(&pci_bus_type, NULL);
+#endif
 	}
 	pm_runtime_disable(&pdev->dev);
 	devm_free_irq(&pdev->dev, data->irq, data);
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
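The two-ID detail is easy to misread in diff form, so here is the same
mask computation as a standalone snippet. is_pci stands in for
dev_is_pci(), and the bit layout (the write ID sitting one bit above the
read ID) is as described in the commit message.

    #include <stdint.h>
    #include <stdio.h>

    #define BIT(n)  (1U << (n))

    /*
     * A PCIe master is assumed to own two consecutive enable bits (read ID
     * and write ID), so the mask covers portid and portid + 1; any other
     * master uses a single bit.
     */
    static uint32_t peri_mask(unsigned int portid, int is_pci)
    {
            uint32_t msk = BIT(portid);

            if (is_pci)
                    msk |= BIT(portid + 1);
            return msk;
    }

    int main(void)
    {
            /* e.g. portid 0: a PCIe master toggles bits 0 and 1 together */
            printf("pci mask   = %#x\n", peri_mask(0, 1)); /* 0x3 */
            printf("plain mask = %#x\n", peri_mask(0, 0)); /* 0x1 */
            return 0;
    }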
[PATCH v6 20/34] iommu/mediatek: Add infra iommu support
The infra iommu enable bits of mt8195 live in the pericfg register
segment; use regmap to update them. If an infra iommu master triggers a
translation fault, there is no larbid/portid for it, thus print out the
whole register value.

Since regmap_update_bits may fail, add a return value to
mtk_iommu_config.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 36 +++++++++++++++++++++++++++++-------
 drivers/iommu/mtk_iommu.h |  2 ++
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index afb77a530f32..d975c892b332 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -112,6 +112,8 @@
 
 #define MTK_PROTECT_PA_ALIGN		256
 
+#define PERICFG_IOMMU_1			0x714
+
 #define HAS_4GB_MODE			BIT(0)
 /* HW will use the EMI clock if there isn't the "bclk". */
 #define HAS_BCLK			BIT(1)
@@ -343,8 +345,8 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
 		dev_err_ratelimited(
 			data->dev,
-			"fault type=0x%x iova=0x%llx pa=0x%llx larb=%d port=%d layer=%d %s\n",
-			int_state, fault_iova, fault_pa, fault_larb, fault_port,
+			"fault type=0x%x iova=0x%llx pa=0x%llx master=0x%x(larb=%d port=%d) layer=%d %s\n",
+			int_state, fault_iova, fault_pa, regval, fault_larb, fault_port,
 			layer, write ? "write" : "read");
 	}
 
@@ -388,14 +390,15 @@ static int mtk_iommu_get_domain_id(struct device *dev,
 	return -EINVAL;
 }
 
-static void mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
-			     bool enable, unsigned int domid)
+static int mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
+			    bool enable, unsigned int domid)
 {
 	struct mtk_smi_larb_iommu    *larb_mmu;
 	unsigned int                 larbid, portid;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	const struct mtk_iommu_iova_region *region;
-	int i;
+	u32 peri_mmuen, peri_mmuen_msk;
+	int i, ret = 0;
 
 	for (i = 0; i < fwspec->num_ids; ++i) {
 		larbid = MTK_M4U_TO_LARB(fwspec->ids[i]);
@@ -415,8 +418,19 @@ static void mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev,
 				larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
 			else
 				larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid);
+		} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA)) {
+			peri_mmuen_msk = BIT(portid);
+			peri_mmuen = enable ? peri_mmuen_msk : 0;
+
+			ret = regmap_update_bits(data->pericfg, PERICFG_IOMMU_1,
+						 peri_mmuen_msk, peri_mmuen);
+			if (ret)
+				dev_err(dev, "%s iommu(%s) inframaster 0x%x fail(%d).\n",
+					enable ? 
"enable" : "disable", + dev_name(data->dev), peri_mmuen_msk, ret); } } + return ret; } static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom, @@ -531,8 +545,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, } mutex_unlock(&data->mutex); - mtk_iommu_config(data, dev, true, domid); - return 0; + return mtk_iommu_config(data, dev, true, domid); err_unlock: mutex_unlock(&data->mutex); @@ -995,6 +1008,15 @@ static int mtk_iommu_probe(struct platform_device *pdev) ret = mtk_iommu_mm_dts_parse(dev, &match, data); if (ret) goto out_runtime_disable; + } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) && + data->plat_data->pericfg_comp_str) { + infracfg = syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str); + if (IS_ERR(infracfg)) { + ret = PTR_ERR(infracfg); + goto out_runtime_disable; + } + + data->pericfg = infracfg; } platform_set_drvdata(pdev, data); diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h index f41e32252056..56838fad8c73 100644 --- a/drivers/iommu/mtk_iommu.h +++ b/drivers/iommu/mtk_iommu.h @@ -55,6 +55,7 @@ struct mtk_iommu_plat_data { u32 flags; u32 inv_sel_reg; + char*pericfg_comp_str; struct list_head*hw_list; unsigned intiova_region_nr;
[PATCH v6 18/34] iommu/mediatek: Allow IOMMU_DOMAIN_UNMANAGED for PCIe VFIO
Allow the type IOMMU_DOMAIN_UNMANAGED, since vfio_iommu_type1.c always
calls iommu_domain_alloc. The PCIe EP works fine when going through VFIO.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2746e178d2be..b737613e9046 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -446,7 +446,7 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 {
 	struct mtk_iommu_domain *dom;
 
-	if (type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
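For context on why UNMANAGED must be accepted: the VFIO type1 backend
allocates its domain roughly as in the sketch below. This is a simplified
illustration against the bus-based API of this kernel generation, not the
actual vfio code, and error handling is trimmed.

    #include <linux/iommu.h>
    #include <linux/pci.h>

    static int attach_unmanaged_domain(struct iommu_group *group)
    {
            struct iommu_domain *domain;
            int ret;

            /*
             * iommu_domain_alloc() hands back an IOMMU_DOMAIN_UNMANAGED
             * domain, which is exactly the type mtk_iommu_domain_alloc()
             * previously rejected.
             */
            domain = iommu_domain_alloc(&pci_bus_type);
            if (!domain)
                    return -EIO;

            ret = iommu_attach_group(domain, group);
            if (ret)
                    iommu_domain_free(domain);
            return ret;
    }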
[PATCH v6 16/34] iommu/mediatek: Contain MM IOMMU flow with the MM TYPE
Prepare for supporting INFRA_IOMMU, and APU_IOMMU later. For Infra IOMMU/APU IOMMU, it doesn't have the "larb""port". thus, Use the MM flag contain the MM_IOMMU special flow, Also, it moves a big chunk code about parsing the mediatek,larbs into a function, this is only needed for MM IOMMU. and all the current SoC are MM_IOMMU. The device link between iommu consumer device and smi-larb device only is needed in MM iommu case. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 212 ++ 1 file changed, 121 insertions(+), 91 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 642949aad47e..b048986913b9 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -138,6 +138,8 @@ #define MTK_IOMMU_IS_TYPE(pdata, _x) MTK_IOMMU_HAS_FLAG_MASK(pdata, _x,\ MTK_IOMMU_TYPE_MASK) +#define MTK_INVALID_LARBID MTK_LARB_NR_MAX + struct mtk_iommu_domain { struct io_pgtable_cfg cfg; struct io_pgtable_ops *iop; @@ -274,7 +276,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) { struct mtk_iommu_data *data = dev_id; struct mtk_iommu_domain *dom = data->m4u_dom; - unsigned int fault_larb, fault_port, sub_comm = 0; + unsigned int fault_larb = MTK_INVALID_LARBID, fault_port = 0, sub_comm = 0; u32 int_state, regval, va34_32, pa34_32; u64 fault_iova, fault_pa; bool layer, write; @@ -300,17 +302,19 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova); fault_pa |= (u64)pa34_32 << 32; - fault_port = F_MMU_INT_ID_PORT_ID(regval); - if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) { - fault_larb = F_MMU_INT_ID_COMM_ID(regval); - sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval); - } else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) { - fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval); - sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval); - } else { - fault_larb = F_MMU_INT_ID_LARB_ID(regval); + if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) { + fault_port = F_MMU_INT_ID_PORT_ID(regval); + if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) { + fault_larb = F_MMU_INT_ID_COMM_ID(regval); + sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval); + } else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) { + fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval); + sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval); + } else { + fault_larb = F_MMU_INT_ID_LARB_ID(regval); + } + fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm]; } - fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm]; if (report_iommu_fault(&dom->domain, data->dev, fault_iova, write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) { @@ -374,19 +378,21 @@ static void mtk_iommu_config(struct mtk_iommu_data *data, struct device *dev, larbid = MTK_M4U_TO_LARB(fwspec->ids[i]); portid = MTK_M4U_TO_PORT(fwspec->ids[i]); - larb_mmu = &data->larb_imu[larbid]; + if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) { + larb_mmu = &data->larb_imu[larbid]; - region = data->plat_data->iova_region + domid; - larb_mmu->bank[portid] = upper_32_bits(region->iova_base); + region = data->plat_data->iova_region + domid; + larb_mmu->bank[portid] = upper_32_bits(region->iova_base); - dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n", - enable ? "enable" : "disable", dev_name(larb_mmu->dev), - portid, domid, larb_mmu->bank[portid]); + dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n", + enable ? 
"enable" : "disable", dev_name(larb_mmu->dev), + portid, domid, larb_mmu->bank[portid]); - if (enable) - larb_mmu->mmu |= MTK_SMI_MMU_EN(portid); - else - larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid); + if (enable) + larb_mmu->mmu |= MTK_SMI_MMU_EN(portid); + else + larb_mmu->mmu &= ~MTK_SMI_MMU_EN(portid); + } } } @@ -593,6 +599,9 @@ static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
[PATCH v6 19/34] iommu/mediatek: Add a PM_CLK_AO flag for infra iommu
The power/clock of the infra iommu is always on, and it doesn't have a
device link with the master devices, so the infra iommu device's PM
status is never active; add a PM_CLK_AO flag for the infra iommu.

The TLB operation is not entirely obvious here: there are 2 special
cases, commented in the code.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 29 ++++++++++++++++++++++++++_---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b737613e9046..afb77a530f32 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -130,6 +130,8 @@
 #define MTK_IOMMU_TYPE_MM		(0x0 << 13)
 #define MTK_IOMMU_TYPE_INFRA		(0x1 << 13)
 #define MTK_IOMMU_TYPE_MASK		(0x3 << 13)
+/* PM and clock always on. e.g. infra iommu */
+#define PM_CLK_AO			BIT(15)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x)	(!!(((pdata)->flags) & (_x)))
 
@@ -235,13 +237,33 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 					   struct mtk_iommu_data *data)
 {
 	struct list_head *head = data->hw_list;
+	bool check_pm_status;
 	unsigned long flags;
 	int ret;
 	u32 tmp;
 
 	for_each_m4u(data, head) {
-		if (pm_runtime_get_if_in_use(data->dev) <= 0)
-			continue;
+		/*
+		 * To avoid resuming the iommu device frequently when it is
+		 * not active, don't always call pm_runtime_get here; the tlb
+		 * flush then depends on the tlb-flush-all done in runtime
+		 * resume.
+		 *
+		 * There are 2 special cases:
+		 *
+		 * Case1: The iommu dev doesn't have a power domain but has
+		 * bclk. This case should also avoid the tlb flush while the
+		 * dev is not active, to mute the tlb timeout log, like mt8173.
+		 *
+		 * Case2: The power/clock of the infra iommu is always on, and
+		 * it doesn't have a device link with the master devices. This
+		 * case should skip the PM status check.
+		 */
+		check_pm_status = !MTK_IOMMU_HAS_FLAG(data->plat_data, PM_CLK_AO);
+
+		if (check_pm_status) {
+			if (pm_runtime_get_if_in_use(data->dev) <= 0)
+				continue;
+		}
 
 		spin_lock_irqsave(&data->tlb_lock, flags);
 		writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
@@ -268,7 +290,8 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 				mtk_iommu_tlb_flush_all(data);
 		}
 
-		pm_runtime_put(data->dev);
+		if (check_pm_status)
+			pm_runtime_put(data->dev);
 	}
 }
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
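The conditional PM pattern above generalises beyond this driver; here is
the shape of it as a standalone sketch, where "always_on" stands in for
the PM_CLK_AO check and the MMIO body is elided.

    #include <linux/pm_runtime.h>

    static void touch_hw_registers(struct device *dev, bool always_on)
    {
            if (!always_on) {
                    /*
                     * pm_runtime_get_if_in_use() returns <= 0 when the
                     * device is suspended (or runtime PM is disabled);
                     * skip the access and rely on the flush-all done in
                     * runtime resume instead.
                     */
                    if (pm_runtime_get_if_in_use(dev) <= 0)
                            return;
            }

            /* ... MMIO accesses happen here ... */

            if (!always_on)
                    pm_runtime_put(dev);
    }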
[PATCH v6 17/34] iommu/mediatek: Adjust device link when it is sub-common
For the MM IOMMU, we always add a device link between smi-common and the
IOMMU HW. mt8195 adds smi-sub-common; thus, if the node is a
smi-sub-common, we still need to look up one more level to get the
smi-common, then create the device link.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b048986913b9..2746e178d2be 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -834,7 +834,7 @@ static const struct component_master_ops mtk_iommu_com_ops = {
 static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **match,
 				  struct mtk_iommu_data *data)
 {
-	struct device_node *larbnode, *smicomm_node;
+	struct device_node *larbnode, *smicomm_node, *smi_subcomm_node;
 	struct platform_device *plarbdev;
 	struct device_link *link;
 	int i, larb_nr, ret;
@@ -874,11 +874,21 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
 					    component_compare_of, larbnode);
 	}
 
-	/* Get smi-common dev from the last larb. */
-	smicomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
-	if (!smicomm_node)
+	/* Get smi-(sub)-common dev from the last larb. */
+	smi_subcomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
+	if (!smi_subcomm_node)
 		return -EINVAL;
 
+	/*
+	 * It may have two levels of smi-common. The node is smi-sub-common
+	 * if it has a further mediatek,smi property, otherwise it is
+	 * smi-common.
+	 */
+	smicomm_node = of_parse_phandle(smi_subcomm_node, "mediatek,smi", 0);
+	if (smicomm_node)
+		of_node_put(smi_subcomm_node);
+	else
+		smicomm_node = smi_subcomm_node;
+
 	plarbdev = of_find_device_by_node(smicomm_node);
 	of_node_put(smicomm_node);
 	data->smicomm_dev = &plarbdev->dev;
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
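The phandle walk can be read on its own; here is the same logic factored
into a sketch helper, assuming the semantics the patch comment states
(the helper name is invented, and the caller is assumed to of_node_put()
the returned node).

    #include <linux/of.h>

    /* Resolve the real smi-common starting from a larb node, hopping over
     * at most one smi-sub-common level.
     */
    static struct device_node *find_smi_common(struct device_node *larb)
    {
            struct device_node *np, *parent;

            np = of_parse_phandle(larb, "mediatek,smi", 0);
            if (!np)
                    return NULL;

            /* If this node itself points at another "mediatek,smi", it is
             * a sub-common; go up once more.
             */
            parent = of_parse_phandle(np, "mediatek,smi", 0);
            if (parent) {
                    of_node_put(np);
                    return parent;
            }
            return np;
    }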
[PATCH v6 15/34] iommu/mediatek: Add IOMMU_TYPE flag
Add the IOMMU_TYPE definition. mt8195 introduces another IOMMU_TYPE, the
infra iommu, and there will also be an APU_IOMMU later; thus use 2 bits
for the IOMMU_TYPE.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 84d661e0b371..642949aad47e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -126,9 +126,17 @@
 #define SHARE_PGTABLE			BIT(10) /* 2 HW share pgtable */
 #define DCM_DISABLE			BIT(11)
 #define NOT_STD_AXI_MODE		BIT(12)
+/* 2 bits: iommu type */
+#define MTK_IOMMU_TYPE_MM		(0x0 << 13)
+#define MTK_IOMMU_TYPE_INFRA		(0x1 << 13)
+#define MTK_IOMMU_TYPE_MASK		(0x3 << 13)
 
-#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
-		((((pdata)->flags) & (_x)) == (_x))
+#define MTK_IOMMU_HAS_FLAG(pdata, _x)	(!!(((pdata)->flags) & (_x)))
+
+#define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask)	\
+				((((pdata)->flags) & (mask)) == (_x))
+#define MTK_IOMMU_IS_TYPE(pdata, _x)	MTK_IOMMU_HAS_FLAG_MASK(pdata, _x,\
+							MTK_IOMMU_TYPE_MASK)
 
 struct mtk_iommu_domain {
 	struct io_pgtable_cfg		cfg;
 	struct io_pgtable_ops		*iop;
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
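The switch from the all-bits-set check to a masked-field compare matters,
presumably because MTK_IOMMU_TYPE_MM is encoded as 0x0 and a plain
bit-test can never match a zero value. A small sketch of the distinction,
using the values from the patch:

    #include <stdio.h>

    #define BIT(n)                  (1U << (n))
    #define MTK_IOMMU_TYPE_MM       (0x0U << 13)
    #define MTK_IOMMU_TYPE_INFRA    (0x1U << 13)
    #define MTK_IOMMU_TYPE_MASK     (0x3U << 13)

    /* Compare only the 2-bit field, not individual bits; this is why the
     * MM type (encoded as 0) still matches correctly.
     */
    #define IS_TYPE(flags, t)  (((flags) & MTK_IOMMU_TYPE_MASK) == (t))

    int main(void)
    {
            unsigned int flags = BIT(2) | MTK_IOMMU_TYPE_INFRA;

            printf("infra? %d  mm? %d\n",
                   IS_TYPE(flags, MTK_IOMMU_TYPE_INFRA),
                   IS_TYPE(flags, MTK_IOMMU_TYPE_MM));
            return 0;
    }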
[PATCH v6 14/34] iommu/mediatek: Add SUB_COMMON_3BITS flag
In previous SoCs, the sub-common id occupied 2 bits; mt8195's sub-common
id has 3 bits. Add a new flag for this, and rename the previous flag to
_2BITS. For readability, put these two flags next to each other and
renumber the other flags. No functional change.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 26 ++++++++++++++++----------
 drivers/iommu/mtk_iommu.h |  2 +-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d59f6857a9df..84d661e0b371 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -105,6 +105,8 @@
 #define REG_MMU1_INT_ID			0x154
 #define F_MMU_INT_ID_COMM_ID(a)		(((a) >> 9) & 0x7)
 #define F_MMU_INT_ID_SUB_COMM_ID(a)	(((a) >> 7) & 0x3)
+#define F_MMU_INT_ID_COMM_ID_EXT(a)	(((a) >> 10) & 0x7)
+#define F_MMU_INT_ID_SUB_COMM_ID_EXT(a)	(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_LARB_ID(a)		(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_PORT_ID(a)		(((a) >> 2) & 0x1f)
 
@@ -116,13 +118,14 @@
 #define HAS_VLD_PA_RNG			BIT(2)
 #define RESET_AXI			BIT(3)
 #define OUT_ORDER_WR_EN			BIT(4)
-#define HAS_SUB_COMM			BIT(5)
-#define WR_THROT_EN			BIT(6)
-#define HAS_LEGACY_IVRP_PADDR		BIT(7)
-#define IOVA_34_EN			BIT(8)
-#define SHARE_PGTABLE			BIT(9) /* 2 HW share pgtable */
-#define DCM_DISABLE			BIT(10)
-#define NOT_STD_AXI_MODE		BIT(11)
+#define HAS_SUB_COMM_2BITS		BIT(5)
+#define HAS_SUB_COMM_3BITS		BIT(6)
+#define WR_THROT_EN			BIT(7)
+#define HAS_LEGACY_IVRP_PADDR		BIT(8)
+#define IOVA_34_EN			BIT(9)
+#define SHARE_PGTABLE			BIT(10) /* 2 HW share pgtable */
+#define DCM_DISABLE			BIT(11)
+#define NOT_STD_AXI_MODE		BIT(12)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
 		((((pdata)->flags) & (_x)) == (_x))
@@ -290,9 +293,12 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 	fault_pa |= (u64)pa34_32 << 32;
 
 	fault_port = F_MMU_INT_ID_PORT_ID(regval);
-	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_2BITS)) {
 		fault_larb = F_MMU_INT_ID_COMM_ID(regval);
 		sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
+	} else if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM_3BITS)) {
+		fault_larb = F_MMU_INT_ID_COMM_ID_EXT(regval);
+		sub_comm = F_MMU_INT_ID_SUB_COMM_ID_EXT(regval);
 	} else {
 		fault_larb = F_MMU_INT_ID_LARB_ID(regval);
 	}
@@ -1069,7 +1075,7 @@ static const struct mtk_iommu_plat_data mt2712_data = {
 
 static const struct mtk_iommu_plat_data mt6779_data = {
 	.m4u_plat      = M4U_MT6779,
-	.flags         = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+	.flags         = HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN | WR_THROT_EN |
 			 NOT_STD_AXI_MODE,
 	.inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
 	.iova_region   = single_domain,
@@ -1107,7 +1113,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
 
 static const struct mtk_iommu_plat_data mt8192_data = {
 	.m4u_plat       = M4U_MT8192,
-	.flags          = HAS_BCLK | HAS_SUB_COMM | OUT_ORDER_WR_EN |
+	.flags          = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
 			  WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE,
 	.inv_sel_reg    = REG_MMU_INV_SEL_GEN2,
 	.iova_region    = mt8192_multi_dom,
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index dc868fce0d2a..f41e32252056 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -20,7 +20,7 @@
 #include
 
 #define MTK_LARB_COM_MAX	8
-#define MTK_LARB_SUBCOM_MAX	4
+#define MTK_LARB_SUBCOM_MAX	8
 
 #define MTK_IOMMU_GROUP_MAX	8
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
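The two fault-ID layouts are easiest to compare side by side, so here is
a standalone sketch using the extractors copied from the patch: the 2-bit
layout packs comm at bits [11:9] and sub-common at [8:7], the 3-bit
layout packs comm at [12:10] and sub-common at [9:7], and the port always
sits at [6:2]. The sample register value is hypothetical.

    #include <stdint.h>
    #include <stdio.h>

    #define COMM_ID(a)          (((a) >> 9) & 0x7)
    #define SUB_COMM_ID(a)      (((a) >> 7) & 0x3)
    #define COMM_ID_EXT(a)      (((a) >> 10) & 0x7)
    #define SUB_COMM_ID_EXT(a)  (((a) >> 7) & 0x7)
    #define PORT_ID(a)          (((a) >> 2) & 0x1f)

    int main(void)
    {
            uint32_t regval = 0x5a4; /* hypothetical fault ID register value */

            printf("2-bit: comm=%u sub=%u port=%u\n",
                   COMM_ID(regval), SUB_COMM_ID(regval), PORT_ID(regval));
            printf("3-bit: comm=%u sub=%u port=%u\n",
                   COMM_ID_EXT(regval), SUB_COMM_ID_EXT(regval),
                   PORT_ID(regval));
            return 0;
    }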
[PATCH v6 13/34] iommu/mediatek: Always enable output PA over 32bits in isr
Currently the output PA[32:33] handling is gated by the IOVA_34 flag.
This is not right: the IOVA_34 flag has no relation to PA[32:33], since a
32-bit IOVA can still map to a PA above 32 bits. Move it out of that
flag's scope.

No Fixes tag is needed, since currently only mt8192 uses this calculation
and it always has the IOVA_34 flag.

This prepares for IOMMUs that still use 32-bit IOVAs but whose DRAM size
may exceed 4GB.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 1f3fe3276aa0..d59f6857a9df 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -283,11 +283,11 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 	write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
 	if (MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN)) {
 		va34_32 = FIELD_GET(F_MMU_INVAL_VA_34_32_MASK, fault_iova);
-		pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
 		fault_iova = fault_iova & F_MMU_INVAL_VA_31_12_MASK;
 		fault_iova |= (u64)va34_32 << 32;
-		fault_pa |= (u64)pa34_32 << 32;
 	}
+	pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
+	fault_pa |= (u64)pa34_32 << 32;
 
 	fault_port = F_MMU_INT_ID_PORT_ID(regval);
 	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 12/34] iommu/mediatek: Remove the granule in the tlb flush
The MediaTek IOMMU doesn't care about the granule when flushing the TLB.
Remove this variable.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index e7008a20ec74..1f3fe3276aa0 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -219,7 +219,6 @@ static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 }
 
 static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-					   size_t granule,
 					   struct mtk_iommu_data *data)
 {
 	struct list_head *head = data->hw_list;
@@ -541,8 +540,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 	size_t length = gather->end - gather->start + 1;
 
-	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
-				       dom->data);
+	mtk_iommu_tlb_flush_range_sync(gather->start, length, dom->data);
 }
 
 static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
@@ -550,7 +548,7 @@ static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
-	mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
+	mtk_iommu_tlb_flush_range_sync(iova, size, dom->data);
 }
 
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 11/34] iommu/mediatek: Add a flag NON_STD_AXI
Add a new flag NON_STD_AXI, All the previous SoC support this flag. Prepare for adding infra and apu iommu which don't support this. Signed-off-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 92f172a772d1..e7008a20ec74 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -122,6 +122,7 @@ #define IOVA_34_EN BIT(8) #define SHARE_PGTABLE BIT(9) /* 2 HW share pgtable */ #define DCM_DISABLEBIT(10) +#define NOT_STD_AXI_MODE BIT(11) #define MTK_IOMMU_HAS_FLAG(pdata, _x) \ pdata)->flags) & (_x)) == (_x)) @@ -785,7 +786,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data) regval = 0; } else { regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL); - regval &= ~F_MMU_STANDARD_AXI_MODE_MASK; + if (MTK_IOMMU_HAS_FLAG(data->plat_data, NOT_STD_AXI_MODE)) + regval &= ~F_MMU_STANDARD_AXI_MODE_MASK; if (MTK_IOMMU_HAS_FLAG(data->plat_data, OUT_ORDER_WR_EN)) regval &= ~F_MMU_IN_ORDER_WR_EN_MASK; } @@ -1058,7 +1060,8 @@ static const struct dev_pm_ops mtk_iommu_pm_ops = { static const struct mtk_iommu_plat_data mt2712_data = { .m4u_plat = M4U_MT2712, - .flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG | SHARE_PGTABLE, + .flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG | SHARE_PGTABLE | + NOT_STD_AXI_MODE, .hw_list = &m4ulist, .inv_sel_reg = REG_MMU_INV_SEL_GEN1, .iova_region = single_domain, @@ -1068,7 +1071,8 @@ static const struct mtk_iommu_plat_data mt2712_data = { static const struct mtk_iommu_plat_data mt6779_data = { .m4u_plat = M4U_MT6779, - .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN, + .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN | +NOT_STD_AXI_MODE, .inv_sel_reg = REG_MMU_INV_SEL_GEN2, .iova_region = single_domain, .iova_region_nr = ARRAY_SIZE(single_domain), @@ -1077,7 +1081,7 @@ static const struct mtk_iommu_plat_data mt6779_data = { static const struct mtk_iommu_plat_data mt8167_data = { .m4u_plat = M4U_MT8167, - .flags= RESET_AXI | HAS_LEGACY_IVRP_PADDR, + .flags= RESET_AXI | HAS_LEGACY_IVRP_PADDR | NOT_STD_AXI_MODE, .inv_sel_reg = REG_MMU_INV_SEL_GEN1, .iova_region = single_domain, .iova_region_nr = ARRAY_SIZE(single_domain), @@ -1087,7 +1091,7 @@ static const struct mtk_iommu_plat_data mt8167_data = { static const struct mtk_iommu_plat_data mt8173_data = { .m4u_plat = M4U_MT8173, .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI | - HAS_LEGACY_IVRP_PADDR, + HAS_LEGACY_IVRP_PADDR | NOT_STD_AXI_MODE, .inv_sel_reg = REG_MMU_INV_SEL_GEN1, .iova_region = single_domain, .iova_region_nr = ARRAY_SIZE(single_domain), @@ -1106,7 +1110,7 @@ static const struct mtk_iommu_plat_data mt8183_data = { static const struct mtk_iommu_plat_data mt8192_data = { .m4u_plat = M4U_MT8192, .flags = HAS_BCLK | HAS_SUB_COMM | OUT_ORDER_WR_EN | - WR_THROT_EN | IOVA_34_EN, + WR_THROT_EN | IOVA_34_EN | NOT_STD_AXI_MODE, .inv_sel_reg= REG_MMU_INV_SEL_GEN2, .iova_region= mt8192_multi_dom, .iova_region_nr = ARRAY_SIZE(mt8192_multi_dom), -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 10/34] iommu/mediatek: Add a flag DCM_DISABLE
In the infra iommu, we should disable DCM; add a new flag for this.

Signed-off-by: Yong Wu
Reviewed-by: AngeloGioacchino Del Regno
---
 drivers/iommu/mtk_iommu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d91a0c138536..92f172a772d1 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -51,6 +51,8 @@
 #define F_MMU_STANDARD_AXI_MODE_MASK	(BIT(3) | BIT(19))
 
 #define REG_MMU_DCM_DIS			0x050
+#define F_MMU_DCM			BIT(8)
+
 #define REG_MMU_WR_LEN_CTRL		0x054
 #define F_MMU_WR_THROT_DIS_MASK		(BIT(5) | BIT(21))
 
@@ -119,6 +121,7 @@
 #define HAS_LEGACY_IVRP_PADDR		BIT(7)
 #define IOVA_34_EN			BIT(8)
 #define SHARE_PGTABLE			BIT(9) /* 2 HW share pgtable */
+#define DCM_DISABLE			BIT(10)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
 		((((pdata)->flags) & (_x)) == (_x))
@@ -765,7 +768,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
 		regval = F_MMU_VLD_PA_RNG(7, 4);
 		writel_relaxed(regval, data->base + REG_MMU_VLD_PA_RNG);
 	}
-	writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, DCM_DISABLE))
+		writel_relaxed(F_MMU_DCM, data->base + REG_MMU_DCM_DIS);
+	else
+		writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
+
 	if (MTK_IOMMU_HAS_FLAG(data->plat_data, WR_THROT_EN)) {
 		/* write command throttling mode */
 		regval = readl_relaxed(data->base + REG_MMU_WR_LEN_CTRL);
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu