RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe
> Sent: Thursday, April 29, 2021 4:46 AM
>
> > > I think the name IOASID is fine for the uAPI, the kernel version can
> > > be called ioasid_id or something.
> >
> > ioasid is already an id and then ioasid_id just adds confusion. Another
> > point is that ioasid is currently used to represent both PCI PASID and
> > ARM substream ID in the kernel. It implies that if we want to separate
> > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> > another general term usable for substream ID. Are we making the
> > terms too confusing here?
>
> This is why I also am not so sure about exposing the PASID in the API,
> because it is ultimately a HW-specific item.
>
> As I said to David, one avenue is to have some generic uAPI that is
> very general and keep all this deeply detailed stuff, that really only
> matters for qemu, as part of a more HW-specific vIOMMU driver
> interface.

OK, so the general uAPI will not expose hw_id and will just provide
everything generic for managing the I/O page table (map/unmap, nesting,
etc.) through IOASID, and then a specific uAPI is provided to handle
platform-specific requirements (hw_id, iova windows, etc.).

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework
Hi Keqian,

On 5/7/21 6:21 PM, Keqian Zhu wrote:
> Some types of IOMMU are capable of tracking DMA dirty log, such as
> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
> dirty log tracking framework in the IOMMU base layer. Four new
> essential interfaces are added, and we maintain the status of dirty
> log tracking in iommu_domain.
> 1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
> 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
> 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
> 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>
> Note: Don't concurrently call these interfaces with other ops that
> access the underlying page table.
>
> Signed-off-by: Keqian Zhu
> Signed-off-by: Kunkun Jiang
> ---
>  drivers/iommu/iommu.c        | 201 +++
>  include/linux/iommu.h        |  63 +++
>  include/trace/events/iommu.h |  63 +++
>  3 files changed, 327 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0d15620d1e90 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>  	domain->type = type;
>  	/* Assume all sizes by default; the driver may override this later */
>  	domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
> +	mutex_init(&domain->switch_log_lock);
>  	return domain;
>  }
>
> @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
>  }
>  EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
>
> +bool iommu_support_dirty_log(struct iommu_domain *domain)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +
> +	return ops->support_dirty_log && ops->support_dirty_log(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_support_dirty_log);

I suppose this interface is to ask the vendor IOMMU driver to check
whether each device/iommu in the domain supports dirty bit tracking.
But what will happen if new devices with different tracking
capabilities are added afterward?
To make things simple, is it possible to support this tracking only
when all underlying IOMMUs support dirty bit tracking? Or, the more
crazy idea is that we don't need to check this capability at all: if
dirty bit tracking is not supported by hardware, just mark all pages
dirty?

> +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
> +			   unsigned long iova, size_t size, int prot)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +	unsigned long orig_iova = iova;
> +	unsigned int min_pagesz;
> +	size_t orig_size = size;
> +	bool flush = false;
> +	int ret = 0;
> +
> +	if (unlikely(!ops->switch_dirty_log))
> +		return -ENODEV;
> +
> +	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | size, min_pagesz)) {
> +		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
> +		       iova, size, min_pagesz);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&domain->switch_log_lock);
> +	if (enable && domain->dirty_log_tracking) {
> +		ret = -EBUSY;
> +		goto out;
> +	} else if (!enable && !domain->dirty_log_tracking) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n",
> +		 enable ? "enable" : "disable", iova, size);
> +
> +	while (size) {
> +		size_t pgsize = iommu_pgsize(domain, iova, size);
> +
> +		flush = true;
> +		ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot);

A per-minimal-page callback is much too expensive. How about using
(pagesize, count), so that all pages with the same page size could be
handled in a single indirect call? I remember I commented on this
during the last review, but I don't mind doing it again.
Best regards,
baolu

> +		if (ret)
> +			break;
> +
> +		pr_debug("switch_dirty_log handled: iova 0x%lx size 0x%zx\n",
> +			 iova, pgsize);
> +
> +		iova += pgsize;
> +		size -= pgsize;
> +	}
> +
> +	if (flush)
> +		iommu_flush_iotlb_all(domain);
> +
> +	if (!ret) {
> +		domain->dirty_log_tracking = enable;
> +		trace_switch_dirty_log(orig_iova, orig_size, enable);
> +	}
> +out:
> +	mutex_unlock(&domain->switch_log_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
> +
> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
> +			 size_t size, unsigned long *bitmap,
> +			 unsigned long base_iova, unsigned long bitmap_pgshift)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +	unsigned long orig_iova = iova;
> +	unsigned int min_pagesz;
> +	size_t orig_size = size;
> +	int ret = 0;
> +
> +	if
[PATCH -next] iommu/virtio: Add missing MODULE_DEVICE_TABLE
This patch adds the missing MODULE_DEVICE_TABLE definition, which
generates the correct modalias for automatic loading of this driver
when it is built as an external module.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/iommu/virtio-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 7c02481a81b4..c6e5ee4d9cef 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1136,6 +1136,7 @@ static struct virtio_device_id id_table[] = {
 	{ VIRTIO_ID_IOMMU, VIRTIO_DEV_ANY_ID },
 	{ 0 },
 };
+MODULE_DEVICE_TABLE(virtio, id_table);
 
 static struct virtio_driver virtio_iommu_drv = {
 	.driver.name	= KBUILD_MODNAME,
Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure
Hi Bjorn,

On 2021/5/8 5:14, Bjorn Helgaas wrote:
> On Fri, May 07, 2021 at 12:49:53PM +0000, Wang Xingang wrote:
> > From: Xingang Wang
> >
> > When request ACS for PCI device in of_iommu_configure, the pci device
> > has already been scanned and added with 'pci_acs_enable=0'. So the
> > pci_request_acs() in current procedure does not work for enabling ACS.
> > Besides, the ACS should be enabled only if there's an IOMMU in system.
> > So this fix the call of pci_request_acs() and call pci_enable_acs() to
> > make sure ACS is enabled for the pci_device.
>
> For consistency:
> s/of_iommu_configure/of_iommu_configure()/
> s/pci device/PCI device/
> s/pci_device/PCI device/
>
> But I'm confused about what problem this fixes. On x86, I think we *do*
> set pci_acs_enable=1 in this path:
>
>   start_kernel
>     mm_init
>       mem_init
>         pci_iommu_alloc
>           p->detect()
>             detect_intel_iommu        # IOMMU_INIT_POST(detect_intel_iommu)
>               pci_request_acs
>                 pci_acs_enable = 1
>
> before enumerating any PCI devices. But you mentioned
> pci_host_common_probe(), which I think is mostly used on non-x86
> architectures, and I'm guessing those arches detect the IOMMU
> differently.

I am testing on an arm architecture, and found that ACS is enabled
properly when booting with ACPI. With ACPI enabled, the x86 IOMMU is
checked when parsing DMAR, and the arm SMMU is checked when parsing
IORT. But when booting with a devicetree, the IOMMU is checked in
of_iommu_configure, and pci_request_acs() is too late at this time.

> So my question is, can we figure out how to detect IOMMUs the same way
> across all arches? Thanks.

It would be better if there's a way to detect IOMMUs across all arches,
both for ACPI and DT.
Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
configuring IOMMU linkage")
Signed-off-by: Xingang Wang
---
 drivers/iommu/of_iommu.c | 10 +-
 drivers/pci/pci.c        |  2 +-
 include/linux/pci.h      |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..dc621861ae72 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			.np = master_np,
 		};
 
-		pci_request_acs();
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     of_pci_iommu_init, &info);
 	} else {
@@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 		/* The fwspec pointer changed, read it again */
 		fwspec = dev_iommu_fwspec_get(dev);
 		ops    = fwspec->ops;
+
+		/*
+		 * If we found an IOMMU and the device is pci,
+		 * make sure we enable ACS.

s/pci/PCI/ for consistency.

+		 */
+		if (dev_is_pci(dev)) {
+			pci_request_acs();
+			pci_enable_acs(to_pci_dev(dev));
+		}
 	}
 	/*
 	 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..4e4f98ee2870 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-static void pci_enable_acs(struct pci_dev *dev)
+void pci_enable_acs(struct pci_dev *dev)
 {
 	if (!pci_acs_enable)
 		goto disable_acs_redir;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c20211e59a57..e6a8bfbc9c98 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
 }
 
 void pci_request_acs(void);
+void pci_enable_acs(struct pci_dev *dev);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
 			  struct pci_dev *end, u16 acs_flags);
-- 
2.19.1
Thanks,
Xingang
Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)
Hi all,

On 07/05/2021 22:00, osstest service owner wrote:
> flight 161829 linux-linus real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/161829/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:

[...]

> test-arm64-arm64-examine      8  reboot              fail REGR. vs. 152332
> test-arm64-arm64-libvirt-xsm  8  xen-boot            fail REGR. vs. 152332
> test-arm64-arm64-xl-credit2   8  xen-boot            fail REGR. vs. 152332
> test-arm64-arm64-xl-credit1   8  xen-boot            fail REGR. vs. 152332
> test-arm64-arm64-xl           8  xen-boot            fail REGR. vs. 152332
> test-arm64-arm64-xl-xsm       8  xen-boot            fail REGR. vs. 152332
> test-amd64-amd64-amd64-pvgrub 20 guest-stop          fail REGR. vs. 152332
> test-amd64-amd64-i386-pvgrub  20 guest-stop          fail REGR. vs. 152332
> test-arm64-arm64-xl-thunderx  8  xen-boot            fail REGR. vs. 152332
> test-amd64-amd64-xl-qcow2     13 guest-start         fail REGR. vs. 152332
> test-amd64-amd64-libvirt-vhd  13 guest-start         fail REGR. vs. 152332
> test-amd64-amd64-examine      4  memdisk-try-append  fail REGR. vs. 152332
> test-arm64-arm64-xl-seattle   8  xen-boot            fail REGR. vs. 152332
> test-armhf-armhf-xl-vhd       13 guest-start         fail REGR. vs. 152332

Osstest reported dom0 boot failures on all the arm64 platforms we have.
The stack trace is similar everywhere:

[   18.101077] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   18.101441] Mem abort info:
[   18.101625]   ESR = 0x96000004
[   18.101839]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.102111]   SET = 0, FnV = 0
[   18.102327]   EA = 0, S1PTW = 0
[   18.102544] Data abort info:
[   18.102747]   ISV = 0, ISS = 0x00000004
[   18.102968]   CM = 0, WnR = 0
[   18.103183] [0000000000000008] user address but active_mm is swapper
[   18.103476] Internal error: Oops: 96000004 [#1] SMP
[   18.103689] Modules linked in:
[   18.103881] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc3+ #126
[   18.104172] Hardware name: Foundation-v8A (DT)
[   18.104376] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   18.104653] pc : xen_swiotlb_dma_supported+0x30/0xc8
[   18.104893] lr : dma_supported+0x38/0x68
[   18.105118] sp : 80001295bac0
[   18.105289] x29: 80001295bac0 x28: 8000114f44c0
[   18.105600] x27: 0007 x26: 8000113a1000
[   18.105906] x25: 0000000000000000 x24: 800011d2e910
[   18.106213] x23: 800011d4d000 x22: 12fad810
[   18.106525] x21: 0000000000000000 x20: 0000000000000000
[   18.106837] x19: 12fad810 x18: ffeb
[   18.107146] x17: 0000000000000000 x16: 493f1445
[   18.107450] x15: 80001132d000 x14: 1c131000
[   18.107759] x13: 498d0616 x12: 8000129f7000
[   18.108068] x11: 12c08710 x10: 800011a91000
[   18.108380] x9 : 3000 x8 : 1000
[   18.108722] x7 : 800011a90a88 x6 : 800010a7275c
[   18.109031] x5 : 0000000000000000 x4 : 0001
[   18.109331] x3 : 2cd8f9dc91b3df00 x2 : 8000109c7578
[   18.109640] x1 : 0000000000000000 x0 : 0000000000000000
[   18.109940] Call trace:
[   18.110079]  xen_swiotlb_dma_supported+0x30/0xc8
[   18.110319]  dma_supported+0x38/0x68
[   18.110543]  dma_set_mask+0x30/0x58
[   18.110765]  virtio_mmio_probe+0x1c8/0x238
[   18.110979]  platform_probe+0x6c/0x108
[   18.88]  really_probe+0xfc/0x3c8
[   18.111413]  driver_probe_device+0x68/0xe8
[   18.111647]  device_driver_attach+0x74/0x98
[   18.111883]  __driver_attach+0x98/0xe0
[   18.112111]  bus_for_each_dev+0x84/0xd8
[   18.112334]  driver_attach+0x30/0x40
[   18.112557]  bus_add_driver+0x168/0x228
[   18.112784]  driver_register+0x64/0x110
[   18.113016]  __platform_driver_register+0x34/0x40
[   18.113257]  virtio_mmio_init+0x20/0x28
[   18.113480]  do_one_initcall+0x90/0x470
[   18.113694]  kernel_init_freeable+0x3ec/0x440
[   18.113950]  kernel_init+0x1c/0x138
[   18.114158]  ret_from_fork+0x10/0x18
[   18.114409] Code: f000f641 f000a200 f944e821 f9410400 (f9400434)
[   18.114718] ---[ end trace d39406719f25d228 ]---
[   18.115044] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   18.115339] SMP: stopping secondary CPUs
[   18.115584] Kernel Offset: disabled
[   18.115743] CPU features: 0x0024,61802000
[   18.115954] Memory Limit: none
[   18.116173] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

I have bisected manually and pinpointed the following commit:

commit 2726bf3ff2520dba61fafc90a055640f7ad54e05 (refs/bisect/bad)
Author: Florian Fainelli
Date:   Mon Mar 22 18:53:49 2021 -0700

    swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

    When SWIOTLB_NO_FORCE is used, there should really be no allocations of
    default_nslabs to occur since we
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Jason,

On Fri, 7 May 2021 16:28:10 -0300, Jason Gunthorpe wrote:
> > The unanswered question is how do we plumb from vIOMMU without a custom
> > allocator to get a system wide PASID?
>
> PASID allocation is part of the iommu driver, it really shouldn't be
> global.

In the current code, the pluggable custom allocator *is* part of the
iommu vendor driver. If it decides the allocation is global, then it
should be suitable for the platform, since there will never be a VT-d
IOMMU on another vendor's platform. It is true that the default
allocator is global, which suits the current needs. I am just wondering
if we are solving a problem that does not exist yet.

> When the architecture code goes to allocate a single PASID for the
> mm_struct it should flag that allocation request with a 'must work for
> all RIDs' flag and the iommu driver should take care of it. That might
> mean the iommu driver consults a global static xarray, or maybe it
> does a hypercall, but it should be done through that API, not a side
> car global singleton.

Why do we need to flag the allocation every time if on a platform
*every* PASID can potentially be global? At the time of allocation, we
don't know if the PASID will be used for a shared (ENQCMD) or a
dedicated workqueue.

Thanks,
Jacob
Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure
On Fri, May 07, 2021 at 12:49:53PM +0000, Wang Xingang wrote:
> From: Xingang Wang
>
> When request ACS for PCI device in of_iommu_configure, the pci device
> has already been scanned and added with 'pci_acs_enable=0'. So the
> pci_request_acs() in current procedure does not work for enabling ACS.
> Besides, the ACS should be enabled only if there's an IOMMU in system.
> So this fix the call of pci_request_acs() and call pci_enable_acs() to
> make sure ACS is enabled for the pci_device.

For consistency:
s/of_iommu_configure/of_iommu_configure()/
s/pci device/PCI device/
s/pci_device/PCI device/

But I'm confused about what problem this fixes. On x86, I think we *do*
set pci_acs_enable=1 in this path:

  start_kernel
    mm_init
      mem_init
        pci_iommu_alloc
          p->detect()
            detect_intel_iommu        # IOMMU_INIT_POST(detect_intel_iommu)
              pci_request_acs
                pci_acs_enable = 1

before enumerating any PCI devices.

But you mentioned pci_host_common_probe(), which I think is mostly used
on non-x86 architectures, and I'm guessing those arches detect the
IOMMU differently.

So my question is, can we figure out how to detect IOMMUs the same way
across all arches?
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang
> ---
>  drivers/iommu/of_iommu.c | 10 +-
>  drivers/pci/pci.c        |  2 +-
>  include/linux/pci.h      |  1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..dc621861ae72 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>  			.np = master_np,
>  		};
>  
> -		pci_request_acs();
>  		err = pci_for_each_dma_alias(to_pci_dev(dev),
>  					     of_pci_iommu_init, &info);
>  	} else {
> @@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>  		/* The fwspec pointer changed, read it again */
>  		fwspec = dev_iommu_fwspec_get(dev);
>  		ops    = fwspec->ops;
> +
> +		/*
> +		 * If we found an IOMMU and the device is pci,
> +		 * make sure we enable ACS.

s/pci/PCI/ for consistency.

> +		 */
> +		if (dev_is_pci(dev)) {
> +			pci_request_acs();
> +			pci_enable_acs(to_pci_dev(dev));
> +		}
>  	}
>  	/*
>  	 * If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..4e4f98ee2870 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -static void pci_enable_acs(struct pci_dev *dev)
> +void pci_enable_acs(struct pci_dev *dev)
>  {
>  	if (!pci_acs_enable)
>  		goto disable_acs_redir;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c20211e59a57..e6a8bfbc9c98 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
>  }
>  
>  void pci_request_acs(void);
> +void pci_enable_acs(struct pci_dev *dev);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
>  			  struct pci_dev *end, u16 acs_flags);
> -- 
> 2.19.1
Re: [PATCH v4 1/6] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
On Fri, 07 May 2021 11:02:27 +0200, Benjamin Gaignard wrote:
> Convert Rockchip IOMMU to DT schema
>
> Signed-off-by: Benjamin Gaignard
> ---
> version 4:
> - Add descriptions for reg items
> - Add description for interrupts items
> - Remove useless interrupt-names properties
>
> version 2:
> - Change maintainer
> - Change reg maxItems
> - Change interrupt maxItems
>
>  .../bindings/iommu/rockchip,iommu.txt  | 38 ----------
>  .../bindings/iommu/rockchip,iommu.yaml | 80 ++++++++++++++
>  2 files changed, 80 insertions(+), 38 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
>  create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

Reviewed-by: Rob Herring
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, May 07, 2021 at 12:23:25PM -0700, Raj, Ashok wrote:
> Hi Jason
>
> - Removed lizefan's email due to bounces...
>
> On Fri, May 07, 2021 at 03:20:50PM -0300, Jason Gunthorpe wrote:
> > On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> > > On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> > > > >
> > > > > For platforms that support ENQCMD, it is required to mandate PASIDs
> > > > > are global across the entire system. Maybe its better to call them
> > > > > gPASID for guest and hPASID for host. Short reason being
> > > > > gPASID->hPASID is a guest wide mapping for ENQCMD and not a per-RID
> > > > > based mapping. (We covered that in earlier responses)
> > > >
> > > > I don't think it is actually ENQCMD that forces this, ENQCMD can use
> > > > a per-RID PASID in the translation table as well.
> > >
> > > When using ENQCMD the PASID that needs to be sent on the wire is
> > > picked from an MSR setup by kernel. This is context switched along
> > > with the process. So each process has only 1 PASID that can go out
> > > when using ENQCMD. ENQCMD takes one mmio address specific to the
> > > accelerator and a source for the descriptor.
> >
> > Oh. I forgot this also globally locked the PASID to a single
> > MSR. Sigh. That makes the whole mechanism useless for anything except
> > whole process SVA.
>
> Is there another kind of SVA? Our mapping is that each process requires
> a single mm, and PASID for SVM was a direct map from that.

There are lots of potential applications for something like ENQCMD that
are not whole process SVA. Linking it to a single PASID basically nukes
any other use of it, unfortunately.

> > I would have to ask for a PASID that has the property it needs. You
> > are saying the property is even bigger than "usable on a group of
> > RIDs" but is actually "global for every RID and IOMMU in the system so
> > it can go into a MSR". Gross, but fine, ask for that explicitly when
> > allocating the PASID.
>
> If one process has a single mm, is that also gross? :-) So a single
> process having a PASID is just an identifier for IOMMU. It just seems
> like what a mm is for a process == PASID for SVM-IOMMU support.
>
> The unanswered question is how do we plumb from vIOMMU without a custom
> allocator to get a system wide PASID?

PASID allocation is part of the iommu driver, it really shouldn't be
global.

When the architecture code goes to allocate a single PASID for the
mm_struct it should flag that allocation request with a 'must work for
all RIDs' flag and the iommu driver should take care of it. That might
mean the iommu driver consults a global static xarray, or maybe it does
a hypercall, but it should be done through that API, not a side car
global singleton.

Jason
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Jason

- Removed lizefan's email due to bounces...

On Fri, May 07, 2021 at 03:20:50PM -0300, Jason Gunthorpe wrote:
> On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> > On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> > >
> > > > For platforms that support ENQCMD, it is required to mandate PASIDs
> > > > are global across the entire system. Maybe its better to call them
> > > > gPASID for guest and hPASID for host. Short reason being
> > > > gPASID->hPASID is a guest wide mapping for ENQCMD and not a per-RID
> > > > based mapping. (We covered that in earlier responses)
> > >
> > > I don't think it is actually ENQCMD that forces this, ENQCMD can use
> > > a per-RID PASID in the translation table as well.
> >
> > When using ENQCMD the PASID that needs to be sent on the wire is
> > picked from an MSR setup by kernel. This is context switched along
> > with the process. So each process has only 1 PASID that can go out
> > when using ENQCMD. ENQCMD takes one mmio address specific to the
> > accelerator and a source for the descriptor.
>
> Oh. I forgot this also globally locked the PASID to a single
> MSR. Sigh. That makes the whole mechanism useless for anything except
> whole process SVA.

Is there another kind of SVA? Our mapping is that each process requires
a single mm, and PASID for SVM was a direct map from that.

> It also makes it a general kernel problem and not just related to the
> vIOMMU scenario.
>
> > > I think at the uAPI level the callpaths that require allocating a
> > > PASID from a group of RIDs should be explicit in their intention and
> > > not implicitly rely on a certain allocator behavior.
> >
> > The difficult part I see is, when one application establishes a path
> > to one accelerator, we have no knowledge if its going to connect to a
> > second, third or such. I don't see how this can work reasonably
> > well. What if PASIDx is allocated for one, but the second RID its
> > trying to attach already has this PASID allocated?
>
> You mean like some kind of vIOMMU hot plug?

Not vIOMMU hot plug. But an application opens accel1 and does a bind to
allocate a PASID. What I meant was that the kernel has no information
whether this needs to be a per-RID PASID or a global PASID. Keeping
this global avoids the other problems, or more complex mechanisms to
say "Reserve this PASID on all accelerators", which seems pretty
complicated to implement.

Now, are we losing anything by keeping the PASIDs global? As we
discussed, there is no security issue, since the PASID tables that host
these PASIDs for SVM are still per-RID. For example:

- app establishes a connection to accel1, allocates PASID-X
- the RID for accel1 now has PASID-X and the process mm plumbed
- later the app also connects with accel2, and now PASID-X is plumbed
  in for the RID of accel2

> > > If you want to get a PASID that can be used with every RID on in your
> > > /dev/ioasid then ask for that exactly.
> >
> > Correct, but how does guest through vIOMMU driver communicate that
> > intent so uAPI plumbing can do this? I mean architecturally via IOMMU
> > interfaces?
>
> I would have to ask for a PASID that has the property it needs. You
> are saying the property is even bigger than "usable on a group of
> RIDs" but is actually "global for every RID and IOMMU in the system so
> it can go into a MSR". Gross, but fine, ask for that explicitly when
> allocating the PASID.

If one process has a single mm, is that also gross? :-) So a single
process having a PASID is just an identifier for the IOMMU. It just
seems like what a mm is for a process == PASID for SVM-IOMMU support.

The unanswered question is how do we plumb from vIOMMU without a custom
allocator to get a system wide PASID? The way it works today is, if we
have a custom allocator registered, that's the mechanics to get PASIDs
allocated. For Intel vIOMMU it happens to be a globally unique
allocation.

If a particular vIOMMU doesn't require it, it does not have a vIOMMU
interface, and those naturally get a guest-local PASID name space. (I'm
not sure if that's how the allocator works today, but I guess it's
extensible to accomplish a RID-local PASID if that's exactly what is
required.)

Cheers,
Ashok
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> >
> > > For platforms that support ENQCMD, it is required to mandate PASIDs
> > > are global across the entire system. Maybe its better to call them
> > > gPASID for guest and hPASID for host. Short reason being
> > > gPASID->hPASID is a guest wide mapping for ENQCMD and not a per-RID
> > > based mapping. (We covered that in earlier responses)
> >
> > I don't think it is actually ENQCMD that forces this, ENQCMD can use a
> > per-RID PASID in the translation table as well.
>
> When using ENQCMD the PASID that needs to be sent on the wire is picked
> from an MSR setup by kernel. This is context switched along with the
> process. So each process has only 1 PASID that can go out when using
> ENQCMD. ENQCMD takes one mmio address specific to the accelerator and a
> source for the descriptor.

Oh. I forgot this also globally locked the PASID to a single
MSR. Sigh. That makes the whole mechanism useless for anything except
whole process SVA.

It also makes it a general kernel problem and not just related to the
vIOMMU scenario.

> > I think at the uAPI level the callpaths that require allocating a
> > PASID from a group of RIDs should be explicit in their intention and
> > not implicitly rely on a certain allocator behavior.
>
> The difficult part I see is, when one application establishes a path
> to one accelerator, we have no knowledge if its going to connect to a
> second, third or such. I don't see how this can work reasonably
> well. What if PASIDx is allocated for one, but the second RID its
> trying to attach already has this PASID allocated?

You mean like some kind of vIOMMU hot plug?

> > If you want to get a PASID that can be used with every RID on in your
> > /dev/ioasid then ask for that exactly.
>
> Correct, but how does guest through vIOMMU driver communicate that
> intent so uAPI plumbing can do this? I mean architecturally via IOMMU
> interfaces?

I would have to ask for a PASID that has the property it needs. You
are saying the property is even bigger than "usable on a group of
RIDs" but is actually "global for every RID and IOMMU in the system so
it can go into a MSR". Gross, but fine, ask for that explicitly when
allocating the PASID.

Jason
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote: > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote: > > > For platforms that support ENQCMD, it is required to mandate PASIDs are > > global across the entire system. Maybe its better to call them gPASID for > > guest and hPASID for host. Short reason being gPASID->hPASID is a guest > > wide mapping for ENQCMD and not a per-RID based mapping. (We covered that > > in earlier responses) > > I don't think it is actually ENQCMD that forces this, ENQCMD can use a > per-RID PASID in the translation table as well. When using ENQCMD the PASID that needs to be sent on the wire is picked from an MSR setup by kernel. This is context switched along with the process. So each process has only 1 PASID that can go out when using ENQCMD. ENQCMD takes one mmio address specific to the acclerator and a source for the descriptor. When one application is connecting to more than one accelerator since this is MSR based filled in by the cpu instruction automaticaly requires both accelerators to use the same PASID. Did you refer to this implementation? or something else? > > You get forced here only based on the design of the vIOMMU > communication channel. If the guest can demand any RID is attached to > a specific guest determined PASID then the hypervisor must accommodate > that. True, but when we have guest using vSVM, and enabling vENQCMD the requirement is the same inside a guest. > > > > Which would be a different behavior than something like Intel's top > > > level IOASID that doesn't claim all the PASIDs. > > > > isn't this simple, if we can say ioasid allocator can provide > > > > - system wide PASID > > - RID local PASID > > > > Based on platform capabilities that require such differentiation? > > I think at the uAPI level the callpaths that require allocating a > PASID from a group of RIDs should be explicit in their intention and > not implicitly rely on a certain allocator behavior. 
The difficult part I see is this: when one application establishes a path to one accelerator, we have no knowledge of whether it's going to connect to a second, a third, or more. I don't see how this can work reasonably well. What if PASIDx is allocated for one, but the second RID it's trying to attach to already has this PASID allocated? > > If you want to get a PASID that can be used with every RID in your > /dev/ioasid then ask for that exactly. Correct, but how does the guest, through the vIOMMU driver, communicate that intent so the uAPI plumbing can do this? I mean architecturally, via IOMMU interfaces? Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
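Ashok's point about why ENQCMD ends up forcing one PASID per process can be illustrated with a toy model. This is plain userspace C, not kernel code; `struct task`, `enqcmd_pasid()` and the field names are invented for illustration — the only grounded fact is that the PASID travels in per-task MSR state, so every ENQCMD a task issues, to any accelerator, carries the same value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (invented names): with ENQCMD the PASID is taken from the
 * per-task IA32_PASID MSR, which is context switched with the process.
 * A task therefore presents exactly one PASID to *every* accelerator
 * (RID) it submits descriptors to.
 */
struct task {
	uint32_t pasid_msr;	/* models the per-task PASID MSR */
};

/* The CPU fills the descriptor's PASID from the MSR automatically. */
static uint32_t enqcmd_pasid(const struct task *t)
{
	return t->pasid_msr;
}

/* Two different accelerators fed by the same task see the same PASID. */
static bool rids_see_same_pasid(const struct task *t)
{
	uint32_t seen_by_rid_a = enqcmd_pasid(t);
	uint32_t seen_by_rid_b = enqcmd_pasid(t);

	return seen_by_rid_a == seen_by_rid_b;
}
```

This is why a second RID that already has the task's PASID allocated for someone else is a problem: the task cannot present a different PASID to that RID.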
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote: > For platforms that support ENQCMD, it is required to mandate PASIDs are > global across the entire system. Maybe it's better to call them gPASID for > guest and hPASID for host. Short reason being gPASID->hPASID is a guest > wide mapping for ENQCMD and not a per-RID based mapping. (We covered that > in earlier responses) I don't think it is actually ENQCMD that forces this, ENQCMD can use a per-RID PASID in the translation table as well. You get forced here only based on the design of the vIOMMU communication channel. If the guest can demand any RID is attached to a specific guest determined PASID then the hypervisor must accommodate that. > > Which would be a different behavior than something like Intel's top > > level IOASID that doesn't claim all the PASIDs. > > isn't this simple, if we can say ioasid allocator can provide > > - system wide PASID > > - RID local PASID > > Based on platform capabilities that require such differentiation? I think at the uAPI level the callpaths that require allocating a PASID from a group of RIDs should be explicit in their intention and not implicitly rely on a certain allocator behavior. If you want to get a PASID that can be used with every RID in your /dev/ioasid then ask for that exactly. It makes the uAPI much more understandable to be explicit. Jason
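Jason's "be explicit in their intention" suggestion could look roughly like the sketch below. This is purely hypothetical uAPI — none of these names exist in the kernel — and it only illustrates an allocation request that states its scope up front instead of relying on implicit allocator behavior:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch only: none of these identifiers exist in the
 * kernel. The point is that the caller declares whether it wants a
 * per-RID PASID or one usable with every RID on the ioasid fd,
 * rather than depending on a particular allocator's behavior.
 */
enum ioasid_alloc_scope {
	IOASID_SCOPE_PER_RID = 0,	/* PASID meaningful for a single RID */
	IOASID_SCOPE_GLOBAL  = 1,	/* PASID usable with every RID on this fd */
};

struct ioasid_alloc_req {
	uint32_t argsz;		/* size of this struct, usual uAPI convention */
	uint32_t scope;		/* enum ioasid_alloc_scope, stated explicitly */
	uint32_t min;		/* requested PASID range, inclusive */
	uint32_t max;
};
```

With an explicit `scope` field, the hypervisor side of the vIOMMU channel can reject or satisfy the guest's intent deterministically instead of inferring it.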
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, May 07, 2021 at 11:06:14AM -0600, Alex Williamson wrote: > We had tossed around an idea of a super-container with vfio, it's maybe > something we'd want to incorporate into this design. For instance, if > memory could be pre-registered with a super container, which would > handle the locked memory accounting for that memory, then > sub-containers could all handle the IOMMU context of their sets of > devices relative to that common memory pool. This is where I suggested to David to use nesting of IOASIDs. Without HW support for nesting, a SW nest is really just re-using the memory registration information stored in the parent when constructing the children. Jason
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, 7 May 2021 07:36:49 + "Tian, Kevin" wrote: > > From: Alex Williamson > > Sent: Wednesday, April 28, 2021 11:06 PM > > > > On Wed, 28 Apr 2021 06:34:11 + > > "Tian, Kevin" wrote: > > > > > > Can you or Alex elaborate where the complexity and performance problem > > > locate in VFIO map/umap? We'd like to understand more detail and see > > how > > > to avoid it in the new interface. > > > > > > The map/unmap interface is really only good for long lived mappings, > > the overhead is too high for things like vIOMMU use cases or any case > > where the mapping is intended to be dynamic. Userspace drivers must > > make use of a long lived buffer mapping in order to achieve performance. > > This is not a limitation of VFIO map/unmap. It's the limitation of any > map/unmap semantics since the fact of long-lived vs. short-lived is > imposed by userspace. Nested translation is the only viable optimization > allowing 2nd-level to be a long-lived mapping even w/ vIOMMU. From > this angle I'm not sure how a new map/unmap implementation could > address this perf limitation alone. Sure, we don't need to try to tackle every problem at once, a map/unmap interface compatible with what we have is a good place to start and nested translation may provide the high performance option. That's not to say that we couldn't, in the future, extend the map/unmap with memory pre-registration like done in the spapr IOMMU to see how that could reduce latency. > > The mapping and unmapping granularity has been a problem as well, > > type1v1 allowed arbitrary unmaps to bisect the original mapping, with > > the massive caveat that the caller relies on the return value of the > > unmap to determine what was actually unmapped because the IOMMU use > > of > > superpages is transparent to the caller. This led to type1v2 that > > simply restricts the user to avoid ever bisecting mappings. 
That still > > leaves us with problems for things like virtio-mem support where we > > need to create initial mappings with a granularity that allows us to > > later remove entries, which can prevent effective use of IOMMU > > superpages. > > We could start with a semantics similar to type1v2. > > btw why does virtio-mem require a smaller granularity? Can we split > superpages on the fly when removal actually happens (just similar > to page split in VM live migration for efficient dirty page tracking)? The IOMMU API doesn't currently support those semantics. If the IOMMU used a superpage, then when that superpage gets unmapped, it doesn't get atomically broken down into smaller pages. Therefore virtio-mem proposes a fixed mapping granularity to allow for that same unmapping granularity. > and isn't it another problem imposed by userspace? How could a new > map/unmap implementation mitigate this problem if the userspace > insists on a smaller granularity for initial mappings? Currently if userspace wants to guarantee unmap granularity, they need to make the same restriction themselves on the mapping granularity. For instance, userspace cannot currently map a 1GB IOVA range while guaranteeing 2MB unmap granularity of that range with a single ioctl. Instead, userspace would need to make 512 individual 2MB mapping calls.
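The 512-call pattern Alex describes can be sketched as follows. This is userspace C and only computes the per-chunk IOVAs; in a real VFIO client each planned entry would become one VFIO_IOMMU_MAP_DMA ioctl (the ioctl and the vaddr/permission fields are deliberately elided here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_2MB (2ULL << 20)	/* desired unmap granularity */

/*
 * Plan the map calls needed to cover [iova, iova + size) while keeping
 * 2MiB unmap granularity. Each planned entry corresponds to one
 * VFIO_IOMMU_MAP_DMA ioctl in a real client. If out is non-NULL it
 * receives the chunk IOVAs.
 */
static size_t plan_chunked_maps(uint64_t iova, uint64_t size, uint64_t *out)
{
	size_t n = 0;

	for (uint64_t off = 0; off < size; off += CHUNK_2MB) {
		if (out)
			out[n] = iova + off;
		n++;	/* one ioctl per 2MiB chunk */
	}
	return n;
}
```

A 1GiB range yields 512 planned calls — exactly the overhead the text complains about, and what a native unmap-granularity hint in a new uAPI could eliminate.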
We had tossed around an idea of a super-container with vfio, it's maybe something we'd want to incorporate into this design. For instance, if memory could be pre-registered with a super container, which would handle the locked memory accounting for that memory, then sub-containers could all handle the IOMMU context of their sets of devices relative to that common memory pool. > > Those are the main ones I can think of. It is nice to have a simple > > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't > > raise the barrier to entry too high, but the user needs to have the > > ability to have more control of their mappings and locked page > > accounting should probably be offloaded somewhere. Thanks, > > > > Based on your feedbacks I feel it's probably reasonable to start with > a type1v2 semantics for the new interface. Locked accounting could > also start with the same VFIO restriction and then improve it > incrementally, if a cleaner way is intrusive (if not affecting uAPI). > But I didn't get the suggestion on "more control of their
Re: [PATCH v3 4/4] iommu: rockchip: Add support iommu v2
On 2021-05-04 09:41, Benjamin Gaignard wrote: From: Simon Xue RK356x SoC got new IOMMU hardware block (version 2). Add a compatible to distinguish it from the first version. Signed-off-by: Simon Xue [Benjamin] - port driver from kernel 4.19 to 5.12 - change functions prototype. - squash all fixes in this commit. Signed-off-by: Benjamin Gaignard --- version 3: - Change compatible name to use SoC name drivers/iommu/rockchip-iommu.c | 422 +++-- 1 file changed, 406 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index e5d86b7177de..32dcf0aaec18 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -81,6 +82,30 @@ */ #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000 +#define DT_LO_MASK 0xf000 +#define DT_HI_MASK GENMASK_ULL(39, 32) Mixing GENMASK and raw constants seems a bit inconsistent. +#define DT_SHIFT 28 + +#define DTE_BASE_HI_MASK GENMASK(11, 4) + +#define PAGE_DESC_LO_MASK 0xf000 +#define PAGE_DESC_HI1_LOWER 32 +#define PAGE_DESC_HI1_UPPER 35 +#define PAGE_DESC_HI2_LOWER 36 +#define PAGE_DESC_HI2_UPPER 39 +#define PAGE_DESC_HI_MASK1 GENMASK_ULL(PAGE_DESC_HI1_UPPER, PAGE_DESC_HI1_LOWER) +#define PAGE_DESC_HI_MASK2 GENMASK_ULL(PAGE_DESC_HI2_UPPER, PAGE_DESC_HI2_LOWER) It might just be me, but that ends up as just an unreadable mess. I can see the overall intent, but either way it's hardly pretty. Maybe consider trying the bitfield.h helpers so you wouldn't have to keep track of explicit shifts at all? 
+#define DTE_HI1_LOWER 8 +#define DTE_HI1_UPPER 11 +#define DTE_HI2_LOWER 4 +#define DTE_HI2_UPPER 7 +#define DTE_HI_MASK1 GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER) +#define DTE_HI_MASK2 GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER) + +#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER) +#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER) What makes it even worse is the contrast between these and "#define DT_SHIFT 28" above, as equal parts of the same patch :( + struct rk_iommu_domain { struct list_head iommus; u32 *dt; /* page directory table */ @@ -91,6 +116,10 @@ struct rk_iommu_domain { struct iommu_domain domain; }; +struct rockchip_iommu_data { + u32 version; +}; + /* list of clocks required by IOMMU */ static const char * const rk_iommu_clocks[] = { "aclk", "iface", @@ -108,6 +137,7 @@ struct rk_iommu { struct list_head node; /* entry in rk_iommu_domain.iommus */ struct iommu_domain *domain; /* domain to which iommu is attached */ struct iommu_group *group; + u32 version; }; struct rk_iommudata { @@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct iommu_domain *dom) #define RK_DTE_PT_ADDRESS_MASK0xf000 #define RK_DTE_PT_VALID BIT(0) +/* + * In v2: + * 31:12 - PT address bit 31:0 + * 11: 8 - PT address bit 35:32 + * 7: 4 - PT address bit 39:36 + * 3: 1 - Reserved + * 0 - 1 if PT @ PT address is valid + */ +#define RK_DTE_PT_ADDRESS_MASK_V2 0xfff0 + static inline phys_addr_t rk_dte_pt_address(u32 dte) { return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK; } +static inline phys_addr_t rk_dte_pt_address_v2(u32 dte) +{ + u64 dte_v2 = dte; + + dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) | +((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) | +(dte_v2 & PAGE_DESC_LO_MASK); + + return (phys_addr_t)dte_v2; +} But on current (v1) systems, the respective bits 11:4 of the DTE/PTE and 40:32 of the address should always be 0, so you could in principle just use the packing/unpacking logic all the time, right? 
That's what we did with 52-bit support in io-pgtable-arm, for instance. + static inline bool rk_dte_is_pt_valid(u32 dte) { return dte & RK_DTE_PT_VALID; @@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID; } +static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma) +{ + pt_dma = (pt_dma & PAGE_DESC_LO_MASK) | +((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) | +(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2; + + return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID; +} + /* * Each PTE has a Page address, some flags and a valid bit: * +-+---+---+-+ @@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) #define RK_PTE_PAGE_READABLE BIT(1) #define RK_PTE_PAGE_VALID BIT(0) +/* + * In v2: + * 31:12 - Page address bit 31:0 + * 11:9 - Page address bit 34:32 + * 8:4 - Page address bit 39:35 + * 3 - Security + * 2 - Readable + * 1 - Writable + * 0 - 1 if Page @ Page
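Robin's suggestion to use the bitfield.h helpers could look roughly like this. The sketch is userspace C with simplified stand-ins for the kernel's GENMASK_ULL()/FIELD_GET()/FIELD_PREP() (in-kernel code would include linux/bitfield.h and use the real macros, which additionally require compile-time-constant masks); the v2 DTE layout is taken from the patch comment (31:12 = PT address 31:12, 11:8 = PT address 35:32, 7:4 = PT address 39:36, bit 0 = valid):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the linux/bitfield.h helpers:
 * (mask & -mask) isolates the mask's lowest set bit, so dividing or
 * multiplying by it shifts the field without an explicit shift count. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FIELD_GET(mask, val)  ((((uint64_t)(val)) & (mask)) / ((mask) & -(mask)))
#define FIELD_PREP(mask, val) ((((uint64_t)(val)) * ((mask) & -(mask))) & (mask))

/* v2 DTE layout as documented in the patch comment. */
#define DTE_V2_PA_31_12	GENMASK_ULL(31, 12)	/* PT address bits 31:12 */
#define DTE_V2_PA_35_32	GENMASK_ULL(11, 8)	/* PT address bits 35:32 */
#define DTE_V2_PA_39_36	GENMASK_ULL(7, 4)	/* PT address bits 39:36 */
#define DTE_V2_VALID	1ULL

static uint64_t rk_dte_pt_address_v2(uint32_t dte)
{
	return (dte & DTE_V2_PA_31_12) |
	       (FIELD_GET(DTE_V2_PA_35_32, dte) << 32) |
	       (FIELD_GET(DTE_V2_PA_39_36, dte) << 36);
}

static uint32_t rk_mk_dte_v2(uint64_t pt_dma)
{
	return (uint32_t)((pt_dma & DTE_V2_PA_31_12) |
			  FIELD_PREP(DTE_V2_PA_35_32, pt_dma >> 32) |
			  FIELD_PREP(DTE_V2_PA_39_36, pt_dma >> 36) |
			  DTE_V2_VALID);
}
```

Since on v1 hardware bits 11:4 of the DTE and address bits 39:32 are always zero, this pack/unpack is a no-op there — which is Robin's point that the same logic could be used unconditionally instead of duplicating v1/v2 accessors.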
[PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure
From: Xingang Wang When requesting ACS for a PCI device in of_iommu_configure, the pci device has already been scanned and added with 'pci_acs_enable=0'. So the pci_request_acs() in the current procedure does not work for enabling ACS. Besides, ACS should be enabled only if there's an IOMMU in the system. So this fixes the call of pci_request_acs() and calls pci_enable_acs() to make sure ACS is enabled for the pci device. Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage") Signed-off-by: Xingang Wang --- drivers/iommu/of_iommu.c | 10 +- drivers/pci/pci.c| 2 +- include/linux/pci.h | 1 + 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index a9d2df001149..dc621861ae72 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, .np = master_np, }; - pci_request_acs(); err = pci_for_each_dma_alias(to_pci_dev(dev), of_pci_iommu_init, ); } else { @@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, /* The fwspec pointer changed, read it again */ fwspec = dev_iommu_fwspec_get(dev); ops= fwspec->ops; + + /* +* If we found an IOMMU and the device is pci, +* make sure we enable ACS. 
+*/ + if (dev_is_pci(dev)) { + pci_request_acs(); + pci_enable_acs(to_pci_dev(dev)); + } } /* * If we have reason to believe the IOMMU driver missed the initial diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b717680377a9..4e4f98ee2870 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev) * pci_enable_acs - enable ACS if hardware support it * @dev: the PCI device */ -static void pci_enable_acs(struct pci_dev *dev) +void pci_enable_acs(struct pci_dev *dev) { if (!pci_acs_enable) goto disable_acs_redir; diff --git a/include/linux/pci.h b/include/linux/pci.h index c20211e59a57..e6a8bfbc9c98 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev) } void pci_request_acs(void); +void pci_enable_acs(struct pci_dev *dev); bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags); bool pci_acs_path_enabled(struct pci_dev *start, struct pci_dev *end, u16 acs_flags); -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/1] iommu/of: Fix request and enable ACS for of_iommu_configure
From: Xingang Wang When requesting ACS in of_iommu_configure, the pci_acs_init procedure has already been called. The pci device probe procedure is like the following: pci_host_common_probe pci_device_add pci_acs_init of_iommu_configure pci_request_acs The pci_request_acs() does not work because the pci_acs_init and pci_enable_acs procedures have already finished, so ACS is not enabled as expected. Besides, ACS should be enabled only if an IOMMU is detected and the device is a PCI device. So this fixes 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when configuring IOMMU linkage"), adding pci_enable_acs() and an IOMMU check to make sure ACS is enabled for the pci device. Xingang Wang (1): iommu/of: Fix request and enable ACS for of_iommu_configure drivers/iommu/of_iommu.c | 10 +- drivers/pci/pci.c| 2 +- include/linux/pci.h | 1 + 3 files changed, 11 insertions(+), 2 deletions(-) -- 2.19.1
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Fri, May 07, 2021 at 07:36:49AM +, Tian, Kevin wrote: > for /dev/ioasid there is still an open whether a process is allowed to > open /dev/ioasid once or multiple times. If there is only one ioasid_fd > per process, the accounting can be made accurate. Otherwise the > same problem still exists, as each ioasid_fd is akin to the container; then > we need to find a better solution. You can't really do tricks like 'FD once per process' in Linux. The locked page accounting problem is much bigger than vfio and I don't really know of any solution. Jason
[RFC PATCH v4 09/13] iommu/arm-smmu-v3: Add feature detection for BBML
From: Kunkun Jiang This detects BBML feature and if SMMU supports it, transfer BBMLx quirk to io-pgtable. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++ 2 files changed, 25 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index c42e59655fd0..3a2dc3177180 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2051,6 +2051,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, if (smmu->features & ARM_SMMU_FEAT_HD) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD; + if (smmu->features & ARM_SMMU_FEAT_BBML1) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1; + else if (smmu->features & ARM_SMMU_FEAT_BBML2) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2; + pgtbl_ops = alloc_io_pgtable_ops(fmt, _cfg, smmu_domain); if (!pgtbl_ops) return -ENOMEM; @@ -3419,6 +3424,20 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) /* IDR3 */ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3); + switch (FIELD_GET(IDR3_BBML, reg)) { + case IDR3_BBML0: + break; + case IDR3_BBML1: + smmu->features |= ARM_SMMU_FEAT_BBML1; + break; + case IDR3_BBML2: + smmu->features |= ARM_SMMU_FEAT_BBML2; + break; + default: + dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n"); + return -ENXIO; + } + if (FIELD_GET(IDR3_RIL, reg)) smmu->features |= ARM_SMMU_FEAT_RANGE_INV; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3edcd31b046e..e3b6bdd292c9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -54,6 +54,10 @@ #define IDR1_SIDSIZE GENMASK(5, 0) #define ARM_SMMU_IDR3 0xc +#define IDR3_BBML GENMASK(12, 11) +#define IDR3_BBML0 0 +#define IDR3_BBML1 1 +#define IDR3_BBML2 2 #define IDR3_RIL (1 << 10) #define 
ARM_SMMU_IDR5 0x14 @@ -613,6 +617,8 @@ struct arm_smmu_device { #define ARM_SMMU_FEAT_E2H (1 << 18) #define ARM_SMMU_FEAT_HA (1 << 19) #define ARM_SMMU_FEAT_HD (1 << 20) +#define ARM_SMMU_FEAT_BBML1(1 << 21) +#define ARM_SMMU_FEAT_BBML2(1 << 22) u32 features; #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0) -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v4 11/13] iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
From: Kunkun Jiang This realizes sync_dirty_log iommu ops based on sync_dirty_log io-pgtable ops. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 + 1 file changed, 30 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6de81d6ab652..3d3c0f8e2446 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2721,6 +2721,35 @@ static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable, return 0; } +static int arm_smmu_sync_dirty_log(struct iommu_domain *domain, + unsigned long iova, size_t size, + unsigned long *bitmap, + unsigned long base_iova, + unsigned long bitmap_pgshift) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (!(smmu->features & ARM_SMMU_FEAT_HD)) + return -ENODEV; + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -EINVAL; + + if (!ops || !ops->sync_dirty_log) { + pr_err("io-pgtable don't realize sync dirty log\n"); + return -ENODEV; + } + + /* +* Flush iotlb to ensure all inflight transactions are completed. +* See doc IHI0070Da 3.13.4 "HTTU behavior summary". 
+*/ + arm_smmu_flush_iotlb_all(domain); + return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova, + bitmap_pgshift); +} + static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) { return iommu_fwspec_add_ids(dev, args->args, 1); @@ -2820,6 +2849,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .enable_nesting = arm_smmu_enable_nesting, .switch_dirty_log = arm_smmu_switch_dirty_log, + .sync_dirty_log = arm_smmu_sync_dirty_log, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v4 13/13] iommu/arm-smmu-v3: Realize support_dirty_log iommu ops
From: Kunkun Jiang We have implemented these interfaces required to support iommu dirty log tracking. The last step is reporting this feature to upper user, then the user can perform higher policy base on it. For arm smmuv3, it is equal to ARM_SMMU_FEAT_HD. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9b4739247dbb..59d11f084199 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2684,6 +2684,13 @@ static int arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova, return ret; } +static bool arm_smmu_support_dirty_log(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + return !!(smmu_domain->smmu->features & ARM_SMMU_FEAT_HD); +} + static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable, unsigned long iova, size_t size, int prot) { @@ -2872,6 +2879,7 @@ static struct iommu_ops arm_smmu_ops = { .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group, .enable_nesting = arm_smmu_enable_nesting, + .support_dirty_log = arm_smmu_support_dirty_log, .switch_dirty_log = arm_smmu_switch_dirty_log, .sync_dirty_log = arm_smmu_sync_dirty_log, .clear_dirty_log= arm_smmu_clear_dirty_log, -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v4 10/13] iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
From: Kunkun Jiang This realizes switch_dirty_log. In order to get finer dirty granule, it invokes arm_smmu_split_block when start dirty log, and invokes arm_smmu_merge_page() to recover block mapping when stop dirty log. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 142 drivers/iommu/iommu.c | 5 +- include/linux/iommu.h | 2 + 3 files changed, 147 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 3a2dc3177180..6de81d6ab652 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2580,6 +2580,147 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain) return ret; } +static int arm_smmu_split_block(struct iommu_domain *domain, + unsigned long iova, size_t size) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + size_t handled_size; + + if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) { + dev_err(smmu->dev, "don't support BBML1/2, can't split block\n"); + return -ENODEV; + } + if (!ops || !ops->split_block) { + pr_err("io-pgtable don't realize split block\n"); + return -ENODEV; + } + + handled_size = ops->split_block(ops, iova, size); + if (handled_size != size) { + pr_err("split block failed\n"); + return -EFAULT; + } + + return 0; +} + +static int __arm_smmu_merge_page(struct iommu_domain *domain, +unsigned long iova, phys_addr_t paddr, +size_t size, int prot) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + size_t handled_size; + + if (!ops || !ops->merge_page) { + pr_err("io-pgtable don't realize merge page\n"); + return -ENODEV; + } + + while (size) { + size_t pgsize = iommu_pgsize(domain, iova | paddr, size); + + handled_size = 
ops->merge_page(ops, iova, paddr, pgsize, prot); + if (handled_size != pgsize) { + pr_err("merge page failed\n"); + return -EFAULT; + } + + pr_debug("merge handled: iova 0x%lx pa %pa size 0x%zx\n", +iova, , pgsize); + + iova += pgsize; + paddr += pgsize; + size -= pgsize; + } + + return 0; +} + +static int arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova, + size_t size, int prot) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + phys_addr_t phys; + dma_addr_t p, i; + size_t cont_size; + int ret = 0; + + if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) { + dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n"); + return -ENODEV; + } + + if (!ops || !ops->iova_to_phys) + return -ENODEV; + + while (size) { + phys = ops->iova_to_phys(ops, iova); + cont_size = PAGE_SIZE; + p = phys + cont_size; + i = iova + cont_size; + + while (cont_size < size && p == ops->iova_to_phys(ops, i)) { + p += PAGE_SIZE; + i += PAGE_SIZE; + cont_size += PAGE_SIZE; + } + + if (cont_size != PAGE_SIZE) { + ret = __arm_smmu_merge_page(domain, iova, phys, + cont_size, prot); + if (ret) + break; + } + + iova += cont_size; + size -= cont_size; + } + + return ret; +} + +static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable, +unsigned long iova, size_t size, int prot) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (!(smmu->features & ARM_SMMU_FEAT_HD)) + return -ENODEV; + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -EINVAL; + + if (enable) { + /* +* For SMMU, the hardware dirty management is always enabled if +*
[RFC PATCH v4 07/13] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
From: Jean-Philippe Brucker If the SMMU supports it and the kernel was built with HTTU support, enable hardware update of access and dirty flags. This is essential for shared page tables, to reduce the number of access faults on the fault queue. Normal DMA with io-pgtables doesn't currently use the access or dirty flags. We can enable HTTU even if CPUs don't support it, because the kernel always checks for HW dirty bit and updates the PTE flags atomically. Signed-off-by: Jean-Philippe Brucker --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index bb251cab61f3..ae075e675892 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) if (err) goto out_free_asid; + /* HA and HD will be filtered out later if not supported by the SMMU */ tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) | FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) | FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) | FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) | + CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; switch (PAGE_SIZE) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 54b2f27b81d4..4ac59a89bc76 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1010,10 +1010,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, * this substream's traffic */ } else { /* (1) and (2) */ + u64 tcr = cd->tcr; + cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); 
cdptr[2] = 0; cdptr[3] = cpu_to_le64(cd->mair); + if (!(smmu->features & ARM_SMMU_FEAT_HD)) + tcr &= ~CTXDESC_CD_0_TCR_HD; + if (!(smmu->features & ARM_SMMU_FEAT_HA)) + tcr &= ~CTXDESC_CD_0_TCR_HA; + /* * STE is live, and the SMMU might read dwords of this CD in any * order. Ensure that it observes valid values before reading @@ -1021,7 +1028,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, */ arm_smmu_sync_cd(smmu_domain, ssid, true); - val = cd->tcr | + val = tcr | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | #endif @@ -3242,6 +3249,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) return 0; } +static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg) +{ + u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD); + u32 features = 0; + + switch (FIELD_GET(IDR0_HTTU, reg)) { + case IDR0_HTTU_ACCESS_DIRTY: + features |= ARM_SMMU_FEAT_HD; + fallthrough; + case IDR0_HTTU_ACCESS: + features |= ARM_SMMU_FEAT_HA; + } + + if (smmu->dev->of_node) + smmu->features |= features; + else if (features != fw_features) + /* ACPI IORT sets the HTTU bits */ + dev_warn(smmu->dev, +"IDR0.HTTU overridden by FW configuration (0x%x)\n", +fw_features); +} + static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) { u32 reg; @@ -3302,6 +3331,8 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) smmu->features |= ARM_SMMU_FEAT_E2H; } + arm_smmu_get_httu(smmu, reg); + /* * The coherency feature as set by FW is used in preference to the ID * register, but warn on mismatch. 
@@ -3487,6 +3518,14 @@ static int arm_smmu_device_acpi_probe(struct platform_device *pdev, if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE) smmu->features |= ARM_SMMU_FEAT_COHERENCY; + switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) { + case IDR0_HTTU_ACCESS_DIRTY: + smmu->features |= ARM_SMMU_FEAT_HD; + fallthrough; + case IDR0_HTTU_ACCESS: + smmu->features |= ARM_SMMU_FEAT_HA; + } + return 0; } #else diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index
[RFC PATCH v4 08/13] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
From: Kunkun Jiang As nested mode is not upstreamed now, we just aim to support dirty log tracking for stage1 with io-pgtable mapping (means not support SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU CD and transfer ARM_HD quirk to io-pgtable. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4ac59a89bc76..c42e59655fd0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1942,6 +1942,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | + CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair; @@ -2047,6 +2048,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, if (!iommu_get_dma_strict(domain)) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + if (smmu->features & ARM_SMMU_FEAT_HD) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD; pgtbl_ops = alloc_io_pgtable_ops(fmt, _cfg, smmu_domain); if (!pgtbl_ops) -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v4 06/13] iommu/io-pgtable-arm: Add and realize clear_dirty_log ops
From: Kunkun Jiang After dirty log is retrieved, user should clear dirty log to re-enable dirty log tracking for these dirtied pages. This clears the dirty state (As we just set DBM bit for stage1 mapping, so should set the AP[2] bit) of these leaf TTDs that are specified by the user provided bitmap. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/io-pgtable-arm.c | 93 ++ include/linux/io-pgtable.h | 4 ++ 2 files changed, 97 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 155d440099ab..2b41b9d0faa3 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -965,6 +965,98 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops, bitmap, base_iova, bitmap_pgshift); } +static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data, + unsigned long iova, size_t size, + int lvl, arm_lpae_iopte *ptep, + unsigned long *bitmap, + unsigned long base_iova, + unsigned long bitmap_pgshift) +{ + arm_lpae_iopte pte; + struct io_pgtable *iop = >iop; + unsigned long offset; + size_t base, next_size; + int nbits, ret, i; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return -EINVAL; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return -EINVAL; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) { + if (pte & ARM_LPAE_PTE_AP_RDONLY) + return 0; + + /* Ensure all corresponding bits are set */ + nbits = size >> bitmap_pgshift; + offset = (iova - base_iova) >> bitmap_pgshift; + for (i = offset; i < offset + nbits; i++) { + if (!test_bit(i, bitmap)) + return 0; + } + + /* Race does not exist */ + pte |= ARM_LPAE_PTE_AP_RDONLY; + __arm_lpae_set_pte(ptep, pte, >cfg); + return 0; + } + /* Current level is table, traverse next level */ + next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data); + ptep = iopte_deref(pte, data); + for (base = 0; base < size; base += next_size) { + ret = 
__arm_lpae_clear_dirty_log(data, iova + base, + next_size, lvl + 1, ptep, bitmap, + base_iova, bitmap_pgshift); + if (ret) + return ret; + } + return 0; + } else if (iopte_leaf(pte, lvl, iop->fmt)) { + /* Though the size is too small, it is already clean */ + if (pte & ARM_LPAE_PTE_AP_RDONLY) + return 0; + + return -EINVAL; + } + + /* Keep on walkin */ + ptep = iopte_deref(pte, data); + return __arm_lpae_clear_dirty_log(data, iova, size, lvl + 1, ptep, + bitmap, base_iova, bitmap_pgshift); +} + +static int arm_lpae_clear_dirty_log(struct io_pgtable_ops *ops, + unsigned long iova, size_t size, + unsigned long *bitmap, + unsigned long base_iova, + unsigned long bitmap_pgshift) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = >iop.cfg; + arm_lpae_iopte *ptep = data->pgd; + int lvl = data->start_level; + long iaext = (s64)iova >> cfg->ias; + + if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) + return -EINVAL; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + iaext = ~iaext; + if (WARN_ON(iaext)) + return -EINVAL; + + if (data->iop.fmt != ARM_64_LPAE_S1 && + data->iop.fmt != ARM_32_LPAE_S1) + return -EINVAL; + + return __arm_lpae_clear_dirty_log(data, iova, size, lvl, ptep, + bitmap, base_iova, bitmap_pgshift); +} + static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) { unsigned long granule, page_sizes; @@ -1046,6 +1138,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) .split_block= arm_lpae_split_block, .merge_page = arm_lpae_merge_page, .sync_dirty_log = arm_lpae_sync_dirty_log, +
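The clear path above only flips a leaf TTD back to read-only when every bit
covering that mapping is set in the user-provided bitmap (a partial clear of
a block would lose dirty information). A minimal user-space sketch of that
all-bits-set check follows; the helper name and the plain-array bitmap layout
are mine, not the kernel's:

```c
#include <stdbool.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return true iff bits [offset, offset + nbits) are all set in bitmap,
 * mirroring the "Ensure all corresponding bits are set" loop above. */
static bool bitmap_range_all_set(const unsigned long *bitmap,
				 unsigned long offset, unsigned long nbits)
{
	for (unsigned long i = offset; i < offset + nbits; i++) {
		if (!(bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
			return false;
	}
	return true;
}
```

Only when this returns true does the walk set ARM_LPAE_PTE_AP_RDONLY on the
leaf, re-arming hardware dirty tracking for that range.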
[RFC PATCH v4 12/13] iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
From: Kunkun Jiang

This implements the clear_dirty_log iommu op on top of the clear_dirty_log
io-pgtable op.

Co-developed-by: Keqian Zhu
Signed-off-by: Kunkun Jiang
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3d3c0f8e2446..9b4739247dbb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2750,6 +2750,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
 				   bitmap_pgshift);
 }

+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+				    unsigned long iova, size_t size,
+				    unsigned long *bitmap,
+				    unsigned long base_iova,
+				    unsigned long bitmap_pgshift)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_HD))
+		return -ENODEV;
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!ops || !ops->clear_dirty_log) {
+		pr_err("io-pgtable does not implement clear_dirty_log\n");
+		return -ENODEV;
+	}
+
+	return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+				    bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2850,6 +2874,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.enable_nesting		= arm_smmu_enable_nesting,
 	.switch_dirty_log	= arm_smmu_switch_dirty_log,
 	.sync_dirty_log		= arm_smmu_sync_dirty_log,
+	.clear_dirty_log	= arm_smmu_clear_dirty_log,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
--
2.19.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v4 04/13] iommu/io-pgtable-arm: Add and realize merge_page ops
From: Kunkun Jiang If block(largepage) mappings are split during start dirty log, then when stop dirty log, we need to recover them for better DMA performance. This recovers block mappings and unmap the span of page mappings. BBML1 or BBML2 feature is required. Merging page is designed to be only used by dirty log tracking, which does not concurrently work with other pgtable ops that access underlying page table, so race condition does not exist. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/io-pgtable-arm.c | 78 ++ include/linux/io-pgtable.h | 2 + 2 files changed, 80 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 664a9548b199..b9f6e3370032 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops *ops, return __arm_lpae_split_block(data, iova, size, lvl, ptep); } +static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data, + unsigned long iova, phys_addr_t paddr, + size_t size, int lvl, arm_lpae_iopte *ptep, + arm_lpae_iopte prot) +{ + arm_lpae_iopte pte, *tablep; + struct io_pgtable *iop = >iop; + struct io_pgtable_cfg *cfg = >iop.cfg; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return 0; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return 0; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) + return size; + + /* Race does not exist */ + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) { + prot |= ARM_LPAE_PTE_NT; + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + io_pgtable_tlb_flush_walk(iop, iova, size, + ARM_LPAE_GRANULE(data)); + + prot &= ~(ARM_LPAE_PTE_NT); + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + } else { + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + } + + tablep = iopte_deref(pte, data); + __arm_lpae_free_pgtable(data, lvl + 1, tablep); + return size; + } 
else if (iopte_leaf(pte, lvl, iop->fmt)) { + /* The size is too small, already merged */ + return size; + } + + /* Keep on walkin */ + ptep = iopte_deref(pte, data); + return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, prot); +} + +static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long iova, + phys_addr_t paddr, size_t size, int iommu_prot) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = >iop.cfg; + arm_lpae_iopte *ptep = data->pgd; + int lvl = data->start_level; + arm_lpae_iopte prot; + long iaext = (s64)iova >> cfg->ias; + + if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) + return 0; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + iaext = ~iaext; + if (WARN_ON(iaext || paddr >> cfg->oas)) + return 0; + + /* If no access, then nothing to do */ + if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE))) + return size; + + /* If it is smallest granule, then nothing to do */ + if (size == ARM_LPAE_BLOCK_SIZE(ARM_LPAE_MAX_LEVELS - 1, data)) + return size; + + prot = arm_lpae_prot_to_pte(data, iommu_prot); + return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot); +} + static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) { unsigned long granule, page_sizes; @@ -879,6 +956,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) .unmap = arm_lpae_unmap, .iova_to_phys = arm_lpae_iova_to_phys, .split_block= arm_lpae_split_block, + .merge_page = arm_lpae_merge_page, }; return data; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index eba6c6ccbe49..e77576d946a2 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -169,6 +169,8 @@ struct io_pgtable_ops { unsigned long iova); size_t (*split_block)(struct io_pgtable_ops *ops, unsigned long iova, size_t size); + size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova, +phys_addr_t phys, size_t size, int prot); }; /** --
[RFC PATCH v4 05/13] iommu/io-pgtable-arm: Add and realize sync_dirty_log ops
From: Kunkun Jiang During dirty log tracking, user will try to retrieve dirty log from iommu if it supports hardware dirty log. Scan leaf TTD and treat it is dirty if it's writable. As we just set DBM bit for stage1 mapping, so check whether AP[2] is not set. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/io-pgtable-arm.c | 89 ++ include/linux/io-pgtable.h | 4 ++ 2 files changed, 93 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index b9f6e3370032..155d440099ab 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -877,6 +877,94 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long iova return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot); } +static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data, +unsigned long iova, size_t size, +int lvl, arm_lpae_iopte *ptep, +unsigned long *bitmap, +unsigned long base_iova, +unsigned long bitmap_pgshift) +{ + arm_lpae_iopte pte; + struct io_pgtable *iop = >iop; + size_t base, next_size; + unsigned long offset; + int nbits, ret; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return -EINVAL; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return -EINVAL; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) { + if (pte & ARM_LPAE_PTE_AP_RDONLY) + return 0; + + /* It is writable, set the bitmap */ + nbits = size >> bitmap_pgshift; + offset = (iova - base_iova) >> bitmap_pgshift; + bitmap_set(bitmap, offset, nbits); + return 0; + } + /* Current level is table, traverse next level */ + next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data); + ptep = iopte_deref(pte, data); + for (base = 0; base < size; base += next_size) { + ret = __arm_lpae_sync_dirty_log(data, iova + base, + next_size, lvl + 1, ptep, bitmap, + base_iova, bitmap_pgshift); + if (ret) + return ret; + } + return 0; + } else if 
(iopte_leaf(pte, lvl, iop->fmt)) { + if (pte & ARM_LPAE_PTE_AP_RDONLY) + return 0; + + /* Though the size is too small, also set bitmap */ + nbits = size >> bitmap_pgshift; + offset = (iova - base_iova) >> bitmap_pgshift; + bitmap_set(bitmap, offset, nbits); + return 0; + } + + /* Keep on walkin */ + ptep = iopte_deref(pte, data); + return __arm_lpae_sync_dirty_log(data, iova, size, lvl + 1, ptep, + bitmap, base_iova, bitmap_pgshift); +} + +static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops, + unsigned long iova, size_t size, + unsigned long *bitmap, + unsigned long base_iova, + unsigned long bitmap_pgshift) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = >iop.cfg; + arm_lpae_iopte *ptep = data->pgd; + int lvl = data->start_level; + long iaext = (s64)iova >> cfg->ias; + + if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) + return -EINVAL; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + iaext = ~iaext; + if (WARN_ON(iaext)) + return -EINVAL; + + if (data->iop.fmt != ARM_64_LPAE_S1 && + data->iop.fmt != ARM_32_LPAE_S1) + return -EINVAL; + + return __arm_lpae_sync_dirty_log(data, iova, size, lvl, ptep, +bitmap, base_iova, bitmap_pgshift); +} + static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) { unsigned long granule, page_sizes; @@ -957,6 +1045,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) .iova_to_phys = arm_lpae_iova_to_phys, .split_block= arm_lpae_split_block, .merge_page = arm_lpae_merge_page, + .sync_dirty_log = arm_lpae_sync_dirty_log, }; return data; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index e77576d946a2..329fa99d9d96 100644 ---
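When the sync walk finds a writable (hence potentially dirtied) leaf TTD, it
translates the mapping's IOVA range into a bit range in the caller's bitmap
using `base_iova` and `bitmap_pgshift`. A self-contained sketch of that
translation (function name and flat-array bitmap are my own, not kernel API):

```c
#include <stddef.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mark the bitmap pages covering [iova, iova + size) as dirty, the way
 * __arm_lpae_sync_dirty_log() does for each writable leaf:
 *   offset = (iova - base_iova) >> bitmap_pgshift
 *   nbits  = size >> bitmap_pgshift
 */
static void mark_dirty(unsigned long *bitmap, unsigned long iova,
		       size_t size, unsigned long base_iova,
		       unsigned int bitmap_pgshift)
{
	unsigned long offset = (iova - base_iova) >> bitmap_pgshift;
	unsigned long nbits = size >> bitmap_pgshift;

	for (unsigned long i = offset; i < offset + nbits; i++)
		bitmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}
```

For example, a dirty 2 MiB block at IOVA 0x200000 with a 4 KiB bitmap
granule (`bitmap_pgshift = 12`) sets 512 consecutive bits starting at
bit 512.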
[RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework
Some types of IOMMU are capable of tracking DMA dirty log, such as ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the dirty log tracking framework in the IOMMU base layer. Four new essential interfaces are added, and we maintaince the status of dirty log tracking in iommu_domain. 1. iommu_support_dirty_log: Check whether domain supports dirty log tracking 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap Note: Don't concurrently call these interfaces with other ops that access underlying page table. Signed-off-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/iommu.c| 201 +++ include/linux/iommu.h| 63 +++ include/trace/events/iommu.h | 63 +++ 3 files changed, 327 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d5df5..0d15620d1e90 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, domain->type = type; /* Assume all sizes by default; the driver may override this later */ domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; + mutex_init(>switch_log_lock); return domain; } @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks); +bool iommu_support_dirty_log(struct iommu_domain *domain) +{ + const struct iommu_ops *ops = domain->ops; + + return ops->support_dirty_log && ops->support_dirty_log(domain); +} +EXPORT_SYMBOL_GPL(iommu_support_dirty_log); + +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable, + unsigned long iova, size_t size, int prot) +{ + const struct iommu_ops *ops = domain->ops; + unsigned long orig_iova = iova; + unsigned int min_pagesz; + size_t orig_size = size; + bool flush = false; + int ret = 0; + + if 
(unlikely(!ops->switch_dirty_log)) + return -ENODEV; + + min_pagesz = 1 << __ffs(domain->pgsize_bitmap); + if (!IS_ALIGNED(iova | size, min_pagesz)) { + pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n", + iova, size, min_pagesz); + return -EINVAL; + } + + mutex_lock(>switch_log_lock); + if (enable && domain->dirty_log_tracking) { + ret = -EBUSY; + goto out; + } else if (!enable && !domain->dirty_log_tracking) { + ret = -EINVAL; + goto out; + } + + pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n", +enable ? "enable" : "disable", iova, size); + + while (size) { + size_t pgsize = iommu_pgsize(domain, iova, size); + + flush = true; + ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot); + if (ret) + break; + + pr_debug("switch_dirty_log handled: iova 0x%lx size 0x%zx\n", +iova, pgsize); + + iova += pgsize; + size -= pgsize; + } + + if (flush) + iommu_flush_iotlb_all(domain); + + if (!ret) { + domain->dirty_log_tracking = enable; + trace_switch_dirty_log(orig_iova, orig_size, enable); + } +out: + mutex_unlock(>switch_log_lock); + return ret; +} +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log); + +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova, +size_t size, unsigned long *bitmap, +unsigned long base_iova, unsigned long bitmap_pgshift) +{ + const struct iommu_ops *ops = domain->ops; + unsigned long orig_iova = iova; + unsigned int min_pagesz; + size_t orig_size = size; + int ret = 0; + + if (unlikely(!ops->sync_dirty_log)) + return -ENODEV; + + min_pagesz = 1 << __ffs(domain->pgsize_bitmap); + if (!IS_ALIGNED(iova | size, min_pagesz)) { + pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n", + iova, size, min_pagesz); + return -EINVAL; + } + + mutex_lock(>switch_log_lock); + if (!domain->dirty_log_tracking) { + ret = -EINVAL; + goto out; + } + + pr_debug("sync_dirty_log for: iova 0x%lx size 0x%zx\n", iova, size); + + while (size) { + size_t pgsize = iommu_pgsize(domain, iova, size); + + ret = 
ops->sync_dirty_log(domain, iova, pgsize, + bitmap, base_iova, bitmap_pgshift); + if (ret) + break; + +
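Like iommu_map()/iommu_unmap(), the loops above walk the range one chunk at a
time, each chunk being the largest page size the domain supports for the
current alignment. A sketch of that selection, modeled on the kernel's
iommu_pgsize() (assumes a non-empty `pgsize_bitmap` and an `iova`/`size`
already aligned to the minimum page size, as the callers check):

```c
#include <stddef.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Pick the largest page size in pgsize_bitmap that fits in 'size' and
 * respects the alignment of 'iova'. */
static size_t pick_pgsize(unsigned long pgsize_bitmap,
			  unsigned long iova, size_t size)
{
	/* Largest power of two that still fits into 'size' (fls(size) - 1) */
	unsigned int pgsize_idx =
		(unsigned int)(BITS_PER_LONG - 1 - __builtin_clzl(size));

	/* The chunk must also be naturally aligned at 'iova' */
	if (iova) {
		unsigned int align_idx = (unsigned int)__builtin_ctzl(iova);
		if (align_idx < pgsize_idx)
			pgsize_idx = align_idx;
	}

	/* Keep only supported sizes up to that limit, take the biggest */
	unsigned long mask = (pgsize_idx + 1 < BITS_PER_LONG)
				? (1UL << (pgsize_idx + 1)) - 1 : ~0UL;
	mask &= pgsize_bitmap;
	return 1UL << (BITS_PER_LONG - 1 - __builtin_clzl(mask));
}
```

Each iteration then advances `iova` and shrinks `size` by the returned
amount, which is why the per-chunk driver callback never sees an unsupported
or misaligned granule.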
[RFC PATCH v4 00/13] iommu/smmuv3: Implement hardware dirty log tracking
Hi Robin, Will and everyone,

I think this series is relatively mature now, please give your valuable
suggestions, thanks!

This patch series is split from the series[1] that contains both the IOMMU
part and the VFIO part. The VFIO part will be sent out in another series.

[1] https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/

changelog:

v4:
 - Modify the framework as suggested by Baolu, thanks!
 - Add trace for iommu ops.
 - Extract io-pgtable part.

v3:
 - Merge start_dirty_log and stop_dirty_log into switch_dirty_log. (Yi Sun)
 - Maintain the dirty log status in iommu_domain.
 - Update commit message to make patch easier to review.

v2:
 - Address all comments of RFC version, thanks for all of you ;-)
 - Add a bugfix that starts dirty log for newly added dma ranges and domain.

Hi everyone,

This patch series introduces a framework for iommu dirty log tracking, and
smmuv3 implements this framework. This new feature can be used by VFIO dma
dirty tracking.

Intention:

Some types of IOMMU are capable of tracking DMA dirty log, such as ARM SMMU
with HTTU or Intel IOMMU with SLADE. This introduces the dirty log tracking
framework in the IOMMU base layer. Three new essential interfaces are added,
and we maintain the status of dirty log tracking in iommu_domain.

1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3; it can
update the access flag and/or dirty state of the TTD (Translation Table
Descriptor) by hardware. With HTTU, a stage 1 TTD is classified into 3 types:

                    DBM bit   AP[2] (read-only bit)
1. writable_clean      1              1
2. writable_dirty      1              0
3. readonly            0              1

If HTTU_HD (manage dirty state) is enabled, the SMMU can change a TTD from
writable_clean to writable_dirty. Then software can scan the TTDs to sync
the dirty state into a dirty bitmap.
With this feature, we can track the dirty log of DMA continuously and
precisely.

About this series:

Patch 1-3: Introduce the dirty log tracking framework in the IOMMU base
layer, and two common interfaces that can be used by many types of iommu.

Patch 4-6: Add feature detection for SMMU HTTU and enable HTTU for SMMU
stage 1 mapping. And add feature detection for SMMU BBML. We need to split
block mappings when dirty log tracking starts and merge page mappings when
it stops, which requires a break-before-make procedure. But it might cause
problems when the TTD is alive: the I/O streams might not tolerate
translation faults. So BBML should be used.

Patch 7-12: We implement these interfaces for arm smmuv3.

Thanks,
Keqian

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

Keqian Zhu (1):
  iommu: Introduce dirty log tracking framework

Kunkun Jiang (11):
  iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx
  iommu/io-pgtable-arm: Add and realize split_block ops
  iommu/io-pgtable-arm: Add and realize merge_page ops
  iommu/io-pgtable-arm: Add and realize sync_dirty_log ops
  iommu/io-pgtable-arm: Add and realize clear_dirty_log ops
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize support_dirty_log iommu ops

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 268 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +
 drivers/iommu/io-pgtable-arm.c              | 389 +-
 drivers/iommu/iommu.c                       | 206 +-
 include/linux/io-pgtable.h                  |  23 ++
 include/linux/iommu.h                       |  65 +++
 include/trace/events/iommu.h                |  63 +++
 8 files changed, 1026 insertions(+), 4 deletions(-)

--
2.19.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
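The three-state TTD encoding described in the cover letter can be captured
in a few lines. The sketch below is a user-space illustration only; the bit
positions match the series (DBM at bit 51, AP[2] at bit 7 of a stage 1
descriptor), but the enum and function names are mine:

```c
#include <stdint.h>

#define PTE_AP_RDONLY	(1ULL << 7)	/* AP[2]: read-only when set */
#define PTE_DBM		(1ULL << 51)	/* Dirty Bit Modifier */

enum ttd_state {
	TTD_READONLY,		/* DBM=0, AP[2]=1 */
	TTD_WRITABLE_CLEAN,	/* DBM=1, AP[2]=1 */
	TTD_WRITABLE_DIRTY,	/* DBM=1, AP[2]=0: HW cleared AP[2] on write */
	TTD_OTHER,		/* not one of the three tracked states */
};

/* Classify a stage 1 leaf TTD per the HTTU scheme in the cover letter. */
static enum ttd_state classify_ttd(uint64_t pte)
{
	int dbm = !!(pte & PTE_DBM);
	int rdonly = !!(pte & PTE_AP_RDONLY);

	if (dbm && rdonly)
		return TTD_WRITABLE_CLEAN;
	if (dbm && !rdonly)
		return TTD_WRITABLE_DIRTY;
	if (!dbm && rdonly)
		return TTD_READONLY;
	return TTD_OTHER;
}
```

This is why the sync walk can use "writable (AP[2] clear)" as its dirty
test, and why the clear walk re-arms tracking simply by setting AP[2] again.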
[RFC PATCH v4 02/13] iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx
From: Kunkun Jiang These features are essential to support dirty log tracking for SMMU with io-pgtable mapping. The dirty state information is encoded using the access permission bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/ S2AP[1] means dirty. When has ARM_HD, we set DBM bit for S1 mapping. As SMMU nested mode is not upstreamed for now, we just aim to support dirty log tracking for stage1 with io-pgtable mapping (means not support SVA). Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/io-pgtable-arm.c | 7 ++- include/linux/io-pgtable.h | 11 +++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58e79b5..94d790b8ed27 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -72,6 +72,7 @@ #define ARM_LPAE_PTE_NSTABLE (((arm_lpae_iopte)1) << 63) #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53) +#define ARM_LPAE_PTE_DBM (((arm_lpae_iopte)1) << 51) #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10) #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8) #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8) @@ -81,7 +82,7 @@ #define ARM_LPAE_PTE_ATTR_LO_MASK (((arm_lpae_iopte)0x3ff) << 2) /* Ignore the contiguous bit for block splitting */ -#define ARM_LPAE_PTE_ATTR_HI_MASK (((arm_lpae_iopte)6) << 52) +#define ARM_LPAE_PTE_ATTR_HI_MASK (((arm_lpae_iopte)13) << 51) #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\ ARM_LPAE_PTE_ATTR_HI_MASK) /* Software bit for solving coherency races */ @@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, int prot) { + struct io_pgtable_cfg *cfg = >iop.cfg; arm_lpae_iopte pte; if (data->iop.fmt == ARM_64_LPAE_S1 || @@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data, pte = ARM_LPAE_PTE_nG; if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) pte |= ARM_LPAE_PTE_AP_RDONLY; + else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD) + pte |= ARM_LPAE_PTE_DBM; + if (!(prot & IOMMU_PRIV)) pte |= ARM_LPAE_PTE_AP_UNPRIV; } else { diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 4d40dfa75b55..92274705b772 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -82,6 +82,14 @@ struct io_pgtable_cfg { * * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability * attributes set in the TCR for a non-coherent page-table walker. +* +* IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status. +* +* IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior +* when changing block size. +* +* IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior +* when changing block size. */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -89,6 +97,9 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4) #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6) + #define IO_PGTABLE_QUIRK_ARM_HD BIT(7) + #define IO_PGTABLE_QUIRK_ARM_BBML1 BIT(8) + #define IO_PGTABLE_QUIRK_ARM_BBML2 BIT(9) unsigned long quirks; unsigned long pgsize_bitmap; unsigned intias; -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
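The quirk's effect on arm_lpae_prot_to_pte() is small but central: read-only
mappings keep AP[2] set, while writable mappings get DBM when ARM_HD is
enabled, so hardware can later clear AP[2] to record a write. A hedged
stand-alone sketch of just that branch (constants and names simplified from
the patch, not kernel API):

```c
#include <stdint.h>
#include <stdbool.h>

#define IOMMU_READ	(1 << 0)
#define IOMMU_WRITE	(1 << 1)

#define PTE_AP_RDONLY	(1ULL << 7)	/* AP[2] */
#define PTE_DBM		(1ULL << 51)	/* set only with the ARM_HD quirk */

/* Mirror of the stage 1 permission logic added by this patch. */
static uint64_t prot_to_pte_bits(int prot, bool hd_quirk)
{
	uint64_t pte = 0;

	if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
		pte |= PTE_AP_RDONLY;		/* truly read-only mapping */
	else if (hd_quirk)
		pte |= PTE_DBM;			/* writable: HTTU tracks dirty */

	return pte;
}
```

Without the quirk a writable mapping sets neither bit, i.e. behavior is
unchanged for SMMUs without HTTU.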
[RFC PATCH v4 03/13] iommu/io-pgtable-arm: Add and realize split_block ops
From: Kunkun Jiang Block(largepage) mapping is not a proper granule for dirty log tracking. Take an extreme example, if DMA writes one byte, under 1G mapping, the dirty amount reported is 1G, but under 4K mapping, the dirty amount is just 4K. This splits block descriptor to an span of page descriptors. BBML1 or BBML2 feature is required. Spliting block is designed to be only used by dirty log tracking, which does not concurrently work with other pgtable ops that access underlying page table, so race condition does not exist. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/io-pgtable-arm.c | 122 + include/linux/io-pgtable.h | 2 + 2 files changed, 124 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 94d790b8ed27..664a9548b199 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -79,6 +79,8 @@ #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8) #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5) #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0) +/* Block descriptor bits */ +#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16) #define ARM_LPAE_PTE_ATTR_LO_MASK (((arm_lpae_iopte)0x3ff) << 2) /* Ignore the contiguous bit for block splitting */ @@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, return iopte_to_paddr(pte, data) | iova; } +static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data, +unsigned long iova, size_t size, int lvl, +arm_lpae_iopte *ptep); + +static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data, + unsigned long iova, size_t size, + arm_lpae_iopte blk_pte, int lvl, + arm_lpae_iopte *ptep) +{ + struct io_pgtable_cfg *cfg = >iop.cfg; + arm_lpae_iopte pte, *tablep; + phys_addr_t blk_paddr; + size_t tablesz = ARM_LPAE_GRANULE(data); + size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data); + int i; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return 0; + + tablep = 
__arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg); + if (!tablep) + return 0; + + blk_paddr = iopte_to_paddr(blk_pte, data); + pte = iopte_prot(blk_pte); + for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) + __arm_lpae_init_pte(data, blk_paddr, pte, lvl, [i]); + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) { + /* Race does not exist */ + blk_pte |= ARM_LPAE_PTE_NT; + __arm_lpae_set_pte(ptep, blk_pte, cfg); + io_pgtable_tlb_flush_walk(>iop, iova, size, size); + } + /* Race does not exist */ + pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg); + + /* Have splited it into page? */ + if (lvl == (ARM_LPAE_MAX_LEVELS - 1)) + return size; + + /* Go back to lvl - 1 */ + ptep -= ARM_LPAE_LVL_IDX(iova, lvl - 1, data); + return __arm_lpae_split_block(data, iova, size, lvl - 1, ptep); +} + +static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data, +unsigned long iova, size_t size, int lvl, +arm_lpae_iopte *ptep) +{ + arm_lpae_iopte pte; + struct io_pgtable *iop = >iop; + size_t base, next_size, total_size; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return 0; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return 0; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) { + if (lvl == (ARM_LPAE_MAX_LEVELS - 1) || + (pte & ARM_LPAE_PTE_AP_RDONLY)) + return size; + + /* We find a writable block, split it. */ + return arm_lpae_do_split_blk(data, iova, size, pte, + lvl + 1, ptep); + } else { + /* If it is the last table level, then nothing to do */ + if (lvl == (ARM_LPAE_MAX_LEVELS - 2)) + return size; + + total_size = 0; + next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data); + ptep = iopte_deref(pte, data); + for (base = 0; base < size; base += next_size) + total_size += __arm_lpae_split_block(data, + iova + base, next_size, lvl + 1, +
Re: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions associated with a dev
On 2021-04-20 09:27, Shameer Kolothum wrote: Get RMR regions associated with a dev reserved so that there is a unity mapping for them in SMMU. Signed-off-by: Shameer Kolothum --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 14e9c7034c04..8bacedf7bb34 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2531,6 +2531,34 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) return iommu_fwspec_add_ids(dev, args->args, 1); } +static bool arm_smmu_dev_has_rmr(struct arm_smmu_master *master, +struct iommu_rmr *e) +{ + int i; + + for (i = 0; i < master->num_sids; i++) { + if (e->sid == master->sids[i]) + return true; + } + + return false; +} + +static void arm_smmu_rmr_get_resv_regions(struct device *dev, + struct list_head *head) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu = master->smmu; + struct iommu_rmr *rmr; + + list_for_each_entry(rmr, >rmr_list, list) { + if (!arm_smmu_dev_has_rmr(master, rmr)) + continue; + + iommu_dma_get_rmr_resv_regions(dev, rmr, head); + } +} + TBH I wouldn't have thought we need a driver-specific hook for this, or is it too painful to correlate fwspec->iommu_fwnode back to the relevant IORT node generically? Robin. static void arm_smmu_get_resv_regions(struct device *dev, struct list_head *head) { @@ -2545,6 +2573,7 @@ static void arm_smmu_get_resv_regions(struct device *dev, list_add_tail(>list, head); iommu_dma_get_resv_regions(dev, head); + arm_smmu_rmr_get_resv_regions(dev, head); } static bool arm_smmu_dev_has_feature(struct device *dev, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info and install bypass SMR
On 2021-05-06 16:17, Steven Price wrote: On 20/04/2021 09:27, Shameer Kolothum wrote: From: Jon Nettleton Check if there is any RMR info associated with the devices behind the SMMU and if any, install bypass SMRs for them. This is to keep any ongoing traffic associated with these devices alive when we enable/reset SMMU during probe(). Signed-off-by: Jon Nettleton Signed-off-by: Shameer Kolothum --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 42 +++ drivers/iommu/arm/arm-smmu/arm-smmu.h | 2 ++ 2 files changed, 44 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfde6a61..4d2f91626d87 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2102,6 +2102,43 @@ err_reset_platform_ops: __maybe_unused; return err; } +static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu) +{ + struct iommu_rmr *e; + int i, cnt = 0; + u32 smr; + + for (i = 0; i < smmu->num_mapping_groups; i++) { + smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i)); + if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr)) + continue; + + list_for_each_entry(e, >rmr_list, list) { + if (FIELD_GET(ARM_SMMU_SMR_ID, smr) != e->sid) + continue; + + smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr); + smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr); + smmu->smrs[i].valid = true; + + smmu->s2crs[i].type = S2CR_TYPE_BYPASS; + smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT; + smmu->s2crs[i].cbndx = 0xff; + + cnt++; + } + } If I understand this correctly - this is looking at the current (hardware) configuration of the SMMU and attempting to preserve any bypass SMRs. However from what I can tell it suffers from the following two problems: (a) Only the ID of the SMR is being checked, not the MASK. So if the firmware has setup an SMR matching a number of streams this will break. (b) The SMMU might not be enabled at all (CLIENTPD==1) or bypass enabled for unmatched streams (USFCFG==0). 
Yes, trying to infer anything from the current SMMU hardware state is bogus - consider what you might find left over after a kexec, for instance. The *only* way to detect the presence and applicability of RMRs is to look at the actual RMR nodes in the IORT. Ignore what we let the Qualcomm ACPI bootloader hack do - that whole implementation is "special". Robin. Certainly in my test setup case (b) applies and so this doesn't work. Perhaps something like the below would work better? (It works in the case of the SMMU not enabled - I've not tested case (a)). Steve 8< static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu) { struct iommu_rmr *e; int i, cnt = 0; u32 smr; u32 reg; reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0); if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) { /* * SMMU is already enabled and disallowing bypass, so preserve * the existing SMRs */ for (i = 0; i < smmu->num_mapping_groups; i++) { smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i)); if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr)) continue; smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr); smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr); smmu->smrs[i].valid = true; } } list_for_each_entry(e, >rmr_list, list) { u32 sid = e->sid; i = arm_smmu_find_sme(smmu, sid, ~0); if (i < 0) continue; if (smmu->s2crs[i].count == 0) { smmu->smrs[i].id = sid; smmu->smrs[i].mask = ~0; smmu->smrs[i].valid = true; } smmu->s2crs[i].count++; smmu->s2crs[i].type = S2CR_TYPE_BYPASS; smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT; smmu->s2crs[i].cbndx = 0xff; cnt++; } if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) { /* Remove the valid bit for unused SMRs */ for (i = 0; i < smmu->num_mapping_groups; i++) { if (smmu->s2crs[i].count == 0) smmu->smrs[i].valid = false; } } dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt, cnt == 1 ? 
"" : "s"); }
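Steven's point (a) above — that comparing only the SMR ID field breaks when firmware programs a MASK covering several stream IDs — can be illustrated with a small standalone sketch. Helper names and field widths are mine, not the driver's; an SMMUv2 SMR matches when every unmasked bit of the incoming stream ID equals the programmed ID:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Proper SMR match: masked bits are don't-cares. */
static bool smr_matches(uint16_t smr_id, uint16_t smr_mask, uint16_t sid)
{
    return (smr_id & ~smr_mask) == (sid & ~smr_mask);
}

/* What an ID-only comparison (as in the original patch) effectively does. */
static bool smr_id_only_matches(uint16_t smr_id, uint16_t sid)
{
    return smr_id == sid;
}
```

For an SMR programmed with ID 0x400 and MASK 0x00f (matching streams 0x400–0x40f), the ID-only comparison misses stream 0x405 even though the hardware would match it.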
[PATCH v4 6/6] iommu: rockchip: Add support iommu v2
From: Simon Xue RK356x SoC got new IOMMU hardware block (version 2). Add a compatible to distinguish it from the first version. Signed-off-by: Simon Xue [Benjamin] - port driver from kernel 4.19 to 5.12 - change functions prototype. - squash all fixes in this commit. Signed-off-by: Benjamin Gaignard --- drivers/iommu/rockchip-iommu.c | 422 +++-- 1 file changed, 406 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index e5d86b7177de..32dcf0aaec18 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -81,6 +82,30 @@ */ #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000 +#define DT_LO_MASK 0xf000 +#define DT_HI_MASK GENMASK_ULL(39, 32) +#define DT_SHIFT 28 + +#define DTE_BASE_HI_MASK GENMASK(11, 4) + +#define PAGE_DESC_LO_MASK 0xf000 +#define PAGE_DESC_HI1_LOWER 32 +#define PAGE_DESC_HI1_UPPER 35 +#define PAGE_DESC_HI2_LOWER 36 +#define PAGE_DESC_HI2_UPPER 39 +#define PAGE_DESC_HI_MASK1 GENMASK_ULL(PAGE_DESC_HI1_UPPER, PAGE_DESC_HI1_LOWER) +#define PAGE_DESC_HI_MASK2 GENMASK_ULL(PAGE_DESC_HI2_UPPER, PAGE_DESC_HI2_LOWER) + +#define DTE_HI1_LOWER 8 +#define DTE_HI1_UPPER 11 +#define DTE_HI2_LOWER 4 +#define DTE_HI2_UPPER 7 +#define DTE_HI_MASK1 GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER) +#define DTE_HI_MASK2 GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER) + +#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER) +#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER) + struct rk_iommu_domain { struct list_head iommus; u32 *dt; /* page directory table */ @@ -91,6 +116,10 @@ struct rk_iommu_domain { struct iommu_domain domain; }; +struct rockchip_iommu_data { + u32 version; +}; + /* list of clocks required by IOMMU */ static const char * const rk_iommu_clocks[] = { "aclk", "iface", @@ -108,6 +137,7 @@ struct rk_iommu { struct list_head node; /* entry in rk_iommu_domain.iommus */ struct iommu_domain 
*domain; /* domain to which iommu is attached */ struct iommu_group *group; + u32 version; }; struct rk_iommudata { @@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct iommu_domain *dom) #define RK_DTE_PT_ADDRESS_MASK0xf000 #define RK_DTE_PT_VALID BIT(0) +/* + * In v2: + * 31:12 - PT address bit 31:0 + * 11: 8 - PT address bit 35:32 + * 7: 4 - PT address bit 39:36 + * 3: 1 - Reserved + * 0 - 1 if PT @ PT address is valid + */ +#define RK_DTE_PT_ADDRESS_MASK_V2 0xfff0 + static inline phys_addr_t rk_dte_pt_address(u32 dte) { return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK; } +static inline phys_addr_t rk_dte_pt_address_v2(u32 dte) +{ + u64 dte_v2 = dte; + + dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) | +((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) | +(dte_v2 & PAGE_DESC_LO_MASK); + + return (phys_addr_t)dte_v2; +} + static inline bool rk_dte_is_pt_valid(u32 dte) { return dte & RK_DTE_PT_VALID; @@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID; } +static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma) +{ + pt_dma = (pt_dma & PAGE_DESC_LO_MASK) | +((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) | +(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2; + + return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID; +} + /* * Each PTE has a Page address, some flags and a valid bit: * +-+---+---+-+ @@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) #define RK_PTE_PAGE_READABLE BIT(1) #define RK_PTE_PAGE_VALID BIT(0) +/* + * In v2: + * 31:12 - Page address bit 31:0 + * 11:9 - Page address bit 34:32 + * 8:4 - Page address bit 39:35 + * 3 - Security + * 2 - Readable + * 1 - Writable + * 0 - 1 if Page @ Page address is valid + */ +#define RK_PTE_PAGE_ADDRESS_MASK_V2 0xfff0 +#define RK_PTE_PAGE_FLAGS_MASK_V20x000e +#define RK_PTE_PAGE_READABLE_V2 BIT(2) +#define RK_PTE_PAGE_WRITABLE_V2 BIT(1) + static inline phys_addr_t 
rk_pte_page_address(u32 pte) { return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK; } +static inline phys_addr_t rk_pte_page_address_v2(u32 pte) +{ + u64 pte_v2 = pte; + + pte_v2 = ((pte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) | +((pte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) | +(pte_v2 & PAGE_DESC_LO_MASK); + + return (phys_addr_t)pte_v2; +} + static inline bool rk_pte_is_page_valid(u32 pte) { return pte & RK_PTE_PAGE_VALID; @@ -229,12 +315,26 @@ static inline bool rk_pte_is_page_valid(u32 pte) static u32 rk_mk_pte(phys_addr_t page, int prot)
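The v2 descriptor packing can be cross-checked with a round-trip sketch. The constants below mirror the bit layout described in the patch's comments (31:12 = PA[31:12], 11:8 = PA[35:32], 7:4 = PA[39:36], bit 0 = valid); the literal mask values in the diff above were mangled in transit, and the helper names here are illustrative, not the driver's:

```c
#include <assert.h>
#include <stdint.h>

#define LO_MASK   0x00000000fffff000ULL  /* PA[31:12] */
#define HI1_MASK  0x0000000f00000000ULL  /* PA[35:32] */
#define HI2_MASK  0x000000f000000000ULL  /* PA[39:36] */
#define DTE_VALID 1u

/* Fold a 40-bit, 4 KiB-aligned physical address into a 32-bit v2 DTE. */
static uint32_t pack_dte_v2(uint64_t pa)
{
    return (uint32_t)((pa & LO_MASK) |
                      ((pa & HI1_MASK) >> 24) |   /* bits 35:32 -> 11:8 */
                      ((pa & HI2_MASK) >> 32))    /* bits 39:36 -> 7:4  */
           | DTE_VALID;
}

/* Inverse: recover the physical address from a v2 DTE. */
static uint64_t unpack_dte_v2(uint32_t dte)
{
    uint64_t d = dte;

    return ((d & 0x0f0ULL) << 32) |   /* bits 7:4  -> 39:36 */
           ((d & 0xf00ULL) << 24) |   /* bits 11:8 -> 35:32 */
           (d & 0xfffff000ULL);
}
```

Any 4 KiB-aligned address below 1 TiB survives the pack/unpack round trip, which is the property `rk_mk_dte_v2()` and `rk_dte_pt_address_v2()` rely on.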
[PATCH v4 3/6] ARM: dts: rockchip: rk322x: Fix IOMMU nodes properties
Add '#" to iommu-cells properties. Remove useless interrupt-names properties Signed-off-by: Benjamin Gaignard --- version 4: - Remove interrupt-names properties from IOMMU nodes arch/arm/boot/dts/rk322x.dtsi | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi index a4dd50aaf3fc..7b4143ddc95c 100644 --- a/arch/arm/boot/dts/rk322x.dtsi +++ b/arch/arm/boot/dts/rk322x.dtsi @@ -561,10 +561,9 @@ vpu_mmu: iommu@20020800 { compatible = "rockchip,iommu"; reg = <0x20020800 0x100>; interrupts = ; - interrupt-names = "vpu_mmu"; clocks = < ACLK_VPU>, < HCLK_VPU>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; @@ -572,10 +571,9 @@ vdec_mmu: iommu@20030480 { compatible = "rockchip,iommu"; reg = <0x20030480 0x40>, <0x200304c0 0x40>; interrupts = ; - interrupt-names = "vdec_mmu"; clocks = < ACLK_RKVDEC>, < HCLK_RKVDEC>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; @@ -605,7 +603,6 @@ vop_mmu: iommu@20053f00 { compatible = "rockchip,iommu"; reg = <0x20053f00 0x100>; interrupts = ; - interrupt-names = "vop_mmu"; clocks = < ACLK_VOP>, < HCLK_VOP>; clock-names = "aclk", "iface"; #iommu-cells = <0>; @@ -626,10 +623,9 @@ iep_mmu: iommu@20070800 { compatible = "rockchip,iommu"; reg = <0x20070800 0x100>; interrupts = ; - interrupt-names = "iep_mmu"; clocks = < ACLK_IEP>, < HCLK_IEP>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 5/6] ARM64: dts: rockchip: rk3036: Remove useless interrupt-names properties
Remove useless interrupt-names properties for IOMMU nodes Signed-off-by: Benjamin Gaignard --- arch/arm64/boot/dts/rockchip/px30.dtsi | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/arm64/boot/dts/rockchip/px30.dtsi b/arch/arm64/boot/dts/rockchip/px30.dtsi index c45b0cfcae09..47d0c29fd8d0 100644 --- a/arch/arm64/boot/dts/rockchip/px30.dtsi +++ b/arch/arm64/boot/dts/rockchip/px30.dtsi @@ -1068,7 +1068,6 @@ vopb_mmu: iommu@ff460f00 { compatible = "rockchip,iommu"; reg = <0x0 0xff460f00 0x0 0x100>; interrupts = ; - interrupt-names = "vopb_mmu"; clocks = < ACLK_VOPB>, < HCLK_VOPB>; clock-names = "aclk", "iface"; power-domains = < PX30_PD_VO>; @@ -1109,7 +1108,6 @@ vopl_mmu: iommu@ff470f00 { compatible = "rockchip,iommu"; reg = <0x0 0xff470f00 0x0 0x100>; interrupts = ; - interrupt-names = "vopl_mmu"; clocks = < ACLK_VOPL>, < HCLK_VOPL>; clock-names = "aclk", "iface"; power-domains = < PX30_PD_VO>; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 4/6] ARM: dts: rockchip: rk3036: Remove useless interrupt-names on IOMMU node
Remove useless interrupt-names property for IOMMU node Signed-off-by: Benjamin Gaignard --- arch/arm/boot/dts/rk3036.dtsi | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm/boot/dts/rk3036.dtsi b/arch/arm/boot/dts/rk3036.dtsi index 47a787a12e55..16753dc50bf3 100644 --- a/arch/arm/boot/dts/rk3036.dtsi +++ b/arch/arm/boot/dts/rk3036.dtsi @@ -140,7 +140,6 @@ vop_mmu: iommu@10118300 { compatible = "rockchip,iommu"; reg = <0x10118300 0x100>; interrupts = ; - interrupt-names = "vop_mmu"; clocks = < ACLK_LCDC>, < HCLK_LCDC>; clock-names = "aclk", "iface"; #iommu-cells = <0>; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 1/6] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
Convert Rockchip IOMMU to DT schema Signed-off-by: Benjamin Gaignard --- version 4: - Add descriptions for reg items - Add description for interrupts items - Remove useless interrupt-names proporties version 2: - Change maintainer - Change reg maxItems - Change interrupt maxItems .../bindings/iommu/rockchip,iommu.txt | 38 - .../bindings/iommu/rockchip,iommu.yaml| 80 +++ 2 files changed, 80 insertions(+), 38 deletions(-) delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt deleted file mode 100644 index 6ecefea1c6f9.. --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt +++ /dev/null @@ -1,38 +0,0 @@ -Rockchip IOMMU -== - -A Rockchip DRM iommu translates io virtual addresses to physical addresses for -its master device. Each slave device is bound to a single master device, and -shares its clocks, power domain and irq. - -Required properties: -- compatible : Should be "rockchip,iommu" -- reg : Address space for the configuration registers -- interrupts : Interrupt specifier for the IOMMU instance -- interrupt-names : Interrupt name for the IOMMU instance -- #iommu-cells: Should be <0>. This indicates the iommu is a -"single-master" device, and needs no additional information -to associate with its master device. See: -Documentation/devicetree/bindings/iommu/iommu.txt -- clocks : A list of clocks required for the IOMMU to be accessible by -the host CPU. -- clock-names : Should contain the following: - "iface" - Main peripheral bus clock (PCLK/HCL) (required) - "aclk" - AXI bus clock (required) - -Optional properties: -- rockchip,disable-mmu-reset : Don't use the mmu reset operation. - Some mmu instances may produce unexpected results - when the reset operation is used. 
- -Example: - - vopl_mmu: iommu@ff940300 { - compatible = "rockchip,iommu"; - reg = <0xff940300 0x100>; - interrupts = ; - interrupt-names = "vopl_mmu"; - clocks = < ACLK_VOP1>, < HCLK_VOP1>; - clock-names = "aclk", "iface"; - #iommu-cells = <0>; - }; diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml new file mode 100644 index ..099fc2578b54 --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml @@ -0,0 +1,80 @@ +# SPDX-License-Identifier: GPL-2.0-only +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip IOMMU + +maintainers: + - Heiko Stuebner + +description: |+ + A Rockchip DRM iommu translates io virtual addresses to physical addresses for + its master device. Each slave device is bound to a single master device and + shares its clocks, power domain and irq. + + For information on assigning IOMMU controller to its peripheral devices, + see generic IOMMU bindings. + +properties: + compatible: +const: rockchip,iommu + + reg: +items: + - description: configuration registers for MMU instance 0 + - description: configuration registers for MMU instance 1 +minItems: 1 +maxItems: 2 + + interrupts: +items: + - description: interruption for MMU instance 0 + - description: interruption for MMU instance 1 +minItems: 1 +maxItems: 2 + + clocks: +items: + - description: Core clock + - description: Interface clock + + clock-names: +items: + - const: aclk + - const: iface + + "#iommu-cells": +const: 0 + + rockchip,disable-mmu-reset: +$ref: /schemas/types.yaml#/definitions/flag +description: | + Do not use the mmu reset operation. + Some mmu instances may produce unexpected results + when the reset operation is used. 
+ +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + - "#iommu-cells" + +additionalProperties: false + +examples: + - | +#include +#include + +vopl_mmu: iommu@ff940300 { + compatible = "rockchip,iommu"; + reg = <0xff940300 0x100>; + interrupts = ; + clocks = < ACLK_VOP1>, < HCLK_VOP1>; + clock-names = "aclk", "iface"; + #iommu-cells = <0>; +}; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 2/6] dt-bindings: iommu: rockchip: Add compatible for v2
Add compatible for the second version of IOMMU hardware block. RK356x IOMMU can also be link to a power domain. Signed-off-by: Benjamin Gaignard Reviewed-by: Rob Herring --- version 3: - Rename compatible with SoC name version 2: - Add power-domains property .../devicetree/bindings/iommu/rockchip,iommu.yaml | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml index 099fc2578b54..d2e28a9e3545 100644 --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml @@ -19,7 +19,9 @@ description: |+ properties: compatible: -const: rockchip,iommu +enum: + - rockchip,iommu + - rockchip,rk3568-iommu reg: items: @@ -48,6 +50,9 @@ properties: "#iommu-cells": const: 0 + power-domains: +maxItems: 1 + rockchip,disable-mmu-reset: $ref: /schemas/types.yaml#/definitions/flag description: | -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 0/6] Add IOMMU driver for rk356x
This series adds the IOMMU driver for rk356x SoC. Since a new compatible is needed to distinguish this second version of IOMMU hardware block from the first one, it is an opportunity to convert the binding to DT schema. version 4: - Add description for reg items - Remove useless interrupt-names properties - Add description for interrupts items - Remove interrupt-names properties from DST files version 3: - Rename compatible with soc prefix - Rebase on v5.12 tag version 2: - Fix iommu-cells typo in rk322x.dtsi - Change maintainer - Change reg maxItems - Add power-domains property Benjamin Gaignard (5): dt-bindings: iommu: rockchip: Convert IOMMU to DT schema dt-bindings: iommu: rockchip: Add compatible for v2 ARM: dts: rockchip: rk322x: Fix IOMMU nodes properties ARM: dts: rockchip: rk3036: Remove useless interrupt-names on IOMMU node ARM64: dts: rockchip: rk3036: Remove useless interrupt-names properties Simon Xue (1): iommu: rockchip: Add support iommu v2 .../bindings/iommu/rockchip,iommu.txt | 38 -- .../bindings/iommu/rockchip,iommu.yaml| 85 arch/arm/boot/dts/rk3036.dtsi | 1 - arch/arm/boot/dts/rk322x.dtsi | 10 +- arch/arm64/boot/dts/rockchip/px30.dtsi| 2 - drivers/iommu/rockchip-iommu.c| 422 +- 6 files changed, 494 insertions(+), 64 deletions(-) delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe > Sent: Wednesday, May 5, 2021 1:13 AM > > On Wed, Apr 28, 2021 at 06:58:19AM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Wednesday, April 28, 2021 1:12 AM > > > > > [...] > > > > As Alex says, if this line fails because of the group restrictions, > > > > that's not great because it's not very obvious what's gone wrong. > > > > > > Okay, that is fair, but let's solve that problem directly. For > > > instance netlink has been going in the direction of adding a "extack" > > > from the kernel which is a descriptive error string. If the failing > > > ioctl returned the string: > > > > > > "cannot join this device to the IOASID because device XXX in the > > >same group #10 is in use" > > > > > > Would you agree it is now obvious what has gone wrong? In fact would > > > you agree this is a lot better user experience than what applications > > > do today even though they have the group FD? > > > > > > > Currently all the discussions are around implicit vs. explicit uAPI > > semantics > > on the group restriction. However if we look beyond group the implicit > > semantics might be inevitable when dealing with incompatible iommu > > domains. An existing example of iommu incompatibility is IOMMU_ > > CACHE. > > I still think we need to get rid of these incompatibilities > somehow. Having multiple HW incompatible IOASID in the same platform > is just bad all around. > > When modeling in userspace IOMMU_CACHE sounds like it is a property of > each individual IOASID, not an attribute that requires a new domain. sure. the iommu domain is an kernel-internal concept. The userspace should focus everything on IOASID. > > People that want to create cache bypass IOASID's should just ask for > that that directly. > Yes, in earlier discussion we agreed on a scheme that ioasid module will return an error to userspace indicating incompatibility detected when binding a device to ioasid then the userspace should create a new IOASID for this device. 
This has to be done 'explicitly'. What I meant by citing it as an example of 'implicit semantics' is that the kernel won't create another group-like object to contain devices with compatible attributes and 'explicitly' manage it in the uAPI the way group_fd is managed. If we rely on userspace to have that much intelligence about hardware restrictions anyway, it makes little sense to explicitly handle only group_fd in the uAPI. Thanks Kevin
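The scheme agreed on here — bind fails on incompatibility, userspace creates a fresh IOASID and retries — can be sketched as a toy userland simulation. Everything below is hypothetical (the real /dev/ioasid uAPI is still under discussion); a single "snoop" flag stands in for an attribute like IOMMU_CACHE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: not the real kernel object or ioctl interface. */
struct toy_ioasid { int id; bool has_dev; bool snoop; };

static struct toy_ioasid pool[8];
static int next_id;

static struct toy_ioasid *toy_ioasid_alloc(void)
{
    struct toy_ioasid *a = &pool[next_id];

    a->id = next_id++;
    a->has_dev = false;
    return a;
}

/* Bind fails when the device conflicts with devices already bound. */
static int toy_ioasid_bind(struct toy_ioasid *a, bool dev_snoop)
{
    if (a->has_dev && a->snoop != dev_snoop)
        return -1;   /* incompatible: caller must use another IOASID */
    a->snoop = dev_snoop;
    a->has_dev = true;
    return 0;
}

/* Userspace policy: on incompatibility, fork off a new IOASID and retry. */
static struct toy_ioasid *bind_or_fork(struct toy_ioasid *a, bool dev_snoop)
{
    if (toy_ioasid_bind(a, dev_snoop) == 0)
        return a;
    a = toy_ioasid_alloc();
    assert(toy_ioasid_bind(a, dev_snoop) == 0);
    return a;
}
```

Compatible devices keep sharing one IOASID; an incompatible one transparently lands in a new IOASID — which is exactly the "explicit" step the kernel pushes to userspace instead of managing a group-like container itself.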
RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Alex Williamson > Sent: Wednesday, April 28, 2021 11:06 PM > > On Wed, 28 Apr 2021 06:34:11 + > "Tian, Kevin" wrote: > > > > From: Jason Gunthorpe > > > Sent: Monday, April 26, 2021 8:38 PM > > > > > [...] > > > > Want to hear your opinion for one open here. There is no doubt that > > > > an ioasid represents a HW page table when the table is constructed by > > > > userspace and then linked to the IOMMU through the bind/unbind > > > > API. But I'm not very sure about whether an ioasid should represent > > > > the exact pgtable or the mapping metadata when the underlying > > > > pgtable is indirectly constructed through map/unmap API. VFIO does > > > > the latter way, which is why it allows multiple incompatible domains > > > > in a single container which all share the same mapping metadata. > > > > > > I think VFIO's map/unmap is way too complex and we know it has bad > > > performance problems. > > > > Can you or Alex elaborate where the complexity and performance problem > > locate in VFIO map/umap? We'd like to understand more detail and see > how > > to avoid it in the new interface. > > > The map/unmap interface is really only good for long lived mappings, > the overhead is too high for things like vIOMMU use cases or any case > where the mapping is intended to be dynamic. Userspace drivers must > make use of a long lived buffer mapping in order to achieve performance. This is not a limitation of VFIO map/unmap. It's the limitation of any map/unmap semantics since the fact of long-lived vs. short-lived is imposed by userspace. Nested translation is the only viable optimization allowing 2nd-level to be a long-lived mapping even w/ vIOMMU. From this angle I'm not sure how a new map/unmap implementation could address this perf limitation alone. 
> > The mapping and unmapping granularity has been a problem as well, > type1v1 allowed arbitrary unmaps to bisect the original mapping, with > the massive caveat that the caller relies on the return value of the > unmap to determine what was actually unmapped because the IOMMU use > of > superpages is transparent to the caller. This led to type1v2 that > simply restricts the user to avoid ever bisecting mappings. That still > leaves us with problems for things like virtio-mem support where we > need to create initial mappings with a granularity that allows us to > later remove entries, which can prevent effective use of IOMMU > superpages. We could start with a semantics similar to type1v2. btw why does virtio-mem require a smaller granularity? Can we split superpages in-the-fly when removal actually happens (just similar to page split in VM live migration for efficient dirty page tracking)? and isn't it another problem imposed by userspace? How could a new map/unmap implementation mitigate this problem if the userspace insists on a smaller granularity for initial mappings? > > Locked page accounting has been another constant issue. We perform > locked page accounting at the container level, where each container > accounts independently. A user may require multiple containers, the > containers may pin the same physical memory, but be accounted against > the user once per container. for /dev/ioasid there is still an open whether an process is allowed to open /dev/ioasid once or multiple times. If there is only one ioasid_fd per process, the accounting can be made accurately. otherwise the same problem still exists as each ioasid_fd is akin to the container, then we need find a better solution. > > Those are the main ones I can think of. 
It is nice to have a simple > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't > raise the barrier to entry too high, but the user needs to have the > ability to have more control of their mappings and locked page > accounting should probably be offloaded somewhere. Thanks, > Based on your feedback I feel it's probably reasonable to start with type1v2 semantics for the new interface. Locked page accounting could also start with the same VFIO restriction and then be improved incrementally, if a cleaner way turns out to be intrusive (as long as it doesn't affect the uAPI). But I didn't get the suggestion on "more control of their mappings". Can you elaborate? Thanks Kevin
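The type1v2 restriction Alex describes — unmaps must not bisect an original mapping — boils down to requiring that an unmap range be composed of whole, previously created mappings (it may span several, but may not split any). A minimal sketch of that legality check (data layout and names are illustrative, not VFIO's actual implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mapping { uint64_t iova; uint64_t size; };

/*
 * type1v2-style rule: every original mapping that overlaps the unmap
 * range must lie entirely inside it; partial overlap means the request
 * would bisect a mapping and is rejected.
 */
static bool unmap_allowed(const struct mapping *maps, size_t n,
                          uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size;

    for (size_t i = 0; i < n; i++) {
        uint64_t m_end = maps[i].iova + maps[i].size;

        if (m_end <= iova || maps[i].iova >= end)
            continue;                       /* no overlap, ignore */
        if (maps[i].iova < iova || m_end > end)
            return false;                   /* partial overlap: bisection */
    }
    return true;
}
```

This is also why the virtio-mem case mentioned above hurts: to keep later removals legal under this rule, the initial mappings must already be created at the removal granularity, which can defeat IOMMU superpage use.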