Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver
On 2024-06-24 3:36 pm, Teddy Astie wrote:

Hello Robin,
Thanks for the thorough review.

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..242cefac77c9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,15 @@ config VIRTIO_IOMMU
 	  Say Y here if you intend to run this kernel as a guest.

+config XEN_IOMMU
+	bool "Xen IOMMU driver"
+	depends on XEN_DOM0

Clearly this depends on X86 as well.

Well, I don't intend this driver to be X86-only, even though the current Xen RFC doesn't support ARM (yet). Unless there is a counter-indication for it?

It's purely practical - even if you drop the asm/iommu.h stuff it would still break ARM DOM0 builds due to HYPERVISOR_iommu_op() only being defined for x86. And it's better to add a dependency here to make it clear what's *currently* supported, than to add dummy code to allow it to build for ARM if that's not actually tested or usable yet.

+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;

Will the PV-IOMMU only ever be exposed on hardware where that really is always true?

On the hypervisor side, the PV-IOMMU interface always implicitly flushes the IOMMU hardware on map/unmap operations, so at the end of the hypercall, the cache should always be coherent IMO.

As Jason already brought up, this is not about TLBs or anything cached by the IOMMU itself, it's about the memory type(s) it can create mappings with. Returning true here says Xen guarantees it can use a cacheable memory type which will let DMA snoop the CPU caches. Furthermore, not explicitly handling IOMMU_CACHE in the map_pages op then also implies that it will *always* do that, so you couldn't actually get an uncached mapping even if you wanted one.
+	while (xen_pg_count) {
+		size_t to_unmap = min(xen_pg_count, max_nr_pages);
+
+		//pr_info("Unmapping %lx-%lx\n", dfn, dfn + to_unmap - 1);
+
+		op.unmap_pages.dfn = dfn;
+		op.unmap_pages.nr_pages = to_unmap;
+
+		ret = HYPERVISOR_iommu_op(&op);
+
+		if (ret)
+			pr_warn("Unmap failure (%lx-%lx)\n", dfn, dfn + to_unmap - 1);

But then how would it ever happen anyway? Unmap is a domain op, so a domain which doesn't allow unmapping shouldn't offer it in the first place...

Unmap failing should be exceptional, but is possible e.g. with transparent superpages (like Xen IOMMU drivers do). Xen drivers fold appropriate contiguous mappings into superpage entries to optimize memory usage and IOTLB usage. However, if you unmap in the middle of a region covered by a superpage entry, it is no longer a valid superpage entry, and you need to allocate and fill the lower levels, which is fallible if lacking memory.

OK, so in the worst case you could potentially have a partial unmap failure if the range crosses a superpage boundary and the end part happens to have been folded, and Xen doesn't detect and prepare that allocation until it's already unmapped up to the boundary. If that is so, does the hypercall interface give any information about partial failure, or can any error only be taken to mean that some or all of the given range may or may not have been unmapped now?

In this case I'd argue that you really *do* want to return short, in the hope of propagating the error back up and letting the caller know the address space is now messed up before things start blowing up even more if they keep going and subsequently try to map new pages into not-actually-unmapped VAs.

While mapping on top of another mapping is ok for us (it's just going to override the previous mapping), I definitely agree that having the address space messed up is not good.
Oh, indeed, quietly replacing existing PTEs might help paper over errors in this particular instance, but it does then allow *other* cases to go wrong in fun and infuriating ways :)

+static struct iommu_domain default_domain = {
+	.ops = &(const struct iommu_domain_ops){
+		.attach_dev = default_domain_attach_dev
+	}
+};

Looks like you could make it a static xen_iommu_domain and just use the normal attach callback? Either way please name it something less confusing like xen_iommu_identity_domain - "default" is far too overloaded round here already...

Yes, although, if in the future we can have this domain be either identity or blocking/paging depending on some upper-level configuration, should we have both identity and blocking domains and only set the relevant one in iommu_ops, or keep this naming?

That's something that can be considered if and when it does happen. For now, if it's going to be pre-mapped as an identity domain, then let's just treat it as such and keep things straightforward.

+void __exit xen_iommu_fini(void)
+{
+	pr_info("Unregistering Xen IOMMU driver\n");
+
+	iommu_device_unregister(&xen_iommu_device);
+
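For illustration, here is a standalone sketch of the "return short" behaviour argued for in this thread. The names and the fake backend are hypothetical, modelling the batched unmap loop rather than the actual driver code: the loop stops at the first failed batch and reports how much really was unmapped, so the caller can tell the address space is in a partial state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a batched unmap loop: each batch is one "hypercall". */
typedef int (*unmap_batch_fn)(unsigned long dfn, size_t nr_pages);

static size_t unmap_pages_short(unsigned long dfn, size_t nr_pages,
				size_t max_batch, unmap_batch_fn do_unmap)
{
	size_t done = 0;

	while (nr_pages) {
		size_t batch = nr_pages < max_batch ? nr_pages : max_batch;

		if (do_unmap(dfn, batch))
			break;	/* stop and report partial progress */

		dfn += batch;
		done += batch;
		nr_pages -= batch;
	}
	return done;	/* caller compares this against the requested size */
}

/* Fake backend: fail once we cross dfn 512, a pretend superpage boundary
 * where the lower-level table allocation runs out of memory. */
static int fake_unmap(unsigned long dfn, size_t nr_pages)
{
	(void)nr_pages;
	return dfn >= 512 ? -1 : 0;
}
```

A caller that sees the returned count fall short of the requested size knows the range is only partially unmapped and can propagate the error instead of later mapping over not-actually-unmapped VAs.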
Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver
On 2024-06-24 6:36 pm, Easwar Hariharan wrote:

Hi Jason,

On 6/24/2024 9:32 AM, Jason Gunthorpe wrote:
On Mon, Jun 24, 2024 at 02:36:45PM +, Teddy Astie wrote:

+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;

Will the PV-IOMMU only ever be exposed on hardware where that really is always true?

On the hypervisor side, the PV-IOMMU interface always implicitly flushes the IOMMU hardware on map/unmap operations, so at the end of the hypercall, the cache should always be coherent IMO.

Cache coherency is a property of the underlying IOMMU HW and reflects the ability to prevent generating transactions that would bypass the cache. On AMD and Intel IOMMU HW this maps to a bit in their PTEs that must always be set to claim this capability. No ARM SMMU supports it yet.

Unrelated to this patch: Both the arm-smmu and arm-smmu-v3 drivers claim this capability if the device tree/IORT table have the corresponding flags. I read through DEN0049 to determine what are the knock-on effects, or equivalently the requirements to set those flags in the IORT, but came up empty. Could you help with what I'm missing to resolve the apparent contradiction between your statement and the code?

We did rejig things slightly a while back. The status quo now is that IOMMU_CAP_CACHE_COHERENCY mostly covers whether IOMMU mappings can make device accesses coherent at all, tied in with the IOMMU_CACHE prot value - this is effectively forced for Intel and AMD, while for SMMU we have to take a guess, but as commented it's a pretty reasonable assumption that if the SMMU's own output for table walks etc. is coherent then its translation outputs are likely to be too. The further property of being able to then enforce a coherent mapping regardless of what an endpoint might try to get around it (PCIe No Snoop etc.)
is now under the enforce_cache_coherency op - that's what SMMU can't guarantee for now due to the IMP-DEF nature of whether S2FWB overrides No Snoop or not. Thanks, Robin.
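To make the capability-versus-prot-flag distinction concrete, here is a minimal standalone sketch with toy names and a toy PTE layout (not the kernel's): claiming the capability says cacheable, snooping mappings are *possible*, while a per-mapping prot flag chooses them, so an uncached mapping stays obtainable.

```c
#include <assert.h>
#include <stdint.h>

#define MY_IOMMU_CACHE	(1u << 0)	/* prot flag: request a snooping mapping */
#define MY_PTE_SNOOP	(1u << 7)	/* pretend PTE attribute enabling snoop */

/* Build a toy PTE: pfn lives in the upper bits, attributes in the low byte.
 * The snoop attribute is only set when the caller asked for it via the
 * cache prot flag, so non-coherent mappings remain available. */
static uint32_t make_pte(uint32_t pfn, unsigned int prot)
{
	uint32_t pte = pfn << 8;

	if (prot & MY_IOMMU_CACHE)
		pte |= MY_PTE_SNOOP;
	return pte;
}
```

A driver that ignored the prot flag and always set the snoop attribute would match the "you couldn't actually get an uncached mapping even if you wanted one" situation described above.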
Re: [RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver
On 2024-06-23 4:21 am, Baolu Lu wrote:

On 6/21/24 11:09 PM, Teddy Astie wrote:
On 19/06/2024 18:30, Jason Gunthorpe wrote:
On Thu, Jun 13, 2024 at 01:50:22PM +, Teddy Astie wrote:

+struct iommu_domain *xen_iommu_domain_alloc(unsigned type)
+{
+	struct xen_iommu_domain *domain;
+	u16 ctx_no;
+	int ret;
+
+	if (type & IOMMU_DOMAIN_IDENTITY) {
+		/* use default domain */
+		ctx_no = 0;

Please use the new ops, domain_alloc_paging and the static identity domain.

Yes, in the v2 I will use this newer interface. I have a question on this new interface: is it valid to not have an identity domain (with the "default domain" being blocking)? Well, in the current implementation it doesn't really matter, but at some point we may want to allow not having it (thus making this driver mandatory).

It's valid to not have an identity domain if "default domain being blocking" means a paging domain with no mappings. In the iommu driver's iommu_ops::def_domain_type callback, just always return IOMMU_DOMAIN_DMA, which indicates that the iommu driver doesn't support identity translation.

That's not necessary - if neither ops->identity_domain nor ops->domain_alloc(IOMMU_DOMAIN_IDENTITY) gives a valid domain then we fall back to IOMMU_DOMAIN_DMA anyway.

Thanks, Robin.
Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver
On 2024-06-21 5:08 pm, TSnake41 wrote: From: Teddy Astie In the context of Xen, Linux runs as Dom0 and doesn't have access to the machine IOMMU. Although, a IOMMU is mandatory to use some kernel features such as VFIO or DMA protection. In Xen, we added a paravirtualized IOMMU with iommu_op hypercall in order to allow Dom0 to implement such feature. This commit introduces a new IOMMU driver that uses this new hypercall interface. Signed-off-by Teddy Astie --- Changes since v1 : * formatting changes * applied Jan Beulich proposed changes : removed vim notes at end of pv-iommu.h * applied Jason Gunthorpe proposed changes : use new ops and remove redundant checks --- arch/x86/include/asm/xen/hypercall.h | 6 + drivers/iommu/Kconfig| 9 + drivers/iommu/Makefile | 1 + drivers/iommu/xen-iommu.c| 489 +++ include/xen/interface/memory.h | 33 ++ include/xen/interface/pv-iommu.h | 104 ++ include/xen/interface/xen.h | 1 + 7 files changed, 643 insertions(+) create mode 100644 drivers/iommu/xen-iommu.c create mode 100644 include/xen/interface/pv-iommu.h diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index a2dd24947eb8..6b1857f27c14 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -490,6 +490,12 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg) return _hypercall2(int, xenpmu_op, op, arg); } +static inline int +HYPERVISOR_iommu_op(void *arg) +{ + return _hypercall1(int, iommu_op, arg); +} + static inline int HYPERVISOR_dm_op( domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0af39bbbe3a3..242cefac77c9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -480,6 +480,15 @@ config VIRTIO_IOMMU Say Y here if you intend to run this kernel as a guest. +config XEN_IOMMU + bool "Xen IOMMU driver" + depends on XEN_DOM0 Clearly this depends on X86 as well. + select IOMMU_API + help + Xen PV-IOMMU driver for Dom0. 
+ + Say Y here if you intend to run this guest as Xen Dom0. + config SPRD_IOMMU tristate "Unisoc IOMMU Support" depends on ARCH_SPRD || COMPILE_TEST diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 542760d963ec..393afe22c901 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -30,3 +30,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o obj-$(CONFIG_APPLE_DART) += apple-dart.o +obj-$(CONFIG_XEN_IOMMU) += xen-iommu.o \ No newline at end of file diff --git a/drivers/iommu/xen-iommu.c b/drivers/iommu/xen-iommu.c new file mode 100644 index ..b765445d27cd --- /dev/null +++ b/drivers/iommu/xen-iommu.c @@ -0,0 +1,489 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Xen PV-IOMMU driver. + * + * Copyright (C) 2024 Vates SAS + * + * Author: Teddy Astie + * + */ + +#define pr_fmt(fmt)"xen-iommu: " fmt + +#include +#include +#include +#include +#include Please drop this; it's a driver, not a DMA ops implementation. +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +MODULE_DESCRIPTION("Xen IOMMU driver"); +MODULE_AUTHOR("Teddy Astie "); +MODULE_LICENSE("GPL"); + +#define MSI_RANGE_START(0xfee0) +#define MSI_RANGE_END (0xfeef) + +#define XEN_IOMMU_PGSIZES (0x1000) + +struct xen_iommu_domain { + struct iommu_domain domain; + + u16 ctx_no; /* Xen PV-IOMMU context number */ +}; + +static struct iommu_device xen_iommu_device; + +static uint32_t max_nr_pages; +static uint64_t max_iova_addr; + +static spinlock_t lock; Not a great name - usually it's good to name a lock after what it protects. Although perhaps it is already, since AFAICS this isn't actually used anywhere anyway. 
+static inline struct xen_iommu_domain *to_xen_iommu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct xen_iommu_domain, domain); +} + +static inline u64 addr_to_pfn(u64 addr) +{ + return addr >> 12; +} + +static inline u64 pfn_to_addr(u64 pfn) +{ + return pfn << 12; +} + +bool xen_iommu_capable(struct device *dev, enum iommu_cap cap) +{ + switch (cap) { + case IOMMU_CAP_CACHE_COHERENCY: + return true; Will the PV-IOMMU only ever be exposed on hardware where that really is always true? + + default: + return false; + } +} + +struct iommu_domain *xen_iommu_domain_alloc_paging(struct device *dev) +{ + struct xen_iommu_domain *domain; + int ret; + + struct pv_iommu_op op = { +
Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk
On 23/05/2024 6:52 pm, Rob Clark wrote:

From: Rob Clark

Add an io-pgtable method to walk the pgtable returning the raw PTEs that would be traversed for a given iova access.

Have to say I'm a little torn here - with my iommu-dma hat on I'm not super enthusiastic about adding any more overhead to iova_to_phys, but in terms of maintaining io-pgtable I do like the overall shape of the implementation... Will, how much would you hate a compromise of inlining iova_to_phys as the default walk behaviour if cb is NULL? :)

That said, looking at the unmap figures for dma_map_benchmark on a Neoverse N1, any difference I think I see is still well within the noise, so maybe a handful of extra indirect calls isn't really enough to worry about?

Cheers, Robin.

Signed-off-by: Rob Clark --- drivers/iommu/io-pgtable-arm.c | 51 -- include/linux/io-pgtable.h | 4 +++ 2 files changed, 46 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index f7828a7aad41..f47a0e64bb35 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iov data->start_level, ptep); } -static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, -unsigned long iova) +static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long iova, + int (*cb)(void *cb_data, void *pte, int level), + void *cb_data) { struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); arm_lpae_iopte pte, *ptep = data->pgd; int lvl = data->start_level; + int ret; do { /* Valid IOPTE pointer? */ if (!ptep) - return 0; + return -EFAULT; /* Grab the IOPTE we're interested in */ ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); @@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, /* Valid entry? */ if (!pte) - return 0; + return -EFAULT; + + ret = cb(cb_data, &pte, lvl); + if (ret) + return ret; - /* Leaf entry? 
*/ + /* Leaf entry? If so, we've found the translation */ if (iopte_leaf(pte, lvl, data->iop.fmt)) - goto found_translation; + return 0; /* Take it to the next level */ ptep = iopte_deref(pte, data); } while (++lvl < ARM_LPAE_MAX_LEVELS); /* Ran out of page tables to walk */ + return -EFAULT; +} + +struct iova_to_phys_walk_data { + arm_lpae_iopte pte; + int level; +}; + +static int iova_to_phys_walk_cb(void *cb_data, void *pte, int level) +{ + struct iova_to_phys_walk_data *d = cb_data; + + d->pte = *(arm_lpae_iopte *)pte; + d->level = level; + return 0; +} + +static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, +unsigned long iova) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct iova_to_phys_walk_data d; + int ret; + + ret = arm_lpae_pgtable_walk(ops, iova, iova_to_phys_walk_cb, &d); + if (ret) + return 0; -found_translation: - iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1); - return iopte_to_paddr(pte, data) | iova; + iova &= (ARM_LPAE_BLOCK_SIZE(d.level, data) - 1); + return iopte_to_paddr(d.pte, data) | iova; } static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) @@ -807,6 +839,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) .map_pages = arm_lpae_map_pages, .unmap_pages= arm_lpae_unmap_pages, .iova_to_phys = arm_lpae_iova_to_phys, + .pgtable_walk = arm_lpae_pgtable_walk, }; return data; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 86cf1f7ae389..261b48af068a 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -177,6 +177,7 @@ struct io_pgtable_cfg { * @map_pages:Map a physically contiguous range of pages of the same size. * @unmap_pages: Unmap a range of virtually contiguous pages of the same size. * @iova_to_phys: Translate iova to physical address. + * @pgtable_walk: (optional) Perform a page table walk for a given iova. * * These functions map directly onto the iommu_ops member functions with * the same names. 
@@ -190,6 +191,9 @@ struct io_pgtable_ops { struct iommu_iotlb_gather *gather); phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
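The shape of the callback interface above can be modelled standalone (illustrative code, not the patch itself): the walker hands every traversed PTE to the callback, so an iova_to_phys-style user simply records the last one, while a debug user could dump them all.

```c
#include <assert.h>

/* Walk a pretend table: one entry per level, faulting on an empty entry. */
static int walk(unsigned long *ptes, int nlvls,
		int (*cb)(void *cb_data, void *pte, int level), void *cb_data)
{
	for (int lvl = 0; lvl < nlvls; lvl++) {
		int ret;

		if (!ptes[lvl])
			return -1;	/* invalid entry: the walk faults */

		ret = cb(cb_data, &ptes[lvl], lvl);
		if (ret)
			return ret;	/* the callback may abort the walk */
	}
	return 0;
}

struct last_pte {
	unsigned long pte;
	int level;
};

/* iova_to_phys-style callback: just remember the most recent PTE. */
static int record_last(void *cb_data, void *pte, int level)
{
	struct last_pte *d = cb_data;

	d->pte = *(unsigned long *)pte;
	d->level = level;
	return 0;
}
```

This mirrors how the patch reimplements arm_lpae_iova_to_phys on top of arm_lpae_pgtable_walk: the final recorded PTE and level are all that the physical-address computation needs.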
Re: [PATCH 2/9] iommu/rockchip: Attach multiple power domains
On 2024-06-13 10:38 pm, Sebastian Reichel wrote:

Hi,

On Thu, Jun 13, 2024 at 11:34:02AM GMT, Tomeu Vizoso wrote:
On Thu, Jun 13, 2024 at 11:24 AM Tomeu Vizoso wrote:
On Thu, Jun 13, 2024 at 2:05 AM Sebastian Reichel wrote:
On Wed, Jun 12, 2024 at 03:52:55PM GMT, Tomeu Vizoso wrote:

IOMMUs with multiple base addresses can also have multiple power domains. The base framework only takes care of a single power domain, as some devices need these power domains to be powered on in a specific order. Use a helper function to establish links in the order in which they are in the DT. This is needed by the IOMMU used by the NPU in the RK3588.

Signed-off-by: Tomeu Vizoso ---

To me it looks like this is multiple IOMMUs, which should each get their own node. I don't see a good reason for merging these together.

I have made quite a few attempts at splitting the IOMMUs and also the cores, but I wasn't able to get things working stably. The TRM is really scant about how the 4 IOMMU instances relate to each other, and what the fourth one is for. Given that the vendor driver treats them as a single IOMMU with four instances and we don't have any information on them, I resigned myself to just have them as a single device.

I would love to be proved wrong though and find a way of getting things working stably as different devices so they can be powered on and off as needed. We could save quite some code as well.
FWIW, here are a few ways I tried to structure the DT nodes; none of these worked reliably:

https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-devices-power/arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1163
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-schema-subnodes//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1162
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-devices//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1163
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-iommus//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L2669

I can very well imagine I missed some way of getting this to work, but for every attempt, the domains, iommus and cores were resumed in different orders that presumably caused problems during concurrent execution of workloads. So I fell back to what the vendor driver does, which works reliably (but all cores have to be powered on at the same time).

Mh. The "6.10-rocket-multiple-iommus" branch seems wrong. There is only one iommu node in that. I would have expected a test with

rknn { // combined device
	iommus = <>, <>, ...;
};

Otherwise I think I would go with the schema-subnodes variant. The driver can initially walk through the sub-nodes and collect the resources into the main device, so on the driver side nothing would really change. But that has a couple of advantages:

1. DT and DT binding are easier to read
2. It's similar to e.g. CPU cores each having their own node
3. Easy to extend to more cores in the future
4. The kernel can easily switch to proper per-core device model when the problem has been identified

It also would seem to permit describing and associating the per-core IOMMUs individually - apart from core 0's apparent coupling to whatever shared "uncore" stuff exists for the whole thing, from the distinct clocks, interrupts, power domains etc. 
lining up with each core I'd guess those IOMMUs are not interrelated the same way the ISP's read/write IOMMUs are (which was the main justification for adopting the multiple-reg design originally vs. distinct DT nodes like Exynos does). However, practically that would require the driver to at least populate per-core child devices to make DMA API or IOMMU API mappings with, since we couldn't spread the "collect the resources" trick into those subsystems as well. Thanks, Robin.
Re: [PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs
On 2024-06-12 1:50 pm, Philippe Mathieu-Daudé wrote: On 12/6/24 14:48, Peter Maydell wrote: On Wed, 12 Jun 2024 at 13:33, Philippe Mathieu-Daudé wrote: Hi Zhenyu, On 12/6/24 04:05, Zhenyu Zhang wrote: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 3c93c0c0a6..3cefac6d43 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms) qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2); qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt"); + /* + * For QEMU, all DMA is coherent. Advertising this in the root node + * has two benefits: + * + * - It avoids potential bugs where we forget to mark a DMA + * capable device as being dma-coherent + * - It avoids spurious warnings from the Linux kernel about + * devices which can't do DMA at all + */ + qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0); OK, but why restrict that to the Aarch64 virt machine? Shouldn't advertise this generically in create_device_tree()? Or otherwise at least in the other virt machines? create_device_tree() creates an empty device tree, not one with stuff in it. It seems reasonable to me for this property on the root to be set in the same place we set other properties of the root node. OK. Still the question about other virt machines remains unanswered :) From the DT consumer point of view, the interpretation and assumptions around coherency *are* generally architecture- or platform-specific. For instance on RISC-V, many platforms want to assume coherency by default (and potentially use "dma-noncoherent" to mark individual devices that aren't), while others may still want to do the opposite and use "dma-coherent" in the same manner as Arm and AArch64. Neither property existed back in ePAPR, so typical PowerPC systems wouldn't even be looking and will just make their own assumptions by other means. Thanks, Robin.
Re: [PATCH RFC] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs
ng passed force_dma = true. https://elixir.bootlin.com/linux/v6.10-rc2/source/drivers/amba/bus.c#L361

There is a comment in of_dma_configure():

/*
 * For legacy reasons, we have to assume some devices need
 * DMA configuration regardless of whether "dma-ranges" is
 * correctly specified or not.
 */

So I think this is being triggered by a workaround for broken DT. This was introduced by Robin Murphy +CC though you may need to ask on kernel list because ARM / QEMU fun. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=723288836628b

Relevant comment from that patch description: "Certain bus types have a general expectation of DMA capability and carry a well-established precedent that an absent "dma-ranges" implies the same as the empty property, so we automatically opt those in to DMA configuration regardless, to avoid regressing most existing platforms."

The patch implies that AMBA is one of those. So not sure this is solvable without a hack such as eliding the warning message if dma_force was set, as the situation probably isn't relevant then...

Except it absolutely is, because the whole reason for setting force_dma on those buses is that they *do* commonly have DMA-capable devices, and they are also commonly non-coherent such that this condition would be serious. Especially AMBA, given that the things old enough to still be using that abstraction rather than plain platform (PL080, PL111, PL330,...) all predate ACE-Lite so don't even have the *possibility* of being coherent without external trickery in the interconnect.

Thanks, Robin.
Re: [PATCH 00/20] iommu: Refactoring domain allocation interface
On 29/05/2024 6:32 am, Lu Baolu wrote: The IOMMU subsystem has undergone some changes, including the removal of iommu_ops from the bus structure. Consequently, the existing domain allocation interface, which relies on a bus type argument, is no longer relevant: struct iommu_domain *iommu_domain_alloc(struct bus_type *bus) This series is designed to refactor the use of this interface. It proposes two new interfaces to replace iommu_domain_alloc(): - iommu_user_domain_alloc(): This interface is intended for allocating iommu domains managed by userspace for device passthrough scenarios, such as those used by iommufd, vfio, and vdpa. It clearly indicates that the domain is for user-managed device DMA. If an IOMMU driver does not implement iommu_ops->domain_alloc_user, this interface will rollback to the generic paging domain allocation. - iommu_paging_domain_alloc(): This interface is for allocating iommu domains managed by kernel drivers for kernel DMA purposes. It takes a device pointer as a parameter, which better reflects the current design of the IOMMU subsystem. The majority of device drivers currently using iommu_domain_alloc() do so to allocate a domain for a specific device and then attach that domain to the device. These cases can be straightforwardly migrated to the new interfaces. Ooh, nice! This was rising back up my to-do list as well, but I concur it's rather more straightforward than my version that did devious things to keep the iommu_domain_alloc() name... However, there are some drivers with more complex use cases that do not fit neatly into this new scheme. For example: $ git grep "= iommu_domain_alloc" arch/arm/mm/dma-mapping.c: mapping->domain = iommu_domain_alloc(bus); This one's simple enough, the refactor just needs to go one step deeper. I've just rebased and pushed my old patch for that, if you'd like it [1]. 
drivers/gpu/drm/rockchip/rockchip_drm_drv.c:private->domain = iommu_domain_alloc(private->iommu_dev->bus); Both this one and usnic_uiom_alloc_pd() should be OK - back when I did all the figuring out to clean up iommu_present(), I specifically reworked them into "dev->bus" style as a reminder that it *is* supposed to be the right device for doing this with, even if the attach is a bit more distant. drivers/gpu/drm/tegra/drm.c:tegra->domain = iommu_domain_alloc(_bus_type); This is the tricky one, where the device to hand may *not* be the right device for IOMMU API use [2]. FWIW my plan was to pull the "walk the platform bus to find any IOMMU-mapped device" trick into this code and use it both to remove the final iommu_present() and for a device-based domain allocation. drivers/infiniband/hw/usnic/usnic_uiom.c: pd->domain = domain = iommu_domain_alloc(dev->bus); This series leave those cases unchanged and keep iommu_domain_alloc() for their usage. But new drivers should not use it anymore. I'd certainly be keen for it to be gone ASAP, since I'm seeing increasing demand for supporting multiple IOMMU drivers, and this is the last bus-based thing standing in the way of that. Thanks, Robin. 
[1] https://gitlab.arm.com/linux-arm/linux-rm/-/commit/f048cc6a323d8641898025ca96071df7cbe8bd52 [2] https://lore.kernel.org/linux-iommu/add31812-50d5-6cb0-3908-143c523ab...@collabora.com/ The whole series is also available on GitHub: https://github.com/LuBaolu/intel-iommu/commits/iommu-domain-allocation-refactor-v1 Lu Baolu (20): iommu: Add iommu_user_domain_alloc() interface iommufd: Use iommu_user_domain_alloc() vfio/type1: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_user_domain_alloc() iommu: Add iommu_paging_domain_alloc() interface drm/msm: Use iommu_paging_domain_alloc() drm/nouveau/tegra: Use iommu_paging_domain_alloc() gpu: host1x: Use iommu_paging_domain_alloc() media: nvidia: tegra: Use iommu_paging_domain_alloc() media: venus: firmware: Use iommu_paging_domain_alloc() ath10k: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() remoteproc: Use iommu_paging_domain_alloc() soc/fsl/qbman: Use iommu_paging_domain_alloc() iommu/vt-d: Add helper to allocate paging domain iommu/vt-d: Add domain_alloc_paging support iommu/vt-d: Simplify compatibility check for identity domain iommu/vt-d: Enhance compatibility check for paging domain attach iommu/vt-d: Remove domain_update_iommu_cap() iommu/vt-d: Remove domain_update_iommu_superpage() include/linux/iommu.h | 12 + drivers/gpu/drm/msm/msm_iommu.c | 8 +- .../drm/nouveau/nvkm/engine/device/tegra.c| 4 +- drivers/gpu/host1x/dev.c | 6 +- drivers/iommu/intel/iommu.c | 319 -- drivers/iommu/intel/pasid.c | 28 +- drivers/iommu/iommu.c | 62 drivers/iommu/iommufd/hw_pagetable.c |
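The cover letter's fallback rule - if a driver doesn't implement iommu_ops->domain_alloc_user, the new interface rolls back to generic paging-domain allocation - can be sketched standalone (toy types and names, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct domain {
	int user_managed;
};

struct driver_ops {
	struct domain *(*domain_alloc_user)(void);	/* optional op */
	struct domain *(*domain_alloc_paging)(void);	/* always present */
};

static struct domain user_dom = { .user_managed = 1 };
static struct domain paging_dom = { .user_managed = 0 };

static struct domain *alloc_user(void)   { return &user_dom; }
static struct domain *alloc_paging(void) { return &paging_dom; }

/* Prefer the driver's user-domain op; fall back to plain paging
 * allocation when the driver doesn't provide one. */
static struct domain *user_domain_alloc(const struct driver_ops *ops)
{
	if (ops->domain_alloc_user)
		return ops->domain_alloc_user();
	return ops->domain_alloc_paging();
}
```

The same device-based shape is what lets callers that today do iommu_domain_alloc(dev->bus) migrate to passing the device itself.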
Re: [Intel-wired-lan] [PATCH net-next v6 13/21] bitmap: make bitmap_{get, set}_value8() use bitmap_{read, write}()
On 30/05/2024 6:11 pm, Yury Norov wrote: On Wed, May 29, 2024 at 04:12:25PM +0100, Robin Murphy wrote: Hi Alexander, On 27/03/2024 3:23 pm, Alexander Lobakin wrote: Now that we have generic bitmap_read() and bitmap_write(), which are inline and try to take care of non-bound-crossing and aligned cases to keep them optimized, collapse bitmap_{get,set}_value8() into simple wrappers around the former ones. bloat-o-meter shows no difference in vmlinux and -2 bytes for gpio-pca953x.ko, which says the optimization didn't suffer due to that change. The converted helpers have the value width embedded and always compile-time constant and that helps a lot. This change appears to have introduced a build failure for me on arm64 (with GCC 9.4.0 from Ubuntu 20.04.02) - reverting b44759705f7d makes these errors go away again: In file included from drivers/gpio/gpio-pca953x.c:12: drivers/gpio/gpio-pca953x.c: In function ‘pca953x_probe’: ./include/linux/bitmap.h:799:17: error: array subscript [1, 1024] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds] 799 | map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits); | ^~ In file included from ./include/linux/atomic.h:5, from drivers/gpio/gpio-pca953x.c:11: drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’ 1015 | DECLARE_BITMAP(val, MAX_LINE); | ^~~ ./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’ 11 | unsigned long name[BITS_TO_LONGS(bits)] |^~~~ In file included from drivers/gpio/gpio-pca953x.c:12: ./include/linux/bitmap.h:800:17: error: array subscript [1, 1024] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds] 800 | map[index + 1] |= (value >> space); | ~~~^~~ In file included from ./include/linux/atomic.h:5, from drivers/gpio/gpio-pca953x.c:11: drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’ 1015 | DECLARE_BITMAP(val, MAX_LINE); | ^~~ ./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’ 11 | unsigned 
long name[BITS_TO_LONGS(bits)] |^~~~ I've not dug further since I don't have any interest in the pca953x driver - it just happened to be enabled in my config, so for now I've turned it off. However I couldn't obviously see any other reports of this, so here it is. It's a compiler false-positive. The straightforward fix is to disable the warning for gcc9+, and it's in Andrew Morton's tree already, but there's some discussion ongoing on how it should be mitigated properly: https://lore.kernel.org/all/0ab2702f-8245-4f02-beb7-dcc7d79d5...@app.fastmail.com/T/ Ah, great! Guess I really should have scrolled further down my lore search results - I assumed I was looking for any other reports of a recent regression in mainline, not ones from 6 months ago :) Cheers, Robin.
Re: [Intel-wired-lan] [PATCH net-next v6 13/21] bitmap: make bitmap_{get, set}_value8() use bitmap_{read, write}()
Hi Alexander, On 27/03/2024 3:23 pm, Alexander Lobakin wrote: Now that we have generic bitmap_read() and bitmap_write(), which are inline and try to take care of non-bound-crossing and aligned cases to keep them optimized, collapse bitmap_{get,set}_value8() into simple wrappers around the former ones. bloat-o-meter shows no difference in vmlinux and -2 bytes for gpio-pca953x.ko, which says the optimization didn't suffer due to that change. The converted helpers have the value width embedded and always compile-time constant and that helps a lot. This change appears to have introduced a build failure for me on arm64 (with GCC 9.4.0 from Ubuntu 20.04.02) - reverting b44759705f7d makes these errors go away again: In file included from drivers/gpio/gpio-pca953x.c:12: drivers/gpio/gpio-pca953x.c: In function ‘pca953x_probe’: ./include/linux/bitmap.h:799:17: error: array subscript [1, 1024] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds] 799 | map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits); | ^~ In file included from ./include/linux/atomic.h:5, from drivers/gpio/gpio-pca953x.c:11: drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’ 1015 | DECLARE_BITMAP(val, MAX_LINE); | ^~~ ./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’ 11 | unsigned long name[BITS_TO_LONGS(bits)] |^~~~ In file included from drivers/gpio/gpio-pca953x.c:12: ./include/linux/bitmap.h:800:17: error: array subscript [1, 1024] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds] 800 | map[index + 1] |= (value >> space); | ~~~^~~ In file included from ./include/linux/atomic.h:5, from drivers/gpio/gpio-pca953x.c:11: drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’ 1015 | DECLARE_BITMAP(val, MAX_LINE); | ^~~ ./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’ 11 | unsigned long name[BITS_TO_LONGS(bits)] |^~~~ I've not dug further since I don't have any interest in the 
pca953x driver - it just happened to be enabled in my config, so for now I've turned it off. However I couldn't obviously see any other reports of this, so here it is. Thanks, Robin.
Re: [PATCH] treewide: Fix common grammar mistake "the the"
On 11/04/2024 4:04 pm, Thorsten Blum wrote: Use `find . -type f -exec sed -i 's/\<the the\>/the/g' {} +` to find all occurrences of "the the" and replace them with a single "the". [...] diff --git a/arch/arm/include/asm/unwind.h b/arch/arm/include/asm/unwind.h index d60b09a5acfc..a75da9a01f91 100644 --- a/arch/arm/include/asm/unwind.h +++ b/arch/arm/include/asm/unwind.h @@ -10,7 +10,7 @@ #ifndef __ASSEMBLY__ -/* Unwind reason code according the the ARM EABI documents */ +/* Unwind reason code according the ARM EABI documents */ Well, that's clearly still not right... repeated words aren't *always* redundant, sometimes they're meant to be other words ;) Thanks, Robin.
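A safer workflow than blind substitution, sketched below with GNU grep (the BRE back-reference and `\b` word boundaries are GNU extensions): list the doubled words so a human can decide whether the fix is deleting one copy or replacing it with a different word, as in the "according the the" case above.

```shell
# List doubled words for review instead of auto-rewriting them,
# since sometimes "the the" really wants "to the" instead.
printf '%s\n' 'Unwind reason code according the the ARM EABI documents' |
    grep -n '\b\([A-Za-z]\{1,\}\) \1\b'
```

Running this over the tree with `grep -rn` would surface each match with its file and line, making the "other words" cases easy to spot before committing a treewide change.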
Re: [PATCH] drm/panthor: Don't use virt_to_pfn()
On 18/03/2024 2:51 pm, Steven Price wrote: virt_to_pfn() isn't available on x86 (except to xen) so breaks COMPILE_TEST builds. Avoid its use completely by instead storing the struct page pointer allocated in panthor_device_init() and using page_to_pfn() instead. Signed-off-by: Steven Price --- drivers/gpu/drm/panthor/panthor_device.c | 10 ++ drivers/gpu/drm/panthor/panthor_device.h | 2 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index 69deb8e17778..3c30da03fa48 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -154,6 +154,7 @@ int panthor_device_init(struct panthor_device *ptdev) { struct resource *res; struct page *p; + u32 *dummy_page_virt; int ret; ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT; @@ -172,9 +173,10 @@ int panthor_device_init(struct panthor_device *ptdev) if (!p) return -ENOMEM; - ptdev->pm.dummy_latest_flush = page_address(p); + ptdev->pm.dummy_latest_flush = p; + dummy_page_virt = page_address(p); ret = drmm_add_action_or_reset(&ptdev->base, panthor_device_free_page, - ptdev->pm.dummy_latest_flush); + dummy_page_virt); Nit: I was about to say I'd be inclined to switch the callback to __free_page() instead, but then I realise there's no real need to be reinventing that in the first place: dummy_page_virt = (void *)devm_get_free_pages(ptdev->base.dev, GFP_KERNEL | __GFP_ZERO, 0); if (!dummy_page_virt) return -ENOMEM; ptdev->pm.dummy_latest_flush = virt_to_page(dummy_page_virt); Cheers, Robin. if (ret) return ret; @@ -184,7 +186,7 @@ int panthor_device_init(struct panthor_device *ptdev) * happens while the dummy page is mapped. Zero cannot be used because * that means 'always flush'.
*/ - *ptdev->pm.dummy_latest_flush = 1; + *dummy_page_virt = 1; INIT_WORK(&ptdev->reset.work, panthor_device_reset_work); ptdev->reset.wq = alloc_ordered_workqueue("panthor-reset-wq", 0); @@ -353,7 +355,7 @@ static vm_fault_t panthor_mmio_vm_fault(struct vm_fault *vmf) if (active) pfn = __phys_to_pfn(ptdev->phys_addr + CSF_GPU_LATEST_FLUSH_ID); else - pfn = virt_to_pfn(ptdev->pm.dummy_latest_flush); + pfn = page_to_pfn(ptdev->pm.dummy_latest_flush); break; default: diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 51c9d61b6796..c84c27dcc92c 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -160,7 +160,7 @@ struct panthor_device { * Used to replace the real LATEST_FLUSH page when the GPU * is suspended. */ - u32 *dummy_latest_flush; + struct page *dummy_latest_flush; } pm; };
Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case
On 18/03/2024 1:49 pm, Steven Price wrote: On 18/03/2024 13:08, Boris Brezillon wrote: On Mon, 18 Mar 2024 11:31:05 + Steven Price wrote: On 18/03/2024 08:58, Boris Brezillon wrote: Putting a hard dependency on CONFIG_PM is not possible because of a circular dependency issue, and it's actually not desirable either. In order to support this use case, we forcibly resume at init time, and suspend at unplug time. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/ Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- Tested by faking CONFIG_PM=n in the driver (basically commenting all pm_runtime calls, and making the panthor_device_suspend/resume() calls unconditional in the panthor_device_unplug/init() path) since CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I can't be 100% sure this will work correctly on a platform that has CONFIG_PM=n. The same - I can't test this properly :( Note that the other option (which AFAICT doesn't cause any problems) is to "select PM" rather than depend on it - AIUI the 'select' dependency is considered in the opposite direction by kconfig so won't cause the dependency loop. Doesn't seem to work with COMPILE_TEST though? I mean, we need something like depends on ARM || ARM64 || (COMPILE_TEST && PM) ... select PM but kconfig doesn't like that Why do we need the "&& PM" part? Just: depends on ARM || ARM64 || COMPILE_TEST ... select PM Or at least that appears to work for me. drivers/gpu/drm/panthor/Kconfig:3:error: recursive dependency detected! drivers/gpu/drm/panthor/Kconfig:3: symbol DRM_PANTHOR depends on PM kernel/power/Kconfig:183:symbol PM is selected by DRM_PANTHOR which is why I initially went for a depends on PM Of course if there is actually anyone who has a platform which can be built !CONFIG_PM then that won't help. But the inability of anyone to actually properly test this configuration does worry me a little.
Well, as long as it doesn't regress the PM behavior, I think I'm happy to take the risk. Worst case scenario, someone complains that this is not working properly when they do the !PM bringup :-). Indeed, I've no objection to this patch - although I really should have compile tested it as Robin pointed out ;) But one other thing I've noticed when compile testing it - we don't appear to have fully fixed the virt_to_pfn() problem. On x86 with COMPILE_TEST I still get an error. Looking at the code it appears that virt_to_pfn() isn't available on x86... it overrides asm/page.h and doesn't provide a definition. The definition on x86 is hiding in asm/xen/page.h. Outside of arch code it's only drivers/xen that currently uses that function. So I guess it's probably best to do a PFN_DOWN(virt_to_phys(...)) instead. Or look to fix x86 :) FWIW from a quick look it might be cleaner to store the struct page pointer for the dummy page - especially since the VA only seems to be used once in panthor_device_init() anyway - then use page_to_pfn() at the business end. Cheers, Robin.
Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case
On 18/03/2024 8:58 am, Boris Brezillon wrote: Putting a hard dependency on CONFIG_PM is not possible because of a circular dependency issue, and it's actually not desirable either. In order to support this use case, we forcibly resume at init time, and suspend at unplug time. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/ Signed-off-by: Boris Brezillon --- Tested by faking CONFIG_PM=n in the driver (basically commenting all pm_runtime calls, and making the panthor_device_suspend/resume() calls unconditional in the panthor_device_unplug/init() path) since CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I can't be 100% sure this will work correctly on a platform that has CONFIG_PM=n. --- drivers/gpu/drm/panthor/panthor_device.c | 13 +++-- drivers/gpu/drm/panthor/panthor_drv.c| 4 +++- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index 69deb8e17778..ba7aedbb4931 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -87,6 +87,10 @@ void panthor_device_unplug(struct panthor_device *ptdev) pm_runtime_dont_use_autosuspend(ptdev->base.dev); pm_runtime_put_sync_suspend(ptdev->base.dev); + /* If PM is disabled, we need to call the suspend handler manually. */ + if (!IS_ENABLED(CONFIG_PM)) + panthor_device_suspend(ptdev->base.dev); + /* Report the unplug operation as done to unblock concurrent * panthor_device_unplug() callers. */ @@ -218,6 +222,13 @@ int panthor_device_init(struct panthor_device *ptdev) if (ret) return ret; + /* If PM is disabled, we need to call panthor_device_resume() manually. 
*/ + if (!IS_ENABLED(CONFIG_PM)) { + ret = panthor_device_resume(ptdev->base.dev); + if (ret) + return ret; + } + ret = panthor_gpu_init(ptdev); if (ret) goto err_rpm_put; @@ -402,7 +413,6 @@ int panthor_device_mmap_io(struct panthor_device *ptdev, struct vm_area_struct * return 0; } -#ifdef CONFIG_PM int panthor_device_resume(struct device *dev) { struct panthor_device *ptdev = dev_get_drvdata(dev); @@ -547,4 +557,3 @@ int panthor_device_suspend(struct device *dev) mutex_unlock(&ptdev->pm.mmio_lock); return ret; } -#endif diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ff484506229f..2ea6a9f436db 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1407,17 +1407,19 @@ static const struct of_device_id dt_match[] = { }; MODULE_DEVICE_TABLE(of, dt_match); +#ifdef CONFIG_PM This #ifdef isn't necessary, and in fact will break the !PM build - pm_ptr() already takes care of allowing the compiler to optimise out the ops structure itself without any further annotations. Thanks, Robin. static DEFINE_RUNTIME_DEV_PM_OPS(panthor_pm_ops, panthor_device_suspend, panthor_device_resume, NULL); +#endif static struct platform_driver panthor_driver = { .probe = panthor_probe, .remove_new = panthor_remove, .driver = { .name = "panthor", - .pm = &panthor_pm_ops, + .pm = pm_ptr(&panthor_pm_ops), .of_match_table = dt_match, }, };
Re: [PATCH 3/3] dt-bindings: remoteproc: Add Arm remoteproc
On 2024-03-01 4:42 pm, abdellatif.elkhl...@arm.com wrote: From: Abdellatif El Khlifi introduce the bindings for Arm remoteproc support. Signed-off-by: Abdellatif El Khlifi --- .../bindings/remoteproc/arm,rproc.yaml| 69 +++ MAINTAINERS | 1 + 2 files changed, 70 insertions(+) create mode 100644 Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml diff --git a/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml b/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml new file mode 100644 index ..322197158059 --- /dev/null +++ b/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/remoteproc/arm,rproc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm Remoteproc Devices + +maintainers: + - Abdellatif El Khlifi + +description: | + Some Arm heterogeneous System-On-Chips feature remote processors that can + be controlled with a reset control register and a reset status register to + start or stop the processor. + + This document defines the bindings for these remote processors. + +properties: + compatible: +enum: + - arm,corstone1000-extsys + + reg: +minItems: 2 +maxItems: 2 +description: | + Address and size in bytes of the reset control register + and the reset status register. + Expects the registers to be in the order as above. + Should contain an entry for each value in 'reg-names'. + + reg-names: +description: | + Required names for each of the reset registers defined in + the 'reg' property. Expects the names from the following + list, in the specified order, each representing the corresponding + reset register. +items: + - const: reset-control + - const: reset-status + + firmware-name: +description: | + Default name of the firmware to load to the remote processor. So... is loading the firmware image achieved by somehow bitbanging it through the one reset register, maybe? 
I find it hard to believe this is a complete and functional binding. Frankly at the moment I'd be inclined to say it isn't even a remoteproc binding (or driver) at all, it's a reset controller. Bindings are a contract for describing the hardware, not the current state of Linux driver support - if this thing still needs mailboxes, shared memory, a reset vector register, or whatever else to actually be useful, those should be in the binding from day 1 so that a) people can write and deploy correct DTs now, such that functionality becomes available on their systems as soon as driver support catches up, and b) the community has any hope of being able to review whether the binding is appropriately designed and specified for the purpose it intends to serve. For instance right now it seems somewhat tenuous to describe two consecutive 32-bit registers as separate "reg" entries, but *maybe* it's OK if that's all there ever is. However if it's actually going to end up needing several more additional MMIO and/or memory regions for other functionality, then describing each register and location individually is liable to get unmanageable really fast, and a higher-level functional grouping (e.g. these reset-related registers together as a single 8-byte region) would likely be a better design. Thanks, Robin. 
+ +required: + - compatible + - reg + - reg-names + - firmware-name + +additionalProperties: false + +examples: + - | +extsys0: remoteproc@1a010310 { +compatible = "arm,corstone1000-extsys"; +reg = <0x1a010310 0x4>, <0x1a010314 0x4>; +reg-names = "reset-control", "reset-status"; +firmware-name = "es0_flashfw.elf"; +}; + +extsys1: remoteproc@1a010318 { +compatible = "arm,corstone1000-extsys"; +reg = <0x1a010318 0x4>, <0x1a01031c 0x4>; +reg-names = "reset-control", "reset-status"; +firmware-name = "es1_flashfw.elf"; +}; diff --git a/MAINTAINERS b/MAINTAINERS index 54d6a40feea5..eddaa3841a65 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1768,6 +1768,7 @@ ARM REMOTEPROC DRIVER M:Abdellatif El Khlifi L:linux-remotep...@vger.kernel.org S:Maintained +F: Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml F:drivers/remoteproc/arm_rproc.c ARM SMC WATCHDOG DRIVER
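To make the grouping suggestion concrete, one hypothetical shape such a binding could take is below, describing the two consecutive 32-bit reset registers as a single 8-byte region rather than two individually-named entries. This is purely illustrative of the review comment, not a proposed final binding:

```dts
extsys0: remoteproc@1a010310 {
	compatible = "arm,corstone1000-extsys";
	/* reset-control at +0x0, reset-status at +0x4 */
	reg = <0x1a010310 0x8>;
	firmware-name = "es0_flashfw.elf";
};
```

If mailboxes, shared memory or a reset vector register later join the binding, they would then be additional functional regions or phandles rather than a growing list of per-register "reg" entries.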
Re: [PATCH 3/3] drm/panthor: Fix undefined panthor_device_suspend/resume symbol issue
On 2024-03-11 1:22 pm, Boris Brezillon wrote: On Mon, 11 Mar 2024 13:11:28 + Robin Murphy wrote: On 2024-03-11 11:52 am, Boris Brezillon wrote: On Mon, 11 Mar 2024 13:49:56 +0200 Jani Nikula wrote: On Mon, 11 Mar 2024, Boris Brezillon wrote: On Mon, 11 Mar 2024 13:05:01 +0200 Jani Nikula wrote: This breaks the config for me: SYNCinclude/config/auto.conf.cmd GEN Makefile drivers/iommu/Kconfig:14:error: recursive dependency detected! drivers/iommu/Kconfig:14: symbol IOMMU_SUPPORT is selected by DRM_PANTHOR drivers/gpu/drm/panthor/Kconfig:3: symbol DRM_PANTHOR depends on PM kernel/power/Kconfig:183: symbol PM is selected by PM_SLEEP kernel/power/Kconfig:117: symbol PM_SLEEP depends on HIBERNATE_CALLBACKS kernel/power/Kconfig:35:symbol HIBERNATE_CALLBACKS is selected by XEN_SAVE_RESTORE arch/x86/xen/Kconfig:67:symbol XEN_SAVE_RESTORE depends on XEN arch/x86/xen/Kconfig:6: symbol XEN depends on PARAVIRT arch/x86/Kconfig:781: symbol PARAVIRT is selected by HYPERV drivers/hv/Kconfig:5: symbol HYPERV depends on X86_LOCAL_APIC arch/x86/Kconfig:1106: symbol X86_LOCAL_APIC depends on X86_UP_APIC arch/x86/Kconfig:1081: symbol X86_UP_APIC prompt is visible depending on PCI_MSI drivers/pci/Kconfig:39: symbol PCI_MSI is selected by AMD_IOMMU drivers/iommu/amd/Kconfig:3:symbol AMD_IOMMU depends on IOMMU_SUPPORT Uh, I guess we want a "depends on IOMMU_SUPPORT" instead of "select IOMMU_SUPPORT" in panthor then. That works for me. Let's revert the faulty commit first. We'll see if Steve has a different solution for the original issue. FWIW, the reasoning in the offending commit seems incredibly tenuous. There are far more practical reasons for building an arm/arm64 kernel without PM - for debugging or whatever, and where one may even still want a usable GPU, let alone just a non-broken build - than there are for building this driver for x86. Using pm_ptr() is trivial, and if you want to support COMPILE_TEST then there's really no justifiable excuse not to. 
The problem is not just about using pm_ptr(), but also making sure panthor_device_resume/suspend() are called in the init/unplug path when !PM, as I don't think the PM helpers automate that for us. I was just aiming for a simple fix that wouldn't force me to test the !PM case... Fair enough, at worst we could always have a runtime check and refuse to probe in conditions we don't think are worth the bother of implementing fully-functional support for. However if we want to make an argument for only supporting "realistic" configs at build time then that is an argument for dropping COMPILE_TEST as well. Thanks, Robin.
Re: [PATCH 3/3] drm/panthor: Fix undefined panthor_device_suspend/resume symbol issue
On 2024-03-11 11:52 am, Boris Brezillon wrote: On Mon, 11 Mar 2024 13:49:56 +0200 Jani Nikula wrote: On Mon, 11 Mar 2024, Boris Brezillon wrote: On Mon, 11 Mar 2024 13:05:01 +0200 Jani Nikula wrote: This breaks the config for me: SYNCinclude/config/auto.conf.cmd GEN Makefile drivers/iommu/Kconfig:14:error: recursive dependency detected! drivers/iommu/Kconfig:14: symbol IOMMU_SUPPORT is selected by DRM_PANTHOR drivers/gpu/drm/panthor/Kconfig:3: symbol DRM_PANTHOR depends on PM kernel/power/Kconfig:183: symbol PM is selected by PM_SLEEP kernel/power/Kconfig:117: symbol PM_SLEEP depends on HIBERNATE_CALLBACKS kernel/power/Kconfig:35:symbol HIBERNATE_CALLBACKS is selected by XEN_SAVE_RESTORE arch/x86/xen/Kconfig:67:symbol XEN_SAVE_RESTORE depends on XEN arch/x86/xen/Kconfig:6: symbol XEN depends on PARAVIRT arch/x86/Kconfig:781: symbol PARAVIRT is selected by HYPERV drivers/hv/Kconfig:5: symbol HYPERV depends on X86_LOCAL_APIC arch/x86/Kconfig:1106: symbol X86_LOCAL_APIC depends on X86_UP_APIC arch/x86/Kconfig:1081: symbol X86_UP_APIC prompt is visible depending on PCI_MSI drivers/pci/Kconfig:39: symbol PCI_MSI is selected by AMD_IOMMU drivers/iommu/amd/Kconfig:3:symbol AMD_IOMMU depends on IOMMU_SUPPORT Uh, I guess we want a "depends on IOMMU_SUPPORT" instead of "select IOMMU_SUPPORT" in panthor then. That works for me. Let's revert the faulty commit first. We'll see if Steve has a different solution for the original issue. FWIW, the reasoning in the offending commit seems incredibly tenuous. There are far more practical reasons for building an arm/arm64 kernel without PM - for debugging or whatever, and where one may even still want a usable GPU, let alone just a non-broken build - than there are for building this driver for x86. Using pm_ptr() is trivial, and if you want to support COMPILE_TEST then there's really no justifiable excuse not to. Thanks, Robin.
Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test
] [c000a878bcb0] [c0685d3c] kernfs_fop_write_iter+0x1cc/0x280 [ 981.124283] [c000a878bd00] [c05909c8] vfs_write+0x358/0x4b0 [ 981.124288] [c000a878bdc0] [c0590cfc] ksys_write+0x7c/0x140 [ 981.124293] [c000a878be10] [c0036554] system_call_exception+0x134/0x330 [ 981.124298] [c000a878be50] [c000d6a0] system_call_common+0x160/0x2e4 [ 981.124303] --- interrupt: c00 at 0x200013f21594 [ 981.124306] NIP: 200013f21594 LR: 200013e97bf4 CTR: [ 981.124309] REGS: c000a878be80 TRAP: 0c00 Not tainted (6.5.0-rc6-next-20230817-auto) [ 981.124312] MSR: 8280f033 CR: 22000282 XER: [ 981.124321] IRQMASK: 0 [ 981.124321] GPR00: 0004 73a55c70 200014007300 0007 [ 981.124321] GPR04: 00013aff5750 0008 fbad2c80 00013afd02a0 [ 981.124321] GPR08: 0001 [ 981.124321] GPR12: 200013b7bc30 [ 981.124321] GPR16: [ 981.124321] GPR20: [ 981.124321] GPR24: 00010ef61668 0008 00013aff5750 [ 981.124321] GPR28: 0008 00013afd02a0 00013aff5750 0008 [ 981.124356] NIP [200013f21594] 0x200013f21594 [ 981.124358] LR [200013e97bf4] 0x200013e97bf4 [ 981.124361] --- interrupt: c00 [ 981.124362] Code: 38427bd0 7c0802a6 6000 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7cbf2b78 38a0 7cdd3378 f8010010 f821ffc1 4bff95d1 6000 7c7e1b79 [ 981.124374] ---[ end trace ]--- Thanks and Regards On 1/31/24 16:18, Robin Murphy wrote: On 2024-01-31 9:19 am, Tasmiya Nalatwad wrote: Greetings, [mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after performing dlpar remove test --- Traces --- [58563.146236] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b83 [58563.146242] Faulting instruction address: 0xc09c0e60 [58563.146248] Oops: Kernel access of bad area, sig: 11 [#1] [58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries [58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic xor raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io rpaphp nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time 
dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc nvmet nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 scsi_transport_fc fuse [58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not tainted 6.8.0-rc1-auto-gecb1b8288dc7 #1 [58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries [58563.146337] NIP: c09c0e60 LR: c09c0e28 CTR: c09c1584 [58563.146342] REGS: c0007960f260 TRAP: 0380 Not tainted (6.8.0-rc1-auto-gecb1b8288dc7) [58563.146347] MSR: 80009033 CR: 24822424 XER: 20040006 [58563.146360] CFAR: c09c0e74 IRQMASK: 0 [58563.146360] GPR00: c09c0e28 c0007960f500 c1482600 c3050540 [58563.146360] GPR04: c0089a6870c0 0001 fffe [58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0220 [58563.146360] GPR12: 2000 c308 [58563.146360] GPR16: 0001 [58563.146360] GPR20: c1281478 c1281490 c2bfed80 [58563.146360] GPR24: c0089a6870c0 c2b9ffb8 [58563.146360] GPR28: c2bac0e8 [58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118 [58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118 This implies that iommu_device_list has become corrupted. Looks like spapr_tce_setup_phb_iommus_initcall() registers an iommu_device which pcibios_free_controller() could free if a PCI controller is removed, but there's no path anywhere to ever unregister any of those IOMMUs. Presumably this also means that if a PCI controller is dynamically added after init, its IOMMU won't be set up properly either. Thanks, Robin. [58563.146437] Call Trace: [58563.146439] [c0007960f500] [c0007960f560] 0xc0007960f560 (unreliable) [58563.146446] [c0007960f530] [c09c0fd0] __iommu_probe_device+0xc0/0x5c0 [58563.146454] [c0007960f5a0] [c09c151c] iommu_probe_device+0x4c/0xb4 [58563.146462] [c0007960f5e0] [c09c15d0] iommu_bus_notifier+0x4c/0x8c [58563.146469
Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test
On 2024-01-31 9:19 am, Tasmiya Nalatwad wrote: Greetings, [mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after performing dlpar remove test --- Traces --- [58563.146236] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b83 [58563.146242] Faulting instruction address: 0xc09c0e60 [58563.146248] Oops: Kernel access of bad area, sig: 11 [#1] [58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries [58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic xor raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io rpaphp nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc nvmet nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 scsi_transport_fc fuse [58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not tainted 6.8.0-rc1-auto-gecb1b8288dc7 #1 [58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries [58563.146337] NIP: c09c0e60 LR: c09c0e28 CTR: c09c1584 [58563.146342] REGS: c0007960f260 TRAP: 0380 Not tainted (6.8.0-rc1-auto-gecb1b8288dc7) [58563.146347] MSR: 80009033 CR: 24822424 XER: 20040006 [58563.146360] CFAR: c09c0e74 IRQMASK: 0 [58563.146360] GPR00: c09c0e28 c0007960f500 c1482600 c3050540 [58563.146360] GPR04: c0089a6870c0 0001 fffe [58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0220 [58563.146360] GPR12: 2000 c308 [58563.146360] GPR16: 0001 [58563.146360] GPR20: c1281478 c1281490 c2bfed80 [58563.146360] GPR24: c0089a6870c0 c2b9ffb8 [58563.146360] GPR28: c2bac0e8 [58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118 [58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118 This implies that iommu_device_list has become corrupted. 
Looks like spapr_tce_setup_phb_iommus_initcall() registers an iommu_device which pcibios_free_controller() could free if a PCI controller is removed, but there's no path anywhere to ever unregister any of those IOMMUs. Presumably this also means that if a PCI controller is dynamically added after init, its IOMMU won't be set up properly either. Thanks, Robin. [58563.146437] Call Trace: [58563.146439] [c0007960f500] [c0007960f560] 0xc0007960f560 (unreliable) [58563.146446] [c0007960f530] [c09c0fd0] __iommu_probe_device+0xc0/0x5c0 [58563.146454] [c0007960f5a0] [c09c151c] iommu_probe_device+0x4c/0xb4 [58563.146462] [c0007960f5e0] [c09c15d0] iommu_bus_notifier+0x4c/0x8c [58563.146469] [c0007960f600] [c019e3d0] notifier_call_chain+0xb8/0x1a0 [58563.146476] [c0007960f660] [c019eea0] blocking_notifier_call_chain+0x64/0x94 [58563.146483] [c0007960f6a0] [c09d3c5c] bus_notify+0x50/0x7c [58563.146491] [c0007960f6e0] [c09cfba4] device_add+0x774/0x9bc [58563.146498] [c0007960f7a0] [c08abe9c] pci_device_add+0x2f4/0x864 [58563.146506] [c0007960f850] [c007d5a0] of_create_pci_dev+0x390/0xa08 [58563.146514] [c0007960f930] [c007de68] __of_scan_bus+0x250/0x328 [58563.146520] [c0007960fa10] [c007a680] pcibios_scan_phb+0x274/0x3c0 [58563.146527] [c0007960fae0] [c0105d58] init_phb_dynamic+0xb8/0x110 [58563.146535] [c0007960fb50] [c008217b0380] dlpar_add_slot+0x170/0x3b4 [rpadlpar_io] [58563.146544] [c0007960fbf0] [c008217b0ca0] add_slot_store+0xa4/0x140 [rpadlpar_io] [58563.146551] [c0007960fc80] [c0f3dbec] kobj_attr_store+0x30/0x4c [58563.146559] [c0007960fca0] [c06931fc] sysfs_kf_write+0x68/0x7c [58563.146566] [c0007960fcc0] [c0691b2c] kernfs_fop_write_iter+0x1c8/0x278 [58563.146573] [c0007960fd10] [c0599f54] vfs_write+0x340/0x4cc [58563.146580] [c0007960fdc0] [c059a2bc] ksys_write+0x7c/0x140 [58563.146587] [c0007960fe10] [c0035d74] system_call_exception+0x134/0x330 [58563.146595] [c0007960fe50] [c000d6a0] system_call_common+0x160/0x2e4 [58563.146602] --- interrupt: c00 at 0x24470cb4
[58563.146606] NIP: 24470cb4 LR: 243e7d04 CTR: [58563.146611] REGS: c0007960fe80 TRAP: 0c00 Not tainted (6.8.0-rc1-auto-gecb1b8288dc7) [58563.146616] MSR: 8280f033 CR: 24000282 XER: [58563.146632] IRQMASK: 0 [58563.146632] GPR00:
Re: [PATCH 00/17] video: dw_hdmi: Support Vendor PHY
On 2023-12-15 7:13 am, Kever Yang wrote: Hi Jagan, On 2023/12/15 14:36, Jagan Teki wrote: Hi Heiko/Kerver/Anatoloj, On Mon, Dec 11, 2023 at 2:30 PM Jagan Teki wrote: Unlike RK3399, Sunxi/Meson DW HDMI, the new Rockchip SoC RK3328 supports an external vendor PHY with the DW HDMI chip. Support this vendor PHY by adding new platform PHY ops via the DW HDMI driver and call the respective generic PHY from platform driver code. This series is tested on RK3328 with 1080p (1920x1080) resolution. Patch 0001/0005: Support Vendor PHY Patch 0006/0008: VOP extension for win, dsp offsets Patch 0009/0010: RK3328 VOP, HDMI clocks Patch 0011: Rockchip Inno HDMI PHY Patch 0012: RK3328 HDMI driver Patch 0013: RK3328 VOP driver Patch 0014/0017: Enable HDMI Out for RK3328 Important: one potential issue is that Linux HDMI out on RK3328 is affected by this patchset, though I couldn't find any relation or clue. [ 0.752016] Loading compiled-in X.509 certificates [ 0.787796] inno_hdmi_phy_rk3328_clk_recalc_rate: parent 2400 [ 0.788391] inno-hdmi-phy ff43.phy: inno_hdmi_phy_rk3328_clk_recalc_rate rate 14850 vco 14850 [ 0.798353] rockchip-drm display-subsystem: bound ff37.vop (ops vop_component_ops) [ 0.799403] dwhdmi-rockchip ff3c.hdmi: supply avdd-0v9 not found, using dummy regulator [ 0.800288] rk_iommu ff373f00.iommu: Enable stall request timed out, status: 0x4b [ 0.801131] dwhdmi-rockchip ff3c.hdmi: supply avdd-1v8 not found, using dummy regulator [ 0.802056] rk_iommu ff373f00.iommu: Disable paging request timed out, status: 0x4b [ 0.803233] dwhdmi-rockchip ff3c.hdmi: Detected HDMI TX controller v2.11a with HDCP (inno_dw_hdmi_phy2) [ 0.805355] dwhdmi-rockchip ff3c.hdmi: registered DesignWare HDMI I2C bus driver [ 0.808769] rockchip-drm display-subsystem: bound ff3c.hdmi (ops dw_hdmi_rockchip_ops) [ 0.810869] [drm] Initialized rockchip 1.0.0 20140818 for display-subsystem on minor 0 The only way I can use Linux HDMI is by disabling the IOMMU or supporting a disable-iommu link for RK3328 via DT [1].
[1] https://www.spinics.net/lists/devicetree/msg605124.html Is anyone aware of this issue? I did post the patches for Linux IOMMU but it seems not a proper solution. Any suggestions? I'm not an expert in HDMI/VOP, so I can't provide a suitable solution in the kernel, but here is the reason why we need a patch to work around the issue in the kernel: - The VOP driver in U-Boot works in non-IOMMU mode, and the VOP accesses DDR by physical address; - The VOP driver in the kernel works with the IOMMU enabled (by default), and the VOP accesses DDR with virtual addresses (via the IOMMU); - The VOP keeps working before the kernel VOP driver is enabled, while the IOMMU driver will be enabled by the Linux PM framework; since the IOMMU is not correctly configured at this point, the VOP will access unknown space (the original physical address from U-Boot) converted by the IOMMU; So we need to disable the IOMMU temporarily during kernel startup before the VOP driver is enabled. If U-Boot isn't handing off an active framebuffer, then it should be U-Boot's responsibility to stop the VOP before it exits; if on the other hand it is, then it can now use the "iommu-addresses" DT property (see the reserved-memory schema) on the framebuffer region, and we should just need a bit of work in the IOMMU driver to ensure that is respected during the period between the IOMMU initialising and the Linux VOP driver subsequently taking over (i.e. so it won't get stuck on an unexpected page fault as seems to be happening above). The IOMMU aspect of that ought to be fairly straightforward; the trickier part might be the runtime PM aspect to ensure the IOMMU doesn't let itself go idle and actually turn anything off during that period. I also still think that doing the full rk_iommu_disable() upon runtime suspend is wrong, but that's more of a thing which confounds the underlying issue here, rather than being the problem in itself. Thanks, Robin.
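The "iommu-addresses" handoff Robin refers to might look something like the following in DT. This is only an illustrative sketch following the reserved-memory binding; the node names, phandles and addresses are made up, not taken from any real RK3328 device tree:

```dts
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	/* Framebuffer left scanning out by U-Boot (addresses hypothetical) */
	splash: framebuffer@60000000 {
		reg = <0x60000000 0x800000>;
		/* Ask the OS to keep an identity mapping of this range in
		   the VOP's IOMMU until its driver takes over the display */
		iommu-addresses = <&vop 0x60000000 0x800000>;
		no-map;
	};
};

&vop {
	memory-region = <&splash>;
};
```

With this in place, the IOMMU driver can install the identity mapping for the region when it initialises, so the still-active VOP keeps hitting valid translations instead of faulting during the window before the kernel VOP driver binds.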
Re: [PATCH v2] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 2023-12-10 6:06 pm, Rob Clark wrote: From: Rob Clark In some cases the firmware expects cbndx 1 to be assigned to the GMU, so we also want the default domain for the GMU to be an identity domain. This way it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access. Reviewed-by: Robin Murphy Cc: sta...@vger.kernel.org Signed-off-by: Rob Clark --- I didn't add a fixes tag because really this issue has been there all along, but either didn't matter with other firmware or we didn't notice the problem. drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 549ae4dba3a6..d326fa230b96 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = { { .compatible = "qcom,adreno" }, + { .compatible = "qcom,adreno-gmu" }, { .compatible = "qcom,mdp4" }, { .compatible = "qcom,mdss" }, { .compatible = "qcom,sc7180-mdss" },
Re: [PATCH] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 07/12/2023 9:24 pm, Rob Clark wrote: From: Rob Clark We also want the default domain for the GMU to be an identity domain, so it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and cbndx 2. I can't help but read this as implying that it gets attached to both *at the same time*, which would be indicative of a far more serious problem in the main driver and/or IOMMU core code. However, from what we discussed on IRC last night, it sounds like the key point here is more straightforwardly that firmware expects the GMU to be using context bank 1, in a vaguely similar fashion to how context bank 0 is special for the GPU. Clarifying that would help explain why we're just doing this as a trick to influence the allocator (i.e. unlike some of the other devices in this list we don't actually need the properties of the identity domain itself). In future it might be nice to reserve this explicitly on platforms which need it and extend qcom_adreno_smmu_alloc_context_bank() to handle the GMU as well, but I don't object to this patch as an immediate quick fix for now, especially as something nice and easy for stable (I'd agree with Johan in that regard). Thanks, Robin. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.
Signed-off-by: Rob Clark --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 549ae4dba3a6..d326fa230b96 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = { { .compatible = "qcom,adreno" }, + { .compatible = "qcom,adreno-gmu" }, { .compatible = "qcom,mdp4" }, { .compatible = "qcom,mdss" }, { .compatible = "qcom,sc7180-mdss" },
Re: [PATCH 1/3] iommu/msm-iommu: don't limit the driver too much
On 07/12/2023 12:54 pm, Dmitry Baryshkov wrote: In preparation of dropping most of ARCH_QCOM subtypes, stop limiting the driver just to those machines. Allow it to be built for any 32-bit Qualcomm platform (ARCH_QCOM). Acked-by: Robin Murphy Unless Joerg disagrees, I think it should be fine if you want to take this via the SoC tree. Thanks, Robin. Signed-off-by: Dmitry Baryshkov --- drivers/iommu/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 7673bb82945b..fd67f586f010 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -178,7 +178,7 @@ config FSL_PAMU config MSM_IOMMU bool "MSM IOMMU Support" depends on ARM - depends on ARCH_MSM8X60 || ARCH_MSM8960 || COMPILE_TEST + depends on ARCH_QCOM || COMPILE_TEST select IOMMU_API select IOMMU_IO_PGTABLE_ARMV7S help
Re: [PATCH 10/10] ACPI: IORT: Allow COMPILE_TEST of IORT
On 29/11/2023 12:48 am, Jason Gunthorpe wrote: The arm-smmu driver can COMPILE_TEST on x86, so expand this to also enable the IORT code so it can be COMPILE_TEST'd too. Signed-off-by: Jason Gunthorpe --- drivers/acpi/Kconfig| 2 -- drivers/acpi/Makefile | 2 +- drivers/acpi/arm64/Kconfig | 1 + drivers/acpi/arm64/Makefile | 2 +- drivers/iommu/Kconfig | 1 + 5 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index f819e760ff195a..3b7f77b227d13a 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -541,9 +541,7 @@ config ACPI_PFRUT To compile the drivers as modules, choose M here: the modules will be called pfr_update and pfr_telemetry. -if ARM64 source "drivers/acpi/arm64/Kconfig" -endif config ACPI_PPTT bool diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index eaa09bf52f1760..4e77ae37b80726 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -127,7 +127,7 @@ obj-y += pmic/ video-objs+= acpi_video.o video_detect.o obj-y += dptf/ -obj-$(CONFIG_ARM64) += arm64/ +obj-y += arm64/ obj-$(CONFIG_ACPI_VIOT) += viot.o diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig index b3ed6212244c1e..537d49d8ace69e 100644 --- a/drivers/acpi/arm64/Kconfig +++ b/drivers/acpi/arm64/Kconfig @@ -11,6 +11,7 @@ config ACPI_GTDT config ACPI_AGDI bool "Arm Generic Diagnostic Dump and Reset Device Interface" + depends on ARM64 depends on ARM_SDE_INTERFACE help Arm Generic Diagnostic Dump and Reset Device Interface (AGDI) is diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile index 143debc1ba4a9d..71d0e635599390 100644 --- a/drivers/acpi/arm64/Makefile +++ b/drivers/acpi/arm64/Makefile @@ -4,4 +4,4 @@ obj-$(CONFIG_ACPI_IORT) += iort.o obj-$(CONFIG_ACPI_GTDT) += gtdt.o obj-$(CONFIG_ACPI_APMT) += apmt.o obj-$(CONFIG_ARM_AMBA)+= amba.o -obj-y += dma.o init.o +obj-$(CONFIG_ARM64)+= dma.o init.o diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 
7673bb82945b6c..309378e76a9bc9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -318,6 +318,7 @@ config ARM_SMMU select IOMMU_API select IOMMU_IO_PGTABLE_LPAE select ARM_DMA_USE_IOMMU if ARM + select ACPI_IORT if ACPI This is incomplete. If you want the driver to be responsible for enabling its own probing mechanisms then you need to select OF and ACPI too. And all the other drivers which probe from IORT should surely also select ACPI_IORT, and thus ACPI as well. And maybe the PCI core should as well because there are general properties of PCI host bridges and devices described in there? But of course that's clearly backwards nonsense, because drivers do not and should not do that, so this change is not appropriate either. The IORT code may not be *functionally* arm64-specific, but logically it very much is - it serves a specification which is tied to the Arm architecture and describes Arm-architecture-specific concepts, within the wider context of ACPI on Arm itself only supporting AArch64, and not AArch32. It's also not like it's driver code that someone might use as an example and copy to a similar driver which could then run on different architectures where a latent theoretical bug becomes real. There's really no practical value to be had from compile-testing IORT. Thanks, Robin.
Re: [PATCH 06/10] iommu: Replace iommu_device_lock with iommu_probe_device_lock
On 29/11/2023 12:48 am, Jason Gunthorpe wrote: The iommu_device_lock protects the iommu_device_list which is only read by iommu_ops_from_fwnode(). This is now always called under the iommu_probe_device_lock, so we don't need to double lock the linked list. Use the iommu_probe_device_lock on the write side too. Please no, iommu_probe_device_lock() is a hack and we need to remove the *reason* it exists at all. And IMO just because iommu_present() is deprecated doesn't justify making it look utterly nonsensical - in no way does that have any relationship with probe_device, much less need to serialise against it! Thanks, Robin. Signed-off-by: Jason Gunthorpe --- drivers/iommu/iommu.c | 30 +- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 08f29a1dfcd5f8..9557c2ec08d915 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -146,7 +146,6 @@ struct iommu_group_attribute iommu_group_attr_##_name = \ container_of(_kobj, struct iommu_group, kobj) static LIST_HEAD(iommu_device_list); -static DEFINE_SPINLOCK(iommu_device_lock); static const struct bus_type * const iommu_buses[] = { &platform_bus_type, @@ -262,9 +261,9 @@ int iommu_device_register(struct iommu_device *iommu, if (hwdev) iommu->fwnode = dev_fwnode(hwdev); - spin_lock(&iommu_device_lock); + mutex_lock(&iommu_probe_device_lock); list_add_tail(&iommu->list, &iommu_device_list); - spin_unlock(&iommu_device_lock); + mutex_unlock(&iommu_probe_device_lock); for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++) err = bus_iommu_probe(iommu_buses[i]); @@ -279,9 +278,9 @@ void iommu_device_unregister(struct iommu_device *iommu) for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) bus_for_each_dev(iommu_buses[i], NULL, iommu, remove_iommu_group); - spin_lock(&iommu_device_lock); + mutex_lock(&iommu_probe_device_lock); list_del(&iommu->list); - spin_unlock(&iommu_device_lock); + mutex_unlock(&iommu_probe_device_lock); /* Pairs with the alloc in generic_single_device_group() */ iommu_group_put(iommu->singleton_group); @@
-316,9 +315,9 @@ int iommu_device_register_bus(struct iommu_device *iommu, if (err) return err; - spin_lock(&iommu_device_lock); + mutex_lock(&iommu_probe_device_lock); list_add_tail(&iommu->list, &iommu_device_list); - spin_unlock(&iommu_device_lock); + mutex_unlock(&iommu_probe_device_lock); err = bus_iommu_probe(bus); if (err) { @@ -2033,9 +2032,9 @@ bool iommu_present(const struct bus_type *bus) for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) { if (iommu_buses[i] == bus) { - spin_lock(&iommu_device_lock); + mutex_lock(&iommu_probe_device_lock); ret = !list_empty(&iommu_device_list); - spin_unlock(&iommu_device_lock); + mutex_unlock(&iommu_probe_device_lock); } } return ret; @@ -2980,17 +2979,14 @@ EXPORT_SYMBOL_GPL(iommu_default_passthrough); const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) { - const struct iommu_ops *ops = NULL; struct iommu_device *iommu; - spin_lock(&iommu_device_lock); + lockdep_assert_held(&iommu_probe_device_lock); + list_for_each_entry(iommu, &iommu_device_list, list) - if (iommu->fwnode == fwnode) { - ops = iommu->ops; - break; - } - spin_unlock(&iommu_device_lock); - return ops; + if (iommu->fwnode == fwnode) + return iommu->ops; + return NULL; } int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
On 28/11/2023 11:50 pm, Jason Gunthorpe wrote: On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote: On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy wrote: On 2023-11-28 8:49 pm, Pasha Tatashin wrote: Convert iommu/fsl_pamu.c to use the new page allocation functions provided in iommu-pages.h. Again, this is not a pagetable. This thing doesn't even *have* pagetables. Similar to patches #1 and #2 where you're lumping in configuration tables which belong to the IOMMU driver itself, as opposed to pagetables which effectively belong to an IOMMU domain's user. But then there are still drivers where you're *not* accounting similar configuration structures, so I really struggle to see how this metric is useful when it's so completely inconsistent in what it's counting :/ The whole IOMMU subsystem allocates a significant amount of kernel locked memory that we want to at least observe. The new field in vmstat does just that: it reports ALL buddy allocator memory that IOMMU allocates. However, for accounting purposes, I agree, we need to do better, and separate at least iommu pagetables from the rest. We can separate the metric into two: iommu pagetable only; iommu everything - or into three: iommu pagetable only; iommu dma; iommu everything. What do you think? I think I said this at LPC - if you want to have fine grained accounting of memory by owner you need to go talk to the cgroup people and come up with something generic. Adding ever more open-coded finer category breakdowns just for iommu doesn't make a lot of sense. You can make some argument that the pagetable memory should be counted because kvm counts its shadow memory, but I wouldn't go into further detail than that with hand-coded counters. Right, pagetable memory is interesting since it's something that any random kernel user can indirectly allocate via iommu_domain_alloc() and iommu_map(), and some of those users may even be doing so on behalf of userspace.
I have no objection to accounting and potentially applying limits to *that*. Beyond that, though, there is nothing special about "the IOMMU subsystem". The amount of memory an IOMMU driver needs to allocate for itself in order to function is not of interest beyond curiosity, it just is what it is; limiting it would only break the IOMMU, and if a user thinks it's "too much", the only actionable thing that might help is to physically remove devices from the system. Similar for DMA buffers; it might be intriguing to account those, but it's not really an actionable metric - in the overwhelming majority of cases you can't simply tell a driver to allocate less than what it needs. And that is of course assuming we were to account *all* DMA buffers, since whether they happen to have an IOMMU translation or not is irrelevant (we'd have already accounted the pagetables as pagetables if so). I bet "the networking subsystem" also consumes significant memory on the same kind of big systems where IOMMU pagetables would be of any concern. I believe some of the "serious" NICs can easily run up hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc. - would you propose accounting those too? Thanks, Robin.
Re: [PATCH 06/16] iommu/dma: use page allocation function provided by iommu-pages.h
On 2023-11-28 10:50 pm, Pasha Tatashin wrote: On Tue, Nov 28, 2023 at 5:34 PM Robin Murphy wrote: On 2023-11-28 8:49 pm, Pasha Tatashin wrote: Convert iommu/dma-iommu.c to use the new page allocation functions provided in iommu-pages.h. These have nothing to do with IOMMU pagetables, they are DMA buffers and they belong to whoever called the corresponding dma_alloc_* function. Hi Robin, This is true; however, we want to account and observe the pages allocated by the IOMMU subsystem for DMA buffers, as they are essentially unmovable locked pages. Should we separate IOMMU memory from KVM memory altogether and add another field to /proc/meminfo, something like "iommu -> iommu pagetable and dma memory", or do we want to export DMA memory separately from IOMMU page tables? These are not allocated by "the IOMMU subsystem", they are allocated by the DMA API. Even if you want to claim that a driver pinning memory via iommu_dma_ops is somehow different from the same driver pinning the same amount of memory via dma-direct when iommu.passthrough=1, it's still nonsense because you're failing to account the pages which iommu_dma_ops gets from CMA, dma_common_alloc_pages(), dynamic SWIOTLB, the various pools, and so on. Thanks, Robin. Since I included DMA memory, I specifically removed mention of IOMMU page tables in most places, and only report it as IOMMU memory. However, since it is still bundled together with SecPageTables it can be confusing. Pasha
[PATCH v3] drm/mediatek: Stop using iommu_present()
Remove the pointless check. If an IOMMU is providing transparent DMA API ops for any device(s) we care about, the DT code will have enforced the appropriate probe ordering already. And if the IOMMU *is* entirely absent, then attempting to go ahead with CMA and either succeeding or failing decisively seems more useful than deferring forever.

Signed-off-by: Robin Murphy

---
I realised that last time I sent this I probably should have CCed a wider audience of reviewers, so here's one with an updated commit message as well to make the resend more worthwhile.

 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 2dfaa613276a..48581da51857 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -5,7 +5,6 @@
  */
 #include <linux/component.h>
-#include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
@@ -608,9 +607,6 @@ static int mtk_drm_bind(struct device *dev)
 	struct drm_device *drm;
 	int ret, i;
 
-	if (!iommu_present(&platform_bus_type))
-		return -EPROBE_DEFER;
-
 	pdev = of_find_device_by_node(private->mutex_node);
 	if (!pdev) {
 		dev_err(dev, "Waiting for disp-mutex device %pOF\n",
-- 
2.39.2.101.g768bb238c484.dirty
Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
On 2023-11-16 4:17 am, Jason Gunthorpe wrote: On Wed, Nov 15, 2023 at 08:23:54PM +, Robin Murphy wrote: On 2023-11-15 3:36 pm, Jason Gunthorpe wrote: On Wed, Nov 15, 2023 at 03:22:09PM +, Robin Murphy wrote: On 2023-11-15 2:05 pm, Jason Gunthorpe wrote: [Several people have tested this now, so it is something that should sit in linux-next for a while] What's the aim here? This is obviously far, far too much for a stable fix, To fix the locking bug and ugly abuse of dev->iommu? Fixing the locking can be achieved by fixing the locking, as I have now demonstrated. Obviously. I rejected that right away because of how incredibly wrongly layered and hacky it is to do something like that. What, and dressing up the fundamental layering violation by baking it even further into the API flow, while still not actually fixing it or any of its *other* symptoms, is somehow better? Ultimately, this series is still basically doing the same thing my patch does - extending the scope of the existing iommu_probe_device_lock hack to cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple and obvious and look like one, and being easy to remove again is an obvious bonus too. I haven't seen patches or an outline on what you have in mind though? In my view I would like to get rid of of_xlate(), at a minimum. It is a micro-optimization I don't think we need. I see a pretty straightforward path to get there from here. Micro-optimisation!? OK, I think I have to say it. Please stop trying to rewrite code you don't understand. I understand it fine. The list of (fwnode_handle, of_phandle_args) tuples doesn't change between when of_xlate is called and when probe is called. Probe can have the same list. As best I can tell the extra ops avoids maybe some memory allocation, maybe an extra iteration. What it does do is screw up a lot of the drivers that seem to want to allocate the per-device data in of_xlate, and make them convoluted and prone to memory leaks on error paths. 
So, I would move toward having the driver's probe invoke a helper like: iommu_of_xlate(dev, fwspec, _fwnode_function, ); Which generates the same list of (fwnode_handle, of_phandle_args) that was passed to of_xlate today, but is ordered sensibly within the sequence of probe for what many drivers seem to want to do. Grep for of_xlate. It is a standard and well-understood callback pattern for a subsystem to parse a common DT binding and pass a driver-specific specifier to a driver to interpret. Or maybe you just have a peculiar definition of what you think "micro-optimisation" means? :/ So, it is not so much that the idea of of_xlate goes away, but the specific op->of_xlate does; it gets shifted into a helper that invokes the same function in a more logical spot. I'm curious how you imagine an IOMMU driver's ->probe function could be called *before* parsing the firmware to work out what, if any, IOMMU, and thus driver, a device is associated with. Unless you think we should have the horrible perf model of passing the device to *every* registered ->probe callback in turn until someone claims it. And then every driver has to have identical boilerplate to go off and parse the generic "iommus" binding... which is the whole damn reason for *not* going down that route and instead using an of_xlate mechanism in the first place. The per-device data can be allocated at the top of probe and passed through args to fix the lifetime bugs. It is pretty simple to do. I believe the kids these days would say "Say you don't understand the code without saying you don't understand the code." Most of this series constitutes a giant sweeping redesign of a whole bunch of internal machinery to permit it to be used concurrently, where that concurrency should still not exist in the first place because the thing that allows it to happen also causes other problems like groups being broken. 
Once the real problem is fixed there will be no need for any of this, and at worst some of it will then actually get in the way. Not quite. This decouples two unrelated things into separate concerns. It is not so much about the concurrency but removing the abuse of dev->iommu by code that has no need to touch it at all. Sorry, the "abuse" of storing IOMMU-API-specific data in the place we intentionally created to consolidate all the IOMMU-API-specific data into? Yes, there is an issue with the circumstances in which this data is sometimes accessed, but as I'm starting to tire of repeating, that issue fundamentally dates back to 2017, and the implications were unfortunately overlooked when dev->iommu was later introduced and fwspec moved into it (since the non-DT probing paths still worked as originally designed). Pretending that dev->iommu is the issue here is missing the point. Decoupling makes moving code around easier since the relationships
Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
On 2023-11-15 3:36 pm, Jason Gunthorpe wrote: On Wed, Nov 15, 2023 at 03:22:09PM +, Robin Murphy wrote: On 2023-11-15 2:05 pm, Jason Gunthorpe wrote: [Several people have tested this now, so it is something that should sit in linux-next for a while] What's the aim here? This is obviously far, far too much for a stable fix, To fix the locking bug and ugly abuse of dev->iommu? Fixing the locking can be achieved by fixing the locking, as I have now demonstrated. I wouldn't say that, it is up to the people who care about this to decide. It seems a lot of people are hitting it so maybe it should be backported in some situations. Regardless, we should not continue to have this locking bug in v6.8. but then it's also not the refactoring we want for the future either, since it's moving in the wrong direction of cementing the fundamental brokenness further in place rather than getting any closer to removing it. I haven't seen patches or an outline on what you have in mind though? In my view I would like to get rid of of_xlate(), at a minimum. It is a micro-optimization I don't think we need. I see a pretty straightforward path to get there from here. Micro-optimisation!? OK, I think I have to say it. Please stop trying to rewrite code you don't understand. Do you also want to get rid of iommu_fwspec, or at least thin it out? That seems reasonable too, I think that becomes within reach once of_xlate is gone. What do you see as "cementing"? Most of this series constitutes a giant sweeping redesign of a whole bunch of internal machinery to permit it to be used concurrently, where that concurrency should still not exist in the first place because the thing that allows it to happen also causes other problems like groups being broken. Once the real problem is fixed there will be no need for any of this, and at worst some of it will then actually get in the way. 
I feel like I've explained it many times already, but what needs to happen is for the firmware parsing and of_xlate stage to be initiated by __iommu_probe_device() itself. The first step is my bus ops series (if I'm ever allowed to get it landed...) which gets to the state of expecting to start from a fwspec. Then it's a case of shuffling around what's currently in the bus_type dma_configure methods such that that point is where the fwspec is created as well, and the driver-probe-time work is almost entirely removed, except for still deferring if a device is waiting for its IOMMU instance (since that instance turning up and registering will retrigger the rest itself). And there at last, a trivial lifecycle and access pattern for dev->iommu (with the overlapping bits of iommu_fwspec finally able to be squashed as well), and finally an end to 8 long and unfortunate years of calling things in the wrong order in ways they were never supposed to be. Thanks, Robin.
Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
On 2023-11-15 2:05 pm, Jason Gunthorpe wrote: [Several people have tested this now, so it is something that should sit in linux-next for a while] What's the aim here? This is obviously far, far too much for a stable fix, but then it's also not the refactoring we want for the future either, since it's moving in the wrong direction of cementing the fundamental brokenness further in place rather than getting any closer to removing it. Thanks, Robin. The iommu subsystem uses dev->iommu to store bits of information about the attached iommu driver. This has been co-opted by the ACPI/OF code to also be a place to pass around the iommu_fwspec before a driver is probed. Since both are using the same pointers without any locking it triggers races if there is concurrent driver loading:

       CPU0                              CPU1
  of_iommu_configure()              iommu_device_register()
                                      ..
                                      bus_iommu_probe()
    iommu_fwspec_of_xlate()             __iommu_probe_device()
                                          iommu_init_device()
                                            dev_iommu_get()
    ..                                      ops->probe fails, no fwspec
    ..                                      dev_iommu_free()
    dev->iommu->fwspec
      *crash*

My first attempt to get correct locking here was to use the device_lock to protect the entire *_iommu_configure() and iommu_probe() paths. This allowed safe use of dev->iommu within those paths. Unfortunately enough drivers abuse the of_iommu_configure() flow without proper locking and this approach failed. This approach removes touches of dev->iommu from the *_iommu_configure() code. The few remaining required touches are moved into iommu.c and protected with the existing iommu_probe_device_lock. To do this we change *_iommu_configure() to hold the iommu_fwspec on the stack while it is being built. Once it is fully formed the core code will install it into dev->iommu when it calls probe. This also removes all the touches of iommu_ops from the *_iommu_configure() paths and makes that mechanism private to the iommu core. A few more lockdep assertions are added to discourage future mis-use. 
This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec

v2:
 - Fix all the kconfig randomization 0-day stuff
 - Add missing kdoc parameters
 - Remove NO_IOMMU, replace it with ENODEV
 - Use PTR_ERR to print errno in the new/moved logging
v1: https://lore.kernel.org/r/0-v1-5f734af130a3+34f-iommu_fwspec_...@nvidia.com

Jason Gunthorpe (17):
  iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
  iommmu/of: Do not return struct iommu_ops from of_iommu_configure()
  iommu/of: Use -ENODEV consistently in of_iommu_configure()
  acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  iommu: Make iommu_fwspec->ids a distinct allocation
  iommu: Add iommu_fwspec_alloc/dealloc()
  iommu: Add iommu_probe_device_fwspec()
  iommu/of: Do not use dev->iommu within of_iommu_configure()
  iommu: Add iommu_fwspec_append_ids()
  acpi: Do not use dev->iommu within acpi_iommu_configure()
  iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
  iommu: Make iommu_ops_from_fwnode() static
  iommu: Remove dev_iommu_fwspec_set()
  iommu: Remove pointless iommu_fwspec_free()
  iommu: Add ops->of_xlate_fwspec()
  iommu: Mark dev_iommu_get() with lockdep
  iommu: Mark dev_iommu_priv_set() with a lockdep

 arch/arc/mm/dma.c                           |   2 +-
 arch/arm/mm/dma-mapping-nommu.c             |   2 +-
 arch/arm/mm/dma-mapping.c                   |  10 +-
 arch/arm64/mm/dma-mapping.c                 |   4 +-
 arch/mips/mm/dma-noncoherent.c              |   2 +-
 arch/riscv/mm/dma-noncoherent.c             |   2 +-
 drivers/acpi/arm64/iort.c                   |  42 ++--
 drivers/acpi/scan.c                         | 104 +
 drivers/acpi/viot.c                         |  45 ++--
 drivers/hv/hv_common.c                      |   2 +-
 drivers/iommu/amd/iommu.c                   |   2 -
 drivers/iommu/apple-dart.c                  |   1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  23 +-
 drivers/iommu/intel/iommu.c                 |   2 -
 drivers/iommu/iommu.c                       | 227 +++-
 drivers/iommu/of_iommu.c                    | 133 +---
 drivers/iommu/omap-iommu.c                  |   1 -
 drivers/iommu/tegra-smmu.c                  |   1 -
 drivers/iommu/virtio-iommu.c                |   8 +-
 drivers/of/device.c                         |  24 ++-
 include/acpi/acpi_bus.h                     |   8 +-
 include/linux/acpi_iort.h                   |   8 +-
 include/linux/acpi_viot.h                   |   5 +-
 include/linux/dma-map-ops.h                 |   4 +-
 include/linux/iommu.h                       |  47 ++--
 include/linux/of_iommu.h                    |  13 +-
 27 files
Re: [PATCH] arm/mm: add option to prefer IOMMU ops for DMA on Xen
On 11/11/2023 6:45 pm, Chuck Zmudzinski wrote: Enabling the new option, ARM_DMA_USE_IOMMU_XEN, fixes this error when attaching the Exynos mixer in Linux dom0 on Xen on the Chromebook Snow (and probably on other devices that use the Exynos mixer): [drm] Exynos DRM: using 1440.fimd device for DMA mapping operations exynos-drm exynos-drm: bound 1440.fimd (ops 0xc0d96354) exynos-mixer 1445.mixer: [drm:exynos_drm_register_dma] *ERROR* Device 1445.mixer lacks support for IOMMU exynos-drm exynos-drm: failed to bind 1445.mixer (ops 0xc0d97554): -22 exynos-drm exynos-drm: adev bind failed: -22 exynos-dp: probe of 145b.dp-controller failed with error -22 Linux normally uses xen_swiotlb_dma_ops for DMA for all devices when xen_swiotlb is detected even when Xen exposes an IOMMU to Linux. Enabling the new config option allows devices such as the Exynos mixer to use the IOMMU instead of xen_swiotlb_dma_ops for DMA and this fixes the error. The new config option is not set by default because it is likely some devices that use IOMMU for DMA on Xen will cause DMA errors and memory corruption when Xen PV block and network drivers are in use on the system. Link: https://lore.kernel.org/xen-devel/acfab1c5-eed1-4930-8c70-8681e256c...@netscape.net/ Signed-off-by: Chuck Zmudzinski --- The reported error with the Exynos mixer is not fixed by default by adding a second patch to select the new option in the Kconfig definition for the Exynos mixer if EXYNOS_IOMMU and SWIOTLB_XEN are enabled because it is not certain setting the config option is suitable for all cases. So it is necessary to explicitly select the new config option during the config stage of the Linux kernel build to fix the reported error or similar errors that have the same cause of lack of support for IOMMU on Xen. This is necessary to avoid any regressions that might be caused by enabling the new option by default for the Exynos mixer. 
 arch/arm/mm/dma-mapping.c |  6 ++++++
 drivers/xen/Kconfig       | 16 ++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 5409225b4abc..ca04fdf01be3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1779,6 +1779,12 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (iommu)
 		arm_setup_iommu_dma_ops(dev, dma_base, size, iommu, coherent);
+#ifdef CONFIG_ARM_DMA_USE_IOMMU_XEN

FWIW I don't think this really needs a config option - if Xen *has* made an IOMMU available, then there isn't really much reason not to use it, and if for some reason someone really didn't want to then they could simply disable the IOMMU driver anyway.

+	if (dev->dma_ops == &iommu_ops) {
+		dev->archdata.dma_ops_setup = true;

The existing assignment is effectively unconditional by this point anyway, so could probably just be moved earlier to save duplicating it (or perhaps just make the xen_setup_dma_ops() call conditional instead to save the early return as well). However, are the IOMMU DMA ops really compatible with Xen? The comments about hypercalls and foreign memory in xen_arch_need_swiotlb() leave me concerned that assuming non-coherent DMA to any old Dom0 page is OK might not actually work in general :/ Thanks, Robin.

+		return;
+	}
+#endif
 	xen_setup_dma_ops(dev);
 	dev->archdata.dma_ops_setup = true;
 }

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index d5989871dd5d..44e1334b6acd 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -181,6 +181,22 @@ config SWIOTLB_XEN
 	select DMA_OPS
 	select SWIOTLB
+config ARM_DMA_USE_IOMMU_XEN
+	bool "Prefer IOMMU DMA ops on Xen"
+	depends on SWIOTLB_XEN
+	depends on ARM_DMA_USE_IOMMU
+	help
+	  Normally on Xen, the IOMMU is used by Xen and not exposed to
+	  Linux. Some Arm systems such as Exynos have an IOMMU that
+	  Xen does not use so the IOMMU is exposed to Linux in those
+	  cases. 
+	  This option enables Linux to use the IOMMU instead of
+	  using the Xen swiotlb_dma_ops for DMA on Xen.
+
+	  Say N here unless support for one or more devices that use
+	  IOMMU ops instead of Xen swiotlb ops for DMA is needed and the
+	  devices that use the IOMMU do not cause any problems on the
+	  Xen system in use.
+
 config XEN_PCI_STUB
 	bool
Re: [PATCH v2 6/8] dt-bindings: reserved-memory: Add secure CMA reserved memory range
On 13/11/2023 6:37 am, Yong Wu (吴勇) wrote: [...] +properties: + compatible: +const: secure_cma_region Still wrong compatible. Look at other bindings - none of them uses an underscore. Look at other reserved memory bindings especially. Also, CMA is a Linux thingy, so either not suitable for bindings at all, or you need a Linux specific compatible. I don't quite get why you even put CMA there - adding Linux specific stuff will get obvious pushback... Thanks. I will change to: secure-region. Is this ok? No, the previous discussion went off in entirely the wrong direction. To reiterate, the point of the binding is not to describe the expected usage of the thing nor the general concept of the thing, but to describe the actual thing itself. There are any number of different ways software may interact with a "secure region", so that is meaningless as a compatible. It needs to describe *this* secure memory interface offered by *this* TEE, so that software knows that to use it requires making those particular SiP calls with that particular UUID etc. Thanks, Robin.
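To make the point concrete, a reserved-memory node along these lines names the specific firmware interface rather than the generic concept. This fragment is purely hypothetical - the compatible string and addresses are made up for illustration, not taken from any real binding:

```dts
/* Hypothetical example: the compatible identifies the concrete TEE
 * interface (vendor, firmware, protocol), not "a secure region". */
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	secure_pool: pool@80000000 {
		compatible = "vendor,example-tee-shm-pool";	/* made-up name */
		reg = <0 0x80000000 0 0x4000000>;
		no-map;
	};
};
```

Software matching on `"vendor,example-tee-shm-pool"` then knows exactly which SiP calls and UUID the region requires, which is what a generic `secure-region` compatible could never convey.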
Re: [PATCH v7 08/10] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform
On 2023-11-03 5:02 pm, Duje Mihanović wrote: On Friday, November 3, 2023 4:34:54 PM CET Robin Murphy wrote: On 2023-11-02 3:20 pm, Duje Mihanović wrote: +config ARCH_MMP + bool "Marvell MMP SoC Family" + select ARM_GIC + select ARM_ARCH_TIMER + select ARM_SMMU NAK, not only is selecting user-visible symbols generally frowned upon, and ignoring their dependencies even worse, but for a multiplatform kernel the user may well want this to be a module. If having the SMMU driver built-in is somehow fundamentally required for this platform to boot, that would represent much bigger problems. The SoC can boot without SMMU and PDMA, but not GIC, pinctrl or the arch timer. I see that most other SoCs still select drivers and frameworks they presumably need for booting, with the exceptions of ARCH_BITMAIN, ARCH_LG1K and a couple others. Which of these two options should I go for? Well, you don't really need to select ARM_GIC or ARM_ARCH_TIMER here either, since those are already selected by ARM64 itself. Keeping PINCTRL_SINGLE is fair, although you should also select PINCTRL as its dependency. As an additional nit, the file seems to be primarily ordered by symbol name, so it might be nice to slip ARCH_MMP in between ARCH_MESON and ARCH_MVEBU. Cheers, Robin.
Re: [PATCH v7 08/10] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform
On 2023-11-02 3:20 pm, Duje Mihanović wrote: Add ARCH_MMP configuration option for Marvell PXA1908 SoC. Signed-off-by: Duje Mihanović --- arch/arm64/Kconfig.platforms | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms index 6069120199bb..b417cae42c84 100644 --- a/arch/arm64/Kconfig.platforms +++ b/arch/arm64/Kconfig.platforms @@ -89,6 +89,17 @@ config ARCH_BERLIN help This enables support for Marvell Berlin SoC Family +config ARCH_MMP + bool "Marvell MMP SoC Family" + select ARM_GIC + select ARM_ARCH_TIMER + select ARM_SMMU NAK, not only is selecting user-visible symbols generally frowned upon, and ignoring their dependencies even worse, but for a multiplatform kernel the user may well want this to be a module. If having the SMMU driver built-in is somehow fundamentally required for this platform to boot, that would represent much bigger problems. Thanks, Robin. + select MMP_PDMA + select PINCTRL_SINGLE + help + This enables support for Marvell MMP SoC family, currently + supporting PXA1908 aka IAP140. + config ARCH_BITMAIN bool "Bitmain SoC Platforms" help
Re: [PATCH v7 06/10] ASoC: pxa: Suppress SSPA on ARM64
On 2023-11-02 3:26 pm, Mark Brown wrote: On Thu, Nov 02, 2023 at 04:20:29PM +0100, Duje Mihanović wrote: The SSPA driver currently seems to generate ARM32 assembly, which causes build errors when building a kernel for an ARM64 ARCH_MMP platform. Fixes: fa375d42f0e5 ("ASoC: mmp: add sspa support") Reported-by: kernel test robot tristate "SoC Audio via MMP SSPA ports" - depends on ARCH_MMP + depends on ARCH_MMP && ARM This isn't a fix for the existing code, AFAICT the issue here is that ARCH_MMP is currently only available for arm and presumably something in the rest of your series makes it available for arm64. This would be a prerequisite for that patch. Please don't just insert random fixes tags just because you can. FWIW it doesn't even seem to be the right reason either. AFAICT the issue being introduced is that SND_MMP_SOC_SSPA selects SND_ARM which depends on ARM, but after patch #8 ARCH_MMP itself will no longer necessarily imply ARM. The fact that selecting SND_ARM with unmet dependencies also allows SND_ARMAACI to be enabled (which appears to be the only thing actually containing open-coded Arm asm) is tangential. Robin.
Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 29/09/2023 4:45 pm, Will Deacon wrote: On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote: On 2023-04-10 19:52, Dmitry Baryshkov wrote: If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform). Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving then anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is? I think it was mainly because the quirk doesn't make sense for a coherent page-table walker and we could in theory use that bit for something else in that case. Yuck, even if we did want some horrible notion of quirks being conditional on parts of the config rather than just the format, then the users would need to be testing for the same condition as the pagetable code itself (i.e. cfg->coherent_walk), rather than hoping some other property of something else indirectly reflects the right information - e.g. there'd be no hope of backporting this particular bodge before 5.19 where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned true, and in future we could conceivably support coherent SMMUs being configured for non-coherent walks on a per-domain basis. 
Furthermore, if we did overload a flag to have multiple meanings, then we'd have no way of knowing which one the caller was actually expecting, thus the illusion of being able to validate calls in the meantime isn't necessarily as helpful as it seems, particularly in a case where the "wrong" interpretation would be to have no effect anyway. Mostly though I'd hope that if we ever got anywhere near the point of running out of quirk bits we'd have already realised that it's time for a better interface :( Based on that, I think that when I do get round to needing to touch this code, I'll propose just streamlining the whole quirk. Cheers, Robin.
Re: [PATCH 6/8] iommu/dart: Move the blocked domain support to a global static
On 2023-09-26 20:05, Janne Grunau wrote: Hej, On Fri, Sep 22, 2023 at 02:07:57PM -0300, Jason Gunthorpe wrote: Move to the new static global for blocked domains. Move the blocked specific code to apple_dart_attach_dev_blocked(). Signed-off-by: Jason Gunthorpe --- drivers/iommu/apple-dart.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index 424f779ccc34df..376f4c5461e8f7 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -675,10 +675,6 @@ static int apple_dart_attach_dev(struct iommu_domain *domain, for_each_stream_map(i, cfg, stream_map) apple_dart_setup_translation(dart_domain, stream_map); break; - case IOMMU_DOMAIN_BLOCKED: - for_each_stream_map(i, cfg, stream_map) - apple_dart_hw_disable_dma(stream_map); - break; default: return -EINVAL; } @@ -710,6 +706,30 @@ static struct iommu_domain apple_dart_identity_domain = { .ops = &apple_dart_identity_ops, }; +static int apple_dart_attach_dev_blocked(struct iommu_domain *domain, +struct device *dev) +{ + struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); + struct apple_dart_stream_map *stream_map; + int i; + + if (cfg->stream_maps[0].dart->force_bypass) + return -EINVAL; unrelated to this change as this keeps the current behavior but I think force_bypass should not override IOMMU_DOMAIN_BLOCKED. It is set if the CPU page size is smaller than dart's page size. Obviously dart can't translate in that situation but it should be still possible to block it completely. How do we manage this? I can write a patch either to the current state or based on this series. The series is queued already, so best to send a patch based on iommu/core (I guess just removing these lines?). 
It won't be super-useful in practice since the blocking domain is normally only used to transition to an unmanaged domain which in the force_bypass situation can't be used anyway, but it's still nice on principle not to have unnecessary reasons for attach to fail. Thanks, Robin. + + for_each_stream_map(i, cfg, stream_map) + apple_dart_hw_disable_dma(stream_map); + return 0; +} + +static const struct iommu_domain_ops apple_dart_blocked_ops = { + .attach_dev = apple_dart_attach_dev_blocked, +}; + +static struct iommu_domain apple_dart_blocked_domain = { + .type = IOMMU_DOMAIN_BLOCKED, + .ops = &apple_dart_blocked_ops, +}; + static struct iommu_device *apple_dart_probe_device(struct device *dev) { struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); @@ -739,8 +759,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) { struct apple_dart_domain *dart_domain; - if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_BLOCKED) + if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED) return NULL; dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL); @@ -749,10 +768,6 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) mutex_init(&dart_domain->init_lock); - /* no need to allocate pgtbl_ops or do any other finalization steps */ - if (type == IOMMU_DOMAIN_BLOCKED) - dart_domain->finalized = true; - return &dart_domain->domain; } @@ -966,6 +981,7 @@ static void apple_dart_get_resv_regions(struct device *dev, static const struct iommu_ops apple_dart_iommu_ops = { .identity_domain = &apple_dart_identity_domain, + .blocked_domain = &apple_dart_blocked_domain, .domain_alloc = apple_dart_domain_alloc, .probe_device = apple_dart_probe_device, .release_device = apple_dart_release_device, -- 2.42.0 Reviewed-by: Janne Grunau best regards Janne ps: I sent the reply to [Patch 4/8] accidentally with an incorrect from address but the correct Reviewed-by:. I can resend if necessary.
Re: [PATCH 6/8] iommu/dart: Move the blocked domain support to a global static
On 2023-09-26 20:34, Robin Murphy wrote: On 2023-09-26 20:05, Janne Grunau wrote: Hej, On Fri, Sep 22, 2023 at 02:07:57PM -0300, Jason Gunthorpe wrote: Move to the new static global for blocked domains. Move the blocked specific code to apple_dart_attach_dev_blocked(). Signed-off-by: Jason Gunthorpe --- drivers/iommu/apple-dart.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index 424f779ccc34df..376f4c5461e8f7 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -675,10 +675,6 @@ static int apple_dart_attach_dev(struct iommu_domain *domain, for_each_stream_map(i, cfg, stream_map) apple_dart_setup_translation(dart_domain, stream_map); break; - case IOMMU_DOMAIN_BLOCKED: - for_each_stream_map(i, cfg, stream_map) - apple_dart_hw_disable_dma(stream_map); - break; default: return -EINVAL; } @@ -710,6 +706,30 @@ static struct iommu_domain apple_dart_identity_domain = { .ops = &apple_dart_identity_ops, }; +static int apple_dart_attach_dev_blocked(struct iommu_domain *domain, + struct device *dev) +{ + struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); + struct apple_dart_stream_map *stream_map; + int i; + + if (cfg->stream_maps[0].dart->force_bypass) + return -EINVAL; unrelated to this change as this keeps the current behavior but I think force_bypass should not override IOMMU_DOMAIN_BLOCKED. It is set if the CPU page size is smaller than dart's page size. Obviously dart can't translate in that situation but it should be still possible to block it completely. How do we manage this? I can write a patch either to the current state or based on this series. The series is queued already, so best to send a patch based on iommu/core (I guess just removing these lines?). Um, what? This isn't the domain_alloc_paging series itself, Robin you fool. Clearly it's time to close the computer and try again tomorrow... Cheers, Robin. 
It won't be super-useful in practice since the blocking domain is normally only used to transition to an unmanaged domain which in the force_bypass situation can't be used anyway, but it's still nice on principle not to have unnecessary reasons for attach to fail. Thanks, Robin. + + for_each_stream_map(i, cfg, stream_map) + apple_dart_hw_disable_dma(stream_map); + return 0; +} + +static const struct iommu_domain_ops apple_dart_blocked_ops = { + .attach_dev = apple_dart_attach_dev_blocked, +}; + +static struct iommu_domain apple_dart_blocked_domain = { + .type = IOMMU_DOMAIN_BLOCKED, + .ops = &apple_dart_blocked_ops, +}; + static struct iommu_device *apple_dart_probe_device(struct device *dev) { struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); @@ -739,8 +759,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) { struct apple_dart_domain *dart_domain; - if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_BLOCKED) + if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED) return NULL; dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL); @@ -749,10 +768,6 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) mutex_init(&dart_domain->init_lock); - /* no need to allocate pgtbl_ops or do any other finalization steps */ - if (type == IOMMU_DOMAIN_BLOCKED) - dart_domain->finalized = true; - return &dart_domain->domain; } @@ -966,6 +981,7 @@ static void apple_dart_get_resv_regions(struct device *dev, static const struct iommu_ops apple_dart_iommu_ops = { .identity_domain = &apple_dart_identity_domain, + .blocked_domain = &apple_dart_blocked_domain, .domain_alloc = apple_dart_domain_alloc, .probe_device = apple_dart_probe_device, .release_device = apple_dart_release_device, -- 2.42.0 Reviewed-by: Janne Grunau best regards Janne ps: I sent the reply to [Patch 4/8] accidentally with an incorrect from address but the correct Reviewed-by:. I can resend if necessary.
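[Editor's note] The pattern being adopted in this thread — one static blocked domain whose attach handler simply disables DMA on every stream map, instead of allocating a per-request domain — can be sketched as a self-contained toy model. All names here are illustrative stand-ins, not the kernel's API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stream map: 1 = DMA enabled, 0 = blocked. */
struct toy_stream_map { int dma_enabled; };

struct toy_master_cfg {
    struct toy_stream_map maps[4];
    size_t nr_maps;
    int force_bypass;
};

/*
 * Sketch of the attach handler of a single static blocked domain:
 * no allocation, no pagetable, just disable DMA on every stream map.
 * Per Janne's point in the thread, the force_bypass check could
 * arguably be dropped so that blocking always succeeds; this toy
 * keeps the queued behaviour and fails in that case.
 */
static int toy_attach_blocked(struct toy_master_cfg *cfg)
{
    size_t i;

    if (cfg->force_bypass)
        return -1;
    for (i = 0; i < cfg->nr_maps; i++)
        cfg->maps[i].dma_enabled = 0;
    return 0;
}
```

The design point mirrored here is that a blocked domain carries no state of its own, which is what makes a single shared static instance sufficient.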
Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 2023-04-10 19:52, Dmitry Baryshkov wrote: If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform). Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving then anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is? [ just came across this code in the tree while trying to figure out what to do with iommu_set_pgtable_quirks()... ] Thanks, Robin. Fixes: 54af0ceb7595 ("arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes") Reported-by: David Heidelberg Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 2942d2548ce6..f74495dcbd96 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1793,7 +1793,8 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) * This allows GPU to set the bus attributes required to use system * cache on behalf of the iommu page table walker. */ - if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice)) + if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice) && + !device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY)) quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; return adreno_iommu_create_address_space(gpu, pdev, quirks);
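[Editor's note] The effect of the patch under discussion can be modeled as a tiny standalone function — a sketch with illustrative names, not the kernel code: the outer-WBWA quirk is only requested when a pagetable-walker LLC slice exists and the SMMU is not dma-coherent.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's quirk bit. */
#define TOY_QUIRK_ARM_OUTER_WBWA (1UL << 0)

/*
 * Toy model of the patched condition: request the outer-WBWA walk
 * attribute only when an LLC slice for the pagetable walker exists
 * AND the SMMU is not already coherent (in which case walks get that
 * attribute anyway, and io-pgtable would reject the explicit request).
 */
static unsigned long toy_a6xx_quirks(bool have_htw_llc_slice, bool smmu_coherent)
{
    unsigned long quirks = 0;

    if (have_htw_llc_slice && !smmu_coherent)
        quirks |= TOY_QUIRK_ARM_OUTER_WBWA;
    return quirks;
}
```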
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 2023-09-25 14:29, Jason Gunthorpe wrote: On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote: On 2023-09-23 00:33, Jason Gunthorpe wrote: On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings either; it sets it once it's discovered any instance, since apparently it's assuming that all instances must support identical page sizes, and thus once it's seen one it can work "normally" per the core code's assumptions. It's also I think the only driver which has a "finalise" bodge but *can* still properly support map-before-attach, by virtue of having to replay mappings to every new endpoint anyway. Well it can't quite do that since it doesn't know the geometry - it all is sort of guessing and hoping it doesn't explode on replay. If it knows the geometry it wouldn't need finalize... I think it's entirely reasonable to assume that any direct mappings specified for a device are valid for that device and its IOMMU. However, in the particular case of virtio, it really shouldn't ever have direct mappings anyway, since even if the underlying hardware did have any, the host can enforce the actual direct-mapping aspect itself, and just present them as unusable regions to the guest. I assume this machinery is for the ARM GIC ITS page Again, that's irrelevant. It can only be about whether the actual ->map_pages call succeeds or not. A driver could well know up-front that all instances support the same pgsize_bitmap and aperture, and set both at ->domain_alloc time, yet still be unable to handle an actual mapping without knowing which instance(s) that needs to interact with (e.g. omap-iommu). I think this is a different issue. The domain is supposed to represent the actual io pte storage, and the storage is supposed to exist even when the domain is not attached to anything. 
As we said with tegra-gart, it is a bug in the driver if all the mappings disappear when the last device is detached from the domain. Driver bugs like this turn into significant issues with vfio/iommufd as this will result in warn_on's and memory leaking. So, I disagree that this is something we should be allowing in the API design. map_pages should succeed (memory allocation failures aside) if a IOVA within the aperture and valid flags are presented. Regardless of the attachment status. Calling map_pages with an IOVA outside the aperture should be a caller bug. It looks omap is just mis-designed to store the pgd in the omap_iommu, not the omap_iommu_domain :( pgd is clearly a per-domain object in our API. And why does every instance need its own copy of the identical pgd? The point wasn't that it was necessarily a good and justifiable example, just that it is one that exists, to demonstrate that in general we have no reasonable heuristic for guessing whether ->map_pages is going to succeed or not other than by calling it and seeing if it succeeds or not. And IMO it's a complete waste of time thinking about ways to make such a heuristic possible instead of just getting on with fixing iommu_domain_alloc() to make the problem disappear altogether. Once Joerg pushes out the current queue I'll rebase and resend v4 of the bus ops removal, then hopefully get back to despairing at the hideous pile of WIP iommu_domain_alloc() patches I currently have on top of it... Thanks, Robin. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
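[Editor's note] The replay behaviour attributed to virtio-iommu in this thread — mappings live in the domain itself and are replayed to each newly attached endpoint, which is what makes map-before-attach workable — can be sketched as a self-contained toy model (all names hypothetical, not the driver's API):

```c
#include <assert.h>
#include <stddef.h>

struct toy_mapping { unsigned long iova, paddr, len; };

struct toy_domain {
    struct toy_mapping maps[8];
    size_t nr;
};

/* Record a mapping in the domain itself; no endpoint needs to be
 * attached yet, which is what makes map-before-attach well defined. */
static int toy_map(struct toy_domain *d, unsigned long iova,
                   unsigned long paddr, unsigned long len)
{
    if (d->nr == 8)
        return -1;
    d->maps[d->nr] = (struct toy_mapping){ iova, paddr, len };
    d->nr++;
    return 0;
}

/* Attaching an endpoint replays every recorded mapping to it; return
 * how many MAP requests would be issued for the new endpoint. */
static size_t toy_attach_replay(const struct toy_domain *d)
{
    return d->nr;
}
```

This also illustrates the point made about tegra-gart above: the mappings are storage belonging to the domain, so they persist across detach/attach rather than vanishing with the last device.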
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 2023-09-23 00:33, Jason Gunthorpe wrote: On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings either; it sets it once it's discovered any instance, since apparently it's assuming that all instances must support identical page sizes, and thus once it's seen one it can work "normally" per the core code's assumptions. It's also I think the only driver which has a "finalise" bodge but *can* still properly support map-before-attach, by virtue of having to replay mappings to every new endpoint anyway. Well it can't quite do that since it doesn't know the geometry - it all is sort of guessing and hoping it doesn't explode on replay. If it knows the geometry it wouldn't need finalize... I think it's entirely reasonable to assume that any direct mappings specified for a device are valid for that device and its IOMMU. However, in the particular case of virtio, it really shouldn't ever have direct mappings anyway, since even if the underlying hardware did have any, the host can enforce the actual direct-mapping aspect itself, and just present them as unusable regions to the guest. What do you think about something like this to replace iommu_create_device_direct_mappings(), that does enforce things properly? I fail to see how that would make any practical difference. Either the mappings can be correctly set up in a pagetable *before* the relevant device is attached to that pagetable, or they can't (if the driver doesn't have enough information to be able to do so) and we just have to really hope nothing blows up in the race window between attaching the device to an empty pagetable and having a second try at iommu_create_device_direct_mappings(). That's a driver-level issue and has nothing to do with pgsize_bitmap either way. Except we don't detect this in the core code correctly, that is my point. 
We should detect the aperture conflict, not pgsize_bitmap to check if it is the first or second try. Again, that's irrelevant. It can only be about whether the actual ->map_pages call succeeds or not. A driver could well know up-front that all instances support the same pgsize_bitmap and aperture, and set both at ->domain_alloc time, yet still be unable to handle an actual mapping without knowing which instance(s) that needs to interact with (e.g. omap-iommu). Thanks, Robin.
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 22/09/2023 5:27 pm, Jason Gunthorpe wrote: On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote: On 22/09/2023 1:41 pm, Jason Gunthorpe wrote: On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: They're not strictly equivalent: this check works around a temporary issue with the IOMMU core, which calls map/unmap before the domain is finalized. Where? The above points to iommu_create_device_direct_mappings() but it doesn't because the pgsize_bitmap == 0: __iommu_domain_alloc() sets pgsize_bitmap in this case: /* * If not already set, assume all sizes by default; the driver * may override this later */ if (!domain->pgsize_bitmap) domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; Drivers shouldn't do that. The core code was fixed to try again with mapping reserved regions to support these kinds of drivers. This is still the "normal" code path, really; I think it's only AMD that started initialising the domain bitmap "early" and warranted making it conditional. My main point was that iommu_create_device_direct_mappings() should fail for unfinalized domains, setting pgsize_bitmap to allow it to succeed is not a nice hack, and not necessary now. Sure, but it's the whole "unfinalised domains" and rewriting domain->pgsize_bitmap after attach thing that is itself the massive hack. AMD doesn't do that, and doesn't need to; it knows the appropriate format at allocation time and can quite happily return a fully working domain which allows map before attach, but the old ops->pgsize_bitmap mechanism fundamentally doesn't work for multiple formats with different page sizes. The only thing I'd accuse it of doing wrong is the weird half-and-half thing of having one format as a default via one mechanism, and the other as an override through the other, rather than setting both explicitly. 
virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings either; it sets it once it's discovered any instance, since apparently it's assuming that all instances must support identical page sizes, and thus once it's seen one it can work "normally" per the core code's assumptions. It's also I think the only driver which has a "finalise" bodge but *can* still properly support map-before-attach, by virtue of having to replay mappings to every new endpoint anyway. What do you think about something like this to replace iommu_create_device_direct_mappings(), that does enforce things properly? I fail to see how that would make any practical difference. Either the mappings can be correctly set up in a pagetable *before* the relevant device is attached to that pagetable, or they can't (if the driver doesn't have enough information to be able to do so) and we just have to really hope nothing blows up in the race window between attaching the device to an empty pagetable and having a second try at iommu_create_device_direct_mappings(). That's a driver-level issue and has nothing to do with pgsize_bitmap either way. Thanks, Robin.
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 22/09/2023 1:41 pm, Jason Gunthorpe wrote: On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: They're not strictly equivalent: this check works around a temporary issue with the IOMMU core, which calls map/unmap before the domain is finalized. Where? The above points to iommu_create_device_direct_mappings() but it doesn't because the pgsize_bitmap == 0: __iommu_domain_alloc() sets pgsize_bitmap in this case: /* * If not already set, assume all sizes by default; the driver * may override this later */ if (!domain->pgsize_bitmap) domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; Drivers shouldn't do that. The core code was fixed to try again with mapping reserved regions to support these kinds of drivers. This is still the "normal" code path, really; I think it's only AMD that started initialising the domain bitmap "early" and warranted making it conditional. However we *do* ultimately want all the drivers to do the same, so we can get rid of ops->pgsize_bitmap, because it's already pretty redundant and meaningless in the face of per-domain pagetable formats. Thanks, Robin.
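[Editor's note] For readers unfamiliar with pgsize_bitmap, its role can be illustrated with a self-contained sketch (a simplification, not the core code's actual helper): when splitting a map request, the core picks the largest page size in the bitmap that is compatible with the current IOVA alignment and the remaining length.

```c
#include <assert.h>

/*
 * Pick the largest page size from `bitmap` that both divides the
 * current IOVA and fits within the remaining length - roughly the
 * job the core's page-size helper performs when splitting a map
 * request into individual page mappings. Returns 0 if no supported
 * size fits (the request would be rejected).
 */
static unsigned long toy_pick_pgsize(unsigned long bitmap,
                                     unsigned long iova, unsigned long len)
{
    unsigned long best = 0, s;

    for (s = 1; s != 0 && s <= len; s <<= 1)
        if ((bitmap & s) && (iova & (s - 1)) == 0)
            best = s;
    return best;
}
```

With a bitmap of 4K | 2M | 1G, a 2M-aligned request can use a 2M entry, while the same length at a merely 4K-aligned IOVA falls back to 4K entries — which is why a per-domain format with different supported sizes cannot be described by one static ops-wide bitmap.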
Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
On 2023-09-21 17:44, Jason Gunthorpe wrote: On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote: On 2023/9/21 15:51, Yi Liu wrote: diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 4a7c5c8fdbb4..3c8660fe9bb1 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags { IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0, }; +/** + * enum iommu_hwpt_type - IOMMU HWPT Type + * @IOMMU_HWPT_TYPE_DEFAULT: default How about s/default/vendor agnostic/ ? Please don't use the word vendor :) IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default Ah yes, a default domain type, not to be confused with any default domain type, including the default default domain type. Just in case anyone had forgotten how gleefully fun this is :D I particularly like the bit where we end up with this construct later: switch (hwpt_type) { case IOMMU_HWPT_TYPE_DEFAULT: /* allocate a domain */ default: /* allocate a different domain */ } But of course neither case allocates a *default* domain, because it's quite obviously the wrong place to be doing that. I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format). Thanks, Robin.
Re: arm64: Unable to handle kernel execute from non-executable memory at virtual address ffff8000834c13a0
On 20/09/2023 3:32 pm, Mark Rutland wrote: Hi Naresh, On Wed, Sep 20, 2023 at 11:29:12AM +0200, Naresh Kamboju wrote: [ my two cents ] While running LTP pty07 test cases on arm64 juno-r2 with Linux next-20230919 the following kernel crash was noticed. I have been noticing this issue intermittently on Juno-r2 for more than a month. Anyone have noticed this crash ? How intermittent is this? 1/2, 1/10, 1/100, rarer still? Are you running *just* the pty07 test, or are you running a whole LTP suite and the issue first occurs around pty07? Given you've been hitting this for a month, have you tried testing mainline? Do you have a known-good kernel that we can start a bisect from? Do you *only* see this on Juno-r2 and are you testing on other hardware? Reported-by: Linux Kernel Functional Testing [0.00] Linux version 6.6.0-rc2-next-20230919 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41) #1 SMP PREEMPT @1695107157 [0.00] KASLR disabled due to lack of seed [0.00] Machine model: ARM Juno development board (r2) ... LTP running pty ... 
pty07.c:92: TINFO: Saving active console 1 ../../../include/tst_fuzzy_sync.h:640: TINFO: Stopped sampling at 552 (out of 1024) samples, sampling time reached 50% of the total time limit ../../../include/tst_fuzzy_sync.h:307: TINFO: loop = 552, delay_bias = 0 ../../../include/tst_fuzzy_sync.h:295: TINFO: start_a - start_b: { avg = 127ns, avg_dev =84ns, dev_ratio = 0.66 } ../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - start_a : { avg = 17296156ns, avg_dev = 5155058ns, dev_ratio = 0.30 } ../../../include/tst_fuzzy_sync.h:295: TINFO: end_b - start_b : { avg = 101202336ns, avg_dev = 6689286ns, dev_ratio = 0.07 } ../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - end_b: { avg = -83906064ns, avg_dev = 10230694ns, dev_ratio = 0.12 } ../../../include/tst_fuzzy_sync.h:295: TINFO: spins: { avg = 2765565 , avg_dev = 339285 , dev_ratio = 0.12 } [ 384.133538] Unable to handle kernel execute from non-executable memory at virtual address 8000834c13a0 [ 384.133559] Mem abort info: [ 384.133568] ESR = 0x860f [ 384.133578] EC = 0x21: IABT (current EL), IL = 32 bits [ 384.133590] SET = 0, FnV = 0 [ 384.133600] EA = 0, S1PTW = 0 [ 384.133610] FSC = 0x0f: level 3 permission fault [ 384.133621] swapper pgtable: 4k pages, 48-bit VAs, pgdp=82375000 [ 384.133634] [8000834c13a0] pgd=1009f003, p4d=1009f003, pud=1009e003, pmd=10098003, pte=0078836c1703 [ 384.133697] Internal error: Oops: 860f [#1] PREEMPT SMP [ 384.133707] Modules linked in: tda998x onboard_usb_hub cec hdlcd crct10dif_ce drm_dma_helper drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [ 384.133767] CPU: 3 PID: 589 Comm: (udev-worker) Not tainted 6.6.0-rc2-next-20230919 #1 [ 384.133779] Hardware name: ARM Juno development board (r2) (DT) [ 384.133784] pstate: 4005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 384.133796] pc : in_lookup_hashtable+0x178/0x2000 This indicates that the faulting address 8000834c13a0 is in_lookup_hashtable+0x178/0x2000, which would mean we've somehow marked the kernel text as
non-executable, which we never do intentionally. I suspect that implies memory corruption. Have you tried running this with KASAN enabled? [ 384.133818] lr : rcu_core (arch/arm64/include/asm/preempt.h:13 (discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1) kernel/rcu/tree.c:2403 (discriminator 1)) For the record, this LR appears to be the expected return address of the "f(rhp);" call within rcu_do_batch() (if CONFIG_DEBUG_LOCK_ALLOC=n), so it looks like a case of a bogus or corrupted RCU callback. The PC is in the middle of a data symbol (in_lookup_hashtable is an array), so NX is expected and I wouldn't imagine the pagetables have gone wrong, just regular data corruption or use-after-free somewhere. Robin. [ 384.133832] sp : 800083533e60 [ 384.133836] x29: 800083533e60 x28: 0008008a6180 x27: 000a [ 384.133854] x26: x25: x24: 800083533f10 [ 384.133871] x23: 800082404008 x22: 800082ebea80 x21: 800082f55940 [ 384.133889] x20: 00097ed75440 x19: 0001 x18: [ 384.133905] x17: 8008fc95c000 x16: 80008353 x15: 3d09 [ 384.133922] x14: 00030d40 x13: x12: 003d0900 [ 384.133939] x11: x10: 0008 x9 : 80008015b05c [ 384.133955] x8 : 800083533da8 x7 : x6 : 0100 [ 384.133971] x5 : 800082ebf000 x4 : 800082ebf2e8 x3 : [ 384.133987] x2 : 000825bf8618 x1 : 8000834c13a0 x0 : 00082b6d7170 [ 384.134005] Call trace: [ 384.134009] in_lookup_hashtable+0x178/0x2000 [ 384.134022] rcu_core_si (kernel/rcu/tree.c:2421) [ 384.134035]
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 2023-09-19 09:15, Jean-Philippe Brucker wrote: On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 17dcd826f5c2..3649586f0e5c 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) int ret; unsigned long flags; + /* +* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu +* is initialized e.g. via iommu_create_device_direct_mappings() +*/ + if (!viommu) + return 0; Minor nit: I'd be inclined to make that check explicitly in the places where it definitely is expected, rather than allowing *any* sync to silently do nothing if called incorrectly. Plus then they could use vdomain->nr_endpoints for consistency with the equivalent checks elsewhere (it did take me a moment to figure out how we could get to .iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up first...) They're not strictly equivalent: this check works around a temporary issue with the IOMMU core, which calls map/unmap before the domain is finalized. Once we merge domain_alloc() and finalize(), then this check disappears, but we still need to test nr_endpoints in map/unmap to handle detached domains (and we still need to fix the synchronization of nr_endpoints against attach/detach). That's why I preferred doing this on viommu and keeping it in one place. Fair enough - it just seems to me that in both cases it's a detached domain, so its previous history of whether it's ever been otherwise or not shouldn't matter. Even once viommu is initialised, does it really make sense to send sync commands for a mapping on a detached domain where we haven't actually sent any map/unmap commands? Thanks, Robin.
Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map
On 2023-09-18 12:51, Niklas Schnelle wrote: Pull out the sync operation from viommu_map_pages() by implementing ops->iotlb_sync_map. This allows the common IOMMU code to map multiple elements of an sg with a single sync (see iommu_map_sg()). Furthermore, it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH. Is it really a requirement? Deferred flush only deals with unmapping. Or are you just trying to say that it's not too worthwhile to try doing more for unmapping performance while obvious mapping performance is still left on the table? Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schne...@linux.ibm.com/ Signed-off-by: Niklas Schnelle --- drivers/iommu/virtio-iommu.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 17dcd826f5c2..3649586f0e5c 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) int ret; unsigned long flags; + /* +* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu +* is initialized e.g. via iommu_create_device_direct_mappings() +*/ + if (!viommu) + return 0; Minor nit: I'd be inclined to make that check explicitly in the places where it definitely is expected, rather than allowing *any* sync to silently do nothing if called incorrectly. Plus then they could use vdomain->nr_endpoints for consistency with the equivalent checks elsewhere (it did take me a moment to figure out how we could get to .iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up first...) Thanks, Robin. 
spin_lock_irqsave(>request_lock, flags); ret = __viommu_sync_req(viommu); if (ret) @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova, .flags = cpu_to_le32(flags), }; - ret = viommu_send_req_sync(vdomain->viommu, , sizeof(map)); + ret = viommu_add_req(vdomain->viommu, , sizeof(map)); if (ret) { viommu_del_mappings(vdomain, iova, end); return ret; @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain, viommu_sync_req(vdomain->viommu); } +static int viommu_iotlb_sync_map(struct iommu_domain *domain, +unsigned long iova, size_t size) +{ + struct viommu_domain *vdomain = to_viommu_domain(domain); + + return viommu_sync_req(vdomain->viommu); +} + static void viommu_get_resv_regions(struct device *dev, struct list_head *head) { struct iommu_resv_region *entry, *new_entry, *msi = NULL; @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = { .unmap_pages= viommu_unmap_pages, .iova_to_phys = viommu_iova_to_phys, .iotlb_sync = viommu_iotlb_sync, + .iotlb_sync_map = viommu_iotlb_sync_map, .free = viommu_domain_free, } };
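The guard style Robin suggests — checking the domain's endpoint count at each sync-issuing caller, rather than a blanket NULL check buried in the sync path — can be sketched as a toy model. The types and counter below are simplified stand-ins, not the real virtio-iommu driver:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the guard discussed above: a detached domain (no endpoints
 * attached yet, viommu possibly still NULL) should issue no sync commands.
 * Checking nr_endpoints in the caller keeps this consistent with the
 * equivalent checks in the map/unmap paths. Not the real driver code. */

struct toy_viommu { int syncs_sent; };

struct toy_viommu_domain {
	struct toy_viommu *viommu;	/* NULL until the domain is finalised */
	unsigned int nr_endpoints;	/* 0 while detached */
};

static int toy_sync_req(struct toy_viommu *viommu)
{
	viommu->syncs_sent++;	/* stands in for sending the sync request */
	return 0;
}

/* Guard in the caller: no sync commands for a detached domain */
static int toy_iotlb_sync_map(struct toy_viommu_domain *vdomain)
{
	if (!vdomain->nr_endpoints)
		return 0;
	return toy_sync_req(vdomain->viommu);
}
```

With this shape, the NULL-viommu case is covered for free: a domain with a NULL viommu necessarily has no endpoints attached, so the sync path is never reached with an uninitialised pointer.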
Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP
On 12/09/2023 4:53 pm, Rob Herring wrote: On Tue, Sep 12, 2023 at 11:13:50AM +0100, Robin Murphy wrote: On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote: On 12/09/2023 08:16, Yong Wu (吴勇) wrote: Hi Rob, Thanks for your review. On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote: External email : Please do not click links or open attachments until you have verified the sender or the content. On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote: This adds the binding for describing a CMA memory for MediaTek SVP(Secure Video Path). CMA is a Linux thing. How is this related to CMA? Signed-off-by: Yong Wu --- .../mediatek,secure_cma_chunkmem.yaml | 42 +++ 1 file changed, 42 insertions(+) create mode 100644 Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml diff --git a/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml b/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml new file mode 100644 index ..cc10e00d35c4 --- /dev/null +++ b/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek Secure Video Path Reserved Memory What makes this specific to Mediatek? Secure video path is fairly common, right? Here we just reserve a buffer and would like to create a dma-buf secure heap for SVP, then the secure engines(Vcodec and DRM) could prepare secure buffer through it. But the heap driver is pure SW driver, it is not platform device and All drivers are pure SW. we don't have a corresponding HW unit for it. Thus I don't think I could create a platform dtsi node and use "memory-region" pointer to the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in [9/9]). Sorry if this is not right. 
If this is not for any hardware and you already understand this (since you cannot use other bindings) then you cannot have custom bindings for it either. Then in our usage case, is there some similar method to do this? or any other suggestion? Don't stuff software into DTS. Aren't most reserved-memory bindings just software policy if you look at it that way, though? IIUC this is a pool of memory that is visible and available to the Non-Secure OS, but is fundamentally owned by the Secure TEE, and pages that the TEE allocates from it will become physically inaccessible to the OS. Thus the platform does impose constraints on how the Non-Secure OS may use it, and per the rest of the reserved-memory bindings, describing it as a "reusable" reservation seems entirely appropriate. If anything that's *more* platform-related and so DT-relevant than typical arbitrary reservations which just represent "save some memory to dedicate to a particular driver" and don't actually bear any relationship to firmware or hardware at all. Yes, a memory range defined by hardware or firmware is within scope of DT. (CMA at arbitrary address was questionable.) My issue here is more that 'secure video memory' is not in any way Mediatek specific. AIUI, it's a requirement from certain content providers for video playback to work. So why the Mediatek specific binding? Based on the implementation, I'd ask the question the other way round - the way it works looks to be at least somewhat dependent on Mediatek's TEE, in ways where other vendors' equivalent implementations may be functionally incompatible, however nothing suggests it's actually specific to video (beyond that presumably being the primary use-case they had in mind). Thanks, Robin.
Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP
On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote: On 12/09/2023 08:16, Yong Wu (吴勇) wrote: Hi Rob, Thanks for your review. On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote: External email : Please do not click links or open attachments until you have verified the sender or the content. On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote: This adds the binding for describing a CMA memory for MediaTek SVP(Secure Video Path). CMA is a Linux thing. How is this related to CMA? Signed-off-by: Yong Wu --- .../mediatek,secure_cma_chunkmem.yaml | 42 +++ 1 file changed, 42 insertions(+) create mode 100644 Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml diff --git a/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml b/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml new file mode 100644 index ..cc10e00d35c4 --- /dev/null +++ b/Documentation/devicetree/bindings/reserved- memory/mediatek,secure_cma_chunkmem.yaml @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek Secure Video Path Reserved Memory What makes this specific to Mediatek? Secure video path is fairly common, right? Here we just reserve a buffer and would like to create a dma-buf secure heap for SVP, then the secure engines(Vcodec and DRM) could prepare secure buffer through it. But the heap driver is pure SW driver, it is not platform device and All drivers are pure SW. we don't have a corresponding HW unit for it. Thus I don't think I could create a platform dtsi node and use "memory-region" pointer to the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in [9/9]). Sorry if this is not right. 
If this is not for any hardware and you already understand this (since you cannot use other bindings) then you cannot have custom bindings for it either. Then in our usage case, is there some similar method to do this? or any other suggestion? Don't stuff software into DTS. Aren't most reserved-memory bindings just software policy if you look at it that way, though? IIUC this is a pool of memory that is visible and available to the Non-Secure OS, but is fundamentally owned by the Secure TEE, and pages that the TEE allocates from it will become physically inaccessible to the OS. Thus the platform does impose constraints on how the Non-Secure OS may use it, and per the rest of the reserved-memory bindings, describing it as a "reusable" reservation seems entirely appropriate. If anything that's *more* platform-related and so DT-relevant than typical arbitrary reservations which just represent "save some memory to dedicate to a particular driver" and don't actually bear any relationship to firmware or hardware at all. However, the fact that Linux's implementation of how to reuse reserved memory areas is called CMA is indeed still irrelevant and has no place in the binding itself. Thanks, Robin.
Re: [PATCH 3/5] armv8: fsl-layerscape: create bypass smmu mapping for MC
On 2023-09-06 19:10, Laurentiu Tudor wrote: On 9/6/2023 8:21 PM, Robin Murphy wrote: On 2023-09-06 17:01, Laurentiu Tudor wrote: MC being a plain DMA master as any other device in the SoC and being live at OS boot time, as soon as the SMMU is probed it will immediately start triggering faults because there is no mapping in the SMMU for the MC. Pre-create such a mapping in the SMMU, being the OS's responsibility to preserve it. Does U-Boot enable the SMMU? AFAICS the only thing it knows how to do is explicitly turn it *off*, therefore programming other registers appears to be a complete waste of time. No, it doesn't enable SMMU but it does mark a SMR as valid for MC FW. And the ARM SMMU driver subtly preserves it, see [1] (it's late and I might be wrong, but I'll double check tomorrow). :-) No, that sets the SMR valid bit *if* the corresponding entry is allocated and marked as valid in the software state in smmu->smrs, which at probe time it isn't, because that's only just been allocated and is still zero-initialised. Unless, that is, arm_smmu_rmr_install_bypass_smr() found a reserved region and preallocated an entry to honour it. But even those entries are still constructed from scratch; we can't do anything with the existing SMR/S2CR register contents in general since they may be uninitialised random reset values, so we don't even look. Pay no attention to the qcom_smmu_cfg_probe() hack either - that only exists on the promise that the relevant platforms couldn't have their firmware updated to use proper RMRs. You're already doing the right thing in patch #2, so there's no need to waste code on doing a pointless wrong thing as well. Thanks, Robin. All that should matter to the OS, and that it is responsible for upholding, is the reserved memory regions from patch #2. For instance, if the OS is Linux, literally the first thing arm_smmu_device_reset() does is rewrite all the S2CRs and SMRs without so much as looking. 
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/arm/arm-smmu/arm-smmu.c#n894 --- Best Regards, Laurentiu Signed-off-by: Laurentiu Tudor --- arch/arm/cpu/armv8/fsl-layerscape/soc.c | 26 --- .../asm/arch-fsl-layerscape/immap_lsch3.h | 9 +++ 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/arm/cpu/armv8/fsl-layerscape/soc.c b/arch/arm/cpu/armv8/fsl-layerscape/soc.c index 3bfdc3f77431..870b99838ab5 100644 --- a/arch/arm/cpu/armv8/fsl-layerscape/soc.c +++ b/arch/arm/cpu/armv8/fsl-layerscape/soc.c @@ -376,6 +376,18 @@ void bypass_smmu(void) val = (in_le32(SMMU_NSCR0) | SCR0_CLIENTPD_MASK) & ~(SCR0_USFCFG_MASK); out_le32(SMMU_NSCR0, val); } + +void setup_smmu_mc_bypass(int icid, int mask) +{ + u32 val; + + val = SMMU_SMR_VALID_MASK | (icid << SMMU_SMR_ID_SHIFT) | + (mask << SMMU_SMR_MASK_SHIFT); + out_le32(SMMU_REG_SMR(0), val); + val = SMMU_S2CR_EXIDVALID_VALID_MASK | SMMU_S2CR_TYPE_BYPASS_MASK; + out_le32(SMMU_REG_S2CR(0), val); +} + void fsl_lsch3_early_init_f(void) { erratum_rcw_src(); @@ -402,10 +414,18 @@ void fsl_lsch3_early_init_f(void) bypass_smmu(); #endif -#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS1028A) || \ - defined(CONFIG_ARCH_LS2080A) || defined(CONFIG_ARCH_LX2160A) || \ - defined(CONFIG_ARCH_LX2162A) +#ifdef CONFIG_ARCH_LS1028A + set_icids(); +#endif + +#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS2080A) + set_icids(); + setup_smmu_mc_bypass(0x300, 0); +#endif + +#if defined(CONFIG_ARCH_LX2160A) || defined(CONFIG_ARCH_LX2162A) set_icids(); + setup_smmu_mc_bypass(0x4000, 0); #endif } diff --git a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h index ca5e33379ba9..bec5355adaed 100644 --- a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h +++ b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h @@ -190,6 +190,15 @@ #define SCR0_CLIENTPD_MASK 0x0001 #define SCR0_USFCFG_MASK 0x0400 +#define 
SMMU_REG_SMR(n) (SMMU_BASE + 0x800 + ((n) << 2)) +#define SMMU_REG_S2CR(n) (SMMU_BASE + 0xc00 + ((n) << 2)) +#define SMMU_SMR_VALID_MASK 0x8000 +#define SMMU_SMR_MASK_MASK 0x +#define SMMU_SMR_MASK_SHIFT 16 +#define SMMU_SMR_ID_MASK 0x +#define SMMU_SMR_ID_SHIFT 0 +#define SMMU_S2CR_EXIDVALID_VALID_MASK 0x0400 +#define SMMU_S2CR_TYPE_BYPASS_MASK 0x0001 /* PCIe */ #define CFG_SYS_PCIE1_ADDR (CONFIG_SYS_IMMR + 0x240)
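The SMR value that setup_smmu_mc_bypass() constructs can be illustrated in isolation. The field layout below follows the Arm SMMUv2 architecture (VALID in bit 31, MASK in bits [30:16], ID in bits [14:0]); the macro names are invented for this sketch rather than taken from the patch, whose own #define values are garbled in this archive:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the Stream Match Register encoding built by the patch's
 * setup_smmu_mc_bypass(): VALID bit plus the stream ID and match mask.
 * Field positions per the Arm SMMUv2 architecture; names are illustrative. */

#define TOY_SMR_VALID		(1u << 31)
#define TOY_SMR_MASK_SHIFT	16
#define TOY_SMR_ID_SHIFT	0

static uint32_t toy_smr_encode(uint32_t id, uint32_t mask)
{
	return TOY_SMR_VALID | (id << TOY_SMR_ID_SHIFT) |
	       (mask << TOY_SMR_MASK_SHIFT);
}
```

For the MC stream IDs used in the patch (0x300 on LS1088A/LS2080A, 0x4000 on LX2160A/LX2162A, both with a zero mask for exact-match), this yields 0x80000300 and 0x80004000 respectively — though per Robin's point above, Linux rewrites these registers at probe anyway, so programming them in U-Boot achieves nothing the RMR-style reserved regions of patch #2 don't already cover.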
Re: [PATCH 3/5] armv8: fsl-layerscape: create bypass smmu mapping for MC
On 2023-09-06 17:01, Laurentiu Tudor wrote: MC being a plain DMA master as any other device in the SoC and being live at OS boot time, as soon as the SMMU is probed it will immediately start triggering faults because there is no mapping in the SMMU for the MC. Pre-create such a mapping in the SMMU, being the OS's responsibility to preserve it. Does U-Boot enable the SMMU? AFAICS the only thing it knows how to do is explicitly turn it *off*, therefore programming other registers appears to be a complete waste of time. All that should matter to the OS, and that it is responsible for upholding, is the reserved memory regions from patch #2. For instance, if the OS is Linux, literally the first thing arm_smmu_device_reset() does is rewrite all the S2CRs and SMRs without so much as looking. Thanks, Robin. Signed-off-by: Laurentiu Tudor --- arch/arm/cpu/armv8/fsl-layerscape/soc.c | 26 --- .../asm/arch-fsl-layerscape/immap_lsch3.h | 9 +++ 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/arm/cpu/armv8/fsl-layerscape/soc.c b/arch/arm/cpu/armv8/fsl-layerscape/soc.c index 3bfdc3f77431..870b99838ab5 100644 --- a/arch/arm/cpu/armv8/fsl-layerscape/soc.c +++ b/arch/arm/cpu/armv8/fsl-layerscape/soc.c @@ -376,6 +376,18 @@ void bypass_smmu(void) val = (in_le32(SMMU_NSCR0) | SCR0_CLIENTPD_MASK) & ~(SCR0_USFCFG_MASK); out_le32(SMMU_NSCR0, val); } + +void setup_smmu_mc_bypass(int icid, int mask) +{ + u32 val; + + val = SMMU_SMR_VALID_MASK | (icid << SMMU_SMR_ID_SHIFT) | + (mask << SMMU_SMR_MASK_SHIFT); + out_le32(SMMU_REG_SMR(0), val); + val = SMMU_S2CR_EXIDVALID_VALID_MASK | SMMU_S2CR_TYPE_BYPASS_MASK; + out_le32(SMMU_REG_S2CR(0), val); +} + void fsl_lsch3_early_init_f(void) { erratum_rcw_src(); @@ -402,10 +414,18 @@ void fsl_lsch3_early_init_f(void) bypass_smmu(); #endif -#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS1028A) || \ - defined(CONFIG_ARCH_LS2080A) || defined(CONFIG_ARCH_LX2160A) || \ - defined(CONFIG_ARCH_LX2162A) +#ifdef CONFIG_ARCH_LS1028A 
+ set_icids(); +#endif + +#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS2080A) + set_icids(); + setup_smmu_mc_bypass(0x300, 0); +#endif + +#if defined(CONFIG_ARCH_LX2160A) || defined(CONFIG_ARCH_LX2162A) set_icids(); + setup_smmu_mc_bypass(0x4000, 0); #endif } diff --git a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h index ca5e33379ba9..bec5355adaed 100644 --- a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h +++ b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h @@ -190,6 +190,15 @@ #define SCR0_CLIENTPD_MASK0x0001 #define SCR0_USFCFG_MASK 0x0400 +#define SMMU_REG_SMR(n) (SMMU_BASE + 0x800 + ((n) << 2)) +#define SMMU_REG_S2CR(n) (SMMU_BASE + 0xc00 + ((n) << 2)) +#define SMMU_SMR_VALID_MASK0x8000 +#define SMMU_SMR_MASK_MASK 0x +#define SMMU_SMR_MASK_SHIFT16 +#define SMMU_SMR_ID_MASK 0x +#define SMMU_SMR_ID_SHIFT 0 +#define SMMU_S2CR_EXIDVALID_VALID_MASK 0x0400 +#define SMMU_S2CR_TYPE_BYPASS_MASK 0x0001 /* PCIe */ #define CFG_SYS_PCIE1_ADDR(CONFIG_SYS_IMMR + 0x240)
Re: [PATCH 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush
On 2023-09-04 16:34, Jean-Philippe Brucker wrote: On Fri, Aug 25, 2023 at 05:21:26PM +0200, Niklas Schnelle wrote: Add ops->flush_iotlb_all operation to enable virtio-iommu for the dma-iommu deferred flush scheme. This results in a significant increase in performance in exchange for a window in which devices can still access previously IOMMU mapped memory. To get back to the prior behavior iommu.strict=1 may be set on the kernel command line. Maybe add that it depends on CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} as well, because I've seen kernel configs that enable either. Indeed, I'd be inclined to phrase it in terms of the driver now actually being able to honour lazy mode when requested (which happens to be the default on x86), rather than as if it might be some potentially-unexpected change in behaviour. Thanks, Robin. Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/ Signed-off-by: Niklas Schnelle --- drivers/iommu/virtio-iommu.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index fb73dec5b953..1b7526494490 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -924,6 +924,15 @@ static int viommu_iotlb_sync_map(struct iommu_domain *domain, return viommu_sync_req(vdomain->viommu); } +static void viommu_flush_iotlb_all(struct iommu_domain *domain) +{ + struct viommu_domain *vdomain = to_viommu_domain(domain); + + if (!vdomain->nr_endpoints) + return; As for patch 1, a NULL check in viommu_sync_req() would allow dropping this one Thanks, Jean + viommu_sync_req(vdomain->viommu); +} + static void viommu_get_resv_regions(struct device *dev, struct list_head *head) { struct iommu_resv_region *entry, *new_entry, *msi = NULL; @@ -1049,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap) switch (cap) { case IOMMU_CAP_CACHE_COHERENCY: return true; + case IOMMU_CAP_DEFERRED_FLUSH: + return true; default: return false; } @@ -1069,6 +1080,7 @@
static struct iommu_ops viommu_ops = { .map_pages = viommu_map_pages, .unmap_pages= viommu_unmap_pages, .iova_to_phys = viommu_iova_to_phys, + .flush_iotlb_all= viommu_flush_iotlb_all, .iotlb_sync = viommu_iotlb_sync, .iotlb_sync_map = viommu_iotlb_sync_map, .free = viommu_domain_free, -- 2.39.2
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 2023-09-04 17:16, Boris Brezillon wrote: On Mon, 4 Sep 2023 16:22:19 +0100 Steven Price wrote: On 04/09/2023 10:26, Boris Brezillon wrote: On Mon, 4 Sep 2023 08:42:08 +0100 Steven Price wrote: On 01/09/2023 17:10, Boris Brezillon wrote: On Wed, 9 Aug 2023 18:53:15 +0200 Boris Brezillon wrote: +/** + * DOC: MMIO regions exposed to userspace. + * + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET + * + * File offset for all MMIO regions being exposed to userspace. Don't use + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. + * + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET + * + * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls + * GPU cache flushing through CS instructions, but the flush reduction + * mechanism requires a flush_id. This flush_id could be queried with an + * ioctl, but Arm provides a well-isolated register page containing only this + * read-only register, so let's expose this page through a static mmap offset + * and allow direct mapping of this MMIO region so we can avoid the + * user <-> kernel round-trip. + */ +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) I'm playing with a 32-bit kernel/userspace, and this is problematic, because vm_pgoff is limited to 32-bit there, meaning we can only map up to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let userspace set the mmio range? Hmm, I was rather hoping we could ignore 32 bit these days ;) But while I can't see why anyone would be running a 32 bit kernel, I guess 32 bit user space is likely to still be needed. Uh, I just hit a new problem with 32-bit kernels: the io-pgtable interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any chance you could advise me on what to do here? 1. assume this limitation is here for a good reason, and limit the GPU VA space to 32-bits on 32-bit kernels or 2.
update the interface to make iova an u64 I'm not sure I can answer the question from a technical perspective, hopefully Robin will be able to. Had a quick chat with Robin, and he's recommending going for #1 too. But why do we care about 32-bit kernels on a platform which is new enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? Apparently the memory you save by switching to a 32-bit kernel matters to some people. To clarify, the CPU is aarch64, but they want to use it in 32-bit mode. Given the other limitations present in a 32-bit kernel I'd be tempted to say '1' just for simplicity. Especially since apparently we've lived with this for panfrost which presumably has the same limitation (even though all Bifrost/Midgard GPUs have at least 33 bits of VA space). Well, Panfrost is simpler in that you don't have this kernel VA range, and, IIRC, we are using the old format that naturally limits the GPU VA space to 4G. FWIW the legacy pagetable format itself should be fine going up to however many bits the GPU supports, however there were various ISA limitations around crossing 4GB boundaries, and the easiest way to avoid having to think about those was to just not use more than 4GB of VA at all (minus chunks at the ends for similar weird ISA reasons). Cheers, Robin.
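The io-pgtable limitation being discussed is easy to demonstrate with a toy stand-in: on a 32-bit kernel, "unsigned long" IOVAs silently truncate anything above 4GB, so two GPU VAs exactly 4GB apart become indistinguishable to the pagetable code. Here uint32_t emulates a 32-bit kernel's unsigned long; this is an illustration of the problem, not the real io_pgtable_ops interface:

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t stands in for "unsigned long" on a 32-bit kernel build */
typedef uint32_t iova32_t;

/* What a map/unmap op would actually receive after implicit truncation of
 * a 64-bit GPU VA through a 32-bit parameter type. */
static iova32_t toy_map_arg(uint64_t gpu_va)
{
	return (iova32_t)gpu_va;
}
```

This is why option #1 (capping the GPU VA space at 32 bits on 32-bit kernels) avoids the hazard entirely, at the cost of a smaller address space, whereas option #2 would mean widening the interface to u64 throughout.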
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 2023-08-09 17:53, Boris Brezillon wrote: [...] +/** + * struct drm_panthor_vm_create - Arguments passed to DRM_PANTHOR_IOCTL_VM_CREATE + */ +struct drm_panthor_vm_create { + /** @flags: VM flags, MBZ. */ + __u32 flags; + + /** @id: Returned VM ID. */ + __u32 id; + + /** +* @kernel_va_range: Size of the VA space reserved for kernel objects. +* +* If kernel_va_range is zero, we pick half of the VA space for kernel objects. +* +* Kernel VA space is always placed at the top of the supported VA range. +*/ + __u64 kernel_va_range; Off the back of the "IOVA as unsigned long" concern, Boris and I reasoned through the 64-bit vs. 32-bit vs. compat cases on IRC, and it seems like this kernel_va_range argument is a source of much of the pain. Rather than have userspace specify a quantity which it shouldn't care about and depend on assumptions of kernel behaviour to infer the quantity which *is* relevant (i.e. how large the usable range of the VM will actually be), I think it would be considerably more logical for userspace to simply request the size of usable VM it actually wants. Then it would be straightforward and consistent to define the default value in terms of the minimum of half the GPU VA size or TASK_SIZE (the latter being the largest *meaningful* value in all 3 cases), and it's still easy enough for the kernel to deduce for itself whether there's a reasonable amount of space left between the requested limit and ULONG_MAX for it to use. 32-bit kernels should then get at least 1GB to play with, for compat the kernel BOs can get well out of the way into the >32-bit range, and it's only really 64-bit where userspace is liable to see "kernel" VA space impinging on usable process VAs. Even then we're not sure that's a significant concern beyond OpenCL SVM. Thanks, Robin.
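The proposed default can be written down directly: if userspace doesn't request a usable VM size, take the minimum of half the GPU VA size and TASK_SIZE. The sketch below is a plain illustration of that arithmetic with invented names — nothing here reflects the actual panthor uAPI or kernel constants:

```c
#include <assert.h>
#include <stdint.h>

/* Default usable-VA computation suggested above: min(half the GPU VA size,
 * TASK_SIZE). Whatever lies between this limit and the top of the GPU VA
 * range remains available for kernel objects. Illustrative only. */
static uint64_t toy_default_user_va(uint64_t gpu_va_size, uint64_t task_size)
{
	uint64_t half = gpu_va_size / 2;

	return half < task_size ? half : task_size;
}
```

With, say, a 48-bit GPU VA space: a 64-bit process would get the half-split, a compat (32-bit) process would be capped at its 4GB task size, and in both cases the kernel keeps a comfortably large region above the limit for its own BOs.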
Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation
On 2023-08-14 12:18, Steven Price wrote: On 11/08/2023 20:26, Robin Murphy wrote: On 2023-08-11 17:56, Daniel Stone wrote: Hi, On 11/08/2023 17:35, Robin Murphy wrote: On 2023-08-09 17:53, Boris Brezillon wrote: +obj-$(CONFIG_DRM_PANTHOR) += panthor.o FWIW I still think it would be nice to have a minor directory/Kconfig/Makefile reshuffle and a trivial bit of extra registration glue to build both drivers into a single module. It seems like it could be a perpetual source of confusion to end users where Mesa "panfrost" is the right option but kernel "panfrost" is the wrong one. Especially when pretty much every other GPU driver is also just one big top-level module to load for many different generations of hardware. Plus it would mean that if someone did want to have a go at deduplicating the resource-wrangling boilerplate for OPPs etc. in future, there's more chance of being able to do so meaningfully. It might be nice to point it out, but to be fair Intel and AMD both have two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali. Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to i915 what lima is to panfrost. It was more that unlike the others where there's a pretty clear line in the sand between "driver for old hardware" and "driver for the majority of recent hardware", this one happens to fall splat in the middle of the current major generation such that panfrost is the correct module for Mali Bifrost but also the wrong one for Mali Bifrost... :/ Well panfrost.ko is the correct module for all Bifrost ;) It's Valhall that's the confusing one. Bah, you see? If even developers sufficiently involved to be CCed on the patches can't remember what's what, what hope does Joe User have? :D I would hope that for most users they can just build both panfrost and panthor and everything will "Just Work (tm)". I'm not sure how much users are actually aware of the architecture family of their GPU. 
I think at the moment (until marketing messes it up) there's also the 'simple' rule:

* Mali T* is Midgard and supported by panfrost.ko
* Mali Gxx (two digits) is Bifrost or first-generation Valhall and supported by panfrost.ko
* Mali Gxxx (three digits) is Valhall CSF and supported by panthor (and Immortalis is always three digits and Valhall CSF).

With brain now engaged, indeed that sounds right. However if the expectation is that most people would steer clear even of marketing's alphabet soup and just enable everything, that could also be seen as somewhat of an argument for just putting it all together and not bothering with a separate option. I can see the point, but otoh if someone's managed to build all the right regulator/clock/etc modules to get a working system, they'll probably manage to figure the GPU side out? Maybe; either way I guess it's not really my concern, since I'm the only user that *I* have to support, and I do already understand it. From the upstream perspective I mostly just want to hold on to the hope of not having to write my io-pgtable bugs twice over if at all possible :) I agree it would be nice to merge some of the common code, I'm hoping this is something that might be possible in the future. But at the moment the focus is on trying to get basic support for the new GPUs without the danger of regressing the old GPUs. Yup, I get that, it's just that the niggling concern I have is whether what we do at the moment might paint us into a corner with respect to what we're then able to change later; I know Kconfig symbols are explicitly not ABI, but module names and driver names might be more of a grey area. And, to be honest, for a fair bit of the common code in panfrost/panthor it's common to a few other drivers too. So the correct answer might well be to try to add more generic helpers (devfreq, clocks, power domains all spring to mind - there's a lot of boilerplate and nothing very special about Mali).
That much is true, however I guess there's also stuff like perf counter support which is less likely to be DRM-level generic but perhaps still sufficiently similar between JM and CSF. The main thing I don't know, and thus feel compelled to poke at, is whether there's any possibility that once the new UAPI is mature, it might eventually become preferable to move Job Manager support over to some subset of that rather than maintain two whole UAPIs in parallel (particularly at the Mesa end). My (limited) understanding is that all the BO-wrangling and MMU code is primarily different here for the sake of supporting new shiny UAPI features, not because of anything inherent to CSF itself (other than CSF being the thing which makes supporting said features feasible). If that's a preposterous idea and absolutely never ever going to be realistic, then fine, but if not, then it feels like the kind of thing that my all-too-great experience of technical debt and bad short-term
Re: [PATCH v2 05/15] drm/panthor: Add the GPU logical block
On 2023-08-14 11:54, Steven Price wrote: [...] +/** + * panthor_gpu_l2_power_on() - Power-on the L2-cache + * @ptdev: Device. + * + * Return: 0 on success, a negative error code otherwise. + */ +int panthor_gpu_l2_power_on(struct panthor_device *ptdev) +{ + u64 core_mask = U64_MAX; + + if (ptdev->gpu_info.l2_present != 1) { + /* +* Only support one core group now. +* ~(l2_present - 1) unsets all bits in l2_present except +* the bottom bit. (l2_present - 2) has all the bits in +* the first core group set. AND them together to generate +* a mask of cores in the first core group. +*/ + core_mask = ~(ptdev->gpu_info.l2_present - 1) & +(ptdev->gpu_info.l2_present - 2); + drm_info_once(&ptdev->base, "using only 1st core group (%lu cores from %lu)\n", + hweight64(core_mask), + hweight64(ptdev->gpu_info.shader_present)); I'm not sure what the point of this complexity is. This boils down to the equivalent of: if (ptdev->gpu_info.l2_present != 1) core_mask = 1; Hmm, that doesn't look right - the idiom here should be to set all bits of the output below the *second* set bit of the input, i.e. 0x11 -> 0x0f. However since panthor is (somewhat ironically) unlikely to ever run on T628, and everything newer should pretend to have a single L2 because software-managed coherency is a terrible idea, I would agree that ultimately it does all seem a bit pointless. If we were doing shader-core power management manually (like on pre-CSF GPUs, rather than letting the firmware control it) then the computed core_mask would be useful. So I guess it comes down to the drm_info_once() output and counting the cores - which is nice to have but it took me some time figuring out what was going on here. As for the complexity, I'd suggest you can have some choice words with the guy who originally suggested that code[1] ;) Cheers, Robin. [1] https://lore.kernel.org/dri-devel/b009b4c4-0396-58c2-7779-30c844f36...@arm.com/
Re: [PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()
On 2023-08-18 22:32, Jason Gunthorpe wrote: It turns out several drivers are calling of_dma_configure() outside the expected bus_type.dma_configure op. This ends up being mis-locked and triggers a lockdep assertion, for instance: iommu_probe_device_locked+0xd4/0xe4 of_iommu_configure+0x10c/0x200 of_dma_configure_id+0x104/0x3b8 a6xx_gmu_init+0x4c/0xccc [msm] a6xx_gpu_init+0x3ac/0x770 [msm] adreno_bind+0x174/0x2ac [msm] component_bind_all+0x118/0x24c msm_drm_bind+0x1e8/0x6c4 [msm] try_to_bring_up_aggregate_device+0x168/0x1d4 __component_add+0xa8/0x170 component_add+0x14/0x20 dsi_dev_attach+0x20/0x2c [msm] dsi_host_attach+0x9c/0x144 [msm] devm_mipi_dsi_attach+0x34/0xb4 lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc] lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc] i2c_device_probe+0x14c/0x290 really_probe+0x148/0x2b4 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0xa8/0x1b0 device_initial_probe+0x14/0x20 bus_probe_device+0xb0/0xb4 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x1ec/0x53c worker_thread+0x298/0x408 kthread+0x124/0x128 ret_from_fork+0x10/0x20 It is subtle and was never documented or enforced, but there has always been an assumption that of_dma_configure_id() is not concurrent. It makes several calls into the iommu layer that require this, including dev_iommu_get(). The majority of cases have been preventing concurrency using the device_lock(). Thus the new lock debugging added exposes an existing problem in drivers. On inspection this looks like a theoretical locking problem as generally the cases are already assuming they are the exclusive (single threaded) user of the target device. Sorry to be blunt, but the only problem is that you've introduced an idealistic new locking scheme which failed to take into account how things currently actually work, and is broken and achieving nothing but causing problems. 
The solution is to drop those locking patches entirely and rethink the whole thing. When their sole purpose was to improve the locking and make it easier to reason about, and the latest "fix" is now to remove one of the assertions which forms the fundamental basis for that reasoning, then the point has clearly been lost. All we've done is churned a dodgy and incomplete locking scheme into a *different* dodgy and incomplete locking scheme. I do not think that continuing to dig in deeper is the way out of the hole... It's now rc7, and I have little confidence that there aren't still more latent problems which just haven't been hit yet (e.g. acpi_dma_configure() is also called in different contexts relative to the device lock, which is absolutely by design and not broken). And on the subject of idealism, the fact is that doing IOMMU configuration based on driver probe via bus->dma_configure is *fundamentally wrong* and breaking a bunch of other IOMMU API assumptions, so it is not a robust foundation to build anything upon in the first place. The problem it causes with broken groups has been known about for several years now, however it's needed a lot of work to get to the point of being able to fix it properly (FWIW that is now #2 on my priority list after getting the bus ops stuff done, which should also make it easier). Thanks, Robin. Sadly, there are deeper technical problems with all of the places doing this. There are several problematic patterns: 1) Probe a driver on device A and then steal device B and use it as part of the driver operation. Since no driver was probed to device B it means we never called bus_type.dma_configure and thus the drivers hackily try to open code this. Unfortunately nothing prevents another driver from binding to device B and creating total chaos. 
eg vfio bind triggered by userspace 2) Probe a driver on device A and then create a new platform driver B for a fwnode that doesn't have one, then do #1 This has the same essential problem as #1, the new device is never probed so the hack call to of_dma_configure() is needed to setup DMA, and we are at risk of something else trying to use the device. 3) Probe a driver on device A but the of_node was incorrect for DMA so fix it by figuring out the right node and calling of_dma_configure() This will blow up in the iommu code if the driver is unprobed because the bus_type now assumes that dma_configure and dma_cleanup are strictly paired. Since dma_configure will have done the wrong thing due to the missing of_node, dma_cleanup will be unpaired and iommu_device_unuse_default_domain() will blow up. Further the driver operating on device A will not be protected against changes to the iommu domain since it never called iommu_device_use_default_domain() At least this case will not throw a lockdep warning as
Re: [PATCH v3] misc: sram: Add DMA-BUF Heap exporting of SRAM areas
On 2023-07-13 20:13, Andrew Davis wrote: This new export type exposes the SRAM area to userspace as a DMA-BUF Heap; this allows allocations of DMA-BUFs that can be consumed by various DMA-BUF-supporting devices. Signed-off-by: Andrew Davis --- Changes from v2: - Make sram_dma_heap_allocate static (kernel test robot) - Rebase on v6.5-rc1 drivers/misc/Kconfig | 7 + drivers/misc/Makefile| 1 + drivers/misc/sram-dma-heap.c | 245 +++ drivers/misc/sram.c | 6 + drivers/misc/sram.h | 16 +++ 5 files changed, 275 insertions(+) create mode 100644 drivers/misc/sram-dma-heap.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 75e427f124b28..ee34dfb61605f 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -448,6 +448,13 @@ config SRAM config SRAM_EXEC bool +config SRAM_DMA_HEAP + bool "Export on-chip SRAM pools using DMA-Heaps" + depends on DMABUF_HEAPS && SRAM + help + This driver allows the export of on-chip SRAM marked as both pool + and exportable to userspace using the DMA-Heaps interface. 
+ config DW_XDATA_PCIE depends on PCI tristate "Synopsys DesignWare xData PCIe driver" diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index f2a4d1ff65d46..5e7516bfaa8de 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -47,6 +47,7 @@ obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/ obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o obj-$(CONFIG_SRAM)+= sram.o obj-$(CONFIG_SRAM_EXEC) += sram-exec.o +obj-$(CONFIG_SRAM_DMA_HEAP)+= sram-dma-heap.o obj-$(CONFIG_GENWQE) += genwqe/ obj-$(CONFIG_ECHO)+= echo/ obj-$(CONFIG_CXL_BASE)+= cxl/ diff --git a/drivers/misc/sram-dma-heap.c b/drivers/misc/sram-dma-heap.c new file mode 100644 index 0..c054c04dff33e --- /dev/null +++ b/drivers/misc/sram-dma-heap.c @@ -0,0 +1,245 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * SRAM DMA-Heap userspace exporter + * + * Copyright (C) 2019-2022 Texas Instruments Incorporated - https://www.ti.com/ + * Andrew Davis + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "sram.h" + +struct sram_dma_heap { + struct dma_heap *heap; + struct gen_pool *pool; +}; + +struct sram_dma_heap_buffer { + struct gen_pool *pool; + struct list_head attachments; + struct mutex attachments_lock; + unsigned long len; + void *vaddr; + phys_addr_t paddr; +}; + +struct dma_heap_attachment { + struct device *dev; + struct sg_table *table; + struct list_head list; +}; + +static int dma_heap_attach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attachment) +{ + struct sram_dma_heap_buffer *buffer = dmabuf->priv; + struct dma_heap_attachment *a; + struct sg_table *table; + + a = kzalloc(sizeof(*a), GFP_KERNEL); + if (!a) + return -ENOMEM; + + table = kmalloc(sizeof(*table), GFP_KERNEL); + if (!table) { + kfree(a); + return -ENOMEM; + } + if (sg_alloc_table(table, 1, GFP_KERNEL)) { + kfree(table); + kfree(a); + return -ENOMEM; + } + sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(buffer->paddr)), buffer->len, 0); What happens if someone 
(reasonably) assumes that this struct page pointer isn't completely made up, and dereferences it? (That's if pfn_to_page() itself doesn't blow up, which it potentially might, at least under CONFIG_SPARSEMEM) I think this needs to be treated as P2PDMA if it's going to have any hope of working robustly. + + a->table = table; + a->dev = attachment->dev; + INIT_LIST_HEAD(&a->list); + + attachment->priv = a; + + mutex_lock(&buffer->attachments_lock); + list_add(&a->list, &buffer->attachments); + mutex_unlock(&buffer->attachments_lock); + + return 0; +} + +static void dma_heap_detatch(struct dma_buf *dmabuf, +struct dma_buf_attachment *attachment) +{ + struct sram_dma_heap_buffer *buffer = dmabuf->priv; + struct dma_heap_attachment *a = attachment->priv; + + mutex_lock(&buffer->attachments_lock); + list_del(&a->list); + mutex_unlock(&buffer->attachments_lock); + + sg_free_table(a->table); + kfree(a->table); + kfree(a); +} + +static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment, +enum dma_data_direction direction) +{ + struct dma_heap_attachment *a = attachment->priv; + struct sg_table *table = a->table; + + /* +* As this heap is backed by uncached SRAM memory we do not need to +* perform any sync operations on the buffer before allowing device +
Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation
On 2023-08-11 17:56, Daniel Stone wrote: Hi, On 11/08/2023 17:35, Robin Murphy wrote: On 2023-08-09 17:53, Boris Brezillon wrote: +obj-$(CONFIG_DRM_PANTHOR) += panthor.o FWIW I still think it would be nice to have a minor directory/Kconfig/Makefile reshuffle and a trivial bit of extra registration glue to build both drivers into a single module. It seems like it could be a perpetual source of confusion to end users where Mesa "panfrost" is the right option but kernel "panfrost" is the wrong one. Especially when pretty much every other GPU driver is also just one big top-level module to load for many different generations of hardware. Plus it would mean that if someone did want to have a go at deduplicating the resource-wrangling boilerplate for OPPs etc. in future, there's more chance of being able to do so meaningfully. It might be nice to point it out, but to be fair Intel and AMD both have two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali. Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to i915 what lima is to panfrost. It was more that unlike the others where there's a pretty clear line in the sand between "driver for old hardware" and "driver for the majority of recent hardware", this one happens to fall splat in the middle of the current major generation such that panfrost is the correct module for Mali Bifrost but also the wrong one for Mali Bifrost... :/ I can see the point, but otoh if someone's managed to build all the right regulator/clock/etc modules to get a working system, they'll probably manage to figure the GPU side out? Maybe; either way I guess it's not really my concern, since I'm the only user that *I* have to support, and I do already understand it. From the upstream perspective I mostly just want to hold on to the hope of not having to write my io-pgtable bugs twice over if at all possible :) Cheers, Robin.
Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation
On 2023-08-09 17:53, Boris Brezillon wrote: Now that all blocks are available, we can add/update Kconfig/Makefile files to allow compilation. v2: - Rename the driver (pancsf -> panthor) - Change the license (GPL2 -> MIT + GPL2) - Split the driver addition commit - Add new dependencies on GPUVA and DRM_SCHED Signed-off-by: Boris Brezillon --- drivers/gpu/drm/Kconfig | 2 ++ drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/panthor/Kconfig | 16 drivers/gpu/drm/panthor/Makefile | 15 +++ 4 files changed, 34 insertions(+) create mode 100644 drivers/gpu/drm/panthor/Kconfig create mode 100644 drivers/gpu/drm/panthor/Makefile diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 2a44b9419d4d..bddfbdb2ffee 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -358,6 +358,8 @@ source "drivers/gpu/drm/lima/Kconfig" source "drivers/gpu/drm/panfrost/Kconfig" +source "drivers/gpu/drm/panthor/Kconfig" + source "drivers/gpu/drm/aspeed/Kconfig" source "drivers/gpu/drm/mcde/Kconfig" diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 215e78e79125..0a260727505f 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -188,6 +188,7 @@ obj-$(CONFIG_DRM_TVE200) += tve200/ obj-$(CONFIG_DRM_XEN) += xen/ obj-$(CONFIG_DRM_VBOXVIDEO) += vboxvideo/ obj-$(CONFIG_DRM_LIMA) += lima/ +obj-$(CONFIG_DRM_PANTHOR) += panthor/ obj-$(CONFIG_DRM_PANFROST) += panfrost/ obj-$(CONFIG_DRM_ASPEED_GFX) += aspeed/ obj-$(CONFIG_DRM_MCDE) += mcde/ diff --git a/drivers/gpu/drm/panthor/Kconfig b/drivers/gpu/drm/panthor/Kconfig new file mode 100644 index ..a9d17b1bbb75 --- /dev/null +++ b/drivers/gpu/drm/panthor/Kconfig @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 or MIT + +config DRM_PANTHOR + tristate "Panthor (DRM support for ARM Mali CSF-based GPUs)" + depends on DRM + depends on ARM || ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64) + depends on MMU + select DRM_EXEC + select DRM_SCHED + select IOMMU_SUPPORT + select IOMMU_IO_PGTABLE_LPAE + 
select DRM_GEM_SHMEM_HELPER + select PM_DEVFREQ + select DEVFREQ_GOV_SIMPLE_ONDEMAND + help + DRM driver for ARM Mali CSF-based GPUs. diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile new file mode 100644 index ..64193a484879 --- /dev/null +++ b/drivers/gpu/drm/panthor/Makefile @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0 or MIT + +panthor-y := \ + panthor_devfreq.o \ + panthor_device.o \ + panthor_drv.o \ + panthor_gem.o \ + panthor_gpu.o \ + panthor_heap.o \ + panthor_heap.o \ + panthor_fw.o \ + panthor_mmu.o \ + panthor_sched.o + +obj-$(CONFIG_DRM_PANTHOR) += panthor.o FWIW I still think it would be nice to have a minor directory/Kconfig/Makefile reshuffle and a trivial bit of extra registration glue to build both drivers into a single module. It seems like it could be a perpetual source of confusion to end users where Mesa "panfrost" is the right option but kernel "panfrost" is the wrong one. Especially when pretty much every other GPU driver is also just one big top-level module to load for many different generations of hardware. Plus it would mean that if someone did want to have a go at deduplicating the resource-wrangling boilerplate for OPPs etc. in future, there's more chance of being able to do so meaningfully. Cheers, Robin.
Re: [PATCH] iommu: Explicitly include correct DT includes
On 14/07/2023 6:46 pm, Rob Herring wrote: The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Thanks Rob; FWIW, Acked-by: Robin Murphy I guess you're hoping for Joerg to pick this up? However I wouldn't foresee any major conflicts if you do need to take it through the OF tree. Cheers, Robin. Signed-off-by: Rob Herring --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +- drivers/iommu/arm/arm-smmu/arm-smmu.c| 1 - drivers/iommu/arm/arm-smmu/qcom_iommu.c | 3 +-- drivers/iommu/ipmmu-vmsa.c | 1 - drivers/iommu/sprd-iommu.c | 1 + drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 2 +- 7 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c index b5b14108e086..bb89d49adf8d 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c @@ -3,7 +3,7 @@ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved. 
*/ -#include +#include #include #include diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index a86acd76c1df..d6d1a2a55cc0 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c index a503ed758ec3..cc3f68a3516c 100644 --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c @@ -22,8 +22,7 @@ #include #include #include -#include -#include +#include #include #include #include diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 9f64c5c9f5b9..0aeedd3e1494 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -17,7 +17,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c index 39e34fdeccda..51144c232474 100644 --- a/drivers/iommu/sprd-iommu.c +++ b/drivers/iommu/sprd-iommu.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 1cbf063ccf14..e445f80d0226 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -9,7 +9,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 3551ed057774..17dcd826f5c2 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c
On 27/06/2023 11:24 am, Greg Kroah-Hartman wrote: On Tue, Jun 27, 2023 at 11:54:23AM +0200, Petr Tesarik wrote: +/** + * is_swiotlb_active() - check if the software IO TLB is initialized + * @dev: Device to check, or %NULL for the default IO TLB. + */ bool is_swiotlb_active(struct device *dev) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev + ? dev->dma_io_tlb_mem + : &io_tlb_default_mem; That's impossible to read and maintain over time, sorry. Please use real "if () else" lines, so that it can be maintained over time. Moreover, it makes for a horrible interface anyway. If there's a need for a non-specific "is SWIOTLB present at all?" check unrelated to any particular device (which arguably still smells of poking into implementation details...), please encapsulate it in its own distinct helper like, say, is_swiotlb_present(void). However, the more I think about it, the more I doubt that logic like octeon_pci_setup() can continue to work properly at all if SWIOTLB allocation becomes dynamic... :/ Thanks, Robin.
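For illustration, Greg's requested "real if () else" form plus the distinct device-agnostic helper Robin suggests might look roughly like the following. This is a userspace sketch with stand-in types, not the actual patch; `is_swiotlb_present()` is a hypothetical name floated in the review, and the real definitions live in `<linux/swiotlb.h>` and `<linux/device.h>`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles outside the kernel. */
struct io_tlb_mem { unsigned long nslabs; };
struct device { struct io_tlb_mem *dma_io_tlb_mem; };

static struct io_tlb_mem io_tlb_default_mem;

/* The ternary rewritten as plain if/else, per Greg's comment. */
bool is_swiotlb_active(struct device *dev)
{
	struct io_tlb_mem *mem;

	if (dev)
		mem = dev->dma_io_tlb_mem;
	else
		mem = &io_tlb_default_mem;

	return mem && mem->nslabs != 0;
}

/*
 * Robin's suggested distinct helper for the "is SWIOTLB present at
 * all?" question, so callers don't pass NULL as a magic value.
 */
bool is_swiotlb_present(void)
{
	return io_tlb_default_mem.nslabs != 0;
}
```

The point of the split interface is that a NULL device stops being an overloaded request for "the default IO TLB" and the two questions get separate, self-describing entry points.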
Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device
On 2023-06-20 10:47, Sui Jingfeng wrote: From: Sui Jingfeng Loongson CPUs maintain cache coherency by hardware, which means that the data in the CPU cache is identical to the data in main system memory. As for peripheral devices, most Loongson chips define the peripherals as DMA coherent by default, so device drivers do not need to maintain coherency between a processor and an I/O device manually. There are exceptions: on the LS2K1000 SoC, some peripheral devices can be configured as DMA non-coherent, but no released firmware with such a configuration exists in the market. Peripherals of the older LS2K1000 are also DMA non-coherent, but those parts are nearly outdated. So those are trivial cases. Nevertheless, kernel space still needs to do the probe work, because the Vivante GPU IP has been integrated into various platforms. Hence, this patch adds runtime detection code to probe whether a specific GPU is DMA coherent; if the answer is yes, we are going to utilize such features. On Loongson platforms, when a buffer is accessed by both the GPU and the CPU, the driver should prefer ETNA_BO_CACHED over ETNA_BO_WC. This patch also adds a new parameter, etnaviv_param_gpu_coherent, which allows userspace to know whether such a feature is available, because a write-combined BO is still preferred in some cases, especially where CPU reads are not needed, for example when uploading compiled shader binaries.
Cc: Lucas Stach Cc: Christian Gmeiner Cc: Philipp Zabel Cc: Bjorn Helgaas Cc: Daniel Vetter Signed-off-by: Sui Jingfeng --- drivers/gpu/drm/etnaviv/etnaviv_drv.c | 35 + drivers/gpu/drm/etnaviv/etnaviv_drv.h | 6 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 22 ++--- drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 7 - drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +++ include/uapi/drm/etnaviv_drm.h | 1 + 6 files changed, 70 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index 0a365e96d371..d8e788aa16cb 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -5,7 +5,9 @@ #include #include +#include /* * This header is for implementations of dma_map_ops and related code. * It should not be included in drivers just using the DMA API. */ #include +#include #include #include @@ -24,6 +26,34 @@ #include "etnaviv_pci_drv.h" #include "etnaviv_perfmon.h" +static struct device_node *etnaviv_of_first_available_node(void) +{ + struct device_node *core_node; + + for_each_compatible_node(core_node, NULL, "vivante,gc") { + if (of_device_is_available(core_node)) + return core_node; + } + + return NULL; +} + +static bool etnaviv_is_dma_coherent(struct device *dev) +{ + struct device_node *np; + bool coherent; + + np = etnaviv_of_first_available_node(); + if (np) { + coherent = of_dma_is_coherent(np); + of_node_put(np); + } else { + coherent = dev_is_dma_coherent(dev); + } Please use device_get_dma_attr() like other well-behaved drivers. + + return coherent; +} + /* * etnaviv private data construction and destructions: */ @@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device *drm) return ERR_PTR(-ENOMEM); } + priv->dma_coherent = etnaviv_is_dma_coherent(dev); + + if (priv->dma_coherent) + drm_info(drm, "%s is dma coherent\n", dev_name(dev)); I'm pretty sure the end-user doesn't care. 
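For reference, `device_get_dma_attr()` (the API Robin points at, from `<linux/property.h>`) answers the coherency question in one firmware-agnostic call, distinguishing coherent, non-coherent, and "firmware says nothing" cases across both DT and ACPI. A rough userspace sketch with stubbed kernel types — the enum values mirror the kernel's, but the stub lookup is of course an assumption:

```c
#include <stdbool.h>

/* Stand-ins mirroring the kernel's enum dev_dma_attr; in the kernel,
 * device_get_dma_attr() consults the DT/ACPI firmware description. */
enum dev_dma_attr {
	DEV_DMA_NOT_SUPPORTED,
	DEV_DMA_NON_COHERENT,
	DEV_DMA_COHERENT,
};

struct device { enum dev_dma_attr fw_dma_attr; };

static enum dev_dma_attr device_get_dma_attr(struct device *dev)
{
	return dev->fw_dma_attr;	/* stub for the firmware lookup */
}

/* The whole OF-walking probe collapses to a single well-behaved call. */
static bool etnaviv_is_dma_coherent(struct device *dev)
{
	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
}
```

This avoids open-coding `for_each_compatible_node()` plus a `dev_is_dma_coherent()` fallback in the driver, which is the "like other well-behaved drivers" part of the comment.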
+ return priv; } diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h index 9cd72948cfad..644e5712c050 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h @@ -46,6 +46,12 @@ struct etnaviv_drm_private { struct xarray active_contexts; u32 next_context_id; + /* +* If true, the GPU is capable of snooping cpu cache. Here, it +* also means that cache coherency is enforced by the hardware. +*/ + bool dma_coherent; + /* list of GEM objects: */ struct mutex gem_lock; struct list_head gem_list; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c index b5f73502e3dd..39bdc3774f2d 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c @@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj) static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj) { struct page **pages; + pgprot_t prot; lockdep_assert_held(>lock); @@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj) if (IS_ERR(pages)) return NULL; - return vmap(pages, obj->base.size >> PAGE_SHIFT, -
Re: [PATCH v2 09/25] iommu/fsl_pamu: Implement an IDENTITY domain
On 2023-06-01 20:46, Jason Gunthorpe wrote: On Thu, Jun 01, 2023 at 08:37:45PM +0100, Robin Murphy wrote: On 2023-05-16 01:00, Jason Gunthorpe wrote: Robin was able to check the documentation and what fsl_pamu has historically called detach_dev() is really putting the IOMMU into an IDENTITY mode. Unfortunately it was the other way around - it's the call to fsl_setup_liodns() from fsl_pamu_probe() which leaves everything in bypass by default (the PAACE_ATM_NO_XLATE part, IIRC), whereas the detach_device() call here ends up disabling the given device's LIODN altogether Er, I see.. Let me think about it, you convinced me to change it from PLATFORM, so maybe we should go back to that if it is all wonky. FWIW I was thinking more along the lines of a token nominal identity domain where attach does nothing at all... There doesn't appear to have ever been any code anywhere for putting things *back* into bypass after using a VFIO domain, so as-is these default domains would probably break all DMA :( Sounds like it just never worked right. ie going to VFIO mode was always a one way trip and you can't go back to a kernel driver. ...on the assumption that doing so wouldn't really be any less broken than it always has been :) Thanks, Robin. I don't think this patch makes it worse because we call the identity attach_dev in all the same places we called detach_dev in the first place. We add an extra call at the start of time, but that call is NOP'd by this: if (domain == platform_domain || !domain) + return 0; + (bah, and the variable name needs updating too) Honestly, I don't really want to fix FSL since it seems abandoned, so either this patch or going back to PLATFORM seems like the best option. Jason
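The "token nominal identity domain where attach does nothing at all" that Robin floats could be sketched as below. This is a hypothetical illustration with stand-in types (the real ones live in `<linux/iommu.h>`), and whether this is preferable to reverting to PLATFORM is exactly the open question in the thread.

```c
#include <stddef.h>

/* Minimal stand-ins for the iommu core types used by the sketch. */
struct device;
struct iommu_domain;
struct iommu_domain_ops {
	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
};
struct iommu_domain {
	int type;
	const struct iommu_domain_ops *ops;
};
#define IOMMU_DOMAIN_IDENTITY 1	/* placeholder value */

/*
 * Attach does nothing: fsl_setup_liodns() already left the hardware in
 * bypass by default, so "identity" is simply "leave it alone" rather
 * than disabling the LIODN the way detach_device() historically did.
 */
static int fsl_pamu_identity_attach(struct iommu_domain *domain,
				    struct device *dev)
{
	return 0;
}

static const struct iommu_domain_ops fsl_pamu_identity_ops = {
	.attach_dev = fsl_pamu_identity_attach,
};

static struct iommu_domain fsl_pamu_identity_domain = {
	.type = IOMMU_DOMAIN_IDENTITY,
	.ops = &fsl_pamu_identity_ops,
};
```

The design point is that such a domain makes no hardware changes at all, on the assumption that doing nothing is no more broken than the historical behaviour.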
Re: [PATCH v2 25/25] iommu: Convert remaining simple drivers to domain_alloc_paging()
On 2023-05-16 01:00, Jason Gunthorpe wrote: These drivers don't support IOMMU_DOMAIN_DMA, so this commit effectively allows them to support that mode. The prior work to require default_domains makes this safe because every one of these drivers is either compilation incompatible with dma-iommu.c, or already establishing a default_domain. In both cases alloc_domain() will never be called with IOMMU_DOMAIN_DMA for these drivers so it is safe to drop the test. Removing these tests clarifies that the domain allocation path is only about the functionality of a paging domain and has nothing to do with policy of how the paging domain is used for UNMANAGED/DMA/DMA_FQ. Tested-by: Niklas Schnelle Signed-off-by: Jason Gunthorpe --- drivers/iommu/fsl_pamu_domain.c | 7 ++- drivers/iommu/msm_iommu.c | 7 ++- drivers/iommu/mtk_iommu_v1.c| 7 ++- drivers/iommu/omap-iommu.c | 7 ++- drivers/iommu/s390-iommu.c | 7 ++- 5 files changed, 10 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index ca4f5ebf028783..8d5d6a3acf9dfd 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -192,13 +192,10 @@ static void fsl_pamu_domain_free(struct iommu_domain *domain) kmem_cache_free(fsl_pamu_domain_cache, dma_domain); } -static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type) +static struct iommu_domain *fsl_pamu_domain_alloc_paging(struct device *dev) This isn't a paging domain - it doesn't support map/unmap, and AFAICT all it has ever been intended to do is "isolate" accesses to within an aperture which is never set to anything less than the entire physical address space :/ I hate to imagine what the VFIO userspace applications looked like... Thanks, Robin. 
{ struct fsl_dma_domain *dma_domain; - if (type != IOMMU_DOMAIN_UNMANAGED) - return NULL; - dma_domain = kmem_cache_zalloc(fsl_pamu_domain_cache, GFP_KERNEL); if (!dma_domain) return NULL; @@ -476,7 +473,7 @@ static const struct iommu_ops fsl_pamu_ops = { .identity_domain = _pamu_identity_domain, .def_domain_type = _pamu_def_domain_type, .capable= fsl_pamu_capable, - .domain_alloc = fsl_pamu_domain_alloc, + .domain_alloc_paging = fsl_pamu_domain_alloc_paging, .probe_device = fsl_pamu_probe_device, .device_group = fsl_pamu_device_group, .default_domain_ops = &(const struct iommu_domain_ops) { diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index 26ed81cfeee897..a163cee0b7242d 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -302,13 +302,10 @@ static void __program_context(void __iomem *base, int ctx, SET_M(base, ctx, 1); } -static struct iommu_domain *msm_iommu_domain_alloc(unsigned type) +static struct iommu_domain *msm_iommu_domain_alloc_paging(struct device *dev) { struct msm_priv *priv; - if (type != IOMMU_DOMAIN_UNMANAGED) - return NULL; - priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) goto fail_nomem; @@ -691,7 +688,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id) static struct iommu_ops msm_iommu_ops = { .identity_domain = _iommu_identity_domain, - .domain_alloc = msm_iommu_domain_alloc, + .domain_alloc_paging = msm_iommu_domain_alloc_paging, .probe_device = msm_iommu_probe_device, .device_group = generic_device_group, .pgsize_bitmap = MSM_IOMMU_PGSIZES, diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 7c0c1d50df5f75..67e044c1a7d93b 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -270,13 +270,10 @@ static int mtk_iommu_v1_domain_finalise(struct mtk_iommu_v1_data *data) return 0; } -static struct iommu_domain *mtk_iommu_v1_domain_alloc(unsigned type) +static struct iommu_domain *mtk_iommu_v1_domain_alloc_paging(struct device *dev) { 
struct mtk_iommu_v1_domain *dom; - if (type != IOMMU_DOMAIN_UNMANAGED) - return NULL; - dom = kzalloc(sizeof(*dom), GFP_KERNEL); if (!dom) return NULL; @@ -585,7 +582,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data) static const struct iommu_ops mtk_iommu_v1_ops = { .identity_domain = _iommu_v1_identity_domain, - .domain_alloc = mtk_iommu_v1_domain_alloc, + .domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging, .probe_device = mtk_iommu_v1_probe_device, .probe_finalize = mtk_iommu_v1_probe_finalize, .release_device = mtk_iommu_v1_release_device, diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c index 34340ef15241bc..fcf99bd195b32e 100644 --- a/drivers/iommu/omap-iommu.c +++ b/drivers/iommu/omap-iommu.c @@ -1580,13
Re: [PATCH v2 09/25] iommu/fsl_pamu: Implement an IDENTITY domain
On 2023-05-16 01:00, Jason Gunthorpe wrote: Robin was able to check the documentation and what fsl_pamu has historically called detach_dev() is really putting the IOMMU into an IDENTITY mode. Unfortunately it was the other way around - it's the call to fsl_setup_liodns() from fsl_pamu_probe() which leaves everything in bypass by default (the PAACE_ATM_NO_XLATE part, IIRC), whereas the detach_device() call here ends up disabling the given device's LIODN altogether. There doesn't appear to have ever been any code anywhere for putting things *back* into bypass after using a VFIO domain, so as-is these default domains would probably break all DMA :( Thanks, Robin. Move to the new core support for ARM_DMA_USE_IOMMU by defining ops->identity_domain. This is a ppc driver without any dma_ops, so ensure the identity translation is the default domain. Signed-off-by: Jason Gunthorpe --- drivers/iommu/fsl_pamu_domain.c | 32 +--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index bce37229709965..ca4f5ebf028783 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -283,15 +283,21 @@ static int fsl_pamu_attach_device(struct iommu_domain *domain, return ret; } -static void fsl_pamu_set_platform_dma(struct device *dev) +static int fsl_pamu_identity_attach(struct iommu_domain *platform_domain, + struct device *dev) { struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain); + struct fsl_dma_domain *dma_domain; const u32 *prop; int len; struct pci_dev *pdev = NULL; struct pci_controller *pci_ctl; + if (domain == platform_domain || !domain) + return 0; + + dma_domain = to_fsl_dma_domain(domain); + /* * Use LIODN of the PCI controller while detaching a * PCI device. 
@@ -312,8 +318,18 @@ static void fsl_pamu_set_platform_dma(struct device *dev) detach_device(dev, dma_domain); else pr_debug("missing fsl,liodn property at %pOF\n", dev->of_node); + return 0; } +static struct iommu_domain_ops fsl_pamu_identity_ops = { + .attach_dev = fsl_pamu_identity_attach, +}; + +static struct iommu_domain fsl_pamu_identity_domain = { + .type = IOMMU_DOMAIN_IDENTITY, + .ops = _pamu_identity_ops, +}; + /* Set the domain stash attribute */ int fsl_pamu_configure_l1_stash(struct iommu_domain *domain, u32 cpu) { @@ -447,12 +463,22 @@ static struct iommu_device *fsl_pamu_probe_device(struct device *dev) return _iommu; } +static int fsl_pamu_def_domain_type(struct device *dev) +{ + /* +* This platform does not use dma_ops at all so the normally the iommu +* must be in identity mode +*/ + return IOMMU_DOMAIN_IDENTITY; +} + static const struct iommu_ops fsl_pamu_ops = { + .identity_domain = _pamu_identity_domain, + .def_domain_type = _pamu_def_domain_type, .capable= fsl_pamu_capable, .domain_alloc = fsl_pamu_domain_alloc, .probe_device = fsl_pamu_probe_device, .device_group = fsl_pamu_device_group, - .set_platform_dma_ops = fsl_pamu_set_platform_dma, .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = fsl_pamu_attach_device, .iova_to_phys = fsl_pamu_iova_to_phys,
Re: [PATCH v2 23/25] iommu: Add ops->domain_alloc_paging()
On 2023-05-16 01:00, Jason Gunthorpe wrote: This callback requests the driver to create only a __IOMMU_DOMAIN_PAGING domain, so it saves a few lines in a lot of drivers needlessly checking the type. More critically, this allows us to sweep out all the IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA checks from a lot of the drivers, simplifying what is going on in the code and ultimately removing the now-unused special cases in drivers where they did not support IOMMU_DOMAIN_DMA. domain_alloc_paging() should return a struct iommu_domain that is functionally compatible with ARM_DMA_USE_IOMMU, dma-iommu.c and iommufd. Be forward-looking and pass in a 'struct device *' argument. We can provide this when allocating the default_domain. No drivers will look at this. As mentioned before, we already know we're going to need additional flags (and possibly data) to cover the existing set_pgtable_quirks use-case plus new stuff like the proposed dirty-tracking enable, so I'd be inclined to either add an extensible structure argument now to avoid future churn, or just not bother adding the device argument either until drivers can actually use it.
Signed-off-by: Jason Gunthorpe --- drivers/iommu/iommu.c | 18 +++--- include/linux/iommu.h | 3 +++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c4cac1dcf80610..15aa51c356bd74 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1995,14 +1995,25 @@ void iommu_set_fault_handler(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_set_fault_handler); static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops, +struct device *dev, unsigned int type) { struct iommu_domain *domain; if (type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain) return ops->identity_domain; + else if ((type == IOMMU_DOMAIN_UNMANAGED || type == IOMMU_DOMAIN_DMA) && +ops->domain_alloc_paging) { + /* +* For now exclude DMA_FQ since it is still a driver policy +* decision through domain_alloc() if we can use FQ mode. +*/ That's sorted now, so the type test can neatly collapse down to "type & __IOMMU_DOMAIN_PAGING". Thanks, Robin. 
+ domain = ops->domain_alloc_paging(dev); + } else if (ops->domain_alloc) + domain = ops->domain_alloc(type); + else + return NULL; - domain = ops->domain_alloc(type); if (!domain) return NULL; @@ -2033,14 +2044,15 @@ __iommu_group_domain_alloc(struct iommu_group *group, unsigned int type) lockdep_assert_held(>mutex); - return __iommu_domain_alloc(dev_iommu_ops(dev), type); + return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type); } struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus) { if (bus == NULL || bus->iommu_ops == NULL) return NULL; - return __iommu_domain_alloc(bus->iommu_ops, IOMMU_DOMAIN_UNMANAGED); + return __iommu_domain_alloc(bus->iommu_ops, NULL, + IOMMU_DOMAIN_UNMANAGED); } EXPORT_SYMBOL_GPL(iommu_domain_alloc); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 387746f8273c99..18b0df42cc80d1 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -227,6 +227,8 @@ struct iommu_iotlb_gather { * struct iommu_ops - iommu ops and capabilities * @capable: check capability * @domain_alloc: allocate iommu domain + * @domain_alloc_paging: Allocate an iommu_domain that can be used for + * UNMANAGED, DMA, and DMA_FQ domain types. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -258,6 +260,7 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); + struct iommu_domain *(*domain_alloc_paging)(struct device *dev); struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev);
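The collapsed check Robin describes works because the kernel's domain types are built from flag bits, with `__IOMMU_DOMAIN_PAGING` set for every type that supports map/unmap. A sketch, with flag values assumed to mirror `include/linux/iommu.h`:

```c
#include <stdbool.h>

/* Flag values assumed to mirror include/linux/iommu.h. */
#define __IOMMU_DOMAIN_PAGING	(1U << 0)	/* supports map()/unmap() */
#define __IOMMU_DOMAIN_DMA_API	(1U << 1)
#define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)

#define IOMMU_DOMAIN_BLOCKED	(0U)
#define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
#define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING | __IOMMU_DOMAIN_DMA_API)
#define IOMMU_DOMAIN_DMA_FQ	(IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_DMA_FQ)

/* "(type == UNMANAGED || type == DMA || type == DMA_FQ)" becomes: */
static bool type_wants_paging_domain(unsigned int type)
{
	return type & __IOMMU_DOMAIN_PAGING;
}
```

With DMA_FQ selection sorted in the core as noted, the single flag test covers all three paging-capable types and nothing else.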
Re: [PATCH v2 08/25] iommu: Allow an IDENTITY domain as the default_domain in ARM32
On 2023-05-16 01:00, Jason Gunthorpe wrote: Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quite different. dma-iommu.c expects the core code to set up an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangement, when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain get called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mtk_v1 as all it does is set an existing global static identity domain.
mtk_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Signed-off-by: Jason Gunthorpe --- drivers/iommu/iommu.c | 40 +- drivers/iommu/mtk_iommu_v1.c | 12 -- drivers/iommu/rockchip-iommu.c | 10 - 3 files changed, 35 insertions(+), 27 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 8ba90571449cec..bed7cb6e5ee65b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1757,18 +1757,48 @@ static int iommu_get_default_domain_type(struct iommu_group *group, int type; lockdep_assert_held(&group->mutex); + + /* +* ARM32 drivers supporting CONFIG_ARM_DMA_USE_IOMMU can declare an +* identity_domain and it will automatically become their default +* domain. Later on ARM_DMA_USE_IOMMU will install its UNMANAGED domain. +* Override the selection to IDENTITY if we are sure the driver supports +* it. +*/ + if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) && ops->identity_domain) { If I cared about arm-smmu on 32-bit, I'd bring that up again, but honestly I'm not sure that I do... I think it might end up working after patch #21, and it's currently still broken for lack of .set_platform_dma anyway, so meh. + type = IOMMU_DOMAIN_IDENTITY; + if (best_type && type && best_type != type) + goto err; + best_type = target_type = IOMMU_DOMAIN_IDENTITY; + } + for_each_group_device(group, gdev) { type = best_type; if (ops->def_domain_type) { type = ops->def_domain_type(gdev->dev); - if (best_type && type && best_type != type) + if (best_type && type && best_type != type) { + /* Stick with the last driver override we saw */ + best_type = type; goto err; + } } if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted) { - type = IOMMU_DOMAIN_DMA; - if (best_type && type && best_type != type) - goto err; + /* +* We don't have any way for the iommu core code to +* force arm_iommu to activate so we can't enforce +* trusted. Log it and keep going with the IDENTITY +* default domain.
+*/ + if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) { + dev_warn( + gdev->dev, + "PCI device is untrusted but ARM32 does not support secure IOMMU operation, continuing anyway.\n"); To within experimental error, this is dead code. The ARM DMA ops don't
Re: [PATCH v2 04/25] iommu: Add IOMMU_DOMAIN_PLATFORM for S390
On 2023-05-16 01:00, Jason Gunthorpe wrote: The PLATFORM domain will be set as the default domain and attached as normal during probe. The driver will ignore the initial attach from a NULL domain to the PLATFORM domain. After this, the PLATFORM domain's attach_dev will be called whenever we detach from an UNMANAGED domain (eg for VFIO). This is the same time the original design would have called op->detach_dev(). This is temporary until the S390 dma-iommu.c conversion is merged. If we do need a stopgap here, can we please just call the current situation an identity domain? It's true enough in the sense that the IOMMU API is not offering any translation or guarantee of isolation, so the semantics of an identity domain - from the point of view of anything inside the IOMMU API that would be looking - are no weaker or less useful than a "platform" domain whose semantics are intentionally unknown. Then similarly for patch #3 - since we already know s390 is temporary, it seems anathema to introduce a whole domain type with its own weird ops->default_domain mechanism solely for POWER to not actually use domains with. In terms of reasoning, I don't see that IOMMU_DOMAIN_PLATFORM is any more useful than a NULL default domain, it just renames the problem, and gives us more code to maintain for the privilege. As I say, though, we don't actually need to juggle the semantics of a "we don't know what's happening here" domain around any further, since it works out that a "we're not influencing anything here" domain actually suffices for what we want to reason about, and those are already well-defined. Sure, the platform DMA ops *might* be doing more, but that's beyond the scope of the IOMMU API either way. At that point, lo and behold, s390 and POWER now look just like ARM and the core code only needs a single special case for arch-specific default identity domains, lovely! Thanks, Robin.
Tested-by: Heiko Stuebner Tested-by: Niklas Schnelle Signed-off-by: Jason Gunthorpe --- drivers/iommu/s390-iommu.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c index fbf59a8db29b11..f0c867c57a5b9b 100644 --- a/drivers/iommu/s390-iommu.c +++ b/drivers/iommu/s390-iommu.c @@ -142,14 +142,31 @@ static int s390_iommu_attach_device(struct iommu_domain *domain, return 0; } -static void s390_iommu_set_platform_dma(struct device *dev) +/* + * Switch control over the IOMMU to S390's internal dma_api ops + */ +static int s390_iommu_platform_attach(struct iommu_domain *platform_domain, + struct device *dev) { struct zpci_dev *zdev = to_zpci_dev(dev); + if (!zdev->s390_domain) + return 0; + __s390_iommu_detach_device(zdev); zpci_dma_init_device(zdev); + return 0; } +static struct iommu_domain_ops s390_iommu_platform_ops = { + .attach_dev = s390_iommu_platform_attach, +}; + +static struct iommu_domain s390_iommu_platform_domain = { + .type = IOMMU_DOMAIN_PLATFORM, + .ops = _iommu_platform_ops, +}; + static void s390_iommu_get_resv_regions(struct device *dev, struct list_head *list) { @@ -428,12 +445,12 @@ void zpci_destroy_iommu(struct zpci_dev *zdev) } static const struct iommu_ops s390_iommu_ops = { + .default_domain = _iommu_platform_domain, .capable = s390_iommu_capable, .domain_alloc = s390_domain_alloc, .probe_device = s390_iommu_probe_device, .release_device = s390_iommu_release_device, .device_group = generic_device_group, - .set_platform_dma_ops = s390_iommu_set_platform_dma, .pgsize_bitmap = SZ_4K, .get_resv_regions = s390_iommu_get_resv_regions, .default_domain_ops = &(const struct iommu_domain_ops) {
Re: [PATCH 4/7] drm/apu: Add support of IOMMU
On 2023-05-17 15:52, Alexandre Bailon wrote: Some APU devices are behind an IOMMU. For some of these devices, we can't use DMA API because they use static addresses so we have to manually use IOMMU API to correctly map the buffers. Except you still need to use the DMA for the sake of cache coherency and any other aspects :( This adds support of IOMMU. Signed-off-by: Alexandre Bailon Reviewed-by: Julien Stephan --- drivers/gpu/drm/apu/apu_drv.c | 4 + drivers/gpu/drm/apu/apu_gem.c | 174 + drivers/gpu/drm/apu/apu_internal.h | 16 +++ drivers/gpu/drm/apu/apu_sched.c| 28 + include/uapi/drm/apu_drm.h | 12 +- 5 files changed, 233 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index b6bd340b2bc8..a0dce785a02a 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -23,6 +23,10 @@ static const struct drm_ioctl_desc ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap, + DRM_RENDER_ALLOW), }; DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c index 0e7b3b27942c..0a91363754c5 100644 --- a/drivers/gpu/drm/apu/apu_gem.c +++ b/drivers/gpu/drm/apu/apu_gem.c @@ -2,6 +2,9 @@ // // Copyright 2020 BayLibre SAS +#include +#include + #include #include @@ -42,6 +45,7 @@ int ioctl_gem_new(struct drm_device *dev, void *data, */ apu_obj->size = args->size; apu_obj->offset = 0; + apu_obj->iommu_refcount = 0; mutex_init(_obj->mutex); ret = drm_gem_handle_create(file_priv, gem_obj, >handle); @@ -54,3 +58,173 @@ int ioctl_gem_new(struct drm_device *dev, void *data, return 0; } + +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj) +{ + int iova_pfn; + int i; + + if (!obj->iommu_sgt) + return; + + mutex_lock(>mutex); + 
obj->iommu_refcount--; + if (obj->iommu_refcount) { + mutex_unlock(>mutex); + return; + } + + iova_pfn = PHYS_PFN(obj->iova); Using mm layer operations on IOVAs looks wrong. In practice I don't think it's ultimately harmful, other than potentially making less efficient use of IOVA space if the CPU page size is larger than the IOMMU page size, but it's still a bad code smell when you're using an IOVA abstraction that is deliberately decoupled from CPU pages. + for (i = 0; i < obj->iommu_sgt->nents; i++) { + iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn), + PAGE_ALIGN(obj->iommu_sgt->sgl[i].length)); + iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length)); You can unmap a set of IOVA-contiguous mappings as a single range with one call. + } + + sg_free_table(obj->iommu_sgt); + kfree(obj->iommu_sgt); + + free_iova(_drm->iovad, PHYS_PFN(obj->iova)); + mutex_unlock(>mutex); +} + +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj) +{ + if (obj->funcs) + return obj->funcs->get_sg_table(obj); + return NULL; +} + +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj) +{ + struct apu_gem_object *apu_obj = to_apu_bo(obj); + struct scatterlist *sgl; + phys_addr_t phys; + int total_buf_space; + int iova_pfn; + int iova; + int ret; + int i; + + mutex_lock(_obj->mutex); + apu_obj->iommu_refcount++; + if (apu_obj->iommu_refcount != 1) { + mutex_unlock(_obj->mutex); + return 0; + } + + apu_obj->iommu_sgt = apu_get_sg_table(obj); + if (IS_ERR(apu_obj->iommu_sgt)) { + mutex_unlock(_obj->mutex); + return PTR_ERR(apu_obj->iommu_sgt); + } + + total_buf_space = obj->size; + iova_pfn = alloc_iova_fast(_drm->iovad, + total_buf_space >> PAGE_SHIFT, + apu_drm->iova_limit_pfn, true); If you need things mapped at specific addresses like the commit message claims, the DMA IOVA allocator is a terrible tool for the job. 
DRM already has its own more flexible abstraction for address space management in the form of drm_mm, so as a DRM driver it would seem a lot more sensible to use one of those. And even if you could justify using this allocator, I can't imagine there's any way you'd need the _fast version (further illustrated by the fact that you're freeing the IOVAs wrongly for that). + apu_obj->iova =
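On Robin's earlier point that the per-scatterlist unmap loop is unnecessary: since the map path placed the entries back-to-back in IOVA space, the whole region can be torn down with a single `iommu_unmap()` over the total size. A sketch with a stubbed unmap so it runs standalone — the real kernel function takes the domain, the base IOVA, and the size, and returns the number of bytes unmapped:

```c
#include <stddef.h>

/* Stub standing in for the kernel's iommu_unmap(); it records what a
 * caller asked for so the behaviour can be checked. */
struct iommu_domain {
	size_t bytes_unmapped;
	int calls;
};

static size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
			  size_t size)
{
	domain->calls++;
	domain->bytes_unmapped += size;
	return size;
}

/* One call over the whole contiguous region replaces the per-SG loop
 * that walked obj->iommu_sgt->nents entries one at a time. */
static void apu_bo_unmap_whole(struct iommu_domain *domain,
			       unsigned long iova, size_t total_size)
{
	iommu_unmap(domain, iova, total_size);
}
```

Besides being shorter, the single-range form also avoids the PFN arithmetic on IOVAs that the review flags as a bad smell.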
[PATCH v2] drm/mediatek: Stop using iommu_present()
Remove the pointless check. If an IOMMU is providing transparent DMA API ops for any device(s) we care about, the DT code will have enforced the appropriate probe ordering already. Signed-off-by: Robin Murphy --- v2: Rebase to 6.4-rc1 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index 6dcb4ba2466c..3e677eb0dc70 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -5,7 +5,6 @@ */ #include -#include #include #include #include @@ -582,9 +581,6 @@ static int mtk_drm_bind(struct device *dev) struct drm_device *drm; int ret, i; - if (!iommu_present(_bus_type)) - return -EPROBE_DEFER; - pdev = of_find_device_by_node(private->mutex_node); if (!pdev) { dev_err(dev, "Waiting for disp-mutex device %pOF\n", -- 2.39.2.101.g768bb238c484.dirty
Re: [PATCH 1/3] iommu/dma: Clean up Kconfig
On 2023-05-05 15:50, Jason Gunthorpe wrote: On Tue, Aug 16, 2022 at 06:28:03PM +0100, Robin Murphy wrote: Although iommu-dma is a per-architecture choice, that is currently implemented in a rather haphazard way. Selecting from the arch Kconfig was the original logical approach, but is complicated by having to manage dependencies; conversely, selecting from drivers ends up hiding the architecture dependency *too* well. Instead, let's just have it enable itself automatically when IOMMU API support is enabled for the relevant architectures. It can't get much clearer than that. Signed-off-by: Robin Murphy --- arch/arm64/Kconfig | 1 - drivers/iommu/Kconfig | 3 +-- drivers/iommu/amd/Kconfig | 1 - drivers/iommu/intel/Kconfig | 1 - 4 files changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 571cc234d0b3..59af600445c2 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -209,7 +209,6 @@ config ARM64 select HAVE_KPROBES select HAVE_KRETPROBES select HAVE_GENERIC_VDSO - select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select IRQ_FORCED_THREADING select KASAN_VMALLOC if KASAN diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 5c5cb5bee8b6..1d99c2d984fb 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -137,7 +137,7 @@ config OF_IOMMU # IOMMU-agnostic DMA-mapping layer config IOMMU_DMA - bool + def_bool ARM64 || IA64 || X86 Robin, do you remember why you added IA64 here? What is the Itanium IOMMU driver? config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" depends on PCI_MSI && ACPI && (X86 || IA64) Yes, really :) Robin.
Re: [PATCH v2 5/5] fbdev: Define framebuffer I/O from Linux' I/O functions
On 2023-04-28 10:27, Thomas Zimmermann wrote: Implement framebuffer I/O helpers, such as fb_read*() and fb_write*() with Linux' regular I/O functions. Remove all ifdef cases for the various architectures. Most of the supported architectures use __raw_() I/O functions or treat framebuffer memory like regular memory. This is also implemented by the architectures' I/O function, so we can use them instead. Sparc uses SBus to connect to framebuffer devices. It provides respective implementations of the framebuffer I/O helpers. The involved sbus_() I/O helpers map to the same code as Sparc's regular I/O functions. As with other platforms, we can use those instead. We leave a TODO item to replace all fb_() functions with their regular I/O counterparts throughout the fbdev drivers. Signed-off-by: Thomas Zimmermann --- include/linux/fb.h | 63 +++--- 1 file changed, 15 insertions(+), 48 deletions(-) diff --git a/include/linux/fb.h b/include/linux/fb.h index 08cb47da71f8..4aa9e90edd17 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -15,7 +15,6 @@ #include #include #include -#include struct vm_area_struct; struct fb_info; @@ -511,58 +510,26 @@ struct fb_info { */ #define STUPID_ACCELF_TEXT_SHIT -// This will go away -#if defined(__sparc__) - -/* We map all of our framebuffers such that big-endian accesses - * are what we want, so the following is sufficient. +/* + * TODO: Update fbdev drivers to call the I/O helpers directly and + * remove the fb_() tokens. 
*/ - -// This will go away -#define fb_readb sbus_readb -#define fb_readw sbus_readw -#define fb_readl sbus_readl -#define fb_readq sbus_readq -#define fb_writeb sbus_writeb -#define fb_writew sbus_writew -#define fb_writel sbus_writel -#define fb_writeq sbus_writeq -#define fb_memset sbus_memset_io -#define fb_memcpy_fromfb sbus_memcpy_fromio -#define fb_memcpy_tofb sbus_memcpy_toio - -#elif defined(__i386__) || defined(__alpha__) || defined(__x86_64__) || \ - defined(__hppa__) || defined(__sh__) || defined(__powerpc__) || \ - defined(__arm__) || defined(__aarch64__) || defined(__mips__) - -#define fb_readb __raw_readb -#define fb_readw __raw_readw -#define fb_readl __raw_readl -#define fb_readq __raw_readq -#define fb_writeb __raw_writeb -#define fb_writew __raw_writew -#define fb_writel __raw_writel -#define fb_writeq __raw_writeq Note that on at least some architectures, the __raw variants are native-endian, whereas the regular accessors are explicitly little-endian, so there is a slight risk of inadvertently changing behaviour on big-endian systems (MIPS most likely, but a few old ARM platforms run BE as well). +#define fb_readb readb +#define fb_readw readw +#define fb_readl readl +#if defined(CONFIG_64BIT) +#define fb_readq readq +#endif You probably don't need to bother making these conditional - 32-bit architectures aren't forbidden from providing readq/writeq if they really want to, and drivers can also use the io-64-nonatomic headers for portability. The build will still fail in a sufficiently obvious manner if neither is true. Thanks, Robin. 
+#define fb_writeb writeb +#define fb_writew writew +#define fb_writel writel +#if defined(CONFIG_64BIT) +#define fb_writeq writeq +#endif #define fb_memset memset_io #define fb_memcpy_fromfb memcpy_fromio #define fb_memcpy_tofb memcpy_toio -#else - -#define fb_readb(addr) (*(volatile u8 *) (addr)) -#define fb_readw(addr) (*(volatile u16 *) (addr)) -#define fb_readl(addr) (*(volatile u32 *) (addr)) -#define fb_readq(addr) (*(volatile u64 *) (addr)) -#define fb_writeb(b,addr) (*(volatile u8 *) (addr) = (b)) -#define fb_writew(b,addr) (*(volatile u16 *) (addr) = (b)) -#define fb_writel(b,addr) (*(volatile u32 *) (addr) = (b)) -#define fb_writeq(b,addr) (*(volatile u64 *) (addr) = (b)) -#define fb_memset memset -#define fb_memcpy_fromfb memcpy -#define fb_memcpy_tofb memcpy - -#endif - #define FB_LEFT_POS(p, bpp) (fb_be_math(p) ? (32 - (bpp)) : 0) #define FB_SHIFT_HIGH(p, val, bits) (fb_be_math(p) ? (val) >> (bits) : \ (val) << (bits))
Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
On 31/03/2023 3:00 pm, Arnd Bergmann wrote: On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote: On 2023-03-27 13:13, Arnd Bergmann wrote: [ HELP NEEDED: can anyone confirm that it is a correct assumption on arm that a cache-coherent device writing to a page always results in it being in a PG_dcache_clean state like on ia64, or can a device write directly into the dcache?] In AMBA at least, if a snooping write hits in a cache then the data is most likely going to get routed directly into that cache. If it has write-back write-allocate attributes it could also land in any cache along its normal path to RAM; it wouldn't have to go all the way. Hence all the fun we have where treating a coherent device as non-coherent can still be almost as broken as the other way round :) Ok, thanks for the information. I'm still not sure whether this can result in the situation where PG_dcache_clean is wrong though. Specifically, the question is whether a DMA to a coherent buffer can end up in a dirty L1 dcache of one core and require to write back the dcache before invalidating the icache for that page. On ia64, this is not the case, the optimization here is to only flush the icache after a coherent DMA into an executable user page, while Arm only does this for noncoherent DMA but not coherent DMA. From your explanation it sounds like this might happen, even though that would mean that "coherent" DMA is slightly less coherent than it is elsewhere. To be on the safe side, I'd have to pass a flag into arch_dma_mark_clean() about coherency, to let the arm implementation still require the extra dcache flush for coherent DMA, while ia64 can ignore that flag. 
Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA write should be pretty much equivalent to a coherent write by another CPU (or indeed the local CPU itself) - nothing says that it *couldn't* dirty a line in a data cache above the level of unification, so in general the assumption must be that, yes, if coherent DMA is writing data intended to be executable, then it's going to want a Dcache clean to PoU and an Icache invalidate to PoU before trying to execute it. By comparison, a non-coherent DMA transfer will inherently have to invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot leave dirty data above the PoU, so only the Icache maintenance is required in the executable case. (FWIW I believe the Armv8 IDC/DIC features can safely be considered irrelevant to 32-bit kernels) I don't know a great deal about IA-64, but it appears to be using its PG_arch_1 flag in a subtly different manner to Arm, namely to optimise out the *Icache* maintenance. So if anything, it seems IA-64 is the weirdo here (who'd have guessed?) where DMA manages to be *more* coherent than the CPUs themselves :) This is all now making me think we need some careful consideration of whether the benefits of consolidating code outweigh the confusion of conflating multiple different meanings of "clean" together... Thanks, Robin.
Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
On 2023-03-27 13:13, Arnd Bergmann wrote: From: Arnd Bergmann The arm version of the arch_sync_dma_for_cpu() function annotates pages as PG_dcache_clean after a DMA, but no other architecture does this here. On ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense to use the same hook in order to have identical arch_sync_dma_for_cpu() semantics as all other architectures. Splitting this out has multiple effects: - for dma-direct, this now gets called after arch_sync_dma_for_cpu() for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While it would not be harmful to keep doing it for bidirectional mappings, those are apparently not used in any callers that care about the flag. - Since arm has its own dma-iommu abstraction, this now also needs to call the same function, so the calls are added there to mirror the dma-direct version. - Like dma-direct, the dma-iommu version now marks the dcache clean for both coherent and noncoherent devices after a DMA, but it only does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL. [ HELP NEEDED: can anyone confirm that it is a correct assumption on arm that a cache-coherent device writing to a page always results in it being in a PG_dcache_clean state like on ia64, or can a device write directly into the dcache?] In AMBA at least, if a snooping write hits in a cache then the data is most likely going to get routed directly into that cache. If it has write-back write-allocate attributes it could also land in any cache along its normal path to RAM; it wouldn't have to go all the way. Hence all the fun we have where treating a coherent device as non-coherent can still be almost as broken as the other way round :) Cheers, Robin. 
Signed-off-by: Arnd Bergmann --- arch/arm/Kconfig | 1 + arch/arm/mm/dma-mapping.c | 71 +++ 2 files changed, 43 insertions(+), 29 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e24a9820e12f..125d58c54ab1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -7,6 +7,7 @@ config ARM select ARCH_HAS_BINFMT_FLAT select ARCH_HAS_CURRENT_STACK_POINTER select ARCH_HAS_DEBUG_VIRTUAL if MMU + select ARCH_HAS_DMA_MARK_CLEAN if MMU select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index cc702cb27ae7..b703cb83d27e 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr, } while (left); } +/* + * Mark the D-cache clean for these pages to avoid extra flushing. + */ +void arch_dma_mark_clean(phys_addr_t paddr, size_t size) +{ + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); + size_t left = size; + + if (size < PAGE_SIZE) + return; + + if (off) + left -= PAGE_SIZE - off; + + while (left >= PAGE_SIZE) { + struct page *page = pfn_to_page(pfn++); + set_bit(PG_dcache_clean, &page->flags); + left -= PAGE_SIZE; + } +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, outer_inv_range(paddr, paddr + size); dma_cache_maint(paddr, size, dmac_inv_range); } - - /* -* Mark the D-cache clean for these pages to avoid extra flushing. 
-*/ - if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) { - unsigned long pfn = PFN_UP(paddr); - unsigned long off = paddr & (PAGE_SIZE - 1); - size_t left = size; - - if (off) - left -= PAGE_SIZE - off; - - while (left >= PAGE_SIZE) { - struct page *page = pfn_to_page(pfn++); - set_bit(PG_dcache_clean, &page->flags); - left -= PAGE_SIZE; - } - } } #ifdef CONFIG_ARM_DMA_USE_IOMMU @@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, return -EINVAL; } +static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len, + enum dma_data_direction dir, + bool dma_coherent) +{ + if (!dma_coherent) + arch_sync_dma_for_cpu(phys, s->length, dir); + + if (dir == DMA_FROM_DEVICE) + arch_dma_mark_clean(phys, s->length); +} + /** * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg * @dev: valid struct device pointer @@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev, if
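For reference, the arch_dma_mark_clean() hunk being split out above only marks pages wholly covered by the transfer (a partial head or tail page is skipped). A standalone userspace model of that arithmetic, with PAGE_SIZE and the helper redefined locally for illustration:

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Count how many whole pages the quoted loop would mark PG_dcache_clean
 * for a transfer of `size` bytes starting at physical address `paddr`. */
static unsigned long pages_marked_clean(unsigned long paddr, size_t size)
{
	unsigned long off = paddr & (TOY_PAGE_SIZE - 1);
	size_t left = size;
	unsigned long marked = 0;

	if (size < TOY_PAGE_SIZE)
		return 0;

	/* Skip the partial head page, exactly as the kernel code does. */
	if (off)
		left -= TOY_PAGE_SIZE - off;

	while (left >= TOY_PAGE_SIZE) {
		marked++; /* the kernel sets PG_dcache_clean here */
		left -= TOY_PAGE_SIZE;
	}
	return marked;
}
```

So a page-aligned 4 KiB transfer marks exactly one page, while a misaligned transfer can mark fewer pages than it touches — which is safe, just conservatively less optimal.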
Re: [PATCH 3/3] of: address: Use dma_default_coherent to determine default coherency
On 2023-02-22 13:37, Jiaxun Yang wrote: As of now all arches have dma_default_coherent matched with the default DMA coherency for OF devices, so there is no need to have a standalone config option. This also fixes a case where, for some MIPS platforms, coherency information is not carried in the devicetree and the kernel will override dma_default_coherent at early boot. Note for PowerPC: CONFIG_OF_DMA_DEFAULT_COHERENT was only selected when CONFIG_NOT_COHERENT_CACHE is false, in which case dma_default_coherent will be true, so it still matches present behavior. Note for RISC-V: dma_default_coherent is set to true in init code in this series. OK, so the fundamental problem here is that we have two slightly different conflicting mechanisms, the ex-PowerPC config option, and the ex-MIPS dma_default_coherent for which of_dma_is_coherent() has apparently been broken forever. I'd agree that it's worth consolidating the two, but please separate out the fix as below, so it's feasible to backport without having to muck about in arch code. 
Signed-off-by: Jiaxun Yang --- arch/powerpc/Kconfig | 1 - arch/riscv/Kconfig | 1 - drivers/of/Kconfig | 4 ---- drivers/of/address.c | 2 +- 4 files changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2c9cdf1d8761..c67e5da714f7 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -272,7 +272,6 @@ config PPC select NEED_PER_CPU_PAGE_FIRST_CHUNK if PPC64 select NEED_SG_DMA_LENGTH select OF - select OF_DMA_DEFAULT_COHERENT if !NOT_COHERENT_CACHE select OF_EARLY_FLATTREE select OLD_SIGACTION if PPC32 select OLD_SIGSUSPEND diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 1d46a268ce16..406c6816d289 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -119,7 +119,6 @@ config RISCV select MODULES_USE_ELF_RELA if MODULES select MODULE_SECTIONS if MODULES select OF - select OF_DMA_DEFAULT_COHERENT select OF_EARLY_FLATTREE select OF_IRQ select PCI_DOMAINS_GENERIC if PCI diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig index 644386833a7b..e40f10bf2ba4 100644 --- a/drivers/of/Kconfig +++ b/drivers/of/Kconfig @@ -102,8 +102,4 @@ config OF_OVERLAY config OF_NUMA bool -config OF_DMA_DEFAULT_COHERENT - # arches should select this if DMA is coherent by default for OF devices - bool - endif # OF diff --git a/drivers/of/address.c b/drivers/of/address.c index 4c0b169ef9bf..23ade4919853 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -1103,7 +1103,7 @@ phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np) bool of_dma_is_coherent(struct device_node *np) { struct device_node *node; - bool is_coherent = IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT); + bool is_coherent = dma_default_coherent; AFAICS, all you should actually need is a single self-contained addition here, something like: + /* +* DT-based MIPS doesn't use OF_DMA_DEFAULT_COHERENT, but +* might override the system-wide default at runtime. 
+*/ +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ + defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ + defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) + is_coherent = dma_default_coherent; +#endif node = of_node_get(np); Then *after* that's fixed, we can do a more comprehensive refactoring to merge the two mechanisms properly. FWIW I think I'd prefer an approach closer to the first one, where config options control the initial value of dma_default_coherent rather than architectures having to override it unconditionally (and TBH I'd also like to have a generic config symbol for whether an arch supports per-device coherency or not). Thanks, Robin.
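The lookup semantics being consolidated here — the nearest explicit devicetree property wins, otherwise fall back to a system-wide default — can be sketched as a toy userspace model (hypothetical toy_node type, not the real struct device_node or of_dma_is_coherent()):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy DT node: a three-valued coherency marking plus a parent link,
 * standing in for the dma-coherent / dma-noncoherent properties. */
enum coh_prop { COH_UNSPECIFIED, COH_COHERENT, COH_NONCOHERENT };

struct toy_node {
	enum coh_prop prop;
	struct toy_node *parent;
};

/* Model of the lookup: walk towards the root, and the first node with an
 * explicit property decides; with no explicit property anywhere, return
 * the system-wide default that the patch switches to dma_default_coherent. */
static bool toy_dma_is_coherent(const struct toy_node *np, bool default_coherent)
{
	for (; np; np = np->parent) {
		if (np->prop == COH_COHERENT)
			return true;
		if (np->prop == COH_NONCOHERENT)
			return false;
	}
	return default_coherent;
}
```

This makes the fix's effect visible: only devices with no explicit marking anywhere up the tree are affected by what the default is initialised to.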
Re: [PATCH 0/7] MIPS DMA coherence fixes
On 2023-02-22 13:04, Jiaxun Yang wrote: On 22 Feb 2023, at 12:55, Robin Murphy wrote: On 2023-02-21 19:55, Jiaxun Yang wrote: On 21 Feb 2023, at 19:46, Robin Murphy wrote: On 2023-02-21 18:15, Jiaxun Yang wrote: On 21 Feb 2023, at 17:54, Christoph Hellwig wrote: Can you explain the motivation here? Also why riscv patches are at the end of a mips fixes series? Ah sorry for any confusion. So the main purpose of this patch is to fix MIPS's broken per-device coherency. To be more precise, we want to be able to control the default coherency for all devices probed from devicetree in early boot code. Including the patch which actually does that would be helpful. As it is, patches 4-7 here just appear to be moving an option around for no practical effect. Well the effect is that the default coherency of devicetree-probed devices now follows dma_default_coherent instead of a static Kconfig option. For MIPS platforms, dma_default_coherent will be determined by boot code. "Will be" is the issue I'm getting at. We can't review some future promise of a patch, we can only review actual patches. And it's hard to meaningfully review preparatory patches for some change without the full context of that change. Actually this is already present in current MIPS platform code. arch/mips/mti-malta is setting dma_default_coherent on boot, and its devicetree does not explicitly specify coherency. OK, this really needs to be explained much more clearly. I read this series as 3 actual fix patches, then 3 patches adding a new option to replace an existing one on the grounds that it "can be useful" for unspecified purposes, then a final cleanup patch removing the old option that has now been superseded. Going back and looking closely I see there is actually a brief mention in the cleanup patch that it also happens to fix some issue, but even then it doesn't clearly explain what the issue really is or how and why the fix works and is appropriate. 
Ideally, functional fixes and cleanup should be in distinct patches whenever that is reasonable. Sometimes the best fix is inherently a cleanup, but in such cases the patch should always be presented as the fix being its primary purpose. Please also use the cover letter to give reviewers an overview of the whole series if it's not merely a set of loosely-related patches that just happened to be convenient so send all together. I think I do at least now understand the underlying problem well enough to have a think about whether this is the best way to address it. Thanks, Robin.
Re: [PATCH 0/7] MIPS DMA coherence fixes
On 2023-02-21 19:55, Jiaxun Yang wrote: On 21 Feb 2023, at 19:46, Robin Murphy wrote: On 2023-02-21 18:15, Jiaxun Yang wrote: On 21 Feb 2023, at 17:54, Christoph Hellwig wrote: Can you explain the motivation here? Also why riscv patches are at the end of a mips fixes series? Ah sorry for any confusion. So the main purpose of this patch is to fix MIPS's broken per-device coherency. To be more precise, we want to be able to control the default coherency for all devices probed from devicetree in early boot code. Including the patch which actually does that would be helpful. As it is, patches 4-7 here just appear to be moving an option around for no practical effect. Well the effect is that the default coherency of devicetree-probed devices now follows dma_default_coherent instead of a static Kconfig option. For MIPS platforms, dma_default_coherent will be determined by boot code. "Will be" is the issue I'm getting at. We can't review some future promise of a patch, we can only review actual patches. And it's hard to meaningfully review preparatory patches for some change without the full context of that change. Thanks, Robin.
Re: [PATCH 0/7] MIPS DMA coherence fixes
On 2023-02-21 18:15, Jiaxun Yang wrote: On 2023-02-21 17:54, Christoph Hellwig wrote: Can you explain the motivation here? Also, why are riscv patches at the end of a mips fixes series? Ah, sorry for any confusion. So the main purpose of this patch is to fix MIPS's broken per-device coherency. To be more precise, we want to be able to control the default coherency for all devices probed from devicetree in early boot code. Including the patch which actually does that would be helpful. As it is, patches 4-7 here just appear to be moving an option around for no practical effect. Robin. To achieve that I decided to reuse dma_default_coherent to set the default coherency for devicetree. All the later patches serve this purpose. Thanks - Jiaxun
Re: [PATCH 4/7] dma-mapping: Always provide dma_default_coherent
On 2023-02-21 17:58, Christoph Hellwig wrote: On Tue, Feb 21, 2023 at 12:46:10PM +, Jiaxun Yang wrote: dma_default_coherent can be useful for determining the default coherency even on arches without noncoherent support. How? Indeed, "default" is conceptually meaningless when there is no possible alternative :/ Robin.
Re: [PATCH v2 04/10] iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()
On 2023-01-18 18:00, Jason Gunthorpe wrote: Change the sg_alloc_table_from_pages() allocation that was hardwired to GFP_KERNEL to use the gfp parameter like the other allocations in this function. Auditing says this is never called from an atomic context, so it is safe as is, but reads wrong. I think the point may have been that the sgtable metadata is a logically-distinct allocation from the buffer pages themselves. Much like the allocation of the pages array itself further down in __iommu_dma_alloc_pages(). I see these days it wouldn't be catastrophic to pass GFP_HIGHMEM into __get_free_page() via sg_kmalloc(), but still, allocating implementation-internal metadata with all the same constraints as a DMA buffer has just as much smell of wrong about it IMO. I'd say the more confusing thing about this particular context is why we're using iommu_map_sg_atomic() further down - that seems to have been an oversight in 781ca2de89ba, since this particular path has never supported being called in atomic context. Overall I'm starting to wonder if it might not be better to stick a "use GFP_KERNEL_ACCOUNT if you allocate" flag in the domain for any level of the API internals to pick up as appropriate, rather than propagate per-call gfp flags everywhere. As it stands we're still missing potential pagetable and other domain-related allocations by drivers in .attach_dev and even (in probably-shouldn't-really-happen cases) .unmap_pages... Thanks, Robin. 
Signed-off-by: Jason Gunthorpe --- drivers/iommu/dma-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8c2788633c1766..e4bf1bb159f7c7 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -822,7 +822,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, if (!iova) goto out_free_pages; - if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL)) + if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, gfp)) goto out_free_iova; if (!(ioprot & IOMMU_CACHE)) {
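The alternative Robin floats above - a per-domain flag instead of threading a gfp_t through every call - can be illustrated with a toy model (plain C, not real iommu API; the TOY_* flags and struct names are invented stand-ins): the domain records once whether its internal allocations should be memcg-accounted, and internal allocators derive their flags from the domain rather than from per-call parameters:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's gfp flags. */
#define TOY_GFP_KERNEL          0x1u
#define TOY___GFP_ACCOUNT       0x2u
#define TOY_GFP_KERNEL_ACCOUNT  (TOY_GFP_KERNEL | TOY___GFP_ACCOUNT)

/*
 * Hypothetical domain: the accounting decision is made once, at domain
 * allocation time, so pagetable and metadata allocations made later by
 * any level of the API internals can all consult it consistently.
 */
struct toy_domain {
	bool account_allocs;
};

/* What an internal allocator would use instead of a passed-in gfp. */
static unsigned int toy_domain_gfp(const struct toy_domain *d)
{
	return d->account_allocs ? TOY_GFP_KERNEL_ACCOUNT : TOY_GFP_KERNEL;
}
```

The appeal, per the message above, is coverage: allocations in paths that currently receive no gfp parameter at all (driver .attach_dev, or even .unmap_pages) would still pick up the accounting behaviour from the domain.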