Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver

2024-06-26 Thread Robin Murphy

On 2024-06-24 3:36 pm, Teddy Astie wrote:

Hello Robin,
Thanks for the thorough review.


diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..242cefac77c9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,15 @@ config VIRTIO_IOMMU
     Say Y here if you intend to run this kernel as a guest.
+config XEN_IOMMU
+    bool "Xen IOMMU driver"
+    depends on XEN_DOM0


Clearly this depends on X86 as well.


Well, I don't intend this driver to be X86-only, even though the current
Xen RFC doesn't support ARM (yet). Unless there is a contraindication
for it?


It's purely practical - even if you drop the asm/iommu.h stuff it would 
still break ARM DOM0 builds due to HYPERVISOR_iommu_op() only being 
defined for x86. And it's better to add a dependency here to make it 
clear what's *currently* supported, than to add dummy code to allow it 
to build for ARM if that's not actually tested or usable yet.



+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+    switch (cap) {
+    case IOMMU_CAP_CACHE_COHERENCY:
+    return true;


Will the PV-IOMMU only ever be exposed on hardware where that really is
always true?



On the hypervisor side, the PV-IOMMU interface always implicitly flushes
the IOMMU hardware on map/unmap operations, so at the end of the
hypercall, the cache should always be coherent IMO.


As Jason already brought up, this is not about TLBs or anything cached 
by the IOMMU itself, it's about the memory type(s) it can create 
mappings with. Returning true here says Xen guarantees it can use a 
cacheable memory type which will let DMA snoop the CPU caches. 
Furthermore, not explicitly handling IOMMU_CACHE in the map_pages op 
then also implies that it will *always* do that, so you couldn't 
actually get an uncached mapping even if you wanted one.
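
For illustration, a minimal sketch of a map_pages op that handles
IOMMU_CACHE explicitly rather than implying it unconditionally - note
that XEN_IOMMUOP_FLAG_CACHED and the op layout here are assumed names,
not part of the actual PV-IOMMU interface:

static int xen_iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
			       phys_addr_t paddr, size_t pgsize, size_t pgcount,
			       int prot, gfp_t gfp, size_t *mapped)
{
	struct pv_iommu_op op = {};

	if (prot & IOMMU_CACHE)
		op.flags |= XEN_IOMMUOP_FLAG_CACHED; /* assumed flag: ask for a cacheable memory type */

	/* ... fill in the map_pages arguments and issue the hypercall ... */
	if (HYPERVISOR_iommu_op(&op))
		return -EFAULT;

	*mapped = pgsize * pgcount;
	return 0;
}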



+    while (xen_pg_count) {
+    size_t to_unmap = min(xen_pg_count, max_nr_pages);
+
+    //pr_info("Unmapping %lx-%lx\n", dfn, dfn + to_unmap - 1);
+
+    op.unmap_pages.dfn = dfn;
+    op.unmap_pages.nr_pages = to_unmap;
+
+    ret = HYPERVISOR_iommu_op(&op);
+
+    if (ret)
+    pr_warn("Unmap failure (%lx-%lx)\n", dfn, dfn + to_unmap - 1);


But then how would it ever happen anyway? Unmap is a domain op, so a
domain which doesn't allow unmapping shouldn't offer it in the first
place...


Unmap failing should be exceptional, but it is possible, e.g. with
transparent superpages (as the Xen IOMMU drivers use). The Xen drivers fold
appropriate contiguous mappings into superpage entries to optimize
memory usage and the IOTLB. However, if you unmap in the middle of a region
covered by a superpage entry, that is no longer a valid superpage entry,
and you need to allocate and fill the lower levels, which can fail if
memory is lacking.


OK, so in the worst case you could potentially have a partial unmap 
failure if the range crosses a superpage boundary and the end part 
happens to have been folded, and Xen doesn't detect and prepare that 
allocation until it's already unmapped up to the boundary. If that is 
so, does the hypercall interface give any information about partial 
failure, or can any error only be taken to mean that some or all of the 
given range may or may not have been unmapped now?

In this case I'd argue that you really *do* want to return short, in the
hope of propagating the error back up and letting the caller know the
address space is now messed up before things start blowing up even more
if they keep going and subsequently try to map new pages into
not-actually-unmapped VAs.
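
Concretely, that might look something like the sketch below, where
xen_iommu_unmap_chunk() is an assumed helper wrapping the unmap
hypercall - the point being to return short so the caller can see how
far the unmap actually got:

static size_t xen_iommu_unmap_pages(struct iommu_domain *domain,
				    unsigned long iova, size_t pgsize,
				    size_t pgcount,
				    struct iommu_iotlb_gather *gather)
{
	size_t unmapped = 0;

	while (pgcount) {
		size_t chunk = min_t(size_t, pgcount, max_nr_pages);

		if (xen_iommu_unmap_chunk(domain, iova, chunk))
			break;	/* report the short count rather than pressing on */

		iova += chunk * pgsize;
		unmapped += chunk;
		pgcount -= chunk;
	}

	return unmapped * pgsize;
}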


While mapping on top of another mapping is OK for us (it's just going to
override the previous mapping), I definitely agree that having the
address space messed up is not good.


Oh, indeed, quietly replacing existing PTEs might help paper over errors 
in this particular instance, but it does then allow *other* cases to go 
wrong in fun and infuriating ways :)



+static struct iommu_domain default_domain = {
+    .ops = &(const struct iommu_domain_ops){
+    .attach_dev = default_domain_attach_dev
+    }
+};


Looks like you could make it a static xen_iommu_domain and just use the
normal attach callback? Either way please name it something less
confusing like xen_iommu_identity_domain - "default" is far too
overloaded round here already...



Yes, although in the future we may have this domain be either identity
or blocking/paging, depending on some upper-level configuration. Should
we have both identity and blocking domains and only set the relevant one
in iommu_ops, or keep this naming?


That's something that can be considered if and when it does happen. For 
now, if it's going to be pre-mapped as an identity domain, then let's 
just treat it as such and keep things straightforward.
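
i.e. something along these lines (a sketch only, with the reattach
hypercall plumbing omitted):

static int xen_iommu_identity_attach(struct iommu_domain *domain,
				     struct device *dev)
{
	/* reattach the device to Xen's pre-mapped context 0 */
	return 0;
}

static struct iommu_domain xen_iommu_identity_domain = {
	.type = IOMMU_DOMAIN_IDENTITY,
	.ops = &(const struct iommu_domain_ops) {
		.attach_dev = xen_iommu_identity_attach,
	},
};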



+void __exit xen_iommu_fini(void)
+{
+    pr_info("Unregistering Xen IOMMU driver\n");
+
iommu_device_unregister(&xen_iommu_device);
+    

Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver

2024-06-24 Thread Robin Murphy

On 2024-06-24 6:36 pm, Easwar Hariharan wrote:

Hi Jason,

On 6/24/2024 9:32 AM, Jason Gunthorpe wrote:

On Mon, Jun 24, 2024 at 02:36:45PM +, Teddy Astie wrote:

+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+    switch (cap) {
+    case IOMMU_CAP_CACHE_COHERENCY:
+    return true;


Will the PV-IOMMU only ever be exposed on hardware where that really is
always true?



On the hypervisor side, the PV-IOMMU interface always implicitly flushes
the IOMMU hardware on map/unmap operations, so at the end of the
hypercall, the cache should always be coherent IMO.


Cache coherency is a property of the underlying IOMMU HW and reflects
the ability to prevent generating transactions that would bypass the
cache.

On AMD and Intel IOMMU HW this maps to a bit in their PTEs that must
always be set to claim this capability.

No ARM SMMU supports it yet.



Unrelated to this patch: Both the arm-smmu and arm-smmu-v3 drivers claim
this capability if the device tree/IORT table have the corresponding flags.

I read through DEN0049 to determine what are the knock-on effects, or
equivalently the requirements to set those flags in the IORT, but came
up empty. Could you help with what I'm missing to resolve the apparent
contradiction between your statement and the code?


We did rejig things slightly a while back. The status quo now is that 
IOMMU_CAP_CACHE_COHERENCY mostly covers whether IOMMU mappings can make 
device accesses coherent at all, tied in with the IOMMU_CACHE prot value 
- this is effectively forced for Intel and AMD, while for SMMU we have 
to take a guess, but as commented it's a pretty reasonable assumption 
that if the SMMU's own output for table walks etc. is coherent then its 
translation outputs are likely to be too. The further property of being 
able to then enforce a coherent mapping regardless of what an endpoint 
might try to get around it (PCIe No Snoop etc.) is now under the 
enforce_cache_coherency op - that's what SMMU can't guarantee for now 
due to the IMP-DEF nature of whether S2FWB overrides No Snoop or not.
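
In driver terms the split looks roughly like this (the my_* names are
placeholders, not any real driver):

static bool my_iommu_capable(struct device *dev, enum iommu_cap cap)
{
	/* "IOMMU_CACHE mappings can make this device's DMA coherent" */
	return cap == IOMMU_CAP_CACHE_COHERENCY;
}

static bool my_domain_enforce_cache_coherency(struct iommu_domain *domain)
{
	/*
	 * "Coherency can be enforced even against No Snoop" - e.g. Intel
	 * sets a force-snooping PTE bit here, while SMMU has to say no.
	 */
	return false;
}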


Thanks,
Robin.



Re: [RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver

2024-06-24 Thread Robin Murphy

On 2024-06-23 4:21 am, Baolu Lu wrote:

On 6/21/24 11:09 PM, Teddy Astie wrote:

Le 19/06/2024 à 18:30, Jason Gunthorpe a écrit :

On Thu, Jun 13, 2024 at 01:50:22PM +, Teddy Astie wrote:


+struct iommu_domain *xen_iommu_domain_alloc(unsigned type)
+{
+    struct xen_iommu_domain *domain;
+    u16 ctx_no;
+    int ret;
+
+    if (type & IOMMU_DOMAIN_IDENTITY) {
+    /* use default domain */
+    ctx_no = 0;

Please use the new ops, domain_alloc_paging and the static identity
domain.

Yes, in the v2, I will use this newer interface.

I have a question on this new interface: is it valid to not have an
identity domain (with the "default domain" being blocking)? In the
current implementation it doesn't really matter, but at some point we
may want to allow not having it (thus making this driver mandatory).


It's valid to not have an identity domain if "default domain being
blocking" means a paging domain with no mappings.

In the iommu driver's iommu_ops::def_domain_type callback, just always
return IOMMU_DOMAIN_DMA, which indicates that the iommu driver doesn't
support identity translation.


That's not necessary - if neither ops->identity_domain nor 
ops->domain_alloc(IOMMU_DOMAIN_IDENTITY) gives a valid domain then we 
fall back to IOMMU_DOMAIN_DMA anyway.


Thanks,
Robin.



Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver

2024-06-21 Thread Robin Murphy

On 2024-06-21 5:08 pm, TSnake41 wrote:

From: Teddy Astie 

In the context of Xen, Linux runs as Dom0 and doesn't have access to the
machine IOMMU. However, an IOMMU is mandatory to use some kernel features
such as VFIO or DMA protection.

In Xen, we added a paravirtualized IOMMU with an iommu_op hypercall in order to
allow Dom0 to implement such features. This commit introduces a new IOMMU driver
that uses this new hypercall interface.

Signed-off-by: Teddy Astie 
---
Changes since v1:
* formatting changes
* applied Jan Beulich's proposed changes: removed vim notes at end of pv-iommu.h
* applied Jason Gunthorpe's proposed changes: use new ops and remove redundant
checks
---
  arch/x86/include/asm/xen/hypercall.h |   6 +
  drivers/iommu/Kconfig|   9 +
  drivers/iommu/Makefile   |   1 +
  drivers/iommu/xen-iommu.c| 489 +++
  include/xen/interface/memory.h   |  33 ++
  include/xen/interface/pv-iommu.h | 104 ++
  include/xen/interface/xen.h  |   1 +
  7 files changed, 643 insertions(+)
  create mode 100644 drivers/iommu/xen-iommu.c
  create mode 100644 include/xen/interface/pv-iommu.h

diff --git a/arch/x86/include/asm/xen/hypercall.h 
b/arch/x86/include/asm/xen/hypercall.h
index a2dd24947eb8..6b1857f27c14 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -490,6 +490,12 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
return _hypercall2(int, xenpmu_op, op, arg);
  }
  
+static inline int

+HYPERVISOR_iommu_op(void *arg)
+{
+   return _hypercall1(int, iommu_op, arg);
+}
+
  static inline int
  HYPERVISOR_dm_op(
domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..242cefac77c9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,15 @@ config VIRTIO_IOMMU
  
  	  Say Y here if you intend to run this kernel as a guest.
  
+config XEN_IOMMU

+   bool "Xen IOMMU driver"
+   depends on XEN_DOM0


Clearly this depends on X86 as well.


+   select IOMMU_API
+   help
+   Xen PV-IOMMU driver for Dom0.
+
+   Say Y here if you intend to run this kernel as Xen Dom0.
+
  config SPRD_IOMMU
tristate "Unisoc IOMMU Support"
depends on ARCH_SPRD || COMPILE_TEST
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 542760d963ec..393afe22c901 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
  obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
  obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_XEN_IOMMU) += xen-iommu.o
\ No newline at end of file
diff --git a/drivers/iommu/xen-iommu.c b/drivers/iommu/xen-iommu.c
new file mode 100644
index ..b765445d27cd
--- /dev/null
+++ b/drivers/iommu/xen-iommu.c
@@ -0,0 +1,489 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xen PV-IOMMU driver.
+ *
+ * Copyright (C) 2024 Vates SAS
+ *
+ * Author: Teddy Astie 
+ *
+ */
+
+#define pr_fmt(fmt) "xen-iommu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 


Please drop this; it's a driver, not a DMA ops implementation.


+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_DESCRIPTION("Xen IOMMU driver");
+MODULE_AUTHOR("Teddy Astie ");
+MODULE_LICENSE("GPL");
+
+#define MSI_RANGE_START    (0xfee00000)
+#define MSI_RANGE_END      (0xfeefffff)
+
+#define XEN_IOMMU_PGSIZES   (0x1000)
+
+struct xen_iommu_domain {
+   struct iommu_domain domain;
+
+   u16 ctx_no; /* Xen PV-IOMMU context number */
+};
+
+static struct iommu_device xen_iommu_device;
+
+static uint32_t max_nr_pages;
+static uint64_t max_iova_addr;
+
+static spinlock_t lock;


Not a great name - usually it's good to name a lock after what it 
protects. Although perhaps it is already, since AFAICS this isn't 
actually used anywhere anyway.



+static inline struct xen_iommu_domain *to_xen_iommu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct xen_iommu_domain, domain);
+}
+
+static inline u64 addr_to_pfn(u64 addr)
+{
+   return addr >> 12;
+}
+
+static inline u64 pfn_to_addr(u64 pfn)
+{
+   return pfn << 12;
+}
+
+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;


Will the PV-IOMMU only ever be exposed on hardware where that really is 
always true?



+
+   default:
+   return false;
+   }
+}
+
+struct iommu_domain *xen_iommu_domain_alloc_paging(struct device *dev)
+{
+   struct xen_iommu_domain *domain;
+   int ret;
+
+   struct pv_iommu_op op = {
+   

Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-06-17 Thread Robin Murphy

On 23/05/2024 6:52 pm, Rob Clark wrote:

From: Rob Clark 

Add an io-pgtable method to walk the pgtable returning the raw PTEs that
would be traversed for a given iova access.


Have to say I'm a little torn here - with my iommu-dma hat on I'm not 
super enthusiastic about adding any more overhead to iova_to_phys, but 
in terms of maintaining io-pgtable I do like the overall shape of the 
implementation...


Will, how much would you hate a compromise of inlining iova_to_phys as 
the default walk behaviour if cb is NULL? :)


That said, looking at the unmap figures for dma_map_benchmark on a 
Neoverse N1, any difference I think I see is still well within the 
noise, so maybe a handful of extra indirect calls isn't really enough to 
worry about?
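
For reference, the compromise could look roughly like the below,
reusing the patch's iova_to_phys_walk_data for the NULL-cb default
(sketch only):

static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long iova,
				 int (*cb)(void *cb_data, void *pte, int level),
				 void *cb_data)
{
	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
	arm_lpae_iopte pte, *ptep = data->pgd;
	int lvl = data->start_level;
	int ret;

	do {
		/* Valid IOPTE pointer? */
		if (!ptep)
			return -EFAULT;

		/* Grab the IOPTE we're interested in */
		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
		pte = READ_ONCE(*ptep);

		/* Valid entry? */
		if (!pte)
			return -EFAULT;

		if (cb) {
			ret = cb(cb_data, &pte, lvl);
			if (ret)
				return ret;
		}

		/* Leaf entry? If so, we've found the translation */
		if (iopte_leaf(pte, lvl, data->iop.fmt)) {
			if (!cb) {
				/* default behaviour: record the leaf */
				struct iova_to_phys_walk_data *d = cb_data;

				d->pte = pte;
				d->level = lvl;
			}
			return 0;
		}

		/* Take it to the next level */
		ptep = iopte_deref(pte, data);
	} while (++lvl < ARM_LPAE_MAX_LEVELS);

	/* Ran out of page tables to walk */
	return -EFAULT;
}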


Cheers,
Robin.


Signed-off-by: Rob Clark 
---
  drivers/iommu/io-pgtable-arm.c | 51 --
  include/linux/io-pgtable.h |  4 +++
  2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f7828a7aad41..f47a0e64bb35 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops 
*ops, unsigned long iov
data->start_level, ptep);
  }
  
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,

-unsigned long iova)
+static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long 
iova,
+   int (*cb)(void *cb_data, void *pte, int level),
+   void *cb_data)
  {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
arm_lpae_iopte pte, *ptep = data->pgd;
int lvl = data->start_level;
+   int ret;
  
  	do {

/* Valid IOPTE pointer? */
if (!ptep)
-   return 0;
+   return -EFAULT;
  
  		/* Grab the IOPTE we're interested in */

ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
@@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
  
  		/* Valid entry? */

if (!pte)
-   return 0;
+   return -EFAULT;
+
+   ret = cb(cb_data, &pte, lvl);
+   if (ret)
+   return ret;
  
-		/* Leaf entry? */

+   /* Leaf entry?  If so, we've found the translation */
if (iopte_leaf(pte, lvl, data->iop.fmt))
-   goto found_translation;
+   return 0;
  
  		/* Take it to the next level */

ptep = iopte_deref(pte, data);
} while (++lvl < ARM_LPAE_MAX_LEVELS);
  
  	/* Ran out of page tables to walk */

+   return -EFAULT;
+}
+
+struct iova_to_phys_walk_data {
+   arm_lpae_iopte pte;
+   int level;
+};
+
+static int iova_to_phys_walk_cb(void *cb_data, void *pte, int level)
+{
+   struct iova_to_phys_walk_data *d = cb_data;
+
+   d->pte = *(arm_lpae_iopte *)pte;
+   d->level = level;
+
return 0;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+unsigned long iova)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct iova_to_phys_walk_data d;
+   int ret;
+
+   ret = arm_lpae_pgtable_walk(ops, iova, iova_to_phys_walk_cb, &d);
+   if (ret)
+   return 0;
  
-found_translation:

-   iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-   return iopte_to_paddr(pte, data) | iova;
+   iova &= (ARM_LPAE_BLOCK_SIZE(d.level, data) - 1);
+   return iopte_to_paddr(d.pte, data) | iova;
  }
  
  static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)

@@ -807,6 +839,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.map_pages  = arm_lpae_map_pages,
.unmap_pages= arm_lpae_unmap_pages,
.iova_to_phys   = arm_lpae_iova_to_phys,
+   .pgtable_walk   = arm_lpae_pgtable_walk,
};
  
  	return data;

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 86cf1f7ae389..261b48af068a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -177,6 +177,7 @@ struct io_pgtable_cfg {
   * @map_pages:Map a physically contiguous range of pages of the same size.
   * @unmap_pages:  Unmap a range of virtually contiguous pages of the same 
size.
   * @iova_to_phys: Translate iova to physical address.
+ * @pgtable_walk: (optional) Perform a page table walk for a given iova.
   *
   * These functions map directly onto the iommu_ops member functions with
   * the same names.
@@ -190,6 +191,9 @@ struct io_pgtable_ops {
  struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
   

Re: [PATCH 2/9] iommu/rockchip: Attach multiple power domains

2024-06-14 Thread Robin Murphy

On 2024-06-13 10:38 pm, Sebastian Reichel wrote:

Hi,

On Thu, Jun 13, 2024 at 11:34:02AM GMT, Tomeu Vizoso wrote:

On Thu, Jun 13, 2024 at 11:24 AM Tomeu Vizoso  wrote:

On Thu, Jun 13, 2024 at 2:05 AM Sebastian Reichel
 wrote:

On Wed, Jun 12, 2024 at 03:52:55PM GMT, Tomeu Vizoso wrote:

IOMMUs with multiple base addresses can also have multiple power
domains.

The base framework only takes care of a single power domain, as some
devices will need for these power domains to be powered on in a specific
order.

Use a helper function to establish links in the order in which they are
in the DT.

This is needed by the IOMMU used by the NPU in the RK3588.

Signed-off-by: Tomeu Vizoso 
---


To me it looks like this is multiple IOMMUs, which should each get
their own node. I don't see a good reason for merging these
together.


I have made quite a few attempts at splitting the IOMMUs and also the
cores, but I wasn't able to get things working stably. The TRM is
really scant about how the 4 IOMMU instances relate to each other, and
what the fourth one is for.

Given that the vendor driver treats them as a single IOMMU with four
instances and we don't have any information on them, I resigned myself
to just have them as a single device.

I would love to be proved wrong though and find a way of getting
things working stably as different devices so they can be powered on and off
as needed. We could save quite some code as well.


FWIW, here a few ways how I tried to structure the DT nodes, none of
these worked reliably:

https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-devices-power/arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1163
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-schema-subnodes//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1162
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-devices//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L1163
https://gitlab.freedesktop.org/tomeu/linux/-/blob/6.10-rocket-multiple-iommus//arch/arm64/boot/dts/rockchip/rk3588s.dtsi?ref_type=heads#L2669

I can very well imagine I missed some way of getting this to work, but
for every attempt, the domains, iommus and cores were resumed in
different orders that presumably caused problems during concurrent
execution of workloads.

So I fell back to what the vendor driver does, which works reliably
(but all cores have to be powered on at the same time).


Mh. The "6.10-rocket-multiple-iommus" branch seems wrong. There is
only one iommu node in that. I would have expected a test with

rknn {
 // combined device

 iommus = <>, <>, ...;
};

Otherwise I think I would go with the schema-subnodes variant. The
driver can initially walk through the sub-nodes and collect the
resources into the main device, so on the driver side nothing would
really change (see the sketch after this list). But that has a couple of advantages:

1. DT and DT binding are easier to read
2. It's similar to e.g. CPU cores each having their own node
3. Easy to extend to more cores in the future
4. The kernel can easily switch to proper per-core device model when
the problem has been identified
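
A rough sketch of that sub-node walk, where rknpu_add_core() stands in
for whatever per-core clock/irq/power-domain grabbing is needed (names
are illustrative):

static int rknpu_collect_core_resources(struct device *dev)
{
	struct device_node *core;
	int ret, i = 0;

	for_each_child_of_node(dev->of_node, core) {
		/* gather this core's resources into the main device, in DT order */
		ret = rknpu_add_core(dev, core, i++);
		if (ret) {
			of_node_put(core);
			return ret;
		}
	}

	return i ? 0 : -ENODEV;
}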


It also would seem to permit describing and associating the per-core 
IOMMUs individually - apart from core 0's apparent coupling to whatever 
shared "uncore" stuff exists for the whole thing, from the distinct 
clocks, interrupts, power domains etc. lining up with each core I'd 
guess those IOMMUs are not interrelated the same way the ISP's 
read/write IOMMUs are (which was the main justification for adopting the 
multiple-reg design originally vs. distinct DT nodes like Exynos does). 
However, practically that would require the driver to at least populate 
per-core child devices to make DMA API or IOMMU API mappings with, since 
we couldn't spread the "collect the resources" trick into those 
subsystems as well.


Thanks,
Robin.


Re: [PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-12 Thread Robin Murphy

On 2024-06-12 1:50 pm, Philippe Mathieu-Daudé wrote:

On 12/6/24 14:48, Peter Maydell wrote:
On Wed, 12 Jun 2024 at 13:33, Philippe Mathieu-Daudé 
 wrote:


Hi Zhenyu,

On 12/6/24 04:05, Zhenyu Zhang wrote:

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3c93c0c0a6..3cefac6d43 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms)
   qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
   qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt");

+    /*
+ * For QEMU, all DMA is coherent. Advertising this in the root node
+ * has two benefits:
+ *
+ * - It avoids potential bugs where we forget to mark a DMA
+ *   capable device as being dma-coherent
+ * - It avoids spurious warnings from the Linux kernel about
+ *   devices which can't do DMA at all
+ */
+    qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0);


OK, but why restrict that to the AArch64 virt machine?
Shouldn't we advertise this generically in create_device_tree()?
Or otherwise at least in the other virt machines?


create_device_tree() creates an empty device tree, not one
with stuff in it. It seems reasonable to me for this property
on the root to be set in the same place we set other properties
of the root node.


OK. Still the question about other virt machines remains
unanswered :)


From the DT consumer point of view, the interpretation and assumptions 
around coherency *are* generally architecture- or platform-specific. For 
instance on RISC-V, many platforms want to assume coherency by default 
(and potentially use "dma-noncoherent" to mark individual devices that 
aren't), while others may still want to do the opposite and use 
"dma-coherent" in the same manner as Arm and AArch64. Neither property 
existed back in ePAPR, so typical PowerPC systems wouldn't even be 
looking and will just make their own assumptions by other means.


Thanks,
Robin.



Re: [PATCH RFC] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-06 Thread Robin Murphy
ng passed force_dma = true.
https://elixir.bootlin.com/linux/v6.10-rc2/source/drivers/amba/bus.c#L361

There is a comment in of_dma_configure():
/*
 * For legacy reasons, we have to assume some devices need
 * DMA configuration regardless of whether "dma-ranges" is
 * correctly specified or not.
 */
So I think this is being triggered by a workaround for broken DTs.

This was introduced by Robin Murphy +CC though you may need to ask on
kernel list because ARM / QEMU fun.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=723288836628b

Relevant comment from that patch description:

"Certain bus types have a general expectation of
DMA capability and carry a well-established precedent that an absent
"dma-ranges" implies the same as the empty property, so we automatically
opt those in to DMA configuration regardless, to avoid regressing most
existing platforms."

The patch implies that AMBA is one of those.

So I'm not sure this is solvable without a hack such as eliding the warning
message if force_dma was set, as the situation probably isn't relevant then...


Except it absolutely is, because the whole reason for setting force_dma 
on those buses is that they *do* commonly have DMA-capable devices, and 
they are also commonly non-coherent such that this condition would be 
serious. Especially AMBA, given that the things old enough to still be 
using that abstraction rather than plain platform (PL080, PL111, 
PL330,...) all predate ACE-Lite so don't even have the *possibility* of 
being coherent without external trickery in the interconnect.


Thanks,
Robin.



Re: [PATCH 00/20] iommu: Refactoring domain allocation interface

2024-05-30 Thread Robin Murphy

On 29/05/2024 6:32 am, Lu Baolu wrote:

The IOMMU subsystem has undergone some changes, including the removal
of iommu_ops from the bus structure. Consequently, the existing domain
allocation interface, which relies on a bus type argument, is no longer
relevant:

 struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)

This series is designed to refactor the use of this interface. It
proposes two new interfaces to replace iommu_domain_alloc():

- iommu_user_domain_alloc(): This interface is intended for allocating
   iommu domains managed by userspace for device passthrough scenarios,
   such as those used by iommufd, vfio, and vdpa. It clearly indicates
   that the domain is for user-managed device DMA.

   If an IOMMU driver does not implement iommu_ops->domain_alloc_user,
   this interface will rollback to the generic paging domain allocation.

- iommu_paging_domain_alloc(): This interface is for allocating iommu
   domains managed by kernel drivers for kernel DMA purposes. It takes a
   device pointer as a parameter, which better reflects the current
   design of the IOMMU subsystem.

The majority of device drivers currently using iommu_domain_alloc() do
so to allocate a domain for a specific device and then attach that
domain to the device. These cases can be straightforwardly migrated to
the new interfaces.
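
For the common alloc-then-attach pattern, the migration is typically
just the following (note the new interface reports failure with
ERR_PTR() rather than NULL):

	/* before */
	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	/* after */
	domain = iommu_paging_domain_alloc(dev);
	if (IS_ERR(domain))
		return PTR_ERR(domain);

	ret = iommu_attach_device(domain, dev);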


Ooh, nice! This was rising back up my to-do list as well, but I concur 
it's rather more straightforward than my version that did devious things 
to keep the iommu_domain_alloc() name...



However, there are some drivers with more complex use cases that do
not fit neatly into this new scheme. For example:

$ git grep "= iommu_domain_alloc"
arch/arm/mm/dma-mapping.c:  mapping->domain = iommu_domain_alloc(bus);


This one's simple enough, the refactor just needs to go one step deeper. 
I've just rebased and pushed my old patch for that, if you'd like it [1].



drivers/gpu/drm/rockchip/rockchip_drm_drv.c:private->domain = 
iommu_domain_alloc(private->iommu_dev->bus);


Both this one and usnic_uiom_alloc_pd() should be OK - back when I did 
all the figuring out to clean up iommu_present(), I specifically 
reworked them into "dev->bus" style as a reminder that it *is* supposed 
to be the right device for doing this with, even if the attach is a bit 
more distant.



drivers/gpu/drm/tegra/drm.c:tegra->domain = 
iommu_domain_alloc(&platform_bus_type);


This is the tricky one, where the device to hand may *not* be the right 
device for IOMMU API use [2]. FWIW my plan was to pull the "walk the 
platform bus to find any IOMMU-mapped device" trick into this code and 
use it both to remove the final iommu_present() and for a device-based 
domain allocation.
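
i.e. roughly the shape below (function names hypothetical):

static int tegra_drm_iommu_mapped_dev(struct device *dev, void *data)
{
	struct device **out = data;

	if (!device_iommu_mapped(dev))
		return 0;

	*out = dev;
	return 1;	/* stop iterating */
}

static struct iommu_domain *tegra_drm_alloc_domain(void)
{
	struct device *dev = NULL;

	bus_for_each_dev(&platform_bus_type, NULL, &dev,
			 tegra_drm_iommu_mapped_dev);
	if (!dev)
		return NULL;

	return iommu_paging_domain_alloc(dev);
}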



drivers/infiniband/hw/usnic/usnic_uiom.c:   pd->domain = domain = 
iommu_domain_alloc(dev->bus);

This series leaves those cases unchanged and keeps iommu_domain_alloc()
for their usage, but new drivers should not use it anymore.


I'd certainly be keen for it to be gone ASAP, since I'm seeing 
increasing demand for supporting multiple IOMMU drivers, and this is the 
last bus-based thing standing in the way of that.


Thanks,
Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/f048cc6a323d8641898025ca96071df7cbe8bd52
[2] 
https://lore.kernel.org/linux-iommu/add31812-50d5-6cb0-3908-143c523ab...@collabora.com/



The whole series is also available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommu-domain-allocation-refactor-v1

Lu Baolu (20):
   iommu: Add iommu_user_domain_alloc() interface
   iommufd: Use iommu_user_domain_alloc()
   vfio/type1: Use iommu_paging_domain_alloc()
   vhost-vdpa: Use iommu_user_domain_alloc()
   iommu: Add iommu_paging_domain_alloc() interface
   drm/msm: Use iommu_paging_domain_alloc()
   drm/nouveau/tegra: Use iommu_paging_domain_alloc()
   gpu: host1x: Use iommu_paging_domain_alloc()
   media: nvidia: tegra: Use iommu_paging_domain_alloc()
   media: venus: firmware: Use iommu_paging_domain_alloc()
   ath10k: Use iommu_paging_domain_alloc()
   wifi: ath11k: Use iommu_paging_domain_alloc()
   remoteproc: Use iommu_paging_domain_alloc()
   soc/fsl/qbman: Use iommu_paging_domain_alloc()
   iommu/vt-d: Add helper to allocate paging domain
   iommu/vt-d: Add domain_alloc_paging support
   iommu/vt-d: Simplify compatibility check for identity domain
   iommu/vt-d: Enhance compatibility check for paging domain attach
   iommu/vt-d: Remove domain_update_iommu_cap()
   iommu/vt-d: Remove domain_update_iommu_superpage()

  include/linux/iommu.h |  12 +
  drivers/gpu/drm/msm/msm_iommu.c   |   8 +-
  .../drm/nouveau/nvkm/engine/device/tegra.c|   4 +-
  drivers/gpu/host1x/dev.c  |   6 +-
  drivers/iommu/intel/iommu.c   | 319 --
  drivers/iommu/intel/pasid.c   |  28 +-
  drivers/iommu/iommu.c |  62 
  drivers/iommu/iommufd/hw_pagetable.c  | 

Re: [Intel-wired-lan] [PATCH net-next v6 13/21] bitmap: make bitmap_{get, set}_value8() use bitmap_{read, write}()

2024-05-30 Thread Robin Murphy

On 30/05/2024 6:11 pm, Yury Norov wrote:

On Wed, May 29, 2024 at 04:12:25PM +0100, Robin Murphy wrote:

Hi Alexander,

On 27/03/2024 3:23 pm, Alexander Lobakin wrote:

Now that we have generic bitmap_read() and bitmap_write(), which are
inline and try to take care of non-bound-crossing and aligned cases
to keep them optimized, collapse bitmap_{get,set}_value8() into
simple wrappers around the former ones.
bloat-o-meter shows no difference in vmlinux and -2 bytes for
gpio-pca953x.ko, which says the optimization didn't suffer due to
that change. The converted helpers have the value width embedded
and always compile-time constant and that helps a lot.
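
For reference, the collapsed helpers are now essentially just:

static inline unsigned long bitmap_get_value8(const unsigned long *map,
					      unsigned long start)
{
	return bitmap_read(map, start, 8);
}

static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
				     unsigned long start)
{
	bitmap_write(map, value, start, 8);
}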


This change appears to have introduced a build failure for me on arm64
(with GCC 9.4.0 from Ubuntu 20.04.02) - reverting b44759705f7d makes
these errors go away again:

In file included from drivers/gpio/gpio-pca953x.c:12:
drivers/gpio/gpio-pca953x.c: In function ‘pca953x_probe’:
./include/linux/bitmap.h:799:17: error: array subscript [1, 1024] is outside 
array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds]
   799 |  map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
   | ^~
In file included from ./include/linux/atomic.h:5,
  from drivers/gpio/gpio-pca953x.c:11:
drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’
  1015 |  DECLARE_BITMAP(val, MAX_LINE);
   | ^~~
./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’
11 |  unsigned long name[BITS_TO_LONGS(bits)]
   |^~~~
In file included from drivers/gpio/gpio-pca953x.c:12:
./include/linux/bitmap.h:800:17: error: array subscript [1, 1024] is outside 
array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds]
   800 |  map[index + 1] |= (value >> space);
   |  ~~~^~~
In file included from ./include/linux/atomic.h:5,
  from drivers/gpio/gpio-pca953x.c:11:
drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’
  1015 |  DECLARE_BITMAP(val, MAX_LINE);
   | ^~~
./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’
11 |  unsigned long name[BITS_TO_LONGS(bits)]
   |^~~~

I've not dug further since I don't have any interest in the pca953x
driver - it just happened to be enabled in my config, so for now I've
turned it off. However I couldn't obviously see any other reports of
this, so here it is.


It's a compiler false-positive. The straightforward fix is to disable the warning
for gcc9+, and it's in Andrew Morton's tree already, but there's some discussion
ongoing on how it should be mitigated properly:
ongoing on how it should be mitigated properlu:

https://lore.kernel.org/all/0ab2702f-8245-4f02-beb7-dcc7d79d5...@app.fastmail.com/T/


Ah, great! Guess I really should have scrolled further down my lore 
search results - I assumed I was looking for any other reports of a 
recent regression in mainline, not ones from 6 months ago :)


Cheers,
Robin.


Re: [Intel-wired-lan] [PATCH net-next v6 13/21] bitmap: make bitmap_{get, set}_value8() use bitmap_{read, write}()

2024-05-29 Thread Robin Murphy

Hi Alexander,

On 27/03/2024 3:23 pm, Alexander Lobakin wrote:

Now that we have generic bitmap_read() and bitmap_write(), which are
inline and try to take care of non-bound-crossing and aligned cases
to keep them optimized, collapse bitmap_{get,set}_value8() into
simple wrappers around the former ones.
bloat-o-meter shows no difference in vmlinux and -2 bytes for
gpio-pca953x.ko, which says the optimization didn't suffer due to
that change. The converted helpers have the value width embedded
and always compile-time constant and that helps a lot.


This change appears to have introduced a build failure for me on arm64
(with GCC 9.4.0 from Ubuntu 20.04.02) - reverting b44759705f7d makes
these errors go away again:

In file included from drivers/gpio/gpio-pca953x.c:12:
drivers/gpio/gpio-pca953x.c: In function ‘pca953x_probe’:
./include/linux/bitmap.h:799:17: error: array subscript [1, 1024] is outside 
array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds]
  799 |  map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
  | ^~
In file included from ./include/linux/atomic.h:5,
 from drivers/gpio/gpio-pca953x.c:11:
drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’
 1015 |  DECLARE_BITMAP(val, MAX_LINE);
  | ^~~
./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’
   11 |  unsigned long name[BITS_TO_LONGS(bits)]
  |^~~~
In file included from drivers/gpio/gpio-pca953x.c:12:
./include/linux/bitmap.h:800:17: error: array subscript [1, 1024] is outside 
array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds]
  800 |  map[index + 1] |= (value >> space);
  |  ~~~^~~
In file included from ./include/linux/atomic.h:5,
 from drivers/gpio/gpio-pca953x.c:11:
drivers/gpio/gpio-pca953x.c:1015:17: note: while referencing ‘val’
 1015 |  DECLARE_BITMAP(val, MAX_LINE);
  | ^~~
./include/linux/types.h:11:16: note: in definition of macro ‘DECLARE_BITMAP’
   11 |  unsigned long name[BITS_TO_LONGS(bits)]
  |^~~~

I've not dug further since I don't have any interest in the pca953x
driver - it just happened to be enabled in my config, so for now I've
turned it off. However I couldn't obviously see any other reports of
this, so here it is.

Thanks,
Robin.


Re: [PATCH] treewide: Fix common grammar mistake "the the"

2024-04-11 Thread Robin Murphy

On 11/04/2024 4:04 pm, Thorsten Blum wrote:

Use `find . -type f -exec sed -i 's/\<the\ the\>/the/g' {} +` to find all
occurrences of "the the" and replace them with a single "the".


[...]

diff --git a/arch/arm/include/asm/unwind.h b/arch/arm/include/asm/unwind.h
index d60b09a5acfc..a75da9a01f91 100644
--- a/arch/arm/include/asm/unwind.h
+++ b/arch/arm/include/asm/unwind.h
@@ -10,7 +10,7 @@
  
  #ifndef __ASSEMBLY__
  
-/* Unwind reason code according the the ARM EABI documents */

+/* Unwind reason code according the ARM EABI documents */


Well, that's clearly still not right... repeated words aren't *always* 
redundant, sometimes they're meant to be other words ;)


Thanks,
Robin.



Re: [PATCH] drm/panthor: Don't use virt_to_pfn()

2024-03-18 Thread Robin Murphy

On 18/03/2024 2:51 pm, Steven Price wrote:

virt_to_pfn() isn't available on x86 (except to xen) so breaks
COMPILE_TEST builds. Avoid its use completely by instead storing the
struct page pointer allocated in panthor_device_init() and using
page_to_pfn() instead.

Signed-off-by: Steven Price 
---
  drivers/gpu/drm/panthor/panthor_device.c | 10 ++
  drivers/gpu/drm/panthor/panthor_device.h |  2 +-
  2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c 
b/drivers/gpu/drm/panthor/panthor_device.c
index 69deb8e17778..3c30da03fa48 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -154,6 +154,7 @@ int panthor_device_init(struct panthor_device *ptdev)
  {
struct resource *res;
struct page *p;
+   u32 *dummy_page_virt;
int ret;
  
  	ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;

@@ -172,9 +173,10 @@ int panthor_device_init(struct panthor_device *ptdev)
if (!p)
return -ENOMEM;
  
-	ptdev->pm.dummy_latest_flush = page_address(p);

+   ptdev->pm.dummy_latest_flush = p;
+   dummy_page_virt = page_address(p);
ret = drmm_add_action_or_reset(&ptdev->base, panthor_device_free_page,
-  ptdev->pm.dummy_latest_flush);
+  dummy_page_virt);


Nit: I was about to say I'd be inclined to switch the callback to 
__free_page() instead, but then I realise there's no real need to be 
reinventing that in the first place:


dummy_page_virt = (void *)devm_get_free_pages(ptdev->base.dev,
GFP_KERNEL | __GFP_ZERO, 0);
if (!dummy_page_virt)
return -ENOMEM;

ptdev->pm.dummy_latest_flush = virt_to_page(dummy_page_virt);

Cheers,
Robin.


if (ret)
return ret;
  
@@ -184,7 +186,7 @@ int panthor_device_init(struct panthor_device *ptdev)

 * happens while the dummy page is mapped. Zero cannot be used because
 * that means 'always flush'.
 */
-   *ptdev->pm.dummy_latest_flush = 1;
+   *dummy_page_virt = 1;
  
INIT_WORK(&ptdev->reset.work, panthor_device_reset_work);

ptdev->reset.wq = alloc_ordered_workqueue("panthor-reset-wq", 0);
@@ -353,7 +355,7 @@ static vm_fault_t panthor_mmio_vm_fault(struct vm_fault 
*vmf)
if (active)
pfn = __phys_to_pfn(ptdev->phys_addr + 
CSF_GPU_LATEST_FLUSH_ID);
else
-   pfn = virt_to_pfn(ptdev->pm.dummy_latest_flush);
+   pfn = page_to_pfn(ptdev->pm.dummy_latest_flush);
break;
  
  	default:

diff --git a/drivers/gpu/drm/panthor/panthor_device.h 
b/drivers/gpu/drm/panthor/panthor_device.h
index 51c9d61b6796..c84c27dcc92c 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -160,7 +160,7 @@ struct panthor_device {
 * Used to replace the real LATEST_FLUSH page when the GPU
 * is suspended.
 */
-   u32 *dummy_latest_flush;
+   struct page *dummy_latest_flush;
} pm;
  };
  


Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case

2024-03-18 Thread Robin Murphy

On 18/03/2024 1:49 pm, Steven Price wrote:

On 18/03/2024 13:08, Boris Brezillon wrote:

On Mon, 18 Mar 2024 11:31:05 +
Steven Price  wrote:


On 18/03/2024 08:58, Boris Brezillon wrote:

Putting a hard dependency on CONFIG_PM is not possible because of a
circular dependency issue, and it's actually not desirable either. In
order to support this use case, we forcibly resume at init time, and
suspend at unplug time.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/
Signed-off-by: Boris Brezillon 


Reviewed-by: Steven Price 


---
Tested by faking CONFIG_PM=n in the driver (basically commenting
all pm_runtime calls, and making the panthor_device_suspend/resume()
calls unconditional in the panthor_device_unplug/init() path) since
CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I
can't be 100% sure this will work correctly on a platform that has
CONFIG_PM=n.


The same - I can't test this properly :(

Note that the other option (which AFAICT doesn't cause any problems) is
to "select PM" rather than depend on it - AIUI the 'select' dependency
is considered in the opposite direction by kconfig so won't cause the
dependency loop.


Doesn't seem to work with COMPILE_TEST though? I mean, we need
something like

depends on ARM || ARM64 || (COMPILE_TEST && PM)
...
select PM

but kconfig doesn't like that


Why do we need the "&& PM" part? Just:

depends on ARM || ARM64 || COMPILE_TEST
...
select PM

Or at least that appears to work for me.


drivers/gpu/drm/panthor/Kconfig:3:error: recursive dependency detected!
drivers/gpu/drm/panthor/Kconfig:3:  symbol DRM_PANTHOR depends on PM
kernel/power/Kconfig:183:   symbol PM is selected by DRM_PANTHOR

which is why I initially went for a depends on PM



Of course if there is actually anyone who has a
platform which can be built with !CONFIG_PM then that won't help. But the
inability of anyone to actually properly test this configuration does
worry me a little.


Well, as long as it doesn't regress the PM behavior, I think I'm happy
to take the risk. Worst case scenario, someone complains that this is
not working properly when they do the !PM bringup :-).


Indeed, I've no objection to this patch - although I really should have
compiled tested it as Robin pointed out ;)

But one other thing I've noticed when compile testing it - we don't
appear to have fully fixed the virt_to_pfn() problem. On x86 with
COMPILE_TEST I still get an error. Looking at the code it appears that
virt_to_pfn() isn't available on x86... it overrides asm/page.h and
doesn't provide a definition. The definition on x86 is hiding in
asm/xen/page.h.

Outside of arch code it's only drivers/xen that currently uses that
function. So I guess it's probably best to do a
PFN_DOWN(virt_to_phys(...)) instead. Or look to fix x86 :)


FWIW from a quick look it might be cleaner to store the struct page 
pointer for the dummy page - especially since the VA only seems to be 
used once in panthor_device_init() anyway - then use page_to_pfn() at 
the business end.


Cheers,
Robin.


Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case

2024-03-18 Thread Robin Murphy

On 18/03/2024 8:58 am, Boris Brezillon wrote:

Putting a hard dependency on CONFIG_PM is not possible because of a
circular dependency issue, and it's actually not desirable either. In
order to support this use case, we forcibly resume at init time, and
suspend at unplug time.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/
Signed-off-by: Boris Brezillon 
---
Tested by faking CONFIG_PM=n in the driver (basically commenting
all pm_runtime calls, and making the panthor_device_suspend/resume()
calls unconditional in the panthor_device_unplug/init() path) since
CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I
can't be 100% sure this will work correctly on a platform that has
CONFIG_PM=n.
---
  drivers/gpu/drm/panthor/panthor_device.c | 13 +++--
  drivers/gpu/drm/panthor/panthor_drv.c|  4 +++-
  2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c 
b/drivers/gpu/drm/panthor/panthor_device.c
index 69deb8e17778..ba7aedbb4931 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -87,6 +87,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
pm_runtime_dont_use_autosuspend(ptdev->base.dev);
pm_runtime_put_sync_suspend(ptdev->base.dev);
  
+	/* If PM is disabled, we need to call the suspend handler manually. */

+   if (!IS_ENABLED(CONFIG_PM))
+   panthor_device_suspend(ptdev->base.dev);
+
/* Report the unplug operation as done to unblock concurrent
 * panthor_device_unplug() callers.
 */
@@ -218,6 +222,13 @@ int panthor_device_init(struct panthor_device *ptdev)
if (ret)
return ret;
  
+	/* If PM is disabled, we need to call panthor_device_resume() manually. */

+   if (!IS_ENABLED(CONFIG_PM)) {
+   ret = panthor_device_resume(ptdev->base.dev);
+   if (ret)
+   return ret;
+   }
+
ret = panthor_gpu_init(ptdev);
if (ret)
goto err_rpm_put;
@@ -402,7 +413,6 @@ int panthor_device_mmap_io(struct panthor_device *ptdev, 
struct vm_area_struct *
return 0;
  }
  
-#ifdef CONFIG_PM

  int panthor_device_resume(struct device *dev)
  {
struct panthor_device *ptdev = dev_get_drvdata(dev);
@@ -547,4 +557,3 @@ int panthor_device_suspend(struct device *dev)
mutex_unlock(&ptdev->pm.mmio_lock);
return ret;
  }
-#endif
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c 
b/drivers/gpu/drm/panthor/panthor_drv.c
index ff484506229f..2ea6a9f436db 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1407,17 +1407,19 @@ static const struct of_device_id dt_match[] = {
  };
  MODULE_DEVICE_TABLE(of, dt_match);
  
+#ifdef CONFIG_PM


This #ifdef isn't necessary, and in fact will break the !PM build - 
pm_ptr() already takes care of allowing the compiler to optimise out the 
ops structure itself without any further annotations.
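
For reference, pm_ptr() boils down to (in include/linux/pm.h):

#define pm_ptr(_ptr)	PTR_IF(IS_ENABLED(CONFIG_PM), (_ptr))

so with CONFIG_PM=n the .pm pointer becomes a constant NULL and the ops
structure is dropped as unreferenced.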


Thanks,
Robin.


  static DEFINE_RUNTIME_DEV_PM_OPS(panthor_pm_ops,
 panthor_device_suspend,
 panthor_device_resume,
 NULL);
+#endif
  
  static struct platform_driver panthor_driver = {

.probe = panthor_probe,
.remove_new = panthor_remove,
.driver = {
.name = "panthor",
-   .pm = &panthor_pm_ops,
+   .pm = pm_ptr(&panthor_pm_ops),
.of_match_table = dt_match,
},
  };


Re: [PATCH 3/3] dt-bindings: remoteproc: Add Arm remoteproc

2024-03-13 Thread Robin Murphy

On 2024-03-01 4:42 pm, abdellatif.elkhl...@arm.com wrote:

From: Abdellatif El Khlifi 

Introduce the bindings for Arm remoteproc support.

Signed-off-by: Abdellatif El Khlifi 
---
  .../bindings/remoteproc/arm,rproc.yaml| 69 +++
  MAINTAINERS   |  1 +
  2 files changed, 70 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml

diff --git a/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml 
b/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml
new file mode 100644
index ..322197158059
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml
@@ -0,0 +1,69 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/remoteproc/arm,rproc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Remoteproc Devices
+
+maintainers:
+  - Abdellatif El Khlifi 
+
+description: |
+  Some Arm heterogeneous System-On-Chips feature remote processors that can
+  be controlled with a reset control register and a reset status register to
+  start or stop the processor.
+
+  This document defines the bindings for these remote processors.
+
+properties:
+  compatible:
+enum:
+  - arm,corstone1000-extsys
+
+  reg:
+minItems: 2
+maxItems: 2
+description: |
+  Address and size in bytes of the reset control register
+  and the reset status register.
+  Expects the registers to be in the order as above.
+  Should contain an entry for each value in 'reg-names'.
+
+  reg-names:
+description: |
+  Required names for each of the reset registers defined in
+  the 'reg' property. Expects the names from the following
+  list, in the specified order, each representing the corresponding
+  reset register.
+items:
+  - const: reset-control
+  - const: reset-status
+
+  firmware-name:
+description: |
+  Default name of the firmware to load to the remote processor.


So... is loading the firmware image achieved by somehow bitbanging it 
through the one reset register, maybe? I find it hard to believe this is 
a complete and functional binding.


Frankly at the moment I'd be inclined to say it isn't even a remoteproc 
binding (or driver) at all, it's a reset controller. Bindings are a 
contract for describing the hardware, not the current state of Linux 
driver support - if this thing still needs mailboxes, shared memory, a 
reset vector register, or whatever else to actually be useful, those 
should be in the binding from day 1 so that a) people can write and 
deploy correct DTs now, such that functionality becomes available on 
their systems as soon as driver support catches up, and b) the community 
has any hope of being able to review whether the binding is 
appropriately designed and specified for the purpose it intends to serve.


For instance right now it seems somewhat tenuous to describe two 
consecutive 32-bit registers as separate "reg" entries, but *maybe* it's 
OK if that's all there ever is. However if it's actually going to end up 
needing several more additional MMIO and/or memory regions for other 
functionality, then describing each register and location individually 
is liable to get unmanageable really fast, and a higher-level functional 
grouping (e.g. these reset-related registers together as a single 8-byte 
region) would likely be a better design.


Thanks,
Robin.


+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - firmware-name
+
+additionalProperties: false
+
+examples:
+  - |
+extsys0: remoteproc@1a010310 {
+compatible = "arm,corstone1000-extsys";
+reg = <0x1a010310 0x4>, <0x1a010314 0x4>;
+reg-names = "reset-control", "reset-status";
+firmware-name = "es0_flashfw.elf";
+};
+
+extsys1: remoteproc@1a010318 {
+compatible = "arm,corstone1000-extsys";
+reg = <0x1a010318 0x4>, <0x1a01031c 0x4>;
+reg-names = "reset-control", "reset-status";
+firmware-name = "es1_flashfw.elf";
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index 54d6a40feea5..eddaa3841a65 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1768,6 +1768,7 @@ ARM REMOTEPROC DRIVER
  M:Abdellatif El Khlifi 
  L:linux-remotep...@vger.kernel.org
  S:Maintained
+F: Documentation/devicetree/bindings/remoteproc/arm,rproc.yaml
  F:drivers/remoteproc/arm_rproc.c
  
  ARM SMC WATCHDOG DRIVER




Re: [PATCH 3/3] drm/panthor: Fix undefined panthor_device_suspend/resume symbol issue

2024-03-11 Thread Robin Murphy

On 2024-03-11 1:22 pm, Boris Brezillon wrote:

On Mon, 11 Mar 2024 13:11:28 +
Robin Murphy  wrote:


On 2024-03-11 11:52 am, Boris Brezillon wrote:

On Mon, 11 Mar 2024 13:49:56 +0200
Jani Nikula  wrote:
   

On Mon, 11 Mar 2024, Boris Brezillon  wrote:

On Mon, 11 Mar 2024 13:05:01 +0200
Jani Nikula  wrote:
 

This breaks the config for me:

SYNCinclude/config/auto.conf.cmd
GEN Makefile
drivers/iommu/Kconfig:14:error: recursive dependency detected!
drivers/iommu/Kconfig:14:   symbol IOMMU_SUPPORT is selected by DRM_PANTHOR
drivers/gpu/drm/panthor/Kconfig:3:  symbol DRM_PANTHOR depends on PM
kernel/power/Kconfig:183:   symbol PM is selected by PM_SLEEP
kernel/power/Kconfig:117:   symbol PM_SLEEP depends on HIBERNATE_CALLBACKS
kernel/power/Kconfig:35:symbol HIBERNATE_CALLBACKS is selected by 
XEN_SAVE_RESTORE
arch/x86/xen/Kconfig:67:symbol XEN_SAVE_RESTORE depends on XEN
arch/x86/xen/Kconfig:6: symbol XEN depends on PARAVIRT
arch/x86/Kconfig:781:   symbol PARAVIRT is selected by HYPERV
drivers/hv/Kconfig:5:   symbol HYPERV depends on X86_LOCAL_APIC
arch/x86/Kconfig:1106:  symbol X86_LOCAL_APIC depends on X86_UP_APIC
arch/x86/Kconfig:1081:  symbol X86_UP_APIC prompt is visible depending on 
PCI_MSI
drivers/pci/Kconfig:39: symbol PCI_MSI is selected by AMD_IOMMU
drivers/iommu/amd/Kconfig:3:symbol AMD_IOMMU depends on IOMMU_SUPPORT


Uh, I guess we want a "depends on IOMMU_SUPPORT" instead of "select
IOMMU_SUPPORT" in panthor then.


That works for me.


Let's revert the faulty commit first. We'll see if Steve has a
different solution for the original issue.


FWIW, the reasoning in the offending commit seems incredibly tenuous.
There are far more practical reasons for building an arm/arm64 kernel
without PM - for debugging or whatever, and where one may even still
want a usable GPU, let alone just a non-broken build - than there are
for building this driver for x86. Using pm_ptr() is trivial, and if you
want to support COMPILE_TEST then there's really no justifiable excuse
not to.


The problem is not just about using pm_ptr(), but also making sure
panthor_device_resume/suspend() are called in the init/unplug path
when !PM, as I don't think the PM helpers automate that for us. I
was just aiming for a simple fix that wouldn't force me to test the !PM
case...
Fair enough, at worst we could always have a runtime check and refuse to 
probe in conditions we don't think are worth the bother of implementing 
fully-functional support for. However if we want to make an argument for 
only supporting "realistic" configs at build time then that is an 
argument for dropping COMPILE_TEST as well.
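
For illustration, a minimal sketch of the pm_ptr() arrangement being
suggested, reusing the panthor_device_suspend/resume symbols named in
this thread; the macro choice here is an assumption for the sketch, not
the actual panthor code. With CONFIG_PM=n, pm_ptr() evaluates to NULL,
so the unused static ops (and the callbacks referenced only from them)
can be discarded while the driver still builds:

#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

/*
 * Runtime PM callbacks; system sleep is routed through
 * pm_runtime_force_suspend()/pm_runtime_force_resume() by this macro.
 */
static DEFINE_RUNTIME_DEV_PM_OPS(panthor_pm_ops, panthor_device_suspend,
				 panthor_device_resume, NULL);

static struct platform_driver panthor_driver = {
	.driver = {
		.name = "panthor",
		/* NULL when !CONFIG_PM, so nothing dangles at link time */
		.pm = pm_ptr(&panthor_pm_ops),
	},
};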


Thanks,
Robin.


Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-02-02 Thread Robin Murphy
] [c000a878bcb0] [c0685d3c]
kernfs_fop_write_iter+0x1cc/0x280
[  981.124283] [c000a878bd00] [c05909c8] vfs_write+0x358/0x4b0
[  981.124288] [c000a878bdc0] [c0590cfc] ksys_write+0x7c/0x140
[  981.124293] [c000a878be10] [c0036554]
system_call_exception+0x134/0x330
[  981.124298] [c000a878be50] [c000d6a0]
system_call_common+0x160/0x2e4
[  981.124303] --- interrupt: c00 at 0x200013f21594
[  981.124306] NIP:  200013f21594 LR: 200013e97bf4 CTR:

[  981.124309] REGS: c000a878be80 TRAP: 0c00   Not tainted
(6.5.0-rc6-next-20230817-auto)
[  981.124312] MSR:  8280f033
  CR: 22000282  XER: 
[  981.124321] IRQMASK: 0
[  981.124321] GPR00: 0004 73a55c70 200014007300
0007
[  981.124321] GPR04: 00013aff5750 0008 fbad2c80
00013afd02a0
[  981.124321] GPR08: 0001  

[  981.124321] GPR12:  200013b7bc30 

[  981.124321] GPR16:   

[  981.124321] GPR20:   

[  981.124321] GPR24: 00010ef61668  0008
00013aff5750
[  981.124321] GPR28: 0008 00013afd02a0 00013aff5750
0008
[  981.124356] NIP [200013f21594] 0x200013f21594
[  981.124358] LR [200013e97bf4] 0x200013e97bf4
[  981.124361] --- interrupt: c00
[  981.124362] Code: 38427bd0 7c0802a6 6000 7c0802a6 fba1ffe8
fbc1fff0 fbe1fff8 7cbf2b78 38a0 7cdd3378 f8010010 f821ffc1
 4bff95d1 6000 7c7e1b79
[  981.124374] ---[ end trace  ]---


Thanks and Regards

On 1/31/24 16:18, Robin Murphy wrote:

On 2024-01-31 9:19 am, Tasmiya Nalatwad wrote:

Greetings,

[mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after 
performing dlpar remove test


--- Traces ---

[58563.146236] BUG: Unable to handle kernel data access at 
0x6b6b6b6b6b6b6b83

[58563.146242] Faulting instruction address: 0xc09c0e60
[58563.146248] Oops: Kernel access of bad area, sig: 11 [#1]
[58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio 
dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic 
xor raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io 
rpaphp nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time 
dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache 
jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc 
nvmet nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 
scsi_transport_fc fuse
[58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not 
tainted 6.8.0-rc1-auto-gecb1b8288dc7 #1
[58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[58563.146337] NIP:  c09c0e60 LR: c09c0e28 CTR: 
c09c1584
[58563.146342] REGS: c0007960f260 TRAP: 0380   Not tainted 
(6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146347] MSR:  80009033   CR: 
24822424  XER: 20040006

[58563.146360] CFAR: c09c0e74 IRQMASK: 0
[58563.146360] GPR00: c09c0e28 c0007960f500 
c1482600 c3050540
[58563.146360] GPR04:  c0089a6870c0 
0001 fffe
[58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 
6b6b6b6b6b6b6b6b 0220
[58563.146360] GPR12: 2000 c308 
 
[58563.146360] GPR16:   
 0001
[58563.146360] GPR20: c1281478  
c1281490 c2bfed80
[58563.146360] GPR24: c0089a6870c0  
 c2b9ffb8
[58563.146360] GPR28:  c2bac0e8 
 

[58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118
[58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118


This implies that iommu_device_list has become corrupted. Looks like 
spapr_tce_setup_phb_iommus_initcall() registers an iommu_device which 
pcibios_free_controller() could free if a PCI controller is removed, 
but there's no path anywhere to ever unregister any of those IOMMUs. 
Presumably this also means that if a PCI controller is dynamically 
added after init, its IOMMU won't be set up properly either.


Thanks,
Robin.


[58563.146437] Call Trace:
[58563.146439] [c0007960f500] [c0007960f560] 
0xc0007960f560 (unreliable)
[58563.146446] [c0007960f530] [c09c0fd0] 
__iommu_probe_device+0xc0/0x5c0
[58563.146454] [c0007960f5a0] [c09c151c] 
iommu_probe_device+0x4c/0xb4
[58563.146462] [c0007960f5e0] [c09c15d0] 
iommu_bus_notifier+0x4c/0x8c
[58563.146469

Re: [mainline] [linux-next] [6.8-rc1] [FC] [DLPAR] OOps kernel crash after performing dlpar remove test

2024-01-31 Thread Robin Murphy

On 2024-01-31 9:19 am, Tasmiya Nalatwad wrote:

Greetings,

[mainline] [linux-next] [6.8-rc1] [DLPAR] OOps kernel crash after 
performing dlpar remove test


--- Traces ---

[58563.146236] BUG: Unable to handle kernel data access at 
0x6b6b6b6b6b6b6b83

[58563.146242] Faulting instruction address: 0xc09c0e60
[58563.146248] Oops: Kernel access of bad area, sig: 11 [#1]
[58563.146252] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[58563.146258] Modules linked in: isofs cdrom dm_snapshot dm_bufio 
dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic xor 
raid6_pq zstd_compress loop xfs libcrc32c raid0 nvram rpadlpar_io rpaphp 
nfnetlink xsk_diag bonding tls rfkill sunrpc dm_service_time 
dm_multipath dm_mod pseries_rng vmx_crypto binfmt_misc ext4 mbcache jbd2 
sd_mod sg ibmvscsi scsi_transport_srp ibmveth lpfc nvmet_fc nvmet 
nvme_fc nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 
scsi_transport_fc fuse
[58563.146326] CPU: 0 PID: 1071247 Comm: drmgr Kdump: loaded Not tainted 
6.8.0-rc1-auto-gecb1b8288dc7 #1
[58563.146332] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
0xf05 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[58563.146337] NIP:  c09c0e60 LR: c09c0e28 CTR: 
c09c1584
[58563.146342] REGS: c0007960f260 TRAP: 0380   Not tainted 
(6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146347] MSR:  80009033   CR: 
24822424  XER: 20040006

[58563.146360] CFAR: c09c0e74 IRQMASK: 0
[58563.146360] GPR00: c09c0e28 c0007960f500 c1482600 
c3050540
[58563.146360] GPR04:  c0089a6870c0 0001 
fffe
[58563.146360] GPR08: c2bac020 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 
0220
[58563.146360] GPR12: 2000 c308  

[58563.146360] GPR16:    
0001
[58563.146360] GPR20: c1281478  c1281490 
c2bfed80
[58563.146360] GPR24: c0089a6870c0   
c2b9ffb8
[58563.146360] GPR28:  c2bac0e8  


[58563.146421] NIP [c09c0e60] iommu_ops_from_fwnode+0x68/0x118
[58563.146430] LR [c09c0e28] iommu_ops_from_fwnode+0x30/0x118


This implies that iommu_device_list has become corrupted. Looks like 
spapr_tce_setup_phb_iommus_initcall() registers an iommu_device which 
pcibios_free_controller() could free if a PCI controller is removed, but 
there's no path anywhere to ever unregister any of those IOMMUs. 
Presumably this also means that if a PCI controller is dynamically added 
after init, its IOMMU won't be set up properly either.
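
A rough sketch of the missing symmetry (the pseries structure and
function names here are hypothetical; only the register/unregister
pairing is the point):

#include <linux/iommu.h>

/* hypothetical setup path: one iommu_device embedded per PHB */
static int pseries_phb_iommu_setup(struct pci_controller *phb)
{
	return iommu_device_register(&phb->iommu, &spapr_tce_iommu_ops, NULL);
}

/*
 * The missing teardown: this would need to run on the DLPAR-remove path
 * before the PHB is freed, otherwise iommu_device_list keeps a dangling
 * entry - consistent with the 0x6b slab-poison values in the oops above.
 */
static void pseries_phb_iommu_teardown(struct pci_controller *phb)
{
	iommu_device_unregister(&phb->iommu);
}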


Thanks,
Robin.


[58563.146437] Call Trace:
[58563.146439] [c0007960f500] [c0007960f560] 0xc0007960f560 
(unreliable)
[58563.146446] [c0007960f530] [c09c0fd0] 
__iommu_probe_device+0xc0/0x5c0
[58563.146454] [c0007960f5a0] [c09c151c] 
iommu_probe_device+0x4c/0xb4
[58563.146462] [c0007960f5e0] [c09c15d0] 
iommu_bus_notifier+0x4c/0x8c
[58563.146469] [c0007960f600] [c019e3d0] 
notifier_call_chain+0xb8/0x1a0
[58563.146476] [c0007960f660] [c019eea0] 
blocking_notifier_call_chain+0x64/0x94

[58563.146483] [c0007960f6a0] [c09d3c5c] bus_notify+0x50/0x7c
[58563.146491] [c0007960f6e0] [c09cfba4] device_add+0x774/0x9bc
[58563.146498] [c0007960f7a0] [c08abe9c] 
pci_device_add+0x2f4/0x864
[58563.146506] [c0007960f850] [c007d5a0] 
of_create_pci_dev+0x390/0xa08
[58563.146514] [c0007960f930] [c007de68] 
__of_scan_bus+0x250/0x328
[58563.146520] [c0007960fa10] [c007a680] 
pcibios_scan_phb+0x274/0x3c0
[58563.146527] [c0007960fae0] [c0105d58] 
init_phb_dynamic+0xb8/0x110
[58563.146535] [c0007960fb50] [c008217b0380] 
dlpar_add_slot+0x170/0x3b4 [rpadlpar_io]
[58563.146544] [c0007960fbf0] [c008217b0ca0] 
add_slot_store+0xa4/0x140 [rpadlpar_io]
[58563.146551] [c0007960fc80] [c0f3dbec] 
kobj_attr_store+0x30/0x4c
[58563.146559] [c0007960fca0] [c06931fc] 
sysfs_kf_write+0x68/0x7c
[58563.146566] [c0007960fcc0] [c0691b2c] 
kernfs_fop_write_iter+0x1c8/0x278

[58563.146573] [c0007960fd10] [c0599f54] vfs_write+0x340/0x4cc
[58563.146580] [c0007960fdc0] [c059a2bc] ksys_write+0x7c/0x140
[58563.146587] [c0007960fe10] [c0035d74] 
system_call_exception+0x134/0x330
[58563.146595] [c0007960fe50] [c000d6a0] 
system_call_common+0x160/0x2e4

[58563.146602] --- interrupt: c00 at 0x24470cb4
[58563.146606] NIP:  24470cb4 LR: 243e7d04 CTR: 

[58563.146611] REGS: c0007960fe80 TRAP: 0c00   Not tainted 
(6.8.0-rc1-auto-gecb1b8288dc7)
[58563.146616] MSR:  8280f033 
  CR: 24000282  XER: 

[58563.146632] IRQMASK: 0
[58563.146632] GPR00: 

Re: [PATCH 00/17] video: dw_hdmi: Support Vendor PHY

2023-12-18 Thread Robin Murphy

On 2023-12-15 7:13 am, Kever Yang wrote:

Hi Jagan,

On 2023/12/15 14:36, Jagan Teki wrote:

Hi Heiko/Kerver/Anatoloj,

On Mon, Dec 11, 2023 at 2:30 PM Jagan Teki 
 wrote:

Unlike RK3399, and like the Sunxi/Meson DW HDMI, the new Rockchip SoC
RK3328 supports an external vendor PHY with the DW HDMI chip.

Support this vendor PHY by adding new platform PHY ops via the DW HDMI
driver, and call the respective generic PHY from the platform driver code.

This series was tested on RK3328 at 1080p (1920x1080) resolution.

Patch 0001/0005: Support Vendor PHY
Patch 0006/0008: VOP extension for win, dsp offsets
Patch 0009/0010: RK3328 VOP, HDMI clocks
Patch 0011:  Rockchip Inno HDMI PHY
Patch 0012:  RK3328 HDMI driver
Patch 0013:  RK3328 VOP driver
Patch 0014/0017: Enable HDMI Out for RK3328

Important:
One potential issue is that Linux HDMI out on RK3328 is affected by
this patchset, though I could not find any relation or clue.

[    0.752016] Loading compiled-in X.509 certificates
[    0.787796] inno_hdmi_phy_rk3328_clk_recalc_rate: parent 2400
[    0.788391] inno-hdmi-phy ff43.phy: 
inno_hdmi_phy_rk3328_clk_recalc_rate rate 14850 vco 14850
[    0.798353] rockchip-drm display-subsystem: bound ff37.vop 
(ops vop_component_ops)
[    0.799403] dwhdmi-rockchip ff3c.hdmi: supply avdd-0v9 not 
found, using dummy regulator
[    0.800288] rk_iommu ff373f00.iommu: Enable stall request timed 
out, status: 0x4b
[    0.801131] dwhdmi-rockchip ff3c.hdmi: supply avdd-1v8 not 
found, using dummy regulator
[    0.802056] rk_iommu ff373f00.iommu: Disable paging request timed 
out, status: 0x4b
[    0.803233] dwhdmi-rockchip ff3c.hdmi: Detected HDMI TX 
controller v2.11a with HDCP (inno_dw_hdmi_phy2)
[    0.805355] dwhdmi-rockchip ff3c.hdmi: registered DesignWare 
HDMI I2C bus driver
[    0.808769] rockchip-drm display-subsystem: bound ff3c.hdmi 
(ops dw_hdmi_rockchip_ops)
[    0.810869] [drm] Initialized rockchip 1.0.0 20140818 for 
display-subsystem on minor 0


The only way I can get Linux HDMI working is by disabling the IOMMU, or
by supporting a disable-iommu link for RK3328 via DT [1].

[1] https://www.spinics.net/lists/devicetree/msg605124.html

Is anyone aware of this issue? I did post patches for the Linux IOMMU
side, but that seems not to be a proper solution. Any suggestions?


I'm no expert in HDMI/VOP, so I can't provide a suitable solution in
the kernel, but here is the reason why we need a patch to work around
the issue in the kernel:

- The VOP driver in U-Boot works in non-IOMMU mode, and the VOP
accesses DDR by physical address;

- The VOP driver in the kernel works with the IOMMU enabled (by
default), and the VOP accesses DDR through virtual addresses (via the
IOMMU);

- The VOP keeps working before the kernel VOP driver takes over, and
the IOMMU driver will be enabled by the Linux PM framework; since the
IOMMU is not correctly configured at this point, the VOP's accesses
(still to the original physical addresses from U-Boot) get translated
by the IOMMU to unknown space;

So we need to disable the IOMMU temporarily during kernel startup,
before the VOP driver is enabled.


If U-Boot isn't handing off an active framebuffer, then it should be 
U-Boot's responsibility to stop the VOP before it exits; if on the other 
hand it is, then it can now use the "iommu-addresses" DT property (see 
the reserved-memory schema) on the framebuffer region, and we should 
just need a bit of work in the IOMMU driver to ensure that is respected 
during the period between the IOMMU initialising and the Linux VOP 
driver subsequently taking over (i.e. so it won't get stuck on an 
unexpected page fault as seems to be happening above). The IOMMU aspect 
of that ought to be fairly straightforward; the trickier part might be 
the runtime PM aspect to ensure the IOMMU doesn't let itself go idle and 
actually turn anything off during that period. I also still think that 
doing the full rk_iommu_disable() upon runtime suspend is wrong, but 
that's more of a thing which confounds the underlying issue here, rather 
than being the problem in itself.
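
As a sketch of that "bit of work in the IOMMU driver" (this is not
actual rk_iommu code, and assumes the core's generic reserved-region
handling suffices): wiring up get_resv_regions lets the core turn the
reserved-memory node's "iommu-addresses" into IOMMU_RESV_DIRECT
regions, keeping the firmware framebuffer mapping live until the VOP
driver takes over:

#include <linux/iommu.h>
#include <linux/of_iommu.h>

static void rk_iommu_get_resv_regions(struct device *dev,
				      struct list_head *head)
{
	/* parses "memory-region" + "iommu-addresses" into IOMMU_RESV_DIRECT */
	of_iommu_get_resv_regions(dev, head);
}

/* ...then hooked up as .get_resv_regions in the driver's iommu_ops. */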


Thanks,
Robin.


Re: [PATCH v2] iommu/arm-smmu-qcom: Add missing GMU entry to match table

2023-12-11 Thread Robin Murphy

On 2023-12-10 6:06 pm, Rob Clark wrote:

From: Rob Clark 

In some cases the firmware expects cbndx 1 to be assigned to the GMU,
so we also want the default domain for the GMU to be an identity domain.
This way it does not get a context bank assigned.  Without this, both
of_dma_configure() and drm/msm's iommu_domain_attach() will trigger
allocating and configuring a context bank.  So GMU ends up attached to
both cbndx 1 and later cbndx 2.  This arrangement seemingly confounds
and surprises the firmware if the GPU later triggers a translation
fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU
getting wedged and the GPU stuck without memory access.


Reviewed-by: Robin Murphy 


Cc: sta...@vger.kernel.org
Signed-off-by: Rob Clark 
---

I didn't add a fixes tag because really this issue has been there
all along, but either didn't matter with other firmware or we didn't
notice the problem.

  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
  
  static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {

{ .compatible = "qcom,adreno" },
+   { .compatible = "qcom,adreno-gmu" },
{ .compatible = "qcom,mdp4" },
{ .compatible = "qcom,mdss" },
{ .compatible = "qcom,sc7180-mdss" },


Re: [PATCH] iommu/arm-smmu-qcom: Add missing GMU entry to match table

2023-12-08 Thread Robin Murphy

On 07/12/2023 9:24 pm, Rob Clark wrote:

From: Rob Clark 

We also want the default domain for the GMU to be an identity domain,
so it does not get a context bank assigned.  Without this, both
of_dma_configure() and drm/msm's iommu_domain_attach() will trigger
allocating and configuring a context bank.  So GMU ends up attached
to both cbndx 1 and cbndx 2.


I can't help but read this as implying that it gets attached to both *at 
the same time*, which would be indicative of a far more serious problem 
in the main driver and/or IOMMU core code.


However, from what we discussed on IRC last night, it sounds like the 
key point here is more straightforwardly that firmware expects the GMU 
to be using context bank 1, in a vaguely similar fashion to how context 
bank 0 is special for the GPU. Clarifying that would help explain why 
we're just doing this as a trick to influence the allocator (i.e. unlike 
some of the other devices in this list we don't actually need the 
properties of the identity domain itself).


In future it might be nice to reserve this explicitly on platforms which 
need it and extend qcom_adreno_smmu_alloc_context_bank() to handle the 
GMU as well, but I don't object to this patch as an immediate quick fix 
for now, especially as something nice and easy for stable (I'd agree 
with Johan in that regard).
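
For reference, a rough sketch of that longer-term idea, following the
shape of qcom_adreno_smmu_alloc_context_bank() in arm-smmu-qcom.c; the
qcom_adreno_smmu_is_gmu_device() helper is hypothetical:

static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
		struct arm_smmu_device *smmu,
		struct device *dev, int start)
{
	/*
	 * Note: the third argument of __arm_smmu_alloc_bitmap() is an
	 * exclusive end index, despite being named "count" here.
	 */
	int count;

	if (qcom_adreno_smmu_is_gpu_device(dev)) {
		/* the GPU must get context bank 0 for pagetable switching */
		start = 0;
		count = 1;
	} else if (qcom_adreno_smmu_is_gmu_device(dev)) { /* hypothetical */
		/* firmware expects the GMU on cbndx 1, so allow bank 1 only */
		start = 1;
		count = 2;
	} else {
		/* everything else allocates from bank 2 upwards */
		start = 2;
		count = smmu->num_context_banks;
	}

	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
}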


Thanks,
Robin.


 This arrangement seemingly confounds
and surprises the firmware if the GPU later triggers a translation
fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU
getting wedged and the GPU stuck without memory access.

Signed-off-by: Rob Clark 
---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
  
  static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {

{ .compatible = "qcom,adreno" },
+   { .compatible = "qcom,adreno-gmu" },
{ .compatible = "qcom,mdp4" },
{ .compatible = "qcom,mdss" },
{ .compatible = "qcom,sc7180-mdss" },


Re: [PATCH 1/3] iommu/msm-iommu: don't limit the driver too much

2023-12-07 Thread Robin Murphy

On 07/12/2023 12:54 pm, Dmitry Baryshkov wrote:

In preparation of dropping most of ARCH_QCOM subtypes, stop limiting the
driver just to those machines. Allow it to be built for any 32-bit
Qualcomm platform (ARCH_QCOM).


Acked-by: Robin Murphy 

Unless Joerg disagrees, I think it should be fine if you want to take 
this via the SoC tree.


Thanks,
Robin.



Signed-off-by: Dmitry Baryshkov 
---
  drivers/iommu/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b..fd67f586f010 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -178,7 +178,7 @@ config FSL_PAMU
  config MSM_IOMMU
bool "MSM IOMMU Support"
depends on ARM
-   depends on ARCH_MSM8X60 || ARCH_MSM8960 || COMPILE_TEST
+   depends on ARCH_QCOM || COMPILE_TEST
select IOMMU_API
select IOMMU_IO_PGTABLE_ARMV7S
help


Re: [PATCH 10/10] ACPI: IORT: Allow COMPILE_TEST of IORT

2023-11-30 Thread Robin Murphy

On 29/11/2023 12:48 am, Jason Gunthorpe wrote:

The arm-smmu driver can COMPILE_TEST on x86, so expand this to also
enable the IORT code so it can be COMPILE_TEST'd too.

Signed-off-by: Jason Gunthorpe 
---
  drivers/acpi/Kconfig| 2 --
  drivers/acpi/Makefile   | 2 +-
  drivers/acpi/arm64/Kconfig  | 1 +
  drivers/acpi/arm64/Makefile | 2 +-
  drivers/iommu/Kconfig   | 1 +
  5 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index f819e760ff195a..3b7f77b227d13a 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -541,9 +541,7 @@ config ACPI_PFRUT
  To compile the drivers as modules, choose M here:
  the modules will be called pfr_update and pfr_telemetry.
  
-if ARM64

  source "drivers/acpi/arm64/Kconfig"
-endif
  
  config ACPI_PPTT

bool
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index eaa09bf52f1760..4e77ae37b80726 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -127,7 +127,7 @@ obj-y   += pmic/
  video-objs+= acpi_video.o video_detect.o
  obj-y += dptf/
  
-obj-$(CONFIG_ARM64)		+= arm64/

+obj-y  += arm64/
  
  obj-$(CONFIG_ACPI_VIOT)		+= viot.o
  
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig

index b3ed6212244c1e..537d49d8ace69e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -11,6 +11,7 @@ config ACPI_GTDT
  
  config ACPI_AGDI

bool "Arm Generic Diagnostic Dump and Reset Device Interface"
+   depends on ARM64
depends on ARM_SDE_INTERFACE
help
  Arm Generic Diagnostic Dump and Reset Device Interface (AGDI) is
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 143debc1ba4a9d..71d0e635599390 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_ACPI_IORT) += iort.o
  obj-$(CONFIG_ACPI_GTDT)   += gtdt.o
  obj-$(CONFIG_ACPI_APMT)   += apmt.o
  obj-$(CONFIG_ARM_AMBA)+= amba.o
-obj-y  += dma.o init.o
+obj-$(CONFIG_ARM64)+= dma.o init.o
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b6c..309378e76a9bc9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -318,6 +318,7 @@ config ARM_SMMU
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
+   select ACPI_IORT if ACPI


This is incomplete. If you want the driver to be responsible for 
enabling its own probing mechanisms then you need to select OF and ACPI 
too. And all the other drivers which probe from IORT should surely also 
select ACPI_IORT, and thus ACPI as well. And maybe the PCI core should 
as well because there are general properties of PCI host bridges and 
devices described in there?


But of course that's clearly backwards nonsense, because drivers do not 
and should not do that, so this change is not appropriate either. The 
IORT code may not be *functionally* arm64-specific, but logically it 
very much is - it serves a specification which is tied to the Arm 
architecture and describes Arm-architecture-specific concepts, within 
the wider context of ACPI on Arm itself only supporting AArch64, and not 
AArch32. It's also not like it's driver code that someone might use as 
an example and copy to a similar driver which could then run on 
different architectures where a latent theoretical bug becomes real. 
There's really no practical value to be had from compile-testing IORT.


Thanks,
Robin.


Re: [PATCH 06/10] iommu: Replace iommu_device_lock with iommu_probe_device_lock

2023-11-29 Thread Robin Murphy

On 29/11/2023 12:48 am, Jason Gunthorpe wrote:

The iommu_device_lock protects the iommu_device_list which is only read by
iommu_ops_from_fwnode().

This is now always called under the iommu_probe_device_lock, so we don't
need to double lock the linked list. Use the iommu_probe_device_lock on
the write side too.


Please no, iommu_probe_device_lock() is a hack and we need to remove the 
*reason* it exists at all. And IMO just because iommu_present() is 
deprecated doesn't justify making it look utterly nonsensical - in no 
way does that have any relationship with probe_device, much less need to 
serialise against it!


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 30 +-
  1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 08f29a1dfcd5f8..9557c2ec08d915 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -146,7 +146,6 @@ struct iommu_group_attribute iommu_group_attr_##_name = \
 	container_of(_kobj, struct iommu_group, kobj)
 
 static LIST_HEAD(iommu_device_list);
-static DEFINE_SPINLOCK(iommu_device_lock);
 
 static const struct bus_type * const iommu_buses[] = {
 	&platform_bus_type,
@@ -262,9 +261,9 @@ int iommu_device_register(struct iommu_device *iommu,
 	if (hwdev)
 		iommu->fwnode = dev_fwnode(hwdev);
 
-	spin_lock(&iommu_device_lock);
+	mutex_lock(&iommu_probe_device_lock);
 	list_add_tail(&iommu->list, &iommu_device_list);
-	spin_unlock(&iommu_device_lock);
+	mutex_unlock(&iommu_probe_device_lock);
 
 	for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
 		err = bus_iommu_probe(iommu_buses[i]);
@@ -279,9 +278,9 @@ void iommu_device_unregister(struct iommu_device *iommu)
 	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
 		bus_for_each_dev(iommu_buses[i], NULL, iommu, remove_iommu_group);
 
-	spin_lock(&iommu_device_lock);
+	mutex_lock(&iommu_probe_device_lock);
 	list_del(&iommu->list);
-	spin_unlock(&iommu_device_lock);
+	mutex_unlock(&iommu_probe_device_lock);
 
 	/* Pairs with the alloc in generic_single_device_group() */
 	iommu_group_put(iommu->singleton_group);
@@ -316,9 +315,9 @@ int iommu_device_register_bus(struct iommu_device *iommu,
 	if (err)
 		return err;
 
-	spin_lock(&iommu_device_lock);
+	mutex_lock(&iommu_probe_device_lock);
 	list_add_tail(&iommu->list, &iommu_device_list);
-	spin_unlock(&iommu_device_lock);
+	mutex_unlock(&iommu_probe_device_lock);
 
 	err = bus_iommu_probe(bus);
 	if (err) {
@@ -2033,9 +2032,9 @@ bool iommu_present(const struct bus_type *bus)
 
 	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
 		if (iommu_buses[i] == bus) {
-			spin_lock(&iommu_device_lock);
+			mutex_lock(&iommu_probe_device_lock);
 			ret = !list_empty(&iommu_device_list);
-			spin_unlock(&iommu_device_lock);
+			mutex_unlock(&iommu_probe_device_lock);
 		}
 	}
 	return ret;
@@ -2980,17 +2979,14 @@ EXPORT_SYMBOL_GPL(iommu_default_passthrough);
 
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 {
-	const struct iommu_ops *ops = NULL;
 	struct iommu_device *iommu;
 
-	spin_lock(&iommu_device_lock);
+	lockdep_assert_held(&iommu_probe_device_lock);
+
 	list_for_each_entry(iommu, &iommu_device_list, list)
-		if (iommu->fwnode == fwnode) {
-			ops = iommu->ops;
-			break;
-		}
-	spin_unlock(&iommu_device_lock);
-	return ops;
+		if (iommu->fwnode == fwnode)
+			return iommu->ops;
+	return NULL;
 }
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,


Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h

2023-11-29 Thread Robin Murphy

On 28/11/2023 11:50 pm, Jason Gunthorpe wrote:

On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote:

On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy  wrote:


On 2023-11-28 8:49 pm, Pasha Tatashin wrote:

Convert iommu/fsl_pamu.c to use the new page allocation functions
provided in iommu-pages.h.


Again, this is not a pagetable. This thing doesn't even *have* pagetables.

Similar to patches #1 and #2 where you're lumping in configuration
tables which belong to the IOMMU driver itself, as opposed to pagetables
which effectively belong to an IOMMU domain's user. But then there are
still drivers where you're *not* accounting similar configuration
structures, so I really struggle to see how this metric is useful when
it's so completely inconsistent in what it's counting :/


The whole IOMMU subsystem allocates a significant amount of kernel
locked memory that we want to at least observe. The new field in
vmstat does just that: it reports ALL buddy allocator memory that
IOMMU allocates. However, for accounting purposes, I agree, we need to
do better, and separate at least iommu pagetables from the rest.

We can separate the metric into two:
iommu pagetable only
iommu everything

or into three:
iommu pagetable only
iommu dma
iommu everything

What do you think?


I think I said this at LPC - if you want to have fine grained
accounting of memory by owner you need to go talk to the cgroup people
and come up with something generic. Adding ever open coded finer
category breakdowns just for iommu doesn't make alot of sense.

You can make some argument that the pagetable memory should be counted
because kvm counts it's shadow memory, but I wouldn't go into further
detail than that with hand coded counters..


Right, pagetable memory is interesting since it's something that any 
random kernel user can indirectly allocate via iommu_domain_alloc() and 
iommu_map(), and some of those users may even be doing so on behalf of 
userspace. I have no objection to accounting and potentially applying 
limits to *that*.
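
As a minimal sketch of that accountable case (the function and values
are illustrative of the pattern, not any specific driver): a kernel
user creating mappings on behalf of userspace can pass an accounted GFP
so that the pagetable allocations it causes are charged to the caller:

#include <linux/iommu.h>

static int example_map_user_buffer(struct iommu_domain *domain,
				   unsigned long iova, phys_addr_t paddr,
				   size_t size)
{
	/* GFP_KERNEL_ACCOUNT charges pagetable pages to the current memcg */
	return iommu_map(domain, iova, paddr, size,
			 IOMMU_READ | IOMMU_WRITE, GFP_KERNEL_ACCOUNT);
}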


Beyond that, though, there is nothing special about "the IOMMU 
subsystem". The amount of memory an IOMMU driver needs to allocate for 
itself in order to function is not of interest beyond curiosity, it just 
is what it is; limiting it would only break the IOMMU, and if a user 
thinks it's "too much", the only actionable thing that might help is to 
physically remove devices from the system. Similar for DMA buffers; it 
might be intriguing to account those, but it's not really an actionable 
metric - in the overwhelming majority of cases you can't simply tell a 
driver to allocate less than what it needs. And that is of course 
assuming if we were to account *all* DMA buffers, since whether they 
happen to have an IOMMU translation or not is irrelevant (we'd have 
already accounted the pagetables as pagetables if so).


I bet "the networking subsystem" also consumes significant memory on the 
same kind of big systems where IOMMU pagetables would be of any concern. 
I believe some of the "serious" NICs can easily run up 
hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc. 
- would you propose accounting those too?


Thanks,
Robin.



Re: [PATCH 06/16] iommu/dma: use page allocation function provided by iommu-pages.h

2023-11-28 Thread Robin Murphy

On 2023-11-28 10:50 pm, Pasha Tatashin wrote:

On Tue, Nov 28, 2023 at 5:34 PM Robin Murphy  wrote:


On 2023-11-28 8:49 pm, Pasha Tatashin wrote:

Convert iommu/dma-iommu.c to use the new page allocation functions
provided in iommu-pages.h.


These have nothing to do with IOMMU pagetables, they are DMA buffers and
they belong to whoever called the corresponding dma_alloc_* function.


Hi Robin,

This is true, however, we want to account and observe the pages
allocated by IOMMU subsystem for DMA buffers, as they are essentially
unmovable locked pages. Should we separate IOMMU memory from KVM
memory all together and add another field to /proc/meminfo, something
like "iommu -> iommu pagetable and dma memory", or do we want to
export DMA memory separately from IOMMU page tables?


These are not allocated by "the IOMMU subsystem", they are allocated by 
the DMA API. Even if you want to claim that a driver pinning memory via 
iommu_dma_ops is somehow different from the same driver pinning the same 
amount of memory via dma-direct when iommu.passthrough=1, it's still 
nonsense because you're failing to account the pages which iommu_dma_ops 
gets from CMA, dma_common_alloc_pages(), dynamic SWIOTLB, the various 
pools, and so on.


Thanks,
Robin.


Since, I included DMA memory, I specifically removed mentioning of
IOMMU page tables in the most of places, and only report it as IOMMU
memory. However, since it is still bundled together with SecPageTables
it can be confusing.

Pasha




[PATCH v3] drm/mediatek: Stop using iommu_present()

2023-11-23 Thread Robin Murphy
Remove the pointless check. If an IOMMU is providing transparent DMA API
ops for any device(s) we care about, the DT code will have enforced the
appropriate probe ordering already. And if the IOMMU *is* entirely
absent, then attempting to go ahead with CMA and either succeeding or
failing decisively seems more useful than deferring forever.

Signed-off-by: Robin Murphy 
---

I realised that last time I sent this I probably should have CCed a
wider audience of reviewers, so here's one with an updated commit
message as well to make the resend more worthwhile.

 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 2dfaa613276a..48581da51857 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include <linux/iommu.h>
 #include 
 #include 
 #include 
@@ -608,9 +607,6 @@ static int mtk_drm_bind(struct device *dev)
struct drm_device *drm;
int ret, i;
 
-	if (!iommu_present(&platform_bus_type))
-   return -EPROBE_DEFER;
-
pdev = of_find_device_by_node(private->mutex_node);
if (!pdev) {
dev_err(dev, "Waiting for disp-mutex device %pOF\n",
-- 
2.39.2.101.g768bb238c484.dirty



Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

2023-11-21 Thread Robin Murphy

On 2023-11-16 4:17 am, Jason Gunthorpe wrote:

On Wed, Nov 15, 2023 at 08:23:54PM +, Robin Murphy wrote:

On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:

On Wed, Nov 15, 2023 at 03:22:09PM +, Robin Murphy wrote:

On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:

[Several people have tested this now, so it is something that should sit in
linux-next for a while]


What's the aim here? This is obviously far, far too much for a
stable fix,


To fix the locking bug and ugly abuse of dev->iommu?


Fixing the locking can be achieved by fixing the locking, as I have now
demonstrated.


Obviously. I rejected that right away because of how incredibly
wrongly layered and hacky it is to do something like that.


What, and dressing up the fundamental layering violation by baking it 
even further into the API flow, while still not actually fixing it or 
any of its *other* symptoms, is somehow better?


Ultimately, this series is still basically doing the same thing my patch 
does - extending the scope of the existing iommu_probe_device_lock hack 
to cover fwspec creation. A hack is a hack, so frankly I'd rather it be 
simple and obvious and look like one, and being easy to remove again is 
an obvious bonus too.



I haven't seen patches or an outline on what you have in mind though?

In my view I would like to get rid of of_xlate(), at a minimum. It is
a micro-optimization I don't think we need. I see a pretty
straightforward path to get there from here.


Micro-optimisation!? OK, I think I have to say it. Please stop trying to
rewrite code you don't understand.


I understand it fine. The list of (fwnode_handle, of_phandle_args)
tuples doesn't change between when of_xlate is called and when probe
is called. Probe can have the same list. As best I can tell the extra
ops avoids maybe some memory allocation, maybe an extra iteration.

What it does do is screw up a lot of the drivers that seem to want to
allocate the per-device data in of_xlate, and make it convoluted and
prone to memory leaks on error paths.

So, I would move toward having the driver's probe invoke a helper like:

iommu_of_xlate(dev, fwspec, _fwnode_function, );

Which generates the same list of (fwnode_handle, of_phandle_args) that
was passed to of_xlate today, but is ordered sensibly within the
sequence of probe for what many drivers seem to want to do.


Grep for of_xlate. It is a standard and well-understood callback pattern 
for a subsystem to parse a common DT binding and pass a driver-specific 
specifier to a driver to interpret. Or maybe you just have a peculiar 
definition of what you think "micro-optimisation" means? :/



So, it is not so much that that the idea of of_xlate goes away, but
the specific op->of_xlate does, it gets shifted into a helper that
invokes the same function in a more logical spot.


I'm curious how you imagine an IOMMU driver's ->probe function could be 
called *before* parsing the firmware to work out what, if any, IOMMU, 
and thus driver, a device is associated with. Unless you think we should 
have the horrible perf model of passing the device to *every* registered 
->probe callback in turn until someone claims it. And then every driver 
has to have identical boilerplate to go off and parse the generic 
"iommus" binding... which is the whole damn reason for *not* going down 
that route and instead using an of_xlate mechanism in the first place.
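
For reference, the pattern in question as a minimal of_xlate
implementation (driver name and ID layout illustrative): the core
parses the generic "iommus" binding, matches the phandle to the
registered IOMMU, and hands the specifier to that driver to interpret:

static int example_iommu_of_xlate(struct device *dev,
				  struct of_phandle_args *args)
{
	/* driver-specific interpretation of a one-cell specifier */
	u32 id = args->args[0];

	return iommu_fwspec_add_ids(dev, &id, 1);
}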



The per-device data can be allocated at the top of probe and passed
through args to fix the lifetime bugs.

It is pretty simple to do.


I believe the kids these days would say "Say you don't understand the 
code without saying you don't understand the code."



Most of this series constitutes a giant sweeping redesign of a whole bunch
of internal machinery to permit it to be used concurrently, where that
concurrency should still not exist in the first place because the thing that
allows it to happen also causes other problems like groups being broken.
Once the real problem is fixed there will be no need for any of this, and at
worst some of it will then actually get in the way.


Not quite. This decouples two unrelated things into separate
concerns. It is not so much about the concurrency but removing the
abuse of dev->iommu by code that has no need to touch it at all.


Sorry, the "abuse" of storing IOMMU-API-specific data in the place we 
intentionally created to consolidate all the IOMMU-API-specific data 
into? Yes, there is an issue with the circumstances in which this data 
is sometimes accessed, but as I'm starting to tire of repeating, that 
issue fundamentally dates back to 2017, and the implications were 
unfortunately overlooked when dev->iommu was later introduced and fwspec 
moved into it (since the non-DT probing paths still worked as originally 
designed). Pretending that dev->iommu is the issue here is missing the 
point.



Decoupling makes moving code around easier since the relationships

Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

2023-11-21 Thread Robin Murphy

On 2023-11-16 4:17 am, Jason Gunthorpe wrote:

On Wed, Nov 15, 2023 at 08:23:54PM +, Robin Murphy wrote:

On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:

On Wed, Nov 15, 2023 at 03:22:09PM +, Robin Murphy wrote:

On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:

[Several people have tested this now, so it is something that should sit in
linux-next for a while]


What's the aim here? This is obviously far, far too much for a
stable fix,


To fix the locking bug and ugly abuse of dev->iommu?


Fixing the locking can be achieved by fixing the locking, as I have now
demonstrated.


Obviously. I rejected that right away because of how incredibly
wrongly layered and hacky it is to do something like that.


What, and dressing up the fundamental layering violation by baking it 
even further into the API flow, while still not actually fixing it or 
any of its *other* symptoms, is somehow better?


Ultimately, this series is still basically doing the same thing my patch 
does - extending the scope of the existing iommu_probe_device_lock hack 
to cover fwspec creation. A hack is a hack, so frankly I'd rather it be 
simple and obvious and look like one, and being easy to remove again is 
an obvious bonus too.



I haven't seen patches or an outline on what you have in mind though?

In my view I would like to get rid of of_xlate(), at a minimum. It is
a micro-optimization I don't think we need. I see a pretty
straightforward path to get there from here.


Micro-optimisation!? OK, I think I have to say it. Please stop trying to
rewrite code you don't understand.


I understand it fine. The list of (fwnode_handle, of_phandle_args)
tuples doesn't change between when of_xlate is callled and when probe
is called. Probe can have the same list. As best I can tell the extra
ops avoids maybe some memory allocation, maybe an extra iteration.

What it does do is screw up alot of the drivers that seem to want to
allocate the per-device data in of_xlate and make it convoluted and
memory leaking buggy on error paths.

So, I would move toward having the driver's probe invoke a helper like:

iommu_of_xlate(dev, fwspec, _fwnode_function, );

Which generates the same list of (fwnode_handle, of_phandle_args) that
was passed to of_xlate today, but is ordered sensibly within the
sequence of probe for what many drivers seem to want to do.


Grep for of_xlate. It is a standard and well-understood callback pattern 
for a subsystem to parse a common DT binding and pass a driver-specific 
specifier to a driver to interpret. Or maybe you just have a peculiar 
definition of what you think "micro-optimisation" means? :/



So, it is not so much that that the idea of of_xlate goes away, but
the specific op->of_xlate does, it gets shifted into a helper that
invokes the same function in a more logical spot.


I'm curious how you imagine an IOMMU driver's ->probe function could be 
called *before* parsing the firmware to work out what, if any, IOMMU, 
and thus driver, a device is associated with. Unless you think we should 
have the horrible perf model of passing the device to *every* registered 
->probe callback in turn until someone claims it. And then every driver 
has to have identical boilerplate to go off and parse the generic 
"iommus" binding... which is the whole damn reason for *not* going down 
that route and instead using an of_xlate mechanism in the first place.



The per-device data can be allocated at the top of probe and passed
through args to fix the lifetime bugs.

It is pretty simple to do.


I believe the kids these days would say "Say you don't understand the 
code without saying you don't understand the code."



Most of this series constitutes a giant sweeping redesign of a whole bunch
of internal machinery to permit it to be used concurrently, where that
concurrency should still not exist in the first place because the thing that
allows it to happen also causes other problems like groups being broken.
Once the real problem is fixed there will be no need for any of this, and at
worst some of it will then actually get in the way.


Not quite. This decouples two unrelated things into separate
concerns. It is not so much about the concurrency but removing the
abuse of dev->iommu by code that has no need to touch it at all.


Sorry, the "abuse" of storing IOMMU-API-specific data in the place we 
intentionally created to consolidate all the IOMMU-API-specific data 
into? Yes, there is an issue with the circumstances in which this data 
is sometimes accessed, but as I'm starting to tire of repeating, that 
issue fundamentally dates back to 2017, and the implications were 
unfortunately overlooked when dev->iommu was later introduced and fwspec 
moved into it (since the non-DT probing paths still worked as originally 
designed). Pretending that dev->iommu is the issue here is missing the 
point.



Decoupling makes moving code around easier since the relationships

Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

2023-11-15 Thread Robin Murphy

On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:

On Wed, Nov 15, 2023 at 03:22:09PM +, Robin Murphy wrote:

On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:

[Several people have tested this now, so it is something that should sit in
linux-next for a while]


What's the aim here? This is obviously far, far too much for a
stable fix,


To fix the locking bug and ugly abuse of dev->iommu?


Fixing the locking can be achieved by fixing the locking, as I have now 
demonstrated.



I wouldn't say that, it is up to the people who care about this to
decide. It seems a lot of people are hitting it so maybe it should be
backported in some situations. Regardless, we should not continue to
have this locking bug in v6.8.


but then it's also not the refactoring we want for the future either, since
it's moving in the wrong direction of cementing the fundamental brokenness
further in place rather than getting any closer to removing it.


I haven't seen patches or an outline on what you have in mind though?

In my view I would like to get rid of of_xlate(), at a minimum. It is
a micro-optimization I don't think we need. I see a pretty
straightforward path to get there from here.


Micro-optimisation!? OK, I think I have to say it. Please stop trying to 
rewrite code you don't understand.



Do you also want to get rid of iommu_fwspec, or at least thin it out?
That seems reasonable too, I think that becomes within reach once
of_xlate is gone.

What do you see as "cementing"?


Most of this series constitutes a giant sweeping redesign of a whole 
bunch of internal machinery to permit it to be used concurrently, where 
that concurrency should still not exist in the first place because the 
thing that allows it to happen also causes other problems like groups 
being broken. Once the real problem is fixed there will be no need for 
any of this, and at worst some of it will then actually get in the way.


I feel like I've explained it many times already, but what needs to 
happen is for the firmware parsing and of_xlate stage to be initiated by 
__iommu_probe_device() itself. The first step is my bus ops series (if 
I'm ever allowed to get it landed...) which gets to the state of 
expecting to start from a fwspec. Then it's a case of shuffling around 
what's currently in the bus_type dma_configure methods such that point 
is where the fwspec is created as well, and the driver-probe-time work 
is almost removed except for still deferring if a device is waiting for 
its IOMMU instance (since that instance turning up and registering will 
retrigger the rest itself). And there at last, a trivial lifecycle and 
access pattern for dev->iommu (with the overlapping bits of iommu_fwspec 
finally able to be squashed as well), and finally an end to 8 long and 
unfortunate years of calling things in the wrong order in ways they were 
never supposed to be.
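
In loose sketch form (the helper names are invented purely to illustrate
the intended ordering, not taken from any posted patch):

	static int __iommu_probe_device(struct device *dev)
	{
		struct iommu_fwspec *fwspec;

		/* 1. parse DT/ACPI and build the fwspec here, not in bus code */
		fwspec = iommu_parse_fw_description(dev);	/* hypothetical */
		if (IS_ERR(fwspec))
			return PTR_ERR(fwspec);

		/* 2. defer only while the IOMMU instance is unregistered; its
		 * registration retriggers the rest by itself */
		if (!iommu_instance_present(fwspec))		/* hypothetical */
			return -EPROBE_DEFER;

		/* 3. only now populate dev->iommu and call the driver probe */
		return iommu_init_device_with_fwspec(dev, fwspec); /* hypothetical */
	}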


Thanks,
Robin.




Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

2023-11-15 Thread Robin Murphy

On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:

[Several people have tested this now, so it is something that should sit in
linux-next for a while]


What's the aim here? This is obviously far, far too much for a stable 
fix, but then it's also not the refactoring we want for the future 
either, since it's moving in the wrong direction of cementing the 
fundamental brokenness further in place rather than getting any closer 
to removing it.


Thanks,
Robin.


The iommu subsystem uses dev->iommu to store bits of information about the
attached iommu driver. This has been co-opted by the ACPI/OF code to also
be a place to pass around the iommu_fwspec before a driver is probed.

Since both are using the same pointers without any locking it triggers
races if there is concurrent driver loading:

  CPU0 CPU1
of_iommu_configure()iommu_device_register()
  ..   bus_iommu_probe()
   iommu_fwspec_of_xlate()  __iommu_probe_device()
 iommu_init_device()
dev_iommu_get()
   .. ops->probe fails, no fwspec ..
   dev_iommu_free()
dev->iommu->fwspec*crash*

My first attempt to get correct locking here was to use the device_lock to
protect the entire *_iommu_configure() and iommu_probe() paths. This
allowed safe use of dev->iommu within those paths. Unfortunately, enough
drivers abuse the of_iommu_configure() flow without proper locking that
this approach failed.

This approach removes touches of dev->iommu from the *_iommu_configure()
code. The few remaining required touches are moved into iommu.c and
protected with the existing iommu_probe_device_lock.

To do this we change *_iommu_configure() to hold the iommu_fwspec on the
stack while it is being built. Once it is fully formed the core code will
install it into the dev->iommu when it calls probe.
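
For illustration, that flow amounts to something like the following sketch:
of_iommu_build_fwspec() is a hypothetical stand-in, while
iommu_probe_device_fwspec() is from the patch list below.

	static int of_iommu_configure_sketch(struct device *dev,
					     struct device_node *np)
	{
		struct iommu_fwspec fwspec = {};	/* built on the stack */
		int err;

		err = of_iommu_build_fwspec(dev, np, &fwspec);	/* hypothetical */
		if (err)
			return err;

		/* the core installs the fully-formed fwspec into dev->iommu
		 * under iommu_probe_device_lock when it calls probe */
		return iommu_probe_device_fwspec(dev, &fwspec);
	}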

This also removes all the touches of iommu_ops from
the *_iommu_configure() paths and makes that mechanism private to the
iommu core.

A few more lockdep assertions are added to discourage future mis-use.

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec

v2:
  - Fix all the kconfig randomization 0-day stuff
  - Add missing kdoc parameters
  - Remove NO_IOMMU, replace it with ENODEV
  - Use PTR_ERR to print errno in the new/moved logging
v1: https://lore.kernel.org/r/0-v1-5f734af130a3+34f-iommu_fwspec_...@nvidia.com

Jason Gunthorpe (17):
   iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
   iommu/of: Do not return struct iommu_ops from of_iommu_configure()
   iommu/of: Use -ENODEV consistently in of_iommu_configure()
   acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
   iommu: Make iommu_fwspec->ids a distinct allocation
   iommu: Add iommu_fwspec_alloc/dealloc()
   iommu: Add iommu_probe_device_fwspec()
   iommu/of: Do not use dev->iommu within of_iommu_configure()
   iommu: Add iommu_fwspec_append_ids()
   acpi: Do not use dev->iommu within acpi_iommu_configure()
   iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
   iommu: Make iommu_ops_from_fwnode() static
   iommu: Remove dev_iommu_fwspec_set()
   iommu: Remove pointless iommu_fwspec_free()
   iommu: Add ops->of_xlate_fwspec()
   iommu: Mark dev_iommu_get() with lockdep
   iommu: Mark dev_iommu_priv_set() with a lockdep

  arch/arc/mm/dma.c   |   2 +-
  arch/arm/mm/dma-mapping-nommu.c |   2 +-
  arch/arm/mm/dma-mapping.c   |  10 +-
  arch/arm64/mm/dma-mapping.c |   4 +-
  arch/mips/mm/dma-noncoherent.c  |   2 +-
  arch/riscv/mm/dma-noncoherent.c |   2 +-
  drivers/acpi/arm64/iort.c   |  42 ++--
  drivers/acpi/scan.c | 104 +
  drivers/acpi/viot.c |  45 ++--
  drivers/hv/hv_common.c  |   2 +-
  drivers/iommu/amd/iommu.c   |   2 -
  drivers/iommu/apple-dart.c  |   1 -
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   9 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  23 +-
  drivers/iommu/intel/iommu.c |   2 -
  drivers/iommu/iommu.c   | 227 +++-
  drivers/iommu/of_iommu.c| 133 +---
  drivers/iommu/omap-iommu.c  |   1 -
  drivers/iommu/tegra-smmu.c  |   1 -
  drivers/iommu/virtio-iommu.c|   8 +-
  drivers/of/device.c |  24 ++-
  include/acpi/acpi_bus.h |   8 +-
  include/linux/acpi_iort.h   |   8 +-
  include/linux/acpi_viot.h   |   5 +-
  include/linux/dma-map-ops.h |   4 +-
  include/linux/iommu.h   |  47 ++--
  include/linux/of_iommu.h|  13 +-
  27 files 


Re: [PATCH] arm/mm: add option to prefer IOMMU ops for DMA on Xen

2023-11-14 Thread Robin Murphy

On 11/11/2023 6:45 pm, Chuck Zmudzinski wrote:

Enabling the new option, ARM_DMA_USE_IOMMU_XEN, fixes this error when
attaching the Exynos mixer in Linux dom0 on Xen on the Chromebook Snow
(and probably on other devices that use the Exynos mixer):

[drm] Exynos DRM: using 1440.fimd device for DMA mapping operations
exynos-drm exynos-drm: bound 1440.fimd (ops 0xc0d96354)
exynos-mixer 1445.mixer: [drm:exynos_drm_register_dma] *ERROR* Device
  1445.mixer lacks support for IOMMU
exynos-drm exynos-drm: failed to bind 1445.mixer (ops 0xc0d97554): -22
exynos-drm exynos-drm: adev bind failed: -22
exynos-dp: probe of 145b.dp-controller failed with error -22

Linux normally uses xen_swiotlb_dma_ops for DMA for all devices when
xen_swiotlb is detected even when Xen exposes an IOMMU to Linux. Enabling
the new config option allows devices such as the Exynos mixer to use the
IOMMU instead of xen_swiotlb_dma_ops for DMA and this fixes the error.

The new config option is not set by default because it is likely some
devices that use IOMMU for DMA on Xen will cause DMA errors and memory
corruption when Xen PV block and network drivers are in use on the system.

Link: 
https://lore.kernel.org/xen-devel/acfab1c5-eed1-4930-8c70-8681e256c...@netscape.net/

Signed-off-by: Chuck Zmudzinski 
---
The reported error with the Exynos mixer is not fixed by default: there is
deliberately no second patch selecting the new option from the Exynos
mixer's Kconfig when EXYNOS_IOMMU and SWIOTLB_XEN are enabled, because it
is not certain that setting the config option is suitable for all cases.
So the new config option must be selected explicitly during the config
stage of the Linux kernel build to fix the reported error, or similar
errors with the same cause (lack of IOMMU support on Xen). This avoids any
regressions that might be caused by enabling the new option by default for
the Exynos mixer.
  arch/arm/mm/dma-mapping.c |  6 ++
  drivers/xen/Kconfig   | 16 
  2 files changed, 22 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 5409225b4abc..ca04fdf01be3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1779,6 +1779,12 @@ void arch_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size,
if (iommu)
arm_setup_iommu_dma_ops(dev, dma_base, size, iommu, coherent);
  
+#ifdef CONFIG_ARM_DMA_USE_IOMMU_XEN


FWIW I don't think this really needs a config option - if Xen *has* made 
an IOMMU available, then there isn't really much reason not to use it, 
and if for some reason someone really didn't want to then they could 
simply disable the IOMMU driver anyway.



+   if (dev->dma_ops == &iommu_ops) {
+   dev->archdata.dma_ops_setup = true;


The existing assignment is effectively unconditional by this point 
anyway, so could probably just be moved earlier to save duplicating it 
(or perhaps just make the xen_setup_dma_ops() call conditional instead 
to save the early return as well).
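
i.e. something like this untested sketch of the suggested rearrangement,
assuming the dev->dma_ops == &iommu_ops comparison from the patch remains
the right discriminator:

	if (iommu)
		arm_setup_iommu_dma_ops(dev, dma_base, size, iommu, coherent);

	/* skip the Xen swiotlb ops only when the IOMMU path claimed the
	 * device; the flag assignment is now unconditional, so no early
	 * return is needed */
	if (dev->dma_ops != &iommu_ops)
		xen_setup_dma_ops(dev);
	dev->archdata.dma_ops_setup = true;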


However, are the IOMMU DMA ops really compatible with Xen? The comments 
about hypercalls and foreign memory in xen_arch_need_swiotlb() leave me 
concerned that assuming non-coherent DMA to any old Dom0 page is OK 
might not actually work in general :/


Thanks,
Robin.


+   return;
+   }
+#endif
xen_setup_dma_ops(dev);
dev->archdata.dma_ops_setup = true;
  }
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index d5989871dd5d..44e1334b6acd 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -181,6 +181,22 @@ config SWIOTLB_XEN
select DMA_OPS
select SWIOTLB
  
+config ARM_DMA_USE_IOMMU_XEN

+   bool "Prefer IOMMU DMA ops on Xen"
+   depends on SWIOTLB_XEN
+   depends on ARM_DMA_USE_IOMMU
+   help
+ Normally on Xen, the IOMMU is used by Xen and not exposed to
+ Linux. Some Arm systems such as Exynos have an IOMMU that
+ Xen does not use so the IOMMU is exposed to Linux in those
+ cases. This option enables Linux to use the IOMMU instead of
+ using the Xen swiotlb_dma_ops for DMA on Xen.
+
+ Say N here unless support for one or more devices that use
+ IOMMU ops instead of Xen swiotlb ops for DMA is needed and the
+ devices that use the IOMMU do not cause any problems on the
+ Xen system in use.
+
  config XEN_PCI_STUB
bool
  




Re: [PATCH v2 6/8] dt-bindings: reserved-memory: Add secure CMA reserved memory range

2023-11-14 Thread Robin Murphy

On 13/11/2023 6:37 am, Yong Wu (吴勇) wrote:
[...]

+properties:
+  compatible:
+const: secure_cma_region


Still wrong compatible. Look at other bindings - there is nowhere
underscore. Look at other reserved memory bindings especially.

Also, CMA is a Linux thingy, so either not suitable for bindings at all,
or you need a Linux-specific compatible. I don't quite get why you even
put CMA there - adding Linux-specific stuff will get obvious
pushback...


Thanks. I will change to: secure-region. Is this ok?


No, the previous discussion went off in entirely the wrong direction. To 
reiterate, the point of the binding is not to describe the expected 
usage of the thing nor the general concept of the thing, but to describe 
the actual thing itself. There are any number of different ways software 
may interact with a "secure region", so that is meaningless as a 
compatible. It needs to describe *this* secure memory interface offered 
by *this* TEE, so that software knows that to use it requires making 
those particular SiP calls with that particular UUID etc.


Thanks,
Robin.


Re: [PATCH v7 08/10] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform

2023-11-03 Thread Robin Murphy

On 2023-11-03 5:02 pm, Duje Mihanović wrote:

On Friday, November 3, 2023 4:34:54 PM CET Robin Murphy wrote:

On 2023-11-02 3:20 pm, Duje Mihanović wrote:

+config ARCH_MMP
+   bool "Marvell MMP SoC Family"
+   select ARM_GIC
+   select ARM_ARCH_TIMER
+   select ARM_SMMU


NAK, not only is selecting user-visible symbols generally frowned upon,
and ignoring their dependencies even worse, but for a multiplatform
kernel the user may well want this to be a module.

If having the SMMU driver built-in is somehow fundamentally required for
this platform to boot, that would represent much bigger problems.


The SoC can boot without SMMU and PDMA, but not GIC, pinctrl or the arch
timer. I see that most other SoCs still select drivers and frameworks they
presumably need for booting, with the exceptions of ARCH_BITMAIN, ARCH_LG1K
and a couple others. Which of these two options should I go for?


Well, you don't really need to select ARM_GIC or ARM_ARCH_TIMER here 
either, since those are already selected by ARM64 itself. Keeping 
PINCTRL_SINGLE is fair, although you should also select PINCTRL as its 
dependency.


As an additional nit, the file seems to be primarily ordered by symbol 
name, so it might be nice to slip ARCH_MMP in between ARCH_MESON and 
ARCH_MVEBU.
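
Putting those points together, the entry could be trimmed to something
like this - an untested sketch of the feedback above, with ARM_GIC and
ARM_ARCH_TIMER dropped as redundant on ARM64, and ARM_SMMU plus MMP_PDMA
left to the defconfig so they can be modular:

config ARCH_MMP
	bool "Marvell MMP SoC Family"
	select PINCTRL
	select PINCTRL_SINGLE
	help
	  This enables support for Marvell MMP SoC family, currently
	  supporting PXA1908 aka IAP140.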


Cheers,
Robin.


Re: [PATCH v7 08/10] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform

2023-11-03 Thread Robin Murphy

On 2023-11-02 3:20 pm, Duje Mihanović wrote:

Add ARCH_MMP configuration option for Marvell PXA1908 SoC.

Signed-off-by: Duje Mihanović 
---
  arch/arm64/Kconfig.platforms | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 6069120199bb..b417cae42c84 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -89,6 +89,17 @@ config ARCH_BERLIN
help
  This enables support for Marvell Berlin SoC Family
  
+config ARCH_MMP

+   bool "Marvell MMP SoC Family"
+   select ARM_GIC
+   select ARM_ARCH_TIMER
+   select ARM_SMMU


NAK, not only is selecting user-visible symbols generally frowned upon, 
and ignoring their dependencies even worse, but for a multiplatform 
kernel the user may well want this to be a module.


If having the SMMU driver built-in is somehow fundamentally required for 
this platform to boot, that would represent much bigger problems.


Thanks,
Robin.


+   select MMP_PDMA
+   select PINCTRL_SINGLE
+   help
+ This enables support for Marvell MMP SoC family, currently
+ supporting PXA1908 aka IAP140.
+
  config ARCH_BITMAIN
bool "Bitmain SoC Platforms"
help


Re: [PATCH v7 06/10] ASoC: pxa: Suppress SSPA on ARM64

2023-11-03 Thread Robin Murphy

On 2023-11-02 3:26 pm, Mark Brown wrote:

On Thu, Nov 02, 2023 at 04:20:29PM +0100, Duje Mihanović wrote:

The SSPA driver currently seems to generate ARM32 assembly, which causes
build errors when building a kernel for an ARM64 ARCH_MMP platform.

Fixes: fa375d42f0e5 ("ASoC: mmp: add sspa support")
Reported-by: kernel test robot 



tristate "SoC Audio via MMP SSPA ports"
-   depends on ARCH_MMP
+   depends on ARCH_MMP && ARM


This isn't a fix for the existing code, AFAICT the issue here is that
ARCH_MMP is currently only available for arm and presumably something in
the rest of your series makes it available for arm64.  This would be a
prerequisite for that patch.

Please don't just insert random fixes tags just because you can.


FWIW it doesn't even seem to be the right reason either. AFAICT the 
issue being introduced is that SND_MMP_SOC_SSPA selects SND_ARM which 
depends on ARM, but after patch #8 ARCH_MMP itself will no longer 
necessarily imply ARM. The fact that selecting SND_ARM with unmet 
dependencies also allows SND_ARMAACI to be enabled (which appears to be 
the only thing actually containing open-coded Arm asm) is tangential.


Robin.


Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-29 Thread Robin Murphy

On 29/09/2023 4:45 pm, Will Deacon wrote:

On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we
disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
coherent SMMUs (like we have on sm8350 platform).


Hmm, but is it right that it should fail in the first place? The fact is
that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly
can't see why the io-pgtable code is going out of its way to explicitly
reject a request to give them the same attribute it's already giving them
anyway :/

Even if the original intent was for the quirk to have an over-specific
implication of representing inner-NC as well, that hardly seems useful if
what we've ended up with in practice is a nonsensical-looking check in one
place and then a weird hacky bodge in another purely to work around it.

Does anyone know a good reason why this is the way it is?


I think it was mainly because the quirk doesn't make sense for a coherent
page-table walker, and we could in theory use that bit for something else
in that case.


Yuck, even if we did want some horrible notion of quirks being 
conditional on parts of the config rather than just the format, then the 
users would need to be testing for the same condition as the pagetable 
code itself (i.e. cfg->coherent_walk), rather than hoping some other 
property of something else indirectly reflects the right information - 
e.g. there'd be no hope of backporting this particular bodge before 5.19 
where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned 
true, and in future we could conceivably support coherent SMMUs being 
configured for non-coherent walks on a per-domain basis.
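
For reference, the rejection in question boils down to roughly this in the
io-pgtable code (a simplified paraphrase, not the exact source):

	/* arm_64_lpae_alloc_pgtable_s1(), simplified: a coherent walker
	 * refuses the quirk outright, so a caller wanting to predict the
	 * outcome would have to test the very same cfg->coherent_walk
	 * condition, not an indirect proxy like the IOMMU capability */
	if ((cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) &&
	    cfg->coherent_walk)
		return NULL;	/* allocation refused */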


Furthermore, if we did overload a flag to have multiple meanings, then 
we'd have no way of knowing which one the caller was actually expecting, 
thus the illusion of being able to validate calls in the meantime isn't 
necessarily as helpful as it seems, particularly in a case where the 
"wrong" interpretation would be to have no effect anyway. Mostly though 
I'd hope that if we ever got anywhere near the point of running out of 
quirk bits we'd have already realised that it's time for a better 
interface :(


Based on that, I think that when I do get round to needing to touch this 
code, I'll propose just streamlining the whole quirk.


Cheers,
Robin.



Re: [PATCH 6/8] iommu/dart: Move the blocked domain support to a global static

2023-09-26 Thread Robin Murphy

On 2023-09-26 20:05, Janne Grunau wrote:

Hej,

On Fri, Sep 22, 2023 at 02:07:57PM -0300, Jason Gunthorpe wrote:

Move to the new static global for blocked domains. Move the blocked
specific code to apple_dart_attach_dev_blocked().

Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/apple-dart.c | 36 ++--
  1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 424f779ccc34df..376f4c5461e8f7 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -675,10 +675,6 @@ static int apple_dart_attach_dev(struct iommu_domain 
*domain,
for_each_stream_map(i, cfg, stream_map)
apple_dart_setup_translation(dart_domain, stream_map);
break;
-   case IOMMU_DOMAIN_BLOCKED:
-   for_each_stream_map(i, cfg, stream_map)
-   apple_dart_hw_disable_dma(stream_map);
-   break;
default:
return -EINVAL;
}
@@ -710,6 +706,30 @@ static struct iommu_domain apple_dart_identity_domain = {
	.ops = &apple_dart_identity_ops,
  };
  
+static int apple_dart_attach_dev_blocked(struct iommu_domain *domain,

+struct device *dev)
+{
+   struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+   struct apple_dart_stream_map *stream_map;
+   int i;
+
+   if (cfg->stream_maps[0].dart->force_bypass)
+   return -EINVAL;


unrelated to this change as this keeps the current behavior but I think
force_bypass should not override IOMMU_DOMAIN_BLOCKED. It is set if the
CPU page size is smaller than dart's page size. Obviously dart can't
translate in that situation but it should be still possible to block it
completely.

How do we manage this? I can write a patch either to the current state
or based on this series.


The series is queued already, so best to send a patch based on 
iommu/core (I guess just removing these lines?). It won't be 
super-useful in practice since the blocking domain is normally only used 
to transition to an unmanaged domain which in the force_bypass situation 
can't be used anyway, but it's still nice on principle not to have 
unnecessary reasons for attach to fail.
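
i.e. something along these lines against iommu/core - an untested sketch
of the suggestion above, simply dropping the early bail-out:

	static int apple_dart_attach_dev_blocked(struct iommu_domain *domain,
						 struct device *dev)
	{
		struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
		struct apple_dart_stream_map *stream_map;
		int i;

		/* force_bypass check removed: blocking DMA is always
		 * possible, even when the DART cannot translate for this
		 * CPU page size */
		for_each_stream_map(i, cfg, stream_map)
			apple_dart_hw_disable_dma(stream_map);
		return 0;
	}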


Thanks,
Robin.




+
+   for_each_stream_map(i, cfg, stream_map)
+   apple_dart_hw_disable_dma(stream_map);
+   return 0;
+}
+
+static const struct iommu_domain_ops apple_dart_blocked_ops = {
+   .attach_dev = apple_dart_attach_dev_blocked,
+};
+
+static struct iommu_domain apple_dart_blocked_domain = {
+   .type = IOMMU_DOMAIN_BLOCKED,
+   .ops = &apple_dart_blocked_ops,
+};
+
  static struct iommu_device *apple_dart_probe_device(struct device *dev)
  {
struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
@@ -739,8 +759,7 @@ static struct iommu_domain 
*apple_dart_domain_alloc(unsigned int type)
  {
struct apple_dart_domain *dart_domain;
  
-	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&

-   type != IOMMU_DOMAIN_BLOCKED)
+   if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
return NULL;
  
  	dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL);

@@ -749,10 +768,6 @@ static struct iommu_domain 
*apple_dart_domain_alloc(unsigned int type)
  
  	mutex_init(&dart_domain->init_lock);
  
-	/* no need to allocate pgtbl_ops or do any other finalization steps */

-   if (type == IOMMU_DOMAIN_BLOCKED)
-   dart_domain->finalized = true;
-
	return &dart_domain->domain;
  }
  
@@ -966,6 +981,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
  
  static const struct iommu_ops apple_dart_iommu_ops = {

.identity_domain = &apple_dart_identity_domain,
+   .blocked_domain = &apple_dart_blocked_domain,
.domain_alloc = apple_dart_domain_alloc,
.probe_device = apple_dart_probe_device,
.release_device = apple_dart_release_device,
--
2.42.0


Reviewed-by: Janne Grunau 

best regards
Janne

ps: I sent the reply to [Patch 4/8] accidentally with an incorrect from
address but the correct Reviewed-by:. I can resend if necessary.


Re: [PATCH 6/8] iommu/dart: Move the blocked domain support to a global static

2023-09-26 Thread Robin Murphy

On 2023-09-26 20:34, Robin Murphy wrote:

On 2023-09-26 20:05, Janne Grunau wrote:

Hej,

On Fri, Sep 22, 2023 at 02:07:57PM -0300, Jason Gunthorpe wrote:

Move to the new static global for blocked domains. Move the blocked
specific code to apple_dart_attach_dev_blocked().

Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/apple-dart.c | 36 ++--
  1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 424f779ccc34df..376f4c5461e8f7 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -675,10 +675,6 @@ static int apple_dart_attach_dev(struct 
iommu_domain *domain,

  for_each_stream_map(i, cfg, stream_map)
  apple_dart_setup_translation(dart_domain, stream_map);
  break;
-    case IOMMU_DOMAIN_BLOCKED:
-    for_each_stream_map(i, cfg, stream_map)
-    apple_dart_hw_disable_dma(stream_map);
-    break;
  default:
  return -EINVAL;
  }
@@ -710,6 +706,30 @@ static struct iommu_domain 
apple_dart_identity_domain = {

  .ops = &apple_dart_identity_ops,
  };
+static int apple_dart_attach_dev_blocked(struct iommu_domain *domain,
+ struct device *dev)
+{
+    struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+    struct apple_dart_stream_map *stream_map;
+    int i;
+
+    if (cfg->stream_maps[0].dart->force_bypass)
+    return -EINVAL;


unrelated to this change as this keeps the current behavior but I think
force_bypass should not override IOMMU_DOMAIN_BLOCKED. It is set if the
CPU page size is smaller than dart's page size. Obviously dart can't
translate in that situation but it should be still possible to block it
completely.

How do we manage this? I can write a patch either to the current state
or based on this series.


The series is queued already, so best to send a patch based on 
iommu/core (I guess just removing these lines?).


Um, what? This isn't the domain_alloc_paging series itself, Robin you 
fool. Clearly it's time to close the computer and try again tomorrow...


Cheers,
Robin.

It won't be 
super-useful in practice since the blocking domain is normally only used 
to transition to an unmanaged domain which in the force_bypass situation 
can't be used anyway, but it's still nice on principle not to have 
unnecessary reasons for attach to fail.


Thanks,
Robin.




+
+    for_each_stream_map(i, cfg, stream_map)
+    apple_dart_hw_disable_dma(stream_map);
+    return 0;
+}
+
+static const struct iommu_domain_ops apple_dart_blocked_ops = {
+    .attach_dev = apple_dart_attach_dev_blocked,
+};
+
+static struct iommu_domain apple_dart_blocked_domain = {
+    .type = IOMMU_DOMAIN_BLOCKED,
+    .ops = &apple_dart_blocked_ops,
+};
+
  static struct iommu_device *apple_dart_probe_device(struct device 
*dev)

  {
  struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
@@ -739,8 +759,7 @@ static struct iommu_domain 
*apple_dart_domain_alloc(unsigned int type)

  {
  struct apple_dart_domain *dart_domain;
-    if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&
-    type != IOMMU_DOMAIN_BLOCKED)
+    if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
  return NULL;
  dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL);
@@ -749,10 +768,6 @@ static struct iommu_domain 
*apple_dart_domain_alloc(unsigned int type)

  mutex_init(&dart_domain->init_lock);
-    /* no need to allocate pgtbl_ops or do any other finalization 
steps */

-    if (type == IOMMU_DOMAIN_BLOCKED)
-    dart_domain->finalized = true;
-
  return &dart_domain->domain;
  }
@@ -966,6 +981,7 @@ static void apple_dart_get_resv_regions(struct 
device *dev,

  static const struct iommu_ops apple_dart_iommu_ops = {
  .identity_domain = &apple_dart_identity_domain,
+    .blocked_domain = &apple_dart_blocked_domain,
  .domain_alloc = apple_dart_domain_alloc,
  .probe_device = apple_dart_probe_device,
  .release_device = apple_dart_release_device,
--
2.42.0


Reviewed-by: Janne Grunau 

best regards
Janne

ps: I sent the reply to [Patch 4/8] accidentally with an incorrect from
address but the correct Reviewed-by:. I can resend if necessary.




Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-25 Thread Robin Murphy

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we
disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
coherent SMMUs (like we have on sm8350 platform).


Hmm, but is it right that it should fail in the first place? The fact is 
that if the SMMU is coherent then walks *will* be outer-WBWA, so I 
honestly can't see why the io-pgtable code is going out of its way to 
explicitly reject a request to give them the same attribute it's already 
giving them anyway :/


Even if the original intent was for the quirk to have an over-specific 
implication of representing inner-NC as well, that hardly seems useful 
if what we've ended up with in practice is a nonsensical-looking check 
in one place and then a weird hacky bodge in another purely to work 
around it.


Does anyone know a good reason why this is the way it is?

[ just came across this code in the tree while trying to figure out what 
to do with iommu_set_pgtable_quirks()... ]


Thanks,
Robin.


Fixes: 54af0ceb7595 ("arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU 
nodes")
Reported-by: David Heidelberg 
Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2942d2548ce6..f74495dcbd96 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1793,7 +1793,8 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct 
platform_device *pdev)
 * This allows GPU to set the bus attributes required to use system
 * cache on behalf of the iommu page table walker.
 */
-   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
+   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice) &&
+   !device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY))
quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
  
  	return adreno_iommu_create_address_space(gpu, pdev, quirks);



Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-25 Thread Robin Murphy

On 2023-09-25 14:29, Jason Gunthorpe wrote:

On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:

On 2023-09-23 00:33, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently it's
assuming that all instances must support identical page sizes, and thus once
it's seen one it can work "normally" per the core code's assumptions. It's
also I think the only driver which has a "finalise" bodge but *can* still
properly support map-before-attach, by virtue of having to replay mappings
to every new endpoint anyway.


Well it can't quite do that since it doesn't know the geometry - it
all is sort of guessing and hoping it doesn't explode on replay. If it
knows the geometry it wouldn't need finalize...


I think it's entirely reasonable to assume that any direct mappings
specified for a device are valid for that device and its IOMMU. However, in
the particular case of virtio, it really shouldn't ever have direct mappings
anyway, since even if the underlying hardware did have any, the host can
enforce the actual direct-mapping aspect itself, and just present them as
unusable regions to the guest.


I assume this machinery is for the ARM GIC ITS page


Again, that's irrelevant. It can only be about whether the actual
->map_pages call succeeds or not. A driver could well know up-front that all
instances support the same pgsize_bitmap and aperture, and set both at
->domain_alloc time, yet still be unable to handle an actual mapping without
knowing which instance(s) that needs to interact with (e.g. omap-iommu).


I think this is a different issue. The domain is supposed to represent
the actual io pte storage, and the storage is supposed to exist even
when the domain is not attached to anything.

As we said with tegra-gart, it is a bug in the driver if all the
mappings disappear when the last device is detached from the domain.
Driver bugs like this turn into significant issues with vfio/iommufd
as this will result in warn_on's and memory leaking.

So, I disagree that this is something we should be allowing in the API
design. map_pages should succeed (memory allocation failures aside) if
an IOVA within the aperture and valid flags are presented. Regardless
of the attachment status. Calling map_pages with an IOVA outside the
aperture should be a caller bug.

It looks omap is just mis-designed to store the pgd in the omap_iommu,
not the omap_iommu_domain :( pgd is clearly a per-domain object in our
API. And why does every instance need its own copy of the identical
pgd?


The point wasn't that it was necessarily a good and justifiable example, 
just that it is one that exists, to demonstrate that in general we have 
no reasonable heuristic for guessing whether ->map_pages is going to 
succeed or not other than by calling it and seeing if it succeeds or 
not. And IMO it's a complete waste of time thinking about ways to make 
such a heuristic possible instead of just getting on with fixing 
iommu_domain_alloc() to make the problem disappear altogether. Once 
Joerg pushes out the current queue I'll rebase and resend v4 of the bus 
ops removal, then hopefully get back to despairing at the hideous pile 
of WIP iommu_domain_alloc() patches I currently have on top of it...


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-25 Thread Robin Murphy

On 2023-09-23 00:33, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently it's
assuming that all instances must support identical page sizes, and thus once
it's seen one it can work "normally" per the core code's assumptions. It's
also I think the only driver which has a "finalise" bodge but *can* still
properly support map-before-attach, by virtue of having to replay mappings
to every new endpoint anyway.


Well it can't quite do that since it doesn't know the geometry - it
all is sort of guessing and hoping it doesn't explode on replay. If it
knows the geometry it wouldn't need finalize...


I think it's entirely reasonable to assume that any direct mappings 
specified for a device are valid for that device and its IOMMU. However, 
in the particular case of virtio, it really shouldn't ever have direct 
mappings anyway, since even if the underlying hardware did have any, the 
host can enforce the actual direct-mapping aspect itself, and just 
present them as unusable regions to the guest.



What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things
properly?


I fail to see how that would make any practical difference. Either the
mappings can be correctly set up in a pagetable *before* the relevant device
is attached to that pagetable, or they can't (if the driver doesn't have
enough information to be able to do so) and we just have to really hope
nothing blows up in the race window between attaching the device to an empty
pagetable and having a second try at iommu_create_device_direct_mappings().
That's a driver-level issue and has nothing to do with pgsize_bitmap either
way.


Except we don't detect this in the core code correctly, that is my
point. We should detect the aperture conflict, not pgsize_bitmap to
check if it is the first or second try.


Again, that's irrelevant. It can only be about whether the actual 
->map_pages call succeeds or not. A driver could well know up-front that 
all instances support the same pgsize_bitmap and aperture, and set both 
at ->domain_alloc time, yet still be unable to handle an actual mapping 
without knowing which instance(s) that needs to interact with (e.g. 
omap-iommu).


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Robin Murphy

On 22/09/2023 5:27 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:

On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is
finalized.


Where? The above points to iommu_create_device_direct_mappings() but
it doesn't because the pgsize_bitmap == 0:


__iommu_domain_alloc() sets pgsize_bitmap in this case:

  /*
   * If not already set, assume all sizes by default; the driver
   * may override this later
   */
  if (!domain->pgsize_bitmap)
  domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;


Drivers shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.


This is still the "normal" code path, really; I think it's only AMD that
started initialising the domain bitmap "early" and warranted making it
conditional.


My main point was that iommu_create_device_direct_mappings() should
fail for unfinalized domains, setting pgsize_bitmap to allow it to
succeed is not a nice hack, and not necessary now.


Sure, but it's the whole "unfinalised domains" and rewriting 
domain->pgsize_bitmap after attach thing that is itself the massive 
hack. AMD doesn't do that, and doesn't need to; it knows the appropriate 
format at allocation time and can quite happily return a fully working 
domain which allows map before attach, but the old ops->pgsize_bitmap 
mechanism fundamentally doesn't work for multiple formats with different 
page sizes. The only thing I'd accuse it of doing wrong is the weird 
half-and-half thing of having one format as a default via one mechanism, 
and the other as an override through the other, rather than setting both 
explicitly.


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings 
either; it sets it once it's discovered any instance, since apparently 
it's assuming that all instances must support identical page sizes, and 
thus once it's seen one it can work "normally" per the core code's 
assumptions. It's also I think the only driver which has a "finalise" 
bodge but *can* still properly support map-before-attach, by virtue of 
having to replay mappings to every new endpoint anyway.



What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things
properly?


I fail to see how that would make any practical difference. Either the 
mappings can be correctly set up in a pagetable *before* the relevant 
device is attached to that pagetable, or they can't (if the driver 
doesn't have enough information to be able to do so) and we just have to 
really hope nothing blows up in the race window between attaching the 
device to an empty pagetable and having a second try at 
iommu_create_device_direct_mappings(). That's a driver-level issue and 
has nothing to do with pgsize_bitmap either way.


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Robin Murphy

On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is
finalized.


Where? The above points to iommu_create_device_direct_mappings() but
it doesn't because the pgsize_bitmap == 0:


__iommu_domain_alloc() sets pgsize_bitmap in this case:

 /*
  * If not already set, assume all sizes by default; the driver
  * may override this later
  */
 if (!domain->pgsize_bitmap)
 domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;


Drivers shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.


This is still the "normal" code path, really; I think it's only AMD that 
started initialising the domain bitmap "early" and warranted making it 
conditional. However we *do* ultimately want all the drivers to do the 
same, so we can get rid of ops->pgsize_bitmap, because it's already 
pretty redundant and meaningless in the face of per-domain pagetable 
formats.
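
In sketch form, that direction means every driver filling in the
per-domain fields at allocation time, so the core fallback above (and
eventually ops->pgsize_bitmap itself) becomes unnecessary; the struct name
and size values below are purely illustrative:

	static struct iommu_domain *my_domain_alloc(unsigned int type)
	{
		struct my_domain *d = kzalloc(sizeof(*d), GFP_KERNEL);

		if (!d)
			return NULL;
		/* per-domain format decided here, not discovered after
		 * the first attach */
		d->domain.pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G;
		d->domain.geometry.aperture_start = 0;
		d->domain.geometry.aperture_end = DMA_BIT_MASK(48);
		d->domain.geometry.force_aperture = true;
		return &d->domain;
	}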


Thanks,
Robin.


Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

2023-09-22 Thread Robin Murphy

On 2023-09-21 17:44, Jason Gunthorpe wrote:

On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:

On 2023/9/21 15:51, Yi Liu wrote:

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
   };
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default


How about s/default/vendor agnostic/ ?


Please don't use the word vendor :)

IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default


Ah yes, a default domain type, not to be confused with any default 
domain type, including the default default domain type. Just in case 
anyone had forgotten how gleefully fun this is :D


I particularly like the bit where we end up with this construct later:

switch (hwpt_type) {
case IOMMU_HWPT_TYPE_DEFAULT:
/* allocate a domain */
default:
/* allocate a different domain */
}

But of course neither case allocates a *default* domain, because it's 
quite obviously the wrong place to be doing that.


I could go on enjoying myself, but basically yeah, "default" can't be a 
type in itself (at best it would be a meta-type which could be 
requested, such that it resolves to some real type to actually 
allocate), so a good name should reflect what the type functionally 
*means* to the user. IIUC the important distinction is that it's an 
abstract kernel-owned pagetable for the user to indirectly control via 
the API, rather than one it owns and writes directly (and thus has to be 
in a specific agreed format).


Thanks,
Robin.


Re: arm64: Unable to handle kernel execute from non-executable memory at virtual address ffff8000834c13a0

2023-09-20 Thread Robin Murphy

On 20/09/2023 3:32 pm, Mark Rutland wrote:

Hi Naresh,

On Wed, Sep 20, 2023 at 11:29:12AM +0200, Naresh Kamboju wrote:

[ my two cents ]
While running LTP pty07 test cases on arm64 juno-r2 with Linux next-20230919
the following kernel crash was noticed.

I have been noticing this issue intermittently on Juno-r2 for more than a month.
Has anyone else noticed this crash?


How intermittent is this? 1/2, 1/10, 1/100, rarer still?

Are you running *just* the pty07 test, or are you running a whole LTP suite and
the issue first occurs around pty07?

Given you've been hitting this for a month, have you tried testing mainline? Do
you have a known-good kernel that we can start a bisect from?

Do you *only* see this on Juno-r2 and are you testing on other hardware?


Reported-by: Linux Kernel Functional Testing 

[0.00] Linux version 6.6.0-rc2-next-20230919 (tuxmake@tuxmake)
(aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0, GNU ld (GNU Binutils
for Debian) 2.41) #1 SMP PREEMPT @1695107157
[0.00] KASLR disabled due to lack of seed
[0.00] Machine model: ARM Juno development board (r2)
...
LTP running pty
...

pty07.c:92: TINFO: Saving active console 1
../../../include/tst_fuzzy_sync.h:640: TINFO: Stopped sampling at 552
(out of 1024) samples, sampling time reached 50% of the total time
limit
../../../include/tst_fuzzy_sync.h:307: TINFO: loop = 552, delay_bias = 0
../../../include/tst_fuzzy_sync.h:295: TINFO: start_a - start_b: { avg
=   127ns, avg_dev =84ns, dev_ratio = 0.66 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - start_a  : { avg
= 17296156ns, avg_dev = 5155058ns, dev_ratio = 0.30 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_b - start_b  : { avg
= 101202336ns, avg_dev = 6689286ns, dev_ratio = 0.07 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - end_b: { avg
= -83906064ns, avg_dev = 10230694ns, dev_ratio = 0.12 }
../../../include/tst_fuzzy_sync.h:295: TINFO: spins: { avg
= 2765565  , avg_dev = 339285  , dev_ratio = 0.12 }
[  384.133538] Unable to handle kernel execute from non-executable
memory at virtual address 8000834c13a0
[  384.133559] Mem abort info:
[  384.133568]   ESR = 0x860f
[  384.133578]   EC = 0x21: IABT (current EL), IL = 32 bits
[  384.133590]   SET = 0, FnV = 0
[  384.133600]   EA = 0, S1PTW = 0
[  384.133610]   FSC = 0x0f: level 3 permission fault
[  384.133621] swapper pgtable: 4k pages, 48-bit VAs, pgdp=82375000
[  384.133634] [8000834c13a0] pgd=1009f003,
p4d=1009f003, pud=1009e003, pmd=10098003,
pte=0078836c1703
[  384.133697] Internal error: Oops: 860f [#1] PREEMPT SMP
[  384.133707] Modules linked in: tda998x onboard_usb_hub cec hdlcd
crct10dif_ce drm_dma_helper drm_kms_helper fuse drm backlight dm_mod
ip_tables x_tables
[  384.133767] CPU: 3 PID: 589 Comm: (udev-worker) Not tainted
6.6.0-rc2-next-20230919 #1
[  384.133779] Hardware name: ARM Juno development board (r2) (DT)
[  384.133784] pstate: 4005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  384.133796] pc : in_lookup_hashtable+0x178/0x2000


This indicates that the faulting address 8000834c13a0 is
in_lookup_hashtable+0x178/0x2000, which would mean we've somehow marked the
kernel text as non-executable, which we never do intentionally.

I suspect that implies memory corruption. Have you tried running this with
KASAN enabled?


[  384.133818] lr : rcu_core (arch/arm64/include/asm/preempt.h:13
(discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1)
kernel/rcu/tree.c:2403 (discriminator 1))


For the record, this LR appears to be the expected return address of the 
"f(rhp);" call within rcu_do_batch() (if CONFIG_DEBUG_LOCK_ALLOC=n), so 
it looks like a case of a bogus or corrupted RCU callback. The PC is in 
the middle of a data symbol (in_lookup_hashtable is an array), so NX is 
expected and I wouldn't imagine the pagetables have gone wrong, just 
regular data corruption or use-after-free somewhere.
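
To illustrate the suspected failure mode (the struct and the corrupting 
write here are hypothetical; the rcu_do_batch() behaviour of eventually 
invoking rhp->func(rhp) is real):

struct foo {
	struct rcu_head rcu;
	/* ... */
};

static void foo_free_rcu(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rcu));
}

static void foo_release(struct foo *f)
{
	call_rcu(&f->rcu, foo_free_rcu);
	/*
	 * If the object is freed/reused too early, or something scribbles
	 * over f->rcu.func before the grace period ends, the deferred
	 * "f(rhp)" call jumps to whatever was written there - e.g. into a
	 * data symbol like in_lookup_hashtable, giving exactly this PC in
	 * an array with LR in rcu_core().
	 */
}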


Robin.


[  384.133832] sp : 800083533e60
[  384.133836] x29: 800083533e60 x28: 0008008a6180 x27: 000a
[  384.133854] x26:  x25:  x24: 800083533f10
[  384.133871] x23: 800082404008 x22: 800082ebea80 x21: 800082f55940
[  384.133889] x20: 00097ed75440 x19: 0001 x18: 
[  384.133905] x17: 8008fc95c000 x16: 80008353 x15: 3d09
[  384.133922] x14: 00030d40 x13:  x12: 003d0900
[  384.133939] x11:  x10: 0008 x9 : 80008015b05c
[  384.133955] x8 : 800083533da8 x7 :  x6 : 0100
[  384.133971] x5 : 800082ebf000 x4 : 800082ebf2e8 x3 : 
[  384.133987] x2 : 000825bf8618 x1 : 8000834c13a0 x0 : 00082b6d7170
[  384.134005] Call trace:
[  384.134009] in_lookup_hashtable+0x178/0x2000
[  384.134022] rcu_core_si (kernel/rcu/tree.c:2421)
[  384.134035] 

Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-19 Thread Robin Murphy

On 2023-09-19 09:15, Jean-Philippe Brucker wrote:

On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
int ret;
unsigned long flags;
+   /*
+* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+* is initialized e.g. via iommu_create_device_direct_mappings()
+*/
+   if (!viommu)
+   return 0;


Minor nit: I'd be inclined to make that check explicitly in the places where
it definitely is expected, rather than allowing *any* sync to silently do
nothing if called incorrectly. Plus then they could use
vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
(it did take me a moment to figure out how we could get to .iotlb_sync_map
with a NULL viommu without viommu_map_pages() blowing up first...)


They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is finalized.
Once we merge domain_alloc() and finalize(), then this check disappears,
but we still need to test nr_endpoints in map/unmap to handle detached
domains (and we still need to fix the synchronization of nr_endpoints
against attach/detach). That's why I preferred doing this on viommu and
keeping it in one place.


Fair enough - it just seems to me that in both cases it's a detached 
domain, so its previous history of whether it's ever been otherwise or 
not shouldn't matter. Even once viommu is initialised, does it really 
make sense to send sync commands for a mapping on a detached domain 
where we haven't actually sent any map/unmap commands?
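
For concreteness, the kind of check being suggested would look roughly 
like this (an untested sketch reusing the driver's existing 
to_viommu_domain() and nr_endpoints):

static int viommu_iotlb_sync_map(struct iommu_domain *domain,
				 unsigned long iova, size_t size)
{
	struct viommu_domain *vdomain = to_viommu_domain(domain);

	/*
	 * A detached (or not-yet-finalised) domain has no endpoints and
	 * nothing to sync, so bail out here rather than letting
	 * viommu_sync_req() silently accept a NULL viommu.
	 */
	if (!vdomain->nr_endpoints)
		return 0;
	return viommu_sync_req(vdomain->viommu);
}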


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-18 Thread Robin Murphy

On 2023-09-18 12:51, Niklas Schnelle wrote:

Pull out the sync operation from viommu_map_pages() by implementing
ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.


Is it really a requirement? Deferred flush only deals with unmapping. Or 
are you just trying to say that it's not too worthwhile to try doing 
more for unmapping performance while obvious mapping performance is 
still left on the table?



Link: 
https://lore.kernel.org/lkml/20230726111433.1105665-1-schne...@linux.ibm.com/
Signed-off-by: Niklas Schnelle 
---
  drivers/iommu/virtio-iommu.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
int ret;
unsigned long flags;
  
+	/*

+* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+* is initialized e.g. via iommu_create_device_direct_mappings()
+*/
+   if (!viommu)
+   return 0;


Minor nit: I'd be inclined to make that check explicitly in the places 
where it definitely is expected, rather than allowing *any* sync to 
silently do nothing if called incorrectly. Plus then they could use 
vdomain->nr_endpoints for consistency with the equivalent checks 
elsewhere (it did take me a moment to figure out how we could get to 
.iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up 
first...)


Thanks,
Robin.


	spin_lock_irqsave(&viommu->request_lock, flags);
ret = __viommu_sync_req(viommu);
if (ret)
@@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, 
unsigned long iova,
.flags  = cpu_to_le32(flags),
};
  
-		ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));

+   ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
if (ret) {
viommu_del_mappings(vdomain, iova, end);
return ret;
@@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
viommu_sync_req(vdomain->viommu);
  }
  
+static int viommu_iotlb_sync_map(struct iommu_domain *domain,

+unsigned long iova, size_t size)
+{
+   struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+   return viommu_sync_req(vdomain->viommu);
+}
+
  static void viommu_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
.unmap_pages= viommu_unmap_pages,
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
+   .iotlb_sync_map = viommu_iotlb_sync_map,
.free   = viommu_domain_free,
}
  };




Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Robin Murphy

On 12/09/2023 4:53 pm, Rob Herring wrote:

On Tue, Sep 12, 2023 at 11:13:50AM +0100, Robin Murphy wrote:

On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:

On 12/09/2023 08:16, Yong Wu (吴勇) wrote:

Hi Rob,

Thanks for your review.

On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:


   On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:

This adds the binding for describing a CMA memory for MediaTek

SVP(Secure

Video Path).


CMA is a Linux thing. How is this related to CMA?




Signed-off-by: Yong Wu 
---
   .../mediatek,secure_cma_chunkmem.yaml | 42

+++

   1 file changed, 42 insertions(+)
   create mode 100644 Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml


diff --git a/Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml
b/Documentation/devicetree/bindings/reserved-
memory/mediatek,secure_cma_chunkmem.yaml

new file mode 100644
index ..cc10e00d35c4
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml

@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id:

http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#

+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek Secure Video Path Reserved Memory


What makes this specific to Mediatek? Secure video path is fairly
common, right?


Here we just reserve a buffer and would like to create a dma-buf secure
heap for SVP, then the secure engines (Vcodec and DRM) could prepare
secure buffer through it.
But the heap driver is pure SW driver, it is not platform device and


All drivers are pure SW.


we don't have a corresponding HW unit for it. Thus I don't think I
could create a platform dtsi node and use "memory-region" pointer to
the region. I used RESERVEDMEM_OF_DECLARE currently (the code is in
[9/9]). Sorry if this is not right.


If this is not for any hardware and you already understand this (since
you cannot use other bindings) then you cannot have custom bindings for
it either.



Then in our usage case, is there some similar method to do this? or
any other suggestion?


Don't stuff software into DTS.


Aren't most reserved-memory bindings just software policy if you look at it
that way, though? IIUC this is a pool of memory that is visible and
available to the Non-Secure OS, but is fundamentally owned by the Secure
TEE, and pages that the TEE allocates from it will become physically
inaccessible to the OS. Thus the platform does impose constraints on how the
Non-Secure OS may use it, and per the rest of the reserved-memory bindings,
describing it as a "reusable" reservation seems entirely appropriate. If
anything that's *more* platform-related and so DT-relevant than typical
arbitrary reservations which just represent "save some memory to dedicate to
a particular driver" and don't actually bear any relationship to firmware or
hardware at all.


Yes, a memory range defined by hardware or firmware is within scope of
DT. (CMA at aribitrary address was questionable.)

My issue here is more that 'secure video memory' is not any way Mediatek
specific. AIUI, it's a requirement from certain content providers for
video playback to work. So why the Mediatek specific binding?


Based on the implementation, I'd ask the question the other way round - 
the way it works looks to be at least somewhat dependent on Mediatek's 
TEE, in ways where other vendors' equivalent implementations may be 
functionally incompatible, however nothing suggests it's actually 
specific to video (beyond that presumably being the primary use-case 
they had in mind).


Thanks,
Robin.


Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Robin Murphy

On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:

On 12/09/2023 08:16, Yong Wu (吴勇) wrote:

Hi Rob,

Thanks for your review.

On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:


  On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:

This adds the binding for describing a CMA memory for MediaTek

SVP(Secure

Video Path).


CMA is a Linux thing. How is this related to CMA?




Signed-off-by: Yong Wu 
---
  .../mediatek,secure_cma_chunkmem.yaml | 42

+++

  1 file changed, 42 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml


diff --git a/Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml
b/Documentation/devicetree/bindings/reserved-
memory/mediatek,secure_cma_chunkmem.yaml

new file mode 100644
index ..cc10e00d35c4
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-

memory/mediatek,secure_cma_chunkmem.yaml

@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id:

http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#

+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek Secure Video Path Reserved Memory


What makes this specific to Mediatek? Secure video path is fairly
common, right?


Here we just reserve a buffer and would like to create a dma-buf secure
heap for SVP, then the secure engines (Vcodec and DRM) could prepare
secure buffer through it.
  
But the heap driver is pure SW driver, it is not platform device and


All drivers are pure SW.


we don't have a corresponding HW unit for it. Thus I don't think I
could create a platform dtsi node and use "memory-region" pointer to
the region. I used RESERVEDMEM_OF_DECLARE currently (the code is in
[9/9]). Sorry if this is not right.


If this is not for any hardware and you already understand this (since
you cannot use other bindings) then you cannot have custom bindings for
it either.



Then in our usage case, is there some similar method to do this? or
any other suggestion?


Don't stuff software into DTS.


Aren't most reserved-memory bindings just software policy if you look at 
it that way, though? IIUC this is a pool of memory that is visible and 
available to the Non-Secure OS, but is fundamentally owned by the Secure 
TEE, and pages that the TEE allocates from it will become physically 
inaccessible to the OS. Thus the platform does impose constraints on how 
the Non-Secure OS may use it, and per the rest of the reserved-memory 
bindings, describing it as a "reusable" reservation seems entirely 
appropriate. If anything that's *more* platform-related and so 
DT-relevant than typical arbitrary reservations which just represent 
"save some memory to dedicate to a particular driver" and don't actually 
bear any relationship to firmware or hardware at all.


However, the fact that Linux's implementation of how to reuse reserved 
memory areas is called CMA is indeed still irrelevant and has no place 
in the binding itself.


Thanks,
Robin.


Re: [PATCH 3/5] armv8: fsl-layerscape: create bypass smmu mapping for MC

2023-09-06 Thread Robin Murphy

On 2023-09-06 19:10, Laurentiu Tudor wrote:



On 9/6/2023 8:21 PM, Robin Murphy wrote:

On 2023-09-06 17:01, Laurentiu Tudor wrote:

MC being a plain DMA master as any other device in the SoC and
being live at OS boot time, as soon as the SMMU is probed it
will immediately start triggering faults because there is no
mapping in the SMMU for the MC. Pre-create such a mapping in
the SMMU, being the OS's responsibility to preserve it.


Does U-Boot enable the SMMU? AFAICS the only thing it knows how to do 
is explicitly turn it *off*, therefore programming other registers 
appears to be a complete waste of time.


No, it doesn't enable SMMU but it does mark a SMR as valid for MC FW. 
And the ARM SMMU driver subtly preserves it, see [1] (it's late and I 
might be wrong, but I'll double check tomorrow). :-)


No, that sets the SMR valid bit *if* the corresponding entry is 
allocated and marked as valid in the software state in smmu->smrs, which 
at probe time it isn't, because that's only just been allocated and is 
still zero-initialised. Unless, that is, 
arm_smmu_rmr_install_bypass_smr() found a reserved region and 
preallocated an entry to honour it. But even those entries are still 
constructed from scratch; we can't do anything with the existing 
SMR/S2CR register contents in general since they may be uninitialised 
random reset values, so we don't even look.
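
For reference, the SMR write path being described boils down to roughly 
this (paraphrased from the driver for illustration, not a verbatim 
quote):

static void write_smr(struct arm_smmu_device *smmu, int idx)
{
	struct arm_smmu_smr *smr = &smmu->smrs[idx];
	u32 reg = FIELD_PREP(ARM_SMMU_SMR_ID, smr->id) |
		  FIELD_PREP(ARM_SMMU_SMR_MASK, smr->mask);

	/* VALID comes purely from the software state in smmu->smrs,
	 * never from the register's reset value. */
	if (!(smmu->features & ARM_SMMU_FEAT_EXIDS) && smr->valid)
		reg |= ARM_SMMU_SMR_VALID;
	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_SMR(idx), reg);
}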


Pay no attention to the qcom_smmu_cfg_probe() hack either - that only 
exists on the promise that the relevant platforms couldn't have their 
firmware updated to use proper RMRs.


You're already doing the right thing in patch #2, so there's no need to 
waste code on doing a pointless wrong thing as well.


Thanks,
Robin.

All that should matter to the OS, and that it is responsible for 
upholding, is the reserved memory regions from patch #2. For instance, 
if the OS is Linux, literally the first thing arm_smmu_device_reset() 
does is rewrite all the S2CRs and SMRs without so much as looking.


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/arm/arm-smmu/arm-smmu.c#n894


---
Best Regards, Laurentiu




Signed-off-by: Laurentiu Tudor 
---
  arch/arm/cpu/armv8/fsl-layerscape/soc.c   | 26 ---
  .../asm/arch-fsl-layerscape/immap_lsch3.h |  9 +++
  2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/arm/cpu/armv8/fsl-layerscape/soc.c 
b/arch/arm/cpu/armv8/fsl-layerscape/soc.c

index 3bfdc3f77431..870b99838ab5 100644
--- a/arch/arm/cpu/armv8/fsl-layerscape/soc.c
+++ b/arch/arm/cpu/armv8/fsl-layerscape/soc.c
@@ -376,6 +376,18 @@ void bypass_smmu(void)
  val = (in_le32(SMMU_NSCR0) | SCR0_CLIENTPD_MASK) & 
~(SCR0_USFCFG_MASK);

  out_le32(SMMU_NSCR0, val);
  }
+
+void setup_smmu_mc_bypass(int icid, int mask)
+{
+    u32 val;
+
+    val = SMMU_SMR_VALID_MASK | (icid << SMMU_SMR_ID_SHIFT) |
+    (mask << SMMU_SMR_MASK_SHIFT);
+    out_le32(SMMU_REG_SMR(0), val);
+    val = SMMU_S2CR_EXIDVALID_VALID_MASK | SMMU_S2CR_TYPE_BYPASS_MASK;
+    out_le32(SMMU_REG_S2CR(0), val);
+}
+
  void fsl_lsch3_early_init_f(void)
  {
  erratum_rcw_src();
@@ -402,10 +414,18 @@ void fsl_lsch3_early_init_f(void)
  bypass_smmu();
  #endif
-#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS1028A) || \
-    defined(CONFIG_ARCH_LS2080A) || defined(CONFIG_ARCH_LX2160A) || \
-    defined(CONFIG_ARCH_LX2162A)
+#ifdef CONFIG_ARCH_LS1028A
+    set_icids();
+#endif
+
+#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS2080A)
+    set_icids();
+    setup_smmu_mc_bypass(0x300, 0);
+#endif
+
+#if defined(CONFIG_ARCH_LX2160A) || defined(CONFIG_ARCH_LX2162A)
  set_icids();
+    setup_smmu_mc_bypass(0x4000, 0);
  #endif
  }
diff --git a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h 
b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h

index ca5e33379ba9..bec5355adaed 100644
--- a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h
+++ b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h
@@ -190,6 +190,15 @@
  #define SCR0_CLIENTPD_MASK    0x0001
  #define SCR0_USFCFG_MASK    0x0400
+#define SMMU_REG_SMR(n)    (SMMU_BASE + 0x800 + ((n) << 2))
+#define SMMU_REG_S2CR(n)    (SMMU_BASE + 0xc00 + ((n) << 2))
+#define SMMU_SMR_VALID_MASK    0x8000
+#define SMMU_SMR_MASK_MASK    0x
+#define SMMU_SMR_MASK_SHIFT    16
+#define SMMU_SMR_ID_MASK    0x
+#define SMMU_SMR_ID_SHIFT    0
+#define SMMU_S2CR_EXIDVALID_VALID_MASK    0x0400
+#define SMMU_S2CR_TYPE_BYPASS_MASK    0x0001
  /* PCIe */
  #define CFG_SYS_PCIE1_ADDR    (CONFIG_SYS_IMMR + 0x240)


Re: [PATCH 3/5] armv8: fsl-layerscape: create bypass smmu mapping for MC

2023-09-06 Thread Robin Murphy

On 2023-09-06 17:01, Laurentiu Tudor wrote:

MC being a plain DMA master as any other device in the SoC and
being live at OS boot time, as soon as the SMMU is probed it
will immediately start triggering faults because there is no
mapping in the SMMU for the MC. Pre-create such a mapping in
the SMMU, being the OS's responsibility to preserve it.


Does U-Boot enable the SMMU? AFAICS the only thing it knows how to do is 
explicitly turn it *off*, therefore programming other registers appears 
to be a complete waste of time.


All that should matter to the OS, and that it is responsible for 
upholding, is the reserved memory regions from patch #2. For instance, 
if the OS is Linux, literally the first thing arm_smmu_device_reset() 
does is rewrite all the S2CRs and SMRs without so much as looking.


Thanks,
Robin.


Signed-off-by: Laurentiu Tudor 
---
  arch/arm/cpu/armv8/fsl-layerscape/soc.c   | 26 ---
  .../asm/arch-fsl-layerscape/immap_lsch3.h |  9 +++
  2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/arm/cpu/armv8/fsl-layerscape/soc.c 
b/arch/arm/cpu/armv8/fsl-layerscape/soc.c
index 3bfdc3f77431..870b99838ab5 100644
--- a/arch/arm/cpu/armv8/fsl-layerscape/soc.c
+++ b/arch/arm/cpu/armv8/fsl-layerscape/soc.c
@@ -376,6 +376,18 @@ void bypass_smmu(void)
val = (in_le32(SMMU_NSCR0) | SCR0_CLIENTPD_MASK) & ~(SCR0_USFCFG_MASK);
out_le32(SMMU_NSCR0, val);
  }
+
+void setup_smmu_mc_bypass(int icid, int mask)
+{
+   u32 val;
+
+   val = SMMU_SMR_VALID_MASK | (icid << SMMU_SMR_ID_SHIFT) |
+   (mask << SMMU_SMR_MASK_SHIFT);
+   out_le32(SMMU_REG_SMR(0), val);
+   val = SMMU_S2CR_EXIDVALID_VALID_MASK | SMMU_S2CR_TYPE_BYPASS_MASK;
+   out_le32(SMMU_REG_S2CR(0), val);
+}
+
  void fsl_lsch3_early_init_f(void)
  {
erratum_rcw_src();
@@ -402,10 +414,18 @@ void fsl_lsch3_early_init_f(void)
bypass_smmu();
  #endif
  
-#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS1028A) || \

-   defined(CONFIG_ARCH_LS2080A) || defined(CONFIG_ARCH_LX2160A) || \
-   defined(CONFIG_ARCH_LX2162A)
+#ifdef CONFIG_ARCH_LS1028A
+   set_icids();
+#endif
+
+#if defined(CONFIG_ARCH_LS1088A) || defined(CONFIG_ARCH_LS2080A)
+   set_icids();
+   setup_smmu_mc_bypass(0x300, 0);
+#endif
+
+#if defined(CONFIG_ARCH_LX2160A) || defined(CONFIG_ARCH_LX2162A)
set_icids();
+   setup_smmu_mc_bypass(0x4000, 0);
  #endif
  }
  
diff --git a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h

index ca5e33379ba9..bec5355adaed 100644
--- a/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h
+++ b/arch/arm/include/asm/arch-fsl-layerscape/immap_lsch3.h
@@ -190,6 +190,15 @@
  #define SCR0_CLIENTPD_MASK0x0001
  #define SCR0_USFCFG_MASK  0x0400
  
+#define SMMU_REG_SMR(n)			(SMMU_BASE + 0x800 + ((n) << 2))

+#define SMMU_REG_S2CR(n)   (SMMU_BASE + 0xc00 + ((n) << 2))
+#define SMMU_SMR_VALID_MASK0x8000
+#define SMMU_SMR_MASK_MASK 0x
+#define SMMU_SMR_MASK_SHIFT16
+#define SMMU_SMR_ID_MASK   0x
+#define SMMU_SMR_ID_SHIFT  0
+#define SMMU_S2CR_EXIDVALID_VALID_MASK 0x0400
+#define SMMU_S2CR_TYPE_BYPASS_MASK 0x0001
  
  /* PCIe */

  #define CFG_SYS_PCIE1_ADDR(CONFIG_SYS_IMMR + 0x240)


Re: [PATCH 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

2023-09-04 Thread Robin Murphy

On 2023-09-04 16:34, Jean-Philippe Brucker wrote:

On Fri, Aug 25, 2023 at 05:21:26PM +0200, Niklas Schnelle wrote:

Add ops->flush_iotlb_all operation to enable virtio-iommu for the
dma-iommu deferred flush scheme. This results inn a significant increase

	(s/inn/in/)

in performance in exchange for a window in which devices can still
access previously IOMMU mapped memory. To get back to the prior behavior
iommu.strict=1 may be set on the kernel command line.


Maybe add that it depends on CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} as
well, because I've seen kernel configs that enable either.


Indeed, I'd be inclined to phrase it in terms of the driver now actually 
being able to honour lazy mode when requested (which happens to be the 
default on x86), rather than as if it might be some 
potentially-unexpected change in behaviour.


Thanks,
Robin.


Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
Signed-off-by: Niklas Schnelle 
---
  drivers/iommu/virtio-iommu.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index fb73dec5b953..1b7526494490 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -924,6 +924,15 @@ static int viommu_iotlb_sync_map(struct iommu_domain 
*domain,
return viommu_sync_req(vdomain->viommu);
  }
  
+static void viommu_flush_iotlb_all(struct iommu_domain *domain)

+{
+   struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+   if (!vdomain->nr_endpoints)
+   return;


As for patch 1, a NULL check in viommu_sync_req() would allow dropping
this one

Thanks,
Jean


+   viommu_sync_req(vdomain->viommu);
+}
+
  static void viommu_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1049,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum 
iommu_cap cap)
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
+   case IOMMU_CAP_DEFERRED_FLUSH:
+   return true;
default:
return false;
}
@@ -1069,6 +1080,7 @@ static struct iommu_ops viommu_ops = {
.map_pages  = viommu_map_pages,
.unmap_pages= viommu_unmap_pages,
.iova_to_phys   = viommu_iova_to_phys,
+   .flush_iotlb_all= viommu_flush_iotlb_all,
.iotlb_sync = viommu_iotlb_sync,
.iotlb_sync_map = viommu_iotlb_sync_map,
.free   = viommu_domain_free,

--
2.39.2




Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-09-04 17:16, Boris Brezillon wrote:

On Mon, 4 Sep 2023 16:22:19 +0100
Steven Price  wrote:


On 04/09/2023 10:26, Boris Brezillon wrote:

On Mon, 4 Sep 2023 08:42:08 +0100
Steven Price  wrote:
   

On 01/09/2023 17:10, Boris Brezillon wrote:

On Wed,  9 Aug 2023 18:53:15 +0200
Boris Brezillon  wrote:
 

+/**
+ * DOC: MMIO regions exposed to userspace.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
+ *
+ * File offset for all MMIO regions being exposed to userspace. Don't use
+ * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
+ *
+ * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls
+ * GPU cache flushing through CS instructions, but the flush reduction
+ * mechanism requires a flush_id. This flush_id could be queried with an
+ * ioctl, but Arm provides a well-isolated register page containing only this
+ * read-only register, so let's expose this page through a static mmap offset
+ * and allow direct mapping of this MMIO region so we can avoid the
+ * user <-> kernel round-trip.
+ */
+#define DRM_PANTHOR_USER_MMIO_OFFSET   (0x1ull << 56)


I'm playing with a 32-bit kernel/userspace, and this is problematic,
because vm_pgoff is limited to 32-bit there, meaning we can only map up
to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
userspace set the mmio range?


Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
user space is likely to still be needed.


Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
chance you could advise me on what to do here?

1. assume this limitation is here for a good reason, and limit the GPU
VA space to 32-bits on 32-bit kernels

or

2. update the interface to make iova an u64


I'm not sure I can answer the question from a technical perspective,
hopefully Robin will be able to.


Had a quick chat with Robin, and he's recommending going for #1 too.



But why do we care about 32-bit kernels on a platform which is new
enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?


Apparently the memory you save by switching to a 32-bit kernel matters
to some people. To clarify, the CPU is aarch64, but they want to use it
in 32-bit mode.



Given the other limitations present in a 32-bit kernel I'd be tempted to
say '1' just for simplicity. Especially since apparently we've lived
with this for panfrost which presumably has the same limitation (even
though all Bifrost/Midgard GPUs have at least 33 bits of VA space).


Well, Panfrost is simpler in that you don't have this kernel VA range,
and, IIRC, we are using the old format that naturally limits the GPU VA
space to 4G.


FWIW the legacy pagetable format itself should be fine going up to 
however many bits the GPU supports, however there were various ISA 
limitations around crossing 4GB boundaries, and the easiest way to avoid 
having to think about those was to just not use more than 4GB of VA at 
all (minus chunks at the ends for similar weird ISA reasons).


Cheers,
Robin.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-08-09 17:53, Boris Brezillon wrote:
[...]

+/**
+ * struct drm_panthor_vm_create - Arguments passed to 
DRM_PANTHOR_IOCTL_VM_CREATE
+ */
+struct drm_panthor_vm_create {
+   /** @flags: VM flags, MBZ. */
+   __u32 flags;
+
+   /** @id: Returned VM ID. */
+   __u32 id;
+
+   /**
+* @kernel_va_range: Size of the VA space reserved for kernel objects.
+*
+* If kernel_va_range is zero, we pick half of the VA space for kernel 
objects.
+*
+* Kernel VA space is always placed at the top of the supported VA 
range.
+*/
+   __u64 kernel_va_range;


Off the back of the "IOVA as unsigned long" concern, Boris and I 
reasoned through the 64-bit vs. 32-bit vs. compat cases on IRC, and it 
seems like this kernel_va_range argument is a source of much of the pain.


Rather than have userspace specify a quantity which it shouldn't care 
about and depend on assumptions of kernel behaviour to infer the 
quantity which *is* relevant (i.e. how large the usable range of the VM 
will actually be), I think it would be considerably more logical for 
userspace to simply request the size of usable VM it actually wants. 
Then it would be straightforward and consistent to define the default 
value in terms of the minimum of half the GPU VA size or TASK_SIZE (the 
latter being the largest *meaningful* value in all 3 cases), and it's 
still easy enough for the kernel to deduce for itself whether there's a 
reasonable amount of space left between the requested limit and 
ULONG_MAX for it to use. 32-bit kernels should then get at least 1GB to 
play with, for compat the kernel BOs can get well out of the way into 
the >32-bit range, and it's only really 64-bit where userspace is liable 
to see "kernel" VA space impinging on usable process VAs. Even then 
we're not sure that's a significant concern beyond OpenCL SVM.
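
In pseudo-C, the suggested defaulting amounts to something like this 
(gpu_va_bits and args->user_va_range are made-up names for the sake of 
illustration; TASK_SIZE is the real limit on userspace VAs for the 
current ABI, native or compat):

u64 full_va = 1ULL << gpu_va_bits;
u64 user_va = args->user_va_range ?: min_t(u64, full_va / 2, TASK_SIZE);

/* Kernel-internal objects then live somewhere in [user_va, full_va),
 * well clear of anything a 32-bit or compat process can address. */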


Thanks,
Robin.


Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-21 Thread Robin Murphy

On 2023-08-14 12:18, Steven Price wrote:

On 11/08/2023 20:26, Robin Murphy wrote:

On 2023-08-11 17:56, Daniel Stone wrote:

Hi,

On 11/08/2023 17:35, Robin Murphy wrote:

On 2023-08-09 17:53, Boris Brezillon wrote:

+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor
directory/Kconfig/Makefile reshuffle and a trivial bit of extra
registration glue to build both drivers into a single module. It
seems like it could be a perpetual source of confusion to end users
where Mesa "panfrost" is the right option but kernel "panfrost" is
the wrong one. Especially when pretty much every other GPU driver is
also just one big top-level module to load for many different
generations of hardware. Plus it would mean that if someone did want
to have a go at deduplicating the resource-wrangling boilerplate for
OPPs etc. in future, there's more chance of being able to do so
meaningfully.


It might be nice to point it out, but to be fair Intel and AMD both
have two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali.


Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to
i915 what lima is to panfrost. It was more that unlike the others where
there's a pretty clear line in the sand between "driver for old
hardware" and "driver for the majority of recent hardware", this one
happens to fall splat in the middle of the current major generation such
that panfrost is the correct module for Mali Bifrost but also the wrong
one for Mali Bifrost... :/


Well panfrost.ko is the correct module for all Bifrost ;) It's Valhall
that's the confusing one.


Bah, you see? If even developers sufficiently involved to be CCed on the 
patches can't remember what's what, what hope does Joe User have? :D



I would hope that for most users they can just build both panfrost and
panthor and everything will "Just Work (tm)". I'm not sure how much
users are actually aware of the architecture family of their GPU.

I think at the moment (until marketing mess it up) there's also the
'simple' rule:

* Mali T* is Midgard and supported by panfrost.ko
* Mali Gxx (two digits) is Bifrost or first-generation Valhall and
supported by panfrost.ko
* Mali Gxxx (three digits) is Valhall CSF and supported by panthor.

(and Immortalis is always three digits and Valhall CSF).


With brain now engaged, indeed that sounds right. However if the 
expectation is that most people would steer clear even of marketing's 
alphabet soup and just enable everything, that could also be seen as 
somewhat of an argument for just putting it all together and not 
bothering with a separate option.



I can see the point, but otoh if someone's managed to build all the
right regulator/clock/etc modules to get a working system, they'll
probably manage to figure the GPU side out?


Maybe; either way I guess it's not really my concern, since I'm the only
user that *I* have to support, and I do already understand it. From the
upstream perspective I mostly just want to hold on to the hope of not
having to write my io-pgtable bugs twice over if at all possible :)


I agree it would be nice to merge some of the common code, I'm hoping
this is something that might be possible in the future. But at the
moment the focus is on trying to get basic support for the new GPUs
without the danger of regressing the old GPUs.


Yup, I get that, it's just the niggling concern I have is whether what 
we do at the moment might paint us into a corner with respect to what 
we're then able to change later; I know KConfig symbols are explicitly 
not ABI, but module names and driver names might be more of a grey area.



And, to be honest, for a fair bit of the common code in
panfrost/panthorm it's common to a few other drivers too. So the correct
answer might well be to try to add more generic helpers (devfreq,
clocks, power domains all spring to mind - there's a lot of boiler plate
and nothing very special about Mali).


That much is true, however I guess there's also stuff like perf counter 
support which is less likely to be DRM-level generic but perhaps still 
sufficiently similar between JM and CSF. The main thing I don't know, 
and thus feel compelled to poke at, is whether there's any possibility 
that once the new UAPI is mature, it might eventually become preferable 
to move Job Manager support over to some subset of that rather than 
maintain two whole UAPIs in parallel (particularly at the Mesa end). My 
(limited) understanding is that all the BO-wrangling and MMU code is 
primarily different here for the sake of supporting new shiny UAPI 
features, not because of anything inherent to CSF itself (other than CSF 
being the thing which makes supporting said features feasible). If 
that's a preposterous idea and absolutely never ever going to be 
realistic, then fine, but if not, then it feels like the kind of thing 
that my all-too-great experience of technical debt and bad short-term

Re: [PATCH v2 05/15] drm/panthor: Add the GPU logical block

2023-08-21 Thread Robin Murphy

On 2023-08-14 11:54, Steven Price wrote:
[...]

+/**
+ * panthor_gpu_l2_power_on() - Power-on the L2-cache
+ * @ptdev: Device.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+int panthor_gpu_l2_power_on(struct panthor_device *ptdev)
+{
+   u64 core_mask = U64_MAX;
+
+   if (ptdev->gpu_info.l2_present != 1) {
+   /*
+* Only support one core group now.
+* ~(l2_present - 1) unsets all bits in l2_present except
+* the bottom bit. (l2_present - 2) has all the bits in
+* the first core group set. AND them together to generate
+* a mask of cores in the first core group.
+*/
+   core_mask = ~(ptdev->gpu_info.l2_present - 1) &
+(ptdev->gpu_info.l2_present - 2);
+   drm_info_once(&ptdev->base, "using only 1st core group (%lu cores 
from %lu)\n",
+ hweight64(core_mask),
+ hweight64(ptdev->gpu_info.shader_present));


I'm not sure what the point of this complexity is. This boils down to
the equivalent of:

if (ptdev->gpu_info.l2_present != 1)
core_mask = 1;


Hmm, that doesn't look right - the idiom here should be to set all bits 
of the output below the *second* set bit of the input, i.e. 0x11 -> 
0x0f. However since panthor is (somewhat ironically) unlikely to ever 
run on T628, and everything newer should pretend to have a single L2 
because software-managed coherency is a terrible idea, I would agree 
that ultimately it does all seem a bit pointless.
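
To spell the idiom out with a worked value (demonstration code only, 
not from the driver):

u64 l2_present = 0x11;	/* L2s behind bit 0 and bit 4 */
u64 core_mask = ~(l2_present - 1) & (l2_present - 2);

/*
 * ~(0x11 - 1) = ~0x10 = ...11101111
 *  (0x11 - 2) =  0x0f = ...00001111
 * ANDed together       =       0x0f
 *
 * i.e. every bit below the *second* set bit - the whole first core
 * group - whereas the simplified "core_mask = 1" would keep core 0 only.
 */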



If we were doing shader-core power management manually (like on pre-CSF
GPUs, rather than letting the firmware control it) then the computed
core_mask would be useful. So I guess it comes down to the
drm_info_once() output and counting the cores - which is nice to have
but it took me some time figuring out what was going on here.


As for the complexity, I'd suggest you can have some choice words with 
the guy who originally suggested that code[1] ;)


Cheers,
Robin.

[1] 
https://lore.kernel.org/dri-devel/b009b4c4-0396-58c2-7779-30c844f36...@arm.com/


Re: [PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()

2023-08-21 Thread Robin Murphy

On 2023-08-18 22:32, Jason Gunthorpe wrote:

It turns out several drivers are calling of_dma_configure() outside the
expected bus_type.dma_configure op. This ends up being mis-locked and
triggers a lockdep assertion, or instance:

   iommu_probe_device_locked+0xd4/0xe4
   of_iommu_configure+0x10c/0x200
   of_dma_configure_id+0x104/0x3b8
   a6xx_gmu_init+0x4c/0xccc [msm]
   a6xx_gpu_init+0x3ac/0x770 [msm]
   adreno_bind+0x174/0x2ac [msm]
   component_bind_all+0x118/0x24c
   msm_drm_bind+0x1e8/0x6c4 [msm]
   try_to_bring_up_aggregate_device+0x168/0x1d4
   __component_add+0xa8/0x170
   component_add+0x14/0x20
   dsi_dev_attach+0x20/0x2c [msm]
   dsi_host_attach+0x9c/0x144 [msm]
   devm_mipi_dsi_attach+0x34/0xb4
   lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc]
   lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc]
   i2c_device_probe+0x14c/0x290
   really_probe+0x148/0x2b4
   __driver_probe_device+0x78/0x12c
   driver_probe_device+0x3c/0x160
   __device_attach_driver+0xb8/0x138
   bus_for_each_drv+0x84/0xe0
   __device_attach+0xa8/0x1b0
   device_initial_probe+0x14/0x20
   bus_probe_device+0xb0/0xb4
   deferred_probe_work_func+0x8c/0xc8
   process_one_work+0x1ec/0x53c
   worker_thread+0x298/0x408
   kthread+0x124/0x128
   ret_from_fork+0x10/0x20

It is subtle and was never documented or enforced, but there has always
been an assumption that of_dma_configure_id() is not concurrent. It makes
several calls into the iommu layer that require this, including
dev_iommu_get(). The majority of cases have been preventing concurrency
using the device_lock().

Thus the new lock debugging added exposes an existing problem in
drivers. On inspection this looks like a theoretical locking problem as
generally the cases are already assuming they are the exclusive (single
threaded) user of the target device.


Sorry to be blunt, but the only problem is that you've introduced an 
idealistic new locking scheme which failed to take into account how 
things currently actually work, and is broken and achieving nothing but 
causing problems.


The solution is to drop those locking patches entirely and rethink the 
whole thing. When their sole purpose was to improve the locking and make 
it easier to reason about, and now the latest "fix" is now to remove one 
of the assertions which forms the fundamental basis for that reasoning, 
then the point has clearly been lost. All we've done is churned a dodgy 
and incomplete locking scheme into a *different* dodgy and incomplete 
locking scheme. I do not think that continuing to dig in deeper is the 
way out of the hole...


It's now rc7, and I have little confidence that there aren't still more latent 
problems which just haven't been hit yet (e.g. acpi_dma_configure() is 
also called in different contexts relative to the device lock, which is 
absolutely by design and not broken).


And on the subject of idealism, the fact is that doing IOMMU 
configuration based on driver probe via bus->dma_configure is 
*fundamentally wrong* and breaking a bunch of other IOMMU API 
assumptions, so it is not a robust foundation to build anything upon in 
the first place. The problem it causes with broken groups has been known 
about for several years now, however it's needed a lot of work to get to 
the point of being able to fix it properly (FWIW that is now #2 on my 
priority list after getting the bus ops stuff done, which should also 
make it easier).


Thanks,
Robin.


Sadly, there are deeper technical problems with all of the places doing
this. There are several problematic patterns:

1) Probe a driver on device A and then steal device B and use it as part
of the driver operation.

Since no driver was probed to device B it means we never called
bus_type.dma_configure and thus the drivers hackily try to open code
this.

Unfortunately nothing prevents another driver from binding to device B
and creating total chaos. eg vfio bind triggered by userspace

2) Probe a driver on device A and then create a new platform driver B for a
fwnode that doesn't have one, then do #1

This has the same essential problem as #1, the new device is never
probed so the hack call to of_dma_configure() is needed to setup DMA,
and we are at risk of something else trying to use the device.

3) Probe a driver on device A but the of_node was incorrect for DMA so fix
it by figuring out the right node and calling of_dma_configure()

This will blow up in the iommu code if the driver is unprobed because
the bus_type now assumes that dma_configure and dma_cleanup are
strictly paired. Since dma_configure will have done the wrong thing due
to the missing of_node, dma_cleanup will be unpaired and
iommu_device_unuse_default_domain() will blow up.

Further the driver operating on device A will not be protected against
changes to the iommu domain since it never called
iommu_device_use_default_domain()

At least this case will not throw a lockdep warning as 

Re: [PATCH v3] misc: sram: Add DMA-BUF Heap exporting of SRAM areas

2023-08-17 Thread Robin Murphy

On 2023-07-13 20:13, Andrew Davis wrote:

This new export type exposes the SRAM area to userspace as a DMA-BUF Heap;
this allows allocations of DMA-BUFs that can be consumed by various
DMA-BUF supporting devices.

Signed-off-by: Andrew Davis 
---

Changes from v2:
  - Make sram_dma_heap_allocate static (kernel test robot)
  - Rebase on v6.5-rc1

  drivers/misc/Kconfig |   7 +
  drivers/misc/Makefile|   1 +
  drivers/misc/sram-dma-heap.c | 245 +++
  drivers/misc/sram.c  |   6 +
  drivers/misc/sram.h  |  16 +++
  5 files changed, 275 insertions(+)
  create mode 100644 drivers/misc/sram-dma-heap.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 75e427f124b28..ee34dfb61605f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -448,6 +448,13 @@ config SRAM
  config SRAM_EXEC
bool
  
+config SRAM_DMA_HEAP

+   bool "Export on-chip SRAM pools using DMA-Heaps"
+   depends on DMABUF_HEAPS && SRAM
+   help
+ This driver allows the export of on-chip SRAM marked as both pool
+ and exportable to userspace using the DMA-Heaps interface.
+
  config DW_XDATA_PCIE
depends on PCI
tristate "Synopsys DesignWare xData PCIe driver"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index f2a4d1ff65d46..5e7516bfaa8de 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
  obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o
  obj-$(CONFIG_SRAM)+= sram.o
  obj-$(CONFIG_SRAM_EXEC)   += sram-exec.o
+obj-$(CONFIG_SRAM_DMA_HEAP)+= sram-dma-heap.o
  obj-$(CONFIG_GENWQE)  += genwqe/
  obj-$(CONFIG_ECHO)+= echo/
  obj-$(CONFIG_CXL_BASE)+= cxl/
diff --git a/drivers/misc/sram-dma-heap.c b/drivers/misc/sram-dma-heap.c
new file mode 100644
index 0..c054c04dff33e
--- /dev/null
+++ b/drivers/misc/sram-dma-heap.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SRAM DMA-Heap userspace exporter
+ *
+ * Copyright (C) 2019-2022 Texas Instruments Incorporated - https://www.ti.com/
+ * Andrew Davis 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sram.h"
+
+struct sram_dma_heap {
+   struct dma_heap *heap;
+   struct gen_pool *pool;
+};
+
+struct sram_dma_heap_buffer {
+   struct gen_pool *pool;
+   struct list_head attachments;
+   struct mutex attachments_lock;
+   unsigned long len;
+   void *vaddr;
+   phys_addr_t paddr;
+};
+
+struct dma_heap_attachment {
+   struct device *dev;
+   struct sg_table *table;
+   struct list_head list;
+};
+
+static int dma_heap_attach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attachment)
+{
+   struct sram_dma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a;
+   struct sg_table *table;
+
+   a = kzalloc(sizeof(*a), GFP_KERNEL);
+   if (!a)
+   return -ENOMEM;
+
+   table = kmalloc(sizeof(*table), GFP_KERNEL);
+   if (!table) {
+   kfree(a);
+   return -ENOMEM;
+   }
+   if (sg_alloc_table(table, 1, GFP_KERNEL)) {
+   kfree(table);
+   kfree(a);
+   return -ENOMEM;
+   }
+   sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(buffer->paddr)), 
buffer->len, 0);


What happens if someone (reasonably) assumes that this struct page 
pointer isn't completely made up, and dereferences it?


(That's if pfn_to_page() itself doesn't blow up, which it potentially 
might, at least under CONFIG_SPARSEMEM)


I think this needs to be treated as P2PDMA if it's going to have any 
hope of working robustly.
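
For illustration only, a guard along these lines (hypothetical, not part
of the patch) would at least make the failure mode explicit when the SRAM
physical address has no memmap entry behind it - though even a
pfn_valid() PFN still wouldn't make the resulting page safe to treat as
ordinary RAM:

	/* Hypothetical check at the top of dma_heap_attach(): SRAM is
	 * usually not covered by the kernel's memmap, so the struct page
	 * derived from its PFN may simply not exist. */
	if (!pfn_valid(PFN_DOWN(buffer->paddr))) {
		sg_free_table(table);
		kfree(table);
		kfree(a);
		return -EINVAL;
	}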



+
+   a->table = table;
+   a->dev = attachment->dev;
+   INIT_LIST_HEAD(&a->list);
+
+   attachment->priv = a;
+
+   mutex_lock(&buffer->attachments_lock);
+   list_add(&a->list, &buffer->attachments);
+   mutex_unlock(&buffer->attachments_lock);
+
+   return 0;
+}
+
+static void dma_heap_detatch(struct dma_buf *dmabuf,
+struct dma_buf_attachment *attachment)
+{
+   struct sram_dma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a = attachment->priv;
+
+   mutex_lock(&buffer->attachments_lock);
+   list_del(&a->list);
+   mutex_unlock(&buffer->attachments_lock);
+
+   sg_free_table(a->table);
+   kfree(a->table);
+   kfree(a);
+}
+
+static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment 
*attachment,
+enum dma_data_direction direction)
+{
+   struct dma_heap_attachment *a = attachment->priv;
+   struct sg_table *table = a->table;
+
+   /*
+* As this heap is backed by uncached SRAM memory we do not need to
+* perform any sync operations on the buffer before allowing device
+  

Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-11 Thread Robin Murphy

On 2023-08-11 17:56, Daniel Stone wrote:

Hi,

On 11/08/2023 17:35, Robin Murphy wrote:

On 2023-08-09 17:53, Boris Brezillon wrote:

+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor 
directory/Kconfig/Makefile reshuffle and a trivial bit of extra 
registration glue to build both drivers into a single module. It seems 
like it could be a perpetual source of confusion to end users where 
Mesa "panfrost" is the right option but kernel "panfrost" is the wrong 
one. Especially when pretty much every other GPU driver is also just 
one big top-level module to load for many different generations of 
hardware. Plus it would mean that if someone did want to have a go at 
deduplicating the resource-wrangling boilerplate for OPPs etc. in 
future, there's more chance of being able to do so meaningfully.


It might be nice to point it out, but to be fair Intel and AMD both have 
two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali.


Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to 
i915 what lima is to panfrost. It was more that unlike the others where 
there's a pretty clear line in the sand between "driver for old 
hardware" and "driver for the majority of recent hardware", this one 
happens to fall splat in the middle of the current major generation such 
that panfrost is the correct module for Mali Bifrost but also the wrong 
one for Mali Bifrost... :/


I can see the point, but otoh if someone's managed to build all the 
right regulator/clock/etc modules to get a working system, they'll 
probably manage to figure teh GPU side out?


Maybe; either way I guess it's not really my concern, since I'm the only 
user that *I* have to support, and I do already understand it. From the 
upstream perspective I mostly just want to hold on to the hope of not 
having to write my io-pgtable bugs twice over if at all possible :)


Cheers,
Robin.


Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-11 Thread Robin Murphy

On 2023-08-09 17:53, Boris Brezillon wrote:

Now that all blocks are available, we can add/update Kconfig/Makefile
files to allow compilation.

v2:
- Rename the driver (pancsf -> panthor)
- Change the license (GPL2 -> MIT + GPL2)
- Split the driver addition commit
- Add new dependencies on GPUVA and DRM_SCHED

Signed-off-by: Boris Brezillon 
---
  drivers/gpu/drm/Kconfig  |  2 ++
  drivers/gpu/drm/Makefile |  1 +
  drivers/gpu/drm/panthor/Kconfig  | 16 
  drivers/gpu/drm/panthor/Makefile | 15 +++
  4 files changed, 34 insertions(+)
  create mode 100644 drivers/gpu/drm/panthor/Kconfig
  create mode 100644 drivers/gpu/drm/panthor/Makefile

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2a44b9419d4d..bddfbdb2ffee 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -358,6 +358,8 @@ source "drivers/gpu/drm/lima/Kconfig"
  
  source "drivers/gpu/drm/panfrost/Kconfig"
  
+source "drivers/gpu/drm/panthor/Kconfig"

+
  source "drivers/gpu/drm/aspeed/Kconfig"
  
  source "drivers/gpu/drm/mcde/Kconfig"

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 215e78e79125..0a260727505f 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -188,6 +188,7 @@ obj-$(CONFIG_DRM_TVE200) += tve200/
  obj-$(CONFIG_DRM_XEN) += xen/
  obj-$(CONFIG_DRM_VBOXVIDEO) += vboxvideo/
  obj-$(CONFIG_DRM_LIMA)  += lima/
+obj-$(CONFIG_DRM_PANTHOR) += panthor/
  obj-$(CONFIG_DRM_PANFROST) += panfrost/
  obj-$(CONFIG_DRM_ASPEED_GFX) += aspeed/
  obj-$(CONFIG_DRM_MCDE) += mcde/
diff --git a/drivers/gpu/drm/panthor/Kconfig b/drivers/gpu/drm/panthor/Kconfig
new file mode 100644
index ..a9d17b1bbb75
--- /dev/null
+++ b/drivers/gpu/drm/panthor/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0 or MIT
+
+config DRM_PANTHOR
+   tristate "Panthor (DRM support for ARM Mali CSF-based GPUs)"
+   depends on DRM
+   depends on ARM || ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on MMU
+   select DRM_EXEC
+   select DRM_SCHED
+   select IOMMU_SUPPORT
+   select IOMMU_IO_PGTABLE_LPAE
+   select DRM_GEM_SHMEM_HELPER
+   select PM_DEVFREQ
+   select DEVFREQ_GOV_SIMPLE_ONDEMAND
+   help
+ DRM driver for ARM Mali CSF-based GPUs.
diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
new file mode 100644
index ..64193a484879
--- /dev/null
+++ b/drivers/gpu/drm/panthor/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0 or MIT
+
+panthor-y := \
+   panthor_devfreq.o \
+   panthor_device.o \
+   panthor_drv.o \
+   panthor_gem.o \
+   panthor_gpu.o \
+   panthor_heap.o \
+   panthor_fw.o \
+   panthor_mmu.o \
+   panthor_sched.o
+
+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor 
directory/Kconfig/Makefile reshuffle and a trivial bit of extra 
registration glue to build both drivers into a single module. It seems 
like it could be a perpetual source of confusion to end users where Mesa 
"panfrost" is the right option but kernel "panfrost" is the wrong one. 
Especially when pretty much every other GPU driver is also just one big 
top-level module to load for many different generations of hardware. 
Plus it would mean that if someone did want to have a go at 
deduplicating the resource-wrangling boilerplate for OPPs etc. in 
future, there's more chance of being able to do so meaningfully.
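
As a rough sketch of that "trivial bit of extra registration glue" (names
entirely hypothetical - neither driver exports such symbols today):

/* Hypothetical single-module init registering both Mali drivers */
static int __init mali_drm_init(void)
{
	int ret;

	ret = platform_driver_register(&panfrost_driver);
	if (ret)
		return ret;

	ret = platform_driver_register(&panthor_driver);
	if (ret)
		platform_driver_unregister(&panfrost_driver);

	return ret;
}
module_init(mali_drm_init);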


Cheers,
Robin.


Re: [PATCH] iommu: Explicitly include correct DT includes

2023-08-07 Thread Robin Murphy

On 14/07/2023 6:46 pm, Rob Herring wrote:

The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.


Thanks Rob; FWIW,

Acked-by: Robin Murphy 

I guess you're hoping for Joerg to pick this up? However I wouldn't 
foresee any major conflicts if you do need to take it through the OF tree.


Cheers,
Robin.


Signed-off-by: Rob Herring 
---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c| 1 -
  drivers/iommu/arm/arm-smmu/qcom_iommu.c  | 3 +--
  drivers/iommu/ipmmu-vmsa.c   | 1 -
  drivers/iommu/sprd-iommu.c   | 1 +
  drivers/iommu/tegra-smmu.c   | 2 +-
  drivers/iommu/virtio-iommu.c | 2 +-
  7 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
index b5b14108e086..bb89d49adf8d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
@@ -3,7 +3,7 @@
   * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
   */
  
-#include 

+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index a86acd76c1df..d6d1a2a55cc0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -29,7 +29,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c 
b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index a503ed758ec3..cc3f68a3516c 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -22,8 +22,7 @@
  #include 
  #include 
  #include 
-#include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9f64c5c9f5b9..0aeedd3e1494 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -17,7 +17,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 39e34fdeccda..51144c232474 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c

index 1cbf063ccf14..e445f80d0226 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -9,7 +9,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3551ed057774..17dcd826f5c2 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -13,7 +13,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 



Re: [PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-06-27 Thread Robin Murphy

On 27/06/2023 11:24 am, Greg Kroah-Hartman wrote:

On Tue, Jun 27, 2023 at 11:54:23AM +0200, Petr Tesarik wrote:

+/**
+ * is_swiotlb_active() - check if the software IO TLB is initialized
+ * @dev:   Device to check, or %NULL for the default IO TLB.
+ */
  bool is_swiotlb_active(struct device *dev)
  {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev
+   ? dev->dma_io_tlb_mem
+   : &io_tlb_default_mem;


That's impossible to read and maintain over time, sorry.

Please use real "if () else" lines, so that it can be maintained over
time.


Moreover, it makes for a horrible interface anyway. If there's a need 
for a non-specific "is SWIOTLB present at all?" check unrelated to any 
particular device (which arguably still smells of poking into 
implementation details...), please encapsulate it in its own distinct 
helper like, say, is_swiotlb_present(void).
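
i.e. something like this sketch of the suggested split, keeping the
per-device path free of the NULL special case (field names per the
current struct io_tlb_mem layout):

/* Sketch only - is_swiotlb_present() is not an existing API */
bool is_swiotlb_present(void)
{
	return io_tlb_default_mem.nslabs != 0;
}

bool is_swiotlb_active(struct device *dev)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->nslabs;
}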


However, the more I think about it, the more I doubt that logic like 
octeon_pci_setup() can continue to work properly at all if SWIOTLB 
allocation becomes dynamic... :/


Thanks,
Robin.



Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

2023-06-23 Thread Robin Murphy

On 2023-06-20 10:47, Sui Jingfeng wrote:

From: Sui Jingfeng 

Loongson CPUs maintain cache coherency by hardware, which means that the
data in the CPU cache is identical to the data in main system memory. As
for peripheral devices, most Loongson chips chose to define them as DMA
coherent by default, so device drivers do not need to maintain coherency
between a processor and an I/O device manually.

There are exceptions: on the LS2K1000 SoC, some peripheral devices can be
configured as DMA non-coherent, but no released firmware doing so exists
in the market. Peripherals of older LS2K1000 parts are also DMA
non-coherent, but those are nearly obsolete. So these are trivial cases.

Nevertheless, kernel space still needs to do the probe work, because the
Vivante GPU IP has been integrated into various platforms. Hence, this
patch adds runtime detection code to probe whether a specific GPU is DMA
coherent; if the answer is yes, we are going to utilize such features. On
the Loongson platform, when a buffer is accessed by both the GPU and the
CPU, the driver should prefer ETNA_BO_CACHED over ETNA_BO_WC.

This patch also adds a new parameter, etnaviv_param_gpu_coherent, which
allows userspace to know whether such a feature is available, because a
write-combined BO is still preferred in some cases, especially where CPU
reads are not needed, for example when uploading compiled shader binaries.

Cc: Lucas Stach 
Cc: Christian Gmeiner 
Cc: Philipp Zabel 
Cc: Bjorn Helgaas 
Cc: Daniel Vetter 
Signed-off-by: Sui Jingfeng 
---
  drivers/gpu/drm/etnaviv/etnaviv_drv.c   | 35 +
  drivers/gpu/drm/etnaviv/etnaviv_drv.h   |  6 
  drivers/gpu/drm/etnaviv/etnaviv_gem.c   | 22 ++---
  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c |  7 -
  drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  4 +++
  include/uapi/drm/etnaviv_drm.h  |  1 +
  6 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 0a365e96d371..d8e788aa16cb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -5,7 +5,9 @@
  
  #include 

  #include 
+#include <linux/dma-map-ops.h>


/*
 * This header is for implementations of dma_map_ops and related code.
 * It should not be included in drivers just using the DMA API.
 */


  #include 
+#include 
  #include 
  #include 
  
@@ -24,6 +26,34 @@

  #include "etnaviv_pci_drv.h"
  #include "etnaviv_perfmon.h"
  
+static struct device_node *etnaviv_of_first_available_node(void)

+{
+   struct device_node *core_node;
+
+   for_each_compatible_node(core_node, NULL, "vivante,gc") {
+   if (of_device_is_available(core_node))
+   return core_node;
+   }
+
+   return NULL;
+}
+
+static bool etnaviv_is_dma_coherent(struct device *dev)
+{
+   struct device_node *np;
+   bool coherent;
+
+   np = etnaviv_of_first_available_node();
+   if (np) {
+   coherent = of_dma_is_coherent(np);
+   of_node_put(np);
+   } else {
+   coherent = dev_is_dma_coherent(dev);
+   }


Please use device_get_dma_attr() like other well-behaved drivers.
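
For reference, a minimal sketch of that approach (assuming the GPU's own
firmware node carries the coherency attribute, which is rather the point):

#include <linux/property.h>

static bool etnaviv_is_dma_coherent(struct device *dev)
{
	/* device_get_dma_attr() covers both DT "dma-coherent" and ACPI _CCA */
	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
}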


+
+   return coherent;
+}
+
  /*
   * etnaviv private data construction and destructions:
   */
@@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device 
*drm)
return ERR_PTR(-ENOMEM);
}
  
+	priv->dma_coherent = etnaviv_is_dma_coherent(dev);

+
+   if (priv->dma_coherent)
+   drm_info(drm, "%s is dma coherent\n", dev_name(dev));


I'm pretty sure the end-user doesn't care.


+
return priv;
  }
  
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h

index 9cd72948cfad..644e5712c050 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
@@ -46,6 +46,12 @@ struct etnaviv_drm_private {
struct xarray active_contexts;
u32 next_context_id;
  
+	/*

+* If true, the GPU is capable of snooping cpu cache. Here, it
+* also means that cache coherency is enforced by the hardware.
+*/
+   bool dma_coherent;
+
/* list of GEM objects: */
struct mutex gem_lock;
struct list_head gem_list;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b5f73502e3dd..39bdc3774f2d 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
  static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
  {
struct page **pages;
+   pgprot_t prot;
  
	lockdep_assert_held(&obj->lock);
  
@@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)

if (IS_ERR(pages))
return NULL;
  
-	return vmap(pages, obj->base.size >> PAGE_SHIFT,

-   

Re: [PATCH v2 09/25] iommu/fsl_pamu: Implement an IDENTITY domain

2023-06-01 Thread Robin Murphy

On 2023-06-01 20:46, Jason Gunthorpe wrote:

On Thu, Jun 01, 2023 at 08:37:45PM +0100, Robin Murphy wrote:

On 2023-05-16 01:00, Jason Gunthorpe wrote:

Robin was able to check the documentation and what fsl_pamu has
historically called detach_dev() is really putting the IOMMU into an
IDENTITY mode.


Unfortunately it was the other way around - it's the call to
fsl_setup_liodns() from fsl_pamu_probe() which leaves everything in bypass
by default (the PAACE_ATM_NO_XLATE part, IIRC), whereas the detach_device()
call here ends up disabling the given device's LIODN altogether


Er, I see.. Let me think about it, you convinced me to change it from
PLATFORM, so maybe we should go back to that if it is all wonky.


FWIW I was thinking more along the lines of a token nominal identity 
domain where attach does nothing at all...



There doesn't appear to have ever been any code anywhere for putting
things *back* into bypass after using a VFIO domain, so as-is these
default domains would probably break all DMA :(


Sounds like it just never worked right.

ie going to VFIO mode was always a one way trip and you can't go back
to a kernel driver.


...on the assumption that doing so wouldn't really be any less broken 
than it always has been :)


Thanks,
Robin.


I don't think this patch makes it worse because we call the identity
attach_dev in all the same places we called detach_dev in the first
place.

We add an extra call at the start of time, but that call is NOP'd
by this:


if (domain == platform_domain || !domain)
+   return 0;
+


(bah, and the variable name needs updating too)

Honestly, I don't really want to fix FSL since it seems abandoned, so
either this patch or going back to PLATFORM seems like the best option.

Jason


Re: [PATCH v2 25/25] iommu: Convert remaining simple drivers to domain_alloc_paging()

2023-06-01 Thread Robin Murphy

On 2023-05-16 01:00, Jason Gunthorpe wrote:

These drivers don't support IOMMU_DOMAIN_DMA, so this commit effectively
allows them to support that mode.

The prior work to require default_domains makes this safe because every
one of these drivers is either compilation incompatible with dma-iommu.c,
or already establishing a default_domain. In both cases alloc_domain()
will never be called with IOMMU_DOMAIN_DMA for these drivers so it is safe
to drop the test.

Removing these tests clarifies that the domain allocation path is only
about the functionality of a paging domain and has nothing to do with
policy of how the paging domain is used for UNMANAGED/DMA/DMA_FQ.

Tested-by: Niklas Schnelle 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/fsl_pamu_domain.c | 7 ++-
  drivers/iommu/msm_iommu.c   | 7 ++-
  drivers/iommu/mtk_iommu_v1.c| 7 ++-
  drivers/iommu/omap-iommu.c  | 7 ++-
  drivers/iommu/s390-iommu.c  | 7 ++-
  5 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index ca4f5ebf028783..8d5d6a3acf9dfd 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -192,13 +192,10 @@ static void fsl_pamu_domain_free(struct iommu_domain 
*domain)
kmem_cache_free(fsl_pamu_domain_cache, dma_domain);
  }
  
-static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type)

+static struct iommu_domain *fsl_pamu_domain_alloc_paging(struct device *dev)


This isn't a paging domain - it doesn't support map/unmap, and AFAICT 
all it has ever been intended to do is "isolate" accesses to within an 
aperture which is never set to anything less than the entire physical 
address space :/


I hate to imagine what the VFIO userspace applications looked like...

Thanks,
Robin.


  {
struct fsl_dma_domain *dma_domain;
  
-	if (type != IOMMU_DOMAIN_UNMANAGED)

-   return NULL;
-
dma_domain = kmem_cache_zalloc(fsl_pamu_domain_cache, GFP_KERNEL);
if (!dma_domain)
return NULL;
@@ -476,7 +473,7 @@ static const struct iommu_ops fsl_pamu_ops = {
	.identity_domain = &fsl_pamu_identity_domain,
	.def_domain_type = &fsl_pamu_def_domain_type,
.capable= fsl_pamu_capable,
-   .domain_alloc   = fsl_pamu_domain_alloc,
+   .domain_alloc_paging = fsl_pamu_domain_alloc_paging,
.probe_device   = fsl_pamu_probe_device,
.device_group   = fsl_pamu_device_group,
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 26ed81cfeee897..a163cee0b7242d 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -302,13 +302,10 @@ static void __program_context(void __iomem *base, int ctx,
SET_M(base, ctx, 1);
  }
  
-static struct iommu_domain *msm_iommu_domain_alloc(unsigned type)

+static struct iommu_domain *msm_iommu_domain_alloc_paging(struct device *dev)
  {
struct msm_priv *priv;
  
-	if (type != IOMMU_DOMAIN_UNMANAGED)

-   return NULL;
-
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
goto fail_nomem;
@@ -691,7 +688,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
  
  static struct iommu_ops msm_iommu_ops = {

	.identity_domain = &msm_iommu_identity_domain,
-   .domain_alloc = msm_iommu_domain_alloc,
+   .domain_alloc_paging = msm_iommu_domain_alloc_paging,
.probe_device = msm_iommu_probe_device,
.device_group = generic_device_group,
.pgsize_bitmap = MSM_IOMMU_PGSIZES,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 7c0c1d50df5f75..67e044c1a7d93b 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -270,13 +270,10 @@ static int mtk_iommu_v1_domain_finalise(struct 
mtk_iommu_v1_data *data)
return 0;
  }
  
-static struct iommu_domain *mtk_iommu_v1_domain_alloc(unsigned type)

+static struct iommu_domain *mtk_iommu_v1_domain_alloc_paging(struct device 
*dev)
  {
struct mtk_iommu_v1_domain *dom;
  
-	if (type != IOMMU_DOMAIN_UNMANAGED)

-   return NULL;
-
dom = kzalloc(sizeof(*dom), GFP_KERNEL);
if (!dom)
return NULL;
@@ -585,7 +582,7 @@ static int mtk_iommu_v1_hw_init(const struct 
mtk_iommu_v1_data *data)
  
  static const struct iommu_ops mtk_iommu_v1_ops = {

	.identity_domain = &mtk_iommu_v1_identity_domain,
-   .domain_alloc   = mtk_iommu_v1_domain_alloc,
+   .domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
.probe_device   = mtk_iommu_v1_probe_device,
.probe_finalize = mtk_iommu_v1_probe_finalize,
.release_device = mtk_iommu_v1_release_device,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 34340ef15241bc..fcf99bd195b32e 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1580,13 

Re: [PATCH v2 09/25] iommu/fsl_pamu: Implement an IDENTITY domain

2023-06-01 Thread Robin Murphy

On 2023-05-16 01:00, Jason Gunthorpe wrote:

Robin was able to check the documentation and what fsl_pamu has
historically called detach_dev() is really putting the IOMMU into an
IDENTITY mode.


Unfortunately it was the other way around - it's the call to 
fsl_setup_liodns() from fsl_pamu_probe() which leaves everything in 
bypass by default (the PAACE_ATM_NO_XLATE part, IIRC), whereas the 
detach_device() call here ends up disabling the given device's LIODN 
altogether. There doesn't appear to have ever been any code anywhere for 
putting things *back* into bypass after using a VFIO domain, so as-is 
these default domains would probably break all DMA :(


Thanks,
Robin.


Move to the new core support for ARM_DMA_USE_IOMMU by defining
ops->identity_domain. This is a ppc driver without any dma_ops, so ensure
the identity translation is the default domain.

Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/fsl_pamu_domain.c | 32 +---
  1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index bce37229709965..ca4f5ebf028783 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -283,15 +283,21 @@ static int fsl_pamu_attach_device(struct iommu_domain 
*domain,
return ret;
  }
  
-static void fsl_pamu_set_platform_dma(struct device *dev)

+static int fsl_pamu_identity_attach(struct iommu_domain *platform_domain,
+   struct device *dev)
  {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
+   struct fsl_dma_domain *dma_domain;
const u32 *prop;
int len;
struct pci_dev *pdev = NULL;
struct pci_controller *pci_ctl;
  
+	if (domain == platform_domain || !domain)

+   return 0;
+
+   dma_domain = to_fsl_dma_domain(domain);
+
/*
 * Use LIODN of the PCI controller while detaching a
 * PCI device.
@@ -312,8 +318,18 @@ static void fsl_pamu_set_platform_dma(struct device *dev)
detach_device(dev, dma_domain);
else
pr_debug("missing fsl,liodn property at %pOF\n", dev->of_node);
+   return 0;
  }
  
+static struct iommu_domain_ops fsl_pamu_identity_ops = {

+   .attach_dev = fsl_pamu_identity_attach,
+};
+
+static struct iommu_domain fsl_pamu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &fsl_pamu_identity_ops,
+};
+
  /* Set the domain stash attribute */
  int fsl_pamu_configure_l1_stash(struct iommu_domain *domain, u32 cpu)
  {
@@ -447,12 +463,22 @@ static struct iommu_device *fsl_pamu_probe_device(struct 
device *dev)
	return &pamu_iommu;
  }
  
+static int fsl_pamu_def_domain_type(struct device *dev)

+{
+   /*
+* This platform does not use dma_ops at all so the normally the iommu
+* must be in identity mode
+*/
+   return IOMMU_DOMAIN_IDENTITY;
+}
+
  static const struct iommu_ops fsl_pamu_ops = {
+   .identity_domain = &fsl_pamu_identity_domain,
+   .def_domain_type = &fsl_pamu_def_domain_type,
.capable= fsl_pamu_capable,
.domain_alloc   = fsl_pamu_domain_alloc,
.probe_device   = fsl_pamu_probe_device,
.device_group   = fsl_pamu_device_group,
-   .set_platform_dma_ops = fsl_pamu_set_platform_dma,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = fsl_pamu_attach_device,
.iova_to_phys   = fsl_pamu_iova_to_phys,


Re: [PATCH v2 23/25] iommu: Add ops->domain_alloc_paging()

2023-06-01 Thread Robin Murphy

On 2023-05-16 01:00, Jason Gunthorpe wrote:

This callback requests the driver to create only a __IOMMU_DOMAIN_PAGING
domain, so it saves a few lines in a lot of drivers needlessly checking
the type.

More critically, this allows us to sweep out all the
IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA checks from a lot of the
drivers, simplifying what is going on in the code and ultimately removing
the now-unused special cases in drivers where they did not support
IOMMU_DOMAIN_DMA.

domain_alloc_paging() should return a struct iommu_domain that is
functionally compatible with ARM_DMA_USE_IOMMU, dma-iommu.c and iommufd.

Be forwards looking and pass in a 'struct device *' argument. We can
provide this when allocating the default_domain. No drivers will look at
this.


As mentioned before, we already know we're going to need additional 
flags (and possibly data) to cover the existing set_pgtable_quirks 
use-case plus new stuff like the proposed dirty-tracking enable, so I'd 
be inclined to either add an extensible structure argument now to avoid 
future churn, or just not bother adding the device argument either until 
drivers can actually use it.



Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 18 +++---
  include/linux/iommu.h |  3 +++
  2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c4cac1dcf80610..15aa51c356bd74 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1995,14 +1995,25 @@ void iommu_set_fault_handler(struct iommu_domain 
*domain,
  EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
  
  static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,

+struct device *dev,
 unsigned int type)
  {
struct iommu_domain *domain;
  
  	if (type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain)

return ops->identity_domain;
+   else if ((type == IOMMU_DOMAIN_UNMANAGED || type == IOMMU_DOMAIN_DMA) &&
+ops->domain_alloc_paging) {
+   /*
+* For now exclude DMA_FQ since it is still a driver policy
+* decision through domain_alloc() if we can use FQ mode.
+*/


That's sorted now, so the type test can neatly collapse down to "type & 
__IOMMU_DOMAIN_PAGING".
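
i.e. a sketch of the collapsed branch:

	if (type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain)
		return ops->identity_domain;
	else if ((type & __IOMMU_DOMAIN_PAGING) && ops->domain_alloc_paging)
		domain = ops->domain_alloc_paging(dev);
	else if (ops->domain_alloc)
		domain = ops->domain_alloc(type);
	else
		return NULL;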


Thanks,
Robin.


+   domain = ops->domain_alloc_paging(dev);
+   } else if (ops->domain_alloc)
+   domain = ops->domain_alloc(type);
+   else
+   return NULL;
  
-	domain = ops->domain_alloc(type);

if (!domain)
return NULL;
  
@@ -2033,14 +2044,15 @@ __iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
  
	lockdep_assert_held(&group->mutex);
  
-	return __iommu_domain_alloc(dev_iommu_ops(dev), type);

+   return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type);
  }
  
  struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)

  {
if (bus == NULL || bus->iommu_ops == NULL)
return NULL;
-   return __iommu_domain_alloc(bus->iommu_ops, IOMMU_DOMAIN_UNMANAGED);
+   return __iommu_domain_alloc(bus->iommu_ops, NULL,
+   IOMMU_DOMAIN_UNMANAGED);
  }
  EXPORT_SYMBOL_GPL(iommu_domain_alloc);
  
diff --git a/include/linux/iommu.h b/include/linux/iommu.h

index 387746f8273c99..18b0df42cc80d1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
   * struct iommu_ops - iommu ops and capabilities
   * @capable: check capability
   * @domain_alloc: allocate iommu domain
+ * @domain_alloc_paging: Allocate an iommu_domain that can be used for
+ *   UNMANAGED, DMA, and DMA_FQ domain types.
   * @probe_device: Add device to iommu driver handling
   * @release_device: Remove device from iommu driver handling
   * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -258,6 +260,7 @@ struct iommu_ops {
  
  	/* Domain allocation and freeing by the iommu driver */

struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+   struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
  
  	struct iommu_device *(*probe_device)(struct device *dev);

void (*release_device)(struct device *dev);


Re: [PATCH v2 08/25] iommu: Allow an IDENTITY domain as the default_domain in ARM32

2023-06-01 Thread Robin Murphy

On 2023-05-16 01:00, Jason Gunthorpe wrote:

Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the
same stuff, the way they relate to the IOMMU core is quite different.

dma-iommu.c expects the core code to setup an UNMANAGED domain (of type
IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This
becomes the default_domain for the group.

ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly
allocates an UNMANAGED domain and operates it just like an external
driver. In this case group->default_domain is NULL.

If the driver provides a global static identity_domain then automatically
use it as the default_domain when in ARM_DMA_USE_IOMMU mode.

This allows drivers that implemented default_domain == NULL as an IDENTITY
translation to trivially get a properly labeled non-NULL default_domain on
ARM32 configs.

With this arrangement, when ARM_DMA_USE_IOMMU wants to disconnect from the
device the normal detach_domain flow will restore the IDENTITY domain as
the default domain. Overall this makes attach_dev() of the IDENTITY domain
called in the same places as detach_dev().

This effectively migrates these drivers to default_domain mode. For
drivers that support ARM64 they will gain support for the IDENTITY
translation mode for the dma_api and behave in a uniform way.

Drivers use this by setting ops->identity_domain to a static singleton
iommu_domain that implements the identity attach. If the core detects
ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain
during probe.

Drivers can continue to prevent the use of DMA translation by returning
IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent
IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU.

This allows removing the set_platform_dma_ops() from every remaining
driver.

Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does
is set an existing global static identity domain. mkt_v1 does not support
IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation
is safe.

Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c  | 40 +-
  drivers/iommu/mtk_iommu_v1.c   | 12 --
  drivers/iommu/rockchip-iommu.c | 10 -
  3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8ba90571449cec..bed7cb6e5ee65b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1757,18 +1757,48 @@ static int iommu_get_default_domain_type(struct 
iommu_group *group,
int type;
  
	lockdep_assert_held(&group->mutex);

+
+   /*
+* ARM32 drivers supporting CONFIG_ARM_DMA_USE_IOMMU can declare an
+* identity_domain and it will automatically become their default
+* domain. Later on ARM_DMA_USE_IOMMU will install its UNMANAGED domain.
+* Override the selection to IDENTITY if we are sure the driver supports
+* it.
+*/
+   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) && ops->identity_domain) {


If I cared about arm-smmu on 32-bit, I'd bring that up again, but 
honestly I'm not sure that I do... I think it might end up working after 
patch #21, and it's currently still broken for lack of .set_platform_dma 
anyway, so meh.



+   type = IOMMU_DOMAIN_IDENTITY;
+   if (best_type && type && best_type != type)
+   goto err;
+   best_type = target_type = IOMMU_DOMAIN_IDENTITY;
+   }
+
for_each_group_device(group, gdev) {
type = best_type;
if (ops->def_domain_type) {
type = ops->def_domain_type(gdev->dev);
-   if (best_type && type && best_type != type)
+   if (best_type && type && best_type != type) {
+   /* Stick with the last driver override we saw */
+   best_type = type;
goto err;
+   }
}
  
  		if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted) {

-   type = IOMMU_DOMAIN_DMA;
-   if (best_type && type && best_type != type)
-   goto err;
+   /*
+* We don't have any way for the iommu core code to
+* force arm_iommu to activate so we can't enforce
+* trusted. Log it and keep going with the IDENTITY
+* default domain.
+*/
+   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) {
+   dev_warn(
+   gdev->dev,
+   "PCI device is untrusted but ARM32 does not 
support secure IOMMU operation, continuing anyway.\n");


To within experimental error, this is dead code. The ARM DMA ops don't 

Re: [PATCH v2 04/25] iommu: Add IOMMU_DOMAIN_PLATFORM for S390

2023-06-01 Thread Robin Murphy

On 2023-05-16 01:00, Jason Gunthorpe wrote:

The PLATFORM domain will be set as the default domain and attached as
normal during probe. The driver will ignore the initial attach from a NULL
domain to the PLATFORM domain.

After this, the PLATFORM domain's attach_dev will be called whenever we
detach from an UNMANAGED domain (eg for VFIO). This is the same time the
original design would have called op->detach_dev().

This is temporary until the S390 dma-iommu.c conversion is merged.


If we do need a stopgap here, can we please just call the current 
situation an identity domain? It's true enough in the sense that the 
IOMMU API is not offering any translation or guarantee of isolation, so 
the semantics of an identity domain - from the point of view of anything 
inside the IOMMU API that would be looking - are no weaker or less 
useful than a "platform" domain whose semantics are intentionally unknown.
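
Concretely, the same stopgap relabelled would just be (sketch, reusing
the identical attach op):

static struct iommu_domain s390_iommu_identity_domain = {
	.type = IOMMU_DOMAIN_IDENTITY,
	.ops = &s390_iommu_platform_ops,
};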


Then similarly for patch #3 - since we already know s390 is temporary, 
it seems an anathema to introduce a whole domain type with its own weird 
ops->default_domain mechanism solely for POWER to not actually use 
domains with.


In terms of reasoning, I don't see that IOMMU_DOMAIN_PLATFORM is any 
more useful than a NULL default domain, it just renames the problem, and 
gives us more code to maintain for the privilege. As I say, though, we 
don't actually need to juggle the semantic of a "we don't know what's 
happening here" domain around any further, since it works out that a 
"we're not influencing anything here" domain actually suffices for what 
we want to reason about, and those are already well-defined. Sure, the 
platform DMA ops *might* be doing more, but that's beyond the scope of 
the IOMMU API either way. At that point, lo and behold, s390 and POWER 
now look just like ARM and the core code only needs a single special 
case for arch-specific default identity domains, lovely!


Thanks,
Robin.


Tested-by: Heiko Stuebner 
Tested-by: Niklas Schnelle 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/s390-iommu.c | 21 +++--
  1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fbf59a8db29b11..f0c867c57a5b9b 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -142,14 +142,31 @@ static int s390_iommu_attach_device(struct iommu_domain 
*domain,
return 0;
  }
  
-static void s390_iommu_set_platform_dma(struct device *dev)

+/*
+ * Switch control over the IOMMU to S390's internal dma_api ops
+ */
+static int s390_iommu_platform_attach(struct iommu_domain *platform_domain,
+ struct device *dev)
  {
struct zpci_dev *zdev = to_zpci_dev(dev);
  
+	if (!zdev->s390_domain)

+   return 0;
+
__s390_iommu_detach_device(zdev);
zpci_dma_init_device(zdev);
+   return 0;
  }
  
+static struct iommu_domain_ops s390_iommu_platform_ops = {

+   .attach_dev = s390_iommu_platform_attach,
+};
+
+static struct iommu_domain s390_iommu_platform_domain = {
+   .type = IOMMU_DOMAIN_PLATFORM,
+   .ops = &s390_iommu_platform_ops,
+};
+
  static void s390_iommu_get_resv_regions(struct device *dev,
struct list_head *list)
  {
@@ -428,12 +445,12 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
  }
  
  static const struct iommu_ops s390_iommu_ops = {

+   .default_domain = &s390_iommu_platform_domain,
.capable = s390_iommu_capable,
.domain_alloc = s390_domain_alloc,
.probe_device = s390_iommu_probe_device,
.release_device = s390_iommu_release_device,
.device_group = generic_device_group,
-   .set_platform_dma_ops = s390_iommu_set_platform_dma,
.pgsize_bitmap = SZ_4K,
.get_resv_regions = s390_iommu_get_resv_regions,
.default_domain_ops = &(const struct iommu_domain_ops) {


Re: [PATCH 4/7] drm/apu: Add support of IOMMU

2023-05-18 Thread Robin Murphy

On 2023-05-17 15:52, Alexandre Bailon wrote:

Some APU devices are behind an IOMMU.
For some of these devices, we can't use the DMA API because
they use static addresses, so we have to use the IOMMU API
directly to map the buffers correctly.


Except you still need to use the DMA API for the sake of cache coherency and 
any other aspects :(



This adds support of IOMMU.

Signed-off-by: Alexandre Bailon 
Reviewed-by: Julien Stephan 
---
  drivers/gpu/drm/apu/apu_drv.c  |   4 +
  drivers/gpu/drm/apu/apu_gem.c  | 174 +
  drivers/gpu/drm/apu/apu_internal.h |  16 +++
  drivers/gpu/drm/apu/apu_sched.c|  28 +
  include/uapi/drm/apu_drm.h |  12 +-
  5 files changed, 233 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c
index b6bd340b2bc8..a0dce785a02a 100644
--- a/drivers/gpu/drm/apu/apu_drv.c
+++ b/drivers/gpu/drm/apu/apu_drv.c
@@ -23,6 +23,10 @@ static const struct drm_ioctl_desc ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
+ DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
+ DRM_RENDER_ALLOW),
  };
  
  DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops);

diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
index 0e7b3b27942c..0a91363754c5 100644
--- a/drivers/gpu/drm/apu/apu_gem.c
+++ b/drivers/gpu/drm/apu/apu_gem.c
@@ -2,6 +2,9 @@
  //
  // Copyright 2020 BayLibre SAS
  
+#include 

+#include 
+
  #include 
  
  #include 

@@ -42,6 +45,7 @@ int ioctl_gem_new(struct drm_device *dev, void *data,
 */
apu_obj->size = args->size;
apu_obj->offset = 0;
+   apu_obj->iommu_refcount = 0;
+   mutex_init(&apu_obj->mutex);
  
	ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);

@@ -54,3 +58,173 @@ int ioctl_gem_new(struct drm_device *dev, void *data,
  
  	return 0;

  }
+
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
+{
+   int iova_pfn;
+   int i;
+
+   if (!obj->iommu_sgt)
+   return;
+
+   mutex_lock(&obj->mutex);
+   obj->iommu_refcount--;
+   if (obj->iommu_refcount) {
+   mutex_unlock(&obj->mutex);
+   return;
+   }
+
+   iova_pfn = PHYS_PFN(obj->iova);


Using mm layer operations on IOVAs looks wrong. In practice I don't 
think it's ultimately harmful, other than potentially making less 
efficient use of IOVA space if the CPU page size is larger than the 
IOMMU page size, but it's still a bad code smell when you're using an 
IOVA abstraction that is deliberately decoupled from CPU pages.



+   for (i = 0; i < obj->iommu_sgt->nents; i++) {
+   iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
+   PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+   iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));


You can unmap a set of IOVA-contiguous mappings as a single range with 
one call.
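
i.e. roughly the following sketch, assuming the segments really were
mapped back-to-back from obj->iova and total_len is the sum of the
page-aligned segment lengths:

	/* One call covers the whole IOVA-contiguous region */
	size_t unmapped = iommu_unmap(apu_drm->domain, obj->iova, total_len);

	WARN_ON(unmapped != total_len);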



+   }
+
+   sg_free_table(obj->iommu_sgt);
+   kfree(obj->iommu_sgt);
+
+   free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
+   mutex_unlock(&obj->mutex);
+}
+
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
+{
+   if (obj->funcs)
+   return obj->funcs->get_sg_table(obj);
+   return NULL;
+}
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
+{
+   struct apu_gem_object *apu_obj = to_apu_bo(obj);
+   struct scatterlist *sgl;
+   phys_addr_t phys;
+   int total_buf_space;
+   int iova_pfn;
+   int iova;
+   int ret;
+   int i;
+
+   mutex_lock(&apu_obj->mutex);
+   apu_obj->iommu_refcount++;
+   if (apu_obj->iommu_refcount != 1) {
+   mutex_unlock(&apu_obj->mutex);
+   return 0;
+   }
+
+   apu_obj->iommu_sgt = apu_get_sg_table(obj);
+   if (IS_ERR(apu_obj->iommu_sgt)) {
+   mutex_unlock(&apu_obj->mutex);
+   return PTR_ERR(apu_obj->iommu_sgt);
+   }
+
+   total_buf_space = obj->size;
+   iova_pfn = alloc_iova_fast(&apu_drm->iovad,
+  total_buf_space >> PAGE_SHIFT,
+  apu_drm->iova_limit_pfn, true);


If you need things mapped at specific addresses like the commit message 
claims, the DMA IOVA allocator is a terrible tool for the job. DRM 
already has its own more flexible abstraction for address space 
management in the form of drm_mm, so as a DRM driver it would seem a lot 
more sensible to use one of those.


And even if you could justify using this allocator, I can't imagine 
there's any way you'd need the _fast version (further illustrated by the 
fact that you're freeing the IOVAs wrongly for that).
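
For comparison, the drm_mm version would look roughly like this sketch
(the apu_drm->mm / apu_obj->mm_node fields and the drm_mm_init() range
are hypothetical):

	struct drm_mm_node *node = &apu_obj->mm_node;
	int ret;

	/* Finds and reserves a hole of the requested size */
	ret = drm_mm_insert_node(&apu_drm->mm, node, obj->size);
	if (ret)
		return ret;
	/* node->start is now the base IOVA to hand to iommu_map() */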



+   apu_obj->iova = 

[PATCH v2] drm/mediatek: Stop using iommu_present()

2023-05-10 Thread Robin Murphy
Remove the pointless check. If an IOMMU is providing transparent DMA API
ops for any device(s) we care about, the DT code will have enforced the
appropriate probe ordering already.

Signed-off-by: Robin Murphy 
---

v2: Rebase to 6.4-rc1

 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 6dcb4ba2466c..3e677eb0dc70 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -582,9 +581,6 @@ static int mtk_drm_bind(struct device *dev)
struct drm_device *drm;
int ret, i;
 
-   if (!iommu_present(&platform_bus_type))
-   return -EPROBE_DEFER;
-
pdev = of_find_device_by_node(private->mutex_node);
if (!pdev) {
dev_err(dev, "Waiting for disp-mutex device %pOF\n",
-- 
2.39.2.101.g768bb238c484.dirty



Re: [PATCH 1/3] iommu/dma: Clean up Kconfig

2023-05-05 Thread Robin Murphy

On 2023-05-05 15:50, Jason Gunthorpe wrote:

On Tue, Aug 16, 2022 at 06:28:03PM +0100, Robin Murphy wrote:

Although iommu-dma is a per-architecture choice, that is currently
implemented in a rather haphazard way. Selecting from the arch Kconfig
was the original logical approach, but is complicated by having to
manage dependencies; conversely, selecting from drivers ends up hiding
the architecture dependency *too* well. Instead, let's just have it
enable itself automatically when IOMMU API support is enabled for the
relevant architectures. It can't get much clearer than that.

Signed-off-by: Robin Murphy 
---
  arch/arm64/Kconfig  | 1 -
  drivers/iommu/Kconfig   | 3 +--
  drivers/iommu/amd/Kconfig   | 1 -
  drivers/iommu/intel/Kconfig | 1 -
  4 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..59af600445c2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,7 +209,6 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
-   select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 5c5cb5bee8b6..1d99c2d984fb 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -137,7 +137,7 @@ config OF_IOMMU
  
  # IOMMU-agnostic DMA-mapping layer

  config IOMMU_DMA
-   bool
+   def_bool ARM64 || IA64 || X86


Robin, do you remember why you added IA64 here? What is the Itanium
IOMMU driver?


config INTEL_IOMMU
bool "Support for Intel IOMMU using DMA Remapping Devices"
depends on PCI_MSI && ACPI && (X86 || IA64)

Yes, really :)

Robin.


Re: [PATCH v2 5/5] fbdev: Define framebuffer I/O from Linux' I/O functions

2023-04-28 Thread Robin Murphy

On 2023-04-28 10:27, Thomas Zimmermann wrote:

Implement framebuffer I/O helpers, such as fb_read*() and fb_write*()
with Linux' regular I/O functions. Remove all ifdef cases for the
various architectures.

Most of the supported architectures use __raw_*() I/O functions or treat
framebuffer memory like regular memory. This is also implemented by the
architectures' I/O functions, so we can use them instead.

Sparc uses SBus to connect to framebuffer devices. It provides respective
implementations of the framebuffer I/O helpers. The involved sbus_*()
I/O helpers map to the same code as Sparc's regular I/O functions. As
with other platforms, we can use those instead.

We leave a TODO item to replace all fb_*() functions with their regular
I/O counterparts throughout the fbdev drivers.

Signed-off-by: Thomas Zimmermann 
---
  include/linux/fb.h | 63 +++---
  1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/include/linux/fb.h b/include/linux/fb.h
index 08cb47da71f8..4aa9e90edd17 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -15,7 +15,6 @@
  #include 
  #include 
  #include 
-#include 
  
  struct vm_area_struct;

  struct fb_info;
@@ -511,58 +510,26 @@ struct fb_info {
   */
  #define STUPID_ACCELF_TEXT_SHIT
  
-// This will go away

-#if defined(__sparc__)
-
-/* We map all of our framebuffers such that big-endian accesses
- * are what we want, so the following is sufficient.
+/*
+ * TODO: Update fbdev drivers to call the I/O helpers directly and
+ *   remove the fb_*() tokens.
   */
-
-// This will go away
-#define fb_readb sbus_readb
-#define fb_readw sbus_readw
-#define fb_readl sbus_readl
-#define fb_readq sbus_readq
-#define fb_writeb sbus_writeb
-#define fb_writew sbus_writew
-#define fb_writel sbus_writel
-#define fb_writeq sbus_writeq
-#define fb_memset sbus_memset_io
-#define fb_memcpy_fromfb sbus_memcpy_fromio
-#define fb_memcpy_tofb sbus_memcpy_toio
-
-#elif defined(__i386__) || defined(__alpha__) || defined(__x86_64__) ||
\
-   defined(__hppa__) || defined(__sh__) || defined(__powerpc__) || \
-   defined(__arm__) || defined(__aarch64__) || defined(__mips__)
-
-#define fb_readb __raw_readb
-#define fb_readw __raw_readw
-#define fb_readl __raw_readl
-#define fb_readq __raw_readq
-#define fb_writeb __raw_writeb
-#define fb_writew __raw_writew
-#define fb_writel __raw_writel
-#define fb_writeq __raw_writeq


Note that on at least some architectures, the __raw variants are 
native-endian, whereas the regular accessors are explicitly 
little-endian, so there is a slight risk of inadvertently changing 
behaviour on big-endian systems (MIPS most likely, but a few old ARM 
platforms run BE as well).



+#define fb_readb readb
+#define fb_readw readw
+#define fb_readl readl
+#if defined(CONFIG_64BIT)
+#define fb_readq readq
+#endif


You probably don't need to bother making these conditional - 32-bit 
architectures aren't forbidden from providing readq/writeq if they 
really want to, and drivers can also use the io-64-nonatomic headers for 
portability. The build will still fail in a sufficiently obvious manner 
if neither is true.
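
e.g. a driver wanting 64-bit accessors on 32-bit builds can simply do:

/* Falls back to two 32-bit accesses (low word first) where the
 * architecture provides no native readq/writeq */
#include <linux/io-64-nonatomic-lo-hi.h>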


Thanks,
Robin.


+#define fb_writeb writeb
+#define fb_writew writew
+#define fb_writel writel
+#if defined(CONFIG_64BIT)
+#define fb_writeq writeq
+#endif
  #define fb_memset memset_io
  #define fb_memcpy_fromfb memcpy_fromio
  #define fb_memcpy_tofb memcpy_toio
  
-#else

-
-#define fb_readb(addr) (*(volatile u8 *) (addr))
-#define fb_readw(addr) (*(volatile u16 *) (addr))
-#define fb_readl(addr) (*(volatile u32 *) (addr))
-#define fb_readq(addr) (*(volatile u64 *) (addr))
-#define fb_writeb(b,addr) (*(volatile u8 *) (addr) = (b))
-#define fb_writew(b,addr) (*(volatile u16 *) (addr) = (b))
-#define fb_writel(b,addr) (*(volatile u32 *) (addr) = (b))
-#define fb_writeq(b,addr) (*(volatile u64 *) (addr) = (b))
-#define fb_memset memset
-#define fb_memcpy_fromfb memcpy
-#define fb_memcpy_tofb memcpy
-
-#endif
-
  #define FB_LEFT_POS(p, bpp)  (fb_be_math(p) ? (32 - (bpp)) : 0)
  #define FB_SHIFT_HIGH(p, val, bits)  (fb_be_math(p) ? (val) >> (bits) : \
  (val) << (bits))





Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-03-31 Thread Robin Murphy

On 31/03/2023 3:00 pm, Arnd Bergmann wrote:

On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:

On 2023-03-27 13:13, Arnd Bergmann wrote:


[ HELP NEEDED: can anyone confirm that it is a correct assumption
on arm that a cache-coherent device writing to a page always results
in it being in a PG_dcache_clean state like on ia64, or can a device
write directly into the dcache?]


In AMBA at least, if a snooping write hits in a cache then the data is
most likely going to get routed directly into that cache. If it has
write-back write-allocate attributes it could also land in any cache
along its normal path to RAM; it wouldn't have to go all the way.

Hence all the fun we have where treating a coherent device as
non-coherent can still be almost as broken as the other way round :)


Ok, thanks for the information. I'm still not sure whether this can
result in the situation where PG_dcache_clean is wrong though.

Specifically, the question is whether a DMA to a coherent buffer
can end up in a dirty L1 dcache of one core and require to write
back the dcache before invalidating the icache for that page.

On ia64, this is not the case, the optimization here is to
only flush the icache after a coherent DMA into an executable
user page, while Arm only does this for noncoherent DMA but not
coherent DMA.

From your explanation it sounds like this might happen,
even though that would mean that "coherent" DMA is slightly
less coherent than it is elsewhere.

To be on the safe side, I'd have to pass a flag into
arch_dma_mark_clean() about coherency, to let the arm
implementation still require the extra dcache flush
for coherent DMA, while ia64 can ignore that flag.


Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA 
write should be pretty much equivalent to a coherent write by another 
CPU (or indeed the local CPU itself) - nothing says that it *couldn't* 
dirty a line in a data cache above the level of unification, so in 
general the assumption must be that, yes, if coherent DMA is writing 
data intended to be executable, then it's going to want a Dcache clean 
to PoU and an Icache invalidate to PoU before trying to execute it. By 
comparison, a non-coherent DMA transfer will inherently have to 
invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot 
leave dirty data above the PoU, so only the Icache maintenance is 
required in the executable case.
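
In kernel terms, that pair of operations is what flush_icache_range()
performs; a hedged sketch of where it would be needed (the helper is real,
the wrapper below is purely illustrative):

/*
 * buf has just been written by coherent DMA and is about to be
 * executed: clean the D-cache to PoU and invalidate the I-cache to
 * PoU over the buffer before jumping to it.
 */
static void sync_exec_buffer(void *buf, size_t len)
{
	flush_icache_range((unsigned long)buf, (unsigned long)buf + len);
}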


(FWIW I believe the Armv8 IDC/DIC features can safely be considered 
irrelevant to 32-bit kernels)


I don't know a great deal about IA-64, but it appears to be using its 
PG_arch_1 flag in a subtly different manner to Arm, namely to optimise 
out the *Icache* maintenance. So if anything, it seems IA-64 is the 
weirdo here (who'd have guessed?) where DMA manages to be *more* 
coherent than the CPUs themselves :)


This is all now making me think we need some careful consideration of 
whether the benefits of consolidating code outweigh the confusion of 
conflating multiple different meanings of "clean" together...


Thanks,
Robin.


Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-03-27 Thread Robin Murphy

On 2023-03-27 13:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

The arm version of the arch_sync_dma_for_cpu() function annotates pages as
PG_dcache_clean after a DMA, but no other architecture does this here. On
ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense
to use the same hook in order to have identical arch_sync_dma_for_cpu()
semantics as all other architectures.

Splitting this out has multiple effects:

  - for dma-direct, this now gets called after arch_sync_dma_for_cpu()
for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
it would not be harmful to keep doing it for bidirectional mappings,
those are apparently not used in any callers that care about the flag.

  - Since arm has its own dma-iommu abstraction, this now also needs to
call the same function, so the calls are added there to mirror the
dma-direct version.

  - Like dma-direct, the dma-iommu version now marks the dcache clean
for both coherent and noncoherent devices after a DMA, but it only
does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.

[ HELP NEEDED: can anyone confirm that it is a correct assumption
   on arm that a cache-coherent device writing to a page always results
   in it being in a PG_dcache_clean state like on ia64, or can a device
   write directly into the dcache?]


In AMBA at least, if a snooping write hits in a cache then the data is 
most likely going to get routed directly into that cache. If it has 
write-back write-allocate attributes it could also land in any cache 
along its normal path to RAM; it wouldn't have to go all the way.


Hence all the fun we have where treating a coherent device as 
non-coherent can still be almost as broken as the other way round :)


Cheers,
Robin.


Signed-off-by: Arnd Bergmann 
---
  arch/arm/Kconfig  |  1 +
  arch/arm/mm/dma-mapping.c | 71 +++
  2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..125d58c54ab1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL if MMU
+   select ARCH_HAS_DMA_MARK_CLEAN if MMU
select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc702cb27ae7..b703cb83d27e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
} while (left);
  }
  
+/*

+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+   unsigned long pfn = PFN_UP(paddr);
+   unsigned long off = paddr & (PAGE_SIZE - 1);
+   size_t left = size;
+
+   if (size < PAGE_SIZE)
+   return;
+
+   if (off)
+   left -= PAGE_SIZE - off;
+
+   while (left >= PAGE_SIZE) {
+   struct page *page = pfn_to_page(pfn++);
+   set_bit(PG_dcache_clean, &page->flags);
+   left -= PAGE_SIZE;
+   }
+}
+
  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
  {
if (IS_ENABLED(CONFIG_CPU_V6) ||
@@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
outer_inv_range(paddr, paddr + size);
dma_cache_maint(paddr, size, dmac_inv_range);
}
-
-   /*
-* Mark the D-cache clean for these pages to avoid extra flushing.
-*/
-   if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-   unsigned long pfn = PFN_UP(paddr);
-   unsigned long off = paddr & (PAGE_SIZE - 1);
-   size_t left = size;
-
-   if (off)
-   left -= PAGE_SIZE - off;
-
-   while (left >= PAGE_SIZE) {
-   struct page *page = pfn_to_page(pfn++);
-   set_bit(PG_dcache_clean, &page->flags);
-   left -= PAGE_SIZE;
-   }
-   }
  }
  
  #ifdef CONFIG_ARM_DMA_USE_IOMMU

@@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct 
scatterlist *sg,
return -EINVAL;
  }
  
+static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,

+  enum dma_data_direction dir,
+  bool dma_coherent)
+{
+   if (!dma_coherent)
+   arch_sync_dma_for_cpu(phys, len, dir);
+
+   if (dir == DMA_FROM_DEVICE)
+   arch_dma_mark_clean(phys, len);
+}
+
  /**
   * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
   * @dev: valid struct device pointer
@@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
if 


Re: [PATCH 3/3] of: address: Use dma_default_coherent to determine default coherency

2023-02-22 Thread Robin Murphy

On 2023-02-22 13:37, Jiaxun Yang wrote:

As of now, all arches have dma_default_coherent matching the default
DMA coherency for OF devices, so there is no need for a standalone
config option.

This also fixes a case where, for some MIPS platforms, coherency information
is not carried in the devicetree and the kernel overrides dma_default_coherent
at early boot.

Note for PowerPC: CONFIG_OF_DMA_DEFAULT_COHERENT was only selected when
CONFIG_NOT_COHERENT_CACHE is false; in that case dma_default_coherent will
be true, so this still matches the present behavior.

Note for RISC-V: dma_default_coherent is set to true at init code in this
series.


OK, so the fundamental problem here is that we have two slightly 
different conflicting mechanisms, the ex-PowerPC config option, and the 
ex-MIPS dma_default_coherent for which of_dma_is_coherent() has 
apparently been broken forever.


I'd agree that it's worth consolidating the two, but please separate out 
the fix as below, so it's feasible to backport without having to muck 
about in arch code.



Signed-off-by: Jiaxun Yang 
---
  arch/powerpc/Kconfig | 1 -
  arch/riscv/Kconfig   | 1 -
  drivers/of/Kconfig   | 4 
  drivers/of/address.c | 2 +-
  4 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2c9cdf1d8761..c67e5da714f7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -272,7 +272,6 @@ config PPC
select NEED_PER_CPU_PAGE_FIRST_CHUNKif PPC64
select NEED_SG_DMA_LENGTH
select OF
-   select OF_DMA_DEFAULT_COHERENT  if !NOT_COHERENT_CACHE
select OF_EARLY_FLATTREE
select OLD_SIGACTIONif PPC32
select OLD_SIGSUSPEND
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 1d46a268ce16..406c6816d289 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -119,7 +119,6 @@ config RISCV
select MODULES_USE_ELF_RELA if MODULES
select MODULE_SECTIONS if MODULES
select OF
-   select OF_DMA_DEFAULT_COHERENT
select OF_EARLY_FLATTREE
select OF_IRQ
select PCI_DOMAINS_GENERIC if PCI
diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index 644386833a7b..e40f10bf2ba4 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -102,8 +102,4 @@ config OF_OVERLAY
  config OF_NUMA
bool
  
-config OF_DMA_DEFAULT_COHERENT

-   # arches should select this if DMA is coherent by default for OF devices
-   bool
-
  endif # OF
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 4c0b169ef9bf..23ade4919853 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1103,7 +1103,7 @@ phys_addr_t __init of_dma_get_max_cpu_address(struct 
device_node *np)
  bool of_dma_is_coherent(struct device_node *np)
  {
struct device_node *node;
-   bool is_coherent = IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT);
+   bool is_coherent = dma_default_coherent;


AFAICS, all you should actually need is a single self-contained addition 
here, something like:


+   /*
+* DT-based MIPS doesn't use OF_DMA_DEFAULT_COHERENT, but
+* might override the system-wide default at runtime.
+*/
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+   defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+   defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
+   is_coherent = dma_default_coherent;
+#endif

  
  	node = of_node_get(np);
  


Then *after* that's fixed, we can do a more comprehensive refactoring to 
merge the two mechanisms properly. FWIW I think I'd prefer an approach 
closer to the first one, where config options control the initial value 
of dma_default_coherent rather than architectures having to override it 
unconditionally (and TBH I'd also like to have a generic config symbol 
for whether an arch supports per-device coherency or not).
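
As a sketch of that preferred shape (CONFIG_ARCH_DMA_DEFAULT_COHERENT is a
hypothetical symbol; dma_default_coherent is the real variable):

/*
 * Hypothetical consolidation: the Kconfig symbol seeds the initial
 * value, so most arches never touch it at runtime; only those that
 * genuinely discover coherency at boot (e.g. MIPS from bootloader
 * information) would still assign to it.
 */
bool dma_default_coherent = IS_ENABLED(CONFIG_ARCH_DMA_DEFAULT_COHERENT);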


Thanks,
Robin.


Re: [PATCH 0/7] MIPS DMA coherence fixes

2023-02-22 Thread Robin Murphy

On 2023-02-22 13:04, Jiaxun Yang wrote:




On 2023-02-22 12:55, Robin Murphy wrote:

On 2023-02-21 19:55, Jiaxun Yang wrote:

On 2023-02-21 19:46, Robin Murphy wrote:

On 2023-02-21 18:15, Jiaxun Yang wrote:

On 2023-02-21 17:54, Christoph Hellwig wrote:

Can you explain the motivation here?  Also why riscv patches are at
the end of a mips fixes series?

Ah sorry for any confusion.
So the main purpose of this patch is to fix MIPS’s broken per-device coherency.
To be more precise, we want to be able to control the default coherency for all 
devices probed from
devicetree in early boot code.


Including the patch which actually does that would be helpful. As it is, 
patches 4-7 here just appear to be moving an option around for no practical 
effect.

Well, the effect is that the default coherency of devicetree-probed devices now
follows dma_default_coherent instead of a static Kconfig option. For MIPS
platforms, dma_default_coherent will be determined by boot code.


"Will be" is the issue I'm getting at. We can't review some future promise of a 
patch, we can only review actual patches. And it's hard to meaningfully review 
preparatory patches for some change without the full context of that change.


Actually this is already present in current MIPS platform code.

arch/mips/mti-malta is setting dma_default_coherent on boot, and its
devicetree does not explicitly specify coherency.


OK, this really needs to be explained much more clearly. I read this 
series as 3 actual fix patches, then 3 patches adding a new option to 
replace an existing one on the grounds that it "can be useful" for 
unspecified purposes, then a final cleanup patch removing the old option 
that has now been superseded.


Going back and looking closely I see there is actually a brief mention 
in the cleanup patch that it also happens to fix some issue, but even 
then it doesn't clearly explain what the issue really is or how and why 
the fix works and is appropriate.


Ideally, functional fixes and cleanup should be in distinct patches 
whenever that is reasonable. Sometimes the best fix is inherently a 
cleanup, but in such cases the patch should always be presented as the 
fix being its primary purpose. Please also use the cover letter to give 
reviewers an overview of the whole series if it's not merely a set of 
loosely-related patches that just happened to be convenient so send all 
together.


I think I do at least now understand the underlying problem well enough 
to have a think about whether this is the best way to address it.


Thanks,
Robin.


Re: [PATCH 0/7] MIPS DMA coherence fixes

2023-02-22 Thread Robin Murphy

On 2023-02-21 19:55, Jiaxun Yang wrote:




On 2023-02-21 19:46, Robin Murphy wrote:

On 2023-02-21 18:15, Jiaxun Yang wrote:

On 2023-02-21 17:54, Christoph Hellwig wrote:

Can you explain the motivation here?  Also why riscv patches are at
the end of a mips fixes series?

Ah sorry for any confusion.
So the main purpose of this patch is to fix MIPS’s broken per-device coherency.
To be more precise, we want to be able to control the default coherency for all 
devices probed from
devicetree in early boot code.


Including the patch which actually does that would be helpful. As it is, 
patches 4-7 here just appear to be moving an option around for no practical 
effect.


Well, the effect is that the default coherency of devicetree-probed devices now
follows dma_default_coherent instead of a static Kconfig option. For MIPS
platforms, dma_default_coherent will be determined by boot code.


"Will be" is the issue I'm getting at. We can't review some future 
promise of a patch, we can only review actual patches. And it's hard to 
meaningfully review preparatory patches for some change without the full 
context of that change.


Thanks,
Robin.


Re: [PATCH 0/7] MIPS DMA coherence fixes

2023-02-21 Thread Robin Murphy

On 2023-02-21 18:15, Jiaxun Yang wrote:




On 2023-02-21 17:54, Christoph Hellwig wrote:

Can you explain the motivation here?  Also why riscv patches are at
the end of a mips fixes series?


Ah sorry for any confusion.

So the main purpose of this patch is to fix MIPS’s broken per-device coherency.
To be more precise, we want to be able to control the default coherency for all 
devices probed from
devicetree in early boot code.


Including the patch which actually does that would be helpful. As it is, 
patches 4-7 here just appear to be moving an option around for no 
practical effect.


Robin.


To achieve that I decided to reuse dma_default_coherent to set the default
coherency for devicetree. All later patches serve this purpose.

Thanks
- Jiaxun


Re: [PATCH 4/7] dma-mapping: Always provide dma_default_coherent

2023-02-21 Thread Robin Murphy

On 2023-02-21 17:58, Christoph Hellwig wrote:

On Tue, Feb 21, 2023 at 12:46:10PM +, Jiaxun Yang wrote:

dma_default_coherent can be useful for determining the default coherency
even on arches without noncoherent support.


How?


Indeed, "default" is conceptually meaningless when there is no possible 
alternative :/


Robin.


Re: [PATCH v2 04/10] iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()

2023-01-20 Thread Robin Murphy

On 2023-01-18 18:00, Jason Gunthorpe wrote:

Change the sg_alloc_table_from_pages() allocation that was hardwired to
GFP_KERNEL to use the gfp parameter like the other allocations in this
function.

Auditing says this is never called from an atomic context, so it is safe
as is, but reads wrong.


I think the point may have been that the sgtable metadata is a 
logically-distinct allocation from the buffer pages themselves. Much 
like the allocation of the pages array itself further down in 
__iommu_dma_alloc_pages(). I see these days it wouldn't be catastrophic 
to pass GFP_HIGHMEM into __get_free_page() via sg_kmalloc(), but still, 
allocating implementation-internal metadata with all the same 
constraints as a DMA buffer has just as much smell of wrong about it IMO.
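
To illustrate the distinction (a fragment assuming the surrounding
allocation context - sgt, pages, count, size and the caller's gfp - not
the actual dma-iommu code):

/* buffer pages: subject to the caller's constraints, so gfp applies */
struct page *page = alloc_pages(gfp, 0);

/*
 * sgtable metadata: implementation-internal, so a plain GFP_KERNEL
 * allocation is arguably the right choice even when the buffer
 * itself is constrained.
 */
int ret = sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL);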


I'd say the more confusing thing about this particular context is why 
we're using iommu_map_sg_atomic() further down - that seems to have been 
an oversight in 781ca2de89ba, since this particular path has never 
supported being called in atomic context.


Overall I'm starting to wonder if it might not be better to stick a "use 
GFP_KERNEL_ACCOUNT if you allocate" flag in the domain for any level of 
the API internals to pick up as appropriate, rather than propagate 
per-call gfp flags everywhere. As it stands we're still missing 
potential pagetable and other domain-related allocations by drivers in 
.attach_dev and even (in probably-shouldn't-really-happen cases) 
.unmap_pages...
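
A sketch of that domain-flag idea (the alloc_accounted field and the
helper below are hypothetical, not existing IOMMU API):

/*
 * Set once when the domain is created on behalf of an accounted
 * user, then consulted by any internal allocation - pagetables,
 * sgtable metadata, etc. - instead of threading gfp everywhere.
 */
static inline gfp_t iommu_domain_gfp(const struct iommu_domain *domain)
{
	return domain->alloc_accounted ? GFP_KERNEL_ACCOUNT : GFP_KERNEL;
}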


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8c2788633c1766..e4bf1bb159f7c7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -822,7 +822,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
if (!iova)
goto out_free_pages;
  
-	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))

+   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, gfp))
goto out_free_iova;
  
  	if (!(ioprot & IOMMU_CACHE)) {



