Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 06/09/2024 11:56 am, Will Deacon wrote:

On Thu, Sep 05, 2024 at 05:27:28PM +0100, Robin Murphy wrote:

On 05/09/2024 4:53 pm, Will Deacon wrote:

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Hmm, yup. Looks like Rob has come back to say the problem lies elsewhere anyway. One thing below though...

Will

--->8

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0e67f1721a3d..0a32e9499e2c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 		/* Clear the remaining entries */
 		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
 
-		if (gather && !iommu_iotlb_gather_queued(gather))
+		if (!iommu_iotlb_gather_queued(gather))

Note that this would reintroduce the latent issue which was present originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we actually allow a NULL gather to be passed to io_pgtable_tlb_add_page() it may end up being dereferenced (e.g. in arm-smmu-v3).

I think there is still something to fix here. arm_lpae_init_pte() can pass a NULL gather to __arm_lpae_unmap() and I don't think skipping the invalidation is correct in that case. Either the drivers need to handle that or we shouldn't be passing NULL. What do you think?

The subtlety there is that in that case it's always a non-leaf PTE, so all that goes back to the driver is io_pgtable_tlb_flush_walk() and the gather is never used.

Thanks,
Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 2024-09-05 6:10 pm, Rob Clark wrote:

On Thu, Sep 5, 2024 at 10:00 AM Rob Clark wrote:

On Thu, Sep 5, 2024 at 9:27 AM Robin Murphy wrote:

On 05/09/2024 4:53 pm, Will Deacon wrote:

Hi Rob,

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Yeah.. and unfortunately the TBU code only supports two devices so far, so I can't easily repro with TBU enabled atm.

Hmm.. __arm_lpae_unmap() is also called in the ->map() path, although not sure how that changes things.

Ok, an update.. after a reboot, still with this patch reverted, I once again see faults. So I guess that vindicates the original patch, and leaves me still searching..
fwiw, fault info from the gpu devcore:

fault-info:
 - ttbr0=000919306000
 - iova=000100c17000
 - dir=WRITE
 - type=UNKNOWN
 - source=CP

pgtable-fault-info:
 - ttbr0: 00090ca4
 - asid: 0
 - ptes: 00095db47003 00095db48003 000914c8f003 0008fd7f0f47

the 'ptes' part shows the table walk, which looks ok to me..

But is it the right pagetable at all, given that the "ttbr0" values appear to be indicating different places?

Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 4:53 pm, Will Deacon wrote:

Hi Rob,

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Will

--->8

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0e67f1721a3d..0a32e9499e2c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 		/* Clear the remaining entries */
 		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
 
-		if (gather && !iommu_iotlb_gather_queued(gather))
+		if (!iommu_iotlb_gather_queued(gather))
 			for (int j = 0; j < i; j++)
 				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);

Note that this would reintroduce the latent issue which was present originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we actually allow a NULL gather to be passed to io_pgtable_tlb_add_page() it may end up being dereferenced (e.g. in arm-smmu-v3).

Thanks,
Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 2:57 pm, Rob Clark wrote:

On Thu, Sep 5, 2024 at 6:24 AM Robin Murphy wrote:

On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

from my reading of the original patch, the ordering is the same for non-leaf nodes, but not for leaf nodes

But tlb_flush_walk is never called for leaf entries either way, so no such ordering exists... And the non-leaf path still calls __arm_lpae_clear_pte() and io_pgtable_tlb_add_page() in the same order relative to each other too, it's just now done for the whole range in one go, after any non-leaf entries have already been dealt with.

Thanks,
Robin.

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

I guess I could try the TBU driver.
But I already had my patchset to extract the pgtable walk for gpu devcore dump, and that is telling me that the CPU view of the pgtable is fine. Which I think just leaves a tlbinv problem. If that is the case, swapping the order of leaf node cpu cache ops and tlbinv ops seems like the cause. But maybe I'm missing something.

BR,
-R

Thanks,
Robin.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 2:24 pm, Robin Murphy wrote:

On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

...or at least I *thought* I'd just commented, but Thunderbird is now denying any knowledge of the reply to your other mail I swore I sent a couple of hours ago :/ Apologies for any confusion,

Robin.

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

Thanks,
Robin.
Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-		if (gather && !iommu_iotlb_gather_queued(gather))
-			for (int j = 0; j < i; j++)
-				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+			ptep++;
+			i++;
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

Thanks,
Robin.
Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-		if (gather && !iommu_iotlb_gather_queued(gather))
-			for (int j = 0; j < i; j++)
-				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+			ptep++;
+			i++;
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk
On 23/05/2024 6:52 pm, Rob Clark wrote:

From: Rob Clark

Add an io-pgtable method to walk the pgtable returning the raw PTEs that would be traversed for a given iova access.

Have to say I'm a little torn here - with my iommu-dma hat on I'm not super enthusiastic about adding any more overhead to iova_to_phys, but in terms of maintaining io-pgtable I do like the overall shape of the implementation... Will, how much would you hate a compromise of inlining iova_to_phys as the default walk behaviour if cb is NULL? :)

That said, looking at the unmap figures for dma_map_benchmark on a Neoverse N1, any difference I think I see is still well within the noise, so maybe a handful of extra indirect calls isn't really enough to worry about?

Cheers,
Robin.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 51 --
 include/linux/io-pgtable.h     |  4 +++
 2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f7828a7aad41..f47a0e64bb35 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iov
 			data->start_level, ptep);
 }
 
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-					 unsigned long iova)
+static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long iova,
+		int (*cb)(void *cb_data, void *pte, int level),
+		void *cb_data)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte pte, *ptep = data->pgd;
 	int lvl = data->start_level;
+	int ret;
 
 	do {
 		/* Valid IOPTE pointer? */
 		if (!ptep)
-			return 0;
+			return -EFAULT;
 
 		/* Grab the IOPTE we're interested in */
 		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
@@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 
 		/* Valid entry? */
 		if (!pte)
-			return 0;
+			return -EFAULT;
+
+		ret = cb(cb_data, &pte, lvl);
+		if (ret)
+			return ret;
 
-		/* Leaf entry? */
+		/* Leaf entry? If so, we've found the translation */
 		if (iopte_leaf(pte, lvl, data->iop.fmt))
-			goto found_translation;
+			return 0;
 
 		/* Take it to the next level */
 		ptep = iopte_deref(pte, data);
 	} while (++lvl < ARM_LPAE_MAX_LEVELS);
 
 	/* Ran out of page tables to walk */
+	return -EFAULT;
+}
+
+struct iova_to_phys_walk_data {
+	arm_lpae_iopte pte;
+	int level;
+};
+
+static int iova_to_phys_walk_cb(void *cb_data, void *pte, int level)
+{
+	struct iova_to_phys_walk_data *d = cb_data;
+
+	d->pte = *(arm_lpae_iopte *)pte;
+	d->level = level;
+
+	return 0;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+					 unsigned long iova)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct iova_to_phys_walk_data d;
+	int ret;
+
+	ret = arm_lpae_pgtable_walk(ops, iova, iova_to_phys_walk_cb, &d);
+	if (ret)
+		return 0;
 
-found_translation:
-	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-	return iopte_to_paddr(pte, data) | iova;
+	iova &= (ARM_LPAE_BLOCK_SIZE(d.level, data) - 1);
+	return iopte_to_paddr(d.pte, data) | iova;
 }
 
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
@@ -807,6 +839,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.map_pages	= arm_lpae_map_pages,
 		.unmap_pages	= arm_lpae_unmap_pages,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
+		.pgtable_walk	= arm_lpae_pgtable_walk,
 	};
 
 	return data;

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 86cf1f7ae389..261b48af068a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -177,6 +177,7 @@ struct io_pgtable_cfg {
  * @map_pages:		Map a physically contiguous range of pages of the same size.
  * @unmap_pages:	Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys:	Translate iova to physical address.
+ * @pgtable_walk:	(optional) Perform a page table walk for a given iova.
  *
  * These functions map directly onto the iommu_ops member functions with
  * the same names.
@@ -190,6 +191,9 @@ struct io_pgtable_ops {
 				struct iommu_iotlb_gather *gather);
 	phys_addr_t	(*iova_to_phys)(struct io_pgtable_ops *ops,
Re: [PATCH v2] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 2023-12-10 6:06 pm, Rob Clark wrote:

From: Rob Clark

In some cases the firmware expects cbndx 1 to be assigned to the GMU, so we also want the default domain for the GMU to be an identity domain. This way it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.

Reviewed-by: Robin Murphy
Cc: sta...@vger.kernel.org
Signed-off-by: Rob Clark
---
I didn't add a fixes tag because really this issue has been there all along, but either didn't matter with other firmware or we didn't notice the problem.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,adreno" },
+	{ .compatible = "qcom,adreno-gmu" },
 	{ .compatible = "qcom,mdp4" },
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
Re: [PATCH] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 07/12/2023 9:24 pm, Rob Clark wrote:

From: Rob Clark

We also want the default domain for the GMU to be an identity domain, so it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and cbndx 2.

I can't help but read this as implying that it gets attached to both *at the same time*, which would be indicative of a far more serious problem in the main driver and/or IOMMU core code. However, from what we discussed on IRC last night, it sounds like the key point here is more straightforwardly that firmware expects the GMU to be using context bank 1, in a vaguely similar fashion to how context bank 0 is special for the GPU. Clarifying that would help explain why we're just doing this as a trick to influence the allocator (i.e. unlike some of the other devices in this list we don't actually need the properties of the identity domain itself).

In future it might be nice to reserve this explicitly on platforms which need it and extend qcom_adreno_smmu_alloc_context_bank() to handle the GMU as well, but I don't object to this patch as an immediate quick fix for now, especially as something nice and easy for stable (I'd agree with Johan in that regard).

Thanks,
Robin.

This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.
Signed-off-by: Rob Clark
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,adreno" },
+	{ .compatible = "qcom,adreno-gmu" },
 	{ .compatible = "qcom,mdp4" },
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
Re: [PATCH 1/3] iommu/msm-iommu: don't limit the driver too much
On 07/12/2023 12:54 pm, Dmitry Baryshkov wrote:

In preparation of dropping most of ARCH_QCOM subtypes, stop limiting the driver just to those machines. Allow it to be built for any 32-bit Qualcomm platform (ARCH_QCOM).

Acked-by: Robin Murphy

Unless Joerg disagrees, I think it should be fine if you want to take this via the SoC tree.

Thanks,
Robin.

Signed-off-by: Dmitry Baryshkov
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b..fd67f586f010 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -178,7 +178,7 @@ config FSL_PAMU
 config MSM_IOMMU
 	bool "MSM IOMMU Support"
 	depends on ARM
-	depends on ARCH_MSM8X60 || ARCH_MSM8960 || COMPILE_TEST
+	depends on ARCH_QCOM || COMPILE_TEST
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_ARMV7S
 	help
Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 29/09/2023 4:45 pm, Will Deacon wrote:

On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform).

Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving them anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is?

I think it was mainly because the quirk doesn't make sense for a coherent page-table walker and we could in theory use that bit for something else in that case.

Yuck, even if we did want some horrible notion of quirks being conditional on parts of the config rather than just the format, then the users would need to be testing for the same condition as the pagetable code itself (i.e. cfg->coherent_walk), rather than hoping some other property of something else indirectly reflects the right information - e.g. there'd be no hope of backporting this particular bodge before 5.19 where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned true, and in future we could conceivably support coherent SMMUs being configured for non-coherent walks on a per-domain basis.

Furthermore, if we did overload a flag to have multiple meanings, then we'd have no way of knowing which one the caller was actually expecting, thus the illusion of being able to validate calls in the meantime isn't necessarily as helpful as it seems, particularly in a case where the "wrong" interpretation would be to have no effect anyway. Mostly though I'd hope that if we ever got anywhere near the point of running out of quirk bits we'd have already realised that it's time for a better interface :(

Based on that, I think that when I do get round to needing to touch this code, I'll propose just streamlining the whole quirk.

Cheers,
Robin.
Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform).

Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving them anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is?

[ just came across this code in the tree while trying to figure out what to do with iommu_set_pgtable_quirks()... ]

Thanks,
Robin.

Fixes: 54af0ceb7595 ("arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes")
Reported-by: David Heidelberg
Signed-off-by: Dmitry Baryshkov
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2942d2548ce6..f74495dcbd96 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1793,7 +1793,8 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 	 * This allows GPU to set the bus attributes required to use system
 	 * cache on behalf of the iommu page table walker.
 	 */
-	if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
+	if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice) &&
+	    !device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY))
 		quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
 
 	return adreno_iommu_create_address_space(gpu, pdev, quirks);
Re: [Freedreno] [PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()
On 2023-08-18 22:32, Jason Gunthorpe wrote: It turns out several drivers are calling of_dma_configure() outside the expected bus_type.dma_configure op. This ends up being mis-locked and triggers a lockdep assertion, for instance: iommu_probe_device_locked+0xd4/0xe4 of_iommu_configure+0x10c/0x200 of_dma_configure_id+0x104/0x3b8 a6xx_gmu_init+0x4c/0xccc [msm] a6xx_gpu_init+0x3ac/0x770 [msm] adreno_bind+0x174/0x2ac [msm] component_bind_all+0x118/0x24c msm_drm_bind+0x1e8/0x6c4 [msm] try_to_bring_up_aggregate_device+0x168/0x1d4 __component_add+0xa8/0x170 component_add+0x14/0x20 dsi_dev_attach+0x20/0x2c [msm] dsi_host_attach+0x9c/0x144 [msm] devm_mipi_dsi_attach+0x34/0xb4 lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc] lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc] i2c_device_probe+0x14c/0x290 really_probe+0x148/0x2b4 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0xa8/0x1b0 device_initial_probe+0x14/0x20 bus_probe_device+0xb0/0xb4 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x1ec/0x53c worker_thread+0x298/0x408 kthread+0x124/0x128 ret_from_fork+0x10/0x20 It is subtle and was never documented or enforced, but there has always been an assumption that of_dma_configure_id() is not concurrent. It makes several calls into the iommu layer that require this, including dev_iommu_get(). The majority of cases have been preventing concurrency using the device_lock(). Thus the new lock debugging added exposes an existing problem in drivers. On inspection this looks like a theoretical locking problem as generally the cases are already assuming they are the exclusive (single threaded) user of the target device. Sorry to be blunt, but the only problem is that you've introduced an idealistic new locking scheme which failed to take into account how things currently actually work, and is broken and achieving nothing but causing problems.
The solution is to drop those locking patches entirely and rethink the whole thing. When their sole purpose was to improve the locking and make it easier to reason about, and the latest "fix" is now to remove one of the assertions which forms the fundamental basis for that reasoning, then the point has clearly been lost. All we've done is churned a dodgy and incomplete locking scheme into a *different* dodgy and incomplete locking scheme. I do not think that continuing to dig in deeper is the way out of the hole... It's now rc7, and I have little confidence that there aren't still more latent problems which just haven't been hit yet (e.g. acpi_dma_configure() is also called in different contexts relative to the device lock, which is absolutely by design and not broken). And on the subject of idealism, the fact is that doing IOMMU configuration based on driver probe via bus->dma_configure is *fundamentally wrong* and breaking a bunch of other IOMMU API assumptions, so it is not a robust foundation to build anything upon in the first place. The problem it causes with broken groups has been known about for several years now, however it's needed a lot of work to get to the point of being able to fix it properly (FWIW that is now #2 on my priority list after getting the bus ops stuff done, which should also make it easier). Thanks, Robin. Sadly, there are deeper technical problems with all of the places doing this. There are several problematic patterns: 1) Probe a driver on device A and then steal device B and use it as part of the driver operation. Since no driver was probed to device B it means we never called bus_type.dma_configure and thus the drivers hackily try to open code this. Unfortunately nothing prevents another driver from binding to device B and creating total chaos. 
eg vfio bind triggered by userspace 2) Probe a driver on device A and then create a new platform driver B for a fwnode that doesn't have one, then do #1 This has the same essential problem as #1, the new device is never probed so the hack call to of_dma_configure() is needed to setup DMA, and we are at risk of something else trying to use the device. 3) Probe a driver on device A but the of_node was incorrect for DMA so fix it by figuring out the right node and calling of_dma_configure() This will blow up in the iommu code if the driver is unprobed because the bus_type now assumes that dma_configure and dma_cleanup are strictly paired. Since dma_configure will have done the wrong thing due to the missing of_node, dma_cleanup will be unpaired and iommu_device_unuse_default_domain() will blow up. Further the driver operating on device A will not be protected against changes to the iommu domain since it never called iommu_device_use_default_domain() At least this case will not throw a lockdep warning as pr
Re: [Freedreno] [PATCH 0/2] drm: Add component_match_add_of and convert users of drm_of_component_match_add
On 2022-12-16 17:08, Sean Anderson wrote: On 11/3/22 14:22, Sean Anderson wrote: This series adds a new function component_match_add_of to simplify the common case of calling component_match_add_release with component_release_of and component_compare_of. There is already drm_of_component_match_add, which allows for a custom compare function. However, all existing users just use component_compare_of (or an equivalent). I can split the second commit up if that is easier to review. Sean Anderson (2): component: Add helper for device nodes drm: Convert users of drm_of_component_match_add to component_match_add_of .../gpu/drm/arm/display/komeda/komeda_drv.c | 6 ++-- drivers/gpu/drm/arm/hdlcd_drv.c | 9 +- drivers/gpu/drm/arm/malidp_drv.c | 11 +-- drivers/gpu/drm/armada/armada_drv.c | 10 --- drivers/gpu/drm/drm_of.c | 29 +++ drivers/gpu/drm/etnaviv/etnaviv_drv.c | 4 +-- .../gpu/drm/hisilicon/kirin/kirin_drm_drv.c | 3 +- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 3 +- drivers/gpu/drm/mediatek/mtk_drm_drv.c| 4 +-- drivers/gpu/drm/msm/msm_drv.c | 14 - drivers/gpu/drm/sti/sti_drv.c | 3 +- drivers/gpu/drm/sun4i/sun4i_drv.c | 3 +- drivers/gpu/drm/tilcdc/tilcdc_external.c | 10 ++- drivers/iommu/mtk_iommu.c | 3 +- drivers/iommu/mtk_iommu_v1.c | 3 +- include/drm/drm_of.h | 12 include/linux/component.h | 9 ++ sound/soc/codecs/wcd938x.c| 6 ++-- 18 files changed, 46 insertions(+), 96 deletions(-) ping? Should I send a v2 broken up like Mark suggested? FWIW you'll need to rebase the IOMMU changes on 6.2-rc1 anyway - mtk_iommu stops using component_match_add_release() at all. Thanks, Robin.
Re: [Freedreno] [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
On 2022-08-22 14:52, Robin Murphy wrote: On 2022-08-21 19:19, Rob Clark wrote: From: Rob Clark Optimize TLB invalidation by using different ASID for each set of pgtables. There can be scenarios where multiple processes end up with the same ASID (such as >256 processes using the GPU), but this is harmless, it will only result in some over-invalidation (but less over-invalidation compared to using ASID=0 for all processes) Um, if you're still relying on the GPU doing an invalidate-all-by-ASID whenever it switches a TTBR, then there's only ever one ASID live in the TLBs at once, so it really doesn't matter whether its value stays the same or changes. This seems like a whole chunk of complexity to achieve nothing :/ If you could actually use multiple ASIDs in a meaningful way to avoid any invalidations, you'd need to do considerably more work to keep track of reuse, and those races would probably be a lot less benign. Oh, and on top of that, if you did want to go down that route then chances are you'd then also want to start looking at using global mappings in TTBR1 to avoid increased TLB pressure from kernel buffers, and then we'd run up against some pretty horrid MMU-500 errata which so far I've been happy to ignore on the basis that Linux doesn't use global mappings. Spoiler alert: unless you can additionally convert everything to invalidate by VA, the workaround for #752357 most likely makes the whole idea moot. Robin. 
Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/msm_iommu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index a54ed354578b..94c8c09980d1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size_t size) { struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct adreno_smmu_priv *adreno_smmu = + dev_get_drvdata(pagetable->parent->dev); struct io_pgtable_ops *ops = pagetable->pgtbl_ops; size_t unmapped = 0; @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size -= 4096; } - iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain); + adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid); return (unmapped == size) ? 0 : -EINVAL; } @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev, struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) { + static atomic_t asid = ATOMIC_INIT(1); struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev); struct msm_iommu *iommu = to_msm_iommu(parent); struct msm_iommu_pagetable *pagetable; @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr; /* - * TODO we would like each set of page tables to have a unique ASID - * to optimize TLB invalidation. But iommu_flush_iotlb_all() will - * end up flushing the ASID used for TTBR1 pagetables, which is not - * what we want. So for now just use the same ASID as TTBR1. + * ASID 0 is used for kernel mapped buffers in TTBR1, which we + * do not need to invalidate when unmapping from TTBR0 pgtables. + * The hw ASID is at *least* 8b, but can be 16b. 
We just assume + * the worst: */ pagetable->asid = 0; + while (!pagetable->asid) + pagetable->asid = atomic_inc_return(&asid) & 0xff; return &pagetable->base; }
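The ASID scheme in the patch above (a wrapping counter masked to 8 bits, skipping 0) can be sketched as a self-contained userspace model. This is a toy, not the kernel code: `alloc_asid()` is an invented name, C11 `atomic_fetch_add` stands in for the kernel's `atomic_inc_return`, and the real driver keeps the counter static inside `msm_iommu_pagetable_create()`.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace model of the patch's ASID allocator: a global counter
 * masked down to 8 bits (the minimum hardware ASID width), retrying
 * whenever the masked value is 0 because ASID 0 is reserved for the
 * kernel's TTBR1 mappings.  After >255 allocations, values repeat --
 * which, as the commit message notes, only causes harmless
 * over-invalidation, never a missed invalidation.
 */
static atomic_uint next_asid = 1;

static unsigned int alloc_asid(void)
{
	unsigned int asid = 0;

	while (!asid)
		asid = atomic_fetch_add(&next_asid, 1) & 0xff;
	return asid;
}
```

Note how the `while (!asid)` retry is what makes the wraparound safe: the counter itself never resets, only the masked value cycles through 1..255.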
Re: [Freedreno] [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
On 2022-08-21 19:19, Rob Clark wrote: From: Rob Clark Optimize TLB invalidation by using different ASID for each set of pgtables. There can be scenarios where multiple processes end up with the same ASID (such as >256 processes using the GPU), but this is harmless, it will only result in some over-invalidation (but less over-invalidation compared to using ASID=0 for all processes) Um, if you're still relying on the GPU doing an invalidate-all-by-ASID whenever it switches a TTBR, then there's only ever one ASID live in the TLBs at once, so it really doesn't matter whether its value stays the same or changes. This seems like a whole chunk of complexity to achieve nothing :/ If you could actually use multiple ASIDs in a meaningful way to avoid any invalidations, you'd need to do considerably more work to keep track of reuse, and those races would probably be a lot less benign. Robin. Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/msm_iommu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index a54ed354578b..94c8c09980d1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size_t size) { struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct adreno_smmu_priv *adreno_smmu = + dev_get_drvdata(pagetable->parent->dev); struct io_pgtable_ops *ops = pagetable->pgtbl_ops; size_t unmapped = 0; @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size -= 4096; } - iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain); + adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid); return (unmapped == size) ? 
0 : -EINVAL; } @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev, struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) { + static atomic_t asid = ATOMIC_INIT(1); struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev); struct msm_iommu *iommu = to_msm_iommu(parent); struct msm_iommu_pagetable *pagetable; @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr; /* -* TODO we would like each set of page tables to have a unique ASID -* to optimize TLB invalidation. But iommu_flush_iotlb_all() will -* end up flushing the ASID used for TTBR1 pagetables, which is not -* what we want. So for now just use the same ASID as TTBR1. +* ASID 0 is used for kernel mapped buffers in TTBR1, which we +* do not need to invalidate when unmapping from TTBR0 pgtables. +* The hw ASID is at *least* 8b, but can be 16b. We just assume +* the worst: */ pagetable->asid = 0; + while (!pagetable->asid) + pagetable->asid = atomic_inc_return(&asid) & 0xff; return &pagetable->base; }
Re: [Freedreno] [PATCH v2 5/5] drm/msm: switch msm_kms_init_aspace() to use device_iommu_mapped()
On 2022-05-05 01:16, Dmitry Baryshkov wrote: Change msm_kms_init_aspace() to use generic function device_iommu_mapped() instead of the fwnode-specific interface dev_iommu_fwspec_get(). While we are at it, stop referencing platform_bus_type directly and use the bus of the IOMMU device. FWIW, I'd have squashed these changes across the previous patches, such that the dodgy fwspec calls are never introduced in the first place, but it's your driver, and if that's the way you want to work it and Rob's happy with it too, then fine by me. For the end result, Reviewed-by: Robin Murphy I'm guessing MDP4 could probably use msm_kms_init_aspace() now as well, but unless there's any other reason to respin this series, that's something we could do as a follow-up. Thanks for sorting this out! Robin. Suggested-by: Robin Murphy Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/msm_drv.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 98ae0036ab57..2fc3f820cd59 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -272,21 +272,21 @@ struct msm_gem_address_space *msm_kms_init_aspace(struct drm_device *dev) struct device *mdss_dev = mdp_dev->parent; struct device *iommu_dev; - domain = iommu_domain_alloc(&platform_bus_type); - if (!domain) { - drm_info(dev, "no IOMMU, fallback to phys contig buffers for scanout\n"); - return NULL; - } - /* * IOMMUs can be a part of MDSS device tree binding, or the * MDP/DPU device. */ - if (dev_iommu_fwspec_get(mdp_dev)) + if (device_iommu_mapped(mdp_dev)) iommu_dev = mdp_dev; else iommu_dev = mdss_dev; + domain = iommu_domain_alloc(iommu_dev->bus); + if (!domain) { + drm_info(dev, "no IOMMU, fallback to phys contig buffers for scanout\n"); + return NULL; + } + mmu = msm_iommu_new(iommu_dev, domain); if (IS_ERR(mmu)) { iommu_domain_free(domain);
Re: [Freedreno] [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage
On 2022-05-03 14:30, Dmitry Baryshkov wrote: On Tue, 3 May 2022 at 13:57, Robin Murphy wrote: On 2022-05-01 11:10, Dmitry Baryshkov wrote: Move iommu_domain_alloc() in front of address space/IOMMU initialization. This allows us to drop final bits of struct mdp5_cfg_platform which remained from the pre-DT days. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h | 6 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 6 -- 3 files changed, 4 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c index 1bf9ff5dbabc..714effb967ff 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c @@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = { { .revision = 3, .config = { .hw = &sdm630_config } }, }; -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev); - const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_handler) { return cfg_handler->config.hw; @@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, uint32_t major, uint32_t minor) { struct drm_device *dev = mdp5_kms->dev; - struct platform_device *pdev = to_platform_device(dev->dev); struct mdp5_cfg_handler *cfg_handler; const struct mdp5_cfg_handler *cfg_handlers; - struct mdp5_cfg_platform *pconfig; int i, ret = 0, num_handlers; cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL); @@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, cfg_handler->revision = minor; cfg_handler->config.hw = mdp5_cfg; - pconfig = mdp5_get_config(pdev); - memcpy(&cfg_handler->config.platform, pconfig, sizeof(*pconfig)); - DBG("MDP5: %s hw config selected", mdp5_cfg->name); return cfg_handler; @@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, return ERR_PTR(ret); } - -static 
struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev) -{ - static struct mdp5_cfg_platform config = {}; - - config.iommu = iommu_domain_alloc(&platform_bus_type); - - return &config; -} diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h index 6b03d7899309..c2502cc33864 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h @@ -104,14 +104,8 @@ struct mdp5_cfg_hw { uint32_t max_clk; }; -/* platform config data (ie. from DT, or pdata) */ -struct mdp5_cfg_platform { - struct iommu_domain *iommu; -}; - struct mdp5_cfg { const struct mdp5_cfg_hw *hw; - struct mdp5_cfg_platform platform; }; struct mdp5_kms; diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 9b7bbc3adb97..1c67c2c828cd 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev) struct msm_gem_address_space *aspace; int irq, i, ret; struct device *iommu_dev; + struct iommu_domain *iommu; ret = mdp5_init(to_platform_device(dev->dev), dev); @@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev) } mdelay(16); - if (config->platform.iommu) { + iommu = iommu_domain_alloc(&platform_bus_type); To preempt the next change down the line as well, could this be rearranged to work as iommu_domain_alloc(iommu_dev->bus)? I'd prefer to split this into the separate change, if you don't mind. Oh, for sure, divide the patches however you see fit - I'm just hoping to save your time overall by getting all the IOMMU-related refactoring done now as a single series rather than risk me coming back and breaking things again in a few months :) Cheers, Robin. 
+ if (iommu) { struct msm_mmu *mmu; iommu_dev = &pdev->dev; if (!dev_iommu_fwspec_get(iommu_dev)) The fwspec helpers are more of an internal thing between the IOMMU drivers and the respective firmware code - I'd rather that external API users stuck consistently to using device_iommu_mapped() (it should give the same result). Let me check that it works correctly and spin a v2 afterwards. Otherwise, thanks for sorting this out! Robin. iommu_dev = iommu_dev->parent; - mmu = msm_iommu_new(iommu_dev, config->platform.iommu); + mmu = msm_iommu_new(iommu_dev, iommu); aspace = msm_gem_address_space_create(mmu, "mdp5", 0x1000, 0x1 - 0x1000);
Re: [Freedreno] [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage
On 2022-05-01 11:10, Dmitry Baryshkov wrote: Move iommu_domain_alloc() in front of address space/IOMMU initialization. This allows us to drop final bits of struct mdp5_cfg_platform which remained from the pre-DT days. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h | 6 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 6 -- 3 files changed, 4 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c index 1bf9ff5dbabc..714effb967ff 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c @@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = { { .revision = 3, .config = { .hw = &sdm630_config } }, }; -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev); - const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_handler) { return cfg_handler->config.hw; @@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, uint32_t major, uint32_t minor) { struct drm_device *dev = mdp5_kms->dev; - struct platform_device *pdev = to_platform_device(dev->dev); struct mdp5_cfg_handler *cfg_handler; const struct mdp5_cfg_handler *cfg_handlers; - struct mdp5_cfg_platform *pconfig; int i, ret = 0, num_handlers; cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL); @@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, cfg_handler->revision = minor; cfg_handler->config.hw = mdp5_cfg; - pconfig = mdp5_get_config(pdev); - memcpy(&cfg_handler->config.platform, pconfig, sizeof(*pconfig)); - DBG("MDP5: %s hw config selected", mdp5_cfg->name); return cfg_handler; @@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, return ERR_PTR(ret); } - -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev) -{ - static struct 
mdp5_cfg_platform config = {}; - - config.iommu = iommu_domain_alloc(&platform_bus_type); - - return &config; -} diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h index 6b03d7899309..c2502cc33864 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h @@ -104,14 +104,8 @@ struct mdp5_cfg_hw { uint32_t max_clk; }; -/* platform config data (ie. from DT, or pdata) */ -struct mdp5_cfg_platform { - struct iommu_domain *iommu; -}; - struct mdp5_cfg { const struct mdp5_cfg_hw *hw; - struct mdp5_cfg_platform platform; }; struct mdp5_kms; diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 9b7bbc3adb97..1c67c2c828cd 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev) struct msm_gem_address_space *aspace; int irq, i, ret; struct device *iommu_dev; + struct iommu_domain *iommu; ret = mdp5_init(to_platform_device(dev->dev), dev); @@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev) } mdelay(16); - if (config->platform.iommu) { + iommu = iommu_domain_alloc(&platform_bus_type); To preempt the next change down the line as well, could this be rearranged to work as iommu_domain_alloc(iommu_dev->bus)? + if (iommu) { struct msm_mmu *mmu; iommu_dev = &pdev->dev; if (!dev_iommu_fwspec_get(iommu_dev)) The fwspec helpers are more of an internal thing between the IOMMU drivers and the respective firmware code - I'd rather that external API users stuck consistently to using device_iommu_mapped() (it should give the same result). Otherwise, thanks for sorting this out! Robin. iommu_dev = iommu_dev->parent; - mmu = msm_iommu_new(iommu_dev, config->platform.iommu); + mmu = msm_iommu_new(iommu_dev, iommu); aspace = msm_gem_address_space_create(mmu, "mdp5", 0x1000, 0x1 - 0x1000);
Re: [Freedreno] [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"
On 2022-04-19 22:08, Dmitry Baryshkov wrote: On 20/04/2022 00:04, Robin Murphy wrote: On 2022-04-19 14:04, Dmitry Baryshkov wrote: This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit in question makes msm_use_mmu() check whether the DRM 'component master' device is translated by the IOMMU. At this moment it is the 'mdss' device. However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016, MSM8996/APQ8096) it's the mdp5 device, which has the iommus property (and thus is "translated by the IOMMU"). This results in these devices being broken with the following lines in the dmesg. [drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0 msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw from new location msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw from new location msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: could not allocate stolen bo msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to allocate buffer object msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb Getting the mdp5 device pointer from this function is not that easy at this moment. Thus this patch is reverted till the MDSS rework [1] lands. It will make the mdp5/dpu1 device component master and the check will be legit. [1] https://patchwork.freedesktop.org/series/98525/ Oh, DRM... If that series is going to land for 5.19, could you please implement the correct equivalent of this patch within it? Yes, that's the plan. I'm sending a reworked version of your patch shortly (but it still depends on [1]). I'm fine with the revert for now if this patch doesn't work properly in all cases, but I have very little sympathy left for DRM drivers riding roughshod over all the standard driver model abstractions because they're "special". 
iommu_present() *needs* to go away, so if it's left to me to have a second go at fixing this driver next cycle, you're liable to get some abomination based on of_find_compatible_node() or similar, and I'll probably be demanding an ack to take it through the IOMMU tree ;) No need for such measures :-) Awesome, thanks! Robin.
Re: [Freedreno] [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"
On 2022-04-19 14:04, Dmitry Baryshkov wrote: This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit in question makes msm_use_mmu() check whether the DRM 'component master' device is translated by the IOMMU. At this moment it is the 'mdss' device. However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016, MSM8996/APQ8096) it's the mdp5 device, which has the iommus property (and thus is "translated by the IOMMU"). This results in these devices being broken with the following lines in the dmesg. [drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0 msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw from new location msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw from new location msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: could not allocate stolen bo msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to allocate buffer object msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb Getting the mdp5 device pointer from this function is not that easy at this moment. Thus this patch is reverted till the MDSS rework [1] lands. It will make the mdp5/dpu1 device component master and the check will be legit. [1] https://patchwork.freedesktop.org/series/98525/ Oh, DRM... If that series is going to land for 5.19, could you please implement the correct equivalent of this patch within it? I'm fine with the revert for now if this patch doesn't work properly in all cases, but I have very little sympathy left for DRM drivers riding roughshod over all the standard driver model abstractions because they're "special". iommu_present() *needs* to go away, so if it's left to me to have a second go at fixing this driver next cycle, you're liable to get some abomination based on of_find_compatible_node() or similar, and I'll probably be demanding an ack to take it through the IOMMU tree ;) Thanks, Robin. 
Fixes: e2a88eabb024 ("drm/msm: Stop using iommu_present()") Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/msm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index b6702b0fafcb..e2b5307b2360 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -263,7 +263,7 @@ bool msm_use_mmu(struct drm_device *dev) struct msm_drm_private *priv = dev->dev_private; /* a2xx comes with its own MMU */ - return priv->is_a2xx || device_iommu_mapped(dev->dev); + return priv->is_a2xx || iommu_present(&platform_bus_type); } static int msm_init_vram(struct drm_device *dev)
[Freedreno] [PATCH] drm/msm: Stop using iommu_present()
Even if some IOMMU has registered itself on the platform "bus", that doesn't necessarily mean it provides translation for the device we care about. Replace iommu_present() with a more appropriate check. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index affa95eb05fc..9c36b505daab 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -274,7 +274,7 @@ bool msm_use_mmu(struct drm_device *dev) struct msm_drm_private *priv = dev->dev_private; /* a2xx comes with its own MMU */ - return priv->is_a2xx || iommu_present(&platform_bus_type); + return priv->is_a2xx || device_iommu_mapped(dev->dev); } static int msm_init_vram(struct drm_device *dev) -- 2.28.0.dirty
Re: [Freedreno] [PATCH] drm/msm: use orig_nents to iterate over scatterlist with per-process tables
On 2022-03-28 13:55, Jonathan Marek wrote: This matches the implementation of iommu_map_sgtable() used for the non-per-process page tables path. This works around the dma_map_sgtable() call (used to invalidate cache) overwriting sgt->nents with 1 (which is probably a separate issue). FWIW that's expected behaviour. The sgtable APIs use nents to keep track of the number of DMA segments, while orig_nents represents the physical segments. IIUC you're not actually using the DMA mapping, just relying on its side-effects, so it's still the physical segments that you care about for your private IOMMU mapping here. Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable") Signed-off-by: Jonathan Marek --- drivers/gpu/drm/msm/msm_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index bcaddbba564df..22935ef26a3a1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -58,7 +58,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova, u64 addr = iova; unsigned int i; - for_each_sg(sgt->sgl, sg, sgt->nents, i) { + for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) { Even better would be to use for_each_sgtable_sg(), which exists primarily for the sake of avoiding this exact confusion. Robin. size_t size = sg->length; phys_addr_t phys = sg_phys(sg);
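Robin's distinction between `nents` and `orig_nents` can be demonstrated with a toy userspace model. The struct names below are invented for illustration; in the kernel, the same idea is what `for_each_sgtable_sg()` encapsulates so callers never pick the wrong count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the sg_table bookkeeping described above: after
 * dma_map_sgtable(), `nents` counts DMA segments (which the DMA layer
 * may have merged down to as few as 1), while `orig_nents` still
 * counts the physical segments.  Code building its own IOMMU mapping
 * from physical pages must therefore walk orig_nents.
 */
struct toy_sg { size_t length; };

struct toy_sgt {
	struct toy_sg sg[4];
	unsigned int nents;      /* DMA segments after mapping */
	unsigned int orig_nents; /* physical segments */
};

static size_t total_phys_len(const struct toy_sgt *sgt)
{
	size_t total = 0;

	for (unsigned int i = 0; i < sgt->orig_nents; i++)
		total += sgt->sg[i].length;
	return total;
}
```

Iterating `sgt->nents` in this model would visit only the first entry and silently drop the rest of the buffer, which is exactly the bug the patch fixes.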
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-16 15:38, Christoph Hellwig wrote: [...] diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f1e38526d5bd40..996dfdf9d375dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2017,7 +2017,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; - if (smmu_domain->non_strict) + if (!iommu_get_dma_strict()) As Will raised, this also needs to be checking "domain->type == IOMMU_DOMAIN_DMA" to maintain equivalent behaviour to the attribute code below. pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); @@ -2449,52 +2449,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - - switch (domain->type) { - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } -} [...] 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f985817c967a25..edb1de479dd1a7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -668,7 +668,6 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ struct io_pgtable_ops *pgtbl_ops; - boolnon_strict; atomic_tnr_ats_masters; enum arm_smmu_domain_stage stage; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 0aa6d667274970..3dde22b1f8ffb0 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -761,6 +761,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; + if (!iommu_get_dma_strict()) Ditto here. Sorry for not spotting that sooner :( Robin. + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + if (smmu->impl && smmu->impl->init_context) { ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev); if (ret) ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 16:32, Will Deacon wrote: On Wed, Mar 31, 2021 at 02:09:37PM +0100, Robin Murphy wrote: On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. 
We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Maybe I'm going mad. With this patch, the SMMU driver unconditionally sets IO_PGTABLE_QUIRK_NON_STRICT for page-tables if iommu_get_dma_strict() is true, no? In which case, that will get set for page-tables corresponding to unmanaged domains as well as DMA domains when it is enabled. That didn't happen before because you couldn't set the attribute for unmanaged domains. What am I missing? Oh cock... sorry, all this time I've been saying what I *expect* it to do, while overlooking the fact that the IO_PGTABLE_QUIRK_NON_STRICT hunks were the bits I forgot to write and Christoph had to fix up. Indeed, those should be checking the domain type too to preserve the existing behaviour. Apologies for the confusion. Robin. 
Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Device would probably work too; you'd pass the first device to attach to the domain when querying t
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. 
I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Another idea that's just come to mind is now that IOMMU_DOMAIN_DMA has a standard meaning, maybe we could split out a separate IOMMU_DOMAIN_DMA_STRICT type such that it can all propagate from iommu_get_def_domain_type()? 
That feels like it might be quite promising, but I'd still do it as an improvement on top of this patch, since it's beyond just cleaning up the abuse of domain attributes to pass a command-line option around. Robin.
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. 
I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Ah, sorry, I didn't spot that! I was just trying to illustrate that this is per-device. Sure, I understand - and I'm just trying to bang home that despite appearances it's never actually been treated as such for SMMU, so anything that's wrong after this change was already wrong before. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Thanks, Robin. 
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-15 08:33, Christoph Hellwig wrote: On Fri, Mar 12, 2021 at 04:18:24PM +, Robin Murphy wrote: Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, I think a private interface would make more sense. For now I've just condensed it down to a generic set of quirk bits and dropped the attrs structure, which seems like an ok middle ground for now. That being said I wonder why that quirk isn't simply set in the device tree? Because it's a software policy decision rather than any inherent property of the platform, and the DT certainly doesn't know *when* any particular device might prefer its IOMMU to use cacheable pagetables to minimise TLB miss latency vs. saving the cache capacity for larger data buffers. It really is most logical to decide this at the driver level. In truth the overall concept *is* relatively generic (a trend towards larger system caches and cleverer usage is about both raw performance and saving power on off-SoC DRAM traffic), it's just the particular implementation of using io-pgtable to set an outer-cacheable walk attribute in an SMMU TCR that's pretty much specific to Qualcomm SoCs. Hence why having a common abstraction at the iommu_domain level, but where the exact details are free to vary across different IOMMUs and their respective client drivers, is in many ways an ideal fit. but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Fixed.
Can I get your signoff for the patch? Then I'll switch it over to being attributed to you. Sure - I would have thought that the one I originally posted still stands, but for the avoidance of doubt, for the parts of commit 8b6d45c495bd in your tree that remain from what I wrote: Signed-off-by: Robin Murphy Cheers, Robin.
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-11 08:26, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote: Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? It seems to miss a few bits, and also generally seems to not actually apply to recent mainline or something like it due to different empty lines in a few places. Yeah, that was sketched out on top of some other development patches, and in being so focused on not breaking any of the x86 behaviours I did indeed overlook fully converting the SMMU drivers... oops! (my thought was to do the conversion for its own sake, then clean up the redundant attribute separately, but I guess it's fine either way) Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Robin.
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-10 09:25, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 10:15:01AM +0100, Christoph Hellwig wrote: On Thu, Mar 04, 2021 at 03:25:27PM +, Robin Murphy wrote: On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Hmm. I looked at this, and kill off ->dma_enable_flush_queue for the ARM drivers and just looking at iommu_dma_strict seems like a very clear win. OTOH x86 is a little more complicated. AMD and Intel default to lazy mode, so we'd have to change the global iommu_dma_strict if they are initialized. Also Intel has not only a "static" option to disable lazy mode, but also a "dynamic" one where it iterates structure. So I think on the get side we're stuck with the method, but it still simplifies the whole thing. Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? Robin. ->8- Subject: [PATCH] iommu: Consolidate strict invalidation handling Now that everyone is using iommu-dma, the global invalidation policy really doesn't need to be woven through several parts of the core API and individual drivers, we can just look it up directly at the one point that we now make the flush queue decision.
If the x86 drivers reflect their internal options and overrides back to iommu_dma_strict, that can become the canonical source. Signed-off-by: Robin Murphy --- drivers/iommu/amd/iommu.c | 2 ++ drivers/iommu/dma-iommu.c | 8 +--- drivers/iommu/intel/iommu.c | 12 drivers/iommu/iommu.c | 35 +++ include/linux/iommu.h | 2 ++ 5 files changed, 44 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40..1db29e59d468 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1856,6 +1856,8 @@ int __init amd_iommu_init_dma_ops(void) else pr_info("Lazy IO/TLB flushing enabled\n"); + iommu_set_dma_strict(amd_iommu_unmap_flush); + return 0; } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index af765c813cc8..789a950cc125 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -304,10 +304,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad) cookie = container_of(iovad, struct iommu_dma_cookie, iovad); domain = cookie->fq_domain; - /* -* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE -* implies that ops->flush_iotlb_all must be non-NULL. 
-*/ domain->ops->flush_iotlb_all(domain); } @@ -334,7 +330,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; struct iova_domain *iovad; - int attr; if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) return -EINVAL; @@ -371,8 +366,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, init_iova_domain(iovad, 1UL << order, base_pfn); if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && - !iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) && - attr) { + domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) { if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, iommu_dma_entry_dtor)) pr_warn("iova flush queue initialization failed\n"); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index b5c746f0f63b..f5b452cd1266 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4377,6 +4377,17 @@ int __init intel_iommu_init(void) down_read(&dmar_global_lock); for_each_active_iommu(iommu, drhd) { + if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { + /* +* The flush queue implementation does
Re: [Freedreno] [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On 2021-03-01 08:42, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig Moreso than the previous patch, where the feature is at least relatively generic (note that there's a bunch of in-flight development around DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to bloat the generic iommu_ops structure with private driver-specific interfaces. The attribute interface is a great compromise for these kinds of things, and you can easily add type-checked wrappers around it for external callers (maybe even make the actual attributes internal between the IOMMU core and drivers) if that's your concern. Robin. --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +- drivers/iommu/arm/arm-smmu/arm-smmu.c | 40 +++-- drivers/iommu/iommu.c | 9 ++ include/linux/iommu.h | 9 +- 4 files changed, 29 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 0f184c3dd9d9ec..78d98ab2ee3a68 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu) struct io_pgtable_domain_attr pgtbl_cfg; pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg); } struct msm_gem_address_space * diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2e17d990d04481..2858999c86dfd1 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain) return ret; } -static int arm_smmu_domain_set_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) { - int ret = 0; struct arm_smmu_domain 
*smmu_domain = to_smmu_domain(domain); + int ret = -EPERM; - mutex_lock(&smmu_domain->init_mutex); - - switch(domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_IO_PGTABLE_CFG: { - struct io_pgtable_domain_attr *pgtbl_cfg = data; - - if (smmu_domain->smmu) { - ret = -EPERM; - goto out_unlock; - } + if (domain->type != IOMMU_DOMAIN_UNMANAGED) + return -EINVAL; - smmu_domain->pgtbl_cfg = *pgtbl_cfg; - break; - } - default: - ret = -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - ret = -ENODEV; - break; - default: - ret = -EINVAL; + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) { + smmu_domain->pgtbl_cfg = *pgtbl_cfg; + ret = 0; } -out_unlock: mutex_unlock(&smmu_domain->init_mutex); + return ret; } @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .dma_use_flush_queue= arm_smmu_dma_use_flush_queue, .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue, - .domain_set_attr= arm_smmu_domain_set_attr, + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr, .domain_enable_nesting = arm_smmu_domain_enable_nesting, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2e9e058501a953..8490aefd4b41f8 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting); +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) +{ + if (!domain->ops->domain_set_pgtable_attr) + return -EINVAL; + return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg); +} +EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr); + void iommu_get_resv_regions(struct device *dev, struct list_head *list) { const struct iommu_ops *ops = dev->bus->iommu_ops; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 
aed88aa3bd3edf..39d3ed4d2700ac 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -40,6 +40,7 @@ struct iommu_domain; struct notifier_block; struct iommu_sva; struct iommu_fault_event; +stru
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Robin. Also remove the now unused iommu_domain_get_attr functionality. Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 ++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 56 + drivers/iommu/dma-iommu.c | 8 ++- drivers/iommu/intel/iommu.c | 27 ++ drivers/iommu/iommu.c | 19 +++ include/linux/iommu.h | 17 ++- 7 files changed, 51 insertions(+), 146 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40d0..37a8e51db17656 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1771,24 +1771,11 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev) return acpihid_device_group(dev); } -static int amd_iommu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain) { - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - return -ENODEV; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !amd_iommu_unmap_flush; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return !amd_iommu_unmap_flush; } /* @@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = { .release_device = amd_iommu_release_device, .probe_finalize = amd_iommu_probe_finalize, .device_group = amd_iommu_device_group, - .domain_get_attr = 
amd_iommu_domain_get_attr, + .dma_use_flush_queue = amd_iommu_dma_use_flush_queue, .get_resv_regions = amd_iommu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8594b4a8304375..bf96172e8c1f71 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_NESTING: - *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); - return 0; - default: - return -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return smmu_domain->non_strict; +} + + +static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain) +{ + if (domain->type != IOMMU_DOMAIN_DMA) + return; + to_smmu_domain(domain)->non_strict = true; } static int arm_smmu_domain_set_attr(struct iommu_domain *domain, @@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, } break; case IOMMU_DOMAIN_DMA: - switch(attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; - break; - default: - ret = -EN
Re: [Freedreno] [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 19:54, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt = ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. 
Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. Ok, I can fix this accordingly. + return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt = ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S2, + .alloc = 
arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_d
Re: [Freedreno] [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 19:49, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. Okay, I'll fix up the commit message, as well as the code, so that it only depends on io-pgtable-arm. Well, IIUC it would make sense to keep the softdep for when the v7s module *is* present; I just wanted to clarify that it's more of a nice-to-have rather than a necessity. Robin. Thanks, Isaac for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. 
Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s"); ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt= ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. 
+ return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt= ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S2, + .alloc = arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = 
arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_
Re: [Freedreno] [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");
Re: [Freedreno] [PATCH] drm/msm/a6xx: Add support for using system cache on MMU500 based targets
On 2020-10-26 18:54, Jordan Crouse wrote: This is an extension to the series [1] to enable the System Cache (LLC) for Adreno a6xx targets. GPU targets with an MMU-500 attached have a slightly different process for enabling system cache. Use the compatible string on the IOMMU phandle to see if an MMU-500 is attached and modify the programming sequence accordingly. [1] https://patchwork.freedesktop.org/series/83037/ Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 +-- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 + 2 files changed, 37 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 95c98c642876..b7737732fbb6 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1042,6 +1042,8 @@ static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu) static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) { + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base; + struct msm_gpu *gpu = &adreno_gpu->base; u32 cntl1_regval = 0; if (IS_ERR(a6xx_gpu->llc_mmio)) @@ -1055,11 +1057,17 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) (gpu_scid << 15) | (gpu_scid << 20); } + /* +* For targets with a MMU500, activate the slice but don't program the +* register. The XBL will take care of that. 
+*/ if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) { - u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); + if (!a6xx_gpu->have_mmu500) { + u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); - gpuhtw_scid &= 0x1f; - cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + gpuhtw_scid &= 0x1f; + cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + } } if (cntl1_regval) { @@ -1067,13 +1075,20 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) * Program the slice IDs for the various GPU blocks and GPU MMU * pagetables */ - a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); - - /* -* Program cacheability overrides to not allocate cache lines on -* a write miss -*/ - a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + if (a6xx_gpu->have_mmu500) + gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0), + cntl1_regval); + else { + a6xx_llc_write(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); + + /* +* Program cacheability overrides to not allocate cache +* lines on a write miss +*/ + a6xx_llc_rmw(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + } } } @@ -1086,10 +1101,21 @@ static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu) static void a6xx_llc_slices_init(struct platform_device *pdev, struct a6xx_gpu *a6xx_gpu) { + struct device_node *phandle; + a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx"); if (IS_ERR(a6xx_gpu->llc_mmio)) return; + /* +* There is a different programming path for targets with an mmu500 +* attached, so detect if that is the case +*/ + phandle = of_parse_phandle(pdev->dev.of_node, "iommus", 0); + a6xx_gpu->have_mmu500 = (phandle && + of_device_is_compatible(phandle, "arm,mmu500")); Note that this should never match, since the compatible string defined by the binding is "arm,mmu-500" ;) Robin. 
+ of_node_put(phandle); + a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU); a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW); diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 9e6079af679c..e793d329e77b 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -32,6 +32,7 @@ struct a6xx_gpu { void __iomem *llc_mmio; void *llc_slice; void *htw_llc_slice; + bool have_mmu500; }; #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
[Freedreno] [PATCH] drm/msm: Add missing stub definition
DRM_MSM fails to build with DRM_MSM_DP=n; add the missing stub. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index b9dd8f8f4887..0b2686b060c7 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -423,6 +423,11 @@ static inline int msm_dp_display_disable(struct msm_dp *dp, { return -EINVAL; } +static inline int msm_dp_display_pre_disable(struct msm_dp *dp, + struct drm_encoder *encoder) +{ + return -EINVAL; +} static inline void msm_dp_display_mode_set(struct msm_dp *dp, struct drm_display_mode *mode, struct drm_encoder *encoder, struct drm_display_mode *mode, -- 2.28.0.dirty
Re: [Freedreno] [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance
On 2020-10-07 07:25, Christoph Hellwig wrote: On Tue, Oct 06, 2020 at 09:19:32AM -0400, Jonathan Marek wrote: One example why drm/msm can't use DMA API is multiple page table support (that is landing in 5.10), which is something that definitely couldn't work with DMA API. Another one is being able to choose the address for mappings, which AFAIK DMA API can't do (somewhat related to this: qcom hardware often has ranges of allowed addresses, which the dma_mask mechanism fails to represent, what I see is drivers using dma_mask as a "maximum address", and since addresses are allocated from the top it generally works) That sounds like a good enough reason to use the IOMMU API. I just wanted to make sure this really makes sense. I still think this situation would be best handled with a variant of dma_ops_bypass that also guarantees to bypass SWIOTLB, and can be set automatically when attaching to an unmanaged IOMMU domain. That way the device driver can make DMA API calls in the appropriate places that do the right thing either way, and only needs logic to decide whether to use the returned DMA addresses directly or ignore them if it knows they're overridden by its own IOMMU mapping. Robin.
Re: [Freedreno] [PATCHv5 5/6] iommu: arm-smmu-impl: Use table to list QCOM implementations
On 2020-09-22 07:18, Sai Prakash Ranjan wrote: Use table and of_match_node() to match qcom implementation instead of multiple of_device_is_compatible() calls for each QCOM SMMU implementation. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index d199b4bff15d..ce78295cfa78 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -9,6 +9,13 @@ #include "arm-smmu.h" +static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { + { .compatible = "qcom,sc7180-smmu-500" }, + { .compatible = "qcom,sdm845-smmu-500" }, + { .compatible = "qcom,sm8150-smmu-500" }, + { .compatible = "qcom,sm8250-smmu-500" }, + { } +}; Can you push the table itself into arm-smmu-qcom? That way you'll be free to add new SoCs willy-nilly without any possibility of conflicting with anything else. Bonus points if you can fold in the Adreno variant and keep everything together ;) Robin. static int arm_smmu_gr0_ns(int offset) { @@ -217,10 +224,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) return nvidia_smmu_impl_init(smmu); - if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || - of_device_is_compatible(np, "qcom,sc7180-smmu-500") || - of_device_is_compatible(np, "qcom,sm8150-smmu-500") || - of_device_is_compatible(np, "qcom,sm8250-smmu-500")) + if (of_match_node(qcom_smmu_impl_of_match, np)) return qcom_smmu_impl_init(smmu); if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
Re: [Freedreno] [PATCHv4 1/6] iommu/io-pgtable-arm: Add support to use system cache
On 2020-09-21 19:03, Will Deacon wrote: On Fri, Sep 11, 2020 at 07:57:18PM +0530, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_SYS_CACHE to override the attributes set in TCR for the page table walker when using system cache. I wonder if the panfrost folks can reuse this for the issue discussed over at: https://lore.kernel.org/r/cover.1600213517.git.robin.mur...@arm.com Isn't this all hinged around the outer cacheability attribute, rather than shareability (since these are nominally NC mappings and thus already properly Osh)? The Panfrost issue is just about shareability domains being a bit wonky; the cacheability attributes there are actually reasonably normal (other than not having a non-cacheable type at all, only a choice of allocation policies...) Robin. However, Sai, your email setup went wrong when you posted this so you probably need to repost now that you have that fixed. Will ___ iommu mailing list io...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines
On 2020-09-11 17:21, Sai Prakash Ranjan wrote: On 2020-09-11 21:37, Will Deacon wrote: On Fri, Sep 11, 2020 at 05:03:06PM +0100, Robin Murphy wrote: BTW am I supposed to have received 3 copies of everything? Because I did... Yeah, this seems to be happening for all of Sai's emails :/ Sorry, I am not sure what went wrong as I only sent this once and there are no recent changes to any of my configs, I'll check it further. Actually on closer inspection it appears to be "correct" behaviour. I'm still subscribed to LAKML and the IOMMU list on this account, but normally Office 365 deduplicates so aggressively that I have rules set up to copy list mails that I'm cc'ed on back to my inbox, in case they arrive first and cause the direct copy to get eaten - apparently there's something unique about your email setup that manages to defeat the deduplicator and make it deliver all 3 copies intact... :/ Robin.
Re: [Freedreno] [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines
On 2020-09-11 15:28, Sai Prakash Ranjan wrote: There are few places in arm-smmu-impl where there are extra blank lines, remove them FWIW those were deliberate - sometimes I like a bit of subtle space to visually delineate distinct groups of definitions. I suppose it won't be to everyone's taste :/ and while at it fix the checkpatch warning for space required before the open parenthesis. That one, however, was not ;) BTW am I supposed to have received 3 copies of everything? Because I did... Robin. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index ce78295cfa78..f5b5218cbe5b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -19,7 +19,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { static int arm_smmu_gr0_ns(int offset) { - switch(offset) { + switch (offset) { case ARM_SMMU_GR0_sCR0: case ARM_SMMU_GR0_sACR: case ARM_SMMU_GR0_sGFSR: @@ -54,7 +54,6 @@ static const struct arm_smmu_impl calxeda_impl = { .write_reg = arm_smmu_write_ns, }; - struct cavium_smmu { struct arm_smmu_device smmu; u32 id_base; @@ -110,7 +109,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm return &cs->smmu; } - #define ARM_MMU500_ACTLR_CPRE (1 << 1) #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) @@ -197,7 +195,6 @@ static const struct arm_smmu_impl mrvl_mmu500_impl = { .reset = arm_mmu500_reset, }; - struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { const struct device_node *np = smmu->dev->of_node;
[Freedreno] [PATCH] drm/msm: Drop local dma_parms
Since commit 9495b7e92f71 ("driver core: platform: Initialize dma_parms for platform devices"), struct platform_device already provides a dma_parms structure, so we can save allocating another one. Also the DMA segment size is simply a size, not a bitmask. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 7d641c7e3514..2b73d29642b7 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -453,15 +453,7 @@ static int msm_drm_init(struct device *dev, struct drm_driver *drv) if (ret) goto err_msm_uninit; - if (!dev->dma_parms) { - dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms), - GFP_KERNEL); - if (!dev->dma_parms) { - ret = -ENOMEM; - goto err_msm_uninit; - } - } - dma_set_max_seg_size(dev, DMA_BIT_MASK(32)); + dma_set_max_seg_size(dev, UINT_MAX); msm_gem_shrinker_init(ddev); -- 2.28.0.dirty
Re: [Freedreno] [PATCH v9 12/32] drm: msm: fix common struct sg_table related issues
On 2020-08-26 07:32, Marek Szyprowski wrote: The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function returns the number of the created entries in the DMA address space. However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and dma_unmap_sg must be called with the original number of the entries passed to the dma_map_sg(). struct sg_table is a common structure used for describing a non-contiguous memory buffer, used commonly in the DRM and graphics subsystems. It consists of a scatterlist with memory pages and DMA addresses (sgl entry), as well as the number of scatterlist entries: CPU pages (orig_nents entry) and DMA mapped pages (nents entry). It turned out that it was a common mistake to misuse nents and orig_nents entries, calling DMA-mapping functions with a wrong number of entries or ignoring the number of mapped entries returned by the dma_map_sg() function. To avoid such issues, lets use a common dma-mapping wrappers operating directly on the struct sg_table objects and use scatterlist page iterators where possible. This, almost always, hides references to the nents and orig_nents entries, making the code robust, easier to follow and copy/paste safe. 
Signed-off-by: Marek Szyprowski Acked-by: Rob Clark --- drivers/gpu/drm/msm/msm_gem.c| 13 + drivers/gpu/drm/msm/msm_gpummu.c | 14 ++ drivers/gpu/drm/msm/msm_iommu.c | 2 +- 3 files changed, 12 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index b2f49152b4d4..8c7ae812b813 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -53,11 +53,10 @@ static void sync_for_device(struct msm_gem_object *msm_obj) struct device *dev = msm_obj->base.dev->dev; if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { - dma_sync_sg_for_device(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_sync_sgtable_for_device(dev, msm_obj->sgt, + DMA_BIDIRECTIONAL); } else { - dma_map_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_map_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0); } } @@ -66,11 +65,9 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj) struct device *dev = msm_obj->base.dev->dev; if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { - dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_sync_sgtable_for_cpu(dev, msm_obj->sgt, DMA_BIDIRECTIONAL); } else { - dma_unmap_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_unmap_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0); } } diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c index 310a31b05faa..319f06c28235 100644 --- a/drivers/gpu/drm/msm/msm_gpummu.c +++ b/drivers/gpu/drm/msm/msm_gpummu.c @@ -30,21 +30,19 @@ static int msm_gpummu_map(struct msm_mmu *mmu, uint64_t iova, { struct msm_gpummu *gpummu = to_msm_gpummu(mmu); unsigned idx = (iova - GPUMMU_VA_START) / GPUMMU_PAGE_SIZE; - struct scatterlist *sg; + struct sg_dma_page_iter dma_iter; unsigned prot_bits = 0; - unsigned i, j; if (prot & IOMMU_WRITE) prot_bits |= 1; if (prot & IOMMU_READ) prot_bits |= 2; - for_each_sg(sgt->sgl, sg, 
sgt->nents, i) { - dma_addr_t addr = sg->dma_address; - for (j = 0; j < sg->length / GPUMMU_PAGE_SIZE; j++, idx++) { - gpummu->table[idx] = addr | prot_bits; - addr += GPUMMU_PAGE_SIZE; - } + for_each_sgtable_dma_page(sgt, &dma_iter, 0) { + dma_addr_t addr = sg_page_iter_dma_address(&dma_iter); + + BUILD_BUG_ON(GPUMMU_PAGE_SIZE != PAGE_SIZE); + gpummu->table[idx++] = addr | prot_bits; Given that the BUILD_BUG_ON might prevent valid arm64 configs from building, how about a simple tweak like: for (i = 0; i < PAGE_SIZE; i += GPUMMU_PAGE_SIZE) gpummu->table[idx++] = i + addr | prot_bits; ? Or alternatively perhaps some more aggressive #ifdefs or makefile tweaks to prevent the GPUMMU code building for arm64 at all if it's only relevant to 32-bit platforms (which I believe might be the case). Robin. } /* we can improve by deferring flush for multiple map() */ diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..6c31e65834c6 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,7 +36,7 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c
On 2020-07-20 08:17, Arnd Bergmann wrote: On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju wrote: This kernel oops while boot linux mainline kernel on arm64 db410c device. metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f8456690ba8eb18ea4714e68554e242a04f65cff git describe: v5.8-rc5-48-gf8456690ba8e make_kernelversion: 5.8.0-rc5 kernel-config: https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config Thanks for the report. Adding freedreno folks to Cc, as this may have something to do with that driver. [5.444121] Unable to handle kernel NULL pointer dereference at virtual address 0018 [5.456615] ESR = 0x9604 [5.464471] SET = 0, FnV = 0 [5.464487] EA = 0, S1PTW = 0 [5.466521] Data abort info: [5.469971] ISV = 0, ISS = 0x0004 [5.472768] CM = 0, WnR = 0 [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000 [5.479349] [0018] pgd=, p4d= [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP [5.492448] Modules linked in: crct10dif_ce adv7511(+) qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns rmtfs_mem fuse [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 [5.522484] Hardware name: Qualcomm Technologies, Inc. 
APQ 8016 SBC (DT) [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--) [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 [5.541148] lr : free_io_pgtable_ops+0x28/0x58 [5.546350] sp : 80001219b5f0 [5.550689] x29: 80001219b5f0 x28: 0013 [5.554078] x27: 0100 x26: 36add3b8 [5.559459] x25: 8915e910 x24: 3a5458c0 [5.564753] x23: 0003 x22: 36a37058 [5.570049] x21: 36a3a100 x20: 36a3a480 [5.575344] x19: 36a37158 x18: [5.580639] x17: x16: [5.585935] x15: 0004 x14: 0368 [5.591229] x13: x12: 39c61798 [5.596525] x11: 39c616d0 x10: 4000 [5.601820] x9 : x8 : 39c616f8 [5.607114] x7 : x6 : 09f699a0 [5.612410] x5 : 80001219b520 x4 : 36a3a000 [5.617705] x3 : 09f69904 x2 : [5.623001] x1 : 8000107e27e8 x0 : 3a545810 [5.628297] Call trace: [5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8 This means that dev_iommu_fwspec_get() has returned NULL in qcom_iommu_tlb_inv_context(), either because dev->iommu is NULL, or because dev->iommu->fwspec is NULL. qcom_iommu_tlb_inv_context() does not check for a NULL pointer before using the returned object. The bug is either in the lack of error handling, or the fact that it's possible to get into this function for a device that has not been fully set up. Not quite - the device *was* properly set up, but has already been properly torn down again in the removal path by iommu_release_device(). The problem is that qcom-iommu kept the device pointer as its TLB cookie for the domain, but the domain has a longer lifespan than the validity of that device - that's a fundamental design flaw in the driver. Robin. 
[5.635764] free_io_pgtable_ops+0x28/0x58 [5.640624] qcom_iommu_domain_free+0x38/0x60 [5.644617] iommu_group_release+0x4c/0x70 [5.649045] kobject_put+0x6c/0x120 [5.653035] kobject_del+0x64/0x90 [5.656421] kobject_put+0xfc/0x120 [5.659893] iommu_group_remove_device+0xdc/0xf0 [5.663281] iommu_release_device+0x44/0x70 [5.668142] iommu_bus_notifier+0xbc/0xd0 [5.672048] notifier_call_chain+0x54/0x98 [5.676214] blocking_notifier_call_chain+0x48/0x70 [5.680209] device_del+0x26c/0x3a0 [5.684981] platform_device_del.part.0+0x1c/0x88 [5.688453] platform_device_unregister+0x24/0x40 [5.693316] of_platform_device_destroy+0xe4/0xf8 [5.698002] device_for_each_child+0x5c/0xa8 [5.702689] of_platform_depopulate+0x3c/0x80 [5.707144] msm_pdev_probe+0x1c4/0x308 [msm] It was triggered by a failure in msm_pdev_probe(), which calls of_platform_depopulate() in its error handling code. This is a combination of two problems: a) Whatever caused msm_pdev_probe() to fail means that the gpu won't be usable, though it should not have caused the kernel to crash. b) the error handling itself causes additional problems due to failed unwinding. [5.711286] platform_drv_probe+0x54/0xa8 [5.715624] really_probe+0xd8/0x320 [5.719617] driver_probe_device+0x58/0xb8 [5.723263] device_driver_attach+0x74/0x80 [5.727168] __driver_attach+0x58/0xe0 [5.731248] bus_for_each_d
Re: [Freedreno] [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
On 2020-06-26 21:04, Jordan Crouse wrote: Support auxiliary domains for arm-smmu-v2 to initialize and support multiple pagetables for a single SMMU context bank. Since the smmu-v2 hardware doesn't have any built in support for switching the pagetable base it is left as an exercise to the caller to actually use the pagetable. Hmm, I've still been thinking that we could model this as supporting exactly 1 aux domain iff the device is currently attached to a primary domain with TTBR1 enabled. Then supporting multiple aux domains with magic TTBR0 switching is the Adreno-specific extension on top of that. And if we don't want to go to that length, then - as I think Will was getting at - I'm not sure it's worth bothering at all. There doesn't seem to be any point in half-implementing a pretend aux domain interface while still driving a bus through the rest of the abstractions - it's really the worst of both worlds. If we're going to hand over the guts of io-pgtable to the GPU driver then couldn't it just use DOMAIN_ATTR_PGTABLE_CFG bidirectionally to inject a TTBR0 table straight into the TTBR1-ified domain? Much as I like the idea of the aux domain abstraction and making this fit semi-transparently into the IOMMU API, if an almost entirely private interface will be the simplest and cleanest way to get it done then at this point also I'm starting to lean towards just getting it done. But if some other mediated-device type case then turns up that doesn't quite fit that private interface, we revisit the proper abstraction again and I reserve the right to say "I told you so" ;) Robin. Aux domains are supported if split pagetable (TTBR1) support has been enabled on the master domain. Each auxiliary domain will reuse the configuration of the master domain. 
By default a domain with TTBR1 support will have the TTBR0 region disabled so the first attached aux domain will enable the TTBR0 region in the hardware and conversely the last domain to be detached will disable TTBR0 translations. All subsequent auxiliary domains create a pagetable but do not touch the hardware. The leaf driver will be able to query the physical address of the pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the address with whatever means it has to switch the pagetable base. Following is a pseudo code example of how a domain can be created /* Check to see if aux domains are supported */ if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { iommu = iommu_domain_alloc(...); if (iommu_aux_attach_device(domain, dev)) return FAIL; /* Save the base address of the pagetable for use by the driver iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase); } Then 'domain' can be used like any other iommu domain to map and unmap iova addresses in the pagetable. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 219 --- drivers/iommu/arm-smmu.h | 1 + 2 files changed, 204 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 060139452c54..ce6d654301bf 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -91,6 +91,7 @@ struct arm_smmu_cb { u32 tcr[2]; u32 mair[2]; struct arm_smmu_cfg *cfg; + atomic_t aux; }; struct arm_smmu_master_cfg { @@ -667,6 +668,86 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg); } +/* + * Update the context bank to enable TTBR0. Assumes AARCH64 S1 + * configuration.
+ */ +static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb, + struct io_pgtable_cfg *pgtbl_cfg) +{ + u32 tcr = cb->tcr[0]; + + /* Add the TCR configuration from the new pagetable config */ + tcr |= arm_smmu_lpae_tcr(pgtbl_cfg); + + /* Make sure that both TTBR0 and TTBR1 are enabled */ + tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1); + + /* Update the TCR register */ + cb->tcr[0] = tcr; + + /* Program the new TTBR0 */ + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid); +} + +/* + * This function assumes that the current model only allows aux domains for + * AARCH64 S1 configurations + */ +static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain, + struct arm_smmu_device *smmu, struct arm_smmu_cfg *master) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_cfg pgtbl_cfg; + + mutex_lock(&smmu_domain->init_mutex); + + /* Copy the configuration from the master */ + memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg)); + + s
Re: [Freedreno] [PATCH v2 4/6] drm/msm: Add support to create a local pagetable
On 2020-06-26 21:04, Jordan Crouse wrote: Add support to create a io-pgtable for use by targets that support per-instance pagetables. In order to support per-instance pagetables the GPU SMMU device needs to have the qcom,adreno-smmu compatible string and split pagetables and auxiliary domains need to be supported and enabled. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/msm_gpummu.c | 2 +- drivers/gpu/drm/msm/msm_iommu.c | 180 ++- drivers/gpu/drm/msm/msm_mmu.h| 16 ++- 3 files changed, 195 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c index 310a31b05faa..aab121f4beb7 100644 --- a/drivers/gpu/drm/msm/msm_gpummu.c +++ b/drivers/gpu/drm/msm/msm_gpummu.c @@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu) } gpummu->gpu = gpu; - msm_mmu_init(&gpummu->base, dev, &funcs); + msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU); return &gpummu->base; } diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 1b6635504069..f455c597f76d 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -4,15 +4,192 @@ * Author: Rob Clark */ +#include #include "msm_drv.h" #include "msm_mmu.h" struct msm_iommu { struct msm_mmu base; struct iommu_domain *domain; + struct iommu_domain *aux_domain; }; + #define to_msm_iommu(x) container_of(x, struct msm_iommu, base) +struct msm_iommu_pagetable { + struct msm_mmu base; + struct msm_mmu *parent; + struct io_pgtable_ops *pgtbl_ops; + phys_addr_t ttbr; + u32 asid; +}; + +static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu) +{ + return container_of(mmu, struct msm_iommu_pagetable, base); +} + +static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, + size_t size) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + size_t unmapped = 0; + + /* Unmap the block one page at a time */ + 
while (size) { + unmapped += ops->unmap(ops, iova, 4096, NULL); + iova += 4096; + size -= 4096; + } + + iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain); + + return (unmapped == size) ? 0 : -EINVAL; +} Remember in patch #1 when you said "Then 'domain' can be used like any other iommu domain to map and unmap iova addresses in the pagetable."? This appears to be very much not that :/ Robin. + +static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova, + struct sg_table *sgt, size_t len, int prot) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + struct scatterlist *sg; + size_t mapped = 0; + u64 addr = iova; + unsigned int i; + + for_each_sg(sgt->sgl, sg, sgt->nents, i) { + size_t size = sg->length; + phys_addr_t phys = sg_phys(sg); + + /* Map the block one page at a time */ + while (size) { + if (ops->map(ops, addr, phys, 4096, prot)) { + msm_iommu_pagetable_unmap(mmu, iova, mapped); + return -EINVAL; + } + + phys += 4096; + addr += 4096; + size -= 4096; + mapped += 4096; + } + } + + return 0; +} + +static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + + free_io_pgtable_ops(pagetable->pgtbl_ops); + kfree(pagetable); +} + +/* + * Given a parent device, create and return an aux domain. 
This will enable the + * TTBR0 region + */ +static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent) +{ + struct msm_iommu *iommu = to_msm_iommu(parent); + struct iommu_domain *domain; + int ret; + + if (iommu->aux_domain) + return iommu->aux_domain; + + if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX)) + return ERR_PTR(-ENODEV); + + domain = iommu_domain_alloc(&platform_bus_type); + if (!domain) + return ERR_PTR(-ENODEV); + + ret = iommu_aux_attach_device(domain, parent->dev); + if (ret) { + iommu_domain_free(domain); + return ERR_PTR(ret); + } + + iommu->aux_domain = domain; + return domain; +} + +int msm_iommu_pagetable_params(struct msm_mmu *mmu, + phys_addr_t *ttbr, int *asid) +{ + struct msm_iommu_pagetable *pagetable; + + if (mmu->type != MSM_MMU_IOMMU_PAGETABLE) + return
Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations
On 2020-06-26 21:04, Jordan Crouse wrote: Allow an io-pgtable implementation to skip TLB operations by checking for NULL pointers in the helper functions. It will be up to the owner of the io-pgtable instance to make sure that they independently handle the TLB correctly. I don't really understand what this is for - tricking the IOMMU driver into not performing its TLB maintenance at points when that maintenance has been deemed necessary doesn't seem like the appropriate way to achieve anything good :/ Robin. Signed-off-by: Jordan Crouse --- include/linux/io-pgtable.h | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 53d53c6c2be9..bbed1d3925ba 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -210,21 +210,24 @@ struct io_pgtable { static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop) { - iop->cfg.tlb->tlb_flush_all(iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_all(iop->cookie); } static inline void io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); } static inline void io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); } static inline void @@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop, struct iommu_iotlb_gather * gather, unsigned long iova, size_t granule) { - if (iop->cfg.tlb->tlb_add_page) + if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page) iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie); } ___ Freedreno mailing list Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH] dt-bindings: arm-smmu: update the list of clocks
[ /me fires off MAINTAINERS patch... ] On 20/02/2020 9:02 pm, Matthias Kaehlcke wrote: On Thu, Feb 20, 2020 at 01:42:22PM +0530, Sharat Masetty wrote: This patch adds a clock definition needed for powering on the GPU TBUs and the GPU TCU. Signed-off-by: Sharat Masetty --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 6515dbe..235c0df 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -28,6 +28,7 @@ properties: - enum: - qcom,msm8996-smmu-v2 - qcom,msm8998-smmu-v2 + - qcom,sc7180-smmu-v2 The addition of the compatible string isn't (directly) related with $subject, this should be done in a separate patch. ...or the patch should just be accurately described as "add support for new variant X, which requires an additional clock Y". And speaking of which, can anyone clarify how "TCU and TBU core clock" manages to abbreviate to "mem_iface_clk"? I would have naively assumed something like "core" would be most logical. Robin.
Re: [Freedreno] [PATCH v2] iommu/arm-smmu: fix "hang" when games exit
On 2019-10-28 10:38 pm, Rob Clark wrote: On Mon, Oct 28, 2019 at 3:20 PM Will Deacon wrote: Hi Rob, On Mon, Oct 07, 2019 at 01:49:06PM -0700, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system just locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Please can you reword the subject to be a bit more useful? The commit message is great, but the subject is a bit like "fix bug in code" to me. yeah, not the best $subject, but I wasn't quite sure how to fit something better in a reasonable # of chars.. maybe something like: "iommu/arm-smmu: optimize unmap but avoiding toggling runpm state"? FWIW, I'd be inclined to frame it as something like "avoid pathological RPM behaviour for unmaps". Robin.
Re: [Freedreno] [PATCH v2] iommu/arm-smmu: fix "hang" when games exit
On 2019-10-07 9:49 pm, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system just locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Reviewed-by: Robin Murphy Signed-off-by: Rob Clark --- v1: original v2: unconditionally use autosuspend, rather than deciding based on what consumer does drivers/iommu/arm-smmu.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 3f1d55fb43c4..b7b41f5001bc 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -289,7 +289,7 @@ static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) { if (pm_runtime_enabled(smmu->dev)) - pm_runtime_put(smmu->dev); + pm_runtime_put_autosuspend(smmu->dev); } static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -1445,6 +1445,9 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) /* Looks ok, so add the device to the domain */ ret = arm_smmu_domain_add_master(smmu_domain, fwspec); + pm_runtime_set_autosuspend_delay(smmu->dev, 20); + pm_runtime_use_autosuspend(smmu->dev); + rpm_put: arm_smmu_rpm_put(smmu); return ret;
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 2019-09-26 11:44 am, Nicolas Saenz Julienne wrote: Robin, have you looked into supporting multiple dma-ranges? It's the next thing we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the works already. Multiple dma-ranges as far as configuring inbound windows should work already other than the bug when there's any parent translation. But if you mean supporting multiple DMA offsets and masks per device in the DMA API, there's nothing in the works yet. Sorry, I meant supporting multiple DMA offsets[1]. I think I could still make it with a single DMA mask though. The main problem for supporting that case in general is the disgusting carving up of the physical memory map you may have to do to guarantee that a single buffer allocation cannot ever span two windows with different offsets. I don't think we ever reached a conclusion on whether that was even achievable in practice. There's also the in-between step of making of_dma_get_range() return a size based on all the dma-ranges entries rather than only the first one - otherwise, something like [1] can lead to pretty unworkable default masks. We implemented that when doing acpi_dma_get_range(), it's just that the OF counterpart never caught up. Right. I suppose we assume any holes in the ranges are addressable by the device but won't get used for other reasons (such as no memory there). However, to be correct, the range of the dma offset plus mask would need to be within the min start and max end addresses. IOW, while we need to round up (0xa_8000_ - 0x2c1c_) to the next power of 2, the 'correct' thing to do is round down. IIUC I also have this issue on my list. The RPi4 PCIe block has an integration bug that only allows DMA to the lower 3GB. With dma-ranges of size 0xc000_ you get a 32bit DMA mask which is not what you need.
So far I faked it in the device-tree but I guess it would be better to add an extra check in of_dma_configure(), decrease the mask and print some kind of warning stating that DMA addressing is suboptimal. Yeah, there's just no way for masks to describe that the device can drive all the individual bits, just not in certain combinations :( The plan I have sketched out there is to merge dma_pfn_offset and bus_dma_mask into a "DMA range" descriptor, so we can then hang one or more of those off a device to properly cope with all these weird interconnects. Conceptually it feels pretty straightforward; I think most of the challenge is in implementing it efficiently. Plus there's the question of whether it could also subsume the dma_mask as well. Robin.
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 25/09/2019 17:16, Rob Herring wrote: On Wed, Sep 25, 2019 at 10:30 AM Nicolas Saenz Julienne wrote: On Wed, 2019-09-25 at 16:09 +0100, Robin Murphy wrote: On 25/09/2019 15:52, Nicolas Saenz Julienne wrote: On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote: On Tue, Sep 24, 2019 at 1:12 PM Nicolas Saenz Julienne wrote: Hi All, this series tries to address one of the issues blocking us from upstreaming Broadcom's STB PCIe controller[1]. Namely, the fact that devices not represented in DT which sit behind a PCI bus fail to get the bus' DMA addressing constraints. This is due to the fact that of_dma_configure() assumes it's receiving a DT node representing the device being configured, as opposed to the PCIe bridge node we currently pass. This causes the code to directly jump into PCI's parent node when checking for 'dma-ranges' and misses whatever was set there. To address this I create a new API in OF - inspired by Robin Murphy's original proposal[2] - which accepts a bus DT node as its input in order to configure a device's DMA constraints. The changes go deep into of/address.c's implementation, as the assumption that a device has a DT node was pretty strong. On top of this work, I also cleaned up of_dma_configure() removing its redundant arguments and creating an alternative function for the special cases not applicable to either the above case or the default usage. IMO the resulting functions are more explicit. They will probably surface some hacky usages that can be properly fixed as I show with the DT fixes on the Layerscape platform. This was also tested on a Raspberry Pi 4 with a custom PCIe driver and on a Seattle AMD board. Humm, I've been working on this issue too. Looks similar though yours has a lot more churn and there's some other bugs I've found. That's good news, and yes now that I see it, some stuff on my series is overly complicated. Especially around of_translate_*().
On top of that, you removed in of_dma_get_range(): - /* -* At least empty ranges has to be defined for parent node if -* DMA is supported -*/ - if (!ranges) - break; Which I assumed was bound to the standard and makes things easier. Can you test out this branch[1]. I don't have any h/w needing this, but wrote a unittest and tested with modified QEMU. I reviewed everything, I did find a minor issue, see the patch attached. WRT that patch, the original intent of "force_dma" was purely to consider a device DMA-capable regardless of the presence of "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF device has always been bogus - magic paravirt devices which appear out of nowhere and expect to be treated as genuine DMA masters are a separate problem that we haven't really approached yet. I agree it's clearly abusing the function. I have no problem with the behaviour change if it's OK with you. Thinking about it, you could probably just remove that call from the Xen DRM driver now anyway - since the dma-direct rework, we lost the ability to set dma_dummy_ops by default, and NULL ops now represent what it (presumably) wants. Robin, have you looked into supporting multiple dma-ranges? It's the next thing we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the works already. Multiple dma-ranges as far as configuring inbound windows should work already other than the bug when there's any parent translation. But if you mean supporting multiple DMA offsets and masks per device in the DMA API, there's nothing in the works yet. There's also the in-between step of making of_dma_get_range() return a size based on all the dma-ranges entries rather than only the first one - otherwise, something like [1] can lead to pretty unworkable default masks. We implemented that when doing acpi_dma_get_range(), it's just that the OF counterpart never caught up. Robin. 
[1] http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a2814af56b3486c2985a95540a88d8f9fa3a699f
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 25/09/2019 15:52, Nicolas Saenz Julienne wrote: On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote: On Tue, Sep 24, 2019 at 1:12 PM Nicolas Saenz Julienne wrote: Hi All, this series tries to address one of the issues blocking us from upstreaming Broadcom's STB PCIe controller[1]. Namely, the fact that devices not represented in DT which sit behind a PCI bus fail to get the bus' DMA addressing constraints. This is due to the fact that of_dma_configure() assumes it's receiving a DT node representing the device being configured, as opposed to the PCIe bridge node we currently pass. This causes the code to directly jump into PCI's parent node when checking for 'dma-ranges' and misses whatever was set there. To address this I create a new API in OF - inspired by Robin Murphy's original proposal[2] - which accepts a bus DT node as its input in order to configure a device's DMA constraints. The changes go deep into of/address.c's implementation, as the assumption that a device has a DT node was pretty strong. On top of this work, I also cleaned up of_dma_configure() removing its redundant arguments and creating an alternative function for the special cases not applicable to either the above case or the default usage. IMO the resulting functions are more explicit. They will probably surface some hacky usages that can be properly fixed as I show with the DT fixes on the Layerscape platform. This was also tested on a Raspberry Pi 4 with a custom PCIe driver and on a Seattle AMD board. Humm, I've been working on this issue too. Looks similar though yours has a lot more churn and there's some other bugs I've found. That's good news, and yes now that I see it, some stuff on my series is overly complicated. Especially around of_translate_*(). On top of that, you removed in of_dma_get_range(): - /* -* At least empty ranges has to be defined for parent node if -* DMA is supported -*/ - if (!ranges) - break; Which I assumed was bound to the standard and makes things easier.
Can you test out this branch[1]. I don't have any h/w needing this, but wrote a unittest and tested with modified QEMU. I reviewed everything, I did find a minor issue, see the patch attached. WRT that patch, the original intent of "force_dma" was purely to consider a device DMA-capable regardless of the presence of "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF device has always been bogus - magic paravirt devices which appear out of nowhere and expect to be treated as genuine DMA masters are a separate problem that we haven't really approached yet. Robin.
Re: [Freedreno] [PATCH v3 0/2] iommu: handle drivers that manage iommu directly
On 06/09/2019 22:44, Rob Clark wrote: From: Rob Clark One of the challenges we have to enable the aarch64 laptops upstream is dealing with the fact that the bootloader enables the display and takes the corresponding SMMU context-bank out of BYPASS. Unfortunately, currently, the IOMMU framework attaches a DMA (or potentially an IDENTITY) domain before the driver is probed and has a chance to intervene and shut down scanout. Which makes things go horribly wrong. Nope, things already went horribly wrong in arm_smmu_device_reset() - sure, sometimes for some configurations it might *seem* like they didn't and that you can fudge the context bank state at arm's length from core code later, but the truth is that impl->cfg_probe is your last chance to guarantee that any necessary SMMU state is preserved. The remainder of the problem involves reworking default domain allocation such that we can converge on what iommu_request_dm_for_dev() currently does but without the momentary attachment to a translation domain to cause hiccups. That's starting here: https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prak...@intel.com/ But in this case, drm/msm is already managing its IOMMUs directly, the DMA API attached iommu_domain simply gets in the way. This series adds a way that a driver can indicate to drivers/iommu that it does not wish to have a DMA-managed iommu_domain attached. This way, drm/msm can shut down scanout cleanly before attaching its own iommu_domain. NOTE that to get things working with arm-smmu on the aarch64 laptops, you also need a patchset[1] from Bjorn Andersson to inherit SMMU config at boot, when it is already enabled. [1] https://www.spinics.net/lists/arm-kernel/msg732246.html NOTE that in discussion of previous revisions, RMRR came up. This is not really a replacement for RMRR (nor does RMRR really provide any more information than we already get from EFI GOP, or DT in the simplefb case).
I also don't see how RMRR could help w/ SMMU handover of CB/SMR config (Bjorn's patchset[1]) without defining new tables. The point of RMRR-like-things is that they identify not just the memory region but also the specific device accessing them, which means the IOMMU driver knows up-front which IDs etc. it must be careful not to disrupt. Obviously for SMMU that *would* be some new table (designed to encompass everything relevant) since literal RMRRs are specifically an Intel VT-d thing. This perhaps doesn't solve the more general case of bootloader enabled display for drivers that actually want to use DMA API managed IOMMU. But it does also happen to avoid a related problem with GPU, caused by the DMA domain claiming the context bank that the GPU firmware expects to use. Careful bringing that up again, or I really will rework the context bank allocator to avoid this default domain problem entirely... ;) Robin. And it avoids spurious TLB invalidation coming from the unused DMA domain. So IMHO this is a useful and necessary change. Rob Clark (2): iommu: add support for drivers that manage iommu explicitly drm/msm: mark devices where iommu is managed by driver drivers/gpu/drm/msm/adreno/adreno_device.c | 1 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 1 + drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 1 + drivers/gpu/drm/msm/msm_drv.c | 1 + drivers/iommu/iommu.c | 2 +- drivers/iommu/of_iommu.c | 3 +++ include/linux/device.h | 3 ++- 7 files changed, 10 insertions(+), 2 deletions(-)
Re: [Freedreno] [PATCH] iommu/arm-smmu: fix "hang" when games exit
On 07/09/2019 18:50, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system is locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Signed-off-by: Rob Clark --- Note: I've tied the autosuspend enable/delay to the consumer device, based on the reasoning that if the consumer device benefits from using an autosuspend delay, then its corresponding SMMU probably does too. Maybe that is overkill and we should just unconditionally enable autosuspend. I'm not sure there's really any reason to expect that a supplier's usage model when doing things for itself bears any relation to that of its consumer(s), so I'd certainly lean towards the "unconditional" argument myself. Of course ideally we'd skip resuming altogether in the map/unmap paths (since resume implies a full TLB reset anyway), but IIRC that approach started to get messy in the context of the initial RPM patchset. I'm planning to fiddle around a bit more to clean up the implementation of the new iommu_flush_ops stuff, so I've made a note to myself to revisit RPM to see if there's a sufficiently clean way to do better. In the meantime, though, I don't have any real objection to using some reasonable autosuspend delay on the principle that if we've been woken up to map/unmap one page, there's a high likelihood that more will follow in short order (and in the configuration slow-paths it won't have much impact either way). Robin.
drivers/iommu/arm-smmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c2733b447d9c..73a0dd53c8a3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -289,7 +289,7 @@ static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) { if (pm_runtime_enabled(smmu->dev)) - pm_runtime_put(smmu->dev); + pm_runtime_put_autosuspend(smmu->dev); } static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -1445,6 +1445,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) /* Looks ok, so add the device to the domain */ ret = arm_smmu_domain_add_master(smmu_domain, fwspec); +#ifdef CONFIG_PM + /* TODO maybe device_link_add() should do this for us? */ + if (dev->power.use_autosuspend) { + pm_runtime_set_autosuspend_delay(smmu->dev, + dev->power.autosuspend_delay); + pm_runtime_use_autosuspend(smmu->dev); + } +#endif + rpm_put: arm_smmu_rpm_put(smmu); return ret;
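The failure mode Rob describes, and why an autosuspend delay helps, can be illustrated with a toy userspace model. This is not kernel code: rpm_get()/rpm_put() below are simplified stand-ins for the pm_runtime API, and a "resume" stands for the costly context-bank reprogramming.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: device suspends on last put (autosuspend_ns == 0, like
 * pm_runtime_put()) or after an idle delay (like put_autosuspend()). */
struct rpm_dev {
    bool suspended;
    int resumes;          /* counts arm_smmu_write_context_bank() etc. */
    long autosuspend_ns;  /* 0 => suspend immediately on last put */
    long last_put_ns;
    int usecount;
};

static void rpm_get(struct rpm_dev *d, long now_ns)
{
    if (d->usecount++ == 0) {
        if (!d->suspended && d->autosuspend_ns &&
            now_ns - d->last_put_ns > d->autosuspend_ns)
            d->suspended = true;          /* autosuspend timer fired */
        if (d->suspended) {
            d->suspended = false;
            d->resumes++;                 /* full reprogram + TLB reset */
        }
    }
}

static void rpm_put(struct rpm_dev *d, long now_ns)
{
    if (--d->usecount == 0) {
        d->last_put_ns = now_ns;
        if (!d->autosuspend_ns)
            d->suspended = true;          /* pm_runtime_put() behaviour */
    }
}

/* Unmap n buffers arriving 1us apart; count resume cycles incurred. */
static int resumes_for_unmaps(long autosuspend_ns, int n)
{
    struct rpm_dev d = { .suspended = true,
                         .autosuspend_ns = autosuspend_ns };
    for (int i = 0; i < n; i++) {
        rpm_get(&d, (long)i * 1000);
        rpm_put(&d, (long)i * 1000 + 1);
    }
    return d.resumes;
}
```

With immediate suspend every unmap in a burst pays a resume; with any delay longer than the inter-unmap gap, the whole burst pays one, which is the "if we've been woken up to map/unmap one page, more will follow" principle in miniature.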
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
On 16/08/2019 19:12, Rob Clark wrote: On Fri, Aug 16, 2019 at 9:58 AM Robin Murphy wrote: Hi Jordan, On 15/08/2019 16:33, Jordan Crouse wrote: On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: (Sigh, resend. I freaked out my SMTP server) This is part of an ongoing evolution for enabling split pagetable support for arm-smmu. Previous versions can be found [1]. In the discussion for v2 Robin pointed out that this is a very Adreno specific use case and that is exactly true. Not only do we want to configure and use a pagetable in the TTBR1 space, we also want to configure the TTBR0 region but not allocate a pagetable for it or touch it until the GPU hardware does so. As much as I want it to be a generic concept it really isn't. This revision leans into that idea. Most of the same io-pgtable code is there but now it is wrapped as an Adreno GPU specific format that is selected by the compatible string in the arm-smmu device. Additionally, per Robin's suggestion we are skipping creating a TTBR0 pagetable to save on wasted memory. This isn't as clean as I would like it to be but I think that this is a better direction than trying to pretend that the generic format would work. I'm tempting fate by posting this and then taking some time off, but I wanted to try to kick off a conversation or at least get some flames so I can try to refine this again next week. Please take a look and give some advice on the direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? How many different Adreno drivers would benefit from sharing it? Hypothetically everything back to a3xx, so I *could* see usefulness of this in qcom_iommu (or maybe even msm-iommu). OTOH maybe with "modularizing" arm-smmu we could re-combine qcom_iommu and arm-smmu. Indeed, that's certainly something I'm planning to investigate as a future refactoring step. 
And as a practical matter, I'm not sure if anyone will get around to backporting per-context pagetables as far back as a3xx. BR, -R The more I come back to this, the more I'm convinced that io-pgtable should focus on the heavy lifting of pagetable management - the code that nobody wants to have to write at all, let alone more than once - and any subtleties which aren't essential to that should be pushed back into whichever callers actually care. Consider that already, literally no caller actually uses an unmodified stage 1 TCR value as provided in the io_pgtable_cfg. I feel it would be most productive to elaborate further in the form of patches, so let me get right on that and try to bash something out before I go home tonight... ...and now there's a rough WIP branch here: http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/pgtable I'll finish testing and polishing those patches at some point next week, probably, but hopefully they're sufficiently illustrative for the moment. Robin.
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
Hi Jordan, On 15/08/2019 16:33, Jordan Crouse wrote: On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: (Sigh, resend. I freaked out my SMTP server) This is part of an ongoing evolution for enabling split pagetable support for arm-smmu. Previous versions can be found [1]. In the discussion for v2 Robin pointed out that this is a very Adreno specific use case and that is exactly true. Not only do we want to configure and use a pagetable in the TTBR1 space, we also want to configure the TTBR0 region but not allocate a pagetable for it or touch it until the GPU hardware does so. As much as I want it to be a generic concept it really isn't. This revision leans into that idea. Most of the same io-pgtable code is there but now it is wrapped as an Adreno GPU specific format that is selected by the compatible string in the arm-smmu device. Additionally, per Robin's suggestion we are skipping creating a TTBR0 pagetable to save on wasted memory. This isn't as clean as I would like it to be but I think that this is a better direction than trying to pretend that the generic format would work. I'm tempting fate by posting this and then taking some time off, but I wanted to try to kick off a conversation or at least get some flames so I can try to refine this again next week. Please take a look and give some advice on the direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? How many different Adreno drivers would benefit from sharing it? The more I come back to this, the more I'm convinced that io-pgtable should focus on the heavy lifting of pagetable management - the code that nobody wants to have to write at all, let alone more than once - and any subtleties which aren't essential to that should be pushed back into whichever callers actually care. 
Consider that already, literally no caller actually uses an unmodified stage 1 TCR value as provided in the io_pgtable_cfg. I feel it would be most productive to elaborate further in the form of patches, so let me get right on that and try to bash something out before I go home tonight... Robin. If so, then is adding the changes to io-pgtable-arm.c possible for 5.4 and then add the implementation specific code on top of Robin's stack later or do you feel they should come as part of a package deal? Jordan Jordan Crouse (2): iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable format iommu/arm-smmu: Add support for Adreno GPU pagetable formats drivers/iommu/arm-smmu.c | 8 +- drivers/iommu/io-pgtable-arm.c | 214 ++--- drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 2 + 4 files changed, 209 insertions(+), 16 deletions(-) -- 2.7.4
Re: [Freedreno] [RESEND PATCH v2 3/3] iommu/arm-smmu: Add support for DOMAIN_ATTR_SPLIT_TABLES
On 08/07/2019 20:00, Jordan Crouse wrote: When DOMAIN_ATTR_SPLIT_TABLES is specified for a domain, pass ARM_64_LPAE_SPLIT_S1 to io_pgtable_ops to allocate and initialize TTBR0 and TTBR1 pagetables. v3: Moved all the pagetable specific work into io-pgtable-arm in a previous patch. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 653b6b3..7a6b4bb 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -257,6 +257,7 @@ struct arm_smmu_domain { bool non_strict; struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -832,7 +833,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, ias = smmu->va_size; oas = smmu->ipa_size; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) { - fmt = ARM_64_LPAE_S1; + if (smmu_domain->attributes & + (1 << DOMAIN_ATTR_SPLIT_TABLES)) + fmt = ARM_64_LPAE_SPLIT_S1; + else + fmt = ARM_64_LPAE_S1; } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) { fmt = ARM_32_LPAE_S1; ias = min(ias, 32UL); @@ -1582,6 +1587,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_SPLIT_TABLES: + *(int *)data = !!(smmu_domain->attributes & + (1 << DOMAIN_ATTR_SPLIT_TABLES)); I'm not really a fan of always claiming to support this but silently ignoring it depending on hardware/kernel configuration - it seems somewhat tricky to make callers properly robust against making false assumptions (e.g. consider if the system is booted with "arm-smmu.force_stage=2"). Robin. 
+ return 0; default: return -ENODEV; } @@ -1622,6 +1631,11 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break; + case DOMAIN_ATTR_SPLIT_TABLES: + if (*((int *)data)) + smmu_domain->attributes |= + (1 << DOMAIN_ATTR_SPLIT_TABLES); + break; default: ret = -ENODEV; }
Re: [Freedreno] [RESEND PATCH v2 2/3] iommu/io-pgtable-arm: Add support for AARCH64 split pagetables
Hi Jordan, On 08/07/2019 20:00, Jordan Crouse wrote: Add a new sub-format ARM_64_LPAE_SPLIT_S1 to create and set up split pagetables (TTBR0 and TTBR1). The initialization function sets up the correct va_size and sign extension bits and programs the TCR registers. Split pagetable formats use their own map/unmap wrappers to ensure that the correct pagetable is selected based on the incoming iova, but most of the heavy lifting is common to the other formats. I'm somewhat concerned that this implementation is very closely tied to the current Adreno use-case, and won't easily generalise in future to other potential TTBR1 uses which have been tossed around, like SMMUv3-without-substream-ID. Furthermore, even for the Adreno pretend-PASID case it appears to be a bit too fragile for comfort - given that a DOMAIN_ATTR_SPLIT_TABLES domain doesn't look any different from a regular one from the users' point of view, what's to stop them making "without PASID" mappings in the lower half of the address space, and thus unwittingly pulling the rug out from under their own feet upon attaching an aux domain? In fact allocating a TTBR0 table at all for the main domain seems like little more than a waste of memory. 
Signed-off-by: Jordan Crouse --- drivers/iommu/io-pgtable-arm.c | 261 + drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 2 + 3 files changed, 240 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 161a7d56..aec35e5 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -118,7 +118,12 @@ #define ARM_LPAE_TCR_TG0_64K (1 << 14) #define ARM_LPAE_TCR_TG0_16K (2 << 14) +#define ARM_LPAE_TCR_TG1_4K (0 << 30) +#define ARM_LPAE_TCR_TG1_64K (1 << 30) +#define ARM_LPAE_TCR_TG1_16K (2 << 30) + #define ARM_LPAE_TCR_SH0_SHIFT12 +#define ARM_LPAE_TCR_SH1_SHIFT 28 #define ARM_LPAE_TCR_SH0_MASK 0x3 #define ARM_LPAE_TCR_SH_NS0 #define ARM_LPAE_TCR_SH_OS2 @@ -126,6 +131,8 @@ #define ARM_LPAE_TCR_ORGN0_SHIFT 10 #define ARM_LPAE_TCR_IRGN0_SHIFT 8 +#define ARM_LPAE_TCR_ORGN1_SHIFT 26 +#define ARM_LPAE_TCR_IRGN1_SHIFT 24 #define ARM_LPAE_TCR_RGN_MASK 0x3 #define ARM_LPAE_TCR_RGN_NC 0 #define ARM_LPAE_TCR_RGN_WBWA 1 @@ -136,6 +143,7 @@ #define ARM_LPAE_TCR_SL0_MASK 0x3 #define ARM_LPAE_TCR_T0SZ_SHIFT 0 +#define ARM_LPAE_TCR_T1SZ_SHIFT16 #define ARM_LPAE_TCR_SZ_MASK 0xf #define ARM_LPAE_TCR_PS_SHIFT 16 @@ -152,6 +160,14 @@ #define ARM_LPAE_TCR_PS_48_BIT0x5ULL #define ARM_LPAE_TCR_PS_52_BIT0x6ULL +#define ARM_LPAE_TCR_SEP_SHIFT 47 +#define ARM_LPAE_TCR_SEP_31(0x0ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_35(0x1ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_39(0x2ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_41(0x3ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_43(0x4ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_UPSTREAM (0x7ULL << ARM_LPAE_TCR_SEP_SHIFT) This is a specific detail of SMMUv2, and nothing to do with the LPAE/AArch64 VMSA formats. 
+ #define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3) #define ARM_LPAE_MAIR_ATTR_MASK 0xff #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04 @@ -179,11 +195,12 @@ struct arm_lpae_io_pgtable { struct io_pgtable iop; int levels; + u32 sep; size_t pgd_size; unsigned long pg_shift; unsigned long bits_per_level; - void *pgd; + void*pgd[2]; }; typedef u64 arm_lpae_iopte; @@ -426,7 +443,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, arm_lpae_iopte pte; if (data->iop.fmt == ARM_64_LPAE_S1 || - data->iop.fmt == ARM_32_LPAE_S1) { + data->iop.fmt == ARM_32_LPAE_S1 || + data->iop.fmt == ARM_64_LPAE_SPLIT_S1) { pte = ARM_LPAE_PTE_nG; if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) pte |= ARM_LPAE_PTE_AP_RDONLY; @@ -470,11 +488,10 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, return pte; } -static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, - phys_addr_t paddr, size_t size, int iommu_prot) +static int _arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, + phys_addr_t paddr, size_t size, int iommu_prot, + arm_lpae_iopte *ptep) { - struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); -
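The "correct pagetable is selected based on the incoming iova" idea in the commit message boils down to testing the sign-extension bit (SEP). This standalone sketch uses made-up helper names (the patch's actual wrappers are only partially quoted above): an IOVA with the SEP bit set walks the TTBR1 table (pgd[1]), otherwise TTBR0 (pgd[0]), and a valid IOVA must be canonical, i.e. every bit above the SEP must equal the SEP bit itself.

```c
#include <assert.h>
#include <stdint.h>

/* Which table does this IOVA belong to? 0 => TTBR0, 1 => TTBR1. */
static int split_table_index(uint64_t iova, unsigned int sep)
{
    return (int)((iova >> sep) & 1);
}

/* Canonical-address check: bits above the SEP must sign-extend bit SEP.
 * (Relies on arithmetic right shift of a negative int64_t, which is the
 * behaviour on all mainstream compilers, though implementation-defined.) */
static int split_iova_valid(uint64_t iova, unsigned int sep)
{
    int64_t ext = (int64_t)(iova << (63 - sep)) >> (63 - sep);
    return (uint64_t)ext == iova;
}
```

This also shows why Robin's "rug pulled out" concern bites: a low-half ("without PASID") mapping and a TTBR0 aux-domain mapping land in the same `split_table_index() == 0` half of the space.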
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
On 03/06/2019 11:47, Rob Clark wrote: On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa wrote: On Mon, Jun 3, 2019 at 4:40 AM Rob Clark wrote: On Fri, May 10, 2019 at 7:35 AM Rob Clark wrote: On Tue, Dec 4, 2018 at 2:29 PM Rob Herring wrote: On Sat, Dec 1, 2018 at 10:54 AM Rob Clark wrote: This solves a problem we see with drm/msm, caused by getting iommu_dma_ops while we attach our own domain and manage it directly at the iommu API level: [0038] user address but active_mm is swapper Internal error: Oops: 9605 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 #90 Hardware name: xxx (DT) Workqueue: events deferred_probe_work_func pstate: 80c9 (Nzcv daif +PAN +UAO) pc : iommu_dma_map_sg+0x7c/0x2c8 lr : iommu_dma_map_sg+0x40/0x2c8 sp : ff80095eb4f0 x29: ff80095eb4f0 x28: x27: ffc0f9431578 x26: x25: x24: 0003 x23: 0001 x22: ffc0fa9ac010 x21: x20: ffc0fab40980 x19: ffc0fab40980 x18: 0003 x17: 01c4 x16: 0007 x15: 000e x14: x13: x12: 0028 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : x8 : ffc0fab409a0 x7 : x6 : 0002 x5 : 0001 x4 : x3 : 0001 x2 : 0002 x1 : ffc0f9431578 x0 : Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) Call trace: iommu_dma_map_sg+0x7c/0x2c8 __iommu_map_sg_attrs+0x70/0x84 get_pages+0x170/0x1e8 msm_gem_get_iova+0x8c/0x128 _msm_gem_kernel_new+0x6c/0xc8 msm_gem_kernel_new+0x4c/0x58 dsi_tx_buf_alloc_6g+0x4c/0x8c msm_dsi_host_modeset_init+0xc8/0x108 msm_dsi_modeset_init+0x54/0x18c _dpu_kms_drm_obj_init+0x430/0x474 dpu_kms_hw_init+0x5f8/0x6b4 msm_drm_bind+0x360/0x6c8 try_to_bring_up_master.part.7+0x28/0x70 component_master_add_with_match+0xe8/0x124 msm_pdev_probe+0x294/0x2b4 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __device_attach_driver+0xa4/0xb4 bus_for_each_drv+0x98/0xc8 __device_attach+0xac/0x12c device_initial_probe+0x24/0x30 bus_probe_device+0x38/0x98 deferred_probe_work_func+0x78/0xa4 process_one_work+0x24c/0x3dc worker_thread+0x280/0x360 kthread+0x134/0x13c 
ret_from_fork+0x10/0x18 Code: d284 91000725 6b17039f 5400048a (f9401f40) ---[ end trace f22dda57f3648e2c ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,22802a18 Memory Limit: none The problem is that when drm/msm does it's own iommu_attach_device(), now the domain returned by iommu_get_domain_for_dev() is drm/msm's domain, and it doesn't have domain->iova_cookie. We kind of avoided this problem prior to sdm845/dpu because the iommu was attached to the mdp node in dt, which is a child of the toplevel mdss node (which corresponds to the dev passed in dma_map_sg()). But with sdm845, now the iommu is attached at the mdss level so we hit the iommu_dma_ops in dma_map_sg(). But auto allocating/attaching a domain before the driver is probed was already a blocking problem for enabling per-context pagetables for the GPU. This problem is also now solved with this patch. Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in of_dma_configure Tested-by: Douglas Anderson Signed-off-by: Rob Clark --- This is an alternative/replacement for [1]. What it lacks in elegance it makes up for in practicality ;-) [1] https://patchwork.freedesktop.org/patch/264930/ drivers/of/device.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/of/device.c b/drivers/of/device.c index 5957cd4fa262..15ffee00fb22 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev) return device_add(&ofdev->dev); } +static const struct of_device_id iommu_blacklist[] = { + { .compatible = "qcom,mdp4" }, + { .compatible = "qcom,mdss" }, + { .compatible = "qcom,sdm845-mdss" }, + { .compatible = "qcom,adreno" }, + {} +}; Not completely clear to whether this is still needed or not, but this really won't scale. Why can't the driver for these devices override whatever has been setup by default? 
fwiw, at the moment it is not needed, but it will become needed again to implement per-context pagetables (although I suppose for this we only need to blacklist qcom,adreno and not also the display nodes). So, another case I've come across, on the display side.. I'm working on handling the case where bootloader enables display (and takes iommu out of reset).. as soon as DMA doma
Re: [Freedreno] [PATCH v2 03/15] iommu/arm-smmu: Add split pagetable support for arm-smmu-v2
On 21/05/2019 17:13, Jordan Crouse wrote: Add support for a split pagetable (TTBR0/TTBR1) scheme for arm-smmu-v2. If split pagetables are enabled, create a pagetable for TTBR1 and set up the sign extension bit so that all IOVAs with that bit set are mapped and translated from the TTBR1 pagetable. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-regs.h | 19 + drivers/iommu/arm-smmu.c | 179 ++--- drivers/iommu/io-pgtable-arm.c | 3 +- 3 files changed, 186 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index e9132a9..23f27c2 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -195,7 +195,26 @@ enum arm_smmu_s2cr_privcfg { #define RESUME_RETRY (0 << 0) #define RESUME_TERMINATE (1 << 0) +#define TTBCR_EPD1 (1 << 23) +#define TTBCR_T0SZ_SHIFT 0 +#define TTBCR_T1SZ_SHIFT 16 +#define TTBCR_IRGN1_SHIFT 24 +#define TTBCR_ORGN1_SHIFT 26 +#define TTBCR_RGN_WBWA 1 +#define TTBCR_SH1_SHIFT28 +#define TTBCR_SH_IS3 + +#define TTBCR_TG1_16K (1 << 30) +#define TTBCR_TG1_4K (2 << 30) +#define TTBCR_TG1_64K (3 << 30) + #define TTBCR2_SEP_SHIFT 15 +#define TTBCR2_SEP_31 (0x0 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_35 (0x1 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_39 (0x2 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_41 (0x3 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_43 (0x4 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_47 (0x5 << TTBCR2_SEP_SHIFT) #define TTBCR2_SEP_UPSTREAM (0x7 << TTBCR2_SEP_SHIFT) #define TTBCR2_AS (1 << 4) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index a795ada..e09c0e6 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -152,6 +152,7 @@ struct arm_smmu_cb { u32 tcr[2]; u32 mair[2]; struct arm_smmu_cfg *cfg; + unsigned long split_table_mask; }; struct arm_smmu_master_cfg { @@ -253,13 +254,14 @@ enum arm_smmu_domain_stage { struct arm_smmu_domain { struct arm_smmu_device *smmu; - struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_ops 
*pgtbl_ops[2]; This seems a bit off - surely the primary domain and aux domain only ever need one set of tables each, but either way there's definitely unnecessary redundancy in having four sets of io_pgtable_ops between them. const struct iommu_gather_ops *tlb_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; boolnon_strict; struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -621,6 +623,85 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev) return IRQ_HANDLED; } +/* Adjust the context bank settings to support TTBR1 */ +static void arm_smmu_init_ttbr1(struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg) +{ + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx]; + int pgsize = 1 << __ffs(pgtbl_cfg->pgsize_bitmap); + + /* Enable speculative walks through the TTBR1 */ + cb->tcr[0] &= ~TTBCR_EPD1; + + cb->tcr[0] |= TTBCR_SH_IS << TTBCR_SH1_SHIFT; + cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT; + cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT; + + switch (pgsize) { + case SZ_4K: + cb->tcr[0] |= TTBCR_TG1_4K; + break; + case SZ_16K: + cb->tcr[0] |= TTBCR_TG1_16K; + break; + case SZ_64K: + cb->tcr[0] |= TTBCR_TG1_64K; + break; + } + + /* +* Outside of the special 49 bit UBS case that has a dedicated sign +* extension bit, setting the SEP for any other va_size will force us to +* shrink the size of the T0/T1 regions by one bit to accommodate the +* SEP +*/ + if (smmu->va_size != 48) { + /* Replace the T0 size */ + cb->tcr[0] &= ~(0x3f << TTBCR_T0SZ_SHIFT); + cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T0SZ_SHIFT; + /* Set the T1 size */ + cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T1SZ_SHIFT; +
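The TCR arithmetic in the quoted arm_smmu_init_ttbr1() hunk can be checked in isolation. The sketch below is a hypothetical standalone rework, not the kernel function: it uses the TTBCR field definitions from the quoted arm-smmu-regs.h hunk, clears EPD1 to enable TTBR1 walks, and shrinks T0/T1 by one bit to make room for the SEP whenever va_size != 48, exactly as the patch describes.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions from the quoted arm-smmu-regs.h hunk */
#define TTBCR_EPD1        (1u << 23)
#define TTBCR_T0SZ_SHIFT  0
#define TTBCR_T1SZ_SHIFT  16
#define TTBCR_IRGN1_SHIFT 24
#define TTBCR_ORGN1_SHIFT 26
#define TTBCR_RGN_WBWA    1u
#define TTBCR_SH1_SHIFT   28
#define TTBCR_SH_IS       3u
#define TTBCR_TG1_16K     (1u << 30)
#define TTBCR_TG1_4K      (2u << 30)
#define TTBCR_TG1_64K     (3u << 30)

static uint32_t ttbr1_tcr_fixup(uint32_t tcr, unsigned int va_size,
                                unsigned int pgsize)
{
    tcr &= ~TTBCR_EPD1;                      /* enable TTBR1 walks */
    tcr |= TTBCR_SH_IS << TTBCR_SH1_SHIFT;   /* inner shareable, WBWA */
    tcr |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT;
    tcr |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT;

    switch (pgsize) {
    case 4096:  tcr |= TTBCR_TG1_4K;  break;
    case 16384: tcr |= TTBCR_TG1_16K; break;
    case 65536: tcr |= TTBCR_TG1_64K; break;
    }

    /* Outside the 49-bit UBS case, lose one bit of each region to the SEP */
    if (va_size != 48) {
        uint32_t tsz = 64 - va_size - 1;
        tcr &= ~(0x3fu << TTBCR_T0SZ_SHIFT); /* replace the T0 size */
        tcr |= tsz << TTBCR_T0SZ_SHIFT;
        tcr |= tsz << TTBCR_T1SZ_SHIFT;      /* set the T1 size */
    }
    return tcr;
}
```

For a 36-bit VA with 4K pages, starting from a TCR with EPD1 set and T0SZ = 28, the result packs T0SZ = T1SZ = 27 alongside the TG1/SH1/RGN1 fields.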
Re: [Freedreno] [PATCH v2 01/15] iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains
On 21/05/2019 17:13, Jordan Crouse wrote: Allow IOMMU enabled devices specified on an opt-in list to create a default identity domain for a new IOMMU group and bypass the DMA domain created by the IOMMU core. This allows the group to be properly set up but otherwise skips touching the hardware until the client device attaches a unmanaged domain of its own. All the cool kids are using iommu_request_dm_for_dev() to force an identity domain for particular devices, won't that suffice for this case too? There is definite scope for improvement in this area, so I'd really like to keep things as consistent as possible to make that easier in future. Robin. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 42 ++ drivers/iommu/iommu.c| 29 +++-- include/linux/iommu.h| 3 +++ 3 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0..a795ada 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1235,6 +1235,35 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain, return 0; } +struct arm_smmu_client_match_data { + bool use_identity_domain; +}; + +static const struct arm_smmu_client_match_data qcom_adreno = { + .use_identity_domain = true, +}; + +static const struct arm_smmu_client_match_data qcom_mdss = { + .use_identity_domain = true, +}; + +static const struct of_device_id arm_smmu_client_of_match[] = { + { .compatible = "qcom,adreno", .data = &qcom_adreno }, + { .compatible = "qcom,mdp4", .data = &qcom_mdss }, + { .compatible = "qcom,mdss", .data = &qcom_mdss }, + { .compatible = "qcom,sdm845-mdss", .data = &qcom_mdss }, + {}, +}; + +static const struct arm_smmu_client_match_data * +arm_smmu_client_data(struct device *dev) +{ + const struct of_device_id *match = + of_match_device(arm_smmu_client_of_match, dev); + + return match ? 
match->data : NULL; +} + static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret; @@ -1552,6 +1581,7 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu = fwspec_smmu(fwspec); + const struct arm_smmu_client_match_data *client; struct iommu_group *group = NULL; int i, idx; @@ -1573,6 +1603,18 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) else group = generic_device_group(dev); + client = arm_smmu_client_data(dev); + + /* +* If the client chooses to bypass the dma domain, create a identity +* domain as a default placeholder. This will give the device a +* default domain but skip DMA operations and not consume a context +* bank +*/ + if (client && client->no_dma_domain) + iommu_group_set_default_domain(group, dev, + IOMMU_DOMAIN_IDENTITY); + return group; } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 67ee662..af3e1ed 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1062,6 +1062,24 @@ struct iommu_group *fsl_mc_device_group(struct device *dev) return group; } +struct iommu_domain *iommu_group_set_default_domain(struct iommu_group *group, + struct device *dev, unsigned int type) +{ + struct iommu_domain *dom; + + dom = __iommu_domain_alloc(dev->bus, type); + if (!dom) + return NULL; + + /* FIXME: Error if the default domain is already set? 
*/ + group->default_domain = dom; + if (!group->domain) + group->domain = dom; + + return dom; +} +EXPORT_SYMBOL_GPL(iommu_group_set_default_domain); + /** * iommu_group_get_for_dev - Find or create the IOMMU group for a device * @dev: target device @@ -1099,9 +1117,12 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev) if (!group->default_domain) { struct iommu_domain *dom; - dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type); + dom = iommu_group_set_default_domain(group, dev, + iommu_def_domain_type); + if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) { - dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA); + dom = iommu_group_set_default_domain(group, dev, + IOMMU_DOMAIN_DMA); if (dom) { dev_warn(dev, "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA", @@ -1109,10 +1130,6 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
Hi Rob, On 01/12/2018 16:53, Rob Clark wrote: This solves a problem we see with drm/msm, caused by getting iommu_dma_ops while we attach our own domain and manage it directly at the iommu API level: [0038] user address but active_mm is swapper Internal error: Oops: 9605 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 #90 Hardware name: xxx (DT) Workqueue: events deferred_probe_work_func pstate: 80c9 (Nzcv daif +PAN +UAO) pc : iommu_dma_map_sg+0x7c/0x2c8 lr : iommu_dma_map_sg+0x40/0x2c8 sp : ff80095eb4f0 x29: ff80095eb4f0 x28: x27: ffc0f9431578 x26: x25: x24: 0003 x23: 0001 x22: ffc0fa9ac010 x21: x20: ffc0fab40980 x19: ffc0fab40980 x18: 0003 x17: 01c4 x16: 0007 x15: 000e x14: x13: x12: 0028 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : x8 : ffc0fab409a0 x7 : x6 : 0002 x5 : 0001 x4 : x3 : 0001 x2 : 0002 x1 : ffc0f9431578 x0 : Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) Call trace: iommu_dma_map_sg+0x7c/0x2c8 __iommu_map_sg_attrs+0x70/0x84 get_pages+0x170/0x1e8 msm_gem_get_iova+0x8c/0x128 _msm_gem_kernel_new+0x6c/0xc8 msm_gem_kernel_new+0x4c/0x58 dsi_tx_buf_alloc_6g+0x4c/0x8c msm_dsi_host_modeset_init+0xc8/0x108 msm_dsi_modeset_init+0x54/0x18c _dpu_kms_drm_obj_init+0x430/0x474 dpu_kms_hw_init+0x5f8/0x6b4 msm_drm_bind+0x360/0x6c8 try_to_bring_up_master.part.7+0x28/0x70 component_master_add_with_match+0xe8/0x124 msm_pdev_probe+0x294/0x2b4 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __device_attach_driver+0xa4/0xb4 bus_for_each_drv+0x98/0xc8 __device_attach+0xac/0x12c device_initial_probe+0x24/0x30 bus_probe_device+0x38/0x98 deferred_probe_work_func+0x78/0xa4 process_one_work+0x24c/0x3dc worker_thread+0x280/0x360 kthread+0x134/0x13c ret_from_fork+0x10/0x18 Code: d284 91000725 6b17039f 5400048a (f9401f40) ---[ end trace f22dda57f3648e2c ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,22802a18 Memory Limit: 
none The problem is that when drm/msm does it's own iommu_attach_device(), now the domain returned by iommu_get_domain_for_dev() is drm/msm's domain, and it doesn't have domain->iova_cookie. Does this crash still happen with 4.20-rc? Because as of 6af588fed391 it really shouldn't. We kind of avoided this problem prior to sdm845/dpu because the iommu was attached to the mdp node in dt, which is a child of the toplevel mdss node (which corresponds to the dev passed in dma_map_sg()). But with sdm845, now the iommu is attached at the mdss level so we hit the iommu_dma_ops in dma_map_sg(). But auto allocating/attaching a domain before the driver is probed was already a blocking problem for enabling per-context pagetables for the GPU. This problem is also now solved with this patch. s/solved/worked around/ If you want a guarantee of actually getting a specific hardware context allocated for a given domain, there needs to be code in the IOMMU driver to understand and honour that. Implicitly depending on whatever happens to fall out of current driver behaviour on current systems is not a real solution. Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in of_dma_configure That's rather misleading, since the crash described above depends on at least two other major changes which came long after that commit. It's not that I don't understand exactly what you want here - just that this commit message isn't a coherent justification for that ;) Tested-by: Douglas Anderson Signed-off-by: Rob Clark --- This is an alternative/replacement for [1]. 
What it lacks in elegance it makes up for in practicality ;-) [1] https://patchwork.freedesktop.org/patch/264930/ drivers/of/device.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/of/device.c b/drivers/of/device.c index 5957cd4fa262..15ffee00fb22 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev) return device_add(&ofdev->dev); } +static const struct of_device_id iommu_blacklist[] = { + { .compatible = "qcom,mdp4" }, + { .compatible = "qcom,mdss" }, + { .compatible = "qcom,sdm845-mdss" }, + { .compatible = "qcom,adreno" }, + {} +}; + /** * of_dma_configure - Setup DMA configuration * @dev: Device to apply DMA configurati
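Mechanically, the proposed blacklist is just a compatible-string lookup performed before installing the default DMA ops. A minimal standalone model (plain strcmp over the quoted table, where the kernel would use of_match_device() on struct of_device_id):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const iommu_blacklist[] = {
    "qcom,mdp4",
    "qcom,mdss",
    "qcom,sdm845-mdss",
    "qcom,adreno",
    NULL,
};

/* Returns 1 if the device should get the default IOMMU/DMA ops hookup,
 * 0 if its driver manages the IOMMU itself at the iommu API level. */
static int use_default_iommu_ops(const char *compatible)
{
    for (size_t i = 0; iommu_blacklist[i]; i++)
        if (!strcmp(compatible, iommu_blacklist[i]))
            return 0;
    return 1;
}
```

Rob Herring's scaling objection is visible even here: every driver wanting this behaviour needs another entry in a central table, rather than opting out from its own probe path.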
Re: [Freedreno] [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*
On 29/11/2018 19:57, Tomasz Figa wrote: On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse wrote: On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote: On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig wrote: On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote: Maybe the thing we need to do is just implement a blacklist of compatible strings for devices which should skip the automatic iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem preventing us from enabling per-process pagetables for a5xx (where we need to control the domain/context-bank that is allocated by the dma api). You can detach from the dma map attachment using arm_iommu_detach_device, which a few drm drivers do, but I don't think this is the problem. I think even with detach, we wouldn't end up with the context-bank that the gpu firmware was hard-coded to expect, and so it would overwrite the incorrect page table address register. (I could be mis-remembering that, Jordan spent more time looking at that. But it was something along those lines.) Right - basically the DMA domain steals context bank 0 and the GPU is hard coded to use that context bank for pagetable switching. I believe the Tegra guys also had a similar problem with a hard coded context bank. AIUI, they don't need a specific hardware context, they just need to know which one they're actually using, which the domain abstraction hides. Wait, if we detach the GPU/display struct device from the default domain and attach it to a newly allocated domain, wouldn't the newly allocated domain use the context bank we need? Note that we're already The arm-smmu driver doesn't, but there's no fundamental reason it couldn't. That should just need code to refcount domain users and release hardware contexts for domains with no devices currently attached. Robin. 
doing that, except that we're doing it behind the back of the DMA mapping subsystem, so that it keeps using the IOMMU version of the DMA ops for the device and doing any mapping operations on the default domain. If we ask the DMA mapping to detach, wouldn't it essentially solve the problem? Best regards, Tomasz ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v18 1/5] iommu/arm-smmu: Add pm_runtime/sleep ops
On 28/11/2018 16:24, Stephen Boyd wrote: Quoting Vivek Gautam (2018-11-27 02:11:41) @@ -1966,6 +1970,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; Is this clk_bulk_get_all()? Ooh, did that finally get merged while we weren't looking? Great! Much as I don't want to drag this series out to a v19, it *would* be neat if we no longer need to open-code that bit... Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v16 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
On 30/08/18 15:45, Vivek Gautam wrote: qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. On msm8996, multiple cores, viz. mdss, video, etc. use this smmu. On sdm845, this smmu is used with gpu. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 166c8c6da24f..411e5ac57c64 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -119,6 +119,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, Hmm, it seems we don't actually need this right now, but maybe that just means there's more imp-def registers and/or errata to come ;) Either way I guess there's no real harm in having it. Reviewed-by: Robin Murphy }; struct arm_smmu_s2cr { @@ -1970,6 +1971,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +static const char * const qcom_smmuv2_clks[] = { + "bus", "iface", +}; + +static const struct arm_smmu_match_data qcom_smmuv2 = { + .version = ARM_SMMU_V2, + .model = QCOM_SMMUV2, + .clks = qcom_smmuv2_clks, + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), +}; + static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, { .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 }, @@ -1977,6 +1989,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401", .data = &arm_mmu401 }, { .compatible = "arm,mmu-500", .data = &arm_mmu500 }, { .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 }, + { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 }, { }, }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); ___ Freedreno mailing list 
Re: [Freedreno] [PATCH v16 4/5] dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2
On 30/08/18 15:45, Vivek Gautam wrote: Add bindings doc for Qcom's smmu-v2 implementation. Reviewed-by: Robin Murphy Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- .../devicetree/bindings/iommu/arm,smmu.txt | 39 ++ 1 file changed, 39 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce12af5..a6504b37cc21 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,10 +17,16 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"qcom,smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. + Qcom SoCs must contain, as below, SoC-specific compatibles + along with "qcom,smmu-v2": + "qcom,msm8996-smmu-v2", "qcom,smmu-v2", + "qcom,sdm845-smmu-v2", "qcom,smmu-v2". + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the @@ -71,6 +77,22 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- clock-names:List of the names of clocks input to the device. The + required list depends on particular implementation and + is as follows: + - for "qcom,smmu-v2": +- "bus": clock required for downstream bus access and + for the smmu ptw, +- "iface": clock required to access smmu's registers + through the TCU's programming interface. + - unspecified for other implementations. + +- clocks: Specifiers for all clocks listed in the clock-names property, + as per generic clock bindings. + +- power-domains: Specifiers for power domains required to be powered on for + the SMMU to operate, as per generic power domain bindings. + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -137,3 +159,20 @@ conditions. iommu-map = <0 &smmu3 0 0x400>; ... 
}; + + /* Qcom's arm,smmu-v2 implementation */ + smmu4: iommu@d0 { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + power-domains = <&mmcc MDSS_GDSC>; + + clocks = <&mmcc SMMU_MDP_AXI_CLK>, +<&mmcc SMMU_MDP_AHB_CLK>; + clock-names = "bus", "iface"; + }; ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v16 3/5] iommu/arm-smmu: Add the device_link between masters and smmu
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Reviewed-by: Robin Murphy Signed-off-by: Sricharan R Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 1bf542010be7..166c8c6da24f 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1461,6 +1461,9 @@ static int arm_smmu_add_device(struct device *dev) iommu_device_link(&smmu->iommu, dev); + device_link_add(dev, smmu->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER); + return 0; out_cfg_free: ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
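The effect of the `device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME | ...)` call in the patch can be modelled in userspace C. This is an assumption-laden sketch of the device-link semantics (consumer resume implicitly holds the supplier active), not kernel code:

```c
#include <assert.h>

/* Model of a PM-runtime device link: runtime-resuming the master
 * (consumer) takes a usage reference on the SMMU (supplier), and
 * suspending the master drops it, so the SMMU is powered exactly
 * while its master needs it. Names are hypothetical. */
struct model_dev { int usage; };   /* stands in for the runtime-PM usage count */

static struct model_dev smmu_dev;    /* supplier */
static struct model_dev master_dev;  /* consumer, linked to supplier */

static void consumer_resume(void)
{
    smmu_dev.usage++;    /* link core resumes the supplier first */
    master_dev.usage++;
}

static void consumer_suspend(void)
{
    master_dev.usage--;
    smmu_dev.usage--;    /* supplier may now runtime-suspend too */
}
```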
Re: [Freedreno] [PATCH v16 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Global locks are also initialized before enabling runtime pm as the runtime_resume() calls device_reset() which does tlb_sync_global() that ultimately requires locks to be initialized. To the best of my knowledge in this stuff (which is still not quite enough to be *truly* confident...), Reviewed-by: Robin Murphy Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 89 +++- 1 file changed, 81 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d900e007c3c9..1bf542010be7 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { { 0, NULL}, }; +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) +{ + if (pm_runtime_enabled(smmu->dev)) + return pm_runtime_get_sync(smmu->dev); + + return 0; +} + +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) +{ + if (pm_runtime_enabled(smmu->dev)) + pm_runtime_put(smmu->dev); +} + static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain); @@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = arm_smmu_rpm_get(smmu); + if (ret < 0) + return; + /* * Disable the context bank and free the page tables before freeing 
* it. @@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) free_io_pgtable_ops(smmu_domain->pgtbl_ops); __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx); + + arm_smmu_rpm_put(smmu); } static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) @@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENODEV; smmu = fwspec_smmu(fwspec); + + ret = arm_smmu_rpm_get(smmu); + if (ret < 0) + return ret; + /* Ensure that the domain is finalised */ ret = arm_smmu_init_domain_context(domain, smmu); if (ret < 0) - return ret; + goto rpm_put; /* * Sanity check the domain. We don't support domains across @@ -1226,33 +1251,50 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) dev_err(dev, "cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n", dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev)); - return -EINVAL; + ret = -EINVAL; + goto rpm_put; } /* Looks ok, so add the device to the domain */ - return arm_smmu_domain_add_master(smmu_domain, fwspec); + ret = arm_smmu_domain_add_master(smmu_domain, fwspec); + +rpm_put: + arm_smmu_rpm_put(smmu); + return ret; } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + int ret; if (!ops) return -ENODEV; - return ops->map(ops, iova, paddr, size, prot); + arm_smmu_rpm_get(smmu); + ret = ops->map(ops, iova, paddr, size, prot); + arm_smmu_rpm_put(smmu); + + return ret; } static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + size_t ret; if (!ops) return 0; - return ops->unmap(ops, iova, size); + arm_smmu_rpm_get(smmu); + ret = 
ops->unmap(ops, iova, size); + arm_smmu_rpm_put(smmu); + + return ret; } static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
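The conditional `arm_smmu_rpm_get()/put()` wrappers in the diff above exist so that SMMUs without runtime PM never touch `dev->power.lock` in the map/unmap fastpath. A minimal userspace model of that pattern (hypothetical names, not the driver itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the conditional runtime-PM wrapper: the usage counter is
 * only manipulated when runtime PM is enabled for this SMMU, and a
 * map operation is always bracketed by a balanced get/put pair. */
struct model_smmu {
    bool rpm_enabled;
    int usage;          /* stands in for dev->power.usage_count */
};

static struct model_smmu gated_smmu = { true, 0 };   /* power-gated impl. */
static struct model_smmu always_on  = { false, 0 };  /* no clock/power gating */

static int model_rpm_get(struct model_smmu *s)
{
    if (s->rpm_enabled)
        s->usage++;     /* pm_runtime_get_sync() */
    return 0;
}

static void model_rpm_put(struct model_smmu *s)
{
    if (s->rpm_enabled)
        s->usage--;     /* pm_runtime_put() */
}

static int model_map(struct model_smmu *s)
{
    model_rpm_get(s);
    /* ... ops->map() would run here, with the SMMU powered ... */
    model_rpm_put(s);
    return 0;
}
```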
Re: [Freedreno] [PATCH v16 1/5] iommu/arm-smmu: Add pm_runtime/sleep ops
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 77 ++-- 1 file changed, 74 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index fd1b80ef9490..d900e007c3c9 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,8 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clks; + int num_clks; u32cavium_id_base; /* Specific to Cavium */ @@ -1896,10 +1899,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, 
ARM_SMMU_V2, GENERIC_SMMU); @@ -1918,6 +1923,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; +} + #ifdef CONFIG_ACPI static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) { @@ -2000,6 +2022,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->num_clks = data->num_clks; + + arm_smmu_fill_clk_data(smmu, data->clks); parse_driver_options(smmu); @@ -2098,6 +2123,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks); + if (err) + return err; + + err = clk_bulk_prepare_enable(smmu->num_clks, smmu->clks); + if (err) + return err; + Hmm, if we error out beyond here it looks like we should strictly balance that prepare/enable before devres does the clk_bulk_put(), however the probe error path is starting to look like it needs a bit of love in general, so I might just spin a cleanup patch on top (and even then only for the sake of not being a bad example; SMMU probe failure is never a realistic situation for the system to actually recover from). 
Otherwise, Reviewed-by: Robin Murphy err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2184,6 +2217,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + + clk_bulk_disable_unprepare(smmu->num_clks, smmu->clks); + return 0; } @@ -2192,15 +2228,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev) arm_smmu_device_remove(pdev); } -static int __maybe_unused arm_smmu_pm_resume(struct device *dev) +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) { struct arm_smmu_device *smmu = dev_get_drvdata(d
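Robin's point about balancing `clk_bulk_prepare_enable()` on probe error paths amounts to registering a cleanup action right after the enable, devres-style, so later failures unwind it before the clocks are put. A toy userspace model of that LIFO-unwind idea (hypothetical names, not the proposed cleanup patch):

```c
#include <assert.h>

/* Model of a devres-like action stack: probe registers a clock-disable
 * action immediately after enabling clocks, and a late probe failure
 * runs registered actions in LIFO order, so the enable is balanced. */
#define MAX_ACTIONS 4

static void (*actions[MAX_ACTIONS])(void);
static int n_actions;
static int clks_enabled;

static void clk_disable_action(void) { clks_enabled = 0; }

static void add_cleanup_action(void (*fn)(void))
{
    actions[n_actions++] = fn;
}

static int model_probe(int fail_late)
{
    clks_enabled = 1;                       /* clk_bulk_prepare_enable() */
    add_cleanup_action(clk_disable_action); /* like devm_add_action_or_reset() */
    if (fail_late)
        goto err;                           /* e.g. cfg_probe failed */
    return 0;
err:
    while (n_actions)                       /* devres unwinds LIFO */
        actions[--n_actions]();
    return -1;
}
```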
Re: [Freedreno] [PATCH v16 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
Hi Vivek, On 2018-09-25 6:56 AM, Vivek Gautam wrote: Hi Robin, Will, On Tue, Sep 18, 2018 at 8:41 AM Vivek Gautam wrote: Hi Robin, On Fri, Sep 7, 2018 at 3:52 PM Vivek Gautam wrote: On Fri, Sep 7, 2018 at 3:22 PM Tomasz Figa wrote: On Fri, Sep 7, 2018 at 6:38 PM Vivek Gautam wrote: Hi Tomasz, On 9/7/2018 2:46 PM, Tomasz Figa wrote: Hi Vivek, On Thu, Aug 30, 2018 at 11:46 PM Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Global locks are also initialized before enabling runtime pm as the runtime_resume() calls device_reset() which does tlb_sync_global() that ultimately requires locks to be initialized. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 89 +++- 1 file changed, 81 insertions(+), 8 deletions(-) [snip] @@ -2215,10 +2281,17 @@ static int arm_smmu_device_remove(struct platform_device *pdev) if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS)) dev_err(&pdev->dev, "removing device with active domains!\n"); + arm_smmu_rpm_get(smmu); /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + arm_smmu_rpm_put(smmu); + + if (pm_runtime_enabled(smmu->dev)) + pm_runtime_force_suspend(smmu->dev); + else + clk_bulk_disable(smmu->num_clks, smmu->clks); - clk_bulk_disable_unprepare(smmu->num_clks, smmu->clks); + clk_bulk_unprepare(smmu->num_clks, smmu->clks); Aren't we missing pm_runtime_disable() here? We'll have the enable count unbalanced if the driver is removed and probed again. pm_runtime_force_suspend() does a pm_runtime_disable() also if i am not wrong. And, as mentioned in a previous thread [1], we were seeing a warning which we avoided by keeping force_suspend(). 
[1] https://lkml.org/lkml/2018/7/8/124 I see, thanks. I didn't realize that pm_runtime_force_suspend() already disables runtime PM indeed. Sorry for the noise. Hi Tomasz, No problem. Thanks for looking back at it. Hi Robin, If you are fine with this series, then can you please consider giving Reviewed-by, so that we are certain that this series will go in the next merge window. Thanks Gentle ping. You ack will be very helpful in letting Will pull this series for 4.20. Thanks. I would really appreciate if you could provide your ack for this series. Or if there are any concerns, I am willing to address them. Apologies, I thought I'd replied to say I'd be getting to this shortly, but apparently not :( FWIW, "shortly" is now tomorrow - I don't *think* there's anything outstanding, but given the number of subtleties we've turned up so far I do just want one last thorough double-check to make sure. Thanks, Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
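The remove-path subtlety settled in this exchange — that `pm_runtime_force_suspend()` both runs the suspend callback and disables runtime PM, so no separate `pm_runtime_disable()` is needed — can be captured with a small userspace model (an illustrative assumption, not the driver code):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the remove path: if runtime PM was enabled, force-suspend
 * (which also disables runtime PM, keeping the enable count balanced
 * for a later re-probe); otherwise just turn the clocks off. */
static bool rpm_enabled = true;
static bool clks_on = true;

static void runtime_suspend(void) { clks_on = false; }

static void force_suspend(void)
{
    rpm_enabled = false;    /* implicit pm_runtime_disable() */
    runtime_suspend();
}

static void model_remove(void)
{
    if (rpm_enabled)
        force_suspend();    /* suspends *and* disables runtime PM */
    else
        clks_on = false;    /* clk_bulk_disable() directly */
}
```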
Re: [Freedreno] [PATCH v14 0/4] iommu/arm-smmu: Add runtime pm/sleep support
On 20/08/18 10:31, Tomasz Figa wrote: Hi Robin, On Fri, Jul 27, 2018 at 4:02 PM Vivek Gautam wrote: This series provides the support for turning on the arm-smmu's clocks/power domains using runtime pm. This is done using device links between smmu and client devices. The device link framework keeps the two devices in correct order for power-cycling across runtime PM or across system-wide PM. With addition of a new device link flag DL_FLAG_AUTOREMOVE_SUPPLIER [8] (available in linux-next of Rafael's linux-pm tree [9]), the device links created between arm-smmu and its clients will be automatically purged when arm-smmu driver unbinds from its device. As not all implementations support clock/power gating, we are checking for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime power management for such smmu implementations that can support it. Otherwise, the clocks are turned to be always on in .probe until .remove. With conditional runtime pm now, we avoid touching dev->power.lock in fastpaths for smmu implementations that don't need to do anything useful with pm_runtime. This lets us to use the much-argued pm_runtime_get_sync/put_sync() calls in map/unmap callbacks so that the clients do not have to worry about handling any of the arm-smmu's power. This series also adds support for Qcom's arm-smmu-v2 variant that has different clocks and power requirements. Previous version of this patch series is @ [2]. Tested this series on msm8996, and sdm845 after pulling in Rafael's linux-pm linux-next[9] and Joerg's iommu next[10] branches, and related changes for device tree, etc. Hi Robin, Will, I have addressed the comments for v13. If there's still a chance can you please consider pulling this for v4.19. Thanks. [v14] * Moved arm_smmu_device_reset() from arm_smmu_pm_resume() to arm_smmu_runtime_resume() so that the pm_resume callback calls only runtime_resume to resume the device. 
This should take care of restoring the state of smmu in systems in which smmu lose register state on power-domain collapse. It's been a while since this series was posted and no more comments seem to be left anymore. Would you have some time to take a look again? Thanks. Other than the binding issue which turned up in the meantime, I *think* this is looking OK now in terms of being sufficiently safe for all the various awkward retention vs. state-loss combinations. There's almost certainly still ways to improve it in future, but what we have now seems like a reasonable starting point that isn't impossibly complicated to reason about. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v14 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
On 27/07/18 08:02, Vivek Gautam wrote: qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. This smmu core is used with multiple masters on msm8996, viz. mdss, video, etc. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa --- Change since v13: - No change. .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++ drivers/iommu/arm-smmu.c | 13 +++ 2 files changed, 55 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce12af5..7c71a6ed465a 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,10 +17,19 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"qcom,-smmu-v2", "qcom,smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. + A number of Qcom SoCs use qcom,smmu-v2 version of the IP. + "qcom,-smmu-v2" represents a soc specific compatible + string that should be present along with the "qcom,smmu-v2" + to facilitate SoC specific clocks/power connections and to + address specific bug fixes. As demonstrated in the GPU thread, this proves a bit too vague for a useful binding. Provided Qcom folks can reach a consensus on what a given SoC is actually called, I'd rather just unambiguously list whatever sets of fully-defined strings we need. Robin. + An example string would be - + "qcom,msm8996-smmu-v2", "qcom,smmu-v2". + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the @@ -71,6 +80,22 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- clock-names:List of the names of clocks input to the device. 
The + required list depends on particular implementation and + is as follows: + - for "qcom,smmu-v2": +- "bus": clock required for downstream bus access and + for the smmu ptw, +- "iface": clock required to access smmu's registers + through the TCU's programming interface. + - unspecified for other implementations. + +- clocks: Specifiers for all clocks listed in the clock-names property, + as per generic clock bindings. + +- power-domains: Specifiers for power domains required to be powered on for + the SMMU to operate, as per generic power domain bindings. + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -137,3 +162,20 @@ conditions. iommu-map = <0 &smmu3 0 0x400>; ... }; + + /* Qcom's arm,smmu-v2 implementation */ + smmu4: iommu { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + power-domains = <&mmcc MDSS_GDSC>; + + clocks = <&mmcc SMMU_MDP_AXI_CLK>, +<&mmcc SMMU_MDP_AHB_CLK>; + clock-names = "bus", "iface"; + }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e558abf1ecfc..2b4edba188a5 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -119,6 +119,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, }; struct arm_smmu_s2cr { @@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +static const char * const qcom_smmuv2_clks[] = { + "bus", "iface", +}; + +static const struct arm_smmu_match_data qcom_smmuv2 = { + .version = ARM_SMMU_V2, + .model = QCOM_SMMUV2, + .clks = qcom_smmuv2_clks, + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), +}; + static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, { .compatible = 
"arm,smmu-v2", .data = &smmu_generic_v2 }, @@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401",
Re: [Freedreno] [PATCH 2/2] arm64: dts: sdm845: Add gpu and gmu device nodes
On 08/08/18 23:47, Jordan Crouse wrote: Add the nodes to describe the Adreno GPU and GMU devices. Signed-off-by: Jordan Crouse --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 121 +++ 1 file changed, 121 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi index cdaabeb3c995..9fb90bb4ea1f 100644 --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -323,5 +323,126 @@ status = "disabled"; }; }; + + adreno_smmu: arm,smmu-adreno@504 { + compatible = "qcom,msm8996-smmu-v2"; Per the binding, this should be: compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; (note that even with Vivek's series the driver won't actually match the SoC-specific string until we find a real need to) + reg = <0x504 0x1>; + #iommu-cells = <1>; + #global-interrupts = <2>; Indentation? + interrupts = , +, And here? Otherwise, assuming the table walk really isn't cache-coherent, and the global and CB interrupts really do have different triggers (yuck :P), the SMMU parts look fine to me. Robin. 
+, +, +, +, +, +, +, +; + clocks = <&clock_gcc GCC_GPU_MEMNOC_GFX_CLK>, + <&clock_gcc GCC_GPU_CFG_AHB_CLK>; + clock-names = "bus", "iface"; + + power-domains = <&clock_gpucc GPU_CX_GDSC>; + }; + + gpu_opp_table: adreno-opp-table { + compatible = "operating-points-v2"; + + opp-71000 { + opp-hz = /bits/ 64 <71000>; + qcom,level = <416>; + }; + + opp-67500 { + opp-hz = /bits/ 64 <67500>; + qcom,level = <384>; + }; + + opp-59600 { + opp-hz = /bits/ 64 <59600>; + qcom,level = <320>; + }; + + opp-52000 { + opp-hz = /bits/ 64 <52000>; + qcom,level = <256>; + }; + + opp-41400 { + opp-hz = /bits/ 64 <41400>; + qcom,level = <192>; + }; + + opp-34200 { + opp-hz = /bits/ 64 <34200>; + qcom,level = <128>; + }; + + opp-25700 { + opp-hz = /bits/ 64 <25700>; + qcom,level = <64>; + }; + }; + + gpu@500 { + compatible = "qcom,adreno-630.2", "qcom,adreno"; + #stream-id-cells = <16>; + + reg = <0x500 0x4>; + reg-names = "kgsl_3d0_reg_memory"; + + /* +* Look ma, no clocks! The GPU clocks and power are +* controlled entirely by the GMU +*/ + + interrupts = <0 300 0>; + interrupt-names = "kgsl_3d0_irq"; + + iommus = <&adreno_smmu 0>; + + operating-points-v2 = <&gpu_opp_table>; + + qcom,gmu = <&gmu>; + }; + + gmu_opp_table: adreno-gmu-opp-table { + compatible = "operating-points-v2"; + + opp-4 { + opp-hz = /bits/ 64 <4>; + qcom,level = <128>; + }; + + opp-2 { + opp-hz = /bits/ 64 <2>; + qcom,level = <48>; + }; + }; + + gmu: gmu@506a000 { + compatible="qcom,adreno-gmu"; + + reg = <0x506a000 0x3>, + <0xb28 0x1>, + <0xb48 0x1>; + reg-names = "gmu", "gmu_pdc", "gmu_pdc_seq"; + + interrupts = , +; + interrupt-names = "hfi", "gmu";
Re: [Freedreno] [PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops
On 26/07/18 08:12, Vivek Gautam wrote: On Wed, Jul 25, 2018 at 11:46 PM, Vivek Gautam wrote: On Tue, Jul 24, 2018 at 8:51 PM, Robin Murphy wrote: On 19/07/18 11:15, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa --- Changes since v12: - Added pm sleep .suspend callback. This disables the clocks. - Added corresponding change to enable clocks in .resume pm sleep callback. drivers/iommu/arm-smmu.c | 75 ++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c73cfce1ccc0..9138a6fffe04 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c [snip] platform_device *pdev) arm_smmu_device_remove(pdev); } +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return clk_bulk_enable(smmu->num_clks, smmu->clks); If there's a power domain being automatically switched by genpd then we need a reset here because we may have lost state entirely. 
Since I remembered the otherwise-useless GPU SMMU on Juno is in a separate power domain, I gave it a poking via sysfs with some debug stuff to dump sCR0 in these callbacks, and the problem is clear: ... [4.625551] arm-smmu 2b40.iommu: genpd_runtime_suspend() [4.631163] arm-smmu 2b40.iommu: arm_smmu_runtime_suspend: 0x00201936 [4.637897] arm-smmu 2b40.iommu: suspend latency exceeded, 6733980 ns [ 21.566983] arm-smmu 2b40.iommu: genpd_runtime_resume() [ 21.584796] arm-smmu 2b40.iommu: arm_smmu_runtime_resume: 0x00220101 [ 21.591452] arm-smmu 2b40.iommu: resume latency exceeded, 6658020 ns ... Qualcomm SoCs have retention enabled for SMMU registers so they don't lose state. ... [ 256.013367] arm-smmu b4.arm,smmu: arm_smmu_runtime_suspend SCR0 = 0x201e36 [ 256.013367] [ 256.019160] arm-smmu b4.arm,smmu: arm_smmu_runtime_resume SCR0 = 0x201e36 [ 256.019160] [ 256.027368] arm-smmu b4.arm,smmu: arm_smmu_runtime_suspend SCR0 = 0x201e36 [ 256.027368] [ 256.036786] arm-smmu b4.arm,smmu: arm_smmu_runtime_resume SCR0 = 0x201e36 ... However after adding arm_smmu_device_reset() in runtime_resume() I observe some performance degradation when kill an instance of 'kmscube' and start it again. The launch time with arm_smmu_device_reset() in runtime_resume() change is more. Could this be because of frequent TLB invalidation and sync? Probably. Plus the reset procedure is a big chunk of MMIO accesses, which for a non-trivial SMMU configuration probably isn't negligible in itself. Unfortunately, unless you know for absolute certain that you don't need to do that, you do. Some more information that i gathered. On Qcom SoCs besides the registers retention, TCU invalidates TLB cache on a CX power collapse exit, which is the system wide suspend case. The arm-smmu software is not aware of this CX power collapse / auto-invalidation. So wouldn't doing an explicit TLB invalidations during runtime resume be detrimental to performance? 
Indeed it would be, but resuming with TLBs full of random valid-looking junk is even more so. I have one more doubt here - We do runtime power cycle around arm_smmu_map/unmap() too. Now during map/unmap we selectively do TLB maintenance (either tlb_sync or tlb_add_flush). But with runtime pm we want to do TLBIALL*. Is that a problem? It's technically redundant to do both, true, but as we've covered in previous rounds of discussion it's very difficult to know *which* one is sufficient at any given time, so in order to make progress for now I think we have to settle with doing both. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops
On 19/07/18 11:15, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa --- Changes since v12: - Added pm sleep .suspend callback. This disables the clocks. - Added corresponding change to enable clocks in .resume pm sleep callback. 
drivers/iommu/arm-smmu.c | 75 ++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c73cfce1ccc0..9138a6fffe04 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,8 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clks; + int num_clks; u32cavium_id_base; /* Specific to Cavium */ @@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); @@ -1919,6 +1924,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; +} + #ifdef CONFIG_ACPI static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) { @@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->num_clks = data->num_clks; + + arm_smmu_fill_clk_data(smmu, data->clks); parse_driver_options(smmu); @@ -2099,6 +2124,14 @@ static int 
arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks); + if (err) + return err; + + err = clk_bulk_prepare(smmu->num_clks, smmu->clks); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + + clk_bulk_unprepare(smmu->num_clks, smmu->clks); + return 0; } @@ -2189,15 +2225,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev) arm_smmu_device_remove(pdev); } +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return clk_bulk_enable(smmu->num_clks, smmu->clks); If there's a power domain being automatically switched by genpd then we need a reset here because we may have lost state entirely. Since I remembered the otherwise-useless GPU SMMU on Juno is in a separate power domain, I gave it a poking via sysfs with some debug stuff to dump sCR0 in these callbacks, and the problem is clear: ... [4.625551] arm-smmu 2b40.iommu: genpd_runtime_suspend() [4.631163] arm-smmu 2b4
Re: [Freedreno] [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu
On 18/07/18 10:30, Vivek Gautam wrote: On Wed, Jul 11, 2018 at 3:23 PM, Rafael J. Wysocki wrote: On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from the add_device callback, which gets called once when the master is added to the smmu. Signed-off-by: Sricharan R Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Cc: Rafael J. Wysocki Cc: Lukas Wunner --- - Change since v11 * Replaced DL_FLAG_AUTOREMOVE flag with DL_FLAG_AUTOREMOVE_SUPPLIER. drivers/iommu/arm-smmu.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 09265e206e2d..916cde4954d2 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device *dev) iommu_device_link(&smmu->iommu, dev); + if (pm_runtime_enabled(smmu->dev) && Why does the creation of the link depend on whether or not runtime PM is enabled for the MMU device? What about system-wide PM and system shutdown? Are they always guaranteed to happen in the right order without the link? Hi Robin, As Rafael pointed out, the device link creation should not depend on runtime PM being enabled or not, as we also want to guarantee that system-wide PM callbacks are called in the right order for the smmu and its clients. Does removing the check for pm_runtime_enabled() here look okay to you? FWIW the existing system PM ops make no claim to be perfect, and I wouldn't be at all surprised if it was only by coincidence that my devices happened to be put on the relevant lists in the right order to start with. If we no longer need to worry about explicit device_link housekeeping in the SMMU driver, then creating them unconditionally sounds like the sensible thing to do.
I'd be inclined to treat failure as non-fatal like we do for the sysfs link, though, since it's another thing that correct SMMU operation doesn't actually depend on (at this point we don't necessarily know if this consumer even has a driver at all). FYI, as discussed in the first patch [1] of this series, I will add a system wide suspend callback - arm_smmu_pm_suspend, that would do clock disable, and will add corresponding clock enable calls in arm_smmu_pm_resume(). OK, I still don't really understand the finer points of how system PM and runtime PM interact, but if making it robust is just a case of calling the runtime suspend/resume hooks as appropriate from the system ones, that sounds reasonable. Robin. [1] https://lore.kernel.org/patchwork/patch/960460/ Best regards Vivek + !device_link_add(dev, smmu->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER)) { + dev_err(smmu->dev, "Unable to add link to the consumer %s\n", + dev_name(dev)); + ret = -ENODEV; + goto out_unlink; + } + return 0; +out_unlink: + iommu_device_unlink(&smmu->iommu, dev); + arm_smmu_master_free_smes(fwspec); out_cfg_free: kfree(cfg); out_free:
Re: [Freedreno] [VERY RFC 0/2] iommu: optionally skip attaching to a DMA domain
On 11/04/18 19:55, Jordan Crouse wrote: I've been struggling with a problem for a while and I haven't been able to come up with a clean solution. Rob convinced me to stop complaining and do _something_ and hopefully this can spur a good discussion. The scenario is basically this: The MSM GPU wants to manage its own IOMMU virtual address space in lieu of using the DMA API to do so. The most important reason is that when per-instance pagetables [1] are enabled each file descriptor uses its own pagetable which are dynamically switched between contexts. In conjunction with this we use a split pagetable scheme to allow global buffers be persistently mapped so even a single pagetable has multiple ranges that need to be utilized based on buffer type. Obviously the DMA API isn't suitable for this kind of specialized use. We want to push it off to the side, allocate our own domain, and use it directly with the IOMMU APIs. In a perfect world we would just not use the DMA API and attach our own domain and that would be the end of the story. Unfortunately this is not a perfect world. In order to switch the pagetable from the GPU the SoC has some special wiring so that the GPU can reprogram the TTBR0 register in the IOMMU to the appropriate value. For a variety of reasons the hardware is hardcoded to only be able to access context bank 0 in the SMMU device. Long story short; if the DMA domain is allocated during bus enumeration and attached to context bank 0 then by the time we get to the driver domain we have long since lost our chance to get the context bank we need. Red herrings abound! That's not actually the DMA domain's fault at all, it just happens to fall out of the current SMMU driver behaviour. If the context bank allocator went top-down instead of bottom-up (IIRC I nearly ended up with that at one point during refactoring all that lot, or maybe that was the SMR allocator...) then you'd see a very different perspective of the problem. 
The driver could certainly do with being cleverer in terms of not keeping hardware CBs allocated for inactive domains - right now I don't think you can even move a device from one domain to another at all if you only have a single context bank, which is pretty bogus. That alone might even manage to hide this problem, but I wouldn't like to rely on it. The "device X can only use context bank N" condition is a real hardware limitation which should be described by firmware, but I don't have any immediate good ideas as to how :( The matter of opting out of DMA ops which expect the default domain is a separate and more general issue, so I won't start a fork of that discussion here. Robin. After going back and forth and trying a few possible options it seems like the "easiest" solution is to skip attaching the DMA domain in the first place. But we still need to create a default domain and set up the IOMMU groups correctly so that when the GPU driver comes along it can attach the device and everything will Just Work (TM). Rob suggested that we should use a blacklist of devices that choose to not participate so that's what I'm offering as a starting point for discussion. The first patch in this series allows the specific IOMMU driver to gracefully skip attaching a device by returning -ENOTSUPP. In that case, the core will skip printing error messages and it won't attach the domain to the group but it still allows the group to get created so that the IOMMU device is properly set up. In arch_setup_dma_ops() the call to iommu_get_domain_for_dev() will return NULL and the IOMMU ops won't be set up. The second patch implements the blacklist in arm-smmu.c based on the compatible string of the GPU device and returns -ENOTSUPP. I tested this with my current per-instance pagetable stack and it does the job but I am very much in the market for better ideas.
[1] https://patchwork.freedesktop.org/series/38729/ Jordan Crouse (2): iommu: Gracefully allow drivers to not attach to a default domain iommu/arm-smmu: Add list of devices to opt out of DMA domains drivers/iommu/arm-smmu.c | 23 +++ drivers/iommu/iommu.c| 18 ++ 2 files changed, 37 insertions(+), 4 deletions(-)
Re: [Freedreno] [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
On 21/02/18 22:59, Jordan Crouse wrote: Allow a SMMU device to opt into allocating a TTBR1 pagetable. The size of the TTBR1 region will be the same as the TTBR0 size with the sign extension bit set on the highest bit in the region unless the upstream size is 49 bits and then the sign-extension bit will be set on the 49th bit. Um, isn't the 49th bit still "the highest bit" if the address size is 49 bits? ;) The map/unmap operations will automatically use the appropriate pagetable based on the specified iova and the existing mask. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-regs.h | 2 - drivers/iommu/arm-smmu.c | 22 -- drivers/iommu/io-pgtable-arm.c | 160 - drivers/iommu/io-pgtable-arm.h | 20 ++ drivers/iommu/io-pgtable.h | 16 - 5 files changed, 192 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index a1226e4ab5f8..0ce85d5b22e9 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg { #define RESUME_RETRY (0 << 0) #define RESUME_TERMINATE (1 << 0) -#define TTBCR2_SEP_SHIFT 15 -#define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT) #define TTBCR2_AS (1 << 4) #define TTBRn_ASID_SHIFT 48 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 69e7c60792a8..ebfa59b59622 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -248,6 +248,7 @@ struct arm_smmu_domain { enum arm_smmu_domain_stage stage; struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, } else { cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr; cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32; - cb->tcr[1] |= TTBCR2_SEP_UPSTREAM; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) cb->tcr[1] |= TTBCR2_AS; } @@ -729,6 +729,9 @@ static int 
arm_smmu_init_domain_context(struct iommu_domain *domain, enum io_pgtable_fmt fmt; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + unsigned int quirks = + smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ? + IO_PGTABLE_QUIRK_ARM_TTBR1 : 0; mutex_lock(&smmu_domain->init_mutex); if (smmu_domain->smmu) @@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, else cfg->asid = cfg->cbndx + smmu->cavium_id_base; + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + quirks |= IO_PGTABLE_QUIRK_NO_DMA; + pgtbl_cfg = (struct io_pgtable_cfg) { + .quirks = quirks, .pgsize_bitmap = smmu->pgsize_bitmap, .ias= ias, .oas= oas, @@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; - if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) - pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA; - smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { @@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_ENABLE_TTBR1: + *((int *)data) = !!(smmu_domain->attributes + & (1 << DOMAIN_ATTR_ENABLE_TTBR1)); + return 0; default: return -ENODEV; } @@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; + break; + case DOMAIN_ATTR_ENABLE_TTBR1: + if (*((int *)data)) + smmu_domain->attributes |= + 1 << DOMAIN_ATTR_ENABLE_TTBR1; + ret = 0; break; default: ret = -ENODEV; diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index fff0b6ba0a69..1bd0045f2cb7 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable { unsigned long
Re: [Freedreno] [PATCH 01/14] iommu: Add DOMAIN_ATTR_ENABLE_TTBR1
On 21/02/18 22:59, Jordan Crouse wrote: Add a new domain attribute to enable the TTBR1 pagetable for drivers and devices that support it. This will be enabled using a TTBR1 (otherwise known as a "global" or "system" pagetable) for devices that support a split pagetable scheme for switching pagetables quickly and safely. TTBR1 is very much an Arm VMSA-specific term; if the concept of a split address space is useful in general, is it worth trying to frame it in general terms? AFAICS other IOMMU drivers could achieve the same effect fairly straightforwardly by simply copying the top-level "global" entries across whenever they switch "private" tables. FWIW even for SMMU there could potentially be cases with Arm Ltd. IP where the SoC vendor implements a stage-2-only configuration in their media subsystem, because they care most about minimising area and stage-1-only isn't an option. Robin. Signed-off-by: Jordan Crouse --- include/linux/iommu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 641aaf0f1b81..e2c49e583d8d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -153,6 +153,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMU_ENABLE, DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING,/* two stages of translation */ + DOMAIN_ATTR_ENABLE_TTBR1, DOMAIN_ATTR_MAX, };
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
[sorry, I had intended to reply sooner but clearly forgot] On 16/02/18 00:13, Tomasz Figa wrote: On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy wrote: On 15/02/18 04:17, Tomasz Figa wrote: [...] Could you elaborate on what kind of locking you are concerned about? As I explained before, the normally happening fast path would lock dev->power_lock only for the brief moment of incrementing the runtime PM usage counter. My bad, that's not even it. The atomic usage counter is incremented beforehands, without any locking [1] and the spinlock is acquired only for the sake of validating that device's runtime PM state remained valid indeed [2], which would be the case in the fast path of the same driver doing two mappings in parallel, with the master powered on (and so the SMMU, through device links; if master was not powered on already, powering on the SMMU is unavoidable anyway and it would add much more latency than the spinlock itself). We now have no locking at all in the map path, and only a per-domain lock around TLB sync in unmap which is unfortunately necessary for correctness; the latter isn't too terrible, since in "serious" hardware it should only be serialising a few cpus serving the same device against each other (e.g. for multiple queues on a single NIC). Putting in a global lock which serialises *all* concurrent map and unmap calls for *all* unrelated devices makes things worse. Period. Even if the lock itself were held for the minimum possible time, i.e. trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that one cache line around between 96 CPUs across two sockets is not negligible. Fair enough. 
Note that we're in a quite interesting situation now: a) We need to have runtime PM enabled on Qualcomm SoC to have power properly managed, b) We need to have lock-free map/unmap on such distributed systems, c) If runtime PM is enabled, we need to call into runtime PM from any code that does hardware accesses, otherwise the IOMMU API (and so DMA API and then any V4L2 driver) becomes unusable. I can see one more way that could potentially let us have all the three. How about enabling runtime PM only on selected implementations (e.g. qcom,smmu) and then having all the runtime PM calls surrounded with if (pm_runtime_enabled()), which is lockless? Yes, that's the kind of thing I was gravitating towards - my vague thought was adding some flag to the smmu_domain, but pm_runtime_enabled() does look conceptually a lot cleaner. [1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 In any case, I can't imagine this working with V4L2 or anything else relying on any memory management more generic than calling IOMMU API directly from the driver, with the IOMMU device having runtime PM enabled, but without managing the runtime PM from the IOMMU driver's callbacks that need access to the hardware. As I mentioned before, only the IOMMU driver knows when exactly the real hardware access needs to be done (e.g. Rockchip/Exynos don't need to do that for map/unmap if the power is down, but some implementations of SMMU with TLB powered separately might need to do so). It's worth noting that Exynos and Rockchip are relatively small self-contained IP blocks integrated closely with the interfaces of their relevant master devices; SMMU is an architecture, implementations of which may be large, distributed, and have complex and wildly differing internal topologies. 
As such, it's a lot harder to make hardware-specific assumptions and/or be correct for all possible cases. Don't get me wrong, I do ultimately agree that the IOMMU driver is the only agent who ultimately knows what calls are going to be necessary for whatever operation it's performing on its own hardware*; it's just that for SMMU it needs to be implemented in a way that has zero impact on the cases where it doesn't matter, because it's not viable to specialise that driver for any particular IP implementation/use-case. Still, exactly the same holds for the low power embedded use cases, where we strive for the lowest possible power consumption while keeping performance high as well. And so the SMMU code is expected to also work with our use cases, such as V4L2 or DRM drivers. Since these points don't hold for the current SMMU code, I could say that it has already been specialized for large, distributed implementations. Heh, really it's specialised for ease of maintenance in terms of doing as little as we can get away with, but for what we have implemented, fast code does save CPU cycles and power on embedded systems too ;) Robin.
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On 15/02/18 04:17, Tomasz Figa wrote: [...] Could you elaborate on what kind of locking you are concerned about? As I explained before, the normally happening fast path would lock dev->power_lock only for the brief moment of incrementing the runtime PM usage counter. My bad, that's not even it. The atomic usage counter is incremented beforehands, without any locking [1] and the spinlock is acquired only for the sake of validating that device's runtime PM state remained valid indeed [2], which would be the case in the fast path of the same driver doing two mappings in parallel, with the master powered on (and so the SMMU, through device links; if master was not powered on already, powering on the SMMU is unavoidable anyway and it would add much more latency than the spinlock itself). We now have no locking at all in the map path, and only a per-domain lock around TLB sync in unmap which is unfortunately necessary for correctness; the latter isn't too terrible, since in "serious" hardware it should only be serialising a few cpus serving the same device against each other (e.g. for multiple queues on a single NIC). Putting in a global lock which serialises *all* concurrent map and unmap calls for *all* unrelated devices makes things worse. Period. Even if the lock itself were held for the minimum possible time, i.e. trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that one cache line around between 96 CPUs across two sockets is not negligible. 
[1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 In any case, I can't imagine this working with V4L2 or anything else relying on any memory management more generic than calling IOMMU API directly from the driver, with the IOMMU device having runtime PM enabled, but without managing the runtime PM from the IOMMU driver's callbacks that need access to the hardware. As I mentioned before, only the IOMMU driver knows when exactly the real hardware access needs to be done (e.g. Rockchip/Exynos don't need to do that for map/unmap if the power is down, but some implementations of SMMU with TLB powered separately might need to do so). It's worth noting that Exynos and Rockchip are relatively small self-contained IP blocks integrated closely with the interfaces of their relevant master devices; SMMU is an architecture, implementations of which may be large, distributed, and have complex and wildly differing internal topologies. As such, it's a lot harder to make hardware-specific assumptions and/or be correct for all possible cases. Don't get me wrong, I do ultimately agree that the IOMMU driver is the only agent who ultimately knows what calls are going to be necessary for whatever operation it's performing on its own hardware*; it's just that for SMMU it needs to be implemented in a way that has zero impact on the cases where it doesn't matter, because it's not viable to specialise that driver for any particular IP implementation/use-case. Robin. *AFAICS it still makes some sense to have the get_suppliers option as well, though - the IOMMU driver does what it needs for correctness internally, but the external consumer doing something non-standard can grab and hold the link around multiple calls to short-circuit that.
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On 14/02/18 10:33, Vivek Gautam wrote: On Wed, Feb 14, 2018 at 2:46 PM, Tomasz Figa wrote: Adding Jordan to this thread as well. On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam wrote: Hi Tomasz, On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam wrote: Hi Tomasz, On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark wrote: On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark wrote: On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa wrote: Hi Vivek, Thanks for the patch. Please see my comments inline. On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: While handling the concerned iommu, there should not be a need to power control the drm devices from iommu interface. If these drm devices need to be powered around this time, the respective drivers should take care of this. Replace the pm_runtime_get/put_sync() with pm_runtime_get/put_suppliers() calls, to power-up the connected iommu through the device link interface. In case the device link is not setup these get/put_suppliers() calls will be a no-op, and the iommu driver should take care of powering on its devices accordingly. Signed-off-by: Vivek Gautam --- drivers/gpu/drm/msm/msm_iommu.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index b23d33622f37..1ab629bbee69 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, struct msm_iommu *iommu = to_msm_iommu(mmu); int ret; - pm_runtime_get_sync(mmu->dev); + pm_runtime_get_suppliers(mmu->dev); ret = iommu_attach_device(iommu->domain, mmu->dev); - pm_runtime_put_sync(mmu->dev); + pm_runtime_put_suppliers(mmu->dev); For me, it looks like a wrong place to handle runtime PM of IOMMU here. 
iommu_attach_device() calls into IOMMU driver's attach_device() callback and that's where necessary runtime PM gets should happen, if any. In other words, driver A (MSM DRM driver) shouldn't be dealing with power state of device controlled by driver B (ARM SMMU). Note that we end up having to do the same, because of iommu_unmap() while DRM driver is powered off.. it might be cleaner if it was all self contained in the iommu driver, but that would make it so other drivers couldn't call iommu_unmap() from an irq handler, which is apparently something that some of them want to do.. I'd assume that runtime PM status is already guaranteed to be active when the IRQ handler is running, by some other means (e.g. pm_runtime_get_sync() called earlier, when queuing some work to the hardware). Otherwise, I'm not sure how a powered down device could trigger an IRQ. So, if the master device power is already on, suppliers should be powered on as well, thanks to device links. umm, that is kindof the inverse of the problem.. the problem is things like gpu driver (and v4l2 drivers that import dma-buf's, afaict).. they will potentially call iommu->unmap() when device is not active (due to userspace or things beyond the control of the driver).. so *they* would want iommu to do pm get/put calls. Which is fine and which is actually already done by one of the patches in this series, not for map/unmap, but probe, add_device, remove_device. Having parts of the API doing it inside the callback and other parts outside sounds at least inconsistent. But other drivers trying to unmap from irq ctx would not. Which is the contradictory requirement that lead to the idea of iommu user powering up iommu for unmap. Sorry, maybe I wasn't clear. 
My last message was supposed to show that it's not contradictory at all, because "other drivers trying to unmap from irq ctx" would already have called pm_runtime_get_*() earlier from a non-irq ctx, which would have also done the same on all the linked suppliers, including the IOMMU. The ultimate result would be that the map/unmap() of the IOMMU driver calling pm_runtime_get_sync() would do nothing besides incrementing the reference count. The entire point was to avoid the slowpath that pm_runtime_get/put_sync() would add in map/unmap. It would not be correct to add a slowpath in irq_ctx for taking care of non-irq_ctx and for the situations where master is already powered-off. Correct me if I'm wrong, but I believe that with what I'm proposing there wouldn't be any slow path. Yea, but only when the power domain is irq-safe? And not all platforms enable irq-safe power domains. For instance, msm doesn't enable its gdsc power domains as irq-safe. Is it something i am missing? irq-safe would matter if there would exist a case when the call is done from IRQ context and the power is off. As I explained in a), it shouldn't happen. Hi Robin, Will Does adding pm_runtime_
Re: [Freedreno] [PATCH v7 1/6] base: power: runtime: Export pm_runtime_get/put_suppliers
On 13/02/18 12:54, Tomasz Figa wrote: On Tue, Feb 13, 2018 at 9:00 PM, Robin Murphy wrote: On 13/02/18 07:44, Tomasz Figa wrote: Hi Vivek, On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: The device link allows the pm framework to tie the supplier and consumer. So, whenever the consumer is powered-on the supplier is powered-on first. There are however cases in which the consumer wants to power-on the supplier, but not itself. E.g., A Graphics or multimedia driver wants to power-on the SMMU to unmap a buffer and finish the TLB operations without powering on itself. This sounds strange to me. If the SMMU is powered down, wouldn't the TLB lose its contents as well (and so no flushing needed)? Depends on implementation details - if runtime PM is actually implemented via external clock gating (in the absence of fine-grained power domains), then "suspended" TLBs might both retain state and not receive invalidation requests, which is really the worst case. Agreed. That's why in "[PATCH v7 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device" I actually suggested managing clocks separately from runtime PM. At least until runtime PM framework arrives at a state, where multiple power states can be managed, i.e. full power state, clock-gated state, domain-off state. (I think I might have seen some ongoing work on this on LWN though...) Other than that, what kind of hardware operations would be needed besides just updating the page tables from the CPU? Domain attach/detach also require updating SMMU hardware state (and possibly TLB maintenance), but don't logically require the master device itself to be active at the time. Wouldn't this hardware state need to be reinitialized anyway after respective power domain power cycles? (In other words, hardware would only need programming if it's powered on at the moment.) 
Yes, if the entire SMMU was fully powered down because all masters were inactive, then all that should need to be done is to update the software shadow state in the expectation that arm_smmu_reset() would re-sync it upon TCU powerup. If at least some part of the internal logic remains active, though, then you may or may not need to fiddle with zero or more clocks and/or power domains (depending on microarchitecture and integration) in order to be sure that everything from the programming slave interface through to wherever that state is kept works correctly so that it can be changed. The main motivation here is that the Qualcomm SMMU microarchitecture apparently allows the programming interface to be shut down separately from the TCU core (context banks, page table walker, etc.), and they get an appreciable power saving from doing so. This is different from, say, the Arm Ltd. implementations, where the entire TCU is a single clock/power domain internally (although you could maybe still gate the external APB interface clock). As the previous discussions have shown, this is really, really hard to do properly in a generic manner. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v7 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 13/02/18 08:24, Tomasz Figa wrote: Hi Vivek, Thanks for the patch. Please see my comments inline. On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 42 ++ 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 9e2f917e16c2..c024f69c1682 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -913,11 +913,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; pm_runtime_get_sync() will return 0 if the device was powered off, 1 if it was already/still powered on or runtime PM is not compiled in, or a negative value on error, so shouldn't the test be (ret < 0)? Moreover, I'm actually wondering if it makes any sense to power up the hardware just to program it and power it down again. In a system where the IOMMU is located within a power domain, it would cause the IOMMU block to lose its state anyway. This is generally for the case where the SMMU internal state remains active, but the programming interface needs to be powered up in order to access it. Actually, reflecting back on "[PATCH v7 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops", perhaps it would make more sense to just control the clocks independently of runtime PM? Then, runtime PM could be used for real power management, e.g. 
really powering the block up and down, for further power saving. Unfortunately that ends up pretty much unmanageable, because there are numerous different SMMU microarchitectures with fundamentally different clock/power domain schemes (multiplied by individual SoC integration possibilities). Since this is fundamentally a generic architectural driver, adding explicit clock support would probably make the whole thing about 50% clock code, with complicated decision trees around every hardware access calculating which clocks are necessary for a given operation on a given system. That maintainability aspect is why we've already nacked such a fine-grained approach in the past. Robin.
Re: [Freedreno] [PATCH v7 1/6] base: power: runtime: Export pm_runtime_get/put_suppliers
On 13/02/18 07:44, Tomasz Figa wrote: Hi Vivek, On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: The device link allows the pm framework to tie the supplier and consumer. So, whenever the consumer is powered-on the supplier is powered-on first. There are however cases in which the consumer wants to power-on the supplier, but not itself. E.g., A Graphics or multimedia driver wants to power-on the SMMU to unmap a buffer and finish the TLB operations without powering on itself. This sounds strange to me. If the SMMU is powered down, wouldn't the TLB lose its contents as well (and so no flushing needed)? Depends on implementation details - if runtime PM is actually implemented via external clock gating (in the absence of fine-grained power domains), then "suspended" TLBs might both retain state and not receive invalidation requests, which is really the worst case. Other than that, what kind of hardware operations would be needed besides just updating the page tables from the CPU? Domain attach/detach also require updating SMMU hardware state (and possibly TLB maintenance), but don't logically require the master device itself to be active at the time. Robin.
Re: [Freedreno] [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
On 02/02/18 05:40, Sricharan R wrote: Hi Robin/Vivek, On 2/1/2018 2:23 PM, Vivek Gautam wrote: Hi, On 1/31/2018 6:39 PM, Robin Murphy wrote: On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Don't we need to balance this with a device_link_del() in .remove_device (like exynos-iommu does)? Right. Will add device_link_del() call. Thanks for pointing out. The reason for not adding device_link_del from .remove_device was, the core device_del which calls the .remove_device from notifier, calls device_links_purge before that. That does the same thing as device_link_del. So by the time .remove_device is called, device_links for that device is already cleaned up. Vivek, you may want to check once that calling device_link_del from .remove_device has no effect, just to confirm once more. There is at least one path in which .remove_device is not called via the notifier from device_del(), which is in the cleanup path of iommu_bus_init(). AFAICS any links created by .add_device during that process would be left dangling, because the device(s) would be live but otherwise disassociated from the IOMMU afterwards. From a maintenance perspective it's easier to have the call in its logical place even if it does nothing 99% of the time; that way we shouldn't have to keep an eye out for subtle changes in the power management code or driver core that might invalidate the device_del() reasoning above, and the power management guys shouldn't have to comprehend the internals of the IOMMU API to make sense of the unbalanced call if they ever want to change their API. Thanks, Robin.
Re: [Freedreno] [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Don't we need to balance this with a device_link_del() in .remove_device (like exynos-iommu does)? Robin. Signed-off-by: Sricharan R --- drivers/iommu/arm-smmu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 95478bfb182c..33bbcfedb896 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1367,6 +1367,7 @@ static int arm_smmu_add_device(struct device *dev) struct arm_smmu_device *smmu; struct arm_smmu_master_cfg *cfg; struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct device_link *link; int i, ret; if (using_legacy_binding) { @@ -1428,6 +1429,16 @@ static int arm_smmu_add_device(struct device *dev) pm_runtime_put_sync(smmu->dev); + /* +* Establish the link between smmu and master, so that the +* smmu gets runtime enabled/disabled as per the master's +* needs. +*/ + link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME); + if (!link) + dev_warn(smmu->dev, "Unable to create device link between %s and %s\n", +dev_name(smmu->dev), dev_name(dev)); + return 0; out_cfg_free:
Re: [Freedreno] [PATCH v6 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 21acffe91a1c..95478bfb182c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -914,11 +914,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + /* * Disable the context bank and free the page tables before freeing * it. @@ -933,6 +937,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) free_io_pgtable_ops(smmu_domain->pgtbl_ops); __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx); + + pm_runtime_put_sync(smmu->dev); } static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) @@ -1408,12 +1414,20 @@ static int arm_smmu_add_device(struct device *dev) while (i--) cfg->smendx[i] = INVALID_SMENDX; - ret = arm_smmu_master_alloc_smes(dev); + ret = pm_runtime_get_sync(smmu->dev); if (ret) goto out_cfg_free; + ret = arm_smmu_master_alloc_smes(dev); + if (ret) { + pm_runtime_put_sync(smmu->dev); + goto out_cfg_free; Please keep to the existing pattern and put this on the cleanup path with a new label, rather than inline. 
+ } + iommu_device_link(&smmu->iommu, dev); + pm_runtime_put_sync(smmu->dev); + return 0; out_cfg_free: @@ -1428,7 +1442,7 @@ static void arm_smmu_remove_device(struct device *dev) struct iommu_fwspec *fwspec = dev->iommu_fwspec; struct arm_smmu_master_cfg *cfg; struct arm_smmu_device *smmu; - + int ret; if (!fwspec || fwspec->ops != &arm_smmu_ops) return; @@ -1436,8 +1450,21 @@ static void arm_smmu_remove_device(struct device *dev) cfg = fwspec->iommu_priv; smmu = cfg->smmu; + /* +* The device link between the master device and +* smmu is already purged at this point. +* So enable the power to smmu explicitly. +*/ I don't understand this comment, especially since we don't even introduce device links until the following patch... :/ + + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + iommu_device_unlink(&smmu->iommu, dev); arm_smmu_master_free_smes(fwspec); + + pm_runtime_put_sync(smmu->dev); + iommu_group_remove_device(dev); kfree(fwspec->iommu_priv); iommu_fwspec_free(dev); @@ -2130,6 +2157,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (err) return err; + platform_set_drvdata(pdev, smmu); + + pm_runtime_enable(dev); + + err = pm_runtime_get_sync(dev); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2171,9 +2206,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return err; } - platform_set_drvdata(pdev, smmu); arm_smmu_device_reset(smmu); arm_smmu_test_smr_masks(smmu); + pm_runtime_put_sync(dev); /* * For ACPI and generic DT bindings, an SMMU will be probed before @@ -2212,6 +2247,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + pm_runtime_force_suspend(smmu->dev); Why do we need this? I guess it might be a Qualcomm-ism as I don't see anyone else calling it from .remove other than a couple of other qcom_* drivers. 
Given that we only get here during system shutdown (or the root user intentionally pissing about with driver unbinding), it doesn't seem like a point where power saving really matters all that much. I'd also naively expect that anything this device was the last consumer of would get turned off by core code anyway once it's removed, but maybe things aren't that slick; I dunno :/ Robin. + return 0; }
Re: [Freedreno] [PATCH v6 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: Clock rework to request bulk of clocks] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 55 ++-- 1 file changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 78d4c6b8f1ba..21acffe91a1c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,9 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clocks; + int num_clks; + const char * const *clk_names; This seems unnecessary, as we use it a grand total of once, during initialisation when we have the source data directly to hand. Just pass data->clks into arm_smmu_init_clks() as an additional argument. Otherwise, I think this looks reasonable; it's about as unobtrusive as it's going to get. Robin. 
u32 cavium_id_base; /* Specific to Cavium */ @@ -1685,6 +1689,25 @@ static int arm_smmu_id_size_to_bits(int size) } } +static int arm_smmu_init_clocks(struct arm_smmu_device *smmu) +{ + int i; + int num = smmu->num_clks; + + if (num < 1) + return 0; + + smmu->clocks = devm_kcalloc(smmu->dev, num, + sizeof(*smmu->clocks), GFP_KERNEL); + if (!smmu->clocks) + return -ENOMEM; + + for (i = 0; i < num; i++) + smmu->clocks[i].id = smmu->clk_names[i]; + + return devm_clk_bulk_get(smmu->dev, num, smmu->clocks); +} + static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) { unsigned long size; @@ -1897,10 +1920,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); @@ -2001,6 +2026,8 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->clk_names = data->clks; + smmu->num_clks = data->num_clks; parse_driver_options(smmu); @@ -2099,6 +2126,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = arm_smmu_init_clocks(smmu); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2197,7 +2228,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev) return 0; } -static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume); +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return 
clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks); +} + +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + clk_bulk_disable_unprepare(smmu->num_clks, smmu->clocks); + + return 0; +} + +static const struct dev_pm_ops arm_smmu_pm_ops = { + SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume) + SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend, + arm_smmu_runtime_resume, NULL) +}; static struct platform_driver arm_smmu_driver = { .driver = {
Re: [Freedreno] [PATCH] drm: convert DT component matching to component_match_add_release()
Hi Russell, On 03/06/16 08:58, Russell King wrote: Convert DT component matching to use component_match_add_release(). Signed-off-by: Russell King --- drivers/gpu/drm/arm/hdlcd_drv.c | 10 -- drivers/gpu/drm/armada/armada_drv.c | 9 +++-- drivers/gpu/drm/drm_of.c| 13 + drivers/gpu/drm/msm/msm_drv.c | 8 +++- drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 13 ++--- drivers/gpu/drm/sti/sti_drv.c | 9 +++-- drivers/gpu/drm/tilcdc/tilcdc_external.c| 9 +++-- 7 files changed, 55 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c index b987c63ba8d6..bbde48c4f550 100644 --- a/drivers/gpu/drm/arm/hdlcd_drv.c +++ b/drivers/gpu/drm/arm/hdlcd_drv.c @@ -443,11 +443,16 @@ static const struct component_master_ops hdlcd_master_ops = { .unbind = hdlcd_drm_unbind, }; -static int compare_dev(struct device *dev, void *data) +static int compare_of(struct device *dev, void *data) { return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} Considering that there's 7 identical copies of this function in this patch alone, perhaps there's some mileage in defining it commonly as a static __maybe_unused default_release_of() in component.h or drm_of.h (and maybe default_compare_of() similarly)? (Apologies if there's already been some strong argument against that which I've not seen, but it seems like a reasonable thing to do.) Robin. 
+ static int hdlcd_probe(struct platform_device *pdev) { struct device_node *port, *ep; @@ -474,7 +479,8 @@ static int hdlcd_probe(struct platform_device *pdev) return -EAGAIN; } - component_match_add(&pdev->dev, &match, compare_dev, port); + component_match_add_release(&pdev->dev, &match, release_of, + compare_of, port); return component_master_add_with_match(&pdev->dev, &hdlcd_master_ops, match); diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 439824a61aa5..6ca2aa36515e 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -232,6 +232,11 @@ static int compare_of(struct device *dev, void *data) return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + static int compare_dev_name(struct device *dev, void *data) { const char *name = data; @@ -255,8 +260,8 @@ static void armada_add_endpoints(struct device *dev, continue; } - component_match_add(dev, match, compare_of, remote); - of_node_put(remote); + component_match_add_release(dev, match, release_of, + compare_of, remote); } } diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c index bc98bb94264d..5d183479d7d6 100644 --- a/drivers/gpu/drm/drm_of.c +++ b/drivers/gpu/drm/drm_of.c @@ -6,6 +6,11 @@ #include #include +static void drm_release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + /** * drm_crtc_port_mask - find the mask of a registered CRTC by port OF node * @dev: DRM device @@ -101,8 +106,8 @@ int drm_of_component_probe(struct device *dev, continue; } - component_match_add(dev, &match, compare_of, port); - of_node_put(port); + component_match_add_release(dev, &match, drm_release_of, + compare_of, port); } if (i == 0) { @@ -140,8 +145,8 @@ int drm_of_component_probe(struct device *dev, continue; } - component_match_add(dev, &match, compare_of, remote); - of_node_put(remote); + component_match_add_release(dev, &match, drm_release_of, + 
compare_of, remote); } of_node_put(port); } diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 9c654092ef78..1f7de47d817e 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -805,6 +805,11 @@ static int compare_of(struct device *dev, void *data) return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + static int add_components(struct device *dev, struct component_match **matchptr, const char *name) { @@ -818,7 +823,8 @@ static int add_components(struct device *dev, struct component_match **matchptr, if (!node)