Re: [RFC PATCH v4 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-08 Thread isaacm

On 2021-04-08 07:32, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:36PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 70 ++

 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c

index ea66b10c04c4..6700685f81d4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -253,8 +253,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,


 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep);
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep);

 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,

size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);

tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
 lvl, tblp) != sz) {
WARN_ON(1);
return -EINVAL;
@@ -526,14 +526,14 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,

   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size,
   arm_lpae_iopte blk_pte, int lvl,
-  arm_lpae_iopte *ptep)
+  arm_lpae_iopte *ptep, size_t pgcount)
 {
struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
phys_addr_t blk_paddr;
size_t tablesz = ARM_LPAE_GRANULE(data);
size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-   int i, unmap_idx = -1;
+   int i, unmap_idx_start = -1, num_entries = 0, max_entries;

if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
@@ -542,15 +542,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,

if (!tablep)
return 0; /* Bytes unmapped */

-   if (size == split_sz)
-   unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (size == split_sz) {
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   max_entries = (tablesz >> ilog2(sizeof(pte))) - unmap_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   }

blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);

for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {


Given that we already have a 'tablesz / sizeof(pte)' expression here, I'd be
inclined to have either a local variable or a macro helper to get at the
ptes_per_table value that you also need to compute max_entries.

I think a macro might be helpful, as the number of PTEs per table is
useful in a few places.
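One possible shape for such a helper, sketched against the definitions the
patch above already uses (the macro name here is illustrative, not taken
from the posted series):

#define ARM_LPAE_PTES_PER_TABLE(d)					\
	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))

	/* split_blk_unmap would then compute the entry budget as: */
	if (size == split_sz) {
		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
		num_entries = min_t(int, pgcount, max_entries);
	}

	/* ...and the copy loop below can reuse it as its bound: */
	for (i = 0; i < ARM_LPAE_PTES_PER_TABLE(data); i++, blk_paddr += split_sz) {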



/* Unmap! */
-   if (i == unmap_idx)
+   if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
continue;

__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
@@ -568,38 +571,45 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,

return 0;

tablep = iopte_deref(pte, data);
-   } else if (unmap_idx >= 0) {
-   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-   return size;
+   } else if (unmap_idx_start >= 0) {
+   for (i = 0; i < num_entries; i++)
+			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);


I suppose we could add a count parameter to the iotlb gather stuff in
future too, but for now this is fine as this series is already pretty big.

Okay. I can keep this in mind for the future.
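A hypothetical helper sketching what that count parameter could look like
(this does not exist in the kernel today; it simply wraps the existing
per-page call to illustrate the idea):

static inline void io_pgtable_tlb_add_pages(struct io_pgtable *iop,
					    struct iommu_iotlb_gather *gather,
					    unsigned long iova, size_t size,
					    size_t count)
{
	/* One gather entry per page for now; a real API could batch these */
	while (count--) {
		io_pgtable_tlb_add_page(iop, gather, iova, size);
		iova += size;
	}
}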

Will



Re: [RFC PATCH v4 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-08 Thread isaacm

On 2021-04-08 06:59, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:32PM -0700, Isaac J. Manjarres wrote:

From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bcd623862bf9..ab689611a03b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)

 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);

-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   phys_addr_t addr_merge = paddr | iova;


^^^ this needs to be 'unsigned long' as it was before (otherwise using
GENMASK _is_ a problem).


Done.
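i.e. only the type of the local changes; a minimal sketch of the fixed-up
line:

	/* unsigned long, like domain->pgsize_bitmap, so the plain GENMASK()
	 * used for the alignment check cannot overflow */
	unsigned long addr_merge = paddr | iova;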

Will



Re: [RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-08 Thread isaacm

On 2021-04-08 06:58, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:38PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c

index d4004bcf333a..5e203e03c352 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,

return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
 }

-static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,

+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   size_t unmapped = 0, ret;

if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;

-   return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
+   while (pgcount--) {
+   ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+   if (!ret)
+   break;
+
+   unmapped += pgsize;
+   iova += pgsize;
+   }
+
+   return unmapped;
+}


Wait -- don't you need to hook this up somewhere (likewise for ->map_pages)?

Done. Likewise for map_pages(). I'm not sure how the compiler didn't catch
this; I'm compile testing this, as I don't have hardware that uses the short
descriptor format.

How are you testing this?

Will
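For reference, the hook-up being asked about is the io_pgtable_ops
initialisation in arm_v7s_alloc_pgtable(); a rough sketch of what it needs
to gain with this patch (whether the old single-page entries stay behind as
wrappers is a separate choice):

	data->iop.ops = (struct io_pgtable_ops) {
		.map_pages	= arm_v7s_map_pages,
		.unmap_pages	= arm_v7s_unmap_pages,
		.iova_to_phys	= arm_v7s_iova_to_phys,
	};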



Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-07 Thread isaacm

On 2021-04-07 02:57, Will Deacon wrote:

On Tue, Apr 06, 2021 at 02:02:26PM -0700, isa...@codeaurora.org wrote:

On 2021-04-06 05:15, Will Deacon wrote:
> On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:
> > Implement the unmap_pages() callback for the ARM LPAE io-pgtable
> > format.
> >
> > Signed-off-by: Isaac J. Manjarres 
> > Suggested-by: Will Deacon 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 124
> > +++--
> >  1 file changed, 104 insertions(+), 20 deletions(-)
>
> [...]
>
> > +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops,
> > unsigned long iova,
> > +size_t pgsize, size_t pgcount,
> > +struct iommu_iotlb_gather *gather)
> > +{
> > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > + struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > + arm_lpae_iopte *ptep = data->pgd;
> > + long iaext = (s64)iova >> cfg->ias;
> > + size_t unmapped = 0, unmapped_page;
> > + int last_lvl;
> > + size_t table_size, pages, tbl_offset, max_entries;
> > +
> > + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize ||
> > !pgcount))
> > + return 0;
> > +
> > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> > + iaext = ~iaext;
> > + if (WARN_ON(iaext))
> > + return 0;
> > +
> > + /*
> > +  * Calculating the page table size here helps avoid situations where
> > +  * a page range that is being unmapped may be mapped at the same
> > level
> > +  * but not mapped by the same tables. Allowing such a scenario to
> > +  * occur can complicate the logic in arm_lpae_split_blk_unmap().
> > +  */
> > + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
> > +
> > + if (last_lvl == data->start_level)
> > + table_size = ARM_LPAE_PGD_SIZE(data);
> > + else
> > + table_size = ARM_LPAE_GRANULE(data);
> > +
> > + max_entries = table_size / sizeof(*ptep);
> > +
> > + while (pgcount) {
> > + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
> > + pages = min_t(size_t, pgcount, max_entries - tbl_offset);
> > + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
> > +  pages, data->start_level,
> > +  ptep);
> > + if (!unmapped_page)
> > + break;
> > +
> > + unmapped += unmapped_page;
> > + iova += unmapped_page;
> > + pgcount -= pages;
> > + }
>
> Robin has some comments on the first version of this patch, and I
> don't think you
> addressed them:
>
> https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com
>
> I'm inclined to agree that iterating here doesn't make a lot of sense --
> we've already come back out of the page-table walk, so I think we should
> just return to the caller (who is well prepared to handle a partial
> unmap).
> Same for the map side of things.
>
> If we get numbers showing that this is causing a performance issue, then
> we should rework the page-table code to handle this at the lower level
> (because I doubt the loop that you have is really worse than returning
> to
> the caller anyway).
>
Sorry, I seem to have overlooked those comments.

I will go ahead and address them. I think it might be ideal to try to do
as much work as possible in the io-pgtable level, so as to minimize the
number of indirect calls incurred by jumping back and forth between iommu
fwk, iommu driver, and io-pgtable code.

Perhaps we can try something like how the linear mappings are created on
arm64 i.e. on the previous level, we can determine how many pages can be
unmapped in one page table in one iteration, and on the subsequent
iterations, we can tackle another page table at the lower level. Looking
at the code, it doesn't seem too difficult to add this in. Thoughts?


I don't object to getting there eventually, but as an initial step I think
returning back to the caller is perfectly reasonable and will probably make
the patches easier to review. In other words, implement the simple (correct)
thing first, and then have subsequent patches to improve it.

Will

Thanks for the feedback. I've implemented the simpler approach in v4 of the
patches:
https://lore.kernel.org/linux-iommu/20210408045241.27316-1-isa...@codeaurora.org/T/#t.


Thanks,
Isaac
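For reference, the simpler shape that returns to the caller after a single
walk looks roughly like this (condensed from the v4 patch quoted earlier in
this digest, not a verbatim copy):

static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
				   size_t pgsize, size_t pgcount,
				   struct iommu_iotlb_gather *gather)
{
	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
	arm_lpae_iopte *ptep = data->pgd;
	long iaext = (s64)iova >> cfg->ias;

	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
		return 0;

	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
		iaext = ~iaext;
	if (WARN_ON(iaext))
		return 0;

	/*
	 * Whatever __arm_lpae_unmap() manages in one walk is returned; the
	 * caller is prepared to handle a partial unmap and call back in.
	 */
	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
				data->start_level, ptep);
}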


Re: [RFC PATCH v3 03/12] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-06 Thread isaacm

On 2021-04-06 04:57, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:03PM -0700, Isaac J. Manjarres wrote:

Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 2ed0c057d9e7..019149b204b8 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -143,6 +143,7 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map:  Map a physically contiguous memory region.
+ * @map_pages:    Map a physically contiguous range of pages of the same size.
  * @unmap:    Unmap a physically contiguous memory region.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.

  * @iova_to_phys: Translate iova to physical address.
@@ -153,6 +154,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);


How about returning 'size_t' and using IS_ERR_VALUE() instead of adding
the extra 'mapped' argument (i.e. return the size of the region mapped
or an error code)? I don't think we realistically need to care about map
sizes that overlap with the error region.

I'd given that a shot before, but the problem that I kept running into was
that in case of an error, if I return an error code, I don't know how much
memory was mapped, so that I can invoke iommu_unmap from __iommu_map with
that size to undo the partial mappings from a map_pages() call.

Returning the amount of memory that was mapped in the case of an error will
be less than the size that was requested, but then we lose the information
about why the error happened, since the error code won't be returned, so
that's why I went with the current approach. Do you have any other ideas
about how to handle this?

I'd thought of having arm_lpae_map_pages() invoke arm_lpae_unmap_pages(),
but the TLB maintenance was a problem, as we wouldn't invoke
iommu_iotlb_sync().


Thanks,
Isaac
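A rough sketch of why the extra 'mapped' argument helps the caller (the
surrounding shape is illustrative, and undo_partial_map() is a placeholder
for the caller's existing cleanup, e.g. iommu_unmap() in __iommu_map(),
which also takes care of the TLB maintenance):

	size_t mapped = 0;
	int ret;

	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, &mapped);
	if (ret && mapped)
		/* The error code is preserved, and 'mapped' says exactly how
		 * much needs tearing down. */
		undo_partial_map(iova, mapped);	/* placeholder, not a real helper */

	return ret;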

Will



Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-06 Thread isaacm

On 2021-04-06 05:15, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 124 +++--

 1 file changed, 104 insertions(+), 20 deletions(-)


[...]

+static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *gather)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   long iaext = (s64)iova >> cfg->ias;
+   size_t unmapped = 0, unmapped_page;
+   int last_lvl;
+   size_t table_size, pages, tbl_offset, max_entries;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
+   return 0;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return 0;
+
+   /*
+* Calculating the page table size here helps avoid situations where
+	 * a page range that is being unmapped may be mapped at the same level
+* but not mapped by the same tables. Allowing such a scenario to
+* occur can complicate the logic in arm_lpae_split_blk_unmap().
+*/
+   last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
+
+   if (last_lvl == data->start_level)
+   table_size = ARM_LPAE_PGD_SIZE(data);
+   else
+   table_size = ARM_LPAE_GRANULE(data);
+
+   max_entries = table_size / sizeof(*ptep);
+
+   while (pgcount) {
+   tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
+   pages = min_t(size_t, pgcount, max_entries - tbl_offset);
+   unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
+pages, data->start_level,
+ptep);
+   if (!unmapped_page)
+   break;
+
+   unmapped += unmapped_page;
+   iova += unmapped_page;
+   pgcount -= pages;
+   }


Robin has some comments on the first version of this patch, and I don't
think you addressed them:

https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com

I'm inclined to agree that iterating here doesn't make a lot of sense --
we've already come back out of the page-table walk, so I think we should
just return to the caller (who is well prepared to handle a partial unmap).

Same for the map side of things.

If we get numbers showing that this is causing a performance issue, then
we should rework the page-table code to handle this at the lower level
(because I doubt the loop that you have is really worse than returning to
the caller anyway).


Sorry, I seem to have overlooked those comments.

I will go ahead and address them. I think it might be ideal to try to do
as much work as possible in the io-pgtable level, so as to minimize the
number of indirect calls incurred by jumping back and forth between iommu
fwk, iommu driver, and io-pgtable code.

Perhaps we can try something like how the linear mappings are created on
arm64 i.e. on the previous level, we can determine how many pages can be
unmapped in one page table in one iteration, and on the subsequent
iterations, we can tackle another page table at the lower level. Looking
at the code, it doesn't seem too difficult to add this in. Thoughts?


Thanks,
Isaac


Will



Re: [RFC PATCH v3 06/12] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-06 Thread isaacm

On 2021-04-06 04:53, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:06PM -0700, Isaac J. Manjarres wrote:

From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9006397b6604..a3bbf7e310b0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)

 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);

-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   phys_addr_t addr_merge = paddr | iova;


Huh, so this was 'unsigned long' before and, given that the pgsize_bitmap
on the domain is also unsigned long, then I think that's fine. So using
that would mean you don't need GENMASK_ULL for this guy either.

Will

Thanks, I will address this in version 4 of the series.

--Isaac


Re: [PATCH v2 07/12] iommu: Hook up '->unmap_pages' driver callback

2021-04-05 Thread isaacm

On 2021-04-04 23:00, Lu Baolu wrote:

Hi,

On 4/2/21 9:34 AM, Isaac J. Manjarres wrote:

  static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
@@ -2476,7 +2519,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,

unsigned long orig_iova = iova;
unsigned int min_pagesz;
  - if (unlikely(ops->unmap == NULL ||
+   if (unlikely((ops->unmap == NULL && ops->unmap_pages == NULL) ||
 domain->pgsize_bitmap == 0UL))


This change should also be applied to __iommu_map() path. And perhaps
could be:

	if (unlikely(!(ops->unmap || ops->unmap_pages) || !domain->pgsize_bitmap))


Yep, that's correct. Thank you for spotting that; I've updated it in the 
latest series: 
https://lore.kernel.org/linux-iommu/20210405191112.28192-1-isa...@codeaurora.org/T/#t.


Thanks,
Isaac

Best regards,
baolu



Re: [RFC PATCH 2/5] iommu: Add an unmap_pages() op for IOMMU drivers

2021-04-02 Thread isaacm

On 2021-03-30 22:39, Lu Baolu wrote:

On 3/31/21 1:36 PM, isa...@codeaurora.org wrote:

On 2021-03-30 21:47, Lu Baolu wrote:

On 3/31/21 11:00 AM, Isaac J. Manjarres wrote:

Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.

For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
  include/linux/iommu.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..9cf81242581a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -193,6 +193,7 @@ struct iommu_iotlb_gather {
   * @detach_dev: detach device from an iommu domain
   * @map: map a physically contiguous memory region to an iommu domain
   * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
   * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
   * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush

@@ -245,6 +246,9 @@ struct iommu_ops {
 phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
  size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
    size_t size, struct iommu_iotlb_gather *iotlb_gather);
+    size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *iotlb_gather);


Is it possible to add an equivalent map_pages() callback?

Yes, map_pages() can be implemented and can leverage a lot of the
implementation of unmap_pages(). The problem is that if you map several
pages in one call, and then encounter an error and have to roll back, you
should do TLB maintenance, as iommu_map does when it encounters an error.
However, we can't call iommu_unmap from io-pgtable-arm, for example. We can
call arm_lpae_unmap_pages() from the later patches, but that doesn't solve
the TLB maintenance issue. Do you have any thoughts on how to address this?


Call unmap_pages() with the same pages and size to roll back. Does it
work?

Best regards,
baolu

Hi Lu,

I've given map_pages() a shot. Here's the second version of the RFC series:
https://lore.kernel.org/linux-iommu/20210402013452.4013-1-isa...@codeaurora.org/T/#t.


Thanks,
Isaac
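A rough sketch of the rollback Baolu suggests, done inside the format code
itself (illustrative only: the per-page arm_lpae_map() loop just keeps the
example short, and the TLB maintenance concern raised above is not solved
here, since no gather is passed to the unmap):

static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
			      int prot, gfp_t gfp, size_t *mapped)
{
	size_t done;
	int ret = 0;

	for (done = 0; done < pgsize * pgcount; done += pgsize) {
		ret = arm_lpae_map(ops, iova + done, paddr + done, pgsize, prot, gfp);
		if (ret)
			break;
	}

	if (ret) {
		/* Undo the pages that did succeed before returning the error */
		if (done)
			arm_lpae_unmap_pages(ops, iova, pgsize, done / pgsize, NULL);
		return ret;
	}

	*mapped = done;
	return 0;
}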





  void (*flush_iotlb_all)(struct iommu_domain *domain);
  void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned 
long iova,

 size_t size);



Best regards,
baolu



Re: [RFC PATCH 3/6] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-04-01 Thread isaacm

On 2021-04-01 09:47, Will Deacon wrote:

Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().

Signed-off-by: Will Deacon 
---
 drivers/iommu/iommu.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..bcd623862bf9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -8,6 +8,7 @@

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain,

   unsigned long addr_merge, size_t size)
 {
unsigned int pgsize_idx;
+   unsigned long pgsizes;
size_t pgsize;

-   /* Max page size that still fits into 'size' */
-   pgsize_idx = __fls(size);
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
I've fixed this in the latest RFC for the iommu_map/unmap optimization
patches, but for the sake of completeness: I think this should be
GENMASK_ULL, in case __fls(size) >= 32.

Thank you for these patches, by the way. I've looked through them and they
make sense/seem correct. I've integrated them into the latest RFC:
https://lore.kernel.org/linux-iommu/20210402013452.4013-1-isa...@codeaurora.org/T/#t.


Thanks,
Isaac


-   /* need to consider alignment requirements ? */
-   if (likely(addr_merge)) {
-   /* Max page size allowed by address */
-   unsigned int align_pgsize_idx = __ffs(addr_merge);
-   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-   }
-
-   /* build a mask of acceptable page sizes */
-   pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-   /* throw away page sizes not supported by the hardware */
-   pgsize &= domain->pgsize_bitmap;
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);

-   /* make sure we're still sane */
-   BUG_ON(!pgsize);
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);

-   /* pick the biggest page */
-   pgsize_idx = __fls(pgsize);
-   pgsize = 1UL << pgsize_idx;
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);

return pgsize;
 }



Re: [RFC PATCH 2/5] iommu: Add an unmap_pages() op for IOMMU drivers

2021-03-30 Thread isaacm

On 2021-03-30 21:47, Lu Baolu wrote:

On 3/31/21 11:00 AM, Isaac J. Manjarres wrote:

Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.

For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
  include/linux/iommu.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..9cf81242581a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -193,6 +193,7 @@ struct iommu_iotlb_gather {
   * @detach_dev: detach device from an iommu domain
   * @map: map a physically contiguous memory region to an iommu domain
   * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
   * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
   * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush

@@ -245,6 +246,9 @@ struct iommu_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
+	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *iotlb_gather);


Is it possible to add an equivalent map_pages() callback?

Yes, map_pages() can be implemented and can leverage a lot of the
implementation of unmap_pages(). The problem is that if you map several
pages in one call, and then encounter an error and have to roll back, you
should do TLB maintenance, as iommu_map does when it encounters an error.
However, we can't call iommu_unmap from io-pgtable-arm, for example. We can
call arm_lpae_unmap_pages() from the later patches, but that doesn't solve
the TLB maintenance issue. Do you have any thoughts on how to address this?



void (*flush_iotlb_all)(struct iommu_domain *domain);
  	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long 
iova,

   size_t size);



Best regards,
baolu



Re: [PATCH v2 0/5] Optimize iommu_map_sg() performance

2021-01-21 Thread isaacm

On 2021-01-12 08:00, Robin Murphy wrote:

On 2021-01-11 14:54, Isaac J. Manjarres wrote:

The iommu_map_sg() code currently iterates through the given
scatter-gather list, and in the worst case, invokes iommu_map()
for each element in the scatter-gather list, which calls into
the IOMMU driver through an indirect call. For an IOMMU driver
that uses a format supported by the io-pgtable code, the IOMMU
driver will then call into the io-pgtable code to map the chunk.

Jumping between the IOMMU core code, the IOMMU driver, and the
io-pgtable code and back for each element in a scatter-gather list
is not efficient.

Instead, add a map_sg() hook in both the IOMMU driver ops and the
io-pgtable ops. iommu_map_sg() can then call into the IOMMU driver's
map_sg() hook with the entire scatter-gather list, which can call
into the io-pgtable map_sg() hook, which can process the entire
scatter-gather list, signficantly reducing the number of indirect
calls, and jumps between these layers, boosting performance.


Out of curiosity, how much of the difference is attributable to actual
indirect call overhead vs. the additional massive reduction in visits
to arm_smmu_rpm_{get,put} that you fail to mention?

I did an experiment where I compared the two approaches without any calls
to arm_smmu_rpm_[get/put]. There's still a large amount of difference
without the overhead incurred by power management calls. Here are the
results:


no optimizations and no power management calls:
 size     iommu_map_sg
  4K         0.609 us
  64K        8.583 us
  1M       136.083 us
  2M       273.145 us
  12M     1442.119 us
  24M     2876.078 us
  32M     3832.041 us

iommu_map_sg optimizations and no power management calls:
 size     iommu_map_sg
  4K         0.645 us
  64K        1.229 us
  1M         9.531 us
  2M        23.198 us
  12M       99.250 us
  24M      185.713 us
  32M      248.781 us

From here, we can see that the amount of latency incurred by the indirect
calls is fairly large.


There are ways to optimise indirect calling that would benefit *all* cases,
rather than just one operation for one particular driver.

Do you mind sharing some more information on how to optimize the existing
approach further, such that it benefits other drivers as well?



On a system that uses the ARM SMMU driver, and the ARM LPAE format,
the current implementation of iommu_map_sg() yields the following
latencies for mapping scatter-gather lists of various sizes. These
latencies are calculated by repeating the mapping operation 10 times:

 size     iommu_map_sg latency
  4K         0.624 us
  64K        9.468 us
  1M       122.557 us
  2M       239.807 us
  12M     1435.979 us
  24M     2884.968 us
  32M     3832.979 us

On the same system, the proposed modifications yield the following
results:

 size     iommu_map_sg latency
  4K         3.645 us
  64K        4.198 us
  1M        11.010 us
  2M        17.125 us
  12M       82.416 us
  24M      158.677 us
  32M      210.468 us

The procedure for collecting the iommu_map_sg latencies is
the same in both experiments. Clearly, reducing the jumps
between the different layers in the IOMMU code offers a
signficant performance boost in iommu_map_sg() latency.


Presumably those are deliberately worst-case numbers? After all, a
32MB scatterlist *could* incur less overhead than a 64KB one if things
line up just right (still 16 ->map calls, but each with one fewer
level of pagetable to traverse).

Yes, these are worst case numbers (i.e. a buffer is composed entirely
of 4 KB pages, so higher order mappings don't get used).

TBH I find the significant regression of the 4KB case the most
interesting - what's going on there?

That was an error on my part. After fixing my error, I observed that the
time spent mapping the 4 KB buffer is comparable with and without
optimizations, which is expected.


My main reservation here is that we get an explosion of duplicate
copies of almost the same code, and it's code that's just non-trivial
enough to start being bug-prone. And it's all still only for one
specific operation - your argument about calling through multiple
layers for each element applies just as much to iommu_map() itself, so
why aren't we trying to make more fundamental improvements with wider
benefits? Indeed I can't imagine the existing iommu_map_sg() loop
really adds significant overhead compared to a single iommu_map() call
that results in the equivalent set of ->map calls to the driver.

At a glance, I reckon that simply extending the internal ->map and
->unmap interfaces to encode a number of consecutive identical pages
would already get us a large chunk of the way there; then we'd be in a
better place to consider options for the io-pgtable interface.

Do you mean physical

Re: [PATCH 0/5] Optimize iommu_map_sg() performance

2021-01-11 Thread isaacm

On 2021-01-10 23:52, Sai Prakash Ranjan wrote:

On 2021-01-11 11:52, Sai Prakash Ranjan wrote:

Hi Isaac,

I gave this series a go on chromebook and saw these warnings
and several device probe failures, logs attached below:

WARN corresponds to this code in arm_lpae_map_by_pgsize()

if (WARN_ON(iaext || (paddr + size) >> cfg->oas))
return -ERANGE;

Logs:

[2.411391] [ cut here ]
[2.416149] WARNING: CPU: 6 PID: 56 at
drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[2.425606] Modules linked in:
[2.428749] CPU: 6 PID: 56 Comm: kworker/6:1 Not tainted 5.10.5 
#970

[2.440287] Workqueue: events deferred_probe_work_func
[2.445563] pstate: 20c9 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[2.451726] pc : arm_lpae_map_sg+0x234/0x248
[2.456112] lr : arm_lpae_map_sg+0xe0/0x248
[2.460410] sp : ffc010513750
[2.463820] x29: ffc010513790 x28: ffb943332000
[2.469281] x27: 000ff000 x26: ffb943d14900
[2.474738] x25: 1000 x24: 000103465000
[2.480196] x23: 0001 x22: 000103466000
[2.485645] x21: 0003 x20: 0a20
[2.491103] x19: ffc010513850 x18: 0001
[2.496562] x17: 0002 x16: 
[2.502021] x15:  x14: 
[2.507479] x13: 0001 x12: 
[2.512928] x11: 0010 x10: 
[2.518385] x9 : 0001 x8 : 40201000
[2.523844] x7 : 0a20 x6 : ffb943463000
[2.529302] x5 : 0003 x4 : 1000
[2.534760] x3 : 0001 x2 : ffb941f605a0
[2.540219] x1 : 0003 x0 : 0e40
[2.545679] Call trace:
[2.548196]  arm_lpae_map_sg+0x234/0x248
[2.552225]  arm_smmu_map_sg+0x80/0xc4
[2.556078]  __iommu_map_sg+0x6c/0x188
[2.559931]  iommu_map_sg_atomic+0x18/0x20
[2.564144]  iommu_dma_alloc_remap+0x26c/0x34c
[2.568703]  iommu_dma_alloc+0x9c/0x268
[2.572647]  dma_alloc_attrs+0x88/0xfc
[2.576503]  gsi_ring_alloc+0x50/0x144
[2.580356]  gsi_init+0x2c4/0x5c4
[2.583766]  ipa_probe+0x14c/0x2b4
[2.587263]  platform_drv_probe+0x94/0xb4
[2.591377]  really_probe+0x138/0x348
[2.595145]  driver_probe_device+0x80/0xb8
[2.599358]  __device_attach_driver+0x90/0xa8
[2.603829]  bus_for_each_drv+0x84/0xcc
[2.607772]  __device_attach+0xc0/0x148
[2.611713]  device_initial_probe+0x18/0x20
[2.616012]  bus_probe_device+0x38/0x94
[2.619953]  deferred_probe_work_func+0x78/0xb0
[2.624611]  process_one_work+0x210/0x3dc
[2.628726]  worker_thread+0x284/0x3e0
[2.632578]  kthread+0x148/0x1a8
[2.635891]  ret_from_fork+0x10/0x18
[2.639562] ---[ end trace 9bac18cad6a9862e ]---
[2.644414] ipa 1e4.ipa: error -12 allocating channel 0 event 
ring

[2.651656] ipa: probe of 1e4.ipa failed with error -12
[2.660072] dwc3 a60.dwc3: Adding to iommu group 8
[2.668632] xhci-hcd xhci-hcd.13.auto: xHCI Host Controller
[2.674680] xhci-hcd xhci-hcd.13.auto: new USB bus registered,
assigned bus number 1



...

Isaac provided a fix which he will post as v2 and no warnings were observed
with that fix.

Tested-by: Sai Prakash Ranjan 

Thanks,
Sai


Thanks for testing out the patches. I've added the fix (there was an
off-by-one error in the calculation used to check if the IOVA/physical
addresses are within limits) to version 2 of the series:

https://lore.kernel.org/linux-iommu/1610376862-927-1-git-send-email-isa...@codeaurora.org/T/#t

Thanks,
Isaac


Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-08 Thread isaacm

On 2021-01-08 10:18, Will Deacon wrote:

On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:

On 2021-01-07 22:27, isa...@codeaurora.org wrote:
> On 2021-01-06 03:56, Will Deacon wrote:
> > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
> > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY
> > > flag")
> > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > the memory type setting required for the non-coherent masters to use
> > > system cache. Now that system cache support for GPU is added, we will
> > > need to mark the memory as normal sys-cached for GPU to use
> > > system cache.
> > > Without this, the system cache lines are not allocated for GPU.
> > > We use
> > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page
> > > protection
> > > flag as the flag cannot be exposed via DMA api because of no in-tree
> > > users.
> > >
> > > Signed-off-by: Sai Prakash Ranjan 
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > b/drivers/iommu/io-pgtable-arm.c
> > > index 7c9ea9d7874a..3fb7de8304a2 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > >  else if (prot & IOMMU_CACHE)
> > >  pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > >  << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > +else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
> > > +pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > +<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > >  }
> >
> While this approach of enabling system cache globally for both page
> tables and other buffers
> works for the GPU usecase, this isn't ideal for other clients that use
> system cache. For example,
> video clients only want to cache a subset of their buffers in the
> system cache, due to the sizing constraint
> imposed by how much of the system cache they can use. So, it would be
> ideal to have
> a way of expressing the desire to use the system cache on a per-buffer
> basis. Additionally,
> our video clients use the DMA layer, and since the requirement is for
> caching in the system cache
> to be a per buffer attribute, it seems like we would have to have a
> DMA attribute to express
> this on a per-buffer basis.
>

I did bring this up initially [1], also where is this video client
in upstream? AFAIK, only system cache user in upstream is GPU.
We cannot add any DMA attribute unless there is any user upstream
as per [2], so when the support for such a client is added, wouldn't
((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
work?


Hmm, I think this is another case where we need to separate out the
page-table walker attributes from the access attributes. Currently,
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
and I don't think it makes any sense for that to be per-buffer (how would
you even manage that?). However, if we want to extend this to data accesses
and we know that there are valid use-cases where this should be per-buffer,
then shoe-horning it in with the walker quirk does not feel like the best
thing to do.
Right, I agree that this seems like something that merits the same level of
separation that exists for the page table walker attributes with respect to
coherency, and data buffer attributes with respect to coherency (i.e. page
table walker coherency does not imply data buffer coherency -- that is
driven through IOMMU_CACHE).


As a starting point, we could:

  1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
  2. Add a new prot flag IOMMU_LLC
  3. Have the GPU pass the new prot for its buffer mappings

Does that work? One thing I'm not sure about is whether IOMMU_CACHE should
imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer
non-cacheable mappings for a coherent device. Have you ever seen that sort
of thing before?

Will

Yes, that should work, as that'll leave the door open for there to be a DMA
attribute that can be wired up to IOMMU_LLC.

I'm not aware of such a usecase, but I believe that a coherent device will
have its buffers cached in the system cache anyway, as well as the CPU
caches.

--Isaac
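A minimal illustration of the separation discussed above (the IOMMU_LLC
value and its exact placement are assumptions here, not an existing upstream
API; IO_PGTABLE_QUIRK_PTW_LLC is the rename Will proposes):

/* Hypothetical prot flag: data accesses may allocate in the system
 * (last-level) cache, without implying full coherency like IOMMU_CACHE. */
#define IOMMU_LLC	(1 << 6)

	/* In arm_lpae_prot_to_pte(), next to the existing IOMMU_CACHE case: */
	else if (prot & IOMMU_LLC)
		pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);

	/* ...while the renamed IO_PGTABLE_QUIRK_PTW_LLC would only drive the
	 * TCR (page-table walker) attributes, not the PTEs of data buffers. */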



Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-08 Thread isaacm

On 2021-01-07 21:47, Sai Prakash Ranjan wrote:

On 2021-01-07 22:27, isa...@codeaurora.org wrote:

On 2021-01-06 03:56, Will Deacon wrote:

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system cache.
Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via DMA api because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c

index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,

else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
}



While this approach of enabling system cache globally for both page
tables and other buffers
works for the GPU usecase, this isn't ideal for other clients that use
system cache. For example,
video clients only want to cache a subset of their buffers in the
system cache, due to the sizing constraint
imposed by how much of the system cache they can use. So, it would be
ideal to have
a way of expressing the desire to use the system cache on a per-buffer
basis. Additionally,
our video clients use the DMA layer, and since the requirement is for
caching in the system cache
to be a per buffer attribute, it seems like we would have to have a
DMA attribute to express
this on a per-buffer basis.



I did bring this up initially [1], also where is this video client
in upstream? AFAIK, only system cache user in upstream is GPU.
We cannot add any DMA attribute unless there is any user upstream

Right, there wouldn't be an upstream user, which would be problematic,
but I was thinking of having it so that when video or any of our other
clients that use this attribute on a per buffer basis upstreams their
code, it's not too much of a stretch to add the support.

as per [2], so when the support for such a client is added, wouldn't
((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
work?
I don't think that will work, because we currently have clients who use the
system cache as follows:

-cache only page tables in the system cache
-cache only data buffers in the system cache
-cache both page tables and all buffers in the system cache
-cache both page tables and some buffers in the system cache

The approach you're suggesting doesn't allow for the last case, as caching
the page tables in the system cache involves setting
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, so we will end up losing the flexibility
to cache some data buffers in the system cache.


Ideally, the page table quirk would drive the settings for the TCR, and the
prot flag would drive the PTE for the mapping, as is done with the page
table walker being dma-coherent, while buffers are mapped as cacheable
based on IOMMU_CACHE. Thoughts?

Thanks,
Isaac


[1]
https://lore.kernel.org/dri-devel/ecfda7ca80f6d7b4ff3d89b8758f4...@codeaurora.org/
[2] 
https://lore.kernel.org/linux-iommu/20191026053026.ga14...@lst.de/T/


Thanks,
Sai



Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-07 Thread isaacm

On 2021-01-06 03:56, Will Deacon wrote:

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system cache.
Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via DMA api because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c

index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,

else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
}


While this approach of enabling system cache globally for both page 
tables and other buffers
works for the GPU usecase, this isn't ideal for other clients that use 
system cache. For example,
video clients only want to cache a subset of their buffers in the system 
cache, due to the sizing constraint
imposed by how much of the system cache they can use. So, it would be 
ideal to have
a way of expressing the desire to use the system cache on a per-buffer 
basis. Additionally,
our video clients use the DMA layer, and since the requirement is for 
caching in the system cache
to be a per buffer attribute, it seems like we would have to have a DMA 
attribute to express

this on a per-buffer basis.

Thanks,
Isaac

drivers/iommu/io-pgtable.c currently documents this quirk as applying only
to the page-table walker. Given that we only have one user at the moment,
I think it's ok to change that, but please update the comment.

We also need to decide on whether we want to allow the quirk to be passed
if the coherency of the page-table walker differs from the DMA device, since
we have these combinations:

   Coherent walker?   IOMMU_CACHE   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
0:        N                 0                      0
1:        N                 0                      1
2:        N                 1                      0
3:        N                 1                      1
4:        Y                 0                      0
5:        Y                 0                      1
6:        Y                 1                      0
7:        Y                 1                      1

Some of them are obviously bogus, such as (7), but I don't know what to
do about cases such as (3) and (5).

Will



Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration

2020-12-23 Thread isaacm

On 2020-12-23 05:44, Robin Murphy wrote:

On 2020-12-22 19:54, isa...@codeaurora.org wrote:

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The io-pgtable code constructs an array of init functions for each
page table format at compile time. This is not ideal, as this
increases the footprint of the io-pgtable code, as well as prevents
io-pgtable formats from being built as kernel modules.

In preparation for modularizing the io-pgtable formats, switch to a
dynamic registration scheme, where each io-pgtable format can register
their init functions with the io-pgtable code at boot or module
insertion time.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 34 +-
  drivers/iommu/io-pgtable-arm.c | 90 
++--
  drivers/iommu/io-pgtable.c | 94 
--

  include/linux/io-pgtable.h | 51 +
  4 files changed, 209 insertions(+), 60 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c

index 1d92ac9..89aad2f 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,

  return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+    .fmt    = ARM_V7S,
  .alloc    = arm_v7s_alloc_pgtable,
  .free    = arm_v7s_free_pgtable,
  };
@@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void)
  pr_info("self test ok\n");
  return 0;
  }
-subsys_initcall(arm_v7s_do_selftests);
+#else
+static int arm_v7s_do_selftests(void)
+{
+    return 0;
+}
  #endif
+
+static int __init arm_v7s_init(void)
+{
+    int ret;
+
+    ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns);
+    if (ret < 0) {
+    pr_err("Failed to register ARM V7S format\n");


Super-nit: I think "v7s" should probably be lowercase there. Also
general consistency WRT to showing the error code and whether or not
to abbreviate "format" would be nice.


Ok, I can fix this accordingly.
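i.e., something along these lines (lowercase "v7s" and the error code shown;
the exact wording is illustrative):

	ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns);
	if (ret < 0) {
		pr_err("Failed to register ARM v7s format: %d\n", ret);
		return ret;
	}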


+    return ret;
+    }
+
+    ret = arm_v7s_do_selftests();
+    if (ret < 0)
+    io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+
+    return ret;
+}
+core_initcall(arm_v7s_init);
+
+static void __exit arm_v7s_exit(void)
+{
+    io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+}
+module_exit(arm_v7s_exit);
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c

index 87def58..ff0ea2f 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct 
io_pgtable_cfg *cfg, void *cookie)

  return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
-    .alloc    = arm_64_lpae_alloc_pgtable_s1,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
-    .alloc    = arm_64_lpae_alloc_pgtable_s2,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
-    .alloc    = arm_32_lpae_alloc_pgtable_s1,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
-    .alloc    = arm_32_lpae_alloc_pgtable_s2,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
-    .alloc    = arm_mali_lpae_alloc_pgtable,
-    .free    = arm_lpae_free_pgtable,
+static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = 
{

+    {
+    .fmt    = ARM_32_LPAE_S1,
+    .alloc    = arm_32_lpae_alloc_pgtable_s1,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_32_LPAE_S2,
+    .alloc    = arm_32_lpae_alloc_pgtable_s2,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_64_LPAE_S1,
+    .alloc    = arm_64_lpae_alloc_pgtable_s1,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_64_LPAE_S2,
+    .alloc    = arm_64_lpae_alloc_pgtable_s2,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_MALI_LPAE,
+    .alloc    = arm_mali_lpae_alloc_pgtable,
+    .free    = arm_lpae_free_pgtable,
+    },
  };
    #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
@@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void)
  pr_info("selftest: completed with %d PASS %d FAIL\n", pass, 
fail);

  return fail ? -EFAULT : 0;
  }
-subsys_initcall(arm_lpae_do_selftests);
+#else
+static int __init arm_lpae_do_selftests(void)
+{
+  

Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules

2020-12-23 Thread isaacm

On 2020-12-23 05:05, Robin Murphy wrote:

On 2020-12-22 19:49, isa...@codeaurora.org wrote:

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The SMMU driver depends on the availability of the ARM LPAE and
ARM V7S io-pgtable format code to work properly. In preparation


Nit: we don't really depend on v7s - we *can* use it if it's
available, address constraints are suitable, and the SMMU
implementation actually supports it (many don't), but we can still
quite happily not use it even so. LPAE is mandatory in the
architecture so that's our only hard requirement, embodied in the
kconfig select.

This does mean there may technically still be a corner case involving
ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime
failure rather than a build error, so unless and until anyone
demonstrates that it actually matters I don't feel particularly
inclined to give it much thought.

Robin.


Okay, I'll fix up the commit message, as well as the code, so that it
only depends on io-pgtable-arm.


Well, IIUC it would make sense to keep the softdep for when the v7s
module *is* present; I just wanted to clarify that it's more of a
nice-to-have rather than a necessity.

Robin.

Understood, I will keep it there and reword the commit msg. I just tried
it out in an environment where the io-pgtable-arm-v7s module isn't present,
and I didn't see any warnings or error messages, and the SMMU driver module
was loaded properly, so yes, it's good to have it.

Thanks,
Isaac


for having the io-pgtable formats as modules, add a "pre"
dependency with MODULE_SOFTDEP() to ensure that the io-pgtable
format modules are loaded before loading the ARM SMMU driver module.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d8c6bfd..a72649f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");

  MODULE_AUTHOR("Will Deacon ");
  MODULE_ALIAS("platform:arm-smmu");
  MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");




Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration

2020-12-22 Thread isaacm

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The io-pgtable code constructs an array of init functions for each
page table format at compile time. This is not ideal, as this
increases the footprint of the io-pgtable code, as well as prevents
io-pgtable formats from being built as kernel modules.

In preparation for modularizing the io-pgtable formats, switch to a
dynamic registration scheme, where each io-pgtable format can register
their init functions with the io-pgtable code at boot or module
insertion time.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 34 +-
  drivers/iommu/io-pgtable-arm.c | 90 ++--
  drivers/iommu/io-pgtable.c | 94 --
  include/linux/io-pgtable.h | 51 +
  4 files changed, 209 insertions(+), 60 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c

index 1d92ac9..89aad2f 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -835,7 +836,8 @@ static struct io_pgtable 
*arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,

return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+   .fmt= ARM_V7S,
.alloc  = arm_v7s_alloc_pgtable,
.free   = arm_v7s_free_pgtable,
  };
@@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void)
pr_info("self test ok\n");
return 0;
  }
-subsys_initcall(arm_v7s_do_selftests);
+#else
+static int arm_v7s_do_selftests(void)
+{
+   return 0;
+}
  #endif
+
+static int __init arm_v7s_init(void)
+{
+   int ret;
+
+   ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns);
+   if (ret < 0) {
+   pr_err("Failed to register ARM V7S format\n");


Super-nit: I think "v7s" should probably be lowercase there. Also,
general consistency WRT showing the error code and whether or not
to abbreviate "format" would be nice.


Ok, I can fix this accordingly.


+   return ret;
+   }
+
+   ret = arm_v7s_do_selftests();
+   if (ret < 0)
+   io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+
+   return ret;
+}
+core_initcall(arm_v7s_init);
+
+static void __exit arm_v7s_exit(void)
+{
+   io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+}
+module_exit(arm_v7s_exit);
diff --git a/drivers/iommu/io-pgtable-arm.c 
b/drivers/iommu/io-pgtable-arm.c

index 87def58..ff0ea2f 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct 
io_pgtable_cfg *cfg, void *cookie)

return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
-   .alloc  = arm_64_lpae_alloc_pgtable_s1,
-   .free   = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
-   .alloc  = arm_64_lpae_alloc_pgtable_s2,
-   .free   = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
-   .alloc  = arm_32_lpae_alloc_pgtable_s1,
-   .free   = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
-   .alloc  = arm_32_lpae_alloc_pgtable_s2,
-   .free   = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
-   .alloc  = arm_mali_lpae_alloc_pgtable,
-   .free   = arm_lpae_free_pgtable,
+static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = {
+   {
+   .fmt= ARM_32_LPAE_S1,
+   .alloc  = arm_32_lpae_alloc_pgtable_s1,
+   .free   = arm_lpae_free_pgtable,
+   },
+   {
+   .fmt= ARM_32_LPAE_S2,
+   .alloc  = arm_32_lpae_alloc_pgtable_s2,
+   .free   = arm_lpae_free_pgtable,
+   },
+   {
+   .fmt= ARM_64_LPAE_S1,
+   .alloc  = arm_64_lpae_alloc_pgtable_s1,
+   .free   = arm_lpae_free_pgtable,
+   },
+   {
+   .fmt= ARM_64_LPAE_S2,
+   .alloc  = arm_64_lpae_alloc_pgtable_s2,
+   .free   = arm_lpae_free_pgtable,
+   },
+   {
+   .fmt= ARM_MALI_LPAE,
+   .alloc  = arm_mali_lpae_alloc_pgtable,
+   .free   = arm_lpae_free_pgtable,
+   },
  };
#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
@@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void)
pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail);
return fail ? -EFAULT : 0;
  }
-

Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules

2020-12-22 Thread isaacm

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The SMMU driver depends on the availability of the ARM LPAE and
ARM V7S io-pgtable format code to work properly. In preparation


Nit: we don't really depend on v7s - we *can* use it if it's
available, address constraints are suitable, and the SMMU
implementation actually supports it (many don't), but we can still
quite happily not use it even so. LPAE is mandatory in the
architecture so that's our only hard requirement, embodied in the
kconfig select.

This does mean there may technically still be a corner case involving
ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime
failure rather than a build error, so unless and until anyone
demonstrates that it actually matters I don't feel particularly
inclined to give it much thought.

Robin.


Okay, I'll fix up the commit message, as well as the code, so that it
only depends on io-pgtable-arm.

Thanks,
Isaac

for having the io-pgtable formats as modules, add a "pre"
dependency with MODULE_SOFTDEP() to ensure that the io-pgtable
format modules are loaded before loading the ARM SMMU driver module.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d8c6bfd..a72649f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM 
architected SMMU implementations");

  MODULE_AUTHOR("Will Deacon ");
  MODULE_ALIAS("platform:arm-smmu");
  MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");





Re: [PATCH 1/3] iommu/io-pgtable-arm: Prepare for modularization

2020-12-21 Thread isaacm

On 2020-12-21 07:22, Robin Murphy wrote:

On 2020-12-18 18:59, isa...@codeaurora.org wrote:

On 2020-12-18 04:38, Robin Murphy wrote:

On 2020-12-18 08:38, Isaac J. Manjarres wrote:

The io-pgtable-arm and io-pgtable-arm-v7s source files will
be compiled as separate modules, along with the io-pgtable
source. Export the symbols for the io-pgtable init function
structures for the io-pgtable module to use.


In my current build tree, the io-pgtable glue itself is a whopping 379
bytes of code and data - is there really any benefit to all the
additional overhead of making that modular? Given the number of
different users (including AMD now), I think at this point we should
start considering this as part of the IOMMU core, and just tweak the
interface such that formats can register their own init_fns
dynamically instead of the static array that's always horrible.

Robin.

Thanks for the feedback, Robin. This is an avenue I had explored a bit
when modularizing the code. However, I came up with a few problems that
I couldn't get around.

1) If we leave the io-pgtable glue as part of the core kernel, we need to
ensure that the io-pgtable format modules get loaded prior to any driver
that might use them (e.g. IOMMU drivers/other callers of
alloc_io_pgtable_ops).

     a) This can get a bit messy, as there are no symbol dependencies
     between the callers of the io-pgtable code and the page table format
     modules, since everything is through function pointers. This is
     handled for the IOMMU drivers through the devlink feature, but I
     don't see how we can leverage something like that here. I guess this
     isn't too much of a problem when everything is built-in, as the
     registration can happen in one of the earlier initcall levels.

     b) If we do run into a scenario where a client of io-pgtable tries to
     allocate a page table instance prior to the io-pgtable format module
     being loaded, I couldn't come up with a way of distinguishing between
     "the format module is not available at the moment" and "the format
     module will never be available". I don't think returning EPROBE_DEFER
     would be something nice to do in that case.


Urgh, I see... yes, the current approach does work out as an
unexpectedly neat way to avoid many of the pitfalls. However I'm not
sure it actually avoids all of them - say you have a config like this:

IPMMU_VMSA=y
-> IO_PGTABLE_ARM_LPAE=y
   -> IO_PGTABLE=y
MTK_IOMMU=m
-> IO_PGTABLE_ARMV7S=m

won't that still fail to link io-pgtable.o?


Yes, you are correct, that would be problematic.

2) We would have to ensure that the format module cannot be unloaded
while other clients are using it. I suppose this isn't as big as point #1
though, since it's something that can probably be handled through a
similar ref count mechanism that we're using for modular IOMMU drivers.


FWIW I think that would come out in the wash from resolving 1b - I'd
assume there would have to be some sort of module_get() in there
somewhere. I should probably go and look at how the crypto API handles
its modular algorithms for more inspiration...
So I looked through the crypto dir, and it seems like they (along with a
few other kernel drivers) are using MODULE_SOFTDEP() to sort out these
dependencies.
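
FWIW, a rough sketch of the kind of pinning that a refcount-based scheme
would need, in case we go that way later. The ->owner field and both
helpers below are hypothetical (they are not in any of the posted
patches); the idea is just that each format would set .owner = THIS_MODULE
when registering:

/* Hypothetical sketch only: not part of the posted patches. */
#include <linux/io-pgtable.h>
#include <linux/module.h>

/* Pin the module providing a format before handing out its ops. */
static bool io_pgtable_get_format(struct io_pgtable_init_fns *fns)
{
	/* ->owner is an assumed new field, set to THIS_MODULE by each format */
	return try_module_get(fns->owner);
}

/* Drop the reference once the page table instance has been freed. */
static void io_pgtable_put_format(struct io_pgtable_init_fns *fns)
{
	module_put(fns->owner);
}

alloc_io_pgtable_ops() would call the first helper before invoking
fns->alloc(), and free_io_pgtable_ops() would call the second after
fns->free(), which is roughly how the crypto API keeps algorithm modules
alive while a transform exists.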


Given the two reasons above, I went with the current approach, since it
avoids both issues by creating symbol dependencies between client drivers,
the io-pgtable drivers, and the io-pgtable format drivers, which ensures
that they are loaded in the correct order, and also prevents them from
being removed unless there aren't any users present.


Having thought all that over, I'm now wondering what we really gain
from this either way - if vendors can build and ship SoC-tailored
configs, then they can already turn off formats they don't care about.
If the aim is to ship a single config everywhere, then you'll still
have to provision and load all possible formats on any system that
needs any one of them, thanks to those "convenient" symbol
dependencies. The promise in the cover letter doesn't seem to
materialise :/

Robin.

Given the feedback, this makes sense. I've come up with a second version
of the patches that leaves the io-pgtable code in the kernel, and allows
the formats to be modules, which better achieves what the cover-letter is
trying to express :) I believe that with the second patch, we should be
able to get to a place where the kernel just needs to provide io-pgtable,
while vendors can provide either io-pgtable-arm or io-pgtable-arm-v7s or
both, as needed.
Here are the patches: 
https://lore.kernel.org/linux-iommu/1608597876-32367-1-git-send-email-isa...@codeaurora.org/T/#t
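
The io-pgtable.c side of that registration isn't quoted in this thread, so
purely as a sketch of the shape I have in mind (the list and locking
details below are illustrative assumptions, not necessarily what the
posted patches do):

/* Illustrative only: one way the core could track dynamically registered formats. */
#include <linux/errno.h>
#include <linux/io-pgtable.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct io_pgtable_fmt_entry {
	struct list_head list;
	struct io_pgtable_init_fns *fns;
};

static LIST_HEAD(io_pgtable_formats);
static DEFINE_SPINLOCK(io_pgtable_formats_lock);

int io_pgtable_ops_register(struct io_pgtable_init_fns *fns)
{
	struct io_pgtable_fmt_entry *entry;

	if (!fns || !fns->alloc || !fns->free)
		return -EINVAL;

	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
	if (!entry)
		return -ENOMEM;

	entry->fns = fns;
	spin_lock(&io_pgtable_formats_lock);
	list_add_tail(&entry->list, &io_pgtable_formats);
	spin_unlock(&io_pgtable_formats_lock);
	return 0;
}
EXPORT_SYMBOL_GPL(io_pgtable_ops_register);

/* alloc_io_pgtable_ops() would use this to find the init_fns for a format. */
static struct io_pgtable_init_fns *io_pgtable_lookup_fns(enum io_pgtable_fmt fmt)
{
	struct io_pgtable_fmt_entry *entry;
	struct io_pgtable_init_fns *fns = NULL;

	spin_lock(&io_pgtable_formats_lock);
	list_for_each_entry(entry, &io_pgtable_formats, list) {
		if (entry->fns->fmt == fmt) {
			fns = entry->fns;
			break;
		}
	}
	spin_unlock(&io_pgtable_formats_lock);
	return fns;
}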


Thanks,
Isaac


Thanks,
Isaac

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 4 
  drivers/iommu/io-pgtable-arm.c | 8 
  2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c

ind

Re: [PATCH 1/3] iommu/io-pgtable-arm: Prepare for modularization

2020-12-18 Thread isaacm

On 2020-12-18 04:38, Robin Murphy wrote:

On 2020-12-18 08:38, Isaac J. Manjarres wrote:

The io-pgtable-arm and io-pgtable-arm-v7s source files will
be compiled as separate modules, along with the io-pgtable
source. Export the symbols for the io-pgtable init function
structures for the io-pgtable module to use.


In my current build tree, the io-pgtable glue itself is a whopping 379
bytes of code and data - is there really any benefit to all the
additional overhead of making that modular? Given the number of
different users (including AMD now), I think at this point we should
start considering this as part of the IOMMU core, and just tweak the
interface such that formats can register their own init_fns
dynamically instead of the static array that's always horrible.

Robin.

Thanks for the feedback, Robin. This is an avenue I had explored a bit 
when modularizing the code. However,

I came up with a few problems that I couldn't get around.

1) If we leave the io-pgtable glue as part of the core kernel, we need to
ensure that the io-pgtable format modules get loaded prior to any driver
that might use them (e.g. IOMMU drivers/other callers of
alloc_io_pgtable_ops).

     a) This can get a bit messy, as there are no symbol dependencies
     between the callers of the io-pgtable code and the page table format
     modules, since everything is through function pointers. This is
     handled for the IOMMU drivers through the devlink feature, but I
     don't see how we can leverage something like that here. I guess this
     isn't too much of a problem when everything is built-in, as the
     registration can happen in one of the earlier initcall levels.

     b) If we do run into a scenario where a client of io-pgtable tries to
     allocate a page table instance prior to the io-pgtable format module
     being loaded, I couldn't come up with a way of distinguishing between
     "the format module is not available at the moment" and "the format
     module will never be available". I don't think returning EPROBE_DEFER
     would be something nice to do in that case.


2) We would have to ensure that the format module cannot be unloaded
while other clients are using it. I suppose this isn't as big as point #1
though, since it's something that can probably be handled through a
similar ref count mechanism that we're using for modular IOMMU drivers.

Given the two reasons above, I went with the current approach, since it
avoids both issues by creating symbol dependencies between client drivers,
the io-pgtable drivers, and the io-pgtable format drivers, which ensures
that they are loaded in the correct order, and also prevents them from
being removed unless there aren't any users present.
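
For anyone following along, the symbol dependency mentioned above comes
from io-pgtable.c selecting formats through a static table of those
exported init_fns, roughly along these lines (paraphrased from
io-pgtable.c of that era, so treat it as a sketch rather than a verbatim
quote):

/* Sketch of the consumer side in drivers/iommu/io-pgtable.c. */
static const struct io_pgtable_init_fns *
io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE
	[ARM_32_LPAE_S1] = &io_pgtable_arm_32_lpae_s1_init_fns,
	[ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns,
	[ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
	[ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
	[ARM_MALI_LPAE] = &io_pgtable_arm_mali_lpae_init_fns,
#endif
#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
	[ARM_V7S] = &io_pgtable_arm_v7s_init_fns,
#endif
};

Once io-pgtable.o itself is built as a module, those
&io_pgtable_*_init_fns references are what force the format modules to be
loaded first and keep them loaded, which is the ordering and removal
behaviour described above.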


Thanks,
Isaac

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 4 
  drivers/iommu/io-pgtable-arm.c | 8 
  2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c

index 1d92ac9..f062c1c 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -839,6 +840,7 @@ struct io_pgtable_init_fns 
io_pgtable_arm_v7s_init_fns = {

.alloc  = arm_v7s_alloc_pgtable,
.free   = arm_v7s_free_pgtable,
  };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_v7s_init_fns);
#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
  @@ -984,3 +986,5 @@ static int __init arm_v7s_do_selftests(void)
  }
  subsys_initcall(arm_v7s_do_selftests);
  #endif
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/io-pgtable-arm.c 
b/drivers/iommu/io-pgtable-arm.c

index 87def58..2623d57 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1047,26 +1048,31 @@ struct io_pgtable_init_fns 
io_pgtable_arm_64_lpae_s1_init_fns = {

.alloc  = arm_64_lpae_alloc_pgtable_s1,
.free   = arm_lpae_free_pgtable,
  };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_64_lpae_s1_init_fns);
struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
.alloc  = arm_64_lpae_alloc_pgtable_s2,
.free   = arm_lpae_free_pgtable,
  };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_64_lpae_s2_init_fns);
struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
.alloc  = arm_32_lpae_alloc_pgtable_s1,
.free   = arm_lpae_free_pgtable,
  };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_32_lpae_s1_init_fns);
struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
.alloc  = arm_32_lpae_alloc_pgtable_s2,
.free   = arm_lpae_free_pgtable,
  };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_32_lpae_s2_init_fns);
struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
.alloc  = arm_mali_lpae_alloc_pgtable,
.free   = arm_lpae_free_pgtable,
  };
+EXPORT_SYMBOL_GPL

Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm

2020-02-19 Thread isaacm

On 2020-02-17 08:03, Robin Murphy wrote:

On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote:

From: Liam Mark 

Using the best-fit algorithm, instead of the first-fit
algorithm, may reduce fragmentation when allocating
IOVAs.


What kind of pathological allocation patterns make that a serious
problem? Is there any scope for simply changing the order of things in
the callers? Do these drivers also run under other DMA API backends
(e.g. 32-bit Arm)?

The use cases where the IOVA space has been fragmented have
non-deterministic allocation patterns, and thus, it's not feasible to
change the allocation order to avoid fragmenting the IOVA space.

From what we've observed, the use cases involve allocations of two types
of buffers: one type of buffer between 1 KB and 4 MB in size, and another
type of buffer between 1 KB and 400 MB in size.

The pathological scenarios seem to arise when there are many (100+)
randomly distributed non-power-of-two allocations, which in some cases
leave behind holes of up to 100+ MB in the IOVA space.

Here are some examples that show the state of the IOVA space under which
failure to allocate an IOVA was observed:

Instance 1:
Currently mapped total size : ~1.3GB
Free space available : ~2GB
Map for ~162MB fails.
Max contiguous space available : < 162MB

Instance 2:
Currently mapped total size : ~950MB
Free space available : ~2.3GB
Map for ~320MB fails.
Max contiguous space available : ~189MB

Instance 3:
Currently mapped total size : ~1.2GB
Free space available : ~2.7GB
Map for ~162MB fails.
Max contiguous space available : <162MB

We are still in the process of collecting data with the best-fit
algorithm enabled, to provide some numbers to show that it results in
less IOVA space fragmentation.
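
To make the difference concrete, here is a tiny stand-alone userspace toy
(it has nothing to do with the rbtree walk in the patch, and the gap sizes
are invented): first-fit takes the first hole that is large enough,
carving into the biggest hole, while best-fit takes the smallest hole that
still fits, leaving the large hole intact for a later large mapping like
the ~162MB ones above.

#include <stdio.h>

/* A free hole in the IOVA space; lengths in MB for readability. */
struct gap { unsigned long len; };

/* First-fit: the first hole big enough. */
static int first_fit(const struct gap *gaps, int n, unsigned long size)
{
	for (int i = 0; i < n; i++)
		if (gaps[i].len >= size)
			return i;
	return -1;
}

/* Best-fit: the smallest hole that still fits, preserving the big ones. */
static int best_fit(const struct gap *gaps, int n, unsigned long size)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (gaps[i].len < size)
			continue;
		if (best < 0 || gaps[i].len < gaps[best].len)
			best = i;
	}
	return best;
}

int main(void)
{
	/* Holes left behind by earlier, randomly sized allocations. */
	struct gap gaps[] = { { 8 }, { 200 }, { 12 }, { 64 } };
	int n = sizeof(gaps) / sizeof(gaps[0]);
	int ff = first_fit(gaps, n, 10);	/* a 10MB request */
	int bf = best_fit(gaps, n, 10);

	printf("first-fit -> gap %d (%luMB), best-fit -> gap %d (%luMB)\n",
	       ff, gaps[ff].len, bf, gaps[bf].len);
	return 0;
}

Here first-fit splits the 200MB hole for a 10MB request, whereas best-fit
uses the 12MB hole and keeps the 200MB region contiguous.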

To answer your question about whether this driver runs under other DMA
API backends: yes, such as 32-bit ARM.

More generally, if a driver knows enough to want to second-guess a
generic DMA API allocator, that's a reasonable argument that it should
perhaps be properly IOMMU-aware and managing its own address space
anyway. Perhaps this effort might be better spent finishing off the
DMA ops bypass stuff to make that approach more robust and welcoming.

Robin.


Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/dma-iommu.c | 17 +++
  drivers/iommu/iova.c  | 73 
+--

  include/linux/dma-iommu.h |  7 +
  include/linux/iova.h  |  1 +
  4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5..af08770 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct 
device *dev,

if (unlikely(ops->is_attach_deferred &&
ops->is_attach_deferred(domain, dev)))
return iommu_attach_device(domain, dev);
+   return 0;
+}
+
+/*
+ * Should be called prior to using dma-apis.
+ */
+int iommu_dma_enable_best_fit_algo(struct device *dev)
+{
+   struct iommu_domain *domain;
+   struct iova_domain *iovad;
+
+   domain = iommu_get_domain_for_dev(dev);
+   if (!domain || !domain->iova_cookie)
+   return -EINVAL;
  + iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+   iovad->best_fit = true;
return 0;
  }
+EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);
/**
   * dma_info_to_prot - Translate DMA API directions and attributes to 
IOMMU API

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..716b05f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct 
iova_domain *iovad,

iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+   iovad->best_fit = false;
init_iova_rcaches(iovad);
  }
  EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct 
iova_domain *iovad,

return -ENOMEM;
  }
+static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn,
+   struct iova *new, bool size_aligned)
+{
+   struct rb_node *curr, *prev;
+   struct iova *curr_iova, *prev_iova;
+   unsigned long flags;
+   unsigned long align_mask = ~0UL;
+   struct rb_node *candidate_rb_parent;
+   unsigned long new_pfn, candidate_pfn = ~0UL;
+   unsigned long gap, candidate_gap = ~0UL;
+
+   if (size_aligned)
+   align_mask <<= limit_align(iovad, fls_long(size - 1));
+
+   /* Walk the tree backwards */
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   curr = &iov

Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-19 Thread isaacm

On 2020-02-19 03:15, Will Deacon wrote:

On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote:

On 2020-02-17 07:50, Robin Murphy wrote:
> On 17/02/2020 8:01 am, Christoph Hellwig wrote:
> > On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:
> > > From: Liam Mark 
> > >
> > > Some devices have a memory map which contains gaps or holes.
> > > In order for the device to have as much IOVA space as possible,
> > > allow its driver to inform the DMA-IOMMU layer that it should
> > > not allocate addresses from these holes.
> >
> > Layering violation.  dma-iommu is the translation layer between the
> > DMA API and the IOMMU API.  And calls into it from drivers performing
> > DMA mappings need to go through the DMA API (and be documented there).
>
> +1
>
> More than that, though, we already have "holes in the address space"
> support for the sake of PCI host bridge windows - assuming this is the
> same kind of thing (i.e. the holes are between memory regions and
> other resources in PA space, so are only relevant once address
> translation comes into the picture), then this is IOMMU API level
To make sure that we're on the same page, this support alludes to the
handling in dma-iommu.c that reserves portions of the IOVA space for the
PCI host bridge windows, correct? If so, then yes, this is similar.
> stuff, so even a DMA API level interface would be inappropriate.
Does this mean that the driver should be managing the IOVA space and
mappings for this device using the IOMMU API? If so, is the rationale 
for
this because the device driver can have the information of what IOVA 
ranges
can and cannot be used? Shouldn't there be a generic way of informing 
an
IOMMU driver about these reserved ranges? Perhaps through a device 
tree

property, instead of deferring this type of management to the driver?


Before we dive into designing that, can you please clarify whether the
reserved IOVA range applies to all DMA masters mastering through a
particular SMMU, or whether it's just about one specific master? I was
assuming the former, but wanted to be sure.


This situation currently applies to one master.

Thanks,

Will


Thanks,
Isaac


Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-18 Thread isaacm

On 2020-02-17 07:50, Robin Murphy wrote:

On 17/02/2020 8:01 am, Christoph Hellwig wrote:

On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:

From: Liam Mark 

Some devices have a memory map which contains gaps or holes.
In order for the device to have as much IOVA space as possible,
allow its driver to inform the DMA-IOMMU layer that it should
not allocate addresses from these holes.


Layering violation.  dma-iommu is the translation layer between the
DMA API and the IOMMU API.  And calls into it from drivers performing
DMA mappings need to go through the DMA API (and be documented there).


+1

More than that, though, we already have "holes in the address space"
support for the sake of PCI host bridge windows - assuming this is the
same kind of thing (i.e. the holes are between memory regions and
other resources in PA space, so are only relevant once address
translation comes into the picture), then this is IOMMU API level
To make sure that we're on the same page, this support alludes to the
handling in dma-iommu.c that reserves portions of the IOVA space for the
PCI host bridge windows, correct? If so, then yes, this is similar.

stuff, so even a DMA API level interface would be inappropriate.
Does this mean that the driver should be managing the IOVA space and 
mappings for this device using the IOMMU API? If so, is the rationale 
for this because the device driver can have the information of what IOVA 
ranges can and cannot be used? Shouldn't there be a generic way of 
informing an IOMMU driver about these reserved ranges? Perhaps through a 
device tree property, instead of deferring this type of management to 
the driver?


Robin.
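
For completeness, the IOMMU-API-level mechanism being pointed at above is
the reserved-region interface; a hedged sketch of how an IOMMU driver
could describe such a hole so that dma-iommu keeps its IOVA allocator away
from it (the addresses are placeholders, and whether this knowledge should
come from the driver, firmware or DT is exactly the open question here):

#include <linux/dma-iommu.h>
#include <linux/iommu.h>
#include <linux/list.h>
#include <linux/sizes.h>

/* Sketch: advertise a hole in the master's address map as a reserved region. */
static void my_iommu_get_resv_regions(struct device *dev,
				      struct list_head *head)
{
	struct iommu_resv_region *region;

	/* Placeholder hole: [0x40000000, 0x40000000 + 256M) must not be used. */
	region = iommu_alloc_resv_region(0x40000000, SZ_256M, 0,
					 IOMMU_RESV_RESERVED);
	if (!region)
		return;
	list_add_tail(&region->list, head);

	/* Keep the generic handling (e.g. PCI host bridge windows) too. */
	iommu_dma_get_resv_regions(dev, head);
}

This would be wired up as the .get_resv_regions callback in the driver's
iommu_ops; dma-iommu then carves those regions out of the IOVA domain at
initialisation, which is the same path the PCI host bridge windows already
use.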



Re: [PATCH] iommu/dma: Add support for DMA_ATTR_SYS_CACHE

2019-10-26 Thread isaacm

On 2019-10-25 22:30, Christoph Hellwig wrote:

The definition makes very little sense.
Can you please clarify what part doesn’t make sense, and why? This is 
really just an extension of this patch that got mainlined, so that 
clients that use the DMA API can use IOMMU_QCOM_SYS_CACHE as well: 
https://patchwork.kernel.org/patch/10946099/

 And without a user in the same series it is a complete no-go anyway.
IOMMU_QCOM_SYS_CACHE does not have any current users in the mainline,
nor did it have any in the patch series in which it got merged, yet it is
still present? Furthermore, there are plans to upstream support for one
of our SoCs that may benefit from this, as seen here:
https://www.spinics.net/lists/iommu/msg39608.html.
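
For context, since the patch body isn't quoted here: the change under
discussion essentially boils down to translating a DMA attribute into the
existing IOMMU_QCOM_SYS_CACHE prot flag in dma-iommu's dma_info_to_prot().
A hedged sketch of that shape (DMA_ATTR_SYS_CACHE is the attribute
proposed by the patch, not something in mainline, and the exact placement
is my guess):

/* Sketch, as it would sit in drivers/iommu/dma-iommu.c. */
static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
			    unsigned long attrs)
{
	int prot = coherent ? IOMMU_CACHE : 0;

	if (attrs & DMA_ATTR_PRIVILEGED)
		prot |= IOMMU_PRIV;
	if (attrs & DMA_ATTR_SYS_CACHE)		/* proposed attribute */
		prot |= IOMMU_QCOM_SYS_CACHE;	/* request the outer/system cache */

	switch (dir) {
	case DMA_BIDIRECTIONAL:
		return prot | IOMMU_READ | IOMMU_WRITE;
	case DMA_TO_DEVICE:
		return prot | IOMMU_READ;
	case DMA_FROM_DEVICE:
		return prot | IOMMU_WRITE;
	default:
		return 0;
	}
}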


Thanks,
Isaac