Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-07 Thread isaacm

On 2021-04-07 02:57, Will Deacon wrote:

On Tue, Apr 06, 2021 at 02:02:26PM -0700, isa...@codeaurora.org wrote:

On 2021-04-06 05:15, Will Deacon wrote:
> On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:
> > Implement the unmap_pages() callback for the ARM LPAE io-pgtable
> > format.
> >
> > Signed-off-by: Isaac J. Manjarres 
> > Suggested-by: Will Deacon 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 124
> > +++--
> >  1 file changed, 104 insertions(+), 20 deletions(-)
>
> [...]
>
> > +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops,
> > unsigned long iova,
> > +size_t pgsize, size_t pgcount,
> > +struct iommu_iotlb_gather *gather)
> > +{
> > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > + struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > + arm_lpae_iopte *ptep = data->pgd;
> > + long iaext = (s64)iova >> cfg->ias;
> > + size_t unmapped = 0, unmapped_page;
> > + int last_lvl;
> > + size_t table_size, pages, tbl_offset, max_entries;
> > +
> > + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize ||
> > !pgcount))
> > + return 0;
> > +
> > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> > + iaext = ~iaext;
> > + if (WARN_ON(iaext))
> > + return 0;
> > +
> > + /*
> > +  * Calculating the page table size here helps avoid situations where
> > +  * a page range that is being unmapped may be mapped at the same
> > level
> > +  * but not mapped by the same tables. Allowing such a scenario to
> > +  * occur can complicate the logic in arm_lpae_split_blk_unmap().
> > +  */
> > + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
> > +
> > + if (last_lvl == data->start_level)
> > + table_size = ARM_LPAE_PGD_SIZE(data);
> > + else
> > + table_size = ARM_LPAE_GRANULE(data);
> > +
> > + max_entries = table_size / sizeof(*ptep);
> > +
> > + while (pgcount) {
> > + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
> > + pages = min_t(size_t, pgcount, max_entries - tbl_offset);
> > + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
> > +  pages, data->start_level,
> > +  ptep);
> > + if (!unmapped_page)
> > + break;
> > +
> > + unmapped += unmapped_page;
> > + iova += unmapped_page;
> > + pgcount -= pages;
> > + }
>
> Robin has some comments on the first version of this patch, and I don't
> think you addressed them:
>
> https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com
>
> I'm inclined to agree that iterating here doesn't make a lot of sense --
> we've already come back out of the page-table walk, so I think we should
> just return to the caller (who is well prepared to handle a partial
> unmap).
> Same for the map side of things.
>
> If we get numbers showing that this is causing a performance issue, then
> we should rework the page-table code to handle this at the lower level
> (because I doubt the loop that you have is really worse than returning to
> the caller anyway).
>
Sorry, I seem to have overlooked those comments.

I will go ahead and address them. I think it might be ideal to try to do
as much work as possible at the io-pgtable level, so as to minimize the
number of indirect calls incurred by jumping back and forth between the
IOMMU framework, the IOMMU driver, and the io-pgtable code.

Perhaps we can try something like how the linear mappings are created on
arm64, i.e. at the previous level, we can determine how many pages can be
unmapped in one page table in one iteration, and on subsequent iterations,
we can tackle another page table at the lower level. Looking at the code,
it doesn't seem too difficult to add this in. Thoughts?
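
For illustration, a rough sketch of the idea (not actual patch code): the
leaf case of __arm_lpae_unmap() could clamp the request to the entries that
remain in the current table and clear them together, e.g.:

	unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
	max_entries = ARM_LPAE_GRANULE(data) / sizeof(*ptep) - unmap_idx_start;
	num_entries = min_t(int, pgcount, max_entries);
	/* clear num_entries PTEs, sync them once, add one TLB gather per page */

The caller would then advance the IOVA and the remaining page count and
descend into the next table.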


I don't object to getting there eventually, but as an initial step I think
returning to the caller is perfectly reasonable and will probably make the
patches easier to review. In other words, implement the simple (correct)
thing first, and then have subsequent patches to improve it.

Will
Thanks for the feedback. I've implemented the simpler approach in v4 of the
patches:
https://lore.kernel.org/linux-iommu/20210408045241.27316-1-isa...@codeaurora.org/T/#t


Thanks,
Isaac


[RFC PATCH v4 15/15] iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

2021-04-07 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM SMMU driver
to allow calls from iommu_map to map multiple pages of
the same size in one call. Also, remove the map() callback
for the ARM SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 188e506d75e1..8fcc422e2f2f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1191,8 +1191,9 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 }
 
-static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int prot, gfp_t gfp, size_t *mapped)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1202,7 +1203,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return -ENODEV;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->map(ops, iova, paddr, size, prot, gfp);
+   ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, 
mapped);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1624,7 +1625,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_alloc   = arm_smmu_domain_alloc,
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
-   .map= arm_smmu_map,
+   .map_pages  = arm_smmu_map_pages,
.unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-04-07 Thread Isaac J. Manjarres
The PTE methods currently operate on a single entry. In preparation
for manipulating multiple PTEs in one map or unmap call, allow them
to handle multiple PTEs.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Robin Murphy 
---
 drivers/iommu/io-pgtable-arm.c | 78 +++---
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..ea66b10c04c4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -232,20 +232,23 @@ static void __arm_lpae_free_pages(void *pages, size_t 
size,
free_pages((unsigned long)pages, get_order(size));
 }
 
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
+static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
struct io_pgtable_cfg *cfg)
 {
dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
-  sizeof(*ptep), DMA_TO_DEVICE);
+  sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
 static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
-  struct io_pgtable_cfg *cfg)
+  int num_entries, struct io_pgtable_cfg *cfg)
 {
-   *ptep = pte;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte;
 
if (!cfg->coherent_walk)
-   __arm_lpae_sync_pte(ptep, cfg);
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -255,47 +258,54 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable 
*data,
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
-   int lvl, arm_lpae_iopte *ptep)
+   int lvl, int num_entries, arm_lpae_iopte *ptep)
 {
arm_lpae_iopte pte = prot;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
 
if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
pte |= ARM_LPAE_PTE_TYPE_PAGE;
else
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
-   pte |= paddr_to_iopte(paddr, data);
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
 
-   __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
+   if (!cfg->coherent_walk)
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 unsigned long iova, phys_addr_t paddr,
-arm_lpae_iopte prot, int lvl,
+arm_lpae_iopte prot, int lvl, int num_entries,
 arm_lpae_iopte *ptep)
 {
-   arm_lpae_iopte pte = *ptep;
-
-   if (iopte_leaf(pte, lvl, data->iop.fmt)) {
-   /* We require an unmap first */
-   WARN_ON(!selftest_running);
-   return -EEXIST;
-   } else if (iopte_type(pte) == ARM_LPAE_PTE_TYPE_TABLE) {
-   /*
-* We need to unmap and free the old table before
-* overwriting it with a block entry.
-*/
-   arm_lpae_iopte *tblp;
-   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova, sz, lvl, tblp) != sz) {
-   WARN_ON(1);
-   return -EINVAL;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+   /* We require an unmap first */
+   WARN_ON(!selftest_running);
+   return -EEXIST;
+   } else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
+   /*
+* We need to unmap and free the old table before
+* overwriting it with a block entry.
+*/
+   arm_lpae_iopte *tblp;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+lvl, tblp) != sz) {
+   WARN_ON(1);
+   return -EINVAL;
+   }
}
-   }
 
-   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
return 0;
 }
 
@@ -323,7 +333,7 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lp

[RFC PATCH v4 14/15] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback

2021-04-07 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM SMMU driver
to allow calls from iommu_unmap to unmap multiple pages of
the same size in one call. Also, remove the unmap() callback
for the SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..188e506d75e1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1208,8 +1208,9 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return ret;
 }
 
-static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
-size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long 
iova,
+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *iotlb_gather)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1219,7 +1220,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return 0;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->unmap(ops, iova, size, gather);
+   ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1624,7 +1625,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
-   .unmap  = arm_smmu_unmap,
+   .unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages()

2021-04-07 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm.c | 42 ++
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 6700685f81d4..834481d3c7f3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -341,20 +341,30 @@ static arm_lpae_iopte 
arm_lpae_install_table(arm_lpae_iopte *table,
 }
 
 static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
- phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
- int lvl, arm_lpae_iopte *ptep, gfp_t gfp)
+ phys_addr_t paddr, size_t size, size_t pgcount,
+ arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+ gfp_t gfp, size_t *mapped)
 {
arm_lpae_iopte *cptep, pte;
size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
size_t tblsz = ARM_LPAE_GRANULE(data);
struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   int ret = 0, num_entries, max_entries, map_idx_start;
 
/* Find our entry at the current level */
-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   ptep += map_idx_start;
 
/* If we can install a leaf entry at this level, then do so */
-   if (size == block_size)
-   return arm_lpae_init_pte(data, iova, paddr, prot, lvl, 1, ptep);
+   if (size == block_size) {
+   max_entries = (tblsz >> ilog2(sizeof(pte))) - map_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, 
num_entries, ptep);
+   if (!ret && mapped)
+   *mapped += num_entries * size;
+
+   return ret;
+   }
 
/* We can't allocate tables at the final level */
if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
@@ -383,7 +393,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
}
 
/* Rinse, repeat */
-   return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
+   return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+ cptep, gfp, mapped);
 }
 
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
@@ -450,8 +461,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
return pte;
 }
 
-static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int iommu_prot, gfp_t gfp, size_t *mapped)
 {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
struct io_pgtable_cfg *cfg = &data->iop.cfg;
@@ -460,7 +472,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
arm_lpae_iopte prot;
long iaext = (s64)iova >> cfg->ias;
 
-   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
return -EINVAL;
 
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
@@ -473,7 +485,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return 0;
 
prot = arm_lpae_prot_to_pte(data, iommu_prot);
-   ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep, gfp);
+   ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+ptep, gfp, NULL);
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -483,6 +496,14 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+
+static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+{
+   return arm_lpae_map_pages(ops, iova, paddr, size, 1, iommu_prot, gfp,
+ NULL);
+}
+
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
arm_lpae_iopte *ptep)
 {
@@ -779,6 +800,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_lpae_map,
+   .map_pages  = arm_lpae_map_pages,
.unmap  = arm_lpae_unmap,
.unmap_pages= arm_lpae_unmap_pages,
  

[RFC PATCH v4 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-04-07 Thread Isaac J. Manjarres
From: Will Deacon 

Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..bcd623862bf9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
   unsigned long addr_merge, size_t size)
 {
unsigned int pgsize_idx;
+   unsigned long pgsizes;
size_t pgsize;
 
-   /* Max page size that still fits into 'size' */
-   pgsize_idx = __fls(size);
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
 
-   /* need to consider alignment requirements ? */
-   if (likely(addr_merge)) {
-   /* Max page size allowed by address */
-   unsigned int align_pgsize_idx = __ffs(addr_merge);
-   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-   }
-
-   /* build a mask of acceptable page sizes */
-   pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-   /* throw away page sizes not supported by the hardware */
-   pgsize &= domain->pgsize_bitmap;
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
 
-   /* make sure we're still sane */
-   BUG_ON(!pgsize);
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
 
-   /* pick the biggest page */
-   pgsize_idx = __fls(pgsize);
-   pgsize = 1UL << pgsize_idx;
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
 
return pgsize;
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 13/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()

2021-04-07 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 5e203e03c352..0d49f0e8cf61 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -519,11 +519,12 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
 }
 
-static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
-   int ret;
+   int ret = -EINVAL;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
paddr >= (1ULL << data->iop.cfg.oas)))
@@ -533,7 +534,17 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
 
-   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
+   while (pgcount--) {
+   ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, 
data->pgd,
+   gfp);
+   if (ret)
+   break;
+
+   iova += pgsize;
+   paddr += pgsize;
+   if (mapped)
+   *mapped += pgsize;
+   }
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -543,6 +554,12 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+   return arm_v7s_map_pages(ops, iova, paddr, size, 1, prot, gfp, NULL);
+}
+
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 03/15] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-07 Thread Isaac J. Manjarres
Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 2ed0c057d9e7..019149b204b8 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -143,6 +143,7 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map:  Map a physically contiguous memory region.
+ * @map_pages:Map a physically contiguous range of pages of the same size.
  * @unmap:Unmap a physically contiguous memory region.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
@@ -153,6 +154,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-07 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index d4004bcf333a..5e203e03c352 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
 }
 
-static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long 
iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   size_t unmapped = 0, ret;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;
 
-   return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
+   while (pgcount--) {
+   ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+   if (!ret)
+   break;
+
+   unmapped += pgsize;
+   iova += pgsize;
+   }
+
+   return unmapped;
+}
+
+static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
+   size_t size, struct iommu_iotlb_gather *gather)
+{
+   return arm_v7s_unmap_pages(ops, iova, size, 1, gather);
 }
 
 static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-07 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 70 ++
 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index ea66b10c04c4..6700685f81d4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -253,8 +253,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, 
arm_lpae_iopte pte,
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep);
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep);
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
 lvl, tblp) != sz) {
WARN_ON(1);
return -EINVAL;
@@ -526,14 +526,14 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size,
   arm_lpae_iopte blk_pte, int lvl,
-  arm_lpae_iopte *ptep)
+  arm_lpae_iopte *ptep, size_t pgcount)
 {
struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
phys_addr_t blk_paddr;
size_t tablesz = ARM_LPAE_GRANULE(data);
size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-   int i, unmap_idx = -1;
+   int i, unmap_idx_start = -1, num_entries = 0, max_entries;
 
if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
@@ -542,15 +542,18 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
if (!tablep)
return 0; /* Bytes unmapped */
 
-   if (size == split_sz)
-   unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (size == split_sz) {
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   max_entries = (tablesz >> ilog2(sizeof(pte))) - unmap_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   }
 
blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);
 
for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
/* Unmap! */
-   if (i == unmap_idx)
+   if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
continue;
 
__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
@@ -568,38 +571,45 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
return 0;
 
tablep = iopte_deref(pte, data);
-   } else if (unmap_idx >= 0) {
-   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-   return size;
+   } else if (unmap_idx_start >= 0) {
+   for (i = 0; i < num_entries; i++)
+   io_pgtable_tlb_add_page(&data->iop, gather, iova + i * 
size, size);
+
+   return num_entries * size;
}
 
-   return __arm_lpae_unmap(data, gather, iova, size, lvl, tablep);
+   return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep)
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep)
 {
arm_lpae_iopte pte;
struct io_pgtable *iop = &data->iop;
+   size_t tblsz = ARM_LPAE_GRANULE(data);
+   int i, num_entries, max_entries, unmap_idx_start;
 
/* Something went horribly wrong and we ran out of page table */
if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
 
-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);

[RFC PATCH v4 07/15] iommu: Hook up '->unmap_pages' driver callback

2021-04-07 Thread Isaac J. Manjarres
From: Will Deacon 

Extend iommu_pgsize() to populate an optional 'count' parameter so that
we can direct unmapping operation to the ->unmap_pages callback if it
has been provided by the driver.
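
For example (illustrative numbers only, not from the patch): with 4 KB and
2 MB page sizes supported, iova = 0x1ff000 and size = 0x403000, the current
alignment only permits 4 KB pages, but the next 2 MB boundary is just
0x1000 bytes away; iommu_pgsize() would then return pgsize = 4 KB with
count = 1, so that the following iteration can proceed with 2 MB mappings.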

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 60 ---
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ab689611a03b..d5d551754556 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2358,11 +2358,11 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
 static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
-  phys_addr_t paddr, size_t size)
+  phys_addr_t paddr, size_t size, size_t *count)
 {
-   unsigned int pgsize_idx;
+   unsigned int pgsize_idx, pgsize_idx_next;
unsigned long pgsizes;
-   size_t pgsize;
+   size_t offset, pgsize, pgsize_next;
phys_addr_t addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
@@ -2378,7 +2378,37 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
/* Pick the biggest page size remaining */
pgsize_idx = __fls(pgsizes);
pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
 
+
+   /* Find the next biggest supported page size, if it exists */
+   pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
return pgsize;
 }
 
@@ -2416,7 +2446,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2467,6 +2497,19 @@ int iommu_map_atomic(struct iommu_domain *domain, 
unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
+static size_t __iommu_unmap_pages(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ struct iommu_iotlb_gather *iotlb_gather)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+
+   pgsize = iommu_pgsize(domain, iova, iova, size, &count);
+   return ops->unmap_pages ?
+  ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather) :
+  ops->unmap(domain, iova, pgsize, iotlb_gather);
+}
+
 static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
@@ -2476,7 +2519,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long orig_iova = iova;
unsigned int min_pagesz;
 
-   if (unlikely(ops->unmap == NULL ||
+   if (unlikely(!(ops->unmap || ops->unmap_pages) ||
 domain->pgsize_bitmap == 0UL))
return 0;
 
@@ -2504,10 +2547,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize;
-
-   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
-   unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
+   unmapped_page = __iommu_unmap_pages(domain, iova,
+   size - unmapped,
+   iotlb_gather);
if (!unmapped_page)
break;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


[RFC PATCH v4 08/15] iommu: Add support for the map_pages() callback

2021-04-07 Thread Isaac J. Manjarres
Since iommu_pgsize can calculate how many pages of the
same size can be mapped/unmapped before the next largest
page size boundary, add support for invoking an IOMMU
driver's map_pages() callback, if it provides one.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/iommu.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d5d551754556..df55761932fd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2412,6 +2412,30 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
return pgsize;
 }
 
+static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t size, int prot,
+gfp_t gfp, size_t *mapped)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+   int ret;
+
+   pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
+
+   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %ld\n",
+iova, &paddr, pgsize, count);
+
+   if (ops->map_pages) {
+   ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot,
+gfp, mapped);
+   } else {
+   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   *mapped = ret ? 0 : pgsize;
+   }
+
+   return ret;
+}
+
 static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
@@ -2422,7 +2446,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
phys_addr_t orig_paddr = paddr;
int ret = 0;
 
-   if (unlikely(ops->map == NULL ||
+   if (unlikely(!(ops->map || ops->map_pages) ||
 domain->pgsize_bitmap == 0UL))
return -ENODEV;
 
@@ -2446,18 +2470,21 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
+   size_t mapped = 0;
 
-   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
-iova, &paddr, pgsize);
-   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   ret = __iommu_map_pages(domain, iova, paddr, size, prot, gfp,
+   &mapped);
+   /*
+* Some pages may have been mapped, even if an error occurred,
+* so we should account for those so they can be unmapped.
+*/
+   size -= mapped;
 
if (ret)
break;
 
-   iova += pgsize;
-   paddr += pgsize;
-   size -= pgsize;
+   iova += mapped;
+   paddr += mapped;
}
 
/* unroll mapping in case something went wrong */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 04/15] iommu: Add a map_pages() op for IOMMU drivers

2021-04-07 Thread Isaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can
call into the io-pgtable code, to map a physically contiguous
range of pages of the same size.

For IOMMU drivers that do not specify a map_pages() callback,
the existing logic of mapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Acked-by: Lu Baolu 
---
 include/linux/iommu.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9cf81242581a..528d6a58479e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,8 @@ struct iommu_iotlb_gather {
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
+ * @map_pages: map a physically contiguous set of pages of the same size to
+ * an iommu domain.
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
@@ -244,6 +246,9 @@ struct iommu_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-07 Thread Isaac J. Manjarres
From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bcd623862bf9..ab689611a03b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   phys_addr_t addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
@@ -2415,7 +2416,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2503,8 +2504,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+   size_t pgsize;
 
+   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
if (!unmapped_page)
break;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 02/15] iommu: Add an unmap_pages() op for IOMMU drivers

2021-04-07 Thread Isaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.

For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
Acked-by: Lu Baolu 
---
 include/linux/iommu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..9cf81242581a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -193,6 +193,7 @@ struct iommu_iotlb_gather {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
@@ -245,6 +246,9 @@ struct iommu_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
+   size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
   size_t size);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 01/15] iommu/io-pgtable: Introduce unmap_pages() as a page table op

2021-04-07 Thread Isaac J. Manjarres
The io-pgtable code expects to operate on a single block or
granule of memory that is supported by the IOMMU hardware when
unmapping memory.

This means that when a large buffer that consists of multiple
such blocks is unmapped, the io-pgtable code will walk the page
tables to the correct level to unmap each block, even for blocks
that are virtually contiguous and at the same level, which can
incur an overhead in performance.

Introduce the unmap_pages() page table op to express to the
io-pgtable code that it should unmap a number of blocks of
the same size, instead of a single block. Doing so allows
multiple blocks to be unmapped in one call to the io-pgtable
code, reducing the number of page table walks, and indirect
calls.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..2ed0c057d9e7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -144,6 +144,7 @@ struct io_pgtable_cfg {
  *
  * @map:  Map a physically contiguous memory region.
  * @unmap:Unmap a physically contiguous memory region.
+ * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
  *
  * These functions map directly onto the iommu_ops member functions with
@@ -154,6 +155,9 @@ struct io_pgtable_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
+   size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
 };
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v4 00/15] Optimizing iommu_[map/unmap] performance

2021-04-07 Thread Isaac J. Manjarres
When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
the buffer at a granule of the largest page size that is supported by
the IOMMU hardware and fits within the buffer. For every block that
is unmapped, the IOMMU framework will call into the IOMMU driver, and
then the io-pgtable framework to walk the page tables to find the entry
that corresponds to the IOVA, and then unmaps the entry.

This can be suboptimal in scenarios where a buffer or a piece of a
buffer can be split into several contiguous page blocks of the same size.
For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
and 2 page table walks, to unmap 2 entries that are next to each other in
the page-tables, when both entries could have been unmapped in one shot
by clearing both page table entries in the same call.

The same optimization is applicable to mapping buffers as well, so
these patches implement a set of callbacks called unmap_pages and
map_pages to the io-pgtable code and IOMMU drivers which unmaps or maps
an IOVA range that consists of a number of pages of the same
page size that is supported by the IOMMU hardware, and allows for
manipulating multiple page table entries in the same set of indirect
calls. The reason for introducing these callbacks is to give other IOMMU
drivers/io-pgtable formats time to change to using the new callbacks, so
that the transition to using this approach can be done piecemeal.
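
To make the 4 MB example above concrete (an illustrative sketch only, using
the op signatures introduced by this series), the io-pgtable layer can now
be asked to clear or install both 2 MB entries in a single call:

	/* Unmap 2 x 2 MB blocks with one call into the io-pgtable code */
	unmapped = ops->unmap_pages(ops, iova, SZ_2M, 2, gather);

	/* Likewise for mapping the same range */
	mapped = 0;
	ret = ops->map_pages(ops, iova, paddr, SZ_2M, 2, prot, GFP_KERNEL,
			     &mapped);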

Changes since V3:

* Removed usage of ULL variants of bitops from Will's patches, as
  they were not needed.
* Instead of unmapping/mapping pgcount pages, unmap_pages() and
  map_pages() will at most unmap and map pgcount pages, allowing
  for part of the pages in pgcount to be mapped and unmapped. This
  was done to simplify the handling in the io-pgtable layer.
* Extended the existing PTE manipulation methods in io-pgtable-arm
  to handle multiple entries, per Robin's suggestion, eliminating
  the need to add functions to clear multiple PTEs.
* Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable
  format.
* arm_[v7s/lpae]_[map/unmap] will call
  arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page.
* The arm_smmu_[map/unmap] functions have been removed, since they
  have been replaced by arm_smmu_[map/unmap]_pages.

Changes since V2:

* Added a check in __iommu_map() to check for the existence
  of either the map or map_pages callback as per Lu's suggestion.

Changes since V1:

* Implemented the map_pages() callbacks
* Integrated Will's patches into this series which
  address several concerns about how iommu_pgsize() partitioned a
  buffer (I made a minor change to the patch which changes
  iommu_pgsize() to use bitmaps by using the ULL variants of
  the bitops)

Isaac J. Manjarres (12):
  iommu/io-pgtable: Introduce unmap_pages() as a page table op
  iommu: Add an unmap_pages() op for IOMMU drivers
  iommu/io-pgtable: Introduce map_pages() as a page table op
  iommu: Add a map_pages() op for IOMMU drivers
  iommu: Add support for the map_pages() callback
  iommu/io-pgtable-arm: Prepare PTE methods for handling multiple
entries
  iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
  iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
  iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
  iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

Will Deacon (3):
  iommu: Use bitmap to calculate page size in iommu_pgsize()
  iommu: Split 'addr_merge' argument to iommu_pgsize() into separate
parts
  iommu: Hook up '->unmap_pages' driver callback

 drivers/iommu/arm/arm-smmu/arm-smmu.c |  18 +--
 drivers/iommu/io-pgtable-arm-v7s.c|  48 ++-
 drivers/iommu/io-pgtable-arm.c| 184 +-
 drivers/iommu/iommu.c | 130 +-
 include/linux/io-pgtable.h|   8 ++
 include/linux/iommu.h |   9 ++
 6 files changed, 283 insertions(+), 114 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-07 Thread Lu Baolu

Hi Longpeng,

On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product 
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
Amit ; Alex Williamson ;
Kevin Tian ; Gonglei (Arei) ;
sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct

dmar_domain *domain,

 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {


The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is
insufficient. How about removing this check and always performing cache
invalidation?



Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
NOT present (e.g. create a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?


But this code covers multiple PTEs and perhaps crosses the page
boundary.

How about moving this code into a separate function and checking PTE
presence there? A sample implementation could look like the below
[compiled but not tested!]:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct 
dmar_domain *domain,

return level;
 }

+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+unsigned long start_pfn,
+unsigned long end_pfn, int level)
+{
+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {
+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+   iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+   pte = NULL;
+   }
+}
+
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,

return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
-   unsigned long nr_superpages, end_pfn;
+   unsigned long end_pfn;

pteval |= DMA_PTE_LARGE_PAGE;
-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   nr_superpages = nr_pages / lvl_pages;
-   end_pfn = iov_pfn + nr_superpages * 
lvl_pages - 1;

-
-   /*
-* Ensure that old small page tables are
-* removed to make room for superpage(s).
-* We're adding new large pages, so make sure

-

Re: [PATCH 5.4 v2 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-07 Thread Lu Baolu

On 4/8/21 2:40 AM, Saeed Mirzamohammadi wrote:

The IOMMU driver calculates the guest addressability for a DMA request
based on the value of the MGAW reported by the IOMMU. However, this is a
fused value and, as mentioned in the spec, the guest width should be
calculated based on the minimum of the supported adjusted guest address
width (SAGAW) and the MGAW.

This is from specification:
"Guest addressability for a given DMA request is limited to the
minimum of the value reported through this field and the adjusted
guest address width of the corresponding page-table structure.
(Adjusted guest address widths supported by hardware are reported
through the SAGAW field)."

This causes domain initialization to fail and following
errors appear for EHCI PCI driver:

[2.486393] ehci-pci :01:00.4: EHCI Host Controller
[2.486624] ehci-pci :01:00.4: new USB bus registered, assigned bus
number 1
[2.489127] ehci-pci :01:00.4: DMAR: Allocating domain failed
[2.489350] ehci-pci :01:00.4: DMAR: 32bit DMA uses non-identity
mapping
[2.489359] ehci-pci :01:00.4: can't setup: -12
[2.489531] ehci-pci :01:00.4: USB bus 1 deregistered
[2.490023] ehci-pci :01:00.4: init :01:00.4 fail, -12
[2.490358] ehci-pci: probe of :01:00.4 failed with error -12

This issue happens when the value of SAGAW corresponds to a 48-bit AGAW.
This fix updates the calculation of the AGAW based on the minimum of the
IOMMU's SAGAW-derived width and the MGAW.
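
For example (illustrative numbers only, not from the patch): if cap_mgaw()
reports 57 bits but the widest page-table structure advertised through SAGAW
is the 4-level (48-bit) one, the guest width must be clamped to
min(57, 48) = 48 bits, which is what the min_t() in the hunk below computes.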

Signed-off-by: Saeed Mirzamohammadi 
Tested-by: Camille Lu 
---

Change in v2:
- Added cap_width to calculate AGAW based on the minimum value of MGAW and AGAW.
---
  drivers/iommu/intel-iommu.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 953d86ca6d2b..a2a03df97704 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1853,7 +1853,7 @@ static inline int guestwidth_to_adjustwidth(int gaw)
  static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
   int guest_width)
  {
-   int adjust_width, agaw;
+   int adjust_width, agaw, cap_width;
unsigned long sagaw;
int err;
  
@@ -1867,8 +1867,9 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,

domain_reserve_special_ranges(domain);
  
  	/* calculate AGAW */

-   if (guest_width > cap_mgaw(iommu->cap))
-   guest_width = cap_mgaw(iommu->cap);
+   cap_width = min_t(int, cap_mgaw(iommu->cap), 
agaw_to_width(iommu->agaw));
+   if (guest_width > cap_width)
+   guest_width = cap_width;
domain->gaw = guest_width;
adjust_width = guestwidth_to_adjustwidth(guest_width);
agaw = width_to_agaw(adjust_width);



Reviewed-by: Lu Baolu 

Best regards,
baolu


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 7, 2021 8:21 PM
> 
> On Wed, Apr 07, 2021 at 02:08:33AM +, Tian, Kevin wrote:
> 
> > > Because if you don't then we enter insane world where a PASID is being
> > > created under /dev/ioasid but its translation path flows through setup
> > > done by VFIO and the whole user API becomes an incomprehensible
> mess.
> > >
> > > How will you even associate the PASID with the other translation??
> >
> > PASID is attached to a specific iommu domain (created by VFIO/VDPA),
> which
> > has GPA->HPA mappings already configured. If we view that mapping as an
> > attribute of the iommu domain, it's reasonable to have the userspace-
> bound
> > pgtable through /dev/ioasid to nest on it.
> 
> A user controlled page table should absolutely not be an attribute of
> a hidden kernel object, nor should two parts of the kernel silently
> connect to each other via hidden internal objects like this.
> 
> Security is important - the kind of connection must use some explicit
> FD authorization to access shared objects, not be made implicit!
> 
> IMHO this direction is a dead end for this reason.
> 

Could you elaborate on what exact security problem is introduced by this
approach? Isn't ALLOW_PASID the authorization interface for the
connection?

Based on all your replies, I now see that what you actually want is to
generalize all IOMMU-related stuff through /dev/ioasid (sort of /dev/iommu),
which requires factoring out vfio_iommu_type1 into the general part. This is
a huge amount of work.

Is it really the only accepted practice in Linux that any new feature has to
be blocked as long as some refactoring work is identified? Don't people accept
a balance between enabling new features and completing refactoring
work through a staged approach, as long as we don't introduce an uAPI
specifically for the staging purpose? ☹

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2021 at 08:43:50PM +0200, Jean-Philippe Brucker wrote:

> * Get a container handle out of /dev/ioasid (or /dev/iommu, really.)
>   No operation available since we don't know what the device and IOMMU
>   capabilities are.
>
> * Attach the handle to a VF. With VFIO that would be
>   VFIO_GROUP_SET_CONTAINER. That causes the kernel to associate an IOMMU
>   with the handle, and decide which operations are available.

Right, this is basically the point - the VFIO container (/dev/vfio)
and the /dev/ioasid we are talking about have a core of
similarity. ioasid is the generalized, modernized, and cross-subsystem
version of the same idea. Instead of calling it "vfio container" we
call it something that evokes the idea of controlling the iommu.

The issue is to separate /dev/vfio generic functionality from vfio and
share it with every subsystem.

It may be that /dev/vfio and /dev/ioasid end up sharing a lot of code,
with a different IOCTL interface around it. The vfio_iommu_driver_ops
is not particularly VFIOy.

Creating /dev/ioasid may primarily start as a code reorganization
exercise.

> * With a map/unmap vIOMMU (or shadow mappings), a single translation level
>   is supported. With a nesting vIOMMU, we're populating the level-2
>   translation (some day maybe by binding the KVM page tables, but
>   currently with map/unmap ioctl).
> 
>   Single-level translation needs single VF per container. 

Really? Why?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Jean-Philippe Brucker
On Wed, Apr 07, 2021 at 08:17:50AM +, Tian, Kevin wrote:
> btw this discussion was raised when discussing the I/O page fault handling
> process. Currently the IOMMU layer implements a per-device fault reporting
> mechanism, which requires VFIO to register a handler to receive all faults
> on its device and then forward them to ioasid if they are due to the 1st
> level. Possibly it makes more sense to convert it into a per-pgtable
> reporting scheme, and then the owner of each pgtable should register its
> own handler.

Maybe, but you do need device information in there, since that's how the
fault is reported to the guest and how the response is routed back to the
faulting device (only PASID+PRGI would cause aliasing). And we need to
report non-recoverable faults, as well as recoverable ones without PASID,
once we hand control of level-1 page tables to guests.

> It means for 1) VFIO will register a 2nd-level pgtable handler while
> /dev/ioasid will register a 1st-level pgtable handler, while for 3)
> /dev/ioasid will register handlers for both 1st-level and 2nd-level
> pgtables. Jean? I also want to know your thoughts...

Moving all IOMMU controls to /dev/ioasid rather than splitting them is
probably better. Hopefully the implementation can reuse most of
vfio_iommu_type1.

I'm trying to sketch what may work for Arm, if we have to reuse
/dev/ioasid to avoid duplication of fault and inval queues:

* Get a container handle out of /dev/ioasid (or /dev/iommu, really.)
  No operation available since we don't know what the device and IOMMU
  capabilities are.

* Attach the handle to a VF. With VFIO that would be
  VFIO_GROUP_SET_CONTAINER. That causes the kernel to associate an IOMMU
  with the handle, and decide which operations are available.

* With a map/unmap vIOMMU (or shadow mappings), a single translation level
  is supported. With a nesting vIOMMU, we're populating the level-2
  translation (some day maybe by binding the KVM page tables, but
  currently with map/unmap ioctl).

  Single-level translation needs single VF per container. Two level would
  allow sharing stage-2 between multiple VFs, though it's a pain to define
  and implement.

* Without a vIOMMU or if the vIOMMU starts in bypass, populate the
  container page tables.

Start the guest.

* With a map/unmap vIOMMU, guest creates mappings, userspace populates the
  page tables with map/unmap ioctl.

  It would be possible to add a PASID mode there: guest requests an
  address space with a specific PASID, userspace derives an IOASID handle
  from the container handle and populates that address space with map/unmap
  ioctl. That would enable PASID on sub-VF assignment, which requires the
  host to control which PASID is programmed into the VF (with
  DEVICE_ALLOW_IOASID, I guess). And either the host allocates the PASID
  in this case (which isn't supported by a vSMMU) or we have to do a
  vPASID -> pPASID. I don't know if it's worth the effort.

Or
* With a nesting vIOMMU, the guest attaches a PASID table to a VF,
  userspace issues a SET_PASID_TABLE ioctl on the container handle. If
  we support multiple VFs per container, we first need to derive a child
  container from the main one and the device, then attach the PASID table.

  Guest programs the PASID table, sends invalidations when removing
  mappings which are relayed to the host on the child container. Page
  faults and response queue would be per container, so if multiple VF per
  container, we could have one queue for the parent (level-2 faults) and
  one for each child (level-1 faults).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5.4 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-07 Thread Saeed Mirzamohammadi
Hi Lu,


On Apr 6, 2021, at 6:43 PM, Lu Baolu wrote:

Hi Saeed,

On 4/7/21 12:35 AM, Saeed Mirzamohammadi wrote:
The IOMMU driver calculates the guest addressability for a DMA request
based on the value of the mgaw reported from the IOMMU. However, this
is a fused value and as mentioned in the spec, the guest width
should be calculated based on the supported adjusted guest address width
(SAGAW).
This is from specification:
"Guest addressability for a given DMA request is limited to the
minimum of the value reported through this field and the adjusted
guest address width of the corresponding page-table structure.
(Adjusted guest address widths supported by hardware are reported
through the SAGAW field)."
This causes domain initialization to fail and the following
errors appear for the EHCI PCI driver:
[2.486393] ehci-pci :01:00.4: EHCI Host Controller
[2.486624] ehci-pci :01:00.4: new USB bus registered, assigned bus
number 1
[2.489127] ehci-pci :01:00.4: DMAR: Allocating domain failed
[2.489350] ehci-pci :01:00.4: DMAR: 32bit DMA uses non-identity
mapping
[2.489359] ehci-pci :01:00.4: can't setup: -12
[2.489531] ehci-pci :01:00.4: USB bus 1 deregistered
[2.490023] ehci-pci :01:00.4: init :01:00.4 fail, -12
[2.490358] ehci-pci: probe of :01:00.4 failed with error -12
This issue happens when the value of the sagaw corresponds to a
48-bit agaw. This fix updates the calculation of the agaw based on
the IOMMU's sagaw value.
Signed-off-by: Saeed Mirzamohammadi 
Tested-by: Camille Lu 
---
 drivers/iommu/intel-iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 953d86ca6d2b..396e14fad54b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1867,8 +1867,8 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
  domain_reserve_special_ranges(domain);
/* calculate AGAW */
- if (guest_width > cap_mgaw(iommu->cap))
- guest_width = cap_mgaw(iommu->cap);
+ if (guest_width > agaw_to_width(iommu->agaw))
+ guest_width = agaw_to_width(iommu->agaw);

The spec requires using the minimum of MGAW and AGAW, so why not,

cap_width = min_t(int, cap_mgaw(iommu->cap), agaw_to_width(iommu->agaw));
if (guest_width > cap_width)
guest_width = cap_width;

Yes, this works better. I just sent the v2.

Thanks,
Saeed


Best regards,
baolu

  domain->gaw = guest_width;
  adjust_width = guestwidth_to_adjustwidth(guest_width);
  agaw = width_to_agaw(adjust_width);

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 5.4 v2 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-07 Thread Saeed Mirzamohammadi
The IOMMU driver calculates the guest addressability for a DMA request
based on the value of the mgaw reported from the IOMMU. However, this
is a fused value and as mentioned in the spec, the guest width
should be calculated based on the minimum of supported adjusted guest
address width (SAGAW) and MGAW.

This is from specification:
"Guest addressability for a given DMA request is limited to the
minimum of the value reported through this field and the adjusted
guest address width of the corresponding page-table structure.
(Adjusted guest address widths supported by hardware are reported
through the SAGAW field)."

This causes domain initialization to fail and the following
errors appear for the EHCI PCI driver:

[2.486393] ehci-pci :01:00.4: EHCI Host Controller
[2.486624] ehci-pci :01:00.4: new USB bus registered, assigned bus
number 1
[2.489127] ehci-pci :01:00.4: DMAR: Allocating domain failed
[2.489350] ehci-pci :01:00.4: DMAR: 32bit DMA uses non-identity
mapping
[2.489359] ehci-pci :01:00.4: can't setup: -12
[2.489531] ehci-pci :01:00.4: USB bus 1 deregistered
[2.490023] ehci-pci :01:00.4: init :01:00.4 fail, -12
[2.490358] ehci-pci: probe of :01:00.4 failed with error -12

This issue happens when the value of the sagaw corresponds to a
48-bit agaw. This fix updates the calculation of the agaw based on
the minimum of the IOMMU's sagaw value and MGAW.

Signed-off-by: Saeed Mirzamohammadi 
Tested-by: Camille Lu 
---

Change in v2:
- Added cap_width to calculate AGAW based on the minimum value of MGAW and AGAW.
---
 drivers/iommu/intel-iommu.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 953d86ca6d2b..a2a03df97704 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1853,7 +1853,7 @@ static inline int guestwidth_to_adjustwidth(int gaw)
 static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
   int guest_width)
 {
-   int adjust_width, agaw;
+   int adjust_width, agaw, cap_width;
unsigned long sagaw;
int err;
 
@@ -1867,8 +1867,9 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
domain_reserve_special_ranges(domain);
 
/* calculate AGAW */
-   if (guest_width > cap_mgaw(iommu->cap))
-   guest_width = cap_mgaw(iommu->cap);
+   cap_width = min_t(int, cap_mgaw(iommu->cap), agaw_to_width(iommu->agaw));
+   if (guest_width > cap_width)
+   guest_width = cap_width;
domain->gaw = guest_width;
adjust_width = guestwidth_to_adjustwidth(guest_width);
agaw = width_to_agaw(adjust_width);
-- 
2.27.0
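
For reference, a sketch of the width/AGAW arithmetic this fix relies on, using
the VT-d driver's 9-bits-per-level layout (simplified: the real helpers also
clamp to the maximum supported width; treat this as illustrative, not the
driver's exact code):

#define LEVEL_STRIDE	9	/* address bits resolved per page-table level */

/* Simplified counterparts of the driver's helpers, for illustration only. */
static int width_to_agaw(int width)		/* e.g. 48-bit width -> agaw 2 */
{
	return (width - 30 + LEVEL_STRIDE - 1) / LEVEL_STRIDE;
}

static int agaw_to_width(int agaw)		/* e.g. agaw 2 -> 48-bit width */
{
	return 30 + agaw * LEVEL_STRIDE;
}

/*
 * The fix boils down to: never let a domain claim more addressability than
 * both MGAW and what the page-table structure can actually cover (SAGAW),
 * i.e. min(cap_mgaw(iommu->cap), agaw_to_width(iommu->agaw)).
 */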

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-07 Thread Nadav Amit



> On Apr 7, 2021, at 3:01 AM, Joerg Roedel  wrote:
> 
> On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
>> From: Nadav Amit 
>> 
>> Currently, IOMMU invalidations and device-IOTLB invalidations using
>> AMD IOMMU fall back to full address-space invalidation if more than a
>> single page needs to be flushed.
>> 
>> Full flushes are especially inefficient when the IOMMU is virtualized by
>> a hypervisor, since it requires the hypervisor to synchronize the entire
>> address-space.
>> 
>> AMD IOMMUs allow providing a mask to perform page-specific
>> invalidations for multiple pages that match the address. The mask is
>> encoded as part of the address, and the first zero bit in the address
>> (in bits [51:12]) indicates the mask size.
>> 
>> Use this hardware feature to perform selective IOMMU and IOTLB flushes.
>> Combine the logic between both for better code reuse.
>> 
>> The IOMMU invalidations passed a smoke-test. The device IOTLB
>> invalidations are untested.
> 
> Have you thoroughly tested this on real hardware? I had a patch-set
> doing the same many years ago and it lead to data corruption under load.
> Back then it could have been a bug in my code of course, but it made me
> cautious about using targeted invalidations.

I tested it on real bare-metal hardware. I ran some basic I/O workloads
with the IOMMU enabled, checkers enabled/disabled, and so on.

However, I only tested the IOMMU-flushes and I did not test that the
device-IOTLB flush work, since I did not have the hardware for that.

If you can refer me to the old patches, I will have a look and see
whether I can see a difference in the logic or test them. If you want
me to run different tests - let me know. If you want me to remove
the device-IOTLB invalidations logic - that is also fine with me.
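
For reference, a minimal sketch of the address encoding described in the
quoted commit message, assuming a naturally aligned range of (1 << order)
4 KiB pages (the real patch additionally has to cope with ranges that are not
aligned powers of two). The helper name is illustrative, not the driver's:

#include <stdint.h>

/*
 * Encode a naturally aligned range of (1 << order) 4 KiB pages for a
 * page-specific invalidation: set all address bits from bit 12 up to
 * bit (12 + order - 2) and leave bit (12 + order - 1) clear, so the first
 * zero bit above bit 11 tells the IOMMU how much to invalidate.
 */
static uint64_t encode_inv_pages_address(uint64_t base, unsigned int order)
{
	uint64_t addr = base & ~0xfffULL;	/* 4 KiB-aligned base address */

	if (order == 0)
		return addr;			/* single page: S bit stays clear */

	addr |= (1ULL << (12 + order - 1)) - 1;	/* set the low mask bits...      */
	addr &= ~0xfffULL;			/* ...but keep bits [11:0] clear */
	return addr;				/* caller also sets the S flag   */
}

With order = 2 and a 16 KiB-aligned base, for example, bit 12 ends up set and
bit 13 clear, which decodes as a 16 KiB (four page) invalidation.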

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2021 at 02:08:33AM +, Tian, Kevin wrote:

> > Because if you don't then we enter insane world where a PASID is being
> > created under /dev/ioasid but its translation path flows through setup
> > done by VFIO and the whole user API becomes an incomprehensible mess.
> > 
> > How will you even associate the PASID with the other translation??
> 
> PASID is attached to a specific iommu domain (created by VFIO/VDPA), which
> has GPA->HPA mappings already configured. If we view that mapping as an
> attribute of the iommu domain, it's reasonable to have the userspace-bound
> pgtable through /dev/ioasid to nest on it.

A user controlled page table should absolutely not be an attribute of
a hidden kernel object, nor should two parts of the kernel silently
connect to each other via hidden internal objects like this.

Security is important - the kind of connection must use some explicit
FD authorization to access shared objects, not be made implicit!

IMHO this direction is a dead end for this reason.

> > The entire translation path for any ioasid or PASID should be defined
> > only by /dev/ioasid. Everything else is a legacy API.
> > 
> > > If following your suggestion then VFIO must deny VFIO MAP operations
> > > on sva1 (assume userspace should not mix sva1 and sva2 in the same
> > > container and instead use /dev/ioasid to map for sva1)?
> > 
> > No, userspace creates an iosaid for the guest physical mapping and
> > passes this ioasid to VFIO PCI which will assign it as the first layer
> > mapping on the RID
> 
> Is it a dummy ioasid just for providing GPA mappings for the nesting purpose
> of other IOASIDs? Then we waste one per VM?

Generic ioasids are "free" - they are just software constructs in the
kernel.

> > When PASIDs are allocated the uAPI will be told to logically nested
> > under the first ioasid. When VFIO authorizes a PASID for a RID it
> > checks that all the HW rules are being followed.
> 
> As I explained above, why cannot we just use iommu domain to connect 
> the dots? 

Security.

> Every passthrough framework needs to create an iommu domain
> first, and it needs to support both devices w/ PASID and devices w/o
> PASID.  For devices w/o PASID it needs to invent its own MAP
> interface anyway.  

No, it should consume a ioasid from /dev/ioasid, use a common ioasid
map interface and assign that ioasid to a RID.

Don't get so fixated on PASID as a special case

> Then why do we bother creating another MAP interface through
> /dev/ioasid which not only duplicates but also creating transition
> burden between two set of MAP interfaces when the guest turns on/off
> the pasid capability on the device?

Don't transition. Always use the new interface. qemu detects the
kernel supports /dev/ioasid and *all iommu page table configuration*
goes through there. VFIO and VDPA APIs become unused for iommu
configuration.

> 'universally' upon from which angle you look at this problem. From IOASID
> p.o.v possibly yes, but from device passthrough p.o.v. it's the opposite
> since the passthrough framework needs to handle devices w/o PASID anyway
> (or even for device w/ PASID it could send traffic w/o PASID) thus 
> 'universally'
> makes more sense if the passthrough framework can use one interface of its
> own to manage GPA mappings for all consumers (apply to the case when a
> PASID is allowed/authorized).

You correctly named it /dev/ioasid, it is a generic way to allocate,
manage and assign IOMMU page tables, which when generalized, only some
of which may consume a limited PASID.

RID and RID,PASID are the same thing, just a small difference in how
they match TLPs.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2021 at 08:17:50AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Tuesday, April 6, 2021 8:43 PM
> > 
> > On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:
> > 
> > > > VFIO and VDPA have no business having map/unmap interfaces once we
> > have
> > > > /dev/ioasid. That all belongs in the iosaid side.
> > > >
> > > > I know they have those interfaces today, but that doesn't mean we have
> > > > to keep using them for PASID use cases, they should be replaced with a
> > > > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > > > 'here is a pasid from /dev/ioasid, go ahead and configure it yourself'
> > > > interface
> > >
> > > So it looks like the PASID was bound to SVA in this design. I think it's
> > > not necessarily the case:
> > 
> > No, I wish people would stop talking about SVA.
> > 
> > SVA and vSVA are a very special narrow configuration of a PASID. There
> > are lots of other PASID configurations! That is the whole point, a
> > PASID is complicated, there are many configuration scenarios, they
> > need to be in one place with a very clearly defined uAPI
> > 
> 
> I feel it also makes sense to allow a subsystem to specify which 
> configurations
> are permitted when allowing a PASID on its device

huh? why?

> e.g. excluding things like
> GPA mappings that existing subsystems (VFIO/VDPA) already handle well:

They don't "handle well", they have some historical baggage that is no
longer suitable for the complexity this area has in the modern world.

Forget about the existing APIs and replace them in /dev/ioasid.

> - Share GPA mappings between multiple devices (w/ or w/o PASID) for 
> better IOTLB efficiency;
>
> - Share GPA mappings between transactions w/ PASID and transactions w/o
> PASID from the same device (e.g. GPU) for better IOTLB efficiency;
> 
> - Use the same page table for GPA mappings before and after the guest 
> turns on/off the PASID capability;

All of these are cases you need to design the /dev/ioasid to handle.

It is pretty clear to me that you'll need non-PASID IOASID's as
well.

Ideally a generic IOASID would just be a page table and it doesn't
crystalize into a RID or RID,PASID routing until devices are attached
to it.

Since IOASID can be nested the only thing that makes any sense is for
each level of the nest to be visible under /dev/ioasid. 

What a complete mess it would be if vfio-pci owns the GPA table,
/dev/ioasid has a nested PASID, and vfio-mdev is running a mdev on top
of that PASID.

> All above are given as long as we continue to let VFIO/VDPA manage the
> iommu domain and associated GPA mappings for PASID.

So don't do that. Don't I keep saying this weird split is making a
horrible mess?

You can't reasonably build the complex PASID scenarios you talk about
above unless the entire translation path is owned by one entity:
/dev/ioasid.

You need to focus on figuring out what that looks like then figure out
how to move VDPA and VFIO to consume /dev/ioasid for all of their
translation instead of open-coding half-baked internal versions.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device

2021-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2021 at 09:58:05AM +0800, Lu Baolu wrote:

> I've ever tried to implement a bus iommu_ops for mdev devices.
> 
> https://lore.kernel.org/lkml/20201030045809.957927-1-baolu...@linux.intel.com/
> 
> Any comments?

You still have the symbol_get, so something continues to be wrong with
that series:

+   mdev_bus = symbol_get(mdev_bus_type);
+   if (mdev_bus) {
+   if (bus == mdev_bus && !iommu_present(bus)) {
+   symbol_put(mdev_bus_type);

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command

2021-04-07 Thread Will Deacon
On Wed, 7 Apr 2021 16:44:48 +0800, Zenghui Yu wrote:
> Per the SMMUv3 spec, there are no Size and Addr fields in the PREFETCH_CONFIG
> command and they're not used by the driver. Remove them.
> 
> We can add them back if we're going to use PREFETCH_ADDR in the future.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command
  https://git.kernel.org/will/c/e0bb4b735404

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] iommu: io-pgtable: add DART pagetable format

2021-04-07 Thread Will Deacon
On Sun, Mar 28, 2021 at 09:40:07AM +0200, Sven Peter wrote:
> Apple's DART iommu uses a pagetable format that shares some
> similarities with the ones already implemented by io-pgtable.c.
> Add a new format variant to support the required differences
> so that we don't have to duplicate the pagetable handling code.
> 
> Signed-off-by: Sven Peter 
> ---
>  drivers/iommu/io-pgtable-arm.c | 59 ++
>  drivers/iommu/io-pgtable.c |  1 +
>  include/linux/io-pgtable.h |  6 
>  3 files changed, 66 insertions(+)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 87def58e79b5..2f63443fd115 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -127,6 +127,9 @@
>  #define ARM_MALI_LPAE_MEMATTR_IMP_DEF0x88ULL
>  #define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
>  
> +#define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)
> +#define APPLE_DART_PTE_PROT_NO_READ (1<<8)
> +
>  /* IOPTE accessors */
>  #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
>  
> @@ -381,6 +384,15 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
> arm_lpae_io_pgtable *data,
>  {
>   arm_lpae_iopte pte;
>  
> + if (data->iop.fmt == ARM_APPLE_DART) {
> + pte = 0;
> + if (!(prot & IOMMU_WRITE))
> + pte |= APPLE_DART_PTE_PROT_NO_WRITE;
> + if (!(prot & IOMMU_READ))
> + pte |= APPLE_DART_PTE_PROT_NO_READ;
> + return pte;
> + }
> +
>   if (data->iop.fmt == ARM_64_LPAE_S1 ||
>   data->iop.fmt == ARM_32_LPAE_S1) {
>   pte = ARM_LPAE_PTE_nG;
> @@ -1043,6 +1055,48 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg 
> *cfg, void *cookie)
>   return NULL;
>  }
>  
> +static struct io_pgtable *
> +apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +{
> + struct arm_lpae_io_pgtable *data;
> +
> + if (cfg->ias > 36)
> + return NULL;
> + if (cfg->oas > 36)
> + return NULL;
> +
> + if (!cfg->coherent_walk)
> + return NULL;

This all feels like IOMMU-specific limitations leaking into the page-table
code here; it doesn't feel so unlikely that future implementations of this
IP might have greater addressing capabilities, for example, and so I don't
see why the page-table code needs to police this.

> + cfg->pgsize_bitmap &= SZ_16K;
> + if (!cfg->pgsize_bitmap)
> + return NULL;

This is worrying (and again, I don't think this belongs here). How is this
thing supposed to work if the CPU is using 4k pages?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/3] iommu: dart: Add DART iommu driver

2021-04-07 Thread Will Deacon
On Sun, Mar 28, 2021 at 09:40:09AM +0200, Sven Peter wrote:
> Apple's new SoCs use iommus for almost all peripherals. These Device
> Address Resolution Tables must be set up before these peripherals can
> act as DMA masters.
> 
> Signed-off-by: Sven Peter 
> ---
>  MAINTAINERS  |   1 +
>  drivers/iommu/Kconfig|  14 +
>  drivers/iommu/Makefile   |   1 +
>  drivers/iommu/apple-dart-iommu.c | 858 +++
>  4 files changed, 874 insertions(+)
>  create mode 100644 drivers/iommu/apple-dart-iommu.c

[...]

> +/* must be called with held domain->lock */
> +static int apple_dart_attach_stream(struct apple_dart_domain *domain,
> + struct apple_dart *dart, u32 sid)
> +{
> + unsigned long flags;
> + struct apple_dart_stream *stream;
> + struct io_pgtable_cfg *pgtbl_cfg;
> + int ret;
> +
> + list_for_each_entry(stream, &domain->streams, stream_head) {
> + if (stream->dart == dart && stream->sid == sid) {
> + stream->num_devices++;
> + return 0;
> + }
> + }
> +
> + spin_lock_irqsave(&dart->lock, flags);
> +
> + if (WARN_ON(dart->used_sids & BIT(sid))) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> + if (!stream) {
> + ret = -ENOMEM;
> + goto error;
> + }

Just in case you missed it, a cocci bot noticed that you're using GFP_KERNEL
to allocate while holding a spinlock here:

https://lore.kernel.org/r/alpine.DEB.2.22.394.2104041724340.2958@hadrien

Will
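
One common way to resolve this class of report, sketched against the names in
the quoted hunk (illustrative only, not necessarily the fix the driver will
end up with), is to do the sleeping allocation before taking dart->lock and
free it on the paths that turn out not to need it:

	stream = kzalloc(sizeof(*stream), GFP_KERNEL);	/* no lock held yet */
	if (!stream)
		return -ENOMEM;

	spin_lock_irqsave(&dart->lock, flags);
	if (WARN_ON(dart->used_sids & BIT(sid))) {
		spin_unlock_irqrestore(&dart->lock, flags);
		kfree(stream);				/* not needed after all */
		return -EINVAL;
	}
	/* ... initialise 'stream' and publish it while holding the lock ... */
	spin_unlock_irqrestore(&dart->lock, flags);

The alternative is allocating with GFP_ATOMIC under the lock, at the cost of
dipping into the atomic reserves.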
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Revert "iommu/amd: Fix performance counter initialization"

2021-04-07 Thread Joerg Roedel
Hi Paul,

On Thu, Mar 18, 2021 at 10:20:16AM +0100, Paul Menzel wrote:
> Jörg, I know you are probably busy, but the patch was applied to the stable
> series (v5.11.7). There are still too many questions open regarding the
> patch, and Suravee has not yet addressed the comments. It’d be great if you
> could revert it.

We are currently discussing the next steps here. Maybe the retry logic
can be removed entirely.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-07 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
> From: Nadav Amit 
> 
> Currently, IOMMU invalidations and device-IOTLB invalidations using
> AMD IOMMU fall back to full address-space invalidation if more than a
> single page needs to be flushed.
> 
> Full flushes are especially inefficient when the IOMMU is virtualized by
> a hypervisor, since it requires the hypervisor to synchronize the entire
> address-space.
> 
> AMD IOMMUs allow providing a mask to perform page-specific
> invalidations for multiple pages that match the address. The mask is
> encoded as part of the address, and the first zero bit in the address
> (in bits [51:12]) indicates the mask size.
> 
> Use this hardware feature to perform selective IOMMU and IOTLB flushes.
> Combine the logic between both for better code reuse.
> 
> The IOMMU invalidations passed a smoke-test. The device IOTLB
> invalidations are untested.

Have you thoroughly tested this on real hardware? I had a patch-set
doing the same many years ago and it led to data corruption under load.
Back then it could have been a bug in my code of course, but it made me
cautious about using targeted invalidations.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-07 Thread Will Deacon
On Tue, Apr 06, 2021 at 02:02:26PM -0700, isa...@codeaurora.org wrote:
> On 2021-04-06 05:15, Will Deacon wrote:
> > On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:
> > > Implement the unmap_pages() callback for the ARM LPAE io-pgtable
> > > format.
> > > 
> > > Signed-off-by: Isaac J. Manjarres 
> > > Suggested-by: Will Deacon 
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 124
> > > +++--
> > >  1 file changed, 104 insertions(+), 20 deletions(-)
> > 
> > [...]
> > 
> > > +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops,
> > > unsigned long iova,
> > > +size_t pgsize, size_t pgcount,
> > > +struct iommu_iotlb_gather *gather)
> > > +{
> > > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > > + struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > > + arm_lpae_iopte *ptep = data->pgd;
> > > + long iaext = (s64)iova >> cfg->ias;
> > > + size_t unmapped = 0, unmapped_page;
> > > + int last_lvl;
> > > + size_t table_size, pages, tbl_offset, max_entries;
> > > +
> > > + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize ||
> > > !pgcount))
> > > + return 0;
> > > +
> > > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> > > + iaext = ~iaext;
> > > + if (WARN_ON(iaext))
> > > + return 0;
> > > +
> > > + /*
> > > +  * Calculating the page table size here helps avoid situations where
> > > +  * a page range that is being unmapped may be mapped at the same
> > > level
> > > +  * but not mapped by the same tables. Allowing such a scenario to
> > > +  * occur can complicate the logic in arm_lpae_split_blk_unmap().
> > > +  */
> > > + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
> > > +
> > > + if (last_lvl == data->start_level)
> > > + table_size = ARM_LPAE_PGD_SIZE(data);
> > > + else
> > > + table_size = ARM_LPAE_GRANULE(data);
> > > +
> > > + max_entries = table_size / sizeof(*ptep);
> > > +
> > > + while (pgcount) {
> > > + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
> > > + pages = min_t(size_t, pgcount, max_entries - tbl_offset);
> > > + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
> > > +  pages, data->start_level,
> > > +  ptep);
> > > + if (!unmapped_page)
> > > + break;
> > > +
> > > + unmapped += unmapped_page;
> > > + iova += unmapped_page;
> > > + pgcount -= pages;
> > > + }
> > 
> > Robin has some comments on the first version of this patch, and I
> > don't think you
> > addressed them:
> > 
> > https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com
> > 
> > I'm inclined to agree that iterating here doesn't make a lot of sense --
> > we've already come back out of the page-table walk, so I think we should
> > just return to the caller (who is well prepared to handle a partial
> > unmap).
> > Same for the map side of things.
> > 
> > If we get numbers showing that this is causing a performance issue, then
> > we should rework the page-table code to handle this at the lower level
> > (because I doubt the loop that you have is really worse than returning
> > to
> > the caller anyway).
> > 
> Sorry, I seem to have overlooked those comments.
> 
> I will go ahead and address them. I think it might be ideal to try to do
> as much work as possible at the io-pgtable level, so as to minimize the
> number of indirect calls incurred by jumping back and forth between the
> iommu framework, the iommu driver, and the io-pgtable code.
> 
> Perhaps we can try something like how the linear mappings are created on
> arm64, i.e. on the previous level, we can determine how many pages can be
> unmapped in one page table in one iteration, and on the subsequent
> iterations, we can tackle another page table at the lower level. Looking at
> the code, it doesn't seem too difficult to add this in. Thoughts?

I don't object to getting there eventually, but as an initial step I think
returning back to the caller is perfectly reasonable and will probably make
the patches easier to review. In other words, implement the simple (correct)
thing first, and then have subsequent patches to improve it.

Will
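
For context, the caller-side handling referred to above is essentially a loop
that stops as soon as the page-table code reports a partial result; a
simplified sketch of that pattern (signatures approximate the core code of
this era, not an exact copy):

	size_t unmapped = 0;

	while (unmapped < size) {
		/* Pick the largest supported page size for this iova/length. */
		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
		size_t ret = ops->unmap(domain, iova, pgsize, iotlb_gather);

		if (!ret)
			break;	/* partial unmap: stop and report what was done */

		iova += ret;
		unmapped += ret;
	}
	/* 'unmapped' may be smaller than 'size'; callers are prepared for that. */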
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 03/12] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-07 Thread Will Deacon
On Tue, Apr 06, 2021 at 02:07:41PM -0700, isa...@codeaurora.org wrote:
> On 2021-04-06 04:57, Will Deacon wrote:
> > On Mon, Apr 05, 2021 at 12:11:03PM -0700, Isaac J. Manjarres wrote:
> > > Mapping memory into io-pgtables follows the same semantics
> > > that unmapping memory used to follow (i.e. a buffer will be
> > > mapped one page block per call to the io-pgtable code). This
> > > means that it can be optimized in the same way that unmapping
> > > memory was, so add a map_pages() callback to the io-pgtable
> > > ops structure, so that a range of pages of the same size
> > > can be mapped within the same call.
> > > 
> > > Signed-off-by: Isaac J. Manjarres 
> > > Suggested-by: Will Deacon 
> > > ---
> > >  include/linux/io-pgtable.h | 4 
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> > > index 2ed0c057d9e7..019149b204b8 100644
> > > --- a/include/linux/io-pgtable.h
> > > +++ b/include/linux/io-pgtable.h
> > > @@ -143,6 +143,7 @@ struct io_pgtable_cfg {
> > >   * struct io_pgtable_ops - Page table manipulation API for IOMMU
> > > drivers.
> > >   *
> > >   * @map:  Map a physically contiguous memory region.
> > > + * @map_pages:Map a physically contiguous range of pages of the
> > > same size.
> > >   * @unmap:Unmap a physically contiguous memory region.
> > >   * @unmap_pages:  Unmap a range of virtually contiguous pages of
> > > the same size.
> > >   * @iova_to_phys: Translate iova to physical address.
> > > @@ -153,6 +154,9 @@ struct io_pgtable_cfg {
> > >  struct io_pgtable_ops {
> > >   int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
> > >  phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> > > + int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
> > > +  phys_addr_t paddr, size_t pgsize, size_t pgcount,
> > > +  int prot, gfp_t gfp, size_t *mapped);
> > 
> > How about returning 'size_t' and using IS_ERR_VALUE() instead of adding
> > the extra 'mapped' argument (i.e. return the size of the region mapped
> > or an error code)? I don't think we realistically need to care about map
> > sizes that overlap with the error region.
> > 
> I'd given that a shot before, but the problem that I kept running into was
> that in case of an error, if I return an error code, I don't know how much
> memory was mapped, so that I can invoke iommu_unmap from __iommu_map with
> that size to undo the partial mappings from a map_pages() call.

Ah yes, sorry, I see it now. So keep this as you've got it. Pushing the
cleanup path deeper doesn't feel like the right thing to do.

Will
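
A minimal sketch of the error path Isaac describes, i.e. why the out-parameter
is useful to the caller (the surrounding variable names are illustrative):

	size_t mapped = 0;
	int ret;

	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, &mapped);
	if (ret) {
		/*
		 * The call failed part-way through: 'mapped' says how many bytes
		 * did make it into the page tables, so the caller can unwind
		 * exactly that much and nothing more.
		 */
		if (mapped)
			iommu_unmap(domain, orig_iova, mapped);
		return ret;
	}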
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-04-07 Thread Zenghui Yu

Hi Eric,

On 2021/2/24 5:06, Eric Auger wrote:

+/*
+ * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
+ * struct vfio_iommu_type1_set_pasid_table)
+ *
+ * The SET operation passes a PASID table to the host while the
+ * UNSET operation detaches the one currently programmed. Setting
+ * a table while another is already programmed replaces the old table.


It looks to me that this description doesn't match the IOMMU part.

[v14,05/13] iommu/smmuv3: Implement attach/detach_pasid_table

|   case IOMMU_PASID_CONFIG_TRANSLATE:
|   /* we do not support S1 <-> S1 transitions */
|   if (smmu_domain->s1_cfg.set)
|   goto out;

Maybe I've misread something?


Thanks,
Zenghui
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: AMD IOMMU cleanups and dead code removal

2021-04-07 Thread Joerg Roedel
On Fri, Apr 02, 2021 at 04:33:08PM +0200, Christoph Hellwig wrote:
> Hi,
> 
> this series cleans up a few random bits in the AMD IOMMU driver.
> 
> Diffstat:
>  arch/x86/events/amd/iommu.c|1 
>  arch/x86/events/amd/iommu.h|   19 --
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c |4 -
>  drivers/iommu/amd/amd_iommu.h  |2 
>  drivers/iommu/amd/amd_iommu_types.h|1 
>  drivers/iommu/amd/init.c   |5 -
>  drivers/iommu/amd/iommu.c  |   90 
> +
>  include/linux/amd-iommu.h  |   30 ---
>  8 files changed, 16 insertions(+), 136 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver v3

2021-04-07 Thread Joerg Roedel
On Thu, Apr 01, 2021 at 05:52:36PM +0200, Christoph Hellwig wrote:
> Diffstat:
>  arch/powerpc/include/asm/fsl_pamu_stash.h   |   12 
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |5 
>  drivers/iommu/amd/iommu.c   |   23 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   75 ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |1 
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  111 +---
>  drivers/iommu/arm/arm-smmu/arm-smmu.h   |2 
>  drivers/iommu/dma-iommu.c   |9 
>  drivers/iommu/fsl_pamu.c|  293 ---
>  drivers/iommu/fsl_pamu.h|   12 
>  drivers/iommu/fsl_pamu_domain.c |  688 
> ++--
>  drivers/iommu/fsl_pamu_domain.h |   46 -
>  drivers/iommu/intel/iommu.c |   95 ---
>  drivers/iommu/iommu.c   |  118 +---
>  drivers/soc/fsl/qbman/qman_portal.c |   55 --
>  drivers/vfio/vfio_iommu_type1.c |   31 -
>  drivers/vhost/vdpa.c|   10 
>  include/linux/io-pgtable.h  |4 
>  include/linux/iommu.h   |   76 ---
>  19 files changed, 203 insertions(+), 1463 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 00/10] iommu: I/O page faults for SMMUv3

2021-04-07 Thread Joerg Roedel
On Thu, Apr 01, 2021 at 05:47:09PM +0200, Jean-Philippe Brucker wrote:
> Jean-Philippe Brucker (10):
>   iommu: Fix comment for struct iommu_fwspec
>   iommu/arm-smmu-v3: Use device properties for pasid-num-bits
>   iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
>   iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF
>   uacce: Enable IOMMU_DEV_FEAT_IOPF
>   iommu: Add a page fault handler
>   iommu/arm-smmu-v3: Maintain a SID->device structure

Applied the first 7 patches, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command

2021-04-07 Thread Zenghui Yu
Per the SMMUv3 spec, there are no Size and Addr fields in the PREFETCH_CONFIG
command and they're not used by the driver. Remove them.

We can add them back if we're going to use PREFETCH_ADDR in the future.

Signed-off-by: Zenghui Yu 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 --
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..610c9f4b7789 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -245,8 +245,6 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct 
arm_smmu_cmdq_ent *ent)
break;
case CMDQ_OP_PREFETCH_CFG:
cmd[0] |= FIELD_PREP(CMDQ_PREFETCH_0_SID, ent->prefetch.sid);
-   cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
-   cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
break;
case CMDQ_OP_CFGI_CD:
cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a..83726b6a1cba 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -410,8 +410,6 @@ struct arm_smmu_cmdq_ent {
#define CMDQ_OP_PREFETCH_CFG0x1
struct {
u32 sid;
-   u8  size;
-   u64 addr;
} prefetch;
 
#define CMDQ_OP_CFGI_STE0x3
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: sprd: Fix parameter type warning

2021-04-07 Thread Joerg Roedel
On Wed, Mar 31, 2021 at 11:16:45AM +0800, Chunyan Zhang wrote:
>  drivers/iommu/sprd-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: Report right snoop capability when using FL for IOVA

2021-04-07 Thread Joerg Roedel
On Tue, Mar 30, 2021 at 10:11:45AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.h |  1 +
>  drivers/iommu/intel/iommu.c | 11 ++-
>  drivers/iommu/intel/pasid.c | 16 
>  3 files changed, 27 insertions(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] iommu/mediatek-v1: Allow building as module

2021-04-07 Thread Joerg Roedel
On Fri, Mar 26, 2021 at 11:23:36AM +0800, Yong Wu wrote:
> This patch only adds support for building the IOMMU-v1 driver as module.
> Correspondingly switch the config to tristate and update the iommu_ops's
> owner to THIS_MODULE.
> 
> Signed-off-by: Yong Wu 

Applied both, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/4] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-07 Thread Joerg Roedel
On Thu, Mar 25, 2021 at 08:29:57PM +0800, John Garry wrote:
> John Garry (4):
>   iova: Add CPU hotplug handler to flush rcaches
>   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
>   iommu: Delete iommu_dma_free_cpu_cached_iovas()
>   iommu: Stop exporting free_iova_fast()
> 
>  drivers/iommu/dma-iommu.c   |  9 -
>  drivers/iommu/intel/iommu.c | 31 ---
>  drivers/iommu/iova.c| 34 +++---
>  include/linux/cpuhotplug.h  |  2 +-
>  include/linux/dma-iommu.h   |  8 
>  include/linux/iova.h|  6 +-
>  6 files changed, 33 insertions(+), 57 deletions(-)

Okay, newer version. Applied this one instead, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Fix a boundary issue to avoid performance drop

2021-04-07 Thread Joerg Roedel
On Thu, Mar 25, 2021 at 09:43:45AM +, Will Deacon wrote:
> Urgh, I must say I much preferred these things being exclusive, but this
> looks like a necessary fix:
> 
> Acked-by: Will Deacon 

Applied for v5.12.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-07 Thread Tian, Kevin
> From: Jason Gunthorpe
> Sent: Tuesday, April 6, 2021 8:43 PM
> 
> On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:
> 
> > > VFIO and VDPA have no business having map/unmap interfaces once we
> have
> > > /dev/ioasid. That all belongs in the iosaid side.
> > >
> > > I know they have those interfaces today, but that doesn't mean we have
> > > to keep using them for PASID use cases, they should be replaced with a
> > > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > > 'here is a pasid from /dev/ioasid, go ahead and configure it yourself'
> > > interface
> >
> > So it looks like the PASID was bound to SVA in this design. I think it's not
> > necessarily the case:
> 
> No, I wish people would stop talking about SVA.
> 
> SVA and vSVA are a very special narrow configuration of a PASID. There
> are lots of other PASID configurations! That is the whole point, a
> PASID is complicated, there are many configuration scenarios, they
> need to be in one place with a very clearly defined uAPI
> 

I feel it also makes sense to allow a subsystem to specify which configurations
are permitted when allowing a PASID on its device, e.g. excluding things like
GPA mappings that existing subsystems (VFIO/VDPA) already handle well:

- Share GPA mappings between multiple devices (w/ or w/o PASID) for 
better IOTLB efficiency;

- Share GPA mappings between transactions w/ PASID and transactions w/o
PASID from the same device (e.g. GPU) for better IOTLB efficiency;

- Use the same page table for GPA mappings before and after the guest 
turns on/off the PASID capability;

All above are given as long as we continue to let VFIO/VDPA manage the
iommu domain and associated GPA mappings for PASID. The IOMMU driver 
already ensures a nested PASID entry linking to the established GPA paging 
structure of the domain when the 1st-level pgtable is bound through 
/dev/ioasid. 

In contrast, above merits are lost if forcing a model where GPA mappings
for PASID must be constructed through /dev/ioasid, as this will lead to
multiple paging structures for the same GPA mappings implying worse 
IOTLB usage and unnecessary cost of invalidations.

Therefore, I envision a scheme where the subsystem could specify 
permitted PASID configurations when doing ALLOW_PASID, and then 
userspace queries per-PASID capability to learn which operations
are allowed, e.g.:

1) To enable vSVA, VFIO/VDPA allows pgtable binding and related invalidation/
fault ops through /dev/ioasid;

2) for vDPA control vq usage, no configuration is allowed through /dev/ioasid;

3) for new subsystem which doesn't carry any legacy or similar usage as 
VFIO/VDPA, it could permit all configurations through /dev/ioasid including 
1st-level binding and 2nd-level mapping ops;

This approach also allows us to grow the uAPI in a staged manner. Now 
focus on 1) and 2) as VFIO/VDPA are the only two users for now with good
legacy to cover the GPA mappings. More ops can be introduced for 3) when 
there is a real example to show what exact ops are required for such a new 
subsystem.

Is this a good strategy to move forward?

btw this discussion was raised when discussing the I/O page fault handling
process. Currently the IOMMU layer implements a per-device fault reporting
mechanism, which requires VFIO to register a handler to receive all faults
on its device and then forward them to ioasid if they are due to the 1st
level. Possibly it makes more sense to convert it into a per-pgtable
reporting scheme, and then the owner of each pgtable should register its
own handler. It means for 1) VFIO will register a 2nd-level pgtable handler
while /dev/ioasid will register a 1st-level pgtable handler, while for 3)
/dev/ioasid will register handlers for both 1st-level and 2nd-level
pgtables. Jean? I also want to know your thoughts...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/5] iommu/vt-d: Several misc cleanups

2021-04-07 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 09:05:55AM +0800, Lu Baolu wrote:
> Lu Baolu (5):
>   iommu/vt-d: Remove unused dma map/unmap trace events
>   iommu/vt-d: Remove svm_dev_ops
>   iommu/vt-d: Remove SVM_FLAG_PRIVATE_PASID
>   iommu/vt-d: Remove unused function declarations
>   iommu/vt-d: Make unnecessarily global functions static

Applied all, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-07 Thread John Garry

On 07/04/2021 09:04, Joerg Roedel wrote:

On Mon, Mar 01, 2021 at 08:12:18PM +0800, John Garry wrote:

The Intel IOMMU driver supports flushing the per-CPU rcaches when a CPU is
offlined.

Let's move it to core code, so everyone can take advantage.

Also correct a code comment.

Based on v5.12-rc1. Tested on arm64 only.

John Garry (3):
   iova: Add CPU hotplug handler to flush rcaches
   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
   iova: Correct comment for free_cpu_cached_iovas()

  drivers/iommu/intel/iommu.c | 31 ---
  drivers/iommu/iova.c| 32 ++--
  include/linux/cpuhotplug.h  |  2 +-
  include/linux/iova.h|  1 +
  4 files changed, 32 insertions(+), 34 deletions(-)


Applied, thanks.

Thanks, but there was a v2 on this series. Not sure which you applied.

https://lore.kernel.org/linux-iommu/9aad6e94-ecb7-ca34-7f7d-3df6e43e9...@huawei.com/T/#mbea81468782c75fa84744ad7a7801831a4c952e9

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/1] iommu/vt-d: Don't set then clear private data in prq_event_thread()

2021-04-07 Thread Joerg Roedel
On Sat, Mar 20, 2021 at 10:41:56AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/svm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/1] iommu/vt-d: Fix lockdep splat in intel_pasid_get_entry()

2021-04-07 Thread Joerg Roedel
On Sat, Mar 20, 2021 at 10:09:16AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-07 Thread Joerg Roedel
On Mon, Mar 01, 2021 at 08:12:18PM +0800, John Garry wrote:
> The Intel IOMMU driver supports flushing the per-CPU rcaches when a CPU is
> offlined.
> 
> Let's move it to core code, so everyone can take advantage.
> 
> Also correct a code comment.
> 
> Based on v5.12-rc1. Tested on arm64 only.
> 
> John Garry (3):
>   iova: Add CPU hotplug handler to flush rcaches
>   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
>   iova: Correct comment for free_cpu_cached_iovas()
> 
>  drivers/iommu/intel/iommu.c | 31 ---
>  drivers/iommu/iova.c| 32 ++--
>  include/linux/cpuhotplug.h  |  2 +-
>  include/linux/iova.h|  1 +
>  4 files changed, 32 insertions(+), 34 deletions(-)

Applied, thanks.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie

2021-04-07 Thread Zenghui Yu

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:

Up to now, when the type was UNMANAGED, we used to
allocate IOVA pages within a reserved IOVA MSI range.

If both the host and the guest are exposed with SMMUs, each
would allocate an IOVA. The guest allocates an IOVA (gIOVA)
to map onto the guest MSI doorbell (gDB). The Host allocates
another IOVA (hIOVA) to map onto the physical doorbell (hDB).

So we end up with 2 unrelated mappings, at S1 and S2:
         S1             S2
gIOVA    ->    gDB
               hIOVA    ->    hDB

The PCI device would be programmed with hIOVA.
No stage 1 mapping would exist, causing the MSIs to fault.

iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
to the host so that gIOVA can be used by the host instead of
re-allocating a new hIOVA.

         S1            S2
gIOVA    ->    gDB    ->    hDB

this time, the PCI device can be programmed with the gIOVA MSI
doorbell which is correctly mapped through both stages.

Nested mode is not compatible with HW MSI regions as in that
case gDB and hDB should have a 1-1 mapping. This check will
be done when attaching each device to the IOMMU domain.

Signed-off-by: Eric Auger 


[...]


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f659395e7959..d25eb7cecaa7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 


Duplicated include.


  #include 
  #include 
  #include 
@@ -29,12 +30,15 @@
  struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
+   dma_addr_t  gpa;
phys_addr_t phys;
+   size_t  s1_granule;
  };
  
  enum iommu_dma_cookie_type {

IOMMU_DMA_IOVA_COOKIE,
IOMMU_DMA_MSI_COOKIE,
+   IOMMU_DMA_NESTED_MSI_COOKIE,
  };
  
  struct iommu_dma_cookie {

@@ -46,6 +50,7 @@ struct iommu_dma_cookie {
dma_addr_t  msi_iova;


msi_iova is unused in the nested mode, but we still set it to the start
address of the RESV_SW_MSI region (in iommu_get_msi_cookie()), which
looks a bit strange to me.


};
struct list_headmsi_page_list;
+   spinlock_t  msi_lock;


Should msi_lock be grabbed everywhere msi_page_list is populated?
Especially in iommu_dma_get_msi_page(), which can be invoked from the
irqchip driver.
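
A sketch of the locking the question implies (illustrative only, not the
patch's actual code): every path that walks or extends msi_page_list,
including the lookup done on the irqchip path, takes msi_lock:

	struct iommu_dma_msi_page *msi_page;
	unsigned long flags;

	spin_lock_irqsave(&cookie->msi_lock, flags);
	list_for_each_entry(msi_page, &cookie->msi_page_list, list) {
		if (msi_page->phys == msi_addr) {
			spin_unlock_irqrestore(&cookie->msi_lock, flags);
			return msi_page;	/* doorbell already mapped */
		}
	}
	list_add(&new_page->list, &cookie->msi_page_list);	/* new_page prepared beforehand */
	spin_unlock_irqrestore(&cookie->msi_lock, flags);
	return new_page;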

  
  	/* Domain for flush queue callback; NULL if flush queue not in use */

struct iommu_domain *fq_domain;
@@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
  
  	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);

if (cookie) {
+   spin_lock_init(&cookie->msi_lock);
INIT_LIST_HEAD(&cookie->msi_page_list);
cookie->type = type;
}
@@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
   *
   * Users who manage their own IOVA allocation and do not want DMA API support,
   * but would still like to take advantage of automatic MSI remapping, can use
- * this to initialise their own domain appropriately. Users should reserve a
+ * this to initialise their own domain appropriately. Users may reserve a
   * contiguous IOVA region, starting at @base, large enough to accommodate the
   * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
- * used by the devices attached to @domain.
+ * used by the devices attached to @domain. The other way round is to provide
+ * usable iova pages through the iommu_dma_bind_doorbell API (nested stages


s/iommu_dma_bind_doorbell/iommu_dma_bind_guest_msi/ ?


+ * use case)
   */
  int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
  {
struct iommu_dma_cookie *cookie;
+   int nesting, ret;
  
  	if (domain->type != IOMMU_DOMAIN_UNMANAGED)

return -EINVAL;
@@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base)
if (domain->iova_cookie)
return -EEXIST;
  
-	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);

+   ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);


Redundant space.


+   if (!ret && nesting)
+   cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
+   else
+   cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+
if (!cookie)
return -ENOMEM;
  
@@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)

  {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi, *tmp;
+   bool s2_unmap = false;
  
  	if (!cookie)

return;
@@ -163,7 +178,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);
  
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)

+