Re: [RFC PATCH v4 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
On 2021-04-08 07:32, Will Deacon wrote: On Wed, Apr 07, 2021 at 09:52:36PM -0700, Isaac J. Manjarres wrote: Implement the unmap_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon --- drivers/iommu/io-pgtable-arm.c | 70 ++ 1 file changed, 45 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index ea66b10c04c4..6700685f81d4 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -253,8 +253,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, struct iommu_iotlb_gather *gather, - unsigned long iova, size_t size, int lvl, - arm_lpae_iopte *ptep); + unsigned long iova, size_t size, size_t pgcount, + int lvl, arm_lpae_iopte *ptep); static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, phys_addr_t paddr, arm_lpae_iopte prot, @@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data); tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data); - if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, + if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1, lvl, tblp) != sz) { WARN_ON(1); return -EINVAL; @@ -526,14 +526,14 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, struct iommu_iotlb_gather *gather, unsigned long iova, size_t size, arm_lpae_iopte blk_pte, int lvl, - arm_lpae_iopte *ptep) + arm_lpae_iopte *ptep, size_t pgcount) { struct io_pgtable_cfg *cfg = &data->iop.cfg; arm_lpae_iopte pte, *tablep; phys_addr_t blk_paddr; size_t tablesz = ARM_LPAE_GRANULE(data); size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data); - int i, unmap_idx = -1; + int i, unmap_idx_start = -1, num_entries = 0, max_entries; if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) return 0; @@ -542,15 +542,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, if 
(!tablep) return 0; /* Bytes unmapped */ - if (size == split_sz) - unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data); + if (size == split_sz) { + unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data); + max_entries = (tablesz >> ilog2(sizeof(pte))) - unmap_idx_start; + num_entries = min_t(int, pgcount, max_entries); + } blk_paddr = iopte_to_paddr(blk_pte, data); pte = iopte_prot(blk_pte); for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {

Given that we already have a 'tablesz / sizeof(pte)' expression here, I'd be inclined to have either a local variable or a macro helper to get at the ptes_per_table value that you also need to compute max_entries.

I think a macro might be helpful, as the number of PTEs per table is useful in a few places.

/* Unmap! */ - if (i == unmap_idx) + if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries)) continue; __arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]); @@ -568,38 +571,45 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, return 0; tablep = iopte_deref(pte, data); - } else if (unmap_idx >= 0) { - io_pgtable_tlb_add_page(&data->iop, gather, iova, size); - return size; + } else if (unmap_idx_start >= 0) { + for (i = 0; i < num_entries; i++) + io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);

I suppose we could add a count parameter to the iotlb gather stuff in future too, but for now this is fine as this series is already pretty big.

Okay. I can keep this in mind for the future.

Will

___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v4 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts
On 2021-04-08 06:59, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:32PM -0700, Isaac J. Manjarres wrote: From: Will Deacon The 'addr_merge' parameter to iommu_pgsize() is a fabricated address intended to describe the alignment requirements to consider when choosing an appropriate page size. On the iommu_map() path, this address is the logical OR of the virtual and physical addresses. Subsequent improvements to iommu_pgsize() will need to check the alignment of the virtual and physical components of 'addr_merge' independently, so pass them in as separate parameters and reconstruct 'addr_merge' locally. No functional change. Signed-off-by: Will Deacon Signed-off-by: Isaac J. Manjarres ---

drivers/iommu/iommu.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index bcd623862bf9..ab689611a03b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) } EXPORT_SYMBOL_GPL(iommu_iova_to_phys); -static size_t iommu_pgsize(struct iommu_domain *domain, - unsigned long addr_merge, size_t size) +static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size) { unsigned int pgsize_idx; unsigned long pgsizes; size_t pgsize; + phys_addr_t addr_merge = paddr | iova;

^^^ this needs to be 'unsigned long' as it was before (otherwise using GENMASK _is_ a problem).

Done.

Will
Re: [RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
On 2021-04-08 06:58, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:38PM -0700, Isaac J. Manjarres wrote: Implement the unmap_pages() callback for the ARM v7s io-pgtable format. Signed-off-by: Isaac J. Manjarres ---

drivers/iommu/io-pgtable-arm-v7s.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index d4004bcf333a..5e203e03c352 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep); } -static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova, - size_t size, struct iommu_iotlb_gather *gather) +static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather) { struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops); + size_t unmapped = 0, ret; if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias))) return 0; - return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd); + while (pgcount--) { + ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd); + if (!ret) + break; + + unmapped += pgsize; + iova += pgsize; + } + + return unmapped; +}

Wait -- don't you need to hook this up somewhere (likewise for ->map_pages)?

Done. Likewise for map_pages(). I'm not sure how the compiler didn't catch this; I'm compile testing this, as I don't have hardware that uses the short descriptor format.

How are you testing this?

Will
Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
On 2021-04-07 02:57, Will Deacon wrote: On Tue, Apr 06, 2021 at 02:02:26PM -0700, isa...@codeaurora.org wrote: On 2021-04-06 05:15, Will Deacon wrote: > On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote: > > Implement the unmap_pages() callback for the ARM LPAE io-pgtable > > format. > > > > Signed-off-by: Isaac J. Manjarres > > Suggested-by: Will Deacon > > --- > > drivers/iommu/io-pgtable-arm.c | 124 > > +++-- > > 1 file changed, 104 insertions(+), 20 deletions(-) > > [...] > > > +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, > > unsigned long iova, > > +size_t pgsize, size_t pgcount, > > +struct iommu_iotlb_gather *gather) > > +{ > > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); > > + struct io_pgtable_cfg *cfg = &data->iop.cfg; > > + arm_lpae_iopte *ptep = data->pgd; > > + long iaext = (s64)iova >> cfg->ias; > > + size_t unmapped = 0, unmapped_page; > > + int last_lvl; > > + size_t table_size, pages, tbl_offset, max_entries; > > + > > + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || > > !pgcount)) > > + return 0; > > + > > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) > > + iaext = ~iaext; > > + if (WARN_ON(iaext)) > > + return 0; > > + > > + /* > > + * Calculating the page table size here helps avoid situations where > > + * a page range that is being unmapped may be mapped at the same > > level > > + * but not mapped by the same tables. Allowing such a scenario to > > + * occur can complicate the logic in arm_lpae_split_blk_unmap(). 
> > + */ > > + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data); > > + > > + if (last_lvl == data->start_level) > > + table_size = ARM_LPAE_PGD_SIZE(data); > > + else > > + table_size = ARM_LPAE_GRANULE(data); > > + > > + max_entries = table_size / sizeof(*ptep); > > + > > + while (pgcount) { > > + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data); > > + pages = min_t(size_t, pgcount, max_entries - tbl_offset); > > + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize, > > + pages, data->start_level, > > + ptep); > > + if (!unmapped_page) > > + break; > > + > > + unmapped += unmapped_page; > > + iova += unmapped_page; > > + pgcount -= pages; > > + } > > Robin has some comments on the first version of this patch, and I > don't think you > addressed them: > > https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com > > I'm inclined to agree that iterating here doesn't make a lot of sense -- > we've already come back out of the page-table walk, so I think we should > just return to the caller (who is well prepared to handle a partial > unmap). > Same for the map side of things. > > If we get numbers showing that this is causing a performance issue, then > we should rework the page-table code to handle this at the lower level > (because I doubt the loop that you have is really worse than returning > to > the caller anyway). > Sorry, I seem to have overlooked those comments. I will go ahead and address them. I think it might be ideal to try to do as much work as possible in the io-pgtable level, so as to minimize the number of indirect calls incurred by jumping back and forth between iommu fwk, iommu driver, and io-pgtable code. Perhaps we can try something like how the linear mappings are created on arm64 i.e. on the previous level, we can determine how many pages can be unmapped in one page table in one iteration, and on the subsequent iterations, we can tackle another page table at the lower level. 
Looking at the code, it doesn't seem too difficult to add this in. Thoughts?

I don't object to getting there eventually, but as an initial step I think returning back to the caller is perfectly reasonable and will probably make the patches easier to review. In other words, implement the simple (correct) thing first, and then have subsequent patches to improve it.

Will

Thanks for the feedback. I've implemented the simpler approach in v4 of the patches: https://lore.kernel.org/linux-iommu/20210408045241.27316-1-isa...@codeaurora.org/T/#t.

Thanks, Isaac
Re: [RFC PATCH v3 03/12] iommu/io-pgtable: Introduce map_pages() as a page table op
On 2021-04-06 04:57, Will Deacon wrote: On Mon, Apr 05, 2021 at 12:11:03PM -0700, Isaac J. Manjarres wrote: Mapping memory into io-pgtables follows the same semantics that unmapping memory used to follow (i.e. a buffer will be mapped one page block per call to the io-pgtable code). This means that it can be optimized in the same way that unmapping memory was, so add a map_pages() callback to the io-pgtable ops structure, so that a range of pages of the same size can be mapped within the same call. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon --- include/linux/io-pgtable.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 2ed0c057d9e7..019149b204b8 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -143,6 +143,7 @@ struct io_pgtable_cfg { * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers. * * @map: Map a physically contiguous memory region. + * @map_pages:Map a physically contiguous range of pages of the same size. * @unmap:Unmap a physically contiguous memory region. * @unmap_pages: Unmap a range of virtually contiguous pages of the same size. * @iova_to_phys: Translate iova to physical address. @@ -153,6 +154,9 @@ struct io_pgtable_cfg { struct io_pgtable_ops { int (*map)(struct io_pgtable_ops *ops, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova, +phys_addr_t paddr, size_t pgsize, size_t pgcount, +int prot, gfp_t gfp, size_t *mapped); How about returning 'size_t' and using IS_ERR_VALUE() instead of adding the extra 'mapped' argument (i.e. return the size of the region mapped or an error code)? I don't think we realistically need to care about map sizes that overlap with the error region. 
I'd given that a shot before, but the problem that I kept running into was that in case of an error, if I return an error code, I don't know how much memory was mapped, so that I can invoke iommu_unmap from __iommu_map with that size to undo the partial mappings from a map_pages() call. Returning the amount of memory that was mapped in the case of an error will be less than the size that was requested, but then we lose the information about why the error happened, since the error code won't be returned, so that's why I went with the current approach.

Do you have any other ideas about how to handle this? I'd thought of having arm_lpae_map_pages() invoke arm_lpae_unmap_pages(), but the TLB maintenance was a problem, as we wouldn't invoke iommu_iotlb_sync().

Thanks, Isaac

Will
Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
On 2021-04-06 05:15, Will Deacon wrote: On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote: Implement the unmap_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon --- drivers/iommu/io-pgtable-arm.c | 124 +++-- 1 file changed, 104 insertions(+), 20 deletions(-) [...] +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = &data->iop.cfg; + arm_lpae_iopte *ptep = data->pgd; + long iaext = (s64)iova >> cfg->ias; + size_t unmapped = 0, unmapped_page; + int last_lvl; + size_t table_size, pages, tbl_offset, max_entries; + + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount)) + return 0; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + iaext = ~iaext; + if (WARN_ON(iaext)) + return 0; + + /* +* Calculating the page table size here helps avoid situations where + * a page range that is being unmapped may be mapped at the same level +* but not mapped by the same tables. Allowing such a scenario to +* occur can complicate the logic in arm_lpae_split_blk_unmap(). 
+*/ + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data); + + if (last_lvl == data->start_level) + table_size = ARM_LPAE_PGD_SIZE(data); + else + table_size = ARM_LPAE_GRANULE(data); + + max_entries = table_size / sizeof(*ptep); + + while (pgcount) { + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data); + pages = min_t(size_t, pgcount, max_entries - tbl_offset); + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize, +pages, data->start_level, +ptep); + if (!unmapped_page) + break; + + unmapped += unmapped_page; + iova += unmapped_page; + pgcount -= pages; + } Robin has some comments on the first version of this patch, and I don't think you addressed them: https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com I'm inclined to agree that iterating here doesn't make a lot of sense -- we've already come back out of the page-table walk, so I think we should just return to the caller (who is well prepared to handle a partial unmap). Same for the map side of things. If we get numbers showing that this is causing a performance issue, then we should rework the page-table code to handle this at the lower level (because I doubt the loop that you have is really worse than returning to the caller anyway). Sorry, I seem to have overlooked those comments. I will go ahead and address them. I think it might be ideal to try to do as much work as possible in the io-pgtable level, so as to minimize the number of indirect calls incurred by jumping back and forth between iommu fwk, iommu driver, and io-pgtable code. Perhaps we can try something like how the linear mappings are created on arm64 i.e. on the previous level, we can determine how many pages can be unmapped in one page table in one iteration, and on the subsequent iterations, we can tackle another page table at the lower level. Looking at the code, it doesn't seem too difficult to add this in. Thoughts? 
Thanks, Isaac

Will
Re: [RFC PATCH v3 06/12] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts
On 2021-04-06 04:53, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:06PM -0700, Isaac J. Manjarres wrote: From: Will Deacon The 'addr_merge' parameter to iommu_pgsize() is a fabricated address intended to describe the alignment requirements to consider when choosing an appropriate page size. On the iommu_map() path, this address is the logical OR of the virtual and physical addresses. Subsequent improvements to iommu_pgsize() will need to check the alignment of the virtual and physical components of 'addr_merge' independently, so pass them in as separate parameters and reconstruct 'addr_merge' locally. No functional change. Signed-off-by: Will Deacon Signed-off-by: Isaac J. Manjarres ---

drivers/iommu/iommu.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 9006397b6604..a3bbf7e310b0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) } EXPORT_SYMBOL_GPL(iommu_iova_to_phys); -static size_t iommu_pgsize(struct iommu_domain *domain, - unsigned long addr_merge, size_t size) +static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size) { unsigned int pgsize_idx; unsigned long pgsizes; size_t pgsize; + phys_addr_t addr_merge = paddr | iova;

Huh, so this was 'unsigned long' before and, given that the pgsize_bitmap on the domain is also unsigned long, then I think that's fine. So using that would mean you don't need GENMASK_ULL for this guy either.

Will

Thanks, I will address this in version 4 of the series. --Isaac
Re: [PATCH v2 07/12] iommu: Hook up '->unmap_pages' driver callback
On 2021-04-04 23:00, Lu Baolu wrote:

Hi, On 4/2/21 9:34 AM, Isaac J. Manjarres wrote: static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather) @@ -2476,7 +2519,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long orig_iova = iova; unsigned int min_pagesz; - if (unlikely(ops->unmap == NULL || + if (unlikely((ops->unmap == NULL && ops->unmap_pages == NULL) || domain->pgsize_bitmap == 0UL))

This change should also be applied to __iommu_map() path. And perhaps could be: if (unlikely(!(ops->unmap || ops->unmap_pages) || !domain->pgsize_bitmap))

Yep, that's correct. Thank you for spotting that; I've updated it in the latest series: https://lore.kernel.org/linux-iommu/20210405191112.28192-1-isa...@codeaurora.org/T/#t.

Thanks, Isaac

Best regards, baolu
Re: [RFC PATCH 2/5] iommu: Add an unmap_pages() op for IOMMU drivers
On 2021-03-30 22:39, Lu Baolu wrote: On 3/31/21 1:36 PM, isa...@codeaurora.org wrote: On 2021-03-30 21:47, Lu Baolu wrote: On 3/31/21 11:00 AM, Isaac J. Manjarres wrote: Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to unmap a virtually contiguous range of pages of the same size. For IOMMU drivers that do not specify an unmap_pages() callback, the existing logic of unmapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon --- include/linux/iommu.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5e7fe519430a..9cf81242581a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -193,6 +193,7 @@ struct iommu_iotlb_gather { * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain + * @unmap_pages: unmap a number of pages of the same size from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush @@ -245,6 +246,9 @@ struct iommu_ops { phys_addr_t paddr, size_t size, int prot, gfp_t gfp); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); + size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather); Is it possible to add an equivalent map_pages() callback? Yes, map_pages() can be implemented and can leverage a lot of the implementation of unmap_pages(). 
The problem is that if you map several pages in one call, and then encounter an error and have to rollback, you should do TLB maintenance, as iommu_map does when it encounters an error. However, we can't call iommu_unmap from io-pgtable-arm for example. We can call arm_lpae_unmap_pages() from the later patches, but that doesn't solve the TLB maintenance issue. Do you have any thoughts on how to address this?

Call unmap_pages() with the same pages and size to roll back. Does it work?

Best regards, baolu

Hi Lu, I've given map_pages() a shot. Here's the second version of the RFC series: https://lore.kernel.org/linux-iommu/20210402013452.4013-1-isa...@codeaurora.org/T/#t.

Thanks, Isaac

void (*flush_iotlb_all)(struct iommu_domain *domain); void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, size_t size);

Best regards, baolu
Re: [RFC PATCH 3/6] iommu: Use bitmap to calculate page size in iommu_pgsize()
On 2021-04-01 09:47, Will Deacon wrote: Avoid the potential for shifting values by amounts greater than the width of their type by using a bitmap to compute page size in iommu_pgsize(). Signed-off-by: Will Deacon --- drivers/iommu/iommu.c | 31 --- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index d0b0a15dba84..bcd623862bf9 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long addr_merge, size_t size) { unsigned int pgsize_idx; + unsigned long pgsizes; size_t pgsize; - /* Max page size that still fits into 'size' */ - pgsize_idx = __fls(size); + /* Page sizes supported by the hardware and small enough for @size */ + pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0); I've fixed this in the latest RFC for the iommu_map/unmap optimization patches, but for the sake of completeness: I think this should be GENMASK_ULL, in case __fls(size) >= 32. Thank you for these patches, by the way. I've looked through them and they make sense/seem correct. I've integrated them into the latest RFC: https://lore.kernel.org/linux-iommu/20210402013452.4013-1-isa...@codeaurora.org/T/#t. Thanks, Isaac - /* need to consider alignment requirements ? 
*/ - if (likely(addr_merge)) { - /* Max page size allowed by address */ - unsigned int align_pgsize_idx = __ffs(addr_merge); - pgsize_idx = min(pgsize_idx, align_pgsize_idx); - } - - /* build a mask of acceptable page sizes */ - pgsize = (1UL << (pgsize_idx + 1)) - 1; - - /* throw away page sizes not supported by the hardware */ - pgsize &= domain->pgsize_bitmap; + /* Constrain the page sizes further based on the maximum alignment */ + if (likely(addr_merge)) + pgsizes &= GENMASK(__ffs(addr_merge), 0); - /* make sure we're still sane */ - BUG_ON(!pgsize); + /* Make sure we have at least one suitable page size */ + BUG_ON(!pgsizes); - /* pick the biggest page */ - pgsize_idx = __fls(pgsize); - pgsize = 1UL << pgsize_idx; + /* Pick the biggest page size remaining */ + pgsize_idx = __fls(pgsizes); + pgsize = BIT(pgsize_idx); return pgsize; }
Re: [RFC PATCH 2/5] iommu: Add an unmap_pages() op for IOMMU drivers
On 2021-03-30 21:47, Lu Baolu wrote: On 3/31/21 11:00 AM, Isaac J. Manjarres wrote: Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to unmap a virtually contiguous range of pages of the same size. For IOMMU drivers that do not specify an unmap_pages() callback, the existing logic of unmapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon --- include/linux/iommu.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5e7fe519430a..9cf81242581a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -193,6 +193,7 @@ struct iommu_iotlb_gather { * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain + * @unmap_pages: unmap a number of pages of the same size from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush @@ -245,6 +246,9 @@ struct iommu_ops { phys_addr_t paddr, size_t size, int prot, gfp_t gfp); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); + size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather); Is it possible to add an equivalent map_pages() callback? Yes, map_pages() can be implemented and can leverage a lot of the implementation of unmap_pages(). The problem is that if you map several pages in one call, and then encounter an error and have to rollback, you should do TLB maintenance, as iommu_map does when it encounters an error. 
However, we can't call iommu_unmap from io-pgtable-arm for example. We can call arm_lpae_unmap_pages() from the later patches, but that doesn't solve the TLB maintenance issue. Do you have any thoughts on how to address this?

void (*flush_iotlb_all)(struct iommu_domain *domain); void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, size_t size);

Best regards, baolu
Re: [PATCH v2 0/5] Optimize iommu_map_sg() performance
On 2021-01-12 08:00, Robin Murphy wrote:

On 2021-01-11 14:54, Isaac J. Manjarres wrote: The iommu_map_sg() code currently iterates through the given scatter-gather list, and in the worst case, invokes iommu_map() for each element in the scatter-gather list, which calls into the IOMMU driver through an indirect call. For an IOMMU driver that uses a format supported by the io-pgtable code, the IOMMU driver will then call into the io-pgtable code to map the chunk. Jumping between the IOMMU core code, the IOMMU driver, and the io-pgtable code and back for each element in a scatter-gather list is not efficient. Instead, add a map_sg() hook in both the IOMMU driver ops and the io-pgtable ops. iommu_map_sg() can then call into the IOMMU driver's map_sg() hook with the entire scatter-gather list, which can call into the io-pgtable map_sg() hook, which can process the entire scatter-gather list, signficantly reducing the number of indirect calls, and jumps between these layers, boosting performance.

Out of curiosity, how much of the difference is attributable to actual indirect call overhead vs. the additional massive reduction in visits to arm_smmu_rpm_{get,put} that you fail to mention? There are ways to optimise indirect calling that would benefit *all* cases, rather than just one operation for one particular driver.

I did an experiment where I compared the two approaches without any calls to arm_smmu_rpm_[get/put]. There's still a large amount of difference without the overhead incurred by power management calls. Here are the results:

no optimizations and no power management calls:

size    iommu_map_sg
4K      0.609 us
64K     8.583 us
1M      136.083 us
2M      273.145 us
12M     1442.119 us
24M     2876.078 us
32M     3832.041 us

iommu_map_sg optimizations and no power management calls:

size    iommu_map_sg
4K      0.645 us
64K     1.229 us
1M      9.531 us
2M      23.198 us
12M     99.250 us
24M     185.713 us
32M     248.781 us

From here, we can see that the amount of latency incurred by the indirect calls is fairly large.
Do you mind sharing some more information on how to optimize the existing approach further, such that it benefits other drivers as well?

On a system that uses the ARM SMMU driver, and the ARM LPAE format, the current implementation of iommu_map_sg() yields the following latencies for mapping scatter-gather lists of various sizes. These latencies are calculated by repeating the mapping operation 10 times:

size    iommu_map_sg latency
4K      0.624 us
64K     9.468 us
1M      122.557 us
2M      239.807 us
12M     1435.979 us
24M     2884.968 us
32M     3832.979 us

On the same system, the proposed modifications yield the following results:

size    iommu_map_sg latency
4K      3.645 us
64K     4.198 us
1M      11.010 us
2M      17.125 us
12M     82.416 us
24M     158.677 us
32M     210.468 us

The procedure for collecting the iommu_map_sg latencies is the same in both experiments. Clearly, reducing the jumps between the different layers in the IOMMU code offers a significant performance boost in iommu_map_sg() latency.

Presumably those are deliberately worst-case numbers? After all, a 32MB scatterlist *could* incur less overhead than a 64KB one if things line up just right (still 16 ->map calls, but each with one fewer level of pagetable to traverse).

Yes, these are worst case numbers (i.e. a buffer is composed entirely of 4 KB pages, so higher order mappings don't get used).

TBH I find the significant regression of the 4KB case the most interesting - what's going on there?

That was an error on my part. After fixing my error, I observed that the time spent mapping the 4 KB buffer is comparable with and without optimizations, which is expected.

My main reservation here is that we get an explosion of duplicate copies of almost the same code, and it's code that's just non-trivial enough to start being bug-prone.
And it's all still only for one specific operation - your argument about calling through multiple layers for each element applies just as much to iommu_map() itself, so why aren't we trying to make more fundamental improvements with wider benefits? Indeed I can't imagine the existing iommu_map_sg() loop really adds significant overhead compared to a single iommu_map() call that results in the equivalent set of ->map calls to the driver. At a glance, I reckon that simply extending the internal ->map and ->unmap interfaces to encode a number of consecutive identical pages would already get us a large chunk of the way there; then we'd be in a better place to consider options for the io-pgtable interface. Do you mean physical
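Robin's suggestion — encoding a number of consecutive identical pages in the internal ->map/->unmap interfaces — can be sketched outside the kernel as follows. The (pgsize, pgcount) idea mirrors what the unmap_pages() patch at the head of this thread does; the code below is an illustrative user-space model, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: instead of one ->map call per page, pass (pgsize, pgcount)
 * pairs so the driver is entered once per maximal run of identically
 * sized chunks. Names are illustrative, not the kernel API. */
struct chunk { size_t size; };

/* Count how many driver entries a (pgsize, pgcount) encoding would need
 * for a scatter-gather list. */
static int driver_entries_needed(const struct chunk *sg, int nents)
{
	int entries = 0;
	int i = 0;

	while (i < nents) {
		size_t pgsize = sg[i].size;
		int pgcount = 1;

		/* extend the run while the chunk size stays the same */
		while (i + pgcount < nents && sg[i + pgcount].size == pgsize)
			pgcount++;
		/* would be: ops->map_pages(iova, paddr, pgsize, pgcount, prot) */
		entries++;
		i += pgcount;
	}
	return entries;
}
```

In the worst case discussed above (a buffer composed entirely of 4 KB pages), the whole list collapses into a single (4K, n) call, which is why this approach recovers most of the batching benefit without a per-driver map_sg() copy.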
Re: [PATCH 0/5] Optimize iommu_map_sg() performance
On 2021-01-10 23:52, Sai Prakash Ranjan wrote: On 2021-01-11 11:52, Sai Prakash Ranjan wrote: Hi Isaac, I gave this series a go on a chromebook and saw these warnings and several device probe failures; logs attached below. The WARN corresponds to this code in arm_lpae_map_by_pgsize():

if (WARN_ON(iaext || (paddr + size) >> cfg->oas))
	return -ERANGE;

Logs:

[2.411391] [ cut here ]
[2.416149] WARNING: CPU: 6 PID: 56 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[2.425606] Modules linked in:
[2.428749] CPU: 6 PID: 56 Comm: kworker/6:1 Not tainted 5.10.5 #970
[2.440287] Workqueue: events deferred_probe_work_func
[2.445563] pstate: 20c9 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[2.451726] pc : arm_lpae_map_sg+0x234/0x248
[2.456112] lr : arm_lpae_map_sg+0xe0/0x248
[2.460410] sp : ffc010513750
[2.463820] x29: ffc010513790 x28: ffb943332000
[2.469281] x27: 000ff000 x26: ffb943d14900
[2.474738] x25: 1000 x24: 000103465000
[2.480196] x23: 0001 x22: 000103466000
[2.485645] x21: 0003 x20: 0a20
[2.491103] x19: ffc010513850 x18: 0001
[2.496562] x17: 0002 x16:
[2.502021] x15: x14:
[2.507479] x13: 0001 x12:
[2.512928] x11: 0010 x10:
[2.518385] x9 : 0001 x8 : 40201000
[2.523844] x7 : 0a20 x6 : ffb943463000
[2.529302] x5 : 0003 x4 : 1000
[2.534760] x3 : 0001 x2 : ffb941f605a0
[2.540219] x1 : 0003 x0 : 0e40
[2.545679] Call trace:
[2.548196] arm_lpae_map_sg+0x234/0x248
[2.552225] arm_smmu_map_sg+0x80/0xc4
[2.556078] __iommu_map_sg+0x6c/0x188
[2.559931] iommu_map_sg_atomic+0x18/0x20
[2.564144] iommu_dma_alloc_remap+0x26c/0x34c
[2.568703] iommu_dma_alloc+0x9c/0x268
[2.572647] dma_alloc_attrs+0x88/0xfc
[2.576503] gsi_ring_alloc+0x50/0x144
[2.580356] gsi_init+0x2c4/0x5c4
[2.583766] ipa_probe+0x14c/0x2b4
[2.587263] platform_drv_probe+0x94/0xb4
[2.591377] really_probe+0x138/0x348
[2.595145] driver_probe_device+0x80/0xb8
[2.599358] __device_attach_driver+0x90/0xa8
[2.603829] bus_for_each_drv+0x84/0xcc
[2.607772] __device_attach+0xc0/0x148
[2.611713] device_initial_probe+0x18/0x20
[2.616012] bus_probe_device+0x38/0x94
[2.619953] deferred_probe_work_func+0x78/0xb0
[2.624611] process_one_work+0x210/0x3dc
[2.628726] worker_thread+0x284/0x3e0
[2.632578] kthread+0x148/0x1a8
[2.635891] ret_from_fork+0x10/0x18
[2.639562] ---[ end trace 9bac18cad6a9862e ]---
[2.644414] ipa 1e4.ipa: error -12 allocating channel 0 event ring
[2.651656] ipa: probe of 1e4.ipa failed with error -12
[2.660072] dwc3 a60.dwc3: Adding to iommu group 8
[2.668632] xhci-hcd xhci-hcd.13.auto: xHCI Host Controller
[2.674680] xhci-hcd xhci-hcd.13.auto: new USB bus registered, assigned bus number 1
...

Isaac provided a fix which he will post as v2, and no warnings were observed with that fix. Tested-by: Sai Prakash Ranjan Thanks, Sai

Thanks for testing out the patches. I've added the fix (there was an off-by-one error in the calculation used to check if the IOVA/physical addresses are within limits) to version 2 of the series: https://lore.kernel.org/linux-iommu/1610376862-927-1-git-send-email-isa...@codeaurora.org/T/#t Thanks, Isaac

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
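The thread doesn't show the exact diff for the off-by-one fix, but given the WARN condition quoted above, `(paddr + size) >> cfg->oas`, a typical off-by-one in this kind of limit check looks like the following sketch. `cfg_oas` stands in for `cfg->oas`, and both functions are illustrative: `paddr + size` points one byte past the end of the mapping, so a buffer ending exactly at the top of the output address space trips the check even though it is in range.

```c
#include <assert.h>

/* Buggy form: rejects a mapping whose last byte is exactly at the
 * top of the 2^cfg_oas output address space. */
static int out_of_range_buggy(unsigned long long paddr,
			      unsigned long long size, unsigned int cfg_oas)
{
	return ((paddr + size) >> cfg_oas) != 0;
}

/* Fixed form: test the address of the last byte instead. */
static int out_of_range_fixed(unsigned long long paddr,
			      unsigned long long size, unsigned int cfg_oas)
{
	return ((paddr + size - 1) >> cfg_oas) != 0;
}
```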
Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
On 2021-01-08 10:18, Will Deacon wrote: On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote: On 2021-01-07 22:27, isa...@codeaurora.org wrote: > On 2021-01-06 03:56, Will Deacon wrote: > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote: > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY > > > flag") > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went > > > the memory type setting required for the non-coherent masters to use > > > system cache. Now that system cache support for GPU is added, we will > > > need to mark the memory as normal sys-cached for GPU to use > > > system cache. > > > Without this, the system cache lines are not allocated for GPU. > > > We use > > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page > > > protection > > > flag as the flag cannot be exposed via DMA api because of no in-tree > > > users. > > > > > > Signed-off-by: Sai Prakash Ranjan > > > --- > > > drivers/iommu/io-pgtable-arm.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/iommu/io-pgtable-arm.c > > > b/drivers/iommu/io-pgtable-arm.c > > > index 7c9ea9d7874a..3fb7de8304a2 100644 > > > --- a/drivers/iommu/io-pgtable-arm.c > > > +++ b/drivers/iommu/io-pgtable-arm.c > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, > > > else if (prot & IOMMU_CACHE) > > > pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE > > > << ARM_LPAE_PTE_ATTRINDX_SHIFT); > > > +else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) > > > +pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE > > > +<< ARM_LPAE_PTE_ATTRINDX_SHIFT); > > > } > > > While this approach of enabling system cache globally for both page > tables and other buffers > works for the GPU usecase, this isn't ideal for other clients that use > system cache. 
For example, > video clients only want to cache a subset of their buffers in the > system cache, due to the sizing constraint > imposed by how much of the system cache they can use. So, it would be > ideal to have > a way of expressing the desire to use the system cache on a per-buffer > basis. Additionally, > our video clients use the DMA layer, and since the requirement is for > caching in the system cache > to be a per buffer attribute, it seems like we would have to have a > DMA attribute to express > this on a per-buffer basis. > I did bring this up initially [1], also where is this video client in upstream? AFAIK, only system cache user in upstream is GPU. We cannot add any DMA attribute unless there is any user upstream as per [2], so when the support for such a client is added, wouldn't ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG) work? Hmm, I think this is another case where we need to separate out the page-table walker attributes from the access attributes. Currently, IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker and I don't think it makes any sense for that to be per-buffer (how would you even manage that?). However, if we want to extend this to data accesses and we know that there are valid use-cases where this should be per-buffer, then shoe-horning it in with the walker quirk does not feel like the best thing to do. Right, I agree that this seems something that merits the same level of separation that exists for the page table walker attributes with respect to coherency, and data buffer attributes with respect to coherency (i.e page table walker coherency does not imply data buffer coherency--that is driven through IOMMU_CACHE). As a starting point, we could: 1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC 2. Add a new prot flag IOMMU_LLC 3. Have the GPU pass the new prot for its buffer mappings Does that work? 
One thing I'm not sure about is whether IOMMU_CACHE should imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer non-cacheable mappings for a coherent device. Have you ever seen that sort of thing before? Will

Yes, that should work, as that'll leave the door open for there to be a DMA attribute that can be wired up to IOMMU_LLC. I'm not aware of such a use-case, but I believe that a coherent device will have its buffers cached in the system cache anyway, as well as in the CPU caches. --Isaac

___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
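As a sketch of how the proposed IOMMU_LLC prot flag might slot into arm_lpae_prot_to_pte()'s attribute-index selection: note that IOMMU_LLC is only a proposal in this thread, not an upstream flag, and the bit positions and constant names below are invented for illustration (the attribute indices mirror the MAIR slots used by io-pgtable-arm.c).

```c
#include <assert.h>

#define IOMMU_CACHE	(1 << 0)	/* illustrative bit position */
#define IOMMU_LLC	(1 << 1)	/* hypothetical per-buffer flag */

#define ATTR_IDX_NC		0	/* non-cacheable */
#define ATTR_IDX_CACHE		1	/* inner/outer write-back */
#define ATTR_IDX_INC_OCACHE	3	/* inner-NC, outer write-back (sys cache) */

/* Pick the MAIR attribute index for a mapping. A per-buffer IOMMU_LLC
 * replaces the global IO_PGTABLE_QUIRK_ARM_OUTER_WBWA test, so one
 * domain can mix sys-cached and plain non-cacheable buffers. */
static int attr_idx_for_prot(int prot)
{
	if (prot & IOMMU_CACHE)
		return ATTR_IDX_CACHE;
	if (prot & IOMMU_LLC)
		return ATTR_IDX_INC_OCACHE;
	return ATTR_IDX_NC;
}
```

With this shape, the quirk (renamed to something like IO_PGTABLE_QUIRK_PTW_LLC) would drive only the TCR/walker settings, while the per-mapping prot drives the PTE, matching the separation argued for above.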
Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
On 2021-01-07 21:47, Sai Prakash Ranjan wrote: On 2021-01-07 22:27, isa...@codeaurora.org wrote: On 2021-01-06 03:56, Will Deacon wrote: On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote: commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went the memory type setting required for the non-coherent masters to use system cache. Now that system cache support for GPU is added, we will need to mark the memory as normal sys-cached for GPU to use system cache. Without this, the system cache lines are not allocated for GPU. We use the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection flag as the flag cannot be exposed via DMA api because of no in-tree users. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/io-pgtable-arm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 7c9ea9d7874a..3fb7de8304a2 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, else if (prot & IOMMU_CACHE) pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } While this approach of enabling system cache globally for both page tables and other buffers works for the GPU usecase, this isn't ideal for other clients that use system cache. For example, video clients only want to cache a subset of their buffers in the system cache, due to the sizing constraint imposed by how much of the system cache they can use. So, it would be ideal to have a way of expressing the desire to use the system cache on a per-buffer basis. 
Additionally, our video clients use the DMA layer, and since the requirement is for caching in the system cache to be a per-buffer attribute, it seems like we would have to have a DMA attribute to express this on a per-buffer basis.

I did bring this up initially [1]; also, where is this video client in upstream? AFAIK, the only system cache user in upstream is GPU. We cannot add any DMA attribute unless there is a user upstream, as per [2], so when the support for such a client is added, wouldn't ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG) work?

Right, there wouldn't be an upstream user, which would be problematic, but I was thinking of having it so that when video or any of our other clients that use this attribute on a per-buffer basis upstream their code, it's not too much of a stretch to add the support.

I don't think that will work, because we currently have clients who use the system cache as follows:

- cache only page tables in the system cache
- cache only data buffers in the system cache
- cache both page tables and all buffers in the system cache
- cache both page tables and some buffers in the system cache

The approach you're suggesting doesn't allow for the last case: caching the page tables in the system cache involves setting IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, so we would end up losing the flexibility to cache only some data buffers in the system cache. Ideally, the page table quirk would drive the settings for the TCR, and the prot flag would drive the PTE for the mapping, just as page table walker coherency (dma-coherent) is handled separately from buffers being mapped as cacheable based on IOMMU_CACHE. Thoughts?

Thanks, Isaac

[1] https://lore.kernel.org/dri-devel/ecfda7ca80f6d7b4ff3d89b8758f4...@codeaurora.org/
[2] https://lore.kernel.org/linux-iommu/20191026053026.ga14...@lst.de/T/

Thanks, Sai
Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
On 2021-01-06 03:56, Will Deacon wrote: On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote: commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went the memory type setting required for the non-coherent masters to use system cache. Now that system cache support for GPU is added, we will need to mark the memory as normal sys-cached for GPU to use system cache. Without this, the system cache lines are not allocated for GPU. We use the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection flag as the flag cannot be exposed via DMA api because of no in-tree users. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/io-pgtable-arm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 7c9ea9d7874a..3fb7de8304a2 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, else if (prot & IOMMU_CACHE) pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } While this approach of enabling system cache globally for both page tables and other buffers works for the GPU usecase, this isn't ideal for other clients that use system cache. For example, video clients only want to cache a subset of their buffers in the system cache, due to the sizing constraint imposed by how much of the system cache they can use. So, it would be ideal to have a way of expressing the desire to use the system cache on a per-buffer basis. 
Additionally, our video clients use the DMA layer, and since the requirement is for caching in the system cache to be a per-buffer attribute, it seems like we would have to have a DMA attribute to express this on a per-buffer basis. Thanks, Isaac

drivers/iommu/io-pgtable.c currently documents this quirk as applying only to the page-table walker. Given that we only have one user at the moment, I think it's ok to change that, but please update the comment. We also need to decide on whether we want to allow the quirk to be passed if the coherency of the page-table walker differs from the DMA device, since we have these combinations:

   Coherent walker?   IOMMU_CACHE   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
0:        N                0                       0
1:        N                0                       1
2:        N                1                       0
3:        N                1                       1
4:        Y                0                       0
5:        Y                0                       1
6:        Y                1                       0
7:        Y                1                       1

Some of them are obviously bogus, such as (7), but I don't know what to do about cases such as (3) and (5). Will
Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-23 05:44, Robin Murphy wrote: On 2020-12-22 19:54, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt = ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. 
Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. Ok, I can fix this accordingly. + return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt = ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S2, + .alloc = 
arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_do_selftests(void) +{ +
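The dynamic registration scheme under review can be modeled in a few lines of standalone C. The function names below echo the patch (io_pgtable_ops_register/io_pgtable_ops_unregister), but the bodies are simplified sketches: a real implementation would also need locking and module reference counting, as discussed elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

enum io_pgtable_fmt {
	ARM_32_LPAE_S1,
	ARM_64_LPAE_S1,
	ARM_V7S,
	IO_PGTABLE_NUM_FMTS,
};

struct io_pgtable_init_fns {
	enum io_pgtable_fmt fmt;
	void *(*alloc)(void *cfg, void *cookie);
	void (*free)(void *iop);
};

/* Registry filled at boot or module-insertion time, replacing the
 * compile-time static array of init functions. */
static struct io_pgtable_init_fns *io_pgtable_init_table[IO_PGTABLE_NUM_FMTS];

static int io_pgtable_ops_register(struct io_pgtable_init_fns *fns)
{
	if (fns->fmt >= IO_PGTABLE_NUM_FMTS || io_pgtable_init_table[fns->fmt])
		return -1;	/* invalid format or already registered */
	io_pgtable_init_table[fns->fmt] = fns;
	return 0;
}

static void io_pgtable_ops_unregister(struct io_pgtable_init_fns *fns)
{
	if (fns->fmt < IO_PGTABLE_NUM_FMTS &&
	    io_pgtable_init_table[fns->fmt] == fns)
		io_pgtable_init_table[fns->fmt] = NULL;
}

/* alloc_io_pgtable_ops() would look the format up at runtime; a NULL
 * result is where the "not loaded yet vs. never available" ambiguity
 * raised later in this thread shows up. */
static struct io_pgtable_init_fns *io_pgtable_lookup(enum io_pgtable_fmt fmt)
{
	return fmt < IO_PGTABLE_NUM_FMTS ? io_pgtable_init_table[fmt] : NULL;
}
```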
Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-23 05:05, Robin Murphy wrote: On 2020-12-22 19:49, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. Okay, I'll fix up the commit message, as well as the code, so that it only depends on io-pgtable-arm. Well, IIUC it would make sense to keep the softdep for when the v7s module *is* present; I just wanted to clarify that it's more of a nice-to-have rather than a necessity. Robin. Understood, I will keep it there and reword the commit msg. I just tried it out in an environment where the io-pgtable-arm-v7s module isn't present, and I didn't see any warnings or error messages, and the SMMU driver module was loaded properly, so yes, it's good to have it. Thanks, Isaac Thanks, Isaac for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. 
Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");
Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt= ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. 
Ok, I can fix this accordingly. + return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt= ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S2, + .alloc = arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_MALI_LPAE, + .alloc = 
arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -
Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. Okay, I'll fix up the commit message, as well as the code, so that it only depends on io-pgtable-arm. Thanks, Isaac for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");
Re: [PATCH 1/3] iommu/io-pgtable-arm: Prepare for modularization
On 2020-12-21 07:22, Robin Murphy wrote: On 2020-12-18 18:59, isa...@codeaurora.org wrote: On 2020-12-18 04:38, Robin Murphy wrote: On 2020-12-18 08:38, Isaac J. Manjarres wrote: The io-pgtable-arm and io-pgtable-arm-v7s source files will be compiled as separate modules, along with the io-pgtable source. Export the symbols for the io-pgtable init function structures for the io-pgtable module to use. In my current build tree, the io-pgtable glue itself is a whopping 379 bytes of code and data - is there really any benefit to all the additional overhead of making that modular? Given the number of different users (including AMD now), I think at this point we should start considering this as part of the IOMMU core, and just tweak the interface such that formats can register their own init_fns dynamically instead of the static array that's always horrible. Robin. Thanks for the feedback, Robin. This is an avenue I had explored a bit when modularizing the code. However, I came up with a few problems that I couldn't get around. 1) If we leave the io-pgtable glue as part of the core kernel, we need to ensure that the io-pgtable format modules get loaded prior to any driver that might use them (e.g. IOMMU drivers/other callers of alloc_io_pgtable_ops). a) This can get a bit messy, as there's no symbol dependencies between the callers of the io-pgtable code, and the page table format modules, since everything is through function pointers. This is handled for the IOMMU drivers through the devlink feature, but I don't see how we can leverage something like that here. I guess this isn't too much of a problem when everything is built-in, as the registration can happen in one of the earlier initcall levels. 
b) If we do run into a scenario where a client of io-pgtable tries to allocate a page table instance prior to the io-pgtable format module being loaded, I couldn't come up with a way of distinguishing between format module is not available at the moment vs format module will never be available. I don't think returning EPROBE_DEFER would be something nice to do in that case. Urgh, I see... yes, the current approach does work out as an unexpectedly neat way to avoid many of the pitfalls. However I'm not sure it actually avoids all of them - say you have a config like this: IPMMU_VMSA=y -> IO_PGTABLE_ARM_LPAE=y -> IO_PGTABLE=y MTK_IOMMU=m -> IO_PGTABLE_ARMV7S=m won't that still fail to link io-pgtable.o? Yes, you are correct, that would be problematic. 2) We would have to ensure that the format module cannot be unloaded while other clients are using it. I suppose this isn't as big as point #1 though, since it's something that can probably be handled through a similar ref count mechanism that we're using for modular IOMMU drivers. FWIW I think that would come out in the wash from resolving 1b - I'd assume there would have to be some sort of module_get() in there somewhere. I should probably go and look at how the crypto API handles its modular algorithms for more inspiration... So I looked through the crypto dir, and it seems like they--along with a few other kernel drivers--are using MODULE_SOFTDEP() to sort out these dependencies. Given the two reasons above, I went with the current approach, since it avoids both issues by creating symbol dependencies between client drivers, the io-pgtable drivers, and the io-pgtable format drivers, so that ensures that they are loaded in the correct order, and also prevents them from being removed, unless there aren't any users present. 
Having thought all that over, I'm now wondering what we really gain from this either way - if vendors can build and ship SoC-tailored configs, then they can already turn off the formats they don't care about. If the aim is to ship a single config everywhere, then you'll still have to provision and load all possible formats on any system that needs any one of them, thanks to those "convenient" symbol dependencies. The promise in the cover letter doesn't seem to materialise :/

Robin.

Given the feedback, this makes sense. I've come up with a second version of the patches that leaves the io-pgtable code in the kernel and allows the formats to be modules, which better achieves what the cover letter is trying to express :) I believe that with the second patch, we should be able to get to a place where the kernel just needs to provide io-pgtable, while vendors can provide io-pgtable-arm, io-pgtable-arm-v7s, or both, as needed. Here are the patches:

https://lore.kernel.org/linux-iommu/1608597876-32367-1-git-send-email-isa...@codeaurora.org/T/#t

Thanks,
Isaac
Re: [PATCH 1/3] iommu/io-pgtable-arm: Prepare for modularization
On 2020-12-18 04:38, Robin Murphy wrote:
On 2020-12-18 08:38, Isaac J. Manjarres wrote:

The io-pgtable-arm and io-pgtable-arm-v7s source files will be compiled as separate modules, along with the io-pgtable source. Export the symbols for the io-pgtable init function structures for the io-pgtable module to use.

In my current build tree, the io-pgtable glue itself is a whopping 379 bytes of code and data - is there really any benefit to all the additional overhead of making that modular? Given the number of different users (including AMD now), I think at this point we should start considering this as part of the IOMMU core, and just tweak the interface such that formats can register their own init_fns dynamically, instead of the static array that's always horrible.

Robin.

Thanks for the feedback, Robin. This is an avenue I had explored a bit when modularizing the code. However, I ran into a few problems that I couldn't get around.

1) If we leave the io-pgtable glue as part of the core kernel, we need to ensure that the io-pgtable format modules get loaded prior to any driver that might use them (e.g. IOMMU drivers and other callers of alloc_io_pgtable_ops).

a) This can get a bit messy, as there are no symbol dependencies between the callers of the io-pgtable code and the page table format modules, since everything goes through function pointers. This is handled for the IOMMU drivers through the devlink feature, but I don't see how we can leverage something like that here. I guess this isn't too much of a problem when everything is built in, as the registration can happen in one of the earlier initcall levels.

b) If we do run into a scenario where a client of io-pgtable tries to allocate a page table instance before the io-pgtable format module is loaded, I couldn't come up with a way of distinguishing between "the format module is not available at the moment" and "the format module will never be available".
I don't think returning EPROBE_DEFER would be a nice thing to do in that case.

2) We would have to ensure that the format module cannot be unloaded while other clients are using it. I suppose this isn't as big an issue as point #1, though, since it can probably be handled through a refcount mechanism similar to the one we're using for modular IOMMU drivers.

Given the two reasons above, I went with the current approach: it avoids both issues by creating symbol dependencies between the client drivers, the io-pgtable code, and the io-pgtable format drivers, which ensures that they are loaded in the correct order and also prevents them from being removed unless there aren't any users present.

Thanks,
Isaac

Signed-off-by: Isaac J. Manjarres
---
 drivers/iommu/io-pgtable-arm-v7s.c | 4
 drivers/iommu/io-pgtable-arm.c | 8
 2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 1d92ac9..f062c1c 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -839,6 +840,7 @@ struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
 	.alloc = arm_v7s_alloc_pgtable,
 	.free = arm_v7s_free_pgtable,
 };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_v7s_init_fns);

 #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
@@ -984,3 +986,5 @@ static int __init arm_v7s_do_selftests(void)
 }
 subsys_initcall(arm_v7s_do_selftests);
 #endif
+
+MODULE_LICENSE("GPL v2");

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58..2623d57 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1047,26 +1048,31 @@ struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
 	.alloc = arm_64_lpae_alloc_pgtable_s1,
 	.free = arm_lpae_free_pgtable,
 };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_64_lpae_s1_init_fns);

 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
 	.alloc = arm_64_lpae_alloc_pgtable_s2,
 	.free = arm_lpae_free_pgtable,
 };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_64_lpae_s2_init_fns);

 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
 	.alloc = arm_32_lpae_alloc_pgtable_s1,
 	.free = arm_lpae_free_pgtable,
 };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_32_lpae_s1_init_fns);

 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
 	.alloc = arm_32_lpae_alloc_pgtable_s2,
 	.free = arm_lpae_free_pgtable,
 };
+EXPORT_SYMBOL_GPL(io_pgtable_arm_32_lpae_s2_init_fns);

 struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
 	.alloc = arm_mali_lpae_alloc_pgtable,
 	.free = arm_lpae_free_pgtable,
 };
+EXPORT_SYMBOL_GPL
Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm
On 2020-02-17 08:03, Robin Murphy wrote:
On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote:
From: Liam Mark

Using the best-fit algorithm, instead of the first-fit algorithm, may reduce fragmentation when allocating IOVAs.

What kind of pathological allocation patterns make that a serious problem? Is there any scope for simply changing the order of things in the callers? Do these drivers also run under other DMA API backends (e.g. 32-bit Arm)?

The usecases where the IOVA space has been fragmented have non-deterministic allocation patterns, so it is not feasible to change the allocation order to avoid fragmenting the IOVA space. From what we've observed, the usecases involve allocations of two types of buffers: one type between 1 KB and 4 MB in size, and another type between 1 KB and 400 MB in size.

The pathological scenarios seem to arise when there are many (100+) randomly distributed non-power-of-two allocations, which in some cases leave behind holes of up to 100+ MB in the IOVA space. Here are some examples that show the state of the IOVA space when an IOVA allocation failed:

Instance 1:
  Currently mapped total size : ~1.3 GB
  Free space available : ~2 GB
  Map for ~162 MB fails.
  Max contiguous space available : < 162 MB

Instance 2:
  Currently mapped total size : ~950 MB
  Free space available : ~2.3 GB
  Map for ~320 MB fails.
  Max contiguous space available : ~189 MB

Instance 3:
  Currently mapped total size : ~1.2 GB
  Free space available : ~2.7 GB
  Map for ~162 MB fails.
  Max contiguous space available : < 162 MB

We are still in the process of collecting data with the best-fit algorithm enabled, to provide some numbers showing that it results in less IOVA space fragmentation. To answer your question about whether this driver runs under other DMA API backends: yes, such as 32-bit Arm.
More generally, if a driver knows enough to want to second-guess a generic DMA API allocator, that's a reasonable argument that it should perhaps be properly IOMMU-aware and managing its own address space anyway. Perhaps this effort might be better spent finishing off the DMA ops bypass stuff to make that approach more robust and welcoming.

Robin.

Signed-off-by: Isaac J. Manjarres
---
 drivers/iommu/dma-iommu.c | 17 +++
 drivers/iommu/iova.c | 73 +--
 include/linux/dma-iommu.h | 7 +
 include/linux/iova.h | 1 +
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5..af08770 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct device *dev,
 	if (unlikely(ops->is_attach_deferred &&
 			ops->is_attach_deferred(domain, dev)))
 		return iommu_attach_device(domain, dev);
+	return 0;
+}
+
+/*
+ * Should be called prior to using dma-apis.
+ */
+int iommu_dma_enable_best_fit_algo(struct device *dev)
+{
+	struct iommu_domain *domain;
+	struct iova_domain *iovad;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain || !domain->iova_cookie)
+		return -EINVAL;
+
+	iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+	iovad->best_fit = true;
 	return 0;
 }
+EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);

 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..716b05f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
 	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
 	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+	iovad->best_fit = false;
 	init_iova_rcaches(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 	return -ENOMEM;
 }

+static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad,
+		unsigned long size, unsigned long limit_pfn,
+		struct iova *new, bool size_aligned)
+{
+	struct rb_node *curr, *prev;
+	struct iova *curr_iova, *prev_iova;
+	unsigned long flags;
+	unsigned long align_mask = ~0UL;
+	struct rb_node *candidate_rb_parent;
+	unsigned long new_pfn, candidate_pfn = ~0UL;
+	unsigned long gap, candidate_gap = ~0UL;
+
+	if (size_aligned)
+		align_mask <<= limit_align(iovad, fls_long(size - 1));
+
+	/* Walk the tree backwards */
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+	curr = &iov
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On 2020-02-19 03:15, Will Deacon wrote:
On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote:

> On 2020-02-17 07:50, Robin Murphy wrote:
> > On 17/02/2020 8:01 am, Christoph Hellwig wrote:
> > > On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:
> > > > From: Liam Mark
> > > >
> > > > Some devices have a memory map which contains gaps or holes.
> > > > In order for the device to have as much IOVA space as possible,
> > > > allow its driver to inform the DMA-IOMMU layer that it should
> > > > not allocate addresses from these holes.
> > >
> > > Layering violation. dma-iommu is the translation layer between the
> > > DMA API and the IOMMU API. And calls into it from drivers performing
> > > DMA mappings need to go through the DMA API (and be documented there).
> >
> > +1
> >
> > More than that, though, we already have "holes in the address space"
> > support for the sake of PCI host bridge windows - assuming this is the
> > same kind of thing (i.e. the holes are between memory regions and
> > other resources in PA space, so are only relevant once address
> > translation comes into the picture), then this is IOMMU API level
>
> To make sure that we're on the same page, this support alludes to the
> handling in dma-iommu.c that reserves portions of the IOVA space for the
> PCI host bridge windows, correct? If so, then yes, this is similar.
>
> > stuff, so even a DMA API level interface would be inappropriate.
>
> Does this mean that the driver should be managing the IOVA space and
> mappings for this device using the IOMMU API? If so, is the rationale
> for this because the device driver can have the information of what IOVA
> ranges can and cannot be used? Shouldn't there be a generic way of
> informing an IOMMU driver about these reserved ranges? Perhaps through
> a device tree property, instead of deferring this type of management to
> the driver?
Before we dive into designing that, can you please clarify whether the reserved IOVA range applies to all DMA masters mastering through a particular SMMU, or whether it's just about one specific master? I was assuming the former, but wanted to be sure.

This situation currently applies to one master.

Thanks,
Will

Thanks,
Isaac

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On 2020-02-17 07:50, Robin Murphy wrote:
On 17/02/2020 8:01 am, Christoph Hellwig wrote:
On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:
From: Liam Mark

Some devices have a memory map which contains gaps or holes. In order for the device to have as much IOVA space as possible, allow its driver to inform the DMA-IOMMU layer that it should not allocate addresses from these holes.

Layering violation. dma-iommu is the translation layer between the DMA API and the IOMMU API. And calls into it from drivers performing DMA mappings need to go through the DMA API (and be documented there).

+1

More than that, though, we already have "holes in the address space" support for the sake of PCI host bridge windows - assuming this is the same kind of thing (i.e. the holes are between memory regions and other resources in PA space, so are only relevant once address translation comes into the picture), then this is IOMMU API level

To make sure that we're on the same page, this support alludes to the handling in dma-iommu.c that reserves portions of the IOVA space for the PCI host bridge windows, correct? If so, then yes, this is similar.

stuff, so even a DMA API level interface would be inappropriate.

Does this mean that the driver should be managing the IOVA space and mappings for this device using the IOMMU API? If so, is the rationale for this because the device driver can have the information of what IOVA ranges can and cannot be used? Shouldn't there be a generic way of informing an IOMMU driver about these reserved ranges? Perhaps through a device tree property, instead of deferring this type of management to the driver?

Robin.
Re: [PATCH] iommu/dma: Add support for DMA_ATTR_SYS_CACHE
On 2019-10-25 22:30, Christoph Hellwig wrote:

The definition makes very little sense.

Can you please clarify what part doesn't make sense, and why? This is really just an extension of this patch that got mainlined, so that clients that use the DMA API can use IOMMU_QCOM_SYS_CACHE as well: https://patchwork.kernel.org/patch/10946099/

And without a user in the same series it is a complete no-go anyway.

IOMMU_QCOM_SYS_CACHE does not have any current users in mainline, nor did it in the patch series in which it got merged, yet it is still present. Furthermore, there are plans to upstream support for one of our SoCs that may benefit from this, as seen here: https://www.spinics.net/lists/iommu/msg39608.html

Thanks,
Isaac