Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 06/09/2024 11:56 am, Will Deacon wrote:

On Thu, Sep 05, 2024 at 05:27:28PM +0100, Robin Murphy wrote:

On 05/09/2024 4:53 pm, Will Deacon wrote:

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Hmm, yup. Looks like Rob has come back to say the problem lies elsewhere anyway. One thing below though...

Will

--->8

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0e67f1721a3d..0a32e9499e2c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 		/* Clear the remaining entries */
 		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
 
-		if (gather && !iommu_iotlb_gather_queued(gather))
+		if (!iommu_iotlb_gather_queued(gather))

Note that this would reintroduce the latent issue which was present originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we actually allow a NULL gather to be passed to io_pgtable_tlb_add_page() it may end up being dereferenced (e.g. in arm-smmu-v3).

I think there is still something to fix here. arm_lpae_init_pte() can pass a NULL gather to __arm_lpae_unmap() and I don't think skipping the invalidation is correct in that case. Either the drivers need to handle that or we shouldn't be passing NULL. What do you think?

The subtlety there is that in that case it's always a non-leaf PTE, so all that goes back to the driver is io_pgtable_tlb_flush_walk() and the gather is never used.

Thanks,
Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 2024-09-05 6:10 pm, Rob Clark wrote:

On Thu, Sep 5, 2024 at 10:00 AM Rob Clark wrote:

On Thu, Sep 5, 2024 at 9:27 AM Robin Murphy wrote:

On 05/09/2024 4:53 pm, Will Deacon wrote:

Hi Rob,

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Yeah.. and unfortunately the TBU code only supports two devices so far, so I can't easily repro with TBU enabled atm.

Hmm.. __arm_lpae_unmap() is also called in the ->map() path, although not sure how that changes things.

Ok, an update.. after a reboot, still with this patch reverted, I once again see faults. So I guess that vindicates the original patch, and leaves me still searching..
fwiw, fault info from the gpu devcore:

fault-info:
 - ttbr0=000919306000
 - iova=000100c17000
 - dir=WRITE
 - type=UNKNOWN
 - source=CP

pgtable-fault-info:
 - ttbr0: 00090ca4
 - asid: 0
 - ptes: 00095db47003 00095db48003 000914c8f003 0008fd7f0f47

the 'ptes' part shows the table walk, which looks ok to me..

But is it the right pagetable at all, given that the "ttbr0" values appear to be indicating different places?

Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 4:53 pm, Will Deacon wrote:

Hi Rob,

On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

Please can you try the diff below, instead?

Given that the GPU driver's .tlb_add_page is a no-op, I can't see this making a difference. In fact, given that msm_iommu_pagetable_unmap() still does a brute-force iommu_flush_iotlb_all() after io-pgtable returns, and in fact only recently made .tlb_flush_walk start doing anything either for the sake of the map path, I'm now really wondering how this patch has had any effect at all... :/

Will

--->8

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0e67f1721a3d..0a32e9499e2c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 		/* Clear the remaining entries */
 		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
 
-		if (gather && !iommu_iotlb_gather_queued(gather))
+		if (!iommu_iotlb_gather_queued(gather))
 			for (int j = 0; j < i; j++)
 				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);

Note that this would reintroduce the latent issue which was present originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we actually allow a NULL gather to be passed to io_pgtable_tlb_add_page() it may end up being dereferenced (e.g. in arm-smmu-v3).

Thanks,
Robin.
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 2:57 pm, Rob Clark wrote:

On Thu, Sep 5, 2024 at 6:24 AM Robin Murphy wrote:

On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

from my reading of the original patch, the ordering is the same for non-leaf nodes, but not for leaf nodes

But tlb_flush_walk is never called for leaf entries either way, so no such ordering exists... And the non-leaf path still calls __arm_lpae_clear_pte() and io_pgtable_tlb_add_page() in the same order relative to each other too, it's just now done for the whole range in one go, after any non-leaf entries have already been dealt with.

Thanks,
Robin.

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

I guess I could try the TBU driver.
But I already had my patchset to extract the pgtable walk for gpu devcore dump, and that is telling me that the CPU view of the pgtable is fine. Which I think just leaves a tlbinv problem. If that is the case, swapping the order of leaf node cpu cache ops and tlbinv ops seems like the cause. But maybe I'm missing something.

BR,
-R

Thanks,
Robin.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 2:24 pm, Robin Murphy wrote:

On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

...or at least I *thought* I'd just commented, but Thunderbird is now denying any knowledge of the reply to your other mail I swore I sent a couple of hours ago :/ Apologies for any confusion,

Robin.

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

Thanks,
Robin.
Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-		if (gather && !iommu_iotlb_gather_queued(gather))
-			for (int j = 0; j < i; j++)
-				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+			ptep++;
+			i++;
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"
On 05/09/2024 1:49 pm, Rob Clark wrote:

From: Rob Clark

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0. It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable memory) and io_pgtable_tlb_flush_walk().

As I just commented, how do you believe the order of operations between:

	__arm_lpae_clear_pte();
	if (!iopte_leaf()) {
		io_pgtable_tlb_flush_walk();

and:

	if (!iopte_leaf()) {
		__arm_lpae_clear_pte();
		io_pgtable_tlb_flush_walk();

fundamentally differs?

I'm not saying there couldn't be some subtle bug in the implementation which we've all missed, but I still can't see an issue with the intended logic.

I'm not entirely sure how this patch is supposed to work correctly in the face of other concurrent translations (to buffers unrelated to the one being unmapped), because after the io_pgtable_tlb_flush_walk() we can have stale data read back into the tlb.

Read back from where? The ex-table PTE which was already set to zero before tlb_flush_walk was called? And isn't the hilariously overcomplicated TBU driver supposed to be telling you exactly what happened here? Otherwise I'm going to continue to seriously question the purpose of shoehorning that upstream at all...

Thanks,
Robin.
Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 			sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
-
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1,
 							iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-		if (gather && !iommu_iotlb_gather_queued(gather))
-			for (int j = 0; j < i; j++)
-				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+			ptep++;
+			i++;
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk
On 23/05/2024 6:52 pm, Rob Clark wrote:

From: Rob Clark

Add an io-pgtable method to walk the pgtable returning the raw PTEs that would be traversed for a given iova access.

Have to say I'm a little torn here - with my iommu-dma hat on I'm not super enthusiastic about adding any more overhead to iova_to_phys, but in terms of maintaining io-pgtable I do like the overall shape of the implementation... Will, how much would you hate a compromise of inlining iova_to_phys as the default walk behaviour if cb is NULL? :)

That said, looking at the unmap figures for dma_map_benchmark on a Neoverse N1, any difference I think I see is still well within the noise, so maybe a handful of extra indirect calls isn't really enough to worry about?

Cheers,
Robin.

Signed-off-by: Rob Clark
---
 drivers/iommu/io-pgtable-arm.c | 51 --
 include/linux/io-pgtable.h     |  4 +++
 2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f7828a7aad41..f47a0e64bb35 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iov
 			data->start_level, ptep);
 }
 
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-					 unsigned long iova)
+static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long iova,
+		int (*cb)(void *cb_data, void *pte, int level),
+		void *cb_data)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte pte, *ptep = data->pgd;
 	int lvl = data->start_level;
+	int ret;
 
 	do {
 		/* Valid IOPTE pointer? */
 		if (!ptep)
-			return 0;
+			return -EFAULT;
 
 		/* Grab the IOPTE we're interested in */
 		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
@@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 
 		/* Valid entry? */
 		if (!pte)
-			return 0;
+			return -EFAULT;
+
+		ret = cb(cb_data, &pte, lvl);
+		if (ret)
+			return ret;
 
-		/* Leaf entry? */
+		/* Leaf entry? If so, we've found the translation */
 		if (iopte_leaf(pte, lvl, data->iop.fmt))
-			goto found_translation;
+			return 0;
 
 		/* Take it to the next level */
 		ptep = iopte_deref(pte, data);
 	} while (++lvl < ARM_LPAE_MAX_LEVELS);
 
 	/* Ran out of page tables to walk */
+	return -EFAULT;
+}
+
+struct iova_to_phys_walk_data {
+	arm_lpae_iopte pte;
+	int level;
+};
+
+static int iova_to_phys_walk_cb(void *cb_data, void *pte, int level)
+{
+	struct iova_to_phys_walk_data *d = cb_data;
+
+	d->pte = *(arm_lpae_iopte *)pte;
+	d->level = level;
+
+	return 0;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+					 unsigned long iova)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct iova_to_phys_walk_data d;
+	int ret;
+
+	ret = arm_lpae_pgtable_walk(ops, iova, iova_to_phys_walk_cb, &d);
+	if (ret)
+		return 0;
 
-found_translation:
-	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-	return iopte_to_paddr(pte, data) | iova;
+	iova &= (ARM_LPAE_BLOCK_SIZE(d.level, data) - 1);
+	return iopte_to_paddr(d.pte, data) | iova;
 }
 
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
@@ -807,6 +839,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.map_pages	= arm_lpae_map_pages,
 		.unmap_pages	= arm_lpae_unmap_pages,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
+		.pgtable_walk	= arm_lpae_pgtable_walk,
 	};
 
 	return data;

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 86cf1f7ae389..261b48af068a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -177,6 +177,7 @@ struct io_pgtable_cfg {
  * @map_pages:		Map a physically contiguous range of pages of the same size.
  * @unmap_pages:	Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys:	Translate iova to physical address.
+ * @pgtable_walk:	(optional) Perform a page table walk for a given iova.
  *
  * These functions map directly onto the iommu_ops member functions with
  * the same names.
@@ -190,6 +191,9 @@ struct io_pgtable_ops {
 				struct iommu_iotlb_gather *gather);
 	phys_addr_t	(*iova_to_phys)(struct io_pgtable_ops *ops,
Re: [PATCH v2] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 2023-12-10 6:06 pm, Rob Clark wrote:

From: Rob Clark

In some cases the firmware expects cbndx 1 to be assigned to the GMU, so we also want the default domain for the GMU to be an identity domain. This way it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.

Reviewed-by: Robin Murphy
Cc: sta...@vger.kernel.org
Signed-off-by: Rob Clark
---
I didn't add a fixes tag because really this issue has been there all along, but either didn't matter with other firmware or we didn't notice the problem.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,adreno" },
+	{ .compatible = "qcom,adreno-gmu" },
 	{ .compatible = "qcom,mdp4" },
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
Re: [PATCH] iommu/arm-smmu-qcom: Add missing GMU entry to match table
On 07/12/2023 9:24 pm, Rob Clark wrote:

From: Rob Clark

We also want the default domain for the GMU to be an identity domain, so it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and cbndx 2.

I can't help but read this as implying that it gets attached to both *at the same time*, which would be indicative of a far more serious problem in the main driver and/or IOMMU core code. However, from what we discussed on IRC last night, it sounds like the key point here is more straightforwardly that firmware expects the GMU to be using context bank 1, in a vaguely similar fashion to how context bank 0 is special for the GPU. Clarifying that would help explain why we're just doing this as a trick to influence the allocator (i.e. unlike some of the other devices in this list we don't actually need the properties of the identity domain itself).

In future it might be nice to reserve this explicitly on platforms which need it and extend qcom_adreno_smmu_alloc_context_bank() to handle the GMU as well, but I don't object to this patch as an immediate quick fix for now, especially as something nice and easy for stable (I'd agree with Johan in that regard).

Thanks,
Robin.

This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.
Signed-off-by: Rob Clark
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 549ae4dba3a6..d326fa230b96 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -243,6 +243,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,adreno" },
+	{ .compatible = "qcom,adreno-gmu" },
 	{ .compatible = "qcom,mdp4" },
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
Re: [PATCH 1/3] iommu/msm-iommu: don't limit the driver too much
On 07/12/2023 12:54 pm, Dmitry Baryshkov wrote:

In preparation of dropping most of ARCH_QCOM subtypes, stop limiting the driver just to those machines. Allow it to be built for any 32-bit Qualcomm platform (ARCH_QCOM).

Acked-by: Robin Murphy

Unless Joerg disagrees, I think it should be fine if you want to take this via the SoC tree.

Thanks,
Robin.

Signed-off-by: Dmitry Baryshkov
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b..fd67f586f010 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -178,7 +178,7 @@ config FSL_PAMU
 config MSM_IOMMU
 	bool "MSM IOMMU Support"
 	depends on ARM
-	depends on ARCH_MSM8X60 || ARCH_MSM8960 || COMPILE_TEST
+	depends on ARCH_QCOM || COMPILE_TEST
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_ARMV7S
 	help
Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 29/09/2023 4:45 pm, Will Deacon wrote:

On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform).

Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving them anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is?

I think it was mainly because the quirk doesn't make sense for a coherent page-table walker and we could in theory use that bit for something else in that case.

Yuck, even if we did want some horrible notion of quirks being conditional on parts of the config rather than just the format, then the users would need to be testing for the same condition as the pagetable code itself (i.e. cfg->coherent_walk), rather than hoping some other property of something else indirectly reflects the right information - e.g. there'd be no hope of backporting this particular bodge before 5.19 where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned true, and in future we could conceivably support coherent SMMUs being configured for non-coherent walks on a per-domain basis.

Furthermore, if we did overload a flag to have multiple meanings, then we'd have no way of knowing which one the caller was actually expecting, thus the illusion of being able to validate calls in the meantime isn't necessarily as helpful as it seems, particularly in a case where the "wrong" interpretation would be to have no effect anyway. Mostly though I'd hope that if we ever got anywhere near the point of running out of quirk bits we'd have already realised that it's time for a better interface :(

Based on that, I think that when I do get round to needing to touch this code, I'll propose just streamlining the whole quirk.

Cheers,
Robin.
Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU
On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the coherent SMMUs (like we have on sm8350 platform).

Hmm, but is it right that it should fail in the first place? The fact is that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly can't see why the io-pgtable code is going out of its way to explicitly reject a request to give them the same attribute it's already giving them anyway :/ Even if the original intent was for the quirk to have an over-specific implication of representing inner-NC as well, that hardly seems useful if what we've ended up with in practice is a nonsensical-looking check in one place and then a weird hacky bodge in another purely to work around it. Does anyone know a good reason why this is the way it is?

[ just came across this code in the tree while trying to figure out what to do with iommu_set_pgtable_quirks()... ]

Thanks,
Robin.

Fixes: 54af0ceb7595 ("arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes")
Reported-by: David Heidelberg
Signed-off-by: Dmitry Baryshkov
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2942d2548ce6..f74495dcbd96 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1793,7 +1793,8 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 	 * This allows GPU to set the bus attributes required to use system
 	 * cache on behalf of the iommu page table walker.
 	 */
-	if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
+	if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice) &&
+	    !device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY))
 		quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
 
 	return adreno_iommu_create_address_space(gpu, pdev, quirks);
Re: [Freedreno] [PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()
On 2023-08-18 22:32, Jason Gunthorpe wrote: It turns out several drivers are calling of_dma_configure() outside the expected bus_type.dma_configure op. This ends up being mis-locked and triggers a lockdep assertion, for instance: iommu_probe_device_locked+0xd4/0xe4 of_iommu_configure+0x10c/0x200 of_dma_configure_id+0x104/0x3b8 a6xx_gmu_init+0x4c/0xccc [msm] a6xx_gpu_init+0x3ac/0x770 [msm] adreno_bind+0x174/0x2ac [msm] component_bind_all+0x118/0x24c msm_drm_bind+0x1e8/0x6c4 [msm] try_to_bring_up_aggregate_device+0x168/0x1d4 __component_add+0xa8/0x170 component_add+0x14/0x20 dsi_dev_attach+0x20/0x2c [msm] dsi_host_attach+0x9c/0x144 [msm] devm_mipi_dsi_attach+0x34/0xb4 lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc] lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc] i2c_device_probe+0x14c/0x290 really_probe+0x148/0x2b4 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0xa8/0x1b0 device_initial_probe+0x14/0x20 bus_probe_device+0xb0/0xb4 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x1ec/0x53c worker_thread+0x298/0x408 kthread+0x124/0x128 ret_from_fork+0x10/0x20 It is subtle and was never documented or enforced, but there has always been an assumption that of_dma_configure_id() is not concurrent. It makes several calls into the iommu layer that require this, including dev_iommu_get(). The majority of cases have been preventing concurrency using the device_lock(). Thus the new lock debugging added exposes an existing problem in drivers. On inspection this looks like a theoretical locking problem as generally the cases are already assuming they are the exclusive (single threaded) user of the target device. Sorry to be blunt, but the only problem is that you've introduced an idealistic new locking scheme which failed to take into account how things currently actually work, and is broken and achieving nothing but causing problems.
The solution is to drop those locking patches entirely and rethink the whole thing. When their sole purpose was to improve the locking and make it easier to reason about, and the latest "fix" is now to remove one of the assertions which forms the fundamental basis for that reasoning, then the point has clearly been lost. All we've done is churned a dodgy and incomplete locking scheme into a *different* dodgy and incomplete locking scheme. I do not think that continuing to dig in deeper is the way out of the hole... It's now rc7, and I have little confidence that there aren't still more latent problems which just haven't been hit yet (e.g. acpi_dma_configure() is also called in different contexts relative to the device lock, which is absolutely by design and not broken). And on the subject of idealism, the fact is that doing IOMMU configuration based on driver probe via bus->dma_configure is *fundamentally wrong* and breaking a bunch of other IOMMU API assumptions, so it is not a robust foundation to build anything upon in the first place. The problem it causes with broken groups has been known about for several years now, however it's needed a lot of work to get to the point of being able to fix it properly (FWIW that is now #2 on my priority list after getting the bus ops stuff done, which should also make it easier). Thanks, Robin. Sadly, there are deeper technical problems with all of the places doing this. There are several problematic patterns: 1) Probe a driver on device A and then steal device B and use it as part of the driver operation. Since no driver was probed to device B it means we never called bus_type.dma_configure and thus the drivers hackily try to open code this. Unfortunately nothing prevents another driver from binding to device B and creating total chaos. 
eg vfio bind triggered by userspace 2) Probe a driver on device A and then create a new platform driver B for a fwnode that doesn't have one, then do #1 This has the same essential problem as #1, the new device is never probed so the hack call to of_dma_configure() is needed to setup DMA, and we are at risk of something else trying to use the device. 3) Probe a driver on device A but the of_node was incorrect for DMA so fix it by figuring out the right node and calling of_dma_configure() This will blow up in the iommu code if the driver is unprobed because the bus_type now assumes that dma_configure and dma_cleanup are strictly paired. Since dma_configure will have done the wrong thing due to the missing of_node, dma_cleanup will be unpaired and iommu_device_unuse_default_domain() will blow up. Further the driver operating on device A will not be protected against changes to the iommu domain since it never called iommu_device_use_default_domain() At least this case will not throw a lockdep warning as pr
Re: [Freedreno] [PATCH 0/2] drm: Add component_match_add_of and convert users of drm_of_component_match_add
On 2022-12-16 17:08, Sean Anderson wrote: On 11/3/22 14:22, Sean Anderson wrote: This series adds a new function component_match_add_of to simplify the common case of calling component_match_add_release with component_release_of and component_compare_of. There is already drm_of_component_match_add, which allows for a custom compare function. However, all existing users just use component_compare_of (or an equivalent). I can split the second commit up if that is easier to review. Sean Anderson (2): component: Add helper for device nodes drm: Convert users of drm_of_component_match_add to component_match_add_of .../gpu/drm/arm/display/komeda/komeda_drv.c | 6 ++-- drivers/gpu/drm/arm/hdlcd_drv.c | 9 +- drivers/gpu/drm/arm/malidp_drv.c | 11 +-- drivers/gpu/drm/armada/armada_drv.c | 10 --- drivers/gpu/drm/drm_of.c | 29 +++ drivers/gpu/drm/etnaviv/etnaviv_drv.c | 4 +-- .../gpu/drm/hisilicon/kirin/kirin_drm_drv.c | 3 +- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 3 +- drivers/gpu/drm/mediatek/mtk_drm_drv.c| 4 +-- drivers/gpu/drm/msm/msm_drv.c | 14 - drivers/gpu/drm/sti/sti_drv.c | 3 +- drivers/gpu/drm/sun4i/sun4i_drv.c | 3 +- drivers/gpu/drm/tilcdc/tilcdc_external.c | 10 ++- drivers/iommu/mtk_iommu.c | 3 +- drivers/iommu/mtk_iommu_v1.c | 3 +- include/drm/drm_of.h | 12 include/linux/component.h | 9 ++ sound/soc/codecs/wcd938x.c| 6 ++-- 18 files changed, 46 insertions(+), 96 deletions(-) ping? Should I send a v2 broken up like Mark suggested? FWIW you'll need to rebase the IOMMU changes on 6.2-rc1 anyway - mtk_iommu stops using component_match_add_release() at all. Thanks, Robin.
Re: [Freedreno] [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
On 2022-08-22 14:52, Robin Murphy wrote: On 2022-08-21 19:19, Rob Clark wrote: From: Rob Clark Optimize TLB invalidation by using different ASID for each set of pgtables. There can be scenarios where multiple processes end up with the same ASID (such as >256 processes using the GPU), but this is harmless, it will only result in some over-invalidation (but less over-invalidation compared to using ASID=0 for all processes) Um, if you're still relying on the GPU doing an invalidate-all-by-ASID whenever it switches a TTBR, then there's only ever one ASID live in the TLBs at once, so it really doesn't matter whether its value stays the same or changes. This seems like a whole chunk of complexity to achieve nothing :/ If you could actually use multiple ASIDs in a meaningful way to avoid any invalidations, you'd need to do considerably more work to keep track of reuse, and those races would probably be a lot less benign. Oh, and on top of that, if you did want to go down that route then chances are you'd then also want to start looking at using global mappings in TTBR1 to avoid increased TLB pressure from kernel buffers, and then we'd run up against some pretty horrid MMU-500 errata which so far I've been happy to ignore on the basis that Linux doesn't use global mappings. Spoiler alert: unless you can additionally convert everything to invalidate by VA, the workaround for #752357 most likely makes the whole idea moot. Robin. 
Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/msm_iommu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index a54ed354578b..94c8c09980d1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size_t size) { struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct adreno_smmu_priv *adreno_smmu = + dev_get_drvdata(pagetable->parent->dev); struct io_pgtable_ops *ops = pagetable->pgtbl_ops; size_t unmapped = 0; @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size -= 4096; } - iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain); + adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid); return (unmapped == size) ? 0 : -EINVAL; } @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev, struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) { + static atomic_t asid = ATOMIC_INIT(1); struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev); struct msm_iommu *iommu = to_msm_iommu(parent); struct msm_iommu_pagetable *pagetable; @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr; /* - * TODO we would like each set of page tables to have a unique ASID - * to optimize TLB invalidation. But iommu_flush_iotlb_all() will - * end up flushing the ASID used for TTBR1 pagetables, which is not - * what we want. So for now just use the same ASID as TTBR1. + * ASID 0 is used for kernel mapped buffers in TTBR1, which we + * do not need to invalidate when unmapping from TTBR0 pgtables. + * The hw ASID is at *least* 8b, but can be 16b. 
We just assume + * the worst: */ pagetable->asid = 0; + while (!pagetable->asid) + pagetable->asid = atomic_inc_return(&asid) & 0xff; return &pagetable->base; }
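The ASID scheme in the patch above (a wrapping counter masked to 8 bits, skipping 0) can be sketched as a self-contained userspace model. This is a toy, not the kernel code: `alloc_asid()` is an invented name, C11 `atomic_fetch_add` stands in for the kernel's `atomic_inc_return`, and the real driver keeps the counter static inside `msm_iommu_pagetable_create()`.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace model of the patch's ASID allocator: a global counter
 * masked down to 8 bits (the minimum hardware ASID width), retrying
 * whenever the masked value is 0 because ASID 0 is reserved for the
 * kernel's TTBR1 mappings.  After >255 allocations, values repeat --
 * which, as the commit message notes, only causes harmless
 * over-invalidation, never a missed invalidation.
 */
static atomic_uint next_asid = 1;

static unsigned int alloc_asid(void)
{
	unsigned int asid = 0;

	while (!asid)
		asid = atomic_fetch_add(&next_asid, 1) & 0xff;
	return asid;
}
```

Note how the `while (!asid)` retry is what makes the wraparound safe: the counter itself never resets, only the masked value cycles through 1..255.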
Re: [Freedreno] [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
On 2022-08-21 19:19, Rob Clark wrote: From: Rob Clark Optimize TLB invalidation by using different ASID for each set of pgtables. There can be scenarios where multiple processes end up with the same ASID (such as >256 processes using the GPU), but this is harmless, it will only result in some over-invalidation (but less over-invalidation compared to using ASID=0 for all processes) Um, if you're still relying on the GPU doing an invalidate-all-by-ASID whenever it switches a TTBR, then there's only ever one ASID live in the TLBs at once, so it really doesn't matter whether its value stays the same or changes. This seems like a whole chunk of complexity to achieve nothing :/ If you could actually use multiple ASIDs in a meaningful way to avoid any invalidations, you'd need to do considerably more work to keep track of reuse, and those races would probably be a lot less benign. Robin. Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/msm_iommu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index a54ed354578b..94c8c09980d1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size_t size) { struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct adreno_smmu_priv *adreno_smmu = + dev_get_drvdata(pagetable->parent->dev); struct io_pgtable_ops *ops = pagetable->pgtbl_ops; size_t unmapped = 0; @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, size -= 4096; } - iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain); + adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid); return (unmapped == size) ? 
0 : -EINVAL; } @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev, struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) { + static atomic_t asid = ATOMIC_INIT(1); struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev); struct msm_iommu *iommu = to_msm_iommu(parent); struct msm_iommu_pagetable *pagetable; @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent) pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr; /* -* TODO we would like each set of page tables to have a unique ASID -* to optimize TLB invalidation. But iommu_flush_iotlb_all() will -* end up flushing the ASID used for TTBR1 pagetables, which is not -* what we want. So for now just use the same ASID as TTBR1. +* ASID 0 is used for kernel mapped buffers in TTBR1, which we +* do not need to invalidate when unmapping from TTBR0 pgtables. +* The hw ASID is at *least* 8b, but can be 16b. We just assume +* the worst: */ pagetable->asid = 0; + while (!pagetable->asid) + pagetable->asid = atomic_inc_return(&asid) & 0xff; return &pagetable->base; }
Re: [Freedreno] [PATCH v2 5/5] drm/msm: switch msm_kms_init_aspace() to use device_iommu_mapped()
On 2022-05-05 01:16, Dmitry Baryshkov wrote: Change msm_kms_init_aspace() to use generic function device_iommu_mapped() instead of the fwnode-specific interface dev_iommu_fwspec_get(). While we are at it, stop referencing platform_bus_type directly and use the bus of the IOMMU device. FWIW, I'd have squashed these changes across the previous patches, such that the dodgy fwspec calls are never introduced in the first place, but it's your driver, and if that's the way you want to work it and Rob's happy with it too, then fine by me. For the end result, Reviewed-by: Robin Murphy I'm guessing MDP4 could probably use msm_kms_init_aspace() now as well, but unless there's any other reason to respin this series, that's something we could do as a follow-up. Thanks for sorting this out! Robin. Suggested-by: Robin Murphy Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/msm_drv.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 98ae0036ab57..2fc3f820cd59 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -272,21 +272,21 @@ struct msm_gem_address_space *msm_kms_init_aspace(struct drm_device *dev) struct device *mdss_dev = mdp_dev->parent; struct device *iommu_dev; - domain = iommu_domain_alloc(&platform_bus_type); - if (!domain) { - drm_info(dev, "no IOMMU, fallback to phys contig buffers for scanout\n"); - return NULL; - } - /* * IOMMUs can be a part of MDSS device tree binding, or the * MDP/DPU device. */ - if (dev_iommu_fwspec_get(mdp_dev)) + if (device_iommu_mapped(mdp_dev)) iommu_dev = mdp_dev; else iommu_dev = mdss_dev; + domain = iommu_domain_alloc(iommu_dev->bus); + if (!domain) { + drm_info(dev, "no IOMMU, fallback to phys contig buffers for scanout\n"); + return NULL; + } + mmu = msm_iommu_new(iommu_dev, domain); if (IS_ERR(mmu)) { iommu_domain_free(domain);
Re: [Freedreno] [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage
On 2022-05-03 14:30, Dmitry Baryshkov wrote: On Tue, 3 May 2022 at 13:57, Robin Murphy wrote: On 2022-05-01 11:10, Dmitry Baryshkov wrote: Move iommu_domain_alloc() in front of address space/IOMMU initialization. This allows us to drop final bits of struct mdp5_cfg_platform which remained from the pre-DT days. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h | 6 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 6 -- 3 files changed, 4 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c index 1bf9ff5dbabc..714effb967ff 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c @@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = { { .revision = 3, .config = { .hw = &sdm630_config } }, }; -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev); - const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_handler) { return cfg_handler->config.hw; @@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, uint32_t major, uint32_t minor) { struct drm_device *dev = mdp5_kms->dev; - struct platform_device *pdev = to_platform_device(dev->dev); struct mdp5_cfg_handler *cfg_handler; const struct mdp5_cfg_handler *cfg_handlers; - struct mdp5_cfg_platform *pconfig; int i, ret = 0, num_handlers; cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL); @@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, cfg_handler->revision = minor; cfg_handler->config.hw = mdp5_cfg; - pconfig = mdp5_get_config(pdev); - memcpy(&cfg_handler->config.platform, pconfig, sizeof(*pconfig)); - DBG("MDP5: %s hw config selected", mdp5_cfg->name); return cfg_handler; @@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, return ERR_PTR(ret); } - -static 
struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev) -{ - static struct mdp5_cfg_platform config = {}; - - config.iommu = iommu_domain_alloc(&platform_bus_type); - - return &config; -} diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h index 6b03d7899309..c2502cc33864 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h @@ -104,14 +104,8 @@ struct mdp5_cfg_hw { uint32_t max_clk; }; -/* platform config data (ie. from DT, or pdata) */ -struct mdp5_cfg_platform { - struct iommu_domain *iommu; -}; - struct mdp5_cfg { const struct mdp5_cfg_hw *hw; - struct mdp5_cfg_platform platform; }; struct mdp5_kms; diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 9b7bbc3adb97..1c67c2c828cd 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev) struct msm_gem_address_space *aspace; int irq, i, ret; struct device *iommu_dev; + struct iommu_domain *iommu; ret = mdp5_init(to_platform_device(dev->dev), dev); @@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev) } mdelay(16); - if (config->platform.iommu) { + iommu = iommu_domain_alloc(&platform_bus_type); To preempt the next change down the line as well, could this be rearranged to work as iommu_domain_alloc(iommu_dev->bus)? I'd prefer to split this into the separate change, if you don't mind. Oh, for sure, divide the patches however you see fit - I'm just hoping to save your time overall by getting all the IOMMU-related refactoring done now as a single series rather than risk me coming back and breaking things again in a few months :) Cheers, Robin. 
+ if (iommu) { struct msm_mmu *mmu; iommu_dev = &pdev->dev; if (!dev_iommu_fwspec_get(iommu_dev)) The fwspec helpers are more of an internal thing between the IOMMU drivers and the respective firmware code - I'd rather that external API users stuck consistently to using device_iommu_mapped() (it should give the same result). Let me check that it works correctly and spin a v2 afterwards. Otherwise, thanks for sorting this out! Robin. iommu_dev = iommu_dev->parent; - mmu = msm_iommu_new(iommu_dev, config->platform.iommu); + mmu = msm_iommu_new(iommu_dev, iommu); aspace = msm_gem_address_space_create(mmu, "mdp5", 0x1000, 0x1 - 0x1000);
Re: [Freedreno] [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage
On 2022-05-01 11:10, Dmitry Baryshkov wrote: Move iommu_domain_alloc() in front of address space/IOMMU initialization. This allows us to drop final bits of struct mdp5_cfg_platform which remained from the pre-DT days. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h | 6 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 6 -- 3 files changed, 4 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c index 1bf9ff5dbabc..714effb967ff 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c @@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = { { .revision = 3, .config = { .hw = &sdm630_config } }, }; -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev); - const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_handler) { return cfg_handler->config.hw; @@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, uint32_t major, uint32_t minor) { struct drm_device *dev = mdp5_kms->dev; - struct platform_device *pdev = to_platform_device(dev->dev); struct mdp5_cfg_handler *cfg_handler; const struct mdp5_cfg_handler *cfg_handlers; - struct mdp5_cfg_platform *pconfig; int i, ret = 0, num_handlers; cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL); @@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, cfg_handler->revision = minor; cfg_handler->config.hw = mdp5_cfg; - pconfig = mdp5_get_config(pdev); - memcpy(&cfg_handler->config.platform, pconfig, sizeof(*pconfig)); - DBG("MDP5: %s hw config selected", mdp5_cfg->name); return cfg_handler; @@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms, return ERR_PTR(ret); } - -static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev) -{ - static struct 
mdp5_cfg_platform config = {}; - - config.iommu = iommu_domain_alloc(&platform_bus_type); - - return &config; -} diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h index 6b03d7899309..c2502cc33864 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h @@ -104,14 +104,8 @@ struct mdp5_cfg_hw { uint32_t max_clk; }; -/* platform config data (ie. from DT, or pdata) */ -struct mdp5_cfg_platform { - struct iommu_domain *iommu; -}; - struct mdp5_cfg { const struct mdp5_cfg_hw *hw; - struct mdp5_cfg_platform platform; }; struct mdp5_kms; diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 9b7bbc3adb97..1c67c2c828cd 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev) struct msm_gem_address_space *aspace; int irq, i, ret; struct device *iommu_dev; + struct iommu_domain *iommu; ret = mdp5_init(to_platform_device(dev->dev), dev); @@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev) } mdelay(16); - if (config->platform.iommu) { + iommu = iommu_domain_alloc(&platform_bus_type); To preempt the next change down the line as well, could this be rearranged to work as iommu_domain_alloc(iommu_dev->bus)? + if (iommu) { struct msm_mmu *mmu; iommu_dev = &pdev->dev; if (!dev_iommu_fwspec_get(iommu_dev)) The fwspec helpers are more of an internal thing between the IOMMU drivers and the respective firmware code - I'd rather that external API users stuck consistently to using device_iommu_mapped() (it should give the same result). Otherwise, thanks for sorting this out! Robin. iommu_dev = iommu_dev->parent; - mmu = msm_iommu_new(iommu_dev, config->platform.iommu); + mmu = msm_iommu_new(iommu_dev, iommu); aspace = msm_gem_address_space_create(mmu, "mdp5", 0x1000, 0x1 - 0x1000);
Re: [Freedreno] [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"
On 2022-04-19 22:08, Dmitry Baryshkov wrote: On 20/04/2022 00:04, Robin Murphy wrote: On 2022-04-19 14:04, Dmitry Baryshkov wrote: This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit in question makes msm_use_mmu() check whether the DRM 'component master' device is translated by the IOMMU. At this moment it is the 'mdss' device. However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016, MSM8996/APQ8096) it's the mdp5 device, which has the iommus property (and thus is "translated by the IOMMU"). This results in these devices being broken with the following lines in the dmesg. [drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0 msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw from new location msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw from new location msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: could not allocate stolen bo msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to allocate buffer object msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb Getting the mdp5 device pointer from this function is not that easy at this moment. Thus this patch is reverted till the MDSS rework [1] lands. It will make the mdp5/dpu1 device component master and the check will be legit. [1] https://patchwork.freedesktop.org/series/98525/ Oh, DRM... If that series is going to land for 5.19, could you please implement the correct equivalent of this patch within it? Yes, that's the plan. I'm sending a reworked version of your patch shortly (but it still depends on [1]). I'm fine with the revert for now if this patch doesn't work properly in all cases, but I have very little sympathy left for DRM drivers riding roughshod over all the standard driver model abstractions because they're "special". 
iommu_present() *needs* to go away, so if it's left to me to have a second go at fixing this driver next cycle, you're liable to get some abomination based on of_find_compatible_node() or similar, and I'll probably be demanding an ack to take it through the IOMMU tree ;) No need for such measures :-) Awesome, thanks! Robin.
Re: [Freedreno] [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"
On 2022-04-19 14:04, Dmitry Baryshkov wrote: This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit in question makes msm_use_mmu() check whether the DRM 'component master' device is translated by the IOMMU. At this moment it is the 'mdss' device. However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016, MSM8996/APQ8096) it's the mdp5 device, which has the iommus property (and thus is "translated by the IOMMU"). This results in these devices being broken with the following lines in the dmesg. [drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0 msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw from new location msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw from new location msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: could not allocate stolen bo msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28 msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to allocate buffer object msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb Getting the mdp5 device pointer from this function is not that easy at this moment. Thus this patch is reverted till the MDSS rework [1] lands. It will make the mdp5/dpu1 device component master and the check will be legit. [1] https://patchwork.freedesktop.org/series/98525/ Oh, DRM... If that series is going to land for 5.19, could you please implement the correct equivalent of this patch within it? I'm fine with the revert for now if this patch doesn't work properly in all cases, but I have very little sympathy left for DRM drivers riding roughshod over all the standard driver model abstractions because they're "special". iommu_present() *needs* to go away, so if it's left to me to have a second go at fixing this driver next cycle, you're liable to get some abomination based on of_find_compatible_node() or similar, and I'll probably be demanding an ack to take it through the IOMMU tree ;) Thanks, Robin. 
Fixes: e2a88eabb024 ("drm/msm: Stop using iommu_present()") Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/msm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index b6702b0fafcb..e2b5307b2360 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -263,7 +263,7 @@ bool msm_use_mmu(struct drm_device *dev) struct msm_drm_private *priv = dev->dev_private; /* a2xx comes with its own MMU */ - return priv->is_a2xx || device_iommu_mapped(dev->dev); + return priv->is_a2xx || iommu_present(&platform_bus_type); } static int msm_init_vram(struct drm_device *dev)
[Freedreno] [PATCH] drm/msm: Stop using iommu_present()
Even if some IOMMU has registered itself on the platform "bus", that doesn't necessarily mean it provides translation for the device we care about. Replace iommu_present() with a more appropriate check. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index affa95eb05fc..9c36b505daab 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -274,7 +274,7 @@ bool msm_use_mmu(struct drm_device *dev) struct msm_drm_private *priv = dev->dev_private; /* a2xx comes with its own MMU */ - return priv->is_a2xx || iommu_present(&platform_bus_type); + return priv->is_a2xx || device_iommu_mapped(dev->dev); } static int msm_init_vram(struct drm_device *dev) -- 2.28.0.dirty
Re: [Freedreno] [PATCH] drm/msm: use orig_nents to iterate over scatterlist with per-process tables
On 2022-03-28 13:55, Jonathan Marek wrote: This matches the implementation of iommu_map_sgtable() used for the non-per-process page tables path. This works around the dma_map_sgtable() call (used to invalidate cache) overwriting sgt->nents with 1 (which is probably a separate issue). FWIW that's expected behaviour. The sgtable APIs use nents to keep track of the number of DMA segments, while orig_nents represents the physical segments. IIUC you're not actually using the DMA mapping, just relying on its side-effects, so it's still the physical segments that you care about for your private IOMMU mapping here. Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable") Signed-off-by: Jonathan Marek --- drivers/gpu/drm/msm/msm_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index bcaddbba564df..22935ef26a3a1 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -58,7 +58,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova, u64 addr = iova; unsigned int i; - for_each_sg(sgt->sgl, sg, sgt->nents, i) { + for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) { Even better would be to use for_each_sgtable_sg(), which exists primarily for the sake of avoiding this exact confusion. Robin. size_t size = sg->length; phys_addr_t phys = sg_phys(sg);
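Robin's distinction between `nents` and `orig_nents` can be demonstrated with a toy userspace model. The struct names below are invented for illustration; in the kernel, the same idea is what `for_each_sgtable_sg()` encapsulates so callers never pick the wrong count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the sg_table bookkeeping described above: after
 * dma_map_sgtable(), `nents` counts DMA segments (which the DMA layer
 * may have merged down to as few as 1), while `orig_nents` still
 * counts the physical segments.  Code building its own IOMMU mapping
 * from physical pages must therefore walk orig_nents.
 */
struct toy_sg { size_t length; };

struct toy_sgt {
	struct toy_sg sg[4];
	unsigned int nents;      /* DMA segments after mapping */
	unsigned int orig_nents; /* physical segments */
};

static size_t total_phys_len(const struct toy_sgt *sgt)
{
	size_t total = 0;

	for (unsigned int i = 0; i < sgt->orig_nents; i++)
		total += sgt->sg[i].length;
	return total;
}
```

Iterating `sgt->nents` in this model would visit only the first entry and silently drop the rest of the buffer, which is exactly the bug the patch fixes.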
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-16 15:38, Christoph Hellwig wrote: [...] diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f1e38526d5bd40..996dfdf9d375dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2017,7 +2017,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; - if (smmu_domain->non_strict) + if (!iommu_get_dma_strict()) As Will raised, this also needs to be checking "domain->type == IOMMU_DOMAIN_DMA" to maintain equivalent behaviour to the attribute code below. pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); @@ -2449,52 +2449,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - - switch (domain->type) { - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } -} [...] 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f985817c967a25..edb1de479dd1a7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -668,7 +668,6 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ struct io_pgtable_ops *pgtbl_ops; - boolnon_strict; atomic_tnr_ats_masters; enum arm_smmu_domain_stage stage; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 0aa6d667274970..3dde22b1f8ffb0 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -761,6 +761,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; + if (!iommu_get_dma_strict()) Ditto here. Sorry for not spotting that sooner :( Robin. + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + if (smmu->impl && smmu->impl->init_context) { ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev); if (ret) ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 16:32, Will Deacon wrote: On Wed, Mar 31, 2021 at 02:09:37PM +0100, Robin Murphy wrote: On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. 
We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Maybe I'm going mad. With this patch, the SMMU driver unconditionally sets IO_PGTABLE_QUIRK_NON_STRICT for page-tables if iommu_get_dma_strict() is true, no? In which case, that will get set for page-tables corresponding to unmanaged domains as well as DMA domains when it is enabled. That didn't happen before because you couldn't set the attribute for unmanaged domains. What am I missing? Oh cock... sorry, all this time I've been saying what I *expect* it to do, while overlooking the fact that the IO_PGTABLE_QUIRK_NON_STRICT hunks were the bits I forgot to write and Christoph had to fix up. Indeed, those should be checking the domain type too to preserve the existing behaviour. Apologies for the confusion. Robin. 
Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Device would probably work too; you'd pass the first device to attach to the domain when querying t
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. 
I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Another idea that's just come to mind is now that IOMMU_DOMAIN_DMA has a standard meaning, maybe we could split out a separate IOMMU_DOMAIN_DMA_STRICT type such that it can all propagate from iommu_get_def_domain_type()? 
That feels like it might be quite promising, but I'd still do it as an improvement on top of this patch, since it's beyond just cleaning up the abuse of domain attributes to pass a command-line option around. Robin.
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. 
I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Ah, sorry, I didn't spot that! I was just trying to illustrate that this is per-device. Sure, I understand - and I'm just trying to bang home that despite appearances it's never actually been treated as such for SMMU, so anything that's wrong after this change was already wrong before. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Thanks, Robin. 
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-15 08:33, Christoph Hellwig wrote: On Fri, Mar 12, 2021 at 04:18:24PM +, Robin Murphy wrote: Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, I think a private interface would make more sense. For now I've just condensed it down to a generic set of quirk bits and dropped the attrs structure, which seems like an ok middle ground for now. That being said I wonder why that quirk isn't simply set in the device tree? Because it's a software policy decision rather than any inherent property of the platform, and the DT certainly doesn't know *when* any particular device might prefer its IOMMU to use cacheable pagetables to minimise TLB miss latency vs. saving the cache capacity for larger data buffers. It really is most logical to decide this at the driver level. In truth the overall concept *is* relatively generic (a trend towards larger system caches and cleverer usage is about both raw performance and saving power on off-SoC DRAM traffic), it's just the particular implementation of using io-pgtable to set an outer-cacheable walk attribute in an SMMU TCR that's pretty much specific to Qualcomm SoCs. Hence why having a common abstraction at the iommu_domain level, but where the exact details are free to vary across different IOMMUs and their respective client drivers, is in many ways an ideal fit. but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Fixed.
Can I get your signoff for the patch? Then I'll switch it over to being attributed to you. Sure - I would have thought that the one I originally posted still stands, but for the avoidance of doubt, for the parts of commit 8b6d45c495bd in your tree that remain from what I wrote: Signed-off-by: Robin Murphy Cheers, Robin.
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-11 08:26, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote: Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? It seems to miss a few bits, and also generally seems to not actually apply to recent mainline or something like it due to different empty lines in a few places. Yeah, that was sketched out on top of some other development patches, and in being so focused on not breaking any of the x86 behaviours I did indeed overlook fully converting the SMMU drivers... oops! (my thought was to do the conversion for its own sake, then clean up the redundant attribute separately, but I guess it's fine either way) Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Robin.
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-10 09:25, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 10:15:01AM +0100, Christoph Hellwig wrote: On Thu, Mar 04, 2021 at 03:25:27PM +, Robin Murphy wrote: On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Hmm. I looked at this, and kill off ->dma_enable_flush_queue for the ARM drivers and just looking at iommu_dma_strict seems like a very clear win. OTOH x86 is a little more complicated. AMD and Intel default to lazy mode, so we'd have to change the global iommu_dma_strict if they are initialized. Also Intel has not only a "static" option to disable lazy mode, but also a "dynamic" one where it iterates structure. So I think on the get side we're stuck with the method, but it still simplifies the whole thing. Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? Robin. ->8- Subject: [PATCH] iommu: Consolidate strict invalidation handling Now that everyone is using iommu-dma, the global invalidation policy really doesn't need to be woven through several parts of the core API and individual drivers, we can just look it up directly at the one point that we now make the flush queue decision.
If the x86 drivers reflect their internal options and overrides back to iommu_dma_strict, that can become the canonical source. Signed-off-by: Robin Murphy --- drivers/iommu/amd/iommu.c | 2 ++ drivers/iommu/dma-iommu.c | 8 +--- drivers/iommu/intel/iommu.c | 12 drivers/iommu/iommu.c | 35 +++ include/linux/iommu.h | 2 ++ 5 files changed, 44 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40..1db29e59d468 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1856,6 +1856,8 @@ int __init amd_iommu_init_dma_ops(void) else pr_info("Lazy IO/TLB flushing enabled\n"); + iommu_set_dma_strict(amd_iommu_unmap_flush); + return 0; } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index af765c813cc8..789a950cc125 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -304,10 +304,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad) cookie = container_of(iovad, struct iommu_dma_cookie, iovad); domain = cookie->fq_domain; - /* -* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE -* implies that ops->flush_iotlb_all must be non-NULL. 
-*/ domain->ops->flush_iotlb_all(domain); } @@ -334,7 +330,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; struct iova_domain *iovad; - int attr; if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) return -EINVAL; @@ -371,8 +366,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, init_iova_domain(iovad, 1UL << order, base_pfn); if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && - !iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) && - attr) { + domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) { if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, iommu_dma_entry_dtor)) pr_warn("iova flush queue initialization failed\n"); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index b5c746f0f63b..f5b452cd1266 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4377,6 +4377,17 @@ int __init intel_iommu_init(void) down_read(&dmar_global_lock); for_each_active_iommu(iommu, drhd) { + if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { + /* +* The flush queue implementation does
Re: [Freedreno] [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On 2021-03-01 08:42, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig Moreso than the previous patch, where the feature is at least relatively generic (note that there's a bunch of in-flight development around DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to bloat the generic iommu_ops structure with private driver-specific interfaces. The attribute interface is a great compromise for these kinds of things, and you can easily add type-checked wrappers around it for external callers (maybe even make the actual attributes internal between the IOMMU core and drivers) if that's your concern. Robin. --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +- drivers/iommu/arm/arm-smmu/arm-smmu.c | 40 +++-- drivers/iommu/iommu.c | 9 ++ include/linux/iommu.h | 9 +- 4 files changed, 29 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 0f184c3dd9d9ec..78d98ab2ee3a68 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu) struct io_pgtable_domain_attr pgtbl_cfg; pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg); } struct msm_gem_address_space * diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2e17d990d04481..2858999c86dfd1 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain) return ret; } -static int arm_smmu_domain_set_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) { - int ret = 0; struct arm_smmu_domain 
*smmu_domain = to_smmu_domain(domain); + int ret = -EPERM; - mutex_lock(&smmu_domain->init_mutex); - - switch(domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_IO_PGTABLE_CFG: { - struct io_pgtable_domain_attr *pgtbl_cfg = data; - - if (smmu_domain->smmu) { - ret = -EPERM; - goto out_unlock; - } + if (domain->type != IOMMU_DOMAIN_UNMANAGED) + return -EINVAL; - smmu_domain->pgtbl_cfg = *pgtbl_cfg; - break; - } - default: - ret = -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - ret = -ENODEV; - break; - default: - ret = -EINVAL; + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) { + smmu_domain->pgtbl_cfg = *pgtbl_cfg; + ret = 0; } -out_unlock: mutex_unlock(&smmu_domain->init_mutex); + return ret; } @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .dma_use_flush_queue= arm_smmu_dma_use_flush_queue, .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue, - .domain_set_attr= arm_smmu_domain_set_attr, + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr, .domain_enable_nesting = arm_smmu_domain_enable_nesting, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2e9e058501a953..8490aefd4b41f8 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting); +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) +{ + if (!domain->ops->domain_set_pgtable_attr) + return -EINVAL; + return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg); +} +EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr); + void iommu_get_resv_regions(struct device *dev, struct list_head *list) { const struct iommu_ops *ops = dev->bus->iommu_ops; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 
aed88aa3bd3edf..39d3ed4d2700ac 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -40,6 +40,7 @@ struct iommu_domain; struct notifier_block; struct iommu_sva; struct iommu_fault_event; +stru
Re: [Freedreno] [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Robin. Also remove the now unused iommu_domain_get_attr functionality. Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 ++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 56 + drivers/iommu/dma-iommu.c | 8 ++- drivers/iommu/intel/iommu.c | 27 ++ drivers/iommu/iommu.c | 19 +++ include/linux/iommu.h | 17 ++- 7 files changed, 51 insertions(+), 146 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40d0..37a8e51db17656 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1771,24 +1771,11 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev) return acpihid_device_group(dev); } -static int amd_iommu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain) { - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - return -ENODEV; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !amd_iommu_unmap_flush; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return !amd_iommu_unmap_flush; } /* @@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = { .release_device = amd_iommu_release_device, .probe_finalize = amd_iommu_probe_finalize, .device_group = amd_iommu_device_group, - .domain_get_attr = 
amd_iommu_domain_get_attr, + .dma_use_flush_queue = amd_iommu_dma_use_flush_queue, .get_resv_regions = amd_iommu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8594b4a8304375..bf96172e8c1f71 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_NESTING: - *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); - return 0; - default: - return -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return smmu_domain->non_strict; +} + + +static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain) +{ + if (domain->type != IOMMU_DOMAIN_DMA) + return; + to_smmu_domain(domain)->non_strict = true; } static int arm_smmu_domain_set_attr(struct iommu_domain *domain, @@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, } break; case IOMMU_DOMAIN_DMA: - switch(attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; - break; - default: - ret = -EN
Re: [Freedreno] [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 19:54, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt = ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. 
Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. Ok, I can fix this accordingly. + return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt = ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S2, + .alloc = 
arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_d
Re: [Freedreno] [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 19:49, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. Okay, I'll fix up the commit message, as well as the code, so that it only depends on io-pgtable-arm. Well, IIUC it would make sense to keep the softdep for when the v7s module *is* present; I just wanted to clarify that it's more of a nice-to-have rather than a necessity. Robin. Thanks, Isaac for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. 
Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s"); ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt= ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. 
+ return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt= ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_64_LPAE_S2, + .alloc = arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt= ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = 
arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_
Re: [Freedreno] [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");
Re: [Freedreno] [PATCH] drm/msm/a6xx: Add support for using system cache on MMU500 based targets
On 2020-10-26 18:54, Jordan Crouse wrote: This is an extension to the series [1] to enable the System Cache (LLC) for Adreno a6xx targets. GPU targets with an MMU-500 attached have a slightly different process for enabling system cache. Use the compatible string on the IOMMU phandle to see if an MMU-500 is attached and modify the programming sequence accordingly. [1] https://patchwork.freedesktop.org/series/83037/ Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 +-- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 + 2 files changed, 37 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 95c98c642876..b7737732fbb6 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1042,6 +1042,8 @@ static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu) static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) { + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base; + struct msm_gpu *gpu = &adreno_gpu->base; u32 cntl1_regval = 0; if (IS_ERR(a6xx_gpu->llc_mmio)) @@ -1055,11 +1057,17 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) (gpu_scid << 15) | (gpu_scid << 20); } + /* +* For targets with a MMU500, activate the slice but don't program the +* register. The XBL will take care of that. 
+*/ if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) { - u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); + if (!a6xx_gpu->have_mmu500) { + u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); - gpuhtw_scid &= 0x1f; - cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + gpuhtw_scid &= 0x1f; + cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + } } if (cntl1_regval) { @@ -1067,13 +1075,20 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) * Program the slice IDs for the various GPU blocks and GPU MMU * pagetables */ - a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); - - /* -* Program cacheability overrides to not allocate cache lines on -* a write miss -*/ - a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + if (a6xx_gpu->have_mmu500) + gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0), + cntl1_regval); + else { + a6xx_llc_write(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); + + /* +* Program cacheability overrides to not allocate cache +* lines on a write miss +*/ + a6xx_llc_rmw(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + } } } @@ -1086,10 +1101,21 @@ static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu) static void a6xx_llc_slices_init(struct platform_device *pdev, struct a6xx_gpu *a6xx_gpu) { + struct device_node *phandle; + a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx"); if (IS_ERR(a6xx_gpu->llc_mmio)) return; + /* +* There is a different programming path for targets with an mmu500 +* attached, so detect if that is the case +*/ + phandle = of_parse_phandle(pdev->dev.of_node, "iommus", 0); + a6xx_gpu->have_mmu500 = (phandle && + of_device_is_compatible(phandle, "arm,mmu500")); Note that this should never match, since the compatible string defined by the binding is "arm,mmu-500" ;) Robin. 
+ of_node_put(phandle); + a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU); a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW); diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 9e6079af679c..e793d329e77b 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -32,6 +32,7 @@ struct a6xx_gpu { void __iomem *llc_mmio; void *llc_slice; void *htw_llc_slice; + bool have_mmu500; }; #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
[Freedreno] [PATCH] drm/msm: Add missing stub definition
DRM_MSM fails to build with DRM_MSM_DP=n; add the missing stub. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index b9dd8f8f4887..0b2686b060c7 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -423,6 +423,11 @@ static inline int msm_dp_display_disable(struct msm_dp *dp, { return -EINVAL; } +static inline int msm_dp_display_pre_disable(struct msm_dp *dp, + struct drm_encoder *encoder) +{ + return -EINVAL; +} static inline void msm_dp_display_mode_set(struct msm_dp *dp, struct drm_display_mode *mode, struct drm_encoder *encoder, struct drm_display_mode *mode, -- 2.28.0.dirty
Re: [Freedreno] [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance
On 2020-10-07 07:25, Christoph Hellwig wrote: On Tue, Oct 06, 2020 at 09:19:32AM -0400, Jonathan Marek wrote: One example why drm/msm can't use DMA API is multiple page table support (that is landing in 5.10), which is something that definitely couldn't work with DMA API. Another one is being able to choose the address for mappings, which AFAIK DMA API can't do (somewhat related to this: qcom hardware often has ranges of allowed addresses, which the dma_mask mechanism fails to represent, what I see is drivers using dma_mask as a "maximum address", and since addresses are allocated from the top it generally works) That sounds like a good enough reason to use the IOMMU API. I just wanted to make sure this really makes sense. I still think this situation would be best handled with a variant of dma_ops_bypass that also guarantees to bypass SWIOTLB, and can be set automatically when attaching to an unmanaged IOMMU domain. That way the device driver can make DMA API calls in the appropriate places that do the right thing either way, and only needs logic to decide whether to use the returned DMA addresses directly or ignore them if it knows they're overridden by its own IOMMU mapping. Robin.
Re: [Freedreno] [PATCHv5 5/6] iommu: arm-smmu-impl: Use table to list QCOM implementations
On 2020-09-22 07:18, Sai Prakash Ranjan wrote: Use table and of_match_node() to match qcom implementation instead of multiple of_device_is_compatible() calls for each QCOM SMMU implementation. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index d199b4bff15d..ce78295cfa78 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -9,6 +9,13 @@ #include "arm-smmu.h" +static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { + { .compatible = "qcom,sc7180-smmu-500" }, + { .compatible = "qcom,sdm845-smmu-500" }, + { .compatible = "qcom,sm8150-smmu-500" }, + { .compatible = "qcom,sm8250-smmu-500" }, + { } +}; Can you push the table itself into arm-smmu-qcom? That way you'll be free to add new SoCs willy-nilly without any possibility of conflicting with anything else. Bonus points if you can fold in the Adreno variant and keep everything together ;) Robin. static int arm_smmu_gr0_ns(int offset) { @@ -217,10 +224,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) return nvidia_smmu_impl_init(smmu); - if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || - of_device_is_compatible(np, "qcom,sc7180-smmu-500") || - of_device_is_compatible(np, "qcom,sm8150-smmu-500") || - of_device_is_compatible(np, "qcom,sm8250-smmu-500")) + if (of_match_node(qcom_smmu_impl_of_match, np)) return qcom_smmu_impl_init(smmu); if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
Re: [Freedreno] [PATCHv4 1/6] iommu/io-pgtable-arm: Add support to use system cache
On 2020-09-21 19:03, Will Deacon wrote: On Fri, Sep 11, 2020 at 07:57:18PM +0530, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_SYS_CACHE to override the attributes set in TCR for the page table walker when using system cache. I wonder if the panfrost folks can reuse this for the issue discussed over at: https://lore.kernel.org/r/cover.1600213517.git.robin.mur...@arm.com Isn't this all hinged around the outer cacheability attribute, rather than shareability (since these are nominally NC mappings and thus already properly Osh)? The Panfrost issue is just about shareability domains being a bit wonky; the cacheability attributes there are actually reasonably normal (other than not having a non-cacheable type at all, only a choice of allocation policies...) Robin. However, Sai, your email setup went wrong when you posted this so you probably need to repost now that you have that fixed. Will ___ iommu mailing list io...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines
On 2020-09-11 17:21, Sai Prakash Ranjan wrote: On 2020-09-11 21:37, Will Deacon wrote: On Fri, Sep 11, 2020 at 05:03:06PM +0100, Robin Murphy wrote: BTW am I supposed to have received 3 copies of everything? Because I did... Yeah, this seems to be happening for all of Sai's emails :/ Sorry, I am not sure what went wrong as I only sent this once and there are no recent changes to any of my configs, I'll check it further. Actually on closer inspection it appears to be "correct" behaviour. I'm still subscribed to LAKML and the IOMMU list on this account, but normally Office 365 deduplicates so aggressively that I have rules set up to copy list mails that I'm cc'ed on back to my inbox, in case they arrive first and cause the direct copy to get eaten - apparently there's something unique about your email setup that manages to defeat the deduplicator and make it deliver all 3 copies intact... :/ Robin.
Re: [Freedreno] [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines
On 2020-09-11 15:28, Sai Prakash Ranjan wrote: There are few places in arm-smmu-impl where there are extra blank lines, remove them FWIW those were deliberate - sometimes I like a bit of subtle space to visually delineate distinct groups of definitions. I suppose it won't be to everyone's taste :/ and while at it fix the checkpatch warning for space required before the open parenthesis. That one, however, was not ;) BTW am I supposed to have received 3 copies of everything? Because I did... Robin. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index ce78295cfa78..f5b5218cbe5b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -19,7 +19,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { static int arm_smmu_gr0_ns(int offset) { - switch(offset) { + switch (offset) { case ARM_SMMU_GR0_sCR0: case ARM_SMMU_GR0_sACR: case ARM_SMMU_GR0_sGFSR: @@ -54,7 +54,6 @@ static const struct arm_smmu_impl calxeda_impl = { .write_reg = arm_smmu_write_ns, }; - struct cavium_smmu { struct arm_smmu_device smmu; u32 id_base; @@ -110,7 +109,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm return &cs->smmu; } - #define ARM_MMU500_ACTLR_CPRE (1 << 1) #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) @@ -197,7 +195,6 @@ static const struct arm_smmu_impl mrvl_mmu500_impl = { .reset = arm_mmu500_reset, }; - struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { const struct device_node *np = smmu->dev->of_node;
[Freedreno] [PATCH] drm/msm: Drop local dma_parms
Since commit 9495b7e92f71 ("driver core: platform: Initialize dma_parms for platform devices"), struct platform_device already provides a dma_parms structure, so we can save allocating another one. Also the DMA segment size is simply a size, not a bitmask. Signed-off-by: Robin Murphy --- drivers/gpu/drm/msm/msm_drv.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 7d641c7e3514..2b73d29642b7 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -453,15 +453,7 @@ static int msm_drm_init(struct device *dev, struct drm_driver *drv) if (ret) goto err_msm_uninit; - if (!dev->dma_parms) { - dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms), - GFP_KERNEL); - if (!dev->dma_parms) { - ret = -ENOMEM; - goto err_msm_uninit; - } - } - dma_set_max_seg_size(dev, DMA_BIT_MASK(32)); + dma_set_max_seg_size(dev, UINT_MAX); msm_gem_shrinker_init(ddev); -- 2.28.0.dirty
Re: [Freedreno] [PATCH v9 12/32] drm: msm: fix common struct sg_table related issues
On 2020-08-26 07:32, Marek Szyprowski wrote: The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function returns the number of the created entries in the DMA address space. However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and dma_unmap_sg must be called with the original number of the entries passed to the dma_map_sg(). struct sg_table is a common structure used for describing a non-contiguous memory buffer, used commonly in the DRM and graphics subsystems. It consists of a scatterlist with memory pages and DMA addresses (sgl entry), as well as the number of scatterlist entries: CPU pages (orig_nents entry) and DMA mapped pages (nents entry). It turned out that it was a common mistake to misuse nents and orig_nents entries, calling DMA-mapping functions with a wrong number of entries or ignoring the number of mapped entries returned by the dma_map_sg() function. To avoid such issues, lets use a common dma-mapping wrappers operating directly on the struct sg_table objects and use scatterlist page iterators where possible. This, almost always, hides references to the nents and orig_nents entries, making the code robust, easier to follow and copy/paste safe. 
Signed-off-by: Marek Szyprowski Acked-by: Rob Clark --- drivers/gpu/drm/msm/msm_gem.c| 13 + drivers/gpu/drm/msm/msm_gpummu.c | 14 ++ drivers/gpu/drm/msm/msm_iommu.c | 2 +- 3 files changed, 12 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index b2f49152b4d4..8c7ae812b813 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -53,11 +53,10 @@ static void sync_for_device(struct msm_gem_object *msm_obj) struct device *dev = msm_obj->base.dev->dev; if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { - dma_sync_sg_for_device(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_sync_sgtable_for_device(dev, msm_obj->sgt, + DMA_BIDIRECTIONAL); } else { - dma_map_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_map_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0); } } @@ -66,11 +65,9 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj) struct device *dev = msm_obj->base.dev->dev; if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { - dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_sync_sgtable_for_cpu(dev, msm_obj->sgt, DMA_BIDIRECTIONAL); } else { - dma_unmap_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + dma_unmap_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0); } } diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c index 310a31b05faa..319f06c28235 100644 --- a/drivers/gpu/drm/msm/msm_gpummu.c +++ b/drivers/gpu/drm/msm/msm_gpummu.c @@ -30,21 +30,19 @@ static int msm_gpummu_map(struct msm_mmu *mmu, uint64_t iova, { struct msm_gpummu *gpummu = to_msm_gpummu(mmu); unsigned idx = (iova - GPUMMU_VA_START) / GPUMMU_PAGE_SIZE; - struct scatterlist *sg; + struct sg_dma_page_iter dma_iter; unsigned prot_bits = 0; - unsigned i, j; if (prot & IOMMU_WRITE) prot_bits |= 1; if (prot & IOMMU_READ) prot_bits |= 2; - for_each_sg(sgt->sgl, sg, 
sgt->nents, i) { - dma_addr_t addr = sg->dma_address; - for (j = 0; j < sg->length / GPUMMU_PAGE_SIZE; j++, idx++) { - gpummu->table[idx] = addr | prot_bits; - addr += GPUMMU_PAGE_SIZE; - } + for_each_sgtable_dma_page(sgt, &dma_iter, 0) { + dma_addr_t addr = sg_page_iter_dma_address(&dma_iter); + + BUILD_BUG_ON(GPUMMU_PAGE_SIZE != PAGE_SIZE); + gpummu->table[idx++] = addr | prot_bits; Given that the BUILD_BUG_ON might prevent valid arm64 configs from building, how about a simple tweak like: for (i = 0; i < PAGE_SIZE; i += GPUMMU_PAGE_SIZE) gpummu->table[idx++] = i + addr | prot_bits; ? Or alternatively perhaps some more aggressive #ifdefs or makefile tweaks to prevent the GPUMMU code building for arm64 at all if it's only relevant to 32-bit platforms (which I believe might be the case). Robin. } /* we can improve by deferring flush for multiple map() */ diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..6c31e65834c6 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,7 +36,7 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c
On 2020-07-20 08:17, Arnd Bergmann wrote: On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju wrote: This kernel oops while boot linux mainline kernel on arm64 db410c device. metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f8456690ba8eb18ea4714e68554e242a04f65cff git describe: v5.8-rc5-48-gf8456690ba8e make_kernelversion: 5.8.0-rc5 kernel-config: https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config Thanks for the report. Adding freedreno folks to Cc, as this may have something to do with that driver. [5.444121] Unable to handle kernel NULL pointer dereference at virtual address 0018 [5.456615] ESR = 0x9604 [5.464471] SET = 0, FnV = 0 [5.464487] EA = 0, S1PTW = 0 [5.466521] Data abort info: [5.469971] ISV = 0, ISS = 0x0004 [5.472768] CM = 0, WnR = 0 [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000 [5.479349] [0018] pgd=, p4d= [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP [5.492448] Modules linked in: crct10dif_ce adv7511(+) qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns rmtfs_mem fuse [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 [5.522484] Hardware name: Qualcomm Technologies, Inc. 
APQ 8016 SBC (DT) [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--) [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 [5.541148] lr : free_io_pgtable_ops+0x28/0x58 [5.546350] sp : 80001219b5f0 [5.550689] x29: 80001219b5f0 x28: 0013 [5.554078] x27: 0100 x26: 36add3b8 [5.559459] x25: 8915e910 x24: 3a5458c0 [5.564753] x23: 0003 x22: 36a37058 [5.570049] x21: 36a3a100 x20: 36a3a480 [5.575344] x19: 36a37158 x18: [5.580639] x17: x16: [5.585935] x15: 0004 x14: 0368 [5.591229] x13: x12: 39c61798 [5.596525] x11: 39c616d0 x10: 4000 [5.601820] x9 : x8 : 39c616f8 [5.607114] x7 : x6 : 09f699a0 [5.612410] x5 : 80001219b520 x4 : 36a3a000 [5.617705] x3 : 09f69904 x2 : [5.623001] x1 : 8000107e27e8 x0 : 3a545810 [5.628297] Call trace: [5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8 This means that dev_iommu_fwspec_get() has returned NULL in qcom_iommu_tlb_inv_context(), either because dev->iommu is NULL, or because dev->iommu->fwspec is NULL. qcom_iommu_tlb_inv_context() does not check for a NULL pointer before using the returned object. The bug is either in the lack of error handling, or the fact that it's possible to get into this function for a device that has not been fully set up. Not quite - the device *was* properly set up, but has already been properly torn down again in the removal path by iommu_release_device(). The problem is that qcom-iommu kept the device pointer as its TLB cookie for the domain, but the domain has a longer lifespan than the validity of that device - that's a fundamental design flaw in the driver. Robin. 
[5.635764] free_io_pgtable_ops+0x28/0x58 [5.640624] qcom_iommu_domain_free+0x38/0x60 [5.644617] iommu_group_release+0x4c/0x70 [5.649045] kobject_put+0x6c/0x120 [5.653035] kobject_del+0x64/0x90 [5.656421] kobject_put+0xfc/0x120 [5.659893] iommu_group_remove_device+0xdc/0xf0 [5.663281] iommu_release_device+0x44/0x70 [5.668142] iommu_bus_notifier+0xbc/0xd0 [5.672048] notifier_call_chain+0x54/0x98 [5.676214] blocking_notifier_call_chain+0x48/0x70 [5.680209] device_del+0x26c/0x3a0 [5.684981] platform_device_del.part.0+0x1c/0x88 [5.688453] platform_device_unregister+0x24/0x40 [5.693316] of_platform_device_destroy+0xe4/0xf8 [5.698002] device_for_each_child+0x5c/0xa8 [5.702689] of_platform_depopulate+0x3c/0x80 [5.707144] msm_pdev_probe+0x1c4/0x308 [msm] It was triggered by a failure in msm_pdev_probe(), which calls of_platform_depopulate() in its error handling code. This is a combination of two problems: a) Whatever caused msm_pdev_probe() to fail means that the gpu won't be usable, though it should not have caused the kernel to crash. b) the error handling itself causes additional problems due to failed unwinding. [5.711286] platform_drv_probe+0x54/0xa8 [5.715624] really_probe+0xd8/0x320 [5.719617] driver_probe_device+0x58/0xb8 [5.723263] device_driver_attach+0x74/0x80 [5.727168] __driver_attach+0x58/0xe0 [5.731248] bus_for_each_d
Re: [Freedreno] [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
On 2020-06-26 21:04, Jordan Crouse wrote: Support auxiliary domains for arm-smmu-v2 to initialize and support multiple pagetables for a single SMMU context bank. Since the smmu-v2 hardware doesn't have any built in support for switching the pagetable base it is left as an exercise to the caller to actually use the pagetable. Hmm, I've still been thinking that we could model this as supporting exactly 1 aux domain iff the device is currently attached to a primary domain with TTBR1 enabled. Then supporting multiple aux domains with magic TTBR0 switching is the Adreno-specific extension on top of that. And if we don't want to go to that length, then - as I think Will was getting at - I'm not sure it's worth bothering at all. There doesn't seem to be any point in half-implementing a pretend aux domain interface while still driving a bus through the rest of the abstractions - it's really the worst of both worlds. If we're going to hand over the guts of io-pgtable to the GPU driver then couldn't it just use DOMAIN_ATTR_PGTABLE_CFG bidirectionally to inject a TTBR0 table straight into the TTBR1-ified domain? Much as I like the idea of the aux domain abstraction and making this fit semi-transparently into the IOMMU API, if an almost entirely private interface will be the simplest and cleanest way to get it done then at this point also I'm starting to lean towards just getting it done. But if some other mediated-device type case then turns up that doesn't quite fit that private interface, we revisit the proper abstraction again and I reserve the right to say "I told you so" ;) Robin. Aux domains are supported if split pagetable (TTBR1) support has been enabled on the master domain. Each auxiliary domain will reuse the configuration of the master domain. 
By default a domain with TTBR1 support will have the TTBR0 region disabled so the first attached aux domain will enable the TTBR0 region in the hardware and conversely the last domain to be detached will disable TTBR0 translations. All subsequent auxiliary domains create a pagetable but do not touch the hardware. The leaf driver will be able to query the physical address of the pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the address with whatever means it has to switch the pagetable base. Following is a pseudo code example of how a domain can be created /* Check to see if aux domains are supported */ if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { iommu = iommu_domain_alloc(...); if (iommu_aux_attach_device(domain, dev)) return FAIL; /* Save the base address of the pagetable for use by the driver iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase); } Then 'domain' can be used like any other iommu domain to map and unmap iova addresses in the pagetable. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 219 --- drivers/iommu/arm-smmu.h | 1 + 2 files changed, 204 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 060139452c54..ce6d654301bf 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -91,6 +91,7 @@ struct arm_smmu_cb { u32 tcr[2]; u32 mair[2]; struct arm_smmu_cfg *cfg; + atomic_t aux; }; struct arm_smmu_master_cfg { @@ -667,6 +668,86 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg); } +/* + * Update the context bank to enable TTBR0. Assumes AARCH64 S1 + * configuration.
+ */ +static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb, + struct io_pgtable_cfg *pgtbl_cfg) +{ + u32 tcr = cb->tcr[0]; + + /* Add the TCR configuration from the new pagetable config */ + tcr |= arm_smmu_lpae_tcr(pgtbl_cfg); + + /* Make sure that both TTBR0 and TTBR1 are enabled */ + tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1); + + /* Update the TCR register */ + cb->tcr[0] = tcr; + + /* Program the new TTBR0 */ + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid); +} + +/* + * This function assumes that the current model only allows aux domains for + * AARCH64 S1 configurations + */ +static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain, + struct arm_smmu_device *smmu, struct arm_smmu_cfg *master) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_cfg pgtbl_cfg; + + mutex_lock(&smmu_domain->init_mutex); + + /* Copy the configuration from the master */ + memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg)); + + s
Re: [Freedreno] [PATCH v2 4/6] drm/msm: Add support to create a local pagetable
On 2020-06-26 21:04, Jordan Crouse wrote: Add support to create a io-pgtable for use by targets that support per-instance pagetables. In order to support per-instance pagetables the GPU SMMU device needs to have the qcom,adreno-smmu compatible string and split pagetables and auxiliary domains need to be supported and enabled. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/msm_gpummu.c | 2 +- drivers/gpu/drm/msm/msm_iommu.c | 180 ++- drivers/gpu/drm/msm/msm_mmu.h| 16 ++- 3 files changed, 195 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c index 310a31b05faa..aab121f4beb7 100644 --- a/drivers/gpu/drm/msm/msm_gpummu.c +++ b/drivers/gpu/drm/msm/msm_gpummu.c @@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu) } gpummu->gpu = gpu; - msm_mmu_init(&gpummu->base, dev, &funcs); + msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU); return &gpummu->base; } diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 1b6635504069..f455c597f76d 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -4,15 +4,192 @@ * Author: Rob Clark */ +#include #include "msm_drv.h" #include "msm_mmu.h" struct msm_iommu { struct msm_mmu base; struct iommu_domain *domain; + struct iommu_domain *aux_domain; }; + #define to_msm_iommu(x) container_of(x, struct msm_iommu, base) +struct msm_iommu_pagetable { + struct msm_mmu base; + struct msm_mmu *parent; + struct io_pgtable_ops *pgtbl_ops; + phys_addr_t ttbr; + u32 asid; +}; + +static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu) +{ + return container_of(mmu, struct msm_iommu_pagetable, base); +} + +static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova, + size_t size) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + size_t unmapped = 0; + + /* Unmap the block one page at a time */ + 
while (size) { + unmapped += ops->unmap(ops, iova, 4096, NULL); + iova += 4096; + size -= 4096; + } + + iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain); + + return (unmapped == size) ? 0 : -EINVAL; +} Remember in patch #1 when you said "Then 'domain' can be used like any other iommu domain to map and unmap iova addresses in the pagetable."? This appears to be very much not that :/ Robin. + +static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova, + struct sg_table *sgt, size_t len, int prot) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + struct io_pgtable_ops *ops = pagetable->pgtbl_ops; + struct scatterlist *sg; + size_t mapped = 0; + u64 addr = iova; + unsigned int i; + + for_each_sg(sgt->sgl, sg, sgt->nents, i) { + size_t size = sg->length; + phys_addr_t phys = sg_phys(sg); + + /* Map the block one page at a time */ + while (size) { + if (ops->map(ops, addr, phys, 4096, prot)) { + msm_iommu_pagetable_unmap(mmu, iova, mapped); + return -EINVAL; + } + + phys += 4096; + addr += 4096; + size -= 4096; + mapped += 4096; + } + } + + return 0; +} + +static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu) +{ + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu); + + free_io_pgtable_ops(pagetable->pgtbl_ops); + kfree(pagetable); +} + +/* + * Given a parent device, create and return an aux domain. 
This will enable the + * TTBR0 region + */ +static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent) +{ + struct msm_iommu *iommu = to_msm_iommu(parent); + struct iommu_domain *domain; + int ret; + + if (iommu->aux_domain) + return iommu->aux_domain; + + if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX)) + return ERR_PTR(-ENODEV); + + domain = iommu_domain_alloc(&platform_bus_type); + if (!domain) + return ERR_PTR(-ENODEV); + + ret = iommu_aux_attach_device(domain, parent->dev); + if (ret) { + iommu_domain_free(domain); + return ERR_PTR(ret); + } + + iommu->aux_domain = domain; + return domain; +} + +int msm_iommu_pagetable_params(struct msm_mmu *mmu, + phys_addr_t *ttbr, int *asid) +{ + struct msm_iommu_pagetable *pagetable; + + if (mmu->type != MSM_MMU_IOMMU_PAGETABLE) + return
Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations
On 2020-06-26 21:04, Jordan Crouse wrote: Allow an io-pgtable implementation to skip TLB operations by checking for NULL pointers in the helper functions. It will be up to the owner of the io-pgtable instance to make sure that they independently handle the TLB correctly. I don't really understand what this is for - tricking the IOMMU driver into not performing its TLB maintenance at points when that maintenance has been deemed necessary doesn't seem like the appropriate way to achieve anything good :/ Robin. Signed-off-by: Jordan Crouse --- include/linux/io-pgtable.h | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 53d53c6c2be9..bbed1d3925ba 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -210,21 +210,24 @@ struct io_pgtable { static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop) { - iop->cfg.tlb->tlb_flush_all(iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_all(iop->cookie); } static inline void io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie); } static inline void io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova, size_t size, size_t granule) { - iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); + if (iop->cfg.tlb) + iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie); } static inline void @@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop, struct iommu_iotlb_gather * gather, unsigned long iova, size_t granule) { - if (iop->cfg.tlb->tlb_add_page) + if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page) iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie); } ___ Freedreno mailing list Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH] dt-bindings: arm-smmu: update the list of clocks
[ /me fires off MAINTAINERS patch... ] On 20/02/2020 9:02 pm, Matthias Kaehlcke wrote: On Thu, Feb 20, 2020 at 01:42:22PM +0530, Sharat Masetty wrote: This patch adds a clock definition needed for powering on the GPU TBUs and the GPU TCU. Signed-off-by: Sharat Masetty --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 6515dbe..235c0df 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -28,6 +28,7 @@ properties: - enum: - qcom,msm8996-smmu-v2 - qcom,msm8998-smmu-v2 + - qcom,sc7180-smmu-v2 The addition of the compatible string isn't (directly) related with $subject, this should be done in a separate patch. ...or the patch should just be accurately described as "add support for new variant X, which requires an additional clock Y". And speaking of which, can anyone clarify how "TCU and TBU core clock" manages to abbreviate to "mem_iface_clk"? I would have naively assumed something like "core" would be most logical. Robin.
Re: [Freedreno] [PATCH v2] iommu/arm-smmu: fix "hang" when games exit
On 2019-10-28 10:38 pm, Rob Clark wrote: On Mon, Oct 28, 2019 at 3:20 PM Will Deacon wrote: Hi Rob, On Mon, Oct 07, 2019 at 01:49:06PM -0700, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system just locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Please can you reword the subject to be a bit more useful? The commit message is great, but the subject is a bit like "fix bug in code" to me. yeah, not the best $subject, but I wasn't quite sure how to fit something better in a reasonable # of chars.. maybe something like: "iommu/arm-smmu: optimize unmap but avoiding toggling runpm state"? FWIW, I'd be inclined to frame it as something like "avoid pathological RPM behaviour for unmaps". Robin.
Re: [Freedreno] [PATCH v2] iommu/arm-smmu: fix "hang" when games exit
On 2019-10-07 9:49 pm, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system just locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Reviewed-by: Robin Murphy Signed-off-by: Rob Clark --- v1: original v2: unconditionally use autosuspend, rather than deciding based on what consumer does drivers/iommu/arm-smmu.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 3f1d55fb43c4..b7b41f5001bc 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -289,7 +289,7 @@ static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) { if (pm_runtime_enabled(smmu->dev)) - pm_runtime_put(smmu->dev); + pm_runtime_put_autosuspend(smmu->dev); } static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -1445,6 +1445,9 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) /* Looks ok, so add the device to the domain */ ret = arm_smmu_domain_add_master(smmu_domain, fwspec); + pm_runtime_set_autosuspend_delay(smmu->dev, 20); + pm_runtime_use_autosuspend(smmu->dev); + rpm_put: arm_smmu_rpm_put(smmu); return ret;
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 2019-09-26 11:44 am, Nicolas Saenz Julienne wrote: Robin, have you looked into supporting multiple dma-ranges? It's the next thing we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the works already. Multiple dma-ranges as far as configuring inbound windows should work already other than the bug when there's any parent translation. But if you mean supporting multiple DMA offsets and masks per device in the DMA API, there's nothing in the works yet. Sorry, I meant supporting multiple DMA offsets[1]. I think I could still make it with a single DMA mask though. The main problem for supporting that case in general is the disgusting carving up of the physical memory map you may have to do to guarantee that a single buffer allocation cannot ever span two windows with different offsets. I don't think we ever reached a conclusion on whether that was even achievable in practice. There's also the in-between step of making of_dma_get_range() return a size based on all the dma-ranges entries rather than only the first one - otherwise, something like [1] can lead to pretty unworkable default masks. We implemented that when doing acpi_dma_get_range(), it's just that the OF counterpart never caught up. Right. I suppose we assume any holes in the ranges are addressable by the device but won't get used for other reasons (such as no memory there). However, to be correct, the range of the dma offset plus mask would need to be within the min start and max end addresses. IOW, while we need to round up (0xa_8000_ - 0x2c1c_) to the next power of 2, the 'correct' thing to do is round down. IIUC I also have this issue on my list. The RPi4 PCIe block has an integration bug that only allows DMA to the lower 3GB. With dma-ranges of size 0xc000_ you get a 32bit DMA mask which is not what you need.
So far I faked it in the device-tree but I guess it would be better to add an extra check in of_dma_configure(), decrease the mask and print some kind of warning stating that DMA addressing is suboptimal. Yeah, there's just no way for masks to describe that the device can drive all the individual bits, just not in certain combinations :( The plan I have sketched out there is to merge dma_pfn_offset and bus_dma_mask into a "DMA range" descriptor, so we can then hang one or more of those off a device to properly cope with all these weird interconnects. Conceptually it feels pretty straightforward; I think most of the challenge is in implementing it efficiently. Plus there's the question of whether it could also subsume the dma_mask as well. Robin.
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 25/09/2019 17:16, Rob Herring wrote: On Wed, Sep 25, 2019 at 10:30 AM Nicolas Saenz Julienne wrote: On Wed, 2019-09-25 at 16:09 +0100, Robin Murphy wrote: On 25/09/2019 15:52, Nicolas Saenz Julienne wrote: On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote: On Tue, Sep 24, 2019 at 1:12 PM Nicolas Saenz Julienne wrote: Hi All, this series tries to address one of the issues blocking us from upstreaming Broadcom's STB PCIe controller[1]. Namely, the fact that devices not represented in DT which sit behind a PCI bus fail to get the bus' DMA addressing constraints. This is due to the fact that of_dma_configure() assumes it's receiving a DT node representing the device being configured, as opposed to the PCIe bridge node we currently pass. This causes the code to directly jump into PCI's parent node when checking for 'dma-ranges' and misses whatever was set there. To address this I create a new API in OF - inspired by Robin Murphy's original proposal[2] - which accepts a bus DT node as its input in order to configure a device's DMA constraints. The changes go deep into of/address.c's implementation, as the assumption that a device has a DT node was pretty strong. On top of this work, I also cleaned up of_dma_configure() removing its redundant arguments and creating an alternative function for the special cases not applicable to either the above case or the default usage. IMO the resulting functions are more explicit. They will probably surface some hacky usages that can be properly fixed as I show with the DT fixes on the Layerscape platform. This was also tested on a Raspberry Pi 4 with a custom PCIe driver and on a Seattle AMD board. Humm, I've been working on this issue too. Looks similar though yours has a lot more churn and there's some other bugs I've found. That's good news, and yes now that I see it, some stuff on my series is overly complicated. Especially around of_translate_*().
On top of that, you removed in of_dma_get_range(): - /* -* At least empty ranges has to be defined for parent node if -* DMA is supported -*/ - if (!ranges) - break; Which I assumed was bound to the standard and makes things easier. Can you test out this branch[1]. I don't have any h/w needing this, but wrote a unittest and tested with modified QEMU. I reviewed everything, I did find a minor issue, see the patch attached. WRT that patch, the original intent of "force_dma" was purely to consider a device DMA-capable regardless of the presence of "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF device has always been bogus - magic paravirt devices which appear out of nowhere and expect to be treated as genuine DMA masters are a separate problem that we haven't really approached yet. I agree it's clearly abusing the function. I have no problem with the behaviour change if it's OK with you. Thinking about it, you could probably just remove that call from the Xen DRM driver now anyway - since the dma-direct rework, we lost the ability to set dma_dummy_ops by default, and NULL ops now represent what it (presumably) wants. Robin, have you looked into supporting multiple dma-ranges? It's the next thing we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the works already. Multiple dma-ranges as far as configuring inbound windows should work already other than the bug when there's any parent translation. But if you mean supporting multiple DMA offsets and masks per device in the DMA API, there's nothing in the works yet. There's also the in-between step of making of_dma_get_range() return a size based on all the dma-ranges entries rather than only the first one - otherwise, something like [1] can lead to pretty unworkable default masks. We implemented that when doing acpi_dma_get_range(), it's just that the OF counterpart never caught up. Robin. 
[1] http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a2814af56b3486c2985a95540a88d8f9fa3a699f
Re: [Freedreno] [PATCH 00/11] of: Fix DMA configuration for non-DT masters
On 25/09/2019 15:52, Nicolas Saenz Julienne wrote: On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote: On Tue, Sep 24, 2019 at 1:12 PM Nicolas Saenz Julienne wrote: Hi All, this series tries to address one of the issues blocking us from upstreaming Broadcom's STB PCIe controller[1]. Namely, the fact that devices not represented in DT which sit behind a PCI bus fail to get the bus' DMA addressing constraints. This is due to the fact that of_dma_configure() assumes it's receiving a DT node representing the device being configured, as opposed to the PCIe bridge node we currently pass. This causes the code to directly jump into PCI's parent node when checking for 'dma-ranges' and misses whatever was set there. To address this I create a new API in OF - inspired by Robin Murphy's original proposal[2] - which accepts a bus DT node as its input in order to configure a device's DMA constraints. The changes go deep into of/address.c's implementation, as the assumption that a device has a DT node was pretty strong. On top of this work, I also cleaned up of_dma_configure() removing its redundant arguments and creating an alternative function for the special cases not applicable to either the above case or the default usage. IMO the resulting functions are more explicit. They will probably surface some hacky usages that can be properly fixed as I show with the DT fixes on the Layerscape platform. This was also tested on a Raspberry Pi 4 with a custom PCIe driver and on a Seattle AMD board. Humm, I've been working on this issue too. Looks similar though yours has a lot more churn and there's some other bugs I've found. That's good news, and yes now that I see it, some stuff on my series is overly complicated. Especially around of_translate_*(). On top of that, you removed in of_dma_get_range(): - /* -* At least empty ranges has to be defined for parent node if -* DMA is supported -*/ - if (!ranges) - break; Which I assumed was bound to the standard and makes things easier.
Can you test out this branch[1]. I don't have any h/w needing this, but wrote a unittest and tested with modified QEMU. I reviewed everything, I did find a minor issue, see the patch attached. WRT that patch, the original intent of "force_dma" was purely to consider a device DMA-capable regardless of the presence of "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF device has always been bogus - magic paravirt devices which appear out of nowhere and expect to be treated as genuine DMA masters are a separate problem that we haven't really approached yet. Robin.
Re: [Freedreno] [PATCH v3 0/2] iommu: handle drivers that manage iommu directly
On 06/09/2019 22:44, Rob Clark wrote: From: Rob Clark One of the challenges we have to enable the aarch64 laptops upstream is dealing with the fact that the bootloader enables the display and takes the corresponding SMMU context-bank out of BYPASS. Unfortunately, currently, the IOMMU framework attaches a DMA (or potentially an IDENTITY) domain before the driver is probed and has a chance to intervene and shut down scanout. Which makes things go horribly wrong. Nope, things already went horribly wrong in arm_smmu_device_reset() - sure, sometimes for some configurations it might *seem* like they didn't and that you can fudge the context bank state at arm's length from core code later, but the truth is that impl->cfg_probe is your last chance to guarantee that any necessary SMMU state is preserved. The remainder of the problem involves reworking default domain allocation such that we can converge on what iommu_request_dm_for_dev() currently does but without the momentary attachment to a translation domain to cause hiccups. That's starting here: https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prak...@intel.com/ But in this case, drm/msm is already managing its IOMMUs directly, the DMA API attached iommu_domain simply gets in the way. This series adds a way that a driver can indicate to drivers/iommu that it does not wish to have a DMA-managed iommu_domain attached. This way, drm/msm can shut down scanout cleanly before attaching its own iommu_domain. NOTE that to get things working with arm-smmu on the aarch64 laptops, you also need a patchset[1] from Bjorn Andersson to inherit SMMU config at boot, when it is already enabled. [1] https://www.spinics.net/lists/arm-kernel/msg732246.html NOTE that in discussion of previous revisions, RMRR came up. This is not really a replacement for RMRR (nor does RMRR really provide any more information than we already get from EFI GOP, or DT in the simplefb case).
I also don't see how RMRR could help w/ SMMU handover of CB/SMR config (Bjorn's patchset[1]) without defining new tables. The point of RMRR-like-things is that they identify not just the memory region but also the specific device accessing them, which means the IOMMU driver knows up-front which IDs etc. it must be careful not to disrupt. Obviously for SMMU that *would* be some new table (designed to encompass everything relevant) since literal RMRRs are specifically an Intel VT-d thing. This perhaps doesn't solve the more general case of bootloader enabled display for drivers that actually want to use DMA API managed IOMMU. But it does also happen to avoid a related problem with GPU, caused by the DMA domain claiming the context bank that the GPU firmware expects to use. Careful bringing that up again, or I really will rework the context bank allocator to avoid this default domain problem entirely... ;) Robin. And it avoids spurious TLB invalidation coming from the unused DMA domain. So IMHO this is a useful and necessary change. Rob Clark (2): iommu: add support for drivers that manage iommu explicitly drm/msm: mark devices where iommu is managed by driver drivers/gpu/drm/msm/adreno/adreno_device.c | 1 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 1 + drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 1 + drivers/gpu/drm/msm/msm_drv.c | 1 + drivers/iommu/iommu.c | 2 +- drivers/iommu/of_iommu.c | 3 +++ include/linux/device.h | 3 ++- 7 files changed, 10 insertions(+), 2 deletions(-)
Re: [Freedreno] [PATCH] iommu/arm-smmu: fix "hang" when games exit
On 07/09/2019 18:50, Rob Clark wrote: From: Rob Clark When games, browser, or anything using a lot of GPU buffers exits, there can be many hundreds or thousands of buffers to unmap and free. If the GPU is otherwise suspended, this can cause arm-smmu to resume/suspend for each buffer, resulting in 5-10 seconds' worth of reprogramming the context bank (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc). To the user it would appear that the system is locked up. A simple solution is to use pm_runtime_put_autosuspend() instead, so we don't immediately suspend the SMMU device. Signed-off-by: Rob Clark --- Note: I've tied the autosuspend enable/delay to the consumer device, based on the reasoning that if the consumer device benefits from using an autosuspend delay, then its corresponding SMMU probably does too. Maybe that is overkill and we should just unconditionally enable autosuspend. I'm not sure there's really any reason to expect that a supplier's usage model when doing things for itself bears any relation to that of its consumer(s), so I'd certainly lean towards the "unconditional" argument myself. Of course ideally we'd skip resuming altogether in the map/unmap paths (since resume implies a full TLB reset anyway), but IIRC that approach started to get messy in the context of the initial RPM patchset. I'm planning to fiddle around a bit more to clean up the implementation of the new iommu_flush_ops stuff, so I've made a note to myself to revisit RPM to see if there's a sufficiently clean way to do better. In the meantime, though, I don't have any real objection to using some reasonable autosuspend delay on the principle that if we've been woken up to map/unmap one page, there's a high likelihood that more will follow in short order (and in the configuration slow-paths it won't have much impact either way). Robin.
drivers/iommu/arm-smmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c2733b447d9c..73a0dd53c8a3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -289,7 +289,7 @@ static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) { if (pm_runtime_enabled(smmu->dev)) - pm_runtime_put(smmu->dev); + pm_runtime_put_autosuspend(smmu->dev); } static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -1445,6 +1445,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) /* Looks ok, so add the device to the domain */ ret = arm_smmu_domain_add_master(smmu_domain, fwspec); +#ifdef CONFIG_PM + /* TODO maybe device_link_add() should do this for us? */ + if (dev->power.use_autosuspend) { + pm_runtime_set_autosuspend_delay(smmu->dev, + dev->power.autosuspend_delay); + pm_runtime_use_autosuspend(smmu->dev); + } +#endif + rpm_put: arm_smmu_rpm_put(smmu); return ret;
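The failure mode Rob describes, and why an autosuspend delay helps, can be illustrated with a toy userspace model. This is not kernel code: rpm_get()/rpm_put() below are simplified stand-ins for the pm_runtime API, and a "resume" stands for the costly context-bank reprogramming.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: device suspends on last put (autosuspend_ns == 0, like
 * pm_runtime_put()) or after an idle delay (like put_autosuspend()). */
struct rpm_dev {
    bool suspended;
    int resumes;          /* counts arm_smmu_write_context_bank() etc. */
    long autosuspend_ns;  /* 0 => suspend immediately on last put */
    long last_put_ns;
    int usecount;
};

static void rpm_get(struct rpm_dev *d, long now_ns)
{
    if (d->usecount++ == 0) {
        if (!d->suspended && d->autosuspend_ns &&
            now_ns - d->last_put_ns > d->autosuspend_ns)
            d->suspended = true;          /* autosuspend timer fired */
        if (d->suspended) {
            d->suspended = false;
            d->resumes++;                 /* full reprogram + TLB reset */
        }
    }
}

static void rpm_put(struct rpm_dev *d, long now_ns)
{
    if (--d->usecount == 0) {
        d->last_put_ns = now_ns;
        if (!d->autosuspend_ns)
            d->suspended = true;          /* pm_runtime_put() behaviour */
    }
}

/* Unmap n buffers arriving 1us apart; count resume cycles incurred. */
static int resumes_for_unmaps(long autosuspend_ns, int n)
{
    struct rpm_dev d = { .suspended = true,
                         .autosuspend_ns = autosuspend_ns };
    for (int i = 0; i < n; i++) {
        rpm_get(&d, (long)i * 1000);
        rpm_put(&d, (long)i * 1000 + 1);
    }
    return d.resumes;
}
```

With immediate suspend every unmap in a burst pays a resume; with any delay longer than the inter-unmap gap, the whole burst pays one, which is the "if we've been woken up to map/unmap one page, more will follow" principle in miniature.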
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
On 16/08/2019 19:12, Rob Clark wrote: On Fri, Aug 16, 2019 at 9:58 AM Robin Murphy wrote: Hi Jordan, On 15/08/2019 16:33, Jordan Crouse wrote: On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: (Sigh, resend. I freaked out my SMTP server) This is part of an ongoing evolution for enabling split pagetable support for arm-smmu. Previous versions can be found [1]. In the discussion for v2 Robin pointed out that this is a very Adreno specific use case and that is exactly true. Not only do we want to configure and use a pagetable in the TTBR1 space, we also want to configure the TTBR0 region but not allocate a pagetable for it or touch it until the GPU hardware does so. As much as I want it to be a generic concept it really isn't. This revision leans into that idea. Most of the same io-pgtable code is there but now it is wrapped as an Adreno GPU specific format that is selected by the compatible string in the arm-smmu device. Additionally, per Robin's suggestion we are skipping creating a TTBR0 pagetable to save on wasted memory. This isn't as clean as I would like it to be but I think that this is a better direction than trying to pretend that the generic format would work. I'm tempting fate by posting this and then taking some time off, but I wanted to try to kick off a conversation or at least get some flames so I can try to refine this again next week. Please take a look and give some advice on the direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? How many different Adreno drivers would benefit from sharing it? Hypothetically everything back to a3xx, so I *could* see usefulness of this in qcom_iommu (or maybe even msm-iommu). OTOH maybe with "modularizing" arm-smmu we could re-combine qcom_iommu and arm-smmu. Indeed, that's certainly something I'm planning to investigate as a future refactoring step. 
And as a practical matter, I'm not sure if anyone will get around to backporting per-context pagetables as far back as a3xx. BR, -R The more I come back to this, the more I'm convinced that io-pgtable should focus on the heavy lifting of pagetable management - the code that nobody wants to have to write at all, let alone more than once - and any subtleties which aren't essential to that should be pushed back into whichever callers actually care. Consider that already, literally no caller actually uses an unmodified stage 1 TCR value as provided in the io_pgtable_cfg. I feel it would be most productive to elaborate further in the form of patches, so let me get right on that and try to bash something out before I go home tonight... ...and now there's a rough WIP branch here: http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/pgtable I'll finish testing and polishing those patches at some point next week, probably, but hopefully they're sufficiently illustrative for the moment. Robin.
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
Hi Jordan, On 15/08/2019 16:33, Jordan Crouse wrote: On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: (Sigh, resend. I freaked out my SMTP server) This is part of an ongoing evolution for enabling split pagetable support for arm-smmu. Previous versions can be found [1]. In the discussion for v2 Robin pointed out that this is a very Adreno specific use case and that is exactly true. Not only do we want to configure and use a pagetable in the TTBR1 space, we also want to configure the TTBR0 region but not allocate a pagetable for it or touch it until the GPU hardware does so. As much as I want it to be a generic concept it really isn't. This revision leans into that idea. Most of the same io-pgtable code is there but now it is wrapped as an Adreno GPU specific format that is selected by the compatible string in the arm-smmu device. Additionally, per Robin's suggestion we are skipping creating a TTBR0 pagetable to save on wasted memory. This isn't as clean as I would like it to be but I think that this is a better direction than trying to pretend that the generic format would work. I'm tempting fate by posting this and then taking some time off, but I wanted to try to kick off a conversation or at least get some flames so I can try to refine this again next week. Please take a look and give some advice on the direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? How many different Adreno drivers would benefit from sharing it? The more I come back to this, the more I'm convinced that io-pgtable should focus on the heavy lifting of pagetable management - the code that nobody wants to have to write at all, let alone more than once - and any subtleties which aren't essential to that should be pushed back into whichever callers actually care. 
Consider that already, literally no caller actually uses an unmodified stage 1 TCR value as provided in the io_pgtable_cfg. I feel it would be most productive to elaborate further in the form of patches, so let me get right on that and try to bash something out before I go home tonight... Robin. If so, then is adding the changes to io-pgtable-arm.c possible for 5.4 and then add the implementation specific code on top of Robin's stack later or do you feel they should come as part of a package deal? Jordan Jordan Crouse (2): iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable format iommu/arm-smmu: Add support for Adreno GPU pagetable formats drivers/iommu/arm-smmu.c | 8 +- drivers/iommu/io-pgtable-arm.c | 214 ++--- drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 2 + 4 files changed, 209 insertions(+), 16 deletions(-) -- 2.7.4
Re: [Freedreno] [RESEND PATCH v2 3/3] iommu/arm-smmu: Add support for DOMAIN_ATTR_SPLIT_TABLES
On 08/07/2019 20:00, Jordan Crouse wrote: When DOMAIN_ATTR_SPLIT_TABLES is specified for a domain, pass ARM_64_LPAE_SPLIT_S1 to io_pgtable_ops to allocate and initialize TTBR0 and TTBR1 pagetables. v3: Moved all the pagetable specific work into io-pgtable-arm in a previous patch. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 653b6b3..7a6b4bb 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -257,6 +257,7 @@ struct arm_smmu_domain { bool non_strict; struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -832,7 +833,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, ias = smmu->va_size; oas = smmu->ipa_size; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) { - fmt = ARM_64_LPAE_S1; + if (smmu_domain->attributes & + (1 << DOMAIN_ATTR_SPLIT_TABLES)) + fmt = ARM_64_LPAE_SPLIT_S1; + else + fmt = ARM_64_LPAE_S1; } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) { fmt = ARM_32_LPAE_S1; ias = min(ias, 32UL); @@ -1582,6 +1587,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_SPLIT_TABLES: + *(int *)data = !!(smmu_domain->attributes & + (1 << DOMAIN_ATTR_SPLIT_TABLES)); I'm not really a fan of always claiming to support this but silently ignoring it depending on hardware/kernel configuration - it seems somewhat tricky to make callers properly robust against making false assumptions (e.g. consider if the system is booted with "arm-smmu.force_stage=2"). Robin. 
+ return 0; default: return -ENODEV; } @@ -1622,6 +1631,11 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break; + case DOMAIN_ATTR_SPLIT_TABLES: + if (*((int *)data)) + smmu_domain->attributes |= + (1 << DOMAIN_ATTR_SPLIT_TABLES); + break; default: ret = -ENODEV; }
Re: [Freedreno] [RESEND PATCH v2 2/3] iommu/io-pgtable-arm: Add support for AARCH64 split pagetables
Hi Jordan, On 08/07/2019 20:00, Jordan Crouse wrote: Add a new sub-format ARM_64_LPAE_SPLIT_S1 to create and set up split pagetables (TTBR0 and TTBR1). The initialization function sets up the correct va_size and sign extension bits and programs the TCR registers. Split pagetable formats use their own map/unmap wrappers to ensure that the correct pagetable is selected based on the incoming iova, but most of the heavy lifting is common to the other formats. I'm somewhat concerned that this implementation is very closely tied to the current Adreno use-case, and won't easily generalise in future to other potential TTBR1 uses which have been tossed around, like SMMUv3-without-substream-ID. Furthermore, even for the Adreno pretend-PASID case it appears to be a bit too fragile for comfort - given that a DOMAIN_ATTR_SPLIT_TABLES domain doesn't look any different from a regular one from the users' point of view, what's to stop them making "without PASID" mappings in the lower half of the address space, and thus unwittingly pulling the rug out from under their own feet upon attaching an aux domain? In fact allocating a TTBR0 table at all for the main domain seems like little more than a waste of memory. 
Signed-off-by: Jordan Crouse --- drivers/iommu/io-pgtable-arm.c | 261 + drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 2 + 3 files changed, 240 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 161a7d56..aec35e5 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -118,7 +118,12 @@ #define ARM_LPAE_TCR_TG0_64K (1 << 14) #define ARM_LPAE_TCR_TG0_16K (2 << 14) +#define ARM_LPAE_TCR_TG1_4K (0 << 30) +#define ARM_LPAE_TCR_TG1_64K (1 << 30) +#define ARM_LPAE_TCR_TG1_16K (2 << 30) + #define ARM_LPAE_TCR_SH0_SHIFT12 +#define ARM_LPAE_TCR_SH1_SHIFT 28 #define ARM_LPAE_TCR_SH0_MASK 0x3 #define ARM_LPAE_TCR_SH_NS0 #define ARM_LPAE_TCR_SH_OS2 @@ -126,6 +131,8 @@ #define ARM_LPAE_TCR_ORGN0_SHIFT 10 #define ARM_LPAE_TCR_IRGN0_SHIFT 8 +#define ARM_LPAE_TCR_ORGN1_SHIFT 26 +#define ARM_LPAE_TCR_IRGN1_SHIFT 24 #define ARM_LPAE_TCR_RGN_MASK 0x3 #define ARM_LPAE_TCR_RGN_NC 0 #define ARM_LPAE_TCR_RGN_WBWA 1 @@ -136,6 +143,7 @@ #define ARM_LPAE_TCR_SL0_MASK 0x3 #define ARM_LPAE_TCR_T0SZ_SHIFT 0 +#define ARM_LPAE_TCR_T1SZ_SHIFT16 #define ARM_LPAE_TCR_SZ_MASK 0xf #define ARM_LPAE_TCR_PS_SHIFT 16 @@ -152,6 +160,14 @@ #define ARM_LPAE_TCR_PS_48_BIT0x5ULL #define ARM_LPAE_TCR_PS_52_BIT0x6ULL +#define ARM_LPAE_TCR_SEP_SHIFT 47 +#define ARM_LPAE_TCR_SEP_31(0x0ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_35(0x1ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_39(0x2ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_41(0x3ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_43(0x4ULL << ARM_LPAE_TCR_SEP_SHIFT) +#define ARM_LPAE_TCR_SEP_UPSTREAM (0x7ULL << ARM_LPAE_TCR_SEP_SHIFT) This is a specific detail of SMMUv2, and nothing to do with the LPAE/AArch64 VMSA formats. 
+ #define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3) #define ARM_LPAE_MAIR_ATTR_MASK 0xff #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04 @@ -179,11 +195,12 @@ struct arm_lpae_io_pgtable { struct io_pgtable iop; int levels; + u32 sep; size_t pgd_size; unsigned long pg_shift; unsigned long bits_per_level; - void *pgd; + void*pgd[2]; }; typedef u64 arm_lpae_iopte; @@ -426,7 +443,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, arm_lpae_iopte pte; if (data->iop.fmt == ARM_64_LPAE_S1 || - data->iop.fmt == ARM_32_LPAE_S1) { + data->iop.fmt == ARM_32_LPAE_S1 || + data->iop.fmt == ARM_64_LPAE_SPLIT_S1) { pte = ARM_LPAE_PTE_nG; if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) pte |= ARM_LPAE_PTE_AP_RDONLY; @@ -470,11 +488,10 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, return pte; } -static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, - phys_addr_t paddr, size_t size, int iommu_prot) +static int _arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, + phys_addr_t paddr, size_t size, int iommu_prot, + arm_lpae_iopte *ptep) { - struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); -
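The "correct pagetable is selected based on the incoming iova" idea in the commit message boils down to testing the sign-extension bit (SEP). This standalone sketch uses made-up helper names (the patch's actual wrappers are only partially quoted above): an IOVA with the SEP bit set walks the TTBR1 table (pgd[1]), otherwise TTBR0 (pgd[0]), and a valid IOVA must be canonical, i.e. every bit above the SEP must equal the SEP bit itself.

```c
#include <assert.h>
#include <stdint.h>

/* Which table does this IOVA belong to? 0 => TTBR0, 1 => TTBR1. */
static int split_table_index(uint64_t iova, unsigned int sep)
{
    return (int)((iova >> sep) & 1);
}

/* Canonical-address check: bits above the SEP must sign-extend bit SEP.
 * (Relies on arithmetic right shift of a negative int64_t, which is the
 * behaviour on all mainstream compilers, though implementation-defined.) */
static int split_iova_valid(uint64_t iova, unsigned int sep)
{
    int64_t ext = (int64_t)(iova << (63 - sep)) >> (63 - sep);
    return (uint64_t)ext == iova;
}
```

This also shows why Robin's "rug pulled out" concern bites: a low-half ("without PASID") mapping and a TTBR0 aux-domain mapping land in the same `split_table_index() == 0` half of the space.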
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
On 03/06/2019 11:47, Rob Clark wrote: On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa wrote: On Mon, Jun 3, 2019 at 4:40 AM Rob Clark wrote: On Fri, May 10, 2019 at 7:35 AM Rob Clark wrote: On Tue, Dec 4, 2018 at 2:29 PM Rob Herring wrote: On Sat, Dec 1, 2018 at 10:54 AM Rob Clark wrote: This solves a problem we see with drm/msm, caused by getting iommu_dma_ops while we attach our own domain and manage it directly at the iommu API level: [0038] user address but active_mm is swapper Internal error: Oops: 9605 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 #90 Hardware name: xxx (DT) Workqueue: events deferred_probe_work_func pstate: 80c9 (Nzcv daif +PAN +UAO) pc : iommu_dma_map_sg+0x7c/0x2c8 lr : iommu_dma_map_sg+0x40/0x2c8 sp : ff80095eb4f0 x29: ff80095eb4f0 x28: x27: ffc0f9431578 x26: x25: x24: 0003 x23: 0001 x22: ffc0fa9ac010 x21: x20: ffc0fab40980 x19: ffc0fab40980 x18: 0003 x17: 01c4 x16: 0007 x15: 000e x14: x13: x12: 0028 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : x8 : ffc0fab409a0 x7 : x6 : 0002 x5 : 0001 x4 : x3 : 0001 x2 : 0002 x1 : ffc0f9431578 x0 : Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) Call trace: iommu_dma_map_sg+0x7c/0x2c8 __iommu_map_sg_attrs+0x70/0x84 get_pages+0x170/0x1e8 msm_gem_get_iova+0x8c/0x128 _msm_gem_kernel_new+0x6c/0xc8 msm_gem_kernel_new+0x4c/0x58 dsi_tx_buf_alloc_6g+0x4c/0x8c msm_dsi_host_modeset_init+0xc8/0x108 msm_dsi_modeset_init+0x54/0x18c _dpu_kms_drm_obj_init+0x430/0x474 dpu_kms_hw_init+0x5f8/0x6b4 msm_drm_bind+0x360/0x6c8 try_to_bring_up_master.part.7+0x28/0x70 component_master_add_with_match+0xe8/0x124 msm_pdev_probe+0x294/0x2b4 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __device_attach_driver+0xa4/0xb4 bus_for_each_drv+0x98/0xc8 __device_attach+0xac/0x12c device_initial_probe+0x24/0x30 bus_probe_device+0x38/0x98 deferred_probe_work_func+0x78/0xa4 process_one_work+0x24c/0x3dc worker_thread+0x280/0x360 kthread+0x134/0x13c 
ret_from_fork+0x10/0x18 Code: d284 91000725 6b17039f 5400048a (f9401f40) ---[ end trace f22dda57f3648e2c ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,22802a18 Memory Limit: none The problem is that when drm/msm does it's own iommu_attach_device(), now the domain returned by iommu_get_domain_for_dev() is drm/msm's domain, and it doesn't have domain->iova_cookie. We kind of avoided this problem prior to sdm845/dpu because the iommu was attached to the mdp node in dt, which is a child of the toplevel mdss node (which corresponds to the dev passed in dma_map_sg()). But with sdm845, now the iommu is attached at the mdss level so we hit the iommu_dma_ops in dma_map_sg(). But auto allocating/attaching a domain before the driver is probed was already a blocking problem for enabling per-context pagetables for the GPU. This problem is also now solved with this patch. Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in of_dma_configure Tested-by: Douglas Anderson Signed-off-by: Rob Clark --- This is an alternative/replacement for [1]. What it lacks in elegance it makes up for in practicality ;-) [1] https://patchwork.freedesktop.org/patch/264930/ drivers/of/device.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/of/device.c b/drivers/of/device.c index 5957cd4fa262..15ffee00fb22 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev) return device_add(&ofdev->dev); } +static const struct of_device_id iommu_blacklist[] = { + { .compatible = "qcom,mdp4" }, + { .compatible = "qcom,mdss" }, + { .compatible = "qcom,sdm845-mdss" }, + { .compatible = "qcom,adreno" }, + {} +}; Not completely clear to whether this is still needed or not, but this really won't scale. Why can't the driver for these devices override whatever has been setup by default? 
fwiw, at the moment it is not needed, but it will become needed again to implement per-context pagetables (although I suppose for this we only need to blacklist qcom,adreno and not also the display nodes). So, another case I've come across, on the display side.. I'm working on handling the case where bootloader enables display (and takes iommu out of reset).. as soon as DMA doma
Re: [Freedreno] [PATCH v2 03/15] iommu/arm-smmu: Add split pagetable support for arm-smmu-v2
On 21/05/2019 17:13, Jordan Crouse wrote: Add support for a split pagetable (TTBR0/TTBR1) scheme for arm-smmu-v2. If split pagetables are enabled, create a pagetable for TTBR1 and set up the sign extension bit so that all IOVAs with that bit set are mapped and translated from the TTBR1 pagetable. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-regs.h | 19 + drivers/iommu/arm-smmu.c | 179 ++--- drivers/iommu/io-pgtable-arm.c | 3 +- 3 files changed, 186 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index e9132a9..23f27c2 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -195,7 +195,26 @@ enum arm_smmu_s2cr_privcfg { #define RESUME_RETRY (0 << 0) #define RESUME_TERMINATE (1 << 0) +#define TTBCR_EPD1 (1 << 23) +#define TTBCR_T0SZ_SHIFT 0 +#define TTBCR_T1SZ_SHIFT 16 +#define TTBCR_IRGN1_SHIFT 24 +#define TTBCR_ORGN1_SHIFT 26 +#define TTBCR_RGN_WBWA 1 +#define TTBCR_SH1_SHIFT28 +#define TTBCR_SH_IS3 + +#define TTBCR_TG1_16K (1 << 30) +#define TTBCR_TG1_4K (2 << 30) +#define TTBCR_TG1_64K (3 << 30) + #define TTBCR2_SEP_SHIFT 15 +#define TTBCR2_SEP_31 (0x0 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_35 (0x1 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_39 (0x2 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_41 (0x3 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_43 (0x4 << TTBCR2_SEP_SHIFT) +#define TTBCR2_SEP_47 (0x5 << TTBCR2_SEP_SHIFT) #define TTBCR2_SEP_UPSTREAM (0x7 << TTBCR2_SEP_SHIFT) #define TTBCR2_AS (1 << 4) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index a795ada..e09c0e6 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -152,6 +152,7 @@ struct arm_smmu_cb { u32 tcr[2]; u32 mair[2]; struct arm_smmu_cfg *cfg; + unsigned long split_table_mask; }; struct arm_smmu_master_cfg { @@ -253,13 +254,14 @@ enum arm_smmu_domain_stage { struct arm_smmu_domain { struct arm_smmu_device *smmu; - struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_ops 
*pgtbl_ops[2]; This seems a bit off - surely the primary domain and aux domain only ever need one set of tables each, but either way there's definitely unnecessary redundancy in having four sets of io_pgtable_ops between them. const struct iommu_gather_ops *tlb_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; boolnon_strict; struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -621,6 +623,85 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev) return IRQ_HANDLED; } +/* Adjust the context bank settings to support TTBR1 */ +static void arm_smmu_init_ttbr1(struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg) +{ + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx]; + int pgsize = 1 << __ffs(pgtbl_cfg->pgsize_bitmap); + + /* Enable speculative walks through the TTBR1 */ + cb->tcr[0] &= ~TTBCR_EPD1; + + cb->tcr[0] |= TTBCR_SH_IS << TTBCR_SH1_SHIFT; + cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT; + cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT; + + switch (pgsize) { + case SZ_4K: + cb->tcr[0] |= TTBCR_TG1_4K; + break; + case SZ_16K: + cb->tcr[0] |= TTBCR_TG1_16K; + break; + case SZ_64K: + cb->tcr[0] |= TTBCR_TG1_64K; + break; + } + + /* +* Outside of the special 49 bit UBS case that has a dedicated sign +* extension bit, setting the SEP for any other va_size will force us to +* shrink the size of the T0/T1 regions by one bit to accommodate the +* SEP +*/ + if (smmu->va_size != 48) { + /* Replace the T0 size */ + cb->tcr[0] &= ~(0x3f << TTBCR_T0SZ_SHIFT); + cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T0SZ_SHIFT; + /* Set the T1 size */ + cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T1SZ_SHIFT; +
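The TCR arithmetic in the quoted arm_smmu_init_ttbr1() hunk can be checked in isolation. The sketch below is a hypothetical standalone rework, not the kernel function: it uses the TTBCR field definitions from the quoted arm-smmu-regs.h hunk, clears EPD1 to enable TTBR1 walks, and shrinks T0/T1 by one bit to make room for the SEP whenever va_size != 48, exactly as the patch describes.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions from the quoted arm-smmu-regs.h hunk */
#define TTBCR_EPD1        (1u << 23)
#define TTBCR_T0SZ_SHIFT  0
#define TTBCR_T1SZ_SHIFT  16
#define TTBCR_IRGN1_SHIFT 24
#define TTBCR_ORGN1_SHIFT 26
#define TTBCR_RGN_WBWA    1u
#define TTBCR_SH1_SHIFT   28
#define TTBCR_SH_IS       3u
#define TTBCR_TG1_16K     (1u << 30)
#define TTBCR_TG1_4K      (2u << 30)
#define TTBCR_TG1_64K     (3u << 30)

static uint32_t ttbr1_tcr_fixup(uint32_t tcr, unsigned int va_size,
                                unsigned int pgsize)
{
    tcr &= ~TTBCR_EPD1;                      /* enable TTBR1 walks */
    tcr |= TTBCR_SH_IS << TTBCR_SH1_SHIFT;   /* inner shareable, WBWA */
    tcr |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT;
    tcr |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT;

    switch (pgsize) {
    case 4096:  tcr |= TTBCR_TG1_4K;  break;
    case 16384: tcr |= TTBCR_TG1_16K; break;
    case 65536: tcr |= TTBCR_TG1_64K; break;
    }

    /* Outside the 49-bit UBS case, lose one bit of each region to the SEP */
    if (va_size != 48) {
        uint32_t tsz = 64 - va_size - 1;
        tcr &= ~(0x3fu << TTBCR_T0SZ_SHIFT); /* replace the T0 size */
        tcr |= tsz << TTBCR_T0SZ_SHIFT;
        tcr |= tsz << TTBCR_T1SZ_SHIFT;      /* set the T1 size */
    }
    return tcr;
}
```

For a 36-bit VA with 4K pages, starting from a TCR with EPD1 set and T0SZ = 28, the result packs T0SZ = T1SZ = 27 alongside the TG1/SH1/RGN1 fields.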
Re: [Freedreno] [PATCH v2 01/15] iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains
On 21/05/2019 17:13, Jordan Crouse wrote: Allow IOMMU enabled devices specified on an opt-in list to create a default identity domain for a new IOMMU group and bypass the DMA domain created by the IOMMU core. This allows the group to be properly set up but otherwise skips touching the hardware until the client device attaches a unmanaged domain of its own. All the cool kids are using iommu_request_dm_for_dev() to force an identity domain for particular devices, won't that suffice for this case too? There is definite scope for improvement in this area, so I'd really like to keep things as consistent as possible to make that easier in future. Robin. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 42 ++ drivers/iommu/iommu.c| 29 +++-- include/linux/iommu.h| 3 +++ 3 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0..a795ada 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1235,6 +1235,35 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain, return 0; } +struct arm_smmu_client_match_data { + bool use_identity_domain; +}; + +static const struct arm_smmu_client_match_data qcom_adreno = { + .use_identity_domain = true, +}; + +static const struct arm_smmu_client_match_data qcom_mdss = { + .use_identity_domain = true, +}; + +static const struct of_device_id arm_smmu_client_of_match[] = { + { .compatible = "qcom,adreno", .data = &qcom_adreno }, + { .compatible = "qcom,mdp4", .data = &qcom_mdss }, + { .compatible = "qcom,mdss", .data = &qcom_mdss }, + { .compatible = "qcom,sdm845-mdss", .data = &qcom_mdss }, + {}, +}; + +static const struct arm_smmu_client_match_data * +arm_smmu_client_data(struct device *dev) +{ + const struct of_device_id *match = + of_match_device(arm_smmu_client_of_match, dev); + + return match ? 
match->data : NULL; +} + static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret; @@ -1552,6 +1581,7 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu = fwspec_smmu(fwspec); + const struct arm_smmu_client_match_data *client; struct iommu_group *group = NULL; int i, idx; @@ -1573,6 +1603,18 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) else group = generic_device_group(dev); + client = arm_smmu_client_data(dev); + + /* +* If the client chooses to bypass the dma domain, create a identity +* domain as a default placeholder. This will give the device a +* default domain but skip DMA operations and not consume a context +* bank +*/ + if (client && client->no_dma_domain) + iommu_group_set_default_domain(group, dev, + IOMMU_DOMAIN_IDENTITY); + return group; } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 67ee662..af3e1ed 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1062,6 +1062,24 @@ struct iommu_group *fsl_mc_device_group(struct device *dev) return group; } +struct iommu_domain *iommu_group_set_default_domain(struct iommu_group *group, + struct device *dev, unsigned int type) +{ + struct iommu_domain *dom; + + dom = __iommu_domain_alloc(dev->bus, type); + if (!dom) + return NULL; + + /* FIXME: Error if the default domain is already set? 
*/ + group->default_domain = dom; + if (!group->domain) + group->domain = dom; + + return dom; +} +EXPORT_SYMBOL_GPL(iommu_group_set_default_domain); + /** * iommu_group_get_for_dev - Find or create the IOMMU group for a device * @dev: target device @@ -1099,9 +1117,12 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev) if (!group->default_domain) { struct iommu_domain *dom; - dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type); + dom = iommu_group_set_default_domain(group, dev, + iommu_def_domain_type); + if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) { - dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA); + dom = iommu_group_set_default_domain(group, dev, + IOMMU_DOMAIN_DMA); if (dom) { dev_warn(dev, "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA", @@ -1109,10 +1130,6 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
Hi Rob, On 01/12/2018 16:53, Rob Clark wrote: This solves a problem we see with drm/msm, caused by getting iommu_dma_ops while we attach our own domain and manage it directly at the iommu API level: [0038] user address but active_mm is swapper Internal error: Oops: 9605 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 #90 Hardware name: xxx (DT) Workqueue: events deferred_probe_work_func pstate: 80c9 (Nzcv daif +PAN +UAO) pc : iommu_dma_map_sg+0x7c/0x2c8 lr : iommu_dma_map_sg+0x40/0x2c8 sp : ff80095eb4f0 x29: ff80095eb4f0 x28: x27: ffc0f9431578 x26: x25: x24: 0003 x23: 0001 x22: ffc0fa9ac010 x21: x20: ffc0fab40980 x19: ffc0fab40980 x18: 0003 x17: 01c4 x16: 0007 x15: 000e x14: x13: x12: 0028 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : x8 : ffc0fab409a0 x7 : x6 : 0002 x5 : 0001 x4 : x3 : 0001 x2 : 0002 x1 : ffc0f9431578 x0 : Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) Call trace: iommu_dma_map_sg+0x7c/0x2c8 __iommu_map_sg_attrs+0x70/0x84 get_pages+0x170/0x1e8 msm_gem_get_iova+0x8c/0x128 _msm_gem_kernel_new+0x6c/0xc8 msm_gem_kernel_new+0x4c/0x58 dsi_tx_buf_alloc_6g+0x4c/0x8c msm_dsi_host_modeset_init+0xc8/0x108 msm_dsi_modeset_init+0x54/0x18c _dpu_kms_drm_obj_init+0x430/0x474 dpu_kms_hw_init+0x5f8/0x6b4 msm_drm_bind+0x360/0x6c8 try_to_bring_up_master.part.7+0x28/0x70 component_master_add_with_match+0xe8/0x124 msm_pdev_probe+0x294/0x2b4 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __device_attach_driver+0xa4/0xb4 bus_for_each_drv+0x98/0xc8 __device_attach+0xac/0x12c device_initial_probe+0x24/0x30 bus_probe_device+0x38/0x98 deferred_probe_work_func+0x78/0xa4 process_one_work+0x24c/0x3dc worker_thread+0x280/0x360 kthread+0x134/0x13c ret_from_fork+0x10/0x18 Code: d284 91000725 6b17039f 5400048a (f9401f40) ---[ end trace f22dda57f3648e2c ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,22802a18 Memory Limit: 
none The problem is that when drm/msm does it's own iommu_attach_device(), now the domain returned by iommu_get_domain_for_dev() is drm/msm's domain, and it doesn't have domain->iova_cookie. Does this crash still happen with 4.20-rc? Because as of 6af588fed391 it really shouldn't. We kind of avoided this problem prior to sdm845/dpu because the iommu was attached to the mdp node in dt, which is a child of the toplevel mdss node (which corresponds to the dev passed in dma_map_sg()). But with sdm845, now the iommu is attached at the mdss level so we hit the iommu_dma_ops in dma_map_sg(). But auto allocating/attaching a domain before the driver is probed was already a blocking problem for enabling per-context pagetables for the GPU. This problem is also now solved with this patch. s/solved/worked around/ If you want a guarantee of actually getting a specific hardware context allocated for a given domain, there needs to be code in the IOMMU driver to understand and honour that. Implicitly depending on whatever happens to fall out of current driver behaviour on current systems is not a real solution. Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in of_dma_configure That's rather misleading, since the crash described above depends on at least two other major changes which came long after that commit. It's not that I don't understand exactly what you want here - just that this commit message isn't a coherent justification for that ;) Tested-by: Douglas Anderson Signed-off-by: Rob Clark --- This is an alternative/replacement for [1]. 
What it lacks in elegance it makes up for in practicality ;-) [1] https://patchwork.freedesktop.org/patch/264930/ drivers/of/device.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/of/device.c b/drivers/of/device.c index 5957cd4fa262..15ffee00fb22 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev) return device_add(&ofdev->dev); } +static const struct of_device_id iommu_blacklist[] = { + { .compatible = "qcom,mdp4" }, + { .compatible = "qcom,mdss" }, + { .compatible = "qcom,sdm845-mdss" }, + { .compatible = "qcom,adreno" }, + {} +}; + /** * of_dma_configure - Setup DMA configuration * @dev: Device to apply DMA configurati
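Mechanically, the proposed blacklist is just a compatible-string lookup performed before installing the default DMA ops. A minimal standalone model (plain strcmp over the quoted table, where the kernel would use of_match_device() on struct of_device_id):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const iommu_blacklist[] = {
    "qcom,mdp4",
    "qcom,mdss",
    "qcom,sdm845-mdss",
    "qcom,adreno",
    NULL,
};

/* Returns 1 if the device should get the default IOMMU/DMA ops hookup,
 * 0 if its driver manages the IOMMU itself at the iommu API level. */
static int use_default_iommu_ops(const char *compatible)
{
    for (size_t i = 0; iommu_blacklist[i]; i++)
        if (!strcmp(compatible, iommu_blacklist[i]))
            return 0;
    return 1;
}
```

Rob Herring's scaling objection is visible even here: every driver wanting this behaviour needs another entry in a central table, rather than opting out from its own probe path.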
Re: [Freedreno] [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*
On 29/11/2018 19:57, Tomasz Figa wrote: On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse wrote: On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote: On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig wrote: On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote: Maybe the thing we need to do is just implement a blacklist of compatible strings for devices which should skip the automatic iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem preventing us from enabling per-process pagetables for a5xx (where we need to control the domain/context-bank that is allocated by the dma api). You can detach from the dma map attachment using arm_iommu_detach_device, which a few drm drivers do, but I don't think this is the problem. I think even with detach, we wouldn't end up with the context-bank that the gpu firmware was hard-coded to expect, and so it would overwrite the incorrect page table address register. (I could be mis-remembering that, Jordan spent more time looking at that. But it was something along those lines.) Right - basically the DMA domain steals context bank 0 and the GPU is hard coded to use that context bank for pagetable switching. I believe the Tegra guys also had a similar problem with a hard coded context bank. AIUI, they don't need a specific hardware context, they just need to know which one they're actually using, which the domain abstraction hides. Wait, if we detach the GPU/display struct device from the default domain and attach it to a newly allocated domain, wouldn't the newly allocated domain use the context bank we need? Note that we're already The arm-smmu driver doesn't, but there's no fundamental reason it couldn't. That should just need code to refcount domain users and release hardware contexts for domains with no devices currently attached. Robin. 
doing that, except that we're doing it behind the back of the DMA mapping subsystem, so that it keeps using the IOMMU version of the DMA ops for the device and doing any mapping operations on the default domain. If we ask the DMA mapping to detach, wouldn't it essentially solve the problem? Best regards, Tomasz ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v18 1/5] iommu/arm-smmu: Add pm_runtime/sleep ops
On 28/11/2018 16:24, Stephen Boyd wrote: Quoting Vivek Gautam (2018-11-27 02:11:41) @@ -1966,6 +1970,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; Is this clk_bulk_get_all()? Ooh, did that finally get merged while we weren't looking? Great! Much as I don't want to drag this series out to a v19, it *would* be neat if we no longer need to open-code that bit... Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v16 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
On 30/08/18 15:45, Vivek Gautam wrote: qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. On msm8996, multiple cores, viz. mdss, video, etc. use this smmu. On sdm845, this smmu is used with gpu. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 166c8c6da24f..411e5ac57c64 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -119,6 +119,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, Hmm, it seems we don't actually need this right now, but maybe that just means there's more imp-def registers and/or errata to come ;) Either way I guess there's no real harm in having it. Reviewed-by: Robin Murphy }; struct arm_smmu_s2cr { @@ -1970,6 +1971,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +static const char * const qcom_smmuv2_clks[] = { + "bus", "iface", +}; + +static const struct arm_smmu_match_data qcom_smmuv2 = { + .version = ARM_SMMU_V2, + .model = QCOM_SMMUV2, + .clks = qcom_smmuv2_clks, + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), +}; + static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, { .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 }, @@ -1977,6 +1989,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401", .data = &arm_mmu401 }, { .compatible = "arm,mmu-500", .data = &arm_mmu500 }, { .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 }, + { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 }, { }, }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); ___ Freedreno mailing list 
Re: [Freedreno] [PATCH v16 4/5] dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2
On 30/08/18 15:45, Vivek Gautam wrote: Add bindings doc for Qcom's smmu-v2 implementation. Reviewed-by: Robin Murphy Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- .../devicetree/bindings/iommu/arm,smmu.txt | 39 ++ 1 file changed, 39 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce12af5..a6504b37cc21 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,10 +17,16 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"qcom,smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. + Qcom SoCs must contain, as below, SoC-specific compatibles + along with "qcom,smmu-v2": + "qcom,msm8996-smmu-v2", "qcom,smmu-v2", + "qcom,sdm845-smmu-v2", "qcom,smmu-v2". + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the @@ -71,6 +77,22 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- clock-names:List of the names of clocks input to the device. The + required list depends on particular implementation and + is as follows: + - for "qcom,smmu-v2": +- "bus": clock required for downstream bus access and + for the smmu ptw, +- "iface": clock required to access smmu's registers + through the TCU's programming interface. + - unspecified for other implementations. + +- clocks: Specifiers for all clocks listed in the clock-names property, + as per generic clock bindings. + +- power-domains: Specifiers for power domains required to be powered on for + the SMMU to operate, as per generic power domain bindings. + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -137,3 +159,20 @@ conditions. iommu-map = <0 &smmu3 0 0x400>; ... 
}; + + /* Qcom's arm,smmu-v2 implementation */ + smmu4: iommu@d0 { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + power-domains = <&mmcc MDSS_GDSC>; + + clocks = <&mmcc SMMU_MDP_AXI_CLK>, +<&mmcc SMMU_MDP_AHB_CLK>; + clock-names = "bus", "iface"; + }; ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v16 3/5] iommu/arm-smmu: Add the device_link between masters and smmu
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Reviewed-by: Robin Murphy Signed-off-by: Sricharan R Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 1bf542010be7..166c8c6da24f 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1461,6 +1461,9 @@ static int arm_smmu_add_device(struct device *dev) iommu_device_link(&smmu->iommu, dev); + device_link_add(dev, smmu->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER); + return 0; out_cfg_free: ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
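The effect of the `device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME | ...)` call in the patch can be modelled in userspace C. This is an assumption-laden sketch of the device-link semantics (consumer resume implicitly holds the supplier active), not kernel code:

```c
#include <assert.h>

/* Model of a PM-runtime device link: runtime-resuming the master
 * (consumer) takes a usage reference on the SMMU (supplier), and
 * suspending the master drops it, so the SMMU is powered exactly
 * while its master needs it. Names are hypothetical. */
struct model_dev { int usage; };   /* stands in for the runtime-PM usage count */

static struct model_dev smmu_dev;    /* supplier */
static struct model_dev master_dev;  /* consumer, linked to supplier */

static void consumer_resume(void)
{
    smmu_dev.usage++;    /* link core resumes the supplier first */
    master_dev.usage++;
}

static void consumer_suspend(void)
{
    master_dev.usage--;
    smmu_dev.usage--;    /* supplier may now runtime-suspend too */
}
```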
Re: [Freedreno] [PATCH v16 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Global locks are also initialized before enabling runtime pm as the runtime_resume() calls device_reset() which does tlb_sync_global() that ultimately requires locks to be initialized. To the best of my knowledge in this stuff (which is still not quite enough to be *truly* confident...), Reviewed-by: Robin Murphy Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 89 +++- 1 file changed, 81 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d900e007c3c9..1bf542010be7 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { { 0, NULL}, }; +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) +{ + if (pm_runtime_enabled(smmu->dev)) + return pm_runtime_get_sync(smmu->dev); + + return 0; +} + +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu) +{ + if (pm_runtime_enabled(smmu->dev)) + pm_runtime_put(smmu->dev); +} + static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain); @@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = arm_smmu_rpm_get(smmu); + if (ret < 0) + return; + /* * Disable the context bank and free the page tables before freeing 
* it. @@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) free_io_pgtable_ops(smmu_domain->pgtbl_ops); __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx); + + arm_smmu_rpm_put(smmu); } static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) @@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENODEV; smmu = fwspec_smmu(fwspec); + + ret = arm_smmu_rpm_get(smmu); + if (ret < 0) + return ret; + /* Ensure that the domain is finalised */ ret = arm_smmu_init_domain_context(domain, smmu); if (ret < 0) - return ret; + goto rpm_put; /* * Sanity check the domain. We don't support domains across @@ -1226,33 +1251,50 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) dev_err(dev, "cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n", dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev)); - return -EINVAL; + ret = -EINVAL; + goto rpm_put; } /* Looks ok, so add the device to the domain */ - return arm_smmu_domain_add_master(smmu_domain, fwspec); + ret = arm_smmu_domain_add_master(smmu_domain, fwspec); + +rpm_put: + arm_smmu_rpm_put(smmu); + return ret; } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + int ret; if (!ops) return -ENODEV; - return ops->map(ops, iova, paddr, size, prot); + arm_smmu_rpm_get(smmu); + ret = ops->map(ops, iova, paddr, size, prot); + arm_smmu_rpm_put(smmu); + + return ret; } static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + size_t ret; if (!ops) return 0; - return ops->unmap(ops, iova, size); + arm_smmu_rpm_get(smmu); + ret = 
ops->unmap(ops, iova, size); + arm_smmu_rpm_put(smmu); + + return ret; } static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
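The conditional `arm_smmu_rpm_get()/put()` wrappers in the diff above exist so that SMMUs without runtime PM never touch `dev->power.lock` in the map/unmap fastpath. A minimal userspace model of that pattern (hypothetical names, not the driver itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the conditional runtime-PM wrapper: the usage counter is
 * only manipulated when runtime PM is enabled for this SMMU, and a
 * map operation is always bracketed by a balanced get/put pair. */
struct model_smmu {
    bool rpm_enabled;
    int usage;          /* stands in for dev->power.usage_count */
};

static struct model_smmu gated_smmu = { true, 0 };   /* power-gated impl. */
static struct model_smmu always_on  = { false, 0 };  /* no clock/power gating */

static int model_rpm_get(struct model_smmu *s)
{
    if (s->rpm_enabled)
        s->usage++;     /* pm_runtime_get_sync() */
    return 0;
}

static void model_rpm_put(struct model_smmu *s)
{
    if (s->rpm_enabled)
        s->usage--;     /* pm_runtime_put() */
}

static int model_map(struct model_smmu *s)
{
    model_rpm_get(s);
    /* ... ops->map() would run here, with the SMMU powered ... */
    model_rpm_put(s);
    return 0;
}
```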
Re: [Freedreno] [PATCH v16 1/5] iommu/arm-smmu: Add pm_runtime/sleep ops
On 30/08/18 15:45, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 77 ++-- 1 file changed, 74 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index fd1b80ef9490..d900e007c3c9 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,8 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clks; + int num_clks; u32cavium_id_base; /* Specific to Cavium */ @@ -1896,10 +1899,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, 
ARM_SMMU_V2, GENERIC_SMMU); @@ -1918,6 +1923,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; +} + #ifdef CONFIG_ACPI static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) { @@ -2000,6 +2022,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->num_clks = data->num_clks; + + arm_smmu_fill_clk_data(smmu, data->clks); parse_driver_options(smmu); @@ -2098,6 +2123,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks); + if (err) + return err; + + err = clk_bulk_prepare_enable(smmu->num_clks, smmu->clks); + if (err) + return err; + Hmm, if we error out beyond here it looks like we should strictly balance that prepare/enable before devres does the clk_bulk_put(), however the probe error path is starting to look like it needs a bit of love in general, so I might just spin a cleanup patch on top (and even then only for the sake of not being a bad example; SMMU probe failure is never a realistic situation for the system to actually recover from). 
Otherwise, Reviewed-by: Robin Murphy err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2184,6 +2217,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + + clk_bulk_disable_unprepare(smmu->num_clks, smmu->clks); + return 0; } @@ -2192,15 +2228,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev) arm_smmu_device_remove(pdev); } -static int __maybe_unused arm_smmu_pm_resume(struct device *dev) +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) { struct arm_smmu_device *smmu = dev_get_drvdata(d
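Robin's point about balancing `clk_bulk_prepare_enable()` on probe error paths amounts to registering a cleanup action right after the enable, devres-style, so later failures unwind it before the clocks are put. A toy userspace model of that LIFO-unwind idea (hypothetical names, not the proposed cleanup patch):

```c
#include <assert.h>

/* Model of a devres-like action stack: probe registers a clock-disable
 * action immediately after enabling clocks, and a late probe failure
 * runs registered actions in LIFO order, so the enable is balanced. */
#define MAX_ACTIONS 4

static void (*actions[MAX_ACTIONS])(void);
static int n_actions;
static int clks_enabled;

static void clk_disable_action(void) { clks_enabled = 0; }

static void add_cleanup_action(void (*fn)(void))
{
    actions[n_actions++] = fn;
}

static int model_probe(int fail_late)
{
    clks_enabled = 1;                       /* clk_bulk_prepare_enable() */
    add_cleanup_action(clk_disable_action); /* like devm_add_action_or_reset() */
    if (fail_late)
        goto err;                           /* e.g. cfg_probe failed */
    return 0;
err:
    while (n_actions)                       /* devres unwinds LIFO */
        actions[--n_actions]();
    return -1;
}
```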
Re: [Freedreno] [PATCH v16 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
Hi Vivek, On 2018-09-25 6:56 AM, Vivek Gautam wrote: Hi Robin, Will, On Tue, Sep 18, 2018 at 8:41 AM Vivek Gautam wrote: Hi Robin, On Fri, Sep 7, 2018 at 3:52 PM Vivek Gautam wrote: On Fri, Sep 7, 2018 at 3:22 PM Tomasz Figa wrote: On Fri, Sep 7, 2018 at 6:38 PM Vivek Gautam wrote: Hi Tomasz, On 9/7/2018 2:46 PM, Tomasz Figa wrote: Hi Vivek, On Thu, Aug 30, 2018 at 11:46 PM Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Global locks are also initialized before enabling runtime pm as the runtime_resume() calls device_reset() which does tlb_sync_global() that ultimately requires locks to be initialized. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla --- drivers/iommu/arm-smmu.c | 89 +++- 1 file changed, 81 insertions(+), 8 deletions(-) [snip] @@ -2215,10 +2281,17 @@ static int arm_smmu_device_remove(struct platform_device *pdev) if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS)) dev_err(&pdev->dev, "removing device with active domains!\n"); + arm_smmu_rpm_get(smmu); /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + arm_smmu_rpm_put(smmu); + + if (pm_runtime_enabled(smmu->dev)) + pm_runtime_force_suspend(smmu->dev); + else + clk_bulk_disable(smmu->num_clks, smmu->clks); - clk_bulk_disable_unprepare(smmu->num_clks, smmu->clks); + clk_bulk_unprepare(smmu->num_clks, smmu->clks); Aren't we missing pm_runtime_disable() here? We'll have the enable count unbalanced if the driver is removed and probed again. pm_runtime_force_suspend() does a pm_runtime_disable() also if i am not wrong. And, as mentioned in a previous thread [1], we were seeing a warning which we avoided by keeping force_suspend(). 
[1] https://lkml.org/lkml/2018/7/8/124 I see, thanks. I didn't realize that pm_runtime_force_suspend() already disables runtime PM indeed. Sorry for the noise. Hi Tomasz, No problem. Thanks for looking back at it. Hi Robin, If you are fine with this series, then can you please consider giving Reviewed-by, so that we are certain that this series will go in the next merge window. Thanks Gentle ping. You ack will be very helpful in letting Will pull this series for 4.20. Thanks. I would really appreciate if you could provide your ack for this series. Or if there are any concerns, I am willing to address them. Apologies, I thought I'd replied to say I'd be getting to this shortly, but apparently not :( FWIW, "shortly" is now tomorrow - I don't *think* there's anything outstanding, but given the number of subtleties we've turned up so far I do just want one last thorough double-check to make sure. Thanks, Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
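The remove-path subtlety settled in this exchange — that `pm_runtime_force_suspend()` both runs the suspend callback and disables runtime PM, so no separate `pm_runtime_disable()` is needed — can be captured with a small userspace model (an illustrative assumption, not the driver code):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the remove path: if runtime PM was enabled, force-suspend
 * (which also disables runtime PM, keeping the enable count balanced
 * for a later re-probe); otherwise just turn the clocks off. */
static bool rpm_enabled = true;
static bool clks_on = true;

static void runtime_suspend(void) { clks_on = false; }

static void force_suspend(void)
{
    rpm_enabled = false;    /* implicit pm_runtime_disable() */
    runtime_suspend();
}

static void model_remove(void)
{
    if (rpm_enabled)
        force_suspend();    /* suspends *and* disables runtime PM */
    else
        clks_on = false;    /* clk_bulk_disable() directly */
}
```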
Re: [Freedreno] [PATCH v14 0/4] iommu/arm-smmu: Add runtime pm/sleep support
On 20/08/18 10:31, Tomasz Figa wrote: Hi Robin, On Fri, Jul 27, 2018 at 4:02 PM Vivek Gautam wrote: This series provides the support for turning on the arm-smmu's clocks/power domains using runtime pm. This is done using device links between smmu and client devices. The device link framework keeps the two devices in correct order for power-cycling across runtime PM or across system-wide PM. With addition of a new device link flag DL_FLAG_AUTOREMOVE_SUPPLIER [8] (available in linux-next of Rafael's linux-pm tree [9]), the device links created between arm-smmu and its clients will be automatically purged when arm-smmu driver unbinds from its device. As not all implementations support clock/power gating, we are checking for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime power management for such smmu implementations that can support it. Otherwise, the clocks are turned to be always on in .probe until .remove. With conditional runtime pm now, we avoid touching dev->power.lock in fastpaths for smmu implementations that don't need to do anything useful with pm_runtime. This lets us to use the much-argued pm_runtime_get_sync/put_sync() calls in map/unmap callbacks so that the clients do not have to worry about handling any of the arm-smmu's power. This series also adds support for Qcom's arm-smmu-v2 variant that has different clocks and power requirements. Previous version of this patch series is @ [2]. Tested this series on msm8996, and sdm845 after pulling in Rafael's linux-pm linux-next[9] and Joerg's iommu next[10] branches, and related changes for device tree, etc. Hi Robin, Will, I have addressed the comments for v13. If there's still a chance can you please consider pulling this for v4.19. Thanks. [v14] * Moved arm_smmu_device_reset() from arm_smmu_pm_resume() to arm_smmu_runtime_resume() so that the pm_resume callback calls only runtime_resume to resume the device. 
This should take care of restoring the state of smmu in systems in which smmu lose register state on power-domain collapse. It's been a while since this series was posted and no more comments seem to be left anymore. Would you have some time to take a look again? Thanks. Other than the binding issue which turned up in the meantime, I *think* this is looking OK now in terms of being sufficiently safe for all the various awkward retention vs. state-loss combinations. There's almost certainly still ways to improve it in future, but what we have now seems like a reasonable starting point that isn't impossibly complicated to reason about. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v14 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
On 27/07/18 08:02, Vivek Gautam wrote: qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. This smmu core is used with multiple masters on msm8996, viz. mdss, video, etc. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa --- Change since v13: - No change. .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++ drivers/iommu/arm-smmu.c | 13 +++ 2 files changed, 55 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce12af5..7c71a6ed465a 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,10 +17,19 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"qcom,-smmu-v2", "qcom,smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. + A number of Qcom SoCs use qcom,smmu-v2 version of the IP. + "qcom,-smmu-v2" represents a soc specific compatible + string that should be present along with the "qcom,smmu-v2" + to facilitate SoC specific clocks/power connections and to + address specific bug fixes. As demonstrated in the GPU thread, this proves a bit too vague for a useful binding. Provided Qcom folks can reach a consensus on what a given SoC is actually called, I'd rather just unambiguously list whatever sets of fully-defined strings we need. Robin. + An example string would be - + "qcom,msm8996-smmu-v2", "qcom,smmu-v2". + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the @@ -71,6 +80,22 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- clock-names:List of the names of clocks input to the device. 
The + required list depends on particular implementation and + is as follows: + - for "qcom,smmu-v2": +- "bus": clock required for downstream bus access and + for the smmu ptw, +- "iface": clock required to access smmu's registers + through the TCU's programming interface. + - unspecified for other implementations. + +- clocks: Specifiers for all clocks listed in the clock-names property, + as per generic clock bindings. + +- power-domains: Specifiers for power domains required to be powered on for + the SMMU to operate, as per generic power domain bindings. + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -137,3 +162,20 @@ conditions. iommu-map = <0 &smmu3 0 0x400>; ... }; + + /* Qcom's arm,smmu-v2 implementation */ + smmu4: iommu { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + power-domains = <&mmcc MDSS_GDSC>; + + clocks = <&mmcc SMMU_MDP_AXI_CLK>, +<&mmcc SMMU_MDP_AHB_CLK>; + clock-names = "bus", "iface"; + }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e558abf1ecfc..2b4edba188a5 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -119,6 +119,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, }; struct arm_smmu_s2cr { @@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +static const char * const qcom_smmuv2_clks[] = { + "bus", "iface", +}; + +static const struct arm_smmu_match_data qcom_smmuv2 = { + .version = ARM_SMMU_V2, + .model = QCOM_SMMUV2, + .clks = qcom_smmuv2_clks, + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), +}; + static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, { .compatible = 
"arm,smmu-v2", .data = &smmu_generic_v2 }, @@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401",
Re: [Freedreno] [PATCH 2/2] arm64: dts: sdm845: Add gpu and gmu device nodes
On 08/08/18 23:47, Jordan Crouse wrote: Add the nodes to describe the Adreno GPU and GMU devices. Signed-off-by: Jordan Crouse --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 121 +++ 1 file changed, 121 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi index cdaabeb3c995..9fb90bb4ea1f 100644 --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -323,5 +323,126 @@ status = "disabled"; }; }; + + adreno_smmu: arm,smmu-adreno@504 { + compatible = "qcom,msm8996-smmu-v2"; Per the binding, this should be: compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; (note that even with Vivek's series the driver won't actually match the SoC-specific string until we find a real need to) + reg = <0x504 0x1>; + #iommu-cells = <1>; + #global-interrupts = <2>; Indentation? + interrupts = , +, And here? Otherwise, assuming the table walk really isn't cache-coherent, and the global and CB interrupts really do have different triggers (yuck :P), the SMMU parts look fine to me. Robin. 
+, +, +, +, +, +, +, +; + clocks = <&clock_gcc GCC_GPU_MEMNOC_GFX_CLK>, + <&clock_gcc GCC_GPU_CFG_AHB_CLK>; + clock-names = "bus", "iface"; + + power-domains = <&clock_gpucc GPU_CX_GDSC>; + }; + + gpu_opp_table: adreno-opp-table { + compatible = "operating-points-v2"; + + opp-71000 { + opp-hz = /bits/ 64 <71000>; + qcom,level = <416>; + }; + + opp-67500 { + opp-hz = /bits/ 64 <67500>; + qcom,level = <384>; + }; + + opp-59600 { + opp-hz = /bits/ 64 <59600>; + qcom,level = <320>; + }; + + opp-52000 { + opp-hz = /bits/ 64 <52000>; + qcom,level = <256>; + }; + + opp-41400 { + opp-hz = /bits/ 64 <41400>; + qcom,level = <192>; + }; + + opp-34200 { + opp-hz = /bits/ 64 <34200>; + qcom,level = <128>; + }; + + opp-25700 { + opp-hz = /bits/ 64 <25700>; + qcom,level = <64>; + }; + }; + + gpu@500 { + compatible = "qcom,adreno-630.2", "qcom,adreno"; + #stream-id-cells = <16>; + + reg = <0x500 0x4>; + reg-names = "kgsl_3d0_reg_memory"; + + /* +* Look ma, no clocks! The GPU clocks and power are +* controlled entirely by the GMU +*/ + + interrupts = <0 300 0>; + interrupt-names = "kgsl_3d0_irq"; + + iommus = <&adreno_smmu 0>; + + operating-points-v2 = <&gpu_opp_table>; + + qcom,gmu = <&gmu>; + }; + + gmu_opp_table: adreno-gmu-opp-table { + compatible = "operating-points-v2"; + + opp-4 { + opp-hz = /bits/ 64 <4>; + qcom,level = <128>; + }; + + opp-2 { + opp-hz = /bits/ 64 <2>; + qcom,level = <48>; + }; + }; + + gmu: gmu@506a000 { + compatible="qcom,adreno-gmu"; + + reg = <0x506a000 0x3>, + <0xb28 0x1>, + <0xb48 0x1>; + reg-names = "gmu", "gmu_pdc", "gmu_pdc_seq"; + + interrupts = , +; + interrupt-names = "hfi", "gmu";
Re: [Freedreno] [PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops
On 26/07/18 08:12, Vivek Gautam wrote: On Wed, Jul 25, 2018 at 11:46 PM, Vivek Gautam wrote: On Tue, Jul 24, 2018 at 8:51 PM, Robin Murphy wrote: On 19/07/18 11:15, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa --- Changes since v12: - Added pm sleep .suspend callback. This disables the clocks. - Added corresponding change to enable clocks in .resume pm sleep callback. drivers/iommu/arm-smmu.c | 75 ++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c73cfce1ccc0..9138a6fffe04 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c [snip] platform_device *pdev) arm_smmu_device_remove(pdev); } +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return clk_bulk_enable(smmu->num_clks, smmu->clks); If there's a power domain being automatically switched by genpd then we need a reset here because we may have lost state entirely. 
Since I remembered the otherwise-useless GPU SMMU on Juno is in a separate power domain, I gave it a poking via sysfs with some debug stuff to dump sCR0 in these callbacks, and the problem is clear: ... [4.625551] arm-smmu 2b40.iommu: genpd_runtime_suspend() [4.631163] arm-smmu 2b40.iommu: arm_smmu_runtime_suspend: 0x00201936 [4.637897] arm-smmu 2b40.iommu: suspend latency exceeded, 6733980 ns [ 21.566983] arm-smmu 2b40.iommu: genpd_runtime_resume() [ 21.584796] arm-smmu 2b40.iommu: arm_smmu_runtime_resume: 0x00220101 [ 21.591452] arm-smmu 2b40.iommu: resume latency exceeded, 6658020 ns ... Qualcomm SoCs have retention enabled for SMMU registers so they don't lose state. ... [ 256.013367] arm-smmu b4.arm,smmu: arm_smmu_runtime_suspend SCR0 = 0x201e36 [ 256.013367] [ 256.019160] arm-smmu b4.arm,smmu: arm_smmu_runtime_resume SCR0 = 0x201e36 [ 256.019160] [ 256.027368] arm-smmu b4.arm,smmu: arm_smmu_runtime_suspend SCR0 = 0x201e36 [ 256.027368] [ 256.036786] arm-smmu b4.arm,smmu: arm_smmu_runtime_resume SCR0 = 0x201e36 ... However after adding arm_smmu_device_reset() in runtime_resume() I observe some performance degradation when kill an instance of 'kmscube' and start it again. The launch time with arm_smmu_device_reset() in runtime_resume() change is more. Could this be because of frequent TLB invalidation and sync? Probably. Plus the reset procedure is a big chunk of MMIO accesses, which for a non-trivial SMMU configuration probably isn't negligible in itself. Unfortunately, unless you know for absolute certain that you don't need to do that, you do. Some more information that i gathered. On Qcom SoCs besides the registers retention, TCU invalidates TLB cache on a CX power collapse exit, which is the system wide suspend case. The arm-smmu software is not aware of this CX power collapse / auto-invalidation. So wouldn't doing an explicit TLB invalidations during runtime resume be detrimental to performance? 
Indeed it would be, but resuming with TLBs full of random valid-looking junk is even more so. I have one more doubt here - We do runtime power cycle around arm_smmu_map/unmap() too. Now during map/unmap we selectively do TLB maintenance (either tlb_sync or tlb_add_flush). But with runtime pm we want to do TLBIALL*. Is that a problem? It's technically redundant to do both, true, but as we've covered in previous rounds of discussion it's very difficult to know *which* one is sufficient at any given time, so in order to make progress for now I think we have to settle with doing both. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops
On 19/07/18 11:15, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Also, while we enable the runtime pm add a pm sleep suspend callback that pushes devices to low power state by turning the clocks off in a system sleep. Also add corresponding clock enable path in resume callback. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: rework for clock and pm ops] Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa --- Changes since v12: - Added pm sleep .suspend callback. This disables the clocks. - Added corresponding change to enable clocks in .resume pm sleep callback. 
drivers/iommu/arm-smmu.c | 75 ++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c73cfce1ccc0..9138a6fffe04 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,8 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clks; + int num_clks; u32cavium_id_base; /* Specific to Cavium */ @@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); @@ -1919,6 +1924,23 @@ static const struct of_device_id arm_smmu_of_match[] = { }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu, + const char * const *clks) +{ + int i; + + if (smmu->num_clks < 1) + return; + + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks, + sizeof(*smmu->clks), GFP_KERNEL); + if (!smmu->clks) + return; + + for (i = 0; i < smmu->num_clks; i++) + smmu->clks[i].id = clks[i]; +} + #ifdef CONFIG_ACPI static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) { @@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->num_clks = data->num_clks; + + arm_smmu_fill_clk_data(smmu, data->clks); parse_driver_options(smmu); @@ -2099,6 +2124,14 @@ static int 
arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks); + if (err) + return err; + + err = clk_bulk_prepare(smmu->num_clks, smmu->clks); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + + clk_bulk_unprepare(smmu->num_clks, smmu->clks); + return 0; } @@ -2189,15 +2225,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev) arm_smmu_device_remove(pdev); } +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return clk_bulk_enable(smmu->num_clks, smmu->clks); If there's a power domain being automatically switched by genpd then we need a reset here because we may have lost state entirely. Since I remembered the otherwise-useless GPU SMMU on Juno is in a separate power domain, I gave it a poking via sysfs with some debug stuff to dump sCR0 in these callbacks, and the problem is clear: ... [4.625551] arm-smmu 2b40.iommu: genpd_runtime_suspend() [4.631163] arm-smmu 2b4
Re: [Freedreno] [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu
On 18/07/18 10:30, Vivek Gautam wrote: On Wed, Jul 11, 2018 at 3:23 PM, Rafael J. Wysocki wrote: On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from the add_device callback, which gets called once when the master is added to the smmu. Signed-off-by: Sricharan R Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Cc: Rafael J. Wysocki Cc: Lukas Wunner --- - Change since v11 * Replaced DL_FLAG_AUTOREMOVE flag with DL_FLAG_AUTOREMOVE_SUPPLIER. drivers/iommu/arm-smmu.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 09265e206e2d..916cde4954d2 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device *dev) iommu_device_link(&smmu->iommu, dev); + if (pm_runtime_enabled(smmu->dev) && Why does the creation of the link depend on whether or not runtime PM is enabled for the MMU device? What about system-wide PM and system shutdown? Are they always guaranteed to happen in the right order without the link? Hi Robin, As Rafael pointed out, the device link creation should not depend on runtime PM being enabled or not, as we also want to guarantee that system-wide PM callbacks are called in the right order for the smmu and its clients. Does removing the check for pm_runtime_enabled() here look okay to you? FWIW the existing system PM ops make no claim to be perfect, and I wouldn't be at all surprised if it was only by coincidence that my devices happened to be put on the relevant lists in the right order to start with. If we no longer need to worry about explicit device_link housekeeping in the SMMU driver, then creating them unconditionally sounds like the sensible thing to do.
I'd be inclined to treat failure as non-fatal like we do for the sysfs link, though, since it's another thing that correct SMMU operation doesn't actually depend on (at this point we don't necessarily know if this consumer even has a driver at all). FYI, as discussed in the first patch [1] of this series, I will add a system wide suspend callback - arm_smmu_pm_suspend, that would do clock disable, and will add corresponding clock enable calls in arm_smmu_pm_resume(). OK, I still don't really understand the finer points of how system PM and runtime PM interact, but if making it robust is just a case of calling the runtime suspend/resume hooks as appropriate from the system ones, that sounds reasonable. Robin. [1] https://lore.kernel.org/patchwork/patch/960460/ Best regards Vivek + !device_link_add(dev, smmu->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER)) { + dev_err(smmu->dev, "Unable to add link to the consumer %s\n", + dev_name(dev)); + ret = -ENODEV; + goto out_unlink; + } + return 0; +out_unlink: + iommu_device_unlink(&smmu->iommu, dev); + arm_smmu_master_free_smes(fwspec); out_cfg_free: kfree(cfg); out_free:
Re: [Freedreno] [VERY RFC 0/2] iommu: optionally skip attaching to a DMA domain
On 11/04/18 19:55, Jordan Crouse wrote: I've been struggling with a problem for a while and I haven't been able to come up with a clean solution. Rob convinced me to stop complaining and do _something_ and hopefully this can spur a good discussion. The scenario is basically this: The MSM GPU wants to manage its own IOMMU virtual address space in lieu of using the DMA API to do so. The most important reason is that when per-instance pagetables [1] are enabled each file descriptor uses its own pagetable which are dynamically switched between contexts. In conjunction with this we use a split pagetable scheme to allow global buffers be persistently mapped so even a single pagetable has multiple ranges that need to be utilized based on buffer type. Obviously the DMA API isn't suitable for this kind of specialized use. We want to push it off to the side, allocate our own domain, and use it directly with the IOMMU APIs. In a perfect world we would just not use the DMA API and attach our own domain and that would be the end of the story. Unfortunately this is not a perfect world. In order to switch the pagetable from the GPU the SoC has some special wiring so that the GPU can reprogram the TTBR0 register in the IOMMU to the appropriate value. For a variety of reasons the hardware is hardcoded to only be able to access context bank 0 in the SMMU device. Long story short; if the DMA domain is allocated during bus enumeration and attached to context bank 0 then by the time we get to the driver domain we have long since lost our chance to get the context bank we need. Red herrings abound! That's not actually the DMA domain's fault at all, it just happens to fall out of the current SMMU driver behaviour. If the context bank allocator went top-down instead of bottom-up (IIRC I nearly ended up with that at one point during refactoring all that lot, or maybe that was the SMR allocator...) then you'd see a very different perspective of the problem. 
The driver could certainly do with being cleverer in terms of not keeping hardware CBs allocated for inactive domains - right now I don't think you can even move a device from one domain to another at all if you only have a single context bank, which is pretty bogus. That alone might even manage to hide this problem, but I wouldn't like to rely on it. The "device X can only use context bank N" condition is a real hardware limitation which should be described by firmware, but I don't have any immediate good ideas as to how :( The matter of opting out of DMA ops which expect the default domain is a separate and more general issue, so I won't start a fork of that discussion here. Robin. After going back and forth and trying a few possible options it seems like the "easiest" solution is to skip attaching the DMA domain in the first place. But we still need to create a default domain and set up the IOMMU groups correctly so that when the GPU driver comes along it can attach the device and everything will Just Work (TM). Rob suggested that we should use a blacklist of devices that choose to not participate so that's what I'm offering as a starting point for discussion. The first patch in this series allows the specific IOMMU driver to gracefully skip attaching a device by returning -ENOTSUPP. In that case, the core will skip printing error messages and it won't attach the domain to the group but it still allows the group to get created so that the IOMMU device is properly set up. In arch_setup_dma_ops() the call to iommu_get_domain_for_dev() will return NULL and the IOMMU ops won't be set up. The second patch implements the blacklist in arm-smmu.c based on the compatible string of the GPU device and returns -ENOTSUPP. I tested this with my current per-instance pagetable stack and it does the job but I am very much in the market for better ideas.
[1] https://patchwork.freedesktop.org/series/38729/ Jordan Crouse (2): iommu: Gracefully allow drivers to not attach to a default domain iommu/arm-smmu: Add list of devices to opt out of DMA domains drivers/iommu/arm-smmu.c | 23 +++ drivers/iommu/iommu.c| 18 ++ 2 files changed, 37 insertions(+), 4 deletions(-)
Re: [Freedreno] [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
On 21/02/18 22:59, Jordan Crouse wrote: Allow a SMMU device to opt into allocating a TTBR1 pagetable. The size of the TTBR1 region will be the same as the TTBR0 size with the sign extension bit set on the highest bit in the region unless the upstream size is 49 bits and then the sign-extension bit will be set on the 49th bit. Um, isn't the 49th bit still "the highest bit" if the address size is 49 bits? ;) The map/unmap operations will automatically use the appropriate pagetable based on the specified iova and the existing mask. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu-regs.h | 2 - drivers/iommu/arm-smmu.c | 22 -- drivers/iommu/io-pgtable-arm.c | 160 - drivers/iommu/io-pgtable-arm.h | 20 ++ drivers/iommu/io-pgtable.h | 16 - 5 files changed, 192 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index a1226e4ab5f8..0ce85d5b22e9 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg { #define RESUME_RETRY (0 << 0) #define RESUME_TERMINATE (1 << 0) -#define TTBCR2_SEP_SHIFT 15 -#define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT) #define TTBCR2_AS (1 << 4) #define TTBRn_ASID_SHIFT 48 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 69e7c60792a8..ebfa59b59622 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -248,6 +248,7 @@ struct arm_smmu_domain { enum arm_smmu_domain_stage stage; struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + u32 attributes; struct iommu_domain domain; }; @@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, } else { cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr; cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32; - cb->tcr[1] |= TTBCR2_SEP_UPSTREAM; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) cb->tcr[1] |= TTBCR2_AS; } @@ -729,6 +729,9 @@ static int 
arm_smmu_init_domain_context(struct iommu_domain *domain, enum io_pgtable_fmt fmt; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + unsigned int quirks = + smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ? + IO_PGTABLE_QUIRK_ARM_TTBR1 : 0; mutex_lock(&smmu_domain->init_mutex); if (smmu_domain->smmu) @@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, else cfg->asid = cfg->cbndx + smmu->cavium_id_base; + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + quirks |= IO_PGTABLE_QUIRK_NO_DMA; + pgtbl_cfg = (struct io_pgtable_cfg) { + .quirks = quirks, .pgsize_bitmap = smmu->pgsize_bitmap, .ias= ias, .oas= oas, @@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; - if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) - pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA; - smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { @@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_ENABLE_TTBR1: + *((int *)data) = !!(smmu_domain->attributes + & (1 << DOMAIN_ATTR_ENABLE_TTBR1)); + return 0; default: return -ENODEV; } @@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; + break; + case DOMAIN_ATTR_ENABLE_TTBR1: + if (*((int *)data)) + smmu_domain->attributes |= + 1 << DOMAIN_ATTR_ENABLE_TTBR1; + ret = 0; break; default: ret = -ENODEV; diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index fff0b6ba0a69..1bd0045f2cb7 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable { unsigned long
Re: [Freedreno] [PATCH 01/14] iommu: Add DOMAIN_ATTR_ENABLE_TTBR1
On 21/02/18 22:59, Jordan Crouse wrote: Add a new domain attribute to enable the TTBR1 pagetable for drivers and devices that support it. This will be enabled using a TTBR1 (otherwise known as a "global" or "system" pagetable) for devices that support a split pagetable scheme for switching pagetables quickly and safely. TTBR1 is very much an Arm VMSA-specific term; if the concept of a split address space is useful in general, is it worth trying to frame it in general terms? AFAICS other IOMMU drivers could achieve the same effect fairly straightforwardly by simply copying the top-level "global" entries across whenever they switch "private" tables. FWIW even for SMMU there could potentially be cases with Arm Ltd. IP where the SoC vendor implements a stage-2-only configuration in their media subsystem, because they care most about minimising area and stage-1-only isn't an option. Robin. Signed-off-by: Jordan Crouse --- include/linux/iommu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 641aaf0f1b81..e2c49e583d8d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -153,6 +153,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMU_ENABLE, DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING,/* two stages of translation */ + DOMAIN_ATTR_ENABLE_TTBR1, DOMAIN_ATTR_MAX, };
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
[sorry, I had intended to reply sooner but clearly forgot] On 16/02/18 00:13, Tomasz Figa wrote: On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy wrote: On 15/02/18 04:17, Tomasz Figa wrote: [...] Could you elaborate on what kind of locking you are concerned about? As I explained before, the normally happening fast path would lock dev->power_lock only for the brief moment of incrementing the runtime PM usage counter. My bad, that's not even it. The atomic usage counter is incremented beforehands, without any locking [1] and the spinlock is acquired only for the sake of validating that device's runtime PM state remained valid indeed [2], which would be the case in the fast path of the same driver doing two mappings in parallel, with the master powered on (and so the SMMU, through device links; if master was not powered on already, powering on the SMMU is unavoidable anyway and it would add much more latency than the spinlock itself). We now have no locking at all in the map path, and only a per-domain lock around TLB sync in unmap which is unfortunately necessary for correctness; the latter isn't too terrible, since in "serious" hardware it should only be serialising a few cpus serving the same device against each other (e.g. for multiple queues on a single NIC). Putting in a global lock which serialises *all* concurrent map and unmap calls for *all* unrelated devices makes things worse. Period. Even if the lock itself were held for the minimum possible time, i.e. trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that one cache line around between 96 CPUs across two sockets is not negligible. Fair enough. 
Note that we're in a quite interesting situation now: a) We need to have runtime PM enabled on Qualcomm SoC to have power properly managed, b) We need to have lock-free map/unmap on such distributed systems, c) If runtime PM is enabled, we need to call into runtime PM from any code that does hardware accesses, otherwise the IOMMU API (and so DMA API and then any V4L2 driver) becomes unusable. I can see one more way that could potentially let us have all the three. How about enabling runtime PM only on selected implementations (e.g. qcom,smmu) and then having all the runtime PM calls surrounded with if (pm_runtime_enabled()), which is lockless? Yes, that's the kind of thing I was gravitating towards - my vague thought was adding some flag to the smmu_domain, but pm_runtime_enabled() does look conceptually a lot cleaner. [1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 In any case, I can't imagine this working with V4L2 or anything else relying on any memory management more generic than calling IOMMU API directly from the driver, with the IOMMU device having runtime PM enabled, but without managing the runtime PM from the IOMMU driver's callbacks that need access to the hardware. As I mentioned before, only the IOMMU driver knows when exactly the real hardware access needs to be done (e.g. Rockchip/Exynos don't need to do that for map/unmap if the power is down, but some implementations of SMMU with TLB powered separately might need to do so). It's worth noting that Exynos and Rockchip are relatively small self-contained IP blocks integrated closely with the interfaces of their relevant master devices; SMMU is an architecture, implementations of which may be large, distributed, and have complex and wildly differing internal topologies. 
As such, it's a lot harder to make hardware-specific assumptions and/or be correct for all possible cases. Don't get me wrong, I do ultimately agree that the IOMMU driver is the only agent who ultimately knows what calls are going to be necessary for whatever operation it's performing on its own hardware*; it's just that for SMMU it needs to be implemented in a way that has zero impact on the cases where it doesn't matter, because it's not viable to specialise that driver for any particular IP implementation/use-case. Still, exactly the same holds for the low power embedded use cases, where we strive for the lowest possible power consumption while keeping performance high as well. And so the SMMU code is expected to also work with our use cases, such as V4L2 or DRM drivers. Since these points don't hold for the current SMMU code, I could say that it has already been specialized for large, distributed implementations. Heh, really it's specialised for ease of maintenance in terms of doing as little as we can get away with, but for what we have implemented, fast code does save CPU cycles and power on embedded systems too ;) Robin.
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On 15/02/18 04:17, Tomasz Figa wrote: [...] Could you elaborate on what kind of locking you are concerned about? As I explained before, the normally happening fast path would lock dev->power_lock only for the brief moment of incrementing the runtime PM usage counter. My bad, that's not even it. The atomic usage counter is incremented beforehands, without any locking [1] and the spinlock is acquired only for the sake of validating that device's runtime PM state remained valid indeed [2], which would be the case in the fast path of the same driver doing two mappings in parallel, with the master powered on (and so the SMMU, through device links; if master was not powered on already, powering on the SMMU is unavoidable anyway and it would add much more latency than the spinlock itself). We now have no locking at all in the map path, and only a per-domain lock around TLB sync in unmap which is unfortunately necessary for correctness; the latter isn't too terrible, since in "serious" hardware it should only be serialising a few cpus serving the same device against each other (e.g. for multiple queues on a single NIC). Putting in a global lock which serialises *all* concurrent map and unmap calls for *all* unrelated devices makes things worse. Period. Even if the lock itself were held for the minimum possible time, i.e. trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that one cache line around between 96 CPUs across two sockets is not negligible. 
[1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 In any case, I can't imagine this working with V4L2 or anything else relying on any memory management more generic than calling IOMMU API directly from the driver, with the IOMMU device having runtime PM enabled, but without managing the runtime PM from the IOMMU driver's callbacks that need access to the hardware. As I mentioned before, only the IOMMU driver knows when exactly the real hardware access needs to be done (e.g. Rockchip/Exynos don't need to do that for map/unmap if the power is down, but some implementations of SMMU with TLB powered separately might need to do so). It's worth noting that Exynos and Rockchip are relatively small self-contained IP blocks integrated closely with the interfaces of their relevant master devices; SMMU is an architecture, implementations of which may be large, distributed, and have complex and wildly differing internal topologies. As such, it's a lot harder to make hardware-specific assumptions and/or be correct for all possible cases. Don't get me wrong, I do ultimately agree that the IOMMU driver is the only agent who ultimately knows what calls are going to be necessary for whatever operation it's performing on its own hardware*; it's just that for SMMU it needs to be implemented in a way that has zero impact on the cases where it doesn't matter, because it's not viable to specialise that driver for any particular IP implementation/use-case. Robin. *AFAICS it still makes some sense to have the get_suppliers option as well, though - the IOMMU driver does what it needs for correctness internally, but the external consumer doing something non-standard can grab and hold the link around multiple calls to short-circuit that.
Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers
On 14/02/18 10:33, Vivek Gautam wrote: On Wed, Feb 14, 2018 at 2:46 PM, Tomasz Figa wrote: Adding Jordan to this thread as well. On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam wrote: Hi Tomasz, On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam wrote: Hi Tomasz, On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark wrote: On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa wrote: On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark wrote: On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa wrote: Hi Vivek, Thanks for the patch. Please see my comments inline. On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: While handling the concerned iommu, there should not be a need to power control the drm devices from iommu interface. If these drm devices need to be powered around this time, the respective drivers should take care of this. Replace the pm_runtime_get/put_sync() with pm_runtime_get/put_suppliers() calls, to power-up the connected iommu through the device link interface. In case the device link is not setup these get/put_suppliers() calls will be a no-op, and the iommu driver should take care of powering on its devices accordingly. Signed-off-by: Vivek Gautam --- drivers/gpu/drm/msm/msm_iommu.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index b23d33622f37..1ab629bbee69 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, struct msm_iommu *iommu = to_msm_iommu(mmu); int ret; - pm_runtime_get_sync(mmu->dev); + pm_runtime_get_suppliers(mmu->dev); ret = iommu_attach_device(iommu->domain, mmu->dev); - pm_runtime_put_sync(mmu->dev); + pm_runtime_put_suppliers(mmu->dev); For me, it looks like a wrong place to handle runtime PM of IOMMU here. 
iommu_attach_device() calls into IOMMU driver's attach_device() callback and that's where necessary runtime PM gets should happen, if any. In other words, driver A (MSM DRM driver) shouldn't be dealing with power state of device controlled by driver B (ARM SMMU). Note that we end up having to do the same, because of iommu_unmap() while DRM driver is powered off.. it might be cleaner if it was all self contained in the iommu driver, but that would make it so other drivers couldn't call iommu_unmap() from an irq handler, which is apparently something that some of them want to do.. I'd assume that runtime PM status is already guaranteed to be active when the IRQ handler is running, by some other means (e.g. pm_runtime_get_sync() called earlier, when queuing some work to the hardware). Otherwise, I'm not sure how a powered down device could trigger an IRQ. So, if the master device power is already on, suppliers should be powered on as well, thanks to device links. umm, that is kindof the inverse of the problem.. the problem is things like gpu driver (and v4l2 drivers that import dma-buf's, afaict).. they will potentially call iommu->unmap() when device is not active (due to userspace or things beyond the control of the driver).. so *they* would want iommu to do pm get/put calls. Which is fine and which is actually already done by one of the patches in this series, not for map/unmap, but probe, add_device, remove_device. Having parts of the API doing it inside the callback and other parts outside sounds at least inconsistent. But other drivers trying to unmap from irq ctx would not. Which is the contradictory requirement that lead to the idea of iommu user powering up iommu for unmap. Sorry, maybe I wasn't clear. 
My last message was supposed to show that it's not contradictory at all, because "other drivers trying to unmap from irq ctx" would already have called pm_runtime_get_*() earlier from a non-irq ctx, which would have also done the same on all the linked suppliers, including the IOMMU. The ultimate result would be that the map/unmap() of the IOMMU driver calling pm_runtime_get_sync() would do nothing besides incrementing the reference count. The entire point was to avoid the slowpath that pm_runtime_get/put_sync() would add in map/unmap. It would not be correct to add a slowpath in irq_ctx for taking care of non-irq_ctx and for the situations where master is already powered-off. Correct me if I'm wrong, but I believe that with what I'm proposing there wouldn't be any slow path. Yea, but only when the power domain is irq-safe? And not all platforms enable irq-safe power domains. For instance, msm doesn't enable its gdsc power domains as irq-safe. Is it something i am missing? irq-safe would matter if there would exist a case when the call is done from IRQ context and the power is off. As I explained in a), it shouldn't happen. Hi Robin, Will Does adding pm_runtime_
Re: [Freedreno] [PATCH v7 1/6] base: power: runtime: Export pm_runtime_get/put_suppliers
On 13/02/18 12:54, Tomasz Figa wrote: On Tue, Feb 13, 2018 at 9:00 PM, Robin Murphy wrote: On 13/02/18 07:44, Tomasz Figa wrote: Hi Vivek, On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: The device link allows the pm framework to tie the supplier and consumer. So, whenever the consumer is powered-on the supplier is powered-on first. There are however cases in which the consumer wants to power-on the supplier, but not itself. E.g., A Graphics or multimedia driver wants to power-on the SMMU to unmap a buffer and finish the TLB operations without powering on itself. This sounds strange to me. If the SMMU is powered down, wouldn't the TLB lose its contents as well (and so no flushing needed)? Depends on implementation details - if runtime PM is actually implemented via external clock gating (in the absence of fine-grained power domains), then "suspended" TLBs might both retain state and not receive invalidation requests, which is really the worst case. Agreed. That's why in "[PATCH v7 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device" I actually suggested managing clocks separately from runtime PM. At least until runtime PM framework arrives at a state, where multiple power states can be managed, i.e. full power state, clock-gated state, domain-off state. (I think I might have seen some ongoing work on this on LWN though...) Other than that, what kind of hardware operations would be needed besides just updating the page tables from the CPU? Domain attach/detach also require updating SMMU hardware state (and possibly TLB maintenance), but don't logically require the master device itself to be active at the time. Wouldn't this hardware state need to be reinitialized anyway after respective power domain power cycles? (In other words, hardware would only need programming if it's powered on at the moment.) 
Yes, if the entire SMMU was fully powered down because all masters were inactive, then all that should need to be done is to update the software shadow state in the expectation that arm_smmu_reset() would re-sync it upon TCU powerup. If at least some part of the internal logic remains active, though, then you may or may not need to fiddle with zero or more clocks and/or power domains (depending on microarchitecture and integration) in order to be sure that everything from the programming slave interface through to wherever that state is kept works correctly so that it can be changed. The main motivation here is that the Qualcomm SMMU microarchitecture apparently allows the programming interface to be shut down separately from the TCU core (context banks, page table walker, etc.), and they get an appreciable power saving from doing so. This is different from, say, the Arm Ltd. implementations, where the entire TCU is a single clock/power domain internally (although you could maybe still gate the external APB interface clock). As the previous discussions have shown, this is really, really hard to do properly in a generic manner. Robin. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [PATCH v7 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 13/02/18 08:24, Tomasz Figa wrote: Hi Vivek, Thanks for the patch. Please see my comments inline. On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 42 ++ 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 9e2f917e16c2..c024f69c1682 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -913,11 +913,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; pm_runtime_get_sync() will return 0 if the device was powered off, 1 if it was already/still powered on or runtime PM is not compiled in, or a negative value on error, so shouldn't the test be (ret < 0)? Moreover, I'm actually wondering if it makes any sense to power up the hardware just to program it and power it down again. In a system where the IOMMU is located within a power domain, it would cause the IOMMU block to lose its state anyway. This is generally for the case where the SMMU internal state remains active, but the programming interface needs to be powered up in order to access it. Actually, reflecting back on "[PATCH v7 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops", perhaps it would make more sense to just control the clocks independently of runtime PM? Then, runtime PM could be used for real power management, e.g. 
really powering the block up and down, for further power saving. Unfortunately that ends up pretty much unmanageable, because there are numerous different SMMU microarchitectures with fundamentally different clock/power domain schemes (multiplied by individual SoC integration possibilities). Since this is fundamentally a generic architectural driver, adding explicit clock support would probably make the whole thing about 50% clock code, with complicated decision trees around every hardware access calculating which clocks are necessary for a given operation on a given system. That maintainability aspect is why we've already nacked such a fine-grained approach in the past. Robin.
Re: [Freedreno] [PATCH v7 1/6] base: power: runtime: Export pm_runtime_get/put_suppliers
On 13/02/18 07:44, Tomasz Figa wrote: Hi Vivek, On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam wrote: The device link allows the pm framework to tie the supplier and consumer. So, whenever the consumer is powered-on the supplier is powered-on first. There are however cases in which the consumer wants to power-on the supplier, but not itself. E.g., A Graphics or multimedia driver wants to power-on the SMMU to unmap a buffer and finish the TLB operations without powering on itself. This sounds strange to me. If the SMMU is powered down, wouldn't the TLB lose its contents as well (and so no flushing needed)? Depends on implementation details - if runtime PM is actually implemented via external clock gating (in the absence of fine-grained power domains), then "suspended" TLBs might both retain state and not receive invalidation requests, which is really the worst case. Other than that, what kind of hardware operations would be needed besides just updating the page tables from the CPU? Domain attach/detach also require updating SMMU hardware state (and possibly TLB maintenance), but don't logically require the master device itself to be active at the time. Robin.
Re: [Freedreno] [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
On 02/02/18 05:40, Sricharan R wrote: Hi Robin/Vivek, On 2/1/2018 2:23 PM, Vivek Gautam wrote: Hi, On 1/31/2018 6:39 PM, Robin Murphy wrote: On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Don't we need to balance this with a device_link_del() in .remove_device (like exynos-iommu does)? Right. Will add device_link_del() call. Thanks for pointing out. The reason for not adding device_link_del from .remove_device was, the core device_del which calls the .remove_device from notifier, calls device_links_purge before that. That does the same thing as device_link_del. So by the time .remove_device is called, device_links for that device is already cleaned up. Vivek, you may want to check once that calling device_link_del from .remove_device has no effect, just to confirm once more. There is at least one path in which .remove_device is not called via the notifier from device_del(), which is in the cleanup path of iommu_bus_init(). AFAICS any links created by .add_device during that process would be left dangling, because the device(s) would be live but otherwise disassociated from the IOMMU afterwards. From a maintenance perspective it's easier to have the call in its logical place even if it does nothing 99% of the time; that way we shouldn't have to keep an eye out for subtle changes in the power management code or driver core that might invalidate the device_del() reasoning above, and the power management guys shouldn't have to comprehend the internals of the IOMMU API to make sense of the unbalanced call if they ever want to change their API. Thanks, Robin.
Re: [Freedreno] [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Don't we need to balance this with a device_link_del() in .remove_device (like exynos-iommu does)? Robin. Signed-off-by: Sricharan R --- drivers/iommu/arm-smmu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 95478bfb182c..33bbcfedb896 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1367,6 +1367,7 @@ static int arm_smmu_add_device(struct device *dev) struct arm_smmu_device *smmu; struct arm_smmu_master_cfg *cfg; struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct device_link *link; int i, ret; if (using_legacy_binding) { @@ -1428,6 +1429,16 @@ static int arm_smmu_add_device(struct device *dev) pm_runtime_put_sync(smmu->dev); + /* +* Establish the link between smmu and master, so that the +* smmu gets runtime enabled/disabled as per the master's +* needs. +*/ + link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME); + if (!link) + dev_warn(smmu->dev, "Unable to create device link between %s and %s\n", +dev_name(smmu->dev), dev_name(dev)); + return 0; out_cfg_free:
Re: [Freedreno] [PATCH v6 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 21acffe91a1c..95478bfb182c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -914,11 +914,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + /* * Disable the context bank and free the page tables before freeing * it. @@ -933,6 +937,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) free_io_pgtable_ops(smmu_domain->pgtbl_ops); __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx); + + pm_runtime_put_sync(smmu->dev); } static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) @@ -1408,12 +1414,20 @@ static int arm_smmu_add_device(struct device *dev) while (i--) cfg->smendx[i] = INVALID_SMENDX; - ret = arm_smmu_master_alloc_smes(dev); + ret = pm_runtime_get_sync(smmu->dev); if (ret) goto out_cfg_free; + ret = arm_smmu_master_alloc_smes(dev); + if (ret) { + pm_runtime_put_sync(smmu->dev); + goto out_cfg_free; Please keep to the existing pattern and put this on the cleanup path with a new label, rather than inline. 
+ } + iommu_device_link(&smmu->iommu, dev); + pm_runtime_put_sync(smmu->dev); + return 0; out_cfg_free: @@ -1428,7 +1442,7 @@ static void arm_smmu_remove_device(struct device *dev) struct iommu_fwspec *fwspec = dev->iommu_fwspec; struct arm_smmu_master_cfg *cfg; struct arm_smmu_device *smmu; - + int ret; if (!fwspec || fwspec->ops != &arm_smmu_ops) return; @@ -1436,8 +1450,21 @@ static void arm_smmu_remove_device(struct device *dev) cfg = fwspec->iommu_priv; smmu = cfg->smmu; + /* +* The device link between the master device and +* smmu is already purged at this point. +* So enable the power to smmu explicitly. +*/ I don't understand this comment, especially since we don't even introduce device links until the following patch... :/ + + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + iommu_device_unlink(&smmu->iommu, dev); arm_smmu_master_free_smes(fwspec); + + pm_runtime_put_sync(smmu->dev); + iommu_group_remove_device(dev); kfree(fwspec->iommu_priv); iommu_fwspec_free(dev); @@ -2130,6 +2157,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (err) return err; + platform_set_drvdata(pdev, smmu); + + pm_runtime_enable(dev); + + err = pm_runtime_get_sync(dev); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2171,9 +2206,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return err; } - platform_set_drvdata(pdev, smmu); arm_smmu_device_reset(smmu); arm_smmu_test_smr_masks(smmu); + pm_runtime_put_sync(dev); /* * For ACPI and generic DT bindings, an SMMU will be probed before @@ -2212,6 +2247,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + pm_runtime_force_suspend(smmu->dev); Why do we need this? I guess it might be a Qualcomm-ism as I don't see anyone else calling it from .remove other than a couple of other qcom_* drivers. 
Given that we only get here during system shutdown (or the root user intentionally pissing about with driver unbinding), it doesn't seem like a point where power saving really matters all that much. I'd also naively expect that anything this device was the last consumer of would get turned off by core code anyway once it's removed, but maybe things aren't that slick; I dunno :/ Robin. + return 0; }
Re: [Freedreno] [PATCH v6 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective master's using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: Clock rework to request bulk of clocks] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 55 ++-- 1 file changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 78d4c6b8f1ba..21acffe91a1c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,9 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clocks; + int num_clks; + const char * const *clk_names; This seems unnecessary, as we use it a grand total of once, during initialisation when we have the source data directly to hand. Just pass data->clks into arm_smmu_init_clks() as an additional argument. Otherwise, I think this looks reasonable; it's about as unobtrusive as it's going to get. Robin. 
u32 cavium_id_base; /* Specific to Cavium */ @@ -1685,6 +1689,25 @@ static int arm_smmu_id_size_to_bits(int size) } } +static int arm_smmu_init_clocks(struct arm_smmu_device *smmu) +{ + int i; + int num = smmu->num_clks; + + if (num < 1) + return 0; + + smmu->clocks = devm_kcalloc(smmu->dev, num, + sizeof(*smmu->clocks), GFP_KERNEL); + if (!smmu->clocks) + return -ENOMEM; + + for (i = 0; i < num; i++) + smmu->clocks[i].id = smmu->clk_names[i]; + + return devm_clk_bulk_get(smmu->dev, num, smmu->clocks); +} + static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) { unsigned long size; @@ -1897,10 +1920,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); @@ -2001,6 +2026,8 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->clk_names = data->clks; + smmu->num_clks = data->num_clks; parse_driver_options(smmu); @@ -2099,6 +2126,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = arm_smmu_init_clocks(smmu); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2197,7 +2228,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev) return 0; } -static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume); +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return 
clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks); +} + +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + clk_bulk_disable_unprepare(smmu->num_clks, smmu->clocks); + + return 0; +} + +static const struct dev_pm_ops arm_smmu_pm_ops = { + SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume) + SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend, + arm_smmu_runtime_resume, NULL) +}; static struct platform_driver arm_smmu_driver = { .driver = {
Re: [Freedreno] [PATCH] drm: convert DT component matching to component_match_add_release()
Hi Russell, On 03/06/16 08:58, Russell King wrote: Convert DT component matching to use component_match_add_release(). Signed-off-by: Russell King --- drivers/gpu/drm/arm/hdlcd_drv.c | 10 -- drivers/gpu/drm/armada/armada_drv.c | 9 +++-- drivers/gpu/drm/drm_of.c| 13 + drivers/gpu/drm/msm/msm_drv.c | 8 +++- drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 13 ++--- drivers/gpu/drm/sti/sti_drv.c | 9 +++-- drivers/gpu/drm/tilcdc/tilcdc_external.c| 9 +++-- 7 files changed, 55 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c index b987c63ba8d6..bbde48c4f550 100644 --- a/drivers/gpu/drm/arm/hdlcd_drv.c +++ b/drivers/gpu/drm/arm/hdlcd_drv.c @@ -443,11 +443,16 @@ static const struct component_master_ops hdlcd_master_ops = { .unbind = hdlcd_drm_unbind, }; -static int compare_dev(struct device *dev, void *data) +static int compare_of(struct device *dev, void *data) { return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} Considering that there's 7 identical copies of this function in this patch alone, perhaps there's some mileage in defining it commonly as a static __maybe_unused default_release_of() in component.h or drm_of.h (and maybe default_compare_of() similarly)? (Apologies if there's already been some strong argument against that which I've not seen, but it seems like a reasonable thing to do.) Robin. 
+ static int hdlcd_probe(struct platform_device *pdev) { struct device_node *port, *ep; @@ -474,7 +479,8 @@ static int hdlcd_probe(struct platform_device *pdev) return -EAGAIN; } - component_match_add(&pdev->dev, &match, compare_dev, port); + component_match_add_release(&pdev->dev, &match, release_of, + compare_of, port); return component_master_add_with_match(&pdev->dev, &hdlcd_master_ops, match); diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 439824a61aa5..6ca2aa36515e 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -232,6 +232,11 @@ static int compare_of(struct device *dev, void *data) return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + static int compare_dev_name(struct device *dev, void *data) { const char *name = data; @@ -255,8 +260,8 @@ static void armada_add_endpoints(struct device *dev, continue; } - component_match_add(dev, match, compare_of, remote); - of_node_put(remote); + component_match_add_release(dev, match, release_of, + compare_of, remote); } } diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c index bc98bb94264d..5d183479d7d6 100644 --- a/drivers/gpu/drm/drm_of.c +++ b/drivers/gpu/drm/drm_of.c @@ -6,6 +6,11 @@ #include #include +static void drm_release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + /** * drm_crtc_port_mask - find the mask of a registered CRTC by port OF node * @dev: DRM device @@ -101,8 +106,8 @@ int drm_of_component_probe(struct device *dev, continue; } - component_match_add(dev, &match, compare_of, port); - of_node_put(port); + component_match_add_release(dev, &match, drm_release_of, + compare_of, port); } if (i == 0) { @@ -140,8 +145,8 @@ int drm_of_component_probe(struct device *dev, continue; } - component_match_add(dev, &match, compare_of, remote); - of_node_put(remote); + component_match_add_release(dev, &match, drm_release_of, + 
compare_of, remote); } of_node_put(port); } diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 9c654092ef78..1f7de47d817e 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -805,6 +805,11 @@ static int compare_of(struct device *dev, void *data) return dev->of_node == data; } +static void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + static int add_components(struct device *dev, struct component_match **matchptr, const char *name) { @@ -818,7 +823,8 @@ static int add_components(struct device *dev, struct component_match **matchptr, if (!node)