On Thu, 15 Jan 2026 12:15:22 +0100 Nicolas Frattaroli <[email protected]> wrote:
> On Monday, 12 January 2026 16:12:52 Central European Standard Time Boris
> Brezillon wrote:
> > On Mon, 12 Jan 2026 15:37:50 +0100
> > Nicolas Frattaroli <[email protected]> wrote:
> > 
> > > The current IRQ helpers do not guarantee mutual exclusion that covers
> > > the entire transaction from accessing the mask member and modifying the
> > > mask register.
> > > 
> > > This makes it hard, if not impossible, to implement mask modification
> > > helpers that may change one of these outside the normal
> > > suspend/resume/isr code paths.
> > > 
> > > Add a spinlock to struct panthor_irq that protects both the mask member
> > > and register. Acquire it in all code paths that access these, but drop
> > > it before processing the threaded handler function. Then, add the
> > > aforementioned new helpers: enable_events, and disable_events. They work
> > > by ORing and NANDing the mask bits.
> > > 
> > > resume is changed to no longer have a mask passed, as pirq->mask is
> > > supposed to be the user-requested mask now, rather than a mirror of the
> > > INT_MASK register contents. Users of the resume helper are adjusted
> > > accordingly, including a rather painful refactor in panthor_mmu.c.
> > > 
> > > Signed-off-by: Nicolas Frattaroli <[email protected]>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_device.h |  72 +++++++--
> > >  drivers/gpu/drm/panthor/panthor_fw.c     |   3 +-
> > >  drivers/gpu/drm/panthor/panthor_gpu.c    |   2 +-
> > >  drivers/gpu/drm/panthor/panthor_mmu.c    | 247 ++++++++++++++++---------------
> > >  drivers/gpu/drm/panthor/panthor_pwr.c    |   2 +-
> > >  5 files changed, 187 insertions(+), 139 deletions(-)
> > 
> [... snip ...]
> > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > index 198d59f42578..71b8318eab31 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > @@ -655,125 +655,6 @@ static void panthor_vm_release_as_locked(struct panthor_vm *vm)
> > >  	vm->as.id = -1;
> > >  }
> > >  
> > > -/**
> > > - * panthor_vm_active() - Flag a VM as active
> > > - * @vm: VM to flag as active.
> > > - *
> > > - * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > > - *
> > > - * Return: 0 on success, a negative error code otherwise.
> > > - */
> > > -int panthor_vm_active(struct panthor_vm *vm)
> > > -{
> > > -	struct panthor_device *ptdev = vm->ptdev;
> > > -	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > > -	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > > -	int ret = 0, as, cookie;
> > > -	u64 transtab, transcfg;
> > > -
> > > -	if (!drm_dev_enter(&ptdev->base, &cookie))
> > > -		return -ENODEV;
> > > -
> > > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > -		goto out_dev_exit;
> > > -
> > > -	/* Make sure we don't race with lock/unlock_region() calls
> > > -	 * happening around VM bind operations.
> > > -	 */
> > > -	mutex_lock(&vm->op_lock);
> > > -	mutex_lock(&ptdev->mmu->as.slots_lock);
> > > -
> > > -	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > -		goto out_unlock;
> > > -
> > > -	as = vm->as.id;
> > > -	if (as >= 0) {
> > > -		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > > -		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > > -		 */
> > > -		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > > -			goto out_enable_as;
> > > -
> > > -		goto out_make_active;
> > > -	}
> > > -
> > > -	/* Check for a free AS */
> > > -	if (vm->for_mcu) {
> > > -		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > > -		as = 0;
> > > -	} else {
> > > -		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > > -	}
> > > -
> > > -	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > > -		struct panthor_vm *lru_vm;
> > > -
> > > -		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > > -						  struct panthor_vm,
> > > -						  as.lru_node);
> > > -		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > > -			ret = -EBUSY;
> > > -			goto out_unlock;
> > > -		}
> > > -
> > > -		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > > -		as = lru_vm->as.id;
> > > -
> > > -		ret = panthor_mmu_as_disable(ptdev, as, true);
> > > -		if (ret)
> > > -			goto out_unlock;
> > > -
> > > -		panthor_vm_release_as_locked(lru_vm);
> > > -	}
> > > -
> > > -	/* Assign the free or reclaimed AS to the FD */
> > > -	vm->as.id = as;
> > > -	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > > -	ptdev->mmu->as.slots[as].vm = vm;
> > > -
> > > -out_enable_as:
> > > -	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > > -	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > > -		   AS_TRANSCFG_PTW_RA |
> > > -		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > > -		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > > -	if (ptdev->coherent)
> > > -		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > > -
> > > -	/* If the VM is re-activated, we clear the fault. */
> > > -	vm->unhandled_fault = false;
> > > -
> > > -	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > > -	 * before enabling the AS.
> > > -	 */
> > > -	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> > > -		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> > > -		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> > > -		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> > > -		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > > -	}
> > > -
> > > -	/* The VM update is guarded by ::op_lock, which we take at the beginning
> > > -	 * of this function, so we don't expect any locked region here.
> > > -	 */
> > > -	drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size > 0);
> > > -	ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > > -
> > > -out_make_active:
> > > -	if (!ret) {
> > > -		refcount_set(&vm->as.active_cnt, 1);
> > > -		list_del_init(&vm->as.lru_node);
> > > -	}
> > > -
> > > -out_unlock:
> > > -	mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > -	mutex_unlock(&vm->op_lock);
> > > -
> > > -out_dev_exit:
> > > -	drm_dev_exit(cookie);
> > > -	return ret;
> > > -}
> > > -
> > >  /**
> > >   * panthor_vm_idle() - Flag a VM idle
> > >   * @vm: VM to flag as idle.
> > > @@ -1762,6 +1643,128 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > >  }
> > >  PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> > >  
> > > +/**
> > > + * panthor_vm_active() - Flag a VM as active
> > > + * @vm: VM to flag as active.
> > > + *
> > > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > > + *
> > > + * Return: 0 on success, a negative error code otherwise.
> > > + */
> > > +int panthor_vm_active(struct panthor_vm *vm)
> > > +{
> > > +	struct panthor_device *ptdev = vm->ptdev;
> > > +	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
> > > +	struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > > +	int ret = 0, as, cookie;
> > > +	u64 transtab, transcfg;
> > > +	u32 fault_mask;
> > > +
> > > +	if (!drm_dev_enter(&ptdev->base, &cookie))
> > > +		return -ENODEV;
> > > +
> > > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > +		goto out_dev_exit;
> > > +
> > > +	/* Make sure we don't race with lock/unlock_region() calls
> > > +	 * happening around VM bind operations.
> > > +	 */
> > > +	mutex_lock(&vm->op_lock);
> > > +	mutex_lock(&ptdev->mmu->as.slots_lock);
> > > +
> > > +	if (refcount_inc_not_zero(&vm->as.active_cnt))
> > > +		goto out_unlock;
> > > +
> > > +	as = vm->as.id;
> > > +	if (as >= 0) {
> > > +		/* Unhandled pagefault on this AS, the MMU was disabled. We need to
> > > +		 * re-enable the MMU after clearing+unmasking the AS interrupts.
> > > +		 */
> > > +		if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as))
> > > +			goto out_enable_as;
> > > +
> > > +		goto out_make_active;
> > > +	}
> > > +
> > > +	/* Check for a free AS */
> > > +	if (vm->for_mcu) {
> > > +		drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > > +		as = 0;
> > > +	} else {
> > > +		as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > > +	}
> > > +
> > > +	if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > > +		struct panthor_vm *lru_vm;
> > > +
> > > +		lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > > +						  struct panthor_vm,
> > > +						  as.lru_node);
> > > +		if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > > +			ret = -EBUSY;
> > > +			goto out_unlock;
> > > +		}
> > > +
> > > +		drm_WARN_ON(&ptdev->base, refcount_read(&lru_vm->as.active_cnt));
> > > +		as = lru_vm->as.id;
> > > +
> > > +		ret = panthor_mmu_as_disable(ptdev, as, true);
> > > +		if (ret)
> > > +			goto out_unlock;
> > > +
> > > +		panthor_vm_release_as_locked(lru_vm);
> > > +	}
> > > +
> > > +	/* Assign the free or reclaimed AS to the FD */
> > > +	vm->as.id = as;
> > > +	set_bit(as, &ptdev->mmu->as.alloc_mask);
> > > +	ptdev->mmu->as.slots[as].vm = vm;
> > > +
> > > +out_enable_as:
> > > +	transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > > +	transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > > +		   AS_TRANSCFG_PTW_RA |
> > > +		   AS_TRANSCFG_ADRMODE_AARCH64_4K |
> > > +		   AS_TRANSCFG_INA_BITS(55 - va_bits);
> > > +	if (ptdev->coherent)
> > > +		transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > > +
> > > +	/* If the VM is re-activated, we clear the fault. */
> > > +	vm->unhandled_fault = false;
> > > +
> > > +	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> > > +	 * before enabling the AS.
> > > +	 */
> > > +	fault_mask = panthor_mmu_as_fault_mask(ptdev, as);
> > > +	if (ptdev->mmu->as.faulty_mask & fault_mask) {
> > > +		gpu_write(ptdev, MMU_INT_CLEAR, fault_mask);
> > > +		ptdev->mmu->as.faulty_mask &= ~fault_mask;
> > > +		panthor_mmu_irq_enable_events(&ptdev->mmu->irq, fault_mask);
> > > +		panthor_mmu_irq_disable_events(&ptdev->mmu->irq, ptdev->mmu->as.faulty_mask);
> > 
> > Why do we need a _disable_events() here?
> 
> It's what the code originally did as far as I can tell. Not super obvious
> because I had to move the function, but it did:
> 
> 	/* Unhandled pagefault on this AS, clear the fault and re-enable interrupts
> 	 * before enabling the AS.
> 	 */
> 	if (ptdev->mmu->as.faulty_mask & panthor_mmu_as_fault_mask(ptdev, as)) {
> 		gpu_write(ptdev, MMU_INT_CLEAR, panthor_mmu_as_fault_mask(ptdev, as));
> 		ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);
> 		ptdev->mmu->irq.mask |= panthor_mmu_as_fault_mask(ptdev, as);
> 		gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> 	}
> 
> We write `~(ptdev->mmu->as.faulty_mask & ~panthor_mmu_as_fault_mask(ptdev, as))`
> to the mask register. Though now looking at it again, I don't think my new
> version expands to the same thing at all, since
> `ptdev->mmu->as.faulty_mask &= ~panthor_mmu_as_fault_mask(ptdev, as);` clears
> the one bit this AS translates to from the fault mask, and then the negation
> in the write re-enables it but clears all other bits? That can't be right.
> If anything, if it wanted to re-enable interrupts, it should OR the register
> contents, not overwrite them.

It's more an "enable anything that's not faulty" than a "re-enable events
for this specific AS". So all we were doing is keeping track of the
faulty+not-acknowledged mask, and then clearing bits in this mask as AS
slots get acknowledged or recycled.
> 
> I feel a little better about the me from a few days ago when I can look at
> the code with a fresh set of eyes and still not get what it's actually
> trying to do, other than trusting the comment.
> 
> Also, genuinely, what is the point of `panthor_mmu_as_fault_mask`? Half of
> its parameters are unused and its entire implementation is shorter than the
> function name.

panthor_mmu_as_fault_mask() is only here because at some point I intended to
port JM HW support to panthor: the fault mask for an AS there is
`BIT(as) | BIT(as + 16)`, instead of just `BIT(as)` as on new HW. So the plan
was for panthor_mmu_as_fault_mask() to abstract that for us.
