On 11/7/25 11:21, Jesse.Zhang wrote:
> Fix a potential deadlock caused by inconsistent spinlock usage
> between interrupt and process contexts in the userq fence driver.
> 
> The issue occurs when amdgpu_userq_fence_driver_process() is called
> from both:
> - Interrupt context: gfx_v11_0_eop_irq() -> 
> amdgpu_userq_fence_driver_process()
> - Process context: amdgpu_eviction_fence_suspend_worker() ->
>   amdgpu_userq_fence_driver_force_completion() -> 
> amdgpu_userq_fence_driver_process()
> 
> In interrupt context, the spinlock was acquired without disabling
> interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock
> is acquired in process context, the kernel detects inconsistent
> locking since the process context acquisition would enable interrupts
> while holding a lock previously acquired in interrupt context.
> 
> Kernel log shows:
> [ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3},
> [ 4039.310993] {IN-HARDIRQ-W} state was registered at:
> [ 4039.311004]   lock_acquire+0xc6/0x300
> [ 4039.311018]   _raw_spin_lock+0x39/0x80
> [ 4039.311031]   amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu]
> [ 4039.311146]   amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
> [ 4039.311257]   gfx_v11_0_eop_irq+0x132/0x170 [amdgpu]
> 
> Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly
> manage interrupt state regardless of calling context.
> 
> Signed-off-by: Jesse Zhang <[email protected]>

Reviewed-by: Christian König <[email protected]>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 99ae1d19b751..eba9fb359047 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -151,15 +151,16 @@ void amdgpu_userq_fence_driver_process(struct 
> amdgpu_userq_fence_driver *fence_d
>  {
>       struct amdgpu_userq_fence *userq_fence, *tmp;
>       struct dma_fence *fence;
> +     unsigned long flags;
>       u64 rptr;
>       int i;
>  
>       if (!fence_drv)
>               return;
>  
> +     spin_lock_irqsave(&fence_drv->fence_list_lock, flags);
>       rptr = amdgpu_userq_fence_read(fence_drv);
>  
> -     spin_lock(&fence_drv->fence_list_lock);
>       list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
>               fence = &userq_fence->base;
>  
> @@ -174,7 +175,7 @@ void amdgpu_userq_fence_driver_process(struct 
> amdgpu_userq_fence_driver *fence_d
>               list_del(&userq_fence->link);
>               dma_fence_put(fence);
>       }
> -     spin_unlock(&fence_drv->fence_list_lock);
> +     spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
>  }
>  
>  void amdgpu_userq_fence_driver_destroy(struct kref *ref)

Reply via email to