On 5/19/26 13:18, Sunil Khatri wrote:
> Thread 1: Running amdgpu_userq_destroy which eventually remove
> the queue from door bell and set userq_mgr = NULL.
> 
> Thread2: An interrupt might have scheduled the hang_detect_work
> which still need userq_mgr to be valid but could get an NULL
> ptrs.
> 
> To fix that make sure we cancel the hang_detect_work again before
> setting userq_mgr to NULL.
> 
> Along with that we also need all the queue va to remain valid till
> we could be running anything on the queue and hence moving the
> userq_va post hang_detect handler is cancelled.
> 
> Signed-off-by: Sunil Khatri <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index c8f7bb23e2c3..7354c51ae83d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -427,8 +427,6 @@ static void amdgpu_userq_cleanup(struct 
> amdgpu_usermode_queue *queue)
>       xa_erase_irq(&adev->userq_doorbell_xa, queue->doorbell_index);
>       amdgpu_userq_fence_driver_free(queue);
>       queue->fence_drv = NULL;
> -     queue->userq_mgr = NULL;
> -     list_del(&queue->userq_va_list);
>  
>       up_read(&adev->reset_domain->sem);
>  }
> @@ -619,11 +617,6 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, 
> struct amdgpu_usermode_que
>  
>       /* Cancel any pending hang detection work and cleanup */
>       cancel_delayed_work_sync(&queue->hang_detect_work);
> -
> -     amdgpu_bo_reserve(vm->root.bo, true);
> -     amdgpu_userq_buffer_vas_list_cleanup(adev, queue);
> -     amdgpu_bo_unreserve(vm->root.bo);
> -
>       mutex_lock(&uq_mgr->userq_mutex);
>       amdgpu_userq_wait_for_last_fence(queue);
>  
> @@ -635,6 +628,14 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, 
> struct amdgpu_usermode_que
>       amdgpu_userq_cleanup(queue);
>       mutex_unlock(&uq_mgr->userq_mutex);
>  
> +     /* This is case an interrupt was fired and a hang detection work is 
> pending */

Typo in comment, but I think you can drop it complete. Comments should explain 
why and not what.

Apart from that Reviewed-by: Christian König <[email protected]>

Regards,
Christian.

> +     cancel_delayed_work_sync(&queue->hang_detect_work);
> +     amdgpu_bo_reserve(vm->root.bo, true);
> +     amdgpu_userq_buffer_vas_list_cleanup(adev, queue);
> +     amdgpu_bo_unreserve(vm->root.bo);
> +     list_del(&queue->userq_va_list);
> +     queue->userq_mgr = NULL;
> +
>       amdgpu_bo_reserve(queue->db_obj.obj, true);
>       amdgpu_bo_unpin(queue->db_obj.obj);
>       amdgpu_bo_unreserve(queue->db_obj.obj);

Reply via email to