On 1/14/26 11:48, Srinivasan Shanmugam wrote:
> The user mode queue keeps a pointer to the most recent fence in
> userq->last_fence. This pointer holds an extra dma_fence reference.
> 
> When the queue is destroyed, we free the fence driver and its xarray,
> but we forgot to drop the last_fence reference.
> 
> Because of the missing dma_fence_put(), the last fence object can stay
> alive when the driver unloads. This leaves an allocated object in the
> amdgpu_userq_fence slab cache and triggers
> 
> This is visible during driver unload as:
> 
>   BUG amdgpu_userq_fence: Objects remaining on __kmem_cache_shutdown()
>   kmem_cache_destroy amdgpu_userq_fence: Slab cache still has objects
>   Call Trace:
>     kmem_cache_destroy
>     amdgpu_userq_fence_slab_fini
>     amdgpu_exit
>     __do_sys_delete_module
> 
> Fix this by putting userq->last_fence and clearing the pointer during
> amdgpu_userq_fence_driver_free().
> 
> This makes sure the fence reference is released and the slab cache is
> empty when the module exits.
> 
> Fixes: edc762a51c71 ("drm/amdgpu/userq: move some code around")
> Cc: Alex Deucher <[email protected]>
> Cc: Christian König <[email protected]>
> Signed-off-by: Srinivasan Shanmugam <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 3c6bd5531627..b13cf5114121 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -50,8 +50,12 @@ int amdgpu_userq_fence_slab_init(void)
>  
>  void amdgpu_userq_fence_slab_fini(void)
>  {
> +     if (!amdgpu_userq_fence_slab)
> +             return;
> +
>       rcu_barrier();
>       kmem_cache_destroy(amdgpu_userq_fence_slab);
> +     amdgpu_userq_fence_slab = NULL;

Please completely drop that change. When amdgpu_userq_fence_slab_fini() is 
called multiple times we have a major problem at hand that we shouldn't hide.

>  }
>  
>  static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct 
> dma_fence *f)
> @@ -141,6 +145,12 @@ static void amdgpu_userq_walk_and_drop_fence_drv(struct 
> xarray *xa)
>  void
>  amdgpu_userq_fence_driver_free(struct amdgpu_usermode_queue *userq)
>  {
> +     /* Drop the last fence reference held by the queue */
> +     if (userq->last_fence) {

Drop the if check and the comment, dma_fence_put() is NULL safe.

> +             dma_fence_put(userq->last_fence);
> +             userq->last_fence = NULL;

We kfree() the structure a few more lines down, don't we?

If that's true then please drop setting the member to NULL as well, otherwise 
keep it.

Regards,
Christian.

> +     }
> +
>       amdgpu_userq_walk_and_drop_fence_drv(&userq->fence_drv_xa);
>       xa_destroy(&userq->fence_drv_xa);
>       /* Drop the fence_drv reference held by user queue */

Reply via email to