On 10/17/25 16:04, Gang Ba wrote:
> drm_sched_entity_flush() may kill the VM entities under certain conditions.
> KFD then needs to run kfd_process_wq_release to release the associated
> resources, which can cause the process's subsequent job submissions to fail:
> 
> [ 3976.788183] [drm:amddrm_sched_entity_push_job [amd_sched]] *ERROR* Trying to push to a killed entity
> or
> [  129.600916] [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity

Clear NAK. When the process is killed, the KFD should not try to submit any
more VM updates.

Regards,
Christian.

> 
> Signed-off-by: Gang Ba <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index bebf2ebc4f34..2361c09ddc77 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2997,6 +2997,9 @@ static int amdgpu_flush(struct file *f, fl_owner_t id)
>       struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>       long timeout = MAX_WAIT_SCHED_ENTITY_Q_EMPTY;
>  
> +     if (fpriv->vm.is_compute_context)
> +             return 0;
> +
>       timeout = amdgpu_ctx_mgr_entity_flush(&fpriv->ctx_mgr, timeout);
>       timeout = amdgpu_vm_wait_idle(&fpriv->vm, timeout);
>  
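
For readers following the thread, the failure mode in the quoted commit message can be modelled with a small userspace sketch in C. This is not the kernel code: struct toy_entity, the toy_* functions and the -ENOENT return value are all hypothetical names and choices made for illustration. The sketch only assumes what the thread states, namely that a flush can mark an entity as killed and that pushing to a killed entity is rejected with the logged error. toy_release() shows one possible reading of the review comment, where teardown skips submission once the entity is killed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a scheduler entity; NOT the real struct drm_sched_entity. */
struct toy_entity {
    bool stopped;     /* set once the entity has been killed */
    int  queued_jobs; /* number of jobs accepted so far */
};

/* Toy flush: marks the entity as killed when the owning process was killed. */
static void toy_entity_flush(struct toy_entity *e, bool process_killed)
{
    if (process_killed)
        e->stopped = true;
}

/* Toy push: rejects jobs on a killed entity (error code is an assumption). */
static int toy_entity_push_job(struct toy_entity *e)
{
    if (e->stopped) {
        fprintf(stderr, "Trying to push to a killed entity\n");
        return -ENOENT;
    }
    e->queued_jobs++;
    return 0;
}

/* Toy teardown: submits a final update only if the entity is still alive. */
static int toy_release(struct toy_entity *e)
{
    if (e->stopped)
        return 0; /* killed process: nothing is submitted */
    return toy_entity_push_job(e);
}

/* Walks through the sequence described in the commit message. */
static void toy_demo(void)
{
    struct toy_entity e = { false, 0 };

    assert(toy_entity_push_job(&e) == 0);       /* live entity accepts a job */
    toy_entity_flush(&e, true);                 /* flush kills the entity */
    assert(toy_entity_push_job(&e) == -ENOENT); /* later push is rejected */
    assert(toy_release(&e) == 0);               /* guarded teardown is a no-op */
    assert(e.queued_jobs == 1);                 /* only the first job counted */
}
```

Calling toy_demo() from a main() runs the whole sequence. The guard sits in the submitting path rather than in the flush path, which is the direction the review comment points to, though the actual KFD change may look quite different.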
