On Sun, Jun 21, 2020 at 02:03:07AM -0400, Andrey Grodzovsky wrote:
> entity->rq becomes NULL after the device is unplugged, so just return early
> in that case.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>

That looks very deep in amdgpu internals ... how do you even get in here
after the device is fully unplugged on the sw side?

Is this amdkfd doing something stupid because it is entirely unaware of what
amdgpu has done? Or something else? This feels like duct-taping over a more
fundamental problem: after hotunplug no one should be able to submit
anything new, do bo moves, or really do anything at all.
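
To make that expectation concrete, here is a minimal user-space sketch of
gating at the top-level entry point rather than deep in the VM update code.
It is only a model: every model_* name is made up for illustration, the
helpers merely imitate the roles of drm_dev_enter(), drm_dev_exit() and
drm_dev_unplug(), and the SRCU synchronization the real helpers rely on is
left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* User-space stand-in for a DRM device; not the real struct drm_device. */
struct model_dev {
	bool unplugged;	/* set once by the hot-unplug path */
	int in_flight;	/* callers currently between enter and exit */
	int submitted;	/* jobs that got past the entry-point check */
};

/* Imitates the role of drm_dev_enter(): refuse entry once the device is gone. */
static bool model_dev_enter(struct model_dev *dev)
{
	if (dev->unplugged)
		return false;
	dev->in_flight++;
	return true;
}

/* Imitates the role of drm_dev_exit(): leave the guarded section. */
static void model_dev_exit(struct model_dev *dev)
{
	assert(dev->in_flight > 0);
	dev->in_flight--;
}

/* Imitates the role of drm_dev_unplug(): mark the device gone for new callers. */
static void model_dev_unplug(struct model_dev *dev)
{
	dev->unplugged = true;
}

/*
 * Hypothetical top-level entry point (think ioctl handler). The check sits
 * here, so nothing below it ever runs against an unplugged device.
 */
static int model_submit_ioctl(struct model_dev *dev)
{
	if (!model_dev_enter(dev))
		return -ENODEV;

	dev->submitted++;	/* stands in for the whole submission path */

	model_dev_exit(dev);
	return 0;
}
```

With the check at that level, a submission attempted after unplug fails with
-ENODEV before any scheduler entity is touched, so code further down should
never see a NULL entity->rq in the first place.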
-Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 8d9c6fe..d252427 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -24,6 +24,7 @@
>  #include "amdgpu_job.h"
>  #include "amdgpu_object.h"
>  #include "amdgpu_trace.h"
> +#include <drm/drm_drv.h>
>  
>  #define AMDGPU_VM_SDMA_MIN_NUM_DW    256u
>  #define AMDGPU_VM_SDMA_MAX_NUM_DW    (16u * 1024u)
> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>       struct drm_sched_entity *entity;
>       struct amdgpu_ring *ring;
>       struct dma_fence *f;
> -     int r;
> +     int r, idx;
> +
> +     if (!drm_dev_enter(p->adev->ddev, &idx)) {
> +             r = -ENODEV;
> +             goto nodev;
> +     }
>  
>       entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>       ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>       WARN_ON(ib->length_dw > p->num_dw_left);
>       r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>       if (r)
> -             goto error;
> +             goto job_fail;
>  
>       if (p->unlocked) {
>               struct dma_fence *tmp = dma_fence_get(f);
> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>       if (fence && !p->immediate)
>               swap(*fence, f);
>       dma_fence_put(f);
> -     return 0;
>  
> -error:
> -     amdgpu_job_free(p->job);
> +     r = 0;
> +
> +job_fail:
> +     drm_dev_exit(idx);
> +nodev:
> +     if (r)
> +             amdgpu_job_free(p->job);
> +
>       return r;
>  }
>  
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch