While SDMA is hung by, say, process A, another process B may already be 
running inside fill_buffer().
So just let process B continue. Don't block it, otherwise process B would fail 
for a purely software reason.

Let it run: process B's job will eventually fail, and GPU recovery will 
resubmit it (since it is a kernel job).

Without this change, other processes are badly harmed by the one black sheep 
that triggered the GPU recovery.
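
To make the argument above concrete, here is a minimal user-space sketch of the two behaviours. It is only a toy model under stated assumptions: every name in it (sketch_ring, sketch_job, fill_buffer_checked, fill_buffer_unchecked, scheduler_run, gpu_recover) is hypothetical and does not exist in amdgpu, and it models only the scheduler path, not direct_submit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real amdgpu structures. */
struct sketch_ring {
	bool ready;		/* false while a GPU reset is in progress */
};

enum sketch_job_state { JOB_NONE, JOB_QUEUED, JOB_DONE };

struct sketch_job {
	enum sketch_job_state state;
	int submissions;	/* times the scheduler pushed it to the ring */
};

/* Behaviour with the early test: the caller fails right away. */
int fill_buffer_checked(struct sketch_ring *ring, struct sketch_job *job)
{
	if (!ring->ready)
		return -EINVAL;
	job->state = JOB_QUEUED;
	return 0;
}

/* Behaviour argued for above: queue the job regardless of ring state. */
int fill_buffer_unchecked(struct sketch_ring *ring, struct sketch_job *job)
{
	(void)ring;		/* ring state is deliberately ignored here */
	job->state = JOB_QUEUED;
	return 0;
}

/* One scheduler pass: a queued job completes only if the ring is usable. */
void scheduler_run(struct sketch_ring *ring, struct sketch_job *job)
{
	assert(ring != NULL && job != NULL);
	if (job->state != JOB_QUEUED)
		return;
	job->submissions++;
	if (ring->ready)
		job->state = JOB_DONE;
}

/* Toy recovery: bring the ring back, then resubmit the pending job. */
void gpu_recover(struct sketch_ring *ring, struct sketch_job *job)
{
	ring->ready = true;
	scheduler_run(ring, job);
}
```

In this model, process B calling fill_buffer_checked() during the reset gets -EINVAL and nothing is queued, while fill_buffer_unchecked() leaves a job pending that gpu_recover() later resubmits and completes. Whether the real scheduler behaves this way for a given job is exactly what is under discussion in this thread.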

/Monk



-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: February 28, 2018 20:24
To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for 
fill_buffer

On 28.02.2018 at 08:21, Monk Liu wrote:
> Because SDMA may be under GPU reset at this point, its ring->ready may 
> not be true. Keep going; the GPU scheduler will reschedule the job if 
> it fails.
>
> Give a warning in copy_buffer when going through direct_submit while 
> ring->ready is false.

NAK, that test has already saved us quite a bunch of trouble with the fb layer.

Why exactly are you running into issues with that?

Christian.

>
> Change-Id: Ife6cd55e0e843d99900e5bed5418499e88633685
> Signed-off-by: Monk Liu <monk....@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e38e6db..7b75ac9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1656,6 +1656,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>       amdgpu_ring_pad_ib(ring, &job->ibs[0]);
>       WARN_ON(job->ibs[0].length_dw > num_dw);
>       if (direct_submit) {
> +             WARN_ON(!ring->ready);
>               r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
>                                      NULL, fence);
>               job->fence = dma_fence_get(*fence);
> @@ -1692,11 +1693,6 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
>       struct amdgpu_job *job;
>       int r;
>   
> -     if (!ring->ready) {
> -             DRM_ERROR("Trying to clear memory with ring turned off.\n");
> -             return -EINVAL;
> -     }
> -
>       if (bo->tbo.mem.mem_type == TTM_PL_TT) {
>               r = amdgpu_ttm_alloc_gart(&bo->tbo);
>               if (r)
