Hi Christian,

shadow->parent was sometimes NULL on my testbed, but I could not reproduce it today.
I just sent out another patch following your advice.
Thanks.

BR,
Wentao


-----Original Message-----
From: Christian König <ckoenig.leichtzumer...@gmail.com> 
Sent: Tuesday, April 2, 2019 6:36 PM
To: Lou, Wentao <wentao....@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amdgpu_device_recover_vram always failed if only one node in shadow_list

On 02.04.19 at 11:19, wentalou wrote:
> amdgpu_bo_restore_shadow assigns zero to r on success.
> r remains zero if there is only one node in shadow_list,
> so the current code, which fails when r <= 0, always returns failure in that case.
> Restarting the timeout for each wait was a rather problematic bug as well:
> the value of tmo should be updated after each wait, otherwise we wait up to
> tmo jiffies on every loop iteration.
> Also fix the call trace caused by shadow->parent being NULL.
>
> Change-Id: I7e836ec7ab6cd0f069aac24f88e454e906637541
> Signed-off-by: Wentao Lou <wentao....@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4c61e9..5a2dc44 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3183,7 +3183,7 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>   
>               /* No need to recover an evicted BO */
>               if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
> -                 shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
> +                 shadow->parent == NULL || shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)

That doesn't look like a good idea to me. Did you actually run into this issue?

>                       continue;
>   
>               r = amdgpu_bo_restore_shadow(shadow, &next);
> @@ -3191,11 +3191,16 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>                       break;
>   
>               if (fence) {
> -                     r = dma_fence_wait_timeout(fence, false, tmo);
> +                     tmo = dma_fence_wait_timeout(fence, false, tmo);
>                       dma_fence_put(fence);
>                       fence = next;
> -                     if (r <= 0)
> +                     if (tmo == 0) {
> +                             r = -ETIMEDOUT;
>                               break;
> +                     } else if (tmo < 0) {
> +                             r = tmo;
> +                             break;
> +                     }
>               } else {
>                       fence = next;
>               }
> @@ -3206,8 +3211,8 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>               tmo = dma_fence_wait_timeout(fence, false, tmo);
>       dma_fence_put(fence);
>   
> -     if (r <= 0 || tmo <= 0) {
> -             DRM_ERROR("recover vram bo from shadow failed\n");
> +     if (r < 0 || tmo <= 0) {
> +             DRM_ERROR("recover vram bo from shadow failed, tmo is %d\n", tmo);

Maybe print both r and tmo in the message.
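
Something along these lines, as a rough userspace sketch rather than the actual driver change. fake_wait_timeout(), wait_all() and format_failure() are hypothetical names made up for illustration. fake_wait_timeout() only mimics the return convention of dma_fence_wait_timeout() (remaining jiffies on success, 0 on timeout, negative on error), and each "fence" is just a number of jiffies. Since dma_fence_wait_timeout() returns a signed long, tmo wants %ld rather than %d; the sketch also keeps r as a long for simplicity:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for dma_fence_wait_timeout(). "needed" is how many
 * jiffies the fake fence takes; a negative value simulates a failed wait.
 */
static long fake_wait_timeout(long needed, long budget)
{
	if (needed < 0)
		return needed;		/* simulated error code */
	if (needed >= budget)
		return 0;		/* budget used up: timed out */
	return budget - needed;		/* jiffies still left */
}

/*
 * Wait on all fences with one shared budget instead of restarting the
 * timeout for every wait. Returns 0 on success, -ETIMEDOUT when the
 * budget runs out, or the error from the wait. *tmo is updated so the
 * caller can log it.
 */
static long wait_all(const long *fences, size_t n, long *tmo)
{
	size_t i;

	assert(tmo != NULL);
	for (i = 0; i < n; i++) {
		*tmo = fake_wait_timeout(fences[i], *tmo);
		if (*tmo == 0)
			return -ETIMEDOUT;
		if (*tmo < 0)
			return *tmo;
	}
	return 0;
}

/* Build the failure message with both values; both are long, hence %ld. */
static int format_failure(char *buf, size_t len, long r, long tmo)
{
	return snprintf(buf, len,
			"recover vram bo from shadow failed, r is %ld, tmo is %ld\n",
			r, tmo);
}
```

With a budget of 100 and three fake fences of 40 jiffies each, wait_all() stops at the third one with -ETIMEDOUT and tmo == 0, and the log line then shows which of the two conditions actually tripped.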

Regards,
Christian.

>               return -EIO;
>       }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
