Mhm, that sounds like another bug to me which we need to investigate.

Probably a race during freeing of shadow allocations.

Christian.

Am 03.04.19 um 08:35 schrieb Lou, Wentao:
> Hi Christian,
>
> Sometimes shadow->parent would be NULL in my testbed, but not reproduce 
> today...
> Just sent out another patch following your advice.
> Thanks.
>
> BR,
> Wentao
>
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: Tuesday, April 2, 2019 6:36 PM
> To: Lou, Wentao <wentao....@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] amdgpu_device_recover_vram always failed if only one 
> node in shadow_list
>
> Am 02.04.19 um 11:19 schrieb wentalou:
>> amdgpu_bo_restore_shadow would assign zero to r if succeeded.
>> r would remain zero if there is only one node in shadow_list.
>> current code would always return failure when r <= 0.
>> restart the timeout for each wait was a rather problematic bug as well.
>> The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each 
>> loop.
>> meanwhile, fix Call Trace by NULL of shadow->parent.
>>
>> Change-Id: I7e836ec7ab6cd0f069aac24f88e454e906637541
>> Signed-off-by: Wentao Lou <wentao....@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++++++++++-----
>>    1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index c4c61e9..5a2dc44 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3183,7 +3183,7 @@ static int amdgpu_device_recover_vram(struct
>> amdgpu_device *adev)
>>    
>>              /* No need to recover an evicted BO */
>>              if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
>> -                shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
>> +                shadow->parent == NULL || shadow->parent->tbo.mem.mem_type 
>> !=
>> +TTM_PL_VRAM)
> That doesn't looks like a good idea to me. Did you actually run into this 
> issue?
>
>>                      continue;
>>    
>>              r = amdgpu_bo_restore_shadow(shadow, &next); @@ -3191,11 
>> +3191,16
>> @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>                      break;
>>    
>>              if (fence) {
>> -                    r = dma_fence_wait_timeout(fence, false, tmo);
>> +                    tmo = dma_fence_wait_timeout(fence, false, tmo);
>>                      dma_fence_put(fence);
>>                      fence = next;
>> -                    if (r <= 0)
>> +                    if (tmo == 0) {
>> +                            r = -ETIMEDOUT;
>>                              break;
>> +                    } else if (tmo < 0) {
>> +                            r = tmo;
>> +                            break;
>> +                    }
>>              } else {
>>                      fence = next;
>>              }
>> @@ -3206,8 +3211,8 @@ static int amdgpu_device_recover_vram(struct 
>> amdgpu_device *adev)
>>              tmo = dma_fence_wait_timeout(fence, false, tmo);
>>      dma_fence_put(fence);
>>    
>> -    if (r <= 0 || tmo <= 0) {
>> -            DRM_ERROR("recover vram bo from shadow failed\n");
>> +    if (r < 0 || tmo <= 0) {
>> +            DRM_ERROR("recover vram bo from shadow failed, tmo is %d\n", 
>> tmo);
> Maybe print both r and tmo in the message.
>
> Regards,
> Christian.
>
>>              return -EIO;
>>      }
>>    

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to