lockup_timeout = 0 doesn't indicate that GPU reset isn't ready; the kernel/amdgpu 
never tells you that. Instead it means there is no timeout for jobs, 
so there is no warning and no GPU recovery triggered by a timeout event. But that 
doesn't mean GPU recovery cannot be triggered, e.g. for SRIOV GPU recovery can be 
triggered by the hypervisor.

Your patch shouldn't and cannot break existing logic; that's a very simple rule ...
If you insist on your change, at least make sure it doesn't change any SRIOV 
logic. That's not hard for you: just add an "if (!amdgpu_sriov_vf(adev))" 
check 
prior to your patch, although I don't encourage such ugly actions...
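
To illustrate the guard I mean, here is a minimal standalone sketch, not the 
real driver code. The struct and the mock_* names are hypothetical stand-ins 
(only amdgpu_sriov_vf(adev) and lockup_timeout come from this thread), and it 
assumes the patch's early-out is keyed on lockup_timeout == 0:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device struct; only what this sketch needs. */
struct mock_adev {
    bool is_sriov_vf;    /* models the result of amdgpu_sriov_vf(adev) */
    int  lockup_timeout; /* models amdgpu.lockup_timeout; 0 = no job timeout */
};

/* Models the amdgpu_sriov_vf() check on the mock device. */
static bool mock_sriov_vf(const struct mock_adev *adev)
{
    return adev->is_sriov_vf;
}

/* Returns true if GPU recovery should proceed, false if it is skipped. */
static bool mock_gpu_recover_allowed(const struct mock_adev *adev)
{
    /* Only bare metal honours "lockup_timeout == 0 disables reset". */
    if (!mock_sriov_vf(adev) && adev->lockup_timeout == 0)
        return false;

    /* SRIOV VF: the hypervisor may still trigger recovery. */
    return true;
}
```

With this shape, a VF with lockup_timeout = 0 still goes through recovery, 
while bare metal with lockup_timeout = 0 skips it.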


-----Original Message-----
From: Marek Olšák [mailto:mar...@gmail.com] 
Sent: Tuesday, December 12, 2017 11:02 PM
To: Liu, Monk <monk....@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

On Tue, Dec 12, 2017 at 4:18 AM, Liu, Monk <monk....@amd.com> wrote:
> NAK, your change breaks SRIOV logic:
>
> Without lockup_timeout set, this gpu_recover() won't get called at all, 
> unless your IB triggered an invalid instruction and that IRQ invoked 
> amdgpu_gpu_recover(). In that case you should disable the logic 
> in that IRQ instead of changing gpu_recover() itself, because for SRIOV 
> we need gpu_recover() even when lockup_timeout is zero

The default value of 0 indicates that GPU reset isn't ready to be enabled by 
default. That's what it means. Once the GPU reset works, the default should be 
non-zero (e.g. 10000) and amdgpu.lockup_timeout=0 should be used to disable all 
GPU resets in order to be able to do scandumps and debug GPU hangs.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
