RE: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
[AMD Official Use Only] Acked-by: Evan Quan > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Saturday, November 13, 2021 12:26 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu: always reset the asic in suspend (v2) > > If the platform suspend happens to fail and the power rail > is not turned off, the GPU will be in an unknown state on > resume, so reset the asic so that it will be in a known > good state on resume even if the platform suspend failed. > > v2: handle s0ix > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 1db76429a673..b4591f6e82dd 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device > *dev) > adev->in_s3 = true; > r = amdgpu_device_suspend(drm_dev, true); > adev->in_s3 = false; > - > + if (r) > + return r; > + if (!adev->in_s0ix) > + r = amdgpu_asic_reset(adev); > return r; > } > > -- > 2.31.1
Re: [PATCH] drm/amdgpu: always reset the asic in suspend
[AMD Official Use Only] Well, that handles the case of the GPU needing to be reset on driver (e.g., virtualization), but doesn't handle the interrupted suspend case (e.g., when suspend is unwound before the power rail was turned off). We already so something similar for hibernate to deal with the multiple freeze and thaw cycles. Alex From: Christian König Sent: Monday, November 15, 2021 8:41 AM To: Alex Deucher ; Deucher, Alexander Cc: amd-gfx list Subject: Re: [PATCH] drm/amdgpu: always reset the asic in suspend I was just about to write up my concern as well. IIRC we used to have that and it didn't really worked that well and we switched to resetting the GPU on driver load instead if initializing it doesn't work of hand. Christian. Am 12.11.21 um 17:19 schrieb Alex Deucher: > Actually, ignore this for now. This will likely cause problems with S0ix. > > Alex > > On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher > wrote: >> If the platform suspend happens to fail and the power rail >> is not turned off, the GPU will be in an unknown state on >> resume, so reset the asic so that it will be in a known >> good state on resume even if the platform suspend failed. >> >> Signed-off-by: Alex Deucher >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> index 1db76429a673..42af3d88e0ba 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev) >> adev->in_s3 = true; >> r = amdgpu_device_suspend(drm_dev, true); >> adev->in_s3 = false; >> - >> - return r; >> + if (r) >> + return r; >> + return amdgpu_asic_reset(adev); >> } >> >> static int amdgpu_pmops_resume(struct device *dev) >> -- >> 2.31.1 >>
Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
On Mon, Nov 15, 2021 at 2:50 AM Lazar, Lijo wrote: > > > > On 11/12/2021 9:55 PM, Alex Deucher wrote: > > If the platform suspend happens to fail and the power rail > > is not turned off, the GPU will be in an unknown state on > > resume, so reset the asic so that it will be in a known > > good state on resume even if the platform suspend failed. > > > > Any more background info on the issue? Is there a need to trigger BACO > or D3cold entry similar to how it's done for runtime suspend? Basically something like the following, user requests S3, drivers start to do their suspend thing, but then something interrupts it (e.g., user plugs/unplugs a usb device or S3 gets interrupted for something). At that point, the power rail has not been turned off. The kernel then starts calling the resume functions for each device because the suspend was aborted. However, since the power rail was not turned off, the GPU is still initialized so the driver can't properly re-init it without a reset. Alex > > Thanks, > Lijo > > > v2: handle s0ix > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > index 1db76429a673..b4591f6e82dd 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev) > > adev->in_s3 = true; > > r = amdgpu_device_suspend(drm_dev, true); > > adev->in_s3 = false; > > - > > + if (r) > > + return r; > > + if (!adev->in_s0ix) > > + r = amdgpu_asic_reset(adev); > > return r; > > } > > > >
Re: [PATCH] drm/amdgpu: always reset the asic in suspend
I was just about to write up my concern as well. IIRC we used to have that and it didn't really worked that well and we switched to resetting the GPU on driver load instead if initializing it doesn't work of hand. Christian. Am 12.11.21 um 17:19 schrieb Alex Deucher: Actually, ignore this for now. This will likely cause problems with S0ix. Alex On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher wrote: If the platform suspend happens to fail and the power rail is not turned off, the GPU will be in an unknown state on resume, so reset the asic so that it will be in a known good state on resume even if the platform suspend failed. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 1db76429a673..42af3d88e0ba 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev) adev->in_s3 = true; r = amdgpu_device_suspend(drm_dev, true); adev->in_s3 = false; - - return r; + if (r) + return r; + return amdgpu_asic_reset(adev); } static int amdgpu_pmops_resume(struct device *dev) -- 2.31.1
Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
On 11/12/2021 9:55 PM, Alex Deucher wrote: If the platform suspend happens to fail and the power rail is not turned off, the GPU will be in an unknown state on resume, so reset the asic so that it will be in a known good state on resume even if the platform suspend failed. Any more background info on the issue? Is there a need to trigger BACO or D3cold entry similar to how it's done for runtime suspend? Thanks, Lijo v2: handle s0ix Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 1db76429a673..b4591f6e82dd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev) adev->in_s3 = true; r = amdgpu_device_suspend(drm_dev, true); adev->in_s3 = false; - + if (r) + return r; + if (!adev->in_s0ix) + r = amdgpu_asic_reset(adev); return r; }
Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
Acked-by: Luben Tuikov On 2021-11-12 11:25, Alex Deucher wrote: > If the platform suspend happens to fail and the power rail > is not turned off, the GPU will be in an unknown state on > resume, so reset the asic so that it will be in a known > good state on resume even if the platform suspend failed. > > v2: handle s0ix > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 1db76429a673..b4591f6e82dd 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev) > adev->in_s3 = true; > r = amdgpu_device_suspend(drm_dev, true); > adev->in_s3 = false; > - > + if (r) > + return r; > + if (!adev->in_s0ix) > + r = amdgpu_asic_reset(adev); > return r; > } >
Re: [PATCH] drm/amdgpu: always reset the asic in suspend
Actually, ignore this for now. This will likely cause problems with S0ix. Alex On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher wrote: > > If the platform suspend happens to fail and the power rail > is not turned off, the GPU will be in an unknown state on > resume, so reset the asic so that it will be in a known > good state on resume even if the platform suspend failed. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 1db76429a673..42af3d88e0ba 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev) > adev->in_s3 = true; > r = amdgpu_device_suspend(drm_dev, true); > adev->in_s3 = false; > - > - return r; > + if (r) > + return r; > + return amdgpu_asic_reset(adev); > } > > static int amdgpu_pmops_resume(struct device *dev) > -- > 2.31.1 >