RE: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)

2021-11-15 Thread Quan, Evan
[AMD Official Use Only]

Acked-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Saturday, November 13, 2021 12:26 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
> 
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
> 
> v2: handle s0ix
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1db76429a673..b4591f6e82dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device
> *dev)
>   adev->in_s3 = true;
>   r = amdgpu_device_suspend(drm_dev, true);
>   adev->in_s3 = false;
> -
> + if (r)
> + return r;
> + if (!adev->in_s0ix)
> + r = amdgpu_asic_reset(adev);
>   return r;
>  }
> 
> --
> 2.31.1


Re: [PATCH] drm/amdgpu: always reset the asic in suspend

2021-11-15 Thread Deucher, Alexander
[AMD Official Use Only]

Well, that handles the case of the GPU needing to be reset on driver (e.g., 
virtualization), but doesn't handle the interrupted suspend case (e.g., when 
suspend is unwound before the power rail was turned off).  We already so 
something similar for hibernate to deal with the multiple freeze and thaw 
cycles.

Alex


From: Christian König 
Sent: Monday, November 15, 2021 8:41 AM
To: Alex Deucher ; Deucher, Alexander 

Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdgpu: always reset the asic in suspend

I was just about to write up my concern as well.

IIRC we used to have that and it didn't really worked that well and we
switched to resetting the GPU on driver load instead if initializing it
doesn't work of hand.

Christian.

Am 12.11.21 um 17:19 schrieb Alex Deucher:
> Actually, ignore this for now.  This will likely cause problems with S0ix.
>
> Alex
>
> On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher  
> wrote:
>> If the platform suspend happens to fail and the power rail
>> is not turned off, the GPU will be in an unknown state on
>> resume, so reset the asic so that it will be in a known
>> good state on resume even if the platform suspend failed.
>>
>> Signed-off-by: Alex Deucher 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 1db76429a673..42af3d88e0ba 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)
>>  adev->in_s3 = true;
>>  r = amdgpu_device_suspend(drm_dev, true);
>>  adev->in_s3 = false;
>> -
>> -   return r;
>> +   if (r)
>> +   return r;
>> +   return amdgpu_asic_reset(adev);
>>   }
>>
>>   static int amdgpu_pmops_resume(struct device *dev)
>> --
>> 2.31.1
>>



Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)

2021-11-15 Thread Alex Deucher
On Mon, Nov 15, 2021 at 2:50 AM Lazar, Lijo  wrote:
>
>
>
> On 11/12/2021 9:55 PM, Alex Deucher wrote:
> > If the platform suspend happens to fail and the power rail
> > is not turned off, the GPU will be in an unknown state on
> > resume, so reset the asic so that it will be in a known
> > good state on resume even if the platform suspend failed.
> >
>
> Any more background info on the issue? Is there a need to trigger BACO
> or D3cold entry similar to how it's done for runtime suspend?

Basically something like the following, user requests S3, drivers
start to do their suspend thing, but then something interrupts it
(e.g., user plugs/unplugs a usb device or S3 gets interrupted for
something).  At that point, the power rail has not been turned off.
The kernel then starts calling the resume functions for each device
because the suspend was aborted.  However, since the power rail was
not turned off, the GPU is still initialized so the driver can't
properly re-init it without a reset.

Alex


>
> Thanks,
> Lijo
>
> > v2: handle s0ix
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 1db76429a673..b4591f6e82dd 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev)
> >   adev->in_s3 = true;
> >   r = amdgpu_device_suspend(drm_dev, true);
> >   adev->in_s3 = false;
> > -
> > + if (r)
> > + return r;
> > + if (!adev->in_s0ix)
> > + r = amdgpu_asic_reset(adev);
> >   return r;
> >   }
> >
> >


Re: [PATCH] drm/amdgpu: always reset the asic in suspend

2021-11-15 Thread Christian König

I was just about to write up my concern as well.

IIRC we used to have that and it didn't really worked that well and we 
switched to resetting the GPU on driver load instead if initializing it 
doesn't work of hand.


Christian.

Am 12.11.21 um 17:19 schrieb Alex Deucher:

Actually, ignore this for now.  This will likely cause problems with S0ix.

Alex

On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher  wrote:

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1db76429a673..42af3d88e0ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)
 adev->in_s3 = true;
 r = amdgpu_device_suspend(drm_dev, true);
 adev->in_s3 = false;
-
-   return r;
+   if (r)
+   return r;
+   return amdgpu_asic_reset(adev);
  }

  static int amdgpu_pmops_resume(struct device *dev)
--
2.31.1





Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)

2021-11-14 Thread Lazar, Lijo




On 11/12/2021 9:55 PM, Alex Deucher wrote:

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.



Any more background info on the issue? Is there a need to trigger BACO 
or D3cold entry similar to how it's done for runtime suspend?


Thanks,
Lijo


v2: handle s0ix

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1db76429a673..b4591f6e82dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev)
adev->in_s3 = true;
r = amdgpu_device_suspend(drm_dev, true);
adev->in_s3 = false;
-
+   if (r)
+   return r;
+   if (!adev->in_s0ix)
+   r = amdgpu_asic_reset(adev);
return r;
  }
  



Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)

2021-11-12 Thread Luben Tuikov


Acked-by: Luben Tuikov 


On 2021-11-12 11:25, Alex Deucher wrote:
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
>
> v2: handle s0ix
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1db76429a673..b4591f6e82dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev)
>   adev->in_s3 = true;
>   r = amdgpu_device_suspend(drm_dev, true);
>   adev->in_s3 = false;
> -
> + if (r)
> + return r;
> + if (!adev->in_s0ix)
> + r = amdgpu_asic_reset(adev);
>   return r;
>  }
>  



Re: [PATCH] drm/amdgpu: always reset the asic in suspend

2021-11-12 Thread Alex Deucher
Actually, ignore this for now.  This will likely cause problems with S0ix.

Alex

On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher  wrote:
>
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1db76429a673..42af3d88e0ba 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)
> adev->in_s3 = true;
> r = amdgpu_device_suspend(drm_dev, true);
> adev->in_s3 = false;
> -
> -   return r;
> +   if (r)
> +   return r;
> +   return amdgpu_asic_reset(adev);
>  }
>
>  static int amdgpu_pmops_resume(struct device *dev)
> --
> 2.31.1
>