Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-17 Thread Alex Deucher
On Tue, May 16, 2023 at 10:23 PM Limonciello, Mario  wrote:
>
>
> On 5/16/2023 4:57 PM, Alex Deucher wrote:
> > On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario  wrote:
> >>
> >> On 5/16/2023 4:39 PM, Alex Deucher wrote:
> >>> On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
> >>>  wrote:
>  On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
>  Prevent this case from ever happening.
> 
>  Tested-by: Juan Martinez 
>  Signed-off-by: Mario Limonciello 
>  ---
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
> 1 file changed, 4 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>  b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>  index dcbdb2641086..f1f879d9ed8d 100644
>  --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>  +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>  @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct 
>  amdgpu_device *adev)
> {
>    u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);
> 
>  +   if (!adev->gfx.gfx_off_state) {
>  +   dev_err(adev->dev, "GFX is not in GFXOFF\n");
>  +   return;
>  +   }
> >>> This should move up before the RREG above?  Also, I think it would be
> >>> cleaner to just not mess with the RLC in S0i3.  Can we just return
> >>> early in smu_disable_dpms() for the APU case?  All of the DPM features
> >>> are controlled by the SMU so that function is mostly a nop of APUs
> >>> anyway.
> >>>
> >>> Alex
> >> That was what the original attempt did when we first identified this issue.
> >> Unfortunately though just skipping RLC (without patches 1 and 2) means
> >> that GFXOFF still either doesn't get toggled at suspend entry or isn't 
> >> fully
> >>
> >> off at suspend entry.
> >>
> >> This leads to the graphics core behaving erratically upon resume.
> >>
> >> So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip
> >> RLC for
> >> APU.
> > Sure.
> OK, let me double check RLC skip and I'll send out a v2.
> > I wonder if we need something similar as patch 2 for other APUs?
> I expect patch 1 "alone" to help Renoir and Cezanne hitting a similar
> circumstance.
> For Rembrandt and Mendocino, they don't have IMU, so what would you poll?

For older chips, mmPWR_MISC_CNTL_STATUS (smu10), mmSMUIO_GFX_MISC_CNTL
(smu12).  See smu_v12_0_gfx_off_control() and smu10_disable_gfx_off().
It looks like smu_v13_0_gfx_off_control() doesn't wait for gfxoff like
the other functions.

Alex

> >
> > Thinking out loud here, I wonder if we shouldn't just return early in
> > the top level suspend/resume functions for S0i3.
>
> I think this can make sense for GFX10 and GFX11 maybe, but as it's already
> bifurcated I think it's probably better to do case by case basis.
>


Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Limonciello, Mario



On 5/16/2023 4:57 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario  wrote:


On 5/16/2023 4:39 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
 wrote:

On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
Prevent this case from ever happening.

Tested-by: Juan Martinez 
Signed-off-by: Mario Limonciello 
---
   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dcbdb2641086..f1f879d9ed8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
*adev)
   {
  u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);

+   if (!adev->gfx.gfx_off_state) {
+   dev_err(adev->dev, "GFX is not in GFXOFF\n");
+   return;
+   }

This should move up before the RREG above?  Also, I think it would be
cleaner to just not mess with the RLC in S0i3.  Can we just return
early in smu_disable_dpms() for the APU case?  All of the DPM features
are controlled by the SMU so that function is mostly a nop of APUs
anyway.

Alex

That was what the original attempt did when we first identified this issue.
Unfortunately though just skipping RLC (without patches 1 and 2) means
that GFXOFF still either doesn't get toggled at suspend entry or isn't fully

off at suspend entry.

This leads to the graphics core behaving erratically upon resume.

So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip
RLC for
APU.

Sure.

OK, let me double check RLC skip and I'll send out a v2.

I wonder if we need something similar as patch 2 for other APUs?
I expect patch 1 "alone" to help Renoir and Cezanne hitting a similar 
circumstance.

For Rembrandt and Mendocino, they don't have IMU, so what would you poll?


Thinking out loud here, I wonder if we shouldn't just return early in
the top level suspend/resume functions for S0i3.


I think this can make sense for GFX10 and GFX11 maybe, but as it's already
bifurcated I think it's probably better to do case by case basis.



Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Alex Deucher
On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario  wrote:
>
>
> On 5/16/2023 4:39 PM, Alex Deucher wrote:
> > On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
> >  wrote:
> >> On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
> >> Prevent this case from ever happening.
> >>
> >> Tested-by: Juan Martinez 
> >> Signed-off-by: Mario Limonciello 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
> >> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> index dcbdb2641086..f1f879d9ed8d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
> >> *adev)
> >>   {
> >>  u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);
> >>
> >> +   if (!adev->gfx.gfx_off_state) {
> >> +   dev_err(adev->dev, "GFX is not in GFXOFF\n");
> >> +   return;
> >> +   }
> > This should move up before the RREG above?  Also, I think it would be
> > cleaner to just not mess with the RLC in S0i3.  Can we just return
> > early in smu_disable_dpms() for the APU case?  All of the DPM features
> > are controlled by the SMU so that function is mostly a nop of APUs
> > anyway.
> >
> > Alex
> That was what the original attempt did when we first identified this issue.
> Unfortunately though just skipping RLC (without patches 1 and 2) means
> that GFXOFF still either doesn't get toggled at suspend entry or isn't fully
>
> off at suspend entry.
>
> This leads to the graphics core behaving erratically upon resume.
>
> So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip
> RLC for
> APU.

Sure.  I wonder if we need something similar as patch 2 for other APUs?

Thinking out loud here, I wonder if we shouldn't just return early in
the top level suspend/resume functions for S0i3.

Alex


Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Limonciello, Mario



On 5/16/2023 4:39 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
 wrote:

On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
Prevent this case from ever happening.

Tested-by: Juan Martinez 
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dcbdb2641086..f1f879d9ed8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
*adev)
  {
 u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);

+   if (!adev->gfx.gfx_off_state) {
+   dev_err(adev->dev, "GFX is not in GFXOFF\n");
+   return;
+   }

This should move up before the RREG above?  Also, I think it would be
cleaner to just not mess with the RLC in S0i3.  Can we just return
early in smu_disable_dpms() for the APU case?  All of the DPM features
are controlled by the SMU so that function is mostly a nop of APUs
anyway.

Alex

That was what the original attempt did when we first identified this issue.
Unfortunately though just skipping RLC (without patches 1 and 2) means
that GFXOFF still either doesn't get toggled at suspend entry or isn't fully

off at suspend entry.

This leads to the graphics core behaving erratically upon resume.

So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip 
RLC for

APU.



Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Alex Deucher
On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
 wrote:
>
> On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
> Prevent this case from ever happening.
>
> Tested-by: Juan Martinez 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index dcbdb2641086..f1f879d9ed8d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
> *adev)
>  {
> u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);
>
> +   if (!adev->gfx.gfx_off_state) {
> +   dev_err(adev->dev, "GFX is not in GFXOFF\n");
> +   return;
> +   }

This should move up before the RREG above?  Also, I think it would be
cleaner to just not mess with the RLC in S0i3.  Can we just return
early in smu_disable_dpms() for the APU case?  All of the DPM features
are controlled by the SMU so that function is mostly a nop of APUs
anyway.

Alex

> tmp = REG_SET_FIELD(tmp, RLC_CNTL, RLC_ENABLE_F32, 0);
> WREG32_SOC15(GC, 0, regRLC_CNTL, tmp);
>  }
> --
> 2.34.1
>


[PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Mario Limonciello
On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
Prevent this case from ever happening.

Tested-by: Juan Martinez 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dcbdb2641086..f1f879d9ed8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
*adev)
 {
u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);
 
+   if (!adev->gfx.gfx_off_state) {
+   dev_err(adev->dev, "GFX is not in GFXOFF\n");
+   return;
+   }
tmp = REG_SET_FIELD(tmp, RLC_CNTL, RLC_ENABLE_F32, 0);
WREG32_SOC15(GC, 0, regRLC_CNTL, tmp);
 }
-- 
2.34.1