Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On Tue, May 16, 2023 at 10:23 PM Limonciello, Mario wrote: > > > On 5/16/2023 4:57 PM, Alex Deucher wrote: > > On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario wrote: > >> > >> On 5/16/2023 4:39 PM, Alex Deucher wrote: > >>> On Tue, May 16, 2023 at 2:15 PM Mario Limonciello > >>> wrote: > On GFX11 if RLC is stopped when not in GFXOFF the system will hang. > Prevent this case from ever happening. > > Tested-by: Juan Martinez > Signed-off-by: Mario Limonciello > --- > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > index dcbdb2641086..f1f879d9ed8d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct > amdgpu_device *adev) > { > u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); > > + if (!adev->gfx.gfx_off_state) { > + dev_err(adev->dev, "GFX is not in GFXOFF\n"); > + return; > + } > >>> This should move up before the RREG above? Also, I think it would be > >>> cleaner to just not mess with the RLC in S0i3. Can we just return > >>> early in smu_disable_dpms() for the APU case? All of the DPM features > >>> are controlled by the SMU so that function is mostly a nop of APUs > >>> anyway. > >>> > >>> Alex > >> That was what the original attempt did when we first identified this issue. > >> Unfortunately though just skipping RLC (without patches 1 and 2) means > >> that GFXOFF still either doesn't get toggled at suspend entry or isn't > >> fully > >> > >> off at suspend entry. > >> > >> This leads to the graphics core behaving erratically upon resume. > >> > >> So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip > >> RLC for > >> APU. > > Sure. > OK, let me double check RLC skip and I'll send out a v2. > > I wonder if we need something similar as patch 2 for other APUs? > I expect patch 1 "alone" to help Renoir and Cezanne hitting a similar > circumstance. > For Rembrandt and Mendocino, they don't have IMU, so what would you poll? For older chips, mmPWR_MISC_CNTL_STATUS (smu10), mmSMUIO_GFX_MISC_CNTL (smu12). See smu_v12_0_gfx_off_control() and smu10_disable_gfx_off(). It looks like smu_v13_0_gfx_off_control() doesn't wait for gfxoff like the other functions. Alex > > > > Thinking out loud here, I wonder if we shouldn't just return early in > > the top level suspend/resume functions for S0i3. > > I think this can make sense for GFX10 and GFX11 maybe, but as it's already > bifurcated I think it's probably better to do case by case basis. >
Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On 5/16/2023 4:57 PM, Alex Deucher wrote: On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario wrote: On 5/16/2023 4:39 PM, Alex Deucher wrote: On Tue, May 16, 2023 at 2:15 PM Mario Limonciello wrote: On GFX11 if RLC is stopped when not in GFXOFF the system will hang. Prevent this case from ever happening. Tested-by: Juan Martinez Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index dcbdb2641086..f1f879d9ed8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device *adev) { u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); + if (!adev->gfx.gfx_off_state) { + dev_err(adev->dev, "GFX is not in GFXOFF\n"); + return; + } This should move up before the RREG above? Also, I think it would be cleaner to just not mess with the RLC in S0i3. Can we just return early in smu_disable_dpms() for the APU case? All of the DPM features are controlled by the SMU so that function is mostly a nop of APUs anyway. Alex That was what the original attempt did when we first identified this issue. Unfortunately though just skipping RLC (without patches 1 and 2) means that GFXOFF still either doesn't get toggled at suspend entry or isn't fully off at suspend entry. This leads to the graphics core behaving erratically upon resume. So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip RLC for APU. Sure. OK, let me double check RLC skip and I'll send out a v2. I wonder if we need something similar as patch 2 for other APUs? I expect patch 1 "alone" to help Renoir and Cezanne hitting a similar circumstance. For Rembrandt and Mendocino, they don't have IMU, so what would you poll? Thinking out loud here, I wonder if we shouldn't just return early in the top level suspend/resume functions for S0i3. I think this can make sense for GFX10 and GFX11 maybe, but as it's already bifurcated I think it's probably better to do case by case basis.
Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario wrote: > > > On 5/16/2023 4:39 PM, Alex Deucher wrote: > > On Tue, May 16, 2023 at 2:15 PM Mario Limonciello > > wrote: > >> On GFX11 if RLC is stopped when not in GFXOFF the system will hang. > >> Prevent this case from ever happening. > >> > >> Tested-by: Juan Martinez > >> Signed-off-by: Mario Limonciello > >> --- > >> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > >> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > >> index dcbdb2641086..f1f879d9ed8d 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > >> @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device > >> *adev) > >> { > >> u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); > >> > >> + if (!adev->gfx.gfx_off_state) { > >> + dev_err(adev->dev, "GFX is not in GFXOFF\n"); > >> + return; > >> + } > > This should move up before the RREG above? Also, I think it would be > > cleaner to just not mess with the RLC in S0i3. Can we just return > > early in smu_disable_dpms() for the APU case? All of the DPM features > > are controlled by the SMU so that function is mostly a nop of APUs > > anyway. > > > > Alex > That was what the original attempt did when we first identified this issue. > Unfortunately though just skipping RLC (without patches 1 and 2) means > that GFXOFF still either doesn't get toggled at suspend entry or isn't fully > > off at suspend entry. > > This leads to the graphics core behaving erratically upon resume. > > So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip > RLC for > APU. Sure. I wonder if we need something similar as patch 2 for other APUs? Thinking out loud here, I wonder if we shouldn't just return early in the top level suspend/resume functions for S0i3. Alex
Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On 5/16/2023 4:39 PM, Alex Deucher wrote: On Tue, May 16, 2023 at 2:15 PM Mario Limonciello wrote: On GFX11 if RLC is stopped when not in GFXOFF the system will hang. Prevent this case from ever happening. Tested-by: Juan Martinez Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index dcbdb2641086..f1f879d9ed8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device *adev) { u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); + if (!adev->gfx.gfx_off_state) { + dev_err(adev->dev, "GFX is not in GFXOFF\n"); + return; + } This should move up before the RREG above? Also, I think it would be cleaner to just not mess with the RLC in S0i3. Can we just return early in smu_disable_dpms() for the APU case? All of the DPM features are controlled by the SMU so that function is mostly a nop of APUs anyway. Alex That was what the original attempt did when we first identified this issue. Unfortunately though just skipping RLC (without patches 1 and 2) means that GFXOFF still either doesn't get toggled at suspend entry or isn't fully off at suspend entry. This leads to the graphics core behaving erratically upon resume. So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip RLC for APU.
Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On Tue, May 16, 2023 at 2:15 PM Mario Limonciello wrote: > > On GFX11 if RLC is stopped when not in GFXOFF the system will hang. > Prevent this case from ever happening. > > Tested-by: Juan Martinez > Signed-off-by: Mario Limonciello > --- > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > index dcbdb2641086..f1f879d9ed8d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device > *adev) > { > u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); > > + if (!adev->gfx.gfx_off_state) { > + dev_err(adev->dev, "GFX is not in GFXOFF\n"); > + return; > + } This should move up before the RREG above? Also, I think it would be cleaner to just not mess with the RLC in S0i3. Can we just return early in smu_disable_dpms() for the APU case? All of the DPM features are controlled by the SMU so that function is mostly a nop of APUs anyway. Alex > tmp = REG_SET_FIELD(tmp, RLC_CNTL, RLC_ENABLE_F32, 0); > WREG32_SOC15(GC, 0, regRLC_CNTL, tmp); > } > -- > 2.34.1 >
[PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF
On GFX11 if RLC is stopped when not in GFXOFF the system will hang. Prevent this case from ever happening. Tested-by: Juan Martinez Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index dcbdb2641086..f1f879d9ed8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device *adev) { u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL); + if (!adev->gfx.gfx_off_state) { + dev_err(adev->dev, "GFX is not in GFXOFF\n"); + return; + } tmp = REG_SET_FIELD(tmp, RLC_CNTL, RLC_ENABLE_F32, 0); WREG32_SOC15(GC, 0, regRLC_CNTL, tmp); } -- 2.34.1