On 3/2/2023 2:43 PM, Quan, Evan wrote:
[AMD Official Use Only - General]



-----Original Message-----
From: Lazar, Lijo <lijo.la...@amd.com>
Sent: Thursday, March 2, 2023 4:28 PM
To: Quan, Evan <evan.q...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <alexander.deuc...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: disable cstate properly for driver
reloading scenario



On 3/2/2023 12:28 PM, Evan Quan wrote:
Gpu reset might be needed during driver reloading. To guard that(gpu
reset) work, df cstate needs to be disabled properly.

Signed-off-by: Evan Quan <evan.q...@amd.com>
Change-Id: I5c074c265c0b08a67b6934ae1ad9aa3fed245461
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++++
   1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 51bbeaa1f311..3c854461ef32 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2816,6 +2816,15 @@ static int amdgpu_device_ip_fini_early(struct
amdgpu_device *adev)
        amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
        amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);

+       /*
+        * Get df cstate disabled properly on driver unloading.
+        * Since on the succeeding driver reloading, gpu reset might
+        * be required. And cstate disabled is a prerequisite for
+        * that(gpu reset).
+        */
+       if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
+               dev_warn(adev->dev, "Failed to disallow df cstate");
+

This looks more like a firmware bug. Driver sends the Unload message to FW.
In that case FW should disable all features including C-state.
Driver does not send the Unload message. We want PMFM alive and ready for 
handling possible gpu reset on reloading.


Actually, soc21_need_reset_on_init code itself has a bug. PSP won't get unloaded by default on ring destruction. Even if PSP stops, it could just keep the heartbeat value as non-zero (just that it won't increment).

Probably, that needs to be fixed first rather than keeping PMFW alive for a reset.

Thanks,
Lijo

BR
Evan

Thanks,
Lijo

        amdgpu_amdkfd_suspend(adev, false);

        /* Workaroud for ASICs need to disable SMC first */

Reply via email to