[Public]

One problem is that this also bypasses a valid 0 response, which usually means
the FW may not have finished processing the previous message.

My thinking was that it shouldn't even attempt to send a message once it has
identified a FW hang.

Could the same problem occur whenever there is SRIOV full access, e.g.
before/after a reset?

If state == FW_INIT, ignore the response state before sending the message.
If the expected response to a message does not arrive, set the state to
FW_HANG. This part is tricky, as what qualifies as a FW hang could vary with
the specific SOC's message. I'm avoiding a bool for this reason, to leave room
for other FW states.
If state == FW_HANG, don't even attempt to send the message.

Move the FW state to FW_INIT whenever there is an init/resume sequence
(hw_init/hw_resume?).
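
A minimal standalone sketch of the state handling described above. All names
here (enum fw_state, struct fake_smu, fw_mark_init, fw_send_msg) are made up
for illustration, the response register is simulated, and only two response
values are modelled, so this is not the driver's code. The FW_RUNTIME state and
the transition to it after the first good response are also assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the amdgpu driver. */
enum fw_state { FW_RUNTIME, FW_INIT, FW_HANG };

#define RESP_NONE 0x0u	/* firmware has not answered */
#define RESP_OK   0x1u	/* firmware completed the message */

struct fake_smu {
	enum fw_state state;
	uint32_t resp_reg;	/* stands in for the response register */
	uint32_t next_resp;	/* what the simulated firmware will answer */
	unsigned int sent;	/* number of messages actually written */
};

/* Would be called from the init/resume path. */
static void fw_mark_init(struct fake_smu *smu)
{
	assert(smu);
	smu->state = FW_INIT;
}

static int fw_send_msg(struct fake_smu *smu)
{
	assert(smu);

	/* Once a hang has been recorded, never touch the mailbox again. */
	if (smu->state == FW_HANG)
		return -EREMOTEIO;

	/*
	 * Outside FW_INIT the pre-send check stays in force: a 0 here means
	 * the previous message may still be in flight. In FW_INIT the
	 * register content is stale and is ignored.
	 */
	if (smu->state != FW_INIT && smu->resp_reg != RESP_OK)
		return smu->resp_reg == RESP_NONE ? -ETIME : -EREMOTEIO;

	/* Simulated "write message, then poll for the response". */
	smu->resp_reg = smu->next_resp;
	smu->sent++;

	if (smu->resp_reg != RESP_OK) {
		/* What counts as a hang may differ per SOC; simplified here. */
		smu->state = FW_HANG;
		return smu->resp_reg == RESP_NONE ? -ETIME : -EREMOTEIO;
	}

	smu->state = FW_RUNTIME;
	return 0;
}
```

With an enum rather than a bool, a further state can be added later without
changing the callers that only check for FW_HANG.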

Thanks,
Lijo
-----Original Message-----
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Danijel 
Slivka
Sent: Monday, July 8, 2024 1:37 PM
To: amd-gfx@lists.freedesktop.org
Cc: Slivka, Danijel <danijel.sli...@amd.com>
Subject: [PATCH] drm/amd/pm: Ignore initial value in smu response register

Why:
If the register mmMP1_SMN_C2PMSG_90 is written to during amdgpu driver load or
driver unload, a subsequent amdgpu driver load will fail at smu_hw_init. The
default value of mmMP1_SMN_C2PMSG_90 in a clean environment is 0x1, and if the
value differs from that, the amdgpu driver load will fail.

How to fix:
Ignore the initial value in the smu response register before the first smu
message is sent, and proceed to send the message. If the register holds 0x0 or
an unexpected value after the smu message was sent, set the fw_state_hang flag
so that no further smu messages are sent.

Signed-off-by: Danijel Slivka <danijel.sli...@amd.com>
---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 +
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c        | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index a34c802f52be..bfe08fa0db6d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -562,6 +562,7 @@ struct smu_context {
        uint32_t smc_fw_if_version;
        uint32_t smc_fw_version;
        uint32_t smc_fw_caps;
+       bool smc_fw_state_hang;

        bool uploading_custom_pp_table;
        bool dc_controlled_by_gpio;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 5592fd825aa3..9e4e62dcbee7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -421,7 +421,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
        if (poll) {
                reg = __smu_cmn_poll_stat(smu);
                res = __smu_cmn_reg2errno(smu, reg);
-               if (reg == SMU_RESP_NONE || res == -EREMOTEIO) {
+               if ((reg == SMU_RESP_NONE || res == -EREMOTEIO) && smu->smc_fw_state_hang) {
                        __smu_cmn_reg_print_error(smu, reg, index, param, msg);
                        goto Out;
                }
@@ -429,8 +429,11 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
        __smu_cmn_send_msg(smu, (uint16_t) index, param);
        reg = __smu_cmn_poll_stat(smu);
        res = __smu_cmn_reg2errno(smu, reg);
-       if (res != 0)
+       if (res != 0) {
+               if (reg == SMU_RESP_NONE || res == -EREMOTEIO)
+                       smu->smc_fw_state_hang = true;
                __smu_cmn_reg_print_error(smu, reg, index, param, msg);
+       }
        if (read_arg) {
                smu_cmn_read_arg(smu, read_arg);
                dev_dbg(adev->dev, "smu send message: %s(%d) param: 0x%08x, resp: 0x%08x,\
--
2.34.1
