On 13.06.24 at 04:25, YiPeng Chai wrote:
If gpu is recovering, clear all message reset flags
in fifo and wait for gpu to complete recovery.

Signed-off-by: YiPeng Chai <yipeng.c...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 ++++++++++++
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 341c9bd0d1a4..bf4f8d439ebe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2982,6 +2982,18 @@ static int amdgpu_ras_page_retirement_thread(void *param)
                atomic_dec(&con->page_retirement_req_cnt);
+               reinit_completion(&con->gpu_reset_completion);
+
+               if (amdgpu_in_reset(adev) || atomic_read(&con->in_recovery)) {

It's illegal to call amdgpu_in_reset() from outside of the hw-specific backends.

If you want to make the code mutually exclusive with GPU resets, you need to grab the reset lock.

Regards,
Christian.

+                       uint32_t reset;
+
+                       amdgpu_ras_clear_poison_fifo_msg_reset_flag(adev, &reset);
+
+                       if (!wait_for_completion_timeout(&con->gpu_reset_completion,
+                               msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
+                               dev_err(adev->dev, "Waiting for GPU to complete reset timeout!\n");
+               }
+
  #ifdef HAVE_KFIFO_PUT_NON_POINTER
                if (!amdgpu_ras_get_poison_req(adev, &poison_msg))
                        continue;
