In a PF passthrough environment, running coralgemm after hibernate and
resume causes a GPU page fault.

A mode1 reset happens during hibernate, but the partition mode is not
restored on resume, so the mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL
registers hold wrong values after resume. When the CP accesses the MQD
BO, it uses the wrong stride size. This causes an out-of-bounds access
on the MQD BO, which results in a page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resuming from hibernation.

v2: switch to amdgpu_xcp_restore_partition_mode().
v3: use in_suspend instead of in_s4.

Signed-off-by: Samuel Zhang <[email protected]>
---
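Note (illustration only, not part of the change): the stride mismatch
described in the commit message can be sketched with a small standalone
model. Everything here is hypothetical: struct mqd_bo, the helper name,
and the sizes are made up for the example and do not reflect the real
MQD layout or the driver's data structures. The sketch assumes a buffer
sized by the driver for one stride, then indexed with a different stride,
as would happen if the partition registers held stale values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer holding one MQD slot per XCC. */
struct mqd_bo {
	size_t size;	/* total bytes allocated by the driver */
};

/*
 * Return true if the slot for xcc_idx, located using the given stride,
 * lies entirely inside the buffer.
 */
static bool mqd_slot_in_bounds(const struct mqd_bo *bo, size_t stride,
			       unsigned int xcc_idx)
{
	size_t start = (size_t)xcc_idx * stride;

	return start + stride <= bo->size;
}

/* Count how many of num_xcc slots fall outside the buffer. */
static unsigned int mqd_count_oob(const struct mqd_bo *bo, size_t stride,
				  unsigned int num_xcc)
{
	unsigned int i, oob = 0;

	for (i = 0; i < num_xcc; i++)
		if (!mqd_slot_in_bounds(bo, stride, i))
			oob++;

	return oob;
}
```

With a buffer of 4 * 4096 bytes, indexing four slots with a stride of
4096 stays in bounds, while indexing the same four slots with a stride
of 8192 puts slots 2 and 3 past the end of the buffer.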
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c    | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 811124ff88a8..f9e2edf5260b 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -407,7 +407,8 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
                return -EINVAL;
        }
 
-       if (adev->kfd.init_complete && !amdgpu_in_reset(adev))
+       if (adev->kfd.init_complete && !amdgpu_in_reset(adev) &&
+               !adev->in_suspend)
                flags |= AMDGPU_XCP_OPS_KFD;
 
        if (flags & AMDGPU_XCP_OPS_KFD) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index c4c551ef6b87..a5748088d9a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -2291,7 +2291,9 @@ static int gfx_v9_4_3_cp_resume(struct amdgpu_device *adev)
                r = amdgpu_xcp_init(adev->xcp_mgr, num_xcp, mode);
 
        } else {
-               if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
+               if (adev->in_suspend) /* Restore if resuming from suspend */
+                       amdgpu_xcp_restore_partition_mode(adev->xcp_mgr);
+               else if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
                                                    AMDGPU_XCP_FL_NONE) ==
                    AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE)
                        r = amdgpu_xcp_switch_partition_mode(
-- 
2.27.0
