If the GPU fails to suspend, the return code is passed up to the caller,
but the device is left in an inconsistent state.  This could lead to hangs
if userspace then tries to access it.

The last stage of every pmops sequence (success or failure) is the
complete() callback.  If the GPU is still marked as in suspend by the time
the PM core reaches this stage, something went really wrong, so reset it.

Signed-off-by: Mario Limonciello (AMD) <[email protected]>
---
v5:
 * Handle case of DPM_FLAG_SMART_SUSPEND (Lijo)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index cee90f9e58a9..4d437e31d1bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2590,6 +2590,17 @@ static int amdgpu_pmops_prepare(struct device *dev)
 
 static void amdgpu_pmops_complete(struct device *dev)
 {
+       struct drm_device *drm_dev = dev_get_drvdata(dev);
+       struct amdgpu_device *adev = drm_to_adev(drm_dev);
+
+       /* sequence failed, use a big hammer to try to clean up */
+       if (adev->in_suspend && !pm_runtime_suspended(dev)) {
+               adev->in_suspend = adev->in_s0ix = adev->in_s3 = false;
+               dev_crit(adev->dev, "pmops sequence failed, resetting\n");
+               amdgpu_asic_reset(adev);
+               return;
+       }
+
        amdgpu_device_complete(dev_get_drvdata(dev));
 }
 
-- 
2.51.1
