If a suspend fails, the PM core doesn't clean it up; the device
is just left in a bad state. If this happens under memory pressure,
merely trying to suspend could end in a hung system.
For all phases of suspend that return an error code, add an unwind
flow that will (try to) resume exactly the parts that have failed.
If the unwind itself fails, reset the GPU during the complete() callback.
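The unwind-then-reset flow described above can be illustrated with a small standalone C sketch. It is not the amdgpu code: every name in it (struct ip_block, struct gpu_dev, dev_suspend_with_unwind(), dev_pm_complete(), demo_run()) is hypothetical. It assumes that "unwind" means resuming, in the opposite order, the blocks that were already suspended before the failing one, and that a failed unwind is recorded as a flag that a complete()-style hook later consumes by resetting the GPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an IP block; NOT the real struct amdgpu_ip_block. */
struct ip_block {
	const char *name;
	int (*suspend)(struct ip_block *blk);
	int (*resume)(struct ip_block *blk);
	bool suspended;
};

/* Hypothetical device state; needs_reset is consumed by the complete() hook. */
struct gpu_dev {
	struct ip_block *blocks;
	int num_blocks;
	bool needs_reset;
	int resets; /* how many times complete() fell back to a reset */
};

/*
 * Suspend blocks from last to first (a reverse-order teardown). On failure,
 * resume only the blocks that were already suspended, then return the error.
 */
static int dev_suspend_with_unwind(struct gpu_dev *dev)
{
	int i, r = 0;

	for (i = dev->num_blocks - 1; i >= 0; i--) {
		r = dev->blocks[i].suspend(&dev->blocks[i]);
		if (r)
			goto unwind;
		dev->blocks[i].suspended = true;
	}
	return 0;

unwind:
	/* Blocks i+1 .. num_blocks-1 were suspended before block i failed. */
	for (i = i + 1; i < dev->num_blocks; i++) {
		if (dev->blocks[i].resume(&dev->blocks[i])) {
			/* The unwind failed too: defer recovery to complete(). */
			dev->needs_reset = true;
			break;
		}
		dev->blocks[i].suspended = false;
	}
	return r;
}

/* Hypothetical complete() hook: reset the GPU only if the unwind failed. */
static void dev_pm_complete(struct gpu_dev *dev)
{
	if (dev->needs_reset) {
		printf("unwind failed, resetting GPU\n");
		dev->resets++;
		dev->needs_reset = false;
	}
}

/* Stub callbacks for the demo below. */
static int ok(struct ip_block *blk) { (void)blk; return 0; }
static int fail(struct ip_block *blk) { (void)blk; return -EIO; }

/*
 * Demo: three blocks, the middle one fails to suspend. If resume_ok is
 * false, resuming the already-suspended block also fails, forcing a reset.
 */
static int demo_run(bool resume_ok, struct gpu_dev *out)
{
	static struct ip_block blocks[3];
	int r;

	assert(out != NULL);
	blocks[0] = (struct ip_block){ "gfx", ok, ok, false };
	blocks[1] = (struct ip_block){ "sdma", fail, ok, false };
	blocks[2] = (struct ip_block){ "smu", ok, resume_ok ? ok : fail, false };

	*out = (struct gpu_dev){ blocks, 3, false, 0 };
	r = dev_suspend_with_unwind(out);
	dev_pm_complete(out);
	return r;
}
```

Calling demo_run(true, &d) returns -EIO with every block resumed and no reset; demo_run(false, &d) also returns -EIO but leaves the "smu" block suspended and makes the complete() hook perform one reset. Keeping the reset out of the suspend path and in complete() mirrors the split described above: suspend only reports the error, and the heavier recovery runs once the PM transition is over.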
v5:
* Take the RLC patch from Alex's Van Gogh series, with slight modifications
* Also unwind when a failure happens in the middle of IP suspend
* Fix missing call to fix console
* Cover issues with DPM_FLAG_SMART_SUSPEND
Alex Deucher (1):
drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
Mario Limonciello (AMD) (3):
drm/amd: Reset the GPU if pmops failed
drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase1()
drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase2()
drm/amd: Unwind for failed device suspend
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 121 ++++++++++++++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 ++
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 18 ---
drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 2 -
4 files changed, 117 insertions(+), 35 deletions(-)
--
2.51.1