On 10/15/25 9:19 AM, Harry Wentland wrote:


On 2025-10-14 17:38, Mario Limonciello wrote:


On 10/14/2025 4:27 PM, Alex Deucher wrote:
On Tue, Oct 14, 2025 at 3:46 PM Mario Limonciello
<[email protected]> wrote:

[Why]
If userspace hasn't frozen user processes (like systemd does with
user.slice) then an aborted hibernate could give control back to
userspace before display hardware is resumed.  IoW an atomic commit could
be done while the hardware is in D3, which could hang a system.

Is there any way to prevent this altogether?

The obvious way is that userspace should be freezing user processes. That's 
what systemd does.

This seems like a recipe
for trouble for any driver.

If we want to uplevel this kind of check I think we would need some helper to 
report hardware status into drm and drm would have to call that.

Most distros use systemd, and this only happened in an aborted hibernate.  I 
guess I'd like to see how much this warning actually comes up before deciding 
if all that plumbing is worth it.

I was briefly thinking about a generic solution as well and don't
think it's worth it at this point. This is mostly about internal
driver/HW state.

Harry

FWIW IGT testing seems to have thrown up some problems with this which didn't show up in my unit testing.

[  471.261018]  ? drm_atomic_helper_resume+0x2b/0x150 [drm_kms_helper]
[  471.261031]  ? dm_resume+0x459/0x8f0 [amdgpu]
[  471.261396]  ? amdgpu_ip_block_resume+0x27/0x70 [amdgpu]
[  471.261635]  ? dpm_run_callback+0x9c/0x200
[  471.261640]  ? device_resume+0x1b4/0x2b0
[  471.261645]  drm_atomic_check_only+0x5d9/0x9e0 [drm]
[  471.261672]  drm_atomic_commit+0x6d/0xe0 [drm]
[  471.261697]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[ 471.261729] drm_atomic_helper_commit_duplicated_state+0xcd/0xe0 [drm_kms_helper]
[  471.261739]  drm_atomic_helper_resume+0x93/0x150 [drm_kms_helper]
[  471.261751]  dm_resume+0x459/0x8f0 [amdgpu]
[  471.262119]  ? preempt_count_add+0x7b/0xc0
[  471.262125]  amdgpu_ip_block_resume+0x27/0x70 [amdgpu]
[  471.262363]  amdgpu_device_ip_resume_phase3+0x54/0x80 [amdgpu]
[  471.262598]  amdgpu_device_resume+0x188/0x3b0 [amdgpu]
[  471.262842]  amdgpu_pmops_resume+0x4c/0x90 [amdgpu]
[  471.263078]  pci_pm_resume+0x6b/0x100

The issue appears to me to be that ip_block->status.hw isn't set again until the end of amdgpu_ip_block_resume().

I am tempted to move it before the call to ip_block->version->funcs->resume().

Thoughts?



Alex


[How]
Add a check whether the IP block hardware is ready to the atomic check
handler and return a failure. Userspace shouldn't do an atomic commit if
the atomic check fails.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627
Signed-off-by: Mario Limonciello <[email protected]>
---
Cc: Harry Wentland <[email protected]>
v2:
   * Return -EBUSY instead (Harry)
---
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +++++
   1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6446ec6c21d4..f5cd9982af99 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -12010,6 +12010,11 @@ static int amdgpu_dm_atomic_check(struct drm_device 
*dev,

          trace_amdgpu_dm_atomic_check_begin(state);

+       if (WARN_ON(unlikely(!amdgpu_device_ip_is_hw(adev, 
AMD_IP_BLOCK_TYPE_DCE)))) {
+               ret = -EBUSY;
+               goto fail;
+       }
+
          ret = drm_atomic_helper_check_modeset(dev, state);
          if (ret) {
                  drm_dbg_atomic(dev, "drm_atomic_helper_check_modeset() 
failed\n");
--
2.49.0




Reply via email to