On 10/15/25 9:30 AM, Alex Deucher wrote:
On Wed, Oct 15, 2025 at 10:19 AM Harry Wentland <[email protected]> wrote:
On 2025-10-14 17:38, Mario Limonciello wrote:
On 10/14/2025 4:27 PM, Alex Deucher wrote:
On Tue, Oct 14, 2025 at 3:46 PM Mario Limonciello
<[email protected]> wrote:
[Why]
If userspace hasn't frozen user processes (like systemd does with
user.slice) then an aborted hibernate could give control back to
userspace before display hardware is resumed. IoW an atomic commit could
be done while the hardware is in D3, which could hang a system.
Is there any way to prevent this altogether?
The obvious way is that userspace should be freezing user processes. That's
what systemd does.
This seems like a recipe
for trouble for any driver.
If we want to uplevel this kind of check I think we would need some helper to
report hardware status into drm and drm would have to call that.
Most distros use systemd, and this only happened in an aborted hibernate. I
guess I'd like to see how much this warning actually comes up before deciding
if all that plumbing is worth it.
I was briefly thinking about a generic solution as well and don't
think it's worth it at this point. This is mostly about internal
driver/HW state.
Allowing userspace to be running while the kernel is either entering
or exiting suspend or hibernation is a recipe for problems. It seems
to me that the kernel should stop userspace before it tries to do any
of this and only restores it when everything has resumed. Having
userspace stop itself seems broken by design. Applications can
allocate memory or call IOCTLs into any driver which may not be
functional at that point in time, or depending on the timing, the
user's state is lost because they did something to it after the kernel
already created the hibernation image. Fixes like this seem like
playing wack-a-mole. It might be the GPU driver today, but it might
be a usb driver tomorrow.
The kernel does freeze threads once you're far enough "into" the sequence.
I'll look through the aborted hibernate flow and see if I can find out
why this is happening. Maybe they are thawing at the wrong time.
Alex
Harry
Alex
[How]
Add a check whether the IP block hardware is ready to the atomic check
handler and return a failure. Userspace shouldn't do an atomic commit if
the atomic check fails.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627
Signed-off-by: Mario Limonciello <[email protected]>
---
Cc: Harry Wentland <[email protected]>
v2:
* Return -EBUSY instead (Harry)
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6446ec6c21d4..f5cd9982af99 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -12010,6 +12010,11 @@ static int amdgpu_dm_atomic_check(struct drm_device
*dev,
trace_amdgpu_dm_atomic_check_begin(state);
+ if (WARN_ON(unlikely(!amdgpu_device_ip_is_hw(adev,
AMD_IP_BLOCK_TYPE_DCE)))) {
+ ret = -EBUSY;
+ goto fail;
+ }
+
ret = drm_atomic_helper_check_modeset(dev, state);
if (ret) {
drm_dbg_atomic(dev, "drm_atomic_helper_check_modeset()
failed\n");
--
2.49.0