On Sat, Jun 21, 2025 at 07:00:56PM +0200, Matthieu Herrb wrote:
> On Wed, Jun 04, 2025 at 08:39:19AM +0100, Laurence Tratt wrote:
> > As of somewhere in the last 10-14 days or so, my amd64 machine has stopped
> > reliably suspending (for reference: it's reliably suspended for the past
> > 9-10 months). Unfortunately I haven't worked out *what* the sequence of
> > events that causes this is, so bisecting has proven impossible (I
> > tried...). When `zzz` is unsuccessful, X disappears, the machine drops to
> > the console (i.e. displaying "Login:" or whatever was there last), and then
> > is stuck there indefinitely: it doesn't get as far as "Syncing disks". The
> > machine's fans spin up after a while, so presumably something is stuck in a
> > busy loop.
> 
> Hi,
> 
> I've been seeing something similar on my X395 for some weeks. I *think*
> failure to suspend happens only after I've played a few large (HD) videos
> with mpv, maybe in chrome too. I've not tried to figure out a
> reproducer.
> 
> When it happens, and the machine sits at the login prompt, I can still
> interact with it for a couple of minutes before it freezes: type
> commands, enter ddb,... I cannot switch back to X (either instant
> freeze or panic) nor retry zzz. halt -p sometimes works, sometimes
> unlocks the pending suspend (so the machine suspends, and powers itself
> off shortly after resume).
> 
> Unfortunately I've no idea what kind of resources to look for that
> would be blocking the suspend path. Any clues of stuff I can run on
> the console or in ddb next time this happens?

I suggested Laurence try reverting 6.12.31's
'Revert "drm/amd: Stop evicting resources on APUs in suspend"',
and after a week or so of testing he hasn't had a problem.

Perhaps it helps in your case as well?  A T495 with the same
APU as the X395 (Picasso) suspends and resumes fine without this.
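
For anyone skimming the diff below, its control flow can be modeled
roughly as follows. This is only a sketch under stated assumptions:
struct fake_adev and its s0ix_active, s3_active and evict_result fields
are made-up stand-ins for the ACPI checks and for the return value of
amdgpu_device_evict_resources(); none of it is driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few amdgpu_device fields the diff touches. */
struct fake_adev {
	bool in_runpm;
	bool in_s0ix;
	bool in_s3;
	bool in_s4;
	bool s0ix_active;	/* models amdgpu_acpi_is_s0ix_active() */
	bool s3_active;		/* models amdgpu_acpi_is_s3_active() */
	int evict_result;	/* models amdgpu_device_evict_resources() */
};

/* Mirrors amdgpu_choose_low_power_state(): record the target state early. */
static void
choose_low_power_state(struct fake_adev *adev)
{
	if (adev->in_runpm)
		return;

	if (adev->s0ix_active)
		adev->in_s0ix = true;
	else if (adev->s3_active)
		adev->in_s3 = true;
}

/* Mirrors the patched prepare path: on failure, clear the state flags. */
static int
prepare(struct fake_adev *adev)
{
	int r;

	choose_low_power_state(adev);

	r = adev->evict_result;
	if (r)
		goto unprepare;

	return 0;

unprepare:
	adev->in_s0ix = adev->in_s3 = adev->in_s4 = false;
	return r;
}
```

In this model a successful prepare() leaves in_s0ix or in_s3 set for the
later suspend steps, while a failed eviction returns the error with all
three flags cleared again.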

diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu.h sys/dev/pci/drm/amd/amdgpu/amdgpu.h
index 0a7bf1d3839..e223d956c28 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu.h
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu.h
@@ -1602,9 +1602,11 @@ static inline void amdgpu_acpi_get_backlight_caps(struct amdgpu_dm_backlight_cap
 #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
 bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev);
 bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev);
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev);
 #else
 static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; }
 static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; }
+static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { }
 #endif
 
 void amdgpu_register_gpu_instance(struct amdgpu_device *adev);
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
index b83527a69ba..8ad181eda2e 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1567,4 +1567,22 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
 #endif /* CONFIG_AMD_PMC */
 }
 
+/**
+ * amdgpu_choose_low_power_state
+ *
+ * @adev: amdgpu_device_pointer
+ *
+ * Choose the target low power state for the GPU
+ */
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev)
+{
+       if (adev->in_runpm)
+               return;
+
+       if (amdgpu_acpi_is_s0ix_active(adev))
+               adev->in_s0ix = true;
+       else if (amdgpu_acpi_is_s3_active(adev))
+               adev->in_s3 = true;
+}
+
 #endif /* CONFIG_SUSPEND */
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
index c4a899194dc..14bae1eacf4 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
@@ -4897,13 +4897,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
        struct amdgpu_device *adev = drm_to_adev(dev);
        int i, r;
 
+       amdgpu_choose_low_power_state(adev);
+
        if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
                return 0;
 
        /* Evict the majority of BOs before starting suspend sequence */
        r = amdgpu_device_evict_resources(adev);
        if (r)
-               return r;
+               goto unprepare;
 
        flush_delayed_work(&adev->gfx.gfx_off_delay_work);
 
@@ -4914,10 +4916,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
                        continue;
                r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev);
                if (r)
-                       return r;
+                       goto unprepare;
        }
 
        return 0;
+
+unprepare:
+       adev->in_s0ix = adev->in_s3 = adev->in_s4 = false;
+
+       return r;
 }
 
 /**
