On Sat, Jun 21, 2025 at 07:00:56PM +0200, Matthieu Herrb wrote:
> On Wed, Jun 04, 2025 at 08:39:19AM +0100, Laurence Tratt wrote:
> > As of somewhere in the last 10-14 days or so, my amd64 machine has stopped
> > reliably suspending (for reference: it's reliably suspended for the past
> > 9-10 months). Unfortunately I haven't worked out *what* the sequence of
> > events that causes this is, so bisecting has proven impossible (I
> > tried...). When `zzz` is unsuccessful, X disappears, the machine drops to
> > the console (i.e. displaying "Login:" or whatever was there last), and then
> > is stuck there indefinitely: it doesn't get as far as "Syncing disks". The
> > machine's fans spin up after a while, so presumably something is stuck in a
> > busy loop.
>
> Hi,
>
> I've been seeing something similar on my X395 for some weeks. I *think*
> failure to suspend happens only after I've played a few large (HD) videos
> with mpv, maybe in chrome too. I've not tried to figure out a
> reproducer.
>
> When it happens, and the machine sits at the login prompt, I can still
> interact with it for a couple of minutes before it freezes: type
> commands, enter ddb, ... I cannot switch back to X (either instant
> freeze or panic) nor retry zzz. halt -p sometimes works, sometimes
> unlocks the pending suspend (so the machine suspends, and powers itself
> off shortly after resume).
>
> Unfortunately I've no idea what kind of resources to look for that
> would be blocking the suspend path. Any clues about what I can run on
> the console or in ddb next time this happens?

I suggested Laurence try a revert of 6.12.31's
'Revert "drm/amd: Stop evicting resources on APUs in suspend"'
and after a week or so of testing he hasn't had a problem.

Perhaps it helps in your case as well? A T495 with the same
APU as the X395 (Picasso) suspends and resumes fine without this diff.

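
For anyone skimming, here is a rough standalone model of what the diff changes in the prepare path: the low power state flags get chosen at the start of amdgpu_device_prepare(), and every error exit clears them again. This is only a sketch. struct fake_adev, its s0ix_active/s3_active fields and the evict_rc parameter are made-up stand-ins for the real amdgpu_device, the amdgpu_acpi_is_*_active() checks and the result of amdgpu_device_evict_resources(); the switch_power_state check and the per-IP-block prepare_suspend loop are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few amdgpu_device fields the diff touches. */
struct fake_adev {
	bool in_runpm, in_s0ix, in_s3, in_s4;
	bool s0ix_active;	/* models amdgpu_acpi_is_s0ix_active() */
	bool s3_active;		/* models amdgpu_acpi_is_s3_active() */
};

/* Mirrors amdgpu_choose_low_power_state() from the diff. */
static void
choose_low_power_state(struct fake_adev *adev)
{
	if (adev->in_runpm)
		return;

	if (adev->s0ix_active)
		adev->in_s0ix = true;
	else if (adev->s3_active)
		adev->in_s3 = true;
}

/*
 * Simplified prepare path. evict_rc is a hypothetical parameter that
 * simulates the return value of the eviction step.
 */
static int
device_prepare(struct fake_adev *adev, int evict_rc)
{
	int r;

	choose_low_power_state(adev);

	r = evict_rc;
	if (r)
		goto unprepare;

	return 0;

unprepare:
	/* On failure, forget the chosen state so a later attempt starts clean. */
	adev->in_s0ix = adev->in_s3 = adev->in_s4 = false;

	return r;
}
```

In this model, device_prepare() with evict_rc of 0 and s3_active set leaves in_s3 true, while any nonzero evict_rc returns that value with all three state flags cleared.
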
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu.h sys/dev/pci/drm/amd/amdgpu/amdgpu.h
index 0a7bf1d3839..e223d956c28 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu.h
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu.h
@@ -1602,9 +1602,11 @@ static inline void amdgpu_acpi_get_backlight_caps(struct amdgpu_dm_backlight_cap
#if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev);
bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev);
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev);
#else
static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; }
static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; }
+static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { }
#endif
void amdgpu_register_gpu_instance(struct amdgpu_device *adev);
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
index b83527a69ba..8ad181eda2e 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1567,4 +1567,22 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
#endif /* CONFIG_AMD_PMC */
}
+/**
+ * amdgpu_choose_low_power_state
+ *
+ * @adev: amdgpu_device_pointer
+ *
+ * Choose the target low power state for the GPU
+ */
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev)
+{
+ if (adev->in_runpm)
+ return;
+
+ if (amdgpu_acpi_is_s0ix_active(adev))
+ adev->in_s0ix = true;
+ else if (amdgpu_acpi_is_s3_active(adev))
+ adev->in_s3 = true;
+}
+
#endif /* CONFIG_SUSPEND */
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
index c4a899194dc..14bae1eacf4 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
@@ -4897,13 +4897,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
struct amdgpu_device *adev = drm_to_adev(dev);
int i, r;
+ amdgpu_choose_low_power_state(adev);
+
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
/* Evict the majority of BOs before starting suspend sequence */
r = amdgpu_device_evict_resources(adev);
if (r)
- return r;
+ goto unprepare;
flush_delayed_work(&adev->gfx.gfx_off_delay_work);
@@ -4914,10 +4916,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
continue;
r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev);
if (r)
- return r;
+ goto unprepare;
}
return 0;
+
+unprepare:
+ adev->in_s0ix = adev->in_s3 = adev->in_s4 = false;
+
+ return r;
}
/**