On 2026-03-11 06:38, Michele Palazzi wrote:
> Applied your debug diff on clean v6.19, reproduced with bpftrace running
> (dm_crtc_high_irq and dm_vupdate_high_irq probes added).
>
> dmesg:
>
> [drm] *ERROR* [CRTC:283:crtc-0] flip_done timed out
> [flip_done timeout] crtc-0 event 00000000baf6917e status 0
> [flip_done timeout] crtc-1 event 0000000000000000 status 0
> [flip_done timeout] crtc-2 event 0000000000000000 status 0
> [flip_done timeout] crtc-3 event 0000000000000000 status 0
>
> crtc-0 has a non-NULL event with pflip_status=0 (AMDGPU_FLIP_NONE).
> Note: %p hashes the pointer so can't directly correlate with the bpftrace
> output.
>
> bpftrace:
>
> 8301644 dm_pflip_high_irq [tid=0]
> 8301644 DELIVER event=ffff8b87186d5a80 crtc=0 [tid=0]
> 8301644 WAIT_FLIP EXIT 1ms [tid=36993]
> 8301649 ARM cursor event=ffff8b87186d5480 acrtc=ffff8b84958f7000 [tid=176]
> 8301649 commit_hw_done [tid=176]
> 8301649 WAIT_FLIP ENTER [tid=176]
> ...
> 10252ms, CRTC 1 continues normally
> ...
> 8311902 WAIT_FLIP !!!TIMEOUT!!! waited 10252ms [tid=176]
> Between the ARM cursor at 8301649 and the TIMEOUT at 8311902:
>
> 692 dm_crtc_high_irq fired, all on CRTC 1 (zero DELIVER with crtc=0 in the
> window)
> 0 DELIVER for event ffff8b87186d5480
> 0 ARM or DELIVER referencing acrtc ffff8b84958f7000 (CRTC 0)
> drm_vblank_disable_and_save continued firing (on CRTC 1)
> no dm_vupdate_high_irq fired at all during the entire trace
> acrtc ffff8b84958f7000 = CRTC 0
>
> If this is not enough i can retry to have the proper correlation using %px
dm_crtc_high_irq() not firing on CRTC 0 is quite strange. It suggests either
that the interrupts were disabled (even though drm_vblank_disable_and_save()
was not called for CRTC 0), or that the timing generator in HW hung.

Could you dump the interrupt state registers once the timeout is hit? Using UMR:
# Get the GPU instance for your 9070XT; it should be the one with "dcn401" under
# "IP Blocks:"
sudo umr -e
# Dump interrupt state, replacing the --instance value with your 9070XT instance:
sudo umr --instance 1 -r '*.*.OTG_GLOBAL_SYNC_STATUS' -O bits
UMR is available in the AUR, and building it from source is also straightforward:
https://aur.archlinux.org/packages/umr
https://gitlab.freedesktop.org/tomstdenis/umr
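
If catching the timeout by hand is awkward, the dump could be triggered from
the kernel log. The following is only a rough sketch: dump_on_timeout,
dump_regs, GPU_INSTANCE and DUMP_CMD are names made up for this example, and
the umr invocation is the same one as above.

```shell
#!/bin/sh
# Sketch: read kernel log lines on stdin and run a register dump as soon as
# the first "flip_done timed out" line appears.
# GPU_INSTANCE and DUMP_CMD are hypothetical knobs, not umr or kernel options.
GPU_INSTANCE="${GPU_INSTANCE:-1}"

dump_regs() {
    # Same register query as suggested above.
    sudo umr --instance "$GPU_INSTANCE" -r '*.*.OTG_GLOBAL_SYNC_STATUS' -O bits
}

dump_on_timeout() {
    while IFS= read -r line; do
        case "$line" in
            *"flip_done timed out"*)
                # DUMP_CMD may name another command or function; default is umr.
                "${DUMP_CMD:-dump_regs}"
                return 0
                ;;
        esac
    done
    return 1   # input ended without a timeout line
}
```

After sourcing this, it could be fed with "sudo dmesg --follow-new | dump_on_timeout",
assuming your util-linux dmesg supports --follow-new (plain --follow would also
match timeouts already in the log buffer).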
----
Another suspicion is that dGPU idle optimizations might hang the TG. If
force-disabling them fixes the issue, that would support this suspicion:
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 2676865f6f943..eb4c5f13943e0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2096,6 +2096,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 	/* Display Core create. */
 	adev->dm.dc = dc_create(&init_data);
+	adev->dm.dc->debug.disable_idle_power_optimizations = true;
+	adev->dm.dc->debug.force_disable_subvp = true;
+	adev->dm.dc->debug.fams2_config.bits.enable = false;
+
 	if (adev->dm.dc) {
 		drm_info(adev_to_drm(adev), "Display Core v%s initialized on %s\n", DC_VER,
 			 dce_version_to_string(adev->dm.dc->ctx->dce_version));
Thanks,
Leo