On Fri, Jan 23, 2026 at 9:52 AM Hamza Mahfooz <[email protected]> wrote:
>
> On Fri, Jan 23, 2026 at 02:52:44PM +0100, Christian König wrote:
> > I can only see two reasons why you could run into a timeout:
> >
> > 1. A dma_fence never signals.
> >    How that should be handled is already well documented and doesn't
> >    require any of this.
> >
> > 2. A coding error in the vblank or page flip handler leading to waiting
> >    forever.
> >    In that case calling back into the driver doesn't help either.
> >
> > So as far as I can see the whole approach doesn't make any sense at all.
>
> It appears that resetting display firmware is able to put at least a
> subset of these systems back into a consistent (usable) state. Though, I
> don't have a reliable way to reproduce the issue that I'm seeing so I
> can't say for sure what it boils down to.

I'm not at all an expert on KMS, but I took a quick look at the in and
out fences in KMS, and I think I know what might be going on. The out
fence is signalled by calling drm_crtc_send_vblank_event() from the
interrupt handler for the vblank/pageflip interrupt. If that interrupt
gets missed somehow, that never gets called and userspace will wait
forever.

As a safety measure, maybe add a worker thread that gets scheduled when
the atomic commit happens and then in the interrupt handler we cancel
the worker. If the interrupt never happens, the worker will eventually
run and call drm_crtc_send_vblank_event() and get things unstuck.

Alex
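
P.S. To make the worker idea concrete, here is a rough userspace model of
it, not driver code. Everything in it (struct flip_state, flip_commit(),
flip_irq(), flip_finish(), send_event_once()) is a hypothetical name made up
for illustration; send_event_once() only stands in for
drm_crtc_send_vblank_event(). In a real driver I would expect the fallback to
be a delayed work item queued at commit time and cancelled from the interrupt
path, but I have not tried that. The point the sketch shows is that whichever
side gets there first delivers the event, and the other side does nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of one pending page flip. */
struct flip_state {
    atomic_bool event_pending; /* true until the completion event is sent */
    atomic_int  events_sent;   /* number of times the event was delivered */
    long        timeout_ms;    /* how long the fallback worker waits */
    pthread_t   watchdog;      /* stand-in for the delayed worker */
};

/* Stand-in for sending the vblank event: delivers it at most once. */
static bool send_event_once(struct flip_state *s)
{
    bool expected = true;

    /* Only the caller that flips pending from true to false sends. */
    if (!atomic_compare_exchange_strong(&s->event_pending, &expected, false))
        return false;
    atomic_fetch_add(&s->events_sent, 1);
    return true;
}

/* Fallback worker: waits for the timeout, then sends if nobody else did. */
static void *watchdog_fn(void *arg)
{
    struct flip_state *s = arg;
    struct timespec ts = {
        .tv_sec  = s->timeout_ms / 1000,
        .tv_nsec = (s->timeout_ms % 1000) * 1000000L,
    };

    nanosleep(&ts, NULL);
    send_event_once(s);
    return NULL;
}

/* Models the atomic commit: arm the event and start the fallback worker. */
static void flip_commit(struct flip_state *s, long timeout_ms)
{
    int rc;

    atomic_init(&s->event_pending, true);
    atomic_init(&s->events_sent, 0);
    s->timeout_ms = timeout_ms;
    rc = pthread_create(&s->watchdog, NULL, watchdog_fn, s);
    assert(rc == 0);
    (void)rc;
}

/* Models the vblank/pageflip interrupt handler. */
static bool flip_irq(struct flip_state *s)
{
    return send_event_once(s);
}

/* Waits for the worker to exit and reports how many events were sent. */
static int flip_finish(struct flip_state *s)
{
    int rc = pthread_join(s->watchdog, NULL);

    assert(rc == 0);
    (void)rc;
    return atomic_load(&s->events_sent);
}
```

Calling flip_commit(), then flip_irq(), then flip_finish() models the normal
case. Skipping flip_irq() models the missed interrupt: the worker sends the
event after the timeout, and a late flip_irq() finds nothing left to send.
The model lets the worker sleep out its timeout instead of cancelling it,
which keeps the sketch short; a driver would want to cancel it.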
