On Fri, Jan 23, 2026 at 9:52 AM Hamza Mahfooz
<[email protected]> wrote:
>
> On Fri, Jan 23, 2026 at 02:52:44PM +0100, Christian König wrote:
> > I can only see two reasons why you could run into a timeout:
> >
> > 1. A dma_fence never signals.
> >       How that should be handled is already well documented and
> >       doesn't require any of this.
> >
> > 2. A coding error in the vblank or page flip handler leading to
> >    waiting forever.
> >       In that case calling back into the driver doesn't help either.
> >
> > So as far as I can see the whole approach doesn't make any sense at all.
>
> It appears that resetting display firmware is able to put at least a
> subset of these systems back into a consistent (usable) state. Though, I
> don't have a reliable way to reproduce the issue that I'm seeing so I
> can't say for sure what it boils down to.

I'm not at all an expert on KMS, but I took a quick look at the in and
out fences in KMS, and I think I know what might be going on.  The out
fence is signalled by calling drm_crtc_send_vblank_event() from the
interrupt handler for the vblank/pageflip interrupt.  If that
interrupt gets missed somehow, drm_crtc_send_vblank_event() never gets
called and userspace will wait forever.  As a safety measure, maybe
add a worker that gets scheduled with some delay when the atomic
commit happens, and cancel it from the interrupt handler.  If the
interrupt never happens, the worker will eventually run, call
drm_crtc_send_vblank_event() itself, and get things unstuck.
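
A rough sketch of that idea, modelled in plain userspace C rather than
real kernel code.  All names here (model_crtc, flip_event, model_*)
are hypothetical and exist only for illustration.  The point it tries
to show is that the interrupt path and the fallback worker both have
to "claim" the pending event first, so whichever one runs first sends
it and the other becomes a no-op.  The model uses an atomic exchange
for the claim; in a driver this would more likely be done under the
existing event spinlock, with the fallback implemented as delayed
work, but that is an assumption and not something tested here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending flip-completion event. */
struct flip_event {
	int sent_count;        /* times "userspace" was notified */
	bool sent_by_timeout;  /* true if the fallback path sent it */
};

/* Hypothetical stand-in for per-CRTC state. */
struct model_crtc {
	/* Non-NULL while a committed flip still awaits completion. */
	_Atomic(struct flip_event *) pending;
};

void model_crtc_init(struct model_crtc *c)
{
	atomic_init(&c->pending, (struct flip_event *)NULL);
}

/*
 * Atomic commit: arm the event.  In a driver, this is roughly where
 * the fallback worker would be scheduled with a timeout.
 */
void model_commit(struct model_crtc *c, struct flip_event *ev)
{
	assert(ev != NULL);
	atomic_store(&c->pending, ev);
}

/* Take ownership of the pending event; only one caller ever gets it. */
struct flip_event *model_claim(struct model_crtc *c)
{
	return atomic_exchange(&c->pending, (struct flip_event *)NULL);
}

/*
 * Normal path: the vblank/pageflip interrupt arrived.  Returns true if
 * this call delivered the event (a driver would also cancel the
 * fallback worker here).
 */
bool model_irq(struct model_crtc *c)
{
	struct flip_event *ev = model_claim(c);

	if (!ev)
		return false;  /* already completed by the fallback */
	ev->sent_count++;
	return true;
}

/*
 * Fallback path: the worker ran because the interrupt never came.
 * Returns true if this call delivered the event.
 */
bool model_timeout_worker(struct model_crtc *c)
{
	struct flip_event *ev = model_claim(c);

	if (!ev)
		return false;  /* the interrupt got there first */
	ev->sent_count++;
	ev->sent_by_timeout = true;
	return true;
}
```

With this shape, a late interrupt that shows up after the worker has
already fired finds nothing pending and does nothing, so the event is
never sent twice.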

Alex
