On 1/28/26 10:19, Timur Kristóf wrote: > On 2026. január 26., hétfő 14:00:59 közép-európai téli idő Christian König > wrote: >> On 1/26/26 11:27, Michel Dänzer wrote: >>> On 1/26/26 11:14, Christian König wrote: >>>> On 1/23/26 15:44, Timur Kristóf wrote: >>>>> On Friday, January 23, 2026 2:52:44 PM Central European Standard Time >>>>> >>>>> Christian König wrote: >>>>>> So as far as I can see the whole approach doesn't make any sense at >>>>>> all. >>>>> >>>>> Actually this approach was proposed as a solution at XDC 2025 in Harry's >>>>> presentation, "DRM calls driver callback to attempt recovery", see page >>>>> 9 in this slide deck: >>>>> >>>>> https://indico.freedesktop.org/event/10/contributions/431/attachments/ >>>>> 267/355/2025%20XDC%20Hackfest%20Update%20v1.2.pdf >>>>> >>>>> If you disagree with Harry, please make a counter-proposal. >>>> >>>> Well I must have missed that detail otherwise I would have objected. >>>> >>>> But looking at the slide Harry actually pointed out what immediately came >>>> to my mind as well, e.g. that the Compositor needs to issue a full >>>> modeset to re-program the CRTC.> >>> In principle, the kernel driver has all the information it needs to >>> reprogram the HW by itself. Not sure why the compositor would need to be >>> actively involved. >> Well first of all I'm not sure if we can reprogram the HW even if all >> information are available. >> >> Please keep in mind that we are in a dma_fence timeout handler here with the >> usual rat tail of consequences. So no allocation of memory or taking locks >> under which memory is allocated or are part of preparing the page flip >> etc... I'm not so deep in the atomic code, so Alex, Sima and probably you >> as well can answer that much better than I do, but of hand it sounds >> questionable. >> >> On the other hand we could of course postpone reprogramming the CRTC into an >> async work item, but that might created more problems then it solves. >> >> Then second even if the kernel can do it I'm not sure if it should do it. >> >> I mean userspace asked for a quick page flip and not some expensive CRTC/PLL >> reprogramming. Stuff like that usually takes some time and by then the >> frame which should be displayed by the page flip might already be stale and >> it would be better to tell userspace that we couldn't display it on time >> and wait for a new frame to be generated. > > I agree with Michel here. It's a kernel bug, it should be solved by the > kernel. I don't like the tendency of pushing userspace to handle kernel bugs, > especially if this is just needed for one vendor's buggy driver. (No offence.)
Well I strongly disagree. The kernel is not here to serve userspace, but to give userspace access to the HW in a generalized manner. If this is caused by a HW failure then reporting back to userspace is the most reasonable thing to do. >> >> And third, there must be a root cause of the page flip not completing. >> >> My educated guess is that we have some atomic property change or even >> turning the CRTC off in parallel with the page flip. I mean HW rarely turns >> off its reoccurring vblank interrupt on its own. > > I think a page flip timeout can many root causes, so it's unlikely that a > single solution would solve all of it. That's actually what I'm not sure about. I mean what seems to happen consistently is that somehow the CRTC is turned off, who is doing that and why? As far as I can see drm_atomic_helper_wait_for_flip_done() is called by amdgpu_dm_atomic_commit_tail() to wait for the page flip to finish. So the message comes from our DM/DC code calling into the DRM code. That in turn makes the callback completely superfluous, we could just signal an error back from drm_atomic_helper_wait_for_flip_done(). Additional to that I don't see much locking here. Who is preventing multiple atomic changes from happening at the same time? Regards, Christian. > > See: > https://gitlab.freedesktop.org/drm/amd/-/issues? > label_name[]=page%20flip%20timeout > > Best regards, > Timur > > >
