On 1/26/26 14:00, Christian König wrote:
> On 1/26/26 11:27, Michel Dänzer wrote:
>> On 1/26/26 11:14, Christian König wrote:
>>> On 1/23/26 15:44, Timur Kristóf wrote:
>>>> On Friday, January 23, 2026 2:52:44 PM Central European Standard Time 
>>>> Christian König wrote:
>>>>
>>>>> So as far as I can see the whole approach doesn't make any sense at all.
>>>>
>>>> Actually this approach was proposed as a solution at XDC 2025 in Harry's 
>>>> presentation, "DRM calls driver callback to attempt recovery"; see page 9 
>>>> in this slide deck:
>>>>
>>>> https://indico.freedesktop.org/event/10/contributions/431/attachments/267/355/2025%20XDC%20Hackfest%20Update%20v1.2.pdf
>>>>
>>>> If you disagree with Harry, please make a counter-proposal.
>>>
>>> Well I must have missed that detail otherwise I would have objected.
>>>
>>> But looking at the slide, Harry actually pointed out what immediately came 
>>> to my mind as well, i.e. that the compositor needs to issue a full modeset 
>>> to re-program the CRTC.
>>
>> In principle, the kernel driver has all the information it needs to 
>> reprogram the HW by itself. Not sure why the compositor would need to be 
>> actively involved.
> 
> Well, first of all I'm not sure if we can reprogram the HW even if all 
> the information is available.
> 
> Please keep in mind that we are in a dma_fence timeout handler here, with the 
> usual rat tail of consequences: no memory allocation, and no taking locks 
> under which memory is allocated or which are part of preparing the page flip, etc. 
> I'm not so deep in the atomic code, so Alex, Sima and probably you as well 
> can answer that much better than I can, but offhand it sounds questionable.
> 
> On the other hand, we could of course postpone reprogramming the CRTC to an 
> async work item, but that might create more problems than it solves.

Seems doable offhand from a KMS UAPI PoV. The reprogramming just needs to be 
done before sending the atomic commit completion event(s) to user space.

Not sure about the DMA fence angle though. (I consider OUT_FENCE_PTR 
problematic for other reasons; in particular, using it to get a release fence 
for clients is kind of laying a trap for them. And in the compositor I see no 
benefit over completion events.)


> Then, second, even if the kernel can do it, I'm not sure it should do it.
> 
> I mean userspace asked for a quick page flip and not some expensive CRTC/PLL 
> reprogramming.

More complex atomic commits can also hang, FWIW. In fact, they might be more 
likely to hang.


> Stuff like that usually takes some time and by then the frame which should be 
> displayed by the page flip might already be stale and it would be better to 
> tell userspace that we couldn't display it on time and wait for a new frame 
> to be generated.

With my compositor developer hat on, I'd rather not spend effort generating a 
new frame if there is doubt that the kernel will actually be able to display 
it. The worst case of that would be constantly generating new frames, none of 
which are displayed.

I'd rather try again with the same frame, which boils down to an "empty" (no 
actual state changes) commit with the DRM_MODE_ATOMIC_ALLOW_MODESET flag.
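For illustration, a minimal sketch of such a retry in C against the raw kernel UAPI headers (<drm/drm.h>, <drm/drm_mode.h>), assuming `drm_fd` is an open DRM device fd held by the compositor as DRM master. The helper name `commit_empty_allow_modeset` is hypothetical. A real compositor would more likely use libdrm's drmModeAtomicAlloc() and drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL). Whether the driver actually reprograms the CRTC in response to an empty commit is up to the driver; the sketch only shows the shape of the request.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>

/* Hypothetical helper: submit an atomic commit that changes no object
 * properties, but allows the driver to perform a full modeset.
 * Returns 0 on success or a negative errno value. */
static int commit_empty_allow_modeset(int drm_fd)
{
    struct drm_set_client_cap cap = {
        .capability = DRM_CLIENT_CAP_ATOMIC,
        .value = 1,
    };
    struct drm_mode_atomic req;

    /* Atomic commits are rejected unless the atomic client cap is set;
     * setting it again is harmless if the compositor already did. */
    if (ioctl(drm_fd, DRM_IOCTL_SET_CLIENT_CAP, &cap) < 0)
        return -errno;

    /* count_objs == 0 and all array pointers NULL: no state changes.
     * No DRM_MODE_PAGE_FLIP_EVENT is requested, since the commit
     * touches no CRTC; the ioctl blocks until the commit completes. */
    memset(&req, 0, sizeof(req));
    req.flags = DRM_MODE_ATOMIC_ALLOW_MODESET;

    if (ioctl(drm_fd, DRM_IOCTL_MODE_ATOMIC, &req) < 0)
        return -errno;
    return 0;
}
```

Keeping the same framebuffers attached means the frame that failed to flip is simply shown again once the commit succeeds, rather than rendering a new one.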

Relying on user space for this can also be problematic, e.g. if user space dies 
and drops back to fbcon.


> And third, there must be a root cause of the page flip not completing.
> 
> My educated guess is that we have some atomic property change, or even turning 
> the CRTC off, in parallel with the page flip. I mean, HW rarely turns off its 
> recurring vblank interrupt on its own.
> 
> Returning an error to userspace might actually help identify the root cause.

It seems pretty clear that the hangs plaguing KWin are amdgpu DC bugs.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
