On 10/13/23 11:41, Daniel Vetter wrote:
> On Thu, Oct 12, 2023 at 02:19:41PM -0400, Ray Strode wrote:
>> On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
>>>>>> To be clear, my take is, if driver code is running in process context
>>>>>> and needs to wait for periods of time on the order of or in excess of
>>>>>> a typical process time slice it should be sleeping during the waiting.
>>>>>> If the operation is at a point where it can be cancelled without side
>>>>>> effects, the sleeping should be INTERRUPTIBLE. If it's past the point
>>>>>> of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
>>>>>> opinion, should kernel code busy block a typical process for dozens of
>>>>>> milliseconds while keeping the process RUNNING. I don't think this is
>>>>>> a controversial take.
>>>>> Exactly that's what I completely disagree on.
>>
>> Okay if we can't agree that it's not okay for user space (or the
>> kernel running in the context of user space) to busy loop a cpu core
>> at 100% utilization throughout and beyond the process's entire
>> scheduled time slice then we really are at an impasse. I gotta say I'm
>> astonished that this seemingly indefensible behavior is somehow a
>> point of contention, but I'm not going to keep arguing about it beyond
>> this email.
>>
>> I mean we're not talking about scientific computing, or code
>> compilation, or seti@home. We're talking about nearly the equivalent
>> of `while (1) __asm__ ("nop");`
>
> I don't think anyone said this shouldn't be fixed or improved.
>
> What I'm saying is that the atomic ioctl is not going to make guarantees
> that it will not take up too much cpu time (for some extremely vague value
> of "too much") to the point that userspace can configure its compositor
> in a way that it _will_ get killed if we _ever_ violate this rule.
>
> We should of course try to do as good a job as possible, but that's not
> what you're asking for. You're asking for a hard real-time guarantee with
> the implication if we ever break it, it's a regression, and the kernel has
> to bend over backwards with tricks like in your patch to make it work.

I don't think mutter really needs or wants such a hard real-time
guarantee. What it needs is a fighting chance to react before the kernel
kills its process.

The intended mechanism for this is SIGXCPU, but that can't work if the
kernel is stuck in a busy-loop. Ray's patch seems like one way to avoid
that.

That said, as long as SIGXCPU can work as intended with the non-blocking
commits mutter uses for everything except modesets, mutter's workaround
of dropping RT priority for the blocking commits seems acceptable for
the time being.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer