On Friday, May 10, 2024 9:11:13 AM EDT Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> >
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > >
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, only some moments later having the real GPU
> > > driver appear.
> > >
> > > The other is general purpose GPU hotplugging, including the unplugging
> > > of the GPU decided by the compositor to be the primary one.
> >
> > The situation of booting with simpledrm (problem 2) is a special case of
> > problem 1. From the kernel's perspective, unloading simpledrm is the same as
> > what you call general purpose GPU hotplugging. Even though there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
>
> Sure, in a way it is, but the consequence and frequency of occurrence is
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components' support for arbitrary hotplugging, the other about
> mitigating the race condition that caused this discussion to begin with.
>
> > >
> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > >
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > >
> > > When it works, i.e. the race condition is not hit, is this:
> > >
> > >  * System boots
> > >  * Plymouth shows a "splash" screen
> > >  * The login screen display server is launched with the real GPU driver
> > >  * The login screen interface is smoothly animating using hardware
> > >    acceleration, presenting "advanced" graphical content depending on
> > >    hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > >
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > >
> > >  * System boots
> > >  * Plymouth shows a "splash" screen
> > >  * The login screen display server is launched with simpledrm
> > >  * Due to using simpledrm, the login screen interface is not animated and
> > >    just plops up, and no "advanced" graphical content is enabled due to
> > >    apparent missing hardware capabilities
> > >  * The real GPU driver appears, the login screen now starts to become
> > >    animated, and may suddenly change appearance due to capabilities
> > >    having changed
> > >
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > >
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem, it's a workaround (albeit a behavior we want for other
> > > reasons) for the race condition.
> >
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> >
> > Depending on the DRM driver and its hardware, it might be possible to do
> > this without flickering. The key is to not lose the original scanout
> > buffer, while not probing the new device driver. But that needs work in each
> > individual DRM driver.
>
> This doesn't sound like it'll fix any flickering as I describe it.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
>

I feel like whatever animations a login screen has are going to be in the
realm of a fade-in, or maybe a slide, or something else on the simple side.
llvmpipe should be good enough for animations like that these days, I would
think, right? Or is it really bad on very old CPUs, like, say, a Pentium III?

> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.
>
> > >
> > > Arguably, the only place a more educated guess about whether to wait or
> > > not, and if so how long, is the kernel.
> >
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> >
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> >
> > - 'nomodeset' on the command line -> no native driver
>
> Makes sense to not wait here, and just assume simpledrm forever.
>
> > - 'systemd-load-modules' not started -> maybe wait
> > - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> >   maybe wait
>
> I suspect this is not useful for general purpose distributions. I have
> 43 kernel GPU modules there, on an F40 installation.
>
> > - maybe udev can tell you more
> > - it might help for detection that recently simpledrm devices refer to
> >   their parent PCI device
> > - maybe systemd tracks the probed devices
>
> If the kernel already plumbs enough state so userspace components can
> make a decent decision, instead of just sleeping for an arbitrary amount
> of time, then great. This is to some degree what
> https://github.com/systemd/systemd/issues/32509 is about.
>
>
> Jonas
>
> >
> > Best regards
> > Thomas
> >
> > >
> > >
> > > Jonas
> > >
> > > > The next best solution is to keep the final DRM device open until a
> > > > new one shows up. All DRM graphics drivers with hotplugging support
> > > > are required to accept commands after their hardware has been
> > > > unplugged. They simply won't display anything.
> > > >
> > > > Best regards
> > > > Thomas
> > > >
> > > >
> > > > > Thanks
> > > > >
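
For reference, a rough Python sketch of how the first two checks in the quoted list ('nomodeset' on the command line, and whether module loading has started) could be combined in userspace. Nobody in the thread proposed this exact code: the function names and the "no-wait" / "maybe-wait" / "unknown" results are hypothetical, and whether module loading has started is passed in as a plain boolean, since how best to query that is left open above.

```python
from pathlib import Path


def has_nomodeset(cmdline: str) -> bool:
    """Return True if 'nomodeset' appears as its own kernel parameter."""
    return "nomodeset" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line; return '' if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def wait_hint(cmdline: str, modules_load_started: bool) -> str:
    """Hypothetical heuristic: should a display server wait for a native driver?

    'no-wait'    -> nomodeset given, so assume simpledrm stays
    'maybe-wait' -> module loading has not started, a native driver may follow
    'unknown'    -> neither check says anything useful
    """
    if has_nomodeset(cmdline):
        return "no-wait"
    if not modules_load_started:
        return "maybe-wait"
    return "unknown"


if __name__ == "__main__":
    # modules_load_started is hard-coded here; it is an assumption, not a probe.
    print(wait_hint(read_cmdline(), modules_load_started=False))
```

This only covers the cheap checks. It gives no answer for how long to wait, which is the part argued above to be better known by the kernel.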