Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Fri, May 10, 2024 at 03:11:13PM +0200, Jonas Ådahl wrote:

> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:

> > Hi

> > > (This was discussed on #dri-devel, but I'll reiterate here as well).

> > > There are two problems at hand; one is the race condition during boot when the login screen (or whatever display server appears first) is launched with simpledrm, only some moments later having the real GPU driver appear.

> > > The other is general purpose GPU hotplugging, including the unplugging of the GPU decided by the compositor to be the primary one.

> > The situation of booting with simpledrm (problem 2) is a special case of problem 1. From the kernel's perspective, unloading simpledrm is the same as what you call general purpose GPU hotplugging, even though there is not a full GPU, but a trivial scanout buffer. In userspace, you see the same sequence of events as in the general case.

> Sure, in a way it is, but the consequence and frequency of occurrence is quite different, so I think it makes sense to think of them as different problems, since they need different solutions. One is about fixing userspace components' support for arbitrary hotplugging, the other about mitigating the race condition that caused this discussion to begin with.

We're trying to document the hotunplug consensus here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

And yes, hotunplug is really rough on userspace, but if that doesn't work, we need to discuss what should be done instead in general. I agree with Thomas that simpledrm really isn't special in that regard.

> > > The latter is something that should be handled in userspace, by compositors, etc, I agree.

> > > The former, however, is not properly solved by userspace learning how to deal with primary GPU unplugging and switching to using a real GPU driver, as it'd break the booting and login experience.
> > > When it works, i.e. the race condition is not hit, it looks like this:
> > >
> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with the real GPU driver
> > > * The login screen interface is smoothly animating using hardware acceleration, presenting "advanced" graphical content depending on hardware capabilities (e.g. high color bit depth, HDR, and so on)

> > > If the race condition is hit, with a compositor supporting primary GPU hotplugging, it'll work like this:
> > >
> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with simpledrm
> > > * Due to using simpledrm, the login screen interface is not animated and just plops up, and no "advanced" graphical content is enabled due to apparent missing hardware capabilities
> > > * The real GPU driver appears, the login screen now starts to become animated, and may suddenly change appearance due to capabilities having changed

> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll still end up with a glitchy boot experience, and it forces userspace to add things like sleep(10) to work around this.

> > > In other words, fixing userspace is *not* a correct solution to the problem, it's a workaround (albeit a behavior we want for other reasons) for the race condition.

> > To really fix the flickering, you need to read the old DRM device's atomic state and apply it to the new device. Then tell the desktop and applications to re-init their rendering stack.

> > Depending on the DRM driver and its hardware, it might be possible to do this without flickering. The key is to not lose the original scanout buffer, while not probing the new device driver. But that needs work in each individual DRM driver.
> This doesn't sound like it'll fix any flickering as I describe them. First, the loss of initial animation when the login interface appears is not something one can "fix", since it has already happened.

> Avoiding flickering when switching to the new driver is only possible if one limits oneself to what simpledrm was capable of doing, i.e. no HDR signaling etc.

As long as you use the atomic ioctls (I think at least) and the real driver has full atomic state takeover support (only i915 to my knowledge), and your userspace doesn't unnecessarily mess with the display state when it takes over a new driver, then that should lead to a flicker-free boot even across a simpledrm->real driver takeover. If your userspace doesn't crash, ofc :-) But it's a real steep ask of all components to get this right.

> > > Arguably, the only place where a more educated guess about whether to wait or not, and if so for how long, can be made is the kernel.

> > As I said before, driver modules come and go and
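The "doesn't unnecessarily mess with the display state" requirement boils down to committing only what actually differs from the state inherited from the previous driver. A hedged sketch of that rule (the dict representation and function name are made up for illustration; a real compositor would compare DRM atomic properties via the KMS API):

```python
def takeover_commit_delta(inherited_state, desired_state):
    """Return only the properties that need to change on driver takeover.

    An empty result means the inherited configuration already matches what
    the compositor wants, so no commit (and hence no modeset or flicker) is
    needed. This models the rule discussed above, not any real KMS API.
    """
    return {prop: value for prop, value in desired_state.items()
            if inherited_state.get(prop) != value}
```

For example, if simpledrm left a 1920x1080 framebuffer scanned out and the compositor wants the same mode, the delta is empty and the display is left untouched; asking for anything extra (HDR metadata, a different mode) forces a visible commit.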
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Thu, 09 May 2024, nerdopolis wrote:

> Hi
>
> So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

> For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

> Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 [1] and here https://github.com/systemd/systemd/issues/32509 [2]

> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC? That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

> Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Related [1][2].

BR, Jani.

[1] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10133
[2] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11158

> Thanks
>
> [1] https://github.com/sddm/sddm/issues/1917
> [2] https://github.com/systemd/systemd/issues/32509

--
Jani Nikula, Intel
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Friday, May 10, 2024 9:11:13 AM EDT Jonas Ådahl wrote:

> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:

> > Hi

> > > (This was discussed on #dri-devel, but I'll reiterate here as well).

> > > There are two problems at hand; one is the race condition during boot when the login screen (or whatever display server appears first) is launched with simpledrm, only some moments later having the real GPU driver appear.

> > > The other is general purpose GPU hotplugging, including the unplugging of the GPU decided by the compositor to be the primary one.

> > The situation of booting with simpledrm (problem 2) is a special case of problem 1. From the kernel's perspective, unloading simpledrm is the same as what you call general purpose GPU hotplugging, even though there is not a full GPU, but a trivial scanout buffer. In userspace, you see the same sequence of events as in the general case.

> Sure, in a way it is, but the consequence and frequency of occurrence is quite different, so I think it makes sense to think of them as different problems, since they need different solutions. One is about fixing userspace components' support for arbitrary hotplugging, the other about mitigating the race condition that caused this discussion to begin with.

> > > The latter is something that should be handled in userspace, by compositors, etc, I agree.

> > > The former, however, is not properly solved by userspace learning how to deal with primary GPU unplugging and switching to using a real GPU driver, as it'd break the booting and login experience.

> > > When it works, i.e.
the race condition is not hit, it looks like this:

> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with the real GPU driver
> > > * The login screen interface is smoothly animating using hardware acceleration, presenting "advanced" graphical content depending on hardware capabilities (e.g. high color bit depth, HDR, and so on)

> > > If the race condition is hit, with a compositor supporting primary GPU hotplugging, it'll work like this:
> > >
> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with simpledrm
> > > * Due to using simpledrm, the login screen interface is not animated and just plops up, and no "advanced" graphical content is enabled due to apparent missing hardware capabilities
> > > * The real GPU driver appears, the login screen now starts to become animated, and may suddenly change appearance due to capabilities having changed

> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll still end up with a glitchy boot experience, and it forces userspace to add things like sleep(10) to work around this.

> > > In other words, fixing userspace is *not* a correct solution to the problem, it's a workaround (albeit a behavior we want for other reasons) for the race condition.

> > To really fix the flickering, you need to read the old DRM device's atomic state and apply it to the new device. Then tell the desktop and applications to re-init their rendering stack.

> > Depending on the DRM driver and its hardware, it might be possible to do this without flickering. The key is to not lose the original scanout buffer, while not probing the new device driver. But that needs work in each individual DRM driver.

> This doesn't sound like it'll fix any flickering as I describe them.
> First, the loss of initial animation when the login interface appears is not something one can "fix", since it has already happened.

I feel like whatever animations a login screen has are going to be in the realm of a fade-in animation, or maybe a sliding animation, or one of those that are more on the simple side. llvmpipe should be good enough for animations like that these days, I would think, right? Or is it really bad on very very old CPUs, like say a Pentium III?

> Avoiding flickering when switching to the new driver is only possible if one limits oneself to what simpledrm was capable of doing, i.e. no HDR signaling etc.

> > > Arguably, the only place where a more educated guess about whether to wait or not, and if so for how long, can be made is the kernel.

> > As I said before, driver modules come and go and hardware devices come and go.

> > To detect if there might be a native driver waiting to be loaded, you can test for
> >
> > - 'nomodeset' on the command line -> no native driver

> Makes sense to not wait here, and just assume simpledrm forever.

> > - 'systemd-load-modules' not started -> maybe wait
> > - look for drivers under /lib/modules//kernel/drivers/gpu/drm/ -> maybe wait
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:

> Hi

> > (This was discussed on #dri-devel, but I'll reiterate here as well).

> > There are two problems at hand; one is the race condition during boot when the login screen (or whatever display server appears first) is launched with simpledrm, only some moments later having the real GPU driver appear.

> > The other is general purpose GPU hotplugging, including the unplugging of the GPU decided by the compositor to be the primary one.

> The situation of booting with simpledrm (problem 2) is a special case of problem 1. From the kernel's perspective, unloading simpledrm is the same as what you call general purpose GPU hotplugging, even though there is not a full GPU, but a trivial scanout buffer. In userspace, you see the same sequence of events as in the general case.

Sure, in a way it is, but the consequence and frequency of occurrence is quite different, so I think it makes sense to think of them as different problems, since they need different solutions. One is about fixing userspace components' support for arbitrary hotplugging, the other about mitigating the race condition that caused this discussion to begin with.

> > The latter is something that should be handled in userspace, by compositors, etc, I agree.

> > The former, however, is not properly solved by userspace learning how to deal with primary GPU unplugging and switching to using a real GPU driver, as it'd break the booting and login experience.

> > When it works, i.e. the race condition is not hit, it looks like this:
> >
> > * System boots
> > * Plymouth shows a "splash" screen
> > * The login screen display server is launched with the real GPU driver
> > * The login screen interface is smoothly animating using hardware acceleration, presenting "advanced" graphical content depending on hardware capabilities (e.g.
high color bit depth, HDR, and so on)

> > If the race condition is hit, with a compositor supporting primary GPU hotplugging, it'll work like this:
> >
> > * System boots
> > * Plymouth shows a "splash" screen
> > * The login screen display server is launched with simpledrm
> > * Due to using simpledrm, the login screen interface is not animated and just plops up, and no "advanced" graphical content is enabled due to apparent missing hardware capabilities
> > * The real GPU driver appears, the login screen now starts to become animated, and may suddenly change appearance due to capabilities having changed

> > Thus, by just supporting hotplugging the primary GPU in userspace, we'll still end up with a glitchy boot experience, and it forces userspace to add things like sleep(10) to work around this.

> > In other words, fixing userspace is *not* a correct solution to the problem, it's a workaround (albeit a behavior we want for other reasons) for the race condition.

> To really fix the flickering, you need to read the old DRM device's atomic state and apply it to the new device. Then tell the desktop and applications to re-init their rendering stack.

> Depending on the DRM driver and its hardware, it might be possible to do this without flickering. The key is to not lose the original scanout buffer, while not probing the new device driver. But that needs work in each individual DRM driver.

This doesn't sound like it'll fix any flickering as I describe them. First, the loss of initial animation when the login interface appears is not something one can "fix", since it has already happened.

Avoiding flickering when switching to the new driver is only possible if one limits oneself to what simpledrm was capable of doing, i.e. no HDR signaling etc.

> > Arguably, the only place where a more educated guess about whether to wait or not, and if so for how long, can be made is the kernel.
> As I said before, driver modules come and go and hardware devices come and go.

> To detect if there might be a native driver waiting to be loaded, you can test for
>
> - 'nomodeset' on the command line -> no native driver

Makes sense to not wait here, and just assume simpledrm forever.

> - 'systemd-load-modules' not started -> maybe wait
> - look for drivers under /lib/modules//kernel/drivers/gpu/drm/ -> maybe wait

I suspect this is not useful for general purpose distributions. I have 43 kernel GPU modules there, on a F40 installation.

> - maybe udev can tell you more
> - it might help detection that recently simpledrm devices refer to their parent PCI device
> - maybe systemd tracks the probed devices

If the kernel already plumbs enough state so userspace components can make a decent decision, instead of just sleeping for an arbitrary amount of time, then great. This is to some degree what https://github.com/systemd/systemd/issues/32509 is about.

Jonas

> Best regards
> Thomas
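The heuristics being discussed could be sketched roughly like this. A hedged illustration only: the function name and its inputs are made up, and as noted above the module-directory check is a weak signal on general purpose distributions:

```python
def native_driver_possible(cmdline, drm_module_names):
    """Guess whether a native DRM driver might still appear.

    cmdline: contents of /proc/cmdline.
    drm_module_names: module names found under the kernel's
    drivers/gpu/drm module directory.
    Hypothetical heuristic following the list above, not kernel policy.
    """
    # 'nomodeset' on the command line -> no native driver will ever bind,
    # so waiting is pointless: assume simpledrm forever.
    if "nomodeset" in cmdline.split():
        return False
    # Native DRM modules shipped for this kernel -> maybe worth waiting.
    # Weak signal: a distro kernel may ship dozens of them regardless of
    # the actual hardware.
    return len(drm_module_names) > 0
```

For example, `native_driver_possible("quiet splash nomodeset", ["virtio-gpu.ko"])` is False, while an empty module list also yields False even without `nomodeset`.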
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
Hi

(This was discussed on #dri-devel, but I'll reiterate here as well).

There are two problems at hand; one is the race condition during boot when the login screen (or whatever display server appears first) is launched with simpledrm, only some moments later having the real GPU driver appear.

The other is general purpose GPU hotplugging, including the unplugging of the GPU decided by the compositor to be the primary one.

The situation of booting with simpledrm (problem 2) is a special case of problem 1. From the kernel's perspective, unloading simpledrm is the same as what you call general purpose GPU hotplugging, even though there is not a full GPU, but a trivial scanout buffer. In userspace, you see the same sequence of events as in the general case.

The latter is something that should be handled in userspace, by compositors, etc, I agree.

The former, however, is not properly solved by userspace learning how to deal with primary GPU unplugging and switching to using a real GPU driver, as it'd break the booting and login experience.

When it works, i.e. the race condition is not hit, it looks like this:

* System boots
* Plymouth shows a "splash" screen
* The login screen display server is launched with the real GPU driver
* The login screen interface is smoothly animating using hardware acceleration, presenting "advanced" graphical content depending on hardware capabilities (e.g.
high color bit depth, HDR, and so on)

If the race condition is hit, with a compositor supporting primary GPU hotplugging, it'll work like this:

* System boots
* Plymouth shows a "splash" screen
* The login screen display server is launched with simpledrm
* Due to using simpledrm, the login screen interface is not animated and just plops up, and no "advanced" graphical content is enabled due to apparent missing hardware capabilities
* The real GPU driver appears, the login screen now starts to become animated, and may suddenly change appearance due to capabilities having changed

Thus, by just supporting hotplugging the primary GPU in userspace, we'll still end up with a glitchy boot experience, and it forces userspace to add things like sleep(10) to work around this.

In other words, fixing userspace is *not* a correct solution to the problem, it's a workaround (albeit a behavior we want for other reasons) for the race condition.

To really fix the flickering, you need to read the old DRM device's atomic state and apply it to the new device. Then tell the desktop and applications to re-init their rendering stack.

Depending on the DRM driver and its hardware, it might be possible to do this without flickering. The key is to not lose the original scanout buffer, while not probing the new device driver. But that needs work in each individual DRM driver.

Arguably, the only place where a more educated guess about whether to wait or not, and if so for how long, can be made is the kernel.

As I said before, driver modules come and go and hardware devices come and go.
To detect if there might be a native driver waiting to be loaded, you can test for

- 'nomodeset' on the command line -> no native driver
- 'systemd-load-modules' not started -> maybe wait
- look for drivers under /lib/modules//kernel/drivers/gpu/drm/ -> maybe wait
- maybe udev can tell you more
- it might help detection that recently simpledrm devices refer to their parent PCI device
- maybe systemd tracks the probed devices

Best regards
Thomas

Jonas

The next best solution is to keep the final DRM device open until a new one shows up. All DRM graphics drivers with hotplugging support are required to accept commands after their hardware has been unplugged. They simply won't display anything.

Best regards
Thomas

Thanks

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Fri, May 10, 2024 at 09:32:02AM +0200, Thomas Zimmermann wrote:

> Hi
>
> Am 09.05.24 um 15:06 schrieb nerdopolis:

> > Hi
> >
> > So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

> > For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

> > Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 and here https://github.com/systemd/systemd/issues/32509

> > My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC?

> To my knowledge, there's no hotplugging for CRTCs.

> > That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

> > Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

> Userspace needs to be prepared that graphics devices can do hotplugging. The correct solution is to make compositors work without graphics devices.

(This was discussed on #dri-devel, but I'll reiterate here as well).
There are two problems at hand; one is the race condition during boot when the login screen (or whatever display server appears first) is launched with simpledrm, only some moments later having the real GPU driver appear.

The other is general purpose GPU hotplugging, including the unplugging of the GPU decided by the compositor to be the primary one.

The latter is something that should be handled in userspace, by compositors, etc, I agree.

The former, however, is not properly solved by userspace learning how to deal with primary GPU unplugging and switching to using a real GPU driver, as it'd break the booting and login experience.

When it works, i.e. the race condition is not hit, it looks like this:

* System boots
* Plymouth shows a "splash" screen
* The login screen display server is launched with the real GPU driver
* The login screen interface is smoothly animating using hardware acceleration, presenting "advanced" graphical content depending on hardware capabilities (e.g. high color bit depth, HDR, and so on)

If the race condition is hit, with a compositor supporting primary GPU hotplugging, it'll work like this:

* System boots
* Plymouth shows a "splash" screen
* The login screen display server is launched with simpledrm
* Due to using simpledrm, the login screen interface is not animated and just plops up, and no "advanced" graphical content is enabled due to apparent missing hardware capabilities
* The real GPU driver appears, the login screen now starts to become animated, and may suddenly change appearance due to capabilities having changed

Thus, by just supporting hotplugging the primary GPU in userspace, we'll still end up with a glitchy boot experience, and it forces userspace to add things like sleep(10) to work around this.

In other words, fixing userspace is *not* a correct solution to the problem, it's a workaround (albeit a behavior we want for other reasons) for the race condition.
Arguably, the only place where a more educated guess about whether to wait or not, and if so for how long, can be made is the kernel.

Jonas

> The next best solution is to keep the final DRM device open until a new one shows up. All DRM graphics drivers with hotplugging support are required to accept commands after their hardware has been unplugged. They simply won't display anything.

> Best regards
> Thomas

> > Thanks

> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
nerdopolis writes:

Hello,

> Hi
>
> So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

Plymouth also had this issue, and that is the reason why simpledrm is not treated as a KMS device by default (unless plymouth.use-simpledrm is used).

> For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

> Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 [1] and here https://github.com/systemd/systemd/issues/32509 [2]

> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC? That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

> Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

How it works is that when a native DRM driver is probed, it calls drm_aperture_remove_conflicting_framebuffers() to kick out the generic system framebuffer video drivers, and the aperture infrastructure does a device (e.g: "simple-framebuffer", "efi-framebuffer", etc) unregistration.
So it is not only that the /dev/dri/card0 devnode is unregistered, but that the underlying platform device bound to the simpledrm/efifb/vesafb/simplefb drivers is unregistered as well, and this leads to the drivers being unregistered too by the Linux device model infrastructure.

But also, these seem to be user-space bugs to me, and doing anything in the kernel is papering over the real problem IMO.

--
Best regards,
Javier Martinez Canillas
Core Platforms
Red Hat
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
Hi

Am 09.05.24 um 15:06 schrieb nerdopolis:

Hi

So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 and here https://github.com/systemd/systemd/issues/32509

My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC?

To my knowledge, there's no hotplugging for CRTCs.

That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Userspace needs to be prepared that graphics devices can do hotplugging. The correct solution is to make compositors work without graphics devices.

The next best solution is to keep the final DRM device open until a new one shows up. All DRM graphics drivers with hotplugging support are required to accept commands after their hardware has been unplugged. They simply won't display anything.
Best regards
Thomas

Thanks

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
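As an aside, the "keep the final DRM device open" advice works because on Linux an open file descriptor remains valid after its directory entry disappears. A small illustration with a plain file standing in for /dev/dri/card0 (this only demonstrates fd lifetime, not DRM driver behavior):

```python
import os
import tempfile

def open_survives_unlink():
    """Show that I/O on an fd keeps working after the node is removed,
    analogous to a display server holding a vanished DRM node open."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "card0")    # stand-in for /dev/dri/card0
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.unlink(path)                    # "unplug": the devnode goes away
    os.write(fd, b"ioctl")             # the open handle still accepts I/O
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 5)
    os.close(fd)
    os.rmdir(d)
    return data
```

A hot-unplug-aware DRM driver behaves analogously from userspace's point of view: ioctls on the open device succeed (they just no longer reach a display), so the process doesn't crash on a stale handle.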
Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
On Thu, 09 May 2024 09:06:29 -0400 nerdopolis wrote:

> Hi
>
> So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

> For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

> Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 [1] and here https://github.com/systemd/systemd/issues/32509 [2]

> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC? That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

> Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Hi,

what you describe sounds similar to what has been agreed that drivers should implement:

https://docs.kernel.org/gpu/drm-uapi.html#device-hot-unplug

That would be the first step. Then display servers would need fixing to handle the hot-unplug. Then they would need to handle hot-plug of the new DRM devices and ideally migrate to GPU accelerated compositing in order to support GPU accelerated applications.
Simpledrm is not a GPU driver, and I assume that in the case you describe, the GPU driver comes up later, just like the hardware-specific display driver. Any userspace that initialized with simpledrm will be using software rendering. Ideally, if a hardware rendering GPU driver turns up later and is usable with the displays, userspace would migrate to that. Essentially this is a display/GPU device switch.

In general that's a big problem, needing all applications to be able to handle a GPU disappearing and another GPU appearing, and not die in between. For the simpledrm case it is easier, because the migration is from no GPU to a maybe GPU. So applications like Wayland clients could stay alive as-is, they just don't use a GPU until they restart.

The problem is making display servers handle this switch of display devices and a GPU hotplug. Theoretically I believe it is doable. E.g. Weston used to be able to migrate from pixman-renderer to GL-renderer, but I suspect it is lacking support for hot-unplug of the "main" DRM display device.

Thanks,
pq

> Thanks
>
> [1] https://github.com/sddm/sddm/issues/1917
> [2] https://github.com/systemd/systemd/issues/32509
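A display server prepared for this switch has to track DRM nodes coming and going. Real implementations listen for udev events rather than rescanning; as a simplified, hypothetical illustration, the bookkeeping reduces to a set difference between two scans (function names are made up):

```python
import os

def scan_drm_nodes(dev_dir="/dev/dri"):
    """Return the set of cardN nodes currently present (empty if none)."""
    try:
        return {name for name in os.listdir(dev_dir)
                if name.startswith("card")}
    except FileNotFoundError:
        return set()

def classify_change(before, after):
    """Split two scans into (hotplugged, unplugged) node sets."""
    return after - before, before - after
```

For instance, `classify_change({"card0"}, {"card1"})` yields `({"card1"}, {"card0"})`: the simpledrm node vanished and the native driver's node appeared, which is exactly the unplug-then-plug sequence the compositor must survive.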
simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
Hi

So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition where display servers/greeters start using the simpledrm device and then experience problems once the real driver loads and the simpledrm device that the display servers are using as their primary GPU goes away.

For example Weston crashes, Xorg crashes, wlroots seems to stay running but doesn't draw anything on the screen, and kwin aborts. This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu.

Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917 [1] and here https://github.com/systemd/systemd/issues/32509 [2]

My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC? That way, in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device (and then maybe only remove itself once the final handle to it closes?)

Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Thanks

[1] https://github.com/sddm/sddm/issues/1917
[2] https://github.com/systemd/systemd/issues/32509