Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-21 Thread Daniel Vetter
On Fri, May 10, 2024 at 03:11:13PM +0200, Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > > 
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, only some moments later having the real GPU
> > > driver appear.
> > > 
> > > The other is general purpose GPU hotplugging, including unplugging
> > > the GPU that the compositor decided to be the primary one.
> > 
> > The situation of booting with simpledrm (problem 2) is a special case of
> > problem 1. From the kernel's perspective, unloading simpledrm is the same as
> > what you call general purpose GPU hotplugging. Even though there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
> 
> Sure, in a way it is, but the consequence and frequency of occurrence are
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components' support for arbitrary hotplugging, the other about
> mitigating the race condition that caused this discussion to begin with.

We're trying to document the hotunplug consensus here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

And yes, hotunplug is really rough on userspace, but if that doesn't work,
we need to discuss in general what should be done instead. I agree with
Thomas that simpledrm really isn't special in that regard.

> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > > 
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > > 
> > > When it works, i.e. the race condition is not hit, it looks like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with the real GPU driver
> > >   * The login screen interface is smoothly animating using hardware
> > > acceleration, presenting "advanced" graphical content depending on
> > > hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > > 
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with simpledrm
> > >   * Due to using simpledrm, the login screen interface is not animated and
> > > just plops up, and no "advanced" graphical content is enabled due to
> > > apparent missing hardware capabilities
> > >   * The real GPU driver appears, the login screen now starts to become
> > > animated, and may suddenly change appearance due to capabilities
> > > having changed
> > > 
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > > 
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem, it's a workaround (albeit a behavior we want for other
> > > reasons) for the race condition.
> > 
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> > 
> > Depending on the DRM driver and its hardware, it might be possible to do
> > this without flickering. The key is to not lose the original scanout
> > buffer, while not probing the new device driver. But that needs work in each
> > individual DRM driver.
> 
> This doesn't sound like it'll fix any flickering as I described it.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
> 
> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.

As long as you use the atomic ioctls (I think at least), the real
driver has full atomic state takeover support (only i915 to my knowledge),
and your userspace doesn't unnecessarily mess with the display state when
it takes over a new driver, then that should lead to flicker-free boot
even across a simpledrm->real driver takeover.

If your userspace doesn't crash ofc :-)

But it's a real steep ask of all components to get this right.

> > > Arguably, the only place where a more educated guess can be made about
> > > whether to wait or not, and if so how long, is the kernel.
> > 
> > As I said before, driver modules come and go and 

Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-21 Thread Jani Nikula
On Thu, 09 May 2024, nerdopolis  wrote:
> Hi
>
> So I have been made aware of an apparent race condition: some drivers take
> a bit longer to load, which can lead to display servers/greeters starting
> on the simpledrm device and then experiencing problems once the real
> driver loads and the simpledrm device that the display servers are using
> as their primary GPU goes away.
>
> For example, Weston crashes, Xorg crashes, wlroots seems to stay running
> but doesn't draw anything on the screen, and kwin aborts.
> This happens if you boot a QEMU machine with the virtio card, with
> modprobe.blacklist=virtio_gpu, and then, when the display server is running,
> run sudo modprobe virtio-gpu
>
> Namely, it's been recently reported here:
> https://github.com/sddm/sddm/issues/1917 and here
> https://github.com/systemd/systemd/issues/32509
>
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the 
> real driver loads, is it possible for simpledrm to instead simulate an unplug 
> of the fake display/CRTC?
> That way, in theory, the simpledrm device will be useless for drawing to the 
> screen at that point, since the real driver has taken over, but at least the 
> display server doesn't lose its handles to the /dev/dri/card0 device (and 
> then maybe simpledrm only removes itself once the final handle to it closes?)
>
> Is something like this possible to do with the way simpledrm works with the 
> low level video memory? Or is this not possible?

Related [1][2].

BR,
Jani.


[1] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10133
[2] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11158


>
> Thanks
>
> 

-- 
Jani Nikula, Intel


Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-19 Thread nerdopolis
On Friday, May 10, 2024 9:11:13 AM EDT Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > > 
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, only some moments later having the real GPU
> > > driver appear.
> > > 
> > > The other is general purpose GPU hotplugging, including unplugging
> > > the GPU that the compositor decided to be the primary one.
> > 
> > The situation of booting with simpledrm (problem 2) is a special case of
> > problem 1. From the kernel's perspective, unloading simpledrm is the same as
> > what you call general purpose GPU hotplugging. Even though there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
> 
> Sure, in a way it is, but the consequence and frequency of occurrence are
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components' support for arbitrary hotplugging, the other about
> mitigating the race condition that caused this discussion to begin with.
> 
> > 
> > > 
> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > > 
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > > 
> > > When it works, i.e. the race condition is not hit, it looks like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with the real GPU driver
> > >   * The login screen interface is smoothly animating using hardware
> > > acceleration, presenting "advanced" graphical content depending on
> > > hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > > 
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with simpledrm
> > >   * Due to using simpledrm, the login screen interface is not animated and
> > > just plops up, and no "advanced" graphical content is enabled due to
> > > apparent missing hardware capabilities
> > >   * The real GPU driver appears, the login screen now starts to become
> > > animated, and may suddenly change appearance due to capabilities
> > > having changed
> > > 
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > > 
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem, it's a workaround (albeit a behavior we want for other
> > > reasons) for the race condition.
> > 
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> > 
> > Depending on the DRM driver and its hardware, it might be possible to do
> > this without flickering. The key is to not lose the original scanout
> > buffer, while not probing the new device driver. But that needs work in each
> > individual DRM driver.
> 
> This doesn't sound like it'll fix any flickering as I described it.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
> 
I feel like whatever animations a login screen has are going to be in the 
realm of a fade-in or sliding animation, or something similarly simple.

llvmpipe should be good enough for animations like that these days, I would 
think, right? Or is it really bad on very, very old CPUs, like, say, a Pentium III?
> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.
> 
> > 
> > > 
> > > Arguably, the only place where a more educated guess can be made about
> > > whether to wait or not, and if so how long, is the kernel.
> > 
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> > 
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> > 
> > - 'nomodeset' on the command line -> no native driver
> 
> Makes sense to not wait here, and just assume simpledrm forever.
> 
> > - 'systemd-load-modules' not started -> maybe wait
> > - look for drivers under /lib/modules//kernel/drivers/gpu/drm/ ->
> > 

Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Jonas Ådahl
On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > 
> > There are two problems at hand; one is the race condition during boot
> > when the login screen (or whatever display server appears first) is
> > launched with simpledrm, only some moments later having the real GPU
> > driver appear.
> > 
> > The other is general purpose GPU hotplugging, including unplugging
> > the GPU that the compositor decided to be the primary one.
> 
> The situation of booting with simpledrm (problem 2) is a special case of
> problem 1. From the kernel's perspective, unloading simpledrm is the same as
> what you call general purpose GPU hotplugging. Even though there is not a
> full GPU, but a trivial scanout buffer. In userspace, you see the same
> sequence of events as in the general case.

Sure, in a way it is, but the consequence and frequency of occurrence are
quite different, so I think it makes sense to think of them as different
problems, since they need different solutions. One is about fixing
userspace components' support for arbitrary hotplugging, the other about
mitigating the race condition that caused this discussion to begin with.

> 
> > 
> > The latter is something that should be handled in userspace, by
> > compositors, etc, I agree.
> > 
> > The former, however, is not properly solved by userspace learning how to
> > deal with primary GPU unplugging and switching to using a real GPU
> > driver, as it'd break the booting and login experience.
> > 
> > When it works, i.e. the race condition is not hit, it looks like this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with the real GPU driver
> >   * The login screen interface is smoothly animating using hardware
> > acceleration, presenting "advanced" graphical content depending on
> > hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > 
> > If the race condition is hit, with a compositor supporting primary GPU
> > hotplugging, it'll work like this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with simpledrm
> >   * Due to using simpledrm, the login screen interface is not animated and
> > just plops up, and no "advanced" graphical content is enabled due to
> > apparent missing hardware capabilities
> >   * The real GPU driver appears, the login screen now starts to become
> > animated, and may suddenly change appearance due to capabilities
> > having changed
> > 
> > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > still end up with a glitchy boot experience, and it forces userspace to
> > add things like sleep(10) to work around this.
> > 
> > In other words, fixing userspace is *not* a correct solution to the
> > problem, it's a workaround (albeit a behavior we want for other
> > reasons) for the race condition.
> 
> To really fix the flickering, you need to read the old DRM device's atomic
> state and apply it to the new device. Then tell the desktop and applications
> to re-init their rendering stack.
> 
> Depending on the DRM driver and its hardware, it might be possible to do
> this without flickering. The key is to not lose the original scanout
> buffer, while not probing the new device driver. But that needs work in each
> individual DRM driver.

This doesn't sound like it'll fix any flickering as I described it.
First, the loss of initial animation when the login interface appears is
not something one can "fix", since it has already happened.

Avoiding flickering when switching to the new driver is only possible
if one limits oneself to what simpledrm was capable of doing, i.e. no
HDR signaling etc.

> 
> > 
> > Arguably, the only place where a more educated guess can be made about
> > whether to wait or not, and if so how long, is the kernel.
> 
> As I said before, driver modules come and go and hardware devices come and
> go.
> 
> To detect if there might be a native driver waiting to be loaded, you can
> test for
> 
> - 'nomodeset' on the command line -> no native driver

Makes sense to not wait here, and just assume simpledrm forever.

> - 'systemd-load-modules' not started -> maybe wait
> - look for drivers under /lib/modules//kernel/drivers/gpu/drm/ ->
> maybe wait

I suspect this is not useful for general purpose distributions. I have
43 kernel GPU modules there, on a F40 installation.

> - maybe udev can tell you more
> - it might help detection that simpledrm devices recently refer to their
> parent PCI device
> - maybe systemd tracks the probed devices

If the kernel already plumbs enough state so userspace components can
make a decent decision, instead of just sleeping for an arbitrary amount
of time, then great. This is to some degree what
https://github.com/systemd/systemd/issues/32509 is about.
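For illustration, the kind of decision logic being discussed here, i.e. whether userspace should keep waiting for a native driver or settle for simpledrm, could be sketched roughly as below. This is only a sketch of the heuristics Thomas listed; the function names are made up, and a real implementation would listen for udev events rather than inspect paths:

```python
import os
import platform


def nomodeset_on_cmdline(cmdline: str) -> bool:
    # 'nomodeset' on the kernel command line means no native DRM driver
    # will ever load, so there is no point in waiting.
    return "nomodeset" in cmdline.split()


def native_drm_modules_present(moddir: str) -> bool:
    # A populated .../kernel/drivers/gpu/drm/ directory suggests a native
    # driver may still show up. As noted above, this is a weak signal on
    # general purpose distributions, which ship dozens of GPU modules.
    return os.path.isdir(moddir) and any(os.scandir(moddir))


def should_wait_for_native_driver() -> bool:
    with open("/proc/cmdline") as f:
        if nomodeset_on_cmdline(f.read()):
            return False  # simpledrm is all we will ever get
    moddir = f"/lib/modules/{platform.release()}/kernel/drivers/gpu/drm"
    return native_drm_modules_present(moddir)
```

Even then, "modules exist" does not mean "a driver will bind to this hardware", which is why plumbing better state from the kernel would help.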


Jonas

> 
> Best regards
> Thomas
> 
> > 
> > 
> > 

Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Thomas Zimmermann

Hi


(This was discussed on #dri-devel, but I'll reiterate here as well).

There are two problems at hand; one is the race condition during boot
when the login screen (or whatever display server appears first) is
launched with simpledrm, only some moments later having the real GPU
driver appear.

The other is general purpose GPU hotplugging, including unplugging
the GPU that the compositor decided to be the primary one.


The situation of booting with simpledrm (problem 2) is a special case of 
problem 1. From the kernel's perspective, unloading simpledrm is the 
same as what you call general purpose GPU hotplugging. Even though 
there is not a full GPU, but a trivial scanout buffer. In userspace, you 
see the same sequence of events as in the general case.




The latter is something that should be handled in userspace, by
compositors, etc, I agree.

The former, however, is not properly solved by userspace learning how to
deal with primary GPU unplugging and switching to using a real GPU
driver, as it'd break the booting and login experience.

When it works, i.e. the race condition is not hit, it looks like this:

  * System boots
  * Plymouth shows a "splash" screen
  * The login screen display server is launched with the real GPU driver
  * The login screen interface is smoothly animating using hardware
acceleration, presenting "advanced" graphical content depending on
hardware capabilities (e.g. high color bit depth, HDR, and so on)

If the race condition is hit, with a compositor supporting primary GPU
hotplugging, it'll work like this:

  * System boots
  * Plymouth shows a "splash" screen
  * The login screen display server is launched with simpledrm
  * Due to using simpledrm, the login screen interface is not animated and
just plops up, and no "advanced" graphical content is enabled due to
apparent missing hardware capabilities
  * The real GPU driver appears, the login screen now starts to become
animated, and may suddenly change appearance due to capabilities
having changed

Thus, by just supporting hotplugging the primary GPU in userspace, we'll
still end up with a glitchy boot experience, and it forces userspace to
add things like sleep(10) to work around this.

In other words, fixing userspace is *not* a correct solution to the
problem, it's a workaround (albeit a behavior we want for other
reasons) for the race condition.


To really fix the flickering, you need to read the old DRM device's 
atomic state and apply it to the new device. Then tell the desktop and 
applications to re-init their rendering stack.


Depending on the DRM driver and its hardware, it might be possible to do 
this without flickering. The key is to not lose the original scanout 
buffer, while not probing the new device driver. But that needs work in 
each individual DRM driver.




Arguably, the only place where a more educated guess can be made about
whether to wait or not, and if so how long, is the kernel.


As I said before, driver modules come and go and hardware devices come 
and go.


To detect if there might be a native driver waiting to be loaded, you 
can test for


- 'nomodeset' on the command line -> no native driver
- 'systemd-load-modules' not started -> maybe wait
- look for drivers under /lib/modules//kernel/drivers/gpu/drm/ 
-> maybe wait

- maybe udev can tell you more
- it might help detection that simpledrm devices recently refer to 
their parent PCI device

- maybe systemd tracks the probed devices
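The parent-device hint in the list above is visible from plain sysfs. A minimal sketch (the function name is made up; the path layout assumed is the usual /sys/class/drm one):

```python
import os


def resolve_parent_device(device_link):
    # Follow a DRM card's 'device' symlink, e.g. /sys/class/drm/card0/device.
    # For recent simpledrm devices this points at the parent (e.g. PCI)
    # device, which hints at which native driver to wait for. Returns None
    # if the link does not exist.
    if not os.path.islink(device_link):
        return None
    return os.path.realpath(device_link)
```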

Best regards
Thomas




Jonas


The next best solution is to keep the final DRM device open until a new one
shows up. All DRM graphics drivers with hotplugging support are required to
accept commands after their hardware has been unplugged. They simply won't
display anything.
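That "keep the device open until a new one shows up" approach might look roughly like the following sketch. The helper is hypothetical, polling is used only to keep the example self-contained; a real compositor would react to udev "add" events:

```python
import os
import time


def wait_for_new_card(dri_dir, known, timeout=10.0, interval=0.2):
    # Poll dri_dir for a card node we have not seen before. Meanwhile the
    # old (unplugged) device fd stays open: DRM drivers with hotplug
    # support keep accepting commands on it, they just no longer display
    # anything.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for entry in os.listdir(dri_dir):
            if entry.startswith("card") and entry not in known:
                return os.path.join(dri_dir, entry)
        time.sleep(interval)
    return None
```

Only after the new node has been opened and the display state migrated would the old fd be closed.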

Best regards
Thomas



Thanks


--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)






Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Jonas Ådahl
On Fri, May 10, 2024 at 09:32:02AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 09.05.24 um 15:06 schrieb nerdopolis:
> > 
> > Hi
> > 
> > 
> > So I have been made aware of an apparent race condition: some drivers
> > take a bit longer to load, which can lead to display servers/greeters
> > starting on the simpledrm device and then experiencing problems once
> > the real driver loads and the simpledrm device that the display
> > servers are using as their primary GPU goes away.
> > 
> > 
> > For example, Weston crashes, Xorg crashes, wlroots seems to stay running
> > but doesn't draw anything on the screen, and kwin aborts.
> > 
> > This happens if you boot a QEMU machine with the virtio card, with
> > modprobe.blacklist=virtio_gpu, and then, when the display server is
> > running, run sudo modprobe virtio-gpu
> > 
> > 
> > Namely, it's been recently reported here:
> > https://github.com/sddm/sddm/issues/1917 and here
> > https://github.com/systemd/systemd/issues/32509
> > 
> > 
> > My thinking: Instead of simpledrm's /dev/dri/card0 device going away
> > when the real driver loads, is it possible for simpledrm to instead
> > simulate an unplug of the fake display/CRTC?
> > 
> 
> To my knowledge, there's no hotplugging for CRTCs.
> 
> > That way, in theory, the simpledrm device will be useless for drawing
> > to the screen at that point, since the real driver has taken over, but
> > at least the display server doesn't lose its handles to the
> > /dev/dri/card0 device (and then maybe simpledrm only removes itself
> > once the final handle to it closes?)
> > 
> > 
> > Is something like this possible to do with the way simpledrm works with
> > the low level video memory? Or is this not possible?
> > 
> 
> Userspace needs to be prepared for graphics devices to hotplug. The
> correct solution is to make compositors work without graphics devices.

(This was discussed on #dri-devel, but I'll reiterate here as well).

There are two problems at hand; one is the race condition during boot
when the login screen (or whatever display server appears first) is
launched with simpledrm, only some moments later having the real GPU
driver appear.

The other is general purpose GPU hotplugging, including unplugging
the GPU that the compositor decided to be the primary one.

The latter is something that should be handled in userspace, by
compositors, etc, I agree.

The former, however, is not properly solved by userspace learning how to
deal with primary GPU unplugging and switching to using a real GPU
driver, as it'd break the booting and login experience.

When it works, i.e. the race condition is not hit, it looks like this:

 * System boots
 * Plymouth shows a "splash" screen
 * The login screen display server is launched with the real GPU driver
 * The login screen interface is smoothly animating using hardware
   acceleration, presenting "advanced" graphical content depending on
   hardware capabilities (e.g. high color bit depth, HDR, and so on)

If the race condition is hit, with a compositor supporting primary GPU
hotplugging, it'll work like this:

 * System boots
 * Plymouth shows a "splash" screen
 * The login screen display server is launched with simpledrm
 * Due to using simpledrm, the login screen interface is not animated and
   just plops up, and no "advanced" graphical content is enabled due to
   apparent missing hardware capabilities
 * The real GPU driver appears, the login screen now starts to become
   animated, and may suddenly change appearance due to capabilities
   having changed

Thus, by just supporting hotplugging the primary GPU in userspace, we'll
still end up with a glitchy boot experience, and it forces userspace to
add things like sleep(10) to work around this.

In other words, fixing userspace is *not* a correct solution to the
problem, it's a workaround (albeit a behavior we want for other
reasons) for the race condition.

Arguably, the only place where a more educated guess can be made about
whether to wait or not, and if so how long, is the kernel.


Jonas

> 
> The next best solution is to keep the final DRM device open until a new one
> shows up. All DRM graphics drivers with hotplugging support are required to
> accept commands after their hardware has been unplugged. They simply won't
> display anything.
> 
> Best regards
> Thomas
> 
> 
> > 
> > Thanks
> > 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 


Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Javier Martinez Canillas
nerdopolis  writes:

Hello,

> Hi
>
> So I have been made aware of an apparent race condition: some drivers take
> a bit longer to load, which can lead to display servers/greeters starting
> on the simpledrm device and then experiencing problems once the real
> driver loads and the simpledrm device that the display servers are using
> as their primary GPU goes away.
>

Plymouth also had this issue, and that is the reason why simpledrm is not
treated as a KMS device by default (unless plymouth.use-simpledrm is used).

> For example, Weston crashes, Xorg crashes, wlroots seems to stay running
> but doesn't draw anything on the screen, and kwin aborts.
> This happens if you boot a QEMU machine with the virtio card, with
> modprobe.blacklist=virtio_gpu, and then, when the display server is running,
> run sudo modprobe virtio-gpu
>
> Namely, it's been recently reported here:
> https://github.com/sddm/sddm/issues/1917 and here
> https://github.com/systemd/systemd/issues/32509
>
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the 
> real driver loads, is it possible for simpledrm to instead simulate an unplug 
> of the fake display/CRTC?
> That way, in theory, the simpledrm device will be useless for drawing to the 
> screen at that point, since the real driver has taken over, but at least the 
> display server doesn't lose its handles to the /dev/dri/card0 device (and 
> then maybe simpledrm only removes itself once the final handle to it closes?)
>
> Is something like this possible to do with the way simpledrm works with the 
> low level video memory? Or is this not possible?
>

How it works is that when a native DRM driver is probed, it calls
drm_aperture_remove_conflicting_framebuffers() to kick out the generic
system framebuffer video drivers, and the aperture infrastructure
unregisters the device (e.g. "simple-framebuffer", "efi-framebuffer", etc.).

So it's not only that the /dev/dri/card0 devnode is unregistered, but that
the underlying platform device bound to the simpledrm/efifb/vesafb/simplefb
drivers is unregistered, and this leads to the drivers being unbound as
well by the Linux device model infrastructure.

But also, these seem to be user-space bugs to me, and doing anything in
the kernel is papering over the real problem IMO.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Thomas Zimmermann

Hi

Am 09.05.24 um 15:06 schrieb nerdopolis:


Hi


So I have been made aware of an apparent race condition: some drivers 
take a bit longer to load, which can lead to display servers/greeters 
starting on the simpledrm device and then experiencing problems once 
the real driver loads and the simpledrm device that the display 
servers are using as their primary GPU goes away.



For example, Weston crashes, Xorg crashes, wlroots seems to stay 
running but doesn't draw anything on the screen, and kwin aborts.


This happens if you boot a QEMU machine with the virtio card, with 
modprobe.blacklist=virtio_gpu, and then, when the display server is 
running, run sudo modprobe virtio-gpu



Namely, it's been recently reported here: 
https://github.com/sddm/sddm/issues/1917 and here 
https://github.com/systemd/systemd/issues/32509



My thinking: Instead of simpledrm's /dev/dri/card0 device going away 
when the real driver loads, is it possible for simpledrm to instead 
simulate an unplug of the fake display/CRTC?




To my knowledge, there's no hotplugging for CRTCs.

That way, in theory, the simpledrm device will be useless for drawing 
to the screen at that point, since the real driver has taken over, but 
at least the display server doesn't lose its handles to the 
/dev/dri/card0 device (and then maybe simpledrm only removes itself 
once the final handle to it closes?)



Is something like this possible to do with the way simpledrm works 
with the low level video memory? Or is this not possible?




Userspace needs to be prepared for graphics devices to hotplug. 
The correct solution is to make compositors work without graphics devices.


The next best solution is to keep the final DRM device open until a new 
one shows up. All DRM graphics drivers with hotplugging support are 
required to accept commands after their hardware has been unplugged. 
They simply won't display anything.


Best regards
Thomas




Thanks



--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-10 Thread Pekka Paalanen
On Thu, 09 May 2024 09:06:29 -0400
nerdopolis  wrote:

> Hi
> 
> So I have been made aware of an apparent race condition: some
> drivers take a bit longer to load, which can lead to display
> servers/greeters starting on the simpledrm device and then
> experiencing problems once the real driver loads and the simpledrm
> device that the display servers are using as their primary GPU goes
> away.
> 
> For example, Weston crashes, Xorg crashes, wlroots seems to stay
> running but doesn't draw anything on the screen, and kwin aborts. This
> happens if you boot a QEMU machine with the virtio card, with
> modprobe.blacklist=virtio_gpu, and then, when the display server is
> running, run sudo modprobe virtio-gpu
> 
> Namely, it's been recently reported here:
> https://github.com/sddm/sddm/issues/1917[1] and here
> https://github.com/systemd/systemd/issues/32509[2]
> 
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away
> when the real driver loads, is it possible for simpledrm to instead
> simulate an unplug of the fake display/CRTC? That way in theory, the
> simpledrm device will now be useless for drawing to the screen at
> that point, since the real driver has now taken over, but this way
> here, at least the display server doesn't lose its handles to the
> /dev/dri/card0 device, (and then maybe only remove itself once the
> final handle to it closes?)
> 
> Is something like this possible to do with the way simpledrm works
> with the low level video memory? Or is this not possible?

Hi,

what you describe sounds similar to what has been agreed that drivers
should implement:
https://docs.kernel.org/gpu/drm-uapi.html#device-hot-unplug

That would be the first step. Then display servers would need fixing to
handle the hot-unplug. Then they would need to handle hot-plug of the
new DRM devices and ideally migrate to GPU accelerated compositing in
order to support GPU accelerated applications.

Simpledrm is not a GPU driver, and I assume that in the case you
describe, the GPU driver comes up later, just like the
hardware-specific display driver. Any userspace that initialized with
simpledrm will be using software rendering. Ideally if a hardware
rendering GPU driver turns up later and is usable with the displays,
userspace would migrate to that.

Essentially this is a display/GPU device switch. In general that's a
big problem, needing all applications to be able to handle a GPU
disappearing and another GPU appearing, and not die in between. For
the simpledrm case it is easier, because the migration is from no GPU
to a maybe GPU. So applications like Wayland clients could stay alive
as-is, they just don't use a GPU until they restart.

The problem is making display servers handle this switch of display
devices and a GPU hotplug. Theoretically I believe it is doable. E.g.
Weston used to be able to migrate from pixman-renderer to GL-renderer,
but I suspect it is lacking support for hot-unplug of the "main" DRM
display device.


Thanks,
pq

> Thanks
> 
> 
> [1] https://github.com/sddm/sddm/issues/1917
> [2] https://github.com/systemd/systemd/issues/32509





simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

2024-05-09 Thread nerdopolis
Hi

So I have been made aware of an apparent race condition: some drivers take a 
bit longer to load, so display servers/greeters start on the simpledrm device 
and then experience problems once the real driver loads and the simpledrm 
device they are using as their primary GPU goes away.

For example, Weston crashes, Xorg crashes, wlroots seems to stay running but 
doesn't draw anything on the screen, and kwin aborts.
This happens if you boot a QEMU machine with the virtio card and 
modprobe.blacklist=virtio_gpu, and then, while the display server is running, 
run sudo modprobe virtio-gpu.

Namely, it's been recently reported here: 
https://github.com/sddm/sddm/issues/1917[1] and here 
https://github.com/systemd/systemd/issues/32509[2]

My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the 
real driver loads, is it possible for simpledrm to instead simulate an unplug 
of the fake display/CRTC?
That way in theory, the simpledrm device will now be useless for drawing to 
the screen at that point, since the real driver has now taken over, but this 
way here, at least the display server doesn't lose its handles to the 
/dev/dri/card0 device, (and then maybe only remove itself once the final 
handle to it closes?)

Is something like this possible to do with the way simpledrm works with the low 
level video memory? Or is this not possible?

Thanks


[1] https://github.com/sddm/sddm/issues/1917
[2] https://github.com/systemd/systemd/issues/32509