** Description changed:

  [Impact]
  
-  * In Ubuntu 20.04 (which uses Xorg with the proprietary nvidia driver), the
nvidia driver is in some cases probed after gdm has been launched.
+  * In Ubuntu 20.04 (as well as impish, jammy, and upstream gdm), which uses
Xorg with the proprietary nvidia driver, the nvidia driver is in some cases
probed after gdm has been launched.
   * If this race condition happens on an iGPU + nvidia system with the monitor
connected to the dGPU, gdm starts with Wayland instead of Xorg, which may leave
the monitor stuck on a black screen or the boot logo.
  
  [Test Plan]
  
   * The environment:
    1. A desktop or workstation that contains an iGPU.
    2. Plug an nvidia graphics card into the system and install the proprietary
    nvidia driver (470 in my case).
    3. Attach a monitor to the dGPU and leave the iGPU connected to nothing.
    (My test environment also has an additional Ethernet card and TBT4 cards.)
   * Set up a cron job,
     e.g. @reboot /home/u/test.sh
   * Create a test script, e.g. /home/u/test.sh, containing:
  
  #!/bin/bash
  
  sleep 20
  count="$(cat /home/ubuntu/count)"
  count=$((count+1))
  echo $count | tee /home/ubuntu/count
  journalctl -b | grep -q -i wayland || sudo reboot
  
   * The system will probably get stuck on a black screen or the boot logo.
   * Before applying the fix, the failure rate is 6/24 (6 failures in 24 runs).
-  * After applying the fix, it passed 90 reboot cycles.
+  * After applying the fix, it passed 1000+ reboot cycles.
+  * A test PPA is available at
https://launchpad.net/~os369510/+archive/ubuntu/lp1958488
+ 
+ [Fix]
+  * The patch makes gpu-manager probe nvidia first (if needed) and wait for
/run/u-d-c-nvidia-drm-was-loaded to be touched by
71-u-d-c-gpu-detection.rules.
+  * gdm uses 61-gdm.rules to select its mode by checking whether the nvidia
driver is present.
+  * gpu-manager runs before display-manager, so it waits until the nvidia
uevent has been processed and then continues. By the time gdm is launched, the
targeted nvidia uevent has already been processed
(71-u-d-c-gpu-detection.rules runs later than 61-gdm.rules).
  
  [Where problems could occur]
-  * The patch checks device/boot_vga with the tag "master-of-seat". On a system
without any GPU marked as boot_vga, it will wait 10 seconds in the loop.
-  * I tried detaching all monitors from the system and boot_vga still
existed (it defaults to the iGPU, the first device found during initialization).
-  * I don't think there will be a case where boot_vga doesn't exist at this
point, because:
- In drivers/gpu/vga/vgaarb.c (linux package),
- vga_arb_select_default_device() has a fallback to determine the boot_vga
device in case the firmware doesn't pass a correct efifb to the kernel.
+  * I can't think of a potential regression, but the boot time will be
longer.
+  * In my test cycles, it adds 0~1000 ms to the boot time, usually 0~200 ms.
Worst case, over 1 s in 8xx runs (of 1000).
+  * I think stability is more important than performance in this case.
+ 
+ [Other Info]
+  * Non-ubuntu-desktop systems that use gdm (and don't have gpu-manager) will
still hit this issue. Another potential fix (in either gdm or logind)
is under discussion in
https://gitlab.gnome.org/GNOME/gdm/-/issues/763#note_1385786.
  
  ---
  
  Test environment/steps:
  1. A desktop or workstation that contains an iGPU.
  2. Plug an nvidia graphics card into the system and install the proprietary
nvidia driver (470 in my case).
  3. Attach a monitor to the dGPU and leave the iGPU connected to nothing.
  (My test environment also has an additional Ethernet card and TBT4 cards.)
  4. Reboot the system.
  
  Based on:
  $ cat /lib/udev/rules.d/61-gdm.rules
  # disable Wayland on Hi1710 chipsets
  ATTR{vendor}=="0x19e5", ATTR{device}=="0x1711", 
RUN+="/usr/lib/gdm3/gdm-disable-wayland"
  # disable Wayland when using the proprietary nvidia driver
  DRIVER=="nvidia", RUN+="/usr/lib/gdm3/gdm-disable-wayland"
  
  It disables Wayland by default when the proprietary nvidia driver is loaded.
  But in some race conditions, the nvidia probe happens after GNOME
launches. (The failure rate is 6/24.)
  Thus, ubuntu-gdm has a fix for Bug#1794280 that adds
"ExecStartPre=@libexecdir@/gdm-wait-for-drm".
  
  gdm-wait-for-drm is intended to make sure all drm udev devices are
  enumerated before launching gdm.
  
  It relies on at least one "master-of-seat" graphics card being present for
gdm, but that is not rigorous enough, since most graphics cards carry the
"master-of-seat" tag[1].
  
  In my case, it detects that the iGPU has been probed but the dGPU has not.
  However, the display is attached to the dGPU.
  
  We need to make sure the targeted GPU (the one connected to the monitor) is
  probed before launching gdm.
  
  debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1004131
  upstream bug: https://gitlab.gnome.org/GNOME/gdm/-/issues/763
  
  [1] https://www.freedesktop.org/wiki/Software/systemd/multiseat/
  /lib/udev/rules.d/71-seat.rules
  /lib/udev/rules.d/71-nvidia.rules

** Description changed:

  [Impact]
  
   * In Ubuntu 20.04 (as well as impish, jammy, and upstream gdm), which uses
Xorg with the proprietary nvidia driver, the nvidia driver is in some cases
probed after gdm has been launched.
   * If this race condition happens on an iGPU + nvidia system with the monitor
connected to the dGPU, gdm starts with Wayland instead of Xorg, which may leave
the monitor stuck on a black screen or the boot logo.
  
  [Test Plan]
  
   * The environment:
    1. A desktop or workstation that contains an iGPU.
    2. Plug an nvidia graphics card into the system and install the proprietary
    nvidia driver (470 in my case).
    3. Attach a monitor to the dGPU and leave the iGPU connected to nothing.
    (My test environment also has an additional Ethernet card and TBT4 cards.)
   * Set up a cron job,
     e.g. @reboot /home/u/test.sh
   * Create a test script, e.g. /home/u/test.sh, containing:
  
  #!/bin/bash
  
  sleep 20
  count="$(cat /home/ubuntu/count)"
  count=$((count+1))
  echo $count | tee /home/ubuntu/count
  journalctl -b | grep -q -i wayland || sudo reboot
  
   * The system will probably get stuck on a black screen or the boot logo.
   * Before applying the fix, the failure rate is 6/24 (6 failures in 24 runs).
   * After applying the fix, it passed 1000+ reboot cycles.
-  * A test PPA is available at
https://launchpad.net/~os369510/+archive/ubuntu/lp1958488
+  * A test PPA is available at
https://launchpad.net/~os369510/+archive/ubuntu/lp1958488
  
  [Fix]
   * The patch makes gpu-manager probe nvidia first (if needed) and wait for
/run/u-d-c-nvidia-drm-was-loaded to be touched by
71-u-d-c-gpu-detection.rules.
-  * gdm uses 61-gdm.rules to select its mode by checking whether the nvidia
driver is present.
-  * gpu-manager runs before display-manager, so it waits until the nvidia
uevent has been processed and then continues. By the time gdm is launched, the
targeted nvidia uevent has already been processed
(71-u-d-c-gpu-detection.rules runs later than 61-gdm.rules).
+  * gdm uses 61-gdm.rules to select its mode by checking whether the nvidia
driver is present.
+  * gpu-manager runs before display-manager, so it waits until the nvidia
uevent has been processed and then continues. By the time gdm is launched, the
targeted nvidia uevent has already been processed
(71-u-d-c-gpu-detection.rules runs later than 61-gdm.rules).
  
  [Where problems could occur]
-  * I can't think of a potential regression, but the boot time will be
longer.
-  * In my test cycles, it adds 0~1000 ms to the boot time, usually 0~200 ms.
Worst case, over 1 s in 8xx runs (of 1000).
-  * I think stability is more important than performance in this case.
+  * I can't think of a potential regression, but the boot time will be
longer.
+  * In my test cycles, it adds 0~1000 ms to the boot time, usually 0~200 ms.
Worst case, over 1 s in 8xx runs (of 1000).
+  * I think stability is more important than performance in this case.
  
  [Other Info]
-  * Non-ubuntu-desktop systems that use gdm (and don't have gpu-manager) will
still hit this issue. Another potential fix (in either gdm or logind)
is under discussion in
https://gitlab.gnome.org/GNOME/gdm/-/issues/763#note_1385786.
+  * Non-ubuntu-desktop systems that use gdm (and don't have gpu-manager) will
still hit this issue. Another potential fix (in either gdm or logind)
is under discussion in
https://gitlab.gnome.org/GNOME/gdm/-/issues/763#note_1385786.
+  * u-d-c upstream fix: 
https://github.com/tseliot/ubuntu-drivers-common/pull/67
  
  ---
  
  Test environment/steps:
  1. A desktop or workstation that contains an iGPU.
  2. Plug an nvidia graphics card into the system and install the proprietary
nvidia driver (470 in my case).
  3. Attach a monitor to the dGPU and leave the iGPU connected to nothing.
  (My test environment also has an additional Ethernet card and TBT4 cards.)
  4. Reboot the system.
  
  Based on:
  $ cat /lib/udev/rules.d/61-gdm.rules
  # disable Wayland on Hi1710 chipsets
  ATTR{vendor}=="0x19e5", ATTR{device}=="0x1711", 
RUN+="/usr/lib/gdm3/gdm-disable-wayland"
  # disable Wayland when using the proprietary nvidia driver
  DRIVER=="nvidia", RUN+="/usr/lib/gdm3/gdm-disable-wayland"
  
  It disables Wayland by default when the proprietary nvidia driver is loaded.
  But in some race conditions, the nvidia probe happens after GNOME
launches. (The failure rate is 6/24.)
  Thus, ubuntu-gdm has a fix for Bug#1794280 that adds
"ExecStartPre=@libexecdir@/gdm-wait-for-drm".
  
  gdm-wait-for-drm is intended to make sure all drm udev devices are
  enumerated before launching gdm.
  
  It relies on at least one "master-of-seat" graphics card being present for
gdm, but that is not rigorous enough, since most graphics cards carry the
"master-of-seat" tag[1].
  
  In my case, it detects that the iGPU has been probed but the dGPU has not.
  However, the display is attached to the dGPU.
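  
  One way to see which GPUs have been probed at a given moment is to look at
  the "driver" symlink that each PCI device gets in sysfs once a driver binds
  to it. The following is only a sketch for illustration: the function names
  bound_driver and list_gpu_drivers are hypothetical and are not part of
  gdm-wait-for-drm or gpu-manager.

```shell
#!/bin/bash

# Print the kernel driver bound to a PCI device directory in sysfs,
# or "none" if no driver has been bound yet (i.e. the device is not probed).
bound_driver() {
    local dev="$1"
    if [ -L "$dev/driver" ]; then
        basename "$(readlink "$dev/driver")"
    else
        echo "none"
    fi
}

# List every display-class PCI device together with its bound driver.
# The optional argument overrides the sysfs root (useful for testing).
list_gpu_drivers() {
    local root="${1:-/sys/bus/pci/devices}" dev class
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class="$(cat "$dev/class")"
        # PCI base class 0x03 is "display controller".
        case "$class" in
            0x03*) echo "$(basename "$dev") $(bound_driver "$dev")" ;;
        esac
    done
}

# Usage: list_gpu_drivers
```

  On an affected boot, running list_gpu_drivers early enough would be
  expected to show the iGPU with its driver and the dGPU with "none".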
  
  We need to make sure the targeted GPU (the one connected to the monitor) is
  probed before launching gdm.
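  
  The idea of waiting for the targeted GPU can be sketched in shell as a
  bounded poll on the flag file that 71-u-d-c-gpu-detection.rules touches.
  This is an illustration only, not the actual patch (gpu-manager is not a
  shell script); the helper name wait_for_flag and the 10-second timeout in
  the usage comment are assumptions.

```shell
#!/bin/bash

# Hypothetical helper: poll until a flag file exists or a timeout expires.
# Arguments: <path> <timeout in whole seconds>
# Returns 0 if the file appeared in time, non-zero otherwise.
wait_for_flag() {
    local flag="$1" timeout="$2"
    local tries=$((timeout * 10))   # one poll every 100 ms
    while [ "$tries" -gt 0 ]; do
        [ -e "$flag" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    [ -e "$flag" ]                  # final check decides the exit status
}

# Usage (timeout value is an assumption):
#   wait_for_flag /run/u-d-c-nvidia-drm-was-loaded 10 || echo "nvidia not ready"
```

  Bounding the wait keeps the boot from hanging when the flag file never
  appears, at the cost of the extra boot time noted under
  [Where problems could occur].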
  
  debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1004131
  upstream bug: https://gitlab.gnome.org/GNOME/gdm/-/issues/763
  
  [1] https://www.freedesktop.org/wiki/Software/systemd/multiseat/
  /lib/udev/rules.d/71-seat.rules
  /lib/udev/rules.d/71-nvidia.rules

-- 
You received this bug notification because you are a member of Ubuntu
Desktop Bugs, which is subscribed to gdm3 in Ubuntu.
https://bugs.launchpad.net/bugs/1958488

Title:
  [nvidia][xorg] display hangs on boot LOGO

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/1958488/+subscriptions


-- 
desktop-bugs mailing list
desktop-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/desktop-bugs
