Fix patches sent to kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2024-December/155871.html.
SRU Justification
[Impact]
On a system with a GV100 GPU using the nouveau driver, the display becomes
unresponsive and a storm of "nouveau 0000:07:00.0: disp: ctrl 00000080"
messages is printed continuously to dmesg once the desktop environment reaches
its idle timeout. This interferes with certification testing for the DGX
Station desktop system, because the system eventually becomes unresponsive
during testing.
[Fix]
This only affects Focal.
Backporting the following patches from K5.6 resolves the issue:
58ae5284f6 ("drm/nouveau/disp/gv100-: halt
NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms")
5bb88d0794 ("drm/nouveau/kms/gv100-: move window ownership setup into
modesetting path")
137c4ba716 ("drm/nouveau/kms/gv100-: avoid sending a core update until the
first modeset")
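As a rough aid for reviewing the backport, the sketch below checks a list of
commit subjects for the three patches named above. It assumes the backported
commits keep their upstream subject lines unchanged; the function name
`missing_backports` is hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Read commit subjects on stdin (one per line) and report which of the three
# expected nouveau patches are absent. Assumes subjects were not reworded.
missing_backports() {
    log="$(cat)"
    rc=0
    while IFS= read -r subject; do
        # Fixed-string match so "/" and "-:" are taken literally.
        if ! printf '%s\n' "$log" | grep -qF -- "$subject"; then
            printf 'missing: %s\n' "$subject"
            rc=1
        fi
    done <<'EOF'
drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms
drm/nouveau/kms/gv100-: move window ownership setup into modesetting path
drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset
EOF
    return "$rc"
}
```

One possible use, run inside a kernel git tree, is
`git log --format=%s -- drivers/gpu/drm/nouveau | missing_backports`. No
output and a zero exit status would suggest all three subjects are present.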
[Test Case]
1. Install desktop environment
$ sudo apt install ubuntu-desktop
2. Configure GDM
$ sudo vim /etc/gdm3/custom.conf
=> Uncomment WaylandEnable=false
=> Configure automatic login for the `ubuntu` user by setting
AutomaticLoginEnable = true
AutomaticLogin = ubuntu
3. Disable display timeout
$ gsettings set org.gnome.desktop.session idle-delay 0
4. Set graphical as the default target
$ sudo systemctl set-default graphical.target
5. Reboot the system
6. Enable a 1-second display timeout and wait ~10 seconds
$ gsettings set org.gnome.desktop.session idle-delay 1
7. Observe that after applying these patches, the display can wake up from idle
and the system continues to be usable without a storm of "nouveau 0000:07:00.0:
disp: ctrl 00000080" messages in dmesg.
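The observation in step 7 could be partly automated with something like the
following sketch. The function names and the pass/fail threshold are
assumptions made for illustration, not part of the original test case; the
functions read kernel log text from stdin so they can be fed from dmesg or
from a saved log.

```shell
#!/bin/sh
# Count "disp: ctrl 00000080" messages from nouveau in the text on stdin.
count_disp_ctrl_errors() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c 'nouveau .*disp: ctrl 00000080' || true
}

# Fail if more than THRESHOLD (default 0) matching messages are found.
check_no_storm() {
    threshold="${1:-0}"
    count="$(count_disp_ctrl_errors)"
    if [ "$count" -gt "$threshold" ]; then
        echo "FAIL: $count matching messages"
        return 1
    fi
    echo "PASS: $count matching messages"
}
```

After waiting out the idle timeout, running `sudo dmesg | check_no_storm`
would report whether any such messages were logged. It does not verify that
the display actually wakes up, which still needs a manual check.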
[Where things could go wrong]
These changes affect only the nouveau driver. Issues would appear as
misbehavior of the nouveau driver, most likely on Volta NVIDIA GPUs.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2078011
Title:
nouveau keeps showing `disp: ctrl 00000080` and crippling the system
Status in linux package in Ubuntu:
Invalid
Status in xserver-xorg-video-nouveau package in Ubuntu:
Invalid
Status in linux source package in Focal:
In Progress
Status in xserver-xorg-video-nouveau source package in Focal:
Invalid
Bug description:
During the kernel SRU testing, I found that the DGX A100 station kept
showing error messages from nouveau in dmesg as follows. These
numerous kernel error messages crippled the system and made it
unresponsive.
[ 2265.721452] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721457] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721463] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721474] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721480] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721485] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721491] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721496] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721507] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721514] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721519] nouveau 0000:07:00.0: disp: ctrl 00000080
[ 2265.721525] nouveau 0000:07:00.0: disp: ctrl 00000080
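To put a number on how fast these messages arrive, a small helper along the
following lines could estimate the rate from the dmesg timestamps. It is only
a sketch: the function name `disp_ctrl_rate` is hypothetical, and it assumes
the default dmesg format with a leading `[seconds.microseconds]` timestamp as
in the excerpt above.

```shell
#!/bin/sh
# Estimate messages per second for "disp: ctrl 00000080" lines on stdin,
# using the first and last matching timestamps.
disp_ctrl_rate() {
    awk '
        /nouveau .*disp: ctrl 00000080/ {
            ts = $0
            sub(/^\[ */, "", ts)   # drop the leading "[" and padding
            sub(/\].*/, "", ts)    # drop everything after the timestamp
            if (n == 0) first = ts
            last = ts
            n++
        }
        END {
            span = last - first
            # Need at least two messages and a positive time span.
            if (n < 2 || span <= 0) { print "n/a"; exit }
            printf "%.0f\n", (n - 1) / span
        }
    '
}
```

For example, `sudo dmesg | disp_ctrl_rate` prints an approximate count of
messages per second, or `n/a` when fewer than two matching lines are found.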
When the system reaches the idle delay, I guess it tries to turn off the
monitor and something goes wrong.
I can quickly reproduce this by setting idle-delay to 1 second after the
system boots into the desktop:
`gsettings set org.gnome.desktop.session idle-delay 1`
The impacted system is https://ubuntu.com/certified/201711-25989
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xserver-xorg-video-nouveau 1:1.0.16-1
ProcVersionSignature: Ubuntu 5.4.0-193.213-generic 5.4.278
Uname: Linux 5.4.0-193-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.4
Architecture: amd64
CasperMD5CheckResult: skip
Date: Tue Aug 27 20:31:20 2024
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
InstallationDate: Installed on 2020-08-03 (1485 days ago)
InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: NVIDIA DGX Station
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-193-generic
root=UUID=88df95a6-4fd9-475a-8b59-ad14df1ada5a ro
SourcePackage: xserver-xorg-video-nouveau
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/27/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0406
dmi.board.asset.tag: Default string
dmi.board.name: X99-E-10G WS
dmi.board.vendor: EMPTY
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: EMPTY
dmi.chassis.version: Default string
dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr0406:bd08/27/2018:svnNVIDIA:pnDGXStation:pvrSystemVersion:rvnEMPTY:rnX99-E-10GWS:rvrRev1.xx:cvnEMPTY:ct3:cvrDefaultstring:
dmi.product.family: DGX
dmi.product.name: DGX Station
dmi.product.sku: 920-22587-2510-000
dmi.product.version: System Version
dmi.sys.vendor: NVIDIA
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel
2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1