Hi Salvatore,

I have removed the xorg.conf for the Nvidia graphics driver and any nvidia-related *.conf files in /etc/modprobe.d/, and I have rebooted the laptop. The following output should show that only the default nouveau driver is loaded:

# lsmod | grep nvidia
# lsmod | grep nouveau
nouveau              2179072  0
ttm                   131072  1 nouveau
i2c_algo_bit           16384  2 i915,nouveau
drm_kms_helper        208896  2 i915,nouveau
mxm_wmi                16384  1 nouveau
drm                   495616  12 drm_kms_helper,i915,ttm,nouveau
wmi                    28672  6 dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau
video                  45056  4 dell_wmi,dell_laptop,i915,nouveau
button                 16384  1 nouveau

# lspci -k | egrep 'VGA|3D' -A2
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
        Subsystem: Dell HD Graphics 530
        Kernel driver in use: i915
--
01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
        Subsystem: Dell GM107GLM [Quadro M1000M]
        Kernel driver in use: nouveau

# dmesg | grep -i nvidia
[    4.282530] nouveau 0000:01:00.0: NVIDIA GM107 (117310a2)
[    4.547712] audit: type=1400 audit(1596389563.639:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=543 comm="apparmor_parser"
[    4.547714] audit: type=1400 audit(1596389563.639:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=543 comm="apparmor_parser"
[    5.944911] nvidia: loading out-of-tree module taints kernel.
[    5.944918] nvidia: module license 'NVIDIA' taints kernel.
[    5.949482] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    5.962949] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    5.963181] NVRM: The NVIDIA probe routine was not called for 1 device(s).
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
NVRM: driver(s)), then try loading the NVIDIA kernel module
[    5.963182] NVRM: No NVIDIA graphics adapter probed!
[    6.005267] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[    6.075128] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    6.075448] NVRM: The NVIDIA probe routine was not called for 1 device(s).
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
NVRM: driver(s)), then try loading the NVIDIA kernel module
[    6.075449] NVRM: No NVIDIA graphics adapter probed!
[    6.097310] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241

Apparently, the nvidia module was still loaded during boot, but by then the nouveau driver had already claimed the device (4.28 s versus 5.94 s in the log), so the NVIDIA probe failed.
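
The order in which the two drivers appeared can be cross-checked from the dmesg timestamps. The following is only a rough sketch: the function name first_seen is made up for illustration, and it assumes that driver messages start with "nouveau" or "nvidia" right after the timestamp, as they do in the log above.

```shell
#!/bin/sh
# Print the first dmesg timestamp at which the nouveau and nvidia
# drivers log a message. Reads dmesg-style lines on stdin.
# (first_seen is an illustrative name, not an existing tool.)
first_seen() {
    awk '
        {
            # Skip lines without a leading "[ seconds.micros]" timestamp.
            if (!match($0, /^\[ *[0-9]+\.[0-9]+\] */)) next
            ts = substr($0, 1, RLENGTH)
            gsub(/[^0-9.]/, "", ts)          # keep only the number
            msg = substr($0, RLENGTH + 1)    # message without timestamp

            # Only count lines logged by the driver itself, so that
            # audit lines mentioning "nvidia_modprobe" are ignored.
            if (msg ~ /^nouveau[ :]/ && !seen_nouveau) {
                seen_nouveau = 1
                print ts, "nouveau"
            }
            if (msg ~ /^nvidia[ :-]/ && !seen_nvidia) {
                seen_nvidia = 1
                print ts, "nvidia"
            }
        }
    '
}
```

It would be used as "dmesg | first_seen"; the driver listed first is the one that logged first.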
Here is the "top" output, again with a permanently high CPU load for a kworker process:

# top
top - 19:50:57 up 18 min,  4 users,  load average: 1,26, 1,22, 0,93
Tasks: 198 total,   2 running, 196 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us, 11,3 sy,  0,0 ni, 87,1 id,  0,0 wa,  0,0 hi,  1,6 si,  0,0 st
MiB Mem :  15889,5 total,  13903,9 free,    808,5 used,   1177,0 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  14617,1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   72 root      20   0       0      0      0 R  86,7   0,0  15:23.97 kworker/7:1+pm
   47 root      20   0       0      0      0 S  13,3   0,0   2:52.21 ksoftirqd/7
  684 root      20   0  505356 126896 102732 S   6,7   0,8   0:20.77 Xorg
    1 root      20   0  169624  10312   7880 S   0,0   0,1   0:01.34 systemd
    2 root      20   0       0      0      0 S   0,0   0,0   0:00.00 kthreadd

Here is the stack of PID 72:

# cat /proc/72/stack
[<0>] 0xffffffffffffffff

The file with a few seconds of tracing, cut after line 5000 and compressed, is attached as "out-no-nvidia.txt.gz".
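
To get a quick overview of such a trace, the work functions in it can be counted. This is a minimal sketch, assuming the trace was recorded with the workqueue:workqueue_queue_work event as described in the workqueue debugging documentation, so that each line carries a "function=" field. The function name count_work_functions is only illustrative.

```shell
#!/bin/sh
# Count how often each work function appears in a workqueue trace.
# Reads trace lines on stdin and prints "count function",
# most frequent first. Lines without a function= field are ignored.
# (count_work_functions is an illustrative name.)
count_work_functions() {
    grep -o 'function=[^ ]*' |
        sort | uniq -c | sort -rn |
        awk '{ sub(/^function=/, "", $2); print $1, $2 }'
}
```

For the attached file this would be "zcat out-no-nvidia.txt.gz | count_work_functions". A function that dominates the list is a likely candidate for what keeps the kworker busy.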
Please let me know whether my way of not loading the nvidia driver was sufficient. If it is required to completely uninstall the Nvidia driver for a truly untainted system, I will do it, but I would need more time for this.
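
Whether a given boot counts as untainted can be read from /proc/sys/kernel/tainted, which is 0 on an untainted kernel. The following is a small sketch, not a complete decoder: decode_taint is an illustrative name, and it only names the three module-related bits, which correspond to the three "taints kernel" messages in the dmesg output above.

```shell
#!/bin/sh
# Decode the module-related bits of the kernel taint value.
# Argument: the decimal number from /proc/sys/kernel/tainted.
# (decode_taint is an illustrative name; other taint bits are
# only reported as a number, not decoded.)
decode_taint() {
    t=$1
    if [ "$t" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    # Bit 0 (value 1): a proprietary module was loaded (flag P).
    [ $(( t & 1 )) -ne 0 ]    && echo "P: proprietary module was loaded"
    # Bit 12 (value 4096): an out-of-tree module was loaded (flag O).
    [ $(( t & 4096 )) -ne 0 ] && echo "O: out-of-tree module was loaded"
    # Bit 13 (value 8192): an unsigned module was loaded (flag E).
    [ $(( t & 8192 )) -ne 0 ] && echo "E: unsigned module was loaded"
    # Anything outside the three bits above.
    other=$(( t & ~12289 ))
    [ "$other" -ne 0 ] && echo "other taint bits set: $other"
    return 0
}
```

It would be called as: decode_taint "$(cat /proc/sys/kernel/tainted)"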

Regards,
Dirk.

On 02.08.20 at 18:22, Salvatore Bonaccorso wrote:
Hi Dirk,

On Sun, Aug 02, 2020 at 03:44:09PM +0200, Salvatore Bonaccorso wrote:

Control: tags -1 + moreinfo

Hi Dirk,

On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:

Package: src:linux
Version: 4.19.132-1
Severity: normal

Dear Maintainer,

after booting the kernel 4.19.0-10-amd64, there is a kworker process running with a permanently high CPU load of almost 90%, as reported by the "top" command:

$ top
top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   64 root      20   0       0      0      0 R  86.7   0.0   0:47.41 kworker/0:2+pm
    9 root      20   0       0      0      0 S  20.0   0.0   0:08.84 ksoftirqd/0
  364 root     -51   0       0      0      0 S   6.7   0.0   0:00.50 irq/126-nvidia
 1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8   0:02.23 kwin_x11
    1 root      20   0  169652  10280   7740 S   0.0   0.1   0:01.56 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...

The expected result after booting the kernel 4.19.0-10-amd64 is a kworker process with a CPU load close to 0%.

As a control, booting the previous kernel 4.19.0-9-amd64 does not show a high CPU load for the kworker process. Instead, the kworker CPU load reported by the "top" command is 0.0%.

Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.

Neither "dmesg" nor "journalctl -b" show any messages containing "kworker".

I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64 2.28-10.

If you need more information, I would be happy to provide it.

To find out what could be the cause, could you have a look at https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging? This could help in isolating why the kworker goes crazy.

In addition to the above, one more thing: can you reproduce the issue when the kernel does not get tainted, that is, without loading the proprietary, out-of-tree modules? This is particularly important if the issue can be tracked down, is found in upstream, and needs to be reported upstream.

Regards,
Salvatore

out-no-nvidia.txt.gz
Description: application/gzip