Bug#994971: OpenCL not working with latest Nvidia driver
On 26/09/2021 19.06, Klaus Ethgen wrote:
> On Sun, 26 Sep 2021 at 17:58, Andreas Beckmann wrote:
> > Thanks. You had modifications in nvidia-modprobe.conf, didn't try that
> > before.
>
> Ehem, I don't think so... Never touched that file by hand.

You are probably right, dpkg just reported it as modified because it didn't know anything about the file at the point where it was going to remove it as an obsolete conffile.

This was triggered by two bugs:

* debhelper (#994919) erroneously activated a new dpkg feature to remove obsolete conffiles. This works fine for the trivial cases, but not here, where we replaced a conffile with an alternative to be able to select a driver-version-specific one (while allowing several drivers to be installed concurrently).
* dpkg (#995387): the new remove-on-upgrade feature was removing (or rather renaming to *.dpkg-old, because it considered it modified) the target of a symlink, thus acting on a conffile owned by a different package.

Luckily the conffile was still present as *.dpkg-old, so it is quite easy to detect this error and move the conffile back ;-)

Luckily this only affects the main driver series. I was afraid all the different variants would be affected as well, since the last uploads were all built with the buggy debhelper version. But they would only mishandle a nvidia-$VARIANT-modprobe.conf, which does not exist as an alternative ;-)

Andreas
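The recovery described above (detect the *.dpkg-old copy and move it back) could be sketched roughly as follows. This is only an illustration, not the fix that went into the package: `restore_dpkg_old` is a hypothetical helper name, and the sketch only moves the file; it does not re-register any alternatives.

```shell
#!/bin/sh
# Sketch: restore a conffile that dpkg renamed to <path>.dpkg-old.
# restore_dpkg_old is a hypothetical helper, not part of dpkg.
restore_dpkg_old() {
    conffile=$1
    backup="$conffile.dpkg-old"

    # Nothing to do if the conffile is still (or again) in place.
    if [ -e "$conffile" ]; then
        echo "present: $conffile"
        return 0
    fi

    # Without a backup copy there is nothing this helper can recover.
    if [ ! -e "$backup" ]; then
        echo "no backup: $backup" >&2
        return 1
    fi

    mv -- "$backup" "$conffile" && echo "restored: $conffile"
}

# Example (as root):
#   restore_dpkg_old /etc/nvidia/current/nvidia-modprobe.conf
```

The helper deliberately refuses to overwrite an existing conffile, so running it twice is harmless.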
Bug#994971: OpenCL not working with latest Nvidia driver
On 28/09/2021 23.28, Sebastian Ramacher wrote:
> This smells like #994919 in debhelper, i.e., nvidia-driver needs to be
> rebuilt with a fixed debhelper version.

Thanks. That sounds plausible, and explains my problems reliably reproducing it elsewhere.

Can we quickly find out which packages have been built with the buggy debhelper versions (or which packages have the remove-on-upgrade flag in their .conffiles) and binNMU them?

Please don't binNMU nvidia-graphics-drivers yet; I still need to find a way to recover from this bug by somehow restoring the conffile.

Andreas
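For locally installed packages, the second question could be approximated with a scan like the one below. It is a sketch under assumptions: it supposes that dpkg keeps each package's conffiles list, including the flag, as `<package>.conffiles` under /var/lib/dpkg/info, with flagged lines starting with `remove-on-upgrade` followed by whitespace. `list_remove_on_upgrade` is a hypothetical name, and this says nothing about packages in the archive that are not installed.

```shell
#!/bin/sh
# Sketch: print installed packages whose conffiles list carries the
# remove-on-upgrade flag. list_remove_on_upgrade is a hypothetical helper.
# Usage: list_remove_on_upgrade [INFO_DIR]
list_remove_on_upgrade() {
    info_dir=${1:-/var/lib/dpkg/info}

    for f in "$info_dir"/*.conffiles; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -f "$f" ] || continue

        # Assumed line format: "remove-on-upgrade /etc/foo.conf"
        if grep -q '^remove-on-upgrade[[:space:]]' "$f"; then
            basename "$f" .conffiles
        fi
    done
}

# Example:
#   list_remove_on_upgrade
```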
Bug#994971: OpenCL not working with latest Nvidia driver
On 2021-09-26 18:58:48 +0200, Andreas Beckmann wrote:
> Control: severity -1 serious
>
> On 26/09/2021 18.07, Klaus Ethgen wrote:
> > > available somewhere in /var/log/apt/term.log*, it might give some hints
> > > what happened.
> >
> > Here it is.
>
> Thanks. You had modifications in nvidia-modprobe.conf, didn't try that
> before. And dpkg messed it up. I could reproduce the disappearance of the
> file. Will try to reproduce with a conffile outside nvidia stuff...

This smells like #994919 in debhelper, i.e., nvidia-driver needs to be rebuilt with a fixed debhelper version.

Cheers

> > By the way, does it make sense to include all involved persons in the reply?
> > Doesn't the bug tracker send notifications too?
>
> Only the maintainer gets notifications by default, everyone else has to
> subscribe manually. So let's keep the Cc:s.
>
> Andreas

--
Sebastian Ramacher
Bug#994971: OpenCL not working with latest Nvidia driver
Hi,

On Sun, 26 Sep 2021 at 17:58, Andreas Beckmann wrote:
> Thanks. You had modifications in nvidia-modprobe.conf, didn't try that
> before.

Ehem, I don't think so... Never touched that file by hand.

But I see it was touched several times by upgrades. It started on 2015-10-16, was changed by updates on 2015-10-22, 2016-07-30, 2016-09-30, 2018-03-10 and 2021-08-03, and was finally removed on 2021-09-25.

Regards
   Klaus
--
Klaus Ethgen   http://www.ethgen.ch/
pub  4096R/4E20AF1C 2011-05-16   Klaus Ethgen
Fingerprint: 85D4 CA42 952C 949B 1753  62B3 79D0 B06F 4E20 AF1C
Bug#994971: OpenCL not working with latest Nvidia driver
Control: severity -1 serious

On 26/09/2021 18.07, Klaus Ethgen wrote:
> > available somewhere in /var/log/apt/term.log*, it might give some hints
> > what happened.
>
> Here it is.

Thanks. You had modifications in nvidia-modprobe.conf, didn't try that before. And dpkg messed it up. I could reproduce the disappearance of the file. Will try to reproduce with a conffile outside nvidia stuff...

> By the way, does it make sense to include all involved persons in the reply?
> Doesn't the bug tracker send notifications too?

Only the maintainer gets notifications by default, everyone else has to subscribe manually. So let's keep the Cc:s.

Andreas
Bug#994971: OpenCL not working with latest Nvidia driver
On Sun, 26 Sep 2021 at 16:56, Andreas Beckmann wrote:
> On 26/09/2021 17.52, Klaus Ethgen wrote:
> > > I suspect that nvidia-modprobe.conf somehow got deleted. Check
> > >     debsums -ac nvidia-kernel-support
> >
> > Well, THAT gives a missing file.
>
> Do you still have the transcript of the complete upgrade process? Should be
> available somewhere in /var/log/apt/term.log*, it might give some hints what
> happened.

Here it is.

By the way, does it make sense to include all involved persons in the reply? Doesn't the bug tracker send notifications too?

Regards
   Klaus

Attachment: term.log.bz2 (binary data)
Bug#994971: OpenCL not working with latest Nvidia driver
On 26/09/2021 17.52, Klaus Ethgen wrote:
> > I suspect that nvidia-modprobe.conf somehow got deleted. Check
> >     debsums -ac nvidia-kernel-support
>
> Well, THAT gives a missing file.

Do you still have the transcript of the complete upgrade process? It should be available somewhere in /var/log/apt/term.log*; it might give some hints about what happened.

Andreas
Bug#994971: OpenCL not working with latest Nvidia driver
On Sun, 26 Sep 2021 at 16:46, Andreas Beckmann wrote:
> On 26/09/2021 17.02, Klaus Ethgen wrote:
> > Ah yes, and here is the dpkg -L nvidia-kernel-support:
> > /.
> > /etc
> > /etc/nvidia
> > /etc/nvidia/current
> > /etc/nvidia/current/nvidia-blacklists-nouveau.conf
> > /etc/nvidia/current/nvidia-load.conf
> > /etc/nvidia/current/nvidia-modprobe.conf
> ...
>
> I suspect that nvidia-modprobe.conf somehow got deleted. Check
>     debsums -ac nvidia-kernel-support

Well, THAT gives a missing file.

> And if the file does not exist, no alternative is created for it.

It existed before the update. See my etckeeper log:

> git log --stat nvidia modprobe.d alternatives
commit 8f21d4d148d7b1a349d40896a422a7753ca75509 (HEAD -> master)
Author: Root
Date:   Sat Sep 25 12:53:02 2021 +0100

    committing changes in /etc made by "apt-get --auto-remove dist-upgrade"

    Packages with configuration changes:
    -nvidia-kernel-support 470.57.02-2 amd64
    +nvidia-kernel-support 470.57.02-3 amd64

    Package changes:
    [456 packages changed]

 alternatives/blas.pc-x86_64-linux-gnu     |  1 +
 alternatives/cblas.h-x86_64-linux-gnu     |  1 +
 alternatives/glx--nvidia-modprobe.conf    |  1 -
 alternatives/libblas.a-x86_64-linux-gnu   |  1 +
 alternatives/libblas.so-x86_64-linux-gnu  |  1 +
 alternatives/nvidia--nvidia-modprobe.conf |  1 -
 modprobe.d/nvidia.conf                    |  1 -
 nvidia/current/nvidia-modprobe.conf       | 22 --
 nvidia/nvidia-modprobe.conf               |  1 -
 9 files changed, 4 insertions(+), 26 deletions(-)

So you see that the update deleted alternatives/glx--nvidia-modprobe.conf, alternatives/nvidia--nvidia-modprobe.conf, modprobe.d/nvidia.conf, nvidia/current/nvidia-modprobe.conf and nvidia/nvidia-modprobe.conf.

Regards
   Klaus
Bug#994971: OpenCL not working with latest Nvidia driver
On 26/09/2021 17.02, Klaus Ethgen wrote:
> Ah yes, and here is the dpkg -L nvidia-kernel-support:
> /.
> /etc
> /etc/nvidia
> /etc/nvidia/current
> /etc/nvidia/current/nvidia-blacklists-nouveau.conf
> /etc/nvidia/current/nvidia-load.conf
> /etc/nvidia/current/nvidia-modprobe.conf
...

I suspect that nvidia-modprobe.conf somehow got deleted. Check
    debsums -ac nvidia-kernel-support
And if the file does not exist, no alternative is created for it.

Are there any other /etc/nvidia/current/nvidia-modprobe.conf* files sitting around?
(As it is a conffile, all user modifications to it are preserved, including deletion.)

Andreas
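A quick way to answer the "sitting around" question is to list everything that shares the conffile's path as a prefix, which would catch leftovers such as *.dpkg-old or *.dpkg-dist. The helper name `list_conffile_leftovers` is made up for this sketch.

```shell
#!/bin/sh
# Sketch: list the conffile itself and any sibling whose name starts with
# the same path (e.g. .dpkg-old, .dpkg-dist, .dpkg-new).
# list_conffile_leftovers is a hypothetical helper.
list_conffile_leftovers() {
    conffile=$1

    for f in "$conffile"*; do
        # Skip the unexpanded pattern when nothing matches; still report
        # dangling symlinks, which -e alone would hide.
        [ -e "$f" ] || [ -L "$f" ] || continue
        echo "$f"
    done
}

# Example:
#   list_conffile_leftovers /etc/nvidia/current/nvidia-modprobe.conf
```

An empty result means neither the conffile nor any backup copy is present.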
Bug#994971: OpenCL not working with latest Nvidia driver
Ah yes, and here is the dpkg -L nvidia-kernel-support:

/.
/etc
/etc/nvidia
/etc/nvidia/current
/etc/nvidia/current/nvidia-blacklists-nouveau.conf
/etc/nvidia/current/nvidia-load.conf
/etc/nvidia/current/nvidia-modprobe.conf
/lib
/lib/firmware
/lib/firmware/nvidia
/lib/firmware/nvidia/470.57.02
/lib/firmware/nvidia/470.57.02/gsp.bin
/usr
/usr/share
/usr/share/bug
/usr/share/bug/nvidia-kernel-support
/usr/share/bug/nvidia-kernel-support/control
/usr/share/bug/nvidia-kernel-support/script
/usr/share/doc
/usr/share/doc/nvidia-kernel-support
/usr/share/doc/nvidia-kernel-support/changelog.Debian.gz
/usr/share/doc/nvidia-kernel-support/changelog.gz
/usr/share/doc/nvidia-kernel-support/copyright
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/nvidia-kernel-support

Regards
   Klaus
Bug#994971: OpenCL not working with latest Nvidia driver
On Sun, 26 Sep 2021 at 15:13, Andreas Beckmann wrote:
> On 26/09/2021 14.51, Klaus Ethgen wrote:
> > nvidia/current/nvidia-modprobe.conf, which is linked via
> > modprobe.d/nvidia.conf -> /etc/alternatives/glx--nvidia-modprobe.conf ->
> > nvidia/nvidia-modprobe.conf ->
> > /etc/alternatives/nvidia--nvidia-modprobe.conf
> > is gone now. It was in package nvidia-kernel-support.
>
> No, it isn't.

Well, it is not installed anymore after the update to this version. And from the report of the OP, I think it is the same on his system. So, yes, it is gone.

> What is your glx alternative pointing to?
> (update-glx --display glx)

glx - auto mode
  link best version is /usr/lib/nvidia
  link currently points to /usr/lib/nvidia
  link glx is /usr/lib/glx
  slave glx--libEGL.so.1-i386-linux-gnu is /usr/lib/i386-linux-gnu/libEGL.so.1
  slave glx--libEGL.so.1-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libEGL.so.1
  slave glx--libGL.so.1-i386-linux-gnu is /usr/lib/i386-linux-gnu/libGL.so.1
  slave glx--libGL.so.1-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libGL.so.1
  slave glx--libGLESv1_CM.so.1-i386-linux-gnu is /usr/lib/i386-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv1_CM.so.1-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv2.so.2-i386-linux-gnu is /usr/lib/i386-linux-gnu/libGLESv2.so.2
  slave glx--libGLESv2.so.2-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libGLESv2.so.2
  slave glx--libGLX_indirect.so.0-i386-linux-gnu is /usr/lib/i386-linux-gnu/libGLX_indirect.so.0
  slave glx--libGLX_indirect.so.0-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0
  slave glx--libglxserver_nvidia.so is /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
  slave glx--libnvidia-cfg.so.1-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1
  slave glx--nvidia-blacklists-nouveau.conf is /etc/modprobe.d/nvidia-blacklists-nouveau.conf
  slave glx--nvidia-bug-report.sh is /usr/bin/nvidia-bug-report.sh
  slave glx--nvidia-drm-outputclass.conf is /usr/share/X11/xorg.conf.d/nvidia-drm-outputclass.conf
  slave glx--nvidia-load.conf is /etc/modules-load.d/nvidia.conf
  slave glx--nvidia_drv.so is /usr/lib/xorg/modules/drivers/nvidia_drv.so
/usr/lib/mesa-diverted - priority 5
  slave glx--libEGL.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libEGL.so.1
  slave glx--libEGL.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libEGL.so.1
  slave glx--libGL.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGL.so.1
  slave glx--libGL.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGL.so.1
  slave glx--libGLESv1_CM.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv1_CM.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv2.so.2-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGLESv2.so.2
  slave glx--libGLESv2.so.2-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGLESv2.so.2
  slave glx--libGLX_indirect.so.0-i386-linux-gnu: /usr/lib/i386-linux-gnu/libGLX_mesa.so.0
  slave glx--libGLX_indirect.so.0-x86_64-linux-gnu: /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0
/usr/lib/nvidia - priority 100
  slave glx--libEGL.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libEGL.so.1
  slave glx--libEGL.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libEGL.so.1
  slave glx--libGL.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGL.so.1
  slave glx--libGL.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGL.so.1
  slave glx--libGLESv1_CM.so.1-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv1_CM.so.1-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGLESv1_CM.so.1
  slave glx--libGLESv2.so.2-i386-linux-gnu: /usr/lib/mesa-diverted/i386-linux-gnu/libGLESv2.so.2
  slave glx--libGLESv2.so.2-x86_64-linux-gnu: /usr/lib/mesa-diverted/x86_64-linux-gnu/libGLESv2.so.2
  slave glx--libGLX_indirect.so.0-i386-linux-gnu: /usr/lib/i386-linux-gnu/libGLX_nvidia.so.0
  slave glx--libGLX_indirect.so.0-x86_64-linux-gnu: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
  slave glx--libglxserver_nvidia.so: /usr/lib/nvidia/libglxserver_nvidia.so
  slave glx--libnvidia-cfg.so.1-x86_64-linux-gnu: /usr/lib/x86_64-linux-gnu/nvidia/libnvidia-cfg.so.1
  slave glx--nvidia-blacklists-nouveau.conf: /etc/nvidia/nvidia-blacklists-nouveau.conf
  slave glx--nvidia-bug-report.sh: /usr/lib/nvidia/nvidia-bug-report.sh
  slave glx--nvidia-drm-outputclass.conf: /etc/nvidia/nvidia-drm-outputclass.conf
  slave glx--nvidia-load.conf: /etc/nvidia/nvidia-load.conf
  slave glx--nvidia_drv.so: /usr/lib/nvidia/nvidia_drv.so
/usr/
Bug#994971: OpenCL not working with latest Nvidia driver
On 26/09/2021 14.51, Klaus Ethgen wrote:
> nvidia/current/nvidia-modprobe.conf, which is linked via
> modprobe.d/nvidia.conf -> /etc/alternatives/glx--nvidia-modprobe.conf ->
> nvidia/nvidia-modprobe.conf ->
> /etc/alternatives/nvidia--nvidia-modprobe.conf
> is gone now. It was in package nvidia-kernel-support.

No, it isn't.

What is your glx alternative pointing to?
(update-glx --display glx)

Andreas
Bug#994971: OpenCL not working with latest Nvidia driver
On Sun, 26 Sep 2021 at 12:52, Andreas Beckmann wrote:
> On 26/09/2021 11.37, Pascal Obry wrote:
> > > That's caused as the modprobe config was dropped. Without it, the needed
> > > modules are not loaded automatically anymore.
>
> I haven't touched the modprobe config recently ... IIRC using cuda usually
> loaded the module if it wasn't present, but perhaps nvidia has changed
> something in that area ...

nvidia/current/nvidia-modprobe.conf, which is linked via
modprobe.d/nvidia.conf -> /etc/alternatives/glx--nvidia-modprobe.conf ->
nvidia/nvidia-modprobe.conf ->
/etc/alternatives/nvidia--nvidia-modprobe.conf
is gone now. It was in package nvidia-kernel-support.

The content was:

install nvidia modprobe -i nvidia-current $CMDLINE_OPTS

install nvidia-modeset modprobe nvidia ; modprobe -i nvidia-current-modeset $CMDLINE_OPTS

install nvidia-drm modprobe nvidia-modeset ; modprobe -i nvidia-current-drm $CMDLINE_OPTS

install nvidia-peermem modprobe nvidia ; modprobe -i nvidia-current-peermem $CMDLINE_OPTS

install nvidia-uvm modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS

remove nvidia modprobe -r -i nvidia-drm nvidia-modeset nvidia-peermem nvidia-uvm nvidia

remove nvidia-modeset modprobe -r -i nvidia-drm nvidia-modeset

# These aliases are defined in *all* nvidia modules.
# Duplicating them here sets higher precedence and ensures the selected
# module gets loaded instead of a random first match if more than one
# version is installed. See #798207.
alias pci:v10DEd0E00sv*sd*bc04sc80i00* nvidia
alias pci:v10DEd0AA3sv*sd*bc0Bsc40i00* nvidia
alias pci:v10DEd*sv*sd*bc03sc02i00* nvidia
alias pci:v10DEd*sv*sd*bc03sc00i00* nvidia

> > > A simple `modprobe nvidia-current-uvm` fixes the issue temporarily.
> > > However, the modprobe config needs to come back.
>
> I don't think it is gone.

See above.

Regards
   Klaus
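To see where such a chain of alternatives breaks, each symlink can be followed one hop at a time. The following is a minimal sketch; `trace_links` is a hypothetical helper, and it assumes a `readlink` utility is available (as it is with coreutils).

```shell
#!/bin/sh
# Sketch: follow a symlink chain hop by hop and report whether the final
# target exists. trace_links is a hypothetical helper.
trace_links() {
    path=$1
    hops=0

    while [ -L "$path" ]; do
        target=$(readlink "$path")

        # Resolve relative targets against the directory of the link.
        case $target in
            /*) ;;
            *) target=$(dirname "$path")/$target ;;
        esac

        echo "$path -> $target"
        path=$target

        # Guard against symlink loops.
        hops=$((hops + 1))
        [ "$hops" -lt 40 ] || { echo "too many links" >&2; return 2; }
    done

    if [ -e "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Example:
#   trace_links /etc/modprobe.d/nvidia.conf
```

On an affected system the first hop would already be reported as missing, since the etckeeper log in this thread shows that the symlinks themselves were deleted too.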
Bug#994971: OpenCL not working with latest Nvidia driver
Control: severity -1 important

On 26/09/2021 11.37, Pascal Obry wrote:
> > That's caused as the modprobe config was dropped. Without it, the needed
> > modules are not loaded automatically anymore.

I haven't touched the modprobe config recently ... IIRC using cuda usually loaded the module if it wasn't present, but perhaps nvidia has changed something in that area ...

> > A simple `modprobe nvidia-current-uvm` fixes the issue temporarily.
> > However, the modprobe config needs to come back.

I don't think it is gone.

Andreas
Bug#994971: OpenCL not working with latest Nvidia driver
Hi Klaus,

> That's caused as the modprobe config was dropped. Without it, the needed
> modules are not loaded automatically anymore.
>
> A simple `modprobe nvidia-current-uvm` fixes the issue temporarily.
> However, the modprobe config needs to come back.

Will test this.

> By the way, I would rate this as a normal bug instead of grave.

I fully agree after re-reading the rating description. Sorry for that.

Thanks,

--
Pascal Obry / Magny Les Hameaux (78)
The best way to travel is by means of imagination
http://www.obry.net
gpg --keyserver keys.gnupg.net --recv-key F949BD3B
Bug#994971: OpenCL not working with latest Nvidia driver
Hi,

On Fri, 24 Sep 2021 at 8:46, Pascal Obry wrote:
> I'm currently using GNU/Debian sid. The current NVidia driver is
> 470.57.02-2 and OpenCL is working fine.
[...]
> There is a new version available 470.57.02-3 and when installed OpenCL
> is not supported. clinfo report that there is 0 platform
> supported/detected.

That's caused as the modprobe config was dropped. Without it, the needed modules are not loaded automatically anymore.

A simple `modprobe nvidia-current-uvm` fixes the issue temporarily. However, the modprobe config needs to come back.

By the way, I would rate this as a normal bug instead of grave.

Regards
   Klaus
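Whether the workaround is needed at all can be checked by looking for the module in /proc/modules before calling modprobe. A rough sketch follows; `module_loaded` is a hypothetical helper, and the name under which the UVM module shows up once loaded (`nvidia_uvm` in the example comment) is an assumption, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: succeed if a kernel module is listed in /proc/modules.
# module_loaded is a hypothetical helper.
# Usage: module_loaded NAME [MODULES_FILE]
module_loaded() {
    # The kernel lists module names with underscores instead of hyphens.
    name=$(printf '%s\n' "$1" | sed 's/-/_/g')
    modules_file=${2:-/proc/modules}

    # The first field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example (as root). The loaded module name nvidia_uvm is an assumption;
# the modprobe call is the temporary workaround quoted above.
#   module_loaded nvidia-uvm || modprobe nvidia-current-uvm
```

The optional second argument exists only so that the check can be pointed at a saved copy of /proc/modules.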
Bug#994971: OpenCL not working with latest Nvidia driver
Package: nvidia-driver
Version: 470.57.02-3
Severity: grave

I'm currently using GNU/Debian sid. The current NVidia driver is 470.57.02-2 and OpenCL is working fine.

$ clinfo
Number of platforms    1
  Platform Name        NVIDIA CUDA
  Platform Vendor      NVIDIA Corporation
  Platform Version     OpenCL 3.0 CUDA 11.4.94
...

There is a new version available, 470.57.02-3, and when it is installed OpenCL is not supported. clinfo reports that 0 platforms are supported/detected.

I have this issue with kernel 5.10.0-8-amd64 and the new 5.14.

The only solution I have found at this point is to revert the NVidia driver to 470.57.02-2 and kernel 5.10.

I cannot tell if this is a kernel issue or an NVidia driver one. Maybe someone with more knowledge in this area could help find out.

Thanks,

--
Pascal Obry