Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Control: tag -1 moreinfo

On Tue, 22 Mar 2016 19:48:29 +0100 Michael Below wrote:
> $ nvidia-modprobe -u
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

Looking at this bug report again, this was likely an issue with nvidia-modprobe (#888952), fixed in nvidia-modprobe 384.111-2, which is also available in stretch.

Andreas
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Hi,

On Mon, 21 Mar 2016 23:29:01 CET, Andreas Beckmann wrote:
> So let's try what should happen within clinfo manually:
>
> (as root)
> # modprobe -r nvidia-uvm
>
> (as user)
> $ modprobe -v nvidia-uvm

I had to add /sbin to the path; the results are similar to what you are expecting:

$ PATH=$PATH:/sbin modprobe -v nvidia-uvm
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

> But the libnvidia-opencl.so.1 library does not call modprobe directly,
> it uses the nvidia-modprobe setuid root wrapper instead:
>
> (as user)
> $ nvidia-modprobe -u
> $ dmesg | tail
> $ ls -la /usr/bin/nvidia-modprobe
>
> Does that work?
> If it doesn't, do you use anything for extra hardening of the system?
> (selinux, apparmor, ...?)

It doesn't work:

$ nvidia-modprobe -u
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

$ dmesg|tail
[ 30.210734] systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 724 (update-binfmts)
[ 30.225790] systemd[1]: Mounting Arbitrary Executable File Formats File System...
[ 30.310574] systemd[1]: Started Journal Service.
[ 30.788185] systemd-journald[717]: Received request to flush runtime journal from PID 1
[ 33.445899] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 36.568635] tg3 :3f:00.0 eth0: Link is up at 1000 Mbps, full duplex
[ 36.568658] tg3 :3f:00.0 eth0: Flow control is on for TX and on for RX
[ 36.568686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 134.948174] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 7
[ 257.279351] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 7

$ ls -la /usr/bin/nvidia-modprobe
-rwsr-xr-x 1 root root 31224 Feb 17 00:47 /usr/bin/nvidia-modprobe

apparmor is installed, but I have not changed the configuration. Maybe they have tightened the rules in a recent update? Should I remove it?

Cheers
Michael
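Whether apparmor is merely installed or actually active in the running kernel can be checked before removing anything. The following is only a sketch: the helper name `apparmor_enabled` is made up for illustration, and it assumes the kernel exposes the AppArmor module parameter under /sys/module/apparmor/parameters/enabled with the value "Y" when the module is enabled.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeeds if the AppArmor
# flag file contains "Y". The default path is an assumption about where the
# kernel exposes the parameter; the optional argument exists for testing.
apparmor_enabled() {
    f=${1:-/sys/module/apparmor/parameters/enabled}
    [ -r "$f" ] && [ "$(cat "$f")" = "Y" ]
}

# Example use:
#   apparmor_enabled && echo "AppArmor is active" || echo "AppArmor is not active"
```

If the helper reports that AppArmor is not active, its rules are unlikely to be what blocks the module load.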
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
[Please keep the bug Cc:ed]

On 2016-03-21 22:09, Michael Below wrote:
> Hi,
>
> On Mon, 21 Mar 2016 00:12:04 CET, Andreas Beckmann wrote:
>
>>> I just did a reboot, started darktable
>>
>> you start darktable manually?
>
> Yes, for testing purposes I start darktable from a terminal window
> using "darktable -d opencl" for opencl debugging output. Otherwise you
> just notice that the image processing is _slow_.
>
>> Which libopencl1 library are you using?
>> (Note to myself: collect this info in bug-control, and the installed
>> icds, too, and /etc/OpenCL/vendors in bug-script)
>
> I was using ocl-icd-opencl1 2.2.9-1. I noticed that I have both an
> amd64 and an i386 version of the library installed, and I don't
> think I am using any i386 opencl programs -- should I remove that one?
>
>> Please install the ocl-icd-libopencl1 and clinfo packages from
>> stretch.
>
> Done, I already had those installed.
>
>> and now try clinfo (on the nvidia platform only) to see whether the
>> nvidia-uvm module is loaded automatically
>
> First I tried it without setting the variable:
>
> $ clinfo
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
> Number of platforms 0
>
> Then I tried it with OCL_ICD_VENDORS set, same result.
>
> Any idea what to check next? Maybe the nvidia-icd-libopencl1?

No, it's not a problem with libOpenCL.so, but some permission problem with loading the module.
So let's try what should happen within clinfo manually:

(as root)
# modprobe -r nvidia-uvm

(as user)
$ modprobe -v nvidia-uvm

I expect this to fail similarly to the following error:

=
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS
insmod /lib/modules/4.3.0-1-amd64/nvidia/nvidia-current.ko
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
=

But the libnvidia-opencl.so.1 library does not call modprobe directly, it uses the nvidia-modprobe setuid root wrapper instead:

(as user)
$ nvidia-modprobe -u
$ dmesg | tail
$ ls -la /usr/bin/nvidia-modprobe

Does that work?
If it doesn't, do you use anything for extra hardening of the system? (selinux, apparmor, ...?)

Andreas
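The `ls -la` step above is there to confirm that the wrapper really carries the setuid bit (the `s` in `-rwsr-xr-x`). A minimal sketch of the same check in script form follows; the helper name `check_setuid` is hypothetical, and it only looks at the mode bit, not at the owner, which `ls -la` still has to show as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): prints "setuid" if the file
# exists, is executable and has the setuid bit set, "not-setuid" if the bit
# or the execute permission is absent, and "missing" if there is no such file.
check_setuid() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ -u "$f" ] && [ -x "$f" ]; then
        echo "setuid"
    else
        echo "not-setuid"
    fi
}

# Example use:
#   check_setuid /usr/bin/nvidia-modprobe
```

A result other than "setuid" would explain an "Operation not permitted" error on its own; a "setuid" result, as in this report, points elsewhere.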
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Control: retitle -1 nvidia-kernel-dkms: autoloading nvidia-uvm fails

On 2016-03-20 22:36, Michael Below wrote:
>> sudo modprobe -v nvidia-uvm
>>
>> What's the output?
>
> That just works???

Good!

> I just did a reboot, started darktable

You start darktable manually?

> -- again, with the openCL
> errors -- tried glxgears, but now manually nvidia-uvm is loaded fine,
> and after it is loaded darktable openCL works too.
>
> Sorry, I jumped to a conclusion, this seems to be a different issue
> than the previous bug report.

No problem, now we have something different to investigate :-)

Which libopencl1 library are you using?
(Note to myself: collect this info in bug-control, and the installed icds, too, and /etc/OpenCL/vendors in bug-script)

Please install the ocl-icd-libopencl1 and clinfo packages from stretch.

Unload the nvidia-uvm module (nvidia will stay loaded, so X can continue to run):

(as root)
# modprobe -r nvidia-uvm

and now try clinfo (on the nvidia platform only) to see whether the nvidia-uvm module is loaded automatically:

(as user)
$ OCL_ICD_VENDORS=nvidia.icd clinfo
$ dmesg | tail

That worked for me, i.e. nvidia-uvm got loaded automatically.

If that works, try unloading again and start darktable with OCL_ICD_VENDORS=nvidia.icd set.

Andreas
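Besides reading `dmesg | tail`, whether the module got loaded automatically can be checked directly against the kernel's module list. This is a sketch only: the helper name `module_loaded` is hypothetical, and the optional second argument (a file in /proc/modules or lsmod format, module name in the first column) exists purely so the function can be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeeds if the named module
# appears in the first column of a /proc/modules-style listing.
module_loaded() {
    # Loaded modules are listed with underscores, so nvidia-uvm
    # shows up as nvidia_uvm.
    name=$(printf '%s\n' "$1" | sed 's/-/_/g')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use, after running clinfo as a normal user:
#   module_loaded nvidia-uvm && echo "nvidia-uvm loaded" || echo "nvidia-uvm not loaded"
```

Running it once after `modprobe -r nvidia-uvm` and once after `clinfo` shows whether the autoload step did anything.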
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
On Mar 20, 2016 22:33, "Andreas Beckmann" wrote:
>
> On 2016-03-20 21:06, Luca Boccassi wrote:
> > But I noticed something strange that caught my attention in your Xorg
> > log:
> >
> > [ 2634.192] (II) NVIDIA GLX Module 340.93 Wed Aug 19 16:23:51 PDT 2015
> > [ 2634.192] (II) LoadModule: "nvidia"
> > [ 2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
> > [ 2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
> > [ 2634.193] compiled for 4.0.2, module version = 1.0.0
> > [ 2634.193] Module class: X.Org Video Driver
> > [ 2634.193] (II) NVIDIA dlloader X Driver 340.93 Wed Aug 19 16:01:53 PDT 2015
> >
> > It looks like the older 340.93 library is being loaded? But all the
> > symlinks and the package versions in your system indicate that 352.79 is
> > installed, any idea why this might be happening?
>
> That's a red herring. Ancient logfiles.
>
> > -rw-r--r-- 1 root root 21716 Oct 21 20:45 /var/log/Xorg.0.log
> > -rw-r--r-- 1 root root 21716 Oct 21 20:43 /var/log/Xorg.0.log.old

Ah good point, missed the dates.

> Unfortunately nothing from journald:
> (should be investigated!)
>
> > << Xorg (journald) >>
> > ^^ Xorg (journald) ^^
>
> There seem to be no traces from the 340.xx driver left.

Strange that there's no recent Xorg log and nothing in journald.

Michael, is that by any chance a headless dev machine, i.e. one where you don't normally run a graphical Xorg session at all?

Kind regards,
Luca Boccassi
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
On 2016-03-20 21:06, Luca Boccassi wrote:
> But I noticed something strange that caught my attention in your Xorg
> log:
>
> [ 2634.192] (II) NVIDIA GLX Module 340.93 Wed Aug 19 16:23:51 PDT 2015
> [ 2634.192] (II) LoadModule: "nvidia"
> [ 2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
> [ 2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
> [ 2634.193] compiled for 4.0.2, module version = 1.0.0
> [ 2634.193] Module class: X.Org Video Driver
> [ 2634.193] (II) NVIDIA dlloader X Driver 340.93 Wed Aug 19 16:01:53 PDT 2015
>
> It looks like the older 340.93 library is being loaded? But all the
> symlinks and the package versions in your system indicate that 352.79 is
> installed, any idea why this might be happening?

That's a red herring. Ancient logfiles.

> -rw-r--r-- 1 root root 21716 Oct 21 20:45 /var/log/Xorg.0.log
> -rw-r--r-- 1 root root 21716 Oct 21 20:43 /var/log/Xorg.0.log.old

Unfortunately nothing from journald:
(should be investigated!)

> << Xorg (journald) >>
> ^^ Xorg (journald) ^^

There seem to be no traces from the 340.xx driver left.

Andreas
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Hi,

On Sun, 20 Mar 2016 20:06:10 CET, Luca Boccassi wrote:
> It looks like the older 340.93 library is being loaded? But all the
> symlinks and the package versions in your system indicate that 352.79
> is installed, any idea why this might be happening?

Sorry, no idea... Maybe there has been an error during DKMS installation so that the module was not replaced properly? But the current module seems to be there in glxgears?

> If you run:
>
> glxgears -info
>
> What's the reported version?

$ glxgears -info
Running synchronized to the vertical refresh. The framerate should be approximately the same as the monitor refresh rate.
GL_RENDERER = GeForce GTX 750 Ti/PCIe/SSE2
GL_VERSION= 4.5.0 NVIDIA 352.79
GL_VENDOR = NVIDIA Corporation

> Finally, if you run manually:
>
> sudo modprobe -v nvidia-uvm
>
> What's the output?

That just works???

$ sudo modprobe -v nvidia-uvm
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current-uvm.ko

$ lsmod|grep nvidia
nvidia_uvm 73728 0
nvidia 8540160 85 nvidia_uvm
drm 356352 6 nvidia

I just did a reboot, started darktable -- again, with the openCL errors -- tried glxgears, but now manually nvidia-uvm is loaded fine, and after it is loaded darktable openCL works too.

Sorry, I jumped to a conclusion, this seems to be a different issue than the previous bug report.

Cheers
Michael
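The question behind the `glxgears -info` request is whether the userspace driver and the loaded kernel module are the same version. A rough sketch of that comparison follows; the helper names `gl_driver_version` and `kernel_driver_version` are hypothetical, and the patterns assume the two line formats quoted in this report (the `GL_VERSION` line from glxgears and the `NVRM version:` line from /proc/driver/nvidia/version).

```shell
#!/bin/sh
# Hypothetical helpers (not part of any package). Both read text on stdin
# and print the NVIDIA driver version if a matching line is found.

# Matches a line such as: GL_VERSION= 4.5.0 NVIDIA 352.79
gl_driver_version() {
    sed -n 's/^GL_VERSION *= *.*NVIDIA *\([0-9][0-9.]*\).*$/\1/p'
}

# Matches a line such as:
# NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79 Wed Jan 13 16:17:53 PST 2016
kernel_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*$/\1/p'
}

# Example use:
#   kernel_driver_version < /proc/driver/nvidia/version
#   printf '%s\n' 'GL_VERSION= 4.5.0 NVIDIA 352.79' | gl_driver_version
```

If both helpers print the same version (352.79 in this report), a stale 340.xx installation is unlikely to be involved.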
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Control: tag -1 moreinfo

On Sun, 2016-03-20 at 16:07 +0100, Michael Below wrote:
> Package: nvidia-kernel-dkms
> Version: 352.79-5
> Severity: normal
>
> Dear Maintainer,
>
> after some updates to my Debian testing installation I noticed that OpenCL
> stopped working with my graphics card. For darktable, the debug log looked
> like:
>
> [opencl_init] found opencl runtime library 'libOpenCL'
> [opencl_init] opencl library 'libOpenCL' found on your system and loaded
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
> [opencl_init] could not get platforms: -1001
> [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
> [opencl_init] initial status of opencl enabled flag is OFF.
>
> This seems to be the same bug as in #812396, so I tried the same fix as
> proposed there (sudo modprobe --force-modversion nvidia-current-uvm), and now
> it works:
>
> [opencl_init] found opencl runtime library 'libOpenCL'
> [opencl_init] opencl library 'libOpenCL' found on your system and loaded
> [opencl_init] found 1 platform
> [opencl_init] found 1 device
> [opencl_init] device 0 `GeForce GTX 750 Ti' has sm_20 support.
> [opencl_init] device 0 `GeForce GTX 750 Ti' supports image sizes of 16384 x 16384
> [opencl_init] device 0 `GeForce GTX 750 Ti' allows GPU memory allocations of up to 511MB
> [opencl_init] device 0: GeForce GTX 750 Ti
> GLOBAL_MEM_SIZE: 2047MB
> MAX_WORK_GROUP_SIZE: 1024
> MAX_WORK_ITEM_DIMENSIONS: 3
> MAX_WORK_ITEM_SIZES: [ 1024 1024 64 ]
> DRIVER_VERSION: 352.79
> DEVICE_VERSION: OpenCL 1.2 CUDA
> [opencl_init] compiling program `demosaic_ppg.cl' ..
> [opencl_load_program] could not load cached binary program, trying to compile source
> [opencl_load_program] successfully loaded program from `/usr/share/darktable/kernels/demosaic_ppg.cl'
> [opencl_build_program] successfully built program
> [opencl_build_program] BUILD STATUS: 0

Hi,

Sorry for your problems. Unfortunately I cannot reproduce this with the same version of the drivers on my amd64 Jessie desktop; opencl-demo works just fine and nvidia-uvm loads:

./cl-demo 10 5
Choose platform:
[0] NVIDIA Corporation
Enter choice:
Choose device:
[0] GeForce GTX 780
Enter choice:
-
NAME: GeForce GTX 780
VENDOR: NVIDIA Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.2 CUDA
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
DRIVER_VERSION: 352.79
0.000265 s
0.000453 GB/s
GOOD

$ lsmod | grep nvidia
nvidia_uvm 73728 0
nvidia 8540160 71 nvidia_uvm
drm 352256 3 nvidia

But I noticed something strange that caught my attention in your Xorg log:

[ 2634.192] (II) NVIDIA GLX Module 340.93 Wed Aug 19 16:23:51 PDT 2015
[ 2634.192] (II) LoadModule: "nvidia"
[ 2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[ 2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 2634.193] compiled for 4.0.2, module version = 1.0.0
[ 2634.193] Module class: X.Org Video Driver
[ 2634.193] (II) NVIDIA dlloader X Driver 340.93 Wed Aug 19 16:01:53 PDT 2015

It looks like the older 340.93 library is being loaded? But all the symlinks and the package versions in your system indicate that 352.79 is installed, any idea why this might be happening?

If you run:

glxgears -info

What's the reported version?

Finally, if you run manually:

sudo modprobe -v nvidia-uvm

What's the output?
Kind regards,
Luca Boccassi
Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)
Package: nvidia-kernel-dkms
Version: 352.79-5
Severity: normal

Dear Maintainer,

after some updates to my Debian testing installation I noticed that OpenCL stopped working with my graphics card. For darktable, the debug log looked like:

[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

This seems to be the same bug as in #812396, so I tried the same fix as proposed there (sudo modprobe --force-modversion nvidia-current-uvm), and now it works:

[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[opencl_init] device 0 `GeForce GTX 750 Ti' has sm_20 support.
[opencl_init] device 0 `GeForce GTX 750 Ti' supports image sizes of 16384 x 16384
[opencl_init] device 0 `GeForce GTX 750 Ti' allows GPU memory allocations of up to 511MB
[opencl_init] device 0: GeForce GTX 750 Ti
GLOBAL_MEM_SIZE: 2047MB
MAX_WORK_GROUP_SIZE: 1024
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_ITEM_SIZES: [ 1024 1024 64 ]
DRIVER_VERSION: 352.79
DEVICE_VERSION: OpenCL 1.2 CUDA
[opencl_init] compiling program `demosaic_ppg.cl' ..
[opencl_load_program] could not load cached binary program, trying to compile source
[opencl_load_program] successfully loaded program from `/usr/share/darktable/kernels/demosaic_ppg.cl'
[opencl_build_program] successfully built program
[opencl_build_program] BUILD STATUS: 0

Thanks for your work!
Michael

-- Package-specific info:
uname -a:
Linux ossietzky 4.4.0-1-amd64 #1 SMP Debian 4.4.6-1 (2016-03-17) x86_64 GNU/Linux

/proc/version:
Linux version 4.4.0-1-amd64 (debian-ker...@lists.debian.org) (gcc version 5.3.1 20160307 (Debian 5.3.1-11) ) #1 SMP Debian 4.4.6-1 (2016-03-17)

/proc/driver/nvidia/version:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.79 Wed Jan 13 16:17:53 PST 2016
GCC version: gcc version 5.3.1 20160307 (Debian 5.3.1-11)

lspci 'VGA compatible controller [0300]':
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] [10de:1380] (rev a2) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. GM107 [GeForce GTX 750 Ti] [3842:3751]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
Kernel driver in use: nvidia
Kernel modules: nvidia

dmesg:
[0.00] AGP: No AGP bridge found
[0.00] AGP: Checking aperture...
[0.00] AGP: No AGP bridge found
[0.00] AGP: Node 0: aperture [bus addr 0xd400-0xd5ff] (32MB)
[0.00] AGP: Your BIOS doesn't leave an aperture memory hole
[0.00] AGP: Please enable the IOMMU option in the BIOS setup
[0.00] AGP: This costs you 64MB of RAM
[0.00] AGP: Mapping aperture over RAM [mem 0xd400-0xd7ff] (65536KB)
[0.00] Console: colour VGA+ 80x25
[0.247215] vgaarb: setting as boot device: PCI::02:00.0
[0.247218] vgaarb: device added: PCI::02:00.0,decodes=io+mem,owns=io+mem,locks=none
[0.247219] vgaarb: loaded
[0.247220] vgaarb: bridge control possible :02:00.0
[0.824987] PCI-DMA: Disabling AGP.
[0.825082] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[0.851942] Linux agpgart interface v0.103
[ 16.001321] nvidia: module license 'NVIDIA' taints kernel.
[ 16.013241] vgaarb: device changed decodes: PCI::02:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 16.013553] [drm] Initialized nvidia-drm 0.0.0 20150116 for :02:00.0 on minor 0
[ 16.013565] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.79 Wed Jan 13 16:17:53 PST 2016
[ 18.432502] snd_hda_intel :02:00.1: Handle vga_switcheroo audio client
[ 19.069316] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci:00/:00:02.0/:02:00.1/sound/card0/input10
[ 19.069533] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci:00/:00:02.0/:02:00.1/sound/card0/input11
[ 19.069714] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci:00/:00:02.0/:02:00.1/sound/card0/input12
[ 659.062928] nvidia: module_layout: kernel tainted.
[ 659.220248] nvidia_uvm: Loaded the UVM driver, major device number 247

Device node permissions:
crw-rw+ 1 root video 226, 0 Mar 20 15:39 /dev/dri/card0
crw-rw-rw- 1 root root 247, 0 Mar 20 15:49 /dev/nvidia-uvm