Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2018-07-08 Thread Andreas Beckmann
Control: tag -1 moreinfo

On Tue, 22 Mar 2016 19:48:29 +0100 Michael Below  wrote:

> $ nvidia-modprobe -u
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

Looking at this bug report again, this was likely an issue with
nvidia-modprobe (#888952), fixed in nvidia-modprobe 384.111-2, also
available in stretch.


Andreas



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-22 Thread Michael Below
Hi,

On Mon, 21 Mar 2016 23:29:01 CET
Andreas Beckmann wrote:

> So let's try what should happen within clinfo manually:
> 
> (as root)
> # modprobe -r nvidia-uvm
> 
> (as user)
> $ modprobe -v nvidia-uvm
 
I had to add /sbin to the path, the results are similar to what you are
expecting:

$ PATH=$PATH:/sbin modprobe -v nvidia-uvm
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS 
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko 
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

> But the libnvidia-opencl.so.1 library does not call modprobe directly,
> it uses the nvidia-modprobe setuid root wrapper instead:
> 
> (as user)
> $ nvidia-modprobe -u
> $ dmesg | tail
> $ ls -la /usr/bin/nvidia-modprobe
> 
> Does that work?
> If it doesn't, do you use anything for extra hardening of the system?
> (selinux, apparmor, ...?)

It doesn't work:

$ nvidia-modprobe -u
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted

$ dmesg|tail
[   30.210734] systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 724 (update-binfmts)
[   30.225790] systemd[1]: Mounting Arbitrary Executable File Formats File System...
[   30.310574] systemd[1]: Started Journal Service.
[   30.788185] systemd-journald[717]: Received request to flush runtime journal from PID 1
[   33.445899] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   36.568635] tg3 :3f:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   36.568658] tg3 :3f:00.0 eth0: Flow control is on for TX and on for RX
[   36.568686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  134.948174] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 7
[  257.279351] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 7

$ ls -la /usr/bin/nvidia-modprobe
-rwsr-xr-x 1 root root 31224 Feb 17 00:47 /usr/bin/nvidia-modprobe

apparmor is installed, but I have not changed the configuration.
Maybe they have tightened the rules in a recent update? Should I remove
it?

Cheers
Michael
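
To help answer the AppArmor question above, here is a minimal sketch of a read-only check. It assumes the usual sysfs flag at /sys/module/apparmor/parameters/enabled and the usual `apparmor="DENIED"` marker in kernel audit messages; the function names `apparmor_enabled` and `count_denials` are hypothetical helpers, not part of any package. It only reports whether AppArmor is active and whether the kernel log contains denials; it does not prove or rule out AppArmor as the cause.

```shell
#!/bin/sh
# Sketch only: read-only AppArmor checks, no system state is changed.

# apparmor_enabled [FILE]: succeeds if the flag file contains "Y".
# FILE defaults to the sysfs parameter exposed by the apparmor module.
apparmor_enabled() {
    flag=${1:-/sys/module/apparmor/parameters/enabled}
    [ -r "$flag" ] && [ "$(cat "$flag")" = "Y" ]
}

# count_denials: reads log text on stdin and prints the number of lines
# that contain an AppArmor denial. grep -c exits non-zero on zero
# matches, so "|| true" keeps the function's status at 0.
count_denials() {
    grep -c 'apparmor="DENIED"' || true
}

if apparmor_enabled; then
    echo "AppArmor: enabled"
else
    echo "AppArmor: not enabled (or status unreadable)"
fi

# dmesg may require root when kernel.dmesg_restrict=1 is set.
denials=$(dmesg 2>/dev/null | count_denials)
echo "AppArmor denials in kernel log: $denials"
```

A non-zero denial count would be a reason to look at the individual messages before removing the package; a count of zero suggests looking elsewhere first.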



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-21 Thread Andreas Beckmann
[Please keep the bug Cc:ed]

On 2016-03-21 22:09, Michael Below wrote:
> Hi,
> 
> On Mon, 21 Mar 2016 00:12:04 CET
> Andreas Beckmann wrote:
>  
>>> I just did a reboot, started darktable   
>>
>> you start darktable manually?
> 
> Yes, for testing purposes I start darktable from a terminal window
> using "darktable -d opencl" for opencl debugging output. Otherwise you
> just notice that the image processing is _slow_.
> 
>> Which libopencl1 library are you using?
>> (Note to myself: collect this info in bug-control, and the installed
>> icds, too, and /etc/OpenCL/vendors in bug-script)
> 
> I was using ocl-icd-opencl1 2.2.9-1. I noticed that I have both an
> amd64 and an i386 version of the library installed, and I don't
> think I am using any i386 opencl programs -- should I remove that one?
> 
>> Please install the ocl-icd-libopencl1 and clinfo packages from
>> stretch.
> 
> Done, I already had those installed.
> 
>> and now try clinfo (on the nvidia platform only) to see whether the
>> nvidia-uvm module is loaded automatically
> 
> First I tried it without setting the variable:
> 
> $ clinfo
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
> Number of platforms   0
> 
> Then I tried it with OCL_ICD_VENDORS set, same result.
> 
> Any idea what to check next? Maybe the nvidia-icd-libopencl1?

No, it's not a problem with libOpenCL.so, but some permission problem 
with loading the module.

So let's try what should happen within clinfo manually:

(as root)
# modprobe -r nvidia-uvm

(as user)
$ modprobe -v nvidia-uvm

I expect this to fail similarly to the following error:
=
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS 
insmod /lib/modules/4.3.0-1-amd64/nvidia/nvidia-current.ko 
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running 
install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
=

But the libnvidia-opencl.so.1 library does not call modprobe directly,
it uses the nvidia-modprobe setuid root wrapper instead:

(as user)
$ nvidia-modprobe -u
$ dmesg | tail
$ ls -la /usr/bin/nvidia-modprobe

Does that work?
If it doesn't, do you use anything for extra hardening of the system?
(selinux, apparmor, ...?)

Andreas
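
The checks requested above (is the module loaded, is the wrapper setuid root) can be summarised in a small read-only sketch. The helper names `is_module_loaded` and `is_setuid_root` are hypothetical, and `stat -c` assumes GNU coreutils as shipped on Debian. The script does not try to load anything, so it is safe to run as a normal user.

```shell
#!/bin/sh
# Sketch only: report module and wrapper state without changing anything.

# is_module_loaded NAME [FILE]: succeeds if NAME is listed in FILE
# (default /proc/modules). Dashes are mapped to underscores because
# the kernel lists nvidia-uvm as nvidia_uvm.
is_module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    file=${2:-/proc/modules}
    [ -r "$file" ] && grep -q "^${name} " "$file"
}

# is_setuid_root PATH: succeeds if PATH has the setuid bit and is owned
# by uid 0. "stat -c %u" is the GNU coreutils form (an assumption).
is_setuid_root() {
    [ -u "$1" ] && [ "$(stat -c %u "$1")" = "0" ]
}

for m in nvidia nvidia-uvm; do
    if is_module_loaded "$m"; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done

helper=/usr/bin/nvidia-modprobe
if is_setuid_root "$helper"; then
    echo "$helper: setuid root"
else
    echo "$helper: not setuid root (or missing)"
fi
```

If the wrapper is reported as setuid root and nvidia-uvm still fails to load via `nvidia-modprobe -u`, the permission problem is probably not in the file mode itself.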



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Andreas Beckmann
Control: retitle -1 nvidia-kernel-dkms: autoloading nvidia-uvm fails

On 2016-03-20 22:36, Michael Below wrote:
>> sudo modprobe -v nvidia-uvm
>>
>> What's the output?
> 
> That just works???

good!

> I just did a reboot, started darktable 

you start darktable manually?

> -- again, with the openCL
> errors -- tried glxgears, but now manually nvidia-uvm is loaded fine,
> and after it is loaded darktable openCL works too.
> 
> Sorry, I jumped to a conclusion, this seems to be a different issue
> than the previous bug report.

no problem, now we have something different to investigate :-)

Which libopencl1 library are you using?
(Note to myself: collect this info in bug-control, and the installed
icds, too, and /etc/OpenCL/vendors in bug-script)

Please install the ocl-icd-libopencl1 and clinfo packages from stretch.

unload the nvidia-uvm module (nvidia will stay loaded, so X can continue
to run)

(as root)
# modprobe -r nvidia-uvm

and now try clinfo (on the nvidia platform only) to see whether the
nvidia-uvm module is loaded automatically

(as user)
$ OCL_ICD_VENDORS=nvidia.icd clinfo
$ dmesg | tail

That worked for me, i.e. nvidia-uvm got loaded automatically.

If that works, try again unloading and start darktable with
OCL_ICD_VENDORS=nvidia.icd set


Andreas
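
The clinfo step above can be reduced to a single number for comparison between runs. This is a minimal sketch that assumes clinfo prints a line starting with "Number of platforms" followed by the count, as seen elsewhere in this thread; `platform_count` is a hypothetical helper name. It does not unload or load any module itself.

```shell
#!/bin/sh
# Sketch only: extract the platform count from clinfo output.

# platform_count: reads clinfo output on stdin and prints the last
# field of the first "Number of platforms" line. Prints nothing if
# no such line is present.
platform_count() {
    awk '/^Number of platforms/ { print $NF; exit }'
}

if command -v clinfo >/dev/null 2>&1; then
    # Restrict the ICD loader to the NVIDIA vendor file, as in the
    # instructions above; stderr (modprobe errors) is discarded here.
    n=$(OCL_ICD_VENDORS=nvidia.icd clinfo 2>/dev/null | platform_count)
    echo "NVIDIA OpenCL platforms visible: ${n:-unknown}"
else
    echo "clinfo not installed"
fi
```

Running it once after `modprobe -r nvidia-uvm` and checking `dmesg | tail` afterwards shows whether the module was autoloaded; a count of 0 matches the failure described in this report.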



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Luca Boccassi
On Mar 20, 2016 22:33, "Andreas Beckmann"  wrote:
>
> On 2016-03-20 21:06, Luca Boccassi wrote:
> > But I noticed something strange that caught my attention in your Xorg
> > log:
> >
> > [  2634.192] (II) NVIDIA GLX Module  340.93  Wed Aug 19 16:23:51 PDT
> > 2015
> > [  2634.192] (II) LoadModule: "nvidia"
> > [  2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
> > [  2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
> > [  2634.193]  compiled for 4.0.2, module version = 1.0.0
> > [  2634.193]  Module class: X.Org Video Driver
> > [  2634.193] (II) NVIDIA dlloader X Driver  340.93  Wed Aug 19 16:01:53
> > PDT 2015
> >
> > It looks like the older 340.93 library is being loaded? But all the
> > symlinks and the package versions in your system indicate that 352.79 is
> > installed, any idea why this might be happening?
>
> That's a red herring. Ancient logfiles.
>
> > -rw-r--r-- 1 root root 21716 Oct 21 20:45 /var/log/Xorg.0.log
> > -rw-r--r-- 1 root root 21716 Oct 21 20:43 /var/log/Xorg.0.log.old

Ah good point, missed the dates.

> Unfortunately nothing from journald:
> (should be investigated!)
>
> > << Xorg (journald) >>
> > ^^ Xorg (journald) ^^
>
> There seem to be no traces from the 340.xx driver left.

Strange that there's no recent Xorg log and nothing in journald.

Michael,

Is that by any chance a headless dev machine, i.e. one where you don't
normally run a graphical Xorg session at all?

Kind regards,
Luca Boccassi


Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Andreas Beckmann
On 2016-03-20 21:06, Luca Boccassi wrote:
> But I noticed something strange that caught my attention in your Xorg
> log:
> 
> [  2634.192] (II) NVIDIA GLX Module  340.93  Wed Aug 19 16:23:51 PDT
> 2015
> [  2634.192] (II) LoadModule: "nvidia"
> [  2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
> [  2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
> [  2634.193]  compiled for 4.0.2, module version = 1.0.0
> [  2634.193]  Module class: X.Org Video Driver
> [  2634.193] (II) NVIDIA dlloader X Driver  340.93  Wed Aug 19 16:01:53
> PDT 2015
> 
> It looks like the older 340.93 library is being loaded? But all the
> symlinks and the package versions in your system indicate that 352.79 is
> installed, any idea why this might be happening?

That's a red herring. Ancient logfiles.

> -rw-r--r-- 1 root root 21716 Oct 21 20:45 /var/log/Xorg.0.log
> -rw-r--r-- 1 root root 21716 Oct 21 20:43 /var/log/Xorg.0.log.old

Unfortunately nothing from journald:
(should be investigated!)

> << Xorg (journald) >>
> ^^ Xorg (journald) ^^

There seem to be no traces from the 340.xx driver left.


Andreas



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Michael Below
Hi,

On Sun, 20 Mar 2016 20:06:10 CET
Luca Boccassi wrote:

> It looks like the older 340.93 library is being loaded? But all the
> symlinks and the package versions in your system indicate that 352.79
> is installed, any idea why this might be happening?

Sorry, no idea... Maybe there has been an error during DKMS
installation so that the module was not replaced properly? But the
current module seems to be there in glxgears?

> If you run:
> 
> glxgears -info
> 
> What's the reported version?

$ glxgears -info
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = GeForce GTX 750 Ti/PCIe/SSE2
GL_VERSION= 4.5.0 NVIDIA 352.79
GL_VENDOR = NVIDIA Corporation

> Finally, if you run manually:
> 
> sudo modprobe -v nvidia-uvm
> 
> What's the output?

That just works???

$ sudo modprobe -v nvidia-uvm
install modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS 
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current.ko 
insmod /lib/modules/4.4.0-1-amd64/updates/dkms/nvidia-current-uvm.ko 

$ lsmod|grep nvidia
nvidia_uvm 73728  0
nvidia   8540160  85 nvidia_uvm
drm   356352  6 nvidia

I just did a reboot, started darktable -- again, with the openCL
errors -- tried glxgears, but now manually nvidia-uvm is loaded fine,
and after it is loaded darktable openCL works too.

Sorry, I jumped to a conclusion, this seems to be a different issue
than the previous bug report.

Cheers
Michael



Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Luca Boccassi
Control: tag -1 moreinfo

On Sun, 2016-03-20 at 16:07 +0100, Michael Below wrote:
> Package: nvidia-kernel-dkms
> Version: 352.79-5
> Severity: normal
> 
> Dear Maintainer,
> 
> after some updates to my Debian testing installation I noticed that OpenCL
> stopped working with my graphics card. For darktable, the debug log looked 
> like:
> 
> [opencl_init] found opencl runtime library 'libOpenCL'
> [opencl_init] opencl library 'libOpenCL' found on your system and loaded
> modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not 
> permitted
> modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running 
> install command for nvidia_uvm
> modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
> [opencl_init] could not get platforms: -1001
> [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
> [opencl_init] initial status of opencl enabled flag is OFF.
> 
> This seems to be the same bug as in #812396, so I tried the same fix as
> proposed there (sudo modprobe --force-modversion nvidia-current-uvm), and now
> it works:
> 
> [opencl_init] found opencl runtime library 'libOpenCL'
> [opencl_init] opencl library 'libOpenCL' found on your system and loaded
> [opencl_init] found 1 platform
> [opencl_init] found 1 device
> [opencl_init] device 0 `GeForce GTX 750 Ti' has sm_20 support.
> [opencl_init] device 0 `GeForce GTX 750 Ti' supports image sizes of 16384 x 
> 16384
> [opencl_init] device 0 `GeForce GTX 750 Ti' allows GPU memory allocations of 
> up to 511MB
> [opencl_init] device 0: GeForce GTX 750 Ti 
>  GLOBAL_MEM_SIZE:  2047MB
>  MAX_WORK_GROUP_SIZE:  1024
>  MAX_WORK_ITEM_DIMENSIONS: 3
>  MAX_WORK_ITEM_SIZES:  [ 1024 1024 64 ]
>  DRIVER_VERSION:   352.79
>  DEVICE_VERSION:   OpenCL 1.2 CUDA
> [opencl_init] compiling program `demosaic_ppg.cl' ..
> [opencl_load_program] could not load cached binary program, trying to compile 
> source
> [opencl_load_program] successfully loaded program from 
> `/usr/share/darktable/kernels/demosaic_ppg.cl'
> [opencl_build_program] successfully built program
> [opencl_build_program] BUILD STATUS: 0

Hi,

Sorry for your problems.

Unfortunately I cannot reproduce with the same version of the drivers on
my amd64 Jessie desktop, opencl-demo works just fine and nvidia-uvm
loads:

./cl-demo 10 5
Choose platform:
[0] NVIDIA Corporation
Enter choice: 
Choose device:
[0] GeForce GTX 780
Enter choice: 
-
NAME: GeForce GTX 780
VENDOR: NVIDIA Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.2 CUDA
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts  cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64 
DRIVER_VERSION: 352.79



0.000265 s
0.000453 GB/s
GOOD

$ lsmod | grep nvidia
nvidia_uvm 73728  0 
nvidia   8540160  71 nvidia_uvm
drm   352256  3 nvidia


But I noticed something strange that caught my attention in your Xorg
log:

[  2634.192] (II) NVIDIA GLX Module  340.93  Wed Aug 19 16:23:51 PDT
2015
[  2634.192] (II) LoadModule: "nvidia"
[  2634.192] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  2634.193] (II) Module nvidia: vendor="NVIDIA Corporation"
[  2634.193]compiled for 4.0.2, module version = 1.0.0
[  2634.193]Module class: X.Org Video Driver
[  2634.193] (II) NVIDIA dlloader X Driver  340.93  Wed Aug 19 16:01:53
PDT 2015

It looks like the older 340.93 library is being loaded? But all the
symlinks and the package versions in your system indicate that 352.79 is
installed, any idea why this might be happening?

If you run:

glxgears -info

What's the reported version?

Finally, if you run manually:

sudo modprobe -v nvidia-uvm

What's the output?

Kind regards,
Luca Boccassi




Bug#818763: nvidia-kernel-dkms: fail to load nvidia-uvm (again)

2016-03-20 Thread Michael Below
Package: nvidia-kernel-dkms
Version: 352.79-5
Severity: normal

Dear Maintainer,

after some updates to my Debian testing installation I noticed that OpenCL
stopped working with my graphics card. For darktable, the debug log looked like:

[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
modprobe: ERROR: could not insert 'nvidia_current_uvm': Operation not permitted
modprobe: ERROR: ../libkmod/libkmod-module.c:977 command_do() Error running 
install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

This seems to be the same bug as in #812396, so I tried the same fix as
proposed there (sudo modprobe --force-modversion nvidia-current-uvm), and now
it works:

[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[opencl_init] device 0 `GeForce GTX 750 Ti' has sm_20 support.
[opencl_init] device 0 `GeForce GTX 750 Ti' supports image sizes of 16384 x 
16384
[opencl_init] device 0 `GeForce GTX 750 Ti' allows GPU memory allocations of up 
to 511MB
[opencl_init] device 0: GeForce GTX 750 Ti 
 GLOBAL_MEM_SIZE:  2047MB
 MAX_WORK_GROUP_SIZE:  1024
 MAX_WORK_ITEM_DIMENSIONS: 3
 MAX_WORK_ITEM_SIZES:  [ 1024 1024 64 ]
 DRIVER_VERSION:   352.79
 DEVICE_VERSION:   OpenCL 1.2 CUDA
[opencl_init] compiling program `demosaic_ppg.cl' ..
[opencl_load_program] could not load cached binary program, trying to compile 
source
[opencl_load_program] successfully loaded program from 
`/usr/share/darktable/kernels/demosaic_ppg.cl'
[opencl_build_program] successfully built program
[opencl_build_program] BUILD STATUS: 0


Thanks for your work!

Michael



-- Package-specific info:
uname -a:
Linux ossietzky 4.4.0-1-amd64 #1 SMP Debian 4.4.6-1 (2016-03-17) x86_64 
GNU/Linux

/proc/version:
Linux version 4.4.0-1-amd64 (debian-ker...@lists.debian.org) (gcc version 5.3.1 
20160307 (Debian 5.3.1-11) ) #1 SMP Debian 4.4.6-1 (2016-03-17)

/proc/driver/nvidia/version:
NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 
2016
GCC version:  gcc version 5.3.1 20160307 (Debian 5.3.1-11) 

lspci 'VGA compatible controller [0300]':
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 
750 Ti] [10de:1380] (rev a2) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. GM107 [GeForce GTX 750 Ti] [3842:3751]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Kernel driver in use: nvidia
Kernel modules: nvidia

dmesg:
[0.00] AGP: No AGP bridge found
[0.00] AGP: Checking aperture...
[0.00] AGP: No AGP bridge found
[0.00] AGP: Node 0: aperture [bus addr 0xd400-0xd5ff] (32MB)
[0.00] AGP: Your BIOS doesn't leave an aperture memory hole
[0.00] AGP: Please enable the IOMMU option in the BIOS setup
[0.00] AGP: This costs you 64MB of RAM
[0.00] AGP: Mapping aperture over RAM [mem 0xd400-0xd7ff] 
(65536KB)
[0.00] Console: colour VGA+ 80x25
[0.247215] vgaarb: setting as boot device: PCI::02:00.0
[0.247218] vgaarb: device added: 
PCI::02:00.0,decodes=io+mem,owns=io+mem,locks=none
[0.247219] vgaarb: loaded
[0.247220] vgaarb: bridge control possible :02:00.0
[0.824987] PCI-DMA: Disabling AGP.
[0.825082] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[0.851942] Linux agpgart interface v0.103
[   16.001321] nvidia: module license 'NVIDIA' taints kernel.
[   16.013241] vgaarb: device changed decodes: 
PCI::02:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[   16.013553] [drm] Initialized nvidia-drm 0.0.0 20150116 for :02:00.0 on 
minor 0
[   16.013565] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 
13 16:17:53 PST 2016
[   18.432502] snd_hda_intel :02:00.1: Handle vga_switcheroo audio client
[   19.069316] input: HDA NVidia HDMI/DP,pcm=3 as 
/devices/pci:00/:00:02.0/:02:00.1/sound/card0/input10
[   19.069533] input: HDA NVidia HDMI/DP,pcm=7 as 
/devices/pci:00/:00:02.0/:02:00.1/sound/card0/input11
[   19.069714] input: HDA NVidia HDMI/DP,pcm=8 as 
/devices/pci:00/:00:02.0/:02:00.1/sound/card0/input12
[  659.062928] nvidia: module_layout: kernel tainted.
[  659.220248] nvidia_uvm: Loaded the UVM driver, major device number 247

Device node permissions:
crw-rw+ 1 root video 226,   0 Mar 20 15:39 /dev/dri/card0
crw-rw-rw-  1 root root  247,   0 Mar 20 15:49 /dev/nvidia-uvm