leezu edited a comment on pull request #18542:
URL: https://github.com/apache/incubator-mxnet/pull/18542#issuecomment-645790195


   @yzhliu It should be the other way round. Let's open the CI Docker 
container with `docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash` and 
look at the shared libraries under `/usr/local/cuda-10.2`:
   
   ```
   root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
   /usr/local/cuda-10.2/compat/libcuda.so
   /usr/local/cuda-10.2/compat/libcuda.so.1
   /usr/local/cuda-10.2/compat/libcuda.so.440.33.01
   /usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
   /usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so
   ```
   
   Because we don't use the nvidia docker command to run the container, only 
`stubs/libcuda.so` is available. On a host with GPUs, we can instead run 
`docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash`, which 
makes the host's `libcuda.so` and the host GPUs available inside the 
container. On a CPU-only host, however, this just leads to
   
   ```
   docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:349: starting container process caused 
"process_linux.go:449: container init caused \"process_linux.go:432: running 
prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: 
nvidia-container-cli: initialization error: nvml error: driver not 
loaded\\\\n\\\"\"": unknown.
   ERRO[0000] error waiting for container: context canceled
   ```
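
A quick way to tell these two situations apart from inside a running container is to try loading the driver library and initializing it. The following is a minimal sketch, not part of MXNet or its CI scripts: the helper name `cuda_driver_available` is hypothetical, and it assumes the real driver is exposed under the usual soname `libcuda.so.1`. `cuInit` is the CUDA driver API initialization call, which returns 0 (`CUDA_SUCCESS`) on success.

```python
import ctypes


def cuda_driver_available(soname="libcuda.so.1"):
    """Return True if a CUDA driver library can be loaded and initialized.

    Hypothetical helper for illustration; `soname` is an assumption about
    how the driver library is named on the system.
    """
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        # No library with that name on the loader path (typical on a CPU host).
        return False
    try:
        # cuInit(0) returns 0 (CUDA_SUCCESS) only when a working driver is present.
        return lib.cuInit(0) == 0
    except AttributeError:
        # The library loaded but does not export cuInit.
        return False


print("CUDA driver usable:", cuda_driver_available())
```

On a CPU-only host, or in a container started without `--gpus all`, this would be expected to print `False`.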
   
   The problem is that some part of the tvmop setup currently requires 
`libcuda.so` to be available (it's listed as a shared library dependency of 
one of the shared libraries that is opened). We need to check which library is 
introducing the dependency and consider how to fix it. Ideally there shouldn't 
be a dependency on `libcuda.so` as it's only available on GPU hosts.
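
One way to find the library that introduces the dependency is to list the `DT_NEEDED` entries of every shared object in the build output. The sketch below assumes `readelf` from binutils is installed in the container. The helper names `parse_needed` and `libs_needing` and the `build` directory in the usage comment are illustrative and not taken from the MXNet build.

```python
import re
import subprocess
from pathlib import Path

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcuda.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def parse_needed(readelf_output):
    """Return the DT_NEEDED entries found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def libs_needing(root, wanted="libcuda.so"):
    """Yield (path, entry) for each shared object under `root` whose
    DT_NEEDED list contains an entry starting with `wanted`."""
    for path in sorted(Path(root).rglob("*.so*")):
        if not path.is_file():
            continue
        result = subprocess.run(
            ["readelf", "-d", str(path)], capture_output=True, text=True
        )
        if result.returncode != 0:
            # Not an ELF file (for example a linker script); skip it.
            continue
        for entry in parse_needed(result.stdout):
            if entry.startswith(wanted):
                yield path, entry


# Example usage (the "build" directory is an assumption):
# for path, entry in libs_needing("build"):
#     print(f"{path}: {entry}")
```

Any path this reports links against `libcuda.so` directly. A library that only pulls it in transitively would show up through one of its own dependencies, so those directories may need to be scanned as well.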
   
   You can also refer to 
https://github.com/NVIDIA/nvidia-docker/issues/775#issuecomment-400035216 for a 
little background. The problem with `compat/libcuda.so`, AFAIK, is that it 
does not necessarily match the driver version of the host system.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

