leezu edited a comment on pull request #18542: URL: https://github.com/apache/incubator-mxnet/pull/18542#issuecomment-645790195
@yzhliu It should be the other way round. Let's open the CI Docker container with `docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash` and look at the shared libraries in `/usr/local/cuda`:

```
root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
/usr/local/cuda-10.2/compat/libcuda.so
/usr/local/cuda-10.2/compat/libcuda.so.1
/usr/local/cuda-10.2/compat/libcuda.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
/usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so
```

Because we don't use the nvidia docker command to run the container, only `stubs/libcuda.so` is available. If we're on a host with GPUs, we can use `docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash`, and the `libcuda.so` from the host as well as the host GPUs will be available inside the container. But on a CPU host this just leads to

```
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
```

The problem is that some part of the tvmop setup currently requires `libcuda.so` to be available (it's listed as a shared library dependency of some shared library that is opened). We need to check which library is introducing the dependency and consider how to fix it. Ideally there shouldn't be a dependency on `libcuda.so`, as it's only available on GPU hosts.

You can also refer to https://github.com/NVIDIA/nvidia-docker/issues/775#issuecomment-400035216 for a little background. The problem with the `compat/libcuda.so`, AFAIK, is that it does not necessarily match the driver version of the host system.
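
One possible way to check which library introduces the `libcuda.so` dependency is to read the `NEEDED` entries of each shared object with `readelf -d`. The following is only a minimal sketch, not something taken from the MXNet or TVM build: it assumes GNU `readelf` (binutils) is installed, the helper names (`needed_libraries`, `links_libcuda`, `scan`) are hypothetical, and the directory to scan (for example the build output directory) is left to the caller.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List

# Matches `readelf -d` lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcuda.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Return the NEEDED entries found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def links_libcuda(readelf_output: str) -> bool:
    """True if a NEEDED entry names the driver library (libcuda.so*).

    libcudart.so* does not match, because the prefix test includes ".so".
    """
    return any(name.startswith("libcuda.so")
               for name in needed_libraries(readelf_output))


def scan(root: str) -> List[Path]:
    """List shared objects under `root` that directly depend on libcuda.so."""
    hits = []
    for path in sorted(Path(root).rglob("*.so*")):
        if not path.is_file():
            continue
        # Assumption: `readelf` is on PATH; files it cannot parse are skipped.
        result = subprocess.run(["readelf", "-d", str(path)],
                                capture_output=True, text=True)
        if result.returncode == 0 and links_libcuda(result.stdout):
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Usage: python find_libcuda_deps.py <directory>  (script name is hypothetical)
    for hit in scan(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(hit)
```

This only reports direct dependencies; a library that pulls in `libcuda.so` through another shared object would show up under that other object's path instead.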