lgg opened a new issue #19991: URL: https://github.com/apache/incubator-mxnet/issues/19991
## Description
I tried to build MXNet from source, from tag `1.8.0` and from branch `v1.8.x`, to get CUDA 11 support.

My steps to reproduce:
* `git clone --recursive https://github.com/apache/incubator-mxnet mxnet1.8`
* `cd mxnet1.8/`
* I tried both `git checkout v1.8.x` and `git checkout 1.8.0`
* `cp config/linux_gpu.cmake config.cmake`, then add the content from the distribution file (see my config.cmake below)
* `mkdir build; cd build`
* `cmake ..` (see the output below for details), which gets stuck in an infinite loop

### cmake loop output
<details>
<summary>cmake loop output</summary>

cmake .. -- The C compiler identification is GNU 9.3.0 -- The CXX compiler identification is GNU 9.3.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- CMAKE_CROSSCOMPILING FALSE -- CMAKE_HOST_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_NAME Linux -- CMake version '3.16.3' using generator 'Unix Makefiles' -- Looking for a CUDA compiler -- Looking for a CUDA compiler - /usr/local/cuda-11.2/bin/nvcc CMake Warning at CMakeLists.txt:103 (message): CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc -- The CUDA compiler identification is NVIDIA 11.2.142 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Performing Test SUPPORT_CXX11 -- Performing Test SUPPORT_CXX11 - Success -- Performing Test SUPPORT_CXX0X -- Performing 
Test SUPPORT_CXX0X - Success -- CMAKE_BUILD_TYPE is unset, defaulting to Release -- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF` -- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF` -- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF` -- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC` -- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value `` -- Looking for pthread.h -- Looking for pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Found Git: /usr/bin/git (found version "2.25.1") -- Primitive cache is enabled -- Using intgemm -- Compiling with OpenMP -- Found OpenMP_C: -fopenmp -- Found OpenMP_CXX: -fopenmp -- Found OpenMP: TRUE -- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.so -- Found OpenBLAS include: /usr/include/x86_64-linux-gnu -- Found OpenCV: /usr (found version "4.2.0") found components: core highgui imgproc imgcodecs -- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4) -- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs USE_LAPACK is ON CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project): VERSION keyword not followed by a value or was followed by a value that expanded to nothing. 
-- Found PythonInterp: /usr/bin/python (found version "2.7.18") -- Found GTest: gtest -- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE) -- Found OpenMP_C: -fopenmp -- Found OpenMP_CXX: -fopenmp -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for fopen64 -- Looking for fopen64 - not found -- Looking for C++ include cxxabi.h -- Looking for C++ include cxxabi.h - found -- Looking for nanosleep -- Looking for nanosleep - found -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- Check if the system is big endian -- Searching 16 bit integer -- Looking for sys/types.h -- Looking for sys/types.h - found -- Looking for stdint.h -- Looking for stdint.h - found -- Looking for stddef.h -- Looking for stddef.h - found -- Check size of unsigned short -- Check size of unsigned short - done -- Using unsigned short -- Check if the system is big endian - little endian -- /home/f.golovin/mxnet1.8/mxnet1.8/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h -- Performing Test SUPPORT_MSSE2 -- Performing Test SUPPORT_MSSE2 - Success -- Autodetected CUDA architecture(s): 6.1 6.1 -- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_61,code=sm_61 -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.2.142") -- Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES) CMake Warning at CMakeLists.txt:645 (message): Could not find NCCL libraries -- Performing Test SUPPORT_MSSE3 -- Performing Test SUPPORT_MSSE3 - Success -- Determining F16C support -- Performing Test COMPILER_SUPPORT_MF16C -- Performing Test COMPILER_SUPPORT_MF16C - Success -- Using 64-bit integer for tensor size -- CUDA: Adding NVCC options: --fatbin-options --compress-all -- Configuring done You have changed variables that require your cache to be deleted. 
Configure will be re-run and you may have to reset some variables. The following variables have changed: CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc -- The C compiler identification is GNU 9.3.0 -- The CXX compiler identification is GNU 9.3.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- CMAKE_CROSSCOMPILING FALSE -- CMAKE_HOST_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_NAME Linux -- CMake version '3.16.3' using generator 'Unix Makefiles' CMake Warning at CMakeLists.txt:103 (message): CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc -- The CUDA compiler identification is NVIDIA 11.2.142 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Performing Test SUPPORT_CXX11 -- Performing Test SUPPORT_CXX11 - Success -- Performing Test SUPPORT_CXX0X -- Performing Test SUPPORT_CXX0X - Success -- CMAKE_BUILD_TYPE is unset, defaulting to Release -- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF` -- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF` -- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF` -- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC` -- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to 
MKLDNN_ARCH_OPT_FLAGS with value `` -- Looking for pthread.h -- Looking for pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Found Git: /usr/bin/git (found version "2.25.1") -- Primitive cache is enabled -- Using intgemm -- Compiling with OpenMP -- Found OpenMP_C: -fopenmp -- Found OpenMP_CXX: -fopenmp -- Found OpenMP: TRUE -- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.so -- Found OpenBLAS include: /usr/include/x86_64-linux-gnu -- Found OpenCV: /usr (found version "4.2.0") found components: core highgui imgproc imgcodecs -- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4) -- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs USE_LAPACK is ON CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project): VERSION keyword not followed by a value or was followed by a value that expanded to nothing. 
-- Found PythonInterp: /usr/bin/python (found version "2.7.18") -- Found GTest: gtest -- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE) -- Found OpenMP_C: -fopenmp -- Found OpenMP_CXX: -fopenmp -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for fopen64 -- Looking for fopen64 - not found -- Looking for C++ include cxxabi.h -- Looking for C++ include cxxabi.h - found -- Looking for nanosleep -- Looking for nanosleep - found -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- Check if the system is big endian -- Searching 16 bit integer -- Looking for sys/types.h ^C

</details>

## nvcc paths:
In the output above, all paths for nvcc are valid:
```
...
CMake Warning at CMakeLists.txt:103 (message):
  CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc
...
-- The CUDA compiler identification is NVIDIA 11.2.142
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
...
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
```
I checked these paths:
```
user@ml-dev:~/mxnet1.8/mxnet1.8/build$ ll /usr/local/cuda-11.2/bin/nvcc
-rwxr-xr-x 1 root root 5473704 фев 28 17:25 /usr/local/cuda-11.2/bin/nvcc*
user@ml-dev:~/mxnet1.8/mxnet1.8/build$ ll /usr/local/cuda/bin/nvcc
-rwxr-xr-x 1 root root 5473704 фев 28 17:25 /usr/local/cuda/bin/nvcc*
```

## What have you tried to solve it?
1. Checked the nvcc paths as described in this issue: https://github.com/apache/incubator-mxnet/issues/17761

## Environment
***We recommend using our script for collecting the diagnostic information with the following command***
`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

----------Python Info----------
Version : 3.8.5 Compiler : GCC 9.3.0 Build : ('default', 'Jan 27 2021 15:41:15') Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 20.0.2 Directory : /usr/lib/python3/dist-packages/pip
----------MXNet Info-----------
Version : 2.0.0 Directory : /home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet Commit hash file "/home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source. Library : ['/home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet/libmxnet.so'] Build features: ✔ CUDA ✖ CUDNN ✖ NCCL ✖ TENSORRT ✖ CUTENSOR ✔ CPU_SSE ✔ CPU_SSE2 ✔ CPU_SSE3 ✔ CPU_SSE4_1 ✔ CPU_SSE4_2 ✖ CPU_SSE4A ✔ CPU_AVX ✖ CPU_AVX2 ✔ OPENMP ✖ SSE ✔ F16C ✖ JEMALLOC ✔ BLAS_OPEN ✖ BLAS_ATLAS ✖ BLAS_MKL ✖ BLAS_APPLE ✔ LAPACK ✔ MKLDNN ✔ OPENCV ✖ DIST_KVSTORE ✔ INT64_TENSOR_SIZE ✔ SIGNAL_HANDLER ✖ DEBUG ✖ TVM_OP
----------System Info----------
Platform : Linux-5.8.0-44-generic-x86_64-with-glibc2.29 system : Linux node : ml-dev release : 5.8.0-44-generic version : #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021
----------Hardware Info----------
machine : x86_64 processor : x86_64 Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 12 On-line CPU(s) list: 0-11 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz Stepping: 10 CPU MHz: 
4327.154 CPU max MHz: 4700,0000 CPU min MHz: 800,0000 BogoMIPS: 7399.70 Virtualization: VT-x L1d cache: 192 KiB L1i cache: 192 KiB L2 cache: 1,5 MiB L3 cache: 12 MiB NUMA node0 CPU(s): 0-11 Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable Vulnerability Meltdown: Mitigation; PTI Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling Vulnerability Srbds: Mitigation; Microcode Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0024 sec, LOAD: 0.4952 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2280 sec, LOAD: 0.5484 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>, DNS finished in 1.8177919387817383 sec. Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0738 sec, LOAD: 0.8640 sec. Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0356 sec, LOAD: 0.7704 sec. Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.11126899719238281 sec. ----------Environment---------- </details> More env info from me: <details> <summary>More Environment Information</summary> user@ml-dev:~/mxnet1.8/mxnet1.8/build$ nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2021 NVIDIA Corporation Built on Thu_Jan_28_19:32:09_PST_2021 Cuda compilation tools, release 11.2, V11.2.142 Build cuda_11.2.r11.2/compiler.29558016_0 user@ml-dev:~/mxnet1.8/mxnet1.8/build$ uname -a Linux ml-dev 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux user@ml-dev:~/mxnet1.8/mxnet1.8/build$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.2 LTS Release: 20.04 Codename: focal </details>
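
The manual path check from the "nvcc paths" section can be scripted. The cmake output reports two spellings of the compiler path (`/usr/local/cuda/bin/nvcc` and `/usr/local/cuda-11.2/bin/nvcc`), and the identical file sizes suggest, but do not prove, that one is a symlink to the other. Below is a minimal sketch, assuming GNU `readlink` is available; the function name `same_target` is hypothetical and introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether two paths resolve to the same file on disk.
# `same_target` is a hypothetical helper name, not part of MXNet or CMake.
same_target() {
    local a b
    # readlink -f follows every symlink and prints the canonical path.
    a=$(readlink -f -- "$1") || return 2
    b=$(readlink -f -- "$2") || return 2
    if [ "$a" = "$b" ]; then
        echo "same: $a"
    else
        echo "different: $a vs $b"
        return 1
    fi
}

# Usage with the two paths from the cmake output:
#   same_target /usr/local/cuda/bin/nvcc /usr/local/cuda-11.2/bin/nvcc
#
# Untested idea, not confirmed as a fix in this report: configure from a
# fresh build directory with the compiler path given explicitly, e.g.
#   cmake -DCMAKE_CUDA_COMPILER="$(readlink -f /usr/local/cuda/bin/nvcc)" ..
```

If the helper prints `same:`, both spellings name one binary, so the repeated "variables have changed" message would come from a difference in path strings rather than from two different compilers.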
