[GitHub] [incubator-mxnet] lgg opened a new issue #19991: Infinity loop with cmake from 1.8.0 release and v1.8.x branch

GitBox Sun, 07 Mar 2021 13:13:51 -0800


lgg opened a new issue #19991:
URL: https://github.com/apache/incubator-mxnet/issues/19991



   ## Description
   I tried to build MXNet from source, from both tag `1.8.0` and branch
`v1.8.x`, to get CUDA 11 support.
   
   My steps to reproduce:
   * `git clone --recursive https://github.com/apache/incubator-mxnet mxnet1.8`
   * `cd mxnet1.8/`
   * I tried both: `git checkout v1.8.x` and `git checkout 1.8.0`
   * `cp config/linux_gpu.cmake config.cmake`, then add content from the
distribution file (see my config.cmake below)
   * `mkdir build; cd build`
   * `cmake ..` (see the output below for details), which gets stuck in an infinite loop
   
   ### cmake loop output
   <details>
     <summary>cmake loop output</summary>
   cmake ..
   -- The C compiler identification is GNU 9.3.0
   -- The CXX compiler identification is GNU 9.3.0
   -- Check for working C compiler: /usr/bin/cc
   -- Check for working C compiler: /usr/bin/cc -- works
   -- Detecting C compiler ABI info
   -- Detecting C compiler ABI info - done
   -- Detecting C compile features
   -- Detecting C compile features - done
   -- Check for working CXX compiler: /usr/bin/c++
   -- Check for working CXX compiler: /usr/bin/c++ -- works
   -- Detecting CXX compiler ABI info
   -- Detecting CXX compiler ABI info - done
   -- Detecting CXX compile features
   -- Detecting CXX compile features - done
   -- CMAKE_CROSSCOMPILING FALSE
   -- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
   -- CMAKE_SYSTEM_PROCESSOR x86_64
   -- CMAKE_SYSTEM_NAME Linux
   -- CMake version '3.16.3' using generator 'Unix Makefiles'
   -- Looking for a CUDA compiler
   -- Looking for a CUDA compiler - /usr/local/cuda-11.2/bin/nvcc
   CMake Warning at CMakeLists.txt:103 (message):
     CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc
   
   
   -- The CUDA compiler identification is NVIDIA 11.2.142
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
   -- Detecting CUDA compiler ABI info
   -- Detecting CUDA compiler ABI info - done
   -- Performing Test SUPPORT_CXX11
   -- Performing Test SUPPORT_CXX11 - Success
   -- Performing Test SUPPORT_CXX0X
   -- Performing Test SUPPORT_CXX0X - Success
   -- CMAKE_BUILD_TYPE is unset, defaulting to Release
   -- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES 
with value `OFF`
   -- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with 
value `OFF`
   -- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to 
MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
   -- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with 
value `STATIC`
   -- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS 
with value ``
   -- Looking for pthread.h
   -- Looking for pthread.h - found
   -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
   -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
   -- Looking for pthread_create in pthreads
   -- Looking for pthread_create in pthreads - not found
   -- Looking for pthread_create in pthread
   -- Looking for pthread_create in pthread - found
   -- Found Threads: TRUE  
   -- Found OpenMP_C: -fopenmp (found version "4.5") 
   -- Found OpenMP_CXX: -fopenmp (found version "4.5") 
   -- Found OpenMP: TRUE (found version "4.5")  
   -- Found Git: /usr/bin/git (found version "2.25.1") 
   -- Primitive cache is enabled
   -- Using intgemm
   -- Compiling with OpenMP
   -- Found OpenMP_C: -fopenmp  
   -- Found OpenMP_CXX: -fopenmp  
   -- Found OpenMP: TRUE   
   -- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.so
   -- Found OpenBLAS include: /usr/include/x86_64-linux-gnu
   -- Found OpenCV: /usr (found version "4.2.0") found components: core highgui 
imgproc imgcodecs 
   -- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4)
   --  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
   USE_LAPACK is ON
   CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
     VERSION keyword not followed by a value or was followed by a value that
     expanded to nothing.
   
   
   -- Found PythonInterp: /usr/bin/python (found version "2.7.18") 
   -- Found GTest: gtest  
   -- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE) 
   -- Found OpenMP_C: -fopenmp  
   -- Found OpenMP_CXX: -fopenmp  
   -- Looking for clock_gettime in rt
   -- Looking for clock_gettime in rt - found
   -- Looking for fopen64
   -- Looking for fopen64 - not found
   -- Looking for C++ include cxxabi.h
   -- Looking for C++ include cxxabi.h - found
   -- Looking for nanosleep
   -- Looking for nanosleep - found
   -- Looking for backtrace
   -- Looking for backtrace - found
   -- backtrace facility detected in default set of libraries
   -- Found Backtrace: /usr/include  
   -- Check if the system is big endian
   -- Searching 16 bit integer
   -- Looking for sys/types.h
   -- Looking for sys/types.h - found
   -- Looking for stdint.h
   -- Looking for stdint.h - found
   -- Looking for stddef.h
   -- Looking for stddef.h - found
   -- Check size of unsigned short
   -- Check size of unsigned short - done
   -- Using unsigned short
   -- Check if the system is big endian - little endian
   -- 
/home/f.golovin/mxnet1.8/mxnet1.8/3rdparty/dmlc-core/cmake/build_config.h.in -> 
include/dmlc/build_config.h
   -- Performing Test SUPPORT_MSSE2
   -- Performing Test SUPPORT_MSSE2 - Success
   -- Autodetected CUDA architecture(s):  6.1 6.1
   -- CUDA: Using the following NVCC architecture flags 
-gencode;arch=compute_61,code=sm_61
   -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.2.142") 
   -- Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES) 
   CMake Warning at CMakeLists.txt:645 (message):
     Could not find NCCL libraries
   
   
   -- Performing Test SUPPORT_MSSE3
   -- Performing Test SUPPORT_MSSE3 - Success
   -- Determining F16C support
   -- Performing Test COMPILER_SUPPORT_MF16C
   -- Performing Test COMPILER_SUPPORT_MF16C - Success
   -- Using 64-bit integer for tensor size
   -- CUDA: Adding NVCC options: --fatbin-options --compress-all
   -- Configuring done
   You have changed variables that require your cache to be deleted.
   Configure will be re-run and you may have to reset some variables.
   The following variables have changed:
   CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
   CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
   
   -- The C compiler identification is GNU 9.3.0
   -- The CXX compiler identification is GNU 9.3.0
   -- Check for working C compiler: /usr/bin/cc
   -- Check for working C compiler: /usr/bin/cc -- works
   -- Detecting C compiler ABI info
   -- Detecting C compiler ABI info - done
   -- Detecting C compile features
   -- Detecting C compile features - done
   -- Check for working CXX compiler: /usr/bin/c++
   -- Check for working CXX compiler: /usr/bin/c++ -- works
   -- Detecting CXX compiler ABI info
   -- Detecting CXX compiler ABI info - done
   -- Detecting CXX compile features
   -- Detecting CXX compile features - done
   -- CMAKE_CROSSCOMPILING FALSE
   -- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
   -- CMAKE_SYSTEM_PROCESSOR x86_64
   -- CMAKE_SYSTEM_NAME Linux
   -- CMake version '3.16.3' using generator 'Unix Makefiles'
   CMake Warning at CMakeLists.txt:103 (message):
     CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc
   
   
   -- The CUDA compiler identification is NVIDIA 11.2.142
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
   -- Detecting CUDA compiler ABI info
   -- Detecting CUDA compiler ABI info - done
   -- Performing Test SUPPORT_CXX11
   -- Performing Test SUPPORT_CXX11 - Success
   -- Performing Test SUPPORT_CXX0X
   -- Performing Test SUPPORT_CXX0X - Success
   -- CMAKE_BUILD_TYPE is unset, defaulting to Release
   -- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES 
with value `OFF`
   -- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with 
value `OFF`
   -- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to 
MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
   -- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with 
value `STATIC`
   -- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS 
with value ``
   -- Looking for pthread.h
   -- Looking for pthread.h - found
   -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
   -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
   -- Looking for pthread_create in pthreads
   -- Looking for pthread_create in pthreads - not found
   -- Looking for pthread_create in pthread
   -- Looking for pthread_create in pthread - found
   -- Found Threads: TRUE  
   -- Found OpenMP_C: -fopenmp (found version "4.5") 
   -- Found OpenMP_CXX: -fopenmp (found version "4.5") 
   -- Found OpenMP: TRUE (found version "4.5")  
   -- Found Git: /usr/bin/git (found version "2.25.1") 
   -- Primitive cache is enabled
   -- Using intgemm
   -- Compiling with OpenMP
   -- Found OpenMP_C: -fopenmp  
   -- Found OpenMP_CXX: -fopenmp  
   -- Found OpenMP: TRUE   
   -- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.so
   -- Found OpenBLAS include: /usr/include/x86_64-linux-gnu
   -- Found OpenCV: /usr (found version "4.2.0") found components: core highgui 
imgproc imgcodecs 
   -- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4)
   --  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
   USE_LAPACK is ON
   CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
     VERSION keyword not followed by a value or was followed by a value that
     expanded to nothing.
   
   
   -- Found PythonInterp: /usr/bin/python (found version "2.7.18") 
   -- Found GTest: gtest  
   -- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE) 
   -- Found OpenMP_C: -fopenmp  
   -- Found OpenMP_CXX: -fopenmp  
   -- Looking for clock_gettime in rt
   -- Looking for clock_gettime in rt - found
   -- Looking for fopen64
   -- Looking for fopen64 - not found
   -- Looking for C++ include cxxabi.h
   -- Looking for C++ include cxxabi.h - found
   -- Looking for nanosleep
   -- Looking for nanosleep - found
   -- Looking for backtrace
   -- Looking for backtrace - found
   -- backtrace facility detected in default set of libraries
   -- Found Backtrace: /usr/include  
   -- Check if the system is big endian
   -- Searching 16 bit integer
   -- Looking for sys/types.h
   ^C
   </details>
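
A small helper can show how often the configure step restarted. This is only a sketch: it assumes the cmake output was saved to a file (the name `cmake.log` is hypothetical) and counts the cache-reset message that appears in the output above.

```shell
# Count how many times CMake restarted configuration in a captured log.
# Usage (cmake.log is a hypothetical file name): count_cache_resets cmake.log
count_cache_resets() {
    # grep -c prints the number of matching lines. "|| true" keeps a
    # zero-match result (grep exit status 1) from aborting under `set -e`.
    grep -c 'You have changed variables that require your cache to be deleted' "$1" || true
}
```

A log could be captured with something like `timeout 120 cmake .. > cmake.log 2>&1`, so the run stops on its own. A count that keeps growing with a longer timeout would be consistent with the loop described here.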
   
   ## nvcc paths:
   In the output above, all paths for nvcc are valid:
   
   ```
   ...
   CMake Warning at CMakeLists.txt:103 (message):
     CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc
   ...
   -- The CUDA compiler identification is NVIDIA 11.2.142
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
   -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
   ...
   You have changed variables that require your cache to be deleted.
   Configure will be re-run and you may have to reset some variables.
   The following variables have changed:
   CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
   CMAKE_CUDA_COMPILER= /usr/local/cuda-11.2/bin/nvcc
   ```
   
   I checked these paths:
   ```
   user@ml-dev:~/mxnet1.8/mxnet1.8/build$ ll  /usr/local/cuda-11.2/bin/nvcc
   -rwxr-xr-x 1 root root 5473704 Feb 28 17:25 /usr/local/cuda-11.2/bin/nvcc*
   user@ml-dev:~/mxnet1.8/mxnet1.8/build$ ll /usr/local/cuda/bin/nvcc
   -rwxr-xr-x 1 root root 5473704 Feb 28 17:25 /usr/local/cuda/bin/nvcc*
   ```
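
The identical size and timestamp suggest that both paths point at the same binary, presumably because `/usr/local/cuda` is a symlink to `/usr/local/cuda-11.2` (an assumption, not verified above). A minimal sketch for confirming this, assuming GNU coreutils `readlink -f` is available; the function name `same_target` is made up for illustration:

```shell
# Succeed (exit 0) when both paths resolve to the same canonical file.
# Assumes GNU coreutils readlink; same_target is a hypothetical helper name.
same_target() {
    # readlink -f follows every symlink in the path and prints the result.
    _st_a=$(readlink -f -- "$1") || return 2
    _st_b=$(readlink -f -- "$2") || return 2
    [ -n "$_st_a" ] && [ "$_st_a" = "$_st_b" ]
}

# Example call (paths taken from the output above):
# same_target /usr/local/cuda/bin/nvcc /usr/local/cuda-11.2/bin/nvcc && echo same
```

If the call succeeds, the two `CMAKE_CUDA_COMPILER` values differ only in how the path is spelled, not in which compiler is used.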
   
   ## What have you tried to solve it?
   
   1. Checked the path for nvcc as suggested in this issue: 
https://github.com/apache/incubator-mxnet/issues/17761
   
   ## Environment
   
   ***We recommend using our script for collecting the diagnostic information 
with the following command***
   `curl --retry 10 -s 
https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
 | python3`
   
   <details>
   <summary>Environment Information</summary>
   ----------Python Info----------
   Version      : 3.8.5
   Compiler     : GCC 9.3.0
   Build        : ('default', 'Jan 27 2021 15:41:15')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 20.0.2
   Directory    : /usr/lib/python3/dist-packages/pip
   ----------MXNet Info-----------
   Version      : 2.0.0
   Directory    : 
/home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet
   Commit hash file 
"/home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet/COMMIT_HASH"
 not found. Not installed from pre-built package or built from source.
   Library      : 
['/home/user/.local/lib/python3.8/site-packages/mxnet-2.0.0-py3.8.egg/mxnet/libmxnet.so']
   Build features:
   ✔ CUDA
   ✖ CUDNN
   ✖ NCCL
   ✖ TENSORRT
   ✖ CUTENSOR
   ✔ CPU_SSE
   ✔ CPU_SSE2
   ✔ CPU_SSE3
   ✔ CPU_SSE4_1
   ✔ CPU_SSE4_2
   ✖ CPU_SSE4A
   ✔ CPU_AVX
   ✖ CPU_AVX2
   ✔ OPENMP
   ✖ SSE
   ✔ F16C
   ✖ JEMALLOC
   ✔ BLAS_OPEN
   ✖ BLAS_ATLAS
   ✖ BLAS_MKL
   ✖ BLAS_APPLE
   ✔ LAPACK
   ✔ MKLDNN
   ✔ OPENCV
   ✖ DIST_KVSTORE
   ✔ INT64_TENSOR_SIZE
   ✔ SIGNAL_HANDLER
   ✖ DEBUG
   ✖ TVM_OP
   ----------System Info----------
   Platform     : Linux-5.8.0-44-generic-x86_64-with-glibc2.29
   system       : Linux
   node         : ml-dev
   release      : 5.8.0-44-generic
   version      : #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:                    x86_64
   CPU op-mode(s):                  32-bit, 64-bit
   Byte Order:                      Little Endian
   Address sizes:                   39 bits physical, 48 bits virtual
   CPU(s):                          12
   On-line CPU(s) list:             0-11
   Thread(s) per core:              2
   Core(s) per socket:              6
   Socket(s):                       1
   NUMA node(s):                    1
   Vendor ID:                       GenuineIntel
   CPU family:                      6
   Model:                           158
   Model name:                      Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
   Stepping:                        10
   CPU MHz:                         4327.154
   CPU max MHz:                     4700,0000
   CPU min MHz:                     800,0000
   BogoMIPS:                        7399.70
   Virtualization:                  VT-x
   L1d cache:                       192 KiB
   L1i cache:                       192 KiB
   L2 cache:                        1,5 MiB
   L3 cache:                        12 MiB
   NUMA node0 CPU(s):               0-11
   Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
   Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional 
cache flushes, SMT vulnerable
   Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT 
vulnerable
   Vulnerability Meltdown:          Mitigation; PTI
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass 
disabled via prctl and seccomp
   Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and 
__user pointer sanitization
   Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB 
conditional, IBRS_FW, STIBP conditional, RSB filling
   Vulnerability Srbds:             Mitigation; Microcode
   Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT 
vulnerable
   Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0024 
sec, LOAD: 0.4952 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2280 sec, LOAD: 
0.5484 sec.
   Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: 
CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired 
(_ssl.c:1123)>, DNS finished in 1.8177919387817383 sec.
   Timing for FashionMNIST: 
https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz,
 DNS: 0.0738 sec, LOAD: 0.8640 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0356 sec, LOAD: 
0.7704 sec.
   Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: 
Forbidden, DNS finished in 0.11126899719238281 sec.
   ----------Environment----------
   </details>
   
   More env info from me:
   
   <details>
   <summary>More Environment Information</summary>
   user@ml-dev:~/mxnet1.8/mxnet1.8/build$ nvcc -V
   nvcc: NVIDIA (R) Cuda compiler driver
   Copyright (c) 2005-2021 NVIDIA Corporation
   Built on Thu_Jan_28_19:32:09_PST_2021
   Cuda compilation tools, release 11.2, V11.2.142
   Build cuda_11.2.r11.2/compiler.29558016_0
   user@ml-dev:~/mxnet1.8/mxnet1.8/build$ uname -a
   Linux ml-dev 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 
2021 x86_64 x86_64 x86_64 GNU/Linux
   user@ml-dev:~/mxnet1.8/mxnet1.8/build$ lsb_release -a
   No LSB modules are available.
   Distributor ID:      Ubuntu
   Description: Ubuntu 20.04.2 LTS
   Release:     20.04
   Codename:    focal
   </details>
   
   

