leezu opened a new issue #16399: ndarray.load crashes MXNet GPU builds on CPU machines
URL: https://github.com/apache/incubator-mxnet/issues/16399

## Description

For some parameter files, the crash below happens. It does not happen for all parameter files.

## Environment info (Required)

```
----------Python Info----------
Version : 3.7.3
Compiler : GCC 5.4.0 20160609
Build : ('default', 'Jun 13 2019 13:24:27')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 19.0.3
Directory : /home/ubuntu/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.5.1
Directory : /home/ubuntu/.local/lib/python3.7/site-packages/mxnet
Commit Hash : c9818480680f84daa6e281a974ab263691302ba8
Library : ['/home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform : Linux-4.4.0-1095-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-84-3
release : 4.4.0-1095-aws
version : #106-Ubuntu SMP Wed Sep 18 13:33:48 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0010 sec, LOAD: 0.3120 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3964 sec, LOAD: 0.0625 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0938 sec, LOAD: 0.3685 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0266 sec, LOAD: 0.1123 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0007 sec, LOAD: 0.0284 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0054 sec, LOAD: 0.0288 sec.
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"
```

Package used (Python/R/Scala/Julia):
(I'm using ...)

## Error Message:

```
what(): [21:44:06] src/engine/threaded_engine.cc:328: Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU
Stack trace:
  [bt] (0) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x4b04cb) [0x7fab268524cb]
  [bt] (1) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x25b50ec) [0x7fab289570ec]
  [bt] (2) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x25b5cd4) [0x7fab28957cd4]
  [bt] (3) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Chunk::~Chunk()+0x3c2) [0x7fab28b57022]
  [bt] (4) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::~NDArray()+0x9a) [0x7fab2698422a]
  [bt] (5) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Copy(mxnet::Context) const+0x2cd) [0x7fab28b7651d]
  [bt] (6) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*)+0xb41) [0x7fab28b78211]
  [bt] (7) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::string, std::allocator<std::string> >*)+0xc51) [0x7fab28b78eb1]
  [bt] (8) /home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayLoad+0x263) [0x7fab288cf613]
```

## Minimum reproducible example

```bash
wget http://apache-mxnet.s3.amazonaws.com/gluon/models/awd_lstm_lm_1150_wikitext-2-f9562ed0.zip
unzip awd_lstm_lm_1150_wikitext-2-f9562ed0.zip
python3 -c 'import mxnet as mx; mx.nd.load("awd_lstm_lm_1150_wikitext-2-f9562ed0.params")'
```

## Steps to reproduce

- `pip install mxnet-cu100` on a CPU machine with CUDA installed
- Run the example above
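
Because the failure is raised inside `libmxnet.so` and takes the whole interpreter down, it can be convenient to run the one-liner from the minimum reproducible example in a child process when checking several parameter files. The following is only a sketch using the Python standard library: `run_isolated` is a hypothetical helper name, not part of MXNet, and the `.params` file is assumed to be unpacked in the current directory.

```python
import subprocess
import sys

# The one-liner from the minimum reproducible example; the .params file is
# assumed to be unpacked in the current working directory.
REPRO = (
    'import mxnet as mx; '
    'mx.nd.load("awd_lstm_lm_1150_wikitext-2-f9562ed0.params")'
)


def run_isolated(code, timeout=300):
    """Run `code` in a child Python interpreter (hypothetical helper).

    Returns (returncode, stderr). On POSIX, a negative returncode means the
    child was terminated by a signal with that number.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    returncode, stderr = run_isolated(REPRO)
    if returncode == 0:
        print("load succeeded")
    else:
        print("load failed, return code", returncode)
        print(stderr)
```

With this, a crashing load only ends the child process, and the parent still gets the return code and the `what(): ...` text from stderr, which makes it easier to tell affected parameter files from unaffected ones.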