maybeLee opened a new issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416
## Description
I was using a derived variant of the AlexNet model to run model inference with MXNet as Keras's backend, via the `keras-mxnet` library that equips Keras with an MXNet backend. MXNet can load and compile this model, but it reports an error during inference. When I tried the other three Keras backends (TensorFlow, Theano, and CNTK), all three ran the inference without crashing.

### Error Message

```
Traceback (most recent call last):
  File "get_predictions.py", line 41, in <module>
    pred = model.predict(x, batch_size = 16)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
    steps=steps)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
    batch_outs = f(ins_batch)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
    data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
    self._set_weights()
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
    allow_missing=True)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/bucketing_module.py", line 220, in set_params
    force_init=force_init, allow_extra=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/module.py", line 358, in set_params
    self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 422, in set_params
    exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/executor.py", line 367, in copy_params_from
    array.astype(dst.dtype).copyto(dst)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2663, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  [bt] (7) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f74bc87d6b2]
  [bt] (6) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0x4d7) [0x7f74bc87d127]
  [bt] (5) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x1c3) [0x7f74bc9da5d3]
  [bt] (4) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xa1c) [0x7f74bc9e63bc]
  [bt] (3) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseShape<1, 1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x219) [0x7f74bcb94889]
  [bt] (2) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)+0x1fc) [0x7f74bcb944bc]
  [bt] (1) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)::{lambda(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*)#1}::operator()(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*) const+0x37a) [0x7f74bcb93c9a]
  [bt] (0) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f74bc7fb35f]
  File "../src/ndarray/./../operator/tensor/../elemwise_op_common.h", line 135
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [256], got [1]
```
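The failure happens while the MXNet backend copies the saved Keras weights into the module's parameters: `copy_params_from` ends up in `_copyto`, whose shape check rejects copying a shape-`[1]` array into a shape-`[256]` parameter. For reference, the same check can be triggered with plain MXNet (a minimal sketch, not the exact keras-mxnet code path):

```python
import mxnet as mx

# Copying a shape-(1,) array into a shape-(256,) destination goes through
# _internal._copyto and fails the same ElemwiseShape check reported above.
src = mx.nd.ones((1,))
dst = mx.nd.zeros((256,))
src.copyto(dst)  # MXNetError: Incompatible attr ... expected [256], got [1]
```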
The model information is as follows:

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 16, 16, 96)        2688
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 96)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 256)         614656
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 256)         0
_________________________________________________________________
batch_normalization_2 (Batch (None, 3, 3, 256)         1024
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 384)         885120
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 3, 256)         884992
_________________________________________________________________
softmax_1 (Softmax)          (None, 3, 3, 256)         0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 256)         0
_________________________________________________________________
batch_normalization_3 (Batch (None, 1, 1, 256)         1024
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              1052672
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                40970
=================================================================
Total params: 22,919,434
Trainable params: 22,918,410
Non-trainable params: 1,024
_________________________________________________________________
```

## To Reproduce
The reproduction script (also included in the zip file linked below):

```python
import argparse
import os
import sys
import warnings

parse = argparse.ArgumentParser()
parse.add_argument("-bk", type=str, help="the name of backend")
flags, _ = parse.parse_known_args(sys.argv[1:])
bk = flags.bk
os.environ["KERAS_BACKEND"] = bk

# additional flags to omit some runtime logs
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"] = "100"
os.environ["MXNET_SUBGRAPH_VERBOSE"] = "0"
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
os.environ["MXNET_CUDNN_LIB_CHECKING"] = "0"

import keras


def custom_objects():
    # custom activation helpers (defined but not passed to load_model below)
    def no_activation(x):
        return x

    def leakyrelu(x):
        import keras.backend as K
        return K.relu(x, alpha=0.01)

    objects = {}
    objects['no_activation'] = no_activation
    objects['leakyrelu'] = leakyrelu
    return objects


model_name = "model.h5"
warnings.filterwarnings("ignore")
print("start loading model")
model = keras.models.load_model(model_name)

# take a single CIFAR-10 test image as the inference input
_, (x, _) = keras.datasets.cifar10.load_data()
x = x.astype('float32') / 255.0
x = x.reshape(x.shape[0], 32, 32, 3)
x = x[:1]

pred = model.predict(x, batch_size=16)
print("Successfully get the prediction")
print(pred)
```

### Steps to reproduce
The steps to reproduce are simple:

- Get the customized model and the reproduction script from the zip file [here](https://drive.google.com/drive/folders/1ANXIAboCgBYm-i3ofLQENeCCeODIHFyh?usp=sharing).
- Go to the folder and install the following requirements in your Python environment:

  ```
  mxnet-cu101==1.8.0.post0
  keras-mxnet==2.2.4.3
  h5py==2.10.0
  ```

- Run the following commands:
  1. `python get_predictions.py -bk=mxnet`
  2. `python get_predictions.py -bk=cntk`

Command 1 leads to a direct crash, while command 2 successfully produces the model's prediction output.
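For diagnosis, the weight shapes stored in `model.h5` can be printed directly with `h5py`; this is a minimal sketch assuming the standard Keras 2.x HDF5 weight layout, not part of the keras-mxnet code path:

```python
import h5py

# Print every weight shape stored in the saved model, e.g. to check whether
# the BatchNormalization parameters really have shape (256,) as the error
# message expects (assumes the standard Keras 2.x "model_weights" layout).
with h5py.File("model.h5", "r") as f:
    weights = f["model_weights"]
    for layer_name in weights.attrs.get("layer_names", []):
        layer_name = layer_name.decode() if isinstance(layer_name, bytes) else layer_name
        group = weights[layer_name]
        for weight_name in group.attrs.get("weight_names", []):
            weight_name = weight_name.decode() if isinstance(weight_name, bytes) else weight_name
            print(layer_name, weight_name, group[weight_name].shape)
```

If the stored shapes already look correct, the `[1]`-shaped array would have to come from the backend's own parameter handling rather than from the saved file.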
## What have you tried to solve it?

## Environment
Diagnostic information collected with:
`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

```
----------Python Info----------
Version      : 3.6.13
Compiler     : GCC 7.5.0
Build        : ('default', 'Jun 4 2021 14:25:59')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 21.1.3
Directory    : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.8.0
Directory    : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet
Commit hash file "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library      : ['/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✖ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Linux-4.18.0-310.el8.x86_64-x86_64-with-centos-8
system       : Linux
release      : 4.18.0-310.el8.x86_64
version      : #1 SMP Tue Jun 8 00:24:50 UTC 2021
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              36
On-line CPU(s) list: 0-35
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3613.195
CPU max MHz:         4800.0000
CPU min MHz:         1200.0000
BogoMIPS:            6000.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-35
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni md_clear flush_l1d arch_capabilities
```
</details>