maybeLee opened a new issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416


   ## Description
   I was running inference with a variant of the AlexNet model, using MXNet as 
the Keras backend through the `keras-mxnet` library. MXNet can load and compile 
the model, but reports an error during inference. With the other three Keras 
backends (TensorFlow, Theano, and CNTK), the same model runs inference without 
crashing.
   ### Error Message
   (Paste the complete error message. Please also include stack trace by 
setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=100` before running 
your script.)
   ```
   Traceback (most recent call last):
     File "get_predictions.py", line 41, in <module>
       pred = model.predict(x, batch_size = 16)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
       steps=steps)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
       batch_outs = f(ins_batch)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
       data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
       self._set_weights()
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
       allow_missing=True)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/bucketing_module.py", line 220, in set_params
       force_init=force_init, allow_extra=allow_extra)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/module.py", line 358, in set_params
       self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 422, in set_params
       exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/executor.py", line 367, in copy_params_from
       array.astype(dst.dtype).copyto(dst)
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2663, in copyto
       return _internal._copyto(self, out=other)
     File "<string>", line 27, in _copyto
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     [bt] (7) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f74bc87d6b2]
     [bt] (6) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0x4d7) [0x7f74bc87d127]
     [bt] (5) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x1c3) [0x7f74bc9da5d3]
     [bt] (4) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xa1c) [0x7f74bc9e63bc]
     [bt] (3) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseShape<1, 1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x219) [0x7f74bcb94889]
     [bt] (2) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)+0x1fc) [0x7f74bcb944bc]
     [bt] (1) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)::{lambda(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*)#1}::operator()(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*) const+0x37a) [0x7f74bcb93c9a]
     [bt] (0) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f74bc7fb35f]
     File "../src/ndarray/./../operator/tensor/../elemwise_op_common.h", line 135
   MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node  at 0-th output: expected [256], got [1]
   ```
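
The final check in the trace fails while a parameter is being copied into the bound executor: the destination expects shape `[256]`, but the supplied array has shape `[1]`. The trace does not name the parameter. One way to narrow it down is to compare expected and supplied shapes by name. The following is a minimal sketch, assuming both sets of shapes have already been collected into plain dictionaries. The function name, the dictionary layout, and the parameter names in the usage lines are hypothetical and are not part of MXNet or `keras-mxnet`; the sketch does not identify the root cause.

```python
def find_shape_mismatches(expected, supplied):
    """Return (name, expected_shape, supplied_shape) for every parameter
    that appears in both mappings with a different shape."""
    mismatches = []
    for name, exp_shape in expected.items():
        if name not in supplied:
            # Skip parameters that were not supplied at all; the call in the
            # trace passes allow_missing=True, so those are tolerated there.
            continue
        exp_shape = tuple(exp_shape)
        sup_shape = tuple(supplied[name])
        if exp_shape != sup_shape:
            mismatches.append((name, exp_shape, sup_shape))
    return mismatches


# Hypothetical shapes, chosen only to mimic "expected [256], got [1]".
expected = {"bn_gamma": (256,), "dense_bias": (10,)}
supplied = {"bn_gamma": (1,), "dense_bias": (10,)}

for name, exp, got in find_shape_mismatches(expected, supplied):
    print("{}: expected {}, got {}".format(name, list(exp), list(got)))
```

In a real session, the two dictionaries would have to be filled from the bound module and from the weights that the backend tries to set; how to obtain them is left open here.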
   The model information is as follows:
   ```
   Model: "sequential_1"
   _________________________________________________________________
   Layer (type)                 Output Shape              Param #
   =================================================================
   conv2d_1 (Conv2D)            (None, 16, 16, 96)        2688
   _________________________________________________________________
   max_pooling2d_1 (MaxPooling2 (None, 8, 8, 96)          0
   _________________________________________________________________
   conv2d_2 (Conv2D)            (None, 8, 8, 256)         614656
   _________________________________________________________________
   max_pooling2d_2 (MaxPooling2 (None, 3, 3, 256)         0
   _________________________________________________________________
   batch_normalization_2 (Batch (None, 3, 3, 256)         1024
   _________________________________________________________________
   conv2d_3 (Conv2D)            (None, 3, 3, 384)         885120
   _________________________________________________________________
   conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
   _________________________________________________________________
   conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
   _________________________________________________________________
   conv2d_5 (Conv2D)            (None, 3, 3, 256)         884992
   _________________________________________________________________
   softmax_1 (Softmax)          (None, 3, 3, 256)         0
   _________________________________________________________________
   max_pooling2d_3 (MaxPooling2 (None, 1, 1, 256)         0
   _________________________________________________________________
   batch_normalization_3 (Batch (None, 1, 1, 256)         1024
   _________________________________________________________________
   flatten_1 (Flatten)          (None, 256)               0
   _________________________________________________________________
   dense_1 (Dense)              (None, 4096)              1052672
   _________________________________________________________________
   dropout_1 (Dropout)          (None, 4096)              0
   _________________________________________________________________
   dense_2 (Dense)              (None, 4096)              16781312
   _________________________________________________________________
   dropout_2 (Dropout)          (None, 4096)              0
   _________________________________________________________________
   dense_3 (Dense)              (None, 10)                40970
   =================================================================
   Total params: 22,919,434
   Trainable params: 22,918,410
   Non-trainable params: 1,024
   _________________________________________________________________
   ```
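
As a sanity check on the summary above, the per-layer parameter counts can be recomputed with the standard formulas for convolution, dense, and batch-normalization layers. This is a sketch under stated assumptions: the kernel sizes (3x3, 5x5, 3x3, ...) are not given in the report and are inferred here from the parameter counts, and the 3 input channels are taken from the CIFAR-10 input shape used in the script below. The helper names are made up for illustration. Note that `conv2d_4` appears twice in the summary, and the reported total only adds up when both rows are counted.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # Square kernel weights plus one bias per output channel.
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # Weight matrix plus one bias per output unit.
    return in_units * out_units + out_units


def batchnorm_params(channels):
    # gamma, beta (trainable) and moving mean, moving variance (non-trainable).
    return 4 * channels


# (layer name as printed in the summary, recomputed parameter count)
layers = [
    ("conv2d_1", conv2d_params(3, 3, 96)),
    ("conv2d_2", conv2d_params(5, 96, 256)),
    ("batch_normalization_2", batchnorm_params(256)),
    ("conv2d_3", conv2d_params(3, 256, 384)),
    ("conv2d_4", conv2d_params(3, 384, 384)),
    ("conv2d_4", conv2d_params(3, 384, 384)),  # listed twice in the summary
    ("conv2d_5", conv2d_params(3, 384, 256)),
    ("batch_normalization_3", batchnorm_params(256)),
    ("dense_1", dense_params(256, 4096)),
    ("dense_2", dense_params(4096, 4096)),
    ("dense_3", dense_params(4096, 10)),
]

total = sum(count for _, count in layers)
non_trainable = 2 * (2 * 256)  # moving mean and variance of both BN layers
trainable = total - non_trainable

print("Total params:", total)
print("Trainable params:", trainable)
print("Non-trainable params:", non_trainable)
```

The recomputed totals agree with the figures reported in the summary, so the summary itself is internally consistent.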
   
   ## To Reproduce
   (If you developed your own code, please provide a short script that 
reproduces the error. For existing examples, please provide link.)
   The code is also included in the zip file linked below.
   ```
   import argparse
   import os
   import sys
   import warnings
   parse = argparse.ArgumentParser()
   parse.add_argument("-bk", type=str, help="the name of backend")
   flags, _ = parse.parse_known_args(sys.argv[1:])
   bk = flags.bk
   os.environ["KERAS_BACKEND"] = bk
   
   # additional flags: deeper stack traces and less runtime logging
   os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="100"
   os.environ["MXNET_SUBGRAPH_VERBOSE"]="0"
   os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"]="0"
   os.environ["MXNET_CUDNN_LIB_CHECKING"]="0"
   
   import keras
   
   def custom_objects():
   
       def no_activation(x):
           return x
   
       def leakyrelu(x):
           import keras.backend as K
           return K.relu(x, alpha=0.01)
   
       objects = {}
       objects['no_activation'] = no_activation
       objects['leakyrelu'] = leakyrelu
       return objects
   
   model_name = "model.h5"
   warnings.filterwarnings("ignore")
   print("start loading model")
   model = keras.models.load_model(model_name)
   _, (x, _) = keras.datasets.cifar10.load_data()
   x = x.astype('float32')/255.0
   x = x.reshape(x.shape[0], 32, 32, 3)
   x = x[:1]
   pred = model.predict(x, batch_size = 16)
   print("Successfully get the prediction")
   print(pred)
   ```
   
   ### Steps to reproduce
   To reproduce:
   - Get the customized model and the reproduction script from the zip file 
[here](https://drive.google.com/drive/folders/1ANXIAboCgBYm-i3ofLQENeCCeODIHFyh?usp=sharing)
   - Go to the folder and install the following requirements in your Python 
environment:
   ```
   mxnet-cu101==1.8.0.post0
   keras-mxnet==2.2.4.3
   h5py==2.10.0
   ```
   - Run the following commands
   
   1. `python get_predictions.py -bk=mxnet`
   2. `python get_predictions.py -bk=cntk`
   
   Command 1 crashes immediately, while command 2 successfully returns the 
model's prediction output.
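
The two commands can also be run from one small driver that records each backend's exit code, which makes the difference between backends easy to see at a glance. This is only a sketch: the function name `run_with_backends` is made up for illustration, and it assumes `get_predictions.py` is in the current directory and that each backend is installed in the Python environment being used.

```python
import subprocess
import sys


def run_with_backends(script, backends, python=sys.executable):
    """Run `script -bk=<backend>` once per backend and return the exit codes."""
    results = {}
    for bk in backends:
        proc = subprocess.run(
            [python, script, "-bk=" + bk],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        results[bk] = proc.returncode
    return results


if __name__ == "__main__":
    codes = run_with_backends("get_predictions.py", ["mxnet", "cntk"])
    for bk, code in codes.items():
        status = "ok" if code == 0 else "failed (exit code {})".format(code)
        print(bk, status)
```

A non-zero exit code only shows that the run failed; the captured `stderr` would still need to be inspected to confirm it is the same error as above.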
   
   ## Environment
   
   ***We recommend using our script for collecting the diagnostic information with the following command***
   `curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`
   
   <details>
   <summary>Environment Information</summary>
   
   ```
   ----------Python Info----------
   Version      : 3.6.13
   Compiler     : GCC 7.5.0
   Build        : ('default', 'Jun  4 2021 14:25:59')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 21.1.3
   Directory    : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.8.0
   Directory    : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet
   Commit hash file "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
   Library      : ['/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so']
   Build features:
   ✔ CUDA
   ✔ CUDNN
   ✔ NCCL
   ✔ CUDA_RTC
   ✖ TENSORRT
   ✔ CPU_SSE
   ✔ CPU_SSE2
   ✔ CPU_SSE3
   ✖ CPU_SSE4_1
   ✖ CPU_SSE4_2
   ✖ CPU_SSE4A
   ✖ CPU_AVX
   ✖ CPU_AVX2
   ✔ OPENMP
   ✖ SSE
   ✖ F16C
   ✖ JEMALLOC
   ✔ BLAS_OPEN
   ✖ BLAS_ATLAS
   ✖ BLAS_MKL
   ✖ BLAS_APPLE
   ✔ LAPACK
   ✔ MKLDNN
   ✔ OPENCV
   ✖ CAFFE
   ✖ PROFILER
   ✔ DIST_KVSTORE
   ✖ CXX14
   ✖ INT64_TENSOR_SIZE
   ✔ SIGNAL_HANDLER
   ✖ DEBUG
   ✖ TVM_OP
   ----------System Info----------
   Platform     : Linux-4.18.0-310.el8.x86_64-x86_64-with-centos-8
   system       : Linux
   release      : 4.18.0-310.el8.x86_64
   version      : #1 SMP Tue Jun 8 00:24:50 UTC 2021
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:        x86_64
   CPU op-mode(s):      32-bit, 64-bit
   Byte Order:          Little Endian
   CPU(s):              36
   On-line CPU(s) list: 0-35
   Thread(s) per core:  2
   Core(s) per socket:  18
   Socket(s):           1
   NUMA node(s):        1
   Vendor ID:           GenuineIntel
   CPU family:          6
   Model:               85
   Model name:          Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
   Stepping:            7
   CPU MHz:             3613.195
   CPU max MHz:         4800.0000
   CPU min MHz:         1200.0000
   BogoMIPS:            6000.00
   Virtualization:      VT-x
   L1d cache:           32K
   L1i cache:           32K
   L2 cache:            1024K
   L3 cache:            25344K
   NUMA node0 CPU(s):   0-35
   Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni md_clear flush_l1d arch_capabilities
   ```
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


