maybeLee opened a new issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416
## Description
I was running inference with a variant of the AlexNet model, using MXNet as the Keras backend through the `keras-mxnet` library. MXNet loads and compiles the model, but reports an error during inference. When I used the other three backends (TensorFlow, Theano, and CNTK) instead, all three ran inference on the same model without crashing.
### Error Message
```
Traceback (most recent call last):
  File "get_predictions.py", line 41, in <module>
    pred = model.predict(x, batch_size = 16)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
    steps=steps)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
    batch_outs = f(ins_batch)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
    data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
    self._set_weights()
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
    allow_missing=True)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/bucketing_module.py", line 220, in set_params
    force_init=force_init, allow_extra=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/module.py", line 358, in set_params
    self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 422, in set_params
    exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/executor.py", line 367, in copy_params_from
    array.astype(dst.dtype).copyto(dst)
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2663, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  [bt] (7) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f74bc87d6b2]
  [bt] (6) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0x4d7) [0x7f74bc87d127]
  [bt] (5) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x1c3) [0x7f74bc9da5d3]
  [bt] (4) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xa1c) [0x7f74bc9e63bc]
  [bt] (3) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseShape<1, 1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x219) [0x7f74bcb94889]
  [bt] (2) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(bool mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)+0x1fc) [0x7f74bcb944bc]
  [bt] (1) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::op::ElemwiseAttrHelper<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string, -1, -1>(std::string const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)::{lambda(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*)#1}::operator()(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*) const+0x37a) [0x7f74bcb93c9a]
  [bt] (0) /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f74bc7fb35f]
  File "../src/ndarray/./../operator/tensor/../elemwise_op_common.h", line 135
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [256], got [1]
```
The model information is as follows:
```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 16, 16, 96)        2688
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 96)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 256)         614656
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 256)         0
_________________________________________________________________
batch_normalization_2 (Batch (None, 3, 3, 256)         1024
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 384)         885120
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 384)         1327488
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 3, 256)         884992
_________________________________________________________________
softmax_1 (Softmax)          (None, 3, 3, 256)         0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 256)         0
_________________________________________________________________
batch_normalization_3 (Batch (None, 1, 1, 256)         1024
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              1052672
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                40970
=================================================================
Total params: 22,919,434
Trainable params: 22,918,410
Non-trainable params: 1,024
_________________________________________________________________
```
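As a cross-check of the summary, the `Param #` column can be recomputed from the usual per-layer formulas (Conv2D and Dense with biases, BatchNormalization with four vectors per channel). This is only a sketch: the kernel sizes are not shown in the summary and are inferred here from the parameter counts (3x3 everywhere, except 5x5 for `conv2d_2`). The reported total of 22,919,434 only works out if both `conv2d_4` rows are counted, so the repeated row appears to reflect a real second layer rather than a copy-paste slip.

```python
# Recompute the "Param #" column of the Keras summary above.
# ASSUMPTION: kernel sizes are inferred from the counts, not read from the model.
# Layers with 0 params (pooling, softmax, flatten, dropout) are omitted.

def conv2d_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

def batchnorm_params(c):
    # gamma, beta (trainable) + moving_mean, moving_variance (non-trainable)
    return 4 * c

def dense_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out

layers = [
    ("conv2d_1", conv2d_params(3, 3, 96), 2688),
    ("conv2d_2", conv2d_params(5, 96, 256), 614656),
    ("batch_normalization_2", batchnorm_params(256), 1024),
    ("conv2d_3", conv2d_params(3, 256, 384), 885120),
    ("conv2d_4", conv2d_params(3, 384, 384), 1327488),
    ("conv2d_4", conv2d_params(3, 384, 384), 1327488),  # listed twice in the summary
    ("conv2d_5", conv2d_params(3, 384, 256), 884992),
    ("batch_normalization_3", batchnorm_params(256), 1024),
    ("dense_1", dense_params(256, 4096), 1052672),
    ("dense_2", dense_params(4096, 4096), 16781312),
    ("dense_3", dense_params(4096, 10), 40970),
]

# Every recomputed count must match the reported one
for name, computed, reported in layers:
    assert computed == reported, (name, computed, reported)

total = sum(computed for _, computed, _ in layers)
# Moving mean + moving variance (256 each) of the two BatchNormalization layers
non_trainable = 2 * (2 * 256)
print(total, total - non_trainable, non_trainable)  # prints: 22919434 22918410 1024
```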
## To Reproduce
The same script is also included in the zip file linked below.
```python
import argparse
import os
import sys
import warnings

parse = argparse.ArgumentParser()
parse.add_argument("-bk", type=str, help="the name of backend")
flags, _ = parse.parse_known_args(sys.argv[1:])
bk = flags.bk
# The backend has to be selected before Keras is imported
os.environ["KERAS_BACKEND"] = bk
# MXNet runtime flags: deeper C++ stack traces, fewer runtime log messages
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"] = "100"
os.environ["MXNET_SUBGRAPH_VERBOSE"] = "0"
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
os.environ["MXNET_CUDNN_LIB_CHECKING"] = "0"

import keras


def custom_objects():
    def no_activation(x):
        return x

    def leakyrelu(x):
        import keras.backend as K
        return K.relu(x, alpha=0.01)

    objects = {}
    objects['no_activation'] = no_activation
    objects['leakyrelu'] = leakyrelu
    return objects


model_name = "model.h5"
warnings.filterwarnings("ignore")
print("start loading model")
model = keras.models.load_model(model_name)
# Use the CIFAR-10 test images, scaled to [0, 1]
_, (x, _) = keras.datasets.cifar10.load_data()
x = x.astype('float32') / 255.0
x = x.reshape(x.shape[0], 32, 32, 3)
x = x[:1]  # only the first test image
pred = model.predict(x, batch_size = 16)
print("Successfully get the prediction")
print(pred)
```
### Steps to reproduce
The steps to reproduce are simple:
- Download the customized model and the reproduction script from the zip file [here](https://drive.google.com/drive/folders/1ANXIAboCgBYm-i3ofLQENeCCeODIHFyh?usp=sharing).
- Go to the folder and install the following requirements in your Python environment:
```
mxnet-cu101==1.8.0.post0
keras-mxnet==2.2.4.3
h5py==2.10.0
```
- Run the following commands:
  1. `python get_predictions.py -bk=mxnet`
  2. `python get_predictions.py -bk=cntk`

Command 1 crashes immediately, while command 2 successfully prints the model's predictions.
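To rerun both commands side by side and compare their exit status, a small driver can be used. This is a sketch that assumes `get_predictions.py` is in the current directory and that a CNTK installation is available alongside the packages listed above; `run_backend` is a hypothetical helper, not part of Keras or MXNet.

```python
import subprocess
import sys


def run_backend(script, backend):
    """Run the repro script under one Keras backend and return its exit code."""
    # Same invocation as the commands above, e.g. `python get_predictions.py -bk=mxnet`.
    # sys.executable reuses the current interpreter, i.e. the same environment.
    result = subprocess.run([sys.executable, script, "-bk=" + backend])
    return result.returncode


if __name__ == "__main__":
    for backend in ("mxnet", "cntk"):
        code = run_backend("get_predictions.py", backend)
        status = "ok" if code == 0 else "failed (exit code %d)" % code
        print("%s: %s" % (backend, status))
```

An uncaught Python exception such as the `MXNetError` above ends the process with a non-zero exit code, so the crash shows up as `failed`.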
## Environment
***We recommend using our script for collecting the diagnostic information with the following command***
`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`
<details>
<summary>Environment Information</summary>
```
----------Python Info----------
Version : 3.6.13
Compiler : GCC 7.5.0
Build : ('default', 'Jun 4 2021 14:25:59')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 21.1.3
Directory : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.8.0
Directory : /data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet
Commit hash file "/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/data/ziniu/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✖ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-4.18.0-310.el8.x86_64-x86_64-with-centos-8
system : Linux
release : 4.18.0-310.el8.x86_64
version : #1 SMP Tue Jun 8 00:24:50 UTC 2021
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 36
On-line CPU(s) list: 0-35
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3613.195
CPU max MHz: 4800.0000
CPU min MHz: 1200.0000
BogoMIPS: 6000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-35
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp
ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx
smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec
xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida
arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni md_clear
flush_l1d arch_capabilities
```
</details>