maybeLee edited a comment on issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416#issuecomment-877246558
Hi, to make this bug easier to reproduce and understand, I have simplified the
triggering model into a very small, randomly generated three-layer model.
The model contains three layers: one softmax, one max pooling, and one batch
normalization. I create such models with Keras and randomly generate weights
for the batch normalization layer.
You can reproduce the bug by **just running the following program** with
mxnet version 1.8.0:
```
import os
import argparse
import sys
import warnings

# Select the Keras backend before importing keras
parse = argparse.ArgumentParser()
parse.add_argument("--bk", type=str, default="mxnet", help="the name of backend")
flags, _ = parse.parse_known_args(sys.argv[1:])
os.environ["KERAS_BACKEND"] = flags.bk

import keras
from keras import initializers, layers
import numpy as np

warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Minimal three-layer model: softmax -> max pooling -> batch normalization
model_1 = keras.models.Sequential()
model_1.add(layers.Softmax())
model_1.add(layers.MaxPooling2D())
model_1.add(layers.BatchNormalization())

# NHWC input; the crash depends on this shape (see below)
x = np.random.rand(1, 3, 3, 256)
pred = model_1.predict(x)
print(pred)
```
Then run the following command (assuming you saved the above toy program as
`try.py`):
- `python try.py --bk mxnet`
You will hit a crash with the same symptom as I mentioned before:
```
Traceback (most recent call last):
  File "try.py", line 23, in <module>
    pred = model_1.predict(x)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
    steps=steps)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
    batch_outs = f(ins_batch)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
    data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
    self._set_weights()
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
    allow_missing=True)
  File "/mxnet/incubator-mxnet/python/mxnet/module/bucketing_module.py", line 220, in set_params
    force_init=force_init, allow_extra=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/module/module.py", line 358, in set_params
    self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/module/executor_group.py", line 422, in set_params
    exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/executor.py", line 367, in copy_params_from
    array.astype(dst.dtype).copyto(dst)
  File "/mxnet/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 2663, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/mxnet/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/mxnet/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "src/operator/numpy/linalg/./../../tensor/../elemwise_op_common.h", line 135
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [256], got [1]
```
But if you run the same program with CNTK as the backend (`python try.py --bk
cntk`), everything works fine.
One interesting thing: if I delete any one of the `softmax`, `batch
normalization`, or `max pooling` layers, no crash happens.
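For completeness, here is a hedged sketch of how that layer-ablation check can be scripted (same backend setup as the script above; the variant names and the `build_and_predict` helper are just for illustration, not part of the original report). Each two-layer subset runs without crashing for me; only the full three-layer stack triggers the error:
```
import os
os.environ["KERAS_BACKEND"] = "mxnet"  # set the backend before importing keras
import keras
from keras import layers
import numpy as np

def build_and_predict(layer_list, x):
    # Build a Sequential model from the given layers and run one prediction
    model = keras.models.Sequential()
    for layer in layer_list:
        model.add(layer)
    return model.predict(x)

x = np.random.rand(1, 3, 3, 256)
variants = {
    "without softmax": [layers.MaxPooling2D(), layers.BatchNormalization()],
    "without max pooling": [layers.Softmax(), layers.BatchNormalization()],
    "without batch norm": [layers.Softmax(), layers.MaxPooling2D()],
}
for name, layer_list in variants.items():
    print(name, build_and_predict(layer_list, x).shape)
```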
Furthermore, I did some investigation and suspect this is a bug caused by
incorrect shape inference in mxnet.
When I change the input shape to `x = np.random.rand(1, 3, 3, 1)`, `x =
np.random.rand(1, 8, 8, 4)`, or `x = np.random.rand(1, 5, 3, 1)`, everything
works fine and mxnet does not crash.
**But if I set the input shape to `x = np.random.rand(1, 3, 3, 10)`, where the
`-1`-th (channel) dimension does not match the `-2`-th dimension after max pooling,
mxnet crashes and reports the check-failed error above.**
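To make the pattern concrete, here is a small sketch (assuming the Keras defaults of NHWC layout and a 2x2 pool with stride 2; the `maxpool2d_output_shape` helper is hypothetical, purely for illustration) that prints the post-pooling shape and the batch normalization parameter shape for each input I tried:
```
def maxpool2d_output_shape(shape, pool=2):
    # Keras MaxPooling2D with default pool size 2 on NHWC input:
    # spatial dims are floor-divided by 2, channels are unchanged
    n, h, w, c = shape
    return (n, h // pool, w // pool, c)

for in_shape in [(1, 3, 3, 256), (1, 3, 3, 1), (1, 8, 8, 4), (1, 5, 3, 1), (1, 3, 3, 10)]:
    out = maxpool2d_output_shape(in_shape)
    # BatchNormalization parameters (gamma, beta, moving mean/variance) all have shape (channels,)
    print(in_shape, "->", out,
          "bn params:", (out[-1],),
          "channels match last spatial dim:", out[-1] == out[-2])
```
Only the crashing inputs end up with the channel dimension different from the last spatial dimension after pooling, which seems consistent with the `expected [256], got [1]` message in the traceback.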
Therefore, I assume some code logic inside the `elemwise_op_common.h` file
expects the `-1`-th dimension to be consistent with the `-2`-th dimension.
Could you help check whether this is a real problem, and what the root cause
of this issue is?
Thanks a lot for your help.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.