maybeLee edited a comment on issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416#issuecomment-877246558


   Hi, to make this bug easier to reproduce and understand, I have simplified the triggering model into a very simple, randomly generated three-layer model.
   The model consists of three layers: one softmax, one max pooling, and one batch normalization. I build the model with Keras and randomly generate the weights for the batch normalization layer.
   You can reproduce the bug by **just running the following program** with mxnet version 1.8.0:
   
   ```
   import os
   import argparse
   import sys
   import warnings

   # Select the Keras backend before importing keras (defaults to mxnet).
   parse = argparse.ArgumentParser()
   parse.add_argument("--bk", type=str, default="mxnet", help="the name of backend")
   flags, _ = parse.parse_known_args(sys.argv[1:])
   os.environ["KERAS_BACKEND"] = flags.bk

   import keras
   from keras import initializers, layers
   import numpy as np

   warnings.filterwarnings("ignore", category=DeprecationWarning)
   warnings.filterwarnings("ignore", category=UserWarning)

   # Three-layer model: softmax -> max pooling -> batch normalization.
   model_1 = keras.models.Sequential()
   model_1.add(layers.Softmax())
   model_1.add(layers.MaxPooling2D())
   model_1.add(layers.BatchNormalization())

   # A random input of shape (1, 3, 3, 256) triggers the crash on the mxnet backend.
   x = np.random.rand(1, 3, 3, 256)
   pred = model_1.predict(x)
   print(pred)
   ```
   Run the following command (assuming you saved the above toy program as `try.py`):
   - `python try.py --bk mxnet`
   
   You will hit a crash with the same symptom I mentioned before:
   
   ```
   Traceback (most recent call last):
     File "try.py", line 23, in <module>
       pred = model_1.predict(x)
     File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
       steps=steps)
     File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
       batch_outs = f(ins_batch)
     File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
       data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
     File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
       self._set_weights()
     File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
       allow_missing=True)
     File "/mxnet/incubator-mxnet/python/mxnet/module/bucketing_module.py", line 220, in set_params
       force_init=force_init, allow_extra=allow_extra)
     File "/mxnet/incubator-mxnet/python/mxnet/module/module.py", line 358, in set_params
       self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
     File "/mxnet/incubator-mxnet/python/mxnet/module/executor_group.py", line 422, in set_params
       exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
     File "/mxnet/incubator-mxnet/python/mxnet/executor.py", line 367, in copy_params_from
       array.astype(dst.dtype).copyto(dst)
     File "/mxnet/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 2663, in copyto
       return _internal._copyto(self, out=other)
     File "<string>", line 27, in _copyto
     File "/mxnet/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/mxnet/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "src/operator/numpy/linalg/./../../tensor/../elemwise_op_common.h", line 135
   MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node  at 0-th output: expected [256], got [1]
   ```
   
   But if you run the same program using CNTK as the backend (`python try.py --bk cntk`), everything works fine.
   
   One interesting thing: if I delete any one of the `softmax`, `max pooling`, or `batch normalization` layers, no crash happens (a sketch of such a two-layer variant is below).
   Further, after some investigation, I suspect this is a bug caused by incorrect shape inference in mxnet.
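   
   For reference, here is a minimal sketch of one such two-layer variant (with the max pooling layer removed; the other two deletions behave the same way in my runs). It reuses the imports from the script above and, in my environment, predicts without crashing:
   
   ```
   # Hedged sketch: same setup as try.py, but with the MaxPooling2D layer removed.
   model_2 = keras.models.Sequential()
   model_2.add(layers.Softmax())
   model_2.add(layers.BatchNormalization())
   x = np.random.rand(1, 3, 3, 256)
   print(model_2.predict(x))  # no crash observed with the mxnet backend
   ```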
   
   When I change the input shape to `x = np.random.rand(1, 3, 3, 1)`, `x = np.random.rand(1, 8, 8, 4)`, or `x = np.random.rand(1, 5, 3, 1)`, everything works fine and mxnet does not crash.
   **But if I set the input shape to `x = np.random.rand(1, 3, 3, 10)`, where the `-1`-th dimension does not match the `-2`-th dimension after max pooling, mxnet crashes and reports the same check failure.**
   Therefore, I assume that some code path in `elemwise_op_common.h` expects the `-1`-th dimension to be consistent with the `-2`-th dimension. A rough sketch of this shape sweep follows.
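   
   A minimal sketch of that sweep (the set of shapes is mine, chosen to mirror the cases above; it assumes the same imports and model-building code as in `try.py`):
   
   ```
   # Hedged sketch: probe several input shapes against the same three-layer model.
   shapes = [(1, 3, 3, 1), (1, 8, 8, 4), (1, 5, 3, 1), (1, 3, 3, 10), (1, 3, 3, 256)]
   for shape in shapes:
       m = keras.models.Sequential()
       m.add(layers.Softmax())
       m.add(layers.MaxPooling2D())
       m.add(layers.BatchNormalization())
       try:
           m.predict(np.random.rand(*shape))
           print(shape, "ok")
       except Exception as e:  # the failing shapes raise MXNetError in my runs
           print(shape, "crash:", type(e).__name__)
   ```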
   
   Can you help check whether this is a real problem, and what the root cause of the issue is?
   Many thanks for your help.
   
   
   

