maybeLee opened a new issue #20539:
URL: https://github.com/apache/incubator-mxnet/issues/20539


   ## Description
   The dimension check in the Pooling operator's shape inference is wrong. The 
script below fails with an `MXNetError` on both MXNet 1.5.1 and MXNet 1.8.0.
   
   ### Error Message
   ```
   Using MXNet backend
   lib/python3.6/site-packages/keras/__init__.py:31: DeprecationWarning: MXNet 
support in Keras is going to be discontinued and v2.2.4.3 is the last release 
as multi-backend Keras has been discontinued . It is recommended to consider 
switching to MXNet Gluon. More information can be found here: 
https://github.com/awslabs/keras-apache-mxnet
     "https://github.com/awslabs/keras-apache-mxnet", DeprecationWarning)
   Traceback (most recent call last):
     File "try.py", line 35, in <module>
       x = layers.Conv2D(filters=3, kernel_size=(5, 5))(x)
     File "lib/python3.6/site-packages/keras/engine/base_layer.py", line 470, 
in __call__
       output = self.call(inputs, **kwargs)
     File "lib/python3.6/site-packages/keras/layers/convolutional.py", line 
175, in call
       dilation_rate=self.dilation_rate)
     File "lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 
3709, in conv2d
       padding_mode=padding, data_format=data_format)
     File "lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 
95, in func_wrapper
       train_symbol = func(*args, **kwargs)
     File "lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 
5049, in _convnd
       padding, is_slice, out_size = _preprocess_padding_mode(padding_mode, 
x.shape,
     File "lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 
4399, in shape
       return self._get_shape()
     File "lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 
4408, in _get_shape
       _, out_shape, _ = self.symbol.infer_shape_partial()
     File "lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1152, in 
infer_shape_partial
       return self._infer_shape_impl(True, *args, **kwargs)
     File "lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1210, in 
_infer_shape_impl
       ctypes.byref(complete)))
     File "lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: Error in operator average_pooling2d_1/pool2d1: 
[22:12:42] src/operator/nn/pooling.cc:109: Check failed: dshape.ndim() >= 3 (-1 
vs. 3) : Pooling: Input data should be  3D in (batch, channel, x) Or 4D in 
(batch, channel, y, x)  Or 5D in (batch, channel, d, y, x)
   Stack trace:
     [bt] (0) 1   libmxnet.so                         0x000000011249e929 
mxnet::op::NDArrayOpProp::~NDArrayOpProp() + 4473
     [bt] (1) 2   libmxnet.so                         0x00000001128dd540 
mxnet::op::FullyConnectedComputeExCPU(nnvm::NodeAttrs const&, mxnet::OpContext 
const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > 
const&, std::__1::vector<mxnet::OpReqType, 
std::__1::allocator<mxnet::OpReqType> > const&, 
std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) 
+ 750816
     [bt] (2) 3   libmxnet.so                         0x00000001139e535a 
std::__1::__tree<std::__1::__value_type<unsigned long, mxnet::NDArray>, 
std::__1::__map_value_compare<unsigned long, std::__1::__value_type<unsigned 
long, mxnet::NDArray>, std::__1::less<unsigned long>, true>, 
std::__1::allocator<std::__1::__value_type<unsigned long, mxnet::NDArray> > 
>::erase(std::__1::__tree_const_iterator<std::__1::__value_type<unsigned long, 
mxnet::NDArray>, std::__1::__tree_node<std::__1::__value_type<unsigned long, 
mxnet::NDArray>, void*>*, long>) + 42170
     [bt] (3) 4   libmxnet.so                         0x00000001139dd94e 
std::__1::__tree<std::__1::__value_type<unsigned long, mxnet::NDArray>, 
std::__1::__map_value_compare<unsigned long, std::__1::__value_type<unsigned 
long, mxnet::NDArray>, std::__1::less<unsigned long>, true>, 
std::__1::allocator<std::__1::__value_type<unsigned long, mxnet::NDArray> > 
>::erase(std::__1::__tree_const_iterator<std::__1::__value_type<unsigned long, 
mxnet::NDArray>, std::__1::__tree_node<std::__1::__value_type<unsigned long, 
mxnet::NDArray>, void*>*, long>) + 10926
     [bt] (4) 5   libmxnet.so                         0x0000000113979a15 
MXSymbolInferShapeEx + 2581
     [bt] (5) 6   libmxnet.so                         0x000000011397aa30 
MXSymbolInferShapePartialEx + 112
     [bt] (6) 7   libffi.7.dylib                      0x0000000110ab0ead 
ffi_call_unix64 + 85
   
   ```
   ## To Reproduce
   Script that reproduces the problem:
   ```
   import os
   import warnings
   
   if __name__ == "__main__":
       os.environ["KERAS_BACKEND"] = "mxnet"
       warnings.filterwarnings("ignore", category=DeprecationWarning)
       warnings.filterwarnings("ignore", category=UserWarning)
       warnings.filterwarnings("ignore", category=FutureWarning)
       warnings.filterwarnings("ignore")
       gpu_ids = "1"
       os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
       os.environ["MXNET_SUBGRAPH_VERBOSE"] = "0"
       os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
       os.environ["MXNET_CUDNN_LIB_CHECKING"] = "0"
   
       import keras
       from keras import layers
   
       input_shape = layers.Input(shape=(10, 10, 6))
       # the error will disappear when not using softmax
       x = layers.Conv2DTranspose(filters=3, kernel_size=(3, 3),
                                  activation='softmax')(input_shape)
   
       # with the default setting (channels_last), MXNet raises a different error
       x = layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2),
                                   data_format='channels_first')(x)
   
       x = layers.Conv2D(filters=3, kernel_size=(5, 5))(x)
       new_model = keras.models.Model(input_shape, x)
       # print("successfully get the prediction result")
       del new_model
   
   ```
   
   ### Steps to reproduce
   Run the script above directly.
   
   ## What have you tried to solve it?
   
   This looks like an inadequate assertion.
   I looked into the code that triggers the bug (line 109 of 
`src/operator/nn/pooling.cc`):
   
![image](https://user-images.githubusercontent.com/32777264/129915756-bc6301cf-f776-42c1-b807-086bde12c44b.png)
   When checking the number of dimensions of `dshape`, the Pooling function 
does not account for the case where that number is `-1`. I suspect this is 
what causes MXNet to crash.
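
To make the suspected gap concrete, here is a minimal pure-Python sketch of a dimension check that tolerates an unknown rank. It is not MXNet's code: the function name `check_pooling_input_ndim` and the constant `UNKNOWN_NDIM` are hypothetical, the `-1` sentinel is taken from the `(-1 vs. 3)` in the error message, and the 3D/4D/5D range is taken from the wording of that message.

```python
from typing import Optional

# Assumption: -1 marks "rank not known yet", as the "(-1 vs. 3)" in the
# error message suggests. The name is hypothetical, not an MXNet identifier.
UNKNOWN_NDIM = -1


def check_pooling_input_ndim(ndim: int) -> Optional[str]:
    """Toy stand-in for the rank check in a pooling shape-inference step.

    Returns None when the rank is still unknown (inference should be
    deferred), otherwise the pooling kind implied by the rank.
    """
    # Hypothetical guard: an unknown rank is not an error during partial
    # shape inference, so report "cannot decide yet" instead of failing.
    if ndim == UNKNOWN_NDIM:
        return None

    # Only once the rank is known does the 3D/4D/5D requirement apply.
    if not 3 <= ndim <= 5:
        raise ValueError(
            f"Pooling: input data should be 3D, 4D or 5D, got {ndim}D"
        )
    return {3: "1D", 4: "2D", 5: "3D"}[ndim]


if __name__ == "__main__":
    print(check_pooling_input_ndim(-1))  # rank unknown: deferred
    print(check_pooling_input_ndim(4))   # (batch, channel, y, x)
```

With a guard of this kind, a call such as `infer_shape_partial()` could report the output shape as unknown instead of raising, and the strict check would still reject inputs whose known rank is too small.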
   
   ## Environment
   The issue was reproduced on CentOS 8, macOS Big Sur, and Windows 10.
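
For anyone reproducing this on another platform, a small helper along these lines could collect the relevant environment details. It is only a sketch: `collect_environment` is a hypothetical name, and the MXNet version is read from `mxnet.__version__` only if the import succeeds.

```python
import platform
import sys


def collect_environment() -> dict:
    """Gather OS, Python and (if importable) MXNet version information."""
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    try:
        import mxnet  # optional: only present where MXNet is installed
        info["mxnet"] = mxnet.__version__
    except Exception:
        # Any import failure is reported rather than raised.
        info["mxnet"] = "not available"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```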


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


