Adam1105 opened a new issue #20858: URL: https://github.com/apache/incubator-mxnet/issues/20858
## Description

I am using the latest v1.8.x release of MXNet, installed with pip (mxnet-1.8.0.post0-cp39-cp39-macosx_10_13_x86_64.whl); more details are in the Environment section. When MKLDNN and the NaiveEngine are both in use, a model with an even number of channels in batch norm crashes in the backward call with an "MXNetError: Check failed: !is_view" error. In the script below, `channels=45` runs to completion while `channels=64` fails. This looks very similar to [issue #19150](https://github.com/apache/incubator-mxnet/issues/19150), which apparently was fixed only for the forward pass.

### Error Message

Is MKLDNN enabled: True
input channel of 45
[15:54:53] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
input channel of 45, (1, 45, 8, 80, 80)
input channel of 64
Traceback (most recent call last):
  File "/Users/gabrysa/./buggy_model.py", line 67, in <module>
    l.backward()
  File "/usr/local/lib/python3.9/site-packages/mxnet/ndarray/ndarray.py", line 2864, in backward
    check_call(_LIB.MXAutogradBackwardEx(
  File "/usr/local/lib/python3.9/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "../src/ndarray/ndarray.cc", line 650
MXNetError: Check failed: !is_view:

## To Reproduce

### code

```python
from mxnet import init
from mxnet.context import cpu
from mxnet.gluon import nn, loss, Trainer
from mxnet.gluon.block import HybridBlock
from mxnet.gluon.nn import BatchNorm
import mxnet as mx


class BuggyModel(HybridBlock):
    def __init__(
        self,
        channels,
        norm_layer=BatchNorm,
        norm_kwargs=None,
        in_channels=3,
        **kwargs
    ):
        super(BuggyModel, self).__init__(**kwargs)
        self.in_channels = in_channels
        with self.name_scope():
            self.conv1 = nn.Conv3D(
                in_channels=self.in_channels,
                channels=channels,
                kernel_size=(1, 7, 7),
                strides=(1, 2, 2),
                padding=(0, 3, 3),
                use_bias=False,
            )
            self.bn1 = norm_layer(in_channels=channels, **({} if norm_kwargs is None else norm_kwargs))

    def hybrid_forward(self, F, x):
        """Hybrid forward of R2+1D net"""
        x = self.conv1(x)
        x = self.bn1(x)
        return x


print(f"Is MKLDNN enabled: {mx.runtime.Features().is_enabled('MKLDNN')}")

# Case 1: 45 channels. The backward pass completes.
print(f"input channel of 45")
net = BuggyModel(channels=45)
net.initialize(init=init.Constant(1))
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()
print(f"input channel of 45, {output.shape}")

# Case 2: 64 channels. l.backward() raises "Check failed: !is_view".
print(f"input channel of 64")
net = BuggyModel(channels=64)
net.initialize(init=init.Constant(1))
input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()
print(f"input channel of 64, {output.shape}")
```

### Steps to reproduce

1. Paste the code above into `./code.py`.
2. Run it with MKLDNN enabled under the MXNet NaiveEngine: `MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py`

## Environment

<details>
<summary>Environment Information</summary>

```
----------Python Info----------
Version      : 3.9.6
Compiler     : Clang 12.0.5 (clang-1205.0.22.9)
Build        : ('default', 'Jun 29 2021 05:25:02')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 21.1.3
Directory    : /usr/local/lib/python3.9/site-packages/pip
----------MXNet Info-----------
Version      : 1.8.0
Directory    : /usr/local/lib/python3.9/site-packages/mxnet
Commit Hash  : 891d36c2d1c28f9486ec34ce4a7812e27896acef
891d36c2d1c28f9486ec34ce4a7812e27896acef
891d36c2d1c28f9486ec34ce4a7812e27896acef
Library      : ['/usr/local/lib/python3.9/site-packages/mxnet/libmxnet.dylib']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✔ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"
KMP_INIT_AT_FORK="FALSE"
```

</details>
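
To help narrow down which combination triggers the failure, here is a minimal sketch of a driver that runs the reproduction script under each engine/MKLDNN combination. It is not part of the original report. It assumes the script is saved as `./code.py`, that `MXNET_MKLDNN_ENABLED=0` disables MKLDNN at runtime in this build, and that `ThreadedEnginePerDevice` is an appropriate engine to compare against; `build_env` and `run_repro` are hypothetical helper names.

```python
import os
import subprocess
import sys


def build_env(engine="NaiveEngine", mkldnn_enabled=True, base_env=None):
    """Return a copy of the environment with the MXNet switches set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = engine
    # Assumption: MXNET_MKLDNN_ENABLED=0 turns MKLDNN off at runtime.
    env["MXNET_MKLDNN_ENABLED"] = "1" if mkldnn_enabled else "0"
    return env


def run_repro(script, engine, mkldnn_enabled):
    """Run the script in a fresh interpreter.

    Returns the exit code and the last line written to stderr
    (an empty string if stderr was empty).
    """
    proc = subprocess.run(
        [sys.executable, script],
        env=build_env(engine, mkldnn_enabled),
        capture_output=True,
        text=True,
    )
    stderr_lines = proc.stderr.strip().splitlines()
    return proc.returncode, stderr_lines[-1] if stderr_lines else ""


if __name__ == "__main__":
    script = "./code.py"  # path used in the steps to reproduce
    for engine in ("NaiveEngine", "ThreadedEnginePerDevice"):
        for mkldnn_enabled in (True, False):
            code, last = run_repro(script, engine, mkldnn_enabled)
            print(f"{engine:25s} MKLDNN={mkldnn_enabled!s:5s} exit={code} {last}")
```

Each configuration runs in its own subprocess because the engine type is read when MXNet starts, so it cannot be switched inside a single Python session. A non-zero exit code only for the NaiveEngine with MKLDNN enabled would be consistent with the behaviour described in this report.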
