[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395792#comment-16395792 ]

Chris Olivier commented on MXNET-60:

I'm lost – how is this related to profiler?

> MXNET_MKLDNN_DEBUG=1 produces errors
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
> Issue Type: Bug
> Reporter: Marco de Abreu
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as environment variable will produce the following error in tests. This happens across all configurations and seeds. I do not think that this is a test failure.
>
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>     self.test(*self.arg)
>   File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
>     orig_test(*args, **kwargs)
>   File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in test_models
>     model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
>   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
>     check_call(_LIB.MXNDArrayWaitToRead(self.handle))
>   File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
>     raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check failed: similar
> Stack trace returned 10 entries:
> [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f06ccf3745b]
> [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f06ccf38478]
> [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) [0x7f06cf55a0d9]
> [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector > > const&, std::vector > > const&, std::vector > > const&, std::vector > > const&, std::vector > > const&, std::vector > > const&)::{lambda(mxnet::RunContext)#1} >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) [0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) [0x7f06cfc11fdb]
> [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr)#1} >::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl (std::shared_ptr)> (std::shared_ptr)> > >::_M_run()+0x4a) [0x7f06cfc1c43a]
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f06d7ca4c80]
> >> begin captured stdout << -
> ResNetV1(
>   (features): HybridSequential(
>     (0): Conv2D(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
>     (1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
>     (2): Activation(relu)
>     (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False)
>     (4): HybridSequential(
>       (0): BasicBlockV1(
>         (body): HybridSequential(
>           (0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
>           (1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
>           (2): Activation(relu)
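
For anyone trying to reproduce the failure described in the issue, here is a minimal sketch of how the test might be launched with the debug variable set. It only builds the command and environment. The checkout path `/work/mxnet` and the test name come from the traceback in the issue; the helper name `build_repro` and the `--run` switch are hypothetical, and it assumes `nose` is installed for the interpreter in use and that the build has MKL-DNN enabled.

```python
import os
import subprocess
import sys


def build_repro(mxnet_root="/work/mxnet"):
    """Return (argv, env) for running test_models with MXNET_MKLDNN_DEBUG=1.

    mxnet_root is an assumption: it mirrors the path seen in the traceback.
    """
    env = dict(os.environ)
    # The variable named in the issue title; it must be set before MXNet starts.
    env["MXNET_MKLDNN_DEBUG"] = "1"
    test_file = os.path.join(
        mxnet_root, "tests", "python", "unittest", "test_gluon_model_zoo.py"
    )
    # nose selects a single test with the "file.py:test_name" syntax.
    argv = [sys.executable, "-m", "nose", "-v", test_file + ":test_models"]
    return argv, env


if __name__ == "__main__":
    argv, env = build_repro()
    print("MXNET_MKLDNN_DEBUG=" + env["MXNET_MKLDNN_DEBUG"] + " " + " ".join(argv))
    # Only launch the test when explicitly asked to (hypothetical switch).
    if "--run" in sys.argv:
        sys.exit(subprocess.call(argv, env=env))
```

Passing the variable through `env` rather than setting it inside the test process keeps it visible from the moment the library is loaded.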
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395222#comment-16395222 ]

Patric Zhao commented on MXNET-60:

Yes, thanks for the efforts. I will look into this issue.
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395191#comment-16395191 ]

Marco de Abreu commented on MXNET-60:

Yes, it still fails. See [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10069/1/pipeline/483] for reference
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395152#comment-16395152 ]

Marco de Abreu commented on MXNET-60:

Never mind my previous comment; this happened on Ubuntu. I have created a PR at [https://github.com/apache/incubator-mxnet/pull/10069]. If it succeeds, everything is fine. Otherwise, I'd appreciate it if you could look into it.
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394966#comment-16394966 ]

Marco de Abreu commented on MXNET-60:

I think this only happens on CentOS 7, but let me double-check. Da said that this is a known error.
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394900#comment-16394900 ]

Patric Zhao commented on MXNET-60:

[~rahul003] two purposes here:
# Check if this open issue can be reproduced. Seems NOT.
# Check if we can get the profiling results of MKL-DNN OPs with the latest profiling environment. Seems NOT.

Thanks,
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394893#comment-16394893 ]

Rahul Huilgol commented on MXNET-60:

Hi Patric, sorry, I didn't follow what you said. Are you trying to profile the code and facing issues?
[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors
[ https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394872#comment-16394872 ]

Patric Zhao commented on MXNET-60:

[~rahul003] I tested the latest code and got neither the error nor any profiler output.

_commit 94f68fc8fd21611b7f5c148cb0e5d134efe58f87_
_Author: Rahul Huilgol_
_Date: Sun Mar 11 04:00:55 2018 -0700_
_Fixes for profiler (#9932)_

MKLDNN: the output contains only some code path information.

[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Convolution
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Pooling
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test FullyConnected