azai91 edited a comment on issue #10994: MKLDNN fails in the backward computation when forward runs with is_train=False
URL: https://github.com/apache/incubator-mxnet/issues/10994#issuecomment-404974972

I think this might be an issue not specific to mkldnn.

built without mkldnn

```
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original/build$ cmake -DUSE_MKLDNN=OFF -DUSE_CUDNN=ON -DUSE_CUDA=ON -DBLAS=Open -GNinja -DCMAKE_BUILD_TYPE=Debug .. && ninja
```

and still get the issue

```
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ nosetests tests/python/unittest/test_gluon.py:check_hybrid_static_memory
/usr/local/lib/python3.5/dist-packages/nose/util.py:453: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  inspect.getargspec(func)
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=341183070 to reproduce.
[23:03:58] src/operator/nn/mkldnn/mkldnn_base.cc:73: Allocate 147456 bytes with malloc directly
terminate called after throwing an instance of 'dmlc::Error'
  what():  [23:03:58] src/engine/./threaded_engine.h:379: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f2eadfbfadc]
[bt] (1) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f2eadfc0e58]
[bt] (2) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xfa9) [0x7f2eb0cb4619]
[bt] (3) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0xe2) [0x7f2eb0ccb102]
[bt] (4) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f2eb0cb355a]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2e87a43c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2f122586ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2f11f8e41d]
Aborted (core dumped)
```
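
The error text in the log suggests rerunning with `MXNET_ENGINE_TYPE` set to `NaiveEngine`, and the `[INFO]` line gives the seed to reproduce with. A minimal sketch of how that rerun could be set up is below. The helper names `debug_env` and `debug_command_line` are hypothetical and not part of MXNet; only the two environment variable names, the seed, and the test path are taken from the log above. The sketch only builds and prints the command, it does not run the test.

```python
import os
import shlex


def debug_env(seed, base_env=None):
    """Return a copy of the environment with the two variables named in the log set.

    `debug_env` is a hypothetical helper, not an MXNet API.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Force synchronous execution so the backtrace points at the failing call.
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    # Reuse the seed printed by the test harness so the run is reproducible.
    env["MXNET_MODULE_SEED"] = str(seed)
    return env


def debug_command_line(test_id, seed):
    """Build a shell command line that reruns one nose test with the debug settings."""
    settings = [
        "MXNET_ENGINE_TYPE=NaiveEngine",
        "MXNET_MODULE_SEED={}".format(seed),
    ]
    return " ".join(settings + ["nosetests", shlex.quote(test_id)])


if __name__ == "__main__":
    # Test path and seed copied from the log in this comment.
    test_id = "tests/python/unittest/test_gluon.py:check_hybrid_static_memory"
    print(debug_command_line(test_id, 341183070))
```

Passing the variables on the command line (or through a copied environment) keeps them scoped to that one run, which matches the log's reminder to set `MXNET_ENGINE_TYPE` back to empty after debugging.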