I compiled MXNet from the 1.6.x branch against a non-standard CUDA version. To silence errors from the Thrust library (caused by a version mismatch with CUDA), I had to add

```c
#define THRUST_IGNORE_CUB_VERSION_CHECK 1
```

to multiple files in the `src/` directory.
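Adding the line by hand to each file is tedious. Here is a minimal Python sketch that automates the same workaround. It assumes that only files which directly `#include` a `thrust/` header need the define, and that `.cc`, `.cu`, `.cuh` and `.h` are the relevant extensions. The helper name `add_define` is hypothetical. Files that reach Thrust only through another header are not detected, so defining the macro once on the compiler command line (`-DTHRUST_IGNORE_CUB_VERSION_CHECK`) may be the more robust option.

```python
import re
from pathlib import Path

DEFINE = "#define THRUST_IGNORE_CUB_VERSION_CHECK 1\n"
# Matches direct includes such as #include <thrust/sort.h> or #include "thrust/copy.h"
THRUST_INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]thrust/', re.MULTILINE)
# Assumed set of source/header extensions worth checking
SUFFIXES = {".cc", ".cu", ".cuh", ".h"}


def add_define(src_root):
    """Prepend DEFINE to every file under src_root that directly includes a
    Thrust header and does not already mention the macro.
    Returns the list of patched paths."""
    patched = []
    for path in sorted(Path(src_root).rglob("*")):
        if path.suffix not in SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="surrogateescape")
        if "THRUST_IGNORE_CUB_VERSION_CHECK" in text:
            continue  # already patched, or defined some other way
        if not THRUST_INCLUDE.search(text):
            continue  # no direct Thrust include, nothing to do
        # The macro must be defined before the first Thrust include,
        # so it goes at the very top of the file.
        path.write_text(DEFINE + text, encoding="utf-8", errors="surrogateescape")
        patched.append(path)
    return patched
```

Run it from the repository root as `add_define("src")` before building. It skips files that already mention the macro, so running it twice is harmless.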
With that, the Python library builds successfully and training works fine. When I load a model from disk, I do:

```python
model.bind(...)
model.set_params(arg_params, aux_params)
...
model.predict(...)
```

and inference also works fine. But when the process exits, I get this stack trace:
```
Segmentation fault: 11
Segmentation fault: 11
Stack trace:
[bt] (0) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0x18c3f39) [0x7fabd6dacf39]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fac02bff210]
[bt] (2) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15a7541) [0x7faaf9c89541]
[bt] (3) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15c710f) [0x7faaf9ca910f]
[bt] (4) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f) [0x7faaf88de72f]
[bt] (5) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116) [0x7fabd6cb1c56]
[bt] (6) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x287) [0x7fabd6ccb007]
[bt] (7) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x44) [0x7fabd6ccb3c4]
[bt] (8) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x45) [0x7fabd6cc6095]
```
```
Segmentation fault (core dumped)
```
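The "11" in "Segmentation fault: 11" is the signal number of SIGSEGV on Linux. The trace shows the fault being raised inside `cudnnDestroy`, called from `mshadow::DeleteStream` on an engine GPU worker thread. In other words, it happens during shutdown, after `predict()` has already returned. Below is a minimal sketch for two things: confirming from the outside that a job produced its output before dying, and the blunt workaround of skipping teardown with `os._exit(0)`. The crash is simulated with an `atexit` hook that sends SIGSEGV. That hook is an illustrative assumption standing in for MXNet's shutdown path, not MXNet code. Whether `os._exit` avoids the real crash is untested here. It also skips Python's normal cleanup, so flush or close any output files before calling it.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description.
    On POSIX, a negative return code -N means the child was killed by signal N."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_script(code):
    """Run a Python snippet in a fresh interpreter; return (stdout, description)."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    return proc.stdout, describe_exit(proc.returncode)


# Stand-in for the real job: the work succeeds, then something crashes during
# teardown. The atexit hook that sends SIGSEGV is purely illustrative.
CRASH_ON_EXIT = """
import atexit, os, signal
atexit.register(lambda: os.kill(os.getpid(), signal.SIGSEGV))
print("inference done", flush=True)
"""

# Same job, but it ends with os._exit(0): Python atexit handlers (and C++ static
# destructors registered by native libraries) are skipped, so the crash never runs.
SKIP_TEARDOWN = CRASH_ON_EXIT + "import os\nos._exit(0)\n"

if __name__ == "__main__":
    for name, code in [("normal exit", CRASH_ON_EXIT), ("os._exit", SKIP_TEARDOWN)]:
        out, how = run_script(code)
        print(f"{name}: output={out.strip()!r}, {how}")
```

On Linux, the first run should report the output followed by "killed by SIGSEGV", and the second the same output with "exited with status 0".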
---
[Visit Topic](https://discuss.mxnet.io/t/segmentationfault-on-process-exit-with-cuda-11-0-3/6538/1)