## Problem statement

Running MXNet under Valgrind reports memory leaks both in the unit tests and during inference. I have collected the leaks in the spreadsheet linked below. Some of them may be by design; others may be actual leaks. The spreadsheet gives a comprehensive list, categorized by type (Engine, memory, CachedOp or Op):

https://docs.google.com/spreadsheets/d/184kbSuhCVUTohxkDYxp_eMxcEhIKhoOY65VEkFVjDi0/edit?usp=sharing
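As a rough illustration of how leak records could be bucketed into the categories named above, here is a minimal Python sketch. The category names come from this issue; the keyword-to-category mapping, the `uncategorized` fallback, and the helper names `split_records` and `categorize` are assumptions made for illustration, not the method used to build the spreadsheet.

```python
import re

# Hypothetical mapping: category names are from the issue, the substrings
# used to detect them in stack frames are illustrative guesses.
CATEGORY_KEYWORDS = [
    ("CachedOp", ("CachedOp",)),
    ("Engine", ("mxnet::engine::",)),
    ("memory", ("mxnet::storage::",)),
    ("Op", ("mxnet::op::",)),
]


def split_records(log_text):
    """Split memcheck output into loss records (summary line + frames)."""
    records, current = [], None
    for raw in log_text.splitlines():
        # Drop the "==PID==" prefix that Valgrind puts on every line.
        line = re.sub(r"^==\d+==\s?", "", raw).strip()
        if "lost in loss record" in line:
            current = [line]
            records.append(current)
        elif current is not None and line.startswith(("at 0x", "by 0x")):
            current.append(line)
        else:
            # Blank or unrelated line ends the current record.
            current = None
    return records


def categorize(record):
    """Return the first category whose keyword appears in any frame."""
    frames = " ".join(record[1:])
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in frames for keyword in keywords):
            return category
    return "uncategorized"
```

A record that matches none of the keywords is left as `uncategorized` rather than guessed, so it can be triaged by hand.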
## Proposed solutions

Investigate which leaks are not by design and fix them.

## Setup

```
## build python from source debug mode
cd $HOME
wget https://www.python.org/ftp/python/3.6.12/Python-3.6.12.tgz
tar -xvzf Python-3.6.12.tgz
cd Python-3.6.12
./configure --with-pydebug --without-pymalloc --with-valgrind --prefix /opt/debugpython/
sudo make OPT=-g && sudo make install

## Add python valgrind suppression file
vi $HOME/Python-3.6.12/Misc/valgrind-python.supp
## Then uncomment PyObject_Free and PyObject_Realloc in the valgrind suppression file.

## build Valgrind from source since apt-get installs version 3.13
## which will give error with python
cd $HOME
git clone git://sourceware.org/git/valgrind.git
cd $HOME/valgrind
./autogen.sh
./configure --prefix=$(pwd)
make
sudo make install
export PATH=$PATH:$HOME/valgrind/bin
export VALGRIND_LIB="$HOME/valgrind/lib/valgrind"

## go to MXNet directory and run valgrind
cd $HOME/workspace/incubator-mxnet
# Build MXNet
# run valgrind on single unittest via pytest
$HOME/valgrind/bin/valgrind --tool=memcheck --suppressions=$HOME/Python-3.6.12/Misc/valgrind-python.supp --leak-check=full --error-exitcode=1 /opt/debugpython/bin/python3 -m pytest -s --exitfirst --verbose --timeout=0 tests/python/unittest/test_numpy_op.py::test_np_sort
```

## Sample Leak

```
==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost in loss record 126,460 of 126,809
==23789==    at 0x4C3257A: operator new(unsigned long) (vg_replace_malloc.c:342)
==23789==    by 0x5D4B98C1: void dmlc::any::construct<mxnet::Imperative::DCInfo, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&>(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (any.h:267)
==23789==    by 0x5D4AB003: mxnet::Imperative::DCInfo::Create(std::shared_ptr<nnvm::Node> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:681)
==23789==    by 0x5D4A7278: mxnet::Imperative::RecordDeferredCompute(nnvm::NodeAttrs&&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:341)
==23789==    by 0x5D226911: mxnet::Invoke(nnvm::Op const*, nnvm::NodeAttrs*, int, mxnet::NDArray**, int*, mxnet::NDArray**) (utils.cc:95)
==23789==    by 0x5D22414C: mxnet::UFuncHelper(mxnet::NDArray*, mxnet::NDArray*, mxnet::NDArray*, mxnet::runtime::MXNetRetValue*, nnvm::Op const*) (ufunc_helper.cc:47)
==23789==    by 0x5D224B35: mxnet::UFuncHelper(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*, nnvm::Op const*, nnvm::Op const*, nnvm::Op const*) (ufunc_helper.cc:152)
==23789==    by 0x5D194321: mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (np_elemwise_broadcast_op.cc:36)
==23789==    by 0x5D196729: std::_Function_handler<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*), mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}>::_M_invoke(std::_Any_data const&, mxnet::runtime::MXNetArgs&&, mxnet::runtime::MXNetRetValue*&&) (std_function.h:316)
==23789==    by 0x6980964F: std::function<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)>::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (std_function.h:706)
==23789==    by 0x698095ED: mxnet::runtime::PackedFunc::CallPacked(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (packed_func.h:942)
==23789==    by 0x698083E4: MXNetFuncCall (c_runtime_api.cc:64)
```
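To rank records like the sample above by size, the summary line of each loss record can be parsed into numbers. The following is a minimal sketch in Python; the regular expression is written against the summary line shown in this issue, only covers the "definitely", "indirectly" and "possibly" lost kinds, and the name `parse_summary` is hypothetical.

```python
import re

# Pattern modelled on the summary line in the sample leak; the
# "(N direct, M indirect)" part is optional because it may be absent.
SUMMARY_RE = re.compile(
    r"^==\d+==\s+(?P<total>[\d,]+)\s+"
    r"(?:\((?P<direct>[\d,]+) direct, (?P<indirect>[\d,]+) indirect\)\s+)?"
    r"bytes in (?P<blocks>[\d,]+) blocks are "
    r"(?P<kind>definitely|indirectly|possibly) lost "
    r"in loss record (?P<index>[\d,]+) of (?P<count>[\d,]+)"
)


def _to_int(text):
    """Convert '34,652' to 34652; pass through None for missing groups."""
    return int(text.replace(",", "")) if text else None


def parse_summary(line):
    """Return a dict of fields from a loss-record summary line, or None."""
    match = SUMMARY_RE.match(line)
    if match is None:
        return None
    fields = {
        name: _to_int(match.group(name))
        for name in ("total", "direct", "indirect", "blocks", "index", "count")
    }
    fields["kind"] = match.group("kind")
    return fields


sample = (
    "==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks "
    "are definitely lost in loss record 126,460 of 126,809"
)
info = parse_summary(sample)
print(info["kind"], info["total"], info["direct"], info["indirect"])
```

Sorting the parsed records by `total` (or by `direct` for the root allocations) gives a quick way to decide which leaks to investigate first.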
