access2rohit opened a new issue #19871: URL: https://github.com/apache/incubator-mxnet/issues/19871
## Problem statement

When run under Valgrind, MXNet reports a variety of memory leaks, both in unit tests and when running inference. I have collected a list of these leaks in the spreadsheet linked below. Some of them may be by design; others may be actual leaks. The spreadsheet gives a comprehensive list of the leaks, categorized by type (Engine, memory, CachedOp, or Op):

https://docs.google.com/spreadsheets/d/184kbSuhCVUTohxkDYxp_eMxcEhIKhoOY65VEkFVjDi0/edit?usp=sharing

## Proposed solutions

Investigate which leaks are not by design and fix them.

## Setup

```
## Build Python from source in debug mode
cd $HOME
wget https://www.python.org/ftp/python/3.6.12/Python-3.6.12.tgz
tar -xvzf Python-3.6.12.tgz
cd Python-3.6.12
./configure --with-pydebug --without-pymalloc --with-valgrind --prefix /opt/debugpython/
sudo make OPT=-g && sudo make install

## Edit the Python Valgrind suppression file
vi $HOME/Python-3.6.12/Misc/valgrind-python.supp
## Then uncomment PyObject_Free and PyObject_Realloc in the Valgrind suppression file.

## Build Valgrind from source, since apt-get installs version 3.13,
## which gives errors with Python
cd $HOME
git clone git://sourceware.org/git/valgrind.git
cd $HOME/valgrind
./autogen.sh
./configure --prefix=$(pwd)
make
sudo make install
export PATH=$PATH:$HOME/valgrind/bin
export VALGRIND_LIB="$HOME/valgrind/lib/valgrind"

## Go to the MXNet directory and run Valgrind
cd $HOME/workspace/incubator-mxnet
# Build MXNet
# Run Valgrind on a single unit test via pytest
$HOME/valgrind/bin/valgrind --tool=memcheck --suppressions=$HOME/Python-3.6.12/Misc/valgrind-python.supp --leak-check=full --error-exitcode=1 /opt/debugpython/bin/python3 -m pytest -s --exitfirst --verbose --timeout=0 tests/python/unittest/test_numpy_op.py::test_np_sort
```

## Sample Leak

```
==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost in loss record 126,460 of 126,809
==23789==    at 0x4C3257A: operator new(unsigned long) (vg_replace_malloc.c:342)
==23789==    by 0x5D4B98C1: void dmlc::any::construct<mxnet::Imperative::DCInfo, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&>(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (any.h:267)
==23789==    by 0x5D4AB003: mxnet::Imperative::DCInfo::Create(std::shared_ptr<nnvm::Node> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:681)
==23789==    by 0x5D4A7278: mxnet::Imperative::RecordDeferredCompute(nnvm::NodeAttrs&&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:341)
==23789==    by 0x5D226911: mxnet::Invoke(nnvm::Op const*, nnvm::NodeAttrs*, int, mxnet::NDArray**, int*, mxnet::NDArray**) (utils.cc:95)
==23789==    by 0x5D22414C: mxnet::UFuncHelper(mxnet::NDArray*, mxnet::NDArray*, mxnet::NDArray*, mxnet::runtime::MXNetRetValue*, nnvm::Op const*) (ufunc_helper.cc:47)
==23789==    by 0x5D224B35: mxnet::UFuncHelper(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*, nnvm::Op const*, nnvm::Op const*, nnvm::Op const*) (ufunc_helper.cc:152)
==23789==    by 0x5D194321: mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (np_elemwise_broadcast_op.cc:36)
==23789==    by 0x5D196729: std::_Function_handler<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*), mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}>::_M_invoke(std::_Any_data const&, mxnet::runtime::MXNetArgs&&, mxnet::runtime::MXNetRetValue*&&) (std_function.h:316)
==23789==    by 0x6980964F: std::function<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)>::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (std_function.h:706)
==23789==    by 0x698095ED: mxnet::runtime::PackedFunc::CallPacked(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (packed_func.h:942)
==23789==    by 0x698083E4: MXNetFuncCall (c_runtime_api.cc:64)
```
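With over 100,000 loss records in a single run, it may help to group the records before looking at them one by one. The following is only a rough sketch of how that could be done, not part of MXNet or Valgrind. It assumes the memcheck output was saved to a file (for example with Valgrind's `--log-file` option) and that each stack frame is printed on its own line. The function name `summarize_leaks` and the `<no mxnet frame>` label are made up for illustration.

```python
import re
from collections import defaultdict

# Header of a memcheck loss record, e.g.
# "==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost ..."
HEADER = re.compile(
    r"^==\d+==\s+([\d,]+) (?:\([\d,]+ direct, [\d,]+ indirect\) )?"
    r"bytes in [\d,]+ blocks are (definitely|indirectly|possibly) lost"
)
# A stack frame line, e.g.
# "==23789==    by 0x5D226911: mxnet::Invoke(...) (utils.cc:95)"
FRAME = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (.*)$")


def summarize_leaks(lines):
    """Sum leaked bytes per (leak kind, innermost function in the mxnet namespace).

    Hypothetical helper: records with no frame starting with "mxnet::"
    are grouped under "<no mxnet frame>".
    """
    totals = defaultdict(int)
    pending = None  # (kind, bytes) of a record not yet attributed to a frame

    def flush():
        nonlocal pending
        if pending is not None:
            totals[(pending[0], "<no mxnet frame>")] += pending[1]
            pending = None

    for line in lines:
        header = HEADER.match(line)
        if header:
            flush()
            pending = (header.group(2), int(header.group(1).replace(",", "")))
            continue
        frame = FRAME.match(line)
        if frame is None:
            flush()  # any non-frame line ends the current record
        elif pending is not None and frame.group(1).startswith("mxnet::"):
            # Keep only the qualified function name, dropping the argument list.
            name = frame.group(1).split("(", 1)[0]
            totals[(pending[0], name)] += pending[1]
            pending = None
    flush()
    return dict(totals)
```

It could then be used as `summarize_leaks(open("valgrind.log"))`, where `valgrind.log` is a placeholder file name, and the result sorted by byte count to see which call sites dominate. Grouping by the innermost `mxnet::` frame is just one possible heuristic; grouping by source file would work equally well.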
