access2rohit opened a new issue #19871:
URL: https://github.com/apache/incubator-mxnet/issues/19871


   ## Problem statement
   Running MXNet under Valgrind reports various memory leaks, both in the unit tests and when running inference.
   I have collected a list of these leaks in the spreadsheet linked below. Some of them may be by design, while others may be genuine leaks. The spreadsheet gives a comprehensive list, categorized by type (Engine, memory, CachedOp, or Op):
   
https://docs.google.com/spreadsheets/d/184kbSuhCVUTohxkDYxp_eMxcEhIKhoOY65VEkFVjDi0/edit?usp=sharing
   
   ## Proposed solutions
   Investigate which leaks are not by design and fix them.
   
   ## Setup 
   ```
   ## build python from source in debug mode
   cd $HOME
   wget https://www.python.org/ftp/python/3.6.12/Python-3.6.12.tgz
   tar -xvzf Python-3.6.12.tgz
   cd Python-3.6.12
   ./configure --with-pydebug --without-pymalloc --with-valgrind --prefix /opt/debugpython/
   sudo make OPT=-g && sudo make install
   
   ## Add python valgrind suppression file
   vi $HOME/Python-3.6.12/Misc/valgrind-python.supp
   ## Then uncomment the PyObject_Free and PyObject_Realloc entries in the valgrind suppression file.
   
   
   ## build Valgrind from source, since apt-get installs version 3.13,
   ## which gives errors with python
   cd $HOME
   git clone git://sourceware.org/git/valgrind.git
   cd $HOME/valgrind
   ./autogen.sh
   ./configure --prefix=$(pwd)
   make
   sudo make install
   export PATH=$PATH:$HOME/valgrind/bin
   export VALGRIND_LIB="$HOME/valgrind/lib/valgrind"
   
   ## go to MXNet directory and run valgrind
   cd $HOME/workspace/incubator-mxnet
   # Build MXNet 
   # run valgrind on single unittest via pytest
   $HOME/valgrind/bin/valgrind --tool=memcheck \
     --suppressions=$HOME/Python-3.6.12/Misc/valgrind-python.supp --leak-check=full \
     --error-exitcode=1 /opt/debugpython/bin/python3 -m pytest -s --exitfirst \
     --verbose --timeout=0 tests/python/unittest/test_numpy_op.py::test_np_sort
   ```
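
To run the same check over several tests, the command above can be assembled programmatically. The following is only a sketch: `build_valgrind_command` is a hypothetical helper, not part of MXNet, and the optional `--log-file` argument (a standard Valgrind option) is an addition that the original command does not use. The paths in the usage example are the ones from the setup steps.

```python
import shlex
from pathlib import Path


def build_valgrind_command(valgrind, python, suppressions, test_id, log_file=None):
    """Return the memcheck command for one pytest test as an argument list.

    Hypothetical helper: it only mirrors the flags of the shell command above.
    """
    cmd = [
        str(valgrind),
        "--tool=memcheck",
        f"--suppressions={suppressions}",
        "--leak-check=full",
        "--error-exitcode=1",
    ]
    if log_file is not None:
        # Assumption: writing the report to a file is wanted; not in the original command.
        cmd.append(f"--log-file={log_file}")
    cmd += [
        str(python), "-m", "pytest", "-s", "--exitfirst",
        "--verbose", "--timeout=0", test_id,
    ]
    return cmd


if __name__ == "__main__":
    home = Path.home()
    cmd = build_valgrind_command(
        valgrind=home / "valgrind/bin/valgrind",
        python="/opt/debugpython/bin/python3",
        suppressions=home / "Python-3.6.12/Misc/valgrind-python.supp",
        test_id="tests/python/unittest/test_numpy_op.py::test_np_sort",
        log_file="valgrind-test_np_sort.log",
    )
    # Print a shell-quoted form; pass `cmd` to subprocess.run(cmd) to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when a test id or path contains special characters.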
   
   ## Sample Leak
   
   ```
   ==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost in loss record 126,460 of 126,809
   ==23789==    at 0x4C3257A: operator new(unsigned long) (vg_replace_malloc.c:342)
   ==23789==    by 0x5D4B98C1: void dmlc::any::construct<mxnet::Imperative::DCInfo, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&>(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (any.h:267)
   ==23789==    by 0x5D4AB003: mxnet::Imperative::DCInfo::Create(std::shared_ptr<nnvm::Node> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:681)
   ==23789==    by 0x5D4A7278: mxnet::Imperative::RecordDeferredCompute(nnvm::NodeAttrs&&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:341)
   ==23789==    by 0x5D226911: mxnet::Invoke(nnvm::Op const*, nnvm::NodeAttrs*, int, mxnet::NDArray**, int*, mxnet::NDArray**) (utils.cc:95)
   ==23789==    by 0x5D22414C: mxnet::UFuncHelper(mxnet::NDArray*, mxnet::NDArray*, mxnet::NDArray*, mxnet::runtime::MXNetRetValue*, nnvm::Op const*) (ufunc_helper.cc:47)
   ==23789==    by 0x5D224B35: mxnet::UFuncHelper(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*, nnvm::Op const*, nnvm::Op const*, nnvm::Op const*) (ufunc_helper.cc:152)
   ==23789==    by 0x5D194321: mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (np_elemwise_broadcast_op.cc:36)
   ==23789==    by 0x5D196729: std::_Function_handler<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*), mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}>::_M_invoke(std::_Any_data const&, mxnet::runtime::MXNetArgs&&, mxnet::runtime::MXNetRetValue*&&) (std_function.h:316)
   ==23789==    by 0x6980964F: std::function<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)>::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (std_function.h:706)
   ==23789==    by 0x698095ED: mxnet::runtime::PackedFunc::CallPacked(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (packed_func.h:942)
   ==23789==    by 0x698083E4: MXNetFuncCall (c_runtime_api.cc:64)
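
When sorting many such records, it can help to total the lost bytes per MXNet entry point. The sketch below assumes the Valgrind output was saved to a file with one record line per physical line (for example via Valgrind's `--log-file` option, not wrapped by a mail client). The function names `summarize_leaks` and `first_mxnet_symbol`, and the choice to group by the first frame whose symbol starts with `mxnet::`, are illustrative assumptions and not part of MXNet or Valgrind.

```python
import re
from collections import Counter

# Header of a memcheck leak record, e.g.
# "==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost ..."
HEADER = re.compile(
    r"^==\d+==\s+(?P<total>[\d,]+) (?:\([\d,]+ direct, [\d,]+ indirect\) )?"
    r"bytes in [\d,]+ blocks? are (?P<kind>definitely|indirectly|possibly) lost"
)
# Stack frame line, e.g. "==23789==    by 0x698083E4: MXNetFuncCall (c_runtime_api.cc:64)"
FRAME = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (?P<symbol>.+)$")


def first_mxnet_symbol(frames):
    """Return the innermost frame whose symbol starts with 'mxnet::', without arguments."""
    for symbol in frames:
        if symbol.startswith("mxnet::"):
            return symbol.split("(", 1)[0]
    return "<no mxnet frame>"


def summarize_leaks(lines):
    """Sum lost bytes per (leak kind, first mxnet:: frame) over all leak records."""
    records = []  # each entry: [kind, total_bytes, frames]
    in_record = False
    for line in lines:
        line = line.strip()
        header = HEADER.match(line)
        if header:
            total = int(header["total"].replace(",", ""))
            records.append([header["kind"], total, []])
            in_record = True
            continue
        frame = FRAME.match(line)
        if frame and in_record:
            records[-1][2].append(frame["symbol"])
        else:
            # Any other line (such as a bare "==PID==" separator) ends the record.
            in_record = False
    totals = Counter()
    for kind, total, frames in records:
        totals[(kind, first_mxnet_symbol(frames))] += total
    return totals
```

Applied to the sample record above, the grouping key would be `("definitely", "mxnet::Imperative::DCInfo::Create")`, since the `dmlc::any::construct` frame does not start with `mxnet::`. Calling `summarize_leaks(open("valgrind.log"))` (file name hypothetical) and then `.most_common(20)` lists the largest groups first.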

