## Problem statement
Running MXNet under valgrind reports a number of distinct memory leaks, both in
the unit tests and when running inference.
I have collected a list of these leaks, linked below. Some of them may be by
design, and others may be actual leaks. The table below gives a comprehensive
list of the leaks, categorized by type (Engine, memory, CachedOp, or Op):
https://docs.google.com/spreadsheets/d/184kbSuhCVUTohxkDYxp_eMxcEhIKhoOY65VEkFVjDi0/edit?usp=sharing

## Proposed solutions
Investigate which leaks are not by design and fix them.
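
As a rough first pass at that investigation, the sketch below counts "definitely lost" records by the source file of the first `mxnet::` frame in each stack. The function name `top_mxnet_frames` is made up for illustration, and grouping by source file is only a stand-in for the categories used in the spreadsheet. It assumes an unwrapped memcheck log, with each frame on a single line, read from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count definitely-lost records per source file of the
# first mxnet:: frame. Assumes one frame per line (unwrapped valgrind log).
top_mxnet_frames() {
  awk '
    # A loss-record header starts a new record; only track definite leaks.
    / in loss record / { in_record = ($0 ~ /definitely lost/); next }
    # First frame whose symbol starts with mxnet:: decides the bucket.
    in_record && / (at|by) 0x[0-9A-Fa-f]+: mxnet::/ {
      loc = $NF                 # e.g. "(imperative.cc:681)"
      gsub(/[()]/, "", loc)     # drop the parentheses
      sub(/:[0-9]+$/, "", loc)  # drop the line number
      count[loc]++
      in_record = 0             # ignore deeper frames of this record
    }
    END { for (loc in count) printf "%d %s\n", count[loc], loc }
  ' | sort -k1,1nr -k2,2
}
```

Usage would look like `top_mxnet_frames < memcheck.log`, where `memcheck.log` is whatever file the valgrind report was saved to. Files with many records are candidates to look at first, but a high count alone does not say whether a leak is by design.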

## Setup 
```
## Build Python from source in debug mode
cd $HOME
wget https://www.python.org/ftp/python/3.6.12/Python-3.6.12.tgz
tar -xvzf Python-3.6.12.tgz
cd Python-3.6.12
./configure --with-pydebug --without-pymalloc --with-valgrind --prefix=/opt/debugpython/
sudo make OPT=-g && sudo make install

## Add the Python valgrind suppression file
vi $HOME/Python-3.6.12/Misc/valgrind-python.supp
## Then uncomment the PyObject_Free and PyObject_Realloc entries in the
## valgrind suppression file.

## Build valgrind from source, since apt-get installs version 3.13,
## which gives errors with Python
cd $HOME
git clone git://sourceware.org/git/valgrind.git
cd $HOME/valgrind
./autogen.sh
./configure --prefix=$(pwd)
make
sudo make install
export PATH=$PATH:$HOME/valgrind/bin
export VALGRIND_LIB="$HOME/valgrind/lib/valgrind"

## Go to the MXNet directory and run valgrind
cd $HOME/workspace/incubator-mxnet
# Build MXNet
# Run valgrind on a single unit test via pytest
$HOME/valgrind/bin/valgrind --tool=memcheck \
  --suppressions=$HOME/Python-3.6.12/Misc/valgrind-python.supp \
  --leak-check=full --error-exitcode=1 \
  /opt/debugpython/bin/python3 -m pytest -s --exitfirst \
  --verbose --timeout=0 tests/python/unittest/test_numpy_op.py::test_np_sort
```
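
To make the reports easier to compare across tests, the run can be wrapped so that each report is written to a file. The sketch below is one way to do that. The names `run_memcheck`, `VALGRIND_BIN`, `PYTHON_BIN`, `SUPP_FILE` and `LOG_DIR` are introduced here for illustration, and the default for `SUPP_FILE` assumes the suppression file is the one edited under the Python source tree. `--log-file` with `%p` (expanded to the process ID) is a standard valgrind option.

```shell
#!/usr/bin/env bash
# Illustrative defaults; override them in the environment as needed.
VALGRIND_BIN="${VALGRIND_BIN:-$HOME/valgrind/bin/valgrind}"
PYTHON_BIN="${PYTHON_BIN:-/opt/debugpython/bin/python3}"
SUPP_FILE="${SUPP_FILE:-$HOME/Python-3.6.12/Misc/valgrind-python.supp}"
LOG_DIR="${LOG_DIR:-$HOME/valgrind-logs}"

# Run one pytest target under memcheck and write the report to
# $LOG_DIR/memcheck.<pid>.log instead of the terminal.
run_memcheck() {
  target="$1"
  mkdir -p "$LOG_DIR"
  "$VALGRIND_BIN" --tool=memcheck \
    --suppressions="$SUPP_FILE" \
    --leak-check=full \
    --error-exitcode=1 \
    --log-file="$LOG_DIR/memcheck.%p.log" \
    "$PYTHON_BIN" -m pytest -s --exitfirst --verbose --timeout=0 "$target"
}
```

For example, `run_memcheck tests/python/unittest/test_numpy_op.py::test_np_sort` would reproduce the run above with the report saved under `$LOG_DIR`.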

## Sample Leak

```
==23789== 34,652 (240 direct, 34,412 indirect) bytes in 3 blocks are definitely lost in loss record 126,460 of 126,809
==23789==    at 0x4C3257A: operator new(unsigned long) (vg_replace_malloc.c:342)
==23789==    by 0x5D4B98C1: void dmlc::any::construct<mxnet::Imperative::DCInfo, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&>(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (any.h:267)
==23789==    by 0x5D4AB003: mxnet::Imperative::DCInfo::Create(std::shared_ptr<nnvm::Node> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:681)
==23789==    by 0x5D4A7278: mxnet::Imperative::RecordDeferredCompute(nnvm::NodeAttrs&&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (imperative.cc:341)
==23789==    by 0x5D226911: mxnet::Invoke(nnvm::Op const*, nnvm::NodeAttrs*, int, mxnet::NDArray**, int*, mxnet::NDArray**) (utils.cc:95)
==23789==    by 0x5D22414C: mxnet::UFuncHelper(mxnet::NDArray*, mxnet::NDArray*, mxnet::NDArray*, mxnet::runtime::MXNetRetValue*, nnvm::Op const*) (ufunc_helper.cc:47)
==23789==    by 0x5D224B35: mxnet::UFuncHelper(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*, nnvm::Op const*, nnvm::Op const*, nnvm::Op const*) (ufunc_helper.cc:152)
==23789==    by 0x5D194321: mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (np_elemwise_broadcast_op.cc:36)
==23789==    by 0x5D196729: std::_Function_handler<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*), mxnet::{lambda(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)#1}>::_M_invoke(std::_Any_data const&, mxnet::runtime::MXNetArgs&&, mxnet::runtime::MXNetRetValue*&&) (std_function.h:316)
==23789==    by 0x6980964F: std::function<void (mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*)>::operator()(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (std_function.h:706)
==23789==    by 0x698095ED: mxnet::runtime::PackedFunc::CallPacked(mxnet::runtime::MXNetArgs, mxnet::runtime::MXNetRetValue*) const (packed_func.h:942)
==23789==    by 0x698083E4: MXNetFuncCall (c_runtime_api.cc:64)
```
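
To get a quick sense of how much memory records like this one add up to, the header lines can be totalled. The sketch below is a minimal example. The function name `sum_definitely_lost` is made up for illustration, and it assumes an unwrapped memcheck log on standard input in which each loss-record header is a single line with the byte count as the second field, as in the sample above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the bytes reported as "definitely lost".
# Reads a memcheck log from stdin; expects one loss-record header per line.
sum_definitely_lost() {
  awk '
    /are definitely lost in loss record/ {
      bytes = $2            # total for the record, e.g. "34,652"
      gsub(/,/, "", bytes)  # strip thousands separators
      total += bytes
      records++
    }
    END { printf "%d bytes in %d records\n", total, records }
  '
}
```

Usage would look like `sum_definitely_lost < memcheck.log`. The count includes the indirect bytes that valgrind folds into each record's total, so it is an upper bound on what fixing the direct leaks would recover.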
