Nathan,
the root cause is that your fixes were not backported to the v1.8 (or the
v1.10) branch.
I made PR https://github.com/open-mpi/ompi-release/pull/357 to fix this.
Could you please review it?
Since there are quite a lot of differences between v1.8 and master, the
backport was not trivial.
I left some #if 0 in the code since I do not know whether something needs
to be done about RDMA fragments.
Cheers,
Gilles
On 7/2/2015 6:04 AM, Nathan Hjelm wrote:
I don't see the leak on master on OS X using the leaks command. I will see
what valgrind finds on Linux.
-Nathan
On Wed, Jul 01, 2015 at 08:48:57PM +0000, Rolf vandeVaart wrote:
There have been two reports on the user list about memory leaks. I have
reproduced this leak with LAMMPS. Note that this has nothing to do with
CUDA-aware features. The steps that Stefan has provided make it easy to
reproduce.
Here are some more specific steps to reproduce, derived from Stefan's.
1. Clone LAMMPS (git clone git://git.lammps.org/lammps-ro.git lammps).
2. cd lammps/src and compile with Open MPI 1.8.6. To do this, set your path
to Open MPI and type "make mpi".
3. Run the example in lammps/examples/melt. To do this, first copy
"lmp_mpi" from the src directory into the melt directory. Then modify the
in.melt file so that it runs for a while: change "run 25" to
"run 250000".
4. Run with: mpirun -np 2 lmp_mpi < in.melt
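The edit to in.melt in step 3 can be scripted. What follows is only a minimal sketch: it assumes the input file contains a line consisting of the command "run 25", and the helper names extend_run and patch_file are hypothetical, not part of LAMMPS or Open MPI.

```python
import re
from pathlib import Path


def extend_run(script_text: str, old_steps: int = 25, new_steps: int = 250000) -> str:
    """Return script_text with the 'run <old_steps>' line changed to 'run <new_steps>'.

    Hypothetical helper: assumes the run command sits alone on its line.
    """
    # Match only a whole line such as "run 25", so "run 250" is left untouched.
    pattern = re.compile(rf"^([ \t]*run[ \t]+){old_steps}([ \t]*)$", re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{new_steps}\g<2>", script_text)
    if count == 0:
        raise ValueError(f"no 'run {old_steps}' line found")
    return new_text


def patch_file(path: str) -> None:
    """Rewrite the given input script in place (hypothetical helper)."""
    script = Path(path)
    script.write_text(extend_run(script.read_text()))


# Example use from inside lammps/examples/melt:
#     patch_file("in.melt")
```

Raising an error when no matching line is found avoids silently running the short 25-step job, which finishes too quickly to show the growth.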
For reference, here is the memory consumption for both 1.8.5 and 1.8.6.
1.8.5 stays very stable, whereas with 1.8.6 VSZ nearly quadruples and RSS
grows more than 60-fold (from about 15,000 KB to about 974,000 KB) after
6 minutes of running.
Open MPI 1.8.5
USER    PID %CPU %MEM    VSZ   RSS TTY    STAT START TIME COMMAND
32341 26907 59.0  0.0 329672 14584 pts/16 Rl   16:24 0:00 ./lmp_mpi_185_nocuda
32341 26908 60.0  0.0 329672 14676 pts/16 Rl   16:24 0:00 ./lmp_mpi_185_nocuda
32341 26907 98.3  0.0 329672 14932 pts/16 Rl   16:24 0:30 ./lmp_mpi_185_nocuda
32341 26908 98.5  0.0 329672 14932 pts/16 Rl   16:24 0:30 ./lmp_mpi_185_nocuda
32341 26907 98.9  0.0 329672 14960 pts/16 Rl   16:24 1:00 ./lmp_mpi_185_nocuda
32341 26908 99.1  0.0 329672 14952 pts/16 Rl   16:24 1:00 ./lmp_mpi_185_nocuda
32341 26907 99.1  0.0 329672 14960 pts/16 Rl   16:24 1:30 ./lmp_mpi_185_nocuda
32341 26908 99.3  0.0 329672 14952 pts/16 Rl   16:24 1:30 ./lmp_mpi_185_nocuda
32341 26907 99.2  0.0 329672 14960 pts/16 Rl   16:24 2:00 ./lmp_mpi_185_nocuda
32341 26908 99.4  0.0 329672 14952 pts/16 Rl   16:24 2:00 ./lmp_mpi_185_nocuda
32341 26907 99.3  0.0 329672 14960 pts/16 Rl   16:24 2:30 ./lmp_mpi_185_nocuda
32341 26908 99.5  0.0 329672 14952 pts/16 Rl   16:24 2:30 ./lmp_mpi_185_nocuda
32341 26907 99.4  0.0 329672 14960 pts/16 Rl   16:24 2:59 ./lmp_mpi_185_nocuda
32341 26908 99.5  0.0 329672 14952 pts/16 Rl   16:24 3:00 ./lmp_mpi_185_nocuda
32341 26907 99.4  0.0 329672 14960 pts/16 Rl   16:24 3:29 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 3:30 ./lmp_mpi_185_nocuda
32341 26907 99.4  0.0 329672 14960 pts/16 Rl   16:24 3:59 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 4:00 ./lmp_mpi_185_nocuda
32341 26907 99.4  0.0 329672 14960 pts/16 Rl   16:24 4:29 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 4:30 ./lmp_mpi_185_nocuda
32341 26907 99.5  0.0 329672 14960 pts/16 Rl   16:24 4:59 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 5:00 ./lmp_mpi_185_nocuda
32341 26907 99.5  0.0 329672 14960 pts/16 Rl   16:24 5:29 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 5:29 ./lmp_mpi_185_nocuda
32341 26907 99.5  0.0 329672 14960 pts/16 Rl   16:24 5:59 ./lmp_mpi_185_nocuda
32341 26908 99.6  0.0 329672 14956 pts/16 Rl   16:24 5:59 ./lmp_mpi_185_nocuda
Open MPI 1.8.6
USER    PID %CPU %MEM     VSZ    RSS TTY    STAT START TIME COMMAND
32341 26755  0.0  0.0  330288  15368 pts/16 Rl   16:10 0:00 ./lmp_mpi_186_nocuda
32341 26756  0.0  0.0  330284  15376 pts/16 Rl   16:10 0:00 ./lmp_mpi_186_nocuda
32341 26755  100  0.0  409856  94976 pts/16 Rl   16:10 0:30 ./lmp_mpi_186_nocuda
32341 26756  100  0.0  409848  94904 pts/16 Rl   16:10 0:30 ./lmp_mpi_186_nocuda
32341 26755  100  0.1  489292 174320 pts/16 Rl   16:10 1:00 ./lmp_mpi_186_nocuda
32341 26756  100  0.1  489288 174536 pts/16 Rl   16:10 1:00 ./lmp_mpi_186_nocuda
32341 26755 99.9  0.1  568860 253928 pts/16 Rl   16:10 1:29 ./lmp_mpi_186_nocuda
32341 26756  100  0.1  568984 254168 pts/16 Rl   16:10 1:30 ./lmp_mpi_186_nocuda
32341 26755 99.9  0.2  648620 333800 pts/16 Rl   16:10 1:59 ./lmp_mpi_186_nocuda
32341 26756  100  0.2  648616 333868 pts/16 Rl   16:10 2:00 ./lmp_mpi_186_nocuda
32341 26755 99.8  0.3  728444 413516 pts/16 Rl   16:10 2:29 ./lmp_mpi_186_nocuda
32341 26756  100  0.3  728376 413800 pts/16 Rl   16:10 2:30 ./lmp_mpi_186_nocuda
32341 26755 99.8  0.3  808332 493388 pts/16 Rl   16:10 2:59 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.3  808328 493432 pts/16 Rl   16:10 2:59 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.4  888156 573260 pts/16 Rl   16:10 3:29 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.4  888088 573328 pts/16 Rl   16:10 3:29 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.4  968108 653396 pts/16 Rl   16:10 3:59 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.4  968232 653488 pts/16 Rl   16:10 3:59 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.5 1048252 733268 pts/16 Rl   16:10 4:29 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.5 1048248 733384 pts/16 Rl   16:10 4:29 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.6 1128396 813404 pts/16 Rl   16:10 4:59 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.6 1128328 813544 pts/16 Rl   16:10 4:59 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.6 1208736 893804 pts/16 Rl   16:10 5:29 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.6 1208668 893968 pts/16 Rl   16:10 5:29 ./lmp_mpi_186_nocuda
32341 26755 99.7  0.7 1288880 973940 pts/16 Rl   16:10 5:59 ./lmp_mpi_186_nocuda
32341 26756 99.9  0.7 1288812 974128 pts/16 Rl   16:10 5:59 ./lmp_mpi_186_nocuda
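
To put a rough number on the growth, here is a minimal sketch that takes the first and last RSS samples for PID 26907 (1.8.5) and PID 26755 (1.8.6) from the listings above and computes an average growth rate. It assumes the ps TIME column (accumulated CPU time) is a reasonable stand-in for elapsed time, which seems fair here since both processes run at close to 100% CPU. The function and constant names are made up for illustration.

```python
def cpu_time_to_seconds(time_field: str) -> int:
    """Convert a ps TIME field of the form 'M:SS' to seconds."""
    minutes, seconds = time_field.split(":")
    return int(minutes) * 60 + int(seconds)


def rss_growth_kb_per_min(first, last) -> float:
    """Average RSS growth in KB per minute between two (TIME, RSS_KB) samples."""
    (t0, rss0), (t1, rss1) = first, last
    elapsed = cpu_time_to_seconds(t1) - cpu_time_to_seconds(t0)
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (rss1 - rss0) * 60 / elapsed


# First and last samples copied from the ps listings above.
OMPI_185_PID_26907 = (("0:00", 14584), ("5:59", 14960))
OMPI_186_PID_26755 = (("0:00", 15368), ("5:59", 973940))

if __name__ == "__main__":
    for label, (first, last) in (
        ("Open MPI 1.8.5", OMPI_185_PID_26907),
        ("Open MPI 1.8.6", OMPI_186_PID_26755),
    ):
        rate = rss_growth_kb_per_min(first, last)
        print(f"{label}: {rate:.1f} KB/min")
```

Using only the end points hides any variation between samples, but the 1.8.6 listing grows by a nearly constant amount every 30 seconds, so the average is a fair summary.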
_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17590.php