Nathan,

the root cause is your fixes were not backported to the v1.8 (nor the v1.10) branch

i made PR https://github.com/open-mpi/ompi-release/pull/357 to fix this.

could you please review it ?

since there are quite a lot of differences between v1.8 and master, the backport was not trivial. i left some #if 0 in the code since i do not know if something need to be done about rdma fragments

Cheers,

Gilles

On 7/2/2015 6:04 AM, Nathan Hjelm wrote:
Don't see the leak on master with OS X using the leaks command. Will see
what valgrind finds on linux.

-Nathan

On Wed, Jul 01, 2015 at 08:48:57PM +0000, Rolf vandeVaart wrote:
    There have been two reports on the user list about memory leaks.  I have
    reproduced this leak with LAMMPS.  Note that this has nothing to do with
    CUDA-aware features.  The steps that Stefan has provided make it easy to
    reproduce.

    Here are some more specific steps to reproduce derived from Stefan.

    1. clone LAMMPS (git clone git://git.lammps.org/lammps-ro.git lammps)
    2. cd src/, compile with openMPI 1.8.6.  To do this, set your path to Open
    MPI and type "make mpi"
    3. run the example listed in lammps/examples/melt. To do this, first copy
    "lmp_mpi" from the src directory into the melt directory.  Then you need
    to modify the in.melt file so that it will run for a while.  Change
    "run 25" to "run        250000"

    4. you can run by mpirun -np 2 lmp_mpi < in.melt

    For reference, here is both 1.8.5 and 1.8.6 memory consumption.  1.8.5
    stays very stable where 1.8.6 almost triples after 6 minutes of running.

    Open MPI 1.8.5

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 59.0  0.0 329672 14584 pts/16   Rl   16:24   0:00
    ./lmp_mpi_185_nocuda
    32341    26908 60.0  0.0 329672 14676 pts/16   Rl   16:24   0:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 98.3  0.0 329672 14932 pts/16   Rl   16:24   0:30
    ./lmp_mpi_185_nocuda
    32341    26908 98.5  0.0 329672 14932 pts/16   Rl   16:24   0:30
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 98.9  0.0 329672 14960 pts/16   Rl   16:24   1:00
    ./lmp_mpi_185_nocuda
    32341    26908 99.1  0.0 329672 14952 pts/16   Rl   16:24   1:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.1  0.0 329672 14960 pts/16   Rl   16:24   1:30
    ./lmp_mpi_185_nocuda
    32341    26908 99.3  0.0 329672 14952 pts/16   Rl   16:24   1:30
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.2  0.0 329672 14960 pts/16   Rl   16:24   2:00
    ./lmp_mpi_185_nocuda
    32341    26908 99.4  0.0 329672 14952 pts/16   Rl   16:24   2:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.3  0.0 329672 14960 pts/16   Rl   16:24   2:30
    ./lmp_mpi_185_nocuda
    32341    26908 99.5  0.0 329672 14952 pts/16   Rl   16:24   2:30
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   2:59
    ./lmp_mpi_185_nocuda
    32341    26908 99.5  0.0 329672 14952 pts/16   Rl   16:24   3:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:29
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   3:30
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:59
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   4:29
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:30
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.5  0.0 329672 14960 pts/16   Rl   16:24   4:59
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:00
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:29
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:29
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:59
    ./lmp_mpi_185_nocuda
    32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:59
    ./lmp_mpi_185_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

    Open MPI 1.8.6

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755  0.0  0.0 330288 15368 pts/16   Rl   16:10   0:00
    ./lmp_mpi_186_nocuda
    32341    26756  0.0  0.0 330284 15376 pts/16   Rl   16:10   0:00
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755  100  0.0 409856 94976 pts/16   Rl   16:10   0:30
    ./lmp_mpi_186_nocuda
    32341    26756  100  0.0 409848 94904 pts/16   Rl   16:10   0:30
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755  100  0.1 489292 174320 pts/16  Rl   16:10   1:00
    ./lmp_mpi_186_nocuda
    32341    26756  100  0.1 489288 174536 pts/16  Rl   16:10   1:00
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.9  0.1 568860 253928 pts/16  Rl   16:10   1:29
    ./lmp_mpi_186_nocuda
    32341    26756  100  0.1 568984 254168 pts/16  Rl   16:10   1:30
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.9  0.2 648620 333800 pts/16  Rl   16:10   1:59
    ./lmp_mpi_186_nocuda
    32341    26756  100  0.2 648616 333868 pts/16  Rl   16:10   2:00
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.8  0.3 728444 413516 pts/16  Rl   16:10   2:29
    ./lmp_mpi_186_nocuda
    32341    26756  100  0.3 728376 413800 pts/16  Rl   16:10   2:30
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.8  0.3 808332 493388 pts/16  Rl   16:10   2:59
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.3 808328 493432 pts/16  Rl   16:10   2:59
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.4 888156 573260 pts/16  Rl   16:10   3:29
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.4 888088 573328 pts/16  Rl   16:10   3:29
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.4 968108 653396 pts/16  Rl   16:10   3:59
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.4 968232 653488 pts/16  Rl   16:10   3:59
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.5 1048252 733268 pts/16 Rl   16:10   4:29
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.5 1048248 733384 pts/16 Rl   16:10   4:29
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.6 1128396 813404 pts/16 Rl   16:10   4:59
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.6 1128328 813544 pts/16 Rl   16:10   4:59
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.6 1208736 893804 pts/16 Rl   16:10   5:29
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.6 1208668 893968 pts/16 Rl   16:10   5:29
    ./lmp_mpi_186_nocuda
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    32341    26755 99.7  0.7 1288880 973940 pts/16 Rl   16:10   5:59
    ./lmp_mpi_186_nocuda
    32341    26756 99.9  0.7 1288812 974128 pts/16 Rl   16:10   5:59
    ./lmp_mpi_186_nocuda

      ----------------------------------------------------------------------

    This email message is for the sole use of the intended recipient(s) and
    may contain confidential information.  Any unauthorized review, use,
    disclosure or distribution is prohibited.  If you are not the intended
    recipient, please contact the sender by reply email and destroy all copies
    of the original message.

      ----------------------------------------------------------------------
_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/07/17590.php


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/07/17591.php

Reply via email to