Fixed - just a lingering free that should have been removed
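
For the archives, here is a minimal sketch of this class of bug. It is not the actual code in odls_base_default_fns.c. The function name child_called_abort, the session_dir parameter, and the access() check are all hypothetical stand-ins for however the real code builds and tests the abort file path. The point is only that the path buffer should be released exactly once:

```c
#define _POSIX_C_SOURCE 200809L  /* for access() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not an ORTE function): report whether a child left
 * an "aborted" marker file in its session directory.
 * Returns 1 if the marker exists, 0 if it does not, -1 if malloc fails. */
static int child_called_abort(const char *session_dir)
{
    assert(session_dir != NULL);

    /* sizeof("/aborted") already counts the terminating NUL byte. */
    size_t len = strlen(session_dir) + sizeof("/aborted");
    char *abortfile = malloc(len);
    int aborted = 0;

    if (abortfile == NULL) {
        return -1;
    }
    snprintf(abortfile, len, "%s/aborted", session_dir);

    if (access(abortfile, F_OK) == 0) {
        aborted = 1;
        /* A leftover free(abortfile) on this branch, combined with the
         * free() below, would release the same pointer twice. That is
         * the kind of mistake glibc reports as
         * "double free or corruption". */
    }

    free(abortfile);  /* the single release of the buffer */
    return aborted;
}
```

Keeping one free() at the common exit point means every branch shares the same cleanup, so adding or removing a branch later cannot reintroduce a second free.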
On Wed, Aug 13, 2014 at 8:21 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> I noticed MTT failures from last night and then reproduced this morning on
> 1.8 branch. Looks like maybe a double free. I assume it is related to
> fixes for aborting programs. Maybe related to
> https://svn.open-mpi.org/trac/ompi/changeset/32508 but not sure.
>
> [rvandevaart@drossetti-ivy0 environment]$ pwd
> /ivylogin/home/rvandevaart/tests/ompi-tests/trunk/ibm/environment
> [rvandevaart@drossetti-ivy0 environment]$ mpirun --mca odls_base_verbose 20 -np 2 abort
> [...stuff deleted...]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 on child [[58714,1],1]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 on child [[58714,1],1]
> **************************************************************************
> This program tests MPI_ABORT and generates error messages
> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
> **************************************************************************
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 3.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:wait_local_proc child process [[58714,1],0] pid 14955 terminated
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child [[58714,1],0] exit code 3
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired checking abort file /tmp/openmpi-sessions-rvandevaart@drossetti-ivy0_0/58714/1/0/aborted for child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child [[58714,1],0] died by call to abort
> *** glibc detected *** mpirun: double free or corruption (fasttop): 0x000000000130e210 ***
>
> From gdb:
> gdb) where
> #0 0x00007f75ede138e5 in raise () from /lib64/libc.so.6
> #1 0x00007f75ede1504d in abort () from /lib64/libc.so.6
> #2 0x00007f75ede517f7 in __libc_message () from /lib64/libc.so.6
> #3 0x00007f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> #4 0x00007f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955, status=768, cbdata=0x0)
>     at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> #5 0x00007f75eef60a78 in do_waitall (options=0) at ../../orte/runtime/orte_wait.c:554
> #6 0x00007f75eef60712 in orte_wait_signal_callback (fd=17, event=8, arg=0x7f75ef201400)
>     at ../../orte/runtime/orte_wait.c:421
> #7 0x00007f75eecaecbe in event_signal_closure (base=0x1278370, ev=0x7f75ef201400)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1081
> #8 0x00007f75eecaf7e0 in event_process_active_single_queue (base=0x1278370, activeq=0x12788f0)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1359
> #9 0x00007f75eecafaca in event_process_active (base=0x1278370)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
> #10 0x00007f75eecb0148 in opal_libevent2021_event_base_loop (base=0x1278370, flags=1)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
> #11 0x0000000000405572 in orterun (argc=7, argv=0x7fffbdf1dd08)
>     at ../../../../orte/tools/orterun/orterun.c:1078
> #12 0x0000000000403904 in main (argc=7, argv=0x7fffbdf1dd08)
>     at ../../../../orte/tools/orterun/main.c:13
> (gdb) up
> #1 0x00007f75ede1504d in abort () from /lib64/libc.so.6
> (gdb) up
> #2 0x00007f75ede517f7 in __libc_message () from /lib64/libc.so.6
> (gdb) up
> #3 0x00007f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> (gdb) up
> #4 0x00007f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955, status=768, cbdata=0x0)
>     at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> 2007 free(abortfile);
> (gdb) print abortfile
> $1 = 0x130e210 ""
> (gdb)
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15632.php