Fixed. It was just a lingering free() that should have been removed.
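
For readers following the thread, here is a minimal sketch of the kind of cleanup that avoids this class of error. It is not the actual Open MPI change: the backtrace below only shows that glibc aborted at free(abortfile). The helper names make_abort_path and child_called_abort and the path layout are hypothetical, chosen purely for illustration. The idea is to free the buffer exactly once and then clear the pointer, so that any leftover free() on the same variable becomes a no-op instead of a double free.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): build "<dir>/aborted" on the heap.
 * The caller owns the returned buffer and must free() it. */
static char *make_abort_path(const char *dir)
{
    size_t len;
    char *path;

    assert(dir != NULL);
    len = strlen(dir) + strlen("/aborted") + 1;
    path = malloc(len);
    if (path != NULL) {
        snprintf(path, len, "%s/aborted", dir);
    }
    return path;
}

/* Hypothetical check: returns 1 if the marker file exists, 0 if it does not,
 * and -1 if the path could not be allocated. */
static int child_called_abort(const char *dir)
{
    char *abortfile = make_abort_path(dir);
    FILE *fp;
    int found;

    if (abortfile == NULL) {
        return -1;
    }

    fp = fopen(abortfile, "r");
    found = (fp != NULL);
    if (fp != NULL) {
        fclose(fp);
    }

    /* Single point of release for the buffer. */
    free(abortfile);
    /* free(NULL) is defined to do nothing, so clearing the pointer makes
     * any stray later free(abortfile) harmless. */
    abortfile = NULL;

    return found;
}
```

Clearing the pointer only protects against a second free through the same variable; the more robust fix is still to remove the redundant call, as was done here.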


On Wed, Aug 13, 2014 at 8:21 AM, Rolf vandeVaart <rvandeva...@nvidia.com>
wrote:

> I noticed MTT failures from last night and then reproduced this morning on
> 1.8 branch.  Looks like maybe a double free.  I assume it is related to
> fixes for aborting programs. Maybe related to
> https://svn.open-mpi.org/trac/ompi/changeset/32508 but not sure.
>
> [rvandevaart@drossetti-ivy0 environment]$ pwd
> /ivylogin/home/rvandevaart/tests/ompi-tests/trunk/ibm/environment
> [rvandevaart@drossetti-ivy0 environment]$ mpirun --mca odls_base_verbose
> 20 -np 2 abort
> [...stuff deleted...]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],1]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],1]
> **************************************************************************
> This program tests MPI_ABORT and generates error messages
> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
> **************************************************************************
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 3.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:wait_local_proc
> child process [[58714,1],0] pid 14955 terminated
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child
> [[58714,1],0] exit code 3
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired
> checking abort file 
> /tmp/openmpi-sessions-rvandevaart@drossetti-ivy0_0/58714/1/0/aborted
> for child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child
> [[58714,1],0] died by call to abort
> *** glibc detected *** mpirun: double free or corruption (fasttop):
> 0x000000000130e210 ***
>
> From gdb:
> (gdb) where
> #0  0x00007f75ede138e5 in raise () from /lib64/libc.so.6
> #1  0x00007f75ede1504d in abort () from /lib64/libc.so.6
> #2  0x00007f75ede517f7 in __libc_message () from /lib64/libc.so.6
> #3  0x00007f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> #4  0x00007f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955,
> status=768, cbdata=0x0)
>     at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> #5  0x00007f75eef60a78 in do_waitall (options=0) at
> ../../orte/runtime/orte_wait.c:554
> #6  0x00007f75eef60712 in orte_wait_signal_callback (fd=17, event=8,
> arg=0x7f75ef201400) at ../../orte/runtime/orte_wait.c:421
> #7  0x00007f75eecaecbe in event_signal_closure (base=0x1278370,
> ev=0x7f75ef201400)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1081
> #8  0x00007f75eecaf7e0 in event_process_active_single_queue
> (base=0x1278370, activeq=0x12788f0)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1359
> #9  0x00007f75eecafaca in event_process_active (base=0x1278370)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
> #10 0x00007f75eecb0148 in opal_libevent2021_event_base_loop
> (base=0x1278370, flags=1)
>     at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
> #11 0x0000000000405572 in orterun (argc=7, argv=0x7fffbdf1dd08) at
> ../../../../orte/tools/orterun/orterun.c:1078
> #12 0x0000000000403904 in main (argc=7, argv=0x7fffbdf1dd08) at
> ../../../../orte/tools/orterun/main.c:13
> (gdb) up
> #1  0x00007f75ede1504d in abort () from /lib64/libc.so.6
> (gdb) up
> #2  0x00007f75ede517f7 in __libc_message () from /lib64/libc.so.6
> (gdb) up
> #3  0x00007f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> (gdb) up
> #4  0x00007f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955,
> status=768, cbdata=0x0)
>     at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> 2007                free(abortfile);
> (gdb) print abortfile
> $1 = 0x130e210 ""
> (gdb)
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15632.php
>
