Fixed. It was just a lingering free() that should have been removed.
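
For anyone hitting something similar, here is a minimal sketch of how a leftover free() typically turns into a double free, and one defensive idiom against it. This is not the actual code from odls_base_default_fns.c. The helper names (make_abort_path, release_string, child_aborted) and the branch layout are assumptions made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the path of a per-process "aborted" marker
 * file under a session directory. The caller owns the returned string. */
static char *make_abort_path(const char *session_dir)
{
    const char *suffix = "/aborted";
    size_t len = strlen(session_dir) + strlen(suffix) + 1;
    char *path = malloc(len);
    if (path != NULL) {
        snprintf(path, len, "%s%s", session_dir, suffix);
    }
    return path;
}

/* Free a string through its owning pointer and clear that pointer, so a
 * leftover release on another code path becomes a harmless free(NULL). */
static void release_string(char **p)
{
    free(*p);
    *p = NULL;
}

/* Hypothetical check: returns 1 if the marker file exists, 0 if it does
 * not, and -1 if the path could not be allocated. */
static int child_aborted(const char *session_dir)
{
    int aborted = 0;
    char *abortfile = make_abort_path(session_dir);
    if (abortfile == NULL) {
        return -1;
    }

    FILE *fp = fopen(abortfile, "r");
    if (fp != NULL) {
        fclose(fp);
        aborted = 1;
        /* Early release on this branch. With a plain free(abortfile) here
         * and another one below, glibc would report a double free. */
        release_string(&abortfile);
    }

    /* Common cleanup: safe on both paths because the pointer was cleared. */
    release_string(&abortfile);
    return aborted;
}
```

The cleaner fix is usually the one applied here: delete the extra free so each allocation has exactly one release point. Clearing the pointer after freeing only makes a stray second release harmless.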
On Wed, Aug 13, 2014 at 8:21 AM, Rolf vandeVaart
wrote:
> I noticed MTT failures from last night and then reproduced this morning on
> 1.8 branch. Looks like maybe a double free. I assume it is related to
> fixes for aborting programs. Maybe related to
> https://svn.open-mpi.org/trac/ompi/changeset/32508 but not sure.
>
> [rvandevaart@drossetti-ivy0 environment]$ pwd
> /ivylogin/home/rvandevaart/tests/ompi-tests/trunk/ibm/environment
> [rvandevaart@drossetti-ivy0 environment]$ mpirun --mca odls_base_verbose
> 20 -np 2 abort
> [...stuff deleted...]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],1]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to
> tag 30 on child [[58714,1],1]
> **
> This program tests MPI_ABORT and generates error messages
> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
> **
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 3.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:wait_local_proc
> child process [[58714,1],0] pid 14955 terminated
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child
> [[58714,1],0] exit code 3
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired
> checking abort file
> /tmp/openmpi-sessions-rvandevaart@drossetti-ivy0_0/58714/1/0/aborted
> for child [[58714,1],0]
> [drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child
> [[58714,1],0] died by call to abort
> *** glibc detected *** mpirun: double free or corruption (fasttop):
> 0x0130e210 ***
>
> From gdb:
> (gdb) where
> #0 0x7f75ede138e5 in raise () from /lib64/libc.so.6
> #1 0x7f75ede1504d in abort () from /lib64/libc.so.6
> #2 0x7f75ede517f7 in __libc_message () from /lib64/libc.so.6
> #3 0x7f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> #4 0x7f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955,
> status=768, cbdata=0x0)
> at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> #5 0x7f75eef60a78 in do_waitall (options=0) at
> ../../orte/runtime/orte_wait.c:554
> #6 0x7f75eef60712 in orte_wait_signal_callback (fd=17, event=8,
> arg=0x7f75ef201400) at ../../orte/runtime/orte_wait.c:421
> #7 0x7f75eecaecbe in event_signal_closure (base=0x1278370,
> ev=0x7f75ef201400)
> at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1081
> #8 0x7f75eecaf7e0 in event_process_active_single_queue
> (base=0x1278370, activeq=0x12788f0)
> at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1359
> #9 0x7f75eecafaca in event_process_active (base=0x1278370)
> at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
> #10 0x7f75eecb0148 in opal_libevent2021_event_base_loop
> (base=0x1278370, flags=1)
> at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
> #11 0x00405572 in orterun (argc=7, argv=0x7fffbdf1dd08) at
> ../../../../orte/tools/orterun/orterun.c:1078
> #12 0x00403904 in main (argc=7, argv=0x7fffbdf1dd08) at
> ../../../../orte/tools/orterun/main.c:13
> (gdb) up
> #1 0x7f75ede1504d in abort () from /lib64/libc.so.6
> (gdb) up
> #2 0x7f75ede517f7 in __libc_message () from /lib64/libc.so.6
> (gdb) up
> #3 0x7f75ede57126 in malloc_printerr () from /lib64/libc.so.6
> (gdb) up
> #4 0x7f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955,
> status=768, cbdata=0x0)
> at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
> 2007            free(abortfile);
> (gdb) print abortfile
> $1 = 0x130e210 ""
> (gdb)