I found the problem - fix coming shortly.
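For anyone following along while the patch is in flight: both traces below die inside glibc's free() with "free(): invalid pointer", which is what you get when a destructor hands free() a pointer that was never returned by malloc(). Here is a minimal standalone sketch of that failure mode in plain C -- it is NOT Open MPI code, all names in it are hypothetical, and I'm not claiming this is exactly what ob1 is doing; it just reproduces the same abort so the symptom is easier to recognize.

/* Standalone sketch -- NOT Open MPI code; all names are hypothetical.
 * Shows one way a destructor chain can hand free() a pointer that was
 * never returned by malloc(): here, list storage embedded inside a
 * parent allocation.  glibc's heap checks reject the bogus chunk and
 * abort with an error like "free(): invalid pointer", the same symptom
 * as in the traces quoted below. */
#include <stdlib.h>

struct item { struct item *next; };

struct proc {
    int rank;
    struct item sentinel;        /* list head embedded in the parent object */
};

/* Buggy destructor: treats the embedded sentinel as its own heap block. */
static void proc_destruct(struct proc *p)
{
    free(&p->sentinel);          /* invalid pointer: not a malloc()ed block */
}

int main(void)
{
    struct proc *p = malloc(sizeof(*p));
    if (p == NULL) return 1;
    p->rank = 0;
    p->sentinel.next = NULL;
    proc_destruct(p);            /* glibc aborts here */
    free(p);
    return 0;
}

Whatever the actual fix turns out to be, the pattern to look for is a construct/destruct ownership mismatch: a destructor freeing (or double-destructing) storage it does not actually own on the heap.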
> On Oct 27, 2015, at 12:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I’m seeing similar failures in the master from several collectives. Looking
> at the stack, here is what I see on all of them:
>
> (gdb) where
> #0  0x00007fe49931a5d7 in raise () from /usr/lib64/libc.so.6
> #1  0x00007fe49931be08 in abort () from /usr/lib64/libc.so.6
> #2  0x00007fe49935ae07 in __libc_message () from /usr/lib64/libc.so.6
> #3  0x00007fe4993621fd in _int_free () from /usr/lib64/libc.so.6
> #4  0x00007fe498cfec95 in opal_list_destruct (list=0x25b06d0) at class/opal_list.c:108
> #5  0x00007fe48f0d0fb0 in opal_obj_run_destructors (object=0x25b06d0) at ../../../../opal/class/opal_object.h:460
> #6  0x00007fe48f0d132a in mca_pml_ob1_comm_proc_destruct (proc=0x25b05a0) at pml_ob1_comm.c:42
> #7  0x00007fe48f0d0fb0 in opal_obj_run_destructors (object=0x25b05a0) at ../../../../opal/class/opal_object.h:460
> #8  0x00007fe48f0d17c7 in mca_pml_ob1_comm_destruct (comm=0x25a0b40) at pml_ob1_comm.c:71
> #9  0x00007fe48f0cdcd5 in opal_obj_run_destructors (object=0x25a0b40) at ../../../../opal/class/opal_object.h:460
> #10 0x00007fe48f0cfb05 in mca_pml_ob1_del_comm (comm=0x259db90) at pml_ob1.c:277
> #11 0x00007fe4998ef19f in ompi_comm_destruct (comm=0x259db90) at communicator/comm_init.c:418
> #12 0x00007fe4998efa02 in opal_obj_run_destructors (object=0x259db90) at ../opal/class/opal_object.h:460
> #13 0x00007fe4998f2bed in ompi_comm_free (comm=0x7ffdb43a6940) at communicator/comm.c:1532
> #14 0x00007fe49993c858 in PMPI_Comm_disconnect (comm=0x7ffdb43a6940) at pcomm_disconnect.c:75
> #15 0x00000000004014a6 in main (argc=1, argv=0x7ffdb43a6a58) at ibarrier_inter.c:68
>
> This is with 16 procs on 2 nodes. Any ideas?
> Ralph
>
>
>> On Oct 27, 2015, at 12:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Anyone have an idea of what this is all about?
>>
>>
>> Command: mpirun --hostfile /home/common/hosts -np 16 --prefix
>> /home/common/openmpi/build/foobar/ collective/alltoall_in_place
>> Elapsed: 00:00:00 0.00u 0.00s
>> Test: alltoall_in_place, np=16, variant=1: Passed
>> *** Error in `collective/alltoallv_somezeros': free(): invalid pointer: 0x000000000127a180 ***
>> ======= Backtrace: =========
>> /usr/lib64/libc.so.6(+0x7d1fd)[0x7f46e2fda1fd]
>> /home/common/openmpi/build/foobar/lib/libopen-pal.so.0(+0x2cd05)[0x7f46e2976d05]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x6f74)[0x7f46dcefaf74]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x72ee)[0x7f46dcefb2ee]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x6f74)[0x7f46dcefaf74]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x76e8)[0x7f46dcefb6e8]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x3c73)[0x7f46dcef7c73]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_del_comm+0xcf)[0x7f46dcef9acc]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(+0x2d1df)[0x7f46e35671df]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(+0x2b473)[0x7f46e3565473]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(ompi_comm_finalize+0x23f)[0x7f46e3566bbd]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(ompi_mpi_finalize+0x5fd)[0x7f46e3593df7]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(PMPI_Finalize+0x59)[0x7f46e35bd6e5]
>>
>> Then I see a bunch of dump info, followed by:
>>
>>
>> Command: mpirun --hostfile /home/common/hosts -np 16 --prefix
>> /home/common/openmpi/build/foobar/ collective/alltoallv_somezeros
>> Elapsed: 00:00:01 0.00u 0.00s
>> Test: alltoallv_somezeros, np=16, variant=1: Passed
>>
>>
>> Ralph
>>
>>
>