I found the problem - fix coming shortly.

> On Oct 27, 2015, at 12:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> I’m seeing similar failures in the master from several collectives. Looking 
> at the stack, here is what I see on all of them:
> 
> (gdb) where
> #0  0x00007fe49931a5d7 in raise () from /usr/lib64/libc.so.6
> #1  0x00007fe49931be08 in abort () from /usr/lib64/libc.so.6
> #2  0x00007fe49935ae07 in __libc_message () from /usr/lib64/libc.so.6
> #3  0x00007fe4993621fd in _int_free () from /usr/lib64/libc.so.6
> #4  0x00007fe498cfec95 in opal_list_destruct (list=0x25b06d0) at 
> class/opal_list.c:108
> #5  0x00007fe48f0d0fb0 in opal_obj_run_destructors (object=0x25b06d0) at 
> ../../../../opal/class/opal_object.h:460
> #6  0x00007fe48f0d132a in mca_pml_ob1_comm_proc_destruct (proc=0x25b05a0) at 
> pml_ob1_comm.c:42
> #7  0x00007fe48f0d0fb0 in opal_obj_run_destructors (object=0x25b05a0) at 
> ../../../../opal/class/opal_object.h:460
> #8  0x00007fe48f0d17c7 in mca_pml_ob1_comm_destruct (comm=0x25a0b40) at 
> pml_ob1_comm.c:71
> #9  0x00007fe48f0cdcd5 in opal_obj_run_destructors (object=0x25a0b40) at 
> ../../../../opal/class/opal_object.h:460
> #10 0x00007fe48f0cfb05 in mca_pml_ob1_del_comm (comm=0x259db90) at 
> pml_ob1.c:277
> #11 0x00007fe4998ef19f in ompi_comm_destruct (comm=0x259db90) at 
> communicator/comm_init.c:418
> #12 0x00007fe4998efa02 in opal_obj_run_destructors (object=0x259db90) at 
> ../opal/class/opal_object.h:460
> #13 0x00007fe4998f2bed in ompi_comm_free (comm=0x7ffdb43a6940) at 
> communicator/comm.c:1532
> #14 0x00007fe49993c858 in PMPI_Comm_disconnect (comm=0x7ffdb43a6940) at 
> pcomm_disconnect.c:75
> #15 0x00000000004014a6 in main (argc=1, argv=0x7ffdb43a6a58) at 
> ibarrier_inter.c:68
> 
> 
> This is with 16 procs on 2 nodes. Any ideas?
> Ralph
> 
> 
>> On Oct 27, 2015, at 12:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>> Anyone have an idea of what this is all about?
>> 
>> >> Command: mpirun     --hostfile /home/common/hosts -np 16 --prefix 
>> >> /home/common/openmpi/build/foobar/ collective/alltoall_in_place 
>>    Elapsed:       00:00:00 0.00u 0.00s
>>    Test: alltoall_in_place, np=16, variant=1: Passed
>> *** Error in `collective/alltoallv_somezeros': free(): invalid pointer: 
>> 0x000000000127a180 ***
>> ======= Backtrace: =========
>> /usr/lib64/libc.so.6(+0x7d1fd)[0x7f46e2fda1fd]
>> /home/common/openmpi/build/foobar/lib/libopen-pal.so.0(+0x2cd05)[0x7f46e2976d05]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x6f74)[0x7f46dcefaf74]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x72ee)[0x7f46dcefb2ee]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x6f74)[0x7f46dcefaf74]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x76e8)[0x7f46dcefb6e8]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(+0x3c73)[0x7f46dcef7c73]
>> /home/common/openmpi/build/foobar/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_del_comm+0xcf)[0x7f46dcef9acc]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(+0x2d1df)[0x7f46e35671df]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(+0x2b473)[0x7f46e3565473]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(ompi_comm_finalize+0x23f)[0x7f46e3566bbd]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(ompi_mpi_finalize+0x5fd)[0x7f46e3593df7]
>> /home/common/openmpi/build/foobar/lib/libmpi.so.0(PMPI_Finalize+0x59)[0x7f46e35bd6e5]
>> 
>> Then I see a bunch of dump info, followed by:
>> 
>> >> Command: mpirun     --hostfile /home/common/hosts -np 16 --prefix 
>> >> /home/common/openmpi/build/foobar/ collective/alltoallv_somezeros 
>>    Elapsed:       00:00:01 0.00u 0.00s
>>    Test: alltoallv_somezeros, np=16, variant=1: Passed
>> 
>> 
>> 
>> Ralph
>> 
>> 
> 
