Steve,
If you compile the OMPI code with CFLAGS="-g", generate a core file from the segfault, and send the core plus the IMB-MPI1 binary to me, I will be able to understand the problem better.
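A minimal sketch of that workflow, with its assumptions stated up front. Open MPI is rebuilt from source with `./configure CFLAGS="-g"` (the install prefix below is a placeholder). Core dumps are enabled with `ulimit -c unlimited` on the nodes where the ranks run, since ranks started on remote hosts may not inherit the limit from the launching shell. gdb is installed. `core_backtrace` is a hypothetical helper, not an Open MPI tool, and the core file's name and location depend on the kernel's core_pattern setting.

```shell
#!/bin/sh
# Hypothetical helper: print a symbolic backtrace from a core file.
# Assumes gdb is installed and BINARY is the executable that crashed
# (here IMB-MPI1), running against an Open MPI built with debug info, e.g.:
#   ./configure CFLAGS="-g" --prefix=/path/to/debug-install && make all install
core_backtrace() {
    binary=$1
    core=$2
    # Refuse to start gdb unless both files are present.
    if [ ! -f "$binary" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace BINARY CORE (both files must exist)" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex command;
    # "bt full" prints every frame together with its local variables.
    gdb -batch -ex "bt full" "$binary" "$core"
}

# Before rerunning the test, allow core dumps:
#   ulimit -c unlimited
# After a crash (the core file name depends on /proc/sys/kernel/core_pattern):
#   core_backtrace /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 core.<pid>
```

With debug info in mca_btl_openib.so, the bare addresses in the btl_openib frames should resolve to function names and source lines.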

Regards,
Pasha

Steve Wise wrote:

Hey Pasha,


I just applied r20872 and retested, and I still hit this seg fault. So I think this is a new bug.

Lemme pull the trunk and try that.



Pavel Shamis (Pasha) wrote:
I think your problem is related to this bug: https://svn.open-mpi.org/trac/ompi/ticket/1823

And it is resolved on the ompi-trunk.

Pasha.

Steve Wise wrote:
When this happens, that node also logs this type of message in /var/log/messages:

IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800 rsp 00007fffb1021330 error 4

Steve Wise wrote:
Hey Jeff,

Have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2.

Test:
[o...@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g --mca btl openib,self,sm --mca btl_openib_max_btls 1 /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 bcast scatter sendrecv exchange </dev/null
done
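The doit-ompi loop above runs forever, so the failing iteration and its core file can scroll past unnoticed. Below is a sketch of a variant that stops at the first failure. It assumes mpirun exits with a nonzero status when a rank dies on a signal (worth confirming with `echo $?` after a crash). `run_until_fail` is a hypothetical helper, not part of Open MPI or IMB, and the `ulimit` call only affects the local shell and its children.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or IMB): rerun a command until it
# exits with a nonzero status, then report which iteration failed.
run_until_fail() {
    # Allow core dumps for this shell and the processes it starts; ignore the
    # error if the hard limit does not permit raising the soft limit.
    ulimit -c unlimited 2>/dev/null || true
    iter=0
    while : ; do
        iter=$((iter + 1))
        status=0
        "$@" || status=$?   # capture the exit status without tripping set -e
        if [ "$status" -ne 0 ]; then
            echo "iteration $iter failed with exit status $status"
            return "$status"
        fi
    done
}

# Example, using the same options as doit-ompi:
#   run_until_fail mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
#       --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
#       /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
#       bcast scatter sendrecv exchange </dev/null
```

Stopping on the first failure leaves the most recent core file as the one from the crashing run, which is what Pasha asked for.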


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
[vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
[vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
[vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
[vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
[vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
[vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
[vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
[vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
[vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
[vic21:04047] *** End of error message ***

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

