Hi,

I am running some tests on a PPC platform that uses LSF, and I see the
following segmentation fault every time I launch a job on two or more nodes:

[crest1:49998] *** Process received signal ***
[crest1:49998] Signal: Segmentation fault (11)
[crest1:49998] Signal code: Address not mapped (1)
[crest1:49998] Failing at address: 0x10061636d2d
[crest1:49998] [ 0] [0x100000050478]
[crest1:49998] [ 1] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(+0x0)[0x1000009c0000]
[crest1:49998] [ 2] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/liblsf.so(straddr_isIPv4+0x44)[0x100000e31b64]
[crest1:49998] [ 3] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_pjob_array2LIST+0x114)[0x100000be79b4]
[crest1:49998] [ 4] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_pjob_constructList+0xfc)[0x100000becdbc]
[crest1:49998] [ 5] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_launch+0x184)[0x100000bed9c4]
[crest1:49998] [ 6] /ccs/home/gvh/install/crest/ompi3_llvm/lib/openmpi/mca_plm_lsf.so(+0x2660)[0x100000992660]
[crest1:49998] [ 7] /ccs/home/gvh/install/crest/ompi3_llvm/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x940)[0x1000001f7730]
[crest1:49998] [ 8] /ccs/home/gvh/install/crest/ompi3_llvm/bin/mpiexec[0x100013e4]
[crest1:49998] [ 9] /ccs/home/gvh/install/crest/ompi3_llvm/bin/mpiexec[0x10000f10]
[crest1:49998] [10] /lib64/power8/libc.so.6(+0x24580)[0x1000004f4580]
[crest1:49998] [11] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x1000004f4774]
[crest1:49998] *** End of error message ***

I do not see this problem with master, and the only difference in LSF
support between master and the v3 branch is this commit:

https://github.com/open-mpi/ompi/commit/92c996487c589ef8558a087ce2a9923dacdf0b99

If I can confirm that this change fixes the problem on the v3 branch, would
you be willing to bring it into the v3 branch?

Thanks,