Gilles's findings are consistent with mine, which showed the SEGVs to be in the coll/ml code. I've built with --enable-debug, so below is a backtrace (well, two, actually) that might be helpful. Unfortunately, the output of the two ranks got slightly entangled.
-Paul

$ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c
[bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML
[bd-login:09106] *** Process received signal ***
[bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML
[bd-login:09107] *** Process received signal ***
[bd-login:09107] Signal: Segmentation fault (11)
[bd-login:09107] Signal code: Address not mapped (1)
[bd-login:09107] Failing at address: 0x10
[bd-login:09107] [ 0]
[bd-login:09106] Signal: Segmentation fault (11)
[bd-login:09106] Signal code: Address not mapped (1)
[bd-login:09106] Failing at address: 0x10
[bd-login:09106] [ 0] [0xfffa96c0418]
[bd-login:09106] [ 1] [0xfff8f580418]
[bd-login:09107] [ 1] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
[bd-login:09107] [ 2] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
[bd-login:09106] [ 2] /lib64/libc.so.6[0x80c9b600b4]
[bd-login:09106] [ 3] /lib64/libc.so.6[0x80c9b600b4]
[bd-login:09107] [ 3] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
[bd-login:09107] [ 4] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
[bd-login:09106] [ 4] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfffa8296580]
[bd-login:09106] [ 5] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfffa8297604]
[bd-login:09106] [ 6] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfffa829784c]
[bd-login:09106] [ 7] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfffa8250d4c]
[bd-login:09106] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfff8e156580]
[bd-login:09107] [ 5] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfff8e157604]
[bd-login:09107] [ 6] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfff8e15784c]
[bd-login:09107] [ 7] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfff8e110d4c]
[bd-login:09107] [ 8] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfff8e1065e4]
[bd-login:09107] [ 9] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfff8e10a7d8]
[bd-login:09107] [10] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfff8e10f970]
[bd-login:09107] [11] [ 8] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfffa82465e4]
[bd-login:09106] [ 9] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfffa824a7d8]
[bd-login:09106] [10] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfffa824f970]
[bd-login:09106] [11] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfff8f4b5ba0]
[bd-login:09107] [12] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfff8f4b5b14]
[bd-login:09107] [13] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfffa95f5ba0]
[bd-login:09106] [12] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfffa95f5b14]
[bd-login:09106] [13] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfffa95f59a8]
[bd-login:09106] [14] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfff8f4b59a8]
[bd-login:09107] [14] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfffa95f57ac]
[bd-login:09106] [15] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfff8f4b57ac]
[bd-login:09107] [15] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfff8f4ae3ec]
[bd-login:09107] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfffa95ee3ec]
[bd-login:09106] [16] [16] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfff8f401408]
[bd-login:09107] [17] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfffa9541408]
[bd-login:09106] [17] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(MPI_Init-0xf28d4)[0xfffa9591c74]
[bd-login:09106] [18] examples/ring_c[0x1000099c]
[bd-login:09106] [19] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(MPI_Init-0xf28d4)[0xfff8f451c74]
[bd-login:09107] [18] examples/ring_c[0x1000099c]
[bd-login:09107] [19] /lib64/libc.so.6[0x80c9b2bcd8]
[bd-login:09107] [20] /lib64/libc.so.6[0x80c9b2bcd8]
[bd-login:09106] [20] /lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
[bd-login:09107] *** End of error message ***
/lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
[bd-login:09106] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node bd-login exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

On Thu, Jul 31, 2014 at 11:39 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Paul and Ralph,
>
> for what it's worth :
>
> a) i faced the very same issue on my (sloooow) qemu emulated ppc64 vm
> b) i was able to run very basic programs when passing --mca coll ^ml to
> mpirun
>
> Cheers,
>
> Gilles
>
> On 2014/08/01 12:30, Ralph Castain wrote:
>
> Yes, I fear this will require some effort to chase all the breakage down
> given that (to my knowledge, at least) we lack PPC machines in the devel
> group.
>
> On Jul 31, 2014, at 5:46 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On the path to verifying George's atomics patch, I have started just by
> verifying that I can still build the UNPATCHED trunk on each of the platforms
> I listed.
>
> I have tried two PPC64/Linux systems so far and am seeing the same problem on
> both. Though I can pass "make check" both platforms SEGV on
> mpirun -mca btl sm,self -np 2 examples/ring_c
>
> Is this the expected state of the trunk on big-endian systems?
> I am thinking in particular of
> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which
> Ralph wrote:
>
> Yeah, my fix won't work for big endian machines - this is going to be an
> issue across the
> code base now, so we'll have to troll and fix it. I was doing the minimal
> change required to
> fix the trunk in the meantime.
>
> If this big-endian failure is not known/expected let me know and I'll
> provide details.
> Since testing George's patch only requires "make check" I can proceed with
> that regardless.
>
> -Paul
>
> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> Awesome, thanks Paul. When the results will be in we will fix whatever is
> needed for these less common architectures.
>
> George.
>
> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> Paul, I know you have a pretty diverse range computers. Can you try to
> compile and run a "make check" with the following patch?
>
> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset of
> those is still supported).
> The ARM and MIPS system are emulators and take forever to build OMPI.
> However, I am not even sure how soon I'll get to start this testing.
>
> Add SPARC (v8plus and v9) to that list.
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900