Re: [OMPI devel] Trunk broken for PPC64?
Good suggestion, Paul - I have committed it in r32407 and added it to CMR #4826. Thanks!

Ralph

On Aug 1, 2014, at 1:12 AM, Paul Hargrove wrote:

> Gilles,
>
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #else
> pagesize = 4096;
> #endif
>
> While other places in the code use sysconf(), but not always consistently.
>
> And on some systems _SC_PAGESIZE is spelled as _SC_PAGE_SIZE.
> Fortunately configure already checks these variations for you.
>
> So, I suggest
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #elif defined(_SC_PAGESIZE)
> pagesize = sysconf(_SC_PAGESIZE);
> #elif defined(_SC_PAGE_SIZE)
> pagesize = sysconf(_SC_PAGE_SIZE);
> #else
> pagesize = 65536; /* safer to overestimate than under */
> #endif
>
> opal_pagesize() anyone?
>
> -Paul
>
> On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet wrote:
>
> Paul,
>
> you are absolutely right!
>
> in ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
> cm->lmngr_alignment is hard coded to 4096
>
> as a proof of concept, i hard coded it to 65536 and now coll/ml works just
> fine
>
> i will now write a patch that uses sysconf(_SC_PAGESIZE) instead
>
> Cheers,
>
> Gilles
>
> On 2014/08/01 15:56, Paul Hargrove wrote:
>> Hmm, maybe this has nothing to do with big-endian.
>> Below is a backtrace from ring_c on an IA64 platform (definitely
>> little-endian) that looks very similar to me.
>>
>> It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
>> So, I wonder if that might be related.
>> >> -Paul >> >> $ mpirun -mca btl sm,self -np 2 examples/ring_c' >> [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] >> COLL-ML [altix:20418] *** Process received signal *** >> [altix:20418] Signal: Segmentation fault (11) >> [altix:20418] Signal code: Invalid permissions (2) >> [altix:20418] Failing at address: 0x16 >> [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] >> COLL-ML [altix:20419] *** Process received signal *** >> [altix:20419] Signal: Segmentation fault (11) >> [altix:20419] Signal code: Invalid permissions (2) >> [altix:20419] Failing at address: 0x16 >> [altix:20418] [ 0] [0xa0010800] >> [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] >> [altix:20418] [altix:20419] [ 0] [0xa0010800] >> [altix:20419] [ 1] [ 2] >> /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] >> [altix:20419] [ 2] >> /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0] >> [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860] >> [altix:20419] [ 4] >> /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040] >> [altix:20419] [ 5] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70] >> [altix:20419] [ 6] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0] >> [altix:20419] [ 7] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110] >> [altix:20419] [ 8] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block+0xf6e940)[0x21db8540] >> [altix:20419] [ 9] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x10130)[0x21da0130] >> [altix:20419] [10] >> 
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x19970)[0x21da9970] >> [altix:20419] [11] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query+0xf6d6b0)[0x21db5830] >> [altix:20419] [12] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fbd0)[0x2028fbd0] >> [altix:20419] [13] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fac0)[0x2028fac0] >> [altix:20419] [14] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22f7e0)[0x2028f7e0] >> [altix:20419] [15] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22eac0)[0x2028eac0] >> [altix:20419] [16] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0xbcbb90)[0x2027e080] >> [altix:20419] [17] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(ompi_mpi_init-0xd38e70)[0x20110db0] >> [altix:20419] [18] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(MPI_Init-0xc8bf40)[0x201bdcf0] >> [altix:20419] [19]
Re: [OMPI devel] Trunk broken for PPC64?
Paul,

i just committed r32393 (and made a CMR for v1.8). Can you please give it a try?

in the meantime, i received your email ... sysconf is called directly (i.e. not #ifdef protected) in several other places:

$ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v autom4te | grep PAGE | grep -v LARGE
./oshmem/mca/memheap/ptmalloc/malloc.c:#define malloc_getpagesize sysconf(_SC_PAGE_SIZE)
./ompi/mca/pml/base/pml_base_bsend.c:tmp = mca_pml_bsend_pagesz = sysconf(_SC_PAGESIZE);
./ompi/mca/coll/ml/coll_ml_lmngr.c:cm->lmngr_alignment = sysconf(_SC_PAGESIZE);
./orte/mca/oob/ud/oob_ud_module.c:posix_memalign ((void **)_mem->ptr, sysconf(_SC_PAGESIZE), buffer_len);
./opal/mca/memory/linux/malloc.c:#define malloc_getpagesize sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/src/topology-solaris.c: remainder = (uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/src/topology-linux.c: remainder = (uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define hwloc_getpagesize() sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define hwloc_getpagesize() sysconf(_SC_PAGESIZE)
./opal/mca/mpool/base/mpool_base_frame.c:mca_mpool_base_page_size = sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_sl.c:long page_size = sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_udcm.c: posix_memalign ((void **)>cm_buffer, sysconf(_SC_PAGESIZE),

that is why i did not #ifdef protect it in coll/ml

Cheers,

Gilles

On 2014/08/01 17:12, Paul Hargrove wrote:

> Gilles,
>
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #else
> pagesize = 4096;
> #endif
>
> While other places in the code use sysconf(), but not always consistently.
>
> And on some systems _SC_PAGESIZE is spelled as _SC_PAGE_SIZE.
> Fortunately configure already checks these variations for you.
> So, I suggest
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #elif defined(_SC_PAGESIZE)
> pagesize = sysconf(_SC_PAGESIZE);
> #elif defined(_SC_PAGE_SIZE)
> pagesize = sysconf(_SC_PAGE_SIZE);
> #else
> pagesize = 65536; /* safer to overestimate than under */
> #endif
>
> opal_pagesize() anyone?
>
> -Paul
>
> On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>> Paul,
>>
>> you are absolutely right!
>>
>> in ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
>> cm->lmngr_alignment is hard coded to 4096
>>
>> as a proof of concept, i hard coded it to 65536 and now coll/ml works just
>> fine
>>
>> i will now write a patch that uses sysconf(_SC_PAGESIZE) instead
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/01 15:56, Paul Hargrove wrote:
>>
>> Hmm, maybe this has nothing to do with big-endian.
>> Below is a backtrace from ring_c on an IA64 platform (definitely
>> little-endian) that looks very similar to me.
>>
>> It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
>> So, I wonder if that might be related.
>> >> -Paul >> >> $ mpirun -mca btl sm,self -np 2 examples/ring_c' >> [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] >> COLL-ML [altix:20418] *** Process received signal *** >> [altix:20418] Signal: Segmentation fault (11) >> [altix:20418] Signal code: Invalid permissions (2) >> [altix:20418] Failing at address: 0x16 >> [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] >> COLL-ML [altix:20419] *** Process received signal *** >> [altix:20419] Signal: Segmentation fault (11) >> [altix:20419] Signal code: Invalid permissions (2) >> [altix:20419] Failing at address: 0x16 >> [altix:20418] [ 0] [0xa0010800] >> [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] >> [altix:20418] [altix:20419] [ 0] [0xa0010800] >> [altix:20419] [ 1] [ 2] >> /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] >> [altix:20419] [ 2] >> /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0] >> [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860] >> [altix:20419] [ 4] >> /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040] >> [altix:20419] [ 5] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70] >> [altix:20419] [ 6] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0] >> [altix:20419] [ 7] >> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110] >> [altix:20419] [ 8] >>
Re: [OMPI devel] Trunk broken for PPC64?
Hmm, maybe this has nothing to do with big-endian. Below is a backtrace from ring_c on an IA64 platform (definitely little-endian) that looks very similar to me. It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems. So, I wonder if that might be related. -Paul $ mpirun -mca btl sm,self -np 2 examples/ring_c' [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML [altix:20418] *** Process received signal *** [altix:20418] Signal: Segmentation fault (11) [altix:20418] Signal code: Invalid permissions (2) [altix:20418] Failing at address: 0x16 [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML [altix:20419] *** Process received signal *** [altix:20419] Signal: Segmentation fault (11) [altix:20419] Signal code: Invalid permissions (2) [altix:20419] Failing at address: 0x16 [altix:20418] [ 0] [0xa0010800] [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] [altix:20418] [altix:20419] [ 0] [0xa0010800] [altix:20419] [ 1] [ 2] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0] [altix:20419] [ 2] /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0] [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860] [altix:20419] [ 4] /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040] [altix:20419] [ 5] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70] [altix:20419] [ 6] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0] [altix:20419] [ 7] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110] [altix:20419] [ 8] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block+0xf6e940)[0x21db8540] [altix:20419] [ 9] 
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x10130)[0x21da0130] [altix:20419] [10] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x19970)[0x21da9970] [altix:20419] [11] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query+0xf6d6b0)[0x21db5830] [altix:20419] [12] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fbd0)[0x2028fbd0] [altix:20419] [13] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fac0)[0x2028fac0] [altix:20419] [14] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22f7e0)[0x2028f7e0] [altix:20419] [15] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22eac0)[0x2028eac0] [altix:20419] [16] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0xbcbb90)[0x2027e080] [altix:20419] [17] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(ompi_mpi_init-0xd38e70)[0x20110db0] [altix:20419] [18] /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(MPI_Init-0xc8bf40)[0x201bdcf0] [altix:20419] [19] examples/ring_c[0x4c00] [altix:20419] [20] /lib/libc.so.6.1(__libc_start_main-0x9f56b0)[0x20454590] [altix:20419] [21] examples/ring_c[0x4a20] [altix:20419] *** End of error message *** /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0] [altix:20418] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860] [altix:20418] [ 4] /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040] On Thu, Jul 31, 2014 at 11:47 PM, Paul Hargrovewrote: > Gilles's findings are consistent with mine which showed the SEGVs to be in > the coll/ml code. > I've built with --enable-debug and so below is a backtrace (well, two > actually) that might be helpful. > Unfortunately the output of the two ranks did get slightly entangled. 
> > -Paul > > $ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' > [bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] > COLL-ML [bd-login:09106] *** Process received signal *** > [bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] > COLL-ML [bd-login:09107] *** Process received signal *** > [bd-login:09107] Signal: Segmentation fault (11) > [bd-login:09107] Signal code: Address not mapped (1) > [bd-login:09107] Failing at address: 0x10 > [bd-login:09107] [ 0] [bd-login:09106] Signal: Segmentation fault (11) > [bd-login:09106] Signal code: Address not mapped (1) > [bd-login:09106]
Re: [OMPI devel] Trunk broken for PPC64?
Gilles's findings are consistent with mine which showed the SEGVs to be in the coll/ml code. I've built with --enable-debug and so below is a backtrace (well, two actually) that might be helpful. Unfortunately the output of the two ranks did get slightly entangled. -Paul $ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' [bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML [bd-login:09106] *** Process received signal *** [bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init] COLL-ML [bd-login:09107] *** Process received signal *** [bd-login:09107] Signal: Segmentation fault (11) [bd-login:09107] Signal code: Address not mapped (1) [bd-login:09107] Failing at address: 0x10 [bd-login:09107] [ 0] [bd-login:09106] Signal: Segmentation fault (11) [bd-login:09106] Signal code: Address not mapped (1) [bd-login:09106] Failing at address: 0x10 [bd-login:09106] [ 0] [0xfffa96c0418] [bd-login:09106] [ 1] [0xfff8f580418] [bd-login:09107] [ 1] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968] [bd-login:09107] [ 2] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968] [bd-login:09106] [ 2] /lib64/libc.so.6[0x80c9b600b4] [bd-login:09106] [ 3] /lib64/libc.so.6[0x80c9b600b4] [bd-login:09107] [ 3] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0] [bd-login:09107] [ 4] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0] [bd-login:09106] [ 4] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfffa8296580] [bd-login:09106] [ 5] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfffa8297604] [bd-login:09106] [ 6] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfffa829784c] [bd-login:09106] [ 7] 
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfffa8250d4c] [bd-login:09106] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfff8e156580] [bd-login:09107] [ 5] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfff8e157604] [bd-login:09107] [ 6] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfff8e15784c] [bd-login:09107] [ 7] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfff8e110d4c] [bd-login:09107] [ 8] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfff8e1065e4] [bd-login:09107] [ 9] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfff8e10a7d8] [bd-login:09107] [10] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfff8e10f970] [bd-login:09107] [11] [ 8] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfffa82465e4] [bd-login:09106] [ 9] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfffa824a7d8] [bd-login:09106] [10] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfffa824f970] [bd-login:09106] [11] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfff8f4b5ba0] [bd-login:09107] [12] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfff8f4b5b14] [bd-login:09107] [13] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfffa95f5ba0] [bd-login:09106] [12] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfffa95f5b14] 
[bd-login:09106] [13] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfffa95f59a8] [bd-login:09106] [14] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfff8f4b59a8] [bd-login:09107] [14] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfffa95f57ac] [bd-login:09106] [15] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfff8f4b57ac] [bd-login:09107] [15] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfff8f4ae3ec] [bd-login:09107] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfffa95ee3ec] [bd-login:09106] [16] [16] /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfff8f401408] [bd-login:09107] [17]
Re: [OMPI devel] Trunk broken for PPC64?
Paul and Ralph,

for what it's worth:

a) i faced the very same issue on my (slow) qemu-emulated ppc64 vm
b) i was able to run very basic programs when passing --mca coll ^ml to mpirun

Cheers,

Gilles

On 2014/08/01 12:30, Ralph Castain wrote:
> Yes, I fear this will require some effort to chase all the breakage down
> given that (to my knowledge, at least) we lack PPC machines in the devel
> group.
>
> On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote:
>
>> On the path to verifying George's atomics patch, I have started just by
>> verifying that I can still build the UNPATCHED trunk on each of the
>> platforms I listed.
>>
>> I have tried two PPC64/Linux systems so far and am seeing the same problem
>> on both. Though I can pass "make check", both platforms SEGV on
>>    mpirun -mca btl sm,self -np 2 examples/ring_c
>>
>> Is this the expected state of the trunk on big-endian systems?
>> I am thinking in particular of
>> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which
>> Ralph wrote:
>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>> issue across the
>>> code base now, so we'll have to troll and fix it. I was doing the minimal
>>> change required to
>>> fix the trunk in the meantime.
>> If this big-endian failure is not known/expected let me know and I'll
>> provide details.
>> Since testing George's patch only requires "make check" I can proceed with
>> that regardless.
>>
>> -Paul
>>
>> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca wrote:
>> Awesome, thanks Paul. When the results are in we will fix whatever is
>> needed for these less common architectures.
>>
>> George.
>>
>> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove wrote:
>>
>> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote:
>>
>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca wrote:
>> Paul, I know you have a pretty diverse range of computers. Can you try to
>> compile and run a "make check" with the following patch?
>> >> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset >> of those is still supported). >> The ARM and MIPS system are emulators and take forever to build OMPI. >> However, I am not even sure how soon I'll get to start this testing. >> >> >> Add SPARC (v8plus and v9) to that list. >> >> >> >> -- >> Paul H. Hargrove phhargr...@lbl.gov >> Future Technologies Group >> Computer and Data Sciences Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15412.php >> >> >> >> -- >> Paul H. Hargrove phhargr...@lbl.gov >> Future Technologies Group >> Computer and Data Sciences Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15414.php > > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15425.php
Re: [OMPI devel] Trunk broken for PPC64?
Yes, I fear this will require some effort to chase all the breakage down given that (to my knowledge, at least) we lack PPC machines in the devel group.

On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote:

> On the path to verifying George's atomics patch, I have started just by
> verifying that I can still build the UNPATCHED trunk on each of the platforms
> I listed.
>
> I have tried two PPC64/Linux systems so far and am seeing the same problem on
> both. Though I can pass "make check", both platforms SEGV on
>    mpirun -mca btl sm,self -np 2 examples/ring_c
>
> Is this the expected state of the trunk on big-endian systems?
> I am thinking in particular of
> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which
> Ralph wrote:
> > Yeah, my fix won't work for big endian machines - this is going to be an
> > issue across the
> > code base now, so we'll have to troll and fix it. I was doing the minimal
> > change required to
> > fix the trunk in the meantime.
>
> If this big-endian failure is not known/expected let me know and I'll provide
> details.
> Since testing George's patch only requires "make check" I can proceed with
> that regardless.
>
> -Paul
>
> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca wrote:
> Awesome, thanks Paul. When the results are in we will fix whatever is
> needed for these less common architectures.
>
> George.
>
> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove wrote:
>
> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote:
>
> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca wrote:
> Paul, I know you have a pretty diverse range of computers. Can you try to
> compile and run a “make check” with the following patch?
>
> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset of
> those is still supported).
> The ARM and MIPS system are emulators and take forever to build OMPI.
> However, I am not even sure how soon I'll get to start this testing.
>
> Add SPARC (v8plus and v9) to that list.
> > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15411.php > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15412.php > > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15414.php
Re: [OMPI devel] Trunk broken for --with-devel-headers?
Works okay with a fresh checkout, so something in my tree must have been hosed.

On Jul 25, 2014, at 8:51 AM, Ralph Castain wrote:

> It seems to be only happening on my Mac, not Linux, but I'll try with a fresh
> checkout
>
> On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres)
> wrote:
>
>> I'm unable to replicate... perhaps you have a stale install tree?
>>
>> On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote:
>>
>>> Hey folks
>>>
>>> Something in the last day or so appears to have broken the trunk's ability
>>> to run --with-devel-headers. It looks like the headers are being installed
>>> correctly, but mpicc fails to compile a program that uses them - the
>>> include passes, but the linker fails:
>>>
>>> Undefined symbols for architecture x86_64:
>>>   "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>>>       _main in hello.o
>>>   "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>>>       _main in hello.o
>>>   "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>>>       _main in hello.o
>>>   "_opal_hwloc_topology", referenced from:
>>>       _main in hello.o
>>>   "_orte_process_info", referenced from:
>>>       _main in hello.o
>>> ld: symbol(s) not found for architecture x86_64
>>> collect2: error: ld returned 1 exit status
>>>
>>> Anybody else seeing this?
>>> Ralph
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15265.php
Re: [OMPI devel] Trunk broken for --with-devel-headers?
It seems to be only happening on my Mac, not Linux, but I'll try with a fresh checkout

On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres) wrote:

> I'm unable to replicate... perhaps you have a stale install tree?
>
> On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote:
>
>> Hey folks
>>
>> Something in the last day or so appears to have broken the trunk's ability
>> to run --with-devel-headers. It looks like the headers are being installed
>> correctly, but mpicc fails to compile a program that uses them - the include
>> passes, but the linker fails:
>>
>> Undefined symbols for architecture x86_64:
>>   "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>>       _main in hello.o
>>   "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>>       _main in hello.o
>>   "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>>       _main in hello.o
>>   "_opal_hwloc_topology", referenced from:
>>       _main in hello.o
>>   "_orte_process_info", referenced from:
>>       _main in hello.o
>> ld: symbol(s) not found for architecture x86_64
>> collect2: error: ld returned 1 exit status
>>
>> Anybody else seeing this?
>> Ralph
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15265.php
Re: [OMPI devel] Trunk broken for --with-devel-headers?
I'm unable to replicate... perhaps you have a stale install tree?

On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote:

> Hey folks
>
> Something in the last day or so appears to have broken the trunk's ability to
> run --with-devel-headers. It looks like the headers are being installed
> correctly, but mpicc fails to compile a program that uses them - the include
> passes, but the linker fails:
>
> Undefined symbols for architecture x86_64:
>   "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>       _main in hello.o
>   "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>       _main in hello.o
>   "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>       _main in hello.o
>   "_opal_hwloc_topology", referenced from:
>       _main in hello.o
>   "_orte_process_info", referenced from:
>       _main in hello.o
> ld: symbol(s) not found for architecture x86_64
> collect2: error: ld returned 1 exit status
>
> Anybody else seeing this?
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] Trunk broken for --with-devel-headers?
Hey folks

Something in the last day or so appears to have broken the trunk's ability to run --with-devel-headers. It looks like the headers are being installed correctly, but mpicc fails to compile a program that uses them - the include passes, but the linker fails:

Undefined symbols for architecture x86_64:
  "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
      _main in hello.o
  "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
      _main in hello.o
  "_opal_hwloc172_hwloc_get_cpubind", referenced from:
      _main in hello.o
  "_opal_hwloc_topology", referenced from:
      _main in hello.o
  "_orte_process_info", referenced from:
      _main in hello.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

Anybody else seeing this?
Ralph
Re: [OMPI devel] trunk broken
Looks to me like the warning message says it all - the problem is in openib. The reason we took this action was to force the problems to the surface across the code base so that people would address them. We've tried before to just ask people to set the right flags to enable async progress and fix things, but nobody ever does it. Hence this action.

So please investigate the openib BTL and see what needs to be done. I'll poke Nathan in a couple of hours as well.

Thanks
Ralph

On Wed, Jun 25, 2014 at 6:28 AM, Mike Dubman wrote:

> tried with vader - same crash
>
> *14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env
> *14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> *14:14:22* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib
> *14:14:22* + OMPI_MCA_scoll_fca_enable=1
> *14:14:22* + OMPI_MCA_scoll_fca_np=0
> *14:14:22* + OMPI_MCA_pml=ob1
> *14:14:22* + OMPI_MCA_btl=vader,self,openib
> *14:14:22* + OMPI_MCA_spml=yoda
> *14:14:22* + OMPI_MCA_memheap_mr_interleave_factor=8
> *14:14:22* + OMPI_MCA_memheap=ptmalloc
> *14:14:22* + OMPI_MCA_btl_openib_if_include=mlx4_0:1
> *14:14:22* + OMPI_MCA_rmaps_base_dist_hca=mlx4_0
> *14:14:22* + OMPI_MCA_memheap_base_hca_name=mlx4_0
> *14:14:22* + OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0
> *14:14:22* + MXM_RDMA_PORTS=mlx4_0:1
> *14:14:22* + SHMEM_SYMMETRIC_HEAP_SIZE=1024M
> *14:14:22* + timeout -s SIGSEGV 3m /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun -np 8 /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem
> *14:14:22* [vegas12][[4652,1],1][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread
> *14:14:22* [vegas12][[4652,1],0][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread
> *14:14:22* --
> *14:14:22* WARNING: The openib BTL was
directed to use "eager RDMA" for short*14:14:22* > messages, but the openib BTL was compiled with progress threads*14:14:22* > support. Short eager RDMA is not yet supported with progress > threads;*14:14:22* its use has been disabled in this job.*14:14:22* > *14:14:22* This is a warning only; you job will attempt to > continue.*14:14:22* > --*14:14:22* > [vegas12][[4652,1],5][btl_openib_component.c:909:device_destruct] Failed to > cancel OpenIB progress thread*14:14:22* [vegas12:32108] *** Process received > signal 14:14:22* [vegas12:32108] Signal: Segmentation fault > (11)*14:14:22* [vegas12:32108] Signal code: Address not mapped (1)*14:14:22* > [vegas12:32108] Failing at address: (nil)*14:14:22* [vegas12:32108] [ 0] > /lib64/libpthread.so.0[0x3937c0f500]*14:14:22* [vegas12:32108] [ 1] > /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x3b7760bf46]*14:14:22* > [vegas12:32108] [ 2] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*14:14:22* > [vegas12:32108] [ 3] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*14:14:22* > [vegas12:32108] [ 4] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12ab1)[0x73fc6ab1]*14:14:22* > [vegas12:32108] [ 5] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*14:14:22* > [vegas12:32108] [ 6] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*14:14:22* > [vegas12:32108] [ 7] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*14:14:22* > [vegas12:32108] [ 8] > 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x735848b5]*14:14:22* > [vegas12:32108] [ 9] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*14:14:22* > [vegas12:32108] [10] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*14:14:22* > [vegas12:32108] [11] > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*14:14:22* > [vegas12:32108] [12] >
Re: [OMPI devel] trunk broken
tried with vader - same crash *14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env*14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages*14:14:22* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*14:14:22* + OMPI_MCA_scoll_fca_enable=1*14:14:22* + OMPI_MCA_scoll_fca_np=0*14:14:22* + OMPI_MCA_pml=ob1*14:14:22* + OMPI_MCA_btl=vader,self,openib*14:14:22* + OMPI_MCA_spml=yoda*14:14:22* + OMPI_MCA_memheap_mr_interleave_factor=8*14:14:22* + OMPI_MCA_memheap=ptmalloc*14:14:22* + OMPI_MCA_btl_openib_if_include=mlx4_0:1*14:14:22* + OMPI_MCA_rmaps_base_dist_hca=mlx4_0*14:14:22* + OMPI_MCA_memheap_base_hca_name=mlx4_0*14:14:22* + OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*14:14:22* + MXM_RDMA_PORTS=mlx4_0:1*14:14:22* + SHMEM_SYMMETRIC_HEAP_SIZE=1024M*14:14:22* + timeout -s SIGSEGV 3m /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun -np 8 /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*14:14:22* [vegas12][[4652,1],1][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread*14:14:22* [vegas12][[4652,1],0][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread*14:14:22* --*14:14:22* WARNING: The openib BTL was directed to use "eager RDMA" for short*14:14:22* messages, but the openib BTL was compiled with progress threads*14:14:22* support. 
Short eager RDMA is not yet supported with progress threads;*14:14:22* its use has been disabled in this job.*14:14:22* *14:14:22* This is a warning only; you job will attempt to continue.*14:14:22* --*14:14:22* [vegas12][[4652,1],5][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread*14:14:22* [vegas12:32108] *** Process received signal 14:14:22* [vegas12:32108] Signal: Segmentation fault (11)*14:14:22* [vegas12:32108] Signal code: Address not mapped (1)*14:14:22* [vegas12:32108] Failing at address: (nil)*14:14:22* [vegas12:32108] [ 0] /lib64/libpthread.so.0[0x3937c0f500]*14:14:22* [vegas12:32108] [ 1] /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x3b7760bf46]*14:14:22* [vegas12:32108] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*14:14:22* [vegas12:32108] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*14:14:22* [vegas12:32108] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12ab1)[0x73fc6ab1]*14:14:22* [vegas12:32108] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*14:14:22* [vegas12:32108] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*14:14:22* [vegas12:32108] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*14:14:22* [vegas12:32108] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x735848b5]*14:14:22* [vegas12:32108] [ 9] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*14:14:22* [vegas12:32108] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*14:14:22* [vegas12:32108] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*14:14:22* [vegas12:32108] [12] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(shmem_init+0x28)[0x77ca9328]*14:14:22* [vegas12:32108] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x40077d]*14:14:22* [vegas12:32108] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]*14:14:22* [vegas12:32108] [15] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x4006a9]*14:14:22* [vegas12:32108] *** End of error message 14:14:22* [vegas12:32112] *** Process received signal 14:14:22* [vegas12:32112] Signal: Segmentation fault (11)*14:14:* On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Mike, > > could you try again with > > OMPI_MCA_btl=vader,self,openib > > it seems the sm module causes a hang >
Re: [OMPI devel] trunk broken
will do and update shortly. On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Mike, > > could you try again with > > OMPI_MCA_btl=vader,self,openib > > it seems the sm module causes a hang > (which later causes the timeout sending a SIGSEGV) > > Cheers, > > Gilles > > On 2014/06/25 14:22, Mike Dubman wrote: > > Hi, > > The following commit broke trunk in jenkins: > > > Per the OMPI developer conference, remove the last vestiges of > > OMPI_USE_PROGRESS_THREADS > > > > *22:15:09* + > LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09* > > + OMPI_MCA_scoll_fca_enable=1*22:15:09* + > > OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* + > > OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* + > > OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* + > > OMPI_MCA_memheap=ptmalloc*22:15:09* + > > OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* + > > OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* + > > OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* + > > OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* + > > MXM_RDMA_PORTS=mlx4_0:1*22:15:09* + > > SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m > > > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun > > -np 8 > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09* > > [vegas12:08101] *** Process received signal 22:15:09* > > [vegas12:08101] Signal: Segmentation fault (11)*22:15:09* > > [vegas12:08101] Signal code: Address not mapped (1)*22:15:09* > > [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [ > > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/06/15055.php >
Re: [OMPI devel] trunk broken
We should have given more of a "heads up" here. We recognize that the trunk may well become unstable as we can't test all the variations, and clearly some timing issues are going to arise with this change. Our hope is that we can iron them out quickly. If not, then we'll revert and try again. You also may find that you need to disable coll/ml - that is one we've identified here and Nathan should have a fix for shortly. On Wed, Jun 25, 2014 at 1:11 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Mike, > > could you try again with > > OMPI_MCA_btl=vader,self,openib > > it seems the sm module causes a hang > (which later causes the timeout sending a SIGSEGV) > > Cheers, > > Gilles > > On 2014/06/25 14:22, Mike Dubman wrote: > > Hi, > > The following commit broke trunk in jenkins: > > > Per the OMPI developer conference, remove the last vestiges of > > OMPI_USE_PROGRESS_THREADS > > > > *22:15:09* + > LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09* > > + OMPI_MCA_scoll_fca_enable=1*22:15:09* + > > OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* + > > OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* + > > OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* + > > OMPI_MCA_memheap=ptmalloc*22:15:09* + > > OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* + > > OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* + > > OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* + > > OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* + > > MXM_RDMA_PORTS=mlx4_0:1*22:15:09* + > > SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m > > > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun > > -np 8 > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09* > > [vegas12:08101] *** Process received signal 22:15:09* > > [vegas12:08101] Signal: Segmentation fault (11)*22:15:09* > > 
[vegas12:08101] Signal code: Address not mapped (1)*22:15:09* > > [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [ > > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/06/15055.php >
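Ralph's suggestion to disable coll/ml needs no rebuild: any MCA framework selection list accepts a leading "^" meaning "everything except the listed components". The parameter name is real; the job invocation in the comment is illustrative only:

```shell
# "^ml" selects every coll component EXCEPT coll/ml, turning it off while
# leaving the rest of the coll framework selectable.
export OMPI_MCA_coll="^ml"
# Command-line equivalent (illustrative invocation, adjust for your job):
#   mpirun --mca coll ^ml -np 8 ./hello_shmem
echo "$OMPI_MCA_coll"
```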
Re: [OMPI devel] trunk broken
Mike, could you try again with OMPI_MCA_btl=vader,self,openib it seems the sm module causes a hang (which later causes the timeout sending a SIGSEGV) Cheers, Gilles On 2014/06/25 14:22, Mike Dubman wrote: > Hi, > The following commit broke trunk in jenkins: > Per the OMPI developer conference, remove the last vestiges of > OMPI_USE_PROGRESS_THREADS > > *22:15:09* + > LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09* > + OMPI_MCA_scoll_fca_enable=1*22:15:09* + > OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* + > OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* + > OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* + > OMPI_MCA_memheap=ptmalloc*22:15:09* + > OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* + > OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* + > OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* + > OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* + > MXM_RDMA_PORTS=mlx4_0:1*22:15:09* + > SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun > -np 8 > /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09* > [vegas12:08101] *** Process received signal 22:15:09* > [vegas12:08101] Signal: Segmentation fault (11)*22:15:09* > [vegas12:08101] Signal code: Address not mapped (1)*22:15:09* > [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [ >
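Gilles' override can be applied either through the environment (the form the Jenkins job already uses for its other MCA settings) or on the oshrun command line; a minimal sketch, with the command-line form shown only as a comment since the job path and process count are site-specific:

```shell
# Environment form: any MCA parameter <name> maps to OMPI_MCA_<name>.
export OMPI_MCA_btl=vader,self,openib
# Command-line equivalent (illustrative):
#   oshrun --mca btl vader,self,openib -np 8 ./hello_shmem
echo "$OMPI_MCA_btl"
```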
[OMPI devel] trunk broken
Hi, The following commit broke trunk in jenkins: >>>Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS *22:15:09* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09* + OMPI_MCA_scoll_fca_enable=1*22:15:09* + OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* + OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* + OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* + OMPI_MCA_memheap=ptmalloc*22:15:09* + OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* + OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* + OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* + OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* + MXM_RDMA_PORTS=mlx4_0:1*22:15:09* + SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun -np 8 /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09* [vegas12:08101] *** Process received signal 22:15:09* [vegas12:08101] Signal: Segmentation fault (11)*22:15:09* [vegas12:08101] Signal code: Address not mapped (1)*22:15:09* [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [ 0] /lib64/libpthread.so.0[0x3937c0f500]*22:15:09* [vegas12:08101] [ 1] /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x73785f46]*22:15:09* [vegas12:08101] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*22:15:09* [vegas12:08101] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*22:15:09* [vegas12:08101] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12b41)[0x73fc6b41]*22:15:09* [vegas12:08101] [ 5] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*22:15:09* [vegas12:08101] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*22:15:09* [vegas12:08101] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*22:15:09* [vegas12:08101] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x72f528b5]*22:15:09* [vegas12:08101] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*22:15:09* [vegas12:08101] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*22:15:09* [vegas12:08101] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*22:15:09* [vegas12:08101] [12] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(shmem_init+0x28)[0x77ca9328]*22:15:09* [vegas12:08101] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x40077d]*22:15:09* [vegas12:08101] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]*22:15:09* [vegas12:08101] [15] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x4006a9]*22:15:09* [vegas12:08101] *** End of error message 22:15:09* [vegas12][[28889,1],2][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread*22:15:09* [vegas12][[28889,1],5][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread*22:15:09* [vegas12:08099] *** Process received signal 22:15:09* [vegas12:08099] 
Signal: Segmentation fault (11)*22:15:09* [vegas12:08099] Signal code: Address not mapped (1)*22:15:09* [vegas12:08099] Failing at address: (nil)*22:15:09* [vegas12:08099] [ 0] /lib64/libpthread.so.0[0x3937c0f500]*22:15:09* [vegas12:08099] [ 1] /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x73785f46]*22:15:09* [vegas12:08099] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*22:15:09* [vegas12:08099] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*22:15:09* [vegas12:08099] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12b41)[0x73fc6b41]*22:15:09* [vegas12:08099] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*22:15:09*
[OMPI devel] trunk - broken logic for oshmem:bindings:fort
Building the trunk on FreeBSD-9/x86-64, and using gmake to work around the non-portable examples/Makefile, I *still* encounter issues with shmemfort when running "gmake" in the examples subdirectory: $ gmake mpicc -ghello_c.c -o hello_c mpicc -gring_c.c -o ring_c mpicc -gconnectivity_c.c -o connectivity_c gmake[1]: Entering directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' shmemcc -g hello_oshmem_c.c -o hello_oshmem gmake[1]: Leaving directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' gmake[1]: Entering directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' shmemcc -g ring_oshmem_c.c -o ring_oshmem gmake[1]: Leaving directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' gmake[1]: Entering directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' shmemfort -g hello_oshmemfh.f90 -o hello_oshmemfh -- No underlying compiler was specified in the wrapper compiler data file (e.g., mpicc-wrapper-data.txt) -- gmake[1]: *** [hello_oshmemfh] Error 1 gmake[1]: Leaving directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' gmake[1]: Entering directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' shmemfort -g ring_oshmemfh.f90 -o ring_oshmemfh -- No underlying compiler was specified in the wrapper compiler data file (e.g., mpicc-wrapper-data.txt) -- gmake[1]: *** [ring_oshmemfh] Error 1 gmake[1]: Leaving directory `/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples' gmake: *** [mpi] Error 2 If one looks at the logic in the Makefile, one sees use of shmem_info to determine if the fortran bindings are available. 
Running that utility manually I see: $ oshmem_info --parsable | grep bindings bindings:c:yes bindings:cxx:no bindings:mpif.h:no bindings:use_mpi:no bindings:use_mpi:size:deprecated-ompi-info-value bindings:use_mpi_f08:no bindings:use_mpi_f08:compliance:The mpi_f08 module was not built bindings:use_mpi_f08:subarrays-supported:no bindings:java:no oshmem:bindings:c:yes oshmem:bindings:fort:yes This already looks suspicious because it reports fortran bindings for oshmem but not for MPI. Well, there is *no* fortran compiler on this system. Quoting from the configure output: *** Fortran compiler checking for gfortran... no checking for f95... no checking for fort... no checking for xlf95... no checking for ifort... no checking for ifc... no checking for efc... no checking for pgfortran... no checking for pgf95... no checking for lf95... no checking for f90... no checking for xlf90... no checking for pgf90... no checking for epcf90... no checking whether we are using the GNU Fortran compiler... no checking whether accepts -g... no checking whether ln -s works... yes configure: WARNING: *** All Fortran MPI bindings disabled (could not find compiler) So, why "oshmem:bindings:fort:yes"? The AM_CONDITIONAL "OSHMEM_WANT_FORTRAN_BINDINGS" is somehow "true" despite the lack of a fortran compiler. So, I assume something is busted in config/oshmem_configure_options.m4. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
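If Paul's suspicion is right, the fix would be to make the oshmem conditional depend on the outcome of the Fortran compiler probe rather than only on the user's enable flag. The fragment below is a hedged sketch of that guard, not the actual contents of config/oshmem_configure_options.m4 — every variable name here is an assumption:

```m4
# Sketch only: enable the oshmem Fortran bindings only when a Fortran
# compiler was actually found (FC non-empty), mirroring the MPI-layer
# bindings check.  Names are illustrative, not from the real m4 file.
AS_IF([test "x$FC" != "x" -a "$OSHMEM_WANT_FORTRAN_BINDINGS" = "1"],
      [oshmem_want_fortran=1],
      [oshmem_want_fortran=0])
AM_CONDITIONAL([OSHMEM_WANT_FORTRAN_BINDINGS],
               [test "$oshmem_want_fortran" = "1"])
```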
[OMPI devel] Trunk broken on NERSC's Cray XE6
Following up as I promised... My results on NERSC's small Cray XE6 (the test/dev rack "Grace", rather than the full-sized "Hopper") match those I get on the Cray XC30 (Edison), and don't follow those Ralph reports for LANL's XE6. An attempt to build/link hello_c.c results in unresolved symbols from libnuma, libxpmem and libugni. A complete list is available if it matters. This is still with last night's openmpi-1.9a1r27905 tarball, and the following 1-line mod to the platform file: - enable_shared=yes + enable_shared=no If it will help determine what is going on, I can probably get NERSC accounts for any of the DOE Lab folks easily. They will only get access to the full-sized XE6 (Hopper) for now. In case any of these are helpful clues to the difference(s): $ module list Currently Loaded Modulefiles: 1) modules/3.2.6.6 18) dvs/1.8.6_0.9.0-1.0401.1401.1.120 2) torque/4.1.4-snap.201211160904 19) csa/3.0.0-1_2.0401.37452.4.50.gem 3) moab/6.0.4 20) job/1.5.5-0.1_2.0401.35380.1.10.gem 4) xtpe-network-gemini 21) xpmem/0.1-2.0401.36790.4.3.gem 5) cray-mpich2/5.6.0 22) gni-headers/2.1-1.0401.5675.4.4.gem 6) atp/1.6.0 23) dmapp/3.2.1-1.0401.5983.4.5.gem 7) xe-sysroot/4.1.40 24) pmi/4.0.0-1..9282.69.4.gem 8) switch/1.0-1.0401.36779.2.72.gem25) ugni/4.0-1.0401.5928.9.5.gem 9) shared-root/1.0-1.0401.37253.3.50.gem 26) udreg/2.3.2-1.0401.5929.3.3.gem 10) pdsh/2.26-1.0401.37449.1.1.gem 27) xt-libsci/12.0.00 11) nodehealth/5.0-1.0401.38460.12.18.gem 28) gcc/4.7.2 12) lbcd/2.1-1.0401.35360.1.2.gem 29) xt-asyncpe/5.16 13) hosts/1.0-1.0401.35364.1.115.gem30) eswrap/1.0.10 14) configuration/1.0-1.0401.35391.1.2.gem 31) xtpe-mc12 15) ccm/2.2.0-1.0401.37254.2.14232) cray-shmem/5.6.0 16) audit/1.0.0-1.0401.37969.2.32.gem 33) PrgEnv-gnu/4.1.40 17) rca/1.0.0-2.0401.38656.2.2.gem -Paul On Fri, Jan 25, 2013 at 5:50 PM, Paul Hargrovewrote: > Ralph, > > Again our results differ. > I did NOT need the additional #include to link a simple test program. > I am going to try on our XE6 shortly. 
> > I suspect you are right about something in the configury being different. > I am willing to try a few more nightly tarballs if somebody thinks they > have the proper fix. > > -Paul > > > On Fri, Jan 25, 2013 at 5:45 PM, Ralph Castain wrote: > >> >> On Jan 25, 2013, at 5:12 PM, Paul Hargrove wrote: >> >> Ralph, >> >> Those are the result of the missing -lnuma that Nathan already identified >> earlier as missing in BOTH 1.7 and trunk. >> I see MORE missing symbols, which include ones from libxpmem and libugni. >> >> >> Alright, let me try to be clearer. We are missing -lnuma as well as the >> required include file - both are necessary to remove the issue. >> >> I find both the xpmem and ugni libraries *are* correctly included in both >> 1.7 and trunk. It could be a case of finding them in the configury, but we >> are finding them *and* correctly including them on the XE6. >> >> HTH >> Ralph >> >> >> -Paul >> >> >> On Fri, Jan 25, 2013 at 4:59 PM, Ralph Castain wrote: >> >>> >>> On Jan 25, 2013, at 4:53 PM, Ralph Castain wrote: >>> > The repeated libs is something we obviously should fix, but all the >>> libs are there - including lustre. I guess those were dropped due to the >>> shared lib setting, so we probably should fix that in the platform file. >>> > >>> > Perhaps that is the cause of Nathan's issue? shrug...regardless, apps >>> build and run just fine using mpicc for me. >>> >>> Correction - turns out I misspoke. 
I find apps *don't* build correctly >>> with this setup: >>> >>> mpicc -ghello_c.c -o hello_c >>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): >>> In function `hwloc_linux_set_area_membind': >>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1116: >>> undefined reference to `mbind' >>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1135: >>> undefined reference to `mbind' >>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): >>> In function `hwloc_linux_get_area_membind': >>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1337: >>> undefined reference to `get_mempolicy' >>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): >>> In function `hwloc_linux_find_kernel_max_numnodes': >>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239: >>> undefined reference to `get_mempolicy' >>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): >>> In function `hwloc_linux_set_thisthread_membind': >>>
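The undefined `mbind` and `get_mempolicy` references come from hwloc's NUMA support, which lives in libnuma. One interim workaround for a static build — an assumption, not a committed fix, and the library list is inferred from the symbols reported — is to force the missing libraries onto the link line at configure time:

```shell
# Illustrative only: supply the libraries the static link is missing until
# the configury pulls them in itself.  mbind/get_mempolicy are in libnuma;
# libxpmem/libugni are the other unresolved libraries Paul reported.
EXTRA_LINK_LIBS="-lnuma -lxpmem -lugni"
# Usage (illustrative):
#   ./configure --enable-static --disable-shared LIBS="$EXTRA_LINK_LIBS" ...
echo "$EXTRA_LINK_LIBS"
```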
Re: [OMPI devel] trunk broken?
Yes, we know - been fixed. On Aug 30, 2012, at 7:50 AM, Eugene Loh wrote: > Trunk broken? Last night, Oracle's MTT trunk runs all came up empty handed. > E.g., > > *** Process received signal *** > Signal: Segmentation fault (11) > Signal code: Address not mapped (1) > Failing at address: (nil) > [ 0] [0xe600] > [ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3] > [ 2] /lib/libc.so.6(__strdup+0x25) [0x3f9de5] > [ 3] .../lib/openmpi/mca_db_hash.so [0xf7bbdd34] > [ 4] .../lib/libmpi.so.0(orte_util_decode_pidmap+0x5f4) [0xf7e46654] > [ 5] .../lib/libmpi.so.0(orte_util_nidmap_init+0x1b4) [0xf7e46d54] > [ 6] .../lib/openmpi/mca_ess_env.so [0xf7bc4f62] > [ 7] .../lib/libmpi.so.0(orte_init+0x160) [0xf7e2d250] > [ 8] .../lib/libmpi.so.0(ompi_mpi_init+0x163) [0xf7de2133] > [ 9] .../lib/libmpi.so.0(MPI_Init+0x13f) [0xf7dfb6df] > [10] ./c_ring [0x8048759] > [11] /lib/libc.so.6(__libc_start_main+0xdc) [0x3a0dec] > [12] ./c_ring [0x80486a1] > *** End of error message *** > > r27182. The previous night, with r27175, ran fine. Quick peek at 27178 > seems fine (I think). > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] trunk broken?
Trunk broken? Last night, Oracle's MTT trunk runs all came up empty handed. E.g., *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: (nil) [ 0] [0xe600] [ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3] [ 2] /lib/libc.so.6(__strdup+0x25) [0x3f9de5] [ 3] .../lib/openmpi/mca_db_hash.so [0xf7bbdd34] [ 4] .../lib/libmpi.so.0(orte_util_decode_pidmap+0x5f4) [0xf7e46654] [ 5] .../lib/libmpi.so.0(orte_util_nidmap_init+0x1b4) [0xf7e46d54] [ 6] .../lib/openmpi/mca_ess_env.so [0xf7bc4f62] [ 7] .../lib/libmpi.so.0(orte_init+0x160) [0xf7e2d250] [ 8] .../lib/libmpi.so.0(ompi_mpi_init+0x163) [0xf7de2133] [ 9] .../lib/libmpi.so.0(MPI_Init+0x13f) [0xf7dfb6df] [10] ./c_ring [0x8048759] [11] /lib/libc.so.6(__libc_start_main+0xdc) [0x3a0dec] [12] ./c_ring [0x80486a1] *** End of error message *** r27182. The previous night, with r27175, ran fine. Quick peek at 27178 seems fine (I think).
Re: [OMPI devel] Trunk broken?
On 06-Jul-11 2:21 AM, Ralph Castain wrote: > Never mind - this seems to have been another svn-related artifact. I started > fresh and it didn't show up. I made some changes in the m4 file, so I think autogen + configure + make should have fixed the problem. But never mind, if it works with a fresh checkout then I guess we're OK. -- YK > > On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote: >> I'm getting this when trying to build the trunk on a system with openib: >> >> In file included from btl_openib_ini.h:16, >> from btl_openib.c:47: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_component.c:80: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_endpoint.h:32, >> from btl_openib_endpoint.c:46: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_frag.c:22: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_proc.c:27: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_mca.c:33: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_ini.c:35: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_async.c:26: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_xrc.h:14, >> from btl_openib_xrc.c:23: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from btl_openib_endpoint.h:32, >> from btl_openib_ip.c:30: >> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from connect/btl_openib_connect_base.c:13: >> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined 
>> In file included from connect/btl_openib_connect_oob.c:41: >> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is >> not defined >> connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" >> is not defined >> In file included from connect/btl_openib_connect_empty.c:13: >> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> In file included from ./btl_openib_proc.h:26, >> from connect/btl_openib_connect_rdmacm.c:53: >> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined >> >> >> Can't build at all...can someone please fix this? >> Ralph >> >> > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] Trunk broken?
Never mind - this seems to have been another svn-related artifact. I started fresh and it didn't show up.

On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote:

> I'm getting this when trying to build the trunk on a system with openib:
>
> In file included from btl_openib_ini.h:16,
>                  from btl_openib.c:47:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_component.c:80:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_endpoint.h:32,
>                  from btl_openib_endpoint.c:46:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_frag.c:22:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_proc.c:27:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_mca.c:33:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_ini.c:35:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_async.c:26:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_xrc.h:14,
>                  from btl_openib_xrc.c:23:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_endpoint.h:32,
>                  from btl_openib_ip.c:30:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from connect/btl_openib_connect_base.c:13:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from connect/btl_openib_connect_oob.c:41:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from connect/btl_openib_connect_empty.c:13:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from ./btl_openib_proc.h:26,
>                  from connect/btl_openib_connect_rdmacm.c:53:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>
> Can't build at all...can someone please fix this?
> Ralph
[OMPI devel] Trunk broken?
I'm getting this when trying to build the trunk on a system with openib:

In file included from btl_openib_ini.h:16,
                 from btl_openib.c:47:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_component.c:80:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_endpoint.h:32,
                 from btl_openib_endpoint.c:46:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_frag.c:22:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_proc.c:27:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_mca.c:33:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_ini.c:35:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_async.c:26:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_xrc.h:14,
                 from btl_openib_xrc.c:23:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_endpoint.h:32,
                 from btl_openib_ip.c:30:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from connect/btl_openib_connect_base.c:13:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from connect/btl_openib_connect_oob.c:41:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from connect/btl_openib_connect_empty.c:13:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from ./btl_openib_proc.h:26,
                 from connect/btl_openib_connect_rdmacm.c:53:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined

Can't build at all...can someone please fix this?
Ralph
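For context on the diagnostic itself: gcc's `-Wundef` fires when `#if` tests a macro the preprocessor has never seen defined, which is exactly the "is not defined" warning repeated above. The following is an illustrative sketch, not the actual OMPI code or fix; the macro is simply guarded with a default so the `#if` always sees a defined value (OMPI's real fix went through its configure machinery):

```c
/* Sketch only: show why `#if OMPI_ENABLE_DYNAMIC_SL` warns under -Wundef
 * when configure never AC_DEFINEs the macro, and one common guard. */

/* Default the feature off if the build system did not define it. */
#ifndef OMPI_ENABLE_DYNAMIC_SL
#define OMPI_ENABLE_DYNAMIC_SL 0
#endif

int dynamic_sl_enabled(void)
{
#if OMPI_ENABLE_DYNAMIC_SL   /* safe now: always defined to 0 or 1 */
    return 1;
#else
    return 0;
#endif
}
```

Compiled standalone (with nothing defining the macro), the guard makes the function report the feature as disabled instead of emitting a warning at every `#if` site.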
Re: [OMPI devel] Trunk broken at r20375
Seems more like a compiler problem. A static inline function defined in the header file but never used is the source of the problem. It did compile for me with the gcc from Leopard and 4.3.1 on Linux. I'll commit the fix asap.

  george.

On Jan 28, 2009, at 14:26, Ralph Castain wrote:

Rats - once I fixed my area, it again broke on Linux at this same spot in convertor. Sorry for the confusion
Ralph

On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote:

Actually, check that - it seems to be building under Linux (my build broke in some other area where I am working, but not here). However, it does not build on the Mac. Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:

Hi folks

I believe a recent commit has broken the trunk - I am unable to compile it on either Linux or Mac:

In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of function ‘MEMCPY_CSUM’
convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 6 has type ‘long unsigned int’
convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
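George's diagnosis is the pattern where a `static inline` helper lives in a header: every translation unit that includes the header gets its own copy, and stricter or older compilers complain (or, as here, trip over it) when a unit never calls it. A minimal illustration of the pattern and the usual gcc/clang-specific workaround, assuming nothing about the actual OMPI header, with a hypothetical helper name:

```c
/* Illustrative only: a `static inline` helper as it might appear in a
 * shared header. Marking the declaration unused (a GNU extension) keeps
 * compilers quiet in translation units that include the header but never
 * call the function. */
static inline int add_one(int x) __attribute__((unused));

static inline int add_one(int x)
{
    return x + 1;
}
```

Callers that do use the helper are unaffected; the attribute only suppresses the unused-function diagnostic elsewhere.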
Re: [OMPI devel] Trunk broken at r20375
Rats - once I fixed my area, it again broke on Linux at this same spot in convertor. Sorry for the confusion
Ralph

On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote:

Actually, check that - it seems to be building under Linux (my build broke in some other area where I am working, but not here). However, it does not build on the Mac. Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:

Hi folks

I believe a recent commit has broken the trunk - I am unable to compile it on either Linux or Mac:

In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of function ‘MEMCPY_CSUM’
convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 6 has type ‘long unsigned int’
convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph
Re: [OMPI devel] Trunk broken at r20375
Actually, check that - it seems to be building under Linux (my build broke in some other area where I am working, but not here). However, it does not build on the Mac. Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:

Hi folks

I believe a recent commit has broken the trunk - I am unable to compile it on either Linux or Mac:

In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of function ‘MEMCPY_CSUM’
convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 6 has type ‘long unsigned int’
convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph
[OMPI devel] Trunk broken at r20375
Hi folks

I believe a recent commit has broken the trunk - I am unable to compile it on either Linux or Mac:

In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of function ‘MEMCPY_CSUM’
convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but argument 6 has type ‘long unsigned int’
convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph
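Two distinct problems show up in this build log: the hard error (an implicit declaration of `MEMCPY_CSUM`, which as Ralph guesses usually means the header defining that macro was not included) and a cluster of printf-format warnings where the arguments don't match the conversion specifiers. The format-warning class is easy to illustrate; the sketch below is hypothetical, not the actual `convertor_raw.c` code, and uses an invented `format_iov` helper to show the matching specifiers and the explicit `void *` cast that `%p` requires:

```c
#include <stdio.h>
#include <sys/uio.h>

/* Hypothetical helper: format an iovec summary with specifiers that match
 * the argument types exactly, avoiding the %p/%lu mismatches seen in the
 * warnings above. */
int format_iov(char *buf, size_t len,
               struct iovec *iov, unsigned int count, unsigned long total)
{
    /* %p would need (void *) iov; here we just report whether it is NULL.
     * %u matches unsigned int, %lu matches unsigned long. */
    return snprintf(buf, len, "count=%u total=%lu iov_null=%d",
                    count, total, iov == NULL);
}
```

Passing a `struct iovec *` straight to `%p`, or an `unsigned int` to `%lu`, is what produced the warnings in the log; matching each specifier to its argument (with a `(void *)` cast for pointers) silences them without changing behavior.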
Re: [OMPI devel] Trunk broken with linear, direct routing
Just an update: I have fixed this problem. However, I will hold off checking it into the trunk until tomorrow. It will come in with the MPI-2 repairs to avoid code conflicts.

Ralph

> Since this appears to have gone unnoticed, it may not be a big deal.
> However, I have found that multi-node operations are broken if you invoke
> the linear or direct routed modules.
>
> Things work fine with the default binomial routed module.
>
> I will be working to fix this - just a heads up.
> Ralph
[OMPI devel] Trunk broken with linear, direct routing
Since this appears to have gone unnoticed, it may not be a big deal. However, I have found that multi-node operations are broken if you invoke the linear or direct routed modules.

Things work fine with the default binomial routed module.

I will be working to fix this - just a heads up.
Ralph