Re: [OMPI devel] Trunk broken for PPC64?

2014-08-02 Thread Ralph Castain
Good suggestion, Paul - I have committed it in r32407 and added it to CMR #4826.

Thanks!
Ralph
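
For reference, Paul's fallback chain (quoted below) lends itself to a small shared helper. The following is only a sketch: the name opal_pagesize() is Paul's suggestion, HAVE_GETPAGESIZE is assumed to come from configure, and this is not necessarily what r32407 actually committed.

#include <unistd.h>

/* Hypothetical helper wrapping the page-size fallback chain suggested
 * below; the name and placement are illustrative only. */
static inline size_t opal_pagesize(void)
{
#ifdef HAVE_GETPAGESIZE
    return (size_t) getpagesize();
#elif defined(_SC_PAGESIZE)
    return (size_t) sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    return (size_t) sysconf(_SC_PAGE_SIZE);
#else
    return 65536;   /* safer to overestimate than under */
#endif
}

Overestimating in the last-resort case matters here because a hard-coded 4096-byte assumption is exactly what broke coll/ml on the 64 KB-page PPC64 and IA64 machines discussed below.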

On Aug 1, 2014, at 1:12 AM, Paul Hargrove  wrote:

> Gilles,
> 
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
> 
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #else
> pagesize = 4096;
> #endif
> 
> Other places in the code use sysconf(), but not always consistently.
> 
> And on some systems _SC_PAGESIZE is spelled as _SC_PAGE_SIZE.
> Fortunately configure already checks these variations for you.
> 
> So, I suggest
> 
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #elif defined(_SC_PAGESIZE)
> pagesize = sysconf(_SC_PAGESIZE);
> #elif defined(_SC_PAGE_SIZE)
> pagesize = sysconf(_SC_PAGE_SIZE);
> #else
> pagesize = 65536; /* safer to overestimate than under */
> #endif
> 
> 
> opal_pagesize() anyone?
> 
> -Paul
> 
> On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet 
>  wrote:
> Paul,
> 
> You are absolutely right!
> 
> In ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
> cm->lmngr_alignment is hard-coded to 4096.
> 
> As a proof of concept, I hard-coded it to 65536 and now coll/ml works just
> fine.
> 
> I will now write a patch that uses sysconf(_SC_PAGESIZE) instead.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/08/01 15:56, Paul Hargrove wrote:
>> Hmm, maybe this has nothing to do with big-endian.
>> Below is a backtrace from ring_c on an IA64 platform (definitely
>> little-endian) that looks very similar to me.
>> 
>> It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
>> So, I wonder if that might be related.
>> 
>> -Paul
>> 
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>> [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20418] *** Process received signal ***
>> [altix:20418] Signal: Segmentation fault (11)
>> [altix:20418] Signal code: Invalid permissions (2)
>> [altix:20418] Failing at address: 0x16
>> [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20419] *** Process received signal ***
>> [altix:20419] Signal: Segmentation fault (11)
>> [altix:20419] Signal code: Invalid permissions (2)
>> [altix:20419] Failing at address: 0x16
>> [altix:20418] [ 0] [0xa0010800]
>> [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20418] [altix:20419] [ 0] [0xa0010800]
>> [altix:20419] [ 1] [ 2]
>> /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20419] [ 2]
>> /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0]
>> [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860]
>> [altix:20419] [ 4]
>> /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040]
>> [altix:20419] [ 5]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70]
>> [altix:20419] [ 6]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0]
>> [altix:20419] [ 7]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110]
>> [altix:20419] [ 8]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block+0xf6e940)[0x21db8540]
>> [altix:20419] [ 9]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x10130)[0x21da0130]
>> [altix:20419] [10]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x19970)[0x21da9970]
>> [altix:20419] [11]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query+0xf6d6b0)[0x21db5830]
>> [altix:20419] [12]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fbd0)[0x2028fbd0]
>> [altix:20419] [13]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fac0)[0x2028fac0]
>> [altix:20419] [14]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22f7e0)[0x2028f7e0]
>> [altix:20419] [15]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22eac0)[0x2028eac0]
>> [altix:20419] [16]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0xbcbb90)[0x2027e080]
>> [altix:20419] [17]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(ompi_mpi_init-0xd38e70)[0x20110db0]
>> [altix:20419] [18]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(MPI_Init-0xc8bf40)[0x201bdcf0]
>> [altix:20419] [19] 

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul,

I just committed r32393 (and made a CMR for v1.8).

Can you please give it a try?

In the meantime, I received your email ...
sysconf is called directly (i.e. not #ifdef-protected) in several other
places:
$ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v
autom4te |grep PAGE | grep -v LARGE
./oshmem/mca/memheap/ptmalloc/malloc.c:#define malloc_getpagesize
sysconf(_SC_PAGE_SIZE)
./ompi/mca/pml/base/pml_base_bsend.c:tmp = mca_pml_bsend_pagesz =
sysconf(_SC_PAGESIZE);
./ompi/mca/coll/ml/coll_ml_lmngr.c:cm->lmngr_alignment =
sysconf(_SC_PAGESIZE);
./orte/mca/oob/ud/oob_ud_module.c:posix_memalign ((void
**)_mem->ptr, sysconf(_SC_PAGESIZE), buffer_len);
./opal/mca/memory/linux/malloc.c:#define malloc_getpagesize
sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/src/topology-solaris.c:  remainder =
(uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/src/topology-linux.c:  remainder =
(uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define
hwloc_getpagesize() sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define
hwloc_getpagesize() sysconf(_SC_PAGESIZE)
./opal/mca/mpool/base/mpool_base_frame.c:mca_mpool_base_page_size =
sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_sl.c:long page_size
= sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_udcm.c:   
posix_memalign ((void **)>cm_buffer, sysconf(_SC_PAGESIZE),

That is why I did not #ifdef-protect it in coll/ml.


Cheers,

Gilles

On 2014/08/01 17:12, Paul Hargrove wrote:
> Gilles,
>
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #else
> pagesize = 4096;
> #endif
>
> Other places in the code use sysconf(), but not always consistently.
>
> And on some systems _SC_PAGESIZE is spelled as _SC_PAGE_SIZE.
> Fortunately configure already checks these variations for you.
>
> So, I suggest
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #elif defined(_SC_PAGESIZE)
> pagesize = sysconf(_SC_PAGESIZE);
> #elif defined(_SC_PAGE_SIZE)
> pagesize = sysconf(_SC_PAGE_SIZE);
> #else
> pagesize = 65536; /* safer to overestimate than under */
> #endif
>
>
> opal_pagesize() anyone?
>
> -Paul
>
> On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> You are absolutely right!
>>
>> In ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
>> cm->lmngr_alignment is hard-coded to 4096.
>>
>> As a proof of concept, I hard-coded it to 65536 and now coll/ml works just
>> fine.
>>
>> I will now write a patch that uses sysconf(_SC_PAGESIZE) instead.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/01 15:56, Paul Hargrove wrote:
>>
>> Hmm, maybe this has nothing to do with big-endian.
>> Below is a backtrace from ring_c on an IA64 platform (definitely
>> little-endian) that looks very similar to me.
>>
>> It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
>> So, I wonder if that might be related.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>> [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20418] *** Process received signal ***
>> [altix:20418] Signal: Segmentation fault (11)
>> [altix:20418] Signal code: Invalid permissions (2)
>> [altix:20418] Failing at address: 0x16
>> [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20419] *** Process received signal ***
>> [altix:20419] Signal: Segmentation fault (11)
>> [altix:20419] Signal code: Invalid permissions (2)
>> [altix:20419] Failing at address: 0x16
>> [altix:20418] [ 0] [0xa0010800]
>> [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20418] [altix:20419] [ 0] [0xa0010800]
>> [altix:20419] [ 1] [ 2]
>> /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20419] [ 2]
>> /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0]
>> [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860]
>> [altix:20419] [ 4]
>> /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040]
>> [altix:20419] [ 5]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70]
>> [altix:20419] [ 6]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0]
>> [altix:20419] [ 7]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110]
>> [altix:20419] [ 8]
>> 

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Hmm, maybe this has nothing to do with big-endian.
Below is a backtrace from ring_c on an IA64 platform (definitely
little-endian) that looks very similar to me.

It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
So, I wonder if that might be related.
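
A trivial standalone check makes the mismatch concrete; this sketch is for illustration only and is not part of OMPI:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    /* On the PPC64 and IA64 systems in this thread ps is 65536, so an
     * allocation alignment hard-coded to 4096 is smaller than one page. */
    printf("page size = %ld; 4096 %s a whole multiple of it\n",
           ps, (4096 % ps == 0) ? "is" : "is not");
    return 0;
}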

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c'
[altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [altix:20418] *** Process received signal ***
[altix:20418] Signal: Segmentation fault (11)
[altix:20418] Signal code: Invalid permissions (2)
[altix:20418] Failing at address: 0x16
[altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [altix:20419] *** Process received signal ***
[altix:20419] Signal: Segmentation fault (11)
[altix:20419] Signal code: Invalid permissions (2)
[altix:20419] Failing at address: 0x16
[altix:20418] [ 0] [0xa0010800]
[altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
[altix:20418] [altix:20419] [ 0] [0xa0010800]
[altix:20419] [ 1] [ 2]
/lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
[altix:20419] [ 2]
/lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0]
[altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860]
[altix:20419] [ 4]
/lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040]
[altix:20419] [ 5]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x21e55a70]
[altix:20419] [ 6]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x21e584a0]
[altix:20419] [ 7]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x21e59110]
[altix:20419] [ 8]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block+0xf6e940)[0x21db8540]
[altix:20419] [ 9]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x10130)[0x21da0130]
[altix:20419] [10]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x19970)[0x21da9970]
[altix:20419] [11]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query+0xf6d6b0)[0x21db5830]
[altix:20419] [12]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fbd0)[0x2028fbd0]
[altix:20419] [13]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fac0)[0x2028fac0]
[altix:20419] [14]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22f7e0)[0x2028f7e0]
[altix:20419] [15]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22eac0)[0x2028eac0]
[altix:20419] [16]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0xbcbb90)[0x2027e080]
[altix:20419] [17]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(ompi_mpi_init-0xd38e70)[0x20110db0]
[altix:20419] [18]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(MPI_Init-0xc8bf40)[0x201bdcf0]
[altix:20419] [19] examples/ring_c[0x4c00]
[altix:20419] [20]
/lib/libc.so.6.1(__libc_start_main-0x9f56b0)[0x20454590]
[altix:20419] [21] examples/ring_c[0x4a20]
[altix:20419] *** End of error message ***
/lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0]
[altix:20418] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860]
[altix:20418] [ 4]
/lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040]




On Thu, Jul 31, 2014 at 11:47 PM, Paul Hargrove  wrote:

> Gilles's findings are consistent with mine, which showed the SEGVs to be in
> the coll/ml code.
> I've built with --enable-debug and so below is a backtrace (well, two
> actually) that might be helpful.
> Unfortunately the output of the two ranks did get slightly entangled.
>
> -Paul
>
> $ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
> [bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
> COLL-ML [bd-login:09106] *** Process received signal ***
> [bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
> COLL-ML [bd-login:09107] *** Process received signal ***
> [bd-login:09107] Signal: Segmentation fault (11)
> [bd-login:09107] Signal code: Address not mapped (1)
> [bd-login:09107] Failing at address: 0x10
> [bd-login:09107] [ 0] [bd-login:09106] Signal: Segmentation fault (11)
> [bd-login:09106] Signal code: Address not mapped (1)
> [bd-login:09106] 

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles's findings are consistent with mine, which showed the SEGVs to be in
the coll/ml code.
I've built with --enable-debug and so below is a backtrace (well, two
actually) that might be helpful.
Unfortunately the output of the two ranks did get slightly entangled.

-Paul

$ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
[bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [bd-login:09106] *** Process received signal ***
[bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [bd-login:09107] *** Process received signal ***
[bd-login:09107] Signal: Segmentation fault (11)
[bd-login:09107] Signal code: Address not mapped (1)
[bd-login:09107] Failing at address: 0x10
[bd-login:09107] [ 0] [bd-login:09106] Signal: Segmentation fault (11)
[bd-login:09106] Signal code: Address not mapped (1)
[bd-login:09106] Failing at address: 0x10
[bd-login:09106] [ 0] [0xfffa96c0418]
[bd-login:09106] [ 1] [0xfff8f580418]
[bd-login:09107] [ 1] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
[bd-login:09107] [ 2] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
[bd-login:09106] [ 2] /lib64/libc.so.6[0x80c9b600b4]
[bd-login:09106] [ 3] /lib64/libc.so.6[0x80c9b600b4]
[bd-login:09107] [ 3] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
[bd-login:09107] [ 4] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
[bd-login:09106] [ 4]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfffa8296580]
[bd-login:09106] [ 5]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfffa8297604]
[bd-login:09106] [ 6]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfffa829784c]
[bd-login:09106] [ 7]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfffa8250d4c]
[bd-login:09106]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfff8e156580]
[bd-login:09107] [ 5]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfff8e157604]
[bd-login:09107] [ 6]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfff8e15784c]
[bd-login:09107] [ 7]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfff8e110d4c]
[bd-login:09107] [ 8]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfff8e1065e4]
[bd-login:09107] [ 9]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfff8e10a7d8]
[bd-login:09107] [10]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfff8e10f970]
[bd-login:09107] [11] [ 8]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfffa82465e4]
[bd-login:09106] [ 9]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfffa824a7d8]
[bd-login:09106] [10]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfffa824f970]
[bd-login:09106] [11]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfff8f4b5ba0]
[bd-login:09107] [12]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfff8f4b5b14]
[bd-login:09107] [13]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfffa95f5ba0]
[bd-login:09106] [12]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfffa95f5b14]
[bd-login:09106] [13]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfffa95f59a8]
[bd-login:09106] [14]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfff8f4b59a8]
[bd-login:09107] [14]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfffa95f57ac]
[bd-login:09106] [15]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfff8f4b57ac]
[bd-login:09107] [15]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfff8f4ae3ec]
[bd-login:09107]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfffa95ee3ec]
[bd-login:09106] [16] [16]
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfff8f401408]
[bd-login:09107] [17]

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph,

For what it's worth:

a) I faced the very same issue on my (slow) qemu-emulated ppc64 VM.
b) I was able to run very basic programs when passing --mca coll ^ml to
mpirun.

Cheers,

Gilles

On 2014/08/01 12:30, Ralph Castain wrote:
> Yes, I fear this will require some effort to chase all the breakage down 
> given that (to my knowledge, at least) we lack PPC machines in the devel 
> group.
>
>
> On Jul 31, 2014, at 5:46 PM, Paul Hargrove  wrote:
>
>> On the path to verifying George's atomics patch, I have started just by 
>> verifying that I can still build the UNPATCHED trunk on each of the 
>> platforms I listed.
>>
>> I have tried two PPC64/Linux systems so far and am seeing the same problem 
>> on both.  Though I can pass "make check", both platforms SEGV on
>>mpirun -mca btl sm,self -np 2 examples/ring_c
>>
>> Is this the expected state of the trunk on big-endian systems?
>> I am thinking in particular of 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which 
>> Ralph wrote:
>>> Yeah, my fix won't work for big endian machines - this is going to be an 
>>> issue across the
>>> code base now, so we'll have to troll and fix it. I was doing the minimal 
>>> change required to
>>> fix the trunk in the meantime. 
>> If this big-endian failure is not known/expected, let me know and I'll 
>> provide details.
>> Since testing George's patch only requires "make check", I can proceed with 
>> that regardless.
>>
>> -Paul
>>
>>
>> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca  wrote:
>> Awesome, thanks Paul. When the results are in, we will fix whatever is 
>> needed for these less common architectures.
>>
>>   George.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove  wrote:
>>
>>
>> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove  wrote:
>>
>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca  wrote:
>> Paul, I know you have a pretty diverse range of computers. Can you try to 
>> compile and run a "make check" with the following patch?
>>
>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset 
>> of those is still supported).
>> The ARM and MIPS system are emulators and take forever to build OMPI.
>> However, I am not even sure how soon I'll get to start this testing.
>>
>>
>> Add SPARC (v8plus and v9) to that list.
>>  
>>
>>
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15412.php
>>
>>
>>
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15414.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15425.php



Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Ralph Castain
Yes, I fear this will require some effort to chase all the breakage down given 
that (to my knowledge, at least) we lack PPC machines in the devel group.


On Jul 31, 2014, at 5:46 PM, Paul Hargrove  wrote:

> On the path to verifying George's atomics patch, I have started just by 
> verifying that I can still build the UNPATCHED trunk on each of the platforms 
> I listed.
> 
> I have tried two PPC64/Linux systems so far and am seeing the same problem on 
> both.  Though I can pass "make check", both platforms SEGV on
>mpirun -mca btl sm,self -np 2 examples/ring_c
> 
> Is this the expected state of the trunk on big-endian systems?
> I am thinking in particular of 
> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which 
> Ralph wrote:
> > Yeah, my fix won't work for big endian machines - this is going to be an 
> > issue across the
> > code base now, so we'll have to troll and fix it. I was doing the minimal 
> > change required to
> > fix the trunk in the meantime. 
> 
> If this big-endian failure is not known/expected, let me know and I'll provide 
> details.
> Since testing George's patch only requires "make check", I can proceed with 
> that regardless.
> 
> -Paul
> 
> 
> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca  wrote:
> Awesome, thanks Paul. When the results are in, we will fix whatever is 
> needed for these less common architectures.
> 
>   George.
> 
> 
> 
> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove  wrote:
> 
> 
> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove  wrote:
> 
> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca  wrote:
> Paul, I know you have a pretty diverse range of computers. Can you try to 
> compile and run a “make check” with the following patch?
> 
> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset of 
> those is still supported).
> The ARM and MIPS system are emulators and take forever to build OMPI.
> However, I am not even sure how soon I'll get to start this testing.
> 
> 
> Add SPARC (v8plus and v9) to that list.
>  
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15412.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15414.php



Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Ralph Castain
Works okay with a fresh checkout, so something in my tree must have been hosed.


On Jul 25, 2014, at 8:51 AM, Ralph Castain  wrote:

> It seems to be only happening on my Mac, not Linux, but I'll try with a fresh 
> checkout
> 
> On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> I'm unable to replicate... perhaps you have a stale install tree?
>> 
>> 
>> On Jul 24, 2014, at 6:36 PM, Ralph Castain  wrote:
>> 
>>> Hey folks
>>> 
>>> Something in the last day or so appears to have broken the trunk's ability 
>>> to run --with-devel-headers. It looks like the headers are being installed 
>>> correctly, but mpicc fails to compile a program that uses them - the 
>>> include passes, but the linker fails:
>>> 
>>> Undefined symbols for architecture x86_64:
>>> "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>>> _main in hello.o
>>> "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>>> _main in hello.o
>>> "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>>> _main in hello.o
>>> "_opal_hwloc_topology", referenced from:
>>> _main in hello.o
>>> "_orte_process_info", referenced from:
>>> _main in hello.o
>>> ld: symbol(s) not found for architecture x86_64
>>> collect2: error: ld returned 1 exit status
>>> 
>>> Anybody else seeing this?
>>> Ralph
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15265.php
> 



Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Ralph Castain
It seems to be only happening on my Mac, not Linux, but I'll try with a fresh 
checkout

On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres)  wrote:

> I'm unable to replicate... perhaps you have a stale install tree?
> 
> 
> On Jul 24, 2014, at 6:36 PM, Ralph Castain  wrote:
> 
>> Hey folks
>> 
>> Something in the last day or so appears to have broken the trunk's ability 
>> to run --with-devel-headers. It looks like the headers are being installed 
>> correctly, but mpicc fails to compile a program that uses them - the include 
>> passes, but the linker fails:
>> 
>> Undefined symbols for architecture x86_64:
>>  "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>>  _main in hello.o
>>  "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>>  _main in hello.o
>>  "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>>  _main in hello.o
>>  "_opal_hwloc_topology", referenced from:
>>  _main in hello.o
>>  "_orte_process_info", referenced from:
>>  _main in hello.o
>> ld: symbol(s) not found for architecture x86_64
>> collect2: error: ld returned 1 exit status
>> 
>> Anybody else seeing this?
>> Ralph
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15265.php



Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Jeff Squyres (jsquyres)
I'm unable to replicate... perhaps you have a stale install tree?


On Jul 24, 2014, at 6:36 PM, Ralph Castain  wrote:

> Hey folks
> 
> Something in the last day or so appears to have broken the trunk's ability to 
> run --with-devel-headers. It looks like the headers are being installed 
> correctly, but mpicc fails to compile a program that uses them - the include 
> passes, but the linker fails:
> 
> Undefined symbols for architecture x86_64:
>   "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
>   _main in hello.o
>   "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
>   _main in hello.o
>   "_opal_hwloc172_hwloc_get_cpubind", referenced from:
>   _main in hello.o
>   "_opal_hwloc_topology", referenced from:
>   _main in hello.o
>   "_orte_process_info", referenced from:
>   _main in hello.o
> ld: symbol(s) not found for architecture x86_64
> collect2: error: ld returned 1 exit status
> 
> Anybody else seeing this?
> Ralph
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15262.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Trunk broken for --with-devel-headers?

2014-07-24 Thread Ralph Castain
Hey folks

Something in the last day or so appears to have broken the trunk's ability to 
run --with-devel-headers. It looks like the headers are being installed 
correctly, but mpicc fails to compile a program that uses them - the include 
passes, but the linker fails:

Undefined symbols for architecture x86_64:
  "_opal_hwloc172_hwloc_bitmap_alloc", referenced from:
  _main in hello.o
  "_opal_hwloc172_hwloc_bitmap_list_asprintf", referenced from:
  _main in hello.o
  "_opal_hwloc172_hwloc_get_cpubind", referenced from:
  _main in hello.o
  "_opal_hwloc_topology", referenced from:
  _main in hello.o
  "_orte_process_info", referenced from:
  _main in hello.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
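
For context, the undefined symbols suggest a test program that reaches into internal hwloc and ORTE state through the devel headers, roughly along these lines; the header paths, field names, and overall shape are guesses, not Ralph's actual hello.c:

/* Reconstructed sketch only: header paths and the orte_process_info
 * field used here are guesses based on the symbol list above. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "opal/mca/hwloc/hwloc.h"   /* opal_hwloc_topology + hwloc API (assumed path) */
#include "orte/util/proc_info.h"    /* orte_process_info (assumed path) */

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Query this process's binding via OMPI's internal hwloc topology. */
    hwloc_cpuset_t cpus = hwloc_bitmap_alloc();
    hwloc_get_cpubind(opal_hwloc_topology, cpus, HWLOC_CPUBIND_PROCESS);

    char *binding = NULL;
    hwloc_bitmap_list_asprintf(&binding, cpus);
    printf("%s bound to cpus %s\n", orte_process_info.nodename, binding);

    free(binding);
    hwloc_bitmap_free(cpus);
    MPI_Finalize();
    return 0;
}

The _opal_hwloc172_ prefixes in the error output come from OMPI's embedded-hwloc symbol renaming; plain hwloc calls such as the ones above are mangled to those names at build time.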

Anybody else seeing this?
Ralph



Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
Looks to me like the warning message says it all - the problem is in
openib.

The reason we took this action was to force the problems to the surface
across the code base so that people would address them. We've tried before
to just ask people to set the right flags to enable async progress and fix
things, but nobody ever does it. Hence this action.

So please investigate the openib BTL and see what needs to be done. I'll
poke Nathan in a couple of hours as well.

Thanks
Ralph



On Wed, Jun 25, 2014 at 6:28 AM, Mike Dubman 
wrote:

> tried with vader - same crash
>
> *14:14:22* [vegas12:32068] 7 more processes have sent help message 
> help-mca-var.txt / deprecated-mca-env*14:14:22* [vegas12:32068] Set MCA 
> parameter "orte_base_help_aggregate" to 0 to see all help / error 
> messages*14:14:22* + 
> LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*14:14:22*
>  + OMPI_MCA_scoll_fca_enable=1*14:14:22* + OMPI_MCA_scoll_fca_np=0*14:14:22* 
> + OMPI_MCA_pml=ob1*14:14:22* + OMPI_MCA_btl=vader,self,openib*14:14:22* + 
> OMPI_MCA_spml=yoda*14:14:22* + 
> OMPI_MCA_memheap_mr_interleave_factor=8*14:14:22* + 
> OMPI_MCA_memheap=ptmalloc*14:14:22* + 
> OMPI_MCA_btl_openib_if_include=mlx4_0:1*14:14:22* + 
> OMPI_MCA_rmaps_base_dist_hca=mlx4_0*14:14:22* + 
> OMPI_MCA_memheap_base_hca_name=mlx4_0*14:14:22* + 
> OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*14:14:22* + 
> MXM_RDMA_PORTS=mlx4_0:1*14:14:22* + SHMEM_SYMMETRIC_HEAP_SIZE=1024M*14:14:22* 
> + timeout -s SIGSEGV 3m 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
>  -np 8 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*14:14:22*
>  [vegas12][[4652,1],1][btl_openib_component.c:909:device_destruct] Failed to 
> cancel OpenIB progress thread*14:14:22* 
> [vegas12][[4652,1],0][btl_openib_component.c:909:device_destruct] Failed to 
> cancel OpenIB progress thread*14:14:22* 
> --*14:14:22*
>  WARNING: The openib BTL was directed to use "eager RDMA" for short*14:14:22* 
> messages, but the openib BTL was compiled with progress threads*14:14:22* 
> support.  Short eager RDMA is not yet supported with progress 
> threads;*14:14:22* its use has been disabled in this job.*14:14:22* 
> *14:14:22* This is a warning only; you job will attempt to 
> continue.*14:14:22* 
> --*14:14:22*
>  [vegas12][[4652,1],5][btl_openib_component.c:909:device_destruct] Failed to 
> cancel OpenIB progress thread*14:14:22* [vegas12:32108] *** Process received 
> signal 14:14:22* [vegas12:32108] Signal: Segmentation fault 
> (11)*14:14:22* [vegas12:32108] Signal code: Address not mapped (1)*14:14:22* 
> [vegas12:32108] Failing at address: (nil)*14:14:22* [vegas12:32108] [ 0] 
> /lib64/libpthread.so.0[0x3937c0f500]*14:14:22* [vegas12:32108] [ 1] 
> /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x3b7760bf46]*14:14:22*
>  [vegas12:32108] [ 2] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*14:14:22*
>  [vegas12:32108] [ 3] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*14:14:22*
>  [vegas12:32108] [ 4] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12ab1)[0x73fc6ab1]*14:14:22*
>  [vegas12:32108] [ 5] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*14:14:22*
>  [vegas12:32108] [ 6] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*14:14:22*
>  [vegas12:32108] [ 7] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*14:14:22*
>  [vegas12:32108] [ 8] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x735848b5]*14:14:22*
>  [vegas12:32108] [ 9] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*14:14:22*
>  [vegas12:32108] [10] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*14:14:22*
>  [vegas12:32108] [11] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*14:14:22*
>  [vegas12:32108] [12] 
> 

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
tried with vader - same crash

*14:14:22* [vegas12:32068] 7 more processes have sent help message
help-mca-var.txt / deprecated-mca-env*14:14:22* [vegas12:32068] Set
MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
messages*14:14:22* +
LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*14:14:22*
+ OMPI_MCA_scoll_fca_enable=1*14:14:22* +
OMPI_MCA_scoll_fca_np=0*14:14:22* + OMPI_MCA_pml=ob1*14:14:22* +
OMPI_MCA_btl=vader,self,openib*14:14:22* +
OMPI_MCA_spml=yoda*14:14:22* +
OMPI_MCA_memheap_mr_interleave_factor=8*14:14:22* +
OMPI_MCA_memheap=ptmalloc*14:14:22* +
OMPI_MCA_btl_openib_if_include=mlx4_0:1*14:14:22* +
OMPI_MCA_rmaps_base_dist_hca=mlx4_0*14:14:22* +
OMPI_MCA_memheap_base_hca_name=mlx4_0*14:14:22* +
OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*14:14:22* +
MXM_RDMA_PORTS=mlx4_0:1*14:14:22* +
SHMEM_SYMMETRIC_HEAP_SIZE=1024M*14:14:22* + timeout -s SIGSEGV 3m
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
-np 8 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*14:14:22*
[vegas12][[4652,1],1][btl_openib_component.c:909:device_destruct]
Failed to cancel OpenIB progress thread*14:14:22*
[vegas12][[4652,1],0][btl_openib_component.c:909:device_destruct]
Failed to cancel OpenIB progress thread*14:14:22*
--*14:14:22*
WARNING: The openib BTL was directed to use "eager RDMA" for
short*14:14:22* messages, but the openib BTL was compiled with
progress threads*14:14:22* support.  Short eager RDMA is not yet
supported with progress threads;*14:14:22* its use has been disabled
in this job.*14:14:22* *14:14:22* This is a warning only; you job will
attempt to continue.*14:14:22*
--*14:14:22*
[vegas12][[4652,1],5][btl_openib_component.c:909:device_destruct]
Failed to cancel OpenIB progress thread*14:14:22* [vegas12:32108] ***
Process received signal 14:14:22* [vegas12:32108] Signal:
Segmentation fault (11)*14:14:22* [vegas12:32108] Signal code: Address
not mapped (1)*14:14:22* [vegas12:32108] Failing at address:
(nil)*14:14:22* [vegas12:32108] [ 0]
/lib64/libpthread.so.0[0x3937c0f500]*14:14:22* [vegas12:32108] [ 1]
/usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x3b7760bf46]*14:14:22*
[vegas12:32108] [ 2]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*14:14:22*
[vegas12:32108] [ 3]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*14:14:22*
[vegas12:32108] [ 4]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12ab1)[0x73fc6ab1]*14:14:22*
[vegas12:32108] [ 5]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*14:14:22*
[vegas12:32108] [ 6]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*14:14:22*
[vegas12:32108] [ 7]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*14:14:22*
[vegas12:32108] [ 8]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x735848b5]*14:14:22*
[vegas12:32108] [ 9]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*14:14:22*
[vegas12:32108] [10]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*14:14:22*
[vegas12:32108] [11]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*14:14:22*
[vegas12:32108] [12]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(shmem_init+0x28)[0x77ca9328]*14:14:22*
[vegas12:32108] [13]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x40077d]*14:14:22*
[vegas12:32108] [14]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]*14:14:22*
[vegas12:32108] [15]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x4006a9]*14:14:22*
[vegas12:32108] *** End of error message 14:14:22* [vegas12:32112]
*** Process received signal 14:14:22* [vegas12:32112] Signal:
Segmentation fault (11)*14:14:*



On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> 

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
will do and update shortly.



On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which later causes the timeout sending a SIGSEGV)
>
> Cheers,
>
> Gilles
>
> On 2014/06/25 14:22, Mike Dubman wrote:
> > Hi,
> > The following commit broke trunk in jenkins:
> >
>  Per the OMPI developer conference, remove the last vestiges of
> > OMPI_USE_PROGRESS_THREADS
> >
> > *22:15:09* +
> LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09*
> > + OMPI_MCA_scoll_fca_enable=1*22:15:09* +
> > OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* +
> > OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* +
> > OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* +
> > OMPI_MCA_memheap=ptmalloc*22:15:09* +
> > OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* +
> > OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* +
> > OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* +
> > OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* +
> > MXM_RDMA_PORTS=mlx4_0:1*22:15:09* +
> > SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m
> >
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
> > -np 8
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09*
> > [vegas12:08101] *** Process received signal 22:15:09*
> > [vegas12:08101] Signal: Segmentation fault (11)*22:15:09*
> > [vegas12:08101] Signal code: Address not mapped (1)*22:15:09*
> > [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [
> >
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/15055.php
>


Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
We should have given more of a "heads up" here. We recognize that the trunk
may well become unstable as we can't test all the variations, and clearly
some timing issues are going to arise with this change. Our hope is that we
can iron them out quickly. If not, then we'll revert and try again.

You also may find that you need to disable coll/ml - that is one issue we've
identified here, and Nathan should have a fix shortly.



On Wed, Jun 25, 2014 at 1:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which later causes the timeout sending a SIGSEGV)
>
> Cheers,
>
> Gilles
>
> On 2014/06/25 14:22, Mike Dubman wrote:
> > Hi,
> > The following commit broke trunk in jenkins:
> >
>  Per the OMPI developer conference, remove the last vestiges of
> > OMPI_USE_PROGRESS_THREADS
> >
> > *22:15:09* +
> LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09*
> > + OMPI_MCA_scoll_fca_enable=1*22:15:09* +
> > OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* +
> > OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* +
> > OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* +
> > OMPI_MCA_memheap=ptmalloc*22:15:09* +
> > OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* +
> > OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* +
> > OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* +
> > OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* +
> > MXM_RDMA_PORTS=mlx4_0:1*22:15:09* +
> > SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m
> >
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
> > -np 8
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09*
> > [vegas12:08101] *** Process received signal 22:15:09*
> > [vegas12:08101] Signal: Segmentation fault (11)*22:15:09*
> > [vegas12:08101] Signal code: Address not mapped (1)*22:15:09*
> > [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [
> >
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/15055.php
>


Re: [OMPI devel] trunk broken

2014-06-25 Thread Gilles Gouaillardet
Mike,

could you try again with

OMPI_MCA_btl=vader,self,openib

it seems the sm module causes a hang
(which later causes the timeout sending a SIGSEGV)

Cheers,

Gilles

On 2014/06/25 14:22, Mike Dubman wrote:
> Hi,
> The following commit broke trunk in jenkins:
>
 Per the OMPI developer conference, remove the last vestiges of
> OMPI_USE_PROGRESS_THREADS
>
> *22:15:09* + 
> LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09*
> + OMPI_MCA_scoll_fca_enable=1*22:15:09* +
> OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* +
> OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* +
> OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* +
> OMPI_MCA_memheap=ptmalloc*22:15:09* +
> OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* +
> OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* +
> OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* +
> OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* +
> MXM_RDMA_PORTS=mlx4_0:1*22:15:09* +
> SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
> -np 8 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09*
> [vegas12:08101] *** Process received signal 22:15:09*
> [vegas12:08101] Signal: Segmentation fault (11)*22:15:09*
> [vegas12:08101] Signal code: Address not mapped (1)*22:15:09*
> [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [
>



[OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
Hi,
The following commit broke trunk in jenkins:

>>>Per the OMPI developer conference, remove the last vestiges of
OMPI_USE_PROGRESS_THREADS

*22:15:09* + 
LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09*
+ OMPI_MCA_scoll_fca_enable=1*22:15:09* +
OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* +
OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* +
OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* +
OMPI_MCA_memheap=ptmalloc*22:15:09* +
OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* +
OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* +
OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* +
OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* +
MXM_RDMA_PORTS=mlx4_0:1*22:15:09* +
SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
-np 8 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09*
[vegas12:08101] *** Process received signal 22:15:09*
[vegas12:08101] Signal: Segmentation fault (11)*22:15:09*
[vegas12:08101] Signal code: Address not mapped (1)*22:15:09*
[vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [
0] /lib64/libpthread.so.0[0x3937c0f500]*22:15:09* [vegas12:08101] [ 1]
/usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x73785f46]*22:15:09*
[vegas12:08101] [ 2]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*22:15:09*
[vegas12:08101] [ 3]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*22:15:09*
[vegas12:08101] [ 4]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12b41)[0x73fc6b41]*22:15:09*
[vegas12:08101] [ 5]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*22:15:09*
[vegas12:08101] [ 6]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x741ed7e2]*22:15:09*
[vegas12:08101] [ 7]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x77a29009]*22:15:09*
[vegas12:08101] [ 8]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x72f528b5]*22:15:09*
[vegas12:08101] [ 9]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x77a3c590]*22:15:09*
[vegas12:08101] [10]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x77a06bf5]*22:15:09*
[vegas12:08101] [11]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x77ca66dd]*22:15:09*
[vegas12:08101] [12]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(shmem_init+0x28)[0x77ca9328]*22:15:09*
[vegas12:08101] [13]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x40077d]*22:15:09*
[vegas12:08101] [14]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]*22:15:09*
[vegas12:08101] [15]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x4006a9]*22:15:09*
[vegas12:08101] *** End of error message 22:15:09*
[vegas12][[28889,1],2][btl_openib_component.c:909:device_destruct]
Failed to cancel OpenIB progress thread*22:15:09*
[vegas12][[28889,1],5][btl_openib_component.c:909:device_destruct]
Failed to cancel OpenIB progress thread*22:15:09* [vegas12:08099] ***
Process received signal 22:15:09* [vegas12:08099] Signal:
Segmentation fault (11)*22:15:09* [vegas12:08099] Signal code: Address
not mapped (1)*22:15:09* [vegas12:08099] Failing at address:
(nil)*22:15:09* [vegas12:08099] [ 0]
/lib64/libpthread.so.0[0x3937c0f500]*22:15:09* [vegas12:08099] [ 1]
/usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x73785f46]*22:15:09*
[vegas12:08099] [ 2]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x73fc1f02]*22:15:09*
[vegas12:08099] [ 3]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x73fc3161]*22:15:09*
[vegas12:08099] [ 4]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12b41)[0x73fc6b41]*22:15:09*
[vegas12:08099] [ 5]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x77a29807]*22:15:09*

[OMPI devel] trunk - broken logic for oshmem:bindings:fort

2014-01-09 Thread Paul Hargrove
Building the trunk on FreeBSD-9/x86-64, and using gmake to work around the
non-portable examples/Makefile, I *still* encounter issues with shmemfort
when running "gmake" in the examples subdirectory:

$ gmake
mpicc -ghello_c.c   -o hello_c
mpicc -gring_c.c   -o ring_c
mpicc -gconnectivity_c.c   -o connectivity_c
gmake[1]: Entering directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
shmemcc -g hello_oshmem_c.c -o hello_oshmem
gmake[1]: Leaving directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
gmake[1]: Entering directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
shmemcc -g ring_oshmem_c.c -o ring_oshmem
gmake[1]: Leaving directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
gmake[1]: Entering directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
shmemfort -g hello_oshmemfh.f90 -o hello_oshmemfh
--
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--
gmake[1]: *** [hello_oshmemfh] Error 1
gmake[1]: Leaving directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
gmake[1]: Entering directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
shmemfort -g ring_oshmemfh.f90 -o ring_oshmemfh
--
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--
gmake[1]: *** [ring_oshmemfh] Error 1
gmake[1]: Leaving directory
`/usr/home/phargrov/OMPI/openmpi-trunk-freebsd9-amd64/BLD/examples'
gmake: *** [mpi] Error 2

If one looks at the logic in the Makefile, one sees use of oshmem_info to
determine if the Fortran bindings are available.  Running that utility
manually, I see:
$ oshmem_info --parsable | grep bindings
bindings:c:yes
bindings:cxx:no
bindings:mpif.h:no
bindings:use_mpi:no
bindings:use_mpi:size:deprecated-ompi-info-value
bindings:use_mpi_f08:no
bindings:use_mpi_f08:compliance:The mpi_f08 module was not built
bindings:use_mpi_f08:subarrays-supported:no
bindings:java:no
oshmem:bindings:c:yes
oshmem:bindings:fort:yes

This already looks suspicious because it reports Fortran bindings for
oshmem but not for MPI.
Well, there is *no* Fortran compiler on this system.  Quoting from the
configure output:
*** Fortran compiler
checking for gfortran... no
checking for f95... no
checking for fort... no
checking for xlf95... no
checking for ifort... no
checking for ifc... no
checking for efc... no
checking for pgfortran... no
checking for pgf95... no
checking for lf95... no
checking for f90... no
checking for xlf90... no
checking for pgf90... no
checking for epcf90... no
checking whether we are using the GNU Fortran compiler... no
checking whether  accepts -g... no
checking whether ln -s works... yes
configure: WARNING: *** All Fortran MPI bindings disabled (could not find
compiler)

So, why "oshmem:bindings:fort:yes"?
The AM_CONDITIONAL "OSHMEM_WANT_FORTRAN_BINDINGS" is somehow "true" despite
the lack of a Fortran compiler.  So, I assume something is busted
in config/oshmem_configure_options.m4.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] Trunk broken on NERSC's Cray XE6

2013-01-25 Thread Paul Hargrove
Following up as I promised...

My results on NERSC's small Cray XE6 (the test/dev rack "Grace", rather
than the full-sized "Hopper") match those I get on the Cray XC30 (Edison),
and do not match those Ralph reports for LANL's XE6.

An attempt to build/link hello_c.c results in unresolved symbols from
libnuma, libxpmem and libugni.
A complete list is available if it matters.

This is still with last night's openmpi-1.9a1r27905 tarball, and the
following 1-line mod to the platform file:
- enable_shared=yes
+ enable_shared=no

If it will help determine what is going on, I can probably get NERSC
accounts for any of the DOE Lab folks easily.
They will only get access to the full-sized XE6 (Hopper) for now.

In case any of these are helpful clues to the difference(s):
$ module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.6
  2) torque/4.1.4-snap.201211160904
  3) moab/6.0.4
  4) xtpe-network-gemini
  5) cray-mpich2/5.6.0
  6) atp/1.6.0
  7) xe-sysroot/4.1.40
  8) switch/1.0-1.0401.36779.2.72.gem
  9) shared-root/1.0-1.0401.37253.3.50.gem
 10) pdsh/2.26-1.0401.37449.1.1.gem
 11) nodehealth/5.0-1.0401.38460.12.18.gem
 12) lbcd/2.1-1.0401.35360.1.2.gem
 13) hosts/1.0-1.0401.35364.1.115.gem
 14) configuration/1.0-1.0401.35391.1.2.gem
 15) ccm/2.2.0-1.0401.37254.2.142
 16) audit/1.0.0-1.0401.37969.2.32.gem
 17) rca/1.0.0-2.0401.38656.2.2.gem
 18) dvs/1.8.6_0.9.0-1.0401.1401.1.120
 19) csa/3.0.0-1_2.0401.37452.4.50.gem
 20) job/1.5.5-0.1_2.0401.35380.1.10.gem
 21) xpmem/0.1-2.0401.36790.4.3.gem
 22) gni-headers/2.1-1.0401.5675.4.4.gem
 23) dmapp/3.2.1-1.0401.5983.4.5.gem
 24) pmi/4.0.0-1..9282.69.4.gem
 25) ugni/4.0-1.0401.5928.9.5.gem
 26) udreg/2.3.2-1.0401.5929.3.3.gem
 27) xt-libsci/12.0.00
 28) gcc/4.7.2
 29) xt-asyncpe/5.16
 30) eswrap/1.0.10
 31) xtpe-mc12
 32) cray-shmem/5.6.0
 33) PrgEnv-gnu/4.1.40


-Paul


On Fri, Jan 25, 2013 at 5:50 PM, Paul Hargrove  wrote:

> Ralph,
>
> Again our results differ.
> I did NOT need the additional #include to link a simple test program.
> I am going to try on our XE6 shortly.
>
> I suspect you are right about something in the configury being different.
> I am willing to try a few more nightly tarballs if somebody thinks they
> have the proper fix.
>
> -Paul
>
>
> On Fri, Jan 25, 2013 at 5:45 PM, Ralph Castain  wrote:
>
>>
>> On Jan 25, 2013, at 5:12 PM, Paul Hargrove  wrote:
>>
>> Ralph,
>>
>> Those are the result of the missing -lnuma that Nathan already identified
>> earlier as missing in BOTH 1.7 and trunk.
>> I see MORE missing symbols, which include ones from libxpmem and libugni.
>>
>>
>> Alright, let me try to be clearer. We are missing -lnuma as well as the
>> required include file - both are necessary to remove the issue.
>>
>> I find both the xpmem and ugni libraries *are* correctly included in both
>> 1.7 and trunk. It could be a case of the configury not finding them on your
>> system, but we are finding them *and* correctly including them on the XE6.
>>
>> HTH
>> Ralph
>>
>>
>> -Paul
>>
>>
>> On Fri, Jan 25, 2013 at 4:59 PM, Ralph Castain  wrote:
>>
>>>
>>> On Jan 25, 2013, at 4:53 PM, Ralph Castain  wrote:
>>> > The repeated libs are something we obviously should fix, but all the
>>> libs are there - including lustre. I guess those were dropped due to the
>>> shared lib setting, so we probably should fix that in the platform file.
>>> >
>>> > Perhaps that is the cause of Nathan's issue? shrug...regardless, apps
>>> build and run just fine using mpicc for me.
>>>
>>> Correction - turns out I misspoke. I find apps *don't* build correctly
>>> with this setup:
>>>
>>> mpicc -g hello_c.c   -o hello_c
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o):
>>> In function `hwloc_linux_set_area_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1116:
>>> undefined reference to `mbind'
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1135:
>>> undefined reference to `mbind'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o):
>>> In function `hwloc_linux_get_area_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1337:
>>> undefined reference to `get_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o):
>>> In function `hwloc_linux_find_kernel_max_numnodes':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239:
>>> undefined reference to `get_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o):
>>> In function `hwloc_linux_set_thisthread_membind':
>>> 

Re: [OMPI devel] trunk broken?

2012-08-30 Thread Ralph Castain
Yes, we know - been fixed.


On Aug 30, 2012, at 7:50 AM, Eugene Loh  wrote:

> Trunk broken?  Last night, Oracle's MTT trunk runs all came up empty handed.  
> E.g.,
> 
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: (nil)
> [ 0] [0xe600]
> [ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3]
> [ 2] /lib/libc.so.6(__strdup+0x25) [0x3f9de5]
> [ 3] .../lib/openmpi/mca_db_hash.so [0xf7bbdd34]
> [ 4] .../lib/libmpi.so.0(orte_util_decode_pidmap+0x5f4) [0xf7e46654]
> [ 5] .../lib/libmpi.so.0(orte_util_nidmap_init+0x1b4) [0xf7e46d54]
> [ 6] .../lib/openmpi/mca_ess_env.so [0xf7bc4f62]
> [ 7] .../lib/libmpi.so.0(orte_init+0x160) [0xf7e2d250]
> [ 8] .../lib/libmpi.so.0(ompi_mpi_init+0x163) [0xf7de2133]
> [ 9] .../lib/libmpi.so.0(MPI_Init+0x13f) [0xf7dfb6df]
> [10] ./c_ring [0x8048759]
> [11] /lib/libc.so.6(__libc_start_main+0xdc) [0x3a0dec]
> [12] ./c_ring [0x80486a1]
> *** End of error message ***
> 
> r27182.  The previous night, with r27175, ran fine.  Quick peek at 27178 
> seems fine (I think).




[OMPI devel] trunk broken?

2012-08-30 Thread Eugene Loh
Trunk broken?  Last night, Oracle's MTT trunk runs all came up empty 
handed.  E.g.,


*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] [0xe600]
[ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3]
[ 2] /lib/libc.so.6(__strdup+0x25) [0x3f9de5]
[ 3] .../lib/openmpi/mca_db_hash.so [0xf7bbdd34]
[ 4] .../lib/libmpi.so.0(orte_util_decode_pidmap+0x5f4) [0xf7e46654]
[ 5] .../lib/libmpi.so.0(orte_util_nidmap_init+0x1b4) [0xf7e46d54]
[ 6] .../lib/openmpi/mca_ess_env.so [0xf7bc4f62]
[ 7] .../lib/libmpi.so.0(orte_init+0x160) [0xf7e2d250]
[ 8] .../lib/libmpi.so.0(ompi_mpi_init+0x163) [0xf7de2133]
[ 9] .../lib/libmpi.so.0(MPI_Init+0x13f) [0xf7dfb6df]
[10] ./c_ring [0x8048759]
[11] /lib/libc.so.6(__libc_start_main+0xdc) [0x3a0dec]
[12] ./c_ring [0x80486a1]
*** End of error message ***

r27182.  The previous night, with r27175, ran fine.  Quick peek at 27178 
seems fine (I think).


Re: [OMPI devel] Trunk broken?

2011-07-06 Thread Yevgeny Kliteynik
On 06-Jul-11 2:21 AM, Ralph Castain wrote:
> Never mind - this seems to have been another svn-related artifact. I started 
> fresh and it didn't show up.

I made some changes in an m4 file, so I think that autogen + configure + make
should have fixed the problem. But never mind, if it works with a fresh
checkout then I guess we're OK.
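
(For anyone hitting the same warnings: after an m4 change the generated
configury has to be regenerated before configure and make run again, roughly
the sequence below, with the configure options elided:)

    $ ./autogen.pl        # autogen.sh on older checkouts
    $ ./configure ...
    $ make all install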

-- YK

> 
> On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote:
> 
>> I'm getting this when trying to build the trunk on a system with openib:
>>
>> In file included from btl_openib_ini.h:16,
>> from btl_openib.c:47:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_component.c:80:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_endpoint.h:32,
>> from btl_openib_endpoint.c:46:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_frag.c:22:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_proc.c:27:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_mca.c:33:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_ini.c:35:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_async.c:26:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_xrc.h:14,
>> from btl_openib_xrc.c:23:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from btl_openib_endpoint.h:32,
>> from btl_openib_ip.c:30:
>> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from connect/btl_openib_connect_base.c:13:
>> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from connect/btl_openib_connect_oob.c:41:
>> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
>> not defined
>> connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" 
>> is not defined
>> In file included from connect/btl_openib_connect_empty.c:13:
>> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>> In file included from ./btl_openib_proc.h:26,
>> from connect/btl_openib_connect_rdmacm.c:53:
>> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
>>
>>
>> Can't build at all...can someone please fix this?
>> Ralph
>>
>>
> 
> 



Re: [OMPI devel] Trunk broken?

2011-07-05 Thread Ralph Castain
Never mind - this seems to have been another svn-related artifact. I started 
fresh and it didn't show up.


On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote:

> I'm getting this when trying to build the trunk on a system with openib:
> 
> In file included from btl_openib_ini.h:16,
>from btl_openib.c:47:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_component.c:80:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_endpoint.h:32,
>from btl_openib_endpoint.c:46:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_frag.c:22:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_proc.c:27:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_mca.c:33:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_ini.c:35:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_async.c:26:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_xrc.h:14,
>from btl_openib_xrc.c:23:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from btl_openib_endpoint.h:32,
>from btl_openib_ip.c:30:
> btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from connect/btl_openib_connect_base.c:13:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from connect/btl_openib_connect_oob.c:41:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
> not defined
> In file included from connect/btl_openib_connect_empty.c:13:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> In file included from ./btl_openib_proc.h:26,
>from connect/btl_openib_connect_rdmacm.c:53:
> ./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
> 
> 
> Can't build at all...can someone please fix this?
> Ralph
> 
> 




[OMPI devel] Trunk broken?

2011-07-05 Thread Ralph Castain
I'm getting this when trying to build the trunk on a system with openib:

In file included from btl_openib_ini.h:16,
from btl_openib.c:47:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_component.c:80:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_endpoint.h:32,
from btl_openib_endpoint.c:46:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_frag.c:22:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_proc.c:27:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_mca.c:33:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_ini.c:35:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_async.c:26:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_xrc.h:14,
from btl_openib_xrc.c:23:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from btl_openib_endpoint.h:32,
from btl_openib_ip.c:30:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from connect/btl_openib_connect_base.c:13:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from connect/btl_openib_connect_oob.c:41:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not 
defined
connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not 
defined
connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
not defined
connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
not defined
connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
not defined
connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
not defined
connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is 
not defined
In file included from connect/btl_openib_connect_empty.c:13:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
In file included from ./btl_openib_proc.h:26,
from connect/btl_openib_connect_rdmacm.c:53:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined


Can't build at all...can someone please fix this?
Ralph





Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread George Bosilca
This seems more like a compiler problem: a static inline function that is
defined in the header file but never used is the source of the trouble. It
did compile for me with the gcc from Leopard and with gcc 4.3.1 on Linux.
I'll commit the fix ASAP.


  george.

On Jan 28, 2009, at 14:26 , Ralph Castain wrote:

Rats - once I fixed my area, it again broke on Linux at this same  
spot in convertor.


Sorry for the confusion
Ralph

On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote:

Actually, check that  - it seems to be building under Linux (my  
build broke in some other area where I am working, but not here).


However, it does not build on the Mac.

Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:


Hi folks

I believe a recent commit has broken the trunk - I am unable to  
compile it on either Linux or Mac:


In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function  
‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit  
declaration of function ‘MEMCPY_CSUM’

convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’,  
but argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long  
unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’,  
but argument 6 has type ‘long unsigned int’

convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph









Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Rats - once I fixed my area, it again broke on Linux at this same spot  
in convertor.


Sorry for the confusion
Ralph

On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote:

Actually, check that  - it seems to be building under Linux (my  
build broke in some other area where I am working, but not here).


However, it does not build on the Mac.

Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:


Hi folks

I believe a recent commit has broken the trunk - I am unable to  
compile it on either Linux or Mac:


In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function  
‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration  
of function ‘MEMCPY_CSUM’

convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long  
unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 6 has type ‘long unsigned int’

convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph









Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Actually, check that  - it seems to be building under Linux (my build  
broke in some other area where I am working, but not here).


However, it does not build on the Mac.

Any suggestions?
Ralph

On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:


Hi folks

I believe a recent commit has broken the trunk - I am unable to  
compile it on either Linux or Mac:


In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function  
‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration  
of function ‘MEMCPY_CSUM’

convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long  
unsigned int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 6 has type ‘long unsigned int’

convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph






[OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain

Hi folks

I believe a recent commit has broken the trunk - I am unable to  
compile it on either Linux or Mac:


In file included from convertor_raw.c:24:
../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’:
../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of  
function ‘MEMCPY_CSUM’

convertor_raw.c: In function ‘ompi_convertor_raw’:
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 4 has type ‘struct iovec *’
convertor_raw.c:60: warning: format ‘%lu’ expects type ‘long unsigned  
int’, but argument 5 has type ‘unsigned int’
convertor_raw.c:60: warning: format ‘%p’ expects type ‘void *’, but  
argument 6 has type ‘long unsigned int’

convertor_raw.c:87: warning: comparison between signed and unsigned
make[2]: *** [convertor_raw.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

Perhaps an include file is missing?

Thanks
Ralph




Re: [OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph Castain
Just an update: I have fixed this problem. However, I will hold off checking
it into the trunk until tomorrow. It will come in with the MPI-2 repairs to
avoid code conflicts.

Ralph


> Since this appears to have gone unnoticed, it may not be a big deal.
> However, I have found that multi-node operations are broken if you invoke
> the linear or direct routed modules.
> 
> Things work fine with the default binomial routed module.
> 
> I will be working to fix this - just a heads up.
> Ralph 




[OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph H Castain
Since this appears to have gone unnoticed, it may not be a big deal.
However, I have found that multi-node operations are broken if you invoke
the linear or direct routed modules.

Things work fine with the default binomial routed module.
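
(For reference, the non-default modules are selected through the "routed" MCA
parameter, roughly as below; the binary name is just a placeholder:)

    $ mpirun -mca routed linear -np 4 ./a.out
    $ mpirun -mca routed direct -np 4 ./a.out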

I will be working to fix this - just a heads up.
Ralph