[OMPI devel] Free list warning

2015-02-26 Thread Ralph Castain
It looks like everything in ompi_free_list has been tied together and labeled 
as deprecated. So I’m getting this warning:

In file included from class/ompi_free_list.c:12:0:
../opal/class/ompi_free_list.h: In function 'ompi_free_list_init_ex':
../opal/class/ompi_free_list.h:100:5: warning: 'ompi_free_list_init_ex_new' is deprecated (declared at ../opal/class/ompi_free_list.h:61) [-Wdeprecated-declarations]
     return ompi_free_list_init_ex_new (free_list, element_size, alignment,
     ^


Can someone please clean this up? It’s causing Intel unit tests to abort as 
they treat warnings as errors.
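
For anyone following along, here is a minimal sketch of one way a header can keep deprecated entry points without warning on its own contents: the deprecated wrappers forward to a helper that is not marked deprecated. It assumes a compiler that understands GCC-style attributes. Every name in it (EXAMPLE_DEPRECATED, example_list_t, example_list_init_impl, and so on) is hypothetical and not taken from ompi_free_list.h, and this is not necessarily how the tree ends up being fixed.

```c
#include <stddef.h>

/* Hypothetical macro; assumes GCC-style attribute support. */
#if defined(__GNUC__)
#  define EXAMPLE_DEPRECATED __attribute__((deprecated))
#else
#  define EXAMPLE_DEPRECATED
#endif

/* Hypothetical stand-in for a free-list type. */
typedef struct {
    size_t element_size;
    size_t alignment;
} example_list_t;

/* Not deprecated: the single place where the work is done. */
static inline int example_list_init_impl(example_list_t *list,
                                         size_t element_size,
                                         size_t alignment)
{
    if (NULL == list || 0 == element_size || 0 == alignment) {
        return -1;
    }
    list->element_size = element_size;
    list->alignment = alignment;
    return 0;
}

/* Deprecated entry points forward to the non-deprecated helper, so the
 * header never calls a deprecated function itself; only outside callers
 * of these wrappers are meant to see -Wdeprecated-declarations. */
EXAMPLE_DEPRECATED static inline int
example_list_init_ex_new(example_list_t *list, size_t element_size,
                         size_t alignment)
{
    return example_list_init_impl(list, element_size, alignment);
}

EXAMPLE_DEPRECATED static inline int
example_list_init_ex(example_list_t *list, size_t element_size,
                     size_t alignment)
{
    return example_list_init_impl(list, element_size, alignment);
}
```

With this layout a build that treats warnings as errors should only fail in code that still calls the deprecated wrappers, not merely because the header was included.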

Thanks
Ralph



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1046-g004160f

2015-02-26 Thread Gilles Gouaillardet
George,

this is now fixed by commit 05140df1e66ad3a5d6449d339244580f4b36d872
/* I wanted to silence a false positive and introduced a bug, shame on me
... */

the original code basically did
    if( MPI_ERR_IN_STATUS == err ) {
        /* At least we know the error was detected during the wait_all */
        int err_index = 1;
        if( MPI_SUCCESS == statuses[0].MPI_ERROR ) {
            err_index = 0;
        }
        err = statuses[err_index].MPI_ERROR;
        return err;
    }

Is this really correct?
I mean, shouldn't it rather be

    if( MPI_ERR_IN_STATUS == err ) {
        /* At least we know the error was detected during the wait_all */
        int err_index = 0;
        if( MPI_SUCCESS == statuses[0].MPI_ERROR ) {
            err_index = 1;
        }
        err = statuses[err_index].MPI_ERROR;
        return err;
    }
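
To make the question concrete, here is a small standalone sketch of the selection logic I have in mind: when the wait reports "error in status", return the code from the first status that is not a success. It does not include mpi.h; the EX_* constants, ex_status_t and ex_resolve_error are hypothetical stand-ins with arbitrary values, so treat it as an illustration rather than the coll/tuned code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds without an MPI installation.
 * The values are arbitrary and are not the real MPI constants. */
enum { EX_SUCCESS = 0, EX_ERR_IN_STATUS = 1, EX_ERR_OTHER = 2 };

typedef struct {
    int error;   /* plays the role of the MPI_ERROR field of a status */
} ex_status_t;

/* If err is EX_ERR_IN_STATUS, return the error code of the first status
 * that is not a success and store its index in *err_index.
 * Any other err is returned unchanged and *err_index is left untouched. */
static int ex_resolve_error(int err, const ex_status_t *statuses,
                            size_t count, size_t *err_index)
{
    size_t i;

    assert(NULL != statuses && NULL != err_index);
    if (EX_ERR_IN_STATUS != err) {
        return err;
    }
    for (i = 0; i < count; ++i) {
        if (EX_SUCCESS != statuses[i].error) {
            *err_index = i;
            return statuses[i].error;
        }
    }
    return err;   /* no status carried a more specific error */
}
```

With two requests this gives index 1 when statuses[0] is a success, which is the behaviour of the second variant above.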

Cheers,

Gilles

On 2015/02/23 23:23, George Bosilca wrote:
> Please revert this fix. I don’t know what you’re trying to fix (Coverity CID 
> 1269934), but you altered the meaning of the code (regarding the 
> MPI_ERR_IN_STATUS return code) and removed meaningful comments. Btw the 
> original fix was useless as a call to recv could not return MPI_ERR_IN_STATUS 
> (as this code is reserved for functions handling multiple requests).
>
>   George.
>
>
>> On Feb 22, 2015, at 23:45 , git...@crest.iu.edu wrote:
>>
>> This is an automated email from the git hooks/post-receive script. It was
>> generated because a ref change was pushed to the repository containing
>> the project "open-mpi/ompi".
>>
>> The branch, master has been updated
>>   via  004160f8da97be1f29aefeaaa51cf52298e0d3a4 (commit)
>>  from  4c91bdfb0c106f66590aa20b245946dea4af6d61 (commit)
>>
>> Those revisions listed above that are new to this repository have
>> not appeared on any other notification email; so we list those
>> revisions in full, below.
>>
>> - Log -
>> https://github.com/open-mpi/ompi/commit/004160f8da97be1f29aefeaaa51cf52298e0d3a4
>>
>> commit 004160f8da97be1f29aefeaaa51cf52298e0d3a4
>> Author: Gilles Gouaillardet 
>> Date:   Mon Feb 23 13:45:23 2015 +0900
>>
>>coll/tuned: silence CID 1269934
>>
>> diff --git a/ompi/mca/coll/tuned/coll_tuned_barrier.c b/ompi/mca/coll/tuned/coll_tuned_barrier.c
>> index 8002a74..455e7e1 100644
>> --- a/ompi/mca/coll/tuned/coll_tuned_barrier.c
>> +++ b/ompi/mca/coll/tuned/coll_tuned_barrier.c
>> @@ -69,8 +69,6 @@ ompi_coll_tuned_sendrecv_zero(int dest, int stag,
>> /* post new irecv */
>> err = MCA_PML_CALL(irecv( NULL, 0, MPI_BYTE, source, rtag,
>>   comm, &reqs[0]));
>> -/* try to silence CID 1269934 */
>> -assert( MPI_ERR_IN_STATUS != err );
>> if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; }
>>
>> /* send data to children */
>> @@ -79,15 +77,6 @@ ompi_coll_tuned_sendrecv_zero(int dest, int stag,
>> if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; }
>>
>> err = ompi_request_wait_all( 2, reqs, statuses );
>> -if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; }
>> -
>> -return (MPI_SUCCESS);
>> -
>> - error_handler:
>> -/* As we use wait_all we will get MPI_ERR_IN_STATUS which is not an error
>> - * code that we can propagate up the stack. Instead, look for the real
>> - * error code from the MPI_ERROR in the status.
>> - */
>> if( MPI_ERR_IN_STATUS == err ) {
>> /* At least we know the error was detected during the wait_all */
>> int err_index = 1;
>> @@ -98,13 +87,18 @@ ompi_coll_tuned_sendrecv_zero(int dest, int stag,
>> OPAL_OUTPUT ((ompi_coll_tuned_stream, "%s:%d: Error %d occurred in the %s"
>>   " stage of ompi_coll_tuned_sendrecv_zero\n",
>>   __FILE__, line, err, (0 == err_index ? "receive" : "send")));
>> -} else {
>> -/* Error discovered during the posting of the irecv or isend,
>> - * and no status is available.
>> - */
>> -OPAL_OUTPUT ((ompi_coll_tuned_stream, "%s:%d: Error %d occurred\n",
>> -  __FILE__, line, err));
>> +return MPI_ERR_IN_STATUS;
>> }
>> +if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; }
>> +
>> +return (MPI_SUCCESS);
>> +
>> + error_handler:
>> +/* Error discovered during the posting of the irecv or isend,
>> + * and no status is available.
>> + */
>> +OPAL_OUTPUT ((ompi_coll_tuned_stream, "%s:%d: Error %d occurred\n",
>> +  __FILE__, line, err));
>> return err;
>> }
>>
>>
>>
>> ---
>>
>> Summary of changes:
>> ompi/mca/coll/tuned/coll_tuned_barrier.c | 28 +++-
>> 1 file changed, 11 insertions(+), 17 deletions(-)
>>
>>
>> hooks/post-receive
>> -- 
>> open-mpi/ompi

Re: [OMPI devel] Tues Mar 3rd telecon

2015-02-26 Thread Howard Pritchard
I will also be available but suggest we skip next Tuesday.
On Feb 25, 2015 5:04 PM, "Ralph Castain" wrote:

> Hey folks
>
> Given that some number of us will be at the MPI Forum next week, do we
> have a quorum available for the weekly telecon? Who would be able to make
> it?
>
> Me: available
>
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/17049.php
>


Re: [OMPI devel] Free list warning

2015-02-26 Thread Nathan Hjelm

Is it just complaining that the inline functions are deprecated, or is some
code still using ompi_free_list_t? If it is the former I will go ahead
and remove the dummy implementation.

-Nathan

On Wed, Feb 25, 2015 at 09:00:26PM -0800, Ralph Castain wrote:
>It looks like everything in ompi_free_list has been tied together and
>labeled as deprecated. So I'm getting this warning:
>In file included from class/ompi_free_list.c:12:0:
>../opal/class/ompi_free_list.h: In function 'ompi_free_list_init_ex':
>../opal/class/ompi_free_list.h:100:5: warning:
>'ompi_free_list_init_ex_new' is deprecated (declared at
>../opal/class/ompi_free_list.h:61) [-Wdeprecated-declarations]
> return ompi_free_list_init_ex_new (free_list, element_size,
>alignment,
> ^
>Can someone please clean this up? It's causing Intel unit tests to abort
>as they treat warnings as errors.
>Thanks
>Ralph

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/17052.php





Re: [OMPI devel] Free list warning

2015-02-26 Thread Ralph Castain
As far as I can tell, it is complaining about the definitions; the error 
doesn’t indicate that anyone is using it.

> On Feb 26, 2015, at 8:07 AM, Nathan Hjelm  wrote:
> 
> 
> Is it just complaining that the inline functions are deprecated or some
> code still using ompi_free_list_t? If it is the former I will go ahead
> and remove the dummy implementation.
> 
> -Nathan
> 
> On Wed, Feb 25, 2015 at 09:00:26PM -0800, Ralph Castain wrote:
>>   It looks like everything in ompi_free_list has been tied together and
>>   labeled as deprecated. So I'm getting this warning:
>>   In file included from class/ompi_free_list.c:12:0:
>>   ../opal/class/ompi_free_list.h: In function 'ompi_free_list_init_ex':
>>   ../opal/class/ompi_free_list.h:100:5: warning:
>>   'ompi_free_list_init_ex_new' is deprecated (declared at
>>   ../opal/class/ompi_free_list.h:61) [-Wdeprecated-declarations]
>>return ompi_free_list_init_ex_new (free_list, element_size,
>>   alignment,
>>^
>>   Can someone please clean this up? It's causing Intel unit tests to abort
>>   as they treat warnings as errors.
>>   Thanks
>>   Ralph
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/17055.php



Re: [OMPI devel] Free list warning

2015-02-26 Thread Nathan Hjelm

Ok, I put that implementation there to ease the transition for
off-master components. Removing now.

-Nathan

On Thu, Feb 26, 2015 at 08:09:27AM -0800, Ralph Castain wrote:
> So far as I can tell, it is complaining about the definitions - the error 
> doesn’t indicate anyone is using it
> 
> > On Feb 26, 2015, at 8:07 AM, Nathan Hjelm  wrote:
> > 
> > 
> > Is it just complaining that the inline functions are deprecated or some
> > code still using ompi_free_list_t? If it is the former I will go ahead
> > and remove the dummy implementation.
> > 
> > -Nathan
> > 
> > On Wed, Feb 25, 2015 at 09:00:26PM -0800, Ralph Castain wrote:
> >>   It looks like everything in ompi_free_list has been tied together and
> >>   labeled as deprecated. So I'm getting this warning:
> >>   In file included from class/ompi_free_list.c:12:0:
> >>   ../opal/class/ompi_free_list.h: In function 'ompi_free_list_init_ex':
> >>   ../opal/class/ompi_free_list.h:100:5: warning:
> >>   'ompi_free_list_init_ex_new' is deprecated (declared at
> >>   ../opal/class/ompi_free_list.h:61) [-Wdeprecated-declarations]
> >>return ompi_free_list_init_ex_new (free_list, element_size,
> >>   alignment,
> >>^
> >>   Can someone please clean this up? It's causing Intel unit tests to abort
> >>   as they treat warnings as errors.
> >>   Thanks
> >>   Ralph
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/17056.php




Re: [OMPI devel] [mpich-discuss] ROMIO+Lustre problems in OpenMPI 1.8.3

2015-02-26 Thread Rob Latham



On 11/07/2014 06:26 AM, Ralph Castain wrote:

Hi Rob

Following up on this: I cannot find any reference to XOPEN_SOURCE in our 
included ROMIO source for Lustre. I only found one reference anywhere in ROMIO:

romio/adio/ad_xfs/ad_xfs.h:11:#define _XOPEN_SOURCE 500

Any other suggestions on what could be causing the problem?


I've fixed this in ROMIO by not mucking around with XOPEN_SOURCE at all, 
in either lustre or xfs or anywhere.


http://git.mpich.org/mpich.git/commit/4e80e1d2b
and
http://git.mpich.org/mpich.git/commit/5a10283bf7
==rob



Thanks
Ralph



On Oct 28, 2014, at 7:32 AM, Rob Latham  wrote:



On 10/28/2014 06:00 AM, Paul Kapinos wrote:

Dear Open MPI and ROMIO developers,

We use Open MPI v1.6.x and 1.8.x in our cluster.
We have a Lustre file system and wish to use MPI_IO.
So our Open MPI installations are compiled with this flag:

--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'


In our newest installation, openmpi/1.8.3, we found that MPI_IO is *broken*.

A short search for the root of the evil brought the following to light:

- the ROMIO component 'MCA io: romio' isn't there at all in the affected
version, because

- configure of ROMIO has *failed* (cf. logs a, b, c),
- because lustre_user.h was found but could not be compiled.


lustre_user.h cannot be compiled because quota defines won't compile. Ugh, what 
a mess.

A while back I noticed this and fixed it by removing an XOPEN_SOURCE feature 
test macro:

http://trac.mpich.org/projects/mpich/ticket/1973

Then, on solaris with --enable-strict we needed to put *back* the XOPEN_SOURCE 
macro or else pread and pwrite would be undefined.

So what I really need to do is delete XOPEN_SOURCE since it causes such 
headaches, and on the rare platforms that only have pread/pwrite defined if you 
take extraordinary measures, if at all, I'll have a ROMIO pread and pwrite that 
simply do seek + write (or read).
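
As a rough idea of what such a fallback could look like, here is a minimal sketch built only on lseek(), read() and write(). The names example_pread and example_pwrite are hypothetical and this is not the code in ROMIO. Unlike the real pread/pwrite it is not atomic: another thread using the same file descriptor could move the offset between the seek and the transfer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: emulate pread() with lseek() + read().
 * The current file offset is saved and restored around the read. */
static ssize_t example_pread(int fd, void *buf, size_t count, off_t offset)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);
    ssize_t nread;

    if ((off_t) -1 == saved) return -1;
    if ((off_t) -1 == lseek(fd, offset, SEEK_SET)) return -1;

    nread = read(fd, buf, count);
    if (nread < 0) {
        int saved_errno = errno;          /* keep the error from read() */
        (void) lseek(fd, saved, SEEK_SET);
        errno = saved_errno;
        return -1;
    }
    if ((off_t) -1 == lseek(fd, saved, SEEK_SET)) return -1;
    return nread;
}

/* Hypothetical fallback: emulate pwrite() with lseek() + write(). */
static ssize_t example_pwrite(int fd, const void *buf, size_t count,
                              off_t offset)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);
    ssize_t nwritten;

    if ((off_t) -1 == saved) return -1;
    if ((off_t) -1 == lseek(fd, offset, SEEK_SET)) return -1;

    nwritten = write(fd, buf, count);
    if (nwritten < 0) {
        int saved_errno = errno;          /* keep the error from write() */
        (void) lseek(fd, saved, SEEK_SET);
        errno = saved_errno;
        return -1;
    }
    if ((off_t) -1 == lseek(fd, saved, SEEK_SET)) return -1;
    return nwritten;
}
```

Restoring the offset keeps the visible behaviour close to the real calls for single-threaded use, which is probably all a rarely used fallback needs.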

For now, please delete the XOPEN_SOURCE line at the very beginning of 
src/mpi/romio/adio/ad_lustre/ad_lustre_rwcontig.c

==rob





In our system, there are two lustre_user.h available:
$ locate lustre_user.h
/usr/include/linux/lustre_user.h
/usr/include/lustre/lustre_user.h
As I'm not very familiar with Lustre, I just attach both of them.

pk224850@cluster:~[509]$ uname -a
Linux cluster.rz.RWTH-Aachen.DE 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue
Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux

pk224850@cluster:~[510]$ cat /etc/issue
Scientific Linux release 6.5 (Carbon)

Note that openmpi/1.8.1 seems to be fully OK (MPI_IO works) in our
environment.

Best

Paul Kapinos

P.S. Is there a configure flag that will enforce ROMIO? That is, when
ROMIO is not available, configure would fail. This would make such hidden
errors public at installation time.






a) Log in Open MPI's config.log:
--

configure:226781: OMPI configuring in ompi/mca/io/romio/romio
configure:226866: running /bin/sh './configure'
--with-file-system=testfs+ufs+nfs+lustre  FROM_OMPI=yes CC="icc
-std=c99" CFLAGS="-DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2
-m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions
-Qoption,cpp,--extended_float_types -pthread" CPPFLAGS="
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/hwloc/hwloc172/hwloc/include
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent/include"
FFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2   -m64  "
LDFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2   -m64
-fexceptions " --enable-shared --disable-static
--with-file-system=testfs+ufs+nfs+lustre
--prefix=/opt/MPI/openmpi-1.8.3/linux/intel --disable-aio
--cache-file=/dev/null --srcdir=. --disable-option-checking
configure:226876: /bin/sh './configure' *failed* for
ompi/mca/io/romio/romio
configure:226911: WARNING: ROMIO distribution did not configure
successfully
configure:227425: checking if MCA component io:romio can compile
configure:227427: result: no
--




b) dump of Open MPI's 'configure' output to the console:
--

checking lustre/lustre_user.h usability... no
checking lustre/lustre_user.h presence... yes
configure: WARNING: lustre/lustre_user.h: present but cannot be compiled
configure: WARNING: lustre/lustre_user.h: check for missing
prerequisite headers?
configure: WARNING: lustre/lustre_user.h: see the Autoconf documentation
configure: WARNING: lustre/lustre_user.h: section "Present But
Cannot Be Compiled"
configure: WARNING: lustre/lustre_user.h: proceeding with the compiler's
result
configure: WARNING: ##  ##
configure: WARNING: ## Report this to disc...@mpich.org ##
configure: WARNING: ## -

[OMPI devel] Using the Github ompi-release bot

2015-02-26 Thread Jeff Squyres (jsquyres)
I forgot to mention that you must change your GitHub membership visibility to publicly state 
that you are part of the Open MPI organization to be able to use the 
ompi-release bot.  Go to this page:

https://github.com/orgs/open-mpi/people

Find yourself, and change your membership from "Private" (the default) to 
"Public".

Then you can issue bot:* commands on ompi-release PRs.

(I also added this info to 
https://github.com/open-mpi/ompi/wiki/OmpiReleaseBotCommands)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] opal_verbs_want_fork_support question

2015-02-26 Thread Howard Pritchard
Hi Folks,

Just tried to build a fresh head of master and am getting
opal_verbs_want_fork_support as an undefined symbol when trying to build the
opal lib.

Any ideas on where this should go?

It would be nice to get Jenkins checking everything, or at least a
lightweight Travis check.

Howard


Re: [OMPI devel] opal_verbs_want_fork_support question

2015-02-26 Thread George Bosilca
Just pushed some fixes into the trunk. However, the naming of the MCA
variable for verbs fork support does not follow our usual requirements. I guess
the code owners should address this topic.

  George.


On Thu, Feb 26, 2015 at 4:52 PM, Howard Pritchard 
wrote:

> Hi Folks,
>
> Just tried to build a fresh head of master and am getting
> opal_verbs_want_fork_support as undefined symbol when trying to build opal
> lib.
>
> Any ideas on where this should go?
>
> It would be nice to get jenkins checking everything, or at least a light
> weight travis check.
>
> Howard
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/17060.php
>


Re: [OMPI devel] opal_verbs_want_fork_support question

2015-02-26 Thread Jeff Squyres (jsquyres)
Howard --

It looks like https://github.com/open-mpi/ompi/pull/415 was merged before it 
was ready.  George then did some commits to try and fix things, but I still 
don't think they were right.

I put some comments on #415 after it was merged; I don't know if they got 
mailed out or not.


> On Feb 26, 2015, at 4:52 PM, Howard Pritchard  wrote:
> 
> Hi Folks,
> 
> Just tried to build a fresh head of master and am getting 
> opal_verbs_want_fork_support as undefined symbol when trying to build opal 
> lib.
> 
> Any ideas on where this should go?
> 
> It would be nice to get jenkins checking everything, or at least a light 
> weight travis check.
> 
> Howard


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] opal_verbs_want_fork_support question

2015-02-26 Thread George Bosilca
A better fix is underway. One that will be checked on a verbs-enabled
environment.

  George


On Thu, Feb 26, 2015 at 5:08 PM, Jeff Squyres (jsquyres)  wrote:

> Howard --
>
> It looks like https://github.com/open-mpi/ompi/pull/415 was merged before
> it was ready.  George then did some commits to try and fix things, but I
> still don't think they were right.
>
> I put some comments on #415 after it was merged; I don't know if they got
> mailed out or not.
>
>
> > On Feb 26, 2015, at 4:52 PM, Howard Pritchard 
> wrote:
> >
> > Hi Folks,
> >
> > Just tried to build a fresh head of master and am getting
> opal_verbs_want_fork_support as undefined symbol when trying to build opal
> lib.
> >
> > Any ideas on where this should go?
> >
> > It would be nice to get jenkins checking everything, or at least a light
> weight travis check.
> >
> > Howard
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/17062.php
>


[OMPI devel] printf format warnings on master

2015-02-26 Thread Paul Hargrove
Clang noted the following on FreeBSD-10/amd64 using the current master
tarball:

Making check in threads
make  opal_thread opal_condition
  CC   opal_thread.o
  CCLD opal_thread
  CC   opal_condition.o
/home/phargrov/OMPI/openmpi-master-freebsd10-amd64/openmpi-dev-1118-gdc80863/test/threads/opal_condition.c:72:61: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
    fprintf(stderr, "thr1: time per iteration: %ld usec\n", (c2 - c1) / TEST_COUNT);
                                               ~~~          ^~
                                               %d
/home/phargrov/OMPI/openmpi-master-freebsd10-amd64/openmpi-dev-1118-gdc80863/test/threads/opal_condition.c:89:61: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
    fprintf(stderr, "thr2: time per iteration: %ld usec\n", (c2 - c1) / TEST_COUNT);
                                               ~~~          ^~
                                               %d
2 warnings generated.
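
One portable way to quiet this, sketched below, is to widen the value to long explicitly so that it matches %ld whatever the underlying type of the expression is (presumably clock_t is a 32-bit int on this platform, though that is my guess). The helper names and EXAMPLE_TEST_COUNT are hypothetical and not taken from opal_condition.c.

```c
#include <stdio.h>
#include <time.h>

#define EXAMPLE_TEST_COUNT 1000   /* hypothetical iteration count */

/* Average number of clock ticks per iteration, widened to long so the
 * result matches a %ld conversion however clock_t is defined. */
static long example_ticks_per_iteration(clock_t c1, clock_t c2)
{
    return (long) ((c2 - c1) / EXAMPLE_TEST_COUNT);
}

/* Prints the same message as the test, with a matching format/argument pair. */
static void example_report(const char *label, clock_t c1, clock_t c2)
{
    fprintf(stderr, "%s: time per iteration: %ld usec\n",
            label, example_ticks_per_iteration(c1, c2));
}
```

Switching the format to %d, as clang suggests, would only be correct on platforms where the expression really is an int, so the cast seems like the safer choice.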


-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] BML changes

2015-02-26 Thread Rolf vandeVaart
This message is mostly for Nathan, but figured I would go with the wider 
distribution. I have noticed some different behaviour that I assume started 
with this change.


https://github.com/open-mpi/ompi/commit/4bf7a207e90997e75ba1c60d9d191d9d96402d04


I am noticing that the openib BTL is also used for on-node communication 
even though the sm (or smcuda) BTL is also available. I think that, with the 
aforementioned change, the openib BTL is listed as an available BTL that 
supports RDMA. Looking at the bml_endpoint in the debugger, it appears that 
the sm BTL is listed as the eager and send BTL, but openib is listed as the 
RDMA BTL. Looking at the logic in pml_ob1_sendreq.h, it looks like we can end 
up selecting the openib BTL for some of the communication. I ran with various 
verbosity settings and saw that this was happening. With v1.8, we only appear 
to use the sm (or smcuda) BTL.


I am wondering if this was intentional with this change or maybe a side effect.


Rolf




Re: [OMPI devel] opal_verbs_want_fork_support question

2015-02-26 Thread George Bosilca
Sorry for the mess, it took me a few commits and a few cleanups to get it back
into a workable state (and without other pending work from my own branches).

I also changed the naming of a few MCA parameters to reflect their
location (OPAL, common, and verbs). However, I created the corresponding
synonyms (and marked them as deprecated).

  George.


On Thu, Feb 26, 2015 at 5:15 PM, George Bosilca  wrote:

> A better fix is underway. One that will be checked on a verbs-enabled
> environment.
>
>   George
>
>
> On Thu, Feb 26, 2015 at 5:08 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Howard --
>>
>> It looks like https://github.com/open-mpi/ompi/pull/415 was merged
>> before it was ready.  George then did some commits to try and fix things,
>> but I still don't think they were right.
>>
>> I put some comments on #415 after it was merged; I don't know if they got
>> mailed out or not.
>>
>>
>> > On Feb 26, 2015, at 4:52 PM, Howard Pritchard 
>> wrote:
>> >
>> > Hi Folks,
>> >
>> > Just tried to build a fresh head of master and am getting
>> opal_verbs_want_fork_support as undefined symbol when trying to build opal
>> lib.
>> >
>> > Any ideas on where this should go?
>> >
>> > It would be nice to get jenkins checking everything, or at least a
>> light weight travis check.
>> >
>> > Howard
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>


[OMPI devel] mtl:psm configury build broken in master

2015-02-26 Thread Paul Hargrove
I have been testing mtl:psm on a very old system.
Sometime pretty recently (this week I think), this started to break at
configure time:

--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... dso
checking --with-psm value... sanity check ok (/usr/local/Infinipath)
checking --with-psm-libdir value... simple ok (unspecified)
checking psm.h usability... yes
checking psm.h presence... yes
checking for psm.h... yes
looking for library in lib
checking for library containing psm_finalize... no
looking for library in lib64
checking for library containing psm_finalize... (cached) no
configure: error: PSM support requested but not found.  Aborting


I strongly suspect that "(cached) no" is a sign of the real problem.
The test didn't find /usr/local/Infinipath/lib64/libpsm_infinipath.so.1.0
because it didn't actually try to!

For comparison here is the same section of configure output archived from
testing of 1.8.4rc5:

--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... dso
checking --with-psm value... sanity check ok (/usr/local/Infinipath)
checking --with-psm-libdir value... simple ok (unspecified)
checking psm.h usability... yes
checking psm.h presence... yes
checking for psm.h... yes
looking for library in lib
checking for psm_finalize in -lpsm_infinipath... no
looking for library in lib64
checking for psm_finalize in -lpsm_infinipath... yes
checking if MCA component mtl:psm can compile... yes
checking for index in endpoint array for tag MTL... 1


Note the "yes" rather than "(cached) no" AND the different checking
description (specific lib vs any).

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] CORRECTION: mtl:psm configury broken (but NOT on master)

2015-02-26 Thread Paul Hargrove
Oops, my mistake - this was *not* a test of 'master'

This problem appears in Jeff's latest unofficial tarball built from his
branch "feature/libltdl-must-die".

I don't know if Jeff introduced the problem in his branch, or is missing
the fix.  Either way, it's in your lap, Jeff.

-Paul

On Thu, Feb 26, 2015 at 4:12 PM, Paul Hargrove  wrote:

> I have been testing mtl:psm on a very old system.
> Sometime pretty recently (this week I think), this started to break at
> configure time:
>
> --- MCA component mtl:psm (m4 configuration macro)
> checking for MCA component mtl:psm compile mode... dso
> checking --with-psm value... sanity check ok (/usr/local/Infinipath)
> checking --with-psm-libdir value... simple ok (unspecified)
> checking psm.h usability... yes
> checking psm.h presence... yes
> checking for psm.h... yes
> looking for library in lib
> checking for library containing psm_finalize... no
> looking for library in lib64
> checking for library containing psm_finalize... (cached) no
> configure: error: PSM support requested but not found.  Aborting
>
>
> I strongly suspect that "(cached) no" is a sign of the real problem.
> The test didn't find /usr/local/Infinipath/lib64/libpsm_infinipath.so.1.0
> because it didn't actually try to!
>
> For comparison here is the same section of configure output archived from
> testing of 1.8.4rc5:
>
> --- MCA component mtl:psm (m4 configuration macro)
> checking for MCA component mtl:psm compile mode... dso
> checking --with-psm value... sanity check ok (/usr/local/Infinipath)
> checking --with-psm-libdir value... simple ok (unspecified)
> checking psm.h usability... yes
> checking psm.h presence... yes
> checking for psm.h... yes
> looking for library in lib
> checking for psm_finalize in -lpsm_infinipath... no
> looking for library in lib64
> checking for psm_finalize in -lpsm_infinipath... yes
> checking if MCA component mtl:psm can compile... yes
> checking for index in endpoint array for tag MTL... 1
>
>
> Note the "yes" rather than "(cached) no" AND the different checking
> description (specific lib vs any).
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] Master warning on oob:ud w/ PGI

2015-02-26 Thread Paul Hargrove
The warning below comes from pgi-14.7 on the latest master tarball (output
from "make V=1").

-Paul

libtool: compile:  pgcc -DHAVE_CONFIG_H -I.
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/orte/mca/oob/ud
-I../../../../opal/include -I../../../../ompi/include
-I../../../../oshmem/include
-I../../../../opal/mca/common/libfabric/libfabric
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863
-I../../../..
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/opal/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/orte/include
-I../../../../orte/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/ompi/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/oshmem/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/opal/mca/hwloc/hwloc191/hwloc/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/BLD/opal/mca/hwloc/hwloc191/hwloc/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/opal/mca/event/libevent2022/libevent
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/opal/mca/event/libevent2022/libevent/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/BLD/opal/mca/event/libevent2022/libevent/include
-g -c
/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/orte/mca/oob/ud/oob_ud_req.c
-MD  -fpic -DPIC -o .libs/oob_ud_req.o
PGC-W-0095-Type cast required for this conversion (/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/orte/mca/oob/ud/oob_ud_req.c: 140)
PGC/x86-64 Linux 14.7-0: compilation completed with warnings

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] mtl:psm configury build broken in master

2015-02-26 Thread Jeff Squyres (jsquyres)
Note that there's a psm_finalize check right before that one that is not cached.

Sent from my phone. No type good.

On Feb 26, 2015, at 7:12 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

I have been testing mtl:psm on a very old system.
Sometime pretty recently (this week I think), this started to break at 
configure time:

--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... dso
checking --with-psm value... sanity check ok (/usr/local/Infinipath)
checking --with-psm-libdir value... simple ok (unspecified)
checking psm.h usability... yes
checking psm.h presence... yes
checking for psm.h... yes
looking for library in lib
checking for library containing psm_finalize... no
looking for library in lib64
checking for library containing psm_finalize... (cached) no
configure: error: PSM support requested but not found.  Aborting

I strongly suspect that "(cached) no" is a sign of the real problem.
The test didn't find /usr/local/Infinipath/lib64/libpsm_infinipath.so.1.0 
because it didn't actually try to!

For comparison here is the same section of configure output archived from 
testing of 1.8.4rc5:

--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... dso
checking --with-psm value... sanity check ok (/usr/local/Infinipath)
checking --with-psm-libdir value... simple ok (unspecified)
checking psm.h usability... yes
checking psm.h presence... yes
checking for psm.h... yes
looking for library in lib
checking for psm_finalize in -lpsm_infinipath... no
looking for library in lib64
checking for psm_finalize in -lpsm_infinipath... yes
checking if MCA component mtl:psm can compile... yes
checking for index in endpoint array for tag MTL... 1

Note the "yes" rather than "(cached) no" AND the different checking description 
(specific lib vs any).
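
A minimal sketch of how a cached answer could mask the second directory, assuming the check stores its result in one cache variable keyed only on the function name. The variable name `ac_cv_search_psm_finalize` follows Autoconf's usual naming for AC_SEARCH_LIBS results; the helper `search_lib` and the `found_in` setting are hypothetical stand-ins and do not run any real link test:

```shell
# Hypothetical stand-in for a cached configure check; it links nothing.
# found_in simulates which directory really contains the library.
found_in="lib64"

search_lib() {
    dir=$1
    # Reuse an earlier answer if one is cached, as configure does.
    if [ -n "${ac_cv_search_psm_finalize+set}" ]; then
        echo "looking in $dir... (cached) $ac_cv_search_psm_finalize"
        return
    fi
    if [ "$dir" = "$found_in" ]; then
        ac_cv_search_psm_finalize="-lpsm_infinipath"
    else
        ac_cv_search_psm_finalize="no"
    fi
    echo "looking in $dir... $ac_cv_search_psm_finalize"
}

# Without clearing the cache, the "no" from lib sticks for lib64.
unset ac_cv_search_psm_finalize
search_lib lib
search_lib lib64
stale_result=$ac_cv_search_psm_finalize

# Clearing the cache variable between directories forces a real second test.
unset ac_cv_search_psm_finalize
search_lib lib
unset ac_cv_search_psm_finalize
search_lib lib64
fresh_result=$ac_cv_search_psm_finalize
```

If the real macro behaves like the first half of this sketch, clearing the cache variable before the lib64 attempt would be one possible fix; whether that is what changed in master is not established here.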

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] nearly-irreproducible failure of master on Mac OS X 10.8

2015-02-26 Thread Paul Hargrove
Initially I was testing Jeff's tarball for PR 410 on Mac OS X 10.8, where
cc is clang. I configured with
--prefix=[...] --enable-debug --enable-osx-builtin-atomics CC=cc CXX=c++

It passed "make check", but when I tried to run ring_c I got the first failure
shown (far) below.
HOWEVER, I tried 50 times to reproduce the failure and could not do so.
Since Jeff's tarball is not "official" I turned my attention to the current
master tarball instead.

I next tried FIVE HUNDRED times with the current master tarball, and was
able to reproduce the failure ONCE.
The failed assertion and backtrace are different than what I saw before, so
they also appear below.

Next, I tried with the master tarball without the builtin-atomics configure
option.
In that case my 95th trial failed and I didn't continue trying.
The failure output was (to me) indistinguishable from the one with
builtin-atomics, but it is also included below for completeness.

Finally, I tried w/o clang leaving only "--prefix=[...] --enable-debug" on
the configure command line.
However, note that "gcc" is really "i686-apple-darwin11-llvm-gcc-4.2" and
thus shares MUCH in common with clang on the same system.
This configuration failed too, and the failure output is also provided
below.
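
The repeated trials described above can be scripted. This is a minimal sketch, assuming a POSIX shell and assuming that mpirun exits with a non-zero status when a rank aborts; the helper name `run_until_failure` and the trial cap are illustrative and not taken from the original testing:

```shell
# Run a command up to max_trials times and stop at the first non-zero exit.
# Sets failed_trial to the number of the failing trial (0 if none failed)
# and returns 1 on failure, 0 otherwise.
run_until_failure() {
    max_trials=$1
    shift
    trial=1
    failed_trial=0
    while [ "$trial" -le "$max_trials" ]; do
        if ! "$@"; then
            failed_trial=$trial
            return 1
        fi
        trial=$((trial + 1))
    done
    return 0
}

# Example (not executed here):
#   run_until_failure 500 mpirun -mca btl sm,self -np 2 examples/ring_c
#   echo "first failure at trial $failed_trial"
```

Stopping at the first failure keeps the failing run's output at the end of the terminal log, which is convenient when a failure shows up only once in hundreds of runs.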

I hope somebody knows how to proceed from here.
I don't really have any reason to believe this is specific to Mac OS X, but
don't have the spare cycles to dedicate to additional testing.

-Paul

Seen w/ Jeff's tarball:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
 Warning :: opal_list_remove_item - the item 0x7fc092a0cb50 is not on the
list 0x7fc0928006a0
Assertion failed: (OPAL_OBJ_MAGIC_ID == ((opal_object_t *)
(kv))->obj_magic_id), function store, file
/Users/Paul/OMPI/openmpi-pr410-v4-macos10.8-x86-clang-atomics/openmpi-gitclone/opal/mca/dstore/hash/dstore_hash.c,
line 143.
[tesuji:26399] *** Process received signal ***
[tesuji:26399] Signal: Abort trap: 6 (6)
[tesuji:26399] Signal code:  (0)
[tesuji:26399] [ 0] 0   libsystem_c.dylib    0x7fff91e2b90a _sigtramp + 26
[tesuji:26399] [ 1] 0   ???                  0x 0x0 + 4294967295
[tesuji:26399] [ 2] 0   libsystem_c.dylib    0x7fff91e82f61 abort + 143
[tesuji:26399] [ 3] 0   libsystem_c.dylib    0x7fff91e83cb9 __assert_rtn + 146
[tesuji:26399] [ 4] 0   mca_dstore_hash.so   0x00010180803c store + 972
[tesuji:26399] [ 5] 0   libopen-pal.0.dylib  0x0001016860c6 opal_dstore_base_store + 278
[tesuji:26399] [ 6] 0   mca_pmix_native.so   0x000101825795 native_get + 4709
[tesuji:26399] [ 7] 0   libmpi.0.dylib       0x00010111f6a4 ompi_proc_complete_init + 980
[tesuji:26399] [ 8] 0   libmpi.0.dylib       0x000101126f24 ompi_mpi_init + 2372
[tesuji:26399] [ 9] 0   libmpi.0.dylib       0x0001011744c0 MPI_Init + 480
[tesuji:26399] [10] 0   ring_c               0x0001010e9c25 main + 53
[tesuji:26399] [11] 0   libdyld.dylib        0x7fff8e03a7e1 start + 0
[tesuji:26399] *** End of error message ***
--
mpirun noticed that process rank 1 with PID 0 on node tesuji exited on
signal 6 (Abort trap: 6).
--

Seen with master tarball and builtin-atomics:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
 Warning :: opal_list_remove_item - the item 0x7fc6d1900130 is not on the
list 0x7fc6d0c30df0
Assertion failed: (0 == item->opal_list_item_refcount), function
opal_list_item_destruct, file
/Users/Paul/OMPI/openmpi-master-macos10.8-x86-clang-atomics/openmpi-dev-1118-gdc80863/opal/class/opal_list.c,
line 69.
[tesuji:62565] *** Process received signal ***
[tesuji:62565] Signal: Abort trap: 6 (6)
[tesuji:62565] Signal code:  (0)
[tesuji:62565] [ 0] 0   libsystem_c.dylib    0x7fff91e2b90a _sigtramp + 26
[tesuji:62565] [ 1] 0   ???                  0x 0x0 + 0
[tesuji:62565] [ 2] 0   libsystem_c.dylib    0x7fff91e82f61 abort + 143
[tesuji:62565] [ 3] 0   libsystem_c.dylib    0x7fff91e83cb9 __assert_rtn + 146
[tesuji:62565] [ 4] 0   libopen-pal.0.dylib  0x000107d54dd5 opal_list_item_destruct + 85
[tesuji:62565] [ 5] 0   mca_dstore_hash.so   0x000107f67e21 opal_obj_run_destructors + 145
[tesuji:62565] [ 6] 0   mca_dstore_hash.so   0x000107f6707e store + 1054
[tesuji:62565] [ 7] 0   libopen-pal.0.dylib  0x000107de0336 opal_dstore_base_store + 278
[tesuji:62565] [ 8] 0   mca_pmix_native.so   0x000107f8aaa3 fencenb_cbfunc + 851
[tesuji:62565] [ 9] 0   mca_pmix_native.so   0x000107f8bf97 pmix_usock_process_msg + 695
[tesuji:62565] [10] 0   libopen-pal.0.dylib  0x000107dea38d event_process_active_single_queue + 493
[tesuji:62565] [11] 0   libopen-pal.0.dylib  0x000107de5f7c event_process_active + 140
[tesuji:62565] [12] 0   libopen-pal.0.dylib  0x000107de502e opal_libevent2022_event_base_loop + 830
[tesuji:62565] [13] 0   libopen-pal.0.dylib  0x000107d66532 progress_engine + 66
[tesuji:62565] [14] 0   libsystem_c.dylib    0x7fff

Re: [OMPI devel] nearly-irreproducible failure of master on Mac OS X 10.8

2015-02-26 Thread Ralph Castain
Hmmm…someone else recently reported this same issue (it was the Absoft folks 
hitting it occasionally on their MTT runs). I’m in the process of replacing 
that code path, so I don’t plan on pursuing it right now. However, we’ll have 
to see if the revised path resolves it.



[OMPI devel] Odd master build failure with Studio 12.4 on Linux w/ -m32

2015-02-26 Thread Paul Hargrove
I am using Oracle's Studio 12.4 compilers for Linux/x86-64 to build the
current master tarball.
However, I am passing "-m32" to generate x86 (ILP32 ABI) executables.

The full configure mess is:

--prefix=[...] --enable-debug \
CC=cc  CFLAGS="-m32"   --with-wrapper-cflags="-m32" \
CXX=CC CXXFLAGS="-m32" --with-wrapper-cxxflags="-m32" \
FC=f90 FCFLAGS="-m32"  --with-wrapper-fcflags="-m32"


The failing output from "make V=1" is

/bin/sh ../../../libtool  --tag=CC   --mode=link cc  -m32 -g -mt
 -export-dynamic -o opal_wrapper opal_wrapper.o
../../../opal/libopen-pal.la -lrt -lm -lutil   -lrt -lm -lutil
libtool: link: cc -m32 -g -mt -o .libs/opal_wrapper opal_wrapper.o
-Wl,--export-dynamic  ../../../opal/.libs/libopen-pal.so -ldl -lrt -lm
-lutil -mt -Wl,-rpath
-Wl,/scratch/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u4-m32/INST/lib
../../../opal/.libs/libopen-pal.so: undefined reference to `ebx'


Now clearly "ebx" should be referring to the CPU register, not an external
symbol, right?
HOWEVER, in x86 PIC code (e.g. a .so file) one CANNOT generally use 'ebx'
in inline asm because it is used as the GOT pointer.
So, there might be more than one problem here.
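
One way to narrow down where the stray `ebx` reference comes from is to look for it among the undefined symbols of the individual object files. This is only a sketch, assuming a binutils-style `nm` whose `-u` output ends each line with the symbol name; the helper name `find_undefined` is hypothetical:

```shell
# Print each object file whose undefined-symbol list contains the given name.
# Files that nm cannot read are skipped silently.
find_undefined() {
    sym=$1
    shift
    for obj in "$@"; do
        # nm -u lists undefined symbols; the name is the last field per line.
        if nm -u "$obj" 2>/dev/null | awk '{print $NF}' | grep -qx "$sym"; then
            echo "$obj"
        fi
    done
}

# Example (not executed here; the object list is whatever the build produced):
#   find_undefined ebx $(find opal -name '*.o')
```

Any object this reports would be a candidate for containing inline asm in which the compiler treated `ebx` as an external symbol rather than a register.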

The same is seen with the older Studio 12.3 compilers for Linux.
However, the problem is *NOT* seen with Studio 12.3 compilers on Solaris-11
and the identical configure options.

-Paul

BTW:
Can somebody tell me if I really need to specify "-m32" in *both* CFLAGS
and --with-wrapper-cflags (etc.)?

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900