Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph,

You got it right.
The latest nightly tarball should work out of the box.
(Well, -m64 must be passed manually, but that is entirely unrelated to the
issue discussed here.)

Cheers,

Gilles

"Jeff Squyres (jsquyres)"  wrote:
>Paul --
>
>The __sun macro check is now in the OMPI 1.8 tree, and is in the latest 
>nightly tarball.
>
>If I'm following this thread right -- and I might not be! -- I think Gilles is 
>saying that now that the __sun check is in, it should fix this 
>-mt/-D_REENTRANT/whatever problem.
>
>Can you confirm?
>
>
>On Dec 16, 2014, at 1:55 PM, Paul Hargrove  wrote:
>
>> Gilles,
>> 
>> I am running mpirun on a host that ALSO will run one of the application 
>> processes.
>> Requested ifconfig and netstat outputs appear below.
>> 
>> -Paul
>> 
>> [phargrov@pcp-j-20 ~]$ ifconfig -a
>> lo0: flags=2001000849 mtu 8232 index 1
>>         inet 127.0.0.1 netmask ff00
>> bge0: flags=1004843 mtu 1500 index 2
>>         inet 172.16.0.120 netmask  broadcast 172.16.255.255
>> p.ibp0: flags=1001000843 mtu 2044 index 3
>>         inet 172.18.0.120 netmask  broadcast 172.18.255.255
>> lo0: flags=2002000849 mtu 8252 index 1
>>         inet6 ::1/128
>> bge0: flags=20002004841 mtu 1500 index 2
>>         inet6 fe80::250:45ff:fe5c:2b0/10
>> [phargrov@pcp-j-20 ~]$ netstat -nr
>> 
>> Routing Table: IPv4
>>   Destination      Gateway          Flags  Ref        Use  Interface
>>   ---------------- ---------------- ----- ---- ----------  ---------
>>   default          172.16.254.1     UG       2     158463  bge0
>>   127.0.0.1        127.0.0.1        UH       5     398913  lo0
>>   172.16.0.0       172.16.0.120     U        4  135241319  bge0
>>   172.18.0.0       172.18.0.120     U        3         26  p.ibp0
>> 
>> Routing Table: IPv6
>>   Destination/Mask  Gateway                  Flags  Ref  Use  If
>>   ----------------- ------------------------ ----- ----  ---  ----
>>   ::1               ::1                      UH       2    0  lo0
>>   fe80::/10         fe80::250:45ff:fe5c:2b0  U        2    0  bge0
>> 
>> On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet wrote:
>> Paul,
>> 
>> could you please send the output of
>> ifconfig -a
>> netstat -nr
>> 
>> on the three hosts you are using
>> (I assume you are still invoking mpirun from one node, and that the tasks
>> are running on two other nodes)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> 
>> On 2014/12/16 16:00, Paul Hargrove wrote:
>>> Gilles,
>>> 
>>> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
>>> compilations.
>>> It appears to be used for building libevent and vt, but nothing else.
>>> The output from configure contains
>>> 
>>> checking if more special flags are required for pthreads... -D_REENTRANT
>>> 
>>> only in the libevent and vt sub-configure portions.
>>> 
>>> When configured for gcc on Solaris-11 I see the following in configure
>>> 
>>> checking for C optimization flags... -m64 -D_REENTRANT -g
>>> -finline-functions -fno-strict-aliasing
>>> 
>>> but with CC=cc the equivalent line is
>>> 
>>> checking for C optimization flags... -m64 -g
>>> 
>>> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>>> 
>>> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
>>> I see
>>> 
>>> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>>> 
>>> A process or daemon was unable to complete a TCP connection
>>> to another process:
>>>   Local host:    pcp-j-20
>>>   Remote host:   172.18.0.120
>>> This is usually caused by a firewall on the remote host. Please
>>> check that any firewall (e.g., iptables) has been disabled and
>>> try again.
>>> 
>>> 
>>> which at least appears to have a non-zero errno.
>>> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
>>> 
>>> With the oob.patch you provided the failed accept goes away, BUT the
>>> connection still fails:
>>> 
>>> 
>>> A process or daemon was unable to complete a TCP connection
>>> to another process:
>>>   Local host:    pcp-j-20
>>>   Remote host:   172.18.0.120
>>> This is usually caused by a firewall on the remote host. Please
>>> check that any firewall (e.g., iptables) has been disabled and
>>> try again.
>>> 
>>> 
>>> 
>>> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
>>> 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Results of tests described below:

1) SEGV in hwloc - will report later
2) PASS
3) PASS

So, both -D_REENTRANT and -mt work for me IF added to both the CFLAGS
and wrapper-cflags.

-Paul

On Tue, Dec 16, 2014 at 10:56 PM, Paul Hargrove  wrote:
>
> I've queued 3 tests:
>
> 1) openmpi-v1.8.3-272-g4e4f997
> 2) openmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags
> 3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags
>
> I hope to be able to log in and collect the results around noon Pacific
> time on Wed.
>
> -Paul
>
> On Tue, Dec 16, 2014 at 10:48 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> I understand. I will now work on a better way to figure out the required
>> flags.
>>
>> The latest nightly snapshot does not include the commit I mentioned, and I
>> think it is worth giving it a try (to be 100.0% sure ...).
>>
>> Can you please do that tomorrow?
>>
>> In the meantime, if we (well, Ralph really) want to release 1.8.4, then
>> simply restore the two config files I mentioned.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/12/17 15:39, Paul Hargrove wrote:
>>
>> Gilles,
>>
>> If I have done my testing correctly (not 100% sure) then adding
>> "-D_REENTRANT" was NOT sufficient, where "-mt" was.
>>
>> I can at least test 1 tarball with one set of configure args each evening.
>> Anything more than that I cannot commit to.
>>
>> My scripts are capable of grabbing the v1.8 nightly instead of the rc if
>> that helps.
>>
>> -Paul
>>
>> On Tue, Dec 16, 2014 at 10:31 PM, Gilles Gouaillardet wrote:
>>
>>
>>  Ralph,
>>
>> I think that will not work.
>>
>> Here is the full story:
>>
>> Once upon a time, on Solaris, we did not try to compile a pthread'ed app
>> without any special parameters.
>> That was a minor annoyance on Solaris 10 with old gcc: configure passed a
>> flag (-pthread, if I remember correctly) that was not supported by gcc at
>> that time and generated tons of warnings.
>> When I asked "why don't we just try no special parameter on Solaris?", I
>> was told this was because, a long time ago, Open MPI used Solaris LWPs, so
>> Solaris was "special" anyway.
>> Since Solaris is able to build (compile+link) a pthread'ed app without any
>> flags, I removed the special case for Solaris, and no flag was used.
>> Then I noticed that this led to bad code (errno is global instead of
>> per-thread), so you automatically added -D_REENTRANT on Solaris (i.e. if
>> the __sun__ macro is defined).
>> Then I found that the Solaris Studio compilers do not define the __sun__
>> macro automatically (__sun and sun are defined), so I improved the test
>> (i.e. we are on Solaris if __sun__ or __sun is defined).
>> This was merged yesterday and is not in rc4.
>>
>> What we should do now is unclear to me...
>> Is -D_REENTRANT enough for gcc compilers on Solaris?
>> Is -D_REENTRANT *not* enough for Solaris Studio compilers on Solaris?
>> /* if -D_REENTRANT is *not* enough, then all we have to do is use -mt,
>> since that implies -D_REENTRANT */
>>
>>
>> A working solution (minus the minor annoyance I described earlier) is to
>> restore
>> config/opal_check_os_flavors.m4
>> config/ompi_config_pthreads.m4
>>
>> and then I'll find a better way to correctly set the flags that must be
>> used on Solaris.
>>
>> That being said, and based on Paul's availability, I'd rather have a new
>> tarball (rc5?) tested.
>> (Do we *really* need -mt? Isn't -D_REENTRANT enough?)
>> This tarball must include
>> https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83
>>
>>
>> configury: test the __sun macro to detect solaris OS.
>>
>>
>> FWIW, I was unable to reproduce the problem on Solaris 11 with sunstudio
>> 12.4, even when I use neither -D_REENTRANT *nor* -mt (!)
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/12/17 15:01, Ralph Castain wrote:
>>
>> Hi Paul
>>
>> Can you try the attached patch? It would require running autogen, I fear.
>> Otherwise, I can add it to the tarball.
>>
>> Ralph
>>
>>
>> On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove wrote:
>>
>>  Gilles,
>>
>> The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure
>> arguments.
>>
>> While it may be overkill, I configured 1.8.4rc4 with
>>
>>    CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
>>    LDFLAGS="-mt" --with-wrapper-ldflags="-mt"
>>
>> The resulting run worked!
>>
>> So, I very strongly suspect that the problem will be resolved if one
>> restores the configure logic that my previous email shows has vanished
>> (since that would restore "-mt" to CFLAGS and wrapper cflags).
>>
>> -Paul
>>
>> On Tue, Dec 16, 2014 at 8:10 PM, Paul Hargrove wrote:
>>
>>  My 1.8.3 build has not completed.
>> HOWEVER, I can already see a key 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
I did run the nightly and it SEGVs in hwloc!
I will provide more info when I am able.
-Paul

On Tue, Dec 16, 2014 at 10:59 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
>
>  Thanks Paul !
>
> imho the first test is useless since it does not include the commit that
> sets the -D_REENTRANT CFLAGS on solaris/solarisstudio
>
> https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83
>
> Cheers,
>
> Gilles

Re: [OMPI devel] OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Jeff Squyres (jsquyres)
Turns out that this problem was caused by not having a Fortran compiler. I
fixed that in
https://github.com/open-mpi/ompi-release/commit/b90c8142d343b12cbcc1023cb767801ea2d567a4.

There are still two other minor problems (a cleanfile and a conditional source
include); working on those...


On Dec 17, 2014, at 6:51 AM, Gilles Gouaillardet wrote:

> I was unable to reproduce this on a rhel6-like system with both stock gcc
> 4.8.x and gcc 4.9.1.
> 
> Was libtool updated on the ompi server?
> 2.4.2 works fine for me
> 
> Cheers,
> 
> Gilles
> 
> 
> Ralph Castain  wrote:
> It is breaking the automated nightly tarball build - see the error email that 
> came out earlier:
> 
>  PPFC libmpi_mpifh_sizeof_la-sizeof-mpif08-pre-1.8.4_f.lo
> libtool: compile: unrecognized option 
> `-I../../../../ompi/mpi/fortran/use-mpi-tkr'
> libtool: compile: Try `libtool --help' for more information.
> libtool: compile: unrecognized option `-DHAVE_CONFIG_H'
> libtool: compile: Try `libtool --help' for more information.
> make[4]: *** [libmpi_mpifh_sizeof_la-sizeof-mpi-pre-1.8.4_f.lo] Error 1
> make[4]: *** Waiting for unfinished jobs
> make[4]: *** [libmpi_mpifh_sizeof_la-sizeof-mpif08-pre-1.8.4_f.lo] Error 1
> make[4]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.8/ompi-2014-12-16-211833/ompi/openmpi-v1.8.3-305-ge3ae27d/_build/ompi/mpi/fortran/mpif-h'
> make[3]: *** [all-recursive] Error 1
> make[3]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.8/ompi-2014-12-16-211833/ompi/openmpi-v1.8.3-305-ge3ae27d/_build/ompi/mpi/fortran/mpif-h'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.8/ompi-2014-12-16-211833/ompi/openmpi-v1.8.3-305-ge3ae27d/_build/ompi'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.8/ompi-2014-12-16-211833/ompi/openmpi-v1.8.3-305-ge3ae27d/_build'
> make: *** [distcheck] Error 1
> ===
> 
> 
> On Wed, Dec 17, 2014 at 12:37 AM, Gilles Gouaillardet wrote:
> Ralph,
> 
> what goes wrong ?
> (e.g. which command ?)
> 
> and which compiler (e.g. gcc < 4.9.1 ?) are you using ?
> 
> Cheers,
> 
> Gilles
> 
> 
> On 2014/12/17 17:30, Ralph Castain wrote:
>> I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly
>> tarball as (ahem) Jeff's fortran commit broke the build system. I tried to
>> figure out a fix, but am too tired to get it right.
>> 
>> So I'm afraid we are stuck for the moment until Jeff returns in the morning
>> and fixes the problem. We'll have to pick this up afterwards.
>> 
>> Sorry guys
>> Ralph

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph,

what goes wrong ?
(e.g. which command ?)

and which compiler (e.g. gcc < 4.9.1 ?) are you using ?

Cheers,

Gilles


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Ralph Castain
I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly
tarball as (ahem) Jeff's fortran commit broke the build system. I tried to
figure out a fix, but am too tired to get it right.

So I'm afraid we are stuck for the moment until Jeff returns in the morning
and fixes the problem. We'll have to pick this up afterwards.

Sorry guys
Ralph



Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Thanks Paul !

imho the first test is useless since it does not include the commit that
sets the -D_REENTRANT CFLAGS on solaris/solarisstudio
https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83

Cheers,

Gilles


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
I've queued 3 tests:

1) openmpi-v1.8.3-272-g4e4f997
2) openmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags
3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags

I hope to be able to log in and collect the results around noon Pacific time
on Wed.

-Paul

On Tue, Dec 16, 2014 at 10:48 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
>
>  Paul,
>
> I understand; I will now work on a better way to figure out the required
> flags.
>
> The latest nightly snapshot does not include the commit I mentioned, and
> I think it is worth giving it a try (to be 100.0% sure ...)
>
> Can you please do that tomorrow?
>
> In the meantime, if we (well, Ralph really) want to release 1.8.4, then
> simply restore the two config files I mentioned.
>
> Cheers,
>
> Gilles
>
>

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Gilles,

If I have done my testing correctly (not 100% sure) then adding
"-D_REENTRANT" was NOT sufficient, where "-mt" was.

I can at least test 1 tarball with one set of configure args each evening.
Anything more than that I cannot commit to.

My scripts are capable of grabbing the v1.8 nightly instead of the rc if
that helps.

-Paul


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph,

I think that will not work.

Here is the full story:

Once upon a time, on Solaris, we did not try to compile pthread'ed apps
without any special parameters.
That was a minor annoyance on Solaris 10 with old gcc: configure passed
a flag (-pthread if I remember correctly)
that was not supported by gcc (at that time) and generated tons of warnings.
When I asked "why don't we just try no special parameters on Solaris?", I
was told this was because a looong time ago
Open MPI used Solaris LWPs, so Solaris was "special" anyway.
Since Solaris is able to build (compile+link) a pthread'ed app without
any flags, I removed the special case for Solaris,
and no flag was used.
Then I noticed that this led to bad code (errno is global instead of
thread-specific), so -D_REENTRANT was automatically added
on Solaris (i.e. if the __sun__ macro is defined).
Then I found that the Solaris Studio compilers do not define the __sun__ macro
automatically (__sun and sun are defined), so I improved
the test (i.e. we are on Solaris if __sun__ or __sun is defined).
This was merged (yesterday) and is not in rc4.

What we should do now is unclear to me...
Is -D_REENTRANT enough for gcc compilers on Solaris?
Is -D_REENTRANT *not* enough for Solaris Studio compilers on Solaris?
/* if -D_REENTRANT is *not* enough, then all we have to do is use -mt,
since that implies -D_REENTRANT */
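The preprocessor half of this question can be checked mechanically. Below is a minimal POSIX shell sketch, assuming only what this thread states (that -mt implies -D_REENTRANT). reentrant_in_effect is a hypothetical helper, not part of Open MPI's configure, and it says nothing about the link step, which is the other half of what -mt does according to the "man cc" reading elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI's configure): print "yes" if the
# given C compiler flags define _REENTRANT, either directly or through -mt
# (which, per this thread, implies -D_REENTRANT); print "no" otherwise.
# Only the preprocessor side is checked; link flags are not inspected.
reentrant_in_effect() {
  for rf_flag in $1; do   # unquoted on purpose: split the flag string into words
    case "$rf_flag" in
      -D_REENTRANT|-D_REENTRANT=*|-mt)
        echo yes
        return 0 ;;
    esac
  done
  echo no
  return 1
}

# Example: the flags Paul used for his working 1.8.4rc4 build.
reentrant_in_effect "-m64 -mt"
```

For example, reentrant_in_effect "$CFLAGS" prints yes for "-m64 -mt" and no for plain "-m64". Paul's report in this thread that -D_REENTRANT alone was NOT sufficient suggests the link side matters too, which this sketch deliberately does not model.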


A working solution (minus the minor annoyance I described earlier) is to
restore
config/opal_check_os_flavors.m4
config/ompi_config_pthreads.m4

and then I'll find a better way to correctly set the flags that must be
used on Solaris.

That being said, and based on Paul's availability, I'd rather have a new
tarball (rc5?) tested.
(Do we *really* need -mt? Isn't -D_REENTRANT enough?)
This tarball must include
https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83

configury: test the __sun macro to detect solaris OS.


FWIW, I was unable to reproduce the problem on Solaris 11 with Solaris Studio
12.4, even when using neither -D_REENTRANT *nor* -mt (!)

Cheers,

Gilles

On 2014/12/17 15:01, Ralph Castain wrote:
> Hi Paul
>
> Can you try the attached patch? It would require running autogen, I fear.
> Otherwise, I can add it to the tarball.
>
> Ralph
>
>
> On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove  wrote:
>> Gilles,
>>
>> The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure
>> arguments.
>>
>> While it may be overkill, I configured 1.8.4rc4 with
>>
>>    CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
>>    LDFLAGS="-mt" --with-wrapper-ldflags="-mt"
>>
>> The resulting run worked!
>>
>> So, I very strongly suspect that the problem will be resolved if one
>> restores the configure logic that my previous email shows has vanished
>> (since that would restore "-mt" to CFLAGS and wrapper cflags).
>>
>> -Paul
>>
>> On Tue, Dec 16, 2014 at 8:10 PM, Paul Hargrove  wrote:
>>> My 1.8.3 build has not completed.
>>> HOWEVER, I can already see a key difference in the configure step.
>>>
>>> In 1.8.3 "-mt" was added AUTOMATICALLY to CFLAGS by configure:
>>>
>>> checking if C compiler and POSIX threads work as is... no - Solaris, not checked
>>> checking if C++ compiler and POSIX threads work as is... no - Solaris, not checked
>>> checking if Fortran compiler and POSIX threads work as is... no - Solaris, not checked
>>> checking if C compiler and POSIX threads work with -pthread... no
>>> checking if C compiler and POSIX threads work with -pthreads... no
>>> checking if C compiler and POSIX threads work with -mt... yes
>>> checking if C++ compiler and POSIX threads work with -pthread... yes
>>> checking if Fortran compiler and POSIX threads work with -pthread... yes
>>>
>>> This is not the case in 1.8.4rc4:
>>>
>>> checking if C compiler and POSIX threads work as is... yes
>>> checking if C++ compiler and POSIX threads work as is... yes
>>> checking if Fortran compiler and POSIX threads work as is... yes
>>>
>>>
>>> So, it looks like a chunk of Solaris-specific configure logic was LOST.
>>>
>>> -Paul
>>>
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16625.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16626.php



Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Ralph,

No change with the patch you supplied.

The test that uses the "pflags" set by your patch is guarded by the value
of ompi_pthread_c_success.
So, I think there must be some other patch needed to the body
of OMPI_INTL_POSIX_THREADS_PLAIN_C to even reach the code changed by the
patch you sent me.  It looks like 1.8.3 SKIPPED the corresponding test on
Solaris, but 1.8.4rc4 no longer does.
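The guard described here can be modeled in a few lines of POSIX shell. This is a simplified, hypothetical sketch, not the real OMPI_INTL_POSIX_THREADS_SPECIAL_FLAGS_C macro: the variable name ompi_pthread_c_success is borrowed from configure, the host strings are illustrative, and the two flag lists are copied from Ralph's patch in this thread. It shows why changing pflags alone has no visible effect once the plain ("as is") test has succeeded.

```shell
#!/bin/sh
# Simplified, hypothetical model of the special-flags step: the candidate
# flags are only tried when the plain ("as is") test failed, i.e. when
# ompi_pthread_c_success is "0". Flag lists are copied from Ralph's patch.
special_flags_to_try() {
  ompi_pthread_c_success=$1   # "1" if the plain test passed, "0" otherwise
  host=$2                     # stands in for "${host_cpu}-${host_os}"
  if test "$ompi_pthread_c_success" = "0"; then
    case "$host" in
      *solaris*) pflags="-pthread -pthreads -mt" ;;
      *)         pflags="-Kthread -kthread -pthread -pthreads -mt -mthreads" ;;
    esac
    printf '%s\n' "$pflags"
  fi
}

# rc4-like case: the plain test passed, so nothing is tried (empty output).
special_flags_to_try 1 x86_64-solaris2.11
# 1.8.3-like case: the plain test was skipped on Solaris, so success stays 0.
special_flags_to_try 0 x86_64-solaris2.11
```

In this model the only way to reach the Solaris flag list is to leave the success variable at 0, which is what skipping the plain test did in 1.8.3.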

-Paul

On Tue, Dec 16, 2014 at 10:01 PM, Ralph Castain  wrote:
>
> Hi Paul
>
> Can you try the attached patch? It would require running autogen, I fear.
> Otherwise, I can add it to the tarball.
>
> Ralph
>
>
> On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove  wrote:
>
>> Gilles,
>>
>> The 1.8.3 test works where the 1.8.4rc4 one fails with identical
>> configure arguments.
>>
>> While it may be overkill, I configured 1.8.4rc4 with
>>
>>CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
>>LDFLAGS="-mt" --with-wrapper-ldflags="-mt"
>>
>> The resulting run worked!
>>
>> So, I very strongly suspect that the problem will be resolved if one
>> restores the configure logic that my previous email shows has vanished
>> (since that would restore "-mt" to CFLAGS and wrapper cflags).
>>
>> -Paul
>>
>> On Tue, Dec 16, 2014 at 8:10 PM, Paul Hargrove 
>> wrote:
>>>
>>> My 1.8.3 build has not completed.
>>> HOWEVER, I can already see a key difference in the configure step.
>>>
>>> In 1.8.3 "-mt" was added AUTOMATICALLY to CFLAGS by configure:
>>>
>>> checking if C compiler and POSIX threads work as is... no - Solaris, not
>>> checked
>>> checking if C++ compiler and POSIX threads work as is... no - Solaris,
>>> not checked
>>> checking if Fortran compiler and POSIX threads work as is... no -
>>> Solaris, not checked
>>> checking if C compiler and POSIX threads work with -pthread... no
>>> checking if C compiler and POSIX threads work with -pthreads... no
>>> checking if C compiler and POSIX threads work with -mt... yes
>>> checking if C++ compiler and POSIX threads work with -pthread... yes
>>> checking if Fortran compiler and POSIX threads work with -pthread... yes
>>>
>>> This is not the case in 1.8.4rc4:
>>>
>>> checking if C compiler and POSIX threads work as is... yes
>>> checking if C++ compiler and POSIX threads work as is... yes
>>> checking if Fortran compiler and POSIX threads work as is... yes
>>>
>>>
>>> So, it looks like a chunk of Solaris-specific configure logic was LOST.
>>>
>>> -Paul
>>>
>>
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16625.php
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16626.php
>


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Ralph Castain
Hi Paul

Can you try the attached patch? It would require running autogen, I fear.
Otherwise, I can add it to the tarball.

Ralph


diff --git a/config/ompi_config_pthreads.m4 b/config/ompi_config_pthreads.m4
index 3f073d8..1ff8ada 100644
--- a/config/ompi_config_pthreads.m4
+++ b/config/ompi_config_pthreads.m4
@@ -267,6 +267,14 @@ AC_DEFUN([OMPI_INTL_POSIX_THREADS_SPECIAL_FLAGS_C], [
 #
 # C compiler
 #
+case "${host_cpu}-${host_os}" in
+  *solaris*)
+pflags="-pthread -pthreads -mt"
+  ;;
+  *)
+pflags="-Kthread -kthread -pthread -pthreads -mt -mthreads"
+  ;;
+esac
 if test "$ompi_pthread_c_success" = "0"; then
   for pf in $pflags; do
 AC_MSG_CHECKING([if C compiler and POSIX threads work with $pf])
@@ -294,6 +302,14 @@ AC_DEFUN([OMPI_INTL_POSIX_THREADS_SPECIAL_FLAGS_CXX], [
 # C++ compiler
 #
 if test "$ompi_pthread_cxx_success" = "0"; then
+  case "${host_cpu}-${host_os}" in
+*solaris*)
+  pflags="-pthread -pthreads -mt"
+;;
+*)
+  pflags="-Kthread -kthread -pthread -pthreads -mt -mthreads"
+;;
+  esac
   for pf in $pflags; do
 AC_MSG_CHECKING([if C++ compiler and POSIX threads work with $pf])
 CXXFLAGS="$orig_CXXFLAGS $pf"
@@ -320,6 +336,14 @@ AC_DEFUN([OMPI_INTL_POSIX_THREADS_SPECIAL_FLAGS_FC], [
 # Fortran compiler
 #
 if test "$ompi_pthread_fortran_success" = "0" -a "$OMPI_WANT_FORTRAN_BINDINGS" = "1" -a $ompi_fortran_happy -eq 1; then
+  case "${host_cpu}-${host_os}" in
+*solaris*)
+  pflags="-pthread -pthreads -mt"
+;;
+*)
+  pflags="-Kthread -kthread -pthread -pthreads -mt -mthreads"
+;;
+  esac
   for pf in $pflags; do
 AC_MSG_CHECKING([if Fortran compiler and POSIX threads work with $pf])
 FCFLAGS="$orig_FCFLAGS $pf"


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Gilles,

The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure
arguments.

While it may be overkill, I configured 1.8.4rc4 with

   CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
   LDFLAGS="-mt" --with-wrapper-ldflags="-mt"

The resulting run worked!

So, I very strongly suspect that the problem will be resolved if one
restores the configure logic that my previous email shows has vanished
(since that would restore "-mt" to CFLAGS and wrapper cflags).

-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Paul Hargrove
My 1.8.3 build has not completed.
HOWEVER, I can already see a key difference in the configure step.

In 1.8.3 "-mt" was added AUTOMATICALLY to CFLAGS by configure:

checking if C compiler and POSIX threads work as is... no - Solaris, not checked
checking if C++ compiler and POSIX threads work as is... no - Solaris, not checked
checking if Fortran compiler and POSIX threads work as is... no - Solaris, not checked
checking if C compiler and POSIX threads work with -pthread... no
checking if C compiler and POSIX threads work with -pthreads... no
checking if C compiler and POSIX threads work with -mt... yes
checking if C++ compiler and POSIX threads work with -pthread... yes
checking if Fortran compiler and POSIX threads work with -pthread... yes

This is not the case in 1.8.4rc4:

checking if C compiler and POSIX threads work as is... yes
checking if C++ compiler and POSIX threads work as is... yes
checking if Fortran compiler and POSIX threads work as is... yes


So, it looks like a chunk of Solaris-specific configure logic was LOST.
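To make this comparison repeatable, the selected flag can be pulled out of saved configure output. A minimal POSIX shell sketch, assuming the message format shown above; chosen_pthread_flag is a hypothetical helper that prints "as-is" if the plain test passed, the first flag reported as working, or "none".

```shell
#!/bin/sh
# Hypothetical helper: read configure output on stdin and print the pthread
# flag chosen for one language ("C", "C++" or "Fortran"): "as-is" if the plain
# test passed, the first flag reported as "... yes", or "none".
chosen_pthread_flag() {
  cpf_prefix="checking if $1 compiler and POSIX threads work"
  while IFS= read -r cpf_line; do
    case "$cpf_line" in
      "$cpf_prefix as is... yes")
        echo as-is
        return 0 ;;
      "$cpf_prefix with "*"... yes")
        cpf_flag=${cpf_line#"$cpf_prefix with "}   # drop the fixed prefix
        cpf_flag=${cpf_flag%"... yes"}              # drop the result suffix
        printf '%s\n' "$cpf_flag"
        return 0 ;;
    esac
  done
  echo none
  return 1
}

# Example using two of the 1.8.3 lines shown in this message; prints "-mt".
printf '%s\n' \
  'checking if C compiler and POSIX threads work with -pthread... no' \
  'checking if C compiler and POSIX threads work with -mt... yes' |
  chosen_pthread_flag C
```

For example, after saving the configure output to a file (the name configure.out is arbitrary), chosen_pthread_flag C < configure.out would be expected to print -mt for the 1.8.3 build and as-is for 1.8.4rc4, based on the lines above.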

-Paul


Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
Paul,

I do not think -lpthread is passed automatically to LDFLAGS on Solaris,
so you might have to add it manually as well.

I have never used

--with-wrapper-cflags

before, so I'd rather invite you to run
mpicc -show
to make sure the right flags are passed in the right place when the app
is built.
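A small POSIX shell sketch of that check, assuming the wrapper's command line has been captured as a string (for example with "$(mpicc -show)"). has_flag is a hypothetical helper that compares whole words, so -mt is not confused with a longer flag that merely starts with those letters.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $2 appears as a separate word in the
# command line $1 (for example the output of "mpicc -show").
has_flag() {
  for hf_word in $1; do   # unquoted on purpose: split into words
    if [ "$hf_word" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Example on a made-up command line (not real mpicc output).
if has_flag "cc -m64 -mt -I/opt/ompi/include" -mt; then
  echo "wrapper passes -mt"
fi
```

In practice one would write something like has_flag "$(mpicc -show)" -mt || echo "mpicc does not pass -mt".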

Cheers,

Gilles

On 2014/12/17 12:04, Paul Hargrove wrote:
> Gilles,
>
> I am running the build of 1.8.3 first.
> As you suggest, I will only try without -m64 if 1.8.3 runs with it.
>
> Regarding "-mt" my understanding from "man cc" is that it has a DUAL
> function:
> 1) Passes -D_REENTRANT to the preprocess stage (if any)
> 2) Passes "the right flags" to the linker stage (if any)
>
> NOTE that since Solaris has historically supported a "native" (non-POSIX)
> threads library, when linking for pthreads one must pass BOTH "-mt" and
> "-lpthread", and they must be in that order.
>
> I *think* I have already tried adding "-mt" to both CFLAGS and LDFLAGS, but
> am not 100% sure I've done so correctly.  I believe I need to configure with
>
>    CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
>    LDFLAGS="-mt" --with-wrapper-ldflags="-mt"
>
> if I am to be sure that orterun and the app are both compiled and linked
> with "-mt".
> Is that right?
>
> -Paul
>
> On Tue, Dec 16, 2014 at 6:25 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>>  Thanks Paul,
>>
>> If 1.8.3 with -m64 and the same compilers runs fine, then please do not
>> bother running 1.8.4rc4 without -m64.
>> /* I understand you are busy, and I doubt -m64 is the root cause */
>>
>> A regression I can think of involves the flags we use for pthreads:
>> for the wrong reasons, we initially tested the following flags on Solaris:
>> -pthread
>> -pthreads
>> -mt
>>
>> With Solaris Studio 12.4, -mt was chosen.
>>
>> 1.8.4rc4 has a bug (fixed in the v1.8 git): -D_REENTRANT is not
>> automatically added, so you have to add it manually.
>> I just figured out that -mt is likely not added automatically either.
>> Do we need it, and where?
>> CFLAGS? (or is -D_REENTRANT enough?)
>> LDFLAGS? (that might be Solaris and/or Solaris Studio (12.4) specific, and
>> I simply do not know)
>>
>> Bottom line, I invite you to test 1.8.4rc4 again with
>> CFLAGS="-mt"
>> or
>> CFLAGS="-mt -m64"
>> if you previously tested 1.8.3 with -m64.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>>
>> On 2014/12/17 11:05, Paul Hargrove wrote:
>>
>> Gilles,
>>
>> First, please note that prior tests of 1.8.3 ran with no problems on these
>> hosts.
>> So, I *think* this problem is a regression.
>> However, I am not 100% certain that this *exact* configuration was tested.
>> So, I am RE-running a test of 1.8.3 now to be absolutely sure whether this is a
>> regression.
>> I will report the outcome when I can.
>>
>> I have limited time to run the tests you are asking for.  I will do my
>> best, but am concerned that I won't be responsive enough and may hold up
>> the release.  I fully understand why you ask multiple questions in one
>> email to keep things moving.
>>
>> I am running mpirun on pcp-j-20 and "getent hosts pcp-j-20" run there yields
>>
>> $ getent hosts pcp-j-20
>> 127.0.0.1   pcp-j-20 pcp-j-20.local localhost loghost
>> 172.16.0.120pcp-j-20 pcp-j-20.local localhost loghost
>>
>> In case it matters: there is an entry for 172.18.0.120 in /etc/hosts as
>> pcp-j-20-ib.
>>
>> I will run a test tonight to determine if the same issue is present without
>> "-m64".
>> I will report the outcome when I can.
>>
>> Yes, I can ping and ssh to "pcp-j-{19,20}" and "172.{16,18}.0.{119,120}".
>> I see the following if run on either pcp-j-19 or pcp-j-20:
>>
>> $ for x in {pcp-j-,172.{16,18}.0.1}{19,20}; do ssh $x echo OK connecting to $x; done
>> OK connecting to pcp-j-19
>> OK connecting to pcp-j-20
>> OK connecting to 172.16.0.119
>> OK connecting to 172.16.0.120
>> OK connecting to 172.18.0.119
>> OK connecting to 172.18.0.120
>>
>>
>> I will report on the 1.8.3 and the non-m64 runs when they are done.
>> Meanwhile, if you have other things you want run let me know.
>>
>> -Paul
>>
>> On Tue, Dec 16, 2014 at 5:35 PM, Gilles Gouaillardet 
>>  wrote:
>>
>>  Thanks Paul,
>>
>> Are you invoking mpirun on pcp-j-20?
>> If yes, what does
>> getent hosts pcp-j-20
>> say?
>>
>> BTW, did you try without -m64?
>>
>> Does the following work?
>> ping/ssh 172.18.0.120
>>
>> Honestly, this output makes very little sense to me, so I am asking for way
>> too much info, hoping I can reproduce this issue or get a hint about what
>> could possibly be going wrong.
>>
>> Cheers,
>>
>> Gilles
>>
>> Paul Hargrove   wrote:
>> Gilles,
>>
>> I am running mpirun on a host that ALSO will run one of the application
>> processes.
>> Requested ifconfig and netstat outputs appear below.
>>
>> -Paul
>>
>> [phargrov@pcp-j-20 ~]$ ifconfig -a
>> lo0: flags=2001000849 mtu 8232
>> index 1
>> inet 127.0.0.1 netmask ff00
>> bge0: 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Paul Hargrove
Gilles,

I am running the build of 1.8.3 first.
As you suggest, I will only try without -m64 if 1.8.3 runs with it.

Regarding "-mt" my understanding from "man cc" is that it has a DUAL
function:
1) Passes -D_REENTRANT to the preprocess stage (if any)
2) Passes "the right flags" to the linker stage (if any)

NOTE that since Solaris has historically supported a "native" (non-POSIX)
threads library, when linking for pthreads one must pass BOTH "-mt" and
"-lpthread", and they must be in that order.

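If that ordering requirement holds (it is Paul's reading of "man cc", not something verified here), it can be checked on a link line such as the one printed by mpicc -show. A minimal POSIX shell sketch; mt_before_lpthread is a hypothetical helper that succeeds only when -mt appears before -lpthread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "-mt" appears before "-lpthread" in
# the given link command line; fail if -lpthread is missing or comes first.
mt_before_lpthread() {
  mbl_seen_mt=no
  for mbl_word in $1; do   # unquoted on purpose: split into words
    case "$mbl_word" in
      -mt) mbl_seen_mt=yes ;;
      -lpthread)
        [ "$mbl_seen_mt" = yes ]   # the result of this test is returned
        return ;;
    esac
  done
  return 1
}

# Example on made-up link lines (not real wrapper output).
mt_before_lpthread "cc -m64 -mt app.o -o app -lpthread" && echo "order OK"
mt_before_lpthread "cc -m64 app.o -o app -lpthread -mt" || echo "wrong order"
```

A bare return inside a shell function returns the status of the last command, so the helper reports the result of the ordering test directly.
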
I *think* I have already tried adding "-mt" to both CFLAGS and LDFLAGS, but
am not 100% sure I've done so correctly.  I believe I need to configure with

   CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
   LDFLAGS="-mt" --with-wrapper-ldflags="-mt"

if I am to be sure that orterun and the app are both compiled and linked
with "-mt".
Is that right?

-Paul

On Tue, Dec 16, 2014 at 6:25 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
>
>  Thanks Paul,
>
> If 1.8.3 with -m64 and the same compilers runs fine, then please do not
> bother running 1.8.4rc4 without -m64.
> /* I understand you are busy, and I doubt -m64 is the root cause */
>
> A regression I can think of involves the flags we use for pthreads.
> For bad reasons, we initially tested the following flags on Solaris:
> -pthread
> -pthreads
> -mt
>
> With Solaris Studio 12.4, -mt was chosen.
>
> 1.8.4rc4 has a bug (fixed in the v1.8 git): -D_REENTRANT is not
> automatically added, so you have to add it manually.
> I just realized that -mt is likely not added automatically either.
> Do we need it, and where?
> In CFLAGS? (Or is -D_REENTRANT enough?)
> In LDFLAGS? (That might be Solaris- and/or Solaris Studio 12.4-specific,
> and I simply do not know.)
>
> Bottom line, I invite you to test 1.8.4rc4 again with
> CFLAGS="-mt"
> or
> CFLAGS="-mt -m64"
> if you previously tested 1.8.3 with -m64.
>
> Cheers,
>
> Gilles
>
>
>
> On 2014/12/17 11:05, Paul Hargrove wrote:
>
> Gilles,
>
> First, please note that prior tests of 1.8.3 ran with no problems on these
> hosts.
> So, I *think* this problem is a regression.
> However, I am not 100% certain that this *exact* configuration was tested.
> So, I am RE-running a test of 1.8.3 now to be absolutely sure if this is a
> regression.
> I will report the outcome when I can.
>
> I have limited time to run the tests you are asking for.  I will do my
> best, but am concerned that I won't be responsive enough and may hold up
> the release.  I fully understand why you ask multiple questions in one
> email to keep things moving.
>
> I am running mpirun on pcp-j-20 and "getent hosts pcp-j-20" run there yields
>
> $ getent hosts pcp-j-20
> 127.0.0.1   pcp-j-20 pcp-j-20.local localhost loghost
> 172.16.0.120    pcp-j-20 pcp-j-20.local localhost loghost
>
> In case it matters: there is an entry for 172.18.0.120 in /etc/hosts as
> pcp-j-20-ib.
>
> I will run a test tonight to determine if the same issue is present without
> "-m64".
> I will report the outcome when I can.
>
> Yes, I can ping and ssh to "pcp-j-{19,20}" and "172.{16,18}.0.{119,120}".
> I see the following if run on either pcp-j-19 or pcp-j-20:
>
> $ for x in {pcp-j-,172.{16,18}.0.1}{19,20}; do ssh $x echo OK connecting to $x; done
> OK connecting to pcp-j-19
> OK connecting to pcp-j-20
> OK connecting to 172.16.0.119
> OK connecting to 172.16.0.120
> OK connecting to 172.18.0.119
> OK connecting to 172.18.0.120
>
>
> I will report on the 1.8.3 and the non-m64 runs when they are done.
> Meanwhile, if you have other things you want run let me know.
>
> -Paul
>
> On Tue, Dec 16, 2014 at 5:35 PM, Gilles Gouaillardet 
>  wrote:
>
>  Thanks Paul,
>
> Are you invoking mpirun on pcp-j-20?
> If so, what does
> getent hosts pcp-j-20
> say?
>
> BTW, did you try without -m64?
>
> Does the following work?
> ping/ssh 172.18.0.120
>
> Honestly, this output makes very little sense to me, so I am asking for
> a lot of information in the hope that I can reproduce this issue or get
> a hint about what could be going wrong.
>
> Cheers,
>
> Gilles
>
> Paul Hargrove   wrote:
> Gilles,
>
> I am running mpirun on a host that ALSO will run one of the application
> processes.
> Requested ifconfig and netstat outputs appear below.
>
> -Paul
>
> [phargrov@pcp-j-20 ~]$ ifconfig -a
> lo0: flags=2001000849 mtu 8232 index 1
> inet 127.0.0.1 netmask ff00
> bge0: flags=1004843 mtu 1500 index 2
> inet 172.16.0.120 netmask  broadcast 172.16.255.255
> p.ibp0: flags=1001000843 mtu 2044 index 3
> inet 172.18.0.120 netmask  broadcast 172.18.255.255
> lo0: flags=2002000849 mtu 8252 index 1
> inet6 ::1/128
> bge0: flags=20002004841 mtu 1500 index 2
> inet6 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
Thanks Paul,

If 1.8.3 with -m64 and the same compilers runs fine, then please do not
bother running 1.8.4rc4 without -m64.
/* I understand you are busy, and I doubt -m64 is the root cause */

A regression I can think of involves the flags we use for pthreads.
For bad reasons, we initially tested the following flags on Solaris:
-pthread
-pthreads
-mt

With Solaris Studio 12.4, -mt was chosen.

1.8.4rc4 has a bug (fixed in the v1.8 git): -D_REENTRANT is not
automatically added, so you have to add it manually.
I just realized that -mt is likely not added automatically either.
Do we need it, and where?
In CFLAGS? (Or is -D_REENTRANT enough?)
In LDFLAGS? (That might be Solaris- and/or Solaris Studio 12.4-specific,
and I simply do not know.)
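As for why a missing -D_REENTRANT could matter at all: our understanding (not confirmed anywhere in this thread) is that on Solaris, <errno.h> maps errno to a per-thread location only when _REENTRANT or a related macro is defined, so code built without it may read a shared, global errno from a multithreaded process. That is one plausible way to end up with a message like "accept() failed: Error 0 (11)". A minimal probe, assuming POSIX threads; the function names are hypothetical:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Address of errno as seen by the thread that calls errno_is_per_thread(). */
static int *caller_errno_addr;

/* Worker thread: records whether its errno lives at a different address
 * than the caller's errno, i.e. whether each thread has its own errno. */
static void *compare_errno_addr(void *arg)
{
    int *distinct = arg;
    *distinct = (&errno != caller_errno_addr);
    return NULL;
}

/* Returns 1 if errno is per-thread, 0 if all threads share one errno
 * (assumed here to be the Solaris behavior without _REENTRANT), or -1
 * if the worker thread could not be created or joined. */
static int errno_is_per_thread(void)
{
    pthread_t tid;
    int distinct = 0;

    caller_errno_addr = &errno;   /* published before the thread starts */
    if (pthread_create(&tid, NULL, compare_errno_addr, &distinct) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)   /* join makes 'distinct' visible */
        return -1;
    return distinct;
}
```

It has to be built with the threading flags under discussion (for example `cc -mt ... -lpthread` with the Studio compilers, or `gcc -pthread`). On glibc-based Linux errno is always per-thread, so the probe should return 1 there regardless of -D_REENTRANT.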

Bottom line, I invite you to test 1.8.4rc4 again with
CFLAGS="-mt"
or
CFLAGS="-mt -m64"
if you previously tested 1.8.3 with -m64.

Cheers,

Gilles


On 2014/12/17 11:05, Paul Hargrove wrote:
> Gilles,
>
> First, please note that prior tests of 1.8.3 ran with no problems on these
> hosts.
> So, I *think* this problem is a regression.
> However, I am not 100% certain that this *exact* configuration was tested.
> So, I am RE-running a test of 1.8.3 now to be absolutely sure if this is a
> regression.
> I will report the outcome when I can.
>
> I have limited time to run the tests you are asking for.  I will do my
> best, but am concerned that I won't be responsive enough and may hold up
> the release.  I fully understand why you ask multiple questions in one
> email to keep things moving.
>
> I am running mpirun on pcp-j-20 and "getent hosts pcp-j-20" run there yields
>
> $ getent hosts pcp-j-20
> 127.0.0.1   pcp-j-20 pcp-j-20.local localhost loghost
> 172.16.0.120    pcp-j-20 pcp-j-20.local localhost loghost
>
> In case it matters: there is an entry for 172.18.0.120 in /etc/hosts as
> pcp-j-20-ib.
>
> I will run a test tonight to determine if the same issue is present without
> "-m64".
> I will report the outcome when I can.
>
> Yes, I can ping and ssh to "pcp-j-{19,20}" and "172.{16,18}.0.{119,120}".
> I see the following if run on either pcp-j-19 or pcp-j-20:
>
> $ for x in {pcp-j-,172.{16,18}.0.1}{19,20}; do ssh $x echo OK connecting to $x; done
> OK connecting to pcp-j-19
> OK connecting to pcp-j-20
> OK connecting to 172.16.0.119
> OK connecting to 172.16.0.120
> OK connecting to 172.18.0.119
> OK connecting to 172.18.0.120
>
>
> I will report on the 1.8.3 and the non-m64 runs when they are done.
> Meanwhile, if you have other things you want run let me know.
>
> -Paul
>
> On Tue, Dec 16, 2014 at 5:35 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>> Thanks Paul,
>>
>> Are you invoking mpirun on pcp-j-20?
>> If so, what does
>> getent hosts pcp-j-20
>> say?
>>
>> BTW, did you try without -m64?
>>
>> Does the following work?
>> ping/ssh 172.18.0.120
>>
>> Honestly, this output makes very little sense to me, so I am asking for
>> a lot of information in the hope that I can reproduce this issue or get
>> a hint about what could be going wrong.
>>
>> Cheers,
>>
>> Gilles
>>
>> Paul Hargrove  wrote:
>> Gilles,
>>
>> I am running mpirun on a host that ALSO will run one of the application
>> processes.
>> Requested ifconfig and netstat outputs appear below.
>>
>> -Paul
>>
>> [phargrov@pcp-j-20 ~]$ ifconfig -a
>> lo0: flags=2001000849 mtu 8232 index 1
>> inet 127.0.0.1 netmask ff00
>> bge0: flags=1004843 mtu 1500 index 2
>> inet 172.16.0.120 netmask  broadcast 172.16.255.255
>> p.ibp0: flags=1001000843 mtu 2044 index 3
>> inet 172.18.0.120 netmask  broadcast 172.18.255.255
>> lo0: flags=2002000849 mtu 8252 index 1
>> inet6 ::1/128
>> bge0: flags=20002004841 mtu 1500 index 2
>> inet6 fe80::250:45ff:fe5c:2b0/10
>> [phargrov@pcp-j-20 ~]$ netstat -nr
>>
>> Routing Table: IPv4
>>   Destination           Gateway           Flags  Ref     Use     Interface
>> -------------------- -------------------- ----- ----- ---------- ---------
>> default              172.16.254.1         UG        2     158463 bge0
>> 127.0.0.1            127.0.0.1            UH        5     398913 lo0
>> 172.16.0.0           172.16.0.120         U         4  135241319 bge0
>> 172.18.0.0           172.18.0.120         U         3         26 p.ibp0
>>
>> Routing Table: IPv6
>>   Destination/Mask            Gateway                   Flags Ref   Use    If
>> --------------------------- --------------------------- ----- --- ------- -----
>> ::1                         ::1                         UH      2       0 lo0
>> fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
>>
>> On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Paul Hargrove
Gilles,

First, please note that prior tests of 1.8.3 ran with no problems on these
hosts.
So, I *think* this problem is a regression.
However, I am not 100% certain that this *exact* configuration was tested.
So, I am RE-running a test of 1.8.3 now to be absolutely sure if this is a
regression.
I will report the outcome when I can.

I have limited time to run the tests you are asking for.  I will do my
best, but am concerned that I won't be responsive enough and may hold up
the release.  I fully understand why you ask multiple questions in one
email to keep things moving.

I am running mpirun on pcp-j-20 and "getent hosts pcp-j-20" run there yields

$ getent hosts pcp-j-20
127.0.0.1   pcp-j-20 pcp-j-20.local localhost loghost
172.16.0.120    pcp-j-20 pcp-j-20.local localhost loghost

In case it matters: there is an entry for 172.18.0.120 in /etc/hosts as
pcp-j-20-ib.
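To see what a program (rather than the shell) gets when it resolves a name such as pcp-j-20, the standard getaddrinfo() call can be used. The sketch below only approximates what `getent hosts` reports and is not how ORTE itself selects interfaces; `print_ipv4_addrs` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Prints every IPv4 address that 'host' resolves to, one per line,
 * roughly like "getent hosts HOST". Returns the number of addresses
 * printed, or -1 if the lookup itself failed. */
static int print_ipv4_addrs(const char *host)
{
    struct addrinfo hints, *res, *p;
    char buf[INET_ADDRSTRLEN];
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, not per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)p->ai_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf) != NULL) {
            printf("%s\t%s\n", buf, host);
            n++;
        }
    }
    freeaddrinfo(res);
    return n;
}
```

Run on pcp-j-20 with "pcp-j-20" as the argument, it would show whether a program also sees the 127.0.0.1 entry first, which is worth knowing when a hostname maps to both loopback and a real interface.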

I will run a test tonight to determine if the same issue is present without
"-m64".
I will report the outcome when I can.

Yes, I can ping and ssh to "pcp-j-{19,20}" and "172.{16,18}.0.{119,120}".
I see the following if run on either pcp-j-19 or pcp-j-20:

$ for x in {pcp-j-,172.{16,18}.0.1}{19,20}; do ssh $x echo OK connecting to $x; done
OK connecting to pcp-j-19
OK connecting to pcp-j-20
OK connecting to 172.16.0.119
OK connecting to 172.16.0.120
OK connecting to 172.18.0.119
OK connecting to 172.18.0.120


I will report on the 1.8.3 and the non-m64 runs when they are done.
Meanwhile, if you have other things you want run let me know.

-Paul

On Tue, Dec 16, 2014 at 5:35 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
>
> Thanks Paul,
>
> Are you invoking mpirun on pcp-j-20?
> If so, what does
> getent hosts pcp-j-20
> say?
>
> BTW, did you try without -m64?
>
> Does the following work?
> ping/ssh 172.18.0.120
>
> Honestly, this output makes very little sense to me, so I am asking for
> a lot of information in the hope that I can reproduce this issue or get
> a hint about what could be going wrong.
>
> Cheers,
>
> Gilles
>
> Paul Hargrove  wrote:
> Gilles,
>
> I am running mpirun on a host that ALSO will run one of the application
> processes.
> Requested ifconfig and netstat outputs appear below.
>
> -Paul
>
> [phargrov@pcp-j-20 ~]$ ifconfig -a
> lo0: flags=2001000849 mtu 8232 index 1
> inet 127.0.0.1 netmask ff00
> bge0: flags=1004843 mtu 1500 index 2
> inet 172.16.0.120 netmask  broadcast 172.16.255.255
> p.ibp0: flags=1001000843 mtu 2044 index 3
> inet 172.18.0.120 netmask  broadcast 172.18.255.255
> lo0: flags=2002000849 mtu 8252 index 1
> inet6 ::1/128
> bge0: flags=20002004841 mtu 1500 index 2
> inet6 fe80::250:45ff:fe5c:2b0/10
> [phargrov@pcp-j-20 ~]$ netstat -nr
>
> Routing Table: IPv4
>   Destination           Gateway           Flags  Ref     Use     Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.16.254.1         UG        2     158463 bge0
> 127.0.0.1            127.0.0.1            UH        5     398913 lo0
> 172.16.0.0           172.16.0.120         U         4  135241319 bge0
> 172.18.0.0           172.18.0.120         U         3         26 p.ibp0
>
> Routing Table: IPv6
>   Destination/Mask            Gateway                   Flags Ref   Use    If
> --------------------------- --------------------------- ----- --- ------- -----
> ::1                         ::1                         UH      2       0 lo0
> fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
>
> On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>>
>>  Paul,
>>
>> could you please send the output of
>> ifconfig -a
>> netstat -nr
>>
>> on the three hosts you are using
>> (I assume you are still invoking mpirun from one node, and tasks are
>> running on two other nodes)
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/12/16 16:00, Paul Hargrove wrote:
>>
>> Gilles,
>>
>> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
>> compilations.
>> It appears to be used for building libevent and vt, but nothing else.
>> The output from configure contains
>>
>> checking if more special flags are required for pthreads... -D_REENTRANT
>>
>> only in the libevent and vt sub-configure portions.
>>
>> When configured for gcc on Solaris-11 I see the following in configure
>>
>> checking for C optimization flags... -m64 -D_REENTRANT -g
>> -finline-functions -fno-strict-aliasing
>>
>> but with CC=cc the equivalent line is
>>
>> checking for C optimization flags... -m64 -g
>>
>> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>>
>> However, when I use CFLAGS="-m64 -D_REENTRANT" 

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
Thanks Paul,

Are you invoking mpirun on pcp-j-20?
If so, what does
getent hosts pcp-j-20
say?

BTW, did you try without -m64?

Does the following work?
ping/ssh 172.18.0.120

Honestly, this output makes very little sense to me, so I am asking for a lot
of information in the hope that I can reproduce this issue or get a hint about
what could be going wrong.

Cheers,

Gilles
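For the "accept() failed: Error 0 (11)" report quoted below, Paul identifies errno 11 as EAGAIN. On a non-blocking listening socket, EAGAIN/EWOULDBLOCK from accept() normally just means that no connection is pending yet, so it is retried rather than treated as fatal. A minimal sketch of that pattern follows; the helper names are hypothetical, and this is not the oob.patch mentioned in the thread, whose contents are not shown here.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of one non-blocking accept() attempt. */
enum accept_status { ACCEPTED, TRY_AGAIN, FAILED };

/* Tries to accept one connection on a non-blocking listening socket.
 * errno is copied immediately after accept(), before any other call can
 * overwrite it. EAGAIN/EWOULDBLOCK (nothing pending) and EINTR
 * (interrupted by a signal) are reported as "try again later". */
static enum accept_status try_accept(int listen_fd, int *conn_fd, int *saved_errno)
{
    int fd = accept(listen_fd, NULL, NULL);

    *saved_errno = (fd < 0) ? errno : 0;
    if (fd >= 0) {
        *conn_fd = fd;
        return ACCEPTED;
    }
    if (*saved_errno == EAGAIN || *saved_errno == EWOULDBLOCK ||
        *saved_errno == EINTR)
        return TRY_AGAIN;
    return FAILED;
}

/* Demo helper: a non-blocking TCP listener on 127.0.0.1 with a
 * kernel-chosen port. Returns the socket, or -1 on error. */
static int make_nonblocking_listener(void)
{
    struct sockaddr_in addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(0x7f000001);  /* 127.0.0.1 */
    addr.sin_port = htons(0);                  /* 0 = any free port */
    flags = fcntl(fd, F_GETFL);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 1) != 0 || flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `try_accept()` on a fresh listener with no client connected should return TRY_AGAIN with the saved errno equal to EAGAIN or EWOULDBLOCK; an event loop would then simply wait for the socket to become readable again.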

Paul Hargrove  wrote:
>Gilles,
>
>I am running mpirun on a host that ALSO will run one of the application 
>processes.
>Requested ifconfig and netstat outputs appear below.
>
>-Paul
>
>[phargrov@pcp-j-20 ~]$ ifconfig -a
>lo0: flags=2001000849 mtu 8232 index 1
>        inet 127.0.0.1 netmask ff00
>bge0: flags=1004843 mtu 1500 index 2
>        inet 172.16.0.120 netmask  broadcast 172.16.255.255
>p.ibp0: flags=1001000843 mtu 2044 index 3
>        inet 172.18.0.120 netmask  broadcast 172.18.255.255
>lo0: flags=2002000849 mtu 8252 index 1
>        inet6 ::1/128
>bge0: flags=20002004841 mtu 1500 index 2
>        inet6 fe80::250:45ff:fe5c:2b0/10
>[phargrov@pcp-j-20 ~]$ netstat -nr
>
>Routing Table: IPv4
>  Destination           Gateway           Flags  Ref     Use     Interface
>-------------------- -------------------- ----- ----- ---------- ---------
>default              172.16.254.1         UG        2     158463 bge0
>127.0.0.1            127.0.0.1            UH        5     398913 lo0
>172.16.0.0           172.16.0.120         U         4  135241319 bge0
>172.18.0.0           172.18.0.120         U         3         26 p.ibp0
>
>Routing Table: IPv6
>  Destination/Mask            Gateway                   Flags Ref   Use    If
>--------------------------- --------------------------- ----- --- ------- -----
>::1                         ::1                         UH      2       0 lo0
>fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
>
>On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet 
> wrote:
>
>Paul,
>
>could you please send the output of
>ifconfig -a
>netstat -nr
>
>on the three hosts you are using
>(I assume you are still invoking mpirun from one node, and tasks are running 
>on two other nodes)
>
>Cheers,
>
>Gilles
>
>
>
>On 2014/12/16 16:00, Paul Hargrove wrote:
>
>Gilles,
>
>I looked again carefully and I am *NOT* finding -D_REENTRANT passed to
>most compilations. It appears to be used for building libevent and vt, but
>nothing else. The output from configure contains
>
>checking if more special flags are required for pthreads... -D_REENTRANT
>
>only in the libevent and vt sub-configure portions.
>
>When configured for gcc on Solaris-11 I see the following in configure
>
>checking for C optimization flags... -m64 -D_REENTRANT -g -finline-functions -fno-strict-aliasing
>
>but with CC=cc the equivalent line is
>
>checking for C optimization flags... -m64 -g
>
>In both cases the "-m64" is from the CFLAGS I have passed to configure.
>
>However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go
>away. I see
>
>[pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>  A process or daemon was unable to complete a TCP connection
>  to another process:
>    Local host:    pcp-j-20
>    Remote host:   172.18.0.120
>  This is usually caused by a firewall on the remote host. Please
>  check that any firewall (e.g., iptables) has been disabled and
>  try again.
>
>which at least appears to have a non-zero errno. A quick grep through
>/usr/include/sys/errno shows 11 is EAGAIN.
>
>With the oob.patch you provided the failed accept goes away, BUT the
>connection still fails:
>
>  A process or daemon was unable to complete a TCP connection
>  to another process:
>    Local host:    pcp-j-20
>    Remote host:   172.18.0.120
>  This is usually caused by a firewall on the remote host. Please
>  check that any firewall (e.g., iptables) has been disabled and
>  try again.
>
>Use of "-mca oob_tcp_if_include bge0" to use a single interface did not
>fix this.
>
>-Paul
>
>On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove 
> wrote: 
>
>Gilles,
>
>I am NOT seeing the problem with gcc. It is only occurring with the
>Studio compilers.
>As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
>both LDFLAGS and --with-wrapper-ldflags.
>The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>
>  -mt     Compile and link for multithreaded code. This option passes
>          -D_REENTRANT to the preprocessor and passes