Ralph,

You got it right.
The latest nightly tarball should work out of the box.
(well, -m64 must be passed manually, but that is unrelated to the 
issue discussed here)

Cheers,

Gilles

"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>Paul --
>
>The __sun macro check is now in the OMPI 1.8 tree, and is in the latest 
>nightly tarball.
>
>If I'm following this thread right -- and I might not be! -- I think Gilles is 
>saying that now that the __sun check is in, it should fix this 
>-mt/-D_REENTRANT/whatever problem.
>
>Can you confirm?
>
>
>On Dec 16, 2014, at 1:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> Gilles,
>> 
>> I am running mpirun on a host that ALSO will run one of the application 
>> processes.
>> Requested ifconfig and netstat outputs appear below.
>> 
>> -Paul
>> 
>> [phargrov@pcp-j-20 ~]$ ifconfig -a
>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>>         inet 127.0.0.1 netmask ff000000 
>> bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
>>         inet 172.16.0.120 netmask ffff0000 broadcast 172.16.255.255
>> pFFFF.ibp0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 2044 index 3
>>         inet 172.18.0.120 netmask ffff0000 broadcast 172.18.255.255
>> lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
>>         inet6 ::1/128 
>> bge0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
>>         inet6 fe80::250:45ff:fe5c:2b0/10 
>> [phargrov@pcp-j-20 ~]$ netstat -nr
>> 
>> Routing Table: IPv4
>>   Destination           Gateway           Flags  Ref     Use     Interface 
>> -------------------- -------------------- ----- ----- ---------- --------- 
>> default              172.16.254.1         UG        2     158463 bge0      
>> 127.0.0.1            127.0.0.1            UH        5     398913 lo0       
>> 172.16.0.0           172.16.0.120         U         4  135241319 bge0      
>> 172.18.0.0           172.18.0.120         U         3         26 pFFFF.ibp0 
>> 
>> Routing Table: IPv6
>>   Destination/Mask            Gateway                   Flags Ref   Use    If  
>> --------------------------- --------------------------- ----- --- ------- ----- 
>> ::1                         ::1                         UH      2       0 lo0  
>> fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0 
>> 
>> On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@iferc.org> wrote:
>> Paul,
>> 
>> could you please send the output of
>> ifconfig -a
>> netstat -nr
>> 
>> on the three hosts you are using
>> (i assume you are still invoking mpirun from one node, and tasks are running 
>> on two other nodes)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> 
>> On 2014/12/16 16:00, Paul Hargrove wrote:
>>> Gilles,
>>> 
>>> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
>>> compilations.
>>> It appears to be used for building libevent and vt, but nothing else.
>>> The output from configure contains
>>> 
>>> checking if more special flags are required for pthreads... -D_REENTRANT
>>> 
>>> only in the libevent and vt sub-configure portions.
>>> 
>>> When configured for gcc on Solaris-11 I see the following in configure
>>> 
>>> checking for C optimization flags... -m64 -D_REENTRANT -g
>>> -finline-functions -fno-strict-aliasing
>>> 
>>> but with CC=cc the equivalent line is
>>> 
>>> checking for C optimization flags... -m64 -g
>>> 
>>> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>>> 
>>> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
>>> I see
>>> 
>>> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>>> ------------------------------------------------------------
>>> A process or daemon was unable to complete a TCP connection
>>> to another process:
>>>   Local host:    pcp-j-20
>>>   Remote host:   172.18.0.120
>>> This is usually caused by a firewall on the remote host. Please
>>> check that any firewall (e.g., iptables) has been disabled and
>>> try again.
>>> ------------------------------------------------------------
>>> 
>>> which at least appears to have a non-zero errno.
>>> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
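
On a non-blocking listening socket, accept() returning EAGAIN normally means only that no connection was pending at that moment, so it is usually treated as retryable rather than as a failed connection. A minimal C sketch of that classification follows. It assumes a non-blocking listener; accept_errno_is_transient() is a hypothetical helper for illustration and is not part of the Open MPI oob/tcp code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper (not an Open MPI function): returns true when an
 * accept() failure is normally worth retrying instead of being reported
 * as a connection error. */
static bool accept_errno_is_transient(int err)
{
    /* EAGAIN / EWOULDBLOCK: no pending connection on a non-blocking socket.
     * EINTR: the call was interrupted by a signal before a connection arrived. */
    return err == EAGAIN || err == EWOULDBLOCK || err == EINTR;
}
```

A caller would check this right after accept() returns -1 and simply wait for the next readiness event when it returns true. Whether that is the right handling for this particular failure is an assumption, not something the thread has confirmed.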
>>> 
>>> With the oob.patch you provided the failed accept goes away, BUT the
>>> connection still fails:
>>> 
>>> ------------------------------------------------------------
>>> A process or daemon was unable to complete a TCP connection
>>> to another process:
>>>   Local host:    pcp-j-20
>>>   Remote host:   172.18.0.120
>>> This is usually caused by a firewall on the remote host. Please
>>> check that any firewall (e.g., iptables) has been disabled and
>>> try again.
>>> ------------------------------------------------------------
>>> 
>>> 
>>> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
>>> this.
>>> 
>>> 
>>> -Paul
>>> 
>>> On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove 
>>> <phhargr...@lbl.gov>
>>>  wrote:
>>> 
>>>> Gilles,
>>>> 
>>>> I am NOT seeing the problem with gcc.
>>>> It is only occurring with the Studio compilers.
>>>> 
>>>> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
>>>> both LDFLAGS and --with-wrapper-ldflags.
>>>> 
>>>> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>>>> 
>>>>      -mt  Compile and link for multithreaded code.
>>>> 
>>>>           This option passes -D_REENTRANT to the preprocessor and
>>>>           passes -lthread in the correct order to ld.
>>>> 
>>>>           The -mt option is required if the application or
>>>>           libraries are multithreaded.
>>>> 
>>>>           To ensure proper library linking order, you must use
>>>>           this option, rather than -lthread, to link with
>>>>           libthread.
>>>> 
>>>>           If you are using POSIX threads, you must link with the
>>>>           options -mt -lpthread.  The -mt option is necessary
>>>>           because libC and libCrun need libthread for a
>>>>           multithreaded application.
>>>> 
>>>>           If you compile and link in separate steps and you
>>>>           compile with -mt, you might get unexpected results. If you
>>>>           compile one translation unit with -mt, compile all
>>>>           units of the program with -mt.
>>>> 
>>>> I cannot connect to my Solaris-11 system right now, but I recall the text
>>>> to be quite similar.
>>>> 
>>>> -Paul
>>>> 
>>>> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet
>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>> 
>>>> 
>>>>>  Paul,
>>>>> 
>>>>> did you manually set -mt ?
>>>>> 
>>>>> if i remember correctly, Solaris 11 (at least with the gcc compilers) does
>>>>> not need any flags
>>>>> (except the -D_REENTRANT that is added automatically)
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> 
>>>>> On 2014/12/16 12:10, Paul Hargrove wrote:
>>>>> 
>>>>> Gilles,
>>>>> 
>>>>> I will try the patch when I can.
>>>>> However, our network is undergoing maintenance right now, leaving
>>>>> me unable to reach the necessary hosts.
>>>>> 
>>>>> As for -D_REENTRANT, I had already reported having verified in the "make"
>>>>> output that it had been added automatically.
>>>>> 
>>>>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
>>>>> preprocessor.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet 
>>>>> <gilles.gouaillar...@iferc.org>
>>>>>  wrote:
>>>>> 
>>>>> 
>>>>>  Paul,
>>>>> 
>>>>> could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
>>>>> /* otherwise, errno is a global variable instead of a per-thread variable,
>>>>> which can explain some weird behaviour. note this should already have been
>>>>> fixed */
>>>>> 
>>>>> assuming -D_REENTRANT is set, could you please give the attached patch a
>>>>> try ?
>>>>> 
>>>>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
>>>>> error message
>>>>> e.g. failed: Error 0 (0)
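
A minimal sketch of the pattern in question, for readers following the thread: if cleanup runs between the failed call and the error report, errno has to be saved first. This is not the actual Open MPI code; cleanup_socket() is a hypothetical stand-in for a macro like CLOSE_THE_SOCKET, and its explicit reset of errno only simulates a cleanup path that clobbers it.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for a cleanup macro such as CLOSE_THE_SOCKET.
 * Real cleanup code may change errno as a side effect; here the reset
 * is made explicit to simulate that. */
static void cleanup_socket(int fd)
{
    if (fd >= 0) {
        close(fd);
    }
    errno = 0;  /* simulated clobbering of errno during cleanup */
}

/* Capture errno before cleanup so that a later error message reports the
 * original cause instead of "Error 0 (0)". Returns the saved value. */
static int cleanup_preserving_errno(int fd)
{
    int saved_errno = errno;  /* error from the call that just failed */
    cleanup_socket(fd);       /* may overwrite errno */
    errno = saved_errno;      /* restore before the message is formatted */
    return saved_errno;
}
```

With this ordering the value that gets printed is the one set by the failing call, regardless of what the cleanup path does to errno.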
>>>>> 
>>>>> FWIW, master is also affected.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> 
>>>>> On 2014/12/16 10:47, Paul Hargrove wrote:
>>>>> 
>>>>> I have tried with an oob_tcp_if_include setting so that there is now only 1
>>>>> interface.
>>>>> Even with just one interface and -mt=yes in both LDFLAGS and
>>>>> wrapper-ldflags I am *still* getting messages like
>>>>> 
>>>>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
>>>>> ------------------------------------------------------------
>>>>> A process or daemon was unable to complete a TCP connection
>>>>> to another process:
>>>>>   Local host:    pcp-j-20
>>>>>   Remote host:   172.16.0.120
>>>>> This is usually caused by a firewall on the remote host. Please
>>>>> check that any firewall (e.g., iptables) has been disabled and
>>>>> try again.
>>>>> ------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> I am getting less certain that my speculation about thread-safe libs is
>>>>> correct.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove 
>>>>> <phhargr...@lbl.gov> wrote:
>>>>> 
>>>>>  A little more reading finds that...
>>>>> 
>>>>> Docs say that one needs "-mt" without the "=yes".
>>>>> That will work for both old and new compilers, whereas "-mt=yes" chokes
>>>>> older ones.
>>>>> 
>>>>> Also, man pages say "-mt" must come before "-lpthread" in the link 
>>>>> command.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove 
>>>>> <phhargr...@lbl.gov> wrote:
>>>>> 
>>>>> 
>>>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain 
>>>>> <r...@open-mpi.org> wrote:
>>>>> 
>>>>>  7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>>>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>>>>> link. Need someone to investigate.
>>>>> 
>>>>> 
>>>>> The lack of multi-thread libraries is my SPECULATION.
>>>>> 
>>>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>>>>> not prove anything.
>>>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>>>>> wrapper-ldflags instead.
>>>>> My time this week is quite limited, but I can "fire and forget" tests of
>>>>> any tarballs you provide.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> --
>>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>> Computer Science Department               Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16608.php
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16610.php
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16611.php
>>>>> 
>>>>> 
>>>>> 
>>>>  
>>>> --
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16613.php
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16615.php
>> 
>> 
>> -- 
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16617.php
>
>
>-- 
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to: 
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/12/16660.php
