Paul,

could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
/* otherwise, errno is a global variable instead of a per-thread
variable, which can
explain some weird behaviour. note this should have been already fixed */

assuming -D_REENTRANT is set, could you please give the attached patch a
try ?

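I suspect the CLOSE_THE_SOCKET macro resets errno before the error is
reported, hence the confusing error message,
e.g. failed: Error 0 (0)
The patch moves the CLOSE_THE_SOCKET call after the error has been logged.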
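
To illustrate the suspicion, here is a minimal sketch, not the actual
Open MPI code. All function names below are hypothetical, and the cleanup
step merely simulates a call that overwrites errno (the real macro may
behave differently). It compares reading errno after the cleanup with
reading a copy saved before it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a cleanup step such as closing a socket.
 * It simulates the suspected behaviour by overwriting errno. */
void cleanup_that_clobbers_errno(void)
{
    errno = 0;
}

/* Problematic order: cleanup runs before errno is read, so the cause is lost. */
int errno_seen_after_cleanup(int failure_code)
{
    errno = failure_code;            /* pretend accept() just failed */
    cleanup_that_clobbers_errno();
    return errno;
}

/* Safer order: copy errno first, then clean up, then report the copy. */
int errno_seen_with_saved_copy(int failure_code)
{
    int saved;

    errno = failure_code;            /* pretend accept() just failed */
    saved = errno;
    cleanup_that_clobbers_errno();
    return saved;
}

/* Prints a message in the same "%s (%d)" style as the log line quoted below. */
void report_failure(const char *what, int err)
{
    fprintf(stderr, "%s failed: %s (%d).\n", what, strerror(err), err);
}
```

With these definitions, report_failure("accept()", errno_seen_after_cleanup(EMFILE))
reports code 0, while report_failure("accept()", errno_seen_with_saved_copy(EMFILE))
reports the original EMFILE code. The attached patch gets the same effect by
reordering the calls rather than by saving a copy.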

FWIW, master is also affected.

Cheers,

Gilles

On 2014/12/16 10:47, Paul Hargrove wrote:
> I have tried with a oob_tcp_if_include setting so that there is now only 1
> interface.
> Even with just one interface and -mt=yes in both LDFLAGS and
> wrapper-ldflags I am *still* getting messages like
>
> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
> ------------------------------------------------------------
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    pcp-j-20
>   Remote host:   172.16.0.120
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
>
>
> I am getting less certain that my speculation about thread-safe libs is
> correct.
>
> -Paul
>
> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> A little more reading finds that...
>>
>> The docs say that one needs "-mt" without the "=yes".
>> That will work for both old and new compilers, where "-mt=yes" chokes
>> older ones.
>>
>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov>
>> wrote:
>>>
>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>>>> link. Need someone to investigate.
>>>
>>> The lack of multi-thread libraries is my SPECULATION.
>>>
>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>>> not prove anything.
>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>>> wrapper-ldflags instead.
>>> My time this week is quite limited, but I can "fire and forget" tests of
>>> any tarballs you provide.
>>>
>>> -Paul
>>>
>>> --
>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department               Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>
>>
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16607.php

diff --git a/orte/mca/oob/tcp/oob_tcp_listener.c b/orte/mca/oob/tcp/oob_tcp_listener.c
index b6d2ad8..87ff08d 100644
--- a/orte/mca/oob/tcp/oob_tcp_listener.c
+++ b/orte/mca/oob/tcp/oob_tcp_listener.c
@@ -14,6 +14,8 @@
  * Copyright (c) 2009-2014 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2011      Oak Ridge National Labs.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -729,7 +731,6 @@ static void* listen_thread(opal_object_t *obj)
                 if (pending_connection->fd < 0) {
                     if (opal_socket_errno != EAGAIN || 
                         opal_socket_errno != EWOULDBLOCK) {
-                        CLOSE_THE_SOCKET(pending_connection->fd);
                         if (EMFILE == opal_socket_errno) {
                             ORTE_ERROR_LOG(ORTE_ERR_SYS_LIMITS_SOCKETS);
                             orte_show_help("help-orterun.txt", "orterun:sys-limit-sockets", true);
@@ -737,6 +738,7 @@ static void* listen_thread(opal_object_t *obj)
                             opal_output(0, "mca_oob_tcp_accept: accept() failed: %s (%d).",
                                         strerror(opal_socket_errno), opal_socket_errno);
                         }
+                        CLOSE_THE_SOCKET(pending_connection->fd);
                         OBJ_RELEASE(pending_connection);
                         goto done;
                     }
