[ 
https://issues.apache.org/jira/browse/THRIFT-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106454#comment-17106454
 ] 

Max commented on THRIFT-5186:
-----------------------------

Posting today's findings. I'm still not sure whether this should be fixed with 
further patches, or whether one could argue it's a (pervasively common) 
misconfiguration (e.g. the Docker default). Probably the former.

 

TServerSocket::listen() has this piece, which clears IPV6_V6ONLY (sets it to 0) 
on AF_INET6 sockets, so that they also accept IPv4-mapped connections:
{code:java}
#ifdef IPV6_V6ONLY
  if (path_.empty() && res->ai_family == AF_INET6) {
    int zero = 0;
    if (-1 == setsockopt(serverSocket_,
                         IPPROTO_IPV6,
                         IPV6_V6ONLY,
                         cast_sockopt(&zero),
                         sizeof(zero))) {
      GlobalOutput.perror("TServerSocket::listen() IPV6_V6ONLY ",
                          THRIFT_GET_SOCKET_ERROR);
    }
  }
#endif // #ifdef IPV6_V6ONLY
{code}
More importantly, this is how {{getaddrinfo()}} results are processed in 
TServerSocket::listen():
{code:java}
    // Pick the ipv6 address first since ipv4 addresses can be mapped
    // into ipv6 space.
    for (res = info.res(); res; res = res->ai_next) {
      if (res->ai_family == AF_INET6 || res->ai_next == nullptr)
        break;
    }
  }
{code}
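I.e. an IPv6 result is picked whenever one exists, whether or not that address 
is actually usable; only if there is none does the loop fall through to the 
last result.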
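
To make that selection behaviour concrete, here is a minimal sketch that runs the same loop over a hand-built result list, with no resolver involved. The helper names {{pick_like_tserversocket}} and {{picked_family_for_v4_then_v6}} are made up for illustration and are not part of Thrift:

```cpp
#include <cassert>
#include <netdb.h>
#include <sys/socket.h>

// Hypothetical helper mirroring the selection loop quoted above: returns the
// first AF_INET6 entry, or the last entry of the list if none is IPv6.
inline const addrinfo* pick_like_tserversocket(const addrinfo* head) {
  const addrinfo* res = head;
  for (; res; res = res->ai_next) {
    if (res->ai_family == AF_INET6 || res->ai_next == nullptr)
      break;
  }
  return res;
}

// Builds the list "IPv4 first, IPv6 second" by hand and reports which
// address family the loop settles on.
inline int picked_family_for_v4_then_v6() {
  addrinfo v6{};
  v6.ai_family = AF_INET6;
  addrinfo v4{};
  v4.ai_family = AF_INET;
  v4.ai_next = &v6;
  const addrinfo* chosen = pick_like_tserversocket(&v4);
  assert(chosen != nullptr);
  return chosen->ai_family;
}
```

Even with the IPv4 entry listed first, the loop settles on the AF_INET6 entry, and nothing checks whether that address can actually be bound.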

This, together with the {{::1 localhost}} entry in {{/etc/hosts}} and the 
removed {{AI_ADDRCONFIG}} hint, leads to a funny result: {{localhost}} resolves 
to an address that you can't connect() to, at least in Docker containers with 
the default v4-only bridge network.

The issue goes away if I configure IPv6 in Docker. 
[https://docs.docker.com/config/daemon/ipv6/]

The issue goes away if I comment out the {{::1 localhost}} entry in the 
container's {{/etc/hosts}}.

The issue also goes away if I bring back the {{AI_ADDRCONFIG}} hint. But then I 
am back to "getaddrinfo() <Host: 127.0.0.1 Port: 1302>Address family for 
hostname not supported" on a loopback-only network. Hmmm.
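
For anyone who wants to compare the two hint settings on their own box, a minimal sketch follows. {{resolve_status}} is a hypothetical helper, not a Thrift function; it only returns what {{getaddrinfo()}} itself returns for the given flags:

```cpp
#include <cassert>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: performs a TCP lookup with the given ai_flags and
// returns getaddrinfo()'s status code (0 on success).
inline int resolve_status(const char* host, const char* port, int flags) {
  assert(port != nullptr);
  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;      // accept IPv4 and IPv6 results
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = flags;           // e.g. 0 or AI_ADDRCONFIG
  addrinfo* res = nullptr;
  const int rc = getaddrinfo(host, port, &hints, &res);
  if (rc == 0) {
    freeaddrinfo(res);              // only the status matters here
  }
  return rc;
}
```

Comparing {{resolve_status("127.0.0.1", "1302", 0)}} with {{resolve_status("127.0.0.1", "1302", AI_ADDRCONFIG)}} and passing any nonzero result to {{gai_strerror()}} should show whether a given environment is affected. The outcome of the second call depends on which interfaces are configured, so no fixed result is claimed here.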

Conclusion so far: in that do-while bind()-retry loop, TServerSocket should 
also iterate over the individual {{getaddrinfo}} results instead of committing 
to a single one. That way, it would work around this (seemingly standard and 
OK!) situation:
{code:java}
[root@04dd07b70038 /]# ping -6 localhost
ping: connect: Cannot assign requested address
{code}
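
A rough sketch of the idea, outside of Thrift and without the retry logic: try every {{getaddrinfo}} result, IPv6 entries first, and keep the first socket that actually binds. {{bind_first_usable}} is a hypothetical name, and this is only an illustration under those assumptions, not the eventual patch:

```cpp
#include <cassert>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not Thrift's API: walks all getaddrinfo() results,
// IPv6 entries first, and returns the first socket that bind()s, or -1.
inline int bind_first_usable(const char* host, const char* port) {
  assert(port != nullptr);
  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE;  // note: no AI_ADDRCONFIG
  addrinfo* head = nullptr;
  if (getaddrinfo(host, port, &hints, &head) != 0) {
    return -1;
  }

  int fd = -1;
  // Pass 0 tries the AF_INET6 results, pass 1 tries everything else.
  for (int pass = 0; pass < 2 && fd == -1; ++pass) {
    for (addrinfo* res = head; res && fd == -1; res = res->ai_next) {
      const bool is_v6 = (res->ai_family == AF_INET6);
      if (is_v6 != (pass == 0)) continue;
      const int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
      if (s == -1) continue;  // e.g. address family not available
      if (is_v6) {
        int zero = 0;  // best effort: also accept IPv4-mapped connections
        setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &zero, sizeof(zero));
      }
      if (bind(s, res->ai_addr, res->ai_addrlen) == 0) {
        fd = s;    // first usable address wins
      } else {
        close(s);  // e.g. the address cannot be assigned; try the next one
      }
    }
  }
  freeaddrinfo(head);
  return fd;
}
```

With something like this, {{bind_first_usable("localhost", "1302")}} would be expected to fall back to the IPv4 loopback address when the IPv6 one cannot be bound. The caller owns the returned descriptor and still has to call {{listen()}} and eventually {{close()}} on it.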

> AI_ADDRCONFIG: Thrift libraries crash with localhost-only network.
> ------------------------------------------------------------------
>
>                 Key: THRIFT-5186
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5186
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library, Delphi - Library, Python - Library
>    Affects Versions: 0.13.0
>         Environment: Red Hat Enterprise Linux 8.0
>            Reporter: Max
>            Assignee: Max
>            Priority: Major
>              Labels: getaddrinfo, localhost, sockets
>             Fix For: 0.14.0
>
>         Attachments: 
> 0001-THRIFT-5186-Dont-pass-AI_ADDRCONFIG-to-getaddrinfo.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> THRIFT-2539 has been reported, and fixed — but for win32 only, for no 
> apparent reason. The exact same problem reproduces on POSIX.
> Namely, when no network interfaces besides {{lo}} (the 127.0.0.1 loopback 
> interface) are up, C++ and Python apps linked with Thrift-generated code, 
> both clients and servers — *crash by throwing an exception*. Even when the 
> intention is exactly to run them on localhost only.
> This happens because Thrift library code for TSocket, TServerSocket, 
> TNonblockingServerSocket calls 
> [{{getaddrinfo()}}|http://man7.org/linux/man-pages/man3/getaddrinfo.3.html] 
> to resolve target hostname to connect to/listen on, into concrete IP address 
> (v4 or v6, whichever the system is configured for). To that call, it *passes 
> the {{AI_ADDRCONFIG}} hint* which effectively turns a localhost-only 
> situation into:
> {quote}{{Could not resolve host for client socket.}}
> {quote}
> and into this (server-side):
> {code:java}
> гру 23 13:52:13 localhost.localdomain systemd[1]: db_cache.service: Main 
> process exited, code=dumped, status=6/ABRT
> гру 23 13:52:13 localhost.localdomain systemd[1]: db_cache.service: Failed 
> with result 'core-dump'.
> гру 23 13:52:17 localhost.localdomain db_cache[12912]: Thrift: Mon Dec 23 
> 13:52:15 2019 TSocket::open() getaddrinfo() <Host: 127.0.0.1 Port: 
> 1302>Address family for hostname not supported
> гру 23 13:52:17 localhost.localdomain db_cache[12912]: Thrift: Mon Dec 23 
> 13:52:15 2019 TSocket::open() getaddrinfo() <Host: 127.0.0.1 Port: 
> 8345>Address family for hostname not supported
> гру 23 13:52:17 localhost.localdomain db_cache[12912]: Thrift: Mon Dec 23 
> 13:52:15 2019 TNonblocking: using dedicated listener thread, io threads: 16
> гру 23 13:52:17 localhost.localdomain db_cache[12912]: Thrift: Mon Dec 23 
> 13:52:15 2019 getaddrinfo -9: Address family for hostname not supported
> гру 23 13:52:17 localhost.localdomain db_cache[12912]: terminate called after 
> throwing an instance of 'apache::thrift::transport::TTransportException'
> гру 23 13:52:17 localhost.localdomain db_cache[12912]:   what():  Could not 
> resolve host for server socket.
> {code}
> I fail to understand the original reason to pass that {{AI_ADDRCONFIG}} hint. 
> It shouldn't be there as I see it.
> Further, since Thrift 0.9.2, windows builds of thrift apps don't pass that 
> hint anymore (see THRIFT-2539), and it seems to be okay.
> For comprehension, I'm attaching a sample patch to remove {{AI_ADDRCONFIG}} 
> from {{lib/cpp}} and {{lib/py}}. The main change will be landing via GitHub, 
> per Thrift's contribution process, so please follow there too.



