I have the following patch, which I think retains that functionality.
Basically, if you're specifying a local port of 0 and INADDR_ANY, then
there is no need to call bind():
--- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
+++ iocore/net/UnixConnection.cc 2012-10-09 15:44:47.021952385 -0700
@@ -297,15 +297,17 @@
   }
   // Local address/port.
-  struct sockaddr_in bind_sa;
-  memset(&bind_sa, 0, sizeof(bind_sa));
-  bind_sa.sin_family = AF_INET;
-  bind_sa.sin_port = htons(local_port);
-  bind_sa.sin_addr.s_addr = local_addr;
-  if (-1 == socketManager.ink_bind(fd,
-          reinterpret_cast<struct sockaddr *>(&bind_sa),
-          sizeof(bind_sa)))
-    return -errno;
+  if(local_port != 0 || local_addr != INADDR_ANY) {
+    struct sockaddr_in bind_sa;
+    memset(&bind_sa, 0, sizeof(bind_sa));
+    bind_sa.sin_family = AF_INET;
+    bind_sa.sin_port = htons(local_port);
+    bind_sa.sin_addr.s_addr = local_addr;
+    if (-1 == socketManager.ink_bind(fd,
+            reinterpret_cast<struct sockaddr *>(&bind_sa),
+            sizeof(bind_sa)))
+      return -errno;
+  }
   cleanup.reset();
   is_bound = true;
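
For anyone reading along without the tree handy, the same check can be
sketched as a standalone function. This is only an illustration under
assumptions: needs_explicit_bind() and bind_if_needed() are hypothetical
names, not Traffic Server functions, and plain ::bind() stands in for
socketManager.ink_bind().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Traffic Server): true when the caller
// asked for a specific local port or address, so bind() must run before
// connect(). local_addr is in network byte order, as in sin_addr.s_addr.
static bool needs_explicit_bind(uint16_t local_port, in_addr_t local_addr)
{
  return local_port != 0 || local_addr != htonl(INADDR_ANY);
}

// Hypothetical helper: bind fd only when needed. Returns 0 or -errno.
static int bind_if_needed(int fd, uint16_t local_port, in_addr_t local_addr)
{
  if (!needs_explicit_bind(local_port, local_addr))
    return 0; // leave the socket unbound; connect() chooses the source port

  struct sockaddr_in bind_sa;
  memset(&bind_sa, 0, sizeof(bind_sa));
  bind_sa.sin_family = AF_INET;
  bind_sa.sin_port = htons(local_port);
  bind_sa.sin_addr.s_addr = local_addr;
  if (-1 == ::bind(fd, reinterpret_cast<struct sockaddr *>(&bind_sa), sizeof(bind_sa)))
    return -errno;
  return 0;
}
```

With port 0 and INADDR_ANY the socket is left untouched, so the transparent
proxy and ip-out cases still go through bind() as before.
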
On Tue, Oct 9, 2012 at 3:54 PM, Bart Wyatt <[email protected]> wrote:
> In cases where the socket address is non-local (full transparent proxy) and
> when trafficserver is configured to make upstream OS connections from a
> specific interface/address (port configs that use the ip-out identifier),
> the ::bind call must precede the connect in order to correctly set the
> socket's "local" address.
>
> Barring those two cases, the ::bind call does seem spurious. But whatever
> solution we implement should respect and maintain those capabilities.
>
> I ran into a similar issue with non-local address spaces and running out of
> ports in TS-1075. In that instance the kernel's auto-assignment of ports was
> unable to properly account for multiple port-spaces for non-local or aliased
> IP addresses.
>
> -Bart
>
> -----Original Message-----
> From: Brian Geffon [mailto:[email protected]]
> Sent: Tuesday, October 09, 2012 4:50 PM
> To: [email protected]
> Subject: Connect returning EADDRNOTAVAIL
>
> Hello All,
>
> tl;dr: I think we should remove the call to bind() before our call to
> connect().
>
> I've run into a situation where, after a while, the connect system call in
> Connection::connect in UnixConnection.cc fails with errno = 99
> (EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as down,
> and we then see repeated connection failures because hostdb has decided
> the host is down. Receiving an EADDRNOTAVAIL from connect() was very
> surprising, since according to many sources connect() should never actually
> return this value. After some digging, it appears that connect() can return
> EADDRNOTAVAIL when the (local IP, local port, remote IP, remote port) tuple
> is already in use. But shouldn't the OS have chosen a port that wasn't in use?
>
> So I found two possible solutions to this problem and verified them on a
> host that was exhibiting this sporadic behavior. Both patches are for 3.0.x.
>
> The first patch is as follows:
>
> --- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
> +++ iocore/net/UnixConnection.cc 2012-10-09 12:35:35.960953957 -0700
> @@ -324,9 +324,18 @@
>
>    cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.
>
> +  /*
> +   * Connect technically should never return this, but occasionally some OSes will.
> +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
> +   * again to prevent hostdb from marking the host as down when it was a spurious
> +   * OS error
> +   */
> +  do {
>    res = ::connect(fd,
>                    reinterpret_cast<struct sockaddr *>(&sa),
>                    sizeof(struct sockaddr_in));
> +  } while (-1 == res && EADDRNOTAVAIL == errno);
> +
>    // It's only really an error if either the connect was blocking
>    // or it wasn't blocking and the error was other than EINPROGRESS.
>    // (Is EWOULDBLOCK ok? Does that start the connect?)
>
> Basically, it just retries the connect when the OS returns this weird
> EADDRNOTAVAIL. Again, I have verified that this stops the problem.
>
> The second fix is to simply not call bind() before connect(). This also
> fixes the problem, and the reason it does is somewhat complicated:
>
> --- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
> +++ iocore/net/UnixConnection.cc 2012-10-09 13:35:34.660974785 -0700
> @@ -296,6 +296,7 @@
> #endif
> }
>
> +#ifdef BIND_BEFORE_CONNECT
> // Local address/port.
> struct sockaddr_in bind_sa;
> memset(&bind_sa, 0, sizeof(bind_sa));
> @@ -307,6 +308,8 @@
> sizeof(bind_sa)))
> return -errno;
>
> +#endif
> +
> cleanup.reset();
> is_bound = true;
> return 0;
>
> After digging for a while to figure out why not calling bind() fixes this
> problem, it turns out that the Linux kernel uses two different mechanisms
> to find a free port when the local port specified is 0 (ANYPORT). The
> method used by bind() is in net/ipv4/inet_connection_sock.c, in the function
> inet_csk_get_port(); the method used when connect() is called on an unbound
> socket is in net/ipv4/inet_hashtables.c, in the function
> __inet_hash_connect().
> The primary difference is that the bind() version does not consider the
> local IP when looking for a port to use, so it can prevent local ports
> from being reused even though the (source IP, source port, remote IP, remote
> port) 4-tuple is different. I found somewhat of an explanation here:
> http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html
>
> So I was hoping to get some community feedback on what people think the best
> solution to this problem is. I believe the second solution, which doesn't use
> bind(), is the better approach.
>
> Thanks,
> Brian
>