Re: TIME_WAIT tuning

Willy Tarreau Sun, 29 Jan 2012 06:20:17 -0800

Hi Sander,

On Fri, Jan 27, 2012 at 09:52:11PM +0100, Sander Klein wrote:
> Hi,
> 
> while benchmarking my new web-server cluster I quickly hit the limit of 
> 32.768 sockets in TIME_WAIT state.


There is no such standard limit; I have even reached 5 million sockets in
TIME_WAIT.
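
To see how many sockets are really in TIME_WAIT, and on which local address,
the kernel's socket table can be counted directly. The following is only a
minimal sketch, assuming Linux, IPv4, and a little-endian host (which is how
the hex addresses in /proc/net/tcp are decoded here); the function name is
made up for illustration.

```python
import socket
from collections import Counter

# In /proc/net/tcp the "st" column is a hex state code; 06 is TIME_WAIT.
TCP_TIME_WAIT = "06"


def count_time_wait_by_local_ip(proc_net_tcp_text: str) -> Counter:
    """Count TIME_WAIT sockets per local IPv4 address.

    Expects the text content of /proc/net/tcp (header line included).
    Assumption: little-endian host, so the 8 hex digits of the address
    must be byte-reversed to get the usual dotted notation.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_TIME_WAIT:
            continue
        ip_hex, _port_hex = fields[1].split(":")
        ip = socket.inet_ntoa(bytes.fromhex(ip_hex)[::-1])
        counts[ip] += 1
    return counts
```

It would typically be fed with the content of /proc/net/tcp, for example
`count_time_wait_by_local_ip(open("/proc/net/tcp").read())`. A count that
stops growing at the same value on every VIP would suggest a system-wide
cause rather than a per-IP one.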

> I've been looking around on the internet but I'm a bit confused if this 
> limit can be tuned somehow or if it's an hard limit. I read about the 
> tcp_fin_timeout and tcp_tw_reuse/recycle options but I don't think they 
> will be of any use, since I hit the limit within a couple of seconds.
> 
> Can anyone give me a push in the right direction or even better, a 
> detailed explanation? ;-)

Normally you should enable tcp_tw_reuse, but not tcp_tw_recycle, as obscure
bugs have been observed with it by many people in the past. But whether you
use these options or not, you should not encounter the issue you're
describing, because:
  - if the TIME_WAIT sockets are on the listening side, then a new SYN from
    the same port will automatically turn them into SYN_RECV.

  - if the TIME_WAIT sockets are on the connect side (towards servers), well,
    there should not be any there, because TIME_WAIT appears on the side which
    performs the shutdown() first, and if haproxy has to close a connection to
    a server, it disables lingering, which results in an RST and the socket
    vanishes.
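
For illustration, "disabling lingering" at the socket level usually means
setting SO_LINGER with a zero timeout, so that close() sends an RST instead of
a FIN and no TIME_WAIT entry is left behind. The sketch below uses Python's
standard socket module on Linux; it is not haproxy's code (haproxy is written
in C), and the helper names are hypothetical.

```python
import socket
import struct

# Layout of the C "struct linger { int l_onoff; int l_linger; }".
_LINGER_FMT = "ii"


def disable_lingering(sock: socket.socket) -> None:
    """Enable SO_LINGER with a 0 second timeout.

    With this set, closing a connected TCP socket aborts the connection
    (RST) rather than going through the normal FIN handshake.
    """
    sock.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(_LINGER_FMT, 1, 0)
    )


def get_linger(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(_LINGER_FMT)
    )
    return struct.unpack(_LINGER_FMT, raw)
```

Calling `disable_lingering(sock)` just before `sock.close()` is enough. This
should only be done on the side that is allowed to abort the connection, since
unsent data is discarded.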

There is one remaining case though: if you're getting an incoming SYN with a
sequence number below the end of the last window, it will be considered by
the system as a retransmit and will not be converted to SYN_RECV. This happens
in two situations:
  - intermediary firewall excessively randomizing sequence numbers (common
    issue on Cisco ACE and ASA)
  - client using totally random sequence numbers without taking care of
    previous connections (or not tracking them).

Both issues can be solved by enabling TCP timestamps, which enables the PAWS
mechanism where timestamps are used as a complement for sequence numbers.
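
On Linux, timestamps are controlled by the net.ipv4.tcp_timestamps sysctl. To
make the reasoning above concrete, here is a simplified model of the decision
taken for a SYN that hits a TIME_WAIT socket. It is a sketch only, not the
kernel's code: the function names are invented, and using rcv_nxt for "the end
of the last window" is an assumption made for the example.

```python
from typing import Optional


def after(a: int, b: int) -> bool:
    """True if 32-bit value a comes after b, using wrap-around arithmetic."""
    return ((b - a) & 0xFFFFFFFF) >= 0x80000000


def syn_can_reuse_time_wait(
    syn_seq: int,
    rcv_nxt: int,
    syn_tsval: Optional[int] = None,
    ts_recent: Optional[int] = None,
) -> bool:
    """Decide whether a new SYN may replace a TIME_WAIT socket.

    syn_seq   : sequence number carried by the incoming SYN
    rcv_nxt   : next sequence number expected on the old connection
    syn_tsval : timestamp value of the SYN, or None without timestamps
    ts_recent : last timestamp seen on the old connection, or None
    """
    # Without timestamps, only a sequence number beyond the old
    # connection proves that the SYN is not a retransmit.
    if after(syn_seq, rcv_nxt):
        return True
    # With timestamps, a newer timestamp is accepted as proof even if the
    # sequence number looks old (the PAWS idea).
    if syn_tsval is not None and ts_recent is not None:
        return after(syn_tsval, ts_recent)
    return False
```

With a randomized sequence number that falls below rcv_nxt, the first test
fails, and only the timestamp comparison lets the new connection through.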

> I was also wondering if this limit is system wide or per IP. I have 
> multiple VIP's on my loadbalancer.

It's hard to tell when the cause of the issue is not identified. It is
also possible that you're running with an untuned ip_conntrack that
fills its session table, preventing any connection from establishing
to/from the local machine. This would then be system-wide. You can
check for this using dmesg when this happens.
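
When the conntrack table is full, the kernel normally logs a line containing
"ip_conntrack: table full, dropping packet" (or "nf_conntrack: ..." on more
recent kernels). A small sketch to look for it in saved dmesg output follows;
the function name is hypothetical and the text is expected to come from
something like `dmesg > dmesg.txt`.

```python
import re

# Matches both the old (ip_conntrack) and the newer (nf_conntrack) wording.
_TABLE_FULL = re.compile(r"(?:ip|nf)_conntrack: table full, dropping packet")


def conntrack_table_full_lines(dmesg_text: str) -> list:
    """Return the dmesg lines reporting a full conntrack table."""
    return [
        line for line in dmesg_text.splitlines() if _TABLE_FULL.search(line)
    ]
```

An empty result means this message was not found, which makes a full session
table unlikely to be the cause.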

Regards,
Willy
