Hi Denis,

On Thu, Feb 06, 2014 at 12:36:05PM +0700, Denis Malyshkin wrote:
> Hello Willy,
>
> Thank you for the explanation and suggestions.
> I've re-checked logs and connections.
>
> 1. There are no TIME_WAIT connections on our server. They may appear for
> a very short time, but there are no long-waiting ones. So in that our
> system works good.

OK. When you say "the server", you mean the machine running haproxy,
can you confirm?

> 2. What is connection retry mechanism you mentioned? Is it a haproxy or
> a system mechanism?

Haproxy supports a retry mechanism for each connect. It defaults to 3
retries after a failed connect. It's set in the backend (or defaults)
using the "retries" directive.

> 3. With my re-connect loop the second try always was successful. Does it
> mean that without my loop connection retry mechanism will also
> successfully re-connect and such log errors may be completely ignored?

Yes, and you'll have a non-zero retry count on the log lines (the value
after the connection counts, also the last field of the second block of
5 counters).

> 4. The main question. If above is right and such error messages are
> completely harmless why so such errors are logged here while all other
> connect errors aren't?

I have no idea, and that's what we need to sort out.

> Such logging worries our admins (and me) and so
> we started to investigate and try to fix them. May it be better to
> remove these log messages, or move it somewhere upper to the point where
> connection retry mechanism decides that all reconnect tries are
> unsuccessful? Do you have any reasons to leave them just here?

This is a very serious incident, it means that the system is about to
collapse. It's totally abnormal that connect() cannot find a spare port,
so I'd rather keep the message.

> 5. We see "Connect() failed...: no free ports." errors 20-70 times per
> day (depending on server load). Could you imagine any reasons why such
> errors may occur? haproxy has only about 500-700 open connections, there
> are no "dead" ones, all are in ESTABLISHED state.

Do you have any listening ports in the same range as the outgoing port
range? It could be one reason for the system occasionally failing to
allocate a port.

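One way to check for listening ports inside the outgoing range is sketched
below in Python. It is only a rough sketch, assuming a Linux machine with
the usual /proc layout (/proc/sys/net/ipv4/ip_local_port_range and
/proc/net/tcp). The function names are made up for this example and are
not part of haproxy or any other tool.

```python
# Rough sketch: list TCP ports in LISTEN state that fall inside the
# system's outgoing (ephemeral) port range. Assumes the Linux /proc layout.

def parse_port_range(text):
    """Parse the content of /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = text.split()[:2]
    return int(low), int(high)

def listening_ports(proc_net_tcp_text):
    """Return the set of local ports in LISTEN state from /proc/net/tcp text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the LISTEN state
            # The port is the hex value after the last colon.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports

def overlapping_ports(port_range, ports):
    """Return the sorted ports that lie inside the (low, high) range."""
    low, high = port_range
    return sorted(p for p in ports if low <= p <= high)

def check_host():
    """Read the /proc files of the local machine and report any overlap."""
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        port_range = parse_port_range(f.read())
    ports = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                ports |= listening_ports(f.read())
        except OSError:
            pass  # e.g. IPv6 disabled, file not present
    return port_range, overlapping_ports(port_range, ports)
```

Calling check_host() on the machine running haproxy returns the outgoing
range and the listening ports found inside it. An empty list means this
particular cause can probably be ruled out.
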
It's also very possible that you're facing a kernel bug, we've had many
changes to the source port allocation mechanism in various kernels in
order to work around such issues. Maybe just upgrading the kernel will
get rid of the issue.

Alternatively, you can use the "source" parameter either on each server
or in the backend to fix a port range. Haproxy will then use an explicit
bind. This is normally used when you want to have more than 64k conns on
multiple servers. But here you could try this:

    source 0.0.0.0:32678-61000

Regards,
Willy