Hi Willy,

Thank you very much for the detailed answers.
Sorry for the long delay; I had to check the docs, configurations and logs.

1. There are no TIME_WAIT connections on our server. They may appear for a very short time, but there are no long-lived ones. So in that respect our system works well.
OK. When you say the server, you mean the machine running haproxy, can you confirm ?
Yes. Sorry for the imprecise wording.

3. With my re-connect loop, the second try always succeeded. Does that mean that, without my loop, the connection retry mechanism will also re-connect successfully, so such log errors can be completely ignored?
Yes, and you'll have a non-null retry count on the log lines (the value
after the connection counts, also the last field of the second block of 5 
counters).
I checked our logs and found the following, for example:

cat haproxy.log | grep -A 8 "Connect()"
==================================================================================
Feb 18 04:49:43 localhost haproxy[9335]: Connect() failed for backend https: no free ports.
Feb 18 04:49:47 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:45.245] httpinternal httpinternal/app_server1 0/0/0/2394/2394 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
Feb 18 04:49:49 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:49.486] httpinternal httpinternal/app_server2 0/0/0/46/46 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
Feb 18 04:50:10 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:50:07.745] httpinternal httpinternal/app_server1 0/0/0/2935/2935 200 269 - - --NI 201/1/1/0/0 0/0 "GET /url HTTP/1.1"
Feb 18 04:50:10 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:50:10.002] httpinternal httpinternal/app_server2 0/0/0/684/684 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:40.489] https~ https/app_server2 2127/0/9/943/33140 200 836 - - --NI 199/199/199/99/0 0/0 "POST /url HTTP/1.1"
Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:40.487] https~ https/app_server2 2000/0/129/810/33142 200 836 - - --NI 198/198/198/98/0 0/0 "POST /url HTTP/1.1"
Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:40.489] https~ https/app_server2 1999/0/128/955/33140 200 836 - - --NI 197/197/197/97/0 0/0 "POST /url HTTP/1.1"
Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:40.488] https~ https/app_server2 2000/0/128/900/33141 200 836 - - --NI 196/196/196/96/0 0/0 "POST /url HTTP/1.1"
==================================================================================

cat haproxy.log | grep -v "\-\-[NV][NI] [0-9]*/[0-9]*/[0-9]*/[0-9]*/0"
==================================================================================
Feb 18 04:49:43 localhost haproxy[9335]: Connect() failed for backend https: no free ports.
Feb 18 04:50:18 localhost haproxy[9335]: <IP:port> [18/Feb/2014:04:49:40.481] https~ https/app_server1 3120/0/1/4262/37675 200 836 - - --NI 22/22/22/22/1 0/0 "POST /url HTTP/1.1"
==================================================================================

So it seems that the error at 04:49:43 was retried and the retry is logged at 04:50:18, is that right? Could you please confirm that the log above shows the connect was successfully retried after the error?
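
To make this check repeatable, here is a minimal sketch in Python that pulls the retry counter out of an HTTP log line. It assumes the layout seen in the lines above: a four-character termination state (such as "--NI") followed by five slash-separated counters, the last of which is the retry count. The name retries_of is only illustrative, and the optional "+" prefix is handled on the assumption that haproxy may mark a redispatched session that way.

```python
import re

# Four-character termination state (e.g. "--NI"), then the five counters
# actconn/feconn/beconn/srv_conn/retries. A "+" before the last counter
# is tolerated (assumed to mark a redispatch).
COUNTERS = re.compile(
    r"\s([A-Za-z-]{4})\s(\d+)/(\d+)/(\d+)/(\d+)/\+?(\d+)\s"
)


def retries_of(line):
    """Return the retry count of an HTTP log line, or None if the line
    does not contain the counter block (e.g. a "Connect() failed" line)."""
    match = COUNTERS.search(line)
    if match is None:
        return None
    return int(match.group(6))


if __name__ == "__main__":
    import sys

    # Print only the lines whose retry counter is non-zero.
    for log_line in sys.stdin:
        count = retries_of(log_line)
        if count:
            print(log_line, end="")
```

Run as a filter over haproxy.log on standard input, it should print the same kind of lines as the grep -v command above, minus the lines that carry no counters at all.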


5. We see "Connect() failed...: no free ports." errors 20-70 times per day (depending on server load). Can you think of any reason why such errors might occur? haproxy has only about 500-700 open connections, none of them "dead"; all are in the ESTABLISHED state.
Do you have any listening ports in the same range as the outgoing port range ?
It could be one reason for the system occasionally failing to allocate a port.
Yes. That seems to be the reason. Thank you.

Alternately, you can use the "source" parameter either on each server
or in the backend to fix a port range. Haproxy will then use an explicit
bind. This is normally used when you want to have more than 64k conns on
multiple servers. But here you could try this :

    source 0.0.0.0:32678-61000
Great! As I understand it, we can set the port range to exclude the listening ports and so eliminate such errors? That looks like a good workaround. Thank you again for the idea.
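
Before choosing the range, it may help to list which listening ports actually fall inside it. Below is a rough sketch in Python, assuming a Linux host where /proc/net/tcp lists sockets with the local address in hex and state 0A meaning LISTEN. The function names are hypothetical, and IPv6 listeners (in /proc/net/tcp6) are not covered.

```python
def listening_ports_from_proc(text):
    """Parse the content of /proc/net/tcp and return the local ports
    of all sockets in the LISTEN state (hex state code 0A)."""
    ports = []
    for row in text.splitlines()[1:]:  # first row is the column header
        fields = row.split()
        if len(fields) > 3 and fields[3] == "0A":
            # local_address looks like "00000000:1F90" (hex addr:hex port)
            ports.append(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def ports_in_range(listening_ports, low, high):
    """Return the listening ports that lie inside [low, high], sorted."""
    return sorted(p for p in set(listening_ports) if low <= p <= high)


if __name__ == "__main__":
    with open("/proc/net/tcp") as proc_file:
        listening = listening_ports_from_proc(proc_file.read())
    # Range taken from the "source" line suggested above.
    print(ports_in_range(listening, 32678, 61000))
```

Any port this prints is a candidate for the occasional allocation failure, so the range given to "source" would need to start above it or end below it.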

Following your answers, I've re-read the source code and documentation and reviewed the logs; below is my conclusion about our issue. Could you please check it and correct me if I'm wrong or have missed something?

1. If the 'retries' parameter is set to a value higher than 1, then on such a connect error haproxy automatically retries the connect attempt.
2. Such errors usually indicate critical resource issues, so logging them is very important and will not be removed.
3. The information that the error was successfully resolved is available only from 'info' level logging, and even then only indirectly, from the counters on the log lines.
4. In conclusion, we can either just ignore such errors in the logs or (preferably) work around them by limiting the source port range to exclude the listening ports.

If all of the above is correct (could you please confirm it?), then it seems that our issue is resolved.
Thank you very much for your help.


--
Best regards,
 Denis Malyshkin,
Senior C++ Developer
of ISS Art, Ltd., Omsk, Russia.
Mobile Phone: +7 913 669 2896
Office tel/fax +7 3812 396959
Yahoo Messenger: dmalyshkin
Web: http://www.issart.com
E-mail: dmalysh...@issart.com