Hi Denis,

On Tue, Feb 18, 2014 at 08:33:26PM +0700, Denis Malyshkin wrote:
> Hi Willy,
> 
> Thank you a lot for detailed answers.
> Sorry for so long delay, I have to check docs, configurations and logs.

No problem, I'm not fast at following e-mails either, as you can see...

> >>3. With my re-connect loop the second try always was successful. Does it 
> >>mean that without my loop connection retry mechanism will also 
> >>successfully re-connect and such log errors may be completely ignored?    
> >Yes, and you'll have a non-null retry count on the log lines (the value
> >after the connection counts, also the last field of the second block of 5 
> >counters).
> >  
> Checked our logs. Found the next, for example:
> 
> cat haproxy.log | grep -A 8 "Connect()"
> ==================================================================================
> Feb 18 04:49:43 localhost haproxy[9335]: Connect() failed for backend 
> https: no free ports.
> Feb 18 04:49:47 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:45.245] httpinternal httpinternal/app_server1 
> 0/0/0/2394/2394 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
> Feb 18 04:49:49 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:49.486] httpinternal httpinternal/app_server2 
> 0/0/0/46/46 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
> Feb 18 04:50:10 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:50:07.745] httpinternal httpinternal/app_server1 
> 0/0/0/2935/2935 200 269 - - --NI 201/1/1/0/0 0/0 "GET /url HTTP/1.1"
> Feb 18 04:50:10 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:50:10.002] httpinternal httpinternal/app_server2 
> 0/0/0/684/684 200 269 - - --NI 200/0/0/0/0 0/0 "GET /url HTTP/1.1"
> Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:40.489] https~ https/app_server2 2127/0/9/943/33140 
> 200 836 - - --NI 199/199/199/99/0 0/0 "POST /url HTTP/1.1"
> Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:40.487] https~ https/app_server2 2000/0/129/810/33142 
> 200 836 - - --NI 198/198/198/98/0 0/0 "POST /url HTTP/1.1"
> Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:40.489] https~ https/app_server2 1999/0/128/955/33140 
> 200 836 - - --NI 197/197/197/97/0 0/0 "POST /url HTTP/1.1"
> Feb 18 04:50:13 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:40.488] https~ https/app_server2 2000/0/128/900/33141 
> 200 836 - - --NI 196/196/196/96/0 0/0 "POST /url HTTP/1.1"
> ==================================================================================
> 
> cat haproxy.log | grep -v "\-\-[NV][NI] [0-9]*/[0-9]*/[0-9]*/[0-9]*/0"
> ==================================================================================
> Feb 18 04:49:43 localhost haproxy[9335]: Connect() failed for backend 
> https: no free ports.
> Feb 18 04:50:18 localhost haproxy[9335]: <IP:port> 
> [18/Feb/2014:04:49:40.481] https~ https/app_server1 3120/0/1/4262/37675 
> 200 836 - - --NI 22/22/22/22/1 0/0 "POST /url HTTP/1.1"
> ==================================================================================
> 
> So it seems that error on 04:49:43 was retried and retry is logged on 
> 04:50:18, isn't it?
> Could you please confirm that above log says that connect was 
> successfully retried after an error?

Yes, that's exactly it. It's unfortunate that the connect error does not
provide enough information to correlate it with the corresponding traffic
log line, but since its purpose is only to alert on a dangerous system
condition and the traffic log will report the retry, it's not a big loss.
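
To make the check less manual, here is a rough sketch (not anything haproxy
ships) that pulls the retry counter out of an HTTP-format log line, i.e. the
last field of the block of 5 counters. It assumes each log entry is on a
single line (the archive wraps them), and the function name and the sample
client address are only placeholders:

```python
import re

# Tail of an HTTP-format traffic log line:
#   frontend backend/server Tq/Tw/Tc/Tr/Tt status bytes cookie cookie flags
#   actconn/feconn/beconn/srv_conn/retries ...
LINE_RE = re.compile(
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>[-+\d]+(?:/[-+\d]+){4}) '
    r'(?P<status>-?\d+) (?P<bytes>\+?\d+) \S+ \S+ (?P<flags>\S{4}) '
    r'(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/'
    r'(?P<srv_conn>\d+)/(?P<retries>\+?\d+) '
)

def retries_from_line(line):
    """Return the retry count of a traffic log line, or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None  # e.g. the "Connect() failed" alert, which has no counters
    # A leading '+' may precede the value; strip it before converting.
    return int(m.group('retries').lstrip('+'))

# Placeholder client address; the rest is copied from the log quoted above.
sample = ('Feb 18 04:50:18 localhost haproxy[9335]: 192.0.2.1:40000 '
          '[18/Feb/2014:04:49:40.481] https~ https/app_server1 '
          '3120/0/1/4262/37675 200 836 - - --NI 22/22/22/22/1 0/0 '
          '"POST /url HTTP/1.1"')
print(retries_from_line(sample))
```

Any line for which this returns a value above zero is a request that needed
at least one connection retry.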

> >Alternately, you can use the "source" parameter either on each server
> >or in the backend to fix a port range. Haproxy will then use an explicit
> >bind. This is normally used when you want to have more than 64k conns on
> >multiple servers. But here you could try this :
> >
> >    source 0.0.0.0:32678-61000
> >  
> Great! As I understand we can set port range to exclude listening ports 
> and so eliminate such errors?  Probably it may be a good workaround. 
> Thank you again for the idea.

Yes, but the range is contiguous, so you cannot punch holes in it.
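
Since the range has to be one contiguous block, a simple way to pick it is
to look for the widest gap between your listening ports. The helper below is
only an illustrative sketch: the function name and the listening ports in
the example are made up, and the bounds are the ones quoted above.

```python
def largest_free_range(low, high, listening):
    """Widest inclusive (start, end) sub-range of [low, high] that contains
    no listening port, or None if every port in the range is taken."""
    best = None
    start = low
    # Walk the listening ports inside the range in order; a sentinel just
    # past 'high' closes the last gap.
    blockers = sorted({p for p in listening if low <= p <= high}) + [high + 1]
    for port in blockers:
        end = port - 1
        if end >= start and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
        start = port + 1
    return best

# Hypothetical listeners on 40000 and 50000 inside the quoted range.
print(largest_free_range(32678, 61000, [40000, 50000]))
```

The resulting pair is what would go after the address in the "source" line.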

> According to your answers I've re-read source code, documentation, 
> reviewed logs and below is my conclusion about our issue. Could you 
> please check it and correct if I'm wrong or miss something.
> 
> 1. If 'retries' parameter is set to a value higher than 1 then on such 
> connect error haproxy automatically retries connect attempt.

Yes.

> 2. Such errors usually mean critical issues with resources, so their 
> logging is very important and so will not be removed.

Exactly.

> 3. The information that error is successfully resolved can be taken only 
> from 'info' level logging, moreover -- only by indirect way from a 
> periodical statistics.

Or by ensuring that you don't have any connection error in your logs.

> 4. So, conclusion, we can either just ignore such errors in logs or 
> (more preferably) workaround them by using source port range limits to 
> exclude listening ports.

I'd adopt a different procedure:
  - traffic logs should always be analysed to ensure that everything is
    in good health (load balancers, network, servers);
  - a higher connection error rate on a specific server indicates a problem
    related to that server;
  - connection errors equally distributed across all servers indicate a
    problem with either the load balancer or the network. Network problems
    generally cause packet losses, so you'll see large timers. Resource
    shortages are generally detected immediately and result in a quick error;
  - these "no free ports" errors in the logs are always a concern
    and may indicate a misbehaving local service or a problem with the
    configuration. I'd rather not filter them out, so that you get
    alerted, can identify their cause, and get rid of them by fixing it.
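
The per-server comparison described above could be sketched like this. It is
an assumption-laden illustration rather than a real tool: it expects one
HTTP-format log entry per line, uses the retry counter as the sign of a
connection error, and the function name is invented.

```python
import re
from collections import Counter

# backend/server, the five timers, the status, then the 5-counter block whose
# last field is the retry count, followed by the two queue counters.
FIELDS_RE = re.compile(
    r' [^/\s]+/(?P<server>\S+) [-+\d/]+ -?\d+ .*? '
    r'(?:\d+/){4}(?P<retries>\+?\d+) \d+/\d+ '
)

def retry_rate_by_server(lines):
    """Map each server name to the fraction of its requests that were retried."""
    total, retried = Counter(), Counter()
    for line in lines:
        m = FIELDS_RE.search(line)
        if m is None:
            continue  # alerts and other non-traffic lines
        total[m['server']] += 1
        if int(m['retries'].lstrip('+')) > 0:
            retried[m['server']] += 1
    return {server: retried[server] / total[server] for server in total}
```

A rate that stands out on one server points at that server; similar rates
everywhere point at the load balancer or the network.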

> If all above is correct (could you please confirm it) so it seems that 
> our issue is resolved.

If you apply the "source" parameter to exclude the ports you're listening on,
you should never see these errors anymore, so you won't need to ignore
them :-)

Regards,
Willy

