Re: Problems with layer7 check timeout

Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] Thu, 24 May 2012 15:03:48 -0700

Err...more precisely...
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>


Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

On May 24, 2012, at 5:31 PM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] 
wrote:

> 
> 
> I thought it was a bug in the reporting, considering we've played with 
> numerous values for the various timeouts as an experiment, but wanted your 
> thoughts.
> This is v1.4.15.
> 
> [root@opsslb1 log]# haproxy -v
> HA-Proxy version 1.4.15 2011/04/08
> Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>
> 
> On May 24, 2012, at 5:17 PM, Willy Tarreau wrote:
> 
>> Hi Kevin,
>> 
>> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. 
>> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>>> Hi,
>>> We're seeing odd behavior (apparently we always have, but didn't realize it), 
>>> where our backend httpchks "time out":
>>> 
>>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> 
>>> 
>>> We've been playing with the timeout values, and we don't know what is 
>>> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
>>> service availability check (by hand) typically takes 2-3 seconds.
>>> Here is the relevant haproxy setup.
>>> 
>>> #---------------------------------------------------------------------
>>> # Global settings
>>> #---------------------------------------------------------------------
>>> global
>>>   log-send-hostname opsslb1
>>>   log         127.0.0.1 local1 info
>>> #    chroot      /var/lib/haproxy
>>>   pidfile     /var/run/haproxy.pid
>>>   maxconn     1024
>>>   user        haproxy
>>>   group       haproxy
>>>   daemon
>>> 
>>> #---------------------------------------------------------------------
>>> # common defaults that all the 'listen' and 'backend' sections will
>>> # use if not designated in their block
>>> #---------------------------------------------------------------------
>>> defaults
>>>   mode        http
>>>   log         global
>>>   option      dontlognull
>>>   option      httpclose
>>>   option      httplog
>>>   option      forwardfor
>>>   option      redispatch
>>>   timeout connect 500 # default 10 second time out if a backend is not found
>>>   timeout client 50000
>>>   timeout server 3600000
>>>   maxconn     60000
>>>   retries     3
>>> 
>>> frontend webapp_ops_ft
>>> 
>>>       bind 10.0.40.209:80
>>>       default_backend webapp_ops_bk
>>> 
>>> backend webapp_ops_bk
>>>       balance roundrobin
>>>       option httpchk HEAD /app/availability
>>>       reqrep ^Host:.* Host:\ webapp.example.com
>>>       server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
>>>       server webapp_ops2 opsapp2.ops.example.com:41000 check inter 30000
>>>       server webapp_ops3 opsapp3.ops.example.com:41000 check inter 30000
>>>       timeout check 15000
>>>       timeout connect 15000
>> 
>> This is quite strange. The timeout is defined first by "timeout check" or if
>> unset, by "inter". So in your case you should observe a 15sec timeout, not
>> one second.
>> 
>> What exact version is this? (haproxy -vv)
>> 
>> It looks like a bug, however it could be a bug in the timeout handling as
>> well as in the reporting. I'd suspect the latter since you're saying that
>> the service takes 2-3 sec to respond and you don't seem to see errors
>> that often.
>> 
>> Regards,
>> Willy
>> 
> 
> Kevin Lange
> kevin.m.la...@nasa.gov
> kla...@raytheon.com
> W: +1 (301) 851-8450
> Raytheon  | NASA  | ECS Evolution Development Program
> https://www.echo.com  | https://www.raytheon.com
> 
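The rule Willy states in the quoted reply ("timeout check" first, otherwise "inter") can be written out as a small sketch to show why the logged durations look wrong. The function name `effective_check_timeout_ms` is hypothetical and models only that stated rule, not HAProxy's actual implementation; the numbers are copied from the posted backend section and log lines.

```python
from typing import Optional


def effective_check_timeout_ms(inter_ms: int,
                               timeout_check_ms: Optional[int] = None) -> int:
    """Rule as stated in the thread: use "timeout check" when it is set,
    otherwise fall back to the server's "inter" value.
    (Hypothetical helper, not HAProxy code.)"""
    if timeout_check_ms is not None:
        return timeout_check_ms
    return inter_ms


# Values from the posted "backend webapp_ops_bk" section.
INTER_MS = 30000          # server ... check inter 30000
TIMEOUT_CHECK_MS = 15000  # timeout check 15000

# "check duration" values from the nine posted log lines.
OBSERVED_MS = [1002, 1001, 1002, 1001, 1002, 1001, 1001, 1002, 1001]

expected = effective_check_timeout_ms(INTER_MS, TIMEOUT_CHECK_MS)
print("expected check timeout (ms):", expected)
print("every logged duration is below it:",
      all(d < expected for d in OBSERVED_MS))
```

Under that rule the check should not be declared timed out before 15000 ms, yet every logged failure is reported at about 1000 ms, which is the mismatch being discussed.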

Kevin Lange
kevin.m.la...@nasa.gov
kla...@raytheon.com
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com
