Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> Hi,
> We're having odd behavior (apparently have always but didn't realize it), 
> where our backend httpchks "time out":
> 
> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> 
> 
> We've been playing with the timeout values, and we don't know what is 
> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
> service availability check (by hand) typically takes 2-3 seconds on average.
> Here is the relevant haproxy setup.
> 
> #---------------------------------------------------------------------
> # Global settings
> #---------------------------------------------------------------------
> global
>     log-send-hostname opsslb1
>     log         127.0.0.1 local1 info
> #    chroot      /var/lib/haproxy
>     pidfile     /var/run/haproxy.pid
>     maxconn     1024
>     user        haproxy
>     group       haproxy
>     daemon
> 
> #---------------------------------------------------------------------
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #---------------------------------------------------------------------
> defaults
>     mode        http
>     log         global
>     option      dontlognull
>     option      httpclose
>     option      httplog
>     option      forwardfor
>     option      redispatch
>     timeout connect 500 # default 10 second time out if a backend is not found
>     timeout client 50000
>     timeout server 3600000
>     maxconn     60000
>     retries     3
> 
> frontend webapp_ops_ft
>     
>         bind 10.0.40.209:80
>         default_backend webapp_ops_bk
> 
> backend webapp_ops_bk
>         balance roundrobin
>         option httpchk HEAD /app/availability
>         reqrep ^Host:.* Host:\ webapp.example.com
>         server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
>         server webapp_ops2 opsapp2.ops.example.com:41000 check inter 30000
>         server webapp_ops3 opsapp3.ops.example.com:41000 check inter 30000
>         timeout check 15000
>         timeout connect 15000

This is quite strange. The check timeout is defined first by "timeout check"
or, if that is unset, by "inter". So in your case you should observe a
15-second timeout, not one second.
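
To make the rule above concrete, here is a minimal Python sketch of the
selection as described in this message. It only models the stated rule and is
not haproxy's own code; the function name effective_check_timeout is made up
for illustration.

```python
from typing import Optional


def effective_check_timeout(timeout_check_ms: Optional[int], inter_ms: int) -> int:
    """Hypothetical helper: pick the check timeout per the rule stated above.

    "timeout check" wins when it is set; otherwise "inter" is used.
    """
    if timeout_check_ms is not None:
        return timeout_check_ms
    return inter_ms


# Values from the posted backend: "timeout check 15000" and "check inter 30000".
print(effective_check_timeout(15000, 30000))
# Same backend with "timeout check" removed.
print(effective_check_timeout(None, 30000))
```

With the posted configuration this gives 15000 ms, which is why a check
duration of about 1000 ms does not match either configured value.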

What exact version is this? (haproxy -vv)

It looks like a bug, but it could be either in the timeout handling or in
the reporting. I'd suspect the latter, since you say that the service takes
2-3 sec to respond and you don't seem to see errors that often.
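
To help tell the two cases apart, it may be useful to tabulate the reported
check durations from the log. The following is a rough Python sketch, assuming
each "is DOWN" entry sits on a single line in the real log file (the quoted
lines above are wrapped by the mail client). The names DOWN_RE and summarize
are illustrative only.

```python
import re
from collections import Counter

# Matches the "is DOWN" entries in the format quoted above.
DOWN_RE = re.compile(
    r"Server (?P<backend>\S+)/(?P<server>\S+) is DOWN, "
    r"reason: (?P<reason>[^,]+), check duration: (?P<ms>\d+)ms"
)


def summarize(lines):
    """Return (list of durations in ms, Counter of DOWN events per server)."""
    durations = []
    per_server = Counter()
    for line in lines:
        m = DOWN_RE.search(line)
        if m:
            durations.append(int(m.group("ms")))
            per_server[m.group("server")] += 1
    return durations, per_server


# Two entries from the report above, rejoined onto single lines.
sample = [
    "May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is "
    "DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup "
    "servers left. 1 sessions active, 0 requeued, 0 remaining in queue.",
    "May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is "
    "DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup "
    "servers left. 0 sessions active, 0 requeued, 0 remaining in queue.",
]

durations, per_server = summarize(sample)
print(durations, dict(per_server))
```

If every reported duration clusters around 1000 ms regardless of server, that
is worth including alongside the "haproxy -vv" output.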

Regards,
Willy

