Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] wrote:
> Hi,
> We're having odd behavior (apparently have always but didn't realize it),
> where our backend httpchks "time out":
>
> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>
>
> We've been playing with the timeout values, and we don't know what is
> controlling the "Layer7 timeout, check duration: 1002ms". The backend
> service availability check (by hand) typically takes 2-3 seconds on average.
> Here is the relevant haproxy setup.
>
> #---------------------------------------------------------------------
> # Global settings
> #---------------------------------------------------------------------
> global
>     log-send-hostname opsslb1
>     log 127.0.0.1 local1 info
> #   chroot /var/lib/haproxy
>     pidfile /var/run/haproxy.pid
>     maxconn 1024
>     user haproxy
>     group haproxy
>     daemon
>
> #---------------------------------------------------------------------
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #---------------------------------------------------------------------
> defaults
>     mode http
>     log global
>     option dontlognull
>     option httpclose
>     option httplog
>     option forwardfor
>     option redispatch
>     timeout connect 500 # default 10 second time out if a backend is not found
>     timeout client 50000
>     timeout server 3600000
>     maxconn 60000
>     retries 3
>
> frontend webapp_ops_ft
>
>     bind 10.0.40.209:80
>     default_backend webapp_ops_bk
>
> backend webapp_ops_bk
>     balance roundrobin
>     option httpchk HEAD /app/availability
>     reqrep ^Host:.* Host:\ webapp.example.com
>     server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
>     server webapp_ops2 opsapp2.ops.example.com:41000 check inter 30000
>     server webapp_ops3 opsapp3.ops.example.com:41000 check inter 30000
>     timeout check 15000
>     timeout connect 15000

This is quite strange. The timeout is defined first by "timeout check" or,
if unset, by "inter". So in your case you should observe a 15 sec timeout,
not one second.

What exact version is this? (haproxy -vv)

It looks like a bug; however, it could be a bug in the timeout handling as
well as in the reporting. I'd suspect the latter, since you're saying that
the service takes 2-3 sec to respond and you don't seem to see errors that
often.

Regards,
Willy
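
To illustrate the precedence described above, here is a minimal sketch that
reuses the backend and server names from the quoted configuration. It only
annotates which setting is expected to govern the check timeout; it is not a
verified fix for the reported 1-second timeouts.

```
backend webapp_ops_bk
    option httpchk HEAD /app/availability

    # When "timeout check" is set, it is the value expected to bound how
    # long the health check may wait for its response (15000 ms here).
    timeout check 15000

    # "inter" is the interval between checks. It is expected to act as the
    # check timeout only when "timeout check" is not set.
    server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
```

With these values, a "Layer7 timeout" reported after roughly 1000 ms does not
match what the configuration asks for, which is why the exact version
(haproxy -vv) matters for narrowing this down.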

