Benjamin Smith <lists@...> writes:

> 
> Igor, 
> 
> Thanks for the response; I didn't see this email until just now, as it
> didn't go through the mailing list and so wasn't filtered as expected.
> 
> I spent my morning trying everything I could think of to get haproxy's
> agent-check to work consistently. The main symptom is that haproxy
> would mark hosts with the status "DRAIN" and provide no clue as to why,
> even with log-health-checks on. After a *lot* of trial and error, I've
> found the following, which seem to be bugs, on the latest 1.5.11
> release, running on CentOS 6.
> 
> 1) agent-check output words are sometimes handled inconsistently,
> ignored, or misunderstood if " " (a space) is used instead of "," as
> the separator.
> 
> This is understood: 
> echo "ready,78%\r\n"
> 
> This line often causes a DRAIN state, and a restart of haproxy was
> insufficient to clear it (see #3):
> echo "ready 78%\r\n"
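> 
> (A quick way to see exactly which bytes the agent returns, separator
> and line ending included, is to dump the raw reply, e.g.:
> 
> $ nc 10.1.1.12 9333 < /dev/null | od -c
> 
> od -c prints each character, with \r and \n shown explicitly.)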
> 
> 2) Inconsistent logging of the DRAIN status change when health logging
> was on: the server would turn blue in the stats page without any
> logging as to why. The log would sometimes say "Server $service/$name
> is UP (leaving forced drain)" even as the stats page continued to
> report the DRAIN state!
> 
> 3) Even when the agent output was amended as above, hosts that had been
> set to the DRAIN state per issue #1 were not brought back to the
> ready/up state until "enable health $service/$host" and/or "enable
> agent $service/$host" was sent to the stats port.
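> 
> For the record, we sent those commands with socat against the stats
> socket; the backend/server names and socket path below are placeholders
> for whatever your config uses, and the socket needs "level admin":
> 
> echo "enable health www_backend/web1" | socat stdio /var/run/haproxy.sock
> echo "enable agent www_backend/web1" | socat stdio /var/run/haproxy.sock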
> 
> 4) Setting the server weight to 10 seems to help a significant amount.
> If, in fact, haproxy can't handle 35% of a weight of 1, it should throw
> an error on startup, IMHO.
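> 
> My guess at the mechanism: the agent percentage scales the configured
> weight, and with the default weight of 1, anything under 100% truncates
> to an effective weight of 0, which haproxy displays as DRAIN. With
> weight 10, 35% still leaves a usable weight of 3. Illustrative server
> line (our config from below, plus "weight 10"):
> 
> server server10 10.1.1.10:20333 weight 10 maxconn 256 check agent-check agent-port 9333 agent-inter 4000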
> 
> See also my comments interspersed below: 
> 
> Thanks, 
> 
> Benjamin Smith 
> 
> On Tuesday, April 14, 2015 10:50:31 AM you wrote:
> > On Tue, Apr 14, 2015 at 10:11 AM, Igor Cicimov <igorc@...> wrote:
> > > On Tue, Apr 14, 2015 at 5:00 AM, Benjamin Smith <lists@...> wrote:
> > >> We have 5 Apache servers behind haproxy and we're trying to use
> > >> the "httpchk" option along with some performance monitoring. For
> > >> some reason, haproxy keeps thinking that 3/5 apache servers are
> > >> "down" even though it's obvious that haproxy is asking the
> > >> questions and the servers are answering.
> > >> 
> > >> Is there a way to log httpchk failures? How can I ask haproxy why
> > >> it seems to think that several apache servers are down?
> > >> 
> > >> Our config:
> > >> CentOS 6.x recently updated, 64 bit.
> > >> 
> > >> Performing an agent-check manually seems to give good results. The
> > >> result below is immediate:
> > >> [root <at> xr1 ~]# telnet 10.1.1.12 9333
> > >> Trying 10.1.1.12...
> > >> Connected to 10.1.1.12.
> > >> Escape character is '^]'.
> > >> up 78%
> > >> Connection closed by foreign host.
> > >> 
> > >> 
> > >> I can see that xinetd on the logic server got the response:
> > >> Apr 13 18:45:02 curie xinetd[21890]: EXIT: calcload333 status=0 pid=25693 duration=0(sec)
> > >> Apr 13 18:45:06 curie xinetd[21890]: START: calcload333 pid=26590 from=::ffff:10.1.1.1
> > >> 
> > >> 
> > >> I can see that apache is serving happy replies to the load balancer:
> > >> [root <at> curie ~]# tail -f /var/log/httpd/access_log | grep -i "10.1.1.1 "
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:15 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:17 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:19 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> ^C
> > > 
> > > I have a feeling you might be a little bit confused here. Per my
> > > understanding, and your configuration:
> > > 
> > > server server10 10.1.1.10:20333 maxconn 256 *check agent-check agent-port 9333 agent-inter 4000*
> > > 
> > > haproxy is doing a health check on the agent you are using, not on
> > > Apache, so the Apache response in this case looks irrelevant to me.
> > > I don't know how you set up the agent, since you haven't posted
> > > that part, but this is an excellent article by Malcolm Turnbull,
> > > the inventor of agent-check, that might help:
> > > 
> > > 
> > > http://blog.loadbalancer.org/open-source-windows-service-for-reporting-server-load-back-to-haproxy-load-balancer-feedback-agent/
> 
> We used this exact blog entry as our starting point. In our case, the
> xinetd script combines load average, apache process count, cpu info,
> and a little salt to come up with a number ranging from 0% to 500%.
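> 
> A stripped-down sketch of the same idea (our production script has more
> inputs; the scaling here is illustrative only):
> 
> #!/bin/bash
> # Launched by xinetd once per connection; stdout goes back to haproxy.
> load=$(cut -d' ' -f1 /proc/loadavg)
> cores=$(grep -c ^processor /proc/cpuinfo)
> httpd=$(ps -C httpd --no-headers | wc -l)
> # Scale the weight down as the load approaches the core count, nudge
> # it by the apache process count, and clamp to the 0-500% range we use.
> pct=$(awk -v l="$load" -v c="$cores" -v h="$httpd" 'BEGIN {
>     p = int(100 * (2 - l / c) - h / 4);
>     if (p < 0) p = 0; if (p > 500) p = 500;
>     print p }')
> printf "ready,%d%%\r\n" "$pct"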
> 
> > and press enter twice and check the output. Another option is to use
> > curl:
> > 
> > $ curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333/
> 
> [root <at> xr1 ~]# curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333
> HTTP/1.1 302 Found
> Date: Tue, 14 Apr 2015 23:39:40 GMT
> Server: Apache/2.2.15 (CentOS)
> X-Powered-By: PHP/5.3.3
> Set-Cookie: PHPSESSID=3ph0dvg4quebl1b2e711d8i5p1; path=/; secure
> Cache-Control: public, must-revalidate, max-age=0
> X-Served-By: curie.-SNIP-
> Location: /mod.php/index.php
> Vary: Accept-Encoding
> Content-Length: 0
> Connection: close
> Content-Type: text/html; charset=UTF-8
> 
> > and some variations of the above that I often use to check the
> > headers only:
> > 
> > $ curl -s -S -I --http1.0 -X OPTIONS http://10.1.1.12:20333/
> > $ curl -s -S -D - --http1.0 -X OPTIONS http://10.1.1.12:20333/
> > 
> > You can also try the health check with HTTP/1.1, which provides
> > keepalive, but you need to specify the Host header in that case.
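> > 
> > e.g., something like this (the Host value is just a placeholder):
> > 
> > $ curl -s -S -i -X OPTIONS -H "Host: www.example.com" http://10.1.1.12:20333/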
> > 
> > By the way, any errors in the haproxy logs? Maybe set the log mode
> > to debug?
> 
> Originally there was very little useful data in the log files at all.
> Adding log-health-checks helped, but the logging is still
> frustratingly incomplete.
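> 
> For anyone else chasing this, the relevant knobs are roughly these (a
> sketch, not our full config; the backend name is a placeholder):
> 
> global
>     log 127.0.0.1 local0 debug
> 
> backend www_backend
>     option httpchk OPTIONS / HTTP/1.0
>     option log-health-checks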
> 
> 

I ran into a similar scenario yesterday on 'HA-Proxy version 1.5.4 
2014/09/02'. We had a situation where a radosgw on one of the backend 
servers was not releasing memory, apache was occasionally showing 500 
error messages in the apache log, and rados had stopped logging, so we 
wanted to stop new connections going to the server and then restart 
radosgw. Initially we issued the 'server disable' command to the haproxy 
socket and saw in the health status that the server was in MAINT mode, 
but we didn't see new connections stop arriving at the server. We waited 
and watched, and about 30 minutes later we issued the 'status drain' 
command to the socket and immediately saw connections stop arriving. We 
restarted the radosgw service at that point. We then issued the 'status 
ready' command to the socket and saw connections start arriving again, 
but the server was flip-flopping between the MAINT and DRAIN states in 
the health status output. It was only after I re-issued the 'server 
enable' command twice to the socket that the server appeared in the 
health status output as consistently green in its row and showing UP 
status. However, I still saw the following in the logs after issuing 
the last 'server enable' command.

From haproxy-status.log:
+++++
Server swift_cluster/css-host1-036 is UP/READY (leaving forced maintenance).
Server swift_nonssl_cluster/css-host1-036 is UP (leaving forced drain).
+++++

The hosts still appeared to be receiving new connections even before I 
re-issued the 'server enable' command to the socket, back when the 
health status output was showing the host flip-flopping between the 
MAINT and DRAIN states. Everything seems ok/UP now and has been stable 
for the last 12 hours, with no apparent change in the health status 
output.
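
In case it's useful to anyone, we were watching the state changes with a 
quick loop against the admin socket (socket path as configured in our 
global section; in our build the status is the 18th field of the 'show 
stat' CSV):

$ watch -n5 'echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,18'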

...hope this helps...

Lucky



