On Wed, Mar 16, 2016 at 5:54 AM, Zachary Punches <zpunc...@getcake.com>
wrote:

> Hello!
>
> My name is Zack, and I have been in the middle of an ongoing HAProxy
> issue that has me scratching my head.
>
> Here is the setup:
>
> Our setup is hosted by Amazon, and we run HAProxy (1.6.3) boxes in 3
> regions, 2 per region, for a total of 6 proxy boxes.
>
> Traffic is routed to these boxes through Route 53. Their entire job is
> to forward data from one of our clients to our database backend. They
> handle this absolutely fine, except between the hours of 7pm PST and
> 7am PST. During those hours, our Route 53 health checks time out,
> causing the traffic to switch to the other HAProxy box inside the same
> region.
>
> During the other 12 hours of the day, we receive 0 alerts from our
> health checks.
>
> I have noticed that we get a series of SSL handshake failures (though
> these happen throughout the entire day) that cause the server to hang
> for a second, which in turn causes the health checks to fail. During
> the day the SSL failures do not cause the server to hang long enough to
> fail the checks; they only fail at night. I have attached my HAProxy
> config hoping that you guys have an answer for me. Let me know if you
> need any more info.
>
> I have done a few tcpdump captures during the SSL handshake failures
> (not at night while the checks are failing, but during the day, when we
> still get the handshake failures without failing the health check) and
> it seems there is a disconnect and a reconnect during the handshake.
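>
> Something along these lines for the capture (the interface name is a
> guess for our boxes; 1026 is the SSL bind port from the config below):
>
>     tcpdump -i eth0 -s 0 -w ssl-failures.pcap port 1026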
>
> Here is my config. I will be running a tcpdump tonight to capture the
> packets during the failure and will attach it if you guys need more
> info.
>
> #---------------------------------------------------------------------
> # Example configuration for a possible web application.  See the
> # full configuration options online.
> #
> #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
> #
> #---------------------------------------------------------------------
>
> #---------------------------------------------------------------------
> # Global settings
> #---------------------------------------------------------------------
> global
>     log         127.0.0.1 local2
>     pidfile     /var/run/haproxy.pid
>     maxconn     30000
>     user        haproxy
>     group       haproxy
>     daemon
>     ssl-default-bind-options no-sslv3 no-tls-tickets
>     tune.ssl.default-dh-param 2048
>
>     # turn on stats unix socket
> #    stats socket /var/lib/haproxy/stats
>
> #---------------------------------------------------------------------
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #---------------------------------------------------------------------
> defaults
>     mode                    http
>     log                     global
>     option                  httplog
>     retries                 3
>     timeout http-request    5s
>     timeout queue           1m
>     timeout connect         31s
>     timeout client          31s
>     timeout server          31s
>     maxconn                 15000
>
> # Stats
>     stats enable
>     stats uri   /haproxy?stats
>     stats realm Strictly\ Private
>     stats auth  $StatsUser:$StatsPass
>
> #---------------------------------------------------------------------
> # main frontend which proxies to the backends
> #---------------------------------------------------------------------
> frontend shared_incoming
>     maxconn 15000
>     timeout http-request 5s
>
> #   Bind ports of incoming traffic
>     bind *:1025 accept-proxy # http
>     bind *:1026 accept-proxy ssl crt /path/to/default/ssl/cert.pem crt /path/to/cert/folder/ # https
>     bind *:1027 # Health checking port
>
>     acl gs_texthtml url_reg \/gstext\.html    ## allow gs to do meta tag verification
>     acl gs_user_agent hdr_sub(User-Agent) -i globalsign    ## allow gs to do meta tag verification
>
> #   Add headers
>     http-request set-header $Proxy-Header-Ip %[src]
>     http-request set-header $Proxy-Header-Proto http if !{ ssl_fc }
>     http-request set-header $Proxy-Header-Proto https if { ssl_fc }
>
> #   Route traffic based on domain
>     use_backend gs_verify if gs_texthtml or gs_user_agent    ## allow gs meta tag verification
>     use_backend %[req.hdr(host),lower,map_dom(/path/to/map/file.map,unknown_domain)]
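>     # the map file is plain "domain backend" pairs, one per line, e.g.
>     # (hypothetical entries; unmatched hosts fall back to unknown_domain):
>     #   example.com       server1
>     #   api.example.net   server2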
>
> #   Drop unrecognized traffic
>     default_backend unknown_domain
>
> #---------------------------------------------------------------------
> # Backends
> #---------------------------------------------------------------------
>
> backend server0  ## added to allow gs ssl meta tag verification
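>     # the reqrep below rewrites any GET request line to
>     # GET /GlobalSignVerification before it is sent to server0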
>     reqrep ^GET\ /.*\ (HTTP/.*)    GET\ /GlobalSignVerification\ \1
>     server server0_http server0.domain.com:80
>
> backend server1
>     server server1_http server1.domain.net:80
>
> backend server2
>     server server2_http server2.domain.net:80
>
> backend server3
>     server server3_http server3.domain.net:80
>
> backend server4
>     server server4_http server4.domain.net:80
>
> backend server5
>     server server5_http server5.domain.net:80
>
> backend server6
>     server server6_http server6.domain.net:80
>
> backend server7
>     server server7_http server7.domain.net:80
>
> backend server8
>     server server8_http server8.domain.net:80
>
> backend server9
>     server server9_http server9.domain.net:80
>
> backend unknown_domain
>     timeout connect 4s
>     timeout server 4s
>     errorfile 503 /etc/haproxy-shared/errors/404.html

I would say the best thing is to move the HAP health checks to separate
port(s). Also, I don't see the need for the health check being over SSL at
all. I have the following setup on each HAP we use in AWS:

# check LB status
frontend monitor-in
    bind *:33305
    mode http
    option httplog
    monitor-uri /monitor

# this checks the backend health instead of HAP health
frontend health
    bind *:34180
    mode http
    # create a status URI in /haproxy_status that will return
    # a 200 if backend is healthy, and 503 if it isn't. This
    # URI can be queried by an ELB or Route53.
    acl backend_dead nbsrv(tomcats) lt 1
    monitor-uri /haproxy_status
    monitor fail if backend_dead
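
You can test either URI by hand, e.g.:

    curl -i http://localhost:33305/monitor
    curl -i http://localhost:34180/haproxy_status

The first returns 200 as long as the HAP process is up; the second flips
to 503 once no server in the tomcats backend is left.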

frontend localhost
    bind *:80
    bind *:443 ssl crt /etc/haproxy/star_encompasshost_com.crt no-sslv3
ciphers
ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-RSA-RC4-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES128-SHA:AES256-SHA256:AES256-SHA:EDH+aRSA:DHE-RSA-AES256-SHA256:RC4-SHA:!aNULL:!eNULL:!LOW:!EXP:!RC4
    mode http
    default_backend tomcats
...
backend tomcats
...
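    # inter 10s = check interval (downinter 5s while down); rise/fall 2 =
    # consecutive checks needed to change state; slowstart 60s ramps
    # traffic back up gradually after a server recovers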
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256
    server s1 ...
    server s2 ...
    server s3 ...
...

Then in Route53 I can use a check on port 33305 to find out if the HAP is
still alive, and/or one on port 34180 to find out if there is still any
backend left to handle the requests.
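
For example, the 33305 check can be created with the AWS CLI roughly like
this (the IP address and thresholds here are just illustrative):

    aws route53 create-health-check --caller-reference hap-check-1 \
        --health-check-config IPAddress=203.0.113.10,Port=33305,Type=HTTP,ResourcePath=/monitor,RequestInterval=30,FailureThreshold=3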

Cheers,
Igor
