Re: Externalizing health checks

Bhaskar Maddala Thu, 06 Feb 2014 11:46:56 -0800

Hello,

   Since I did not get any responses on this, I decided to try to motivate
a response by attempting an implementation. I am attaching a patch that does
this. Admittedly this patch is a first iteration, and I am submitting it only
to receive feedback on the requirement, alternative ideas, and the
implementation.

An explanation follows.

I added an option, httpchksrv, which takes an IPv4/IPv6 address (the external
health checker) and an optional HTTP header name. The header is used to tell
the health check server which backend server to check.

        option      httpchk   GET /_health.php HTTP/1.1
        option      httpchksrv    <ipv4|ipv6> [header <http-header-name=X-Check-For>]
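
As a rough illustration of how the optional header argument is resolved, here
is a minimal standalone sketch that mirrors the parsing logic in the attached
patch. The function name pick_check_header and the argument layout (an array
terminated by an empty string) are assumptions made for this example; they
are not part of haproxy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEF_CHECK_HOST_HDR "X-Check-For"  /* default name, as in the patch */

/* Hypothetical helper: picks the header name from the words that follow
 * the check server address. An empty string marks the end of `args`,
 * mirroring how the patch tests *args[n].
 * Returns the header name, or NULL when the arguments are malformed. */
const char *pick_check_header(const char *const args[])
{
    assert(args != NULL);
    if (!*args[0])                        /* nothing after the address */
        return DEF_CHECK_HOST_HDR;
    if (!strcmp(args[0], "header") && *args[1])
        return args[1];                   /* explicit "header <name>" */
    return NULL;                          /* unknown keyword or missing name */
}
```

With no extra words the default name is used; "header X-Backend" selects
X-Backend; anything else is rejected, which corresponds to the fatal alert
in the patch.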

Next, I added a "header-value" specification to the server definition

         server a1 magic.tumblr.com:80 weight 20 maxconn 5 check inter 2s header-value magic.tumblr.com

The header-value is sent as the value of the header named in httpchksrv.

Here is an example of the health check request

GET /_health.php HTTP/1.1
X-Check-For: magic.tumblr.com

The default value of header-value is the server id, in this case 'a1'
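
To make the request layout concrete, here is a small sketch of how such a
check request could be assembled. The helper names build_check_request and
check_header_value are hypothetical, and the blank line that terminates the
request is an assumption of this sketch rather than something the patch
itself writes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: writes the request line
 * followed by "<hdr_name>: <hdr_val>" into buf. The final blank line that
 * ends the request is an assumption of this sketch.
 * Returns the request length, or -1 if buf is too small. */
int build_check_request(char *buf, size_t size, const char *req_line,
                        const char *hdr_name, const char *hdr_val)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%s\r\n%s: %s\r\n\r\n", req_line, hdr_name, hdr_val);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* The server id is the fallback when no header-value is configured. */
const char *check_header_value(const char *server_id, const char *header_value)
{
    return header_value ? header_value : server_id;
}
```

For server a1 with header-value magic.tumblr.com this yields the request
shown above; without header-value the header would carry 'a1'.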

The following is a little abstract; it describes how health checks can be
cached using this change. Please bear with my attempts to describe it, which
may be inadequate. Take it for what it is: broad strokes of an idea. I am not
in any way advocating for this deployment.

Going back to my original motivation, "excessive health checks due to
increasing proxy and web application deployment", here is how I could solve it
using this implementation.

On haproxy I define two frontends, one on port 80 and one on port 6777. With
haproxy in http mode, the httpchksrv specification directs health checks back
to haproxy on port 6777:

        option      httpchksrv    127.0.0.1:6777

Each server on the backend for port 80 (production traffic) is specified as:

         server a1 server:80 weight 20 maxconn 5 check inter 2s

I define a backend of varnish nodes to use with the frontend on port 6777.
I also make sure that the varnish backend uses only L4 health checks.

Health checks from all the proxies are passed to varnish via their frontend on
port 6777, consistently hashed on the HTTP header X-Check-For. Varnish VCL is
used to read the 'X-Check-For' header value and, if required, make a health
check request to the appropriate web host; otherwise it may return a cached
health check response, according to the configured TTL.
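
The caching idea can be sketched independently of Varnish. The following is
only an illustration of "return a cached result until the TTL expires",
keyed by the X-Check-For value; it is not how Varnish or the patch implements
it, and every name in it (check_cache, cache_get, cache_put, the slot count)
is an assumption made for this example.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64
#define KEY_MAX     128

struct check_entry {
    char   key[KEY_MAX];  /* X-Check-For value identifying the web host */
    int    healthy;       /* result of the last real check */
    time_t expires;       /* stale once now >= expires */
    int    used;
};

struct check_cache {
    struct check_entry slots[CACHE_SLOTS];
    time_t ttl;           /* seconds a result stays fresh */
};

/* Simple string hash (djb2), used only to pick a slot. */
static unsigned long hash_key(const char *key)
{
    unsigned long h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

void cache_init(struct check_cache *c, time_t ttl)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
    c->ttl = ttl;
}

/* Returns 1 and sets *healthy on a fresh hit, 0 when a real check is needed. */
int cache_get(const struct check_cache *c, const char *key, time_t now, int *healthy)
{
    const struct check_entry *e = &c->slots[hash_key(key) % CACHE_SLOTS];

    if (!e->used || now >= e->expires || strcmp(e->key, key) != 0)
        return 0;
    *healthy = e->healthy;
    return 1;
}

/* Records a fresh result; a colliding key simply overwrites the slot. */
void cache_put(struct check_cache *c, const char *key, int healthy, time_t now)
{
    struct check_entry *e = &c->slots[hash_key(key) % CACHE_SLOTS];

    if (strlen(key) >= KEY_MAX)
        return;           /* too long to store, stays uncached */
    strcpy(e->key, key);
    e->healthy = healthy;
    e->expires = now + c->ttl;
    e->used = 1;
}
```

With a TTL of 2 seconds, twenty proxies checking the same host within that
window would trigger one real check on the web host and nineteen cached
answers, which is the reduction the deployment above is after.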

Thanks
Bhaskar

On Fri, Jan 31, 2014 at 1:46 PM, Bhaskar Maddala <madda...@gmail.com> wrote:
> Hello,
>
>    As the number of haproxy deployments (>20) grows in our infrastructure,
> along with an increase in the number of backends (~1500), we are beginning
> to see non-trivial resources allocated to health checks: each proxy
> instance health checks each backend every 2 seconds.
>
>   In an earlier conversation with Willy I was directed to look into the
> fastinter and on-error configuration options. I have done this, but wanted
> to ask how others might have addressed this, whether there is any interest
> in implementing something along these lines, and to gather ideas/comments
> on what such an implementation would look like.
>
>  We use haproxy as a http load balancer and I have not given any thought
> about how the following description applies to tcp mode.
>
> Currently we http check our backends using
>
>     option httpchk GET /_check.php HTTP/1.1\r\nHost:\ www.domain.com
>
>   We were considering adding an additional directive to specify a check server
> in addition to the httpchk directive
>
>     option          httpchk     GET /_health.php HTTP/1.1\r\nHost:\ hdr(Host)
>     option          chksrv      server hcm-008dad0f 172.16.114.52:80
>
> The change would add a dynamic field to the health check request.
> hdr(Host) (http host header in this instance) is the field used to communicate
> the server to be health checked to the external check server.
>
> The check server can/will be implemented to cache health check responses from
> the back ends.
>
> One of the justifications for implementing this is the need in my
> environment to take into consideration factors not available to the
> backends when responding to a health check. As an example, we will be
> implementing in our check server the ability to force success/failure of
> health checks on groups of backends related in some manner. We expect this
> to allow us to avoid brown out scenarios we have encountered in the past.
>
> Has anyone considered/achieved something along these lines, or have
> suggestions on how we could implement the same?
>
> Thanks
> Bhaskar

From 914db3e485831e29e5b76bf3d276ce56442b498f Mon Sep 17 00:00:00 2001
From: Bhaskar Maddala <bhas...@tumblr.com>
Date: Wed, 5 Feb 2014 23:58:36 -0500
Subject: [PATCH] Attempt at adding ability to externalize health check

Summary:
We add new option 'httpchksrv' which allows us to specify
the server to use to health check backends. The backend
to health check is communicated via a http header.

The header value to be passed to the backend is specified
in the server specification using the new keyword
'header-value'

The default header is 'X-Check-For' and the default value
is the server id.
---
 include/common/defaults.h |   1 +
 include/types/proxy.h     |   5 ++-
 include/types/server.h    |   3 ++
 src/cfgparse.c            | 103 +++++++++++++++++++++++++++++++++++++++-------
 src/checks.c              |  13 +++++-
 5 files changed, 106 insertions(+), 19 deletions(-)

diff --git a/include/common/defaults.h b/include/common/defaults.h
index f765e90..dc0dd93 100644
--- a/include/common/defaults.h
+++ b/include/common/defaults.h
@@ -131,6 +131,7 @@
 #define DEF_SMTP_CHECK_REQ   "HELO localhost\r\n"
 #define DEF_LDAP_CHECK_REQ   "\x30\x0c\x02\x01\x01\x60\x07\x02\x01\x03\x04\x00\x80\x00"
 #define DEF_REDIS_CHECK_REQ  "*1\r\n$4\r\nPING\r\n"
+#define DEF_CHECK_HOST_HDR  "X-Check-For"
 
 #define DEF_HANA_ONERR         HANA_ONERR_FAILCHK
 #define DEF_HANA_ERRLIMIT      10
diff --git a/include/types/proxy.h b/include/types/proxy.h
index af2a3ab..12f82f5 100644
--- a/include/types/proxy.h
+++ b/include/types/proxy.h
@@ -338,7 +338,10 @@ struct proxy {
        int grace;                              /* grace time after stop request */
        struct list tcpcheck_rules;             /* tcp-check send / expect rules */
        char *check_req;                        /* HTTP or SSL request to use for PR_O_HTTP_CHK|PR_O_SSL3_CHK */
-       int check_len;                          /* Length of the HTTP or SSL3 request */
+       int check_req_len;                              /* Length of the HTTP or SSL3 request */
+       struct sockaddr_storage check_addr;   /* the address to check */
+       char *check_hdr_name;        /* HTTP header used to identify host being checked */
+       int check_hdr_name_len;          /* Length of the HTTP header */
        char *expect_str;                       /* http-check expected content : string or text version of the regex */
        regex_t *expect_regex;                  /* http-check expected content */
        struct chunk errmsg[HTTP_ERR_SIZE];     /* default or customized error messages for known errors */
diff --git a/include/types/server.h b/include/types/server.h
index 54ab813..52b60a5 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -161,6 +161,9 @@ struct server {
                struct sockaddr_storage addr;   /* the address to check, if different from <addr> */
        } check_common;
 
+       char *check_hdr_val;                   /* http header value used for health checkes */
+       int check_hdr_val_len;                      /* length of the http header value */
+
        struct check check;                     /* health-check specific configuration */
        struct check agent;                     /* agent specific configuration */
 
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 9993c61..05be933 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -1841,12 +1841,19 @@ int cfg_parse_listen(const char *file, int linenum, char **args, int kwm)
                if (curproxy->cap & PR_CAP_BE) {
                        curproxy->fullconn = defproxy.fullconn;
                        curproxy->conn_retries = defproxy.conn_retries;
+                       curproxy->check_addr = defproxy.check_addr;
 
                        if (defproxy.check_req) {
-                               curproxy->check_req = calloc(1, defproxy.check_len);
-                               memcpy(curproxy->check_req, defproxy.check_req, defproxy.check_len);
+                               curproxy->check_req = calloc(1, defproxy.check_req_len);
+                               memcpy(curproxy->check_req, defproxy.check_req, defproxy.check_req_len);
                        }
-                       curproxy->check_len = defproxy.check_len;
+                       curproxy->check_req_len = defproxy.check_req_len;
+
+                       if (defproxy.check_hdr_name) {
+                               curproxy->check_hdr_name = calloc(1, defproxy.check_hdr_name_len);
+                               memcpy(curproxy->check_hdr_name, defproxy.check_hdr_name, defproxy.check_hdr_name_len);
+                       }
+                       curproxy->check_hdr_name_len = defproxy.check_hdr_name_len;
 
                        if (defproxy.expect_str) {
                                curproxy->expect_str = strdup(defproxy.expect_str);
@@ -1990,6 +1997,8 @@ int cfg_parse_listen(const char *file, int linenum, char **args, int kwm)
                free(defproxy.monitor_uri);
                free(defproxy.defbe.name);
                free(defproxy.conn_src.iface_name);
+               free(defproxy.check_hdr_name);
+               defproxy.check_hdr_name_len = 0;
                free(defproxy.fwdfor_hdr_name);
                defproxy.fwdfor_hdr_len = 0;
                free(defproxy.orgto_hdr_name);
@@ -3631,11 +3640,11 @@ stats_error_parsing:
                        curproxy->options2 |= PR_O2_HTTP_CHK;
                        if (!*args[2]) { /* no argument */
                                curproxy->check_req = strdup(DEF_CHECK_REQ); /* default request */
-                               curproxy->check_len = strlen(DEF_CHECK_REQ);
+                               curproxy->check_req_len = strlen(DEF_CHECK_REQ);
                        } else if (!*args[3]) { /* one argument : URI */
                                int reqlen = strlen(args[2]) + strlen("OPTIONS  HTTP/1.0\r\n") + 1;
                                curproxy->check_req = (char *)malloc(reqlen);
-                               curproxy->check_len = snprintf(curproxy->check_req, reqlen,
+                               curproxy->check_req_len = snprintf(curproxy->check_req, reqlen,
                                                               "OPTIONS %s HTTP/1.0\r\n", args[2]); /* URI to use */
                        } else { /* more arguments : METHOD URI [HTTP_VER] */
                                int reqlen = strlen(args[2]) + strlen(args[3]) + 3 + strlen("\r\n");
@@ -3645,10 +3654,64 @@ stats_error_parsing:
                                        reqlen += strlen("HTTP/1.0");
                    
                                curproxy->check_req = (char *)malloc(reqlen);
-                               curproxy->check_len = snprintf(curproxy->check_req, reqlen,
+                               curproxy->check_req_len = snprintf(curproxy->check_req, reqlen,
                                                               "%s %s %s\r\n", args[2], args[3], *args[4]?args[4]:"HTTP/1.0");
                        }
                }
+               else if (!strcmp(args[1], "httpchksrv")) {
+                       if (warnifnotcap(curproxy, PR_CAP_BE, file, linenum, args[1], NULL))
+                                                       err_code |= ERR_WARN;
+
+                       /* use a external http check server instead of querying the server for health checks */
+                       if (!*args[2]) {
+                               Alert("parsing [%s:%d]: '%s' expects an <ipv4|ipv6> address.\n",
+                                     file, linenum, args[1]);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+
+                       struct sockaddr_storage *sk;
+                       int port1, port2;
+                       struct protocol *proto;
+
+                       sk = str2sa_range(args[2], &port1, &port2, &errmsg, NULL);
+                       if (!sk) {
+                               Alert("parsing [%s:%d] : '%s' : %s\n",
+                                         file, linenum, args[2], errmsg);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+
+                       proto = protocol_by_family(sk->ss_family);
+                       if (!proto || !proto->connect) {
+                               Alert("parsing [%s:%d] : '%s %s' : connect() not supported for this address family.\n",
+                                         file, linenum, args[1], args[2]);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+
+                       if (port1 != port2) {
+                               Alert("parsing [%s:%d] : '%s' : port ranges and offsets are not allowed in '%s'\n",
+                                         file, linenum, args[1], args[2]);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+
+                       curproxy->check_addr = *sk;
+
+                       if (!*args[3]) { /* no argument */
+                               curproxy->check_hdr_name = strdup(DEF_CHECK_HOST_HDR);
+                               curproxy->check_hdr_name_len = strlen(DEF_CHECK_HOST_HDR);
+                       } else if (*args[4] && !strcmp(args[3], "header")) {
+                               curproxy->check_hdr_name = strdup(args[4]);
+                               curproxy->check_hdr_name_len = strlen(args[4]);
+                       } else {
+                               Alert("parsing [%s:%d] : '%s' : valid http header is required when using hdr\n",
+                                         file, linenum, args[1]);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+               }
                else if (!strcmp(args[1], "ssl-hello-chk")) {
                        /* use SSLv3 CLIENT HELLO to check servers' health */
                        if (warnifnotcap(curproxy, PR_CAP_BE, file, linenum, args[1], NULL))
@@ -3668,18 +3731,18 @@ stats_error_parsing:
 
                        if (!*args[2] || !*args[3]) { /* no argument or incomplete EHLO host */
                                curproxy->check_req = strdup(DEF_SMTP_CHECK_REQ); /* default request */
-                               curproxy->check_len = strlen(DEF_SMTP_CHECK_REQ);
+                               curproxy->check_req_len = strlen(DEF_SMTP_CHECK_REQ);
                        } else { /* ESMTP EHLO, or SMTP HELO, and a hostname */
                                if (!strcmp(args[2], "EHLO") || !strcmp(args[2], "HELO")) {
                                        int reqlen = strlen(args[2]) + strlen(args[3]) + strlen(" \r\n") + 1;
                                        curproxy->check_req = (char *)malloc(reqlen);
-                                       curproxy->check_len = snprintf(curproxy->check_req, reqlen,
+                                       curproxy->check_req_len = snprintf(curproxy->check_req, reqlen,
                                                                       "%s %s\r\n", args[2], args[3]); /* HELO hostname */
                                } else {
                                        /* this just hits the default for now, but you could potentially expand it to allow for other stuff
                                           though, it's unlikely you'd want to send anything other than an EHLO or HELO */
                                        curproxy->check_req = strdup(DEF_SMTP_CHECK_REQ); /* default request */
-                                       curproxy->check_len = strlen(DEF_SMTP_CHECK_REQ);
+                                       curproxy->check_req_len = strlen(DEF_SMTP_CHECK_REQ);
                                }
                        }
                }
@@ -3726,7 +3789,7 @@ stats_error_parsing:
 
                                                free(curproxy->check_req);
                                                curproxy->check_req = packet;
-                                               curproxy->check_len = packet_len;
+                                               curproxy->check_req_len = packet_len;
 
                                                packet_len = htonl(packet_len);
                                                memcpy(packet, &packet_len, 4);
@@ -3754,7 +3817,7 @@ stats_error_parsing:
 
                        curproxy->check_req = (char *) malloc(sizeof(DEF_REDIS_CHECK_REQ) - 1);
                        memcpy(curproxy->check_req, DEF_REDIS_CHECK_REQ, sizeof(DEF_REDIS_CHECK_REQ) - 1);
-                       curproxy->check_len = sizeof(DEF_REDIS_CHECK_REQ) - 1;
+                       curproxy->check_req_len = sizeof(DEF_REDIS_CHECK_REQ) - 1;
                }
 
                else if (!strcmp(args[1], "mysql-check")) {
@@ -3803,7 +3866,7 @@ stats_error_parsing:
 
                                                free(curproxy->check_req);
                                                curproxy->check_req = (char *)calloc(1, reqlen);
-                                               curproxy->check_len = reqlen;
+                                               curproxy->check_req_len = reqlen;
 
                                                snprintf(curproxy->check_req, 4, "%c%c%c",
                                                        ((unsigned char) packetlen & 0xff),
@@ -3836,7 +3899,7 @@ stats_error_parsing:
 
                        curproxy->check_req = (char *) malloc(sizeof(DEF_LDAP_CHECK_REQ) - 1);
                        memcpy(curproxy->check_req, DEF_LDAP_CHECK_REQ, sizeof(DEF_LDAP_CHECK_REQ) - 1);
-                       curproxy->check_len = sizeof(DEF_LDAP_CHECK_REQ) - 1;
+                       curproxy->check_req_len = sizeof(DEF_LDAP_CHECK_REQ) - 1;
                }
                else if (!strcmp(args[1], "tcp-check")) {
                        /* use raw TCPCHK send/expect to check servers' health */
@@ -4563,6 +4626,8 @@ stats_error_parsing:
                        newsrv->state = SRV_RUNNING; /* early server setup */
                        newsrv->last_change = now.tv_sec;
                        newsrv->id = strdup(args[1]);
+                       newsrv->check_hdr_val = strdup(args[1]);
+                       newsrv->check_hdr_val_len = strlen(args[1]);
 
                        /* several ways to check the port component :
                         *  - IP    => port=+0, relative (IPv4 only)
@@ -4812,6 +4877,11 @@ stats_error_parsing:
                                newsrv->check_common.addr = *sk;
                                cur_arg += 2;
                        }
+                       else if (!strcmp(args[cur_arg], "header-value")) {
+                               newsrv->check_hdr_val = strdup(args[cur_arg + 1]);
+                               newsrv->check_hdr_val_len = strlen(args[cur_arg + 1]);
+                               cur_arg += 2;
+                       }
                        else if (!strcmp(args[cur_arg], "port")) {
                                newsrv->check.port = atol(args[cur_arg + 1]);
                                cur_arg += 2;
@@ -5258,6 +5328,7 @@ stats_error_parsing:
 #endif
                                newsrv->check.send_proxy |= (newsrv->state & SRV_SEND_PROXY);
                        }
+
                        /* try to get the port from check_core.addr if check.port not set */
                        if (!newsrv->check.port)
                                newsrv->check.port = get_host_port(&newsrv->check_common.addr);
@@ -7078,9 +7149,9 @@ out_uri_auth_compat:
                }
 
                if ((curproxy->options2 & PR_O2_CHK_ANY) == PR_O2_SSL3_CHK) {
-                       curproxy->check_len = sizeof(sslv3_client_hello_pkt) - 1;
-                       curproxy->check_req = (char *)malloc(curproxy->check_len);
-                       memcpy(curproxy->check_req, sslv3_client_hello_pkt, curproxy->check_len);
+                       curproxy->check_req_len = sizeof(sslv3_client_hello_pkt) - 1;
+                       curproxy->check_req = (char *)malloc(curproxy->check_req_len);
+                       memcpy(curproxy->check_req, sslv3_client_hello_pkt, curproxy->check_req_len);
                }
 
                /* ensure that cookie capture length is not too large */
diff --git a/src/checks.c b/src/checks.c
index c3051aa..4fae04a 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1254,7 +1254,7 @@ static void event_srv_chk_r(struct connection *conn)
                if (!done && check->bi->i < 5)
                        goto wait_more_data;
 
-               if (s->proxy->check_len == 0) { // old mode
+               if (s->proxy->check_req_len == 0) { // old mode
                        if (*(check->bi->data + 4) != '\xff') {
                                /* We set the MySQL Version in description for information purpose
                                 * FIXME : it can be cool to use MySQL Version for other purpose,
@@ -1539,7 +1539,13 @@ static struct task *process_chk(struct task *t)
                 * its own strings.
                 */
                if (check->type && check->type != PR_O2_TCPCHK_CHK && !(check->state & CHK_ST_AGENT)) {
-                       bo_putblk(check->bo, s->proxy->check_req, s->proxy->check_len);
+
+                       /* set up the http request with headers correctly */
+                       bo_putblk(check->bo, s->proxy->check_req, s->proxy->check_req_len);
+                       bo_putblk(check->bo, s->proxy->check_hdr_name, s->proxy->check_hdr_name_len);
+                       bo_putstr(check->bo, ": ");
+                       bo_putblk(check->bo, s->check_hdr_val, s->check_hdr_val_len);
+                       bo_putstr(check->bo, "\r\n");
 
                        /* we want to check if this host replies to HTTP or SSLv3 requests
                         * so we'll send the request, and won't wake the checker up now.
@@ -1569,6 +1575,9 @@ static struct task *process_chk(struct task *t)
                if (is_addr(&s->check_common.addr))
                        /* we'll connect to the check addr specified on the server */
                        conn->addr.to = s->check_common.addr;
+               else if (check->type == PR_O2_HTTP_CHK && is_addr(&s->proxy->check_addr))
+                       /* we will connect to the check addr specified on the proxy, only http checks*/
+                       conn->addr.to = s->proxy->check_addr;
                else
                        /* we'll connect to the addr on the server */
                        conn->addr.to = s->addr;
-- 
1.8.3.4 (Apple Git-47)
