On Sun, 30 Aug 2009, Willy Tarreau wrote:

Hi Krzysztof,
Hi Willy,

On Sun, Aug 30, 2009 at 02:21:55PM +0200, Krzysztof Oledzki wrote:
Please check the attached patch. This code is far from being ready for
inclusion, however I cleaned it and ported to 1.4-dev2, so it should work
for you.

This is a really nice work.

Thanks.

I'm eseing a few FIXMEs with random status
codes which look like debugging tags. I think they will not have any
impact on the tests though.

Exactly, they poits where I have something to deal with, mostly to find the proper code to return or to add some docs/comments. ;)

My thought about this feature was to store two statuses :
 - the one which made the server change state
 - the last one

Indeed, it's very common to see a server first timeout, then crash.
Or it may also return 500 server error or things like this, which
will not be available anymore afterwards. And we need to keep the
last one because it's what helps people troubleshoot in production
typically when one error hides another one.

Right, it is a very good idea.

However, this is something which can be implemented incrementally.
So we can start with your patch and improve it later.

OK. ;)

With this patch you should be able to see status of last check in the
stats page and to check in you logs, why servers are considered down, for
example:

[WARNING] 241/141556 (1971) : Backup Server p1/s1 is DOWN, reason: Layer5-7
response error(10), code: 400, check duration: 0ms. 0 active and 4 backup
servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining
in queue.

This really is excellent. I'm already impatient to merge it ;-)

Thanks, again. :)

I have a few comments/questions below :

@@ -474,40 +572,69 @@
        if (s->proxy->options & PR_O_HTTP_CHK) {
                /* Check if the server speaks HTTP 1.X */
                if ((len < strlen("HTTP/1.0 000\r")) ||
-                   (memcmp(trash, "HTTP/1.", 7) != 0)) {
+                   (memcmp(trash, "HTTP/1.", 7) != 0 ||
+                   (trash[12] != ' ' && trash[12] != '\r')) ||
+                   !isdigit(trash[9]) || !isdigit(trash[10]) || 
!isdigit(trash[11])) {

here you're adding some tests to more accurately check the HTTP
status line. Is it just for cleanness or did you encounter cases
where the check looked valid but was not ?

Mostly for cleanness, however as I need to convert string to int below it is nice to make sure it is really a three-digit-string. Alternatively I could use strtol, however I found it easier to do everything in one place.

+               trash[12] = '\0';
+               s->check_code = atoi(&trash[9]);

you can use one of the str2* functions (see standard.h and standard.c)
which automatically stops at the first non-digit character.

OK, will do.

                /* check the reply : HTTP/1.X 2xx and 3xx are OK */
-               if (trash[9] == '2' || trash[9] == '3')
+               if (trash[9] == '2' || trash[9] == '3') {
                        s->result |= SRV_CHK_RUNNING;
-               else if ((s->proxy->options & PR_O_DISABLE404) &&
+                       set_server_check_status(s, HCHK_STATUS_L57OKD);

I think you wanted to put HCHK_STATUS_L57OK here, not OKD since we're
in the 2xx/3xx state and not 404 disable. Or maybe I misunderstood the
OKD status ?

OKD means we have Layer5-7 data avalible, like for example http code. Several times I found that some of my servers were misconfigured and were returning a 3xx code redirecting to a page-not-found webpage instead of doing a proper healt-check, so I think it is good to know what was the response, even if it was OK (2xx/3xx).

+               } else if ((s->proxy->options & PR_O_DISABLE404) &&
                         (s->state & SRV_RUNNING) &&
-                        (memcmp(&trash[9], "404", 3) == 0)) {
+                        (s->check_code == 404)) {
                        /* 404 may be accepted as "stopping" only if the server 
was up */
                        s->result |= SRV_CHK_RUNNING | SRV_CHK_DISABLE;
+                       set_server_check_status(s, HCHK_STATUS_L57OKD);
                }
-               else
+               else {
                        s->result |= SRV_CHK_ERROR;
+                       set_server_check_status(s, HCHK_STATUS_L57RSPERR);
+               }

(...)
        else if (s->proxy->options & PR_O_SMTP_CHK) {
+               /* Check if the server speaks SMTP */
+               if ((len < strlen("000\r")) ||
+                   (trash[3] != ' ' && trash[3] != '\r') ||
+                   !isdigit(trash[0]) || !isdigit(trash[1]) || 
!isdigit(trash[2])) {
+                       s->result |= SRV_CHK_ERROR;
+                       set_server_check_status(s, HCHK_STATUS_L57INVRSP);
+                       goto out_wakeup;
+               }
+
+               trash[3] = '\0';
+               s->check_code = atoi(&trash[0]);

same here for the status code conversion.

Ack.

+
                /* Check for SMTP code 2xx (should be 250) */
-               if ((len >= 3) && (trash[0] == '2'))
+               if (trash[0] == '2') {
                        s->result |= SRV_CHK_RUNNING;
-               else
+                       set_server_check_status(s, HCHK_STATUS_L57OKD);

L57OK here too ?

L57OKD, like in the http case.

Best regards,

                                Krzysztof Olędzki

Reply via email to