Re: [PATCH] [RFC] Decrease server health based on http responses / events

2009-12-09 Thread Willy Tarreau
Hi Krzysztof,

On Wed, Dec 09, 2009 at 01:23:57PM +0100, Krzysztof Olędzki wrote:
> There are four modes:
> 
>  - fastinter: force fastinter
> 
>  - failchk: simulate a failed check -> force fastinter

OK, so that one should decrease the health each time we get a
series of errors, is that it ?

>  - suddth (sudden death): simulate a pre-fatal failed health check, one
> more failed check will mark a server down
> 
>  - markdwn: mark a server down, immediately

OK, I think that covers a wide range of usage patterns.
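
To make the four modes concrete, here is a minimal sketch of how each one could act on a server's health counter. It assumes the convention from the server.h comment in the patch (0..rise-1 = down, rise..rise+fall-1 = up). The names mini_srv, onerr_mode, fail_one_check and react_on_error are hypothetical and are not taken from the patch, and the fastinter flag is only set for the two modes where the mail says so.

```c
#include <assert.h>

/* Hypothetical names; the patch itself uses HANA_ONERR_* and struct server. */
enum onerr_mode { ONERR_FASTINTER = 1, ONERR_FAILCHK, ONERR_SUDDTH, ONERR_MARKDWN };

struct mini_srv {
	int health;          /* 0..rise-1 = down, rise..rise+fall-1 = up */
	int rise, fall;
	int use_fastinter;   /* set once checks should run at the fast interval */
};

int srv_is_up(const struct mini_srv *s)
{
	return s->health >= s->rise;
}

/* Assumed effect of one failed check: lose one point while above the
 * "rise" threshold, otherwise the server is considered down. */
void fail_one_check(struct mini_srv *s)
{
	if (s->health > s->rise)
		s->health--;
	else
		s->health = 0;
}

/* Apply the reaction selected by "mode" once the error limit is reached. */
void react_on_error(struct mini_srv *s, enum onerr_mode mode)
{
	assert(s->rise > 0 && s->fall > 0);

	switch (mode) {
	case ONERR_FASTINTER:            /* only speed up the checks */
		s->use_fastinter = 1;
		break;
	case ONERR_FAILCHK:              /* behave as if one check had failed */
		fail_one_check(s);
		s->use_fastinter = 1;
		break;
	case ONERR_SUDDTH:               /* still up, but one failure from down */
		if (s->health > s->rise)
			s->health = s->rise;
		break;
	case ONERR_MARKDWN:              /* down immediately */
		s->health = 0;
		break;
	}
}
```

With rise=2 and fall=3, a fully healthy server starts at health 4: failchk brings it to 3, suddth to 2 (still up, next failed check takes it down), and markdwn straight to 0.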

(...)
> >What are those "events" supposed to check for ? I've not found them 
> >anywhere else.
> 
> Indeed. It is for pure TCP, where we can only track timeouts, resets, 
> etc. However, I'm looking for a good place where to attach those checks, 
> and for a better name - maybe "l3events"?

"events" by itself might not be the proper name then. For instance, a
timeout is precisely a lack of event. Maybe simply "errors" ? The
other ones are not errors, just plain valid status codes after all.

(...)
> >I'm just wondering about those options. Are we supposed to use only some
> >of them without the other ones ?
> 
> Yes, because each option has its default value, so both:
>  observe http-response onerror failcheck errors-limit 10
> and
>  observe http-response
> are identical - after 10 consecutive errors haproxy simulates a failed check.

OK.
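
The "10 consecutive errors" rule can be sketched as a small counter. This is only an illustration under stated assumptions: that a good response resets the counter, and that the counter restarts after the limit is hit. The names obs_server, obs_init and obs_report are hypothetical; only the default of 10 comes from the mail.

```c
#include <assert.h>

#define DEF_CELIMIT 10   /* default errors-limit, as stated in the mail */

/* Hypothetical, trimmed-down state; not the real struct server. */
struct obs_server {
	int consecutive_errors;
	int consecutive_errors_limit;
};

void obs_init(struct obs_server *s, int limit)
{
	s->consecutive_errors = 0;
	s->consecutive_errors_limit = limit > 0 ? limit : DEF_CELIMIT;
}

/* Feed one observed response. Returns 1 when the limit has just been
 * reached (the caller would then simulate a failed check), else 0. */
int obs_report(struct obs_server *s, int is_error)
{
	assert(s->consecutive_errors_limit > 0);

	if (!is_error) {
		s->consecutive_errors = 0;   /* assumption: success resets the run */
		return 0;
	}
	if (++s->consecutive_errors < s->consecutive_errors_limit)
		return 0;

	s->consecutive_errors = 0;       /* assumption: start counting again */
	return 1;
}
```

Nine errors followed by one good response never trigger anything; only an uninterrupted run of ten does.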

> >I mean, maybe we could have sort of an error-react prefix
> >with its few parameters afterwards. Maybe something in that spirit :
> >
> >   error-react to  by  after 
> > 
> >It's just an idea, not necessarily something to follow.
> 
> Something like "error-react to http-response by failcheck after 10"?

yes, precisely. Another advantage would be that we could also
allow the statement on regular backend config (even defaults)
when it's supposed to be the same for all servers. It would
then be handled just like the "source" keyword : per-server,
then per-backend.
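
A rough sketch of that "per-server, then per-backend" lookup follows. All names here (react_cfg, backend_cfg, server_cfg, effective_react) are hypothetical and only illustrate the fallback order, not how haproxy stores the "source" keyword.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings block; "set" tells whether this level was configured. */
struct react_cfg {
	int set;
	int onerror;
	int limit;
};

struct backend_cfg {
	struct react_cfg react;
};

struct server_cfg {
	struct react_cfg react;
	struct backend_cfg *be;   /* owning backend, may be NULL in this sketch */
};

/* Return the settings that apply to a server: its own if present,
 * otherwise the backend's, otherwise NULL (feature not enabled). */
const struct react_cfg *effective_react(const struct server_cfg *s)
{
	assert(s != NULL);

	if (s->react.set)
		return &s->react;
	if (s->be != NULL && s->be->react.set)
		return &s->be->react;
	return NULL;
}
```

Returning NULL when nothing is configured matches the point made later in the thread that the feature stays off unless it is explicitly enabled.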

(...)
> >Are these three isolated changes on purpose, are they a mistake, or are
> >they a fix for something we want to backport or at least merge separately ?
> >I'm asking because at first glance it seems imbalanced with other changes.
> >There is a fourth one further about check_duration.
> 
> These three isolated changes are on purpose, because now we can call 
> server_status_printf() when we simulate a failed health check. Before 
> that we used to first start a check and set s->check_start and s->result 
> and then we called server_status_printf(), but it is no longer true.

OK.

(...)
> >Does this mean that we're now forced to at least switch to fast
> >inter in case of error or can we still use the current behaviour ?
> 
> Yes, by simply not enabling the functionality. By default it is not enabled.

OK that's fine. You know how attached I am to keeping backwards
compatibility :-)

Regards,
Willy




[BUG][MINOR] incorrect number of configuration files allowed

2009-12-09 Thread Cyril Bonté
Hi,
I've noticed that haproxy (1.3/1.4 branches) accepts 1 more configuration file
than allowed.

The default maximum is fixed at 10 (#define MAX_CFG_FILES 10) but we can specify 11
"-f" parameters.
These parameters are then stored in a 10-element array.

In haproxy.c, the test
if (cfg_nbcfgfiles > MAX_CFG_FILES) {
should be :
if (cfg_nbcfgfiles >= MAX_CFG_FILES) {
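
A minimal sketch of why ">=" is needed, assuming the test runs before the file name is stored. The array name cfg_files and the helper add_cfg_file are hypothetical; only MAX_CFG_FILES and cfg_nbcfgfiles come from the report.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CFG_FILES 10

/* Hypothetical stand-ins for haproxy's globals. */
static const char *cfg_files[MAX_CFG_FILES];
int cfg_nbcfgfiles = 0;

/* Store one "-f" argument. Returns 0 on success, -1 if the table is full.
 * With '>' instead of '>=', the call made when cfg_nbcfgfiles == 10 would
 * pass the test and write to cfg_files[10], one past the end of the array. */
int add_cfg_file(const char *name)
{
	if (cfg_nbcfgfiles >= MAX_CFG_FILES)
		return -1;

	cfg_files[cfg_nbcfgfiles++] = name;
	assert(cfg_nbcfgfiles <= MAX_CFG_FILES);
	return 0;
}
```

Ten calls succeed and fill indexes 0 to 9; the eleventh is rejected instead of overflowing the array.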


-- 
Cyril Bonté



Session / Cookie related problem

2009-12-09 Thread Gaël Reignier
Hi everybody,

I have installed and configured HAProxy with pound in order to have SSL
termination at the SLB level. It worked OK until today, when we discovered
a problem that does not make sense to me. I will try to explain it here:

So we have a website that does cross-domain authentication with SSL: from
www.site.com you are authenticated against my.site.com

Number of servers:
When I have only 1 web server behind the SLB, everything is working
perfectly fine.
When I have 2 or more web servers behind the SLB, I am experiencing the
problem.


Cookies activation:
Here is how I activated the cookies:
cookie HAPROXYID insert indirect
server gr-web04 10.10.5.14 weight 10 check port 80 fastinter 1000 cookie gr-web04

When the cookies are turned off, I notice the problem from time to time:
roughly once every 20 clicks
When the cookies are turned on, the problem happens once every 2 clicks (so
in 50% of the cases).


Explanation of what I am seeing:

The first requests go to the first web site (http://www.site.com) in
clear (HTTP), then they go to another part of the site
(https://www.first.com) through SSL.
When it works you are then redirected to http://my.site.com and you carry
on...

As I understand it, the problem happens as a result of the script run during
the SSL connection (I am sure the script works, because when there is only 1
web server for http and https, it works perfectly fine).
But I believe that when the request is load balanced to another server in
order to do the SSL connection, then it is not happy...
I have also noticed that the communication on www.site.com is done on
server A whereas communication on my.site.com is done on server B when it
is successful...

I have now spent a couple of days on the problem and I do not understand why
I am seeing this really random behaviour... That does not make sense to me at
all.

If you want more information about the problem please let me know and I will
be happy to give you all the information you need!

Thanks a lot in advance!

Gael



-- 
Gaël Reignier

Contacts :
mail : gael.reign...@gmail.com
Twitter: gael.reignier
Skype: gael.reignier
Facebook
GSM UK: 0044 7 942 042 374
GSM FR: 0033 6 2306 8929


Re: [PATCH] [RFC] Decrease server health based on http responses / events

2009-12-09 Thread Krzysztof Olędzki

On 2009-12-08 22:37, Willy Tarreau wrote:
> Hi Krzysztof,

Hi Willy,

> it was fortunate that you reminded me about this mail because
> I had lost it in the middle of a few others.

No problem. Like I told you in the private mail - it is doubtful I would
have been able to get to it earlier, anyway.

> On Sun, Oct 25, 2009 at 01:35:35AM +0200, Krzysztof Piotr Oledzki wrote:
> > Subject: [RFC] Decrease server health based on http responses / events
> >
> > This RFC quality patch implements decreasing server health based on
> > observing communication between HAProxy and servers.
> >
> > I have had a working patch for this for a long time, however I needed to
> > rewrite nearly everything to remove hardcoded values, add more modes and
> > to port it into 1.4. So after the rework there is nearly nothing left from
> > the old code. :| In the current status the code is expected to work but it
> > definitely needs more testing.
>
> OK.
>
> > BTW: I'm not very happy with names of both functions and parameters,
> > If you have a better idea please don't hesitate to propose it. ;)
>
> well, first, s/halth/health/g :-)

Eh. ;) I feel ashamed.

> > TODO: documentation, comments, pure tcp support.
>
> Indeed, comments at least on the enums and struct members would make
> the review *much* easier. Even small abbreviated ones :-/

I assumed that everything should be clear but it seems it is true only
for the author of this code. :| Sorry for that.



diff --git a/include/common/defaults.h b/include/common/defaults.h
index b0aee86..ae2f65c 100644
--- a/include/common/defaults.h
+++ b/include/common/defaults.h
@@ -120,6 +120,9 @@
 #define DEF_CHECK_REQ   "OPTIONS / HTTP/1.0\r\n\r\n"
 #define DEF_SMTP_CHECK_REQ   "HELO localhost\r\n"
 
+#define DEF_HANA_ONERR	HANA_ONERR_FAILCHK
+#define DEF_CELIMIT	10
+
 // X-Forwarded-For header default
 #define DEF_XFORWARDFOR_HDR	"X-Forwarded-For"
 
diff --git a/include/proto/checks.h b/include/proto/checks.h
index bd70164..2d16976 100644
--- a/include/proto/checks.h
+++ b/include/proto/checks.h
@@ -29,6 +29,7 @@ const char *get_check_status_description(short check_status);
 const char *get_check_status_info(short check_status);
 struct task *process_chk(struct task *t);
 int start_checks();
+void halth_analyze(struct server *s, short status);
 
 #endif /* _PROTO_CHECKS_H */
 
diff --git a/include/types/checks.h b/include/types/checks.h
index 1b04608..3690aa5 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -18,6 +18,9 @@ enum {
 
 	/* Below we have finished checks */
 	HCHK_STATUS_CHECKED,		/* DUMMY STATUS */
+
+	HCHK_STATUS_HANA,		/* Detected enough consecutive errors */
+
 	HCHK_STATUS_SOCKERR,		/* Socket error */
 
 	HCHK_STATUS_L4OK,		/* L4 check passed, for example tcp connect */
@@ -41,6 +44,39 @@ enum {
 	HCHK_STATUS_SIZE
 };
 
+enum {
+   HANA_UNKNOWN    = 0,
+
+   HANA_TCP_OK,
+
+   HANA_HTTP_OK,
+   HANA_HTTP_STS,
+   HANA_HTTP_HDRRSP,
+   HANA_HTTP_RSP,
+
+   HANA_READ_ERROR,
+   HANA_READ_TIMEOUT,
+   HANA_BROKEN_PIPE,
+
+   HANA_SIZE
+};
+
+enum {
+   HANA_ONERR_UNKNOWN  = 0,
+
+   HANA_ONERR_FASTINTER,
+   HANA_ONERR_FAILCHK,
+   HANA_ONERR_SUDDTH,
+   HANA_ONERR_MARKDWN,


> After reading the code, I'm not sure I got the difference between
> the last two modes (sudden death and mark down).

There are four modes:

 - fastinter: force fastinter

 - failchk: simulate a failed check -> force fastinter

 - suddth (sudden death): simulate a pre-fatal failed health check, one
   more failed check will mark a server down

 - markdwn: mark a server down, immediately


+};
+
+enum {
+   HANA_OBS_NONE   = 0,
+
+   HANA_OBS_EVENTS,
+   HANA_OBS_HTTP_RSPS,
+};
+
 struct check_status {
 	short result;			/* one of SRV_CHK_* */
 	char *info;			/* human readable short info */
diff --git a/include/types/server.h b/include/types/server.h
index b3fe83d..b163190 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -115,7 +115,10 @@ struct server {
 	struct sockaddr_in check_addr;		/* the address to check, if different from  */
 	short check_port;			/* the port to use for the health checks */
 	int health;				/* 0->rise-1 = bad; rise->rise+fall-1 = good */
+	int consecutive_errors;			/* */
 	int rise, fall;				/* time in iterations */
+	int consecutive_errors_limit;		/* */
+	short observe, onerror;			/* */
 	int inter, fastinter, downinter;	/* checks: time in milliseconds */
 	int slowstart;				/* slowstart time in seconds (ms in the conf) */
 	int result;				/* health-check result : SRV_CHK_* */
@@ -137,7 +140,7 @@ struct server {
unsigned down_time; /* total time the