Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
Hi again Kevin,

Well, I suspect there is a corner case in the bug I fixed which could have
caused what you observed.

The "timeout connect" is computed from the last expire date. Since
"timeout check" was added upon connection establishment but the task was
woken up too late, a first check failure reported too late can shorten the
next check's timeout.

It's still unclear to me how the check timeout can be reported this small,
considering that it's updated once the connect succeeds. But performing
computations on dates in the past is never a good way to get reliable results.

Could you please apply the attached fix for the bug I mentioned in my
previous mail, to see if the issue is still present? After all, I would
not be totally surprised if this bug had nasty side effects like this.

Thanks,
Willy
 
From 78604116c3cbe23987bef94cb0d7aa15e6d4371b Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Fri, 25 May 2012 07:41:38 +0200
Subject: [PATCH] BUG/MINOR: checks: expire on timeout.check if smaller than timeout.connect

It happens that haproxy doesn't displace the task in the wait queue when
validating a connection, so if the check timeout is set to a smaller value
than timeout.connect, it will not strike before timeout.connect.

The bug is present at least in 1.4.15..1.4.21, so the fix must be backported.
(cherry picked from commit 1e44a49c8973f08ee1e35d8737f4677db11cf7ab)
---
 src/checks.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/checks.c b/src/checks.c
index 7255817..0aa65c0 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -781,8 +781,10 @@ static int event_srv_chk_w(int fd)
 			ret = send(fd, check_req, check_len, MSG_DONTWAIT | MSG_NOSIGNAL);
 			if (ret == check_len) {
 				/* we allow up to <timeout.check> if nonzero for a responce */
-				if (s->proxy->timeout.check)
+				if (s->proxy->timeout.check) {
 					t->expire = tick_add_ifset(now_ms, s->proxy->timeout.check);
+					task_queue(t);
+				}
 				EV_FD_SET(fd, DIR_RD);   /* prepare for reading reply */
 				goto out_nowake;
 			}
-- 
1.7.2.1.45.g54fbc



Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
Hi Kevin,

On Thu, May 24, 2012 at 09:01:43PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> Monsieur Tarreau,
> 
> Actually, we are seeing frontend service availability flapping. This morning
> particularly.  Missing from my snippet is the logic for an unplanned outage
> landing page, which our customers were seeing this morning, so haproxy
> truly is "timing out" and marking each backend as down until there are no
> backend servers available, throwing up the unplanned outage landing page.

I'm not surprised, if you observe that checks need more than 1 second to
work correctly :-/

I have tested your configuration here. First, I can say it's not a reporting
problem, as the timers in the logs are the correct ones. Second, I noticed a
bug: when both "timeout check" and "timeout connect" are set, haproxy uses
the larger of the two for the check! The reason is that it did not displace
the task in the queue upon connection establishment if the check timeout is
smaller than the connect timeout. I've fixed it now.
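
As a minimal illustration (not your exact configuration, just example values),
with a backend such as:

    backend app_bk
        option httpchk HEAD /app/availability
        timeout connect 10s     # connection establishment
        timeout check   2s      # intended bound on the check response
        server app1 192.168.0.1:80 check inter 30s

the task was not re-queued with the shorter check timeout once the connection
was established, so before the fix a non-responding server would only be marked
down after roughly 10 seconds (timeout connect) instead of 2 (timeout check).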

This aside, I could not reproduce the issue here. I agree with Baptiste, a
tcpdump would certainly help. Maybe we're facing corner case issues which
I did not test, such as servers sending partial responses or things like
this, I don't know.

Regards,
Willy






Re: could haproxy call redis for a result?

2012-05-24 Thread Baptiste
Hi,

I'm just guessing, but to me it can work for URLs only, so in your
case it will match "/", "/customer/12345", and
"/some/path?customerId=12345".
For now, a stick table can't hold two pieces of information concatenated
together (the Host header and the URL, in your case).
But who knows, maybe this feature will arrive soon too :)

cheers



On Thu, May 24, 2012 at 6:33 PM, S Ahmed  wrote:
> Baptiste,
>
> Whenever this feature will be implemented, will it work for a specific url
> like:
>
> subdomain1.example.com
>
> What about by query string?  like:
>
> www.example.com/customer/12345
>
> or
>
> www.example.com/some/path?customerId=12345
>
>
> Will it work for all the above?
>
> On Tue, May 8, 2012 at 9:38 PM, S Ahmed  wrote:
>>
>> Yes it is the lookup that I am worried about.
>>
>>
>> On Tue, May 8, 2012 at 5:46 PM, Baptiste  wrote:
>>>
>>> Hi,
>>>
>>> Willy has just released 1.5-dev9, but unfortunately the track
>>> functions can't yet track strings (and so URLs).
>>> I'll let you know once a nightly snapshot could do it and we could
>>> work on a proof of concept configuration.
>>>
>>> Concerning 250K URLs, that should not be an issue at all to store them.
>>> Maybe looking for one URL could have a performance impact, we'll see.
>>>
>>> cheers
>>>
>>> On Tue, May 8, 2012 at 10:00 PM, S Ahmed  wrote:
>>> > Great.
>>> >
>>> > So any ideas how many urls one can store in these sticky tables before
>>> > it
>>> > becomes a problem?
>>> >
>>> > Would 250K be something of a concern?
>>> >
>>> >
>>> > On Tue, May 8, 2012 at 11:26 AM, Baptiste  wrote:
>>> >>
>>> >> On Tue, May 8, 2012 at 3:25 PM, S Ahmed  wrote:
>>> >> > Ok that sounds awesome, how will that work though?  i.e. from say
>>> >> > java,
>>> >> > how
>>> >> > will I do that?
>>> >> >
>>> >> > From what you're saying it sounds like I will just have to modify the
>>> >> > response
>>> >> > add and a particular header.  And on the flip side, if I want to
>>> >> > unblock
>>> >> > I'll make a http request with something in the header that will
>>> >> > unblock
>>> >> > it?
>>> >> >
>>> >>
>>> >> That's it.
>>> >> You'll have to track these headers with ACLs in HAProxy and to update
>>> >> the stick table accordingly.
>>> >> Then based on the value setup in the stick table, HAProxy can decide
>>> >> whether it will allow or reject the request.
>>> >>
>>> >> > When do you think this will go live?
>>> >> >
>>> >>
>>> >> In another mail, Willy said he will release 1.5-dev9 today.
>>> >> So I guess it won't be too long now. Worst case would be later in the
>>> >> week or next week.
>>> >>
>>> >> cheers
>>> >
>>> >
>>
>>
>



Re: Problems with layer7 check timeout

2012-05-24 Thread Baptiste
Hi Lange,

Would it be possible to take a trace (tcpdump) of the health check?
This may help as well.

Cheers


On Fri, May 25, 2012 at 4:01 AM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON
COMPANY]  wrote:
> Monsieur Tarreau,
>
> Actually, we are seeing frontend service availability flapping. This morning
> particularly.  Missing from my snippet is the logic for an unplanned outage
> landing page, which our customers were seeing this morning, so haproxy
> truly is "timing out" and marking each backend as down until there are no
> backend servers available, throwing up the unplanned outage landing page.
>
> I'll send more logs and details when I analyze later.
>
> Regards,
> Kevin Lange
>
>
> 
> Kevin M Lange
> Mission Operations and Services
> NASA EOSDIS Evolution and Development
> Intelligence and Information Systems
> Raytheon Company
>
> +1 (301) 851-8450 (office)
> +1 (301) 807-2457 (cell)
> kevin.m.la...@nasa.gov
> kla...@raytheon.com
>
> 5700 Rivertech Court
> Riverdale, Maryland 20737
>
> - Reply message -
> From: "Willy Tarreau" 
> Date: Thu, May 24, 2012 5:18 pm
> Subject: Problems with layer7 check timeout
> To: "Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]"
> 
> Cc: "haproxy@formilux.org" 
>
> Hi Kevin,
>
> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M.
> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>> Hi,
>> We're having odd behavior (apparently have always but didn't realize it),
>> where our backend httpchks "time out":
>>
>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>
>>
>> We've been playing with the timeout values, and we don't know what is
>> controlling the "Layer7 timeout, check duration: 1002ms".  The backend
>> service availability check (by hand) typically takes 2-3 seconds on average.
>> Here is the relevant haproxy setup.
>>
>> #-
>> # Global settings
>> #-
>> global
>> log-send-hostname opsslb1
>> log 127.0.0.1 local1 info
>> #    chroot  /var/lib/haproxy
>> pidfile /var/run/haproxy.pid
>> maxconn 1024
>> user    haproxy
>> group   haproxy
>> daemon
>>
>> #-
>> # common defaults that all the 'listen' and 'backend' sections will
>> # use if not designated in their block
>> #-
>> defaults
>> mode    http
>> log global
>> option  dontlognull
>> option  httpclose
>> option  httplog
>> option  forwardfor
>> option  redispatch
>> timeout connect 500 # default 10 second time out if a backend is not
>> found
>> timeout client 5
>> timeout server 360
>> maxconn 6
>> retries 3
>>
>> frontend webapp_ops_ft
>>
>> bind 10.0.40.209:80
>> default_backend webapp_ops_bk
>>
>> backend web

Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
Monsieur Tarreau,

Actually, we are seeing frontend service availability flapping. This morning 
particularly.  Missing from my snippet is the logic for an unplanned outage 
landing page, which our customers were seeing this morning, so haproxy truly 
is "timing out" and marking each backend as down until there are no backend 
servers available, throwing up the unplanned outage landing page.

I'll send more logs and details when I analyze later.

Regards,
Kevin Lange


Kevin M Lange
Mission Operations and Services
NASA EOSDIS Evolution and Development
Intelligence and Information Systems
Raytheon Company

+1 (301) 851-8450 (office)
+1 (301) 807-2457 (cell)
kevin.m.la...@nasa.gov
kla...@raytheon.com

5700 Rivertech Court
Riverdale, Maryland 20737

- Reply message -
From: "Willy Tarreau" 
Date: Thu, May 24, 2012 5:18 pm
Subject: Problems with layer7 check timeout
To: "Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]" 
Cc: "haproxy@formilux.org" 

Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> Hi,
> We're having odd behavior (apparently have always but didn't realize it), 
> where our backend httpchks "time out":
>
> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>
>
> We've been playing with the timeout values, and we don't know what is 
> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
> service availability check (by hand) typically takes 2-3 seconds on average.
> Here is the relevant haproxy setup.
>
> #-
> # Global settings
> #-
> global
> log-send-hostname opsslb1
> log 127.0.0.1 local1 info
> #chroot  /var/lib/haproxy
> pidfile /var/run/haproxy.pid
> maxconn 1024
> userhaproxy
> group   haproxy
> daemon
>
> #-
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #-
> defaults
> modehttp
> log global
> option  dontlognull
> option  httpclose
> option  httplog
> option  forwardfor
> option  redispatch
> timeout connect 500 # default 10 second time out if a backend is not found
> timeout client 5
> timeout server 360
> maxconn 6
> retries 3
>
> frontend webapp_ops_ft
>
> bind 10.0.40.209:80
> default_backend webapp_ops_bk
>
> backend webapp_ops_bk
> balance roundrobin
> option httpchk HEAD /app/availability
> reqrep ^Host:.* Host:\ webapp.example.com
> server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
> server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
> server webapp_ops3 op

patch: on-marked-up option

2012-05-24 Thread Justin Karneges
Hi,

This implements the feature discussed in the earlier thread of killing 
connections on backup servers when a non-backup server comes back up. For 
example, you can use this to route to a mysql master & slave and ensure 
clients don't stay on the slave after the master goes from down->up. I've done 
some minimal testing and it seems to work.

Today is the first time I ever looked at haproxy's code but this feature seemed 
straightforward enough to implement. I hope I've done it properly.
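
To show the intended usage, here is a minimal configuration sketch (the
addresses and the mysql-check option are just placeholders, not my real setup):

    listen mysql
        mode tcp
        bind :3306
        option mysql-check user haproxy
        server master 192.168.0.1:3306 check on-marked-down shutdown-sessions on-marked-up shutdown-backup-sessions
        server slave  192.168.0.2:3306 check backup

When the master is marked down, its sessions are killed and clients reconnect
to the backup slave; when the master is marked up again, the sessions sitting
on the backup are killed so that clients reconnect and land back on the master.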

Justin
diff --git a/include/types/checks.h b/include/types/checks.h
index fd15c95..250a68f 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -80,6 +80,12 @@ enum {
 };
 
 enum {
+	HANA_ONMARKEDUP_NONE	= 0,
+
+	HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS,	/* Shutdown peer sessions */
+};
+
+enum {
 	HANA_OBS_NONE		= 0,
 
 	HANA_OBS_LAYER4,		/* Observe L4 - for example tcp */
diff --git a/include/types/server.h b/include/types/server.h
index aa2c4f8..1885eab 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -120,7 +120,8 @@ struct server {
 	int rise, fall;/* time in iterations */
 	int consecutive_errors_limit;		/* number of consecutive errors that triggers an event */
 	short observe, onerror;			/* observing mode: one of HANA_OBS_*; what to do on error: on of ANA_ONERR_* */
-	short onmarkeddown;			/* what to do when marked down: on of HANA_ONMARKEDDOWN_* */
+	short onmarkeddown;			/* what to do when marked down: one of HANA_ONMARKEDDOWN_* */
+	short onmarkedup;			/* what to do when marked up: one of HANA_ONMARKEDUP_* */
 	int inter, fastinter, downinter;	/* checks: time in milliseconds */
 	int slowstart;/* slowstart time in seconds (ms in the conf) */
 	int result;/* health-check result : SRV_CHK_* */
diff --git a/include/types/session.h b/include/types/session.h
index f1b7451..a098002 100644
--- a/include/types/session.h
+++ b/include/types/session.h
@@ -67,6 +67,7 @@
 #define SN_ERR_INTERNAL	0x7000	/* the proxy encountered an internal error */
 #define SN_ERR_DOWN	0x8000	/* the proxy killed a session because the backend became unavailable */
 #define SN_ERR_KILLED	0x9000	/* the proxy killed a session because it was asked to do so */
+#define SN_ERR_UP	0xa000	/* the proxy killed a session because a preferred backend became available */
 #define SN_ERR_MASK	0xf000	/* mask to get only session error flags */
 #define SN_ERR_SHIFT	12		/* bit shift */
 
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 5bd2cfc..92dd094 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -4392,6 +4392,18 @@ stats_error_parsing:
 
 				cur_arg += 2;
 			}
+			else if (!strcmp(args[cur_arg], "on-marked-up")) {
+				if (!strcmp(args[cur_arg + 1], "shutdown-backup-sessions"))
+					newsrv->onmarkedup = HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS;
+				else {
+					Alert("parsing [%s:%d]: '%s' expects 'shutdown-backup-sessions' but got '%s'\n",
+					      file, linenum, args[cur_arg], args[cur_arg + 1]);
+					err_code |= ERR_ALERT | ERR_FATAL;
+					goto out;
+				}
+
+				cur_arg += 2;
+			}
 			else if (!strcmp(args[cur_arg], "error-limit")) {
 				if (!*args[cur_arg + 1]) {
 					Alert("parsing [%s:%d]: '%s' expects an integer argument.\n",
diff --git a/src/checks.c b/src/checks.c
index febf77e..5299e98 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -358,15 +358,26 @@ static int check_for_pending(struct server *s)
 	return xferred;
 }
 
-/* Shutdown connections when their server goes down.
+/* Shutdown all connections of a server
  */
-static void shutdown_sessions(struct server *srv)
+static void shutdown_sessions(struct server *srv, int why)
 {
 	struct session *session, *session_bck;
 
 	list_for_each_entry_safe(session, session_bck, &srv->actconns, by_srv)
 		if (session->srv_conn == srv)
-			session_shutdown(session, SN_ERR_DOWN);
+			session_shutdown(session, why);
+}
+
+/* Shutdown all connections of all backup servers of a proxy
+ */
+static void shutdown_backup_sessions(struct proxy *px, int why)
+{
+	struct server *srv;
+
+	for (srv = px->srv; srv != NULL; srv = srv->next)
+		if (srv->state & SRV_BACKUP)
+			shutdown_sessions(srv, why);
 }
 
 /* Sets server  down, notifies by all available means, recounts the
@@ -394,7 +405,7 @@ void set_server_down(struct server *s)
 			s->proxy->lbprm.set_server_status_down(s);
 
 		if (s->onmarkeddown & HANA_ONMARKEDDOWN_SHUTDOWNSESSIONS)
-			shutdown_sessions(s);
+			shutdown_sessions(s, SN_ERR_DOWN);
 
 		/* we might have sessions queued on this server and waiting for
 		 * a connection. Those which are redispatchable will be queued
@@ -465,6 +476,9 @@ void set_server_up(struct server *s) {
 		s->state |= SRV_RUNNING;
 		s->state &= ~SRV_MAINTAIN;
 
+		if (s->onmarkedup & HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS)
+			shutdown_backup_sessions(s->proxy, SN_ERR_UP);
+
 		if (s->slowstart > 0) {
 			s->state |= SRV_WARMINGUP;
 			if (s->proxy->lbprm.algo & BE_LB_PROP_DYN) {


Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
I've already put an upgrade to haproxy in place.


Kevin M Lange
Mission Operations and Services
NASA EOSDIS Evolution and Development
Intelligence and Information Systems
Raytheon Company

+1 (301) 851-8450 (office)
+1 (301) 807-2457 (cell)
kevin.m.la...@nasa.gov
kla...@raytheon.com

5700 Rivertech Court
Riverdale, Maryland 20737

- Reply message -
From: "Willy Tarreau" 
Date: Thu, May 24, 2012 5:59 pm
Subject: Problems with layer7 check timeout
To: "Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]" 
Cc: "haproxy@formilux.org" 

On Thu, May 24, 2012 at 04:31:39PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> I thought it was a bug in the reporting, considering we've played with 
> numerous values for the various timeouts as an experiment, but wanted your 
> thoughts.
> This is v1.4.15.
>
>  [root@opsslb1 log]# haproxy -v
> HA-Proxy version 1.4.15 2011/04/08
> Copyright 2000-2010 Willy Tarreau 

OK, I'll try to reproduce. There have been a number of fixes since 1.4.15
BTW, but none of them look like what you observe. Still it would be
reasonable to consider an upgrade to 1.4.21.

Regards,
Willy



Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
Err...more precisely...
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau 

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

On May 24, 2012, at 5:31 PM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] 
wrote:

> 
> 
> I thought it was a bug in the reporting, considering we've played with 
> numerous values for the various timeouts as an experiment, but wanted your 
> thoughts.
> This is v1.4.15.
> 
> [root@opsslb1 log]# haproxy -v
> HA-Proxy version 1.4.15 2011/04/08
> Copyright 2000-2010 Willy Tarreau 
> 
> On May 24, 2012, at 5:17 PM, Willy Tarreau wrote:
> 
>> Hi Kevin,
>> 
>> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. 
>> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>>> Hi,
>>> We're having odd behavior (apparently have always but didn't realize it), 
>>> where our backend httpchks "time out":
>>> 
>>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>> 
>>> 
>>> We've been playing with the timeout values, and we don't know what is 
>>> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
>>> service availability check (by hand) typically takes 2-3 seconds on average.
>>> Here is the relevant haproxy setup.
>>> 
>>> #-
>>> # Global settings
>>> #-
>>> global
>>>   log-send-hostname opsslb1
>>>   log 127.0.0.1 local1 info
>>> #chroot  /var/lib/haproxy
>>>   pidfile /var/run/haproxy.pid
>>>   maxconn 1024
>>>   userhaproxy
>>>   group   haproxy
>>>   daemon
>>> 
>>> #-
>>> # common defaults that all the 'listen' and 'backend' sections will
>>> # use if not designated in their block
>>> #-
>>> defaults
>>>   modehttp
>>>   log global
>>>   option  dontlognull
>>>   option  httpclose
>>>   option  httplog
>>>   option  forwardfor
>>>   option  redispatch
>>>   timeout connect 500 # default 10 second time out if a backend is not found
>>>   timeout client 5
>>>   timeout server 360
>>>   maxconn 6
>>>   retries 3
>>> 
>>> frontend webapp_ops_ft
>>> 
>>>   bind 10.0.40.209:80
>>>   default_backend webapp_ops_bk
>>> 
>>> backend webapp_ops_bk
>>>   balance roundrobin
>>>   option httpchk HEAD /app/availability
>>>   reqrep ^Host:.* 

Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
On Thu, May 24, 2012 at 04:31:39PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> I thought it was a bug in the reporting, considering we've played with 
> numerous values for the various timeouts as an experiment, but wanted your 
> thoughts.
> This is v1.4.15.
> 
>  [root@opsslb1 log]# haproxy -v
> HA-Proxy version 1.4.15 2011/04/08
> Copyright 2000-2010 Willy Tarreau 

OK, I'll try to reproduce. There have been a number of fixes since 1.4.15
BTW, but none of them look like what you observe. Still it would be
reasonable to consider an upgrade to 1.4.21.

Regards,
Willy




Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
I thought it was a bug in the reporting, considering we've played with numerous 
values for the various timeouts as an experiment, but wanted your thoughts.
This is v1.4.15.

 [root@opsslb1 log]# haproxy -v
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau 

On May 24, 2012, at 5:17 PM, Willy Tarreau wrote:

> Hi Kevin,
> 
> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. 
> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>> Hi,
>> We're having odd behavior (apparently have always but didn't realize it), 
>> where our backend httpchks "time out":
>> 
>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> 
>> 
>> We've been playing with the timeout values, and we don't know what is 
>> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
>> service availability check (by hand) typically takes 2-3 seconds on average.
>> Here is the relevant haproxy setup.
>> 
>> #-
>> # Global settings
>> #-
>> global
>>log-send-hostname opsslb1
>>log 127.0.0.1 local1 info
>> #chroot  /var/lib/haproxy
>>pidfile /var/run/haproxy.pid
>>maxconn 1024
>>userhaproxy
>>group   haproxy
>>daemon
>> 
>> #-
>> # common defaults that all the 'listen' and 'backend' sections will
>> # use if not designated in their block
>> #-
>> defaults
>>modehttp
>>log global
>>option  dontlognull
>>option  httpclose
>>option  httplog
>>option  forwardfor
>>option  redispatch
>>timeout connect 500 # default 10 second time out if a backend is not found
>>timeout client 5
>>timeout server 360
>>maxconn 6
>>retries 3
>> 
>> frontend webapp_ops_ft
>> 
>>bind 10.0.40.209:80
>>default_backend webapp_ops_bk
>> 
>> backend webapp_ops_bk
>>balance roundrobin
>>option httpchk HEAD /app/availability
>>reqrep ^Host:.* Host:\ webapp.example.com
>>server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
>>server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
>>server webapp_ops3 opsapp3.ops.example.com:41000 check inter 3
>>timeout check 15000
>>timeout connect 15000
> 
> This is quite strange. The timeout is defined first by "timeout check" or if
> unset, by "inter". So in your case you should observe a 15sec timeout, not
> one second.
> 
> What exact version is this ? (haproxy -vv)
> 
> It looks like a bug, however it could be a bug in the timeout handling as
> well as in the reporting. I'd suspect the latter since you're saying that
> the service takes 2-3 sec to respond and you don't seem to see errors
> that often.
> 

Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
> Hi,
> We're having odd behavior (apparently have always but didn't realize it), 
> where our backend httpchks "time out":
> 
> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> 
> 
> We've been playing with the timeout values, and we don't know what is 
> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
> service availability check (by hand) typically takes 2-3 seconds on average.
> Here is the relevant haproxy setup.
> 
> #-
> # Global settings
> #-
> global
> log-send-hostname opsslb1
> log 127.0.0.1 local1 info
> #chroot  /var/lib/haproxy
> pidfile /var/run/haproxy.pid
> maxconn 1024
> userhaproxy
> group   haproxy
> daemon
> 
> #-
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #-
> defaults
> modehttp
> log global
> option  dontlognull
> option  httpclose
> option  httplog
> option  forwardfor
> option  redispatch
> timeout connect 500 # default 10 second time out if a backend is not found
> timeout client 5
> timeout server 360
> maxconn 6
> retries 3
> 
> frontend webapp_ops_ft
> 
> bind 10.0.40.209:80
> default_backend webapp_ops_bk
> 
> backend webapp_ops_bk
> balance roundrobin
> option httpchk HEAD /app/availability
> reqrep ^Host:.* Host:\ webapp.example.com
> server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
> server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
> server webapp_ops3 opsapp3.ops.example.com:41000 check inter 3
> timeout check 15000
> timeout connect 15000

This is quite strange. The timeout is defined first by "timeout check" or if
unset, by "inter". So in your case you should observe a 15sec timeout, not
one second.

What exact version is this ? (haproxy -vv)

It looks like a bug, however it could be a bug in the timeout handling as
well as in the reporting. I'd suspect the latter since you're saying that
the service takes 2-3 sec to respond and you don't seem to see errors
that often.
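
To restate the rule as a minimal sketch (example values only):

    backend bk_with_check_timeout
        # "timeout check" bounds the wait for the check response
        timeout check 15s
        server s1 192.168.0.1:80 check inter 30s

    backend bk_without_check_timeout
        # without "timeout check", "inter" bounds the whole check
        server s1 192.168.0.1:80 check inter 30s

Either way, with "timeout check 15000" the checks should be allowed to run
for about 15 seconds, not one.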

Regards,
Willy




Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
Hi,
We're having odd behavior (apparently we always have, but didn't realize it), where 
our backend httpchks "time out":

May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.


We've been playing with the timeout values, and we don't know what is 
controlling the "Layer7 timeout, check duration: 1002ms".  The backend service 
availability check (by hand) typically takes 2-3 seconds on average.
Here is the relevant haproxy setup.

#-
# Global settings
#-
global
log-send-hostname opsslb1
log 127.0.0.1 local1 info
#chroot  /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 1024
userhaproxy
group   haproxy
daemon

#-
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#-
defaults
modehttp
log global
option  dontlognull
option  httpclose
option  httplog
option  forwardfor
option  redispatch
timeout connect 500 # default 10 second time out if a backend is not found
timeout client 5
timeout server 360
maxconn 6
retries 3

frontend webapp_ops_ft

bind 10.0.40.209:80
default_backend webapp_ops_bk

backend webapp_ops_bk
balance roundrobin
option httpchk HEAD /app/availability
reqrep ^Host:.* Host:\ webapp.example.com
server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
server webapp_ops3 opsapp3.ops.example.com:41000 check inter 3
timeout check 15000
timeout connect 15000

Kevin Lange
kevin.m.la...@nasa.gov
kla...@raytheon.com
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com





RE: haproxy conditional healthchecks/failover

2012-05-24 Thread Zulu Chas

> > Hi!
> >
> > I'm trying to use HAproxy to support the concepts of "offline", "in
> > maintenance mode", and "not working" servers.
> 
> Any good reason to do that???
> (I'm a bit curious)

Sure.  I want to be able to mark a machine offline by creating a file (as 
opposed to marking it online by creating a file), which is why I can't use 
disable-on-404 below.  This covers situations where I need to take a machine 
out of public-facing operation for some reason, but perhaps I still want it to 
be able to render pages etc -- maybe I'm testing a code deployment once it's 
already deployed in order to verify the system is ready to be marked online.
I also want to be able to mark a machine down for maintenance by creating a 
file, "maintenance.html", which apache will rewrite URLs to during critical 
deployment phases or when performing other maintenance. In this case, I don't 
want it to render pages (usually to replace otherwise nasty-looking 500 error 
pages with a nice html facade).
For normal operations, I want the machine to be up.  But if it's not 
intentionally placed "offline" or "in maintenance" and the machine fails 
heartbeat checks, then the machine is "not working" and should not be served 
requests.
Does this make sense?
> 
> >  I have separate health checks
> > for each condition and I have been trying to use ACLs to be able to switch
> > between backends.  In addition to the fact that this doesn't seem to work,
> > I'm also not loving having to repeat the server lists (which are the same)
> > for each backend.
> 
> Nothing weird here, this is how HAProxy configuration works.
Cool, but variables would be nice to save time and avoid potential 
inconsistencies between sections.
> > -- I think it's more like "if any of
> > these succeed, mark this server online" -- and that's what's making this
> > scenario complex.
> 
> euh, I might be misunderstanding something.
> There is nothing more simple that "if the health check is successful,
> then the server is considered healthy"...

Since it's not strictly binary, as described above, it's a bit more complex.

> > frontend staging 0.0.0.0:8080
> >   # if the number of servers *not marked offline* is *less than the total
> > number of app servers* (in this case, 2), then it is considered degraded
> >   acl degraded nbsrv(only_online) lt 2
> >
> 
> This will match 0 and 1
> 
> >   # if the number of servers *not marked offline* is *less than one*, the
> > site is considered down
> >   acl down nbsrv(only_online) lt 1
> >
> 
> This will match 0, so you're both down and degraded ACL covers the
> same value (0).
> Which may lead to an issue later
> 
> >   # if the number of servers without the maintenance page is *less than the
> > total number of app servers* (in this case, 2), then it is
> > considered maintenance mode
> >   acl mx_mode nbsrv(maintenance) lt 2
> >
> >   # if the number of servers without the maintenance page is less than 1,
> > we're down because everything is in maintenance mode
> >   acl down_mx nbsrv(maintenance) lt 1
> >
> 
> Same remark as above.
> 
> 
> >   # if not running at full potential, use the backend that identified the
> > degraded state
> >   use_backend only_online if degraded
> >   use_backend maintenance if mx_mode
> >
> >   # if we are down for any reason, use the backend that identified that fact
> >   use_backend backup_only if down
> >   use_backend backup_only if down_mx
> >
> 
> Here is the problem (see above).
> The 2 use_backend above will NEVER match, because the degraded and
> mx_mode ACLs overlap their values!

Why would they never match?  Aren't you saying they *both* should match and 
wouldn't it then take action on the final match and switch the backend to 
maintenance mode?  That's what I want.  Maintenance mode overrides offline mode 
as a failsafe (since it's more restrictive) to prevent page rendering.
> Do you know the "disable-on-404" option?
> it may help you make your configuration in the right way (not
> considering a 404 as a healthy response).
> 

Yes, but what I actually would need is enable-on-404 :)
Thanks for your feedback!  I'm definitely open to other options, but I'm hoping 
to not have to lose the flexibility described above!
-chaz
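
The reason the two "backup_only" rules can never match is evaluation order:
use_backend rules are evaluated in the order they appear and the first match
wins, so with "degraded" defined as "lt 2" it also matches when nbsrv is 0,
and the later "down" rules are never reached. A minimal sketch of one way to
keep the same backends and still fall through correctly (thresholds assume
two servers, as in the thread):

    frontend staging 0.0.0.0:8080
        acl down      nbsrv(only_online) eq 0
        acl degraded  nbsrv(only_online) lt 2
        acl down_mx   nbsrv(maintenance) eq 0
        acl mx_mode   nbsrv(maintenance) lt 2

        # evaluate the most restrictive conditions first
        use_backend backup_only if down
        use_backend backup_only if down_mx
        use_backend maintenance if mx_mode
        use_backend only_online if degraded
        default_backend only_online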
  

Re: could haproxy call redis for a result?

2012-05-24 Thread S Ahmed
Baptiste,

Whenever this feature will be implemented, will it work for a specific url
like:

subdomain1.example.com

What about by query string?  like:

www.example.com/customer/12345

or

www.example.com/some/path?customerId=12345


Will it work for all the above?

On Tue, May 8, 2012 at 9:38 PM, S Ahmed  wrote:

> Yes it is the lookup that I am worried about.
>
>
> On Tue, May 8, 2012 at 5:46 PM, Baptiste  wrote:
>
>> Hi,
>>
>> Willy has just released 1.5-dev9, but unfortunately the track
>> functions can't yet track strings (and so URLs).
>> I'll let you know once a nightly snapshot could do it and we could
>> work on a proof of concept configuration.
>>
>> Concerning 250K URLs, that should not be an issue at all to store them.
>> Maybe looking for one URL could have a performance impact, we'll see.
>>
>> cheers
>>
>> On Tue, May 8, 2012 at 10:00 PM, S Ahmed  wrote:
>> > Great.
>> >
>> > So any ideas how many urls one can store in these sticky tables before
>> it
>> > becomes a problem?
>> >
>> > Would 250K be something of a concern?
>> >
>> >
>> > On Tue, May 8, 2012 at 11:26 AM, Baptiste  wrote:
>> >>
>> >> On Tue, May 8, 2012 at 3:25 PM, S Ahmed  wrote:
>> >> > Ok that sounds awesome, how will that work though?  i.e. from say
>> java,
>> >> > how
>> >> > will I do that?
>> >> >
>> >> > From what you're saying it sounds like I will just have to modify the
>> >> > response
>> >> > add and a particular header.  And on the flip side, if I want to
>> unblock
>> >> > I'll make a http request with something in the header that will
>> unblock
>> >> > it?
>> >> >
>> >>
>> >> That's it.
>> >> You'll have to track these headers with ACLs in HAProxy and to update
>> >> the stick table accordingly.
>> >> Then based on the value setup in the stick table, HAProxy can decide
>> >> whether it will allow or reject the request.
>> >>
>> >> > When do you think this will go live?
>> >> >
>> >>
>> >> In another mail, Willy said he will release 1.5-dev9 today.
>> >> So I guess it won't be too long now. Worst case would be later in the
>> >> week or next week.
>> >>
>> >> cheers
>> >
>> >
>>
>
>


Re: mysql failover and forcing disconnects

2012-05-24 Thread Justin Karneges
On Thursday, May 24, 2012 01:59:32 AM Willy Tarreau wrote:
> On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
> > Well, the network could fail at anytime and have a similar effect. I'm
> > not sure if killing all connections to the backup is really any worse
> > than killing all connections to the non-backup (via on-marked-down).
> > Either way a bunch of client errors may occur, but for a scenario that
> > is hopefully rare.
> 
> Killing connections when something fails is acceptable to many people,
> but killing connections when everything goes well is generally not
> accepted.

This is certainly a sensible philosophy.

I think what makes my scenario special is that the backup server is 
functionally worse than the non-backup. So even though we are discussing a 
destructive response to a positive event, it's the quickest way to get the 
service out of a degraded state.

> > Maybe an "on-marked-up shutdown-backup-sessions" option would be good.
> 
> I was thinking about something like this, but I still have doubts about
> its real usefulness. I don't know what others think here. If there is
> real demand for this and people think it serves a real purpose, I'm fine
> with accepting a patch to implement it.

Thanks for being open. I'll mull this over some more and consider making a 
patch.

Justin



Re: [ANNOUNCE] haproxy 1.4.21

2012-05-24 Thread Kevin Decherf
Hi,

Just for archive: CVE-2012-2391
http://www.openwall.com/lists/oss-security/2012/05/23/15


Kevin Decherf - M: +33 681194547 - T: @Kdecherf


On Tue, May 22, 2012 at 9:30 PM, Vivek Malik  wrote:

> A recommended upgrade for all production users. While we are not
> (generally) affected by the bugs fixed in haproxy stable version. I
> recommend updating haproxy.
>
> I can update haproxy bin in puppet and can check it in (we distribute
> haproxy binary via puppetmaster).
>
> Aiman,
>
> Please update puppetmaster when you see fit and also in general, please
> ensure that puppet client is running on all machines.
>
> Thanks,
> Vivek
>
>
> On Mon, May 21, 2012 at 1:43 AM, Willy Tarreau  wrote:
>
>> Hi all,
>>
>> a number of old bugs were reported recently. Some of them are quite
>> problematic because they can lead to crashes while parsing configuration
>> or when starting up, which is even worse considering that startup scripts
>> will generally not notice it.
>>
>> Among the bugs fixed in 1.4.21, we can enumerate :
>>  - risk of crash if using reqrep/rsprep and having tune.bufsize manually
>>configured larger than what was compiled in. The cause is the trash
>>buffer used for the replace was still static, and I believed this was
>>fixed months ago but only my mailbox had the fix! Thanks to Dmitry
>>Sivachenko for reporting this bug.
>>
>>  - risk of crash when using header captures on a TCP frontend. This is a
>>configuration issue, and this situation is now correctly detected and
>>reported. Thanks to Olufemi Omojola for reporting this bug.
>>
>>  - risk of crash when some servers are declared with checks in a farm
>> which
>>does not use an LB algorithm (eg: "option transparent" or "dispatch").
>>This happens when a server state is updated and reported to the non-
>>existing LB algorithm. Fortunately, this happens at start-up when
>>reporting the servers either up or down, but still it's after the fork
>>and too late for being easily recovered from by scripts. Thanks to
>> David
>>Touzeau for reporting this bug.
>>
>>  - "balance source" did not correctly hash IPv6 addresses, so IPv4
>>connections to IPv6 listeners would always get the same result. Thanks
>>to Alex Markham for reporting this bug.
>>
>>  - the connect timeout was not properly reset upon connection
>> establishment,
>>resulting in a retry if the timeout struck exactly at the same
>> millisecond
>>the connect succeeded. The effect is that if a request was sent as
>> part of
>>the connect handshake, it is not available for resend during the retry
>> and
>>a response timeout is reported for the server. Note that in practice,
>> this
>>only happens with erroneous configurations. Thanks to Yehuda Sadeh for
>>reporting this bug.
>>
>>  - the error captures were wrong if the buffer wrapped, which happens when
>>capturing incorrectly encoded chunked responses.
>>
>> I also backported Cyril's work on the stats page to allow POST params to
>> be
>> posted in any order, because I know there are people who script actions on
>> this page.
>>
>> This release also includes doc cleanups from Cyril, Dmitry Sivachenko and
>> Adrian Bridgett.
>>
>> Distro packagers will be happy to know that I added explicit checks to
>> shut
>> gcc warnings about unchecked write() return value in the debug code.
>>
>> While it's very likely that almost nobody is affected by the bugs above,
>> troubleshooting them is annoying enough to justify an upgrade.
>>
>> Sources, Linux/x86 and Solaris/sparc binaries are at the usual location :
>>
>>site index : http://haproxy.1wt.eu/
>>sources: http://haproxy.1wt.eu/download/1.4/src/
>>changelog  : http://haproxy.1wt.eu/download/1.4/src/CHANGELOG
>>binaries   : http://haproxy.1wt.eu/download/1.4/bin/
>>
>> Willy
>>
>>
>>
>


Re: mysql failover and forcing disconnects

2012-05-24 Thread Baptiste
On Thu, May 24, 2012 at 10:59 AM, Willy Tarreau  wrote:
> On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
>> Well, the network could fail at anytime and have a similar effect. I'm not 
>> sure
>> if killing all connections to the backup is really any worse than killing all
>> connections to the non-backup (via on-marked-down). Either way a bunch of
>> client errors may occur, but for a scenario that is hopefully rare.
>
> Killing connections when something fails is acceptable to many people,
> but killing connections when everything goes well is generally not accepted.
>
>> Maybe an "on-marked-up shutdown-backup-sessions" option would be good.
>
> I was thinking about something like this, but I still have doubts about
> its real usefulness. I don't know what others think here. If there is
> real demand for this and people think it serves a real purpose, I'm fine
> with accepting a patch to implement it.
>
> Willy
>
>

It's like the "preempt" in VRRP and it may make sense for any protocol
relying on long connections, like HTTP tunnel mode, rdp, IMAP/POP,
etc...

To me it makes sense :)

cheers



Re: mysql failover and forcing disconnects

2012-05-24 Thread Willy Tarreau
On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
> Well, the network could fail at anytime and have a similar effect. I'm not 
> sure 
> if killing all connections to the backup is really any worse than killing all 
> connections to the non-backup (via on-marked-down). Either way a bunch of 
> client errors may occur, but for a scenario that is hopefully rare.

Killing connections when something fails is acceptable to many people,
but killing connections when everything goes well is generally not accepted.

> Maybe an "on-marked-up shutdown-backup-sessions" option would be good.

I was thinking about something like this, but I still have doubts about
its real usefulness. I don't know what others think here. If there is
real demand for this and people think it serves a real purpose, I'm fine
with accepting a patch to implement it.

Willy




Re: mysql failover and forcing disconnects

2012-05-24 Thread Justin Karneges
On Wednesday, May 23, 2012 11:57:14 PM Willy Tarreau wrote:
> There is an option at the server level which is "on-marked-down
> shutdown-session". It achieves exactly what you want, it will kill all
> connections to a server which is detected as down.

Perfect!

> > 2) If the master eventually comes back, all connections that ended up
> > routing to the slave will stay on the slave indefinitely. The only
> > solution I have for this is to restart mysql on the slave, which kicks
> > everyone off causing them to reconnect and get routed back to the
> > master. This is acceptable if restoring master required some kind of
> > manual maintenance, since I'd already be getting my hands dirty anyway.
> > However, if master disappears and comes back due to brief network outage
> > that resolves itself automatically, it's unfortunate that I'd still have
> > to manually react to this by kicking everyone off the slave.
> 
> There is no universal solution for this. As haproxy doesn't inspect the
> mysql traffic, it cannot know when a connection remains idle and unused.
> Making it arbitrarily kill connections to a working server would be the
> worst thing to do, as it would kill connections on which a transaction is
> waiting for being completed.

Well, the network could fail at anytime and have a similar effect. I'm not sure 
if killing all connections to the backup is really any worse than killing all 
connections to the non-backup (via on-marked-down). Either way a bunch of 
client errors may occur, but for a scenario that is hopefully rare.

Maybe an "on-marked-up shutdown-backup-sessions" option would be good.

Justin