Re: FW: haproxy conditional healthchecks/failover

2012-05-29 Thread Willy Tarreau
On Tue, May 29, 2012 at 08:32:29PM +, Zulu Chas wrote:
> 
> am I wildly off course or is this config salvageable?
> 

To be honest, your mail with overly long lines (half a kilobyte) is painful
to read, and once I made the effort of reading it, I didn't understand why
you're trying to cross-dress something which already exists and works.
 
The "disable-on-404" option is made to permit enabling/disabling a server by a
simple "touch" or "rm". It appears that you want to exactly swap these two
commands. It really makes no sense to me to modify haproxy to support such a
swap in a script.
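
For reference, a minimal sketch of that stock setup. The backend name, server
names, addresses, and the /online.html check file are placeholders, not taken
from the config in this thread:

```haproxy
backend app_servers
    # Probe a static file on each server; a 200 response means "up".
    option httpchk GET /online.html
    # A 404 on the probe takes the server out of load balancing
    # (it still receives persistent connections).
    http-check disable-on-404
    # Placeholder servers (documentation address range).
    server app1 192.0.2.11:80 check
    server app2 192.0.2.12:80 check
```

With something like this, "touch online.html" in the docroot should enable a
server and "rm online.html" should drain it.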

Another reason for disabling on 404 is that it will not accidentally enable a
server which was started from an unmounted docroot file system. With your
method, it would still start it.

Also, the suggested way of dealing with very specific health checks is to
write a CGI or servlet to handle the various situations. Most people are
already doing this, and if you absolutely want to use "rm" to start the
server and "touch" to stop it, then 5 lines of shell in a CGI will do it.

Regards,
Willy




FW: haproxy conditional healthchecks/failover

2012-05-29 Thread Zulu Chas

am I wildly off course or is this config salvageable?






> > Hi!
> >
> > I'm trying to use HAproxy to support the concepts of "offline", "in
> > maintenance mode", and "not working" servers.
> 
> Any good reason to do that???
> (I'm a bit curious)

Sure.  I want to be able to mark a machine offline by creating a file (as
opposed to marking it online by creating a file), which is why I can't use
disable-on-404 below.  This covers situations where I need to take a machine
out of public-facing operation for some reason, but perhaps I still want it to
be able to render pages etc -- maybe I'm testing a code deployment once it's
already deployed in order to verify the system is ready to be marked online.

I also want to be able to mark a machine down for maintenance by creating a
file, "maintenance.html", to which Apache will rewrite URLs during critical
deployment phases or when performing other maintenance.  In this case, I don't
want it to render pages (usually to replace otherwise nasty-looking 500 error
pages with a nice html facade).

For normal operations, I want the machine to be up.  But if it's not
intentionally placed "offline" or "in maintenance" and it fails heartbeat
checks, then the machine is "not working" and should not be served requests.

Does this make sense?

> 
> >  I have separate health checks
> > for each condition and I have been trying to use ACLs to be able to switch
> > between backends.  In addition to the fact that this doesn't seem to work,
> > I'm also not loving having to repeat the server lists (which are the same)
> > for each backend.
> 
> Nothing weird here, this is how HAProxy configuration works.

Cool, but variables would be nice to save time and avoid potential
inconsistencies between sections.

> > -- I think it's more like "if any of
> > these succeed, mark this server online" -- and that's what's making this
> > scenario complex.
> 
> Uh, I might be misunderstanding something.
> There is nothing simpler than "if the health check is successful,
> then the server is considered healthy"...

Since it's not strictly binary, as described above, it's a bit more complex.

> > frontend staging 0.0.0.0:8080
> >   # if the number of servers *not marked offline* is *less than the total
> > number of app servers* (in this case, 2), then it is considered degraded
> >   acl degraded nbsrv(only_online) lt 2
> >
> 
> This will match 0 and 1
> 
> >   # if the number of servers *not marked offline* is *less than one*, the
> > site is considered down
> >   acl down nbsrv(only_online) lt 1
> >
> 
> This will match 0, so both your down and degraded ACLs cover the
> same value (0),
> which may lead to an issue later.
> 
> >   # if the number of servers without the maintenance page is *less than the
> > total number of app servers* (in this case, 2), then it is
> > considered maintenance mode
> >   acl mx_mode nbsrv(maintenance) lt 2
> >
> >   # if the number of servers without the maintenance page is less than 1,
> > we're down because everything is in maintenance mode
> >   acl down_mx nbsrv(maintenance) lt 1
> >
> 
> Same remark as above.
> 
> 
> >   # if not running at full potential, use the backend that identified the
> > degraded state
> >   use_backend only_online if degraded
> >   use_backend maintenance if mx_mode
> >
> >   # if we are down for any reason, use the backend that identified that fact
> >   use_backend backup_only if down
> >   use_backend backup_only if down_mx
> >
> 
> Here is the problem (see above).
> The 2 use_backend rules above will NEVER match, because the degraded and
> mx_mode ACLs overlap their values!

Why would they never match?  Aren't you saying they *both* should match and
wouldn't it then take action on the final match and switch the backend to
maintenance mode?  That's what I want.  Maintenance mode overrides offline mode
as a failsafe (since it's more restrictive) to prevent page rendering.

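
Note on rule order: HAProxy evaluates use_backend rules top to bottom and uses
the first one whose condition matches, not the last.  A sketch of the same
rules reordered so that the most restrictive condition is tested first,
reusing the ACL and backend names from the quoted config (untested):

```haproxy
frontend staging 0.0.0.0:8080
    acl degraded nbsrv(only_online) lt 2
    acl down     nbsrv(only_online) lt 1
    acl mx_mode  nbsrv(maintenance) lt 2
    acl down_mx  nbsrv(maintenance) lt 1

    # First match wins, so test the "everything is down" cases first.
    use_backend backup_only if down
    use_backend backup_only if down_mx
    # Maintenance is checked before degraded so that it takes priority.
    use_backend maintenance if mx_mode
    use_backend only_online if degraded
```

With this ordering the overlap at 0 no longer hides the "down" rules, and
maintenance still overrides the merely degraded case.
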
> Do you know the "disable-on-404" option?
> it may help you make your configuration in the right way (not
> considering a 404 as a healthy response).
> 

Yes, but what I actually would need is enable-on-404 :)

Thanks for your feedback!  I'm definitely open to other options, but I'm hoping
to not have to lose the flexibility described above!
-chaz