On Fri, Feb 01, 2013 at 08:22:24AM +0100, Willy Tarreau wrote:
> Hi Simon,
>
> On Fri, Feb 01, 2013 at 01:56:01PM +0900, Simon Horman wrote:
> > Hi Malcolm, Hi Willy,
> >
> > after a bit of a hiatus I'd like to restart this discussion.
>
> Cool, I wanted to ping you on this last week-end but forgot to do so !
>
> > On Mon, Dec 24, 2012 at 10:23:15AM +0100, Willy Tarreau wrote:
> > > Hi Malcolm,
> > >
> > > On Mon, Dec 24, 2012 at 09:06:25AM +0000, Malcolm Turnbull wrote:
> > > > Willy / Simon,
> > > >
> > > > I'm very happy to add a down option, my original thought was that you
> > > > would use the standard health checks as well as the dynamic agent for
> > > > changing the weight.
> > >
> > > That's what I thought I initially understood from our discussion a few
> > > months ago but then your post of the specs last week slightly confused
> > > me as I understood you needed this as a dedicated check. I think it was
> > > the same for Simon.
> >
> > Sorry, I think that the problem here lies in my understanding of what is
> > desired.
>
> No problem, we were several ones to get confused.
>
> > > > As you may for example want a specific HAproxy SMTP health check + use
> > > > the dynamic weighting agent.
> > >
> > > Exactly. But then we have two options :
> > >   - retrieve the information from the checked port (easy for HTTP or TCP)
> > >   - retrieve the information from a dedicated port => this involves a
> > >     second task to do this, with its own check intervals.
> > >
> > > The latter doesn't seem stupid at all, quite the opposite in fact, but
> > > it will require more settings on the server line. However it comes with
> > > a benefit, it is that when the agent returns "disable", checks are
> > > disabled on the real port, but then we could have the agent continue to
> > > be checked and later return a valid result again.
> > >
> > > > I'm not sure if that would cause some coding issues if the health
> > > > checks say 'Down' and the agent says 50%? (I would assume haproxy
> > > > health checks take priority?)
> > >
> > > Status and weights are orthogonal. The real check should have precedence.
> > >
> > > > Or if the agent says Down but the HAProxy health check says up?
> > >
> > > I think it should be ANDed. This could help provide a first implementation
> > > of multi-port checks after all.
> >
> > That sounds reasonable.
> >
> > > > I'm certainly happy for Down to be added as an option with a
> > > > description string.
> > > > Also I'm assuming that later (the dynamic agent) could easily be
> > > > extended to an http style get check rather than TCP (lb-agent-chk) if
> > > > users prefer to write an HTTP server application to integrate with it
> > > > (Kemp and Barracuda support this method).
> >
> > On the topic of down, I think that Willy's proposal is
> > entirely reasonable. However it's unclear to me if disable should also
> > be supported or not.
>
> The disable mode is very problematic : if a server accidentally returns it,
> there is no way to roll back except a manual intervention on the load
> balancers. Also there is a high risk that the backup LB will be forgotten
> in such an operation. I have no technical worries here, just operational
> ones. If we run agent checks on a dedicated port in parallel to health
> checks, this is different, because we could ensure that such checks could
> still be running when the server is disabled so that the agent can change
> the mode again. So maybe a first version should not support disable and a
> later one could support it ?

This seems reasonable to me.

> Also, I believe that in another thread we discussed about supporting a
> new status (eg: STOPPED) which differs from DOWN in that it means the
> service was intentionally stopped and did not crash. We can't support
> this well right now (just map it to down) but I think it's important
> that people can design their agents for this. Similarly, a "FAIL"
> status could be useful in the usual situations where a server is inoperative
> due to external conditions but could appear valid. The common example is
> the mail server which fails to receive e-mails because the FS is full.
> Everything works except the service cannot be delivered. There is nothing
> to restart, the issue can go away by itself, etc... We'd map this to DOWN
> again, but I think some users may later prefer to have a dedicated status
> in the agent's language. So we should probably plan it in the language in
> order to avoid ugly patches here and there.

Adding stopped and fail, and mapping them both to down seems reasonable to
me. I assume that they also accept reason strings as down does.

> > > That's what I'm commonly observing too. Even right now, there are a lot
> > > of users who use httpchk for services that are not HTTP at all, but they
> > > have a very simple agent responding to checks.
> > >
> > > So now we have to decide what to do. I think Simon's code already provides
> > > some useful features (assuming we support "down"). It should probably be
> > > extended later to support combined checks.
> > >
> > > In my opinion, this could be done in three steps :
> > >
> > >   1) we merge Simon's work with the "option lb-agent-chk" directive which
> > >      *replaces* the health check method with this one ;
> > >
> > >   2) we implement "agent-port" and "agent-interval" on the server lines to
> > >      automatically enable the agent to be run on another port even when a
> > >      different check is running ;
> > >
> > >   3) we implement "http-check agent-hdr <name>" to retrieve the agent
> > >      string from an HTTP header for HTTP checks ;
> > >
> > > That way we always support exactly the same syntax but can retrieve the
> > > required information at different places depending on the checks. Does
> > > that sound good to you ?
> >
> > That sounds entirely reasonable to me.
>
> Nice!
>
> Best regards,
> Willy
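
To make the semantics discussed above a bit more concrete, here is a rough
sketch in Python. It is not HAProxy code and not the merged implementation.
The names (AgentReply, parse_agent_reply, effective_state) are made up for
illustration, and the reply syntax is an assumption based on the words used
in this thread: a percentage sets the weight, and down, stopped and fail may
carry an optional reason string and all map to DOWN. "disable" is rejected,
as suggested for a first version.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical agent vocabulary, taken from the words used in this thread.
# "stopped" and "fail" are accepted but mapped to DOWN for now.
DOWN_WORDS = {"down", "stopped", "fail"}


@dataclass
class AgentReply:
    up: bool = True                    # the agent's view of the server status
    weight_pct: Optional[int] = None   # None means "leave the weight alone"
    reason: str = ""                   # optional free-form reason string


def parse_agent_reply(line: str) -> AgentReply:
    """Parse one reply line such as "50%", "down disk full" or "fail FS full"."""
    text = line.strip()
    word, _, rest = text.partition(" ")
    word = word.lower()
    if word.endswith("%") and word[:-1].isdigit():
        return AgentReply(weight_pct=int(word[:-1]))
    if word in DOWN_WORDS:
        return AgentReply(up=False, reason=rest.strip())
    # "disable" is deliberately not supported in this sketch.
    raise ValueError(f"unsupported agent reply: {text!r}")


def effective_state(health_check_up: bool,
                    reply: AgentReply) -> Tuple[bool, Optional[int]]:
    """Status is ANDed with the real check; weight is orthogonal to status."""
    return (health_check_up and reply.up), reply.weight_pct
```

With this, a health check reporting Down combined with an agent reply of
"50%" yields (False, 50): the server stays down and the weight is carried
separately. An agent reply of "down" with a passing health check also yields
a down server, which is the ANDed behaviour described above.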