[ilb-dev] some health check related issues

Peter Tribble Wed, 25 Feb 2009 20:05:02 +0000

On Wed, Feb 25, 2009 at 7:37 PM,  <Zhenghui.Xie at sun.com> wrote:
> forwarding to the right mail list.....
>
> Zhenghui.Xie at Sun.COM wrote:
>>
>> Hi, All
>>
>> We, the iteam, had a lot of discussions about health check last week.
>> There are still several issues I would like people to discuss and reach
>> consensus on.
>>
>> Q1. Health check probes. What is a valid UDP health check probe without
>> knowing the application? Right now, besides allowing users to provide their
>> own test script, ilb provides three built-in health check queries: PING, TCP
>> and UDP. After discussing with Kacheong, for PING, using "/usr/sbin/ping"
>> should be good enough (assuming that ping allows us to specify IP_NEXTHOP,
>> which covers DSR mode). For TCP, a small program that opens a socket,
>> connects to the IP/port, then closes the socket will do. But we could not
>> figure out what a valid UDP probe is if we don't know the application on the
>> server side. Any suggestions?


That's a problem. I don't know about udp (I don't have any udp services that I
load-balance) but I know that in the case of tcp the simple 'can we open a
socket?' approach can cause havoc with poorly written applications.
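
For reference, the connect-and-close TCP probe described in Q1 might look
roughly like the sketch below. It is in Python purely for illustration; the
name tcp_probe and the 2-second default timeout are assumptions, not part of
any ilb design. It only shows that the port accepts connections, and it is
this kind of connect-then-drop traffic that some applications handle badly.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: opens a socket, connects, then closes it,
    sending no application data at all.
    """
    try:
        # create_connection resolves the host, connects, and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False
```

A probe like tcp_probe("192.0.2.10", 80) says nothing about whether the
application behind the port is actually working.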

>> Q2. How to show health check results to the user? There was no objection
>> that we definitely need to show this information to the user. But arguments
>> remain about which command should show it and what information should be
>> included. Below is the summary:
>> * Which command should show it: a) show it with list-servergroup, but one
>> argument against this is that server health is actually tied to a rule. b)
>> show it with list-rule, but that does not seem a good match for our current
>> rule listing format. Your opinion?
>> * What information needs to be included: the current proposal is to include
>> 4 items: what the hc test is (Ping, TCP, UDP...), the result of the last hc,
>> the time of the last hc, and when the next hc will be. Anything to add or
>> delete?

What I would like is a really simple way to see which checks have failed.
If that's piping the status into grep, then that's fine, but what I'm after
is a really simple way to list what's down.
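
As a rough sketch of what "list what's down" could mean, assuming each health
check result carries the four items proposed in Q2: the HealthStatus record,
its field names, and the "ok"/"failed" values below are all hypothetical and
only serve to illustrate the filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthStatus:
    """One health check record; field names are hypothetical."""
    server: str
    test: str          # what the hc test is: "PING", "TCP", "UDP", ...
    last_result: str   # result of the last hc: "ok" or "failed"
    last_time: str     # when the last hc ran
    next_time: str     # when the next hc is due


def list_down(statuses: List[HealthStatus]) -> List[str]:
    """Return one line per failed check, skipping everything healthy."""
    return [
        f"{s.server} {s.test} failed at {s.last_time}"
        for s in statuses
        if s.last_result != "ok"
    ]
```

One line per failed check also keeps the output easy to pipe into grep.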

Also, is there a health check history?

>> Q3. Does one rule need more than one health check method? Opinions are: a)
>> limit one health check per rule. b) allow the user to specify more than one
>> health check per rule. c) limit one health check per rule, but for each
>> health check object, allow the user to specify more than one test method.
>> Suggestions?

I would expect there to be an implicit 'ping' rule for each server. That's not
really associated with each rule, though - just that if we can't ping a machine
then it's not worth doing all the complex tests.
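
A minimal sketch of that gating, assuming the ping check and the per-rule
checks are plain callables that take a server address and return True when
healthy. The names check_server and Check are made up for illustration.

```python
from typing import Callable, Dict

Check = Callable[[str], bool]  # takes a server address, returns True if healthy


def check_server(server: str, ping: Check, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run the per-rule checks only if the server answers the ping check.

    All names here are hypothetical; the checks are passed in as callables.
    """
    if not ping(server):
        # Server unreachable: report every check as failed without running it.
        return {name: False for name in checks}
    return {name: check(server) for name, check in checks.items()}
```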

One other feature I would love that may be off topic but is another kind of
health check: service dependency. If service A on a server doesn't respond,
then mark service B as failed as well, even if it appears to be responding
itself. (If B depends on A, then B may accept connections and thus look as if
it's alive even though it can't do anything useful. The dependency avoids
having to do a complex end-to-end probe.)
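
To make the dependency idea concrete, here is a small sketch, assuming the raw
probe results and the dependencies are both available as simple mappings. The
function name and both data structures are hypothetical; a dependency with no
probe result is treated as down.

```python
from typing import Dict, List, Set


def effective_status(raw: Dict[str, bool], depends_on: Dict[str, List[str]]) -> Dict[str, bool]:
    """Combine raw probe results with service dependencies.

    raw maps service name -> probe result; depends_on maps a service to the
    services it needs. Both structures are hypothetical.
    """
    def is_up(service: str, seen: Set[str]) -> bool:
        if service in seen:
            # Dependency cycle: stop recursing; this path adds no new information.
            return True
        if not raw.get(service, False):
            # The service's own probe failed, or it has no probe result at all.
            return False
        seen = seen | {service}
        return all(is_up(dep, seen) for dep in depends_on.get(service, []))

    return {service: is_up(service, set()) for service in raw}
```

With raw = {"A": False, "B": True} and depends_on = {"B": ["A"]}, B is
reported as failed even though its own probe succeeded.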

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
