On 02/25/09 12:05, Peter Tribble wrote:
> On Wed, Feb 25, 2009 at 7:37 PM,  <Zhenghui.Xie at sun.com> wrote:
>   
>> forwarding to the right mailing list...
>>
>> Zhenghui.Xie at Sun.COM wrote:
>>     
>>> Hi, All
>>>
>>> We, the iteam, had a lot of discussions about health check last week.
>>> There are still several issues I hope people can discuss and reach
>>> consensus on.
>>>
>>> Q1. health check probes. What is a valid UDP health check probe without
>>> knowing the application? Right now, besides allowing the user to provide
>>> their own test script, ilb provides three built-in health check queries:
>>> PING, TCP and UDP. After discussing with Kacheong, for PING, using
>>> "/usr/sbin/ping" should be good enough (assuming that ping allows us to
>>> specify IP_NEXTHOP, which covers DSR mode). For TCP, a small program that
>>> opens a socket, connects to the IP/port, then closes the socket will do.
>>> But we could not figure out what a valid UDP probe is if we don't know
>>> the application on the server side. Any suggestions?
>>>       
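For what it's worth, the open-connect-close TCP probe described above can be sketched as below; the function names and timeouts are illustrative, not part of any proposed ilb interface. For UDP, one partial heuristic (an assumption on my part, not something the thread settled on) is that a connected UDP socket surfaces an ICMP port-unreachable as ECONNREFUSED, which can prove a port is closed but cannot prove the application is healthy:

```python
import socket

def tcp_probe(host, port, timeout=5.0):
    """TCP health check: the server is 'alive' if a connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True          # connected; socket is closed on exit
    except OSError:
        return False             # refused, timed out, or unreachable

def udp_probe(host, port, timeout=2.0):
    """Best-effort UDP check.  Only an ICMP port-unreachable (seen as
    ECONNREFUSED on a connected UDP socket) is conclusive; silence tells
    us nothing about the application, hence the None result."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        s.send(b"")              # empty probe datagram
        s.recv(1)                # any reply at all means something answered
        return True
    except ConnectionRefusedError:
        return False             # port is definitely closed
    except socket.timeout:
        return None              # inconclusive: open-but-silent, or filtered
    finally:
        s.close()
```

Note that even this UDP heuristic cannot distinguish a healthy application from a dead one that merely holds the port open, which is exactly the problem raised in Q1.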
>
> That's a problem. I don't know about UDP (I don't have any UDP services
> that I load-balance) but I know that in the case of TCP the simple "can
> we open a socket?" approach can cause havoc with poorly written
> applications.
>   

Can you explain this further in terms of what havoc is caused? Perhaps 
ILB should not provide any UDP checks but just TCP and ping, and 
provide the capability to run a user-supplied health check.
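If ILB does go the user-supplied route, one possible calling convention, offered purely as an assumption (nothing in this thread defines the contract), is that the balancer runs the user's program with the server address and port as arguments and treats exit status 0 as healthy:

```python
import subprocess

def run_user_hc(script, host, port, timeout=10.0):
    """Run a user-supplied health check program.  The (host, port)
    argument convention and the 'exit 0 == healthy' rule are
    hypothetical, not an ILB specification."""
    try:
        done = subprocess.run([script, host, str(port)], timeout=timeout)
        return done.returncode == 0
    except subprocess.TimeoutExpired:
        return False             # a hung check counts as a failure
```

The timeout matters: without it, one wedged user script could stall the whole health check cycle.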
>   
>>> Q2. how to show health check results to the user? There was no objection
>>> that we definitely need to show this information to the user. But
>>> arguments remain about which command should show it and what information
>>> should be included. Below is the summary:
>>> * which command should show it: a) show it with list-servergroup, but one
>>> argument against this is that server health is actually tied to a rule.
>>> b) show it with list-rule, but it does not seem a good match for our
>>> current rule listing format. Your opinion?
>>> * what information needs to be included: the current proposal is to
>>> include 4 items: what the hc test is (Ping, TCP, UDP...), the result of
>>> the last hc, the time of the last hc, and the time of the next hc.
>>> Anything to add or delete?
>>>       
>
> What I would like is a really simple way to see which checks have failed.
> If that's piping the status into grep, then that's fine, but what I'm
> after is a really simple way to list what's down.
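Assuming each server carries the four fields proposed in Q2 (test type, last result, last check time, next check time; the field names below are made up for illustration), the "what's down" view is just a one-line filter over the full status listing:

```python
# Hypothetical hc records carrying the four fields proposed in Q2.
results = [
    {"server": "10.0.0.1", "test": "TCP",  "last_result": "alive",
     "last_check": "12:00:05", "next_check": "12:00:35"},
    {"server": "10.0.0.2", "test": "TCP",  "last_result": "dead",
     "last_check": "12:00:06", "next_check": "12:00:36"},
    {"server": "10.0.0.3", "test": "PING", "last_result": "dead",
     "last_check": "12:00:07", "next_check": "12:00:37"},
]

def list_down(records):
    """Return just the servers whose last health check failed."""
    return [r["server"] for r in records if r["last_result"] == "dead"]
```

So whichever command ends up printing the table, a one-column machine-parsable result field would make the grep workflow trivial.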

I assume what you are stating here is that the ping hc should not be a 
configurable health check but one that is always done on every server in 
a server group. But shouldn't the user have a say in how frequently the 
ping hc should occur (as this may affect the performance of the load 
balancer)?

>
> Also, is there a health check history?
>
>   
You mean keep the health of the server from the time it's added to a 
server group until one of these happens:
- the server has been removed from all server groups
- the machine running ILB goes down.
Correct?
>>> Q3. does one rule need more than one health check method? Opinions are: a)
>>> limit one health check per rule. b) allow the user to specify more than
>>> one health check per rule. c) limit one health check per rule, but for
>>> each health check object, allow the user to specify more than one test
>>> method. Suggestions?
>>>       
>
>   
> I would expect there to be an implicit 'ping' rule for each server.
> That's not really associated with each rule, though - just that if we
> can't ping a machine then it's not worth doing all the complex tests.
>   
I assume what you are stating here is that the ping hc should be enabled 
by default. But shouldn't the user have a say in how frequently the 
ping hc should occur (as this may affect the performance of the load 
balancer)? Also, should the user be given the option to disable the ping 
check if he/she wants to?

>   
> One other feature I would love that may be off topic but is another kind of
> health check: service dependency. If service A on a server doesn't respond,
> then mark service B as failed as well, even if it appears to be
> responding itself.
> (If B depends on A, then B may accept connections and thus look as if it's
> alive even though it can't do anything useful. The dependency avoids having
> to do a complex end-to-end probe.)
>
>   
You mean if server 1 is included in 2 separate TCP-related virtual 
services, and it passes ping health checks but fails one of the TCP 
health checks, it should be considered failed for both TCP virtual 
services? Why would you want this?
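Peter's dependency idea can be expressed as a small propagation step over the raw probe results, independent of how the probes themselves work; the data shapes below are assumptions for illustration:

```python
def apply_dependencies(raw, deps):
    """raw:  {service: bool} results straight from the probes.
    deps: {service: set of services it depends on}.
    A service is effectively up only if it and all of its (transitive)
    dependencies probed up."""
    def up(svc, seen=frozenset()):
        if svc in seen:                      # guard against dependency cycles
            return raw.get(svc, False)
        return raw.get(svc, False) and all(
            up(d, seen | {svc}) for d in deps.get(svc, ()))
    return {svc: up(svc) for svc in raw}
```

With B declared dependent on A, a failed probe of A marks B down even while B still accepts connections, which is exactly how this avoids the complex end-to-end probe Peter mentions.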

Sangeeta