[ilb-dev] writeup for health check open issues

[email protected] Wed, 04 Mar 2009 14:22:54 -0800

Hi, All

Several days ago, I sent out an email 
(http://mail.opensolaris.org/pipermail/ilb-dev/2009-February/000116.html) 
to resolve the remaining open issues in the health check design. After 
reading the feedback from Peter and discussing with iteam people, here 
is a writeup that proposes some solutions.


1. health check probes:
ilb will provide three built-in health check probes: ping, tcp and udp.
1a) For ping, ilb uses /usr/sbin/ping (ping(1M)) to send ICMP echo 
requests to the host(s).
1b) For tcp, ilb delivers /usr/lib/ilb/tcp_query_probe, which checks 
whether opening a socket to the destination address and port succeeds.
1c) For udp, ilb by default just pings the destination (unless somebody 
can suggest a good approach for sending a udp probe). If users want more 
specific tests, they need to provide their own test script.
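
To make 1b) concrete, here is a minimal sketch of a TCP connect check. 
It is not the actual tcp_query_probe; the function name tcp_probe and 
the timeout value are assumptions made only for illustration.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: the name and the default timeout are
    illustrative, not part of the ilb design.
    """
    try:
        # create_connection() resolves the host and performs the TCP
        # handshake; the socket is closed again as soon as it succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False
```

A probe like this only shows that the port accepts connections; it says 
nothing about whether the service behind it answers correctly, which is 
the case a user-supplied test script is meant to cover.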

2. show health check results:
ilbadm list-servergroup will show the health of each server. Sample 
output of "ilbadm list-servergroup" will look like:
------------------- sample --------------------------------------
#ilbadm list-servergroup
SG-NAME   RULE-NAME   SERVER-IP   PORT   SERVER-STATE   FLAGS
sg-123    rule-abc    1.1.1.1     80     alive          xxxx
sg-123    rule-abc    1.1.1.2     80     dead           xxxx
sg-123    rule-abc    1.1.1.3     80     disabled       xxxx
sg-456    rule-def    2.1.1.1     21     alive          xxxx
sg-456    rule-def    2.1.1.2     21     alive          xxxx
---------------- end of sample ---------------------------------

list-servergroup provides a "-v" option to output more detailed server 
health information. Sample output:
-----------------sample------------------------------------------
SG-NAME: sg-123
RULE-NAME: rule-abc
SERVER-IP   PORT   STATE      METHOD   LAST-RESULT   LAST-TEST-TIME   NEXT-TEST-TIME
1.1.1.1     80     alive      TCP      3/5*          11:34:30         11:39:30
1.1.1.2     80     dead       TCP      0/5           11:34:31         11:39:31
1.1.1.3     80     disabled   TCP      0/0**         0:0:0            0:0:0
----------------- end of sample
* This means 3 out of 5 probes succeeded. The server will only be 
declared dead when all probes fail.
** No health check is run on a disabled server.
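
The rule in the footnotes can be sketched as a small function. The name 
server_state and its parameters are hypothetical; treating "no probes 
sent" as alive is also an assumption, since the writeup does not cover 
that case for an enabled server.

```python
def server_state(successes, probes_sent, enabled=True):
    """Map a LAST-RESULT count such as 3/5 to a server state.

    Hypothetical helper illustrating the footnotes: a disabled server
    is never probed, and a server is dead only when every probe failed.
    """
    if not enabled:
        return "disabled"
    if probes_sent > 0 and successes == 0:
        return "dead"
    # Assumption: at least one success, or nothing sent yet, is "alive".
    return "alive"
```

With the sample rows above, 3/5 gives "alive", 0/5 gives "dead", and the 
disabled server stays "disabled" whatever its counts are.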

3. does one rule need more than one health check?
For phase 1, we will deliver just one health check per rule. It will 
work as follows:
If the user does not specify any hc when creating the rule, no health 
check will be performed.
If the user specifies "ping" as the hc method, ping will be performed. 
The server will be declared dead if the ping fails.
If the user specifies "tcp" or "udp", or provides their own test script, 
ilb first performs a default ping. If the ping fails, nothing further is 
tried and the server is declared dead. If the ping succeeds, ilb then 
tries the tcp probe or the user's test script, and the server is 
declared dead if that test fails.

"create-healthcheck" provides a '-n' option to turn off the default ping 
probe, which allows the user to say "thank you, but don't perform the 
default ping for me".
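
The decision flow in section 3, including the effect of '-n', can be 
sketched like this. Everything here is illustrative: check_server, its 
parameters and the "script" method label are assumed names, and ping and 
probe are stand-in callables that return True on success.

```python
def check_server(method, ping, probe=None, skip_ping=False):
    """Run one health check and return "alive", "dead", or None.

    method    -- None, "ping", "tcp", "udp", or "script" (assumed labels)
    ping      -- callable for the default ping probe
    probe     -- callable for the tcp/udp probe or the user test script
    skip_ping -- models the '-n' option: skip the default ping
    """
    if method is None:
        return None  # no hc configured, so nothing is performed
    if method == "ping":
        return "alive" if ping() else "dead"
    # tcp, udp, or user script: default ping first unless turned off.
    if not skip_ping and not ping():
        return "dead"  # ping failed, no further tries
    return "alive" if probe() else "dead"
```

One consequence worth noting: without '-n', a server that drops ICMP but 
serves tcp correctly is reported dead, which is the case the option is 
meant to handle.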

Any suggestions, objections, rotten tomatoes/fresh eggs? Please send 
your comments, if any, by end of this week.

-Jan
