Hi all,
Several days ago I sent out an email
(http://mail.opensolaris.org/pipermail/ilb-dev/2009-February/000116.html)
to resolve the remaining open issues in the health check design. After
reading the feedback from Peter and discussing it with the iteam people,
here is a writeup that proposes some solutions.
1. Health check probes:
ilb will provide three built-in health check probes: ping, tcp and udp.
1a) For ping, use /usr/sbin/ping (ping(1M)) to send an ICMP echo request
to the host(s).
1b) For tcp, ilb delivers /usr/lib/ilb/tcp_query_probe, which checks
whether opening a socket to the destination/port succeeds.
1c) For udp, ilb by default just pings the destination (unless somebody
can suggest a good approach to sending a udp probe). If users want more
specific tests, they need to provide their own test script.
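The check described in 1b can be sketched in a few lines. This is only an illustration of "try to open a socket and see whether it succeeds", not the actual tcp_query_probe; the function name tcp_probe and the 2-second default timeout are assumptions made for the example.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: the name and the default timeout are not part
    of the ilb design, they are chosen for this sketch only.
    """
    try:
        # create_connection resolves the host and completes the TCP
        # handshake; the "with" block closes the socket right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

A probe like this only shows that something accepts connections on the port. It says nothing about whether the service behind it answers correctly, which is why a user-supplied test script is still needed for more specific tests.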
2. Showing health check results:
ilbadm list-servergroup will show the health of each server. Sample
output of "ilbadm list-servergroup" will look like:
------------------- sample --------------------------------------
#ilbadm list-servergroup
SG-NAME  RULE-NAME  SERVER-IP  PORT  SERVER-STATE  FLAGS
sg-123   rule-abc   1.1.1.1    80    alive         xxxx
sg-123   rule-abc   1.1.1.2    80    dead          xxxx
sg-123   rule-abc   1.1.1.3    80    disabled      xxxx
sg-456   rule-def   2.1.1.1    21    alive         xxxx
sg-456   rule-def   2.1.1.2    21    alive         xxxx
---------------- end of sample ---------------------------------
list-servergroup provides a "-v" option to output more detailed server
health information. Sample output:
-----------------sample------------------------------------------
SG-NAME: sg-123
RULE-NAME: rule-abc
SERVER-IP  PORT  STATE     METHOD  LAST-RESULT  LAST-TEST-TIME  NEXT-TEST-TIME
1.1.1.1    80    alive     TCP     3/5*         11:34:30        11:39:30
1.1.1.2    80    dead      TCP     0/5          11:34:31        11:39:31
1.1.1.3    80    disabled  TCP     0/0**        0:0:0           0:0:0
----------------- end of sample
* 3/5 means 3 out of 5 probes succeeded. The server will only be
declared dead when all probes fail.
** No hc is performed on a disabled server.
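The rule behind the STATE and LAST-RESULT columns can be written down as a small sketch. The function names and the choice to report an enabled server with no probes yet as "alive" are assumptions for illustration; the proposal itself only states that a server is dead when all probes fail and that disabled servers are not checked.

```python
def server_state(successes: int, attempts: int, enabled: bool = True) -> str:
    """Derive the STATE column from the last round of probes.

    Hypothetical helper; the names are not taken from ilb.
    """
    if not enabled:
        # No health check runs on a disabled server.
        return "disabled"
    if attempts > 0 and successes == 0:
        # Dead only when every probe in the round failed.
        return "dead"
    # Assumption: an enabled server with no probes yet counts as alive.
    return "alive"


def last_result(successes: int, attempts: int) -> str:
    """Format the LAST-RESULT column as successes/attempts."""
    return f"{successes}/{attempts}"


print(server_state(3, 5), last_result(3, 5))         # alive 3/5
print(server_state(0, 5), last_result(0, 5))         # dead 0/5
print(server_state(0, 0, False), last_result(0, 0))  # disabled 0/0
```

The three printed lines correspond to the three servers in the "-v" sample above.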
3. Does one rule need more than one health check?
For phase 1, we will deliver just one health check per rule. It will
work as follows:
If the user didn't specify any hc when creating the rule, no health
check will be performed.
If the user specifies "ping" as the hc method, ping will be performed.
The server will be declared dead if the ping fails.
If the user specifies "tcp" or "udp", or provides their own test script,
a default ping is performed first. If the ping fails, nothing further is
tried and the server is declared dead. If the ping succeeds, the tcp
probe or user test script is run next, and the server is declared dead
if that test fails.
"create-healthcheck" provides a '-n' option to turn off the default ping
probe, allowing the user to say "thank you, but don't perform the
default ping for me".
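The decision flow in section 3 can be summarized as the sketch below. The function name check_server, its parameters and the use of None for "no health check configured" are all hypothetical; skip_default_ping stands in for the effect of the '-n' option, and the probes are passed in as callables so the logic can be read without any networking.

```python
from typing import Callable, Optional


def check_server(
    method: Optional[str],
    ping: Callable[[], bool],
    probe: Callable[[], bool],
    skip_default_ping: bool = False,
) -> Optional[str]:
    """Return "alive", "dead", or None when no health check is configured.

    Hypothetical sketch of the proposed flow, not ilb code.
    """
    if method is None:
        # No hc was specified for the rule: nothing is performed.
        return None
    if method == "ping":
        return "alive" if ping() else "dead"
    # tcp, udp or a user test script: default ping first, unless it was
    # turned off (the effect of '-n' on create-healthcheck).
    if not skip_default_ping and not ping():
        return "dead"  # ping failed, no further try
    return "alive" if probe() else "dead"


# A host that drops ICMP but serves TCP is only seen as alive
# when the default ping is turned off.
no_icmp = lambda: False
tcp_ok = lambda: True
print(check_server("tcp", no_icmp, tcp_ok))                          # dead
print(check_server("tcp", no_icmp, tcp_ok, skip_default_ping=True))  # alive
```

The usage lines show the case the '-n' option is meant for: a server that does not answer ping but is otherwise healthy.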
Any suggestions, objections, rotten tomatoes/fresh eggs? Please send
your comments, if any, by end of this week.
-Jan