Michael Schuster wrote:
> On 03/04/09 14:22, Zhenghui.Xie at Sun.COM wrote:
>> Hi, All
>>
>> Several days ago, I sent out an email
>> (http://mail.opensolaris.org/pipermail/ilb-dev/2009-February/000116.html)
>> to resolve the remaining open issues in the health check design. After
>> reading the feedback from Peter and discussing with the iteam people,
>> here is a writeup that proposes some solutions.
>>
>> 1. health check probes:
>> ilb will provide three built-in health check probes: ping, tcp and udp.
>> 1a) For ping, use /usr/sbin/ping (ping(1M)) to send ICMP requests to
>> the host(s).
>> 1b) For tcp, ilb delivers /usr/lib/ilb/tcp_query_probe, which checks
>> whether opening a socket to the destination/port succeeds.
>> 1c) For udp, ilb by default just pings the destination (unless somebody
>> can suggest a good approach to sending a udp probe). If the user wants
>> more specific tests, the user needs to provide their own test script.
>>
>> 2. show health check results:
>> ilbadm list-servergroup will show the health of each server.
>> A sample output of "ilbadm list-servergroup" will look like:
>> ------------------- sample --------------------------------------
>> #ilbadm list-servergroup
>> SG-NAME  RULE-NAME  SERVER-IP  PORT  SERVER-STATE  FLAGS
>> sg-123   rule-abc   1.1.1.1    80    alive         xxxx
>> sg-123   rule-abc   1.1.1.2    80    dead          xxxx
>> sg-123   rule-abc   1.1.1.3    80    disabled      xxxx
>> sg-456   rule-def   2.1.1.1    21    alive         xxxx
>> sg-456   rule-def   2.1.1.2    21    alive         xxxx
>> ---------------- end of sample ---------------------------------
>>
>> list-servergroup provides "-v" to output more detailed server health
>> information. Sample output:
>> ----------------- sample ------------------------------------------
>> SG-NAME: sg-123
>> RULE-NAME: rule-abc
>> SERVER-IP  PORT  STATE     METHOD  LAST-RESULT  LAST-TEST-TIME  NEXT-TEST-TIME
>> 1.1.1.1    80    alive     TCP     3/5*         11:34:30        11:39:30
>> 1.1.1.2    80    dead      TCP     0/5          11:34:31        11:39:31
>> 1.1.1.3    80    disabled  TCP     0/0**        0:0:0           0:0:0
>> ----------------- end of sample
>> *  this means 3 out of 5 probes succeeded. The server will only be
>>    declared dead when all probes fail.
>> ** no hc on a disabled server.
>>
>> 3. does one rule need more than one health check?
>> For phase 1, we just deliver one health check per rule. It will work
>> as follows:
>> - If the user didn't specify any hc when creating the rule, no health
>>   check will be performed.
>> - If the user specifies "ping" as the hc method, ping will be performed.
>>   The server will be declared dead if the ping fails.
>> - If the user specifies "tcp" or "udp", or provides their own test
>>   script, a default ping is performed first. If the ping fails, there
>>   is no further try and the server is declared dead. If the ping
>>   succeeds, the tcp probe or user test script is tried next, and the
>>   server is declared dead if that test fails.
>>
>> "create-healthcheck" provides a '-n' option to turn off the default
>> ping probe, allowing the user to say "thank you, but don't perform the
>> default ping for me".
>>
>> Any suggestions, objections, rotten tomatoes/fresh eggs?
>> Please send your comments, if any, by the end of this week.
>
> a comment on the suggested display: in the code I have ready for
> putback, I added server IDs to the output.

sure.

> In the interest of screen real-estate, maybe we could do away with
> "state" and encode that in the flags as well, eg "*" for alive, "+"
> for dead, "-" for disabled, etc.
>

sounds good. We probably need a line at the bottom to explain which
symbol means what.

-Jan
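
To make the compact display concrete, here is a rough sketch of what dropping the state column and adding a legend line could look like. It is only an illustration, not ilbadm code: the function name format_servergroup, the column widths, and the wording of the legend are all assumptions. The symbols are the ones suggested above, and server IDs are left out because their format isn't given in this thread.

```python
# Hypothetical sketch, not ilbadm code. The symbols follow the suggestion
# in this thread: "*" alive, "+" dead, "-" disabled.
STATE_SYMBOLS = {"alive": "*", "dead": "+", "disabled": "-"}


def format_servergroup(rows):
    """Render rows of (sg_name, rule_name, server_ip, port, state).

    The state is encoded as a single symbol in the FLAGS column, and one
    legend line at the bottom explains which symbol means what.
    """
    header = ("SG-NAME", "RULE-NAME", "SERVER-IP", "PORT", "FLAGS")
    fmt = "{:<10}{:<12}{:<12}{:<6}{}"  # column widths are arbitrary
    lines = [fmt.format(*header)]
    for sg_name, rule_name, server_ip, port, state in rows:
        lines.append(
            fmt.format(sg_name, rule_name, server_ip, port, STATE_SYMBOLS[state])
        )
    legend = "  ".join(
        "{} = {}".format(symbol, state) for state, symbol in STATE_SYMBOLS.items()
    )
    lines.append("flags: " + legend)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("sg-123", "rule-abc", "1.1.1.1", 80, "alive"),
        ("sg-123", "rule-abc", "1.1.1.2", 80, "dead"),
        ("sg-123", "rule-abc", "1.1.1.3", 80, "disabled"),
    ]
    print(format_servergroup(sample))
```

With the legend on a single trailing line, the per-server rows stay short no matter how many states get added later.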
