On Wed, Feb 25, 2009 at 7:37 PM, <Zhenghui.Xie at sun.com> wrote:
> forwarding to the right mail list.....
>
> Zhenghui.Xie at Sun.COM wrote:
>>
>> Hi, All
>>
>> We, the iteam, had a lot of discussions about health check last week.
>> There are still several issues I hope people can discuss and reach
>> consensus on.
>>
>> Q1. health check probes. What is a valid UDP health check probe without
>> knowing the application? Right now, besides allowing users to provide their
>> own test script, ilb provides three built-in health check queries: PING, TCP
>> and UDP. After discussing with Kacheong, for PING, using "/usr/sbin/ping"
>> should be good enough (assuming that ping allows us to specify IP_NEXTHOP,
>> which covers DSR mode). For TCP, a small program that opens a socket,
>> connects to the IP/port, then closes the socket will do. But we could not
>> figure out what a valid UDP probe is if we don't know the application on
>> the server side. Any suggestions?
That's a problem. I don't know about udp (I don't have any udp services
that I load-balance) but I know that in the case of tcp the simple
'can we open a socket?' approach can cause havoc with poorly written
applications.

>> Q2. how to show health check results to user? There was no objection that
>> we definitely need to show this information to the user. But arguments
>> remain about which command should show it and what information should be
>> included. Below is the summary:
>> * which command to show: a) show it with list-servergroup, but one
>> argument against this is that server health is actually tied to a rule.
>> b) show it with list-rule, but it does not seem a good match for our
>> current rule listing format. Your opinion?
>> * what information needs to be included: the current proposal is to
>> include 4 items: what the hc test is (Ping, TCP, UDP...), the result of the
>> last hc, when the last hc was, and when the next hc will be. Anything to
>> add or delete?

What I would like is a really simple way to see which checks have failed.
If that's piping the status into grep, then that's fine, but what I'm after
is a really simple way to list what's down. Also, is there a health check
history?

>> Q3. does one rule need more than one health check method? Opinions are a)
>> limit one health check per rule. b) allow user to specify more than one
>> health check per rule. c) limit one health check per rule, but for each
>> health check object, we can allow user to specify more than one test
>> method. Suggestions?

I would expect there to be an implicit 'ping' rule for each server. That's
not really associated with each rule, though - just that if we can't ping
a machine then it's not worth doing all the complex tests.

One other feature I would love, which may be off topic but is another kind
of health check, is service dependency. If service A on a server doesn't
respond, then mark service B as failed as well, even if it appears to be
responding itself.
(If B depends on A, then B may accept connections and thus look as if it's
alive even though it can't do anything useful. The dependency avoids having
to do a complex end-to-end probe.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
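
To make two of the ideas in this thread concrete, here is a minimal Python sketch, not anything taken from ilb itself. `tcp_probe` is the "open a socket, connect, close" check described under Q1, and `effective_health` applies the dependency rule from the reply: a service counts as up only if its own probe passed and everything it depends on is up, which also covers an implicit ping gate if each service is made to depend on a "ping" entry. The function names, the service names, and the dictionary layout are all assumptions made for illustration.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection opens the socket and connects to the address;
        # leaving the with-block closes the socket again.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False


def effective_health(raw, depends_on):
    """Combine raw probe results with service dependencies.

    raw        -- dict: service name -> bool (result of its own probe)
    depends_on -- dict: service name -> list of services it needs
    A dependency with no entry in raw is treated as down.
    """
    resolved = {}

    def is_up(name, visiting):
        if name in resolved:
            return resolved[name]
        if name in visiting:
            # Dependency cycle: report down instead of recursing forever.
            return False
        visiting = visiting | {name}
        up = raw.get(name, False) and all(
            is_up(dep, visiting) for dep in depends_on.get(name, [])
        )
        resolved[name] = up
        return up

    return {name: is_up(name, frozenset()) for name in raw}


if __name__ == "__main__":
    # Hypothetical results: B answers its own probe, but A (which B needs) does not.
    raw = {"ping": True, "A": False, "B": True}
    deps = {"A": ["ping"], "B": ["A", "ping"]}
    print(effective_health(raw, deps))
    # prints {'ping': True, 'A': False, 'B': False}
```

Note that `tcp_probe` is exactly the bare connect-and-close check that the reply warns can upset poorly written applications, so it is shown only as the simplest baseline. Filtering the result of `effective_health` for `False` values gives the plain "what's down" listing asked for under Q2.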
