On 10/27/08 17:09, Darren Reed wrote:
> On 10/27/08 09:31, Michael Schuster wrote:
>> Darren,
>>
>> thx for your comments. some answers/reflection below:
>>
>> On 10/26/08 19:43, Darren Reed wrote:
>> ..
>>> Health Checks.
>>> ==============
>>> This design has a single daemon, with a single thread,
>>> that polls multiple servers to update a single pool of
>>> data in the kernel.
>>>
>>> If we assume that the in-kernel handling of requests
>>> from the daemon enforces MP-safety, why not run multiple
>>> daemons?
>>
>> actually, it's the daemon that will serialise access to the kernel.
>
> If your kernel interfaces aren't MP-safe then you need to fix
> them so that they are. It is not acceptable to
> require the daemon to ensure the integrity of data inside
> the kernel.
>
>>> i.e. run an ilbd per back-end server (or at least a
>>> thread per back-end server.) You might still need a
>>> single daemon to act as the manager? *shrug*
>>
>> this sounds like you're replacing the load of repeatedly starting health
>> check processes by having as many processes sitting around idly a lot of
>> the time.
>
> Yup.

Darren,
Assuming I have 100 back-end servers, your suggestion requires 100 ilbd
instances. What exactly would be the benefit of this design that would
justify the added complexity?

BTW, for Phase 1 we have decided to implement the external health checks
in the same way as the ping and tcp/udp probes. Depending on what gets
the most use by admins, we may change the implementation of the ping and
tcp/udp probes in a later phase.

>
>> Since in the current design ilbd maintains quite a bit of state, one would
>> indeed have to coordinate all the information to be able to get the
>> "complete" picture again, so the added benefit seems a little elusive to me
>> here.
>
> What state is there to manage that needs to be shared?
> And if there is such state, why isn't it talked about
> in the design doc?
>
> So far as health checks go, the ilbd is responsible for:
> - ensuring that all of the destinations are periodically
>   probed, and
> - ensuring that the list of in-kernel destinations matches
>   those that are successfully responding to probes.
>
> One way to do that is to have a big program that polls each
> one in turn, with lots of complexity to ensure that nobody
> causes the program to pause too long and everyone gets
> serviced in turn, all inside one big loop. There is lots
> of state held, but it is all still per-destination.
>
>>> This should also remove the ilbd main loop from being a
>>> critical section of code, where slowdown from dealing
>>> with one external server can impact all of the others.
>>> Instead, scheduling of work is left up to the kernel to
>>> schedule threads/processes, depending on who's busy or
>>> blocked, etc.
>>
>> anything that we expect to block (health check) is farmed out to processes,
>> so that happens anyway.
>
> The design document does not reflect this at all.
>
> Darren
