On 10/27/08 09:31, Michael Schuster wrote:
> Darren,
>
> thx for your comments. some answers/reflection below:
>
> On 10/26/08 19:43, Darren Reed wrote:
> ..
>> Health Checks.
>> ==============
>> This design has a single daemon, with a single thread,
>> that polls multiple servers to update a single pool of
>> data in the kernel.
>>
>> If we assume that the in-kernel handling of requests
>> from the daemon enforces MP-safety, why not run multiple
>> daemons?
>
> actually, it's the daemon that will serialise access to the kernel.
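To make the disagreement above concrete, here is a minimal sketch (illustrative only, not ilbd code, and in Python rather than the C the daemon would actually use) of how probing can be fully concurrent while updates to the kernel remain serialised at a single point, rather than relying on the daemon's single thread for serialisation:

```python
# Sketch: one health-check thread per destination, with all results
# funnelled through a queue. Only one loop ever touches kernel_table,
# so "kernel" updates are serialised even though probing is concurrent.
# All names (run_checks, probe, kernel_table) are hypothetical.
import queue
import threading

def run_checks(destinations, probe, kernel_table):
    """Probe each destination from its own thread; apply results serially."""
    updates = queue.Queue()

    def checker(dest):
        # A real checker would loop on a timer; one probe suffices here.
        updates.put((dest, probe(dest)))

    threads = [threading.Thread(target=checker, args=(d,)) for d in destinations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Single point of serialisation: only this loop mutates kernel_table.
    while not updates.empty():
        dest, healthy = updates.get()
        if healthy:
            kernel_table.add(dest)
        else:
            kernel_table.discard(dest)
    return kernel_table
```

With this split, a slow probe delays only its own thread, and the serialisation lives in one small, easily audited consumer rather than in the structure of the whole daemon.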
If your kernel interfaces aren't MP-safe then you need to fix them so
that they are. It is not acceptable to require the daemon to ensure the
integrity of data inside the kernel.

>> i.e. run an ilbd per back end server (or at least a
>> thread per back-end server.) You might still need a
>> single daemon to act as the manager? *shrug*
>
> this sounds like you're replacing the load of repeatedly starting health
> check processes by having as many processes sitting around idly a lot of
> the time.

Yup.

> Since in the current design ilbd maintains quite a bit of state, one would
> indeed have to coordinate all the information to be able to get the
> "complete" picture again, so the added benefit seem a little elusive to me
> here.

What state is there to manage that needs to be shared? And if there is
such state, why isn't it talked about in the design doc?

So far as health checks go, the ilbd is responsible for:
- ensuring that all of the destinations are periodically probed, and
- ensuring that the list of in-kernel destinations matches those that
  are successfully responding to probes.

One way to do that is to have a big program that polls each one in turn,
with lots of complexity to ensure that nobody causes the program to pause
too long and everyone gets serviced in turn, all inside one big loop.
There is lots of state held, but it is all still per-destination.

>> This should also remove the ilbd main-loop from being a
>> critical section of code, where slow down from dealing
>> with one external server can impact all of the others.
>> Instead, scheduling of work is left up to the kernel to
>> schedule threads/processes, depending on who's busy or
>> blocked, etc.
>
> anything that we expect to block (health check) is farmed out to processes,
> so that happens anyway.

The design document does not reflect this at all.

Darren
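For reference, the "one big loop" design Darren describes can be sketched roughly as follows (a hypothetical illustration, again in Python rather than C; none of these names come from the ILB design doc). Each pass walks every destination, probes only those whose deadline has passed, and keeps all state per-destination:

```python
# Sketch of the single-loop poller described above: one loop services
# every destination in turn, with per-destination deadlines so that
# each gets probed periodically. All names here are illustrative.
def poll_once(now, state, probe, interval=10):
    """One pass of the main loop.

    state maps destination -> {'next': next-probe deadline, 'healthy': bool}.
    Returns the set of destinations currently considered healthy, i.e.
    what the in-kernel destination list should be synchronised to.
    """
    for dest, s in state.items():
        if now >= s['next']:
            # In the real daemon this probe can block, which is exactly
            # the slow-down-one-slows-down-all problem discussed above.
            s['healthy'] = probe(dest)
            s['next'] = now + interval
    return {d for d, s in state.items() if s['healthy']}
```

The structural point of the argument is visible here: the state is per-destination either way, so nothing about it forces a single loop; what the single loop adds is the risk that one blocking `probe()` call stalls service to every other destination.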
