On Mon, 2006-04-24 at 19:30 +0200, Marcus Brinkmann wrote:
> Hi,
>
> At Mon, 24 Apr 2006 11:48:22 -0400,
> "Jonathan S. Shapiro" <[EMAIL PROTECTED]> wrote:
> > So one way to guard against a failing server is to use idempotent timer
> > events to implement a "heartbeat" -- in much the way that TCP does.
> >
> > I like this much better than complicating the invocation mechanism or
> > the capability overwrite mechanism, because the majority of interprocess
> > interactions are between components of the same application. These have
> > been separated into processes for reasons of isolation, reuse, and
> > testability, but they still fail as a unit. We do not want to impose
> > capability semantics that discourage this pattern, and death notices
> > between such processes are undesirable.
> >
> > The heartbeat does introduce a new specification problem. Basically, we
> > are introducing a new class of error that is visible all the way up to
> > the user (X timed out) and a new requirement for wall-clock response
> > time limits.
>
> If you are going this way, it seems to make more sense to me to design
> the system as a real time operating system in the first place, because
> then one can at least precisely define what the requirements for
> wall-clock (or even CPU) response time limits are.
Indeed. This would make 10 wonderful Ph.D. dissertations.

> The key term you use above is that the processes "fail as a unit".
> This is quite pessimistic. I am not sure if I accept this yet...

I think you misunderstand what I am saying. I am saying that there are
two cases:

  Processes A and B are in separate failure domains. In this case,
  one must guard against the failure of the other.

  Processes A and B are in the *same* failure domain. In this case,
  neither is required to guard against the other. This is what I
  meant by "the processes fail as a unit".

A more precise way to say this is "the definition of a failure domain is
that all processes in that failure domain fail as a unit."

> Timeouts do not scale, and they cause a constant background noise
> that, depending on the details, I suspect would cause performance and
> power management issues.

I seem to recall saying this myself, and I agree. The problem is that
the kind of reliability you want to achieve cannot be had without them.

shap

_______________________________________________
L4-hurd mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/l4-hurd
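
For illustration only, here is a minimal sketch of the heartbeat idea from this thread, combined with the two failure-domain cases above. It is not taken from any L4 or Hurd interface: the class `HeartbeatMonitor`, its methods `watch`, `beat`, and `timed_out`, and the string domain labels are all hypothetical names. Time is passed in explicitly as a number of seconds, so the sketch says nothing about how a real system would obtain or bound wall-clock time.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from peers in *other* failure domains.

    All names here are hypothetical; this is a sketch, not a real API.
    """
    own_domain: str
    timeout: float  # seconds of silence after which a peer is presumed failed
    last_seen: dict = field(default_factory=dict)

    def watch(self, peer: str, peer_domain: str, now: float) -> bool:
        """Start guarding against a peer; returns False if no guard is needed."""
        if peer_domain == self.own_domain:
            # Same failure domain: the processes fail as a unit,
            # so neither is required to guard against the other.
            return False
        self.last_seen[peer] = now
        return True

    def beat(self, peer: str, now: float) -> None:
        """Record a heartbeat from a watched peer.

        Idempotent: a duplicated or late-delivered beat never moves the
        recorded timestamp backwards. Beats from unwatched peers are ignored.
        """
        if peer in self.last_seen:
            self.last_seen[peer] = max(self.last_seen[peer], now)

    def timed_out(self, now: float) -> list:
        """Return the watched peers that have been silent longer than the timeout."""
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)
```

A caller would invoke `timed_out` from its own periodic timer event and surface the result as the "X timed out" class of error mentioned earlier. The sketch does nothing about the scaling and power-management cost of those periodic checks; it only shows that the checks are confined to peers outside the caller's own failure domain.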
