On 4/25/06, Jonathan S. Shapiro <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-04-25 at 17:47 +0200, Michal Suchanek wrote:
> > On 4/25/06, Jonathan S. Shapiro <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2006-04-25 at 11:54 +0200, Michal Suchanek wrote:
> > >
> > > > ad (b) Imagine a few scenarios:
> > > > ...
> > > > And I do not think that timeouts or watchdogs solve [these] on
> > > > non-realtime system.
> > >
> > > I agree. However, this mis-states the issue. You are talking about what
> > > happens when you have already decided to recover (e.g. by killing a
> > > non-performing renderer). The purpose of the timeout is to help
> > > determine when recovery is required.
> > >
> > > Also, in each of the examples that you gave, an asynchronous interface
> > > is appropriate. Recovering on an asynchronous interface is relatively
> > > straightforward.
> > >
> >
> > So you say that the timeouts and watch dogs actually solve a different
> > kind of problem.
>
> No. Watchdogs and timeouts are the same thing. You were talking about
> cases where a user says "kill that rendering agent, because it is
> misbehaving."

I meant different from the problem we are trying to solve with the
reference-counted capabilities. Watchdogs and timeouts are indeed pretty
much the same thing.

> >
> > The send-once + reference-counted capabilities serve to notify when a
> > service has already failed. This allows the client to restart the
> > action or use different means for obtaining the service. Or just free
> > any resources associated with the failed service in case of a proxy.
> >
> > But the watchdog is used to identify a service that is slow to respond
> > and may be the one that is failing so that the user may remove it and
> > trigger the recovery.
>
> Notice that the first is subsumed by the second. The only question is to
> decide what latency is acceptable before noticing that a server has been
> destroyed. This will determine whether a timeout is sufficient.

No. Timeouts are unreliable, so they can only provide a hint to the
user. There may be cases where some process acts automatically when
communication times out, but that cannot be accepted as the general
solution. So there needs to be some reliable way to recover once the
misbehaving process has been identified and terminated.

You say that there are systems that do not need reference-counted
capabilities for this, and that they work. But note that these are very
specialized systems. They are used for servers, not user desktops. So
one can expect that there are administrators who know (or eventually
find out) that when buggy service S is replaced with a fixed version,
dependent programs A, C, F and service I have to be restarted as well.
Such information will probably be provided with the update.

You also proposed an alternative to reference counting: garbage
collection. Since it takes a long time and cannot run all the time, it
does not seem appropriate at first glance. But if the user wants to see
the results of killing S immediately, there may be a way for the user to
trigger the garbage collection, and to get some notice when the garbage
collection finishes.
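
To make the distinction concrete, here is a minimal sketch in Python. It
is only an illustration under my own assumptions: the names Service,
acquire, destroy and timeout_hint are hypothetical and do not correspond
to any real L4 or Hurd interface. The timeout only reports that a reply
is late, while destroying the service reliably notifies every client
that still holds a reference.

```python
# Hypothetical sketch; none of these names come from a real L4/Hurd API.

class Service:
    """A server whose destruction notifies every client holding a reference."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self._death_callbacks = []  # one entry per outstanding client reference

    def acquire(self, on_death):
        """Register a client reference; on_death(name) runs if the service dies."""
        self._death_callbacks.append(on_death)

    def destroy(self):
        """Kill the service and notify each reference holder exactly once."""
        self.alive = False
        callbacks, self._death_callbacks = self._death_callbacks, []
        for callback in callbacks:
            callback(self.name)


def timeout_hint(elapsed_s, limit_s):
    """Return True if the reply is late. This is a hint only: a slow
    service may still be alive and may still answer later."""
    return elapsed_s > limit_s


if __name__ == "__main__":
    restarted = []
    s = Service("S")
    s.acquire(lambda name: restarted.append("A"))  # client A depends on S
    s.acquire(lambda name: restarted.append("C"))  # client C depends on S

    # The timeout does not prove that S has failed; it only prompts the user.
    if timeout_hint(elapsed_s=5.0, limit_s=2.0):
        s.destroy()  # the user's decision to kill S triggers the recovery
    print(restarted)
```

The point of the sketch is that the clients never have to guess: they
learn about the failure from destroy(), not from the timeout.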

Thanks

Michal

_______________________________________________
L4-hurd mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/l4-hurd
