On Sat, 2006-04-22 at 03:52 +0200, Marcus Brinkmann wrote:
> Interesting that you propose this. I thought about this earlier
> today, and called it a "send at least once" capability when describing
> it to Neal. Neal pointed out that this may be inconvenient in actual
> practice: If a server S wants to propagate (forward) a request to
> another server T, then S must be careful to not destroy its own copy
> of the reply capability before T had a chance to reply.

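The forwarding scenario in the quote can be sketched as a small Python model. Everything in it is an assumption for illustration: the `FCRB`, `ReplyCap` and `Process` classes and the `forward` helper are hypothetical, not EROS, Coyotos or L4 interfaces. The reply endpoint is assumed to accept only the first message delivered to it (a reply or a death notice), and destroying a process is assumed to send a death notice through each reply capability it still holds, as in the "send at least once" proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FCRB:
    """Hypothetical reply endpoint: accepts exactly one message, first wins."""
    message: Optional[str] = None

    def deliver(self, msg: str) -> bool:
        if self.message is not None:
            return False  # already consumed: this capability is now broken
        self.message = msg
        return True


@dataclass
class ReplyCap:
    fcrb: FCRB


@dataclass
class Process:
    name: str
    slots: dict = field(default_factory=dict)

    def destroy(self) -> None:
        # Assumed "send at least once" rule: destroying the holder of a
        # reply capability makes the kernel send a death notice through it.
        for cap in self.slots.values():
            if isinstance(cap, ReplyCap):
                cap.fcrb.deliver(f"death notice: {self.name} destroyed")
        self.slots.clear()


def forward(client_fcrb: FCRB, destroy_s_early: bool):
    """S receives a request, forwards a copy of the reply cap to T, then T replies."""
    s, t = Process("S"), Process("T")
    s.slots["reply"] = ReplyCap(client_fcrb)  # S receives the client's request
    t.slots["reply"] = ReplyCap(client_fcrb)  # S forwards a *copy* to T
    if destroy_s_early:
        s.destroy()                           # S exits before T replies
    else:
        # The next request's reply capability overwrites the slot: no notice.
        s.slots["reply"] = ReplyCap(FCRB())
    t_ok = t.slots["reply"].fcrb.deliver("result from T")
    return client_fcrb.message, t_ok


if __name__ == "__main__":
    print(forward(FCRB(), destroy_s_early=False))  # ('result from T', True)
    print(forward(FCRB(), destroy_s_early=True))   # ('death notice: S destroyed', False)
```

In the second run T still completes its work, but its reply fails because the death notice reached the endpoint first; in the first run, S merely overwriting its slot raises no notice at all.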
I think that it might be useful to make this more precise. What you mean is that S must not be *destroyed* before T has a chance to reply. If S simply overwrites the capability, no problem will arise. In practice, it is almost universal to accept the reply capability into the same location every time, so reply capabilities tend to get overwritten quickly. The case you are considering (where the forwarding agent exits early) does not seem to arise much in practice.

There is also a second factor to consider: T will perform its operation correctly, and will then issue a reply that fails (because it is issued on a broken capability). The client will *already* have been advised that the reply capability is in a strange state. The failure is not as undetectable as it looks.

> In such a
> scenario, the "send-once" semantics, where the capability is moved
> rather than copied, seems to be easier to handle (and more efficient).

It is certainly not more efficient, because it requires special handling down in the capability copy logic. Resume capabilities similarly required special handling. This was both a significant complication and a source of performance problems.

> > > * The program S could use timeouts in the call to D. This solution
> > > requires significant structural changes to the system design,
> > > because now time becomes an important parameter in evaluating
> > > services. It can be tried to argue that this is desirable anyway.
> >
> > This solution leads directly to systems that fail under load. There is
> > general agreement in both the L4 and EROS/Coyotos communities that
> > generalized timeouts were a mistake, and that "forever" and "don't wait"
> > are the only options that should be implemented by the IPC layer.
>
> It is clear that such a naive approach would not work.
> However, in
> the context of specific systems (for example real time systems) there
> seem to be solutions to this problem, eg using scheduler donations, or
> priority inversion avoidance protocols. Again, I was trying to look
> beyond the specific type of systems we are considering for us.

The issue at hand has absolutely nothing to do with scheduler donation or priority inheritance. The issue is timeouts. Please identify a bounded timeout duration -- ANY bounded timeout duration -- that operates correctly under all conditions of load.

> > The semantics of send-once rights is an abomination. The cost of them is
> > considerable, and the overhead of manipulating them correctly from the
> > application perspective is a serious problem. Coyotos will not under any
> > circumstances implement "send-once" or "grant-only" capabilities.
>
> Well, let me try to understand the cost factor. As far as I see it,
> the only cost involved in the copy operation is setting a data field
> in the source capability to mark it as invalid.

There is the cost of testing whether this is a "special handling" capability. There is also the data-driven branch pipeline delay. There is the branch mispredict. There is the write of the source cache line, which would otherwise be read-only. On many processors, that branch mispredict will be extremely noticeable. All of this will occur for every capability copy that happens in the kernel.

> Also, the semantics you propose require the kernel to make an attempt
> to send a message for every destroyed reply capability, which entails
> accessing the FCRB (cache pressure?). And this although in the common
> case (>99.999% of the cases), the FCRB will already contain the reply
> message and thus not be available.

It is not the reply capability that is being destroyed. It is the containing object (which is probably a process, not an FCRB). Destroy is dynamically rare. The cache pressure is actually not a big deal.

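The per-copy costs of send-once semantics listed above can be illustrated with a rough Python model. Python cannot show branch mispredicts or cache-line traffic, so this sketch only counts two stand-ins for them: the type test that every copy must perform once send-once capabilities exist, and the write back to the source slot when a send-once capability is moved. `CapType`, `Capability`, `CopyCost` and the copy functions are hypothetical names, not kernel code.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class CapType(Enum):
    VOID = auto()
    ENDPOINT = auto()
    SEND_ONCE = auto()  # hypothetical "moved rather than copied" type


@dataclass(frozen=True)
class Capability:
    type: CapType
    target: int  # hypothetical object identifier


@dataclass
class CopyCost:
    type_checks: int = 0    # stands in for the data-dependent branch
    source_writes: int = 0  # stands in for dirtying the source cache line


def copy_plain(slots: List[Optional[Capability]], src: int, dst: int,
               cost: CopyCost) -> None:
    """Ordinary copy: the source slot is only read, so no cost is recorded."""
    slots[dst] = slots[src]


def copy_send_once_aware(slots: List[Optional[Capability]], src: int, dst: int,
                         cost: CopyCost) -> None:
    """A copy path that must honour send-once (move) semantics."""
    slots[dst] = slots[src]
    cost.type_checks += 1                         # paid on every copy
    if slots[src].type is CapType.SEND_ONCE:
        slots[src] = Capability(CapType.VOID, 0)  # the sender loses its copy
        cost.source_writes += 1


def copy_all(copy_fn, caps: List[Capability]):
    """Copy each capability in caps into a second bank of slots."""
    n = len(caps)
    slots: List[Optional[Capability]] = list(caps) + [None] * n
    cost = CopyCost()
    for i in range(n):
        copy_fn(slots, i, n + i, cost)
    return slots, cost


if __name__ == "__main__":
    caps = [Capability(CapType.ENDPOINT, 1)] * 99 + [Capability(CapType.SEND_ONCE, 2)]
    print(copy_all(copy_plain, caps)[1])            # CopyCost(type_checks=0, source_writes=0)
    print(copy_all(copy_send_once_aware, caps)[1])  # CopyCost(type_checks=100, source_writes=1)
```

Copying 100 capabilities, only one of which is send-once, costs 100 type tests and one source write on the send-once-aware path: the test is paid on every copy, not just on the rare ones that need it.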
The case in question would arise in the destruction of a process, not of an FCRB. If the FCRB already contains the reply message, then the protocol has completed successfully and you do not want a death notice. This is exactly the outcome that you want!

The real problem here comes in destroying processes and capability pages. We now require a scan of these objects to learn whether a valid reply capability is present, and we potentially require that the target FCRBs be paged in so that we can determine whether they are valid and then invoke those reply capabilities to send the death notice. Yes, it is terribly expensive. This is why I took it out of EROS. There are various ways to optimize it. But notice that your "send-once" capability has exactly the same problem: we still need to scan these objects and find the send-once capabilities, and we still need to page in the target FCRBs in order to send the death notice.

> > Worse, it has the disadvantage that every capability copy must be
> > preceded by a capability type check, so that the sender knows whether it
> > is losing the capability as a side effect. This violates encapsulation
> > in a fairly fundamental way.
>
> Good point. However, I think that it would suffice to apply such
> checks on incoming capabilities rather than outgoing, for those places
> where it is actually relevant. In many situations, the check can
> probably be omitted (for example if the capability will be dropped
> anyway).

The total number of incoming and outgoing capabilities is (obviously) identical, so moving the check from outgoing to incoming capabilities doesn't reduce the number of checks. The kernel can never know that a capability will be dropped, so that doesn't help either.

In the end, the "overwrite on send" idea isn't really going to help you. The real cost is in getting those death notices to be sent. I think that this is something where we should defer the argument until we can actually test it in a real system. Any of the options you are considering can be done.

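The destroy-time scan that both proposals require can be sketched the same way. This is a hypothetical model, not EROS or Coyotos code, and it rests on stated assumptions: each dying object has a fixed number of capability slots (`SLOTS_PER_OBJECT = 64` is an arbitrary choice), every slot must be examined, and each reply-style capability found forces an access to its target `FCRB`, which in a real kernel might have to be paged in. A death notice is sent only if the endpoint has not already received its reply.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

SLOTS_PER_OBJECT = 64  # hypothetical fixed capability-slot count


@dataclass
class FCRB:
    """Hypothetical reply endpoint; 'message' is set by the first delivery."""
    ident: int
    message: Optional[str] = None


@dataclass
class Cap:
    kind: str          # "endpoint", "reply" (send at least once) or "send_once"
    fcrb_id: int = -1  # meaningful only for reply-style capabilities


@dataclass
class ScanCost:
    slots_scanned: int = 0
    fcrbs_touched: Set[int] = field(default_factory=set)  # may need paging in
    notices_sent: int = 0


def make_slots(reply_caps: List[Cap]) -> List[Cap]:
    """Fill a fixed-size object with plain endpoints plus the given caps."""
    filler = [Cap("endpoint") for _ in range(SLOTS_PER_OBJECT - len(reply_caps))]
    return filler + reply_caps


def destroy_with_notices(slots: List[Cap], store: Dict[int, FCRB],
                         reply_kinds: Set[str]) -> ScanCost:
    """Destroy an object, sending death notices through reply-style caps.

    Whichever reply-style type is used, the kernel cannot know which slots
    hold one without examining every slot.
    """
    cost = ScanCost()
    for cap in slots:
        cost.slots_scanned += 1
        if cap.kind not in reply_kinds:
            continue
        fcrb = store[cap.fcrb_id]  # a real kernel may have to page this in
        cost.fcrbs_touched.add(cap.fcrb_id)
        if fcrb.message is None:   # protocol not yet complete
            fcrb.message = "death notice"
            cost.notices_sent += 1
    return cost


if __name__ == "__main__":
    # "Send at least once": S still holds a copy for FCRB 1, which T already answered.
    store = {1: FCRB(1, message="reply from T"), 2: FCRB(2)}
    c = destroy_with_notices(make_slots([Cap("reply", 1), Cap("reply", 2)]), store, {"reply"})
    print(c.slots_scanned, sorted(c.fcrbs_touched), c.notices_sent)  # 64 [1, 2] 1

    # "Send once": only the still-pending capability can exist.
    store = {2: FCRB(2)}
    c = destroy_with_notices(make_slots([Cap("send_once", 2)]), store, {"send_once"})
    print(c.slots_scanned, sorted(c.fcrbs_touched), c.notices_sent)  # 64 [2] 1
```

In both runs every slot is scanned. The send-once variant touches fewer endpoints, because a send-once capability for an already-answered request no longer exists, but the scan itself, and the paging in of any pending endpoint, remain.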
The question is: what do we really want to have? I don't see any way to avoid the "search for caps in dying objects" approach, which is the real killer in *both* proposals.

shap

_______________________________________________
L4-hurd mailing list
http://lists.gnu.org/mailman/listinfo/l4-hurd
