On Sat, 2006-04-22 at 10:28 +0200, Marcus Brinkmann wrote:
> At Sat, 22 Apr 2006 00:40:53 -0400,
> "Jonathan S. Shapiro" <[EMAIL PROTECTED]> wrote:
> > I think that it might be useful to make this more precise. What you mean
> > is that S must not be *destroyed* before T has a chance to reply. If S
> > simply overwrites the capability, no problem will arise.
>
> Oh! But that is insufficient, because it does not achieve the level
> of robustness I think is important to achieve. If the kernel only
> generates reply messages on destruction, but not on overwrite, then
> accidental overwrite due to a bug can cause the caller to hang
> indefinitely.

Marcus:

You are making a mistake that a designer must try to avoid: you are issuing requirements before you have data. At this time you have no evidence that this is a problem in practice, yet you are preparing to insist on overwhelming design damage in order to achieve an objective whose actual need has not been demonstrated.

A comment, and then several points of explanation.

COMMENT:

The KeyKOS mechanism, sending when the containing object is destroyed, was used successfully in high-reliability production systems for nearly 25 years. At the very least, this suggests that careful thought should be given before you declare that it is insufficient. Perhaps it *is* insufficient, but the two or three hours of thought that you have given this question are not enough to reach any conclusion, and concrete experience is needed before making a change that is so potentially damaging.

EXPLANATIONS/COMMENTS:

1. The behavior that you want can *only* be accomplished with reference counting. Implementing reference counting requires that every time a capability is copied or overwritten, the target object associated with that capability must be brought into memory. Even if this is only done for sender capabilities, you may reasonably assume that this will add a significant *multiplier* (probably 3x or 4x) to the *average* IPC cost.

2. Notifying on overwrite violates isolation. It discloses to the destination process whenever a capability is dropped by a third party.
   In general, the receiver has no right to know that the holder has the capability at all.

3. In fact, notifying when the containing object is destroyed has the same problem, and I don't like it at all. The only reason that this was accepted in KeyKOS was that the server cannot defend against it: the server has no control over its storage being revoked.

4. You are failing to consider the larger problem. There are *hundreds* of ways that a server can fail to meet the requirements of a client. This is just one of them. Given that, it is not at all obvious that fixing this one issue at such great cost is justified.

5. Note that reference counting doesn't really solve the problem either. A server could simply store an extra copy of the reply descriptor and forget about it for a very long time. The existence of this descriptor is sufficient to prevent the "last drop" message from being sent for an indefinite time.

6. You will soon generalize this to RcvQ send capabilities, but in that case the problem is unsolvable because the message will commonly be delayed (and therefore lost).

> I think the following
> condition should be sufficient: The kernel guarantees that a reply
> message is sent _at the latest_ when the callee process is destroyed.
> This should hold true independent of what the callee does between
> being invoked and exiting. In particular, simply dropping the reply
> capability should not change this guarantee (which in effect means
> that the kernel has to invoke the reply capability when it is
> dropped).

Several problems:

1. This requires dynamic storage allocation in the kernel. Dynamic storage allocation in the kernel implies denial-of-resource vulnerabilities and makes any statement of kernel robustness impossible.

2. Your description fails in the case where C calls S, which forwards to T, because the exit of S will cause an improper reply.

3. Your proposal seems to have the side effect (I am not certain) of dictating a hierarchical calling relationship. This is bad.

> Moreover, your semantics
> (issuing notifications on destruction only, not on overwrite), break
> down completely if the callee is malicious (for example because it has
> been compromised).

Marcus: have some more beer. You are not thinking clearly. ANY time that a client sends to a hostile recipient, the client cannot rely on ANYTHING. It cannot rely on getting a correct answer. It cannot rely on getting a well-formed answer. In fact, it cannot rely on getting an answer at all! The only solution to this is that clients must not rely on unreliable code for anything at all. This is axiomatic.

> In short: I think overwriting a capability and destroying it should
> behave the same with regards to this issue.

So let me see if I have this right: you want to send a death notice on overwrite. This implies that every IPC must check the destination slots to see whether it causes such an overwrite, and must issue death-notice calls on the overwritten capabilities. If the IPC payload contains up to N capabilities, and we assume that the death notice itself does not transfer a capability, then a single IPC can now fan out into as many as N additional IPCs, one death notice per overwritten capability. This just won't work, Marcus. Technically it can be done, but the resulting system will perform much worse than Mach.

> I am happy to defer discussion of implementation details, but I would
> like to clarify the issue of (accidentally, maliciously) dropped
> reply capabilities.

I believe that I have offered a compelling argument for why it should not be done: in order to solve a very rare problem, your "highly robust" solution imposes a 3x to 4x cost multiplier on the common-case operation!

I am inclined to think that disk GC provides a better approach to the whole problem, but I haven't really thought about it enough to have a sensible opinion about this.
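A minimal sketch of the cost argument, under loudly stated assumptions: this is a toy Python model, not KeyKOS, EROS, or L4 code, and every name in it (KObject, Slot, ToyKernel, store, ipc) is hypothetical. The in_memory flag stands in for bringing a target object into memory. The model shows that under notify-on-overwrite each capability store touches both the old and the new target, that one IPC delivering N capabilities can queue several death notices, and (as in point 5) that a single stashed extra copy keeps a count above zero so its "last drop" notice never fires. It only counts the extra work; it does not measure or confirm the 3x to 4x estimate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KObject:
    """Hypothetical kernel object; the refcount is what notify-on-drop needs."""
    name: str
    refcount: int = 0
    in_memory: bool = False  # toy stand-in: False means "still on disk"


@dataclass
class Slot:
    """Hypothetical capability slot; target None means the slot is empty."""
    target: Optional[KObject] = None


class ToyKernel:
    def __init__(self) -> None:
        self.page_ins = 0                    # objects brought in just to adjust a count
        self.death_notices: List[str] = []   # "last drop" IPCs queued by overwrites

    def _touch(self, obj: KObject) -> None:
        # Refcounting means the target must be resident before its count changes.
        if not obj.in_memory:
            obj.in_memory = True
            self.page_ins += 1

    def store(self, slot: Slot, new: Optional[KObject]) -> None:
        """Overwrite one slot under (assumed) notify-on-overwrite semantics."""
        old = slot.target
        if new is not None:
            self._touch(new)
            new.refcount += 1                # increment first so self-assignment is safe
        if old is not None:
            self._touch(old)
            old.refcount -= 1
            if old.refcount == 0:
                self.death_notices.append(old.name)  # one extra IPC per last drop
        slot.target = new

    def ipc(self, dest: List[Slot], caps: List[KObject]) -> None:
        """One IPC delivering len(caps) capabilities into the receiver's slots."""
        for slot, cap in zip(dest, caps):
            self.store(slot, cap)


kernel = ToyKernel()
N = 4
replies = [KObject(f"reply{i}", refcount=1) for i in range(N)]
dest = [Slot(r) for r in replies]

# Point 5: the server quietly keeps an extra copy of reply0.
stash = Slot(replies[0])
replies[0].refcount += 1

fresh = [KObject(f"obj{i}") for i in range(N)]
kernel.ipc(dest, fresh)

print(kernel.page_ins)       # 8: N new targets plus N old targets brought in
print(kernel.death_notices)  # ['reply1', 'reply2', 'reply3']; reply0 is pinned by the stash
```

In this toy run a single IPC carrying four capabilities triggers eight object touches and three extra notice IPCs, while the stashed copy silently suppresses the fourth notice.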
shap

_______________________________________________
L4-hurd mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/l4-hurd
