On Tue, 2006-04-25 at 22:06 +0200, Bas Wijnen wrote:
> > Your proposal is to use a watchdog, which is not what is meant by "send
> > exactly once".
>
> Well, a watchdog combined with send-on-destroy. I didn't mention this, but I
> was still talking about move-only-send-exactly-once capabilities, which must
> have send-on-destroy (otherwise they are send-at-most-once).

I do not believe this statement is correct. It appears (to me) that what you
are really doing is to use a watchdog so that you can achieve "receive exactly
once" behavior on top of single-copy, send-at-most-once capabilities. Can you
confirm, or if not, can you explain why you see it differently?
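So that we are at least arguing about the same construction, here is the
pattern as I understand it, reduced to a self-contained sketch in plain C.
Everything in it is invented for illustration: the one-slot "channel" stands
in for a send-at-most-once capability, post() for the single permitted send
(or the send-on-destroy notification), and the bounded loop for the watchdog.

  #include <stdio.h>
  #include <stdbool.h>

  enum event { EV_NONE, EV_MESSAGE, EV_DESTROYED };

  struct channel {
      enum event pending;   /* at most one event is ever posted */
      bool consumed;        /* has the capability been used up? */
  };

  /* Sender side: use the capability, or die holding it (send-on-destroy). */
  static void post(struct channel *ch, enum event ev, bool lost_in_transit)
  {
      if (ch->consumed)
          return;                 /* at-most-once: a second send is a no-op */
      ch->consumed = true;
      if (!lost_in_transit)
          ch->pending = ev;       /* delivery can still fail; see run 'c' */
  }

  /* Receiver side: the bounded wait is the watchdog. */
  static const char *receive_exactly_once(const struct channel *ch, int ticks)
  {
      for (int t = 0; t < ticks; t++) {
          if (ch->pending == EV_MESSAGE)
              return "message";
          if (ch->pending == EV_DESTROYED)
              return "destroy notification";
      }
      return "watchdog timeout";  /* silence becomes a definite outcome */
  }

  int main(void)
  {
      struct channel a = { EV_NONE, false };
      struct channel b = { EV_NONE, false };
      struct channel c = { EV_NONE, false };

      post(&a, EV_MESSAGE,   false);  /* normal send */
      post(&b, EV_DESTROYED, false);  /* holder destroyed before sending */
      post(&c, EV_MESSAGE,   true);   /* event lost; only the watchdog helps */

      printf("a: %s\n", receive_exactly_once(&a, 10));
      printf("b: %s\n", receive_exactly_once(&b, 10));
      printf("c: %s\n", receive_exactly_once(&c, 10));
      return 0;
  }

In all three runs the receiver observes exactly one outcome. The "exactly
once" property lives entirely at the receiving end; the capability itself
never becomes send-exactly-once.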
> Even though strictly speaking it is correct that also inside a CPU or on a
> motherboard a signal might get lost, the chance that you still have a
> functioning system when that happens is minimal. Compared to a network where
> a broken cable doesn't actually break the system, this is a very different
> situation. I think on one machine it is reasonable to assume that wires
> aren't broken, and signals don't get lost. If they do, the computer should
> be replaced (or at least a part of it).

That is a tempting view, but it is a very dangerous view: it leads directly to
bad application development. A correctly written program A should actively
manage any situation where it speaks to a second program B such that
failure-domain(A) != failure-domain(B). This should be true in the local case
as well as the remote case.

Programmers are (generally speaking) both lazy and stupid. If a programmer can
rely on robust behavior in the local case, and also gets it 99%+ of the time
in the network case, they will write programs that assume that this behavior
is universally true, and these programs will fail when the bad thing actually
happens. Such conditions are extremely hard to test, and they really do happen
in the real world, because a 0.02% likely event happens quite often when
measured over 100,000 machines across the world.

Empirical evidence for my statement: run grep on any large body of source
code. Measure the percentage of calls to read() where the error result is
actually checked. How many programs recover from bad disk blocks? Hell, how
many Linux *FS implementations* check for them?

The only situation where hiding the failure in the API is okay is the
situation where the network proxy can simulate some expected *local* failure
that the program *is* likely to deal with, and where the action taken in
response to this failure causes the right behavior in the local application.

Fundamentally, I am arguing that a well-designed API does not encourage the
programmer to have unrealistic expectations about reliability. Instead, a
well-designed API is structured (a) to encourage the programmer to actually
deal with these issues, and (b) where possible, to make it straightforward to
deal with them. But it *never* deludes the programmer into believing that the
situation is better than it actually is. [Delusional behavior of this form is
called "religion", or sometimes "politics". :-)]

> > I am not sure how you inferred "crappy hardware" from "network
> > transparency".
>
> When I think about writing an OS, I think about writing it on one computer.
> That means, as I wrote above, that I assume that things work.

Ah. Since you believe this, you obviously have never used a disk drive. In my
lab alone, we have had more disk drive failures over the last 5 years than
network failures. Yes, this is because we have a depressingly large number of
disk drives, but the point is that local behavior is definitely *not* robust
in the way that you assume.

You may say that disk drive failure is easier to detect in advance of
fatality, and is easy to mask with things like mirroring and RAID. I agree.
But running out of space *isn't* easy to mask, and then I ask what percentage
of calls to write() [or fwrite()] have their error result unchecked.
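For contrast, here is roughly what it costs to actually check those results.
This is a plain POSIX sketch, not code from any particular system (copy_fd is
my name; the calls are the standard read() and write()). Notice how much of
it is error handling, and how many distinct failure cases (short reads, EINTR,
short writes, EIO, ENOSPC) a lazy version silently ignores:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Copy fd_in to fd_out, checking every result. */
  static int copy_fd(int fd_in, int fd_out)
  {
      char buf[4096];

      for (;;) {
          ssize_t n = read(fd_in, buf, sizeof buf);
          if (n == 0)
              return 0;                      /* EOF: done */
          if (n < 0) {
              if (errno == EINTR)
                  continue;                  /* interrupted: just retry */
              fprintf(stderr, "read: %s\n", strerror(errno));
              return -1;                     /* e.g. EIO from a bad block */
          }
          for (ssize_t done = 0; done < n; ) {
              ssize_t w = write(fd_out, buf + done, (size_t)(n - done));
              if (w < 0) {
                  if (errno == EINTR)
                      continue;              /* interrupted: just retry */
                  fprintf(stderr, "write: %s\n", strerror(errno));
                  return -1;                 /* e.g. ENOSPC: out of space */
              }
              done += w;                     /* short writes are legal */
          }
      }
  }

  int main(void)
  {
      /* Copy stdin to stdout; exit nonzero if anything went wrong. */
      return copy_fd(STDIN_FILENO, STDOUT_FILENO) == 0 ? 0 : 1;
  }

The grep experiment above amounts to counting how many of these branches are
missing from real code.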
> So what I'm saying is that if you consider a network as the "machine" to
> write an OS for, then the failure-rate of that machine is very high, so the
> hardware is crappy compared to "usual" computers.

Hey, it could be worse! Imagine if you had to deal with Pentium chips at the
same time! :-)

> You wrote you want to be able to expand the system to a network "machine",
> and therefore you don't want to use certain constructs which perhaps cannot
> handle the fragility of such a machine. Or at least that's how I understood
> it, but please correct me if I'm wrong. :-)

I'm not certain that I want to do that. At the moment, I simply do not want to
throw that option away yet, and I don't want to build an API that encourages
delusional programming practices.

shap
