== Quote from bearophile (bearophileh...@lycos.com)'s article
> Walter Bright:
> > That misses the point about reliability. Again, you're approaching from
> > the point of view that you can make a program that cannot fail (i.e. prove
> > it correct). That view is WRONG WRONG WRONG and you must NEVER NEVER NEVER
> > rely on such for something important, like say your life. Software can
> > (and will) fail even if you proved it correct; for example, what if a
> > memory cell fails and flips a bit? Cosmic rays flip a bit?
> >
> > Are you willing to bet your life?
> If you have a critical system, you use most or all of the means you know to
> make it work, so if you can, you use theorem proving too. If you have the
> economic resources to use those higher means, then refusing to use them is
> not wise. And then you also use error-correcting memory, 3-way or 6-way
> redundancy, plus watchdogs and more.
> If it is a critical system you can even design it to fail gracefully even if
> zero software is running; see the recent designs of the concrete channel
> under the core of nuclear reactors, to let the molten core go where you want
> it to go (where it will be kept acceptably cool and safe, making accidents
> like Chernobyl very hard to happen) :-)
> Bye,
> bearophile
But the point is that redundancy is probably the **cheapest, most efficient** way to get ultra-high reliability. Yes, cost matters even when people's lives are at stake. If people accepted this more often, maybe the U.S. healthcare system wouldn't be completely bankrupt.

Anyhow, if you try to design one ultra-reliable system, you can't be stronger than your weakest link. If your system has n components, each with an independent probability p_i of failing, the probability that the system fails is:

    1 - (1 - p_1)(1 - p_2)...(1 - p_n)

i.e. 1 minus the probability that every component works. If p_i is large for any one component, you're very likely to fail, and if the system has a lot of components, you're bound to have at least one oversight. In the limit where one link is far more prone to failure than any of the rest, you're basically as strong as your weakest link. For example, if every component except one is perfect and that one component has a 1% chance of failure, then the system as a whole has a 1% chance of failure.

If, on the other hand, you have redundancy, you're at least as strong as your strongest link, because only one of the systems needs to work. Assuming the redundant systems were designed by independent teams and have completely different weak points, we can treat their failures as statistically independent. Say you have m redundant systems, each with probability p_s of failing in some way or another. Then the probability of the whole thing failing is:

    p_s ^ m

i.e. the probability that ALL of your redundant systems fail. Assuming they're all decent and designed independently, with different weak points, they probably aren't going to fail at the same time. For example, if you have two redundant systems and each one really sucks, with a 5% chance of failure, then the probability that both fail and you're up the creek is only 0.25%.
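For concreteness, here is a minimal Python sketch of the two formulas above. The helper names are made up for illustration, and the probabilities are just the numbers from the examples (one 1%-flaky component, two 5%-flaky redundant systems):

```python
from math import prod

def series_failure(p_components):
    """Single, non-redundant system: it fails if ANY component fails.
    P(fail) = 1 - prod(1 - p_i)."""
    return 1 - prod(1 - p for p in p_components)

def redundant_failure(p_systems):
    """m independently designed redundant systems: the whole thing
    fails only if ALL of them fail.  P(fail) = prod(p_s)."""
    return prod(p_systems)

# One weak link (1% failure) among otherwise perfect components:
print(series_failure([0.0, 0.0, 0.01]))    # 0.01   -> 1%

# Two independently designed systems, each with a 5% failure chance:
print(redundant_failure([0.05, 0.05]))     # 0.0025 -> 0.25%
```

Both helpers lean on the independence assumption stated above; correlated failures (shared weak points, common-mode bugs) would make the redundant number worse than p_s^m.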