On Monday, 29 September 2014 at 03:04:11 UTC, Walter Bright wrote:
> You've clearly got a tough job to do, and I understand you're
> doing the best you can with it. I know I'm hardcore and
> uncompromising on this issue, but that's where I came from (the
> aviation industry).
> I know what works (airplanes are incredibly safe) and what
> doesn't work (Toyota's approach was in the news not too long
> ago). Deepwater Horizon and Fukushima are also prime examples
> of not dealing properly with modest failures that cascaded into
> disaster.
Are you interpreting airplane safety correctly? As I understand it,
airplanes are safe precisely because they recover from assert
failures and continue operating. Your suggestion amounts to: when
seat 2A creaks, shut down the whole airplane. In reality, airplanes
keep operating until they have no physical resources left to
operate with. Fukushima became a disaster because it didn't try to
handle the failure. Yet that is exactly your idea, that nothing
meaningful can be done on failure, and Fukushima did just that:
nothing.
Terminating the process is the safe default, especially for
client software, but servers should probably terminate the failed
request, clean up gracefully, and continue operating, like
airplanes.
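A minimal sketch of the per-request approach, written in Python for illustration. The sketch makes several assumptions. A failure inside one request is signaled by a hypothetical `InvariantViolation` exception, standing in for an assert failure. `handle_request` and `serve` are hypothetical names. All state a request touches is local to that request. Under those assumptions, only the failed request is dropped and the loop goes on serving.

```python
class InvariantViolation(Exception):
    """Hypothetical stand-in for an assert failure inside one request."""


def handle_request(payload: int) -> int:
    # Hypothetical handler: a negative payload violates its invariant.
    if payload < 0:
        raise InvariantViolation(f"negative payload: {payload}")
    return payload * 2


def serve(requests: list[int]) -> list[tuple[str, object]]:
    """Run each request in isolation so one failure does not stop the server."""
    results: list[tuple[str, object]] = []
    for req in requests:
        request_state: dict[str, int] = {"payload": req}  # per-request state only
        try:
            results.append(("ok", handle_request(req)))
        except InvariantViolation as err:
            # Terminate only the failed request and record why it failed.
            results.append(("failed", str(err)))
        finally:
            # Graceful cleanup: discard this request's state before the next one.
            request_state.clear()
    return results


if __name__ == "__main__":
    print(serve([1, -1, 3]))
```

Only the one hypothetical exception type is caught. Any other error still propagates and takes the process down. That is the "safe default" for failures that might have corrupted state shared across requests.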