[2009-02-03 22:33] Marcin Cieslak <sa...@system.pl>
>
> I don't like this approach. I have always preferred software that "fails
> fast". As soon as something is wrong - just abort with debugging information
> what went wrong.
>
> I see some issues with the approach described in the paper. It assumes that
> the state saved is okay - I think that crashes occur _because_ internal
> state is inconsistent or wrong.
It seems you have a different view of the concept than I do. I think it's
not so much about error handling in the first place, but about organizing
state so that killing a piece of software is equivalent to shutting it down.

> Sure, you can dump internal state regularly
> for recovery - but it's like with backups - you never know which one is
> really clean and okay until you try to restore.
>
> Software bugs will sometimes create incorrect data. This may go unnoticed
> for some longer time.

But if you implement a crash-only design, then these problems will be
eliminated. That is exactly the point of such a design: to have software
that handles these problems in a _sane_ way. These situations will also be
tested thoroughly, as they are the _normal_ situations.

> I think that authors unnecessarily assume that software components are
> "black boxes" that need to be kept up at all costs. This is not the right
> approach for availability I think. Most issues will occur when the component
> is upgraded and needs to use/migrate old data or sometimes to cooperate with
> still not upgraded components. If something goes wrong, the rollback becomes
> the issue also - if I have new, badly-behaving components that dumped its
> state in a new format, how do I go back?

Of course, compatibility is an issue, but IMO an unrelated one.

> Sweeping problems under the carpet is not going to help much...

I think the crash-only approach explicitly wants to focus on the problems,
which means actually _not_ sweeping them under the carpet.

However, I know that I don't stick closely to the paper. I also base much of
my argument on my own thoughts, inspired by the paper. Thus we might be
discussing from different points of view ...


meillo
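
To make the point about "killing equals shutting down" concrete, here is a
minimal Python sketch. It is not taken from the paper; the class name
CrashOnlyStore, the JSON file format and the key "jobs_done" are
hypothetical, chosen only for illustration. The assumptions are that every
state change is written to a temporary file and then renamed over the old
file, and that start-up always runs the recovery path.

```python
import json
import os
import tempfile


class CrashOnlyStore:
    """Hypothetical store: start-up is always recovery, there is no shutdown."""

    def __init__(self, path):
        self.path = path
        # The only start-up path: load the last completely written state.
        self.data = self._recover()

    def _recover(self):
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # first start looks the same as a start after a crash

    def put(self, key, value):
        new_data = dict(self.data)
        new_data[key] = value
        # Write the new state to a temporary file in the same directory ...
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(new_data, f)
            f.flush()
            os.fsync(f.fileno())
        # ... then rename it over the old one. On POSIX the rename is atomic,
        # so a kill at any moment leaves either the old or the new state.
        os.replace(tmp, self.path)
        self.data = new_data


workdir = tempfile.mkdtemp()
state_file = os.path.join(workdir, "state.json")

store = CrashOnlyStore(state_file)
store.put("jobs_done", 3)
del store  # stands in for "kill -9": no shutdown code runs

restarted = CrashOnlyStore(state_file)  # same code path as a first start
```

Because there is no separate shutdown routine, the recovery code runs on
every start and is therefore exercised constantly. A kill in the middle of
put() can leave a stale temporary file behind; this sketch simply ignores
such files, and a real program would probably clean them up during recovery.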