Jan-Benedict Glaw schrieb am 2006-07-31: > > Massive hardware problems don't count. ext2/ext3 doesn't look much better in > > such cases. I had a machine with RAM gone bad (no ECC - I wonder what > > They do! Very much, actually. These happen In Real Life, so I have to > pay attention to them. Once you're in setups with > 10000 machines, > everything counts. At some certain point, you can even use HDD's > temperature sensors in old machines to diagnose dead fans. > > Everything that eases recovery for whatever reason is something you > have to pay attention to. The simplicity of ext{2,3} is something I > really fail to find proper words for. As well as the really good fsck. > Once seen a SIGSEGV'ing fsck, you really don't want to go there.
The point is: If you've written data with broken hardware (RAM, bus, controllers - loads of them, CPU), what is on your disks is untrustworthy anyways, and fsck isn't going to repair your gzip file where every 64th bit has become a 1 or when the battery-backed write cache threw 60 MB down the drain... Of course, an fsck that crashes is unbearable, but that doesn't apply to "broken hardware" failures. You need backups with a few generations to avoid massively losing data. -- Matthias Andree