On 26/06/16 12:30, Duncan wrote:
> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
>
>> In every case, it was a flurry of csum error messages, then instant
>> death.
>
> This is very possibly a known bug in btrfs, that occurs even in raid1
> where a later scrub repairs all csum errors. While in theory btrfs raid1
> should simply pull from the mirrored copy if its first try fails checksum
> (assuming the second one passes, of course), and it seems to do this just
> fine if there's only an occasional csum error, if it gets too many at
> once, it *does* unfortunately crash, despite the second copy being
> available and being just fine as later demonstrated by the scrub fixing
> the bad copy from the good one.
>
> I'm used to dealing with that here any time I have a bad shutdown (and
> I'm running live-git kde, which currently has a bug that triggers a
> system crash if I let it idle and shut off the monitors, so I've been
> getting crash shutdowns and having to deal with this unfortunately often,
> recently). Fortunately I keep my root, with all system executables, etc,
> mounted read-only by default, so it's not affected and I can /almost/
> boot normally after such a crash. The problem is /var/log and /home
> (which has some parts of /var that need to be writable symlinked into
> /home/var, so / can stay read-only). Something in the normal after-crash
> boot triggers enough csum errors there that I often crash again.
>
> So I have to boot to emergency mode and manually mount the filesystems in
> question, so nothing's trying to access them until I run the scrub and
> fix the csum errors. Scrub itself doesn't trigger the crash, thankfully,
> and once it has repaired all the csum errors due to partial writes on one
> mirror that either were never made or were properly completed on the
> other mirror, I can exit emergency mode and complete the normal boot (to
> the multi-user default target). As there's no more csum errors then
> because scrub fixed them all, the boot doesn't crash due to too many such
> errors, and I'm back in business.
>
>
> Tho I believe at least the csum bug that affects me may only trigger if
> compression is (or perhaps has been in the past) enabled. Since I run
> compress=lzo everywhere, that would certainly affect me. It would also
> explain why the bug has remained around for quite some time as well,
> since presumably the devs don't run with compression on enough for this
> to have become a personal itch they needed to scratch, thus its remaining
> untraced and unfixed.
>
> So if you weren't using the compress option, your bug is probably
> different, but either way, the whole thing about too many csum errors at
> once triggering a system crash sure does sound familiar, here.
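
For reference, the recovery sequence described above (mount the affected filesystems by hand from emergency mode, then scrub each one before anything else touches it) might be scripted roughly as below. This is only a sketch under assumptions: the function names and the dry_run flag are made up for illustration, it assumes each mountpoint has an /etc/fstab entry so that a bare "mount <mountpoint>" works, and /home and /var/log are simply the paths named in the quoted message.

```python
import subprocess


def recovery_commands(mountpoints):
    """Build the mount-then-scrub command sequence for each filesystem.

    Hypothetical helper: it only assembles argv lists, it runs nothing.
    """
    commands = []
    for mnt in mountpoints:
        # Mount by mountpoint only; assumes an /etc/fstab entry exists.
        commands.append(["mount", mnt])
        # -B keeps the scrub in the foreground so it has finished before
        # the next step; -d prints statistics per device.
        commands.append(["btrfs", "scrub", "start", "-B", "-d", mnt])
    return commands


def run_recovery(mountpoints, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in recovery_commands(mountpoints):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default: shows what would be executed, as root, from
    # emergency mode.
    run_recovery(["/home", "/var/log"])
```

Running the scrub in the foreground matters here because the point is to have the csum errors repaired before leaving emergency mode and continuing the normal boot.
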
Yes, I was running the compress=lzo option as well... Maybe here lies a
common problem?

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897