On Sunday, 26 June 2016, 13:13:04 CEST, Steven Haigh wrote:
> On 26/06/16 12:30, Duncan wrote:
> > Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> >> In every case, it was a flurry of csum error messages, then instant
> >> death.
> >
> > This is very possibly a known bug in btrfs, that occurs even in raid1
> > where a later scrub repairs all csum errors. While in theory btrfs raid1
> > should simply pull from the mirrored copy if its first try fails checksum
> > (assuming the second one passes, of course), and it seems to do this just
> > fine if there's only an occasional csum error, if it gets too many at
> > once, it *does* unfortunately crash, despite the second copy being
> > available and being just fine as later demonstrated by the scrub fixing
> > the bad copy from the good one.
> >
> > I'm used to dealing with that here any time I have a bad shutdown (and
> > I'm running live-git kde, which currently has a bug that triggers a
> > system crash if I let it idle and shut off the monitors, so I've been
> > getting crash shutdowns and having to deal with this unfortunately often,
> > recently). Fortunately I keep my root, with all system executables, etc,
> > mounted read-only by default, so it's not affected and I can /almost/
> > boot normally after such a crash. The problem is /var/log and /home
> > (which has some parts of /var that need to be writable symlinked into
> > /home/var, so / can stay read-only). Something in the normal after-crash
> > boot triggers enough csum errors there that I often crash again.
> >
> > So I have to boot to emergency mode and manually mount the filesystems in
> > question, so nothing's trying to access them until I run the scrub and
> > fix the csum errors. Scrub itself doesn't trigger the crash, thankfully,
> > and once it has repaired all the csum errors due to partial writes on one
> > mirror that either were never made or were properly completed on the
> > other mirror, I can exit emergency mode and complete the normal boot (to
> > the multi-user default target). As there's no more csum errors then
> > because scrub fixed them all, the boot doesn't crash due to too many such
> > errors, and I'm back in business.
> >
> >
> > Tho I believe at least the csum bug that affects me may only trigger if
> > compression is (or perhaps has been in the past) enabled. Since I run
> > compress=lzo everywhere, that would certainly affect me. It would also
> > explain why the bug has remained around for quite some time as well,
> > since presumably the devs don't run with compression on enough for this
> > to have become a personal itch they needed to scratch, thus its remaining
> > untraced and unfixed.
> >
> > So if you weren't using the compress option, your bug is probably
> > different, but either way, the whole thing about too many csum errors at
> > once triggering a system crash sure does sound familiar, here.
>
> Yes, I was running the compress=lzo option as well... Maybe here lays a
> common problem?

Hmm… I found this thread through the Debian wiki page on BTRFS [1].

I have been using compress=lzo on BTRFS RAID 1 since April 2014 and have
never run into an issue. Steven, your filesystem wasn't RAID 1 but RAID 5
or 6, was it?

I just want to assess whether compress=lzo might be dangerous in my setup.
For now I would like to keep using it, since I think at least one of the
SSDs does not compress data itself. And… well… /home and /, where I use it,
are both quite full already.

[1] https://wiki.debian.org/Btrfs#WARNINGS

Thanks,
-- 
Martin