On 26/06/16 12:30, Duncan wrote:
> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> 
>> In every case, it was a flurry of csum error messages, then instant
>> death.
> 
> This is very possibly a known bug in btrfs, that occurs even in raid1 
> where a later scrub repairs all csum errors.  While in theory btrfs raid1 
> should simply pull from the mirrored copy if its first try fails checksum 
> (assuming the second one passes, of course), and it seems to do this just 
> fine if there's only an occasional csum error, if it gets too many at 
> once, it *does* unfortunately crash, despite the second copy being 
> available and being just fine as later demonstrated by the scrub fixing 
> the bad copy from the good one.
> 
> I'm used to dealing with that here any time I have a bad shutdown (and 
> I'm running live-git kde, which currently has a bug that triggers a 
> system crash if I let it idle and shut off the monitors, so I've been 
> getting crash shutdowns and having to deal with this unfortunately often, 
> recently).  Fortunately I keep my root, with all system executables, etc, 
> mounted read-only by default, so it's not affected and I can /almost/ 
> boot normally after such a crash.  The problem is /var/log and /home 
> (which has some parts of /var that need to be writable symlinked into /
> home/var, so / can stay read-only).  Something in the normal after-crash 
> boot triggers enough csum errors there that I often crash again.
> 
> So I have to boot to emergency mode and manually mount the filesystems in 
> question, so nothing's trying to access them until I run the scrub and 
> fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully, 
> and once it has repaired all the csum errors due to partial writes on one 
> mirror that either were never made or were properly completed on the 
> other mirror, I can exit emergency mode and complete the normal boot (to 
> the multi-user default target).  As there are no more csum errors by then, 
> because scrub fixed them all, the boot doesn't crash due to too many such 
> errors, and I'm back in business.
> 
> 
> Tho I believe at least the csum bug that affects me may only trigger if 
> compression is (or perhaps has been in the past) enabled.  Since I run 
> compress=lzo everywhere, that would certainly affect me.  It would also 
> explain why the bug has remained around for quite some time as well, 
> since presumably the devs don't run with compression on enough for this 
> to have become a personal itch they needed to scratch, thus its remaining 
> untraced and unfixed.
> 
> So if you weren't using the compress option, your bug is probably 
> different, but either way, the whole thing about too many csum errors at 
> once triggering a system crash sure does sound familiar, here.

Yes, I was running with the compress=lzo option as well... Maybe here lies
a common problem?
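
For anyone else hitting this, the recovery sequence Duncan describes above
(boot to emergency mode, mount the affected filesystems by hand, scrub them
before anything else touches them) could be sketched roughly as below. This
is only a dry-run sketch: the function name recovery_commands is made up for
illustration, the mount points are the ones from Duncan's setup, and it
assumes each filesystem has an fstab entry so that a bare "mount <path>"
works. It only prints the commands; it does not run them.

```python
import shlex


def recovery_commands(mountpoints):
    """Return the mount and scrub commands for each filesystem, in order.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    commands = []
    for mp in mountpoints:
        # Mount by hand (assumes an fstab entry exists for this path).
        commands.append(["mount", mp])
        # -B keeps the scrub in the foreground, so it has finished
        # before the next command is issued.
        commands.append(["btrfs", "scrub", "start", "-B", mp])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands to be run as root from emergency mode.
    for cmd in recovery_commands(["/var/log", "/home"]):
        print(shlex.join(cmd))
```

Once every scrub has completed, the normal boot can be resumed as Duncan
describes.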

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
