It seems I have accidentally managed to break my Btrfs/RAID5 filesystem yet again, in a similar fashion. This time I ran into some random libata driver issue (?) instead of a faulty hardware part, but the end result is quite similar.
I issued this command (replacing X with the valid letter for every hard drive in the system):

# echo 1 > /sys/block/sdX/device/queue_depth

and I ended up with read-only filesystems. I checked dmesg and saw write errors on every disk (not just those in the RAID-5). I tried to reboot immediately, without success. My root filesystem, a single-disk Btrfs (on an SSD, so it has the "single" profile for both data and metadata), was unmountable, so the kernel was stuck in a panic-reboot cycle. I managed to fix this one by booting from a USB stick and trying various recovery methods (like mounting it with "-o clear_cache,nospace_cache,recovery" and running "btrfs rescue chunk-recover") until everything seemed to be fine (it can now be mounted read-write without error messages in the kernel log, it can be fully scrubbed without errors reported, it passes "btrfs check", files can actually be written and read, etc).

Once my system was up and running (well, sort of), I realized my /data was also unmountable. I tried the same recovery methods on this RAID-5 filesystem but nothing seemed to help. There is one exception among the recovery attempts: the system drive is a small, fast SSD, so "chunk-recover" was a viable option there, but this filesystem consists of huge, slow HDDs. I ran it overnight as a last resort, but in the morning I found an unresponsive machine, with the process stuck relatively early on.

I can always mount it read-only and access files on it, seemingly without errors (I compared some of the contents with backups and they look good), but as soon as I mount it read-write, all hell breaks loose: it falls back to a read-only state in no time (with some files seemingly disappearing from the filesystem) and the kernel log gets spammed with various kinds of error messages (including missing csums, etc).
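The per-drive queue_depth change described at the top of this message can be written as a loop. What follows is only a sketch, not the exact commands that were typed: the function name set_queue_depth and the SYSFS_ROOT and DRY_RUN variables are made up for illustration, and the drive names in the usage note are placeholders. It defaults to a dry run, since on this machine the change was followed by write errors on every disk.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT is a hypothetical override so the loop can be
# tried against a scratch directory instead of the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
# DRY_RUN=1 (the default) only prints what would be changed.
DRY_RUN="${DRY_RUN:-1}"

# set_queue_depth DEPTH DEV...
# Prints the current queue_depth of each named drive and, unless DRY_RUN
# is 1, writes DEPTH to it.
set_queue_depth() {
    depth="$1"
    shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/queue_depth"
        if [ ! -w "$f" ]; then
            echo "$dev: no writable queue_depth, skipping" >&2
            continue
        fi
        echo "$dev: queue_depth $(cat "$f") -> $depth"
        if [ "$DRY_RUN" != 1 ]; then
            echo "$depth" > "$f"
        fi
    done
}
```

With the function loaded, "set_queue_depth 1 sda sdb sdc" only reports the current values; setting DRY_RUN=0 first makes it actually write them. Printing the old value before each write leaves a record of what to restore.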
After mounting it like this:

# mount /dev/sdb /data -o rw,noatime,nospace_cache

and doing:

# btrfs scrub start /data

the result is:

scrub status for 7d4769d6-2473-4c94-b476-4facce24b425
        scrub started at Sat Jul 23 13:50:55 2016 and was aborted after 00:05:30
        total bytes scrubbed: 18.99GiB with 16 errors
        error details: read=16
        corrected errors: 0, uncorrectable errors: 16, unverified errors: 0

The relevant dmesg output is:

[ 1047.709830] BTRFS info (device sdc): disabling disk space caching
[ 1047.709846] BTRFS: has skinny extents
[ 1047.895818] BTRFS info (device sdc): bdev /dev/sdc errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 1047.895835] BTRFS info (device sdc): bdev /dev/sdb errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 1065.764352] BTRFS: checking UUID tree
[ 1386.423973] BTRFS error (device sdc): parent transid verify failed on 24431936729088 wanted 585936 found 586145
[ 1386.430922] BTRFS error (device sdc): parent transid verify failed on 24431936729088 wanted 585936 found 586145
[ 1411.738955] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1411.948040] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1412.040964] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1412.040980] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1412.041134] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1412.042628] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1412.042748] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 1499.222245] BTRFS error (device sdc): parent transid verify failed on 24432312270848 wanted 585779 found 586143
[ 1499.230264] BTRFS error (device sdc): parent transid verify failed on 24432312270848 wanted 585779 found 586143
[ 1525.865143] BTRFS error (device sdc): parent transid verify failed on 24432367730688 wanted 585779 found 586144
[ 1525.880537] BTRFS error (device sdc): parent transid verify failed on 24432367730688 wanted 585779 found 586144
[ 1552.434209] BTRFS error (device sdc): parent transid verify failed on 24432415821824 wanted 585781 found 586144
[ 1552.437325] BTRFS error (device sdc): parent transid verify failed on 24432415821824 wanted 585781 found 586144

btrfs check /dev/sdc results in:

Checking filesystem on /dev/sdc
UUID: 7d4769d6-2473-4c94-b476-4facce24b425
checking extents
parent transid verify failed on 24431859855360 wanted 585941 found 586144
parent transid verify failed on 24431859855360 wanted 585941 found 586144
checksum verify failed on 24431859855360 found 3F0C0853 wanted 165308D5
parent transid verify failed on 24431859855360 wanted 585941 found 586144
Ignoring transid failure
parent transid verify failed on 24432402878464 wanted 585947 found 586144
parent transid verify failed on 24432402878464 wanted 585947 found 586144
checksum verify failed on 24432402878464 found 2018608B wanted 0947600D
parent transid verify failed on 24432402878464 wanted 585947 found 586144
Ignoring transid failure
leaf parent key incorrect 24432402878464
parent transid verify failed on 24431936729088 wanted 585936 found 586145
parent transid verify failed on 24431936729088 wanted 585936 found 586145
checksum verify failed on 24431936729088 found E464923E wanted CD3B92B8
parent transid verify failed on 24431936729088 wanted 585936 found 586145
Ignoring transid failure
leaf parent key incorrect 24431936729088
parent transid verify failed on 24432268873728 wanted 585946 found 586143
parent transid verify failed on 24432268873728 wanted 585946 found 586143
checksum verify failed on 24432268873728 found 7748C8E4 wanted 5E17C862
parent transid verify failed on 24432268873728 wanted 585946 found 586143
Ignoring transid failure
leaf parent key incorrect 24432268873728
parent transid verify failed on 24432268873728 wanted 585946 found 586143
Ignoring transid failure
leaf parent key incorrect 24432268873728
parent transid verify failed on 24432268873728 wanted 585946 found 586143
Ignoring transid failure
leaf parent key incorrect 24432268873728
parent transid verify failed on 24432268873728 wanted 585946 found 586143
Ignoring transid failure
leaf parent key incorrect 24432268873728
parent transid verify failed on 24432112070656 wanted 585944 found 586142
parent transid verify failed on 24432112070656 wanted 585944 found 586142
checksum verify failed on 24432112070656 found 0482AA77 wanted 2DDDAAF1
parent transid verify failed on 24432112070656 wanted 585944 found 586142
Ignoring transid failure
parent transid verify failed on 24432112070656 wanted 585944 found 586142
Ignoring transid failure
parent transid verify failed on 24432112070656 wanted 585944 found 586142
Ignoring transid failure
parent transid verify failed on 24431790055424 wanted 585936 found 586144
parent transid verify failed on 24431790055424 wanted 585936 found 586144
checksum verify failed on 24431790055424 found 3B2164E6 wanted 127E6460
parent transid verify failed on 24431790055424 wanted 585936 found 586144
Ignoring transid failure
leaf parent key incorrect 24431790055424
parent transid verify failed on 24432038637568 wanted 585941 found 586145
parent transid verify failed on 24432038637568 wanted 585941 found 586145
checksum verify failed on 24432038637568 found 7A070E86 wanted 53580E00
parent transid verify failed on 24432038637568 wanted 585941 found 586145
Ignoring transid failure
leaf parent key incorrect 24432038637568
parent transid verify failed on 24432038637568 wanted 585941 found 586145
Ignoring transid failure
leaf parent key incorrect 24432038637568
parent transid verify failed on 24431790055424 wanted 585936 found 586144
Ignoring transid failure
leaf parent key incorrect 24431790055424
bad block 24431790055424
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 24432322764800 wanted 585779 found 586145
parent transid verify failed on 24432322764800 wanted 585779 found 586145
checksum verify failed on 24432322764800 found 2B2DE1E6 wanted 0272E160
parent transid verify failed on 24432322764800 wanted 585779 found 586145
Ignoring transid failure
Segmentation fault

So it seems there is no way of recovering from this. So far, my experience with Btrfs RAID-5 is that it is anything but resilient: something sneezes in the system and it's gone. The only fix is recreating the filesystem from scratch and restoring the backups (if any), or maybe recovering some of the content (with a read-only mount or the "btrfs restore" tool). It seems to be much more prone to becoming unrecoverable than Btrfs filesystems with "single" data and/or metadata profiles.

This particular accident could possibly be related to the new space_cache=v2, since I had that enabled when the corruption occurred and now I am unable to mount the filesystem with that option (mounting with "-o clear_cache,space_cache=v2" fails completely).
So maybe that experimental feature played some role in this:

[ 906.664963] BTRFS info (device sdc): disabling disk space caching
[ 906.664974] BTRFS: has skinny extents
[ 907.032573] BTRFS info (device sdc): bdev /dev/sdc errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 907.032589] BTRFS info (device sdc): bdev /dev/sdb errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 951.948672] BTRFS info (device sdc): enabling free space tree
[ 951.948682] BTRFS info (device sdc): force clearing of disk cache
[ 951.948694] BTRFS info (device sdc): using free space tree
[ 951.948696] BTRFS: has skinny extents
[ 952.125700] BTRFS info (device sdc): bdev /dev/sdc errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 952.125717] BTRFS info (device sdc): bdev /dev/sdb errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 970.019994] BTRFS: creating free space tree
[ 970.308042] BTRFS error (device sdc): parent transid verify failed on 24431936729088 wanted 585936 found 586145
[ 970.316104] BTRFS error (device sdc): parent transid verify failed on 24431936729088 wanted 585936 found 586145
[ 988.288037] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 988.311250] BTRFS error (device sdc): parent transid verify failed on 24432322764800 wanted 585779 found 586145
[ 988.311265] ------------[ cut here ]------------
[ 988.311276] WARNING: CPU: 0 PID: 1930 at fs/btrfs/free-space-tree.c:1196 btrfs_create_free_space_tree+0x160/0x498
[ 988.311280] BTRFS: Transaction aborted (error -5)
[ 988.311285] CPU: 0 PID: 1930 Comm: mount Not tainted 4.6.4-gentoo #6
[ 988.311288] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A75 Pro4, BIOS P2.40 07/11/2013
[ 988.311291] 0000000000000286 000000008bf8f073 ffffffff812bdd7d ffff8800d31af9b8
[ 988.311297] 0000000000000000 ffffffff8106919f ffff8800da8652a0 ffff8800d31afa10
[ 988.311302] ffff8800d478e000 ffff880000000000 ffff8800da8652a0 ffff8800da865150
[ 988.311307] Call Trace:
[ 988.311314] [<ffffffff812bdd7d>] ? dump_stack+0x46/0x59
[ 988.311320] [<ffffffff8106919f>] ? __warn+0xaf/0xd0
[ 988.311324] [<ffffffff8106921a>] ? warn_slowpath_fmt+0x5a/0x78
[ 988.311330] [<ffffffff8126d898>] ? btrfs_create_free_space_tree+0x160/0x498
[ 988.311334] [<ffffffff811f4fe2>] ? open_ctree+0x1d82/0x26b0
[ 988.311340] [<ffffffff811cb497>] ? btrfs_mount+0xca7/0xde0
[ 988.311346] [<ffffffff810fa289>] ? pcpu_alloc_area+0x219/0x3e0
[ 988.311350] [<ffffffff810fadcc>] ? pcpu_alloc+0x38c/0x690
[ 988.311356] [<ffffffff8112e4da>] ? mount_fs+0xa/0x88
[ 988.311362] [<ffffffff81147e86>] ? vfs_kern_mount+0x56/0x100
[ 988.311367] [<ffffffff811cab38>] ? btrfs_mount+0x348/0xde0
[ 988.311371] [<ffffffff811337ca>] ? terminate_walk+0x8a/0xf0
[ 988.311375] [<ffffffff810fa289>] ? pcpu_alloc_area+0x219/0x3e0
[ 988.311379] [<ffffffff810fa065>] ? pcpu_next_unpop+0x35/0x40
[ 988.311383] [<ffffffff810fadcc>] ? pcpu_alloc+0x38c/0x690
[ 988.311388] [<ffffffff8112e4da>] ? mount_fs+0xa/0x88
[ 988.311393] [<ffffffff81147e86>] ? vfs_kern_mount+0x56/0x100
[ 988.311397] [<ffffffff811491ed>] ? do_mount+0x1fd/0xce0
[ 988.311400] [<ffffffff8113f8fb>] ? dput+0xd3/0x248
[ 988.311405] [<ffffffff81120d38>] ? __kmalloc_track_caller+0x20/0xe8
[ 988.311408] [<ffffffff810f7318>] ? memdup_user+0x38/0x60
[ 988.311412] [<ffffffff81149fe0>] ? SyS_mount+0x80/0xc8
[ 988.311417] [<ffffffff816f379b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[ 988.311420] ---[ end trace a3cc21d9a0eba35e ]---
[ 988.311425] BTRFS: error (device sdc) in btrfs_create_free_space_tree:1196: errno=-5 IO failure
[ 988.311463] BTRFS: failed to create free space tree -5
[ 988.311475] BTRFS error (device sdc): commit super ret -30
[ 988.311561] BTRFS error (device sdc): cleaner transaction attach returned -30
[ 988.350206] BTRFS: open_ctree failed

Any ideas before I wipe the filesystem?
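
In case it helps anyone reading the logs above, the repeated "parent transid verify failed" lines can be condensed with a small shell function. This is only a sketch: the function name summarize_transid is made up for illustration, and it assumes the log has been saved with one message per line, the way dmesg and btrfs check print it. It reads the log on stdin and prints each distinct block once, with the number of times it appeared and its wanted and found generations.

```shell
#!/bin/sh
# Sketch: condense "parent transid verify failed" messages from a saved log.
# Input:  log text on stdin, one message per line (assumption).
# Output: "<count> <block> wanted <gen> found <gen>", sorted by block number.
summarize_transid() {
    awk '
    /parent transid verify failed on/ {
        # Pick out the values that follow the "on", "wanted" and "found" words.
        for (i = 1; i < NF; i++) {
            if ($i == "on")     block  = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        count[block " wanted " wanted " found " found]++
    }
    END {
        for (k in count) print count[k], k
    }' | sort -k2,2n
}
```

For example, "dmesg | summarize_transid" or "summarize_transid < check.log" (the file name is a placeholder) makes it easier to see which blocks are affected and how far apart the wanted and found generations are.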