Re: Possible Raid Bug
On 28 March 2016 at 05:54, Anand Jain wrote:
> On 03/26/2016 07:51 PM, Patrik Lundquist wrote:
>>
>> # btrfs device stats /mnt
>>
>> [/dev/sde].write_io_errs    11
>> [/dev/sde].read_io_errs     0
>> [/dev/sde].flush_io_errs    2
>> [/dev/sde].corruption_errs  0
>> [/dev/sde].generation_errs  0
>>
>> The old counters are back. That's good, but wtf?
>
> No. I doubt they are the old counters. The steps above didn't
> show old error counts, but since you have created a file
> test3 there will be some write_io_errs, which we don't
> see after the balance. So I doubt they are the old counters;
> I think they are new flush errors.

No, /mnt/test3 doesn't generate errors, only 'single' block groups. The old
counters seem to be cached somewhere, and replace doesn't reset them everywhere.

One more time, with more device stats, and I've upgraded the kernel to
Linux debian 4.5.0-trunk-amd64 #1 SMP Debian 4.5-1~exp1 (2016-03-20) x86_64 GNU/Linux

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete; dmesg | tail

[  426.831037] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[  426.831517] sd 5:0:0:0: [sde] Stopping disk
[  426.845199] ata6.00: disabled

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

[  467.126471] BTRFS error (device sde): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[  467.127386] BTRFS error (device sde): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[  467.128125] BTRFS error (device sde): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[  467.128640] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[  467.129215] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0
[  467.129331] BTRFS warning (device sde): lost page write due to IO error on /dev/sde
[  467.129334] BTRFS error (device sde): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
[  467.129420] BTRFS warning (device sde): lost page write due to IO error on /dev/sde
[  467.129422] BTRFS error (device sde): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0

We've got write errors on the lost disk.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# btrfs device stat /mnt

[/dev/sde].write_io_errs    6
[/dev/sde].read_io_errs     0
[/dev/sde].flush_io_errs    1
[/dev/sde].corruption_errs  0
[/dev/sde].generation_errs  0

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail

[   52.876897] BTRFS info (device sdb): allowing degraded mounts
[   52.876901] BTRFS info (device sdb): disk space caching is enabled
[   52.876902] BTRFS: has skinny extents
[   52.878008] BTRFS warning (device sdb): devid 4 uuid 231d7892-3f31-40b5-8dff-baf8fec1a8aa is missing
[   52.879057] BTRFS info (device sdb): bdev (null) errs: wr 6, rd 0, flush 1, corrupt 0, gen 0

# btrfs device usage /mnt

Still only raid10 profiles.

# btrfs device stat /mnt

[(null)].write_io_errs    6
[(null)].read_io_errs     0
[(null)].flush_io_errs    1
[(null)].corruption_errs  0
[(null)].generation_errs  0

/dev/sde is now called "(null)". Print the device id instead? E.g.
"[devid:4].write_io_errs 6"

# touch /mnt/test3; sync; btrfs device usage /mnt

/dev/sdb, ID: 1
   Device size:          2.00GiB
   Data,single:        624.00MiB
   Data,RAID10:        102.38MiB
   Metadata,RAID10:    102.38MiB
   System,RAID10:        4.00MiB
   Unallocated:          1.19GiB

/dev/sdc, ID: 2
   Device size:          2.00GiB
   Data,RAID10:        102.38MiB
   Metadata,RAID10:    102.38MiB
   System,single:       32.00MiB
   System,RAID10:        4.00MiB
   Unallocated:          1.76GiB

/dev/sdd, ID: 3
   Device size:          2.00GiB
   Data,RAID10:        102.38MiB
   Metadata,single:    256.00MiB
   Metadata,RAID10:    102.38MiB
   System,RAID10:        4.00MiB
   Unallocated:          1.55GiB

missing, ID: 4
   Device size:            0.00B
   Data,RAID10:        102.38MiB
   Metadata,RAID10:    102.38MiB
   System,RAID10:        4.00MiB
   Unallocated:          1.80GiB

Now we've got 'single' profiles on all devices except the missing one.
Replace the missing device before unmount or get stuck with a read-only filesystem.

# btrfs device stat /mnt

Same as before. Only old errors on the missing device.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

[ 1268.598652] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde started
[ 1268.615601] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde finished

# btrfs device stats /mnt

[/dev/sde].write_io_errs    0
[/dev/sde].read_io_errs     0
[/dev/sde].flush_io_errs    0
[/dev/sde].corruption_errs  0
[/dev/sde].generation_errs  0

Device "(null)" is back to /dev/sde and the error counts have been reset.

# btrfs b
Re: Possible Raid Bug
Hi Patrik,

Thanks for posting a test case. More below.

On 03/26/2016 07:51 PM, Patrik Lundquist wrote:
> So with the lessons learned:
>
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
>
> # mount /dev/sdb /mnt; dmesg | tail
> # touch /mnt/test1; sync; btrfs device usage /mnt
>
> Only raid10 profiles.
>
> # echo 1 >/sys/block/sde/device/delete
>
> We lost a disk.
>
> # touch /mnt/test2; sync; dmesg | tail
>
> We've got write errors.
>
> # btrfs device usage /mnt
>
> No 'single' profiles because we haven't remounted yet.
>
> # reboot
> # wipefs -a /dev/sde; reboot
>
> # mount -o degraded /dev/sdb /mnt; dmesg | tail
> # btrfs device usage /mnt
>
> Still only raid10 profiles.
>
> # touch /mnt/test3; sync; btrfs device usage /mnt
>
> Now we've got 'single' profiles. Replace now or get hosed.

Since you are replacing the failed device without an unmount/mount or
reboot, this should work. You would only need those parts of the hot
spare/auto replace patches if the test case had an unmount/mount or
reboot at this stage.

> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    0
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    0
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> We didn't inherit the /dev/sde error count. Is that a bug?

No. It's the other way around: it would have been a bug if the
replace target had inherited the error counters.

> # btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
> -sconvert=raid10,soft -vf /mnt; dmesg | tail
>
> # btrfs device usage /mnt
>
> Back to only 'raid10' profiles.
>
> # umount /mnt; mount /dev/sdb /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    11
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    2
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> The old counters are back. That's good, but wtf?

No. I doubt they are the old counters. The steps above didn't show old
error counts, but since you have created a file test3 there will be some
write_io_errs, which we don't see after the balance. So I doubt they are
the old counters; I think they are new flush errors.

> # btrfs device stats -z /dev/sde
>
> Give /dev/sde a clean bill of health. Won't warn when mounting again.

Thanks, Anand
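A quick way to act on the advice above is to scan all the counters at once. This is a sketch rather than anything from the thread, and it assumes the filesystem is mounted at /mnt:

# btrfs device stats /mnt | awk '$2 != 0'

Any lines printed are devices with non-zero write/read/flush/corruption/generation counters. Once the hardware has been dealt with, the counters for a device can be reset as shown in the thread, e.g.:

# btrfs device stats -z /dev/sde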
Re: Possible Raid Bug
Yeah, I think the Gotchas page would be a good place to give people a heads up.

--
Stephen Williams
steph...@veryfast.biz

On Sat, Mar 26, 2016, at 09:58 PM, Chris Murphy wrote:
> On Sat, Mar 26, 2016 at 8:00 AM, Stephen Williams wrote:
>
> > I know this is quite a rare occurrence for home use but for Data center
> > use this is something that will happen A LOT.
> > This really should be placed in the wiki while we wait for a fix. I can
> > see a lot of sys admins crying over this.
>
> Maybe on the gotchas page? While it's not a data loss bug, it might be
> viewed as an uptime bug because the dataset is stuck being ro and
> hence unmodifiable, until a restore to a rw volume is complete.
Re: Possible Raid Bug
On Sat, Mar 26, 2016 at 8:00 AM, Stephen Williams wrote:
> I know this is quite a rare occurrence for home use but for Data center
> use this is something that will happen A LOT.
> This really should be placed in the wiki while we wait for a fix. I can
> see a lot of sys admins crying over this.

Maybe on the gotchas page? While it's not a data loss bug, it might be
viewed as an uptime bug because the dataset is stuck being ro and hence
unmodifiable, until a restore to a rw volume is complete.

Since we can ro mount a volume, some way to safely make it a seed device
could be useful. All that's needed to make it rw is adding even a small
USB stick, for example, and then at least ro snapshots can be taken and
data migrated off the volume. A larger device used for rw would allow
this raid to be brought back online. And then, once the new array is up
and has most data restored, a short downtime to get the latest
incremental changes sent over.

Yeah, the alternative to this is a cluster, and you just consider this
one brick a loss and move on. But most regular users don't do clusters,
even with big (for them) storage.

--
Chris Murphy
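For reference, the ordinary seed-device workflow on a healthy, unmounted filesystem looks roughly like the sketch below (device names are placeholders); whether it can be applied safely to a degraded, read-only volume is exactly the open question raised above.

# btrfstune -S 1 /dev/sdb

This marks the filesystem as a seed, so it will only mount read-only from now on.

# mount /dev/sdb /mnt
# btrfs device add /dev/sdf /mnt
# mount -o remount,rw /mnt

New writes land on the added device (/dev/sdf, e.g. a small USB stick), so snapshots can be taken and data migrated off the seed.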
Re: Possible Raid Bug
On Sat, Mar 26, 2016 at 5:51 AM, Patrik Lundquist wrote:
> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    0
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    0
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> We didn't inherit the /dev/sde error count. Is that a bug?

I'm not sure where this information is stored. Presumably in the fs
metadata? So when mounted degraded the counter is zeroed; is that
what's going on?

--
Chris Murphy
Re: Possible Raid Bug
Can confirm that you only get one chance to fix the problem before the
array is dead. I know this is quite a rare occurrence for home use, but
for Data center use this is something that will happen A LOT.
This really should be placed in the wiki while we wait for a fix. I can
see a lot of sys admins crying over this.

--
Stephen Williams
steph...@veryfast.biz

On Sat, Mar 26, 2016, at 11:51 AM, Patrik Lundquist wrote:
> So with the lessons learned:
>
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
>
> # mount /dev/sdb /mnt; dmesg | tail
> # touch /mnt/test1; sync; btrfs device usage /mnt
>
> Only raid10 profiles.
>
> # echo 1 >/sys/block/sde/device/delete
>
> We lost a disk.
>
> # touch /mnt/test2; sync; dmesg | tail
>
> We've got write errors.
>
> # btrfs device usage /mnt
>
> No 'single' profiles because we haven't remounted yet.
>
> # reboot
> # wipefs -a /dev/sde; reboot
>
> # mount -o degraded /dev/sdb /mnt; dmesg | tail
> # btrfs device usage /mnt
>
> Still only raid10 profiles.
>
> # touch /mnt/test3; sync; btrfs device usage /mnt
>
> Now we've got 'single' profiles. Replace now or get hosed.
>
> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    0
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    0
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> We didn't inherit the /dev/sde error count. Is that a bug?
>
> # btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
> -sconvert=raid10,soft -vf /mnt; dmesg | tail
>
> # btrfs device usage /mnt
>
> Back to only 'raid10' profiles.
>
> # umount /mnt; mount /dev/sdb /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    11
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    2
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> The old counters are back. That's good, but wtf?
>
> # btrfs device stats -z /dev/sde
>
> Give /dev/sde a clean bill of health. Won't warn when mounting again.
Re: Possible Raid Bug
So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs    0
[/dev/sde].read_io_errs     0
[/dev/sde].flush_io_errs    0
[/dev/sde].corruption_errs  0
[/dev/sde].generation_errs  0

We didn't inherit the /dev/sde error count. Is that a bug?

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs    11
[/dev/sde].read_io_errs     0
[/dev/sde].flush_io_errs    2
[/dev/sde].corruption_errs  0
[/dev/sde].generation_errs  0

The old counters are back. That's good, but wtf?

# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.
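Condensed into one sketch, under the assumptions of this particular test (failed device is devid 4, /dev/sde is the replacement, filesystem mounted at /mnt):

# mount -o degraded /dev/sdb /mnt
# btrfs replace start -B 4 /dev/sde /mnt
# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt
# btrfs device usage /mnt
# btrfs device stats -z /dev/sde

The degraded mount is the one chance referred to above; the replace targets the missing devid; the soft-convert balance rewrites only the chunks that fell back to 'single' while degraded (-f is needed for -sconvert); the usage check should show only RAID10 again; and zeroing the stats is optional once the new disk is trusted.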
Re: Possible Raid Bug
Chris Murphy posted on Fri, 25 Mar 2016 15:34:11 -0600 as excerpted:

> Basically you get one chance to mount rw,degraded and you have to fix
> the problem at that time. And you have to balance away any phantom
> single chunks that have appeared. For what it's worth it's not the
> reboot that degraded it further, it's the unmount and then attempt to
> mount rw,degraded a 2nd time that's not allowed due to this bug.

As CMurphy says here but without mentioning the patch, as Alexander F says
in sibling to CMurphy's reply, and as I said in my longer explanation
further upthread, this is a known bug, with a patch in the pipeline that
really should have made it into 4.5 but didn't, as it was part of a larger
patch set that apparently wasn't considered ready, and unfortunately it
wasn't cherry-picked.

So right now, yes, known bug. You get one chance at a degraded-writable
mount to rebuild the array. If you crash after writing but before the
rebuild is complete, too bad, so sad; now you can only mount
degraded-readonly, and your only possibility of saving the data (other
than rebuilding with the appropriate patch) is to do just that: mount
degraded-readonly, and copy off the data to elsewhere.

But there's a patch that has been demonstrated to fix the bug, not only in
tests, but in live deployments where people found themselves with a
degraded-readonly mount until they built with the patch. Hopefully that
patch will hit the 4.6 development kernel with a CC to stable, and be
backported as necessary there, but I'm not sure it will be in 4.6 at this
point, tho it should hit mainline /eventually/. Meanwhile, the patch can
still be applied manually if necessary, and I suppose some distros may
already be applying it to their shipped versions, as it's certainly a fix
worth having.

I'll simply refer you to previous discussion on the list for the patch, as
that's where I'd have to look for it if I needed it myself before it gets
mainlined.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
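A minimal sketch of the read-only rescue described above, with the backup target (/backup) as an assumption:

# mount -o degraded,ro /dev/sdb /mnt
# rsync -aHAX /mnt/ /backup/
# umount /mnt

After that the pool can be rebuilt from scratch and restored from the copy, or a kernel with the per-chunk patch can be built to regain a writable degraded mount.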
Re: Possible Raid Bug
On 03/26/2016 04:09 AM, Alexander Fougner wrote: 2016-03-25 20:57 GMT+01:00 Patrik Lundquist : On 25 March 2016 at 18:20, Stephen Williams wrote: Your information below was very helpful and I was able to recreate the Raid array. However my initial question still stands - What if the drives dies completely? I work in a Data center and we see this quite a lot where a drive is beyond dead - The OS will literally not detect it. That's currently a weakness of Btrfs. I don't know how people deal with it in production. I think Anand Jain is working on improving it. We need this issue be fixed for the real production usage. Patch set of hot spare contains the fix for this. Currently I am fixing an issue (#5) which Yauhen reported and thats related to the auto replace. Refreshed v2 will be out soon. Thanks, Anand At this point would the Raid10 array be beyond repair? As you need the drive present in order to mount the array in degraded mode. Right... let's try it again but a little bit differently. # mount /dev/sdb /mnt Let's drop the disk. # echo 1 >/sys/block/sde/device/delete [ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache [ 3669.024934] sd 5:0:0:0: [sde] Stopping disk [ 3669.037028] ata6.00: disabled # touch /mnt/test3 # sync [ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 [ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 [ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0 [ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 0, corrupt 0, gen 0 [ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0 [ 3845.963686] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0 [ 3845.963932] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 # umount /mnt [ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd 0, flush 1, corrupt 0, gen 0 [ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 1, corrupt 0, gen 0 [ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 2, corrupt 0, gen 0 [ 4095.279373] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd 0, flush 2, corrupt 0, gen 0 [ 4095.279609] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd 0, flush 2, corrupt 0, gen 0 # mount -o degraded /dev/sdb /mnt [ 4608.113751] BTRFS info (device sdb): allowing degraded mounts [ 4608.113756] BTRFS info (device sdb): disk space caching is enabled [ 4608.113757] BTRFS: has skinny extents [ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 # touch /mnt/test4 # sync Writing to the filesystem works while the device is missing. No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde. 
[4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 4 transid 26 /dev/sde [4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 3 transid 31 /dev/sdd [4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 2 transid 31 /dev/sdc [4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 1 transid 31 /dev/sdb /dev/sde transid is lagging behind, of course. # wipefs -a /dev/sde # btrfs device scan # mount -o degraded /dev/sdb /mnt [ 507.248621] BTRFS info (device sdb): allowing degraded mounts [ 507.248626] BTRFS info (device sdb): disk space caching is enabled [ 507.248628] BTRFS: has skinny extents [ 507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 [ 507.252919] BTRFS: missing devices(1) exceeds the limit(0), single/dup profile has zero-limit tolerance for missing devices. Only ro-mount allowed in that case. writeable mount is not allowed [ 507.278277] BTRFS: open_ctree failed Well, that was unexpected! Reboot again. # mount -o degraded /dev/sdb /mnt [ 94.368514] BTRFS info (device sdd): allowing degraded mounts [ 94.368519] BTRFS info (device sdd): disk space caching is enabled [ 94.368521] BTRFS: has skinny extents [ 94.370909] BTRFS warning (device sdd): devid 4 uuid 8549a275-f663-4741-b410-79b49a1d465f is missing [ 94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 [ 94.372284] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed [ 94.395021] BTRFS: open_ctree failed No go. # mount -o degraded,ro /dev/sdb /mnt # btrfs device sta
Re: Possible Raid Bug
On Fri, Mar 25, 2016 at 1:57 PM, Patrik Lundquist wrote:
>
> Only errors on the device formerly known as /dev/sde, so why won't it
> mount degraded,rw? Now I'm stuck like Stephen.
>
> # btrfs device usage /mnt
> /dev/sdb, ID: 1
>    Device size:          2.00GiB
>    Data,single:        624.00MiB   <<--
>    Data,RAID10:        102.38MiB
>    Metadata,RAID10:    102.38MiB
>    System,RAID10:        4.00MiB
>    Unallocated:          1.19GiB
>
> /dev/sdc, ID: 2
>    Device size:          2.00GiB
>    Data,RAID10:        102.38MiB
>    Metadata,RAID10:    102.38MiB
>    System,single:       32.00MiB   <<--
>    System,RAID10:        4.00MiB
>    Unallocated:          1.76GiB
>
> /dev/sdd, ID: 3
>    Device size:          2.00GiB
>    Data,RAID10:        102.38MiB
>    Metadata,single:    256.00MiB   <<--
>    Metadata,RAID10:    102.38MiB
>    System,RAID10:        4.00MiB
>    Unallocated:          1.55GiB
>
> missing, ID: 4
>    Device size:            0.00B
>    Data,RAID10:        102.38MiB
>    Metadata,RAID10:    102.38MiB
>    System,RAID10:        4.00MiB
>    Unallocated:          1.80GiB
>
> The data written while mounted degraded is in profile 'single' and
> will have to be converted to 'raid10' once the filesystem is whole
> again.
>
> So what do I do now? Why did it degrade further after a reboot?

You're hosed. The file system is read only and can't be fixed. It's an
old bug. It's not a data loss bug, but it's a major time loss bug,
because now the volume has to be rebuilt, and that's totally unworkable
for production use.

While the appearance of the single chunks is one bug that shouldn't
happen, the worse bug is the truly bogus one that claims there aren't
enough drives for a rw degraded mount. Those single chunks aren't on the
missing drive; they're on the three remaining ones. So the rw fail is
just a bad bug. It's a PITA, but at least it's not a data loss bug.

Basically you get one chance to mount rw,degraded and you have to fix
the problem at that time. And you have to balance away any phantom
single chunks that have appeared. For what it's worth, it's not the
reboot that degraded it further; it's the unmount and then the attempt
to mount rw,degraded a 2nd time that's not allowed due to this bug.

--
Chris Murphy
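The phantom single chunks mentioned here are what the soft-convert balance in Patrik's follow-up removes. As a hedged reminder, once the filesystem is whole again:

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt
# btrfs device usage /mnt

The 'soft' filter skips chunks already in the target profile, -f is required when converting the system profile, and the usage output should be back to RAID10-only entries.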
Re: Possible Raid Bug
2016-03-25 20:57 GMT+01:00 Patrik Lundquist : > On 25 March 2016 at 18:20, Stephen Williams wrote: >> >> Your information below was very helpful and I was able to recreate the >> Raid array. However my initial question still stands - What if the >> drives dies completely? I work in a Data center and we see this quite a >> lot where a drive is beyond dead - The OS will literally not detect it. > > That's currently a weakness of Btrfs. I don't know how people deal > with it in production. I think Anand Jain is working on improving it. > >> At this point would the Raid10 array be beyond repair? As you need the >> drive present in order to mount the array in degraded mode. > > Right... let's try it again but a little bit differently. > > # mount /dev/sdb /mnt > > Let's drop the disk. > > # echo 1 >/sys/block/sde/device/delete > > [ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache > [ 3669.024934] sd 5:0:0:0: [sde] Stopping disk > [ 3669.037028] ata6.00: disabled > > # touch /mnt/test3 > # sync > > [ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd > 0, flush 0, corrupt 0, gen 0 > [ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd > 0, flush 0, corrupt 0, gen 0 > [ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd > 0, flush 0, corrupt 0, gen 0 > [ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd > 0, flush 0, corrupt 0, gen 0 > [ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd > 0, flush 1, corrupt 0, gen 0 > [ 3845.963686] BTRFS warning (device sdb): lost page write due to IO > error on /dev/sde > [ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd > 0, flush 1, corrupt 0, gen 0 > [ 3845.963932] BTRFS warning (device sdb): lost page write due to IO > error on /dev/sde > [ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd > 0, flush 1, corrupt 0, gen 0 > > # umount /mnt > > [ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd > 0, flush 1, corrupt 0, gen 0 > [ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd > 0, flush 1, corrupt 0, gen 0 > [ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd > 0, flush 2, corrupt 0, gen 0 > [ 4095.279373] BTRFS warning (device sdb): lost page write due to IO > error on /dev/sde > [ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd > 0, flush 2, corrupt 0, gen 0 > [ 4095.279609] BTRFS warning (device sdb): lost page write due to IO > error on /dev/sde > [ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd > 0, flush 2, corrupt 0, gen 0 > > # mount -o degraded /dev/sdb /mnt > > [ 4608.113751] BTRFS info (device sdb): allowing degraded mounts > [ 4608.113756] BTRFS info (device sdb): disk space caching is enabled > [ 4608.113757] BTRFS: has skinny extents > [ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd > 0, flush 1, corrupt 0, gen 0 > > # touch /mnt/test4 > # sync > > Writing to the filesystem works while the device is missing. > No new errors in dmesg after re-mounting degraded. Reboot to get back > /dev/sde. 
> > [4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d > devid 4 transid 26 /dev/sde > [4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d > devid 3 transid 31 /dev/sdd > [4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d > devid 2 transid 31 /dev/sdc > [4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d > devid 1 transid 31 /dev/sdb > > /dev/sde transid is lagging behind, of course. > > # wipefs -a /dev/sde > # btrfs device scan > > # mount -o degraded /dev/sdb /mnt > > [ 507.248621] BTRFS info (device sdb): allowing degraded mounts > [ 507.248626] BTRFS info (device sdb): disk space caching is enabled > [ 507.248628] BTRFS: has skinny extents > [ 507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd > 0, flush 1, corrupt 0, gen 0 > [ 507.252919] BTRFS: missing devices(1) exceeds the limit(0), single/dup profile has zero-limit tolerance for missing devices. Only ro-mount allowed in that case. > writeable mount is not allowed > [ 507.278277] BTRFS: open_ctree failed > > Well, that was unexpected! Reboot again. > > # mount -o degraded /dev/sdb /mnt > > [ 94.368514] BTRFS info (device sdd): allowing degraded mounts > [ 94.368519] BTRFS info (device sdd): disk space caching is enabled > [ 94.368521] BTRFS: has skinny extents > [ 94.370909] BTRFS warning (device sdd): devid 4 uuid > 8549a275-f663-4741-b410-79b49a1d465f is missing > [ 94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0, > flush 1, corrupt 0, gen 0 > [ 94.372284] BTRFS: missing devices(1) exceeds the limit(0), > writeable mount is not allowed > [ 94.395021] BTRFS: open_ctree failed > > No go. > > # mount -o degraded,ro /dev/sdb /mnt > # btrfs device stats /mnt > [/dev/sdb].write_io_errs 0 > [/dev/sdb].read_io_errs0 > [/dev/sdb].flush_io
Re: Possible Raid Bug
On 25 March 2016 at 18:20, Stephen Williams wrote: > > Your information below was very helpful and I was able to recreate the > Raid array. However my initial question still stands - What if the > drives dies completely? I work in a Data center and we see this quite a > lot where a drive is beyond dead - The OS will literally not detect it. That's currently a weakness of Btrfs. I don't know how people deal with it in production. I think Anand Jain is working on improving it. > At this point would the Raid10 array be beyond repair? As you need the > drive present in order to mount the array in degraded mode. Right... let's try it again but a little bit differently. # mount /dev/sdb /mnt Let's drop the disk. # echo 1 >/sys/block/sde/device/delete [ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache [ 3669.024934] sd 5:0:0:0: [sde] Stopping disk [ 3669.037028] ata6.00: disabled # touch /mnt/test3 # sync [ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 [ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 [ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0 [ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 0, corrupt 0, gen 0 [ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0 [ 3845.963686] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0 [ 3845.963932] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 # umount /mnt [ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd 0, flush 1, corrupt 0, gen 0 [ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 1, corrupt 0, gen 0 [ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 2, corrupt 0, gen 0 [ 4095.279373] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd 0, flush 2, corrupt 0, gen 0 [ 4095.279609] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde [ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd 0, flush 2, corrupt 0, gen 0 # mount -o degraded /dev/sdb /mnt [ 4608.113751] BTRFS info (device sdb): allowing degraded mounts [ 4608.113756] BTRFS info (device sdb): disk space caching is enabled [ 4608.113757] BTRFS: has skinny extents [ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 # touch /mnt/test4 # sync Writing to the filesystem works while the device is missing. No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde. [4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 4 transid 26 /dev/sde [4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 3 transid 31 /dev/sdd [4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 2 transid 31 /dev/sdc [4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 1 transid 31 /dev/sdb /dev/sde transid is lagging behind, of course. 
# wipefs -a /dev/sde # btrfs device scan # mount -o degraded /dev/sdb /mnt [ 507.248621] BTRFS info (device sdb): allowing degraded mounts [ 507.248626] BTRFS info (device sdb): disk space caching is enabled [ 507.248628] BTRFS: has skinny extents [ 507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 [ 507.252919] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed [ 507.278277] BTRFS: open_ctree failed Well, that was unexpected! Reboot again. # mount -o degraded /dev/sdb /mnt [ 94.368514] BTRFS info (device sdd): allowing degraded mounts [ 94.368519] BTRFS info (device sdd): disk space caching is enabled [ 94.368521] BTRFS: has skinny extents [ 94.370909] BTRFS warning (device sdd): devid 4 uuid 8549a275-f663-4741-b410-79b49a1d465f is missing [ 94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0, flush 1, corrupt 0, gen 0 [ 94.372284] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed [ 94.395021] BTRFS: open_ctree failed No go. # mount -o degraded,ro /dev/sdb /mnt # btrfs device stats /mnt [/dev/sdb].write_io_errs 0 [/dev/sdb].read_io_errs0 [/dev/sdb].flush_io_errs 0 [/dev/sdb].corruption_errs 0 [/dev/sdb].generation_errs 0 [/dev/sdc].write_io_errs 0 [/dev/sdc].read_io_errs0 [/dev/sdc].flush_io_errs 0 [/dev/sdc].corruption_errs 0 [/dev/sdc].generation_errs 0 [/dev/sdd].write_io_errs 0 [/dev/sdd].read_io_errs0 [/dev/sdd].flush_io_errs 0 [/dev/sdd].corruption_errs 0 [/dev/sdd].generation_errs 0 [(null)].wri
Re: Possible Raid Bug
Hi Patrik, [root@Xen ~]# uname -r 4.4.5-1-ARCH [root@Xen ~]# pacman -Q btrfs-progs btrfs-progs 4.4.1-1 Your information below was very helpful and I was able to recreate the Raid array. However my initial question still stands - What if the drives dies completely? I work in a Data center and we see this quite a lot where a drive is beyond dead - The OS will literally not detect it. At this point would the Raid10 array be beyond repair? As you need the drive present in order to mount the array in degraded mode. -- Stephen Williams steph...@veryfast.biz On Fri, Mar 25, 2016, at 02:57 PM, Patrik Lundquist wrote: > On Debian Stretch with Linux 4.4.6, btrfs-progs 4.4 in VirtualBox > 5.0.16 with 4*2GB VDIs: > > # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sdbe > > # mount /dev/sdb /mnt > # touch /mnt/test > # umount /mnt > > Everything fine so far. > > # wipefs -a /dev/sde > > *reboot* > > # mount /dev/sdb /mnt > mount: wrong fs type, bad option, bad superblock on /dev/sdb, >missing codepage or helper program, or other error > >In some cases useful info is found in syslog - try >dmesg | tail or so. > > # dmesg | tail > [ 85.979655] BTRFS info (device sdb): disk space caching is enabled > [ 85.979660] BTRFS: has skinny extents > [ 85.982377] BTRFS: failed to read the system array on sdb > [ 85.996793] BTRFS: open_ctree failed > > Not very informative! An information regression? > > # mount -o degraded /dev/sdb /mnt > > # dmesg | tail > [ 919.899071] BTRFS info (device sdb): allowing degraded mounts > [ 919.899075] BTRFS info (device sdb): disk space caching is enabled > [ 919.899077] BTRFS: has skinny extents > [ 919.903216] BTRFS warning (device sdb): devid 4 uuid > 8549a275-f663-4741-b410-79b49a1d465f is missing > > # touch /mnt/test2 > # ls -l /mnt/ > total 0 > -rw-r--r-- 1 root root 0 mar 25 15:17 test > -rw-r--r-- 1 root root 0 mar 25 15:42 test2 > > # btrfs device remove missing /mnt > ERROR: error removing device 'missing': unable to go below four > devices on raid10 > > As expected. > > # btrfs replace start -B missing /dev/sde /mnt > ERROR: source device must be a block device or a devid > > Would have been nice if missing worked here too. Maybe it does in > btrfs-progs 4.5? > > # btrfs replace start -B 4 /dev/sde /mnt > > # dmesg | tail > [ 1618.170619] BTRFS info (device sdb): dev_replace from disk> (devid 4) to /dev/sde started > [ 1618.184979] BTRFS info (device sdb): dev_replace from disk> (devid 4) to /dev/sde finished > > Repaired! > > # umount /mnt > # mount /dev/sdb /mnt > # dmesg | tail > [ 1729.917661] BTRFS info (device sde): disk space caching is enabled > [ 1729.917665] BTRFS: has skinny extents > > All in all it works just fine with Linux 4.4.6. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
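For the case Stephen describes, where the dead disk is not detected by the OS at all, the approach that works in the follow-ups to this thread is to mount degraded and address the disk by its btrfs devid instead of a device node. A sketch under the assumptions of this test setup (missing devid 4, new disk /dev/sde):

# mount -o degraded /dev/sdb /mnt
# btrfs filesystem show /mnt
# btrfs replace start -B 4 /dev/sde /mnt

The 'filesystem show' output lists the devids that are still present and flags "*** Some devices missing"; the devid absent from the list is the one to pass to replace. Note the caveat elsewhere in the thread that only one writable degraded mount may be possible.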
Re: Possible Raid Bug
On Debian Stretch with Linux 4.4.6, btrfs-progs 4.4 in VirtualBox 5.0.16
with 4*2GB VDIs:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt
# touch /mnt/test
# umount /mnt

Everything fine so far.

# wipefs -a /dev/sde

*reboot*

# mount /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

# dmesg | tail
[   85.979655] BTRFS info (device sdb): disk space caching is enabled
[   85.979660] BTRFS: has skinny extents
[   85.982377] BTRFS: failed to read the system array on sdb
[   85.996793] BTRFS: open_ctree failed

Not very informative! An information regression?

# mount -o degraded /dev/sdb /mnt

# dmesg | tail
[  919.899071] BTRFS info (device sdb): allowing degraded mounts
[  919.899075] BTRFS info (device sdb): disk space caching is enabled
[  919.899077] BTRFS: has skinny extents
[  919.903216] BTRFS warning (device sdb): devid 4 uuid 8549a275-f663-4741-b410-79b49a1d465f is missing

# touch /mnt/test2
# ls -l /mnt/
total 0
-rw-r--r-- 1 root root 0 mar 25 15:17 test
-rw-r--r-- 1 root root 0 mar 25 15:42 test2

# btrfs device remove missing /mnt
ERROR: error removing device 'missing': unable to go below four devices on raid10

As expected.

# btrfs replace start -B missing /dev/sde /mnt
ERROR: source device must be a block device or a devid

Would have been nice if missing worked here too. Maybe it does in
btrfs-progs 4.5?

# btrfs replace start -B 4 /dev/sde /mnt

# dmesg | tail
[ 1618.170619] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde started
[ 1618.184979] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde finished

Repaired!

# umount /mnt
# mount /dev/sdb /mnt
# dmesg | tail
[ 1729.917661] BTRFS info (device sde): disk space caching is enabled
[ 1729.917665] BTRFS: has skinny extents

All in all it works just fine with Linux 4.4.6.
Re: Possible Raid Bug
Patrik Lundquist posted on Fri, 25 Mar 2016 13:48:08 +0100 as excerpted: > On 25 March 2016 at 12:49, Stephen Williams > wrote: >> >> So catch 22, you need all the drives otherwise it won't let you mount, >> But what happens if a drive dies and the OS doesn't detect it? BTRFS >> wont allow you to mount the raid volume to remove the bad disk! > > Version of Linux and btrfs-progs? Yes, please. This can be very critical information as a lot of bugs will be fixed in new versions that are known to exist in older versions, and occasionally new ones are introduced as well, where older versions won't be affected. > You can't have a raid10 with less than 4 devices so you need to add a > new device before deleting the missing. That is of course still a > problem with a read-only fs. > > btrfs replace is also the recommended way to replace a failed device > nowadays. The wiki is outdated. In theory, what it's supposed to do in a missing device situation that takes it below the minimum (four devices for a raid10) for a given raid mode, is allow writable mounting, unless the number of missing devices is too high (more than one missing on raid10) to allow functional degraded operation. What it will often end up doing in that case, since it can't write the full raid10, is once current raid10 chunks get filled up and it needs to create more, since it doesn't have enough devices to create them in raid10, it will degrade to creating them in raid1 mode. The problem, however, is that on subsequent mounts, btrfs will see that single chunk in addition to the raid10 chunks, and will see the missing device, and knowing single mode is broken with /any/ missing devices, will at that point only mount read-only. That's a currently known bug, which effectively means you may well get only one read-write mount to fix the problem, before btrfs will see that new single chunk created in the first degraded writable mount, and will refuse to mount writable again. There are patches available that will fix this known bug by changing this detection to per-chunk, instead of per-filesystem. The degraded-writable mount will still degrade to writing single chunks, but btrfs will see that all single chunks are accounted for, and all raid10 chunks only have one device missing and thus can still be used, and the filesystem will thus continue to be write mountable, unless of course another device fails. But AFAIK, those patches were part of a patch set (the hot-spare patches) that as a whole wasn't picked for 4.5, tho by rights the per-chunk checking patches should have been cherry-picked as ready and fixing an existing bug, but weren't. So as of 4.5, AFAIK, they still have to be applied separately before build. Hopefully they'll be in 4.6. However, while lack of the per-chunk checking patch would mean an expected situation of allowing only one degraded-writable mount before no more would be allowed, unless you got it to work once and didn't mention it, and unless that btrfs fi usage was from before that writable mount as it doesn't show the single-mode chunk that would then prevent further writable mounts, it looks like you may have a possibly related, but definitely more severe bug, as it appears you aren't even being allowed what would otherwise be expected to be that one-shot degraded-writable mount. And without that, as mentioned, you have a problem, since you have to have a writable mount to repair the filesystem, and it's not allowing you even that one-shot writable mount that should be possible even with that known bug. 
Assuming you're using a current kernel and post that information, it's quite likely the dev working on the other bug will be interested, and will have you build a kernel with those patches to see if that alone fixes it, before possibly having you try various debugging patches to hone in on the problem, if it doesn't, so he can hopefully duplicate the problem himself, and ultimately come up with a fix. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Possible Raid Bug
On 25 March 2016 at 12:49, Stephen Williams wrote:
>
> So catch 22, you need all the drives otherwise it won't let you mount,
> But what happens if a drive dies and the OS doesn't detect it? BTRFS
> wont allow you to mount the raid volume to remove the bad disk!

Version of Linux and btrfs-progs?

You can't have a raid10 with less than 4 devices, so you need to add a
new device before deleting the missing one. That is of course still a
problem with a read-only fs.

btrfs replace is also the recommended way to replace a failed device
nowadays. The wiki is outdated.
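As a hedged sketch of the difference (device names are placeholders; the devid form is what ends up working against a truly missing disk elsewhere in the thread), the wiki's old approach is:

# btrfs device add /dev/sdf /mnt
# btrfs device delete missing /mnt

versus the single-step, recommended form:

# btrfs replace start -B 4 /dev/sdf /mnt

The add-then-delete sequence grows the array first and then drops the missing member; replace rebuilds the missing member directly, addressing it by devid when it no longer has a device node.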
Possible Raid Bug
Hi,

Find instructions on how to recreate below - I have a BTRFS raid 10 setup
in Virtualbox (I'm getting to grips with the Filesystem).

I have the raid mounted to /mnt like so -

[root@Xen ~]# btrfs filesystem show /mnt/
Label: none  uuid: ad1d95ee-5cdc-420f-ad30-bd16158ad8cb
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 2.00GiB used 927.00MiB path /dev/sdb
        devid    2 size 2.00GiB used 927.00MiB path /dev/sdc
        devid    3 size 2.00GiB used 927.00MiB path /dev/sdd
        devid    4 size 2.00GiB used 927.00MiB path /dev/sde

And -

[root@Xen ~]# btrfs filesystem usage /mnt/
Overall:
    Device size:                   8.00GiB
    Device allocated:              3.62GiB
    Device unallocated:            4.38GiB
    Device missing:                  0.00B
    Used:                          2.00GiB
    Free (estimated):              2.69GiB      (min: 2.69GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,RAID10: Size:1.50GiB, Used:1.00GiB
   /dev/sdb      383.50MiB
   /dev/sdc      383.50MiB
   /dev/sdd      383.50MiB
   /dev/sde      383.50MiB

Metadata,RAID10: Size:256.00MiB, Used:1.16MiB
   /dev/sdb       64.00MiB
   /dev/sdc       64.00MiB
   /dev/sdd       64.00MiB
   /dev/sde       64.00MiB

System,RAID10: Size:64.00MiB, Used:16.00KiB
   /dev/sdb       16.00MiB
   /dev/sdc       16.00MiB
   /dev/sdd       16.00MiB
   /dev/sde       16.00MiB

Unallocated:
   /dev/sdb        1.55GiB
   /dev/sdc        1.55GiB
   /dev/sdd        1.55GiB
   /dev/sde        1.55GiB

Right, so everything looks good, and I stuck some dummy files in there too -

[root@Xen ~]# ls -lh /mnt/
total 1.1G
-rw-r--r-- 1 root root 1.0G May 30  2008 1GB.zip
-rw-r--r-- 1 root root   28 Mar 24 15:16 hello
-rw-r--r-- 1 root root    6 Mar 24 16:12 niglu
-rw-r--r-- 1 root root    4 Mar 24 15:32 test

The bug appears to happen when you try and test out its ability to handle
a dead drive. If you follow the instructions here:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
it tells you to mount the drive with the 'degraded' option, however this
just does not work. Allow me to show -

1) I power off the VM and remove one of the drives (simulating a drive
   being pulled from a machine)
2) Power on the VM
3) Check DMESG - Everything looks good
4) Check how BTRFS is feeling -

Label: none  uuid: ad1d95ee-5cdc-420f-ad30-bd16158ad8cb
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 2.00GiB used 1.31GiB path /dev/sdb
        devid    2 size 2.00GiB used 1.31GiB path /dev/sdc
        devid    3 size 2.00GiB used 1.31GiB path /dev/sdd
        *** Some devices missing

So far so good, /dev/sde is missing and BTRFS has detected this.

5) Try and mount it as per the wiki so I can remove the bad drive and
   replace it with a good one -

[root@Xen ~]# mount -o degraded /dev/sdb /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

Ok, this is not good. I check DMESG -

[root@Xen ~]# dmesg | tail
[    4.416445] e1000: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    4.416672] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[    4.631812] snd_intel8x0 0000:00:05.0: white list rate for 1028:0177 is 48000
[    7.091047] floppy0: no floppy controllers found
[   27.488345] BTRFS info (device sdb): allowing degraded mounts
[   27.488348] BTRFS info (device sdb): disk space caching is enabled
[   27.488349] BTRFS: has skinny extents
[   27.489794] BTRFS warning (device sdb): devid 4 uuid ebcd53d9-5956-41d9-b0ef-c59d08e5830f is missing
[   27.491465] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
[   27.520231] BTRFS: open_ctree failed

So here lies the problem - BTRFS needs you to have all the devices present
in order to mount it as writeable, however if a drive dies spectacularly
(as they can do) you can't have it present. And as a result you cannot
mount any of the remaining drives and fix the problem.

Now you ARE able to mount it read only, but you can't issue the fix that
is recommended on the wiki, see here -

[root@Xen ~]# mount -o ro,degraded /dev/sdb /mnt/
[root@Xen ~]# btrfs device delete missing /mnt/
ERROR: error removing device 'missing': Read-only file system

So catch 22: you need all the drives, otherwise it won't let you mount.
But what happens if a drive dies and the OS doesn't detect it? BTRFS
won't allow you to mount the raid volume to remove the bad disk!