Re: RAID Assembly with Missing Empty Drive
> I think it is best that you just repeat the fixing again on the real
> disks and just make sure you have an up-to-date/latest kernel+tools when
> fixing the few damaged files.
> With btrfs inspect-internal inode-resolve 257
> you can see what file(s) are damaged.

I inspected the damaged files; they are the base directories on the two filesystems, which definitely don't have the issues seen by btrfs replace:

ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257 /mnt/@home
/mnt/@home/aidan
ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257 /mnt/@
/mnt/@/home

I've completed the recovery using the stock 4.5 kernel.org kernel. I'm running a scrub now and it's going well so far.

Once someone authorizes my wiki account request I will update the wiki with information on using replace instead of add/delete, as well as on setting up overlay devices for filesystem recovery testing.

Thanks to everyone on the list and IRC for their help with my problems.

-JohnF
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
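For anyone repeating this recovery, a minimal sketch of starting and checking the scrub John mentions, plus the per-device error counters; these are standard btrfs-progs commands rather than John's exact invocations, and the mount point /mnt is the one used throughout the thread:

# Start a scrub in the background on the mounted filesystem, then check on it
sudo btrfs scrub start /mnt
sudo btrfs scrub status /mnt

# Per-device error counters; -z resets them once you are satisfied the
# remaining errors are historical (e.g. left over from before the ddrescue)
sudo btrfs device stats /mnt
sudo btrfs device stats -z /mnt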
Re: RAID Assembly with Missing Empty Drive
On Sun, Mar 27, 2016 at 4:59 PM, John Marrett wrote:
>>> If you do want to use a newer one, I'd build against kernel.org, just
>>> because the developers only use that base. And use 4.4.6 or 4.5.
>>
>> At this point I could remove the overlays and recover the filesystem
>> permanently, however I'm also deeply indebted to the btrfs community
>> and want to give anything I can back. I've built (but not installed ;) )
>> a straight kernel.org 4.5 w/my missing device check patch applied.
>> Is there any interest or value in attempting to switch to this kernel,
>> add/delete a device and see if I experience the same errors as before
>> I tried replace? What information should I gather if I do this?
>
> I've built and installed a 4.5 straight from kernel.org with my patch.
>
> I encountered the same errors in recovery when I use add/delete
> instead of using replace, here's the sequence of commands:
>
> ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
> ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt
> # Remove first empty device
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> # Add blank drive
> ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt
> # Remove second missing device with data
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
>
> And the resulting error:
>
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> ERROR: error removing the device 'missing' - Input/output error
>
> Here's what we see in dmesg after deleting the missing device:
>
> [  588.231341] BTRFS info (device sdd): relocating block group 10560347308032 flags 17
> [  664.306122] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415
> [  664.306164] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802
> [  664.306182] BTRFS warning (device sdd): csum failed ino 257 off 695746560 csum 2566472073 expected csum 3360772439
> [  664.306191] BTRFS warning (device sdd): csum failed ino 257 off 695750656 csum 2566472073 expected csum 1205516886
> [  664.344179] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415
> [  664.344213] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802
> [  664.344224] BTRFS warning (device sdd): csum failed ino 257 off 695746560 csum 2566472073 expected csum 3360772439
> [  664.344233] BTRFS warning (device sdd): csum failed ino 257 off 695750656 csum 2566472073 expected csum 1205516886
> [  664.344684] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415
> [  664.344693] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802
>
> Is there anything of value I can do here to help address this possible
> issue in btrfs itself, or should I remove the overlays, replace the
> device and move on?
>
> Please let me know,

I think it is great that with your local patch you managed to get into a writable situation. In theory, a direct replace of the failing disk, either done internally with a new spare disk already attached and on standby (the hot-spare patchset and related work) or done manually with btrfs replace, would have prevented the few csum and other small errors. It could be that the errors have another cause than the complete failure of the original hard disk, but that won't be easy to track down conclusively.
The ddrescue step and the local patch also make it difficult to trace back what happened, and that work was done with an outdated kernel and tools. I think it is best that you simply repeat the fix on the real disks, and make sure you have an up-to-date/latest kernel+tools when fixing the few damaged files. With

btrfs inspect-internal inode-resolve 257

you can see what file(s) are damaged.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID Assembly with Missing Empty Drive
>> If you do want to use a newer one, I'd build against kernel.org, just >> because the developers only use that base. And use 4.4.6 or 4.5. > > At this point I could remove the overlays and recover the filesystem > permanently, however I'm also deeply indebted to the btrfs community > and want to give anything I can back. I've built (but not installed ;) > ) a straight kernel.org 4.5 w/my missing device check patch applied. > Is there any interest or value in attempting to switch to this kernel, > add/delete a device and see if I experience the same errors as before > I tried replace? What information should I gather if I do this? I've built and installed a 4.5 straight from kernel.org with my patch. I encountered the same errors in recovery when I use add/delete instead of using replace, here's the sequence of commands: ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt # Remove first empty device ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt # Add blank drive ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt # Remove second missing device with data ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt And the resulting error: ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt ERROR: error removing the device 'missing' - Input/output error Here's what we see in dmesg after deleting the missing device: [ 588.231341] BTRFS info (device sdd): relocating block group 10560347308032 flags 17 [ 664.306122] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415 [ 664.306164] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802 [ 664.306182] BTRFS warning (device sdd): csum failed ino 257 off 695746560 csum 2566472073 expected csum 3360772439 [ 664.306191] BTRFS warning (device sdd): csum failed ino 257 off 695750656 csum 2566472073 expected csum 1205516886 [ 664.344179] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415 [ 664.344213] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802 [ 664.344224] BTRFS warning (device sdd): csum failed ino 257 off 695746560 csum 2566472073 expected csum 3360772439 [ 664.344233] BTRFS warning (device sdd): csum failed ino 257 off 695750656 csum 2566472073 expected csum 1205516886 [ 664.344684] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415 [ 664.344693] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802 Is there anything of value I can do here to help address this possible issue in btrfs itself, or should I remove the overlays, replace the device and move on? Please let me know, -JohnF -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
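As an aside for anyone watching a "device delete missing" run like the one above: the delete relocates every chunk that had a copy on the missing device, and its progress can be followed from another shell. A small sketch using standard btrfs-progs and util-linux commands, not taken from the thread:

# Per-device allocation shrinks as chunks are relocated away from "missing"
sudo watch -n 60 btrfs device usage /mnt

# The kernel logs each block group as it is relocated
sudo dmesg -w | grep relocating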
Re: RAID Assembly with Missing Empty Drive
>> I was looking under btrfs device, sorry about that. I do have the
>> command. I tried replace and it seemed more promising than the last
>> attempt, it wrote enough data to the new drive to overflow and break
>> my overlay. I'm trying it without the overlay on the destination
>> device, I'll report back later with the results.

It looks like replace worked! I got the following final output:

ubuntu@btrfs-recovery:~$ sudo btrfs replace status /mnt
Started on 26.Mar 20:59:12, finished on 27.Mar 05:20:01, 0 write errs, 0 uncorr. read errs

The filesystem appears to be in good health, with no more missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
        Total devices 6 FS bytes used 5.47TiB
        devid    1 size 1.81TiB used 1.71TiB path /dev/sdb
        devid    2 size 1.81TiB used 1.71TiB path /dev/sde
        devid    3 size 1.82TiB used 1.72TiB path /dev/sdd
        devid    4 size 1.82TiB used 1.72TiB path /dev/sdc
        devid    5 size 2.73TiB used 2.62TiB path /dev/sda
        devid    6 size 2.73TiB used 2.62TiB path /dev/sdf

btrfs-progs v4.0

However, the dmesg output shows some errors despite the "0 uncorr. read errs" reported above:

[112178.006315] BTRFS: checksum error at logical 8576298061824 on dev /dev/sda, sector 4333289864, root 259, inode 10017264, offset 3216, length 4096, links 1 (path: mythtv/store/4663_20150809180500.mpg)
[112178.006327] btrfs_dev_stat_print_on_error: 5 callbacks suppressed
[112178.006330] BTRFS: bdev /dev/sda errs: wr 0, rd 5002, flush 0, corrupt 16, gen 0

And the underlying file does appear to be damaged:

ubuntu@btrfs-recovery:/mnt/@home/mythtv$ dd if=store/4663_20150809180500.mpg of=/dev/null
dd: error reading ‘store/4663_20150809180500.mpg’: Input/output error
63368+0 records in
63368+0 records out
3216 bytes (32 MB) copied, 1.08476 s, 29.9 MB/s

Here's some dmesg output when accessing a damaged file:

[140789.642357] BTRFS warning (device sdc): csum failed ino 10017264 off 32854016 csum 2566472073 expected csum 1193787476
[140789.642503] BTRFS warning (device sdc): csum failed ino 10017264 off 32919552 csum 2566472073 expected csum 2825707817
[140789.645768] BTRFS warning (device sdc): csum failed ino 10017264 off 32509952 csum 2566472073 expected csum 834024150

I can also see that one device has had a few errors; this is the device that was ddrescued, and it had recorded some errors before being imaged:

[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     5002
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  153
[/dev/sda].generation_errs  0

> If you do want to use a newer one, I'd build against kernel.org, just
> because the developers only use that base. And use 4.4.6 or 4.5.

At this point I could remove the overlays and recover the filesystem permanently, however I'm also deeply indebted to the btrfs community and want to give anything I can back. I've built (but not installed ;) ) a straight kernel.org 4.5 w/my missing device check patch applied. Is there any interest or value in attempting to switch to this kernel, add/delete a device and see if I experience the same errors as before I tried replace? What information should I gather if I do this?

-JohnF
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
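A hedged sketch of how the inode numbers and logical addresses in those messages can be mapped back to file paths, assuming the filesystem is mounted at /mnt with the subvolumes from the thread (the exact paths to resolve against depend on which subvolume the inode belongs to):

# "csum failed ino 10017264 ..." messages: resolve the inode within the
# subvolume it belongs to (inode numbers are per-subvolume)
sudo btrfs inspect-internal inode-resolve 10017264 /mnt/@home

# "checksum error at logical 8576298061824 ..." messages: resolve the
# logical address to file(s) anywhere on the filesystem
sudo btrfs inspect-internal logical-resolve 8576298061824 /mnt

# Collect the distinct inode numbers seen in dmesg and resolve them in bulk
dmesg | grep -o 'csum failed ino [0-9]*' | awk '{print $4}' | sort -un |
  while read ino; do sudo btrfs inspect-internal inode-resolve "$ino" /mnt/@home; done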
Re: RAID Assembly with Missing Empty Drive
On Sat, Mar 26, 2016 at 3:01 PM, John Marrettwrote: >> Well off hand it seems like the missing 2.73TB has nothing on it at >> all, and doesn't need to be counted as missing. The other missing is >> counted, and should have all of its data replicated elsewhere. But >> then you're running into csum errors. So something still isn't right, >> we just don't understand what it is. > > I'm not sure what we can do to get a better understanding of these > errors, that said it may not be necessary if replace helps, more > below. > >> Btrfs replace has been around for a while. 'man btrfs replace' the >> command takes the form 'btrfs replace start' plus three required >> pieces of information. You should be able to infer the missing devid >> using 'btrfs show' looks like it's 6. > > I was looking under btrfs device, sorry about that. I do have the > command. I tried replace and it seemed more promising than the last > attempt, it wrote enough data to the new drive to overflow and break > my overlay. I'm trying it without the overlay on the destination > device, I'll report back later with the results. > > I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove > this check: > > https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770 > > I can switch to whatever kernel though as desired. Would you prefer a > mainline ubuntu packaged kernel? Straight from kernel.org? Things are a lot more deterministic for developers and testers if you're using something current. It might not matter in this case that you're using 4.2 but all you have to do is look at the git pulls in the list archives to see many hundreds, often over 1000, btrfs changes per kernel cycle. So, lots and lots of fixes have happened since 4.2. And any bugs found in 4.2 don't really matter, because you'd have to try to reproduce in 4.4.6 or 4.5, and then the fix would go into 4.6 before it'd get backported, and then 4.2 won't be getting backports done by upstream. That's why list folks always suggest using something so recent. Again, in this case it might not matter, I don't read or understand every single commit. If you do want to use a newer one, I'd build against kernel.org, just because the developers only use that base. And use 4.4.6 or 4.5. It's reasonable to keep the overlay on the existing devices, but remove the overlay for the replacement so that you're directly writing to it. If that blows up with 4.2 you can still start over with a newer kernel. *shrug* -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID Assembly with Missing Empty Drive
> Well off hand it seems like the missing 2.73TB has nothing on it at > all, and doesn't need to be counted as missing. The other missing is > counted, and should have all of its data replicated elsewhere. But > then you're running into csum errors. So something still isn't right, > we just don't understand what it is. I'm not sure what we can do to get a better understanding of these errors, that said it may not be necessary if replace helps, more below. > Btrfs replace has been around for a while. 'man btrfs replace' the > command takes the form 'btrfs replace start' plus three required > pieces of information. You should be able to infer the missing devid > using 'btrfs show' looks like it's 6. I was looking under btrfs device, sorry about that. I do have the command. I tried replace and it seemed more promising than the last attempt, it wrote enough data to the new drive to overflow and break my overlay. I'm trying it without the overlay on the destination device, I'll report back later with the results. I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove this check: https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770 I can switch to whatever kernel though as desired. Would you prefer a mainline ubuntu packaged kernel? Straight from kernel.org? -JohnF -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID Assembly with Missing Empty Drive
On Sat, Mar 26, 2016 at 6:15 AM, John Marrettwrote: > Chris, > >> Post 'btrfs fi usage' for the fileystem. That may give some insight >> what's expected to be on all the missing drives. > > Here's the information, I believe that the missing we see in most > entries is the failed and absent drive, only the unallocated shows two > missing entries, the 2.73 TB is the missing but empty device. I don't > know if there's a way to prove it however. > > ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt > Overall: > Device size: 15.45TiB > Device allocated: 12.12TiB > Device unallocated: 3.33TiB > Device missing: 5.46TiB > Used: 10.93TiB > Free (estimated): 2.25TiB(min: 2.25TiB) > Data ratio: 2.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB(used: 0.00B) > > Data,RAID1: Size:6.04TiB, Used:5.46TiB >/dev/sda 2.61TiB >/dev/sdb 1.71TiB >/dev/sdc 1.72TiB >/dev/sdd 1.72TiB >/dev/sdf 1.71TiB >missing 2.61TiB > > Metadata,RAID1: Size:14.00GiB, Used:11.59GiB >/dev/sda 8.00GiB >/dev/sdb 2.00GiB >/dev/sdc 3.00GiB >/dev/sdd 4.00GiB >/dev/sdf 3.00GiB >missing 8.00GiB > > System,RAID1: Size:32.00MiB, Used:880.00KiB >/dev/sda 32.00MiB >missing 32.00MiB > > Unallocated: >/dev/sda 111.49GiB >/dev/sdb 98.02GiB >/dev/sdc 98.02GiB >/dev/sdd 98.02GiB >/dev/sdf 98.02GiB >missing 111.49GiB >missing 2.73TiB > > I tried to remove missing, first remove missing only removes the > 2.73TiB missing entry seen above. All the other missing entries > remain. Well off hand it seems like the missing 2.73TB has nothing on it at all, and doesn't need to be counted as missing. The other missing is counted, and should have all of its data replicated elsewhere. But then you're running into csum errors. So something still isn't right, we just don't understand what it is. > I can't "replace", it's not a valid command on my btrfs tools version; > I upgraded btrfs this morning in order to have the btrfs fi usage > command. Btrfs replace has been around for a while. 'man btrfs replace' the command takes the form 'btrfs replace start' plus three required pieces of information. You should be able to infer the missing devid using 'btrfs show' looks like it's 6. > ubuntu@btrfs-recovery:~$ sudo btrfs version > btrfs-progs v4.0 > ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs > ii btrfs-tools4.0-2 > amd64Checksumming Copy on Write Filesystem utilities I would use something newer, but btrfs replace is in 4.0. But I also don't see in this thread what kernel version you're using. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
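For reference, the "three required pieces of information" are the source (here the devid of the missing disk), the target device, and the mount point. A minimal sketch, assuming devid 6 is the missing device as inferred above and /dev/sdg is the new blank disk (the target path is hypothetical, not from the thread):

# Replace missing devid 6 with the new disk (add -f to overwrite an existing
# filesystem signature on the target)
sudo btrfs replace start 6 /dev/sdg /mnt

# Monitor progress and the final error counters
sudo btrfs replace status /mnt

# If the new disk is larger than the one it replaced, grow to use the space
sudo btrfs filesystem resize 6:max /mnt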
Re: RAID Assembly with Missing Empty Drive
Chris,

> Post 'btrfs fi usage' for the fileystem. That may give some insight
> what's expected to be on all the missing drives.

Here's the information. I believe that the missing we see in most entries is the failed and absent drive; only the unallocated section shows two missing entries, and the 2.73TiB one is the missing but empty device. I don't know if there's a way to prove it, however.

ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
Overall:
    Device size:                  15.45TiB
    Device allocated:             12.12TiB
    Device unallocated:            3.33TiB
    Device missing:                5.46TiB
    Used:                         10.93TiB
    Free (estimated):              2.25TiB      (min: 2.25TiB)
    Data ratio:                        2.00
    Metadata ratio:                    2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:6.04TiB, Used:5.46TiB
   /dev/sda        2.61TiB
   /dev/sdb        1.71TiB
   /dev/sdc        1.72TiB
   /dev/sdd        1.72TiB
   /dev/sdf        1.71TiB
   missing         2.61TiB

Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
   /dev/sda        8.00GiB
   /dev/sdb        2.00GiB
   /dev/sdc        3.00GiB
   /dev/sdd        4.00GiB
   /dev/sdf        3.00GiB
   missing         8.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sda       32.00MiB
   missing        32.00MiB

Unallocated:
   /dev/sda      111.49GiB
   /dev/sdb       98.02GiB
   /dev/sdc       98.02GiB
   /dev/sdd       98.02GiB
   /dev/sdf       98.02GiB
   missing       111.49GiB
   missing         2.73TiB

I tried to remove missing; the first "remove missing" only removes the 2.73TiB missing entry seen above. All the other missing entries remain.

I can't "replace", it's not a valid command on my btrfs tools version; I upgraded btrfs this morning in order to have the btrfs fi usage command.

ubuntu@btrfs-recovery:~$ sudo btrfs version
btrfs-progs v4.0
ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
ii  btrfs-tools    4.0-2    amd64    Checksumming Copy on Write Filesystem utilities

For those interested in my recovery techniques, here's how I rebuild the overlay loop devices. Be careful, these scripts make certain assumptions that may not be accurate for your system:

On the client:

sudo umount /mnt
sudo /etc/init.d/open-iscsi stop

On the server:

/etc/init.d/iscsitarget stop
loop_devices=$(losetup -a | grep overlay | tr ":" " " | awk ' { printf $1 " " } END { print "" } ')
for fn in /dev/mapper/sd??; do dmsetup remove $fn; done
for ln in $loop_devices; do losetup -d $ln; done
cd /home/ubuntu
rm sd*overlay
for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

Start the targets:

/etc/init.d/iscsitarget start

On the client:

sudo /etc/init.d/open-iscsi start

-JohnF
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
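One thing worth adding to this procedure, since an overlay that fills up silently breaks its snapshot (as happened during the first replace attempt): the fill level of each dm snapshot can be watched while a long replace or delete runs. A small sketch, assuming the device-mapper names created by the script above:

# For snapshot targets, "dmsetup status <name>" reports
# "<start> <length> snapshot <sectors allocated>/<total sectors> <metadata sectors>",
# or "Invalid" once the overlay has overflowed
for d in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do sudo dmsetup status "$d"; done

# Keep an eye on all of them while a long operation runs
sudo watch -n 30 dmsetup status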
Re: RAID Assembly with Missing Empty Drive
[let me try keeping the list cc'd] On Fri, Mar 25, 2016 at 7:21 PM, John Marrettwrote: > Chris, > >> Quite honestly I don't understand how Btrfs raid1 volume with two >> missing devices even permits you to mount it degraded,rw in the first >> place. > > I think you missed my previous post, it's simple, I patched the kernel > to bypass the check for missing devices with rw mounts, I did this > because one of my missing devices has no data on it, it's actually > confirmed by my mounting as you can see here: > Yeah too many emails today, and I'm skimming too much. > > ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show > Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7 > Total devices 7 FS bytes used 5.47TiB > devid1 size 1.81TiB used 1.71TiB path /dev/sde > devid2 size 1.81TiB used 1.71TiB path /dev/sda > devid3 size 1.82TiB used 1.72TiB path /dev/sdc > devid4 size 1.82TiB used 1.72TiB path /dev/sdd > devid5 size 2.73TiB used 2.62TiB path /dev/sdf > devid6 size 2.73TiB used 2.62TiB path > devid7 size 2.73TiB used 0.00 path > >> Anyway, maybe it's possible there's no dual missing metadata chunks, >> although I find it hard to believe. > > Considering the above do you still think that I may have missing metadata? Post 'btrfs fi usage' for the fileystem. That may give some insight what's expected to be on all the missing drives. > >> Because there are two devices missing, I doubt this matters, but I >> think you're better off using 'btrfs replace' for this rather than >> 'device add' followed by 'device remove'. The two catches with > > I'll try btrfs replace for the second device (with data) after > removing the first. > > Do you think my chances are better moving data off the array in read only > mode? My expectation is that whether copying everything or using replace, if either process arrives at no metadata copies found, it's going to stop whatever it's doing. Question is only how that manifests. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID Assembly with Missing Empty Drive
Chris, > Quite honestly I don't understand how Btrfs raid1 volume with two > missing devices even permits you to mount it degraded,rw in the first > place. I think you missed my previous post, it's simple, I patched the kernel to bypass the check for missing devices with rw mounts, I did this because one of my missing devices has no data on it, it's actually confirmed by my mounting as you can see here: ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7 Total devices 7 FS bytes used 5.47TiB devid1 size 1.81TiB used 1.71TiB path /dev/sde devid2 size 1.81TiB used 1.71TiB path /dev/sda devid3 size 1.82TiB used 1.72TiB path /dev/sdc devid4 size 1.82TiB used 1.72TiB path /dev/sdd devid5 size 2.73TiB used 2.62TiB path /dev/sdf devid6 size 2.73TiB used 2.62TiB path devid7 size 2.73TiB used 0.00 path > Anyway, maybe it's possible there's no dual missing metadata chunks, > although I find it hard to believe. Considering the above do you still think that I may have missing metadata? > Because there are two devices missing, I doubt this matters, but I > think you're better off using 'btrfs replace' for this rather than > 'device add' followed by 'device remove'. The two catches with I'll try btrfs replace for the second device (with data) after removing the first. Do you think my chances are better moving data off the array in read only mode? -JohnF -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
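On the question of moving the data off in read-only mode, a minimal sketch of what that could look like, assuming a degraded,ro mount at /mnt and enough space under a hypothetical /backup target (neither the target path nor the rsync options come from the thread):

sudo mount -o degraded,ro /dev/sda3 /mnt

# Copy everything, preserving hard links, ACLs and xattrs; files with no
# surviving copy show up as read errors in the log rather than stopping the run
sudo rsync -aHAX --info=progress2 /mnt/ /backup/btrfs-copy/ 2> rsync-errors.log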
Re: RAID Assembly with Missing Empty Drive
On Fri, Mar 25, 2016 at 4:31 PM, John Marrettwrote: > Continuing with my recovery efforts I've built overlay mounts of each > of the block devices supporting my btrfs filesystem as well as the new > disk I'm trying to introduce. I have patched the kernel to disable the > check for multiple missing devices. I then exported the overlayed > devices using iSCSI to a second system to attempt the recovery. > > I am able to mount the device rw, then I can remove missing devices > which removes the missing empty disk. I can add in a new device to the > filesystem and then attempt to remove the second missing disk (which > has 2.7 TB of content on it). > > Unfortunately this removal fails as follows: > > ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt > ERROR: error removing the device 'missing' - Input/output error Quite honestly I don't understand how Btrfs raid1 volume with two missing devices even permits you to mount it degraded,rw in the first place. That's rather mystifying considering the other thread where there's a 4 disk raid10 with one missing device, and rw,degraded mount is allowed only once, after that it disallows further attempts to rw,degraded mount it. Anyway, maybe it's possible there's no dual missing metadata chunks, although I find it hard to believe. But OK, maybe it works for a while and you can copy some stuff off the drives where there's at least one data copy. If there's dual missing data copies but there's still at least 1 metadata copy, then file system will just spit out noisy error messages. But if there ends up being dual missing metadata, I expect a crash, or the file system goes read only, or maybe even unmounts. I'm not sure. But once there's 0 copies of metadata I don't see how the file system can correct for that. Because there are two devices missing, I doubt this matters, but I think you're better off using 'btrfs replace' for this rather than 'device add' followed by 'device remove'. The two catches with replace: the replacement device must be as big or bigger than the one being replaced; you have to do a resize on the replacement device, using 'fi resize devid:max' to use all the space if the new one is bigger than the old device. But I suspect either the first or second replacement will fail also, it's too many missing devices. So what can happen, if there's 0 copies of metadata, is that you might not get everything off the drives before you hit the 0 copies problem and the ensuing face plant. In that case you might have to depend on btrfs restore. It could be really tedious to find out what can be scraped. But I still think you're better off than any other file system in this case, because they wouldn't even mount if there were two mirrors lost. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
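For completeness, a rough sketch of the btrfs restore fallback mentioned above; it works against the unmounted devices and copies whatever it can reach to other storage. The destination directory and the regex are hypothetical examples:

# Dry run: -D only lists what would be restored, -v prints each file
sudo btrfs restore -v -D /dev/sda3 /tmp

# Copy out what it can reach: -i keeps going past errors, -x restores xattrs,
# -m restores ownership/permissions/timestamps
sudo btrfs restore -v -i -x -m /dev/sda3 /backup/restore/

# Or limit it to a subtree with a regex on the path
sudo btrfs restore -v --path-regex '^/(|home(|/.*))$' /dev/sda3 /backup/restore/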
Re: RAID Assembly with Missing Empty Drive
Continuing with my recovery efforts, I've built overlay mounts of each of the block devices supporting my btrfs filesystem, as well as of the new disk I'm trying to introduce. I have patched the kernel to disable the check for multiple missing devices. I then exported the overlayed devices using iSCSI to a second system to attempt the recovery.

I am able to mount the device rw, then I can remove missing devices, which removes the missing empty disk. I can add a new device to the filesystem and then attempt to remove the second missing disk (which has 2.7 TB of content on it).

Unfortunately this removal fails as follows:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Input/output error

The kernel shows:

[ 2772.000680] BTRFS warning (device sdd): csum failed ino 257 off 695730176 csum 2566472073 expected csum 2706136415
[ 2772.000724] BTRFS warning (device sdd): csum failed ino 257 off 695734272 csum 2566472073 expected csum 2558511802
[ 2772.000736] BTRFS warning (device sdd): csum failed ino 257 off 695746560 csum 2566472073 expected csum 3360772439
[ 2772.000742] BTRFS warning (device sdd): csum failed ino 257 off 695750656 csum 2566472073 expected csum 1205516886
[...]

Can anyone offer any advice as to how I should proceed from here?

One safe option is recreating the array. Now that I have discovered I can mount the filesystem in degraded,ro mode, I could purchase another new disk; that would give me enough free disk space to copy all the data off this array and onto a new non-redundant array. I could then add all the drives into the new array and convert it back to RAID1.

Here's a full breakdown of the commands that I ran in the process I describe above; my patch only allows a remount with a missing device, it's not very significant:

ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt

Here we see the two missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
        Total devices 7 FS bytes used 5.47TiB
        devid    1 size 1.81TiB used 1.71TiB path /dev/sde
        devid    2 size 1.81TiB used 1.71TiB path /dev/sda
        devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
        devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
        devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
        devid    6 size 2.73TiB used 2.62TiB path
        devid    7 size 2.73TiB used 0.00 path

I remove the first missing device:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt

The unused missing device is removed:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
        Total devices 6 FS bytes used 5.47TiB
        devid    1 size 1.81TiB used 1.71TiB path /dev/sde
        devid    2 size 1.81TiB used 1.71TiB path /dev/sda
        devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
        devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
        devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
        devid    6 size 2.73TiB used 2.62TiB path

I add a new device:

ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sdb /mnt
ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
        Total devices 7 FS bytes used 5.47TiB
        devid    1 size 1.81TiB used 1.71TiB path /dev/sde
        devid    2 size 1.81TiB used 1.71TiB path /dev/sda
        devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
        devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
        devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
        devid    6 size 2.73TiB used 2.62TiB path
        devid    7 size 2.73TiB used 0.00 path /dev/sdb

Here's some more detail on the techniques necessary to get to this point, in the hope that others can benefit from them. I will also update the apparently broken parallel scripts on the mdadm wiki.

To create overlay mounts, use the following script; it will create an overlay for each device in the device list, using a sparse overlay file located in /home/ubuntu/$device-overlay. Each overlay is backed by a 512 MB file (the size passed to truncate).

for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

I used iscsitarget to export the block devices from the server; configuration is as follows (on ubuntu):

Install:

sudo apt install iscsitarget

Enable in /etc/default/iscsitarget:

ISCSITARGET_ENABLE=true

Exports in /etc/iet/ietd.conf:

Target iqn.2001-04.com.example:storage.lun1
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sda3,Type=fileio
        Alias LUN1
Target
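The message above covers the target (server) side; for the initiator side that the later mails refer to as "On Client", a small sketch with open-iscsi, assuming the server answers at 192.168.1.10 (the address, like the example IQN, is hypothetical):

sudo apt install open-iscsi

# Discover the exported LUNs and log in to all of them
sudo iscsiadm -m discovery -t sendtargets -p 192.168.1.10
sudo iscsiadm -m node --login

# The overlay-backed disks now appear as local block devices
lsblk
sudo btrfs filesystem show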
Re: RAID Assembly with Missing Empty Drive
Henk, > I asume you did btrfs device add ? > Or did you do this withbtrfs replace ? Just realised I missed this question, sorry, I performed an add followed by a (failed) delete. -JohnF > >> filesystem successfully, when I attempted to remove the failed drive I >> encountered an error. I discovered that I actually experienced a dual >> drive failure, the second drive only exhibited as failed when btrfs >> tried to write to the drives in the filesystem when I removed the >> disk. >> >> I shut down the array and imaged the failed drive using GNU ddrescue, >> I was able to recover all but a few kb from the drive. Unfortunately, >> when I imaged the drive I overwrote the drive that I had successfully >> added to the filesystem. >> >> This brings me to my current state, I now have two devices missing: >> >> - the completely failed drive >> - the empty drive that I overwrote with the second failed disks image >> >> Consequently I can't start the filesystem. I've discussed the issue in >> the past with Ke and other people on the #btrfs channel, the >> concensus; as I understood it, is that with the right patch it should >> be possible to mount either the array with the empty drive absent or >> to create a new btrfs fileystem on an empty drive and then manipulate >> its UUIDs so that it believes it's the missing UUID from the existing >> btrfs filesystem. >> >> Here's the info showing the current state of the filesystem: >> >> ubuntu@ubuntu:~$ sudo btrfs filesystem show >> warning, device 6 is missing >> warning devid 6 not found already >> warning devid 7 not found already >> Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7 >> Total devices 7 FS bytes used 5.47TiB >> devid1 size 1.81TiB used 1.71TiB path /dev/sda3 >> devid2 size 1.81TiB used 1.71TiB path /dev/sdb3 >> devid3 size 1.82TiB used 1.72TiB path /dev/sdc1 >> devid4 size 1.82TiB used 1.72TiB path /dev/sdd1 >> devid5 size 2.73TiB used 2.62TiB path /dev/sde1 >> *** Some devices missing >> btrfs-progs v4.0 > > The used kernel version might also give people some hints. > > Also, you have not stated what raid type the fs is; likely not raid6, > but rather raid 1 or 10 or 5 > btrfs filesystem usage will report and show this. > > If it is raid6, you could still fix the issue in theory. AFAIK there > are no patches to fix a dual error in case it is other raid type or > single. The only option is then to use btrfs rescue on the > umounted array and hope to copy as much as possible off the damaged fs > to other storage. > >> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt >> mount: wrong fs type, bad option, bad superblock on /dev/sda3, >>missing codepage or helper program, or other error >> >>In some cases useful info is found in syslog - try >>dmesg | tail or so. >> ubuntu@ubuntu:~$ dmesg >> [...] 
>> [ 749.322385] BTRFS info (device sde1): allowing degraded mounts >> [ 749.322404] BTRFS info (device sde1): disk space caching is enabled >> [ 749.323571] BTRFS warning (device sde1): devid 6 uuid >> f41bcb72-e88a-432f-9961-01307ec291a9 is missing >> [ 749.335543] BTRFS warning (device sde1): devid 7 uuid >> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing >> [ 749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378, >> flush 0, corrupt 0, gen 0 >> [ 749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0, >> corrupt 0, gen 0 >> [ 774.759717] BTRFS: too many missing devices, writeable mount is not >> allowed >> [ 774.804053] BTRFS: open_ctree failed >> >> Thank you in advance for your help, >> >> -JohnF >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID Assembly with Missing Empty Drive
After further discussion in #btrfs: I left out the raid level, it's raid1: ubuntu@ubuntu:~$ sudo btrfs filesystem df /mnt Data, RAID1: total=6.04TiB, used=5.46TiB System, RAID1: total=32.00MiB, used=880.00KiB Metadata, RAID1: total=14.00GiB, used=11.59GiB GlobalReserve, single: total=512.00MiB, used=0.00B It is possible to mount the filesystem with -o recover,ro It may be possible to comment out this check: https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770 And then to mount read/write, remove the failed drive, add a new drive. If there are no more interesting suggestions forthcoming I will try it, though to test I'll want to overlay the underlying devices and then export them using iSCSI, AoE or NBD in order to avoid further damage to my filesystem. Unfortunately I don't have nearly enough disk space available to make a complete copy of the data and rebuild the filesystem. -JohnF On Tue, Mar 22, 2016 at 5:18 PM, Henk Slagerwrote: > On Tue, Mar 22, 2016 at 9:19 PM, John Marrett wrote: >> I recently had a drive failure in a file server running btrfs. The >> failed drive was completely non-functional. I added a new drive to the > > I asume you did btrfs device add ? > Or did you do this withbtrfs replace ? > >> filesystem successfully, when I attempted to remove the failed drive I >> encountered an error. I discovered that I actually experienced a dual >> drive failure, the second drive only exhibited as failed when btrfs >> tried to write to the drives in the filesystem when I removed the >> disk. >> >> I shut down the array and imaged the failed drive using GNU ddrescue, >> I was able to recover all but a few kb from the drive. Unfortunately, >> when I imaged the drive I overwrote the drive that I had successfully >> added to the filesystem. >> >> This brings me to my current state, I now have two devices missing: >> >> - the completely failed drive >> - the empty drive that I overwrote with the second failed disks image >> >> Consequently I can't start the filesystem. I've discussed the issue in >> the past with Ke and other people on the #btrfs channel, the >> concensus; as I understood it, is that with the right patch it should >> be possible to mount either the array with the empty drive absent or >> to create a new btrfs fileystem on an empty drive and then manipulate >> its UUIDs so that it believes it's the missing UUID from the existing >> btrfs filesystem. >> >> Here's the info showing the current state of the filesystem: >> >> ubuntu@ubuntu:~$ sudo btrfs filesystem show >> warning, device 6 is missing >> warning devid 6 not found already >> warning devid 7 not found already >> Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7 >> Total devices 7 FS bytes used 5.47TiB >> devid1 size 1.81TiB used 1.71TiB path /dev/sda3 >> devid2 size 1.81TiB used 1.71TiB path /dev/sdb3 >> devid3 size 1.82TiB used 1.72TiB path /dev/sdc1 >> devid4 size 1.82TiB used 1.72TiB path /dev/sdd1 >> devid5 size 2.73TiB used 2.62TiB path /dev/sde1 >> *** Some devices missing >> btrfs-progs v4.0 > > The used kernel version might also give people some hints. > > Also, you have not stated what raid type the fs is; likely not raid6, > but rather raid 1 or 10 or 5 > btrfs filesystem usage will report and show this. > > If it is raid6, you could still fix the issue in theory. AFAIK there > are no patches to fix a dual error in case it is other raid type or > single. 
The only option is then to use btrfs rescue on the > umounted array and hope to copy as much as possible off the damaged fs > to other storage. > >> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt >> mount: wrong fs type, bad option, bad superblock on /dev/sda3, >>missing codepage or helper program, or other error >> >>In some cases useful info is found in syslog - try >>dmesg | tail or so. >> ubuntu@ubuntu:~$ dmesg >> [...] >> [ 749.322385] BTRFS info (device sde1): allowing degraded mounts >> [ 749.322404] BTRFS info (device sde1): disk space caching is enabled >> [ 749.323571] BTRFS warning (device sde1): devid 6 uuid >> f41bcb72-e88a-432f-9961-01307ec291a9 is missing >> [ 749.335543] BTRFS warning (device sde1): devid 7 uuid >> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing >> [ 749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378, >> flush 0, corrupt 0, gen 0 >> [ 749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0, >> corrupt 0, gen 0 >> [ 774.759717] BTRFS: too many missing devices, writeable mount is not >> allowed >> [ 774.804053] BTRFS: open_ctree failed >> >> Thank you in advance for your help, >> >> -JohnF >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line
Re: RAID Assembly with Missing Empty Drive
On Tue, Mar 22, 2016 at 9:19 PM, John Marrettwrote: > I recently had a drive failure in a file server running btrfs. The > failed drive was completely non-functional. I added a new drive to the I asume you did btrfs device add ? Or did you do this withbtrfs replace ? > filesystem successfully, when I attempted to remove the failed drive I > encountered an error. I discovered that I actually experienced a dual > drive failure, the second drive only exhibited as failed when btrfs > tried to write to the drives in the filesystem when I removed the > disk. > > I shut down the array and imaged the failed drive using GNU ddrescue, > I was able to recover all but a few kb from the drive. Unfortunately, > when I imaged the drive I overwrote the drive that I had successfully > added to the filesystem. > > This brings me to my current state, I now have two devices missing: > > - the completely failed drive > - the empty drive that I overwrote with the second failed disks image > > Consequently I can't start the filesystem. I've discussed the issue in > the past with Ke and other people on the #btrfs channel, the > concensus; as I understood it, is that with the right patch it should > be possible to mount either the array with the empty drive absent or > to create a new btrfs fileystem on an empty drive and then manipulate > its UUIDs so that it believes it's the missing UUID from the existing > btrfs filesystem. > > Here's the info showing the current state of the filesystem: > > ubuntu@ubuntu:~$ sudo btrfs filesystem show > warning, device 6 is missing > warning devid 6 not found already > warning devid 7 not found already > Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7 > Total devices 7 FS bytes used 5.47TiB > devid1 size 1.81TiB used 1.71TiB path /dev/sda3 > devid2 size 1.81TiB used 1.71TiB path /dev/sdb3 > devid3 size 1.82TiB used 1.72TiB path /dev/sdc1 > devid4 size 1.82TiB used 1.72TiB path /dev/sdd1 > devid5 size 2.73TiB used 2.62TiB path /dev/sde1 > *** Some devices missing > btrfs-progs v4.0 The used kernel version might also give people some hints. Also, you have not stated what raid type the fs is; likely not raid6, but rather raid 1 or 10 or 5 btrfs filesystem usage will report and show this. If it is raid6, you could still fix the issue in theory. AFAIK there are no patches to fix a dual error in case it is other raid type or single. The only option is then to use btrfs rescue on the umounted array and hope to copy as much as possible off the damaged fs to other storage. > ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt > mount: wrong fs type, bad option, bad superblock on /dev/sda3, >missing codepage or helper program, or other error > >In some cases useful info is found in syslog - try >dmesg | tail or so. > ubuntu@ubuntu:~$ dmesg > [...] 
> [ 749.322385] BTRFS info (device sde1): allowing degraded mounts > [ 749.322404] BTRFS info (device sde1): disk space caching is enabled > [ 749.323571] BTRFS warning (device sde1): devid 6 uuid > f41bcb72-e88a-432f-9961-01307ec291a9 is missing > [ 749.335543] BTRFS warning (device sde1): devid 7 uuid > 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing > [ 749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378, > flush 0, corrupt 0, gen 0 > [ 749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0, > corrupt 0, gen 0 > [ 774.759717] BTRFS: too many missing devices, writeable mount is not allowed > [ 774.804053] BTRFS: open_ctree failed > > Thank you in advance for your help, > > -JohnF > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
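For readers who end up in the same situation, a rough sketch of the GNU ddrescue imaging step described above, assuming /dev/sdX is the failing disk and /dev/sdY is the (correct!) spare it is imaged onto; the device names and map file location are hypothetical, and as this thread shows, double-checking the destination device matters:

# First pass: copy everything that reads cleanly, skipping slow bad areas (-n),
# recording progress in a map file so the run can be resumed
sudo ddrescue -f -n /dev/sdX /dev/sdY rescue.map

# Second pass: go back and retry the bad areas a few times
sudo ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map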