Re: RAID Assembly with Missing Empty Drive

2016-03-29 Thread John Marrett
> I think it is best that you just repeat the fix on the real
> disks and make sure you have an up-to-date/latest kernel+tools when
> fixing the few damaged files.
> With   btrfs inspect-internal inode-resolve 257 
> you can see what file(s) are damaged.

I inspected the damaged files; they are base directories on the two
filesystems that definitely don't have the issues seen by btrfs replace:

ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257
/mnt/@home
/mnt/@home/aidan
ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257 /mnt/@
/mnt/@/home

I've completed recovery using the stock 4.5 kernel.org kernel. I'm
running a scrub now and it's going well so far.
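
For anyone following along, the scrub was started and is being checked
with the usual commands, roughly like this (mount point as in my setup):

sudo btrfs scrub start /mnt
sudo btrfs scrub status /mnt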

Once someone authorizes my wiki account request I will update the wiki
with information on using replace instead of add/delete as well as
setting up overlay devices for filesystem recovery testing.

Thanks to everyone on the list and irc for their help with my problems.

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-27 Thread Henk Slager
On Sun, Mar 27, 2016 at 4:59 PM, John Marrett  wrote:
>>> If you do want to use a newer one, I'd build against kernel.org, just
>>> because the developers only use that base. And use 4.4.6 or 4.5.
>>
>> At this point I could remove the overlays and recover the filesystem
>> permanently; however, I'm also deeply indebted to the btrfs community
>> and want to give back anything I can. I've built (but not installed ;))
>> a straight kernel.org 4.5 with my missing-device check patch applied.
>> Is there any interest or value in attempting to switch to this kernel,
>> add/delete a device, and see if I experience the same errors as before
>> I tried replace? What information should I gather if I do this?
>
> I've built and installed a 4.5 straight from kernel.org with my patch.
>
> I encountered the same errors in recovery when I used add/delete
> instead of replace; here's the sequence of commands:
>
> ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
> ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt
> # Remove first empty device
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> # Add blank drive
> ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt
> # Remove second missing device with data
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
>
> And the resulting error:
>
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> ERROR: error removing the device 'missing' - Input/output error
>
> Here's what we see in dmesg after deleting the missing device:
>
> [  588.231341] BTRFS info (device sdd): relocating block group
> 10560347308032 flags 17
> [  664.306122] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.306164] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
> [  664.306182] BTRFS warning (device sdd): csum failed ino 257 off
> 695746560 csum 2566472073 expected csum 3360772439
> [  664.306191] BTRFS warning (device sdd): csum failed ino 257 off
> 695750656 csum 2566472073 expected csum 1205516886
> [  664.344179] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.344213] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
> [  664.344224] BTRFS warning (device sdd): csum failed ino 257 off
> 695746560 csum 2566472073 expected csum 3360772439
> [  664.344233] BTRFS warning (device sdd): csum failed ino 257 off
> 695750656 csum 2566472073 expected csum 1205516886
> [  664.344684] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.344693] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
>
> Is there anything of value I can do here to help address this possible
> issue in btrfs itself, or should I remove the overlays, replace the
> device and move on?
>
> Please let me know,

I think it is great that with your local patch you managed to get into
a writable situation.
In theory, a direct replace of the failing disk, either internally (with,
for example, a new spare disk already attached and on standby, as the
hot-spare patchset provides) or manually with btrfs replace, would have
prevented the few csum and other small errors. It could be that the
errors have a cause other than the initial complete hard disk failure,
but that won't be easy to track down conclusively. The ddrescue step
and the local patch also make tracing back difficult, and that work was
based on an outdated kernel and tools.

I think it is best that you just repeat the fix on the real
disks and make sure you have an up-to-date/latest kernel+tools when
fixing the few damaged files.
With   btrfs inspect-internal inode-resolve 257 
you can see what file(s) are damaged.


Re: RAID Assembly with Missing Empty Drive

2016-03-27 Thread John Marrett
>> If you do want to use a newer one, I'd build against kernel.org, just
>> because the developers only use that base. And use 4.4.6 or 4.5.
>
> At this point I could remove the overlays and recover the filesystem
> permanently; however, I'm also deeply indebted to the btrfs community
> and want to give back anything I can. I've built (but not installed ;))
> a straight kernel.org 4.5 with my missing-device check patch applied.
> Is there any interest or value in attempting to switch to this kernel,
> add/delete a device, and see if I experience the same errors as before
> I tried replace? What information should I gather if I do this?

I've built and installed a 4.5 straight from kernel.org with my patch.

I encountered the same errors in recovery when I used add/delete
instead of replace; here's the sequence of commands:

ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt
# Remove first empty device
ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
# Add blank drive
ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt
# Remove second missing device with data
ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt

And the resulting error:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Input/output error

Here's what we see in dmesg after deleting the missing device:

[  588.231341] BTRFS info (device sdd): relocating block group
10560347308032 flags 17
[  664.306122] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.306164] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[  664.306182] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[  664.306191] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[  664.344179] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.344213] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[  664.344224] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[  664.344233] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[  664.344684] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.344693] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802

Is there anything of value I can do here to help address this possible
issue in btrfs itself, or should I remove the overlays, replace the
device and move on?

Please let me know,

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-27 Thread John Marrett
>> I was looking under btrfs device, sorry about that. I do have the
>> command. I tried replace and it seemed more promising than the last
>> attempt; it wrote enough data to the new drive to overflow and break
>> my overlay. I'm trying it without the overlay on the destination
>> device, and I'll report back later with the results.

It looks like replace worked!

I got the following final output:

ubuntu@btrfs-recovery:~$ sudo btrfs replace status /mnt
Started on 26.Mar 20:59:12, finished on 27.Mar 05:20:01, 0 write errs,
0 uncorr. read errs

The filesystem appears to be in good health, with no more missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 6 FS bytes used 5.47TiB
devid1 size 1.81TiB used 1.71TiB path /dev/sdb
devid2 size 1.81TiB used 1.71TiB path /dev/sde
devid3 size 1.82TiB used 1.72TiB path /dev/sdd
devid4 size 1.82TiB used 1.72TiB path /dev/sdc
devid5 size 2.73TiB used 2.62TiB path /dev/sda
devid6 size 2.73TiB used 2.62TiB path /dev/sdf

btrfs-progs v4.0

However, the dmesg output shows some errors despite the 0 uncorr. read
errs reported above:

[112178.006315] BTRFS: checksum error at logical 8576298061824 on dev /dev/sda,
sector 4333289864, root 259, inode 10017264, offset 3216, length 4096, links
 1 (path: mythtv/store/4663_20150809180500.mpg)
[112178.006327] btrfs_dev_stat_print_on_error: 5 callbacks suppressed
[112178.006330] BTRFS: bdev /dev/sda errs: wr 0, rd 5002, flush 0, corrupt 16, gen 0

And the underlying file does appear to be damaged:

ubuntu@btrfs-recovery:/mnt/@home/mythtv$ dd if=store/4663_20150809180500.mpg of=/dev/null
dd: error reading ‘store/4663_20150809180500.mpg’: Input/output error
63368+0 records in
63368+0 records out
32444416 bytes (32 MB) copied, 1.08476 s, 29.9 MB/s

Here's some dmesg output when accessing a damaged file:

[140789.642357] BTRFS warning (device sdc): csum failed ino 10017264
off 32854016 csum 2566472073 expected csum 1193787476
[140789.642503] BTRFS warning (device sdc): csum failed ino 10017264
off 32919552 csum 2566472073 expected csum 2825707817
[140789.645768] BTRFS warning (device sdc): csum failed ino 10017264
off 32509952 csum 2566472073 expected csum 834024150

I can also see that one device has had a few errors; this is the
device that recorded some read errors before it was imaged with
ddrescue:

[/dev/sda].write_io_errs   0
[/dev/sda].read_io_errs5002
[/dev/sda].flush_io_errs   0
[/dev/sda].corruption_errs 153
[/dev/sda].generation_errs 0
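
For completeness, these per-device counters are what the device stats
command reports; something like this should reproduce them (mount point
as in my setup):

ubuntu@btrfs-recovery:~$ sudo btrfs device stats /mnt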

> If you do want to use a newer one, I'd build against kernel.org, just
> because the developers only use that base. And use 4.4.6 or 4.5.

At this point I could remove the overlays and recover the filesystem
permanently; however, I'm also deeply indebted to the btrfs community
and want to give back anything I can. I've built (but not installed ;))
a straight kernel.org 4.5 with my missing-device check patch applied.
Is there any interest or value in attempting to switch to this kernel,
add/delete a device, and see if I experience the same errors as before
I tried replace? What information should I gather if I do this?

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-26 Thread Chris Murphy
On Sat, Mar 26, 2016 at 3:01 PM, John Marrett  wrote:
>> Well off hand it seems like the missing 2.73TB has nothing on it at
>> all, and doesn't need to be counted as missing. The other missing is
>> counted, and should have all of its data replicated elsewhere. But
>> then you're running into csum errors. So something still isn't right,
>> we just don't understand what it is.
>
> I'm not sure what we can do to get a better understanding of these
> errors; that said, it may not be necessary if replace helps (more
> below).
>
>> Btrfs replace has been around for a while. 'man btrfs replace': the
>> command takes the form 'btrfs replace start' plus three required
>> pieces of information. You should be able to infer the missing devid
>> using 'btrfs fi show'; it looks like it's 6.
>
> I was looking under btrfs device, sorry about that. I do have the
> command. I tried replace and it seemed more promising than the last
> attempt; it wrote enough data to the new drive to overflow and break
> my overlay. I'm trying it without the overlay on the destination
> device, and I'll report back later with the results.
>
> I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove
> this check:
>
> https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770
>
> I can switch to whatever kernel is desired, though. Would you prefer a
> mainline Ubuntu-packaged kernel? Straight from kernel.org?

Things are a lot more deterministic for developers and testers if
you're using something current. It might not matter in this case that
you're using 4.2, but all you have to do is look at the git pulls in
the list archives to see many hundreds, often over 1000, btrfs changes
per kernel cycle. So, lots and lots of fixes have happened since 4.2.
And any bugs found in 4.2 don't really matter, because you'd have to
try to reproduce in 4.4.6 or 4.5, and then the fix would go into 4.6
before it'd get backported, and then 4.2 won't be getting backports
done by upstream. That's why list folks always suggest using something
so recent. Again, in this case it might not matter; I don't read or
understand every single commit.

If you do want to use a newer one, I'd build against kernel.org, just
because the developers only use that base. And use 4.4.6 or 4.5.

It's reasonable to keep the overlay on the existing devices, but
remove the overlay for the replacement so that you're directly writing
to it. If that blows up with 4.2 you can still start over with a newer
kernel. *shrug*


-- 
Chris Murphy


Re: RAID Assembly with Missing Empty Drive

2016-03-26 Thread John Marrett
> Well off hand it seems like the missing 2.73TB has nothing on it at
> all, and doesn't need to be counted as missing. The other missing is
> counted, and should have all of its data replicated elsewhere. But
> then you're running into csum errors. So something still isn't right,
> we just don't understand what it is.

I'm not sure what we can do to get a better understanding of these
errors; that said, it may not be necessary if replace helps (more
below).

> Btrfs replace has been around for a while. 'man btrfs replace': the
> command takes the form 'btrfs replace start' plus three required
> pieces of information. You should be able to infer the missing devid
> using 'btrfs fi show'; it looks like it's 6.

I was looking under btrfs device, sorry about that. I do have the
command. I tried replace and it seemed more promising than the last
attempt; it wrote enough data to the new drive to overflow and break
my overlay. I'm trying it without the overlay on the destination
device, and I'll report back later with the results.
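
As an aside, the overlay fill level can be watched on the server while
a replace or rebuild runs; a rough sketch, assuming the dm snapshot
names from my overlay script (snapshot targets report allocated/total
sectors):

for d in /dev/mapper/sd??; do
  echo -n "$d: "
  dmsetup status "$d"
done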

I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove
this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

I can switch to whatever kernel is desired, though. Would you prefer a
mainline Ubuntu-packaged kernel? Straight from kernel.org?

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-26 Thread Chris Murphy
On Sat, Mar 26, 2016 at 6:15 AM, John Marrett  wrote:
> Chris,
>
>> Post 'btrfs fi usage' for the filesystem. That may give some insight
>> into what's expected to be on all the missing drives.
>
> Here's the information. I believe the "missing" we see in most
> entries is the failed and absent drive; only the unallocated section
> shows two missing entries, and the 2.73TiB one is the missing but empty
> device. I don't know if there's a way to prove it, however.
>
> ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
> Overall:
> Device size:  15.45TiB
> Device allocated:  12.12TiB
> Device unallocated:   3.33TiB
> Device missing:   5.46TiB
> Used:  10.93TiB
> Free (estimated):   2.25TiB(min: 2.25TiB)
> Data ratio:  2.00
> Metadata ratio:  2.00
> Global reserve: 512.00MiB(used: 0.00B)
>
> Data,RAID1: Size:6.04TiB, Used:5.46TiB
>/dev/sda   2.61TiB
>/dev/sdb   1.71TiB
>/dev/sdc   1.72TiB
>/dev/sdd   1.72TiB
>/dev/sdf   1.71TiB
>missing   2.61TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
>/dev/sda   8.00GiB
>/dev/sdb   2.00GiB
>/dev/sdc   3.00GiB
>/dev/sdd   4.00GiB
>/dev/sdf   3.00GiB
>missing   8.00GiB
>
> System,RAID1: Size:32.00MiB, Used:880.00KiB
>/dev/sda  32.00MiB
>missing  32.00MiB
>
> Unallocated:
>/dev/sda 111.49GiB
>/dev/sdb  98.02GiB
>/dev/sdc  98.02GiB
>/dev/sdd  98.02GiB
>/dev/sdf  98.02GiB
>missing 111.49GiB
>missing   2.73TiB
>
> I tried to remove missing; the first 'remove missing' only removes the
> 2.73TiB missing entry seen above. All the other missing entries
> remain.

Well off hand it seems like the missing 2.73TB has nothing on it at
all, and doesn't need to be counted as missing. The other missing is
counted, and should have all of its data replicated elsewhere. But
then you're running into csum errors. So something still isn't right,
we just don't understand what it is.


> I can't "replace"; it's not a valid command in my btrfs tools version.
> I upgraded btrfs this morning in order to have the btrfs fi usage
> command.

Btrfs replace has been around for a while. 'man btrfs replace': the
command takes the form 'btrfs replace start' plus three required
pieces of information. You should be able to infer the missing devid
using 'btrfs fi show'; it looks like it's 6.
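
For illustration only, with the missing devid as the source it would
look roughly like this (devid 6 per the 'fi show' output; /dev/sdX
stands in for the new disk):

sudo btrfs replace start 6 /dev/sdX /mnt
sudo btrfs replace status /mnt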



> ubuntu@btrfs-recovery:~$ sudo btrfs version
> btrfs-progs v4.0
> ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
> ii  btrfs-tools    4.0-2    amd64    Checksumming Copy on Write Filesystem utilities

I would use something newer, but btrfs replace is in 4.0. But I also
don't see in this thread what kernel version you're using.



-- 
Chris Murphy


Re: RAID Assembly with Missing Empty Drive

2016-03-26 Thread John Marrett
Chris,

> Post 'btrfs fi usage' for the filesystem. That may give some insight
> into what's expected to be on all the missing drives.

Here's the information. I believe the "missing" we see in most
entries is the failed and absent drive; only the unallocated section
shows two missing entries, and the 2.73TiB one is the missing but empty
device. I don't know if there's a way to prove it, however.

ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
Overall:
Device size:  15.45TiB
Device allocated:  12.12TiB
Device unallocated:   3.33TiB
Device missing:   5.46TiB
Used:  10.93TiB
Free (estimated):   2.25TiB(min: 2.25TiB)
Data ratio:  2.00
Metadata ratio:  2.00
Global reserve: 512.00MiB(used: 0.00B)

Data,RAID1: Size:6.04TiB, Used:5.46TiB
   /dev/sda   2.61TiB
   /dev/sdb   1.71TiB
   /dev/sdc   1.72TiB
   /dev/sdd   1.72TiB
   /dev/sdf   1.71TiB
   missing   2.61TiB

Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
   /dev/sda   8.00GiB
   /dev/sdb   2.00GiB
   /dev/sdc   3.00GiB
   /dev/sdd   4.00GiB
   /dev/sdf   3.00GiB
   missing   8.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sda  32.00MiB
   missing  32.00MiB

Unallocated:
   /dev/sda 111.49GiB
   /dev/sdb  98.02GiB
   /dev/sdc  98.02GiB
   /dev/sdd  98.02GiB
   /dev/sdf  98.02GiB
   missing 111.49GiB
   missing   2.73TiB

I tried to remove missing; the first 'remove missing' only removes the
2.73TiB missing entry seen above. All the other missing entries
remain.

I can't "replace"; it's not a valid command in my btrfs tools version.
I upgraded btrfs this morning in order to have the btrfs fi usage
command.

ubuntu@btrfs-recovery:~$ sudo btrfs version
btrfs-progs v4.0
ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
ii  btrfs-tools    4.0-2    amd64    Checksumming Copy on Write Filesystem utilities

For those interested in my recovery techniques, here's how I rebuild
the overlay loop devices. Be careful: these scripts make certain
assumptions that may not be accurate for your system.

On Client:

sudo umount /mnt
sudo /etc/init.d/open-iscsi stop

On Server:

/etc/init.d/iscsitarget stop
loop_devices=$(losetup -a | grep overlay | tr ":" " " | awk ' { printf $1 " " } END { print "" } ')
for fn in /dev/mapper/sd??; do dmsetup remove $fn; done
for ln in $loop_devices; do losetup -d $ln; done
cd /home/ubuntu
rm sd*overlay

for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

Start the targets

/etc/init.d/iscsitarget start

On Client:

sudo /etc/init.d/open-iscsi start
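
The client-side discovery and login steps aren't shown above; with
open-iscsi they look roughly like this (the portal address is a
placeholder for the server's IP):

sudo iscsiadm -m discovery -t sendtargets -p 192.168.1.10
sudo iscsiadm -m node --login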

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-25 Thread Chris Murphy
[let me try keeping the list cc'd]

On Fri, Mar 25, 2016 at 7:21 PM, John Marrett  wrote:
> Chris,
>
>> Quite honestly I don't understand how a Btrfs raid1 volume with two
>> missing devices even permits you to mount it degraded,rw in the first
>> place.
>
> I think you missed my previous post. It's simple: I patched the kernel
> to bypass the check for missing devices on rw mounts. I did this
> because one of my missing devices has no data on it, which is actually
> confirmed by my mount, as you can see here:
>

Yeah too many emails today, and I'm skimming too much.



>
> ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
> Total devices 7 FS bytes used 5.47TiB
> devid1 size 1.81TiB used 1.71TiB path /dev/sde
> devid2 size 1.81TiB used 1.71TiB path /dev/sda
> devid3 size 1.82TiB used 1.72TiB path /dev/sdc
> devid4 size 1.82TiB used 1.72TiB path /dev/sdd
> devid5 size 2.73TiB used 2.62TiB path /dev/sdf
> devid6 size 2.73TiB used 2.62TiB path
> devid7 size 2.73TiB used 0.00 path
>
> Anyway, maybe it's possible there are no dual-missing metadata chunks,
>> although I find it hard to believe.
>
> Considering the above do you still think that I may have missing metadata?

Post 'btrfs fi usage' for the filesystem. That may give some insight
into what's expected to be on all the missing drives.

>
>> Because there are two devices missing, I doubt this matters, but I
>> think you're better off using 'btrfs replace' for this rather than
>> 'device add' followed by 'device remove'. The two catches with
>
> I'll try btrfs replace for the second device (with data) after
> removing the first.
>
> Do you think my chances are better moving data off the array in read only 
> mode?

My expectation is that whether copying everything or using replace, if
either process arrives at no metadata copies found, it's going to stop
whatever it's doing. The question is only how that manifests.


-- 
Chris Murphy


Re: RAID Assembly with Missing Empty Drive

2016-03-25 Thread John Marrett
Chris,

> Quite honestly I don't understand how a Btrfs raid1 volume with two
> missing devices even permits you to mount it degraded,rw in the first
> place.

I think you missed my previous post. It's simple: I patched the kernel
to bypass the check for missing devices on rw mounts. I did this
because one of my missing devices has no data on it, which is actually
confirmed by my mount, as you can see here:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 7 FS bytes used 5.47TiB
devid1 size 1.81TiB used 1.71TiB path /dev/sde
devid2 size 1.81TiB used 1.71TiB path /dev/sda
devid3 size 1.82TiB used 1.72TiB path /dev/sdc
devid4 size 1.82TiB used 1.72TiB path /dev/sdd
devid5 size 2.73TiB used 2.62TiB path /dev/sdf
devid6 size 2.73TiB used 2.62TiB path
devid7 size 2.73TiB used 0.00 path

> Anyway, maybe it's possible there are no dual-missing metadata chunks,
> although I find it hard to believe.

Considering the above do you still think that I may have missing metadata?

> Because there are two devices missing, I doubt this matters, but I
> think you're better off using 'btrfs replace' for this rather than
> 'device add' followed by 'device remove'. The two catches with

I'll try btrfs replace for the second device (with data) after
removing the first.

Do you think my chances are better moving data off the array in read only mode?

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-25 Thread Chris Murphy
On Fri, Mar 25, 2016 at 4:31 PM, John Marrett  wrote:
> Continuing with my recovery efforts I've built overlay mounts of each
> of the block devices supporting my btrfs filesystem as well as the new
> disk I'm trying to introduce. I have patched the kernel to disable the
> check for multiple missing devices. I then exported the overlayed
> devices using iSCSI to a second system to attempt the recovery.
>
> I am able to mount the device rw; then I can remove missing devices,
> which removes the missing empty disk. I can add a new device to the
> filesystem and then attempt to remove the second missing disk (which
> has 2.7 TB of content on it).
>
> Unfortunately this removal fails as follows:
>
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> ERROR: error removing the device 'missing' - Input/output error


Quite honestly I don't understand how a Btrfs raid1 volume with two
missing devices even permits you to mount it degraded,rw in the first
place. That's rather mystifying considering the other thread where
there's a 4 disk raid10 with one missing device, and rw,degraded mount
is allowed only once, after that it disallows further attempts to
rw,degraded mount it.

Anyway, maybe it's possible there are no dual-missing metadata chunks,
although I find it hard to believe. But OK, maybe it works for a while
and you can copy some stuff off the drives where there's at least one
data copy. If there are dual-missing data copies but there's still at
least 1 metadata copy, then the file system will just spit out noisy
error messages. But if there ends up being dual-missing metadata, I
expect a crash, or the file system goes read-only, or maybe it even
unmounts. I'm not sure. But once there are 0 copies of metadata I don't
see how the file system can correct for that.

Because there are two devices missing, I doubt this matters, but I
think you're better off using 'btrfs replace' for this rather than
'device add' followed by 'device remove'. The two catches with
replace: the replacement device must be as big as or bigger than the one
being replaced, and you have to do a resize on the replacement device,
using 'fi resize devid:max', to use all the space if the new one is
bigger than the old device. But I suspect either the first or second
replacement will fail as well; it's too many missing devices.
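
To make the second catch concrete, a sketch of the resize step,
assuming the replacement ends up as devid 6 and the filesystem is
mounted at /mnt:

btrfs filesystem resize 6:max /mnt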

So what can happen, if there's 0 copies of metadata, is that you might
not get everything off the drives before you hit the 0 copies problem
and the ensuing face plant. In that case you might have to depend on
btrfs restore. It could be really tedious to find out what can be
scraped. But I still think you're better off than any other file
system in this case, because they wouldn't even mount if there were
two mirrors lost.


-- 
Chris Murphy


Re: RAID Assembly with Missing Empty Drive

2016-03-25 Thread John Marrett
Continuing with my recovery efforts I've built overlay mounts of each
of the block devices supporting my btrfs filesystem as well as the new
disk I'm trying to introduce. I have patched the kernel to disable the
check for multiple missing devices. I then exported the overlayed
devices using iSCSI to a second system to attempt the recovery.

I am able to mount the device rw; then I can remove missing devices,
which removes the missing empty disk. I can add a new device to the
filesystem and then attempt to remove the second missing disk (which
has 2.7 TB of content on it).

Unfortunately this removal fails as follows:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Input/output error

The kernel shows:

[ 2772.000680] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[ 2772.000724] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[ 2772.000736] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[ 2772.000742] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[...]

Can anyone offer any advice as to how I should proceed from here?

One safe option is recreating the array. Now that I have discovered I
can mount the filesystem in degraded,ro mode, I could purchase another
new disk; this will give me enough free disk space to copy all the
data off this array and onto a new non-redundant array. I can then add
all the drives to the new array and convert it back to RAID1.
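
A rough sketch of that fallback, with device names, mount points and
the exact steps purely illustrative:

mkfs.btrfs -d single -m dup /dev/sdg          # new disk, no redundancy yet
mount /dev/sdg /mnt-new
cp -a /mnt/. /mnt-new/                        # copy everything off the degraded fs
btrfs device add /dev/sda /dev/sdb /mnt-new   # then reuse the old drives
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt-new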

Here's a full breakdown of the commands that I ran in the process I
describe above; my patch only allows a remount with a missing device,
so it's not very significant:

ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt

Here we see the two missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 7 FS bytes used 5.47TiB
devid1 size 1.81TiB used 1.71TiB path /dev/sde
devid2 size 1.81TiB used 1.71TiB path /dev/sda
devid3 size 1.82TiB used 1.72TiB path /dev/sdc
devid4 size 1.82TiB used 1.72TiB path /dev/sdd
devid5 size 2.73TiB used 2.62TiB path /dev/sdf
devid6 size 2.73TiB used 2.62TiB path
devid7 size 2.73TiB used 0.00 path

I remove the first missing device:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt

The unused missing device is removed:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 6 FS bytes used 5.47TiB
devid1 size 1.81TiB used 1.71TiB path /dev/sde
devid2 size 1.81TiB used 1.71TiB path /dev/sda
devid3 size 1.82TiB used 1.72TiB path /dev/sdc
devid4 size 1.82TiB used 1.72TiB path /dev/sdd
devid5 size 2.73TiB used 2.62TiB path /dev/sdf
devid6 size 2.73TiB used 2.62TiB path

I add a new device:

ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sdb /mnt
ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 7 FS bytes used 5.47TiB
devid1 size 1.81TiB used 1.71TiB path /dev/sde
devid2 size 1.81TiB used 1.71TiB path /dev/sda
devid3 size 1.82TiB used 1.72TiB path /dev/sdc
devid4 size 1.82TiB used 1.72TiB path /dev/sdd
devid5 size 2.73TiB used 2.62TiB path /dev/sdf
devid6 size 2.73TiB used 2.62TiB path
devid7 size 2.73TiB used 0.00 path /dev/sdb

Here are some more details on the techniques necessary to get to this
point, in the hope that others can benefit from them. I will also
update the apparently broken parallel scripts on the mdadm wiki.

To create overlay mounts, use the following script; it will create
overlays for each device in the device list, using a sparse overlay
file located in /home/ubuntu/$device-overlay. Each overlay is backed by
a 512 MB file (the size passed to truncate).

for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

I used iscsitarget to export the block devices from the server; the
configuration files are as follows (on Ubuntu):

Install

sudo apt install iscsitarget

Enable

/etc/default/iscsitarget
ISCSITARGET_ENABLE=true

Exports

/etc/iet/ietd.conf

Target iqn.2001-04.com.example:storage.lun1
IncomingUser
OutgoingUser
Lun 0 Path=/dev/mapper/sda3,Type=fileio
Alias LUN1

Target 

Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread John Marrett
Henk,

> I assume you did btrfs device add  ?
> Or did you do this with btrfs replace  ?

Just realised I missed this question; sorry. I performed an add
followed by a (failed) delete.

-JohnF

>
>> filesystem successfully; when I attempted to remove the failed drive I
>> encountered an error. I discovered that I had actually experienced a dual
>> drive failure; the second drive only showed as failed when btrfs
>> tried to write to the drives in the filesystem when I removed the
>> disk.
>>
>> I shut down the array and imaged the failed drive using GNU ddrescue;
>> I was able to recover all but a few kB from the drive. Unfortunately,
>> when I imaged the drive I overwrote the drive that I had successfully
>> added to the filesystem.
>>
>> This brings me to my current state: I now have two devices missing:
>>
>>  - the completely failed drive
>>  - the empty drive that I overwrote with the second failed disk's image
>>
>> Consequently I can't start the filesystem. I've discussed the issue in
>> the past with Ke and other people on the #btrfs channel; the
>> consensus, as I understood it, is that with the right patch it should
>> be possible either to mount the array with the empty drive absent or
>> to create a new btrfs filesystem on an empty drive and then manipulate
>> its UUIDs so that it believes it's the missing UUID from the existing
>> btrfs filesystem.
>>
>> Here's the info showing the current state of the filesystem:
>>
>> ubuntu@ubuntu:~$ sudo btrfs filesystem show
>> warning, device 6 is missing
>> warning devid 6 not found already
>> warning devid 7 not found already
>> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>> Total devices 7 FS bytes used 5.47TiB
>> devid1 size 1.81TiB used 1.71TiB path /dev/sda3
>> devid2 size 1.81TiB used 1.71TiB path /dev/sdb3
>> devid3 size 1.82TiB used 1.72TiB path /dev/sdc1
>> devid4 size 1.82TiB used 1.72TiB path /dev/sdd1
>> devid5 size 2.73TiB used 2.62TiB path /dev/sde1
>> *** Some devices missing
>> btrfs-progs v4.0
>
> The used kernel version might also give people some hints.
>
> Also, you have not stated what raid type the fs is; likely not raid6,
> but rather raid1, 10, or 5.
> btrfs filesystem usage  will report and show this.
>
> If it is raid6, you could still fix the issue in theory. AFAIK there
> are no patches to fix a dual error if it is another raid type or
> single. The only option then is to use   btrfs rescue   on the
> unmounted array and hope to copy as much as possible off the damaged fs
> to other storage.
>
>> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
>> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>>missing codepage or helper program, or other error
>>
>>In some cases useful info is found in syslog - try
>>dmesg | tail or so.
>> ubuntu@ubuntu:~$ dmesg
>> [...]
>> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
>> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
>> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
>> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
>> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
>> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
>> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
>> flush 0, corrupt 0, gen 0
>> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
>> corrupt 0, gen 0
>> [  774.759717] BTRFS: too many missing devices, writeable mount is not 
>> allowed
>> [  774.804053] BTRFS: open_ctree failed
>>
>> Thank you in advance for your help,
>>
>> -JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread John Marrett
After further discussion in #btrfs:

I left out the raid level; it's raid1:

ubuntu@ubuntu:~$ sudo btrfs filesystem df /mnt
Data, RAID1: total=6.04TiB, used=5.46TiB
System, RAID1: total=32.00MiB, used=880.00KiB
Metadata, RAID1: total=14.00GiB, used=11.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

It is possible to mount the filesystem with -o recover,ro

It may be possible to comment out this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

And then to mount read/write, remove the failed drive, add a new
drive. If there are no more interesting suggestions forthcoming I will
try it, though to test I'll want to overlay the underlying devices and
then export them using iSCSI, AoE or NBD in order to avoid further
damage to my filesystem.

Unfortunately I don't have nearly enough disk space available to make
a complete copy of the data and rebuild the filesystem.

-JohnF

On Tue, Mar 22, 2016 at 5:18 PM, Henk Slager  wrote:
> On Tue, Mar 22, 2016 at 9:19 PM, John Marrett  wrote:
>> I recently had a drive failure in a file server running btrfs. The
>> failed drive was completely non-functional. I added a new drive to the
>
> I assume you did btrfs device add  ?
> Or did you do this with btrfs replace  ?
>
>> filesystem successfully; when I attempted to remove the failed drive I
>> encountered an error. I discovered that I had actually experienced a dual
>> drive failure; the second drive only showed as failed when btrfs
>> tried to write to the drives in the filesystem when I removed the
>> disk.
>>
>> I shut down the array and imaged the failed drive using GNU ddrescue;
>> I was able to recover all but a few kB from the drive. Unfortunately,
>> when I imaged the drive I overwrote the drive that I had successfully
>> added to the filesystem.
>>
>> This brings me to my current state: I now have two devices missing:
>>
>>  - the completely failed drive
>>  - the empty drive that I overwrote with the second failed disk's image
>>
>> Consequently I can't start the filesystem. I've discussed the issue in
>> the past with Ke and other people on the #btrfs channel; the
>> consensus, as I understood it, is that with the right patch it should
>> be possible either to mount the array with the empty drive absent or
>> to create a new btrfs filesystem on an empty drive and then manipulate
>> its UUIDs so that it believes it's the missing UUID from the existing
>> btrfs filesystem.
>>
>> Here's the info showing the current state of the filesystem:
>>
>> ubuntu@ubuntu:~$ sudo btrfs filesystem show
>> warning, device 6 is missing
>> warning devid 6 not found already
>> warning devid 7 not found already
>> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>> Total devices 7 FS bytes used 5.47TiB
>> devid1 size 1.81TiB used 1.71TiB path /dev/sda3
>> devid2 size 1.81TiB used 1.71TiB path /dev/sdb3
>> devid3 size 1.82TiB used 1.72TiB path /dev/sdc1
>> devid4 size 1.82TiB used 1.72TiB path /dev/sdd1
>> devid5 size 2.73TiB used 2.62TiB path /dev/sde1
>> *** Some devices missing
>> btrfs-progs v4.0
>
> The used kernel version might also give people some hints.
>
> Also, you have not stated what raid type the fs is; likely not raid6,
> but rather raid1, 10, or 5.
> btrfs filesystem usage  will report and show this.
>
> If it is raid6, you could still fix the issue in theory. AFAIK there
> are no patches to fix a dual error if it is another raid type or
> single. The only option then is to use   btrfs rescue   on the
> unmounted array and hope to copy as much as possible off the damaged fs
> to other storage.
>
>> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
>> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>>missing codepage or helper program, or other error
>>
>>In some cases useful info is found in syslog - try
>>dmesg | tail or so.
>> ubuntu@ubuntu:~$ dmesg
>> [...]
>> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
>> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
>> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
>> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
>> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
>> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
>> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
>> flush 0, corrupt 0, gen 0
>> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
>> corrupt 0, gen 0
>> [  774.759717] BTRFS: too many missing devices, writeable mount is not 
>> allowed
>> [  774.804053] BTRFS: open_ctree failed
>>
>> Thank you in advance for your help,
>>
>> -JohnF

Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread Henk Slager
On Tue, Mar 22, 2016 at 9:19 PM, John Marrett  wrote:
> I recently had a drive failure in a file server running btrfs. The
> failed drive was completely non-functional. I added a new drive to the

I assume you did btrfs device add  ?
Or did you do this with btrfs replace  ?

> filesystem successfully; when I attempted to remove the failed drive I
> encountered an error. I discovered that I had actually experienced a dual
> drive failure; the second drive only showed as failed when btrfs
> tried to write to the drives in the filesystem when I removed the
> disk.
>
> I shut down the array and imaged the failed drive using GNU ddrescue;
> I was able to recover all but a few kB from the drive. Unfortunately,
> when I imaged the drive I overwrote the drive that I had successfully
> added to the filesystem.
>
> This brings me to my current state: I now have two devices missing:
>
>  - the completely failed drive
>  - the empty drive that I overwrote with the second failed disk's image
>
> Consequently I can't start the filesystem. I've discussed the issue in
> the past with Ke and other people on the #btrfs channel; the
> consensus, as I understood it, is that with the right patch it should
> be possible either to mount the array with the empty drive absent or
> to create a new btrfs filesystem on an empty drive and then manipulate
> its UUIDs so that it believes it's the missing UUID from the existing
> btrfs filesystem.
>
> Here's the info showing the current state of the filesystem:
>
> ubuntu@ubuntu:~$ sudo btrfs filesystem show
> warning, device 6 is missing
> warning devid 6 not found already
> warning devid 7 not found already
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
> Total devices 7 FS bytes used 5.47TiB
> devid1 size 1.81TiB used 1.71TiB path /dev/sda3
> devid2 size 1.81TiB used 1.71TiB path /dev/sdb3
> devid3 size 1.82TiB used 1.72TiB path /dev/sdc1
> devid4 size 1.82TiB used 1.72TiB path /dev/sdd1
> devid5 size 2.73TiB used 2.62TiB path /dev/sde1
> *** Some devices missing
> btrfs-progs v4.0

The used kernel version might also give people some hints.

Also, you have not stated what raid type the fs is; likely not raid6,
but rather raid1, 10, or 5.
btrfs filesystem usage  will report and show this.

If it is raid6, you could still fix the issue in theory. AFAIK there
are no patches to fix a dual error if it is another raid type or
single. The only option then is to use   btrfs rescue   on the
unmounted array and hope to copy as much as possible off the damaged fs
to other storage.
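
For reference, the tool normally used for that kind of scraping is
btrfs restore; a minimal sketch, with the source device and destination
path purely illustrative:

btrfs restore -v /dev/sda3 /path/to/other-storage/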

> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>missing codepage or helper program, or other error
>
>In some cases useful info is found in syslog - try
>dmesg | tail or so.
> ubuntu@ubuntu:~$ dmesg
> [...]
> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
> flush 0, corrupt 0, gen 0
> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
> corrupt 0, gen 0
> [  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
> [  774.804053] BTRFS: open_ctree failed
>
> Thank you in advance for your help,
>
> -JohnF