On 05.09.2019 21:57, Chris Murphy wrote:
On Thu, Sep 5, 2019 at 1:18 PM Edmund Urbani <edmund.urb...@liland.com> wrote:
On 04.09.2019 07:36, Chris Murphy wrote:
I have tried all the mount / restore options listed here:
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
Good. Stick with ro attempts for now, including if you want to try a
newer kernel. If it mounts ro, my advice is to update backups so at
least critical information isn't lost. Back up while you can. Any
repair attempt makes changes that risk the data being permanently
lost, so it's important to be really deliberate about any changes.
I'll let you know when I have the new kernel up and running.
I think you should have all the original drives installed, and try to
mount -o ro first. And if that doesn't work, try -o ro,degraded, and
then we'll just have to see which drive it doesn't like.
Things are finally looking up. I have replaced both sdb and sdf with
ddrescue'd copies. sdb had about 10 MB of bad sectors and sdf 8 KB that
could not be recovered.
I am now able to mount the volume again. :)
btrfsck /dev/sda1
Opening filesystem to check...
Checking filesystem on /dev/sda1
UUID: 108df6ea-2846-4a88-8a50-61aedeef92b4
[1/7] checking root items
checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
parent transid verify failed on 34958760591360 wanted 3331734 found 1544337
checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
bad tree block 34958760591360, bytenr mismatch, want=34958760591360, have=27967614209536
ERROR: failed to repair root items: Input/output error
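The "checksum verify failed ... found X wanted Y" lines compare a checksum computed over a tree block with the one stored inside it. A minimal sketch of that comparison, assuming the default crc32c checksum, a 32-byte csum field at the start of the block, and the checksum covering everything after that field (the helper names are hypothetical, not btrfs-progs functions):

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# Assumed: size of the checksum field at the start of a tree block.
CSUM_SIZE = 32

def verify_tree_block_csum(block: bytes):
    """Return (found, wanted) for a tree block.

    found:  CRC-32C computed over everything after the csum field.
    wanted: value stored (assumed little-endian) in the first 4 bytes.
    """
    found = crc32c(block[CSUM_SIZE:])
    (wanted,) = struct.unpack_from("<I", block, 0)
    return found, wanted
```

A block whose csum field reads back as zeros reports "wanted 00000000", as in the first two lines above; the sketch only illustrates the check and says nothing about why those sectors read back that way.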
Anyway, I am about to mount it read-only again to try to back up a few
things. And once I am done with that, should I run btrfs scrub?
Did it mount with ro alone, or did you need ro,degraded?
I'm a little confused by the i/o error. I'd expect it to also produce a
message in dmesg at the same time, hinting at the nature of the i/o
error. It suggests some kind of hardware issue still exists, even if it
is an uncorrectable sector read error. Rw-mounted scrubs can certainly
fix those things, if enough redundancy exists and those copies aren't
also corrupt. But offhand I'm not sure whether 'btrfs check --repair'
can fix up bad sectors the way scrub can.
Anyway, I suggest treating 'btrfs check --repair' as a last resort, no
matter the version of btrfs-progs. 'btrfs check' alone is safe. So, in order:
('*' marks the steps you've already done)
* dmesg
* btrfs check --readonly   ## safe, makes no changes, may hint at the problem
* mount -o ro
* mount -o ro,degraded
  mount -o rw              ## all devices available
  mount -o rw,degraded
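The escalation order above can be sketched as data plus a small runner. This is a hypothetical helper, not part of btrfs-progs: read-only diagnostics come first, and read-write mounts are only attempted when explicitly allowed. 'btrfs check --repair' is deliberately left out, since it is the last resort.

```python
import subprocess

def diagnostics(device):
    """Non-destructive checks to run before any mount attempt."""
    return [["dmesg"], ["btrfs", "check", "--readonly", device]]

# Mount options in the suggested order: read-only first, read-write last.
MOUNT_OPTIONS = ["ro", "ro,degraded", "rw", "rw,degraded"]

def mount_attempts(device, mountpoint, allow_rw=False):
    """Build the mount commands to try, skipping rw unless allow_rw is set."""
    return [
        ["mount", "-o", opts, device, mountpoint]
        for opts in MOUNT_OPTIONS
        if allow_rw or not opts.startswith("rw")
    ]

def first_successful_mount(device, mountpoint, allow_rw=False):
    """Run the attempts in order; return the options that worked, or None."""
    for cmd in mount_attempts(device, mountpoint, allow_rw):
        if subprocess.run(cmd).returncode == 0:
            return cmd[2]
    return None
```

Keeping allow_rw off by default mirrors the advice to stay with ro attempts until backups are up to date.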
I'm not sure a read-only scrub helps much, though it might be interesting.
What you really want is to be able to mount rw with all devices, and
then scrub.
But even rw,degraded is better, because you must be rw mounted to make
scrub repairs, and also to do device replacements. I personally would
not do a degraded scrub, because that scrub requires reading the whole
volume. If you're going to read the whole volume anyway, you might as
well rebuild the bad/missing device, so that you can more quickly get
back to undegraded/normal RAID6 operation.
If you can only mount 'rw,degraded' we need to see 'btrfs fi show' and
the kernel messages for the failed mount and the successful degraded
mount, so we can figure out what devices are affected, maybe why, and
then what the next step is.
Does anyone know whether the latest kernel and progs now reliably
support 'btrfs replace' for RAID6? For a while it was recommended to do
it the old way, with 'btrfs device add' followed by 'btrfs device
delete'. The main differences for the user are that 'replace' requires
the replacement drive to be at least as big (in bytes) as the one being
replaced, and that 'replace' does not resize the volume once it
finishes; that has to be done manually. Otherwise I think 'replace' is
preferred?
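The size rule above can be sketched as a hypothetical planning helper that returns the commands for either route. It assumes the device being replaced keeps its devid (as listed by 'btrfs filesystem show') after 'btrfs replace', and uses -B so the replace runs in the foreground and the manual resize only follows once it has finished. The device names in the test are made up.

```python
def replacement_plan(old_dev, old_size, new_dev, new_size, devid, mountpoint):
    """Return the btrfs commands for swapping old_dev for new_dev.

    'btrfs replace' needs a target at least as large (in bytes) as the
    source and does not grow the filesystem afterwards, so a resize is
    appended when the new device is larger. A smaller target falls back
    to the old add + delete route.
    """
    if new_size >= old_size:
        # -B: do not background, so the resize runs after replace completes.
        plan = [["btrfs", "replace", "start", "-B", old_dev, new_dev, mountpoint]]
        if new_size > old_size:
            # devid is assumed to be the id of the device that was replaced.
            plan.append(["btrfs", "filesystem", "resize", f"{devid}:max", mountpoint])
        return plan
    return [
        ["btrfs", "device", "add", new_dev, mountpoint],
        ["btrfs", "device", "delete", old_dev, mountpoint],
    ]
```

Both routes need the new drive attached at the same time as the old one, which matters on a machine without a free SATA port.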
I did not need the degraded option. And so far I see no HW I/O errors in
dmesg. I have encountered a few errors while copying files and found
these in the log:
[ 3560.273634] btrfs_print_data_csum_error: 50 callbacks suppressed
[ 3560.273639] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0x98f94189 expected csum 0xcb3af09a mirror 1
[ 3560.825942] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 2
[ 3560.826588] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 3
[ 3560.827813] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 4
[ 3560.829063] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 5
[ 3560.830366] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 6
[ 3560.831559] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 7
[ 3560.832998] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 8
[ 3560.834649] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 9
[ 3560.836188] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 10
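When a log like this gets long, it helps to collapse it per file extent and see which mirrors failed. A minimal sketch (hypothetical helper) that groups "csum failed" lines by (root, inode, offset); the pattern uses \s+ between fields so it also tolerates lines wrapped by a mail client:

```python
import re
from collections import defaultdict

CSUM_RE = re.compile(
    r"csum\s+failed\s+root\s+(\d+)\s+ino\s+(\d+)\s+off\s+(\d+)\s+"
    r"csum\s+(0x[0-9a-f]+)\s+expected\s+csum\s+(0x[0-9a-f]+)\s+mirror\s+(\d+)"
)

def failed_mirrors(log_text):
    """Map (root, ino, off) -> set of mirror numbers whose csum check failed."""
    result = defaultdict(set)
    for m in CSUM_RE.finditer(log_text):
        key = (int(m[1]), int(m[2]), int(m[3]))
        result[key].add(int(m[6]))
    return dict(result)
```

Here every attempt for root 262, inode 1838364, offset 14467072 failed. The inode can be mapped to a file name with 'btrfs inspect-internal inode-resolve 1838364 <path>', where <path> is assumed to be wherever subvolume 262 is mounted.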
and also:
[ 3889.813300] btree_readpage_end_io_hook: 1860 callbacks suppressed
[ 3889.813304] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 0
[ 3889.825732] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.826375] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.828149] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.829649] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.831592] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.833436] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.835458] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.836968] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
[ 3889.848545] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
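"bad tree block start, want X have Y" means the address recorded inside the block's header does not match the address it was read from, and "parent transid verify failed" is the analogous check on the header's generation. A minimal sketch of both checks, assuming the on-disk struct btrfs_header layout (csum[32], fsid[16], bytenr le64 at offset 48, flags le64, chunk_tree_uuid[16], generation le64 at offset 80); the function and its message strings are hypothetical:

```python
import struct

# Assumed offsets within struct btrfs_header.
BYTENR_OFFSET = 48
GENERATION_OFFSET = 80

def header_problems(block, want_bytenr, want_generation=None):
    """Return descriptions of header mismatches (empty list if none)."""
    problems = []
    (bytenr,) = struct.unpack_from("<Q", block, BYTENR_OFFSET)
    if bytenr != want_bytenr:
        problems.append(f"bad tree block start, want {want_bytenr} have {bytenr}")
    if want_generation is not None:
        (gen,) = struct.unpack_from("<Q", block, GENERATION_OFFSET)
        if gen != want_generation:
            problems.append(
                f"transid mismatch, wanted {want_generation} found {gen}"
            )
    return problems
```

Under this layout, "have 0" simply means the bytenr field of that copy read back as zero.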
I think the Input/output error btrfsck is showing is actually a
filesystem checksum error, not one triggered by faulty hardware (not
anymore, I hope). If there actually are any more failing drives here, I
will most likely do the ddrescue thing again. Currently there are no
free SATA ports in that system to connect an additional drive, so I
cannot simply add one (at least not without also installing an
additional SATA controller).
Anyway, I have some peace of mind now that most of my data is accessible
again. Time to get some sleep...
Thank you, Chris!
Kind regards,
Edmund
--
*Liland IT GmbH*
Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
off...@liland.com
https://Liland.com
Copyright © 2019 Liland IT GmbH