On Fri, Sep 6, 2019 at 00:33, Chris Murphy
<li...@colorremedies.com> wrote:
>
> On Thu, Sep 5, 2019 at 2:44 PM Edmund Urbani <edmund.urb...@liland.com> wrote:
> >
> > I did not need the degraded option. And so far I see no HW I/O errors in
> > dmesg. I have encountered a few errors while copying files and found
> > these in the log:
> >
> > [ 3560.273634] btrfs_print_data_csum_error: 50 callbacks suppressed
> > [ 3560.273639] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0x98f94189 expected csum 0xcb3af09a mirror 1
>
> Not a bit flip
> 0x98f94189
> 10011000111110010100000110001001
> 0xcb3af09a
> 11001011001110101111000010011010
>
>
> > [ 3560.825942] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 2
> > [ 3560.826588] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 3
> > [ 3560.827813] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 4
> > [ 3560.829063] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 5
> > [ 3560.830366] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 6
> > [ 3560.831559] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 7
> > [ 3560.832998] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 8
> > [ 3560.834649] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 9
> > [ 3560.836188] BTRFS warning (device sdg1): csum failed root 262 ino
> > 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 10
>
> Also not a bit flip.
> 0xc0248289
> 11000000001001001000001010001001
> 0xcb3af09a
> 11001011001110101111000010011010
>
> I'm not sure what it means or suggests has happened, that all the
> copies are wrong. Plausible with raid5 metadata. But seems unlikely
> with raid6 metadata, and also with all devices accounted for.
>
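
For reference, the bit comparison above can be reproduced with a few
lines of Python. This is only a sketch: the helper name differing_bits
is made up here, and it compares the two checksum values as plain
32-bit integers, which is all the check above does.

```python
def differing_bits(found: int, expected: int) -> int:
    """Count the bit positions in which two 32-bit checksums differ."""
    return bin((found ^ expected) & 0xFFFFFFFF).count("1")


# Checksum pairs taken from the csum warnings quoted above.
pairs = [
    (0x98f94189, 0xcb3af09a),  # mirror 1
    (0xc0248289, 0xcb3af09a),  # mirrors 2-10
]

for found, expected in pairs:
    print(f"{found:032b}")
    print(f"{expected:032b}")
    print("differing bits:", differing_bits(found, expected))
```

A count of 1 would mean the two values differ by a single bit; a larger
count means they do not.
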
> The file itself is probably fine - these look like metadata
> complaints. If you find the file this inode belongs to, either
> duplicating it or deleting it is fine, should cause this bad leaf to
> just go away. Make sure you delete the correct file, each subvolume
> has its own list of inodes, this one is in subvol id 262.
>
> >
> > and also:
> >
> > [ 3889.813300] btree_readpage_end_io_hook: 1860 callbacks suppressed
> > [ 3889.813304] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 0
> > [ 3889.825732] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.826375] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.828149] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.829649] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.831592] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.833436] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.835458] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.836968] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
> > [ 3889.848545] BTRFS error (device sdg1): bad tree block start, want
> > 34958548107264 have 12157064991241308972
>
> I'm skeptical that a scrub will fix these things, because Btrfs is
> passively scrubbing on reads, so any checksum mismatches should get
> fixed up, if they can be fixed, from reconstruction, on the fly as
> well as scrub. This is a different problem, I'm not sure how serious
> it is.
>
> I would still do the full scrub. And then unmount it and run 'btrfs
> check --mode=lowmem'. On a file system of this size it will take a
> long time. So maybe do it over a weekend.
>
> >
> > I think that Input/output error btrfsck is showing is actually a
> > filesystem checksum error and not triggered by faulty hardware (not
> > anymore, I hope). If there actually are any more failing drives here, I
> > will most likely do the ddrescue thing again. Currently there are no
> > free SATA ports in that system to connect an additional drive, so I
> > cannot simply add one (at least not without also installing an
> > additional SATA controller).
>
> I suggest you start planning how to migrate the data to a new Btrfs
> volume. If the problems can't be repaired, this becomes inevitable. A
> reasonable strategy is to take read-only snapshots of each subvolume
> you want to preserve. And either 'btrfs send/receive' or 'rsync' to
> new storage. That way you can keep using the volume rw in the
> meantime. Once that completes, do another read only snapshot of each
> subvolume, and do an incremental 'send -p' or rsync to migrate the
> much smaller changes.
>
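
The snapshot and send/receive sequence described above could be
scripted roughly as follows. This is a sketch under assumptions: the
function names and all paths are hypothetical, the code only builds the
command lines rather than running them, and taking snapshots requires
that the volume can be mounted rw.

```python
import shlex
from typing import List, Optional


def snapshot_cmd(subvol: str, snap: str) -> List[str]:
    """Command for a read-only snapshot of `subvol` at path `snap`."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvol, snap]


def send_receive_pipeline(snap: str, dest: str,
                          parent: Optional[str] = None) -> str:
    """Shell pipeline that sends `snap` to the Btrfs mount point `dest`.

    With `parent` set, this is an incremental send relative to an
    earlier snapshot that already exists on both sides.
    """
    send = ["btrfs", "send"]
    if parent is not None:
        send += ["-p", parent]
    send.append(snap)
    receive = ["btrfs", "receive", dest]
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (send, receive)
    )


# Hypothetical paths, for illustration only.
print(" ".join(snapshot_cmd("/mnt/old/data", "/mnt/old/snap/data.1")))
print(send_receive_pipeline("/mnt/old/snap/data.1", "/mnt/new/snap"))
print(send_receive_pipeline("/mnt/old/snap/data.2", "/mnt/new/snap",
                            parent="/mnt/old/snap/data.1"))
```

The first pipeline is the full transfer; the second one only carries
the changes made between the two snapshots.
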
>
> --
> Chris Murphy


Here's a little status update. I am still in the process of salvaging
files. Remounting rw did not work for long: btrfs soon reverted to a
read-only state, and I have left it that way for now. After completing
my first rsync pass I was still missing several large directory trees
and found corresponding errors in the logs:
Sep 15 20:34:39 phoenix kernel: BTRFS error (device sdg1): parent
transid verify failed on 34960626352128 wanted 3332854 found 3332691

I remounted with ro,recovery,nospace_cache,clear_cache. Now I am able
to access more of the filesystem, but some errors still remain. I am
seeing plenty of csum errors in the logs:
Sep 16 12:08:53 phoenix kernel: BTRFS info (device sdg1): no csum
found for inode 6126287 start 1673527296

Then there are these (for all 10 mirrors):
Sep 16 12:09:13 phoenix kernel: BTRFS warning (device sdg1): csum
failed root 261 ino 6126287 off 1734606848 csum 0x7430ddcb expected
csum 0x00000000 mirror 10
Curiously, at least the recent log entries all refer to inode 6126287
(start, offset, etc. vary).
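
To check whether the csum warnings really do cluster on one inode, the
log can be tallied by (root, inode). The following is a minimal sketch:
the regular expression is written against the warning format shown
above, the function name is made up, and it assumes each message is a
single line in the log (it is only wrapped in this mail).

```python
import re
from collections import Counter

# Matches the "csum failed" warnings in the format shown above.
CSUM_FAILED = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>0x[0-9a-f]+) expected csum (?P<expected>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def tally_csum_failures(lines):
    """Count csum failures per (root, inode) pair."""
    counts = Counter()
    for line in lines:
        m = CSUM_FAILED.search(line)
        if m:
            counts[(int(m["root"]), int(m["ino"]))] += 1
    return counts


# The warning from above, joined back into one line.
sample = [
    "Sep 16 12:09:13 phoenix kernel: BTRFS warning (device sdg1): csum "
    "failed root 261 ino 6126287 off 1734606848 csum 0x7430ddcb expected "
    "csum 0x00000000 mirror 10",
]
print(tally_csum_failures(sample))
```

Any iterable of lines works as input, for example an open log file.
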

And then there's also still occasionally this:
Sep 16 12:09:19 phoenix kernel: BTRFS error (device sdg1): parent
transid verify failed on 34960627597312 wanted 3332854 found 3332691
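
The generation gap in these transid messages can be pulled out of the
log in the same way. Again only a sketch: the pattern follows the
message format shown above and the function name is hypothetical.

```python
import re

# Matches the "parent transid verify failed" errors shown above.
TRANSID = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_gap(line):
    """Return (block address, wanted - found), or None if no match."""
    m = TRANSID.search(line)
    if m is None:
        return None
    return int(m["bytenr"]), int(m["wanted"]) - int(m["found"])


# The error from above, joined back into one line.
line = (
    "Sep 16 12:09:19 phoenix kernel: BTRFS error (device sdg1): parent "
    "transid verify failed on 34960627597312 wanted 3332854 found 3332691"
)
print(transid_gap(line))
```

For both messages quoted in this mail, the block found on disk is 163
generations older than the one the parent expects.
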

I'll investigate the logs further when the second rsync pass is done.

Kind regards,
 Edmund

-- 
*Liland IT GmbH*


Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
off...@liland.com
https://Liland.com