On Thu, Sep 5, 2019 at 2:44 PM Edmund Urbani <edmund.urb...@liland.com> wrote:
>
> I did not need the degraded option. And so far I see no HW I/O errors in
> dmesg. I have encountered a few errors while copying files and found
> these in the log:
>
> [ 3560.273634] btrfs_print_data_csum_error: 50 callbacks suppressed
> [ 3560.273639] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0x98f94189 expected csum 0xcb3af09a mirror 1
Not a bit flip (15 of the 32 bits differ):

0x98f94189 10011000111110010100000110001001
0xcb3af09a 11001011001110101111000010011010

> [ 3560.825942] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 2
> [ 3560.826588] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 3
> [ 3560.827813] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 4
> [ 3560.829063] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 5
> [ 3560.830366] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 6
> [ 3560.831559] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 7
> [ 3560.832998] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 8
> [ 3560.834649] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 9
> [ 3560.836188] BTRFS warning (device sdg1): csum failed root 262 ino 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 10

Also not a bit flip (14 of the 32 bits differ):

0xc0248289 11000000001001001000001010001001
0xcb3af09a 11001011001110101111000010011010

I'm not sure what it means, or what it suggests has happened, that all the copies are wrong. That's plausible with raid5, but seems unlikely with raid6, especially with all devices accounted for.

These are data checksum errors (note the btrfs_print_data_csum_error line), so it's the contents of this one file at that offset that are bad, not the file system metadata. If you find the file this inode belongs to, either replacing it with a good copy or deleting it should make these errors go away.
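A minimal POSIX sh sketch of that lookup follows. It takes the subvolume ID ("root 262") and inode number ("ino 1838364") from the kernel messages. It finds the path of subvolume 262 in 'btrfs subvolume list' output and passes that path to 'btrfs inspect-internal inode-resolve'. The script assumes the top-level subvolume (subvolid=5) is mounted at a hypothetical /mnt/pool, so the listed paths are relative to that mount. The helper name subvol_path_for_id is illustrative and not part of btrfs-progs.

```shell
#!/bin/sh
# Sketch: map "root 262 ino 1838364" from the csum errors to a file path.
# Assumption (hypothetical): the top-level subvolume (subvolid=5) is
# mounted at MNT, so paths printed by 'btrfs subvolume list' are relative to it.
MNT=${MNT:-/mnt/pool}
SUBVOL_ID=262      # "root 262" in the kernel messages
INODE=1838364      # "ino 1838364" in the kernel messages

# Read 'btrfs subvolume list' output on stdin and print the path of the
# subvolume whose ID is $1. Lines look like:
#   ID <id> gen <n> top level <n> path <path>
subvol_path_for_id() {
    awk -v id="$1" '$1 == "ID" && $2 == id {
        sub(/^.* top level [0-9]+ path /, "")
        print
        exit
    }'
}

# Only attempt the lookup where btrfs-progs and the mount point exist (needs root).
if command -v btrfs >/dev/null 2>&1 && [ -d "$MNT" ]; then
    subvol=$(btrfs subvolume list "$MNT" | subvol_path_for_id "$SUBVOL_ID")
    if [ -n "$subvol" ]; then
        # inode numbers are per subvolume, so resolve inside subvolume 262
        btrfs inspect-internal inode-resolve "$INODE" "$MNT/$subvol"
    else
        echo "subvolume $SUBVOL_ID not found under $MNT" >&2
    fi
fi
```

If the file has hard links, inode-resolve may print more than one path. Check the result before deleting anything.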
Make sure you delete the correct file: each subvolume has its own list of inodes, and this one is in subvolume ID 262.

>
> and also:
>
> [ 3889.813300] btree_readpage_end_io_hook: 1860 callbacks suppressed
> [ 3889.813304] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 0
> [ 3889.825732] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.826375] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.828149] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.829649] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.831592] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.833436] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.835458] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.836968] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972
> [ 3889.848545] BTRFS error (device sdg1): bad tree block start, want 34958548107264 have 12157064991241308972

I'm skeptical that a scrub will fix these things. Btrfs already scrubs passively on reads, so any checksum mismatches that can be fixed through reconstruction should already be getting fixed on the fly, just as a scrub would fix them.

These bad tree block errors are a different problem, and I'm not sure how serious it is. I would still do the full scrub, and then unmount the file system and run 'btrfs check --mode=lowmem'. On a file system of this size it will take a long time, so maybe do it over a weekend.

>
> I think that Input/output error btrfsck is showing is actually a
> filesystem checksum error and not triggered by faulty hardware (not
> anymore, I hope).
> If there actually are any more failing drives here, I
> will most likely do the ddrescue thing again. Currently there are no
> free SATA ports in that system to connect an additional drive, so I
> cannot simply add one (at least not without also installing an
> additional SATA controller).

I suggest you start planning how to migrate the data to a new Btrfs volume. If the problems can't be repaired, that becomes inevitable.

A reasonable strategy is to take read-only snapshots of each subvolume you want to preserve, and then copy them to the new storage with either 'btrfs send/receive' or 'rsync'. That way you can keep using the old volume read-write in the meantime. Once that completes, take another read-only snapshot of each subvolume and do an incremental 'send -p' (or another rsync) to migrate the much smaller set of changes.

--
Chris Murphy
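The snapshot plus incremental send/receive strategy in the message above can be sketched as a small POSIX sh helper. The helper only prints the commands for one pass, so they can be reviewed before being piped to sh. It covers only the send/receive route; rsync would be the alternative. Everything here is an assumption for illustration:
- The mount points /mnt/old (top-level subvolume of the old volume) and /mnt/new (the new Btrfs volume) are hypothetical.
- Subvolume names are assumed to be flat and free of spaces.
- The .snapshots directory is an invented convention.
- The function name migrate_cmds is hypothetical.

```shell
#!/bin/sh
# Sketch of the snapshot + send/receive migration suggested above.
# Assumptions (hypothetical, adjust to your setup):
#   SRC  - mount point of the old volume's top-level subvolume
#   DST  - a directory on the new Btrfs volume that receives the snapshots
#   subvolume names are flat (no '/') and contain no spaces
#   .snapshots is an invented directory name for the read-only snapshots
SRC=${SRC:-/mnt/old}
DST=${DST:-/mnt/new}

# Print (do not run) the commands for one migration pass of one subvolume.
#   $1 = subvolume name below $SRC
#   $2 = tag for the new snapshot (e.g. 1, 2, ...)
#   $3 = optional tag of an earlier snapshot that already exists on $DST;
#        when given, only the changes since that snapshot are sent
migrate_cmds() {
    name=$1
    snap="$SRC/.snapshots/$name.$2"
    echo "mkdir -p $SRC/.snapshots"
    # read-only snapshot, so it can be sent while $SRC stays in rw use
    echo "btrfs subvolume snapshot -r $SRC/$name $snap"
    # make sure the new snapshot is committed to disk before sending it
    echo "sync"
    if [ -n "$3" ]; then
        # incremental pass: send only the difference to the parent snapshot
        echo "btrfs send -p $SRC/.snapshots/$name.$3 $snap | btrfs receive $DST"
    else
        # first pass: full send of the snapshot
        echo "btrfs send $snap | btrfs receive $DST"
    fi
}
```

For example, 'migrate_cmds home 1 | sh' would do the initial full copy while the old volume stays in use. Later, 'migrate_cmds home 2 1 | sh' would send only the changes between snapshots 1 and 2. That second pass relies on home.1 already existing on the new volume from the first pass.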