From the log offlist:

2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.396165] md: invalid
raid superblock magic on sda5
2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.401816] md: sda5 does
not have a valid v0.90 superblock, not importing!

That doesn't sound good. It's not a Btrfs problem but an md/mdadm
problem. You'll have to get support for this from Synology; only they
understand the design of their storage stack layout, whether these
error messages are important or not, and how to fix them. Anyone else
speculating could end up damaging the NAS and losing data.

--------
2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.913298] md: sda2 has
different UUID to sda1

There are several messages like this. I can't tell if they're just
informational and benign or a problem. Also not related to Btrfs.
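
(If you want a strictly read-only look at what's actually recorded on
disk, something like

  # read-only: prints each member's superblock, including its array UUID
  mdadm --examine /dev/sda1 /dev/sda2

would show what md has written on each member - the device names here
are just taken from the log above, and what the layout is supposed to
look like is still a Synology question.)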

--------
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199] BTRFS warning
(device dm-1): BTRFS: dm-1 checksum verify failed on 375259512832
wanted EA1A10E3 found 3080B64F level 0
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199]
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453] BTRFS warning
(device dm-1): BTRFS: dm-1 checksum verify failed on 375259512832
wanted EA1A10E3 found 3080B64F level 0
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453]
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.528385] BTRFS: read
error corrected: ino 1 off 375259512832 (dev /dev/vg1/volume_1 sector
751819488)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.539631] BTRFS: read
error corrected: ino 1 off 375259516928 (dev /dev/vg1/volume_1 sector
751819496)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.550785] BTRFS: read
error corrected: ino 1 off 375259521024 (dev /dev/vg1/volume_1 sector
751819504)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.561990] BTRFS: read
error corrected: ino 1 off 375259525120 (dev /dev/vg1/volume_1 sector
751819512)

There are a bunch of messages like this. Btrfs is finding metadata
checksum errors: some kind of corruption has happened to one of the
copies, and it's been fixed up from the other copy (which is what DUP
metadata is for). But why are things getting corrupted in the first
place? Ordinary bad sectors, maybe? There are a lot of these - like
really a lot, hundreds of affected sectors. Too many for me to read
through and check whether all of them were corrected by the DUP
metadata.
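
(If you just want counts instead of reading them all, grepping the
saved log should do it - the log file path here is my assumption,
adjust it to wherever this log actually lives:

  grep -c 'read error corrected' /path/to/kernel.log   # fixed-up reads
  grep -c 'failed to repair' /path/to/kernel.log       # not fixed up

'btrfs device stats' on the mounted file system also keeps running
per-device corruption/read/write error counters.)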

--------

2019-09-22T21:24:27+02:00 MHPNAS kernel: [1224856.764098] md2:
syno_self_heal_is_valid_md_stat(496): md's current state is not
suitable for data correction

What does that mean? Also not a Btrfs problem. There are quite a few of these.

--------

2019-09-23T11:49:20+02:00 MHPNAS kernel: [1276791.652946] BTRFS error
(device dm-1): BTRFS: dm-1 failed to repair btree csum error on
1353162506240, mirror = 1

OK, and there are a few of these as well. This means some metadata
could not be repaired, likely because both copies are corrupt.
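
(A scrub would give a definitive tally: it re-reads everything,
rewrites whatever it can from a good copy, and reports the rest as
uncorrectable. Assuming the volume is mounted at /volume1 - that
mount point is a guess:

  btrfs scrub start -B /volume1   # run in the foreground to completion
  btrfs scrub status /volume1     # corrected vs. uncorrectable counts

Any uncorrectable errors in the summary mean both copies are bad.)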

My recommendation is to freshen your backups now while you still can,
and prepare to rebuild the NAS; i.e. these are not likely to be
repairable problems. Once both copies of Btrfs metadata are bad, it's
usually not fixable; you just have to recreate the file system from
scratch.

You'll have to move everything off the NAS - and anything that's
really important you will want at least two independent copies of, of
course - and then you're going to obliterate the array and start from
scratch. While you're at it, you might as well make sure you've got
the latest supported version of the software for this product; start
with that. Then follow the Synology procedure to wipe the NAS totally
and set it up again. You'll want to make sure the procedure you use
writes out all new metadata for everything: mdadm, LVM, Btrfs. Nothing
stale or old reused. And then you'll copy your data back over to the
NAS.
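
(For reference, on a plain Linux box the way to guarantee nothing
stale survives is to clear every signature on each drive before
rebuilding, e.g.

  wipefs -a /dev/sdX   # destructive: erases all fs/md/LVM signatures

where sdX is a placeholder for each member drive. On the Synology,
though, let their reset/reinstall procedure do the equivalent; I only
mention it so it's clear what "nothing stale reused" means.)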

There's nothing in the provided log that helps me understand why this
is happening. I suspect hardware problems of some sort - maybe one of
the drives is slowly starting to die, spitting out bad sectors. To
know more about that we'd need to see 'smartctl -x /dev/' output for
each drive in the NAS and see if SMART gives a clue. It's somewhere
around a 50/50 shot that SMART will predict a drive failure in
advance. So my suggestion again, without delay, is to make sure the
NAS is backed up, and keep those backups fresh. You can recreate the
NAS when you have free time - but these problems will likely get
worse.
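
Something like this would collect the reports in one go - run as
root, and adjust the device list to however many drives this NAS
actually has and how they're named:

  for d in /dev/sd[a-d]; do
    smartctl -x "$d" > "smart-$(basename "$d").txt"   # one report per drive
  done

Reallocated_Sector_Ct, Current_Pending_Sector, and the device error
log sections are the first things to look at.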



---
Chris Murphy
