Re: BTRFS Data at Rest File Corruption

Austin S. Hemmelgarn Thu, 12 May 2016 11:29:34 -0700

On 2016-05-12 13:49, Richard A. Lochner wrote:

Austin,


I rebooted the computer and reran the scrub to no avail.  The error is
consistent.

The reason I brought this question to the mailing list is because it
seemed like a situation that might be of interest to the developers.
 Perhaps, there might be a way to "defend" against this type of
corruption.

I suspected, and I still suspect that the error occurred upon a
metadata update that corrupted the checksum for the file, probably due
to silent memory corruption.  If the checksum was silently corrupted,
it would be simply written to both drives causing this type of error.

That does seem to be the most likely cause, and sadly, it is not something any filesystem can reliably protect against on commodity hardware.


With that in mind, I proved (see below) that the data blocks match on
both mirrors.  This I expected since the data blocks should not have
been touched as the file has not been written.

This is the sequence of events as I see them that I think might be of
interest to the developers.

1. A block containing a checksum for the file was read into memory.
The block read would have been checksummed, so the checksum for the
file must have been good at that moment.

It's worth noting that BTRFS doesn't verify all the checksums in a metadata block when it loads that metadata block; only the ones for the reads that triggered the load get verified.


2. The checksum block was then altered in memory (perhaps to add or
change a value).

3. A new checksum would then have been calculated for the checksum
block.

4. The checksum block would have been written to both mirrors.

Presumably, in the case that I am experiencing, an undetected memory
error must have occurred after 1 and before step 3 was completed.
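
The sequence above can be modelled with a toy sketch. This is not how BTRFS is implemented: the POSIX cksum utility stands in for the filesystem's checksum, and the file and variable names are made up for illustration. It only shows why a checksum corrupted in memory before step 4 makes both mirrors fail verification even though the data on them is identical.

```shell
#!/bin/sh
# Toy model only: cksum (POSIX CRC) stands in for the real checksum,
# and two plain files stand in for the data block on each mirror.
crc() { cksum < "$1" | cut -d' ' -f1; }

work=$(mktemp -d)
printf 'unchanged file data' > "$work/data_mirror1"
cp "$work/data_mirror1" "$work/data_mirror2"

# Step 1: the correct checksum is read into memory.
csum_in_memory=$(crc "$work/data_mirror1")

# Between steps 1 and 3: a single bit flips in memory, undetected.
csum_in_memory=$((csum_in_memory ^ 1))

# Step 4: the same (now wrong) value is written to both mirrors.
csum_mirror1=$csum_in_memory
csum_mirror2=$csum_in_memory

# "Scrub": verify each mirror's data against its stored checksum.
scrub() {
    if [ "$(crc "$1")" = "$2" ]; then echo "ok"; else echo "checksum error"; fi
}
result1=$(scrub "$work/data_mirror1" "$csum_mirror1")
result2=$(scrub "$work/data_mirror2" "$csum_mirror2")

# The data itself never changed, so the two copies still match.
if cmp -s "$work/data_mirror1" "$work/data_mirror2"; then
    data_identical=yes
else
    data_identical=no
fi

echo "mirror 1: $result1"
echo "mirror 2: $result2"
echo "data identical: $data_identical"
rm -rf "$work"
```

Because neither mirror holds a copy that passes verification, there is nothing for a scrub to repair from in this model, which matches the "unable to fixup" messages further down.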

I wonder if there is a way to correct or detect that situation.

The closest we could get is to provide an option to handle this in scrub, preferably with a big scary warning on it, as this same situation can easily be caused by someone modifying the disks themselves (we can't reasonably protect against that, but we shouldn't make it trivial for people to inject arbitrary data that way either).


As I stated previously, the machine on which this occurred does not
have ECC memory; however, I would not think that the majority of users
running btrfs do either.  If it has happened to me, it likely has
happened to others.

Rick Lochner

btrfs dmesg(s):

[16510.334020] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
0, flush 0, corrupt 5, gen 0
[16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdb1

[17606.978439] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
13, flush 0, corrupt 4, gen 0
[17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdc1

How I compared the data blocks:

#btrfs-map-logical -l 3037444042752  /dev/sdc1
mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1
mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1
mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1
mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1

#dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s

#dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s

#diff b1 c1
#diff b2 c2

Excellent thinking here.
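
For anyone repeating the comparison, here is a minimal sketch of the same idea using one 4096-byte read per block instead of 4096 one-byte reads. The helper names read_block and same_block are made up for this example, and it assumes the physical offsets reported by btrfs-map-logical are 4096-byte aligned (the function refuses to run otherwise).

```shell
#!/bin/sh
# read_block DEVICE BYTE_OFFSET OUTFILE
# Copy one 4096-byte block that starts at BYTE_OFFSET into OUTFILE.
# (Hypothetical helper, not part of btrfs-progs.)
read_block() {
    dev=$1; offset=$2; out=$3
    if [ $((offset % 4096)) -ne 0 ]; then
        echo "offset $offset is not 4096-byte aligned" >&2
        return 1
    fi
    # skip= counts in units of bs, so convert the byte offset to blocks.
    dd if="$dev" of="$out" bs=4096 skip=$((offset / 4096)) count=1 2>/dev/null
}

# same_block FILE_A FILE_B
# Exit status 0 if the two files are byte-for-byte identical.
same_block() {
    cmp -s "$1" "$2"
}

# Example with the offsets from the btrfs-map-logical output above
# (needs root, reads the devices only):
#   read_block /dev/sdc1 2554240299008 c1
#   read_block /dev/sdb1 2554260221952 b1
#   same_block b1 c1 && echo "mirrors match"
```

cmp -s is used instead of diff because the blocks are binary data and only the exit status matters.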

Now, if you can find some external method to verify that that block is in fact correct, you can just write it back into the file itself at the correct offset and fix the issue.
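
A rough sketch of that write-back, assuming the block has already been verified by some external means and saved to a file. The helper name write_block and the path in the example are placeholders; the offset comes from the "offset 75754369024" field in the dmesg output. conv=notrunc matters here, since without it dd would truncate the target file after the written block.

```shell
#!/bin/sh
# write_block BLOCKFILE TARGET BYTE_OFFSET
# Overwrite one 4096-byte block of TARGET in place, starting at BYTE_OFFSET.
# (Hypothetical helper; try it on a scratch copy first.)
write_block() {
    blk=$1; target=$2; offset=$3
    if [ $((offset % 4096)) -ne 0 ]; then
        echo "offset $offset is not 4096-byte aligned" >&2
        return 1
    fi
    # seek= counts in units of bs; conv=notrunc keeps the rest of the file.
    dd if="$blk" of="$target" bs=4096 seek=$((offset / 4096)) count=1 \
       conv=notrunc 2>/dev/null
}

# Example (placeholder path; b1 is the block extracted from the mirror):
#   write_block b1 /path/to/Rick/sda4.img 75754369024
```

Writing the block through the filesystem makes it store a fresh checksum for the new data, which is why this should only be done once the block contents are known to be good.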


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
