> -----Original Message-----
> From: [email protected] [mailto:linux-btrfs-
> [email protected]] On Behalf Of Zygo Blaxell
> Sent: Wednesday, 21 September 2016 2:56 PM
> To: [email protected]
> Subject: btrfs rare silent data corruption with kernel data leak
>
> Summary:
>
> There seem to be two btrfs bugs here: one loses data on writes, and the
> other leaks data from the kernel to replace it on reads. It all happens after
> checksums are verified, so the corruption is entirely silent--no EIO errors,
> kernel messages, or device event statistics.
>
> Compressed extents are corrupted with kernel data leak. Uncompressed
> extents may not be corrupted, or may be corrupted by deterministically
> replacing data bytes with zero, or may not be corrupted. No preconditions
> for corruption are known. Less than one file per hundred thousand seems to
> be affected. Only specific parts of any file can be affected.
> Kernels v4.0..v4.5.7 tested, all have the issue.
Funny you should bring this up - I think I just suffered from this, or
something similar.
I have a MySQL database of around 20 GiB that runs under a relatively heavy
workload for weeks at a time.
Today I remembered that my root partition was still mounted with compression,
left over from when disk space was an issue about 4 months ago. I removed the
compress mount option, upgraded the kernel (from 4.7.2 to 4.7.4) and rebooted.
MySQL came up properly after the reboot.
I stopped MySQL and ran
"btrfs filesystem defragment -v -r -c none /var/lib/mysql"
to remove the compression. It finished reporting 1 error, but printed no
actual error messages.
However, MySQL now wouldn't come back up. Remembering what I had read earlier
today, I thought "oh no..." and checked dmesg:
[ 539.166231] BTRFS warning (device sda1): csum failed ino 42906332 off 81920
csum 2566472073 expected csum 1967602629
[ 539.166856] BTRFS warning (device sda1): csum failed ino 42906332 off 81920
csum 2566472073 expected csum 1967602629
[ 539.166865] BTRFS warning (device sda1): csum failed ino 42906332 off 94208
csum 2566472073 expected csum 1625955513
[ 539.166908] BTRFS warning (device sda1): csum failed ino 42906332 off 94208
csum 2566472073 expected csum 1625955513
[ 539.167553] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum
2566472073 expected csum 3995365962
[ 539.168234] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum
2566472073 expected csum 3995365962
[ 539.168239] BTRFS warning (device sda1): csum failed ino 42906332 off 4096
csum 2566472073 expected csum 3937913037
[ 539.168282] BTRFS warning (device sda1): csum failed ino 42906332 off 4096
csum 2566472073 expected csum 3937913037
[ 539.168286] BTRFS warning (device sda1): csum failed ino 42906332 off 8192
csum 2566472073 expected csum 1100728286
[ 539.168328] BTRFS warning (device sda1): csum failed ino 42906332 off 8192
csum 2566472073 expected csum 1100728286
[ 612.832463] __readpage_endio_check: 2 callbacks suppressed
[ 612.832466] BTRFS warning (device sda1): csum failed ino 42906332 off 81920
csum 2566472073 expected csum 1967602629
[ 612.833160] BTRFS warning (device sda1): csum failed ino 42906332 off 81920
csum 2566472073 expected csum 1967602629
[ 612.833167] BTRFS warning (device sda1): csum failed ino 42906332 off 94208
csum 2566472073 expected csum 1625955513
[ 612.833202] BTRFS warning (device sda1): csum failed ino 42906332 off 94208
csum 2566472073 expected csum 1625955513
[ 612.833863] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum
2566472073 expected csum 3995365962
[ 612.834549] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum
2566472073 expected csum 3995365962
[ 612.834555] BTRFS warning (device sda1): csum failed ino 42906332 off 4096
csum 2566472073 expected csum 3937913037
[ 612.834602] BTRFS warning (device sda1): csum failed ino 42906332 off 4096
csum 2566472073 expected csum 3937913037
[ 612.834608] BTRFS warning (device sda1): csum failed ino 42906332 off 8192
csum 2566472073 expected csum 1100728286
[ 612.834652] BTRFS warning (device sda1): csum failed ino 42906332 off 8192
csum 2566472073 expected csum 1100728286
Using debug tree I found that inode 42906332 was the file ibdata1.
I tried to copy the mysql directory elsewhere, but that caused I/O failures in
a few files, so I just removed the whole lot and restored from last night's
backup.
These are the errors I got before I cancelled the copy:
[ 1284.349881] __readpage_endio_check: 2 callbacks suppressed
[ 1284.349885] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum
2566472073 expected csum 3995365962
[ 1284.349901] BTRFS warning (device sda1): csum failed ino 42906332 off 65536
csum 2566472073 expected csum 3704130384
[ 1284.349906] BTRFS warning (device sda1): csum failed ino 42906332 off 126976
csum 2566472073 expected csum 254392532
[ 1284.349911] BTRFS warning (device sda1): csum failed ino 42906332 off 8192
csum 2566472073 expected csum 1100728286
[ 1284.349913] BTRFS warning (device sda1): csum failed ino 42906332 off 77824
csum 2566472073 expected csum 716549262
[ 1284.349923] BTRFS warning (device sda1): csum failed ino 42906332 off 131072
csum 2566472073 expected csum 788300917
[ 1284.349925] BTRFS warning (device sda1): csum failed ino 42906332 off 12288
csum 2566472073 expected csum 3265258934
[ 1284.349926] BTRFS warning (device sda1): csum failed ino 42906332 off 81920
csum 2566472073 expected csum 1967602629
[ 1284.349930] BTRFS warning (device sda1): csum failed ino 42906332 off 192512
csum 2566472073 expected csum 2025572636
[ 1284.349934] BTRFS warning (device sda1): csum failed ino 42906332 off 258048
csum 2566472073 expected csum 3392889013
[ 1298.892667] BTRFS info (device sda1): csum failed ino 44628191 extent
228384051200 csum 2566472073 wanted 847116788 mirror 0
[ 1298.892727] BTRFS info (device sda1): csum failed ino 44628191 extent
228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892732] BTRFS info (device sda1): csum failed ino 44628191 extent
228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892751] BTRFS info (device sda1): csum failed ino 44628191 extent
228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892786] BTRFS info (device sda1): csum failed ino 44628191 extent
228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892792] BTRFS info (device sda1): csum failed ino 44628191 extent
228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892805] BTRFS info (device sda1): csum failed ino 44628191 extent
228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892849] BTRFS info (device sda1): csum failed ino 44628191 extent
228384051200 csum 2566472073 wanted 847116788 mirror 0
[ 1298.892896] BTRFS info (device sda1): csum failed ino 44628191 extent
228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1311.456422] __readpage_endio_check: 4430 callbacks suppressed
[ 1311.456425] BTRFS warning (device sda1): csum failed ino 44628192 off
3221225472 csum 2566472073 expected csum 3669189289
[ 1311.456442] BTRFS warning (device sda1): csum failed ino 44628192 off
3221229568 csum 2566472073 expected csum 317582346
[ 1311.456451] BTRFS warning (device sda1): csum failed ino 44628192 off
3221233664 csum 2566472073 expected csum 1636016048
[ 1311.456459] BTRFS warning (device sda1): csum failed ino 44628192 off
3221237760 csum 2566472073 expected csum 95857614
[ 1311.456467] BTRFS warning (device sda1): csum failed ino 44628192 off
3221241856 csum 2566472073 expected csum 2014942236
[ 1311.456482] BTRFS warning (device sda1): csum failed ino 44628192 off
3221254144 csum 2566472073 expected csum 1884694409
[ 1311.456540] BTRFS warning (device sda1): csum failed ino 44628192 off
3222274048 csum 2566472073 expected csum 2741402016
[ 1311.456542] BTRFS warning (device sda1): csum failed ino 44628192 off
3222339584 csum 2566472073 expected csum 3503993973
[ 1311.456545] BTRFS warning (device sda1): csum failed ino 44628192 off
3222405120 csum 2566472073 expected csum 3548745998
[ 1311.456551] BTRFS warning (device sda1): csum failed ino 44628192 off
3222470656 csum 2566472073 expected csum 2988893031
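For anyone wanting to summarise warnings like the ones above, a rough sketch
of a tally script follows. It only assumes the two message shapes seen in this
log ("off ... expected csum ..." and "extent ... wanted ..."); the names
PATTERN and tally are made up for illustration, and other kernel versions may
word these messages differently.

```python
import re
from collections import Counter

# Matches both message shapes seen in the log above:
#   csum failed ino N off N csum N expected csum N
#   csum failed ino N extent N csum N wanted N
PATTERN = re.compile(
    r"csum failed ino (?P<ino>\d+) (?:off|extent) (?P<pos>\d+) "
    r"csum (?P<got>\d+) (?:expected csum|wanted) (?P<want>\d+)"
)


def tally(log_text):
    """Count computed checksum values and affected inodes in dmesg text."""
    # Mail wrapping split each record across two lines, so collapse all
    # whitespace before matching.
    flat = " ".join(log_text.split())
    got = Counter()
    inodes = Counter()
    for m in PATTERN.finditer(flat):
        got[int(m.group("got"))] += 1
        inodes[int(m.group("ino"))] += 1
    return got, inodes


if __name__ == "__main__":
    import sys

    got, inodes = tally(sys.stdin.read())
    print("computed csum values:", got.most_common(5))
    print("affected inodes:", inodes.most_common(5))
```

Piping the output of dmesg into the script prints the most frequent computed
checksum values and the inodes they were reported against.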
I'm seeing a lot of checksum 2566472073. I wonder, is that the checksum of
blank space?
Here are the details of the filesystem concerned:
vm-server mysql # btrfs fi show /
Label: 'Root'  uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
        Total devices 2 FS bytes used 103.21GiB
        devid   13 size 471.93GiB used 245.03GiB path /dev/sda1
        devid   14 size 471.93GiB used 245.03GiB path /dev/sdb1
vm-server mysql # btrfs fi df /
Data, RAID1: total=242.00GiB, used=102.42GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=3.00GiB, used=811.02MiB
GlobalReserve, single: total=272.00MiB, used=0.00B
Mount options:
/dev/sda1 on / type btrfs (rw,noatime,ssd,discard,noacl,space_cache=v2,subvolid=5,subvol=/)
(compress was enabled previously)
Regards,
Paul