Additional BTRFS scrub logs

root@xxxx# btrfs scrub start -B /mnt/g/
scrub done for 554b0043-052f-48d1-986f-5a6154496d89
    scrub started at Mon Oct 10 12:52:40 2016 and finished after 00:00:39
    total bytes scrubbed: 20.03GiB with 46304 errors
    error details: csum=46304
    corrected errors: 0, uncorrectable errors: 46304, unverified errors: 0
ERROR: there are uncorrectable errors

I've attached the dmesg log incase gmail wordwrap completely destroys formatting

dmesg :
[10284.452766] BTRFS warning (device sdb2): checksum error at logical
143896739840 on dev /dev/sdb2, sector 42509568, root 5, inode 270,
offset 10548412416, length 4096, links 1 (path: shutdown-4.bin)
[10284.452775] BTRFS warning (device sdb2): checksum error at logical
143896608768 on dev /dev/sdb2, sector 42509312, root 5, inode 270,
offset 10548281344, length 4096, links 1 (path: shutdown-4.bin)
[10284.452784] BTRFS warning (device sdb2): checksum error at logical
143896346624 on dev /dev/sdb2, sector 42508800, root 5, inode 270,
offset 10548019200, length 4096, links 1 (path: shutdown-4.bin)
[10284.452792] BTRFS warning (device sdb2): checksum error at logical
143896215552 on dev /dev/sdb2, sector 42508544, root 5, inode 270,
offset 10547888128, length 4096, links 1 (path: shutdown-4.bin)
[10284.452800] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 1, gen 0
[10284.452804] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 2, gen 0
[10284.452810] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 3, gen 0
[10284.452813] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896608768 on dev /dev/sdb2
[10284.452815] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896346624 on dev /dev/sdb2
[10284.452820] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896215552 on dev /dev/sdb2
[10284.452840] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 4, gen 0
[10284.452845] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896739840 on dev /dev/sdb2
[10284.452906] BTRFS warning (device sdb2): checksum error at logical
143896084480 on dev /dev/sdb2, sector 42508288, root 5, inode 270,
offset 10547757056, length 4096, links 1 (path: shutdown-4.bin)
[10284.453082] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 5, gen 0
[10284.453087] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896084480 on dev /dev/sdb2
[10284.453107] BTRFS warning (device sdb2): checksum error at logical
143896743936 on dev /dev/sdb2, sector 42509576, root 5, inode 270,
offset 10548416512, length 4096, links 1 (path: shutdown-4.bin)
[10284.453113] BTRFS warning (device sdb2): checksum error at logical
143896612864 on dev /dev/sdb2, sector 42509320, root 5, inode 270,
offset 10548285440, length 4096, links 1 (path: shutdown-4.bin)
[10284.453122] BTRFS warning (device sdb2): checksum error at logical
143896350720 on dev /dev/sdb2, sector 42508808, root 5, inode 270,
offset 10548023296, length 4096, links 1 (path: shutdown-4.bin)
[10284.453126] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 6, gen 0
[10284.453130] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896612864 on dev /dev/sdb2
[10284.453134] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 7, gen 0
[10284.453137] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896350720 on dev /dev/sdb2
[10284.453153] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 8, gen 0
[10284.453157] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896743936 on dev /dev/sdb2
[10284.453190] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 9, gen 0
[10284.453194] BTRFS warning (device sdb2): checksum error at logical
143896219648 on dev /dev/sdb2, sector 42508552, root 5, inode 270,
offset 10547892224, length 4096, links 1 (path: shutdown-4.bin)
[10284.453198] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 10, gen 0
[10284.453201] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896219648 on dev /dev/sdb2
[10284.453207] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896616960 on dev /dev/sdb2
[10284.453238] BTRFS warning (device sdb2): checksum error at logical
143896477696 on dev /dev/sdb2, sector 42509056, root 5, inode 270,
offset 10548150272, length 4096, links 1 (path: shutdown-4.bin)

Important to note this happens across 2 machines, now 3 different
drives (all different brand SSDs), on a Z170 chipset and C612 chipset.

On Mon, Oct 10, 2016 at 10:44 AM, Martin Dev <mrturtle...@gmail.com> wrote:
> Hey everyone,
>
> I work for system verification of SSDs and we've recently come up
> against an issue with BTRFS on Ubuntu 16.04. We have a framework which
> follows the following steps:
>
> Generate verifiable 10GB file with FIO on internal drive
> Copy 10GB file to 2 target partitions on DUT (using "cp" command)
> Sync
> Verify copied files with FIO (using direct=1)
> Perform a power event (restart, shutdown, suspend, or hibernate)
> Verify both 10GB files
> Repeat.
>
> We keep the files from the first iteration on the target partitions,
> but subsequent iterations are deleted after verification. Every
> command is monitored for exit status and framework will fail with
> error if anything exits non zero.
>
> We've found that during this process (between 2-9 iterations of
> restarts or shutdowns) BTRFS will fail the pre-power verification of
> the file with 100% reproduction rate out of 7 attempts. So this is a
> "cp" command to copy from the internal to the btrfs partition, a sync,
> then a verify with fio. This seems to be a recent change as the same
> process has been used for the last 2 years including over the last
> month with no issues.
>
> Here are some more details:
>
> Linux ht_stress_b6_20 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18
> 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs-progs v4.4
>
> Label: none  uuid: 554b0043-052f-48d1-986f-5a6154496d89
>     Total devices 1 FS bytes used 20.03GiB
>     devid    1 size 29.81GiB used 22.26GiB path /dev/sdb2
>
> Data, single: total=22.00GiB, used=20.00GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=264.00MiB, used=20.64MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
>
> df -hT
> /dev/sdb1      ext4       30G   21G  7.7G  73% /mnt/f
> /dev/sdb2      btrfs      30G   21G  9.6G  68% /mnt/g
> /dev/sdb3      xfs        30G   33M   30G   1% /mnt/h
> /dev/sdb4      ext4       30G   44M   28G   1% /mnt/i
>
> /etc/fstab
> /dev/sdb2                                       /mnt/g/ btrfs
> defaults,discard       0       0
>
> root@xxxx: ls -l /mnt/g/
> total 20971520
> -rw-r--r-- 1 root root 10737418240 Oct 10 09:42 shutdown-1.bin
> -rw-r--r-- 1 root root 10737418240 Oct 10 10:03 shutdown-4.bin
>
>
> root@xxxx# cat fio-screenlog.log
> test: (g=0): rw=read, bs=128K-128K/128K-128K/128K-128K,
> ioengine=libaio, iodepth=32
> fio-2.13-88-g32bc8-dirty
> Starting 1 process
> verify: bad magic header 101, wanted acca at file
> /mnt/g/shutdown-4.bin offset 10547757056, length 131072
> verify: bad magic header 101, wanted acca at file
> /mnt/g/shutdown-4.bin offset 10547888128, length 131072
> fio: pid=2288, err=84/file:io_u.c:1978, func=io_u_queued_complete,
> error=Invalid or incomplete multibyte or wide character
>
> In this situation, FIO is reading "101" instead of the correct data
> from the shutdown-4.bin file. from the "ls -l" above, we can see the
> shutdown-1.bin and shutdown-4.bin are the same size and we know the cp
> command exited 0.
>
> shutdown-1.bin (OK):
> dd if=/mnt/g/shutdown-1.bin of=/tmp/header bs=512 count=1 skip=20601088
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0.00247518 s, 207 kB/s
>
> Data:
> 0000000 acca 0001 0000 0002 3d92 e664 178e 8421
> 0000010 0000 74b2 0002 0000 0016 0000 2bfa 0009
> 0000020 0001 3a59 c82b 1b3a a7b8 ee74 881a 0747
> 0000030 94f7 09f2 79c9 04e3 d29e d6c2 ea3f 04b8
> ...
>
> shutdown-4.bin (FAILS):
> dd if=/mnt/g/shutdown-4.bin of=/tmp/header bs=512 count=1 skip=20601088
> dd: error reading '/mnt/g/shutdown-4.bin': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.00230287 s, 0.0 kB/s
>
> We can't even read the file at that offset correctly and have to go
> back to 20601087 (1 512 byte sector) before I can get valid data.
> What's interesting, is over 2 different drives, 2 different machines,
> and 7 reproductions, the data is always "101", this might be a quirk
> of FIO, as I would hope that btrfs would not return corrupt data to
> the user.
>
> Reproduction is straight forward, and takes around an hour. I have a
> large amount of duplicate machines at my disposal if debug /
> investigations need to be run. I can provide portions of our
> automation logs, full dmesg (attached), and any other information that
> needs to be gathered.
[10284.452766] BTRFS warning (device sdb2): checksum error at logical 143896739840 on dev /dev/sdb2, sector 42509568, root 5, inode 270, offset 10548412416, length 4096, links 1 (path: shutdown-4.bin)
[10284.452775] BTRFS warning (device sdb2): checksum error at logical 143896608768 on dev /dev/sdb2, sector 42509312, root 5, inode 270, offset 10548281344, length 4096, links 1 (path: shutdown-4.bin)
[10284.452784] BTRFS warning (device sdb2): checksum error at logical 143896346624 on dev /dev/sdb2, sector 42508800, root 5, inode 270, offset 10548019200, length 4096, links 1 (path: shutdown-4.bin)
[10284.452792] BTRFS warning (device sdb2): checksum error at logical 143896215552 on dev /dev/sdb2, sector 42508544, root 5, inode 270, offset 10547888128, length 4096, links 1 (path: shutdown-4.bin)
[10284.452800] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[10284.452804] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[10284.452810] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[10284.452813] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896608768 on dev /dev/sdb2
[10284.452815] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896346624 on dev /dev/sdb2
[10284.452820] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896215552 on dev /dev/sdb2
[10284.452840] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[10284.452845] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896739840 on dev /dev/sdb2
[10284.452906] BTRFS warning (device sdb2): checksum error at logical 143896084480 on dev /dev/sdb2, sector 42508288, root 5, inode 270, offset 10547757056, length 4096, links 1 (path: shutdown-4.bin)
[10284.453082] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[10284.453087] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896084480 on dev /dev/sdb2
[10284.453107] BTRFS warning (device sdb2): checksum error at logical 143896743936 on dev /dev/sdb2, sector 42509576, root 5, inode 270, offset 10548416512, length 4096, links 1 (path: shutdown-4.bin)
[10284.453113] BTRFS warning (device sdb2): checksum error at logical 143896612864 on dev /dev/sdb2, sector 42509320, root 5, inode 270, offset 10548285440, length 4096, links 1 (path: shutdown-4.bin)
[10284.453122] BTRFS warning (device sdb2): checksum error at logical 143896350720 on dev /dev/sdb2, sector 42508808, root 5, inode 270, offset 10548023296, length 4096, links 1 (path: shutdown-4.bin)
[10284.453126] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[10284.453130] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896612864 on dev /dev/sdb2
[10284.453134] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[10284.453137] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896350720 on dev /dev/sdb2
[10284.453153] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[10284.453157] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896743936 on dev /dev/sdb2
[10284.453190] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[10284.453194] BTRFS warning (device sdb2): checksum error at logical 143896219648 on dev /dev/sdb2, sector 42508552, root 5, inode 270, offset 10547892224, length 4096, links 1 (path: shutdown-4.bin)
[10284.453198] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[10284.453201] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896219648 on dev /dev/sdb2
[10284.453207] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896616960 on dev /dev/sdb2
[10284.453238] BTRFS warning (device sdb2): checksum error at logical 143896477696 on dev /dev/sdb2, sector 42509056, root 5, inode 270, offset 10548150272, length 4096, links 1 (path: shutdown-4.bin)

Reply via email to