On 2018-07-31 08:43, Sterling Windmill wrote:
> I am using a two disk raid1 btrfs filesystem spanning two external hard
> drives connected via USB 3.0.
Is there any speed difference between the two devices?
And are these two devices under the same USB 3.0 root hub or under different
root hubs? lsusb output could help to determine the hierarchy.

>
> While copying ~6TB of data from this filesystem to local disk via rsync
> I am seeing messages like the following in dmesg output:
>
> [ 2213.406267] BTRFS warning (device sdj1): csum failed root 5 ino 830
> off 2124197888 csum 0xb5da0cd2 expected csum 0x6e478250 mirror 2

Since only one copy shows the problem, the other copy should be good, so
the read should succeed without problems.

> [ 4890.178727] BTRFS warning (device sdj1): csum failed root 5 ino 1058
> off 26052067328 csum 0x8ccd1067 expected csum 0x4adb8254 mirror 2
> [27463.940218] BTRFS warning (device sdj1): csum failed root 5 ino 5372
> off 7954096128 csum 0x9f9b697e expected csum 0xbd61a0e2 mirror 2
> [29405.832643] BTRFS warning (device sdj1): csum failed root 5 ino 31374
> off 7893983232 csum 0x12fd0ddc expected csum 0xddcd2f8e mirror 2
> [31224.279082] BTRFS warning (device sdj1): csum failed root 5 ino
> 150903 off 183635968 csum 0xea025eb4 expected csum 0x46d64878 mirror 2
> [32282.635615] BTRFS warning (device sdj1): csum failed root 5 ino
> 162774 off 31092424704 csum 0x1ee9b38d expected csum 0x4022e3de mirror 2
> [41052.643493] BTRFS warning (device sdj1): csum failed root 5 ino
> 163742 off 52214816768 csum 0x6723208c expected csum 0x0377e68a mirror 2
> [47723.500430] BTRFS warning (device sdj1): csum failed root 5 ino
> 470775 off 12533760 csum 0x9f50f9a0 expected csum 0x23ddc68e mirror 2
> [60060.843425] BTRFS warning (device sdj1): csum failed root 5 ino
> 786762 off 4178321408 csum 0xcd520ead expected csum 0x46fe6ebc mirror 2
> [60900.058745] BTRFS warning (device sdj1): csum failed root 5 ino
> 786900 off 896303104 csum 0x4c7e26e7 expected csum 0x86554095 mirror 2
> [68149.417236] BTRFS warning (device sdj1): csum failed root 5 ino 1058
> off 3101224960 csum 0x2b8c363c expected csum 0x8df2991a mirror 1
> [69072.272010] BTRFS warning (device sdj1): csum failed root 5 ino 1141
> off 2939588608 csum 0xa2969f63 expected csum 0xddf33efd mirror 1
> [71342.354453] BTRFS warning (device sdj1): csum failed root 5 ino 1328
> off 57047568384 csum 0xd57f5bb7 expected csum 0x421f96e5 mirror 1
>
> Because the device was consistent, it seemed that one of the disks held
> bad data. I wasn't sure if btrfs was correcting the issue by using the
> other seemingly good copy on the second disk or if I was copying bad
> data to the destination filesystem, so I aborted the copy and ran a
> scrub of the filesystem that includes sdj1 by issuing the following command:
>
> btrfs scrub start /external
>
> I let the scrub finish and monitored the result using the following command:
>
> btrfs scrub status /external
>
> Which showed the following output:
>
> scrub status for ece518d2-4af0-4ef7-a31d-8c89b13a5ad9
> scrub started at Sun Jul 29 11:34:44 2018 and finished after
> 14:34:58
> total bytes scrubbed: 12.80TiB with 0 errors

Would you provide the dmesg output from during the scrub?

>
> Alright, perhaps btrfs had already fixed the issues upon encountering
> them. I ran my copy again only to see very similar messages show up in
> dmesg:
>
> [154842.551604] BTRFS warning (device sdj1): csum failed root 5 ino 1284
> off 858886144 csum 0x8caf203c expected csum 0x9a3acab6 mirror 2

At least the corrupted ino and offset are different, so the old corruption
is fixed, but somehow new corruption was introduced.

> [159949.727412] BTRFS warning (device sdj1): csum failed root 5 ino 1636
> off 4463370240 csum 0x8dfaf00c expected csum 0xa7ab457e mirror 2
> [160911.893913] BTRFS warning (device sdj1): csum failed root 5 ino 1729
> off 8181428224 csum 0xd57845b5 expected csum 0x6904c54e mirror 2
> [165210.245890] BTRFS warning (device sdj1): csum failed root 5 ino 2927
> off 1013219328 csum 0xf2d2820d expected csum 0x812222bb mirror 2
> [169279.620570] BTRFS warning (device sdj1): csum failed root 5 ino 3363
> off 900493312 csum 0x6c6a35a2 expected csum 0x2a983a9c mirror 2
> [169990.401373] BTRFS warning (device sdj1): csum failed root 5 ino 4277
> off 186707968 csum 0xbdd075d5 expected csum 0xf302e9df mirror 2
> [171411.085425] BTRFS warning (device sdj1): csum failed root 5 ino 4719
> off 593842176 csum 0xcdabc7e6 expected csum 0xc137d47a mirror 2
> [173370.025471] BTRFS warning (device sdj1): csum failed root 5 ino 5267
> off 2605592576 csum 0xcd2cb8a8 expected csum 0x9de364e9 mirror 2
> [180329.942125] BTRFS warning (device sdj1): csum failed root 5 ino
> 162774 off 22459506688 csum 0xc38e7a53 expected csum 0xad11854c mirror 2

Since all the corruption shown above is on mirror 2, would you mind scrubbing
a single device rather than the whole fs, and attaching the dmesg?

# btrfs scrub start <device>

>
> I would have expected the scrub to find these issues or to show some
> number of corrected errors. Perhaps I misunderstand what scrub does?

Your understanding is completely correct.
In fact, reading from a corrupted block should trigger a rewrite of the
corrupted data.

I'm wondering if it's related to some scrub race, since for multi-device
btrfs a full fs scrub is done by running multiple scrubs simultaneously,
one scrub for each device.

It used to cause problems for raid5/6, but I have never heard of it causing
corruption for raid1.

Would you provide the kernel version and the full dmesg (including the read
errors, the scrub, and the later reads)?

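To check whether the failures really are confined to one mirror, the csum warnings can be tallied per mirror number. The following is only a minimal sketch, assuming the dmesg lines have exactly the format quoted in this thread; the names `CSUM_RE`, `parse_csum_failures` and `failures_per_mirror` are hypothetical helpers, not part of btrfs-progs, and the mirror number is only the copy index printed by the kernel, not a device name.

```python
import re
from collections import Counter

# Pattern for the "csum failed" warnings as they appear in this thread.
# Assumption: other kernel versions may word the message differently.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\w+)\): csum failed root (?P<root>\d+) "
    r"ino (?P<ino>\d+) off (?P<off>\d+) csum (?P<csum>0x[0-9a-f]+) "
    r"expected csum (?P<expected>0x[0-9a-f]+) mirror (?P<mirror>\d+)"
)

# Three of the warnings quoted above, used as sample input.
SAMPLE = """
[ 2213.406267] BTRFS warning (device sdj1): csum failed root 5 ino 830 off 2124197888 csum 0xb5da0cd2 expected csum 0x6e478250 mirror 2
[68149.417236] BTRFS warning (device sdj1): csum failed root 5 ino 1058 off 3101224960 csum 0x2b8c363c expected csum 0x8df2991a mirror 1
[154842.551604] BTRFS warning (device sdj1): csum failed root 5 ino 1284 off 858886144 csum 0x8caf203c expected csum 0x9a3acab6 mirror 2
"""


def parse_csum_failures(dmesg_text):
    """Return one dict per csum warning found in the given dmesg text."""
    # Collapse all whitespace so that wrapped lines still match the pattern.
    flat = " ".join(dmesg_text.split())
    failures = []
    for match in CSUM_RE.finditer(flat):
        failures.append({
            "dev": match.group("dev"),
            "ino": int(match.group("ino")),
            "off": int(match.group("off")),
            "mirror": int(match.group("mirror")),
        })
    return failures


def failures_per_mirror(failures):
    """Count how many csum warnings were reported for each mirror number."""
    return Counter(f["mirror"] for f in failures)


if __name__ == "__main__":
    print(failures_per_mirror(parse_csum_failures(SAMPLE)))
```

Feeding it the complete dmesg (for example saved with `dmesg > dmesg.txt`) rather than these excerpts would show how the failures are distributed across the two copies.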
>
> I also tried tracking down individual files via the referenced inode
> numbers with the following command:
>
> btrfs inspect-internal inode-resolve $INODE /external
>
> And ran checksums of the source and destination versions of these files
> to find them to be identical. So at least the copy on the source and
> destination appear to match.

Since btrfs will switch to the good copy, the data should be correct.

>
> Maybe I'm experiencing some sort of intermittent USB device / bus issue?

The full dmesg may help, if there is anything USB-related in it.

Thanks,
Qu

> Can anyone help explain what might be happening here?
>
> Thanks!
>