Thanks for the speedy reply!

Here's my kernel version:

4.17.9-200.fc28.x86_64

dmesg doesn't show any USB related info at all, no signs of errors / warnings.

Both drives are identical, Seagate 8TB external drives connected to the following PCIe controller:

03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)

lsusb output:

Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 005: ID 0bc2:ab38 Seagate RSS LLC Backup Plus Hub
Bus 004 Device 003: ID 0bc2:ab45 Seagate RSS LLC
Bus 004 Device 004: ID 0bc2:ab38 Seagate RSS LLC Backup Plus Hub
Bus 004 Device 002: ID 0bc2:ab45 Seagate RSS LLC
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 0bc2:ab44 Seagate RSS LLC
Bus 003 Device 002: ID 0bc2:ab44 Seagate RSS LLC
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0624:0248 Avocent Corp. Virtual Hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

There's no output in dmesg related to the scrub, only the few csum failed messages I included in my last email.

Per your request, I'm starting a scrub on the single device with the following command:

btrfs scrub start /dev/sdj1
scrub started on /dev/sdj1, fsid ece518d2-4af0-4ef7-a31d-8c89b13a5ad9 (pid=14684)

I'll report back once this is complete.

Anything else I can collect that might be helpful in understanding what's happening here?

On Mon, Jul 30, 2018 at 8:56 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:

>
>
> On 2018年07月31日 08:43, Sterling Windmill wrote:
> > I am using a two disk raid1 btrfs filesystem spanning two external hard
> > drives connected via USB 3.0.
>
> Is there any speed difference between the two device?
> And are these 2 devices under the same USB3.0 root hub or different root
> hubs?
>
> lsusb output could help to determine the hierarchy.
>
> >
> > While copying ~6TB of data from this filesystem to local disk via rsync
> > I am seeing messages like the following in dmesg output:
> >
> > [ 2213.406267] BTRFS warning (device sdj1): csum failed root 5 ino 830
> > off 2124197888 csum 0xb5da0cd2 expected csum 0x6e478250 mirror 2
>
> Since only one copy shows the problem, the other copy should be good
> thus the read should work without problem.
>
> > [ 4890.178727] BTRFS warning (device sdj1): csum failed root 5 ino 1058
> > off 26052067328 csum 0x8ccd1067 expected csum 0x4adb8254 mirror 2
> > [27463.940218] BTRFS warning (device sdj1): csum failed root 5 ino 5372
> > off 7954096128 csum 0x9f9b697e expected csum 0xbd61a0e2 mirror 2
> > [29405.832643] BTRFS warning (device sdj1): csum failed root 5 ino 31374
> > off 7893983232 csum 0x12fd0ddc expected csum 0xddcd2f8e mirror 2
> > [31224.279082] BTRFS warning (device sdj1): csum failed root 5 ino
> > 150903 off 183635968 csum 0xea025eb4 expected csum 0x46d64878 mirror 2
> > [32282.635615] BTRFS warning (device sdj1): csum failed root 5 ino
> > 162774 off 31092424704 csum 0x1ee9b38d expected csum 0x4022e3de mirror 2
> > [41052.643493] BTRFS warning (device sdj1): csum failed root 5 ino
> > 163742 off 52214816768 csum 0x6723208c expected csum 0x0377e68a mirror 2
> > [47723.500430] BTRFS warning (device sdj1): csum failed root 5 ino
> > 470775 off 12533760 csum 0x9f50f9a0 expected csum 0x23ddc68e mirror 2
> > [60060.843425] BTRFS warning (device sdj1): csum failed root 5 ino
> > 786762 off 4178321408 csum 0xcd520ead expected csum 0x46fe6ebc mirror 2
> > [60900.058745] BTRFS warning (device sdj1): csum failed root 5 ino
> > 786900 off 896303104 csum 0x4c7e26e7 expected csum 0x86554095 mirror 2
> > [68149.417236] BTRFS warning (device sdj1): csum failed root 5 ino 1058
> > off 3101224960 csum 0x2b8c363c expected csum 0x8df2991a mirror 1
> > [69072.272010] BTRFS warning (device sdj1): csum failed root 5 ino 1141
> > off 2939588608 csum 0xa2969f63 expected csum 0xddf33efd mirror 1
> > [71342.354453] BTRFS warning (device sdj1): csum failed root 5 ino 1328
> > off 57047568384 csum 0xd57f5bb7 expected csum 0x421f96e5 mirror 1
> >
> > Because the device was consistent, it seemed that one of the disks held
> > bad data. I wasn't sure if btrfs was correcting the issue by using the
> > other seemingly good copy on the second disk or if I was copying bad
> > data to the destination filesystem, so I aborted the copy and ran a
> > scrub of the filesystem that includes sdj1 by issuing the following
> > command:
> >
> > btrfs scrub start /external
> >
> > I let the scrub finish and monitored the result using the following
> > command:
> >
> > btrfs scrub status /external
> >
> > Which showed the following output:
> >
> > scrub status for ece518d2-4af0-4ef7-a31d-8c89b13a5ad9
> > scrub started at Sun Jul 29 11:34:44 2018 and finished after
> > 14:34:58
> > total bytes scrubbed: 12.80TiB with 0 errors
>
> Would you provide the dmesg during the scrub?
>
> >
> > Alright, perhaps btrfs had already fixed the issues upon encountering
> > them. I ran my copy again only to see very similar messages show up in
> > dmesg:
> >
> > [154842.551604] BTRFS warning (device sdj1): csum failed root 5 ino 1284
> > off 858886144 csum 0x8caf203c expected csum 0x9a3acab6 mirror 2
>
> At least the corrupted ino and offset is different, thus the old
> corruption is fixed, but somehow it introduced new corruption.
>
> > [159949.727412] BTRFS warning (device sdj1): csum failed root 5 ino 1636
> > off 4463370240 csum 0x8dfaf00c expected csum 0xa7ab457e mirror 2
> > [160911.893913] BTRFS warning (device sdj1): csum failed root 5 ino 1729
> > off 8181428224 csum 0xd57845b5 expected csum 0x6904c54e mirror 2
> > [165210.245890] BTRFS warning (device sdj1): csum failed root 5 ino 2927
> > off 1013219328 csum 0xf2d2820d expected csum 0x812222bb mirror 2
> > [169279.620570] BTRFS warning (device sdj1): csum failed root 5 ino 3363
> > off 900493312 csum 0x6c6a35a2 expected csum 0x2a983a9c mirror 2
> > [169990.401373] BTRFS warning (device sdj1): csum failed root 5 ino 4277
> > off 186707968 csum 0xbdd075d5 expected csum 0xf302e9df mirror 2
> > [171411.085425] BTRFS warning (device sdj1): csum failed root 5 ino 4719
> > off 593842176 csum 0xcdabc7e6 expected csum 0xc137d47a mirror 2
> > [173370.025471] BTRFS warning (device sdj1): csum failed root 5 ino 5267
> > off 2605592576 csum 0xcd2cb8a8 expected csum 0x9de364e9 mirror 2
> > [180329.942125] BTRFS warning (device sdj1): csum failed root 5 ino
> > 162774 off 22459506688 csum 0xc38e7a53 expected csum 0xad11854c mirror 2
>
> Since all corruption showed above is about mirror 2, would you mind to
> try scrub certain device other than the whole fs and attach the dmesg?
>
> # btrfs scrub start <device>
>
> >
> >
> > I would have expected the scrub to find these issues or to show some
> > number of corrected errors. Perhaps I misunderstand what scrub does?
>
> Your understanding is completely correct.
> In fact reading from corrupted block should trigger re-write on
> corrupted data.
>
> I'm wondering if it's related to some scrub race, since for multi-device
> btrfs, full fs scrub is addressed by doing multiple scrub
> simultaneously, one scrub for each device.
> It used to cause problem for raid5/6, but never heard of corruption for
> raid1.
>
> Would you provide the kernel version and full dmesg (including reading
> error and scrub, and later read)?
>
> >
> > I also tried tracking down individual files via the referenced inode
> > numbers with the following command:
> >
> > btrfs inspect-internal inode-resolve $INODE /external
> >
> > And ran checksums of the source and destination versions of these files
> > to find them to be identical. So at least the copy on the source and
> > destination appear to match.
>
> Since btrfs will switch to the good copy, the data should be correct.
>
> >
> > Maybe I'm experiencing some sort of intermittent USB device / bus issue?
>
> Full dmesg may help, if there is something related to usb.
>
> Thanks,
> Qu
>
> > Can anyone help explain what might be happening here?
> >
> > Thanks!
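
Since the question of which mirror the failures land on comes up in the quoted discussion, one way to check it is to tally the "csum failed" warnings by mirror number. The following is a minimal sketch in Python, assuming the warnings appear one per line in the format quoted above. The function name summarize and the regular expression are illustrative only and are not part of btrfs-progs or the kernel.

```python
import re
from collections import Counter

# Matches "csum failed" warnings in the format quoted in this thread.
# Assumption: each warning is on a single line (as in raw dmesg output,
# not as re-wrapped by a mail client).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\S+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def summarize(lines):
    """Return (failure count per mirror, sorted list of affected inodes)."""
    per_mirror = Counter()
    inodes = set()
    for line in lines:
        m = CSUM_RE.search(line)
        if m is None:
            continue  # not a csum warning; skip it
        per_mirror[int(m["mirror"])] += 1
        inodes.add(int(m["ino"]))
    return per_mirror, sorted(inodes)
```

To use it, pass any iterable of log lines, for example an open file containing saved dmesg output. The inode list it returns could then be fed one at a time to the inode-resolve command quoted above.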