On Fri, Jun 29, 2018 at 11:55 AM, <j...@livingrock.ca> wrote:
> Hello,
>
> I recently was mounting some drives in the free drive bays of my server and
> accidentally removed one of my drives from my raid1 btrfs array. I
> immediately put it back in - but whatever damage that would have caused
> seems to be already done.
>
> I got these log errors [dmesg -T] and after reading up it seemed that all I
> *probably* needed to do was a scrub:
>
> Finally - the original logs I found by running "dmesg -T|grep -i btrfs" -
> from where the log started to where the scrub began:
>
> [Tue Jun 26 18:30:08 2018] BTRFS error (device sdb): bdev /dev/sdh errs: wr 40547, rd 42808, flush 2, corrupt 0, gen 0
>
> [Tue Jun 26 18:33:19 2018] btrfs_dev_stat_print_on_error: 2056 callbacks suppressed
>
> A bunch of these errors...
>
> Then later:
>
> [Tue Jun 26 19:19:52 2018] BTRFS warning (device sdb): lost page write due to IO error on /dev/sdh
>
> I think this is where I unmounted/remounted /mnt/titan:
>
> [Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): disk space caching is enabled
> [Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): has skinny extents
> [Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 0
> [Tue Jun 26 19:49:57 2018] btrfs_dev_stat_print_on_error: 49 callbacks suppressed
> [Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 1
> [Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 2
> [Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 3
> [Tue Jun 26 19:50:09 2018] BTRFS error (device sdb): parent transid verify failed on 9081264963584 wanted 8663 found 8321
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081264963584 (dev /dev/sdh sector 17728418112)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081264967680 (dev /dev/sdh sector 17728418120)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081264971776 (dev /dev/sdh sector 17728418128)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081264975872 (dev /dev/sdh sector 17728418136)
> [Tue Jun 26 19:50:09 2018] BTRFS error (device sdb): parent transid verify failed on 9081265356800 wanted 8663 found 8321
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265356800 (dev /dev/sdh sector 17728418880)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265360896 (dev /dev/sdh sector 17728418888)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265364992 (dev /dev/sdh sector 17728418896)
> [Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265369088 (dev /dev/sdh sector 17728418904)

These are passive fixups that succeed.

>
> According to btrfs scrub status this is when the scrub began...
>
> [Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid verify failed on 9081265061888 wanted 8663 found 8321
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265061888 (dev /dev/sdh sector 17728418304)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265065984 (dev /dev/sdh sector 17728418312)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265070080 (dev /dev/sdh sector 17728418320)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081265074176 (dev /dev/sdh sector 17728418328)
> [Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid verify failed on 9081263046656 wanted 8664 found 8662
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263046656 (dev /dev/sdh sector 17728414368)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263050752 (dev /dev/sdh sector 17728414376)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263054848 (dev /dev/sdh sector 17728414384)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263058944 (dev /dev/sdh sector 17728414392)
> [Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid verify failed on 9081263194112 wanted 8664 found 8662
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263194112 (dev /dev/sdh sector 17728414656)
> [Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error corrected: ino 0 off 9081263198208 (dev /dev/sdh sector 17728414664)
> [Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid verify failed on 9081264128000 wanted 8664 found 8662
> [Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid verify failed on 9081265520640 wanted 8665 found 8321

Fixups during scrub also succeed.

>
> So I ran btrfs scrub and this was my post-scrub status:
>
> Idoru:/ # btrfs scrub status -d /mnt/titan
> scrub status for e7c494fc-ac4e-4644-a2f2-66f25543708b
> scrub device /dev/sdb (id 1) history
>         scrub started at Tue Jun 26 19:50:29 2018 and finished after 16:10:04
>         total bytes scrubbed: 8.57TiB with 0 errors
> scrub device /dev/sdh (id 2) history
>         scrub started at Tue Jun 26 19:50:29 2018 and finished after 16:13:31
>         total bytes scrubbed: 8.57TiB with 3 errors
>         error details: super=3
>         corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
>
> On a side note - I'm not sure if it matters (or why it might/could/would) -
> but I actually used entire drives [as opposed to making partitions and using
> them] to create my mirror.

Doesn't matter.

> I unmounted again and ran btrfs check [without repair] again - after doing
> scrub:
>
> Idoru:/ # btrfs check /dev/sdh
> Checking filesystem on /dev/sdh
> UUID: e7c494fc-ac4e-4644-a2f2-66f25543708b
> checking extents
> checking free space cache
> checking fs roots
> checking only csum items (without verifying data)
> checking root refs
> found 9414630215796 bytes used, no error found
> total csum bytes: 9191775488
> total tree bytes: 12803850240
> total fs tree bytes: 3054665728
> total extent tree bytes: 230211584
> btree space waste bytes: 782320214
> file data blocks allocated: 132510269804544
>  referenced 132509780557824
>
> Idoru:/ # btrfs check /dev/sdb
> Checking filesystem on /dev/sdb
> UUID: e7c494fc-ac4e-4644-a2f2-66f25543708b
> checking extents
> checking free space cache
> checking fs roots
> checking only csum items (without verifying data)
> checking root refs
> found 9414630215796 bytes used, no error found
> total csum bytes: 9191775488
> total tree bytes: 12803850240
> total fs tree bytes: 3054665728
> total extent tree bytes: 230211584
> btree space waste bytes: 782320214
> file data blocks allocated: 132510269804544
>  referenced 132509780557824

FWIW you only need to check once. It doesn't matter which device you
pass to btrfs check; it finds the other devices and they all get
checked.

>
> Everything seems identical [and OK].
>
> So I tried remounting and found the following in my dmesg log:
>
> [Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): disk space caching is enabled
> [Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): has skinny extents
> [Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 3
>
> So I still seem to have a problem...

The bdev stats line is a counter, not an active error message. If you
want to zero out the counter, use the -z option with btrfs dev stats.

> I guess I'm wondering why the super errors were not corrected in the scrub
> [uncorrectable errors:0 - super but it didn't correct - or even try to -
> correct the super errors]?

You can use 'btrfs rescue super -v' to check all the supers separately
from any other check. But the supers would have been fixed
automatically right off the bat with a normal mount in this case.

--
Chris Murphy
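To make the point about the bdev stats being a counter concrete, here is a rough Python sketch that compares the last bdev line logged before the scrub with the one logged at the Jun 29 remount. The helper names (parse_bdev_counters, counter_delta) are made up for illustration, and the regex is only an assumption based on the line format in the quoted logs, not anything btrfs itself provides.

```python
import re

# Assumed format, taken from the quoted dmesg lines:
#   bdev /dev/sdh errs: wr N, rd N, flush N, corrupt N, gen N
BDEV_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_counters(line):
    """Return (device, {counter: value}) for a bdev stats line, or None."""
    match = BDEV_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    dev = fields.pop("dev")
    return dev, {name: int(value) for name, value in fields.items()}


def counter_delta(before, after):
    """Per-counter increase between two snapshots of the same device."""
    return {name: after[name] - before[name] for name in before}


# Both lines are copied from the logs quoted in this thread.
before_line = (
    "[Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): "
    "bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 3"
)
after_line = (
    "[Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): "
    "bdev /dev/sdh errs: wr 55907, rd 59133, flush 2, corrupt 0, gen 3"
)

_, before = parse_bdev_counters(before_line)
_, after = parse_bdev_counters(after_line)
delta = counter_delta(before, after)
print(delta)  # {'wr': 0, 'rd': 0, 'flush': 0, 'corrupt': 0, 'gen': 0}
```

Every delta is zero, so nothing new was counted between the scrub and the remount; the line at mount time just reports the stored totals until they are reset with the -z option.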