Hello,
I was recently mounting some drives in the free drive bays of my server
and accidentally pulled one of the drives in my btrfs raid1 array. I
put it back immediately - but whatever damage that caused seems to
have already been done.
I got these log errors [dmesg -T], and after reading up it seemed that
all I *probably* needed to do was run a scrub.
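For reference, this is roughly the scrub invocation I used (from memory - exact flags not guaranteed):

```shell
# Start a scrub on the mounted filesystem; -d makes
# "btrfs scrub status -d" report per-device results afterwards.
btrfs scrub start -d /mnt/titan

# ...and how I checked on its progress:
btrfs scrub status -d /mnt/titan
```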
Here are the original logs, found by running "dmesg -T | grep -i btrfs"
- from where the errors started to where the scrub began:
[Tue Jun 26 18:30:08 2018] BTRFS error (device sdb): bdev /dev/sdh errs:
wr 40547, rd 42808, flush 2, corrupt 0, gen 0
[Tue Jun 26 18:33:19 2018] btrfs_dev_stat_print_on_error: 2056 callbacks
suppressed
A bunch of these errors...
Then later:
[Tue Jun 26 19:19:52 2018] BTRFS warning (device sdb): lost page write
due to IO error on /dev/sdh
I think this is where I unmounted/remounted /mnt/titan:
[Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): disk space caching
is enabled
[Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): has skinny extents
[Tue Jun 26 19:49:18 2018] BTRFS info (device sdb): bdev /dev/sdh errs:
wr 55907, rd 59133, flush 2, corrupt 0, gen 0
[Tue Jun 26 19:49:57 2018] btrfs_dev_stat_print_on_error: 49 callbacks
suppressed
[Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs:
wr 55907, rd 59133, flush 2, corrupt 0, gen 1
[Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs:
wr 55907, rd 59133, flush 2, corrupt 0, gen 2
[Tue Jun 26 19:49:57 2018] BTRFS error (device sdb): bdev /dev/sdh errs:
wr 55907, rd 59133, flush 2, corrupt 0, gen 3
[Tue Jun 26 19:50:09 2018] BTRFS error (device sdb): parent transid
verify failed on 9081264963584 wanted 8663 found 8321
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081264963584 (dev /dev/sdh sector 17728418112)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081264967680 (dev /dev/sdh sector 17728418120)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081264971776 (dev /dev/sdh sector 17728418128)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081264975872 (dev /dev/sdh sector 17728418136)
[Tue Jun 26 19:50:09 2018] BTRFS error (device sdb): parent transid
verify failed on 9081265356800 wanted 8663 found 8321
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265356800 (dev /dev/sdh sector 17728418880)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265360896 (dev /dev/sdh sector 17728418888)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265364992 (dev /dev/sdh sector 17728418896)
[Tue Jun 26 19:50:09 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265369088 (dev /dev/sdh sector 17728418904)
According to btrfs scrub status, this is when the scrub began...
[Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid
verify failed on 9081265061888 wanted 8663 found 8321
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265061888 (dev /dev/sdh sector 17728418304)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265065984 (dev /dev/sdh sector 17728418312)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265070080 (dev /dev/sdh sector 17728418320)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081265074176 (dev /dev/sdh sector 17728418328)
[Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid
verify failed on 9081263046656 wanted 8664 found 8662
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263046656 (dev /dev/sdh sector 17728414368)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263050752 (dev /dev/sdh sector 17728414376)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263054848 (dev /dev/sdh sector 17728414384)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263058944 (dev /dev/sdh sector 17728414392)
[Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid
verify failed on 9081263194112 wanted 8664 found 8662
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263194112 (dev /dev/sdh sector 17728414656)
[Tue Jun 26 19:50:29 2018] BTRFS info (device sdb): read error
corrected: ino 0 off 9081263198208 (dev /dev/sdh sector 17728414664)
[Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid
verify failed on 9081264128000 wanted 8664 found 8662
[Tue Jun 26 19:50:29 2018] BTRFS error (device sdb): parent transid
verify failed on 9081265520640 wanted 8665 found 8321
So I ran btrfs scrub and this was my post-scrub status:
Idoru:/ # btrfs scrub status -d /mnt/titan
scrub status for e7c494fc-ac4e-4644-a2f2-66f25543708b
scrub device /dev/sdb (id 1) history
scrub started at Tue Jun 26 19:50:29 2018 and finished after
16:10:04
total bytes scrubbed: 8.57TiB with 0 errors
scrub device /dev/sdh (id 2) history
scrub started at Tue Jun 26 19:50:29 2018 and finished after
16:13:31
total bytes scrubbed: 8.57TiB with 3 errors
error details: super=3
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
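One thing I noticed while reading up (I'm not sure if it's relevant): the wr/rd/flush counters in those dmesg lines are persistent, stored in the filesystem itself, which may be why they reappear at every mount. A sketch of how I understand they can be read - and apparently zeroed - with btrfs device stats, in case someone can confirm whether resetting them is appropriate here:

```shell
# Read the persistent per-device error counters (the same wr/rd/flush
# numbers the kernel prints at mount time):
btrfs device stats /mnt/titan

# Apparently the counters can be reset once the underlying problem is
# resolved, so that any new errors stand out -- commented out because
# I haven't dared run it yet:
# btrfs device stats -z /mnt/titan
```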
On a side note - I'm not sure whether it matters (or why it would) -
but I used whole drives [as opposed to creating partitions and using
those] when I created my mirror.
Idoru:/ # gdisk -l /dev/sdb
GPT fdisk (gdisk) version 1.0.1
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present
Creating new GPT entries.
Disk /dev/sdb: 19532873728 sectors, 9.1 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): DEDB0F7A-93A7-45C9-8235-828492422559
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19532873694
Partitions will be aligned on 2048-sector boundaries
Total free space is 19532873661 sectors (9.1 TiB)
Number Start (sector) End (sector) Size Code Name
Also, here is the device usage, post-scrub [it seems to be the same,
byte for byte]:
Idoru:/ # btrfs device usage -b /mnt/titan
/dev/sdb, ID: 1
Device size: 10000831348736
Device slack: 0
Data,RAID1: 9428526956544
Metadata,RAID1: 13958643712
System,RAID1: 8388608
Unallocated: 558337359872
/dev/sdh, ID: 2
Device size: 10000831348736
Device slack: 0
Data,RAID1: 9428526956544
Metadata,RAID1: 13958643712
System,RAID1: 8388608
Unallocated: 558337359872
After the scrub, I unmounted again and re-ran btrfs check [without
--repair]:
Idoru:/ # btrfs check /dev/sdh
Checking filesystem on /dev/sdh
UUID: e7c494fc-ac4e-4644-a2f2-66f25543708b
checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
found 9414630215796 bytes used, no error found
total csum bytes: 9191775488
total tree bytes: 12803850240
total fs tree bytes: 3054665728
total extent tree bytes: 230211584
btree space waste bytes: 782320214
file data blocks allocated: 132510269804544
referenced 132509780557824
Idoru:/ # btrfs check /dev/sdb
Checking filesystem on /dev/sdb
UUID: e7c494fc-ac4e-4644-a2f2-66f25543708b
checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
found 9414630215796 bytes used, no error found
total csum bytes: 9191775488
total tree bytes: 12803850240
total fs tree bytes: 3054665728
total extent tree bytes: 230211584
btree space waste bytes: 782320214
file data blocks allocated: 132510269804544
referenced 132509780557824
Everything seems identical [and OK].
So I tried remounting and found the following in my dmesg log:
[Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): disk space caching is
enabled
[Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): has skinny extents
[Fri Jun 29 13:27:21 2018] BTRFS info (device sdb): bdev /dev/sdh errs:
wr 55907, rd 59133, flush 2, corrupt 0, gen 3
So I still seem to have a problem...
I guess I'm wondering why the super errors were not corrected by the
scrub [it reports "uncorrectable errors: 0", yet it apparently neither
corrected - nor even tried to correct - the three super errors]?
Is there an easy way to tell the drive I removed to "resync" to the
drive that was not removed? I actually assumed [yes - I know what they
say about assumptions] that the OS would know which drive was valid in
the event of a disagreement and would *easily* catch and correct this.
Since that doesn't seem to be the case, I'm wondering if there is a
command I'm overlooking, or if I need to break and rebuild the entire
mirror? Or is this what btrfs check --repair is for (it doesn't report
anything wrong when run without --repair)?
Is there a way to revalidate and repair only the superblock [perhaps by
using the superblock from the drive that wasn't removed]? I definitely
don't want "btrfs check --repair" deciding there's a problem with my
"good" drive and breaking it in the process of fixing things - lol.
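In case it helps with diagnosis, I could dump and diff the superblocks from both devices - something along these lines (btrfs-progs v4.17 has btrfs inspect-internal dump-super; I believe older versions shipped it as btrfs-show-super):

```shell
# Dump the primary superblock from each device and compare them;
# differences (e.g. in the generation field) might show which copy
# is stale:
btrfs inspect-internal dump-super /dev/sdb > /tmp/super-sdb.txt
btrfs inspect-internal dump-super /dev/sdh > /tmp/super-sdh.txt
diff /tmp/super-sdb.txt /tmp/super-sdh.txt

# -a dumps all superblock copies on a device, in case the backup
# copies of the superblock itself are the suspect part:
# btrfs inspect-internal dump-super -a /dev/sdh
```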
It's also more than a bit scary that "btrfs check --repair" is always
mentioned "as a last resort" - hence my writing here, hoping to avoid
it [unless it's the proper next step?].
Apologies for the long post - I'm hoping it's enough information to
assist me without being overwhelmingly useless.
Thank you to any and all who can help :)
J
P.S: Requisite info requested from
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list
Idoru:/ # uname -a
Linux Idoru 4.16.12-2-default #1 SMP PREEMPT Fri May 25 18:40:19 UTC
2018 (39c7522) x86_64 x86_64 x86_64 GNU/Linux
Idoru:/ # btrfs --version
btrfs-progs v4.17 [I'm running openSUSE Tumbleweed, and I noticed a
kernel update has occurred without a reboot, so I suspect I may
originally have been running btrfs scrub with v4.16 tools instead]
Idoru:/ # btrfs fi show /mnt/titan
Label: 'Titan A.E.' uuid: e7c494fc-ac4e-4644-a2f2-66f25543708b
Total devices 2 FS bytes used 8.57TiB
devid 1 size 9.10TiB used 8.59TiB path /dev/sdb
devid 2 size 9.10TiB used 8.59TiB path /dev/sdh [was /dev/sdc
before removal + reinsertion]
Idoru:/boot # btrfs fi df /mnt/titan/
Data, RAID1: total=8.58TiB, used=8.56TiB
System, RAID1: total=8.00MiB, used=1.19MiB
Metadata, RAID1: total=13.00GiB, used=11.92GiB
GlobalReserve, single: total=512.00MiB, used=0.00
Re: dmesg - I am hoping there is enough relevance in the dmesg info I
have already included - I would only rather include the log if
*absolutely* necessary - thank you :)