On 10/11/2018 12:15 AM, Chris Murphy wrote:
Is this a 68T file system? Seems excessive.
Haha, by excessive I mean nuking such a big fs just for being unable
to remove the space tree. I'm quite sure the devs would like to get
that crashing bug fixed, anyway.
A second FS just started failing. I never had this much trouble with
space cache v1.
This host had a DIMM failure a couple of weeks ago which caused the
system to halt due to uncorrectable ECC error(s). That was the only
recent unsafe shutdown. Other than that, things had been running
normally until today, when the FS went read-only during backups. As with
the other host, I tried to clear the space-cache (v2) before doing a
'check --repair' but got this:
[root@fubar ~]# btrfs check --clear-space-cache=v2 /dev/Cached/Nearline
Opening filesystem to check...
Checking filesystem on /dev/Cached/Nearline
UUID: 68d31d5f-97a2-4a73-a398-c7c13ff439a5
Clear free space cache v2
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
extent-tree.c:2703: alloc_reserved_tree_block: BUG_ON `ret` triggered, value -17
btrfs(+0x1ff96)[0x55eae7dc5f96]
btrfs(+0x2109f)[0x55eae7dc709f]
btrfs(+0x2115e)[0x55eae7dc715e]
btrfs(+0x22054)[0x55eae7dc8054]
btrfs(+0x22c57)[0x55eae7dc8c57]
btrfs(btrfs_alloc_free_block+0xc2)[0x55eae7dcca72]
btrfs(__btrfs_cow_block+0x18a)[0x55eae7dbc05a]
btrfs(btrfs_cow_block+0x104)[0x55eae7dbc874]
btrfs(btrfs_search_slot+0x35f)[0x55eae7dbf6cf]
btrfs(btrfs_clear_free_space_tree+0x104)[0x55eae7de8b54]
btrfs(cmd_check+0xb11)[0x55eae7e0ce31]
btrfs(main+0x88)[0x55eae7dbaaa8]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fead8094413]
btrfs(_start+0x2e)[0x55eae7dbabbe]
Aborted (core dumped)
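The -17 reported by the BUG_ON looks like a negated Linux errno, which would make it EEXIST. A minimal Python sketch to decode such a value, assuming btrfs-progs follows the kernel convention of returning negative errno codes here (the helper name decode_ret is made up for illustration):

```python
import errno
import os


def decode_ret(ret: int) -> str:
    """Map a negative kernel-style return value to its errno name and message.

    Assumption: the value is a negated errno, as is conventional in kernel code.
    """
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The value printed by the BUG_ON in alloc_reserved_tree_block.
print(decode_ret(-17))
```

If that reading is right, the allocator tried to insert an extent item that already exists, which would fit with the unreadable tree block reported just before the abort.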
# btrfs fi show /public/nearline/
Label: none uuid: 68d31d5f-97a2-4a73-a398-c7c13ff439a5
Total devices 1 FS bytes used 61.09TiB
devid 1 size 65.25TiB used 61.45TiB path /dev/mapper/Cached-Nearline
# btrfs fi df /public/nearline/
Data, single: total=61.39TiB, used=61.03TiB
System, single: total=32.00MiB, used=6.59MiB
Metadata, single: total=67.00GiB, used=65.85GiB
GlobalReserve, single: total=512.00MiB, used=4.02MiB
# btrfs fi usage /public/nearline/
Overall:
Device size: 65.25TiB
Device allocated: 61.45TiB
Device unallocated: 3.79TiB
Device missing: 0.00B
Used: 61.09TiB
Free (estimated): 4.15TiB (min: 4.15TiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 4.02MiB)
Data,single: Size:61.39TiB, Used:61.03TiB
/dev/mapper/Cached-Nearline 61.39TiB
Metadata,single: Size:67.00GiB, Used:65.85GiB
/dev/mapper/Cached-Nearline 67.00GiB
System,single: Size:32.00MiB, Used:6.59MiB
/dev/mapper/Cached-Nearline 32.00MiB
Unallocated:
/dev/mapper/Cached-Nearline 3.79TiB
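A quick sanity check of the usage figures, as a rough sketch: it assumes "Free (estimated)" is unallocated space divided by the data ratio plus the unused space inside already-allocated data chunks, which matches the numbers here but is not taken from the btrfs-progs source. It also shows how full the metadata chunks are.

```python
# Figures copied from the 'btrfs fi usage' output above.
unallocated_tib = 3.79
data_total_tib, data_used_tib = 61.39, 61.03
meta_total_gib, meta_used_gib = 67.00, 65.85
data_ratio = 1.00  # single profile

# Assumed formula: unallocated space plus slack inside allocated data chunks.
free_estimated = unallocated_tib / data_ratio + (data_total_tib - data_used_tib)

# Share of the allocated metadata chunks that is in use.
meta_pct = 100 * meta_used_gib / meta_total_gib

print(f"free (estimated) ~= {free_estimated:.2f} TiB")
print(f"metadata in use  ~= {meta_pct:.1f}%")
```

The estimate agrees with the 4.15TiB reported, and the allocated metadata chunks are nearly full, although 3.79TiB remains unallocated for new chunks.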
Kernel: 4.19.10-300.fc29.x86_64
btrfs-progs v4.17.1
I haven't nuked the other FS yet, so I now have two filesystems that are
in the same, or at least very similar, states.
What additional information can I provide?
--Larkin