On 10/11/2018 12:15 AM, Chris Murphy wrote:
Is this a 68T file system? Seems excessive.
Haha, by excessive I mean nuking such a big fs just for being unable
to remove the space tree. I'm quite sure the devs would like to get
that crashing bug fixed, anyway.

A second FS just started failing. I never had this much trouble with space cache v1.

This host had a DIMM failure a couple of weeks ago, which caused the system to halt due to uncorrectable ECC error(s). That was the only recent unsafe shutdown. Other than that, things had been running normally until today, when the FS went read-only during backups. As with the other host, I tried to clear the space cache (v2) before doing a 'check --repair', but got this:

[root@fubar ~]# btrfs check --clear-space-cache=v2 /dev/Cached/Nearline
Opening filesystem to check...
Checking filesystem on /dev/Cached/Nearline
UUID: 68d31d5f-97a2-4a73-a398-c7c13ff439a5
Clear free space cache v2
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
extent-tree.c:2703: alloc_reserved_tree_block: BUG_ON `ret` triggered, value -17
btrfs(+0x1ff96)[0x55eae7dc5f96]
btrfs(+0x2109f)[0x55eae7dc709f]
btrfs(+0x2115e)[0x55eae7dc715e]
btrfs(+0x22054)[0x55eae7dc8054]
btrfs(+0x22c57)[0x55eae7dc8c57]
btrfs(btrfs_alloc_free_block+0xc2)[0x55eae7dcca72]
btrfs(__btrfs_cow_block+0x18a)[0x55eae7dbc05a]
btrfs(btrfs_cow_block+0x104)[0x55eae7dbc874]
btrfs(btrfs_search_slot+0x35f)[0x55eae7dbf6cf]
btrfs(btrfs_clear_free_space_tree+0x104)[0x55eae7de8b54]
btrfs(cmd_check+0xb11)[0x55eae7e0ce31]
btrfs(main+0x88)[0x55eae7dbaaa8]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fead8094413]
btrfs(_start+0x2e)[0x55eae7dbabbe]
Aborted (core dumped)
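
For anyone triaging the output above, here is a minimal sketch that collapses the repeated messages and decodes the BUG_ON return value. It is not part of btrfs-progs. The regular expressions are assumptions based only on the line formats shown in this output, and the names summarize and errno_name are made up for illustration. The errno lookup uses Python's own table; on Linux, 17 is EEXIST.

```python
import errno
import os
import re
from collections import Counter

# A few lines copied from the check output above (the real log repeats them).
LOG = """\
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
checksum verify failed on 271262429573120 found 1BA4548E wanted D105DF84
bad tree block 271262429573120, bytenr mismatch, want=271262429573120, have=17478763091281320157
extent-tree.c:2703: alloc_reserved_tree_block: BUG_ON `ret` triggered, value -17
"""

# Assumed line formats, inferred from the output above only.
CSUM_RE = re.compile(r"checksum verify failed on (\d+) found ([0-9A-F]+) wanted ([0-9A-F]+)")
BUG_RE = re.compile(r"BUG_ON `ret` triggered, value (-?\d+)")


def summarize(log):
    """Return (Counter keyed by (bytenr, found, wanted), list of BUG_ON values)."""
    failures = Counter()
    bug_values = []
    for line in log.splitlines():
        m = CSUM_RE.search(line)
        if m:
            failures[(int(m.group(1)), m.group(2), m.group(3))] += 1
        m = BUG_RE.search(line)
        if m:
            bug_values.append(int(m.group(1)))
    return failures, bug_values


def errno_name(value):
    """Map a negative kernel-style return value to its symbolic errno name."""
    return errno.errorcode.get(abs(value), "unknown")


if __name__ == "__main__":
    failures, bug_values = summarize(LOG)
    for (bytenr, found, wanted), count in failures.items():
        print(f"block {bytenr}: {count} checksum failures (found {found}, wanted {wanted})")
    for value in bug_values:
        print(f"BUG_ON value {value}: {errno_name(value)} ({os.strerror(abs(value))})")
```

Run against the full log, this should show whether every failure points at the same tree block, which is what the output above suggests.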

# btrfs fi show /public/nearline/
Label: none  uuid: 68d31d5f-97a2-4a73-a398-c7c13ff439a5
        Total devices 1 FS bytes used 61.09TiB
        devid    1 size 65.25TiB used 61.45TiB path /dev/mapper/Cached-Nearline

# btrfs fi df /public/nearline/
Data, single: total=61.39TiB, used=61.03TiB
System, single: total=32.00MiB, used=6.59MiB
Metadata, single: total=67.00GiB, used=65.85GiB
GlobalReserve, single: total=512.00MiB, used=4.02MiB

# btrfs fi usage /public/nearline/
Overall:
    Device size:                  65.25TiB
    Device allocated:             61.45TiB
    Device unallocated:            3.79TiB
    Device missing:                  0.00B
    Used:                         61.09TiB
    Free (estimated):              4.15TiB      (min: 4.15TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 4.02MiB)

Data,single: Size:61.39TiB, Used:61.03TiB
   /dev/mapper/Cached-Nearline    61.39TiB

Metadata,single: Size:67.00GiB, Used:65.85GiB
   /dev/mapper/Cached-Nearline    67.00GiB

System,single: Size:32.00MiB, Used:6.59MiB
   /dev/mapper/Cached-Nearline    32.00MiB

Unallocated:
   /dev/mapper/Cached-Nearline     3.79TiB
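
As a sanity check on the figures above, here is a small sketch of the arithmetic. It assumes that "Free (estimated)" is the unallocated space plus the unused space inside already-allocated data chunks, which should hold for a data ratio of 1.00. The function names are hypothetical and the inputs are the rounded values printed by the tool, so small rounding differences are expected.

```python
# Figures copied from the `btrfs fi usage` output above (rounded by the tool).
DEVICE_SIZE_TIB = 65.25
DEVICE_ALLOCATED_TIB = 61.45
UNALLOCATED_TIB = 3.79
DATA_TOTAL_TIB, DATA_USED_TIB = 61.39, 61.03
META_TOTAL_GIB, META_USED_GIB = 67.00, 65.85


def free_estimate(unallocated, data_total, data_used):
    """Assumed formula for a data ratio of 1.00: unallocated + free space in data chunks."""
    return unallocated + (data_total - data_used)


def pct_used(used, total):
    """Percentage of an allocation that is in use."""
    return 100.0 * used / total


if __name__ == "__main__":
    print(f"size - allocated : {DEVICE_SIZE_TIB - DEVICE_ALLOCATED_TIB:.2f} TiB")
    print(f"free (estimated) : {free_estimate(UNALLOCATED_TIB, DATA_TOTAL_TIB, DATA_USED_TIB):.2f} TiB")
    print(f"data chunks used : {pct_used(DATA_USED_TIB, DATA_TOTAL_TIB):.1f}%")
    print(f"metadata used    : {pct_used(META_USED_GIB, META_TOTAL_GIB):.1f}%")
```

The estimate reproduces the reported 4.15TiB. The allocated metadata chunks are also nearly full, although unallocated space remains on the device.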

Kernel: 4.19.10-300.fc29.x86_64
btrfs-progs: v4.17.1

I haven't nuked the other FS yet, so I now have two that are in the same, or at least a very similar, state.

What additional information can I provide?

--Larkin
