> > > I don't know. The exact nature of the damage from a failing
> > > controller adds a significant unknown component to it. If it were
> > > just a matter of not writing anything at all, there'd be no
> > > problem. But it sounds like it wrote spurious or corrupt data,
> > > possibly into locations that weren't even supposed to be written to.
> >
> > Unfortunately I cannot figure out exactly what happened. The logs
> > end Friday night while the backup script was running -- which also
> > includes a final balance of the device. Monday morning, after some
> > hardware had been exchanged, the machine came up unable to mount
> > the device.
>
> It's probably not discernible from logs anyway. What does hardware do
> when it goes berserk? It's chaos. And all file systems have write
> order requirements. It's fine if, at a certain point, writes just
> abruptly stop reaching stable media. But if things are written out of
> order, or if the hardware acknowledges critical metadata writes that
> were actually dropped, it's bad. For all file systems.
>
>
> > OK -- I now had the chance to temporarily switch to 5.11.2. Output
> > looks cleaner, but the error stays the same.
> >
> > root@hikitty:/mnt$ mount -o ro,rescue=all /dev/sdi1 hist/
> >
> > [ 3937.815083] BTRFS info (device sdi1): enabling all of the rescue options
> > [ 3937.815090] BTRFS info (device sdi1): ignoring data csums
> > [ 3937.815093] BTRFS info (device sdi1): ignoring bad roots
> > [ 3937.815095] BTRFS info (device sdi1): disabling log replay at mount time
> > [ 3937.815098] BTRFS info (device sdi1): disk space caching is enabled
> > [ 3937.815100] BTRFS info (device sdi1): has skinny extents
> > [ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
> > [ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
> > [ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
> > [ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
> > [ 3939.221476] BTRFS error (device sdi1): failed to read block groups: -5
> > [ 3939.268928] BTRFS error (device sdi1): open_ctree failed
>
> This looks like the super is expecting something that just isn't
> there at all. If the spurious behavior lasted only briefly during the
> hardware failure, there's a chance of recovery. But that chance
> diminishes greatly if the chaotic behavior was ongoing for a while --
> many seconds or a few minutes.
>
>
> > I still hope that the crash created some error in the fs that can
> > be resolved, rather than real damage to all the data in the FS
> > trees. I used a lot of snapshots and deduplication on that device,
> > so I do expect some damage from a hardware error. But I find it
> > hard to believe that every file got damaged.
>
> Correct. They aren't actually damaged.
>
> However, there's maybe 5-15 MiB of critical metadata on Btrfs, and if
> it gets corrupt, the keys to the maze are lost. And it becomes
> difficult, sometimes impossible, to "bootstrap" the file system. There
> are backup entry points, but depending on the workload, they go stale
> in seconds to a few minutes, and can be subject to being overwritten.
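>
> As a rough illustration (a sketch; /dev/sdXY is a placeholder): the
> superblock keeps four backup_roots slots, and comparing their gen
> values against the superblock's current generation shows how stale
> each backup entry point already is:
>
> btrfs inspect-internal dump-super -f /dev/sdXY | grep -E '^generation|backup_tree_root'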
>
> That 'btrfs restore' does a partial recovery that ends up with a lot
> of damage and holes tells me it has found stale parts of the file
> system -- it's on old rails, so to speak. There's nothing available
> to tell it that a given portion of the tree is just old and not valid
> anymore (or only partially valid). The restore code is also designed
> to be more tolerant of errors, because otherwise it would just do
> nothing at all.
>
> I think if you're able to find the most recent root node for a
> snapshot you want to restore, along with an intact chunk tree it
> should be possible to get data out of that snapshot. The difficulty is
> finding it, because it could be almost anywhere.
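>
> One way to sanity-check a candidate root bytenr (a sketch; <bytenr>
> is a placeholder for a value reported by a tool such as
> btrfs-find-root) is to dump that block directly and see whether it
> parses as an intact node with a plausible level and generation:
>
> btrfs inspect-internal dump-tree -b <bytenr> /dev/sdi1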

Would it make sense to just try restore -t on every root I get from
btrfs-find-root, across all of the snapshots?
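
Something like the following is what I have in mind (just a sketch:
the output format of btrfs-find-root differs between btrfs-progs
versions, so the awk pattern is an assumption, and /mnt/recovery
stands for scratch space on a separate, healthy device):

DEV=/dev/sdi1
OUT=/mnt/recovery

# btrfs-find-root prints candidate tree root bytenrs; try each one
# read-only first with 'btrfs restore -D' (dry run, -i ignores errors)
# before actually extracting anything.
btrfs-find-root "$DEV" 2>&1 |
  awk '/Well block/ { sub(/\(.*/, "", $3); print $3 }' |
  while read -r bytenr; do
    echo "=== candidate tree root at $bytenr ==="
    btrfs restore -D -i -t "$bytenr" "$DEV" "$OUT"
  done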

> OK so you said there's an original and backup file system, are they
> both in equally bad shape, having been on the same controller? Are
> they both btrfs?

The original / live file system was not btrfs but xfs. It is in a
different state than the backup, but an equally bad one. We used
bcache with a write-back cache on an SSD, which is now completely dead
(it is no longer recognized by any server). To get the file system
mounted I ran xfs_repair. After that, only 6% of the data was left,
nearly all of it in lost+found. I'm now trying to sort these files by
type, since the data itself looks OK. Unfortunately the surviving
files seem to be the oldest ones.
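
The sorting itself is nothing fancy (a sketch; /mnt/live/lost+found
and /mnt/sorted are placeholders, and file(1) is assumed to be
available):

SRC=/mnt/live/lost+found
DST=/mnt/sorted

# Group the recovered files into directories named after their
# detected MIME type, e.g. image_jpeg/.
find "$SRC" -type f -print0 |
  while IFS= read -r -d '' f; do
    type=$(file -b --mime-type "$f" | tr '/' '_')
    mkdir -p "$DST/$type"
    mv -n "$f" "$DST/$type/"
  done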

> What do you get for
>
> btrfs insp dump-s -f /dev/sdXY
>
> There might be a backup tree root in there that can be used with btrfs
> restore -t

This is the output of ./btrfs insp dump-s -f /dev/sdi1, run with
btrfs-progs 5.9.

./btrfs insp dump-s -f /dev/sdi1
superblock: bytenr=65536, device=/dev/sdi1
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x9e6891fc [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    56051c5f-fca6-4d54-a04e-1c1d8129fe56
metadata_uuid           56051c5f-fca6-4d54-a04e-1c1d8129fe56
label                   history
generation              825256
root                    122583415865344
sys_array_size          129
chunk_root_generation   825256
root_level              2
chunk_root              141944043454464
chunk_root_level        2
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             80013782134784
bytes_used              75176955760640
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        825256
uuid_tree_generation    825256
dev_item.uuid           844e80b3-a8d5-4738-ac8a-4f54980556f6
dev_item.fsid           56051c5f-fca6-4d54-a04e-1c1d8129fe56 [match]
dev_item.type           0
dev_item.total_bytes    80013782134784
dev_item.bytes_used     75413317484544
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          2
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 141944034426880)
                length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 2 sub_stripes 1
                        stripe 0 devid 2 offset 2034741805056
                        dev_uuid 844e80b3-a8d5-4738-ac8a-4f54980556f6
                        stripe 1 devid 2 offset 2034775359488
                        dev_uuid 844e80b3-a8d5-4738-ac8a-4f54980556f6
backup_roots[4]:
        backup 0:
                backup_tree_root:       122583415865344 gen: 825256     level: 2
                backup_chunk_root:      141944043454464 gen: 825256     level: 2
                backup_extent_root:     122583418175488 gen: 825256     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122583415783424 gen: 825256     level: 1
                backup_csum_root:       122583553703936 gen: 825256     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955760640
                backup_num_devices:     1

        backup 1:
                backup_tree_root:       122343302234112 gen: 825253     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122343333937152 gen: 825253     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122343380992000 gen: 825253     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955105280
                backup_num_devices:     1

        backup 2:
                backup_tree_root:       122343762804736 gen: 825254     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122343762935808 gen: 825254     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122343764967424 gen: 825254     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955105280
                backup_num_devices:     1

        backup 3:
                backup_tree_root:       122574011269120 gen: 825255     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122574011432960 gen: 825255     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122574014791680 gen: 825255     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955236352
                backup_num_devices:     1
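
If I understand the restore -t suggestion correctly, the next thing to
try would be one of the backup slots above -- e.g. backup 1's tree
root (a sketch; -D keeps it a read-only dry run, -i ignores errors,
and /mnt/recovery is a placeholder for scratch space on a healthy
device):

btrfs restore -D -i -t 122343302234112 /dev/sdi1 /mnt/recovery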
