> I don't know. The exact nature of the damage of a failing controller
> is adding a significant unknown component to it. If it was just a
> matter of not writing anything at all, then there'd be no problem. But
> it sounds like it wrote spurious or corrupt data, possibly into
> locations that weren't even supposed to be written to.

Unfortunately I cannot figure out exactly what happened. The logs end
Friday night while the backup script was running -- which also
includes a final balance of the device. Monday morning, after
some hardware had been exchanged, the machine came up unable to mount
the device.

> I think if the snapshot b-tree is ok, and the chunk b-tree is ok, then
> it should be possible to recover the data correctly without needing
> any other tree. I'm not sure if that's how btrfs restore already
> works.
>
> Kernel 5.11 has a new feature, mount -o ro,rescue=all that is more
> tolerant of mounting when there are various kinds of problems. But
> there's another thread where a failed controller is thwarting
> recovery, and that code is being looked at for further enhancement.
> https://lore.kernel.org/linux-btrfs/CAEg-Je-DJW3saYKA2OBLwgyLU6j0JOF7NzXzECi0HJ5hft_5=a...@mail.gmail.com/

OK -- I have now had the chance to temporarily switch to 5.11.2. The
output looks cleaner, but the error stays the same.

root@hikitty:/mnt$ mount -o ro,rescue=all /dev/sdi1 hist/

[ 3937.815083] BTRFS info (device sdi1): enabling all of the rescue options
[ 3937.815090] BTRFS info (device sdi1): ignoring data csums
[ 3937.815093] BTRFS info (device sdi1): ignoring bad roots
[ 3937.815095] BTRFS info (device sdi1): disabling log replay at mount time
[ 3937.815098] BTRFS info (device sdi1): disk space caching is enabled
[ 3937.815100] BTRFS info (device sdi1): has skinny extents
[ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
[ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
[ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221476] BTRFS error (device sdi1): failed to read block groups: -5
[ 3939.268928] BTRFS error (device sdi1): open_ctree failed
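
The four "bad tree block start" errors above name only three distinct
blocks, and each one reads back as 0. As far as I understand, "want" is
the expected block address and "have" is what was found in the block
header. A minimal sketch in Python to pull these pairs out of dmesg
output; the function name failing_blocks is made up for illustration,
and the log text is simply copied from the output above:

```python
import re

# Error lines copied from the mount attempt above.
LOG = """\
[ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
[ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
[ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
"""

# \s+ also matches a line break, so the pattern still works if a mail
# client wrapped the line between "want" and the number.
PATTERN = re.compile(r"bad tree block start, want\s+(\d+)\s+have\s+(\d+)")


def failing_blocks(log_text):
    """Map each wanted block address to the set of values actually found."""
    blocks = {}
    for want, have in PATTERN.findall(log_text):
        blocks.setdefault(int(want), set()).add(int(have))
    return blocks


if __name__ == "__main__":
    for want, have in sorted(failing_blocks(LOG).items()):
        print(want, sorted(have))
```

This only summarizes the log; it says nothing about whether the blocks
can be recovered.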

I still hope that the crash merely left some error in the fs that can
be resolved, rather than real damage to all the data in the FS trees.
I used a lot of snapshots and deduplication on that device, so I do
expect some damage from a hardware error. But I find it hard to
believe that every file got damaged.

Sebastian
