On Thu, May 08, 2025 at 07:02:34PM +1000, Jamie McClymont wrote:
> Hello,
>
> Would it be a sane idea to support a reduced metadata_replicas value
> for the types of btrees that fall under btree_id_is_alloc, and could
> thus be recreated by `-o reconstruct_alloc` (or, I think,
> automatically via bch2_btree_lost_data) if lost?

Backpointers, perhaps. I wouldn't recommend doing that for the main
alloc btree; reconstructing that one from scratch will fail if your
filesystem is big enough, because currently we have to do it before
going RW, so the whole thing has to fit in memory (in the journal_keys
array). Once we have check_allocations working online that will be
easier.

>
> For context, my filesystem (2x280GB SSDs for metadata + caching,
> 4x16TB hard drives for data, all types of replicas=2), reports around
> 220GB of Btree usage:
>
> extents: 104 GiB
> inodes: 8.11 GiB
> dirents: 1.78 GiB
> xattrs: 512 KiB
> alloc: 26.5 GiB
> quotas: 512 KiB
> stripes: 512 KiB
> reflink: 12.0 MiB
> subvolumes: 512 KiB
> snapshots: 512 KiB
> lru: 763 MiB
> freespace: 6.50 MiB
> need_discard: 389 MiB
> backpointers: 86.6 GiB
> bucket_gens: 299 MiB
> snapshot_trees: 512 KiB
> deleted_inodes: 512 KiB
> logged_ops: 1.00 MiB
> rebalance_work: 855 MiB
> subvolume_children: 512 KiB
> accounting: 1.00 MiB
>
> I figure reducing alloc replicas to 1 would save 55GiB of btree space on the
> fast SSDs, which could be better used for user-data caching, without
> sacrificing any durability – just a one-off recovery-pass penalty in the
> event of an SSD failure (or a one-off read/checksum error, but I have
> thankfully had none of those from these drives).
>
> Am I missing a situation where this could lead to data loss? If not, I might
> look at implementing it.
>
> Thanks,
> - Jamie McClymont
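
For reference, a rough sketch of the arithmetic behind the space-saving
estimate in the quoted message. Two things here are assumptions and not
taken from the thread: that the reported per-btree figures already count
both replicas, and that the alloc-type set is alloc, backpointers, lru,
need_discard, bucket_gens, freespace and accounting (my reading of what
btree_id_is_alloc covers, not checked against the source). The helper
name savings_gib is made up for illustration.

```python
# Per-btree usage from the quoted report, converted to MiB.
MIB_PER_GIB = 1024.0

# ASSUMPTION: these are the btrees covered by btree_id_is_alloc.
usage_mib = {
    "alloc": 26.5 * MIB_PER_GIB,
    "backpointers": 86.6 * MIB_PER_GIB,
    "lru": 763.0,
    "need_discard": 389.0,
    "bucket_gens": 299.0,
    "freespace": 6.5,
    "accounting": 1.0,
}


def savings_gib(btrees, replicas_before=2, replicas_after=1):
    """Space freed (GiB) by lowering the replica count for the given btrees.

    ASSUMPTION: the reported usage already includes all replicas_before copies.
    """
    total_mib = sum(usage_mib[name] for name in btrees)
    per_replica_mib = total_mib / replicas_before
    return per_replica_mib * (replicas_before - replicas_after) / MIB_PER_GIB


if __name__ == "__main__":
    everything = list(usage_mib)
    without_alloc = [name for name in usage_mib if name != "alloc"]
    print(f"all alloc-type btrees: {savings_gib(everything):.1f} GiB")
    print(f"all except alloc:      {savings_gib(without_alloc):.1f} GiB")
    print(f"backpointers only:     {savings_gib(['backpointers']):.1f} GiB")
```

Under those assumptions the full set comes to about 57 GiB, in the same
range as the 55GiB quoted above, and backpointers alone account for
about 43 GiB of it, so leaving the main alloc btree at replicas=2 would
still recover most of the space.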
