> [ ... ] This post is way too long [ ... ]

Many thanks for your report; it is really useful, especially the
details.

> [ ... ] using rsync with --link-dest to btrfs while still
> using rsync, but with btrfs subvolumes and snapshots [1]. [
> ... ]  Currently there's ~35TiB of data present on the example
> filesystem, with a total of just a bit more than 90000
> subvolumes, in groups of 32 snapshots per remote host (daily
> for 14 days, weekly for 3 months, monthly for a year), so
> that's about 2800 'groups' of them. Inside are millions and
> millions and millions of files. And the best part is... it
> just works. [ ... ]

That kind of arrangement, with a single large pool containing very
many files and many subdirectories, is a worst-case scenario for any
filesystem type, so it is amazing-ish that it works well so far,
especially with 90,000 subvolumes. As I mentioned elsewhere, I would
rather do a rotation of smaller volumes to reduce risk, as "Duncan"
(also on this mailing list) likes to do (perhaps to the opposite
extreme).

As to the 'ssd'/'nossd' issue, it behaves as described in 'man 5
btrfs' (and I wonder whether 'ssd_spread' was tried too), but it is
not at all obvious that it should impact metadata handling so much.
I'll add a new item to the "gotcha" list.
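
As a minimal illustration of how 'ssd' ends up on without being asked
for: btrfs enables it when the kernel reports the device as
non-rotational. A small check, with the device name being just an
example:

  #!/usr/bin/env python3
  # Report whether btrfs would auto-enable 'ssd' for a given device,
  # based on the kernel's rotational flag. 'sdb' is a placeholder.
  from pathlib import Path

  DEV = "sdb"   # hypothetical member device of the backup filesystem

  flag = Path(f"/sys/block/{DEV}/queue/rotational").read_text().strip()
  if flag == "0":
      print(f"/dev/{DEV} reports non-rotational: 'ssd' will be enabled")
      print("unless the filesystem is mounted with '-o nossd' (man 5 btrfs)")
  else:
      print(f"/dev/{DEV} reports rotational: 'ssd' is not auto-enabled")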

It is sad that 'ssd' gets used by default in your case, and it is
quite perplexing that the "wandering trees" problem (that is, "write
amplification") is so large with 64KiB write clusters for metadata
(and the 'dup' profile for metadata).

* Probably the metadata and data cluster sizes should be settable at
  filesystem creation or mount time, instead of being implicit in the
  'ssd' option.
* A cluster size of 2MiB for metadata and/or data presumably has some
  downsides, otherwise it would be the default. I wonder whether the
  downsides are related to barriers...
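
To put the "wandering trees" point above in rough numbers, here is a
back-of-the-envelope sketch; the tree depth and the number of trees
touched per update are guesses, not measurements:

  #!/usr/bin/env python3
  # Rough estimate of metadata written for one small file change:
  # copy-on-write rewrites every node on the leaf-to-root path of each
  # affected tree, and the 'dup' profile then writes each node twice.

  NODESIZE   = 16 * 1024   # bytes; the btrfs default metadata node size
  TREE_DEPTH = 4           # assumed depth of the fs/extent/csum trees
  TREES      = 3           # assumed number of trees touched per update
  DUP_COPIES = 2           # 'dup' metadata profile keeps two copies

  payload  = 4 * 1024      # one 4KiB logical change to a file
  metadata = NODESIZE * TREE_DEPTH * TREES * DUP_COPIES

  print(f"logical change : {payload} bytes")
  print(f"metadata writes: {metadata} bytes "
        f"(~{metadata // payload}x amplification, before clustering)")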
