On Sun, 17 Sep 2017 08:20:50 -0500, Dan Douglas <orm...@gmail.com> wrote:
> On 09/17/2017 04:17 AM, Kai Krakow wrote:
> > On Sun, 17 Sep 2017 01:20:45 -0500,
> > Dan Douglas <orm...@gmail.com> wrote:
> >
> >> On 09/16/2017 07:06 AM, Kai Krakow wrote:
> >> [...]
> >>
> >> According to btrfs-filesystem(8), defragmentation breaks reflinks,
> >> in all but a few old kernel versions where I guess they tried to
> >> fix the problem and apparently failed.
> >
> > It was splitting and splicing all the reflinks, which is actually a
> > tree walk with more and more extents coming into the equation, and
> > ended up doing a lot of small IO and needing a lot of memory. I
> > think you really cannot fix this when working with extents.
>
> I figured by "break up" they meant it eliminates the reflink by making
> a full copy... so the increased space they're talking about isn't
> really double that of the original data, in other words.
>
> >> This really makes much of what btrfs
> >> does altogether pointless if you ever defragment manually or have
> >> autodefrag enabled. Deduplication is broken for the same reason.
> >
> > It's much easier to fix this for deduplication: Just write your
> > common denominator of an extent to a tmp file, then walk all the
> > reflinks and share them with parts of this extent.
> >
> > If you carefully select what to defragment, there should be no
> > problem. A defrag tool could simply skip all the shared extents. A
> > few fragments do not hurt performance at all, but what's important
> > is spatial locality. A lot of small fragments may hurt performance a
> > lot, so one could give the defragger a hint when to ignore the rule
> > and still defragment the extent. Also, when your deduplication
> > window is 1M you could probably safely defrag all extents smaller
> > than 1M.
>
> Yeah, this sort of hurts with the way I deal with KVM image snapshots.
> I have raw base images as backing files with lots of shared and null
> data, so I run `fallocate --dig-holes' followed by `duperemove
> --dedupe-options=same' on the cow-enabled base images and hope that
> btrfs defrag can clean up the resulting fragmented mess, but it's a
> slow process and doesn't seem to do a good job.

I would be interested in your results if you try bees[1] to deduplicate
your KVM images. It should be able to dig holes and merge blocks by
reflinking. I'm not sure if it would merge contiguous extents back into
one single extent; I think that's on a todo list. It could act as a
reflink-aware defragger then.

It currently does not work well for mixed datasum/nodatasum workloads,
so I made a PR[2] to ignore nocow files. A more elaborate patch would
avoid reflinking between datasum and nodatasum extents (nocow implies
nodatasum).

[1]: https://github.com/Zygo/bees
[2]: https://github.com/Zygo/bees/pull/21

--
Regards,
Kai

Replies to list-only preferred.
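
To make the extent selection rule from the quoted discussion concrete (skip shared extents, but still defragment those smaller than the deduplication window), here is a minimal sketch. It is only an illustration under assumptions: the `Extent` record, `should_defrag` and `select_for_defrag` are hypothetical names, not part of btrfs, bees or any existing tool, and how the shared flag is obtained is left out.

```python
from dataclasses import dataclass

# Assumption: the 1M deduplication window used as an example in the thread.
DEDUP_WINDOW = 1024 * 1024


@dataclass
class Extent:
    """Illustrative extent record; not a real btrfs or FIEMAP structure."""
    offset: int   # logical offset within the file, in bytes
    length: int   # extent length, in bytes
    shared: bool  # True if the extent is reflinked from somewhere else


def should_defrag(extent, window=DEDUP_WINDOW):
    """Return True if a defragger may rewrite this extent.

    Unshared extents are always candidates. Shared extents are skipped,
    except when they are smaller than the deduplication window.
    """
    if not extent.shared:
        return True
    return extent.length < window


def select_for_defrag(extents, window=DEDUP_WINDOW):
    """Filter a list of extents down to the defragmentation candidates."""
    return [e for e in extents if should_defrag(e, window)]


if __name__ == "__main__":
    extents = [
        Extent(offset=0, length=4096, shared=True),
        Extent(offset=4096, length=2 * 1024 * 1024, shared=True),
        Extent(offset=2101248, length=8 * 1024 * 1024, shared=False),
    ]
    for e in select_for_defrag(extents):
        print(e.offset, e.length, e.shared)
```

A real tool would also have to weigh spatial locality, which this sketch ignores entirely; it only shows the skip rule and the size exception.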