On Sat, Sep 16, 2017 at 8:06 AM, Kai Krakow <hurikha...@gmail.com> wrote:
>
> But I guess that btrfs doesn't use 10G sized extents? And I also guess,
> this is where autodefrag jumps in.
>
It definitely doesn't use 10G extents, considering the chunks are only 1GB.

(For those who aren't aware, btrfs divides devices into chunks, which basically act like individual sub-devices to which operations like mirroring/raid/etc are applied. This is why you can change raid modes on the fly - the operation takes effect on newly allocated chunks. It also allows clever things like a "RAID1" on 3x1TB disks having 1.5TB of usable space, because the chunks essentially balance themselves across all three disks in pairs. And it is what causes the infamous issues when btrfs runs low on space - once the last chunk is allocated, it can become difficult to rebalance/consolidate the remaining space.)

I couldn't actually find any info on the default extent size. I did find a 128MB example in the docs, so presumably that isn't an unusual size. So, the 1MB example would probably still work. Obviously, if an entire extent becomes obsolete, it loses its reference count and becomes free.

Defrag was definitely intended to deal with this. I haven't looked at the state of it in ages - I stopped using it due to a bug and some limitations. The main limitation being that defrag at least used to be over-zealous: not only would it free up the 1MB of wasted space, as in this example, but if that 1GB file had a reflink clone, it would go ahead and split the clone into two duplicate 1GB extents. I believe that dedup would do the reverse of this. Getting both to work together "the right way" didn't seem possible the last time I looked into it, but if that has changed, I'm interested.

Granted, I've been moving away from btrfs lately, due to the fact that it just hasn't matured as I originally thought it would. I really love features like reflinks, but it has been years since it was "almost ready" and it still tends to eat data. For the moment I'm relying more on zfs. I'd love to switch back if they ever pull things together.
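The 3x1TB "RAID1 gives 1.5TB" point can be sketched with a toy model (this is not btrfs code, just an illustration under the assumption that the allocator greedily mirrors each chunk onto the two devices with the most free space, which is roughly how btrfs raid1 chunk allocation behaves):

```python
# Toy model of btrfs raid1 chunk allocation (illustrative only):
# each chunk is mirrored onto the two devices with the most free space.
def raid1_usable_gb(disk_sizes_gb, chunk_gb=1):
    free = list(disk_sizes_gb)
    usable = 0
    while True:
        # Pick the two devices with the most remaining space.
        a, b = sorted(range(len(free)), key=lambda i: free[i], reverse=True)[:2]
        if free[a] < chunk_gb or free[b] < chunk_gb:
            break  # no pair of devices can hold another mirrored chunk
        free[a] -= chunk_gb
        free[b] -= chunk_gb
        usable += chunk_gb  # one mirrored chunk = one chunk of usable space
    return usable

print(raid1_usable_gb([1000, 1000, 1000]))  # 1500 - pairs rotate across all 3 disks
print(raid1_usable_gb([1000, 1000]))        # 1000 - classic 2-disk mirror
```

Because the pairing rotates across all three disks, the 3x1TB case uses all 3TB of raw space at a 2x overhead, hence 1.5TB usable instead of the 1TB a fixed two-disk mirror would give.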
The other filesystem I'm eyeing with interest is cephfs, but it is still slightly immature (on-disk checksums were only just added), and it carries a fair bit of overhead until you get into fairly large arrays. Cheap ARM-based OSD options also seem to be fairly RAM-starved at the moment, given the Ceph recommendation of 1GB of RAM per TB of storage. arm64 still seems slow to catch on, let alone cheap boards with 4-16GB of RAM.

-- 
Rich
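To make the RAM-starvation point concrete, here is the 1GB-per-TB rule of thumb worked through for a hypothetical small ARM OSD node (the disk configuration below is made up for illustration, not a recommendation):

```python
# Ceph's rough sizing guidance: about 1GB of RAM per TB of OSD storage.
def osd_ram_needed_gb(disks_tb, gb_per_tb=1):
    return sum(disks_tb) * gb_per_tb

# A hypothetical ARM board driving 4x4TB disks would want ~16GB of RAM,
# which is at the very top of what cheap arm64 boards ship with today.
print(osd_ram_needed_gb([4, 4, 4, 4]))  # 16
```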