On Sat, Sep 16, 2017 at 8:06 AM, Kai Krakow <hurikha...@gmail.com> wrote:
>
> But I guess that btrfs doesn't use 10G sized extents? And I also guess,
> this is where autodefrag jumps in.
>

It definitely doesn't use 10G extents, considering the chunks are only
1GiB.  (For those who aren't aware, btrfs divides devices into chunks,
which basically act like individual sub-devices to which operations
like mirroring/raid/etc are applied.  This is why you can change raid
modes on the fly - a balance rewrites the data into new chunks with
the new profile.  This also allows clever things like a "RAID1" on
3x1TB disks having 1.5TB of usable space, because the chunk pairs
essentially balance themselves across all three disks.  It's also what
causes the infamous issues when btrfs runs low on space - once the
last chunk is allocated it can become difficult to
rebalance/consolidate the remaining space.)
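
The 3x1TB figure falls out of a simple greedy model (illustrative
only, not btrfs code): each raid1 chunk takes 1GiB from the two
devices with the most free space.

```python
def raid1_usable_gib(free_gib):
    """Toy model of btrfs raid1 chunk allocation: each 1 GiB chunk is
    mirrored onto the two devices with the most free space."""
    free = list(free_gib)
    usable = 0
    while sum(1 for f in free if f > 0) >= 2:
        free.sort(reverse=True)
        free[0] -= 1  # one copy of the chunk
        free[1] -= 1  # its mirror
        usable += 1
    return usable

print(raid1_usable_gib([1000, 1000, 1000]))  # 1500 -> ~1.5TB usable
```

With equal disks the pairs rotate across all three, so you get
sum/2 rather than the smallest-disk limit a naive mirror would give.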

I couldn't actually find any info on the default extent size.  I did
find a 128MB example in the docs, so presumably that isn't an unusual
size, which means the 1MB wasted-space example would probably still
apply.  Obviously, once an entire extent loses all of its references
its space becomes free again.
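
To make the wasted-space case concrete, here's a toy refcount model
(a sketch, not btrfs internals): an extent is only reclaimed when
nothing references it anymore, so a small CoW overwrite leaves the
whole original extent allocated.

```python
class Extent:
    """Toy extent: its space is freed only when refs hits zero."""
    def __init__(self, size_mib):
        self.size_mib = size_mib
        self.refs = 1

def allocated_mib(extents):
    return sum(e.size_mib for e in extents if e.refs > 0)

original = Extent(128)          # file initially written as one 128 MiB extent
overwrite = Extent(1)           # CoW rewrite of 1 MiB lands in a new extent
extents = [original, overwrite]
print(allocated_mib(extents))   # 129: 1 MiB of the old extent is dead weight

original.refs -= 1              # rewrite the rest of the file (or delete it)
print(allocated_mib(extents))   # 1: the whole old extent is finally freed
```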

Defrag was definitely intended to deal with this.  I haven't looked
at its state in ages; I stopped using it due to a bug and some
limitations.  The main limitation was that defrag, at least back
then, was over-zealous: not only would it free up the 1MB of wasted
space, as in this example, but if that 1GB file had a reflink clone
it would go ahead and split them into two duplicate 1GB extents.  I
believe dedup would do the reverse of this.  Getting both to work
together "the right way" didn't seem possible the last time I looked
into it, but if that has changed I'm interested.
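
The defrag/reflink interaction I'm describing can be modeled in a few
lines (a sketch of the behavior, not btrfs code): two files share one
extent via reflink; an over-zealous defrag gives each file its own
copy, doubling the on-disk usage that dedup then has to win back.

```python
def on_disk_gib(files, extent_gib=1):
    """Space used = number of distinct extents times extent size."""
    return len(set(files.values())) * extent_gib

# reflink clone: both names point at the same 1 GiB extent
files = {"data": "extent_A", "data.clone": "extent_A"}
print(on_disk_gib(files))  # 1

# over-zealous defrag: every file gets a private extent
files = {name: f"extent_{name}" for name in files}
print(on_disk_gib(files))  # 2 -- the clone now costs real space

# dedup does the reverse: identical extents are merged back into one
files = {name: "extent_A" for name in files}
print(on_disk_gib(files))  # 1
```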

Granted, I've been moving away from btrfs lately, since it just
hasn't matured the way I originally thought it would.  I really love
features like reflinks, but it has been years since it was "almost
ready" and it still tends to eat data.  For the moment I'm relying
more on zfs.  I'd love to switch back if they ever pull things
together.  The other filesystem I'm eyeing with interest is cephfs,
but it is still slightly immature (on-disk checksums were only just
added), and it carries a fair bit of overhead until you get into
fairly large arrays.  Cheap arm-based OSD options also seem fairly
RAM-starved at the moment, given the Ceph recommendation of 1GB of
RAM per TB of storage.  arm64 still seems slow to catch on, let alone
cheap boards with 4-16GB of RAM.

-- 
Rich
