On 2017-11-03 03:26, Kai Krakow wrote:
On Thu, 2 Nov 2017 22:47:31 -0400,
Dave <davestechs...@gmail.com> wrote:

On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow <hurikha...@gmail.com>
wrote:


You may want to try btrfs autodefrag mount option and see if it
improves things (tho, the effect may take days or weeks to apply if
you didn't enable it right from the creation of the filesystem).
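
To confirm that autodefrag is really active on a mounted filesystem, the options can be read back from /proc/mounts. A minimal sketch in Python, assuming the usual whitespace-separated /proc/mounts layout (device, mountpoint, fstype, options, ...). The helper name is made up for illustration:

```python
def btrfs_mounts_with_option(mounts_text, option="autodefrag"):
    """Map each btrfs mountpoint in /proc/mounts-style text to True/False,
    depending on whether `option` appears in its mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mountpoint, fstype, options
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        mountpoint = fields[1]
        options = fields[3].split(",")
        result[mountpoint] = option in options
    return result
```

Feeding it the contents of /proc/mounts (for example via open("/proc/mounts").read()) shows per mountpoint whether the option took effect.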

Also, autodefrag will probably unshare reflinks on your snapshots.
You may be able to use bees[1] to work against this effect. Its
interaction with autodefrag is not well tested but it works fine
for me. Also, bees is able to reduce some of the fragmentation
during deduplication because it will rewrite extents back into
bigger chunks (but only for duplicated data).

[1]: https://github.com/Zygo/bees

I will look into bees. And yes, I plan to try autodefrag. (I already
have it enabled now.) However, I need to understand something about
how btrfs send-receive works in regard to reflinks and fragmentation.

Say I have 2 snapshots on my live volume. The earlier one of them has
already been sent to another block device by btrfs send-receive (full
backup). Now defrag runs on the live volume and breaks some percentage
of the reflinks. At this point I do an incremental btrfs send-receive
using "-p" (or "-c") with the diff going to the same other block
device where the prior snapshot was already sent.
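
For reference, the incremental step described above could be scripted roughly as follows. This is only a sketch: the snapshot paths are hypothetical, both snapshots are assumed to be read-only (btrfs send requires that), and the parent is assumed to already exist on the receiving side.

```python
import shlex
import subprocess


def incremental_send_commands(parent, snapshot, dest_dir):
    """Build argv lists for `btrfs send -p PARENT SNAPSHOT` and
    `btrfs receive DEST_DIR` without running anything."""
    send = ["btrfs", "send", "-p", parent, snapshot]
    receive = ["btrfs", "receive", dest_dir]
    return send, receive


def as_pipeline(send, receive):
    """Render the two commands as the shell pipeline one would type."""
    return " | ".join(shlex.join(cmd) for cmd in (send, receive))


def run_incremental(parent, snapshot, dest_dir):
    """Run the pipeline (needs root and real btrfs subvolumes)."""
    send, receive = incremental_send_commands(parent, snapshot, dest_dir)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # let the sender see EPIPE if the receiver exits early
    rc_receive = receiver.wait()
    rc_send = sender.wait()
    if rc_send != 0 or rc_receive != 0:
        raise RuntimeError(f"send exited {rc_send}, receive exited {rc_receive}")
```

With hypothetical paths, as_pipeline(*incremental_send_commands("/mnt/live/.snapshots/1", "/mnt/live/.snapshots/2", "/mnt/backup")) renders the command line that run_incremental would execute.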

Will reflinks be "made whole" (restored) on the receiving block
device? Or is the state of the source volume replicated so closely
that reflink status is the same on the target?

Also, is fragmentation reduced on the receiving block device?

My expectation is that fragmentation would be reduced and duplication
would be reduced too. In other words, does send-receive result in
defragmentation and deduplication too?

As far as I understand, btrfs send/receive doesn't create an exact
mirror. It just replays the block operations between generation
numbers. That is: If it finds new blocks referenced between
generations, it will write a _new_ block to the destination.

That is mostly correct, except that it's not a block-level copy. To put it in a heavily simplified manner, send/receive recreates the subvolume using nothing more than basic file manipulation syscalls (write(), chown(), chmod(), etc.), the clone ioctl, and some extra logic to figure out the correct location to clone from. IOW, it's functionally equivalent to using rsync to copy the data and then deduplicating, albeit a bit smarter about when to deduplicate (and more efficient in that respect).
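
To make the two kinds of operation concrete: an extent either arrives as plain written data or is shared from a file that already exists on the target via the clone ioctl. The sketch below is not how btrfs receive is implemented (that is C code in btrfs-progs); it only illustrates the difference from userspace. FICLONE is the generic Linux clone ioctl from <linux/fs.h>; the helper name and the fallback behaviour are made up for illustration.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int)
FICLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Try to reflink src into dst; fall back to a plain byte copy.

    Returns "clone" if dst now shares extents with src, or "copy" if the
    data had to be written out again (filesystem without reflink support,
    different filesystems, non-Linux system, ...).
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "clone"
        except OSError:
            # The clone was refused, so duplicate the bytes instead.
            shutil.copyfileobj(src, dst)
            return "copy"
```

On btrfs the "clone" path leaves both files pointing at the same extents, which is the state an incremental receive tries to preserve wherever the stream tells it to clone rather than write.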

So, no, it won't reduce fragmentation or duplication. It just keeps
reflinks intact as long as such extents weren't touched within the
generation range. Otherwise they are rewritten as new extents.

A received subvolume will almost always be less fragmented than the source, since everything is received serially and each file is written out one at a time.

Autodefrag and deduplication processes will as such probably increase
duplication at the destination. A developer may have a better clue, tho.

In theory, yes, but in practice, not so much. Autodefrag generally operates on very small blocks of data (64k IIRC), and I'm pretty sure it has some heuristic that only triggers it on small random writes. So, depending on the workload, it may not be triggering much. For example, it often won't trigger on cache directories, since those almost never have files rewritten in place.