Quoting Qu Wenruo <quwenruo.bt...@gmx.com>:
On 2019/9/10 上午9:24, webmas...@zedlx.com wrote:
Quoting Qu Wenruo <quwenruo.bt...@gmx.com>:
Btrfs defrag works by creating new extents containing the old data.
So if btrfs decides to defrag, no old extents will be used.
It will all be new extents.
That's why your proposal is freaking strange here.
Ok, but: can the NEW extents still be shared?
Can only be shared by reflink.
Not automatically, so if btrfs decides to defrag, it will not be shared
at all.
If you had an extent E88
shared by 4 files in different subvolumes, can it be copied to another
place and still be shared by the original 4 files?
Not for current btrfs.
I guess that the
answer is YES. And, that's the only requirement for a good defrag
algorithm that doesn't shrink free space.
We may go that direction.
The biggest burden here is, btrfs needs to do expensive full-backref
walk to determine how many files are referring to this extent.
And then change them all to refer to the new extent.
YES! That! Exactly THAT. That is what needs to be done.
I mean, you just create an (perhaps associative) array which links an
extent (the array index contains the extent ID) to all the files that
reference that extent.
You're exactly in the pitfall of btrfs backref walk.
For btrfs, it's definitely not an easy work to do backref walk.
btrfs uses hidden backref, that means, under most case, one extent
shared by 1000 snapshots, in extent tree (shows the backref) it can
completely be possible to only have one ref, for the initial subvolume.
For btrfs, you need to walk up the tree to find how it's shared.
It has to be done like that, that's why we call it backref-*walk*.
E.g
A (subvol 257) B (Subvol 258, snapshot of 257)
| \ / |
| X |
| / \ |
C D
/ \ / \
E F G H
In extent tree, E is only referred by subvol 257.
While C has two referencers, 257 and 258.
So in reality, you need to:
1) Do a tree search from subvol 257
You got a path, E -> C -> A
2) Check each node to see if it's shared.
E is only referred by C, no extra referencer.
C is refered by two new tree blocks, A and B.
A is refered by subvol 257.
B is refered by subvol 258.
So E is shared by 257 and 258.
Now, you see how things would go mad, for each extent you must go that
way to determine the real owner of each extent, not to mention we can
have at most 8 levels, tree blocks at level 0~7 can all be shared.
If it's shared by 1000 subvolumes, hope you had a good day then.
Ok, let's do just this issue for the time being. One issue at a time.
It will be easier.
The solution is to temporarily create a copy of the entire
backref-tree in memory. To create this copy, you just do a preorder
depth-first traversal following only forward references.
So this preorder depth-first traversal would visit the nodes in the
following order:
A,C,E,F,D,G,H,B
Oh, it is not a tree, it is a DAG in that example of yours. OK,
preorder is possible on DAG, too. But how did you get a DAG, shouldn't
it be all trees?
When you have the entire backref-tree (backref-DAG?) in memory, doing
a backref-walk is a piece of cake.
Of course, this in-memory backref tree has to be kept in sync with the
filesystem, that is it has to be updated whenever there is a write to
disk. That's not so hard.