On 10.09.19 г. 6:32 ч., webmas...@zedlx.com wrote:
>
> Quoting Qu Wenruo <quwenruo.bt...@gmx.com>:
>
>> On 2019/9/10 上午9:24, webmas...@zedlx.com wrote:
>>>
>>> Quoting Qu Wenruo <quwenruo.bt...@gmx.com>:
>>>
>>>>>> Btrfs defrag works by creating new extents containing the old data.
>>>>>>
>>>>>> So if btrfs decides to defrag, no old extents will be used.
>>>>>> It will all be new extents.
>>>>>>
>>>>>> That's why your proposal is freaking strange here.
>>>>>
>>>>> Ok, but: can the NEW extents still be shared?
>>>>
>>>> Can only be shared by reflink.
>>>> Not automatically, so if btrfs decides to defrag, it will not be shared
>>>> at all.
>>>>
>>>>> If you had an extent E88
>>>>> shared by 4 files in different subvolumes, can it be copied to another
>>>>> place and still be shared by the original 4 files?
>>>>
>>>> Not for current btrfs.
>>>>
>>>>> I guess that the
>>>>> answer is YES. And, that's the only requirement for a good defrag
>>>>> algorithm that doesn't shrink free space.
>>>>
>>>> We may go that direction.
>>>>
>>>> The biggest burden here is, btrfs needs to do expensive full-backref
>>>> walk to determine how many files are referring to this extent.
>>>> And then change them all to refer to the new extent.
>>>
>>> YES! That! Exactly THAT. That is what needs to be done.
>>>
>>> I mean, you just create an (perhaps associative) array which links an
>>> extent (the array index contains the extent ID) to all the files that
>>> reference that extent.
>>
>> You're exactly in the pitfall of btrfs backref walk.
>>
>> For btrfs, it's definitely not an easy work to do backref walk.
>> btrfs uses hidden backref, that means, under most case, one extent
>> shared by 1000 snapshots, in extent tree (shows the backref) it can
>> completely be possible to only have one ref, for the initial subvolume.
>>
>> For btrfs, you need to walk up the tree to find how it's shared.
>>
>> It has to be done like that, that's why we call it backref-*walk*.
>>
>> E.g
>> A (subvol 257) B (Subvol 258, snapshot of 257)
>> | \ / |
>> | X |
>> | / \ |
>> C D
>> / \ / \
>> E F G H
>>
>> In extent tree, E is only referred by subvol 257.
>> While C has two referencers, 257 and 258.
>>
>> So in reality, you need to:
>> 1) Do a tree search from subvol 257
>> You got a path, E -> C -> A
>> 2) Check each node to see if it's shared.
>> E is only referred by C, no extra referencer.
>> C is refered by two new tree blocks, A and B.
>> A is refered by subvol 257.
>> B is refered by subvol 258.
>> So E is shared by 257 and 258.
>>
>> Now, you see how things would go mad, for each extent you must go that
>> way to determine the real owner of each extent, not to mention we can
>> have at most 8 levels, tree blocks at level 0~7 can all be shared.
>>
>> If it's shared by 1000 subvolumes, hope you had a good day then.
>
> Ok, let's do just this issue for the time being. One issue at a time. It
> will be easier.
>
> The solution is to temporarily create a copy of the entire backref-tree
> in memory. To create this copy, you just do a preorder depth-first
> traversal following only forward references.
>
> So this preorder depth-first traversal would visit the nodes in the
> following order:
> A,C,E,F,D,G,H,B
>
> Oh, it is not a tree, it is a DAG in that example of yours. OK, preorder
> is possible on DAG, too. But how did you get a DAG, shouldn't it be all
> trees?
>
> When you have the entire backref-tree (backref-DAG?) in memory, doing a
> backref-walk is a piece of cake.
>
> Of course, this in-memory backref tree has to be kept in sync with the
> filesystem, that is it has to be updated whenever there is a write to
> disk. That's not so hard.
Great, now that you have devised a solution and have plenty of
experience writing code why not try and contribute to btrfs?