Re: Feature requests: online backup - defrag - change RAID level

webmaster Wed, 11 Sep 2019 10:21:22 -0700


Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:

On 2019-09-10 19:32, webmas...@zedlx.com wrote:


Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:

=== I CHALLENGE you and anyone else on this mailing list: ===
- Show me an example where splitting an extent requires unsharing, and this split is needed to defrag.
Make it clear, write it yourself, I don't want any machine-made outputs.
Start with the above comment about all writes unsharing the region being written to.
Now, extrapolating from there:
Assume you have two files, A and B, each consisting of 64 filesystem blocks in a single shared extent. Now assume somebody writes a few bytes to the middle of file B, right around the boundary between blocks 31 and 32, and that you get similar writes to file A straddling blocks 14-15 and 47-48.
After all of that, file A will be 5 extents:

* A reflink to blocks 0-13 of the original extent.
* A single isolated extent consisting of the new blocks 14-15
* A reflink to blocks 16-46 of the original extent.
* A single isolated extent consisting of the new blocks 47-48
* A reflink to blocks 49-63 of the original extent.

And file B will be 3 extents:

* A reflink to blocks 0-30 of the original extent.
* A single isolated extent consisting of the new blocks 31-32.
* A reflink to blocks 33-63 of the original extent.
Note that there are a total of four contiguous sequences of blocks that are common between both files:
* 0-13
* 16-30
* 33-46
* 49-63
There is no way to completely defragment either file without splitting the original extent (which is still there, just not fully referenced by either file) unless you rewrite the whole file to a new single extent (which would, of course, completely unshare the whole file). In fact, if you want to ensure that those shared regions stay reflinked, there's no way to defragment either file without _increasing_ the number of extents in that file (either file would need 7 extents to properly share only those 4 regions), and even then only one of the files could be fully defragmented.
Such a situation generally won't happen if you're just dealing with read-only snapshots, but is not unusual when dealing with regular files that are reflinked (which is not an uncommon situation on some systems, as a lot of people have `cp` aliased to reflink things whenever possible).
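
The extent layout in the example above can be checked with a small toy model. This is only a sketch: the piece tuples, the names `cow_write` and `shared_runs`, and the labels "a1", "a2", "b1" are made up for illustration and have nothing to do with the real btrfs on-disk structures. Each file is a list of pieces, and a write replaces the touched blocks with a new, unshared piece. With blocks 31 and 32 of file B rewritten, the trailing reflink in B starts at block 33 in this model.

```python
# Toy model: a file is a list of (source, first_block, last_block) pieces.
# "orig" is a reflink into the original shared extent; any other source
# name stands for a freshly written, unshared extent (hypothetical labels).

def cow_write(pieces, first, last, new_name):
    """Replace blocks first..last (inclusive) with a new unshared extent."""
    out = []
    for src, lo, hi in pieces:
        if hi < first or lo > last:      # piece not touched by the write
            out.append((src, lo, hi))
            continue
        if lo < first:                   # part before the write survives
            out.append((src, lo, first - 1))
        if hi > last:                    # part after the write survives
            out.append((src, last + 1, hi))
    out.append((new_name, first, last))
    return sorted(out, key=lambda p: p[1])

def shared_runs(a, b):
    """Contiguous block runs of the original extent referenced by both files."""
    in_a = {n for s, lo, hi in a if s == "orig" for n in range(lo, hi + 1)}
    in_b = {n for s, lo, hi in b if s == "orig" for n in range(lo, hi + 1)}
    runs = []
    for n in sorted(in_a & in_b):
        if runs and runs[-1][1] == n - 1:
            runs[-1][1] = n              # extend the current run
        else:
            runs.append([n, n])          # start a new run
    return [tuple(r) for r in runs]

file_a = [("orig", 0, 63)]
file_b = [("orig", 0, 63)]
file_a = cow_write(file_a, 14, 15, "a1")
file_a = cow_write(file_a, 47, 48, "a2")
file_b = cow_write(file_b, 31, 32, "b1")

print(len(file_a), len(file_b))          # number of pieces in A and in B
print(shared_runs(file_a, file_b))       # runs still common to both files
```

The model gives 5 pieces for A, 3 for B, and four shared runs, which is why neither file can be reduced to a single extent without either splitting the original extent or giving up the sharing.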

Well, thank you very much for writing this example. Your example is certainly not minimal, as it seems to me that one write to the file A and one write to file B would be sufficient to prove your point, so there we have one extra write in the example, but that's OK.

Your example proves that I was wrong. I admit: it is impossible to perfectly defrag one subvolume (in the way I imagined it should be done). Why? Because, as in your example, there can be files within a SINGLE subvolume which share their extents with each other. I didn't consider such a case.

On the other hand, I judge this issue to be mostly irrelevant. Why? Because most of the file sharing will be between subvolumes, not within a subvolume. When a user creates a reflink to a file in the same subvolume, he is willingly denying himself the assurance of a perfect defrag. Because, as your example proves, if there are a few writes to BOTH files, it becomes impossible to defrag perfectly. So, if the user creates such reflinks, it's his own wish and his own fault.


Such situations will occur only in some specific circumstances:
a) when the user is reflinking manually

b) when a file is copied from one subvolume into a different file in a different subvolume.

The situation a) is unusual in normal use of the filesystem. Even when it occurs, it is the explicit command given by the user, so he should be willing to accept all the consequences, even the bad ones like imperfect defrag.

The situation b) is possible, but as far as I know copies are currently not done that way in btrfs. There should probably be an option to reflink-copy files from another subvolume; that would be good.

But anyway, it doesn't matter. Because most of the sharing will be between subvolumes, not within a subvolume. So, if there is some in-subvolume sharing, the defrag won't be 100% perfect; that's a minor point. Unimportant.

About merging extents: a defrag should merge extents ONLY when both extents are shared by the same files (and when those extents are neighbours in both files). In other words, defrag should always merge without unsharing. Let's call that operation "fusing extents", so that there are no more misunderstandings.
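
The "fusing" rule stated above can be written down as a small check. This is a sketch under simplifying assumptions that are not in the thread: each file is just an ordered list of extent ids, every reference covers a whole extent, and an extent id appears at most once per file. The names `fusible_pairs` and `follows` are hypothetical.

```python
def follows(extents, x, y):
    """True if extent y comes directly after extent x in this file."""
    i = extents.index(x)
    return i + 1 < len(extents) and extents[i + 1] == y

def fusible_pairs(files):
    """Pairs (x, y) that could be fused without unsharing anything.

    files: dict mapping a file name to its ordered list of extent ids.
    A pair qualifies only if exactly the same set of files references
    both extents and y directly follows x in every one of those files.
    """
    owners = {}
    for name, extents in files.items():
        for e in extents:
            owners.setdefault(e, set()).add(name)

    pairs = set()
    for extents in files.values():
        for x, y in zip(extents, extents[1:]):
            if owners[x] != owners[y]:
                continue                 # shared by different sets of files
            if all(follows(files[f], x, y) for f in owners[x]):
                pairs.add((x, y))
    return sorted(pairs)

# Two files sharing e1, e2 and e3; each has one private extent in between.
files = {
    "A": ["e1", "e2", "a1", "e3"],
    "B": ["e1", "e2", "b1", "e3"],
}
print(fusible_pairs(files))
```

Here only e1 and e2 qualify: they have the same owners and are neighbours in both files. The private extents a1 and b1 never qualify for fusing with a shared neighbour, because the sets of owners differ.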

And I reiterate: defrag only operates on the file it's passed in. It needs to for efficiency reasons (we had a reflink aware defrag for a while a few years back, it got removed because performance limitations meant it was unusable in the cases where you actually needed it). Defrag doesn't even know that there are reflinks to the extents it's operating on.

If the defrag doesn't know about all reflinks, that's bad in my view. That is a bad defrag. If you had a reflink-aware defrag, and it was slow, maybe that happened because the implementation was bad. Because, I don't see any reason why it should be slow. So, you will have to explain to me what was causing these performance problems.

Given this, defrag isn't willfully unsharing anything, it's just a side-effect of how it works (since it's rewriting the block layout of the file in-place).

The current defrag has to unshare because, as you said, it is unaware of the full reflink structure. If it doesn't know about all reflinks, it has to unshare; there is no way around that.

Now factor in that _any_ write will result in unsharing the region being written to, rounded to the nearest full filesystem block in both directions (this is mandatory, it's a side effect of the copy-on-write nature of BTRFS, and is why files that experience heavy internal rewrites get fragmented very heavily and very quickly on BTRFS).
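
The rounding described above is plain integer arithmetic. A minimal sketch, assuming a 4096-byte filesystem block (the block size is an assumption, it is not stated in this thread) and a hypothetical helper name `blocks_touched`:

```python
BLOCK_SIZE = 4096  # assumed block size, not taken from the thread

def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """First and last block (inclusive) covered by a write of length > 0 bytes."""
    first = offset // block_size                 # round start down
    last = (offset + length - 1) // block_size   # round end up to block end
    return first, last

# A 16-byte write that starts 8 bytes before the start of block 32
# touches blocks 31 and 32, so two full blocks get unshared.
print(blocks_touched(32 * BLOCK_SIZE - 8, 16))   # (31, 32)
```

So even a write of a few bytes costs at least one full block of unsharing, and two when it straddles a block boundary.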

You mean: when defrag performs a write, the new data is unshared because every write is unshared? Really?

Consider there is an extent E55 shared by two files A and B. The defrag has to move E55 to another location. In order to do that, defrag creates a new extent E70. It makes it belong to file A by changing the reflink of extent E55 in file A to point to E70.

Now, to retain the original sharing structure, the defrag has to change the reflink of extent E55 in file B to point to E70. You are telling me this is not possible? Bullshit!

Please explain to me how this 'defrag has to unshare' story of yours isn't an intentional attempt to mislead me.
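
To make the E55/E70 relocation concrete, here is a toy sketch of it. It models files as ordered lists of extent ids, which is an illustrative simplification and not how btrfs stores references; the function name `relocate` and the extra extents E54 and E56 are hypothetical. It shows only the bookkeeping of repointing references, and says nothing about what it costs to find every referencing file on a real filesystem.

```python
def relocate(files, old, new):
    """Repoint every reference to extent `old` at extent `new`.

    files: dict mapping a file name to its ordered list of extent ids.
    Returns the names of the files whose references were changed.
    """
    changed = []
    for name, extents in files.items():
        if old in extents:
            files[name] = [new if e == old else e for e in extents]
            changed.append(name)
    return changed

files = {
    "A": ["E54", "E55"],
    "B": ["E55", "E56"],
}
changed = relocate(files, "E55", "E70")
print(changed)   # ['A', 'B']
print(files)     # both files now reference E70, so the sharing is kept
```

In this model the sharing survives because both references are updated together. If only file A were updated, A would point at E70 and B at E55, which is exactly the unsharing under discussion.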
