Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:
On Thu, Sep 12, 2019 at 05:23:21PM -0400, General Zed wrote:
Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:
> On Wed, Sep 11, 2019 at 07:21:31PM -0400, webmas...@zedlx.com wrote:
> >
> > Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:
[...etc...]
> > > On Wed, Sep 11, 2019 at 01:20:53PM -0400, webmas...@zedlx.com wrote:
> It's the default for GNU coreutils, and for 'mv' across subvols there
> is currently no option to turn reflink copies off. Maybe for 'cp'
> you still have to explicitly request reflink, but that will presumably
> change at some point as more filesystems get the CLONE_RANGE ioctl and
> more users expect it to just work by default.
Yes, thank you for posting another batch of arguments that support the use
of my vision of defrag instead of the current one.
The defrag that I'm proposing will preserve all those reflinks that were
painstakingly created by the user. Therefore, I take that you agree with me
on the utmost importance of implementing this new defrag that I'm proposing.
I do not agree that improving the current defrag is of utmost importance,
or indeed of any importance whatsoever. The current defrag API is a
clumsy, unscalable hack that cannot play well with other filesystem layout
optimization tools no matter what you do to its internal implementation
details. It's better to start over with a better design, and spend only
the minimal amount of effort required to keep the old one building until
its replacement(s) is (are) proven in use and ready for deployment.
I'm adding extent-merging support to an existing tool that already
performs several other filesystem layout optimizations. The goal is to
detect degenerate extent layout on filesystems as it appears, and repair
it before it becomes a more severe performance problem, without wasting
resources on parts of the filesystem that do not require intervention.
Oh, I get it. So, the current defrag isn't particularly good, so you
are going to produce a solution which mitigates the fragmentation
problem in some cases (but not all of them). Well, that's a good quick
fix, but not a true solution.
Your defrag ideas are interesting, but you should spend a lot more
time learning the btrfs fundamentals before continuing. Right now
you do not understand what btrfs is capable of doing easily, and what
requires such significant rework in btrfs to implement that the result
cannot be considered the same filesystem. This is impairing the quality
of your design proposals and reducing the value of your contribution
significantly.
Ok, that was a shot at me; and I admit, guilty as charged. I barely
have a clue about btrfs.
Now it's my turn to shoot. Apparently, the people which are
implementing the btrfs defrag, or at least the ones that responded to
my post, seem to have no clue about how on-demand defrag solutions
typically work. I had to explain the usual tricks involved in the
defragmentation, and it was like talking to complete rookies. None of
you even considered a full-featured defrag solution, all that you are
doing are some partial solutions.
And, you all got lost in implementation details. How many times have I
been told here that some operation cannot be performed, and then it
turned out the opposite. You have all sunk into some strange state of
mind where every possible excuse is being made in order not to start
working on a better, hollistic defrag solution.
And you even misunderstood me when I said "hollistic defrag", you
thought I was talking about a full defrag. No. A full defrag is a
defrag performed on all the data. A holistic defrag can be performed
on only some data, but it is hollistic in the sense that it uses whole
information about a filesystem, not just a partial view of it. A
holistic defrag is better than a partial defrag: it is faster and
produces better results, and it can defrag a wider spectrum of cases.
Why? Because a holistic defrag takes everything into account.
So I think you should all inform yourself a little better about
various defrag algorithms and solutions that exist. Apparently, you
all lost the sight of the big picture. You can't see the wood from the
trees.
I suggest that btrfs should first try to determine whether it can split an
extent in-place, or not. If it can't do that, then it should create new
extents to split the old one.
btrfs cannot split extents in place, so it must always create new
extents by copying data blocks. It's a hugely annoying and non-trivial
limitation that makes me consider starting over with some other filesystem
quite often.
Actually, this has no repercussions for the defrag. The defrag will
always copy the data to a new place. So, if brtfs can't split
in-place, that is just fine.
If you are looking for important btrfs work, consider solving that
problem first. It would dramatically improve GC (in the sense that
it would eliminate the need to perform a separate GC step at all) and
dedupe performance on btrfs as well as help defrag and other extent
layout optimizers.
There is no problem there.
Therefore, the defrag can free unused parts of any extent, and then the
extent can be split is necessary. In fact, both these operations can be done
simultaneously.
Sure, but I only call one of these operations "defrag" (the extent merge
operation). The other operations increase the total number of fragments
in the filesystem, so "defrag" is not an appropriate name for them.
An appropriate name would be something like "enfrag" or "refrag" or
"split". In some cases the "defrag" can be performed by doing a "dedupe"
operation with a single unfragmented identical source extent replacing
several fragmented destination extents...what do you call that?
Well, no. Perhaps the word "defrag" can have a wider and narrower
sense. So in a narrower sense, "defrag" means what you just wrote. In
that sense, the word "defrag" means practically the same as "merge",
so why not just use the word "merge" to remove any ambiguities. The
"merge" is the only operation that decreases the number of fragments
(besides "delete"). Perhaps you meant move&merge. But, commonly, the
word "defrag" is used in a wider sense, which is not the one you
described.
In a wider sense, the defrag involves the preparation, analysis, free
space consolidation, multiple phases, splitting and merging, and final
passes.
Try looking on Wikipedia for "defrag".
> Dedupe on btrfs also requires the ability to split and merge extents;
> otherwise, we can't dedupe an extent that contains a combination of
> unique and duplicate data. If we try to just move references around
> without splitting extents into all-duplicate and all-unique extents,
> the duplicate blocks become unreachable, but are not deallocated. If we
> only split extents, fragmentation overhead gets bad. Before creating
> thousands of references to an extent, it is worthwhile to merge it with
> as many of its neighbors as possible, ideally by picking the biggest
> existing garbage-free extents available so we don't have to do defrag.
> As we examine each extent in the filesystem, it may be best to send
> to defrag, dedupe, or garbage collection--sometimes more than one of
> those.
This is sovled simply by always running defrag before dedupe.
Defrag and dedupe in separate passes is nonsense on btrfs.
Defrag can be run without dedupe.
Now, how to organize dedupe? I didn't think about it yet. I'll leave
it to you, but it seems to me that defrag should be involved there.
And, my defrag solution would help there very, very much.
Defrag burns a lot of iops on defrag moving extent data around to create
new size-driven extent boundaries. These will have to be immediately
moved again by dedupe (except in special cases like full-file matches),
because dedupe needs to create content-driven extent boundaries to work
on btrfs.
Defrag can be run without dedupe.
Dedupe probably requires some kind of defrag to produce a good result
(a result without heavy fragmentation).
Extent splitting in-place is not possible on btrfs, so extent boundary
changes necessarily involve data copies. Reference counting is done
by extent in btrfs, so it is only possible to free complete extents.
Great, there is reference counting in btrfs. That helps. Good design.
You have to replace the whole extent with references to data from
somewhere else, creating data copies as required to do so where no
duplicate copy of the data is available for reflink.
Note the phrase "on btrfs" appears often here...other filesystems manage
to solve these problems without special effort. Again, if you're looking
for important btrfs things to work on, maybe start with in-place extent
splitting.
I think that I'll start with "software design document for on-demand
defrag which preserves sharing structure". I have figure out that you
don't have it yet. And, how can you even start working on a defrag
without a software design document?
So I volunteer to write it. Apparently, I'm already half way done.
On XFS you can split extents in place and reference counting is by
block, so you can do alternating defrag and dedupe passes. It's still
suboptimal (you still waste iops to defrag data blocks that are
immediately eliminated by the following dedupe), but it's orders of
magnitude better than btrfs.
I'll reply to the rest of this marathonic post in another reply (when
I find the time to read it). Because I'm writing the software design
document.