Re: Feature requests: online backup - defrag - change RAID level

General Zed Mon, 16 Sep 2019 04:43:54 -0700


Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:

On Thu, Sep 12, 2019 at 05:23:21PM -0400, General Zed wrote:


Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:

> On Wed, Sep 11, 2019 at 07:21:31PM -0400, webmas...@zedlx.com wrote:
> >
> > Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:

[...etc...]

> > > On Wed, Sep 11, 2019 at 01:20:53PM -0400, webmas...@zedlx.com wrote:
> It's the default for GNU coreutils, and for 'mv' across subvols there
> is currently no option to turn reflink copies off.  Maybe for 'cp'
> you still have to explicitly request reflink, but that will presumably
> change at some point as more filesystems get the CLONE_RANGE ioctl and
> more users expect it to just work by default.

Yes, thank you for posting another batch of arguments that support the use
of my vision of defrag instead of the current one.

The defrag that I'm proposing will preserve all those reflinks that were
painstakingly created by the user. Therefore, I take that you agree with me
on the utmost importance of implementing this new defrag that I'm proposing.


I do not agree that improving the current defrag is of utmost importance,
or indeed of any importance whatsoever.  The current defrag API is a
clumsy, unscalable hack that cannot play well with other filesystem layout
optimization tools no matter what you do to its internal implementation
details.  It's better to start over with a better design, and spend only
the minimal amount of effort required to keep the old one building until
its replacement(s) is (are) proven in use and ready for deployment.

I'm adding extent-merging support to an existing tool that already
performs several other filesystem layout optimizations.  The goal is to
detect degenerate extent layout on filesystems as it appears, and repair
it before it becomes a more severe performance problem, without wasting
resources on parts of the filesystem that do not require intervention.

Oh, I get it. So, the current defrag isn't particularly good, so you
are going to produce a solution which mitigates the fragmentation
problem in some cases (but not all of them). Well, that's a good quick
fix, but not a true solution.

Your defrag ideas are interesting, but you should spend a lot more
time learning the btrfs fundamentals before continuing.  Right now
you do not understand what btrfs is capable of doing easily, and what
requires such significant rework in btrfs to implement that the result
cannot be considered the same filesystem.  This is impairing the quality
of your design proposals and reducing the value of your contribution
significantly.

Ok, that was a shot at me; and I admit, guilty as charged. I barely
have a clue about btrfs.

Now it's my turn to shoot. Apparently, the people who are
implementing the btrfs defrag, or at least the ones that responded to
my post, seem to have no clue about how on-demand defrag solutions
typically work. I had to explain the usual tricks involved in
defragmentation, and it was like talking to complete rookies. None of
you even considered a full-featured defrag solution; all that you are
doing are some partial solutions.

And, you all got lost in implementation details. How many times have I
been told here that some operation cannot be performed, and then the
opposite turned out to be true? You have all sunk into some strange
state of mind where every possible excuse is being made in order not to
start working on a better, holistic defrag solution.

And you even misunderstood me when I said "holistic defrag": you
thought I was talking about a full defrag. No. A full defrag is a
defrag performed on all the data. A holistic defrag can be performed
on only some data, but it is holistic in the sense that it uses whole
information about a filesystem, not just a partial view of it. A
holistic defrag is better than a partial defrag: it is faster and
produces better results, and it can defrag a wider spectrum of cases.
Why? Because a holistic defrag takes everything into account.

So I think you should all inform yourselves a little better about
various defrag algorithms and solutions that exist. Apparently, you
all lost sight of the big picture. You can't see the wood for the
trees.

I suggest that btrfs should first try to determine whether it can split an
extent in-place, or not. If it can't do that, then it should create new
extents to split the old one.


btrfs cannot split extents in place, so it must always create new
extents by copying data blocks.  It's a hugely annoying and non-trivial
limitation that makes me consider starting over with some other filesystem
quite often.

Actually, this has no repercussions for the defrag. The defrag will
always copy the data to a new place. So, if btrfs can't split
in-place, that is just fine.

If you are looking for important btrfs work, consider solving that
problem first.  It would dramatically improve GC (in the sense that
it would eliminate the need to perform a separate GC step at all) and
dedupe performance on btrfs as well as help defrag and other extent
layout optimizers.


There is no problem there.

Therefore, the defrag can free unused parts of any extent, and then the
extent can be split if necessary. In fact, both these operations can be done
simultaneously.


Sure, but I only call one of these operations "defrag" (the extent merge
operation).  The other operations increase the total number of fragments
in the filesystem, so "defrag" is not an appropriate name for them.
An appropriate name would be something like "enfrag" or "refrag" or
"split".  In some cases the "defrag" can be performed by doing a "dedupe"
operation with a single unfragmented identical source extent replacing
several fragmented destination extents...what do you call that?

Well, no. Perhaps the word "defrag" can have a wider and narrower
sense. So in a narrower sense, "defrag" means what you just wrote. In
that sense, the word "defrag" means practically the same as "merge",
so why not just use the word "merge" to remove any ambiguities. The
"merge" is the only operation that decreases the number of fragments
(besides "delete"). Perhaps you meant move&merge. But, commonly, the
word "defrag" is used in a wider sense, which is not the one you
described.

In a wider sense, the defrag involves the preparation, analysis, free
space consolidation, multiple phases, splitting and merging, and final
passes.


Try looking on Wikipedia for "defrag".

> Dedupe on btrfs also requires the ability to split and merge extents;
> otherwise, we can't dedupe an extent that contains a combination of
> unique and duplicate data.  If we try to just move references around
> without splitting extents into all-duplicate and all-unique extents,
> the duplicate blocks become unreachable, but are not deallocated.  If we
> only split extents, fragmentation overhead gets bad.  Before creating
> thousands of references to an extent, it is worthwhile to merge it with
> as many of its neighbors as possible, ideally by picking the biggest
> existing garbage-free extents available so we don't have to do defrag.
> As we examine each extent in the filesystem, it may be best to send
> to defrag, dedupe, or garbage collection--sometimes more than one of
> those.

This is solved simply by always running defrag before dedupe.


Defrag and dedupe in separate passes is nonsense on btrfs.


Defrag can be run without dedupe.

Now, how to organize dedupe? I didn't think about it yet. I'll leave
it to you, but it seems to me that defrag should be involved there.
And, my defrag solution would help there very, very much.

Defrag burns a lot of iops moving extent data around to create
new size-driven extent boundaries.  These will have to be immediately
moved again by dedupe (except in special cases like full-file matches),
because dedupe needs to create content-driven extent boundaries to work
on btrfs.


Defrag can be run without dedupe.

Dedupe probably requires some kind of defrag to produce a good result
(a result without heavy fragmentation).

Extent splitting in-place is not possible on btrfs, so extent boundary
changes necessarily involve data copies.  Reference counting is done
by extent in btrfs, so it is only possible to free complete extents.


Great, there is reference counting in btrfs. That helps. Good design.

You have to replace the whole extent with references to data from
somewhere else, creating data copies as required to do so where no
duplicate copy of the data is available for reflink.
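
To make the consequence of per-extent reference counting concrete, here
is a toy model in Python. It is not btrfs code: the Extent class and its
methods are hypothetical names made up for this illustration, and sizes
are counted in abstract blocks. It only shows why blocks that no
reference covers stay allocated until the whole extent is released.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Toy extent (hypothetical): allocated and freed only as one unit."""
    length: int                                # size in blocks
    refs: list = field(default_factory=list)   # (offset, count) ranges in use

    def add_ref(self, offset, count):
        if offset < 0 or count <= 0 or offset + count > self.length:
            raise ValueError("reference outside extent")
        self.refs.append((offset, count))

    def drop_ref(self, offset, count):
        # Raises ValueError if no such reference exists.
        self.refs.remove((offset, count))

    def reachable_blocks(self):
        """Number of blocks covered by at least one reference."""
        covered = set()
        for offset, count in self.refs:
            covered.update(range(offset, offset + count))
        return len(covered)

    def allocated_blocks(self):
        """The whole extent stays allocated while any reference remains."""
        return self.length if self.refs else 0

    def wasted_blocks(self):
        """Blocks that are allocated but no longer reachable."""
        return self.allocated_blocks() - self.reachable_blocks()


extent = Extent(length=100)
extent.add_ref(0, 100)     # a file references the whole extent
extent.add_ref(40, 10)     # a reflink copy references blocks 40..49
extent.drop_ref(0, 100)    # the original file is overwritten or deleted

print(extent.allocated_blocks())  # 100
print(extent.wasted_blocks())     # 90
```

In this model the only way to get the 90 unreachable blocks back is to
copy the 10 live blocks somewhere else and drop the last reference, which
is the "replace the whole extent" step described above.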

Note the phrase "on btrfs" appears often here...other filesystems manage
to solve these problems without special effort.  Again, if you're looking
for important btrfs things to work on, maybe start with in-place extent
splitting.

I think that I'll start with "software design document for on-demand
defrag which preserves sharing structure". I have figured out that you
don't have it yet. And, how can you even start working on a defrag
without a software design document?


So I volunteer to write it. Apparently, I'm already half way done.

On XFS you can split extents in place and reference counting is by
block, so you can do alternating defrag and dedupe passes.  It's still
suboptimal (you still waste iops to defrag data blocks that are
immediately eliminated by the following dedupe), but it's orders of
magnitude better than btrfs.
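
For contrast, a toy model of reference counting by block might look like
the sketch below. It is not XFS code: BlockRefcounts and its methods are
hypothetical names, and the only thing it illustrates is the statement
above that, when counts are kept per block, each block can be released as
soon as its own last reference goes away.

```python
from collections import Counter


class BlockRefcounts:
    """Toy per-block reference counts (hypothetical, not a real API)."""

    def __init__(self):
        self.counts = Counter()

    def add_ref(self, start, count):
        for block in range(start, start + count):
            self.counts[block] += 1

    def drop_ref(self, start, count):
        """Drop one reference per block; return the blocks that were freed."""
        blocks = range(start, start + count)
        # Validate first so a bad call does not leave partial changes behind.
        if any(self.counts[block] == 0 for block in blocks):
            raise ValueError("dropping a reference that does not exist")
        freed = []
        for block in blocks:
            self.counts[block] -= 1
            if self.counts[block] == 0:
                del self.counts[block]
                freed.append(block)
        return freed

    def allocated_blocks(self):
        return len(self.counts)


refs = BlockRefcounts()
refs.add_ref(0, 100)            # a file references blocks 0..99
refs.add_ref(40, 10)            # a reflink copy references blocks 40..49
freed = refs.drop_ref(0, 100)   # the original file is deleted

print(len(freed))               # 90
print(refs.allocated_blocks())  # 10
```

Here the 90 blocks that only the deleted file used are released without
copying any data, while the 10 shared blocks remain allocated.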

I'll reply to the rest of this marathon post in another reply (when
I find the time to read it), because I'm writing the software design
document.
