Quoting Zygo Blaxell <ce3g8...@umail.furryterror.org>:

On Wed, Sep 11, 2019 at 04:01:01PM -0400, webmas...@zedlx.com wrote:

Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:


> Not necessarily. Even ignoring the case of data deduplication (which
> needs to be considered if you care at all about enterprise usage, and is
> part of the whole point of using a CoW filesystem), there are existing
> applications that actively use reflinks, either directly or indirectly
> (via things like the `copy_file_range` system call), and the number of
> such applications is growing.
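
(To make that concrete: below is a minimal sketch of such an
"indirect" reflink, assuming Linux 4.5+ and a CoW filesystem like
btrfs, where the kernel may satisfy copy_file_range() by sharing
extents instead of copying bytes. The file arguments are illustrative
only.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }
    int in = open(argv[1], O_RDONLY);
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(in, &st) < 0) { perror("fstat"); return 1; }

    ssize_t left = st.st_size;
    while (left > 0) {
        /* NULL offsets: the kernel advances both file positions.
         * On btrfs this may reflink extents rather than copy them. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, left, 0);
        if (n <= 0) { perror("copy_file_range"); return 1; }
        left -= n;
    }
    return 0;
}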

The same argument goes here: if data deduplication was performed, then
the user has specifically requested it. Therefore, since it was the
user's will, the defrag has to honor it, and so the defrag must not
unshare deduplicated extents, because the user wants them shared. This
might prevent a perfect defrag, but that is exactly what the user has
requested, either directly or indirectly, by some policy he has chosen.

You can't both defrag perfectly and honor deduplication. Therefore, the
defrag has to do the best it can while still honoring the user's will.
<<<!!! So, the fact that deduplication was performed is actually a
reason FOR not unsharing, not against it, as you made it look in that
paragraph. !!!>>>

> IMHO the current kernel 'defrag' API shouldn't be used any more.  We need
> a tool that handles dedupe and defrag at the same time, for precisely
> this reason:  currently the two operations have no knowledge of each
> other and duplicate or reverse each other's work.  You don't need to defrag
> an extent if you can find a duplicate, and you don't want to use fragmented
> extents as dedupe sources.

Yes! The current defrag that you have is a bad counterpart to
deduplication.

To preserve deduplication, you need the defrag that I suggested: a
defrag which never unshares file data.

If the system unshares automatically after deduplication, then the user will
need to run deduplication again. Ridiculous!
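
(To illustrate what gets thrown away: roughly this is what a dedupe
tool asks the kernel to do, sketched here with the FIDEDUPERANGE
ioctl, assuming Linux 4.5+; the file names and the range length are
made up for the example. Every byte range shared this way is exactly
what an unsharing defrag would silently duplicate again.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int src = open("a.bin", O_RDONLY);   /* illustrative names */
    int dst = open("b.bin", O_RDWR);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    struct file_dedupe_range *r =
        calloc(1, sizeof(*r) + sizeof(struct file_dedupe_range_info));
    r->src_offset = 0;
    r->src_length = 128 * 1024;  /* both ranges must hold identical bytes */
    r->dest_count = 1;
    r->info[0].dest_fd = dst;
    r->info[0].dest_offset = 0;

    /* The kernel compares the ranges and, only if they are identical,
     * makes both files point at one shared extent. */
    if (ioctl(src, FIDEDUPERANGE, r) < 0) { perror("FIDEDUPERANGE"); return 1; }

    if (r->info[0].status == FILE_DEDUPE_RANGE_SAME)
        printf("deduped %llu bytes\n",
               (unsigned long long)r->info[0].bytes_deduped);
    else
        printf("ranges differ, nothing shared\n");
    return 0;
}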

> > When a user creates a reflink to a file in the same subvolume, he is
> > willingly denying himself the assurance of a perfect defrag.
> > Because, as your example proves, if there are a few writes to BOTH
> > files, it becomes impossible to defrag perfectly. So, if the user
> > creates such reflinks, it's his own wish and his own fault.

> The same argument can be made about snapshots.  It's an invalid argument
> in both cases though because it's not always the user who's creating the
> reflinks or snapshots.

Um, I don't agree.

1) Actually, it is always the user who is creating reflinks, and
snapshots, too. Ultimately, it's always the user who does absolutely
everything, because a computer is supposed to be under his full
control. But in the case of reflink-copies this is even more true,
because reflinks are not an essential feature for normal OS operation,
at least as far as today's OSes go. Every OS has to copy files around;
every OS requires the copy operation. No current OS requires the
reflinked-copy operation in order to function.
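
(To be precise about terms: the reflinked-copy operation I mean is the
one behind `cp --reflink=always`, which on Linux 4.5+ filesystems that
support it is a single FICLONE ioctl. A minimal sketch:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    /* One metadata operation: dst now shares all of src's extents. */
    if (ioctl(dst, FICLONE, src) < 0) { perror("FICLONE"); return 1; }
    return 0;
}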

> If we don't do reflinks all day, every day, our disks fill up in a matter
> of hours...

The defrag which I am proposing will honor all your reflinks and won't
ever unshare them without the user's specific request.

At the same time, it can still defrag the reflinked data: not
perfectly, but almost as well as a perfect defrag. So you can enjoy
your reflinks and still have reasonably defragmented, fast disk IO.

You can have both. It can be done!
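
(That proposed defrag does not exist yet, but one building block it
would need already does: the kernel can report which extents of a file
are shared. Below is a hypothetical first step, sketched with the
existing FIEMAP ioctl; the extent count of 512 is an arbitrary choice
for this example.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned count = 512;  /* arbitrary upper bound for this sketch */
    struct fiemap *fm =
        calloc(1, sizeof(*fm) + count * sizeof(struct fiemap_extent));
    fm->fm_length = ~0ULL;            /* map the whole file */
    fm->fm_extent_count = count;
    fm->fm_flags = FIEMAP_FLAG_SYNC;  /* flush pending writes first */

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

    /* A share-preserving defrag must treat SHARED extents specially:
     * it may move them, but every referencing file must keep sharing. */
    for (unsigned i = 0; i < fm->fm_mapped_extents; i++) {
        struct fiemap_extent *e = &fm->fm_extents[i];
        printf("logical %llu len %llu %s\n",
               (unsigned long long)e->fe_logical,
               (unsigned long long)e->fe_length,
               (e->fe_flags & FIEMAP_EXTENT_SHARED) ? "SHARED" : "exclusive");
    }
    return 0;
}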


