On 2016-11-08 11:57, Darrick J. Wong wrote:
On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:
On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
I think adding a whole-file dedup mode to duperemove would be better
(from user's POV) than writing a whole new tool

What would IMO be really good from a user's POV is if one of the
tools, the one deemed to be the "best", were added to btrfs-progs and
simply became "the official" one.

The problem is that no single deduplication tool works well for every
workload.  For example, the cases I use it in are very specific and perform
horribly with pretty much any available tool.  In a couple of them I have
disjoint subsets of the same directory tree under different prefixes, so I
can tell exactly which files are duplicated, and any duplicate file is 100%
duplicate.  In a couple of others the changes are small, scattered, and
highly predictable, so it's easier to find what has changed and dedupe
everything else than to find what is the same.  None of the existing
options do well in either situation.
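
As an illustration of the first case (not code from this thread), here is a
minimal Python sketch of how candidate pairs could be enumerated when two
trees hold the same relative paths under different prefixes.  The function
name and the prefixes are hypothetical.  It only pairs files by relative
path and size; the actual dedupe call would still have to be issued
separately, and the kernel's dedupe ioctl compares the data itself before
sharing extents.

```python
import os


def paired_files(prefix_a, prefix_b):
    """Yield (path_a, path_b) for regular files that exist at the same
    relative path under both prefixes and have the same non-zero size.

    Hypothetical helper: it assumes the caller already knows the two
    trees are copies of each other, as in the case described above.
    """
    for dirpath, _dirnames, filenames in os.walk(prefix_a):
        rel_dir = os.path.relpath(dirpath, prefix_a)
        for name in sorted(filenames):
            path_a = os.path.join(dirpath, name)
            path_b = os.path.normpath(os.path.join(prefix_b, rel_dir, name))
            # Skip files missing on the other side, and skip symlinks.
            if not os.path.isfile(path_b):
                continue
            if os.path.islink(path_a) or os.path.islink(path_b):
                continue
            size = os.path.getsize(path_a)
            # Empty files have nothing to share; unequal sizes cannot be
            # whole-file duplicates.
            if size > 0 and size == os.path.getsize(path_b):
                yield path_a, path_b
```

Each yielded pair is only a candidate for a whole-file dedupe request
covering offset 0 through the file size.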

I'd argue at minimum for having the extent-same tool from duperemove in
btrfs-progs, as that lets people do deduplication how they want without
having to write C code.  Something equivalent that would let you call any
BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
better (I can see such a tool being wonderful for debugging).

Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
FIDEDUPERANGE (f.k.a. EXTENT_SAME):

$ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile
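
For readers who want to issue the same ioctl without xfs_io or C, the
following is a rough Python sketch (not from this thread).  It packs the
file_dedupe_range and file_dedupe_range_info structures as laid out in
<linux/fs.h> and calls FIDEDUPERANGE on the source file descriptor.  The
helper names are hypothetical, only a single destination is handled, and
the call only succeeds on a filesystem that supports dedupe.

```python
import fcntl
import os
import struct

# From <linux/fs.h>: _IOWR(0x94, 54, struct file_dedupe_range)
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

# struct file_dedupe_range:
#   u64 src_offset, u64 src_length, u16 dest_count, u16 reserved1, u32 reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info:
#   s64 dest_fd, u64 dest_offset, u64 bytes_deduped, s32 status, u32 reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request with exactly one destination (hypothetical helper)."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def parse_result(buf):
    """Return (bytes_deduped, status) for the single destination entry."""
    _fd, _off, bytes_deduped, status, _reserved = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status


def dedupe_range(src_path, dest_path, src_offset, dest_offset, length):
    """Ask the kernel to share `length` bytes of src with dest.

    status is FILE_DEDUPE_RANGE_SAME on success, FILE_DEDUPE_RANGE_DIFFERS
    if the data did not match, or a negative errno value.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dest_fd = os.open(dest_path, os.O_RDWR)
        try:
            buf = build_request(src_offset, length, dest_fd, dest_offset)
            # The ioctl is issued on the source fd; the kernel writes the
            # per-destination result back into the mutable buffer.
            fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
            return parse_result(buf)
        finally:
            os.close(dest_fd)
    finally:
        os.close(src_fd)
```

In terms of the xfs_io command above, src_path corresponds to /mnt/srcfile
and dest_path to /mnt/destfile.  On a filesystem without dedupe support the
fcntl.ioctl call is expected to raise OSError rather than return a status.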

I actually hadn't known about this, thanks. It means that xfs_io just got even more useful despite me not running XFS.
