Hi David and James,

On Mon, Nov 7, 2016 at 6:02 AM, David Sterba <dste...@suse.cz> wrote:
> On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>> This operates on whole files, is fast, and I believe complements the
>> existing utilities (duperemove, bedup), which exist currently.
>
> Mark can correct me if I'm wrong, but AFAIK, duperemove can consume
> output of fdupes, which does the whole file scanning for duplicates. And
> I think adding a whole-file dedup mode to duperemove would be better
> (from user's POV) than writing a whole new tool, eg. because of existing
> availability of duperemove in the distros.

Yeah you are correct - fdupes -r /foo | duperemove --fdupes  will get
you the same effect.

There's been a request for us to do all of that internally so that the
whole file dedupe works with the mtime checking code. This is entirely
doable. I would probably either add a field to the files table or add
a new table to hold whole-file hashes. We can then squeeze down our
existing block hashes into one big one or just rehash the whole file.


> Also looking to your roadmap, some of the items are implemented in
> duperemove: database of existing csums, cross filesystem boundary,
> mtime-based speedups).

Yeah, rescanning based on mtime was a huge speedup for Duperemove as
was keeping checksums in a db. We do all this today, also on XFS with
the dedupe ioctl (I believe this should be out with Linux-4.9).

Btw, there's lots of little details and bug fixes which I feel add up
to a relatively complete (though far from perfect!) tool. For example,
the dedupe code can handle multiple kernel versions including old
kernels which couldn't dedupe on non aligned block boundaries. Every
major step in duperemove is threaded at this point too which has also
been an enormous performance increase (which new features benefit
from).

Thanks,
    --Mark

-- 
"When the going gets weird, the weird turn pro."
Hunter S. Thompson
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to