Hi David and James,

On Mon, Nov 7, 2016 at 6:02 AM, David Sterba <dste...@suse.cz> wrote:
> On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>> This operates on whole files, is fast, and I believe complements the
>> existing utilities (duperemove, bedup), which exist currently.
>
> Mark can correct me if I'm wrong, but AFAIK, duperemove can consume
> output of fdupes, which does the whole file scanning for duplicates. And
> I think adding a whole-file dedup mode to duperemove would be better
> (from user's POV) than writing a whole new tool, eg. because of existing
> availability of duperemove in the distros.

Yeah, you are correct: fdupes -r /foo | duperemove --fdupes will get you
the same effect.

There's been a request for us to do all of that internally so that
whole-file dedupe works with the mtime checking code. This is entirely
doable. I would probably either add a field to the files table or add a
new table to hold whole-file hashes. We can then squeeze down our existing
block hashes into one big one or just rehash the whole file.

> Also looking to your roadmap, some of the items are implemented in
> duperemove: database of existing csums, cross filesystem boundary,
> mtime-based speedups).

Yeah, rescanning based on mtime was a huge speedup for Duperemove, as was
keeping checksums in a db. We do all this today, also on XFS with the
dedupe ioctl (I believe this should be out with Linux 4.9).

Btw, there are lots of little details and bug fixes which I feel add up to
a relatively complete (though far from perfect!) tool. For example, the
dedupe code can handle multiple kernel versions, including old kernels
which couldn't dedupe on non-aligned block boundaries. Every major step in
duperemove is threaded at this point too, which has also been an enormous
performance increase (and one which new features benefit from).

Thanks,
	--Mark

--
"When the going gets weird, the weird turn pro."
Hunter S. Thompson
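
P.S. To make the whole-file hash idea above a bit more concrete, here is a rough Python sketch. It is only an illustration and not duperemove's code: the table layout, the column names, SHA-256 and the 128 KiB block size are all assumptions picked for the example. It folds per-block digests into one whole-file digest, stores that digest next to the file's mtime and size, and skips rehashing on a rescan when neither has changed. Note that a folded digest is only comparable between files hashed with the same block size.

```python
import hashlib
import os
import sqlite3

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def block_digests(path, block_size=BLOCK_SIZE):
    """Yield one SHA-256 digest per fixed-size block of the file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            yield hashlib.sha256(chunk).digest()


def whole_file_digest(path, block_size=BLOCK_SIZE):
    """Fold the per-block digests into a single digest for the whole file."""
    h = hashlib.sha256()
    for digest in block_digests(path, block_size):
        h.update(digest)
    return h.hexdigest()


def open_db(db_path=":memory:"):
    """Open the database with a hypothetical 'files' table."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER, digest TEXT)"
    )
    return db


def scan(db, path):
    """Return (digest, rehashed); rehash only if mtime or size changed."""
    st = os.stat(path)
    row = db.execute(
        "SELECT mtime_ns, size, digest FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == st.st_mtime_ns and row[1] == st.st_size:
        return row[2], False  # unchanged since last scan: reuse stored digest
    digest = whole_file_digest(path)
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, st.st_mtime_ns, st.st_size, digest),
    )
    db.commit()
    return digest, True


def duplicate_groups(db):
    """Group paths whose size and whole-file digest both match."""
    groups = {}
    rows = db.execute("SELECT path, size, digest FROM files ORDER BY path")
    for path, size, digest in rows:
        groups.setdefault((size, digest), []).append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling scan() on every file and then duplicate_groups() gives candidate lists for whole-file dedupe. A real tool would still want the kernel to compare the data before sharing extents, which the dedupe ioctl does anyway.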