On Wed, May 11, 2016 at 07:36:59PM +0200, David Sterba wrote:
> I like the in-memory dedup backend. It's lightweight, only a heuristic,
> does not need any IO or persistent storage. OTOH I consider it a subpart
> of the in-band deduplication that does all the persistency etc. So I
> treat the ioctl interface from a broader aspect.
> 
> A usecase I find interesting is to keep the in-memory dedup cache and
> then flush it to disk on demand, compared to automatically synced dedup
> (eg. at commit time).

The tradeoff depends on a lot of parameters, like your expected dup rate,
memory size, and seek latency.  If the dup rate is high (say 40%) and
your seek latency is high (low-cost spinning rust) and you don't have
enough RAM to load the whole hash table into memory, then an on-disk
dedup cache _itself_ creates an unusable I/O load.  Hash table lookups
generate random I/O, and at a 40% dup rate roughly every other block you
write requires a performance-crippling disk seek to read the part of the
cache that isn't in RAM.
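
A rough back-of-envelope model of that load, as a sketch only.  The
assumptions are mine for illustration: each written block triggers one
hash lookup, lookups land uniformly at random in the table, a lookup
that misses RAM costs one full seek, and nothing else competes for the
disk.  The function name and parameters are hypothetical, not part of
any dedup implementation.

```python
def max_write_rate(block_size, seek_seconds, resident_fraction,
                   lookups_per_block=1.0):
    """Upper bound on write throughput (bytes/s) when hash lookups that
    miss the in-RAM part of the table each cost one disk seek.

    Assumes uniform random lookups, so the miss probability is simply
    the fraction of the table that is not resident in RAM.
    """
    misses_per_block = lookups_per_block * (1.0 - resident_fraction)
    if misses_per_block == 0:
        # Whole table in RAM: lookups add no seeks in this model.
        return float("inf")
    seek_time_per_block = misses_per_block * seek_seconds
    return block_size / seek_time_per_block


if __name__ == "__main__":
    # 4 KiB blocks, 10 ms seeks, half of the hash table resident in RAM.
    rate = max_write_rate(4096, 0.010, 0.5)
    print("%.0f KiB/s" % (rate / 1024))
```

With 4 KiB blocks, 10 ms seeks, and half the table in RAM, this model
caps writes at about 800 KiB/s, which is the kind of number I mean by
"unusable".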

I looked at my parameters and concluded that an in-memory cache (with
persistence by saving the data at regular intervals) was the *only*
kind of cache I'd ever be able to use with any dedup implementation.

If the dup rate is lower or you're using an SSD then you might trade some
IOs for more free RAM and consider an on-disk hash table with some sort of
paging scheme.  If you have huge amounts of RAM you don't need an on-disk
scheme at all--you can add persistence e.g. by trickle-writing the data
over the space of an hour to avoid adding a lot of latency or memory
pressure at once.
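
A minimal sketch of the trickle-writing idea, under assumptions of my
own: the table is already serialized as a list of byte strings, it is
not modified while the dump runs, and writing to a temporary file and
renaming it is an acceptable way to publish the result.  The name
trickle_write and its parameters are hypothetical.

```python
import os
import time


def trickle_write(records, path, total_seconds, chunk_count,
                  sleep=time.sleep):
    """Write `records` (a list of bytes objects) to `path` in up to
    `chunk_count` slices, pausing after each slice so the dump is spread
    over at most about `total_seconds` instead of hitting the disk at once.
    """
    # Ceiling division, so at most chunk_count slices are written.
    chunk_len = max(1, -(-len(records) // chunk_count))
    pause = total_seconds / chunk_count
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as out:
        for start in range(0, len(records), chunk_len):
            out.write(b"".join(records[start:start + chunk_len]))
            out.flush()
            sleep(pause)  # spread the I/O out over time
        os.fsync(out.fileno())
    # Replace the old dump only once the new one is complete on disk.
    os.replace(tmp_path, path)
```

Calling trickle_write(records, "dedup-cache.bin", 3600, 360) would
spread the dump over roughly an hour in 10-second steps.  A real
implementation would also have to cope with the table changing during
the dump, which this sketch ignores.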

> > Users can get better dedupe via the ioctl today than with what
> > you propose go in as an experimental feature so I don't see many people
> > caring to test it. IMHO you would have to provide a more compelling reason
> > to include this code.
> 
> I see it as a complementary feature in the deduplication capabilities,
> covering more usecases.

If you have unlimited amounts of RAM, fast CPU, and slow disks, then it
certainly makes sense, even with the SHA256 hash.  That seems to be the
use case ZFS was designed for.
