+1 for me. This sounds like a good idea. That's my 2 satoshis. :-)

On Sun, Nov 19, 2017 at 8:03 PM, Colin Percival <cperc...@tarsnap.com> wrote:
> On 11/19/17 12:37, Robie Basak wrote:
> > On Sat, Apr 08, 2017 at 07:52:54PM -0700, Colin Percival wrote:
> >> On 04/04/17 13:06, Robie Basak wrote:
> >>> Since the redundancy is there and my client has all the details,
> >>> is there any way I can take advantage of this?
> >>
> >> Not right now. This is something I've been thinking about implementing,
> >> but it's rather complicated (the tarsnap "read" path would need to look at
> >> data on disk to see what it can "reuse", and normally it doesn't read any
> >> files from disk).
> >
> > In case it helps others, I hacked together a client-side cache for this
> > one task. It appears to have worked. Patch below.
>
> Ah yes, I was thinking in terms of "notice that we're extracting the file
> 'foo' and there is already a file 'foo', then read that file in and split
> it into blocks in case any can be reused" -- the case you've covered here
> of keeping a cache of downloaded blocks is much simpler (but only covers
> the "multiple downloads of the same data" case, not the more general case
> of "synchronizing" a system with an archive).
>
> > This is absolutely a hack and not production ready (no concurrency, bad
> > error handling, hardcoded cache path whose directory must be created in
> > advance and permissions set manually, etc), but for a one-off task it
> > was enough for me to get my data out.
>
> [snip patch]
>
> Yes, this patch definitely looks like it does what you want. I'd consider
> including it (well, with details tidied up) but I'm not sure if anyone else
> would want to use this functionality... anyone else on the list interested?
>
> --
> Colin Percival
> Security Officer Emeritus, FreeBSD | The power to serve
> Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
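For anyone curious about the general shape of the idea: the hack described above is a content-addressed, client-side cache of downloaded blocks, so that a second extraction of the same data skips the network fetch. Below is a minimal illustrative sketch in Python, not the actual C patch; the names (`BlockCache`, the `download` callback, the simulated `store`) are all invented for the example.

```python
import hashlib
import os
import tempfile

class BlockCache:
    """Content-addressed cache: each block is stored in a file named
    after its SHA-256 hex digest (hypothetical; tarsnap's real block
    naming and crypto differ)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.misses = 0  # counts actual "network" fetches

    def fetch(self, block_hash, download):
        """Return the block's bytes, calling download() only on a miss."""
        path = os.path.join(self.cache_dir, block_hash)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        self.misses += 1
        data = download(block_hash)
        with open(path, "wb") as f:
            f.write(data)
        return data

# Simulated server-side block store, keyed by block hash.
store = {hashlib.sha256(b).hexdigest(): b for b in (b"block-one", b"block-two")}

with tempfile.TemporaryDirectory() as d:
    cache = BlockCache(d)
    # Two "extracts" of the same archive: 4 reads, but only 2 downloads.
    for h in list(store) * 2:
        assert cache.fetch(h, store.__getitem__) == store[h]
    print("network fetches:", cache.misses)  # prints "network fetches: 2"
```

As the thread notes, this only helps the "multiple downloads of the same data" case; it does nothing for synchronizing against files already on disk, which would require splitting local files into blocks on the read path.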