On Wed, 2003-06-11 at 13:59, Martin Pool wrote:
> On 11 Jun 2003, Donovan Baarda <[EMAIL PROTECTED]> wrote:
>
> > The vcdiff standard is available as RFC3284, and Josh is listed as one
> > of the authors.
>
> Yes, I've just been reading that.
>
> I seem to remember that it was around as an Internet-Draft when I
> started, but it didn't seem clear that it would become standard so I
> didn't use it.

I'm not sure if this is the same one... I vaguely recall something like
this too, but I think it was an attempt to add delta support to http and
had the significant flaw of not supporting rsync's
"delta-from-signature". It may have come out of the early xdelta http
proxy project. IMHO rproxy's http extensions for delta support were
better because they were more general.

There was also another thing I saw which was a compact delta
representation spec that I think librsync uses (perhaps it was you who
had some discussion about it in the old librsync TODO?), and may have
influenced the vcdiff RFC, but AFAIK was never "official" in any way.

> > I also had some correspondence with Josh ages ago where he talked about
> > how self-referencing delta's can directly do compression of the miss
> > data without using things like zlib and by default gives you the
> > benefits of rsync's "context compression" without the overheads (rsync
> > runs a decompressor _and_ a compressor on the receiving end just to
> > regenerate the compressed "hit" context data).
>
> Something possibly similar is mentioned in tridge's thesis. I was
> talking to him a while ago and (iirc) he thought it would be good to
> try it again, since it does well with the large amounts of memory and
> CPU time that are available on modern machines.

I forget if I saw this in Tridge's thesis, but I definitely noticed that
librsync uses a modified zlib to make feeding data to the compressor and
throwing away the compressed output more efficient. I have implemented
this in pysync too, though I don't use a modified zlib... I just throw
the compressed output away.

The self referencing compression idea is neat but would be a...
challenge to implement. For it to be effective, the self-referenced
matches would need to be non-block aligned like xdelta, which tends to
suggest using xdelta to do the self-reference matches on top of rsync
for the block aligned remote matches.
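
For anyone who hasn't seen the "feed the hits to the compressor and throw
the output away" trick, here is a minimal sketch of it using Python's
stock zlib module. It is not pysync's or librsync's actual code: the
names sender_encode and receiver_decode and the (kind, data) token
layout are made up for illustration, and it assumes both ends use the
same compression level and flush after every block. It also shows why
the receiving end needs both a compressor and a decompressor.

```python
import random
import zlib


def sender_encode(tokens, level=6):
    """Compress every block, but only keep the output for misses.

    tokens is a list of (kind, data) pairs, kind being "hit" or "miss".
    Hit data still goes through the compressor so that it becomes
    context for later misses; its compressed output is thrown away.
    """
    comp = zlib.compressobj(level)
    wire = []
    for kind, data in tokens:
        out = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
        wire.append((kind, out if kind == "miss" else None))
    return wire


def receiver_decode(wire, hit_blocks, level=6):
    """Rebuild the data from the wire tokens plus locally held hit blocks."""
    comp = zlib.compressobj(level)    # mirrors the sender's compressor
    decomp = zlib.decompressobj()
    hits = iter(hit_blocks)
    result = []
    for kind, payload in wire:
        if kind == "hit":
            data = next(hits)
            # Regenerate the compressed bytes the sender discarded and
            # feed them to the decompressor so it sees a complete stream.
            regenerated = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
            decomp.decompress(regenerated)
        else:
            data = decomp.decompress(payload)
            # Keep the local compressor in step with the sender's.
            comp.compress(data)
            comp.flush(zlib.Z_SYNC_FLUSH)
        result.append(data)
    return b"".join(result)


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.getrandbits(8) for _ in range(4000))
    new = old[500:1500]  # "miss" data that happens to resemble the hit
    wire = sender_encode([("hit", old), ("miss", new)])
    assert receiver_decode(wire, [old]) == old + new
    print("miss with context:", len(wire[1][1]), "bytes")
    print("miss without context:", len(zlib.compress(new)), "bytes")
```

The cost is visible in receiver_decode: every hit block is compressed
and then decompressed again purely to keep the two streams in step,
which is the overhead a self-referencing delta would avoid.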
Fortunately xdelta and rsync have heaps in common, so implementing both
in one library would be easy (see pysync for an example). If I didn't
have paid work I would be prototyping it in pysync right now. If anyone
wanted to fund something like this I could make myself available :-)

> I strongly agree with what you said a while ago about code simplicity
> being more valuable than squeezing out every last bit.

Yeah, my big complaint about librsync at the moment is that it is messy.
Just cleaning up the code alone will be a big improvement. I would guess
that at least 30% of the code could be trimmed away, leaving a cleaner
and more extensible core, and because "messy" leads to "inefficient", it
would be faster too.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/