> Vincent Lefevre wrote:
>> Ben Reser wrote:
>>>  It resets the compression algorithm every 1000 bytes and thus makes
>>>  blocks that can be saved between revisions of the file.
>> 
>>  Wouldn't this work only when data are appended to the file?
>> 
>>  If data are inserted or deleted, this would change the block
>>  boundaries. Instead of fixed-length blocks, I'd rather see
>>  boundaries based on the file contents.
> 
> That's true, the compression blocks are fixed.

No, that's not true.  I think the article Ben read was inaccurate.  The 
'--rsyncable' option doesn't reset the compression after a fixed number of 
bytes, but rather at every point where a rolling checksum of the last N bytes 
leading up to that point has a certain value.  It will resynchronize after an 
insertion or deletion.  The intervals between resets are irregular but 
deterministic.
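
To illustrate the idea, here is a minimal Python sketch.  It is not the 
checksum that gzip or pigz actually uses: the plain byte sum, the WINDOW and 
MODULUS values and the name 'reset_points' are all assumptions made up for 
this example.  It marks a reset wherever the sum of the last WINDOW bytes is a 
multiple of MODULUS, then compares the reset points before and after an 
insertion near the start of the data.

```python
import random

WINDOW = 32     # hypothetical: number of trailing bytes covered by the checksum
MODULUS = 512   # hypothetical: a reset happens when checksum % MODULUS == 0


def reset_points(data: bytes, window: int = WINDOW, modulus: int = MODULUS) -> list:
    """Return the offsets at which the compressor would be reset."""
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                    # newest byte enters the window
        if i >= window:
            rolling -= data[i - window]    # oldest byte leaves the window
        if i + 1 >= window and rolling % modulus == 0:
            points.append(i + 1)           # reset after this byte
    return points


# Made-up test data: random bytes, then the same bytes with an insertion.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
insert_at = 100
extra = b"some new bytes"
edited = original[:insert_at] + extra + original[insert_at:]
shift = len(extra)

before = reset_points(original)
after = reset_points(edited)

# Only reset points whose whole window lies past the insertion are comparable.
tail_before = [p + shift for p in before if p - WINDOW >= insert_at]
tail_after = [p for p in after if p - WINDOW >= insert_at + shift]
print(tail_before == tail_after)
```

Each reset decision depends only on the WINDOW bytes before it, so once the 
window has moved past the inserted bytes the same decisions are made again, 
just at shifted offsets.  With fixed-length blocks every boundary after the 
insertion would fall on different content.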

Here's an old but readable description and proof-of-concept: 
<http://svana.org/kleptog/rgzip.html>.

Here's an announcement of the implementation in pigz: 
<http://mail.zlib.net/pipermail/pigz-announce_zlib.net/2012-January/000003.html>.
It's described in more detail in a big comment near the beginning of 'pigz.c' 
in the source tarball available at <http://zlib.net/pigz/>.


Philip Martin wrote:
> Julian Foad <julianf...@btopenworld.com> writes:
> 
>>  Yes, a client-side plug-in -- either to Subversion or to OpenOffice --
>>  seems to me the best practical solution.
> 
> A server-side solution is difficult.  Suppose the client has some
> uncompressed content U which it compresses to C and sends to the server.
> The server can uncompress C to get U but unless the compression scheme
> has a canonical compressed form, with no other forms allowed, the server
> cannot avoid storing C because there is no guarantee that C can be
> reconstructed from U.

Yes, a server-side solution would have lots of problems including that one.  
Scalability is another -- keeping the server up to date with plug-ins for all 
(or most) of the compressed content types that the clients are using.
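
The lack of a canonical compressed form is easy to see with zlib.  Here is a 
small Python sketch using only the standard 'zlib' module; the sample content 
and the variable names are made up for illustration.  It compresses the same U 
at two different levels and checks that both results decompress to U even 
though the compressed bytes are not the same.

```python
import zlib

# Made-up sample content standing in for the uncompressed data U.
u = b"The quick brown fox jumps over the lazy dog. " * 200

c_fast = zlib.compress(u, 1)   # one valid compressed form of U
c_best = zlib.compress(u, 9)   # another valid compressed form of U

# Both forms decode to exactly the same U ...
same_content = zlib.decompress(c_fast) == u and zlib.decompress(c_best) == u
# ... but the compressed bytes themselves differ.
same_bytes = c_fast == c_best

print(same_content, same_bytes)
```

A server that kept only U would have to know the exact compressor and settings 
the client used in order to regenerate the client's C byte for byte.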

A client-side plug-in does not have those problems, at least not to the same 
extent.  It does have its own problems, though, including installation, 
configuration and portability issues.

- Julian
