> Vincent Lefevre wrote:
>> Ben Reser wrote:
>>> It resets the compression algorithm every 1000 bytes and thus makes
>>> blocks that can be saved between revisions of the file.
>>
>> Wouldn't this work only when data are appended to the file?
>> If data are inserted or deleted, this would change the block
>> boundaries. Instead of fixed-length blocks, I'd rather see
>> boundaries based on the file contents.
>
> That's true, the compression blocks are fixed.

No, that's not true. I think the article Ben read was inaccurate. The
'--rsyncable' option doesn't reset the compression after a fixed number
of bytes, but rather at every point where a rolling checksum of the last
N bytes leading up to that point has a certain value. It will
resynchronize after an insertion or deletion. The intervals between
resets are irregular but deterministic.

Here's an old but readable description and proof-of-concept:
<http://svana.org/kleptog/rgzip.html>.

Here's an announcement of implementation in pigz:
<http://mail.zlib.net/pipermail/pigz-announce_zlib.net/2012-January/000003.html>.
It's described in more detail in a big comment near the beginning of
'pigz.c' in the source tarball available at <http://zlib.net/pigz/>.

Philip Martin wrote:
> Julian Foad <julianf...@btopenworld.com> writes:
>
>> Yes, a client-side plug-in -- either to Subversion or to OpenOffice --
>> seems to me the best practical solution.
>
> A server-side solution is difficult. Suppose the client has some
> uncompressed content U which it compresses to C and sends to the server.
> The server can uncompress C to get U but unless the compression scheme
> has a canonical compressed form, with no other forms allowed, the server
> cannot avoid storing C because there is no guarantee that C can be
> reconstructed from U.

Yes, a server-side solution would have lots of problems including that
one. Scalability is another -- keeping the server up to date with
plug-ins for all (or most) of the compressed content types that the
clients are using.

A client-side plug-in does not have those problems, at least not to the
same extent. It does have its own problems, though, including
installation & configuration & portability issues.

- Julian
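
P.S. To illustrate the rolling-checksum idea described above, here is a
minimal Python sketch. It is not the gzip or pigz code: the window size,
the modulus, the plain byte sum used as the checksum, and the names
reset_points and demo are all assumptions chosen for illustration. It
only shows that reset points chosen from the last N bytes line up again
after an insertion.

```python
import random

WINDOW = 32    # assumed window size N, for illustration only
MODULUS = 64   # assumed: reset when the window sum is a multiple of this


def reset_points(data, window=WINDOW, modulus=MODULUS):
    """Return the offsets at which the compressor would be reset.

    A reset happens right after byte i when the sum of the last
    `window` bytes ending at i is divisible by `modulus`.
    """
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                  # byte entering the window
        if i >= window:
            rolling -= data[i - window]  # byte leaving the window
        if i >= window - 1 and rolling % modulus == 0:
            points.append(i + 1)
    return points


def demo():
    """Insert bytes into random data and compare reset points."""
    rng = random.Random(0)
    old = bytes(rng.randrange(256) for _ in range(4096))
    cut, inserted = 1000, b"some inserted bytes"
    new = old[:cut] + inserted + old[cut:]
    shift = len(inserted)

    old_pts = reset_points(old)
    new_pts = reset_points(new)

    # Reset points whose window lies wholly after the edit, expressed
    # in the coordinates of the new file so they can be compared.
    old_tail = [p + shift for p in old_pts if p >= cut + WINDOW]
    new_tail = [p for p in new_pts if p >= cut + shift + WINDOW]
    return old_pts, new_pts, old_tail, new_tail


if __name__ == "__main__":
    old_pts, new_pts, old_tail, new_tail = demo()
    print("reset points before edit:", len(old_pts))
    print("reset points after edit: ", len(new_pts))
    print("tail boundaries line up: ", old_tail == new_tail)
```

If each span between reset points is then compressed independently, the
spans well before and well after the edit should produce the same
compressed bytes in both versions, which is what makes the output
friendly to rsync-style or delta storage.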