On May 3, 11:03 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Sounds like a useful technique.  The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ascii, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encodings every time, so that you don't have to write
> > the encoding dictionary as part of every small payload.
>
> Zlib stays adaptive, the idea is just to start with some ready-made
> compression state that reflects the statistics of your data.
>
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points.  Looking at the docs,
> Compress.flush(Z_SYNC_FLUSH) is the important one.  I did something like
> this before and it was around 20 lines of code.  I don't have it around
> any more but maybe I can write something else like it sometime.
>
> > Is there a name to describe this technique?
>
> Incremental compression maybe?
Many thanks, this is getting me on the right path:

    import zlib

    compressor = zlib.compressobj()
    s = compressor.compress("foobar")
    s += compressor.flush(zlib.Z_SYNC_FLUSH)

    s_start = s
    compressor2 = compressor.copy()

    s += compressor.compress("baz")
    s += compressor.flush(zlib.Z_FINISH)
    print zlib.decompress(s)

    s = s_start
    s += compressor2.compress("spam")
    s += compressor2.flush(zlib.Z_FINISH)
    print zlib.decompress(s)

--
http://mail.python.org/mailman/listinfo/python-list
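[Editor's note: the snippet above decompresses the whole concatenated stream with zlib.decompress(). To realize the thread's goal of not shipping the primer with every small payload, the decompressor can be walked through the primer bytes once and then fed only the per-payload continuation. A minimal sketch of that roundtrip, in Python 3 terms; the primer text and function names here are illustrative, not from the thread:]

    import zlib

    # Hypothetical primer standing in for "a representative run of data".
    PRIMER = b"this is some representative english text " * 4

    # Build one compressor whose state already reflects the primer's
    # statistics; Z_SYNC_FLUSH ends the primer on a byte boundary.
    _seed = zlib.compressobj()
    _primer_blob = _seed.compress(PRIMER) + _seed.flush(zlib.Z_SYNC_FLUSH)

    def compress_snippet(data):
        # copy() clones the primed state, so each payload starts "warm"
        # and only the continuation bytes need to be transmitted.
        c = _seed.copy()
        return c.compress(data) + c.flush(zlib.Z_FINISH)

    def decompress_snippet(blob):
        # The receiver primes its decompressor with the same stored
        # primer blob, discards that output, then decodes the payload.
        d = zlib.decompressobj()
        d.decompress(_primer_blob)
        return d.decompress(blob) + d.flush()

    msg = b"some new, but representative, english text"
    assert decompress_snippet(compress_snippet(msg)) == msg

[Python 3.3+ also exposes this idea directly as the `zdict` preset-dictionary argument to zlib.compressobj() and zlib.decompressobj().]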