On May 3, 11:03 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Sounds like a useful technique.  The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ascii, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encodings every time, so that you don't have to write
> > the encoding dictionary as part of every small payload.
>
> Zlib stays adaptive, the idea is just to start with some ready-made
> compression state that reflects the statistics of your data.
>
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points.  Looking at the docs,
> Compress.flush(Z_SYNC_FLUSH) is the important one.  I did something like
> this before and it was around 20 lines of code.  I don't have it around
> any more but maybe I can write something else like it sometime.
>
> > Is there a name to describe this technique?
>
> Incremental compression maybe?
Many thanks, this is getting me on the right path:

    import zlib

    compressor = zlib.compressobj()
    s = compressor.compress("foobar")
    s += compressor.flush(zlib.Z_SYNC_FLUSH)

    s_start = s
    compressor2 = compressor.copy()

    s += compressor.compress("baz")
    s += compressor.flush(zlib.Z_FINISH)
    print zlib.decompress(s)

    s = s_start
    s += compressor2.compress("spam")
    s += compressor2.flush(zlib.Z_FINISH)
    print zlib.decompress(s)

--
http://mail.python.org/mailman/listinfo/python-list
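[Editor's note: the snippet above decompresses the whole concatenated stream with zlib.decompress(). To realize the thread's goal of not shipping the primer with every small payload, the decompressor can be walked through the primer bytes once and then fed only the per-payload continuation. A minimal sketch of that roundtrip, in Python 3 terms; the primer text and function names here are illustrative, not from the thread:]

    import zlib

    # Hypothetical primer standing in for "a representative run of data".
    PRIMER = b"this is some representative english text " * 4

    # Build one compressor whose state already reflects the primer's
    # statistics; Z_SYNC_FLUSH ends the primer on a byte boundary.
    _seed = zlib.compressobj()
    _primer_blob = _seed.compress(PRIMER) + _seed.flush(zlib.Z_SYNC_FLUSH)

    def compress_snippet(data):
        # copy() clones the primed state, so each payload starts "warm"
        # and only the continuation bytes need to be transmitted.
        c = _seed.copy()
        return c.compress(data) + c.flush(zlib.Z_FINISH)

    def decompress_snippet(blob):
        # The receiver primes its decompressor with the same stored
        # primer blob, discards that output, then decodes the payload.
        d = zlib.decompressobj()
        d.decompress(_primer_blob)
        return d.decompress(blob) + d.flush()

    msg = b"some new, but representative, english text"
    assert decompress_snippet(compress_snippet(msg)) == msg

[Python 3.3+ also exposes this idea directly as the `zdict` preset-dictionary argument to zlib.compressobj() and zlib.decompressobj().]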