greg <[EMAIL PROTECTED]> wrote:
>  Nick Craig-Wood wrote:
> > (Note that basic pickle protocol is likely to be more compressible
> > than the binary version!)
> 
>  Although the binary version may be more compact to
>  start with. It would be interesting to compare the
>  two and see which one wins.

It is very data-dependent of course, but in this case the binary
version wins: about 525 kB compressed, against about 1055 kB for the
text pickle.

However there is exactly the same amount of information in the text
pickle and the binary pickle, so in theory a perfect compressor will
compress each to exactly the same size ;-)

>>> import os
>>> import bz2
>>> import pickle
>>> L = range(1000000)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> os.path.getsize("z.dat")
1055197L
>>> f = bz2.BZ2File("z1.dat", "wb")
>>> pickle.dump(L, f, -1)
>>> f.close()
>>> os.path.getsize("z1.dat")
524741L
>>>
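
For anyone repeating the comparison on Python 3, here is a minimal
sketch of the same experiment done in memory. It is not the session
above: the helper name compressed_pickle_sizes and the smaller list are
my own assumptions, and the sizes you get will differ from the
Python 2 figures.

```python
import bz2
import pickle


def compressed_pickle_sizes(obj):
    """Return (text_size, binary_size) in bytes after bz2 compression.

    Hypothetical helper for illustration: protocol 0 is the original
    text pickle, HIGHEST_PROTOCOL is the newest binary one.
    """
    text = bz2.compress(pickle.dumps(obj, protocol=0))
    binary = bz2.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    return len(text), len(binary)


# Smaller than the list in the session above, to keep the run short.
data = list(range(100000))
text_size, binary_size = compressed_pickle_sizes(data)
print("text pickle + bz2:  ", text_size, "bytes")
print("binary pickle + bz2:", binary_size, "bytes")
```

Which one comes out smaller depends on the data, so it is worth
running this on a sample of whatever you really intend to store.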

One practical consideration is that bz2 is quite CPU-expensive.  It
also has quite a large fixed overhead per compressed stream, e.g.

>>> len("a".encode("bz2"))
37

So if you are compressing lots of small things, the zip (zlib) codec
is a better choice

>>> len("a".encode("zip"))
9

It is also much faster!
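
The "bz2" and "zip" string codecs used above are Python 2 only; on
Python 3 str.encode() rejects them. A rough equivalent, assuming you
call the bz2 and zlib modules directly, might look like the sketch
below. The helper name overhead, the 16-byte payload and the
repetition count are arbitrary choices of mine, and the timings will
vary from machine to machine.

```python
import bz2
import timeit
import zlib


def overhead(compress, payload=b"a"):
    """Bytes added by compressing a tiny payload (hypothetical helper)."""
    return len(compress(payload)) - len(payload)


bz2_overhead = overhead(bz2.compress)
zlib_overhead = overhead(zlib.compress)
print("bz2 overhead: ", bz2_overhead, "bytes")
print("zlib overhead:", zlib_overhead, "bytes")

# Rough speed comparison on a small input; the numbers are only indicative.
small = b"a" * 16
bz2_time = timeit.timeit(lambda: bz2.compress(small), number=1000)
zlib_time = timeit.timeit(lambda: zlib.compress(small), number=1000)
print("bz2:  %.4f s per 1000 calls" % bz2_time)
print("zlib: %.4f s per 1000 calls" % zlib_time)
```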

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
