[EMAIL PROTECTED] wrote:

> Essentially, they note that the NCD does not always behave like a
> metric and one reason they put forward is that this may be due to the
> size of the header portion (they were using the command line gzip and
> bzip2 programs) compared to the strings being compressed (which are on
> average 48 bytes long).

gzip datastreams have a real header, with a file type identifier, 
optional filenames, comments, and a bunch of flags.

but even if you strip that off (which is basically what happens if you 
use zlib.compress instead of gzip), I doubt you'll get representative 
"compressibility" metrics on strings that short.  like most other 
compression algorithms, gzip and bzip2 are designed for much larger 
datasets.
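
a rough sketch of how you could check this yourself, using the stdlib gzip and zlib modules instead of the command line tools.  the sample string and the ncd helper are made up for illustration (they are not taken from the paper), and the exact byte counts will depend on your zlib build:

```python
import gzip
import zlib


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized compression distance under the given compressor."""
    cx, cy = len(compress(x)), len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


# hypothetical 48-byte sample, matching the average length quoted above
sample = b"the quick brown fox jumps over the lazy dog, ok."

# same compression level for both, so only the framing differs
gz_len = len(gzip.compress(sample, compresslevel=9))
zl_len = len(zlib.compress(sample, 9))

print("input bytes:", len(sample))
print("gzip bytes: ", gz_len)
print("zlib bytes: ", zl_len)

# a true metric would give 0 for identical inputs
print("ncd(s, s) with zlib:", ncd(sample, sample))
print("ncd(s, s) with gzip:", ncd(sample, sample, gzip.compress))
```

on a string this short, the framing bytes are a sizeable fraction of the output, and ncd(sample, sample) need not come out as 0.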

</F>
