Could hit a few snags.  Quick off-the-shelf compression using
standards like zlib adds a header and checksum that will dilute the
difference on short strings, and on long strings it will not pick up
similarities that sit too far apart:  zlib's deflate only looks back
32k, and bzip2 works in independent blocks of 100k-900k (900k by
default).  So with zlib this could work well for strings between oh
say 1k and 30k or so.
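
To make both snags concrete, here is a rough sketch.  It assumes the idea
under discussion is to compare the compressed size of two strings joined
together against the compressed size of each alone; the helper names
(csize, compression_distance) and the formula are my own illustration,
not anything from the thread or the library.

```python
import os
import zlib


def csize(data: bytes) -> int:
    """Size of data after zlib compression, header and checksum included."""
    return len(zlib.compress(data))


def compression_distance(a: bytes, b: bytes) -> float:
    """Hypothetical measure: near 0 for similar inputs, near 1 for unrelated.

    Compares the compressed size of a+b with the sizes of a and b alone.
    """
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Snag 1: fixed overhead.  Even empty input costs a few bytes, which
# swamps the signal on very short strings.
overhead = csize(b"")
print("zlib overhead in bytes:", overhead)

# Mid-sized, similar strings: the second copy is found in the window.
text = b"the quick brown fox jumps over the lazy dog " * 20
print("text vs itself:", compression_distance(text, text))

# Snag 2: long strings.  The second copy starts 100k after the first,
# beyond deflate's 32k look-back, so the repeat is not detected.
big = os.urandom(100_000)
print("100k blob vs itself:", compression_distance(big, big))
```

The last figure should come out close to 1 even though the two inputs
are identical, which is the long-string problem in a nutshell.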

But I need to underscore Aahz's posting above:

***Check out difflib, it's in the library.***  Perfect package for what
the OP wants AFAICT.
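
For the record, a minimal sketch of what that looks like, assuming the OP
just wants a similarity score between two strings (the wrapper name
similarity is mine; SequenceMatcher and get_close_matches are the real
difflib calls):

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


print(similarity("abcd", "bcde"))

# Pick the closest candidates to a word from a list of possibilities.
print(difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))
```

No headers and no window limits to worry about, and it behaves sensibly
on short strings.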

-- 
http://mail.python.org/mailman/listinfo/python-list
