Could hit a few snags. Quick out-of-the-library compression using standards like zlib adds fixed headers and checksums that will dilute the difference on short strings. On long strings, the compressors will not pick up similarities that lie too far apart: zlib's deflate can only look back 32 KB, and bzip2 compresses independent blocks of 100-900 KB (900 KB at the default level 9), so similarities that land in different blocks are missed. That means this could work well for strings between, oh, say, 1k and 30k or so with zlib, and longer ones with bzip2.
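The thread doesn't spell out the measure. A common one built on compressed sizes is the normalized compression distance (NCD), and this sketch assumes that's roughly what the OP has in mind. It uses only the stdlib zlib and bz2 modules. `ncd`, `overhead` and `random_bytes` are hypothetical helper names for illustration. It exercises both snags: fixed overhead on short inputs, and zlib's 32 KB look-back on long ones.

```python
import bz2
import random
import zlib
from typing import Callable

Compressor = Callable[[bytes], bytes]


def ncd(x: bytes, y: bytes, compress: Compressor = zlib.compress) -> float:
    """Normalized compression distance: near 0 for near-identical inputs, near 1 for unrelated ones."""
    cx = len(compress(x))
    cy = len(compress(y))
    # Shared content lets x + y compress better than x and y do separately,
    # but only if the compressor can "see" x while it is encoding y.
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def overhead(compress: Compressor) -> int:
    """Bytes the format spends on an empty input (headers, trailer, checksum)."""
    return len(compress(b""))


def random_bytes(n: int, seed: int) -> bytes:
    """Deterministic incompressible test data (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.getrandbits(8 * n).to_bytes(n, "little")


if __name__ == "__main__":
    print("zlib overhead:", overhead(zlib.compress), "bytes")
    print("bz2 overhead: ", overhead(bz2.compress), "bytes")

    # Short strings: the fixed overhead is a large share of every size,
    # so even identical inputs don't score close to 0.
    print("short, identical:", ncd(b"hello world", b"hello world"))

    # 10 KB repeated: the second copy is within zlib's 32 KB window.
    small = random_bytes(10_000, seed=1)
    print("10 KB, identical, zlib:", ncd(small, small))

    # 40 KB repeated: the first copy is out of zlib's reach...
    big = random_bytes(40_000, seed=2)
    print("40 KB, identical, zlib:", ncd(big, big))
    # ...but both copies still fit in one 900 KB bzip2 block.
    print("40 KB, identical, bz2: ", ncd(big, big, bz2.compress))
```

Random data is used so that any drop in the combined size can only come from the repeated copy, not from redundancy inside each string.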
But I need to underscore Aahz's posting above: ***Check out difflib, it's in the library.*** Perfect package for what the OP wants AFAICT.
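For the difflib route, a minimal sketch, assuming the OP wants a 0-1 similarity score between two strings. `similarity` is a hypothetical wrapper; `SequenceMatcher`, `ratio()` and `get_close_matches` are the real difflib API.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]: 2*M/T, where M = matched characters and T = total characters."""
    # autojunk=False: for sequences of 200+ items the default heuristic treats
    # very frequent characters (e.g. spaces in long text) as junk, which can
    # skew the score on long strings.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    print(similarity("abcd", "bcde"))  # 0.75: "bcd" matches, 2*3/8
    # Pick the closest candidates from a list (cutoff defaults to 0.6).
    print(difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))
```

No header overhead and no window limit to worry about here. If many pairs need filtering, `quick_ratio()` and `real_quick_ratio()` give cheaper upper bounds on `ratio()`.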