The Levenshtein distance is a very precise definition of dissimilarity between 
sequences.  It specifies the minimum number of single-element edits you would 
need to change one sequence into another.  You are right that it is fairly 
expensive to compute.

But you asked for an algorithm that would determine whether groups of strings 
are "sort of similar".  How imprecise can you be?  An analysis of the frequency 
of each individual character in a string might be good enough for you.
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to