On Tue, 2 Sep 2008 06:17:37 -0700 (PDT), Guillermo wrote: > Does anyone know whether this function uses edit distance? If not, > which algorithm is it using?
The following passage comes from difflib.py: SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people. HTH. -- Regards, Wojtek Walczak, http://tosh.pl/gminick/ -- http://mail.python.org/mailman/listinfo/python-list