Gregory Piñero wrote: > Ok, ok, I got it! The Pythonic way is to use an existing library ;-) > > import difflib > CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7) > > I wrote a script to delete duplicate mp3's by filename a few years > back with this. If anyone's interested in seeing it, I'll post a blog > entry on it. I'm betting it uses a similiar algorithm your functions.
A quick trip to difflib.py produces this description of the matching algorithm: The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. So no, it doesn't seem to be using Levenshtein distance. Kent -- http://mail.python.org/mailman/listinfo/python-list