Re: Fuzzy Lookups

Kent Johnson Tue, 31 Jan 2006 12:56:53 -0800

Gregory Piñero wrote:
> Ok, ok, I got it!  The Pythonic way is to use an existing library ;-)
> 
> import difflib
> CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
> 
> I wrote a script to delete duplicate mp3's by filename a few years
> back with this.  If anyone's interested in seeing it, I'll post a blog
> entry on it.  I'm betting it uses a similar algorithm to your functions.


A quick trip to difflib.py produces this description of the matching 
algorithm:

   The basic algorithm predates, and is a little fancier than, an
   algorithm published in the late 1980's by Ratcliff and Obershelp
   under the hyperbolic name "gestalt pattern matching".  The basic
   idea is to find the longest contiguous matching subsequence that
   contains no "junk" elements (R-O doesn't address junk).  The same
   idea is then applied recursively to the pieces of the sequences to
   the left and to the right of the matching subsequence.

So no, it doesn't seem to be using Levenshtein distance.
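
To make the difference concrete, here is a minimal sketch of the recursion
described above, next to a plain Levenshtein distance.  The helper names
(matched_chars, gestalt_ratio, levenshtein) are made up for illustration and
are not part of difflib; the sketch ignores junk handling entirely and only
borrows SequenceMatcher.find_longest_match for the "longest contiguous
match" step.

```python
import difflib


def matched_chars(a, b):
    """Count matched characters by recursing left and right of the
    longest contiguous match (hypothetical helper, no junk handling)."""
    if not a or not b:
        return 0
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = sm.find_longest_match(0, len(a), 0, len(b))
    if m.size == 0:
        return 0
    # Apply the same idea to the pieces on either side of the match.
    left = matched_chars(a[:m.a], b[:m.b])
    right = matched_chars(a[m.a + m.size:], b[m.b + m.size:])
    return m.size + left + right


def gestalt_ratio(a, b):
    """Similarity in [0, 1]: twice the matched characters over total length."""
    if not a and not b:
        return 1.0
    return 2.0 * matched_chars(a, b) / (len(a) + len(b))


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), for comparison."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a, b = "abcd", "bcde"
    print(gestalt_ratio(a, b))                           # 0.75
    print(difflib.SequenceMatcher(None, a, b).ratio())   # 0.75
    print(levenshtein(a, b))                             # 2
```

The first two numbers agree because the sketch follows the same recursion as
SequenceMatcher, at least for short inputs with no junk.  The third is a
count of edits rather than a similarity score, so the two measures need not
rank candidates in the same order.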

Kent
-- 
http://mail.python.org/mailman/listinfo/python-list
