Re: Aproximative string matching
javuchi wrote: I'm searching for a library which makes aproximative string matching, for example, searching in a dictionary the word motorcycle, but returns similar strings like motorcicle. Is there such a library? agrep (aproximate grep) allows for a certain amount of errors and there exist Python bindings (http://www.bio.cam.ac.uk/~mw263/pyagrep.html) Or google for agrep python. Daniel -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
Tim Roberts wrote: I'm searching for a library which makes aproximative string matching, for example, searching in a dictionary the word motorcycle, but returns similar strings like motorcicle. Is there such a library? There is an algorithm called Soundex that replaces each word by a 4-character string, such that all words that are pronounced similarly encode to the same string. The algorithm is easy to implement; you can probably find one by Googling. Python used to ship with a soundex module, but it was removed in 1.6, for various reasons. here's a replacement: http://orca.mojam.com/~skip/python/soundex.py /F -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
javuchi [EMAIL PROTECTED] writes: I'm searching for a library which makes aproximative string matching, for example, searching in a dictionary the word motorcycle, but returns similar strings like motorcicle. Is there such a library? I kind of like the one at http://www.personal.psu.edu/staff/i/u/iua1/python/apse/ which you might use something like import Apse ap = Apse.Approx('motorcycle', edit=1) ap.match(['motorcycle', 'motorcicle', 'motorscooter']) ['motorcycle', 'motorcicle'] That page mentions several alternatives, as well. I hope this helps, Tim -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
Steven D'Aprano wrote: [EMAIL PROTECTED] wrote: This algorithm is called soundex. Here is one implementation example. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213 here is another: http://effbot.org/librarybook/soundex.htm Soundex is *one* particular algorithm for approximate string matching. It is optimised for matching Anglo-American names (like Smith/Smythe), and is considered to be quite old and obsolete for all but the most trivial applications -- or so I'm told. Soundex will not match arbitrary changes -- it will match both cat and cet, but it won't match cat and mat. A more sophisticated approximate string matching algorithm will use the Levenshtein distance. You can find a Useless implementation here: http://www.uselesspython.com/download.php?script_id=108 Given a function levenshtein(s1, s2) that returns the distance between two strings, you could use it for approximate matching like this: def approx_matching(strlist, target, dist=1): Matches approximately strings in strlist to a target string. Returns a list of strings, where each string matched is no further than an edit distance of dist from the target. found = [] for s in strlist: if levenshtein(s, target) = dist: found.append(s) return s I compute a relative metric based on th every same implementation like this: def relative(a, b): Computes a relative distance between two strings. Its in the range (0-1] where 1 means total equality. @type a: string @param a: arg one @type b: string @param b: arg two @rtype: float @return: the distance d = levensthein(a,b) longer = float(max((len(a), len(b shorter = float(min((len(a), len(b r = ((longer - d) / longer) * (shorter / longer) return r The idea is that otherwise e.g. cat and hippopothamus have a l-distance of only 3, which one would consider good at the first look. Regards, Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
On Mon, 21 Nov 2005 19:47:45 +0100, Diez B. Roggisch wrote: The idea is that otherwise e.g. cat and hippopothamus have a l-distance of only 3, which one would consider good at the first look. ??? I make it that the L-distance between cat and hippopothamus is twelve, not three. With len(cat)=3 and len(hippopothamus) = 13, you need ten insertions just to get the lengths equal, so the L-distance must be at least ten. -- Steven. -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
javuchi wrote: I'm searching for a library which makes aproximative string matching, for example, searching in a dictionary the word motorcycle, but returns similar strings like motorcicle. Is there such a library? Perhaps the get_close_matches function that is presentt in the standard library (in the difflib module) could be useful ? --Irmen -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
This algorithm is called soundex. Here is one implementation example. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213 here is another: http://effbot.org/librarybook/soundex.htm -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
[EMAIL PROTECTED] wrote: This algorithm is called soundex. Here is one implementation example. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213 here is another: http://effbot.org/librarybook/soundex.htm Soundex is *one* particular algorithm for approximate string matching. It is optimised for matching Anglo-American names (like Smith/Smythe), and is considered to be quite old and obsolete for all but the most trivial applications -- or so I'm told. Soundex will not match arbitrary changes -- it will match both cat and cet, but it won't match cat and mat. A more sophisticated approximate string matching algorithm will use the Levenshtein distance. You can find a Useless implementation here: http://www.uselesspython.com/download.php?script_id=108 Given a function levenshtein(s1, s2) that returns the distance between two strings, you could use it for approximate matching like this: def approx_matching(strlist, target, dist=1): Matches approximately strings in strlist to a target string. Returns a list of strings, where each string matched is no further than an edit distance of dist from the target. found = [] for s in strlist: if levenshtein(s, target) = dist: found.append(s) return s -- Steven. -- http://mail.python.org/mailman/listinfo/python-list
Re: Aproximative string matching
javuchi [EMAIL PROTECTED] wrote: I'm searching for a library which makes aproximative string matching, for example, searching in a dictionary the word motorcycle, but returns similar strings like motorcicle. Is there such a library? There is an algorithm called Soundex that replaces each word by a 4-character string, such that all words that are pronounced similarly encode to the same string. The algorithm is easy to implement; you can probably find one by Googling. -- - Tim Roberts, [EMAIL PROTECTED] Providenza Boekelheide, Inc. -- http://mail.python.org/mailman/listinfo/python-list