William Park wrote:
> How do you compare 2 strings, and determine how much they are "close" to
> each other? Eg.
>     aqwerty
>     qwertyb
> are similar to each other, except for first/last char. But, how do I
> quantify that?
>
> I guess you can say for the above 2 strings that
>     - at max, 6 chars out of 7 are same sequence --> 85% max
>
> But, for
>     qawerty
>     qwerbty
> max correlation is
>     - 3 chars out of 7 are the same sequence --> 42% max
>
> (Crossposted to 3 of my favourite newsgroup.)

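The measure described in the quoted message (length of the longest shared run of characters, divided by the string length) can be sketched with the standard library's difflib module. The function name longest_run_ratio is made up for this illustration, and dividing by the longer of the two lengths is an assumption, since the examples above only use strings of equal length.

```python
from difflib import SequenceMatcher


def longest_run_ratio(a, b):
    """Longest common contiguous run, as a fraction of the longer string.

    Hypothetical helper for illustration; normalising by the longer
    string is an assumption, not something stated in the question.
    """
    if not a or not b:
        return 0.0
    # autojunk=False turns off difflib's "popular element" heuristic,
    # which only matters for long inputs but keeps the result exact.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


print(longest_run_ratio("aqwerty", "qwertyb"))  # 6 of 7 chars, about 0.857
print(longest_run_ratio("qawerty", "qwerbty"))  # 3 of 7 chars, about 0.429
```

SequenceMatcher also has a ratio() method, which counts every matching block rather than only the longest one, so it will generally give a different (higher or equal) score for the same pair.
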
Hi,

If you want to use phonetic comparison, here are some algorithms that are reportedly more effective than Soundex:

Double-Metaphone
NYSIIS
Phonex

Of course, phonetic algorithms have a lot of disadvantages, the main one being that they know only one way to pronounce words (usually a rather rigid, Anglo-Saxon way), which may not be the right way (hence the examples given before for Gaelic surnames). But these are far "better" than Soundex.

Regards,
Nicolas Lehuen
-- 
http://mail.python.org/mailman/listinfo/python-list
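
For reference, here is a minimal sketch of classic American Soundex, the baseline the algorithms above are compared against. It is written from the commonly published rules, not taken from any particular library, and the function name soundex is just a label for this example. It only handles ASCII letters, which is itself a hint of the "one way to pronounce" problem.

```python
# Consonant groups used by American Soundex; vowels, H, W and Y get no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex key of name (ASCII letters only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]               # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                   # a vowel resets prev, allowing repeats
    return (result + "000")[:4]       # pad with zeros, cut to 4 characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Knight"), soundex("Night"))   # K523 N230
```

The second pair shows the rigidity: two names that sound the same in English get different keys because the first letter is always kept verbatim.
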