William Park wrote:
> How do you compare 2 strings, and determine how "close" they are to
> each other?  E.g.
>     aqwerty
>     qwertyb
> are similar to each other, except for first/last char.  But, how do I
> quantify that?
>
> I guess you can say for the above 2 strings that
>     - at max, 6 chars out of 7 are same sequence --> 85% max
>
> But, for
>     qawerty
>     qwerbty
> max correlation is
>     - 3 chars out of 7 are the same sequence --> 42% max
>
> (Crossposted to 3 of my favourite newsgroups.)

Hi,

If you want to use phonetic comparison, here are some algorithms that
are reportedly more effective than Soundex:

Double-Metaphone
NYSIIS
Phonex
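
For reference, here is a rough sketch of classic Soundex, the baseline
the algorithms above are compared against. It is only an illustration
written for this message, not code taken from any library, and the
function name soundex() and the sample names are my own choices.

```python
# Letter-to-digit table for classic American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of an ASCII name."""
    # Keep only the letters A-Z; everything else is ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")  # its digit still blocks a repeat
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")     # vowels, H, W and Y have no digit
        if digit and digit != prev:
            code += digit
        if ch not in "HW":             # H and W do not break a run
            prev = digit
    return (code + "000")[:4]          # pad with zeros, cut to 4 chars


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Knight"), soundex("Night"))   # K523 N230
```

"Robert" and "Rupert" get the same code, but "Knight" and "Night" do
not, because Soundex always keeps the first letter even when it is
silent.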

Of course, phonetic algorithms have many disadvantages, the main one
being that they assume a single way of pronouncing words (usually a
rather rigid, Anglo-Saxon one), which may not be the right one (hence
the examples given earlier for Gaelic surnames). Still, these are far
"better" than Soundex.
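
If pronunciation does not matter and you only want the measure from
your question (longest run of identical consecutive characters, divided
by the string length), difflib in the standard library can find that
run. A minimal sketch follows. The helper name longest_run_ratio() is
made up for this message, and dividing by the longer of the two lengths
is my assumption, since both of your examples use strings of equal
length.

```python
from difflib import SequenceMatcher


def longest_run_ratio(a, b):
    """Length of the longest common contiguous run, divided by the
    length of the longer string (0.0 if either string is empty)."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # find_longest_match returns a Match(a, b, size) named tuple.
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


print(longest_run_ratio("aqwerty", "qwertyb"))  # 6 chars out of 7
print(longest_run_ratio("qawerty", "qwerbty"))  # 3 chars out of 7
```

SequenceMatcher also has a ratio() method, which counts all matching
blocks rather than only the longest one, so it gives a different number.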

Regards,

Nicolas Lehuen

-- 
http://mail.python.org/mailman/listinfo/python-list
