On Fri, 20 May 2005 01:47:15 +1000, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
>On Thu, 19 May 2005 14:09:32 +1000, John Machin wrote:
>
>> None of the other approaches make the mistake of preserving the first
>> letter -- this alone is almost enough reason for jettisoning soundex.
>
>Off-topic now, but you've made me curious.
>
>Why is this a bad idea?
>
>How would you handle the case of "barow" and "marow"? (Barrow and
>marrow, naturally.) Without the first letter, they sound identical. Why is
>throwing that information away a good thing?

Sorry if that was unclear. By "preserving the first letter", I meant
that in "standard" soundex, the first letter is not transformed into a
digit.

Karen -> K650
Kieran -> K650
(R->6, N->5; vowels->0 and then are squeezed out)

Now compare this:
Aaron -> A650
Erin -> E650

Bearing in mind that the usual application of soundex is "all or
nothing", the result is Karen == Kieran, but Aaron !== Erin, which is
at the very least extremely inconsistent.

A better phonetic-key creator would produce the same result for each
of the first pair, and for each of the second pair -- e.g. KARAN and
ARAN respectively.

Also consider Catherine vs Katherine.

Cheers,
John
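
The inconsistency described above can be reproduced with a minimal Python sketch of "standard" soundex. The `keep_first_letter` flag is a hypothetical addition for illustration only: with it switched off, the first letter is coded to a digit like every other letter. That is just one possible way of not preserving the first letter, and it is not the KARAN/ARAN style of key suggested in the post.

```python
# Standard soundex digit classes; vowels, H, W and Y carry no code.
CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(name, keep_first_letter=True):
    """Return a 4-character soundex-style key for name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    digits = []
    prev = ""
    for ch in letters:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous letter's code.
        if code and code != prev:
            digits.append(code)
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    if keep_first_letter:
        # Standard behaviour: the first letter stays a letter,
        # so its own digit (if it has one) is dropped.
        if letters[0] in CODES:
            digits = digits[1:]
        return (letters[0] + "".join(digits) + "000")[:4]
    # Hypothetical variant: the first letter is coded like any other.
    return ("".join(digits) + "0000")[:4]


for a, b in (("Karen", "Kieran"), ("Aaron", "Erin"), ("Catherine", "Katherine")):
    print(a, soundex(a), b, soundex(b), "|", soundex(a, False), soundex(b, False))
```

With the flag off, each of the three pairs collapses to a single key, while "barow" and "marow" still differ (1600 vs 5600), because B and M fall into different digit classes.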