As I understand it from Wikipedia, Double Metaphone improves on Metaphone in
two areas:
1: Better linguistic matching
2: It can output a secondary token for words like Schmidt

From a quick look at Apache Commons Codec and the Lucene filter, it doesn't
seem like that secondary token is supported.  There is "save" code that
depends on whether inject is true or false, but that's not the same thing,
and it doesn't seem to have been extended to handle a second token.

Am I reading it wrong?  Or does it somehow produce a compound token in
those cases?
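
To make concrete what I was expecting, here is a rough sketch of a filter
step that emits the secondary code as an extra token when it differs from
the primary.  Everything in it is made up for illustration: TwoCodeEncoder,
TOY and the "P_"/"A_" codes are placeholders, not the Commons Codec or
Lucene API, and not real Double Metaphone output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SecondaryTokenSketch {

    /** Hypothetical encoder that can return two codes per word. */
    interface TwoCodeEncoder {
        String primary(String word);
        String alternate(String word);
    }

    /** Toy encoder with placeholder codes; only "sch..." words get a distinct alternate. */
    static final TwoCodeEncoder TOY = new TwoCodeEncoder() {
        public String primary(String w) {
            return "P_" + w.toUpperCase(Locale.ROOT);
        }
        public String alternate(String w) {
            return w.startsWith("sch") ? "A_" + w.toUpperCase(Locale.ROOT) : primary(w);
        }
    };

    /**
     * Returns the token texts to index for one input word.
     * inject=true keeps the original word alongside the codes.
     * In a real filter the extra tokens would presumably be stacked
     * at the same position (position increment 0).
     */
    static List<String> encode(String word, TwoCodeEncoder enc, boolean inject) {
        List<String> out = new ArrayList<String>();
        if (inject) {
            out.add(word);
        }
        String primary = enc.primary(word);
        String alternate = enc.alternate(word);
        if (primary != null && !primary.isEmpty()) {
            out.add(primary);
        }
        // Emit the secondary code only when it adds something new.
        if (alternate != null && !alternate.isEmpty() && !alternate.equals(primary)) {
            out.add(alternate);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode("schmidt", TOY, true));
        System.out.println(encode("bennett", TOY, false));
    }
}
```

With something like this, a word with two codes yields one more token than
a word with one, whether or not inject is on.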

Looking on the web, one author claims that only 10% of names need a second
token anyway, so it's not a big deal, but it would still be good to know.

Thanks

--
Mark Bennett / New Idea Engineering, Inc. / [email protected]
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
