As I understand it from Wikipedia, Double Metaphone improves on Metaphone in two areas: (1) better linguistic matching, and (2) it can output a secondary token for words like Schmidt.
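To make the second point concrete, here is a minimal sketch of the behaviour I would expect from a filter that supports the secondary code. Everything in it is hypothetical: the TwoCodeEncoder interface, the tokensFor helper and the stub lookup table are made up for illustration and are not the Commons Codec or Lucene API. The two code pairs in the stub are the ones given in the Wikipedia article. The inject flag here only means "also keep the original word as a token", which is a separate matter from emitting the secondary code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical two-code encoder, for illustration only.
// This is NOT the Commons Codec or Lucene interface.
interface TwoCodeEncoder {
    String primary(String word);    // null if the word cannot be encoded
    String secondary(String word);  // may equal the primary code, or be null
}

class SecondaryTokenSketch {
    // Stub lookup table standing in for a real Double Metaphone encoder.
    // Pairs are {primary, secondary} as listed in the Wikipedia article.
    static final Map<String, String[]> CODES = Map.of(
        "Smith", new String[] {"SM0", "XMT"},
        "Schmidt", new String[] {"XMT", "SMT"});

    static final TwoCodeEncoder STUB = new TwoCodeEncoder() {
        public String primary(String word) {
            String[] pair = CODES.get(word);
            return pair == null ? null : pair[0];
        }
        public String secondary(String word) {
            String[] pair = CODES.get(word);
            return pair == null ? null : pair[1];
        }
    };

    // Returns every token that would be emitted for one input word.
    // inject = true also keeps the original word; the secondary code is
    // added only when it exists and differs from the primary code.
    static List<String> tokensFor(String word, TwoCodeEncoder encoder, boolean inject) {
        List<String> tokens = new ArrayList<>();
        if (inject) {
            tokens.add(word);
        }
        String primary = encoder.primary(word);
        String secondary = encoder.secondary(word);
        if (primary != null) {
            tokens.add(primary);
        }
        if (secondary != null && !secondary.equals(primary)) {
            tokens.add(secondary);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokensFor("Schmidt", STUB, false)); // prints [XMT, SMT]
        System.out.println(tokensFor("Smith", STUB, true));    // prints [Smith, SM0, XMT]
    }
}
```

With both codes emitted, Smith and Schmidt share XMT and so would match each other. With only the primary code (SM0 versus XMT) they would not.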
From a quick look at the Apache Commons Codec implementation and the Lucene filter, it doesn't seem like that secondary token is supported. There is "save" code that depends on whether inject is true or false, but that's not the same thing, and it doesn't seem to have been extended. Either I'm reading it wrong, or it somehow produces a compound token in those cases?

Looking on the web, one author claims that only 10% of names need a second token anyway, so it's not a big deal, but it would still be good to know.

Thanks,
--
Mark Bennett / New Idea Engineering, Inc.
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
