[ https://issues.apache.org/jira/browse/SOLR-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812912#comment-13812912 ]
Thomas Champagne commented on SOLR-2982:
----------------------------------------

I have also noticed the poor performance of the Beider-Morse encoder, so I created issue CODEC-174 in the commons-codec project to improve it. So far I have created two patches that cut the encoding time in half. If you want a better Beider-Morse encoder, you can join us on CODEC-174 :)

> Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse
> Phonetic Matching (BMPM) option
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2982
>                 URL: https://issues.apache.org/jira/browse/SOLR-2982
>             Project: Solr
>          Issue Type: Improvement
>          Components: Rules, Schema and Analysis, search
>            Reporter: Brooke Schreier Ganz
>              Labels: codec, commons, commons-codec, language, names,
>                      phonetic, search, searching, soundalike
>             Fix For: 3.6, 4.0-ALPHA
>
>         Attachments: SOLR-2982.patch
>
>
> Apache Commons Codec released version 1.6 of their codec pack in November,
> 2011. Along with a few bug fixes, 1.6 contains a great new phonetic matching
> system called Beider-Morse Phonetic Matching (BMPM) that is far superior to
> the existing phonetic codecs, such as regular soundex, metaphone, caverphone,
> and so on. BMPM has actually been available for some time, but this is the
> first port of it to Java, and its first commit in the Apache ecosystem.
> For a lot more information, see here: http://stevemorse.org/phoneticinfo.htm
> and http://stevemorse.org/phonetics/bmpm.htm
> BMPM would be a fantastic "soundalike" tool to help search for personal names
> (or just surnames) in a Solr/Lucene index, much better than Levenshtein
> distance for this use case.