[ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238160#comment-13238160 ]
Björn commented on LUCENENET-466:
---------------------------------

Hello,

maybe it's a good idea to combine the DIN1 and the DIN2 algorithms. At the moment the DIN2 stemmer "destroys" the root of the word:

Haus => Haus
Häuser => Haeuser
Haeuser => Haeuser

DIN1 means: ä = a
DIN2 means: ä = ae

So we could implicitly say: ä = ae = a. This corrects the "root" problem:

Haus => Haus
Häuser => Hauser
Haeuser => Hauser

Greetings
Björn

> optimisation for the GermanStemmer.vb
> --------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>
> I have a little optimisation for the GermanStemmer.vb (in
> Contrib.Analyzers) class. At the moment the function "Substitute"
> converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This
> is not the correct German transliteration. They must be converted to "ae",
> "oe" and "ue". So I can write the name "Björn" or "Bjoern" but not
> "Bjorn". With this optimisation a user can search for "Björn" and also
> find "Bjoern".
>
> Here is the optimised code snippet:
>
> else if ( buffer[c] == 'ä' )
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ö' )
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ü' )
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }
>
> Thank You
> Björn
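
The combined rule from the comment (ä = ae = a) could be sketched roughly as below. This is only an illustration, not the actual GermanStemmer code: the class UmlautFolding and the method Fold are hypothetical names, and the sketch assumes the term contains lower-case umlauts only (upper-case Ä, Ö, Ü are not handled).

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class UmlautFolding
{
    // Folds both spellings of an umlaut onto the plain vowel:
    //   DIN1 form:  ä -> a,  ö -> o,  ü -> u
    //   DIN2 form:  ae -> a, oe -> o, ue -> u
    public static string Fold(string term)
    {
        var buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.Length; c++)
        {
            char folded;
            switch (buffer[c])
            {
                case 'ä': folded = 'a'; break;
                case 'ö': folded = 'o'; break;
                case 'ü': folded = 'u'; break;
                case 'a':
                case 'o':
                case 'u': folded = buffer[c]; break;
                default: continue; // not a vowel we care about
            }

            bool wasUmlaut = folded != buffer[c];
            buffer[c] = folded;

            // A plain a/o/u followed by 'e' is treated as the DIN2
            // spelling of an umlaut, so the 'e' is dropped.
            if (!wasUmlaut && c + 1 < buffer.Length && buffer[c + 1] == 'e')
            {
                buffer.Remove(c + 1, 1);
            }
        }
        return buffer.ToString();
    }
}
```

With this sketch, Fold("Haus"), Fold("Häuser") and Fold("Haeuser") give "Haus", "Hauser" and "Hauser", and "Björn", "Bjoern" and "Bjorn" all fold to "Bjorn". One trade-off to keep in mind: every "ae", "oe" or "ue" is collapsed, including ones that were never umlauts, so for example "Feuer" becomes "Feur".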