[ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christopher Currens updated LUCENENET-466:
------------------------------------------

    Attachment: DIN2Stemmer.patch

Bjorn,

I've made this patch from the src/contrib/Analyzers folder, on top of the DIN2 changes already committed to trunk. Since the extent of my German is "danke!", I was hoping you could check whether this stemmer is working properly before I commit it to trunk.

These are the test cases I made, which should hopefully emulate the results of the normal DIN1 stemmer. The word to the left of the semicolon is the input, and the word to the right is the expected result.

{noformat}
# Test cases for words with ae, ue, or oe in them
Haus;hau
Hauses;hau
Haeuser;hau
Haeusern;hau
steuer;steur
rueckwaerts;ruckwar
geheimtuer;geheimtur
{noformat}

The last word in particular produces fairly different results in each stemmer, though I think the differences are expected, due to the different DIN. The DIN2 stemmer will also translate 'Häuser' and 'Häusern' properly (to hau), so there is support for both umlauts and the expanded 'ae', 'oe' and 'ue' forms.

> optimisation for the GermanStemmer.vb
> --------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>         Attachments: DIN2Stemmer.patch
>
>
> I have a little optimisation for the GermanStemmer.vb (in
> Contrib.Analyzers) class. At the moment the function "Substitute"
> converts the German "Umlaute" "ä" to "a", "ö" to "o" and "ü" to "u". This
> is not the correct German transliteration. They must be converted to "ae",
> "oe" and "ue". So I can write the name "Björn" or "Bjoern" but not
> "Bjorn". With this optimization a user can search for "Björn" and also
> find "Bjoern".
>
> Here is the optimized code snippet:
>
>     else if ( buffer[c] == 'ä' )
>     {
>         buffer[c] = 'a';
>         buffer.Insert(c + 1, 'e');
>     }
>     else if ( buffer[c] == 'ö' )
>     {
>         buffer[c] = 'o';
>         buffer.Insert(c + 1, 'e');
>     }
>     else if ( buffer[c] == 'ü' )
>     {
>         buffer[c] = 'u';
>         buffer.Insert(c + 1, 'e');
>     }
>
> Thank You
> Björn
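
For anyone who wants to try the proposed substitution outside the stemmer, the quoted branches can be wrapped in a standalone method. This is only a minimal sketch and not the code from DIN2Stemmer.patch: the class name `UmlautExpander` and the method `Expand` are hypothetical, and it assumes the term has already been lower-cased, so 'Ä', 'Ö', 'Ü' and 'ß' are not handled here.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class UmlautExpander
{
    // Expands lower-case German umlauts to their two-letter forms
    // (ä -> ae, ö -> oe, ü -> ue), as proposed in the issue.
    // Assumption: the input is already lower-cased.
    public static string Expand(string term)
    {
        var buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.Length; c++)
        {
            char replacement;
            switch (buffer[c])
            {
                case 'ä': replacement = 'a'; break;
                case 'ö': replacement = 'o'; break;
                case 'ü': replacement = 'u'; break;
                default: continue; // not an umlaut, leave unchanged
            }
            buffer[c] = replacement;   // overwrite the umlaut with its base vowel
            buffer.Insert(c + 1, 'e'); // then insert the trailing 'e'
            c++;                       // skip the 'e' that was just inserted
        }
        return buffer.ToString();
    }
}
```

With this sketch, `UmlautExpander.Expand("björn")` returns `"bjoern"`, the same string a user would produce by typing the expanded form directly, which is why both spellings can then match the same indexed term.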