[ 
https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Currens updated LUCENENET-466:
------------------------------------------

    Attachment: DIN2Stemmer.patch

Bjorn,

I've made this patch from the src/contrib/Analyzers folder, on top of the DIN2 
changes already committed to trunk.  Since the extent of my German is "danke!", 
I was hoping you could see if this stemmer is working properly before I commit 
it to trunk.

These are the test cases I wrote, which should emulate the results of the 
normal DIN1 stemmer. On each line, the input word is to the left of the 
semicolon and the expected stem is to the right.

{noformat}
# Test cases for words with ae, ue, or oe in them
Haus;hau
Hauses;hau
Haeuser;hau
Haeusern;hau
steuer;steur
rueckwaerts;ruckwar
geheimtuer;geheimtur
{noformat}

The last word in particular produces fairly different results in each 
stemmer, though I think that is expected, given the different DIN rules.

The DIN2 stemmer also translates 'Häuser' and 'Häusern' properly (to 
hau), so there is support for both umlauts and the expanded 'ae', 'oe' and 'ue' 
forms.
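
In case it helps with checking, here is a minimal sketch of how the "word;expected" lines above could be run against a stemmer. It is not part of the patch. The class name StemmerCaseFile and the stem delegate are my own placeholders, so plug in whatever call actually invokes the DIN2 stemmer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DIN2Stemmer.patch.
public static class StemmerCaseFile
{
    // Parses "word;expected" lines, skipping blank lines and '#' comments.
    public static List<KeyValuePair<string, string>> Parse(IEnumerable<string> lines)
    {
        var cases = new List<KeyValuePair<string, string>>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue;

            int sep = line.IndexOf(';');
            if (sep < 0) throw new FormatException("Missing ';' in test case: " + line);

            cases.Add(new KeyValuePair<string, string>(
                line.Substring(0, sep), line.Substring(sep + 1)));
        }
        return cases;
    }

    // Runs every case through the supplied stemmer (an assumed delegate)
    // and returns one message per mismatch; an empty list means all passed.
    public static List<string> Check(IEnumerable<string> lines, Func<string, string> stem)
    {
        var failures = new List<string>();
        foreach (var c in Parse(lines))
        {
            var actual = stem(c.Key);
            if (actual != c.Value)
                failures.Add(c.Key + ": expected " + c.Value + ", got " + actual);
        }
        return failures;
    }
}
```

Passing the stemmer in as a delegate keeps the sketch independent of the analyzer classes, so the same file can be checked against both the DIN1 and DIN2 stemmers.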
                
> optimisation for the GermanStemmer.vb
> -------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>         Attachments: DIN2Stemmer.patch
>
>
> I have a small optimisation for the GermanStemmer.vb class (in 
> Contrib.Analyzers). At the moment the function "Substitute" 
> converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This 
> is not the correct German transliteration. They must be converted to "ae", 
> "oe" and "ue". So I can write the name "Björn" or "Bjoern" but not 
> "Bjorn". With this optimisation a user can search for "Björn" and also 
> find "Bjoern".
>  
> Here is the optimized code snippet:
>  
> else if (buffer[c] == 'ä')
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if (buffer[c] == 'ö')
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if (buffer[c] == 'ü')
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }
>  
> Thank You
> Björn
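
For reference, the substitution described in the issue can be tried in isolation with a small self-contained sketch like the one below. This is only an illustration: UmlautExpansion.Expand is a hypothetical helper, not the actual Substitute method in GermanStemmer, and it assumes the input has already been lowercased, so only 'ä', 'ö' and 'ü' are handled.

```csharp
using System.Text;

// Hypothetical helper for illustration; not the GermanStemmer.Substitute method.
public static class UmlautExpansion
{
    // Expands lowercase umlauts to their two-letter forms: ä -> ae, ö -> oe, ü -> ue.
    public static string Expand(string word)
    {
        var buffer = new StringBuilder(word);
        for (int c = 0; c < buffer.Length; c++)
        {
            char replacement;
            switch (buffer[c])
            {
                case 'ä': replacement = 'a'; break;
                case 'ö': replacement = 'o'; break;
                case 'ü': replacement = 'u'; break;
                default: continue; // not an umlaut, move to the next character
            }
            buffer[c] = replacement;
            buffer.Insert(c + 1, 'e');
            c++; // skip over the 'e' that was just inserted
        }
        return buffer.ToString();
    }
}
```

With this helper, Expand("Björn") gives "Bjoern", which is the form the reporter wants a search for either spelling to match.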

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

