[jira] [Commented] (LUCENENET-466) optimisation for the GermanStemmer.vb

Commented Mon, 26 Mar 2012 00:21:05 -0700

    [ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238160#comment-13238160 ]


Björn commented on LUCENENET-466:
---------------------------------

Hello,

Maybe it would be a good idea to combine the DIN1 and DIN2 algorithms. At 
the moment the DIN2 stemmer "destroys" the root of the word, so that "Haus" 
and "Häuser" no longer share the same root:

Haus => Haus
Häuser => Haeuser
Haeuser => Haeuser


DIN1 means:
ä = a
DIN2 means:
ä = ae

So we could implicitly treat them as equal: ä = ae = a. This fixes the "root" problem:

Haus => Haus
Häuser => Hauser
Haeuser => Hauser
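
As a rough illustration of the combined rule above, here is a minimal C# sketch. The class and method names (UmlautFolding, ExpandDin2, FoldCombined) are made up for this example and are not part of Lucene.Net, and the sketch assumes the word has already been lower-cased. ExpandDin2 mirrors the DIN2 behaviour (ä = ae), FoldCombined the proposed ä = ae = a.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class UmlautFolding
{
    // DIN2-style expansion: ä -> ae, ö -> oe, ü -> ue.
    public static string ExpandDin2(string word)
    {
        var sb = new StringBuilder(word.Length + 4);
        foreach (char ch in word)
        {
            switch (ch)
            {
                case 'ä': sb.Append("ae"); break;
                case 'ö': sb.Append("oe"); break;
                case 'ü': sb.Append("ue"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }

    // Combined rule (ä = ae = a): an umlaut and its digraph spelling
    // both collapse to the base vowel. Assumes lower-case input.
    public static string FoldCombined(string word)
    {
        var sb = new StringBuilder(word.Length);
        for (int i = 0; i < word.Length; i++)
        {
            char ch = word[i];
            switch (ch)
            {
                case 'ä': sb.Append('a'); break;
                case 'ö': sb.Append('o'); break;
                case 'ü': sb.Append('u'); break;
                case 'a':
                case 'o':
                case 'u':
                    sb.Append(ch);
                    // Skip the 'e' of a digraph spelling (ae, oe, ue).
                    if (i + 1 < word.Length && word[i + 1] == 'e') i++;
                    break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, FoldCombined returns "Hauser" for both "Häuser" and "Haeuser", while "Haus" is left unchanged. One caveat of such a naive fold: it also rewrites digraphs that are not umlauts (for example, "aktuell" becomes "aktull"), so a real implementation would probably need additional rules.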

Greetings
Björn
                
> optimisation for the GermanStemmer.vb
> -------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>
> I have a small optimisation for the GermanStemmer.vb class (in 
> Contrib.Analyzers). At the moment the function "Substitute" converts 
> the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This is not 
> the correct German transliteration. They must be converted to "ae", 
> "oe" and "ue". For example, the name can be written "Björn" or 
> "Bjoern", but not "Bjorn". With this optimisation a user can search 
> for "Björn" and also find "Bjoern".
>  
> Here is the optimized code snippet:
>  
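> // Replace each umlaut with its base vowel and insert an 'e' after it
> // (buffer is assumed to be the StringBuilder passed to Substitute).
> else if ( buffer[c] == 'ä' )
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ö' )
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ü' )
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }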
>  
> Thank you
> Björn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
