[ 
https://issues.apache.org/jira/browse/LUCENE-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896092#comment-16896092
 ] 

ASF subversion and git services commented on LUCENE-8937:
---------------------------------------------------------

Commit d9d16eec95bf294ecf2b73a73f9310e967f4d03c in lucene-solr's branch 
refs/heads/master from Adrien Gallou
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d9d16ee ]

LUCENE-8937: Avoid aggressive stemming on numbers in the FrenchMinimalStemmer


> Avoid aggressive stemming on numbers in the FrenchMinimalStemmer
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8937
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Gallou
>            Priority: Minor
>         Attachments: 
> 0001-LUCENE-8937-Avoid-agressive-stemming-on-numbers-in-t.patch, 
> LUCENE-8937.patch
>
>
> Here is the discussion on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/lucene-java-user/201907.mbox/browser]
> The light stemmer removes the last character of a word if the last two
> characters are identical. We can see that here:
> https://github.com/apache/lucene-solr/blob/813ca77/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
> In this light stemmer, there is a check that avoids altering the token if
> the token is a number.
> The minimal stemmer also removes the last character of a word if the last
> two characters are identical. We can see that here:
> https://github.com/apache/lucene-solr/blob/813ca77/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
> But in this minimal stemmer, there is no check on whether the character is
> a letter, so numeric tokens whose last two characters are identical are
> altered.
> For example, "1234567899" will be stemmed as "123456789".
> It would be great if such tokens were left unaltered.
> Here is the same issue for the LightStemmer:
> https://issues.apache.org/jira/browse/LUCENE-4063
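
To illustrate the behaviour described above, here is a minimal sketch of
the doubled-character rule in isolation. It is not the committed patch and
not the real stemmer: the class DoubledSuffixSketch, its methods, and the
lettersOnly flag are hypothetical names made up for this example, and the
other suffix rules of the stemmers are left out. It only assumes that the
fix amounts to guarding the rule with a letter check such as
Character.isLetter.

```java
public final class DoubledSuffixSketch {

    // Hypothetical helper: returns the new length of the token after
    // applying only the "last two characters are identical" rule.
    // With lettersOnly == false, any doubled character is trimmed,
    // which is the behaviour reported in this issue.
    // With lettersOnly == true, the rule fires only for letters,
    // so numeric tokens are left alone.
    static int trimDoubled(char[] s, int len, boolean lettersOnly) {
        if (len < 2) {
            return len; // nothing to compare against
        }
        boolean doubled = s[len - 1] == s[len - 2];
        if (doubled && (!lettersOnly || Character.isLetter(s[len - 1]))) {
            return len - 1;
        }
        return len;
    }

    // Convenience wrapper that works on strings instead of char buffers.
    static String apply(String token, boolean lettersOnly) {
        char[] buf = token.toCharArray();
        return new String(buf, 0, trimDoubled(buf, buf.length, lettersOnly));
    }

    public static void main(String[] args) {
        System.out.println(apply("1234567899", false)); // 123456789
        System.out.println(apply("1234567899", true));  // 1234567899
        System.out.println(apply("baronn", true));      // baron
    }
}
```

With the letter check enabled, the numeric token from the report comes
back unchanged, while a word ending in a doubled letter is still trimmed.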



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
