[ https://issues.apache.org/jira/browse/LUCENE-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomoko Uchida resolved LUCENE-8937.
-----------------------------------
       Resolution: Fixed
         Assignee: Tomoko Uchida
    Fix Version/s: master (9.0)

> Avoid agressive stemming on numbers in the FrenchMinimalStemmer
> ---------------------------------------------------------------
>
>                 Key: LUCENE-8937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8937
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Gallou
>            Assignee: Tomoko Uchida
>            Priority: Minor
>             Fix For: master (9.0)
>
>         Attachments: 0001-LUCENE-8937-Avoid-agressive-stemming-on-numbers-in-t.patch,
>                      LUCENE-8937.patch
>
>
> Here is the discussion on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/lucene-java-user/201907.mbox/browser]
>
> The light stemmer removes the last character of a word if the last two
> characters are identical.
> We can see that here:
> https://github.com/apache/lucene-solr/blob/813ca77/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
> In the light stemmer, there is a check that leaves the token unaltered if
> it is a number.
>
> The minimal stemmer also removes the last character of a word if the last
> two characters are identical.
> We can see that here:
> https://github.com/apache/lucene-solr/blob/813ca77/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
> But in the minimal stemmer there is no check to see whether the character
> is a letter.
> So numeric tokens whose last two characters are identical are altered.
> For example, "1234567899" is stemmed to "123456789".
> It would be great if such tokens were left unaltered.
> Here is the same issue for the LightStemmer:
> https://issues.apache.org/jira/browse/LUCENE-4063
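
The behavior requested in the issue can be sketched as follows. This is only an illustration, not the committed patch: the class DoubledSuffixSketch and the method trimDoubledLetter are hypothetical names, only the doubled-character rule described above is modeled, and the use of Character.isLetter as the guard is an assumption.

```java
// Hypothetical sketch of the doubled-character rule with a letter guard.
// It is not the Lucene FrenchMinimalStemmer; names here are illustrative.
class DoubledSuffixSketch {

    // Returns the new logical length of the token held in s[0..len).
    // The last character is dropped only when the final two characters
    // are identical AND that character is a letter (assumed guard).
    static int trimDoubledLetter(char[] s, int len) {
        if (len < 2) {
            return len; // nothing to compare against
        }
        char last = s[len - 1];
        if (last == s[len - 2] && Character.isLetter(last)) {
            return len - 1;
        }
        return len;
    }

    public static void main(String[] args) {
        char[] number = "1234567899".toCharArray();
        int n = trimDoubledLetter(number, number.length);
        System.out.println(new String(number, 0, n)); // prints "1234567899"

        char[] word = "appell".toCharArray(); // made-up token ending in "ll"
        int w = trimDoubledLetter(word, word.length);
        System.out.println(new String(word, 0, w)); // prints "appel"
    }
}
```

Without the Character.isLetter condition, the first call would return 9 and the token would become "123456789", which is the behavior reported in the issue.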