[
https://issues.apache.org/jira/browse/LUCENE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801833#action_12801833
]
Robert Muir commented on LUCENE-2055:
-------------------------------------
Now that we have snowball tests, I started looking at integrating snowball and
deprecating this custom code.
So I ran the snowball tests against these hand-coded algorithms to see what the
differences are. Remember, they all claim to implement Porter's snowball
algorithms:
* RussianStemFilter passes 100% of the snowball tests.
* DutchStemFilter passes 98.9% of the snowball tests. All of the bugs are in the
handling of double consonants.
Examples:
aangetroffen -> aangetrof expected: aangetroff
afvoerbonnen -> afvoerbon expected: afvoerbonn
klommen -> klom expected: klomm
* FrenchStemFilter passes only 93.92% of the snowball tests, but if you put a
LowerCaseFilter after it, it passes 99.66%!
The problem is that this stemmer incorrectly produces stems containing uppercase
letters from lowercase words. Examples:
xviii -> xviI expected: xvii
vouer -> voU expected: vou
tranquille -> tranqUill expected: tranquill
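To make the French comparison concrete, here is a minimal, self-contained sketch of this kind of check. It is not the actual snowball test harness: the class name `StemComparison` and its helpers are hypothetical, and the "hand-coded stemmer" is only a lookup table that replays the three outputs reported above. It is not FrenchStemFilter itself, and the lowercasing step merely imitates what placing a LowerCaseFilter after the stemmer would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene or snowball.
final class StemComparison {

    /** Lists every vocabulary word whose stem differs from the expected one. */
    static List<String> failures(UnaryOperator<String> stemmer, Map<String, String> expected) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actual = stemmer.apply(e.getKey());
            if (!actual.equals(e.getValue())) {
                out.add(e.getKey() + " -> " + actual + " expected: " + e.getValue());
            }
        }
        return out;
    }

    /** Percentage of vocabulary words stemmed as expected. */
    static double passRate(UnaryOperator<String> stemmer, Map<String, String> expected) {
        int failed = failures(stemmer, expected).size();
        return 100.0 * (expected.size() - failed) / expected.size();
    }

    public static void main(String[] args) {
        // word -> expected snowball stem, copied from the examples above
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("xviii", "xvii");
        expected.put("vouer", "vou");
        expected.put("tranquille", "tranquill");

        // Stand-in for the hand-coded stemmer: replays the reported outputs.
        Map<String, String> reported = new LinkedHashMap<>();
        reported.put("xviii", "xviI");
        reported.put("vouer", "voU");
        reported.put("tranquille", "tranqUill");
        UnaryOperator<String> handCoded = reported::get;

        // Same stand-in with a lowercasing step applied to its output.
        UnaryOperator<String> lowercased =
            w -> handCoded.apply(w).toLowerCase(Locale.ROOT);

        for (String failure : failures(handCoded, expected)) {
            System.out.println(failure);
        }
        System.out.println("pass rate as-is: " + passRate(handCoded, expected));
        System.out.println("pass rate lowercased: " + passRate(lowercased, expected));
    }
}
```

On these three words the stand-in fails every comparison as-is and passes every one once its output is lowercased, which is the same effect described for the full test run, just on a tiny sample.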
> Remove duplicate analysis functionality
> ---------------------------------------
>
> Key: LUCENE-2055
> URL: https://issues.apache.org/jira/browse/LUCENE-2055
> Project: Lucene - Java
> Issue Type: Task
> Components: contrib/analyzers
> Reporter: Robert Muir
> Fix For: 3.1
>
>
> I would like to mark the following code deprecated, so it can be removed.
> * analyzers/fr: all except ElisionFilter, which is unrelated and standalone.
> * analyzers/nl: entire package
> * analyzers/ru: entire package
> Below are excerpts from this code where they proudly proclaim they use the
> snowball algorithm.
> I think we should delete all of this code in favor of the actual snowball
> package.
> {noformat}
> /**
> * A stemmer for French words.
> * <p>
> * The algorithm is based on the work of
> * Dr Martin Porter on his snowball project<br>
> * refer to http://snowball.sourceforge.net/french/stemmer.html<br>
> * (French stemming algorithm) for details
> * </p>
> */
> public class FrenchStemmer {
> /**
> * A stemmer for Dutch words.
> * <p>
> * The algorithm is an implementation of
> * the <a href="http://snowball.tartarus.org/algorithms/dutch/stemmer.html">dutch stemming</a>
> * algorithm in Martin Porter's snowball project.
> * </p>
> */
> public class DutchStemmer {
> /**
> * Russian stemming algorithm implementation (see http://snowball.sourceforge.net for detailed description).
> */
> class RussianStemmer
> {noformat}