Another word set to try: invest, investing, investment, investments,
invests, investor, invester, investors, investers.
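One way to look at a word set like this is to group the words by the stem each one produces, so you can see which variants end up as the same term. This is only a rough sketch under stated assumptions: `toyStem` and its suffix list are hypothetical stand-ins made up for illustration and do not reproduce Porter, KStem, or EnglishMinimalStemmer. To test a real stemmer, you would pass it in as the `stemmer` argument instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ConflationCheck {

    // Hypothetical suffix list, longest first; an assumption for illustration only.
    static final List<String> SUFFIXES =
            List.of("ments", "ment", "ings", "ing", "ors", "ers", "or", "er", "s");

    // Toy stand-in stemmer: removes the first matching suffix if at least
    // three characters remain. It does not reproduce any Lucene stemmer.
    static String toyStem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Maps each stem to the words that produced it, preserving input order.
    static Map<String, List<String>> groupByStem(List<String> words,
                                                 UnaryOperator<String> stemmer) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(stemmer.apply(word), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = List.of("invest", "investing", "investment", "investments",
                "invests", "investor", "invester", "investors", "investers");
        groupByStem(words, ConflationCheck::toyStem)
                .forEach((stem, members) -> System.out.println(stem + " <- " + members));
    }
}
```

A stemmer that puts all nine words in one group is conflating aggressively; one that leaves them in several groups is more conservative. Which is preferable depends on the searches you need to support.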
Also, take a look at EnglishMinimalStemmer (EnglishMinimalStemFilterFactory)
for minimal stemming.
See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html
-- Jack Krupansky
-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 5:17 PM
To: java-user@lucene.apache.org
Subject: RE: Which stemmer?
Unfortunately, my "use case" is a customer who wants stemming but has very
little idea of what that means, beyond thinking they want it.
I agree with your last comment. So, here's my contribution:
Original      porter        kstem         minStem
-------       -------       -------       -------
country       countri       country       country
run           run           run           run
runs          run           runs          run
running       run           running       running
read          read          read          read
reading       read          reading       reading
reader        reader        reader        reader
association   associ        association   association
associate     associ        associate     associate
listing       list          list          listing
water         water         water         water
watered       water         water         watered
sure          sure          sure          sure
surely        sure          surely        surely
fred's        fred'         fred's        fred'
roses         rose          rose          rose
Still not sure which one to pick. Porter is more aggressive. The minimal
stemmer is pretty minimal. Perhaps KStem is "just right" :-)
Cheers
Scott
-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, November 14, 2012 4:14 PM
To: java-user@lucene.apache.org
Subject: Re: Which stemmer?
What is your use case? If you don't have a specific use case in mind, try
each of them with some common words that you expect will or won't be
stemmed. If you have Solr, you can experiment interactively using the Solr
Admin Analysis web page.
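If you would rather script the comparison than click through it, something along these lines might do. It is a minimal sketch, not a Lucene example: the two "stemmers" here (`identity` and `stripTrailingS`) are hypothetical placeholders, and you would replace them with functions that call whichever real stemmers you are evaluating. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class StemmerComparison {

    // Toy stand-in: drops one trailing "s" from words longer than three
    // characters that do not end in "ss". NOT the behavior of any Lucene stemmer.
    static String stripTrailingS(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s") && !word.endsWith("ss");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    // Renders a header line, then one row per word with one column per stemmer.
    static String compare(List<String> words, Map<String, UnaryOperator<String>> stemmers) {
        StringBuilder out = new StringBuilder(String.format("%-14s", "original"));
        for (String name : stemmers.keySet()) {
            out.append(String.format("%-14s", name));
        }
        out.append("\n");
        for (String word : words) {
            out.append(String.format("%-14s", word));
            for (UnaryOperator<String> stemmer : stemmers.values()) {
                out.append(String.format("%-14s", stemmer.apply(word)));
            }
            out.append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("identity", UnaryOperator.identity());
        stemmers.put("strip-s", StemmerComparison::stripTrailingS);
        System.out.print(compare(List.of("run", "runs", "roses", "class"), stemmers));
    }
}
```

Keeping the stemmers behind a plain `UnaryOperator<String>` means the word list and the table layout stay the same no matter which analysis chain you plug in.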
It would be nice if the javadoc for each stemmer gave a handful of examples
that illustrated how some common words are stemmed.
-- Jack Krupansky
-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Which stemmer?
Does anyone have any experience with the stemmers? I know that Porter is
what "everyone" uses. Am I better off with KStemFilter (better performance)
or something else? Does anyone understand the differences between the
various stemmers and how to choose one over another?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org