Another word set to try: invest, investing, investment, investments, invests, investor, invester, investors, investers.

Also, take a look at EnglishMinimalStemmer (EnglishMinimalStemFilterFactory) for minimal stemming.

See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html

-- Jack Krupansky

-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 5:17 PM
To: java-user@lucene.apache.org
Subject: RE: Which stemmer?

Unfortunately, my "use case" is a customer who wants stemming, but has very little knowledge of what that means except they think they want it.

I agree with your last comment.  So, here's my contribution:

   Original       porter        kstem      minStem
-----------  -----------  -----------  -----------
    country      countri      country      country
        run          run          run          run
       runs          run         runs          run
    running          run      running      running
       read         read         read         read
    reading         read      reading      reading
     reader       reader       reader       reader
association       associ  association  association
  associate       associ    associate    associate
    listing         list         list      listing
      water        water        water        water
    watered        water        water      watered
       sure         sure         sure         sure
     surely         sure       surely       surely
     fred's        fred'       fred's        fred'
      roses         rose         rose         rose

Still not sure which one to pick. Porter is more aggressive. Min stemmer is pretty minimal. Perhaps the kstemmer is "just right" :-)
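One rough way to put a number on "aggressive" is to count how many words in the table above each stemmer actually changed. The sketch below hard-codes the rows exactly as posted and does not call Lucene at all; the class and method names are made up for illustration, and a 16-word sample is far too small to settle the choice on its own.

```java
public class StemChangeCount {
    // Rows copied from the table above: original, porter, kstem, minStem.
    static final String[][] ROWS = {
        {"country", "countri", "country", "country"},
        {"run", "run", "run", "run"},
        {"runs", "run", "runs", "run"},
        {"running", "run", "running", "running"},
        {"read", "read", "read", "read"},
        {"reading", "read", "reading", "reading"},
        {"reader", "reader", "reader", "reader"},
        {"association", "associ", "association", "association"},
        {"associate", "associ", "associate", "associate"},
        {"listing", "list", "list", "listing"},
        {"water", "water", "water", "water"},
        {"watered", "water", "water", "watered"},
        {"sure", "sure", "sure", "sure"},
        {"surely", "sure", "surely", "surely"},
        {"fred's", "fred'", "fred's", "fred'"},
        {"roses", "rose", "rose", "rose"},
    };

    // Counts the rows where the stemmer in the given column altered the original word.
    static int changed(int column) {
        int count = 0;
        for (String[] row : ROWS) {
            if (!row[column].equals(row[0])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"porter", "kstem", "minStem"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + changed(i + 1)
                    + " of " + ROWS.length + " words changed");
        }
    }
}
```

On this sample Porter changes 11 of the 16 words, while kstem and minStem change 3 each (though not the same 3), so the count alone does not separate those two.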

Cheers

Scott

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, November 14, 2012 4:14 PM
To: java-user@lucene.apache.org
Subject: Re: Which stemmer?

What is your use case? If you don't have a specific use case in mind, try each of them with some common words that you expect will or won't be stemmed. If you have Solr, you can experiment interactively using the Solr Admin Analysis web page.
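If Solr is not at hand, a small side-by-side harness does the same job. The following is only a sketch: the "stemmers" here are plain functions, and stripTrailingS is a hypothetical stand-in, not a Lucene stemmer. In practice each function would wrap one of the Lucene stem filters, which is left out here because the wiring depends on the Lucene version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class StemCompare {
    // Hypothetical stand-in, NOT a Lucene stemmer: strips one trailing "s"
    // from words longer than three characters.
    static String stripTrailingS(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Builds one right-aligned row: the word followed by each stemmer's output.
    static String row(String word, Map<String, UnaryOperator<String>> stemmers) {
        StringBuilder sb = new StringBuilder(String.format("%12s", word));
        for (UnaryOperator<String> stemmer : stemmers.values()) {
            sb.append(String.format("%12s", stemmer.apply(word)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("none", w -> w);
        stemmers.put("stripS", StemCompare::stripTrailingS);

        StringBuilder header = new StringBuilder(String.format("%12s", "Original"));
        for (String name : stemmers.keySet()) {
            header.append(String.format("%12s", name));
        }
        System.out.println(header);

        for (String word : Arrays.asList("invest", "invests", "investments", "investor")) {
            System.out.println(row(word, stemmers));
        }
    }
}
```

Swapping in real stemmers only means adding more entries to the map; the table layout stays the same.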

It would be nice if the javadoc for each stemmer gave a handful of examples that illustrated how some common words are stemmed.

-- Jack Krupansky

-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Which stemmer?

Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or ?? Does anyone understand the differences between the various stemmers and how to choose one over another?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

