Unfortunately, my "use case" is a customer who wants stemming but has very 
little idea of what that means, beyond thinking they want it.

I agree with your last comment.  So, here's my contribution:

   Original       porter        kstem      minStem
-----------  -----------  -----------  -----------
    country      countri      country      country
        run          run          run          run
       runs          run         runs          run
    running          run      running      running
       read         read         read         read
    reading         read      reading      reading
     reader       reader       reader       reader
association       associ  association  association
  associate       associ    associate    associate
    listing         list         list      listing
      water        water        water        water
    watered        water        water      watered
       sure         sure         sure         sure
     surely         sure       surely       surely
     fred's        fred'       fred's        fred'
      roses         rose         rose         rose

Still not sure which one to pick.  Porter is the most aggressive.  The minimal 
stemmer does very little.  Perhaps KStem is "just right" :-)
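
To make "pretty minimal" concrete: as far as I know, Lucene's English minimal 
stemmer is based on Harman's "S-stemmer", which only touches plural-style 
endings.  The following is a from-scratch sketch of those rules, not Lucene's 
source; the class and method names (SStemmerSketch, stem) are made up for 
illustration.  Under that assumption it gives the same results as the minStem 
column above for all sixteen words.

```java
// Sketch of S-stemmer style plural rules. This is an illustration written
// from the published rules, NOT Lucene's actual implementation.
final class SStemmerSketch {

    /** Returns the word with a plural-style ending reduced, or the word unchanged. */
    static String stem(String word) {
        int len = word.length();
        // Too short, or does not end in 's': leave alone.
        if (len < 3 || word.charAt(len - 1) != 's') {
            return word;
        }
        char beforeS = word.charAt(len - 2);
        // "-us" and "-ss" endings are kept (e.g. "status", "class").
        if (beforeS == 'u' || beforeS == 's') {
            return word;
        }
        if (beforeS == 'e') {
            char third = word.charAt(len - 3);
            // "-ies" becomes "-y" unless preceded by 'a' or 'e'
            // (e.g. "countries" -> "country").
            if (len > 3 && third == 'i') {
                char fourth = word.charAt(len - 4);
                if (fourth != 'a' && fourth != 'e') {
                    return word.substring(0, len - 3) + "y";
                }
            }
            // Remaining "-ies", and "-aes", "-oes", "-ees", are left as they are.
            if (third == 'i' || third == 'a' || third == 'o' || third == 'e') {
                return word;
            }
        }
        // Everything else: drop the final 's'.
        return word.substring(0, len - 1);
    }

    public static void main(String[] args) {
        String[] words = {"country", "runs", "running", "watered", "fred's", "roses"};
        for (String w : words) {
            System.out.printf("%11s  %11s%n", w, stem(w));
        }
    }
}
```

This also explains the odd "fred'" result: the rules see a word ending in 's' 
and drop it, with no special handling for the apostrophe.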

Cheers

Scott

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, November 14, 2012 4:14 PM
To: java-user@lucene.apache.org
Subject: Re: Which stemmer?

What is your use case? If you don't have a specific use case in mind, try each 
of them with some common words that you expect will or won't be stemmed. If you 
have Solr, you can experiment interactively using the Solr Admin Analysis web 
page.
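
If Solr is not at hand, the same side-by-side experiment can be scripted.  
The following is only a sketch of a comparison harness: it uses plain Java 
and treats each stemmer as a String-to-String function.  The names 
(StemmerComparison, compare) and the two placeholder stemmers are 
hypothetical; in practice each lambda would be replaced by a call into the 
real stemmer being evaluated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical harness: prints one row per word and one column per stemmer.
final class StemmerComparison {

    /** Builds a right-aligned table of each word and its stem under every stemmer. */
    static String compare(List<String> words, Map<String, UnaryOperator<String>> stemmers) {
        StringBuilder out = new StringBuilder(String.format("%12s", "original"));
        for (String name : stemmers.keySet()) {
            out.append(String.format("%12s", name));
        }
        out.append('\n');
        for (String word : words) {
            out.append(String.format("%12s", word));
            for (UnaryOperator<String> stemmer : stemmers.values()) {
                out.append(String.format("%12s", stemmer.apply(word)));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder stemmers, for illustration only. LinkedHashMap keeps
        // the columns in insertion order.
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("identity", w -> w);
        stemmers.put("dropS",
                w -> w.length() > 2 && w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        System.out.print(compare(List.of("country", "runs", "running", "roses"), stemmers));
    }
}
```

Feeding it words that should and should not conflate (for example "run", 
"runs", "running") makes the differences between stemmers easy to see at a 
glance.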

It would be nice if the javadoc for each stemmer gave a handful of examples 
that illustrated how some common words are stemmed.

-- Jack Krupansky

-----Original Message-----
From: Scott Smith
Sent: Wednesday, November 14, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Which stemmer?

Does anyone have any experience with the stemmers?  I know that Porter is what 
"everyone" uses.  Am I better off with KStemFilter (better performance) or 
something else?  Does anyone understand the differences between the various 
stemmers and how to choose one over another?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

