Hi all, I have ported the kstem stemmer to Java and incorporated it to Lucene. You can get the source code (Kstem.jar) from the following website:
http://ciir.cs.umass.edu/downloads/ Just click on "KStem Java Implementation" (you will need to register your e-mail, for free of course, with the CIIR --Center for Intelligent Information Retrieval, UMass -- and get an access code). Content of Kstem.jar: java/org/apache/lucene/analysis/KStemData1.java java/org/apache/lucene/analysis/KStemData2.java java/org/apache/lucene/analysis/KStemData3.java java/org/apache/lucene/analysis/KStemData4.java java/org/apache/lucene/analysis/KStemData5.java java/org/apache/lucene/analysis/KStemData6.java java/org/apache/lucene/analysis/KStemData7.java java/org/apache/lucene/analysis/KStemData8.java java/org/apache/lucene/analysis/KStemFilter.java java/org/apache/lucene/analysis/KStemmer.java KStemData1.java, ..., KStemData8.java Contain several lists of words used by Kstem KStemmer.java Implements the Kstem algorithm KStemFilter.java Extends TokenFilter applying Kstem To compile unjar the file Kstem.jar to Lucene's "src" directory, and compile it there. What is Kstem? A stemmer designed by Bob Krovetz (for more information see http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). Copyright issues This is open source. The actual license agreement is included at the top of every source file. Any comments/questions/suggestions are welcome, Sergio Guzman-Lara Senior Research Fellow CIIR UMass