On Apr 1, 2005, at 7:03 PM, Chris Hostetter wrote:


: > > Are there any Lucene extensions that can do simple stemming, i.e. just
: > > for plurals? Or is the only stemming package available Snowball?


LIA has a case study of jGuru which uses a very specific, home-grown
utility method called "stripEnglishPlural" ... since it's in the case
study chapter, I'm not sure if it's included in the book's source code, but
it is included verbatim in the book...


http://lucenebook.com/search?query=stripEnglishPlural

Thanks for the reminder, Chris. I'm sure jGuru wouldn't mind us posting it, so I've pasted it below. It is not included in the LIA source code - only the code Otis and I wrote ourselves is included there and we didn't get the source code from any of the case studies (other than Bob Carpenter's LingPipe stuff).


        Erik


/** A useful, but not particularly efficient plural stripper */
public static String stripEnglishPlural(String word) {
    // too small?
    if ( word.length()<STRIP_PLURAL_MIN_WORD_SIZE ) {
        return word;
    }
    // special cases
    if ( word.equals("has") ||
         word.equals("was") ||
         word.equals("does") ||
         word.equals("goes") ||
         word.equals("dies") ||
         word.equals("yes") ||
         word.equals("gets") || // means too much in java/JSP
         word.equals("its") ) {
        return word;
    }
    String newWord=word;
    if ( word.endsWith("sses") ||
         word.endsWith("xes") ||
         word.endsWith("hes") ) {
        // remove 'es'
        newWord = word.substring(0,word.length()-2);
    }
    else if ( word.endsWith("ies") ) {
        // remove 'ies', replace with 'y'
        newWord = word.substring(0,word.length()-3)+'y';
    }
    else if ( word.endsWith("s") &&
              !word.endsWith("ss") &&
              !word.endsWith("is") &&
              !word.endsWith("us") &&
              !word.endsWith("pos") &&
              !word.endsWith("ses") ) {
        // remove 's'
        newWord = word.substring(0,word.length()-1);
    }
    return newWord;
}
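
For anyone who wants to try the method outside jGuru's code base, here is a minimal, self-contained sketch. The class name PluralStripperDemo is made up for illustration. The value 3 for STRIP_PLURAL_MIN_WORD_SIZE is an assumption, since the pasted snippet does not show how that constant is defined. The method body keeps the logic of the code above.

```java
final class PluralStripperDemo {

    // ASSUMPTION: the real value of this constant is not shown in the post.
    static final int STRIP_PLURAL_MIN_WORD_SIZE = 3;

    /** Same logic as the stripEnglishPlural method posted above. */
    static String stripEnglishPlural(String word) {
        // too small?
        if (word.length() < STRIP_PLURAL_MIN_WORD_SIZE) {
            return word;
        }
        // special cases that must never be stripped
        if (word.equals("has") || word.equals("was") || word.equals("does")
                || word.equals("goes") || word.equals("dies") || word.equals("yes")
                || word.equals("gets") || word.equals("its")) {
            return word;
        }
        String newWord = word;
        if (word.endsWith("sses") || word.endsWith("xes") || word.endsWith("hes")) {
            // remove 'es'
            newWord = word.substring(0, word.length() - 2);
        } else if (word.endsWith("ies")) {
            // remove 'ies', replace with 'y'
            newWord = word.substring(0, word.length() - 3) + 'y';
        } else if (word.endsWith("s") && !word.endsWith("ss") && !word.endsWith("is")
                && !word.endsWith("us") && !word.endsWith("pos") && !word.endsWith("ses")) {
            // remove 's'
            newWord = word.substring(0, word.length() - 1);
        }
        return newWord;
    }

    public static void main(String[] args) {
        String[] samples = {
            "boxes",    // -> box      ('xes' rule)
            "queries",  // -> query    ('ies' rule)
            "classes",  // -> class    ('sses' rule)
            "tokens",   // -> token    (plain 's' rule)
            "analysis", // unchanged   (ends with 'is')
            "status",   // unchanged   (ends with 'us')
            "gets",     // unchanged   (special case)
            "houses"    // unchanged   (ends with 'ses', so the 's' rule is skipped)
        };
        for (String s : samples) {
            System.out.println(s + " -> " + stripEnglishPlural(s));
        }
    }
}
```

Note the last sample: because words ending in "ses" are excluded from the plain 's' rule, a plural like "houses" passes through untouched. The method also assumes lower-case input, since every comparison is against lower-case literals.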

