Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or something else? Does anyone understand the differences between the various stemmers and how to choose one over another?
We started off using Porter, then switched to KStem since Porter is way too aggressive for us (you get a lot of false matches), but KStem seemed a little bit too conservative, so we've had to augment it with synonyms.

For example, KStem doesn't seem to reduce plurals in some cases where it seems it should. "bounds" was a problem: it won't match "bound," even though most other plurals match their singular form, and verbs get reduced to their stems as well. I thought maybe this was because there is also a homograph (spelled the same, different word) that is *not* a plural or verb form ("bounds" as in boundary, as in "out of bounds"?), but I'm not really sure how KStem's word lists were put together or what the goal was. Maybe this was just an oversight?
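
To make the "augment a conservative stemmer" idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is an assumption for illustration: the toy stemmer is a stand-in that only strips a trailing "s", not KStem; the PROTECTED set mimics my guess that some words are deliberately left alone; and the OVERRIDES table is a hypothetical hand-maintained list like the one we ended up keeping.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class StemWithOverrides {
    // Hypothetical hand-maintained table: surface form -> term we want indexed.
    private static final Map<String, String> OVERRIDES = Map.of("bounds", "bound");

    // Hypothetical list of words the toy stemmer refuses to touch
    // (a guess at the kind of behavior described above, not KStem's real data).
    private static final Set<String> PROTECTED = Set.of("bounds");

    // Toy stand-in for a conservative stemmer: strips one trailing "s"
    // unless the word is protected, very short, or ends in "ss".
    static String toyConservativeStem(String token) {
        if (PROTECTED.contains(token)) {
            return token;
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Consult the override table first; fall back to the stemmer otherwise.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String override = OVERRIDES.get(lower);
        return override != null ? override : toyConservativeStem(lower);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"bounds", "plurals", "class"}) {
            System.out.println(word + " -> " + normalize(word));
        }
    }
}
```

In a real analysis chain the same ordering matters: the override (or synonym) step has to see the token before the stemmer does. I believe Lucene's StemmerOverrideFilter or a synonym filter can fill that role ahead of KStemFilter, but check the javadocs for your version before relying on that.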

YMMV; it depends a lot on what you are trying to achieve.

-Mike

