Does anyone have any experience with the stemmers? I know that Porter
is what "everyone" uses. Am I better off with KStemFilter (better
performance) or something else? Does anyone understand the differences
between the various stemmers and how to choose one over another?
We started off using Porter, then switched to KStem since Porter is way
too aggressive for us (you get a lot of false matches), but KStem seemed
a little bit too conservative, so we've had to augment it with synonyms.
For example, KStem doesn't seem to reduce plurals in some cases where it
seems it should - like "bounds" was a problem - it won't match "bound,"
even though many (most) other plurals will match their singular form,
and verbs get reduced to their stems as well. I thought maybe this was
because there is also a homograph (same spelling, different word) that is
*not* a plural or verb ("bounds" as boundary as in "out of bounds"??),
but I'm not really sure how KStem's word lists were put together or what
the goal was. Maybe this was just an oversight?
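To make the "augment it with synonyms" idea concrete, here is a rough sketch in plain Java. It uses no Lucene classes: toyConservativeStem is a made-up stand-in for a conservative stemmer (it is not KStem and does not reproduce its rules), and the OVERRIDES and PROTECTED tables are hypothetical. In a real Lucene analysis chain this role would be played by something like a synonym filter or a stemmer override step, so treat this only as an illustration of the shape of the fix.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch only: patching a conservative stemmer with a small override table.
final class StemOverrideSketch {

    // Hypothetical hand-maintained fixes for words the stemmer leaves alone.
    private static final Map<String, String> OVERRIDES = Map.of("bounds", "bound");

    // Hypothetical list of words the toy stemmer refuses to touch,
    // standing in for whatever makes a real stemmer skip "bounds".
    private static final Set<String> PROTECTED = Set.of("bounds");

    // Toy stand-in for a conservative stemmer: strips a trailing "s"
    // unless the word is protected, very short, or ends in "ss".
    static String toyConservativeStem(String token) {
        if (PROTECTED.contains(token)) {
            return token;
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Consult the override table first, then fall back to the stemmer.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String override = OVERRIDES.get(lower);
        return override != null ? override : toyConservativeStem(lower);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"bounds", "bound", "cats", "glass"}) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

The point of checking the override table before stemming is that you keep the conservative stemmer's low false-match rate and only add the specific conflations you have actually seen missing in your own data.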
YMMV; it depends a lot on what you are trying to achieve.
-Mike