What is the appropriate way of achieving both stopwords and stemming of
stopwords when the MoreLikeThis class is used? My analyzer
(MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized
with a stopwords set:
analyzer = new StandardAnalyzer(stopwords) {
public TokenStream tokenStream(String fieldName,
java.io.Reader reader) {
return new SnowballFilter(super
.tokenStream(fieldName,reader),
"English");
}
};
If I do NOTsupply a separate stopwords list to the MoreLikeThis object
(that is, using MoreLikeThis.setStopWords), will "the right thing" happen;
that is, will my input text to the MoreLikeThis object be stemmed and
(stemmed) stopwords removed before a query is formed? It seems that
MoreLikeThis.setStopWords uses a simple lookup of words in the stop words
list (no stemming) which is not what I want.
Thanks in advance
Donna
Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]