There is a list of stop words in the NutchAnalysis class (package
org.apache.nutch.analysis). I guess that's where the common terms are removed
during analysis.
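
To illustrate what I mean, here is a rough sketch. It is not the actual
NutchAnalysis code: the class name, method name, and word list below are all
made up for the example. The idea is that if the stop list is hard-coded in
the analyzer and applied while the query text is being parsed, those terms
are dropped before any per-field clauses are built, which would explain why
they vanish for every field and why editing common-terms.utf8 changes nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not the NutchAnalysis source.
class StopWordSketch {
    // Assumed sample list; the real list lives in the analysis code.
    private static final Set<String> STOP_SET =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Returns the terms that are not stop words, in their original order.
    // Matching is case-insensitive; kept terms are returned unchanged.
    static List<String> stripStopWords(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_SET.contains(term.toLowerCase(Locale.ROOT))) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "the" and "a" are dropped before any field-specific handling.
        System.out.println(
            stripStopWords(Arrays.asList("the", "apache", "nutch", "a", "crawler")));
    }
}
```

If that is roughly how it works, the set of words stripped from searches is
decided by the list compiled into the analyzer, not by the config file.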

--Rajesh Munavalli

On 3/30/06, Vanderdray, Jacob <[EMAIL PROTECTED]> wrote:
>        I've added some code to query-basic to log the query after it
> has run both addTerms and addPhrases.  This helps me to better
> understand what's going on.  I've noticed that when my search contains
> words like "the" or "a", those don't appear in the actual query.
>        It looks to me like the common-terms.utf8 file is supposed to be
> used to strip common words like "the" out of queries for specific
> fields, but that doesn't seem to be what's happening.  The term "the"
> ends up getting stripped out of the query for all fields (url, content,
> anchor, etc.).  I even tried removing "the" from the common-terms.utf8
> file, but didn't see any change in behavior.
>        Does this file only get used when indexing?  If so, what
> determines which words get stripped out of searches?
> Thanks,
> Jake.
