They shouldn't be in the index.  You must be using StandardAnalyzer 
incorrectly, or maybe you think you are using it, but are really using 
something else.

Otis

----- Original Message ----
From: Jason Polites <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, September 2, 2006 9:05:27 AM
Subject: Stop words in index

Hey all,

I am using the StandardAnalyzer with my own list of stop words (which is
more comprehensive than the default list), and my expectation was that this
would omit these stop words from the index when data is indexed using this
analyzer.  However, I am seeing stop words in the term vector for documents
indexed with this analyzer.

Is this expected behaviour?  Is there any way I can force these stop words
to be omitted from the index?  Having them in the index is wreaking havoc
with term vector analysis to determine document similarity.

Thanks.




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to