Stop-word processing can be done at index time or at query time.
Except in extreme applications, it is generally good practice to do it at
query time, so that the stop words stay in the index and remain available
for phrase queries.  Classic examples are searching for "The Inc" (a
company name) or "to be or not to be" (a famous quote).
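
To make the trade-off concrete, here is a minimal sketch in plain Python.
It is a toy positional index, not Lucene or Solr code; the stop list, the
sample documents, and the function names (build_index, keyword_search,
phrase_search) are all made up for illustration.  It only shows why
phrase queries need the stop words to be present in the index.

```python
import re
from collections import defaultdict

# Hypothetical toy stop list, for illustration only.
STOP_WORDS = {"the", "to", "be", "or", "not", "a", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, drop_stop_words=False):
    """Build a positional index: term -> {doc_id: [positions]}.

    With drop_stop_words=True this mimics index-time stopping.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            if drop_stop_words and term in STOP_WORDS:
                continue
            index[term][doc_id].append(pos)
    return index

def keyword_search(index, query):
    """Query-time stopping: drop stop words from the query, AND the rest."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    result = None
    for term in terms:
        matching = set(index.get(term, {}))
        result = matching if result is None else result & matching
    return result

def phrase_search(index, phrase):
    """Exact phrase match on positions; stop words in the phrase are kept."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "To be or not to be, that is the question",
    2: "The Inc reported earnings",
    3: "Reported earnings of the quarter",
}
full_index = build_index(docs)                           # stop words kept
stopped_index = build_index(docs, drop_stop_words=True)  # index-time stopping

print(phrase_search(full_index, "to be or not to be"))     # {1}
print(phrase_search(stopped_index, "to be or not to be"))  # set()
print(phrase_search(full_index, "The Inc"))                # {2}
print(keyword_search(full_index, "the reported earnings")) # {2, 3}
```

The flip side is what matters for your question: in the full index the
stop words are ordinary terms, so anything that reads the index directly
will see them unless it applies its own stop list.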

I can't comment on your Solr setup, but it is plausible that Solr is
removing stop words at query time and leaving them in your index, where
the vectorizer finds them.  Perhaps Grant can comment more authoritatively
on how Solr works.

On Sat, Jan 2, 2010 at 6:31 PM, Bogdan Vatkov <[email protected]> wrote:

> I am still not an expert in reading from Lucene index - is it possible that
> the Vector generation uses some "raw" reading of the Solr/Lucene index and
> thus getting the stopwords?
>



-- 
Ted Dunning, CTO
DeepDyve
