On 19.01.2007, at 20:47, William Morgan wrote:

> Perhaps I'm wrong; I've never verified it empirically. I'm of the
> opinion that the whole concept of stopwords is a relic of 1970's
> technology and the TREC ad-hoc query paradigm, neither of which are
> particularly relevant for modern-day web search, so I typically turn
> them off.

Could you elaborate on that, please? What exactly has changed since
the '70s that makes stop words irrelevant now, and what is the TREC
ad-hoc query paradigm anyway?

My understanding is that stop word filtering reduces the size of the
index (and hence speeds up queries) by dropping words that occur
frequently in almost any text of considerable length. Isn't the cost
of keeping those words even higher if you store term vectors?
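
To make the size argument concrete, here is a minimal sketch in Python.
It is a toy inverted index, not Ferret's API; the stop word list, the
two documents, and the helper names are all made up for illustration.
It only counts postings with and without filtering, and says nothing
about real query speed.

```python
from collections import defaultdict

# Hypothetical stop word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "of", "and", "in", "is", "to"})

def build_index(docs, stop_words=frozenset()):
    """Map each term to its list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            if term not in stop_words:
                index[term].append((doc_id, pos))
    return index

def posting_count(index):
    """Total number of postings stored across all terms."""
    return sum(len(postings) for postings in index.values())

# Made-up sample documents.
DOCS = [
    "the size of the index is a function of the text",
    "the cat sat in the hat and the dog sat in the log",
]

full = build_index(DOCS)
filtered = build_index(DOCS, STOP_WORDS)

print("postings without filtering:", posting_count(full))
print("postings with filtering:   ", posting_count(filtered))
```

On this toy input the unfiltered index holds 24 postings and the
filtered one 10. A per-document term vector would carry the same extra
entries again, which is why I suspect the overhead is larger there.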

I'd turn off stop words right away if there weren't any considerable
impact on performance, but I'd like to have a little more information
on that. I'd appreciate it if you could give some pointers.

Thanks!
Andy
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
