On 19.01.2007, at 20:47, William Morgan wrote:

> Perhaps I'm wrong; I've never verified it empirically. I'm of the
> opinion that the whole concept of stopwords is a relic of 1970's
> technology and the TREC ad-hoc query paradigm, neither of which are
> particularly relevant for modern-day web search, so I typically turn
> them off.
Could you elaborate on that, please? What exactly has changed since the '70s that is no longer relevant, and what is the TREC ad-hoc query paradigm anyway?

My understanding is that stop words reduce the size of the index (and hence speed up queries) by filtering out words that occur frequently in almost any text of considerable length. Isn't it even worse if you store term vectors?

I'd turn off stop words right away if there were no considerable impact on performance, but I'd like a little more information on that. I'd appreciate it if you could give me some pointers.

Thanks!

Andy

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

