Re: [Ferret-talk] not understanding search results

Marvin Humphrey Sat, 31 Mar 2007 10:47:34 -0800

On Mar 31, 2007, at 10:41 AM, Andreas Korth wrote:

> @David: You should probably consider changing StandardAnalyzer not to
> use stop words by default. It confuses people because no one would
> suspect such a feature to be enabled by default. It just doesn't
> follow the principle of least astonishment.
>
> Even if people want to use stop words, they might not be happy with
> the ones built into Ferret. It very much depends on the nature of the
> content that is indexed, and instead of using a one-size-fits-all stop
> word list, one is usually better off compiling a custom one for
> any particular application.


I concur.  Ferret's StandardAnalyzer is based upon Lucene's class of  
the same name, so some parallelism would be lost, but I think  
omitting stop lists is better nonetheless.

Omitting stop lists by default has performance and disk-space costs:
common words get indexed, so the index is larger and queries that
contain them take longer.  However, disk space is cheap, Ferret is
fast, and search results are slightly better when you avoid stop lists
(e.g. searching for '"the who"' actually returns something).  Users
with large deployments will still be able to trade away some amount of
IR precision for increased performance by enabling stop lists if they
so choose.
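
To make the '"the who"' case concrete, here is a toy sketch in plain
Ruby.  It does not use Ferret's API; the analyze, build_index and
phrase_search helpers, the STOP_WORDS list and the sample documents
are all made up for illustration, and the stop list is not Ferret's
built-in one.

```ruby
# Hypothetical stop list for illustration only (not Ferret's built-in list).
STOP_WORDS = %w[a an and for in of the to who].freeze

# Made-up sample documents.
DOCS = [
  "The Who released Tommy in 1969",
  "Who wrote the liner notes for the album"
].freeze

# Lowercase, split on non-alphanumerics, then drop any stop words.
def analyze(text, stop_words = [])
  text.downcase.scan(/[a-z0-9]+/).reject { |t| stop_words.include?(t) }
end

# Build a positional inverted index: term => [[doc_id, position], ...].
# Positions are counted after stop words have been removed.
def build_index(docs, stop_words = [])
  index = Hash.new { |h, k| h[k] = [] }
  docs.each_with_index do |doc, id|
    analyze(doc, stop_words).each_with_index do |term, pos|
      index[term] << [id, pos]
    end
  end
  index
end

# Return the ids of documents that contain the analyzed phrase terms
# at consecutive positions. An empty phrase matches nothing.
def phrase_search(index, phrase, stop_words = [])
  first, *rest = analyze(phrase, stop_words)
  return [] if first.nil?
  hits = index.fetch(first, []).select do |id, pos|
    rest.each_with_index.all? do |term, i|
      index.fetch(term, []).include?([id, pos + i + 1])
    end
  end
  hits.map(&:first).uniq
end

plain   = build_index(DOCS)               # no stop list
stopped = build_index(DOCS, STOP_WORDS)   # with the toy stop list

p phrase_search(plain, '"the who"')                 # => [0]
p phrase_search(stopped, '"the who"', STOP_WORDS)   # => []
p plain.values.sum(&:size)                          # => 14 postings
p stopped.values.sum(&:size)                        # => 7 postings
```

With the toy stop list, every term of the query is discarded before
the search runs, so nothing can match; without it, the phrase finds
the first document, at the cost of twice as many postings in this
tiny example.  Real indexes will show a different ratio.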

KinoSearch doesn't have a StandardAnalyzer; a class called  
PolyAnalyzer fills that role.  By default, it performs lowercasing,  
tokenizing and stemming -- but no stopalizing.
<http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Analysis/PolyAnalyzer.html>

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
