FYI, Mike wrote some evaluation stuff for Nutch a long time ago. I
found it in the SourceForge CVS Attic:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/Attic/
This worked by querying a set of search engines, namely those in:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/engines/
Each engine's results are scored by how much they differ from the
combined results of all the other engines. The Kendall Tau distance is
used to compare the rankings. Thus this is a good tool to find out how
close Nutch is to the quality of other engines, but it may not be a
good tool to make Nutch better than other search engines.
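For reference, the Kendall Tau distance just counts the fraction of
result pairs that two rankings put in opposite order. A minimal sketch
of the idea in Java (not the actual code from net.nutch.quality; the
class and method names are made up, and it only compares URLs that both
engines returned):

import java.util.*;

/**
 * Sketch of the Kendall Tau distance between two ranked result lists,
 * restricted to the URLs that appear in both.  0.0 means the common
 * URLs are ranked in the same order, 1.0 means reverse order.
 */
public class KendallTau {

  public static double distance(List<String> a, List<String> b) {
    // Record each URL's position in each ranking.
    Map<String,Integer> posA = positions(a);
    Map<String,Integer> posB = positions(b);

    // Keep only the URLs that both engines returned, in a's order.
    List<String> common = new ArrayList<String>();
    for (String url : a)
      if (posB.containsKey(url))
        common.add(url);

    int n = common.size();
    if (n < 2) return 0.0;                // no pairs to compare

    // Count pairs that the two rankings order differently.
    int discordant = 0;
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        int orderA = posA.get(common.get(i)) - posA.get(common.get(j));
        int orderB = posB.get(common.get(i)) - posB.get(common.get(j));
        if ((long) orderA * orderB < 0)   // opposite signs: discordant
          discordant++;
      }
    }
    return 2.0 * discordant / (n * (n - 1));
  }

  private static Map<String,Integer> positions(List<String> urls) {
    Map<String,Integer> pos = new HashMap<String,Integer>();
    for (int i = 0; i < urls.size(); i++)
      pos.put(urls.get(i), i);
    return pos;
  }

  public static void main(String[] args) {
    List<String> engine1 = Arrays.asList("a", "b", "c", "d");
    List<String> engine2 = Arrays.asList("b", "a", "d", "e");
    System.out.println(distance(engine1, engine2)); // prints 0.333...
  }
}

To score one engine against all the others combined, one could merge
the other engines' rankings first (e.g. by median rank) and then take
the distance from that merged ranking.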
In any case, it includes a system to scrape search results from other
engines, based on Apple's Sherlock search-engine descriptors. These
descriptors are also used by Mozilla:
http://mycroft.mozdev.org/deepdocs/quickstart.html
So there's a ready supply of up-to-date descriptors for most major
search engines. Many engines provide a skin specifically to simplify
parsing by these plugins.
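For anyone who hasn't seen one, a Sherlock descriptor is a small
HTML-like text file that tells the client how to submit a query to an
engine and where the result list starts and stops in the returned page.
Something along these lines (a hand-written illustration, not a real
Mycroft plugin; the engine name, URL, and markers are all invented):

<search
   name="Example Engine"
   method="GET"
   action="http://search.example.com/search"
>

<input name="q" user>
<input name="num" value="20">

<interpret
   resultListStart="<!-- begin results -->"
   resultListEnd="<!-- end results -->"
   resultItemStart="<dt>"
   resultItemEnd="</dd>"
>

</search>

The <input> lines define the query parameters ("user" marks where the
user's query text goes), and the <interpret> markers are what the
scraper uses to cut the individual result items out of the HTML.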
The code that implemented Sherlock plugins in Nutch is at:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/dynamic/
Doug
Andrzej Bialecki wrote:
Hi,
I found this paper, more or less by accident:
"Scaling IR-System Evaluation using Term Relevance Sets"; Einat Amitay,
David Carmel, Ronny Lempel, Aya Soffer
http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf
It gives an interesting and rather simple framework for evaluating the
quality of search results.
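The rough idea, as I read it: instead of judging individual documents,
for each query you prepare a "Trel" - a list of terms that relevant
pages should contain (and optionally terms that signal irrelevance) -
and then judge each retrieved page automatically by the terms it
contains. A sketch of what the judging step might look like (my own
guess at the details; the decision rule and names are made up):

import java.util.*;

/**
 * Sketch of a Trels-style automatic relevance judgment: a page counts
 * as relevant if it mentions enough of the topic's "relevant" terms
 * and not too many "irrelevant" ones.  The thresholds are invented.
 */
public class TrelsJudge {
  public static boolean isRelevant(String pageText,
                                   Set<String> relevantTerms,
                                   Set<String> irrelevantTerms) {
    String text = pageText.toLowerCase();
    int hits = 0, misses = 0;
    for (String term : relevantTerms)
      if (text.contains(term.toLowerCase())) hits++;
    for (String term : irrelevantTerms)
      if (text.contains(term.toLowerCase())) misses++;
    // Invented rule: at least half the relevant terms present, and
    // more relevant-term hits than irrelevant-term hits.
    return 2 * hits >= relevantTerms.size() && hits > misses;
  }
}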
Anybody interested in hacking together a component for Nutch (and,
e.g., for Google) to run this evaluation? ;)