FYI, Mike wrote some evaluation stuff for Nutch a long time ago. I found it in the SourceForge Attic:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/Attic/

This worked by querying a set of search engines, those in:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/engines/

The results of each engine are scored by how much they differ from the results of all the other engines combined. The Kendall Tau distance is used to compare rankings. Thus this is a good tool for finding out how close Nutch is to the quality of other engines, but it may not be a good tool for making Nutch better than other search engines.
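
To make the scoring idea concrete, here is a minimal sketch in Python (not the original Java code). It assumes each engine returns an ordered list of URLs with no duplicates, compares only the URLs two engines have in common, and averages an engine's distance to every other engine. The function names and the averaging step are illustrative assumptions; the Attic code may combine the other engines' rankings differently.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Normalized Kendall tau distance over items present in both rankings.

    Returns 0.0 for identical order, 1.0 for fully reversed order.
    Assumption: items missing from either ranking are ignored.
    """
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    # Keep only shared items, in the order ranking_a lists them.
    common = [item for item in ranking_a if item in pos_b]
    pairs = len(common) * (len(common) - 1) // 2
    if pairs == 0:
        return 0.0
    # A pair is discordant when ranking_b orders it the opposite way.
    discordant = sum(
        1 for x, y in combinations(common, 2) if pos_b[x] > pos_b[y]
    )
    return discordant / pairs


def disagreement_scores(results):
    """Average distance from each engine's ranking to all other engines.

    `results` maps a (hypothetical) engine name to its ranked result list.
    """
    scores = {}
    for name, ranking in results.items():
        others = [r for other, r in results.items() if other != name]
        if not others:
            scores[name] = 0.0
            continue
        total = sum(kendall_tau_distance(ranking, r) for r in others)
        scores[name] = total / len(others)
    return scores
```

A low score means an engine mostly agrees with the rest, which is why this measures closeness to the consensus rather than absolute quality.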

In any case, it includes a system to scrape search results from other engines, based on Apple's Sherlock search-engine descriptors. These descriptors are also used by Mozilla:

http://mycroft.mozdev.org/deepdocs/quickstart.html

So there's a ready supply of up-to-date descriptions for most major search engines. Many engines provide a skin specifically to simplify parsing by these plugins.
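
As a rough illustration of how such a descriptor drives scraping: as I understand the format, a Sherlock plugin names literal marker strings that delimit the result list and each result item (attributes such as resultListStart, resultListEnd, resultItemStart and resultItemEnd). The Python sketch below only shows that marker-based extraction step. The function name and the sample page are made up for illustration, and it is not a port of the Nutch code.

```python
def extract_items(page, list_start, list_end, item_start, item_end):
    """Return the text of each result item found between marker strings.

    Only the region between list_start and list_end is scanned, so
    matching markers elsewhere on the page are ignored.
    """
    begin = page.find(list_start)
    if begin == -1:
        return []
    begin += len(list_start)
    end = page.find(list_end, begin)
    if end == -1:
        end = len(page)  # no closing marker: scan to the end of the page
    region = page[begin:end]

    items = []
    cursor = 0
    while True:
        start = region.find(item_start, cursor)
        if start == -1:
            break
        start += len(item_start)
        stop = region.find(item_end, start)
        if stop == -1:
            break
        items.append(region[start:stop].strip())
        cursor = stop + len(item_end)
    return items
```

For example, with a page of `<ol><li>one</li><li>two</li></ol>` and the markers `<ol>`, `</ol>`, `<li>`, `</li>`, the function returns the two item bodies in page order.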

The code that implemented Sherlock plugins in Nutch is at:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/dynamic/

Doug

Andrzej Bialecki wrote:
Hi,

I found this paper, more or less by accident:

"Scaling IR-System Evaluation using Term Relevance Sets"; Einat Amitay, David Carmel, Ronny Lempel, Aya Soffer

   http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf

It gives an interesting and rather simple framework for evaluating the quality of search results.

Anybody interested in hacking together a component for Nutch and e.g. for Google, to run this evaluation? ;)



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
