Grant Ingersoll wrote:
OK, so how do we get this started? Seems like there are a lot of collections out there we could use. Also, we can crawl. Seems the tricky part is getting judgments.

I think we should establish first what kind of relevance judgments we want to collect:

1. Given a corpus and a query, define an ordered list of the top-N documents relevant to the query. This is our baseline. Collecting this sort of information is very time-consuming and subjective.

2. Given a corpus, a query, and a list of top-N results obtained from a real search, define which results are relevant and how they should be ordered. The reviewed list of top-N results then becomes the initial approximation of our baseline. Calculate a distance metric between the real and the reviewed results, and adjust the ranking to minimize this distance.
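One plausible choice for the distance metric in this scenario is Kendall's tau distance between the engine's ranking and the reviewer's reordering, i.e. the normalized count of document pairs the two orderings disagree on. A minimal sketch, assuming rankings are simple lists of document ids (function and variable names here are illustrative, not an agreed API):

```python
def kendall_tau_distance(ranked_real, ranked_reviewed):
    """Fraction of discordant pairs between two rankings of the same
    documents: 0.0 means identical order, 1.0 means fully reversed."""
    pos = {doc: i for i, doc in enumerate(ranked_reviewed)}
    # Compare only documents present in both lists.
    docs = [d for d in ranked_real if d in pos]
    n = len(docs)
    if n < 2:
        return 0.0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Pair (i, j) is ordered i-before-j in the real ranking;
            # count it if the reviewed ranking reverses that order.
            if pos[docs[i]] > pos[docs[j]]:
                discordant += 1
    return discordant / (n * (n - 1) / 2)
```

Driving the distance toward zero by tuning ranking parameters is then the optimization step described above.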

The second scenario could be handled by a webapp, which could present the following areas of functionality:

* corpus selection and browsing

* searching using the selected search implementation and its ranking parameters, and storing tuples of <corpus, impl, query, results>

* review of the results (marking relevant / non-relevant, reordering), and saving tuples of <corpus, impl, query, reviewed results>

* calculation of distance metrics.

* adjustment of ranking parameters for a given search implementation.
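The tuples the webapp stores could be as simple as a small record type carrying both the raw and the reviewed result lists. A sketch, with purely illustrative field names (nothing here is an agreed schema):

```python
from dataclasses import dataclass, field

@dataclass
class JudgedRun:
    """One stored tuple <corpus, impl, query, results>, plus the
    reviewer's version of the result list once it exists."""
    corpus: str                # corpus identifier
    impl: str                  # search implementation id, e.g. "lucene-default"
    query: str                 # the query string as submitted
    results: list              # ranked doc ids as returned by the engine
    reviewed: list = field(default_factory=list)  # reviewer's reordered list

    def is_reviewed(self) -> bool:
        return bool(self.reviewed)
```

The distance-metric and parameter-adjustment steps would then operate on pairs of `results` / `reviewed` lists pulled from these records.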

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
