Re: Getting Started

Andrzej Bialecki Fri, 31 Jul 2009 16:24:02 -0700

Grant Ingersoll wrote:

OK, so how do we get this started? Seems like there are a lot of collections out there we could use. Also, we can crawl. Seems the tricky part is getting judgments.

I think we should establish first what kind of relevance judgments we want to collect:

1. given a corpus and a query, define an ordered list of top-N documents that are relevant to the query. This is our baseline. Getting this sort of information is very time-consuming and subjective.

2. given a corpus, a query and a list of top-N results obtained from a real search, define what results are relevant and how they should be ordered. The reviewed list of top-N results then becomes the initial approximation of our baseline. Calculate a distance metric between real and reviewed results, and adjust ranking to maximize this metric.
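
To make the "distance metric between real and reviewed results" concrete, here is a minimal sketch in Python. It assumes a normalized Kendall tau distance, which is only one possible choice and is not something fixed in this thread; the function name and the document ids are made up for illustration. With a distance of this kind, 0.0 means the two orderings agree and 1.0 means one is the reverse of the other, so tuning would aim for a small distance (or, equivalently, a large agreement score of 1 - distance).

```python
from itertools import combinations


def kendall_tau_distance(real, reviewed):
    """Fraction of document pairs that the two rankings order differently.

    Only documents present in both lists are compared, since a reviewer
    may drop non-relevant results. Returns 0.0 if fewer than two
    documents are shared.
    """
    # Position of each document in the reviewed (human-corrected) list.
    reviewed_pos = {doc: i for i, doc in enumerate(reviewed)}
    # Shared documents, kept in the order the search engine returned them.
    common = [doc for doc in real if doc in reviewed_pos]
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    # In each pair, a precedes b in the real ranking; the pair is
    # discordant when the reviewer placed b ahead of a.
    discordant = sum(1 for a, b in pairs if reviewed_pos[a] > reviewed_pos[b])
    return discordant / len(pairs)


# Hypothetical example: the reviewer swapped the first two results.
real_results = ["d1", "d2", "d3", "d4"]
reviewed_results = ["d2", "d1", "d3", "d4"]
distance = kendall_tau_distance(real_results, reviewed_results)
```

Here one of the six document pairs is ordered differently, so the distance is 1/6.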

The second scenario could be handled by a webapp, which could present the following areas of functionality:


* corpus selection and browsing

* searching using selected search impl and its ranking parameters, and storing tuples of <corpus, impl, query, results>

* review of the results (marking relevant / non-relevant, reordering), and saving of tuples <corpus, impl, query, reviewed results>


* calculation of distance metrics.

* adjustment of ranking parameters for a given search implementation.
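
The two kinds of tuples mentioned above could be modeled roughly as follows. This is a sketch only: the class names, field names, the review() helper and the sample values are all hypothetical, and a real webapp would also need persistence and some representation of the ranking parameters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchRun:
    """The stored tuple <corpus, impl, query, results>."""
    corpus: str
    impl: str                  # label for the search impl and its parameters
    query: str
    results: Tuple[str, ...]   # document ids in the order returned


@dataclass(frozen=True)
class ReviewedRun:
    """The stored tuple <corpus, impl, query, reviewed results>."""
    corpus: str
    impl: str
    query: str
    reviewed_results: Tuple[str, ...]  # relevant ids in the reviewer's order


def review(run, relevant_in_order):
    """Build a ReviewedRun from the ids a reviewer kept, in their order.

    Documents left out are treated as non-relevant. Ids that were not
    in the original result list are rejected.
    """
    unknown = [doc for doc in relevant_in_order if doc not in run.results]
    if unknown:
        raise ValueError(f"not in original results: {unknown}")
    return ReviewedRun(run.corpus, run.impl, run.query,
                       tuple(relevant_in_order))


# Hypothetical example values.
run = SearchRun("sample-corpus", "impl-a", "open source search",
                ("d1", "d2", "d3"))
reviewed = review(run, ["d2", "d1"])  # d3 judged non-relevant
```

Keeping the original and reviewed runs as separate records means the same query can be re-run with different ranking parameters and compared against the same reviewed list.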

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
