Grant Ingersoll wrote:
> OK, so how do we get this started? Seems like there are a lot of
> collections out there we could use. Also, we can crawl. Seems the
> tricky part is getting judgments.

I think we should establish first what kind of relevance judgments we
want to collect:
1. given a corpus and a query, define an ordered list of the top-N
documents that are relevant to the query. This is our baseline. Getting
this sort of information is very time-consuming and subjective.
2. given a corpus, a query and a list of top-N results obtained from a
real search, define which results are relevant and how they should be
ordered. The reviewed list of top-N results then becomes the initial
approximation of our baseline. Calculate a distance metric between the
real and reviewed results, and adjust ranking to minimize this distance.
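
To make the distance idea in scenario 2 concrete, here is a minimal sketch in Python. The choice of metric is an assumption, not something settled in this thread: it uses a normalized Kendall tau distance, i.e. the fraction of document pairs that the real and reviewed lists order differently. The function name and the document ids are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(real, reviewed):
    """Fraction of document pairs ordered differently in the two lists.

    Only documents present in both lists are compared. 0.0 means the
    shared documents appear in the same order, 1.0 means exactly reversed.
    """
    pos_reviewed = {doc: i for i, doc in enumerate(reviewed)}
    # Keep the real-search order, dropping documents the reviewer removed.
    common = [doc for doc in real if doc in pos_reviewed]
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    # In each pair, a precedes b in the real list; it is discordant if
    # the reviewer placed a after b.
    discordant = sum(1 for a, b in pairs if pos_reviewed[a] > pos_reviewed[b])
    return discordant / len(pairs)
```

For example, comparing ["d1", "d2", "d3"] against a reviewed list ["d2", "d1", "d3"] flips one of the three pairs. This sketch ignores documents the reviewer marked non-relevant; a fuller metric would probably penalize those as well.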
The second scenario could be handled by a webapp, which could present
the following areas of functionality:
* corpus selection and browsing
* searching using selected search impl and its ranking parameters, and
storing tuples of <corpus, impl, query, results>
* review of the results (marking relevant / non-relevant, reordering),
and saving of tuples <corpus, impl, query, reviewed results>
* calculation of distance metrics.
* adjustment of ranking parameters for a given search implementation.
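
The store / review / measure / adjust loop behind these bullets might look roughly like the sketch below. Everything in it is an assumption made for illustration: `SearchRecord`, `toy_search`, `footrule` and `tune` are hypothetical names, the toy scorer stands in for a real search implementation, and the distance is a Spearman footrule (sum of rank displacements) rather than any metric agreed on here.

```python
from typing import NamedTuple

class SearchRecord(NamedTuple):
    """One stored tuple <corpus, impl, query, results>."""
    corpus: str
    impl: str       # implementation name plus the parameter value used
    query: str
    results: tuple  # document ids in ranked order

def toy_search(corpus, query, title_boost):
    """Hypothetical stand-in for a real search implementation.

    corpus maps doc id -> (title, body); score is a boosted term count.
    """
    scores = {doc_id: title_boost * title.count(query) + body.count(query)
              for doc_id, (title, body) in corpus.items()}
    # Highest score first; ties broken by doc id for determinism.
    return tuple(sorted(scores, key=lambda d: (-scores[d], d)))

def footrule(real, reviewed):
    """Spearman footrule: total rank displacement over shared documents."""
    pos = {doc: i for i, doc in enumerate(reviewed)}
    return sum(abs(i - pos[doc]) for i, doc in enumerate(real) if doc in pos)

def tune(search, corpus_name, corpus, query, reviewed, candidates):
    """Try each candidate parameter; keep the one closest to the review.

    Returns ((best_param, best_distance), log_of_search_records).
    """
    log, best = [], None
    for param in candidates:
        results = tuple(search(corpus, query, param))
        log.append(SearchRecord(corpus_name, f"{search.__name__}({param})",
                                query, results))
        distance = footrule(results, reviewed)
        if best is None or distance < best[1]:
            best = (param, distance)
    return best, log
```

A real tool would aggregate the distance over many queries and reviewers before changing any parameter; tuning against a single reviewed list, as here, would overfit.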
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com