On Aug 1, 2009, at 2:27 PM, Simon Willnauer wrote:
On Sat, Aug 1, 2009 at 1:23 AM, Andrzej Bialecki<[email protected]> wrote:
Grant Ingersoll wrote:
OK, so how do we get this started? Seems like there are a lot of
collections out there we could use. Also, we can crawl. Seems the tricky
part is getting judgments.
I think we should establish first what kind of relevance judgments we want
to collect:
This looks like two different things.
One thing is deciding what we use to get "a" collection of documents -
a corpus. It seems to be a very good idea to me to create a
heterogeneous collection of documents such as Wikipedia to kick off
ORP. I guess we do not need a huge collection of documents to get
started, right?!
@Grant: I might have missed something, but do we have a list of available
collections on some wiki page?! It would be great to have something like
that.
Not yet, we have some on Mahout