Re: Test corpus
Marvin Humphrey wrote:
> Greets,
>
> I'm looking for a test corpus to use for some benchmarking and parsing tests. I can whip one up myself, but it would be nice to use something standardized. I'd like something that doesn't require a license/fee, so that other people can run the same tests. At least 1000 docs, a few hundred words each. Any suggestions?

The 20 Newsgroups collection and the old Reuters corpus are both freely available, and each contains a sufficient number of documents.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: Test corpus
Take a look at Project Gutenberg: http://www.gutenberg.org/

Igor

On 4/1/06, Pasha Bizhan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> > I'm looking for a test corpus to use for some benchmarking
> > and parsing tests. I can whip one up myself, but it would be
> > nice to use something standardized. I'd like something that
> > doesn't require a license/fee, so that other people can run
> > the same tests. At least 1000 docs, a few hundred words
> > each. Any suggestions?
>
> See the Corpora section at http://wiki.apache.org/jakarta-lucene/Resources
>
> Pasha Bizhan
RE: Test corpus
Hi,

> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> I'm looking for a test corpus to use for some benchmarking
> and parsing tests. I can whip one up myself, but it would be
> nice to use something standardized. I'd like something that
> doesn't require a license/fee, so that other people can run
> the same tests. At least 1000 docs, a few hundred words
> each. Any suggestions?

See the Corpora section at http://wiki.apache.org/jakarta-lucene/Resources

Pasha Bizhan