Marvin Humphrey wrote:
> Greets,
>
> I'm looking for a test corpus to use for some benchmarking and parsing tests. I can whip one up myself, but it would be nice to use something standardized. I'd like something that doesn't require a license/fee, so that other people can run the same tests. At least 1000 docs, a few hundred words each. Any suggestions?

The 20 Newsgroups collection and the old Reuters corpus are both freely available, and each contains a sufficient number of documents.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


