Marvin Humphrey wrote:
Greets,
I'm looking for a test corpus to use for some benchmarking and parsing
tests. I can whip one up myself, but it would be nice to use
something standardized. I'd like something that doesn't require a
license/fee, so that other people can run the same tests. At least
1000 docs, a few hundred words each. Any suggestions?
20 Newsgroups and the old Reuters corpus are both freely available, and
each contains a sufficient number of documents.
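
Whichever corpus is picked, it may be worth confirming that the unpacked
copy really meets the stated bar (at least 1000 docs, a few hundred words
each). Below is a minimal sketch of such a check, not a definitive tool. It
assumes one plain-text document per file under a root directory; a corpus
that packs several documents into one file would need to be split first.
The names corpus_stats and meets_requirements are hypothetical, and the
200-word threshold is only a guess at what "a few hundred words" means.

```python
import os


def corpus_stats(root):
    """Return (document_count, mean_words_per_document) for all files under root."""
    doc_count = 0
    word_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Older corpora are not reliably UTF-8; latin-1 maps every byte
            # to a character, so decoding never raises.
            with open(path, encoding="latin-1") as handle:
                # Whitespace-separated tokens are a rough stand-in for words.
                word_total += len(handle.read().split())
            doc_count += 1
    mean_words = word_total / doc_count if doc_count else 0.0
    return doc_count, mean_words


def meets_requirements(root, min_docs=1000, min_mean_words=200):
    """Check the corpus against the thresholds (both defaults are assumptions)."""
    doc_count, mean_words = corpus_stats(root)
    return doc_count >= min_docs and mean_words >= min_mean_words
```

Calling meets_requirements on the directory where the corpus was unpacked
gives a quick yes/no; corpus_stats returns the raw numbers if the
thresholds need adjusting.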
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com