On Fri, 03 Dec 2010 Thilo Goetz <[email protected]> wrote:
> Exactly. I would really like to create some training data
> that's under a permissive license. It's surprising how
> little of that there is in general. It would be a lot of
> work, but maybe we'll find help.

If you are interested in gathering training data sets under permissive licences, you may want to have a look at the Lucene Open Relevance project:

http://lucene.apache.org/openrelevance/

From the project explanation:

"The Open Relevance Project (ORP) is a new Apache Lucene sub-project aimed at making materials for doing relevance testing for Information Retrieval (IR), Machine Learning and Natural Language Processing (NLP) into open source."

Isabel
