On 4/27/11 9:04 PM, Chris Collins wrote:
> 1) I can understand that you cannot distribute the original training set for
> English etc., perhaps because of distribution rights. Knowing where, or at
> least the flavor of where, the original corpus came from would be nice.
> Knowing what type of people labeled the data, how many of them there were,
> and how much of it was labeled would be useful in determining if we are off.
This is actually on my to-do list. We need to create a wiki page or something
similar to document the training data the English models have been trained on.
All the other models are mostly trained on public data.
> 2) What are the planned models? Are there any existing open source projects
> that want help on these exercises?
There are no plans from my side. If you know of a public corpus you would like
to train OpenNLP on, we are happy to add native support for it, like we already
did for a couple of corpora.
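
To give a rough idea of what "native support" means: a corpus format is hooked
in via a reader that turns the raw data into OpenNLP sample objects, which the
trainer then consumes through the ObjectStream interface. Below is a rough,
untested sketch of such an adapter for the name finder; the one-sentence-per-line
"word/TAG" input format and the class name are made up for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import opennlp.tools.namefind.NameSample;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.Span;

// Sketch of a corpus adapter: reads one sentence per line in a made-up
// "word/TAG" format (TAG is "O" for non-name tokens) and emits NameSample
// objects. Supporting a new corpus mostly means writing a reader like this.
public class MyCorpusNameSampleStream implements ObjectStream<NameSample> {

  private final BufferedReader in;

  public MyCorpusNameSampleStream(BufferedReader in) {
    this.in = in;
  }

  public NameSample read() throws IOException {
    String line = in.readLine();
    if (line == null) {
      return null; // end of the corpus
    }

    List<String> tokens = new ArrayList<String>();
    List<Span> names = new ArrayList<Span>();

    String[] parts = line.split("\\s+");
    for (int i = 0; i < parts.length; i++) {
      int sep = parts[i].lastIndexOf('/');
      tokens.add(parts[i].substring(0, sep));
      String tag = parts[i].substring(sep + 1);
      if (!"O".equals(tag)) {
        // in this made-up format every tagged token is a one-token name
        names.add(new Span(i, i + 1, tag));
      }
    }

    return new NameSample(tokens.toArray(new String[tokens.size()]),
        names.toArray(new Span[names.size()]), false);
  }

  public void reset() throws IOException, UnsupportedOperationException {
    // a real implementation would re-open the underlying corpus here
    throw new UnsupportedOperationException();
  }

  public void close() throws IOException {
    in.close();
  }
}

Once a stream like this exists, the same training code works for the new corpus
as for any other.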
> 3) I see that with 1.5 there seems to be better support for taking training
> sets from other file formats. What are the motivations? Is it so that ONLP
> can take advantage of existing training sets that will help with 2), or is it
> generally to help the community interoperate better?
From my side, the main motivation was to have data sets people can test OpenNLP
on; if someone wants to contribute something, they can now at least test the
modification. Another motivation is that the more languages and corpora we
support, the more people are interested in working on and with OpenNLP.
BTW, we had a discussion here about starting a corpus project based on Wikinews
(and also Wikipedia) content; maybe you would be interested in helping with that.
Jörn