On 2/1/11 10:45 PM, Grant Ingersoll wrote:
Yes, we should start assembling a list of corpora, if only so that we have it 
for others that come later and want to reproduce them.  In the meantime, I 
would agree that we can just keep the models elsewhere.  We don't have to 
provide models.  They are a convenience for all involved, but not a requirement 
in order to run.  I wonder how many people actually train their own.  (BTW, we 
should update our website to point to older models, too.  They are really hard 
to find unless you do some URL rewriting.)

OK, then let's get the release out as quickly as possible without depending on the legal issues for the models. And let's do as much as possible to resolve these issues alongside the release work. I might have a
few spare cycles here and there to work on that.

To get started with the legal stuff, we need to compile a list with all the necessary information.
That list will make a nice corpora page in our wiki.
Our documentation already contains instructions on how to train on some freely available data.

In the end, I believe we are all best served by a Wikinews corpus that can be labeled by our community.

Jörn
