Hi, here are some models trained on Wikipedia data. They have similar performance. Is this useful?
https://gist.github.com/3291931

Peace.  Michael

On Fri, Jun 29, 2012 at 7:43 PM, Michael Schmitz <sch...@cs.washington.edu> wrote:
> Well, if I find time, I'll run the models on an Apache-licensed dataset
> and then train new models using the output.  I'm sure this would be
> safe from licensing issues and, if we had any time, we could clean up
> the annotations.
>
> Peace.  Michael
>
>
> On Sun, Jun 24, 2012 at 6:52 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> > On Sun, Jun 24, 2012 at 9:48 PM, James Kosin <james.ko...@gmail.com> wrote:
> >> Hi Michael,
> >>
> >> Sorry about the late response to this.
> >>
> >> Yes, it is; however, they also restrict the distribution of the models
> >> as well...  I've already asked.  The license allows us to use it for
> >> research purposes only and we are not allowed to redistribute the
> >> models.  I've already asked this of the person in charge of
> >> distributing the corpus.
> >>
> >> None of OpenNLP's models are based on this corpus as far as I know.  All
> >> the models are produced under different copyrights and limitations.
> >> The Apache license, however, doesn't allow for binary-only distribution
> >> with no way of producing or reproducing from our own sources, which
> >> must be licensed under the Apache license.  The best we can do right
> >> now is to distribute the sources and binaries for the Java classes and
> >> work on producing a corpus of our own from non-copyrighted text, and
> >> distribute those sources and models at Apache under the Apache
> >> license.
> >
> > Also note that nothing stops someone else from distributing binary
> > models outside of Apache.  Anyone who wanted to pick up the corpora and
> > reach their own conclusion about the legitimacy of open distribution
> > of binary models could build these models and distribute them via
> > OSSRH to Maven Central.  Just so long as they respect ASF trademark
> > policies in describing the models as, oh, 'useful with the Apache
> > OpenNLP software library'.
> >
> >>
> >> James
> >>
> >> On 6/12/2012 12:37 PM, Michael Schmitz wrote:
> >>> Hi James, is this the contract?
> >>>
> >>> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
> >>>
> >>> If so, I think you are free to license your derived models however you
> >>> please, although you may not redistribute the training data.
> >>>
> >>> What models does the Reuters contract apply to?
> >>>
> >>> Peace.  Michael
> >>>
> >>>
> >>> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <james.ko...@gmail.com> wrote:
> >>>> Michael,
> >>>>
> >>>> I only have the contract for the Reuters corpus I use and it
> >>>> specifically prohibits use for anything other than educational or
> >>>> research purposes.  Commercial applications violate the copyright and
> >>>> contract terms.  I'm sure many of the others are similar.  This
> >>>> includes any trained models.
> >>>>
> >>>> James
> >>>>
> >>>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
> >>>>> Are you sure the copyright applies to your trained model?  Do you
> >>>>> have any information about the corpuses you used to train the models?
> >>>>>
> >>>>> Peace.  Michael
> >>>>>
> >>>>>
> >>>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <james.ko...@gmail.com> wrote:
> >>>>>> Michael,
> >>>>>>
> >>>>>> It is one of the things we are working on.  The problem is that
> >>>>>> most, if not all, of the models are currently trained on
> >>>>>> copyrighted material that restricts the usage of the resulting
> >>>>>> trained data to research purposes ONLY.
> >>>>>> We currently host the models on another site, due to this
> >>>>>> limitation and the licensing conflict that would result if we
> >>>>>> tried to host on Apache.
> >>>>>>
> >>>>>> You are more than welcome to help, if you choose.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
> >>>>>>> Hi, is there any interest in hosting the stock OpenNLP models in
> >>>>>>> Maven Central?  I know that OpenNLP intends for users to train
> >>>>>>> models on their particular corpus, but often it's useful to get
> >>>>>>> started with the stock models.
> >>>>>>>
> >>>>>>> I'm developing a common interface to some NLP toolkits in Scala
> >>>>>>> and would like to include OpenNLP.  I would like to use OpenNLP
> >>>>>>> and have it use the stock models by default as a Maven
> >>>>>>> dependency.  If I do this, then I don't need to include the
> >>>>>>> models with my artifact and I don't need to keep the models in my
> >>>>>>> git repository.  More importantly, users can exclude the stock
> >>>>>>> models if they wish.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Peace.  Michael