Yes, I think it's critical that we also distribute models and have things like brew packages and so forth, so they are easy to install. Imagine:
# brew install opennlp --with-models

I'll start working on that.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Joern Kottmann <kottm...@gmail.com>
Reply-To: <dev@opennlp.apache.org>
Date: Thursday, November 12, 2015 at 5:22 PM
To: <dev@opennlp.apache.org>
Subject: Re: Question about OpenNLP and comparison to e.g., NLTK, Stanford NER, etc.

>On Thu, 2015-11-12 at 15:43 +0000, Russ, Daniel (NIH/CIT) [E] wrote:
>> 1) I use the old sourceforge models. I find that the sources of error
>> in my analysis are usually not due to mistakes in sentence detection or
>> POS tagging. I don't have the annotated data or the time/money to
>> build custom models. Yes, the text I analyze is quite different from
>> the (WSJ? or whatever corpus was used to build the models), but it is
>> good enough.
>
>That is interesting, I wasn't aware that those are still useful.
>
>It really depends on the component as well; I was mostly thinking about
>the name finder models when I wrote that.
>
>Do you only use the Sentence Detector, Tokenizer and POS tagger?
>
>You could use OntoNotes (almost for free) to train models. Maybe we
>should look into distributing models trained on OntoNotes.
>
>Jörn
>
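For reference, a minimal sketch of the pipeline Daniel describes (pre-trained Sentence Detector, Tokenizer and POS tagger loaded through the OpenNLP Java API); the model file names below are placeholders for whichever distributed models end up installed:

// Minimal sketch: load pre-trained models and run
// sentence detection -> tokenization -> POS tagging.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class PretrainedPipeline {
    public static void main(String[] args) throws Exception {
        // Placeholder model paths; point these at the installed models.
        try (InputStream sentIn = new FileInputStream("en-sent.bin");
             InputStream tokIn = new FileInputStream("en-token.bin");
             InputStream posIn = new FileInputStream("en-pos-maxent.bin")) {

            SentenceDetectorME sentenceDetector =
                new SentenceDetectorME(new SentenceModel(sentIn));
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));

            String text = "OpenNLP ships pre-trained models. They are easy to use.";
            for (String sentence : sentenceDetector.sentDetect(text)) {
                String[] tokens = tokenizer.tokenize(sentence);
                String[] tags = tagger.tag(tokens);
                for (int i = 0; i < tokens.length; i++) {
                    System.out.println(tokens[i] + "/" + tags[i]);
                }
            }
        }
    }
}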