I suspected this might be the case. What about the tools used to generate the model? Are those freely available or part of OpenNLP?
I tried searching through OpenNLP's codebase, but I'm still new to it, so I'm not really sure what I'm looking for.

Regards,
Chris

On Mon, Feb 14, 2011 at 5:58 PM, James Kosin <[email protected]> wrote:
> Chris,
>
> Unfortunately, most... if not all, of the training data is not FREE or
> openly available due to copyright. If you would like to start a group
> to engage in collecting non-copyrighted text and parse the data by hand
> you are more than welcome and encouraged to do so.
> Jorn or Jason may have a more complete set of training data and could
> help if you pass on your samples.
>
> James
>
> On 2/13/2011 11:03 PM, Chris Spencer wrote:
>> Where would we download the source data and tools used to generate the
>> pretrained models available at
>> http://opennlp.sourceforge.net/models-1.5/, specifically for the
>> English Treebank Parser?
>>
>> I have a large corpus of hand-corrected sentence/parse-tree pairs, as
>> well as an extended lexicon, and I'd like to incorporate these into
>> the training data and retrain a new parser better fitted for my
>> domain.
>>
>> Regards,
>> Chris
