+1 to doing this. I already removed that from Chalk for similar reasons. Also, the best way to do coreference these days is to build on the rule-based sieve approach given in this paper:
http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00152

-Jason

On Wed, Apr 17, 2013 at 4:31 PM, Jörn Kottmann <kottm...@gmail.com> wrote:

> Hi all,
>
> I am proposing that we move the coref component into the sandbox until we
> manage to train and test it on a publicly available dataset. In its current
> state the code is complicated to maintain because without training it can't
> be tested properly, which makes bigger changes to OpenNLP difficult, for
> example the maxent refactoring.
>
> I tried to implement parsers for the MUC corpus and added training code,
> but it does not yet work as well as the current models on SourceForge.
> More work is needed to get everything fixed.
>
> Additionally, the code should be refactored like the other components in
> OpenNLP, e.g. one model instantiation, built-in evaluation, simple
> training, etc. There is a Jira issue with all the details.
>
> Any opinions?
>
> Jörn

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge