Hi all,
I propose that we move the coref component into the sandbox until we
manage to train and test it on a publicly available dataset. In its
current state the code is hard to maintain: without training it can't
be tested properly, which makes bigger changes to OpenNLP difficult,
for example the maxent refactoring.
I tried to implement parsers for the MUC corpus and added training code,
but it does not yet work as well as the current models on SourceForge.
More work is needed to get everything fixed.
Additionally, the code should be refactored like the other components in
OpenNLP, e.g. one model instantiation, built-in evaluation, simple
training, etc. There is a Jira issue with all the details.
Any opinions?
Jörn