Hello,
Your training data contains only tokens that are
the beginning or a continuation of a name, and no "other"
tokens.
If the name finder is trained like this, it will always
assume that these are the only two valid outcomes. Training on such
data should actually be possible (though perhaps not useful).
I haven't looked at the source code, but I guess the error is caused by
a bug in the outcome validation code. We should add your case
to the unit tests and fix the problem.
To work around the problem, just add a few sentences to your training
data that contain normal plain text without names.
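If you want to check a training file before running the trainer, something
like the following might help. It is only a rough sketch in plain Java and
does not use any OpenNLP classes; the class name OtherTokenCheck and the
method countOtherTokens are made up for this mail. It counts the tokens on a
line that fall outside any <START:...> ... <END> span. It also pads the tags
with spaces before splitting, which is my assumption about your data and says
nothing about what the OpenNLP parser itself accepts. The example sentences
without names are invented.

```java
// Hypothetical helper, not part of OpenNLP: counts tokens that lie outside
// <START:type> ... <END> spans in one line of name finder training data.
final class OtherTokenCheck {

    static int countOtherTokens(String line) {
        // Assumption: tags may be glued to the neighbouring words, so put
        // spaces around them before splitting on whitespace.
        String padded = line.replaceAll("(<START(:[^>]*)?>|<END>)", " $1 ");

        int otherTokens = 0;
        boolean insideName = false;
        for (String token : padded.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line
            }
            if (token.startsWith("<START") && token.endsWith(">")) {
                insideName = true;   // a name span begins
            } else if (token.equals("<END>")) {
                insideName = false;  // the name span ends
            } else if (!insideName) {
                otherTokens++;       // plain text outside any name
            }
        }
        return otherTokens;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<START:person>Neil Abercrombie<END>",
            // Invented sentences that add "other" tokens:
            "The committee met on Tuesday .",
            "<START:person> Gary Ackerman <END> voted against the bill ."
        };
        int total = 0;
        for (String line : lines) {
            total += countOtherTokens(line);
        }
        System.out.println("Tokens outside names: " + total);
    }
}
```

If the total for the whole file is zero, the data has the problem described
above, and adding a few lines like the second and third one should avoid it.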
Please feel free to open a Jira issue.
Thanks,
Jörn
On 12/8/10 8:24 PM, A. Allen wrote:
Hello,
Has anyone been able to train the name finder? I followed the instructions
in the wiki and used pieces of the sample code, but keep getting the
following:
Indexing events using cutoff of 5
Computing event counts... done. 29376 events
Indexing... done.
Sorting and merging events... done. Reduced 29376 events to 8313.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 8313
Number of Outcomes: 1
Number of Predicates: 11869
...done.
Computing model parameters...
Performing 100 iterations.
1: .. loglikelihood=0.0 1.0
2: .. loglikelihood=0.0 1.0
Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
at NameTrainer.main(NameTrainer.java:21)
My training data looks like this:
<START:person>Neil Abercrombie<END>
<START:person>Anibal Acevedo-Vila<END>
<START:person>Gary Ackerman<END>
<START:person>Robert Aderholt<END>
<START:person>Daniel Akaka<END>
<START:person>Todd Akin<END>
<START:person>Lamar Alexander<END>
<START:person>Rodney Alexander<END>
I appreciate any help that can be provided. Thank you.
-AA