On 9/1/11 4:50 PM, [email protected] wrote:
Maybe you need some language-specific features. I just evaluated the
Portuguese proper name finder with the default OpenNLP features and got the
following:
Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
TOTAL: precision: 86,68%; recall: 87,21%; F1: 86,94%.
prop: precision: 86,68%; recall: 87,21%; F1: 86,94%. [target: 26462; tp: 23077; fp: 3546]
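For reference, the percentages follow directly from the counts in that output. A minimal sketch in plain Java, assuming the usual definitions (precision = correct / found, recall = correct / target, F1 = harmonic mean of the two); the class and method names are made up for illustration and are not part of OpenNLP:

```java
import java.util.Locale;

public class NameFinderScores {

    // Share of the entities the model found that were correct.
    static double precision(int correct, int found) {
        return found == 0 ? 0.0 : (double) correct / found;
    }

    // Share of the reference (target) entities that the model found.
    static double recall(int correct, int target) {
        return target == 0 ? 0.0 : (double) correct / target;
    }

    // Harmonic mean of precision and recall.
    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Counts copied from the evaluation output above.
        int target = 26462;
        int found = 26623;
        int correct = 23077;

        double p = precision(correct, found);
        double r = recall(correct, target);

        // Locale.ROOT prints decimal points; the output above used decimal commas.
        System.out.printf(Locale.ROOT,
                "precision: %.2f%%; recall: %.2f%%; F1: %.2f%%%n",
                100 * p, 100 * r, 100 * f1(p, r));
        System.out.println("fp: " + (found - correct));
    }
}
```

The false positive count is just found minus correct, which is where the fp figure in the bracketed part comes from.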
A friend of mine is working directly with Maxent and got better results
because he is using specific features he developed for Portuguese. But it is
really difficult to tune it.
I am still not sure how the feature generation should be modified. These
papers suggest that using prefix and suffix features helps, and we already
have such feature generators;
when I use these the recall goes up a little and the precision.
I now get 85% precision and 44% recall, but I would still like to get a
much higher recall, somewhere in the range of 70% or even 80%.
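To make the prefix and suffix idea concrete, here is a rough sketch of what such features look like for a single token. It is plain Java and only illustrative: the class name, the "pre="/"suf=" feature strings and the maximum length parameter are assumptions for this example, not necessarily what the existing OpenNLP feature generators emit.

```java
import java.util.ArrayList;
import java.util.List;

public class AffixFeatures {

    // Emits one prefix and one suffix feature for each length from 1 up to
    // maxLength, capped at the token length.
    static List<String> affixFeatures(String token, int maxLength) {
        List<String> features = new ArrayList<>();
        int limit = Math.min(maxLength, token.length());
        for (int n = 1; n <= limit; n++) {
            features.add("pre=" + token.substring(0, n));
            features.add("suf=" + token.substring(token.length() - n));
        }
        return features;
    }

    public static void main(String[] args) {
        // Prints [pre=L, suf=a, pre=Li, suf=oa, pre=Lis, suf=boa]
        System.out.println(affixFeatures("Lisboa", 3));
    }
}
```

The intuition is that the model can then generalize over word endings and beginnings that are typical for names, even for tokens it has never seen in training.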
Some also use trigger words or other dictionaries; I am not sure if that
helps much.
Maybe compound noun splitting helps, not sure.
Or should I try to use a topic model, like they do in more modern NERs?
Jörn