Before making significant changes based only on the numbers, it would be
very useful to look at the system output side by side with the gold
annotations and check whether there are clear patterns in the errors.
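A minimal sketch of that side-by-side comparison. It assumes each sentence's entities are available as (start, end, type) token spans with an exclusive end, for both gold and system output. This is a hypothetical format, not OpenNLP's file format, and the function names are made up for illustration. Predictions that overlap a missed gold span are reported separately, because they usually indicate boundary or type errors rather than pure false positives.

```python
Span = tuple[int, int, str]  # (start token, end token exclusive, entity type); hypothetical format


def overlaps(a: Span, b: Span) -> bool:
    """True if two token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def classify_errors(gold: list[Span], predicted: list[Span]) -> dict[str, list[Span]]:
    """Sort one sentence's spans into correct / missed / spurious / overlap buckets."""
    gold_set, pred_set = set(gold), set(predicted)
    missed = gold_set - pred_set      # false negatives
    spurious = pred_set - gold_set    # false positives
    # A spurious prediction overlapping a missed gold span is a boundary or type error
    overlap = sorted(p for p in spurious if any(overlaps(p, g) for g in missed))
    return {
        "correct": sorted(gold_set & pred_set),
        "missed": sorted(missed),
        "spurious": sorted(spurious),
        "overlap": overlap,
    }


def render(tokens: list[str], spans: list[Span]) -> str:
    """Bracket each span and append its type, e.g. "[São Paulo]prop"."""
    out = list(tokens)
    for start, end, label in spans:
        out[start] = "[" + out[start]
        out[end - 1] = out[end - 1] + "]" + label
    return " ".join(out)


if __name__ == "__main__":
    # Made-up example sentence, not taken from any evaluation data
    tokens = ["O", "Banco", "do", "Brasil", "abriu", "em", "São", "Paulo", "."]
    gold = [(1, 4, "prop"), (6, 8, "prop")]
    pred = [(1, 2, "prop"), (6, 8, "prop")]
    print("GOLD:", render(tokens, gold))
    print("SYS: ", render(tokens, pred))
    print(classify_errors(gold, pred))
```

Printing the two rendered lines one above the other for every sentence with a non-empty "missed" or "spurious" bucket makes systematic problems (e.g. truncated multi-word names) easy to spot by eye.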

-Jason

On Thu, Sep 1, 2011 at 3:41 PM, Jörn Kottmann wrote:

> On 9/1/11 4:50 PM, [email protected] wrote:
>
>> Maybe you need some language-specific features. I just evaluated the
>> Portuguese proper name finder with the default OpenNLP features and got
>> the following:
>>
>>
>> Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
>>        TOTAL: precision:   86,68%;  recall:   87,21%; F1:   86,94%.
>>         prop: precision:   86,68%;  recall:   87,21%; F1:   86,94%. [target: 26462; tp: 23077; fp: 3546]
>>
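For reference, the figures in the evaluator output above follow directly from the three counts, assuming exact-match span scoring: precision = correct / found, recall = correct / target, F1 is their harmonic mean, and fp = found - correct. A minimal sketch (the function name prf is made up for illustration):

```python
def prf(target: int, found: int, correct: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from entity counts (exact-match span scoring assumed)."""
    precision = correct / found if found else 0.0
    recall = correct / target if target else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


target, found, correct = 26462, 26623, 23077  # counts from the output above
p, r, f = prf(target, found, correct)
print(f"precision {p:.2%}  recall {r:.2%}  F1 {f:.2%}  fp {found - correct}  fn {target - correct}")
# precision 86.68%  recall 87.21%  F1 86.94%  fp 3546  fn 3385
```

The same formula puts 85% precision with 44% recall at an F1 of roughly 58%, which is why recall is the number to push on in that setup.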
>> A friend of mine is working directly with Maxent and got better results
>> because he is using specific features he developed for Portuguese. But it
>> is really difficult to tune.
>>
>
> I am still not sure how the feature generation should be modified. These
> papers suggest that prefix and suffix features help, and we already have
> such feature generators. When I use them, both recall and precision go up
> a little. I now get 85% precision and 44% recall, but I would still like
> a much higher recall, somewhere in the range of 70% or even 80%.
>
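The prefix/suffix idea can be sketched as a standalone function that turns a token into feature strings for a maxent model. This is not OpenNLP's feature generator API; the function name affix_features, the feature-string format (pre=..., suf=...) and the extra capitalization flag are assumptions for illustration.

```python
def affix_features(token: str, max_len: int = 4) -> list[str]:
    """Hypothetical prefix/suffix features for one token.

    Emits "pre=<prefix>" and "suf=<suffix>" for affix lengths 1..max_len
    (computed on the lowercased token), plus "cap" if the token is capitalized.
    """
    lower = token.lower()
    feats: list[str] = []
    for n in range(1, min(max_len, len(lower)) + 1):
        feats.append(f"pre={lower[:n]}")
        feats.append(f"suf={lower[-n:]}")
    if token[:1].isupper():
        feats.append("cap")
    return feats


print(affix_features("Brasília", 3))
# ['pre=b', 'suf=a', 'pre=br', 'suf=ia', 'pre=bra', 'suf=lia', 'cap']
```

Lowercasing before taking affixes keeps the affix features independent of the capitalization cue; whether that split helps on a given corpus is something to check empirically.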
> Some also use trigger words (not sure if that helps much) or other
> dictionaries. Maybe compound noun splitting helps too, but I am not sure.
>
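Trigger words and dictionaries are usually fed to the model as extra token features. A minimal sketch, assuming a gazetteer stored as tuples of lowercase tokens and a hand-made trigger list; the names dictionary_features, dict=B / dict=I and prev_trigger are hypothetical, and compound noun splitting is not covered here.

```python
def dictionary_features(
    tokens: list[str],
    gazetteer: set[tuple[str, ...]],
    triggers: set[str],
    max_len: int = 4,
) -> list[list[str]]:
    """Hypothetical gazetteer and trigger-word features, one feature list per token."""
    lowered = [t.lower() for t in tokens]
    feats: list[list[str]] = [[] for _ in tokens]
    i = 0
    while i < len(tokens):
        # Greedy longest match against the gazetteer starting at position i
        match = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                match = n
                break
        if match:
            feats[i].append("dict=B")  # first token of a dictionary entry
            for j in range(i + 1, i + match):
                feats[j].append("dict=I")  # continuation tokens
            i += match
        else:
            i += 1
    # Mark tokens whose left neighbour is a trigger word (e.g. a title)
    for i in range(1, len(tokens)):
        if lowered[i - 1] in triggers:
            feats[i].append("prev_trigger")
    return feats


tokens = ["Presidente", "Lula", "visitou", "São", "Paulo"]
print(dictionary_features(tokens, {("são", "paulo"), ("lula",)}, {"presidente"}))
# [[], ['dict=B', 'prev_trigger'], [], ['dict=B'], ['dict=I']]
```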
> Or should I try to use a topic model, as more modern NER systems do?
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
