On Fri, Aug 12, 2011 at 10:46 AM, Jörn Kottmann <[email protected]> wrote:

> On 8/12/11 3:28 PM, [email protected] wrote:
>
>>> If you know the tags which are causing trouble you might just want to
>>> remove all tokens from your dictionary which contain them. Removing a
>>> few words will not make a big difference in accuracy anyway.
>>
>> Wouldn't it be a good idea to do it during training? I thought it would
>> help other people.
>>
> No, I don't think so, because it makes it difficult to understand what
> is going on, and with the current system you really need enough training
> data to cover all the tags. If one tag is only mentioned 5 or 6 times, I
> doubt that accurate detection is possible.
>
> As I said before, it might be possible to create a POS tagger which can
> deal better with less training data, but the one we have right now seems
> to have its limits when you want to use a tag dict.
>
> Jörn
>

Thanks Jörn, I'm trying your suggestion to improve my POS tagger.
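The suggestion of removing every dictionary entry that carries a troublesome tag could look roughly like this. It is only a minimal sketch: it assumes the tag dictionary is a plain Map from a word to its allowed tags (not OpenNLP's own dictionary class), and `TagDictPruner` / `prune` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: the tag dictionary is modeled as a plain map from a
// word to the set of tags it may take, not as OpenNLP's own dictionary class.
public class TagDictPruner {

    // Returns a copy of the dictionary without any word whose allowed tags
    // include at least one of the troublesome tags.
    public static Map<String, Set<String>> prune(Map<String, Set<String>> dict,
                                                 Set<String> troublesomeTags) {
        Map<String, Set<String>> pruned = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : dict.entrySet()) {
            // disjoint == true means the word uses none of the troublesome tags
            if (Collections.disjoint(entry.getValue(), troublesomeTags)) {
                pruned.put(entry.getKey(), entry.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dict = new HashMap<>();
        dict.put("house", Set.of("NN"));
        dict.put("that", Set.of("DT", "IN", "WDT"));
        dict.put("run", Set.of("NN", "VB"));

        // Suppose WDT is the rare tag that causes trouble.
        Map<String, Set<String>> pruned = prune(dict, Set.of("WDT"));
        System.out.println(pruned.containsKey("that")); // false
        System.out.println(pruned.size());              // 2
    }
}
```

Dropping whole entries (rather than just the bad tag) matches the advice that losing a few words barely affects accuracy, and keeps the remaining entries untouched.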

Now, back to the misclassified report interface. I could not find a good
design for it because I could not take advantage of the sample classes, so
I proposed three methods to handle the different components:

import opennlp.tools.util.Span;

// for the sentence detector
void missclassified(Span[] references, Span[] predictions,
    String referenceSample, String predictedSample, String sentence);

// for the name finder, chunker, ...
void missclassified(Span[] references, Span[] predictions,
    String referenceSample, String predictedSample, String[] sentenceTokens);

// for the POS tagger
void missclassified(String[] references, String[] predictions,
    String referenceSample, String predictedSample, String[] sentenceTokens);
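To make the proposal concrete, the three overloads can be grouped into one listener and exercised by a trivial implementation that turns each callback into a report line. This is a hypothetical sketch: the names `MisclassifiedReportDemo`, `MisclassifiedListener` and `CollectingListener` are made up, and `Span` is a minimal stand-in for `opennlp.tools.util.Span` so the file compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MisclassifiedReportDemo {

    // Minimal stand-in for opennlp.tools.util.Span (start inclusive, end exclusive).
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ".." + end + ")"; }
    }

    // The three proposed callbacks in one hypothetical interface.
    interface MisclassifiedListener {
        // sentence detector: the sentence is passed as raw text
        void missclassified(Span[] references, Span[] predictions,
                String referenceSample, String predictedSample, String sentence);

        // name finder, chunker, ...: the sentence is passed as tokens
        void missclassified(Span[] references, Span[] predictions,
                String referenceSample, String predictedSample, String[] sentenceTokens);

        // POS tagger: one reference tag and one predicted tag per token
        void missclassified(String[] references, String[] predictions,
                String referenceSample, String predictedSample, String[] sentenceTokens);
    }

    // A trivial listener that collects one report line per problem.
    static final class CollectingListener implements MisclassifiedListener {
        final List<String> report = new ArrayList<>();

        public void missclassified(Span[] references, Span[] predictions,
                String referenceSample, String predictedSample, String sentence) {
            report.add("SENT expected=" + Arrays.toString(references)
                    + " predicted=" + Arrays.toString(predictions));
        }

        public void missclassified(Span[] references, Span[] predictions,
                String referenceSample, String predictedSample, String[] sentenceTokens) {
            report.add("SPAN expected=" + Arrays.toString(references)
                    + " predicted=" + Arrays.toString(predictions));
        }

        public void missclassified(String[] references, String[] predictions,
                String referenceSample, String predictedSample, String[] sentenceTokens) {
            // Report only the tokens whose predicted tag differs from the reference tag.
            for (int i = 0; i < sentenceTokens.length; i++) {
                if (!references[i].equals(predictions[i])) {
                    report.add("POS " + sentenceTokens[i] + " expected=" + references[i]
                            + " predicted=" + predictions[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        String[] tokens = {"Time", "flies", "fast"};
        listener.missclassified(new String[] {"NN", "VBZ", "RB"},
                new String[] {"NN", "NNS", "RB"},
                "Time_NN flies_VBZ fast_RB", "Time_NN flies_NNS fast_RB", tokens);
        listener.report.forEach(System.out::println);
        // prints: POS flies expected=VBZ predicted=NNS
    }
}
```

Because the overloads differ only in their array and last-parameter types, the compiler picks the right one from the argument types at each call site.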


Can you help me with a better design?
