On Fri, Aug 12, 2011 at 10:46 AM, Jörn Kottmann <[email protected]> wrote:
> On 8/12/11 3:28 PM, [email protected] wrote:
>
>>> If you know the tags which are causing trouble you might just want to
>>> remove all tokens from your dictionary which contain them. Removing a
>>> few words will not make a big difference in accuracy anyway.
>>
>> Doing it during training is not a good idea? I thought it would help
>> other people.
>
> No, I don't think so, because it makes it difficult to understand what
> is going on, and with the current system you really need enough training
> data to cover all the tags.
> If one tag is only mentioned 5 or 6 times I doubt that an accurate
> detection is possible.
>
> As said before, it might be possible to create a POS Tagger which can deal
> better with less training data, but the one we have right now seems to
> have its limits when you want to use a tag dict.
>
> Jörn

Thanks Jörn, I'm trying your suggestion to improve my POS tagger.

Now back to the misclassified report interface. I could not find a good
design for it because I could not take advantage of the sample classes, so
what I proposed was three methods to handle the different tools:

  // for the sentence detector
  void missclassified(Span[] references, Span[] predictions,
      String referenceSample, String predictedSample, String sentence);

  // for namefinder, chunker...
  void missclassified(Span[] references, Span[] predictions,
      String referenceSample, String predictedSample, String[] sentenceTokens);

  // for pos tagger
  void missclassified(String[] references, String[] predictions,
      String referenceSample, String predictedSample, String[] sentenceTokens);

Can you help me with a better design?
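
One possible direction, only as a rough sketch: the three overloads differ
mainly in the type of the references/predictions and of the sentence, so a
single generic callback parameterised on the sample type might cover all of
them. This assumes each sample object can carry its own text or tokens,
which may not hold for the current sample classes. Every name below
(MisclassifiedListener, TaggedSentence, report, describeTagErrors) is
hypothetical and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MisclassifiedSketch {

    /** Hypothetical callback: one method for every tool, generic in the sample type. */
    interface MisclassifiedListener<T> {
        void misclassified(T reference, T prediction);
    }

    /** Hypothetical stand-in for a POS sample: tokens plus one tag per token. */
    static final class TaggedSentence {
        final String[] tokens;
        final String[] tags;

        TaggedSentence(String[] tokens, String[] tags) {
            this.tokens = tokens;
            this.tags = tags;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaggedSentence)) {
                return false;
            }
            TaggedSentence other = (TaggedSentence) o;
            return Arrays.equals(tokens, other.tokens) && Arrays.equals(tags, other.tags);
        }

        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(tokens) + Arrays.hashCode(tags);
        }
    }

    /** Notifies the listener only when the prediction differs from the reference. */
    static <T> boolean report(T reference, T prediction, MisclassifiedListener<T> listener) {
        if (reference.equals(prediction)) {
            return false;
        }
        listener.misclassified(reference, prediction);
        return true;
    }

    /** Lists the tokens whose tags differ; assumes both samples share the same tokens. */
    static List<String> describeTagErrors(TaggedSentence ref, TaggedSentence pred) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ref.tokens.length; i++) {
            if (!ref.tags[i].equals(pred.tags[i])) {
                out.add(ref.tokens[i] + ": expected " + ref.tags[i] + ", got " + pred.tags[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks"};
        TaggedSentence ref = new TaggedSentence(tokens, new String[] {"DT", "NN", "VBZ"});
        TaggedSentence pred = new TaggedSentence(tokens, new String[] {"DT", "NN", "NNS"});

        List<String> errors = new ArrayList<>();
        report(ref, pred, (r, p) -> errors.addAll(describeTagErrors(r, p)));
        errors.forEach(System.out::println);
    }
}
```

With something like this, the sentence detector, name finder and chunker
would each plug in their own sample type, and the code that formats the
report would live in the listener rather than in the method signature.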
