Hi,

It turned out to be easy to do.
I created a class, TokenTag, to hold a token and its POS tag. Then I changed
the Featurizer to work with BeamSearch<TokenTag> and
SequenceValidator<TokenTag>. With this change, the sequence validator can
access both the token and its POS tag.

For now I am only validating the features using a tag dictionary.
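
To make the idea concrete, here is a minimal sketch of how such a validator
could look. It is not the actual Featurizer code. The SequenceValidator
interface below is a local stand-in that mirrors the method of OpenNLP's
opennlp.tools.util.SequenceValidator, so the example compiles without the
library. The field names in TokenTag, the DictionarySequenceValidator class,
the token/POS-tag dictionary key and the feature strings are all assumptions
made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Local stand-in mirroring OpenNLP's SequenceValidator<T>, declared here
// only so the sketch is self-contained.
interface SequenceValidator<T> {
    boolean validSequence(int i, T[] inputSequence,
                          String[] outcomesSequence, String outcome);
}

// Hypothetical holder for a token and its POS tag.
final class TokenTag {
    private final String token;
    private final String posTag;

    TokenTag(String token, String posTag) {
        this.token = token;
        this.posTag = posTag;
    }

    String getToken() { return token; }
    String getPosTag() { return posTag; }
}

// Accepts an outcome only if the dictionary lists it for this token and
// POS tag. Entries missing from the dictionary are not restricted.
final class DictionarySequenceValidator implements SequenceValidator<TokenTag> {
    private final Map<String, Set<String>> allowed;

    DictionarySequenceValidator(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    // Assumed key format: lowercased token, a slash, then the POS tag.
    static String key(String token, String posTag) {
        return token.toLowerCase(Locale.ROOT) + "/" + posTag;
    }

    @Override
    public boolean validSequence(int i, TokenTag[] inputSequence,
                                 String[] outcomesSequence, String outcome) {
        TokenTag current = inputSequence[i];
        Set<String> features =
            allowed.get(key(current.getToken(), current.getPosTag()));
        return features == null || features.contains(outcome);
    }
}
```

Letting unknown entries through is a design choice in this sketch: the
dictionary only prunes outcomes for words it knows, and the model decides
freely for everything else.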

The accuracy in a 10-fold cross-validation on the Brazilian corpus is now
97.142%.

The accuracy should increase further if I modify the evaluator: currently,
if the Featurizer selects, for example, masculine as the gender of a token,
but the corpus marks that token with both genders, the evaluator counts it
as an error.
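
As a rough sketch of the evaluator change, a prediction could count as
correct when it matches any of the values the corpus allows. The class and
method names below are hypothetical, and the "M/F" encoding for a token with
two genders is an assumption, not the corpus format.

```java
import java.util.Arrays;

// Hypothetical scorer comparing strict and lenient matching of gender tags.
final class LenientGenderScorer {

    // Strict: the prediction must equal the gold annotation exactly.
    static boolean strictMatch(String predicted, String gold) {
        return predicted.equals(gold);
    }

    // Lenient: the prediction must be one of the slash-separated gold values.
    static boolean lenientMatch(String predicted, String gold) {
        return Arrays.asList(gold.split("/")).contains(predicted);
    }

    // Fraction of positions where the prediction matches the gold value.
    static double accuracy(String[] predicted, String[] gold, boolean lenient) {
        if (gold.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            boolean ok = lenient
                ? lenientMatch(predicted[i], gold[i])
                : strictMatch(predicted[i], gold[i]);
            if (ok) {
                correct++;
            }
        }
        return (double) correct / gold.length;
    }
}
```

With predictions M, F, M against gold values M/F, F, F, the strict score
counts one of three as correct and the lenient score counts two of three.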

Thank you,
William


On Thu, Feb 2, 2012 at 2:13 AM, William Colen <[email protected]> wrote:

> Hi,
>
> I am trying to develop an OpenNLP based learnable featurizer. It can
> attach tags like gender, number, mood, person and verb tense. The input is
> the sentence tokens and the POS Tags.
> The context generator I am using is based on the one from Chunker, plus
> some prefix and suffix features.
>
> The current accuracy is 95.395%, but I think I can improve it using a
> sequence validator.
>
> Question:
> Is it possible to create a sequence validator that, besides the tokens,
> also knows the POS Tags? I would like to check if the combination POS Tag +
> features is OK (tense tags only for verbs for example).
>
> Thank you in advance. If it works, and you think it is a good tool, I will
> contribute the featurizer to OpenNLP.
>
> William
>
