On 05/14/2013 11:07 PM, Benson Margulies wrote:
Folks,

I expected to see something like a feature generator; something that
looked at a structure and returned a set of feature activations.

I don't claim to have much expertise with MEMM, but I sure know one
end of a perceptron from another.

Looking, for example, at POSContextGenerator, what is the String[]
return value? Is it perhaps just a list of named active features? But
wouldn't you need a count for each one?

Yes, it's a list of all named active features; if a feature is detected n times, it occurs n times in the list. We started work on a feature generation framework (opennlp.tools.util.featuregen) to make the name finder adaptable. The original plan was to reuse this work for the POS Tagger and Chunker as well, but that has not been done yet.
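
To make the "occurs n times" point concrete, here is a small sketch of how such a String[] could be collapsed into per-feature counts. This is not OpenNLP code; the class name FeatureCounts and the feature names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: FeatureCounts is a hypothetical helper, not part of OpenNLP.
public class FeatureCounts {

    // Collapses a list of feature names (possibly with repeats) into name -> count.
    static Map<String, Integer> count(String[] features) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String feature : features) {
            // Each occurrence in the list adds 1 to that feature's count.
            counts.merge(feature, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up feature names; "suf=nk" was detected twice, so it appears twice.
        String[] context = {"w=bank", "suf=nk", "p=DT", "suf=nk"};
        System.out.println(count(context));
    }
}
```

So the count is implicit in the list: a feature that appears twice simply ends up with a count of 2.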

Are you interested in experimenting with your own feature generation? It's possible to implement a custom POSTaggerFactory which
can completely customize the feature generation.

At work I use a fork of OpenNLP where the feature generation for the name finder produces 64-bit hash features instead of Strings. This works quite a bit faster. I will probably write up a proposal at some point and contribute the code, but currently I am short on time.
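
As a rough idea of what hashed features look like, here is a minimal sketch that maps feature strings to 64-bit values. It is not the code from the fork; the class name HashedFeatures is hypothetical, and FNV-1a is chosen here only because it is simple, so treat the hash function as an assumption.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: HashedFeatures is a hypothetical class, not part of OpenNLP.
public class HashedFeatures {

    // Standard 64-bit FNV-1a constants.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hashes one feature name to a 64-bit value using FNV-1a over its UTF-8 bytes.
    static long hash(String feature) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // mix in the next byte
            h *= FNV_PRIME;    // multiply, wrapping on overflow
        }
        return h;
    }

    // Hashes a whole context; repeated features yield repeated hash values.
    static long[] hashAll(String[] features) {
        long[] hashed = new long[features.length];
        for (int i = 0; i < features.length; i++) {
            hashed[i] = hash(features[i]);
        }
        return hashed;
    }

    public static void main(String[] args) {
        // Made-up feature names for illustration.
        for (long h : hashAll(new String[] {"w=bank", "suf=nk", "p=DT"})) {
            System.out.println(Long.toHexString(h));
        }
    }
}
```

A long[] like this avoids building and comparing Strings during training and tagging, at the cost of possible (rare) hash collisions and feature names that are no longer human-readable.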

OpenNLP also has a perceptron; you can select it via a params file that you pass in during training. Replacing the classifier with your
own implementation is not yet possible, but will be in the next release.
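
For reference, a params file that selects the perceptron might look roughly like the following. The key names and values are written from memory of the OpenNLP manual and should be treated as assumptions; please check them against the documentation for the version you are running.

```
Algorithm=PERCEPTRON
Iterations=100
Cutoff=0
```

The file is then handed to the trainer when you start training.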

HTH,
Jörn
