2011/1/25 Jörn Kottmann <[email protected]>:
> What do you think about CRFs, Jason ?
> Is that something you believe would be valuable in OpenNLP ?
>
> Since you are working a lot with research, it would be nice if you
> could point out to us once in a while what you think would
> make sense to put into OpenNLP, like you did with the Bloom filters.

Before implementing CRFs, I think the low-hanging fruit would be to
generalize the use of perceptrons (e.g. for NER models) and to
introduce smarter features: non-local features, where the output of a
first pass of a model trained on local features only is used to train
a second model that combines that output with the initial local
features. See e.g.:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.4
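
To make the two-pass idea concrete, here is a rough Java sketch. The
Classifier interface and the "first-pass=..." feature name are made up
for illustration; a faithful implementation, as in the paper above,
would also aggregate the first-pass predictions over the whole
document (e.g. the majority label assigned to the same token
elsewhere) rather than just echoing the per-token prediction:

import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for any trainable per-token classifier.
interface Classifier {
    void train(List<List<String>> examples, List<String> labels);
    String predict(List<String> features);
}

public class TwoPassTrainer {
    // Train `second` on the local features plus the first pass' output.
    public static void train(Classifier first, Classifier second,
                             List<List<String>> localFeatures,
                             List<String> labels) {
        first.train(localFeatures, labels);
        List<List<String>> augmented = new ArrayList<List<String>>();
        for (List<String> feats : localFeatures) {
            List<String> aug = new ArrayList<String>(feats);
            // Non-local feature: what the first model predicted here.
            aug.add("first-pass=" + first.predict(feats));
            augmented.add(aug);
        }
        second.train(augmented, labels);
    }
}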

Perceptron models (or more generally, linear models trained with SGD
such as the regularized logistic regression implemented in Mahout) are
really fast to train, scale to huge datasets (no need to load all the
data into memory, especially if you hash the features as Vowpal Wabbit
and Mahout do with murmurhash), and are simple to implement and debug.
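
For illustration, here is a minimal sketch of that feature-hashing
trick with a plain binary perceptron. The 2^18 dimension is an
arbitrary choice, and String.hashCode() just stands in for murmurhash:

import java.util.List;

public class HashedPerceptron {
    private static final int DIM = 1 << 18; // 2^18 weight slots
    private final double[] weights = new double[DIM];

    // Hash an arbitrary feature string to a slot in the weight vector,
    // so the full feature dictionary never has to fit in memory.
    private int slot(String feature) {
        return (feature.hashCode() & 0x7fffffff) % DIM;
    }

    // Dot product of the implicit binary feature vector with the weights.
    public double score(List<String> features) {
        double s = 0.0;
        for (String f : features) {
            s += weights[slot(f)];
        }
        return s;
    }

    // Perceptron update: on a mistake, move the weights of the active
    // features toward the correct label (+1 or -1).
    public void update(List<String> features, int label) {
        int predicted = score(features) >= 0 ? 1 : -1;
        if (predicted != label) {
            for (String f : features) {
                weights[slot(f)] += label;
            }
        }
    }
}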

It would also be worth experimenting with richer word representations,
see e.g. "A preliminary evaluation of word representations for
named-entity recognition" by Turian et al.:

  http://www.iro.umontreal.ca/~lisa/pointeurs/wordrepresentations-ner.pdf
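
For instance, Brown clusters (one of the representations they
evaluate) can be fed to an existing linear model as plain string
features. A hypothetical sketch, where the prefix lengths and the
clusters map are assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordRepFeatures {
    // `clusters` maps a word to its bit-string path in the Brown
    // cluster hierarchy (assumed to be loaded from a clustering run).
    public static List<String> features(String word,
                                        Map<String, String> clusters) {
        List<String> feats = new ArrayList<String>();
        feats.add("word=" + word);
        String path = clusters.get(word);
        if (path != null) {
            // Prefixes of the path give coarse-to-fine cluster features.
            for (int len : new int[] {4, 6, 10, 20}) {
                feats.add("brown" + len + "="
                        + path.substring(0, Math.min(len, path.length())));
            }
        }
        return feats;
    }
}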

My 2 cents,

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
