2011/1/25 Jörn Kottmann <[email protected]>:
> What do you think about CRFs, Jason?
> Is that something you believe would be valuable in OpenNLP?
>
> Since you are working a lot with research it would be nice if you
> could once in a while point out to us what you think would
> make sense to put into OpenNLP. Like you did with the bloom filters.
Before implementing CRFs, I think the low-hanging fruit would be to
generalize the use of perceptrons (e.g. for NER models) and to
introduce smarter features: non-local features, where the output of a
first pass of a model trained using only local features is used to
train a second model that uses both the first model's output and the
initial local features. See e.g.:

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.4

Perceptron models (or, more generally, linear models trained with SGD,
such as the regularized logistic regression implemented in Mahout) are
really fast to train, scale to huge datasets (no need to load all the
data in memory, especially if you hash the features as done in Vowpal
Wabbit and Mahout with murmurhash), and are simple to implement and
debug.

It would also be worth experimenting with richer word representations,
see e.g.:

  "A preliminary evaluation of word representations for named-entity
  recognition" by Turian et al.
  http://www.iro.umontreal.ca/~lisa/pointeurs/wordrepresentations-ner.pdf

My 2 cents,

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
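P.S. For the curious, a minimal sketch of what the hashing trick plus a
perceptron update looks like in practice. This is an illustrative toy,
not OpenNLP/Mahout/VW code: Python's built-in hash() stands in for
murmurhash, and the feature names and bucket count are made up.

```python
# Binary perceptron with the hashing trick.
# Feature strings are hashed directly into a fixed-size weight vector,
# so the full feature vocabulary never needs to be held in memory --
# only N_BUCKETS floats, regardless of how large the data stream is.

N_BUCKETS = 2 ** 18  # size of the hashed weight vector (illustrative)

def hashed_indices(features):
    """Map raw feature strings to slots in the weight vector.
    Python's hash() is used as a stand-in for murmurhash."""
    return [hash(f) % N_BUCKETS for f in features]

def score(weights, features):
    """Dot product between the (implicit) binary feature vector
    and the hashed weights."""
    return sum(weights[i] for i in hashed_indices(features))

def train_perceptron(examples, epochs=5):
    """Train on an iterable of (features, label) pairs, label in {-1, +1}.
    The examples can be re-read from disk each epoch; nothing but the
    weight vector is kept in memory."""
    weights = [0.0] * N_BUCKETS
    for _ in range(epochs):
        for features, label in examples:
            # Mistake-driven update: adjust only on wrongly scored examples.
            if label * score(weights, features) <= 0:
                for i in hashed_indices(features):
                    weights[i] += label
    return weights

# Toy usage with made-up local features (word identity + word shape):
data = [
    (["word=Paris", "shape=Xxxx"], +1),
    (["word=runs", "shape=xxxx"], -1),
    (["word=London", "shape=Xxxx"], +1),
    (["word=quickly", "shape=xxxx"], -1),
]
w = train_perceptron(data)
```

The same weight vector would serve for the second-pass model too: you
would just hash extra features encoding the first model's predictions
alongside the local ones.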
