CRFs are definitely cleaner than the maximum entropy Markov models (MEMMs) we've been using, since CRFs provide a globally normalized probability of the whole label sequence (rather than finding the best path through a sequence of locally normalized predictions). They do consistently provide better accuracy than MEMMs, but until recently that has come at a significant increase in training complexity. I say until recently because Andrew McCallum and colleagues have been coming up with new MCMC-based methods for estimating CRFs, which you can check out at http://code.google.com/p/factorie/.
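
Just to make the locally vs. globally normalized distinction concrete, here is the usual way the two models are written (generic notation, nothing specific to any toolkit):

  MEMM:  p(y | x) = \prod_{t=1}^{T} \frac{\exp(w \cdot f(y_{t-1}, y_t, x, t))}{\sum_{y'} \exp(w \cdot f(y_{t-1}, y', x, t))}

  CRF:   p(y | x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} w \cdot f(y_{t-1}, y_t, x, t) \Big)

         Z(x) = \sum_{y'_1, \ldots, y'_T} \exp\Big( \sum_{t=1}^{T} w \cdot f(y'_{t-1}, y'_t, x, t) \Big)

The MEMM normalizes each transition on its own, while the CRF normalizes once over all possible label sequences; computing Z(x) and its gradient (via forward-backward) is where the extra training cost comes from.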
So:

- using undirected graphical models in general would be a good thing
- but I don't think we should implement that in OpenNLP right now
- instead, see whether Factorie can be used in some way, or base implementations on the newer inference procedures
- and there is plenty of room for improvement from better use of noisy training sources, which will probably outweigh investment in things like CRFs, at least for the near future

The perceptron sequence taggers have the same basic properties as the MEMMs, in that they are locally normalized, but they have simpler parameter estimation than maxent. Perceptrons are simple and beautiful, and usually perform only a bit worse than maxent models, so there is plenty of reason to use them throughout the system. (I've put a rough sketch of the kind of update involved at the bottom of this message.)

Other things that could make a difference would be allowing EM training for certain situations, e.g. building text classifiers that have some unlabeled data and can bootstrap on it.

Finally, I'm also very interested in label propagation approaches these days. They are useful in a wide variety of contexts, and they can scale easily. I just helped Partha Talukdar release the Junto Label Propagation Toolkit, an ASL-licensed version of the code base he created for his PhD at UPenn. If anyone is interested, you can get it here:

http://code.google.com/p/junto/

I'm already using this in connection with models trained with OpenNLP, e.g. an initial OpenNLP model is used to seed the label distributions in the graph, and then label propagation is run. (A bare-bones sketch of that propagation loop is also at the bottom of this message.)

In general, Olivier is absolutely correct that there is much to be gained by using better features/representations. (The feature hashing trick he mentions below is sketched at the bottom of this message as well.)

Jason

On Tue, Jan 25, 2011 at 5:51 PM, Olivier Grisel <[email protected]> wrote:

> 2011/1/25 Jörn Kottmann <[email protected]>:
> > What do you think about CRFs, Jason?
> > Is that something you believe would be valuable in OpenNLP?
> >
> > Since you are working a lot with research, it would be nice if you
> > could once in a while point out to us what you think would
> > make sense to put into OpenNLP. Like you did with the bloom filters.
>
> Before implementing CRFs, I think the low-hanging fruit would be to
> generalize the use of perceptrons (e.g. for NER models) and to
> introduce smarter features: non-local features, where the results of a
> first pass of a model trained using only local features are used to
> train a second model that uses both the output of the first model and
> the initial local features, see e.g.:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.4
>
> Perceptron models (or more generally linear models trained with SGD,
> such as the regularized logistic regression implemented in Mahout) are
> really fast to train, can scale to huge datasets (no need to load all
> the data in memory, especially if you hash the features as done in
> Vowpal Wabbit and Mahout with murmurhash), and are simple to implement
> and debug.
>
> It would also be worth experimenting with richer word representations,
> see e.g. "A preliminary evaluation of word representations for
> named-entity recognition" by Turian et al.:
> http://www.iro.umontreal.ca/~lisa/pointeurs/wordrepresentations-ner.pdf
>
> My 2 cents,
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
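
Here is the rough perceptron sketch I mentioned above. This is just illustrative Java, not OpenNLP's perceptron code: a toy greedy left-to-right tagger with made-up features, to show how simple the parameter estimation is compared to maxent.

import java.util.*;

/** Illustrative greedy perceptron sequence tagger -- NOT OpenNLP's implementation. */
public class PerceptronTaggerSketch {

  private final Map<String, double[]> weights = new HashMap<String, double[]>();
  private final List<String> labels;

  public PerceptronTaggerSketch(List<String> labels) {
    this.labels = labels;
  }

  // Made-up local features: current word, a suffix, and the previous label.
  private List<String> features(String[] words, int i, String prevLabel) {
    String w = words[i];
    String suffix = w.length() > 3 ? w.substring(w.length() - 3) : w;
    return Arrays.asList("w=" + w, "suf=" + suffix, "prev=" + prevLabel);
  }

  private double[] weightsFor(String feature) {
    double[] w = weights.get(feature);
    if (w == null) {
      w = new double[labels.size()];
      weights.put(feature, w);
    }
    return w;
  }

  // Score every label for one position and return the index of the best one.
  private int argmax(List<String> feats) {
    int best = 0;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int y = 0; y < labels.size(); y++) {
      double score = 0.0;
      for (String f : feats) {
        score += weightsFor(f)[y];
      }
      if (score > bestScore) {
        bestScore = score;
        best = y;
      }
    }
    return best;
  }

  // Greedy left-to-right decoding: commit to the best label at each position.
  public String[] tag(String[] words) {
    String[] out = new String[words.length];
    String prev = "BOS";
    for (int i = 0; i < words.length; i++) {
      out[i] = labels.get(argmax(features(words, i, prev)));
      prev = out[i];
    }
    return out;
  }

  // The entire parameter estimation step: when the prediction at a position is
  // wrong, add its features to the gold label's weights and subtract them from
  // the predicted label's weights.
  public void update(String[] words, String[] gold) {
    String prev = "BOS";
    for (int i = 0; i < words.length; i++) {
      List<String> feats = features(words, i, prev);
      int predicted = argmax(feats);
      int goldLabel = labels.indexOf(gold[i]);
      if (predicted != goldLabel) {
        for (String f : feats) {
          double[] w = weightsFor(f);
          w[goldLabel] += 1.0;
          w[predicted] -= 1.0;
        }
      }
      prev = gold[i];  // condition training features on the gold history
    }
  }

  public static void main(String[] args) {
    PerceptronTaggerSketch tagger =
        new PerceptronTaggerSketch(Arrays.asList("DT", "NN", "VBZ"));
    String[] words = { "the", "dog", "barks" };
    String[] gold = { "DT", "NN", "VBZ" };
    for (int iter = 0; iter < 5; iter++) {
      tagger.update(words, gold);
    }
    System.out.println(Arrays.toString(tagger.tag(words)));
  }
}

In practice you would average the weights, use richer features, and decode with Viterbi or a beam, but the estimation really is just that add/subtract update -- no gradients, no normalization constants.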
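
And here is the bare-bones label propagation loop I referred to. This is not Junto's API or algorithms -- just the simplest possible version of the idea: nodes with seed distributions keep them, and every other node repeatedly takes the normalized average of its neighbors' distributions.

import java.util.*;

/** Toy label propagation sketch -- NOT the Junto implementation. */
public class LabelPropagationSketch {

  // graph: node -> neighbors; seeds: node -> label distribution
  // (in my setup the seeds come from an initial OpenNLP model's predictions).
  public static Map<String, double[]> propagate(
      Map<String, List<String>> graph,
      Map<String, double[]> seeds,
      int numLabels,
      int iterations) {

    // Start every node from its seed distribution, or uniform if it has none.
    Map<String, double[]> dist = new HashMap<String, double[]>();
    for (String node : graph.keySet()) {
      double[] seed = seeds.get(node);
      dist.put(node, seed != null ? seed.clone() : uniform(numLabels));
    }

    for (int iter = 0; iter < iterations; iter++) {
      Map<String, double[]> next = new HashMap<String, double[]>();
      for (String node : graph.keySet()) {
        if (seeds.containsKey(node)) {
          // Clamp seed nodes to their original distributions.
          next.put(node, seeds.get(node).clone());
          continue;
        }
        // Unlabeled nodes take the average of their neighbors' distributions.
        double[] avg = new double[numLabels];
        for (String neighbor : graph.get(node)) {
          double[] d = dist.get(neighbor);
          if (d == null) continue;
          for (int k = 0; k < numLabels; k++) avg[k] += d[k];
        }
        normalize(avg);
        next.put(node, avg);
      }
      dist = next;
    }
    return dist;
  }

  private static double[] uniform(int numLabels) {
    double[] d = new double[numLabels];
    Arrays.fill(d, 1.0 / numLabels);
    return d;
  }

  private static void normalize(double[] d) {
    double sum = 0.0;
    for (double v : d) sum += v;
    if (sum > 0.0) for (int k = 0; k < d.length; k++) d[k] /= sum;
  }
}

In the setup I described, the seed distributions are the per-instance outputs of the initial OpenNLP model, and the graph edges come from whatever similarity you have between instances (shared features, string similarity, and so on).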
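
Lastly, since Olivier brings up the hashing trick used in Vowpal Wabbit and Mahout: the idea is simply to hash feature strings straight into a fixed-size weight array instead of maintaining a growing string-to-index dictionary. A toy sketch, using String.hashCode() rather than murmurhash and an arbitrary bucket count, with collisions just accepted:

/** Toy feature hashing sketch: map feature strings straight to weight indices. */
public class FeatureHashingSketch {

  private static final int NUM_BUCKETS = 1 << 18;  // 262144 buckets, an arbitrary choice
  private final double[] weights = new double[NUM_BUCKETS];

  // Hash a feature string to a bucket; murmurhash gives better spread than
  // hashCode(), but the idea is the same.
  private int bucket(String feature) {
    return (feature.hashCode() & 0x7fffffff) % NUM_BUCKETS;
  }

  // Score a feature vector without ever building a feature-to-index map,
  // so there is no dictionary to hold in memory or ship with the model.
  public double score(String[] features) {
    double s = 0.0;
    for (String f : features) {
      s += weights[bucket(f)];
    }
    return s;
  }

  // A perceptron- or SGD-style update works the same way: touch only the hashed buckets.
  public void update(String[] features, double delta) {
    for (String f : features) {
      weights[bucket(f)] += delta;
    }
  }
}

With a reasonably large number of buckets the collisions tend to matter surprisingly little, and you never have to hold or serialize a feature dictionary.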
