CRFs are definitely cleaner than the maximum entropy Markov models (MEMMs)
we've been using, since CRFs provide a globally normalized probability of the
whole label sequence rather than finding the best path through a sequence of
locally normalized predictions (a quick formula sketch of the contrast follows
the list below). They do consistently provide better accuracy than MEMMs, but
until recently that has come at a significant increase in training complexity.
I say until recently because Andrew McCallum and colleagues have been coming
up with new MCMC-based methods for estimating CRFs, which you can check out at
http://code.google.com/p/factorie/. So:

 - using undirected graphical models in general would be a good thing
 - but I don't think we should implement them in OpenNLP right now
 - instead, see whether Factorie can be used in some way, or base
implementations on the newer inference procedures
 - and there is plenty of room for improvement from better use of noisy
training sources, which will probably outweigh investment in things like
CRFs, at least for the near future
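
To make the normalization point concrete, here is the standard textbook
contrast for the linear-chain case (nothing OpenNLP-specific, just the usual
formulation):

MEMM, with a softmax at every position:

  P(y|x) = \prod_{t=1}^{T} \exp(w \cdot f(y_{t-1}, y_t, x, t)) / Z_t(x, y_{t-1})
  where Z_t(x, y_{t-1}) = \sum_{y'} \exp(w \cdot f(y_{t-1}, y', x, t))

Linear-chain CRF, normalized once over whole label sequences:

  P(y|x) = \exp(\sum_{t=1}^{T} w \cdot f(y_{t-1}, y_t, x, t)) / Z(x)
  where Z(x) = \sum_{y'} \exp(\sum_{t=1}^{T} w \cdot f(y'_{t-1}, y'_t, x, t)),
  with the sum ranging over all possible label sequences y'

The per-position denominators in the MEMM are what give rise to label bias;
the single partition function in the CRF avoids that, but summing over all
label sequences is also exactly where the extra training cost comes from.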

The perceptron sequence taggers have the same basic properties as the MEMMs,
in that they are locally normalized, but they have simpler parameter
estimation than maxent. Perceptrons are simple and beautiful, and usually
perform only a bit worse than maxent models, so there is plenty of reason to
use them throughout the system.
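
Since the update rule really is the whole story with perceptrons, here is a
bare-bones multiclass version in Java, just as an illustration of the idea
(it is not OpenNLP's actual trainer code):

// Bare-bones multiclass perceptron; an illustration, not OpenNLP's trainer API.
import java.util.HashMap;
import java.util.Map;

public class TinyPerceptron {
    // label -> (feature -> weight)
    private final Map<String, Map<String, Double>> weights =
        new HashMap<String, Map<String, Double>>();

    public TinyPerceptron(String[] labels) {
        for (String label : labels)
            weights.put(label, new HashMap<String, Double>());
    }

    private double score(String label, Map<String, Double> feats) {
        double s = 0.0;
        Map<String, Double> w = weights.get(label);
        for (Map.Entry<String, Double> f : feats.entrySet()) {
            Double wv = w.get(f.getKey());
            if (wv != null) s += wv * f.getValue();
        }
        return s;
    }

    public String predict(Map<String, Double> feats) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : weights.keySet()) {
            double s = score(label, feats);
            if (s > bestScore) { bestScore = s; best = label; }
        }
        return best;
    }

    // One online update: no probabilities or normalization anywhere. If the
    // prediction is wrong, shift weight toward the gold label, away from the guess.
    public void update(Map<String, Double> feats, String gold) {
        String guess = predict(feats);
        if (gold.equals(guess)) return;
        for (Map.Entry<String, Double> f : feats.entrySet()) {
            adjust(gold, f.getKey(), f.getValue());
            adjust(guess, f.getKey(), -f.getValue());
        }
    }

    private void adjust(String label, String feature, double delta) {
        Map<String, Double> w = weights.get(label);
        Double old = w.get(feature);
        w.put(feature, (old == null ? 0.0 : old) + delta);
    }
}

In practice you would average the weights across updates (the averaged
perceptron), which is a big part of why they stay so close to maxent.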

Another thing that could make a difference would be allowing EM training in
certain situations, e.g. building text classifiers when there is some
unlabeled data available to bootstrap on.
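
Roughly, the loop I have in mind looks like the sketch below. The
ProbabilisticClassifier and Trainer interfaces are hypothetical names used
only for illustration (not existing OpenNLP types), and this is the
soft-labeling flavor of EM:

// Sketch of EM-style semi-supervised training for a text classifier.
// The interfaces here are hypothetical, purely to make the loop concrete.
import java.util.ArrayList;
import java.util.List;

interface ProbabilisticClassifier {
    double[] labelDistribution(String document);   // P(label | document)
}

interface Trainer {
    ProbabilisticClassifier train(List<String> docs, List<double[]> labelDists);
}

class SemiSupervisedEM {
    static ProbabilisticClassifier run(Trainer trainer,
                                       List<String> labeledDocs,
                                       List<double[]> goldDists,
                                       List<String> unlabeledDocs,
                                       int iterations) {
        // Start from the labeled data alone.
        ProbabilisticClassifier model = trainer.train(labeledDocs, goldDists);
        for (int i = 0; i < iterations; i++) {
            List<String> docs = new ArrayList<String>(labeledDocs);
            List<double[]> dists = new ArrayList<double[]>(goldDists);
            // E-step: soft-label the unlabeled documents with the current model.
            for (String doc : unlabeledDocs) {
                docs.add(doc);
                dists.add(model.labelDistribution(doc));
            }
            // M-step: retrain on the gold labels plus the soft labels.
            model = trainer.train(docs, dists);
        }
        return model;
    }
}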

Finally, I'm also very interested in label propagation approaches these
days. They are useful in a wide variety of contexts, and they can scale
easily. I just helped Partha Talukdar release the Junto Label Propagation
Toolkit, an ASL-licensed version of the code base he created for his PhD at
UPenn. If anyone is interested, you can get it here:

http://code.google.com/p/junto/

I'm already using this in connection with models trained with OpenNLP, e.g.
an initial OpenNLP model is used to seed the label distributions in the
graph, and then label propagation is run.
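
In case it helps to picture that pipeline, here is a bare-bones version of
the propagation step in plain Java. It is only an illustration of the general
idea; Junto's actual algorithms and API do considerably more than this:

// Generic iterative label propagation over a node -> neighbors graph.
// Plain illustration of the idea; this is not Junto's API.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimpleLabelProp {
    // graph: node -> neighbors; seeds: node -> clamped label distribution
    static Map<String, double[]> propagate(Map<String, List<String>> graph,
                                           Map<String, double[]> seeds,
                                           int numLabels, int iterations) {
        Map<String, double[]> dist = new HashMap<String, double[]>();
        for (String node : graph.keySet()) {
            double[] seed = seeds.get(node);
            dist.put(node, seed != null ? seed : new double[numLabels]);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, double[]> next = new HashMap<String, double[]>();
            for (String node : graph.keySet()) {
                if (seeds.containsKey(node)) {     // keep seeded nodes clamped
                    next.put(node, seeds.get(node));
                    continue;
                }
                // Unlabeled nodes take the normalized average of their
                // neighbors' current distributions.
                double[] avg = new double[numLabels];
                double total = 0.0;
                for (String nbr : graph.get(node)) {
                    double[] d = dist.get(nbr);
                    if (d == null) continue;
                    for (int k = 0; k < numLabels; k++) {
                        avg[k] += d[k];
                        total += d[k];
                    }
                }
                if (total > 0)
                    for (int k = 0; k < numLabels; k++) avg[k] /= total;
                next.put(node, avg);
            }
            dist = next;
        }
        return dist;
    }
}

The seed distributions are where the predictions of the initial OpenNLP model
plug in; the propagation loop then just spreads those labels over the rest of
the graph.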

In general, Olivier is absolutely correct that there is much to be gained by
using better features/representations.

Jason


On Tue, Jan 25, 2011 at 5:51 PM, Olivier Grisel <[email protected]> wrote:

> 2011/1/25 Jörn Kottmann <[email protected]>:
> > What do you think about CRFs, Jason?
> > Is that something you believe would be valuable in OpenNLP?
> >
> > Since you work a lot in research, it would be nice if you
> > could once in a while point out to us what you think would
> > make sense to put into OpenNLP. Like you did with the bloom filters.
>
> Before implementing CRFs, I think the low-hanging fruit would be to
> generalize the use of perceptrons (e.g. for NER models) and to
> introduce smarter features: non-local features, where the results of
> the first pass of a model trained using only local features are used to
> train a second model that uses both the output of the first model and
> the initial local features, see e.g.:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.4
>
> Perceptron models (or more generally linear models trained with SGD,
> such as the regularized logistic regression implemented in Mahout) are
> really fast to train, can scale to huge datasets (no need to load all
> the data in memory, especially if you hash the features as is done in
> Vowpal Wabbit and Mahout with murmurhash), and are simple to implement
> and debug.
>
> It would also be worth experimenting with richer word representations,
> see e.g.: A preliminary evaluation of word representations for
> named-entity recognition by Turian et al.
>
>  
> http://www.iro.umontreal.ca/~lisa/pointeurs/wordrepresentations-ner.pdf
>
> My 2 cents,
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
