Hi all,
we have spoken about it here and there already: to ensure that OpenNLP
stays competitive with other NLP libraries, I am proposing to make the
machine learning pluggable.
The extensions should not make OpenNLP harder to use: when a user loads
a model, OpenNLP should be capable of setting everything up by itself,
without forcing the user to write custom integration code for the
ml implementation.
We already solved this problem with the extension mechanism we built to
support the customization of our components; I suggest that we reuse it
to load an ml implementation. To use a custom ml implementation, the
user specifies the class name of its factory in the Algorithm field of
the params file, which is available both at training and at tagging
time.
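For illustration, a params file could then look like this (the factory
class name is made up):

    Algorithm=org.example.LiblinearTrainerFactory
    Iterations=100
    Cutoff=5

When the Algorithm value is not one of the built-in names, the
extension mechanism could be used to instantiate the named factory.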
Most components in the tools package use the maxent library to do
classification. The Java interfaces for this are currently located in
the maxent package; to be able to swap the implementation, the
interfaces should be defined inside the tools package. To make things
easier, I propose to move the maxent and perceptron implementations as
well.
Throughout the code base we use AbstractModel, which is a bit
unfortunate, because the only reason for this is the lack of model
serialization support in the MaxentModel interface. A serialization
method should be added to the interface, and it could perhaps be
renamed to ClassificationModel. This will break backward compatibility
in non-standard use cases.
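A rough sketch of how the renamed interface could look; the method name
and signature are just a suggestion:

    import java.io.IOException;
    import java.io.OutputStream;

    public interface ClassificationModel {

        // evaluates a context and returns the outcome probabilities
        double[] eval(String[] context);

        // returns the outcome with the highest probability
        String getBestOutcome(double[] outcomes);

        // ... the remaining methods of the current MaxentModel ...

        // new: lets components persist a model without casting
        // down to AbstractModel
        void serialize(OutputStream out) throws IOException;
    }

With this in place the components could program against the interface
only, and AbstractModel would become an implementation detail of the
maxent package.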
To be able to test the extension mechanism, I suggest that we implement
an addon which integrates the liblinear and Apache Mahout classifiers.
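To make that concrete, a liblinear addon could plug in through a
factory roughly like the one below; all names are made up, since the
extension interface still has to be designed (ClassificationModel is
the interface sketched above):

    import java.util.Map;

    // Hypothetical extension point a pluggable ml implementation
    // would have to provide.
    interface ClassificationModelTrainer {
        ClassificationModel train(Map<String, String> trainParams);
    }

    class LiblinearTrainer implements ClassificationModelTrainer {
        public ClassificationModel train(Map<String, String> trainParams) {
            // call into liblinear here and wrap the resulting model
            // in a ClassificationModel implementation
            throw new UnsupportedOperationException("sketch only");
        }
    }

    // The class named in the Algorithm field; OpenNLP would
    // instantiate it via the extension mechanism.
    public class LiblinearTrainerFactory {
        public ClassificationModelTrainer createTrainer() {
            return new LiblinearTrainer();
        }
    }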
There are still a few deprecated 1.4 constructors and methods in OpenNLP
which directly reference interfaces and classes in the maxent library;
these need to be removed before the interfaces can be moved to the tools
package.
Any opinions?
Jörn