+1 to add pluggable machine learning algorithms
+1 to improve the API and remove deprecated methods in 1.6.0
You can assign related Jira issues to me and I will be glad to help.

On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann wrote:
> Hi all,
>
> we spoke about it here and there already. To ensure that OpenNLP can stay
> competitive with other NLP libraries, I am proposing to make the machine
> learning pluggable.
>
> The extensions should not make it harder to use OpenNLP. If a user loads a
> model, OpenNLP should be capable of setting up everything by itself without
> forcing the user to write custom integration code based on the ML
> implementation.
> We already solved this problem with the extension mechanism we built to
> support the customization of our components, and I suggest that we reuse
> this extension mechanism to load an ML implementation. To use a custom ML
> implementation, the user has to specify the class name of the factory in the
> Algorithm field of the params file. The params file is available at both
> training and tagging time.
>
> Most components in the tools package use the maxent library to do
> classification. The Java interfaces for this are currently located in the
> maxent package; to be able to swap the implementation, the interfaces should
> be defined inside the tools package. To make things easier, I propose to
> move the maxent and perceptron implementations as well.
>
> Throughout the code base we use the AbstractModel. That's a bit unlucky,
> because the only reason for this is the lack of model serialization support
> in the MaxentModel interface. A serialization method should be added to it,
> and it may be renamed to ClassificationModel. This will break backward
> compatibility in non-standard use cases.
>
> To be able to test the extension mechanism, I suggest that we implement an
> add-on which integrates liblinear and the Apache Mahout classifiers.
>
> There are still a few deprecated 1.4 constructors and methods in OpenNLP
> which directly reference interfaces and classes in the maxent library.
> These need to be removed so that the interfaces can be moved to the tools
> package.
>
> Any opinions?
>
> Jörn
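A minimal sketch of the proposed lookup: the params file names a factory class in its Algorithm field, and that class is instantiated reflectively so the user never writes integration code. It assumes the params file has already been parsed into a `Map<String, String>` and that plug-ins ship a public no-arg factory class. `MlFactory`, `Classifier`, `UniformFactory`, `loadFactory` and the `Outcomes` parameter are hypothetical names for illustration only, not OpenNLP API.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch: none of these types are part of OpenNLP.
public class PluggableMlDemo {

    /** Minimal classifier contract a plug-in must satisfy (hypothetical). */
    public interface Classifier {
        double[] eval(String[] context);
        String[] outcomes();
    }

    /** Factory a plug-in provides; named by class in the Algorithm field (hypothetical). */
    public interface MlFactory {
        Classifier train(Map<String, String> params);
    }

    /** Trivial stand-in for a real trainer: returns a uniform distribution. */
    public static class UniformFactory implements MlFactory {
        public UniformFactory() {} // public no-arg constructor for reflective instantiation

        @Override
        public Classifier train(Map<String, String> params) {
            // "Outcomes" is a hypothetical parameter used only by this toy factory
            String[] labels = params.getOrDefault("Outcomes", "yes,no").split(",");
            return new Classifier() {
                @Override
                public double[] eval(String[] context) {
                    double[] p = new double[labels.length];
                    Arrays.fill(p, 1.0 / labels.length);
                    return p;
                }

                @Override
                public String[] outcomes() {
                    return labels.clone();
                }
            };
        }
    }

    static final String ALGORITHM_PARAM = "Algorithm";

    /** Instantiates the factory named in params, falling back to a default when absent. */
    public static MlFactory loadFactory(Map<String, String> params) {
        String className = params.getOrDefault(ALGORITHM_PARAM, UniformFactory.class.getName());
        try {
            Class<?> cls = Class.forName(className);
            if (!MlFactory.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " does not implement MlFactory");
            }
            return (MlFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load ML factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
                ALGORITHM_PARAM, UniformFactory.class.getName(),
                "Outcomes", "PER,LOC,O");
        Classifier classifier = loadFactory(params).train(params);
        System.out.println(Arrays.toString(classifier.eval(new String[] {"w=Berlin"})));
    }
}
```

Because the same params file is available when a model is loaded for tagging, the same `loadFactory` call could in principle run on both paths, which is what keeps the plug-in transparent to the user.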

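A rough illustration of the serialization point: if the model interface itself can write the model out, callers no longer need to cast to a concrete base class just to save it. The name `ClassificationModel` comes from the proposal; the method set, the `serialize` signature, the toy `PriorModel` and its binary layout are assumptions made for this sketch, not the interface OpenNLP adopted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ClassificationModelDemo {

    /** Hypothetical MaxentModel-like interface that can serialize itself. */
    public interface ClassificationModel {
        double[] eval(String[] context);
        String getOutcome(int index);
        int getNumOutcomes();
        /** Writes the model; the caller owns and closes the stream. */
        void serialize(OutputStream out) throws IOException;
    }

    /** Toy model that ignores the context and returns fixed outcome priors. */
    public static final class PriorModel implements ClassificationModel {
        private final String[] outcomes;
        private final double[] priors;

        public PriorModel(String[] outcomes, double[] priors) {
            if (outcomes.length != priors.length) {
                throw new IllegalArgumentException("outcomes and priors differ in length");
            }
            this.outcomes = outcomes.clone();
            this.priors = priors.clone();
        }

        @Override public double[] eval(String[] context) { return priors.clone(); }
        @Override public String getOutcome(int index) { return outcomes[index]; }
        @Override public int getNumOutcomes() { return outcomes.length; }

        @Override
        public void serialize(OutputStream out) throws IOException {
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(outcomes.length);
            for (int i = 0; i < outcomes.length; i++) {
                data.writeUTF(outcomes[i]);
                data.writeDouble(priors[i]);
            }
            data.flush(); // flush without closing the caller's stream
        }

        /** Reads back exactly what serialize(...) wrote. */
        public static PriorModel deserialize(InputStream in) throws IOException {
            DataInputStream data = new DataInputStream(in);
            int n = data.readInt();
            String[] outcomes = new String[n];
            double[] priors = new double[n];
            for (int i = 0; i < n; i++) {
                outcomes[i] = data.readUTF();
                priors[i] = data.readDouble();
            }
            return new PriorModel(outcomes, priors);
        }
    }

    public static void main(String[] args) throws IOException {
        ClassificationModel model =
                new PriorModel(new String[] {"B", "I", "O"}, new double[] {0.2, 0.3, 0.5});
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        model.serialize(buffer); // saved through the interface, no AbstractModel cast
        ClassificationModel copy =
                PriorModel.deserialize(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(copy.getOutcome(2) + " " + copy.eval(new String[0])[2]); // prints "O 0.5"
    }
}
```

In a real design the stream would also need to record which factory can read it back, so that loading stays as automatic as saving.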