big +1!

Tommaso

2013/5/31 William Colen <[email protected]>

> I don't see any issue. People who use Maxent directly would need to
> change how they use it, but that is OK for a major release.
>
> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <[email protected]> wrote:
>
> > Are there any objections to moving the maxent/perceptron classes to an
> > opennlp.tools.ml package as part of this issue? Moving them would avoid
> > a second interface layer and probably make using OpenNLP Tools a bit
> > easier, because then we are down to a single jar.
> >
> > Jörn
> >
> > On 05/30/2013 08:57 PM, William Colen wrote:
> >
> >> +1 to add pluggable machine learning algorithms
> >> +1 to improve the API and remove deprecated methods in 1.6.0
> >>
> >> You can assign related Jira issues to me and I will be glad to help.
> >>
> >> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]> wrote:
> >>
> >>> Hi all,
> >>>
> >>> we have spoken about it here and there already: to ensure that OpenNLP
> >>> can stay competitive with other NLP libraries, I am proposing to make
> >>> the machine learning pluggable.
> >>>
> >>> The extensions should not make it harder to use OpenNLP. If a user
> >>> loads a model, OpenNLP should be capable of setting up everything by
> >>> itself, without forcing the user to write custom integration code
> >>> based on the ml implementation.
> >>> We already solved this problem with the extension mechanism we built
> >>> to support the customization of our components. I suggest that we
> >>> reuse this extension mechanism to load an ml implementation. To use a
> >>> custom ml implementation, the user has to specify the class name of
> >>> the factory in the Algorithm field of the params file. The params file
> >>> is available at training and tagging time.
> >>>
> >>> Most components in the tools package use the maxent library to do
> >>> classification. The Java interfaces for this are currently located in
> >>> the maxent package. To be able to swap the implementation, the
> >>> interfaces should be defined inside the tools package. To make things
> >>> easier, I propose to move the maxent and perceptron implementations as
> >>> well.
> >>>
> >>> Throughout the code base we use AbstractModel. That's a bit
> >>> unfortunate, because the only reason for it is the lack of model
> >>> serialization support in the MaxentModel interface. A serialization
> >>> method should be added to that interface, and the interface could
> >>> maybe be renamed to ClassificationModel. This will break backward
> >>> compatibility in non-standard use cases.
> >>>
> >>> To be able to test the extension mechanism, I suggest that we
> >>> implement an addon which integrates liblinear and the Apache Mahout
> >>> classifiers.
> >>>
> >>> There are still a few deprecated 1.4 constructors and methods in
> >>> OpenNLP which directly reference interfaces and classes in the maxent
> >>> library. These need to be removed to be able to move the interfaces to
> >>> the tools package.
> >>>
> >>> Any opinions?
> >>>
> >>> Jörn
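
For illustration only, here is a minimal Java sketch of how a factory class named in the Algorithm field of the params file might be resolved and used. Only the "Algorithm" parameter name and the ClassificationModel name come from the proposal quoted above. The shapes of TrainerFactory, ClassificationModel and ConstantTrainerFactory, and the "Outcome" parameter, are hypothetical and are not the OpenNLP API. The params file is modelled as a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

public class PluggableTrainerSketch {

    /** Hypothetical stand-in for the classification model interface. */
    public interface ClassificationModel {
        String bestOutcome(String[] context);
    }

    /** Hypothetical factory that an ml implementation would provide. */
    public interface TrainerFactory {
        ClassificationModel train(Map<String, String> params);
    }

    /** Toy implementation standing in for a real addon; always predicts one outcome. */
    public static class ConstantTrainerFactory implements TrainerFactory {
        @Override
        public ClassificationModel train(Map<String, String> params) {
            // "Outcome" is an invented parameter used only by this toy trainer.
            String outcome = params.getOrDefault("Outcome", "other");
            return context -> outcome;
        }
    }

    /** Resolves the factory class named in the "Algorithm" parameter via reflection. */
    public static TrainerFactory loadFactory(Map<String, String> params)
            throws ReflectiveOperationException {
        String className = params.get("Algorithm");
        if (className == null) {
            throw new IllegalArgumentException("Algorithm parameter is missing");
        }
        // asSubclass fails with ClassCastException if the class is not a TrainerFactory.
        return Class.forName(className)
                .asSubclass(TrainerFactory.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new HashMap<>();
        params.put("Algorithm", ConstantTrainerFactory.class.getName());
        params.put("Outcome", "person");

        ClassificationModel model = loadFactory(params).train(params);
        System.out.println(model.bestOutcome(new String[] {"w=example"}));
    }
}
```

The point of the sketch is that calling code depends only on the two interfaces, so swapping the ml implementation would mean changing one value in the params, not the calling code.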
