I don't see any issue. People who use Maxent directly would need to
change how they use it, but that is acceptable for a major release.

On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <[email protected]> wrote:

> Are there any objections to moving the maxent/perceptron classes to an
> opennlp.tools.ml package as part of this issue? Moving them would avoid a
> second interface layer and probably make using OpenNLP Tools a bit easier,
> because then we are down to a single jar.
>
> Jörn
>
>
> On 05/30/2013 08:57 PM, William Colen wrote:
>
>> +1 to add pluggable machine learning algorithms
>> +1 to improve the API and remove deprecated methods in 1.6.0
>>
>> You can assign related Jira issues to me and I will be glad to help.
>>
>>
>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> we spoke about it here and there already, to ensure that OpenNLP can stay
>>> competitive with other NLP libraries I am proposing to make the machine
>>> learning pluggable.
>>>
>>> The extension should not make OpenNLP harder to use: when a user loads a
>>> model, OpenNLP should be able to set everything up by itself, without
>>> forcing the user to write custom integration code tied to the ml
>>> implementation.
>>> We already solved this problem with the extension mechanism we built to
>>> support the customization of our components, and I suggest we reuse that
>>> mechanism to load an ml implementation. To use a custom ml
>>> implementation, the user has to specify the class name of the factory in
>>> the Algorithm field of the params file. The params file is available at
>>> both training and tagging time.
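For illustration only, a sketch of what such a params file might look like under the proposal above; the factory class name org.example.MyMlFactory is a hypothetical placeholder, not a real class:

```
# Hypothetical params file sketch: Algorithm carries the factory class name
Algorithm=org.example.MyMlFactory
Iterations=100
Cutoff=5
```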
>>>
>>> Most components in the tools package use the maxent library to do
>>> classification. The Java interfaces for this are currently located in
>>> the maxent package; to be able to swap the implementation, the
>>> interfaces should be defined inside the tools package. To make things
>>> easier, I propose to move the maxent and perceptron implementations as
>>> well.
>>>
>>> Throughout the code base we use AbstractModel; that is a bit
>>> unfortunate, because the only reason for it is the lack of model
>>> serialization support in the MaxentModel interface. A serialization
>>> method should be added to the interface, which could also be renamed to
>>> ClassificationModel. This will
>>> break backward compatibility in non-standard use cases.
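A minimal sketch of what such an interface could look like; the names ClassificationModel, eval, and serialize, and the stand-in implementation, are illustrative assumptions, not the actual OpenNLP API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a MaxentModel-style interface extended with a
// serialize method, so callers no longer need to downcast to AbstractModel.
interface ClassificationModel {
    // Evaluate a context and return outcome probabilities.
    double[] eval(String[] context);

    // Write the model to a stream through the interface itself.
    void serialize(OutputStream out) throws IOException;
}

// Trivial stand-in implementation, only to show how a caller would use it.
class ConstantModel implements ClassificationModel {
    public double[] eval(String[] context) {
        return new double[] {1.0};
    }

    public void serialize(OutputStream out) throws IOException {
        out.write("constant-model".getBytes(StandardCharsets.UTF_8));
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        ClassificationModel model = new ConstantModel();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Serialization goes through the interface, not a concrete class.
        model.serialize(buf);
        System.out.println(buf.toString("UTF-8"));
    }
}
```

The point of the sketch is that serialization becomes part of the interface contract, so swapping in a different ml implementation requires no changes on the caller's side.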
>>>
>>> To be able to test the extension mechanism, I suggest we implement an
>>> add-on that integrates the liblinear and Apache Mahout classifiers.
>>>
>>> There are still a few deprecated 1.4 constructors and methods in OpenNLP
>>> that directly reference interfaces and classes in the maxent library;
>>> these need to be removed before the interfaces can be moved to the tools
>>> package.
>>>
>>> Any opinions?
>>>
>>> Jörn
>>>
>>>
>
