Tommaso,

I have created the Jira issue:
https://issues.apache.org/jira/browse/OPENNLP-777

The details of the Java version compatibility and the classifier's
internals are as follows:

"Implementation details: We have a production-hardened piece of Java code
for a multinomial Naive Bayesian classifier (with default Laplace
smoothing) that we'd like to contribute. The code is Java 1.5 compatible.
I'd have to write an adapter to make the interface compatible with the ME
classifier in OpenNLP. I expect the patch to be available 1 to 3 weeks from
now."
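
For anyone following the thread who wants a concrete picture of the model
named above, here is a minimal sketch of a multinomial Naive Bayes
classifier with Laplace (add-one) smoothing. It is not the code being
contributed: the class name MultinomialNaiveBayesSketch and its methods are
hypothetical, chosen only for illustration, and the OpenNLP adapter is not
shown. It uses only Java 5 language features. Each label c is scored as
log P(c) plus the sum over tokens w of log((count(w, c) + 1) / (N_c + |V|)),
where N_c is the number of training tokens seen for c and |V| is the
vocabulary size.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical.
class MultinomialNaiveBayesSketch {
    // Number of training documents per label.
    private final Map<String, Integer> docCounts = new HashMap<String, Integer>();
    // Per-label token counts.
    private final Map<String, Map<String, Integer>> tokenCounts =
            new HashMap<String, Map<String, Integer>>();
    // Total number of training tokens per label (N_c).
    private final Map<String, Integer> totalTokens = new HashMap<String, Integer>();
    // All distinct tokens seen in training (V).
    private final Set<String> vocabulary = new HashSet<String>();
    private int totalDocs = 0;

    public void train(String label, String[] tokens) {
        totalDocs++;
        Integer docs = docCounts.get(label);
        docCounts.put(label, docs == null ? 1 : docs + 1);

        Map<String, Integer> counts = tokenCounts.get(label);
        if (counts == null) {
            counts = new HashMap<String, Integer>();
            tokenCounts.put(label, counts);
        }
        int total = totalTokens.containsKey(label) ? totalTokens.get(label) : 0;
        for (String token : tokens) {
            Integer c = counts.get(token);
            counts.put(token, c == null ? 1 : c + 1);
            vocabulary.add(token);
            total++;
        }
        totalTokens.put(label, total);
    }

    // log P(label) + sum of log P(token | label), with add-one smoothing.
    // Tokens never seen in training get count 0 under every label.
    public double logScore(String label, String[] tokens) {
        double score = Math.log((double) docCounts.get(label) / totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        double denominator = totalTokens.get(label) + vocabulary.size();
        for (String token : tokens) {
            Integer c = counts.get(token);
            score += Math.log(((c == null ? 0 : c) + 1.0) / denominator);
        }
        return score;
    }

    // Returns the highest-scoring label, or null if nothing was trained.
    public String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double s = logScore(label, tokens);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids floating-point underflow on long documents,
and the add-one term keeps a token unseen under one label from driving that
label's probability to zero.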

This is the default configuration, but the code is well refactored and you
can plug in any smoothing algorithm and any feature set. It also has some
support for succinct memory models, and I plan to add a multivariate
Bernoulli implementation later as well (I wanted to start with the
multinomial version because its advantages will make it the better
performer for most NLP projects).
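
One way a pluggable smoothing algorithm could be structured is sketched
below. This is an assumption about the general shape of such a design, not
the interface in the contributed code: the names Smoothing and
AddAlphaSmoothing are hypothetical. Add-alpha (Lidstone) smoothing with
alpha = 1 reduces to the Laplace default.

```java
// Illustrative sketch only; all names here are hypothetical.
interface Smoothing {
    // Smoothed estimate of P(token | label) from raw counts.
    double probability(int count, int totalCount, int vocabularySize);
}

// Add-alpha (Lidstone) smoothing; alpha = 1.0 gives Laplace smoothing.
class AddAlphaSmoothing implements Smoothing {
    private final double alpha;

    AddAlphaSmoothing(double alpha) {
        if (alpha <= 0.0) {
            throw new IllegalArgumentException("alpha must be positive");
        }
        this.alpha = alpha;
    }

    public double probability(int count, int totalCount, int vocabularySize) {
        return (count + alpha) / (totalCount + alpha * vocabularySize);
    }
}
```

A classifier that takes a Smoothing instance in its constructor can then
switch estimators without any change to its counting or scoring code.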

I could not figure out how to assign the issue to myself.  The patch will
be available 1 to 3 weeks from now.

Thanks and regards,

Cohan Sujay Carlos


On Tue, May 19, 2015 at 5:26 PM, Tommaso Teofili <[email protected]>
wrote:

> Hi Cohan,
>
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
>
> Thanks and regards,
> Tommaso
>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos <[email protected]>:
>
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped with partially labelled training
> > data, as explained in the 2000 paper by Nigam, McCallum, et al., "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there is some interest in adding an NB implementation
> > and, if there is, I'd love to know who I could coordinate with.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India
> > +91-77605-80015 +91-80-4125-0730
> >
>
