[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-05-19 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550578#comment-14550578
 ] 

Joern Kottmann commented on OPENNLP-777:


Yes, that would be really nice to have in OpenNLP!

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
>Affects Versions: 1.6.0
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Fix For: 1.6.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-05-19 Thread Haider Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551168#comment-14551168
 ] 

Haider Ali commented on OPENNLP-777:


I also want to contribute to the Naive Bayesian Classifier.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632413#comment-14632413
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


In implementing the Naive Bayes classifier, we tried to ensure minimal 
disruption to existing code.

The only changes to existing code are as follows:

1.  The opennlp.tools.ml.model.AbstractModel class has been changed to include 
a new model type:

line 35:  public enum ModelType {Maxent,Perceptron,MaxentQn,NaiveBayes};

2.  The opennlp.tools.ml.model.GenericModelReader class has been changed in one 
place:

line 53:
    else if (modelType.equals("NaiveBayes")) {
        delegateModelReader = new NaiveBayesModelReader(this.dataReader);
    }

3.  The opennlp.tools.ml.model.GenericModelWriter class has been changed in two 
places:

line 79:
    if (model.getModelType() == ModelType.NaiveBayes) {
        delegateWriter = new BinaryNaiveBayesModelWriter(model,dos);
    }

line 91:
    if (model.getModelType() == ModelType.NaiveBayes) {
        delegateWriter = new PlainTextNaiveBayesModelWriter(model,bw);
    }

4.  The initializer of the opennlp.tools.ml.TrainerFactory class has been 
changed in one place to add in the built-in Naive Bayes trainer:

line 51:
_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, NaiveBayesTrainer.class);

That was it!

We didn't change anything else in the existing OpenNLP code.
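
As a rough illustration of why the four changes above are so small, here is a minimal, self-contained sketch of the registry-style dispatch they rely on. It is not the OpenNLP code: the class TrainerRegistrySketch, the nested Trainer interface, the create method, and the string values "MAXENT" and "NAIVEBAYES" are all assumptions made up for this example. Only the idea (one new enum constant plus one new registry entry) comes from the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in classes; none of these are the real OpenNLP types.
class TrainerRegistrySketch {

    /** One constant per model family; NaiveBayes is the newly added one. */
    enum ModelType { Maxent, Perceptron, MaxentQn, NaiveBayes }

    /** Minimal trainer contract, invented for this sketch. */
    interface Trainer {
        ModelType modelType();
    }

    static class MaxentTrainer implements Trainer {
        public ModelType modelType() { return ModelType.Maxent; }
    }

    static class NaiveBayesTrainer implements Trainer {
        // Placeholder key; the real constant's value is not given in the thread.
        static final String NAIVE_BAYES_VALUE = "NAIVEBAYES";
        public ModelType modelType() { return ModelType.NaiveBayes; }
    }

    /** Built-in trainers, keyed by algorithm name. */
    private static final Map<String, Class<? extends Trainer>> TRAINERS = new LinkedHashMap<>();
    static {
        TRAINERS.put("MAXENT", MaxentTrainer.class);
        // Registering a new algorithm is a single extra line.
        TRAINERS.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, NaiveBayesTrainer.class);
    }

    /** Looks up the trainer class by name and instantiates it reflectively. */
    static Trainer create(String algorithm) {
        Class<? extends Trainer> cls = TRAINERS.get(algorithm);
        if (cls == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Trainer trainer = create(NaiveBayesTrainer.NAIVE_BAYES_VALUE);
        System.out.println(trainer.modelType());
    }
}
```

With this kind of design, code that already asks the registry for a trainer by name does not need to change when a new algorithm is added.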

All the new code for the Naive Bayesian classifier sits in the package 
opennlp.tools.ml.naivebayes - just above the perceptron ;)

The code for the document categorizer using the Naive Bayesian classifier can 
be found in opennlp.tools.doccat (we didn't have to change any existing code).  
The new doccat is called opennlp.tools.doccat.DocumentCategorizerNB (reflecting 
the name of the maxent document categorizer, which is DocumentCategorizerME).

Proof of correctness!

I have included two test cases:

1.  A test to validate the document categorizer - under the tests folder, you 
will find opennlp.tools.doccat.DocumentCategorizerNBTest - which runs the same 
tests that were run on the ME document categorizer, but on the Naive Bayes 
categorizer instead (all tests passed).

2.  A test to check the mathematical correctness of the Naive Bayes 
implementation can be found in 
opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
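
To give an idea of what a mathematical correctness check can look like, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace (add-one) smoothing. It is not the code from the patch, and the class NaiveBayesSketch and its train and posterior methods are hypothetical names invented for this example. The smoothed likelihood used is (count(token, label) + 1) / (tokens(label) + |V|), where |V| is the vocabulary size seen in training.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the classifier contributed in the patch.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new LinkedHashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled document, given as a bag of tokens. */
    void train(String label, String... tokens) {
        docsPerLabel.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns P(label | tokens) for every label, normalised to sum to 1. */
    Map<String, Double> posterior(String... tokens) {
        Map<String, Double> logScores = new LinkedHashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Log prior: fraction of training documents carrying this label.
            double logScore = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(label);
            int labelTokens = tokensPerLabel.getOrDefault(label, 0);
            for (String token : tokens) {
                // Laplace smoothing: add one to every count so that no
                // likelihood is ever zero.
                int count = counts.getOrDefault(token, 0);
                logScore += Math.log((count + 1.0) / (labelTokens + vocabulary.size()));
            }
            logScores.put(label, logScore);
            max = Math.max(max, logScore);
        }
        // Subtract the maximum before exponentiating to avoid underflow.
        double total = 0.0;
        for (double score : logScores.values()) {
            total += Math.exp(score - max);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            result.put(e.getKey(), Math.exp(e.getValue() - max) / total);
        }
        return result;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", "ball", "goal");
        nb.train("politics", "vote", "ball");
        System.out.println(nb.posterior("goal"));
    }
}
```

The toy data can be checked by hand: the vocabulary has 3 tokens and each label has 2 training tokens, so P(goal | sports) = (1 + 1) / (2 + 3) = 0.4 and P(goal | politics) = (0 + 1) / (2 + 3) = 0.2. With equal priors, the posterior for "sports" should be 0.4 / (0.4 + 0.2) = 2/3. A correctness test can assert exactly such hand-computed values.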

So, the inclusion of this code will minimally impact any existing code.

And the code in the latest patch is verifiably correct.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: 
> naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> naive-bayes-patch-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h

[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-19 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632937#comment-14632937
 ] 

Joern Kottmann commented on OPENNLP-777:


Thanks for contributing!

Before we can pull it in, we need an ICLA on file. Have you already filled one out?
If not, it would be nice if you could send one in.

The ICLA can be found here:
https://www.apache.org/licenses/icla.txt

Thanks!

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-20 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633129#comment-14633129
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Joern, I have just emailed the completed ICLA to your apache.org email 
address.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-20 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633157#comment-14633157
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Joern, I have just emailed the completed ICLA to your apache.org email 
address.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-29 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645701#comment-14645701
 ] 

Joern Kottmann commented on OPENNLP-777:


The ICLA has not been received yet. Once it has been filed, you should be listed on this page: 
https://people.apache.org/committer-index.html

Can you please try to send it again? The beginning of the ICLA explains how 
to submit it (either via email or fax).



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-29 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645783#comment-14645783
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Joern, I've just resent the completed ICLA both to you and to the Secretary ... 
I had sent the earlier email only to you, not to the Secretary.  I hope that 
takes care of the paperwork!

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-07-29 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646044#comment-14646044
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Joern,

I've received an acknowledgement from Craig.  He wrote to say that the ICLA has 
been filed in the Apache Software Foundation records.




[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-07 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661877#comment-14661877
 ] 

Tommaso Teofili commented on OPENNLP-777:

The patch looks good overall to me, and the integration tests pass.
I think it would be important to have a bit more unit test coverage for the 
ml/naivebayes package. [~cohan.sujay], if you want to do it, that's perfect; 
otherwise I can take care of it.
Some minor things to fix:
- An exclusion for opennlp-tools/src/test/resources/data/ppa/NOTICE is missing 
in the pom file, so RAT reports that file as having an unknown license.
- Tab-based vs. space-based indentation.


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-10 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679794#comment-14679794
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

Thanks [~cohan.sujay]! I'll take care of the rest of the stuff.

If everything is fine, I'll commit it later today.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-11 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681743#comment-14681743
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

Thanks a lot [~cohan.sujay] for your additional patch.

The attached test fails on my machine with:
{noformat}
Failed tests: 
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachData:47 expected:<0.7897994553107205> but was:<0.7655360237682595>
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachDataUsingTrainUtil:61 expected:<0.7897994553107205> but was:<0.7655360237682595>
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachDataUsingTrainUtilWithCutoff5:75 expected:<0.7945035899975241> but was:<0.7930180737806388>
{noformat}

I'll look deeper into it but I think we can tweak the tests to tolerate a 
certain delta in such numbers.
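
The idea of tolerating a delta can be sketched in plain Java as follows. This is only a minimal illustration, not the change that was eventually committed: the class name ToleranceCheck, the helper withinDelta and the tolerance values are assumptions made up for this example. Inside a JUnit 4 test the same kind of check is available as assertEquals(double expected, double actual, double delta).

```java
// Hypothetical helper, not part of OpenNLP: compares two accuracy values
// and accepts them as equal when they differ by at most a given tolerance.
class ToleranceCheck {

  // Returns true when actual lies within +/- delta of expected.
  static boolean withinDelta(double expected, double actual, double delta) {
    return Math.abs(expected - actual) <= delta;
  }

  public static void main(String[] args) {
    // Values taken from the failing test output quoted above.
    double expected = 0.7897994553107205;
    double actual = 0.7655360237682595;

    // The two numbers differ by roughly 0.024, so an (assumed) delta of 0.05
    // accepts them, while a much tighter delta of 0.001 still rejects them.
    System.out.println(withinDelta(expected, actual, 0.05));
    System.out.println(withinDelta(expected, actual, 0.001));
  }
}
```

A tolerance wide enough to cover a gap of this size would also hide a genuine regression of the same magnitude, so it is probably worth understanding why the numbers differ before settling on a value.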

Other than that, a few more style-related comments:
 - we tend to avoid putting @author tags in the javadoc
 - the javadoc generally needs to be adjusted a bit
To help with style and formatting conventions, we have some guidelines on the 
website at http://opennlp.apache.org/code-conventions.html

I'll work a bit on it and attach a new patch.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-11 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681753#comment-14681753
 ] 

Joern Kottmann commented on OPENNLP-777:


The difference in the test results indicates that something differs between 
your system and his.
My best guess is that the training data is decoded with a different character 
encoding on your system than on his.

All other tests compute exactly identical numbers across systems.
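
To make the encoding guess concrete, here is a small sketch of how the same bytes can turn into different strings, and therefore different features, depending on the charset used to read them. It is not OpenNLP code and does not show that encoding was the actual cause here; the class EncodingDemo and the sample word are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not OpenNLP code: identical bytes decode to different
// strings when the reader assumes a different character encoding.
class EncodingDemo {

  // Decodes raw bytes with an explicitly chosen charset.
  static String decode(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    // "caf\u00e9" stored as UTF-8: the accented letter occupies two bytes.
    byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

    String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
    String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

    System.out.println(asUtf8.length());          // 4 characters
    System.out.println(asLatin1.length());        // 5 characters
    System.out.println(asUtf8.equals(asLatin1));  // false

    // Code that relies on the platform default picks up whatever this reports.
    System.out.println("platform default: " + Charset.defaultCharset());
  }
}
```

Passing an explicit charset wherever the training data is read removes the dependency on the platform default, which is one way to keep results identical across machines.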


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-11 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681884#comment-14681884
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Tommaso, I have no objection to the removal of the @author tags.

Is there a way to reformat the code?

If you know of a way to reformat it automatically, that would be great.

You will probably find the indent conventions messed up - I hadn't used the 
OpenNLP code formatter :|


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-08-11 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681979#comment-14681979
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

I've committed [~cohan.sujay]'s patch, thanks Cohan!

I've adjusted indent, author tags and some javadoc. I'll follow up shortly with 
some more commits for minor improvements and more unit tests.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-10 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738878#comment-14738878
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

[~cohan.sujay] I am writing some tests around model IO (persist to and read 
from file) but I am not sure if I am doing something wrong or there's a bug 
there.
If you try the two tests below they'll both fail at reading the model written 
to file:
{code}

  @Test
  public void testBinaryModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "bnb-", ".bin");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "ptnb-", ".txt");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }

{code}

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>

[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-15 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Tommaso,

The problem in the above test cases seems to be the use of 
GenericModelWriter.

Each machine learning algorithm has its own set of ModelWriter and 
ModelReader classes, which must be used to persist its models.

The writers come in two flavours: Binary and PlainText.

The following test cases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make them work).

I hope that answers your question.

{code}
  @Test
  public void testBinaryModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    File file = new File("test.bin");
    NaiveBayesModelWriter modelWriter = new BinaryNaiveBayesModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new BinaryNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    File file = new File("test.txt");
    NaiveBayesModelWriter modelWriter = new PlainTextNaiveBayesModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new PlainTextNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }
{code}
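
Why a writer has to be paired with the reader for the same format can be shown with a self-contained sketch. It deliberately avoids the OpenNLP classes: FormatRoundTrip and its four helper methods are hypothetical names, and the "model" is just an array of doubles. It is not a claim about what went wrong inside GenericModelWriter, only an illustration of the binary versus plain-text distinction.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for model persistence; none of these names are OpenNLP's.
class FormatRoundTrip {

  // Binary flavour: a length prefix followed by raw 8-byte doubles.
  static byte[] writeBinary(double[] params) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(params.length);
    for (double p : params) {
      out.writeDouble(p);
    }
    out.flush();
    return bytes.toByteArray();
  }

  static double[] readBinary(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    double[] params = new double[in.readInt()];
    for (int i = 0; i < params.length; i++) {
      params[i] = in.readDouble();
    }
    return params;
  }

  // Plain-text flavour: one value per line, preceded by the count.
  static byte[] writeText(double[] params) {
    StringBuilder sb = new StringBuilder();
    sb.append(params.length).append('\n');
    for (double p : params) {
      sb.append(p).append('\n');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  static double[] readText(byte[] data) {
    String[] lines = new String(data, StandardCharsets.UTF_8).split("\n");
    double[] params = new double[Integer.parseInt(lines[0])];
    for (int i = 0; i < params.length; i++) {
      params[i] = Double.parseDouble(lines[i + 1]);
    }
    return params;
  }

  public static void main(String[] args) throws IOException {
    double[] params = {0.25, 0.75};

    // A matching writer/reader pair reproduces the parameters exactly.
    System.out.println(Arrays.equals(params, readBinary(writeBinary(params))));
    System.out.println(Arrays.equals(params, readText(writeText(params))));

    // A mismatched pair cannot parse the data at all.
    try {
      readText(writeBinary(params));
    } catch (NumberFormatException e) {
      System.out.println("text reader rejected binary data");
    }
  }
}
```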


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>

[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-16 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791028#comment-14791028
 ] 

Joern Kottmann commented on OPENNLP-777:


I noticed that the patch added this file:
opennlp-tools/src/main/java/opennlp/tools/doccat/DocumentCategorizerNB.java

What is the reason behind that? If the NB classifier is integrated correctly, it 
can be activated via the params mechanism and used by any component in 
OpenNLP that uses a classifier. 
Is it already possible to configure it via parameters?

The DocumentCategorizerNB class should be removed.
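
The design argument here, that one parameter should select the classifier so no per-algorithm wrapper class is needed, can be sketched generically. Everything in this example is hypothetical: TrainerSelection, the Trainer interface, the "Algorithm" key and the "MAXENT" and "NAIVEBAYES" values are assumptions for illustration and are not taken from the OpenNLP code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parameter-driven trainer selection; not OpenNLP's API.
class TrainerSelection {

  // Minimal stand-in for a trainable classifier.
  interface Trainer {
    String name();
  }

  // Registry mapping an algorithm name to its trainer (names are assumptions).
  static Map<String, Trainer> defaultRegistry() {
    Map<String, Trainer> registry = new HashMap<>();
    registry.put("MAXENT", () -> "maximum entropy");
    registry.put("NAIVEBAYES", () -> "naive Bayes");
    return registry;
  }

  // Picks the trainer named by the (assumed) "Algorithm" parameter.
  static Trainer select(Map<String, Trainer> registry, Map<String, String> params) {
    String algorithm = params.get("Algorithm");
    Trainer trainer = registry.get(algorithm);
    if (trainer == null) {
      throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
    }
    return trainer;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("Algorithm", "NAIVEBAYES");

    // Any component that asks the registry gets the configured classifier,
    // so no component-specific class per algorithm is required.
    System.out.println(select(defaultRegistry(), params).name());
  }
}
```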

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791801#comment-14791801
 ] 

Joern Kottmann commented on OPENNLP-777:


I trained the Document Categorizer with a params file and set it to NAIVEBAYES. 
Is that supposed to work? The performance of the model was really bad. I am 
currently working on adding language detection to OpenNLP and trained the 
Document Categorizer on the Leipzig corpus. It works with Maxent and 
Perceptron, but the NB classifier couldn't even classify longer pieces of text 
correctly. 


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791808#comment-14791808
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

Thanks [~cohan.sujay] for the help on the unit test, I'll have a look at why 
getModel doesn't work.


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802714#comment-14802714
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

[~joern] thanks for your feedback, I'll have a deeper look at the doccat part 
and report here as well.


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802853#comment-14802853
 ] 

Joern Kottmann commented on OPENNLP-777:


Great, we should also have a sample ml file inside lang/ml. All the possible 
parameters should be explained there.


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802976#comment-14802976
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Joern,

I built the DocumentCategorizerNB to mirror the concrete class 
DocumentCategorizerME which you find in the same package as DocumentCategorizer 
(which is just an interface).

I've only tested DocumentCategorizerNB (there is a testcase called 
'DocumentCategorizerNBTest') and not DocumentCategorizerME with parameters 
passed to it (because I don't know how to do it).

But from your observation that it is performing poorly, I believe it is not 
working correctly, because the NB classifier should typically outperform the 
Perceptron (as it does in the prep attach test that directly exercises the NB 
classifier implementation).
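
For reference, the kind of classifier described in the issue (multinomial Naive Bayes with Laplace smoothing) can be sketched in a few lines. This is only an illustration with a hypothetical class name and toy data; it is not the code from the patch and makes no claim about how the patch is implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of OpenNLP or of the attached patch.
final class TinyMultinomialNB {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Record one labelled document, given as a bag of tokens.
    public void train(String label, String... tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(label) + sum over tokens of log P(token | label),
    // with add-one (Laplace) smoothing on the token counts.
    public double logScore(String label, String... tokens) {
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        double denominator = tokensPerLabel.getOrDefault(label, 0) + vocabulary.size();
        for (String token : tokens) {
            score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
        }
        return score;
    }

    // Pick the label with the highest log score among the trained labels.
    public String classify(String... tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMultinomialNB nb = new TinyMultinomialNB();
        nb.train("sports", "goal", "match", "team");
        nb.train("sports", "team", "win");
        nb.train("politics", "vote", "election", "party");
        nb.train("politics", "party", "win", "vote");
        System.out.println(nb.classify("team", "goal"));
        System.out.println(nb.classify("vote", "party"));
    }
}
```

Working in log space avoids underflow on longer texts, and the add-one term keeps a token that was never seen with a label from driving that label's probability to zero.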


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805159#comment-14805159
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Tommaso,

I had built the NaiveBayes reader by looking at the PerceptronReader.  So, I 
rewrote your test with the Perceptron class hierarchy instead of the NaiveBayes 
class hierarchy and obtained the same error.  The reader.getModel method fails 
in exactly the same way in the PerceptronReader as well.

Here is the test code:

{code}
PerceptronModel model = (PerceptronModel) new PerceptronTrainer().trainModel(
    10,
    new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false),
    1);

File file = new File("test_perceptron.bin");
PerceptronModelWriter modelWriter = new BinaryPerceptronModelWriter(model, file);
modelWriter.persist();

PerceptronModelReader reader = new BinaryPerceptronModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.getModel();
assertNotNull(abstractModel);
{code}

I hope that helps you with this problem.


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805176#comment-14805176
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


[~joern] and [~teofili],

There is another problem with the DocumentCategorizer, and that is in the 
nomenclature.

DocumentCategorizer is just the interface and there is no concrete 
implementation thereof at present.

So, if you look at the tutorials available on OpenNLP, including the 1.6.0 
manual
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat.classifying.api),
you see that sample code tends to use DocumentCategorizerME explicitly.

The ME suffix seems to indicate Maximum Entropy.

So, wouldn't it be confusing for a user if they instantiated a subclass that 
was named Maximum Entropy, but if, owing to the setting of parameters, it used 
a Naive Bayes algorithm internally instead?

The 1.6.0 manual actually says:

{quote}
Document Categorizer API

To perform classification you will need a *maxent* model - these are 
encapsulated in the DoccatModel class of OpenNLP tools.

First you need to grab the bytes from the serialized model on an InputStream - 
we'll leave it you to do that, since you were the one who serialized it to 
begin with. Now for the easy part:
{quote}

And the code goes:

{code}
String inputText = ...
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(m);
double[] outcomes = myCategorizer.categorize(inputText);
String category = myCategorizer.getBestOutcome();
{code}

Wouldn't this necessitate the use of a different concrete subclass (i.e., 
DocumentCategorizerNB) to preserve backward compatibility? Users have already 
written code against DocumentCategorizerME, which makes renaming the concrete 
class inadvisable.


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805204#comment-14805204
 ] 

Joern Kottmann commented on OPENNLP-777:


You are right the name is not indicating that Maxent is used. We decided to 
keep the name anyway to not break backward compatibility. Every existing user 
who updates to a new version of OpenNLP would have to change their code to 
reflect this name change. 

The documentation should be updated. I therefore suggest that we just drop the 
DocumentCategorizerNB class and one day rename all the components so that they 
no longer end in ME.

Have a look at the files in lang/ml; these are parameter files for the trainer. 
If you take one, e.g. the one for maxent, and change the algorithm to NAIVEBAYES, 
you can train an NB document categorizer.
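
As a rough sketch of that step, the snippet below loads maxent-style trainer parameters and switches only the algorithm. It assumes the parameter files are plain key=value properties with the keys Algorithm, Iterations and Cutoff; the values 100 and 5 and the class name are illustrative assumptions, not copied from a file in lang/ml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; it only manipulates key=value text and does not call OpenNLP.
final class NaiveBayesParamsSketch {

    // Returns a copy of the given trainer parameters with the algorithm replaced.
    static Properties withAlgorithm(Properties base, String algorithm) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("Algorithm", algorithm);
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // Assumed contents of a maxent parameter file (illustrative values).
        String maxentParams = "Algorithm=MAXENT\nIterations=100\nCutoff=5\n";

        Properties base = new Properties();
        base.load(new StringReader(maxentParams));

        // Same iterations and cutoff, different algorithm.
        Properties nb = withAlgorithm(base, "NAIVEBAYES");
        nb.store(System.out, "hypothetical NB trainer parameters");
    }
}
```

The resulting file would then be passed to the document categorizer trainer in the same way as the maxent one.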




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805338#comment-14805338
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


Right, [~joern], I get it now.

I know [~teofili] is looking into Doccat, but I suppose I'll take a look as 
well.

I'll run a test on the Leipzig corpus and see what it's using, and whether 
anything more needs to be done to hook up the NB algorithm so that it can be 
used with a param switch.

That should allow us to dispense with the DocumentCategorizerNB class.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14860187#comment-14860187
 ] 

Joern Kottmann commented on OPENNLP-777:


Great.

Here is the command I used to train it:
bin/opennlp DoccatTrainer.leipzig -sentencesDir /home/blue/Documents/langtrain/ 
 -model langid-ngram.bin -lang mul -params lang/ml/NaiveBayesTrainerParams.txt

And here are the files I used:
afr_web_2013_100K-sentences.txt
lit_newscrawl_2011_100K-sentences.txt
ara_web_2011_100K-sentences.txt
mal_newscrawl_2011_100K-sentences.txt
bak_newscrawl_2011_100K-sentences.txt
mar_newscrawl_2011_100K-sentences.txt
bel_news_2011_100K-sentences.txt
mkd_newscrwal_2011_100K-sentences.txt
ben_newscrawl_2011_100K-sentences.txt
mlt_web_2012_100K-sentences.txt
bos_newscrawl_2011_100K-sentences.txt
mri_web_2011_100K-sentences.txt
bul_newscrawl_2011_100K-sentences.txt
msa_newscrwal_2011_100K-sentences.txt
cat_newscrawl_2011_100K-sentences.txt
nep_news_2010_100K-sentences.txt
ces_web_2012_100K-sentences.txt
nld_mixed_2012_100K-sentences.txt
cmn_wikipedia_2012_100K-sentences.txt
nob_news_2013_100K-sentences.txt
dan_mixed_2014_100K-sentences.txt
pol_newscrawl_2011_100K-sentences.txt
deu_news_2010_100K-sentences.txt
por_newscrawl_2011_100K-sentences.txt
ell_web_2011_100K-sentences.txt
pus_newscrawl_2011_100K-sentences.txt
eng_news_2010_100K-sentences.txt
ron_web_2011_100K-sentences.txt
epo_web_2012_100K-sentences.txt
rus_news_2010_100K-sentences.txt
est_newscrawl_2011_100K-sentences.txt
slk_newscrawl_2011_100K-sentences.txt
eus_newscrawl_2012_100K-sentences.txt
slv_newscrawl_2011_100K-sentences.txt
fao_web_2013_100K-sentences.txt
som_newscrawl_2011_100K-sentences.txt
fas_newscrawl_2011_100K-sentences.txt
spa_news_2011_100K-sentences.txt
fin_newscrawl_2011_100K-sentences.txt
srp_wikipedia_2010_100K-sentences.txt
fra_news_2010_100K-sentences.txt
swe_news_2007_100K-sentences.txt
glg_wikipedia_2012_100K-sentences.txt
tam_newscrawl_2011_100K-sentences.txt
hin_newscrawl_2012_100K-sentences.txt
tat_mixed_2015_100K-sentences.txt
hrv_newscrawl_2011_100K-sentences.txt
tel_newscrawl_2011_100K-sentences.txt
hun_mixed_2012_100K-sentences.txt
tgk_newscrawl_2011_100K-sentences.txt
hye_newscrawl_2011_100K-sentences.txt
tgl_newscrwal_2011_100K-sentences.txt
ind_web_2012_100K-sentences.txt
tha_newscrawl_2011_100K-sentences.txt
isl_newscrawl_2011_100K-sentences.txt
tur_newscrawl_2011_100K-sentences.txt
ita_web_2011_100K-sentences.txt
ukr_web_2012_100K-sentences.txt
jpn_news_2005-2008_100K-sentences.txt
urd_newscrwal_2011_100K-sentences.txt
kat_newscrawl_2011_100K-sentences.txt
uzb_newscrawl_2011_100K-sentences.txt
kaz_newscrawl_2011_100K-sentences.txt
vie_newscrwal_2011_100K-sentences.txt
kir_newscrawl_2011_100K-sentences.txt
vol_wikipedia_2011_100K-sentences.txt
kor_news_2007_100K-sentences.txt
zho_news_2007-2009_100K-sentences.txt
lav_newscrawl_2011_100K-sentences.txt
zul_mixed_2013_100K-sentences.txt

You can download them from here:
http://corpora2.informatik.uni-leipzig.de/download.html

The resulting language detection works rather well for texts that have at least 
a few words.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-10-19 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963353#comment-14963353
 ] 

Cohan Sujay Carlos commented on OPENNLP-777:


[~joern] and [~teofili],

Just a quick update, since it's been a month since the last message on this 
thread.

I'm finding the setup required to replicate Joern's environment a real bear 
(I'm sorry but I am just not familiar with the build tools), but I'll continue 
banging at it next week.

But is there any way that either [~joern] or [~teofili] could check if the 
Naive Bayes classifier is indeed being used when you call the model as Joern 
has described using NaiveBayesTrainerParams.txt?

It just seems very unlikely that the NB model would, if it is being called, 
fare worse than a perceptron at this task, given its superior performance in 
the PrepAttach test case.

Also, since [~joern] is working on a language identifier, I thought I should 
mention that sequential models (Markov chains) with character-level features 
fare far better than linear classifiers with word-level features at that task. 
A really good method is described by Sibun and Reynar in their paper 
"Language Identification: Examining the Issues".  The training command that 
Joern used above doesn't seem to specify character-level features, so I am 
assuming he's using word-level features and a linear classifier, and that 
wouldn't work well in any case.
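
To illustrate what a character-level Markov chain model looks like, here is a 
rough sketch in Java (8 or later). It is not the Sibun and Reynar method and 
not OpenNLP code: it simply trains one character-bigram chain per language with 
add-one smoothing and picks the language whose chain gives the text the highest 
log-likelihood. The class name CharMarkovLangId and the toy training strings 
are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a character-bigram Markov chain language identifier (add-one smoothed).
final class CharMarkovLangId {
    // language -> (bigram "ab" -> count) and language -> (context char 'a' -> count)
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    private final Map<String, Map<Character, Integer>> contextCounts = new HashMap<>();
    private final Set<Character> alphabet = new HashSet<>();

    void train(String lang, String text) {
        Map<String, Integer> bi = bigramCounts.computeIfAbsent(lang, k -> new HashMap<>());
        Map<Character, Integer> ctx = contextCounts.computeIfAbsent(lang, k -> new HashMap<>());
        for (int i = 0; i < text.length(); i++) {
            alphabet.add(text.charAt(i));
            if (i + 1 < text.length()) {
                bi.merge(text.substring(i, i + 2), 1, Integer::sum);
                ctx.merge(text.charAt(i), 1, Integer::sum);
            }
        }
    }

    // Sum of log P(next char | current char) over the text, under one language's chain.
    double logLikelihood(String lang, String text) {
        Map<String, Integer> bi = bigramCounts.get(lang);
        Map<Character, Integer> ctx = contextCounts.get(lang);
        int v = alphabet.size() + 1; // +1 reserves probability mass for unseen characters
        double sum = 0.0;
        for (int i = 0; i + 1 < text.length(); i++) {
            int pair = bi.getOrDefault(text.substring(i, i + 2), 0);
            int prev = ctx.getOrDefault(text.charAt(i), 0);
            sum += Math.log((pair + 1.0) / (prev + v));
        }
        return sum;
    }

    // Returns the trained language with the highest log-likelihood, or null if untrained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String lang : bigramCounts.keySet()) {
            double score = logLikelihood(lang, text);
            if (score > bestScore) {
                bestScore = score;
                best = lang;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CharMarkovLangId id = new CharMarkovLangId();
        id.train("eng", "the quick brown fox jumps over the lazy dog and then the other thing");
        id.train("deu", "der schnelle braune fuchs springt ueber den faulen hund und dann noch das andere");
        System.out.println(id.classify("the thing"));
    }
}
```

Because the features are character transitions rather than whole words, even a 
short input contributes several observations, which is the property that makes 
this family of models attractive for short texts.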




[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2015-10-20 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964843#comment-14964843
 ] 

Joern Kottmann commented on OPENNLP-777:


The PrepAttach test here already verifies that. It trains the classifier once 
by instantiating it directly and once via TrainUtil. In both cases it arrives 
at the same accuracy.

So that is good. In another test I see you switch smoothing on and off. The way 
that is done will not work very well: the method for doing it is static (which 
causes concurrency problems), and it can't be influenced by a training 
parameter. I suggest we add a parameter for smoothing. Is that even an option a 
user should be able to set?
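
As a sketch of the alternative, the following self-contained Java (8 or later) 
keeps the smoothing constant as per-instance state that is passed in at 
construction time, so two classifiers with different settings can coexist in 
one JVM. It is a toy multinomial Naive Bayes written for this example, not the 
code from the patch; the class name SmoothedNaiveBayesSketch and the alpha 
parameter are assumptions, and how such a value would be read from the training 
parameters is left open.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: multinomial Naive Bayes where smoothing is instance state, not a static switch.
final class SmoothedNaiveBayesSketch {
    private final double alpha; // additive smoothing constant; 1.0 is Laplace, 0.0 disables it
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<String, Integer> totalFeatures = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int docs = 0;

    SmoothedNaiveBayesSketch(double alpha) {
        this.alpha = alpha;
    }

    void train(String label, String... features) {
        docs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            totalFeatures.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    // log P(label) + sum of log P(feature | label); the label must have been seen in training.
    double logScore(String label, String... features) {
        Map<String, Integer> counts = featureCounts.get(label);
        double score = Math.log(docCounts.get(label) / (double) docs);
        double denom = totalFeatures.getOrDefault(label, 0) + alpha * vocabulary.size();
        for (String f : features) {
            score += Math.log((counts.getOrDefault(f, 0) + alpha) / denom);
        }
        return score;
    }

    public static void main(String[] args) {
        SmoothedNaiveBayesSketch smoothed = new SmoothedNaiveBayesSketch(1.0);
        SmoothedNaiveBayesSketch unsmoothed = new SmoothedNaiveBayesSketch(0.0);
        for (SmoothedNaiveBayesSketch nb : new SmoothedNaiveBayesSketch[] {smoothed, unsmoothed}) {
            nb.train("sports", "ball", "goal");
            nb.train("tech", "code", "bug");
        }
        System.out.println(smoothed.logScore("tech", "ball"));
        System.out.println(unsmoothed.logScore("tech", "ball"));
    }
}
```

With the setting held per instance, a test can compare smoothed and unsmoothed 
models side by side without one thread changing the behaviour of another.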

Any other options that should be configurable via training params?

What do you think?

It would be nice to have state-of-the-art language detection in OpenNLP. For my 
specific use case, word-level features worked quite well; I use it to classify 
long news articles into a handful of languages.



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2016-01-07 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088848#comment-15088848
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

Thanks [~cohan.sujay], the build is now green. I'll have a second look in the 
next few hours and commit it if everything looks good.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2016-01-08 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088980#comment-15088980
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

I've committed [~cohan.sujay]'s latest patch in r1723671, thanks Cohan!



[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2016-01-13 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095834#comment-15095834
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

very good to hear, thanks [~cohan.sujay] for sharing code and results. I'll try 
it out myself too.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, 
> NaiveBayesOpenNLPTestCode.zip, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2016-01-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103045#comment-15103045
 ] 

Tommaso Teofili commented on OPENNLP-777:
-

I've tried NBC using the client code provided by [~cohan.sujay] and it worked 
nicely. I am inclined to consider this resolved and to open new issues if we 
spot something to improve or fix.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, 
> NaiveBayesOpenNLPTestCode.zip, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped with partially labelled training
> > data, as explained in the Nigam, McCallum, et al. paper of 2000, "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering whether there is interest in adding an NB implementation
> > and, if so, I'd love to know whom I could coordinate with.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

2016-11-02 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630765#comment-15630765
 ] 

Joern Kottmann commented on OPENNLP-777:


Can we close this issue?

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Fix For: 1.6.1
>
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java,
> NaiveBayesOpenNLPTestCode.zip,
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
> Original Estimate: 504h
> Remaining Estimate: 504h


