[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2016-01-12 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: NaiveBayesOpenNLPTestCode.zip

I've tested the Naive Bayes doccat in OpenNLP built from the trunk 
programmatically.  It works fine.

Here are the numbers.

1.  Subjectivity classification experiment on the 5000 movie reviews dataset 
(used in the paper "A Sentimental Education" by Bo Pang and Lillian Lee) with a 
50:50 split into training and test:

Accuracies
--
Perceptron: 57.54% (100 iterations)
Perceptron: 59.96% (1000 iterations)
Maxent:  91.48% (100 iterations)
Maxent:  90.68% (1000 iterations)
Naive Bayes:  90.72%

2.  Sentiment polarity classification Cornell movie review dataset v1.1 (700 
positive and 700 negative reviews).

With 350 of each as training and the rest as test, I get:

Perceptron: 49.70% (100 iterations)
Perceptron: 49.85% (1000 iterations)
Maxent: 77.11% (100 iterations)
Maxent: 77.55% (1000 iterations)
Naive Bayes:  75.65%

The code I used for the testing is attached.

The data used in this experiment was taken from 
http://www.cs.cornell.edu/people/pabo/movie-review-data/

Thank you, [~teofili] and [~joern].

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, 
> NaiveBayesOpenNLPTestCode.zip, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2016-01-07 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: NaiveBayesCorrectnessTest.java

Apologies, [~teofili] ... I have attached the {{NaiveBayesCorrectnessTest}} ... 
I think the problem in the patch is because I copying the files back into the 
Eclipse project where I created the patch.  I believe Eclipse treats copies 
between projects as delete+add.  In this case, it seems to have left out the 
add mysteriously.

I hope the attached test solves the issue.  I'll stop using two projects for my 
OpenNLP development work henceforth.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2016-01-07 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: NaiveBayesModel.java

[~teofili] the Naive Bayes Model needs to be in there.

I have attached the latest NaiveBayesModel.java file.

I had created the patch with this file in there so I am surprised the file gets 
deleted when you apply the patch!!!

Could you add this file and see if it works fine?

Thank you, [~teofili].

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesModel.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

Affixing a patch with the formatting issues in NaiveBayesModel taken care of 
(you'll just need to check the patch 
"naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch" 
in - it is to be applied to the trunk).

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: topics.train)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: NaiveBayesCorrectnessTest.java)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: D1TopicClassifierUsageDemoNB.java)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: D1TopicClassifierTrainingDemoNB.java)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierUsageDemoNB.java, 
> NaiveBayesCorrectnessTest.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

[~joern] and [~teofili],

I am submitting herewith another patch with the fixes requested by Joern.

In this patch, I have made the following changes:

a)  Removed the DocumentCategorizerNB file.
b)  Rewritten the test-case for the above to operate on DocumentCategorizerME 
(passing suitable parameters to train() to exercise the NB classifier instead 
of the ME classifier).
c)  Changed NaiveBayesModel in ml.naivebayes to remove the flag to disable 
smoothing (I've removed the flag since there is no use-case where the NB 
classifier would be used without smoothing).
d)  Changed the NaiveBayesCorrectnessTest to reflect the above.
e)  Made small changes to Tommaso's test-case "NaiveBayesModelReadWriteTest" 
because it was causing the tests to fail when executed on the Maven Eclipse 
plugin on Windows.  I changed the location of the temp file so that the tests 
no longer fail on Windows.  Could [~teofili] run this testcase on Unix to 
verify that it works fine.

(This patch is to be applied to the trunk).

I suppose I may have undone the formatting that [~teofili] corrected on the 
above files in making these changes.

I will need [~joern] or [~teofili] to check this patch in for me if all is well.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch,
>  naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-08-11 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: NaiveBayesCorrectnessTest.java

Yes, that's right Joern.  That test is deterministic so the results have to be 
the same every single time.

I found the error.  The error was a result of a side-effect in the previous 
testcase: (NaiveBayesCorrectnessTest).

In the correctness test, in order to mathematically validate the classifier, I 
was deliberately hobbling it (turning off the smoothing and instead using 
Maximum Likelihood estimators of probability).

I had forgotten to re-enable smoothing in the correctness test, so the output 
of the tests came to depend upon the order in which these tests were run.

I have now bracketed each of the correctness tests with functions reenabling 
the smoothing.

I also wanted to you let you know that the function that hobbles the classifier 
(ml.naivebayes.NaiveBayesMode.setSmoothed(boolean)) has package-level 
visibility.

I did that deliberately to ensure that it can only be invoked from code that is 
in the same package.  The only use of the hobbling function is 
testing/validation (no user would really want to hobble the classifier and lose 
a few percentage points of accuracy).

The corrected 'correctness test' is attached.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-08-10 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch

Thanks Tommaso.  Here's a new testcase patch with a cosmetic change.  I had 
forgotten to change the names of the tests from *Perceptron* to *NaiveBayes*.  
Now it's fixed and clean.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-08-10 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-08-10 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch

Tommaso,

As you have requested, here is a patch with a testcase for the ml/naivebayes 
package.

This is an adaptation of the PrepAttachTest found in the ml/perceptron package.

As you would expect, the NaiveBayes' accuracy is ahead of the Perceptron's on 
this test, but is slightly behind the MaxEnt's.

Could you take care of the rest (I don't quite understand how to fix that 
exclusion) and check it in for me please Tommaso?

Cohan

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-29 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann updated OPENNLP-777:
---
Labels: NBClassifier bayes bayesian classifier multinomial naive patch  
(was: NBClassifier bayes bayesian classifier multinomial naive)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: D1TopicClassifierUsageDemoNB.java
D1TopicClassifierTrainingDemoNB.java

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: D1TopicClassifierUsageDemoNB.java)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: D1TopicClassifierTrainingDemoNB.java)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: 
naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch

Updated patch (I removed a superfluous piece of code).

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: D1TopicClassifierUsageDemoNB.java
D1TopicClassifierTrainingDemoNB.java

Files demonstrating how the Naive Bayesian document categorizer 
(DocumentCategorizerNB) may be trained and used for document classification.  
These Java files are meant to be used with the training data file 
(topics.train) that you will also find in the attachments.

When training said categorizer, place 'topics.train' in a 'corpora/topics' 
directory under the directory where you are running this code.  The model will 
be created in the sub-folder 'models' (make sure you have a folder by that name 
under your current directory).

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, 
> naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: (was: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: 
> naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: 
naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch

A new patch.

This patch contains test-cases validating the Naive Bayesian classifier 
(ensuring that the mathematics in it is correct) and the Naive Bayesian 
implementation of the document categorizer.

This patch can be applied to 1.6.0 rc6.


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: 
> naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> naive-bayes-patch-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-14 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: topics.train

The attached training file can be used to train a Naive Bayes classifier model 
(... the training file 'topics.train' will have to be placed in a directory 
named 'corpora/topics'...) ... and the code to train and save a model looks as 
follows:

public class D1TopicClassifierTrainingDemoNB {
public static void main(String[] args) {

DoccatModel model = null;

InputStream dataIn = null;
try {
  dataIn = new FileInputStream("corpora/topics/topics.train");
  ObjectStream lineStream =
new PlainTextByLineStream(dataIn, "UTF-8");
  ObjectStream sampleStream = new 
DocumentSampleStream(lineStream);

  model = DocumentCategorizerNB.train("en", sampleStream);
}
catch (IOException e) {
  // Failed to read or parse training data, training failed
  e.printStackTrace();
}
finally {
  if (dataIn != null) {
try {
  dataIn.close();
}
catch (IOException e) {
  // Not an issue, training already finished.
  // The exception should be logged and investigated
  // if part of a production system.
  e.printStackTrace();
}
  }
}
String modelFile = "models/topics_nb.bin";
OutputStream modelOut = null;
try {
  modelOut = new BufferedOutputStream(new 
FileOutputStream(modelFile));
  model.serialize(modelOut);
}
catch (IOException e) {
  // Failed to save model
  e.printStackTrace();
}
finally {
  if (modelOut != null) {
try {
   modelOut.close();
}
catch (IOException e) {
  // Failed to correctly save model.
  // Written model might be invalid.
  e.printStackTrace();
}
  }
}
}
}

The model will be created in the directory "models" and can be loaded and used 
as follows:

public class D1TopicClassifierUsageDemoNB {
public static void main(String[] args) {

//String paragraph = "Although the outfit has been banned, no 
restriction has been imposed on movement of its leaders outside 
Pakistan-occupied-Kashmir (PoK) but they could not conduct their organisational 
activities in the country, Interior Minister Faisel Saleh Hayat said.";
String paragraph = "Rumours before the game suggested the 
Portuguese would be out at the end of the season if Inter failed to progress 
but in the end there was little to worry about as goals from Samuel Eto'o and 
Mario Balotelli ensured a comfortable night.";

// always start with a model, a model is learned from training 
data
InputStream is = null;
try {
is = new FileInputStream("models/topics_nb.bin");

DoccatModel model = new DoccatModel(is);

AbstractModel internalModel = 
(AbstractModel)model.getMaxentModel();

System.out.println("ModelType: 
"+internalModel.getModelType());
System.out.println("Model Outcomes: ");
Object[] data = internalModel.getDataStructures();
for (String val : 
(String[])internalModel.getDataStructures()[2]) {
System.out.println(val);
}
IndexHashTable pmap = (IndexHashTable) 
data[1];
//String[] PRED_LABELS = new String[pmap.size()];
//pmap.toArray(PRED_LABELS);
//Context[] contexts = (Context[])data[0];
//System.out.println("Pred labels: ");
//for (String label : PRED_LABELS) {
//  System.out.println(label + " " + 
pmap.get(label) + " " + contexts[pmap.get(label)].getOutcomes().length + " " + 
contexts[pmap.get(label)].getOutcomes()[0] + " " + 
contexts[pmap.get(label)].getParameters()[0]);
//}
System.out.println("Running the classifier: ");
   

[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-07-14 Thread Cohan Sujay Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch

This is the patch with the code for a multinomial Naive Bayesian Classifier.

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
> Attachments: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

2015-05-22 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann updated OPENNLP-777:
---
Affects Version/s: (was: 1.6.0)
Fix Version/s: (was: 1.6.0)

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)