[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: NaiveBayesOpenNLPTestCode.zip

I've tested the Naive Bayes doccat in OpenNLP built from the trunk programmatically. It works fine. Here are the numbers.

1. Subjectivity classification experiment on the 5000 movie reviews dataset (used in the paper "A Sentimental Education" by Bo Pang and Lillian Lee) with a 50:50 split into training and test:

Accuracies --
    Perceptron:  57.54% (100 iterations)
    Perceptron:  59.96% (1000 iterations)
    Maxent:      91.48% (100 iterations)
    Maxent:      90.68% (1000 iterations)
    Naive Bayes: 90.72%

2. Sentiment polarity classification on the Cornell movie review dataset v1.1 (700 positive and 700 negative reviews). With 350 of each as training and the rest as test, I get:

    Perceptron:  49.70% (100 iterations)
    Perceptron:  49.85% (1000 iterations)
    Maxent:      77.11% (100 iterations)
    Maxent:      77.55% (1000 iterations)
    Naive Bayes: 75.65%

The code I used for the testing is attached.

The data used in this experiment was taken from http://www.cs.cornell.edu/people/pabo/movie-review-data/

Thank you, [~teofili] and [~joern].

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, NaiveBayesOpenNLPTestCode.zip, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it lacks one at present).
> Implementation details: We have a production-hardened piece of Java code for a multinomial Naive Bayesian classifier (with default Laplace smoothing) that we'd like to contribute. The code is Java 1.5 compatible. I'd have to write an adapter to make the interface compatible with the ME classifier in OpenNLP. I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this dated May 19th, 2015.
>
> Tommaso Teofili via opennlp.apache.org
> to dev
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I felt that there was an unmet need for a Naive Bayesian (NB) classifier implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training data as explained in the Nigam, McCallum, et al. paper of 2000 "Text Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to contribute a very solid implementation that we've used in production for a good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation and I'd love to know who I could coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
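For readers unfamiliar with the technique named in the issue description, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the code attached to OPENNLP-777 and does not use the OpenNLP API; the class name `MultinomialNaiveBayesSketch` and its methods are hypothetical, and it assumes Java 8 or later (unlike the Java 1.5 compatible contribution) and pre-tokenized input.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: multinomial Naive Bayes with add-one (Laplace) smoothing. */
class MultinomialNaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled, already tokenized document to the counts. */
    public void train(String label, String[] tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /**
     * Returns log P(label) + sum over tokens of log P(token | label).
     * The label must have been seen during training.
     */
    public double logScore(String label, String[] tokens) {
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        // Add-one smoothing: every vocabulary entry gets one extra pseudo-count.
        double denominator = tokensPerLabel.getOrDefault(label, 0) + vocabulary.size();
        for (String token : tokens) {
            score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
        }
        return score;
    }

    /** Returns the label with the highest log score, or null if untrained. */
    public String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids numeric underflow on long documents, and the smoothing term keeps a token unseen for one label from forcing that label's probability to zero.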
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: NaiveBayesCorrectnessTest.java

Apologies, [~teofili] ... I have attached the {{NaiveBayesCorrectnessTest}} ... I think the problem in the patch arose because I was copying the files back into the Eclipse project where I created the patch. I believe Eclipse treats copies between projects as delete+add. In this case, it seems to have mysteriously left out the add.

I hope the attached test solves the issue. I'll stop using two projects for my OpenNLP development work henceforth.

> Key: OPENNLP-777
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: NaiveBayesModel.java

[~teofili], the Naive Bayes Model needs to be in there. I have attached the latest NaiveBayesModel.java file. I had created the patch with this file in there, so I am surprised the file gets deleted when you apply the patch!!!

Could you add this file and see if it works fine? Thank you, [~teofili].

> Key: OPENNLP-777
> Attachments: NaiveBayesModel.java, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

Attaching a patch with the formatting issues in NaiveBayesModel taken care of. You'll just need to check in the patch "naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch"; it is to be applied to the trunk.
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: topics.train)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: NaiveBayesCorrectnessTest.java) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: NaiveBayesCorrectnessTest.java, > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: NaiveBayesCorrectnessTest.java, > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: D1TopicClassifierUsageDemoNB.java) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: NaiveBayesCorrectnessTest.java, > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: D1TopicClassifierTrainingDemoNB.java) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: D1TopicClassifierUsageDemoNB.java, > NaiveBayesCorrectnessTest.java, > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

[~joern] and [~teofili],

I am submitting another patch with the fixes requested by Joern. In this patch, I have made the following changes:

a) Removed the DocumentCategorizerNB file.
b) Rewrote the test case for the above to operate on DocumentCategorizerME (passing suitable parameters to train() to exercise the NB classifier instead of the ME classifier).
c) Changed NaiveBayesModel in ml.naivebayes to remove the flag that disables smoothing (I removed the flag because there is no use case where the NB classifier would be used without smoothing).
d) Changed the NaiveBayesCorrectnessTest to reflect the above.
e) Made small changes to Tommaso's test case "NaiveBayesModelReadWriteTest" because it was causing the tests to fail when executed with the Maven Eclipse plugin on Windows. I changed the location of the temp file so that the tests no longer fail on Windows. Could [~teofili] run this test case on Unix to verify that it works there as well?

(This patch is to be applied to the trunk.)

In making these changes, I may have undone the formatting that [~teofili] corrected in the above files.

I will need [~joern] or [~teofili] to check this patch in for me if all is well.
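For readers following item e), here is a minimal sketch of one portable way to place a test's temp file. It is an assumption about the kind of change involved, not the code in the patch: the class name TempFileRoundTrip and the "nb-sketch" prefix are made up for illustration, and nothing here touches the OpenNLP API. The idea is to let the JVM pick the temp directory rather than hard-coding a Unix-style path, and to close the file before reading or deleting it, since an open file cannot be deleted on Windows.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of OpenNLP or of the attached patch.
class TempFileRoundTrip {

    // Writes a value to a temp file and reads it back.
    // The file lives in the JVM's default temp directory (java.io.tmpdir),
    // so no platform-specific path such as "/tmp" is hard-coded.
    static double roundTrip(double value) throws IOException {
        File tmp = File.createTempFile("nb-sketch", ".bin");
        try {
            DataOutputStream out = new DataOutputStream(new FileOutputStream(tmp));
            try {
                out.writeDouble(value);
            } finally {
                out.close(); // close before reading; Windows cannot delete open files
            }
            DataInputStream in = new DataInputStream(new FileInputStream(tmp));
            try {
                return in.readDouble();
            } finally {
                in.close();
            }
        } finally {
            tmp.delete(); // best-effort cleanup so repeated test runs start clean
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(0.25));
    }
}
```

The sketch sticks to java.io calls that exist in J2SE 1.5, matching the environment stated on the issue; on newer JVMs the same thing could be written with try-with-resources.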
> Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: D1TopicClassifierTrainingDemoNB.java, > D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, > naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
Attachment: NaiveBayesCorrectnessTest.java

Yes, that's right, Joern. That test is deterministic, so the results have to be the same every single time.

I found the error. It was the result of a side effect in the previous test case (NaiveBayesCorrectnessTest). In the correctness test, in order to validate the classifier mathematically, I was deliberately hobbling it (turning off the smoothing and using maximum likelihood estimators of probability instead). I had forgotten to re-enable smoothing in the correctness test, so the output of the tests came to depend on the order in which they were run. I have now bracketed each of the correctness tests with calls that re-enable the smoothing.

I also wanted to let you know that the function that hobbles the classifier (ml.naivebayes.NaiveBayesModel.setSmoothed(boolean)) has package-level visibility. I did that deliberately to ensure that it can only be invoked from code in the same package. The only use of the hobbling function is testing and validation (no user would really want to hobble the classifier and lose a few percentage points of accuracy).

The corrected 'correctness test' is attached.
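The side effect described above can be illustrated with a small, self-contained sketch. It is not the code from NaiveBayesCorrectnessTest or from the classifier: the class SmoothingSketch, its static field smoothed, and both methods are hypothetical stand-ins. The sketch contrasts the add-one (Laplace) estimate with the maximum likelihood estimate for a word probability, and shows a try/finally bracket that restores the shared switch so later code does not inherit the unsmoothed setting. It assumes total is greater than zero.

```java
// Hypothetical illustration only; names are not taken from the patch.
class SmoothingSketch {

    // Shared static switch, standing in for the flag the comment describes.
    // Because it is static, one test can silently change it for the next.
    static boolean smoothed = true;

    // Estimates P(word | class) from raw counts.
    //   count     = occurrences of the word in the class's training text
    //   total     = all word occurrences in that class (assumed > 0)
    //   vocabSize = number of distinct words known to the model
    static double wordProbability(int count, int total, int vocabSize) {
        if (smoothed) {
            // Laplace (add-one) smoothing: every word gets one pseudo-count,
            // so an unseen word still has a small non-zero probability.
            return (count + 1.0) / (total + vocabSize);
        }
        // Maximum likelihood estimate: plain relative frequency,
        // which is exactly zero for a word never seen in this class.
        return (double) count / total;
    }

    // Computes the unsmoothed estimate and always re-enables smoothing,
    // even if the computation throws.
    static double unsmoothedProbability(int count, int total, int vocabSize) {
        smoothed = false;
        try {
            return wordProbability(count, total, vocabSize);
        } finally {
            smoothed = true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unsmoothedProbability(0, 10, 5)); // unseen word, MLE
        System.out.println(wordProbability(0, 10, 5));       // unseen word, smoothed
    }
}
```

Without the finally block, any call made after unsmoothedProbability would see smoothed == false, which is the kind of order dependence the comment reports between the two test classes.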
> Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: D1TopicClassifierTrainingDemoNB.java, > D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch Thanks Tommaso. Here's a new testcase patch with a cosmetic change. I had forgotten to change the names of the tests from *Perceptron* to *NaiveBayes*. Now it's fixed and clean. > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: D1TopicClassifierTrainingDemoNB.java, > D1TopicClassifierUsageDemoNB.java, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cohan Sujay Carlos updated OPENNLP-777: --- Attachment: (was: prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch) > Naive Bayesian Classifier > - > > Key: OPENNLP-777 > URL: https://issues.apache.org/jira/browse/OPENNLP-777 > Project: OpenNLP > Issue Type: New Feature > Components: Machine Learning > Environment: J2SE 1.5 and above >Reporter: Cohan Sujay Carlos >Assignee: Tommaso Teofili >Priority: Minor > Labels: NBClassifier, bayes, bayesian, classifier, multinomial, > naive, patch > Attachments: D1TopicClassifierTrainingDemoNB.java, > D1TopicClassifierUsageDemoNB.java, > naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, > prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, > topics.train > > Original Estimate: 504h > Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch

Tommaso,

As you have requested, here is a patch with a testcase for the ml/naivebayes package. This is an adaptation of the PrepAttachTest found in the ml/perceptron package.

As you would expect, the NaiveBayes' accuracy is ahead of the Perceptron's on this test, but is slightly behind the MaxEnt's.

Could you take care of the rest (I don't quite understand how to fix that exclusion) and check it in for me please Tommaso?

Cohan

> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java,
> D1TopicClassifierUsageDemoNB.java,
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch,
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
> topics.train
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it
> lacks one at present).
> Implementation details: We have a production-hardened piece of Java code for
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that
> we'd like to contribute. The code is Java 1.5 compatible. I'd have to write
> an adapter to make the interface compatible with the ME classifier in
> OpenNLP. I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this
> dated May 19th, 2015.
>
> Tommaso Teofili via opennlp.apache.org
> to dev
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
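
The issue description above mentions a multinomial Naive Bayesian classifier with default Laplace smoothing. As a rough, self-contained sketch of that technique (this is not the code from the attached patches; the class name NaiveBayesSketch and its methods are hypothetical, and it uses Java 8 features rather than the Java 1.5 target stated in the issue), the scoring could look something like this:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the classifier contributed in OPENNLP-777. */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled document (already tokenized) to the counts. */
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /**
     * Returns log P(label) + sum over tokens of log P(token | label), where
     * P(token | label) = (count + 1) / (tokensInLabel + vocabularySize),
     * i.e. add-one (Laplace) smoothing. The label must have been seen in training.
     */
    double logScore(String label, List<String> tokens) {
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int tokensInLabel = tokensPerLabel.getOrDefault(label, 0);
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (tokensInLabel + vocabulary.size()));
        }
        return score;
    }

    /** Returns the trained label with the highest log score, or null if nothing was trained. */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", Arrays.asList("ball", "goal", "team"));
        nb.train("politics", Arrays.asList("vote", "election", "party"));
        System.out.println(nb.classify(Arrays.asList("goal", "team")));
    }
}
```

Working in log space avoids numeric underflow on long documents, and the add-one term keeps a token that was never seen with a label from driving that label's probability to zero.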
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-777:
---
    Labels: NBClassifier bayes bayesian classifier multinomial naive patch (was: NBClassifier bayes bayesian classifier multinomial naive)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: D1TopicClassifierUsageDemoNB.java
                D1TopicClassifierTrainingDemoNB.java
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: D1TopicClassifierUsageDemoNB.java)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: D1TopicClassifierTrainingDemoNB.java)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch

Updated patch (I removed a superfluous piece of code).
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: D1TopicClassifierUsageDemoNB.java
                D1TopicClassifierTrainingDemoNB.java

Files demonstrating how the Naive Bayesian document categorizer (DocumentCategorizerNB) may be trained and used for document classification.

These Java files are meant to be used with the training data file (topics.train) that you will also find in the attachments.

When training said categorizer, place 'topics.train' in a 'corpora/topics' directory under the directory where you are running this code.

The model will be created in the sub-folder 'models' (make sure you have a folder by that name under your current directory).
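
The directory layout described in the comment above ('topics.train' under 'corpora/topics', plus an existing 'models' folder) could be checked before running the demos with something like the following. This is only a sketch under stated assumptions: the class DemoLayoutCheck and its methods are hypothetical helpers, not part of the attached demo files, and the code uses only the standard java.nio.file API rather than any OpenNLP call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of the attached D1TopicClassifier*DemoNB files. */
final class DemoLayoutCheck {

    /** Location where the comment says the training data should be placed. */
    static Path trainingFile(Path workingDir) {
        return workingDir.resolve("corpora").resolve("topics").resolve("topics.train");
    }

    /**
     * Verifies that the training file is present and makes sure the 'models'
     * folder exists, creating it if necessary. Returns the models directory.
     */
    static Path prepare(Path workingDir) throws IOException {
        Path train = trainingFile(workingDir);
        if (!Files.isRegularFile(train)) {
            throw new IOException("Missing training data: " + train);
        }
        // createDirectories does not fail if the directory is already there.
        return Files.createDirectories(workingDir.resolve("models"));
    }

    public static void main(String[] args) throws IOException {
        Path models = prepare(Paths.get("."));
        System.out.println("Layout looks fine; models folder: " + models);
    }
}
```

Creating the 'models' folder up front is a convenience added here; the original comment only asks that the folder exist before training.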
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---
    Attachment: (was: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch)
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---------------------------------------
    Attachment: naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch

A new patch. This patch contains test-cases validating the Naive Bayesian classifier (ensuring that the mathematics in it is correct) and the Naive Bayesian implementation of the document categorizer. This patch can be applied to 1.6.0 rc6.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
> Attachments: naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, naive-bayes-patch-for-opennlp-1.6.0-rc6.patch, topics.train
> Original Estimate: 504h
> Remaining Estimate: 504h
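For readers who want a concrete picture of the mathematics such test cases would check, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the code from the patch: the class name NaiveBayesSketch and its methods are hypothetical, it assumes Java 8 rather than the Java 1.5 target stated in the issue, and it skips tokens never seen in training, which is one common choice among several.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial Naive Bayes with add-one smoothing (hypothetical, not the patch's code). */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled document to the counts. */
    void train(String label, String[] tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns log P(label) + sum over known tokens of log P(token | label). */
    double logScore(String label, String[] tokens) {
        if (!docsPerLabel.containsKey(label)) {
            throw new IllegalArgumentException("label not seen in training: " + label);
        }
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        // Add-one smoothing: every vocabulary word gets one extra pseudo-count.
        double denominator = tokensPerLabel.getOrDefault(label, 0) + vocabulary.size();
        for (String token : tokens) {
            if (!vocabulary.contains(token)) {
                continue; // tokens unseen in training carry no evidence in this sketch
            }
            score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
        }
        return score;
    }

    /** Returns the label with the highest log score, or null if nothing was trained. */
    String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", new String[] {"goal", "match", "goal"});
        nb.train("politics", new String[] {"minister", "vote"});
        System.out.println(nb.classify(new String[] {"goal"})); // prints "sports"
    }
}
```

With the two toy documents in main, the score for the label "sports" given the single token "goal" is log(1/2 * 3/7): one of the two documents is labelled sports, and the smoothed token estimate is (2 + 1) / (3 + 4), with 3 sports tokens and a 4-word vocabulary. Working in log space avoids underflow when documents are long.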
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---------------------------------------
    Attachment: topics.train

The attached training file can be used to train a Naive Bayes classifier model. The training file 'topics.train' must be placed in a directory named 'corpora/topics'. The code to train and save a model looks as follows:

    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import opennlp.tools.doccat.DoccatModel;
    import opennlp.tools.doccat.DocumentSample;
    import opennlp.tools.doccat.DocumentSampleStream;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    // DocumentCategorizerNB is added by the attached patch.

    public class D1TopicClassifierTrainingDemoNB {

      public static void main(String[] args) {
        DoccatModel model = null;
        InputStream dataIn = null;
        try {
          dataIn = new FileInputStream("corpora/topics/topics.train");
          ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
          ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
          model = DocumentCategorizerNB.train("en", sampleStream);
        } catch (IOException e) {
          // Failed to read or parse training data, training failed
          e.printStackTrace();
        } finally {
          if (dataIn != null) {
            try {
              dataIn.close();
            } catch (IOException e) {
              // Not an issue, training already finished.
              // The exception should be logged and investigated
              // if part of a production system.
              e.printStackTrace();
            }
          }
        }

        if (model == null) {
          // Training failed, so there is nothing to save.
          return;
        }

        String modelFile = "models/topics_nb.bin";
        OutputStream modelOut = null;
        try {
          modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
          model.serialize(modelOut);
        } catch (IOException e) {
          // Failed to save model
          e.printStackTrace();
        } finally {
          if (modelOut != null) {
            try {
              modelOut.close();
            } catch (IOException e) {
              // Failed to correctly save model.
              // Written model might be invalid.
              e.printStackTrace();
            }
          }
        }
      }
    }

The model will be created in the directory "models" (which must already exist) and can be loaded and used as follows:

    public class D1TopicClassifierUsageDemoNB {

      public static void main(String[] args) {
        //String paragraph = "Although the outfit has been banned, no restriction has been imposed on movement of its leaders outside Pakistan-occupied-Kashmir (PoK) but they could not conduct their organisational activities in the country, Interior Minister Faisel Saleh Hayat said.";
        String paragraph = "Rumours before the game suggested the Portuguese would be out at the end of the season if Inter failed to progress but in the end there was little to worry about as goals from Samuel Eto'o and Mario Balotelli ensured a comfortable night.";

        // always start with a model, a model is learned from training data
        InputStream is = null;
        try {
          is = new FileInputStream("models/topics_nb.bin");
          DoccatModel model = new DoccatModel(is);

          // Inspect the underlying model: its type and its outcome labels
          AbstractModel internalModel = (AbstractModel) model.getMaxentModel();
          System.out.println("ModelType: " + internalModel.getModelType());
          System.out.println("Model Outcomes: ");
          Object[] data = internalModel.getDataStructures();
          for (String val : (String[]) data[2]) {
            System.out.println(val);
          }
          IndexHashTable pmap = (IndexHashTable) data[1];
          //String[] PRED_LABELS = new String[pmap.size()];
          //pmap.toArray(PRED_LABELS);
          //Context[] contexts = (Context[])data[0];
          //System.out.println("Pred labels: ");
          //for (String label : PRED_LABELS) {
          //  System.out.println(label + " " + pmap.get(label) + " " + contexts[pmap.get(label)].getOutcomes().length + " " + contexts[pmap.get(label)].getOutcomes()[0] + " " + contexts[pmap.get(label)].getParameters()[0]);
          //}

          System.out.println("Running the classifier: ");
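The training demo above reads its input through DocumentSampleStream, which expects one document per line: the category first, followed by the whitespace-separated tokens of the document. The sketch below parses such a line using only the JDK. It is illustrative: DoccatLineSketch is a hypothetical name, the sample line is made up rather than taken from topics.train, and Java 8 is assumed.

```java
import java.util.Arrays;

/** Hypothetical holder for one training line: a category plus the document's tokens. */
final class DoccatLineSketch {
    final String category;
    final String[] tokens;

    private DoccatLineSketch(String category, String[] tokens) {
        this.category = category;
        this.tokens = tokens;
    }

    /** Splits a line on whitespace; the first field is the category, the rest are tokens. */
    static DoccatLineSketch parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            // An empty line or a line with only a category is not a usable sample.
            throw new IllegalArgumentException(
                "expected a category followed by at least one token: '" + line + "'");
        }
        return new DoccatLineSketch(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    public static void main(String[] args) {
        DoccatLineSketch sample = parse("sports Inter won the match");
        // prints "sports -> [Inter, won, the, match]"
        System.out.println(sample.category + " -> " + Arrays.toString(sample.tokens));
    }
}
```

Rejecting lines that carry only a category keeps malformed training data from silently producing empty documents.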
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---------------------------------------
    Attachment: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch

This is the patch with the code for a multinomial Naive Bayesian Classifier.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
> Attachments: naive-bayes-patch-for-opennlp-1.6.0-rc6.patch
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-777:
-----------------------------------
    Affects Version/s: (was: 1.6.0)
    Fix Version/s: (was: 1.6.0)

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
> Original Estimate: 504h
> Remaining Estimate: 504h