[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550578#comment-14550578 ]

Joern Kottmann commented on OPENNLP-777:

Yes, that would be really nice to have in OpenNLP!

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>    Affects Versions: 1.6.0
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
>             Fix For: 1.6.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it
> lacks one at present).
> Implementation details: We have a production-hardened piece of Java code for
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that
> we'd like to contribute. The code is Java 1.5 compatible. I'd have to write
> an adapter to make the interface compatible with the ME classifier in
> OpenNLP. I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this
> dated May 19th, 2015.
>
> Tommaso Teofili via opennlp.apache.org
> to dev
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
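For readers unfamiliar with the technique named in the issue description, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace (add-one) smoothing. It is not the code from the patch: the class name TinyNaiveBayes and all of its methods are hypothetical, and it uses Java 8 conveniences for brevity even though the contributed code targets Java 1.5. The smoothed estimate it assumes is P(w|c) = (count(w,c) + 1) / (N_c + |V|), where N_c is the number of tokens seen for class c and |V| is the vocabulary size.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the classifier contributed in OPENNLP-777. */
public class TinyNaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled document (already tokenized) to the counts. */
    public void train(String label, String[] tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns log P(label) + sum over tokens of log P(token | label), add-one smoothed. */
    public double logScore(String label, String[] tokens) {
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int tokensInLabel = tokensPerLabel.getOrDefault(label, 0);
        int vocabularySize = vocabulary.size();
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            // Laplace smoothing: unseen tokens get a small non-zero probability.
            score += Math.log((count + 1.0) / (tokensInLabel + vocabularySize));
        }
        return score;
    }

    /** Returns the label with the highest log score, or null if nothing was trained. */
    public String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is a common design choice for this kind of classifier; whether the contributed implementation does the same is not stated in the thread.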
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551168#comment-14551168 ]

Haider Ali commented on OPENNLP-777:

I also want to contribute to the Naive Bayesian Classifier.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632413#comment-14632413 ]

Cohan Sujay Carlos commented on OPENNLP-777:

In implementing the Naive Bayes classifier, we tried to ensure minimal disruption to existing code.

The only changes to existing code are as follows:

1. The opennlp.tools.ml.model.AbstractModel class has been changed to include a new model type:

    line 35: public enum ModelType {Maxent,Perceptron,MaxentQn,NaiveBayes};

2. The opennlp.tools.ml.model.GenericModelReader class has been changed in one place:

    line 53: else if (modelType.equals("NaiveBayes")) {
        delegateModelReader = new NaiveBayesModelReader(this.dataReader);
    }

3. The opennlp.tools.ml.model.GenericModelWriter class has been changed in two places:

    line 79: if (model.getModelType() == ModelType.NaiveBayes) {
        delegateWriter = new BinaryNaiveBayesModelWriter(model,dos);
    }

    line 91: if (model.getModelType() == ModelType.NaiveBayes) {
        delegateWriter = new PlainTextNaiveBayesModelWriter(model,bw);
    }

4. The initializer of the opennlp.tools.ml.TrainerFactory class has been changed in one place to add in the built-in Naive Bayes trainer:

    line 51: _trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, NaiveBayesTrainer.class);

That was it! We didn't change anything else in the existing OpenNLP code.

All the new code for the Naive Bayesian classifier sits in the package opennlp.tools.ml.naivebayes - just above the perceptron ;)

The code for the document categorizer using the Naive Bayesian classifier can be found in opennlp.tools.doccat (we didn't have to change any existing code). The new doccat is called opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the maxent document categorizer, which is DocumentCategorizerME).

Proof of correctness! I have included two test cases:

1. A test to validate the document categorizer - under the tests folder, you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which runs the same tests that were run on the ME document categorizer, but on the Naive Bayes categorizer instead (all tests passed).

2. A test to check the mathematical correctness of the Naive Bayes implementation can be found in opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.

So, the inclusion of this code will minimally impact any existing code. And the code in the latest patch is verifiably correct.

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
>         Attachments: naive-bayes-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, naive-bayes-patch-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
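The TrainerFactory change listed as item 4 in the comment above registers a trainer class under a string key. A minimal, self-contained sketch of that registry pattern follows. Everything in it is hypothetical: the Trainer interface, the two trainer classes, and the string values "MAXENT" and "NAIVEBAYES" are stand-ins chosen for illustration, not the OpenNLP types or constants.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a name-to-class trainer registry; not OpenNLP's TrainerFactory. */
public class TrainerRegistrySketch {

    /** Stand-in for a trainer abstraction (assumption, not an OpenNLP interface). */
    public interface Trainer {
        String name();
    }

    public static class MaxentTrainer implements Trainer {
        public String name() { return "MAXENT"; }
    }

    public static class NaiveBayesTrainer implements Trainer {
        // Assumed key value, used here only for illustration.
        public static final String NAIVE_BAYES_VALUE = "NAIVEBAYES";
        public String name() { return NAIVE_BAYES_VALUE; }
    }

    private static final Map<String, Class<? extends Trainer>> TRAINERS = new HashMap<>();

    static {
        // Adding a new algorithm is a single registration line, as in item 4 above.
        TRAINERS.put("MAXENT", MaxentTrainer.class);
        TRAINERS.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, NaiveBayesTrainer.class);
    }

    /** Looks up the class registered for the algorithm name and instantiates it reflectively. */
    public static Trainer create(String algorithm) {
        Class<? extends Trainer> trainerClass = TRAINERS.get(algorithm);
        if (trainerClass == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        try {
            return trainerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + trainerClass, e);
        }
    }
}
```

With a registry like this, callers select an algorithm by name and never reference the concrete class, which is one plausible reason the patch could add a new trainer while touching only one line of the factory.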
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632937#comment-14632937 ]

Joern Kottmann commented on OPENNLP-777:

Thanks for contributing! Before we can pull it in we need an ICLA on file. Have you already filed one? If not, it would be nice if you could send one in.

The link to the ICLA is here:
https://www.apache.org/licenses/icla.txt

Thanks!

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633129#comment-14633129 ]

Cohan Sujay Carlos commented on OPENNLP-777:

Joern, I have just emailed the completed ICLA to your email address (to your apache.org one).
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645701#comment-14645701 ]

Joern Kottmann commented on OPENNLP-777:

The ICLA has not been received yet. You have to be listed on this page:
https://people.apache.org/committer-index.html

Can you please try to send it again? The beginning of the ICLA explains how it can be submitted (either via email or fax).
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645783#comment-14645783 ]

Cohan Sujay Carlos commented on OPENNLP-777:

Joern, I've just resent the completed ICLA both to you and to the Secretary ... I had sent the earlier email only to you, not to the Secretary. I hope that takes care of the paperwork!

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646044#comment-14646044 ]

Cohan Sujay Carlos commented on OPENNLP-777:

Joern, I've received an acknowledgement from Craig. He wrote to say that the ICLA has been filed in the Apache Software Foundation records.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661877#comment-14661877 ]

Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

Patch looks overall good to me, integration tests pass.
I think it'd be important to have a bit more unit test coverage for the ml/naivebayes package. [~cohan.sujay], if you want to do it that's perfect, otherwise I can take care of it.

Some minor things to fix:
- there's a missing exclusion for opennlp-tools/src/test/resources/data/ppa/NOTICE in the pom file, so RAT reports it as having an unknown license
- tabs vs. spaces based indentation

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679794#comment-14679794 ]

Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

Thanks [~cohan.sujay]! I'll take care of the rest of the stuff. If everything is fine, I'll commit it later today.

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681743#comment-14681743 ]

Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

Thanks a lot [~cohan.sujay] for your additional patch.
The attached test fails on my machine with:
{noformat}
Failed tests:
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachData:47 expected:<0.7897994553107205> but was:<0.7655360237682595>
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachDataUsingTrainUtil:61 expected:<0.7897994553107205> but was:<0.7655360237682595>
  NaiveBayesPrepAttachTest.testNaiveBayesOnPrepAttachDataUsingTrainUtilWithCutoff5:75 expected:<0.7945035899975241> but was:<0.7930180737806388>
{noformat}
I'll look deeper into it, but I think we can tweak the tests to tolerate a certain delta in such numbers.

Other than that, a few more style-related comments:
- we tend to avoid putting @author tags in the javadoc
- the javadoc generally needs to be adjusted a bit

To help with style / formatting conventions we have some guidelines on the website at http://opennlp.apache.org/code-conventions.html

I'll work a bit on it and attach a new patch.
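The idea of tolerating a delta can be sketched in plain Java as follows. This is a minimal illustration, similar in spirit to JUnit's assertEquals(expected, actual, delta); the class name ToleranceCheck and the tolerance of 0.01 are my own assumptions, not values anyone proposed in the thread.

```java
/** Illustrative tolerance check; names and the chosen delta are hypothetical. */
class ToleranceCheck {
  /** True when the two values differ by at most delta. */
  static boolean withinDelta(double expected, double actual, double delta) {
    return Math.abs(expected - actual) <= delta;
  }

  public static void main(String[] args) {
    // Accuracy values copied from the failure report above (cutoff-5 test).
    double expected = 0.7945035899975241;
    double actual = 0.7930180737806388;
    System.out.println(withinDelta(expected, actual, 0.0));  // exact comparison
    System.out.println(withinDelta(expected, actual, 0.01)); // comparison with a tolerance
  }
}
```

Note that the first two reported failures differ by about 0.024, so a tolerance wide enough to cover them would be fairly loose for an accuracy figure, which may be a reason to look for the underlying cause first.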
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681753#comment-14681753 ]

Joern Kottmann commented on OPENNLP-777:
----------------------------------------

The difference in the test results indicates that something is a bit different between your system and his. My best guess is that the training data is decoded with a different encoding on your system than on his.

All other tests compute exactly identical numbers across systems.
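To illustrate how decoding could change results between machines, here is a small sketch showing the same bytes decoded with two different charsets. It only demonstrates the mechanism behind the guess above; it does not confirm that encoding was the actual cause, and the class name EncodingDemo and the sample word are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the same bytes yield different text under different charsets. */
class EncodingDemo {
  /** Decodes bytes with an explicitly named charset instead of the platform default. */
  static String decode(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    // "cafe" with an accented e, encoded as UTF-8 (the accented letter takes two bytes).
    byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    String asUtf8 = decode(utf8, StandardCharsets.UTF_8);
    String asLatin1 = decode(utf8, StandardCharsets.ISO_8859_1);
    System.out.println(asUtf8.length());   // 4 characters
    System.out.println(asLatin1.length()); // 5 characters: the token is now a different string
    System.out.println("platform default: " + Charset.defaultCharset());
  }
}
```

If training data is read with the platform default charset, tokens containing non-ASCII characters can differ between systems, which would change feature counts and therefore the measured accuracy. Passing an explicit charset when reading removes that dependence.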
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681884#comment-14681884 ]

Cohan Sujay Carlos commented on OPENNLP-777:
--------------------------------------------

Tommaso, I have no objection to the removal of the @author tags.

Is there a way to reformat the code? If you know of a way to reformat it automatically, that would be great. You will probably find the indent conventions messed up - I hadn't used the OpenNLP code formatter :|

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681979#comment-14681979 ]

Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

I've committed [~cohan.sujay]'s patch, thanks Cohan!
I've adjusted indent, author tags and some javadoc.
I'll follow up shortly with some more commits for minor improvements and more unit tests.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738878#comment-14738878 ]

Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

[~cohan.sujay] I am writing some tests around model IO (persist to and read from file) but I am not sure if I am doing something wrong or there's a bug there.
If you try the two tests below they'll both fail at reading the model written to file:
{code}
  @Test
  public void testBinaryModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));
    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "bnb-", ".bin");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();
    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));
    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "ptnb-", ".txt");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();
    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }
{code}
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005 ]

Cohan Sujay Carlos commented on OPENNLP-777:
--------------------------------------------

Tommaso,

The problem in the above test cases seems to be in the use of the GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and ModelReader classes which must be used to persist their models. The Writers come in one of 2 flavours - Binary and PlainText.

So, the following test cases work for me (one thing that baffled me was that I had to use constructModel rather than getModel to make these test cases work).

I hope that answers your question.
{code}
  @Test
  public void testBinaryModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));
    File file = new File("test.bin");
    NaiveBayesModelWriter modelWriter = new BinaryNaiveBayesModelWriter(model, file);
    modelWriter.persist();
    NaiveBayesModelReader reader = new BinaryNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));
    File file = new File("test.txt");
    NaiveBayesModelWriter modelWriter = new PlainTextNaiveBayesModelWriter(model, file);
    modelWriter.persist();
    NaiveBayesModelReader reader = new PlainTextNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }
{code}
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791028#comment-14791028 ]

Joern Kottmann commented on OPENNLP-777:
----------------------------------------

I noticed that the patch added this file:
opennlp-tools/src/main/java/opennlp/tools/doccat/DocumentCategorizerNB.java

What is the reason behind that? If the NB classifier is integrated correctly, it can be activated via the params mechanism and be used by any component in OpenNLP which uses a classifier.
Is it already possible to configure it via parameters?

The DocumentCategorizerNB class should be removed.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791801#comment-14791801 ] Joern Kottmann commented on OPENNLP-777:
I trained the Document Categorizer with a params file and set it to NAIVEBAYES. Is that supposed to work? The performance of the model was really bad. I am currently working on adding language detection to OpenNLP and trained the Document Categorizer on the Leipzig corpus. It works with Maxent and Perceptron, but the NB classifier couldn't even classify longer pieces of text correctly.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791808#comment-14791808 ] Tommaso Teofili commented on OPENNLP-777:
Thanks, Cohan, for the help on the unit test. I'll have a look at why getModel doesn't work.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802714#comment-14802714 ] Tommaso Teofili commented on OPENNLP-777:
[~joern] thanks for your feedback, I'll have a deeper look at the doccat part and report here as well.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802853#comment-14802853 ] Joern Kottmann commented on OPENNLP-777:
Great, we should also have a sample ml file inside lang/ml. All the possible parameters should be explained there.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802976#comment-14802976 ] Cohan Sujay Carlos commented on OPENNLP-777:
Joern, I built DocumentCategorizerNB to mirror the concrete class DocumentCategorizerME, which you will find in the same package as DocumentCategorizer (which is just an interface). I've only tested DocumentCategorizerNB (there is a test case called 'DocumentCategorizerNBTest') and not DocumentCategorizerME with parameters passed to it (because I don't know how to do it). But from your observation that it is performing poorly, I believe it is not working correctly, because the NB classifier should typically outperform the Perceptron (as it does in the prep attach test that directly exercises the NB classifier implementation).
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805159#comment-14805159 ] Cohan Sujay Carlos commented on OPENNLP-777:
Tommaso, I had built the NaiveBayes reader by looking at the PerceptronReader. So, I rewrote your test with the Perceptron class hierarchy instead of the NaiveBayes class hierarchy and obtained the same error. The reader.getModel method fails in exactly the same way in the PerceptronReader as well. Here is the test code:
{code}
// Train a small perceptron model on the same data as the NB correctness test.
PerceptronModel model = (PerceptronModel) new PerceptronTrainer().trainModel(
    10,
    new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false),
    1);

// Persist the model in binary form.
File file = new File("test_perceptron.bin");
PerceptronModelWriter modelWriter = new BinaryPerceptronModelWriter(model, file);
modelWriter.persist();

// Read it back; getModel() fails here just as it does for the NaiveBayes reader.
PerceptronModelReader reader = new BinaryPerceptronModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.getModel();
assertNotNull(abstractModel);
{code}
I hope that helps you with this problem.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805176#comment-14805176 ] Cohan Sujay Carlos commented on OPENNLP-777:
[~joern] and [~teofili], there is another problem with the DocumentCategorizer, and that is in the nomenclature. DocumentCategorizer is just the interface, and at present its only concrete implementation is DocumentCategorizerME. So, if you look at the tutorials available on OpenNLP, including the 1.6.0 manual (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat.classifying.api), you see that sample code tends to use DocumentCategorizerME explicitly. The ME suffix seems to indicate Maximum Entropy. So, wouldn't it be confusing for users if they instantiated a class named for Maximum Entropy which, owing to the setting of parameters, used a Naive Bayes algorithm internally instead?
The 1.6.0 manual actually says:
{quote}
Document Categorizer API
To perform classification you will need a *maxent* model - these are encapsulated in the DoccatModel class of OpenNLP tools. First you need to grab the bytes from the serialized model on an InputStream - we'll leave it to you to do that, since you were the one who serialized it to begin with. Now for the easy part:
{quote}
And the code goes:
{code}
String inputText = ...
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(m);
double[] outcomes = myCategorizer.categorize(inputText);
String category = myCategorizer.getBestCategory(outcomes);
{code}
Wouldn't this necessitate the use of a different concrete subclass (i.e., DocumentCategorizerNB) to preserve backward compatibility? (Users have already written code using DocumentCategorizerME, which makes renaming the concrete class inadvisable.)
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805204#comment-14805204 ] Joern Kottmann commented on OPENNLP-777:
You are right, the ME suffix in the name no longer indicates that Maxent is actually used. We decided to keep the name anyway so as not to break backward compatibility. Otherwise every existing user who updates to a new version of OpenNLP would have to change their code to reflect the name change. The documentation should be updated. I therefore suggest that we just drop the DocumentCategorizerNB class and one day rename all the components so that they no longer end in ME.
Have a look at the files in lang/ml; these are parameter files for the trainer. If you take one, e.g. the one for maxent, and change the algorithm to NAIVEBAYES, you can train an NB document categorizer.
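The parameter switch described above can be sketched in plain Java. This is only an illustration using java.util.Properties: it assumes the trainer parameter files are simple key=value text with Algorithm, Iterations and Cutoff keys, the values shown are illustrative, and the class and method names (TrainerParamsSketch, parse, switchAlgorithm) are hypothetical rather than part of OpenNLP.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: derive NB trainer parameters from a maxent params file. */
final class TrainerParamsSketch {

    /** Parses key=value parameter text, as assumed for the files in lang/ml. */
    static Properties parse(String text) {
        Properties params = new Properties();
        try {
            params.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return params;
    }

    /** Returns a copy of the parameters with only the Algorithm entry changed. */
    static Properties switchAlgorithm(Properties base, String algorithm) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("Algorithm", algorithm);
        return copy;
    }

    public static void main(String[] args) {
        // Illustrative contents; the real files in lang/ml may differ.
        String maxentParams = "Algorithm=MAXENT\nIterations=100\nCutoff=5\n";
        Properties nbParams = switchAlgorithm(parse(maxentParams), "NAIVEBAYES");
        System.out.println(nbParams.getProperty("Algorithm"));
    }
}
```

The point of the sketch is that the algorithm is selected by data, not by which Java class the caller instantiates.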
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805338#comment-14805338 ] Cohan Sujay Carlos commented on OPENNLP-777:
Right, [~joern], I get it now. I know [~teofili] is looking into Doccat, but I suppose I'll take a look as well. I'll run a test on the Leipzig corpus and see what it's using, and whether anything more needs to be done to hook up the NB algorithm so that it can be used with a param switch. That should allow us to dispense with the DocumentCategorizerNB class.
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14860187#comment-14860187 ]

Joern Kottmann commented on OPENNLP-777:

Great. Here is the command I used to train it:

    bin/opennlp DoccatTrainer.leipzig -sentencesDir /home/blue/Documents/langtrain/ -model langid-ngram.bin -lang mul -params lang/ml/NaiveBayesTrainerParams.txt

And here are the files I used:

    afr_web_2013_100K-sentences.txt  lit_newscrawl_2011_100K-sentences.txt
    ara_web_2011_100K-sentences.txt  mal_newscrawl_2011_100K-sentences.txt
    bak_newscrawl_2011_100K-sentences.txt  mar_newscrawl_2011_100K-sentences.txt
    bel_news_2011_100K-sentences.txt  mkd_newscrwal_2011_100K-sentences.txt
    ben_newscrawl_2011_100K-sentences.txt  mlt_web_2012_100K-sentences.txt
    bos_newscrawl_2011_100K-sentences.txt  mri_web_2011_100K-sentences.txt
    bul_newscrawl_2011_100K-sentences.txt  msa_newscrwal_2011_100K-sentences.txt
    cat_newscrawl_2011_100K-sentences.txt  nep_news_2010_100K-sentences.txt
    ces_web_2012_100K-sentences.txt  nld_mixed_2012_100K-sentences.txt
    cmn_wikipedia_2012_100K-sentences.txt  nob_news_2013_100K-sentences.txt
    dan_mixed_2014_100K-sentences.txt  pol_newscrawl_2011_100K-sentences.txt
    deu_news_2010_100K-sentences.txt  por_newscrawl_2011_100K-sentences.txt
    ell_web_2011_100K-sentences.txt  pus_newscrawl_2011_100K-sentences.txt
    eng_news_2010_100K-sentences.txt  ron_web_2011_100K-sentences.txt
    epo_web_2012_100K-sentences.txt  rus_news_2010_100K-sentences.txt
    est_newscrawl_2011_100K-sentences.txt  slk_newscrawl_2011_100K-sentences.txt
    eus_newscrawl_2012_100K-sentences.txt  slv_newscrawl_2011_100K-sentences.txt
    fao_web_2013_100K-sentences.txt  som_newscrawl_2011_100K-sentences.txt
    fas_newscrawl_2011_100K-sentences.txt  spa_news_2011_100K-sentences.txt
    fin_newscrawl_2011_100K-sentences.txt  srp_wikipedia_2010_100K-sentences.txt
    fra_news_2010_100K-sentences.txt  swe_news_2007_100K-sentences.txt
    glg_wikipedia_2012_100K-sentences.txt  tam_newscrawl_2011_100K-sentences.txt
    hin_newscrawl_2012_100K-sentences.txt  tat_mixed_2015_100K-sentences.txt
    hrv_newscrawl_2011_100K-sentences.txt  tel_newscrawl_2011_100K-sentences.txt
    hun_mixed_2012_100K-sentences.txt  tgk_newscrawl_2011_100K-sentences.txt
    hye_newscrawl_2011_100K-sentences.txt  tgl_newscrwal_2011_100K-sentences.txt
    ind_web_2012_100K-sentences.txt  tha_newscrawl_2011_100K-sentences.txt
    isl_newscrawl_2011_100K-sentences.txt  tur_newscrawl_2011_100K-sentences.txt
    ita_web_2011_100K-sentences.txt  ukr_web_2012_100K-sentences.txt
    jpn_news_2005-2008_100K-sentences.txt  urd_newscrwal_2011_100K-sentences.txt
    kat_newscrawl_2011_100K-sentences.txt  uzb_newscrawl_2011_100K-sentences.txt
    kaz_newscrawl_2011_100K-sentences.txt  vie_newscrwal_2011_100K-sentences.txt
    kir_newscrawl_2011_100K-sentences.txt  vol_wikipedia_2011_100K-sentences.txt
    kor_news_2007_100K-sentences.txt  zho_news_2007-2009_100K-sentences.txt
    lav_newscrawl_2011_100K-sentences.txt  zul_mixed_2013_100K-sentences.txt

You can download them from here: http://corpora2.informatik.uni-leipzig.de/download.html

The resulting language detection works rather well for texts that have at least a few words.
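For readers trying to reproduce the training run above: the file passed via -params is a plain key=value text file. The sketch below is only an illustration of how one might check which algorithm such a file selects before training. It assumes the selecting key is named "Algorithm" and that the Naive Bayes value is "NAIVEBAYES"; neither is stated in this thread, and the class TrainerParamsCheck is hypothetical, not part of OpenNLP.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of OpenNLP.
class TrainerParamsCheck {

    // Assumption: the trainer is selected by a key with this name.
    static final String ALGORITHM_KEY = "Algorithm";

    // Returns the configured algorithm name, or "(not set)" if the key is absent.
    static String algorithmOf(String paramsText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(paramsText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(ALGORITHM_KEY, "(not set)").trim();
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, inspect that file; otherwise use a made-up sample.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "Algorithm=NAIVEBAYES\nCutoff=5\n"; // sample values are assumptions
        System.out.println("Algorithm: " + algorithmOf(text));
    }
}
```

If the key is missing or misspelled, the trainer may fall back to some default algorithm, so a check like this is one cheap way to rule out a configuration problem before comparing accuracy numbers.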

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963353#comment-14963353 ]

Cohan Sujay Carlos commented on OPENNLP-777:

[~joern] and [~teofili],

Just a quick update, since it's been a month since the last message on this thread. I'm finding the setup required to replicate Joern's environment a real bear (I'm sorry, but I am just not familiar with the build tools), but I'll continue banging at it next week.

In the meantime, is there any way that either [~joern] or [~teofili] could check whether the Naive Bayes classifier is indeed being used when you call the model as Joern has described, using NaiveBayesTrainerParams.txt? It seems very unlikely that the NB model would, if it is being called, fare worse than a perceptron at this task, given its superior performance in the PrepAttach test case.

Also, since [~joern] is working on a language identifier, I thought I should mention that sequential models (Markov chains) with character-level features fare far better than linear classifiers with word-level features at that task. A really good method is described by Sibun and Reynar in their paper "Language Identification: Examining the Issues". The training command that Joern used above doesn't seem to specify character-level features, so I am assuming he's using word-level features and a linear classifier, and that wouldn't work well in any case.
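To make the point about character-level sequential models concrete, here is a minimal sketch of a character-bigram Markov chain language identifier. It is not the method from the Sibun and Reynar paper and not OpenNLP code; the class name CharBigramModel, the add-one smoothing, and the choice of 65536 as the alphabet size are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: one first-order character Markov chain per language.
class CharBigramModel {

    // Assumption: smooth over all possible UTF-16 code units.
    private static final int ALPHABET_SIZE = 65536;

    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<Character, Integer> contextCounts = new HashMap<>();

    // Count every adjacent character pair in the training text.
    void train(String text) {
        for (int i = 0; i + 1 < text.length(); i++) {
            bigramCounts.merge(text.substring(i, i + 2), 1, Integer::sum);
            contextCounts.merge(text.charAt(i), 1, Integer::sum);
        }
    }

    // Log-probability of the text under this chain, with add-one smoothing.
    double logProb(String text) {
        double sum = 0.0;
        for (int i = 0; i + 1 < text.length(); i++) {
            int bigram = bigramCounts.getOrDefault(text.substring(i, i + 2), 0);
            int context = contextCounts.getOrDefault(text.charAt(i), 0);
            sum += Math.log((bigram + 1.0) / (context + ALPHABET_SIZE));
        }
        return sum;
    }

    // Pick the language whose chain assigns the text the highest log-probability.
    // Texts shorter than two characters score 0.0 under every model.
    static String identify(Map<String, CharBigramModel> models, String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, CharBigramModel> entry : models.entrySet()) {
            double score = entry.getValue().logProb(text);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Because the model looks at character transitions rather than whole words, it can score even very short inputs, which is where word-level features tend to struggle.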

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964843#comment-14964843 ]

Joern Kottmann commented on OPENNLP-777:

The Prep Attach Test here is verifying that already. It trains the classifier once by directly instantiating it and then once via TrainUtil. In both cases it comes to the same accuracy number. So that is good.

In another test I see you switch smoothing on/off. The way that is done will not work very well: the method for doing that is static (which causes concurrency problems), and it can't be influenced by a training parameter. I suggest we add a parameter for smoothing. Is that even an option a user should be able to set? Are there any other options that should be configurable via training params? What do you think?

It would be nice to have state-of-the-art language detection in OpenNLP. For my specific use case, word-level features worked quite well; I use it to classify long news articles into a handful of languages.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
> Original Estimate: 504h
> Remaining Estimate: 504h
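The concern raised in Joern's comment above, that a static smoothing switch is shared by every classifier in the JVM and cannot be driven by a training parameter, can be sketched with a toy multinomial Naive Bayes classifier that keeps the smoothing constant as per-instance state. This is not the contributed patch; the class TinyNaiveBayes and the parameter key "Smoothing" are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy classifier, not the OpenNLP implementation.
class TinyNaiveBayes {

    // Per-instance smoothing: 1.0 is Laplace (add-one), 0.0 disables smoothing.
    private final double alpha;
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int docs = 0;

    TinyNaiveBayes(double alpha) {
        this.alpha = alpha;
    }

    // Assumption: a boolean training parameter named "Smoothing", defaulting to true.
    static TinyNaiveBayes fromParams(Map<String, String> params) {
        boolean smoothing = Boolean.parseBoolean(params.getOrDefault("Smoothing", "true"));
        return new TinyNaiveBayes(smoothing ? 1.0 : 0.0);
    }

    void train(String label, String... features) {
        docs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String feature : features) {
            counts.merge(feature, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(feature);
        }
    }

    // log P(label) + sum of log P(feature | label); label must have been seen in training.
    double logScore(String label, String... features) {
        double score = Math.log(docsPerClass.get(label) / (double) docs);
        Map<String, Integer> counts = featureCounts.get(label);
        double denominator = tokensPerClass.getOrDefault(label, 0) + alpha * vocabulary.size();
        for (String feature : features) {
            // Without smoothing, an unseen feature drives the score to negative infinity.
            score += Math.log((counts.getOrDefault(feature, 0) + alpha) / denominator);
        }
        return score;
    }

    // Returns the best-scoring label, or null if every label scores negative infinity.
    String classify(String... features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = logScore(label, features);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

With the smoothing constant held in a final field, two classifiers with different settings can be trained side by side in the same process, and the value can be read from whatever parameter map the trainer receives.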
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088848#comment-15088848 ]

Tommaso Teofili commented on OPENNLP-777:

Thanks [~cohan.sujay], the build is now green. I'll have a second look in the next few hours and commit it if everything looks good.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088980#comment-15088980 ]

Tommaso Teofili commented on OPENNLP-777:

I've committed [~cohan.sujay]'s latest patch in r1723671. Thanks, Cohan!

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095834#comment-15095834 ]

Tommaso Teofili commented on OPENNLP-777:

Very good to hear. Thanks, [~cohan.sujay], for sharing the code and results. I'll try it out myself too.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, NaiveBayesOpenNLPTestCode.zip, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103045#comment-15103045 ]

Tommaso Teofili commented on OPENNLP-777:

I've tried NBC using the client code provided by [~cohan.sujay] and it worked nicely. I am keen to consider this resolved and to open new issues if and when we spot something to improve or fix.

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, NaiveBayesOpenNLPTestCode.zip, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
> Original Estimate: 504h
> Remaining Estimate: 504h
[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier
[ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630765#comment-15630765 ]

Joern Kottmann commented on OPENNLP-777:

Can we close this issue?

> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
> Fix For: 1.6.1
> Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java, NaiveBayesOpenNLPTestCode.zip, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
> Original Estimate: 504h
> Remaining Estimate: 504h