Yet another patch (ouch, yeah, like they say, "in product development, you're never done, till you're done!").
Reason: Error in one of the test-cases. The test-case DocumentCategorizerNBTest is now correct and runs fine. Cohan Sujay Carlos CEO, Aiaioo Labs +91-77605-80015 On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <[email protected]> wrote: > A small update to the patch (I removed a superfluous piece of code). > > In the earlier path, I had used a subclass of > opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB > that was functionally identical. I removed that subclass since it wasn't > essential (DoccatModel does the trick just fine). > > Is there anything else I need to do? > > Is someone on the dev team going to be responsible for incorporating the > patch into the codebase? > > Can I mark this Jira issue fixed (for target 1.6.1?). > > Cohan Sujay Carlos > CEO, Aiaioo Labs > +91-77605-80015 > > > On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]> > wrote: > >> I have gone ahead and written the test-cases and verified that the Naive >> Bayes Classifier works correctly. >> >> Here is the latest patch (attached) with the test-cases and everything. >> >> In implementing the Naive Bayes classifier, we tried to *ensure minimal >> disruption* to existing code. >> >> The *only* changes to existing code are as follows: >> >> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to >> include a new model type: >> >> line 35: *public enum ModelType * >> *{Maxent,Perceptron,MaxentQn,NaiveBayes};* >> >> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed >> in one place: >> >> line 53: >> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader = new >> NaiveBayesModelReader(this.dataReader); }* >> >> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed >> in two places: >> >> line 79: >> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter = >> new BinaryNaiveBayesModelWriter(model,dos); }* >> >> line 91: >> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter = >> new PlainTextNaiveBayesModelWriter(model,bw); }* >> >> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has been >> changed in one place to add the Naive Bayes trainer: >> >> line 51: >> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, >> NaiveBayesTrainer.class);* >> >> That was it! >> >> We didn't change anything else in the existing OpenNLP code. >> >> All the new code for the Naive Bayesian classifier sits in the package >> opennlp.tools.ml.naivebayes - just above the perceptron >> >> The code for the document categorizer using the Naive Bayesian classifier >> can be found in opennlp.tools.doccat (we didn't have to change any >> existing code). The new doccat is called >> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the >> maxent document categorizer, which is DocumentCategorizerME). >> >> Proof of correctness! >> >> I have included two testcases: >> >> 1. A test to validate the document categorizer - under the tests folder, >> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which >> runs the same tests that were run on the ME document categorizer, but on >> the Naive Bayes categorizer instead (all tests passed). >> >> 2. A test to check the mathematical correctness of the Naive Bayes >> implementation can be found in >> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest. >> >> So, the inclusion of this code will minimally impact any existing code. >> >> And the code in this patch contains a multinomial Naive Bayesian >> classifier that is verifiably correct. >> >> Is there anything else I have to do to have this patch pulled into the >> OpenNLP code base (for say 1.7.0)? >> >> Cohan Sujay Carlos >> CEO, Aiaioo Labs >> +91-77605-80015 >> >> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]> >>> wrote: >>> >>>> Tommaso, >>>> >>>> I have created the Jira issue: >>>> https://issues.apache.org/jira/browse/OPENNLP-777 >>>> >>>>
