Hi Cohan, for future reference it'd be better if you could attach the patch to the Jira issue, I can have a look at your patch and comment later this week. Usually we mark an issue as fixed as soon as the relevant code is committed.
Regards, Tommaso 2015-07-18 15:36 GMT+02:00 Cohan Sujay Carlos <[email protected]>: > Yet another patch (ouch, yeah, like they say, "in product development, > you're never done, till you're done!"). > > Reason: Error in one of the test-cases. The test-case > DocumentCategorizerNBTest is now correct and runs fine. > > Cohan Sujay Carlos > CEO, Aiaioo Labs > +91-77605-80015 > > On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <[email protected]> > wrote: > >> A small update to the patch (I removed a superfluous piece of code). >> >> In the earlier path, I had used a subclass of >> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB >> that was functionally identical. I removed that subclass since it >> wasn't essential (DoccatModel does the trick just fine). >> >> Is there anything else I need to do? >> >> Is someone on the dev team going to be responsible for incorporating the >> patch into the codebase? >> >> Can I mark this Jira issue fixed (for target 1.6.1?). >> >> Cohan Sujay Carlos >> CEO, Aiaioo Labs >> +91-77605-80015 >> >> >> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]> >> wrote: >> >>> I have gone ahead and written the test-cases and verified that the Naive >>> Bayes Classifier works correctly. >>> >>> Here is the latest patch (attached) with the test-cases and everything. >>> >>> In implementing the Naive Bayes classifier, we tried to *ensure minimal >>> disruption* to existing code. >>> >>> The *only* changes to existing code are as follows: >>> >>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to >>> include a new model type: >>> >>> line 35: *public enum ModelType * >>> *{Maxent,Perceptron,MaxentQn,NaiveBayes};* >>> >>> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed >>> in one place: >>> >>> line 53: >>> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader = new >>> NaiveBayesModelReader(this.dataReader); }* >>> >>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed >>> in two places: >>> >>> line 79: >>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter = >>> new BinaryNaiveBayesModelWriter(model,dos); }* >>> >>> line 91: >>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter = >>> new PlainTextNaiveBayesModelWriter(model,bw); }* >>> >>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has >>> been changed in one place to add the Naive Bayes trainer: >>> >>> line 51: >>> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, >>> NaiveBayesTrainer.class);* >>> >>> That was it! >>> >>> We didn't change anything else in the existing OpenNLP code. >>> >>> All the new code for the Naive Bayesian classifier sits in the package >>> opennlp.tools.ml.naivebayes - just above the perceptron >>> >>> The code for the document categorizer using the Naive Bayesian >>> classifier can be found in opennlp.tools.doccat (we didn't have to >>> change any existing code). The new doccat is called >>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the >>> maxent document categorizer, which is DocumentCategorizerME). >>> >>> Proof of correctness! >>> >>> I have included two testcases: >>> >>> 1. A test to validate the document categorizer - under the tests folder, >>> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which >>> runs the same tests that were run on the ME document categorizer, but on >>> the Naive Bayes categorizer instead (all tests passed). >>> >>> 2. A test to check the mathematical correctness of the Naive Bayes >>> implementation can be found in >>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest. >>> >>> So, the inclusion of this code will minimally impact any existing code. >>> >>> And the code in this patch contains a multinomial Naive Bayesian >>> classifier that is verifiably correct. >>> >>> Is there anything else I have to do to have this patch pulled into the >>> OpenNLP code base (for say 1.7.0)? >>> >>> Cohan Sujay Carlos >>> CEO, Aiaioo Labs >>> +91-77605-80015 >>> >>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]> >>>> wrote: >>>> >>>>> Tommaso, >>>>> >>>>> I have created the Jira issue: >>>>> https://issues.apache.org/jira/browse/OPENNLP-777 >>>>> >>>>>
