Thanks Tommaso. The latest patch is attached to the Jira issue as well (and I see I needn't have attached it in these email messages).
(I have also emailed a copy of the completed ICLA to Joern). Cohan On Mon, Jul 20, 2015 at 1:44 PM, Tommaso Teofili <[email protected]> wrote: > Hi Cohan, > > for future reference it'd be better if you could attach the patch to the > Jira issue, I can have a look at your patch and comment later this week. > Usually we mark an issue as fixed as soon as the relevant code is committed. > > Regards, > Tommaso > > > 2015-07-18 15:36 GMT+02:00 Cohan Sujay Carlos <[email protected]>: > >> Yet another patch (ouch, yeah, like they say, "in product development, >> you're never done, till you're done!"). >> >> Reason: Error in one of the test-cases. The test-case >> DocumentCategorizerNBTest is now correct and runs fine. >> >> Cohan Sujay Carlos >> CEO, Aiaioo Labs >> +91-77605-80015 >> >> On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <[email protected]> >> wrote: >> >>> A small update to the patch (I removed a superfluous piece of code). >>> >>> In the earlier path, I had used a subclass of >>> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB >>> that was functionally identical. I removed that subclass since it >>> wasn't essential (DoccatModel does the trick just fine). >>> >>> Is there anything else I need to do? >>> >>> Is someone on the dev team going to be responsible for incorporating the >>> patch into the codebase? >>> >>> Can I mark this Jira issue fixed (for target 1.6.1?). >>> >>> Cohan Sujay Carlos >>> CEO, Aiaioo Labs >>> +91-77605-80015 >>> >>> >>> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]> >>> wrote: >>> >>>> I have gone ahead and written the test-cases and verified that the >>>> Naive Bayes Classifier works correctly. >>>> >>>> Here is the latest patch (attached) with the test-cases and everything. >>>> >>>> In implementing the Naive Bayes classifier, we tried to *ensure >>>> minimal disruption* to existing code. >>>> >>>> The *only* changes to existing code are as follows: >>>> >>>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to >>>> include a new model type: >>>> >>>> line 35: *public enum ModelType * >>>> *{Maxent,Perceptron,MaxentQn,NaiveBayes};* >>>> >>>> 2. The opennlp.tools.ml.model.GenericModelReader class has been >>>> changed in one place: >>>> >>>> line 53: >>>> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader = >>>> new NaiveBayesModelReader(this.dataReader); }* >>>> >>>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been >>>> changed in two places: >>>> >>>> line 79: >>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter >>>> = new BinaryNaiveBayesModelWriter(model,dos); }* >>>> >>>> line 91: >>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter >>>> = new PlainTextNaiveBayesModelWriter(model,bw); }* >>>> >>>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has >>>> been changed in one place to add the Naive Bayes trainer: >>>> >>>> line 51: >>>> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE, >>>> NaiveBayesTrainer.class);* >>>> >>>> That was it! >>>> >>>> We didn't change anything else in the existing OpenNLP code. >>>> >>>> All the new code for the Naive Bayesian classifier sits in the package >>>> opennlp.tools.ml.naivebayes - just above the perceptron >>>> >>>> The code for the document categorizer using the Naive Bayesian >>>> classifier can be found in opennlp.tools.doccat (we didn't have to >>>> change any existing code). The new doccat is called >>>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the >>>> maxent document categorizer, which is DocumentCategorizerME). >>>> >>>> Proof of correctness! >>>> >>>> I have included two testcases: >>>> >>>> 1. A test to validate the document categorizer - under the tests >>>> folder, you will find opennlp.tools.doccat.DocumentCategorizerNBTest - >>>> which runs the same tests that were run on the ME document categorizer, but >>>> on the Naive Bayes categorizer instead (all tests passed). >>>> >>>> 2. A test to check the mathematical correctness of the Naive Bayes >>>> implementation can be found in >>>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest. >>>> >>>> So, the inclusion of this code will minimally impact any existing code. >>>> >>>> And the code in this patch contains a multinomial Naive Bayesian >>>> classifier that is verifiably correct. >>>> >>>> Is there anything else I have to do to have this patch pulled into the >>>> OpenNLP code base (for say 1.7.0)? >>>> >>>> Cohan Sujay Carlos >>>> CEO, Aiaioo Labs >>>> +91-77605-80015 >>>> >>>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]> >>>>> wrote: >>>>> >>>>>> Tommaso, >>>>>> >>>>>> I have created the Jira issue: >>>>>> https://issues.apache.org/jira/browse/OPENNLP-777 >>>>>> >>>>>> >
