A small update to the patch (I removed a superfluous piece of code).

In the earlier patch, I had used a subclass of
opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB
that was functionally identical to it.  I removed that subclass since it
wasn't essential (DoccatModel does the trick just fine).

Is there anything else I need to do?

Is someone on the dev team going to be responsible for incorporating the
patch into the codebase?

Can I mark this Jira issue as fixed (with target version 1.6.1)?

Cohan Sujay Carlos
CEO, Aiaioo Labs
+91-77605-80015


On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]>
wrote:

> I have gone ahead and written the test-cases and verified that the Naive
> Bayes Classifier works correctly.
>
> Here is the latest patch (attached) with the test-cases and everything.
>
> In implementing the Naive Bayes classifier, we tried to *ensure minimal
> disruption* to existing code.
>
> The *only* changes to existing code are as follows:
>
> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to
> include a new model type:
>
> line 35:
> public enum ModelType {Maxent, Perceptron, MaxentQn, NaiveBayes};
>
> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed
> in one place:
>
> line 53:
> else if (modelType.equals("NaiveBayes")) {
>   delegateModelReader = new NaiveBayesModelReader(this.dataReader);
> }
>
> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed
> in two places:
>
> line 79:
> if (model.getModelType() == ModelType.NaiveBayes) {
>   delegateWriter = new BinaryNaiveBayesModelWriter(model, dos);
> }
>
> line 91:
> if (model.getModelType() == ModelType.NaiveBayes) {
>   delegateWriter = new PlainTextNaiveBayesModelWriter(model, bw);
> }
>
> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has been
> changed in one place to add the Naive Bayes trainer:
>
> line 51:
> _trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE,
>     NaiveBayesTrainer.class);
>
> That was it!
>
> We didn't change anything else in the existing OpenNLP code.
>
> All the new code for the Naive Bayesian classifier sits in the package
> opennlp.tools.ml.naivebayes, just above the perceptron package.
>
> The code for the document categorizer using the Naive Bayesian classifier
> can be found in opennlp.tools.doccat (we didn't have to change any
> existing code). The new doccat is called
> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the
> maxent document categorizer, which is DocumentCategorizerME).
>
> Proof of correctness!
>
> I have included two test cases:
>
> 1. A test to validate the document categorizer - under the tests folder,
> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which runs
> the same tests that were run on the ME document categorizer, but on the
> Naive Bayes categorizer instead (all tests passed).
>
> 2. A test to check the mathematical correctness of the Naive Bayes
> implementation can be found in
> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
>
> So, the inclusion of this code will minimally impact any existing code.
>
> And the code in this patch contains a multinomial Naive Bayesian
> classifier that is verifiably correct.
>
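For readers who want to see what "multinomial Naive Bayes" means concretely, here is a minimal, self-contained sketch. It is not the code in the patch and does not use any OpenNLP classes: the class name NaiveBayesSketch and its methods are hypothetical, and add-one (Laplace) smoothing is an assumption made for illustration, since this thread does not say which smoothing the patch uses. Each class is scored as log P(class) plus the sum of log P(token | class) over the document's tokens.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy multinomial Naive Bayes (illustration only, not the patch's code). */
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Records one labelled training document given as a token array. */
    void train(String label, String[] tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Log of P(label) * product of P(token | label); label must have been seen in training. */
    double logScore(String label, String[] tokens) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int total = tokensPerClass.getOrDefault(label, 0);
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            // Add-one smoothing (an assumption): unseen tokens get a small nonzero probability.
            score += Math.log((count + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    /** Returns the label with the highest log score, or null if nothing was trained. */
    String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("spam", new String[] {"cheap", "pills", "cheap"});
        nb.train("ham", new String[] {"meeting", "tomorrow"});
        System.out.println(nb.classify(new String[] {"cheap"}));
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is the usual reason for summing logs rather than multiplying raw probabilities.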
> Is there anything else I have to do to have this patch pulled into the
> OpenNLP code base (for say 1.7.0)?
>
> Cohan Sujay Carlos
> CEO, Aiaioo Labs
> +91-77605-80015
>
>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]>
>> wrote:
>>
>>> Tommaso,
>>>
>>> I have created the Jira issue:
>>> https://issues.apache.org/jira/browse/OPENNLP-777
>>>
>>>
