Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

Cohan Sujay Carlos Sat, 18 Jul 2015 06:37:12 -0700

Yet another patch (ouch, yeah, like they say, "in product development,
you're never done, till you're done!").


Reason: Error in one of the test-cases.  The test-case
DocumentCategorizerNBTest is now correct and runs fine.

Cohan Sujay Carlos
CEO, Aiaioo Labs
+91-77605-80015

On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <[email protected]>
wrote:

> A small update to the patch (I removed a superfluous piece of code).
>
> In the earlier path, I had used a subclass of
> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB
> that was functionally identical.  I removed that subclass since it wasn't
> essential (DoccatModel does the trick just fine).
>
> Is there anything else I need to do?
>
> Is someone on the dev team going to be responsible for incorporating the
> patch into the codebase?
>
> Can I mark this Jira issue fixed (for target 1.6.1?).
>
> Cohan Sujay Carlos
> CEO, Aiaioo Labs
> +91-77605-80015
>
>
> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]>
> wrote:
>
>> I have gone ahead and written the test-cases and verified that the Naive
>> Bayes Classifier works correctly.
>>
>> Here is the latest patch (attached) with the test-cases and everything.
>>
>> In implementing the Naive Bayes classifier, we tried to *ensure minimal
>> disruption* to existing code.
>>
>> The *only* changes to existing code are as follows:
>>
>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to
>> include a new model type:
>>
>> line 35: *public enum ModelType *
>> *{Maxent,Perceptron,MaxentQn,NaiveBayes};*
>>
>> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed
>> in one place:
>>
>> line 53:
>> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader = new
>> NaiveBayesModelReader(this.dataReader); }*
>>
>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed
>> in two places:
>>
>> line 79:
>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter =
>> new BinaryNaiveBayesModelWriter(model,dos); }*
>>
>> line 91:
>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter =
>> new PlainTextNaiveBayesModelWriter(model,bw); }*
>>
>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has been
>> changed in one place to add the Naive Bayes trainer:
>>
>> line 51:
>> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE,
>> NaiveBayesTrainer.class);*
>>
>> That was it!
>>
>> We didn't change anything else in the existing OpenNLP code.
>>
>> All the new code for the Naive Bayesian classifier sits in the package
>> opennlp.tools.ml.naivebayes - just above the perceptron
>>
>> The code for the document categorizer using the Naive Bayesian classifier
>> can be found in opennlp.tools.doccat (we didn't have to change any
>> existing code). The new doccat is called
>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the
>> maxent document categorizer, which is DocumentCategorizerME).
>>
>> Proof of correctness!
>>
>> I have included two testcases:
>>
>> 1. A test to validate the document categorizer - under the tests folder,
>> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which
>> runs the same tests that were run on the ME document categorizer, but on
>> the Naive Bayes categorizer instead (all tests passed).
>>
>> 2. A test to check the mathematical correctness of the Naive Bayes
>> implementation can be found in
>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
>>
>> So, the inclusion of this code will minimally impact any existing code.
>>
>> And the code in this patch contains a multinomial Naive Bayesian
>> classifier that is verifiably correct.
>>
>> Is there anything else I have to do to have this patch pulled into the
>> OpenNLP code base (for say 1.7.0)?
>>
>> Cohan Sujay Carlos
>> CEO, Aiaioo Labs
>> +91-77605-80015
>>
>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]>
>>> wrote:
>>>
>>>> Tommaso,
>>>>
>>>> I have created the Jira issue:
>>>> https://issues.apache.org/jira/browse/OPENNLP-777
>>>>
>>>>

Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

Reply via email to