Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

Tommaso Teofili Mon, 20 Jul 2015 01:15:52 -0700

Hi Cohan,

for future reference it'd be better if you could attach the patch to the
Jira issue, I can have a look at your patch and comment later this week.
Usually we mark an issue as fixed as soon as the relevant code is committed.


Regards,
Tommaso


2015-07-18 15:36 GMT+02:00 Cohan Sujay Carlos <[email protected]>:

> Yet another patch (ouch, yeah, like they say, "in product development,
> you're never done, till you're done!").
>
> Reason: Error in one of the test-cases.  The test-case
> DocumentCategorizerNBTest is now correct and runs fine.
>
> Cohan Sujay Carlos
> CEO, Aiaioo Labs
> +91-77605-80015
>
> On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <[email protected]>
> wrote:
>
>> A small update to the patch (I removed a superfluous piece of code).
>>
>> In the earlier path, I had used a subclass of
>> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB
>> that was functionally identical.  I removed that subclass since it
>> wasn't essential (DoccatModel does the trick just fine).
>>
>> Is there anything else I need to do?
>>
>> Is someone on the dev team going to be responsible for incorporating the
>> patch into the codebase?
>>
>> Can I mark this Jira issue fixed (for target 1.6.1?).
>>
>> Cohan Sujay Carlos
>> CEO, Aiaioo Labs
>> +91-77605-80015
>>
>>
>> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <[email protected]>
>> wrote:
>>
>>> I have gone ahead and written the test-cases and verified that the Naive
>>> Bayes Classifier works correctly.
>>>
>>> Here is the latest patch (attached) with the test-cases and everything.
>>>
>>> In implementing the Naive Bayes classifier, we tried to *ensure minimal
>>> disruption* to existing code.
>>>
>>> The *only* changes to existing code are as follows:
>>>
>>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to
>>> include a new model type:
>>>
>>> line 35: *public enum ModelType *
>>> *{Maxent,Perceptron,MaxentQn,NaiveBayes};*
>>>
>>> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed
>>> in one place:
>>>
>>> line 53:
>>> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader = new
>>> NaiveBayesModelReader(this.dataReader); }*
>>>
>>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed
>>> in two places:
>>>
>>> line 79:
>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter =
>>> new BinaryNaiveBayesModelWriter(model,dos); }*
>>>
>>> line 91:
>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter =
>>> new PlainTextNaiveBayesModelWriter(model,bw); }*
>>>
>>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has
>>> been changed in one place to add the Naive Bayes trainer:
>>>
>>> line 51:
>>> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE,
>>> NaiveBayesTrainer.class);*
>>>
>>> That was it!
>>>
>>> We didn't change anything else in the existing OpenNLP code.
>>>
>>> All the new code for the Naive Bayesian classifier sits in the package
>>> opennlp.tools.ml.naivebayes - just above the perceptron
>>>
>>> The code for the document categorizer using the Naive Bayesian
>>> classifier can be found in opennlp.tools.doccat (we didn't have to
>>> change any existing code). The new doccat is called
>>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the
>>> maxent document categorizer, which is DocumentCategorizerME).
>>>
>>> Proof of correctness!
>>>
>>> I have included two testcases:
>>>
>>> 1. A test to validate the document categorizer - under the tests folder,
>>> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which
>>> runs the same tests that were run on the ME document categorizer, but on
>>> the Naive Bayes categorizer instead (all tests passed).
>>>
>>> 2. A test to check the mathematical correctness of the Naive Bayes
>>> implementation can be found in
>>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
>>>
>>> So, the inclusion of this code will minimally impact any existing code.
>>>
>>> And the code in this patch contains a multinomial Naive Bayesian
>>> classifier that is verifiably correct.
>>>
>>> Is there anything else I have to do to have this patch pulled into the
>>> OpenNLP code base (for say 1.7.0)?
>>>
>>> Cohan Sujay Carlos
>>> CEO, Aiaioo Labs
>>> +91-77605-80015
>>>
>>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <[email protected]>
>>>> wrote:
>>>>
>>>>> Tommaso,
>>>>>
>>>>> I have created the Jira issue:
>>>>> https://issues.apache.org/jira/browse/OPENNLP-777
>>>>>
>>>>>

Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

Reply via email to