[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2016-01-07 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088015#comment-15088015
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 1/7/16 8:10 PM:


Apologies, [~teofili] ... I have attached the {{NaiveBayesCorrectnessTest}} ... 
I think the problem in the patch arose because I copied some files back into the 
Eclipse project where I created the patch.  I believe Eclipse treats copies 
between projects as delete+add.  In this case, it seems to have mysteriously 
left out the add.

I hope the attached test solves the issue.  I'll stop using two projects for my 
OpenNLP development work henceforth.


was (Author: cohan.sujay):
Apologies, [~teofili] ... I have attached the {{NaiveBayesCorrectnessTest}} ... 
I think the problem in the patch is because I copying the files back into the 
Eclipse project where I created the patch.  I believe Eclipse treats copies 
between projects as delete+add.  In this case, it seems to have left out the 
add mysteriously.

I hope the attached test solves the issue.  I'll stop using two projects for my 
OpenNLP development work henceforth.

> Naive Bayesian Classifier
> -
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
>         Attachments: NaiveBayesCorrectnessTest.java, NaiveBayesModel.java,
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> 
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> 
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and, if so, whom I could coordinate with.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India
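
For readers unfamiliar with the technique named in the issue description, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the code contributed in the patch: the class name MultinomialNaiveBayesSketch and all of its methods are assumptions made for illustration, it uses only the Java standard library, and it skips tokens never seen in training (one common convention among several).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the classifier contributed in the patch.
final class MultinomialNaiveBayesSketch {
  // label -> number of training documents carrying that label
  private final Map<String, Integer> docCounts = new HashMap<String, Integer>();
  // label -> (token -> number of occurrences in documents with that label)
  private final Map<String, Map<String, Integer>> tokenCounts =
      new HashMap<String, Map<String, Integer>>();
  // label -> total number of tokens seen under that label
  private final Map<String, Integer> totalTokens = new HashMap<String, Integer>();
  private final Set<String> vocabulary = new HashSet<String>();
  private int totalDocs = 0;

  private static void increment(Map<String, Integer> counts, String key) {
    Integer old = counts.get(key);
    counts.put(key, old == null ? 1 : old + 1);
  }

  /** Adds one labelled, already tokenized document to the counts. */
  void train(String label, String[] tokens) {
    totalDocs++;
    increment(docCounts, label);
    Map<String, Integer> counts = tokenCounts.get(label);
    if (counts == null) {
      counts = new HashMap<String, Integer>();
      tokenCounts.put(label, counts);
    }
    for (String token : tokens) {
      increment(counts, token);
      increment(totalTokens, label);
      vocabulary.add(token);
    }
  }

  /** Returns log P(label) + sum over tokens of log P(token | label). */
  double logScore(String label, String[] tokens) {
    Integer docs = docCounts.get(label);
    if (docs == null) {
      throw new IllegalArgumentException("Unknown label: " + label);
    }
    double score = Math.log(docs.doubleValue() / totalDocs);
    Map<String, Integer> counts = tokenCounts.get(label);
    Integer total = totalTokens.get(label);
    // Laplace smoothing: add 1 to every count and |vocabulary| to the total.
    double denominator = (total == null ? 0 : total) + vocabulary.size();
    for (String token : tokens) {
      if (!vocabulary.contains(token)) {
        continue; // tokens never seen in training are skipped
      }
      Integer count = counts.get(token);
      score += Math.log(((count == null ? 0 : count) + 1.0) / denominator);
    }
    return score;
  }

  /** Returns the label with the highest log score, or null if untrained. */
  String classify(String[] tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docCounts.keySet()) {
      double score = logScore(label, tokens);
      if (best == null || score > bestScore) {
        best = label;
        bestScore = score;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    MultinomialNaiveBayesSketch nb = new MultinomialNaiveBayesSketch();
    nb.train("sports", new String[] {"goal", "match", "team"});
    nb.train("politics", new String[] {"vote", "election", "party"});
    System.out.println(nb.classify(new String[] {"goal", "team"}));
  }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a token that was never seen with a given label from driving that label's probability to zero.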



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-12-28 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072948#comment-15072948
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 12/28/15 5:45 PM:
--

Attaching a patch with the formatting issues in NaiveBayesModel taken care of 
(you'll just need to check in the patch 
"naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch"; 
it is to be applied to the trunk).


was (Author: cohan.sujay):
Affixing a patch with the formatting issues in NaiveBayesModel taken care of 
(you'll just need to check the patch 
"naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch" 
in - it is to be applied to the trunk).

> Naive Bayesian Classifier
> -
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
>         Attachments:
> naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>





[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805159#comment-14805159
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 9/18/15 7:53 AM:
-

[~teofili],

I had built the NaiveBayes reader by looking at the PerceptronReader.  So, I 
rewrote your test with the Perceptron class hierarchy instead of the NaiveBayes 
class hierarchy and obtained the same error.  The reader.getModel method fails 
in exactly the same way in the PerceptronReader as well.

Here is the test code:

{code}
PerceptronModel model = (PerceptronModel) new PerceptronTrainer().trainModel(
    10, new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false), 1);

File file = new File("test_perceptron.bin");
PerceptronModelWriter modelWriter = new BinaryPerceptronModelWriter(model, file);
modelWriter.persist();

PerceptronModelReader reader = new BinaryPerceptronModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.getModel();
assertNotNull(abstractModel);
{code}

I hope that helps you with this problem.


was (Author: cohan.sujay):
Tommaso,

I had built the NaiveBayes reader by looking at the PerceptronReader.  So, I 
rewrote your test with the Perceptron class hierarchy instead of the NaiveBayes 
class hierarchy and obtained the same error.  The reader.getModel method fails 
in exactly the same way in the PerceptronReader as well.

Here is the test code:

{code}
PerceptronModel model = (PerceptronModel)new 
PerceptronTrainer().trainModel(10, new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false), 1);

File file = new File("test_perceptron.bin");
PerceptronModelWriter modelWriter = new 
BinaryPerceptronModelWriter(model, file);
modelWriter.persist();

PerceptronModelReader reader = new 
BinaryPerceptronModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.getModel();
assertNotNull(abstractModel);
{code}

I hope that helps you with this problem.

> Naive Bayesian Classifier
> -
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java,
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java,
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch,
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
> topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>

[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 9/18/15 7:55 AM:
-

Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

{code}
@Test
public void testBinaryModelPersistence() throws Exception {
  NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
      new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

  File file = new File("test.bin");
  NaiveBayesModelWriter modelWriter = new BinaryNaiveBayesModelWriter(model, file);
  modelWriter.persist();

  NaiveBayesModelReader reader = new BinaryNaiveBayesModelReader(file);
  reader.checkModelType();
  AbstractModel abstractModel = reader.constructModel();
  assertNotNull(abstractModel);
}

@Test
public void testTextModelPersistence() throws Exception {
  NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
      new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

  File file = new File("test.txt");
  NaiveBayesModelWriter modelWriter = new PlainTextNaiveBayesModelWriter(model, file);
  modelWriter.persist();

  NaiveBayesModelReader reader = new PlainTextNaiveBayesModelReader(file);
  reader.checkModelType();
  AbstractModel abstractModel = reader.constructModel();
  assertNotNull(abstractModel);
}
{code}
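
The two persistence flavours mentioned in this comment (Binary and PlainText) can be illustrated without any OpenNLP classes. The sketch below is an assumption-laden stand-in: PersistenceFlavoursSketch, its methods, the "SketchModel" type tag and the label-to-parameter map are all hypothetical, and the type-tag check is only loosely analogous in spirit to a model type check, not a description of what OpenNLP's readers and writers actually do. It shows that each flavour needs a matching writer and reader pair.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in OpenNLP.
final class PersistenceFlavoursSketch {
  private static final String MODEL_TYPE = "SketchModel";

  private static void checkType(String type) throws IOException {
    if (!MODEL_TYPE.equals(type)) {
      throw new IOException("Unexpected model type: " + type);
    }
  }

  /** Binary flavour: type tag, entry count, then (UTF key, double value) pairs. */
  static void writeBinary(Map<String, Double> params, File file) throws IOException {
    DataOutputStream out = new DataOutputStream(new FileOutputStream(file));
    try {
      out.writeUTF(MODEL_TYPE);
      out.writeInt(params.size());
      for (Map.Entry<String, Double> e : params.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeDouble(e.getValue());
      }
    } finally {
      out.close();
    }
  }

  static Map<String, Double> readBinary(File file) throws IOException {
    DataInputStream in = new DataInputStream(new FileInputStream(file));
    try {
      checkType(in.readUTF());
      int n = in.readInt();
      Map<String, Double> params = new LinkedHashMap<String, Double>();
      for (int i = 0; i < n; i++) {
        String key = in.readUTF();
        params.put(key, in.readDouble());
      }
      return params;
    } finally {
      in.close();
    }
  }

  /** Plain-text flavour: one item per line; keys must not contain tabs or newlines. */
  static void writePlainText(Map<String, Double> params, File file) throws IOException {
    BufferedWriter out =
        new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF-8"));
    try {
      out.write(MODEL_TYPE);
      out.newLine();
      out.write(Integer.toString(params.size()));
      out.newLine();
      for (Map.Entry<String, Double> e : params.entrySet()) {
        out.write(e.getKey() + "\t" + e.getValue());
        out.newLine();
      }
    } finally {
      out.close();
    }
  }

  static Map<String, Double> readPlainText(File file) throws IOException {
    BufferedReader in =
        new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
    try {
      checkType(in.readLine());
      int n = Integer.parseInt(in.readLine());
      Map<String, Double> params = new LinkedHashMap<String, Double>();
      for (int i = 0; i < n; i++) {
        String[] parts = in.readLine().split("\t");
        params.put(parts[0], Double.parseDouble(parts[1]));
      }
      return params;
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, Double> params = new LinkedHashMap<String, Double>();
    params.put("sports", -0.5);
    File bin = File.createTempFile("sketch", ".bin");
    File txt = File.createTempFile("sketch", ".txt");
    writeBinary(params, bin);
    writePlainText(params, txt);
    System.out.println(params.equals(readBinary(bin)) && params.equals(readPlainText(txt)));
  }
}
```

Reading a file with the reader of the other flavour would fail the type check or the parse, which is the general reason a model has to be read back with the reader that matches the writer used to persist it.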


was (Author: cohan.sujay):
Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

@Test

  public void testBinaryModelPersistence() throws Exception {

NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(

NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");

NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);

modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);

reader.checkModelType();

AbstractModel abstractModel = reader.constructModel();

assertNotNull(abstractModel);

  }

  @Test

  public void testTextModelPersistence() throws Exception {

NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(

NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");

NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);

modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);

reader.checkModelType();

AbstractModel abstractModel = reader.constructModel();

assertNotNull(abstractModel);

  }



[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-17 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791808#comment-14791808
 ] 

Tommaso Teofili edited comment on OPENNLP-777 at 9/17/15 8:58 AM:
--

Thanks, [~cohan.sujay], for the help on the unit test. I'll have a look at why 
getModel doesn't work.


was (Author: teofili):
thanks @Cohan .sujay] for the help on the unit test, I'll have a look at why 
getModel doesn't work.



[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-15 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 9/15/15 7:38 PM:
-

Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.


@Test
  public void testBinaryModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");
NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");
NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }



was (Author: cohan.sujay):
Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

@Test
  public void testBinaryModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");
NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");
NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }



[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-15 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 9/15/15 7:40 PM:
-

Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

@Test

  public void testBinaryModelPersistence() throws Exception {

NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(

NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");

NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);

modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);

reader.checkModelType();

AbstractModel abstractModel = reader.constructModel();

assertNotNull(abstractModel);

  }

  @Test

  public void testTextModelPersistence() throws Exception {

NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(

NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");

NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);

modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);

reader.checkModelType();

AbstractModel abstractModel = reader.constructModel();

assertNotNull(abstractModel);

  }



was (Author: cohan.sujay):
Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

@Test
  public void testBinaryModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");
NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");
NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }



[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-09-15 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746005#comment-14746005
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 9/15/15 7:39 PM:
-

Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.

@Test
  public void testBinaryModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.bin");
NaiveBayesModelWriter modelWriter = new 
BinaryNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
BinaryNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
NaiveBayesModel model = (NaiveBayesModel)new 
NaiveBayesTrainer().trainModel(new TwoPassDataIndexer(
NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

File file = new File("test.txt");
NaiveBayesModelWriter modelWriter = new 
PlainTextNaiveBayesModelWriter(model, file);
modelWriter.persist();

NaiveBayesModelReader reader = new 
PlainTextNaiveBayesModelReader(file);
reader.checkModelType();
AbstractModel abstractModel = reader.constructModel();
assertNotNull(abstractModel);
  }



was (Author: cohan.sujay):
Tommaso,

The problem in the above testcases seems to be in the use of the 
GenericModelWriter.

Each of the machine learning algorithms has its own set of ModelWriter and 
ModelReader classes which must be used to persist their models.

The Writers come in one of 2 flavours - Binary and PlainText.

So, the following testcases work for me (one thing that baffled me was that I 
had to use constructModel rather than getModel to make these testcases work).

I hope that answers your question.


  @Test
  public void testBinaryModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    File file = new File("test.bin");
    NaiveBayesModelWriter modelWriter = new BinaryNaiveBayesModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new BinaryNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    File file = new File("test.txt");
    NaiveBayesModelWriter modelWriter = new PlainTextNaiveBayesModelWriter(model, file);
    modelWriter.persist();

    NaiveBayesModelReader reader = new PlainTextNaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.constructModel();
    assertNotNull(abstractModel);
  }


> Naive Bayesian Classifier
> -
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Machine Learning
> Environment: J2SE 1.5 and above
>Reporter: Cohan Sujay Carlos
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> 

[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-07-18 Thread Cohan Sujay Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632418#comment-14632418
 ] 

Cohan Sujay Carlos edited comment on OPENNLP-777 at 7/18/15 1:24 PM:
-

I've attached files demonstrating how the Naive Bayesian document categorizer 
(DocumentCategorizerNB) may be trained and used for document classification.

These Java files (D1TopicClassifierTrainingDemo and D1TopicClassifierUsageDemo) 
are meant to be used with the training data file (topics.train) that you will 
also find in the attachments.

When training said categorizer, place 'topics.train' in a 'corpora/topics' 
directory under the directory where you are running this code.

The model will be created in the sub-folder 'models' as 'topics_nb.bin' (make 
sure you have a folder by that name under your current directory).

D1TopicClassifierUsageDemo will use that model file to classify some documents.
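
The directory layout described above can be checked before running the demos. The following is only a minimal sketch of that check, not part of the attached files: the class name TopicsDemoLayout and its two helper methods are hypothetical, and it assumes Java 7 or later for java.nio.file (the patch itself targets Java 1.5). It does not call any OpenNLP API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: verifies the layout the demos expect.
public class TopicsDemoLayout {

  /** Expected location of the training data under the working directory. */
  public static Path trainingFile(Path workDir) {
    return workDir.resolve("corpora").resolve("topics").resolve("topics.train");
  }

  /** Creates the 'models' folder if it is missing and returns the model file path. */
  public static Path prepareModelFile(Path workDir) throws IOException {
    Path models = Files.createDirectories(workDir.resolve("models"));
    return models.resolve("topics_nb.bin");
  }

  public static void main(String[] args) throws IOException {
    // The demos resolve paths relative to the directory they are run from.
    Path workDir = Paths.get("").toAbsolutePath();

    Path train = trainingFile(workDir);
    if (!Files.isRegularFile(train)) {
      System.err.println("Missing training data: " + train);
      return;
    }

    Path model = prepareModelFile(workDir);
    System.out.println("Training data: " + train);
    System.out.println("Model will be written to: " + model);
  }
}
```

Running this from the same directory as the demos should either report the missing 'topics.train' or create the 'models' folder so that the training demo has somewhere to write 'topics_nb.bin'.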


was (Author: cohan.sujay):
Files demonstrating how the Naive Bayesian document categorizer 
(DocumentCategorizerNB) may be trained and used for document classification.  
These Java files are meant to be used with the training data file 
(topics.train) that you will also find in the attachments.

When training said categorizer, place 'topics.train' in a 'corpora/topics' 
directory under the directory where you are running this code.  The model will 
be created in the sub-folder 'models' (make sure you have a folder by that name 
under your current directory).

 Naive Bayesian Classifier
 -

 Key: OPENNLP-777
 URL: https://issues.apache.org/jira/browse/OPENNLP-777
 Project: OpenNLP
  Issue Type: New Feature
  Components: Machine Learning
 Environment: J2SE 1.5 and above
Reporter: Cohan Sujay Carlos
Priority: Minor
  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
 naive
 Attachments: D1TopicClassifierTrainingDemoNB.java, 
 D1TopicClassifierUsageDemoNB.java, 
 naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
 topics.train

   Original Estimate: 504h
  Remaining Estimate: 504h

 I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
 lacks one at present).
 Implementation details:  We have a production-hardened piece of Java code for 
 a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
 we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
 an adapter to make the interface compatible with the ME classifier in 
 OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
 Below is the email trail of a discussion in the dev mailing list around this 
 dated May 19th, 2015.
 snip
 Tommaso Teofili via opennlp.apache.org 
 to dev 
 Hi Cohan,
 I think that'd be a very valuable contribution, as NB is one of the
 foundation algorithms, often used as basis for comparisons.
 It would be good if you could create a Jira issue and provide more details
 about the implementation and, eventually, a patch.
 Thanks and regards,
 Tommaso
 /snip
 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
  I have a question for the OpenNLP project team.
 
  I was wondering if there is a Naive Bayesian classifier implementation in
  OpenNLP that I've not come across, or if there are plans to implement one.
 
  If it is the latter, I should love to contribute an implementation.
 
  There is an ME classifier already available in OpenNLP, of course, but I
  felt that there was an unmet need for a Naive Bayesian (NB) classifier
  implementation to be offered as well.
 
  An NB classifier could be bootstrapped with partially labelled training
  data, as explained in the 2000 paper by Nigam, McCallum, et al., "Text
  Classification from Labeled and Unlabeled Documents using EM".
 
  So, if there isn't an NB code base out there already, I'd be happy to
  contribute a very solid implementation that we've used in production for a
  good 5 years.
 
  I'd have to adapt it to load the same training data format as the ME
  classifier, but I guess that shouldn't be very difficult to do.
 
  I was wondering if there was some interest in adding an NB implementation,
  and I'd love to know whom I could coordinate with if there is.
 
  Cohan Sujay Carlos
  CEO, Aiaioo Labs, India
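
For readers unfamiliar with the technique named in the issue description, a multinomial Naive Bayes classifier with Laplace (add-one) smoothing can be sketched roughly as below. This is an illustrative toy, not the contributed patch and not OpenNLP's API: the class TinyMultinomialNB and all of its methods are hypothetical, and it uses Java 8 features for brevity even though the patch targets Java 1.5.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy classifier for illustration only.
public class TinyMultinomialNB {
  private final Map<String, Integer> docsPerLabel = new HashMap<>();
  private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
  private final Map<String, Integer> tokensPerLabel = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  /** Adds one labelled document (a list of tokens) to the counts. */
  public void train(String label, List<String> tokens) {
    totalDocs++;
    docsPerLabel.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
      tokensPerLabel.merge(label, 1, Integer::sum);
      vocabulary.add(token);
    }
  }

  /** log P(label) + sum of log P(token | label); the label must have been seen in training. */
  public double logScore(String label, List<String> tokens) {
    double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
    Map<String, Integer> counts = wordCounts.get(label);
    int labelTokens = tokensPerLabel.getOrDefault(label, 0);
    for (String token : tokens) {
      int count = counts.getOrDefault(token, 0);
      // Laplace smoothing: add 1 to every count, and |V| to the denominator.
      score += Math.log((count + 1.0) / (labelTokens + vocabulary.size()));
    }
    return score;
  }

  /** Returns the label with the highest log score, or null if nothing was trained. */
  public String classify(List<String> tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docsPerLabel.keySet()) {
      double score = logScore(label, tokens);
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    TinyMultinomialNB nb = new TinyMultinomialNB();
    nb.train("sports", Arrays.asList("ball", "goal", "team"));
    nb.train("sports", Arrays.asList("team", "win", "goal"));
    nb.train("politics", Arrays.asList("vote", "election", "party"));
    nb.train("politics", Arrays.asList("party", "vote", "law"));
    System.out.println(nb.classify(Arrays.asList("goal", "team")));
  }
}
```

Working in log space avoids numeric underflow on long documents, and the add-one term keeps a word unseen for a label from driving that label's probability to zero.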



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (OPENNLP-777) Naive Bayesian Classifier

2015-05-19 Thread Haider Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551168#comment-14551168
 ] 

Haider Ali edited comment on OPENNLP-777 at 5/19/15 8:33 PM:
-

I also want to contribute to the Naive Bayesian Classifier.


was (Author: haider.ali):
i also wan to contribute to Naive Bayesian Classifier 

 Naive Bayesian Classifier
 -

 Key: OPENNLP-777
 URL: https://issues.apache.org/jira/browse/OPENNLP-777
 Project: OpenNLP
  Issue Type: New Feature
  Components: Machine Learning
Affects Versions: 1.6.0
 Environment: J2SE 1.5 and above
Reporter: Cohan Sujay Carlos
Priority: Minor
  Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
 naive
 Fix For: 1.6.0

   Original Estimate: 504h
  Remaining Estimate: 504h



