[ 
https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622040#action_12622040
 ] 

robinanil edited comment on MAHOUT-60 at 8/12/08 5:11 PM:
-----------------------------------------------------------

I have merged the BayesClassifier and CBayesClassifier. Both now share some 
common Map-Reduce operations, and the classifier-specific Map-Reduce operations 
are factored out. 
The Model is also factored out. 

The new feature in this patch is an n-gram generator, enabled with the CLI parameter  
-ng <gram-size>
If a model is built using 3-grams, you can classify using 1-, 2-, or 3-grams.

Try increasing the n-gram size and see how the classification accuracy grows with it.
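
To illustrate what an n-gram generator of this kind might produce, here is a minimal sketch. It is not the code from the patch: the class name NGramSketch, the single-space separator, and the choice to emit every gram size from 1 up to <gram-size> are all assumptions, the last one inferred from the note that a 3-gram model can classify with 1/2/3 grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a class from the patch.
class NGramSketch {

    // Returns every n-gram of size 1..maxGramSize over the tokens, ordered by
    // start position and then by size. Tokens are joined with a single space
    // (an assumption; the patch may use a different separator).
    static List<String> ngrams(List<String> tokens, int maxGramSize) {
        if (maxGramSize < 1) {
            throw new IllegalArgumentException("maxGramSize must be >= 1");
        }
        List<String> grams = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder gram = new StringBuilder();
            // Grow the gram one token at a time, emitting each intermediate size.
            for (int size = 1; size <= maxGramSize && start + size <= tokens.size(); size++) {
                if (size > 1) {
                    gram.append(' ');
                }
                gram.append(tokens.get(start + size - 1));
                grams.add(gram.toString());
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        // With maxGramSize = 3 this prints the 1-, 2- and 3-grams of the sentence.
        System.out.println(ngrams(tokens, 3));
    }
}
```

Under these assumptions, a larger <gram-size> only adds longer grams on top of the shorter ones, which is why the feature set grows quickly as the size increases.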

cbayes.TestTwentyNewsgroups is renamed to bayes.TestClassifier
cbayes.TrainTwentyNewsgroups is renamed to bayes.TrainClassifier

The existing tests will fail with this patch applied, so don't worry. New tests 
will be put up shortly.

 {noformat} 
# To train a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar \
  org.apache.mahout.examples.classifiers.bayes.TrainClassifier \
  -t -i newstrain -o newsmodel -ng 3 -type bayes

# To test a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar \
  org.apache.mahout.examples.classifiers.bayes.TestClassifier \
  -p newsmodel -t work/newstest -ng 3 -type bayes

# To train a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar \
  org.apache.mahout.examples.classifiers.bayes.TrainClassifier \
  -t -i newstrain -o newsmodel -ng 2 -type cbayes

# To test a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar \
  org.apache.mahout.examples.classifiers.bayes.TestClassifier \
  -p newsmodel -t work/newstest -ng 2 -type cbayes
 {noformat} 

Hope you will enjoy using this patch.

  
> Complementary Naive Bayes
> -------------------------
>
>                 Key: MAHOUT-60
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification
>            Reporter: Robin Anil
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: country.txt, MAHOUT-60-13082008.patch, MAHOUT-60.patch, 
> MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper 
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
