Results produced by Complementary Bayes Classifier seem odd
-----------------------------------------------------------

                 Key: MAHOUT-562
                 URL: https://issues.apache.org/jira/browse/MAHOUT-562
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.4
            Reporter: Oleg Kalnichevski


The 20newsgroups example produces the expected results (about 95% of documents
classified correctly) with the Naive Bayes algorithm. Switching the algorithm to
Complementary Bayes, with all other parameters unchanged, drops the rate of
correctly classified documents to about 5%. The confusion matrix below shows
that every document is assigned to a single category (comp.sys.mac.hardware).
This seems odd to me.

I admit I know next to nothing about Bayes' theorem, so my expectations may be
totally off.

---
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Loading model from: {basePath=/home/oleg/data/mahout/20news-bayes-model, classifierType=cbayes, alpha_i=1, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=/home/oleg/data/mahout/20news-bayes-train-input}
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Testing Complementary Bayes Classifier
...
INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :        578        5.1087%
Incorrectly Classified Instances        :      10736       94.8913%
Total Classified Instances              :      11314

=======================================================
Confusion Matrix
-------------------------------------------------------
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
0   0   0   0   0   0   0   0   0   0   0   0   0   597 0   0   0   0   0   0   |  597  a = rec.sport.baseball
0   0   0   0   0   0   0   0   0   0   0   0   0   595 0   0   0   0   0   0   |  595  b = sci.crypt
0   0   0   0   0   0   0   0   0   0   0   0   0   600 0   0   0   0   0   0   |  600  c = rec.sport.hockey
0   0   0   0   0   0   0   0   0   0   0   0   0   546 0   0   0   0   0   0   |  546  d = talk.politics.guns
0   0   0   0   0   0   0   0   0   0   0   0   0   599 0   0   0   0   0   0   |  599  e = soc.religion.christian
0   0   0   0   0   0   0   0   0   0   0   0   0   591 0   0   0   0   0   0   |  591  f = sci.electronics
0   0   0   0   0   0   0   0   0   0   0   0   0   591 0   0   0   0   0   0   |  591  g = comp.os.ms-windows.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   585 0   0   0   0   0   0   |  585  h = misc.forsale
0   0   0   0   0   0   0   0   0   0   0   0   0   377 0   0   0   0   0   0   |  377  i = talk.religion.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   480 0   0   0   0   0   0   |  480  j = alt.atheism
0   0   0   0   0   0   0   0   0   0   0   0   0   593 0   0   0   0   0   0   |  593  k = comp.windows.x
0   0   0   0   0   0   0   0   0   0   0   0   0   564 0   0   0   0   0   0   |  564  l = talk.politics.mideast
0   0   0   0   0   0   0   0   0   0   0   0   0   590 0   0   0   0   0   0   |  590  m = comp.sys.ibm.pc.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   578 0   0   0   0   0   0   |  578  n = comp.sys.mac.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   593 0   0   0   0   0   0   |  593  o = sci.space
0   0   0   0   0   0   0   0   0   0   0   0   0   598 0   0   0   0   0   0   |  598  p = rec.motorcycles
0   0   0   0   0   0   0   0   0   0   0   0   0   594 0   0   0   0   0   0   |  594  q = rec.autos
0   0   0   0   0   0   0   0   0   0   0   0   0   584 0   0   0   0   0   0   |  584  r = comp.graphics
0   0   0   0   0   0   0   0   0   0   0   0   0   465 0   0   0   0   0   0   |  465  s = talk.politics.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   594 0   0   0   0   0   0   |  594  t = sci.med
Default Category: unknown: 20
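
As a sanity check on the summary figures, here is a minimal Python sketch that recomputes the accuracy from the per-class row totals in the confusion matrix. It is not part of Mahout. The names `class_totals`, `predicted_label`, and `accuracy_when_all_predicted_as` are made up for illustration, and the only assumption is the one visible in the matrix: every document is classified as column n (comp.sys.mac.hardware).

```python
# Row totals copied from the confusion matrix (documents per true category).
class_totals = {
    "rec.sport.baseball": 597,
    "sci.crypt": 595,
    "rec.sport.hockey": 600,
    "talk.politics.guns": 546,
    "soc.religion.christian": 599,
    "sci.electronics": 591,
    "comp.os.ms-windows.misc": 591,
    "misc.forsale": 585,
    "talk.religion.misc": 377,
    "alt.atheism": 480,
    "comp.windows.x": 593,
    "talk.politics.mideast": 564,
    "comp.sys.ibm.pc.hardware": 590,
    "comp.sys.mac.hardware": 578,
    "sci.space": 593,
    "rec.motorcycles": 598,
    "rec.autos": 594,
    "comp.graphics": 584,
    "talk.politics.misc": 465,
    "sci.med": 594,
}

# Assumption read off the matrix: all predictions fall into this one category.
predicted_label = "comp.sys.mac.hardware"


def accuracy_when_all_predicted_as(label, totals):
    """Fraction classified correctly if every document is assigned `label`."""
    return totals[label] / sum(totals.values())


total_docs = sum(class_totals.values())
correct = class_totals[predicted_label]
accuracy = accuracy_when_all_predicted_as(predicted_label, class_totals)

# Prints: 578 / 11314 = 5.1087%
print(f"{correct} / {total_docs} = {accuracy:.4%}")
```

The result matches the "Correctly Classified Instances" line. With 20 categories of roughly equal size, always predicting one category gives about 1/20 = 5%, so the reported rate is consistent with the classifier ignoring its input rather than with a merely weaker model.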

