Results produced by Complementary Bayes Classifier seem odd
-----------------------------------------------------------
Key: MAHOUT-562
URL: https://issues.apache.org/jira/browse/MAHOUT-562
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.4
Reporter: Oleg Kalnichevski
The 20newsgroups example produces the expected results (95% correctness rate)
when using the Naive Bayes algorithm. After switching the algorithm to
Complementary Bayes, with all other parameters unchanged, the rate of
correctly classified documents drops to 5%. This seems odd to me.

I admit I know next to nothing about Bayes' theorem, so my expectations may
be totally off.
---
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Loading model from: {basePath=/home/oleg/data/mahout/20news-bayes-model, classifierType=cbayes, alpha_i=1, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=/home/oleg/data/mahout/20news-bayes-train-input}
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Testing Complementary Bayes Classifier
...
INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances   :    578    5.1087%
Incorrectly Classified Instances :  10736   94.8913%
Total Classified Instances       :  11314
=======================================================
Confusion Matrix
-------------------------------------------------------
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
0   0   0   0   0   0   0   0   0   0   0   0   0   597 0   0   0   0   0   0   |  597  a = rec.sport.baseball
0   0   0   0   0   0   0   0   0   0   0   0   0   595 0   0   0   0   0   0   |  595  b = sci.crypt
0   0   0   0   0   0   0   0   0   0   0   0   0   600 0   0   0   0   0   0   |  600  c = rec.sport.hockey
0   0   0   0   0   0   0   0   0   0   0   0   0   546 0   0   0   0   0   0   |  546  d = talk.politics.guns
0   0   0   0   0   0   0   0   0   0   0   0   0   599 0   0   0   0   0   0   |  599  e = soc.religion.christian
0   0   0   0   0   0   0   0   0   0   0   0   0   591 0   0   0   0   0   0   |  591  f = sci.electronics
0   0   0   0   0   0   0   0   0   0   0   0   0   591 0   0   0   0   0   0   |  591  g = comp.os.ms-windows.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   585 0   0   0   0   0   0   |  585  h = misc.forsale
0   0   0   0   0   0   0   0   0   0   0   0   0   377 0   0   0   0   0   0   |  377  i = talk.religion.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   480 0   0   0   0   0   0   |  480  j = alt.atheism
0   0   0   0   0   0   0   0   0   0   0   0   0   593 0   0   0   0   0   0   |  593  k = comp.windows.x
0   0   0   0   0   0   0   0   0   0   0   0   0   564 0   0   0   0   0   0   |  564  l = talk.politics.mideast
0   0   0   0   0   0   0   0   0   0   0   0   0   590 0   0   0   0   0   0   |  590  m = comp.sys.ibm.pc.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   578 0   0   0   0   0   0   |  578  n = comp.sys.mac.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   593 0   0   0   0   0   0   |  593  o = sci.space
0   0   0   0   0   0   0   0   0   0   0   0   0   598 0   0   0   0   0   0   |  598  p = rec.motorcycles
0   0   0   0   0   0   0   0   0   0   0   0   0   594 0   0   0   0   0   0   |  594  q = rec.autos
0   0   0   0   0   0   0   0   0   0   0   0   0   584 0   0   0   0   0   0   |  584  r = comp.graphics
0   0   0   0   0   0   0   0   0   0   0   0   0   465 0   0   0   0   0   0   |  465  s = talk.politics.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   594 0   0   0   0   0   0   |  594  t = sci.med
Default Category: unknown: 20
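
As a sanity check on the numbers above, the sketch below recomputes the
summary from the row totals of the confusion matrix. It is not part of Mahout
and not taken from the original report; the variable names are made up for
illustration. The only thing it relies on is what the matrix itself shows:
every document was assigned to column n (comp.sys.mac.hardware), so the only
correct classifications are the documents that really belong to that
category.

```python
# Row totals copied from the confusion matrix in this report,
# keyed by the column letter of each category.
row_totals = {
    "a": 597, "b": 595, "c": 600, "d": 546, "e": 599,
    "f": 591, "g": 591, "h": 585, "i": 377, "j": 480,
    "k": 593, "l": 564, "m": 590, "n": 578, "o": 593,
    "p": 598, "q": 594, "r": 584, "s": 465, "t": 594,
}

# Reading of the matrix (not a statement about Mahout internals):
# all non-zero counts sit in column "n", so "n" is the only predicted label.
predicted = "n"

total = sum(row_totals.values())
correct = row_totals[predicted]
incorrect = total - correct
accuracy = 100.0 * correct / total

print(f"total={total} correct={correct} incorrect={incorrect}")
print(f"accuracy={accuracy:.4f}%")
```

The totals agree with the summary block (11314 documents, 578 correct, 10736
incorrect, 5.1087%), so the 5% figure is just the share of
comp.sys.mac.hardware documents in the test set, not twenty classes each
being classified poorly.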