http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
------- Additional Comments From [EMAIL PROTECTED] 2007-10-20 07:19 -------

ok, some more tests.... trying the (crazy) K3=20 with the Bayes chain rule
combiner:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (100.000%) ..........|.......................................................
0.000 (  1.047%) ##########|#
0.200 (  0.110%) #         |
0.320 (  0.055%) #         |
0.680 (  0.055%) #         |
0.800 (  0.055%) #         |
0.840 (  0.055%) #         |
0.880 (  0.165%) ##        |
0.920 (  0.110%) #         |
0.960 ( 98.346%) ##########|#######################################################

let's try the naive Bayes combiner, K3 = 0.8:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.120 ( 32.439%) ..........|......................................................
0.160 ( 32.844%) ..........|.......................................................
0.160 (  0.055%) #         |
0.200 ( 27.379%) ..........|..............................................
0.240 (  6.275%) ..........|...........
0.280 (  0.658%) ..........|.
0.280 (  0.331%) ######    |
0.320 (  0.152%) .....     |
0.320 (  0.221%) ####      |
0.360 (  0.051%) ..        |
0.400 (  0.202%) .......   |
0.400 (  0.110%) ##        |
0.440 (  0.331%) ######    |
0.480 (  0.496%) ######### |
0.520 (  0.331%) ######    |
0.560 (  1.378%) ##########|#
0.600 ( 15.160%) ##########|##############
0.640 ( 57.938%) ##########|#######################################################
0.680 (  2.426%) ##########|##
0.720 ( 20.066%) ##########|###################
0.760 (  1.047%) ##########|#
0.840 (  0.110%) ##        |

So far I think K3=1, with the traditional naive Bayes combiner, is working
best for us, since it's so good at avoiding FPs and FNs that the others
leave behind.

To compare with the figures from comment 1, here's the results from a full
10-fold cross validation:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 ( 25.415%) ..........|.......................................................
0.040 (  9.831%) ..........|.....................
0.080 ( 22.571%) ..........|.................................................
0.120 ( 21.716%) ..........|...............................................
0.160 (  8.435%) ..........|..................
0.200 (  5.444%) ..........|............
0.200 (  0.028%) #         |
0.240 (  3.916%) ..........|........
0.240 (  0.022%) #         |
0.280 (  1.801%) ..........|....
0.280 (  0.022%) #         |
0.320 (  0.491%) ..........|.
0.320 (  0.226%) #####     |
0.360 (  0.116%) .....     |
0.360 (  0.231%) ######    |
0.400 (  0.040%) ..        |
0.400 (  0.193%) #####     |
0.440 (  0.132%) ###       |
0.480 (  0.223%) ..........|
0.480 (  1.334%) ##########|##
0.520 (  0.110%) ###       |
0.560 (  0.419%) ##########|#
0.600 (  0.832%) ##########|#
0.640 (  1.769%) ##########|##
0.680 (  8.813%) ##########|###########
0.720 ( 36.767%) ##########|############################################
0.760 ( 45.712%) ##########|#######################################################
0.800 (  3.279%) ##########|####
0.840 (  0.006%)           |
0.880 (  0.011%)           |
0.920 (  0.022%) #         |
0.960 (  0.072%) ##        |

Threshold optimization for hamcutoff=0.30, spamcutoff=0.70: cost=$206.30
Total ham:spam:   19764:18144
FP:     0  0.000%    FN:     9  0.050%
Unsure:  1973  5.205%     (ham:   528  2.672%    spam:  1445  7.964%)
TCRs:              l=1 12.479    l=5 12.479    l=9 12.479
SUMMARY: 0.30/0.70  fp     0 fn     9 uh   528 us  1445    c 206.30
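
For anyone wanting to sanity-check the threshold summary for the 10-fold run, here is a rough Python sketch that reproduces the reported cost, TCR and percentages from the raw counts. The cost weights ($10 per FP, $1 per FN, $0.10 per unsure message) and the TCR formula (spam total divided by l*FP + FN + unsure spam) are assumptions inferred from the numbers in this comment, not taken from the test scripts. Since FP is 0 here, the FP weight in particular cannot be confirmed from these figures, and the function names are made up for illustration.

```python
# Counts copied from the 10-fold cross-validation summary in this comment.
HAM_TOTAL, SPAM_TOTAL = 19764, 18144
FP, FN = 0, 9
UNSURE_HAM, UNSURE_SPAM = 528, 1445

# ASSUMED weights, inferred from the reported cost of $206.30.
# The FP weight is a guess: with FP == 0 it has no effect on the result.
FP_COST, FN_COST, UNSURE_COST = 10.0, 1.0, 0.1


def total_cost(fp, fn, unsure):
    """Weighted cost of a threshold pair (hypothetical helper)."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


def tcr(lam, fp=FP, fn=FN, unsure_spam=UNSURE_SPAM, spam_total=SPAM_TOTAL):
    """Total cost ratio, ASSUMING unsure spam is counted as missed spam."""
    return spam_total / (lam * fp + fn + unsure_spam)


def pct(part, whole):
    """Percentage of part in whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    unsure = UNSURE_HAM + UNSURE_SPAM
    print(f"cost=${total_cost(FP, FN, unsure):.2f}")
    print(f"FN: {FN} {pct(FN, SPAM_TOTAL):.3f}%")
    print(f"Unsure: {unsure} {pct(unsure, HAM_TOTAL + SPAM_TOTAL):.3f}%")
    for lam in (1, 5, 9):
        print(f"l={lam} TCR {tcr(lam):.3f}")
```

Under these assumptions the TCR is the same for l=1, 5 and 9 simply because there are no FPs for l to weight, which is consistent with the three identical values in the summary.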
