http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
------- Additional Comments From [EMAIL PROTECTED] 2007-11-01 16:22 -------
(In reply to comment #18)
> I'm rerunning now to establish another fix to that bug, that still displays an
> improvement, since the attempt in r588709 doesn't do that...

This proved really tricky. After a full 10-fold cv run, here's what the
original (buggy) code scores, for two sample score thresholds:

SUMMARY: 0.30/0.70 fp 0 fn 9 uh 528 us 1445 c 206.30
SUMMARY: 0.20/0.80 fp 0 fn 0 uh 2378 us 17529 c 1990.70

It took a few days, but I've finally figured out a patch that is both
(a) not buggy ;) and (b) has better results:

SUMMARY: 0.30/0.70 fp 0 fn 7 uh 994 us 631 c 169.50
SUMMARY: 0.20/0.80 fp 0 fn 0 uh 3018 us 3295 c 631.30

It includes a small hack -- it scales the scores up by 10%, since EDDC and
the naive Bayes combiner seem to skew scores a little lower. Results improve
with this; it'd probably be better to analyze the EDDC equation and figure
out why the scores aren't 10% higher to start with, but hey ;)

This is now the new baseline, checked in as r591167.

Here's the score histogram:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (25.086%) ..........|.......................................................
0.040 ( 9.016%) ..........|....................
0.080 (16.146%) ..........|...................................
0.120 (23.593%) ..........|....................................................
0.160 (10.888%) ..........|........................
0.200 ( 5.976%) ..........|.............
0.200 ( 0.011%)           |
0.240 ( 4.265%) ..........|.........
0.240 ( 0.028%) #         |
0.280 ( 2.970%) ..........|.......
0.280 ( 0.011%)           |
0.320 ( 1.295%) ..........|...
0.320 ( 0.039%) #         |
0.360 ( 0.390%) ..........|.
0.360 ( 0.220%) ######    |
0.400 ( 0.106%) .....     |
0.400 ( 0.209%) ######    |
0.440 ( 0.040%) ..        |
0.440 ( 0.165%) #####     |
0.480 ( 0.121%) ###       |
0.520 ( 0.228%) ..........|
0.520 ( 1.361%) ##########|##
0.560 ( 0.072%) ##        |
0.600 ( 0.259%) #######   |
0.640 ( 0.612%) ##########|#
0.680 ( 0.970%) ##########|#
0.720 ( 2.750%) ##########|####
0.760 (11.332%) ##########|################
0.800 (38.261%) ##########|#####################################################
0.840 (40.074%) ##########|#######################################################
0.880 ( 3.390%) ##########|#####
0.920 ( 0.011%)           |
0.960 ( 0.105%) ###       |
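
For anyone comparing the SUMMARY lines above by hand: the "c" column appears to be a weighted cost, and the four quoted values are all reproduced by counting each fn as 1 and each unsure message (uh + us) as 0.1. The sketch below checks that arithmetic. It is not taken from the test scripts themselves: the regex, the function names, and especially the fp weight of 10 are assumptions (fp is 0 in every line here, so the fp weight cannot be verified from this data), as is the reading of uh/us as "unsure ham" / "unsure spam".

```python
import re

# Assumed weights. FN_WEIGHT and UNSURE_WEIGHT reproduce the quoted "c"
# values; FP_WEIGHT is a guess, since fp is 0 in all four SUMMARY lines.
FP_WEIGHT = 10.0
FN_WEIGHT = 1.0
UNSURE_WEIGHT = 0.1

# Hypothetical parser for lines shaped like the SUMMARY lines in this comment.
SUMMARY_RE = re.compile(
    r"SUMMARY:\s+(?P<lo>[\d.]+)/(?P<hi>[\d.]+)\s+"
    r"fp\s+(?P<fp>\d+)\s+fn\s+(?P<fn>\d+)\s+"
    r"uh\s+(?P<uh>\d+)\s+us\s+(?P<us>\d+)\s+c\s+(?P<c>[\d.]+)"
)


def parse_summary(line):
    """Return the thresholds, counts and quoted cost from one SUMMARY line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a SUMMARY line: %r" % line)
    d = m.groupdict()
    return {
        "lo": float(d["lo"]), "hi": float(d["hi"]),
        "fp": int(d["fp"]), "fn": int(d["fn"]),
        "uh": int(d["uh"]), "us": int(d["us"]),
        "c": float(d["c"]),
    }


def cost(fp, fn, uh, us):
    """Weighted cost under the assumed weights defined above."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + (uh + us) * UNSURE_WEIGHT


if __name__ == "__main__":
    lines = [
        "SUMMARY: 0.30/0.70 fp 0 fn 9 uh 528 us 1445 c 206.30",
        "SUMMARY: 0.20/0.80 fp 0 fn 0 uh 2378 us 17529 c 1990.70",
        "SUMMARY: 0.30/0.70 fp 0 fn 7 uh 994 us 631 c 169.50",
        "SUMMARY: 0.20/0.80 fp 0 fn 0 uh 3018 us 3295 c 631.30",
    ]
    for line in lines:
        s = parse_summary(line)
        computed = cost(s["fp"], s["fn"], s["uh"], s["us"])
        print("%.2f/%.2f quoted=%.2f computed=%.2f"
              % (s["lo"], s["hi"], s["c"], computed))
```

Under these weights the drop from 206.30 to 169.50 at 0.30/0.70 comes mostly from fewer unsure spam (1445 down to 631), partly offset by more unsure ham (528 up to 994).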
