http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
------- Additional Comments From [EMAIL PROTECTED] 2007-01-10 12:04 -------

I just ran a quick experiment on the zone over the past day, to see which perceptron tweaks work well on a 10% slice of last week's set1 logs. It searched the HAM_PREFERENCE space between [1.0 .. 30.0] in 0.5 increments, and THRESHOLD between [3.0 .. 10.0] in 0.25 increments (using an efficient tessellating algorithm, of course), with the "validate-model" stuff as described on http://wiki.apache.org/spamassassin/RunningPerceptron .

http://taint.org/x/2007/roc-test-set1.png is a ROC graph of the results. (I haven't multiplied the values by 100 to percentify them, so 1.0 == 100%, 0.1 == 10%, 0.01 == 1%, 0.001 == 0.1%, you get the idea.)

http://taint.org/x/2007/roc-test-set1.txt is the raw data for that ROC graph, sorted, in space-separated "FP% FN% vm-name" format. (Ignore the "set3" typo in the vm names; these are all really "set1" logs, since there are no Bayes results.)

Interesting to note:

- the perceptron generally conforms nicely to a neat ROC curve, except for a "mirror" curve of occasional way-off results. Those are the results where HAM_PREFERENCE==1.0, so we can discard that value!

- the "sweet spot", IMO, is around 0.45% FPs, 3.9% FNs, which is vm-set3-8.25-5.1875-100; in other words, HAM_PREFERENCE=8.25 THRESHOLD=5.1875. I'll try a runGA with that.

- here's a sample scores file from that vm, http://taint.org/x/2007/roc-test-set1-scores.txt , if you're curious.

Henry, have I gone a bit overboard here? ;) What else should I be trying?
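
For anyone who wants to poke at the raw data themselves, here is a minimal Python sketch of reading the "FP FN vm-name" lines described above, dropping the HAM_PREFERENCE==1.0 "mirror" results, and listing the points that no other point beats on both FPs and FNs. It is not the script used for the experiment. The function names are made up for illustration, the vm-name layout (vm-SET-HAM_PREFERENCE-THRESHOLD-N) is inferred from the single example name quoted above, the meaning of the trailing number is not stated in the comment, and the rates are assumed to be stored as fractions (1.0 == 100%).

```python
def parse_vm_name(name):
    """Split a name such as 'vm-set3-8.25-5.1875-100' into its fields.

    Layout assumed from the one example in the comment; the meaning of
    the last field is not given there, so it is kept as an opaque string.
    """
    _vm, logset, ham_pref, threshold, suffix = name.split("-")
    return {
        "set": logset,
        "ham_preference": float(ham_pref),
        "threshold": float(threshold),
        "suffix": suffix,
    }


def parse_roc_line(line):
    """Parse one space-separated 'FP FN vm-name' line (rates as fractions)."""
    fp, fn, name = line.split()
    row = parse_vm_name(name)
    row.update(fp=float(fp), fn=float(fn), name=name)
    return row


def drop_mirror_curve(rows):
    """Discard the HAM_PREFERENCE == 1.0 results, as suggested above."""
    return [r for r in rows if r["ham_preference"] != 1.0]


def pareto_front(rows):
    """Keep rows that no other row dominates.

    A row is dominated if another row has FP and FN both no worse,
    and at least one of them strictly better.
    """
    front = []
    for r in rows:
        dominated = any(
            o["fp"] <= r["fp"] and o["fn"] <= r["fn"]
            and (o["fp"] < r["fp"] or o["fn"] < r["fn"])
            for o in rows
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r["fp"])


# The one data point quoted in the comment (figures are "around" values).
sweet = parse_roc_line("0.0045 0.039 vm-set3-8.25-5.1875-100")
print(sweet["ham_preference"], sweet["threshold"])
```

The Pareto filter only narrows the candidates; picking a "sweet spot" among them is still a judgment call about how many FNs one FP is worth.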
