http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270





------- Additional Comments From [EMAIL PROTECTED]  2007-01-10 12:04 -------
I just ran a quick experiment on the zone over the past day, to see which
perceptron tweaks work well on a 10% slice of last week's set1 logs.  I
searched the HAM_PREFERENCE space over [1.0 .. 30.0] in 0.5 increments, and
THRESHOLD over [3.0 .. 10.0] in 0.25 increments (using an efficient
tessellating algorithm, of course), with the "validate-model" stuff as described
on http://wiki.apache.org/spamassassin/RunningPerceptron .
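
For anyone wanting to reproduce the parameter ranges, here is a minimal
Python sketch that just enumerates the plain lattice of (HAM_PREFERENCE,
THRESHOLD) pairs given above.  It is not the tessellating search itself, and
the helper name `frange` is made up for illustration; actually running the
perceptron for each pair is left to the validate-model scripts.

```python
def frange(start, stop, step):
    """Inclusive float range built from an integer count to avoid drift."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]

# Ranges and increments as stated in the comment above.
HAM_PREFERENCES = frange(1.0, 30.0, 0.5)
THRESHOLDS = frange(3.0, 10.0, 0.25)

# Every (HAM_PREFERENCE, THRESHOLD) pair on the coarse lattice.
GRID = [(hp, th) for hp in HAM_PREFERENCES for th in THRESHOLDS]

if __name__ == "__main__":
    print(len(HAM_PREFERENCES), len(THRESHOLDS), len(GRID))
```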

http://taint.org/x/2007/roc-test-set1.png is a ROC graph of the results.  (I
haven't multiplied the values by 100 to percentify them, so 1.0 == 100%, 0.1 ==
10%, 0.01 == 1%, 0.001 == 0.1%; you get the idea.)

http://taint.org/x/2007/roc-test-set1.txt is the raw data for that ROC
graph, sorted, in space-separated FP%, FN%, vm-name format.  (Ignore the "set3"
typo in the vm names; these are all really "set1" logs, since there are no
Bayes results.)
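
A rough Python sketch of reading one line of that file, assuming the layout is
exactly "FP FN vm-name" and that the vm-name encodes the parameters as
vm-<set>-<HAM_PREFERENCE>-<THRESHOLD>-<trailing field>.  The names `VmResult`
and `parse_result_line` are invented here, the meaning of the trailing field
is not stated in this comment so it is left unparsed, and the sample line is
assembled from figures quoted in this comment rather than copied from the file.

```python
from typing import NamedTuple


class VmResult(NamedTuple):
    fp: float              # false-positive rate, units as in the file
    fn: float              # false-negative rate, units as in the file
    ham_preference: float
    threshold: float
    name: str              # original vm-name, kept for reference


def parse_result_line(line: str) -> VmResult:
    """Split 'FP FN vm-name' and decode the parameters from the name."""
    fp, fn, name = line.split()
    # e.g. 'vm-set3-8.25-5.1875-100' -> ['vm', 'set3', '8.25', '5.1875', '100']
    fields = name.split("-")
    return VmResult(float(fp), float(fn),
                    float(fields[2]), float(fields[3]), name)


if __name__ == "__main__":
    # Assumed sample line, built from numbers mentioned in this comment.
    result = parse_result_line("0.45 3.9 vm-set3-8.25-5.1875-100")
    print(result.ham_preference, result.threshold)  # prints "8.25 5.1875"
```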

Interesting to note:

- the perceptron generally conforms nicely to a neat ROC curve, except for a
"mirror" curve of occasional way-off results: those are the results where
HAM_PREFERENCE==1.0.  So we can discard that setting!

- the "sweet spot", IMO, is around 0.45% FPs, 3.9% FNs, which is
vm-set3-8.25-5.1875-100 - in other words, HAM_PREFERENCE=8.25 THRESHOLD=5.1875.
I'll try a runGA with that.

- here's a sample scores file from that vm,
http://taint.org/x/2007/roc-test-set1-scores.txt , if you're curious.
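
If anyone wants to poke at the raw data themselves, the two steps above
(throwing out the HAM_PREFERENCE==1.0 runs, then looking along the curve for a
trade-off) could be sketched in Python roughly as follows.  This assumes each
result has already been reduced to a (fp, fn, ham_preference, threshold)
tuple; the function names are made up, and keeping only non-dominated points
is my stand-in for eyeballing the graph, not how the sweet spot above was
actually picked.

```python
def drop_ham_preference_one(rows):
    """Discard the way-off 'mirror' results where HAM_PREFERENCE == 1.0."""
    return [row for row in rows if row[2] != 1.0]


def non_dominated(rows):
    """Keep rows for which no other row is at least as good on both FP and FN.

    rows: iterable of (fp, fn, ham_preference, threshold) tuples.
    """
    front = []
    best_fn = float("inf")
    # Walk from lowest FP upward; a row survives only if it lowers FN.
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        if row[1] < best_fn:
            front.append(row)
            best_fn = row[1]
    return front


def candidates(rows):
    """Filter out HAM_PREFERENCE == 1.0, then return the trade-off curve."""
    return non_dominated(drop_ham_preference_one(rows))
```

Choosing a single point from the returned list is still a judgment call about
how many FNs one FP is worth.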


Henry, have I gone a bit overboard here? ;)  What else should I be trying?


