[Bug 4505] Score generation for SpamAssassin 3.1

bugzilla-daemon Wed, 03 Aug 2005 19:04:38 -0700

http://bugzilla.spamassassin.org/show_bug.cgi?id=4505

------- Additional Comments From [EMAIL PROTECTED]  2005-08-03 19:04 -------
Anyway, back to score generation. A few items:


1. I'm -1 on using those scores. They look great all-round, *except* for the
Bayes scores:

OVERALL%   SPAM%     HAM%      S/O   RANK   SCORE  NAME
 56.044  84.1316   0.0375    1.000   0.84    1.89  BAYES_99
  1.716   2.5715   0.0099    0.996   0.83    2.06  BAYES_95
  1.983   2.9654   0.0251    0.992   0.76    2.09  BAYES_80
  1.685   2.5064   0.0463    0.982   0.68    0.37  BAYES_60
 31.996   0.3606  95.0772    0.004   0.60   -2.60  BAYES_00
  4.503   5.9619   1.5927    0.789   0.47    0.00  BAYES_50
  0.311   0.0880   0.7556    0.104   0.36   -0.41  BAYES_05
  0.377   0.1622   0.8048    0.168   0.32   -1.95  BAYES_20
  0.401   0.2655   0.6706    0.284   0.27   -1.10  BAYES_40

(Scoreset 3 freqs output.)  Note that the perceptron did not let any of them
go much above 2 points (the highest is 2.09, for BAYES_80). Those scores show
the same odd flattening for BAYES_95/99 that we had to fix in 3.0.3 in r165033:
BAYES_99 (1.89) ends up below both BAYES_95 (2.06) and BAYES_80 (2.09). There
seems to be unanimous support on the record for fixing these.

(OK, I'm being a little disingenuous on the last point: I think someone,
either Daniel or Henry, was OK with letting them float, but they made the
comment on a transitory medium like IRC or IM, so it doesn't count. ;)
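
To make the flattening concrete, here is a minimal Python sketch that walks
the scoreset 3 scores from the freqs output in bucket order and reports every
adjacent pair where the higher Bayes bucket got the lower score. The names
PERCEPTRON_SCORES and find_inversions are made up for illustration; nothing
like this exists in the SpamAssassin tree as far as this comment is concerned.

```python
# Scoreset 3 perceptron scores, copied from the freqs output above and
# ordered by Bayes probability bucket (lowest bucket first).
PERCEPTRON_SCORES = [
    ("BAYES_00", -2.60),
    ("BAYES_05", -0.41),
    ("BAYES_20", -1.95),
    ("BAYES_40", -1.10),
    ("BAYES_50", 0.00),
    ("BAYES_60", 0.37),
    ("BAYES_80", 2.09),
    ("BAYES_95", 2.06),
    ("BAYES_99", 1.89),
]


def find_inversions(scores):
    """Return adjacent (lower_bucket, higher_bucket) rule-name pairs
    where the higher bucket has the lower score."""
    return [
        (lo_name, hi_name)
        for (lo_name, lo_score), (hi_name, hi_score) in zip(scores, scores[1:])
        if hi_score < lo_score
    ]


if __name__ == "__main__":
    for lo_name, hi_name in find_inversions(PERCEPTRON_SCORES):
        print(f"{hi_name} scores lower than {lo_name}")
```

On these numbers it flags BAYES_05/BAYES_20 at the ham end as well as the
BAYES_80/95/99 run at the spam end.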

So I suggest we set them to static scores and move them out of the mutable
section, as done in the attached patch, then get Henry to rerun
the perceptron.  For ease of review, those static scores are:

score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5

They're a mix of what the perceptron said in that last run, what was used in
3.0.3, and some smoothing (to avoid prompting the same FAQs again).
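
For anyone reviewing the numbers, a quick sanity check is sketched below in
Python: it parses the score lines above and confirms that, in each of the four
scoresets, the score never drops as the Bayes bucket rises. The helper names
(parse_scores, is_non_decreasing) are hypothetical, and the parser only
handles the simple "score NAME v0 v1 v2 v3" form used here, not the full
config syntax.

```python
# The proposed static scores, copied verbatim from the list above.
STATIC_SCORES = """\
score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5
"""


def parse_scores(text):
    """Map rule name -> list of four floats, one per scoreset (0..3)."""
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        # Only accept the exact six-field form: score NAME v0 v1 v2 v3
        if len(fields) == 6 and fields[0] == "score":
            rules[fields[1]] = [float(value) for value in fields[2:]]
    return rules


def is_non_decreasing(rules, scoreset):
    """True if scores never drop as the bucket rises.

    Sorting by name gives bucket order only because the BAYES_NN
    suffixes are zero-padded to two digits.
    """
    ordered = [rules[name][scoreset] for name in sorted(rules)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    rules = parse_scores(STATIC_SCORES)
    for scoreset in range(4):
        print(scoreset, is_non_decreasing(rules, scoreset))
```

Scoresets 0 and 1 pass trivially, since every rule is pinned at 0.0001 there;
the check that matters is on scoresets 2 and 3, the ones with Bayes enabled.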


Henry -- any chance you can gzip up the validation set after you run the
perceptron, and put it somewhere?  A whole batch of remaining work depends
on it.  Also, we need to get the statistics in.  I've
updated http://wiki.apache.org/spamassassin/RescoreMassCheck with what I think
needs to be done (steps 5 onwards).

Probably not worth doing those until we vote on the patch / figure out
what to do with the BAYES scores, though.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
