[Bug 4505] Score generation for SpamAssassin 3.1

bugzilla-daemon Sat, 06 Aug 2005 03:30:15 -0700

http://bugzilla.spamassassin.org/show_bug.cgi?id=4505

------- Additional Comments From [EMAIL PROTECTED]  2005-08-06 03:30 -------
Another suggested set of bayes values:

Bayes   Set 2   Set 3   Eqn 2   Eqn 3
0       -2.312  -2.599  -2.5    -2.6
5       -1.11   -0.413  -1.525  -2.2
20      -0.74   -1.951  -0.7    -2
40      -0.185  -1.096  0.4     -0.78
50      0.912   0.001   0.95    -0.1
60      2.22    0.372   1.8     0.58
80      2.775   2.087   2.7     1.94
95      3.237   2.063   3.425   2.96
99      3.145   1.886   3.645   3.232

The second and third columns are sets 2 and 3 from Henry's data.  The final two 
columns are my proposed values for sets 2 and 3.  These values are lower on the 
high end than I would really like to see, but I think they are about as high as 
one can reasonably go based on the data.

Both proposed sets are essentially linear trendlines for sets 2 and 3, with some 
hand corrections to better match what I consider a few important data points.  
In particular, bayes_00 is close to -2.5 in both sets 2 and 3, but the 
trendlines would predict values around -1.7 for set 2 and -3.2 or so for set 3.  
I've moved the bayes_00 point to a value the data will support in both cases.  
Both sets also show a weakness at bayes_05.  I've pushed the bayes_05 trendline 
values upward for both sets, although not far enough to create score inversions 
(a higher bucket scoring lower than the bucket below it).
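The trendline and inversion ideas above can be sketched in Python. This is a minimal illustration, not the procedure used for the proposed columns: it assumes an unweighted ordinary least-squares fit with the Bayes column value (0, 5, 20, ... 99) used directly as the x coordinate, so its intercepts need not match the -1.7 and -3.2 bayes_00 predictions quoted above. `linear_fit` and `inversions` are hypothetical helper names; an "inversion" here means a bucket whose score is strictly lower than the bucket before it.

```python
# Table values copied from the comment above. Using the Bayes column value
# directly as x is an assumption made for illustration only.
BUCKETS = [0, 5, 20, 40, 50, 60, 80, 95, 99]
SET2_DATA = [-2.312, -1.11, -0.74, -0.185, 0.912, 2.22, 2.775, 3.237, 3.145]
SET3_DATA = [-2.599, -0.413, -1.951, -1.096, 0.001, 0.372, 2.087, 2.063, 1.886]
EQN2 = [-2.5, -1.525, -0.7, 0.4, 0.95, 1.8, 2.7, 3.425, 3.645]
EQN3 = [-2.6, -2.2, -2.0, -0.78, -0.1, 0.58, 1.94, 2.96, 3.232]


def linear_fit(xs, ys):
    """Unweighted ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inversions(buckets, scores):
    """List (lower, higher) bucket pairs where the higher bucket scores lower."""
    return [
        (buckets[i], buckets[i + 1])
        for i in range(len(scores) - 1)
        if scores[i + 1] < scores[i]
    ]


if __name__ == "__main__":
    for name, data in (("set 2", SET2_DATA), ("set 3", SET3_DATA)):
        a, b = linear_fit(BUCKETS, data)
        print(f"{name}: fit {a:.3f} + {b:.4f}*x, inversions {inversions(BUCKETS, data)}")
    for name, scores in (("eqn 2", EQN2), ("eqn 3", EQN3)):
        print(f"{name}: inversions {inversions(BUCKETS, scores)}")
```

Run on the table, the check finds inversions in the raw data (for example bayes_99 scoring below bayes_95 in both sets) but none in either proposed column.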

Both original sets show the scores flattening out above 80%.  I've left these 
values where the linear trendline would put them, since that seems closer to 
normal human experience.  Note, though, that the data doesn't really support 
these extrapolations, especially for bayes_99.
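One way to see how far the extrapolated high end departs from the measurements is to subtract each original score from the proposed one. The sketch below does that for buckets at or above 80; the helper name `high_end_gap` and the 80 cutoff are illustrative choices, and all scores are copied from the table.

```python
# Scores copied from the table above; the 80 cutoff is an illustrative choice.
BUCKETS = [0, 5, 20, 40, 50, 60, 80, 95, 99]
SET2_DATA = [-2.312, -1.11, -0.74, -0.185, 0.912, 2.22, 2.775, 3.237, 3.145]
SET3_DATA = [-2.599, -0.413, -1.951, -1.096, 0.001, 0.372, 2.087, 2.063, 1.886]
EQN2 = [-2.5, -1.525, -0.7, 0.4, 0.95, 1.8, 2.7, 3.425, 3.645]
EQN3 = [-2.6, -2.2, -2.0, -0.78, -0.1, 0.58, 1.94, 2.96, 3.232]


def high_end_gap(buckets, data, proposed, threshold=80):
    """Map each bucket >= threshold to proposed minus data, rounded to 3 places."""
    return {
        b: round(p - d, 3)
        for b, d, p in zip(buckets, data, proposed)
        if b >= threshold
    }


if __name__ == "__main__":
    print("set 2:", high_end_gap(BUCKETS, SET2_DATA, EQN2))
    print("set 3:", high_end_gap(BUCKETS, SET3_DATA, EQN3))
```

For both sets the gap is largest at bayes_99 (about 0.5 for set 2 and 1.35 for set 3), consistent with the point that the extrapolation is least supported there.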

Neither proposed bayes_99 score comes close to 4.0.  I tried to play with the 
data until I could get something in that range, but the data wouldn't cooperate.  
The set 2 scores for bayes_95 and bayes_99 could be nudged upward toward 4.0 
without departing too badly from the data; that wouldn't be possible with the 
set 3 scores.
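The distance to 4.0 can be made concrete with a little arithmetic on the bayes_95 and bayes_99 rows of the table. `shortfall` and `HIGH_END` are hypothetical names used only for this sketch; 4.0 is the target score discussed above.

```python
TARGET = 4.0  # target bayes_99 score discussed in the comment

# (bayes_95, bayes_99) scores copied from the table above.
HIGH_END = {
    "set 2 data": (3.237, 3.145),
    "set 3 data": (2.063, 1.886),
    "eqn 2": (3.425, 3.645),
    "eqn 3": (2.96, 3.232),
}


def shortfall(scores, target=TARGET):
    """Return how far each score falls below target (negative means above it)."""
    return tuple(round(target - s, 3) for s in scores)


if __name__ == "__main__":
    for name, scores in HIGH_END.items():
        print(name, dict(zip(("bayes_95", "bayes_99"), shortfall(scores))))
```

At bayes_99 the set 3 data sits about 2.1 below 4.0, against about 0.86 for set 2, which fits the observation that only the set 2 scores could plausibly be pushed that far.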


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.