Spam score range and distribution statistics?

2014-06-09 Thread Ben Stover
As far as I can tell, SpamAssassin calculates a spam score and puts the value
into the email header.

What is the maximum range of the score?

-10..+10

or other?

Is there a statistic showing, for an average email account, how many emails
get which score?

In other words, is there something like a graph of a Gaussian distribution?

Ben




Re: Spam score range and distribution statistics?

2014-06-09 Thread Matus UHLAR - fantomas

On 09.06.14 09:47, Ben Stover wrote:

> As far as I can tell, SpamAssassin calculates a spam score and puts the
> value into the email header.
>
> What is the maximum range of the score?
>
> -10..+10


I don't think it has limits. Maybe just the limits of the numeric type used to store it.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Eagles may soar, but weasels don't get sucked into jet engines. 


Re: Spam score range and distribution statistics?

2014-06-09 Thread Antony Stone
On Monday 09 June 2014 at 09:50, Matus UHLAR - fantomas wrote:

> On 09.06.14 09:47, Ben Stover wrote:
>> As far as I can tell, SpamAssassin calculates a spam score and puts the
>> value into the email header.
>>
>> What is the maximum range of the score?
>>
>> -10..+10
>
> I don't think it has limits. Maybe just the limits of the numeric type used to store it.

http://spamassassin.apache.org/gtube for example has a default score of 1000.


Antony.

-- 
In fact I wanted to be John Cleese and it took me some time to realise that 
the job was already taken.

 - Douglas Adams

 Please reply to the list;
   please don't CC me.


Re: Spam score range and distribution statistics?

2014-06-09 Thread Bowie Bailey

On 6/9/2014 3:47 AM, Ben Stover wrote:

> As far as I can tell, SpamAssassin calculates a spam score and puts the value
> into the email header.
>
> What is the maximum range of the score?
>
> -10..+10
>
> or other?


There are no limits on the score.  The higher the score, the more likely 
the email is spam and the lower the score, the more likely it is to be 
non-spam.  Looking through the last month's worth of logs on my server, 
I see scores ranging from -98 to 101.
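Since the score lives in a header, the observed range on a given server can be checked directly. The following is a minimal sketch, assuming the default `X-Spam-Status` layout (`No, score=-0.2 required=5.0 tests=...`); sites with custom header templates would need a different pattern, and the sample header values are made up. Scores are parsed as floats because SpamAssassin scores are decimal numbers, and nothing caps them: the GTUBE rule alone contributes 1000.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the "score=<number>" field of a default X-Spam-Status header value,
# e.g. "No, score=-0.2 required=5.0 tests=...". Assumption: default header
# layout; custom header templates will need another pattern.
SCORE_RE = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)")


def parse_score(header_value: str) -> Optional[float]:
    """Return the spam score from an X-Spam-Status value, or None if absent."""
    match = SCORE_RE.search(header_value)
    return float(match.group(1)) if match else None


def score_range(headers: Iterable[str]) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) score seen, or None if no header had a score."""
    scores = [s for s in (parse_score(h) for h in headers) if s is not None]
    if not scores:
        return None
    return min(scores), max(scores)


if __name__ == "__main__":
    # Hypothetical sample header values, not real log data.
    sample = [
        "No, score=-0.2 required=5.0 tests=SPF_PASS autolearn=ham",
        "Yes, score=1000.0 required=5.0 tests=GTUBE",
        "No, score=2.1 required=5.0 tests=HTML_MESSAGE",
    ]
    print(score_range(sample))  # (-0.2, 1000.0)
```

Note that `required=5.0` is not picked up by mistake: the pattern only matches the literal field name `score=` at a word boundary.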



> Is there a statistic showing, for an average email account, how many emails
> get which score?
>
> In other words, is there something like a graph of a Gaussian distribution?


That would be different on every server, depending on what type of spam
and ham you see and which rule sets you are running.  I graphed mine out
of curiosity and it forms a reasonable bell curve from -14 to 40, peaking
at about 9, although there is an odd spike sticking up from -3 to 1 for
some reason (and a rather large spike at 0).
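A text-only version of such a graph can be sketched by dropping each score into a fixed-width bucket and counting. This is a hypothetical sketch: it assumes the scores have already been collected as floats (for example from the `X-Spam-Status` headers), and the function names, the bucket width of 1.0 and the sample scores are illustrative choices, not real data.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def histogram(scores: Iterable[float], width: float = 1.0) -> Dict[int, int]:
    """Count scores per bucket; bucket k covers [k*width, (k+1)*width)."""
    return dict(Counter(math.floor(s / width) for s in scores))


def print_histogram(counts: Dict[int, int], width: float = 1.0) -> None:
    """Print one line per bucket with a bar of '#' characters."""
    if not counts:
        return
    lo, hi = min(counts), max(counts)
    for k in range(lo, hi + 1):
        # Empty buckets are printed too, so gaps between peaks stay visible.
        print(f"{k * width:7.1f} | {'#' * counts.get(k, 0)}")


if __name__ == "__main__":
    # Hypothetical scores, not real data.
    scores = [-1.9, -0.1, 0.0, 0.0, 0.3, 4.2, 8.7, 9.1, 9.4, 12.0]
    print_histogram(histogram(scores))
```

Flooring (rather than rounding) keeps negative scores in consistent buckets: -0.1 lands in the bucket starting at -1.0, not in the one starting at 0.0.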


I'm not a statistics guy, so I can't give you all the distribution 
numbers -- and, as I said, it will likely differ a fair amount between 
installations.


Are you just looking for general information, or is there something you 
are trying to determine?  If you tell us what you are looking for, we 
may be able to give you some better answers.


--
Bowie


Re: Spam score range and distribution statistics?

2014-06-09 Thread Joe Quinn

On 6/9/2014 11:34 AM, Bowie Bailey wrote:

> That would be different on every server, depending on what type of spam
> and ham you see and which rule sets you are running.  I graphed mine
> out of curiosity and it forms a reasonable bell curve from -14 to 40,
> peaking at about 9, although there is an odd spike sticking up from
> -3 to 1 for some reason (and a rather large spike at 0).


That spike around zero is going to be your typical boring ham. It passes
SPF and a few other minor ham rules, and hits only very minor spam rules,
if any.


Re: Spam score range and distribution statistics?

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
>> In other words, is there something like a graph of a Gaussian
>> distribution?
>
> That would be different on every server, depending on what type of spam
> and ham you see and which rule sets you are running.  I graphed mine out
> of curiosity and it forms a reasonable bell curve from -14 to 40, peaking
> at about 9, although there is an odd spike sticking up from -3 to 1 for
> some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data set combines spam and ham, two spikes are to be expected.
A single bell curve would mean too many messages in the gray area, no
clear distinction between ham and spam, and consequently lots of false
positives and negatives.
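The link between overlapping peaks and misclassification can be checked against hand-labelled mail. This is a hypothetical sketch: it assumes a list of `(score, is_spam)` pairs whose labels come from manual review (SpamAssassin does not supply them), and uses 5.0, SpamAssassin's default `required_score`, as the threshold. A message counts as flagged when its score is at or above the threshold; flagged ham is a false positive and unflagged spam a false negative. The `margin` defining the gray area is an arbitrary illustrative parameter.

```python
from typing import Iterable, NamedTuple, Tuple


class Outcome(NamedTuple):
    false_positives: int  # ham scored at or above the threshold
    false_negatives: int  # spam scored below the threshold
    gray_area: int        # messages within `margin` points of the threshold


def classify(labelled: Iterable[Tuple[float, bool]],
             threshold: float = 5.0, margin: float = 2.0) -> Outcome:
    """Count errors and near-threshold messages for labelled scores."""
    fp = fn = gray = 0
    for score, is_spam in labelled:
        flagged = score >= threshold
        if flagged and not is_spam:
            fp += 1
        elif not flagged and is_spam:
            fn += 1
        if abs(score - threshold) < margin:
            gray += 1
    return Outcome(fp, fn, gray)


if __name__ == "__main__":
    # Hypothetical labelled scores forming two well-separated clusters.
    data = [(-0.1, False), (0.0, False), (1.2, False), (9.0, True), (12.5, True)]
    print(classify(data))
    # Outcome(false_positives=0, false_negatives=0, gray_area=0)
```

With two well-separated peaks all three counts stay at zero; as the peaks merge into a single curve, more messages fall inside the margin and the error counts rise.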


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}