Spam score range and distribution statistics?
As far as I found out, SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10 to +10, or other?

Is there a statistic for an average email account showing how many emails get which score? In other words, is there something like a Gaussian distribution graph?

Ben
Re: Spam score range and distribution statistics?
On 09.06.14 09:47, Ben Stover wrote:
> As far as I found out, SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10 to +10

I don't think it has limits. Maybe just the limits of an integer.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Eagles may soar, but weasels don't get sucked into jet engines.
Re: Spam score range and distribution statistics?
On Monday 09 June 2014 at 09:50, Matus UHLAR - fantomas wrote:
> On 09.06.14 09:47, Ben Stover wrote:
> > As far as I found out, SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10 to +10
>
> I don't think it has limits. Maybe just the limits of an integer.

GTUBE (http://spamassassin.apache.org/gtube), for example, has a default score of 1000.

Antony.

--
In fact I wanted to be John Cleese and it took me some time to realise that the job was already taken. - Douglas Adams

Please reply to the list; please don't CC me.
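A minimal sketch of why no fixed range exists: the final score is the sum of the scores of all rules that matched, and nothing clamps that sum. Only GTUBE's 1000 comes from the message above; the other rule names and values, and the helper names `total_score` and `is_spam`, are made up for illustration. The 5.0 threshold is SpamAssassin's default `required_score`; a local configuration may use a different value.

```python
# Hypothetical rule table. Only GTUBE's score of 1000 is taken from the
# thread; the EXAMPLE_* names and values are invented for illustration.
RULE_SCORES = {
    "GTUBE": 1000.0,
    "EXAMPLE_SPAMMY_RULE": 2.5,
    "EXAMPLE_HAMMY_RULE": -1.9,
}


def total_score(hits):
    """Add up the scores of the rules that fired; no clamping is applied."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(score, required=5.0):
    """Flag a message as spam once its score reaches the threshold.

    5.0 is the default required_score; treat it as an assumption for
    any particular installation.
    """
    return score >= required


if __name__ == "__main__":
    for hits in (["EXAMPLE_HAMMY_RULE"], ["GTUBE", "EXAMPLE_SPAMMY_RULE"]):
        score = total_score(hits)
        print(hits, score, "spam" if is_spam(score) else "ham")
```

Because the total is a plain sum, one heavily weighted rule is enough to push a message far outside any "-10 to +10" window.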
Re: Spam score range and distribution statistics?
On 6/9/2014 3:47 AM, Ben Stover wrote:
> As far as I found out, SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10 to +10, or other?

There are no limits on the score. The higher the score, the more likely the email is spam and the lower the score, the more likely it is to be non-spam. Looking through the last month's worth of logs on my server, I see scores ranging from -98 to 101.

> Is there a statistic for an average email account showing how many emails get which score? In other words, is there something like a Gaussian distribution graph?

That would be different on every server depending on what type of spam and ham you see and which rule sets you are running. I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40 peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0). I'm not a statistics guy, so I can't give you all the distribution numbers -- and, as I said, it will likely differ a fair amount between installations.

Are you just looking for general information, or is there something you are trying to determine? If you tell us what you are looking for, we may be able to give you some better answers.

--
Bowie
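For anyone who wants to graph their own distribution the same way, here is a rough sketch. It assumes the stock `X-Spam-Status` header layout (`X-Spam-Status: No, score=-1.9 required=5.0 ...`; older releases wrote `hits=` instead of `score=`) and that the local configuration has not customised it. The function names `extract_scores`, `histogram` and `render` are hypothetical helpers, not part of SpamAssassin.

```python
import math
import re
from collections import Counter

# Matches the score in a header such as
#   X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
# Older releases used "hits=" instead of "score=", so both are accepted.
SCORE_RE = re.compile(
    r"^X-Spam-Status:.*?\b(?:score|hits)=(-?\d+(?:\.\d+)?)", re.IGNORECASE
)


def extract_scores(lines):
    """Return the score from every X-Spam-Status line in `lines`."""
    scores = []
    for line in lines:
        match = SCORE_RE.match(line)
        if match:
            scores.append(float(match.group(1)))
    return scores


def histogram(scores):
    """Count messages per integer bucket (each score rounded down)."""
    return Counter(math.floor(s) for s in scores)


def render(hist, width=40):
    """Draw one text bar per bucket, scaled so the tallest bar is `width`."""
    if not hist:
        return ""
    peak = max(hist.values())
    rows = []
    for bucket in range(min(hist), max(hist) + 1):
        count = hist.get(bucket, 0)
        bar = "#" * round(width * count / peak)
        rows.append(f"{bucket:>5} | {bar} {count}")
    return "\n".join(rows)


if __name__ == "__main__":
    import sys

    # Usage: feed header lines (e.g. grepped from a mailbox) on stdin.
    print(render(histogram(extract_scores(sys.stdin))))
```

The bucket width of one point is an arbitrary choice; a wide range such as -98 to 101 produces many empty rows, so a coarser bucket may read better.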
Re: Spam score range and distribution statistics?
On 6/9/2014 11:34 AM, Bowie Bailey wrote:
> I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40 peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0).

That spike around zero is going to be your typical boring ham. It passes SPF and some other minor ham rules, and hits only very minor spam rules, if any.
Re: Spam score range and distribution statistics?
On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
> > In other words is there something like a gaussian distribution graphic visualisation?
>
> That would be different on every server depending on what type of spam and ham you see and which rule sets you are running. I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40 peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data set includes both spam and ham combined, two spikes are to be expected. A single bell curve would mean too many messages in the gray area, no clear distinction between ham and spam, and consequently lots of false positives and negatives.
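One way to put a number on that "gray area" argument is to measure how many messages score close to the spam threshold. This is only a sketch: the two score lists are invented toy data, not measurements from any server, and the helper name `gray_zone_fraction`, the 5.0 threshold and the 2-point margin are assumptions chosen for illustration.

```python
def gray_zone_fraction(scores, required=5.0, margin=2.0):
    """Share of messages scoring within `margin` points of the threshold."""
    if not scores:
        return 0.0
    near = sum(1 for s in scores if abs(s - required) <= margin)
    return near / len(scores)


# Toy data, invented for illustration only.
# Two separate peaks: ham clusters near 0, spam clusters near 10.
two_peaks = [-1, 0, 0, 0.5, 1, 4, 9, 10, 11, 12]
# One hump centred on the threshold: ham and spam are not separated.
single_hump = [2, 3, 4, 4.5, 5, 5, 5.5, 6, 7, 8]

if __name__ == "__main__":
    print("two peaks:  ", gray_zone_fraction(two_peaks))
    print("single hump:", gray_zone_fraction(single_hump))
```

With well-separated ham and spam peaks, few messages sit near the threshold; with a single hump centred on it, most of them do, which is where false positives and false negatives come from.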