RE: generating rule stats from spamd logs

2005-08-03 Thread Matthew Yette

No one has any thoughts on this? It's not a quick fix? :(
--
Matthew Yette
Senior Engineer - NOC/Operations
MA Polce Consulting, Inc.
[EMAIL PROTECTED]
315-838-1644 (w)
315-356-0597 (f)
AIM/Yahoo: MAPolceNOC
MSN: [EMAIL PROTECTED]

-Original Message-
From: Matthew Yette 
Sent: Friday, July 29, 2005 8:24 AM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


I'd be able to code it in myself but I'm not fluent in perl (PHP guy)
and of course, the string parsing functions confuse the hell out of me.
LOL. Thought that there might be a lot of perl coders here who can make
this a snap. [Recipient-domain-based filtering and date range also]

Thanks so much!


-Original Message-
From: Matthew Yette 
Sent: Thursday, July 28, 2005 12:07 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


Is there any way to modify this code to accept another command-line
argument for domain-specific stats? Meaning, I want to look for all rule
hits for mail destined for domain.com.


-Original Message-
From: Dallas L. Engelken [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 27, 2005 1:02 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


My mistake.. It is fixed, hopefully for good.
v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt


TOP SPAM RULES FIRED

RANK  RULE NAME                   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  UNPARSEABLE_RELAY           25322      7.35    74.72    99.76   99.13
   2  URIBL_SBL                   22241      6.46    65.63    87.63    0.38
   3  URIBL_JP_SURBL              21419      6.22    63.20    84.39    0.28
   4  URIBL_BLACK                 19436      5.64    57.35    76.57    0.93
   5  RAZOR2_CF_RANGE_51_100      17562      5.10    51.82    69.19    1.34
   6  RAZOR2_CHECK                17475      5.07    51.57    68.85    1.15
   7  SARE_SPEC_ROLEX_REP         16553      4.81    48.84    65.22    0.29
   8  SPOOF_COM2OTH               16537      4.80    48.80    65.15    0.05
   9  RAZOR2_CF_RANGE_E8_51_100   16329      4.74    48.18    64.33    0.16
  10  BAYES_99                    15380      4.47    45.38    60.59    0.28


TOP HAM RULES FIRED

RANK  RULE NAME                   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  UNPARSEABLE_RELAY            8433     18.93    24.88    99.76   99.13
   2  BAYES_00                     7005     15.72    20.67     0.74   82.34
   3  AWL                          4904     11.01    14.47    26.64   57.65
   4  HTML_MESSAGE                 3813      8.56    11.25    22.92   44.82
   5  NO_REAL_NAME                 1453      3.26     4.29    37.79   17.08
   6  HTML_80_90                   1279      2.87     3.77    10.98   15.03
   7  MIME_HTML_ONLY                972      2.18     2.87     6.88   11.43
   8  HTML_FONT_BIG                 794      1.78     2.34     9.28    9.33
   9  BAYES_50                      625      1.40     1.84    25.40    7.35
  10  HTML_FONT_FACE_BAD            545      1.22     1.61     0.76    6.41


 




From: Steve Martin [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 27, 2005 11:44 AM
To: Andy Jezierski
Cc: Dallas L. Engelken; users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs


He only fixed the spam rules section.

The TOP HAM RULES section still has these two incorrect
computations:

my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);

That is, the number of times a rule fired on ham / total number of spam
messages, and the number of times a rule fired on spam / total number of
ham messages. Each count needs to be divided by the total for its own
class:

my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);
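
To see why the mismatched division produces percentages above 100, here is a small worked example. It is a Python sketch rather than the Perl in sa-stats.pl, and the helper name pct is made up for illustration. The counts are the ones Andy Jezierski posted for his maillog.0 run (655 spam, 5456 ham, BAYES_00 firing on 100 spam and 4079 ham messages).

```python
# Counts as reported for maillog.0 elsewhere in this thread.
NUM_SPAM = 655
NUM_HAM = 5456
SPAM_RULES = {"BAYES_00": 100}    # hits on messages judged spam
HAM_RULES = {"BAYES_00": 4079}    # hits on messages judged ham


def pct(hits, total):
    """Two-decimal percentage; "0.00" when there is no mail of that class."""
    return f"{(hits / total) * 100:.2f}" if total else "0.00"


key = "BAYES_00"

# Mismatched numerator and denominator, as in the buggy ham section.
wrong_ofspam = pct(HAM_RULES[key], NUM_SPAM)
wrong_ofham = pct(SPAM_RULES[key], NUM_HAM)

# Each count divided by the total for its own class, as in the fix.
ofspam = pct(SPAM_RULES.get(key, 0), NUM_SPAM)
ofham = pct(HAM_RULES.get(key, 0), NUM_HAM)

print("buggy:", wrong_ofspam, wrong_ofham)
print("fixed:", ofspam, ofham)
```

The buggy pair comes out as 622.75 and 1.83, which are the figures in the "still a little fishy" report; the fixed pair is 15.27 and 74.76.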


RE: generating rule stats from spamd logs

2005-08-03 Thread Dallas L. Engelken
v1.0 now has per-user and per-domain support
http://www.rulesemporium.com/programs/sa-stats.txt
 
# Per User and Per Domain Statistics...
# --
#
# ./sa-stats.pl -r postmaster  
#- this would give all stats for postmaster users, 
#  regardless of which domain it was for.  handy if you
#  have a lot of domain aliases
#
# ./sa-stats.pl -r @domain
#- this would give all stats for the domain specified.
#  make sure you include the '@' sign before the 
#  domain or the script will assume you wanted a user
#  name instead.
#
# ./sa-stats.pl -r [EMAIL PROTECTED]
#- this would give all stats for a specific email address.
#  this assumes you pass 'spamc -u fullemail' vs. 
#  'spamc -u userpart'.  If you do the latter, you simply
#  want to call -r userpart instead.
#
# --
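
The three -r forms above could be matched roughly as follows. This is a Python sketch of the rule as the comments describe it, not the Perl actually used in sa-stats.pl; the function name matches_recipient and the example.com addresses are hypothetical.

```python
def matches_recipient(user, wanted):
    """Return True if a logged user matches a -r style filter.

    wanted = "@domain"      -> any user at that domain
    wanted = "user@domain"  -> that exact address
    wanted = "user"         -> that user name, at any domain
    """
    user = user.lower()
    wanted = wanted.lower()
    if wanted.startswith("@"):
        # Leading '@' means a domain filter.
        return user.endswith(wanted)
    if "@" in wanted:
        # A full address must match exactly.
        return user == wanted
    # Otherwise compare only the part before the '@', if there is one.
    return user.split("@", 1)[0] == wanted


print(matches_recipient("postmaster@example.com", "postmaster"))
print(matches_recipient("bob@example.org", "@example.com"))
```

Comparing case-insensitively is a choice made here, not something the original comments specify.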


I would have to incorporate Time::Local, Date::Manip, and Parse::Syslog
into it to be able to do date start and stop times, and at this point I
really don't want to ;)   Besides, I store my logs in tai64 format, so
it wouldn't help me at all.  

If someone else wants to code Parse::Syslog support into it, be my
guest. Or if you want to port some of this code into the sa-stats that
is provided in the distro, have at it.
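
For anyone who wants to try the date-range part, here is a rough idea of what it involves for plain syslog timestamps. It is a Python sketch using the standard datetime module, not the Perl modules named above, and the function names are made up. Syslog stamps carry no year, so the caller has to supply one; tai64 logs are not handled, and month names are parsed in the current locale.

```python
from datetime import datetime


def syslog_time(line, year):
    """Parse the leading 'Aug  3 15:16:15' stamp of a syslog line."""
    # split() collapses the double space used before single-digit days.
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def in_range(line, start, stop, year):
    """True if the line's timestamp falls within [start, stop]."""
    return start <= syslog_time(line, year) <= stop


line = "Aug  3 15:16:19 mailer-03 spamd[22961]: clean message (3.3/5.0)"
print(syslog_time(line, 2005))
print(in_range(line, datetime(2005, 8, 1), datetime(2005, 8, 4), 2005))
```

A log that spans New Year would need extra care, since the year is assumed rather than read from the file.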

Dallas





RE: generating rule stats from spamd logs

2005-08-03 Thread Dallas L. Engelken
This would explain it.  

result: .  3 - AWL,BAYES_50,DNS_FROM_AHBL_RHSBL,HTML_90_100,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,MIME_HTML_ONLY,SARE_OEM_S_PRICE,SARE_SUBLRNMR scantime=1.9,size=14453,mid=[EMAIL PROTECTED]m,bayes=0.515278005793156,autolearn=no

It's looking for something like user=user in the result: line, and
yours doesn't have one. Maybe this is a 3.1.x thing only?

[22289] info: spamd: result: . -2 - BAYES_00,DK_SIGNED,RCVD_BY_IP,UNPARSEABLE_RELAY scantime=1.2,size=3225,[EMAIL PROTECTED],uid=200,required_score=3.5,rhost=localhost,raddr=127.0.0.1,rport=60485,mid=eee57c64050803112263a1e[EMAIL PROTECTED],bayes=2.22044604925031e-16,autolearn=ham
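
To make the difference between the two log formats concrete, here is a sketch of pulling a result: line apart. It is Python rather than the script's Perl, the regex is my own reading of the two samples above and not taken from sa-stats.pl, and the example.com address is made up. It splits the key=value tail naively on commas, so a value that itself contains a comma would be cut short.

```python
import re

# verdict (Y or .), integer score, comma-separated rules, key=value tail
RESULT_RE = re.compile(
    r"result: (?P<verdict>[Y.])\s+(?P<score>-?\d+) - "
    r"(?P<rules>\S*)\s+(?P<fields>\S+)"
)


def parse_result(line):
    """Return a dict for a spamd result: line, or None if it is not one."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    fields = {}
    for pair in m.group("fields").split(","):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return {
        "is_spam": m.group("verdict") == "Y",
        "score": int(m.group("score")),
        "rules": [r for r in m.group("rules").split(",") if r],
        "fields": fields,
    }


old_style = ("spamd[19788]: result: .  4 - BAYES_99,FREE_SAMPLE "
             "scantime=6.8,size=24341,mid=(unknown),autolearn=no")
new_style = ("[22289] info: spamd: result: . -2 - BAYES_00,DK_SIGNED "
             "scantime=1.2,size=3225,user=alice@example.com,uid=200")

print(parse_result(old_style)["fields"].get("user"))
print(parse_result(new_style)["fields"].get("user"))
```

With the older format the user lookup simply comes back empty, which would explain a per-user filter matching nothing.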

I'd have to read the recipient from the previous line (prior to
result:), and hope a race condition doesn't cause multiple
'clean message' lines or multiple 'result:' lines in a row.

It's much nicer to have user= in the result: line for doing statistics
per user/domain. Maybe this is something that has to wait until 3.1.
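
One way around the interleaving worry, sketched in Python and not something the script does: remember the recipient per spamd child PID instead of relying on line order. This assumes each child logs its verdict line before its own result: line, as in the sample log, and that spam verdicts are logged as "identified spam" (only "clean message" appears in the sample). The names CLEAN_RE, RESULT_PID_RE and attach_users, and the example.com addresses, are made up for illustration.

```python
import re

CLEAN_RE = re.compile(
    r"spamd\[(?P<pid>\d+)\]: (?:clean message|identified spam) "
    r"\([^)]*\) for (?P<user>[^:\s]+):\d+"
)
RESULT_PID_RE = re.compile(r"spamd\[(?P<pid>\d+)\]: result: (?P<rest>.*)")


def attach_users(lines):
    """Yield (user, result_text) pairs, matched by spamd child PID."""
    pending = {}                      # pid -> recipient seen most recently
    for line in lines:
        m = CLEAN_RE.search(line)
        if m:
            pending[m.group("pid")] = m.group("user")
            continue
        m = RESULT_PID_RE.search(line)
        if m:
            # user is None when no verdict line was seen for this PID
            yield pending.pop(m.group("pid"), None), m.group("rest")


log = [
    "Aug  3 15:16:19 mailer-03 spamd[22961]: clean message (3.3/5.0) for alice@example.com:511 in 1.9 seconds, 14453 bytes.",
    "Aug  3 15:16:20 mailer-03 spamd[19788]: clean message (4.6/5.0) for bob@example.com:511 in 6.8 seconds, 24341 bytes.",
    "Aug  3 15:16:20 mailer-03 spamd[19788]: result: .  4 - BAYES_99 scantime=6.8",
    "Aug  3 15:16:21 mailer-03 spamd[22961]: result: .  3 - AWL scantime=1.9",
]
for user, result in attach_users(log):
    print(user, result)
```

Two children writing interleaved lines still pair up correctly here, because each result is looked up by its own PID.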

 
Dallas



 -Original Message-
 From: Matthew Yette [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 03, 2005 2:21 PM
 To: Dallas L. Engelken
 Subject: RE: generating rule stats from spamd logs
 
 This enough?
 
 Aug  3 15:16:15 mailer-03 spamd[19788]: connection from localhost.localdomain [127.0.0.1] at port 60266
 Aug  3 15:16:15 mailer-03 spamd[19788]: handle_user: unable to find user '[EMAIL PROTECTED]'!
 Aug  3 15:16:15 mailer-03 spamd[19788]: checking message (unknown) for [EMAIL PROTECTED]:511.
 Aug  3 15:16:17 mailer-03 spamd[22961]: connection from localhost.localdomain [127.0.0.1] at port 60269
 Aug  3 15:16:17 mailer-03 spamd[22961]: handle_user: unable to find user '[EMAIL PROTECTED]'!
 Aug  3 15:16:17 mailer-03 spamd[22961]: checking message [EMAIL PROTECTED] for [EMAIL PROTECTED]:511.
 Aug  3 15:16:19 mailer-03 spamd[22961]: clean message (3.3/5.0) for [EMAIL PROTECTED]:511 in 1.9 seconds, 14453 bytes.
 Aug  3 15:16:19 mailer-03 spamd[22961]: result: .  3 - AWL,BAYES_50,DNS_FROM_AHBL_RHSBL,HTML_90_100,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,MIME_HTML_ONLY,SARE_OEM_S_PRICE,SARE_SUBLRNMR scantime=1.9,size=14453,mid=[EMAIL PROTECTED]-dialog.com,bayes=0.515278005793156,autolearn=no
 Aug  3 15:16:22 mailer-03 spamd[19788]: clean message (4.6/5.0) for [EMAIL PROTECTED]:511 in 6.8 seconds, 24341 bytes.
 Aug  3 15:16:22 mailer-03 spamd[19788]: result: .  4 - BAYES_99,FREE_SAMPLE,HTML_80_90,HTML_MESSAGE scantime=6.8,size=24341,mid=(unknown),bayes=0.05127653852,autolearn=no
 
 
 -Original Message-
 From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 03, 2005 3:06 PM
 To: Matthew Yette
 Subject: RE: generating rule stats from spamd logs
 
 
 Can you give me a snip of your maillog please.
 
 Thanks,
 Dallas
  
 
  -Original Message-
  From: Matthew Yette [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, August 03, 2005 12:56 PM
  To: Dallas L. Engelken
  Subject: RE: generating rule stats from spamd logs
  
  http://nospam3.mapolce.com:812/stats/sas.html
  
  That right there is the output using 0.9 - just ran it!
  
  perl sa-stats.pl --web > sas.html
  
  1.0 returns 0's for all #s, no rules fired... :(
  
  
  -Original Message-
  From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, August 03, 2005 1:50 PM
  To: Matthew Yette
  Subject: RE: generating rule stats from spamd logs
  
  
  Thank god for SVN ;)
  
  http://www.rulesemporium.com/programs/sa-stats-0.9.txt
  
  
  
   -Original Message-
   From: Matthew Yette [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, August 03, 2005 12:49 PM
   To: Dallas L. Engelken
   Subject: RE: generating rule stats from spamd logs
   
   Odd - I used to run the prior version and it worked right as rain.
   Do you have the previous version handy so that I may give that a shot
   and compare side by side? Thanks
   
   Matt
   
   
   -Original Message-
   From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, August 03, 2005 1:32 PM
   To: Matthew Yette
   Subject: RE: generating rule stats from spamd logs
   
   
   Working for me: without -r, with -r @domain, and with -r
   [EMAIL PROTECTED]
   
   ###
   
   # perl sa-stats.pl -l /var/log/spamd -f current
   
   Email:   29  Autolearn:14  AvgScore:   9.86

RE: generating rule stats from spamd logs

2005-07-29 Thread Matthew Yette
I'd be able to code it in myself but I'm not fluent in perl (PHP guy)
and of course, the string parsing functions confuse the hell out of me.
LOL. Thought that there might be a lot of perl coders here who can make
this a snap. [Recipient-domain-based filtering and date range also]

Thanks so much!


RE: generating rule stats from spamd logs

2005-07-28 Thread Matthew Yette
Is there any way to modify this code to accept another command-line
argument for domain-specific stats? Meaning, I want to look for all rule
hits for mail destined for domain.com.


RE: generating rule stats from spamd logs

2005-07-27 Thread Chris Santerre


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, July 26, 2005 7:15 PM
 To: jdow
 Cc: users@spamassassin.apache.org
 Subject: Re: generating rule stats from spamd logs 
 
 
 
 
 jdow writes:
  From: Chris Santerre [EMAIL PROTECTED]
  
   Do you mean this script?
  
   http://www.rulesemporium.com/programs/sa-stats.txt
  
   Note: It may be named the same as sa-stats.pl, but it is different.
   Per rule based.
  
   Another Dallas miracle!
  
  Oh? Er, how does it determine if a message was ham or spam? It looks
  like it is rather random based on the reports. BAYES_99 may well hit
  on 84.33% of spam. But I doubt, given its score, it hits on 44.53% of
  ham.
 
 BTW, it might be quite helpful to rename that script, since there's
 already an sa-stats.pl in the 'tools' dir -- as follows:
 
 NAME
   sa-stats.pl - Builds received spam/ham report from mail log
 

Yeah, we know. It was originally only used internally by SARE. But why not
share the love :)

I'll see about renaming it. sare-stats.pl ? 

--Chris 


RE: generating rule stats from spamd logs

2005-07-27 Thread Andy Jezierski

 Another Dallas miracle!

Oh? Er, how does it determine if a message was ham or spam? It looks
like it is rather random based on the reports. BAYES_99 may well hit
on 84.33% of spam. But I doubt, given its score, it hits on 44.53% of
ham.
   
 
 The code should be right... It uses spamassassin's judgement, ie 
 
 info: spamd: result: Y 20 - BAYES_99,...
 info: spamd: result: . -2 - AWL,
 
 44.53% of your ham hit BAYES_99... That gotta tell you something is
 wrong! My bayes hits break down like
 
 # ./sa-stats.pl -f spamdlog -n 500 | grep BAYES
 For spam...
   10  BAYES_99   15351   4.46%  45.42%  60.57%
   19  BAYES_50    6443   1.87%  19.06%  25.42%
   31  BAYES_80    1154   0.34%   3.41%   4.55%
   32  BAYES_60    1147   0.33%   3.39%   4.53%
   38  BAYES_95     864   0.25%   2.56%   3.41%
  102  BAYES_00     187   0.05%   0.55%   0.74%
  152  BAYES_40      92   0.03%   0.27%   0.36%
  209  BAYES_20      53   0.02%   0.16%   0.21%
  228  BAYES_05      44   0.01%   0.13%   0.17%

 For ham...
    2  BAYES_00    6959  15.73%  20.59%  82.32%
    9  BAYES_50     623   1.41%   1.84%   7.37%
   20  BAYES_40     296   0.67%   0.88%   3.50%
   24  BAYES_20     267   0.60%   0.79%   3.16%
   29  BAYES_05     217   0.49%   0.64%   2.57%
   73  BAYES_60      51   0.12%   0.15%   0.60%
  113  BAYES_99      24   0.05%   0.07%   0.28%
  142  BAYES_80      14   0.03%   0.04%   0.17%
  280  BAYES_95       2   0.00%   0.01%   0.02%
 
 So, BAYES_99 hits 0.28% of my ham and 60.57% of my spam. 
 
 

So from your explanation I should be ignoring the
%ofham column in the spam stats and the %ofspam column in ham? Otherwise
the stats don't seem to make much sense:

python# ./sa-stats -f maillog.0 -n 500 | grep BAYES

spam rules...
   3  BAYES_99          305   3.49   4.99   46.56   5.59
  10  BAYES_50          172   1.97   2.81   26.26   3.15
  23  BAYES_00          100   1.14   1.64   15.27   1.83
  77  BAYES_80           21   0.24   0.34    3.21   0.38
  85  BAYES_95           19   0.22   0.31    2.90   0.35
 111  BAYES_60           14   0.16   0.23    2.14   0.26
 131  BAYES_05           12   0.14   0.20    1.83   0.22
 186  BAYES_20            7   0.08   0.11    1.07   0.13
 224  BAYES_40            5   0.06   0.08    0.76   0.09
 373  SARE_BAYES_5x8      2   0.02   0.03    0.31   0.04
 387  SARE_BAYES_6x8      2   0.02   0.03    0.31   0.04
 412  SARE_BAYES_7x8      2   0.02   0.03    0.31   0.04

ham rules...
   1  BAYES_00         4079  14.05  66.75  622.75  74.76

BAYES_00 hitting 622% of spam???

   6  BAYES_50          771   2.65  12.62  117.71  14.13
  25  BAYES_40          238   0.82   3.89   36.34   4.36
  35  BAYES_20          190   0.65   3.11   29.01   3.48
  40  BAYES_05          148   0.51   2.42   22.60   2.71
 173  BAYES_60           15   0.05   0.25    2.29   0.27
 232  BAYES_80            9   0.03   0.15    1.37   0.16
 310  BAYES_95            5   0.02   0.08    0.76   0.09
 349  SARE_BAYES_6x6      4   0.01   0.07    0.61   0.07
 416  SARE_BAYES_5x8      2   0.01   0.03    0.31   0.04
 496  SARE_BAYES_5x7      1   0.00   0.02    0.15   0.02



Andy

RE: generating rule stats from spamd logs

2005-07-27 Thread Dallas L. Engelken
BAYES_00 hits 15.27% of spam on yours, so the %ofspam on top ham rules and
%ofham on top spam rules must be buggy.

I'm not running that version with the 5th column. It must be buggy.
I'll play with it in a bit.
 
Dallas
 
 





Re: generating rule stats from spamd logs

2005-07-27 Thread Chris Thielen

Dallas L. Engelken wrote:


BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and
%ofham on top spam rules must be buggy.

i'm not running that version with the 5th column.   It must be buggy.
i play with it after bit. 


Dallas
 



Dallas,

Did you see the patch I sent to the SARE list?  Just need to swap two 
hash lookups.



Chris T




RE: generating rule stats from spamd logs

2005-07-27 Thread martin smith
M   10  BAYES_99   15351   4.46%  45.42%  60.57%
M   19  BAYES_50    6443   1.87%  19.06%  25.42%
M   31  BAYES_80    1154   0.34%   3.41%   4.55%
M   32  BAYES_60    1147   0.33%   3.39%   4.53%
M   38  BAYES_95     864   0.25%   2.56%   3.41%
M  102  BAYES_00     187   0.05%   0.55%   0.74%
M  152  BAYES_40      92   0.03%   0.27%   0.36%
M  209  BAYES_20      53   0.02%   0.16%   0.21%
M  228  BAYES_05      44   0.01%   0.13%   0.17%
M
M For ham...
M    2  BAYES_00    6959  15.73%  20.59%  82.32%
M    9  BAYES_50     623   1.41%   1.84%   7.37%
M   20  BAYES_40     296   0.67%   0.88%   3.50%
M   24  BAYES_20     267   0.60%   0.79%   3.16%
M   29  BAYES_05     217   0.49%   0.64%   2.57%
M   73  BAYES_60      51   0.12%   0.15%   0.60%
M  113  BAYES_99      24   0.05%   0.07%   0.28%
M  142  BAYES_80      14   0.03%   0.04%   0.17%
M  280  BAYES_95       2   0.00%   0.01%   0.02%
M
M So, BAYES_99 hits 0.28% of my ham and 60.57% of my spam.
M
M

You must have a different version from the one now available, because
you're missing one column.

Spam
RANK  RULE NAME                COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  BAYES_99                   468      5.94    75.48    97.91  329.58
   2  RAZOR2_CHECK               422      5.35    68.06    88.28  297.18
   3  RAZOR2_CF_RANGE_51_100     421      5.34    67.90    88.08  296.48
   4  URIBL_BLACK                353      4.48    56.94    73.85  248.59

The %ofham column is obviously wrong but the others seem fine.

Ham
RANK  RULE NAME                COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  BAYES_00                   137     37.33    22.10    28.66   96.48
   2  AWL                        112     30.52    18.06    23.43   78.87
   3  HTML_MESSAGE                16      4.36     2.58     3.35   11.27
   7  UPPERCASE_25_50              9      2.45     1.45     1.88    6.34
   8  URIBL_BLACK                  5      1.36     0.81     1.05    3.52

Again, the %ofspam column is wrong here and should be ignored. Nice to see
what's false-positiving so I can lower scores accordingly.

Martin



RE: generating rule stats from spamd logs

2005-07-27 Thread Dallas L. Engelken
  -Original Message-
 From: Chris Thielen [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, July 27, 2005 11:02 AM
 To: Dallas L. Engelken
 Cc: users@spamassassin.apache.org
 Subject: Re: generating rule stats from spamd logs
 
 Dallas L. Engelken wrote:
 
 BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham 
 rules and 
 %ofham on top spam rules must be buggy.
 
 i'm not running that version with the 5th column.   It must be buggy.
 i play with it after bit. 
  
 Dallas
   
 
 
 Dallas,
 
 Did you see the patch I sent to the SARE list?  Just need to 
 swap two hash lookups.
 
 

Yup yup.  http://www.rulesemporium.com/programs/sa-stats.txt updated.

D


RE: generating rule stats from spamd logs

2005-07-27 Thread Andy Jezierski



Something's still a little fishy. SA 3.1 latest SVN, if it makes any
difference.

python# ./sa-stats -f maillog.0 -n 5
Email:     6111  Autolearn:   226  AvgScore:   2.15  AvgScanTime:  3.91 sec
Spam:       655  Autolearn:   133  AvgScore:  14.81  AvgScanTime:  3.76 sec
Ham:       5456  Autolearn:    93  AvgScore:   0.63  AvgScanTime:  3.93 sec

Time Spent Running SA:         6.64 hours
Time Spent Processing Spam:    0.68 hours
Time Spent Processing Ham:     5.96 hours

TOP SPAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    HTML_MESSAGE                      496     5.67    8.12   75.73   62.19
   2    DCC_CHECK                         310     3.55    5.07   47.33    7.02
   3    BAYES_99                          305     3.49    4.99   46.56    0.02
   4    RAZOR2_CHECK                      277     3.17    4.53   42.29    4.23
   5    DIGEST_MULTIPLE                   251     2.87    4.11   38.32    2.42


TOP HAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    BAYES_00                         4079    14.05   66.75  622.75    1.83
   2    HTML_MESSAGE                     3393    11.68   55.52  518.02    9.09
   3    NO_REAL_NAME                     1053     3.63   17.23  160.76    1.06
   4    HTML_80_90                        931     3.21   15.23  142.14    2.35
   5    LG_4C_2V_3C                       798     2.75   13.06  121.83    2.20




Re: generating rule stats from spamd logs

2005-07-27 Thread Steve Martin
He only fixed the spam rules section.

The TOP HAM RULES section still has these two incorrect computations...

    my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);

Number of times a rule fired on ham / total number of spam messages.
Number of times a rule fired on spam / total number of ham messages.

They should be:

    my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);

On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:

"Dallas L. Engelken" [EMAIL PROTECTED] wrote on 07/27/2005 11:26:54 AM:

  -Original Message-
  From: Chris Thielen [mailto:[EMAIL PROTECTED]]
  Sent: Wednesday, July 27, 2005 11:02 AM
  To: Dallas L. Engelken
  Cc: users@spamassassin.apache.org
  Subject: Re: generating rule stats from spamd logs

  Dallas L. Engelken wrote:

    BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham
    rules and %ofham on top spam rules must be buggy.

    i'm not running that version with the 5th column. It must be buggy.
    i play with it after bit.

    Dallas

  Dallas,

  Did you see the patch I sent to the SARE list?  Just need to
  swap two hash lookups.

 Yup yup.
 http://www.rulesemporium.com/programs/sa-stats.txt updated.

 D

Something's still a little fishy.  SA 3.1 latest SVN, if it makes any
difference.

python# ./sa-stats -f maillog.0 -n 5
Email:     6111  Autolearn:   226  AvgScore:   2.15  AvgScanTime:  3.91 sec
Spam:       655  Autolearn:   133  AvgScore:  14.81  AvgScanTime:  3.76 sec
Ham:       5456  Autolearn:    93  AvgScore:   0.63  AvgScanTime:  3.93 sec

Time Spent Running SA:         6.64 hours
Time Spent Processing Spam:    0.68 hours
Time Spent Processing Ham:     5.96 hours

TOP SPAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    HTML_MESSAGE                      496     5.67    8.12   75.73   62.19
   2    DCC_CHECK                         310     3.55    5.07   47.33    7.02
   3    BAYES_99                          305     3.49    4.99   46.56    0.02
   4    RAZOR2_CHECK                      277     3.17    4.53   42.29    4.23
   5    DIGEST_MULTIPLE                   251     2.87    4.11   38.32    2.42

TOP HAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    BAYES_00                         4079    14.05   66.75  622.75    1.83
   2    HTML_MESSAGE                     3393    11.68   55.52  518.02    9.09
   3    NO_REAL_NAME                     1053     3.63   17.23  160.76    1.06
   4    HTML_80_90                        931     3.21   15.23  142.14    2.35
   5    LG_4C_2V_3C                       798     2.75   13.06  121.83    2.20

--
Steve Martin                              http://www.cheezmo.com/
Smart Calibration, LLC           http://www.smartcalibration.com/
The Widescreen Movie Center            http://www.widemovies.com/
Letterboxed Movie TV Schedule  http://www.widemovies.com/lbx.html
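The swapped hash lookups described in this message can be checked by hand against the BAYES_00 row of the sample output. What follows is only an illustrative sketch in Python, not the sa-stats Perl code: the helper `pct` and the variable names are made up here, and the figure of 100 spam hits for BAYES_00 is an assumption inferred from the "15.27 of spam" remark quoted earlier in the thread (100/655), not a number the script printed.

```python
def pct(hits, total):
    """Format hits/total as a percentage with two decimals, like sprintf("%.2f")."""
    return "%.2f" % (hits / total * 100)

# Totals taken from the sa-stats summary lines in the thread.
NUM_SPAM = 655
NUM_HAM = 5456

# BAYES_00 hit counts. HAM_HITS is the COUNT column of TOP HAM RULES FIRED;
# SPAM_HITS is an assumption inferred from the "15.27 of spam" remark.
HAM_HITS = 4079
SPAM_HITS = 100

# Buggy version: ham hits divided by the spam total, and vice versa.
buggy_ofspam = pct(HAM_HITS, NUM_SPAM)
buggy_ofham = pct(SPAM_HITS, NUM_HAM)

# Corrected version: each hit count divided by the total of its own class.
fixed_ofspam = pct(SPAM_HITS, NUM_SPAM)
fixed_ofham = pct(HAM_HITS, NUM_HAM)

print("buggy: %OFSPAM", buggy_ofspam, "%OFHAM", buggy_ofham)
print("fixed: %OFSPAM", fixed_ofspam, "%OFHAM", fixed_ofham)
```

Under those assumptions the buggy lines reproduce the 622.75 and 1.83 shown for BAYES_00, while the corrected lines give 15.27 and 74.76, both of which stay below 100 as a percentage should.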

RE: generating rule stats from spamd logs

2005-07-27 Thread Dallas L. Engelken
My mistake.. It is fixed, hopefully for good.
v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt


TOP SPAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    UNPARSEABLE_RELAY               25322     7.35   74.72   99.76   99.13
   2    URIBL_SBL                       22241     6.46   65.63   87.63    0.38
   3    URIBL_JP_SURBL                  21419     6.22   63.20   84.39    0.28
   4    URIBL_BLACK                     19436     5.64   57.35   76.57    0.93
   5    RAZOR2_CF_RANGE_51_100          17562     5.10   51.82   69.19    1.34
   6    RAZOR2_CHECK                    17475     5.07   51.57   68.85    1.15
   7    SARE_SPEC_ROLEX_REP             16553     4.81   48.84   65.22    0.29
   8    SPOOF_COM2OTH                   16537     4.80   48.80   65.15    0.05
   9    RAZOR2_CF_RANGE_E8_51_100       16329     4.74   48.18   64.33    0.16
  10    BAYES_99                        15380     4.47   45.38   60.59    0.28

 
TOP HAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1    UNPARSEABLE_RELAY                8433    18.93   24.88   99.76   99.13
   2    BAYES_00                         7005    15.72   20.67    0.74   82.34
   3    AWL                              4904    11.01   14.47   26.64   57.65
   4    HTML_MESSAGE                     3813     8.56   11.25   22.92   44.82
   5    NO_REAL_NAME                     1453     3.26    4.29   37.79   17.08
   6    HTML_80_90                       1279     2.87    3.77   10.98   15.03
   7    MIME_HTML_ONLY                    972     2.18    2.87    6.88   11.43
   8    HTML_FONT_BIG                     794     1.78    2.34    9.28    9.33
   9    BAYES_50                          625     1.40    1.84   25.40    7.35
  10    HTML_FONT_FACE_BAD                545     1.22    1.61    0.76    6.41


 




From: Steve Martin [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 27, 2005 11:44 AM
To: Andy Jezierski
Cc: Dallas L. Engelken; users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs


He only fixed the spam rules section.

The TOP HAM RULES section still has these two incorrect
computations...

    my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);

Number of times a rule fired on ham / total number of spam
messages.
Number of times a rule fired on spam / total number of ham
messages.

They should be:

    my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);

On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:



Dallas L. Engelken [EMAIL PROTECTED] wrote on 07/27/2005 11:26:54 AM:

  -Original Message-
  From: Chris Thielen [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, July 27, 2005 11:02 AM
  To: Dallas L. Engelken
  Cc: users@spamassassin.apache.org
  Subject: Re: generating rule stats from spamd logs

  Dallas L. Engelken wrote:

    BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham
    rules and %ofham on top spam rules must be buggy.

    i'm not running that version with the 5th column. It must be buggy.
    i play with it after bit.

    Dallas

  Dallas,

  Did you see the patch I sent to the SARE list?  Just need to
  swap two hash lookups.

 Yup yup.
 http://www.rulesemporium.com/programs/sa-stats.txt updated.

 D


Something's still a little fishy.  SA 3.1 latest SVN, if
it makes any difference. 



python# ./sa-stats -f maillog.0 -n 5 
Email:     6111  Autolearn:   226  AvgScore:   2.15  AvgScanTime:  3.91 sec
Spam:       655  Autolearn:   133  AvgScore:  14.81  AvgScanTime:  3.76 sec
Ham:       5456  Autolearn:    93  AvgScore:   0.63  AvgScanTime:  3.93 sec

Time Spent Running SA

RE: generating rule stats from spamd logs

2005-07-26 Thread Chris Santerre


 -Original Message-
 From: Charles Sprickman [mailto:[EMAIL PROTECTED]
 Sent: Monday, July 25, 2005 10:46 PM
 To: users@spamassassin.apache.org
 Subject: generating rule stats from spamd logs
 
 
 Hi,
 
 Anyone aware of anything that can parse a day's spamd logs and then
 give a summary of total hits per rule?  I noticed since 3.0.x that
 all rule hits are in the logs now:
 
 Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - 
 BAYES_60,DATE_IN_FUTURE_03_06,DNS_FROM_RFC_POST,URIBL_BLACK,URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL
 scantime=6.7,size=2027,mid=[EMAIL PROTECTED]
 ah,bayes=0.781998195315203,autolearn=disabled
 
 I've got three spamd boxes logging to one server.  I already run
 sa-stats.pl daily, but I'd like to see more information about what
 rules are hitting.  I did see a few things in the wiki, but most of
 them look to be tied to snarfing MTA logs.
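For readers who just want a quick tally, the `result:` line quoted above carries everything needed: a spam flag, the score, and the comma-separated rule list. The following is a rough Python sketch of per-rule counting, not the sa-stats script itself. The regular expression is an assumption derived from the single sample line in this message, and it further assumes that ham is logged with `.` where spam has `Y`. The function name `count_rule_hits` and the second sample line are hypothetical; the first sample line is a shortened version of the one quoted above.

```python
import re
from collections import Counter

# Assumed layout: "spamd[PID]: result: <Y or .> <score> - <RULE,RULE,...> <key=value,...>"
RESULT_RE = re.compile(
    r"spamd\[\d+\]: result: (?P<flag>[Y.]) +(?P<score>-?\d+) - (?P<rules>[\w,]*) "
)

def count_rule_hits(lines):
    """Tally rule hits separately for spam and ham result lines."""
    spam_rules, ham_rules = Counter(), Counter()
    num_spam = num_ham = 0
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a result line
        rules = [name for name in match.group("rules").split(",") if name]
        if match.group("flag") == "Y":
            num_spam += 1
            spam_rules.update(rules)
        else:
            num_ham += 1
            ham_rules.update(rules)
    return spam_rules, ham_rules, num_spam, num_ham

# Sample input: the first line is shortened from the thread, the second is made up.
sample = [
    "Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - "
    "BAYES_60,URIBL_BLACK,URIBL_JP_SURBL scantime=6.7,size=2027,autolearn=disabled",
    "Jul 25 22:45:02 spamd2 spamd[59437]: result: . 0 - "
    "BAYES_00,HTML_MESSAGE scantime=3.1,size=1500,autolearn=ham",
    "Jul 25 22:45:03 spamd2 spamd[59437]: some unrelated log line",
]

spam_rules, ham_rules, num_spam, num_ham = count_rule_hits(sample)
print(num_spam, dict(spam_rules))
print(num_ham, dict(ham_rules))
```

Keeping the spam and ham counters separate is what makes per-class percentages possible later: each rule's spam hits get divided by the spam total and its ham hits by the ham total.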

Do you mean this script?

http://www.rulesemporium.com/programs/sa-stats.txt

Note: It has the same name as sa-stats.pl, but it is a different script.
This one reports per-rule statistics.

Another Dallas miracle!

Chris Santerre
SysAdmin and SARE/URIBL ninja
http://www.uribl.com
http://www.rulesemporium.com



Re: generating rule stats from spamd logs

2005-07-26 Thread jdow
From: Chris Santerre [EMAIL PROTECTED]

 Do you mean this script?

 http://www.rulesemporium.com/programs/sa-stats.txt

 Note: It has the same name as sa-stats.pl, but it is a different script.
 This one reports per-rule statistics.

 Another Dallas miracle!

Oh? Er, how does it determine if a message was ham or spam? Based on the
reports, it looks rather random. BAYES_99 may well hit on 84.33% of spam.
But I doubt, given its score, that it hits on 44.53% of ham.

{^_^}