RE: generating rule stats from spamd logs
No one has any thoughts on this? It's not a quick fix? :(

--
Matthew Yette
Senior Engineer - NOC/Operations
MA Polce Consulting, Inc.
[EMAIL PROTECTED]
315-838-1644 (w) 315-356-0597 (f)
AIM/Yahoo: MAPolceNOC MSN: [EMAIL PROTECTED]

-----Original Message-----
From: Matthew Yette
Sent: Friday, July 29, 2005 8:24 AM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs

I'd be able to code it in myself but I'm not fluent in perl (PHP guy) and of course, the string parsing functions confuse the hell out of me. LOL. Thought that there might be a lot of perl coders here who can make this a snap. [Recipient-domain-based filtering date range also]

Thanks so much!
RE: generating rule stats from spamd logs
v1.0 now has per-user and per-domain support:
http://www.rulesemporium.com/programs/sa-stats.txt

# Per User and Per Domain Statistics...
# --
#
# ./sa-stats.pl -r postmaster
#   - this would give all stats for postmaster users,
#     regardless of which domain it was for. handy if you
#     have a lot of domain aliases
#
# ./sa-stats.pl -r @domain
#   - this would give all stats for the domain specified.
#     make sure you include the '@' sign before the
#     domain or the script will assume you wanted a user
#     name instead.
#
# ./sa-stats.pl -r [EMAIL PROTECTED]
#   - this would give all stats for a specific email address.
#     this assumes you pass 'spamc -u fullemail' vs.
#     'spamc -u userpart'. If you do the latter, you simply
#     want to call -r userpart instead.
#
# --

I would have to incorporate Time::Local, Date::Manip, and Parse::Syslog into it to be able to do date start and stop times, and at this point I really don't want to ;) Besides, I store my logs in tai64 format, so it wouldn't help me at all.

If someone else wants to code Parse::Syslog support into it, be my guest. Or if you want to port some of this code into the sa-stats that is provided in the distro, have at it.

Dallas

-----Original Message-----
From: Matthew Yette [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 11:05 AM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs

No one has any thoughts on this? It's not a quick fix? :(
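The three -r forms described in the v1.0 announcement (bare user part, @domain, full address) can be illustrated with a short sketch. This is a Python illustration, not code from sa-stats.pl (which is Perl); the function name matches_recipient is hypothetical, and the case-insensitive comparison is an assumption rather than something the announcement states.

```python
def matches_recipient(user: str, rcpt_filter: str) -> bool:
    """Return True if a logged user matches a -r style filter.

    Hypothetical helper: mirrors the three forms in the announcement,
    not the script's actual implementation.
    """
    user = user.lower()                # assumption: compare case-insensitively
    rcpt_filter = rcpt_filter.lower()

    if rcpt_filter.startswith("@"):
        # '@domain' form: match any user at exactly that domain.
        return user.endswith(rcpt_filter)
    if "@" in rcpt_filter:
        # Full address form: the logged user must be the same address.
        return user == rcpt_filter
    # Bare user part: match the local part regardless of domain.
    return user.split("@", 1)[0] == rcpt_filter


if __name__ == "__main__":
    for user in ("postmaster@example.com", "bob@example.com", "bob@example.org"):
        print(user,
              matches_recipient(user, "postmaster"),
              matches_recipient(user, "@example.com"))
```

The leading '@' is what distinguishes a domain from a user name here, which is why the announcement warns that omitting it makes the script assume a user name.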
RE: generating rule stats from spamd logs
This would explain it.

result: . 3 - AWL,BAYES_50,DNS_FROM_AHBL_RHSBL,HTML_90_100,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,MIME_HTML_ONLY,SARE_OEM_S_PRICE,SARE_SUBLRNMR scantime=1.9,size=14453,mid=[EMAIL PROTECTED] m,bayes=0.515278005793156,autolearn=no

It's looking for something like user=user in the result: line. Maybe this is a 3.1.x thing only??

[22289] info: spamd: result: . -2 - BAYES_00,DK_SIGNED,RCVD_BY_IP,UNPARSEABLE_RELAY scantime=1.2,size=3225,user=[EMAIL PROTECTED],uid=200,required_score=3.5,rhost=localhost,raddr=127.0.0.1,rport=60485,mid=eee57c64050803112263a1e [EMAIL PROTECTED],bayes=2.22044604925031e-16,autolearn=ham

I'd have to read the recipient from the previous line (prior to result:), and hope a race condition doesn't apply that causes multiple 'clean message' lines or multiple 'result:' lines in a row.

It's much nicer to have user= in the result: line for doing statistics per user/domain... Maybe this is something that has to wait until 3.1.

Dallas

-----Original Message-----
From: Matthew Yette [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 2:21 PM
To: Dallas L. Engelken
Subject: RE: generating rule stats from spamd logs

This enough?

Aug 3 15:16:15 mailer-03 spamd[19788]: connection from localhost.localdomain [127.0.0.1] at port 60266
Aug 3 15:16:15 mailer-03 spamd[19788]: handle_user: unable to find user '[EMAIL PROTECTED]'!
Aug 3 15:16:15 mailer-03 spamd[19788]: checking message (unknown) for [EMAIL PROTECTED]:511.
Aug 3 15:16:17 mailer-03 spamd[22961]: connection from localhost.localdomain [127.0.0.1] at port 60269
Aug 3 15:16:17 mailer-03 spamd[22961]: handle_user: unable to find user '[EMAIL PROTECTED]'!
Aug 3 15:16:17 mailer-03 spamd[22961]: checking message [EMAIL PROTECTED] for [EMAIL PROTECTED]:511.
Aug 3 15:16:19 mailer-03 spamd[22961]: clean message (3.3/5.0) for [EMAIL PROTECTED]:511 in 1.9 seconds, 14453 bytes.
Aug 3 15:16:19 mailer-03 spamd[22961]: result: . 3 - AWL,BAYES_50,DNS_FROM_AHBL_RHSBL,HTML_90_100,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,MIME_HTML_ONLY,SARE_OEM_S_PRICE,SARE_SUBLRNMR scantime=1.9,size=14453,mid=[EMAIL PROTECTED] -dialog.com,bayes=0.515278005793156,autolearn=no
Aug 3 15:16:22 mailer-03 spamd[19788]: clean message (4.6/5.0) for [EMAIL PROTECTED]:511 in 6.8 seconds, 24341 bytes.
Aug 3 15:16:22 mailer-03 spamd[19788]: result: . 4 - BAYES_99,FREE_SAMPLE,HTML_80_90,HTML_MESSAGE scantime=6.8,size=24341,mid=(unknown),bayes=0.05127653852,autolearn=no

-----Original Message-----
From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 3:06 PM
To: Matthew Yette
Subject: RE: generating rule stats from spamd logs

Can you give me a snip of your maillog please.

Thanks,
Dallas

-----Original Message-----
From: Matthew Yette [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 12:56 PM
To: Dallas L. Engelken
Subject: RE: generating rule stats from spamd logs

http://nospam3.mapolce.com:812/stats/sas.html

That right there is the output using 0.9 - just ran it!

perl sa-stats.pl --web sas.html

1.0 returns 0's for all #s, no rules fired... :(

-----Original Message-----
From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 1:50 PM
To: Matthew Yette
Subject: RE: generating rule stats from spamd logs

Thank god for SVN ;)

http://www.rulesemporium.com/programs/sa-stats-0.9.txt

-----Original Message-----
From: Matthew Yette [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 12:49 PM
To: Dallas L. Engelken
Subject: RE: generating rule stats from spamd logs

Odd - I used to run the prior version and it worked right as rain. Do you have the previous version handy so that I may give that a shot and compare side by side?

Thanks
Matt

-----Original Message-----
From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 03, 2005 1:32 PM
To: Matthew Yette
Subject: RE: generating rule stats from spamd logs

Working for me... Without -r, with -r @domain, and with -r [EMAIL PROTECTED]

###
# perl sa-stats.pl -l /var/log/spamd -f current
Email: 29 Autolearn:14 AvgScore: 9.86
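The fallback Dallas describes, taking the recipient from the summary line that precedes a result: line when no user= field is present, might look roughly like the sketch below. It is Python for illustration only, not the script's Perl. The regular expressions are assumptions fitted to the log lines quoted in this thread, the "identified spam" wording is assumed (only "clean message" appears here), and the addresses in the sample are made up. Keying the pending recipient on the spamd pid is one way to reduce the interleaving risk he mentions, since the two child processes in the quoted log write their lines out of order.

```python
import re

# Assumed patterns, modeled on the log lines quoted in the thread.
SUMMARY_RE = re.compile(
    r"spamd\[(?P<pid>\d+)\]: (?:clean message|identified spam) "
    r"\([^)]*\) for (?P<user>\S+):\d+ in "
)
RESULT_RE = re.compile(
    r"(?:spamd\[(?P<pid>\d+)\]: )?result: (?P<flag>[Y.]) +(?P<score>-?\d+)"
    r" - (?P<rules>\S*) (?P<fields>\S+)"
)


def recipients(lines):
    """Yield (user, rules) for each result: line.

    Prefers a user= field on the result: line itself; otherwise falls back
    to the recipient remembered from the same pid's preceding summary line.
    """
    pending = {}  # pid -> recipient from the most recent summary line
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            pending[m.group("pid")] = m.group("user")
            continue
        m = RESULT_RE.search(line)
        if not m:
            continue
        fields = dict(
            kv.split("=", 1) for kv in m.group("fields").split(",") if "=" in kv
        )
        user = fields.get("user")
        if user is None:
            user = pending.pop(m.group("pid"), None)
        yield user, m.group("rules").split(",")


if __name__ == "__main__":
    sample = [  # hypothetical lines in the two formats discussed above
        "Aug 3 15:16:19 mailer-03 spamd[22961]: clean message (3.3/5.0) "
        "for bob@example.com:511 in 1.9 seconds, 14453 bytes.",
        "Aug 3 15:16:19 mailer-03 spamd[22961]: result: . 3 - AWL,BAYES_50 "
        "scantime=1.9,size=14453,mid=(unknown),bayes=0.5,autolearn=no",
        "[22289] info: spamd: result: . -2 - BAYES_00,DK_SIGNED "
        "scantime=1.2,size=3225,user=alice@example.org,uid=200,autolearn=ham",
    ]
    for user, rules in recipients(sample):
        print(user, rules)
```

If a result: line has neither a user= field nor a remembered summary line for its pid, the sketch yields None for the user, so such messages can be counted in the overall totals but skipped for per-user statistics.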
RE: generating rule stats from spamd logs
I'd be able to code it in myself, but I'm not fluent in Perl (PHP guy) and, of course, the string parsing functions confuse the hell out of me. LOL. I thought there might be a lot of Perl coders here who could make this a snap. [Recipient-domain-based filtering; date range also.]

Thanks so much!

--
Matthew Yette

-----Original Message-----
From: Matthew Yette
Sent: Thursday, July 28, 2005 12:07 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs

Is there any way to modify this code to accept another command-line argument for domain-specific? Meaning, I want to look for all rule hits for mail destined for domain.com?
RE: generating rule stats from spamd logs
Is there any way to modify this code to accept another command-line argument for domain-specific stats? Meaning, I want to look for all rule hits for mail destined for domain.com?

--
Matthew Yette

-----Original Message-----
From: Dallas L. Engelken [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 27, 2005 1:02 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs

My mistake.. It is fixed, hopefully for good.

v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt
RE: generating rule stats from spamd logs
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 26, 2005 7:15 PM
To: jdow
Cc: users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs

jdow writes:

    From: Chris Santerre [EMAIL PROTECTED]

      Do you mean this script?
      http://www.rulesemporium.com/programs/sa-stats.txt

      Note: It may be named the same as sa-stats.pl, but it is different. Per rule based. Another Dallas miracle!

    Oh? Er, how does it determine if a message was ham or spam? It looks like it is rather random based on the reports. BAYES_99 may well hit on 84.33% of spam. But I doubt, given its score, it hits on 44.53% of ham.

  BTW, it might be quite helpful to rename that script, since there's already an sa-stats.pl in the 'tools' dir, as follows:

    NAME
      sa-stats.pl - Builds received spam/ham report from mail log

Yeah, we know. It was originally only used internally by SARE. But why not share the love :) I'll see about renaming it. sare-stats.pl ?

--Chris
RE: generating rule stats from spamd logs
      Another Dallas miracle!

    Oh? Er, how does it determine if a message was ham or spam? It looks like it is rather random based on the reports. BAYES_99 may well hit on 84.33% of spam. But I doubt, given it's score, it hits on 44.53% of ham.

  The code should be right... It uses spamassassin's judgement, ie

  info: spamd: result: Y 20 - BAYES_99,...
  info: spamd: result: . -2 - AWL,

  44.53% of your ham hit BAYES_99... That gotta tell you something is wrong! My bayes hits break down like

  # ./sa-stats.pl -f spamdlog -n 500 | grep BAYES

  For spam...
     10  BAYES_99   15351   4.46%  45.42%  60.57%
     19  BAYES_50    6443   1.87%  19.06%  25.42%
     31  BAYES_80    1154   0.34%   3.41%   4.55%
     32  BAYES_60    1147   0.33%   3.39%   4.53%
     38  BAYES_95     864   0.25%   2.56%   3.41%
    102  BAYES_00     187   0.05%   0.55%   0.74%
    152  BAYES_40      92   0.03%   0.27%   0.36%
    209  BAYES_20      53   0.02%   0.16%   0.21%
    228  BAYES_05      44   0.01%   0.13%   0.17%

  For ham...
      2  BAYES_00    6959  15.73%  20.59%  82.32%
      9  BAYES_50     623   1.41%   1.84%   7.37%
     20  BAYES_40     296   0.67%   0.88%   3.50%
     24  BAYES_20     267   0.60%   0.79%   3.16%
     29  BAYES_05     217   0.49%   0.64%   2.57%
     73  BAYES_60      51   0.12%   0.15%   0.60%
    113  BAYES_99      24   0.05%   0.07%   0.28%
    142  BAYES_80      14   0.03%   0.04%   0.17%
    280  BAYES_95       2   0.00%   0.01%   0.02%

  So, BAYES_99 hits 0.28% of my ham and 60.57% of my spam.

So from your explanation I should be ignoring the %ofham column in the spam stats and the %ofspam column in ham? Otherwise the stats don't seem to make much sense:

python# ./sa-stats -f maillog.0 -n 500 | grep BAYES

spam rules...
    3  BAYES_99         305   3.49   4.99   46.56   5.59
   10  BAYES_50         172   1.97   2.81   26.26   3.15
   23  BAYES_00         100   1.14   1.64   15.27   1.83
   77  BAYES_80          21   0.24   0.34    3.21   0.38
   85  BAYES_95          19   0.22   0.31    2.90   0.35
  111  BAYES_60          14   0.16   0.23    2.14   0.26
  131  BAYES_05          12   0.14   0.20    1.83   0.22
  186  BAYES_20           7   0.08   0.11    1.07   0.13
  224  BAYES_40           5   0.06   0.08    0.76   0.09
  373  SARE_BAYES_5x8     2   0.02   0.03    0.31   0.04
  387  SARE_BAYES_6x8     2   0.02   0.03    0.31   0.04
  412  SARE_BAYES_7x8     2   0.02   0.03    0.31   0.04

ham rules...
    1  BAYES_00        4079  14.05  66.75  622.75  74.76

BAYES_00 hitting 622% of spam???

    6  BAYES_50         771   2.65  12.62  117.71  14.13
   25  BAYES_40         238   0.82   3.89   36.34   4.36
   35  BAYES_20         190   0.65   3.11   29.01   3.48
   40  BAYES_05         148   0.51   2.42   22.60   2.71
  173  BAYES_60          15   0.05   0.25    2.29   0.27
  232  BAYES_80           9   0.03   0.15    1.37   0.16
  310  BAYES_95           5   0.02   0.08    0.76   0.09
  349  SARE_BAYES_6x6     4   0.01   0.07    0.61   0.07
  416  SARE_BAYES_5x8     2   0.01   0.03    0.31   0.04
  496  SARE_BAYES_5x7     1   0.00   0.02    0.15   0.02

Andy
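The impossible 622.75% can be reproduced by hand, which shows what the buggy column was actually dividing. The check below is a Python illustration; the message totals (655 spam, 5456 ham) are taken from Andy's later run against the same maillog.0 elsewhere in this thread, and the helper name pct is hypothetical.

```python
def pct(count: int, total: int) -> float:
    """Percentage rounded to two decimals, like the report columns."""
    return round(count / total * 100, 2)


NUM_SPAM, NUM_HAM = 655, 5456   # totals from Andy's later run of the same log
BAYES_00_ON_HAM = 4079          # COUNT for BAYES_00 in the ham section
BAYES_99_ON_SPAM = 305          # COUNT for BAYES_99 in the spam section

# Ham section: the ham count was divided by the spam total.
print(pct(BAYES_00_ON_HAM, NUM_SPAM))   # the reported %OFSPAM
print(pct(BAYES_00_ON_HAM, NUM_HAM))    # the reported %OFHAM

# Spam section: the spam count was divided by the ham total.
print(pct(BAYES_99_ON_SPAM, NUM_SPAM))  # the reported %OFSPAM
print(pct(BAYES_99_ON_SPAM, NUM_HAM))   # the reported %OFHAM
```

Both sections use a single count for both class columns, so one of the two columns in each section divides by the wrong total. With far more ham than spam in this log, the ham section's %OFSPAM overshoots 100%.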
RE: generating rule stats from spamd logs
BAYES_00 hits 15.27% of spam on yours, so the %ofspam on top ham rules and the %ofham on top spam rules must be buggy. I'm not running that version with the 5th column. I'll play with it in a bit.

Dallas
Re: generating rule stats from spamd logs
Dallas L. Engelken wrote:

  BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and %ofham on top spam rules must be buggy. i'm not running that version with the 5th column. It must be buggy. i play with it after bit.

  Dallas

Dallas,

Did you see the patch I sent to the SARE list? Just need to swap two hash lookups.

Chris T
RE: generating rule stats from spamd logs
M    10  BAYES_99   15351   4.46%  45.42%  60.57%
M    19  BAYES_50    6443   1.87%  19.06%  25.42%
M    31  BAYES_80    1154   0.34%   3.41%   4.55%
M    32  BAYES_60    1147   0.33%   3.39%   4.53%
M    38  BAYES_95     864   0.25%   2.56%   3.41%
M   102  BAYES_00     187   0.05%   0.55%   0.74%
M   152  BAYES_40      92   0.03%   0.27%   0.36%
M   209  BAYES_20      53   0.02%   0.16%   0.21%
M   228  BAYES_05      44   0.01%   0.13%   0.17%
M
M For ham...
M     2  BAYES_00    6959  15.73%  20.59%  82.32%
M     9  BAYES_50     623   1.41%   1.84%   7.37%
M    20  BAYES_40     296   0.67%   0.88%   3.50%
M    24  BAYES_20     267   0.60%   0.79%   3.16%
M    29  BAYES_05     217   0.49%   0.64%   2.57%
M    73  BAYES_60      51   0.12%   0.15%   0.60%
M   113  BAYES_99      24   0.05%   0.07%   0.28%
M   142  BAYES_80      14   0.03%   0.04%   0.17%
M   280  BAYES_95       2   0.00%   0.01%   0.02%
M
M So, BAYES_99 hits 0.28% of my ham and 60.57% of my spam.
M

You must have a different version from the one now available, because you're missing one column.

Spam

RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  BAYES_99                  468      5.94    75.48    97.91  329.58
   2  RAZOR2_CHECK              422      5.35    68.06    88.28  297.18
   3  RAZOR2_CF_RANGE_51_100    421      5.34    67.90    88.08  296.48
   4  URIBL_BLACK               353      4.48    56.94    73.85  248.59

The %ofham column is obviously wrong, but the others seem fine.

Ham

RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  BAYES_00                  137     37.33    22.10    28.66   96.48
   2  AWL                       112     30.52    18.06    23.43   78.87
   3  HTML_MESSAGE               16      4.36     2.58     3.35   11.27
   7  UPPERCASE_25_50             9      2.45     1.45     1.88    6.34
   8  URIBL_BLACK                 5      1.36     0.81     1.05    3.52

Again, the %ofspam column is wrong here and should be ignored. Nice to see what's false-positiving so I can lower scores accordingly.

Martin
RE: generating rule stats from spamd logs
-----Original Message-----
From: Chris Thielen [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 27, 2005 11:02 AM
To: Dallas L. Engelken
Cc: users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs

Dallas L. Engelken wrote:

  BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and %ofham on top spam rules must be buggy. i'm not running that version with the 5th column. It must be buggy. i play with it after bit.

  Dallas

Dallas,

Did you see the patch I sent to the SARE list? Just need to swap two hash lookups.

Yup yup. http://www.rulesemporium.com/programs/sa-stats.txt updated.

D
RE: generating rule stats from spamd logs
Dallas L. Engelken [EMAIL PROTECTED] wrote on 07/27/2005 11:26:54 AM:

    Dallas,

    Did you see the patch I sent to the SARE list? Just need to swap two hash lookups.

  Yup yup. http://www.rulesemporium.com/programs/sa-stats.txt updated.

  D

Something's still a little fishy. SA 3.1 latest SVN, if it makes any difference.

python# ./sa-stats -f maillog.0 -n 5
Email:   6111  Autolearn:   226  AvgScore:   2.15  AvgScanTime:  3.91 sec
Spam:     655  Autolearn:   133  AvgScore:  14.81  AvgScanTime:  3.76 sec
Ham:     5456  Autolearn:    93  AvgScore:   0.63  AvgScanTime:  3.93 sec

Time Spent Running SA:        6.64 hours
Time Spent Processing Spam:   0.68 hours
Time Spent Processing Ham:    5.96 hours

TOP SPAM RULES FIRED
RANK  RULE NAME        COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  HTML_MESSAGE       496      5.67     8.12    75.73   62.19
   2  DCC_CHECK          310      3.55     5.07    47.33    7.02
   3  BAYES_99           305      3.49     4.99    46.56    0.02
   4  RAZOR2_CHECK       277      3.17     4.53    42.29    4.23
   5  DIGEST_MULTIPLE    251      2.87     4.11    38.32    2.42

TOP HAM RULES FIRED
RANK  RULE NAME        COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  BAYES_00          4079     14.05    66.75   622.75    1.83
   2  HTML_MESSAGE      3393     11.68    55.52   518.02    9.09
   3  NO_REAL_NAME      1053      3.63    17.23   160.76    1.06
   4  HTML_80_90         931      3.21    15.23   142.14    2.35
   5  LG_4C_2V_3C        798      2.75    13.06   121.83    2.20
Re: generating rule stats from spamd logs
He only fixed the spam rules section. The TOP HAM RULES section still has these two incorrect computations...

    my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);

That is: number of times a rule fired on ham / total number of spam messages, and number of times a rule fired on spam / total number of ham messages. They should be:

    my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);

On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:

  Something's still a little fishy. SA 3.1 latest SVN, if it makes any difference.

--
Steve Martin                    http://www.cheezmo.com/
Smart Calibration, LLC          http://www.smartcalibration.com/
The Widescreen Movie Center     http://www.widemovies.com/
Letterboxed Movie TV Schedule   http://www.widemovies.com/lbx.html
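Steve's corrected pair of computations pairs each rule count with the total of its own class. A Python rendering of the same idea is sketched below; the actual script is Perl, the function name rule_percentages is hypothetical, and the zero-total guard is an addition that the quoted Perl lines do not have. The sample counts for BAYES_00 (100 spam hits, 4079 ham hits, out of 655 spam and 5456 ham) come from Andy's reports in this thread.

```python
def rule_percentages(rule, spam_rules, ham_rules, num_spam, num_ham):
    """Return (%OFSPAM, %OFHAM) for one rule as two-decimal strings.

    Same pairing as the corrected Perl lines: spam hits over spam
    messages, ham hits over ham messages.
    """
    spam_hits = spam_rules.get(rule, 0)
    ham_hits = ham_rules.get(rule, 0)
    # Guard against an empty class (assumption: report 0.00 in that case).
    pct_of_spam = spam_hits / num_spam * 100 if num_spam else 0.0
    pct_of_ham = ham_hits / num_ham * 100 if num_ham else 0.0
    return f"{pct_of_spam:.2f}", f"{pct_of_ham:.2f}"


if __name__ == "__main__":
    spam_rules = {"BAYES_00": 100, "BAYES_99": 305}
    ham_rules = {"BAYES_00": 4079}
    print(rule_percentages("BAYES_00", spam_rules, ham_rules, 655, 5456))
    print(rule_percentages("BAYES_99", spam_rules, ham_rules, 655, 5456))
```

Because each count is divided by the total of its own class, neither column can exceed 100, which makes a value like 622.75 a useful sanity check that the lookups have been swapped.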
RE: generating rule stats from spamd logs
My mistake.. It is fixed, hopefully for good.

v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt

TOP SPAM RULES FIRED
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  UNPARSEABLE_RELAY          25322      7.35    74.72    99.76   99.13
   2  URIBL_SBL                  22241      6.46    65.63    87.63    0.38
   3  URIBL_JP_SURBL             21419      6.22    63.20    84.39    0.28
   4  URIBL_BLACK                19436      5.64    57.35    76.57    0.93
   5  RAZOR2_CF_RANGE_51_100     17562      5.10    51.82    69.19    1.34
   6  RAZOR2_CHECK               17475      5.07    51.57    68.85    1.15
   7  SARE_SPEC_ROLEX_REP        16553      4.81    48.84    65.22    0.29
   8  SPOOF_COM2OTH              16537      4.80    48.80    65.15    0.05
   9  RAZOR2_CF_RANGE_E8_51_100  16329      4.74    48.18    64.33    0.16
  10  BAYES_99                   15380      4.47    45.38    60.59    0.28

TOP HAM RULES FIRED
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  UNPARSEABLE_RELAY           8433     18.93    24.88    99.76   99.13
   2  BAYES_00                    7005     15.72    20.67     0.74   82.34
   3  AWL                         4904     11.01    14.47    26.64   57.65
   4  HTML_MESSAGE                3813      8.56    11.25    22.92   44.82
   5  NO_REAL_NAME                1453      3.26     4.29    37.79   17.08
   6  HTML_80_90                  1279      2.87     3.77    10.98   15.03
   7  MIME_HTML_ONLY               972      2.18     2.87     6.88   11.43
   8  HTML_FONT_BIG                794      1.78     2.34     9.28    9.33
   9  BAYES_50                     625      1.40     1.84    25.40    7.35
  10  HTML_FONT_FACE_BAD           545      1.22     1.61     0.76    6.41

From: Steve Martin [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 27, 2005 11:44 AM
To: Andy Jezierski
Cc: Dallas L. Engelken; users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs

He only fixed the spam rules section. The TOP HAM RULES section still has these two incorrect computations...

    my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);

Number of times a rule fired on ham / total number of spam messages.
Number of times a rule fired on spam / total number of ham messages.

They should be:

    my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
    my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);

On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:

Dallas L. Engelken [EMAIL PROTECTED] wrote on 07/27/2005 11:26:54 AM:

-Original Message-
From: Chris Thielen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 27, 2005 11:02 AM
To: Dallas L. Engelken
Cc: users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs

Dallas L. Engelken wrote:

BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and %ofham on top spam rules must be buggy. i'm not running that version with the 5th column. It must be buggy. i play with it after bit.

Dallas

Dallas, Did you see the patch I sent to the SARE list? Just need to swap two hash lookups.

Yup yup. http://www.rulesemporium.com/programs/sa-stats.txt updated.

D

Something's still a little fishy. SA 3.1 latest SVN, if it makes any difference.

python# ./sa-stats -f maillog.0 -n 5

Email:  6111  Autolearn: 226  AvgScore:  2.15  AvgScanTime: 3.91 sec
Spam:    655  Autolearn: 133  AvgScore: 14.81  AvgScanTime: 3.76 sec
Ham:    5456  Autolearn:  93  AvgScore:  0.63  AvgScanTime: 3.93 sec

Time Spent Running SA
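For anyone who does not read Perl, the intent of the computation discussed in this message (%OFSPAM is spam hits over spam messages, %OFHAM is ham hits over ham messages) can be sketched in Python as below. The function name and the dictionary arguments are hypothetical stand-ins for the script's `%SPAM_RULES`, `%HAM_RULES`, `$NUM_SPAM` and `$NUM_HAM`; the zero-total guard is an addition of this sketch, not something the thread says the script does. The sample counts are the HTML_MESSAGE figures from Andy's full run.

```python
def rule_percentages(rule, spam_rules, ham_rules, num_spam, num_ham):
    """Return (%OFSPAM, %OFHAM) for one rule, each rounded to two decimals.

    spam_rules / ham_rules map rule name -> number of spam / ham messages
    the rule fired on. A rule missing from a map counts as zero hits.
    """
    spam_hits = spam_rules.get(rule, 0)
    ham_hits = ham_rules.get(rule, 0)
    # Guard against an empty log (assumption: report 0.00 rather than fail).
    pct_of_spam = spam_hits / num_spam * 100 if num_spam else 0.0
    pct_of_ham = ham_hits / num_ham * 100 if num_ham else 0.0
    return round(pct_of_spam, 2), round(pct_of_ham, 2)

# HTML_MESSAGE: 496 spam hits, 3393 ham hits; 655 spam and 5456 ham messages.
spam_rules = {"HTML_MESSAGE": 496}
ham_rules = {"HTML_MESSAGE": 3393}
result = rule_percentages("HTML_MESSAGE", spam_rules, ham_rules, 655, 5456)
print(result)  # (75.73, 62.19)
```

Because each class is divided by its own message total, neither column can exceed 100 for a rule that is counted once per message.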
RE: generating rule stats from spamd logs
-Original Message-
From: Charles Sprickman [mailto:[EMAIL PROTECTED]
Sent: Monday, July 25, 2005 10:46 PM
To: users@spamassassin.apache.org
Subject: generating rule stats from spamd logs

Hi,

Anyone aware of anything that can parse a day's spamd logs and then give a summary of total hits per rule? I noticed since 3.0.x that all rule hits are in the logs now:

Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - BAYES_60,DATE_IN_FUTURE_03_06,DNS_FROM_RFC_POST,URIBL_BLACK,URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=6.7,size=2027,mid=[EMAIL PROTECTED],bayes=0.781998195315203,autolearn=disabled

I've got three spamd boxes logging to one server. I already run sa-stats.pl daily, but I'd like to see more information about what rules are hitting. I did see a few things in the wiki, but most of them look to be tied to snarfing MTA logs.

Do you mean this script?

http://www.rulesemporium.com/programs/sa-stats.txt

Note: It may be named the same as sa-stats.pl, but it is different. Per rule based. Another Dallas miracle!

Chris Santerre
SysAdmin and SARE/URIBL ninja
http://www.uribl.com
http://www.rulesemporium.com
Re: generating rule stats from spamd logs
From: Chris Santerre [EMAIL PROTECTED]

Do you mean this script?

http://www.rulesemporium.com/programs/sa-stats.txt

Note: It may be named the same as sa-stats.pl, but it is different. Per rule based. Another Dallas miracle!

Oh? Er, how does it determine if a message was ham or spam? It looks like it is rather random based on the reports. BAYES_99 may well hit on 84.33% of spam. But I doubt, given its score, it hits on 44.53% of ham.

{^_^}
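One plausible way to answer the ham-or-spam question from the log alone is to read the verdict that spamd writes into each `result:` line. The sketch below is Python, not the sa-stats code, and it rests on assumptions: that the field after `result:` is `Y` when spamd judged the message spam (anything else is treated as ham), that the next field is the score, and that the rule list follows the ` - `. The first sample line is the one posted in this thread; the second is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the one sample line in this thread:
#   result: <flag> <score> - <comma-separated rules> <key=value details>
RESULT_RE = re.compile(r"result:\s+(\S)\s+(-?\d+)\s+-\s(\S*)")

class ScanResult(NamedTuple):
    is_spam: bool
    score: int
    rules: list

def parse_result(line: str) -> Optional[ScanResult]:
    """Parse one spamd 'result:' log line; return None for other lines."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    flag, score, rules = m.groups()
    # Assumption: 'Y' marks a spam verdict; any other flag is counted as ham.
    return ScanResult(flag == "Y", int(score), [r for r in rules.split(",") if r])

spam_line = (
    "Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - "
    "BAYES_60,DATE_IN_FUTURE_03_06,DNS_FROM_RFC_POST,URIBL_BLACK,"
    "URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL "
    "scantime=6.7,size=2027,mid=[EMAIL PROTECTED],"
    "bayes=0.781998195315203,autolearn=disabled"
)
# Hypothetical ham line, invented for illustration only.
ham_line = (
    "Jul 25 22:45:02 spamd2 spamd[59436]: result: . 0 - "
    "BAYES_00,HTML_MESSAGE scantime=1.2,size=1511,autolearn=no"
)

parsed_spam = parse_result(spam_line)
parsed_ham = parse_result(ham_line)
print(parsed_spam.is_spam, parsed_spam.score, len(parsed_spam.rules))
print(parsed_ham.is_spam, parsed_ham.rules)
```

If the verdict is taken this way, the ham/spam split simply follows spamd's own threshold decision, so any oddity in the percentages would come from the arithmetic rather than from the classification.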
generating rule stats from spamd logs
Hi,

Anyone aware of anything that can parse a day's spamd logs and then give a summary of total hits per rule? I noticed since 3.0.x that all rule hits are in the logs now:

Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - BAYES_60,DATE_IN_FUTURE_03_06,DNS_FROM_RFC_POST,URIBL_BLACK,URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=6.7,size=2027,mid=[EMAIL PROTECTED],bayes=0.781998195315203,autolearn=disabled

I've got three spamd boxes logging to one server. I already run sa-stats.pl daily, but I'd like to see more information about what rules are hitting. I did see a few things in the wiki, but most of them look to be tied to snarfing MTA logs.

Thanks,

Charles
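As a rough illustration of what such a summary involves, here is a minimal Python sketch that tallies total hits per rule from `result:` lines. It is not any existing tool; the regular expression assumes every line of interest has the same layout as the sample above, and the second sample line is hypothetical.

```python
import re
from collections import Counter

# Capture the comma-separated rule list that follows "result: <flag> <score> - ".
RULES_RE = re.compile(r"result:\s+\S\s+-?\d+\s+-\s(\S*)")

def count_rule_hits(lines):
    """Count how many scanned messages each rule fired on."""
    hits = Counter()
    for line in lines:
        m = RULES_RE.search(line)
        if m:
            hits.update(r for r in m.group(1).split(",") if r)
    return hits

SAMPLE_LINES = [
    # The line quoted in the message above.
    "Jul 25 22:44:49 spamd2 spamd[59436]: result: Y 14 - "
    "BAYES_60,DATE_IN_FUTURE_03_06,DNS_FROM_RFC_POST,URIBL_BLACK,"
    "URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL "
    "scantime=6.7,size=2027,mid=[EMAIL PROTECTED],"
    "bayes=0.781998195315203,autolearn=disabled",
    # Hypothetical second line, invented for illustration only.
    "Jul 25 22:45:10 spamd3 spamd[1234]: result: Y 9 - "
    "BAYES_99,URIBL_BLACK scantime=2.1,size=3000,autolearn=no",
    # Non-result lines are ignored.
    "Jul 25 22:45:11 spamd3 spamd[1234]: connection from localhost",
]

hits = count_rule_hits(SAMPLE_LINES)
for rule, count in hits.most_common(3):
    print(f"{rule:30s} {count}")
```

Since the function takes any iterable of lines, an open log file can be passed in directly, and lines from several spamd hosts collected on one syslog server are counted together.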