> -----Original Message-----
> From: Brian Ipsen [mailto:[EMAIL PROTECTED]
> Sent: Sunday, August 24, 2003 10:29 AM
> To: [EMAIL PROTECTED]
> Subject: [Qmail-scanner-general]Suggestion: Option to archive
> all messages tagged by SpamAssassin
>
>
> Hi!
>
> I miss an option, where it is possible to specify that
> qmail-scanner should archive all mails that SpamAssassin
> identifies as spam. The reason for this is that I'd like to
> be able to gather statistics on what rules are triggered on
> each message - and I can only do this either by storing a
> copy of each message - or enabling debug-log in SpamAssassin,
> which unfortunately requires some disk-space. The other way
> around I'm able to process each message and store the needed
> data in an SQL database - and afterwards delete the message.
>
> Regards,
>
> /Brian
>
I would start by pulling the tests=(.*) part out of the X-Spam-Status
header and splitting the matching tests:
    @tests=split(/\,/,$1);
With 2.60-x, you will run into a small problem: the TERSE report is no
longer an on|off option. The X-Spam-Status _REPORT_ is automatically
TERSE and folds at 78 chars, so on emails that match a large number of
rules your tests= will look like
X-Spam-Status: Yes, hits=14.6 required=5.0
	tests=BAYES_99,CLICK_BELOW_CAPS,
	DATE_MISSING,SUBJ_HAS_SPACES,SUBJ_HAS_UNIQ_ID autolearn=no
	version=2.60-rc2
Matched one line at a time, $1 will hold only the first line's worth
("BAYES_99,CLICK_BELOW_CAPS," in the example above) and will not grab
what comes after the fold. You would need to set a $next_header=0 flag
and watch for \t's at the start of a line, which mark a header
continuation.
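The unfolding step could be sketched roughly like this. The helper name spam_tests and the $in_status flag (standing in for $next_header) are made up for illustration, and the sketch assumes the tests list is only ever folded right after a comma, as in the example above:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qmail-scanner or SpamAssassin):
# returns the tests= entries found in a list of raw header lines.
sub spam_tests {
    my @lines = @_;
    my ($status, $in_status) = ('', 0);
    for my $line (@lines) {
        (my $l = $line) =~ s/\r?\n\z//;
        last if $l eq '';                       # blank line: end of headers
        if ($l =~ /^X-Spam-Status:/i) {
            ($status, $in_status) = ($l, 1);
        }
        elsif ($in_status && $l =~ /^[ \t]+(.*)/) {
            $status .= " $1";                   # folded continuation line
        }
        else {
            $in_status = 0;                     # some other header started
        }
    }
    my ($rest) = $status =~ /\btests=(.*)/ or return;
    $rest =~ s/,\s+/,/g;                        # rejoin a list folded after a comma
    my ($list) = $rest =~ /^(\S*)/;             # list ends at the next whitespace
    return grep { length } split /,/, $list;
}

my @tests = spam_tests(
    "X-Spam-Status: Yes, hits=14.6 required=5.0\n",
    "\ttests=BAYES_99,CLICK_BELOW_CAPS,\n",
    "\tDATE_MISSING,SUBJ_HAS_SPACES,SUBJ_HAS_UNIQ_ID autolearn=no\n",
    "\tversion=2.60-rc2\n",
    "Subject: example\n",
);
print join("\n", @tests), "\n";
```

Fed the folded header from above, this should give back all five rule names instead of just the first two.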
It'll take a little work, but it will be much easier than anything else
you are thinking about doing (IMHO).
Then, once you have all the rules in @tests, you can
    $sql="INSERT INTO test_hits (msgid,rule,score) VALUES (?,?,?)";
    ..
    ..
    foreach my $test (@tests) {
        # plain _TESTS_ output has no score; undef is bound as SQL NULL
        $sth->execute($msgid,$test,undef);
    }
I use the score field and run
    _TESTSSCORES(,)_   as above, except with scores appended (e.g. AWL=-3.0,...)
instead of
    _TESTS(,)_         tests hit, separated by , (or other separator)
in the X-Spam-Status: header, so in my foreach loop I split again on
the = sign:
    $sql="INSERT INTO test_hits (msgid,rule,score) VALUES (?,?,?)";
    ..
    ..
    foreach my $test (@tests) {
        # "AWL=-3.0" becomes rule "AWL" and score "-3.0"
        my ($rule,$score) = split(/=/,$test,2);
        $sth->execute($msgid,$rule,$score);
    }
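To try the split step on its own without a database, something like the following could work. The sub name rule_scores is just an illustration; entries without an = sign (plain _TESTS_ output) come back with an undefined score:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "RULE=score,RULE=score" into [rule, score] pairs.
sub rule_scores {
    my ($list) = @_;
    my @rows;
    for my $test (grep { length } split /,/, $list) {
        # limit of 2 so only the first = sign separates rule from score
        my ($rule, $score) = split /=/, $test, 2;
        push @rows, [ $rule, $score ];
    }
    return @rows;
}

for my $row (rule_scores('BAYES_99=5.4,AWL=-3.0')) {
    my ($rule, $score) = @$row;
    # in the real loop: $sth->execute($msgid, $rule, $score);
    printf "%s %s\n", $rule, defined $score ? $score : 'NULL';
}
```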
My test_hits table contains 5 columns:
    CREATE TABLE test_hits (
      id int(10) unsigned NOT NULL auto_increment,
      msgid varchar(254) NOT NULL default '',
      rule varchar(64) NOT NULL default '',
      score float(5,2) NOT NULL default '0.00',
      t timestamp(14) NOT NULL,
      PRIMARY KEY (id),
      KEY msgid (msgid),
      KEY rule (rule)
    ) TYPE=MyISAM COMMENT='SpamAssassin Rule Matches';
and indices on msgid and rule, so I can easily show all rules that
matched for a specific msgid, or show how many messages matched a
certain rule. You could extend it as needed to include env_sender,
recips, spam score, etc.
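The two lookups mentioned above might be written along these lines. This is only a sketch: the sub names are made up, and $dbh is assumed to be an already-connected DBI handle for the database holding test_hits:

```perl
use strict;
use warnings;

# $dbh is assumed to be a connected DBI database handle.

# All rules (with scores) that fired for one message.
sub rules_for_msgid {
    my ($dbh, $msgid) = @_;
    return $dbh->selectall_arrayref(
        'SELECT rule, score FROM test_hits WHERE msgid = ? ORDER BY rule',
        undef, $msgid,
    );
}

# Number of distinct messages that hit one rule.
sub messages_for_rule {
    my ($dbh, $rule) = @_;
    my ($count) = $dbh->selectrow_array(
        'SELECT COUNT(DISTINCT msgid) FROM test_hits WHERE rule = ?',
        undef, $rule,
    );
    return $count;
}
```

Both queries filter on a column that has its own key (msgid or rule), so they should stay cheap as the table grows.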
enjoy, and good luck!
dallas