Re: [qmailtoaster] sa-learn-attach

2011-12-22 Thread Martin Waschbüsch IT-Dienstleistungen
Hi Eric,


On 21.12.2011 at 19:48, Eric Shubert wrote:

 Here's the script I use with a shared folder to learn ham and spam, fwiw:
 #!/bin/bash
 #
 # learn and remove spam and ham in shared folders
 #
 # shubes 3/26/08 - created
 #
 
 learndir=/home/vpopmail/domains/shubes.net/sa-learn
 hambox=.Ham
 spambox=.Spam
 
 do_the_learning(){
 
 learnas=$1
 maildir=$2
 
 shopt -s extglob
 for spamfile in `find $maildir/+(cur|new)/* 2>/dev/null`; do
   sudo -u vpopmail -H sa-learn --$learnas $spamfile
   rc=$?
   # test the saved status: after the assignment above, $? is always 0
   if [ $rc != 0 ]; then
     echo sa-learn failed, rc=$rc, spamfile=$spamfile
     exit $rc
   fi
   rm $spamfile
 done
 }
 
 do_the_learning ham  $learndir/$hambox
 do_the_learning spam $learndir/$spambox
 
 exit 0
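
A side note on error checks of this kind: once an exit status has been saved with rc=$?, any later test should use the saved variable, because $? is reset by every command, including the assignment itself. A minimal sketch, where fails_with is a hypothetical stand-in for a failing sa-learn call:

```shell
#!/bin/bash
# Hypothetical stand-in for a command that fails (e.g. sa-learn on a bad file).
fails_with() { return "$1"; }

fails_with 3
rc=$?            # save the status immediately
after_assign=$?  # status of the assignment above, not of fails_with

echo "saved rc=$rc, \$? after assignment=$after_assign"
if [ "$rc" -ne 0 ]; then
  echo "command failed, rc=$rc"
fi
```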

I am using a version of something I found on the wiki (or as part of qtp?).

It adds the following things:
- loops through all domains and users
- does not touch/learn special files (e.g. dovecot index and cache files)
- moves ham back to the inbox (which is safe with dovecot - I asked the author)
- learns all items with --no-sync and makes SpamAssassin sync the database only
once, at the end (if you host several domains with users actively using the
ham/spam feature, you'll be glad to do that, as perl / SpamAssassin is no
lightweight)


# Let's define our folder conventions:
SPAMDIR=.Spam.Lernen
HAMDIR=.Spam.Korrektur

# find and process each SPAMDIR
for directory in $( find /home/vpopmail/domains -type d -name "$SPAMDIR" );
do
  # then find and process each file in SPAMDIR that is not a dovecot special file
  for file in $( find "$directory" -type f -not \( -name dovecot.index \
      -o -name dovecot.index.log -o -name dovecot.index.cache \
      -o -name dovecot-keywords -o -name dovecot-uidlist \
      -o -name maildirfolder \) );
  do

    # learn the file with sa-learn as spam (use the vpopmail user so it ends up
    # in the correct database)
    sudo -u vpopmail -H sa-learn --no-sync --spam "${file}" >/dev/null 2>&1

    # Spam belongs to nirvana!
    rm -f "${file}" >/dev/null 2>&1

  done
done

# find and process each HAMDIR
for directory in $( find /home/vpopmail/domains -type d -name "$HAMDIR" );
do

  # then find and process each file in HAMDIR that is not a dovecot special file
  for file in $( find "$directory" -type f -not \( -name dovecot.index \
      -o -name dovecot.index.log -o -name dovecot.index.cache \
      -o -name dovecot-keywords -o -name dovecot-uidlist \
      -o -name maildirfolder \) );
  do

    # learn the file with sa-learn as ham (use the vpopmail user so it ends up
    # in the correct database)
    sudo -u vpopmail -H sa-learn --no-sync --ham "${file}" >/dev/null 2>&1

    # move the file back to the INBOX.
    mv "${file}" "${directory}/../cur" >/dev/null 2>&1

  done
done

# to speed up learning, we only sync the journal with the database at the end.
sudo -u vpopmail -H sa-learn --sync >/dev/null 2>&1
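
The two loops above differ only in the folder name, the sa-learn flag, and what happens to the message afterwards, so they could be folded into one function. The following is a rough sketch under some assumptions, not a tested replacement: the names learn_folder, run_sa_learn and DOMAINS_ROOT are made up for illustration, the dovecot exclusion is simplified to a 'dovecot*' glob, and find -print0 with read -d '' is used so that unusual file names survive the loop.

```shell
#!/bin/bash
# Sketch only: DOMAINS_ROOT, run_sa_learn and learn_folder are illustrative names.
DOMAINS_ROOT=${DOMAINS_ROOT:-/home/vpopmail/domains}

# Wrapper around sa-learn (run as vpopmail so the right database is used).
run_sa_learn() { sudo -u vpopmail -H sa-learn "$@"; }

# learn_folder NAME TYPE ACTION
#   NAME   Maildir folder name, e.g. .Spam.Lernen
#   TYPE   spam or ham
#   ACTION delete (remove after learning) or restore (move back to INBOX cur)
learn_folder() {
  local name=$1 type=$2 action=$3 directory file
  find "$DOMAINS_ROOT" -type d -name "$name" -print0 |
  while IFS= read -r -d '' directory; do
    # Simplification: skip anything named dovecot* plus the maildirfolder marker.
    find "$directory" -type f -not \( -name 'dovecot*' -o -name maildirfolder \) -print0 |
    while IFS= read -r -d '' file; do
      # </dev/null keeps the command from consuming the loop's input.
      run_sa_learn --no-sync "--$type" "$file" </dev/null >/dev/null 2>&1
      if [ "$action" = restore ]; then
        mv "$file" "$directory/../cur/"
      else
        rm -f "$file"
      fi
    done
  done
}

# Usage (mirrors the script above):
#   learn_folder .Spam.Lernen    spam delete
#   learn_folder .Spam.Korrektur ham  restore
#   run_sa_learn --sync
```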



[qmailtoaster] sa-learn-attach

2011-12-20 Thread Helmut Fritz
Any of you ever heard of this script?  Would it work to learn on attachments
vs actual emails?  This could be very handy for training SpamAssassin if it
would work, or at least easier on users.  I am definitely not a
coder/scripter, so wondering if anyone can take a look.  Or is there a built
in method for doing this in the newer versions of SpamAssassin?  Thx!


#!/usr/bin/perl
# /lib 20030227
# based on SpamAssassin's sa-learn

use strict;
use warnings;

my $PREFIX = '/usr/local/stow/perl-5.6.1';  # substituted at 'make' time
my $DEF_RULES_DIR = '/usr/local/stow/perl-5.6.1/share/spamassassin';  # substituted at 'make' time
my $LOCAL_RULES_DIR = '/etc/mail/spamassassin';  # substituted at 'make' time

use Mail::SpamAssassin;
use Mail::SpamAssassin::ArchiveIterator;
#use Mail::SpamAssassin::NoMailAudit;
use Mail::SpamAssassin::PerMsgLearner;

use Getopt::Long;
use Pod::Usage;

use MIME::Parser ();

Getopt::Long::Configure(qw(bundling no_getopt_compat
   no_auto_abbrev no_ignore_case));

my ($isspam, $forget, %opt);

GetOptions(
    'spam'                  => sub { $isspam = 1; },
    'ham|nonspam'           => sub { $isspam = 0; },
    'forget'                => \$forget,
    'config-file|C=s'       => \$opt{'config-file'},
    'prefs-file|p=s'        => \$opt{'prefs-file'},

    'no-rebuild|norebuild'  => \$opt{'norebuild'},
    'force-expire'          => \$opt{'force-expire'},

    'randseed=i'            => \$opt{'randseed'},

    'auto-whitelist|a'      => \$opt{'auto-whitelist'},
    'bias-scores|b'         => \$opt{'bias-scores'},

    'debug-level|D'         => \$opt{'debug-level'},
    'version|V'             => \$opt{'version'},
    'help|h|?'              => \$opt{'help'},
) or usage(0, "Unknown option!");


if (defined $opt{'help'}) { usage(0, "For more information read the manual page"); }
if (defined $opt{'version'}) {
    print "SpamAssassin version " . Mail::SpamAssassin::Version() . "\n";
    exit 0;
}
if ( !defined $isspam && !defined $forget ) {
    usage(0, "Please select either --spam, --ham, or --forget");
}

# create the tester factory
my $spamtest = new Mail::SpamAssassin ({
    rules_filename      => $opt{'config-file'},
    userprefs_filename  => $opt{'prefs-file'},
    debug               => defined($opt{'debug-level'}),
    local_tests_only    => 1,
    dont_copy_prefs     => 1,
    PREFIX              => $PREFIX,
    DEF_RULES_DIR       => $DEF_RULES_DIR,
    LOCAL_RULES_DIR     => $LOCAL_RULES_DIR,
});

$spamtest->init (1);

$spamtest->init_learner({
    use_whitelist       => $opt{'auto-whitelist'},
    bias_scores         => $opt{'bias-scores'},
    force_expire        => $opt{'force-expire'},
    caller_will_untie   => 1,
});

if (defined $opt{'randseed'}) {
srand ($opt{'randseed'});
}

# run this lot in an eval block, so we can catch die's and clear
# up the dbs.
eval {
    $SIG{INT} = \&killed;
    $SIG{TERM} = \&killed;

    # new MIME Parser:
    my $parser = new MIME::Parser;

    # don't parse rfc/822 sub-messages:
    $parser->extract_nested_messages(0);

    # don't create files:
    $parser->output_to_core(1);

    # now parse the message: ($entity is a MIME::Entity)
    my $entity = $parser->parse(\*STDIN) or die "parse failed\n";

    # must be multipart message:
    $entity->is_multipart() or die "is not multipart\n";

    my $messagecount = 0;

    # loop over the parts: ($part is a MIME::Entity)
    foreach my $part ($entity->parts()) {

        my $effective_type = $part->effective_type;

        # skip if not a message sub-part:
        next unless $effective_type =~ m{^message/};

        my $body = $part->stringify_body();
        my @body = split (/^/m, $body);
        my $dataref = \@body;

        # my $ma = Mail::SpamAssassin::NoMailAudit->new ('data' => $dataref);
        my $ma = $spamtest->parse($dataref);
        if ($ma->get_pristine_header("X-Spam-Status")) {
            my $newtext = $spamtest->remove_spamassassin_markup($ma);
            my @newtext = split (/^/m, $newtext);
            $dataref = \@newtext;
            # $ma = Mail::SpamAssassin::NoMailAudit->new ('data' => $dataref);
            $ma = $spamtest->parse($dataref);
        }

        $ma->{noexit} = 1;

        my $learner = $spamtest->learn ($ma, undef, $isspam, $forget);
        $messagecount++ if ($learner->did_learn());
        $learner->finish();

    }

    warn "Learned from $messagecount messages.\n";

    if (!$opt{norebuild}) {
        $spamtest->rebuild_learner_caches();
    }
};


if ($@) {
    my $failure = $@;
    $spamtest->finish_learner();
    die $failure;
}

$spamtest->finish_learner();
exit 0;

sub killed {
    $spamtest->finish_learner();
    die "interrupted";
}


sub usage {
my ($verbose, $message) = @_;
my $ver = 

RE: [qmailtoaster] sa-learn-attach

2011-12-20 Thread Helmut Fritz
And a little background on this - my server is both a mail server and acts
as a smart host front end for an Exchange server.  So I cannot use the
typical method of scanning users' junk email folders; I can only have known
spam forwarded back to a spam mailbox and false positives to a ham mailbox.
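
For the forwarding setup described here, a driver along the following lines could feed each forwarded message to the script, which reads one multipart message on STDIN and learns from its message/* parts. This is only a sketch under assumptions: the mailbox paths, the install location /usr/local/bin/sa-learn-attach, and the names feed_mailbox and run_learn_attach are hypothetical, and it relies on users forwarding mail "as attachment" (an inline forward is not multipart and the script would reject it).

```shell
#!/bin/bash
# Sketch only. The mailbox paths and the install location of sa-learn-attach
# are hypothetical; adjust them to the local setup.
SPAM_MAILDIR=${SPAM_MAILDIR:-/home/vpopmail/domains/example.com/spam/Maildir}
HAM_MAILDIR=${HAM_MAILDIR:-/home/vpopmail/domains/example.com/ham/Maildir}

# Hypothetical wrapper so the command can be replaced for a dry run.
run_learn_attach() { sudo -u vpopmail -H /usr/local/bin/sa-learn-attach "$@"; }

# feed_mailbox TYPE MAILDIR
#   Pipe every message in cur/ and new/ to the script on STDIN.
#   TYPE is spam or ham and is passed through as --spam / --ham.
feed_mailbox() {
  local type=$1 maildir=$2 msg
  for msg in "$maildir"/cur/* "$maildir"/new/*; do
    [ -f "$msg" ] || continue            # skip globs that matched nothing
    if run_learn_attach "--$type" < "$msg"; then
      rm -f "$msg"                       # processed without error: remove it
    else
      echo "sa-learn-attach failed for $msg" >&2   # leave it for inspection
    fi
  done
}

# Usage:
#   feed_mailbox spam "$SPAM_MAILDIR"
#   feed_mailbox ham  "$HAM_MAILDIR"
```

Messages the script rejects are left in place, so a forward that was not sent as an attachment can be spotted and handled by hand.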

-Original Message-
From: Helmut Fritz [mailto:hel...@fritz.us.com] 
Sent: Tuesday, December 20, 2011 8:13 PM
To: qmailtoaster-list@qmailtoaster.com
Subject: [qmailtoaster] sa-learn-attach

Any of you ever heard of this script?  Would it work to learn on attachments
vs actual emails?  This could be very handy for training SpamAssassin if it
would work, or at least easier on users.  I am definitely not a
coder/scripter, so wondering if anyone can take a look.  Or is there a built
in method for doing this in the newer versions of SpamAssassin?  Thx!



