Re: [qmailtoaster] sa-learn-attach
Hi Eric,

On 21.12.2011 at 19:48, Eric Shubert wrote:

> Here's the script I use with a shared folder to learn ham and spam, fwiw:
>
> #!/bin/sh
> #
> # learn and remove spam and ham in shared folders
> #
> # shubes 3/26/08 - created
> #
>
> learndir=/home/vpopmail/domains/shubes.net/sa-learn
> hambox=.Ham
> spambox=.Spam
>
> do_the_learning(){
>   learnas=$1
>   maildir=$2
>   shopt -s extglob
>   for spamfile in `find $maildir/+(cur|new)/* 2>/dev/null`; do
>     sudo -u vpopmail -H sa-learn --$learnas $spamfile
>     rc=$?
>     if [ $rc != 0 ]; then
>       echo "sa-learn failed, rc=$rc, spamfile=$spamfile"
>       exit $rc
>     fi
>     rm $spamfile
>   done
> }
>
> do_the_learning ham  $learndir/$hambox
> do_the_learning spam $learndir/$spambox
> exit 0

I am using a version of something I found on the wiki (or as part of qtp?). It adds the following things:

- loop through all domains and users
- do not touch/learn special files (e.g. the dovecot cache)
- move ham back to the inbox (which is safe with dovecot - I asked the author)
- learn items without syncing *before* making SpamAssassin sync the database (if you host several domains with users actively using the ham/spam feature, you'll be glad to do that, as perl / SpamAssassin is no lightweight)

# Let's define our folder conventions:
SPAMDIR=.Spam.Lernen
HAMDIR=.Spam.Korrektur

# find and process each SPAMDIR
for directory in $( find /home/vpopmail/domains -type d -name $SPAMDIR ); do
  # then find and process each file in SPAMDIR that is not a dovecot special file
  for file in $( find $directory -type f -not \( -name dovecot.index -o -name dovecot.index.log -o -name dovecot.index.cache -o -name dovecot-keywords -o -name dovecot-uidlist -o -name maildirfolder \) ); do
    # learn the file with sa-learn as Spam (use the vpopmail user so it ends up in the correct database)
    sudo -u vpopmail -H sa-learn --no-sync --spam ${file} >/dev/null 2>&1
    # Spam belongs to nirvana!
    rm -f ${file} >/dev/null 2>&1
  done
done

# find and process each HAMDIR
for directory in $( find /home/vpopmail/domains -type d -name $HAMDIR ); do
  # then find and process each file in HAMDIR that is not a dovecot special file
  for file in $( find $directory -type f -not \( -name dovecot.index -o -name dovecot.index.log -o -name dovecot.index.cache -o -name dovecot-keywords -o -name dovecot-uidlist -o -name maildirfolder \) ); do
    # learn the file with sa-learn as HAM (use the vpopmail user so it ends up in the correct database)
    sudo -u vpopmail -H sa-learn --no-sync --ham ${file} >/dev/null 2>&1
    # move the file back to the INBOX.
    mv ${file} ${directory}/../cur >/dev/null 2>&1
  done
done

# to speed up learning, we only sync the journal with the database at the end.
sudo -u vpopmail -H sa-learn --sync >/dev/null 2>&1
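
The file-selection step from the list above (skip the dovecot special files, learn everything else) can be sketched on its own. This is only an illustration under assumptions: the function name `list_learnable` is hypothetical, the `dovecot.index*` glob is a slight broadening of the explicit names used in the script, and the sketch prints what it would learn instead of calling sa-learn. Handing the paths to `sh -c` via `-exec ... {} +` avoids the word splitting that `for file in $( find ... )` does on folder names containing spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the posted script): list every regular
# file below the given learn folder except dovecot's bookkeeping files.
# The paths are handed to a small inline script as arguments, so names
# containing spaces are not split into several words.
list_learnable() {
    find "$1" -type f \
        ! \( -name 'dovecot.index*' \
             -o -name 'dovecot-keywords' \
             -o -name 'dovecot-uidlist' \
             -o -name 'maildirfolder' \) \
        -exec sh -c 'for f in "$@"; do printf "would learn: %s\n" "$f"; done' sh {} +
}

# Example call (commented out; the path is only an illustration):
# list_learnable "/home/vpopmail/domains/example.com/user/Maildir/.Spam.Lernen"
```

In a real run the `printf` would be replaced by the `sa-learn --no-sync` call and the `rm` or `mv` that follows it.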
[qmailtoaster] sa-learn-attach
Any of you ever heard of this script? Would it work to learn on attachments vs actual emails? This could be very handy for training SpamAssassin if it would work, or at least easier on users. I am definitely not a coder/scripter, so wondering if anyone can take a look. Or is there a built in method for doing this in the newer versions of SpamAssassin? Thx!

#!/usr/bin/perl
# /lib 20030227
# based on SpamAssassin's sa-learn

use strict;
use warnings;

my $PREFIX          = '/usr/local/stow/perl-5.6.1';                    # substituted at 'make' time
my $DEF_RULES_DIR   = '/usr/local/stow/perl-5.6.1/share/spamassassin'; # substituted at 'make' time
my $LOCAL_RULES_DIR = '/etc/mail/spamassassin';                        # substituted at 'make' time

use Mail::SpamAssassin;
use Mail::SpamAssassin::ArchiveIterator;
#use Mail::SpamAssassin::NoMailAudit;
use Mail::SpamAssassin::PerMsgLearner;
use Getopt::Long;
use Pod::Usage;
use MIME::Parser ();

Getopt::Long::Configure(qw(bundling no_getopt_compat no_auto_abbrev no_ignore_case));

my ($isspam, $forget, %opt);

GetOptions(
  'spam'                 => sub { $isspam = 1; },
  'ham|nonspam'          => sub { $isspam = 0; },
  'forget'               => \$forget,
  'config-file|C=s'      => \$opt{'config-file'},
  'prefs-file|p=s'       => \$opt{'prefs-file'},
  'no-rebuild|norebuild' => \$opt{'norebuild'},
  'force-expire'         => \$opt{'force-expire'},
  'randseed=i'           => \$opt{'randseed'},
  'auto-whitelist|a'     => \$opt{'auto-whitelist'},
  'bias-scores|b'        => \$opt{'bias-scores'},
  'debug-level|D'        => \$opt{'debug-level'},
  'version|V'            => \$opt{'version'},
  'help|h|?'             => \$opt{'help'},
) or usage(0, "Unknown option!");

if (defined $opt{'help'}) {
  usage(0, "For more information read the manual page");
}
if (defined $opt{'version'}) {
  print "SpamAssassin version " . Mail::SpamAssassin::Version() . "\n";
  exit 0;
}

if ( !defined $isspam && !defined $forget ) {
  usage(0, "Please select either --spam, --ham, or --forget");
}

# create the tester factory
my $spamtest = new Mail::SpamAssassin ({
  rules_filename     => $opt{'config-file'},
  userprefs_filename => $opt{'prefs-file'},
  debug              => defined($opt{'debug-level'}),
  local_tests_only   => 1,
  dont_copy_prefs    => 1,
  PREFIX             => $PREFIX,
  DEF_RULES_DIR      => $DEF_RULES_DIR,
  LOCAL_RULES_DIR    => $LOCAL_RULES_DIR,
});

$spamtest->init (1);

$spamtest->init_learner({
  use_whitelist     => $opt{'auto-whitelist'},
  bias_scores       => $opt{'bias-scores'},
  force_expire      => $opt{'force-expire'},
  caller_will_untie => 1,
});

if (defined $opt{'randseed'}) {
  srand ($opt{'randseed'});
}

# run this lot in an eval block, so we can catch die's and clear
# up the dbs.
eval {
  $SIG{INT}  = \&killed;
  $SIG{TERM} = \&killed;

  # new MIME Parser:
  my $parser = new MIME::Parser;
  # don't parse rfc/822 sub-messages:
  $parser->extract_nested_messages(0);
  # don't create files:
  $parser->output_to_core(1);

  # now parse the message: ($entity is a MIME::Entity)
  my $entity = $parser->parse(\*STDIN) or die "parse failed\n";

  # must be multipart message:
  $entity->is_multipart() or die "is not multipart\n";

  my $messagecount = 0;

  # loop over the parts: ($part is a MIME::Entity)
  foreach my $part ($entity->parts()) {
    my $effective_type = $part->effective_type;

    # skip if not a message sub-part:
    next unless $effective_type =~ m{^message/};

    my $body = $part->stringify_body();
    my @body = split (/^/m, $body);
    my $dataref = \@body;

    # my $ma = Mail::SpamAssassin::NoMailAudit->new ('data' => $dataref);
    my $ma = $spamtest->parse($dataref);

    if ($ma->get_pristine_header("X-Spam-Status")) {
      my $newtext = $spamtest->remove_spamassassin_markup($ma);
      my @newtext = split (/^/m, $newtext);
      $dataref = \@newtext;
      # $ma = Mail::SpamAssassin::NoMailAudit->new ('data' => $dataref);
      $ma = $spamtest->parse($dataref);
    }

    $ma->{noexit} = 1;
    my $learner = $spamtest->learn ($ma, undef, $isspam, $forget);
    $messagecount++ if ($learner->did_learn());
    $learner->finish();
  }

  warn "Learned from $messagecount messages.\n";

  if (!$opt{norebuild}) {
    $spamtest->rebuild_learner_caches();
  }
};

if ($@) {
  my $failure = $@;
  $spamtest->finish_learner();
  die $failure;
}

$spamtest->finish_learner();
exit 0;

sub killed {
  $spamtest->finish_learner();
  die "interrupted";
}

sub usage {
  my ($verbose, $message) = @_;
  my $ver =
RE: [qmailtoaster] sa-learn-attach
And a little background on this - my server is both a mail server and acts as a smart host front end for an Exchange server. So I cannot use the typical method of scanning users' junk email folders; I can only have known spam forwarded back to a spam mailbox and false positives to a ham mailbox.

-----Original Message-----
From: Helmut Fritz [mailto:hel...@fritz.us.com]
Sent: Tuesday, December 20, 2011 8:13 PM
To: qmailtoaster-list@qmailtoaster.com
Subject: [qmailtoaster] sa-learn-attach

Any of you ever heard of this script? Would it work to learn on attachments vs actual emails? This could be very handy for training SpamAssassin if it would work, or at least easier on users. I am definitely not a coder/scripter, so wondering if anyone can take a look. Or is there a built in method for doing this in the newer versions of SpamAssassin? Thx!
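
For the forwarding setup described above, one possible wiring is sketched below: every message that lands in the spam or ham mailbox is piped to the script on standard input, because the posted script parses STDIN and learns only from the message/* parts it finds. This is a sketch under assumptions, not a tested setup: the mailbox paths, the `LEARN_CMD` variable, the function name `learn_forwarded`, and the install location of sa-learn-attach are all hypothetical. The `--spam` and `--ham` options are the ones defined in the script's GetOptions block.

```shell
#!/bin/sh
# Assumed locations (hypothetical): adjust to the real script and mailboxes.
LEARN_CMD=${LEARN_CMD:-/usr/local/bin/sa-learn-attach}
SPAM_MAILDIR=${SPAM_MAILDIR:-/home/vpopmail/domains/example.com/spam/Maildir}
HAM_MAILDIR=${HAM_MAILDIR:-/home/vpopmail/domains/example.com/ham/Maildir}

# Feed every message in cur/ and new/ of a Maildir to the learner on stdin.
# $1 = spam|ham, $2 = Maildir path.
# A message is removed only if the learner exits successfully, so a message
# that fails (for example one that is not multipart) stays for inspection.
learn_forwarded() {
    for msg in "$2"/cur/* "$2"/new/*; do
        [ -f "$msg" ] || continue    # skip glob patterns that matched nothing
        if "$LEARN_CMD" "--$1" < "$msg"; then
            rm -f "$msg"
        else
            echo "learning failed for $msg" >&2
        fi
    done
}

# Example invocation (commented out so the sketch has no side effects):
# learn_forwarded spam "$SPAM_MAILDIR"
# learn_forwarded ham  "$HAM_MAILDIR"
```

Users would have to forward the spam as an attachment for this to find anything to learn, and the job should run as the user that owns the Bayes database (vpopmail in the scripts earlier in this thread).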