Re: Training Bayes On A Gateway
I collect spam this way: periodically I scan the mail logs for "unknown user" entries and sort the results. Usernames/email addresses that are repeatedly being "guessed" get an alias entry that forwards the mail to a spam mailbox. I have about 20 of these aliased to the spambox now, and that box gets tons and tons of spam.

Ham is just my own email folders. All legitimate mail I get goes into an archive once I finish dealing with it, and that archive is periodically fed into the Bayes learner.

Ted

On 10/9/2014 12:43 PM, John Traweek CCNA, Sec+ wrote:
> I’ve built a gateway server using sa-exim to filter email for our corporate Microsoft Exchange environment. It’s working pretty well, but I have Bayes turned off because I am unsure how to train it in this type of environment.
>
> Has someone written a how-to article on how to efficiently and continually train Bayes in an environment like this? I was thinking that if specific users could forward SPAM to some box on Exchange and have sa-exim POP it or something to “learn”, that would be ideal, but maybe there is a better way.
>
> Any ideas are appreciated, the easier the better.
>
> TIA…
>
> John Traweek CCNA, Sec+
> Executive Director, Information Technology
> Proud PCI Associate for 18 years
> T: 214.530.0394
>
> Did you know last year, PCI raised over 9 million dollars in donations for our clients? Ask us how!
>
> This Email is covered by the Electronic Communications Privacy Act, 18 U.S.C. Sections 2510-2521 and is legally privileged. The information contained in this Email is only for the intended recipient. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distributions or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us by telephone 1.800.395.4724 X160, and destroy the original message.
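The log scan Ted describes could be sketched roughly as follows. This is not his actual script: the function name `count_unknown_users`, the threshold argument, and the assumed log layout (a rejection line containing "unknown user" plus a `to=<address>` field) are all assumptions that need adjusting to whatever your MTA really logs.

```shell
#!/bin/bash
# Count rejected recipient addresses in a mail log read from stdin.
# Assumption: rejection lines contain "unknown user" (any case) and the
# recipient appears as to=<address>; adapt both patterns to your MTA.
count_unknown_users() {
    local min_hits=${1:-3}   # hypothetical threshold: minimum number of hits to report
    grep -i 'unknown user' \
        | grep -o 'to=<[^>]*>' \
        | sed -e 's/^to=<//' -e 's/>$//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn \
        | awk -v min="$min_hits" '$1 >= min { print $1, $2 }'
}
```

Something like `count_unknown_users 5 < /var/log/maillog` (the log path is also an assumption) prints one "count address" pair per line, most frequently guessed first. Each address it reports is a candidate for an alias pointing at the spam mailbox.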
Re: Training Bayes On A Gateway
On Thu, Oct 9, 2014 at 4:14 PM, John Hardin wrote:
> On Thu, 9 Oct 2014, John Traweek CCNA, Sec+ wrote:
>> I've built a gateway server using sa-exim to filter email for our
>
> This topic comes up fairly regularly. Did you search the list archives on
> terms like "exchange bayes" ?

Since the OP mentioned exim, I'll share a bit of how I did something similar. While I have Exchange in the picture, most of my users are not on it. I wanted to be able to fully reject mail at SMTP time when SpamAssassin scores it as spam (SA itself does not block mail), and not worry about whether exim would change the log format if I did a 'fakereject'. SMTP rejects are nice since I do not quarantine spam. I didn't see this covered on the SA wiki or elsewhere, so I figured I'd share and maybe help someone out.

I use exim's native SA integration, not sa-exim. I also use dovecot for my IMAP users' mailboxes, and this is where my spam mail goes.

In my data ACL within exim.conf, I have:

---
# Call SA and add some headers to the email delivered via normal means if it's non-spam
warn
  spam       = spam:true/defer_ok
  add_header = X-Spam-Score: $spam_score ($spam_bar)
  add_header = X-Spam-Report: $spam_report

# If it's spam (defined as an SA score > 5, i.e. $spam_score_int > 50), then run my
# custom deliver script against the copy of the email in the exim mail spool.
# Exim's mail spool copy won't have the above added headers, so they need to be
# passed in here to show up in the spam mailbox.
warn
  condition = ${if >{$spam_score_int}{50}{1}{0}}
  condition = ${run{/home/spam/bin/deliver incoming-spam $spool_directory/scan/$message_id/$message_id.eml 'X-Spam-Score: $spam_score\nX-Spam-Report: $spam_report'}}

deny
  condition = ${if >{$spam_score_int}{50}{1}{0}}
  message   = .
---

/home/spam/bin/deliver contains:

---
#!/bin/bash
# Usage: deliver MAILBOX FILE HEADERS...
MAILBOX=$1
FILE=$2
shift 2
HEADERS="$*"

# Private temp file instead of a predictable /tmp/deliver.$$ name
TMPFILE=$(mktemp /tmp/deliver.XXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

# Extra headers go first; %b expands any literal \n sequences, as echo -e would
printf '%b\n' "$HEADERS" > "$TMPFILE"

# Exim writes out a standard mbox-style From line, remove it
tail -n +2 "$FILE" >> "$TMPFILE"

# Dovecot's deliver must run as root to do direct delivery
sudo /usr/libexec/dovecot/deliver -d spam -m "$MAILBOX" < "$TMPFILE"
---

HTH, YMMV, HANW :)
Jason

The path to enlightenment is /usr/bin/enlightenment.
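The message rewriting that the deliver script performs (prepend the extra headers, drop the leading mbox-style From line) can be tried in isolation, without dovecot or sudo. The sketch below is only an illustration of that one step; `with_spam_headers` is a hypothetical name and is not part of Jason's setup.

```shell
#!/bin/bash
# Print FILE with its first line (the mbox-style "From " line) removed and
# the given header text placed in front of it. A literal "\n" inside HEADERS
# becomes a line break. with_spam_headers is a hypothetical helper name.
with_spam_headers() {
    local file=$1 headers=$2
    printf '%b\n' "$headers"
    tail -n +2 "$file"
}
```

Running `with_spam_headers message.eml 'X-Spam-Score: 7.2\nX-Spam-Report: test'` on a spool copy and reading the result is a quick way to confirm the headers end up at the top of the header block before wiring the real script into the ACL.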
Re: Training Bayes On A Gateway
On Thu, 9 Oct 2014, John Traweek CCNA, Sec+ wrote:
> I've built a gateway server using sa-exim to filter email for our corporate Microsoft Exchange environment. It's working pretty good, but I have Bayes turned off due to the fact that I am unsure on how to train it in this type of environment.
>
> Has someone written a how to article on how to efficiently continually train Bayes in any environment like this. I was thinking if specific users could forward SPAM to some box on Exchange and have sa-exim POP it or something to "learn" that would be ideal, but maybe there is a better way.
>
> Any ideas are appreciated, the easier the better.
>
> TIA...

This topic comes up fairly regularly. Did you search the list archives on terms like "exchange bayes" ?

There's no explicit coverage of this in the wiki, but these pages may help:

http://wiki.apache.org/spamassassin/SiteWideBayesFeedback
http://wiki.apache.org/spamassassin/RemoteImapFolder

...though I've heard Exchange has deprecated public IMAP folders.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
The reason it took so long to get Bin Laden is that it took the SEALs five years to swim that far into the desert. -- anon
---
861 days since the first successful private support mission to ISS (SpaceX)
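However the samples are collected (a shared folder, a feedback mailbox, or a remote IMAP folder as on the wiki pages above), the last step is to hand them to sa-learn. The following is a minimal sketch of that step for two mbox files and is not taken from the wiki pages. The function name `train_from_mboxes` and the `SA_LEARN` override are assumptions made for illustration; `--spam`, `--ham` and `--mbox` are regular sa-learn options.

```shell
#!/bin/bash
# Feed one mbox of confirmed spam and one mbox of confirmed ham to the
# Bayes learner. SA_LEARN is a hypothetical override so the function can be
# dry-run with a stub; by default the real sa-learn is called.
train_from_mboxes() {
    local spam_mbox=$1 ham_mbox=$2
    local learner=${SA_LEARN:-sa-learn}
    "$learner" --spam --mbox "$spam_mbox" || return 1
    "$learner" --ham --mbox "$ham_mbox" || return 1
}
```

On a gateway this would have to run as the account whose Bayes database the scanner actually reads, otherwise the training lands in a database that is never consulted. `SA_LEARN=echo train_from_mboxes spam.mbox ham.mbox` shows the commands without touching any database.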
Re: Training Bayes On A Gateway
On 09.10.2014 at 21:43, John Traweek CCNA, Sec+ wrote:
> I’ve built a gateway server using sa-exim to filter email for our corporate Microsoft Exchange environment. It’s working pretty good, but I have Bayes turned off due to the fact that I am unsure on how to train it in this type of environment.
>
> Has someone written a how to article on how to efficiently continually train Bayes in any environment like this. I was thinking if specific users could forward SPAM to some box on Exchange and have sa-exim POP it or something to “learn” that would be ideal, but maybe there is a better way.
>
> Any ideas are appreciated, the easier the better

I just decided to stay on spamass-milter, which implies a single user and therefore one central Bayes database. It is trained by a simple script from two folders (ham and spam), and autolearning is disabled. Users are advised to forward samples as attachments, which get added after review; so far that has been no more than 5 per day. The rest is caught by the fact that I currently receive mail for 10 email addresses, including some alias lists, and so face all sorts of crap myself.

The ham folder just contains a lot of my legit mail, as long as it doesn't contain sensitive data.

The machine itself is inbound-only, with postfix transport tables after the filters, so it should match your subject.

So far the results are impressive.

The first script is a wrapper that runs as root, takes care of permissions, and removes duplicates to speed up training in case of a complete rebuild. The sample eml files are renamed with Konqueror to "-mm-dd-#" and so get an automatic number, which makes it easy to remove outdated spam samples and rebuild in a year or two.

The second script does the training itself. It runs as the milter user and is called with "su" from the wrapper; the milter user has /bin/dash as its shell instead of /sbin/nologin.

[root@mail-gw:~]$ cat /scripts/sa-learn.sh
#!/usr/bin/bash

# Home directory and name of the milter user
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"

# Ensure ownership and permissions of the training files
chown -R root:"$SA_MILTER_USER" "$SA_MILTER_HOME/training/ham/"
chown -R root:"$SA_MILTER_USER" "$SA_MILTER_HOME/training/spam/"
chmod 750 "$SA_MILTER_HOME/training/ham/"
chmod 750 "$SA_MILTER_HOME/training/spam/"
chmod 640 "$SA_MILTER_HOME"/training/ham/*.eml
chmod 640 "$SA_MILTER_HOME"/training/spam/*.eml

# Remove duplicates in both folders (fdupes -f omits the first file of each set)
/usr/bin/fdupes -r -f "$SA_MILTER_HOME/training/ham/" | grep -v '^$' | xargs -r -d '\n' rm -v 2> /dev/null
/usr/bin/fdupes -r -f "$SA_MILTER_HOME/training/spam/" | grep -v '^$' | xargs -r -d '\n' rm -v 2> /dev/null

# Run the worker script as the milter user
/usr/bin/su -c "$SA_MILTER_HOME/training/learn.sh $1" "$SA_MILTER_USER"

[root@mail-gw:~]$ cat /var/lib/spamass-milter/training/learn.sh
#!/usr/bin/bash

SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"

# Refuse to run as anyone but the milter user
if [ "$(whoami)" != "$SA_MILTER_USER" ]; then
    echo "The script 'learn.sh' must be run as user '$SA_MILTER_USER'"
    exit 1
fi

cd "$SA_MILTER_HOME" || exit 1
SHOW_HELP="0"

# Accepted arguments: "rebuild", nothing, or a whole number of days
if [ "$1" == "rebuild" ] || [ "$1" == "" ] || [[ "$1" =~ ^[0-9]+$ ]]; then
    if [ "$1" == "rebuild" ]; then
        # Complete rebuild requested: reset Bayes, then learn every sample
        /usr/bin/sa-learn --clear

        # Spam training
        MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
        echo "$MY_TIME: Processing SPAM samples"
        nice -n 19 /usr/bin/sa-learn --progress --spam "$SA_MILTER_HOME"/training/spam/*.eml
        echo ""

        # Ham training
        MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
        echo "$MY_TIME: Processing HAM samples"
        nice -n 19 /usr/bin/sa-learn --progress --ham "$SA_MILTER_HOME"/training/ham/*.eml
        echo ""
    else
        # Default to the current day, or use the number of days given
        if [ "$1" == "" ]; then
            TRAIN_DAYS="1"
        else
            TRAIN_DAYS="$1"
        fi

        # Spam training: only samples modified within the last TRAIN_DAYS days
        MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
        echo "$MY_TIME: Processing SPAM samples"
        nice -n 19 /usr/bin/find "$SA_MILTER_HOME/training/spam/" -type f -name '*.eml' -mtime -"$TRAIN_DAYS" -print0 | xargs -0 -r /usr/bin/sa-learn --spam
        echo ""

        # Ham training: only samples modified within the last TRAIN_DAYS days
        MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
        echo "$MY_TIME: Processing HAM samples"
        nice -n 19 /usr/bin/find "$SA_MILTER_HOME/training/ham/" -type f -name '*.eml' -mtime -"$TRAIN_DAYS" -print0 | xargs -0 -r /usr/bin/sa-learn --ham
        echo ""
    fi
else
    SHOW_HELP="1"
fi

if [ "$1" == "--help" ] || [ "$1" == "-h" ] || [ "$SHOW_HELP" == "1" ]; then
    echo "Bayes maintenance script"
    echo "Usage:"
    echo " rebuild: reset Bayes completely and rebuild it from the samples"
    echo " : age of the samples to train, in days (default: 1)"
    exit
fi

MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
echo "$MY_TIME: Done"
echo ""

# Show the state of the Bayes database and its files
nice -n 19 /usr/bin/sa-learn --dump magic
echo ""
/usr/bin/ls -l -h --time-style=long-iso "$SA_MILTER_HOME/.spamassassin/"
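The message mentions removing outdated samples before a rebuild but shows no script for it. One way this might look is sketched below. It is an assumption on several counts: `prune_samples` and its arguments are hypothetical names, and it goes by file modification time rather than by the date encoded in the file names.

```shell
#!/bin/bash
# List (default) or delete *.eml sample files in DIR that match
# find's "-mtime +MAX_DAYS" test, i.e. files older than MAX_DAYS days.
# prune_samples and its arguments are hypothetical names for this sketch.
prune_samples() {
    local dir=$1 max_days=$2 mode=${3:-list}
    case $max_days in
        ''|*[!0-9]*) echo "max_days must be a whole number" >&2; return 2 ;;
    esac
    if [ "$mode" = "delete" ]; then
        # Print each file as it is removed
        find "$dir" -type f -name '*.eml' -mtime +"$max_days" -print -delete
    else
        find "$dir" -type f -name '*.eml' -mtime +"$max_days" -print
    fi
}
```

Calling `prune_samples /var/lib/spamass-milter/training/spam 365` first only lists the candidates; adding `delete` as a third argument removes them, after which a `rebuild` run of the training script would relearn from the remaining samples.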
Training Bayes On A Gateway
I've built a gateway server using sa-exim to filter email for our corporate Microsoft Exchange environment. It's working pretty well, but I have Bayes turned off because I am unsure how to train it in this type of environment.

Has someone written a how-to article on how to efficiently and continually train Bayes in an environment like this? I was thinking that if specific users could forward SPAM to some box on Exchange and have sa-exim POP it or something to "learn", that would be ideal, but maybe there is a better way.

Any ideas are appreciated, the easier the better.

TIA...

John Traweek CCNA, Sec+
Executive Director, Information Technology
Proud PCI Associate for 18 years
PCI: the data company
Heritage Square . 4835 LBJ Freeway, Suite 1100 . Dallas, TX 75244 . 214.530.0394