Re: Training Bayes On A Gateway

2014-10-09 Thread Ted Mittelstaedt


I collect spam this way: periodically I scan the mail logs for
"unknown user" entries and sort the results. Usernames/email addresses
that are repeatedly being "guessed" get an alias entry that forwards
the spam to a spam mailbox.  I have about 20 of these now that are
aliased to the spambox, and that box gets tons and tons of spam.
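
A rough sketch of the log scan described above. It assumes Postfix-style log lines that contain "User unknown" and a "to=<address>" field; the message does not say which MTA or log format is in use, so the patterns and the function name are illustrative only.

```shell
#!/bin/bash
# Rank the recipient addresses that were rejected as unknown users.
# ASSUMPTION: Postfix-style log lines containing the phrase "User unknown"
# and a "to=<address>" field; other MTAs need a different pattern.
count_unknown_users() {
    local logfile=$1
    grep -i 'user unknown' "$logfile" \
        | grep -o 'to=<[^>]*>' \
        | sed -e 's/^to=<//' -e 's/>$//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn
}
```

Calling `count_unknown_users /var/log/maillog` (a placeholder path) prints one line per address, most frequently guessed first; the addresses at the top are the candidates for a spam-trap alias.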

Ham is just my own email folders: all legitimate mail I get, once I
finish dealing with it, goes into an archive, and that archive is
periodically fed into the Bayes learner.
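
Feeding a mailbox to the learner might look like the sketch below. It assumes the spam box and the ham archive are mbox files, which the message does not state, and the paths in the usage note are placeholders. `--spam`, `--ham` and `--mbox` are standard sa-learn options; the SA_LEARN variable and the function name exist only in this example, so the command can be swapped out for a dry run.

```shell
#!/bin/bash
# Command used for training; override it (e.g. SA_LEARN=echo) for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

# train_mbox spam|ham MBOX
# Hand one mbox file to sa-learn as either spam or ham.
train_mbox() {
    local class=$1 mbox=$2
    case "$class" in
        spam|ham) ;;
        *) echo "usage: train_mbox spam|ham MBOX" >&2; return 2 ;;
    esac
    if [ ! -r "$mbox" ]; then
        echo "cannot read $mbox" >&2
        return 1
    fi
    "$SA_LEARN" "--$class" --mbox "$mbox"
}
```

For example, `train_mbox spam /var/mail/spambox` followed by `train_mbox ham "$HOME/mail/archive"`, run from cron or by hand.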

Ted

On 10/9/2014 12:43 PM, John Traweek CCNA, Sec+ wrote:

I’ve built a gateway server using sa-exim to filter email for our
corporate Microsoft Exchange environment. It’s working pretty well, but
I have Bayes turned off because I am unsure how to train it in this type
of environment. Has someone written a how-to article on how to
efficiently and continually train Bayes in an environment like this?
I was thinking that if specific users could forward spam to some box on
Exchange and have sa-exim POP it or something to “learn” it, that would be
ideal, but maybe there is a better way. Any ideas are appreciated, the
easier the better. TIA…



John Traweek CCNA, Sec+
Executive Director, Information Technology
Proud PCI Associate for 18 years
T: 214.530.0394


Did you know last year, PCI raised over 9 million dollars in donations
for our clients? Ask us how!

This Email is covered by the Electronic Communications Privacy Act, 18
U.S.C. Sections 2510-2521 and is legally privileged. The information
contained in this Email is only for the intended recipient. If the
reader of this message is not the intended recipient, you are hereby
notified that any dissemination, distributions or copying of this
communication is strictly prohibited. If you have received this
communication in error, please notify us by telephone 1.800.395.4724
X160, and destroy the original message.



Re: Training Bayes On A Gateway

2014-10-09 Thread Jason W.
On Thu, Oct 9, 2014 at 4:14 PM, John Hardin  wrote:

> On Thu, 9 Oct 2014, John Traweek CCNA, Sec+ wrote:
>
>> I've built a gateway server using sa-exim to filter email for our
>
> This topic comes up fairly regularly. Did you search the list archives on
> terms like "exchange bayes" ?
>

Since the OP mentioned exim, I'll share a bit of how I did something
similar. While I have Exchange in the picture, most of my users are not on
it.

I wanted to be able to fully reject mail at SMTP time if SpamAssassin
scores it as spam (SA itself does not block mail), and not worry about
whether exim would change the log format if I did a 'fakereject'. SMTP
rejects are nice since I do not quarantine spam. I didn't see this covered
on the SA wiki or elsewhere, so I figured I'd share it in case it helps
someone.

I use exim's native SA integration, not sa-exim. I also use dovecot for my
IMAP users' mailboxes, and this is where my spam mail goes.

In my data ACL within exim.conf, I have:

---

  # Call SA and add some headers to the email delivered via normal means if
  # it's non-spam
  warn  spam       = spam:true/defer_ok
        add_header = X-Spam-Score: $spam_score ($spam_bar)
        add_header = X-Spam-Report: $spam_report

  # If it's spam (defined as an SA score > 5), then run my custom deliver
  # script against the copy of the email in the exim mail spool.
  # Exim's mail spool copy won't have the above added headers, so need to do
  # so here to see them in the spam mailbox.
  warn  condition  = ${if >{$spam_score_int}{50}{1}{0}}
        condition  = ${run{/home/spam/bin/deliver incoming-spam \
                     $spool_directory/scan/$message_id/$message_id.eml \
                     'X-Spam-Score: $spam_score\nX-Spam-Report: $spam_report'}}

  deny  condition  = ${if >{$spam_score_int}{50}{1}{0}}
        message    = .

---
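
The comment above says "SA score > 5" while the conditions compare against 50. That is because exim's $spam_score_int holds the score multiplied by ten, as an integer. A rough shell illustration of the same comparison follows; the function names are made up for this example, and exim's own handling of the last digit (rounding versus truncation) may differ from the rounding used here.

```shell
#!/bin/bash
# Convert a decimal score such as "7.3" into a tenths-based integer (73),
# roughly mirroring what $spam_score_int contains.
score_to_int() {
    awk -v s="$1" 'BEGIN { printf "%d\n", (s < 0 ? s * 10 - 0.5 : s * 10 + 0.5) }'
}

# Succeed when the score is above 5.0, i.e. the integer form is above 50.
is_spam() {
    [ "$(score_to_int "$1")" -gt 50 ]
}
```

With this, `is_spam 5.1` succeeds and `is_spam 5.0` fails, matching the `>{$spam_score_int}{50}` test in the ACL.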

/home/spam/bin/deliver contains:



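#!/bin/bash
# Usage: deliver MAILBOX FILE HEADERS...
# Prepends HEADERS to the spooled message and hands it to dovecot.

MAILBOX=$1
FILE=$2
shift 2
HEADERS="$*"

# Use an unpredictable temp file and always clean it up
TMPFILE=$(mktemp /tmp/deliver.XXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

# echo -e turns the literal \n between the extra headers into a newline
echo -e "$HEADERS" > "$TMPFILE"
# Exim writes out a standard mbox-style From line, remove it
tail -n +2 "$FILE" >> "$TMPFILE"

# Dovecot must be root to do direct delivery
sudo /usr/libexec/dovecot/deliver -d spam -m "$MAILBOX" < "$TMPFILE"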

-- 
HTH, YMMV, HANW :)

Jason

The path to enlightenment is /usr/bin/enlightenment.


Re: Training Bayes On A Gateway

2014-10-09 Thread John Hardin

On Thu, 9 Oct 2014, John Traweek CCNA, Sec+ wrote:


> I've built a gateway server using sa-exim to filter email for our
> corporate Microsoft Exchange environment.  It's working pretty well, but
> I have Bayes turned off because I am unsure how to train it in this type
> of environment.  Has someone written a how-to article on how to
> efficiently and continually train Bayes in an environment like this?
> I was thinking that if specific users could forward spam to some box on
> Exchange and have sa-exim POP it or something to "learn" it, that would be
> ideal, but maybe there is a better way.  Any ideas are appreciated, the
> easier the better.  TIA...


This topic comes up fairly regularly. Did you search the list archives on 
terms like "exchange bayes" ?


There's no explicit coverage of this in the wiki, but these pages may 
help:


http://wiki.apache.org/spamassassin/SiteWideBayesFeedback

http://wiki.apache.org/spamassassin/RemoteImapFolder

...though I've heard Exchange has deprecated public IMAP folders.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The reason it took so long to get Bin Laden is that it took the
  SEALs five years to swim that far into the desert.  -- anon
---
 861 days since the first successful private support mission to ISS (SpaceX)


Re: Training Bayes On A Gateway

2014-10-09 Thread Reindl Harald


Am 09.10.2014 um 21:43 schrieb John Traweek CCNA, Sec+:

> I’ve built a gateway server using sa-exim to filter email for our
> corporate Microsoft Exchange environment.  It’s working pretty well, but
> I have Bayes turned off because I am unsure how to train it in this type
> of environment.  Has someone written a how-to article on how to
> efficiently and continually train Bayes in an environment like this?
> I was thinking that if specific users could forward spam to some box
> on Exchange and have sa-exim POP it or something to “learn” it, that would
> be ideal, but maybe there is a better way.  Any ideas are appreciated,
> the easier the better


i just decided to stay on spamass-milter, which implies a single user and
so one central bayes, trained with a simple script from two folders (ham
and spam), with any autolearning disabled - users are advised to forward
samples as attachments, which get added after review; until now that has
not been more than 5 per day. the rest is caught by the fact that i
currently receive mail for 10 email addresses, including some alias lists,
and so face all sorts of crap


the ham folder just contains a lot of my legit mail, as long as it doesn't
contain sensitive data


the machine itself is inbound-only, with postfix transport tables after
the filters, and so should match your subject


so far the results are impressive

the first script is a wrapper that runs as root, takes care of
permissions and removes duplicates to optimize the training in case of a
complete rebuild. the sample eml files are renamed with Konqueror to
"-mm-dd-#" and so get an automatic number, which makes it easy to remove
outdated spam samples and rebuild in a year or two
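
One way to find outdated samples before a rebuild, as a sketch only. It goes by file modification time rather than the date in the file name, on the assumption that a sample's mtime reflects when it was saved; the function name is made up for this example.

```shell
#!/bin/bash
# list_outdated_samples DIR DAYS
# Print training samples (*.eml) last modified more than DAYS days ago.
list_outdated_samples() {
    local dir=$1 days=$2
    find "$dir" -type f -name '*.eml' -mtime +"$days" -print
}
```

For example, `list_outdated_samples /var/lib/spamass-milter/training/spam 365` lists spam samples older than a year; after reviewing and deleting them, a full rebuild retrains Bayes from what is left.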


the second script does the training itself, runs as the milter user and
is called with "su" from the wrapper; the milter user has /bin/dash as
its shell instead of /sbin/nologin


[root@mail-gw:~]$ cat /scripts/sa-learn.sh
#!/usr/bin/bash
# Home directory and name of the milter user
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"
# Ensure correct permissions on the training files
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/ham/
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/spam/
chmod 750 $SA_MILTER_HOME/training/ham/
chmod 750 $SA_MILTER_HOME/training/spam/
chmod 640 $SA_MILTER_HOME/training/ham/*.eml
chmod 640 $SA_MILTER_HOME/training/spam/*.eml
# Remove duplicates in both folders (xargs -r skips rm when there are none)
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/ham/ | grep -v '^$' | xargs -r rm -v 2> /dev/null
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/spam/ | grep -v '^$' | xargs -r rm -v 2> /dev/null

# Run the worker script as the milter user
/usr/bin/su -c "$SA_MILTER_HOME/training/learn.sh $1" $SA_MILTER_USER

[root@mail-gw:~]$ cat /var/lib/spamass-milter/training/learn.sh
#!/usr/bin/bash
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"
# Refuse to run as anyone other than the milter user
if [ "$(whoami)" != "$SA_MILTER_USER" ]; then
 /bin/echo "The script 'learn.sh' must be run as user '$SA_MILTER_USER'"
 exit 1
fi
cd $SA_MILTER_HOME || exit 1
SHOW_HELP="0"
# Valid arguments: "rebuild", nothing, or a whole number of days
if [ "$1" == "rebuild" ] || [ "$1" == "" ] || [[ "$1" =~ ^[0-9]+$ ]]; then
 # Full rebuild requested
 if [ "$1" == "rebuild" ]; then
  # Bayes reset
  /usr/bin/sa-learn --clear
  # Spam training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing SPAM samples"
  nice -n 19 /usr/bin/sa-learn --progress --spam $SA_MILTER_HOME/training/spam/*.eml
  echo ""
  # Ham training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing HAM samples"
  nice -n 19 /usr/bin/sa-learn --progress --ham $SA_MILTER_HOME/training/ham/*.eml
  echo ""
 else
  # Default to the current day, or use the parameter
  if [ "$1" == "" ]; then
   TRAIN_DAYS="1"
  else
   TRAIN_DAYS="$1"
  fi
  # Spam training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing SPAM samples"
  nice -n 19 /usr/bin/find $SA_MILTER_HOME/training/spam/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r /usr/bin/sa-learn --spam
  echo ""
  # Ham training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing HAM samples"
  nice -n 19 /usr/bin/find $SA_MILTER_HOME/training/ham/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r /usr/bin/sa-learn --ham
  echo ""
 fi
else
 SHOW_HELP="1"
fi
if [ "$1" == "--help" ] || [ "$1" == "-h" ] || [ "$SHOW_HELP" == "1" ]; then
 echo "Bayes maintenance script"
 echo "Usage:"
 echo "  rebuild: reset Bayes completely and rebuild it from the samples"
 echo "  :  age in days of the samples to train (default: 1)"
 exit
fi
MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
echo "$MY_TIME: Done"
echo ""
nice -n 19 /usr/bin/sa-learn --dump magic
echo ""
/usr/bin/ls -l -h --time-style=long-iso $SA_MILTER_HOME/.spamassassin/
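
The `sa-learn --dump magic` call at the end prints, among other things, how many spam (nspam) and ham (nham) messages have been learned. With default settings SpamAssassin does not use Bayes until at least 200 of each have been learned, so a check like the following can tell whether training has gone far enough. This is a sketch: it assumes the usual dump layout, where the count is the third column of the lines ending in "non-token data: nspam" and "non-token data: nham", and the function name is made up for this example.

```shell
#!/bin/bash
# bayes_ready [MIN]
# Read "sa-learn --dump magic" output on stdin, print the learned counts,
# and succeed only when both nspam and nham are at least MIN (default 200).
bayes_ready() {
    local min=${1:-200}
    awk -v min="$min" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min && nham >= min)
        }'
}
```

Usage: `sa-learn --dump magic | bayes_ready && echo "Bayes has enough training data"`.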





Training Bayes On A Gateway

2014-10-09 Thread John Traweek CCNA, Sec+
I've built a gateway server using sa-exim to filter email for our
corporate Microsoft Exchange environment.  It's working pretty well, but
I have Bayes turned off because I am unsure how to train it in this type
of environment.  Has someone written a how-to article on how to
efficiently and continually train Bayes in an environment like this?
I was thinking that if specific users could forward spam to some box on
Exchange and have sa-exim POP it or something to "learn" it, that would be
ideal, but maybe there is a better way.  Any ideas are appreciated, the
easier the better.  TIA...

 






John Traweek CCNA, Sec+
Executive Director, Information Technology
Proud PCI Associate for 18 years
PCI: the data company




Heritage Square . 4835 LBJ Freeway, Suite 1100 . Dallas, TX  75244 . 
214.530.0394
