John Peacock wrote:
Elliot F wrote:
John, did you try any other bayesian filters before going to dspam? I was thinking of doing the same thing as I'm using crm114.......
I looked at crm114, but didn't test it...........
With dspam, I was able to build a shared profile using the SA corpus, and run with that for about two weeks on behalf of the ~400 users I manage. Then I was able to change one line and set everyone loose with their own configuration........
I wound up writing a process which scans the users' "Junk" folder on a daily basis and resubmits those false negatives for retraining. I am able to force everyone to use the web-based "Quarantine" folder for handling false positives.............
John
I just wanted to pick up the thread on grabbing the Junk folder. SpamAssassin's Bayes could perhaps do as well as any other Bayes; it may depend more on how the administrator sets up ham and spam training, systemwide and for individuals, including grabbing users' Thunderbird Junk folders via NFS?!
To put it another way, it sounds like SpamAssassin's Bayes might not be getting the same training as the other Bayes filters being touted here lately. On the other hand, if another Bayes is more convenient to train, it's fair to call it "better".
To make an initial "SA corpus", at least, I grab my own Thunderbird Junk folder via NFS and then empty it, and I also grab my carefully "un-false-positived" Thunderbird Inbox via NFS. In both cases an awk script (hey, not bad, not Perl, but SHORT) rips the folder into individual messages and pipes each one to "safecat" to avoid file-naming collisions. Then I sa-learn the spam and ham. I guess it's too intrusive to grab other people's Junk folders via NFS, RIGHT? But terrorism justifies ignoring user rights? Oh well.
_________________________________________________________
# bigmailfile_to_files.awk called by rip_ham_spam
BEGIN { CONVFMT = "%d" ; OFMT = "%d" ; i = 0 ; file = 0 ; }
{
    ++i
    if ( $1 == "From" && $2 == "-" ) {
        if ( 0 == file ) {
            file = i ;
        } else {
            close("nice -n 19 /usr/bin/safecat tmp .") ;
            file = i ;
        }
    }
    print $0 | "nice -n 19 /usr/bin/safecat tmp ."
}
END { close("nice -n 19 /usr/bin/safecat tmp .") ; }
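For anyone without safecat, the same splitting idea can be sketched in plain bash plus awk. This is only a sketch under assumptions: the function name split_mbox, the msg.NNNNNN file names, and the output-directory argument are all hypothetical, and a simple sequence number stands in for safecat's collision-safe naming, so it is only reasonable when a single process writes to the directory.

```shell
#!/bin/bash
# split_mbox MBOX OUTDIR (hypothetical helper, not part of the original scripts)
# Writes each message of a Thunderbird-style mbox file to its own
# numbered file in OUTDIR.
split_mbox() {
    local mbox=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        # Same separator test as the awk script above: "From - ..." lines.
        $1 == "From" && $2 == "-" {
            if (out != "") close(out)        # finish the previous message
            out = sprintf("%s/msg.%06d", dir, ++n)
        }
        # Anything before the first separator is skipped.
        out != "" { print > out }
    ' "$mbox"
}
```

Called as `split_mbox Junk spam`, it would leave spam/msg.000001, spam/msg.000002 and so on, each starting with its own "From - " line.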
________________________________________________________
#!/bin/bash
# rip_ham_spam calls above bigmailfile_to_files.awk to separate
# big thunderbird mail folders into individual files which are piped
# to safecat for naming (namespace collision avoidance)
if [ -f "$1" ]
then
    # A mail folder was named on the command line: just split it.
    nice -n 19 awk -f /home/bb/bin/bigmailfile_to_files.awk "$1"
else
    inbox=/home/bb/nfs/.thunderbird/*.default/Mail/k-kdom.bushiedarpa.con/Inbox
    if [ -f $inbox ] && [ "$USER" = "root" ]
    then
        pushd /home/bb/m > /dev/null
        junk=/home/bb/nfs/.thunderbird/*.default/Mail/k-kdom.bushiedarpa.con/Junk
        # Spam: split the Junk folder, empty it, keep only the newest slices.
        pushd spam > /dev/null
        [ -d "tmp" ] || mkdir tmp
        cp $junk tmp/target
        nice -n 19 awk -f /home/bb/bin/bigmailfile_to_files.awk $junk
        rm -r tmp
        echo > ${junk}
        for spamslice in $( nice -n 19 ls -1 | nice -n 19 sort -r |\
            nice -n 19 sed -n -e '5000,$p' |\
            nice -n 19 tr '\n' ' ' )
        do
            rm $spamslice 2> /dev/null
        done
        popd > /dev/null
        # Ham: rebuild from the Inbox, capped at about twice the spam count.
        [ -d "ham" ] && rm -r ham
        mkdir -p ham/tmp
        cd ham
        cp $inbox tmp/target
        nice -n 19 awk -f /home/bb/bin/bigmailfile_to_files.awk tmp/target
        rm -r tmp
        for hamslice in $( nice -n 19 ls -1 | nice -n 19 sort -r |\
            nice -n 19 sed -n -e "$[ 2 * $( ls -1 ../spam | wc -l ) ],\$p" |\
            nice -n 19 tr '\n' ' ' )
        do
            rm $hamslice 2> /dev/null
        done
        popd > /dev/null
        nice -n 19 chown -R bb.home /home/bb/m
        # Retrain the Bayes database from scratch.
        sa-learn -C /usr/share/spamassassin --clear
        sa-learn -C /usr/share/spamassassin --no-sync --ham /home/bb/m/ham
        sa-learn -C /usr/share/spamassassin --no-sync --spam /home/bb/m/spam
        sa-learn -C /usr/share/spamassassin --sync
        chown -R spamd.spamd /usr/share/spamassassin
    fi
fi
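The trimming loops in rip_ham_spam rely on `sed -n '5000,$p'` printing from line 5000 onward, so after the reverse sort they delete everything except the 4999 names that sort highest. The same idea can be sketched as a reusable function. The name prune_to_newest and its arguments are hypothetical, and the sketch assumes, as the `sort -r` in the script does, that file names sort in chronological order.

```shell
#!/bin/bash
# prune_to_newest DIR KEEP (hypothetical helper)
# Deletes all but the KEEP files in DIR whose names sort highest.
prune_to_newest() {
    local dir=$1 keep=$2 f
    # Reverse sort lists the highest names first; tail -n +N prints from
    # line N on, i.e. every name after the first KEEP.
    ls -1 "$dir" | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f
    do
        rm -f -- "$dir/$f"
    done
}
```

Reading names line by line avoids the word splitting that the `for x in $( ... )` form performs, so names containing spaces would be handled too.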
I also honeypot web crawlers and Usenet with a dozen nonexistent email addresses to collect some juicy spam. One thing I do with it is take all the IPs from those messages and build my own blacklist database, which goes through tcprules to set RBLSMTPD, which dnsbl looks at even though it runs under pperl. I would say that accounts for three to five percent of my spam denials, because those spammers hit me several times over a short interval.
update_blacklist() { # regress to bashism
    [ "$( hostname )" = "heinous.harmless.info" ] || return 0
    for a in /home/bb/m/sa-rbl/new/* # To: [EMAIL PROTECTED]
    do  [ -f "$a" ] && \
        ( mv $a /home/bb/m/spam
          chown bb.home "/home/bb/m/spam/$( echo $a | sed 's/^.*\///g' )"
        )
    done
    pushd /home/bb/m/spam > /dev/null
    for oldspam in $( nice -n 19 ls -1 | nice -n 19 sort -r |\
        nice -n 19 sed -n -e '5000,$p' | nice -n 19 tr '\n' ' ' )
    do  rm $oldspam 2> /dev/null
    done
    ( for spork in *
      do  nice -n 19 sed -n -e '/^Received:/p' $spork | nice -n 19 tr '][)(' '\n\n\n\n'
      done
    ) | nice -n 19 sed -n -e '/^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$/p' |\
        nice -n 19 sort | nice -n 19 uniq | nice -n 19 sort -n |\
        nice -n 19 sed -e '/[0-9][0-9][0-9][0-9]/d' -e '/[3-9][0-9][0-9]/d' \
            -e '/2[6-9][0-9]/d' -e '/25[6-9]/d' | grep -v "37.79.123." | grep -v "^192" |\
        grep -v "^127" |\
        nice -n 19 tee -a /var/www/spammer.convicts.com/badlist | nice -n 19 sed \
            -e 's/^.*$/&:allow,RBLSMTPD="IP listed at http:\/\/convicts.com\/"/1'
    nice -n 19 chown -R bb.home /home/bb/m
    popd > /dev/null
}
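The heart of update_blacklist is the pipeline that pulls IP addresses out of Received: headers and turns them into tcprules lines. A stripped-down sketch of just that step is below. The function names received_ips and tcprules_lines are hypothetical, the octet check is done numerically in awk instead of with the chain of sed deletes, and only 127.x is excluded here; a real setup would also exclude its own networks, as the greps above do.

```shell
#!/bin/bash
# received_ips (hypothetical helper)
# Reads mail on stdin, prints the unique, valid IPv4 addresses that
# appear between brackets or parentheses on Received: lines.
received_ips() {
    grep '^Received:' |
    tr '][)(' '\n\n\n\n' |
    # Keep only tokens that are exactly four dotted numbers, each <= 255.
    awk -F. '/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ &&
             $1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255' |
    grep -v '^127\.' |
    sort -u
}

# tcprules_lines MESSAGE (hypothetical helper)
# Turns each address on stdin into a tcprules line that sets RBLSMTPD.
tcprules_lines() {
    awk -v msg="$1" '{ printf "%s:allow,RBLSMTPD=\"%s\"\n", $0, msg }'
}
```

Something like `cat spam/* | received_ips | tcprules_lines "IP listed"` would then produce lines of the form `203.0.113.9:allow,RBLSMTPD="IP listed"`, ready to be compiled by tcprules.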
-Bob