Re: [GENERAL] [OT] Tom's/Marc's spam filters?

Joe Conway Wed, 21 Apr 2004 11:07:28 -0700

Michael Chaney wrote:

Make sure you have the latest SA and make sure that Bayesian filtering
is turned on and working, and make sure to train the filter.  Reply to
me offlist if you need a group of 5000 or so spams to help train it.

I've got the latest SA and I'm using Bayesian filtering, autolearn, razor2, dcc, and pyzor. I'm also using relays.ordb.org, sbl.spamhaus.org, bl.spamcop.net, and blackholes.five-ten-sg.com (although I just added that last one yesterday). I've verified that autolearn is working. I have lowered my threshold from the default of 5.0 to 2.5.

I get a comparable amount of spam (~600 to 1000 per day), and my setup *was* about 98% effective until a month or so ago. These days it is more like 80%. I've noticed that much of the spam getting through appears specifically targeted at getting past SA: no HTML, a paragraph of nonsense (or sometimes text lifted from some public domain book), and a one-liner trying to sell me a mortgage or something.

The one thing I had *not* been doing, but started to do as of last night, is to use the false-negatives to explicitly train the Bayesian filter. It was easy enough to set up. I created an hourly cron job as follows:

/usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox

Now I just drop all false negatives into that mailbox, and clean them out periodically. Hopefully that will make a significant improvement.
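
For anyone wanting to copy the idea, here is a rough sketch of how the hourly job could be wrapped so that cron stays quiet when the mailbox is missing or empty. The function name learn_false_negatives and the SA_LEARN variable are made up for illustration; only the sa-learn invocation itself comes from the command above, and the mailbox path is left as the same placeholder.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and learn_false_negatives are illustrative names,
# not part of SpamAssassin. SA_LEARN defaults to the path used above.
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

learn_false_negatives() {
    mbox="$1"
    # Skip the run when the mailbox does not exist or has no messages in it.
    [ -s "$mbox" ] || return 0
    # Train every message in the mbox file as spam.
    "$SA_LEARN" --mbox --spam "$mbox"
}

# Example call (placeholder path, as in the post):
#   learn_false_negatives /path/to/false-neg.mbox
```

The mailbox is deliberately not emptied by the script, so cleaning it out remains a manual step as described above.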

Joe

