Re: [exim] Reducing Spam Assassin Load

Marc Perkel Sat, 01 Oct 2005 08:30:44 -0700

The MD5 fingerprint is an interesting idea. Probably just concatenate specific fields like the From: header and the host it was received from. If the message is "hammy" enough you append its fingerprint to a text file called "blessed.txt". New messages are first checked against the blessed file and if blessed they bypass Spam Assassin.

The blessed file is deleted every 30 minutes by a cron job, which limits the lifetime of a blessing and keeps the list small enough to stay fast.
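
A minimal sketch of the blessed-file idea above, in Python. Only the file name "blessed.txt", the MD5 fingerprint and the choice of From: header plus sending host come from the post; the function names, the lower-casing of the fields and the `hammy_below` score threshold are assumptions made for illustration. Expiry is left to the cron job, so the code treats a missing file as "nothing is blessed".

```python
import hashlib
from pathlib import Path

# File name from the post; its location is an assumption.
BLESSED_FILE = Path("blessed.txt")


def fingerprint(from_header, sending_host):
    """MD5 over the concatenated From: header and sending host."""
    key = from_header.strip().lower() + "\n" + sending_host.strip().lower()
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def is_blessed(fp, path=BLESSED_FILE):
    """True if the fingerprint is listed in the blessed file."""
    try:
        with path.open() as fh:
            return any(line.strip() == fp for line in fh)
    except FileNotFoundError:
        # The cron job may have just deleted the file.
        return False


def should_skip_sa(from_header, sending_host, path=BLESSED_FILE):
    """Check a new message against the blessed file before calling SA."""
    return is_blessed(fingerprint(from_header, sending_host), path)


def record_score(from_header, sending_host, sa_score,
                 hammy_below=0.0, path=BLESSED_FILE):
    """Bless the sender if SA scored the message as hammy enough.

    hammy_below is a hypothetical threshold; lower SA scores mean
    "more likely ham".
    """
    if sa_score >= hammy_below:
        return
    fp = fingerprint(from_header, sending_host)
    if not is_blessed(fp, path):
        with path.open("a") as fh:
            fh.write(fp + "\n")
```

A linear scan of the file is only reasonable because the file is thrown away every 30 minutes and so stays short.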


Not a perfect solution - but I think it could work.

Lanny Jason Godsey wrote:

Maybe you could generate a fingerprint based on the first X lines of
the email and match?

Mail comes in, first 20 lines generate 20 md5 fingerprints.  You
process and store SA score.

Next message comes in, if 80% of the fingerprints match, bypass SA.

This would require integration into some sort of database.  You simply
do a select with an equi-join, possibly keyed on queue id, and see if
it returns 16 or more rows.
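
The per-line matching described above could be sketched like this, using Python's sqlite3 module as a stand-in for "some sort of database". The table layout, the function names and the use of an in-memory database are assumptions for illustration; only the 20 lines, the MD5 fingerprints and the 80% threshold come from the message.

```python
import hashlib
import sqlite3


def line_fingerprints(message, n_lines=20):
    """One MD5 fingerprint per line, for the first n_lines lines."""
    lines = message.splitlines()[:n_lines]
    return [hashlib.md5(l.strip().encode("utf-8")).hexdigest() for l in lines]


def open_store():
    """In-memory store; the schema is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fingerprints (msg_id TEXT, fp TEXT)")
    db.execute("CREATE TABLE scores (msg_id TEXT PRIMARY KEY, sa_score REAL)")
    return db


def remember(db, msg_id, message, sa_score):
    """Store the fingerprints and SA score of a processed message."""
    fps = set(line_fingerprints(message))  # identical lines count once
    db.executemany("INSERT INTO fingerprints VALUES (?, ?)",
                   [(msg_id, fp) for fp in fps])
    db.execute("INSERT INTO scores VALUES (?, ?)", (msg_id, sa_score))


def cached_score(db, message, threshold=0.8):
    """Return a stored SA score if enough fingerprints match, else None."""
    fps = set(line_fingerprints(message))
    if not fps:
        return None
    marks = ",".join("?" for _ in fps)
    row = db.execute(
        "SELECT s.sa_score, COUNT(*) AS hits "
        "FROM fingerprints AS f JOIN scores AS s ON s.msg_id = f.msg_id "
        "WHERE f.fp IN (" + marks + ") "
        "GROUP BY f.msg_id, s.sa_score ORDER BY hits DESC LIMIT 1",
        list(fps),
    ).fetchone()
    # With 20 distinct lines and threshold 0.8 this needs 16 or more rows.
    if row is not None and row[1] >= threshold * len(fps):
        return row[0]
    return None
```

When `cached_score` returns None the message goes through SA as usual and is then passed to `remember`.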

This of course is prone to all kinds of problems. I like to have every
user train their own filters.  This is why dropping 85% of spam prior to
statistical filtering is a bad idea IMHO.

I use DSPAM and SA in tandem; as far as I know I'm the only one using
it in the fashion I have set up.  After a while, SA is simply not
used.

--- Marc Perkel <[EMAIL PROTECTED]> wrote:

Peter Bowyer wrote:

On 01/10/05, Marc Perkel <[EMAIL PROTECTED]> wrote:

One of the things that is creating SA load is processing good email.
I'm trying to figure out a way to bless stuff that I know is ham so I
can bypass spam assassin. And it has to somehow just learn it
automatically.

But that's what SA does - learns what's spam and what's ham by
Bayesian analysis. I'd have thought any attempt to do this up front
would end up duplicating what SA does?

You could experiment with a reputation system which applies positive
scores when an IP sends you ham and negative scores when it sends spam
or fails an up-front test (DNSBL, HELO checks and so on). And set a
threshold for whitelisting around the SA check. But that would prevent
SA learning from known ham - which is an important part of the
Bayesian process.
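
A minimal Python sketch of such a reputation system, assuming an in-memory table keyed by IP address. The class name, the point values and the whitelist threshold are hypothetical; the message only says that ham earns positive scores, spam or failed up-front tests earn negative scores, and a threshold decides when to go around the SA check.

```python
from collections import defaultdict


class IPReputation:
    """Per-IP running score; all point values are illustrative assumptions."""

    def __init__(self, whitelist_at=5, ham_points=1,
                 spam_points=-3, failed_check_points=-1):
        self.whitelist_at = whitelist_at
        self.ham_points = ham_points
        self.spam_points = spam_points
        self.failed_check_points = failed_check_points
        self._scores = defaultdict(int)

    def record_ham(self, ip):
        self._scores[ip] += self.ham_points

    def record_spam(self, ip):
        self._scores[ip] += self.spam_points

    def record_failed_check(self, ip):
        # Failed DNSBL lookup, bad HELO and similar up-front tests.
        self._scores[ip] += self.failed_check_points

    def score(self, ip):
        # Unknown IPs start at zero; .get avoids creating an entry.
        return self._scores.get(ip, 0)

    def may_bypass_sa(self, ip):
        """True once the IP has earned enough positive reputation."""
        return self.score(ip) >= self.whitelist_at
```

Penalising spam more heavily than ham is rewarded means a single bad message drops a host back below the threshold.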

I know SA does that but SA is very processor and resource hungry. One
of the tricks I use to process the volume of email that I do is to
avoid using SA whenever I can. I have eliminated about 85% of spam
before it goes to SA and that has increased my capacity to process
mail greatly.

Now the problem is that all ham has to be processed through SA. Often
I'm getting a lot of ham from the same users or mailing lists which is
the same good message over and over. And it all passes - but it slows
things down.


--

## List details at http://www.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/


--
Marc Perkel - [EMAIL PROTECTED]

Spam Filter: http://www.junkemailfilter.com
   My Blog: http://marc.perkel.com


