Maybe you could generate a fingerprint based on the first X lines of the email and match?
Mail comes in, and the first 20 lines generate 20 MD5 fingerprints. You process the message and store its SA score. When the next message comes in, if 80% of its fingerprints match a stored message, bypass SA. This would require integration with some sort of database: you simply do a SELECT with an equality join, possibly on queue ID, and see if it returns 16 or more rows.

This is of course prone to all kinds of problems. I like to have every user train their own filters, which is why dropping 85% of spam prior to statistical filtering is a bad idea IMHO. I use DSPAM and SA in tandem; as far as I know I'm the only one using them in the fashion I have set up. After a while, SA is simply not used.

--- Marc Perkel <[EMAIL PROTECTED]> wrote:

> Peter Bowyer wrote:
>
> >On 01/10/05, Marc Perkel <[EMAIL PROTECTED]> wrote:
> >
> >>One of the things that is creating SA load is processing good email. I'm
> >>trying to figure out a way to bless stuff that I know is ham so I can
> >>bypass spam assassin. And it has to somehow just learn it automatically.
> >
> >But that's what SA does - learns what's spam and what's ham by
> >Bayesian analysis. I'd have thought any attempt to do this up front
> >would end up duplicating what SA does?
> >
> >You could experiment with a reputation system which applies positive
> >scores when an IP sends you ham and negative scores when it sends spam
> >or fails an up-front test (DNSBL, HELO checks and so on). And set a
> >threshold for whitelisting around the SA check. But that would prevent
> >SA learning from known ham - which is an important part of the
> >Bayesian process.
>
> I know SA does that but SA is very processor and resource hungry. One of
> the tricks I use to process the volume of email that I do is to avoid
> using SA whenever I can. I have eliminated about 85% of spam before it
> goes to SA and that has increased my capacity to process mail greatly.
> Now the problem is that all ham has to be processed through SA.
> Often I'm getting a lot of ham from the same users or mailing lists which is
> the same good message over and over. And it all passes - but it slows
> things down.
>
> --
> ## List details at http://www.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://www.exim.org/eximwiki/
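For what it's worth, here is a minimal sketch of the fingerprint idea from the top of this message, in Python with SQLite standing in for whatever database you would actually integrate with. The table layout, the function names (fingerprints, remember, cached_score) and the msg_id key are all made up for illustration; none of this is Exim or SpamAssassin API. It also dedupes identical lines (blank lines would otherwise all match each other), so the 80% is taken over distinct fingerprints.

```python
import hashlib
import sqlite3

FINGERPRINT_LINES = 20  # the "first X lines"; 20 as in the example
MATCH_PERCENT = 80      # bypass SA when at least this share matches


def fingerprints(body, n=FINGERPRINT_LINES):
    """Return the set of MD5 hex digests of the first n lines of body."""
    lines = body.splitlines()[:n]
    return {hashlib.md5(line.encode("utf-8", "replace")).hexdigest()
            for line in lines}


def open_store(path=":memory:"):
    """Open (or create) the hypothetical fingerprint table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fingerprint ("
        " msg_id TEXT NOT NULL, line_md5 TEXT NOT NULL, sa_score REAL NOT NULL)"
    )
    db.execute(
        "CREATE INDEX IF NOT EXISTS fingerprint_md5 ON fingerprint (line_md5)"
    )
    return db


def remember(db, msg_id, body, sa_score):
    """Store one row per distinct fingerprint, tagged with the SA score."""
    rows = [(msg_id, fp, sa_score) for fp in fingerprints(body)]
    db.executemany("INSERT INTO fingerprint VALUES (?, ?, ?)", rows)
    db.commit()


def cached_score(db, body):
    """Return a stored SA score if enough fingerprints match, else None."""
    fps = sorted(fingerprints(body))
    if not fps:
        return None
    # Integer ceiling of 80%: 16 rows for 20 distinct fingerprints.
    needed = (len(fps) * MATCH_PERCENT + 99) // 100
    placeholders = ",".join("?" * len(fps))
    row = db.execute(
        "SELECT sa_score FROM fingerprint"
        f" WHERE line_md5 IN ({placeholders})"
        " GROUP BY msg_id, sa_score"
        " HAVING COUNT(*) >= ?"
        " ORDER BY COUNT(*) DESC LIMIT 1",
        (*fps, needed),
    ).fetchone()
    return row[0] if row else None
```

The intended use would be to call cached_score(db, body) before handing a message to SA; on a miss, run SA as usual and then remember(db, msg_id, body, score). All the problems mentioned above still apply, since a spammer only has to copy the top of a known-good message.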
