Maybe you could generate a fingerprint based on the first X lines of
the email and match?

Mail comes in, and the first 20 lines generate 20 MD5 fingerprints,
one per line.  You process the message and store its SA score.

Next message comes in, if 80% of the fingerprints match, bypass SA.
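
Roughly what I have in mind, as an untested sketch in Python.  The
function names are made up for illustration, and hashing each raw line
with no normalization is my assumption; real mail would probably need
whitespace and per-recipient tokens cleaned up first.

```python
import hashlib

FINGERPRINT_LINES = 20  # the "first X lines"; 20 as in the example above
MATCH_PERCENT = 80      # bypass threshold from the example above


def line_fingerprints(body, max_lines=FINGERPRINT_LINES):
    """Return one MD5 hex digest per line for the first max_lines lines."""
    lines = body.splitlines()[:max_lines]
    return [hashlib.md5(line.encode("utf-8")).hexdigest() for line in lines]


def should_bypass(new_prints, stored_prints, percent=MATCH_PERCENT):
    """True if at least `percent` of the new fingerprints were seen before."""
    if not new_prints:
        return False
    stored = set(stored_prints)
    hits = sum(1 for fp in new_prints if fp in stored)
    # Integer comparison avoids floating-point rounding at the threshold.
    return hits * 100 >= percent * len(new_prints)
```

With 20 lines, that works out to 16 matching fingerprints before SA is
skipped.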

This would require integration with some sort of database.  You simply
do a SELECT with an equi-join on the fingerprints, possibly keyed by
queue ID, and see whether it returns 16 or more rows.
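
The lookup could be sketched like this, using SQLite from Python purely
for illustration.  The table layout, column names and function names
are all hypothetical, and any SQL database would do; nothing here is
wired into Exim.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the fingerprint store and create the (hypothetical) schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints "
        "(queue_id TEXT, line_no INTEGER, md5 TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_md5 ON fingerprints (md5)")
    return conn


def store_fingerprints(conn, queue_id, digests):
    """Save one row per fingerprinted line of a message."""
    rows = [(queue_id, n, d) for n, d in enumerate(digests)]
    conn.executemany("INSERT INTO fingerprints VALUES (?, ?, ?)", rows)
    conn.commit()


def find_match(conn, queue_id, min_rows=16):
    """Return (earlier_queue_id, hits) for the best match, or None."""
    return conn.execute(
        """
        SELECT seen.queue_id, COUNT(DISTINCT incoming.line_no)
        FROM fingerprints AS incoming
        JOIN fingerprints AS seen
          ON seen.md5 = incoming.md5
         AND seen.queue_id <> incoming.queue_id
        WHERE incoming.queue_id = ?
        GROUP BY seen.queue_id
        HAVING COUNT(DISTINCT incoming.line_no) >= ?
        ORDER BY COUNT(DISTINCT incoming.line_no) DESC
        LIMIT 1
        """,
        (queue_id, min_rows),
    ).fetchone()
```

If find_match() returns a row, you would look up the SA score stored
for that earlier queue ID instead of running SA again.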

This is of course prone to all kinds of problems.  I like to have
every user train their own filters, which is why dropping 85% of spam
prior to statistical filtering is a bad idea IMHO.

I use DSPAM and SA in tandem; as far as I know I'm the only one using
them in the fashion I have set up.  After a while, SA is simply not
used.

--- Marc Perkel <[EMAIL PROTECTED]> wrote:

> 
> 
> Peter Bowyer wrote:
> 
> >On 01/10/05, Marc Perkel <[EMAIL PROTECTED]> wrote:
> >  
> >
> >>One of the things that is creating SA load is processing good email.
> >>I'm trying to figure out a way to bless stuff that I know is ham so I
> >>can bypass spam assassin. And it has to somehow just learn it
> >>automatically.
> >>
> >
> >But that's what SA does - learns what's spam and what's ham by
> >Bayesian analysis. I'd have thought any attempt to do this up front
> >would end up duplicating what SA does?
> >
> >You could experiment with a reputation system which applies positive
> >scores when an IP sends you ham and negative scores when it sends spam
> >or fails an up-front test (DNSBL, HELO checks and so on). And set a
> >threshold for whitelisting around the SA check. But that would prevent
> >SA learning from known ham - which is an important part of the
> >Bayesian process.
> >
> I know SA does that but SA is very processor and resource hungry. One
> of the tricks I use to process the volume of email that I do is to avoid
> using SA whenever I can. I have eliminated about 85% of spam before it
> goes to SA and that has increased my capacity to process mail greatly.
> Now the problem is that all ham has to be processed through SA. Often
> I'm getting a lot of ham from the same users or mailing lists which is
> the same good message over and over. And it all passes - but it slows
> things down.
> 
> 
> -- 
> ## List details at http://www.exim.org/mailman/listinfo/exim-users 
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://www.exim.org/eximwiki/
> 


