The MD5 fingerprint is an interesting idea. You could probably just concatenate
specific fields, like the From: header and the host the message was received
from, and hash the result.
If the message is "hammy" enough, you append the fingerprint to a text file called
"blessed.txt". New messages are first checked against the blessed file,
and if they are blessed they bypass SpamAssassin.
The blessed file is deleted every 30 minutes by a cron job, which limits
how long a blessing lasts and keeps the list short so lookups stay fast.
Not a perfect solution, but I think it could work.
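Roughly what I have in mind, as a minimal Python sketch and not anything that plugs into Exim as-is. The function names, the HAM_THRESHOLD value and the run_spamassassin callback are all made up for illustration; the 30-minute cleanup is assumed to stay in cron (something that just removes blessed.txt), so it is not shown here.

```python
import hashlib

# Hypothetical cutoff: SA scores at or below this count as "hammy" enough.
HAM_THRESHOLD = 0.0


def fingerprint(from_header, received_host):
    """MD5 hex digest of the From: header concatenated with the sending host."""
    key = from_header.strip().lower() + "\n" + received_host.strip().lower()
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def is_blessed(fp, path="blessed.txt"):
    """True if the fingerprint is listed in the blessed file."""
    try:
        with open(path, encoding="ascii") as f:
            return any(line.strip() == fp for line in f)
    except FileNotFoundError:
        # The cron job has just deleted the file: nothing is blessed.
        return False


def bless(fp, path="blessed.txt"):
    """Append a fingerprint to the blessed file, creating it if needed."""
    with open(path, "a", encoding="ascii") as f:
        f.write(fp + "\n")


def handle(from_header, received_host, run_spamassassin, path="blessed.txt"):
    """Skip the scan for blessed senders; otherwise scan and maybe bless.

    run_spamassassin is a caller-supplied function returning the SA score.
    """
    fp = fingerprint(from_header, received_host)
    if is_blessed(fp, path):
        return "bypassed"
    score = run_spamassassin()
    if score <= HAM_THRESHOLD:
        bless(fp, path)
    return "scanned"
```

The linear scan of the file is only acceptable because the file never grows for more than 30 minutes.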
Lanny
Jason Godsey wrote:
Maybe you could generate fingerprints based on the first X lines of
the email and match on those?
Mail comes in, and the first 20 lines generate 20 MD5 fingerprints. You
process the message and store its SA score.
The next message comes in; if 80% of the fingerprints match, bypass SA.
This would require integration with some sort of database. You simply
do a select with an equality join, possibly keyed on queue id, and see
whether it returns 16 or more rows.
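As a rough sketch of that lookup in Python, using an in-memory SQLite database purely for illustration. The table layout, the function names and the choice to hash only distinct lines (so repeated blank lines are not counted twice) are my assumptions, not a tested design.

```python
import hashlib
import sqlite3

LINES_TO_HASH = 20   # "first X lines", with X = 20
MATCH_PERCENT = 80   # bypass SA when at least 80% of fingerprints match


def line_fingerprints(message, n=LINES_TO_HASH):
    """MD5 hex digest of each of the first n lines of the message."""
    return [hashlib.md5(line.encode("utf-8")).hexdigest()
            for line in message.splitlines()[:n]]


def open_store():
    """Create the (hypothetical) fingerprint table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fingerprints (msg_id TEXT, fp TEXT, sa_score REAL)")
    db.execute("CREATE INDEX idx_fp ON fingerprints (fp)")
    return db


def store(db, msg_id, message, sa_score):
    """Record the distinct line fingerprints of a scanned message and its score."""
    rows = [(msg_id, fp, sa_score) for fp in set(line_fingerprints(message))]
    db.executemany("INSERT INTO fingerprints VALUES (?, ?, ?)", rows)


def find_match(db, message):
    """Return the stored SA score of the best-matching message, or None."""
    fps = sorted(set(line_fingerprints(message)))
    if not fps:
        return None
    placeholders = ",".join("?" * len(fps))
    row = db.execute(
        "SELECT msg_id, sa_score, COUNT(*) AS hits FROM fingerprints "
        f"WHERE fp IN ({placeholders}) "
        "GROUP BY msg_id, sa_score ORDER BY hits DESC LIMIT 1",
        fps,
    ).fetchone()
    if row is None:
        return None
    _msg_id, sa_score, hits = row
    # Integer comparison: with 20 distinct lines this means 16 or more rows.
    if hits * 100 >= MATCH_PERCENT * len(fps):
        return sa_score
    return None
```

A caller would try find_match() first and only run SA, followed by store(), when it returns None.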
This is of course prone to all kinds of problems. I like to have every
user train their own filters, which is why dropping 85% of spam prior to
statistical filtering is a bad idea IMHO.
I use DSPAM and SA in tandem; as far as I know I'm the only one using
them in the fashion I have set up. After a while, SA is simply not
used.
--- Marc Perkel <[EMAIL PROTECTED]> wrote:
Peter Bowyer wrote:
On 01/10/05, Marc Perkel <[EMAIL PROTECTED]> wrote:
One of the things that is creating SA load is processing good email. I'm
trying to figure out a way to bless stuff that I know is ham so I can
bypass SpamAssassin. And it has to somehow just learn it automatically.
But that's what SA does: it learns what's spam and what's ham by
Bayesian analysis. I'd have thought any attempt to do this up front
would end up duplicating what SA does?
You could experiment with a reputation system which applies positive
scores when an IP sends you ham and negative scores when it sends spam
or fails an up-front test (DNSBL, HELO checks and so on), and set a
threshold for whitelisting around the SA check. But that would prevent
SA learning from known ham, which is an important part of the
Bayesian process.
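To make the reputation idea concrete, here is a minimal Python sketch. The class name, the penalty sizes and the threshold are invented for illustration and would need tuning; a real setup would also need persistent storage rather than a dict.

```python
# Hypothetical threshold: an IP at or above this score skips the SA check.
WHITELIST_THRESHOLD = 5


class IPReputation:
    """Per-IP score: ham raises it, spam or a failed up-front test lowers it."""

    def __init__(self, threshold=WHITELIST_THRESHOLD):
        self.threshold = threshold
        self.scores = {}  # ip -> running score

    def _adjust(self, ip, delta):
        self.scores[ip] = self.scores.get(ip, 0) + delta

    def record_ham(self, ip):
        self._adjust(ip, 1)

    def record_spam(self, ip, penalty=5):
        self._adjust(ip, -penalty)

    def record_failed_check(self, ip, penalty=2):
        # For example a DNSBL hit or a failed HELO check.
        self._adjust(ip, -penalty)

    def can_skip_sa(self, ip):
        """True once the IP has earned enough positive score."""
        return self.scores.get(ip, 0) >= self.threshold
```

Making the penalties larger than the reward for ham means a single spam from a previously trusted IP sends its mail back through SA.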
I know SA does that, but SA is very processor and resource hungry. One of
the tricks I use to process the volume of email that I do is to avoid
using SA whenever I can. I have eliminated about 85% of spam before it
goes to SA, and that has greatly increased my capacity to process mail.
Now the problem is that all ham has to be processed through SA. Often
I'm getting a lot of ham from the same users or mailing lists, which is
the same good message over and over. And it all passes, but it slows
things down.
--
## List details at http://www.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
--
Marc Perkel - [EMAIL PROTECTED]
Spam Filter: http://www.junkemailfilter.com
My Blog: http://marc.perkel.com