Andy,

Yes, I am keeping a running total, but I'm only adding to the total if
SA tags it as spam (minimum score of 5 by default), so a score of 2
would not be added to the total.  (My immediate goal was to identify
really obvious spamsites.)
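
Roughly, the gating looks like the sketch below. It is Python for illustration only, not the prototype's code; the names update_total and SPAM_THRESHOLD are made up, and I'm assuming a message counts as spam when its score is at or above the threshold.

```python
SPAM_THRESHOLD = 5.0  # assumed: SA's default minimum score for tagging spam


def update_total(total, score, threshold=SPAM_THRESHOLD):
    """Add score to the running total only if the message was tagged as spam."""
    if score >= threshold:
        return total + score
    return total  # ham-level scores (e.g. 2) leave the total unchanged


total = 0.0
for score in [2.0, 7.5, 2.0, 12.0]:
    total = update_total(total, score)
print(total)  # only 7.5 and 12.0 were counted
```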

However, I'm very open to ideas for better/different algorithms...

What I have now is only a simple prototype, so I'm not married to the
algorithm at all.

In fact, I think there should be several different algorithms available,
so each site can pick which works best for them.  A normalized total
could be one of several.

I've been thinking of using average scores, moving averages, ratio of
spam to ham... that sort of thing.  I've also been trying to wrap my
head around how to incorporate arrival rates (how often they send spam)
into the algorithms, but I'm not entirely sure that makes a lot of
sense.
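
To make a few of these concrete, here is a minimal Python sketch of per-IP statistics. Everything in it is an assumption for illustration: the function names, the smoothing factor alpha, the use of an exponentially weighted moving average as the "moving average", and reporting spam as a fraction of all messages rather than a strict spam-to-ham ratio (which avoids dividing by zero when a site sends no ham).

```python
def average_score(scores):
    """Plain mean of all scores seen from one IP address."""
    return sum(scores) / len(scores) if scores else 0.0


def moving_average(scores, alpha=0.3):
    """Exponentially weighted moving average; recent messages count more."""
    avg = None
    for s in scores:
        avg = s if avg is None else alpha * s + (1 - alpha) * avg
    return avg if avg is not None else 0.0


def spam_fraction(scores, threshold=5.0):
    """Share of messages at or above the spam threshold (0.0 to 1.0)."""
    if not scores:
        return 0.0
    spam = sum(1 for s in scores if s >= threshold)
    return spam / len(scores)


history = [2.0, 8.0, 9.0, 1.0]  # made-up scores from one IP address
print(average_score(history), moving_average(history), spam_fraction(history))
```

A mixed sender like the one in history would score high on a plain running total but only 0.5 on spam_fraction, which is the kind of case where a ham-safe algorithm has to be careful.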

The possibilities for algorithms are endless.

I think a major design criterion for any algorithm is that it must not
blacklist any ham sites.  People get upset when they don't get their
ham. ;-)

Do many sites get both ham and spam from the same IP addresses?

My traffic is very black-and-white - either I get spam from a site or I
don't.  This makes a simple algorithm work well for me, but I'm guessing
that many sites aren't like mine.

Can some of you on the list help out here and comment with your traffic
patterns?

Thanks,

Vince


-----Original Message-----
From: Andy Fiddaman [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 23, 2007 6:48 AM
To: Vincent Fleming
Cc: [email protected]
Subject: Re: SA Functional extension suggestions?

On Wed, 18 Apr 2007, Vincent Fleming wrote:
; Here's what I did:  I decided to track spam scores (a running total)
; and a timestamp (of the last spam detection).  If an ipaddr's
; spamscore gets over a certain number (I picked 20), I reject
; connections in mlfi_connect().  I implemented an auto-delisting by
; deducting 1 point per day, so they won't stay on the blacklist
; forever, and then track the number of times I delist them.  I weight
; their scores thereafter with the number of times they've been
; delisted, so they'll re-list automatically if they continue to send
; spam, and list for longer each time. (I multiply the spamscore of all
; new messages by the number of times I've delisted them.)
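
One possible reading of the scheme quoted above, as a Python sketch rather than the actual milter code. The names (Record, record_spam, should_reject) are hypothetical, and two details are assumptions the description leaves open: an address is delisted once its decayed score is no longer over the limit, and the weight is at least 1 so that a never-delisted address is not multiplied by zero.

```python
from dataclasses import dataclass

LIST_THRESHOLD = 20.0    # "I picked 20"
DECAY_PER_DAY = 1.0      # deduct 1 point per day
SECONDS_PER_DAY = 86400


@dataclass
class Record:
    score: float = 0.0      # running total of spam scores
    last_spam: float = 0.0  # timestamp of the last spam detection
    delistings: int = 0     # number of times this address was delisted
    listed: bool = False


def decayed(rec, now):
    """Score after deducting one point per day since the last spam."""
    days = (now - rec.last_spam) / SECONDS_PER_DAY
    return max(0.0, rec.score - DECAY_PER_DAY * days)


def record_spam(rec, spam_score, now):
    """Add a new spam score, weighted by the delisting count."""
    weight = max(1, rec.delistings)  # assumption: never weight by zero
    rec.score = decayed(rec, now) + spam_score * weight
    rec.last_spam = now
    if rec.score > LIST_THRESHOLD:
        rec.listed = True


def should_reject(rec, now):
    """Connect-time check; delists once the score has decayed enough."""
    if rec.listed and decayed(rec, now) <= LIST_THRESHOLD:
        rec.listed = False
        rec.delistings += 1
    return rec.listed
```

With these assumptions, an address that is delisted twice has every later spam score doubled, so it crosses the limit sooner and takes longer to decay back under it.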

This is an interesting approach. Do you really mean running total
though?  That would mean that 10 messages scoring 2 would trigger the
blacklist.
A normalised total sum(score - 5) would make more sense here.
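
As a rough illustration of the difference, here is a small Python sketch. The function names are made up for the example, and 5 is taken as the spam threshold as in the formula above.

```python
SPAM_THRESHOLD = 5.0


def running_total(scores):
    """Plain sum of every score, spam or not."""
    return sum(scores)


def normalised_total(scores, threshold=SPAM_THRESHOLD):
    """sum(score - threshold): ham-level scores pull the total down."""
    return sum(s - threshold for s in scores)


scores = [2.0] * 10  # ten messages scoring 2 each
print(running_total(scores))     # 20.0, which reaches a limit of 20
print(normalised_total(scores))  # -30.0, nowhere near the limit
```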

Thanks,

Andy
