On Mon, 23 Apr 2007, Vincent Fleming wrote:
;
; Can some of you on the list help out here and comment with your traffic
; patterns?

Quite happy to. It certainly looks like this approach is going to be
useful.

If you look at http://www.fiddaman.net/t.html you'll see a table generated
from my last few hours' worth of mail - I've blocked out the last two
octets of the IP addresses but otherwise it's as it comes out of the
database. I've only shown IPs which have sent more than 50 messages;
Norm5tot is sum(score - 5), i.e. the total score normalised around 5.

Based on this small sample of data, the split between spam and non-spam
sources looks very clear cut - there may well be some scope for weighting
in both directions in the same way that AWL does.

I'm cautious, so I'm wondering about something like the following
algorithm (the numbers are just starting points).

Using a data window of the last 5 days,
IF average score > 20 AND total score normalised around 5 > 500
 AND Ham/Spam ratio < 0.1
        Start randomly sampling email from the IP such that ~1 in 20 is
        passed to SA, the others just get assigned the average score.
ENDIF
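
To make the rule above concrete, here is a rough Python sketch of the
check and the 1-in-20 sampling. It is only an illustration, not tested
against a real milter: the function names, the stats dictionary layout and
the run_sa callback are all made up for the example, and treating an IP
with zero spam as "never qualifies" is my own assumption.

```python
import random

# Thresholds are the starting-point numbers from the pseudocode above.
AVG_SCORE_MIN = 20.0
NORM5_TOTAL_MIN = 500.0
HAM_SPAM_RATIO_MAX = 0.1
SAMPLE_RATE = 1 / 20  # ~1 in 20 messages is still passed to SA


def is_sampled_source(avg_score, norm5_total, ham, spam):
    """Return True if an IP's stats for the data window meet all three conditions."""
    # Assumption: an IP with no spam at all never qualifies
    # (this also avoids dividing by zero).
    if spam == 0:
        return False
    return (avg_score > AVG_SCORE_MIN
            and norm5_total > NORM5_TOTAL_MIN
            and ham / spam < HAM_SPAM_RATIO_MAX)


def score_message(stats, run_sa, rng=random):
    """Score one message from an IP.

    stats  -- hypothetical per-IP dict with keys avg, norm5tot, ham, spam,
              or None if the IP is not in the lookup table
    run_sa -- hypothetical callback that runs a full SA scan, returns a score
    """
    if stats is None or not is_sampled_source(
            stats["avg"], stats["norm5tot"], stats["ham"], stats["spam"]):
        return run_sa()          # unknown or unremarkable IP: scan as normal
    if rng.random() < SAMPLE_RATE:
        return run_sa()          # the ~1 in 20 sample keeps the stats honest
    return stats["avg"]          # everything else gets the average score
```

Passing the random generator in as a parameter is just so the sampling can
be seeded when testing.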

That data table took just a few seconds to generate from the log data I
record anyway so I can easily convert it into a lookup table for the
milter and update it every hour or so.
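
As a sketch of what that conversion might look like (my real log schema
isn't shown here, so the (ip, score) record layout and the names below are
purely illustrative, and counting a message as spam when it scores 5 or
more is an assumption on my part):

```python
from collections import defaultdict

SPAM_THRESHOLD = 5.0  # assumption: a score of 5 or more counts as spam
MIN_MESSAGES = 50     # only keep IPs with more than 50 messages


def build_lookup(records):
    """Aggregate (ip, score) pairs into a per-IP lookup table.

    records is assumed to be already limited to the data window.
    """
    scores_by_ip = defaultdict(list)
    for ip, score in records:
        scores_by_ip[ip].append(score)

    table = {}
    for ip, scores in scores_by_ip.items():
        if len(scores) <= MIN_MESSAGES:
            continue  # too few messages to say anything about this IP
        spam = sum(1 for s in scores if s >= SPAM_THRESHOLD)
        table[ip] = {
            "avg": sum(scores) / len(scores),
            "norm5tot": sum(s - SPAM_THRESHOLD for s in scores),
            "ham": len(scores) - spam,
            "spam": spam,
        }
    return table
```

Rebuilding the whole table each hour rather than updating it in place
keeps the milter side to a simple dictionary lookup.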

Of course, the SA approach would be to implement this as a plugin but I
have to say that the idea of avoiding the SA overhead completely appeals.
A plugin wouldn't be too difficult though; the existing AWL plugin has
most of the code and structure required.

A.
