Tony Earnshaw wrote:
Tom Allison wrote:
[...]
What about the tokens and the signature from the first instance?
What are the chances that I could do this without doing data integrity
damage or suffering other inconsistencies in performance?
If it had been such a good idea, don't you reckon that Jonathan (who
after all wrote dspam in the first place) would have "stumbled" on it
long ago?
--Tonni
If that were the case, then nobody would ever have invented the wheel either, since
someone else might have stumbled on that one too...
I was working on a few assumptions:
a token is essentially a representation of a regex match in either case, CRM114 or
Bayes; any overlap between the two sets of tokens is purely coincidental.
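
To make that concrete, here is a rough Python sketch (not dspam's or CRM114's actual
code) of the two kinds of tokenization I have in mind: plain regex word tokens on the
Bayes side, and SBPH-style sliding-window combinations on the CRM114 side, as I
understand SBPH. The regex, window size, and "<skip>" marker are placeholders; real
CRM114 hashes the combinations rather than storing strings:

import re

def word_tokens(text):
    # Bayes-style tokens: each regex match becomes a token
    return re.findall(r"[A-Za-z0-9$.\-']+", text.lower())

def sbph_features(tokens, window=5):
    # SBPH-style sketch: slide a window over the token stream and emit every
    # combination that keeps the newest token and either keeps or skips each
    # older one (2**(window-1) features per position)
    feats = []
    for i in range(len(tokens)):
        w = tokens[max(0, i - window + 1):i + 1]
        older, newest = w[:-1], w[-1]
        for mask in range(2 ** len(older)):
            parts = [t if (mask >> j) & 1 else "<skip>"
                     for j, t in enumerate(older)]
            feats.append(" ".join(parts + [newest]))
    return feats

toks = word_tokens("Buy cheap meds online now")
print(toks)                      # plain word tokens
print(sbph_features(toks)[:6])   # a few of the SBPH combinations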
How you manipulate the tokens, based on history, depends on the method of calculation
(Markov, chi-square, or naive Bayes), but all of those methods depend on the same base
history of good/bad messages and good/bad tokens.
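
For example, a toy Python sketch of that idea, with made-up counts, a Graham-style
per-token probability, and a Robinson/Fisher chi-square combiner standing in for the
real implementations; both calculations are driven off the same good/bad history:

import math

# hypothetical shared token history: per-token good/bad hits plus
# overall counts of good/bad messages trained
history = {
    "viagra": {"ham": 1, "spam": 40},
    "meeting": {"ham": 30, "spam": 2},
    "free": {"ham": 10, "spam": 25},
}
n_ham_msgs, n_spam_msgs = 200, 150

def token_prob(tok):
    # Graham-style per-token spam probability from the shared counts
    c = history.get(tok, {"ham": 0, "spam": 0})
    g = min(1.0, 2 * c["ham"] / n_ham_msgs)
    b = min(1.0, c["spam"] / n_spam_msgs)
    if g + b == 0:
        return 0.5
    return max(0.01, min(0.99, b / (g + b)))

def naive_combine(tokens):
    # "naive" combining: multiply the per-token odds
    ps = [token_prob(t) for t in tokens]
    prod = math.prod(ps)
    inv = math.prod(1 - p for p in ps)
    return prod / (prod + inv)

def chi2_combine(tokens):
    # Fisher/Robinson chi-square combining over the *same* probabilities
    ps = [token_prob(t) for t in tokens]
    def chi2q(x2, v):          # survival function, even degrees of freedom
        m = x2 / 2.0
        s = t = math.exp(-m)
        for i in range(1, v // 2):
            t *= m / i
            s += t
        return min(s, 1.0)
    h = chi2q(-2 * sum(math.log(1 - p) for p in ps), 2 * len(ps))
    s = chi2q(-2 * sum(math.log(p) for p in ps), 2 * len(ps))
    return (1 + h - s) / 2

msg = ["free", "viagra", "meeting"]
print(naive_combine(msg), chi2_combine(msg))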
So a signature can consist of both naive-Bayes-derived tokens and SBPH-derived tokens.
Learning from, or correcting, a token in either case means applying a correction
(+1/-1) to its historical count, so the data and its history remain consistent.
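
In other words, something like this (again a hypothetical sketch, not dspam's storage
format): training bumps a counter, retraining a mistake backs it out, and it doesn't
matter which tokenizer produced the token:

from collections import defaultdict

# hypothetical shared count store, keyed by token string/hash,
# no matter which method produced the token
counts = defaultdict(lambda: {"innocent": 0, "spam": 0})

def learn(tokens, as_spam, delta=+1):
    # delta=+1 to learn, delta=-1 to back out a mistaken classification
    key = "spam" if as_spam else "innocent"
    for t in set(tokens):
        counts[t][key] = max(0, counts[t][key] + delta)

word_toks = ["limited", "time", "offer"]              # Bayes-style tokens
sbph_toks = ["limited time", "limited <skip> offer"]  # SBPH-style tokens
signature = word_toks + sbph_toks                     # one stored signature

learn(signature, as_spam=True)             # first pass called it spam
learn(signature, as_spam=True, delta=-1)   # user corrects: undo the spam hit
learn(signature, as_spam=False)            # relearn the same tokens as innocent
print(dict(counts))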
The more variations you can deploy in checking for spam, the better the chance that
something will get trapped.
The biggest advantage dspam can provide is a lighter-weight naive or chi-square
determination, removing some of the more obvious spam via quarantine, followed by the
slower CRM114 methodology to further classify what's left over from the Bayes
determination.
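
Roughly this shape, as a hypothetical sketch with made-up scores and thresholds, just
to show the staging:

QUARANTINE, DELIVER, UNSURE = "quarantine", "deliver", "unsure"

def fast_stage(msg):
    # stand-in for the lightweight naive/chi-square score in [0, 1]
    m = msg.lower()
    score = 0.95 if "viagra" in m else 0.5 if "prize" in m else 0.1
    if score > 0.9:
        return QUARANTINE
    if score < 0.3:
        return DELIVER
    return UNSURE

def slow_stage(msg):
    # stand-in for the heavier CRM114/SBPH pass, run only on the leftovers
    return QUARANTINE if "click here" in msg.lower() else DELIVER

def filter_message(msg):
    verdict = fast_stage(msg)
    if verdict == UNSURE:
        verdict = slow_stage(msg)  # only pay the CRM114 cost when needed
    return verdict

for m in ["Cheap viagra now", "Lunch at noon?", "Click here for a prize"]:
    print(m, "->", filter_message(m))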
It probably won't work, because there just isn't enough data captured about the
tokens. But if it were truly a bad idea, then why do so many people use multiple
filters to capture spam?