Re: perceptron and over-scoring (Re: Over-scoring of SURBL lists... )
On Monday, February 20, 2006, 12:39:31 PM, Theo Dinter wrote:
> Just for some info... I went through the set1 spam logs for 3.1 score
> generation.
>
> 1112804  total messages
>  776108  messages hit SURBL
>  138407  1 SURBL list(s) hit  (1+ = 776108)
>  189795  2 SURBL list(s) hit  (2+ = 637701)
>  281255  3 SURBL list(s) hit  (3+ = 447906)
>  136964  4 SURBL list(s) hit  (4+ = 166651)
>   29685  5 SURBL list(s) hit  (5+ = 29687)
>       2  6 SURBL list(s) hit  (6+ = 2)
>
> The set1 ham logs:
>
>  477629  total messages
>    1023  messages hit SURBL
>     992  1 SURBL list(s) hit  (1+ = 1023)
>      23  2 SURBL list(s) hit  (2+ = 31)
>       5  3 SURBL list(s) hit  (3+ = 8)
>       3  4 SURBL list(s) hit  (4+ = 3)
>       0  5 SURBL list(s) hit  (5+ = 0)
>       0  6 SURBL list(s) hit  (6+ = 0)
>
> So from these results, the FP rate is very low for SURBL (0.21%), and
> while there is a ton of overlap for spam (57.3%), there's very little
> for ham (0.01%).

Thank you for the data. They seem to support what we've been saying.

At a count of 138407, messages that hit only one SURBL list are a
significant share, so substantially lowering the score of a single list
hit may result in significant FNs.

Cheers,

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
Re: perceptron and over-scoring (Re: Over-scoring of SURBL lists... )
On Tue, 2006-02-21 at 06:53 -0800, Jeff Chan wrote:
> On Monday, February 20, 2006, 12:39:31 PM, Theo Dinter wrote:
> > Just for some info... I went through the set1 spam logs for 3.1 score
> > generation.
> >
> > 1112804  total messages
> >  776108  messages hit SURBL
> >  138407  1 SURBL list(s) hit  (1+ = 776108)
> >  189795  2 SURBL list(s) hit  (2+ = 637701)
> >  281255  3 SURBL list(s) hit  (3+ = 447906)
> >  136964  4 SURBL list(s) hit  (4+ = 166651)
> >   29685  5 SURBL list(s) hit  (5+ = 29687)
> >       2  6 SURBL list(s) hit  (6+ = 2)
> >
> > The set1 ham logs:
> >
> >  477629  total messages
> >    1023  messages hit SURBL
> >     992  1 SURBL list(s) hit  (1+ = 1023)
> >      23  2 SURBL list(s) hit  (2+ = 31)
> >       5  3 SURBL list(s) hit  (3+ = 8)
> >       3  4 SURBL list(s) hit  (4+ = 3)
> >       0  5 SURBL list(s) hit  (5+ = 0)
> >       0  6 SURBL list(s) hit  (6+ = 0)
> >
> > So from these results, the FP rate is very low for SURBL (0.21%), and
> > while there is a ton of overlap for spam (57.3%), there's very little
> > for ham (0.01%).
>
> Thank you for data. They seem to support what we've been saying.
>
> At a count of 138407, messages that hit only 1 SURBL are significant,
> so lowering the scoring of a single list hit significantly may result
> in significant FNs.

But maybe we need a scoring scheme like this:

- the current SURBL score if the message is only on that one list
- if on list1 and list2, then not a score of list1+list2 but more like
  a basic SURBL score + fixed value
- if on list1, list2 and list3, then not a score of list1+list2+list3
  but more like a basic SURBL score + 2*(fixed value)

21% of all the SURBL-hitting spam hit 4 or more lists (166651 of
776108). If such a message were an FP (not very likely, but possible),
the summed score would be too high to compensate for. With a scoring
rule like the one above, the score of a spam message hitting 4 lists
would be, e.g.:

basic SURBL score = 3
fixed value = 1
score = 3 + 3*1 = 6

And maybe for a SURBL list with a very low FP rate, the fixed value
could be raised.

Maurice Lucas
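A minimal sketch of the scheme Maurice describes, for comparison with plain summing. This is not SpamAssassin code: the function names are hypothetical, `base=3.0` and `step=1.0` are taken from his example, and the uniform per-list score of 3 used for the additive case is an assumption made only for illustration.

```python
def surbl_score(lists_hit, base=3.0, step=1.0):
    """Base score for the first list hit, plus a fixed step per extra list.

    `base` and `step` are illustrative values from the example in the
    message above, not scores from any released rule set.
    """
    if lists_hit <= 0:
        return 0.0
    return base + (lists_hit - 1) * step


def additive_score(lists_hit, per_list=3.0):
    """Plain summing, assuming (hypothetically) every list scores `per_list`."""
    return max(lists_hit, 0) * per_list


for n in range(1, 7):
    print(n, surbl_score(n), additive_score(n))

# A message on 4 lists scores 6.0 under the proposal, versus 12.0 when
# four (assumed) per-list scores of 3 are simply added together.
```

With these numbers a single-list hit keeps its full score, which addresses Jeff's FN concern, while multi-list hits grow linearly by the smaller step rather than by a full list score each time.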
Re: perceptron and over-scoring (Re: Over-scoring of SURBL lists... )
On Mon, Feb 20, 2006 at 07:38:42PM +, Justin Mason wrote:
> yes, I'm a little worried about that, too.

Just for some info... I went through the set1 spam logs for 3.1 score
generation.

1112804  total messages
 776108  messages hit SURBL
 138407  1 SURBL list(s) hit  (1+ = 776108)
 189795  2 SURBL list(s) hit  (2+ = 637701)
 281255  3 SURBL list(s) hit  (3+ = 447906)
 136964  4 SURBL list(s) hit  (4+ = 166651)
  29685  5 SURBL list(s) hit  (5+ = 29687)
      2  6 SURBL list(s) hit  (6+ = 2)

The set1 ham logs:

 477629  total messages
   1023  messages hit SURBL
    992  1 SURBL list(s) hit  (1+ = 1023)
     23  2 SURBL list(s) hit  (2+ = 31)
      5  3 SURBL list(s) hit  (3+ = 8)
      3  4 SURBL list(s) hit  (4+ = 3)
      0  5 SURBL list(s) hit  (5+ = 0)
      0  6 SURBL list(s) hit  (6+ = 0)

So from these results, the FP rate is very low for SURBL (0.21%), and
while there is a ton of overlap for spam (57.3%), there's very little
for ham (0.01%).

--
Randomly Generated Tagline:
Winny and I lived in a house that ran on static electricity... If you
wanted to run the blender, you had to rub balloons on your head... if
you wanted to cook, you had to pull off a sweater real quick...
    -- Steven Wright
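For readers who want to check the three percentages, here is a small sketch that recomputes them from the counts in the message above. It assumes "overlap" means messages hitting two or more SURBL lists as a share of all messages in the corpus, which is the reading that reproduces the quoted figures. The variable and function names are illustrative only.

```python
# Counts copied from the set1 logs quoted above: {lists hit: messages}.
spam_total = 1112804
spam_by_lists = {1: 138407, 2: 189795, 3: 281255, 4: 136964, 5: 29685, 6: 2}
ham_total = 477629
ham_by_lists = {1: 992, 2: 23, 3: 5, 4: 3, 5: 0, 6: 0}


def at_least(counts, n):
    """Number of messages that hit n or more SURBL lists."""
    return sum(c for lists, c in counts.items() if lists >= n)


def pct(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


ham_hit_rate = pct(at_least(ham_by_lists, 1), ham_total)    # FP rate
spam_overlap = pct(at_least(spam_by_lists, 2), spam_total)  # 2+ lists, spam
ham_overlap = pct(at_least(ham_by_lists, 2), ham_total)     # 2+ lists, ham

print(f"FP rate:      {ham_hit_rate:.2f}%")   # 0.21%
print(f"spam overlap: {spam_overlap:.1f}%")   # 57.3%
print(f"ham overlap:  {ham_overlap:.2f}%")    # 0.01%
```

The same helper confirms the cumulative column: `at_least(spam_by_lists, 4)` gives 166651, matching the "4+" figure in the table.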
Re: perceptron and over-scoring (Re: Over-scoring of SURBL lists... )
Theo Van Dinter writes:
> On Mon, Feb 20, 2006 at 07:38:42PM +, Justin Mason wrote:
> > yes, I'm a little worried about that, too.
>
> So from these results, the FP rate is very low for SURBL (0.21%), and
> while there is a ton of overlap for spam (57.3%), there's very little
> for ham (0.01%).

aha, that's very interesting!

--j.
Re: perceptron and over-scoring (Re: Over-scoring of SURBL lists... )
Hi!

> > On Mon, Feb 20, 2006 at 07:38:42PM +, Justin Mason wrote:
> > > yes, I'm a little worried about that, too.
> >
> > So from these results, the FP rate is very low for SURBL (0.21%), and
> > while there is a ton of overlap for spam (57.3%), there's very little
> > for ham (0.01%).
>
> aha, that's very interesting!

And no surprise, we have been discussing this internally also and really
see very few FP reports overlapping.

And please, if you DO get a FP, _report_ ! ([EMAIL PROTECTED])

Thanks!
Raymond.