Couple of useful tests

2005-06-01 Thread Craig Jackson

Hi,
I created these tests which I find very accurate for detecting spam and 
so thought I'd let the list have a view. Lots of numbers or consonants 
in the reply-to usually bodes ill.


header REPLY_TO_NUMS_CJ Reply-To =~ /[0-9]{6,}/
score REPLY_TO_NUMS_CJ 5.000
header RET_PATH_NUMS_CJ Return-path =~ /[0-9]{6,}/
score RET_PATH_NUMS_CJ 5.000
header REPLY_TO_CONSON_CJ Reply-To =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
score REPLY_TO_CONSON_CJ 5.000
header RET_PATH_CONSON_CJ Return-path =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
score RET_PATH_CONSON_CJ 5.000


If you can improve, please do -- have no mercy.

Craig Jackson
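
A quick way to try the four patterns outside SpamAssassin is a small Python sketch like the one below. It only approximates how `header` rules are evaluated (plain regex search on the header value), and the sample addresses are invented for illustration, not taken from real mail.

```python
import re

# The consonant class shared by the two *_CONSON_CJ rules.
CONSONANTS = r"[bcdfghjklmnpqrstvwxyz]{5,}.*@"

# Rule name -> (header it inspects, compiled pattern).
# re.IGNORECASE mirrors the /i flag on the consonant rules.
RULES = {
    "REPLY_TO_NUMS_CJ": ("Reply-To", re.compile(r"[0-9]{6,}")),
    "RET_PATH_NUMS_CJ": ("Return-Path", re.compile(r"[0-9]{6,}")),
    "REPLY_TO_CONSON_CJ": ("Reply-To", re.compile(CONSONANTS, re.IGNORECASE)),
    "RET_PATH_CONSON_CJ": ("Return-Path", re.compile(CONSONANTS, re.IGNORECASE)),
}

def rules_hit(headers):
    """Return the names of the rules whose pattern matches the given headers."""
    hits = []
    for name, (header, pattern) in RULES.items():
        if pattern.search(headers.get(header, "")):
            hits.append(name)
    return hits

# Hypothetical header sets, made up for this sketch.
spammy = {"Reply-To": "xkqzrtw8812345@example.com",
          "Return-Path": "<bounce@example.com>"}
normal = {"Reply-To": "jane.doe@example.com",
          "Return-Path": "<jane.doe@example.com>"}

print(rules_hit(spammy))
print(rules_hit(normal))
```

The first sample trips both Reply-To rules and the second trips none. A real corpus run is still needed to judge false positives.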



Re: Couple of useful tests

2005-06-01 Thread qqqq
Have you run it through the corpus tests?





Re: Couple of useful tests

2005-06-01 Thread Fred
I am checking it now, I will have results in a few minutes.


qqqq wrote:
> Have you run it through the corpus tests?
 
 
 



Re: Couple of useful tests

2005-06-01 Thread Wolfgang Zeikat



On 06/01/05 20:50, Craig Jackson wrote:

> Hi,
> I created these tests which I find very accurate for detecting spam and
> so thought I'd let the list have a view. Lots of numbers or consonants
> in the reply-to usually bodes ill.


Good point about the reply-to, thanks!


> header REPLY_TO_NUMS_CJ Reply-To =~ /[0-9]{6,}/
> score REPLY_TO_NUMS_CJ 5.000
> header RET_PATH_NUMS_CJ Return-path =~ /[0-9]{6,}/
> score RET_PATH_NUMS_CJ 5.000
> header REPLY_TO_CONSON_CJ Reply-To =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
> score REPLY_TO_CONSON_CJ 5.000
> header RET_PATH_CONSON_CJ Return-path =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
> score RET_PATH_CONSON_CJ 5.000


I'd suggest removing the y there. Shouldn't that be Return-Path instead
of Return-path?


Speaking of Return-Paths, have you checked your rules against mailing
list software (ezmlm?!) envelope sender addresses? IIRC, they slightly
resemble what you are trying to match ...


Regards,

wolfgang


RE: Couple of useful tests

2005-06-01 Thread Pierre Thomson
Even after corpus tests, I never give a single rule a score over 3 (local 
threshold is 6).  There's no reason a real live person couldn't choose a 
consonant-only email name, and I know of some universities that give out 
addresses like [EMAIL PROTECTED] which would trigger your first rule.

Pierre Thomson
BIC
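
To make the arithmetic behind that advice concrete, here is a minimal Python sketch. It assumes a message is tagged once the summed rule scores reach the threshold, and it uses a made-up ID-style address, since the real example above is redacted.

```python
import re

def is_tagged(rule_scores, threshold):
    """True when the summed scores of the rules that fired reach the threshold."""
    return sum(rule_scores) >= threshold

THRESHOLD = 6.0  # the local threshold mentioned above

# One 5-point rule is not enough alone, but leaves only 1 point of headroom.
print(is_tagged([5.0], THRESHOLD))
# If the Reply-To and Return-Path variants both fire, the message is tagged
# on these two rules alone.
print(is_tagged([5.0, 5.0], THRESHOLD))
# With a cap of 3 per rule, at least two independent rules must agree.
print(is_tagged([3.0], THRESHOLD))
print(is_tagged([3.0, 3.0], THRESHOLD))

# Hypothetical university-style address built around a numeric ID.
NUMS = re.compile(r"[0-9]{6,}")  # pattern from REPLY_TO_NUMS_CJ
student = "s1234567@example.edu"
print(bool(NUMS.search(student)))
```

With a threshold of 6, two of the 5-point rules firing together are enough to tag a message, while rules capped at 3 need at least two independent hits. The ID-style address matches the digits pattern even though nothing about it is spammy.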





Re: Couple of useful tests

2005-06-01 Thread Matt Kettler
Wolfgang Zeikat wrote:

> I'd suggest removing the y there. Shouldn't that be Return-Path instead
> of Return-path?

In SpamAssassin, header names are case-insensitive.

> Speaking of Return-Paths, have you checked your rules against mailing
> list software (ezmlm?!) envelope sender addresses? IIRC, they slightly
> resemble what you are trying to match ...

True, although most only use 5 digits or so. I guess it depends a lot on how
deep the list archive goes...

ie, this list:
[EMAIL PROTECTED]
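
The digit-count point can be checked with a short Python sketch. The envelope-sender layout below (list name, `-return-`, message number, subscriber with `=` in place of `@`) is an assumption about what an ezmlm-style address looks like, and every name in it is hypothetical.

```python
import re

NUMS = re.compile(r"[0-9]{6,}")  # pattern from RET_PATH_NUMS_CJ

def verp_sender(list_name, msg_number, subscriber, list_host):
    """Build an ezmlm-style envelope sender (layout assumed for illustration)."""
    user, domain = subscriber.split("@")
    return f"{list_name}-return-{msg_number}-{user}={domain}@{list_host}"

# Same hypothetical list and subscriber, with a 5-digit and a 6-digit message number.
short = verp_sender("users", 12345, "jdoe@example.com", "lists.example.org")
long_ = verp_sender("users", 123456, "jdoe@example.com", "lists.example.org")

print(short, bool(NUMS.search(short)))
print(long_, bool(NUMS.search(long_)))
```

The five-digit message number stays under the `{6,}` limit, while the six-digit one matches, so a list with a deep enough archive would start hitting the rule.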



Re: Couple of useful tests

2005-06-01 Thread Craig Jackson

Wolfgang Zeikat wrote:

> On 06/01/05 20:50, Craig Jackson wrote:
>
>> Hi,
>> I created these tests which I find very accurate for detecting spam
>> and so thought I'd let the list have a view. Lots of numbers or
>> consonants in the reply-to usually bodes ill.
>
> Good point about the reply-to, thanks!
>
>> header REPLY_TO_NUMS_CJ Reply-To =~ /[0-9]{6,}/
>> score REPLY_TO_NUMS_CJ 5.000
>> header RET_PATH_NUMS_CJ Return-path =~ /[0-9]{6,}/
>> score RET_PATH_NUMS_CJ 5.000
>> header REPLY_TO_CONSON_CJ Reply-To =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
>> score REPLY_TO_CONSON_CJ 5.000
>> header RET_PATH_CONSON_CJ Return-path =~ /[bcdfghjklmnpqrstvwxyz]{5,}.*@/i
>> score RET_PATH_CONSON_CJ 5.000
>
> I'd suggest removing the y there.


Yes, you might be right. Many first names end in a consonant plus y and are
followed by a last name that starts with consonants, e.g. [EMAIL PROTECTED]


Also, it might be good to throw in a ~ and a * which are usually part of 
spam.
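
The effect of dropping the y can be seen in a small Python sketch. The address is a made-up example of the first-name-ending-in-y case, and `WITHOUT_Y` is just the original character class with y removed, not a tested replacement rule.

```python
import re

# Original class, and the same class with y removed as suggested.
WITH_Y = re.compile(r"[bcdfghjklmnpqrstvwxyz]{5,}.*@", re.IGNORECASE)
WITHOUT_Y = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}.*@", re.IGNORECASE)

# Hypothetical addresses: a plausible real name and a gibberish local part.
real_name = "shellyschmidt@example.com"
gibberish = "xkqzrtw@example.com"

for address in (real_name, gibberish):
    print(address,
          bool(WITH_Y.search(address)),
          bool(WITHOUT_Y.search(address)))
```

With y in the class, the run "llysch" in the real name counts as six consonants and the rule fires. Without y, the name passes while the gibberish address still matches.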




Re: Couple of useful tests

2005-06-01 Thread Craig Jackson

Pierre Thomson wrote:

> Even after corpus tests, I never give a single rule a score over 3 (local
> threshold is 6).  There's no reason a real live person couldn't choose a
> consonant-only email name, and I know of some universities that give out
> addresses like [EMAIL PROTECTED] which would trigger your first rule.
>
> Pierre Thomson
> BIC




We're extremely aggressive with the scores because the tagged mail is
sent to an IMAP folder -- and not deleted. We have strict email policies
that preclude all personal email. This means that many emails that
SpamAssassin would ordinarily try to allow through are fair targets for us.