Re: spam w/ special characters

2004-09-18 Thread Matt Kettler
At 06:21 PM 9/18/2004 +0300, Marie Fischer wrote:
> Hello,
> in the last week or two, we have been getting some
> spam that spamassassin doesn't seem to recognize.
> A common feature of all these messages seems to be
> that they contain lots of special characters
> (~, ^, `, and others) mixed into the text.
> I put some examples up at http://marie.vtl.ee/spam.txt
> Would a rule to calculate some kind of special chars vs
> total chars ratio be useful?
> Does anybody have that kind of rule already?

I don't have any rules that count ratios; however, most of your example spams are
EXACTLY the kind of thing antidrug.cf was created for. It has special rules that detect
obfuscated forms of common spam drug names, and it penalizes the obfuscations
quite heavily.

http://mywebpages.comcast.net/mkettler/sa/antidrug.cf
Disclaimer: I am the author of antidrug, so I do have a bias. I'd suggest
checking the mass-check results for this ruleset that are posted on the
SpamAssassin wiki:

http://wiki.apache.org/spamassassin/CustomRulesets
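For readers wondering how a rule can catch words disguised like the ones in these samples, here is a minimal Python sketch of the general idea: allow a few non-alphanumeric filler characters between letters, accept a few look-alike substitutions, and flag a match only when it differs from the plain spelling. The look-alike table, the two-character filler limit and the function names are assumptions made for illustration; this is not how antidrug.cf's actual rules are written.

```python
import re

# Hypothetical look-alike table (an assumption, not taken from antidrug.cf).
LOOKALIKES = {
    "a": "a@4",
    "e": "e3",
    "i": "i1l|!",
    "o": "o0",
    "s": "s5$",
}


def obfuscation_pattern(word, max_filler=2):
    """Build a regex matching `word` with up to `max_filler` junk characters
    (anything that is not a letter or digit) between consecutive letters."""
    filler = "[^a-z0-9]{0,%d}" % max_filler
    letters = []
    for ch in word.lower():
        # Each letter becomes a character class of itself plus its look-alikes.
        letters.append("[" + re.escape(LOOKALIKES.get(ch, ch)) + "]")
    return re.compile(filler.join(letters), re.IGNORECASE)


def is_obfuscated(text, word):
    """True if `text` contains a disguised form of `word` (not the plain spelling)."""
    for match in obfuscation_pattern(word).finditer(text):
        if match.group(0).lower() != word.lower():
            return True
    return False


print(is_obfuscated("Buy V~i^a`g-r-a now", "viagra"))  # True
print(is_obfuscated("Cheap viagra here", "viagra"))    # False
```

Because the filler also allows spaces, an innocent phrase such as "via grace" matches too, which is why a real ruleset needs tighter patterns and mass-check testing against legitimate mail.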


Re: spam w/ special characters

2004-09-18 Thread Loren Wilton
> Would a rule to calculate some kind of special chars vs
> total chars ratio be useful?
> Does anybody have that kind of rule already?

Doing that as a ratio would require an eval, I suspect.  However, detecting
obfuscated things is pretty easy.  You need some new rules!  :-)  Hie thee
off to exit0 or rulesemporium or the like and get Matt's antidrug set, just
for a start.
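A ratio cannot be expressed as a plain regex rule; in SpamAssassin it would need an eval test backed by Perl code. The arithmetic itself is simple, though. The following is a minimal Python sketch of that arithmetic, not SpamAssassin code: the set of "special" characters beyond the ~, ^ and ` mentioned in the thread, the 0.05 threshold and the function names are all assumptions that would need tuning against real ham and spam.

```python
# Characters treated as "special": ~ ^ ` come from the thread, the rest are assumptions.
SPECIAL_CHARS = set("~^`|_{}<>\\")


def special_char_ratio(text):
    """Fraction of non-whitespace characters in `text` that are in SPECIAL_CHARS."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    special = sum(1 for c in visible if c in SPECIAL_CHARS)
    return special / len(visible)


def looks_obfuscated(text, threshold=0.05):
    """Hypothetical rule: flag text whose special-character ratio exceeds `threshold`."""
    return special_char_ratio(text) > threshold


sample = "Get V~i^a`g-r-a and X^a~n`a~x here"
print(round(special_char_ratio(sample), 3))  # 0.233 (7 special out of 30 visible)
print(looks_obfuscated(sample))              # True
```

Whitespace is excluded so that ordinary spacing does not dilute the ratio. Note that the '-' separators in the sample are not counted, which shows how much the result depends on the chosen character set.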

Here are some results from the first few of your spams:

Content analysis details:   (27.1 points, 4.6 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.4 SARE_ALC               Some header matches /improve your/i
 0.6 RATWR10a_MESSID        Message-ID has ratware pattern (HEXHEX.HEXHEX@)
 1.8 LOCAL_OBFU_CELEXA      BODY: Obfuscated 'CELEXA' in body
 1.8 LOCAL_OBFU_XANAX       BODY: Obfuscated 'XANAX' in body
 1.8 LOCAL_OBFU_LEVITRA     BODY: Obfuscated 'LEVITRA' in body
 1.8 LOCAL_OBFU_PAXIL       BODY: Obfuscated 'PAXIL' in body
 1.8 LOCAL_OBFU_VIAGRA      BODY: Obfuscated 'VIAGRA' in body
 1.0 SARE_OBFUGIRLS         BODY: masked spam word(s)
 1.8 LOCAL_OBFU_MERIDIA     BODY: Obfuscated 'MERIDIA' in body
 0.1 TW_OC                  BODY: Odd Letter Triples with OC
 1.8 LOCAL_OBFU_CIALIS      BODY: Obfuscated 'CIALIS' in body
 1.8 LOCAL_OBFU_XENICAL     BODY: Obfuscated 'XENICAL' in body
 1.5 DRUGS_ERECTILE_OBFU    Obfuscated reference to an erectile drug
 1.0 DRUGS_ANXIETY_OBFU     Obfuscated reference to an anxiety control drug
 1.0 DRUGS_ERECTILE         Refers to an erectile drug
 0.0 DRUGS_ANXIETY          Refers to an anxiety control drug
 0.0 DRUGS_DEPRESSION       Refers to an antidepressant
 0.0 DRUGS_DIET             Refers to a diet drug
 1.0 DRUGS_DEPR_EREC        Refers to both an erectile and an antidepressant
 1.0 DRUGS_ANXIETY_EREC     Refers to both an erectile and an anxiety drug
 1.0 DRUGS_DIET_EREC        Refers to both an erectile and a diet drug
 1.0 DRUGS_MANYKINDS        Refers to at least four kinds of drugs

Content analysis details:   (31.0 points, 4.6 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.8 LOCAL_OBFU_XANAX       BODY: Obfuscated 'XANAX' in body
 1.8 LOCAL_OBFU_ZOLOFT      BODY: Obfuscated 'ZOLOFT' in body
 1.8 LOCAL_OBFU_LEVITRA     BODY: Obfuscated 'LEVITRA' in body
 1.8 LOCAL_OBFU_CELEBREX    BODY: Obfuscated 'CELEBREX' in body
 1.8 LOCAL_OBFU_PAXIL       BODY: Obfuscated 'PAXIL' in body
 2.8 LOCAL_OBFU_VICODIN     BODY: Obfuscated 'VICODIN' in body
 1.8 LOCAL_OBFU_VIAGRA      BODY: Obfuscated 'VIAGRA' in body
 1.8 LOCAL_OBFU_MERIDIA     BODY: Obfuscated 'MERIDIA' in body
 1.8 LOCAL_OBFU_VIOXX       BODY: Obfuscated 'VIOXX' in body
 1.8 LOCAL_OBFU_XENICAL     BODY: Obfuscated 'XENICAL' in body
-0.0 BAYES_44               BODY: Bayesian spam probability is 44 to 50%
                            [score: 0.4966]
 1.5 DRUGS_ERECTILE_OBFU    Obfuscated reference to an erectile drug
 1.0 DRUGS_ANXIETY_OBFU     Obfuscated reference to an anxiety control drug
 1.0 DRUGS_ERECTILE         Refers to an erectile drug
 0.0 DRUGS_ANXIETY          Refers to an anxiety control drug
 1.0 DRUGS_PAIN_OBFU        Obfuscated reference to a pain relief drug
 0.0 DRUGS_DEPRESSION       Refers to an antidepressant
 0.0 DRUGS_PAIN             Refers to a pain relief drug
 0.0 DRUGS_DIET             Refers to a diet drug
 1.0 DRUGS_DEPR_EREC        Refers to both an erectile and an antidepressant
 1.0 DRUGS_ANXIETY_EREC     Refers to both an erectile and an anxiety drug
 1.0 DRUGS_PAIN_EREC        Refers to both an erectile and a painkiller
 0.5 DRUGS_DIET_PAIN        Refers to both a diet drug and a pain drug
 1.0 DRUGS_DIET_EREC        Refers to both an erectile and a diet drug
 1.0 DRUGS_MANYKINDS        Refers to at least four kinds of drugs

Content analysis details:   (9.3 points, 4.6 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.6 RATWR10a_MESSID        Message-ID has ratware pattern (HEXHEX.HEXHEX@)
 3.3 SARE_SUB_ONLINE_OB     subject has obfuscated spammer topic
 1.7 BAYES_80               BODY: Bayesian spam probability is 80 to 90%
                            [score: 0.8257]
 1.7 SARE_SPEC_ANUMA        URI: Domain with ALPHAs NUMBERs APLHAs



Re: spam w/ special characters

2004-09-18 Thread Raymond Dijkxhoorn
Hi!
> in the last week or two, we have been getting some
> spam that spamassassin doesn't seem to recognize.
> A common feature of all these messages seems to be
> that they contain lots of special characters
> (~, ^, `, and others) mixed into the text.
> I put some examples up at http://marie.vtl.ee/spam.txt
> Would a rule to calculate some kind of special chars vs
> total chars ratio be useful?
> Does anybody have that kind of rule already?

Start using SURBL; as far as I can see, all of your samples would have been
tagged.

Bye,
Raymond.
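SURBL works differently from content rules: it is a DNS blocklist of domains that appear in spam message bodies, so the way the text is obfuscated does not matter as long as the advertised site's domain is listed. Below is a minimal Python sketch of such a lookup. It assumes the `multi.surbl.org` zone as used in SpamAssassin configurations around 2004 (check current SURBL documentation before relying on it), and it reduces hostnames naively to their last two labels, whereas real implementations handle suffixes such as co.uk. The function names are hypothetical. A listed domain resolves to an address of the form 127.0.0.x; an unlisted one does not resolve.

```python
import socket
from urllib.parse import urlsplit

# Zone name as used in 2004-era setups; an assumption, check current SURBL docs.
SURBL_ZONE = "multi.surbl.org"


def naive_registered_domain(url):
    """Reduce a URL's hostname to its last two labels (naive: ignores co.uk etc.)."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to look up, e.g. example.com.multi.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone=SURBL_ZONE):
    """Perform a live DNS query; any 127.0.0.x answer means the domain is listed."""
    try:
        answer = socket.gethostbyname(surbl_query_name(domain, zone))
    except socket.gaierror:
        # NXDOMAIN (or a resolver failure) is treated as "not listed".
        return False
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    domain = naive_registered_domain("http://www.spam-example.com/order.html")
    print(surbl_query_name(domain))  # spam-example.com.multi.surbl.org
```

The example only builds the query name; calling `is_listed(domain)` sends a real DNS query, so its result depends on the network and on the list's current contents.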