I'd like to contribute some research I've done on spam detection that doesn't
use traditional Bayes filters or other scoring methods, nor traditional DNS BL
checks. It's either spam or it's not, but I'd like to see this technique in
SpamAssassin, possibly with really high scores for things that this method says
are "spam".

Advantages:
  * Simple method, no magic numbers
  * Targets the information spammers need you to see
  * Difficult to counter
  * Low false positives (almost none)
  * Incredibly high success rate

Disadvantages:
  * High network traffic overhead - a DNS cache is pretty much required.  I use
djbdns's dnscache.


Here's the algorithm:

  1  Decode any URL-encoding in the message
  2  Un-MIME the message
  3  Scan all parts of the message for URLs and email addresses (this can be
links, IMG tags, mailto:'s, or even just something that looks like a web
address or email address).  Do NOT scan the headers.
  4  For each address, resolve the hostname to an IP and then look up that IP
in your favorite DNS RBL - I use "sbl-xbl.spamhaus.org" as it catches the most,
but you can also add bl.spamcop.net and relays.ordb.net
  5  As soon as any test in #4 comes back with a positive result, that message
is spam, and you can go on to the next message.
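
The steps above could be sketched roughly as follows in Python. This is only an
illustration under some assumptions, not the program I actually ran: the
function names (body_texts, extract_hosts, is_listed, is_spam) and the regular
expressions are made up for the example, it handles IPv4 only, and it un-MIMEs
each part before undoing the URL-encoding. The resolver is passed in as a
parameter so the logic can be tried without touching the network.

```python
import email
import re
import socket
from email import policy
from urllib.parse import unquote

# Zone named in the post; add others to this list if desired.
DNSBL_ZONES = ["sbl-xbl.spamhaus.org"]

# Rough, illustrative patterns; real-world URL detection needs more care.
URL_HOST = re.compile(r"(?:https?|ftp)://(?:[^\s/@]+@)?([A-Za-z0-9.-]+)", re.I)
EMAIL_HOST = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
BARE_HOST = re.compile(r"\b(www\.[A-Za-z0-9.-]+)", re.I)


def body_texts(raw_message):
    """Yield the decoded text of every text/* MIME part (headers are skipped)."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    for part in msg.walk():
        if not part.is_multipart() and part.get_content_maintype() == "text":
            yield part.get_content()


def extract_hosts(text):
    """Return the set of hostnames found in URLs, email addresses, and www.* text."""
    hosts = set()
    for pattern in (URL_HOST, EMAIL_HOST, BARE_HOST):
        for match in pattern.finditer(text):
            hosts.add(match.group(1).strip(".").lower())
    hosts.discard("")
    return hosts


def is_listed(ip, zone, lookup=socket.gethostbyname):
    """A DNSBL lists an IPv4 address if <reversed octets>.<zone> resolves."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        lookup(query)
        return True
    except socket.gaierror:
        return False


def is_spam(raw_message, zones=DNSBL_ZONES, resolve=socket.gethostbyname):
    """Return True as soon as any host in the body resolves to a listed IP."""
    for text in body_texts(raw_message):
        for host in extract_hosts(unquote(text)):
            try:
                ip = resolve(host)
            except socket.gaierror:
                continue  # unresolvable host: nothing to look up
            if any(is_listed(ip, zone, lookup=resolve) for zone in zones):
                return True
    return False
```

Calling is_spam(raw_message) with the default resolver issues one DNS query per
host plus one per host and zone, which is where the traffic overhead mentioned
above comes from.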

The last email server I set up had an additional step - each MIME piece was a
separate file (from ripmime) sent to a small Perl program to scan the messages
for URLs.  This extra step also sent each file to ClamAV for virus scanning,
configured to search inside archives (but not mailboxes).

-- Evan Langlois
