I'd like to contribute some research I've done on spam filtering that doesn't use traditional Bayes filters or other scoring methods, nor DNS BLs in the traditional way. A message is either spam or it's not, but I'd like to see this technique in SpamAssassin, possibly with really high scores for anything this method flags as spam.
Advantages:
* Simple method, no magic numbers
* Targets the information spammers need you to see
* Difficult to counter
* Low false-positive rate (almost none)
* Incredibly high success rate

Disadvantages:
* High network traffic overhead; a DNS cache is pretty much required. I use djbdns's dnscache.

Here's the algorithm:
1. Decode any URL-encoding in the message.
2. Un-MIME the message.
3. Scan all parts of the message for URLs and email addresses (links, IMG tags, mailto:'s, or even just something that looks like a web address or email address). Do NOT scan the headers.
4. For each address, resolve the hostname to an IP and then look up that IP in your favorite DNS RBL. I use "sbl-xbl.spamhaus.org" as it catches the most, but you can also add bl.spamcop.net and relays.ordb.net.
5. As soon as any lookup in step 4 comes back positive, the message is spam and you can go on to the next message.

The last email server I set up had an additional step: each MIME part was written to a separate file (by ripmime) and passed to a small Perl program that scanned it for URLs. This extra step also sent each file to ClamAV for virus scanning, configured to search inside archives (but not mailboxes).

-- 
Evan Langlois
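For anyone wanting to try the algorithm outside a mail server, here is a minimal Python sketch of steps 1 to 5 using only the standard library (`email`, `re`, `socket`, `urllib.parse`). It is not the Perl/ripmime setup described above. The function names are hypothetical, the regular expressions are rough stand-ins for "anything that looks like a web or email address", and URL decoding is applied to each text part after MIME decoding (rather than before), since encoded parts have to be decoded first. Any A record returned for the reversed-IP query is treated as a listing; the zone name comes from the post and may need updating.

```python
import email
import re
import socket
from email import policy
from urllib.parse import unquote

# Zone from the post; others (e.g. bl.spamcop.net) can be appended.
DNSBL_ZONES = ("sbl-xbl.spamhaus.org",)

# Rough heuristics (assumptions): "http(s)://host" or a bare "www.host",
# plus the domain part of anything shaped like user@domain (covers mailto:).
URL_RE = re.compile(r"(?:https?://|\b(?=www\.))([A-Za-z0-9.-]+)", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@([A-Za-z0-9.-]+)")


def body_text(raw: bytes) -> str:
    """Steps 1-2: un-MIME the message, then URL-decode each text part.
    Only body parts are returned; headers are never scanned."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            chunks.append(unquote(part.get_content()))
    return "\n".join(chunks)


def hostnames(text: str) -> set[str]:
    """Step 3: collect candidate host names from URLs and email addresses."""
    found = [m.group(1) for m in URL_RE.finditer(text)]
    found += [m.group(1) for m in EMAIL_RE.finditer(text)]
    # Strip punctuation picked up from surrounding prose; keep dotted names only.
    hosts = {h.strip(".-").lower() for h in found}
    return {h for h in hosts if "." in h}


def dnsbl_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the zone: 1.2.3.4 -> 4.3.2.1.<zone>."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def listed(ip: str, zone: str) -> bool:
    """Step 4: any A record for the reversed name means the IP is listed."""
    try:
        socket.gethostbyname(dnsbl_name(ip, zone))
        return True
    except OSError:  # NXDOMAIN and resolver errors both count as "not listed"
        return False


def is_spam(raw: bytes, zones: tuple[str, ...] = DNSBL_ZONES) -> bool:
    """Step 5: the first listed IP decides; otherwise the message is clean."""
    for host in sorted(hostnames(body_text(raw))):
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET)
        except (OSError, UnicodeError):
            continue  # unresolvable or malformed host: nothing to look up
        for ip in {info[4][0] for info in infos}:
            if any(listed(ip, zone) for zone in zones):
                return True
    return False
```

Call `is_spam(raw_bytes)` on a raw message. Every lookup goes through the system resolver, so pointing it at a local cache such as dnscache is what keeps the traffic overhead mentioned above manageable. Returning on the first hit mirrors step 5 and avoids further lookups once a verdict is reached.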
