Re: [Declude.JunkMail] Filtering Question...

Matthew Bramble Mon, 15 Dec 2003 20:05:28 -0800

Chuck,

There are several general uses for custom filtering. The Matt School of Thought would teach as follows:

1) Programmatic filtering. This is essentially pattern matching with custom filters. Patterns can be as simple as the country of origin, or as complex as the gibberish that spammers insert in order to throw off some products. These filters can be highly effective at targeting crud spammers, even when they find a perfectly clean IP address. These guys often try multiple types of obfuscation in each message, and it's the techniques that give them away rather than the content. You can download a bunch of filters from my site, www.mailpure.com/software/decludefilters/, and search the archives for versions of OBFUSCATION, DYNAMIC, PEXICOM, FORGEDHELO-IP, FORGEDHELP-FDQN, FORGEDASLOCAL, SPAMDOMAINS, and last week's "New fraud exploit". Other examples appear now and then as well.

2) Banned words list. These should be scored fairly low, but some words are highly indicative of spam, for instance the various drugs that are advertised, or terms related to sex, printer cartridges, anti-virus products, fraud and scams, etc. You can keep these in one single file and score each entry independently. You can also add words to the list as you discover false negatives that get through your system. This need not be a very large list; in fact I make do quite well with maybe 50 such entries, though I could pay a bit more attention to it. Keep in mind that spammers will obfuscate problematic words, so a plainly spelled entry may end up matching more legitimate mail than spam, i.e. it may cause more FP's than true hits.

3) Pseudo-whitelist. This is a very useful file to have in order to mitigate the effects of false positives from tests. Every system implicitly settles on what a normal score is, and it's not necessary to counterbalance every last point that might be scored from every last test...otherwise we would be blocking on every RBL and whitelisting with every filter. I really don't get concerned about false positives on E-mails until they start to score consistently at 70% of my fail weight, and then I take action on them by listing them in this filter. My pseudo-whitelist is much larger than my own blocklist because I add a listing to it every time I encounter a false positive as a result of an RBL or external test. I do differentiate between responsible bulk mailers, direct senders, and those that are neither.

4) Pseudo-blacklist. This is mostly what Kami has done by building a list of identifiers for what he considers to be spam. In many cases he lists multiple types of information, probably on the off chance that one piece changes but the others remain trackable. The downside of tracking multiple pieces is that FP's can occur on any of those elements. I personally keep two filters for this use: one is IP based (it uses the IPFILE functionality), and the other is based on a range of things, depending on what I deem to be a reliable identifier, with the entries grouped by identifier. If I consider a source to be spam and it's not the crud type of spam that comes from open relays or zombied machines (that type will even throw away domains after a few days, whereas this kind can be tracked by way of some identifier), then I throw it in that file. I don't add a lot of this stuff because most of the static spammers tend to be well blocked by the RBL's, though I must block something if a customer asks me to. This approach becomes resource intensive if your file(s) grow too large, and it can be hard to maintain; for example, how do you expire listings?
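
To make the scoring arithmetic behind the banned words list and the pseudo-whitelist concrete, here is a rough Python sketch. It is not Declude's implementation; the fail weight, the words, the weights, and the reverse DNS suffix are all made-up assumptions, and only the 70% mark comes from the text above.

```python
# Hypothetical sketch: none of these names, words, or weights come from Declude.
FAIL_WEIGHT = 20       # assumed weight at which a message is treated as spam
REVIEW_PERCENT = 70    # the 70%-of-fail-weight mark mentioned above

# Banned-words entries, each scored independently (made-up examples).
BANNED_WORDS = {"viagra": 4, "toner cartridge": 3, "wire transfer": 2}

# Pseudo-whitelist entries: negative weights keyed on a reverse DNS suffix.
PSEUDO_WHITELIST = {".example-newsletter.com": -8}


def score_message(body, revdns, external_score):
    """Add filter weights to the score already assigned by RBLs and other tests."""
    total = external_score
    lowered = body.lower()
    for word, weight in BANNED_WORDS.items():
        if word in lowered:
            total += weight
    for suffix, weight in PSEUDO_WHITELIST.items():
        if revdns.lower().endswith(suffix):
            total += weight
    return total


def worth_counterbalancing(total):
    """True when a score reaches 70% of the fail weight (integer math, no rounding)."""
    return total * 100 >= REVIEW_PERCENT * FAIL_WEIGHT
```

In this sketch a legitimate newsletter that picks up 15 points from external tests drops to 7 once the pseudo-whitelist entry applies, which is well under the 70% mark, while a message with two banned-word hits climbs past it.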

Now as far as the pros and cons of using a particular data element for pseudo-whitelisting go, you want to use the hardest-to-spoof piece of data that is reliable. The IP is the hardest to spoof, but it is rarely tracked due to the difficulty of maintaining this information. REVDNS is the next best; however, it is sometimes spoofed for major ISP's and ecommerce sites. Data elements like HELO and MAILFROM are easily and often spoofed, and should be used as a last resort. You might even be forced to use HEADERS to search for an address that appears as the From but not the MAILFROM. Or, if you are counterbalancing an external test such as Message Sniffer, you might need to list URL's in a BODY filter, since such tests will often track URL's: while you might get something through originally with a REVDNS counterbalance, a reply or forward of the same content could still trip Sniffer based on the content of the message.
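
The ranking above can be written down as a small selection rule. This is only an illustrative Python sketch: the function name is hypothetical, the relative order of HELO and MAILFROM is arbitrary (the text treats both as last resorts), and HEADERS is placed last as the fallback.

```python
# Ordered from hardest to easiest to spoof, following the ranking above.
# HELO before MAILFROM is an arbitrary choice; both are last resorts.
PREFERENCE = ["REMOTEIP", "REVDNS", "HELO", "MAILFROM", "HEADERS"]


def pick_identifier(usable):
    """Return the hardest-to-spoof element among those that reliably
    identify this particular sender, or None if nothing qualifies."""
    for name in PREFERENCE:
        if name in usable:
            return name
    return None
```

For example, if both REVDNS and MAILFROM identify a sender, `pick_identifier({"REVDNS", "MAILFROM"})` picks REVDNS; if only a header value is unique to the sender, it falls through to HEADERS.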

A recent issue highlights the decision-making process required for pseudo-whitelisting. I had a FP reported to me from a pay site that sends out daily newsletters. This company uses a third-party delivery service which has a big problem with spammers and is even listed on SBL, though they also managed to get listed in Bonded Sender (both of which seem inappropriate). The REMOTEIP, REVDNS, HELO and MAILFROM are all from this untrusted third party; however, the From address (which isn't trackable in Declude currently) is unique to this sender and contains their domain. So in order to allow them through in a reliable way, I chose a header filter that reads as follows:

HEADERS -15 CONTAINS @some-domain.com>
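
One detail of that line worth noting is the trailing ">": it ties the match to the end of an address written in angle brackets, so a longer lookalike domain does not qualify. The Python sketch below only approximates the idea with a plain substring test. Treating CONTAINS as case-insensitive is an assumption here, and the sample header values are invented.

```python
def headers_contains(raw_headers, needle):
    """Plain substring test; case-insensitivity is an assumption of this sketch."""
    return needle.lower() in raw_headers.lower()


def header_filter_weight(raw_headers, needle="@some-domain.com>", weight=-15):
    """Return the counterweight when the needle is found, otherwise 0."""
    return weight if headers_contains(raw_headers, needle) else 0


# Invented sample headers for illustration only.
genuine = "From: Daily News <news@some-domain.com>\r\n"
lookalike = "From: Daily News <news@some-domain.com.example.net>\r\n"
```

With these samples, the genuine header earns the -15 counterweight and the lookalike earns nothing, because "@some-domain.com" is followed by "." rather than ">".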

Most entries, of course, get listed by REVDNS. I do plan on starting an IPFILE for pseudo-whitelisting trusted bulk mailers, ecommerce companies, and ISP's, primarily because their REVDNS might be spoofed and listing them by IP protects against that. I've never seen an IP spoofed on the last hop, though you have to be very careful about this with multiple hop scanning.

Matt

Chuck Schick wrote:

We have just upgraded to the Declude Junkmail Pro version mostly to take
advantage of filtering.  I have looked at Kami's filtering setup and I would
like to get some input on other filters especially negative filters.

1) Are others using revdns filters for mail from aol, yahoo, excite, etc.
with success since many of these domains trip no abuse, no postmaster tests?
If so, does anyone have a list they would care to share for this purpose?

2) I notice some are using a MAILFROM counterweight instead of Revdns
counterweight.  What are the pros and cons of that approach?

Chuck Schick
Warp 8, Inc.
303-421-5140
www.warp8.com

