Andy Schmidt wrote:
Hey Matt:
One question - I know that you have been spending a lot of time programming content filters.
I'm curious whether you are using Sniffer and whether you found that you
needed all those filters to improve detection over Sniffer rules (which then
makes me wonder why they are not made part of Sniffer) - or whether you are
trying to substitute Sniffer?
I'm not trying to replace Sniffer, but I see no reason to be heavily dependent on it either. Sniffer is a critical component on our system, and it hits 94% to 97% of the messages that we block on a daily basis. The results on pure spam are probably a bit higher. For instance, we are blocking about 2% of our volume as Joe-Job bounces, and there are other things that get blocked that aren't technically spam but are garbage. Sniffer does hit on much of this stuff, but in lower numbers.
I consider Sniffer primarily to be my substitute for content filtering. Instead of tagging the wording, it primarily tags the links (with some exceptions, of course). Combined with other filters, it is much more powerful than either alone, and the same goes for our custom filters. For instance, if we get a DUL hit plus a Sniffer hit, the confidence that the message is spam goes up, so we add extra points for that condition, as well as for many others. This also allows us to lower the scores on the individual Sniffer and DUL hits (and others), because combination filters act like multipliers and these tests often hit in combination.
At the same time, however, we were finding that a good deal of obvious DUL stuff wasn't hitting on the DNSBLs that we use, so we started creating our own DUL filters based on reverse DNS entries, using the new NOTCONTAINS functionality (required for this sort of work). We are now tagging 20% more DUL hits as a result, and doing it more reliably than before. We defeat the filter when IPNOTINMX is not hit, which means that an MX record has been created for the domain that points to that DUL space, so servers in such space can connect without punishment. I actually consider most of my filters to be "technical heuristics" rather than "content filters", because in almost all of them I'm looking for patterns, not words or phrases.
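To make the combination idea concrete, here is a minimal Python sketch of how it could work. It is only an illustration, not our actual Declude configuration: the weights, the bonus value, the hint substrings, and the function names `rdns_dul_hit` and `score` are all made up for the example. It shows a reverse-DNS DUL test that is defeated when IPNOTINMX did not hit, and a scorer that adds extra points when two tests hit together.

```python
from __future__ import annotations

# All numbers below are hypothetical placeholders, not real weights.
BASE_WEIGHTS = {"SNIFFER": 8, "DUL": 5}
# Extra points awarded only when every test in the combination hit.
COMBO_BONUSES = {frozenset({"SNIFFER", "DUL"}): 6}

# Assumed substrings that suggest dynamic address space in a reverse DNS name.
DYNAMIC_HINTS = ("dialup", "dsl", "dhcp", "pool", "dynamic")
# Assumed substrings that suggest a deliberately named mail host; a name
# containing one of these is excluded (a NOTCONTAINS-style condition).
MAIL_HOST_HINTS = ("mail", "smtp")


def rdns_dul_hit(reverse_dns: str, ipnotinmx_hit: bool) -> bool:
    """Pattern-based DUL test on a reverse DNS name.

    The test is defeated when IPNOTINMX did not hit, i.e. the sending IP
    is covered by an MX record for the domain.
    """
    if not ipnotinmx_hit:
        return False
    name = reverse_dns.lower()
    looks_dynamic = any(hint in name for hint in DYNAMIC_HINTS)
    looks_like_mail_host = any(hint in name for hint in MAIL_HOST_HINTS)
    return looks_dynamic and not looks_like_mail_host


def score(hits: set[str]) -> int:
    """Sum the base weights of the tests that hit, plus any combo bonuses."""
    total = sum(BASE_WEIGHTS.get(test, 0) for test in hits)
    for combo, bonus in COMBO_BONUSES.items():
        if combo <= hits:  # every test in the combination hit
            total += bonus
    return total


if __name__ == "__main__":
    hits = {"SNIFFER"}
    if rdns_dul_hit("dsl-pool-10.example.net", ipnotinmx_hit=True):
        hits.add("DUL")
    print(sorted(hits), score(hits))
```

Because the bonus only applies when both tests hit, the individual weights can stay low enough that a lone Sniffer or DUL hit does not block a message by itself.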
I've gotten serious about pushing a business model for spam blocking in recent months, and word of mouth combined with old-fashioned sales has brought us a good deal of business for a company that hasn't even launched a site or done any advertising. Our spam blocking percentage is about 99.7% on our Medium setting (Hold at 13). While that is definitely much better than the big players and impossible to beat measurably, I figure that over time the big players will catch up or come a lot closer. What makes us special, though, is that we have managed to segregate the blocked messages so that 99% of them land in what we call Drop (score of 25+) and 1% land in Hold (score of 10 or 13-24), and other associated capabilities come along with that.
We are able to review the Hold file for every one of our customers on a daily basis because the workload is so small. Yesterday, for instance, out of just over 52,000 blocked messages, only 465 landed in our Hold range (0.89%). We advise our customers to review Hold themselves, and because we don't mix in 100% of the spam, it is much more likely that they will do so. Naturally, not all false positives will land in our Hold range, but I have never seen a personal message land in our Drop range. What does land in Drop by mistake is generally very gray stuff, such as a newsletter that uses the services of a company that primarily engages in spamming (I've only caught this 3 times in Drop, but Drop should be more than 99.99% accurate). We try to get all mixed sources to land in Hold, but sometimes Sniffer helps to push some over the top, and of course we also make mistakes. Yesterday we found and reprocessed 9 false positives (personal E-mail and newsletters) out of 52,000 messages blocked, and we resolved the conditions that created every one of them so that they will no longer have issues.
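As a rough illustration of the Hold/Drop split and the arithmetic behind those figures, here is a small Python sketch. The thresholds (Hold at 13, Drop at 25) and the counts come from the numbers above; the names `classify` and `percent`, and the reading of "Hold at 10" as an alternative setting, are assumptions made for the example.

```python
DROP_AT = 25  # scores of 25 or more land in Drop


def classify(score: int, hold_at: int = 13) -> str:
    """Place a message in a bucket by its total score.

    hold_at=13 corresponds to the Medium setting; hold_at=10 is assumed
    here to be an alternative, stricter setting.
    """
    if score >= DROP_AT:
        return "drop"
    if score >= hold_at:
        return "hold"
    return "deliver"


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    blocked = 52_000  # "just over 52,000", so the results are approximate
    print(classify(24), classify(25), classify(12), classify(12, hold_at=10))
    print(f"Hold share of blocked mail: {percent(465, blocked):.2f}%")
    print(f"False positives found:      {percent(9, blocked):.3f}%")
```

Keeping Hold this small is what makes a daily manual review practical: the reviewer looks at a few hundred messages rather than tens of thousands.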
Some additional questionable advertising content was blocked as well, but those things generally require more research and are not handled immediately, since nobody misses them. Without Sniffer our accuracy would go down, the size of our Hold file would go up, and we would leak more spam, but we would survive. That's important, because we can't become completely dependent on any single source of data; that would be a liability.
Sniffer has played a major role in our ability to do all of this, but on its own it's just another tool, albeit one that hits the vast majority of spam, and it's up to the administrator to make as much of it as they can. By creating pattern filters and our own RBL, we are able to achieve better differentiation between spam and non-spam. Believe it or not, we have actually reduced the number of custom filters that we use while dramatically reducing the number of messages that land in our Hold range and improving our block rate, primarily through combo filters and our RBL. Just two months ago, messages were landing in our Hold range at more than 3 times the current rate. Our review workload has actually gone down while volume has grown dramatically, which gives us more time to deal with things like false positives and automating other tasks. Still, there's much to be done.
I recommend that everyone buy Sniffer, and it's not just because I think Pete is a swell guy :)
Matt
--
=====================================================
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=====================================================