> From: Gary Mills <[EMAIL PROTECTED]>

> I've been reading Ironport's advertizing. They claim good success
> rates on blocking image spam. In addition to analysis of the message
> body, they use OCR techniques to extract the text from the image, as
> well as examining the composition of the image for features typical of
> current spam. Could DCC do anything of this sort for image attachments?

For that last question, perhaps the DCC clients might do something more for images, but it would be like -Gon (greylisting) and -B (DNS blacklists) and not directly related to Distributed Checksum Clearinghouses.

Concerning the advertising, I think the real question is which, or how many, contradictory brands of wishful thinking you are willing to believe. The two brands relevant here are "CAPTCHAs prevent abuse" and "OCR can decode evil images." If the good guys can use OCR on spam images to convert them to text that might be analyzed with keywords (e.g. so-called Bayesian filters) or even DCC body checks, then the bad guys can use OCR to bypass CAPTCHAs for automated account sign-ups etc. Worse, the good guys generally need to use already heavily loaded computers to decode 100 or 10,000 times as many evil images (one per image spam) as the bad guys need to decode CAPTCHAs on their lightly loaded attack systems.

Examining the composition of the image for features typical of current spam would involve looking for animation or statistical characteristics of the pixels of fuzzed-out text. That sounds to me like sooner or later rejecting most images, which sounds rather like treating images like Microsoft program text and rejecting all of them. I don't see any harm in requiring that images be transported with a protocol other than SMTP (e.g. HTTP or FTP), but that may say more about me and my continued use of a pure text mail user agent that cannot handle any MIME at all.

That's not to say that you can't make a system that uses a bunch of spam filters including image analysis and get good results. I am claiming that if you skip the image analysis and stick to simpler things such as checking the URLs that anchor the images against DNS blacklists, you probably will get results as good or better. (Spam that requires spam targets to manually copy URLs from fuzzy images to a browser sounds like a bad idea, which might be why the images are often only covers for <A HREF> links.)

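
To make the "simpler things" concrete, here is a minimal Python sketch of checking the URLs around an image against a DNS blacklist. It is only an illustration and not something DCC itself does: the zone name dnsbl.example.org is a placeholder, the function names are made up for this example, and the "any answer means listed" convention is an assumption. Real lists may expect a registered domain rather than a full host name, so check the documentation of whichever list you use.

```python
import re
import socket
from urllib.parse import urlsplit

# Hypothetical zone name; substitute the domain blacklist you actually use.
BLACKLIST_ZONE = "dnsbl.example.org"

# Matches the values of href=... and src=... attributes in an HTML mail body.
URL_ATTR = re.compile(r'(?:href|src)\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def extract_hosts(html_body):
    """Return the set of lowercased host names found in href/src attributes."""
    hosts = set()
    for url in URL_ATTR.findall(html_body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL; ignore it in this sketch
        if host:
            hosts.add(host.lower())
    return hosts


def is_listed(host, zone=BLACKLIST_ZONE, resolve=socket.gethostbyname):
    """Treat a host as listed if <host>.<zone> resolves to any address."""
    try:
        resolve(f"{host}.{zone}")
        return True
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False


def listed_hosts(html_body, zone=BLACKLIST_ZONE, resolve=socket.gethostbyname):
    """Return the sorted host names from the body that appear in the zone."""
    return sorted(h for h in extract_hosts(html_body) if is_listed(h, zone, resolve))
```

The resolve parameter exists only so the lookup can be replaced with a stub when trying the code without network access. The point of the sketch is that the whole check is a regular expression and one DNS query per distinct host, which is far cheaper than running OCR on every image.
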
Vernon Schryver    [EMAIL PROTECTED]
_______________________________________________
DCC mailing list      [email protected]
http://www.rhyolite.com/mailman/listinfo/dcc
