Re: [Clamav-users] Complexity limit on (custom) signatures?
On Sat, Oct 28, 2006 at 04:28:47PM -0700, Dennis Peterson wrote:
> >
> >I don't get it.. unless you have some big honeypot, maybe 5% of traffic
> >contain small images to be OCRd. If your server can't handle that, I guess
> >it's running out of juice anyway. :)
> >
> >You can even easily create separate scanning queue for OCR, so it doesn't
> >interfere with normal traffic.
>
> You may have missed that I'm in the image industry - a great deal of
> what we do is imagery including imagery with text in it, and as we have
> to scan all images over a particular size, it would require more cpu
> than is worth it.

Ok, that's fair. But you probably meant: scan everything _under_ the SpamAssassin scan size. By default, most software only scans whole messages smaller than ~256kB.

I guess if you get images from all over, you can't whitelist etc. then.

Cheers,
Henrik
___
http://lurker.clamav.net/list/clamav-users.html
Re: [Clamav-users] Complexity limit on (custom) signatures?
Bill Randle wrote:
> On Sat, 2006-10-28 at 16:21 -0700, Dennis Peterson wrote:
>
> Actually, the FuzzyOCR plugin already handles animated gifs using
> various techniques to extract the hidden text. It also is able to
> decode png and jpeg files.

Ah - so it does. I hadn't looked at v. 2.3. I'll have another look. Thanks, Bill.

dp
Re: [Clamav-users] Complexity limit on (custom) signatures?
On Sat, 2006-10-28 at 16:21 -0700, Dennis Peterson wrote:
> Bill Randle wrote:
> > On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:
> >>
> >> However, in the long run, OCR to feed the text to SpamAssassin's other
> >> rules is a better solution; it's much more flexible.
> >
> > Indeed. For those interested in the topic of OCR to feed SpamAssassin,
> > there's an active project with its own mailing list that does just this.
> > It turns out to be a non-trivial task because many of these image spam
> > are animated gifs, so you need to find the right frame to pass to the
> > OCR program.
> >
> > Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then
> > subscribe to the Devel-Spam mailing list (there's a link on that page).
> >
>
> You might want to consider the next level of image spam before you go
> too far down the OCR path:
>
> http://www.iss.net/threats/Animated%20GIF.html

Actually, the FuzzyOCR plugin already handles animated gifs using various techniques to extract the hidden text. It also is able to decode png and jpeg files.

-Bill
Re: [Clamav-users] Complexity limit on (custom) signatures?
Henrik Krohns wrote:
> On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:
>> I've explored OCR on both color and de-colorized images and there have
>> been successes, but not enough to warrant turning it on in production. It
>> is very cpu intensive.
>
> I don't get it.. unless you have some big honeypot, maybe 5% of traffic
> contain small images to be OCRd. If your server can't handle that, I guess
> it's running out of juice anyway. :)
>
> You can even easily create separate scanning queue for OCR, so it doesn't
> interfere with normal traffic.

You may have missed that I'm in the image industry - a great deal of what we do is imagery including imagery with text in it, and as we have to scan all images over a particular size, it would require more cpu than is worth it. And when you consider repeating it all at a disaster recovery site it's starting to be a lot of computer power with a high false positive probability.

You cannot count on the image spam being gif as png images are showing up now as are jpg, and animated gifs are also out there. OCR isn't practical for me but may be for others for a while - at least until they start to use CAPTCHA technology to get around it.

dp
Re: [Clamav-users] Complexity limit on (custom) signatures?
Bill Randle wrote:
> On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:
>> Henrik Krohns wrote:
>>> I don't get it.. unless you have some big honeypot, maybe 5% of traffic
>>> contain small images to be OCRd. If your server can't handle that, I guess
>>> it's running out of juice anyway. :)
>>
>> Well... yeah. The basic problem is that all the other garbage
>> (with the occasional inevitable exception) is getting caught by Clam
>> (viruses and most phishes) or SpamAssassin (all but a few text-based spams.
>>
>> I've found *enough* similarities in the raw binary image data to
>> usefully make signatures for a lot of what is otherwise getting through;
>> at the moment this is just a stopgap until these machines can be retired.
>>
>> However, in the long run, OCR to feed the text to SpamAssassin's other
>> rules is a better solution; it's much more flexible.
>
> Indeed. For those interested in the topic of OCR to feed SpamAssassin,
> there's an active project with its own mailing list that does just this.
> It turns out to be a non-trivial task because many of these image spam
> are animated gifs, so you need to find the right frame to pass to the
> OCR program.
>
> Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then
> subscribe to the Devel-Spam mailing list (there's a link on that page).

You might want to consider the next level of image spam before you go too far down the OCR path:

http://www.iss.net/threats/Animated%20GIF.html

dp
Re: [Clamav-users] Complexity limit on (custom) signatures?
On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:
> Henrik Krohns wrote:
> > I don't get it.. unless you have some big honeypot, maybe 5% of traffic
> > contain small images to be OCRd. If your server can't handle that, I guess
> > it's running out of juice anyway. :)
>
> Well... yeah. The basic problem is that all the other garbage
> (with the occasional inevitable exception) is getting caught by Clam
> (viruses and most phishes) or SpamAssassin (all but a few text-based spams.
>
> I've found *enough* similarities in the raw binary image data to
> usefully make signatures for a lot of what is otherwise getting through;
> at the moment this is just a stopgap until these machines can be retired.
>
> However, in the long run, OCR to feed the text to SpamAssassin's other
> rules is a better solution; it's much more flexible.

Indeed. For those interested in the topic of OCR to feed SpamAssassin, there's an active project with its own mailing list that does just this. It turns out to be a non-trivial task because many of these image spam are animated gifs, so you need to find the right frame to pass to the OCR program.

Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then subscribe to the Devel-Spam mailing list (there's a link on that page).

-Bill
Re: [Clamav-users] Complexity limit on (custom) signatures?
Henrik Krohns wrote:
> I don't get it.. unless you have some big honeypot, maybe 5% of traffic
> contain small images to be OCRd. If your server can't handle that, I guess
> it's running out of juice anyway. :)

Well... yeah. The basic problem is that all the other garbage (with the occasional inevitable exception) is getting caught by Clam (viruses and most phishes) or SpamAssassin (all but a few text-based spams).

I've found *enough* similarities in the raw binary image data to usefully make signatures for a lot of what is otherwise getting through; at the moment this is just a stopgap until these machines can be retired.

However, in the long run, OCR to feed the text to SpamAssassin's other rules is a better solution; it's much more flexible.

-kgd
Re: [Clamav-users] Complexity limit on (custom) signatures?
On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:
>
> I've explored OCR on both color and de-colorized images and there have
> been successes, but not enough to warrant turning it on in production. It
> is very cpu intensive.

I don't get it.. unless you have some big honeypot, maybe 5% of traffic contains small images to be OCRd. If your server can't handle that, I guess it's running out of juice anyway. :)

You can even easily create a separate scanning queue for OCR, so it doesn't interfere with normal traffic.

Cheers,
Henrik
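The separate OCR queue suggested above could be fed by a small routing step along these lines. This is only a sketch using Python's standard `email` package; the function names `has_image_part` and `pick_queue` and the queue labels "ocr" and "normal" are made up for illustration and are not part of SpamAssassin, ClamAV or FuzzyOCR.

```python
from email.message import EmailMessage


def has_image_part(msg):
    """Return True if any MIME part of the message is of type image/*."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def pick_queue(msg):
    """Choose a queue label (hypothetical names) for a parsed message.

    Messages carrying images go to the slow OCR queue; everything else
    stays on the normal scanning path.
    """
    return "ocr" if has_image_part(msg) else "normal"


if __name__ == "__main__":
    plain = EmailMessage()
    plain.set_content("just text")

    with_image = EmailMessage()
    with_image.set_content("see attachment")
    # Six bytes standing in for a GIF; only the declared MIME type matters here.
    with_image.add_attachment(b"GIF89a", maintype="image", subtype="gif")

    print(pick_queue(plain), pick_queue(with_image))
```

A real setup would probably also apply a size cut-off before queueing anything for OCR; that part is left out here.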
Re: [Clamav-users] Complexity limit on (custom) signatures?
Kris Deugau wrote:
> The stock and pill spams that I'm trying to tag, however, have images
> that have *very small* variations message-to-message, but over a larger
> sample there's really very little that can be seen as "common" across
> the whole set - or even a significant part of the set. Automating the
> process of finding "all possible values for the byte at this position"
> is the only way I can usefully get anywhere.

I did a binary diff and md5 checksums on hundreds of the stock and pill images and never found any two to be the same. They use a random noise generator to sprinkle the images with enough debris to prevent analysis, so even splitting the files into 128- and 512-byte slices and checking each of the slices was not helpful. Even when you convert the image to black and white to remove the color element there's still sufficient randomness to prevent go/no-go certainty.

I've explored OCR on both color and de-colorized images and there have been successes, but not enough to warrant turning it on in production. It is very cpu intensive. I attempted to see if there were any digital watermarks in these images and found nothing, although the math for doing this pushes my limits. I work in the image industry so have to be more careful than most regarding these, so others may have better luck than I, which is another way of saying acceptable risk is site dependent. I'd be very interested in any headway you make.

FWIW, I checked my current logs and found the MSRBL sigs blocked over 6,000 images in a two-week period. The Sanesecurity filters stopped an additional 4,000. There were a total of 16,383 messages blocked using all ClamAV filters, and many more thousands found by various milters and RBL/SURBL scans. This is on one of the smaller servers I run. The bigger mail farms are orders of magnitude greater for all categories. I mention this only because the out-of-pocket cost for these successes was $0.00 USD and very little time invested.

Which reminds me, I should send some donation money to all the great folks who made these successes possible.

dp
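For anyone wanting to repeat the slice comparison described above, here is a minimal sketch of the idea in Python. It is not the tooling actually used in the message; the helper names `slice_digests` and `shared_slices` are invented for this example, and it works on in-memory bytes rather than on files.

```python
import hashlib


def slice_digests(data, size):
    """Split data into fixed-size slices and return the MD5 hex digest of each."""
    return [
        hashlib.md5(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def shared_slices(a, b, size):
    """Count slice positions at which a and b have identical digests."""
    pairs = zip(slice_digests(a, size), slice_digests(b, size))
    return sum(1 for x, y in pairs if x == y)


if __name__ == "__main__":
    clean = bytes(256)
    # One changed byte in the second 128-byte slice only.
    one_hit = bytes(128) + b"\x01" + bytes(127)
    # One changed byte in every slice, as scattered noise would produce.
    noisy = (b"\x01" + bytes(127)) * 2

    print(shared_slices(clean, one_hit, 128))
    print(shared_slices(clean, noisy, 128))
```

A single flipped byte changes the digest of the whole slice it sits in, so noise sprinkled across the image leaves no slice in common, which fits the result reported above.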
Re: [Clamav-users] Complexity limit on (custom) signatures?
Dennis Peterson wrote:
> Not to change the direction on you, but you might want to take advantage
> of the work Steve Basford is doing at
> http://www.sanesecurity.com/clamav/ for phishing problems, and also look
> at http://www.msrbl.com/site/stats for image and spam solutions. Both
> sites are providing excellent results on systems I'm running. The
> patterns are downloadable and very up to date. I've not had a single
> complaint of false positives, and the number of patterns provided is
> quite large.

Those both look like excellent projects for the things they're targeting... but they don't really fit my problem.

Phishing scams are mostly tagged by Clam already, and if not, they're generally tagged by SpamAssassin. This is working fine.

Imagespam that doesn't mutate will quickly get noticed and tagged either via SpamAssassin's Bayes learner, or when I find a run of copies of the exact same image (which is all you can really tag with the MD5 signatures). FWIW, I have seen a few of these... about one in several thousand reported missed spams. :/

The stock and pill spams that I'm trying to tag, however, have images that have *very small* variations message-to-message, but over a larger sample there's really very little that can be seen as "common" across the whole set - or even a significant part of the set. Automating the process of finding "all possible values for the byte at this position" is the only way I can usefully get anywhere. On rare occasion, I find a duplicate, but that's ~1 in 500 or worse, which would add up to a LOT of MD5 sigs that wouldn't really do me any good.

I've seen general patterns in the hex dumps, but there's enough variation that manually creating a signature to match these things is unworkable.

> Steve has also written a very useable how-to for creating these patterns.

A lot of the how-tos I've seen assume that whatever you're trying to create a signature for shows minor variations message-to-message, but shows a *very* large range over a larger number of messages (100+). :/ Thus the scripts I wrote to extract a chunk of hex-coded bytes, and crunch those down to what should be valid ClamAV signatures.

An average signature from this process might look something like:

ImgSpam.Misc.5:0:0:474946383761??(01|00)??00442c??(01|00)??0084(00|48|53)(00|15)(00|30|1c)f0f0f0(f0|e0|c0)f0(e0|b0|f0|d0|c0)f0(00|f0|40)(00|d0|e0|60|70)(f0|90|00|c0)(e0|90|00|b0|70)f0??(00|90|40|7d|10)(f0|ea)??(f0|00|e0|d0|46)

Watch for linewrap; this is just the first ~175 characters of a ~630-character sig. The complexity is typical of results I've been getting, and the rest of the sig is similar.

-kgd
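The "all possible values for the byte at this position" step can be sketched in a few lines of Python. This is not the script from the message above, only a guess at its general shape: `crunch_signature`, `ndb_line` and the `max_alternatives` cut-off are hypothetical names, and the `Name:0:0:hex` layout simply mirrors the sample signature shown. ClamAV may place further limits on a signature that this sketch does not check.

```python
def crunch_signature(samples, max_alternatives=5):
    """Collapse equal-length hex strings into one hex pattern.

    A byte that is identical in every sample stays literal, a byte with a
    few observed values becomes an (aa|bb) alternation, and a byte with
    more than max_alternatives values becomes the ?? wildcard.
    """
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if length % 2 or any(len(s) != length for s in samples):
        raise ValueError("samples must be equal-length hex strings")

    parts = []
    for pos in range(0, length, 2):
        seen = []  # distinct byte values, in order of first appearance
        for sample in samples:
            byte = sample[pos:pos + 2].lower()
            if byte not in seen:
                seen.append(byte)
        if len(seen) == 1:
            parts.append(seen[0])
        elif len(seen) <= max_alternatives:
            parts.append("(" + "|".join(seen) + ")")
        else:
            parts.append("??")
    return "".join(parts)


def ndb_line(name, body):
    """Format a signature line in the Name:0:0:hex layout of the sample above."""
    return "{}:0:0:{}".format(name, body)


if __name__ == "__main__":
    chunks = ["47494638376101", "47494638376100", "47494638376101"]
    print(ndb_line("ImgSpam.Example.1", crunch_signature(chunks)))
```

Lowering `max_alternatives` trades precision for a shorter signature, which is one knob to turn if signature complexity does become a problem.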
Re: [Clamav-users] Complexity limit on (custom) signatures?
On Friday October 27, 2006 at 08:42:34 (PM) Dennis Peterson wrote:
> Not to change the direction on you, but you might want to take advantage
> of the work Steve Basford is doing at
> http://www.sanesecurity.com/clamav/ for phishing problems, and also look
> at http://www.msrbl.com/site/stats for image and spam solutions. Both
> sites are providing excellent results on systems I'm running. The
> patterns are downloadable and very up to date. I've not had a single
> complaint of false positives, and the number of patterns provided is
> quite large.
>
> Steve has also written a very useable how-to for creating these patterns.

Steve has done a remarkable job with his 'sig' files. He is constantly updating them. I know because I use them. They are always catching 'phishing' threats on my PC.

He also has two automated installers for downloading and installing his signature files. I wrote the 'script' version. There is also a Perl version available on his site.

--
Gerard

"There is nothing wrong with making love with the light on. Just make sure the car door is closed."

George Burns