RE: [Mimedefang] Image validator/OCR SA plugin
Hi,

> [...] be something to be gained by running the OCR scan from MIMEDefang? The idea would be to run the scan, and if sufficient text results (I'd hesitate to suggest that a quick spelling scan would be run on the result, but that is a possibility) that this text is written by MdF into a new text attachment. The message is then reformulated and passed to SpamAssassin. The advantage of this approach is that SA (and rules du jour) already have rules for catching things like pharma and stock scam e-mail, so the normal scoring should catch these things.

Hmm, the SA and rules du jour stock and obfu rules suck ;-) Besides that, I also match some words which are 100% legitimate. And the OCR words are often truncated, so one must match those too.

> Also this approach would work on versions of SA prior to 3.1.1. There is a design decision as to whether the OCR'd text attachment should remain in the message and then be delivered to the user, or whether it would only be kept if SA scores the message as spam.

If you add the OCR'd text attachment to the message you'll have to resend the whole message. Not a good idea IMHO.

Martin

___
NOTE: If there is a disclaimer or other legal boilerplate in the above message, it is NULL AND VOID. You may ignore it.
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
MIMEDefang@lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
Re: [Mimedefang] Image validator/OCR SA plugin
Martin Blapp wrote:

> Hmm, the SA and rules du jour stock and obfu rules suck ;-) Besides that, I also match some words which are 100% legitimate. And the OCR words are often truncated, so one must match those too.

But the real key is Bayes. Adding the OCR words to Bayes will be a real advantage.

> If you add the OCR'd text attachment to the message you'll have to resend the whole message. Not a good idea IMHO.

No, you only add it to what you feed SpamAssassin.

Regards,
David.
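David's point can be sketched roughly as follows: the OCR text is attached only to a copy of the message that goes to the spam scanner, while the original is delivered untouched. This is a minimal Python illustration using the standard `email` package, not MIMEDefang's or the plugin's actual code; the function name `build_scanner_copy`, the file name `ocr.txt`, and the sample message are made up for the example.

```python
import copy
import email
from email import policy


def build_scanner_copy(raw_message, ocr_text):
    """Return (original, scanner_copy); only the copy carries the OCR text."""
    original = email.message_from_string(raw_message, policy=policy.default)
    scanner_copy = copy.deepcopy(original)
    # add_attachment() turns the copy into multipart/mixed if it is not
    # already, then appends the OCR text as a text/plain attachment.
    scanner_copy.add_attachment(ocr_text, filename="ocr.txt")
    return original, scanner_copy


# Hypothetical sample message. It declares an explicit Content-Type so that
# the existing body is carried over as the first part of the copy.
raw = (
    "From: sender@example.com\n"
    "To: user@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "See the attached picture.\n"
)

original, scanner_copy = build_scanner_copy(raw, "cheap meds, buy this stock")
# scanner_copy would be handed to the spam scanner; original is what the
# user receives, so nothing has to be rewritten or resent.
```

Because the delivered message is never modified, Martin's concern about resending the whole message does not arise in this arrangement.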
RE: AW: [Mimedefang] Image validator/OCR SA plugin
-Original Message-
From: Martin Blapp
Sent: Monday, April 17, 2006 8:00 AM

>> SpamAssassin version is 3.1.0, looks like I'll have to upgrade to 3.1.1 to get this to work?
>
> Seems so, yes. I'll correct the manual.

Has this package/plugin been updated yet with the various fixes suggested to date? I had a little trouble tracking some of the suggested fixes.
RE: [Mimedefang] Image validator/OCR SA plugin
So far in my tests, this OCR plugin looks like it's working OK. I rounded up the needed prereqs (that was a bit of a chore, but everything compiled cleanly), and changed the package definition as indicated in Martin's post (be sure to run spamassassin -D --lint). So far I've seen several hits for the OCR SUSPECT_GIF rule, with no detectable problems.

Ken
Re: [Mimedefang] Image validator/OCR SA plugin
On 14 Apr 2006 at 18:42, Martin Blapp wrote:

> This is just a little advertisement for my plugin which is now in a usable state and works very well. Anyone interested should keep an eye on it - it really helps with the image only spam we get today. But probably the spammers will soon change their tricks to different images which are more difficult to read :-(

This is a really cool idea.

As far as spammers obfuscating their images, couldn't that be worked around by tying OCR into the Bayesian system? Then obfuscation wouldn't matter--whatever munging is done to a particular image would produce the same OCR strings, before and after Bayes training. You wouldn't need to know particular strings to match beforehand in that case.

That would force image spammers to produce a unique obfuscated graphic for every single message, which seems like an expensive proposition. Of course, I once thought producing a unique set of (text) Bayes poison for every message was expensive, and that sure didn't stop them...

Nels Lindquist * Information Systems Manager
Morningstar Air Express Inc.
Re: [Mimedefang] Image validator/OCR SA plugin
Nels Lindquist wrote:

> As far as spammers obfuscating their images, couldn't that be worked around by tying OCR into the Bayesian system?

I think the original idea was to obfuscate the images so people could read the text, but OCR tools wouldn't be able to.

> Then obfuscation wouldn't matter--whatever munging is done to a particular image would produce the same OCR strings, before and after Bayes training. You wouldn't need to know particular strings to match beforehand in that case.

True, but you'd need to see enough of them to train your Bayes engine.

> That would force image spammers to produce a unique obfuscated graphic for every single message, which seems like an expensive proposition.

Sadly, serious spammers have virtually unlimited computing resources. There are armies of thousands of zombie machines out there waiting to do their masters' bidding... Adding random noise that fools OCR tools but leaves the images legible for humans probably isn't that computationally expensive.

The only way to defeat image spam would be if Microsoft modified Outlook not to display HTML or images, and Thunderbird et al. followed suit. Anyone care to bet on the odds of that happening? :-(

Regards,
David.
Re: [Mimedefang] Image validator/OCR SA plugin
Dave Williss wrote:

> - Original Message -
> From: Gary Funck [EMAIL PROTECTED]
> To: mimedefang@lists.roaringpenguin.com
> Sent: Sunday, April 16, 2006 6:34 PM
> Subject: RE: [Mimedefang] Image validator/OCR SA plugin
>
>> Martin wrote:
>>> But probably the spammers will soon change their tricks to different images which are more difficult to read :-(
>>> http://antispam.imp.ch/patches/patch-ocrtext
>>
>> On this topic, Nick FitzGerald mentioned this article,
>> http://www.jgc.org/blog/2006/01/do-spammers-fear-ocr.html
>>
>> Sunday, January 15, 2006
>> Do spammers fear OCR?
>> Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.
>
> I'm afraid they'll start using OCR themselves. One common trick to allow a web site to have an email address humanly readable but not harvestable is to put it in an image. That may not be so safe any more :-( Of course, they'd have to scan an awful lot of images in the hopes of finding an email address in any of them, so they may not find it worth the effort.

Good! That's all I can say. Fine!

Putting stuff into bitmaps is a travesty. It makes websites (a) harder to search automatically, and, much more egregiously, (b) less accessible to people with vision impairment using text-to-speech browsers.

A lot of web pages also don't seem to take into account .21 or .19 pitch LCD monitors (like the 2560x1600 monitor I'm staring at). Do you know how small a 7x9 font looks on that monitor? It looks like lint.

Putting in images or explicit font sizes (like saying "use a 9 pixel high font here" instead of "use an 8 point font here" [and let the browser scale it appropriately]) is idiotic.

I'm glad SOME good has finally come of the travail of spammers (though I never thought I'd live to hear myself say it).

OK. End of rant.

-Philip
AW: [Mimedefang] Image validator/OCR SA plugin
Hi Martin,

> Anyone interested should keep an eye on it - it really helps with the image only spam we get today. But probably the spammers will soon change their tricks to different images which are more difficult to read :-(
> http://antispam.imp.ch/patches/patch-ocrtext

Just tried to get this to run on one of my test boxes:

* First problem, as described by Paul Murphy: Can't locate object method "new" via package "Mail::SpamAssassin::Plugin::ocrtext", so I changed the package definition.

* Next problem: running spamassassin -t on a test message gives me this output:

  [5681] warn: plugin: eval failed: Can't locate object method "new" via package "Mail::SpamAssassin::Timeout" (perhaps you forgot to load "Mail::SpamAssassin::Timeout"?) at /etc/mail/spamassassin/ocrtext.pm line 391.

SpamAssassin version is 3.1.0, looks like I'll have to upgrade to 3.1.1 to get this to work?

Thanks,
Martin
Re: AW: [Mimedefang] Image validator/OCR SA plugin
Hi,

> SpamAssassin version is 3.1.0, looks like I'll have to upgrade to 3.1.1 to get this to work?

Seems so, yes. I'll correct the manual.

Thanks,
Martin
Re: [Mimedefang] Image validator/OCR SA plugin
- Original Message -
From: Gary Funck [EMAIL PROTECTED]
To: mimedefang@lists.roaringpenguin.com
Sent: Sunday, April 16, 2006 6:34 PM
Subject: RE: [Mimedefang] Image validator/OCR SA plugin

> Martin wrote:
>> But probably the spammers will soon change their tricks to different images which are more difficult to read :-(
>> http://antispam.imp.ch/patches/patch-ocrtext
>
> On this topic, Nick FitzGerald mentioned this article,
> http://www.jgc.org/blog/2006/01/do-spammers-fear-ocr.html
>
> Sunday, January 15, 2006
> Do spammers fear OCR?
> Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.

I'm afraid they'll start using OCR themselves. One common trick to allow a web site to have an email address humanly readable but not harvestable is to put it in an image. That may not be so safe any more :-(

Of course, they'd have to scan an awful lot of images in the hopes of finding an email address in any of them, so they may not find it worth the effort.
RE: [Mimedefang] Image validator/OCR SA plugin
Martin wrote:

> But probably the spammers will soon change their tricks to different images which are more difficult to read :-(
> http://antispam.imp.ch/patches/patch-ocrtext

On this topic, Nick FitzGerald mentioned this article,
http://www.jgc.org/blog/2006/01/do-spammers-fear-ocr.html

  Sunday, January 15, 2006
  Do spammers fear OCR?
  Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.
  [...]
RE: [Mimedefang] Image validator/OCR SA plugin
Martin,

I installed your plugin for testing, but found that it would not load correctly on my system, giving the error:

  [5631] dbg: plugin: loading Mail::SpamAssassin::Plugin::ocrtext from @INC
  [5631] warn: plugin: failed to create instance of plugin Mail::SpamAssassin::Plugin::ocrtext: Can't locate object method "new" via package "Mail::SpamAssassin::Plugin::ocrtext" at (eval 28) line 1.

To solve this, I changed the package definition in ocrtext.pm to be:

  package Mail::SpamAssassin::Plugin::ocrtext;

The distributed version has

  package ocrtext;

instead, so while the plugin is loaded from the .pm file correctly, it then can't find anything which is registered using the fully qualified name.

Is there a test file available to demonstrate this working, or do I have to make one myself?

Best Wishes,
Paul.
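The mismatch Paul describes can be checked before SpamAssassin is even started. The following is only a rough Python sketch, not part of the plugin: it looks for `package NAME;` statements in the Perl source and reports whether the fully qualified name is declared. The helper names and the two sample strings are invented for the illustration.

```python
import re

# Name under which SpamAssassin looks for the plugin, per Paul's message.
EXPECTED = "Mail::SpamAssassin::Plugin::ocrtext"


def declared_packages(perl_source):
    """Return every name declared with a `package NAME;` statement."""
    return re.findall(r"^\s*package\s+([\w:]+)\s*;", perl_source, flags=re.MULTILINE)


def has_expected_package(perl_source, expected=EXPECTED):
    """True if the source declares the fully qualified plugin package."""
    return expected in declared_packages(perl_source)


# Hypothetical stand-ins for the first lines of ocrtext.pm.
distributed = "package ocrtext;\nuse strict;\n"
patched = "package Mail::SpamAssassin::Plugin::ocrtext;\nuse strict;\n"

print(has_expected_package(distributed))  # False
print(has_expected_package(patched))      # True
```

In practice one would read the real file, for example with `open("/etc/mail/spamassassin/ocrtext.pm").read()`, and still run spamassassin -D --lint afterwards as Ken suggests.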
[Mimedefang] Image validator/OCR SA plugin
Hi all,

This is just a little advertisement for my plugin which is now in a usable state and works very well.

Anyone interested should keep an eye on it - it really helps with the image only spam we get today. But probably the spammers will soon change their tricks to different images which are more difficult to read :-(

http://antispam.imp.ch/patches/patch-ocrtext

Martin

Martin Blapp, [EMAIL PROTECTED] [EMAIL PROTECTED]
--
ImproWare AG, UNIXSP ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: finger -l [EMAIL PROTECTED]
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
--
RE: [Mimedefang] Image validator/OCR SA plugin
Martin Blapp wrote:

> http://antispam.imp.ch/patches/patch-ocrtext

That is unbelievably sweet.

I remember a couple of years ago there was a virus that sent itself in a password-protected .zip file, with an image containing the password. OCR would have been useful... I could easily see MIMEDefang reading the password from the image and feeding it to the virus scanner.

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com
Software Engineer
Re: [Mimedefang] Image validator/OCR SA plugin
On Fri, 2006-04-14 at 18:42 +0200, Martin Blapp wrote:

> Anyone interested should keep an eye on it - it really helps with the image only spam we get today. But probably the spammers will soon change their tricks to different images which are more difficult to read :-(

Interesting... What's the performance like with this? How many messages do you scan per day with it?

Richard
Re: [Mimedefang] Image validator/OCR SA plugin
On Apr 14, 2006, at 9:42 AM, Martin Blapp wrote:

> Anyone interested should keep an eye on it - it really helps with the image only spam we get today. But probably the spammers will soon change their tricks to different images which are more difficult to read :-(

I can see it now ... pretty soon, we'll be seeing spam in CAPTCHA form.
Re: [Mimedefang] Image validator/OCR SA plugin
> Interesting... What's the performance like with this? How many messages do you scan per day with it?

It is rather fast. On a 3 GHz Pentium IV I can scan an average jpg/gif picture in 0.2 to 0.3 seconds. I've limited the scan time to 5 seconds per image, and I allow only three images to be scanned per mail. Of course this is user configurable.

The greps here cover the log only up to now, not a full day.

  grep "hits=" /var/log/maillog | wc -l
  78050

  grep "X-Spam-Status: Yes" /var/log/maillog | wc -l
  48400

  grep "hits=.*SPAMPIC" /var/log/maillog | wc -l
  9572

  grep "X-Spam-Status: Yes.*hits=.*SPAMPIC" /var/log/maillog | wc -l
  9558

  grep "X-Spam-Status: Yes.*hits=.*SPAMPIC" /var/log/maillog | grep HTML_IMAGE_ONLY | wc -l
  9528

  grep "HTML_IMAGE_ONLY" /var/log/maillog | wc -l
  35834

This means about 60% of all mails we get are spam. More than 10% of the spam consists of gif and jpg pictures advertising stocks and meds.

But almost 45% of all mails match HTML_IMAGE_ONLY, so it's not usable at all. I even use lower scores for those rules now, which gives me fewer FPs:

  score HTML_IMAGE_ONLY_04 1.400
  score HTML_IMAGE_ONLY_08 1.300
  score HTML_IMAGE_ONLY_12 1.200
  score HTML_IMAGE_ONLY_16 1.100
  score HTML_IMAGE_ONLY_20 0.950
  score HTML_IMAGE_ONLY_24 0.900
  score HTML_IMAGE_ONLY_28 0.700
  score HTML_IMAGE_ONLY_32 0.400

Martin
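The two limits Martin mentions (at most three images per mail, at most 5 seconds per image) could be enforced along the following lines. This is a hedged Python sketch of the idea only, not the plugin's Perl implementation; `ocr_images`, the constants, and the stand-in "OCR" command are assumptions made for the example, since the thread does not say which OCR program is called or how.

```python
import subprocess
import sys

MAX_IMAGES = 3        # images scanned per mail, as in the post
TIMEOUT_SECONDS = 5   # scan time allowed per image, as in the post


def ocr_images(image_paths, command, max_images=MAX_IMAGES, timeout=TIMEOUT_SECONDS):
    """Run `command + [path]` on at most `max_images` images, skipping timeouts."""
    texts = []
    for path in image_paths[:max_images]:
        try:
            result = subprocess.run(
                command + [path], capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            continue  # this image took too long; move on to the next one
        if result.returncode == 0:
            texts.append(result.stdout.strip())
    return texts


# Stand-in for a real OCR program: it only echoes the file name it was given.
fake_ocr = [sys.executable, "-c", "import sys; print('text from ' + sys.argv[1])"]

print(ocr_images(["a.gif", "b.gif", "c.jpg", "d.jpg"], fake_ocr))
# ['text from a.gif', 'text from b.gif', 'text from c.jpg']
```

The fourth image is never scanned, and an image whose scan exceeds the time limit simply contributes no text, which keeps the worst-case cost per mail bounded at roughly three times the timeout.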
Re: [Mimedefang] Image validator/OCR SA plugin
> grep "HTML_IMAGE_ONLY" /var/log/maillog | wc -l
> 35834

This is wrong. It should have been

  grep "HTML_IMAGE_ONLY.*hits=" /var/log/maillog | wc -l
  17917

> But almost 45% of all mails match HTML_IMAGE_ONLY, so it's not usable at all. I even use lower scores for those rules now, which gives me fewer FPs:

22% is still a lot ...

Martin
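For anyone who wants to recheck the percentages quoted in these two messages, the arithmetic can be redone from the grep counts. The short Python sketch below uses only the numbers posted in the thread; the variable names are made up.

```python
# Counts taken from the grep output posted in the thread.
scanned = 78050            # lines with "hits="
spam = 48400               # lines with "X-Spam-Status: Yes"
spam_with_spampic = 9558   # spam that also hit SPAMPIC
html_image_only = 17917    # corrected HTML_IMAGE_ONLY count


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


spam_share = percent(spam, scanned)                 # share of mail that is spam
spampic_share = percent(spam_with_spampic, spam)    # share of spam hitting SPAMPIC
image_only_share = percent(html_image_only, scanned)

print(spam_share, spampic_share, image_only_share)  # 62.0 19.7 23.0
```

With these counts, SPAMPIC hits make up closer to a fifth of the spam than a tenth, and the corrected HTML_IMAGE_ONLY share comes out just under a quarter of all scanned mail.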