Re: Stock spam in images
I'm having marvelous luck with FuzzyOCR, but the spammers are learning too.

When I first started using it just a couple of months ago, it really whacked the image-based spam. You could see why when "gocr file.gif" returned nice text that was easy to match against.

Now it is a different matter. I just got a lose-weight spam 10 minutes ago that gocr returns as:

lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_ _ t4 __cc_'un ic) __'ri_c _ hH3s, t_k _ ,r o_E,y _h K E,_ _ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_)) ' gg __, r _ Krvc)_H t)r r_irk cct .__ _ O _' Y O ___ TE_ E _Lncl nLnn __ mc)R hnrtb

The image tells me to go to www.realhgh dot org, but their GIF processing munged it enough to slip by gocr.

Not much FuzzyOCR can do with that :-(

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
RE: Stock spam in images
Greetings list,

The old timers on the list know I tend to try things outside the norm. Like my strong resistance to sitewide Bayes. Well, for months I've been using a simpler approach to these stock spams with images. I don't look at the image at all. Heresy, I know, but that's the way I roll :)

This goes back to my old philosophy: one rule hit (either FP, FN, or legit) should not make a message an FP, FN, or legit on its own. With that in mind, I wrote a series of 3-4 simple rules, scored them low, and watched the results. These are unpublished rules, and I'm not sure they are ready to be published just yet. But this is about the idea of what I'm doing.

Simple example: Is there even an inline image attached? (Note: I'm talking about a src="" here, not an image attached to the email!) Well, if there is, why not add low points? Which is what I do. I actually score this at a crazy 1.5!

Before you scream to the heavens that I'm nuts, let me continue. EVERY ONE of these stock image spams has hit multiple rules: SARE rules, standard rules, and the 3-4 rules I wrote after finding the simple patterns in these spams. This is the key. Combined rule hits mark it as spam. I've yet to see a single FP caused by ONE of these rules. Sure, if a legit mail comes through with a src="" it will hit the rule. But I've never seen one that also hit the other rules and was pushed over the marking threshold.

This is not a new idea by any means, but one that seems to be lost under the newfangled FuzzyOCR. I think FuzzyOCR is wonderful. ImageInfo is great! But IMHO, that's wasting too many CPU cycles and energy. Spammers are already trying animated GIFs and noise. I wanted to quietly give this method a try, and it seems to be working beautifully.

I say my rules aren't ready for publishing because for the public I'd like the rules to be tighter. Probably used as metas to reduce FPs in general world usage.

Anyway, I just wanted to say that sometimes the simple ways still work great!

(Any spelling errors in this post are your fault!)

Thanks,

Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com
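The scoring idea in the post above can be sketched in a few lines. This only illustrates additive scoring, not the unpublished rules themselves: the rule names and every score except the 1.5 for an inline image are made up for the example, and 5.0 is assumed as the marking threshold because it is SpamAssassin's default required_score.

```python
# Hypothetical rule names and scores; only the 1.5 for an inline image
# comes from the post above.
RULE_SCORES = {
    "INLINE_IMAGE": 1.5,     # message body contains an <img src="...">
    "STOCK_PATTERN_A": 1.2,  # made-up stand-in for a simple pattern rule
    "STOCK_PATTERN_B": 1.0,  # made-up stand-in for a simple pattern rule
    "SARE_STYLE_HIT": 1.8,   # made-up stand-in for a SARE ruleset hit
}

THRESHOLD = 5.0  # assumed: SpamAssassin's default required_score


def total_score(hits):
    """Sum the scores of the rules that fired on a message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits, threshold=THRESHOLD):
    """Tag a message only when the combined score reaches the threshold."""
    return total_score(hits) >= threshold


# A legitimate mail with an inline image hits one low-scored rule.
legit = ["INLINE_IMAGE"]
# A stock image spam hits several low-scored rules at once.
stock_spam = ["INLINE_IMAGE", "STOCK_PATTERN_A", "STOCK_PATTERN_B", "SARE_STYLE_HIT"]

print(is_spam(legit), is_spam(stock_spam))
```

No single rule in this table can reach the threshold on its own, which is the property the post relies on to keep false positives down.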
Re: Stock spam in images
Jason Haar wrote:
> I'm having marvelous luck with FuzzyOCR - but the spammers are learning too.
> [...]
> I just got a lose-weight spam 10 minutes ago that gocr returns as:
> [...]
> That tells me to go to www.realhgh dot org, but their GIF processing munged it enough to slip by gocr.
>
> Not much FuzzyOCR can do with that :-(

A few days ago, someone provided me with an image that returned garbage when using plain 'gocr file'. The trick to better detection is to adjust gocr's -l parameter to get better contrast (and better results).

By looping over the values 0 to 255 you will find a setting that gives good results for this type of image. If you start getting a lot of these images, adding another scanset with that setting will not add too many CPU cycles to your scan.

The new setting will almost certainly give you better results with other images too, so unless you have a really overloaded system, adding another scanset will not 'break the bank'.

--
Jorge Valdes
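The 0 to 255 loop described above could be automated roughly as follows. This is a sketch under assumptions, not part of FuzzyOCR: it assumes gocr is on the PATH and accepts the image file as its last argument, and it judges "good results" by counting hits against a keyword list you supply, which is a choice made for this example.

```python
import subprocess


def run_gocr(image_path, level):
    """Run gocr with grey-level threshold `level` and return its text output."""
    result = subprocess.run(
        ["gocr", "-l", str(level), image_path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def count_hits(text, keywords):
    """Count how many of the keywords appear in the OCR output."""
    lowered = text.lower()
    return sum(1 for word in keywords if word in lowered)


def best_level(image_path, keywords, ocr=run_gocr, levels=range(0, 256)):
    """Try every level and return (level, hits) for the most readable output.

    On ties the lowest level wins, because a later level must score
    strictly higher to replace the current best.
    """
    best = (None, -1)
    for level in levels:
        hits = count_hits(ocr(image_path, level), keywords)
        if hits > best[1]:
            best = (level, hits)
    return best
```

A call such as best_level("file.pnm", ["stock", "price"]) runs gocr 256 times, so it is something to do once per new image type when tuning a scanset, not on every incoming message. The ocr parameter exists so the search can be tried with a stand-in function when gocr is not installed.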
RE: Stock spam in images
For Debian users: I've found the following link, a step-by-step guide to implementing FuzzyOCR and ImageInfo with SpamAssassin.

http://www200.pair.com/mecham/spam/image_spam.html

Andrea
RE: Stock spam in images
This was answered a few threads ago, and more than once... Maybe you didn't scan enough ^^

You can use the FuzzyOCR module (but don't ask me how to use it, I've never tried ^^)

-----Original Message-----
From: Dylan Bouterse [mailto:[EMAIL PROTECTED]
Sent: Monday, 2 October 2006 15:38
To: users@spamassassin.apache.org
Subject: Stock spam in images

I'm a newbie to the list and have been scanning recent posts to see if what I'm about to ask about has been covered, but I haven't seen anything yet.

Lately I have been getting more and more of the stock alert spam, but now all the good info is in an image, and typically following the image is random text to fool the Bayesian filter. I think the random text thing has been covered here recently. It's frustrating when SA is giving a -1.6 (or so) score to these emails right off the bat. Quite a few of these aren't even getting spam headers because they aren't scoring high enough.

Is there some magical trick to help score these messages higher? Maybe a future version of SA will incorporate an OCR module? :)

Dylan
RE: Stock spam in images
Dylan Bouterse wrote:
> [...]
> Is there some magical trick to help score these messages higher? Maybe a future version of sa will incorporate an OCR module? :)

How about the FuzzyOCR plugin? That has been discussed quite a bit here recently.

http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

--
Bowie
RE: Stock spam in images
-----Original Message-----
From: Bowie Bailey [mailto:[EMAIL PROTECTED]
Sent: Monday, October 02, 2006 9:46 AM
To: users@spamassassin.apache.org
Subject: RE: Stock spam in images

> How about the FuzzyOCR plugin? That has been discussed quite a bit here recently.
>
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
> --
> Bowie

Thank you everyone for your responses! I will try the FuzzyOCR module.

Dylan
RE: Stock spam in images
This has been covered so many times on this list.

1: If you're not on SpamAssassin 3.1.5, get it now, and run sa-update (via a daily cron job, but test first with a manual "sa-update -D").

2: Pop over to http://www.rulesemporium.com and get an appropriate selection of their rules, and configure Rules du Jour ( http://www.exit0.us/index.php?pagename=RulesDuJour ) to download them daily.

3: Don't forget the additional rules here: http://www.rulesemporium.com/other-rules.htm
I've found Fred's header rules helpful.

4: Add the ImageInfo plugin from http://www.rulesemporium.com/plugins.htm

5: If you want to be adventurous, make sure you have ImageMagick, ImageMagick-perl and the other prerequisites installed, and use the FuzzyOCR plugin (latest version at http://www.joval.info/proj/FuzzyOcr.html , but see also http://wiki.apache.org/spamassassin/FuzzyOcrPlugin ). The FuzzyOCR mailing list is very helpful too.

In my experience here, a well-trained Bayes plus the various RulesEmporium rulesets gets most of them.

Cheers,
Phil
--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

-----Original Message-----
From: Dylan Bouterse [mailto:[EMAIL PROTECTED]
Sent: 02 October 2006 14:38
To: users@spamassassin.apache.org
Subject: Stock spam in images

[...]
Is there some magical trick to help score these messages higher? Maybe a future version of sa will incorporate an OCR module? :)

Dylan
RE: Stock spam in images
Giampaolo Tomassoni wrote:
> And, by the way, it seems to work!
>
> Actually, the only limit I see is the home-made FuzzyOcr.words (and, maybe, the fact that script text may go undetected).
>
> Wouldn't it be better to inject the detected text back into SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images.
>
> Am I wrong?

I think so. Some of the words would be perfectly legitimate in the text of emails but are rarely found in legitimate attached images.

Quite apart from the fact that SpamAssassin isn't designed for reinjection.

Cheers,
Phil
--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK
Re: Stock spam in images
On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
> > Wouldn't it be better to inject the detected text back to SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images.
>
> I think so. Some of the words would be perfectly legitimate in the text of emails but rarely found in attached legitimate images. Quite apart from the fact that Spamassassin isn't designed for reinjection.

FWIW, 3.2 adds support for rendering non-text parts. So a plugin could, for instance, OCR text from an image, and then the normal body rules and such would be able to use that information.

--
Randomly Selected Tagline:
... and now we have a parallelogram, or at least we would if I could draw. - Prof. Farr
RE: Stock spam in images
Too bad, because I agree with Giampaolo: it would be great.

What about making a plugin that includes the OCR components but, instead of using its own internal dictionary, passes the text back to SpamAssassin through the MTA?

Yeah, I know, the load will increase... But that would be nice, no?

...

OK, I'll go back to sleep.

-----Original Message-----
From: Randal, Phil [mailto:[EMAIL PROTECTED]
Sent: Monday, 2 October 2006 16:19
To: users@spamassassin.apache.org
Subject: RE: Stock spam in images

[...]
I think so. Some of the words would be perfectly legitimate in the text of emails but rarely found in attached legitimate images.

Quite apart from the fact that Spamassassin isn't designed for reinjection.

Cheers,
Phil
Re: Stock spam in images
Theo Van Dinter wrote:
> FWIW, 3.2 adds in support to have rendering of non-text parts. So a plugin could, for instance, OCR text from an image, and then the normal body rules and such would be able to use that information.

Would it also be possible to create a rule that matches on text rendered specifically from a non-text part and not the whole body? That way you could get the benefit of Bayes and existing body rules in the general case while still taking advantage of the fact that certain words in an image carry more spammy weight than the same words in text.
Re: Stock spam in images
Stuart Johnston wrote:
> Would it also be possible to create a rule that matches on text rendered specifically from a non-text part and not the whole body?
> [...]

Or perhaps:

tflags RULE_NAME ocr

/Andreas
RE: Stock spam in images
You'd need some clever rules...

As an example, the word "stock" is perfectly valid in emails, but if you found it in an attached image you'd be pretty sure it was spam.

So you'd need two sets of rules anyhow. It looks like SA 3.2 will let us do that in a sane manner.

Phil
--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

-----Original Message-----
From: Fabien GARZIANO [mailto:[EMAIL PROTECTED]
Sent: 02 October 2006 16:11
To: users@spamassassin.apache.org
Subject: RE: Stock spam in images

Too bad, cause I agree with Giampaolo, it would be great.

What about making a plugin including OCR components but instead of using inner dictionary, passing it back to spamassassin through the MTA...
[...]
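The "two sets of rules" point can be pictured with a small sketch. Nothing here is SpamAssassin's API or the 3.2 mechanism; the word lists and scores are invented for the example, and the only idea taken from the thread is that the same word is scored differently depending on whether it came from the body text or from text read out of an image.

```python
import re

# Hypothetical per-source word scores: "stock" is harmless in body text
# but suspicious when it was read out of an attached image.
BODY_SCORES = {"stock": 0.0}
IMAGE_SCORES = {"stock": 2.5, "alert": 1.0}


def score_text(text, table):
    """Add up the table score of every distinct known word found in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(score for word, score in table.items() if word in words)


def score_message(body_text, image_text):
    """Score body text and OCR'd image text against separate tables."""
    return score_text(body_text, BODY_SCORES) + score_text(image_text, IMAGE_SCORES)


# The same word costs nothing in the body and 2.5 in an image.
print(score_message("Please check the stock room", ""))
print(score_message("", "HOT STOCK ALERT"))
```

Keeping the two tables apart is what prevents a legitimate mail that merely mentions stock from being penalised, which is the concern raised earlier in the thread about feeding OCR text straight into the ordinary body rules.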
RE: Stock spam in images
Newbie is a derogatory term, and to call yourself a newbie is like calling yourself a moron (no offense).

From Wiki:

"A newbie is a newcomer to a particular field, the term being commonly used on the Internet, where it might refer to new, inexperienced, or ignorant users of a game, a newsgroup, an operating system or the Internet itself. The term is generally regarded as an insult, although in many cases more experienced/knowledgeable people use it in purposes of negative reinforcement, urging newbies to learn more about the field or area in question."

Sorry, just had to say it. Was bugging me. :)

-----Original Message-----
From: Dylan Bouterse [mailto:[EMAIL PROTECTED]
Sent: Monday, October 02, 2006 9:38 AM
To: users@spamassassin.apache.org
Subject: Stock spam in images

I'm a newbie to the list and have been scanning recent posts to see if what I'm about to ask about has been covered but I haven't seen anything yet.
[...]
RE: Stock spam in images
> > [...snip...]
> > How about the FuzzyOCR plugin? That has been discussed quite a bit here recently.
> >
> > http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
> >
> > --
> > Bowie
>
> And, by the way, it seems to work!
>
> Actually, the only limit I see is the home-made FuzzyOcr.words (and, maybe, the fact that script text may go undetected).
>
> Wouldn't it be better to inject the detected text back into SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images.
>
> Am I wrong?

Probably not... Just wish there was a compiled version for Windows...

ImageInfo also works well for the image spam. Check www.rulesemporium.com for that. ImageInfo also has less CPU overhead...

Bret
Re: Stock spam in images
Theo Van Dinter wrote:
> FWIW, 3.2 adds in support to have rendering of non-text parts. So a plugin could, for instance, OCR text from an image, and then the normal body rules and such would be able to use that information.

This sounds great. Once I am back to continuing the development of FuzzyOcr, I might add an option to pass the text back to SA. Combined with a new, more precise OCR engine like tesseract, this will probably work very well.

Unfortunately, there is currently a lot of picture spam being sent around which won't be caught at all by FuzzyOcr, because it uses new obfuscation techniques with animated GIFs etc., and I don't have the time at the moment to adjust the plugin to these...

Best regards

Chris
Re: Stock spam in images
Randal, Phil wrote:
> This has been covered so many times on this list.
> [...]
> 5: if you want to be adventurous, make sure you have ImageMagick, ImageMagick-perl and other prerequisites installed and use the FuzzyOCR plugin ( latest version at http://www.joval.info/proj/FuzzyOcr.html , but see also http://wiki.apache.org/spamassassin/FuzzyOcrPlugin ). The FuzzyOCR mailing list is very helpful too.

What do you mean by adventurous? Those versions published by joval are all devel. The stable version is available at http://users.own-hero.net/~decoder/fuzzyocr/ and works fine. There is nothing adventurous about it, and the prerequisites are also lower than for the devel stuff.

I am simply not able to continue development at the moment, but maybe in a few weeks I'll start again.

Best regards,

Chris

> In my experience here a well-trained Bayes plus the various RulesEmporium rulesets gets most of them.
>
> Cheers,
> Phil
> [...]
Re: Stock spam in images
On Mon, Oct 02, 2006 at 11:05:38AM -0500, Stuart Johnston wrote:
> Would it also be possible to create a rule that matches on text rendered specifically from a non-text part and not the whole body? That way you

You'd have to do that in a plugin, but otherwise, sure. There's currently no method to have a body rule specify the content-types that it tries to get matched against.

--
Randomly Selected Tagline:
* Do not remove this tagline under penalty of the law *
RE: Stock spam in images
-----Original Message-----
From: Randal, Phil [mailto:[EMAIL PROTECTED]
Sent: Monday, October 02, 2006 3:58 AM
To: Dylan Bouterse; users@spamassassin.apache.org
Subject: RE: Stock spam in images

This has been covered so many times on this list.

1: if you're not on spamassassin 3.1.5 get it now, and run sa-update (via a cron job daily, but test first with a manual sa-update -D)

2: pop over to http://www.rulesemporium.com and get an appropriate selection of their rules, and configure Rules du Jour ( http://www.exit0.us/index.php?pagename=RulesDuJour ) to download them daily.

[Wilson] Does RulesDuJour support an auto update for Step #4 (ImageInfo.cf)?

3: don't forget the additional rules here: http://www.rulesemporium.com/other-rules.htm
I've found Fred's header rules helpful

4: add the ImageInfo plugin from http://www.rulesemporium.com/plugins.htm

[Wilson]
# Install (From ImageInfo.pm):
# 1) place ruleset in your local config dir
# 2) place plugin in your plugins dir
# 3) add to init.pre (or v310.pre) the following line
#      loadplugin Mail::SpamAssassin::Plugin::ImageInfo
#    or if not in plugin dir..
#      loadplugin Mail::SpamAssassin::Plugin::ImageInfo /path/to/plugin
# 4) restart spamd (if necessary)

When installing the ImageInfo plugin, where do you put ImageInfo.pm if you don't define a path? I'm running CentOS 4.4 and Fedora Core 5 as test machines.

Thanks!
Wilson
RE: Stock spam in images
On Tue, October 3, 2006 00:01, Gary V wrote:
> > For installing the ImageInfo plugin where do you put the ImageInfo.pm without defining a path? Im running CentOS4.4 Fedora Core 5 as test machines.
>
> This should find your Plugin directory (which is where you place it):
>
> find /usr -type d -name Plugin

Remember to install the plugin again after an RPM update to a new Perl version. That's why it's better to use /etc/mail/spamassassin/ as the plugin dir, and to load the plugin from local.pre with the full path to the Perl module.

--
This message was sent using 100% recycled spam mails.
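The two suggestions above (locate the Plugin directory, or load the module by full path) can be sketched as follows. This is only a convenience script, assuming a standard filesystem layout: find_plugin_dirs does roughly what the quoted find command does, and loadplugin_line just formats the line shown in the ImageInfo.pm install notes. The /etc/mail/spamassassin/ImageInfo.pm path in the usage note is an assumed file name built from the directory named in the post.

```python
import os


def find_plugin_dirs(root="/usr"):
    """List directories named 'Plugin' under root, skipping symlinks,
    roughly like `find root -type d -name Plugin`."""
    matches = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if name == "Plugin" and not os.path.islink(full):
                matches.append(full)
    return sorted(matches)


def loadplugin_line(module, path=None):
    """Build a loadplugin line, with an explicit path to the .pm file if given."""
    line = "loadplugin " + module
    if path:
        line += " " + path
    return line
```

For example, loadplugin_line("Mail::SpamAssassin::Plugin::ImageInfo", "/etc/mail/spamassassin/ImageInfo.pm") produces the full-path form, which keeps working after a Perl package update replaces the system Plugin directory.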