test my bleeding edge broken code. with your finger!
two bits of sa related code i've written, neither of which i'd particularly call polished, but if you feel like firing them up, i'd love to hear your feedback:

Phisher: http://www.faisal.com/software/phisher/

This is a plugin that does nothing more complicated than check for the case of something like <a href=http://scam.ru>www.paypal.com</a>. I've run it on and off since August of last year, although most of the time was not after 3.1.1 (which is why I only claim it works on 3.1). I don't have a suggested score for it (would love feedback there). I ran it at .1 mostly to see how much it triggered and fp'd (not much, as it turns out. I know this has been a problem in the past, so I'm wondering if the normalization code helps there, or I've just been lucky). As noted, this has some rewrite bits coming when I get some time.

sa-harvest: http://www.faisal.com/software/sa-harvest/

This is a script that does several obvious things and one possibly not-so-obvious thing:

- You configure it, telling it what your spam and ham folders are, and after that it will automatically train whenever you invoke it, without having to explicitly configure folders to scan (I find this useful for cron jobs, and less typing when I'm doing the same obvious thing every couple days).
- It also scans your ham boxes and automatically rebuilds your whitelist based on the contents of presumed good folders (this will mangle your user_prefs. READ THE DOCS ON HOW THIS WORKS SO YOU DON'T LOSE OTHER SETTINGS.)

I've been using variants of this script since about a week after the first SA with training came out. I finally generalized it a little last month, and have been running it nightly via cron ever since.

Feedback would be greatly appreciated.

-faisal
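For readers who want to see the kind of check Phisher is described as doing, here is a minimal Python sketch. It is not the plugin's code (the plugin is a SpamAssassin plugin and its internals are not shown in this message); the class name, the function name and the "looks like a host name" pattern are all assumptions made for illustration. It flags links whose visible text looks like a host name that differs from the host in the href.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical pattern: anchor text that itself looks like a host name or URL.
HOSTLIKE = re.compile(r"^(?:https?://)?([a-z0-9.-]+\.[a-z]{2,})(?:[/:].*)?$", re.I)


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return links whose text names one host while the href points at another."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.links:
        m = HOSTLIKE.match(text)
        if not m:
            continue  # text such as "click here" is not host-like
        shown = m.group(1).lower()
        actual = (urlparse(href).hostname or "").lower()
        # Tolerate text "paypal.com" linking to "www.paypal.com".
        if actual and shown != actual and not actual.endswith("." + shown):
            hits.append((href, text))
    return hits
```

Calling `mismatched_links('<a href="http://scam.ru">www.paypal.com</a>')` returns the one suspicious pair, while a link whose text and href agree returns an empty list. A real rule would also need to handle redirectors and legitimately differing hosts, which is presumably where the false-positive question raised above comes from.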
Strange Score
Look at this,

X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on chrysalis.chrysalishosting.com
X-Spam-Level:
X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
    HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
    SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
Received: from localhost by chrysalis.chrysalishosting.com
    with SpamAssassin (version 3.1.4); Fri, 25 Aug 2006 01:45:43 -0500

The score is off. It flagged the message as {Spam?} as it should, because the required score is 5. XSpam level shows 5 stars, but the line below says it got a spam score of 4.3.

What gives?
Re: Strange Score
On 8/25/2006 2:59 AM, Christopher Mills wrote:
> Look at this,
>
> X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on chrysalis.chrysalishosting.com
> X-Spam-Level:
> X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
>     HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
>     SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
> Received: from localhost by chrysalis.chrysalishosting.com
>     with SpamAssassin (version 3.1.4); Fri, 25 Aug 2006 01:45:43 -0500
>
> The score is off. It flagged the message as {Spam?} as it should, because the required score is 5.
> XSpam level shows 5 stars, but the line below says it got a spam score of 4.3

Unless I'm mistaken, there's only 4 stars.

> What gives?

It looks like it was scanned once, found to be spam, and then scanned again, and found not to be spam.

Daryl
Re: Broken images in mails
> Adding a point for corrupted images is sounding better and better.

I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result

giftopnm: EOF or error reading data portion...

So I produced a corrupt (?) image, but it was not spam.

I have no idea what is wrong and how it could be fixed. Only this: a GIF file seems to be divided into several blocks. Perhaps one block (perhaps the last block) is too short and does not match its block header (if any exists?). Perhaps it is possible to read out the correct block length from a header and fill the block with 00h to get a valid GIF file.

Ah... I just found that there is a program named GIFFIX. I should try it out.
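The guess about blocks is close to how the format works: after the header, a GIF is a sequence of extension and image blocks whose data is stored in length-prefixed sub-blocks, ended by a trailer byte. The Python sketch below walks that structure and reports the first place where the file ends early. It is only an illustration of the layout as I understand the GIF89a specification, not what giftopnm or giffix actually do, and the function name `check_gif` is made up here. It does not decode any pixel data.

```python
def check_gif(data: bytes) -> str:
    """Return "ok" or a short description of the first structural problem."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        return "not a GIF"
    if len(data) < 13:
        return "truncated screen descriptor"
    pos = 13                                # 6-byte header + 7-byte screen descriptor
    packed = data[10]
    if packed & 0x80:                       # the global color table is optional
        pos += 3 * (2 ** ((packed & 0x07) + 1))

    def skip_sub_blocks(p):
        # Each sub-block is a length byte followed by that many data bytes;
        # a zero length byte terminates the chain.
        while True:
            if p >= len(data):
                return None
            size = data[p]
            p += 1
            if size == 0:
                return p
            p += size
            if p > len(data):
                return None

    while True:
        if pos >= len(data):
            return "missing trailer"
        kind = data[pos]
        pos += 1
        if kind == 0x3B:                    # trailer: end of the GIF stream
            return "ok"
        if kind == 0x21:                    # extension: label byte, then sub-blocks
            pos = skip_sub_blocks(pos + 1)
        elif kind == 0x2C:                  # image descriptor: 9 bytes follow
            if pos + 9 > len(data):
                return "truncated image descriptor"
            ipacked = data[pos + 8]
            pos += 9
            if ipacked & 0x80:              # optional local color table
                pos += 3 * (2 ** ((ipacked & 0x07) + 1))
            pos = skip_sub_blocks(pos + 1)  # +1 skips the LZW minimum code size
        else:
            return f"unknown block 0x{kind:02x} at offset {pos - 1}"
        if pos is None:
            return "truncated data sub-block"
```

A file that ends in the middle of a data sub-block is reported as "truncated data sub-block", which would be consistent with the "EOF or error reading data portion" message quoted above, though that is an inference rather than something confirmed from the giftopnm source.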
Re: Strange Score
On 8/25/2006 3:06 AM, Christopher Mills wrote:
> You're right...time to change my glasses, BUT, it is flagging the message as SPAM when the score has not yet reached the required 5.0, any clues as to why that is so?

Unless you're invoking SpamAssassin with the -t option, I highly doubt that this is what is happening. It's far more likely that, like I said before, you're scanning your mail twice.

Daryl
Re: False positives and Bayes
Hi,

Justin Lloyd wrote:
> Hello, all.
>
> A couple of months ago I built new mail servers to replace our existing ones that had aging mail configurations (and disparate OS configurations), running sendmail 8.12.6 and SA 3.0.2. Our configuration now consists of 2 RHEL 4 ES servers that share the load using DNS round-robin, running sendmail 8.13.7 and SpamAssassin 3.1.3, and we are running sa-update and rulesdujour nightly (though actual updates are rare). We use spamass-milter 0.31, which we have configured to drop spams with scores >= 10, thereby dropping about 75% of the incoming email before it gets to our Exchange servers. Speaking of which, these servers do not deliver mail locally, rather all received mail either goes to internal MS Exchange servers or Linux helpdesk and mailing list servers. Also, our company is about 350 people and we receive a good deal of legitimate international email.
>
> Here is our SA configuration from /etc/mail/spamassassin/local.cf:
>
> required_score 5
> rewrite_header Subject *** SPAM [_SCORE_] ***
> report_safe 0
> dcc_path /usr/local/bin/dccproc
> razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
> dns_available yes
> bayes_path /localhost/home/spamd/bayes
> bayes_auto_learn_threshold_spam 30
> bayes_auto_learn_threshold_nonspam -0.1
> bayes_min_ham_num 10
> bayes_min_spam_num 10
> auto_whitelist_path /localhost/home/spamd/auto-whitelist
> include /etc/mail/spamassassin/whitelist
> include /etc/mail/spamassassin/blacklist
>
> Here are the statistics from both mail servers for the past 31 days:
>
> Email: 1303815  Autolearn: 608540  AvgScore: 12.23  AvgScanTime: 1.38 sec
> Spam:   745609  Autolearn: 139632  AvgScore: 23.36  AvgScanTime: 1.52 sec
> Ham:    558206  Autolearn: 468908  AvgScore: -2.63  AvgScanTime: 1.20 sec
>
> Email:  945103  Autolearn: 284139  AvgScore: 15.33  AvgScanTime: 1.46 sec
> Spam:   701327  Autolearn: 131994  AvgScore: 22.30  AvgScanTime: 1.46 sec
> Ham:    243776  Autolearn: 152145  AvgScore: -4.74  AvgScanTime: 1.44 sec
>
> (We think the disparity in mail counts between the two is due to some senders having cached or hard-coded the first one's IP address and using it rather than MX lookups like normal people do.)
>
> The major problem we are seeing is a number of false positives in the 6-8 point range due to 3.5 points from BAYES_99 on messages that should not be hitting that rule from what we can see. One thing we've noticed is that many such messages are from mailing lists and newsletters and from ISPs that shall remain nameless, though many of these also score high due to several rfc-ignorant rules being hit.
>
> We have turned off Bayes in the past (before the upgrade) and are debating doing so again, but first we decided to see what constructive criticism and advice the SA community may have regarding our configuration. Please let me know if any additional information would be useful.

How do you train your Bayes database?

You should be feeding the false positives back using sa-learn as ham, so that the Bayes scorer learns that these are not spam. I manually train Bayes with false positives and false negatives on a regular basis.

You probably should also be looking at whitelisting some of the mailing lists. When the manual training really doesn't convince Bayes that the spammy-looking mailing list messages are ham, I add those lists to one of the whitelists.

--
Anthony Peacock
CHIME, Royal Free University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
If you have an apple and I have an apple and we exchange apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas. -- George Bernard Shaw
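As an illustration of the retraining advice, here is a small Python sketch that builds and runs the two sa-learn invocations: false positives are learned as ham and false negatives as spam. The folder paths and the function names are hypothetical; only the `sa-learn --ham`, `--spam` and `--mbox` options are taken from SpamAssassin itself. In a site-wide setup like the one described, it would have to run as the user that owns the Bayes database.

```python
import subprocess


def sa_learn_command(kind, folder, mbox=False):
    """Build the argument list for one sa-learn run ("ham" or "spam")."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    cmd = ["sa-learn", f"--{kind}"]
    if mbox:
        cmd.append("--mbox")  # the folder is a single mbox file, not a directory
    cmd.append(str(folder))
    return cmd


def retrain(false_positives, false_negatives, run=subprocess.run):
    """Learn misfiled mail: false positives are ham, false negatives are spam."""
    run(sa_learn_command("ham", false_positives), check=True)
    run(sa_learn_command("spam", false_negatives), check=True)
```

For example, `retrain("/var/spool/retrain/fp", "/var/spool/retrain/fn")` (made-up paths) would call sa-learn twice. The `run` parameter exists only so the commands can be inspected without executing anything.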
Re: Broken images in mails
Plenz wrote:
> > Adding a point for corrupted images is sounding better and better.
>
> I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result
>
> giftopnm: EOF or error reading data portion...
>
> So I produced a corrupt (?) image, but it was not spam.
>
> I have no idea what is wrong and how it could be fixed. Only this: a GIF file seems to be divided into several blocks. Perhaps one block (perhaps the last block) is too short and does not match to its block header (if any exists?). Perhaps it is possible to read out the correct block length from a header and fill the block with 00h to get a valid GIF file.
>
> Ah... I just found that there is a program named GIFFIX. I should try it out.

FuzzyOcr will try to invoke giffix if an image is broken. If giffix does not completely fail, then it will only give a low score for the picture being corrupted. If it isn't able to fix the image at all, then it will give a higher score.

Chris
Re: Strange Score
From: Christopher Mills [EMAIL PROTECTED]

> Look at this,
>
> X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on chrysalis.chrysalishosting.com
> X-Spam-Level: 12345
> X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
>     HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
>     SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
> Received: from localhost by chrysalis.chrysalishosting.com
>     with SpamAssassin (version 3.1.4); Fri, 25 Aug 2006 01:45:43 -0500
>
> The score is off. It flagged the message as {Spam?} as it should, because the required score is 5.
> XSpam level shows 5 stars, but the line below says it got a spam score of 4.3
>
> What gives?

Er, a recount is called for in Florida, again.

{^_-}
Problem with conf
Hello,

My local.cf looks like this:

required_hits 5.0
add_header all Report _REPORT_
rewrite_header Subject 1
add_header spam Flag _YESNOCAPS_
add_header all Checker-Version SpamAssassin _VERSION_ (_SUBVERSION_) on _HOSTNAME_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
add_header all Level _STARS(*)_
dns_available yes
dcc_add_header 0
skip_rbl_checks 0
bayes_auto_learn 1
use_bayes 1
bayes_path /var/qmail/spamassassin/
auto_whitelist_path /var/qmail/spamassassin/auto_whitelist
use_pyzor 1
use_razor2 1

The file is being read, because changing required_hits has an effect, but the add_header directives are not working. I still get the same headers all the time:

X-Spam-Status: No, hits=1.1 required=5.0
X-Spam-Level: +

Shouldn't they be more complete than that? Why are the add_header lines not used?

Thanks,

--
Jean Wrack Respen
[EMAIL PROTECTED]
http://www.wrackweb.net
Re: Filtering spam in national language
> I work in an italian company, we are receiving some spam written in (very bad) italian language, obviously produced by some automatic translator. Although their content is heavily pornographic, the spam score is very low, because they don't match any of the porn-specific rules, which are designed only for english language.
>
> Does anybody know how can we extend the basic rules to add support for italian language pornographic spam?

There are a number of language-specific rulesets on http://wiki.apache.org/spamassassin/CustomRulesets , such as the http://www.ccert.edu.cn/spam/sa/Chinese_rules.cf ruleset, written to catch spam written in Chinese.

I would suggest that this is the best approach -- maintain your own ruleset to catch them... writing SpamAssassin rules is quite easy ;)

--j.
FP with Outlook SMTPing to Lotus Domino
Hi,

I just spotted this FP in our SA 3.1.4 quarantine... I have no means to contact the sender, but I guess he used an Outlook (Express?) client to SMTP a Domino server. Even if we had the threshold at the default 5 it would have been stopped.

Is there a workaround on the rules or should I decrease some scores?!

Moreover PRIORITY_NO_NAME is not listed in http://spamassassin.apache.org/tests_3_1_x.html but is present in my 20_head_tests.cf (require_version 3.001004).

TIA,
Paolo

X-Spam-Status: Yes, score=5.091 tag=-999 tag2=3.5 kill=3.5
    tests=[BAYES_00=-2.599, HTML_40_50=0.496, HTML_MESSAGE=0.001,
    MSGID_DOLLARS=1.716, PRIORITY_NO_NAME=2.7, RATWARE_OUTLOOK_NONAME=2.777]
Received: from smarthost02.ISP.it (smarthost02.ISP.it [xxx.yyy.zzz.nnn])
    by MYamavisSERVER.it (Postfix) with ESMTP id 777AD5840A5;
    Fri, 25 Aug 2006 09:12:11 +0200 (CEST)
Received: from relay03.portal ([192.168.bbb.aaa])
    by smarthost02.ISP.it (Lotus Domino Release 6.5.1)
    with ESMTP id 2006082509000547-2363 ;
    Fri, 25 Aug 2006 09:00:05 +0200
Received: from acme ([xxx.yyy.zzz.mmm])
    by relay03.portal (Lotus Domino Release 6.5.1)
    with ESMTP id 2006082509004734-2554 ;
    Fri, 25 Aug 2006 09:00:47 +0200
Message-ID: [EMAIL PROTECTED]
From: RFC2822 COMPLIANT [EMAIL PROTECTED]
To: RFC2822 COMPLIANT
Subject: RFC2822 COMPLIANT
Date: Fri, 25 Aug 2006 09:11:30 +0200
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1807
Content-Type: multipart/alternative; boundary==_NextPart_000_0005_01C6C826.73BE4680
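To see how much each rule contributes before deciding which scores to lower, the bracketed tests=[...] list from the X-Spam-Status header above can be summed directly. The Python sketch below does that arithmetic; the function name `parse_tests` is made up for illustration, and the header text is copied from the message.

```python
import re


def parse_tests(status):
    """Return {rule: score} from a bracketed tests=[NAME=score, ...] list."""
    m = re.search(r"tests=\[([^\]]*)\]", status)
    if not m:
        return {}
    items = [item.strip() for item in m.group(1).split(",") if item.strip()]
    return {name: float(score) for name, score in (i.split("=", 1) for i in items)}


status = ("Yes, score=5.091 tag=-999 tag2=3.5 kill=3.5 tests=[BAYES_00=-2.599, "
          "HTML_40_50=0.496, HTML_MESSAGE=0.001, MSGID_DOLLARS=1.716, "
          "PRIORITY_NO_NAME=2.7, RATWARE_OUTLOOK_NONAME=2.777]")

scores = parse_tests(status)
total = sum(scores.values())
# Score the message would get if the two Outlook-related rules did not fire.
without = total - scores["PRIORITY_NO_NAME"] - scores["RATWARE_OUTLOOK_NONAME"]

print(round(total, 3))    # 5.091, matching score= in the header
print(round(without, 3))  # -0.386
```

The two rules PRIORITY_NO_NAME and RATWARE_OUTLOOK_NONAME together account for 5.477 points, so either one alone already pushes this message close to the tag2/kill level of 3.5 despite the BAYES_00 credit.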
Re: sa-learn -q patch in FreeBSD
Mark Martinec writes:
> Vivek Khera wrote:
> > in the current port for 3.1.4, there are no freebsd-specific patches to SA, so whatever this was is no longer there.
>
> You are one day behind :)
>
> > On Aug 23, 2006, at 5:01 PM, Justin Mason wrote:
> > > anyone know what this is/does?
> > > http://cia.navi.cx/stats/project/FreeBSD/.message/32ba98d/xml
>
> No idea why it is there, but apparently adds option -q (=quiet) to sa-learn and to spamassassin, suppressing the:
>
>   print $phrase tokens from $learnedcount message(s) ($messagecount message(s) examined)\n
>
> and the:
>
>   print $count message(s) examined.\n

well, if the FreeBSD people want to submit that patch upstream, we'd appreciate it. Forks between platforms are not a good thing.

--j.
RE: FP with Outlook SMTPing to Lotus Domino
You might wish to look at tweaking your BAYES_xx scores to reduce false positives. I guess that depends on how healthy your Bayes database is, though.

Cheers,

Phil

--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

-----Original Message-----
From: Paolo Cravero as2594 [mailto:[EMAIL PROTECTED]
Sent: 25 August 2006 11:27
To: SpamAssassin Users
Subject: FP with Outlook SMTPing to Lotus Domino

> Hi,
>
> I just spotted this FP in our SA 3.1.4 quarantine... I have no means to contact the sender, but I guess he used an Outlook (Express?) client to SMTP a Domino server. Even if we had the threshold at the default 5 it would have been stopped.
>
> Is there a workaround on the rules or should I decrease some scores?!
>
> Moreover PRIORITY_NO_NAME is not listed in http://spamassassin.apache.org/tests_3_1_x.html but is present in my 20_head_tests.cf (require_version 3.001004).
>
> TIA,
> Paolo
>
> X-Spam-Status: Yes, score=5.091 tag=-999 tag2=3.5 kill=3.5
>     tests=[BAYES_00=-2.599, HTML_40_50=0.496, HTML_MESSAGE=0.001,
>     MSGID_DOLLARS=1.716, PRIORITY_NO_NAME=2.7, RATWARE_OUTLOOK_NONAME=2.777]
> Received: from smarthost02.ISP.it (smarthost02.ISP.it [xxx.yyy.zzz.nnn])
>     by MYamavisSERVER.it (Postfix) with ESMTP id 777AD5840A5;
>     Fri, 25 Aug 2006 09:12:11 +0200 (CEST)
> Received: from relay03.portal ([192.168.bbb.aaa])
>     by smarthost02.ISP.it (Lotus Domino Release 6.5.1)
>     with ESMTP id 2006082509000547-2363 ;
>     Fri, 25 Aug 2006 09:00:05 +0200
> Received: from acme ([xxx.yyy.zzz.mmm])
>     by relay03.portal (Lotus Domino Release 6.5.1)
>     with ESMTP id 2006082509004734-2554 ;
>     Fri, 25 Aug 2006 09:00:47 +0200
> Message-ID: [EMAIL PROTECTED]
> From: RFC2822 COMPLIANT [EMAIL PROTECTED]
> To: RFC2822 COMPLIANT
> Subject: RFC2822 COMPLIANT
> Date: Fri, 25 Aug 2006 09:11:30 +0200
> MIME-Version: 1.0
> X-Priority: 3 (Normal)
> X-MSMail-Priority: Normal
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1807
> Content-Type: multipart/alternative; boundary==_NextPart_000_0005_01C6C826.73BE4680
Re: How to whitelist_from ?
Matt Kettler wrote:
> Philip Prindeville wrote:
> > There's no way to whitelist just the empty address then? Rather than everything?
> >
> > -Philip
>
> Not given the simple file-glob format of the whitelist commands. You'd need a regular expression and negation.
>
> You could do it with a rule...
>
> header __NULL_RETURN From !~ /./i
> header __RCVD_MYHOST Received =~ /insert Received header regex matching your servers exchanging../
> meta MY_NULL_RETURN (__NULL_RETURN && __RCVD_MYHOST)

How about modifying the source to accept some sort of notation for an empty address in whitelist_from_rcvd?

-Philip
Re: FP with Outlook SMTPing to Lotus Domino
Randal, Phil wrote:
> You might wish to look at tweaking your BAYES_xx scores to reduce false positives. I guess that depends on how healthy your Bayes database is, though.

Can't really say how healthy it is. 99% of spam (guessing, but pretty close) is in English language, 99% of our ham is in Italian language. Spam in Italian is so rare (so far!) that I had to write custom rules to catch specific spam, because Bayes wouldn't hit hard enough after several training rounds.

So... our Bayes is probably highly unbalanced due to the nature of our traffic and spam. Am I right? Any workaround?

Paolo
Re: Strange Score
Matt Kettler wrote:
> Christopher Mills wrote:
> > Look at this,
> >
> > X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on chrysalis.chrysalishosting.com
> > X-Spam-Level:
> > X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
> >     HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
> >     SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
> > Received: from localhost by chrysalis.chrysalishosting.com
> >     with SpamAssassin (version 3.1.4); Fri, 25 Aug 2006 01:45:43 -0500
> >
> > The score is off. It flagged the message as {Spam?} as it should, because the required score is 5.
> > XSpam level shows 5 stars, but the line below says it got a spam score of 4.3
>
> Erm, I count 4 stars, not 5.
>
> As for the spam tag in the subject, are you sure this message wasn't scanned twice (possibly by the sender)? If you scan a message twice, only the second set of X-Spam-* headers is present, but any other changes from the first scan still hang around.

I have to say, the first 3 times I read this message, I counted 5 stars too. Really strange.. if you look at it long enough you can see a guy in a boat fishing in the middle of the ocean!

-Jim
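The star count and the verdict can both be checked mechanically against the X-Spam-Status line. The Python sketch below assumes, as I understand the documented `_STARS(*)_` template tag, that SpamAssassin emits one star per full score point (capped at 50); the function names are made up for illustration.

```python
import re

STATUS_RE = re.compile(
    r"\s*(Yes|No),\s*score=\s*(-?\d+(?:\.\d+)?)\s+required=\s*(-?\d+(?:\.\d+)?)"
)


def expected_stars(score, char="*"):
    """One character per full score point, never negative, capped at 50."""
    return char * min(max(int(score), 0), 50)


def status_summary(status):
    """Parse an X-Spam-Status value and check it for internal consistency."""
    m = STATUS_RE.match(status)
    if not m:
        raise ValueError("unrecognised X-Spam-Status value")
    flag, score, required = m.group(1), float(m.group(2)), float(m.group(3))
    return {
        "flag": flag,
        "score": score,
        "required": required,
        "stars": expected_stars(score),
        # "Yes" should appear exactly when the score reaches the threshold.
        "consistent": (score >= required) == (flag == "Yes"),
    }


summary = status_summary("No, score= 4.3 required=5.0 tests=BAYES_50,HG_HORMONE")
print(summary["stars"], summary["consistent"])  # **** True
```

For the header in this thread the result is four stars and a consistent "No", which fits the explanation that the {Spam?} subject tag is left over from an earlier scan rather than produced by this one.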
RE: False positives and Bayes
We have an Exchange SpamAssassin folder that our users can drop false negatives into. Then I periodically run a Perl script (using Mail::IMAPClient) to retrieve the messages and retrain both mail servers with those (not just the mail server through which the message arrived).

Whenever I receive a report of a false positive, I generally visit the user and review the message, in case there is some other problem that could be resolved or to determine if whitelisting would be appropriate, before having them put it in another Exchange folder, which I then use to retrain both mail servers as well.

As for the mailing lists, I generally have been avoiding whitelisting those and instead trying to rely on retraining to get such messages to not get tagged on their own merits. So far it seems to be working. False positives on personal emails are more of an issue for us than those from mailing lists.

Justin

-----Original Message-----
From: Anthony Peacock [mailto:[EMAIL PROTECTED]
Sent: Friday, August 25, 2006 2:25 AM
To: users@spamassassin.apache.org
Subject: Re: False positives and Bayes

> Hi,
>
> Justin Lloyd wrote:
> > Hello, all.
> >
> > A couple of months ago I built new mail servers to replace our existing ones that had aging mail configurations (and disparate OS configurations), running sendmail 8.12.6 and SA 3.0.2. Our configuration now consists of 2 RHEL 4 ES servers that share the load using DNS round-robin, running sendmail 8.13.7 and SpamAssassin 3.1.3, and we are running sa-update and rulesdujour nightly (though actual updates are rare). We use spamass-milter 0.31, which we have configured to drop spams with scores >= 10, thereby dropping about 75% of the incoming email before it gets to our Exchange servers. Speaking of which, these servers do not deliver mail locally, rather all received mail either goes to internal MS Exchange servers or Linux helpdesk and mailing list servers. Also, our company is about 350 people and we receive a good deal of legitimate international email.
Re: Strange Score
On Friday, Aug 25th 2006 at 10:25 -0400, quoth Jim Maul:

=Matt Kettler wrote:
= Christopher Mills wrote:
=  Look at this,
=
=  X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on chrysalis.chrysalishosting.com
=  X-Spam-Level:
=  X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
=      HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
=      SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
=  Received: from localhost by chrysalis.chrysalishosting.com
=      with SpamAssassin (version 3.1.4); Fri, 25 Aug 2006 01:45:43 -0500
=
=  The score is off. It flagged the message as {Spam?} as it should, because the required score is 5.
=  XSpam level shows 5 stars, but the line below says it got a spam score of 4.3
=
= Erm, I count 4 stars, not 5.
=
= As for the spam tag in the subject, are you sure this message wasn't scanned twice (possibly by the sender)? If you scan a message twice, only the second set of X-Spam-* headers is present, but any other changes from the first scan still hang around.
=
=I have to say, the first 3 times I read this message, I counted 5 stars too.
=Really strange.. if you look at it long enough you can see a guy in a boat
=fishing in the middle of the ocean!

I saw a ducky and a horsie.
Discourage broken content (was: Broken images in mails)
--On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] wrote:

> I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result
>
> giftopnm: EOF or error reading data portion...
>
> So I produced a corrupt (?) image, but it was not spam.

I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that it's out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam.

We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper.

Stop letting your emperor walk around naked.
Re: Discourage broken content
Kenneth Porter wrote:
> --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] wrote:
>
> > I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result
> >
> > giftopnm: EOF or error reading data portion...
> >
> > So I produced a corrupt (?) image, but it was not spam.
>
> I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that it's out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam.
>
> We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper.
>
> Stop letting your emperor walk around naked.

I completely agree. The problem is that some implementations make this impossible, for example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally...

Chris
Re: Discourage broken content (was: Broken images in mails)
On Friday 25 August 2006 11:20, Kenneth Porter wrote:
> We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper.
>
> Stop letting your emperor walk around naked.

Actually there is very little broken content IMAGE software out there in any modern mailer, even microsoft crapware does not break images. The image corruption is intentional, and may be malicious (not JUST spam). So I agree with you there.

Broken html is another issue, because there is broken, and there is simply lame (lazy) html. Which of the several versions of the standards are you going to impose? The agreed-upon standards? Or the de facto ones?

--
John Andersen
Re: Discourage broken content
On Friday 25 August 2006 11:24, decoder wrote:
> I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally...

How better to get that fixed than to put them on notice, and start tagging based on the mere fact that the image is broken. MailScanner has no business changing content.

--
John Andersen
Re: Discourage broken content
From: decoder [EMAIL PROTECTED]
To: users@spamassassin.apache.org
Subject: Re: Discourage broken content
Date: Fri, 25 Aug 2006 21:24:14 +0200

> Kenneth Porter wrote:
> > --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] wrote:
> >
> > > I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result
> > >
> > > giftopnm: EOF or error reading data portion...
> > >
> > > So I produced a corrupt (?) image, but it was not spam.
> >
> > I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that it's out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam.
> >
> > We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper.
> >
> > Stop letting your emperor walk around naked.
>
> I completely agree, the problem is, some implementations makes this impossible. For example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally...
>
> Chris

Could somebody explain to me the reason why MailScanner acts this way?

A good question is whether this plugin should be adapted to be compatible with MailScanner, or whether MailScanner should change this practice. IMHO, any kind of information included in an email may be inspected, but it shouldn't be transformed.

greetings
Enediel
RE: Discourage broken content (was: Broken images in mails)
> I think we should discourage all broken content in email and on the web.

But who is to decide what is broken? Just because giftext/giffix/gocr/etc. fail to parse it doesn't necessarily mean it's broken. The software may be buggy (note the patches on the download page needed to make these utilities work properly with legitimate images).

Howard
Re: Discourage broken content (was: Broken images in mails)
On Friday 25 August 2006 11:33, Kash, Howard (Civ, ARL/CISD) wrote: I think we should discourage all broken content in email and on the web. But who is to decide what is broken. Just because giftext/giffix/gocr/etc. fail to parse it, doesn't necessarily mean it's broken. Yes, by definition, it DOES mean it's broken. -- John Andersen
RE: Discourage broken content (was: Broken images in mails)
Yes, by definition, it DOES mean it's broken. So when the giftext author made an error in assuming every image would have a global colormap, he redefined the GIF specification so that any that don't are no longer valid? Howard
Discourage broken configs (was: Discourage broken content (was: Broken images in mails))
On 25-Aug-06, at 3:20 PM, Kenneth Porter wrote: --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] online.de wrote: I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result giftopnm: EOF or error reading data portion... So I produced a corrupt (?) image, but it was not spam. I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that's it out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam. We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper. Stop letting your emperor walk around naked. I would, and do, go even further and discourage broken Server/DNS configurations. I've downright had it with all this crap hitting my server. I'm now doing checks right at the MTA and if the sending server fails any hostname, HELO, domain name, SPF etc., checks they don't even get to my content filters. The biggest thing we have in our favour is that the spambots are mostly broken or running on machines that will fail most of these checks. For legitimate email, I send an message to the admins responsible for the broken configs with my log entries explaining why their email was blocked. It's up to them to fix it if they want to send email my way. 
I know this isn't practical in an environment where you're administering hundreds or thousands of accounts, and I feel your pain, but I think it's time we encouraged proper and correct server and DNS configurations so we can use all the tools at our disposal to our advantage. -- Gino Cerullo Pixel Point Studios 21 Chesham Drive Toronto, ON M3M 1W6 416-247-7740
RE: Discourage broken content
Could somebody explain to me the reason why MailScanner acts this way? A good question could be to decide whether you adapt this plugin to be compatible with MailScanner, or whether the latter should change this practice. As a resource/denial of service protection mechanism. If someone starts feeding you 10MB messages and SpamAssassin has to run all of its regular expression checks, etc. on the full content of every message, your server would die. Or consider sites that have lots of messages with huge PowerPoint attachments. Spam messages are rarely very big, so it's actually a nice feature - until you want to use plugins like FuzzyOCR that need full content. Howard
Re: Broken images in mails
On Fri, 25 Aug 2006, Plenz wrote: Adding a point for corrupted images is sounding better and better. I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result giftopnm: EOF or error reading data portion... So I produced a corrupt (?) image, but it was not spam. I had similar results. As soon as I installed FuzzyOcr, I saw a whole series of legit messages in the log going back and forth between two users, all getting FUZZY_OCR_CORRUPT_IMG. I didn't look at the messages, but one assumes they were somebody's e-mail signature with a GIF in it or something. Ideally, users wouldn't include corrupt images in messages, but it does happen, so I thought a score of 3.0 for FUZZY_OCR_CORRUPT_IMG was too harsh. I set it to 2.0 at my site. FuzzyOcr is still catching the bad stuff, and I feel less nervous that a minor file format infraction might cause false positives. Also, there is the small matter that just because giftopnm doesn't recognize it doesn't mean it's invalid. Are we sure that giftopnm recognizes 100% of all possible items that occur in GIF files? - Logan
Re: Discourage broken content
On Fri, 25 Aug 2006, enediel gonzalez wrote: From: decoder [EMAIL PROTECTED] Kenneth Porter wrote: I completely agree, the problem is, some implementations makes this impossible. For example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally... Yes, if you leave the default Max SpamAssassin Size = 3 setting in place, it will do this. Could somebody explain to me the reason why MailScanner acts this way? Performance. The theory, I think, is that if a message is spam, there should be some evidence of that in the first 3 bytes, so there is no need to pass the whole message to SpamAssassin. I think this was a good assumption and a good plan when SpamAssassin didn't check a lot of attachments. Now that there are plugins which do check attachments, leaving the MIME structure of the message intact is more important, but MailScanner hasn't caught up with this reality. Of course, you can always just remove the limitation by changing the MailScanner configuration file. A good question could be decide if you adapt this plugin to be compatible with MailScanner or tha last one should change this practice. MailScanner calls SpamAssassin, so no adaptation needed in most cases. Unless you are talking about workarounds for issues like the above. - Logan
Re: Discourage broken content
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Logan Shaw wrote: On Fri, 25 Aug 2006, enediel gonzalez wrote: From: decoder [EMAIL PROTECTED] Kenneth Porter wrote: I completely agree, the problem is, some implementations makes this impossible. For example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally... Yes, if you leave the default Max SpamAssassin Size = 3 setting in place, it will do this. Could somebody explain to me the reason why MailScanner acts this way? Performance. The theory, I think, is that if a message is spam, there should be some evidence of that in the first 3 bytes, so there is no need to pass the whole message to SpamAssassin. I think this was a good assumption and a good plan when SpamAssassin didn't check a lot of attachments. Now that there are plugins which do check attachments, leaving the MIME structure of the message intact is more important, but MailScanner hasn't caught up with this reality. I heard that a proposal on letting the MIME structure intact has been made... so at least if the message was truncated, it wouldn't be truncated in the middle of an attachment (which would make absolutely no sense, either you truncate before or after the attachment, a broken attachment doesnt help anyone and will only cause unnecessary errors) Chris Of course, you can always just remove the limitation by changing the MailScanner configuration file. A good question could be decide if you adapt this plugin to be compatible with MailScanner or tha last one should change this practice. MailScanner calls SpamAssassin, so no adaptation needed in most cases. Unless you are talking about workarounds for issues like the above. 
- Logan
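To make the difference discussed above concrete, here is a minimal Python sketch (not MailScanner's code, which is Perl, and not a proposed patch) contrasting a fixed byte cut with a cut that only keeps whole MIME parts. The function names, the sample message, and the 1000-byte budget are all assumptions made for illustration.

```python
from email import message_from_bytes
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List


def build_sample(attachment_size: int = 5000) -> bytes:
    """Build a small multipart message: a text part plus a binary attachment."""
    msg = MIMEMultipart()
    msg["Subject"] = "sample"
    msg.attach(MIMEText("hello"))
    msg.attach(MIMEApplication(b"\x00" * attachment_size))
    return msg.as_bytes()


def naive_truncate(raw: bytes, limit: int) -> bytes:
    """Cut at a fixed byte offset, even if that lands inside an attachment."""
    return raw[:limit]


def part_aware_truncate(raw: bytes, limit: int) -> List[Message]:
    """Keep only leaf MIME parts that fit completely within the byte budget.

    Stops at the first part that does not fit, so a scanner never sees
    half of an attachment.
    """
    msg = message_from_bytes(raw)
    kept: List[Message] = []
    used = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        size = len(part.as_bytes())
        if used + size > limit:
            break
        kept.append(part)
        used += size
    return kept
```

With the sample message and a 1000-byte budget, the naive cut returns exactly 1000 bytes regardless of where they end, while the part-aware version hands over the text part only and drops the attachment whole.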
Re: Discourage broken content (was: Broken images in mails)
On Friday 25 August 2006 11:40, Kash, Howard (Civ, ARL/CISD) wrote: Yes, by definition, it DOES mean it's broken. So when the giftext author made an error in assuming every image would have a global colormap, he redefined the GIF specification so that any that don't are no longer valid? One presumes adherence to the standard. If the image does not adhere to the standards for GIF then it is broken. These are easily seen to be broken with any standard GIF viewer, usually with trash along the bottom edge. You are addressing a temporary problem in a beta product, and using that developmental shortcoming as a justification for allowing broken images in mail. -- John Andersen
RE: Discourage broken content
-Original Message- From: decoder [mailto:[EMAIL PROTECTED] Sent: Friday, August 25, 2006 2:24 PM To: users@spamassassin.apache.org Subject: Re: Discourage broken content -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Kenneth Porter wrote: --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] wrote: I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result giftopnm: EOF or error reading data portion... So I produced a corrupt (?) image, but it was not spam. I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that's it out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam. We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper. Stop letting your emperor walk around naked. I completely agree, the problem is, some implementations makes this impossible. For example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally... That is patently false. I have a graphics design/advertising department at one of my locations and these fellas send huge graphics files back and forth when they have emergency proofs/changes and MailScanner has *never* damaged anything, ever, anywhere. 
Now, there is a setting for scanning (much like exiscan, IIRC) that allows you to truncate the message and only scan xxx amount; it's optional and doesn't modify the actual message in any way. Rick -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Re: Discourage broken content
On Friday 25 August 2006 12:10, Rick Cooper wrote: That is patently false. I have a graphics design/advertising department at one of my locations and these fellas send huge graphics files back and forth when they have emergency proofs/changes and MailScanner has *never* damaged anything, ever, anywhere. Now, there is a setting for scanning (much like exiscan, IIRC) that allows you to truncate the message and only scan xxx amount; it's optional and doesn't modify the actual message in any way. Yes, Rick, that is correct, but the situation under discussion is that MailScanner passes a partial file to the SpamAssassin process, which in turn passes that partial file to the image analysis plugins, which decide that the image is broken. After SpamAssassin passes it, the entire, unchanged mail is sent on its way intact by MailScanner. Amavis-New does something similar: it shreds mail into pieces and launches scanners on the pieces. The problem is that the spam scanner (and presumably virus scanner) plugins are being handed partial files. Not a good practice in my view. -- John Andersen
Re: Discourage broken content
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Rick Cooper wrote: -Original Message- From: decoder [mailto:[EMAIL PROTECTED] Sent: Friday, August 25, 2006 2:24 PM To: users@spamassassin.apache.org Subject: Re: Discourage broken content -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Kenneth Porter wrote: --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] wrote: I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result giftopnm: EOF or error reading data portion... So I produced a corrupt (?) image, but it was not spam. I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that's it out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam. We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper. Stop letting your emperor walk around naked. I completely agree, the problem is, some implementations makes this impossible. For example MailScanner. I've heard that it truncates the mail at 30kb, no matter if that is within a MIME block or not... So my plugin gets a broken image.. though it was not broken originally... That is patently false. I have a graphics design/advertising department at one of my locations and these fellas send huge graphics files back and forth when they have emergency proofs/changes and MailScanner has *never* damaged anything, ever, anywhere. 
Now, there is a setting for scanning (much like exiscan IIRCC) that allows you to truncate the message and only scan xxx amount, it's optional and doesn't modify the actual message in anyway. Rick I did not say it damages the mail. I said it feds only a given amount of the message to SpamAssassin and THAT breaks plugins requiring the whole message, especially when MailScanner breaks messages in the middle of attachments. And as far as I know, it is the default setting of mailscanner to feed only a given amount of kb to SpamAssassin. That does not mean it truncates the message before delivering it. Chris -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFE71wLJQIKXnJyDxURAtxUAJ9/O5F4cC/1vlsE6EsRb6vLcepH+ACfcTCA x4CmnLDyZbUFtAr2kWK9koY= =Ckpc -END PGP SIGNATURE-
FuzzyOcr 2.3b released, fixes bugs and improves stability
Hello, I just uploaded FuzzyOcr 2.3b to the download site. If you find bugs or run into problems, please mail back :) The major changes are: - Added a configurable timeout (maximum runtime) for the plugin, to avoid any lockups/unwanted delays - The default matching threshold (set in the config file) can now be overridden on a per-word basis in the wordlist. An example: the wordlist contains: word1 word2::0 word3::0.2 Then word1 is matched with the default threshold set in the config file, word2 must be an exact match (threshold 0), and word3 is matched with a threshold of 0.2. This is especially useful for words which trigger false positives very often, like penis, money or news. Note that the tendency to produce an FP is not directly connected to the word length. The word buy produces very few FPs compared to penis when both are matched with the same threshold. The FuzzyOcr.words.sample file contains some suggestions for word-specific thresholds which I recommend. - The experimental MD5 database has been replaced by a custom hash database which is able to match very similar images. Often, you get the same image twice, or all your customers get the same spam mail. But even though the pictures look the same, they are not identical. That is why MD5 was useless. The newly introduced hash (self-invented) is able to recognize almost identical images based on features that I won't explain here, as it would make it easier for spammers :) If a message contains a picture previously registered in the database, the original score is reread from the database, the message is immediately tagged with this score, and the plugin ends. - Some non-alpha-to-alpha translations are now applied to the gocr output to fix common mistakes, like i being misread as ; or a as 8.
- There are now two scores for broken images: one is used when the picture is recognized as broken but giffix was able to correct the errors and produced output that can be scanned; the other is used if the image is unfixable (either too broken, or interlaced/animated and broken). The first is set lower than the second (2.5 vs. 5). - Various bugfixes TODO: - Write an external program to manage the database (add, remove and verify given pictures). - Rewrite the temp file system to do all external program operations on files (saves memory). Another wish: I'd like to create a database to ship with the plugin so it can be used out of the box, but I do not have many samples here, so it would be nice if you sent picture samples of common picture spam you get to my mail address, with [picture sample] in the subject. I will post here again once I have enough :). Thanks to Jorge Valdes, Michael Alan Dorman and UxBoD for finding bugs and sending improvement suggestions for this version. Chris
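The per-word threshold syntax from the announcement (word1 word2::0 word3::0.2) can be illustrated with a short Python sketch. This is not FuzzyOcr's code (the plugin is written in Perl). The function names are hypothetical, and the matching rule used here (edit distance at most threshold times word length) is an assumption chosen only so that threshold 0 means an exact match; the plugin's real metric may differ.

```python
from typing import Dict


def parse_wordlist(text: str, default_threshold: float) -> Dict[str, float]:
    """Map each word to its threshold; 'word::x' overrides the default."""
    thresholds: Dict[str, float] = {}
    for token in text.split():
        word, sep, value = token.partition("::")
        thresholds[word] = float(value) if sep else default_threshold
    return thresholds


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def matches(word: str, candidate: str, threshold: float) -> bool:
    """Assumed rule: allow up to threshold * len(word) character edits."""
    return edit_distance(word, candidate) <= threshold * len(word)
```

Under this assumed rule, an OCR misread such as m0ney is rejected for money at threshold 0 and accepted at 0.25, which is the kind of per-word tuning the release notes describe.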
spamd, DnsResolver, and URIDNSBL errors?
Hello all.. Running OS/X, SA 3.1.3 ... Recently, and unfortunately, I don't check my logs that often, but it goes back at least as far back as my logs go (5 days), I'm getting the below in my mail.log: Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Aug 25 14:01:59 www spamd[257]: bayes: bayes db version 0 is not able to be used, aborting! 
at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN7 line 97.\n Aug 25 14:13:38 www spamd[257]: Use of uninitialized value in exists at /Library/Perl/5.8.6/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 718, GEN11 line 88.\n Aug 25 14:13:38 www spamd[257]: Use of uninitialized value in exists at /Library/Perl/5.8.6/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 718, GEN11 line 88.\n Aug 25 14:13:52 www spamd[257]: Use of uninitialized value in exists at /Library/Perl/5.8.6/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 718, GEN11 line 88.\n Aug 25 14:13:54 www spamd[257]: Use of uninitialized value in exists at /Library/Perl/5.8.6/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 718, GEN11 line 88.\n I've googled both, and can't find much of any help.. Any hints, helps, tips appreciated. Thanks. Evan
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Friday 25 August 2006 13:17, decoder wrote: Another wish: I'd like to create a database to ship with the plugin so it can be used out of the box but I do not have much samples here, so it would be nice if you sent me picture samples of common picture spam you get with [picture sample] in the subject to my mail address. I will post here again if I got enough :). Wouldn't it be more productive to the community to work with SURBL to enable the centralized storage of these hashes? Or perhaps with Razor2? I'm not an expert on Razor, but my limited understanding of it is that it generates hashes of (portions of) message bodies and stores those hashes for future comparison. It would seem that once someone decides something is spam, one could take your hash, wrap a minimal message around it, and report THAT to Razor. Then your engine could examine an image, generate your hash, wrap it in the same minimal message, and query Razor, presumably getting a hit. No local database is needed, because a worldwide one would be substituted. That way, if you get this spam and report it, it will already be known by the time I get the spam. -- John Andersen
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 John Andersen wrote: On Friday 25 August 2006 13:17, decoder wrote: Another wish: I'd like to create a database to ship with the plugin so it can be used out of the box but I do not have much samples here, so it would be nice if you sent me picture samples of common picture spam you get with [picture sample] in the subject to my mail address. I will post here again if I got enough :). Wouldn't it be more productive to the community to work with SURBL to enable the centralized storage of these hashes? Or perhaps with Razor2? I'm not an expert on Razor, but my limited understanding of it is that it generates hashes of (portions of) message bodies and stores that hash for future comparison. It would seem that once someone decide something is spam, one could take your hash and wrap a minimal message around it and report THAT to razor. Then your engine could examine an image, generate your hash, and wrap it in the same minimal message and Query Razor. Presumably getting a hit. No local database is needed, because a world wide one would be substituted. That way, if you get this spam and report it, It will already be known by the time I get the spam. Maybe it would. But this kind of hash is no real hash. It is just a combination of picture features that I invented... but it seems reliable in my tests so far. Once it has been tested in public, such a cooperation with SURBL or Razor might be possible Chris -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFE724mJQIKXnJyDxURAuW6AKClt1V0/faPEJaTwjLRXChXqhtTkwCfc9Yp UBsuigcaOac6pOZz2EP7Gkk= =LJEa -END PGP SIGNATURE-
Re: spamd, DnsResolver, and URIDNSBL errors?
On Fri, 25 Aug 2006, Evan Platt wrote: Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Verify that the DNS server is actually running on any hosts that you're looking to for DNS services. /etc/resolv.conf should list them. Connection refused means there's nothing listening at the port you're trying to connect to. In this case, it smells like your DNS server isn't running, or your host is configured to get DNS services from a computer that isn't running a DNS server. As for the others, correcting the first error may fix the rest. -- John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/ [EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED] key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- The fetters imposed on liberty at home have ever been forged out of the weapons provided for defense against real, pretended, or imaginary dangers from abroad. -- James Madison, 1799 --- 25 days until Talk Like a Pirate day
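One way to run the check suggested above is to probe each nameserver from resolv.conf directly. The following is a rough Python sketch using only the standard library; the function names and the example.com lookup are assumptions, and it only handles IPv4 servers over UDP port 53.

```python
import socket
import struct
from typing import List


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the nameserver addresses in file order, ignoring '#' comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str, hostname: str = "example.com", timeout: float = 3.0) -> str:
    """Send one query to the server and report what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((server, 53))  # connected UDP surfaces ICMP errors
            sock.send(build_query(hostname))
            sock.recv(512)
            return "answered"
        except ConnectionRefusedError:
            return "connection refused"
        except socket.timeout:
            return "timed out"
        except OSError as exc:
            return "error: %s" % exc
```

Reading /etc/resolv.conf, passing its text to parse_nameservers, and calling probe on each address in order should show whether the first listed server is the one refusing connections.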
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
Michael Scheidell wrote: Now if you could just ocr the whole thing as text, and pass it back to SA to score! I explained before why this is not going to happen really soon: a) It is VERY hard to implement. To preserve the message, you would need two plugins: one that runs as the first rule and converts the message to text only, and another that runs as the last rule and puts the image back into the message (so the message stays unchanged). b) The default gocr output is not reliable enough for text-only rules. The current FuzzyOcr achieves better results by doing multiple scans with different settings. Chris
Re: spamd, DnsResolver, and URIDNSBL errors?
On Friday 25 August 2006 13:41, John D. Hardin wrote: On Fri, 25 Aug 2006, Evan Platt wrote: Aug 25 14:01:58 www spamd[257]: dns: sendto() failed: Connection refused at /Library/Perl/5.8.6/Mail/SpamAssassin/DnsResolver.pm line 339, GEN7 line 97.\n Verify that the DNS server is actually running on any hosts that you're looking to for DNS services. /etc/resolv.conf should list them. Connection Refused means there's nothing listening at the port you're trying to connect to. In this case, it smells like your DNS server isn't running, or your host is configured to get DNS services from a computer that isn't running a DNS server. As for the others, correcting the first error may fix the rest. It seems likely that he would have noticed such a glaring deficiency, no? Without DNS, a whole lot of stuff is broken. I wonder if he has all the necessary Perl modules. -- John Andersen
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Friday 25 August 2006 13:39, decoder wrote: Maybe it would. But this kind of hash is no real hash. It is just a combination of picture features that I invented... but it seems reliable in my tests so far. Not sure it matters a whole lot what the actual content is when using Razor. If enough (trusted) people report a message with a given text content, it builds a Razor confidence level fairly quickly. So what you report could be a simple hex dump of your hash, whatever that hash may look like. I'm betting this could be done with Razor without any action on their part (not that I'm recommending going around them). -- John Andersen
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 John Andersen wrote: On Friday 25 August 2006 13:39, decoder wrote: Maybe it would. But this kind of hash is no real hash. It is just a combination of picture features that I invented... but it seems reliable in my tests so far. Not sure it matters a whole lot what the actual content is when using Razor. If enough (trusted) people report a message with a given text content, it builds a razor confidence level fairly quickly. So what you report could be a simple hex dump of your hash, what ever that hash may look like. I'm betting this could be done with razor without any action on their part, (not that I'm recommending going around them). The problem is, this is not a hash that is simply matched against another hash 1:1. When comparing two hashes, a small percentage of difference is allowed on the values for better results. Sometimes it is a 100% match, but it might also be a 99% match. So matching two hashes is rather complex. If you know perl, feel free to check out the routines :) Chris -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFE73FaJQIKXnJyDxURAqW7AJ9yJ+9yPQIYOWQl8xZpT8Mf3q2YygCeLae8 HJZm5YWEk19RuOCGRS0sJ7A= =Sv3C -END PGP SIGNATURE-
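The point about tolerance can be shown with a small Python sketch. The real FuzzyOcr features are deliberately undisclosed, so the feature tuples, the 1% tolerance, and the function names below are all invented for illustration; only the idea of allowing a small percentage of difference per value comes from the message above.

```python
from typing import Iterable, Optional, Sequence, Tuple


def similar(a: Sequence[float], b: Sequence[float], tolerance: float = 0.01) -> bool:
    """True if every pair of values differs by at most the relative tolerance."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        scale = max(abs(x), abs(y))
        if scale and abs(x - y) / scale > tolerance:
            return False
    return True


def lookup(db: Iterable[Tuple[Sequence[float], float]],
           features: Sequence[float],
           tolerance: float = 0.01) -> Optional[float]:
    """Linear scan: return the stored score of the first similar entry."""
    for stored, score in db:
        if similar(stored, features, tolerance):
            return score
    return None
```

Because a near match produces a different value than the stored one, an exact-key lookup (the kind a plain hash database performs) would miss it; a comparison like this has to be run against candidate entries, which is what makes reuse of an exact-match service harder.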
RE: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Fri, 25 Aug 2006, Michael Scheidell wrote: Now if you could just ocr the whole thing as text, and pass it back to SA to score! That's what I was thinking, and it would allow leverage by a lot of plugins (e.g. the Word plugin I am prepping to start)... Create some PerMsgStatus string variable or some such that the body rules would be run over... -- John Hardin
Re: spamd, DnsResolver, and URIDNSBL errors?
On Fri, Aug 25, 2006 at 01:44:54PM -0800, John Andersen wrote: It seems likely that he would have noticed such a glaring deficiency, no? Possibly. My recollection is that Net::DNS only looks at the first server entry in resolv.conf. So if that server happens to not be running named, the system may work fine (assuming multiple entries in resolv.conf), but Net::DNS would barf. -- Randomly Generated Tagline: Oh My God! They Killed init! You Bastards! - Unknown
Re: spamd, DnsResolver, and URIDNSBL errors?
On Friday 25 August 2006 13:55, Theo Van Dinter wrote: On Fri, Aug 25, 2006 at 01:44:54PM -0800, John Andersen wrote: It seems likely that he would have noticed such a glaring deficiency, no? Possibly. My recollection is that Net::DNS only looks at the first server entry in resolv.conf. So if that server happens to not be running named, the system may work fine (assuming multiple entries in resolv.conf), but Net::DNS would barf. Could be. Simply reversing the order of lines in resolv.conf might determine this. If the first server was down, DNS would have to time out (usually 10 to 30 seconds) before the query would fail over to the second server. If the first server was set to reject connections instantly, failover is almost instant, but only if you have (and use) a secondary. Instant rejection is what competent ISPs use when doing maintenance. <rant> Even a minimalist configuration of nscd on the mail server allows for working around this sort of problem and using DNS servers of your choice. Often the secondary server provided by your ISP is less busy than the primary. It is not necessary to blindly take whatever is offered to you via DHCP. </rant> -- John Andersen
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Fri, Aug 25, 2006 at 11:43:47PM +0200, decoder wrote:
> a) It is VERY hard to realize. To preserve the message, you would need two plugins, one that runs as first rule, converts the message to text only, and another one that runs as last rule and puts the image back into the message (so the message stays unchanged).

Preserving the message isn't really a big deal. The internal message tree format is for internal use only. It's not used when writing out the message -- that's what the pristine parts are for. So you *could* completely mess around with the tree if you wanted to, it only affects the scan.

It actually wouldn't be difficult to add in a rendered section for image/* types, and have that included in the normal rules if found. The main thing to do is make sure that the image is rendered into text before the message body text array is cached -- and that's solved (generally speaking) by doing the rendering in check_start().

Heck, this may be worth having a new plugin call in M::SA::parse() which happens right after the normal parsing run, called render_parts or something, where plugins get called with the message and main SA objects, and are expected to only generate renderings for the non-standard types.

Actually, that's not a bad idea. Feel like opening a BZ about it? ;)

--
Randomly Generated Tagline:
Now it's time for pay back ... Can someone lend me enough for a Coke? - Chris Bentley
Re: spamd, DnsResolver, and URIDNSBL errors?
At 02:44 PM 8/25/2006, you wrote:
> It seems likely that he would have noticed such a glaring deficiency, No?

My resolv.conf consists of nameserver 192.168.1.66 (my router).

> Without DNS, a whole lot of stuff is broke. I wonder if he has all necessary Perl Modules.

I'm open to suggestions, or a hint to a link of which ones I may be missing?

Thanks.
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Fri, 25 Aug 2006, John Andersen wrote:
> On Friday 25 August 2006 13:17, decoder wrote:
>> Another wish: I'd like to create a database to ship with the plugin so it can be used out of the box but I do not have much samples here, so it would be nice if you sent me picture samples of common picture spam you get with [picture sample] in the subject to my mail address. I will post here again if I got enough :).
>
> Wouldn't it be more productive to the community to work with SURBL to enable the centralized storage of these hashes?

I think he was speaking of word lists.

I agree with the other poster, the best solution would be a way to append some extra text to the PerMsgStatus object and have the body rules process that as well as the real message body.

--
John Hardin
Re: spamd, DnsResolver, and URIDNSBL errors?
On Friday 25 August 2006 14:09, Evan Platt wrote:
> At 02:44 PM 8/25/2006, you wrote:
>> It seems likely that he would have noticed such a glaring deficiency, No?
>
> my resolv.conf consists of nameserver 192.168.1.66 (my router).

Manually change that to your ISP's DNS server, or better yet, to its secondary. Try that for a bit...

--
John Andersen
Re: spamd, DnsResolver, and URIDNSBL errors?
On Fri, 25 Aug 2006, John Andersen wrote:
>> Verify that the DNS server is actually running on any hosts that you're looking to for DNS services. /etc/resolv.conf should list them. Connection Refused means there's nothing listening at the port you're trying to connect to. In this case, it smells like your DNS server isn't running, or your host is configured to get DNS services from a computer that isn't running a DNS server. As for the others, correcting the first error may fix the rest.
>
> It seems likely that he would have noticed such a glaring deficiency, No?

You'd think...

> Without DNS, a whole lot of stuff is broke.

Yeah.

> I wonder if he has all necessary Perl Modules.

In that case I'd expect different errors than Connection Refused.

--
John Hardin
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
John D. Hardin wrote:
> On Fri, 25 Aug 2006, John Andersen wrote:
>> On Friday 25 August 2006 13:17, decoder wrote:
>>> Another wish: I'd like to create a database to ship with the plugin so it can be used out of the box but I do not have much samples here, so it would be nice if you sent me picture samples of common picture spam you get with [picture sample] in the subject to my mail address. I will post here again if I got enough :).
>>
>> Wouldn't it be more productive to the community to work with SURBL to enable the centralized storage of these hashes?
>
> I think he was speaking of word lists.
>
> I agree with the other poster, the best solution would be a way to append some extra text to the PerMsgStatus object and have the body rules process that as well as the real message body.

No, I was actually speaking about hashes... Most spam seems recurring so it might be a good idea to ship the plugin with a prebuilt database.

Just my thoughts... other opinions are welcome...

Chris
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
Theo Van Dinter wrote:
> On Fri, Aug 25, 2006 at 11:43:47PM +0200, decoder wrote:
>> a) It is VERY hard to realize. To preserve the message, you would need two plugins, one that runs as first rule, converts the message to text only, and another one that runs as last rule and puts the image back into the message (so the message stays unchanged).
>
> Preserving the message isn't really a big deal. The internal message tree format is for internal use only. It's not used when writing out the message -- that's what the pristine parts are for. So you *could* completely mess around with the tree if you wanted to, it only affects the scan.
>
> It actually wouldn't be difficult to add in a rendered section for image/* types, and have that included in the normal rules if found. The main thing to do is make sure that the image is rendered into text before the message body text array is cached -- and that's solved (generally speaking) by doing the rendering in check_start().
>
> Heck, this may be worth having a new plugin call in M::SA::parse() which happens right after the normal parsing run, called render_parts or something, where plugins get called with the message and main SA objects, and are expected to only generate renderings for the non-standard types.
>
> Actually, that's not a bad idea. Feel like opening a BZ about it? ;)

Well, I guess I'm too busy to start another plugin now... but maybe someone else has the time.. Just don't remove the image ;) otherwise my plugin gets useless ;D

Chris
Re: spamd, DnsResolver, and URIDNSBL errors?
At 03:12 PM 8/25/2006, you wrote:
> Manually change that to your ISPs DNS server, or better yet, to his Secondary. Try that for a bit...

Done. Still getting:

Aug 25 15:26:09 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 2.\n
Aug 25 15:26:09 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 392.\n
Aug 25 15:26:11 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 392.\n

And fyi, it appears that on a reboot, /etc/resolv.conf is reverted back to 192.168.1.66.
Re: spamd, DnsResolver, and URIDNSBL errors?
On Friday 25 August 2006 14:27, Evan Platt wrote:
> At 03:12 PM 8/25/2006, you wrote:
>> Manually change that to your ISPs DNS server, or better yet, to his Secondary. Try that for a bit...
>
> Done. Still getting
>
> Aug 25 15:26:09 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 2.\n
> Aug 25 15:26:09 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 392.\n
> Aug 25 15:26:11 www spamd[281]: bayes: bayes db version 0 is not able to be used, aborting! at /Library/Perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 195, GEN50 line 392.\n

OK, this cleared up the other problem, so something is lame about your router. Time for a reboot of that router, perhaps. Check its configuration to see if it allows you to manually set a DNS server for the local subnet, and put your ISP's IP in there. It's no different than sending the query to the router and having it send it onward to the ISP.

Now you just have a bayes DB problem. Sounds like this user was either root, or some other user that did not have bayes set up yet.

> And fyi, it appears on a reboot, /etc/resolv.conf is reverted back to 192.168.1.66.

Yes, that's normal. You can prevent this in your DHCP settings. But the better solution is to solve it at the router.

--
John Andersen
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Fri, 25 Aug 2006, John D. Hardin wrote:
> I think he was speaking of word lists.

Sigh. That's what I get for reading and responding in sequence.

--
John Hardin
Re: bayes autolearn acting up
> On Aug 24, 2006, at 10:11 AM, [EMAIL PROTECTED] wrote:
>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>
>>   bayes_auto_learn 1
>>
>> and I set the learn boundaries with:
>>
>>   bayes_auto_learn_threshold_nonspam -3.5
>>   bayes_auto_learn_threshold_spam 15.5
>>
>> I get unexpected auto-learning.
>>
>> Example: I just saw a spam come through that scored 9.9, which is enough for it to be tagged as spam, but it should not be auto-learned as spam. But, in the header it clearly reads:
>>
>>   X-Spam-Status: Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>>     DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>>     MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>>     version=3.1.4
>>
>> Any ideas?
>
> SA does not autolearn based on the final message score. So, toss the 9.9 out the window. That's not the number SA compares to the 15.5.
>
> For learning, SA uses what the message score would have been if:
>
> 1) the AWL is off.
> 2) Bayes was disabled, including shifting what scoreset is used for all the other rules.
> 3) all white/blacklists are disabled.
>
> This is often *quite* different from the final score.
>
> However, in this case I don't entirely understand... The default SA 3.1 scores are:
>
>   score DATE_IN_PAST_03_06    0.736 0 1.122 0.478
>   score DCC_CHECK             0 1.37 0 2.17
>   score DIGEST_MULTIPLE       0 0.233 0 0.765
>   score HTML_40_50            0.611 0 0.497 0.496
>   score HTML_MESSAGE          0.001
>   score MIME_HTML_ONLY        0.414 0.001 0.389 0.001
>   score RAZOR2_CHECK          0 0.5 0 0.5
>   score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>
> Adding the set1 scores up, the learning score should have been 4.753.
>
> Have you modified any rule scores?

Here's another example:

  X-Spam-Flag: YES
  X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
  X-Spam-Level: *
  X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
    DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
    version=3.1.4

I just can't see why it is autolearning everything that is tagged as spam. If anyone has any ideas, I'd appreciate it!

Regards,
Devin
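As a cross-check on the arithmetic in the reply quoted above, here is a small Python sketch that totals the quoted score lines per score set. It assumes the usual SpamAssassin convention that the four columns are score sets 0 to 3 and that a single value applies to every set; the numbers are copied from the message, not from a live rules directory, and the function names are made up for illustration. With the lines as quoted, the second column (set 1) comes to 4.256; the 4.753 figure in the message is what results if HTML_40_50 is counted at 0.497, its third-column value. That is only an observation about the sums, not a statement about what SA computed for that message.

```python
from decimal import Decimal

# Score lines exactly as quoted in the message above.
QUOTED = {
    "DATE_IN_PAST_03_06": "0.736 0 1.122 0.478",
    "DCC_CHECK": "0 1.37 0 2.17",
    "DIGEST_MULTIPLE": "0 0.233 0 0.765",
    "HTML_40_50": "0.611 0 0.497 0.496",
    "HTML_MESSAGE": "0.001",
    "MIME_HTML_ONLY": "0.414 0.001 0.389 0.001",
    "RAZOR2_CHECK": "0 0.5 0 0.5",
    "RCVD_IN_WHOIS_INVALID": "0 2.151 0 2.234",
}


def score(rule, scoreset):
    """Score of one rule in the given score set (0-3)."""
    columns = [Decimal(c) for c in QUOTED[rule].split()]
    # A single value applies to all four score sets.
    return columns[0] if len(columns) == 1 else columns[scoreset]


def total(scoreset):
    """Sum of all quoted rules in the given score set."""
    return sum((score(rule, scoreset) for rule in QUOTED), Decimal(0))


for scoreset in range(4):
    print("set", scoreset, "total", total(scoreset))
```

Decimal is used instead of float so the totals print exactly as the hand addition would.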
Re: FuzzyOcr 2.3b released, fixes bugs and improves stability
On Fri, 25 Aug 2006, Theo Van Dinter wrote:
> On Fri, Aug 25, 2006 at 11:43:47PM +0200, decoder wrote:
>> a) It is VERY hard to realize. To preserve the message, you would need two plugins, one that runs as first rule, converts the message to text only, and another one that runs as last rule and puts the image back into the message (so the message stays unchanged).
>
> The main thing to do is make sure that the image is rendered into text before the message body text array is cached -- and that's solved (generally speaking) by doing the rendering in check_start().
>
> Heck, this may be worth having a new plugin call in M::SA::parse() which happens right after the normal parsing run, called render_parts or something, where plugins get called with the message and main SA objects, and are expected to only generate renderings for the non-standard types.

I was pondering this a few weeks ago, and I started thinking about how some print spoolers (like the old System V stuff and also CUPS) do format conversions. Basically, they have a directed, acyclic graph of formats and converters. Just as an example, you might have edges like this:

  text -> postscript ; cmd='enscript'
  postscript -> PCL ; cmd='gs -DDEVICE=SomePclDriver'
  jpeg -> pnm ; cmd='djpeg'
  gif -> pnm ; cmd='giftopnm'
  pnm -> postscript ; cmd='pnmtops'

So then you declare, hey, my printer takes PCL input. Then when someone enqueues a jpeg to be printed, the spooler pieces together a converter pipeline by going backwards through the graph:

  postscript -> PCL ; cmd='gs -DDEVICE=SomePclDriver'
  pnm -> postscript ; cmd='pnmtops'
  jpeg -> pnm ; cmd='djpeg'

which tells it that it needs to do something like the following to convert the input format into what a printer understands:

  djpeg | pnmtops | gs -DDEVICE=SomePclDriver

It struck me that this isn't entirely different from what you might want for spam detection via deep content scanning. The plugins are analogous to printers (they would register what MIME types they can handle), and the spam is like a print job. Basically, you want to start with a message and a set of enabled plugins, then convert the message to all formats that the plugins can recognize.

There are some differences, though. With the printer, you have no interest in printing the intermediate formats. With a spam detector, you can never rule out the idea that scanning intermediate data might be helpful because of some tell-tale sign that it's spam.

The point is, though, it could be interesting to have a general method for allowing spam to be converted into anything that any plugin can understand, rather than having each plugin do this itself.

For example, let's suppose you have a Word document with an image in it, and that image contains spammy words that can be recognized via OCR. Wouldn't it be nice if the Word document scanner could feed the images it finds back into some framework so that anything which can scan images can scan things from inside the Word doc? Similarly with zip files (although I doubt spammers will use them since everyone is too lazy to open them up) and a million other things.

Of course, this is just an idea, and it's a little bit of an out-there idea, but as long as the conversion topic is being discussed, I thought I'd bring it up so the idea is on the table.

- Logan
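The converter-graph idea above can be sketched in a few lines of Python. This is only an illustration of the lookup, not anything that exists in SpamAssassin or CUPS: the edge table is copied from the example in the message (commands included, as written there), and the function names are made up. It searches forward from the input format with a breadth-first search, which finds the same chain as walking backwards from the target, and it also lists every intermediate format reachable from the input, since a spam scanner might want to look at those too.

```python
from collections import deque

# (source format, target format) -> converter command, from the example above.
CONVERTERS = {
    ("text", "postscript"): "enscript",
    ("postscript", "PCL"): "gs -DDEVICE=SomePclDriver",
    ("jpeg", "pnm"): "djpeg",
    ("gif", "pnm"): "giftopnm",
    ("pnm", "postscript"): "pnmtops",
}


def pipeline(source, target):
    """Return the list of commands converting source to target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, commands = queue.popleft()
        if fmt == target:
            return commands
        for (src, dst), cmd in CONVERTERS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, commands + [cmd]))
    return None  # no chain of converters reaches the target


def reachable(source):
    """Return every format the source can be converted into, itself included."""
    found = {source}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for (src, dst) in CONVERTERS:
            if src == fmt and dst not in found:
                found.add(dst)
                queue.append(dst)
    return found


print(" | ".join(pipeline("jpeg", "PCL")))
print(sorted(reachable("gif")))
```

In a plugin setting, each plugin would presumably register the formats it can scan, and the framework would call something like `reachable()` on each MIME part to decide which conversions are worth running.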
RE: Discourage broken content
-----Original Message-----
From: John Andersen [mailto:[EMAIL PROTECTED]
Sent: Friday, August 25, 2006 4:20 PM
To: users@spamassassin.apache.org
Subject: Re: Discourage broken content

> On Friday 25 August 2006 12:10, Rick Cooper wrote:
>> That is patently false. I have a graphics design/advertising department at one of my locations and these fellas send huge graphics files back and forth when they have emergency proofs/changes and MailScanner has *never* damaged anything, ever, anywhere. Now, there is a setting for scanning (much like exiscan IIRC) that allows you to truncate the message and only scan xxx amount, it's optional and doesn't modify the actual message in any way.
>
> Yes, Rick, that is correct, but the situation under discussion is that mailscanner passes a partial file to the spamassassin process, which in turn passes that partial file to the image analysis plugins, which decide that the image is broken.
>
> Upon being passed by spamassassin, the entire, unchanged mail is sent on its way intact by mailscanner.
>
> Amavis-New does something similar. Shreds mail into pieces, launches scanners on the pieces.
>
> The problem is that the spam scanner (and presumably virus scanner) plugins are being handed partial files. Not a good practice in my view.

I misunderstood what decoder was saying. And no, MailScanner doesn't give the virus scanners partial messages. In fact it goes to great pains to completely unpack all attachments (including tnef) and sanitize the file names, etc.

The option to give partial messages to SA exists partly because, historically, there was no need to hand a large message to SA to determine ham/spam, and partly because there are/were vulnerabilities in the tnef processing that could be exploited by very large tnef attachments. MailScanner currently handles tnef in a way that I doubt would be a problem, and can in fact (optionally) decode tnef attachments and recreate them as standard attachments that any mail client can handle.

In any event I plan to bring this up on the MailScanner list and suggest the default behavior should no longer be handing only a part of the message to SA.

Rick
Re: [Devel-spam] FuzzyOcr 2.3b released,fixes bugs and improves stability
Expertsites, Inc. wrote:
> From: decoder [EMAIL PROTECTED]
>> Hello,
>> I just uploaded FuzzyOcr 2.3b to the download site. If you find bugs or run into problems, please mail back :)
>
> This release failed to recognize the sample png.eml file, with this logfile error message:
>
>   Debug mode: Image type not recognized, unknown format. Skipping this image...
>
> I resolved this problem by changing one line in FuzzyOcr.pm.
>
> Changed:
>   elsif ( substr($picture_data,0,5) eq "\x89\x50\x4e\x47" ) {
> To read:
>   elsif ( substr($picture_data,0,4) eq "\x89\x50\x4e\x47" ) {
>                                  ^
> Tom Green -- Expertsites, Inc.

Thank you for reporting this... seems I can't count bytes anymore ;)

For anyone who is downloading this after this message, the tarball has been updated...

For all others, please change the line :)

Chris
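The off-by-one above is easy to reproduce outside Perl. The following Python sketch is not FuzzyOcr code; it is a hypothetical stand-alone check that shows why comparing a 5-byte slice against a 4-byte signature can never match, and one way to avoid counting bytes by hand: take the slice length from the signature itself. The signatures used are the well-known file magic numbers (the first four bytes of the PNG signature, the JPEG start-of-image marker, and the two GIF version strings).

```python
# Magic-number prefixes and the type name reported for each.
SIGNATURES = [
    (b"\x89\x50\x4e\x47", "png"),  # first 4 bytes of the 8-byte PNG signature
    (b"\xff\xd8", "jpeg"),         # JPEG start-of-image marker
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def image_type(data):
    """Return 'png', 'jpeg', 'gif', or None for unrecognised data."""
    for magic, name in SIGNATURES:
        # Slice exactly len(magic) bytes, so the lengths always agree.
        if data[:len(magic)] == magic:
            return name
    return None


# The full PNG signature followed by a few filler bytes.
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8

# The original bug: a 5-byte slice never equals a 4-byte string.
print(png_header[:5] == b"\x89\x50\x4e\x47")  # False
print(png_header[:4] == b"\x89\x50\x4e\x47")  # True
print(image_type(png_header))                 # png
```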
Re: bayes autolearn acting up
  grep bayes_auto_learn_threshold /etc/mail/spamassassin/*
  grep bayes_auto_learn_threshold /usr/share/spamassassin/*.cf
  grep bayes_auto_learn_threshold /var/lib/spamassassin/*.*/*.cf

See if somewhere your setting is getting overridden.

You might also perform some simple checks to see if the file you are changing is actually one that SpamAssassin is using. Some distros move the directories around. /etc/mail/spamassassin is often /etc/spamassassin, for example.

{^_^}
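The grep commands show every place the setting appears; working out which one wins is a separate step. Here is a rough Python sketch of that step, assuming (as is generally the case for simple SpamAssassin options) that a later setting replaces an earlier one. The file names, their order, and the function names are hypothetical; the sketch works on text passed in, so it reads no real configuration.

```python
def threshold_settings(files):
    """List (file, line number, option, values) for each threshold line.

    `files` is a list of (name, text) pairs in the order they are read.
    """
    found = []
    for name, text in files:
        for lineno, line in enumerate(text.splitlines(), 1):
            fields = line.split("#", 1)[0].split()  # drop trailing comments
            if fields and fields[0].startswith("bayes_auto_learn_threshold"):
                found.append((name, lineno, fields[0], fields[1:]))
    return found


def effective(files):
    """Map each option to its last seen value and the file that set it."""
    result = {}
    for name, lineno, option, values in threshold_settings(files):
        result[option] = (values, name, lineno)
    return result


# Hypothetical example: a site file followed by a user file that overrides it.
example = [
    ("local.cf", "bayes_auto_learn 1\nbayes_auto_learn_threshold_spam 15.5\n"),
    ("user_prefs", "# personal tweaks\nbayes_auto_learn_threshold_spam 6.0\n"),
]

for option, (values, name, lineno) in effective(example).items():
    print(option, values, "set in", name, "line", lineno)
```

A line whose option name and value have run together shows up with an empty value list, which makes that kind of typo easy to spot.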
Re: Discourage broken configs (was: Discourage broken content (was: Broken images in mails)
From: Gino Cerullo [EMAIL PROTECTED]
> On 25-Aug-06, at 3:20 PM, Kenneth Porter wrote:
>> --On Friday, August 25, 2006 12:05 AM -0700 Plenz [EMAIL PROTECTED] online.de wrote:
>>> I disagree. To check out what happens I converted a JPG picture into a GIF file and sent it to myself. One time I converted it with IrfanView and the second time with PaintShop Pro. Both GIF files had the result
>>>
>>>   giftopnm: EOF or error reading data portion...
>>>
>>> So I produced a corrupt (?) image, but it was not spam.
>>
>> I think we should discourage all broken content in email and on the web. At one time we could assume that broken content was an honest mistake and make an attempt at fixing it. But with the rise of malicious content attempting to exploit bugs in content handlers (like overruns in image libraries), we should simply reject anything that fails to pass validation, on the assumption that it's out to get us. This includes not just broken images but also broken HTML, which is so commonly used to conceal spam.
>>
>> We need to stop giving a free pass to broken content creation software just because it's popular. When someone sends you broken content, you should react the same way you would if they sent you documents on dirt-smeared paper. Stop letting your emperor walk around naked.
>
> I would, and do, go even further and discourage broken Server/DNS configurations.
>
> I've downright had it with all this crap hitting my server. I'm now doing checks right at the MTA and if the sending server fails any hostname, HELO, domain name, SPF etc., checks they don't even get to my content filters. The biggest thing we have in our favour is that the spambots are mostly broken or running on machines that will fail most of these checks.
>
> For legitimate email, I send a message to the admins responsible for the broken configs with my log entries explaining why their email was blocked. It's up to them to fix it if they want to send email my way.
>
> I know this isn't practical in an environment where you're administering hundreds or thousands of accounts, and I feel your pain, but I think it's time we encouraged proper and correct server and DNS configurations so we can use all the tools at our disposal to our advantage.

I am with you right up until the moment my head says, "Who defines proper content?" Then I come back to email format rwars and say "Fahgeddit."

One man's cilantro spice is another man's intolerable bitterness. Do we try to force the bitterness on the other man or do we try to accommodate? Who gets to define how much we must tolerate? It's purely an rwar issue when you apply this to formatting wars. It is best to do what YOU will and not get evangelistic about it. If you do, characters like me get contrary.

{^_^} Joanne, The Stubborn
Re: SPF and envelope senders
Logan Shaw wrote:
> So... is it safe to assume their servers are configured incorrectly? Or should our MTA be somehow adding that header if it's missing? Or is there some other way that our MailScanner+SpamAssassin combo should be getting the envelope sender information?

MailScanner versions of any reasonably recent vintage add an X-MailScanner-From: header to each message before passing it to SA. Edit your SA local.cf file and add:

  envelope_sender_header X-MailScanner-From
Re: bayes autolearn acting up
>> Here's another example:
>>
>>   X-Spam-Flag: YES
>>   X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
>>   X-Spam-Level: *
>>   X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
>>     DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
>>     version=3.1.4
>>
>> I just can't see why it is autolearning everything that is tagged as spam. If anyone has any ideas, i'd appreciate it!
>
>   grep bayes_auto_learn_threshold /etc/mail/spamassassin/*
>   grep bayes_auto_learn_threshold /usr/share/spamassassin/*.cf
>   grep bayes_auto_learn_threshold /var/lib/spamassassin/*.*/*.cf
>
> See if somewhere your setting is getting overridden.
>
> You might also perform some simple checks to see if the file you are changing is actually one that SpamAssassin is using. Some distros move the directories around. /etc/mail/spamassassin is often /etc/spamassassin, for example.

No other conf files anywhere - the conf files I am modifying are definitely the ones SA is using.

Could my Bayes DB be corrupt? I am using a MySQL DB for Bayes. Besides this autolearn glitch, Bayes is performing well.

Thanks,
Devin
Re: [Devel-spam] FuzzyOcr 2.3b released,fixes bugs and improves stability
> Hello,
> I just uploaded FuzzyOcr 2.3b to the download site. If you find bugs or run into problems, please mail back :)

The jpeg.eml and png.eml samples failed to provide FuzzyOcr hits on my system because the messages scored higher than the default focr_autodisable_score. You should mention in the README file in the samples directory that you may need to temporarily raise the focr_autodisable_score while testing.

Gary V
Re: Discourage broken configs (was: Discourage broken content (was: Broken images in mails)
> Who gets to define how much we must tolerate? It's purely an rwar issue when you apply this to formatting wars. It is best to do what YOU will and not get evangelistic about it. If you do, characters like me get contrary.
>
> {^_^} Joanne, The Stubborn

A great and wonderful idea, until you have users paying you for e-mail service and you start bouncing their mail because someone or some program has a bug in it that they have no control over. When they lose that email from their employer, client or whatever, I can assure you that they will find another provider right quick.

===[George R. Kasica]===    +1 262 677 0766
President                   +1 206 374 6482 FAX
Netwrx Consulting Inc.      Jackson, WI USA
http://www.netwrx1.com
[EMAIL PROTECTED]
ICQ #12862186
Re: Animated images in mails
Today I got animated spam. The first frame has only dots and lines, the second frame has the spam text, and the third frame again has dots and lines. The duration of the text frame is very long, the others are very short.

Is there a command line utility which can extract the frames of animated GIFs?