spamassassin -report says Wide character in syswrite at /usr/lib/i386-linux-gnu/perl/5.22/IO/Handle.pm line 220.
I'm trying to use spamassassin's ability to report an email as spam to various folks who collect that kind of data: https://wiki.apache.org/spamassassin/ReportingSpam I'm piping the email to "spamassassin -report", and the result I get is: Wide character in syswrite at /usr/lib/i386-linux-gnu/perl/5.22/IO/Handle.pm line 220. This is on Ubuntu 16.04.6 LTS. A newer LTS release came out almost a year ago, and maybe upgrading would fix that. But it kind of looks like this is a bug within spamassassin, and unicode should be getting handled differently? https://www.perlmonks.org/bare/?node_id=329994 I see I have "normalize_charset 1" in my local.cf - https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#normalize_charset-0-1-default:-0 This problem may be specific to email that has "Content-Transfer-Encoding: 7bit", but then includes unicode. For example: http://www.chaosreigns.com/sa/wide.txt (Search for "So you".)
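One way to test the hypothesis that the failing messages declare (or, with no header, default to) 7bit while actually carrying 8-bit bytes is a small Python check run over a saved copy of a message such as wide.txt. This is a minimal sketch under that assumption, and the function name is made up for illustration. It only flags suspect MIME parts. It does not touch SpamAssassin or the Perl "Wide character" warning itself.

```python
import email


def mislabeled_7bit_parts(raw: bytes) -> list:
    """Return content types of leaf MIME parts that declare (or, per RFC 2045,
    default to) 7bit transfer encoding but contain bytes above 0x7F."""
    msg = email.message_from_bytes(raw)
    suspects = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers have no payload of their own
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if cte != "7bit":
            continue  # quoted-printable/base64/8bit parts are not "lying"
        # For a 7bit part, decode=True returns the payload's original bytes.
        payload = part.get_payload(decode=True) or b""
        if any(byte > 0x7F for byte in payload):
            suspects.append(part.get_content_type())
    return suspects
```

For example, mislabeled_7bit_parts(open("wide.txt", "rb").read()) should return a non-empty list if the message really mixes a 7bit label with UTF-8 text.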
Subscription confirmation flood
I've gotten many subscription confirmation requests today. These rules are getting most of them. I don't claim they're particularly good rules. I'm interested in better options. http://www.chaosreigns.com/sa/subscriptionflood.txt
Re: UTF-8 rule generator script Re: UTF-8 rules, what am I missing?
On 09/29, Jay Sekora wrote: Seems like it would be a huge convenience if either (1) turning on normalize_charset forced interpretation of rule files as UTF-8, (2) there were a similar setting to specify the encoding of rule files, or (3) there were a way on a file-by-file basis to say what charset the rules in the file were in (which is probably best since it would facilitate custom rule sharing across sites). That's off the top of my head with no thought so it may be dumb. :-) I think it's worth opening a bug. If I can copy and paste UTF8, I feel like I really should be able to paste it into a spamassassin rule.
UTF-8 rules, what am I missing?
I created some rules to match Polish text: http://www.chaosreigns.com/sa/polish.txt The rules with only ascii characters work; the ones with utf8 characters don't. According to hexedit, they're identical in my maildir and in my /etc/spamassassin/local.cf. SA can handle UTF-8 strings in rules at least since SA 3.2 on Perl 5.8.x. - http://spamassassin.1065346.n5.nabble.com/UTF-8-Spam-rules-td106485.html $ spamassassin --version SpamAssassin version 3.4.0-rsvnunknown $ perl --version This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi spamassassin --lint has nothing to say. This properly prints a euro sign: $ perl -Mcharnames=:full -CS -wle 'print "\N{EURO SIGN}"' € But spamassassin -t says the rules with non-ascii utf8 characters aren't hitting. What am I missing? If anyone happens upon this email trying to get utf8 stuff straightened out, to get gnome-terminal to work I needed to add: $ cat .gnomerc export LANG=en_US.utf8 To get apache to work I needed: AddDefaultCharset utf-8 The rest is covered here: http://perlgeek.de/en/article/set-up-a-clean-utf8-environment
UTF-8 rule generator script Re: UTF-8 rules, what am I missing?
I wrote a script that takes a list of words with UTF-8 characters, and generates rules matching them: http://chaosreigns.com/code/dl/sawordrule.pl For example: $ echo análisis | perl ./sawordrule.pl SPANISH_ body SPANISH_ANALISIS /\ban[\x{C1}\x{E1}]lisis\b/i # análisis (The two characters in each bracketed class are the upper and lower case forms of the accented character, because /i apparently doesn't apply to these.) For a bigger example: cat spanish.txt | tr -d ',;.:()-' | tr ' ' '\n' | sort -f | uniq -i | ./sawordrule.pl SPANISH_ spanish.cf A couple of untested results: http://www.chaosreigns.com/sa/spanish.cf http://www.chaosreigns.com/sa/polish.cf To be clear, these files will likely flag ALL Polish or Spanish emails as spam. By default, rules have a score of 1, so without a corresponding score line, each of these has a score of 1. The output is going to include some garbage rules you're going to need to manually delete. It's also probably going to include occasional rules which will match English words. I'm sure I missed a couple of these in the .cf files I provided. To use the .cf files, add something like this to your local.cf: include /etc/spamassassin/spanish.cf include /etc/spamassassin/polish.cf On 09/26, John Hardin wrote: On Fri, 26 Sep 2014, dar...@chaosreigns.com wrote: I created some rules to match Polish text: http://www.chaosreigns.com/sa/polish.txt The rules with only ascii characters work, the ones with utf8 characters don't. According to hexedit, they're identical in my maildir and in my /etc/spamassassin/local.cf. Put the hex strings for the accented characters into the RE. I've had the best reliability from placing each byte in its own character class: [\xd0][\x80] Thanks.
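The two rule styles in this thread can be sketched in Python. The first is the code point classes that the sawordrule.pl output above uses. The second is John Hardin's one-byte-per-class form, which for case-insensitivity needs an alternation of each case's bytes. This is not a port of sawordrule.pl, and all function names are hypothetical. Which form actually hits depends on your normalize_charset and Perl Unicode settings. Word boundaries next to non-ASCII characters may also not behave as expected.

```python
import unicodedata


def _cp(ch: str) -> str:
    """Perl-style code point escape, e.g. 'á' -> \\x{E1}."""
    return "\\x{%X}" % ord(ch)


def _case_variants(ch: str) -> list:
    """Single-character upper/lower forms of ch, upper case first."""
    forms = [c for c in (ch.upper(), ch, ch.lower()) if len(c) == 1]
    return list(dict.fromkeys(forms))  # de-duplicate, keep order


def _ascii_part(ch: str) -> str:
    return ch if ch.isalnum() else "\\" + ch


def case_class_regex(word: str) -> str:
    """Code point form: each non-ASCII letter becomes [\\x{UPPER}\\x{lower}]."""
    parts = []
    for ch in word:
        if ord(ch) < 128:
            parts.append(_ascii_part(ch))
            continue
        forms = _case_variants(ch)
        inner = "".join(_cp(c) for c in forms)
        parts.append(inner if len(forms) == 1 else "[" + inner + "]")
    return "\\b" + "".join(parts) + "\\b"


def utf8_byte_regex(word: str) -> str:
    """Byte form: every UTF-8 byte in its own class, e.g. [\\xc3][\\xa1]."""
    parts = []
    for ch in word:
        if ord(ch) < 128:
            parts.append(_ascii_part(ch))
            continue
        alts = ["".join("[\\x%02x]" % b for b in c.encode("utf-8"))
                for c in _case_variants(ch)]
        parts.append(alts[0] if len(alts) == 1 else "(?:" + "|".join(alts) + ")")
    return "\\b" + "".join(parts) + "\\b"


def rule_name(prefix: str, word: str) -> str:
    """ASCII-fold the word for the rule name: 'análisis' -> 'ANALISIS'."""
    folded = unicodedata.normalize("NFKD", word)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return prefix + "".join(c for c in ascii_only if c.isalnum()).upper()


def make_rule(prefix: str, word: str, byte_form: bool = False) -> str:
    """One SpamAssassin body rule line, with the original word as a comment."""
    regex = utf8_byte_regex(word) if byte_form else case_class_regex(word)
    return f"body {rule_name(prefix, word)} /{regex}/i # {word}"
```

With the default code point form, make_rule("SPANISH_", "análisis") reproduces the rule line shown above. Passing byte_form=True gives \ban(?:[\xc3][\x81]|[\xc3][\xa1])lisis\b instead.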
Re: UTF-8 rule generator script Re: UTF-8 rules, what am I missing?
On 09/26, Adi wrote: are part of some SPAM messages but normal messages too. You should consider use long phrase to eliminate wrong matching. Many Polish words have many meanings depending on the context. Certainly proper rules that hit only spam would be preferable, but to make any decent attempt at that would require access to a bunch of Polish non-spam for testing, which I do not have. If you (or anybody) are regularly receiving non-spam in a language other than English (and willing to sort it into spam vs. non-spam folders), it would be valuable to the spamassassin project to run the testing script (masscheck) to report how many of your spams and non-spams each of the rules hit. You don't have to give anybody a copy of your emails, just the report of the hit counts. More info here: https://wiki.apache.org/spamassassin/NightlyMassCheck There's also stuff about automatic rule generation here that might be fun: https://wiki.apache.org/spamassassin/WritingRules#Automatic_rule_generation On 09/26, John Hardin wrote: How do you get a one byte match for two-byte-long UTF-8-encoded accented characters? Shouldn't it generate this: I believe it was putting 'export PERL_UNICODE=' in my ~/.bashrc. Documentation is here: http://perldoc.perl.org/perlrun.html#*-C-[_number/list_]* Before I set that environment variable, as you said, I was getting two output characters per two byte long UTF-8 character. Your rule doesn't hit in my test environment (though I just pasted that word into an existing message to test...) Weird.
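The "two output characters per two byte long UTF-8 character" symptom is what happens when input bytes are never decoded, so each byte is read as one character. A Python analogue, as a sketch only, assuming that Perl's undecoded input behaves like Latin-1 decoding. An empty PERL_UNICODE is equivalent to -CSDL, which under a UTF-8 locale decodes standard input as UTF-8.

```python
def chars_seen(word: str, decode_input: bool) -> list:
    """Characters a program sees when it reads the UTF-8 bytes of word.

    decode_input=True mimics decoding the input as UTF-8; False mimics
    treating every byte as its own character (Latin-1 style).
    """
    data = word.encode("utf-8")
    text = data.decode("utf-8") if decode_input else data.decode("latin-1")
    return list(text)
```

So chars_seen("á", False) yields two characters ('Ã' and '¡'), while chars_seen("á", True) yields the single character 'á'.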
Non-English spam
I had TextCat set up to detect non-English emails as spam: https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_TextCat.html But I apparently didn't have the score turned up high enough. The default score for its UNWANTED_LANGUAGE_BODY rule is 2.800. I just added this to my /etc/spamassassin/local.cf: score UNWANTED_LANGUAGE_BODY 5 I expect that to help, since 129 of the 193 spams spamassassin has missed this month (67%) hit that rule, and none of my non-spams have. 39% of them contained the Polish word for district. To enable TextCat to flag everything that's not English, in local.pre I have: loadplugin Mail::SpamAssassin::Plugin::TextCat And in local.cf I have: ok_languages en This post was originally going to be asking if anybody wanted to collaborate on some non-English spam rules. I guess I'll reconsider that after October.
Re: SPF failure very low score
On 08/08, Quanah Gibson-Mount wrote: For SA 3.4.0, it says in 50_scores.cf:
# SPF
# Note that the benefit for a valid SPF record is deliberately minimal; it's
# likely that more spammers would quickly move to setting valid SPF records
# otherwise. The penalties for an *incorrect* record, however, are large. ;)
However, .001 does not seem LARGE to me at all. I would expect at least a 1. Right now there is tons of facebook spam out there that clearly fails SPF, such as the following: X-Spam-Status: No, score=2.407 tagged_above=-10 required=3 tests=[BAYES_50=0.8, DKIM_ADSP_ALL=0.8, HTML_FONT_LOW_CONTRAST=0.001, HTML_MESSAGE=0.001, KHOP_BIG_TO_CC=0.001, RDNS_NONE=0.793, SPF_FAIL=0.001, T_HEADER_FROM_DIFFERENT_DOMAINS=0.01] autolearn=no How is .001 in any way considered a large penalty? As has been said, SPF is kind of a terrible spam indicator: http://ruleqa.spamassassin.org/?daterev=20130808-r1511618-nrule=SPF_FAIL
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME      WHO/AGE
0      0.1057  1.4410  0.068  0.40  0.00   SPF_FAIL
That says it hits over 10x as large a portion of non-spam as spam. The explanation for the quote is, quite simply, that it is out of date, and you should fix it. -- As humans, we are taught to forget that we are animals. - forward to Johnny The Homicidal Maniac http://www.ChaosReigns.com
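The "over 10x" figure can be checked against the RuleQA row for SPF_FAIL. The S/O column there is consistent with SPAM% / (SPAM% + HAM%), and that definition is assumed in this quick Python check.

```python
def spam_over_total(spam_pct: float, ham_pct: float) -> float:
    """RuleQA-style S/O: the share of a rule's hit percentage that is spam."""
    return spam_pct / (spam_pct + ham_pct)


# SPF_FAIL figures from the 20130808 RuleQA row quoted in the post.
spam_pct, ham_pct = 0.1057, 1.4410
so = spam_over_total(spam_pct, ham_pct)
ham_to_spam = ham_pct / spam_pct

print(f"S/O = {so:.3f}")                          # S/O = 0.068
print(f"ham/spam hit ratio = {ham_to_spam:.1f}")  # ham/spam hit ratio = 13.6
```

An S/O near 0 means nearly all of a rule's hits are on non-spam. A useful spam sign would sit near 1.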
Re: ok_languages
Sounds like you didn't load the TextCat plugin (in the right place; for example, a loadplugin Mail::SpamAssassin::Plugin::TextCat line in local.pre). There's some related stuff on http://wiki.apache.org/spamassassin/ImproveAccuracy On 07/12, Timothy Murphy wrote: When I run spamassin --lint I get the response - [tim@alfred ~]$ sudo spamassassin --lint Jul 12 21:59:15.538 [19228] warn: config: failed to parse, now a plugin, skipping, in /etc/mail/spamassassin/local.cf: ok_languages en it fr de ga - So where do I say now which languages I like? -- Timothy Murphy e-mail: gayleard /at/ eircom.net tel: +353-86-2336090, +353-1-2842366 School of Mathematics, Trinity College, Dublin 2, Ireland
Re: 2 Seems To Be My Sweet Spot
The default rule scores are generated with an assumed threshold of 5 and a target of 1 false positive in 2,500 non-spams. It sounds like you may be substantially increasing your false positive rate, which you are certainly entitled to do, but which I would not recommend. http://wiki.apache.org/spamassassin/ImproveAccuracy On 06/03, Bill Polhemus wrote: Hello. I am not a major admin. I have used a Linux box w/ Sendmail + Spamassassin off and on for years, just for personal and small-biz email. I have only two dozen or so accounts allocated among three domains. Using third-party email service for many years, which supposedly includes Spam filtering, I noticed that gradually, of ~500 or so mails per account per day, about 40% are spam. And in fact I noticed perhaps half again as many spam were getting through as were caught in my email service provider's Spam trap (I have no idea what they use). Decided to take things in hand again. After about 3 months of fiddling I've got it to the point where I'm down to maybe two Spam per account per day getting through. Typical SA Bayes files sizes are about 650K Bayes_seen/AWL and 1.2G Bayes_toks Thing is, in order to get this performance I've had to set the threshold for Spam/Ham at a SA score of 2, after all hand-feeding and tweaking I know to do. I lowered it gradually over time by 0.5 every two weeks or so, to this point. So far I've found maybe 1 or 2 false positives per account per week at this scoring. I'm fine with it as is, but thought some folks here might find it interesting to note. William L. Polhemus, Jr. P.E. Sent from my iPhone 5
Re: Sare anda OpenProject Updates
https://wiki.apache.org/spamassassin/SoughtRules On 05/27, Rejaine Monteiro wrote: Hello guys, There are still some active rules update channel? Sare and Open looks that are no longer available... The SARE rules are broken to the point of being harmful (see in http://wiki.apache.org/spamassassin/SareChannels) OpenProtect' SpamAssassin sa-update channel is obsolete since SARE stopped updating their rulesets. Please stop using this channel (see in http://saupdates.openprotect.com/)
With similar rules, rspamd is about ten times faster than SpamAssassin.
http://freecode.com/projects/rspamd Somebody asked about it in IRC today. I don't know anything about it.
Re: RCVD_IN_DNSWL_HI false negatives (my solution)
On 02/07, Lutz Petersen wrote: If you use mobile.de as a forwarder, it may make sense to add there IPs to your trusted_networks configuration. If you do this, the DNSxL tests are applied to the IP _before_ the mobile.de hop. That is no problem special to us or our customers. The whitelist level for the four mobile.de IPs in the dnswl simply is wrong. Instead of HI a level of NONE would be right. FYI, the guy you were replying to there runs dnswl. It sounds like one of your customers has created a mobile.de account, and requested that email to that account be forwarded to an address for which you are hosting mail. If that is the case, this is what spamassassin would call a trusted relay, and you should add mobile.de's IPs as trusted relays, like: trusted_networks 194.50.69.1 This will cause spamassassin to use the IP from the relay before mobile.de for blacklist and whitelist (dnswl) lookups. It's kind of an awkward, inconvenient situation. But if your customer has requested these emails be relayed, it's kind of unreasonable for you to expect dnswl to delist them. Does that all make sense? On the other hand, if nobody ever requested that these emails be relayed, and you can firmly establish that, I (and a couple other people in this thread) would be happy to drop their score in dnswl. It just doesn't sound like that's what's happening. As Niamh mentioned, dnswl.org has no record of abuse reports, or blacklists listing this IP, which is further evidence that something else is going on in your situation. (I'm also an (inactive) dnswl admin.)
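What trusted_networks changes can be modelled as a walk down the Received chain. Starting from the hop that connected to you, skip relays inside trusted_networks, and the first relay outside them is the one that gets looked up. This Python sketch is a simplified model, not SpamAssassin's actual trust-path code, which also considers internal_networks, authentication and more. The 203.0.113.7 sender address in the example is hypothetical.

```python
import ipaddress
from typing import Optional


def last_external_ip(hops: list, trusted_networks: list) -> Optional[str]:
    """Return the first relay IP, walking from the newest hop, that is not
    inside trusted_networks (the relay a 'last external' DNSxL check uses).

    hops: relay IPs taken from Received headers, newest first, so hops[0]
    is the machine that connected to your server.
    """
    nets = [ipaddress.ip_network(n, strict=False) for n in trusted_networks]
    for ip in hops:
        addr = ipaddress.ip_address(ip)
        # A v4 address is simply "not in" a v6 network and vice versa.
        if not any(addr in net for net in nets):
            return ip
    return None  # every relay was trusted
```

With hops = ["194.50.69.1", "203.0.113.7"], an empty trusted list returns the forwarder 194.50.69.1. Adding "194.50.69.1" to the trusted list returns 203.0.113.7, the relay before mobile.de.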
Do you have your trusted networks configured correctly?
I feel like this comes up often enough, people not having trusted_networks or internal_networks set. Probably for most people it's unnecessary. But if you have some server relaying / forwarding mail to your server, and you don't have one of these set, spamassassin is using the IP address of that relaying server for blacklist lookups, which is not useful. And all you have to do is add a line to your local.cf containing: trusted_networks IP Where IP is the IP address of the relaying machine. You can have multiple, separated by a space. Often, it seems, people are getting email relayed and have forgotten about it. So to look for that, you can add to your local.cf: add_header all RelaysUntrusted _RELAYSUNTRUSTED_ Then wait till you get a bunch of email, then run something like: cat ~/Maildir/cur/* ~/Maildir/new/* | grep ^X-Spam-RelaysUntrusted | cut -d' ' -f3 | sort | uniq -c | sort -nr | less This will list the untrusted IPs you most commonly get email from. You should make sure the ones near the top aren't actually trusted relays you should add to trusted_networks. These are the related wiki pages: http://wiki.apache.org/spamassassin/TrustPath http://wiki.apache.org/spamassassin/TrustedRelays I should probably add this testing stuff somewhere.
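The cat | grep | cut | sort | uniq -c pipeline can also be written as a small Python sketch. It assumes the added header looks like "X-Spam-RelaysUntrusted: [ ip=... rdns=... ]", which is what the cut -d' ' -f3 step relies on. Unlike the grep, it looks only in the header block of each message. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# First ip= value in the header, i.e. the most recent untrusted relay
# (the same field the cut -d' ' -f3 step extracts).
RELAY_RE = re.compile(r"^X-Spam-RelaysUntrusted:[ \t]*\[\s*ip=(\S+)", re.MULTILINE)


def first_untrusted_ip(message_text: str) -> Optional[str]:
    """Return the first untrusted relay IP from the message headers, or None."""
    headers = re.split(r"\r?\n\r?\n", message_text, maxsplit=1)[0]
    match = RELAY_RE.search(headers)
    return match.group(1) if match else None


def count_untrusted(messages) -> Counter:
    """Count first-untrusted-relay IPs over an iterable of message texts."""
    return Counter(ip for ip in map(first_untrusted_ip, messages) if ip)


def maildir_messages(maildir: str):
    """Yield message texts from cur/ and new/ of a Maildir.

    Bytes are decoded as Latin-1 so that odd encodings never raise."""
    for sub in ("cur", "new"):
        for path in sorted(Path(maildir, sub).glob("*")):
            if path.is_file():
                yield path.read_bytes().decode("latin-1")
```

For example, count_untrusted(maildir_messages("/home/you/Maildir")).most_common(20) gives the top 20 (ip, count) pairs, like the sort | uniq -c | sort -nr stage.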
Re: ANNOUNCEMENT: update to ivmURI regarding surge in rarely-blacklisted domains spammers use from legit site that are compromised
What spamassassin rules is this related to? On 01/07, Rob McEwen wrote: ANNOUNCEMENT: update to ivmURI regarding surge in rarely-blacklisted domains spammers use from legit site that are compromised There has been a surge during the past couple of days in rarely-blacklisted domains (as in, you see few of these blacklisted on SURBL/URIBL/DBL) ...where the spammers used compromised sites which are normally legit sites. (maybe the FTP password was cracked? or some other security hole exploited?) Likewise, ivmURI was missing many of these because our FP-prevention-filters... which normally prevent decoy domains or innocent domains from getting blacklisted... were also causing many of these to be overlooked. (I suspect that the same was happening with the other URI blacklists, since [it seems?] even fewer of these were getting blacklisted on those other URI/domain blacklists?) This isn't new. For months, it has been on my mind to make some adjustments to surgically target listing these types of domains... where our FP-prevention-filters would then back off just a tad... yet in a very surgically targeted way... so that these would start blacklisting, yet without those changes to the filters suddenly causing many FPs, and where these domains would also expire off of ivmURI faster--with the idea that the site owners would probably find and fix their problem somewhat quickly. (we don't want these to remain blacklisted weeks after the spam has ceased and the security problem fixed) Yes, this WILL cause a tiny bit of collateral damage... but my estimation is that the ratio is off-the-chart GOOD! These are relatively minor sites. This could potentially cause hundreds of thousands of spams blocked for every one legit mail blocked. And if someone STILL has a problem with that ratio... then my message to them is... the site owner should be somewhat held accountable for their poor security--which is partly at fault for so much elusive spam making it into inboxes! 
(and, again, these listings will expire MUCH faster than regular ivmURI listings) Many of these spams are especially elusive because the spammers then combine the use of a somewhat legit domain... with sending from freemail servers, or other legit mail servers which would cause far too much collateral damage if blocked by IP. At best, this puts a HUGE burden on content filters. At worst, many of these are slipping past many spam filters. This major milestone improvement for ivmURI was implemented mere hours ago. Here are some results... where these were added to the ivmURI list today: http://dnsbl.invaluement.com/uri_surge.txt NOTE: These are all domains impacted by this change. Unfortunately, many in that list would been blacklisted on ivmURI anyways, without the changes... but many domains in that list required this change to get listed on ivmURI. Also, across the board, you'll also find very few in that list which are on ANY other URI blacklists! Questions/Feedback are welcome! -- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
Re: Is the SpamAssassin wiki dead?
You need to create an account on the wiki, then post to the dev list requesting write access, mentioning the user name of the account you created, as it says at the bottom of http://wiki.apache.org/spamassassin/ On 01/07, Jeremy Morton wrote: Sorry, I'm not sure what you mean by added me. I don't think I already had an account with username jez so I was expecting to be send a password too. What should I do? -- Best regards, Jeremy Morton (Jez) On 06/01/2013 14:25, Jeremy McSpadden wrote: Kevin added you back on the 31st. Should be done. Happy new year, KAM On 12/28/2012 7:53 AM, Jeremy Morton wrote: Hi, Please add me to the Contributors Group with the wiki username jez. -- Jeremy McSpadden Flux Labs | Endless Solutions Cell : 850-890-2543 | Fax : 850-254-2955 On Jan 6, 2013, at 6:50 AM, Jeremy Morton ad...@game-point.net wrote: I've been trying to get edit access to the SpamAssassin wiki now for weeks, and have gotten nowhere. Is the wiki just dead now? Should someone else start a documentation project for SpamAssassin? It's pretty ludicrous that nobody even seems to care about letting people improve the documentation when they are willing to do so. -- Best regards, Jeremy Morton (Jez)
Re: the sa-rules tarball http://spamassassin.apache.org/ is ancient
On 12/08, Per Jessen wrote: FYI, see $SUBJ. Just noticed I opened a bug about this nearly a year and a half ago: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6632
Re: sa-update generates errors
Probably this known problem, with a bug open for over a year: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6649#c19 The initial comments make it sound like a simple problem of not correctly escaping rules containing binary data, while it is actually a much more complicated problem related to the same thing. On 12/17, Eric Krona wrote: From time to time when sa-update is running, I get errors in the output. Like today I got: Illegal octal digit '8' ignored at /usr/share/perl5/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 1083, $fh line 1097. re2c: error: line 170, column 2: unterminated string constant (missing ") command 're2c -i -b -o scanner2.c scanner2.re' failed: exit 1 What is the reason for it, are some rules poorly written, or do I miss some library or what could be the problem? /eric
Re: sa-update generates errors
Can this error at least be improved to state which input file the error is associated with? On 12/17, Eric Krona wrote: From time to time when sa-update is running, I get errors in the output. Like today I got: Illegal octal digit '8' ignored at /usr/share/perl5/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm line 1083, $fh line 1097. re2c: error: line 170, column 2: unterminated string constant (missing ") command 're2c -i -b -o scanner2.c scanner2.re' failed: exit 1 What is the reason for it, are some rules poorly written, or do I miss some library or what could be the problem? /eric
Re: the sa-rules tarball http://spamassassin.apache.org/ is ancient
On 12/08, Per Jessen wrote: FYI, see $SUBJ. Much like the 3.2.5 release which that page still unfortunately implies is reasonable to use. I'd love an explanation of a situation where somebody is running spamassassin but can't run sa-update, even once. I hear that exists.
Re: Report your webmail usage
On 12/04, David F. Skoll wrote: http://sourceforge.net/projects/aper/ Their phishing_links file did have the URL you reported in it: But did it contain that url at the time he received the email? That seems to be a very important question with these things. So all some kind soul needs to do is write a SpamAssassin plugin that gets the link list from the project and looks for URLs in message bodies (or even just the Google formkey values which are pretty likely to be unique.) Or a script, similar to their https://aper.svn.sourceforge.net/svnroot/aper/addresses2spamassassin.pl which grabs https://aper.svn.sourceforge.net/svnroot/aper/phishing_links and converts it to SA rules. Since something (other than an SA plugin) is going to need to download the file anyway, might as well convert it to rules in the process. Shouldn't be too hard, right? Maybe use \Q\E to avoid needing to escape everything? Oh, somewhat off-topic but in case anyone with clout at Google is reading this: More than a year ago, I recommended to Google that all of their user-created forms should display this text: This is a user-created form hosted at Google. Do not enter sensitive information such as credit card numbers or passwords. If you are asked to enter such information, please report this form as abusive. but Google never got back to me. It seems to me they're complicit in helping phishers... You think people who will enter sensitive information into a random web form will even read that warning? Or be prevented from entering that information even if they do read it? Also, it seems like it would be pretty obnoxious for people who constantly use that stuff legitimately (which I don't). On 12/04, Eric Krona wrote: -0.5 BAYES_05 BODY: Bayes spam probability is 1 to 5% Is your bayes data poisoned? ( http://wiki.apache.org/spamassassin/ImproveAccuracy )
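A script that converts a downloaded link list into rules could escape every URL up front instead of relying on \Q\E. This Python sketch makes some assumptions that are labelled in the code. It assumes the list has one URL per line with # comments, which may not match the real phishing_links layout. The PHISH_URI_ rule prefix is hypothetical. Each rule gets the default score of 1 unless a score line is added. Escaping # as \# should also keep SpamAssassin's config parser from treating it as a comment, though that is worth confirming with spamassassin --lint.

```python
import re


def quotemeta(text: str) -> str:
    """Backslash-escape every character outside [A-Za-z0-9_], roughly like
    Perl's quotemeta for ASCII input ('/', '.', '?', '#' become literal)."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", text)


def links_to_rules(lines, prefix: str = "PHISH_URI_") -> list:
    """Turn one-URL-per-line text into SpamAssassin uri + describe lines.

    The input format and the rule prefix are assumptions, not the actual
    layout of the aper phishing_links file.
    """
    rules = []
    count = 0
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        count += 1
        name = f"{prefix}{count:04d}"
        rules.append(f"uri {name} /{quotemeta(url)}/")
        rules.append(f"describe {name} URL listed in phishing_links")
    return rules
```

Writing "\n".join(links_to_rules(open("phishing_links"))) to a .cf file and including it from local.cf would mirror the include approach used elsewhere on this list.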
Can somebody unsubscribe me...@leigh.ssllock.com from this list?
I'm guessing they're sending this garbage to everybody who posts. - Forwarded message from MDaemon at leigh.ssllock.com mdae...@leigh.ssllock.com - Date: Tue, 04 Dec 2012 17:19:58 -0600 From: MDaemon at leigh.ssllock.com mdae...@leigh.ssllock.com Reply-To: nore...@leigh.ssllock.com To: dar...@chaosreigns.com Subject: Transient Delivery Failure X-DNSWL: No -- MDaemon Delivery Status Notification - http://www.altn.com/dsn -- The attached message had TEMPORARY non-fatal delivery errors. -- THIS IS A WARNING MESSAGE ONLY - YOU DO NOT NEED TO RESEND YOUR MESSAGE -- MDaemon is configured to automatically retry delivery at configured intervals. Subsequent attempts to deliver this message are pending. Failed address: ol2...@company.mail --- Session Transcript --- Tue 2012-12-04 17:19:33: [54:1] Session 54; child 1 Tue 2012-12-04 17:19:33: [54:1] Parsing message \pd5003000.msg Tue 2012-12-04 17:19:33: [54:1] * From: dar...@chaosreigns.com Tue 2012-12-04 17:19:33: [54:1] * To: ol2...@company.mail Tue 2012-12-04 17:19:33: [54:1] * Subject: Re: Report your webmail usage Tue 2012-12-04 17:19:33: [54:1] * Size (bytes): 6325 Tue 2012-12-04 17:19:33: [54:1] * Message-ID: 20121204224257.gj12...@chaosreigns.com Tue 2012-12-04 17:19:33: [54:1] Attempting SMTP connection to [company.mail] Tue 2012-12-04 17:19:33: [54:1] Resolving MX records for [company.mail] (DNS Server: 10.20.20.105)... Tue 2012-12-04 17:19:33: [54:1] Match to MXCACHE.DAT file: Tue 2012-12-04 17:19:33: [54:1] * P=010 D=company.mail TTL=(0) MX=[company.mail] {10.10.42.34} Tue 2012-12-04 17:19:33: [54:1] Attempting SMTP connection to [10.10.42.34:25] Tue 2012-12-04 17:19:33: [54:1] Waiting for socket connection... 
Tue 2012-12-04 17:19:54: [54:1] * Winsock Error 10060 Tue 2012-12-04 17:19:54: [54:1] * 10.10.42.34 added to connection failure cache for 5 minutes Tue 2012-12-04 17:19:54: [54:1] This message is 36 minutes old; it has 0 minutes left in this queue Tue 2012-12-04 17:19:54: [54:1] Remote queue lifetime exceeded; message placed in retry queue --- End Transcript --- -- This is a test server. Please do not submit support requests via this channel. X-MDAV-Result: clean X-MDAV-Processed: leigh.ssllock.com, Tue, 04 Dec 2012 16:43:26 -0600 Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by leigh.ssllock.com (leigh.ssllock.com) (MDaemon PRO v13.0.3) with ESMTP id md5008389.msg for me...@leigh.ssllock.com; Tue, 04 Dec 2012 16:43:26 -0600 Authentication-Results: leigh.ssllock.com spf=pass smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org; x-ip-ptr=pass dns.ptr=hermes.apache.org (ip=140.211.11.3); x-ip-helo=pass smtp.helo=mail.apache.org (ip=140.211.11.3); x-ip-mail=hardfail smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org (does not match 140.211.11.3); dkim=pass header.d=chaosreigns.com (b=X4pc00xgJL; 1:0:good); Received-SPF: pass (leigh.ssllock.com: domain of users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org designates 140.211.11.3 as permitted sender) x-spf-client=MDaemon.PRO.v13.0.3 receiver=leigh.ssllock.com client-ip=140.211.11.3 envelope-from=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org helo=mail.apache.org X-Spam-Processed: leigh.ssllock.com, Tue, 04 Dec 2012 16:43:26 -0600 (not processed: message spf and/or cryptographically verified and approved) X-MDPtrLookup-Result: pass dns.ptr=hermes.apache.org (ip=140.211.11.3) (leigh.ssllock.com) X-MDHeloLookup-Result: pass smtp.helo=mail.apache.org (ip=140.211.11.3) (leigh.ssllock.com) X-MDMailLookup-Result: hardfail smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org (does not match 140.211.11.3) 
(leigh.ssllock.com) X-MDDKIM-Result: unapproved (leigh.ssllock.com) X-MDSPF-Result: pass (leigh.ssllock.com) X-Rcpt-To: me...@leigh.ssllock.com X-MDRcpt-To: me...@leigh.ssllock.com X-MDRemoteIP: 140.211.11.3 X-Envelope-From: users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org X-CAV-Result: clean Received: (qmail 24505 invoked by uid 500); 4 Dec 2012 22:43:22 - Mailing-List: contact users-h...@spamassassin.apache.org; run by ezmlm Precedence: bulk list-help: mailto:users-h...@spamassassin.apache.org list-unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org List-Post: mailto:users@spamassassin.apache.org List-Id: users.spamassassin.apache.org Delivered-To: mailing list users@spamassassin.apache.org Received: (qmail 24496 invoked by uid 99); 4 Dec 2012
Re: Spamassassin test files: sample-nonspam.txt and sample-spam.txt are missing?
They're in the Debian package I have installed, and the subversion source tree. Sounds like a FreeBSD packaging problem. In the source: http://svn.apache.org/repos/asf/spamassassin/trunk/sample-nonspam.txt http://svn.apache.org/repos/asf/spamassassin/trunk/sample-spam.txt On 11/26, Ed Flecko wrote: Hi folks, I'm running SpamAssassin version 3.3.2 (running on Perl version 5.14.2) on FreeBSD 9.0. I've installed Spamassassin from the FBSD ports collection by: # cd /usr/ports/mail/p5-Mail-SpamAssassin # make config ; make -D WITH_DCC install clean I'm trying to test spamassassin using the sample-nonspam.txt and sample-spam.txt files...but I can't find them anywhere! Is it possible that when I installed spamassassin using the install clean method that I wiped out my sample files? If so...how do I test spamassassin? Thank you! Ed
Re: Provide sa-learn with a CSV file of spam and ham?
--mbox
    Input sources are in mbox format
--mbx
    Input sources are in mbx format
--folders=filename, -f filename
    sa-learn will read in the list of folders from the specified file, one folder per line in the file. If the folder is prefixed with ham:type: or spam:type:, sa-learn will learn that folder appropriately, otherwise the folders will be assumed to be of the type specified by --ham or --spam. type above is optional, but is the same as the standard for ArchiveIterator: mbox, mbx, dir, file, or detect (the default if not specified).
- http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html
So you can specify an input format of mbox, mbx, dir (maildir), file, or detect. Looks like no csv. I'd guess a lot of people use spamassassin without bayes. On 11/26, Ed Flecko wrote: Hi folks, I'm running SpamAssassin version 3.3.2 (running on Perl version 5.14.2) on FreeBSD 9.0. I've exported a bunch of spam and ham messages from my Baracuda 400. I have an Excel .csv file of about 2500 spam messages and 2500 ham messages, and I'm wondering if I can supply those as a parameter to sa-learn? I've looked at the documentation (http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html) and I see that you can pass the file as a parameter, but I'm not clear how you'd do that and in what format the file needs to be? CAN it be a .csv or should it be something else? I'm new to spamassassin, but (for those of you more familiar with the product), teaching spamassassin is TYPICALLY the first thing one would do before deploying it in a production environment, wouldn't you? Thank you, Ed
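Since sa-learn reads mbox but not CSV, one option is to convert the export to mbox first. This Python sketch only works if the CSV actually holds complete message source. It assumes, hypothetically, a header row and a column named raw_message that contains each full RFC 822 message with its headers. If the Barracuda export only has summary fields such as subject and sender, there is nothing useful for Bayes to learn from.

```python
import csv
import mailbox


def csv_to_mbox(csv_path: str, mbox_path: str, column: str = "raw_message") -> int:
    """Append the message source held in `column` of each CSV row to an mbox
    file that sa-learn can read. Returns the number of messages written.

    The column name and the "full message per cell" layout are assumptions.
    """
    written = 0
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    try:
        with open(csv_path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                raw = (row.get(column) or "").strip()
                if not raw:
                    continue  # skip rows without message source
                # Pass bytes: mailbox only accepts ASCII-only str input.
                box.add(raw.encode("utf-8") + b"\n")
                written += 1
        box.flush()
    finally:
        box.close()
    return written
```

After converting the spam and ham exports separately, something like sa-learn --spam --mbox spam.mbox and sa-learn --ham --mbox ham.mbox would feed them in, using the --mbox input format listed above.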
Re: wrong RCVD_IN_PBL?
This is quite different. The IP delivering the email to your server is what's hitting RCVD_IN_PBL. Providing that part of the spamassassin -t output so I didn't need to do it myself would've been helpful. pts rule name description -- -- 3.6 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL [82.165.159.34 listed in zen.spamhaus.org] On 11/20, Andreas Schulze wrote: I have a similar issue with a web.de (german webmail) user. He uses his iPhone to submit mail via web.de submission service. (TLS + Authentication) The message triggers RCVD_IN_PBL and others. Any hint to make those message pass sa? here are the headers: --- snip X-Spam-Status: Yes, score=7.14 tag=-999 tag2=5 kill=5 tests=[BAYES_00=-1.9, FREEMAIL_FROM=0.001, HTML_IMAGE_ONLY_12=2.059, HTML_MESSAGE=0.001, MTX_NONE=0.001, RCVD_IN_PBL=3.335, RCVD_IN_PSBL=2.7, RCVD_IN_RP_RNBL=1.31, RP_MATCHES_RCVD=-0.369, TVD_SPACE_RATIO=0.001] autolearn=no X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on idvamavis03.datev.de X-Spam-ASN: AS8560 82.165.0.0/16 Received: from mout-xforward.web.de (mout-xforward.web.de [82.165.159.34]) by idvmailin03.datev.de (Postfix) with ESMTP id 3Y5btV2sQ8z690G; Tue, 20 Nov 2012 20:04:02 +0100 (CET) Received: from [192.168.178.43] ([93.205.254.85]) by smtp.web.de (mrweb102) with ESMTPSA (Nemesis) id 0MA5v3-1TPekj36PR-00BSUp; Tue, 20 Nov 2012 19:59:01 +0100 Subject: test References: a0323c6a-fb02-42df-aa94-c97672816...@web.de From: foo...@web.de foo...@web.de Mime-Version: 1.0 (1.0) Content-Type: multipart/alternative; boundary=Apple-Mail-87E5DAF2-18C6-4FCD-BF0D-CD6386E473CE X-Mailer: iPhone Mail (10A523) Message-Id: e41b88ea-b9cf-4ab1-a033-c2c7c0a13...@web.de Date: Tue, 20 Nov 2012 19:58:57 +0100 Cc: foo...@datev.de Content-Transfer-Encoding: 7bit To: foo...@datev.de X-Provags-ID: V02:K0:EvqK/RN09UfFRommwYltjAXMl2r5JXh5KWYmQ/XvFE7 v78RzfvGZ2i90sbUnAmle0j16h4tGzLgsFuwPaanb1zpyriAC1 wbvb4NZuBy1wZDi2uIhlRUmtyTNNXdYa4InULTNS7wG4t+vqOm 
ugaM5p60njVb35BTzZd8ONV2nh4sL0Mke/7RawEhWRPZkuXKs8 LiB5mlVf7ikRcHdur53ew== --Apple-Mail-87E5DAF2-18C6-4FCD-BF0D-CD6386E473CE --- snap -- My definition of a free society is a society where it is safe to be unpopular. - Adlai E. Stevenson Jr. http://www.ChaosReigns.com
Re: wrong RCVD_IN_PBL?
On 11/17, umeca74 wrote: Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) I believe if that said ESMTPA instead of ESMTP, you would not have that problem. are you sure? I will report it to my ISP. No, I'm not sure, which is why I said "I believe" and "But I haven't actually looked into those details lately." We need better documentation of this. But I am very confident something along these lines is your problem, and that it's appropriate to complain to your ISP that they're not properly indicating authentication in the received header they're adding. -- If you would be a real seeker after truth, it is necessary that at least once in your life you doubt, as far as possible, all things. - Rene Descartes http://www.ChaosReigns.com
Re: wrong RCVD_IN_PBL?
On 11/18, RW wrote: Whilst that wont hurt, it's not the real cause of the problem here which rests entirely with UnifiedeMail.net. Whilst it would have prevented this FP, authentication is intended to solve a different problem. It shouldn't be necessary to have a workaround for the internal network being needlessly allowed to bleed into a remote private network. I wouldn't worry too much about this, it's not a general problem. I disagree. I think indicating the authentication is a better option than chopping off the early received header(s). -- I'd rather be happy than right any day. - Slartiblartfast, The Hitchhiker's Guide to the Galaxy http://www.ChaosReigns.com
Re: wrong RCVD_IN_PBL?
On 11/17, Frederic De Mees wrote: From: umeca74 umec...@hotmail.com 3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL [94.68.74.194 listed in zen.spamhaus.org] Your IP (ppp-94-68-74-194.home.otenet.gr is: 94.68.74.194) looks like a dynamic home user subscriber line (adsl, cable, dialup). PBL contains ranges of IP addresses that should never send e-mail directly to other domains. You should use Otenet's SMTP service offered with your subscription as a relay host (smart host), or rent a dedicated server/VPS in a colo as an alternative. No, all this should be completely unnecessary, and handled by spamassassin detecting an indication of authentication in the received header. That indication of authentication is missing. I'd suggest complaining to the mail server provider about it. Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0LhkkD-1Svsfh1rOL-00mkUj; Sat, 17 Nov 2012 04:20:25 +0100 I believe if that said ESMTPA instead of ESMTP, you would not have that problem. But I haven't actually looked into those details lately. We need better documentation of this. -- The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw http://www.ChaosReigns.com
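To make the ESMTP vs ESMTPA point concrete, here is a simplified Python check. RFC 3848 registers ESMTPA, ESMTPSA, LMTPA and LMTPSA as "with" protocol types meaning the client authenticated. This is NOT SpamAssassin's actual Received-header parser, which uses many more heuristics. It only shows the distinction I'm talking about, using the two Received lines from these threads.

```python
import re

# RFC 3848 "with" protocol types that indicate an authenticated submission.
AUTH_PROTOCOLS = {"ESMTPA", "ESMTPSA", "LMTPA", "LMTPSA"}

# First "with <PROTOCOL>" clause in the header.
WITH_RE = re.compile(r"\bwith\s+([A-Za-z0-9]+)", re.IGNORECASE)

def received_says_authenticated(received: str) -> bool:
    """Simplified check: does the Received header's "with" clause name an
    RFC 3848 authenticated protocol? (Not SpamAssassin's real logic.)"""
    match = WITH_RE.search(received)
    return bool(match) and match.group(1).upper() in AUTH_PROTOCOLS
```

The kundenserver.de line ("with ESMTP (Nemesis)") comes out False, while the smtp.web.de line ("with ESMTPSA (Nemesis)") comes out True. That is the kind of marker the relay needs to add.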
Re: wrong RCVD_IN_PBL?
I don't think that should cause RCVD_IN_PBL to trigger. On 11/17, Frederic De Mees wrote: There is one line missing in the following path: = Received: from mx.mg2.unifiedemail.net ([10.251.10.236]) by corpserv1.corp.unifiedemail.net with Microsoft SMTPSVC(6.0.3790.4675); Fri, 16 Nov 2012 22:20:32 -0500 Received: from ([127.0.0.1]) with MailEnable ESMTP; Fri, 16 Nov 2012 22:20:28 -0500 Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0LhkkD-1Svsfh1rOL-00mkUj; Sat, 17 Nov 2012 04:20:25 +0100 = At no time does the message show that Unifiedmail has received it from kundenserver. Have you submitted your sample to Unifiedemail via the webform, or via e-mail ? Frédéric - Original Message - From: umeca74 umec...@hotmail.com To: users@spamassassin.apache.org Sent: Saturday, November 17, 2012 5:00 PM Subject: Re: wrong RCVD_IN_PBL? Your IP (ppp-94-68-74-194.home.otenet.gr is: 94.68.74.194) looks like a dynamic home user subscriber line (adsl, cable, dialup). that's correct PBL contains ranges of IP addresses that should never send e-mail directly to other domains. that's what I'm saying, I am NOT sending emails directly from this IP, the SMTP server is located in germany (1and1.co.uk) and I am connecting to it using an encrypted authorized connection. That's why I think there is a problem with spam assassin's RCVD_IN_PBL report! -- View this message in context: http://spamassassin.1065346.n5.nabble.com/wrong-RCVD-IN-PBL-tp102334p102340.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com. -- No human thing is of serious importance. - Plato http://www.ChaosReigns.com
Re: wrong RCVD_IN_PBL?
On 11/16, umeca74 wrote: Hello I am doing some tests sending my emails to contentanaly...@unifiedemail.net to assess their spamminess when I send an email through e.g. hotmail, then it is low scored by spamassassin if I use MS Outlook to go through my SMTP server I immediately see a hefty spam score on account of a blocked IP address: 3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL [94.68.74.194 listed in zen.spamhaus.org] The explanation given there is that I am not using authenticated SMTP, whereas I *am* using an authenticated SMTP connection through port 587 is there something wrong with spam assassin here or is it my fault? Your MTA isn't mentioning the authentication in the relevant received header in a way that spamassassin recognizes. -- The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw http://www.ChaosReigns.com
Re: wrong RCVD_IN_PBL?
On 11/16, umeca74 wrote: thanks for your reply. By MTA you mean my email program, Microsoft Outlook? I didn't change any of its settings, is there anything I could try? No, your mail server software. If your mail client (outlook) could add it, then any client could forge that information. Providing full headers would probably make it easier to help you. -- You shall know the truth, and it shall make you odd. -- Flannery O'Connor http://www.ChaosReigns.com
Re: Regex Help
On 11/10, Marc Perkel wrote: Need a rule to catch this: HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm body GOOGLEMIXED /HtTp:\/\/goOGleplAcESSEOopTimiZaTIonx.cOm/ Untested, because I kind of expect that's not actually what you want. If you want something to match things that look similar to this, you need to provide multiple examples. -- it's not how good you are, it's how bad you want it - no fear http://www.ChaosReigns.com
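If the goal is just "this exact URL in any mix of upper and lower case", the usual approach is a lowercase literal with case-insensitive matching (a trailing /i on a SpamAssassin rule). The dot should also be escaped so it only matches a literal dot. Below is a small Python sketch of that one interpretation. Python's re.IGNORECASE stands in for Perl's /i here, and it still won't generalize to lookalike domains without more examples.

```python
import re

# Lowercased literal from the example, dot escaped, matched case-insensitively
# (the Perl equivalent would end in /i).
GOOGLEMIXED = re.compile(r"http://googleplacesseooptimizationx\.com", re.IGNORECASE)

def hits_googlemixed(text: str) -> bool:
    return GOOGLEMIXED.search(text) is not None
```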
Re: Claims manager / LOTTO_AGENT
Just in case nobody has pointed you toward it before: https://wiki.apache.org/spamassassin/NightlyMassCheck Stats we currently have on that rule: http://ruleqa.spamassassin.org/?daterev=20121103&rule=LOTTO_AGENT
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME         WHO/AGE
0      0.5022  0.0011  0.998  0.74  3.50   LOTTO_AGENT
It hits 2 of the 180,272 non-spams we have for use in optimal score generation. On 11/07, Michael Orlitzky wrote: So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points. This is bad news for, Barbara R. Krieg, Claims... When you put a string in an email that hits a spamassassin rule... your email then hits that spamassassin rule. You should generally try to avoid that. -- It's never too late to panic. http://www.ChaosReigns.com
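For anyone reading those ruleqa columns, here is some quick arithmetic showing where the numbers come from. I'm reading S/O as spam% / (spam% + ham%). That's my assumption, but it reproduces the 0.998 shown. The last line just applies the 1-false-positive-per-2,500-hams target to this corpus size. It is not how the score optimizer actually works.

```python
def hit_percent(hits: int, total: int) -> float:
    """Percentage of a corpus that a rule hits."""
    return 100.0 * hits / total

def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's (percentage-weighted) hits that land on spam."""
    return spam_pct / (spam_pct + ham_pct)

HAMS = 180_272
print(round(hit_percent(2, HAMS), 4))       # 0.0011, the HAM% column
print(round(s_over_o(0.5022, 0.0011), 3))   # 0.998, the S/O column
print(HAMS // 2500)                         # 72 hams at 1 false positive per 2,500
```

So a rule hitting 2 of those hams is nowhere near the false-positive budget, which is why the data push its score up.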
Re: Claims manager / LOTTO_AGENT
On 11/07, Michael Orlitzky wrote: Yeah, well it's her job title, so...? You misunderstand statistics. The data aren't wrong. Do I? I think it's more likely that you misunderstand what is expected of spamassassin rules. Somebody really should put up a page in the wiki explaining that rules all have false positives, and that's the entire reason we don't flag an email as spam for any one rule, etc.. But if you provide us with more masscheck data, we can do a better job of automatically calculating ideal scores. -- Of course there's strength in numbers. But there's strength in sharp weaponry too. Ironically, this lead to what we call 'civilization'. - spore http://www.ChaosReigns.com
Re: Claims manager / LOTTO_AGENT
On 11/07, Michael Orlitzky wrote: On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points. This is bad news for, Barbara R. Krieg, Claims... When you put a string an an email that hits a spamassassin rule... your email then hits that spamassassin rule. You should generally try to avoid that. Yeah, well it's her job title, so...? You misunderstand statistics. The data aren't wrong. After re-reading, I think you may have misunderstood my suggestion to avoid putting stuff in emails that is known to hit spam rules. I wasn't suggesting that Barbara R. Krieg change her signature, I was suggesting that you not include it intact when posting to this mailing list about it. -- You shall know the truth, and it shall make you odd. -- Flannery O'Connor http://www.ChaosReigns.com
Re: Claims manager / LOTTO_AGENT
On 11/07, Michael Orlitzky wrote: Sorry, I was a little rude. But saying that she shouldn't put her job title anywhere in an email, ever, is ridiculous. Certainly. The inputs (spam, ham) to the classifier are assumed god-given; and the classification needs to reflect the data, not the other way around. If the classifier is spamassassin, and the inputs are the spam and ham data provided via masscheck, then... the scores provided via sa-update *do* reflect the data. So I'm not sure what you mean. The ideal rule scores are chosen to cause one false positive (ham flagged as spam) in every 2,500 hams, while maximizing the number of spams correctly flagged as spams. With so few hams hitting this rule in the masscheck corpora, we're way below that threshold based on the data we have. This is my fault, of course, but I'm not allowed to mass-check this stuff. It's ongoing legal correspondence. Er, what? You're not allowed to provide a list of which rules hit each of your emails? Or you're not allowed to run a program on your emails that isn't spamassassin? Or did I just not put "This does not require sending us your email" in bold enough times on the masscheck page? -- It's never too late to panic. http://www.ChaosReigns.com
Re: HK_LOTTO hitting ham from the UK national lottery
On 11/01, Niamh Holding wrote: Hello Darxus, Wednesday, October 31, 2012, 10:34:42 PM, you wrote: dcc They're talking about automated score generation. Currently, apparently, dcc the scores for this rule are fixed, and not included in the calculation of dcc ideal scores. So currently submitting the ham to the corpus won't actually help change anything? Yes. But two of the developers have agreed that's worth changing, so it could happen today. And that could change the scores in either direction. -- If you believe everything you read, better not read. - Japanese Proverb http://www.ChaosReigns.com
Re: HK_LOTTO hitting ham from the UK national lottery
On 10/31, Niamh Holding wrote: A if you provide a few dozen samples of these hammy msgs , they can be A included in the SA ham corpus That can be supplied, an mbox of a good supply do? A you can directly contribute to rescoring by running a masscheck instance A as per: A http://wiki.apache.org/spamassassin/NightlyMassCheck Currently not so easy as- a) all high scoring spam is dumped by procmail b) I'd need to get back from all the users details of misclassified messages so they could be moved to the correct corpora. You could just provide a few dozen samples of these hammy msgs via masscheck. The more you can provide, and the more representative it is, the better. Not including high scoring spam isn't a big problem. Things spamassassin gets wrong are most useful. The automated score generation used for the sa-updates comes from email from about fourteen people, so anything you can provide would probably be beneficial. At the bottom of that page is an UploadedCorpora link which you can use to upload the emails themselves without even needing to run masscheck yourself. -- You only truly own what you can carry at a dead run. - 14th 15th century Landsknechts http://www.ChaosReigns.com
Re: HK_LOTTO hitting ham from the UK national lottery
On 10/31, jdow wrote: On 2012/10/31 14:05, John Hardin wrote: On Wed, 31 Oct 2012, Kevin A. McGrail wrote: Shouldn't it be set via GA in 72_scores.cf ? Doesn't sound like a bad idea to comment it in 50_scores.cf and let it float. +1. That's what threw me when I did my quickie analysis early on. RaaallY? Would it not be better to put in a line like this? score HK_LOTTO 0 50_scores.cf would be continually getting overwritten by updates, would it not? They're talking about automated score generation. Currently, apparently, the scores for this rule are fixed, and not included in the calculation of ideal scores. They're talking about including it in the calculation of ideal scores. Which you download the results of from sa-update. They're not talking about local score modification. -- Eh, wisdom's overrated. I prefer beatings and snacks. - Unity, Skin Horse http://www.ChaosReigns.com
Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
On 10/28, Alexandre Boyer wrote: I understood that. I however need to rescore my ruleset because the setup I inherited was 1) not updated with sa-update and 2) manually maintained (with , for example, lot's of perso rules that essentially do the same as the SA rules added over time). I don't understand why re-scoring seems like a necessary step to you. One thing I really want you to understand is that the automated main SA re-scoring does not happen unless we have 150,000 spams, and 150,000 hams (non-spams). Because we do not trust the results to be sufficiently accurate / reliable with fewer. If you can get that many hand classified hams and spams together, that's awesome, I envy you, and I think that would be a great idea for your accuracy. However, I doubt it. If you do get re-scoring to work at all, I strongly encourage you to update the wiki. I'm sure that section is particularly in need of love because nobody ever does that. Just create an account on the wiki, and email the dev mailing list to request write access. The age thresholds for re-scoring are: Ham: 6 years (crazy, right? another reason we need more data) Spam: 2 months As a brutal reset is out of question, I need to do things step by step, rescoring being one of them prior to have my threshold back to 5 and sa-update enabled. Taking things step by step sounds reasonable enough. Re-scoring doesn't. All this being my own private problem, nothing to do with our off topic exchange :-) Eh, it's some obscure usage, but I still think it's entirely appropriate to discuss here. Arround 10 corpora. Are those corpora used tu run the SA mass-check on SA servers or do it also include what I will send one day (my mc logs)? I'll assume you'll find my email which said more on this subject, instead of replying to some of this again. -- Life is either a daring adventure or it is nothing at all. - Helen Keller http://www.ChaosReigns.com
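If you want a rough idea of how much of your own corpus would even fall inside those age limits, a quick sketch like this works on the Date: headers. Treating "2 months" as 61 days and "6 years" as 6 × 365 days is my own approximation. The real masscheck tooling may measure age differently, so treat the result as a ballpark figure.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Age limits quoted above; 61 days and 6 * 365 days are approximations (assumption).
MAX_AGE = {"ham": timedelta(days=6 * 365), "spam": timedelta(days=61)}
NEEDED = 150_000  # messages of each kind before main re-scoring is run

def count_recent(date_headers, kind, now=None):
    """Count Date: header values young enough to be used for re-scoring."""
    now = now or datetime.now(timezone.utc)
    recent = 0
    for value in date_headers:
        try:
            sent = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # missing or unparseable Date: header
        if sent.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
            sent = sent.replace(tzinfo=timezone.utc)
        if now - sent <= MAX_AGE[kind]:
            recent += 1
    return recent
```

Comparing `count_recent(ham_dates, "ham") >= NEEDED` (and likewise for spam) shows how far a single site is from the totals the project needs.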
Masscheck Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
On 10/26, Alexandre Boyer wrote: Well, discouraged was implicit (as is the fact that every admin is I don't think there's anything implicit about it being discouraged to use a threshold below 5. There are lots of local changes which are far less likely to cause problems, and encouraged. The SA rules scores are computed based on the mass-checks, from the project and, to some extend, from contributors. A good question is: how many contributors really give a feedback on the mass-checks? This is public information, although not very explicit. On http://ruleqa.spamassassin.org/ look in the green box, it lists all the corpora included: axb-coi-bulk axb-fraud axb-generic axb-ham-misc axb-sa-users axb-woas bb-guenther_fraud bb-jhardin bb-jhardin_fraud bb-jm bb-kmcgrail bb-zmi bpoliakoff danmcdonald darxus grenier jarif kpg-gah mas zmi The ones starting with bb- are uploaded emails: instead of running masscheck locally, it's run centrally. Other than that, the prefixes are each different contributors. So: axb, guenther, jhardin, jm, kmcgrail, zmi, bpoliakoff, danmcdonald, darxus, grenier, kpg-gah, kpg, mas, zmi. 14 masscheck contributors. We'd probably benefit a lot by significantly increasing that, which is why I mention it somewhat often. This is something I do not know, but the fewer they are, the greater the bias is. Bias in spam and ham samples. Emails reaching my servers are different from yours and from each and every SA users. Absolutely. Unless everybody on earth run a nightly mass-check and report results to SA project for it to compute a world wide scoring, there is a bias. At least this is my understanding, may be I'm wrong, please correct me if so. No, you're totally right. We do what we can with what we have, and I think we do pretty darn good. But we could do better with more data. 
For example, I'm in the process of learning to use mass-check to contribute back to SA (which implies a lot of hard work, simply to build and maintain valid ham/spam corpora, use mass-check, then hit-freq, then fp-fn-stat, I'm not even close to understand how to compute a re-score. I don't know what fp-fn-stat is. You don't need to compute a re-score - that's part of what is done with your masscheck data after you upload it. There's a relatively recently created mailing list specifically for helping people with this stuff, to which I believe you automatically get subscribed when you get a masscheck account: http://wiki.apache.org/spamassassin/MailingLists#RuleQA If you're having difficulty with it, the docs probably need improvement, so do let us know. Your mention of fp-fn-stat makes me think you may have veered a little too far from https://wiki.apache.org/spamassassin/NightlyMassCheck with this, I'm not sure my contribution would be sufficient to make SA scores to be closer to my email traffic reality. I think it would. For example, I'm sure, from what you've posted, that you have enough examples of hams that hit DEAR_SOMETHING that the score of it would drop significantly. Do you have any stat about how many contributors are giving a feedback on the masscheck? and about their geographical location? I'm just asking because I was not able to find this kind of information anywhere. I believe they're almost all in the US, primarily English speakers. That's bad. -- You only truly own what you can carry at a dead run. - 14th 15th century Landsknechts http://www.ChaosReigns.com
Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
On 10/25, Bowie Bailey wrote: On 10/25/2012 10:47 AM, Simon Loewenthal wrote: * 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)' Does anyone know the rational behind this, or is our user base simply communicating on a higher level? :) I imagine the rational is sound, but I do not know what it is. The rationale is simple. The masscheck finds that this rule hits more spam than ham, so it gets a higher score. It's slightly more complicated than that. It's that this score results in the maximum number of spams flagged as spam without exceeding 1 false positive in 2,500 non-spams. A fun example is SUBJ_YOUR_DEBT, which was getting a score of 3.0 while hitting more non-spam than spam. I guess it got disabled somehow. But more importantly, it's because we do not have the rule hit statistics from your email to include them in optimal score generation because you're not submitting those stats via masscheck: https://wiki.apache.org/spamassassin/NightlyMassCheck RuleQA results for that rule are here: http://ruleqa.spamassassin.org/?daterev=20121020&rule=DEAR_SOMETHING
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME            WHO/AGE
0      0.6160  0.2324  0.726  0.63  2.00   DEAR_SOMETHING
It hits 0.6% of spam, and 0.2% of non-spam (ham). On 10/25, Alexandre Boyer wrote: Simon, I had some FPs because of this rule and because my threshold is lower than 5. If you could just append "and I know this is highly discouraged" any time you say that, you might reduce my need to point it out to avoid you causing other people to think that might be a good idea. Scores are generated with a threshold of 5. It's often recommended to use a threshold above 5 for an extra safety measure. Do you even have a guess what rate of false positives you're causing with a lower threshold? I don't. I just had a score override to lower it but this rule still hist a lot of spam (419 scams essentially). Yup, nothing wrong with customizing your rules to suit the email you get better. At least in the direction of reducing false positives. 
-- I finally figured out the only reason to be alive is to enjoy it. - Rita Mae Brown http://www.ChaosReigns.com
Re: SA wiki
On 10/23, Joseph Acquisto wrote: at http://wiki.apache.org/spamassassin/SiteWideBayesFeedback the link a cookbook to setup site wide ham/spam forwarding for postfix http://gtmp.org/publications/sa-postfix-en;, links to topic does not exist yet. It apparently got deleted. The page is available in archive.org, a very useful tool. Anybody can edit the wiki, just create an account and email the dev list asking for write access. This is mentioned at the bottom of the front page of the SA wiki, but I know it's not very obvious, I missed it myself. You could also try contacting the owner of gtmp.org. -- Just because you're offended, doesn't mean you're right. - Ricky Gervais http://www.ChaosReigns.com
Re: sa-update different rulesets
To do sa-update with the default channel and the sought channel, I have a cron job that does: /usr/bin/sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org --channel updates.spamassassin.org No, just grabbing a channel once will not cause sa-update to keep it up to date on its own afterward. On 10/25, Jonathan Nichols wrote: Evening, This might be particular to the Ubuntu spamassassin package, but I'm a little confused about sa-update and the channel files. I added sought and dostech rulesets and updated them with sa-update. Will sa-update remember them and continue to update them daily? Does sa-update need to be told which rulesets to download? Debian/Ubuntu have a spamassassin script in /etc/cron.daily but I didn't see anything in it that was specific to the update channels. Cheers, -- jonathan -- I don't want to die... just yet... not while there's... women. - J. Matthew Root, 8/23/02 (http://www.jmrart.com/) http://www.ChaosReigns.com
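If you drive this from a script rather than a bare cron line, the important part is the same: every run has to name every channel. Below is a minimal Python sketch of that. It reuses the key and channels from the cron line above. The exit-code handling follows my understanding of the sa-update documentation: 0 means updates were installed, 1 means nothing new, anything else is an error. Please double-check that against your version's man page.

```python
import subprocess

def build_sa_update_cmd(channels, gpg_keys, binary="/usr/bin/sa-update"):
    """Build one sa-update invocation that names every channel explicitly."""
    cmd = [binary]
    for key in gpg_keys:
        cmd += ["--gpgkey", key]
    for channel in channels:
        cmd += ["--channel", channel]
    return cmd

def run_update(cmd, runner=subprocess.run):
    """Return True if new rules were installed, False if none were available."""
    code = runner(cmd).returncode
    if code == 0:
        return True   # updates downloaded and installed
    if code == 1:
        return False  # no fresh updates
    raise RuntimeError(f"sa-update failed with exit code {code}")
```

A True result is the point at which you'd reload anything that keeps the rules in memory (spamd, spampd and so on). A False result means there's nothing to reload.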
Re: BAYES_99 score
On 10/22, JP Kelly wrote: Should I set the BAYES_99 score high enough to trigger as spam? I get plenty of spam getting through which does not get caught because BAYES_99 is the only rule which fires and it is not set to score at or above the threshold. You could. Some people only use bayesian filtering, which would be similar. The important question is, how many false positives (non-spams flagged as spams) would that cause? SpamAssassin's automated scoring attempts to achieve 1 false positive in 2,500 non-spams, with a score threshold of 5.0. So if you don't have an absolute minimum of 2,500 representative non-spams to check for having hit BAYES_99, you risk increasing your false positives. But it's your risk to take. Huh, ruleqa doesn't track hits to BAYES_99? -- Let's just say that if complete and utter chaos was lightning, then he'd be the sort to stand on a hilltop in a thunderstorm wearing wet copper armour and shouting 'All gods are bastards'. - The Color of Magic http://www.ChaosReigns.com
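The risk check described above comes down to a little arithmetic. First count how many of your hand-checked hams hit BAYES_99, for example from their X-Spam-Status tests= lists. Then compare that count against the 1-in-2,500 target. This sketch pessimistically assumes every such ham would become a false positive. That ignores negative-scoring rules that might pull some of them back under the threshold.

```python
HAMS_PER_FP = 2500  # the 1 false positive per 2,500 non-spams target mentioned above

def bayes99_fp_ok(ham_total: int, ham_bayes99_hits: int) -> bool:
    """Would scoring BAYES_99 at the spam threshold stay within 1 FP per 2,500 hams?

    Pessimistic assumption: every ham that hits BAYES_99 becomes a false positive.
    """
    if ham_total < HAMS_PER_FP:
        raise ValueError("need at least 2,500 representative hams to judge this")
    # Integer comparison avoids floating-point rounding at the boundary.
    return ham_bayes99_hits * HAMS_PER_FP <= ham_total
```

For instance, 4 BAYES_99 hits in 10,000 hams is right at the limit, and a fifth would put you over it.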
Re: BAYES_99 score
On 10/23, Jari Fredriksson wrote: 22.10.2012 21:15, dar...@chaosreigns.com wrote: Huh, ruleqa doesn't track hits to BAYES_99? If it did, against which database it would do that? It would show the hit rates in the corpora of the masscheck submitters, like everything else. So, the databases of the submitters (who are using bayes). -- I don't want people who want to dance, I want people who have to dance. --George Balanchine http://www.ChaosReigns.com
Re: autolearn
I believe that means the score was low enough that it was automatically fed to sa-learn as ham (non-spam). That's scary, I don't use it (bayes_auto_learn 0). On 10/21, Joseph Acquisto wrote: Today I found a missed SPAM that contained this in the header: X-Spam-Status: No, score=0.0 required=5.0 tests=FREEMAIL_FROM,MISSING_SUBJECT, T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.2 The subject was empty with a link starting with ftp: I guess it's the autolearn that is most puzzling me. joe a. -- I refuse to tip toe through life only to arrive safely at death. http://www.ChaosReigns.com
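To pull the autolearn verdict out of a batch of messages quickly, here's a small Python sketch that splits an X-Spam-Status value into its fields. It assumes SpamAssassin's own header layout as shown above, with a tests= list that may be folded onto a continuation line. It won't handle other formats, such as the amavisd-style header with bracketed per-test scores.

```python
import re

def parse_spam_status(value: str) -> dict:
    """Split a SpamAssassin X-Spam-Status value into its main fields (sketch)."""
    flat = " ".join(value.split())     # unfold continuation lines
    flat = re.sub(r",\s+", ",", flat)  # rejoin a tests= list split after a comma
    verdict, _, rest = flat.partition(",")
    fields = dict(re.findall(r"(\w+)=(\S*)", rest))
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": [t for t in fields.get("tests", "").split(",") if t],
        "autolearn": fields.get("autolearn"),
    }
```

On the header quoted above, this yields score 0.0, four tests, and autolearn "ham". That is the message Bayes learned as non-spam on its own, which `bayes_auto_learn 0` would prevent.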
Re: Sender domain in IP space 5.0.0.0/8 triggers RCVD_ILLEGAL_IP
On 10/16, Frederic De Mees wrote: I have found 2 instances of the file 20_head_tests.cf on my server. The first stays in /usr/share/spamassassin and contains the following That's used when you have never run sa-update. The second in /var/lib/spamassassin/3.003001/updates_spamassassin_org and That was downloaded by sa-update. What is the date on the files in that directory? It should be in the last couple days (because you should be running sa-update daily from cron). contains: / (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ )(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\.|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))/ Yup, that looks like the current one: header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ )(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\.|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))/ So, maybe SA uses the wrong files. Could be, but I'd guess that's not it. The strace command can be useful for that. The other possibility stays with the spampd policy daemon. With a server uptime of several months I cannot remember the last time I stopped and restarted the daemon. That sounds like your problem. When I was using spampd, I had a /etc/init.d/spampd restart after my sa-update in cron. As is suggested on: http://wiki.apache.org/spamassassin/IntegratePostfixViaSpampd -- I'd rather be happy than right any day. - Slartiblartfast, The Hitchhiker's Guide to the Galaxy http://www.ChaosReigns.com
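A quick way to convince yourself that the current pattern no longer flags 5.0.0.0/8 is to run it outside SpamAssassin. Below, the pattern quoted above is copied verbatim into Python. It only uses lookaheads, \b and \d, which behave the same way in Perl and Python's re for ASCII input. The relay strings are simplified stand-ins for X-Spam-Relays-Untrusted entries, not real headers.

```python
import re

# The RCVD_ILLEGAL_IP pattern quoted above, unchanged, split only for line length.
RCVD_ILLEGAL_IP = re.compile(
    r" (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ )"
    r"(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\."
    r"|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))",
    re.ASCII,
)

def relay_is_illegal(relay: str) -> bool:
    """True if the current pattern would fire on this relay line."""
    return RCVD_ILLEGAL_IP.search(relay) is not None

print(relay_is_illegal("[ ip=5.1.2.3 rdns= helo=mail.example ]"))  # False
print(relay_is_illegal("[ ip=0.1.2.3 rdns= helo=mail.example ]"))  # True
```

If a long-running daemon still flags 5.x addresses, it's almost certainly holding the old rules in memory. That fits the spampd restart explanation.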
Re: Testing new install - was Updating 3.2.4 on SUSE sles10
Try sending it from the server you're testing? On 10/15, Joseph Acquisto wrote: Still can't get GTUBE messages. Am I being dense? Sending messages with the GTUBE signature, from external sites, don't seem to arrive. I don't see them trapped in my day jobs outgoing queue, etc. ?? joe a. Joseph Acquisto j...@j4computers.com 10/14/12 5:50 PM Upgrade effort abandoned. Installed OpenSuse 12.2 which, oddly enough, came with the new version of SA. All seems to be working well. OS seems a big leisurely though. However, I can't seem to test with GTUBE. Gmail seems to eat these, they never get to me. Normal? Other gmail tests do. Also, cannot test using the spamassassin -D /usr/share/spamassassin/doc/spamassassin-3.0.3/sample-spam.txt test, as, in this install, there is no doc directory. At least I have not found it yet. Words of guidance? joe a. -- For every complex problem, there is a solution that is simple, neat, and wrong. - H. L. Mencken http://www.ChaosReigns.com
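To take Gmail and other outside filters out of the picture, you can build the GTUBE message yourself and hand it straight to the MTA on the box you're testing. Below is a minimal Python sketch. It assumes that MTA listens on localhost port 25 and accepts local submissions. The addresses are placeholders for you to replace.

```python
import smtplib
from email.message import EmailMessage

# The GTUBE test string; SpamAssassin treats any message containing it as spam.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

def gtube_message(sender: str, recipient: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test"
    msg.set_content("SpamAssassin test message.\n\n" + GTUBE + "\n")
    return msg

def send_via(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to the MTA on `host` (assumed to be the server under test)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Something like `send_via(gtube_message("tester@example.com", "you@example.com"))`, run on the server itself, should come back tagged as spam if SpamAssassin is wired in.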
Re: Updating 3.2.4 on SUSE sles10
Would this not be far easier and more appropriate? http://www.rpmfind.net/linux/rpm2html/search.php?query=spamassassin&submit=Search+...&system=opensuse&arch= Doesn't your distro provide an easy way to search for / upgrade these things? (Why would you use a distro that doesn't?) With ubuntu I'd do: apt-get update, then apt-get dist-upgrade, and have the latest versions of all the packages in the release I'm using. If the current release doesn't have new enough packages, I'd run do-release-upgrade, and it would upgrade everything to the next release. Ubuntu had a spamassassin v3.3.2 package in May 2011. It's in the archives for the Oneiric, Precise, and Quantal releases. And I created an ubuntu PPA providing daily spamassassin builds: https://launchpad.net/~spamassassin/+archive/spamassassin-daily And no, installing from source does not mean the distro CDs. (You could web search for: installing from source.) On 10/10, Joseph Acquisto wrote: On 10/10/2012 at 2:34 AM, Per Jessen p...@computer.org wrote: Joseph Acquisto wrote: On 10/9/2012 at 3:02 PM, Per Jessen p...@computer.org wrote: Joseph Acquisto wrote: Won't make, anyway. Module Net-addr::IP missing. Finding this for SuSe seems to be an adventure in itself. Just install from source. -- Per Jessen, Zürich (14.6°C) You mean perl-net-addr-ip from source? If you mean from the Distro package (CD's ?), I don't find it there. Yep, I meant perl-net-addr-ip. Whether you get it from SUSE or from source won't matter. -- Per Jessen, Zürich (13.4°C) Compiled from stuff at your link to cpan. So far, so good. Got some noise about UTF-8, but forged ahead. perl Makefil.pl (in spamassasin extract folder) gives this: Checking if your kit is complete... Looks good Warning: prerequisite Mail::DKIM 0.31 not found. Writing Makefile for Mail::SpamAssassin Problem? Also, I hesitate to do the final steps as I fear hosing the working install. Yes, I should have built another, but . . . joe a. -- I'd rather be happy than right any day. 
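Before running the final install steps, you can check which Perl prerequisites are actually loadable. Perl's own `use Module VERSION` fails if the module is missing or older than VERSION, so asking perl to compile that one statement is a cheap test. Here's a Python sketch wrapping it. The module list comes from this thread. The CPAN module behind "Net-addr::IP" is NetAddr::IP, and the 0.31 minimum comes from the Makefile.PL warning quoted above.

```python
import subprocess

# Modules from this thread; only Mail::DKIM had a minimum version given (0.31).
PREREQS = {"NetAddr::IP": None, "Mail::DKIM": "0.31"}

def perl_module_ok(module, min_version=None, runner=subprocess.run):
    """True if perl can `use Module [VERSION]` on this machine."""
    code = f"use {module} {min_version}; 1" if min_version else f"use {module}; 1"
    return runner(["perl", "-e", code], capture_output=True).returncode == 0

def missing_prereqs(prereqs=PREREQS, runner=subprocess.run):
    """Modules perl could not load, or that are older than required."""
    return [mod for mod, ver in prereqs.items() if not perl_module_ok(mod, ver, runner)]
```

An empty list from `missing_prereqs()` means that warning should go away. Mail::DKIM is only needed for DKIM checks, so a missing copy doesn't stop the rest of SpamAssassin from building.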
- Slartiblartfast, The Hitchhiker's Guide to the Galaxy http://www.ChaosReigns.com
Re: How can I get SA to tell me what CLAMAV found?
On 10/05, Steven W. Orr wrote: but I'd like to know which CLAMAV virus was the trigger. Is there a way to get output somewhere that tells me which signature(s) fired? Ask the clamav people? -- If you want to make an apple pie from scratch, you must first create the universe. - Carl Sagan http://www.ChaosReigns.com
Re: Try to run sa-learn
On 10/04, troxlinux wrote: Hi list , I try to run sa-learn on centos 6.3 but no work sa-learn --spam --showdots /dir/dir/domain.com.ni/spam/.spam/cur/ Try: sa-learn --spam --showdots /dir/dir/domain.com.ni/spam/.spam/ (cur/ is inside the mailbox, not part of the path to the mailbox) -- Blessed are the cracked, for they shall let in the light. http://www.ChaosReigns.com
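A maildir folder is the directory that contains cur/, new/ and tmp/, and that folder is what sa-learn wants. Here's a small Python sketch that normalizes a path the way I did above: it steps up out of cur/, new/ or tmp/ and confirms the result really is a maildir. It uses the standard mailbox module only to count messages as a sanity check. The helper names are made up for illustration.

```python
import mailbox
import os

def maildir_root(path: str) -> str:
    """Return the maildir folder to hand to sa-learn.

    If `path` points at a cur/, new/ or tmp/ subdirectory, step up one level,
    then check that the folder really has all three subdirectories.
    """
    path = os.path.normpath(path)
    if os.path.basename(path) in ("cur", "new", "tmp"):
        path = os.path.dirname(path)
    if not all(os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp")):
        raise ValueError(f"{path} does not look like a maildir folder")
    return path

def message_count(path: str) -> int:
    """Number of messages (new/ plus cur/) in the maildir folder containing `path`."""
    return len(mailbox.Maildir(maildir_root(path), create=False))
```

For the command above, `maildir_root("/dir/dir/domain.com.ni/spam/.spam/cur/")` would give back the `.spam` folder, provided it contains all three subdirectories.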
Re: SA rules matching of ipv6 addresses
Run the email through spamassassin -D received-header. That'll tell you how and if the headers got parsed. SA has certainly had bugs where it failed to parse received headers before, and IPv6 hasn't had a whole lot of use. There has also been a fair amount of work on IPv6 since the last release, so it's possible there was a bug, it got fixed, and you don't have the fix yet. On 10/02, Mabry Tyson wrote: One user complained about a false positive. When I examined the mail, there appeared to be at least two rules that didn't work as I thought they should because of a Received line in which IPv6 Link Local addresses were used. It appears that a patch was previously put in that was thought to fix these kinds of things. The sender was apparently using AA.BB.CC.DD (a Comcast address, presumably his home address). He logged into the mail system of SRI.COM (independent of our mail system) and sent his mail from within it (which is why CCC.SRI.COM is the oldest Received line). That should result in a received header clearly indicating that the connection from comcast was authenticated, and SA should notice that and use it to skip the tests on that comcast IP. It mostly sounds like this is what's missing. SRI.com not indicating the authentication in their received header in the standard way. 1. I believe that RDNS_NONE should not have fired. At the time of processing, the internal networks included 130.107/16 and 128.18/16, and cover the top 3 Receiveds. So it said RDNS_NONE for the comcast IP? Did it have a reverse DNS entry? (Also seems like it should be solved by a received header indicating authentication.) The earliest received shows a Link Local IPv6 address, which should match IP_PRIVATE in Constants.pm. All of the IPv4 addresses have reverse DNS, including the x-originating-ip. I'm not too familiar with these, but my guess is, private IPs should be skipped, and IPs before those should still be parsed / tested. 2. I believe that ALL_TRUSTED should have fired. 
The trusted networks included 130.107/16 and 128.18/16. The Link Local IPv6 address should not have affected that. x-originating-ip: [AA.BB.CC.DD] appears to be treated effectively the same as a received header. So that seems like a good reason for ALL_TRUSTED not to have fired. 4. http://spamassassin.apache.org/tests_3_3_x.html has RCVD_IN_PBL = 3.6 (Spamhaus Policy black list) RCVD_IN_SBL = 2.6 (Spamhaus Spam black list) RCVD_IN_XBL = 0.7 (Spamhaus Botnet black list) which seems backward to me. The 3.2 tests scoring seems more reasonable. Do not attempt to comprehend the depths of the mind of the re-scorer :P No, seriously: it has no concept of "this rule indicates a worse email than another rule, therefore it should have a higher score." It only knows "this score results in a better approximation of the goal of 1 false positive in 2,500 non-spams." That often leads to unexpected results, and it comes up a lot. I very recently found a case where a rule that hit more non-spam than spam got a score of something like 3, which may have been suboptimal. The Policy Black List applies to anyone using Comcast (this /14, and similarly for the /12 that includes my home IP address) as their ISP, unless they opt out http://www.spamhaus.org/pbl/query/PBL1523209 To hit all of the users that use that mail system with a 3.6 score is surely going to cause a number of false positives. That should be handled by headers indicating authentication. -- Immorality: The morality of those who are having a better time - Henry Louis Mencken http://www.ChaosReigns.com
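A rough sketch of the trusted-relay reasoning in this thread, using Python's ipaddress module. It assumes (as guessed above, not taken from SpamAssassin's code) that link-local and private hops are skipped and every remaining relay must fall inside trusted_networks. The network list mirrors the 130.107/16 and 128.18/16 example, 24.0.0.1 is an arbitrary public placeholder, and is_skippable/all_trusted are hypothetical names. It ignores X-Originating-IP, which SA treats like a Received header:

```python
import ipaddress

# trusted_networks from the example in this thread.
TRUSTED = [ipaddress.ip_network("130.107.0.0/16"), ipaddress.ip_network("128.18.0.0/16")]


def is_skippable(ip: str) -> bool:
    """Link-local (fe80::/10), private and loopback hops carry no useful origin info."""
    addr = ipaddress.ip_address(ip)
    return addr.is_link_local or addr.is_private or addr.is_loopback


def all_trusted(relay_ips) -> bool:
    """True if every non-skippable relay IP is inside a trusted network."""
    for ip in relay_ips:
        if is_skippable(ip):
            continue
        addr = ipaddress.ip_address(ip)
        # Membership between an IPv6 address and an IPv4 network is simply False.
        if not any(addr in net for net in TRUSTED):
            return False
    return True
```

With this logic, a link-local hop at the bottom of the Received chain does not prevent an ALL_TRUSTED-style result.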
Re: HTML link regex
On 09/25, John Hardin wrote: This topic comes up regularly enough that it should be a FAQ. Yeah. I haven't read this thread enough to know if it's been said, but here's a previous thread on the subject: http://spamassassin.1065346.n5.nabble.com/antiphishing-td52027i20.html And the existing rules: ruleqa.spamassassin.org/?rule=%2Fspoofed_url
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME                WHO/AGE
0      1.9104  0.4468  0.810  0.55  0.01   T_SPOOFED_URL_HOST
0      1.9456  0.5844  0.769  0.53  0.01   T_SPOOFED_URL
0      2.0437  3.6954  0.356  0.37  (n/a)  __SPOOFED_URL_HOST
0      2.0917  4.0246  0.342  0.36  (n/a)  __SPOOFED_URL
Although, as John mentioned, this wasn't targeting specific domains. If rules that you come up with do actually work for you, please submit them for inclusion in spamassassin QA, to see if they work well enough to include in future sa-updates. -- Blessed are the cracked, for they shall let in the light. http://www.ChaosReigns.com
Re: HTML link regex
On 09/27, Alexandre Boyer wrote: I met you earlier on the IRC channel, remember? Yup. Anyway, I would be glad to submit my rules (corrected by Bowie Bailey). I indeed asked how one could do that. Open a bug: https://issues.apache.org/SpamAssassin/ Include the rule(s) and request that they be added to ruleqa. Just came across an old related bug: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4372 -- Life is either a daring adventure or it is nothing at all. - Helen Keller http://www.ChaosReigns.com
Re: X-Spam-Status: No, but still marked with [SPAM]
This is pretty common, common enough that I'd appreciate it if you could share more information on the cause of your problem, and how you fixed it, once you do. Yesterday in IRC:
09:40PM <ke6i> X-Spam-Status: No, score=0.0 required=2.0 tests=FROM_MISSP_REPLYTO, FROM_MISSP_URI,TO_NO_BRKTS_FROM_MSSP autolearn=ham version=3.3.2 I'm getting mail like this marked as spam. But score = 0? Why would it mark this as spam if score is 0 and required is 2.
09:48PM <Darxus> Sounds like that header, and your [SPAM] subject modification(?) are coming from two different runs of spamassassin.
09:49PM <ke6i> interesting. Let me study this message some more.
11:51PM <ke6i> yeah something odd is going on here. I'm seeing 'spamd: processing message ' in maillog twice for each email.
There have been a bunch of times I've heard people say SpamAssassin is simultaneously marking emails as both spam and not spam. Many times it has turned out that they were somehow running SA twice on the emails. It has never turned out that SA was actually doing this in a single run. On 09/21, Cathryn Mataga wrote: I'm getting these messages, some of them real emails, that get marked with [SPAM] even though X-Spam-Status: comes up as No. I updated to the latest build on Fedora though I think this has been going on awhile. It happens with some email accounts but not others. 
From me...@ecuador.junglevision.com Thu Sep 20 17:42:50 2012
Return-Path: <me...@ecuador.junglevision.com>
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on ecuador.junglevision.com
X-Spam-Level:
X-Spam-Status: No, score=0.0 required=2.0 tests=FROM_MISSP_REPLYTO,
    FROM_MISSP_URI,TO_NO_BRKTS_FROM_MSSP autolearn=ham version=3.3.2
Received: from ecuador.junglevision.com (localhost [127.0.0.1]) by ecuador.junglevision.com (8.14.5/8.14.5) with ESMTP id q8L0go5j02679 for megans...@junglevision.com; Thu, 20 Sep 2012 17:42:50 -0700
Received: (from megan@localhost) by ecuador.junglevision.com (8.14.5/8.14.5/Submit) id q8L0goLd026789 for megans...@junglevision.com; Thu, 20 Sep 2012 17:42:50 -0700
Received: from server.cgskies.com (www.cgskies.com [85.17.169.165]) by ecuador.junglevision.com (8.14.5/8.14.5) with ESMTP id q8L0gmKk02678 for me...@junglevision.com; Thu, 20 Sep 2012 17:42:49 -0700
Received: from www.cgtextures.com (www.cgtextures.com [95.211.74.173]) by server.cgskies.com (8.14.4/8.14.4) with ESMTP id q8L0XDp8032570 for me...@junglevision.com; Fri, 21 Sep 2012 02:33:13 +0200
Received: by www.cgtextures.com (Postfix, from userid 101) id 81BF513200F0; Fri, 21 Sep 2012 03:55:56 +0200 (CEST)
To: me...@junglevision.com
Subject: [SPAM] Action Required to Activate Membership for CGTextures
From: CGTextures support <supp...@cgtextures.com>
To: me...@junglevision.com
Reply-To: CGTextures support <supp...@cgtextures.com>
Date: Fri, 21 Sep 2012 03:55:56 +0200
Message-Id: <20120921015556.81bf51320...@www.cgtextures.com>
X-Spam-Prev-Subject: Action Required to Activate Membership for CGTextures
X-UID: 170756
Status: O
X-Keywords: NonJunk
-- I always wonder why birds stay in the same place when they can fly anywhere on the earth. Then I ask myself the same question. - Harun Yahya http://www.ChaosReigns.com
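To find other messages with the same symptom (an X-Spam-Status of No next to a [SPAM] subject), a small header check can help. This is a sketch using Python's standard email parser; the function names are hypothetical, and it only detects the mismatch. It does not prove the cause, although the two runs diagnosed above are the usual explanation:

```python
from email import message_from_string


def status_is_no(msg) -> bool:
    """True if X-Spam-Status starts with 'No' (folded continuation lines are fine)."""
    return msg.get("X-Spam-Status", "").strip().lower().startswith("no")


def subject_tagged(msg, tag: str = "[SPAM]") -> bool:
    return msg.get("Subject", "").startswith(tag)


def inconsistent(raw: str) -> bool:
    """Flag a message whose spam status and subject tag disagree."""
    msg = message_from_string(raw)
    return status_is_no(msg) and subject_tagged(msg)
```

An X-Spam-Prev-Subject header, as in the example above, is another hint that some run rewrote the subject.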
Re: Exclude from RCVD_IN_DNSWL_MED
On 09/17, Noel Butler wrote: I'm sure every network running a mail server would like to assume they are 100% whitehat too. I see no reason to treat them special, just like gmail who think they are above it all, I wont include hotmail in that, as they I suppose you think you're capable of achieving a higher ratio of outgoing non-spam to spam than gmail, with anything near their number of users? -- I'd rather be happy than right any day. - Slartibartfast, The Hitchhiker's Guide to the Galaxy http://www.ChaosReigns.com
Optimizing scoring Re: Exclude from RCVD_IN_DNSWL_MED
On 09/17, Kris Deugau wrote: As an ISP mail admin, I **CANNOT** afford to block legitimate mail from any source, and if I see a report that a legitimate mail was blocked by any local rules or DNSBL data, I change the local rule or delete the offending local DNSBL entry ASAP. Sometimes I envy the data available to those of you with users. If you can get 100,000 spams and 100,000 non-spams together, you could run the SA results through the re-scorer used to generate sa-updates, and have scores fully optimized for your own users. And then you could give that data to the SA project to make it more accurate for everybody else. I still feel like there's a good opportunity along these lines for shared Bayes. -- Democracy is the theory that the common people know what they want, and deserve to get it good and hard. - H. L. Mencken http://www.ChaosReigns.com
Re: Anyone from ReturnPath want to deal with this
On 09/08, Greg Troxel wrote: Some rules seem to have the description in iclude the IP address that was looked up in the whitelist/blacklist. Others don't, and it makes it a bit hard to guess (since trusted/etc. processing is slightly tricky). So I think it would be good if all dnsbl rules listed the IP address that hit. I agree. What rules do not list the IP? I think this is something worth opening a bug for, if you can specify the rules. -- When you think of the long and gloomy history of man, you will find more hideous crimes have been committed in the name of obedience than have ever been committed in the name of rebellion. - C. P. Snow http://www.ChaosReigns.com
Re: Exclude from RCVD_IN_DNSWL_MED
On 09/10, Helmut Schneider wrote: If I understood you correctly I'd need to add all relays of MessageLabs to trusted_networks and also track any IP address changes... In theory, you need to do this for all DNSxL lookups. In practise they all resolve fine to *.messagelabs.com. I believe Matthias was trying to point out that not having your trusted_networks set correctly will mess up your use of not only DNSWL, but any other DNS based IP white *and* blacklists, which significantly contribute to the effectiveness of spamassassin. -- But do you have any idea how many SuperBalls you could buy if you actually applied yourself in the world? Probably eleven, but you should still try. - http://hyperboleandahalf.blogspot.com/ http://www.ChaosReigns.com
Re: Install a new SpamAssassin server
On 09/09, Olivier CALVANO wrote: I want change my old server with SpamAssassin. Anyone know a web site which advises the rules, modules, rbl they must necessarily have to reach a maximum rate of detection ? This may be about what you're looking for: https://wiki.apache.org/spamassassin/ImproveAccuracy Actually, i use commercial service of SpamHaus, he have other list with a best quality ? You're paying for Spamhaus because you have a high rate of traffic? I think one of the things we're really missing is documentation of what rates of traffic each service enabled in SpamAssassin by default allows. Warren did a nice job of documenting them on his site a while ago: http://www.spamtips.org/2011/01/usage-limits-of-spamassassin-network.html The Spamhaus lists are good. You can see the effectiveness of rules here: http://ruleqa.spamassassin.org/ Or filter to rules that include rcvd_in_, which covers most of those kinds of tests: http://ruleqa.spamassassin.org/?rule=%2Frcvd_in_ -- For every complex problem, there is a solution that is simple, neat, and wrong. - H. L. Mencken http://www.ChaosReigns.com
Re: High CPU utilization and performance decrease after recent sa-update.
On 09/06, Piotr Kapiszewski wrote: $sa_local_tests_only = 1 (amavis hook) SpamAssassin is wrong about three times as often without network tests. But if you're crippling the network tests as much as you mentioned, might as well use the score set which is optimized for having the network tests disabled (which this should do). -- It is the first responsibility of every citizen to question authority. - Benjamin Franklin http://www.ChaosReigns.com
Re: spam in foreign characters
SpamAssassin has an ok_locales setting that lets you specify, roughly, which languages you want to accept. But it has problems: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078 I don't believe anybody has created rules to match these kinds of spams. A big part of the problem is the lack of examples of non-English non-spam to verify the rules don't hit them. So you should probably try using ok_locales, and if it doesn't work, create your own rules to match these spams, if you can find good common patterns that don't seem likely to match non-spams (or match all Chinese email, if that's what you want). And please share what works. ok_locales is documented in the Mail::SpamAssassin::Conf man page, which can also be found here: http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html Hmm, ok_locales may actually work on Chinese; I don't see examples of problems with that language. On 08/21, Adam Moffett wrote: I have a user who seems to get 4-5 messages per day with Chinese characters for the subject and body. They come from a variety of domains and IP's so I guess she somehow got onto a list used to spam Chinese speaking people. If I paste them into Google Translate they seem to be roughly the same kind of junk as our English spam: work from home, buy our drugs, etc. The handful that I looked at closely had scores of 2.0-3.0. Are there existing SpamAssassin rules that work on non english characters? Is there maybe something extra I should enable or install that would score these higher? I'm sorry if it's an ignorant question, but the issue hasn't really come up here before. Thanks. -- There never has been an answer. There never will be an answer. That's the answer. - Gertrude Stein http://www.ChaosReigns.com
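Before writing your own rules, it can help to measure how much of a message is actually Chinese. This is a sketch of a simple heuristic, not how ok_locales works: it counts letters whose Unicode name starts with "CJK UNIFIED IDEOGRAPH". The name cjk_ratio and any threshold you apply to its result are your own choices:

```python
import unicodedata


def cjk_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are CJK unified ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters
              if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"))
    return cjk / len(letters)
```

Running it over a folder of the user's known spam and known non-spam would show whether a cutoff separates them before you turn it into a rule.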
Re: Bogus authorize.net statements
On 08/15, Jim Schueler wrote: the attached. All share a common marker of embedding a text url within an HTML <a> tag containing a different URL. This seems like an obvious marker for spam, I wonder why there isn't a rule for it. There is a rule. It hits 10x as much non-spam as spam: ruleqa.spamassassin.org/?rule=%2Fspoofed_url There was some work on improving it: http://osdir.com/ml/users-spamassassin/2011-10/msg00237.html It didn't work out: http://osdir.com/ml/users-spamassassin/2011-10/msg00304.html Feel free to try to do better. -- Just because you're offended, doesn't mean you're right. - Ricky Gervais http://www.ChaosReigns.com
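To see why this is harder than it looks, here is a naive version of the check described above: link text that looks like a URL but points at a different host than the href. This is a sketch using Python's html.parser and urllib.parse, not SpamAssassin's __SPOOFED_URL implementation; the class and function names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only text inside an open <a>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url: str) -> str:
    if "://" not in url:
        url = "http://" + url  # treat bare "www.example.com" as a URL
    return (urlparse(url).hostname or "").lower()


def spoofed_links(html: str):
    """Return (text, href) pairs where URL-like link text names a different host."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.anchors:
        if text.lower().startswith(("http://", "https://", "www.")):
            if host_of(text) != host_of(href):
                hits.append((text, href))
    return hits
```

An exact host comparison like this also flags legitimate newsletters that route every link through a click-tracking domain, which is one plausible reason such a rule can hit lots of non-spam.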
Re: Received header syntax
On 08/15, Ori Bani wrote: I tried to intentionally make a terribly wrong Received to see if SA would give me a rule hit but it did not. Is there a rule for this? If so, how can I turn it on and off? I don't think there is actually a rule for unparsable headers. I think it effectively just ignores received headers it can't parse. So just run one of your outgoing emails through spamassassin -D and look for lines like: Aug 15 15:17:33.625 [23043] dbg: received-header: parsed as [ ip=140.211.11.3 rdns=hermes.apache.org helo=mail.apache.org by=panic.chaosreigns.com ident= envfrom= intl=0 id=C6F0CCD227 auth= msa=0 ] To make sure it has parsed successfully. Is there a place I can test only this rule? No. -- I always wonder why birds stay in the same place when they can fly anywhere on the earth. Then I ask myself the same question. - Harun Yahya http://www.ChaosReigns.com
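If you check many messages, pulling the fields out of those debug lines is easier than reading them by eye. A minimal sketch, assuming the key=value layout shown in the single example above (other SA versions may print different fields); parse_received_dbg is a hypothetical name:

```python
import re

# key=value pairs; a value may be empty, as with "ident=" above.
FIELD = re.compile(r"(\w+)=(\S*)")


def parse_received_dbg(line: str):
    """Turn a 'received-header: parsed as [ ... ]' debug line into a dict, else None."""
    m = re.search(r"received-header: parsed as \[(.*)\]", line)
    if not m:
        return None
    return dict(FIELD.findall(m.group(1)))
```

A Received header with no matching "parsed as" line, or with an empty ip=, is a candidate for one SA failed to parse.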
Re: RDNS_NONE
On 08/15, Matt wrote: I have messages marked as such: RDNS_NONE Delivered to internal network by a host with no rDNS Problem is they very clearly have reverse and matching forward DNS that Exim even agrees on. Why is SA tagging them as such? I wonder how much this is related to the other post I just made. Exim is notorious for allowing people to modify their Received headers in a way that doesn't comply with anything. Are they in headers SA is failing to parse? Run it through spamassassin -D. -- Safe is anywhere a hungry person can't walk in three days. - John Titor http://www.ChaosReigns.com
Re: RCVD_IN_DNSWL_BLOCKED
On 08/13, JP Kelly wrote: How can I disable the DNSWL rule/plugin or whatever. Not just give it a low/zero score but disable it completely. I am tired of seeing RCVD_IN_DNSWL_BLOCKED in my headers. The description for RCVD_IN_DNSWL_BLOCKED is "The query to DNSWL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information." Have you looked at that link? Are you running a local, non-forwarding, caching DNS server? Instructions for disabling these rules, as you asked, are immediately below the question linked to on that page. However, unless you are actually running a site with quite a lot of email (over 100,000 queries per day), there is probably a better solution. I have some association with dnswl.org. On 08/14, Bowie Bailey wrote: If you want to disable the DNSWL lookup completely, you should zero out the main rules and the sub-rule:
score RCVD_IN_DNSWL_BLOCKED 0
score RCVD_IN_DNSWL_HI 0
score RCVD_IN_DNSWL_LOW 0
score RCVD_IN_DNSWL_MED 0
score RCVD_IN_DNSWL_NONE 0
I believe all of the above are unnecessary.
score __RCVD_IN_DNSWL 0
And this alone is adequate. I attempted to add it to http://wiki.apache.org/spamassassin/DnsBlocklists but the site has become unresponsive. -- Hermes will help you get your wagon unstuck, but only if you push on it. - Greek Alphabet Oracle http://www.ChaosReigns.com
Re: RCVD_IN_DNSWL_BLOCKED
On 08/14, Jon-Paul Kelly wrote: Are you running a local non-forwarding, caching DNS server? I have a Plesk installation and am using the DNS server as provided by Plesk. The nameservers are ns1.smallgod.net, ns2.smallgod.net If the smallgod.net name servers are provided by Plesk, and not your own, then you are using forwarders. That would be a problem, because DNSWL would count the queries from everybody using those DNS servers, not just yours. As Bowie mentioned, this is explained here: http://wiki.apache.org/spamassassin/CachingNameserver Which is linked from http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block (the link in the RCVD_IN_DNSWL_BLOCKED description). I am not sure if I have 100,000+ queries per day. I guess it is possible. The server has 270 domains and they all use the same name server. Is there a way to check with dnswl.org the number of queries and where they are coming from? A Google search for "dnswl contact" has a useful first hit :) But it sounds like your problem is using forwarders. -- We will be dead soon. Is this how we want to live? http://www.ChaosReigns.com
Re: HEADS UP: DBSL.org is returning positive replies
On 08/10, Brent Gardner wrote: As of today, dsbl.org is returning positive replies Is this enough to keep it from being used? meta RCVD_IN_DSBL (0) Not necessary; this blacklist is not used in SpamAssassin because it has been dead for years. I believe the warning was posted primarily for people who were using this BL in their MTA (mail server software), or possibly running ancient versions of SA (before 3.3.x), which haven't been getting updates for years and which you shouldn't be running anyway. -- It is better to die on your feet than to live on your knees. - Emiliano Zapata, Mexican Revolution Leader http://www.ChaosReigns.com
Re: HEADS UP: DBSL.org is returning positive replies
For completeness: http://wiki.apache.org/spamassassin/Rules/RCVD_IN_DSBL For the last three years this page has mentioned this rule is gone because dsbl.org is gone. The bug where it was removed from SA, four years ago: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5988 The thing to look for and remove is stuff in your MTA configuration, which I had years ago, in my postfix main.cf: reject_rbl_client list.dsbl.org -- If you believe everything you read, better not read. - Japanese Proverb http://www.ChaosReigns.com
Re: Spamassassin and SPF records with +all
On 07/11, Josef Karliak wrote: within a few days we've spams from domains that has +all in the TXT spf record. I was thinking that I'll make a plugin that check this records and add some point to this email, but I do not know Your best chance may be to open a SpamAssassin bug requesting it. I'd guess it wouldn't be too hard to add to the existing SPF plugin. The more information you can provide showing this happens with spam, and does not tend to happen with non-spam, the better. It would get run through the re-scoring process with a testing flag to determine whether it's actually useful, and what the optimal score is, before being published via sa-update. Well, you'd also need to update your SPF plugin to be able to use it. "v=spf1 +all: The domain owner thinks that SPF is useless and/or doesn't care." (http://www.openspf.org/SPF_Record_Syntax) That's a *really* unprofessional way to say "Everything in this domain passes SPF." Huh, the SpamAssassin SPF plugin uses Mail::SPF, and... I'm not sure it's possible to get a copy of the SPF record to check whether it contains +all. Does anybody else see how? On 07/11, Martin Gregorie wrote: All SPF can do is check that the sender has a valid IP for that domain, i.e. that the sender's domain wasn't forged. SPF cannot and should not be used to flag mail as spam if the sender is a legitimate member of the Yeah, but there are lots of perfectly valid things that show up in emails that correlate usefully with spam and which, in combination, help determine which emails are spam and which are not. If adding 0.2 points to all emails from a domain with +all in its SPF record increases the spam caught without significantly increasing false positives, it could be worth doing. -- You will need: a big heavy rock, something with a bit of a swing to it... perhaps Mars - How to destroy the Earth http://www.ChaosReigns.com
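Assuming you can get the TXT record some other way (whether Mail::SPF exposes it is the open question above), the check itself is small. This is a sketch in Python, not a SpamAssassin plugin. It relies on the SPF rule that a mechanism with no qualifier defaults to "+", and that mechanisms after "all" are never evaluated. It does not follow include: or redirect=. The name spf_allows_everything is hypothetical:

```python
def spf_allows_everything(record: str) -> bool:
    """True if the record's 'all' mechanism passes everything ('+all' or bare 'all')."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF record
    for term in terms[1:]:
        t = term.lower()
        if t in ("all", "+all"):
            return True
        if t in ("-all", "~all", "?all"):
            return False  # evaluation stops at the first 'all'
    return False
```

As suggested above, a hit would be a small scoring signal to test in mass-checks, not a reason to block.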
Re: Suddenly getting lots of false positives.
On 05/24, corpus.defero wrote: I'm not 100% but isn't http://www.dnswl.org/ a 'DIY' whitelisting site that anyone can kind of abuse? No. I'm a (basically inactive) dnswl.org admin. Anybody can request to be added to the list, but all changes get looked over pretty thoroughly by a human, using lots of available data. The rule is tucked away in 72_active.cf, along with the other 'pay to spam' whitelists from the likes of Return Path. I suggest you add this Listing on dnswl.org does not involve payment, it is not a 'pay to spam' whitelist. -- You will need: a big heavy rock, something with a bit of a swing to it... perhaps Mars - How to destroy the Earth http://www.ChaosReigns.com
Re: Suddenly getting lots of false positives.
On 05/24, Jeremy Morton wrote: -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/, medium trust [59.94.13.26 listed in list.dnswl.org] I don't think this was ever actually listed by dnswl.org. I have archives back to last June, which don't show it, and in the dnswl.org admin interface, when a listing is removed it is generally deactivated, not deleted, and there is nothing there. That leaves interesting possibilities. I'd start by running this email through spamassassin again to see if it repeatably says this IP is listed by dnswl. SpamAssassin could be doing something wrong, or a DNS server somewhere could be doing something wrong. And it might be useful to provide more examples; just the IPs might be best. And generally we prefer that you provide spams via pastebin instead of including them in emails to this list. -- For gasoline vapor, the explosive range is from 1.3 to 6.0% vapor to air...useful against soft targets such as...armored vehicles...and bunkers. - http://www.fas.org/man/dod-101/sys/dumb/fae.htm http://www.ChaosReigns.com
Re: Suddenly getting lots of false positives.
On 05/24, Benny Pedersen wrote: reject spf_softfail in mta, or report to http://www.dnswl.org/ SPF_SOFTFAIL kind of sucks: http://ruleqa.spamassassin.org/?daterev=20120519-r1340375-n&rule=%2Fspf
MSECS  SPAM%   HAM%     S/O    RANK  SCORE  NAME          WHO/AGE
0      3.2640  27.9430  0.105  0.67  0.00   SPF_PASS
0      6.3320   0.6518  0.907  0.58  0.00   SPF_SOFTFAIL
0      4.0263   1.1272  0.781  0.50  0.00   SPF_NEUTRAL
0      0        0       0.500  0.50  0.00   SPF_NONE
0      1.7415   1.6254  0.517  0.39  0.00   SPF_FAIL
SPF_SOFTFAIL hits 6.3% of spam and 0.7% of ham, which is a pretty terrible ratio, which gives it a rank of 0.58, where 1 is best (RCVD_IN_DNSWL_HI, in fact), and 0 is worst. A rank of 0.58 sucks. Therefore rejecting on it at your MTA is a bad idea. But it's your MTA. I've done lots of things with my MTA on purpose that were a bad idea. (why did thay list a dynamic ip ?) I don't think they did. if sender is legit why is it softfailing ? Generally because people configure their SPF records badly. SOFTFAIL *means* the sending domain isn't certain they have all their legit sending IPs listed. So based on the protocol it's also inappropriate to use for absolute blocking. (In addition to the real world statistics above.) It's unfortunate. -- Wash daily from nose-tip to tail-tip; drink deeply, but never too deep; And remember the night is for hunting, and forget not the day is for sleep. - The Law of the Jungle, Rudyard Kipling http://www.ChaosReigns.com
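The S/O column in these ruleqa tables is the share of a rule's hits that are spam, computed from the two percentages. The formula below is inferred from the numbers (it reproduces every row, including 0.500 when a rule never hits) rather than taken from ruleqa's source. RANK depends on more than this and is not reproduced. The name s_over_o is hypothetical:

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """spam% / (spam% + ham%); 0.5 when the rule hit nothing at all."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.5
```

For SPF_SOFTFAIL, 6.3320 / (6.3320 + 0.6518) is about 0.907, so roughly 9 out of 10 of its hits are spam. At the volumes an MTA handles, the remaining 1 in 10 is still a lot of rejected legitimate mail.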
Re: Suddenly getting lots of false positives.
On 05/24, Kevin A. McGrail wrote: Normally, I blame a DNS server. See pages like this for more information: http://www.surbl.org/faqs#dnsproxy Yup, that could do it. Icky. Jeremy: You could manually check if you're getting the wrong DNS results by running:
$ host 26.13.94.59.list.dnswl.org
Host 26.13.94.59.list.dnswl.org not found: 3(NXDOMAIN)
(IP address reversed, then .list.dnswl.org.) If an IP address is listed (as that one should not be), you'll see something like:
$ host 40.152.71.64.list.dnswl.org
40.152.71.64.list.dnswl.org has address 127.0.6.3
Darxus, you wrote a good wiki about using other DNS servers, etc. somewhere I thought about but I can't find it. I did? Are you thinking of https://wiki.apache.org/spamassassin/CachingNameserver ? I didn't write it. In general, I recommend running your own caching nameserver. Yup. -- Safe is anywhere a hungry person can't walk in three days. - John Titor http://www.ChaosReigns.com
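Building the lookup name by hand is easy to get wrong, so here is a small sketch of the "IP address reversed, then .list.dnswl.org" rule. The trust-level decoding is an assumption: it presumes the last octet of a 127.0.x.y answer maps to none/low/med/hi, mirroring the RCVD_IN_DNSWL_NONE/LOW/MED/HI rule names. Check dnswl.org's own documentation before relying on it. Both function names are hypothetical:

```python
import ipaddress

# ASSUMED mapping of the last answer octet to a trust level (see lead-in).
TRUST_LEVELS = {0: "none", 1: "low", 2: "med", 3: "hi"}


def dnswl_query_name(ip: str, zone: str = "list.dnswl.org") -> str:
    """Reverse the IPv4 octets and append the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def dnswl_trust(answer: str) -> str:
    """Decode a 127.0.x.y answer into a trust level (assumed scheme)."""
    octets = str(ipaddress.IPv4Address(answer)).split(".")
    if octets[:2] != ["127", "0"]:
        raise ValueError("not a 127.0.x.y DNSWL answer: " + answer)
    return TRUST_LEVELS.get(int(octets[3]), "unknown")
```

To do the actual lookup from Python, socket.gethostbyname(dnswl_query_name(ip)) returns the answer address, and an NXDOMAIN reply raises socket.gaierror.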
Re: __DRUG_MUSCLE1 false-positives
On 05/18, Jason Haar wrote: A bit OT, but is it because your perl is running under C locale instead of se? i.e. would the word boundary definition change under different localization contexts? Doesn't help solve the problem for you, but it certainly flags a potential issue with a tonne of the rules in SA... Locale handling is a known problem in SA: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062 -- Life is either a daring adventure or it is nothing at all. - Helen Keller http://www.ChaosReigns.com
Re: use_bayes=0 completly disables report function
On 04/20, Marcin Mirosław wrote: Hello, i've notice when i set use_bayes 0 then spamc -C report stops to work. I've got in log: spamd: Can't call method learn on an undefined value Try setting: bayes_learn_during_report 0 -- Safe is anywhere a hungry person can't walk in three days. - John Titor http://www.ChaosReigns.com
Re: updates
On 04/12, John Hardin wrote: Can you remind me how far below the threshold we are for corpora? If I hand qualify another couple of thousand hams or so would that be significant? Or is our deficit significantly larger than that? The current corpora are ham=50658, spam=245341. I don't remember what the thresholds currently are, but the numbers used in the past have been a multiple of 50k, so 100k, 150k, 200k or 250k. Darxus, you're more in tune with this than I am, what are the current thresholds? Thresholds for both are 150k. Graph here, updated weekly: http://www.chaosreigns.com/dnswl/tot.svg According to that, we're at 29003 spams. That matches the latest net run, which it's based on: http://ruleqa.spamassassin.org/20120407-r1310705-n So as of Saturday, we're at 19.3% of the spam corpora we need. Spam age limit is 2 months. The dev list gets an alert every day (from me) if updates haven't been generated. It says: "SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25." It's pretty obnoxious, but I think it's a big enough problem to justify it being posted once a day (and I'm apparently not the only one). New contributors aren't currently allowed due to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6694 which has restricted visibility due to being a security bug. For the past 69 days, it has been waiting for a reply from Warren Togami to okay declaring it not actually a security problem (which I am in favor of). It seems this requires another member of the PMC (project management committee) to step in and declare this not a security bug, or someone with sufficient access to otherwise fix it, which I suspect is a very small set of people. Once that's cleared up, new people would be able to contribute data (just logs of rule hits, not actual email) via https://wiki.apache.org/spamassassin/NightlyMassCheck -- Go forth, and be excellent to one another. - http://www.jhuger.com/fredski.php http://www.ChaosReigns.com
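The 19.3% figure is simple arithmetic, but it is worth showing how far the deficit is from "another couple of thousand". This sketch assumes a 150,000-message spam threshold, the value implied by 29,003 spams being 19.3% of what is needed; corpus_progress is a hypothetical name:

```python
def corpus_progress(count: int, threshold: int):
    """Return (percent of threshold reached, messages still needed)."""
    percent = 100.0 * count / threshold
    remaining = max(threshold - count, 0)
    return percent, remaining
```

With corpus_progress(29003, 150000), the spam corpus is at about 19.3% and still about 121,000 messages short, far more than a couple of thousand hand-qualified messages would cover.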
Re: updates
On 04/12, joea wrote: SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25. From this, should I conclude there will be no updates to earlier versions (3.2.x for instance) ? Must I upgrade in order to update? No. I thought it was overly verbose to quote it in full; it actually says: SpamAssassin version 3.3.0 has not had a rule update since 2012-02-25. SpamAssassin version 3.3.1 has not had a rule update since 2012-02-25. SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25. All 3.3.x versions use the same rules. -- But do you have any idea how many SuperBalls you could buy if you actually applied yourself in the world? Probably eleven, but you should still try. - http://hyperboleandahalf.blogspot.com/ http://www.ChaosReigns.com
Re: sought is failing with sa-compile
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6649 -- Will I ever learn? I hope not, I'm having too much fun. - Brent Minime Avis, motorcycle.com http://www.ChaosReigns.com
Re: OT how to bypass public nameservers as bind forwarders?
On 03/21, Jari Fredriksson wrote: 0.0 RCVD_IN_DNSWL_BLOCKED RBL: ADMINISTRATOR NOTICE: The query to DNSWL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. This is plenty on topic. I tried to update the contents of that wiki link with the useful answers from this thread. Everybody should feel free to further improve the wiki, just create an account, and email d...@spamassassin.apache.org to request write access. -- He who dies with the most toys... still dies. - No Fear http://www.ChaosReigns.com
Re: Blocking frequent botnet pattern
On 03/13, Alex wrote: http://pastebin.com/raw.php?i=iquXBnH0 While I could create a rule to block this specific domain, or submit it to a RBL, I'd appreciate any ideas how to more generally block them, rather than by one characteristic in the message. We need more examples. Maybe this is addressed in v3.4? Unlikely. -- As humans, we are taught to forget that we are animals. - forward to Johnny The Homicidal Maniac http://www.ChaosReigns.com
Automatic rule generation Re: Better phish detection
The software used to generate the sought rules, or perhaps an old version of it, is in the spamassassin source tree. You can feed it a folder of known non-spams, and a folder of known spams, and it'll auto-generate rules that hit the spams but not the non-spams. Ah, I documented it some here: https://wiki.apache.org/spamassassin/WritingRules
svn checkout http://svn.apache.org/repos/asf/spamassassin/trunk
cd trunk/masses/rule-dev
./seek-phrases-in-corpus ham:dir:~/Maildir/ spam:dir:~/Maildir/.bad.spam-missed/
On 03/10, sporkman wrote: Hello, We are getting a fair amount of very targetted phish attempts to our userbase. Since we are relatively small, I don't think any of the URIBLs really help (or phishtank or other lists) since we're not a large bank or paypal or anything like that. I did see some gentleman make a rather valiant attempt at listing all the common phrases here: http://old.nabble.com/introducing-body-J_MAILBOX_FULL-tc33207944.html#a33213220 It has a number of errors, and obviously that's not very efficient (I suck at regexs, but I know enough to know that list can be collapsed). Any pointers to a good starting point to take a list like that and make it usable? The phrasing on these is always very similar - stuff about upgrading your webmail account, etc. We're running qmail/vpopmail, and our upgrade to postfix to at least front-end qmail is still a ways off. I think with postfix we could probably catch a bunch of this garbage at the front door. So for now, our only real tool to fight this is SA. I assume we're not the only ones seeing this mess, what are others doing to counteract this? Thanks, Charles -- View this message in context: http://old.nabble.com/Better-phish-detection-tp33478328p33478328.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com. -- If you are not paranoid... you may not be paying attention. - j...@creative-net.net, on an IDPA mailing list http://www.ChaosReigns.com
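To give a feel for the idea (phrases common to the spams that never appear in the non-spams), here is a toy version in Python. This is only an illustration under that simplified assumption. It is not the algorithm seek-phrases-in-corpus actually uses, and the function names and word-trigram choice are hypothetical:

```python
from collections import Counter


def word_ngrams(text: str, n: int = 3) -> set:
    """Lowercased word n-grams in a message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def seek_phrases(spams, hams, n: int = 3, min_spam_hits: int = 2):
    """Phrases seen in at least `min_spam_hits` spams and in no ham."""
    ham_phrases = set()
    for h in hams:
        ham_phrases |= word_ngrams(h, n)
    counts = Counter()
    for s in spams:
        counts.update(word_ngrams(s, n) - ham_phrases)  # count each phrase once per spam
    return sorted(p for p, c in counts.items() if c >= min_spam_hits)
```

Each surviving phrase could become a candidate body rule, which you would still want to mass-check before trusting it.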
Re: Sought rules alive?
On 03/07, Andrea gabellini - SC wrote:
I noticed that the sought rules have not been updated for many weeks. Is the project alive?

There was no mention of intentionally killing it off, so my guess is it accidentally broke and wasn't noticed. It hasn't been updated since 2012-01-02, and is supposed to update multiple times a day.

This came up on the dev list two months ago; unfortunately it was in the form of "Can somebody verify this isn't just broken for me?", not "Hey, sought is broken.": http://old.nabble.com/sought-sa-update-channel---SA-3.4.-trunk-td33164814.html

--
You will need: a big heavy rock, something with a bit of a swing to it... perhaps Mars
- How to destroy the Earth
http://www.ChaosReigns.com
Re: White text on white background
Bug with patches to fix this: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6760

On 02/17, dar...@chaosreigns.com wrote:
Looks like this fixes it:

$ diff ./lib/Mail/SpamAssassin/HTML.pm /usr/share/perl5/Mail/SpamAssassin/HTML.pm
952a953,956
>   # Handle 3 character color shorthand.
>   if (length($color) == 3) {
>     $color =~ s/(.)(.)(.)/$1$1$2$2$3$3/;
>   }

Opening a bug to apply it.

On 02/17, dar...@chaosreigns.com wrote:
Confirmed. #999 is getting converted to #090909, when it should be getting converted to #999999. (Threw a print statement into the top of html_font_invisible().)

On 02/17, dar...@chaosreigns.com wrote:
You should open a bug. SpamAssassin attempts to catch these via html_font_invisible() in HTML.pm (should hit rule HTML_FONT_LOW_CONTRAST). My guess is that it's failing to handle the short form of color values (FFF instead of FFFFFF). Looks like they should be converted like 123 -> 112233.

Report bugs here: https://issues.apache.org/SpamAssassin/

On 02/16, JP Kelly wrote:
I am getting a bunch of spam with white text on a white background. Any ideas how to catch it? Here is an example:

<body bgcolor=#FFFFFF leftmargin=0 topmargin=0 marginwidth=0 marginheight=0>
<p style=color:#FFF; font-size:1px; width:600px;>

--
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com

--
It is the first responsibility of every citizen to question authority.
- Benjamin Franklin
http://www.ChaosReigns.com

--
I don't want to die... just yet... not while there's... women.
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com

--
Force, my friends, is violence; the supreme authority from which all other authority is derived.
- Michael Ironside, Starship Troopers
http://www.ChaosReigns.com
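For anyone who wants to see the shorthand expansion on its own, here is a small Python sketch. It is not SpamAssassin code, and `expand_color` / `same_color` are hypothetical helpers. It applies the same substitution as the Perl patch in the thread, s/(.)(.)(.)/$1$1$2$2$3$3/, so that FFF becomes FFFFFF and 123 becomes 112233 before two colors are compared.

```python
import re


def expand_color(color: str) -> str:
    """Normalize an HTML hex color to '#RRGGBB' (uppercase).

    A 3-digit shorthand such as '#FFF' doubles each digit, the same
    substitution as the Perl fix s/(.)(.)(.)/$1$1$2$2$3$3/.
    Other lengths are passed through unchanged apart from case.
    """
    value = color.strip().lstrip("#")
    if len(value) == 3:
        value = re.sub(r"(.)(.)(.)", r"\1\1\2\2\3\3", value)
    return "#" + value.upper()


def same_color(a: str, b: str) -> bool:
    """True if two hex colors are identical once shorthand is expanded."""
    return expand_color(a) == expand_color(b)


if __name__ == "__main__":
    # White text (#FFF) on a white background (#FFFFFF) should compare equal.
    print(same_color("#FFF", "#FFFFFF"))
    print(expand_color("#999"))
```

Equality is only the simplest case. SpamAssassin's real check is about low contrast, not just identical colors, so this sketch covers only the normalization step.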
Re: SPF and DKIM tests by default?
On 02/10, email builder wrote:
I believe for SPF you *should* be doing the detecting at your MTA (mail server software) and inserting a header for spamassassin to use: Received-SPF. (Because SPF is supposed to use the envelope from, which is not necessarily included in a header.)

I see. That makes sense. Is there a wiki page suggesting solutions for this? Anyone know of tips for doing this in postfix? Or during amavis processing?

I use postfix-policyd-spf-perl, which currently appears to be officially hosted at: https://launchpad.net/postfix-policyd-spf-perl/

--
For gasoline vapor, the explosive range is from 1.3 to 6.0% vapor to air...useful against soft targets such as...armored vehicles...and bunkers.
- http://www.fas.org/man/dod-101/sys/dumb/fae.htm
http://www.ChaosReigns.com
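To show why the MTA has to do this: the envelope sender that SPF checks is only visible to SpamAssassin if something upstream writes it into a header. The Python sketch below reads a Received-SPF header of the general shape described in RFC 7208, section 9.1. The result comes first, followed by key=value pairs such as envelope-from. The message and header values are made up, and `spf_summary` is a hypothetical helper, not part of SpamAssassin or postfix-policyd-spf-perl. The exact header text your policy daemon writes may differ.

```python
from __future__ import annotations

import email
import re

# Made-up message: the envelope sender (envelope-from) differs from the From: header.
RAW_MESSAGE = (
    "Received-SPF: pass (mybox.example.org: domain of bounce@example.com "
    "designates 192.0.2.1 as permitted sender) receiver=mybox.example.org; "
    "client-ip=192.0.2.1; envelope-from=\"bounce@example.com\"; helo=mail.example.com;\n"
    "From: Someone <someone@example.net>\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)


def spf_summary(raw: str) -> tuple[str | None, str | None]:
    """Return (SPF result, envelope-from) from a Received-SPF header, or (None, None)."""
    msg = email.message_from_string(raw)
    header = msg.get("Received-SPF")
    if header is None:
        return None, None
    result = header.split(None, 1)[0].lower()  # first token is the result, e.g. "pass"
    match = re.search(r'envelope-from="?([^";\s]+)"?', header)
    return result, (match.group(1) if match else None)


if __name__ == "__main__":
    result, envelope_from = spf_summary(RAW_MESSAGE)
    msg = email.message_from_string(RAW_MESSAGE)
    print(result, envelope_from, "vs From:", msg["From"])
```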
Re: SPF and DKIM tests by default?
On 02/08, email builder wrote:
Hello, I have a server where I never customized any of the SA rules/tests (SA v.3.3.1). The server does run sa-update every day. Is this the right place to look to know what tests the server should be running? https://spamassassin.apache.org/tests_3_0_x.html

At the top of that page, it says "Tests Performed: v3.0.x", which is not the version you are running. https://spamassassin.apache.org/tests_3_3_x.html contains tests for 3.3. I don't know when they get updated, maybe only when 3.3.0 was released. I wouldn't trust it much.

Run:

sa-update -D 2>&1 | grep DIR

That will output something like:

Feb 9 12:08:49.609 [20855] dbg: generic: Perl 5.010001, PREFIX=/usr, DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, LOCAL_STATE_DIR=/var/lib/spamassassin

On this system, sa-update downloads rules to /var/lib/spamassassin, so I guess you're looking for the LOCAL_STATE_DIR. That directory will contain a directory related to your SA version, something like 3.003001, which will contain updates_spamassassin_org, which will contain the files defining all the rules. That doesn't necessarily tell you which rules are enabled by default, though. Some require configuration changes.

I believe for SPF you *should* be doing the detecting at your MTA (mail server software) and inserting a header for spamassassin to use: Received-SPF. (Because SPF is supposed to use the envelope from, which is not necessarily included in a header.)

From that page, it seems that SPF checks are normal but DKIM is not. Is this right? Contrary to that, this page suggests that DKIM tests are enabled by default in version 3.3: https://wiki.apache.org/spamassassin/Plugin/DKIM

I don't have anything in my /etc/spamassassin/local.cf related to DKIM, and I'm getting DKIM rule hits, so I agree that DKIM is enabled by default (although I'm running trunk / v3.4.0, which is unreleased).
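If you want to pull those directories out of the debug output programmatically, for example to compare what sa-update and spamassassin each report, here is a minimal Python sketch. It assumes the comma-separated KEY=value layout of the example line above. `parse_dirs` and `same_state_dir` are hypothetical helpers, and SpamAssassin's debug output is not a stable, documented interface.

```python
from __future__ import annotations

import re

# Example line copied from the sa-update -D output quoted above.
SAMPLE = ("Feb 9 12:08:49.609 [20855] dbg: generic: Perl 5.010001, PREFIX=/usr, "
          "DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, "
          "LOCAL_STATE_DIR=/var/lib/spamassassin")


def parse_dirs(debug_output: str) -> dict[str, str]:
    """Collect KEY=value pairs (PREFIX, *_DIR) from 'dbg: generic:' lines."""
    found: dict[str, str] = {}
    for line in debug_output.splitlines():
        if "dbg: generic:" in line:
            # Keys are uppercase words; values run up to the next comma or space.
            found.update(re.findall(r"\b([A-Z_]+)=([^,\s]+)", line))
    return found


def same_state_dir(sa_update_output: str, spamassassin_output: str) -> bool:
    """True if both tools report the same (non-missing) LOCAL_STATE_DIR."""
    a = parse_dirs(sa_update_output).get("LOCAL_STATE_DIR")
    b = parse_dirs(spamassassin_output).get("LOCAL_STATE_DIR")
    return a is not None and a == b


if __name__ == "__main__":
    print(parse_dirs(SAMPLE)["LOCAL_STATE_DIR"])
```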
I believe SPF tests are also enabled by default, but won't do quite the right thing unless you're inserting the Received-SPF header at your MTA.

Also, where can I look to verify the tests/rules currently in place on the server? (per-user rules are not implemented) I looked in /usr/share/spamassassin and there are a few files with spf and dkim in their names. Does that mean those tests are active?

Using the official Debian / Ubuntu packages, that directory contains the rules installed by the spamassassin package, which are only used if you do not run sa-update. Which would obviously be sub-optimal.

ls *spf*
-rw-r--r-- 1 root root 3100 Mar 15 2010 25_spf.cf
-rw-r--r-- 1 root root 3584 Mar 15 2010 60_whitelist_spf.cf
ls *dkim*
-rw-r--r-- 1 root root 4407 Mar 15 2010 25_dkim.cf
-rw-r--r-- 1 root root 9288 Mar 15 2010 60_adsp_override_dkim.cf
-rw-r--r-- 1 root root 6455 Mar 15 2010 60_whitelist_dkim.cf

Those are related, although their presence doesn't indicate anything about defaults.

None of the SPF or DKIM rules are particularly highly ranked in spamassassin rule QA, so I wouldn't actually expect significant improvements in accuracy from it: http://ruleqa.spamassassin.org/?daterev=20120204 They both have some substantial flaws.

--
Every man, woman and child on the face of this earth is at the mercy of chaos.
- a maxwell smart movie
http://www.ChaosReigns.com
Re: ham marked as spam: bogus IP in report
On 01/23, Toni Mueller wrote:
On Mon, Jan 23, 2012 at 11:59:43AM -0500, Kevin A. McGrail wrote:
Am I looking at a bug in SA? And/Or, how do I debug this, please?
Baffling. Checking your maillogs, you don't see that IP anywhere?
I do see this IP number several times, but it tried to send a completely different email to someone else on my server.

I was just about to ask if it might be showing up in other emails, afraid it might be related to: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6617 "FreeMail rule description shows emails from previous messages". Ick.

--
Begin at the beginning and go on till you come to the end; then stop.
- Lewis Carroll, Alice in Wonderland
http://www.ChaosReigns.com
Re: update channel list
On 01/18, Micah Anderson wrote:
updates.spamassassin.org
sought.rules.yerp.org
khop-bl.sa.khopesh.com
khop-blessed.sa.khopesh.com
khop-general.sa.khopesh.com
khop-sc-neighbors.sa.khopesh.com
but I suspect that some of these are no longer good. I was hoping folks out there might be able to make some suggestions for improvements?

All of those are currently listed by Adam Katz on http://khopesh.com/wiki/Anti-spam, and I expect that list to be up to date. He's an active spamassassin developer. That page also lists 90_2tld.cf.sare.sa-update.dostech.net.

I doubt there are any others worth using. If there are, they should probably get added to http://wiki.apache.org/spamassassin/CustomRulesets If there were more useful sa-update channels, I'd recommend breaking that page up a little more to put the rule sets with update channels at the top.

If you're looking to improve SA accuracy in general, I've tried to make a thorough checklist here: http://wiki.apache.org/spamassassin/ImproveAccuracy

--
You only truly own what you can carry at a dead run.
- 14th-15th century Landsknechts
http://www.ChaosReigns.com
Re: sa-update channel list
On 01/12, jida...@jidanni.org wrote:
MS == Michael Scheidell michael.scheid...@secnap.com writes:
MS #1 priority: keep your version of sa updated
Hmmm, taking a look at it, I find the last update was about 2011/10/24. Too bad sa-update -D doesn't spit out the date.

I don't remember what that update was for, but versions prior to 3.3.0 stopped getting regular updates in 2008.

--
Every normal man must be tempted at times to spit upon his hands, hoist the black flag, and begin slitting throats.
- Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com
Re: installation problem
I have little faith in installing spamassassin from CPAN. I'd recommend uninstalling it if you can, and installing from whatever packaging system your OS uses, which I believe is ports. But if there is a related bug in installation from CPAN, it would be nice to track it down and fix it.

From your debug output:

Jan 1 17:06:23.374 [20281] dbg: generic: Perl 5.01, PREFIX=/usr/pkg, DEF_RULES_DIR=/usr/pkg/share/spamassassin, LOCAL_RULES_DIR=/usr/pkg/etc/mail/spamassassin, LOCAL_STATE_DIR=/usr/pkg/var/spamassassin

What exactly is the directory sa-update is downloading to? Is it one of those? Does it actually contain rules?

$ sa-update -D 2>&1 | grep LOCAL_STATE
Jan 1 14:08:54.614 [9446] dbg: generic: Perl 5.010001, PREFIX=/usr, DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, LOCAL_STATE_DIR=/var/lib/spamassassin
$ spamassassin -D /dev/null 2>&1 | grep LOCAL_STATE
Jan 1 14:09:38.542 [9459] dbg: generic: Perl 5.010001, PREFIX=/usr, DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, LOCAL_STATE_DIR=/var/lib/spamassassin

So on my machine, sa-update is downloading to, and spamassassin is loading rules from, the LOCAL_STATE_DIR, and the rule definitions are all in /var/lib/spamassassin/3.004000/updates_spamassassin_org/ .

On 01/01, Steve Blinkhorn wrote:
Thank you for your various responses. spamassassin --lint -D output is at http://pastebin.com/Hjmt8CbE There is only one sa-update on the system. I installed from CPAN
--
Steve Blinkhorn st...@prd.co.uk

Check you've only got one sa-update etc. installed and you are calling the sa-update associated with the spamassassin you are running.
I.e. check there's not one installed from ports or as the base install if you hand-installed a version, and vice versa.

Martin

On Saturday, 31 December 2011, Steve Blinkhorn st...@prd.co.uk wrote:
Hi,
I just tried to install spamassassin: everything proceeded normally, AFAIK, but the basic "spamassassin -t" on the provided sample fails because no rules are found (line 400, which looks to my untutored eye like an all-purpose error-spitter). sa-update appears to run, and exits silently. There is a rules directory under the directory where I ran the installation, and also under usr/pkg/share, and they are both populated with files which look relevant.

I tweaked the script so as not to require rules, and it ran and produced output.

NetBSD 4.01, working as root. What is amiss?
--
Steve Blinkhorn st...@prd.co.uk
This email is for the addressee only. If you are not the addressee you should immediately delete this email from your system(s) and inform us. It may contain information that is confidential or otherwise privileged, and should not be copied or redistributed to recipients not originally specified as addressees without permission.
Psychometric Research Development Ltd. PO Box 1143, St Albans, Herts, AL1 9UT, UK
Registered in England No. 1909571
Registered Office: 47 Holywell Hill, St Albans, Herts, AL1 1HD
Phone: +44 (0)1727 841455
www.prd.co.uk
--
Martin Hepworth
Oxford, UK
Re: installation problem
On 01/01, Steve Blinkhorn wrote:
files like init.pre, sa-update-keys, v312.pre, v330.pre, local.cf, v310.pre, v320.pre? I don't know exactly what I'm looking for; is there a standard extension for rule files?

No, those are installed with spamassassin. The files you're looking for end in .cf. A good example is the file 50_scores.cf.

I'm afraid you'll have to tell me... http://pastebin.com/xaWNQ0GS

Your LOCAL_STATE_DIR matches in the output of both: /usr/pkg/var/spamassassin. You should have rules there. This file should exist:

/usr/pkg/var/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf

Does it? If it doesn't exist, sa-update isn't writing files successfully. If it does exist, spamassassin isn't reading them. Could be weird file permissions, I guess.

--
But do you have any idea how many SuperBalls you could buy if you actually applied yourself in the world? Probably eleven, but you should still try.
- http://hyperboleandahalf.blogspot.com/
http://www.ChaosReigns.com
Re: installation problem
On 01/01, wolfgang wrote:
/usr/pkg/var/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf
I would rather suspect that file to be located in
Jan 1 19:55:45.157 [6360] dbg: channel: update directory /usr/pkg/var/spamassassin/3.003002/updates_spamassassin_org

You're right, thanks. I hadn't figured out till now where exactly that version number comes from.

3.003002 = v3.3.2
^   ^  ^

So Steve, you should have a file /usr/pkg/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf

--
If everything seems under control, you're not going fast enough
- Mario Andretti
http://www.ChaosReigns.com
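The mapping in that message (3.003002 = v3.3.2) follows the Perl numeric-version style: the minor and patch numbers are each zero-padded to three digits after the decimal point. The versions 3.003001 and 3.004000 mentioned earlier fit the same pattern. Below is a small Python sketch, assuming a plain "major.minor.patch" version string. `version_dir`, `scores_file`, and `diagnose` are hypothetical helpers. The path layout (LOCAL_STATE_DIR/<version>/updates_spamassassin_org/50_scores.cf) is taken from these messages, not from SpamAssassin's documentation. The triage mirrors the reasoning in the thread.

```python
from __future__ import annotations

from pathlib import Path


def version_dir(version: str) -> str:
    """Turn '3.3.2' into '3.003002' (minor and patch zero-padded to 3 digits)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor:03d}{patch:03d}"


def scores_file(local_state_dir: str, version: str) -> Path:
    """Expected location of 50_scores.cf downloaded by sa-update (layout assumed)."""
    return (Path(local_state_dir) / version_dir(version)
            / "updates_spamassassin_org" / "50_scores.cf")


def diagnose(local_state_dir: str, version: str) -> str:
    """Rough triage: missing file -> writing problem; present file -> reading problem."""
    path = scores_file(local_state_dir, version)
    if not path.is_file():
        return f"{path} missing: sa-update is probably not writing rules"
    return f"{path} exists: spamassassin is probably not reading it (check permissions)"


if __name__ == "__main__":
    print(version_dir("3.3.2"))
    print(diagnose("/usr/pkg/var/spamassassin", "3.3.2"))
```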
Re: Help tagging URL spam
body PILSPHARMNEW /pilspharmnew/
score PILSPHARMNEW 5
describe PILSPHARMNEW Body contains /pilspharmnew/

Untested, let me know if it works, but that should do it.

On 01/01, Alex wrote:
Hi, I'm having difficulty catching a series of spams with just a text component and a URL and hoped someone could help. I've included a few samples on pastebin here:
http://pastebin.com/raw.php?i=1Y5QCkfh
http://pastebin.com/raw.php?i=KdmZXM0d
It only hits BAYES_50 usually, despite learning a few dozen of these over the last week. It also appears to originate from yahoo.com. Any ideas greatly appreciated.
Thanks, Alex

--
Blessed are they who, in the face of death, think only about the front sight.
http://www.ChaosReigns.com
Re: Help tagging URL spam
On 01/02, Alex wrote:
What I haven't been able to figure out is a more generalized pattern from these, such as something in the header that is inconsistent with non-spam or contains some type of invalid header data, such as the mismatch between having originated at yahoo but being sent as sbcglobal?

Then you should provide a better variety of examples.

Shouldn't bayes have picked this up after learning a dozen or more of these?

They're probably carefully crafted to avoid being caught by bayes. And I don't think SA's bayes even looks at headers. And bayes definitely doesn't do stuff like mismatching domains in headers.

--
Begin at the beginning and go on till you come to the end; then stop.
- Lewis Carroll, Alice in Wonderland
http://www.ChaosReigns.com
Re: Upgrade FuzzyOcr Plugin to 3.6.0
Is it wise to use FuzzyOCR at this point? Its home page appears to be http://fuzzyocr.own-hero.net/, which says:

"This project is UNMAINTAINED as of 2009-06-01. Use it at your own risk. If you want to fork this project, drop me a note (decoder[at]own-hero.net)."

Also, it is highly recommended that you upgrade spamassassin to version 3.3.0 or newer.

On 12/21, eliasml wrote:
Hello folks! I have a box running FreeBSD with SpamAssassin and the FuzzyOcr plugin. I have noticed that it doesn't work properly: the body/description of the FuzzyOcr rules is always empty. I have googled and found that SpamAssassin version 3.2.4 is not compatible with FuzzyOcr 3.4, and I must upgrade it to version 3.6.0. I'd like to know how to do it. Can somebody help me, please? Thanks in advance!!
--
View this message in context: http://old.nabble.com/Upgrade-FuzzyOcr-Plugin-to-3.6.0-tp33015472p33015472.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

--
Rebellion to Tyrants is Obedience to God.
- Benjamin Franklin, first version of the reverse side of the Great Seal of the United States.
http://www.ChaosReigns.com
Re: DNSWL was re-enabled
On 12/26, Karsten Bräckelmann wrote:
score __RCVD_IN_DNSWL 0
It is a non-scoring double underscore sub-rule. It does not have a score. It cannot have a score. Setting its score to zero does nothing, and certainly does not prevent the DNS query.

Instead, you need to meta out the rule, overwriting the rule definition. And frankly, disabling a rule by logically making it never hit is the better approach anyway. Just re-define rules to disable them:

meta FOO 0

I asked about this on the dev list a week ago. I guess I should've cc'd you. http://wiki.apache.org/spamassassin/DnsBlocklists says to use the score method. I went with that.

That last one is really important, because without it, you'll stop getting hits on the dnswl rules, but you'll still be sending queries to dnswl. I'm hoping that'll get fixed.

There is nothing to be fixed. There is no problem.

The problem is the potential for large sites to disable the rules but not disable the queries, continuing to send millions of unused queries per day.

--
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com