Re: [Bug 7331] channel: SHA1 verification failed, channel failed
On 11 Jan 2018, at 12:58 (-0500), Kevin A. McGrail wrote: And not to run GPG if we don't even download anything. I have not had this issue myself, so all I have is the one example in the ticket, but the logged bad hash there was for a partial download: the first 14372 bytes of 1749638.tar.gz. If there was no download, the attempt to hash a nonexistent file would fail without generating a hash, and it would emit some error. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Scoring Issues
On 26 Jan 2018, at 17:47 (-0500), Computer Bob wrote: My understanding is that spamassassin is configured for razor and uribl. amavisd-new is configured to call spamassassin so is spamassassin not doing the sub calls? Not exactly. The command-line 'spamassassin' script is written in Perl and it uses various Perl modules in the Mail::SpamAssassin::* tree. Amavisd-new also uses Mail::SpamAssassin::* modules but it does NOT use the spamassassin script or any other command-line tool. The effect of this is that it is possible for amavisd-new and spamassassin to use different configurations for the Mail::SpamAssassin::* modules. It is clear that this is happening on your system. I see no docs on configuring razor directly in amavis. If you could tell me what to look for it would be appreciated. Unfortunately, I can't help with amavisd-new because I don't use it. However, it is certain that it is using its own oddball config because these scores are ridiculous: tests=[HTML_MESSAGE=0.001, SPF_HELO_PASS=-1, SPF_PASS=-1, It's madness to give SPF_HELO_PASS or SPF_PASS significant scores on their own. Neither should have a score outside of the -0.01 to 0.01 range: SPF is informative but not probative. These rules somehow got set intentionally to sabotage-level scores somewhere that only the amavisd-new process is looking. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Body rules hit on Subject
On 2 Feb 2018, at 16:59 (-0500), Kevin A. McGrail wrote: There is no solution at the moment. The subject is appended to the body of the text for rule parsing.

The 2nd sentence is wrong: the subject is *prepended* to the body. Also: the 1st sentence is wrong, there's no *PRETTY* solution. If every rendered 'body' starts with a prepended line containing the Subject (with '^Subject: ' stripped off) then one can solve the problem of matching body rules in the Subject header thus:

body __DOCUSIGN_BODY_1ST /\A.*\bdocusign\b.*\n/mi
body __DOCUSIGN_BODY_NOT1ST /(?!\A).*\bdocusign\b.*\n/mi
meta DOCUSIGN_BODY (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || (__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST)

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Body rules hit on Subject
On 3 Feb 2018, at 16:37 (-0500), Bill Cole wrote: On 2 Feb 2018, at 16:59 (-0500), Kevin A. McGrail wrote: There is no solution at the moment. The subject is appended to the body of the text for rule parsing.

The 2nd sentence is wrong: the subject is *prepended* to the body. Also: the 1st sentence is wrong, there's no *PRETTY* solution. If every rendered 'body' starts with a prepended line containing the Subject (with '^Subject: ' stripped off) then one can solve the problem of matching body rules in the Subject header thus:

body __DOCUSIGN_BODY_1ST /\A.*\bdocusign\b.*\n/mi
body __DOCUSIGN_BODY_NOT1ST /(?!\A).*\bdocusign\b.*\n/mi
meta DOCUSIGN_BODY (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || (__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST)

make that:

meta DOCUSIGN_BODY (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || (MISSING_SUBJECT && (__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST))

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
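The intent of the corrected meta rule can be sketched outside SpamAssassin. The Python below is only an illustration, assuming the rendered body is a single string whose first line is the Subject text whenever a Subject exists. It is not SpamAssassin code, and it swaps in a different pattern for the "not the first line" test: it requires a preceding newline instead of using the `(?!\A)` lookahead from the rules above. The function name and sample strings are hypothetical.

```python
import re

# Matches "docusign" only on the first line: "." does not cross newlines here.
FIRST_LINE = re.compile(r"\A.*\bdocusign\b", re.IGNORECASE)
# Matches "docusign" only on a later line: at least one newline must precede it.
LATER_LINE = re.compile(r"\n.*\bdocusign\b", re.IGNORECASE)


def docusign_in_body(rendered: str, has_subject: bool) -> bool:
    """Return True when 'docusign' appears in the real body text.

    Assumption: when has_subject is True, the first line of `rendered`
    is the Subject and must not count as a body hit.
    """
    if has_subject:
        return LATER_LINE.search(rendered) is not None
    # No Subject header: every line, including the first, is body text.
    return (FIRST_LINE.search(rendered) is not None
            or LATER_LINE.search(rendered) is not None)
```

With this split, a message whose only mention of DocuSign is in the Subject does not count as a body hit, which is the behaviour the meta rule is aiming for.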
Re: Email filtering theory and the definition of spam
On 10 Feb 2018, at 16:00 (-0500), Alex wrote: Can we really trust end-users to properly classify email and not infect themselves with something or follow a phish without knowing? Nope. However, we need to act like we do to some degree while doing the best we can to make it difficult for them to do dumb things. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Barracuda Reputation Block List (BRBL) removal from the SA ruleset
On 11 Feb 2018, at 9:54 (-0500), Benny Pedersen wrote: first query would be valid for 300 secs, but that is imho still not free, problem is that keeping low ttls does not change how dns works, any auth dns servers will update on soa serial anyway, the crime comes in when sa using remote dns servers that ignore soa serial updates, in that case ttls would keep spammers listed for 300 secs only That's not how DNS TTLs work. When a record's TTL elapses in the local name cache, it is dropped. The next query for that name and record type causes the resolver to make another query to the authoritative nameservers, which will return the same record whose TTL expired unless it has been removed from the zone. No standards-conforming DNS resolver returns NXDOMAIN based on the lack of a non-expired record in its cache and an unchanged SOA serial above the name. That would make no sense at all and require many more SOA queries than actually happen. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
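The cache behaviour described above can be modelled in a few lines. This is a toy sketch, not a real resolver: the `zone` dictionary stands in for the authoritative nameservers, time is passed in explicitly, and negative caching is left out. The class name and the listing name used in the note below are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CachingResolver:
    # Stand-in for the authoritative servers: name -> (rdata, ttl in seconds).
    zone: Dict[str, Tuple[str, int]]
    # Local cache: name -> (rdata, absolute expiry time).
    cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    upstream_queries: int = 0

    def resolve(self, name: str, now: float) -> Optional[str]:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]          # still fresh: answer from cache
        self.cache.pop(name, None)    # TTL elapsed: drop the record
        self.upstream_queries += 1    # ...and ask the authoritative side again
        record = self.zone.get(name)
        if record is None:
            return None               # only now can the answer be NXDOMAIN
        rdata, ttl = record
        self.cache[name] = (rdata, now + ttl)
        return rdata
```

Resolving a hypothetical `2.0.0.127.bl.example.` listing with a 300-second TTL at t=0 and again at t=301 gives the same answer both times at the cost of two upstream queries. The listing only disappears after it has been removed from the zone and the cached copy has expired.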
Re: Email filtering theory and the definition of spam
On 11 Feb 2018, at 16:20 (-0500), Antony Stone wrote: Strange that I can't find SMTP under www.rfc-editor.org/rfc/std/std-index.txt though, other than STD0060 and STD0071, which are both extensions. STD10 is SMTP (RFC821), STD11 is message format(RFC822). -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote: This is a production mail gateway serving since 2015. I saw that a few messages (both hams and spams) automatically learned by amavisd/spamassassin. Today's statistics: 3616 autolearn=ham 10076 autolearn=no 2817 autolearn=spam 134 autolearn=unavailable That's quite high for spam, ham, AND "unavailable" (which indicates something wrong with the Bayes subsystem, usually transient.) This seems like a recipe for a mis-learning disaster. For comparison, my 2018 autolearn counts: spam: 418 ham: 15018 unavailable: 166 no: 129555 I also manually train any spam that gets through to me (the biggest spam target,) a small number of spams reported by others, and 'trap' hits. A wide variety of ham is harder to get for training but I have found it useful to give users a well-documented and simple way to help. One way is to look at what happens to mail AFTER delivery which can indicate that a message is ham without needing an admin to try to make a determination based on content. The simplest one is to learn anything users mark as $NotJunk as ham. Another is to create an "Archive" mailbox for every user and learn anything as ham that has been moved there a day after it is moved. The most important factor (especially in jurisdictions where human examination of email is a problem) is to tell users how to protect their email and then do what you tell them, robotically. In the US, Canada, and *SOME* of the EU, this is not risky. However, I have been told by people in *SOME* EU countries that they can't even robotically scan ANY mail content, so you shouldn't take my advice as authoritative: I'm not even a lawyer in the US, much less Hungary... I think I have no control over what is learnt automatically. Yes, you do. Run "perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold" for details. You can set the learning thresholds, which control what gets learned. 
The defaults (0.1 and 12) mis-learn far too much spam as ham and not enough spam. I use -0.2 and 6, which means I don't autolearn a lot but everything I autolearn as ham has at least one hit on a substantial "nice" rule or 2 hits on weak ones. There's a lot of vehemence against autolearn expressed here but not a lot of evidence that it operates poorly when configured wisely. The defaults are NOT wise. Let's just assume for a moment that 1.4M ham-samples are valid. Bad assumption. Your Bayes checks are uncertain about mail you've told SA is definitely spam. That's broken. It's a sort of breakage that cannot exist if you do not have a large quantity of spam that has been learned as ham. Is there a ham:spam ratio I should stick to it? No. I presume if we have a 1:1 ratio then future messages won't be considered as spam as well. The ham:spam ratio in the Bayes DB or its autolearning is not a generally useful metric. 1:1 is not magically good and neither is any other ratio, even with reference to a single site's mailstream. A very large ratio *on either side* indicates a likely problem in what is being learned, but you can't correlate the ratio to any particularly wrong bias in Bayes scoring. It is an inherently chaotic relationship. Factors that actually matter are correctness of learning, sample quality, and currency. You can control how current your Bayes DB is (USE AUTO-EXPIRE) but the other two factors are never going to be perfect.
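To make the effect of the two thresholds concrete, here is a deliberately simplified Python model of the decision. It assumes a message is learned as ham below the lower threshold and as spam at or above the upper one; the real AutoLearnThreshold plugin works from a separate autolearn score and applies further conditions that are not modelled here. In SpamAssassin itself the corresponding options are `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam`; the function below is hypothetical.

```python
def autolearn_decision(score: float,
                       nonspam_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> str:
    """Simplified model: return 'ham', 'spam', or 'no' for a message score."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# A message scoring 0.05 with no hits on "nice" rules:
#   defaults (0.1, 12)  -> learned as ham
#   tighter  (-0.2, 6)  -> not learned at all
# A message scoring 7:
#   defaults (0.1, 12)  -> not learned
#   tighter  (-0.2, 6)  -> learned as spam
```

A negative lower threshold means nothing is learned as ham unless negative-scoring rules pulled the score below zero, which is the property described above for the -0.2 setting.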
Re: URIBL_BLOCKED
On 15 Feb 2018, at 4:10 (-0500), Tobi wrote: On 15.02.2018 at 02:35, @lbutlr wrote: On 2018-02-14 (09:55 MST), Tobi <jahli...@gmx.ch> wrote: On 14.02.2018 at 17:16, @lbutlr wrote: I can't imagine why I'd be over limit, my mail server is tiny. It's not the mailserver that got blocked by limits, but the DNS resolver your mailserver uses! I use my own DNS on Bind 9.12, however the block error is not appearing today, so... And does your bind server use other forward servers? Or does it directly resolve the queries from the authoritative nameservers? It all depends on whether your resolver is in forward mode or not. If it's in forward mode then it sounds like the IPs of those forwarders might have been limited. Another possibility is DNS hijacking. Connection providers pitch it as a security measure, and I guess it can be for residential customers and small businesses that essentially use their connections in the same ways as home users, but it's lethal for mail systems. My provider (WOW Business) does it by default. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: problem with spamassassin for WIndows
On 17 Feb 2018, at 14:48 (-0500), Kevin A. McGrail wrote: I gave you a suggestion the other day. Your configuration is wrong. You aren't passing lint. Look at that line 717 or, if that's not the right line number, look at your configuration around your DB for Bayes.

I'm not sure that Bayes has anything to do with it specifically. To get to Parser.pm line 571, it seems to me that the Parser needs to read a line that starts with "ifplugin" (or "if plugin") and that it is expecting a plugin name argument on that line. I can reproduce the base error by adding these 2 lines to any of the .pre or .cf files in the site preferences directory or to the active user_prefs file:

ifplugin
endif

The one oddity is that I don't get any ' line [number]' clause in the error message from creating that broken config. This may be a Windows-specific quirk (I don't have a Windows machine for testing) or it may indicate some more arcane issue in how the configuration is being parsed. So the thing to look for is 'ifplugin' in local.cf, any other *.pre or *.cf file in the same directory as local.cf, or your user_prefs file. It should be followed by the name of a plugin, a block of lines defining rules or setting configuration parameters, and an "endif" line.

On 2/17/2018 2:31 PM, Gianluca Furnarotto wrote: So, can't anyone give me a suggestion? On 16 February 2018 at 08:24:04, Gianluca Furnarotto (keyst...@libero.it <mailto:keyst...@libero.it>) wrote: Hi Bill, this is the result of the command you suggested to type:

feb 16 07:21:09.678 [21824] warn: Use of uninitialized value $_[1] in hash element at Mail/SpamAssassin/Conf/Parser.pm line 571, line 717.

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Junk mixed in with ham on whitelists
On 20 Feb 2018, at 16:48, David Jones wrote: It doesn't seem like a good idea for whitelists to list these senders just because most of the email is ham. I can see no evidence for that in a quick check of my personal mail. In 10 years: 68 messages 50 spam (all reported) 6 replies to spam reports 2 OoO Autoreplies to mailing messages with vacation info for guys I didn't know. 8 messages to single-sender (webite-specific) addresses 2 messages from Namecheap themselves (privateemail.com ) trying to arrange an automatic monitoring rig for when their space lands on my (extremely irrelevant...) blacklist or a FBL for when I get spam from them. This raises the question: if a company whose business model is dependent on snowshoe spammers and domain squatters sends email asking for unpaid help in evading recognition of their essential evil, is it spam? In the previous decade: 64 messages, 56 spams, 8 ham (all from 3 websites to tagged addresses.) Of course, my personal email isn't representative. I reject a substantial fraction of the mail from the networks where those domains have servers, and for a complex of reasons I have extremely high confidence in those rejections being pure spam. So, the above is less spammy than if I tagged and delivered. What's special about such sources isn't that they're mostly ham or even significantly less spammy than a random sample of mail, it's that they have a lot of tiny customers who barely use email and occasional waves of transient spammers. It makes them hard to pigeonhole either way. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: spamasssassin vs mimedefang scores
On 22 Feb 2018, at 4:15, saqariden wrote: Hello guys, I'm using mimedefang with spamassassin. When I test an email with the command "spamassassin -t file.eml", I get results like this:

Content analysis details: (-5.8 points, 3.0 required)
-5.0 RCVD_IN_DNSWL_HI RBL: Sender listed at http://www.dnswl.org/, high trust [70.38.112.54 listed in list.dnswl.org]
-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.]
0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
0.3 TO_EQ_FM_DOM_SPF_FAIL To domain == From domain and external SPF failed

However, the SA check that is done through mimedefang seems to give other scores. How can I test an email to get those scores and see the difference?

Typically mimedefang runs as its own special user (e.g. 'defang') which may be configured to block normal interactive use or even simple 'su' use by root. This means that if you run 'spamassassin -t' in an interactive shell, you use the user_prefs, AWL/TxRep and BayesDB for the user running that shell, not the special user. This is particularly problematic for 'learning' ham and spam for the BayesDB, because it is easy to end up either training into a DB that is entirely separate from the system-wide one used by mimedefang OR working with the system-wide DBs in ways that change ownership of them so that mimedefang can't use them. My solution for this is to use sudo and these shell aliases:

satest='sudo -H -u defang spamassassin -t '
lham='sudo -H -u defang sa-learn --ham --progress '
lspam='sudo -H -u defang sa-learn --spam --progress '
blspam='sudo -H -u defang spamassassin --add-to-blacklist '
reportspam='sudo -H -u defang spamassassin -r -t '
Re: problem with spamassassin for WIndows
On 15 Feb 2018, at 15:33, Gianluca Furnarotto wrote: Hi, I am trying to use Bayes with spamassassin. Now it seems to have stopped learning, and when I use a command such as "sa-learn --dump magic", or "sa-learn --sync", or other sa-learn commands, this error appears: "Use of uninitialized value $_[1] in hash element at Mail/SpamAssassin/Conf/Parser.pm line 571." Line 571 is this: " } " inside these lines. " elsif ($type == $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST) { $cmd->{code} = \_addrlist_value; }" <--- line 571

That absolutely IS NOT line 571 of Mail/SpamAssassin/Conf/Parser.pm in SA version 3.4.1. That's line 685. The relevant lines in Mail/SpamAssassin/Conf/Parser.pm:

568
569 # functions supported in the "if" eval:
570 sub cond_clause_plugin_loaded {
571   return $_[0]->{conf}->{plugins_loaded}->{$_[1]};
572 }
573

My first guess on this is that your configuration has a typo. Try running 'spamassassin --lint' to check it. The error message indicates that something is calling the subroutine 'cond_clause_plugin_loaded' in a way that gives it only one parameter where it is expecting 2, the first of which is an object reference.
Re: Run expensive test last, and skip if meaningless
On 25 Feb 2018, at 11:13 (-0500), Peter Thomassen wrote: Reminder: My question was not "how to run DNS efficiently" or "how does SpamAssassin run DNS queries", my question was "how can I influence the order of tests". The canonical answer is: by adjusting rule priority values and using the short-circuit feature. Unfortunately, that's not applicable to DNS tests because SA's code is optimized for total scan time. This means that DNS checks, which have built-in latency, are started asynchronously before everything else. So if you want to change that to postpone DNS checks with the possibility of short-circuiting them, you will need to re-architect that part of SA. If you choose that route, I expect that patches to make that alternative design a configurable option would be welcome upstream but I doubt that creating such a mechanism would be made a project priority in any way because the current optimization for overall performance is much more useful for most users than economizing on DNS queries. For what it's worth, a few years ago I had to do an analysis of URIBL data value with hard numbers and I found that for that particular operation, URIBLs were decisive in a large majority of spams accurately classified as spam by SA. It is important to note that for this site (as for all sites I have done significant work with in this millennium) the vast majority of mail never was seen by SA because it was rejected (or much less often was exempted from SA by whitelisting) ahead of the DATA phase. This also meant that in that particular case, there was no risk of hitting the point of URIBL_BLOCKED. If all messages had been run through SA, they would have needed to pay for data feeds to have a useful spam control function. Obviously, no 2 mail systems see exactly the same distribution of ham and spam, so your circumstances might make buying feeds uneconomic. 
OTOH, unless you're an extremely adept programmer or your time is not very valuable, redesigning the way SA runs tests is very likely to be the most uneconomic choice available to addressing your root problem. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: IADB whitelist
On 25 Dec 2017, at 3:28 (-0500), Sebastian Arcus wrote: Also, any idea why are there 6 different rules associated with this particular whitelist? IADB has many independent return codes that each have distinct meaning. See http://www.isipp.com/email-accreditation/about-the-codes/list-of-codes/ for details. If you get mail from an IADB-listed sender that you are 100% sure is spam (i.e. not "I would never ask for such mail" but "the recipient absolutely did not consent to receiving this mail.") then you should report that to ISIPP. "ab...@suretymail.com" is the reporting address listed on their website and while I've not had cause to use it, people I trust with no reason to lie say that reports to that address do actually work to either change sender behavior or eliminate listings. Anne Mitchell (head of ISIPP) is an ex-coworker of mine whose integrity and dedication to the anti-spam fight (which is dependent on keeping *wanted* mail deliverable) I can personally vouch for. However, the different responses from IADB are VERY nuanced and the two strongest rules you listed (RCVD_IN_IADB_OPTIN and RCVD_IN_IADB_VOUCHED) are essentially "good intentions" markers. Due to unfortunate terminology choices by ISIPP and a willingness to engage in nuance and estimate intentions, those aren't really as worthwhile as they might seem. The IADB definition of "All mailing list mail is opt-in" is (effectively) "we believe that this ESP believes in good faith that every recipient has chosen to receive this mail." Their "vouching" for a record is an assertion that either the ESP is personally known to ISIPP staff as competent and honest OR has maintained stable positive listings for >6 months. I'm pretty sure I don't want ANY score for a non-vouched record and unlike ISIPP (and some valuable SA contributors!) I really don't care much about ESPs' intentions or responsiveness to complaints, only about actual spamming behavior. 
So I have made substantial modification on my own system to how IADB results are scored, but those specific adjustments are probably not fit for most other sites. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 2 Jan 2018, at 5:12 (-0500), Rupert Gallagher wrote: This is the normative reference. This is the OBSOLETED normative reference. RFC 822, pg. 30, section 6.2.3 -- msg-id = "<" addr-spec ">"; addr-spec = local-part "@" domain; domain = sub-domain *("." sub-domain); sub-domain = domain-ref / domain-literal; <host>> Note that the "@" must also be present as part of the well-formed-formula. When absent, the string is not well formed, and a syntax error occurs. The change of formal syntax in RFC2822 to remove the reference to domain entities was not inadvertent or surreptitious. RFC5322 didn't reverse that change. RFC 5322, pg. 27, section 3.6.4 --- << The message identifier (msg-id) itself MUST be a globally unique identifier for a message. The generator of the message identifier MUST guarantee that the msg-id is unique. There are several algorithms that can be used to accomplish this. Since the msg-id has a similar syntax to addr-spec (identical except that quoted strings, comments, and folding white space are not allowed), a good method is to put the domain name (or a domain literal IP address) of the host on which the message identifier was created on the right-hand side of the "@" (since domain names and IP addresses are normally unique), and put a combination of the current absolute date and time along with some other currently unique (perhaps sequential) identifier available on the system (for example, a process id number) on the left-hand side. Though other algorithms will work, it is RECOMMENDED that the right-hand side contain some domain identifier (either of the host itself or otherwise) such that the generator of the message identifier can guarantee the uniqueness of the left-hand side within the scope of that domain. >> Note the use of RFC2119 terms. MUST and RECOMMENDED mean different things. 
-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 1 Jan 2018, at 9:59 (-0500), David Jones wrote: I think some mail systems will keep the same message-ID per email thread so your system must reject some replies. I have not seen such behavior in the past 20 years... Intentionally re-using another site's MIDs is so wrong that I'd happily make it break hard. HOWEVER, the idea of enforcing any standard on MIDs beyond gross format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole user is ludicrous. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
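For readers who want to try the "gross format" idea, here is a small Python sketch of that check. Python's `re` module has no `[[:ascii:]]` class, so the sketch assumes the explicit range `\x00-\x7f` is an acceptable stand-in. The function name is hypothetical.

```python
import re

# Angle brackets around 3 to 996 ASCII characters, and nothing else.
GROSS_MID = re.compile(r"<[\x00-\x7f]{3,996}>")


def has_gross_mid_format(value: str) -> bool:
    """Check only the outer shape of a Message-ID header value."""
    return GROSS_MID.fullmatch(value.strip()) is not None
```

The check says nothing about '@', dots, or domains: a bare UUID in angle brackets passes, while a value without brackets or with non-ASCII characters does not.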
Re: Malformed spam email gets through.
On 1 Jan 2018, at 10:33 (-0500), David Jones wrote: On 01/01/2018 09:29 AM, Bill Cole wrote: On 1 Jan 2018, at 9:59 (-0500), David Jones wrote: I think some mail systems will keep the same message-ID per email thread so your system must reject some replies. I have not seen such behavior in the past 20 years... Ok. I stand corrected then. What about bounces? Don't they intentionally keep all of the same headers with an empty envelope-from? Nope. A modern standard 'bounce' message is a MIME entity with a special type, denoted by a header somewhat like this: Content-Type: multipart/report; report-type=delivery-status; boundary="blah.foo.bar-baz/example.com" It should have a unique MID, a Date header reflecting the time of the bounce, a Subject header like "Undelivered Mail Returned to Sender", a To header with the original message's envelope sender, a From header clearly identifying the last MTA to hold the message and it's non-human nature such as 'mailer-dae...@example.com (Mail Delivery System)', and Received headers only reflecting the transit from that MTA to the target of the bounce. One PART of a bounce is a message/rfc822 entity which has at least the headers of the original message and usually some or all of the body -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
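The shape of such a bounce can be sketched with Python's standard `email` package. This is a minimal illustration under stated assumptions, not how any particular MTA builds its notifications: the machine-readable message/delivery-status part that real bounces normally carry is omitted for brevity, and the function name, the `MAILER-DAEMON` local part, and the explanatory text are placeholders.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate, make_msgid


def build_bounce(original_text: str, envelope_sender: str, mta_domain: str):
    """Wrap an undeliverable message in a multipart/report container."""
    original = message_from_string(original_text)

    bounce = MIMEMultipart("report", report_type="delivery-status")
    bounce["Message-ID"] = make_msgid(domain=mta_domain)  # new, unique MID
    bounce["Date"] = formatdate(localtime=True)           # time of the bounce
    bounce["Subject"] = "Undelivered Mail Returned to Sender"
    bounce["To"] = envelope_sender                        # original envelope sender
    bounce["From"] = f"MAILER-DAEMON@{mta_domain} (Mail Delivery System)"

    # Human-readable explanation first, original message as the last part.
    bounce.attach(MIMEText("Your message could not be delivered.\n"))
    bounce.attach(MIMEMessage(original))                  # message/rfc822 part
    return bounce
```

The original headers, including the original Message-ID, survive only inside the message/rfc822 part. The outer message has its own identity.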
Re: Malformed spam email gets through.
On 1 Jan 2018, at 12:47 (-0500), Matus UHLAR - fantomas wrote: On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote: the gross format in RFCs 822,2822 and 5322 describes message-id consisting of local and domain part, thus is must contain "@". On 01.01.18 12:17, Bill Cole wrote: No, it does not. Re-read the cited sections. From RFC5322, the ABNF definition: msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS] this is the part that says message-id must consist of local and domain parts. It just says it implicitly, not explicitly, but: It's not possible to construct Message-Id without the "@" while conforming to any of mentioned RFCs. True, but one could just as easily split up a UUID with '@' instead of '-' and comply while being as sure of uniqueness as could ever matter. Or put full UUIDs on both sides of the '@'. If a V1 UUID is on the right, it is even a host-unique identifier after a fashion. Also note that if you demand that MIDs contain '@' with conforming strings on both sides, you risk losing mail that users want. This is a mistake I have made. what exactly was the problem? Message-Id without the "@" or the non-conforming parts there? Missing '@' Some messages lacking it were generated by antique systems that had proven themselves resistant to evolutionary pressures. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 1 Jan 2018, at 14:30 (-0500), Alan Hodgson wrote: On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote: [...] HOWEVER, the idea of enforcing any standard on MIDs beyond gross format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole user is ludicrous. I've had good success junking anything with one of my domains in the message-id, where I know the mail isn't actually from someone in that domain. That's a pretty solid spam signature. Yes, I was a bit imprecise. Very specific idiosyncratic MID patterns can be extremely accurate spam indicators. Enforcement of RFC or common practice "standards" is riskier than it is worth. Lack of any message-id is also significant, but sadly there are still some real senders sending mail with no message-id. Yes. It's one of the most annoying persistent sorts of mail sloppiness. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote: the gross format in RFCs 822, 2822 and 5322 describes message-id consisting of local and domain part, thus it must contain "@".

No, it does not. Re-read the cited sections. From RFC5322, the ABNF definition:

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left = dot-atom-text / obs-id-left
id-right = dot-atom-text / no-fold-literal / obs-id-right
no-fold-literal = "[" *dtext "]"

Note the lack of specification of "local" and "domain" parts. Also note that if you demand that MIDs contain '@' with conforming strings on both sides, you risk losing mail that users want. This is a mistake I have made. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
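As a rough check of what that grammar does and does not require, the following Python sketch transcribes the non-obsolete productions into a regular expression. It assumes the surrounding CFWS has already been stripped and ignores the obs-id-left and obs-id-right forms, so it is narrower than the full RFC 5322 grammar. The function name is hypothetical.

```python
import re

# atext and dtext character sets as defined in RFC 5322 (obsolete forms omitted).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
DTEXT = r"[\x21-\x5a\x5e-\x7e]"
NO_FOLD_LITERAL = rf"\[{DTEXT}*\]"

MSG_ID = re.compile(
    rf"<{DOT_ATOM_TEXT}@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_rfc5322_msg_id(value: str) -> bool:
    """True if value matches the non-obsolete msg-id syntax (no CFWS)."""
    return MSG_ID.fullmatch(value) is not None
```

A value such as `<a@b>` satisfies the syntax although neither side looks like a mailbox or a hostname, and so does a pair of UUIDs joined by '@'. The grammar constrains characters, not meaning.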
Re: Malformed spam email gets through.
On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote: We reject anything whose mid does not include the fqdn or address literal of their sending server. We do this because the RFC says explicitly that the mid *MUST* have those features. This is a blatant falsehood. Relevant RFCs: https://tools.ietf.org/html/rfc5322#section-3.6.4 https://tools.ietf.org/html/rfc2822#section-3.6.4 https://tools.ietf.org/html/rfc822#section-4.6 The only "MUST" in regard to MID content in any of those is uniqueness. Use of a domain identifier is merely RECOMMENDED. Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably determine whether the right-hand part of a MID is a valid host or domain identifier for the generator of the MID. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 2 Jan 2018, at 20:39, Alex wrote: Is it possible to at least enforce that the message-ID has a valid domain? Not reliably. About 1.5% of my personal non-spam email over the past 20 years has had "localhost" as the right hand side of the MID. This implies a de facto RFC violation because it poses a real risk of duplication. An additional ~1% has a MID header with either no dots or no '@'. This includes mail from Facebook, Seagate, Apple, one of my credit unions, a medical supply house that we buy from for my son's care, GMX (German freemail provider), multiple regulars on a private mailing list of old-timer anti-spam nutcases, the postmaster of LinkedIn sending personal mail with his linkedin.com address via GMail, iFixit, Verizon's SMS->Email gateway, and multiple ESPs including Eloqua and Digital River. At least one recent version of CommuniGate Pro (6.1.2) generated event invitations with a bare UUID as the MID. In other words: a significant number of messages, largely legitimate transactional messages, lack a FQDN in the MID. I have run an environment where each MTA node in the external gateway layer would add a MID with its own FQDN to any message passing through missing a MID. Those names could not be resolved in the world at large, but they were absolutely valid and guaranteed unique.
Re: Periodic error
On 1 Aug 2018, at 12:12 (-0400), Nick Bright wrote: spamd[1833]: plugin: eval failed: error closing socket: Bad file descriptor at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/DnsResolver.pm line 185, line 156. What version of SpamAssassin are you using? Those line numbers make no sense with the 3.4.1 release or either current development branch. The last version it seems to make sense for is 3.3.2, which is antique. This is particularly important because that module makes heavy use of the Net::DNS module, which has undergone a huge amount of change in recent years, much of it wise and some of it causing old code to break. If you are using a modern Net::DNS and an antique SpamAssassin, there will be trouble. I'm sometimes receiving this error in my maillog, certainly not for every message that gets scanned. It seems to come in bursts. I've been unable to determine what's causing the error though. I'm running a BIND9 resolver on 127.0.0.1. When it occurs, the system sees high load average and poor performance due to iowait caused by the error. I don't think it's a file descriptor limit, as I've set my system to 512,000 for /proc/sys/fs/file-max and 65535 for ulimits, and "sysctl fs.file-nr" shows 17,056 out of 512,000 in use. You are correct. This "file descriptor" is a socket being used for DNS resolution. Suggestions? Thoughts? Upgrade to a modern SpamAssassin. If that's not possible, make sure that you are using a Net::DNS of a similar age to the antique SA. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
Re: Phish with xps attachment
On 7 Aug 2018, at 15:31 (-0400), Martin Gregorie wrote:

On Tue, 2018-08-07 at 14:09 -0400, Alex wrote: Anyone have ideas for viewing inside of an XPS file or otherwise blocking phish attempts with xps attachments? https://pastebin.com/KtMnNPAg

I don't think this is validly base64 encoded. I chopped it down to just the supposed base64 text and fed it through the Linux base64 decode utility, which gave up and said it isn't valid base 64 after decoding about 150 characters.

Maybe check how you did that. Using the mimeexplode tool from the Perl MIME-Tools package:

# mimeexplode /tmp/xpsspam
Message: msg0 (/tmp/xpsspam)
    Part: msg0/msg-53100-1.txt (text/plain)
    Part: msg0/msg-53100-2.html (text/html)
    Part: msg0/Remittance Copy.xps (application/octet-stream)
# ls -lAR msg0/
total 720
-rw-r--r-- 1 root wheel 354446 Aug 7 16:49 Remittance Copy.xps
-rw-r--r-- 1 root wheel    336 Aug 7 16:49 msg-53100-1.txt
-rw-r--r-- 1 root wheel   4629 Aug 7 16:49 msg-53100-2.html
# file msg0/Remittance\ Copy.xps
msg0/Remittance Copy.xps: Zip archive data, at least v2.0 to extract
# zipinfo msg0/Remittance\ Copy.xps
Archive: msg0/Remittance Copy.xps 354446 bytes 18 files
-rw 4.5 fat   1063 b- defS 1-Jan-80 00:00 [Content_Types].xml
-rw 4.5 fat    567 b- defS 1-Jan-80 00:00 _rels/.rels
-rw 4.5 fat   3566 b- stor 1-Jan-80 00:00 docProps/thumbnail.jpeg
-rw 4.5 fat    564 b- defS 1-Jan-80 00:00 docProps/core.xml
-rw 4.5 fat    287 b- defS 1-Jan-80 00:00 Documents/1/_rels/FixedDoc.fdoc.rels
-rw 4.5 fat    320 b- defS 1-Jan-80 00:00 FixedDocSeq.fdseq
-rw 4.5 fat      2 b- defN 1-Jan-80 00:00 Resources/31AB0740-4E67-23ED-1861-906DB2445D30.odttf
-rw 4.5 fat  61580 b- defN 1-Jan-80 00:00 Resources/36F32615-19BB-2EEA-BD7D-5051E214FE53.odttf
-rw 4.5 fat 266980 b- defN 1-Jan-80 00:00 Resources/128F6B1F-5739-13F9-6E4A-207A4466DE12.odttf
-rw 4.5 fat   1346 b- defS 1-Jan-80 00:00 Documents/1/Pages/_rels/1.fpage.rels
-rw 4.5 fat    282 b- defS 1-Jan-80 00:00 Documents/1/FixedDoc.fdoc
-rw 4.5 fat   4990 b- defN 1-Jan-80 00:00 Documents/1/Structure/Fragments/1.frag
-rw 4.5 fat  50574 b- defN 1-Jan-80 00:00 Documents/1/Pages/1.fpage
-rw 4.5 fat   7042 b- stor 1-Jan-80 00:00 Resources/Images/image_0.png
-rw 4.5 fat    290 b- stor 1-Jan-80 00:00 Resources/Images/image_1.png
-rw 4.5 fat    481 b- stor 1-Jan-80 00:00 Resources/Images/image_2.png
-rw 4.5 fat    386 b- defN 1-Jan-80 00:00 Documents/1/Structure/DocStructure.struct
-rw 4.5 fat 527552 b- defN 1-Jan-80 00:00 Resources/01EC0564-4D18-6AF6-270E-667DA377AC79.odttf
18 files, 983422 bytes uncompressed, 350592 bytes compressed: 64.3%

The payload is not in that XPS document, which is just a picture that claims to be an Office365 document with a big "Open File" button. That region is linked to a URL (MUNGED: hxxps://ssllink(dot)me/1sta) which at present redirects to a Brazilian domain which yields a 500 reply with a "bandwidth exceeded" message. Presumably the payload used to be there...

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
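Since an XPS file is a Zip archive, the same look inside can be had without zipinfo. The following is a minimal sketch using Python's standard zipfile module; the helper names are invented for illustration, and treating the presence of `[Content_Types].xml` as the marker of an XPS-style package is an assumption based only on the listing above.

```python
import io
import zipfile


def list_members(data):
    """Return (name, uncompressed size, compressed size) for each archive member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def looks_like_package(data):
    """Rough check: a Zip archive carrying a [Content_Types].xml member (assumed marker)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return "[Content_Types].xml" in zf.namelist()
```

Feeding it the bytes of the decoded attachment (for example the file mimeexplode wrote out) gives the member list to scan for page parts and link targets.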
Re: Update to Ubuntu 18.04.1 seems to have partially broken SA
On 17 Aug 2018, at 18:49 (-0400), Chris wrote:

> Not in one of my rules:

OK, but also not part of the standard ruleset: 3rd-party rules. Kevin is a highly respected and active leader in the SpamAssassin project and the ASF as a whole but those rules aren't part of his contributions to the project. They are useful for small & medium sized business mail systems but don't really fit the broad safety & efficacy requirements of the official distribution. You use them by active choice, not by trusting in what SA provides...

>
> /etc/mail/spamassassin$ grep -i "POWERBALL" KAM.cf
> body __KAM_LOTTO5 /(POWERBALL LOTTO|freelotto
> group|Royal Heritage Lottery|(British|UK) National( Online)?
> Lottery|U\.?K\.? Grand Promotions|Lottery Department UK|Euromillion
> Loteria|Luckyday International Lottery|International Lottery|Euro -
> Afro Asian Sweepstake|urawinner|Free Lotto Sweepstakes|PROMOTION
> DEPARTMENT|PROMOTION\/PRIZE AWARD|Nederlandse Internationale
> Loterij|EURO MILLIONS|APPLE LOTTERY ONLINE|MSW MEGA JACKPOT|MICROSOFT
> EMAIL PROMO|MSNlottery|ECOWAS|Nigeria|National
> Lottery|claim.{1,10}your.gbp|won.you.{1,10]gbp)/is
> header __KAM_LOTTO8 From =~
> /Lottery|powerball|western.union/i

If you're using KAM.cf, you should set up a mechanism for keeping that file up to date. This typo was fixed over 2 months ago (as far back as I have online backups of it) and the current KAM.cf has dozens of other changes in that time.

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
Re: spample: porn extortion with pure numeric From domain and base64 body
On 17 Jul 2018, at 20:00 (-0400), Chip M. wrote: There's a new morph of the porn extortion campaign, with some interesting under-the-hood changes. The previous ones were always: - two "quoted-printable" parts (plain text, html) - "From" Outlook accounts - sent via Outlook/Hotmail/MS IPs (no other IPs in route) - passed both DKIM and SPF The new version has: - one base64 html part - pure numeric "From" domain (same address in SMTP & header) Why would anyone accept mail with an SMTP sender domain that is purely numeric? Checking that the sender domain exists is a fundamental necessity in operating an Internet-facing MTA. - sent via compromised computers (and typically 3 or 4 Received IPs) Or not. The Received headers in your sample beyond the top one are entirely ridiculous, obviously fake. They are not even internally consistent. - bogus domains so neither DKIM nor SPF possible So what? Bogus domains are bogus. If you can't resolve an A or MX for an envelope sender domain, nothing else (including SA) needs to be done. It's bogus mail. [...] Three other unusual things (all demonstrated in this spample): 1. 9 of the 13 had a two part pure numeric claimed host (see below). I don't recall seeing that before. ** Is that a botnet fingerprint? Probably not in a strict sense. Obviously it is a fingerprint of a particular stream of spam, but it is not a behavior that is widespread, such as claiming to be "User" or "ylmf-pc" or the IP of the server being spammed. My guess is that it's one spammer with a severe clue shortage. 2. 9 of the 13 lacked a trailing "=". I don't recall seeing that before. That's slightly more common than normal, but not spectacularly more when considering that the input data isn't random. The encoded version of 1/3 of all inputs will not end with '=' and 1/3 will end with '==' while none should ever end with '==='. It's probably worth a quick test, if easy to implement. Nope. 
Absolutely NOT spamsign, unless you consider having an even multiple of 3 bytes in the unencoded data to be spamsign. :) 3. 4 of the 13 failed to hit "MIME_BASE64_TEXT". I'm curious what the issue is. The trailing "=" was not a factor. The main thing that stood out is that the hits all had this CT: Content-Type: text/html; charset="us-ascii" The misses all had: Content-Type: text/html; charset="iso-8859-1" I have not traced the code to be sure, but as I understand it, it shouldn't be possible to hit MIME_BASE64_TEXT unless the character set is US-ASCII. Since iso-8859-1 is an 8-bit character set, it is entirely proper and frequently essential that either Base64 or QP be used to encode it. [...] I've just added the above suggested SA metas, and a low level (non-regex) pure numeric TLD test. I would not expect the numeric TLD test to hit much in the submitted corpora, since NO_DNS_FOR_FROM is not hitting enough to have a meaningful score and a pure numeric TLD in the envelope sender would always hit NO_DNS_FOR_FROM. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
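The padding arithmetic behind the "trailing =" point is easy to check. A minimal sketch using Python's standard base64 module (the helper name is made up): Base64 encodes 3 input bytes as 4 characters, so the number of '=' characters depends only on the input length modulo 3.

```python
import base64


def padding_length(raw):
    """Number of '=' characters at the end of the Base64 encoding of raw."""
    encoded = base64.b64encode(raw)
    return len(encoded) - len(encoded.rstrip(b"="))


if __name__ == "__main__":
    # Input length mod 3 decides the padding: 0 -> none, 1 -> '==', 2 -> '='.
    for n in range(1, 7):
        print(n, n % 3, padding_length(b"x" * n))
```

So a message body whose unencoded length happens to be a multiple of 3 ends with no '=' at all, which is why its absence says nothing about spamminess.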
Re: spample: porn extortion with pure numeric From domain and base64 body
And in addition... On 17 Jul 2018, at 20:00 (-0400), Chip M. wrote: > 3. Pure numeric TLDs appear to be non existent (so far!) I expect that this will hold true for a long time. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
Re: Best practice for learning submissions
[N.B.: Your prior correspondent is not able to post to this list, so we only saw your side of that exchange.] On 23 Jul 2018, at 19:38 (-0400), Nick Bright wrote: When requesting submissions from users for use with sa-learn, if they are going to forward the message somewhere; is it best for that to be forwarded as an attachment, or forwarded inline? Will sa-learn automatically understand "the spam is attached" if it's an attachment? Learning from a mailbox of my own spam (with full headers - the actual mails) is quite different from users *forwarding* spam for training. So I ask: what is the best practice for learning submissions when using site-wide bayes? The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learning. If you must rely on forwarded submissions, make sure users are forwarding messages as attachments, and have the target deliver into a mailbox that is processed to extract the 'message/rfc822' MIME object(s) in those submissions and learn those, not the submission mail itself. Learning ham is harder, because generally speaking it is not a good idea to deliver mail that SA believes is spam *at all* unless you can't reject it in SMTP. As a result, users don't have 'false positive' samples to submit (although their irate would-be correspondents could...) In an IMAP environment, you can identify borderline ham that is useful to learn by looking at tagging and archiving. If the user assigns a keyword to a message and/or moves it to a mailbox (other than ones with names like Junk and Spam and Trash) you can usually be sure it is ham. If your users are trainable (it DOES happen...) you might even get them to use specific keywords and/or archival mailboxes and use those to feed ham training. In a POP3 environment, this is a much harder problem to solve. 
-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
Re: Best practice for learning submissions
On 24 Jul 2018, at 13:39, Nick Bright wrote: On 7/23/2018 11:49 PM, Bill Cole wrote: The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learning. If you must rely on forwarded submissions, make sure users are forwarding messages as attachments, and have the target deliver into a mailbox that is processed to extract the 'message/rfc822' MIME object(s) in those submissions and learn those, not the submission mail itself. Any specific utilities you could suggest? I've used an adapted version of the mimeexplode tool from the Perl MIME-Tools distribution (https://metacpan.org/source/DSKOLL/MIME-tools-5.509/examples/mimeexplode) in conjunction with formail (part of Procmail.)
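For sites without the Perl tooling, the extraction step described above (pull the 'message/rfc822' object out of a forwarded-as-attachment submission and learn that, not the wrapper) can be sketched with Python's standard email package. This is an illustration under assumptions, not the adapted mimeexplode setup mentioned above: the function names are invented, and re-serializing with `as_bytes()` may normalize line endings, so the result is close to, but not guaranteed byte-identical with, what SA originally saw.

```python
from email import message_from_bytes


def _attached_messages(part):
    """Yield attached messages without descending into them."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-element list.
        yield part.get_payload(0)
    elif part.is_multipart():
        for sub in part.get_payload():
            yield from _attached_messages(sub)


def extract_forwarded(submission):
    """Return the raw bytes of each message attached to a submission."""
    wrapper = message_from_bytes(submission)  # default compat32 policy keeps headers as parsed
    return [inner.as_bytes() for inner in _attached_messages(wrapper)]
```

Each returned item can then be written to a file or mailbox that sa-learn is pointed at; inline forwards contain no 'message/rfc822' part and yield nothing, which is a reasonable signal to bounce the submission back to the user.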
Re: Line too long [rfc 2822, section 2.1.1]
On 13 Jul 2018, at 14:49, Rupert Gallagher wrote: A little survey on your local policies... What do you do when a subject line is longer than 78 characters? A. Reject B. Accept as spam C. Accept Accept, absent some actual spam sign. Note that the 78-character recommendation is not applicable to logical (decoded and unfolded) header fields but only to lines in the uninterpreted & unmodified transport format. To catch that in SA, a test would be something like: header LONG_SUBJ_LINE Subject:raw =~ /.{79,}/m And that will match mail that many people really want to not be blocked.
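The distinction between a long logical Subject and a long physical line can be shown outside SA as well. A minimal sketch (the function name is hypothetical, and unlike the `Subject:raw` rule above it counts the "Subject:" field name as part of the line): it looks only at the raw, still-folded lines of the Subject field and reports any that exceed the limit.

```python
def long_subject_lines(raw, limit=78):
    """Return raw (folded) Subject lines longer than limit characters."""
    hits = []
    in_subject = False
    for line in raw.splitlines():
        if not line:                      # blank line ends the header section
            break
        if line[:1] not in (b" ", b"\t"):  # a new header field starts here
            in_subject = line.lower().startswith(b"subject:")
        if in_subject and len(line) > limit:
            hits.append(line)
    return hits
```

A 150-character subject that the sending software folded properly produces no hits, while the same subject sent as one physical line does.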
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 30 Aug 2018, at 12:40, Grant Taylor wrote:

> On 08/30/2018 10:16 AM, Bill Cole wrote:
>> It's hard to understand this circumstance based on the generic description.
>>
>> It appears that you have a configuration where a relay is in
>> trusted_networks (i.e. you believe what it asserts in Received headers) but
>> it is NOT in internal_networks so it is in the synthetic
>> X-Spam-Relays-External pseudo-header, it is the only element in
>> X-Spam-Relays-External so the message matches __DOS_SINGLE_EXT_RELAY, and it
>> has no rDNS so the message matches __RDNS_NONE.
>>
>> So: why is that nameless machine that you cannot make a named machine NOT in
>> internal_networks?
>
> I don't know if this is the OP's case or not, but the following example comes
> to mind.
>
> SA (running on your receiving MTA) receives a message from an MTA (which is
> itself an MSA) of an external Business-to-Business partner (thus a trusted
> MTA that is not internal to the recipient's organization) which itself
> received the message from a client on an RFC 1918 network without reverse DNS.

If that MSA is requiring authentication (as it should) and recording that in the Received header (as it should) then as I understand it, the handoff of the message will not be considered for __RDNS_NONE.

>> Of course not, but if a machine is trusted to tell the truth in Received
>> headers and has no rDNS because it is talking to a close affiliate on a
>> RFC1918 IP, in what sense is it not internal?
>
> Trusting a B2B partner's external MTA.

OK, but in that case the MTA would use an IP that should be in trusted_networks and have rDNS.

>> Or is it in internal_networks but there's something wrong in how SA is
>> parsing Received headers to build X-Spam-Relays-External?
>>
>> I think the fix for all is for everyone to get their internal_networks and
>> trusted_networks configurations correct.
>
> What should trusted_networks and internal_networks be set to in the B2B
> scenario I'm describing?

The partner machine's IP should be in trusted_networks AND should have rDNS as an explicit technical requirement of the cooperation, which is entirely reasonable.
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 30 Aug 2018, at 15:56, Grant Taylor wrote:

> On 08/30/2018 01:08 PM, Bill Cole wrote:
>> If that MSA is requiring authentication (as it should) and recording that in
>> the Received header (as it should) then as I understand it, the handoff of
>> the message will not be considered for __RDNS_NONE.
>
> Okay.
>
> What happens if the MSA isn't using authentication and instead is configured
> to blindly allow relaying from the local / internal / private LAN? As is /
> was traditional for a long time for ISPs to allow relaying from their
> (client) IP address space. (Granted, this is against best practices.)
>
> How would this type of scenario affect your statement above?

That will depend on how that particular MTA constructs its Received headers in relation to the parsing in Mail::SpamAssassin::Message::Metadata::Received, which is non-trivial to describe in human language.

>> OK, but in that case the MTA would use an IP that should be in
>> trusted_networks and have rDNS.
>
> Agreed.
>
>> The partner machine's IP should be in trusted_networks AND should have rDNS
>> as an explicit technical requirement of the cooperation, which is entirely
>> reasonable.
>
> Okay.
>
> --
> Grant. . . .
> unix || die
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 30 Aug 2018, at 10:01, Matus UHLAR - fantomas wrote: On 30.08.18 09:49, Kevin A. McGrail wrote: I feel that you are fighting a bigger battle than one rule in SA. two rules actually ;-) (with two more possible). Without RDNS, you are running afoul of the postmaster rules of virtually every major email player. You will have massive deliverability issues. Those IP addresses are in internal network with private IP ranges. When connecting to world, their IPs are NATed to public. even if I fixed the DNS (and I can't since the network is not in my control), HDR_ORDER_FTSDMCXX_DIRECT would still apply. It's hard to understand this circumstance based on the generic description. It appears that you have a configuration where a relay is in trusted_networks (i.e. you believe what it asserts in Received headers) but it is NOT in internal_networks so it is in the synthetic X-Spam-Relays-External pseudo-header, it is the only element in X-Spam-Relays-External so the message matches __DOS_SINGLE_EXT_RELAY, and it has no rDNS so the message matches __RDNS_NONE. So: why is that nameless machine that you cannot make a named machine NOT in internal_networks? I believe faking DNS is not what you advise to me, although it would "fix" the problem temporarily (but could create another problem should the DNS be created later). Of course not, but if a machine is trusted to tell the truth in Received headers and has no rDNS because it is talking to a close affiliate on a RFC1918 IP, in what sense is it not internal? Or is it in internal_networks but there's something wrong in how SA is parsing Received headers to build X-Spam-Relays-External? That is why I believe that adding ALL_TRUSTED would solve the problem without unnecessary issues for others. Yes, I can do that locally - but by redefining rule I could miss it getting fixes or improved later. And since different people have already reported this problem in the past, I would like to make the fix possible for all, if viable. 
I think the fix for all is for everyone to get their internal_networks and trusted_networks configurations correct.
Re: Non-ascii subjects with images
On 1 Sep 2018, at 18:22 (-0400), David B Funk wrote: On the other-hand, if you want to decode the subject line and then pattern-match against all the possible UTF-8 emojies, you're going to end up with a rather unwieldy rule. SA "header" rules match against decoded headers, not the Base64 or QP encoded text. In principle, modern Perl can match against named Unicode "properties" so in principle it should be possible to have a rule something like: header EMOJI_IN_SUBJ Subject =~ '/[\p{Miscellaneous Symbols and Pictographs}\p{Emoticons}\p{Ornamental Dingbats}]/' HOWEVER: this does not work in current SA. I have not dissected exactly why but I know there have been many problems in handling Unicode in the SA code. Universal Unicode support is a defining goal of SA 4.0.0, so maybe this would be possible in the svn 'trunk' codebase where a lot of that work is done. On the other hand, it may be a consequence of SA parsing rules too harshly and mangling that particular odd RE syntax. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
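Outside SA, the same idea (decode the Subject first, then match against Unicode blocks rather than individual emoji) can be sketched in Python. Python's built-in `re` has no `\p{...}` support, so this sketch spells the three blocks named above out as code point ranges (U+1F300-1F5FF, U+1F600-1F64F, U+1F650-1F67F); the function name is made up, and none of this says anything about why the rule fails inside current SA.

```python
import re
from email.header import decode_header, make_header

# Miscellaneous Symbols and Pictographs, Emoticons, Ornamental Dingbats
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F650-\U0001F67F]"
)


def subject_has_emoji(raw_subject):
    """Decode RFC 2047 encoded-words, then look for characters in the three blocks."""
    decoded = str(make_header(decode_header(raw_subject)))
    return EMOJI_RE.search(decoded) is not None
```

Matching on blocks keeps the pattern to one short character class instead of an unwieldy alternation over every emoji.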
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 31 Aug 2018, at 4:53, Matus UHLAR - fantomas wrote: Note that I list internal clients as trusted, not as internal. Maybe this is the problem. Yes, maybe... Long time ago I learned to configure dynamic IP addresses (dialups) as trusted, but not as internal. They probably should be neither. In this case, clients are internal, not dialup, but I still think they should not be listed in internal_networks (as I don't trust them not to spoof anything). If you do not trust them not to spoof anything, they absolutely must not be in trusted_networks. It seems to me that you have a technical & management arrangement unsuited to the SpamAssassin trusted_networks/internal_networks/msa_networks logical model. My recommendation would NOT be to modify stock rules that are constructed with that logical model as a base assumption, but rather to create your own mitigating rules to handle the fact that you seem to want to always accept mail from certain internal clients which are nameless, untrustworthy, and sources of mail with features that in the world at large mostly correlate to spam.
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 31 Aug 2018, at 4:05, Matus UHLAR - fantomas wrote: On 08/30/2018 10:16 AM, Bill Cole wrote: It's hard to understand this circumstance based on the generic description. It appears that you have a configuration where a relay is in trusted_networks (i.e. you believe what it asserts in Received headers) but it is NOT in internal_networks so it is in the synthetic X-Spam-Relays-External pseudo-header, it is the only element in X-Spam-Relays-External so the message matches__DOS_SINGLE_EXT_RELAY, and it has no rDNS so the message matches __RDNS_NONE. So: why is that nameless machine that you cannot make a named machine NOT in internal_networks? multiple client PCs in the local network. and as client PCs, I don't want to put them into internal_networks. (And if I remember correctly, I should not). This is a great example of why it is always helpful to have actual (or carefully constructed) samples of mail and of how that mail is analyzed by SA in order to solve a classification problem. I still don't have a solid understanding of how this mail is flowing and what sort of trust you have in the behavior of the specific machines involved in generating and/or transporting the mislabeled email, so I can't say for sure how you should classify those client PCs. As I said in my earlier message today, I think you have a circumstance that can't be forced into how SA classifies hosts. On 30 Aug 2018, at 12:40, Grant Taylor wrote: I don't know if this is the OP's case or not, but the following example comes to mind. SA (running on your receiving MTA) receives a message from an MTA (which is itself an MSA) of an external Business-to-Business partner (thus a trusted MTA that is not internal to the recipient's organization) which itself received the message from a client on an RFC 1918 network without reverse DNS. 
On 30.08.18 15:08, Bill Cole wrote: If that MSA is requiring authentication (as it should) and recording that in the Received header (as it should) then as I understand it, the handoff of the message will not be considered for __RDNS_NONE. Authentication is not implemented yet, and telling the network admins they must implement it now that I have installed spamassassin is not acceptable. Tuning DNS is of course possible but it requires some time. Yes. My response to Grant was solely in regard to his hypothetical.
Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)
On 30 Aug 2018, at 18:02, Grant Taylor wrote:

> On 08/30/2018 03:50 PM, Bill Cole wrote:
>> That will depend on how that particular MTA constructs its Received headers
>> in relation to the parsing in
>> Mail::SpamAssassin::Message::Metadata::Received, which is non-trivial to
>> describe in human language.
>
> Fair enough.
>
> Would it be possible for this scenario to present with the symptoms that the
> OP described?

I don't think so, given the description, but maybe. Mail::SpamAssassin::Message::Metadata::Received implements a baroque ad hoc parsing mechanism that has been adapted organically for most of 2 decades and which "knows" many special cases where a particular Received header pattern indicates a trusted hand-off. My understanding is that the client lacking rDNS in this case is talking directly to the SA host, which is a simpler case.

> Thank you for humoring me as I try to learn.

No problem. I make no claim to knowing absolutely everything about how the *_networks and Received header parsing behaves, or even to know better than anyone else in particular, but I've fought with it a bunch...
Re: From name containing a spoofed email address
On 19 Jan 2018, at 10:20 (-0500), Rupert Gallagher wrote: > Empty Message You're repeating yourself... -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: From name containing a spoofed email address
On 19 Jan 2018, at 20:02 (-0500), jdow wrote: After your first time being a victim of cyberstalking you'll soon enough wish your "from" line was as generic as mine. People who put their full name in the From: line haven't been mugged yet. I spent a year learning about this 1985-1986. I think that's variable. I had issues 95-97 with both a herd of Usenet kooks and the Church of Scientology (for my small role in defending a.r.s) that included in-person confrontations but I didn't then drop nor have I since dropped the use of my real name online, (with a partial exception during my divorce when I reverted to my birth surname before it was official) despite an attenuated but never really dead trickle of net-originated hostility for 20+ years. I think one's individual vulnerabilities make a huge difference, as there are threats that would literally be laughable to me which would be legitimately and justifiably terrifying to others. The worst I got was 2 visits from CPS in response to anonymous 'tips' of child abuse and a threat of a beating from a man who actually showed up at my door but didn't stay long enough for any substantive interaction after he made a snap judgement of his prospects... OTOH, if I were a woman or looked less like a biker-bar bouncer or had a history I'd rather not have widely known, I'd almost surely evaluate my risk differently. This is a hard problem that no one has yet solved. As a byproduct of this habit of mine, when I see a "To: John" or other name than mine it's automatically spam, especially when it cannot even get the gender right. That can be useful even without a nym in the From header, although it is helpful to have a tricky name. e.g. no one has ever called me "Willy" except for a few spammers. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: From name containing a spoofed email address
On 19 Jan 2018, at 16:17 (-0500), Chip wrote: Do you mean don't whitelist_auth *@example.com *unless* they have published spf/dkim? I can't speak to Dave's meaning (although I value it...) but in fact whitelist_auth directives only have any effect if the domain has published SPF or DKIM records (and in the latter case, signs mail.) Having those directives is harmless if they don't support one of those authentication mechanisms. Certainly paypal and chase (your examples where you would use whitelist_auth) have real human users. . . Nope. OK, so I don't know about those SPECIFIC domains but in general, major consumer-facing brand holders are usually smart enough (or hire ESPs smart enough...) to keep their humans and their non-human bulk senders segregated by domain and relevant authentication mechanisms. For example, a decade ago I had personally specific addresses directly under the audiusa.com and vw.com domains but neither of those domains had ANY bulk sender addresses except in subdomains and those subdomains shared NO authentication mechanisms with the base domains that had human users. PayPal and Chase may have stupider admins & governance today than VWoA had a decade ago, but I doubt that. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Maxium URL acceptable length
On 23 Jan 2018, at 11:55 (-0500), Pedro David Marco wrote: Shall SA accept URLs 5MB big for example? Generally speaking, SA should not be seeing whole messages that big, much less single URLs. Beyond the slowness and the resource demands of scanning large messages, the discernment power of SA fizzles out around 500KB. You won't catch much of the spam that is very large with SA, because it isn't very similar to the spam SA is designed for or usually trained on. But to the original question, it is unfortunately true that entities which are generally recognized as legitimate sometimes use URLs in email that exceed 1KB, while URLs longer than 2KB are quite rare in ham or spam. Some data: For a while I had test rules that hit URLs with long parts after the hostname and found that a 600 character threshold was useless, with a tiny correlation to ham. At 800 there was a stronger but still not useful correlation to ham. Over 1000 it was a minor menace, hitting 5 times as much ham as spam with most of the spam already scored in double digits by SA and very few that had slipped past SA. I killed the rules as a failed experiment. Today I did a very rough check of an unrepresentative corpus (hand-classified but containing only ham and SA escapees) of 95k messages (93k ham/2k spam) from the past 42 months. Longest URL is 2054 characters after the hostname and that one is in ridiculously pathological spam whose text/plain part is mostly HTML-encoded versions of UTF-16(?) entities. The next longest is 1852 characters and it's in ham. I see no way to make the length of URLs a useful spam test. However, there is a bright side of that. While it will not catch much, it is *probably* perfectly safe to set a prudent limit on URLs (say, 5KB?) and not need to worry much about FPs. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
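For anyone wanting to repeat that kind of measurement on their own corpus, the quantity in question ("characters after the hostname") can be computed as follows. A minimal sketch with Python's standard urllib; the function names are invented, the 5000-character default simply mirrors the "say, 5KB?" figure above, and URL extraction from message bodies is left out entirely.

```python
from urllib.parse import urlsplit


def length_after_host(url):
    """Count the characters of a URL that follow the host[:port] part."""
    parts = urlsplit(url)
    tail = parts.path
    if parts.query:
        tail += "?" + parts.query
    if parts.fragment:
        tail += "#" + parts.fragment
    return len(tail)


def exceeds_limit(url, limit=5000):
    """True when the part after the hostname is longer than limit characters."""
    return length_after_host(url) > limit
```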
Re: how to grep multiline add-header X-Spam lines
On 28 Feb 2018, at 16:13 (-0500), RW wrote: On Wed, 28 Feb 2018 21:01:36 +0100 Benny Pedersen wrote: how do one make multiline grep of add-header line, this is imho triggy since it on long lines continue on next line with a first char space, if one could help me solve it i be thankfull If you want to use grep, you can pipe the files through an awk one-liner to unfold the headers. That works, but it is probably more convenient (if one has the procmail package installed or can install it easily and doesn't have awk syntax in the wetware) to use formail -cs -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
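If neither awk nor formail is at hand, the unfolding itself is only a few lines. A minimal sketch (function names are hypothetical): continuation lines, those starting with a space or tab, are joined onto the header line before them, after which a plain prefix match finds the whole X-Spam-* header on one line.

```python
def unfolded_headers(raw):
    """Return the header lines of a message with folded lines joined."""
    lines = []
    for line in raw.splitlines():
        if line == "":                       # blank line ends the header section
            break
        if line[0] in " \t" and lines:       # continuation of the previous header
            lines[-1] += " " + line.strip()
        else:
            lines.append(line)
    return lines


def grep_headers(raw, prefix):
    """Return unfolded header lines whose name starts with prefix (case-insensitive)."""
    return [h for h in unfolded_headers(raw) if h.lower().startswith(prefix.lower())]
```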
Re: Dealing with links to malicious documents
On 13 Mar 2018, at 14:21 (-0400), John Hardin wrote: d) Don't accept emails from outside your organization that link to hosted documents. The document needs to be attached, so that it can be scanned. Unfortunately this is not feasible if you're not a (at least semi-)monolithic organization where you can apply such policies. Also not feasible if any users subscribe to this list or most technical discussion mailing lists. For example, here you are likely to get links into the SA Wiki or to KAM's rules. On the Postfix list it is a rare week that does not have multiple links to the DEBUG_README file posted. The example provided was apparently to a directory (URL ending in '/') but redirected to a .doc. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Problems with SORBS?
On 6 Apr 2018, at 8:08, Martin Gregorie wrote:

> I'm getting a lot of SORBS lookups rejected due to an "unexpected RCODE". Is anybody else seeing these?

I'm sure someone is... I'm not seeing any of those here.

If the "unexpected RCODE" is SERVFAIL, it was likely transient on their end. If it was REFUSED, then you may need to check whether you are hitting whatever the volume caps are. (I have no idea what the SORBS volume caps are.)
Re: FSL_BULK_SIG still active?
On 7 Apr 2018, at 8:08 (-0400), Robert Boyl wrote:

> Hi, everyone Pls... Is this still an active spamassassin test?

No. It is a 'sandbox' rule that got auto-promoted at some point and was auto-demoted March 12. If you run sa-update daily and restart any persistent processes using the rules afterwards, you will keep up with those automatic changes.

> header __FSL_HAS_LIST_UNSUB exists:List-Unsubscribe
> meta FSL_BULK_SIG ((DCC_CHECK || RAZOR2_CHECK || PYZOR_CHECK) && !__FSL_HAS_LIST_UNSUB)
> describe FSL_BULK_SIG Bulk signature with no Unsubscribe
>
> Had some odd false positive due to its high score of 1,35... It was a forgot password message... and it scored "Bulk signature with no Unsubscribe".

Which is probably not wrong. Password reset messages are usually quite similar to each other, just like a large fraction of spam, and there's little point in them having unsub links.

> Seems strange as it depends on DCC, Razor, Pyzor, systems that I also see score wrongly.

Those are all primarily distributed bulk detectors, rather than spam detectors. Currently they each score about 1, so even if they all hit simultaneously on non-spam bulk mail (which they rarely do) there needs to be something else spammy about a message to push it into the 'spam' classification with the standard threshold of 5. If you're using a lower threshold (as I do) you should have carefully-managed local rules and score adjustments, and have a valid reason to believe that your mail flow fits that divergence.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
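The arithmetic above can be sketched in a few lines of Python. This only mimics the meta expression and the threshold comparison; it is not how SA evaluates rules internally. The scores in the usage note are illustrative round numbers (about 1 per bulk detector, 1.35 for FSL_BULK_SIG as reported in the question), not values taken from any published score file.

```python
BULK_DETECTORS = ("DCC_CHECK", "RAZOR2_CHECK", "PYZOR_CHECK")


def fsl_bulk_sig(hits):
    """Mirror of the meta: any bulk detector hit AND no List-Unsubscribe header."""
    bulk = any(name in hits for name in BULK_DETECTORS)
    return bulk and "__FSL_HAS_LIST_UNSUB" not in hits


def total_score(hits, scores):
    """Sum the scores of all rules that hit; unknown rules count as zero."""
    return sum(scores.get(name, 0.0) for name in hits)


def is_spam(hits, scores, threshold=5.0):
    """Classify as spam when the total reaches the threshold (5.0 is the standard default)."""
    return total_score(hits, scores) >= threshold
```

With all three detectors at 1.0 and FSL_BULK_SIG at 1.35, a message hitting all four totals 4.35, which is still below 5.0, so something else has to hit before it is classified as spam.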
Re: low score on very spammy email
On 10 Apr 2018, at 18:28, Motty Cruz wrote: reject_rbl_client zen.spamhaus.org, reject_rbl_client cbl.abuseat.org, That is redundant. The Zen list includes the CBL and Spamhaus has taken over operation of the CBL so there's no lag time between them any more.
Re: FORGED_GMAIL_RCVD and USER_IN_DEF_SPF_WL
On 11 Apr 2018, at 15:28 (-0400), Alex wrote:

> Hi, this message seems suspicious to me (appears to be some type of survey), but I don't understand how it was whitelisted when google.com is not listed among def_whitelist_from_dkim (or at least shouldn't be)

Note that google.com has historically been reserved for Google corporate mail, NOT GMail. Hence these rules exist in the default rules:

60_whitelist_auth.cf:def_whitelist_auth *@*.google.com
60_whitelist_dkim.cf:def_whitelist_from_dkim googlealerts-nore...@google.com
60_whitelist_dkim.cf:# def_whitelist_from_dkim *@google.com

> https://pastebin.com/raw/h1370F1F
> I'd appreciate any clarification on what's going on here...

The envelope sender is 3ue3owhmjamkzhabyuuhahsbe.qpzhvnthps.jvtytilzadlzalyu@trix.bounces.google.com and the SPF-relevant relay IP is 209.85.223.199, so SPF passes. That's good enough for def_whitelist_auth.

Messages of this sort make an irrefutable argument for removing the general pass given to Google in the default ruleset, as it is clearly based on a use model of the domain which no longer is true.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
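To make the match concrete, here is a small Python approximation of why that envelope sender is covered by the `*@*.google.com` entry. It uses Python's fnmatch as a stand-in for SA's own glob handling, and it collapses the authentication check into a single boolean, so treat it as an assumption-laden sketch rather than SA's logic.

```python
from fnmatch import fnmatchcase


def whitelisted_by_auth(envelope_sender, spf_pass, patterns=("*@*.google.com",)):
    """Approximate def_whitelist_auth: the address must match a glob AND authentication must pass."""
    if not spf_pass:
        return False
    addr = envelope_sender.lower()
    return any(fnmatchcase(addr, pattern.lower()) for pattern in patterns)
```

Note that the pattern needs something before `.google.com`, so a plain `someone@google.com` address would not match it, while any subdomain such as trix.bounces.google.com does.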
Re: URI_TRY_3LD fp's with QuickBooks Intuit emails
On 13 Apr 2018, at 6:36 (-0400), Giovanni Bechis wrote: On 04/13/18 09:06, Sebastian Arcus wrote: Hello all. I am getting some fp's with emails from QuickBooks / Intuit with the above rule: Apr 13 08:00:30.853 [5768] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "https://myturbotax.intuit.com; On a slightly different note, and mainly for my curiosity to understand SA rules syntax, in 72_active.cf, the score seems to be commented out: #score URI_TRY_3LD 2.000 # limit But when it hits, it still adds 2.0 to the score (and I haven't customized the score anywhere else). That's exceedingly unusual and difficult to explain... Is this a special form of SA syntax? No, it is an artifact of how sandbox rules are included in the published rules. the score is present in rulesrc/sandbox/jhardin/20_misc_testing.cf with tflags publish. Giovanni Yes, but it is published in 72_scores.cf with a trivial score: score URI_TRY_3LD 0.001 0.001 0.001 0.001 -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
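As a rough illustration of the difference between the commented-out line and the published one, this Python sketch parses `score` lines the way a reader would. It is not SA's config parser; the function names are hypothetical. The four-column meaning (no Bayes and no network tests, network only, Bayes only, both) follows the Mail::SpamAssassin::Conf documentation.

```python
def parse_score_line(line):
    """Return (rule_name, [four scores]) for an active 'score' line, or None if commented out or unrelated."""
    fields = line.split("#", 1)[0].split()
    if len(fields) < 3 or fields[0] != "score":
        return None
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        values = values * 4  # a single score applies to every score set
    return fields[1], values


def score_for(values, bayes_enabled, net_enabled):
    """Pick the column for the active score set: 0 none, 1 net only, 2 Bayes only, 3 both."""
    index = (2 if bayes_enabled else 0) + (1 if net_enabled else 0)
    return values[index]
```

The line `#score URI_TRY_3LD 2.000 # limit` yields nothing under this parsing, while the 72_scores.cf line yields 0.001 for every score set.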
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK
On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:

> Do the standards really require a message id to be in all lower case?

Of course not, and that's also not an accurate description of MSGID_SPAM_CAPS.

Only a small minority of rules in SA are based on any external standard. They are empirical and pragmatic, not legalistic. There is a complex analysis of multiple mail streams used to generate scores for the rules and to decide which rules are good enough to publish in updates, run on a daily basis because it takes most of a day to run. The fact that MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or developer's tag prefix) implies that at some point in the past it was reliable enough as an indicator of spam to be part of the default set.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: plugin: eval failed: __alarm__ignore__(xxx) how to troubleshoot
On 20 Apr 2018, at 14:50 (-0400), John Hardin wrote: Given your findings, I kinda suspect *all* of the tflags=multiple rules are misbehaving from time to time under 3.3.1 - the compiled code may be getting into an infinite loop somehow if the number if *real* hits on the rule exceeds some value - I note there were 17 hits on "your business" there. Not ALL rules... Unless I'm addled by the past 2 days of fever, it looks like an example of this SA bug caused by a perl bug: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6558 I'm surprised RH didn't backport the fix for either perl or SA. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: anyone recognize these headers? From SA or are they from another spam product?
On 24 Apr 2018, at 20:10 (-0400), L A Walsh wrote: These headers (not these values) are in most or all of my emails. In one email on the net they were adjacent to SA's headers (but they aren't in my emails). I was wondering if anyone knew what product might be inserting these headers: X-CSC: 0 X-CHA: v=1.1 cv=6jkfEoj2u7Yj9etNrzOg8LH7MfGxzbc6Xn0EJkmycus= c=1 sm=1 a=nDghuxUhq_wA:10 a=CxQU8S3nryls5r8B3V4N1Q==:17 a=3Y9Ew-73vc-33Fzs_NIA:9 a=wPNLvfGTeEIA:10 a=z11Dn8fxQD8A:10 a=Pmo6RyrIMpYA:10 a=zoqau9DHoPcA:10 a=zE7RolXeqPMA:10 a=CxQU8S3nryls5r8B3V4N1Q==:117 X-CTCH-Spam: Unknown X-CTCH-RefID: str=0001.0A020207.521CE122.0254,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0 X-WHL: SLR The X-CTCH-* headers are a sign of filtering software from Cyren (formerly Commtouch,) which has been resold or integrated by multiple vendors of commercial email filtering products, including Sophos and Ipswitch. I don't know if it is related, but some evidence of scanning by something called 'ironport', as well as by Semantec. I'm trying to track down what is scanning my email at an upstream mail host as they've rejected random emails on initial rcpt of the msg -- without accepting the message and bouncing it, but just not accepting it with the message: User and password not set, continuing without authentication. 64.29.145.41 failed after I sent the message. Remote host said: 550 5.7.1 vB73jgO3003858 This message has been blocked for containing SPAM-like characteristics. What email SW censors things by rejecting them before accepting them? That is not a unique feature, and is widely regarded as a best practice. A MTA which accepts mail and later decides that it is spam has an insoluble problem: pass along mail which is probably malicious, bounce it to an inherently untrustworthy sender address that may belong to an innocent victim, or drop it silently. 
Since this mail is being rejected immediately, you have an obvious place to go to get the problem fixed: whoever runs the server you're submitting mail to. Presumably that is an entity with whom you have a direct relationship. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: SpamAssassin 3.4.2.
On 17 Apr 2018, at 16:54, John Hardin wrote: On Tue, 17 Apr 2018, David Jones wrote: On 04/17/2018 03:29 PM, Kevin A. McGrail wrote: Dave, why would it go into EPEL? SpamAssassin is a core RPM. I will be updating my main SA platform servers to CentOS 7 this summer so this should be good timing to get SA 3.4.2 from the core repo update. :) RHEL 7 / CentOS 7 core is still on SA 3.4.0 - I had to manually roll my own SA 3.4.1 RPMs from Fedora SRPMs. Anybody here from RH that can commit to packaging SA 3.4.2 for a RHEL 7 core update or explain why it's behind? It's a Red Hat long-standing stability policy. They backport security and some bugfix patches (which is why they have a version '3.4.0-2' RPM) but they do not generally import any upstream version updates that have any potential backward compatibility risk at all except at major EL version releases. So EL7 systems will never get anything but a patched 3.4.0. If you want to track current releases of software on something like RHEL, use Fedora.
Re: SpamAssassin 3.4.2.
On 17 Apr 2018, at 18:13, David Jones wrote: Why hasn't the packaging in RHEL/CentOS been updated to 3.4.1? At my last job where there were supported RHEL machines, I asked a RH support person a similar question regarding Postfix and got the answer: "If you want Fedora, you know where to get it."
Re: SpamAssassin 3.4.2.
On 17 Apr 2018, at 16:38, David Jones wrote: On 04/17/2018 03:29 PM, Kevin A. McGrail wrote: Dave, why would it go into EPEL? SpamAssassin is a core RPM. Oh yeh. I guess because it's been so long since we had an update and my main boxes are running CentOS/SL 6.9 that I forgot it was a core package. The CentOS 5 and 6 boxes out there aren't going to get the new version unless it gets put in some other repo like EPEL or another third party since they are not getting any updates. My understanding of EPEL policy is that its packages never replace the EL base packages. It is often possible to install RPMs from the Fedora updates repos that are analogous to your EL/CentOS version.
Re: Differing scores on spamassassin checks
On 16 Apr 2018, at 19:01 (-0400), John Hardin wrote: On Mon, 16 Apr 2018, Computer Bob wrote: Why should sa-learn not be run as root ? That's a general safe practice. Do as little as root as you possibly can. Why risk a root crack from an unknown bug in sa-learn that somebody has discovered and figured out how to exploit via email? Right: don't let malicious strangers talk to root, even via email. ALSO: sa-learn itself won't stop you from running it as root. Without a global bayes_path, it will learn into ~root/.spamassassin/bayes_* files which no other user can access and spamd can't even TRY to use because it refuses to run as root and drops to 'nobody' if run by root. With a global bayes_path, the bayes_* files will become owned by root and everything else trying to use them (i.e. everything) will fail. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Lots of money, score of 0??
On 27 Mar 2018, at 10:24, Robert Boyl wrote: Guys, Do you usually tune up Lots of money rule? Strange, our spamassassin/EFA scores 0 and false negative. Imho it should score at least something, few people would write Million dollars in an email, why not add up score? LOTS_OF_MONEY 0.00 See https://pastebin.com/dY6iFeYL I see a very large number of legitimate and definitely wanted messages hitting the LOTS_OF_MONEY rule. 849 in my own mail in the past year, excluding mail with quoted spam. This includes YOUR message asking about it.
Re: This sucks
On 1 Apr 2018, at 12:26 (-0400), Michael Brunnbauer wrote:

> So let's look at my problem again: running my example spam through spamassassin gets it marked as spam while using spamc+spamd does not.

This is a critical fact. It indicates that your spamd and the spamassassin script you are running are definitely using different SpamAssassin configurations, possibly different versions of the SpamAssassin distribution, and/or possibly even different versions of Perl.

Determining what config the spamassassin script is using is fairly easy: 'spamassassin -D generic,config,diag --lint' will give you all the details. Figuring out what spamd is using is less simple (and system-specific) but since you've been maintaining a system by hand for a long time I expect you'll be able to figure out how to do so safely.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Spam from addresses where full name mirrors left-hand side of address
On 2 Apr 2018, at 1:33 (-0400), Rich Wales wrote: [I tried asking this question a couple of days ago, but I've seen no signs that it made it out to the list -- possibly because the sample e-mail addresses I included in my question might have caused it to be flagged as spam. So here goes again, this time with the addresses mangled a bit.] I see a lot of spam with "From:" lines where the left-hand side of the address is essentially the same (modulo punctuation) as the "full name" portion of the address. The right-hand side, on the other hand, is a random gibberish domain. A few examples currently sitting in my local server's spam quarantine (with the addresses edited so they hopefully won't trigger any spam checks): Adding To Human Lifespan "Eliminate Fat Fast" jeanettejtaylor (dot) com> "Home Warranty Special" racerville (dot) com> Smartphone Screen Protector dtqmp (dot) com> Two questions: Is it *technically possible* to create a Spamassassin rule which would match this sort of "From:" line? This (UNTESTED) should do it: header THREE_WORD_MONTY From =~ /(\w+) (\w+) (\w+) <\1.\2.\3/ And assuming it can be done, is it *worthwhile* to do it? Not a clue. Maybe worth a try? -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
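Since the rule above is marked UNTESTED, one cheap way to poke at the pattern before putting it in a .cf file is to try it in Python, whose regex syntax is close enough for this expression. The sample headers in the usage note are invented for illustration (the addresses in the original question were mangled), and the ignore_case switch is an optional variation, not part of the rule as posted.

```python
import re


def three_word_match(from_header, ignore_case=False):
    """Apply the posted pattern: three words, then '<' followed by the same three words with separators."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"(\w+) (\w+) (\w+) <\1.\2.\3", flags)
    return pattern.search(from_header) is not None
```

As written, the pattern is case-sensitive, so `Home Warranty Special <home.warranty.special@...>` only matches with the case-insensitive variation. The unescaped `.` also accepts any single separator character between the words.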
Re: This sucks
On 1 Apr 2018, at 21:09 (-0400), Michael Brunnbauer wrote:

> [...]
>> Figuring out what spamd is using is less simple (and system-specific) but since you've been maintaining a system by hand for a long time I expect you'll be able to figure out how to do so safely.
>
> This does not sound very helpful of you

Well, it's rather difficult to guess what mechanism you are using to start spamd, and when I wrote that I had not seen your message narrowing down the range of possibilities to those one might use on Linux.

> so I did some debugging on my own and have more information:

So I guess I was right?

> The problem occurs only when spamd is started in the homedir of root. If I start it in any other directory (including subdirs of /root), Net::DNS behaves like it should: $answer->rdatastr in dnsbl_uri in Dns.pm contains IP addresses in dotted quad notation, like 127.0.0.3. If I start spamd in /root, $answer->rdatastr contains strings like "\# 4 7f03" instead. This occurs regardless of any -x or -u flags to spamd.

OK, so that is quite odd. Is there a tree of Perl modules under /root? Perl before 5.26 includes '.' in the @INC array that defines where modules might live, which could cause this if you have a private install tree (as is the default in some distros.) Also, if you run spamd as root it drops to 'nobody' but if you're in /root and /root/.spamassassin is world-readable, it will still get used as the user prefs directory.

A normal startup of spamd (by sysvinit, Upstart, systemd, etc.) is what you need to diagnose, not a manual startup from a login shell. None of those normally should put the daemon in /root as a working directory. Run as a proper system daemon by the normal startup subsystem, spamd gets a substantially different environment than if you run it by hand from an interactive shell.

> So being in /root when started changes the behavior of spamd. Is it possible that this is a timing issue? Could "\# 4 7f03" be some unprocessed response that would be converted to 127.0.0.3 a moment later? Or is there some other explanation for this?

No, it's not a timing issue.

The root cause is that Net::DNS::RR->rdatastr() should never have been relied upon by SA to have any particular format because it was always poorly documented and quietly vanished from the documentation (but not the code) for Net::DNS::RR.pm in 0.69. What it actually contains is a function of the specific DNS record and what server generated the response, making an explanation for any specific oddity something of a guessing game. More recently, there have been multiple other changes in various components of the Net-DNS distribution that have caused other problems in SA, and they may interact with the rdatastr issue.

These issues have all been addressed in the current SA code, both in the 'trunk' and in the 3.4 branch which will (hopefully soon) become the 3.4.2 release. Many (most? all?) packagers of SA maintaining it for major platforms have incorporated some or all of the necessary DNS-related fixes. I've attached a patch that aggregates all of the fixes to this message. You could also install SA from the current 3.4 branch or the last 3.4.2 release candidate package, or if you're adventurous, from the SVN 'trunk' that will eventually yield v4.0.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole

Index: lib/Mail/SpamAssassin/Plugin/AskDNS.pm
===
--- lib/Mail/SpamAssassin/Plugin/AskDNS.pm (.../tags/spamassassin_release_3_4_1/lib/Mail/SpamAssassin) (revision 1676603)
+++ lib/Mail/SpamAssassin/Plugin/AskDNS.pm (.../branches/3.4/lib/Mail/SpamAssassin)(working copy)
@@ -140,7 +140,7 @@
 multiple character-strings (as defined in Section 3.3 of [RFC1035]), these strings are concatenated with no delimiters before comparing the result to the filtering string. This follows requirements of several documents,
-such as RFC 5518, RFC 4408, RFC 4871, RFC 5617. Examples of a plain text
+such as RFC 5518, RFC 7208, RFC 4871, RFC 5617. Examples of a plain text
 filtering parameter: "127.0.0.1", "transaction", 'list' . A regular expression follows a familiar perl syntax like /.../ or m{...}
@@ -192,10 +192,9 @@
 use Mail::SpamAssassin::Util qw(decode_dns_question_entry);
 use Mail::SpamAssassin::Logger;
-use vars qw(@ISA %rcode_value $txtdata_can_provide_a_list);
-@ISA = qw(Mail::SpamAssassin::Plugin);
+our @ISA = qw(Mail::SpamAssassin::Plugin);
-%rcode_value = ( # http://www.iana.org/assignments/dns-parameters, RFC 6195
+our %rcode_value = ( # http://www.iana.org/assignments/dns-parameters, RFC 6195
 NOERROR => 0, FORMERR => 1, SERVFAIL => 2, NXDOMAIN => 3, NOTIMP => 4, REFUSED => 5,
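For readers puzzled by the `\# <length> <hex>` strings mentioned in the message above: that shape is the generic "unknown record" text form defined in RFC 3597. The Python sketch below shows how such a string maps back to a dotted quad. It is not what the attached patch does, the function names are invented, and the sample value `7f000003` in the usage note is my own full four-octet payload; the shorter string quoted in the thread is left as reported.

```python
import ipaddress


def decode_generic_rdata(text):
    """Decode RFC 3597 generic RDATA text (backslash-hash, length, hex) into raw bytes."""
    tokens = text.split()
    if len(tokens) < 2 or tokens[0] != "\\#":
        raise ValueError("not RFC 3597 generic RDATA")
    length = int(tokens[1])
    data = bytes.fromhex("".join(tokens[2:]))
    if len(data) != length:
        raise ValueError("declared length does not match payload")
    return data


def rdata_to_ipv4(text):
    """Return dotted-quad text whether the input is already an address or generic RDATA."""
    if text.startswith("\\#"):
        return str(ipaddress.IPv4Address(decode_generic_rdata(text)))
    return str(ipaddress.IPv4Address(text))
```

Under these assumptions, a four-octet payload of `7f000003` comes back as 127.0.0.3, the same value the normal presentation form would give.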
Re: T_DKIM_INVALID false positives with Gmail
On 19 Mar 2018, at 11:29, Sebastian Arcus wrote: I've been seeing a number of false positives recently from T_DKIM_INVALID with Gmail emails. Are some Gmail servers misconfigured, or could something be going on at my end? The DKIM record which is flagged as invalid is below: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20161025; h=mime-version:from:date:message-id:subject:to;bh=8wlgvdpEOmUO2ugslPxRkFYA/ZThwu2bWy5VmlR76ug=; b=gRcnOIzmENqS8a91mSdETdXvyH6df7u0tSwsadk6CMD0KtAbzuM3ojHW+kPEo7AB1i vnbCDc/vsR6H7pP0k3hZmF7z/dAaeZWD4RVzqM+Fv70oHy4af64j+fGSekOCM9o4ShRQ Vk3KyF+69sKTK3rRWEnfrcgi/pN2DJWDvrIBRjmFOZYKNVN+8elaVM9DOO7tEMLYuw7T +sVaUMNt8MuPxRhrskJYOIxK8zzkcJHYV+1TuWJuqZAHRVwgnDWX7q3Wx0GwrX+3lKpm 3A1+F5dBVjH4dXvdfIESm5XpV8b9uBn9daGWrUgkR+PB23XsL9QkxEqCRXdgII3FRxtQ Ps6A== There are LOTS of ways to break a DKIM signature. Whether that one is broken can't be checked and how it might have been broken can't be guessed at without the full *unmodified* headers and body of the message.
Re: Spam from compromised accounts scoring just under block threshold
On 5 Mar 2018, at 15:14, David Jones wrote: FYI This could be something for KAM.cf potentially... I have seen a few of these this morning that would be scoring just under the default SA threshold of 5.0 and are just under my MailScanner 6.0 threshold. https://pastebin.com/r2eZJaef I am reporting these to Spamcop but new waves of compromised accounts keep sending them. They all seem to have a From address with two periods on the left side so something like this: header __ODD_FROM_SPAM From:addr =~ /.{1,20}\..{1,20}\..{1,20}@/ could be combined with something else in a meta to help detect these and push them over the edge. This looks intrinsically shady and could be useful:
Re: Why emails relayedfrom trusted/internal networks trigger rules?
On 26 Apr 2018, at 3:04 (-0400), Palvelin Postmaster wrote: Hi, I relay mail from another server to my main mail server. I have set its IP 52.28.104.67 in my spamassassin conf in the internal_networks and trusted_networks. I assumed that would prevent spamassassin from scanning the messages but no. Why does this happen? Even when SA can recognize that a message is coming via only trusted systems, by default it does not exempt the message from other scanning, it simply hits the ALL_TRUSTED rule. That rule normally has a significant negative score itself and is used to prevent matching in many 'meta' rules. X-Spam-Status: Yes, score=6.1 required=5.0 tests=AWL,DKIM_ADSP_NXDOMAIN, HELO_DYNAMIC_IPADDR,NO_DNS_FOR_FROM,RDNS_DYNAMIC,T_RP_MATCHES_RCVD autolearn=disabled version=3.4.1 Since proper determination of the "X-Spam-Relays-*" pseudo-headers controls most of those hits as well as ALL_TRUSTED, getting that fixed will almost surely be adequate. It will also help with other rules that depend on identifying the boundaries between internal vs. external and/or trusted vs. untrusted Received headers. Received: by palvelin.fi (CommuniGate Pro PIPE 6.2.3) That may be your first problem. SA can't parse that as a proper Received header, which may trigger it to not classify the rest of the Received headers correctly. It's hard to tell if this is causing trouble in this case, because there are problems with the rest as well. With that said, if you can make CGP use SA via an external filter rather than delivering through the PIPE module, you'll get a more robust and performant solution without this oddball Received header. For the 8+ years I ran CGP systems, I used the free cgpav filter, but for modern CGP that needs some patching to work. I seem to recall that cgpsa is also a free tool that works. 
Received: from [52.28.104.67] (HELO ip-172-31-20-213.eu-central-1.compute.internal) by palvelin.fi (CommuniGate Pro SMTP 6.2.3) with ESMTPS id 10108357 for i...@.com; Mon, 23 Apr 2018 06:35:44 +0300 The Postfix MTA running on the AWS instance using 52.28.104.67 is grossly misconfigured. It should use a EHLO/HELO name ( myhostname, or smtp_helo_name if there's a reason myhostname can't be changed) that resolves to 52.28.104.67 and also should have proxy_interfaces set to 52.28.104.67. It appears that ec2-52-28-104-67.eu-central-1.compute.amazonaws.com would be a good choice, but if that machine talks to anyone else you may want to post a non-generic name at the IP and use that. Received: from ip-172-31-26-125.eu-central-1.compute.internal (ip-172-31-26-125.eu-central-1.compute.internal [172.31.26.125]) by ip-172-31-20-213.eu-central-1.compute.internal (Postfix) with ESMTP id ECF2CC0C32 for <i...@obesus.fi>; Mon, 23 Apr 2018 06:35:43 +0300 (EEST) At this point, SA should have already given up parsing Received headers so the fact that this and the remaining Received headers use RFC1918 IPs and a generic name in a non-resolvable domain doesn't matter: SA cannot trust these because the chain of trust and working DNS is already broken. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Method of setting score for a custom rule to be the required_score ?
On 27 Jun 2018, at 22:17, J Doe wrote: I went back to “man Mail::SpamAssassin::Conf” and can see mention of the shortcircuit plugin . . . is there more documentation (perhaps in another man or perldoc), where the shortcircuit keyword is mentioned ? perldoc Mail::SpamAssassin::Plugin::Shortcircuit For any Perl module that has embedded 'pod' documentation, 'perldoc' provides the best documentation because it is extracted from the actual module rather than relying on a 'man' page that was almost certainly extracted from the module originally but may be stale.
Re: Error 74 with spamc
On 21 Oct 2018, at 21:14, Cecil Westerhof wrote: When executing spamc I do not get output and the exit status is 74 (EX_IOERR: IO error). This would be the result of spamc not being able to communicate with spamd. Is spamd running? Is spamd listening on the socket that spamc is trying to connect to? The man pages for spamc and spamd can help you understand how to determine the answers to these questions. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steadier Work: https://linkedin.com/in/billcole
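One quick way to answer the "is spamd listening?" question without reading man pages is a plain TCP connect test. This Python sketch assumes spamd is on its usual TCP port 783 on localhost; if your spamc/spamd pair uses a different port or a UNIX socket, the check has to change accordingly. The function name is made up for illustration.

```python
import socket


def spamd_reachable(host="localhost", port=783, timeout=2.0):
    """Return True if something accepts a TCP connection on the given host and port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connect only shows that a listener exists. It says nothing about whether spamd will accept the particular command spamc sends, so the spamd log is still the next place to look.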
Re: Error 74 with spamc
On 22 Oct 2018, at 11:08, Cecil Westerhof wrote: "Bill Cole" writes: On 21 Oct 2018, at 21:14, Cecil Westerhof wrote: When executing spamc I do not get output and the exit status is 74 (EX_IOERR: IO error). This would be the result of spamc not being able to communicate with spamd. Is spamd running? Yes, spamd is running. Is spamd listening on the socket that spamc is trying to connect to? The man pages for spamc and spamd can help you understand how to determine the answers to these questions. I should have looked into the logs. :'-( Always a wise choice. When I run it again I see in the logging: Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: connection from localhost [::1]:58764 to port 783, fd 5 Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: setuid to imaps succeeded Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: service unavailable: TELL commands are not enabled, set the --allow-tell switch. Oct 22 16:47:15 munus.decebal.nl spamd[17101]: prefork: child states: II It is a bit strange. I had the same problem 1½ year ago. I solved it by adding --allow-tell switch in the service file. Now it contained: ExecStart=/usr/sbin/spamd -d --pidfile=/var/run/spamd.pid $OPTIONS I do not see the OPTIONS defined. I substituted --allow-tell for $OPTIONS and restarted the service. Now it works again. But why the service file has been changed … That would be an issue for whoever packages SA for your system. There is no systemd service file distributed in the SA release. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: URI_WPADMIN fp
On 19 Oct 2018, at 9:37, Alex wrote: Hi, Should we be adding 3 points for just this, or is there never a reason users should be using /wp-admin in their URLs? The score is coming out of RuleQA, so the score is derived empirically, not by a logical process based in arbitrary axioms. That doesn't mean it's the one true score for everyone, just that it's a useful score in the context of the spam and ham corpora submitted to RuleQA. If it causes actual FPs (i.e. ham that is identified as spam, NOT ham identified as ham that happens to hit a strong spam rule but scores below the threshold) then it is probably a good idea to limit its score in RuleQA or to examine the FPs to find ways to narrow the rule. I see that John has the basic rigging in place to allow for narrowing via meta conditions, so presumably he anticipated the possibility. Oct 19 09:33:11.561 [1299] dbg: rules: ran uri rule __URI_WPADMIN ==> got hit: "/wp-admin/images/" The rule description says possible phishing, but how would an end-user be in a position to create a public link that involves their WP admin directory in the first place? Think more carefully about that question. As written it seems much more naive than you can actually be. 2 hints: 1. WordPress is probably the most frequently compromised server software in the history of the web, excluding Microsoft products. 2. If a website isn't built on WordPress (as most are not) there is nothing in any way special about a 'wp-admin' token in a functioning URL. I'd offer to demonstrate that with my own website, but I'm not in a mood to disable the trap that converts every request for a WordPress-like URL into a firewall rule and DNSBL entry...
Re: Status Authenticated Received Chain (ARC) Support
On 17 Oct 2018, at 14:27, Markus Kolb wrote: Hi, what is the status of ARC Support (https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)? It is not supported in any way in SA as of 3.4.2 and I am unaware of anyone proposing an operational model for supporting it. There is no supporting code in the current 'trunk' codebase. If someone were to provide a reasonable model for supporting ARC in SA and a sound implementation, I would expect that it *could* make the 4.0.0 release. This would be made somewhat more likely by the draft progressing to a final RFC, but the critical component is really a well-designed implementation that provides some utility in determining whether or not mail is spam. It is worth noting that the utility of DKIM and hence DMARC to that end has been marginal. Also note that while there will be a 3.4.3 release, there is no chance of this or any other completely new feature being added for it, as 3.4.3 is intended to be the final bug fix release for the 3.x lineage. The perl Mail-DKIM module has ARC support since version 0.50 (https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm) Notably, that support is documented as being 10 draft revisions behind the current one, so it might not be wise to actually use it... Does SpamAssassin use this feature from Mail-DKIM if this version or newer is available? No.
Re: KAM_Back rule
On 26 Oct 2018, at 15:13, John wrote: I just got an email from a mailing list of which i am a member (UK academic geophysics) which was scored at 5, mainly from a 5.5 contribution from KAM_BACK, described as background check SPAM. I have not managed to work out what that rule is trying to do, but it is the first detected oh-nasty from using the KAM rules. Clearly I can reduce the score but I am struggling to see what was wrong with the message, attached. There's nothing wrong with the message, the rule is too aggressive. It consists of 5 sub-rules, 3 body and 2 header for From and Subject. Hitting any three satisfies the meta-rule. It seems to be targeted at spam selling criminal and/or financial background reports (which is a real market here in the US, where we have no serious privacy laws...) Unfortunately, it does not seem to be constructed with an appreciation for the fact that people discuss criminality in non-spam. Personally, I just zeroed the score for that on my personal system. Thanks for bringing it to light. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: Rule for a link with an numeric IP in body?
On 29 Oct 2018, at 9:55, Anders Gustafsson wrote: Is there such a rule already in 3.3.x? Do not run SpamAssassin 3.3.x. It is not safe. There have been multiple serious security bugs fixed in the 3.4.x series. However, the rules for 3.3.x and 3.4.x are identical. And yes, the rule "NORMAL_HTTP_TO_IP" will catch http(s) URLs using dotted-quad IPs and "NUMERIC_HTTP_ADDR" will catch http(s) URLs using a single decimal number (which some resolvers will treat as an IP address) I would ideally want a version of that that adds to the spam score if it sees a x.x.x.x/unsubscribe link, possibly translated. IP_LINK_PLUS does something similar that could be adapted pretty easily. Asking here as regexps are not really my strong side. Well, if you look in the standard rule distribution (under /var/lib/spamassassin/ or somewhere similar, depending on your platform) you will find the file 20_uri_tests.cf, in which all of the standard URI-based rules are defined, many with comments. Even if you're not a wizard with regexps, you may be able to find rules there which you can adapt to your own needs with simple changes.
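For anyone in the same position of not being fluent in regexps, it can help to try a candidate pattern against sample URLs before putting it into a rule file. The following is only a sketch in Python, not a SpamAssassin rule: the pattern and the function name are hypothetical and are not copied from 20_uri_tests.cf or from IP_LINK_PLUS. It covers only the literal word "unsubscribe", not translations.

```python
import re

# Hypothetical pattern, NOT taken from any shipped SpamAssassin rule:
# an http(s) URL whose host is a dotted-quad IP (optionally with a port)
# and whose path mentions "unsubscribe".
IP_UNSUB = re.compile(
    r"^https?://\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?/\S*unsubscribe",
    re.IGNORECASE,
)


def looks_like_ip_unsubscribe(url):
    """Return True if the URL points at a bare IP and mentions unsubscribe."""
    return IP_UNSUB.search(url) is not None


if __name__ == "__main__":
    for sample in (
        "http://192.0.2.10/unsubscribe?id=1",
        "https://example.com/unsubscribe",
        "http://192.0.2.10/index.html",
    ):
        print(sample, looks_like_ip_unsubscribe(sample))
```

SpamAssassin rules use Perl regexps, so a pattern tested this way still needs to be checked again once it is written as a 'uri' rule.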
Re: config files in spamasassin is unintended tlds :/
On 5 Nov 2018, at 9:44, RW wrote: I created an A-record at Namecheap for a_b.mydomain.tld and neither firefox nor chromium had a problem with it. That's interesting and unfortunate because 'a_b' is unequivocally a violation of the syntax for hostnames. It may be acceptable as a DNS label, but it isn't a valid hostname. FWIW, BIND 9.x (since 9.4-ish) will parse and load a zone with such an A in it, but complains and does not serve the record: NXDOMAIN for a normal query, no hint of it in a zone transfer. Deep in the mists of time, the resolver for 'classic' MacOS (not derived from any other resolver) got an update that made it no longer resolve hostnames with underscores and while there was a brief bit of grumbling, they never reversed that stringency. I would guess that with some authoritative servers refusing to serve invalid names and some resolvers refusing to resolve them, it would be a low-yield tactic to use them to evade filtering.
Re: private networks are default rbl tested :/
On 5 Nov 2018, at 20:04, RW wrote: On Mon, 05 Nov 2018 23:37:59 +0100 Benny Pedersen wrote: https://en.wikipedia.org/wiki/Private_network why are this network not default internal_networks trusted_networks msa_networks They are if you let SA guess your networks. If you specify the networks manually you have to specify everything And the reason for that is simply that not everyone trusts all of the machines on reachable RFC1918 networks. For example, I worked for some years at a multinational where 10/8 was allocated globally and was routed globally. I had a list of specific non-local machines I was supposed to trust for outbound relay (and use when my outbounds couldn't use the local external link) but there was no way I could also trust the tens of thousands of other 10.* machines around the world that could very well be compromised personal desktops. I didn't even trust my own local personal desktops.
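The distinction between "is in RFC1918 space" and "is a machine I trust" can be sketched in a few lines of Python. This is an illustration only: the two relay addresses are made up, and this is not how SpamAssassin evaluates trusted_networks internally.

```python
import ipaddress

# The three RFC 1918 private ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical list of specific relays an admin has chosen to trust.
TRUSTED_RELAYS = {
    ipaddress.ip_address("10.20.30.40"),
    ipaddress.ip_address("10.99.1.25"),
}


def is_rfc1918(addr):
    """True if addr falls inside any RFC 1918 range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def is_trusted(addr):
    """True only for the explicitly listed relays, private or not."""
    return ipaddress.ip_address(addr) in TRUSTED_RELAYS


if __name__ == "__main__":
    for addr in ("10.20.30.40", "10.1.2.3", "192.0.2.1"):
        print(addr, "private:", is_rfc1918(addr), "trusted:", is_trusted(addr))
```

The point is that 10.1.2.3 is private but not trusted, which is exactly the case a blanket "all private networks are trusted" default would get wrong.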
Re: Bayes underperforming, HTML entities?
On 7 Nov 2018, at 14:33, Amir Caspi wrote:

> Hi all,
>
> In the past couple of weeks I've gotten a number of clearly-spam messages that slipped past SA, and the only reason was because they were getting low Bayes scores (BAYES_50 or even down to BAYES_00 or BAYES_05). I do my Bayes training manually on both ham and spam so there should not be any mis-categorizations... and things worked fine until a few weeks ago, so I don't know what's going on now.
>
> Here's the magic dump:
>
> -bash-3.2$ sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 253112 0 non-token data: nspam
> 0.000 0 106767 0 non-token data: nham
> 0.000 0 150434 0 non-token data: ntokens
> 0.000 0 1536087614 0 non-token data: oldest atime
> 0.000 0 1541617125 0 non-token data: newest atime
> 0.000 0 1541614751 0 non-token data: last journal sync atime
> 0.000 0 1541614749 0 non-token data: last expiry atime
> 0.000 0 5529600 0 non-token data: last expire atime delta
> 0.000 0 1173 0 non-token data: last expire reduction count
>
> I don't see any obvious problem but I'm not an expert at interpreting these...

The only useful info is that the numbers of spams and hams scanned (nspam and nham) are well above the usage threshold and that the various timestamps (other than 'oldest atime') are reasonably recent. If you happen not to live in Unix epoch time, the conversion is not hard:

# date -j -f %s 1541617125
Wed Nov 7 13:58:45 EST 2018

> Do I need to completely trash and rebuild my DB, or am I missing something obvious?

No and no. Although it is perhaps helpful to recognize that Bayes is inherently imperfect and will always be wrong about some messages.

> In many cases, it would appear that these spams have either very little (real) text (besides the usual attempt at Bayes poisoning) and/or are using HTML-entity encoding to try to bypass Bayes. Here are a couple of spamples:
>
> https://pastebin.com/peiXZivJ
> https://pastebin.com/3h3r7r7j

Those both have broken MIME structure, so SA can't treat the HTML part as HTML. No MUA would render and display them correctly. Assuming that you did that breakage yourself, intentionally: Stop doing that. It is pointless and hampers any attempt to assist you. The only things that could ever be private about spam are the target address and internally-added headers.

> Does SA decode HTML entities as part of normalize_charset? If not ... can this be added?

I'm not entirely certain, but the documentation of bayes_token_sources in Mail::SpamAssassin::Conf implies that HTML is rendered to text to the point where SA can tell whether it is visible, which makes me suspect that the entities get decoded. But that IS just a guess: I haven't traced the code.

Empirically, I had SA learn a message with regular text in an HTML part encoded as entities and then scanned a message with the same text as text, and I got a 1.000 Bayes score (BAYES_999) for the second one. YMMV

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
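The 'date -j -f %s' form shown above is BSD/macOS syntax. As a portable alternative, here is a small Python sketch that converts the atime values from the magic dump; the function name is just for illustration, and the output is in UTC rather than local time.

```python
from datetime import datetime, timedelta, timezone


def atime_to_utc(epoch_seconds):
    """Format a Unix epoch timestamp (as in 'sa-learn --dump magic') in UTC."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # 'newest atime' from the dump; 18:58:45 UTC is 13:58:45 EST.
    print(atime_to_utc(1541617125))          # 2018-11-07 18:58:45 UTC
    # 'last expire atime delta' is a duration in seconds, not a timestamp.
    print(timedelta(seconds=5529600).days)   # 64
```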
Re: Warnings when enabling URILocalBL plugin
On 8 Nov 2018, at 17:26, Kevin A. McGrail wrote: > There are a lot of changes to GeoIP having to do with the database behind > it being deprecated. I think you might have to look at all the GeoIP stuff > and would appreciate your feedback. Bill, do you remember who was working > on all the GeoIP stuff? Giovanni mostly. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: Bayes underperforming, HTML entities?
On 8 Nov 2018, at 21:55, John Hardin wrote: On Thu, 8 Nov 2018, Amir Caspi wrote: On Nov 8, 2018, at 7:41 PM, John Hardin wrote: Sure, but I'd also prefer to have a sample to test on before committing. I'll see if I can get the pastebin to work (i.e. fix the boundary) I can send you some new spamples via attachment, privately. No, the pastebinned ones work unaltered. The problem with that: they are mangled in a way that prevents HTML interpretation of the HTML part. Hence a 'body' rule will match the uninterpreted entities. For the real world, i.e. with proper MIME structure, I think you need a 'rawbody' rule to match against the uninterpreted entities. I have confirmed that de-munging the boundaries fixes them to allow proper MIME interpretation. In both cases, the boundary line between the 2 MIME parts is apparently unchanged, so you can just use it to fix the other 3 places that it needs to match. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
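The difference between matching rendered text and matching the uninterpreted entities can be illustrated outside SA. The sketch below uses Python's html.unescape as a stand-in for HTML rendering; it is not SpamAssassin's code, and the sample string and both patterns are made up for the illustration.

```python
import html
import re

# Made-up sample: the words "Free offer" written as numeric HTML entities.
encoded = "&#70;&#114;&#101;&#101; &#111;&#102;&#102;&#101;&#114;"
decoded = html.unescape(encoded)

# A pattern aimed at rendered text only sees the words after decoding.
rendered_pattern = re.compile(r"free offer", re.IGNORECASE)

# A pattern aimed at the raw source looks for runs of numeric entities.
raw_pattern = re.compile(r"(?:&#\d{2,3};){4,}")

if __name__ == "__main__":
    print(decoded)
    print("rendered pattern on raw text:   ", bool(rendered_pattern.search(encoded)))
    print("rendered pattern on decoded text:", bool(rendered_pattern.search(decoded)))
    print("raw pattern on raw text:        ", bool(raw_pattern.search(encoded)))
```

This mirrors the point above: a rule aimed at the entities themselves has to look at the raw part, because once the HTML is interpreted the entities are gone.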
Re: googleapis hosted phish
On 15 Nov 2018, at 7:52, RW wrote: On Thu, 15 Nov 2018 01:22:00 -0500 Bill Cole wrote: On 14 Nov 2018, at 20:11, Alex wrote: Where is it getting these long hostname strings from? There's a bunch of garbage HTML using invisible text (font-size: 0) between tiny bits of visible text to break Bayes and/or specific word detection. That particular example is actually html in a text/plain mime section. The mess in the text/plain part is a result of a botched rendering/tag-stripping of the insane text/html part, but yes: the specific misidentified domain name is in the plain part and is a result of a line-breaking artifact inside the rendered HTML. -- Bill Cole
Re: googleapis hosted phish
On 14 Nov 2018, at 20:11, Alex wrote: Where is it getting these long hostname strings from? There's a bunch of garbage HTML using invisible text (font-size: 0) between tiny bits of visible text to break Bayes and/or specific word detection. The overly-thirsty "URI" parser strings this junk together and is seeing .az\b somewhere in it, and picks it up as a domain name. It's noisy in debug output but in this case harmless because what it is seeing includes a hostname that's too long to be a DNS label. FWIW, that junk can be detected with rawbody rules looking for idiosyncratic HTML. I don't publish my local rules which do that sort of thing because they are very useful but very evadable and I suspect that if the precise rules were broadcast, they'd stop being useful in a matter of days. Instead, it would be really good if everyone maintaining their own local rules would take that hint and devise an invisible forest of slightly different rules to catch HTML structures with no legitimate purpose, making it impossible for spammers to get around a single rule published in the default channel or KAM.cf or anything else known to be under spammers' watch. (CAVEAT: For some reason, a lot of opt-in political bulk mail also catches on such rules.) Should we be rethinking whether googleapis.com should be in the DNSBL skip list? I think it may deserve a special rule all its own (with extensive FP shielding) but I suspect that you will never see it in a URIDNSBL that is safe to use, so it would do no good to keep resolving storage.googleapis.com and other such names with short-TTL CNAME records pointing to shorter-TTL A records on a frequent basis only to determine that it will never get listed OR that you're using a URIDNSBL which intends to generate widespread collateral damage. Of course, I could be wrong. 
You could test how wrong I might be with this: clear_uridnsbl_skip_domain googleapis.com -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: config files in spamasassin is unintended tlds :/
On 4 Nov 2018, at 11:45, Grant Taylor wrote: > Why does it matter if there's a naming collision between DNS domain names and > file names? Discussion of config files for SpamAssassin and Postfix has intermittently been matched by URI DNSBLs. Some years ago I discovered just how widespread dumb bounce models were when I talked about the master config file for Postfix on the Postfix Users list, the same week that someone was spamvertising URLs under master (dot) cf. -- Bill Cole
Re: config files in spamasassin is unintended tlds :/
On 4 Nov 2018, at 16:27, Henrik K wrote: Can someone actually register and use a domain with underscore in it? No. It is worth noting that the SA "standard" for what is treated as a domain part of an URI is grounded in how MUAs behave, not in conformance to any well-defined specification. I recall a conversation I had either here or in a bug with Kevin McGrail some years back in which I argued that "could be a domain name in a URI" was too broad a definition and lost badly on the fact that most of my examples of "Not A URI" were in fact turned into clickable links by some horrific MUA. I support the concept of not treating domain-name-like strings that are not valid hostnames as if they are URI domain-parts. That would mean anything with an underscore. It MIGHT be more prudent to exempt leading-underscore labels, as those can be legal domain names that could have CNAME or DNAME records mapping them to working hostnames. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
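The policy floated above (reject names that are not valid hostnames, but perhaps exempt leading-underscore labels) could look roughly like this. It is a sketch of the idea only: the function and parameter names are hypothetical, and nothing here reflects what SA's URI parser actually does.

```python
import re

# A hostname label: letters, digits and hyphens, not starting or ending
# with a hyphen, at most 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def plausible_uri_host(name, allow_leading_underscore=True):
    """Return True if every label is a valid hostname label.

    With allow_leading_underscore, a single leading '_' on a label is
    tolerated, since such names can be legal DNS names.
    """
    labels = name.rstrip(".").split(".")
    if len(labels) < 2:
        return False
    for label in labels:
        candidate = label
        if allow_leading_underscore and label.startswith("_"):
            candidate = label[1:]
        if LDH_LABEL.fullmatch(candidate) is None:
            return False
    return True


if __name__ == "__main__":
    for name in ("72_scores.cf", "a_b.mydomain.tld", "_dmarc.example.com", "www.example.com"):
        print(name, plausible_uri_host(name))
```

Under this sketch a filename such as 72_scores.cf would no longer be considered a candidate domain, while a name with a leading-underscore label would still pass.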
Re: config files in spamasassin is unintended tlds :/
On 4 Nov 2018, at 14:48, Matus UHLAR - fantomas wrote: On 4 Nov 2018, at 11:45, Grant Taylor wrote: Why does it matter if there's a naming collision between DNS domain names and file names? Bill Cole skrev den 2018-11-04 19:25: Discussion of config files for SpamAssassin and Postfix has intermittently been matched by URI DNSBLs. Some years ago I discovered just how widespread dumb bounce models were when I talked about the master config file for Postfix on the Postfix Users list, the same week that someone was spamvertising URLs under master (dot) cf. On 04.11.18 19:48, Benny Pedersen wrote: Nov 3 03:22:50 localhost named[2301]: connection refused resolving '72_scores.cf/NS/IN': 2a04:1b00:6::1#53 [...] Oct 31 08:30:38 localhost named[2301]: connection refused resolving '20_imageinfo.cf/NS/IN': 2a04:1b00:6::1#53 so ns.cf blocks my named now, i cant resolve any cf domains with it time to change imho I recommend chasing who is treating those as URLs. That would be SpamAssassin itself. The policy of treating anything matching '[-a-zA-Z0-9_]+\.' as an URI in all contexts dates back to v3.3.1 at least. 
See https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6716 and note this scan of a recent message: # spamassassin -t -D uridnsbl /tmp/mdpreserve.42nj0s5F0hz1hSbGh/INPUTMSG 2>&1 |pcregrep '\.cf\b|^(From|Subject|Date|Message-Id): ' Nov 4 15:55:21.684 [55625] dbg: uridnsbl: considering host=72_scores.cf, domain=72_scores.cf Nov 4 15:55:21.720 [55625] dbg: uridnsbl: complete_dnsbl_lookup X_URIBL_A DNSBL:72_scores.cf:dnsbltest.spamassassin.org Nov 4 15:55:21.721 [55625] dbg: uridnsbl: complete_dnsbl_lookup X_URIBL_B DNSBL:72_scores.cf:dnsbltest.spamassassin.org Nov 4 15:55:21.721 [55625] dbg: uridnsbl: complete_dnsbl_lookup X_URIBL_DOMSONLY DNSBL:72_scores.cf:dnsbltest.spamassassin.org Nov 4 15:55:21.722 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_RHS_DOB DNSBL:72_scores.cf:dob.sibl.support-intelligence.net Nov 4 15:55:22.051 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_MW_SURBL DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.051 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_WS_SURBL DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_PH_SURBL DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_CR_SURBL DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_ABUSE_SURBL DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup SURBL_BLOCKED DNSBL:72_scores.cf:multi.surbl.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_MALWARE DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ABUSE_PHISH DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_BOTNETCC DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_PHISH DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 
15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ABUSE_REDIR DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ERROR DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ABUSE_SPAM DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_SPAM DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ABUSE_BOTCC DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.054 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_DBL_ABUSE_MALW DNSBL:72_scores.cf:dbl.spamhaus.org Nov 4 15:55:22.054 [55625] dbg: uridnsbl: complete_ns_lookup NS:72_scores.cf Nov 4 15:55:22.055 [55625] dbg: uridnsbl: complete_a_lookup A:72_scores.cf Nov 4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup KAM_BODY_COMPROMISED_URIBL_PCCC DNSBL:72_scores.cf:wild.pccc.com Nov 4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_RED DNSBL:72_scores.cf:multi.uribl.com Nov 4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_BLOCKED DNSBL:72_scores.cf:multi.uribl.com Nov 4 15:55:22.057 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_GREY DNSBL:72_scores.cf:multi.uribl.com Nov 4 15:55:22.057 [55625] dbg: uridnsbl: complete_dnsbl_lookup URIBL_BLACK DNSBL:72_scores.cf:multi.uribl.com Subject: svn commit: r1845712 - in /spamassassin/trunk/rulesrc/scores: 72_scores.cf Date: Sun, 04 Nov 2018 04:06:19 - From: spamassassin_r...@apache.org Message-Id: <20181104040619.bb2623a0...@svn01-us-west.apache.org> Date: Sun Nov 4 04:06:18 2018 spamassassin/trunk/rulesrc/scores/72_scores.cf Modified: spamassassin/trunk/rulesrc/scores/72_
Re: FPs on FORGED_MUA_MOZILLA (for my own hand-typed messages from my latest-version Thunderbird)
On 2 Oct 2018, at 9:36, Rob McEwen wrote: SIDE NOTE: I don't think there was any domain my message that was blacklisted on URIBL - so I can't explain the "URIBL_BLOCKED", but that only scored 0.001, so that was innocuous. I suspect that that rule is malfunctioning on their end, and then they changed the score to .001 - so just please ignore that for the purpose of this discussion. No, "URIBL_BLOCKED" means that the URIBL DNS returned a value that is supposed to be a message to a mail admin that they are using URIBL wrong and will never get a useful answer without either (1) paying for a feed to support their usage volume or (2) using their own recursive resolver instead of forwarding queries to the likes of Google, OpenDNS, & CloudFlare. A mail filtering system that gets URIBL_BLOCKED hits is broken. A mail filtering system that gets them chronically is mismanaged.
Re: FPs on FORGED_MUA_MOZILLA (for my own hand-typed messages from my latest-version Thunderbird)
On 2 Oct 2018, at 13:39, Matus UHLAR - fantomas wrote: On 2 Oct 2018, at 9:36, Rob McEwen wrote: SIDE NOTE: I don't think there was any domain my message that was blacklisted on URIBL - so I can't explain the "URIBL_BLOCKED", but that only scored 0.001, so that was innocuous. I suspect that that rule is malfunctioning on their end, and then they changed the score to .001 - so just please ignore that for the purpose of this discussion. On 02.10.18 11:48, Bill Cole wrote: No, "URIBL_BLOCKED" means that the URIBL DNS returned a value that is supposed to be a message to a mail admin that they are using URIBL wrong A mail filtering system that gets URIBL_BLOCKED hits is broken. A mail filtering system that gets them chronically is mismanaged. Nonsense. There is no such implication here. While URIBL_BLOCKED may and most of the time apparently does mean that system uses DNS server shared with too many clients, any system that receives and checks too much mail may get URIBL_BLOCKED just because they have crossed the limit, without using it wrong or being broken. Operating a system in a manner which chronically crosses that limit is abusive. The DNS reply that results in URIBL_BLOCKED is not "free" for the URIBL operators and depending on their software may be as expensive as sending a real reply. It has the advantage over simply dropping abusive queries that it does not impose timeout delays on abusive queriers and sends a clear signal that can and should be acted upon.
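For readers who have not looked at what that "clear signal" is on the wire: the list answers with an A record in 127.0.0.x whose last octet is treated as a bitmask. The sketch below decodes such an answer. The bit assignments are an assumption based on my recollection of the stock 25_uribl.cf (1 for URIBL_BLOCKED, 2 for URIBL_BLACK, 4 for URIBL_GREY, 8 for URIBL_RED); check them against your installed rules before relying on them. The function name is made up.

```python
# ASSUMED bit values, recalled from the stock 25_uribl.cf; verify locally.
URIBL_BITS = {
    1: "URIBL_BLOCKED",
    2: "URIBL_BLACK",
    4: "URIBL_GREY",
    8: "URIBL_RED",
}


def decode_uribl_answer(a_record):
    """Map a 127.0.0.x answer to the rule names whose bits are set."""
    octets = a_record.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a 127.0.0.x DNSBL answer: %r" % a_record)
    mask = int(octets[3])
    return [name for bit, name in sorted(URIBL_BITS.items()) if mask & bit]


if __name__ == "__main__":
    for answer in ("127.0.0.1", "127.0.0.2", "127.0.0.6"):
        print(answer, decode_uribl_answer(answer))
```

An answer that decodes to URIBL_BLOCKED says nothing about the domain that was queried; it is addressed to whoever runs the resolver.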
Re: Dependency: fetch binary
On 23 Sep 2018, at 10:56 (-0400), Jari Fredriksson wrote: > What is this binary? It's a core FreeBSD utility used to fetch remote files. > I could not find any package providing this… I need it for debian (Raspbian) > and CentOS 7. As Kevin noted, you do not. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: Bayes not learning, blacklist not filtering
On 15 Nov 2018, at 14:27, MarkCS wrote: So I've been tasked with researching an issue with the mail server at work. We use Spamassassin and at present, it's not blocking some pretty obvious spam, largely from the domain qq.com. Basically email is slipping through, being bounced back at the end receiving server, then our server tries to bounce back to qq.com, which doesn't exist at that point and we get a bounce message. Hundreds of these suckers are coming through daily. As John said, absolutely blocking a whole domain is best done before SpamAssassin, in the MTA (in your case that looks like Postfix.) In fact, all of John's reply was good. There's one thing he was probably too polite to mention though... X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on Upgrade SA. 3.3.2 is antique and hasn't seen any updates in (as noted) 7+ years. Each 3.4.x release has added useful functionality. Substantial parts of the default ruleset are wrapped in version checks because they demand 3.4.x features. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: SPF weirdness...
On 15 Jan 2019, at 11:08, Grant Taylor wrote: Does anybody know off the top of their head—don't dig, I'll do that later—what might cause SpamAssassin to apply SPF processing to earlier Received: headers (lower in the message source)? Check both the contents and documentation of trusted_networks, msa_networks, and internal_networks. I'm seeing SpamAssassin claim that a message failed SPF processing based on chronologically earlier internal Received: headers. Conversely, the connection to my SMTP server are perfectly acceptable with the published SPF record. If SA thinks a prior hop is through a machine that writes trustworthy Received headers and is a normal part of your relay path, it will check SPF there. There MAY be a design bug there. I'm not sure how SA deals with a machine you trust and which is a normal inbound relay that also is SPF-approved for mail it gets from other places. Maybe msa_networks can solve this. I just noticed this and will look into it further later as soon as time permits. I'm hoping that someone may have a 15 second knee-jerk "check this or that" type response. Thank you in advance. -- Grant. . . . unix || die
Re: SPF weirdness...
On 15 Jan 2019, at 12:15, Grant Taylor wrote: > On 01/15/2019 09:24 AM, Kevin A. McGrail wrote: >> What is your glue for SA? Is it getting the received header you are >> expecting in time for the parsing? > > Both SA and my spfmilter are milters on the same inbound Internet edge > MTA. > > I will have to research to see if the header is added by the time that SA > checks things. > > I do know that the Received: header isn't there by the time that SA runs. I > don't know if my MTA has added the proper Authentication-Results: header yet > or not. > > … > > As sure as I type that, "…the Received: header isn't there…", which may mean > that SA is running the contents of the previous Received: header through SPF > checks. > > … > > That seems to be part of the problem. > > Thank you Kevin. I now have something more specific to investigate. This strikes me as a flaw in whatever milter you're using. Some (e.g. MIMEDefang) milters deal with the fact that they don't get a local Received header by constructing one from what they know before passing the message to SA.
Re: SPF weirdness...
On 15 Jan 2019, at 14:24, Grant Taylor wrote: > On 01/15/2019 11:39 AM, Bill Cole wrote: >> This strikes me as a flaw in whatever milter you're using. Some (e.g. >> MIMEDefang) milters deal with the fact that they don't get a local Received >> header by constructing one from what they know before passing the message to >> SA. > > The SPF milter is constructing the header. I assume that it's doing so > properly. At least the headers I see coming out of the MTA are correct. > > I think that SpamAssassin is looking for a header that isn't there yet. - > Both SpamAssassin and my SPF filter are hooked into the same MTA as milters. > So both of them see the message before it's accepted and all new headers are > added. > > I don't know if the SPF milter can add the header sooner, or if that is > controlled by the MTA. > > I would also like SpamAssassin to use the information available to it via the > milter interface instead of relying on a header. Let me clarify... There are many different milters that can use SpamAssassin listed at https://wiki.apache.org/spamassassin/IntegratedInMta#Integrated_into_Sendmail. Some links there may be dead. SpamAssassin is not a milter. SpamAssassin knows nothing about message parameters passed through the milter interface between a MTA and a milter. The ONLY message data that SpamAssassin knows about is what it gets in a RFC822/2822/5322 format message with parseable headers. A milter that uses SpamAssassin can modify the message that it receives via the milter interface before passing it to SpamAssassin for analysis. This allows the milter to inform SpamAssassin of facts that SpamAssassin can use, such as the SMTP client address, envelope sender and recipients, and whatever else it gets from the MTA. For SpamAssassin to do SPF calculations it needs to have a Received header and envelope sender, which can be embedded in headers that are added by a milter that uses SpamAssassin.
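To make the "construct the headers before handing the message to SA" step concrete, here is a rough Python sketch of what a milter-side glue layer might do with the connection facts it gets from the MTA. Everything in it is hypothetical (function name, parameters, header layout); real milters such as MIMEDefang have their own formats, and which header SA reads the envelope sender from depends on local configuration.

```python
from email.utils import formatdate


def prepend_trace_headers(raw_message, client_ip, client_name, helo,
                          envelope_sender, local_host):
    """Return raw_message with synthesized Return-Path and Received headers.

    The values come from the SMTP session (as a milter would see them),
    not from anything already present in the message.
    """
    received = (
        "Received: from %s (%s [%s])\n"
        "\tby %s; %s\n"
        % (helo, client_name, client_ip, local_host, formatdate(localtime=False))
    )
    return_path = "Return-Path: <%s>\n" % envelope_sender
    return return_path + received + raw_message


if __name__ == "__main__":
    original = "From: someone@example.org\nSubject: test\n\nbody\n"
    print(prepend_trace_headers(
        original,
        client_ip="192.0.2.25",
        client_name="mail.example.org",
        helo="mail.example.org",
        envelope_sender="someone@example.org",
        local_host="mx.example.net",
    ))
```

With headers like these at the top, the most recent hop SA sees is the real SMTP client rather than whatever the previous relay recorded.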
Re: SPF weirdness...
On 15 Jan 2019, at 15:05, Grant Taylor wrote: > I will investigate to see if spamass-milter can fabricate a satisfactory > Received: header. A quick look at the issue tracker for it implies that it does so. A milter that actually works with SA really needs to. Unfortunately, it is a nuisance to debug spamass-milter because it talks to spamc which talks to spamd, so you need to give debug flags to the spamass-milter process and spamd to see exactly what's going on.
Re: Phishing.pm
On 21 Jan 2019, at 13:58, Rick Cooper wrote: Giovanni Bechis wrote: Il 13 gennaio 2019 21:52:19 CET, Giovanni Bechis ha scritto: Il 13 gennaio 2019 20:22:40 CET, Ian Evans ha scritto: Running 3.4.2, spamd daemon. Just enabled the new Phishing.pm plugin but wondering about the data feeds. Is that something we need to set up a cron to wget or does the plugin handle it? Unless my google fu is weak due to a lack of caffeine, I couldn't find any doc on setting it up. Thanks for any advice. try Mail::SpamAssassin::Plugin::Phishing Cheers Giovanni man Mail::SpamAssassin::Plugin::Phishing to be precise. Giovanni Something that isn't answered in the docs is the default score If you define a rule using the plugin, you must either give it a score or it will have the default score of any rule: 1.0. Note that because the plugin is disabled by default, the default ruleset distributed via sa-update does not include a rule using the plugin and so you must define a rule as documented for the plugin to be used at all. and I am wondering if SA has to be restarted after each update of the data or does it reread each time the plugin is called It seems to me that the data file is re-read for each scan, so no restart is needed. Even if I'm misreading, it would be re-read for each new spamd child process (or mimedefang worker) so a restart would not be *needed* if you can tolerate a delay until children are respawned. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Available For Hire: https://linkedin.com/in/billcole
Re: Phishing.pm
[Pulling this conversation back on-list where I can misinform everyone publicly]

On 22 Jan 2019, at 5:04, Ian Evans wrote:

> On Tue, Jan 22, 2019 at 2:15 AM Bill Cole <sausers-20150...@billmail.scconsult.com> wrote:
> [snip]
>> Note that because the plugin is disabled by default, the default ruleset distributed via sa-update does not include a rule using the plugin and so you must define a rule as documented for the plugin to be used at all.
>
> One thing I'm not clear on:
>
> a) do we need to add this to local.cf:
>
>     ifplugin Mail::SpamAssassin::Plugin::Phishing
>       phishing_openphish_feed /etc/mail/spamassassin/openphish-feed.txt
>       phishing_phishtank_feed /etc/mail/spamassassin/phishtank-feed.csv
>       body URI_PHISHING eval:check_phishing()
>       describe URI_PHISHING Url match phishing in feed
>     endif

Yes. You may want to use only one of the two feeds, put the feed file(s) in different places, or name the rule something other than URI_PHISHING, but you need to have a body eval rule calling check_phishing() and the path to at least one of the feeds specified.

> and b) is that sufficient to "define a rule as documented for the plugin to be used at all."

Yes.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
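Since the feed options above take local file paths, something has to keep those files current. A minimal sketch of a refresh helper follows, assuming curl is installed; `refresh_feed` and `FEED_URL` are made-up names for illustration and are not part of SpamAssassin, and the real feed URLs should be taken from the plugin documentation. It downloads to a temporary file and renames it into place only on success, so a scan should never see a half-written feed.

```shell
#!/bin/sh
# Hypothetical helper, not part of SpamAssassin: download URL ($1) to a
# temporary file next to DEST ($2) and rename it into place only if the
# download succeeded and produced a non-empty file.
refresh_feed() {
    url=$1
    dest=$2
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    if curl -fsS -o "$tmp" "$url" && [ -s "$tmp" ]; then
        chmod 644 "$tmp"        # mktemp creates the file with mode 0600
        mv -f "$tmp" "$dest"    # rename within one directory is atomic
    else
        rm -f "$tmp"            # keep the previous feed file untouched
        return 1
    fi
}

# Possible cron usage (FEED_URL is a placeholder, not a real address):
#   refresh_feed "$FEED_URL" /etc/mail/spamassassin/openphish-feed.txt
```

On failure the old feed file is left as it was, which seems preferable to scanning against an empty list.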
Re: Subtest __E_LIKE_LETTER and __LOWER_E listed many times in message header
On 9 Dec 2018, at 18:23, Chris Pollock wrote:

> On Sun, 2018-12-09 at 13:06 -0500, Bill Cole wrote:
>> On 9 Dec 2018, at 12:04, Chris Pollock wrote:
>>
>>> This is probably very trivial and doesn't affect anything except maybe the size of the headers but I have to ask. When looking at the headers of some ham I noticed - https://pastebin.com/H7euxqVX the two rules I mention above are in 72_active.cf. Is there a reason for the number of times it's listed? Couldn't each subtest be listed just once instead of multiple times?
>>
>> Not with the current documented behavior of the code, given the way those sub-rules are designed to work together. The goal is to identify messages which use Latin-script 'e' characters but also use many non-Latin-script characters which look like 'e' but are not. To make this determination, the rules require the 'multiple' flag without a cap on the number of matches which a 'maxhits' parameter would set.
>
> Got it, thanks Bill. I've never noticed this before. I also noticed that according to my daily sa-update output this subtest is apparently new or at least it didn't appear in the output until this past Fri.

Correct. See the thread with the subject "No longer just embedded =9D characters in blackmail emails" here last week for the background.

>> It is not recommended to routinely add the list of matched sub-rules to scanned messages.
>
> Any specific reason why? This is just on my home system.

It's got the potential to be VERY noisy (as you've discovered) while not really providing much useful info. Not a big deal on a small system.

Anyway, as of today I've capped those 2 subrules at levels which leave ample space to still match the target spam. Should show up in tomorrow's update.
Re: Spamassassin using remote rules definition source?
On 10 Dec 2018, at 13:28, ozgurerdogan wrote:

> Can you give me some more step by step for : "set up your own local published ruleset source and configure your instances to include that in their rule sources for the standard sa-update processing (will require managing DNS entries and generating SHA checksums for the rules file) " This is what I needed. Thank you everyone by the way.

The setup John refers to is fully documented at https://wiki.apache.org/spamassassin/PublishingRuleUpdates
Re: Subtest __E_LIKE_LETTER and __LOWER_E listed many times in message header
On 13 Dec 2018, at 16:24, Chris Pollock wrote:

> On Thu, 2018-12-13 at 15:14 -0600, Chris Pollock wrote:
>> On Tue, 2018-12-11 at 19:00 -0500, Bill Cole wrote:
>>> On 11 Dec 2018, at 16:37, Chris Pollock wrote:
>>>
>>>> On Mon, 2018-12-10 at 13:09 -0500, Bill Cole wrote:
>>>
>>> [...]
>>>>> Anyway, as of today I've capped those 2 subrules at levels which leave ample space to still match the target spam. Should show up in tomorrow's update.
>>>
>>> I was wrong. The addition of a 'maxhits' parameter to the two subrules apparently didn't get committed in time for the nightly rule promotion run. It was in r1848602 and the current ruleset is still at r1848555. Assuming all goes well tonight, the change will appear tomorrow.
>>
>> Shouldn't this have stopped by now - https://pastebin.com/7260daT3
>> Today's update was '1848731'.
>
> Hit send too fast. Doing a compare between 72_active.cf dated the 11th and the one dated today I do see:
>
> Dated 11 Dec
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
> ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body __E_LIKE_LETTER //
> tflags __E_LIKE_LETTER multiple
>
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
> ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body __LOWER_E /e/i
> tflags __LOWER_E multiple
>
> Dated today 13 Dec
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
> ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body __E_LIKE_LETTER //
> tflags __E_LIKE_LETTER multiple maxhits=400
>
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
> ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body __LOWER_E /e/
> tflags __LOWER_E multiple maxhits=250
>
> IIUC then __E_LIKE_LETTER can hit a max of 400 times in one message and __LOWER_E a max of 250 times in one message.

For now, yes. Those numbers were the result of a mis-think on my part and will be 320 and 230 once the current rev works its way through.
> Therefore I may still have a large listing of subtest ran.

Yes. I don't expect that behavior to change. SA has always tallied rules and sub-rules with multiple matches and the 'multiple' tflag this way and I see no compelling reason to change that. It almost certainly will not change for 3.4.3, which should be the last 3.4.x release.

If there's a bug opened and someone is willing to work on code for whatever changes need to be made to collapse duplicate hit names in the lists of rule matches into a single citation with a count of hits, I expect that change would be accepted for v4, even though it may impact existing users' tooling.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
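For anyone who wants that collapsed view today in their own log or header tooling, here is a rough sketch of the idea done outside SA. It is not SA code; `collapse_hits` and the `NAME(count)` output format are invented for illustration, and it assumes the hits arrive as a comma-separated list.

```shell
#!/bin/sh
# Hypothetical post-processing helper: collapse repeated names in a
# comma-separated list of rule hits into NAME(count), keeping the order
# in which each name first appeared.
collapse_hits() {
    printf '%s\n' "$1" | tr ',' '\n' | awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }       # trim blanks around each name
        $0 == "" { next }                     # skip empty entries
        !($0 in count) { order[++n] = $0 }    # remember first-seen order
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                sep = (i > 1) ? "," : ""
                out = out sep name
                if (count[name] > 1) out = out "(" count[name] ")"
            }
            print out
        }'
}

# Usage:
#   collapse_hits "__LOWER_E,__E_LIKE_LETTER,__LOWER_E,HTML_MESSAGE"
```

Names that hit only once are printed bare, so output for messages without 'multiple' sub-rules stays the same as the input.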
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 20 Dec 2018, at 11:55, Marcus Schopen wrote:

> On Thursday, 20.12.2018, at 12:35 +0100, Marcus Schopen wrote:
>> Hi,
>>
>> I get a warning, when updating the channel:
>>
>> --
>> config: warning: description exists for non-existent rule EXCUSE_24
>>
>> channel: lint check of update failed, channel failed
>> sa-update failed for unknown reasons
>> --
>
> seems not to be a problem of the EXCUSE_24 rule, but a general problem with sa-update, as other users do have the same problem since today.

This should now be fixed for the next rules update.
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 20 Dec 2018, at 13:41, Bill Cole wrote:

> This should now be fixed for the next rules update.

And, On 20 Dec 2018, at 17:04, (ignoring an explicit Reply-To header in a direct message to me!) Frank Giesecke wrote:

> How can I force the rules update?

You cannot. The "rules update" I referred to is the one that runs every night on an Apache infrastructure host, to update the default rules channel. The update completes around 03:30 UTC.

> I still get the error on my Debian system.

If you cannot wait 5 more hours and have an updated SVN checkout of the 'trunk' code, you can run:

    make clean ; echo |perl Makefile.PL ; make build_rules

That will leave a proper set of rules files in the rules/ directory. If you copy rules/72_active.cf to your local site-wide rules directory (probably /var/lib/spamassassin/3.004002/updates_spamassassin_org/) you will fix the worst effects of last night's broken update.

We've had a few occurrences of essentially the same problem (a bad rules package due to an ignored lint failure in a nightly update) over the past few years. In addition to correcting the problematic rule I have also fixed the script which intentionally (!) masked the lint failure and allowed the broken rules package to be built and distributed.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 20 Dec 2018, at 17:54, Bill Cole wrote:

> If you cannot wait 5 more hours and have an updated SVN checkout of the 'trunk' code, you can run:
>
>     make clean ; echo |perl Makefile.PL ; make build_rules
>
> That will leave a proper set of rules files in the rules/ directory. If you copy rules/72_active.cf to your local site-wide rules directory (probably /var/lib/spamassassin/3.004002/updates_spamassassin_org/) you will fix the worst effects of last night's broken update.

It has been pointed out to me that a simpler and less error-prone fix would be to revert to the prior day's rule collection:

    mkdir /tmp/saupdate-1849156
    cd $_
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.asc
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha256
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha512
    sa-update -D --install 1849156.tar.gz
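sa-update is expected to do its own checking of the downloaded package against the files fetched alongside it, but if you want a manual sanity check before installing, something like the following may help. It is only a sketch under two assumptions that should be verified locally: that GNU `sha256sum` is available (BSD and macOS systems typically have `shasum -a 256` instead), and that the hex digest is the first whitespace-delimited field of the .sha256 file. `verify_sha256` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the SHA-256 digest of file $1
# equals the first field on the first line of checksum file $2.
# Assumes GNU sha256sum and a "digest [filename]" checksum file layout.
verify_sha256() {
    want=$(awk 'NR == 1 { print $1 }' "$2") || return 1
    have=$(sha256sum "$1" | awk '{ print $1 }') || return 1
    [ -n "$want" ] && [ "$want" = "$have" ]
}

# Usage, run in the download directory:
#   verify_sha256 1849156.tar.gz 1849156.tar.gz.sha256 || echo "checksum mismatch"
```

A truncated download, like the partial file mentioned earlier in this archive, would fail this comparison before sa-update is ever invoked.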
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 20 Dec 2018, at 17:56, Kevin A. McGrail wrote:

>> We've had a few occurrences of essentially the same problem (a bad rules package due to an ignored lint failure in a nightly update) over the past few years. In addition to correcting the problematic rule I have also fixed the script which intentionally (!) masked the lint failure and allowed the broken rules package to be built and distributed.
>
> The file shouldn't get installed though because sa-update checks the lint, doesn't it?

It depends on why the lint failed in the update process and on the local config. In the immediate case, sa-update installed the bad package.

The root cause of this particular failure was a 'replace_tag' rule that was outside an 'ifplugin Mail::SpamAssassin::Plugin::ReplaceTags' block. Because 'make build_rules' runs with minimal plugins loaded, the rule failed to parse and the design error in the mkrules script papered over the problem with an empty 72_active.cf. The rules package was assembled correctly with that empty file. When tested by sa-update after download, the rules pass lint because the file where the 'bad' rule would have gone was empty.
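A crude pre-commit check for that class of mistake could look like the sketch below. It is not the mkrules fix and not part of the SA build; `find_unguarded_replace` is an invented name, and the parsing is deliberately simplistic (it only tracks 'if'/'ifplugin'/'endif' nesting and ignores 'else' and negated conditions). It prints every replace_* directive that is not nested inside an 'ifplugin Mail::SpamAssassin::Plugin::ReplaceTags' block.

```shell
#!/bin/sh
# Hypothetical check: list replace_* directives in the given rule files
# that are not inside an ifplugin ...::ReplaceTags block.
find_unguarded_replace() {
    awk '
        /^[ \t]*(if|ifplugin)[ \t]/ {
            depth++
            # remember whether this nesting level is the ReplaceTags guard
            guard[depth] = ($0 ~ /^[ \t]*ifplugin[ \t]+Mail::SpamAssassin::Plugin::ReplaceTags/)
            next
        }
        /^[ \t]*endif/ { if (depth > 0) depth--; next }
        /^[ \t]*replace_(tag|rules|pre|inter|post|start|end)[ \t]/ {
            ok = 0
            for (i = 1; i <= depth; i++) if (guard[i]) ok = 1
            if (!ok) print FILENAME ":" FNR ": " $0
        }
    ' "$@"
}

# Usage:
#   find_unguarded_replace rulesrc/sandbox/*/*.cf
```

Empty output means nothing suspicious was found; it does not replace a real lint run with minimal plugins loaded.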
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 21 Dec 2018, at 15:57, Michael Orlitzky wrote:

> On 12/20/18 7:00 PM, Bill Cole wrote:
>>
>> mkdir /tmp/saupdate-1849156
>
> Never use a fixed path under /tmp =)

Fine:

    #!/bin/sh
    # Work in a freshly created private directory instead of a fixed /tmp path.
    cd "$(mktemp -d -t HappyMichael.XXXXXX)" || exit 1
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.asc
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha256
    curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha512
    sa-update -D --install 1849156.tar.gz
Re: Another form of obfuscation email.
On 10 Dec 2018, at 14:13, RW wrote:

> On Mon, 10 Dec 2018 12:45:53 -0500 Mark London wrote:
>> Hi - Here's another form of obfuscation spam. This time, not a porn blackmail one. Almost the whole text is obfuscated. https://pastebin.com/VURwmrrF
>
> You say obfuscated, but it looked completely unreadable to me.

The text/plain part is garbage, but the text/html part renders to a mostly readable phish.

-- 
Bill Cole