Re: [Bug 7331] channel: SHA1 verification failed, channel failed

2018-01-11 Thread Bill Cole

On 11 Jan 2018, at 12:58 (-0500), Kevin A. McGrail wrote:


And not to run GPG if we don't even download anything.


I have not had this issue myself, so all I have is the one example in 
the ticket, but the logged bad hash there was for a partial download: 
the first 14372 bytes of 1749638.tar.gz. If there had been no download 
at all, the attempt to hash a nonexistent file would have failed without 
generating a hash, emitting an error instead.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Scoring Issues

2018-01-26 Thread Bill Cole

On 26 Jan 2018, at 17:47 (-0500), Computer Bob wrote:

My understanding is that spamassassin is configured for razor and 
uribl.
amavisd-new is configured to call spamassassin so is spamassassin not 
doing the sub calls ?


Not exactly. The command-line 'spamassassin' script is written in Perl 
and it uses various Perl modules in the Mail::SpamAssassin::* tree. 
Amavisd-new also uses Mail::SpamAssassin::* modules but it does NOT use 
the spamassassin script or any other command-line tool.


The effect of this is that it is possible for amavisd-new and 
spamassassin to use different configurations for the 
Mail::SpamAssassin::* modules. It is clear that this is happening on 
your system.
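
A quick way to check which files the command-line stack actually reads 
is to lint it as the same user amavisd-new runs under (the user name 
'amavis' below is an assumption; note also that amavisd-new can override 
SA scores in its own amavisd.conf, which this will not show):

  sudo -H -u amavis spamassassin -D config --lint 2>&1 | grep -i 'config:'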



I see no docs on configuring razor directly in amavis.
If you could tell me what to look for it would be appreciated.


Unfortunately, I can't help with amavisd-new because I don't use it. 
However, it is certain that it is using its own oddball config because 
these scores are ridiculous:



tests=[HTML_MESSAGE=0.001, SPF_HELO_PASS=-1, SPF_PASS=-1,


It's madness to give SPF_HELO_PASS or SPF_PASS significant scores on 
their own. Neither should have a score outside of the -0.01 to 0.01 
range: SPF is informative but not probative. These rules have somehow 
been set to sabotage-level scores in a configuration that only the 
amavisd-new process reads.
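
For reference, restoring sane values in whatever configuration 
amavisd-new is reading would look like this (the near-zero values below 
mirror the stock SA scores, not any site-specific tuning):

  score SPF_PASS       -0.001
  score SPF_HELO_PASS  -0.001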



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Body rules hit on Subject

2018-02-03 Thread Bill Cole

On 2 Feb 2018, at 16:59 (-0500), Kevin A. McGrail wrote:

There is no solution at the moment.  The subject is appended to the 
body of the text for rule parsing. 


The 2nd sentence is wrong: the subject is *prepended* to the body. Also: 
the 1st sentence is wrong, there's no *PRETTY* solution.


If every rendered 'body' starts with a prepended line containing the 
Subject (with '^Subject: ' stripped off), then one can solve the problem 
of matching body rules in the Subject header thus:


body  __DOCUSIGN_BODY_1ST  /\A.*\bdocusign\b.*\n/mi

body  __DOCUSIGN_BODY_NOT1ST  /(?!\A).*\bdocusign\b.*\n/mi

meta  DOCUSIGN_BODY  (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || 
(__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST)



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Body rules hit on Subject

2018-02-03 Thread Bill Cole

On 3 Feb 2018, at 16:37 (-0500), Bill Cole wrote:


On 2 Feb 2018, at 16:59 (-0500), Kevin A. McGrail wrote:

There is no solution at the moment.  The subject is appended to the 
body of the text for rule parsing. 


The 2nd sentence is wrong: the subject is *prepended* to the body. 
Also: the 1st sentence is wrong, there's no *PRETTY* solution.


If every rendered 'body' starts with a prepended line containing the 
Subject (with '^Subject: ' stripped off), then one can solve the 
problem of matching body rules in the Subject header thus:


body  __DOCUSIGN_BODY_1ST  /\A.*\bdocusign\b.*\n/mi

body  __DOCUSIGN_BODY_NOT1ST  /(?!\A).*\bdocusign\b.*\n/mi

meta  DOCUSIGN_BODY  (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || 
(__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST)


make that:

meta  DOCUSIGN_BODY  (HAS_SUBJECT && __DOCUSIGN_BODY_NOT1ST) || 
(MISSING_SUBJECT && (__DOCUSIGN_BODY_1ST || __DOCUSIGN_BODY_NOT1ST))



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Email filtering theory and the definition of spam

2018-02-10 Thread Bill Cole

On 10 Feb 2018, at 16:00 (-0500), Alex wrote:


Can we really trust end-users to properly classify email and not
infect themselves with something or follow a phish without knowing?


Nope. However, we need to act like we do to some degree while doing the 
best we can to make it difficult for them to do dumb things.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Barracuda Reputation Block List (BRBL) removal from the SA ruleset

2018-02-11 Thread Bill Cole

On 11 Feb 2018, at 9:54 (-0500), Benny Pedersen wrote:

first query would be valid for 300 secs, but that is imho still not 
free, problem is that keeping low ttls does not change how dns works, 
any auth dns servers will update on soa serial anyway, the crime comes 
in when sa using remote dns servers that ignore soa serial updates


in that case ttls would keep spammers listed for 300 secs only


That's not how DNS TTLs work.

When a record's TTL elapses in the local name cache, it is dropped. The 
next query for that name and record type causes the resolver to make 
another query to the authoritative nameservers, which will return the 
same record whose TTL expired unless it has been removed from the zone. 
No standards-conforming DNS resolver returns NXDOMAIN based on the lack 
of a non-expired record in its cache and an unchanged SOA serial above 
the name. That would make no sense at all and require many more SOA 
queries than actually happen.
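
You can watch this behavior against any caching resolver (a generic 
illustration; 127.0.0.2 is the conventional always-listed test entry for 
zen.spamhaus.org, and @127.0.0.1 assumes a local resolver):

  # repeat this: the answer's TTL counts down, and once it reaches 0 the
  # resolver re-queries the authoritative servers and gets the same record
  # back with a fresh TTL, unless it has been removed from the zone
  dig +noall +answer 2.0.0.127.zen.spamhaus.org @127.0.0.1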



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Email filtering theory and the definition of spam

2018-02-11 Thread Bill Cole

On 11 Feb 2018, at 16:20 (-0500), Antony Stone wrote:

Strange that I can't find SMTP under 
www.rfc-editor.org/rfc/std/std-index.txt though, other than STD0060 and 
STD0071, which are both extensions.


STD10 is SMTP (RFC821), STD11 is message format (RFC822).


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Bill Cole

On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote:

This is a production mail gateway serving since 2015. I saw that a few 
messages (both hams and spams) automatically learned by 
amavisd/spamassassin. Today's statistics:


   3616 autolearn=ham
  10076 autolearn=no
   2817 autolearn=spam
134 autolearn=unavailable


That's quite high for spam, ham, AND "unavailable" (which indicates 
something wrong with the Bayes subsystem, usually transient.) This seems 
like a recipe for a mis-learning disaster. For comparison, my 2018 
autolearn counts:


spam: 418
ham: 15018
unavailable: 166
no: 129555

I also manually train any spam that gets through to me (the biggest spam 
target), a small number of spams reported by others, and 'trap' hits.

A wide variety of ham is harder to get for training, but I have found it 
useful to give users a well-documented and simple way to help. One way 
is to look at what happens to mail AFTER delivery, which can indicate 
that a message is ham without needing an admin to try to make a 
determination based on content. The simplest is to learn as ham anything 
users mark as $NotJunk. Another is to create an "Archive" mailbox for 
every user and learn as ham anything that has been moved there, a day 
after it is moved.

The most important factor (especially in jurisdictions where human 
examination of email is a problem) is to tell users how to protect their 
email and then do what you tell them, robotically. In the US, Canada, 
and *SOME* of the EU, this is not risky. However, I have been told by 
people in *SOME* EU countries that they can't even robotically scan ANY 
mail content, so you shouldn't take my advice as authoritative: I'm not 
even a lawyer in the US, much less Hungary...



I think I have no control over what is learnt automatically.


Yes, you do. Run "perldoc 
Mail::SpamAssassin::Plugin::AutoLearnThreshold" for details.


You can set the learning thresholds, which control what gets learned. 
The defaults (0.1 and 12) mis-learn far too much spam as ham and learn 
too little as spam. I use -0.2 and 6, which means I don't autolearn a 
lot, but everything I autolearn as ham has at least one hit on a 
substantial "nice" rule or 2 hits on weak ones.


There's a lot of vehemence against autolearn expressed here but not a 
lot of evidence that it operates poorly when configured wisely. The 
defaults are NOT wise.



Let's just assume for a moment that 1.4M ham-samples are valid.


Bad assumption. Your Bayes checks are uncertain about mail you've told 
SA is definitely spam. That's broken, and it's a sort of breakage that 
cannot exist unless a large quantity of spam has been learned as ham.



Is there a ham:spam ratio I should stick to it?


No.

I presume if we have a 1:1 ratio then future messages won't be 
considered as spam as well.


The ham:spam ratio in the Bayes DB or its autolearning is not a 
generally useful metric. 1:1 is not magically good and neither is any 
other ratio, even with reference to a single site's mailstream. A very 
large ratio *on either side* indicates a likely problem in what is being 
learned, but you can't correlate the ratio to any particularly wrong 
bias in Bayes scoring. It is an inherently chaotic relationship. Factors 
that actually matter are correctness of learning, sample quality, and 
currency. You can control how current your Bayes DB is (USE AUTO-EXPIRE) 
but the other two factors are never going to be perfect.
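
Auto-expiry itself is a one-line setting (these are the stock parameter 
names; 150000 tokens is the default cap, adjust to taste):

  bayes_auto_expire 1
  bayes_expiry_max_db_size 150000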


Re: URIBL_BLOCKED

2018-02-15 Thread Bill Cole

On 15 Feb 2018, at 4:10 (-0500), Tobi wrote:


Am 15.02.2018 um 02:35 schrieb @lbutlr:

On 2018-02-14 (09:55 MST), Tobi <jahli...@gmx.ch> wrote:


Am 14.02.2018 um 17:16 schrieb @lbutlr:

I can't imagine why i'd be over limit, my mail server is tiny.


it's not the mailserver that got blocked by limits, but the DNS resolver 
your mailserver uses!


I use my own DNS on BIND 9.12; however, the block error is not appearing 
today, so...




and does your BIND server use other forward servers? Or does it directly 
resolve the queries from the authoritative nameservers? It all depends 
on whether your resolver is in forward mode or not. If it's in forward 
mode, then it sounds like the IPs of those forwarders might have been 
rate-limited.


Another possibility is DNS hijacking. Connection providers pitch it as a 
security measure, and I guess it can be for residential customers and 
small businesses that essentially use their connections in the same ways 
as home users, but it's lethal for mail systems. My provider (WOW 
Business) does it by default.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: problem with spamassassin for WIndows

2018-02-17 Thread Bill Cole

On 17 Feb 2018, at 14:48 (-0500), Kevin A. McGrail wrote:

I gave you a suggestion the other day. Your configuration is wrong.  
You aren't passing lint


Look at that line 717 or if that's not the right line number, look at 
your configuration around your DB for Bayes.


I'm not sure that Bayes has anything to do with it specifically. To get 
to Parser.pm line 571, it seems to me that the Parser needs to read a 
line that starts with "ifplugin" (or "if plugin") and that it is 
expecting a plugin name argument on that line. I can reproduce the base 
error by adding these 2 lines to any of the .pre or .cf files in the 
site preferences directory or to the active user_prefs file:


  ifplugin
  endif

The one oddity is that I don't get any ' line [number]' clause in 
the error message from creating that broken config. This may be a 
Windows-specific quirk (I don't have a Windows machine for testing) or 
it may indicate some more arcane issue in how the configuration is being 
parsed.


So the thing to look for is 'ifplugin' in local.cf, any other *.pre or 
*.cf file in the same directory as local.cf, or your user_prefs file. It 
should be followed by the name of a plugin, a block of lines defining 
rules or setting configuration parameters, and an "endif" line.
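
For reference, a well-formed block looks like this (the plugin and 
setting named here are just an illustration; any loaded plugin works):

  ifplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
    bayes_auto_learn_threshold_spam 6
  endif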





On 2/17/2018 2:31 PM, Gianluca Furnarotto wrote:

So, anyone can't give me a suggestion?


On 16 February 2018 at 08:24:04, Gianluca Furnarotto 
(keyst...@libero.it) wrote:



Hi Bill,

this is the result of the command you suggested to type:

feb 16 07:21:09.678 [21824] warn: Use of uninitialized value $_[1] 
in hash element at Mail/SpamAssassin/Conf/Parser.pm line 571,  line 717.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Junk mixed in with ham on whitelists

2018-02-20 Thread Bill Cole

On 20 Feb 2018, at 16:48, David Jones wrote:

It doesn't seem like a good idea for whitelists to list these senders 
just because most of the email is ham.


I can see no evidence for that in a quick check of my personal mail. In 
10 years:


68 messages
50 spam (all reported)
6 replies to spam reports
2 OoO Autoreplies to mailing messages with vacation info for guys I 
didn't know.

8 messages to single-sender (website-specific) addresses
2 messages from Namecheap themselves (privateemail.com) trying to 
arrange an automatic monitoring rig for when their space lands on my 
(extremely irrelevant...) blacklist or a FBL for when I get spam from 
them. This raises the question: if a company whose business model is 
dependent on snowshoe spammers and domain squatters sends email asking 
for unpaid help in evading recognition of their essential evil, is it 
spam?


In the previous decade: 64 messages, 56 spams, 8 ham (all from 3 
websites to tagged addresses.)


Of course, my personal email isn't representative. I reject a 
substantial fraction of the mail from the networks where those domains 
have servers, and for a complex of reasons I have extremely high 
confidence in those rejections being pure spam. So, the above is less 
spammy than if I tagged and delivered.


What's special about such sources isn't that they're mostly ham or even 
significantly less spammy than a random sample of mail, it's that they 
have a lot of tiny customers who barely use email and occasional waves 
of transient spammers.  It makes them hard to pigeonhole either way.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: spamasssassin vs mimedefang scores

2018-02-22 Thread Bill Cole

On 22 Feb 2018, at 4:15, saqariden wrote:


Hello guys,

I'm using mimedefang with spamassassin; when I test an email with the 
command "spamassassin -t file.eml", I get results like this:


Details of message analysis:   (-5.8 points, 3.0 required)

-5.0 RCVD_IN_DNSWL_HI   RBL: Sender listed at http://www.dnswl.org/, high trust
 [70.38.112.54 listed in list.dnswl.org]
-1.9 BAYES_00   BODY: Bayes spam probability is 0 to 1%
 [score: 0.]
 0.8 RDNS_NONE  Delivered to internal network by a host with no rDNS
 0.3 TO_EQ_FM_DOM_SPF_FAIL  To domain == From domain and external SPF failed

However, the SA check done through mimedefang seems to give different 
scores. How can I test an email to get those scores and see the 
difference?


Typically mimedefang runs as its own special user (e.g. 'defang') which 
may be configured to block normal interactive use or even simple 'su' 
use by root. This means that if you run 'spamassassin -t' in an 
interactive shell, you use the user_prefs, AWL/TxRep and BayesDB for the 
user running that shell, not the special user. This is particularly 
problematic for 'learning' ham and spam for the BayesDB, because it is 
easy to end up either training into a DB that is entirely separate from 
the system-wide one used by mimedefang OR working with the system-wide 
DBs in ways that change ownership of them so that mimedefang can't use 
them.


My solution for this is to use sudo and these shell aliases:

alias satest='sudo -H -u defang spamassassin -t '
alias lham='sudo -H -u defang sa-learn --ham --progress '
alias lspam='sudo -H -u defang sa-learn --spam --progress '
alias blspam='sudo -H -u defang spamassassin --add-to-blacklist '
alias reportspam='sudo -H -u defang spamassassin -r -t '



Re: problem with spamassassin for WIndows

2018-02-15 Thread Bill Cole

On 15 Feb 2018, at 15:33, Gianluca Furnarotto wrote:


Hi,

I am trying to use Bayes with spamassassin. Now it seems to have stopped 
learning, and when I use a command such as "sa-learn --dump magic", 
"sa-learn --sync", or other sa-learn commands,

this error appears:
"Use of uninitialized value $_[1] in hash element at 
Mail/SpamAssassin/Conf/Parser.pm line 571."


Line 571 is this:
" } "
inside these lines:
" elsif ($type == $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST) {
    $cmd->{code} = \&set_addrlist_value;
  }" <--- line 571


That absolutely IS NOT line 571 of Mail/SpamAssassin/Conf/Parser.pm in 
SA version 3.4.1. That's line 685.


The relevant lines in Mail/SpamAssassin/Conf/Parser.pm:

   568  
   569  # functions supported in the "if" eval:
   570  sub cond_clause_plugin_loaded {
   571    return $_[0]->{conf}->{plugins_loaded}->{$_[1]};
   572  }
   573  

My first guess on this is that your configuration has a typo. Try 
running 'spamassassin --lint' to check it.


The error message indicates that something is calling the subroutine 
'cond_clause_plugin_loaded' in a way that gives it only one parameter 
where it is expecting 2, the first of which is an object reference.





Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread Bill Cole

On 25 Feb 2018, at 11:13 (-0500), Peter Thomassen wrote:

Reminder: My question was not "how to run DNS efficiently" or "how does 
SpamAssassin run DNS queries", my question was "how can I influence the 
order of tests".


The canonical answer is: by adjusting rule priority values and using the 
short-circuit feature.


Unfortunately, that's not applicable to DNS tests because SA's code is 
optimized for total scan time. This means that DNS checks, which have 
built-in latency, are started asynchronously before everything else. So 
if you want to change that to postpone DNS checks with the possibility 
of short-circuiting them, you will need to re-architect that part of SA. 
If you choose that route, I expect that patches to make that alternative 
design a configurable option would be welcome upstream but I doubt that 
creating such a mechanism would be made a project priority in any way 
because the current optimization for overall performance is much more 
useful for most users than economizing on DNS queries.
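
For rules other than the asynchronous DNS checks, the knobs look like 
this (an illustration of the syntax rather than a recommended policy; 
SOME_EXPENSIVE_RULE is a placeholder name, and shortcircuit requires the 
Shortcircuit plugin to be loaded):

  # run a cheap, decisive rule first and stop scanning when it fires
  priority     USER_IN_WHITELIST   -1000
  shortcircuit USER_IN_WHITELIST   on
  # push a costly local rule toward the end of the run
  priority     SOME_EXPENSIVE_RULE 1000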


For what it's worth, a few years ago I had to do an analysis of URIBL 
data value with hard numbers and I found that for that particular 
operation, URIBLs were decisive in a large majority of spams accurately 
classified as spam by SA. It is important to note that for this site (as 
for all sites I have done significant work with in this millennium) the 
vast majority of mail never was seen by SA because it was rejected (or 
much less often was exempted from SA by whitelisting) ahead of the DATA 
phase. This also meant that in that particular case, there was no risk 
of hitting the point of URIBL_BLOCKED. If all messages had been run 
through SA, they would have needed to pay for data feeds to have a 
useful spam control function.


Obviously, no 2 mail systems see exactly the same distribution of ham 
and spam, so your circumstances might make buying feeds uneconomic. 
OTOH, unless you're an extremely adept programmer or your time is not 
very valuable, redesigning the way SA runs tests is very likely to be 
the most uneconomic choice available for addressing your root problem.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: IADB whitelist

2017-12-25 Thread Bill Cole

On 25 Dec 2017, at 3:28 (-0500), Sebastian Arcus wrote:

Also, any idea why are there 6 different rules associated with this 
particular whitelist?


IADB has many independent return codes that each have distinct meaning. 
See 
http://www.isipp.com/email-accreditation/about-the-codes/list-of-codes/ 
for details.


If you get mail from an IADB-listed sender that you are 100% sure is 
spam (i.e. not "I would never ask for such mail" but "the recipient 
absolutely did not consent to receiving this mail.") then you should 
report that to ISIPP. "ab...@suretymail.com" is the reporting address 
listed on their website and while I've not had cause to use it, people I 
trust with no reason to lie say that reports to that address do actually 
work to either change sender behavior or eliminate listings. Anne 
Mitchell (head of ISIPP) is an ex-coworker of mine whose integrity and 
dedication to the anti-spam fight (which is dependent on keeping 
*wanted* mail deliverable) I can personally vouch for.


However, the different responses from IADB are VERY nuanced and the two 
strongest rules you listed (RCVD_IN_IADB_OPTIN and RCVD_IN_IADB_VOUCHED) 
are essentially "good intentions" markers. Due to unfortunate 
terminology choices by ISIPP and a willingness to engage in nuance and 
estimate intentions, those aren't really as worthwhile as they might 
seem. The IADB definition of "All mailing list mail is opt-in" is 
(effectively) "we believe that this ESP believes in good faith that 
every recipient has chosen to receive this mail." Their "vouching" for a 
record is an assertion that either the ESP is personally known to ISIPP 
staff as competent and honest OR has maintained stable positive listings 
for >6 months. I'm pretty sure I don't want ANY score for a non-vouched 
record and unlike ISIPP (and some valuable SA contributors!) I really 
don't care much about ESPs' intentions or responsiveness to complaints, 
only about actual spamming behavior. So I have made substantial 
modification on my own system to how IADB results are scored, but those 
specific adjustments are probably not fit for most other sites.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-02 Thread Bill Cole

On 2 Jan 2018, at 5:12 (-0500), Rupert Gallagher wrote:


This is the normative reference.


This is the OBSOLETED normative reference.


RFC 822, pg. 30, section 6.2.3
--
msg-id = "<" addr-spec ">";
addr-spec = local-part "@" domain;
domain = sub-domain *("." sub-domain);
sub-domain = domain-ref / domain-literal;

<host>>


Note that the "@" must also be present as part of the 
well-formed-formula.

When absent, the string is not well formed, and a syntax error occurs.


The change of formal syntax in RFC2822 to remove the reference to domain 
entities was not inadvertent or surreptitious. RFC5322 didn't reverse 
that change.




RFC 5322, pg. 27, section 3.6.4
---

<<  The message identifier (msg-id) itself MUST be a globally unique
   identifier for a message.  The generator of the message identifier
   MUST guarantee that the msg-id is unique.  There are several
   algorithms that can be used to accomplish this.  Since the msg-id 
has
   a similar syntax to addr-spec (identical except that quoted 
strings,
   comments, and folding white space are not allowed), a good method 
is

   to put the domain name (or a domain literal IP address) of the host
   on which the message identifier was created on the right-hand side 
of

   the "@" (since domain names and IP addresses are normally unique),
   and put a combination of the current absolute date and time along
   with some other currently unique (perhaps sequential) identifier
   available on the system (for example, a process id number) on the
   left-hand side.  Though other algorithms will work, it is 
RECOMMENDED

   that the right-hand side contain some domain identifier (either of
   the host itself or otherwise) such that the generator of the 
message
   identifier can guarantee the uniqueness of the left-hand side 
within

   the scope of that domain. >>


Note the use of RFC2119 terms. MUST and RECOMMENDED mean different 
things.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd happily 
make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 10:33 (-0500), David Jones wrote:


On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty 
envelope-from?


Nope. A modern standard 'bounce' message is a MIME entity with a special 
type, denoted by a header somewhat like this:


Content-Type: multipart/report; report-type=delivery-status;
  boundary="blah.foo.bar-baz/example.com"

It should have a unique MID, a Date header reflecting the time of the 
bounce, a Subject header like "Undelivered Mail Returned to Sender", a 
To header with the original message's envelope sender, a From header 
clearly identifying the last MTA to hold the message and its non-human 
nature such as 'mailer-dae...@example.com (Mail Delivery System)', and 
Received headers only reflecting the transit from that MTA to the target 
of the bounce.
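
Put together, the top of a typical DSN looks roughly like this (every 
value below is invented for illustration):

  From: mailer-daemon@mta.example.com (Mail Delivery System)
  To: original-sender@example.org
  Subject: Undelivered Mail Returned to Sender
  Date: Mon, 1 Jan 2018 12:34:56 -0500
  Message-ID: <20180101173456.12345@mta.example.com>
  Content-Type: multipart/report; report-type=delivery-status;
    boundary="blah.foo.bar-baz/mta.example.com"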


One PART of a bounce is a message/rfc822 entity which has at least the 
headers of the original message and usually some or all of the body.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 12:47 (-0500), Matus UHLAR - fantomas wrote:


On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
the gross format in RFCs 822, 2822 and 5322 describes message-id as 
consisting of local and domain parts, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:
No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts. It just says it implicitly, not explicitly, but:

It's not possible to construct a Message-Id without the "@" while 
conforming to any of the mentioned RFCs.


True, but one could just as easily split up a UUID with '@' instead of 
'-' and comply while being as sure of uniqueness as could ever matter. 
Or put full UUIDs on both sides of the '@'. If a V1 UUID is on the 
right, it is even a host-unique identifier after a fashion.


Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is 
a mistake I have made.


what exactly was the problem? Message-Id without the "@" or the
non-conforming parts there?


Missing '@'

Some messages lacking it were generated by antique systems that had 
proven themselves resistant to evolutionary pressures.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 14:30 (-0500), Alan Hodgson wrote:


On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:

[...]

HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.


I've had good success junking anything with one of my domains in the
message-id, where I know the mail isn't actually from someone in that
domain. That's a pretty solid spam signature.


Yes, I was a bit imprecise. Very specific idiosyncratic MID patterns can 
be extremely accurate spam indicators. Enforcement of RFC or common 
practice "standards" is riskier than it is worth.



Lack of any message-id is also significant, but sadly there are still
some real senders sending mail with no message-id.


Yes. It's one of the most annoying persistent sorts of mail sloppiness.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:

the gross format in RFCs 822, 2822 and 5322 describes message-id as 
consisting of local and domain parts, thus it must contain "@".


No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


   msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

   id-left =   dot-atom-text / obs-id-left

   id-right=   dot-atom-text / no-fold-literal / obs-id-right

   no-fold-literal =   "[" *dtext "]"

Note the lack of specification of "local" and "domain" parts.

Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is a 
mistake I have made.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote:

We reject anything whose mid does not include the fqdn or address 
literal of their sending server. We do this because the RFC says 
explicitly that the mid *MUST* have those features.


This is a blatant falsehood. Relevant RFCs:

https://tools.ietf.org/html/rfc5322#section-3.6.4
https://tools.ietf.org/html/rfc2822#section-3.6.4
https://tools.ietf.org/html/rfc822#section-4.6

The only "MUST" in regard to MID content in any of those is uniqueness. 
Use of a domain identifier is merely RECOMMENDED.


Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably 
determine whether the right-hand part of a MID is a valid host or domain 
identifier for the generator of the MID.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-03 Thread Bill Cole

On 2 Jan 2018, at 20:39, Alex wrote:

Is it possible to at least enforce that the message-ID has a valid 
domain?


Not reliably.

About 1.5% of my personal non-spam email over the past 20 years has had 
"localhost" as the right hand side of the MID. This implies a de facto 
RFC violation because it poses a real risk of duplication.


An additional ~1% has a MID header with either no dots or no '@'. This 
includes mail from Facebook, Seagate, Apple, one of my credit unions, a 
medical supply house that we buy from for my son's care, GMX (German 
freemail provider), multiple regulars on a private mailing list of 
old-timer anti-spam nutcases, the postmaster of LinkedIn sending 
personal mail with his linkedin.com address via GMail, iFixit, Verizon's 
SMS->Email gateway, and multiple ESPs including Eloqua and Digital 
River. At least one recent version of CommuniGate Pro (6.1.2) generated 
event invitations with a bare UUID as the MID.


In other words: a significant number of messages, largely legitimate 
transactional messages, lack a FQDN in the MID.


I have run an environment where each MTA node in the external gateway 
layer would add a MID with its own FQDN to any message passing through 
missing a MID. Those names could not be resolved in the world at large, 
but they were absolutely valid and guaranteed unique.


Re: Periodic error

2018-08-01 Thread Bill Cole

On 1 Aug 2018, at 12:12 (-0400), Nick Bright wrote:

spamd[1833]: plugin: eval failed: error closing socket: Bad file 
descriptor at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/DnsResolver.pm line 
185,  line 156.


What version of SpamAssassin are you using? Those line numbers make no 
sense with the 3.4.1 release or either current development branch. The 
last version it seems to make sense for is 3.3.2, which is antique.


This is particularly important because that module makes heavy use of 
the Net::DNS module, which has undergone a huge amount of change in 
recent years, much of it wise and some of it causing old code to break. 
If you are using a modern Net::DNS and an antique SpamAssassin, there 
will be trouble.


I'm sometimes receiving this error in my maillog, certainly not for 
every message that gets scanned. It seems to come in bursts.


I've been unable to determine what's causing the error though. I'm 
running a BIND9 resolver on 127.0.0.1.


When it occurs, the system sees high load average and poor performance 
due to iowait caused by the error.


I don't think it's a file descriptor limit, as I've set my system to 
512,000 for /proc/sys/fs/file-max and 65535 for ulimits, and "sysctl 
fs.file-nr" shows 17,056 out of 512,000 in use.


You are correct. This "file descriptor" is a socket being used for DNS 
resolution.




Suggestions? Thoughts?


Upgrade to a modern SpamAssassin. If that's not possible, make sure that 
you are using a Net::DNS of a similar age to the antique SA.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: Phish with xps attachment

2018-08-07 Thread Bill Cole

On 7 Aug 2018, at 15:31 (-0400), Martin Gregorie wrote:


On Tue, 2018-08-07 at 14:09 -0400, Alex wrote:


Anyone have ideas for viewing inside of an XPS file or otherwise
blocking phish attempts with xps attachments?

https://pastebin.com/KtMnNPAg

I don't think this is validly base64 encoded. I chopped it down to just 
the supposed base64 text and fed it through the Linux base64 decode 
utility, which gave up and said it isn't valid base64 after decoding 
about 150 characters.


Maybe check how you did that. Using the mimeexplode tool from the Perl 
MIME-Tools package:


# mimeexplode /tmp/xpsspam
Message: msg0 (/tmp/xpsspam)
Part: msg0/msg-53100-1.txt (text/plain)
Part: msg0/msg-53100-2.html (text/html)
Part: msg0/Remittance Copy.xps (application/octet-stream)
# ls -lAR msg0/
total 720
-rw-r--r--  1 root  wheel  354446 Aug  7 16:49 Remittance Copy.xps
-rw-r--r--  1 root  wheel 336 Aug  7 16:49 msg-53100-1.txt
-rw-r--r--  1 root  wheel4629 Aug  7 16:49 msg-53100-2.html
# file msg0/Remittance\ Copy.xps
msg0/Remittance Copy.xps: Zip archive data, at least v2.0 to extract
# zipinfo msg0/Remittance\ Copy.xps
Archive:  msg0/Remittance Copy.xps   354446 bytes   18 files
-rw 4.5 fat     1063 b- defS  1-Jan-80 00:00 [Content_Types].xml
-rw 4.5 fat      567 b- defS  1-Jan-80 00:00 _rels/.rels
-rw 4.5 fat     3566 b- stor  1-Jan-80 00:00 docProps/thumbnail.jpeg
-rw 4.5 fat      564 b- defS  1-Jan-80 00:00 docProps/core.xml
-rw 4.5 fat      287 b- defS  1-Jan-80 00:00 Documents/1/_rels/FixedDoc.fdoc.rels
-rw 4.5 fat      320 b- defS  1-Jan-80 00:00 FixedDocSeq.fdseq
-rw 4.5 fat        2 b- defN  1-Jan-80 00:00 Resources/31AB0740-4E67-23ED-1861-906DB2445D30.odttf
-rw 4.5 fat    61580 b- defN  1-Jan-80 00:00 Resources/36F32615-19BB-2EEA-BD7D-5051E214FE53.odttf
-rw 4.5 fat   266980 b- defN  1-Jan-80 00:00 Resources/128F6B1F-5739-13F9-6E4A-207A4466DE12.odttf
-rw 4.5 fat     1346 b- defS  1-Jan-80 00:00 Documents/1/Pages/_rels/1.fpage.rels
-rw 4.5 fat      282 b- defS  1-Jan-80 00:00 Documents/1/FixedDoc.fdoc
-rw 4.5 fat     4990 b- defN  1-Jan-80 00:00 Documents/1/Structure/Fragments/1.frag
-rw 4.5 fat    50574 b- defN  1-Jan-80 00:00 Documents/1/Pages/1.fpage
-rw 4.5 fat     7042 b- stor  1-Jan-80 00:00 Resources/Images/image_0.png
-rw 4.5 fat      290 b- stor  1-Jan-80 00:00 Resources/Images/image_1.png
-rw 4.5 fat      481 b- stor  1-Jan-80 00:00 Resources/Images/image_2.png
-rw 4.5 fat      386 b- defN  1-Jan-80 00:00 Documents/1/Structure/DocStructure.struct
-rw 4.5 fat   527552 b- defN  1-Jan-80 00:00 Resources/01EC0564-4D18-6AF6-270E-667DA377AC79.odttf

18 files, 983422 bytes uncompressed, 350592 bytes compressed:  64.3%


The payload is not in that XPS document, which is just a picture that 
claims to be an Office365 document with a big "Open File" button. That 
region is linked to a URL (MUNGED: hxxps://ssllink(dot)me/1sta) which at 
present redirects to a Brazilian domain which yields a 500 reply with a 
"bandwidth exceeded" message. Presumably the payload used to be there...


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: Update to Ubuntu 18.04.1 seems to have partially broken SA

2018-08-17 Thread Bill Cole
On 17 Aug 2018, at 18:49 (-0400), Chris wrote:

> Not in one of my rules:

OK, but it's also not part of the standard ruleset: KAM.cf is a 3rd-party 
ruleset. Kevin is a highly respected and active leader in the SpamAssassin 
project and the ASF as a whole, but those rules aren't part of his 
contributions to the project. They are useful for small & medium sized 
business mail systems but don't really fit the broad safety & efficacy 
requirements of the official distribution. You use them by active choice, 
not by trusting in what SA provides...

>
> /etc/mail/spamassassin$ grep -i "POWERBALL" KAM.cf
> body  __KAM_LOTTO5/(POWERBALL LOTTO|freelotto
> group|Royal Heritage Lottery|(British|UK) National( Online)?
> Lottery|U\.?K\.? Grand Promotions|Lottery Department UK|Euromillion
> Loteria|Luckyday International Lottery|International Lottery|Euro -
> Afro Asian Sweepstake|urawinner|Free Lotto Sweepstakes|PROMOTION
> DEPARTMENT|PROMOTION\/PRIZE AWARD|Nederlandse Internationale
> Loterij|EURO MILLIONS|APPLE LOTTERY ONLINE|MSW MEGA JACKPOT|MICROSOFT
> EMAIL PROMO|MSNlottery|ECOWAS|Nigeria|National
> Lottery|claim.{1,10}your.gbp|won.you.{1,10]gbp)/is
> header__KAM_LOTTO8From =~
> /Lottery|powerball|western.union/i

If you're using KAM.cf, you should set up a mechanism for keeping that file up 
to date. This typo was fixed over 2 months ago (as far back as I have online 
backups of it) and the current KAM.cf has dozens of other changes in that time.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole




Re: spample: porn extortion with pure numeric From domain and base64 body

2018-07-17 Thread Bill Cole

On 17 Jul 2018, at 20:00 (-0400), Chip M. wrote:


There's a new morph of the porn extortion campaign, with some
interesting under-the-hood changes.

The previous ones were always:
- two "quoted-printable" parts (plain text, html)
- "From" Outlook accounts
- sent via Outlook/Hotmail/MS IPs (no other IPs in route)
- passed both DKIM and SPF

The new version has:
- one base64 html part
- pure numeric "From" domain (same address in SMTP & header)


Why would anyone accept mail with an SMTP sender domain that is purely 
numeric? Checking that the sender domain exists is a fundamental 
necessity in operating an Internet-facing MTA.
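
As a sketch of what that looks like at the MTA (assuming Postfix, which 
the original poster did not specify): reject_unknown_sender_domain 
refuses MAIL FROM domains with no A or MX record, which disposes of 
made-up purely numeric domains before SA ever sees the message.

  # main.cf (Postfix)
  smtpd_sender_restrictions =
      permit_mynetworks,
      reject_unknown_sender_domain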



- sent via compromised computers (and typically 3 or 4 Received IPs)


Or not. The Received headers in your sample beyond the top one are 
entirely ridiculous, obviously fake. They are not even internally 
consistent.



- bogus domains so neither DKIM nor SPF possible


So what? Bogus domains are bogus. If you can't resolve an A or MX for an 
envelope sender domain, nothing else (including SA) needs be done. It's 
bogus mail.


[...]

Three other unusual things (all demonstrated in this spample):

1. 9 of the 13 had a two part pure numeric claimed host (see below).
I don't recall seeing that before.
** Is that a botnet fingerprint?


Probably not in a strict sense. Obviously it is a fingerprint of a 
particular stream of spam, but it is not a behavior that is widespread, 
such as claiming to be "User" or "ylmf-pc" or  the IP of the server 
being spammed. My guess is that it's one spammer with a severe clue 
shortage.



2. 9 of the 13 lacked a trailing "=".
I don't recall seeing that before.


That's slightly more common than normal, but not spectacularly so when 
you consider that the input data isn't random. The encoded version of 
1/3 of all inputs will not end with '=', 1/3 will end with '==', and 
none should ever end with '==='.
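
A quick illustration with the core MIME::Base64 Perl module (nothing 
SA-specific here):

  use MIME::Base64;
  print encode_base64("abc");    # "YWJj"     - 3 input bytes, no padding
  print encode_base64("abcd");   # "YWJjZA==" - 4 input bytes, two '='
  print encode_base64("abcde");  # "YWJjZGU=" - 5 input bytes, one '='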



It's probably worth a quick test, if easy to implement.


Nope. Absolutely NOT spamsign, unless you consider having an even 
multiple of 3 bytes in the unencoded data to be spamsign. :)



3. 4 of the 13 failed to hit "MIME_BASE64_TEXT".
I'm curious what the issue is.
The trailing "=" was not a factor.
The main thing that stood out is that the hits all had this CT:
Content-Type: text/html;
charset="us-ascii"
The misses all had:
Content-Type: text/html;
charset="iso-8859-1"


I have not traced the code to be sure, but as I understand it, it 
shouldn't be possible to hit MIME_BASE64_TEXT unless the character set 
is US-ASCII. Since iso-8859-1 is an 8-bit character set, it is entirely 
proper and frequently essential that either Base64 or QP be used to 
encode it.


[...]

I've just added the above suggested SA metas, and a
low level (non-regex) pure numeric TLD test.


I would not expect the numeric TLD test to hit much in the submitted 
corpora, since NO_DNS_FOR_FROM is not hitting enough to have a 
meaningful score and a pure numeric TLD in the envelope sender would 
always hit NO_DNS_FOR_FROM.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: spample: porn extortion with pure numeric From domain and base64 body

2018-07-17 Thread Bill Cole
And in addition...

On 17 Jul 2018, at 20:00 (-0400), Chip M. wrote:

> 3. Pure numeric TLDs appear to be non existent (so far!)

I expect that this will hold true for a long time.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: Best practice for learning submissions

2018-07-23 Thread Bill Cole
[N.B.: Your prior correspondent is not able to post to this list, so we 
only saw your side of that exchange.]


On 23 Jul 2018, at 19:38 (-0400), Nick Bright wrote:

When requesting submissions from users for use with sa-learn, if they 
are going to forward the message somewhere; is it best for that to be 
forwarded as an attachment, or forwarded inline? Will sa-learn 
automatically understand "the spam is attached" if it's an attachment?


Learning from a mailbox of my own spam (with full headers - the actual 
mails) is quite different from users *forwarding* spam for training.


So I ask: what is the best practice for learning submissions when 
using site-wide bayes?


The goal is to get a copy of the message that is identical to what SA 
saw when it arrived. For IMAP users, this is easiest to get with a 
'missed spam' mailbox into which users can move messages for learning. 
If you must rely on forwarded submissions, make sure users are 
forwarding messages as attachments, and have the target deliver into a 
mailbox that is processed to extract the 'message/rfc822' MIME object(s) 
in those submissions and learn those, not the submission mail itself.


Learning ham is harder, because generally speaking it is not a good idea 
to deliver mail that SA believes is spam *at all* unless you can't 
reject it in SMTP. As a result, users don't have 'false positive' 
samples to submit (although their irate would-be correspondents 
could...) In an IMAP environment, you can identify borderline ham that 
is useful to learn by looking at tagging and archiving. If the user 
assigns a keyword to a message and/or moves it to a mailbox (other than 
ones with names like Junk and Spam and Trash) you can usually be sure it 
is ham. If your users are trainable (it DOES happen...) you might even 
get them to use specific keywords and/or archival mailboxes and use 
those to feed ham training. In a POP3 environment, this is a much harder 
problem to solve.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: Best practice for learning submissions

2018-07-24 Thread Bill Cole

On 24 Jul 2018, at 13:39, Nick Bright wrote:


On 7/23/2018 11:49 PM, Bill Cole wrote:
The goal is to get a copy of the message that is identical to what SA 
saw when it arrived. For IMAP users, this is easiest to get with a 
'missed spam' mailbox into which users can move messages for 
learning. If you must rely on forwarded submissions, make sure users 
are forwarding messages as attachments, and have the target deliver 
into a mailbox that is processed to extract the 'message/rfc822' MIME 
object(s) in those submissions and learn those, not the submission 
mail itself.

Any specific utilities you could suggest?


I've used an adapted version of the mimeexplode tool from the Perl 
MIME-Tools distribution 
(https://metacpan.org/source/DSKOLL/MIME-tools-5.509/examples/mimeexplode) 
in conjunction with formail (part of Procmail.)
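
A hedged sketch of that kind of pipeline (file locations and the Bayes 
user are assumptions, and the names mimeexplode gives the extracted 
message/rfc822 parts depend on its type mapping, so inspect the msg*/ 
directories before pointing sa-learn at them):

  cd /tmp/explode
  # split the submissions mailbox into one file per forwarded message
  formail -ds sh -c 'cat > submission.$FILENO' < /var/mail/spam-reports
  # explode each submission into its MIME parts (creates msg0/, msg1/, ...)
  mimeexplode submission.*
  # learn the extracted originals as the site-wide Bayes user
  # (the filename below is a placeholder - use whatever mimeexplode produced)
  sudo -H -u defang sa-learn --spam msg0/extracted-original.eml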




Re: Line too long [rfc 2822, section 2.1.1]

2018-07-13 Thread Bill Cole

On 13 Jul 2018, at 14:49, Rupert Gallagher wrote:


A little survey on your local policies...

What do you do when a subject line is longer than 78 characters?

A. Reject
B. Accept as spam
C. Accept


Accept, absent some actual spam sign.

Note that the 78-character recommendation is not applicable to logical 
(decoded and unfolded) header fields but only to lines in the 
uninterpreted & unmodified transport format. To catch that in SA, a test 
would be something like:


header LONG_SUBJ_LINE Subject:raw =~ /.{79,}/m


And that will match mail that many people really want to not be blocked.


Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-30 Thread Bill Cole
On 30 Aug 2018, at 12:40, Grant Taylor wrote:

> On 08/30/2018 10:16 AM, Bill Cole wrote:
>> It's hard to understand this circumstance based on the generic description.
>>
>> It appears that you have a configuration where a relay is in 
>> trusted_networks (i.e. you believe what it asserts in Received headers) but 
>> it is NOT in internal_networks so it is in the synthetic 
>> X-Spam-Relays-External pseudo-header, it is the only element in 
>> X-Spam-Relays-External so the message matches__DOS_SINGLE_EXT_RELAY, and it 
>> has no rDNS so the message matches __RDNS_NONE.
>>
>> So: why is that nameless machine that you cannot make a named machine NOT in 
>> internal_networks?
>
> I don't know if this is the OP's case or not, but the following example comes 
> to mind.
>
> SA (running on your receiving MTA) receives a message from an MTA (which is 
> itself an MSA) of an external Business-to-Business partner (thus a trusted 
> MTA that is not internal to the recipient's organization) which itself 
> received the message from a client on an RFC 1918 network without reverse DNS.

If that MSA is requiring authentication (as it should) and recording that in 
the Received header (as it should) then as I understand it, the handoff of the 
message will not be considered for __RDNS_NONE.

>> Of course not, but if a machine is trusted to tell the truth in Received 
>> headers and has no rDNS because it is talking to a close affiliate on a 
>> RFC1918 IP, in what sense is it not internal?
>
> Trusting a B2B partner's external MTA.

OK, but in that case the MTA would use an IP that should be in trusted_networks 
and have rDNS.

>> Or is it in internal_networks but there's something wrong in how SA is 
>> parsing Received headers to build X-Spam-Relays-External?
>>
>>
>> I think the fix for all is for everyone to get their internal_networks and 
>> trusted_networks configurations correct.
>
> What should trusted_networks and internal_networks be set to in the B2B 
> scenario I'm describing?

The partner machine's IP should be in trusted_networks AND should have rDNS as 
an explicit technical requirement of the cooperation, which is entirely 
reasonable.



Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-30 Thread Bill Cole
On 30 Aug 2018, at 15:56, Grant Taylor wrote:

> On 08/30/2018 01:08 PM, Bill Cole wrote:
>> If that MSA is requiring authentication (as it should) and recording that in 
>> the Received header (as it should) then as I understand it, the handoff of 
>> the message will not be considered for __RDNS_NONE.
>
> Okay.
>
> What happens if the MSA isn't using authentication and instead is configured 
> to blindly allow relaying from the local / internal / private LAN.  As is / 
> was traditional for a long time for ISPs to allow relaying from their 
> (client) IP address space.  (Granted, this is against best practices.)
>
> How would this type of scenario effect your statement above?

That will depend on how that particular MTA constructs its Received headers in 
relation to the parsing in Mail::SpamAssassin::Message::Metadata::Received, 
which is non-trivial to describe in human language.



>> OK, but in that case the MTA would use an IP that should be in 
>> trusted_networks and have rDNS.
>
> Agreed.
>
>> The partner machine's IP should be in trusted_networks AND should have rDNS 
>> as an explicit technical requirement of the cooperation, which is entirely 
>> reasonable.
>
> Okay.
>
>
>
> -- 
> Grant. . . .
> unix || die




Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-30 Thread Bill Cole

On 30 Aug 2018, at 10:01, Matus UHLAR - fantomas wrote:


On 30.08.18 09:49, Kevin A. McGrail wrote:

I feel that you are fighting a bigger battle than one rule in SA.


two rules actually ;-) (with two more possible).

Without RDNS, you are running afoul of the postmaster rules of virtually 
every major email player.  You will have massive deliverability issues..


Those IP addresses are in an internal network with private IP ranges. 
When connecting to the world, their IPs are NATted to public addresses.

even if I fixed the DNS (and I can't since the network is not in my
control), HDR_ORDER_FTSDMCXX_DIRECT would still apply.


It's hard to understand this circumstance based on the generic 
description.


It appears that you have a configuration where a relay is in 
trusted_networks (i.e. you believe what it asserts in Received headers) 
but it is NOT in internal_networks so it is in the synthetic 
X-Spam-Relays-External pseudo-header, it is the only element in 
X-Spam-Relays-External so the message matches __DOS_SINGLE_EXT_RELAY, and 
it has no rDNS so the message matches __RDNS_NONE.


So: why is that nameless machine that you cannot make a named machine 
NOT in internal_networks?


I believe faking DNS is not what you advise me to do, although it would 
"fix" the problem temporarily (but could create another problem should 
the DNS be created later).


Of course not, but if a machine is trusted to tell the truth in Received 
headers and has no rDNS because it is talking to a close affiliate on a 
RFC1918 IP, in what sense is it not internal?


Or is it in internal_networks but there's something wrong in how SA is 
parsing Received headers to build X-Spam-Relays-External?



That is why I believe that adding ALL_TRUSTED would solve the problem
without unnecessary issues for others.

Yes, I can do that locally - but by redefining the rule I could miss it 
getting fixed or improved later.

And since different people have already reported this problem in the 
past, I would like to make the fix possible for all, if viable.


I think the fix for all is for everyone to get their internal_networks 
and trusted_networks configurations correct.
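
As a syntax reminder only (the addresses below are documentation 
placeholders, not a recommendation for this network): hosts that are 
both believed and part of your own infrastructure go in both lists, 
while hosts whose Received headers you merely believe go only in 
trusted_networks, of which internal_networks must be a subset.

  internal_networks  192.0.2.0/24
  trusted_networks   192.0.2.0/24 198.51.100.25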




Re: Non-ascii subjects with images

2018-09-01 Thread Bill Cole

On 1 Sep 2018, at 18:22 (-0400), David B Funk wrote:

On the other hand, if you want to decode the subject line and then 
pattern-match against all the possible UTF-8 emojis, you're going to 
end up with a rather unwieldy rule.


SA "header" rules match against decoded headers, not the Base64 or QP 
encoded text.


In principle, modern Perl can match against named Unicode "properties", 
so it should be possible to have a rule something like:


header EMOJI_IN_SUBJ  Subject =~ '/[\p{Miscellaneous Symbols and 
Pictographs}\p{Emoticons}\p{Ornamental Dingbats}]/'


HOWEVER: this does not work in current SA. I have not dissected exactly 
why but I know there have been many problems in handling Unicode in the 
SA code. Universal Unicode support is a defining goal of SA 4.0.0, so 
maybe this would be possible in the svn 'trunk' codebase where a lot of 
that work is done. On the other hand, it may be a consequence of SA 
parsing rules too harshly and mangling that particular odd RE syntax.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)


Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-31 Thread Bill Cole

On 31 Aug 2018, at 4:53, Matus UHLAR - fantomas wrote:


Note that I list internal clients as trusted, not as internal.

Maybe this is the problem.


Yes, maybe...


Long time ago I learned to configure dynamic IP addresses (dialups) as
trusted, but not as internal.


They probably should be neither.


In this case, clients are internal, not dialup, but I still think they 
should not be listed in internal_networks (as I don't trust them not to 
spoof anything).


If you do not trust them not to spoof anything, they absolutely must not 
be in trusted_networks.


It seems to me that you have a technical & management arrangement 
unsuited to the SpamAssassin 
trusted_networks/internal_networks/msa_networks logical model. My 
recommendation would NOT be to modify stock rules that are constructed 
with that logical model as a base assumption, but rather to create your 
own mitigating rules to handle the fact that you seem to want to always 
accept mail from certain internal clients which are nameless, 
untrustworthy, and sources of mail with features that in the world at 
large mostly correlate to spam.
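
A hedged sketch of what such a local mitigation could look like (the 
rule names and the 10.1.2.0/24 client range are invented; adjust the 
relay test to whatever actually identifies those clients in your setup):

  # offset the header-order rule only for mail relayed from the known clients
  header __FROM_LOCAL_CLIENTS  X-Spam-Relays-Untrusted =~ /^\[ ip=10\.1\.2\./
  meta   LOCAL_FTSDMC_OFFSET   __FROM_LOCAL_CLIENTS && HDR_ORDER_FTSDMCXX_DIRECT
  score  LOCAL_FTSDMC_OFFSET   -1.0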


Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-31 Thread Bill Cole

On 31 Aug 2018, at 4:05, Matus UHLAR - fantomas wrote:


On 08/30/2018 10:16 AM, Bill Cole wrote:
It's hard to understand this circumstance based on the generic 
description.


It appears that you have a configuration where a relay is in
trusted_networks (i.e.  you believe what it asserts in Received 
headers)

but it is NOT in internal_networks so it is in the synthetic
X-Spam-Relays-External pseudo-header, it is the only element in
X-Spam-Relays-External so the message matches __DOS_SINGLE_EXT_RELAY, and

it has no rDNS so the message matches __RDNS_NONE.

So: why is that nameless machine that you cannot make a named 
machine NOT in internal_networks?


multiple client PCs in the local network.

and as client PCs, I don't want to put them into internal_networks.
(And if I remember correctly, I should not).


This is a great example of why it is always helpful to have actual (or 
carefully constructed) samples of mail and of how that mail is analyzed 
by SA in order to solve a classification problem.  I still don't have a 
solid understanding of how this mail is flowing and what sort of trust 
you have in the behavior of the specific machines involved in generating 
and/or transporting the mislabeled email, so I can't say for sure how 
you should classify those client PCs.


As I said in my earlier message today, I think you have a circumstance 
that can't be forced into how SA classifies hosts.





On 30 Aug 2018, at 12:40, Grant Taylor wrote:
I don't know if this is the OP's case or not, but the following 
example

comes to mind.

SA (running on your receiving MTA) receives a message from an MTA 
(which
is itself an MSA) of an external Business-to-Business partner (thus 
a
trusted MTA that is not internal to the recipient's organization) 
which
itself received the message from a client on an RFC 1918 network 
without

reverse DNS.


On 30.08.18 15:08, Bill Cole wrote:
If that MSA is requiring authentication (as it should) and recording 
that
in the Received header (as it should) then as I understand it, the 
handoff

of the message will not be considered for __RDNS_NONE.


Authentication not implemented yet, and telling the network admins 
they must
to implement it now that I have installed spamassassin, is not 
acceptable.


Tuning DNS is of course possible but it requires some time.


Yes. My response to Grant was solely in regards to his hypothetical.


Re: __HDR_ORDER_FTSDMCXXXX hitting windows live mail (and outlook express)

2018-08-30 Thread Bill Cole
On 30 Aug 2018, at 18:02, Grant Taylor wrote:

> On 08/30/2018 03:50 PM, Bill Cole wrote:
>> That will depend on how that particular MTA constructs its Received headers 
>> in relation to the parsing in 
>> Mail::SpamAssassin::Message::Metadata::Received, which is non-trivial to 
>> describe in human language.
>
> Fair enough.
>
> Would it be possible for this scenario to present with the symptoms that the 
> OP described?

I don't think so, given the description, but maybe.

Mail::SpamAssassin::Message::Metadata::Received implements a baroque ad hoc 
parsing mechanism that has been adapted organically for most of 2 decades and 
which "knows" many special cases where a particular Received header pattern 
indicates a trusted hand-off.

My understanding is that the client lacking rDNS in this case is talking 
directly to the SA host, which is a simpler case.

> Thank you for humoring me as I try to learn.

No problem. I make no claim to knowing absolutely everything about how the 
*_networks and Received header parsing behaves, or even to know better than 
anyone else in particular, but I've fought with it a bunch...



Re: From name containing a spoofed email address

2018-01-19 Thread Bill Cole
On 19 Jan 2018, at 10:20 (-0500), Rupert Gallagher wrote:

> Empty Message

You're repeating yourself...


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: From name containing a spoofed email address

2018-01-19 Thread Bill Cole

On 19 Jan 2018, at 20:02 (-0500), jdow wrote:

After your first time being a victim of cyberstalking you'll soon 
enough wish your "from" line was as generic as mine. People who put 
their full name in the From: line haven't been mugged yet. I spent a 
year learning about this 1985-1986.


I think that's variable. I had issues 95-97 with both a herd of Usenet 
kooks and the Church of Scientology (for my small role in defending 
a.r.s) that included in-person confrontations but I didn't then drop nor 
have I since dropped the use of my real name online, (with a partial 
exception during my divorce when I reverted to my birth surname before 
it was official) despite an attenuated but never really dead trickle of 
net-originated hostility for 20+ years. I think one's individual 
vulnerabilities make a huge difference, as there are threats that would 
literally be laughable to me which would be legitimately and justifiably 
terrifying to others. The worst I got was 2 visits from CPS in response 
to anonymous 'tips' of child abuse and a threat of a beating from a man 
who actually showed up at my door but didn't stay long enough for any 
substantive interaction after he made a snap judgement of his 
prospects...


OTOH, if I were a woman or looked less like a biker-bar bouncer or had a 
history I'd rather not have widely known, I'd almost surely evaluate my 
risk differently. This is a hard problem that no one has yet solved.


As a byproduct of this habit of mine, when I see a "To: John" or other 
name than mine it's automatically spam, especially when it cannot even 
get the gender right.


That can be useful even without a nym in the From header, although it is 
helpful to have a tricky name. e.g. no one has ever called me "Willy" 
except for a few spammers.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: From name containing a spoofed email address

2018-01-19 Thread Bill Cole

On 19 Jan 2018, at 16:17 (-0500), Chip wrote:


Do you mean don't whitelist_auth *@example.com *unless* they have
published spf/dkim?


I can't speak to Dave's meaning (although I value it...) but in fact 
whitelist_auth directives only have any effect if the domain has 
published SPF or DKIM records (and in the latter case, signs mail.) 
Having those directives is harmless if they don't support one of those 
authentication mechanisms.



Certainly paypal and chase (your examples where you would use
whitelist_auth) have real human users. . .


Nope.

OK, so I don't know about those SPECIFIC domains but in general, major 
consumer-facing brand holders are usually smart enough (or hire ESPs 
smart enough...) to keep their humans and their non-human bulk senders 
segregated by domain and relevant authentication mechanisms. For 
example, a decade ago I had personally specific addresses directly under 
the audiusa.com and vw.com domains but neither of those domains had ANY 
bulk sender addresses except in subdomains and those subdomains shared 
NO authentication mechanisms with the base domains that had human users. 
PayPal and Chase may have stupider admins & governance today than VWoA 
had a decade ago, but I doubt that.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Maxium URL acceptable length

2018-01-23 Thread Bill Cole

On 23 Jan 2018, at 11:55 (-0500), Pedro David Marco wrote:


Shall SA accept URLs 5MB big for example?


Generally speaking, SA should not be seeing whole messages that big, 
much less single URLs. Beyond the slowness and the resource demands of 
scanning large messages, the discernment power of SA fizzles out around 
500KB. You won't catch much of the spam that is very large with SA, 
because it isn't very similar to the spam SA is designed for or usually 
trained on.


But to the original question, it is unfortunately true that entities 
which are generally recognized as legitimate sometimes use URLs in email 
that exceed 1KB, while URLs longer than 2KB are quite rare in ham or 
spam. Some data:


For a while I had test rules that hit URLs with long parts after the 
hostname and found that a 600 character threshold was useless, with a 
tiny correlation to ham. At 800 there was a stronger but still not 
useful correlation to ham. Over 1000 it was a minor menace, hitting 5 
times as much ham as spam with most of the spam already scored in double 
digits by SA and very few that had slipped past SA. I killed the rules 
as a failed experiment.


Today I did a very rough check of an unrepresentative corpus 
(hand-classified but containing only ham and SA escapees) of 95k 
messages (93k ham/2k spam) from the past 42 months. Longest URL is 2054 
characters after the hostname and that one is in ridiculously 
pathological spam whose text/plain part is mostly HTML-encoded versions 
of UTF-16(?) entities. The next longest is 1852 characters and it's in 
ham. I see no way to make the length of URLs a useful spam test.


However, there is a bright side of that. While it will not catch much, 
it is *probably* perfectly safe to set a prudent limit on URLs (say, 
5KB?) and not need to worry much about FPs.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: how to grep multiline add-header X-Spam lines

2018-03-01 Thread Bill Cole

On 28 Feb 2018, at 16:13 (-0500), RW wrote:


On Wed, 28 Feb 2018 21:01:36 +0100
Benny Pedersen wrote:


how do one make multiline grep of add-header line, this is imho
triggy since it on long lines continue on next line with a first char
space, if one could help me solve it i be thankfull


If you want to use grep, you can pipe the files through an awk
one-liner to unfold the headers.


That works, but it is probably more convenient (if one has the procmail 
package installed or can install it easily and doesn't have awk syntax 
in the wetware) to use formail -cs
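
For example, something along these lines (the mbox path is only a placeholder):

   formail -cs < /path/to/mbox | grep '^X-Spam-'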



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Dealing with links to malicious documents

2018-03-13 Thread Bill Cole

On 13 Mar 2018, at 14:21 (-0400), John Hardin wrote:

d) Don't accept emails from outside your organization that link to 
hosted documents. The document needs to be attached, so that it can be 
scanned. Unfortunately this is not feasible if you're not a (at least 
semi-)monolithic organization where you can apply such policies.


Also not feasible if any users subscribe to this list or most technical 
discussion mailing lists. For example, here you are likely to get links 
into the SA Wiki or to KAM's rules. On the Postfix list it is a rare 
week that does not have multiple links to the DEBUG_README file posted. 
The example provided was apparently to a directory (URL ending in '/') 
but redirected to a .doc.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Problems with SORBS?

2018-04-07 Thread Bill Cole

On 6 Apr 2018, at 8:08, Martin Gregorie wrote:


I'm getting a lot of SORBS lookups rejected due to an "unexpected
RCODE". Is anybody else seeing these?


I'm sure someone is...

There are none of those where I see. If the "unexpected RCODE" is 
SERVFAIL, it was likely transient on their end. If it was REFUSED then 
you may need to check whether you might be hitting whatever the volume 
caps are (I have no idea what the SORBS volume caps are.)


Re: FSL_BULK_SIG still active?

2018-04-07 Thread Bill Cole

On 7 Apr 2018, at 8:08 (-0400), Robert Boyl wrote:


Hi, everyone

Pls...

Is this still an active spamassassin test?


No. It is a 'sandbox' rule that got auto-promoted at some point and was 
auto-demoted March 12. If you run sa-update daily and restart any 
persistent processes using the rules afterwards, you will keep up with 
those automatic changes.
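
For example, a nightly cron entry along these lines (paths and service name 
vary by platform, so adapt accordingly):

   # sa-update exits 0 only when new rules were actually installed,
   # so the restart runs only when there is something new to load
   30 3 * * *   sa-update && service spamassassin restart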



header   __FSL_HAS_LIST_UNSUB  exists:List-Unsubscribe
meta     FSL_BULK_SIG  ((DCC_CHECK || RAZOR2_CHECK || PYZOR_CHECK) && !__FSL_HAS_LIST_UNSUB)
describe FSL_BULK_SIG  Bulk signature with no Unsubscribe

Had some odd false positive due to its high score of 1.35...

It was a forgot password message... and it scored "Bulk signature with 
no

Unsubscribe".


Which is probably not wrong. Password reset messages are usually quite 
similar to each other, just like a large fraction of spam, and there's 
little point in them having unsub links.


Seems strange as it depends on DCC, Razor, Pyzor, systems that I also 
see

score wrongly.


Those are all primarily distributed bulk detectors, rather than spam 
detectors. Currently they each score about 1, so even if they all hit 
simultaneously on non-spam bulk mail (which they rarely do) there needs 
to be something else spammy about a message to push it into the 'spam' 
classification with the standard threshold of 5. If you're using a lower 
threshold (as I do) you should have carefully-managed local rules and 
score adjustments, and have a valid reason to believe that your mail 
flow fits that divergence.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: low score on very spammy email

2018-04-11 Thread Bill Cole

On 10 Apr 2018, at 18:28, Motty Cruz wrote:


 reject_rbl_client zen.spamhaus.org,
 reject_rbl_client cbl.abuseat.org,


That is redundant. The Zen list includes the CBL and Spamhaus has taken 
over operation of the CBL so there's no lag time between them any more.


Re: FORGED_GMAIL_RCVD and USER_IN_DEF_SPF_WL

2018-04-11 Thread Bill Cole

On 11 Apr 2018, at 15:28 (-0400), Alex wrote:


Hi, this message seems suspicious to me (appears to be some type of
survey), but I don't understand how it was whitelisted when google.com
is not listed among def_whitelist_from_dkim (or at least shouldn't be)


Note that google.com has historically been reserved for Google corporate 
mail, NOT GMail. Hence these rules exist in the default rules:


60_whitelist_auth.cf:def_whitelist_auth *@*.google.com
60_whitelist_dkim.cf:def_whitelist_from_dkim  
googlealerts-nore...@google.com

60_whitelist_dkim.cf:# def_whitelist_from_dkim  *@google.com



https://pastebin.com/raw/h1370F1F

I'd appreciate any clarification on what's going on here...


The envelope sender is 
3ue3owhmjamkzhabyuuhahsbe.qpzhvnthps.jvtytilzadlzalyu@trix.bounces.google.com 
and the SPF-relevant relay IP is 209.85.223.199, so SPF passes. That's 
good enough for def_whitelist_auth.


Messages of this sort make an irrefutable argument for removing the 
general pass given to Google in the default ruleset, as it is clearly 
based on a use model of the domain which no longer is true.
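
In the meantime, it should be possible to neutralize that entry locally with 
something like this in local.cf:

   unwhitelist_auth  *@*.google.com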



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: URI_TRY_3LD fp's with QuickBooks Intuit emails

2018-04-13 Thread Bill Cole

On 13 Apr 2018, at 6:36 (-0400), Giovanni Bechis wrote:


On 04/13/18 09:06, Sebastian Arcus wrote:
Hello all. I am getting some fp's with emails from QuickBooks / 
Intuit with the above rule:


Apr 13 08:00:30.853 [5768] dbg: rules: ran uri rule URI_TRY_3LD 
==> got hit: "https://myturbotax.intuit.com"


On a slightly different note, and mainly for my curiosity to 
understand SA rules syntax, in 72_active.cf, the score seems to be 
commented out:


#score   URI_TRY_3LD   2.000   # limit

But when it hits, it still adds 2.0 to the score (and I haven't 
customized the score anywhere else).


That's exceedingly unusual and difficult to explain...


Is this a special form of SA syntax?


No, it is an artifact of how sandbox rules are included in the published 
rules.


the score is present in rulesrc/sandbox/jhardin/20_misc_testing.cf 
with tflags publish.

 Giovanni


Yes, but it is published in 72_scores.cf with a trivial score:

score URI_TRY_3LD   0.001 0.001 0.001 0.001
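
If that default does not suit a particular mail flow, a local.cf override 
takes precedence over anything delivered by sa-update; the value below is 
only an example:

   score URI_TRY_3LD 0.5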



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Bill Cole

On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:


Do the standards really require a message id to be in all lower case?


Of course not, and that's also not an accurate description of 
MSGID_SPAM_CAPS.


Only a small minority of rules in SA are based on any external standard. They 
are empirical and pragmatic, not legalistic. There is a complex analysis 
of multiple mail streams used to generate scores for the rules and to 
decide which rules are good enough to publish in updates, run on a daily 
basis because it takes most of a day to run. The fact that 
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
developer's tag prefix) implies that at some point in the past it was 
reliable enough as an indicator of spam to be part of the default set.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: plugin: eval failed: __alarm__ignore__(xxx) how to troubleshoot

2018-04-20 Thread Bill Cole

On 20 Apr 2018, at 14:50 (-0400), John Hardin wrote:

Given your findings, I kinda suspect *all* of the tflags=multiple 
rules are misbehaving from time to time under 3.3.1 - the compiled 
code may be getting into an infinite loop somehow if the number of 
*real* hits on the rule exceeds some value - I note there were 17 hits 
on "your business" there.


Not ALL rules... Unless I'm addled by the past 2 days of fever, it looks 
like an example of this SA bug caused by a perl bug:


https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6558

I'm surprised RH didn't backport the fix for either perl or SA.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: anyone recognize these headers? From SA or are they from another spam product?

2018-04-24 Thread Bill Cole

On 24 Apr 2018, at 20:10 (-0400), L A Walsh wrote:


These headers (not these values) are in most or all of my emails.

In one email on the net they were adjacent to SA's headers (but they
aren't in my emails).  I was wondering if anyone knew what
product might be inserting these headers:

X-CSC: 0
X-CHA: v=1.1 cv=6jkfEoj2u7Yj9etNrzOg8LH7MfGxzbc6Xn0EJkmycus= c=1 sm=1
a=nDghuxUhq_wA:10 a=CxQU8S3nryls5r8B3V4N1Q==:17 
a=3Y9Ew-73vc-33Fzs_NIA:9
a=wPNLvfGTeEIA:10 a=z11Dn8fxQD8A:10 a=Pmo6RyrIMpYA:10 
a=zoqau9DHoPcA:10

a=zE7RolXeqPMA:10 a=CxQU8S3nryls5r8B3V4N1Q==:117
X-CTCH-Spam: Unknown
X-CTCH-RefID: 
str=0001.0A020207.521CE122.0254,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0

X-WHL: SLR


The X-CTCH-* headers are a sign of filtering software from Cyren 
(formerly Commtouch,) which has been resold or integrated by multiple 
vendors of commercial email filtering products, including Sophos and 
Ipswitch.


I don't know if it is related, but some evidence of scanning by 
something called 'ironport', as well as by Symantec.

I'm trying to track down what is scanning my email at an upstream mail 
host
as they've rejected random emails on initial rcpt of the msg -- 
without

accepting the message and bouncing it, but just not accepting it
with the message:

   User and password not set, continuing without authentication.
64.29.145.41 failed after I sent the message.
   Remote host said: 550 5.7.1 vB73jgO3003858 This message has been
   blocked for containing SPAM-like characteristics.


What email SW censors things by rejecting them before accepting them?


That is not a unique feature, and is widely regarded as a best practice. 
An MTA which accepts mail and later decides that it is spam has an 
insoluble problem: pass along mail which is probably malicious, bounce 
it to an inherently untrustworthy sender address that may belong to an 
innocent victim, or drop it silently.


Since this mail is being rejected immediately, you have an obvious place 
to go to get the problem fixed: whoever runs the server you're 
submitting mail to. Presumably that is an entity with whom you have a 
direct relationship.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: SpamAssassin 3.4.2.

2018-04-17 Thread Bill Cole

On 17 Apr 2018, at 16:54, John Hardin wrote:


On Tue, 17 Apr 2018, David Jones wrote:


On 04/17/2018 03:29 PM, Kevin A. McGrail wrote:

Dave, why would it go into EPEL?  SpamAssassin is a core RPM.


I will be updating my main SA platform servers to CentOS 7 this 
summer so this should be good timing to get SA 3.4.2 from the core 
repo update.  :)


RHEL 7 / CentOS 7 core is still on SA 3.4.0 - I had to manually roll 
my own SA 3.4.1 RPMs from Fedora SRPMs.


Anybody here from RH that can commit to packaging SA 3.4.2 for a RHEL 
7 core update or explain why it's behind?


It's a Red Hat long-standing stability policy. They backport security 
and some bugfix patches (which is why they have a version '3.4.0-2' RPM) 
but they do not generally import any upstream version updates that have 
any potential backward compatibility risk at all except at major EL 
version releases. So EL7 systems will never get anything but a patched 
3.4.0.


If you want to track current releases of software on something like 
RHEL, use Fedora.


Re: SpamAssassin 3.4.2.

2018-04-17 Thread Bill Cole

On 17 Apr 2018, at 18:13, David Jones wrote:


 Why hasn't the packaging in RHEL/CentOS been updated to 3.4.1?


At my last job where there were supported RHEL machines, I asked a RH 
support person a similar question regarding Postfix and got the answer: 
"If you want Fedora, you know where to get it."


Re: SpamAssassin 3.4.2.

2018-04-17 Thread Bill Cole

On 17 Apr 2018, at 16:38, David Jones wrote:


On 04/17/2018 03:29 PM, Kevin A. McGrail wrote:

Dave, why would it go into EPEL?  SpamAssassin is a core RPM.



Oh yeh.  I guess because it's been so long since we had an update and 
my main boxes are running CentOS/SL 6.9 that I forgot it was a core 
package.  The CentOS 5 and 6 boxes out there aren't going to get the 
new version unless it gets put in some other repo like EPEL or another 
third party since they are not getting any updates.


My understanding of EPEL policy is that its packages never replace the 
EL base packages.


It is often possible to install RPMs from the Fedora updates repos that 
are analogous to your EL/CentOS version.





Re: Differing scores on spamassassin checks

2018-04-16 Thread Bill Cole

On 16 Apr 2018, at 19:01 (-0400), John Hardin wrote:


On Mon, 16 Apr 2018, Computer Bob wrote:


Why should sa-learn not be run as root ?


That's a general safe practice. Do as little as root as you possibly 
can. Why risk a root crack from an unknown bug in sa-learn that 
somebody has discovered and figured out how to exploit via email?


Right: don't let malicious strangers talk to root, even via email.

ALSO: sa-learn itself won't stop you from running it as root. Without a 
global bayes_path, it will learn into ~root/.spamassassin/bayes_* files 
which no other user can access and spamd can't even TRY to use because 
it refuses to run as root and drops to 'nobody' if run by root. With a 
global bayes_path, the bayes_* files will become owned by root and 
everything else trying to use them (i.e. everything) will fail.
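
If a shared site-wide Bayes database is the goal, something along these lines 
avoids that trap (the path, mode, and user name are assumptions to adapt):

   # local.cf
   bayes_path       /var/lib/spamassassin/bayes/bayes
   bayes_file_mode  0660

   # and always train as the unprivileged user spamd actually runs as, e.g.:
   #   sudo -u spamd sa-learn --spam /path/to/spam-mbox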


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Lots of money, score of 0??

2018-03-27 Thread Bill Cole

On 27 Mar 2018, at 10:24, Robert Boyl wrote:


Guys,

Do you usually tune up Lots of money rule? Strange, our 
spamassassin/EFA
scores 0 and false negative. Imho it should score at least something, 
few

people would write Million dollars in an email, why not add up score?

LOTS_OF_MONEY 0.00

See https://pastebin.com/dY6iFeYL


I see a very large number of legitimate and definitely wanted messages 
hitting the LOTS_OF_MONEY rule. 849 in my own mail in the past year, 
excluding mail with quoted spam. This includes YOUR message asking about 
it.


Re: This sucks

2018-04-01 Thread Bill Cole

On 1 Apr 2018, at 12:26 (-0400), Michael Brunnbauer wrote:


So let's look at
my problem again: running my example spam through spamassassin gets it
marked as spam while using spamc+spamd does not.


This is a critical fact. It indicates that your spamd and the 
spamassassin script you are running are definitely using different 
SpamAssassin configurations, possibly different versions of the 
SpamAssassin distribution, and or possibly even different versions of 
Perl.


Determining what config the spamassassin script is using is fairly easy: 
'spamassassin -D generic,config,diag  --lint' will give you all the 
details. Figuring out what spamd is using is less simple (and 
system-specific) but since you've been maintaining a system by hand for 
a long time I expect you'll be able to figure out how to do so safely.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Spam from addresses where full name mirrors left-hand side of address

2018-04-02 Thread Bill Cole

On 2 Apr 2018, at 1:33 (-0400), Rich Wales wrote:


[I tried asking this question a couple of days ago, but I've seen no
signs that it made it out to the list -- possibly because the sample
e-mail addresses I included in my question might have caused it to be
flagged as spam.  So here goes again, this time with the addresses
mangled a bit.]

I see a lot of spam with "From:" lines where the left-hand side of the
address is essentially the same (modulo punctuation) as the "full 
name"

portion of the address.  The right-hand side, on the other hand, is a
random gibberish domain.

A few examples currently sitting in my local server's spam quarantine
(with the addresses edited so they hopefully won't trigger any spam 
checks):


    Adding To Human Lifespan 
    "Eliminate Fat Fast" jeanettejtaylor

(dot) com>
    "Home Warranty Special" racerville

(dot) com>
    Smartphone Screen Protector 

dtqmp (dot) com>

Two questions:

Is it *technically possible* to create a Spamassassin rule which would
match this sort of "From:" line?



This (UNTESTED) should do it:

header THREE_WORD_MONTY  From =~ /(\w+) (\w+) (\w+) <\1.\2.\3/



And assuming it can be done, is it *worthwhile* to do it? 


Not a clue. Maybe worth a try?

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: This sucks

2018-04-02 Thread Bill Cole

On 1 Apr 2018, at 21:09 (-0400), Michael Brunnbauer wrote:
[...]

Figuring out what spamd is using is less simple (and system-specific) 
but
since you've been maintaining a system by hand for a long time I 
expect

you'll be able to figure out how to do so safely.



This does not sound very helpful of you


Well, it's rather difficult to guess what mechanism you are using to 
start spamd and when I wrote that I had not seen your message narrowing 
down the range of possibilities to those one might use on Linux.



so I did some debugging on my own and
have more information:


So I guess I was right?

The problem only occurs only when spamd is started in the homedir of 
root. If
I start it in any other directory (including subdirs of /root), 
Net:DNS
behaves like it should: $answer->rdatastr in dnsbl_uri in Dns.pm 
contains IP
addresses in dotted quad notation, like 127.0.0.3. If I start spamd in 
/root,
$answer->rdatastr contains strings like "\# 4 7f000003" instead. This 
occurs

regardless of any -x or -u flags to spamd.


OK, so that is quite odd.

Is there a tree of Perl modules under /root? Perl before 5.26 includes 
'.' in the @INC array that defines where modules might live, which could 
cause this if you have a private install tree (as is the default in some 
distros.) Also, if you run spamd as root it drops to 'nobody' but if 
you're in /root and /root/.spamassassin is world-readable, it will still 
get used as the user prefs directory.


A normal startup of spamd (by sysvinit, Upstart, systemd, etc.) is what 
you need to diagnose, not a manual startup from a login shell. None of 
those normally should put the daemon in /root as a working directory. 
Run as a proper system daemon by the normal startup subsystem, spamd 
gets a substantially different environment than if you run it by hand 
from an interactive shell.


So being in /root when started changes the behavior of spamd. Is it 
possible

that this is a timing issue? Could "\# 4 7f000003" be some unprocessed
response that would be converted to 127.0.0.3 a moment later? Or is 
there

some other explanation for this?


No, it's not a timing issue. The root cause is that 
Net::DNS::RR->rdatastr() should never have been relied upon by SA to 
have any particular format because it was always poorly documented and 
quietly vanished from the documentation (but not the code) for 
Net::DNS::RR.pm  in 0.69. What it actually contains is a function of the 
specific DNS record and what server generated the response, making an 
explanation for any specific oddity something of a guessing game.


More recently, there have been multiple other changes in various 
components of the Net-DNS distribution that have caused other problems 
in SA, and they may interact with the rdatastr issue. These issues have 
all been addressed in the current SA code, both in the 'trunk' and in 
the 3.4 branch which will (hopefully soon) become the 3.4.2 release. 
Many (most? all?) packagers of SA maintaining it for major platforms 
have incorporated some or all of the necessary DNS-related fixes. I've 
attached a patch that aggregates all of the fixes to this message. You 
could also install SA from the current 3.4 branch or the last 3.4.2 
release candidate package, or if you're adventurous, from the SVN 
'trunk' that will eventually yield v4.0.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
Index: lib/Mail/SpamAssassin/Plugin/AskDNS.pm
===
--- lib/Mail/SpamAssassin/Plugin/AskDNS.pm  
(.../tags/spamassassin_release_3_4_1/lib/Mail/SpamAssassin) (revision 
1676603)
+++ lib/Mail/SpamAssassin/Plugin/AskDNS.pm  
(.../branches/3.4/lib/Mail/SpamAssassin)(working copy)
@@ -140,7 +140,7 @@
 multiple character-strings (as defined in Section 3.3 of [RFC1035]), these
 strings are concatenated with no delimiters before comparing the result
 to the filtering string. This follows requirements of several documents,
-such as RFC 5518, RFC 4408, RFC 4871, RFC 5617.  Examples of a plain text
+such as RFC 5518, RFC 7208, RFC 4871, RFC 5617.  Examples of a plain text
 filtering parameter: "127.0.0.1", "transaction", 'list' .
 
 A regular expression follows a familiar perl syntax like /.../ or m{...}
@@ -192,10 +192,9 @@
 use Mail::SpamAssassin::Util qw(decode_dns_question_entry);
 use Mail::SpamAssassin::Logger;
 
-use vars qw(@ISA %rcode_value $txtdata_can_provide_a_list);
-@ISA = qw(Mail::SpamAssassin::Plugin);
+our @ISA = qw(Mail::SpamAssassin::Plugin);
 
-%rcode_value = (  # http://www.iana.org/assignments/dns-parameters, RFC 6195
+our %rcode_value = (  # http://www.iana.org/assignments/dns-parameters, RFC 
6195
   NOERROR => 0,  FORMERR => 1, SERVFAIL => 2, NXDOMAIN => 3, NOTIMP => 4,
   REFUSED => 5,  

Re: T_DKIM_INVALID false positives with Gmail

2018-03-19 Thread Bill Cole

On 19 Mar 2018, at 11:29, Sebastian Arcus wrote:

I've been seeing a number of false positives recently from 
T_DKIM_INVALID with Gmail emails. Are some Gmail servers 
misconfigured, or could something be going on at my end? The DKIM 
record which is flagged as invalid is below:


DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=googlemail.com; s=20161025; 
h=mime-version:from:date:message-id:subject:to;bh=8wlgvdpEOmUO2ugslPxRkFYA/ZThwu2bWy5VmlR76ug=;
b=gRcnOIzmENqS8a91mSdETdXvyH6df7u0tSwsadk6CMD0KtAbzuM3ojHW+kPEo7AB1i   
 vnbCDc/vsR6H7pP0k3hZmF7z/dAaeZWD4RVzqM+Fv70oHy4af64j+fGSekOCM9o4ShRQ
Vk3KyF+69sKTK3rRWEnfrcgi/pN2DJWDvrIBRjmFOZYKNVN+8elaVM9DOO7tEMLYuw7T   
+sVaUMNt8MuPxRhrskJYOIxK8zzkcJHYV+1TuWJuqZAHRVwgnDWX7q3Wx0GwrX+3lKpm   
   3A1+F5dBVjH4dXvdfIESm5XpV8b9uBn9daGWrUgkR+PB23XsL9QkxEqCRXdgII3FRxtQ

Ps6A==


There are LOTS of ways to break a DKIM signature. Whether that one is 
broken can't be checked and how it might have been broken can't be 
guessed at without the full *unmodified* headers and body of the 
message.


Re: Spam from compromised accounts scoring just under block threshold

2018-03-05 Thread Bill Cole

On 5 Mar 2018, at 15:14, David Jones wrote:


FYI  This could be something for KAM.cf potentially...

I have seen a few of these this morning that would be scoring just 
under the default SA threshold of 5.0 and are just under my 
MailScanner 6.0 threshold.


https://pastebin.com/r2eZJaef

I am reporting these to Spamcop but new waves of compromised accounts 
keep sending them.


They all seem to have a From address with two periods on the left side 
so something like this:


header __ODD_FROM_SPAM From:addr =~ /.{1,20}\..{1,20}\..{1,20}@/

could be combined with something else in a meta to help detect these 
and push them over the edge.


This looks intrinsically shady and could be useful:
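
As a sketch of what such a meta might look like (the companion rule and score 
are placeholders; FREEMAIL_FROM needs the FreeMail plugin loaded):

   header __ODD_FROM_SPAM   From:addr =~ /.{1,20}\..{1,20}\..{1,20}@/
   meta   LOCAL_ODD_FROM_FREEMAIL  (__ODD_FROM_SPAM && FREEMAIL_FROM)
   score  LOCAL_ODD_FROM_FREEMAIL  1.5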




Re: Why emails relayedfrom trusted/internal networks trigger rules?

2018-04-26 Thread Bill Cole

On 26 Apr 2018, at 3:04 (-0400), Palvelin Postmaster wrote:


Hi,

I relay mail from another server to my main mail server. I have set 
its IP 52.28.104.67 in my spamassassin conf in the internal_networks 
and trusted_networks. I assumed that would prevent spamassassin from 
scanning the messages but no. Why does this happen?


Even when SA can recognize that a message is coming via only trusted 
systems, by default it does not exempt the message from other scanning, 
it simply hits the ALL_TRUSTED rule. That rule normally has a 
significant negative score itself and is used to prevent matching in 
many 'meta' rules.



X-Spam-Status: Yes, score=6.1 required=5.0 
tests=AWL,DKIM_ADSP_NXDOMAIN,HELO_DYNAMIC_IPADDR,NO_DNS_FOR_FROM,RDNS_DYNAMIC,T_RP_MATCHES_RCVD 
autolearn=disabled version=3.4.1


Since proper determination of the "X-Spam-Relays-*" pseudo-headers 
controls most of those hits as well as ALL_TRUSTED, getting that fixed 
will almost surely be adequate. It will also help with other rules that 
depend on identifying the boundaries between internal vs. external 
and/or trusted vs. untrusted Received headers.



Received: by palvelin.fi (CommuniGate Pro PIPE 6.2.3)


That may be your first problem. SA can't parse that as a proper Received 
header, which may trigger it to not classify the rest of the Received 
headers correctly. It's hard to tell if this is causing trouble in this 
case, because there are problems with the rest as well. With that said, 
if you can make CGP use SA via an external filter rather than delivering 
through the PIPE module, you'll get a more robust and performant 
solution without this oddball Received header. For the 8+ years I ran 
CGP systems, I used the free cgpav filter, but for modern CGP that needs 
some patching to work. I seem to recall that cgpsa is also a free tool 
that works.


Received: from [52.28.104.67] (HELO 
ip-172-31-20-213.eu-central-1.compute.internal) by palvelin.fi 
(CommuniGate Pro SMTP 6.2.3) with ESMTPS id 10108357 for 
i...@.com; Mon, 23 Apr 2018 06:35:44 +0300


The Postfix MTA running on the AWS instance using 52.28.104.67 is 
grossly misconfigured. It should use a EHLO/HELO name ( myhostname, or 
smtp_helo_name if there's a reason myhostname can't be changed) that 
resolves to 52.28.104.67 and also should have proxy_interfaces set to 
52.28.104.67. It appears that 
ec2-52-28-104-67.eu-central-1.compute.amazonaws.com would be a good 
choice, but if that machine talks to anyone else you may want to post a 
non-generic name at the IP and use that.
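
Concretely, that would mean something like this in main.cf on the relay 
(assuming the amazonaws.com name is acceptable and resolves to the elastic IP):

   myhostname       = ec2-52-28-104-67.eu-central-1.compute.amazonaws.com
   proxy_interfaces = 52.28.104.67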


Received: from ip-172-31-26-125.eu-central-1.compute.internal 
(ip-172-31-26-125.eu-central-1.compute.internal [172.31.26.125]) by 
ip-172-31-20-213.eu-central-1.compute.internal (Postfix) with ESMTP id 
ECF2CC0C32 for <i...@obesus.fi>; Mon, 23 Apr 2018 06:35:43 +0300 
(EEST)


At this point, SA should have already given up parsing Received headers 
so the fact that this and the remaining Received headers use RFC1918 IPs 
and a generic name in a non-resolvable domain doesn't matter: SA cannot 
trust these because the chain of trust and working DNS is already 
broken.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Method of setting score for a custom rule to be the required_score ?

2018-06-28 Thread Bill Cole

On 27 Jun 2018, at 22:17, J Doe wrote:

I went back to “man Mail::SpamAssassin::Conf” and can see mention 
of the shortcircuit plugin . . . is there more documentation (perhaps 
in another man or perldoc), where the shortcircuit keyword is 
mentioned ?


perldoc Mail::SpamAssassin::Plugin::Shortcircuit

For any Perl module that has embedded 'pod' documentation, 'perldoc' 
provides the best documentation because it is extracted from the actual 
module rather than relying on a 'man' page that was almost certainly 
extracted from the module originally but may be stale.
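
As a sketch of the syntax (MY_CUSTOM_RULE stands in for whatever rule you want 
to force the verdict on):

   # the plugin itself is normally loaded from v320.pre:
   #   loadplugin Mail::SpamAssassin::Plugin::Shortcircuit
   shortcircuit            MY_CUSTOM_RULE  spam
   shortcircuit_spam_score 100   # score assigned when a 'spam' shortcircuit fires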


Re: Error 74 with spamc

2018-10-22 Thread Bill Cole

On 21 Oct 2018, at 21:14, Cecil Westerhof wrote:


When executing spamc I do not get output and the exit status is 74
(EX_IOERR: IO error).


This would be the result of spamc not being able to communicate with 
spamd.


Is spamd running?
Is spamd listening on the socket that spamc is trying to connect to?

The man pages for spamc and spamd can help you understand how to 
determine the answers to these questions.
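
A few quick checks, assuming a default TCP setup on port 783:

   ps ax | grep '[s]pamd'     # is the daemon actually running?
   ss -ltnp | grep 783        # is it listening where spamc expects it?
   spamc -K                   # ping spamd; exit status 0 means it answered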


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole


Re: Error 74 with spamc

2018-10-22 Thread Bill Cole

On 22 Oct 2018, at 11:08, Cecil Westerhof wrote:


"Bill Cole"  writes:


On 21 Oct 2018, at 21:14, Cecil Westerhof wrote:


When executing spamc I do not get output and the exit status is 74
(EX_IOERR: IO error).


This would be the result of spamc not being able to communicate with
spamd.

Is spamd running?


Yes, spamd is running.



Is spamd listening on the socket that spamc is trying to connect to?

The man pages for spamc and spamd can help you understand how to
determine the answers to these questions.


I should have looked into the logs. :'-(


Always a wise choice.



When I run it again I see in the logging:
Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: connection 
from localhost [::1]:58764 to port 783, fd 5
Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: setuid to 
imaps succeeded
Oct 22 16:47:15 munus.decebal.nl spamd[17102]: spamd: service 
unavailable: TELL commands are not enabled, set the --allow-tell 
switch.
Oct 22 16:47:15 munus.decebal.nl spamd[17101]: prefork: child 
states: II


It is a bit strange. I had the same problem 1½ year ago. I solved it
by adding --allow-tell switch in the service file. Now it contained:
ExecStart=/usr/sbin/spamd -d --pidfile=/var/run/spamd.pid $OPTIONS

I do not see the OPTIONS defined.

I substituted --allow-tell for $OPTIONS and restarted the service. Now
it works again. But why the service file has been changed …


That would be an issue for whoever packages SA for your system. There is 
no systemd service file distributed in the SA release.
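
On a systemd-based distro, the change tends to survive package updates better 
as a drop-in than as an edit to the packaged unit (unit name and path are 
assumptions; check what your distribution ships):

   # systemctl edit spamassassin     (creates an override file)
   [Service]
   ExecStart=
   ExecStart=/usr/sbin/spamd -d --pidfile=/var/run/spamd.pid --allow-tell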


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: URI_WPADMIN fp

2018-10-19 Thread Bill Cole

On 19 Oct 2018, at 9:37, Alex wrote:


Hi,

Should we be adding 3 points for just this, or is there never a reason
users should be using /wp-admin in their URLs?


The score is coming out of RuleQA, so the score is derived empirically, 
not by a logical process based in arbitrary axioms.


That doesn't mean it's the one true score for everyone, just that it's a 
useful score in the context of the spam and ham corpora submitted to 
RuleQA. If it causes actual FPs (i.e. ham that is identified as spam, 
NOT ham identified as ham that happens to hit a strong spam rule but 
scores below the threshold) then it is probably a good idea to limit its 
score in RuleQA or to examine the FPs to find ways to narrow the rule. I 
see that John has the basic rigging in place to allow for narrowing via 
meta conditions, so presumably he anticipated the possibility.



Oct 19 09:33:11.561 [1299] dbg: rules: ran uri rule __URI_WPADMIN
==> got hit: "/wp-admin/images/"

The rule description says possible phishing, but how would an end-user
be in a position to create a public link that involves their WP admin
directory in the first place?


Think more carefully about that question. As written it seems much more 
naive than you can actually be.


2 hints:

1. WordPress is probably the most frequently compromised server software 
in the history of the web, excluding Microsoft products.
2. If a website isn't built on WordPress (as most are not) there is 
nothing in any way special about a 'wp-admin' token in a functioning 
URL. I'd offer to demonstrate that with my own website, but I'm not in a 
mood to disable the trap that converts every request for a 
WordPress-like URL into a firewall rule and DNSBL entry...


Re: Status Authenticated Received Chain (ARC) Support

2018-10-17 Thread Bill Cole

On 17 Oct 2018, at 14:27, Markus Kolb wrote:


Hi,

what is the status of ARC Support 
(https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)?


It is not supported in any way in SA as of 3.4.2 and I am unaware of 
anyone proposing an operational model for supporting it. There is no 
supporting code in the current 'trunk' codebase. If someone were to 
provide a reasonable model for supporting ARC in SA and a sound 
implementation, I would expect that it *could* make the 4.0.0 release. 
This would be made somewhat more likely by the draft progressing to a 
final RFC, but the critical component is really a well-designed 
implementation that provides some utility in determining whether or not 
mail is spam. It is worth noting that the utility of DKIM and hence 
DMARC to that end has been marginal.


Also note that while there will be a 3.4.3 release, there is no chance 
of this or any other completely new feature being added for it, as 3.4.3 
is intended to be the final bug fix release for the 3.x lineage.


The perl Mail-DKIM module has ARC support since version 0.50 
(https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm)


Notably, that support is documented as being 10 draft revisions behind 
the current one, so it might not be wise to actually use it...


Does SpamAssassin use this feature from Mail-DKIM if this version or 
newer is available?


No.


Re: KAM_Back rule

2018-10-26 Thread Bill Cole

On 26 Oct 2018, at 15:13, John wrote:


I just got an email from a mailing list of which i am a member (UK
academic geophysics) which was scored at 5, mainly from a 5.5
contribution from KAM_BACK, described as background check SPAM.  I 
have

not managed to work out what that rule is trying to do, but it is the
first detected oh-nasty from using the KAM rules.

Clearly I can reduce the score but I am struggling to see what was
wrong with the message, attached.


There's nothing wrong with the message, the rule is too aggressive.

It consists of 5 sub-rules, 3 body and 2 header for From and Subject. 
Hitting any three satisfies the meta-rule. It seems to be targeted at 
spam selling criminal and/or financial background reports (which is a 
real market here in the US, where we have no serious privacy laws...) 
Unfortunately, it does not seem to be constructed with an appreciation 
for the fact that people discuss criminality in non-spam.


Personally, I just zeroed the score for that on my personal system. 
Thanks for bringing it to light.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Rule for a link with an numeric IP in body?

2018-10-29 Thread Bill Cole

On 29 Oct 2018, at 9:55, Anders Gustafsson wrote:


Is there such a rule already in 3.3.x?


Do not run SpamAssassin 3.3.x. It is not safe. There have been multiple 
serious security bugs fixed in the 3.4.x series.


However, the rules for 3.3.x and 3.4.x are identical.  And yes, the rule 
"NORMAL_HTTP_TO_IP" will catch http(s) URLs using dotted-quad IPs and 
"NUMERIC_HTTP_ADDR" will catch http(s) URLs using a single decimal 
number (which some resolvers will treat as an IP address)


I would ideally want a version of that that adds to the spam score if 
it sees a x.x.x.x/unsubscribe link, possibly translated.


IP_LINK_PLUS does something similar that could be adapted pretty easily.
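
Untested, but an adaptation could look roughly like this:

   uri      LOCAL_IP_UNSUB   m{^https?://\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?/unsubscribe}i
   describe LOCAL_IP_UNSUB   Unsubscribe link pointing at a bare IP address
   score    LOCAL_IP_UNSUB   1.0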


Asking here as regexps are not really my strong side.


Well, if you look in the standard rule distribution (under 
/var/lib/spamassassin/ or somewhere similar, depending on your platform) 
you will find the file 20_uri_tests.cf, in which all of the standard 
URI-based rules are defined, many with comments. Even if you're not a 
wizard with regexps, you may be able to find rules there which you can 
adapt to your own needs with simple changes.


Re: config files in spamasassin is unintended tlds :/

2018-11-05 Thread Bill Cole

On 5 Nov 2018, at 9:44, RW wrote:


I created an A-record at Namecheap for a_b.mydomain.tld and
neither firefox nor chromium had a problem with it.


That's interesting and unfortunate because 'a_b' is unequivocally a 
violation of the syntax for hostnames. It may be acceptable as a DNS 
label, but it isn't a valid hostname.


FWIW, BIND 9.x (since 9.4-ish) will parse and load a zone with such an A 
in it, but complains and does not serve the record: NXDOMAIN for a 
normal query, no hint of it in a zone transfer. Deep in the mists of 
time, the resolver for 'classic' MacOS (not derived from any other 
resolver) got an update that made it no longer resolve hostnames with 
underscores and while there was a brief bit of grumbling, they never 
reversed that stringency. I would guess that with some authoritative 
servers refusing to serve invalid names and some resolvers refusing to 
resolve them, it would be a low-yield tactic to use them to evade 
filtering.




Re: private networks are default rbl tested :/

2018-11-06 Thread Bill Cole

On 5 Nov 2018, at 20:04, RW wrote:


On Mon, 05 Nov 2018 23:37:59 +0100
Benny Pedersen wrote:



https://en.wikipedia.org/wiki/Private_network

why are this network not default internal_networks trusted_networks
msa_networks



They are if you let SA guess your networks. If you specify the 
networks

manually you have to specify everything


And the reason for that is simply that not everyone trusts all of the 
machines on reachable RFC1918 networks. For example, I worked for some 
years at a multinational where 10/8 was allocated globally and was 
routed globally. I had a list of specific non-local machines I was 
supposed to trust for outbound relay (and use when my outbounds couldn't 
use the local external link) but there was no way I could also trust the 
tens of thousands of other 10.* machines around the world that could 
very well be compromised personal desktops. I didn't even trust my own 
local personal desktops.


Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Bill Cole

On 7 Nov 2018, at 14:33, Amir Caspi wrote:


Hi all,

	In the past couple of weeks I've gotten a number of clearly-spam 
messages that slipped past SA, and the only reason was because they 
were getting low Bayes scores (BAYES_50 or even down to BAYES_00 or 
BAYES_05).  I do my Bayes training manually on both ham and spam so 
there should not be any mis-categorizations... and things worked fine 
until a few weeks ago, so I don't know what's going on now.


Here's the magic dump:

-bash-3.2$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db 
version

0.000  0 253112  0  non-token data: nspam
0.000  0 106767  0  non-token data: nham
0.000  0 150434  0  non-token data: ntokens
0.000  0 1536087614  0  non-token data: oldest atime
0.000  0 1541617125  0  non-token data: newest atime
0.000  0 1541614751  0  non-token data: last journal 
sync atime
0.000  0 1541614749  0  non-token data: last expiry 
atime
0.000  05529600  0  non-token data: last expire 
atime delta
0.000  0   1173  0  non-token data: last expire 
reduction count



I don't see any obvious problem but I'm not an expert at interpreting 
these...


The only useful info is that the number of spams and hams scanned (nham 
and nspam) is well above the usage threshold and the fact that the 
various timestamps (other than 'oldest atime') are reasonably recent. If 
you happen not to live in Unix epoch time, the conversion is not hard:


   # date -j -f %s 1541617125
   Wed Nov  7 13:58:45 EST 2018
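
(That is BSD/macOS syntax; with GNU coreutils the equivalent would be:)

   # date -d @1541617125
   Wed Nov  7 13:58:45 EST 2018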


Do I need to completely trash and rebuild my DB, or am I missing 
something obvious?


No and no.

Although it is perhaps helpful to recognize that Bayes is inherently 
imperfect and will always be wrong about some messages.


In many cases, it would appear that these spams have either very 
little (real) text (besides the usual attempt at Bayes poisoning) 
and/or are using HTML-entity encoding to try to bypass Bayes.  Here 
are a couple of spamples:


https://pastebin.com/peiXZivJ
https://pastebin.com/3h3r7r7j


Those both have broken MIME structure, so SA can't treat the HTML part 
as HTML. No MUA would render and display them correctly.


Assuming that you did that breakage yourself, intentionally: Stop doing 
that. It is pointless and hampers any attempt to assist you. The only 
things that could ever be private about spam are the target address and 
internally-added headers.


Does SA decode HTML entities as part of normalize_charset?  If not ... 
can this be added?


I'm not  entirely certain, but the documentation of bayes_token_sources 
in Mail::SpamAssassin::Conf implies that HTML is rendered to text to the 
point where SA can tell whether it is visible, which makes me suspect 
that the entities get decoded. But that IS just a guess: I haven't 
traced the code.


Empirically, I had SA learn a message with regular text in an HTML part 
encoded as entities and then scanned a message with the same text as 
text, and I got a 1.000 Bayes score (BAYES_999) for the second one. YMMV


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Warnings when enabling URILocalBL plugin

2018-11-08 Thread Bill Cole
On 8 Nov 2018, at 17:26, Kevin A. McGrail wrote:

> There are a lot of changes to GeoIP having to do with the database behind
> it being deprecated.  I think you might have to look at all the GeoIP stuff
> and would appreciate your feedback.  Bill, do you remember who was working
> on all the GeoIP stuff?

Giovanni mostly.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Bill Cole

On 8 Nov 2018, at 21:55, John Hardin wrote:


On Thu, 8 Nov 2018, Amir Caspi wrote:


On Nov 8, 2018, at 7:41 PM, John Hardin  wrote:


Sure, but I't also prefer to have a sample to test on before 
committing. I'll see if I can get the pastebin to work (i.e. fix the 
boundary)


I can send you some new spamples via attachment, privately.


No, the pastebinned ones work unaltered.


The problem with that: they are mangled in a way that prevents HTML 
interpretation of the HTML part. Hence a 'body' rule will match the 
uninterpreted entities. For the real world, i.e. with proper MIME 
structure, I think you need a 'rawbody' rule to match against the 
uninterpreted entities.


I have confirmed that de-munging the boundaries fixes them to allow 
proper MIME interpretation. In both cases, the boundary line between the 
2 MIME parts is apparently unchanged, so you can just use it to fix the 
other 3 places that it needs to match.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: googleapis hosted phish

2018-11-15 Thread Bill Cole

On 15 Nov 2018, at 7:52, RW wrote:


On Thu, 15 Nov 2018 01:22:00 -0500
Bill Cole wrote:


On 14 Nov 2018, at 20:11, Alex wrote:


Where is it getting these long hostname strings from?


There's a bunch of garbage HTML using invisible text (font-size: 0)
between tiny bits of visible text to break Bayes and/or specific word
detection.


That particular example is actually html in a  text/plain mime 
section.


The mess in the text/plain part is a result of a botched 
rendering/tag-stripping of the insane text/html part, but yes: the 
specific misidentified domain name is in the plain part and is a result 
of a line-breaking artifact inside the rendered HTML.



--
Bill Cole


Re: googleapis hosted phish

2018-11-14 Thread Bill Cole

On 14 Nov 2018, at 20:11, Alex wrote:


Where is it getting these long hostname strings from?


There's a bunch of garbage HTML using invisible text (font-size: 0) 
between tiny bits of visible text to break Bayes and/or specific word 
detection. The overly-thirsty "URI" parser strings this junk together 
and is seeing .az\b somewhere in it, and picks it up as a 
domain name. It's noisy in debug output but in this case harmless 
because what it is seeing includes a hostname that's too long to be a 
DNS label.


FWIW, that junk can be detected with rawbody rules looking for 
idiosyncratic HTML. I don't publish my local rules which do that sort of 
thing because they are very useful but very evadable and I suspect that 
if the precise rules were broadcast, they'd stop being useful in a 
matter of days. Instead, it would be really good if everyone maintaining 
their own local rules would take that hint and devise an invisible 
forest of slightly different rules to catch HTML structures with no 
legitimate purpose, making it impossible for spammers to get around a 
single rule published in the default channel or KAM.cf or anything else 
known to be under spammers' watch.


(CAVEAT: For some reason, a lot of opt-in political bulk mail also 
catches on such rules.)
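
To give the flavor without giving away anything I actually run, here is 
a deliberately simplistic sketch (names and thresholds are made up) of 
the general shape such a local rule can take:

    rawbody  __LCL_ZEROFONT     /font-size:\s*0(?:\.0+)?(?:px|pt|em)?\s*[;"']/i
    tflags   __LCL_ZEROFONT     multiple maxhits=32
    meta     LCL_MANY_ZEROFONT  __LCL_ZEROFONT > 4
    describe LCL_MANY_ZEROFONT  Many zero-size font declarations hiding text
    score    LCL_MANY_ZEROFONT  1.0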


Should we be rethinking whether googleapis.com should be in the DNSBL 
skip list?


I think it may deserve a special rule all its own (with extensive FP 
shielding) but I suspect that you will never see it in a URIDNSBL that 
is safe to use, so it would do no good to keep resolving 
storage.googleapis.com and other such names with short-TTL CNAME records 
pointing to shorter-TTL A records on a frequent basis only to determine 
that it will never get listed OR that you're using a URIDNSBL which 
intends to generate widespread collateral damage.


Of course, I could be wrong. You could test how wrong I might be with 
this:


clear_uridnsbl_skip_domain  googleapis.com



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: config files in spamasassin is unintended tlds :/

2018-11-04 Thread Bill Cole
On 4 Nov 2018, at 11:45, Grant Taylor wrote:

> Why does it matter if there's a naming collision between DNS domain names and 
> file names?

Discussion of config files for SpamAssassin and Postfix has intermittently been 
matched by URI DNSBLs. Some years ago I discovered just how widespread dumb 
bounce models were when I talked about the master config file for Postfix on 
the Postfix Users list, the same week that someone was spamvertising URLs under 
master (dot) cf.

-- 
Bill Cole


signature.asc
Description: OpenPGP digital signature


Re: config files in spamasassin is unintended tlds :/

2018-11-04 Thread Bill Cole

On 4 Nov 2018, at 16:27, Henrik K wrote:


Can someone actually register and use a domain with underscore in it?


No.

It is worth noting that the SA "standard" for what is treated as a 
domain part of an URI is grounded in how MUAs behave, not in conformance 
to any well-defined specification. I recall a conversation I had either 
here or in a bug with Kevin McGrail some years back in which I argued 
that "could be a domain name in a URI" was too broad a definition, and I 
lost badly because most of my examples of "Not A URI" were in fact 
turned into clickable links by some horrific MUA.


I support the concept of not treating domain-name-like strings that are 
not valid hostnames as if they are URI domain-parts. That would mean 
anything with an underscore. It MIGHT be more prudent to exempt 
leading-underscore labels, as those can be legal domain names that could 
have CNAME or DNAME records mapping them to working hostnames.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: config files in spamasassin is unintended tlds :/

2018-11-04 Thread Bill Cole

On 4 Nov 2018, at 14:48, Matus UHLAR - fantomas wrote:


On 4 Nov 2018, at 11:45, Grant Taylor wrote:
Why does it matter if there's a naming collision between DNS domain 
names and file names?



Bill Cole skrev den 2018-11-04 19:25:

Discussion of config files for SpamAssassin and Postfix has
intermittently been matched by URI DNSBLs. Some years ago I 
discovered

just how widespread dumb bounce models were when I talked about the
master config file for Postfix on the Postfix Users list, the same
week that someone was spamvertising URLs under master (dot) cf.


On 04.11.18 19:48, Benny Pedersen wrote:
Nov  3 03:22:50 localhost named[2301]: connection refused resolving 
'72_scores.cf/NS/IN': 2a04:1b00:6::1#53

[...]
Oct 31 08:30:38 localhost named[2301]: connection refused resolving 
'20_imageinfo.cf/NS/IN': 2a04:1b00:6::1#53


so ns.cf blocks my named now, i cant resolve any cf domains with it

time to change imho


I recommend chasing who is treating those as URLs.


That would be SpamAssassin itself. The policy of treating anything 
matching '[-a-zA-Z0-9_]+\.' as an URI in all contexts dates 
back to v3.3.1 at least. See 
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6716 and note this 
scan of a recent message:



# spamassassin -t -D uridnsbl  
/tmp/mdpreserve.42nj0s5F0hz1hSbGh/INPUTMSG 2>&1 |pcregrep 
'\.cf\b|^(From|Subject|Date|Message-Id): '
Nov  4 15:55:21.684 [55625] dbg: uridnsbl: considering 
host=72_scores.cf, domain=72_scores.cf
Nov  4 15:55:21.720 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
X_URIBL_A DNSBL:72_scores.cf:dnsbltest.spamassassin.org
Nov  4 15:55:21.721 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
X_URIBL_B DNSBL:72_scores.cf:dnsbltest.spamassassin.org
Nov  4 15:55:21.721 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
X_URIBL_DOMSONLY DNSBL:72_scores.cf:dnsbltest.spamassassin.org
Nov  4 15:55:21.722 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_RHS_DOB DNSBL:72_scores.cf:dob.sibl.support-intelligence.net
Nov  4 15:55:22.051 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_MW_SURBL DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.051 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_WS_SURBL DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_PH_SURBL DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_CR_SURBL DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_ABUSE_SURBL DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.052 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
SURBL_BLOCKED DNSBL:72_scores.cf:multi.surbl.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_MALWARE DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ABUSE_PHISH DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_BOTNETCC DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_PHISH DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ABUSE_REDIR DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ERROR DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ABUSE_SPAM DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_SPAM DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.053 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ABUSE_BOTCC DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.054 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_DBL_ABUSE_MALW DNSBL:72_scores.cf:dbl.spamhaus.org
Nov  4 15:55:22.054 [55625] dbg: uridnsbl: complete_ns_lookup 
NS:72_scores.cf
Nov  4 15:55:22.055 [55625] dbg: uridnsbl: complete_a_lookup 
A:72_scores.cf
Nov  4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
KAM_BODY_COMPROMISED_URIBL_PCCC DNSBL:72_scores.cf:wild.pccc.com
Nov  4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_RED DNSBL:72_scores.cf:multi.uribl.com
Nov  4 15:55:22.056 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_BLOCKED DNSBL:72_scores.cf:multi.uribl.com
Nov  4 15:55:22.057 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_GREY DNSBL:72_scores.cf:multi.uribl.com
Nov  4 15:55:22.057 [55625] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_BLACK DNSBL:72_scores.cf:multi.uribl.com
Subject: svn commit: r1845712 - in /spamassassin/trunk/rulesrc/scores: 
72_scores.cf

Date: Sun, 04 Nov 2018 04:06:19 -
From: spamassassin_r...@apache.org
Message-Id: <20181104040619.bb2623a0...@svn01-us-west.apache.org>
Date: Sun Nov  4 04:06:18 2018
spamassassin/trunk/rulesrc/scores/72_scores.cf
Modified: spamassassin/trunk/rulesrc/scores/72_

Re: FPs on FORGED_MUA_MOZILLA (for my own hand-typed messages from my latest-version Thunderbird)

2018-10-02 Thread Bill Cole

On 2 Oct 2018, at 9:36, Rob McEwen wrote:

SIDE NOTE: I don't think there was any domain in my message that was 
blacklisted on URIBL - so I can't explain the "URIBL_BLOCKED", but 
that only scored 0.001, so that was innocuous. I suspect that that 
rule is malfunctioning on their end, and then they changed the score 
to .001 - so just please ignore that for the purpose of this 
discussion.


No, "URIBL_BLOCKED" means that the URIBL DNS returned a value that is 
supposed to be a message to a mail admin that they are using URIBL wrong 
and will nevewr get a useful answer without either (1) paying for a feed 
to support their usage volume or (2) using their own recursive resolver 
instead of forwarding queries to the likes of Google, OpenDNS, & 
CloudFlare.
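
If you already run (or set up) a local recursive resolver such as 
unbound or BIND, you can point SpamAssassin at it explicitly. A minimal 
local.cf sketch, assuming a resolver listening on the loopback address:

    dns_server 127.0.0.1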


A mail filtering system that gets URIBL_BLOCKED hits is broken. A mail 
filtering system that gets them chronically is mismanaged.


Re: FPs on FORGED_MUA_MOZILLA (for my own hand-typed messages from my latest-version Thunderbird)

2018-10-02 Thread Bill Cole

On 2 Oct 2018, at 13:39, Matus UHLAR - fantomas wrote:


On 2 Oct 2018, at 9:36, Rob McEwen wrote:
SIDE NOTE: I don't think there was any domain in my message that was 
blacklisted on URIBL - so I can't explain the "URIBL_BLOCKED", but 
that only scored 0.001, so that was innocuous. I suspect that that 
rule is malfunctioning on their end, and then they changed the score 
to .001 - so just please ignore that for the purpose of this 
discussion.


On 02.10.18 11:48, Bill Cole wrote:
No, "URIBL_BLOCKED" means that the URIBL DNS returned a value that is 
supposed to be a message to a mail admin that they are using URIBL 
wrong


A mail filtering system that gets URIBL_BLOCKED hits is broken. A 
mail filtering system that gets them chronically is mismanaged.


Nonsense. There is no such implication here. While URIBL_BLOCKED may, 
and most of the time apparently does, mean that the system uses a DNS 
server shared with too many clients, any system that receives and checks 
too much mail may get URIBL_BLOCKED just because it has crossed the 
limit, without using it wrong or being broken.


Operating a system in a manner which chronically crosses that limit is 
abusive.


The DNS reply that results in URIBL_BLOCKED is not "free" for the URIBL 
operators and depending on their software may be as expensive as sending 
a real reply. It has the advantage over simply dropping abusive queries 
that it does not impose timeout delays on abusive queriers and sends a 
clear signal that can and should be acted upon.


Re: Dependency: fetch binary

2018-09-23 Thread Bill Cole
On 23 Sep 2018, at 10:56 (-0400), Jari Fredriksson wrote:

> What is this binary?

It's a core FreeBSD utility used to fetch remote files.

> I could not find any package providing this… I need it for debian (Raspbian) 
> and CentOS 7.

As Kevin noted, you do not.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


signature.asc
Description: OpenPGP digital signature


Re: Bayes not learning, blacklist not filtering

2018-11-16 Thread Bill Cole

On 15 Nov 2018, at 14:27, MarkCS wrote:

So I've been tasked with researching an issue with the mail server at 
work. We use SpamAssassin and at present it's not blocking some pretty 
obvious spam, largely from the domain qq.com. Basically, email is 
slipping through, being bounced back by the receiving server at the far 
end, then our server tries to bounce back to qq.com, which doesn't exist 
at that point, and we get a bounce message. Hundreds of these suckers 
are coming through daily.


As John said, absolutely blocking a whole domain is best done before 
SpamAssassin, in the MTA (in your case that looks like Postfix.)
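
If you do want to refuse everything claiming a qq.com envelope sender, 
a minimal Postfix sketch (conventional file locations, illustrative 
reject text; adjust to taste):

    # main.cf
    smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/sender_access

    # /etc/postfix/sender_access
    qq.com  REJECT mail from this domain is not accepted here

    # build the map and reload
    postmap /etc/postfix/sender_access
    postfix reload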


In fact, all of John's reply was good. There's one thing he was probably 
too polite to mention though...



X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on 


Upgrade SA. 3.3.2 is antique and hasn't seen any updates in (as noted) 7+ 
years. Each 3.4.x release has added useful functionality. Substantial 
parts of the default ruleset are wrapped in version checks because they 
demand 3.4.x features.
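
Those guards look roughly like this; the rule below is a made-up 
placeholder, only the 'if (version ...)' wrapper is the point:

    if (version >= 3.004001)
      # rules that rely on 3.4.1+ features live inside guards like this
      body   LOCAL_EXAMPLE_RULE   /example pattern/
      score  LOCAL_EXAMPLE_RULE   0.001
    endif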


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: SPF weirdness...

2019-01-15 Thread Bill Cole

On 15 Jan 2019, at 11:08, Grant Taylor wrote:

Does anybody know off the top of their head—don't dig, I'll do that 
later—what might cause SpamAssassin to apply SPF processing to 
earlier Received: headers (lower in the message source)?


Check both the contents and documentation of trusted_networks, 
msa_networks, and internal_networks.
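
For reference, those are plain local.cf settings. A minimal 
illustration using made-up RFC 5737 addresses, not a recommendation for 
your topology:

    internal_networks  192.0.2.0/24
    trusted_networks   192.0.2.0/24 198.51.100.25
    msa_networks       192.0.2.25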


I'm seeing SpamAssassin claim that a message failed SPF processing 
based on chronologically earlier internal Received: headers.  
Conversely, the connection to my SMTP server are perfectly acceptable 
with the published SPF record.


If SA thinks a prior hop is through a machine that writes trustworthy 
Received headers and is a normal part of your relay path, it will check 
SPF there.


There MAY be a design bug there. I'm not sure how SA deals with a 
machine you trust and which is a normal inbound relay that also is 
SPF-approved for mail it gets from other places. Maybe msa_networks can 
solve this.


I just noticed this and will look into it further later as soon as 
time permits.  I'm hoping that someone may have a 15 second knee-jerk 
"check this or that" type response.


Thank you in advance.



--
Grant. . . .
unix || die


Re: SPF weirdness...

2019-01-15 Thread Bill Cole
On 15 Jan 2019, at 12:15, Grant Taylor wrote:

> On 01/15/2019 09:24 AM, Kevin A. McGrail wrote:
>> What is your glue for SA?  Is it getting the received header you are 
>> expecting in time for the parsing?
>
> Both SA and my spfmilter are are milters on the same inbound Internet edge 
> MTA.
>
> I will have to research to see if the header is added by the time that SA 
> checks things.
>
> I do know that the Received: header isn't there by the time that SA runs.  I 
> don't know if my MTA has added the proper Authentication-Results: header yet 
> or not.
>
> …
>
> As sure as I type that, "…the Received: header isn't there…", which may mean 
> that SA is running the contents of the previous Received: header through SPF 
> checks.
>
> …
>
> That seems to be part of the problem.
>
> Thank you Kevin.  I now have something more specific to investigate.

This strikes me as a flaw in whatever milter you're using. Some (e.g. 
MIMEDefang) milters deal with the fact that they don't get a local Received 
header by constructing one from what they know before passing the message to SA.


signature.asc
Description: OpenPGP digital signature


Re: SPF weirdness...

2019-01-15 Thread Bill Cole
On 15 Jan 2019, at 14:24, Grant Taylor wrote:

> On 01/15/2019 11:39 AM, Bill Cole wrote:
>> This strikes me as a flaw in whatever milter you're using. Some (e.g. 
>> MIMEDefang) milters deal with the fact that they don't get a local Received 
>> header by constructing one from what they know before passing the message to 
>> SA.
>
> The SPF milter is constructing the header.  I assume that it's doing so 
> properly.  At least the headers I see coming out of the MTA are correct.
>
> I think that SpamAssassin is looking for a header that isn't there yet. -  
> Both SpamAssassin and my SPF filter are hooked into the same MTA as milters.  
> So both of them see the message before it's accepted and all new headers are 
> added.
>
> I don't know if the SPF milter can add the header sooner, or if that is 
> controlled by the MTA.
>
> I would also like SpamAssassin to use the information available to it via the 
> milter interface instead of relying on a header.

Let me clarify...

There are many different milters that can use SpamAssassin listed at 
https://wiki.apache.org/spamassassin/IntegratedInMta#Integrated_into_Sendmail. 
Some links there may be dead.

SpamAssassin is not a milter. SpamAssassin knows nothing about message 
parameters passed through the milter interface between a MTA and a milter. The 
ONLY message data that SpamAssassin knows about is what it gets in a 
RFC822/2822/5322 format message with parseable headers.

A milter that uses SpamAssassin can modify the message that it receives via the 
milter interface before passing it to SpamAssassin for analysis. This allows 
the milter to inform SpamAssassin of facts that SpamAssassin can use, such as 
the SMTP client address, envelope sender and recipients, and whatever else it 
gets from the MTA. For SpamAssassin to do SPF calculations it needs to have a 
Received header and envelope sender, which can be embedded in headers that are 
added by a milter that uses SpamAssassin.
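
Purely as an illustration (every name and address below is made up), 
this is the sort of synthesized header a milter can prepend so that SA 
sees the last hop:

    Received: from mail.example.org (mail.example.org [192.0.2.10])
        by mx.example.net (Postfix) with ESMTP id 0123456789
        for <user@example.net>; Tue, 15 Jan 2019 12:00:00 -0500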

signature.asc
Description: OpenPGP digital signature


Re: SPF weirdness...

2019-01-15 Thread Bill Cole
On 15 Jan 2019, at 15:05, Grant Taylor wrote:

> I will investigate to see if spamass-milter can fabricate a satisfactory 
> Received: header.

A quick look at the issue tracker for it implies that it does so. A milter that 
actually works with SA really needs to.

Unfortunately, it is a nuisance to debug spamass-milter because it talks to 
spamc which talks to spamd, so you need to give debug flags to the 
spamass-milter process and spamd to see exactly what's going on.

signature.asc
Description: OpenPGP digital signature


Re: Phishing.pm

2019-01-21 Thread Bill Cole

On 21 Jan 2019, at 13:58, Rick Cooper wrote:


Giovanni Bechis wrote:

On 13 January 2019 at 21:52:19 CET, Giovanni Bechis 
wrote:
On 13 January 2019 at 20:22:40 CET, Ian Evans  
wrote:

Running 3.4.2, spamd daemon.

Just enabled the new Phishing.pm plugin but wondering about the
data feeds. Is that something we need to set up a cron to wget or
does the plugin handle it? Unless my google fu is weak due to a
lack of caffeine, I couldn't find any doc on setting it up.

Thanks for any advice.


try Mail::SpamAssassin::Plugin::Phishing

 Cheers
Giovanni


man Mail::SpamAssassin::Plugin::Phishing
to be precise.
   Giovanni


Something that isn't answered in the docs is the default score


If you define a rule using the plugin, you must either give it a score 
or it will have the default score of any rule: 1.0.


Note that because the plugin is disabled by default, the default ruleset 
distributed via sa-update does not include a rule using the plugin and 
so you must define a rule as documented for the plugin to be used at 
all.



and I am wondering if SA has to be restarted after each update of the 
data or does it reread each time the plugin is called


It seems to me that the data file is re-read for each scan, so no 
restart is needed. Even if I'm mis-reading, it would be re-read for each 
new spamd child process (or mimedefang worker), so a restart would not 
be *needed* if you can tolerate a delay until children are respawned.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Phishing.pm

2019-01-22 Thread Bill Cole
[Pulling this conversation back on-list where I can misinform everyone 
publicly]


On 22 Jan 2019, at 5:04, Ian Evans wrote:


On Tue, Jan 22, 2019 at 2:15 AM Bill Cole <
sausers-20150...@billmail.scconsult.com> wrote:


[snip]
Note that because the plugin is disabled by default, the default 
ruleset distributed via sa-update does not include a rule using the 
plugin and so you must define a rule as documented for the plugin to be 
used at all.



One thing I'm not clear on:

a) do we need to add this to local.cf:

  ifplugin Mail::SpamAssassin::Plugin::Phishing
phishing_openphish_feed /etc/mail/spamassassin/openphish-feed.txt
phishing_phishtank_feed /etc/mail/spamassassin/phishtank-feed.csv
body URI_PHISHING  eval:check_phishing()
describe URI_PHISHING  Url match phishing in feed
  endif


Yes. You may want to only use one of the two feeds, put the feed file(s) 
in different places, or name the rule something other than URI_PHISHING, 
but you need to have a body eval rule calling check_phishing() and the 
path to at least one of the feeds specified.
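
The plugin does not fetch the feeds for you, so something like the 
following from cron keeps the files fresh. The URLs are the ones listed 
in the plugin's man page; verify them there before relying on this 
sketch:

    # refresh the feed files a few times a day (paths match the rule above)
    curl -s -o /etc/mail/spamassassin/openphish-feed.txt https://openphish.com/feed.txt
    curl -s -o /etc/mail/spamassassin/phishtank-feed.csv http://data.phishtank.com/data/online-valid.csv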


and b) is that sufficient to "define a rule as documented for the 
plugin to be used at all."


Yes.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Subtest __E_LIKE_LETTER and __LOWER_E listed many times in message header

2018-12-10 Thread Bill Cole
On 9 Dec 2018, at 18:23, Chris Pollock wrote:

> On Sun, 2018-12-09 at 13:06 -0500, Bill Cole wrote:
>> On 9 Dec 2018, at 12:04, Chris Pollock wrote:
>>
>>> This is probably very trivial and doesn't affect anything except
>>> maybe
>>> the size of the headers but I have to ask. When looking at the
>>> headers
>>> of some ham I noticed - https://pastebin.com/H7euxqVX the two rules
>>> I
>>> mention above are in 72_active.cf. Is there a reason for the number
>>> of
>>> times it's listed? Couldn't each subtest be listed just once
>>> instead
>>> of
>>> multiple times?
>>
>> Not with the current documented behavior of the code, given the way
>> those sub-rules are designed to work together. The goal is to
>> identify
>> messages which use Latin-script 'e' characters but also use many
>> non-Latin-script characters which look like 'e' but are not. To make
>> this determination, the rules require the 'multiple' flag without a
>> cap
>> on thne number of matches which a 'maxhits' parameter would set.
>
> Got it, thanks Bill. I've never noticed this before. I also noticed
> that according to my daily sa-update output this subtest is apparently
> new or at least it didn't appear in the output until this past Fri.

Correct. See the thread with the subject "No longer just embedded =9D 
characters in blackmail emails" here last week for the background.

>>
>> It is not recommended to routinely add the list of matched sub-rules
>> to
>> scanned messages.
>>
> Any specific reason why? This is just on my home system.

It's got the potential to be VERY noisy (as you've discovered) while not really 
providing much useful info.  Not a big deal on a small system.


Anyway, as of today I've capped those 2 subrules at levels which leave ample 
space to still match the target spam. Should show up in tomorrow's update.


signature.asc
Description: OpenPGP digital signature


Re: Spamassassin using remote rules definition source?

2018-12-10 Thread Bill Cole

On 10 Dec 2018, at 13:28, ozgurerdogan wrote:


Can you give me a more step-by-step explanation of:

"set up your own local published ruleset source and configure your 
instances to include that in their rule sources for the standard 
sa-update processing (will require managing DNS entries and generating 
SHA checksums for the rules file)"

This is what I needed. Thank you everyone by the way.


The setup John refers to is fully documented at 
https://wiki.apache.org/spamassassin/PublishingRuleUpdates
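
At a very high level the artifact side looks like the sketch below; the 
wiki page is authoritative on the exact file formats and the DNS TXT 
records sa-update queries, so treat this only as a rough outline:

    REV=42                                  # your own update serial number
    tar czf ${REV}.tar.gz -C my-rules .     # hypothetical directory of .cf files
    sha256sum ${REV}.tar.gz | cut -d' ' -f1 > ${REV}.tar.gz.sha256
    sha512sum ${REV}.tar.gz | cut -d' ' -f1 > ${REV}.tar.gz.sha512
    gpg --batch --armor --detach-sign -o ${REV}.tar.gz.asc ${REV}.tar.gz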




Re: Subtest __E_LIKE_LETTER and __LOWER_E listed many times in message header

2018-12-13 Thread Bill Cole
On 13 Dec 2018, at 16:24, Chris Pollock wrote:

> On Thu, 2018-12-13 at 15:14 -0600, Chris Pollock wrote:
>> On Tue, 2018-12-11 at 19:00 -0500, Bill Cole wrote:
>>> On 11 Dec 2018, at 16:37, Chris Pollock wrote:
>>>
>>>> On Mon, 2018-12-10 at 13:09 -0500, Bill Cole wrote:
>>>
>>> [...]
>>>>> Anyway, as of today I've capped those 2 subrules at levels
>>>>> which
>>>>> leave ample space to still match the target spam. Should show
>>>>> up
>>>>> in
>>>>> tomorrow's update.
>>>
>>> I was wrong. The addition of a 'maxhits' parameter to the two
>>> subrules apparently didn't get committed in time for the nightly
>>> rule
>>> promotion run. It was in r1848602 and the current ruleset is still
>>> at
>>> r1848555. Assuming all goes well tonight, the change will appear
>>> tomorrow.
>>>
>>
>> Shouldn't this have stopped by now - https://pastebin.com/7260daT3
>> Today's update was '1848731'.
>>
> Hit send too fast. Doing a compare between 72_active.cf dated the 11th
> and the one dated today I do see:
>
> Dated 11 Dec
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
>   ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body__E_LIKE_LETTER //
> tflags  __E_LIKE_LETTER multiple
>
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
>   ifplugin
> Mail::SpamAssassin::Plugin::ReplaceTags
> body__LOWER_E
> /e/i
> tflags  __LOWER_E   multiple
>
> Dated today 13 Dec
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
>   ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
> body__E_LIKE_LETTER //
> tflags  __E_LIKE_LETTER multiple maxhits=400
>
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
>   ifplugin
> Mail::SpamAssassin::Plugin::ReplaceTags
> body__LOWER_E
> /e/
> tflags  __LOWER_E   multiple maxhits=250
>
> IIUC then __E_LIKE_LETTER can hit a max of 400 times in one message and
> __LOWER_E a max of 250 times in one message.

For now, yes. Those numbers were the result of a mis-think on my part and will 
be 320 and 230 once the current rev works its way through.

> Therefore I may still have
> a large listing of subtest ran.

Yes.
I don't expect that behavior to change. SA has always tallied rules and 
sub-rules with multiple matches and the 'multiple' tflag this way and I see no 
compelling reason to change that. It almost certainly will not change for 
3.4.3, which should be the last 3.4.x release.
If there's a bug opened and someone is willing to work on code for whatever 
changes need to be made to collapse duplicate hit names in the lists of rule 
matches into a single citation with a count of hits, I expect that change would 
be accepted for v4, even though it may impact existing users' tooling.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


signature.asc
Description: OpenPGP digital signature


Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-20 Thread Bill Cole
On 20 Dec 2018, at 11:55, Marcus Schopen wrote:

> Am Donnerstag, den 20.12.2018, 12:35 +0100 schrieb Marcus Schopen:
>> Hi,
>>
>> I get a warning, when updating the channel:
>>
>> --
>> config: warning: description exists for non-existent rule EXCUSE_24
>>
>> channel: lint check of update failed, channel failed
>> sa-update failed for unknown reasons
>> --
>
> seems not to be a problem of the EXCUSE_24 rule, but a general problem
> with sa-update, as other users do have the same problem since today.


This should now be fixed for the next rules update.



Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-20 Thread Bill Cole

On 20 Dec 2018, at 13:41, Bill Cole wrote:


This should now be fixed for the next rules update.


And, On 20 Dec 2018, at 17:04, (ignoring an explicit Reply-To header in 
a direct message to me!) Frank Giesecke wrote:



How can I force the rules update?


You cannot. The "rules update" I referred to is the one that runs every 
night on an Apache infrastructure host, to update the default rules 
channel. The update completes around 03:30 UTC.



I still get the error on my Debian system.


If you cannot wait 5 more hours and have an updated SVN checkout of the 
'trunk' code, you can run:


make clean ; echo |perl Makefile.PL ; make build_rules

That will leave a proper set of rules files in the rules/ directory. If 
you copy rules/72_active.cf to your local site-wide rules directory 
(probably  /var/lib/spamassassin/3.004002/updates_spamassassin_org/) you 
will fix the worst effects of last night's broken update.


We've had a few occurrences of essentially the same problem (a bad rules 
package due to an ignored lint failure in a nightly update) over the 
past few years. In addition to correcting the problematic rule I have 
also fixed the script which intentionally (!) masked the lint failure 
and allowed the broken rules package to be built and distributed.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-20 Thread Bill Cole

On 20 Dec 2018, at 17:54, Bill Cole wrote:

If you cannot wait 5 more hours and have an updated SVN checkout of 
the 'trunk' code, you can run:


make clean ; echo |perl Makefile.PL ; make build_rules

That will leave a proper set of rules files in the rules/ directory. 
If you copy rules/72_active.cf to your local site-wide rules directory 
(probably  /var/lib/spamassassin/3.004002/updates_spamassassin_org/) 
you will fix the worst effects of last night's broken update.



It has been pointed out to me that a simpler and less error-prone fix 
would be to revert to the prior day's rule collection:


   mkdir /tmp/saupdate-1849156
   cd $_
   curl -O http://sa-update.spamassassin.org/1849156.tar.gz
   curl -O http://sa-update.spamassassin.org/1849156.tar.gz.asc
   curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha256
   curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha512
   sa-update -D --install 1849156.tar.gz



Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-20 Thread Bill Cole

On 20 Dec 2018, at 17:56, Kevin A. McGrail wrote:


We've had a few occurrences of essentially the same problem (a bad
rules package due to an ignored lint failure in a nightly update) over
the past few years. In addition to correcting the problematic rule I
have also fixed the script which intentionally (!) masked the lint
failure and allowed the broken rules package to be built and 
distributed.




The file shouldn't get installed though because sa-update checks the
lint, doesn't it?


It depends on why the lint failed in the update process and on the local 
config. In the immediate case, sa-update installed the bad package.


The root cause of this particular failure was a 'replace_tag' rule that 
was outside an 'ifplugin Mail::SpamAssassin::Plugin::ReplaceTags' block. 
Because 'make build_rules' runs with minimal plugins loaded, the rule 
failed to parse and the design error in the mkrules script papered over 
the problem with an empty 72_active.cf. The rules package was assembled 
correctly with that empty file. When tested by sa-update after download, 
the rules pass lint because the file where the 'bad' rule would have 
gone was empty.
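
For anyone maintaining local rules, the correct shape is to keep 
ReplaceTags directives inside the plugin guard. A minimal illustration, 
with a made-up tag and rule name:

    ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
      # the tag and rule below are illustrative only
      replace_tag    EX  [e\x{0435}]
      body           __LOCAL_EX_WORD  /<EX>xample/
      replace_rules  __LOCAL_EX_WORD
    endif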




Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-21 Thread Bill Cole
On 21 Dec 2018, at 15:57, Michael Orlitzky wrote:

> On 12/20/18 7:00 PM, Bill Cole wrote:
>>
>>  mkdir /tmp/saupdate-1849156
>
> Never use a fixed path under /tmp =)


Fine:

#!/bin/sh
cd `mktemp -d -t HappyMichael???`
curl -O http://sa-update.spamassassin.org/1849156.tar.gz
curl -O http://sa-update.spamassassin.org/1849156.tar.gz.asc
curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha256
curl -O http://sa-update.spamassassin.org/1849156.tar.gz.sha512
sa-update -D --install 1849156.tar.gz



Re: Another form of obfuscation email.

2018-12-10 Thread Bill Cole

On 10 Dec 2018, at 14:13, RW wrote:


On Mon, 10 Dec 2018 12:45:53 -0500
Mark London wrote:


Hi - Here's another form of obfuscation spam.  This time, not a porn
blackmail one.   Almost the whole text is obfuscated.

https://pastebin.com/VURwmrrF



You say obfuscated, but it looked completely unreadable to me.


The text/plain part is garbage, but the text/html part renders to a 
mostly readable phish.


--
Bill Cole

